<img src="https://pbs.twimg.com/media/E0m6c9FX0AIX0p1.png" style="height:150px" align="left"/> <br><br>

<br/>
<br/>
<br/>
<br/>
<br/>

### **Exploring GEOROC 2.0:** <br/> Data Access, Formatting, and Visualization for Geochemical Analysis

***

The GEOROC Database (Geochemistry of Rocks of the Oceans and Continents) is a comprehensive collection of published analyses <br/> of igneous and metamorphic rocks and minerals. It contains major and trace element concentrations, radiogenic and nonradiogenic <br/> isotope ratios as well as analytical ages for whole rocks, glasses, minerals and inclusions. Metadata include geospatial and other <br/> sample information, analytical details and references.

The GEOROC Database was established at the Max Planck Institute for Chemistry in Mainz (Germany). In 2021, the database was <br/> moved to Göttingen University, where it continues to be curated as part of the DIGIS project of the Department of Geochemistry and <br/> Isotope Geology at the Geoscience Centre (GZG) and the University and State Library (SUB). Development for GEOROC 2.0 <br/> includes a new data model for greater interoperability, options to contribute data, and improved access to the database.

As part of the DIGIS project, a new API has been created for the GEOROC database, allowing easy access to its contents <br/>
with basic programming skills. Users can query the database and retrieve data through this API, which makes GEOROC more accessible <br/>
and useful for researchers and other interested parties. This notebook demonstrates the basic capabilities of GEOROC data access <br/> via the new DIGIS API.

For feedback, questions and further information contact [Digis-Info](mailto:digis-info@uni-goettingen.de) directly.

This notebook demonstrates the basic capabilities of common data cleaning methods in geoscience.

***

June 2023, Dr. Marthe Klöcking, Timm M. Wirtz

<img src="https://mirrors.creativecommons.org/presskit/buttons/88x31/png/by-nc-sa.png" style="height:50px" align="left"/> <br><br>

https://creativecommons.org/licenses/by-nc-sa/4.0/

***

### <a name="table"></a>Table of contents

* [Import modules](#import)
* [Store your API key](#store)
        * [Hint](#hint)
* [Select the data that should be in the dataset](#select)
        * [Hint](#hint2)
* [Check if there is a connection to the API](#check)
* [Select the filters to search for desired data](#filters)
* [Extract the necessary SampleIDs](#extract1)
* [Extract all the data using the SampleIDs](#extract2)
* [Structure and pivot the dataset](#strucdata)
* [Plot the data on a map](#plotmap)
* [Plot FeO* against S](#plotfeotS)
* [Plot the data in the TAS diagram](#plottas)
* [References](#refs)
* [QR code for accessing this Jupyter notebook](#qrcode)
        * [Hint](#hint3)

***

#### <a name="import"></a>Import modules

**requests**: This module allows you to send HTTP requests and handle responses in Python. It is commonly used for making API <br/> calls and retrieving data from web servers.

**json**: The JSON module provides functions to work with JSON data. It allows you to serialize Python objects into JSON strings <br/>  and deserialize JSON strings into Python objects.

**pandas**: The pandas library is a powerful tool for data manipulation and analysis. It provides data structures and functions <br/>  for efficiently working with structured data, such as tables or spreadsheets.

**geopandas**: Geopandas is an extension of the pandas library that adds support for geospatial data analysis. It provides data <br/>  structures and functions for working with geographic data, such as maps, spatial joins, and geometric operations.

**seaborn**: Seaborn is a data visualization library based on matplotlib. It provides a high-level interface for creating attractive <br/>  and informative statistical graphics.

**matplotlib.pyplot**: Matplotlib is a comprehensive plotting library in Python. It provides a wide range of functionalities for <br/>  creating static, animated, and interactive visualizations.

**matplotlib.patches.Ellipse**: The Ellipse class in matplotlib.patches allows you to draw ellipses on plots. It is useful for <br/>  representing uncertainties or confidence intervals in data visualizations.

**contextily**: Contextily is a Python library that adds web map tiles as a background to matplotlib plots. It makes it easy to <br/>  include basemaps from popular mapping providers, such as OpenStreetMap or Stamen, in your visualizations.

**adjust_text**: The adjust_text function from the adjustText package automatically adjusts the position of text labels in a plot to <br/>  prevent overlaps. It is particularly useful when you have many data points and want to ensure that the labels are readable.

**widgets**: The widgets module in ipywidgets provides interactive user interface elements, such as sliders, buttons, and <br/> dropdown menus,  for Jupyter notebooks. It allows you to create interactive data exploration and visualization tools.

**Layout**: The Layout class in ipywidgets allows you to customize the layout and styling of the widgets. It provides options for <br/>  controlling the size, alignment, and spacing of the widgets in the user interface.

**IntProgress, HBox, Label**: These are specific widget classes in ipywidgets. IntProgress is a widget for displaying a progress bar, <br/>  HBox is a container widget that arranges its children horizontally, and Label is a widget for displaying text or a description.

**Button**: The Button widget in ipywidgets allows you to create clickable buttons in the user interface. It is commonly used for <br/>  triggering actions or functions when clicked.

In [None]:
import requests
import json
import pandas as pd
import geopandas as gpd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import contextily as ctx
from adjustText import adjust_text
from ipywidgets import widgets
from ipywidgets import Button, HBox, IntProgress, Label, Layout
from functions.widgets import *
from functions.georoc_api import GeoRocAPI
from functions.app import MyApp
from functions.utils import *

***

#### <a name="store"></a> Store your API key

In [None]:
my_app = MyApp()

##### <a name="hint"></a>Hint:
- Enter your API key in the field and press the button so that the key is stored for the following requests.
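
As an alternative to typing the key into the widget, the key could be read from an environment variable so that it never appears in the notebook itself. The following is a minimal sketch under that assumption; the variable name `GEOROC_API_KEY` and the helper `load_api_key` are hypothetical and are not part of the DIGIS API or of `MyApp`.

```python
import os


def load_api_key(var_name="GEOROC_API_KEY"):
    """Return the API key stored in an environment variable.

    `var_name` is a hypothetical name chosen for this sketch.
    """
    # Missing variables are treated like empty ones
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set or is empty.")
    return key
```

How the returned key is handed to the application depends on `MyApp`, which this sketch does not assume anything about.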

***

#### <a name="select"></a> Select the data that should be in the dataset

In [None]:
display(grid)

##### <a name="hint2"></a>Hint:
- The appropriate fields must be selected. Please select: **SampleName**, **Longitude**, **Latitude**, **Item_Name**, **Item_Group**, **Units** and **Values**.
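
Because the later cells rely on these column names, it can help to check the selection before any data is requested. The sketch below assumes the selection is available as a list of strings; the helper name `missing_keys` is hypothetical.

```python
# Fields that the later cells of this notebook expect to be present
REQUIRED_KEYS = (
    "SampleName", "Longitude", "Latitude",
    "Item_Name", "Item_Group", "Units", "Values",
)


def missing_keys(selected_keys, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the selection, in order."""
    selected = set(selected_keys)
    return [key for key in required if key not in selected]


# Example: only two fields were ticked
print(missing_keys(["SampleName", "Values"]))
```

An empty list means that all required fields are selected.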

***

#### <a name="check"></a>Check if there is a connection to the API

In [None]:
# Define a function to check the API connection
def check_api_connection():
    # Set the API endpoint to "ping"
    endpoint = "ping"
    
    # Make an API query to the specified endpoint
    response = my_app.api.api_query(endpoint)
    
    # Check if the response is not None
    if response is not None:
        print("Connection to API server successful!\n")
    else:
        # Raise an exception so that the failure is reported in the cell output
        raise ConnectionError("Failed to connect to API server!")

# Call the check_api_connection function
check_api_connection()
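
The same check can be written so that the query function is passed in as an argument, which makes it possible to try the logic without a network connection. This is a sketch only: `check_connection` is a hypothetical helper, and it assumes, as the cell above implies, that the query function returns `None` when the server cannot be reached.

```python
def check_connection(query, endpoint="ping"):
    """Call `query(endpoint)` and raise ConnectionError if nothing comes back.

    `query` is any callable that takes an endpoint name and returns
    a response object, or None on failure (an assumption of this sketch).
    """
    response = query(endpoint)
    if response is None:
        raise ConnectionError(f"No response from endpoint '{endpoint}'.")
    return response


# Stand-in for the real API call, used only to illustrate the behaviour
def fake_query(endpoint):
    return {"endpoint": endpoint, "status": "ok"}


print(check_connection(fake_query))
```

In the notebook, the call would presumably look like `check_connection(my_app.api.api_query)`.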

#### <a name="filters"></a> Select the filters to search for desired data

In [None]:
display(grid2)

[Back to Table of contents](#table)

***

#### <a name="extract1"></a>Extract the necessary SampleIDs

In [None]:
# Call the get_filtered_samples method of the my_app.api object with the specified parameters
filtered_samples_combined = my_app.api.get_filtered_samples(
    limit=limit_widget.value,
    offset=offset_widget.value,
    location1=location1_widget.value,
    location2=location2_widget.value,
    location3=location3_widget.value,
    setting=setting_widget.value,
    latitude=latitude_widget.value,
    longitude=longitude_widget.value,
    rocktype=rocktype_widget.value,
    rockclass=rockclass_widget.value,
    mineral=mineral_widget.value,
    material=material_widget.value,
    inclusiontype=inclusiontype_widget.value,
    sampletech=sampletech_widget.value,
    element=element_widget.value,
    elementtype=elementtype_widget.value,
    value=value_widget.value,
    title=title_widget.value,
    publicationyear=publicationyear_widget.value,
    doi=doi_widget.value,
    firstname=firstname_widget.value,
    lastname=lastname_widget.value,
    agemin=agemin_widget.value,
    agemax=agemax_widget.value,
    geoage=geoage_widget.value,
    geoageprefix=geoageprefix_widget.value,
    lab=lab_widget.value
)

# Check if the key "Data" exists in the filtered_samples_combined dictionary
if 'Data' in filtered_samples_combined:
    print("\n")
    # Print each item in the list of dictionaries under the key "Data"
    for item in filtered_samples_combined['Data']:
        print(str(item))
    print("\n")
else:
    print("No data found or unexpected data structure", "\n")

In [None]:
# Check if the key "Data" exists in the dictionary "filtered_samples_combined" and if its value is not empty
if "Data" in filtered_samples_combined and filtered_samples_combined["Data"]:
    # Extract the "SampleID" values from the list of dictionaries in the "Data" key
    sampling_feature_ids = [sample["SampleID"] for sample in filtered_samples_combined["Data"]]
    
    # Print the extracted SampleIDs
    print("\n", "The extracted SampleIDs are:")
    
    # Print the SampleIDs in groups of 10 with a line break after each group
    for i in range(0, len(sampling_feature_ids), 10):
        print(sampling_feature_ids[i:i+10])

    print("\n")
    
else:
    print("No data found or unexpected data structure", "\n")
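
The extraction and the grouped printing can be tried out without an API key on a mock response. The dictionary below only imitates the structure used above (a `Data` key holding a list of dictionaries with a `SampleID` entry); the IDs are invented for illustration, and the helper names are hypothetical.

```python
def extract_sample_ids(response):
    """Return all SampleID values found under the "Data" key."""
    data = (response or {}).get("Data") or []
    return [sample["SampleID"] for sample in data if "SampleID" in sample]


def chunked(items, size=10):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Mock response with invented IDs
mock_response = {"Data": [{"SampleID": n} for n in range(470, 495)]}

ids = extract_sample_ids(mock_response)
for group in chunked(ids):
    print(group)
```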

[Back to Table of contents](#table)

***

#### <a name="extract2"></a>Extract all the data using the SampleIDs

In [None]:
# Create an empty DataFrame to store all measurement data
measurement_data = pd.DataFrame()

In [None]:
# Create an integer progress bar widget
progress_bar = widgets.IntProgress(
    value=0,
    min=0,
    max=len(sampling_feature_ids),
    description='Loading:',
    bar_style='info',
    style={'bar_color': 'maroon'},
    orientation='horizontal'
)

# Display the progress bar
display(progress_bar)

# Iterate over the list of SamplingFeatureIDs
for index, sampling_feature_id in enumerate(sampling_feature_ids):
    # Get the selected keys from the checkboxes
    selected_keys = get_selected_keys(checkboxes)

    # Get the measurement data for the current SamplingFeatureID using the selected keys
    df = get_measurement_data(my_app.api.api_key, sampling_feature_id, selected_keys)

    # Check if the dataframe is not empty and not None
    if df is not None and not df.empty:
        # Add the SampleID to the dataframe
        df['SampleID'] = sampling_feature_id

        # Append the dataframe to the measurement_data DataFrame
        # (pd.concat is the public replacement for the private DataFrame._append)
        measurement_data = pd.concat([measurement_data, df], ignore_index=True)

    # Update the progress bar value
    progress_bar.value = index + 1
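
Growing a DataFrame inside a loop copies the accumulated table on every iteration. A common alternative is to collect the per-sample frames in a list and concatenate them once at the end. The sketch below shows this pattern on toy data; `combine_frames` is a hypothetical helper, and the values are invented.

```python
import pandas as pd


def combine_frames(frames_by_id):
    """Concatenate per-sample DataFrames and tag each row with its SampleID.

    `frames_by_id` maps a SampleID to a DataFrame, or to None if no data
    was returned for that sample.
    """
    frames = []
    for sample_id, df in frames_by_id.items():
        if df is None or df.empty:
            continue  # skip samples without data
        frames.append(df.assign(SampleID=sample_id))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Toy input with invented values
toy = {
    476: pd.DataFrame({"Item_Name": ["SIO2", "S"], "Values": [50.0, 0.1]}),
    482: None,
    511: pd.DataFrame({"Item_Name": ["SIO2"], "Values": [51.0]}),
}
combined = combine_frames(toy)
print(combined)
```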

[Back to Table of contents](#table)

***

#### <a name="strucdata"></a>Structure and pivot the dataset

In [None]:
# Group and aggregate the measurement data
grouped_data = measurement_data.groupby(['SampleID', 'Item_Name'])['Values'].apply(list).reset_index()

# Pivot the DataFrame
pivot_df = grouped_data.pivot(index='SampleID', columns='Item_Name', values='Values')

# Now we convert the lists into individual elements (since there is only one entry per group)
for col in pivot_df.columns:
    pivot_df[col] = pivot_df[col].str[0]

# Extract 'Longitude', 'Latitude', 'Units', 'Item_Group', and 'SampleName' for each 'SampleID'
additional_columns_df = measurement_data.drop_duplicates(subset='SampleID')[['SampleID', 'Longitude', 'Latitude', 'Units', 'Item_Group', 'SampleName']]

# Merge the data with the original data to add the additional columns
final_df = pd.merge(pivot_df.reset_index(), additional_columns_df, on='SampleID', how='left')

# Reset the index
final_df.reset_index(drop=True, inplace=True)
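
The reshaping step is easier to follow on a small table. The sketch below turns a long table (one row per sample and item) into a wide one (one row per sample) and then attaches the coordinates. It uses `pivot_table` with `aggfunc="first"` as a compact stand-in for the group, pivot and unpack steps above; the values are invented.

```python
import pandas as pd

# Long format: one row per (SampleID, Item_Name); invented values
long_df = pd.DataFrame({
    "SampleID": [1, 1, 2, 2],
    "Item_Name": ["SIO2", "FEOT", "SIO2", "FEOT"],
    "Values": [50.0, 11.0, 51.5, 10.5],
    "Longitude": [-155.0, -155.0, -155.5, -155.5],
    "Latitude": [19.0, 19.0, 19.5, 19.5],
})

# Wide format: one column per item, keeping the first value of each group
wide = long_df.pivot_table(
    index="SampleID", columns="Item_Name", values="Values", aggfunc="first"
)
wide.columns.name = None

# One row of per-sample metadata, merged back onto the wide table
coords = long_df.drop_duplicates(subset="SampleID")[["SampleID", "Longitude", "Latitude"]]
result = wide.reset_index().merge(coords, on="SampleID", how="left")
print(result)
```

Note that columns which differ between items of the same sample, such as units, cannot be carried over this way: `drop_duplicates` keeps only the first row per sample.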

In [None]:
# Set the maximum number of rows to display in pandas DataFrame
pd.set_option('display.max_rows', None)
# Set the maximum number of columns to display in pandas DataFrame
pd.set_option('display.max_columns', None)

# Display the final_df DataFrame
final_df
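
`pd.set_option` changes the display settings for the rest of the session. If the full table is only needed once, `pd.option_context` limits the change to a `with` block. A minimal sketch on a toy DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})
default_rows = pd.get_option("display.max_rows")

# The options apply only inside the with-block
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    rows_inside = pd.get_option("display.max_rows")
    full_text = repr(df)  # untruncated text representation

# Outside the block the previous setting is active again
rows_after = pd.get_option("display.max_rows")
print(rows_inside, rows_after)
```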

[Back to Table of contents](#table)

***

#### <a name="plotmap"></a>Plot the data on a map

In [None]:
def plot_coordinates_on_map(final_df):
    # Create a geopandas dataframe
    gdf = gpd.GeoDataFrame(
        final_df,
        geometry=gpd.points_from_xy(final_df.Longitude, final_df.Latitude))
    gdf.crs = 'EPSG:4326'  # Set the initial CRS (Coordinate Reference System) to WGS84

    # Reproject the data to match the CRS used by contextily
    gdf_web_mercator = gdf.to_crs(epsg=3857)

    # Create the base map
    fig, ax = plt.subplots(figsize=(10, 10))

    # Plot markers on the map with sample IDs
    for _, row in gdf_web_mercator.iterrows():
        ax.scatter(
            row['geometry'].x,
            row['geometry'].y,
            color='red',
            edgecolors='black',
            marker='^',
            s=25
        ) 
        
    # Set the map extent to match the data
    ax.set_xlim(gdf_web_mercator.total_bounds[0] - 60000, gdf_web_mercator.total_bounds[2] + 60000)
    ax.set_ylim(gdf_web_mercator.total_bounds[1] - 70000, gdf_web_mercator.total_bounds[3] + 12000)

    # Add the Esri Ocean basemap as background tiles
    ctx.add_basemap(ax, source=ctx.providers.Esri.OceanBasemap, zoom='auto')

    # Add title and axis labels
    ax.set_title('Map of the island of Hawaii', fontsize=16)
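    # The axes are in Web Mercator metres after the reprojection, not in degrees
    ax.set_xlabel('Easting [m] (EPSG:3857)')
    ax.set_ylabel('Northing [m] (EPSG:3857)')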

    # Plot sample IDs with adjustText
    texts = []
    for _, row in gdf_web_mercator.iterrows():
        texts.append(
            plt.text(
                row['geometry'].x,
                row['geometry'].y,
                str(row['SampleID']),
                fontsize=8,
                color='black',
                ha='center')
        )

    # Adjust the text positions
    adjust_text(texts)

    description = "Dredge haul locations are shown by the triangle symbols for each volcano."
    plt.figtext(0.5, 0.02, description, ha='center', fontsize=10)

    # Save and show the map
    plt.savefig("plot_hawaii.png")
    plt.show()

# Plot coordinates on the map using the final_df DataFrame
plot_coordinates_on_map(final_df)
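
The reprojection to EPSG:3857 converts degrees into metres, which is why the tick values on the map are large numbers rather than longitudes and latitudes. In the notebook this is done by `to_crs`; the sketch below only illustrates the standard spherical Web Mercator formulas behind it. The function name is hypothetical, and the example coordinates are illustrative.

```python
import math

# Sphere radius used by Web Mercator (equal to the WGS84 semi-major axis), in metres
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS84 longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Illustrative point in the Hawaii region
x, y = lonlat_to_web_mercator(-155.5, 19.5)
print(round(x), round(y))
```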

[Back to Table of contents](#table)

***

#### <a name="plotfeotS"></a> Plot FeO* against S

In [None]:
# List of columns to exclude
exclude_columns = ['Longitude', 'Latitude', 'Units', 'Item_Group', 'H2O', 'CH4', 'CL', 'CO1', 'CO2', 'D18O', 'DD']
# List of columns to include
included_columns = [col for col in final_df.columns if col not in exclude_columns]

# Lists of markers and colors for scatter plot
markers = ['o'] * 3 + ['o'] * 10 + ['^'] * 23 + ['s'] * 3 + ['s'] * 3
colors = ['b'] * 3 + ['r'] * 10 + ['g'] * 23 + ['y'] * 3 + ['c'] * 3

# Sample IDs for ellipses
sampleids_ellipse1 = [511]
sampleids_ellipse2 = [482]
sampleids_ellipse3 = [476]

# Initialize variables for the ellipses outside of the loop
ellipse_center_1 = None
ellipse_width_1 = 0
ellipse_height_1 = 0
ellipse_angle_1 = 0

ellipse_center_2 = None
ellipse_width_2 = 0
ellipse_height_2 = 0
ellipse_angle_2 = 0

ellipse_center_3 = None
ellipse_width_3 = 0
ellipse_height_3 = 0
ellipse_angle_3 = 0

# Set the figure size
plt.figure(figsize=(10, 8))

# Loop through the rows of the final_df DataFrame
for i in range(len(final_df)):
    # Plot the scatterplot
    sns.scatterplot(x=final_df['FEOT'].iloc[i:i+1], y=final_df['S'].iloc[i:i+1],
                    marker=markers[i], color=colors[i])

    # Get the sample ID for the current row
    sampleid = final_df['SampleID'].iloc[i]

    # Add ellipses for specific sample IDs
    if sampleid in sampleids_ellipse1:
        if ellipse_center_1 is None:
            # Set the center, width, height, and angle for ellipse 1
            ellipse_center_1 = (final_df['FEOT'].iloc[i], final_df['S'].iloc[i])
            ellipse_width_1 = 2.5
            ellipse_height_1 = 0.05
            ellipse_angle_1 = 0

        # Create and add ellipse 1 to the plot
        ellipse_1 = Ellipse(ellipse_center_1, 
                            ellipse_width_1, 
                            ellipse_height_1, 
                            angle=ellipse_angle_1, 
                            edgecolor='black', 
                            facecolor='none', 
                            linestyle='dashed'
                            )
        plt.gca().add_patch(ellipse_1)

    if sampleid in sampleids_ellipse2:
        if ellipse_center_2 is None:
            # Set the center, width, height, and angle for ellipse 2
            ellipse_center_2 = (final_df['FEOT'].iloc[i], final_df['S'].iloc[i])
            ellipse_width_2 = 1.5
            ellipse_height_2 = 0.1
            ellipse_angle_2 = 0.1

        # Create and add ellipse 2 to the plot
        ellipse_2 = Ellipse(ellipse_center_2, 
                            ellipse_width_2, 
                            ellipse_height_2, 
                            angle=ellipse_angle_2, 
                            edgecolor='black', 
                            facecolor='none', 
                            linestyle='dotted'
                            )
        plt.gca().add_patch(ellipse_2)

    if sampleid in sampleids_ellipse3:
        if ellipse_center_3 is None:
            # Set the center, width, height, and angle for ellipse 3
            ellipse_center_3 = (final_df['FEOT'].iloc[i], final_df['S'].iloc[i])
            ellipse_width_3 = 1.0
            ellipse_height_3 = 0.13
            ellipse_angle_3 = 0.8

        # Create and add ellipse 3 to the plot
        ellipse_3 = Ellipse(ellipse_center_3, 
                            ellipse_width_3, 
                            ellipse_height_3, 
                            angle=ellipse_angle_3, 
                            edgecolor='black', 
                            facecolor='none', 
                            linestyle='dashdot'
                            )
        plt.gca().add_patch(ellipse_3)

# Add legend, title, and axis labels to the plot
plt.legend(handles=[ellipse_1, ellipse_2, ellipse_3], labels=['MAUNA LOA', 'KILAUEA', 'LOIHI'], loc='upper left')
plt.title("FeO* vs. S")
plt.xlabel("FeO*")
plt.ylabel("S")

# Optionally, restrict the range of the x-axis and y-axis
plt.xlim(9, 13)
plt.ylim(0, 0.2)

# Save the plot as an image
plt.savefig('plot_FEOT_vs_S.png')

# Display the plot
plt.show()
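
The marker and colour lists above are indexed by row number, so they must be exactly as long as the DataFrame; with more rows than styles, the loop fails with an `IndexError`. The sketch below builds the same lists from a table of group sizes and checks the length up front. The helper names are hypothetical, and the group sizes are copied from the lists above.

```python
def expand_styles(groups):
    """Expand (count, marker, color) tuples into per-row marker and color lists."""
    markers, colors = [], []
    for count, marker, color in groups:
        markers.extend([marker] * count)
        colors.extend([color] * count)
    return markers, colors


def check_style_length(n_rows, styles):
    """Raise a readable error if the style list does not match the row count."""
    if len(styles) != n_rows:
        raise ValueError(f"{len(styles)} styles defined for {n_rows} rows")


# Group sizes, markers and colors as used in the cell above
GROUPS = [(3, "o", "b"), (10, "o", "r"), (23, "^", "g"), (3, "s", "y"), (3, "s", "c")]

markers, colors = expand_styles(GROUPS)
check_style_length(42, markers)
print(len(markers), len(colors))
```

In the notebook, `check_style_length(len(final_df), markers)` could be called before the plotting loop.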

[Back to Table of contents](#table)

***

#### <a name="plottas"></a> Plot the data in the TAS diagram

In [None]:
# Adding a new column 'K2O+NA2O' to the final_df DataFrame
final_df['K2O+NA2O'] = final_df['K2O'] + final_df['NA2O']

# Creating a list of markers
markers = ['o'] * 3 + ['o'] * 10 + ['^'] * 23 + ['s'] * 3 + ['s'] * 3
# Creating a list of colors
colors = ['b'] * 3 + ['r'] * 10 + ['g'] * 23 + ['y'] * 3 + ['c'] * 3

# Setting the figure size
plt.figure(figsize=(7, 6))

# Plotting scatterplots for each row in final_df
for i in range(len(final_df)):
    sns.scatterplot(x=final_df['SIO2'].iloc[i:i+1], y=final_df['K2O+NA2O'].iloc[i:i+1],
                    marker=markers[i], color=colors[i], s=45)

# Adding a line and text annotations to the plot
x_values = [48, 50.3]
y_values = [3.15, 3.8]
plt.plot(x_values, y_values, color='k', linewidth=0.5)
plt.text(49.2, 3.6, 'ALKALIC', ha='right', fontsize=14)
plt.text(51.5, 3.6, 'THOLEIITIC', ha='right', fontsize=14)

# Adding title and axis labels to the plot
plt.title("$SiO_2$ vs. $K_2O+Na_2O$")
plt.xlabel("$SiO_2$ [wt%]")
plt.ylabel("$K_2O+Na_2O$ [wt%]")

# Optionally, restricting the range of the axes
plt.xlim(48, 53)
plt.ylim(2.4, 3.8)

# Displaying the plot
plt.show()
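
The dividing line in the plot can also be used numerically to assign a sample to one of the two fields. The sketch below takes the two end points of the plotted segment, (48, 3.15) and (50.3, 3.8), and treats samples above the line as alkalic and samples below it as tholeiitic, as the text labels in the plot suggest. Applying the line outside the plotted segment is an assumption of this sketch, and the example compositions are invented.

```python
# End points of the dividing line as drawn in the plot (SiO2, K2O+Na2O) in wt%
LINE_START = (48.0, 3.15)
LINE_END = (50.3, 3.8)


def boundary_alkalis(sio2):
    """Total alkalis on the dividing line at a given SiO2 content."""
    (x0, y0), (x1, y1) = LINE_START, LINE_END
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (sio2 - x0)


def classify(sio2, na2o, k2o):
    """Return 'alkalic' above the line and 'tholeiitic' on or below it."""
    total_alkalis = na2o + k2o
    return "alkalic" if total_alkalis > boundary_alkalis(sio2) else "tholeiitic"


# Invented compositions for illustration
print(classify(49.0, 2.5, 1.0))
print(classify(51.0, 2.3, 0.4))
```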

[Back to Table of contents](#table)

***

### <a name="refs"></a>References

[1] Garcia, M. O., Muenow, D. W., Aggrey, K. E., and O'Neil, J. R. (1989), Major element, volatile, and stable isotope geochemistry <br/> of Hawaiian submarine tholeiitic glasses, J. Geophys. Res., 94(B8), 10525–10538, doi:10.1029/JB094iB08p10525.

[2] requests - Python HTTP library for humans. [Online]. Available: https://requests.readthedocs.io

[3] json - This is part of Python's standard library. Python Software Foundation. Python Language Reference, version 3.x. <br/> Available at http://www.python.org

[4] pandas - Wes McKinney. Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science <br/> Conference, 51-56 (2010) [Online]. Available: https://pandas.pydata.org

[5] geopandas - GeoPandas developers. GeoPandas: Python tools for geographic data [Online]. Available: https://geopandas.org

[6] seaborn - Michael Waskom, Olga Botvinnik, Drew O’Kane, Paul Hobson, Joel Ostblom, Saulius Lukauskas, ... & Tom Augspurger. <br/> (2020, October 4). mwaskom/seaborn: v0.11.0 (Version v0.11.0). Zenodo. http://doi.org/10.5281/zenodo.4019147

[7] matplotlib - John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, <br/> 90-95 (2007), DOI:10.1109/MCSE.2007.55

[8] contextily - Darribas, D., Arribas-Bel, D., Nshan, B., & van den Bosch, M. (2020). contextily: context geo-tiles in Python. <br/> Journal of Open Source Software, 5(55), 2302. https://doi.org/10.21105/joss.02302

[9] adjustText - Ilya Flyamer. (2016). adjustText: A small library for automatically adjusting text position in matplotlib plots to minimize <br/> overlaps. Zenodo. http://doi.org/10.5281/zenodo.4922517

[10] ipywidgets - Project Jupyter. (2017). ipywidgets: Interactive HTML widgets for Jupyter notebooks and the IPython kernel. <br/> Zenodo. https://doi.org/10.5281/zenodo.836874

[Back to Table of contents](#table)

***

### <a name="qrcode"></a>QR code for accessing this Jupyter notebook on Binder

<img src="https://raw.githubusercontent.com/tmwProjects/Georoc_jupyter/main/BINDER_JUPYTER_QR.png" style="height:610px" align="left"/> <br><br>

##### <a name="hint3"></a>Hint:
- In some cases, Binder may not start successfully on the first attempt. If this happens, simply open the link again.

***

### <a name="follow"></a>Follow us

[Visit our Website](https://georoc.mpch-mainz.gwdg.de/georoc/)

![Twitter Follow](https://img.shields.io/twitter/follow/DIGISgeo?style=social)

[Back to Table of contents](#table)

***