<img src="https://pbs.twimg.com/media/E0m6c9FX0AIX0p1.png" style="height:150px" align="left"/> <br><br>

<br/>
<br/>
<br/>
<br/>
<br/>

### Common data cleaning methods

***

The GEOROC Database (Geochemistry of Rocks of the Oceans and Continents) is a comprehensive collection of published analyses of igneous and metamorphic rocks and minerals. It contains major and trace element concentrations, radiogenic and nonradiogenic isotope ratios, as well as analytical ages for whole rocks, glasses, minerals and inclusions. Metadata include geospatial and other sample information, analytical details and references.

The GEOROC Database was established at the Max Planck Institute for Chemistry in Mainz (Germany). In 2021, the database was moved to Göttingen University, where it continues to be curated as part of the DIGIS project of the Department of Geochemistry and Isotope Geology at the Geoscience Centre (GZG) and the University and State Library (SUB). Development for GEOROC 2.0 includes a new data model for greater interoperability, options to contribute data, and improved access to the database.

As part of the DIGIS project, a new API has been created for the GEOROC database, giving access to its contents with only basic programming skills. Users can query the database and retrieve data through this API, which makes GEOROC more accessible to researchers and other interested parties. This notebook demonstrates the basic capabilities of GEOROC data access via the new DIGIS API.
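As a minimal sketch of what an authenticated query against such an API could look like, the following example prepares a GET request without sending it. The base URL is a placeholder, and the header name `DIGIS-API-ACCESSKEY`, the endpoint `queries/samples` and the helper `build_request` are assumptions made for illustration; the DIGIS API documentation gives the actual values.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder base URL, to be replaced with the documented API address.
BASE_URL = "https://api.example.org/api/v1"
# Assumption: name of the HTTP header that carries the API key.
API_KEY_HEADER = "DIGIS-API-ACCESSKEY"


def build_request(endpoint, api_key, params=None):
    """Prepare (but do not send) a GET request for one API endpoint."""
    url = f"{BASE_URL}/{endpoint.strip('/')}"
    query = urlencode(params or {})
    if query:
        url = f"{url}?{query}"
    return Request(url, headers={API_KEY_HEADER: api_key}, method="GET")


# Hypothetical endpoint and key, used only to show the resulting URL
request = build_request("queries/samples", "my-secret-key", {"limit": 2, "offset": 0})
print(request.full_url)
```

The prepared request could then be sent with `urllib.request.urlopen(request)`, or the same URL and header could be passed to `requests.get`, which this notebook imports.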

For feedback, questions and further information contact [Digis-Info](mailto:digis-info@uni-goettingen.de) directly.

This notebook demonstrates the basic capabilities of common data cleaning methods in geoscience.

***

April 2023, Dr. Marthe Klöcking, Timm M. Wirtz

<img src="https://mirrors.creativecommons.org/presskit/buttons/88x31/png/by-nc-sa.png" style="height:50px" align="left"/> <br><br>

https://creativecommons.org/licenses/by-nc-sa/4.0/

***

### <a name="table"></a>Table of contents

* [Installing additional software and modules](#install)
    * [Hints](#hints)
    * [Change the Kernel](#Kernel)
* [Making API requests to access the database](#request)
    * [Hint](#hint)
    * [Store your API-Key here](#yourkey)
        * [Hint](#hint2)
    * [API call with messages](#apic)
    * [Multiple filter function](#ffunc)
    * [Extract data from JSON function](#dataf)
    * [Choose your parameters for the filters](#params)
        * [Hint](#hint3)
    * [Extract SamplingFeatureIDs](#esfid)
    * [Extract measurement data](#emdata)
* [Plotting the extracted data on a map](#plot)
* [References](#refs)
* [QR code for accessing this Jupyter notebook](#qrcode)

***

Import modules

In [35]:
# Standard library
import io
import json
from contextlib import redirect_stdout

# Third-party packages
import pandas as pd
import requests
from IPython.display import display
from ipywidgets import HBox, IntProgress, Label, Layout, widgets

# Helper modules shipped with this notebook
from functions.widgets import *
from functions.api import *
from functions.config import *
from functions.utils import *

***

In [46]:
display(api_key_widget)
display(confirm_button)

Text(value='VGltbVdpcnR6OlZHbHRiVmRwY25SNlgwUkpSMGxUWDBGUVNWOHhOamM0TWpjME16WTA=', description='API Key:', lay…

Button(description='Confirm API Key', style=ButtonStyle())

In [47]:
display(grid)

GridBox(children=(Checkbox(value=False, description='Sample_Num'), Checkbox(value=False, description='unique_i…

In [48]:
# Check the connection to the API server
check_api_connection()

API query successful for endpoint:ping 

Connection to API server successful!
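`check_api_connection()` is presumably provided by the notebook's `functions` package, and its implementation is not shown here. The following is a hedged sketch of how such a check might work: it asks a `ping` endpoint for its HTTP status code and reports the outcome. The function `check_connection` and the injected `fetch_status` callable are hypothetical names; injecting the fetcher keeps the example runnable without network access.

```python
def check_connection(fetch_status, endpoint="ping"):
    """Return True if the endpoint answers with HTTP 200, printing a short message."""
    try:
        status = fetch_status(endpoint)
    except OSError as error:
        # Network problems such as timeouts or refused connections
        print(f"Connection to API server failed: {error}")
        return False
    if status == 200:
        print("Connection to API server successful!")
        return True
    print(f"API server answered with status code {status}")
    return False


def unreachable(endpoint):
    """Stand-in fetcher that simulates a network timeout."""
    raise TimeoutError("request timed out")


# Offline demonstration with stand-in fetchers instead of real HTTP calls
ok = check_connection(lambda endpoint: 200)
unauthorized = check_connection(lambda endpoint: 401)
offline = check_connection(unreachable)
```

In a real notebook, `fetch_status` would wrap an HTTP call and return the status code of the response.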



In [49]:
display(grid2)

GridBox(children=(Text(value='2', description='Limit:', placeholder='Enter a value for Limit'), Text(value='',…

In [2]:
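# Requires the import cell at the top to have been run in the current kernel;
# otherwise this call raises a NameError.
filtered_samples_combined = get_filtered_samples(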
    limit=limit_widget.value, 
    offset=offset_widget.value, 
    location1=location1_widget.value,
    location2=location2_widget.value,
    location3=location3_widget.value,
    setting=setting_widget.value,
    latitude=latitude_widget.value,
    longitude=longitude_widget.value,
    rocktype=rocktype_widget.value,
    rockclass=rockclass_widget.value,
    mineral=mineral_widget.value,
    material=material_widget.value,
    inclusiontype=inclusiontype_widget.value,
    sampletech=sampletech_widget.value,
    element=element_widget.value,
    elementtype=elementtype_widget.value,
    value=value_widget.value,
    title=title_widget.value,
    publicationyear=publicationyear_widget.value,
    doi=doi_widget.value,
    firstname=firstname_widget.value,
    lastname=lastname_widget.value,
    agemin=agemin_widget.value,
    agemax=agemax_widget.value,
    geoage=geoage_widget.value,
    geoageprefix=geoageprefix_widget.value,
    lab=lab_widget.value
)

print(filtered_samples_combined, "\n")

NameError: name 'get_filtered_samples' is not defined
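The call above passes every widget value to `get_filtered_samples`, including text fields that were left blank. How the helper treats blank values is not shown in this notebook. One plausible approach, sketched here under that assumption, is to drop empty entries before building the query so that only filters the user actually set are sent. `build_filter_params` is a hypothetical helper, not part of the `functions` package.

```python
def build_filter_params(**filters):
    """Drop empty filter values and strip whitespace from the remaining ones."""
    params = {}
    for name, value in filters.items():
        if value is None:
            continue
        text = str(value).strip()
        if text:
            params[name] = text
    return params


# Example values standing in for widget contents
params = build_filter_params(limit="2", offset="", setting=" OCEAN ISLAND ", rocktype=None)
print(params)
```

Only `limit` and `setting` survive in this example, because `offset` is blank and `rocktype` is unset.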

In [None]:
# Extract the SampleIDs from the API response
sampling_feature_ids = []  # stays empty if the response contains no data
if "Data" in filtered_samples_combined and filtered_samples_combined["Data"]:
    sampling_feature_ids = [sample["SampleID"] for sample in filtered_samples_combined["Data"]]
    print("\n", "The extracted SampleIDs are:", sampling_feature_ids, "\n")
else:
    print("No data found or unexpected data structure", "\n")
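To make the expected response shape concrete, the sketch below applies the same extraction to a small mock response. The dictionary layout (`"Data"` holding a list of records with a `"SampleID"` key) follows the cell above; the ID values and the helper name `extract_sample_ids` are invented for illustration.

```python
def extract_sample_ids(response):
    """Return the SampleID of every record in response["Data"], or [] if there is none."""
    records = response.get("Data") or []
    return [record["SampleID"] for record in records if "SampleID" in record]


# Mock response with made-up IDs; the last record lacks a SampleID and is skipped
mock_response = {"Data": [{"SampleID": 101}, {"SampleID": 102}, {"Note": "no id"}]}
sample_ids = extract_sample_ids(mock_response)
print(sample_ids)
```

Returning an empty list for a missing or empty `"Data"` entry lets the later loop over the IDs run zero times instead of failing.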

In [None]:
# Create an empty DataFrame to store all measurement data
measurement_data = pd.DataFrame()

In [None]:
# Create the progress bar
progress = IntProgress(value=0, min=0, max=len(sampling_feature_ids), layout=Layout(width="auto"))
progress_label = Label('Starting...')
progress_box = HBox([progress, progress_label])
display(progress_box)

# Iterate over the list of SamplingFeatureIDs
for sampling_feature_id in sampling_feature_ids:
    print(f"Fetching measurement data for SamplingFeatureID: {sampling_feature_id}")

    # Get the measurement data for the current SamplingFeatureID
    df = get_measurement_data(api_key_widget.value, sampling_feature_id)

    # Check if the dataframe is not empty and not None
    if df is not None and not df.empty:
        # Add the dataframe to measurement_data
        # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
        measurement_data = pd.concat([measurement_data, df], ignore_index=True)

        # Print the dataframe
        print(f"Data for SamplingFeatureID {sampling_feature_id}:\n", df)
    elif df is None:
        print(f"Error occurred while fetching data for SamplingFeatureID {sampling_feature_id}")
    else:
        print(f"No measurement data found for SamplingFeatureID {sampling_feature_id}")

    # Update the progress bar and the label
    progress.value += 1
    progress_label.value = f"Processing: {sampling_feature_id} ({progress.value}/{progress.max})"
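The loop above grows `measurement_data` one frame at a time, which copies the accumulated data on every iteration. A common alternative, sketched here, is to collect the per-sample frames in a list and concatenate them once at the end. The column names and values in the example frames are assumptions for illustration and may not match what `get_measurement_data` returns.

```python
import pandas as pd


def combine_frames(frames):
    """Concatenate all non-empty DataFrames; return an empty DataFrame if none remain."""
    usable = [frame for frame in frames if frame is not None and not frame.empty]
    if not usable:
        return pd.DataFrame()
    return pd.concat(usable, ignore_index=True)


# Made-up frames standing in for per-sample API results, including a failed
# request (None) and a sample without measurements (empty frame)
frames = [
    pd.DataFrame({"SampleID": [1, 1], "Item": ["SIO2", "MGO"], "Value": [49.8, 7.5]}),
    None,
    pd.DataFrame(),
    pd.DataFrame({"SampleID": [2], "Item": ["SIO2"], "Value": [51.2]}),
]
combined = combine_frames(frames)
print(combined)
```

For a handful of samples the difference is negligible; it matters when several thousand samples are fetched.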

In [None]:
# Save the combined measurement_data DataFrame
measurement_data.to_csv('measurement_data.csv', index=False)
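Once the measurements are combined, a few basic cleaning steps are often applied before analysis: converting values to numbers, dropping rows without a usable value, and removing duplicate entries. The sketch below shows these steps with pandas. The column names `SampleID`, `Item` and `Value`, the function `clean_measurements` and the example data are assumptions, not the actual structure of the API output. Dropping negative values is optional because it suits concentrations but not isotope data in delta or epsilon notation, which can legitimately be negative.

```python
import pandas as pd


def clean_measurements(df, value_column="Value", key_columns=("SampleID", "Item"),
                       drop_negative=True):
    """Apply basic cleaning steps to a table of measurements."""
    cleaned = df.copy()
    # Convert values to numbers; entries that cannot be parsed become NaN
    cleaned[value_column] = pd.to_numeric(cleaned[value_column], errors="coerce")
    # Drop rows without a usable value
    cleaned = cleaned.dropna(subset=[value_column])
    if drop_negative:
        # Concentrations cannot be negative; disable this for delta/epsilon values
        cleaned = cleaned[cleaned[value_column] >= 0]
    # Keep only the first row for each sample/item pair
    cleaned = cleaned.drop_duplicates(subset=list(key_columns), keep="first")
    return cleaned.reset_index(drop=True)


# Made-up example containing a duplicate, a non-numeric entry and a negative value
raw = pd.DataFrame({
    "SampleID": [1, 1, 1, 2, 2],
    "Item": ["SIO2", "SIO2", "MGO", "SIO2", "MGO"],
    "Value": ["49.8", "49.8", "n.d.", "51.2", "-1"],
})
cleaned = clean_measurements(raw)
print(cleaned)
```

Which steps are appropriate depends on the data: duplicates may represent genuine replicate analyses, in which case averaging them could be preferable to keeping the first row.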

### <a name="refs"></a>References

[1] GMT 6: Wessel, P., Luis, J. F., Uieda, L., Scharroo, R., Wobbe, F., Smith, W. H. F., & Tian, D. (2019). The Generic Mapping Tools version 6. Geochemistry, Geophysics, Geosystems, 20, 5556–5564. https://doi.org/10.1029/2019GC008515

[2] Uieda, Leonardo, Tian, Dongdong, Leong, Wei Ji, Schlitzer, William, Grund, Michael, Jones, Max, Fröhlich, Yvonne, Toney, Liam, Yao, Jiayuan, Magen, Yohai, Jing-Hui, Tong, Materna, Kathryn, Belem, Andre, Newton, Tyler, Anant, Abhishek, Ziebarth, Malte, Quinn, Jamie, & Wessel, Paul. (2023). PyGMT: A Python interface for the Generic Mapping Tools (v0.9.0). Zenodo. https://doi.org/10.5281/zenodo.7772533

[3] Thomas A Caswell, Antony Lee, Elliott Sales de Andrade, Michael Droettboom, Tim Hoffmann, Jody Klymak, John Hunter, Eric Firing, David Stansby, Nelle Varoquaux, Jens Hedegaard Nielsen, Benjamin Root, Ryan May, Oscar Gustafsson, Phil Elson, Jouni K. Seppänen, Jae-Joon Lee, Darren Dale, hannah, … Charlie Moad. (2023). matplotlib/matplotlib: REL: v3.7.1 (v3.7.1). Zenodo. https://doi.org/10.5281/zenodo.7697899

[4] Anenburg, M., & Williams, M. J. (2021). Quantifying the Tetrad Effect, Shape Components, and Ce–Eu–Gd Anomalies in Rare Earth Element Patterns. Mathematical Geosciences. https://doi.org/10.1007/s11004-021-09959-5

[5] The pandas development team. (2023). pandas-dev/pandas: Pandas (v2.0.0). Zenodo. https://doi.org/10.5281/zenodo.7794821

[6] Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). https://doi.org/10.1038/s41586-020-2649-2

***

### <a name="qrcode"></a>QR code for accessing this Jupyter notebook on Binder

<img src="https://raw.githubusercontent.com/tmwProjects/Georoc_jupyter/main/BINDER_JUPYTER_QR.png" style="height:610px" align="left"/> <br><br>

##### <a name="hint"></a>Hint:
- Binder may fail to start on the first attempt. If it does, simply open the link again.

[Back to Table of contents](#table)

***

### <a name="follow"></a>Follow us

[Visit our Website](https://georoc.mpch-mainz.gwdg.de/georoc/)

![Twitter Follow](https://img.shields.io/twitter/follow/DIGISgeo?style=social)

[Back to Table of contents](#table)

***