# Tutorial 1: Paleoclimate Proxies

**Week 1, Day 4, Paleoclimate**

**Content creators:** Sloane Garelick

**Content reviewers:** Brodie Pearson

**Content editors:** Yosmely Bermúdez, Agustina Pesce, Zahra Khodakaramimaghsoud

**Production editors:** TBD

**Our 2023 Sponsors:** TBD

### **Code and Data Sources**

Code for this tutorial is based on existing notebooks from LinkedEarth that [convert LiPD files to a Pandas dataframe](https://github.com/LinkedEarth/notebooks/blob/master/PAGES2k/01.lipd2df.ipynb) and [create a map of the PAGES2k network](https://github.com/LinkedEarth/notebooks/blob/master/PAGES2k/02.plot_map.ipynb).

The following data is used in this tutorial:


*   PAGES2k Consortium. A global multiproxy database for temperature reconstructions of the Common Era. Sci Data 4, 170088 (2017). https://doi.org/10.1038/sdata.2017.88







# **Tutorial 1 Objectives**

In this tutorial, you'll learn about different types of paleoclimate proxies (physical characteristics of the environment that can stand in for direct measurements) and how they can be used to reconstruct past variations in Earth's climate on various spatial and temporal scales. In the process of exploring examples of proxy types and datasets, you'll also learn some fundamental skills for working with [Pyleoclim](https://pyleoclim-util.readthedocs.io/en/master/), a Python package designed for the analysis of paleoclimate data.


By the end of this tutorial you will be able to:

*   Understand some types of paleoclimate proxies and archives that exist
*   Create a global map of locations of proxy paleoclimate records in a specific data network 



# Setup

In [None]:
# # Install libraries
# !pip install pandas
# !pip install pooch
# !pip install matplotlib

In [None]:
# !pip install --no-binary shapely shapely --force # Build shapely from source so that cartopy does not crash
# !pip install cartopy

In [None]:
# !pip install LiPD

In [None]:
# Import libraries
import os
import pandas as pd
import numpy as np
import pooch  # to download the PAGES2k data
import matplotlib.pyplot as plt

import lipd
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import cartopy.io.shapereader as shapereader

## Common helper functions

### Convert the PAGES2k LiPD files into a pandas.DataFrame


In [None]:
# @title Convert the PAGES2k LiPD files into a pandas.DataFrame

# Function to convert the PAGES2k LiPD files into a pandas.DataFrame
def lipd2df(lipd_dirpath, pkl_filepath=None, col_str=[
            'paleoData_pages2kID',
            'dataSetName', 'archiveType',
            'geo_meanElev', 'geo_meanLat', 'geo_meanLon',
            'year', 'yearUnits',
            'paleoData_variableName',
            'paleoData_units',
            'paleoData_values',
            'paleoData_proxy']):
    """
    Convert a bunch of PAGES2k LiPD files to a `pandas.DataFrame` to boost data loading.

    If `pkl_filepath` isn't `None`, save the DataFrame as a pickle file.

    Parameters:
    ----------
        lipd_dirpath: str
          Path of the PAGES2k LiPD files
        pkl_filepath: str or None
          Path of the converted pickle file. Default: `None`
        col_str: list of str
          Name of the variables to extract from the LiPD files

    Returns:
    -------
        df: `pandas.DataFrame`
          Converted Pandas DataFrame
    """

    # Save the current working directory for later use, as the LiPD utility will change it in the background
    work_dir = os.getcwd()
    # The LiPD utility requires the absolute path
    lipd_dirpath = os.path.abspath(lipd_dirpath)
    # Load LiPD files
    lipds = lipd.readLipd(lipd_dirpath)
    # Extract timeseries from the list of LiPD objects
    ts_list = lipd.extractTs(lipds)
    # Recover the working directory
    os.chdir(work_dir)
    # Create an empty pandas.DataFrame with the number of rows to be the number of the timeseries (PAGES2k records),
    # and the columns to be the variables we'd like to extract
    df_tmp = pd.DataFrame(index=range(len(ts_list)), columns=col_str)
    # Loop over the timeseries and pick those for global temperature analysis
    i = 0
    for ts in ts_list:
        if 'paleoData_useInGlobalTemperatureAnalysis' in ts.keys() and \
            ts['paleoData_useInGlobalTemperatureAnalysis'] == 'TRUE':
            for name in col_str:
                try:
                    df_tmp.loc[i, name] = ts[name]
                except KeyError:
                    df_tmp.loc[i, name] = np.nan
            i += 1
    # Drop the rows with all NaNs (those not for global temperature analysis)
    df = df_tmp.dropna(how='all')
    # Save the dataframe to a pickle file for later use
    if pkl_filepath:
        save_path = os.path.abspath(pkl_filepath)
        print(f'Saving pickle file at: {save_path}')
        df.to_pickle(save_path)
    return df
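
To make the filtering step in `lipd2df` easier to follow, here is a minimal sketch that applies the same idea to a few hand-written dictionaries instead of real LiPD files. The site names and the variable names `toy_ts_list`, `columns`, and `toy_df` are hypothetical, and the list comprehension is an alternative way of writing the loop, not the function's actual implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the dictionaries returned by lipd.extractTs();
# real entries carry many more keys.
toy_ts_list = [
    {'dataSetName': 'SiteA', 'archiveType': 'tree',
     'paleoData_useInGlobalTemperatureAnalysis': 'TRUE'},
    {'dataSetName': 'SiteB', 'archiveType': 'coral',
     'paleoData_useInGlobalTemperatureAnalysis': 'FALSE'},
    {'dataSetName': 'SiteC',  # no archiveType key: filled with NaN below
     'paleoData_useInGlobalTemperatureAnalysis': 'TRUE'},
]
columns = ['dataSetName', 'archiveType']

# Keep only the flagged series; dict.get() supplies NaN for a missing key
rows = [
    {name: ts.get(name, np.nan) for name in columns}
    for ts in toy_ts_list
    if ts.get('paleoData_useInGlobalTemperatureAnalysis') == 'TRUE'
]
toy_df = pd.DataFrame(rows, columns=columns)
print(toy_df)
```

Only the two flagged entries survive, and the missing `archiveType` for the third site becomes `NaN` rather than raising an error.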

## Convert PAGES2k LiPD files to a Pandas dataframe

As we've now seen from the introductory video, there are various types of paleoclimate archives (e.g., sediment cores, corals, speleothems, tree rings, etc.) and proxies (e.g., isotopes, pollen, organic biomarkers, etc.). Many paleoclimate reconstructions already exist, spanning a variety of timescales and locations around the globe. Given the temporal and spatial breadth of these records, it can be challenging to know what paleoclimate data already exist and where to find them. A useful solution is to compile all existing paleoclimate records for a single climate variable (e.g., temperature, greenhouse gas concentration, precipitation, etc.) over a specific time period (e.g., Holocene to present).

One example of this is the **PAGES2k network**, which is a community-sourced database of temperature-sensitive proxy records. The database consists of 692 records from 648 locations that come from a variety of archives (e.g., trees, ice, sediment, corals, speleothems, etc.) and span the Common Era (1 CE to present, i.e., the past ~2,000 years). You can read more about the PAGES2k network in [PAGES 2k Consortium (2017)](https://www.nature.com/articles/sdata201788).

In this tutorial, we will explore the types of proxy records in the PAGES2k network and create a map of proxy record locations.

The PAGES2k network is stored in a specific file format known as Linked Paleo Data format (LiPD). LiPD files contain time series information in addition to supporting metadata (e.g., root metadata, location). Pyleoclim leverages this additional information using LiPD-specific functionality.

Data stored in the `.lpd` format can be loaded directly into Pyleoclim as a `Lipd` object. If `data_path` points to a single LiPD file, `pyleo.Lipd` will load that specific record, while if `data_path` points to a folder of LiPD files, `pyleo.Lipd` will load the full set of records.

The first thing we need to do is to download the data and transform it into a DataFrame.

In [None]:
# Set the name to save the PAGES2K data
fname = "pages2k_data"

if not os.path.exists(fname):

    # Download the data
    lipd_file_path = pooch.retrieve(
        url="https://ndownloader.figshare.com/files/8119937",
        known_hash=None,
        path="./",
        fname=fname,
        processor=pooch.Unzip()
    )
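
The download above passes `known_hash=None`, which means the file is not checked against a checksum. As a rough sketch of what such a check involves, the snippet below computes a SHA-256 digest with the standard library. The helper `sha256_of_file` and the temporary demo file are hypothetical and only illustrate the idea; they are not part of the tutorial's workflow.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstrate on a small temporary file rather than the real archive
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.bin')
    with open(demo_path, 'wb') as f:
        f.write(b'PAGES2k')
    demo_hash = sha256_of_file(demo_path)

print(demo_hash)
```

A digest computed this way for the downloaded archive could be supplied to `pooch.retrieve` as `known_hash` (for example in the form `"sha256:<digest>"`), so that later downloads are verified against it.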

In [None]:
# Convert all the LiPD files into a DataFrame
fname = "pages2k_data"

pages2k_data = lipd2df(lipd_dirpath=os.path.join(".", f"{fname}.unzip", "LiPD_Files"), pkl_filepath=None)

The PAGES2k data is now stored as a dataframe and we can view the data.

In [None]:
# Print the PAGES2K data
pages2k_data.head()
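
Reading hundreds of LiPD files can take a while, which is why `lipd2df` offers the optional `pkl_filepath` argument. The sketch below shows the underlying save-and-reload round trip on a small made-up DataFrame; the file name `toy_pages2k.pkl` and the values are hypothetical.

```python
import os
import tempfile

import pandas as pd

# A toy table with one list-valued column, similar in spirit to 'paleoData_values'
toy_df = pd.DataFrame({
    'dataSetName': ['SiteA', 'SiteB'],
    'paleoData_values': [[0.1, 0.2, 0.3], [1.5, 1.4]],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, 'toy_pages2k.pkl')
    toy_df.to_pickle(pkl_path)           # what lipd2df does when pkl_filepath is set
    reloaded = pd.read_pickle(pkl_path)  # fast reload in a later session

print(reloaded)
```

Pickle keeps Python objects such as lists intact, which is convenient here because each row stores a whole time series in a single cell.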

## Plotting a map of proxy reconstruction locations

Now that we have converted the data into a Pandas dataframe, we can create a map. We are going to plot the PAGES2k network on a map to understand the spatial distribution of the temperature records and the types of proxies that were measured.

Before generating the plot, we have to define the colors and the marker types that we want to use. Each entry in these lists will be matched, in order, to one of the `archiveType` names that appear in the dataframe.

In [None]:
# Set a list of markers and colors for the different archive types
markers = ['p', 'p', 'o', 'v', 'd', '*', 's', 's', '8', 'D', '^']
colors = [
    np.array([ 1., 0.83984375, 0.]),
    np.array([ 0.73828125, 0.71484375, 0.41796875]),
    np.array([ 1., 0.546875, 0.]),
    np.array([ 0.41015625, 0.41015625, 0.41015625]),
    np.array([ 0.52734375, 0.8046875 , 0.97916667]),
    np.array([ 0., 0.74609375, 1.]),
    np.array([ 0.25390625, 0.41015625, 0.87890625]),
    np.array([ 0.54296875, 0.26953125, 0.07421875]),
    np.array([ 1, 0, 0]),
    np.array([ 1., 0.078125  , 0.57421875]),
    np.array([ 0.1953125, 0.80078125, 0.1953125])
]
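
The two lists above are matched to archive types purely by position. As an optional alternative, one could pair each archive type with its style explicitly in a dictionary. The sketch below uses made-up archive-type names and shortened lists; `toy_archive_types`, `toy_markers`, `toy_colors`, and `style_by_type` are hypothetical names, not part of the tutorial's code.

```python
# Hypothetical archive-type names; the real ones would come from
# pages2k_data['archiveType'].unique()
toy_archive_types = ['tree', 'coral', 'glacier ice']
toy_markers = ['p', 'o', 'd']
toy_colors = [(1.0, 0.84, 0.0), (1.0, 0.55, 0.0), (0.53, 0.80, 0.98)]

# Sort the names so the assignment does not depend on row order in the data
style_by_type = {
    name: {'marker': marker, 'color': color}
    for name, marker, color in zip(sorted(toy_archive_types), toy_markers, toy_colors)
}

for name, style in style_by_type.items():
    print(name, style)
```

With such a mapping, a given archive type keeps the same marker and color even if the order of rows in the dataframe changes.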

We are now going to create a plot that will allow us to see the PAGES2k network on a map.

In [None]:
# Create the plot

fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.Robinson())

# Add plot title
plt.title(f'PAGES2k Network (n={len(pages2k_data)})', fontsize=20, fontweight='bold')

# Set the base map
# ----------------
ax.set_global()
# Add coast lines
ax.coastlines()
# Add land features using gray color
ax.add_feature(cfeature.LAND, facecolor='gray', alpha=0.3)
ax.gridlines(edgecolor='gray', linestyle=':')


# Plot the different archive types
# -------------------------------
# Extract the name of the different archive types
archive_types = pages2k_data.archiveType.unique()
# Plot each archive type using a for loop
for i, type_i in enumerate(archive_types):
    df = pages2k_data[pages2k_data['archiveType']==type_i]
    # Count the number of appearances of the same archive_type
    count = df['archiveType'].count()
    # Generate the plot
    ax.scatter(
        df['geo_meanLon'],
        df['geo_meanLat'],
        marker=markers[i],
        color=colors[i],  # pass a single RGB triplet via `color`, since `c` can misread it as per-point values
        edgecolor='k',
        s=50,
        transform=ccrs.Geodetic(),
        label=f'{type_i} (n = {count})',
    )
# Add legend to the plot
ax.legend(
    scatterpoints=1,
    bbox_to_anchor=(0, -0.4),
    loc='lower left',
    ncol=3,
    fontsize=15,
)

plt.show()

Now you can see the global distribution and temperature proxy type of the 692 records in the PAGES2k network!

What do you notice about the map?

*   Which temperature proxy is the most and least abundant in this database?
*   In what regions do you observe the most and the fewest temperature records?
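
If you would like to check your visual impressions with numbers, the following sketch counts records per archive type and per 30-degree latitude band. It runs on a small made-up table; the values in `toy_df` are hypothetical, and the same calls could be applied to the `archiveType` and `geo_meanLat` columns of `pages2k_data`.

```python
import pandas as pd

# Made-up records standing in for the PAGES2k dataframe
toy_df = pd.DataFrame({
    'archiveType': ['tree', 'tree', 'coral', 'tree', 'glacier ice'],
    'geo_meanLat': [45.0, 60.2, -17.5, 38.1, -75.0],
})

# Number of records per archive type, most abundant first
type_counts = toy_df['archiveType'].value_counts()
print(type_counts)

# 30-degree latitude bands from the South Pole to the North Pole
band_edges = list(range(-90, 91, 30))
lat_bands = pd.cut(toy_df['geo_meanLat'], bins=band_edges, include_lowest=True)
band_counts = lat_bands.value_counts().sort_index()
print(band_counts)
```

Latitude bands are only one simple way to define a "region"; continents or ocean basins would need additional geographic information.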


We can see the spatial distribution of paleoclimate temperature records spanning the past 2,000 years, but the next step is to extract and analyze the temperature time series of these reconstructions.
