<img src='https://github.com/LinkedEarth/Logos/raw/master/PYLEOCLIM_logo_HORZ-01.png' width="800">

# 1. Introduction to Jupyter and Pyleoclim

## Preamble

For this hackathon, you will be using a Jupyter Lab environment. This notebook is designed to teach you how to use this environment, some basic Python terminology, and the data structures used by Pyleoclim. It assumes some familiarity with Python (scientific Python in particular). If a refresher is needed, we recommend these two notebooks from the [Introduction to Data Mining](https://www-users.cs.umn.edu/~kumar001/dmbook/index.php) book:
- [Introduction to Python](http://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial1/tutorial1.html)
- [Introduction to Numpy and Pandas](http://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial2/tutorial2.html)

## Working with datasets within a Jupyter Environment

### Downloading datasets from the internet

The `pip install` command is used to install software packages in Python. In this case, we are using it to install a package called `wget` which allows us to fetch datasets from the internet.

To run a cell in a notebook environment, either click on the play button in the bar above or use the keyboard shortcut `Shift+Enter`. If you are new to this environment, there is an excellent orientation available [here](https://dzone.com/articles/getting-started-with-jupyterlab).

In [None]:
!pip install wget
!pip install demjson --upgrade  # addresses this setuptools/demjson incompatibility: https://github.com/dmeranda/demjson/issues/40

Let's use `wget` to get the Southern Oscillation Index dataset.

In [None]:
!wget https://raw.githubusercontent.com/LinkedEarth/paleoHackathon/main/data/soi_data.csv

The text file is saved in the same folder as this Notebook.

Let's look at our data. Here, you will be using the [`Pandas`](https://pandas.pydata.org) Python package, which is widely used in data science. Python is an object-oriented language, which means that programs are organized around objects that bundle data with the operations (methods) that act on them. Once an object is defined, its methods can be applied to it.

Let's have a look at the code cell below:
- The first line imports the `pandas` package for use inside Python. 
- The second line loads the data into a `Pandas DataFrame` object. 

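The parameter `skiprows` tells pandas how many rows to skip at the top of the file before parsing; here it is set to 0, so no rows are skipped. The parameter `header` tells pandas which row contains the header information for the table. Note that indexing in Python starts at 0, so line 1 has index 0 and `header=1` points to the second line. The first line, which is used as a title, sits above the header and is therefore discarded.

- The last line of code displays the table.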

In [None]:
import pandas as pd
df = pd.read_csv('soi_data.csv',skiprows=0, header=1)
display(df)
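
To see how `skiprows` and `header` interact, the sketch below parses a small in-memory table. The contents of `raw` are made up for illustration and are not the actual SOI file; only the column names `Year` and `Value` match those used later in this notebook.

```python
import io
import pandas as pd

# Made-up stand-in for a file with a title line above the header line
raw = "Hypothetical title line\nYear,Value\n1951.0,1.5\n1951.5,0.9\n"

# header=1: the row at index 1 holds the column names; the row above it is dropped
df_a = pd.read_csv(io.StringIO(raw), skiprows=0, header=1)

# Alternative call: skip one row first, then treat the next row as the header
df_b = pd.read_csv(io.StringIO(raw), skiprows=1, header=0)

print(list(df_a.columns))
print(df_a.equals(df_b))
```

Both calls should yield the same table, so either form can be used when a file has a single title line above its header.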

### Uploading a dataset from your local machine

You can upload a dataset from your machine onto Jupyter Lab. A tutorial is available [here](https://www.youtube.com/watch?v=1bd2QHqQSH4).

**Exercise**
1. Create a csv file on your local machine from the [Cobb et al. (2001) Palmyra record](https://www.ncei.noaa.gov/pub/data/paleo/coral/east_pacific/cobb2001_noaa.txt) on the NOAA repository. *Note*: You will have to clean up the text for import with Pandas. We suggest creating a csv by copying/pasting the data into a new Excel file and saving as csv.
2. Upload the file to your Jupyter Lab instance using the upload arrow icon at the top of the left menu bar.
3. Open the dataset in pandas in a new object (`df2`).

In [None]:
# Write your code here

### Opening a dataset from a URL

`Pandas` can open a csv file directly from a URL, without the need to download it first. All you need to do is pass the link to the dataset as the first argument of `read_csv`.

In [None]:
df3 = pd.read_csv('https://raw.githubusercontent.com/LinkedEarth/Pyleoclim_util/master/example_data/soi_data.csv',skiprows=0, header=1)
display(df3)

## Working with Pyleoclim

Pyleoclim is a Python package dedicated to the analysis of paleoclimate data. The full documentation is available [here](https://pyleoclim-util.readthedocs.io/en/stable/index.html). We actively maintain two versions of the documentation:
* stable: refers to the released version of Pyleoclim from PyPI. As its name indicates, it is considered stable and has been tested.
* latest: refers to the in-development version. Although this version may have more up-to-date features, it is not tested and can result in unexpected behavior.

You can toggle stable/latest in the bottom left corner of the screen.

Let's import the package.

In [None]:
import pyleoclim as pyleo

### Working with Series objects

The object at the heart of the package is the [`Series` object](https://pyleoclim-util.readthedocs.io/en/stable/core/ui.html#series-pyleoclim-series), which describes the fundamentals of a time series. 

Let's create a `Series` object based on the SOI data loaded previously. To do so, one needs to invoke the `Series` class in Pyleoclim and define the properties of a `Series`, namely:
* `time`: Time values for the time series
* `value`: Paleo values for the time series
* `time_name` (optional): Name of the time vector, (e.g., 'Time', 'Age'). This is used to label the x-axis on plots
* `time_unit` (optional): The units of the time axis (e.g., 'years')
* `value_name` (optional): The name of the paleo variable (e.g., 'Temperature')
* `value_unit` (optional): The units of the paleo variable (e.g., 'deg C')
* `label` (optional): Name of the time series (e.g., 'Nino 3.4')
* `clean_ts` (optional): If True (default), remove NaNs and set an increasing time axis.

In [None]:
ts=pyleo.Series(time=df3['Year'],value=df3['Value'],time_name='Years CE',value_name='SOI')

You have now created an object called `ts` that is an instance of a Pyleoclim `Series`. You can work with this object by applying any of the methods available for `Series`. A complete list can be found [here](https://pyleoclim-util.readthedocs.io/en/stable/core/ui.html#series-pyleoclim-series); click on a specific function in the table to get more details.

You can also use the `Tab` key to autocomplete in Jupyter. For instance, try writing `ts.` and then pressing `Tab` to see which methods are available for a `Series` object.

If, at any point, you want to know the type of the object you're working with, you can use the function `type`:

In [None]:
type(ts)

To inspect the contents of any Pyleoclim object, you can use its `__dict__` attribute, a dictionary of the object's attributes.

In [None]:
ts.__dict__

To only return the keys, use:

In [None]:
ts.__dict__.keys()

To access a single value, index the dictionary by key (hint: you may want to check the type of the value before printing it):

In [None]:
print(type(ts.__dict__['time']))
print(type(ts.__dict__['time_name']))
print(ts.__dict__['time_name'])
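
The same inspection pattern works on any ordinary Python object. The sketch below uses a hypothetical toy class, `Record`, which is not part of Pyleoclim, to show that `__dict__` is a plain dictionary of instance attributes and that dot access reads the same values.

```python
class Record:
    """Hypothetical toy class, used only to illustrate __dict__."""
    def __init__(self, time, value, time_name=None):
        self.time = time
        self.value = value
        self.time_name = time_name

rec = Record(time=[1950, 1951], value=[0.3, -0.1], time_name='Years CE')

print(rec.__dict__.keys())                          # names of the instance attributes
print(vars(rec) is rec.__dict__)                    # vars(obj) returns the same dictionary
print(rec.time_name == rec.__dict__['time_name'])   # dot access reads the same value
```

In day-to-day use, dot access (for example `rec.time_name`) is usually more convenient than indexing `__dict__`.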

Let's plot our timeseries. 

In [None]:
ts.plot()

You can change the plot by setting different values for the optional arguments. A list of arguments for this function is available [here](https://pyleoclim-util.readthedocs.io/en/stable/core/Series/plot.html#pyleoclim.core.ui.Series.plot).

Let's use black for the plot.

In [None]:
ts.plot(color='k')

**Exercise**

Create a `Pyleoclim Series` from the Palmyra record previously loaded into a Pandas dataframe and plot the record in red. 

In [None]:
# Write your code here

Other functions return a different `Series` object after the transformation has been applied. For instance, let's apply a [detrending](https://pyleoclim-util.readthedocs.io/en/stable/core/Series/detrend.html#pyleoclim.core.ui.Series.detrend) scheme to our SOI series. 

In [None]:
ts_detrend =  ts.detrend()
ts_detrend.plot()

Since there is no long-term trend in the SOI data, the detrending didn't change the data considerably.
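
To make this point concrete, here is a minimal sketch of a simple linear detrend written with NumPy on synthetic data. It is an illustration under stated assumptions, not Pyleoclim's implementation, and `Series.detrend` may use a different method by default. The function name `linear_detrend` and the synthetic series are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100.0)

def linear_detrend(t, y):
    """Subtract the least-squares straight line from y."""
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)

noise = rng.normal(size=t.size)
trended = noise + 0.05 * t      # synthetic series with an artificial linear trend
flat = noise                    # synthetic series with no trend added

# Largest change made by detrending, in each case
change_trended = np.abs(linear_detrend(t, trended) - trended).max()
change_flat = np.abs(linear_detrend(t, flat) - flat).max()
print(change_trended, change_flat)
```

The series with the added trend is altered far more than the trend-free one, which is the behavior seen with the SOI data.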

**Exercise**

1. We used the default method for detrending. According to the documentation of this function, what is the default?
2. Create a new Series object, called `ts_detrend2` using the `savitzky-golay` method.  

In [None]:
# Write your code here

Some functions return other objects that can be manipulated using their own methods. For instance, spectral analysis on a timeseries will return a `PSD object`, with its own plot function.

In [None]:
psd = ts.spectral(method='lomb_scargle')
psd.plot()

**Exercise**

1. Run spectral analysis on the SOI series using the MTM method. Pyleoclim will return an error.
2. Based on this error, use a Series method to pre-process the data accordingly.

In [None]:
# Write your code here

In Python, you can also chain methods in one line. For instance, detrending and spectral analysis can be achieved by the command:

In [None]:
psd_detrend = ts.detrend().spectral(method='lomb_scargle').plot() 

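Note that Python evaluates the chain from left to right, so the `plot` method applies to the `PSD` object created through `spectral` rather than to the original `Series` object. For the same reason, `psd_detrend` stores whatever `plot` returns, not the `PSD` object itself.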
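
The mechanics of chaining can be sketched with two hypothetical toy classes, `Signal` and `Summary`, which are not part of Pyleoclim. Each method call acts on whatever the previous call returned, and the variable on the left of `=` receives the result of the last call in the chain.

```python
class Signal:
    """Hypothetical toy class; not part of Pyleoclim."""
    def __init__(self, values):
        self.values = list(values)

    def demean(self):
        # Returns a new Signal, so another Signal method can follow
        mean = sum(self.values) / len(self.values)
        return Signal(v - mean for v in self.values)

    def summary(self):
        # Returns a Summary, a different type with its own methods
        return Summary(min(self.values), max(self.values))

class Summary:
    """Hypothetical toy class holding the range of a Signal."""
    def __init__(self, low, high):
        self.low, self.high = low, high

    def describe(self):
        return f"range: {self.low} to {self.high}"

sig = Signal([1.0, 2.0, 3.0])

# Chained form: each call acts on the object returned by the previous call
text = sig.demean().summary().describe()

# Step-by-step equivalent, keeping every intermediate object
centered = sig.demean()
stats = centered.summary()
same_text = stats.describe()

print(text)
print(type(text).__name__)
```

The step-by-step form is preferable when an intermediate object is needed later, since the chained form keeps only the final result.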

### Working with MultipleSeries objects

In some instances (e.g., principal component analysis), one may wish to work with multiple series at the same time. Enter the `MultipleSeries` object, which is essentially a list of `Series` objects.

Let's load a different dataframe that contains two different ENSO indices:

In [None]:
df_nino = pd.read_csv('../data/wtc_test_data_nino.csv', header=0)
df_nino.head()

**Exercise** 

Create two `Pyleoclim Series` objects called `air` and `nino` respectively, corresponding to the last two columns in the DataFrame. Enter as much metadata information as you can gather from [this page](https://www.mathworks.com/help/wavelet/ug/compare-time-frequency-content-in-signals-with-wavelet-coherence.html;jsessionid=e96608d0259a5c414c2a348ee3a1).

In [None]:
## your code here ##

To create a `MultipleSeries` object, pass the two series you just created as a list:

In [None]:
ms = pyleo.MultipleSeries([air,nino])

Let's plot them!

In [None]:
ms.plot()

Because the two series have such different values, it doesn't make sense to plot them on the same axis. Instead, we may want to use a stackplot.

In [None]:
ms.stackplot()

**Exercise**

Create two `Pyleoclim Series` for core [MD98-2170](https://www.ncei.noaa.gov/pub/data/paleo/contributions_by_author/stott2004/stott2004.txt) from Stott et al. (2004) storing information about sea surface temperature (sst) and d18Ow and load them as a `Pyleoclim MultipleSeries` object. Create a stackplot of the records.


You will be working with the LiPD version of this record in Notebook 2.

In [None]:
##your code here##

In Notebook 2, you will learn to create these objects from LiPD files. Notebooks 3-8 will make use of these objects for various analyses.

A special case of `MultipleSeries` is `EnsembleSeries`, which also holds multiple series, but these usually represent the same quantity. Such an object can be created through age modeling, for instance. In Python, a special case of a class can be encoded as a child class. A child inherits all the methods of its parent and can add its own special methods. In this case, `EnsembleSeries` is a child of `MultipleSeries`. We will be dealing with `EnsembleSeries` in Notebook 2.
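
The parent/child relationship can be sketched with two hypothetical toy classes. `Collection` and `Ensemble` below are illustrative names only and do not reflect how Pyleoclim implements `MultipleSeries` or `EnsembleSeries`.

```python
class Collection:
    """Hypothetical parent class holding several items."""
    def __init__(self, items):
        self.items = list(items)

    def count(self):
        return len(self.items)

class Ensemble(Collection):
    """Hypothetical child class: inherits count() and adds its own method."""
    def mean_member(self):
        # Element-wise mean across members of equal length
        return [sum(vals) / len(vals) for vals in zip(*self.items)]

ens = Ensemble([[1.0, 2.0], [3.0, 4.0]])

print(ens.count())                              # inherited from Collection
print(ens.mean_member())                        # defined only on Ensemble
print(isinstance(ens, Collection))              # a child instance is also a parent instance
print(hasattr(Collection([]), 'mean_member'))   # the parent lacks the child's method
```

The child can be used anywhere the parent is expected, while the extra method is available only on the child.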

### Working with other objects in Pyleoclim

As mentioned previously, other objects may be created over the course of analysis. For most applications, you will not have to create these objects by instantiating them directly. In the next few notebooks, we will work with these various objects in the course of scientific workflows.

Why work with so many objects? One of the advantages of object-oriented programming is that it provides separation of duties. It is also extensible, as objects can be extended to include new attributes and behaviors. However, it means that many objects must often be created to modulate the behavior. Think about the `plot()` function for a `Series` object and a `PSD` object. Fundamentally, they use the same matplotlib library; however, they return different plots appropriate for the object to which they are applied.
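
As a minimal sketch of this idea, the two hypothetical classes below each define a method named `plot`. They are stand-ins that return a text description rather than drawing anything, and they are not Pyleoclim classes.

```python
class TimeSeriesLike:
    """Hypothetical stand-in for an object holding values over time."""
    def plot(self):
        return "value against time"

class SpectrumLike:
    """Hypothetical stand-in for an object holding spectral power."""
    def plot(self):
        return "spectral power against frequency (or period)"

# The same call produces output suited to each object's type
for obj in (TimeSeriesLike(), SpectrumLike()):
    print(type(obj).__name__, "->", obj.plot())
```

The calling code does not need to know which kind of object it holds; each object carries the behavior appropriate to its own data.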