# Tutorial for the SUMup Dataset Explorer (SUMMEDup) tool

Created by: Megan Thompson-Munson (metm9666@colorado.edu)

Last updated: 27 January 2022

## 1) Set-up and data downloading

### Google Colab Users: Uncomment and run the following cell

To comment and uncomment, select the text you wish to modify and use the shortcut ```CTRL + /```.

In [None]:
# %%capture
# # The line above hides unnecessary output; a cell magic must stay on the first line of the cell

# # Mount drive so files on Google Drive can be accessed by this script.
# # When this cell is run, it may output a link and pause running. 
# # Click the link, sign in, copy the code, and paste in the box that appears.
# from google.colab import drive
# drive.mount('/content/drive/')
 
# # Set working directory
# %cd '/content/drive/MyDrive/SUMMEDup/'

# # Install required libraries and use correct versions of imgaug and shapely
# !pip install simplekml
# !pip install cartopy
# !pip uninstall imgaug -y
# !pip install imgaug==0.2.5
# !pip uninstall shapely -y
# !pip install shapely --no-binary shapely

### All Users: Run the following cells to import SUMMEDup and download SUMup (if not already downloaded)

In [None]:
# Import the summedup library, which contains all other necessary libraries (e.g., numpy, matplotlib)
import summedup as su

In [None]:
# Only run the cell below if you do not have SUMup downloaded

In [None]:
%%capture
%%bash
wget https://arcticdata.io/metacat/d1/mn/v2/object/urn%3Auuid%3A2512397e-effe-4b9d-9c5d-49c4b6ffdac6
mv urn\:uuid\:2512397e-effe-4b9d-9c5d-49c4b6ffdac6 sumup_density_2020_v060121.nc
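
If you would rather not decide by hand whether to run the download cell, the check can be scripted. The following is a minimal sketch, not part of SUMMEDup: it reuses the URL and file name from the cell above, and the function name `download_if_missing` and its `fetch` argument are hypothetical helpers introduced here for illustration.

```python
import os
from urllib.request import urlretrieve

# URL and file name are copied from the download cell above.
SUMUP_URL = ("https://arcticdata.io/metacat/d1/mn/v2/object/"
             "urn%3Auuid%3A2512397e-effe-4b9d-9c5d-49c4b6ffdac6")
SUMUP_FILE = "sumup_density_2020_v060121.nc"


def download_if_missing(url=SUMUP_URL, path=SUMUP_FILE, fetch=urlretrieve):
    """Download `url` to `path` unless the file already exists.

    Returns True if a download was attempted and False if it was skipped.
    `fetch` is a hypothetical hook (defaulting to urllib's urlretrieve)
    so the function can be tried out without network access.
    """
    if os.path.exists(path):
        return False
    fetch(url, path)
    return True
```

Calling `download_if_missing()` in a cell would then fetch the file on the first run and do nothing on later runs.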

## 2) Use `ReadNetcdf` to create a dataframe from the NetCDF file

This function simply reads in the NetCDF and turns it into a dataframe of the raw data.

In [None]:
# Create dataframe called 'dfRaw'
dfRaw = su.ReadNetcdf('sumup_density_2020_v060121.nc')

# Show dataframe
dfRaw

## 3) Use `Reformat` to add a unique core index to the dataframe and standardize units

This function creates a dataframe with the same data but in a more useful format, and accomplishes the following tasks:
* Fixes dates where only the year or only the year and month are given
* Adds a unique "CoreID" to each core and sorts dataframe by ice sheet
    * Antarctica: CoreID 0-886
    * Greenland: CoreID 887-1689
* Calculates the midpoint and thickness for each measurement
* Standardizes units
* Fixes any erroneous data points
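
To make the midpoint and thickness step concrete, here is a minimal sketch of the arithmetic. It assumes each measurement is described by a top and a bottom depth in meters; the function and argument names are hypothetical, and this is not necessarily how `Reformat` does it internally.

```python
def midpoint_and_thickness(start_depth, stop_depth):
    """Return (midpoint, thickness) in meters for one sampled interval."""
    thickness = stop_depth - start_depth
    midpoint = start_depth + thickness / 2
    return midpoint, thickness


# A made-up sample spanning 0.5 m to 1.5 m below the surface
mid, thick = midpoint_and_thickness(0.5, 1.5)
print(mid, thick)  # 1.0 1.0
```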



In [None]:
# Create new dataframe from raw data
dfData = su.Reformat(dfRaw)

# Show new dataframe
dfData

## 4) Use `GetInfo` to create a dataframe of information about each core and sort by a value

This function reads in the full processed dataframe and outputs a dataframe of information about the measurements. It also allows you to sort the dataframe by several different values:
* `'CoreID'`
* `'Citation'`
* `'Method'`
* `'Timestamp'`
* `'Latitude'`
* `'Longitude'`
* `'Elevation (m)'`
* `'Core Depth (m)'`


In [None]:
# Create a dataframe of core information, sorted here by CoreID (uncomment the first line to sort by depth instead)
# dfPoints = su.GetInfo(df=dfData, sort='Core Depth (m)')
dfPoints = su.GetInfo(df=dfData, sort='CoreID')

# Show new dataframe
dfPoints

## 5) Use `FilterPoints` to search within the dataframe for cores that meet given conditions

This function reads in the dataframe of information about the measurements and filters it on the metadata, so you can find the entries that meet your desired conditions. Available filters include:
* `icesheet`: Default is `'both'`. Input either `'Antarctica'` or `'Greenland'`.
* `citation`: Default includes all citations. Input an integer to filter for that citation. All citations can be found in the dataset readme.
* `method`: Default includes all methods. Input an integer to filter for that method. All method codes can be found in the dataset readme.
* `startDate` and `endDate`: Default includes all dates. Input date in the format `'YYYY-MM-DD'`.
* `minLat` and `maxLat`: Default includes all latitudes. Input a value to filter latitudes.
* `minLon` and `maxLon`: Default includes all longitudes. Input a value to filter longitudes.
* `minElev` and `maxElev`: Default includes all elevations. Input a value to filter elevations.
* `minDepth` and `maxDepth`: Default includes all core depths. Input a value to filter depths.

All filters are optional.
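
The idea of optional filters can be sketched in plain Python: any filter left at `None` is skipped, and a record is kept only if it passes every filter that was set. This is an illustration, not the `FilterPoints` implementation. The sample records are made up, the `'Ice Sheet'` key is a hypothetical name, and treating the bounds as inclusive is an assumption.

```python
def filter_points(points, icesheet=None, min_elev=None, max_elev=None,
                  min_depth=None):
    """Keep the records that pass every filter that is not None."""
    kept = []
    for p in points:
        if icesheet is not None and p["Ice Sheet"] != icesheet:
            continue
        if min_elev is not None and p["Elevation (m)"] < min_elev:
            continue
        if max_elev is not None and p["Elevation (m)"] > max_elev:
            continue
        if min_depth is not None and p["Core Depth (m)"] < min_depth:
            continue
        kept.append(p)
    return kept


# Made-up records for illustration only
points = [
    {"CoreID": 100, "Ice Sheet": "Antarctica", "Elevation (m)": 2800, "Core Depth (m)": 15.0},
    {"CoreID": 900, "Ice Sheet": "Greenland", "Elevation (m)": 1200, "Core Depth (m)": 25.0},
    {"CoreID": 901, "Ice Sheet": "Greenland", "Elevation (m)": 3100, "Core Depth (m)": 5.0},
]

# Only the Greenland record between 1000 m and 1500 m elevation remains
selected = filter_points(points, icesheet="Greenland", min_elev=1000, max_elev=1500)
print([p["CoreID"] for p in selected])  # [900]
```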

In [None]:
# Try uncommenting each of these to filter by different values

# dfFiltered = su.FilterPoints(df=dfPoints,startDate='2000-01-01')
# dfFiltered = su.FilterPoints(df=dfPoints,icesheet='Greenland',minElev=1000,maxElev=1500)
# dfFiltered = su.FilterPoints(df=dfPoints,minDepth=20,method=4)
dfFiltered = su.FilterPoints(df=dfPoints,icesheet='Antarctica',endDate='1989-12-31',minDepth=10)

# Print new dataframe
dfFiltered

## 6) Use `PlotLocs` to plot locations of SUMup observations on both ice sheets

You can color the points by a value or a single color.

At the minimum, provide a dataframe of locations: `su.PlotLocs(df=dfPoints)`

Other optional arguments:
* `color_by`: Select a color (e.g., `'blue'`) or a value to color by (e.g., `dfPoints['Core Depth (m)']`)
* `color_map`: If you selected a value to color by, you can choose a color map for the shading (e.g., `'plasma'`)
* `vmin` and `vmax`: If you selected a value to color by, you can set the minimum and maximum values for the color bar
* `save`: Default is `'no'`. Choose `'yes'` to save the figure in the `figures` folder.

Colors: https://matplotlib.org/stable/gallery/color/named_colors.html

Color maps: https://matplotlib.org/stable/gallery/color/colormap_reference.html
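
To see what `vmin` and `vmax` do to the shading, it can help to think of a linear scaling of each value onto the range 0 to 1, which the color map then turns into a color. The sketch below shows that general idea only. The `normalize` function is a hypothetical helper, and the plotting library's own handling of out-of-range values may differ in detail.

```python
def normalize(value, vmin, vmax):
    """Map value linearly onto [0, 1], clamping anything outside [vmin, vmax]."""
    fraction = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, fraction))


# With vmin=0 and vmax=100, a 250 m core gets the same shade as a 100 m core
for depth in [5, 50, 100, 250]:
    print(depth, normalize(depth, vmin=0, vmax=100))
```

Narrowing the range between `vmin` and `vmax` spreads the colors over fewer values, which makes differences among shallow cores easier to see.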

In [None]:
# Try uncommenting each of these to make different plots, or change the arguments to anything you'd like

# su.PlotLocs(df=dfPoints)
# su.PlotLocs(df=dfPoints, color_by='limegreen')
# su.PlotLocs(df=dfPoints, color_by=dfPoints['Method'], color_map='tab20b')
su.PlotLocs(df=dfPoints, color_by=dfPoints['Core Depth (m)'], color_map='plasma', vmin=0, vmax=100, save='no')


## 7) Use `SavePoints` to save location data as `.csv` or `.kmz`

Provide a dataframe, the file type, and whether you'd like it separated into files by ice sheet.
*   `ftype`: choose `'csv'` or `'kmz'`
*   `by_icesheet`: choose `'yes'` or `'no'`

The output file(s) will be saved in the `output` folder. A `.kmz` file can also be opened directly in the web version of [Google Earth](https://earth.google.com/web/).
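
As an illustration of what separating the output by ice sheet means, the sketch below groups records and writes one block of CSV text per group. It is not the `SavePoints` implementation: the `points_to_csv` function and the `'Ice Sheet'` key are hypothetical, and the text is kept in memory rather than written to the `output` folder.

```python
import csv
import io
from collections import defaultdict


def points_to_csv(points, by_icesheet="no"):
    """Return {label: csv_text}: one entry in total, or one per ice sheet."""
    groups = defaultdict(list)
    for p in points:
        label = p["Ice Sheet"] if by_icesheet == "yes" else "all"
        groups[label].append(p)

    output = {}
    for label, rows in groups.items():
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
        output[label] = buffer.getvalue()
    return output


# Made-up records for illustration only
sample = [
    {"CoreID": 100, "Ice Sheet": "Antarctica", "Latitude": -75.0},
    {"CoreID": 900, "Ice Sheet": "Greenland", "Latitude": 72.5},
]
print(sorted(points_to_csv(sample, by_icesheet="yes")))  # ['Antarctica', 'Greenland']
```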



In [None]:
# Save a single kmz file with both ice sheets included
su.SavePoints(df=dfPoints, ftype='kmz', by_icesheet='no')

## 8) Use `PlotDensity` to create a density profile figure

This function reads in the processed dataframe and the given CoreIDs and then plots density profiles.

At the minimum, provide a dataframe and 1 to 6 CoreIDs: `su.PlotDensity(df=dfData, CoreID=[180,839,1294])`

Other optional arguments:
* `color`: Default color is `'m'` for a single plot, and `['m','c','k','y','r','b']` for a plot with multiple lines. Use brackets if you specify a color (e.g., `color=['red']`)
* `compare`: Default is `'no'`, but you can choose `'yes'` to show the profiles on the same plot rather than individual ones. Note that if you choose `'yes'`, all of the CoreIDs should be from the same ice sheet:
    * Antarctica: 0-886
    * Greenland: 887-1689
* `save`: Default is `'no'`. Choose `'yes'` to save the figure in the `figures` folder.

Note that many of the Antarctic entries consist of just one or two measurements of surface density, so their plots are less interesting. Try plotting cores with CoreIDs less than 26.
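
Before calling `PlotDensity` with `compare='yes'`, you may want to check that your CoreIDs all fall on one ice sheet. The sketch below does this using the CoreID ranges listed above. The helper names `icesheet_of` and `can_compare` are hypothetical and not part of SUMMEDup.

```python
ANTARCTICA_IDS = range(0, 887)     # CoreID 0-886
GREENLAND_IDS = range(887, 1690)   # CoreID 887-1689


def icesheet_of(core_id):
    """Return the ice sheet name for a CoreID, based on the ranges above."""
    if core_id in ANTARCTICA_IDS:
        return "Antarctica"
    if core_id in GREENLAND_IDS:
        return "Greenland"
    raise ValueError(f"CoreID {core_id} is outside the range 0-1689")


def can_compare(core_ids):
    """True if there are 1 to 6 CoreIDs and all are on the same ice sheet."""
    if not 1 <= len(core_ids) <= 6:
        return False
    return len({icesheet_of(c) for c in core_ids}) == 1


print(can_compare([2, 6, 8]))          # True
print(can_compare([180, 839, 1294]))   # False
```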

In [None]:
# Try uncommenting these to view density profiles and different ways to visualize them

# su.PlotDensity(df=dfData, CoreID=[1412])
# su.PlotDensity(df=dfData, CoreID=[1135], color=['b'], save='yes')
# su.PlotDensity(df=dfData, CoreID=[10,23,191,201], compare='yes', save='yes')
# su.PlotDensity(df=dfData, CoreID=[913,914,915], color=['slateblue'])
su.PlotDensity(df=dfData, CoreID=[2,6,8], color=['mediumslateblue','indigo','deeppink'], compare='yes')