# Export WALIS data for the Shiny App

This notebook contains the scripts to download the full WALIS database and prepare a CSV file for the R Shiny App hosted at: https://warmcoasts.shinyapps.io/WALIS_Visualization/

## Dependencies and packages
This notebook calls various scripts that are included in the `scripts` folder. The following cell imports the Python libraries needed to run this notebook.

In [1]:
#Main packages
import pandas as pd
import pandas.io.sql as psql
import geopandas
import pygeos
import numpy as np
import mysql.connector
from datetime import date
import xlsxwriter as writer
import math
from scipy import optimize
from scipy import stats

#Plots
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable

#Jupyter data display
import tqdm
from tqdm.notebook import tqdm_notebook
from IPython.display import *
import ipywidgets as widgets
from ipywidgets import *

#Geographic 
from shapely.geometry import Point
from shapely.geometry import box
import cartopy as ccrs
import cartopy.feature as cfeature

#System
import os
import glob
import shutil

#pandas options for debugging
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#Set a date string for exported file names
date=date.today()
dt_string = date.strftime("_%d_%m_%Y")

# Ignore warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.filterwarnings('ignore')

## Import database
Connect to the online MySQL database containing WALIS data and download data into a series of pandas data frames.

In [2]:
## Connect to the WALIS database server
%run -i scripts/connection.py

## Import data tables and show progress bar
with tqdm_notebook(total=len(SQLtables), desc='Importing tables from WALIS') as pbar:
    for i in range(len(SQLtables)):
        # Download the full table
        query = "SELECT * FROM {}".format(SQLtables[i])
        walis_dict[i] = psql.read_sql(query, con=db)
        # Download the column metadata of the same table
        query2 = "SHOW FULL COLUMNS FROM {}".format(SQLtables[i])
        walis_cols[i] = psql.read_sql(query2, con=db)
        pbar.update(1)

## Create the output folders if they do not exist yet
path = os.getcwd()
Output = 'Output'
Shiny_app = 'Shiny_input'
Output_path = os.path.join(path, Output)
Data_path = os.path.join(Output_path, Shiny_app)
os.makedirs(Data_path, exist_ok=True)

Importing tables from WALIS:   0%|          | 0/19 [00:00<?, ?it/s]
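
The import loop above can be tried without access to the WALIS server. The following is a minimal sketch, not the code in `scripts/connection.py`: it assumes an in-memory SQLite database as a stand-in for the MySQL connection `db`, uses two hypothetical table names, and assumes that `walis_dict` is a dictionary keyed by table index.

```python
import sqlite3
import pandas as pd

# Stand-in for the MySQL connection opened by scripts/connection.py (assumption)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE rsl (id INTEGER, elevation REAL)")
db.execute("CREATE TABLE useries (id INTEGER, age REAL)")
db.executemany("INSERT INTO rsl VALUES (?, ?)", [(1, 2.5), (2, 6.0)])
db.executemany("INSERT INTO useries VALUES (?, ?)", [(1, 125.0)])
db.commit()

# Hypothetical table names; the real list is defined by the connection script
SQLtables = ["rsl", "useries"]
walis_dict = {}

# Read every table into its own pandas data frame, keyed by table index
for i, table in enumerate(SQLtables):
    walis_dict[i] = pd.read_sql("SELECT * FROM {}".format(table), con=db)

db.close()
```

Keeping one data frame per table leaves the join logic to the later scripts, at the cost of holding the whole database in memory.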

The following scripts link the data tables to each other and produce the summary file, which is processed in the next step.

In [3]:
%run -i scripts/select_user.py
%run -i scripts/multi_author_query.py
%run -i scripts/substitutions.py
%run -i scripts/make_summary.py
Summary.to_csv('Output/Shiny_input/Summary.csv',index = False,encoding='utf-8-sig')

Extracting values for: WALIS Admin

The database you are exporting contains:
3999 RSL datapoints from stratigraphy
463 RSL datapoints from single corals
76 RSL datapoints from single speleothems
30 RSL indicators
19 Elevation measurement techniques
11 Geographic positioning techniques
28 Sea level datums
2717 U-Series ages (including RSL datapoints from corals and speleothems)
583 Amino Acid Racemization samples
213 Electron Spin Resonance ages
597 Luminescence ages
120 Chronostratigraphic constraints
160 Other age constraints
2107 References
We are substituting values in your dataframes....
querying by user
Putting nice names to the database columns....
Done!!
making summary table....
Done!


# Data analysis
This section takes the "Summary.csv" file and performs some basic data analysis on it.

## RSL percentiles
The script uses the relative sea level (RSL) values to calculate RSL percentiles as follows.
1. If the RSL Indicator is a "Single Coral": the percentiles are obtained from a gamma function fitted by treating the upper and lower limits of the living range entered in the database as, respectively, the 2.3 and 97.7 percentiles of the distribution.
2. If the RSL Indicator is a "Sea Level Indicator" or "Single Speleothem": the percentiles of paleo RSL are calculated from the Gaussian distribution defined by the field "Paleo RSL (m)" and its associated uncertainty (1-sigma).
3. If the RSL Indicator is a "Terrestrial Limiting" or "Marine Limiting": the RSL percentiles are not calculated.
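
The two calculations described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `scripts/percentiles_from_summary.py`: the function names are hypothetical, the living range is assumed to be expressed as positive depths in metres, and the gamma parameters are found with a least-squares fit, which is only one possible way to match the two percentiles.

```python
import numpy as np
from scipy import optimize, stats

# Percentiles used in the exported table
PERCENTILES = np.array([0.1, 2.3, 15.9, 50.0, 84.1, 97.7, 99.5])

def gaussian_percentiles(mean, sigma):
    """Percentiles of a Gaussian defined by its mean and 1-sigma uncertainty."""
    return stats.norm.ppf(PERCENTILES / 100.0, loc=mean, scale=sigma)

def fit_gamma_to_range(shallow_limit, deep_limit):
    """Gamma shape and scale whose 2.3 and 97.7 percentiles match the
    two depth limits (assumed positive, with shallow_limit < deep_limit)."""
    def residuals(log_params):
        # Parameters are optimized in log space so they stay positive
        shape, scale = np.exp(log_params)
        q = stats.gamma.ppf([0.023, 0.977], shape, scale=scale)
        return q - np.array([shallow_limit, deep_limit])

    # Starting guess from a rough mean and standard deviation of the range
    mean_guess = 0.5 * (shallow_limit + deep_limit)
    sd_guess = (deep_limit - shallow_limit) / 4.0
    start = np.log([(mean_guess / sd_guess) ** 2, sd_guess ** 2 / mean_guess])
    result = optimize.least_squares(residuals, start)
    shape, scale = np.exp(result.x)
    return shape, scale

# Hypothetical indicator: paleo RSL of 5.0 m with a 1-sigma uncertainty of 1.5 m
rsl = gaussian_percentiles(5.0, 1.5)

# Hypothetical coral living range between 1 m and 10 m depth
shape, scale = fit_gamma_to_range(1.0, 10.0)
depth = stats.gamma.ppf(PERCENTILES / 100.0, shape, scale=scale)
```

The gamma distribution is skewed, so the resulting depth percentiles are not symmetric around the median, unlike the Gaussian case.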

## Age percentiles
The following script uses the age values to calculate age percentiles according to the table below. The following modifications are applied to the original data:
 
 - If a percentile goes below zero, it is set to zero.
 - If Lower age > Upper age, the two values are reversed.
 - If there is no age, the corresponding record is deleted.
 
| Dating technique | Pre-selection | Lower age | Age (ka) 0.1 perc | Age (ka) 2.3 perc | Age (ka) 15.9 perc | Age (ka) 50 perc | Age (ka) 84.1 perc | Age (ka) 97.7 perc | Age (ka) 99.5 perc | Upper age |
|-|-|-|-|-|-|-|-|-|-|-|
| U-series / coral | Recalculated age used if available. If not, Reported age is used | NaN | Average age - 3 Sigma age | Average age - 2 Sigma age | Average age - 1 Sigma age | Average age | Average age + 1 Sigma age | Average age + 2 Sigma age | Average age + 3 Sigma age | NaN |
| U-series / speleothem | Recalculated age used if available. If not, Reported age is used | NaN | Average age - 3 Sigma age | Average age - 2 Sigma age | Average age - 1 Sigma age | Average age | Average age + 1 Sigma age | Average age + 2 Sigma age | Average age + 3 Sigma age | NaN |
| U-series / mollusks or algae | Upper and lower age derived from the MIS with which the sample is associated | Lower age |<--|--|--| Uniform distribution |--|--|-->| Upper age |
| AAR / Age reported | | NaN | Average age - 3 Sigma age | Average age - 2 Sigma age | Average age - 1 Sigma age | Average age | Average age + 1 Sigma age | Average age + 2 Sigma age | Average age + 3 Sigma age | NaN |
| AAR / Only MIS reported | Upper and lower age derived from the MIS with which the sample is associated | Lower age |<--|--|--| Uniform distribution |--|--|-->| Upper age |
| ESR / Age reported | | NaN | Average age - 3 Sigma age | Average age - 2 Sigma age | Average age - 1 Sigma age | Average age | Average age + 1 Sigma age | Average age + 2 Sigma age | Average age + 3 Sigma age | NaN |
| ESR / Only MIS reported | Upper and lower age derived from the MIS with which the sample is associated | Lower age |<--|--|--| Uniform distribution |--|--|-->| Upper age |
| Luminescence / Age reported | | NaN | Average age - 3 Sigma age | Average age - 2 Sigma age | Average age - 1 Sigma age | Average age | Average age + 1 Sigma age | Average age + 2 Sigma age | Average age + 3 Sigma age | NaN |
| Luminescence / Only MIS reported | Upper and lower age derived from the MIS with which the sample is associated | Lower age |<--|--|--| Uniform distribution |--|--|-->| Upper age |
| Stratigraphic constraint / Age reported | Upper and lower age derived from the reported age | Lower age |<--|--|--| Uniform distribution |--|--|-->| Upper age |
| Stratigraphic constraint / Only MIS reported | Upper and lower age derived from the MIS with which the sample is associated | Lower age |<--|--|--| Uniform distribution |--|--|-->| Upper age |
| Other age constraint / Age reported | Upper and lower age derived from the reported age | Lower age |<--|--|--| Uniform distribution |--|--|-->| Upper age |
| Other age constraint / Only MIS reported | Upper and lower age derived from the MIS with which the sample is associated | Lower age |<--|--|--| Uniform distribution |--|--|-->| Upper age |
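
The rules in the table and the three modifications listed before it can be sketched as follows. This is a minimal illustration, not the code in `scripts/percentiles_from_summary.py`: the function names, the column name `Age (ka)`, and the sample values are all hypothetical.

```python
import numpy as np
import pandas as pd

# Percentile columns of the table and the matching sigma multiples
PERCENTILES = [0.1, 2.3, 15.9, 50.0, 84.1, 97.7, 99.5]
SIGMA_MULTIPLES = [-3, -2, -1, 0, 1, 2, 3]

def gaussian_age_percentiles(age, sigma):
    """Average age plus or minus 1, 2 and 3 sigma, with negative values set to zero."""
    values = np.array([age + k * sigma for k in SIGMA_MULTIPLES], dtype=float)
    return np.clip(values, 0.0, None)

def uniform_age_percentiles(lower, upper):
    """Percentiles of a uniform distribution between the lower and upper age."""
    if lower > upper:
        # Reversed bounds are swapped before the calculation
        lower, upper = upper, lower
    values = np.array([lower + (p / 100.0) * (upper - lower) for p in PERCENTILES])
    return np.clip(values, 0.0, None)

# Hypothetical dated sample: 125 ka with a 1-sigma uncertainty of 50 ka
dated = gaussian_age_percentiles(125.0, 50.0)

# Hypothetical age range entered in reverse order
constrained = uniform_age_percentiles(130.0, 115.0)

# Records with no age are dropped (hypothetical column name)
records = pd.DataFrame({"Age (ka)": [125.0, np.nan, 80.0]})
records = records.dropna(subset=["Age (ka)"])
```

In the first example the lowest percentile would be negative (125 - 3 x 50), so it is set to zero, as described in the list of modifications.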

In [5]:
%run -i scripts/percentiles_from_summary.py

Age substitutions done!
RSL indicators from stratigraphy done!
Starting calculations of gamma distribution for corals (will take a while...)
Done!
RSL indicators for speleothems done!
The dataframe has 0 points with no Elevation
The dataframe has 0 points with no Elevation error
The dataframe has 8 points with no age
Data with no Elevation information or age has been discarded
Your file has been saved!


# Suggested acknowledgments
WALIS is the result of the work of several people, within different projects. For this reason, we kindly ask you to follow these simple rules to properly acknowledge those who worked on it:

1. Cite the original authors - Please maintain the original citations for each datapoint, to give proper credit to those who worked to collect the original data in the field or in the lab.
2. Acknowledge the database contributor - The name of each contributor is listed in all public datapoints. This is the data creator, who spent time to make sure the data is standardized and (as much as possible) free of errors.
3. Acknowledge the database structure and interface creators - The database template used in this study was developed by the ERC Starting Grant "WARMCOASTS" (ERC-StG-802414) and is a community effort under the PALSEA (PAGES / INQUA) working group.

Example of acknowledgments: The data used in this study were *[extracted from / compiled in]* WALIS, a sea-level database interface developed by the ERC Starting Grant "WARMCOASTS" (ERC-StG-802414), in collaboration with the PALSEA (PAGES / INQUA) working group. The database structure was designed by A. Rovere, D. Ryan, T. Lorscheid, A. Dutton, P. Chutcharavan, D. Brill, N. Jankowski, D. Mueller, M. Bartz, E. Gowan and K. Cohen. The data points used in this study were contributed to WALIS by *[list names of contributors here]*.