# EDA on Argentine lakes and reservoirs dataset

*Short description: EDA on a dataset of 103 Argentine lakes and reservoirs, taken from the paper by R. Quirós (1988).*

## Table of Contents

1. [Description of the problem](#description)
2. [Preliminary steps](#prelim) 

![lacar](lacar.jpg)

## Description of the problem <a class="anchor" id="description"></a>

Description of the problem and the questions I want to answer.

## Preliminary steps <a class="anchor" id="prelim"></a>

First of all, we import the libraries that we will use during this analysis, load the dataset into the **data** variable and drop the **ID** column:

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
from plotly.io import write_image

filepath = "./lake_res_data.csv"
data = pd.read_csv(filepath)
data = data.drop(columns='ID')

Next, we will explore some characteristics of the dataset using relevant pandas methods such as **info()**, **describe()** and **head()**:

In [2]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 103 entries, 0 to 102
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   NAME    103 non-null    object 
 1   TYPE    103 non-null    object 
 2   AREA    103 non-null    float64
 3   ZMEAN   103 non-null    float64
 4   ALT     103 non-null    int64  
 5   LAT     103 non-null    float64
 6   TEMP    103 non-null    float64
 7   SDT     103 non-null    float64
 8   TP      103 non-null    int64  
 9   TON     103 non-null    int64  
 10  CHL-a   103 non-null    float64
dtypes: float64(6), int64(3), object(2)
memory usage: 9.0+ KB


In [3]:
data.describe()

| | AREA | ZMEAN | ALT | LAT | TEMP | SDT | TP | TON | CHL-a |
|---|---|---|---|---|---|---|---|---|---|
| count | 103.0 | 103.0 | 103.0 | 103.0 | 103.0 | 103.0 | 103.0 | 103.0 | 103.0 |
| mean | 73.37 | 25.61068 | 583.776699 | 37.202718 | 11.866019 | 4.322718 | 221.854369 | 100.368932 | 25.772621 |
| std | 234.395916 | 36.272095 | 481.896192 | 5.60313 | 5.239585 | 4.940594 | 823.340713 | 125.403427 | 54.056045 |
| min | 0.09 | 0.7 | 2.0 | 24.12 | 3.0 | 0.03 | 1.0 | 6.0 | 0.16 |
| 25% | 4.35 | 3.1 | 159.5 | 34.375 | 6.0 | 0.675 | 9.0 | 21.0 | 0.83 |
| 50% | 12.0 | 8.1 | 550.0 | 37.88 | 14.0 | 2.0 | 30.0 | 45.0 | 6.7 |
| 75% | 44.3 | 33.15 | 844.5 | 42.31 | 16.0 | 7.25 | 125.5 | 126.0 | 23.75 |
| max | 1984.0 | 166.0 | 3250.0 | 45.9 | 20.4 | 19.0 | 7912.0 | 762.0 | 405.3 |
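
To read the **describe()** summary more easily, here is a minimal sketch on a hypothetical four-value Series (not the lake data). It shows that `std` is the sample standard deviation (`ddof=1`) and that the quartiles are linearly interpolated between sorted values. It also shows that a mean well above the median points to a right-skewed column, which is the pattern in the table above for **AREA** (73.37 vs 12.0), **TP** (221.85 vs 30.0) and **CHL-a** (25.77 vs 6.7).

```python
import pandas as pd

# Hypothetical values, chosen only to illustrate the statistics
s = pd.Series([0.5, 2.0, 3.0, 10.0])
summary = s.describe()

# 'std' in describe() is the sample standard deviation (ddof=1)
assert summary['std'] == s.std(ddof=1)

# Quartiles use linear interpolation between sorted values:
# 25% sits 0.75 of the way from 0.5 to 2.0, i.e. 1.625
print(summary[['25%', '50%', '75%']].tolist())  # [1.625, 2.5, 4.75]

# A mean above the median hints at a long right tail
print(summary['mean'] > summary['50%'])  # True
```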


In [4]:
data.head()

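| | NAME | TYPE | AREA | ZMEAN | ALT | LAT | TEMP | SDT | TP | TON | CHL-a |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Rodeo | lake | 0.09 | 3.6 | 1446 | 24.12 | 16.0 | 0.85 | 15 | 63 | 15.4 |
| 1 | Comedero | lake | 0.12 | 4.0 | 1446 | 24.12 | 16.0 | 1.2 | 18 | 36 | 22.5 |
| 2 | La Ciénaga | reservoir | 2.8 | 9.3 | 1212 | 24.47 | 17.5 | 1.0 | 25 | 45 | 7.4 |
| 3 | Las Maderas | reservoir | 9.6 | 31.3 | 1185 | 24.45 | 18.0 | 1.4 | 23 | 35 | 13.3 |
| 4 | Campo Alegre | reservoir | 3.2 | 14.4 | 1200 | 24.63 | 19.0 | 1.2 | 58 | 62 | 23.7 |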


The raw CSV file has 12 columns; **ID** was dropped when loading it, which leaves the 11 columns listed by **info()**. The meaning of each column is described below:
+ **ID**: ID of the datapoint.
+ **NAME**: Name of the waterbody.
+ **TYPE**: Type of the waterbody.
+ **AREA**: Area, in $km^2$.
+ **ZMEAN**: Mean depth, in $m$.
+ **ALT**: Elevation, in $m$.
+ **LAT**: Latitude, $°S$.
+ **TEMP**: Annual mean air temperature, in $°C$.
+ **SDT**: Secchi disk transparency, in $m$.
+ **TP**: Total phosphorus, in $mg.m^{-3}$.
+ **TON**: Total organic nitrogen, in $\mu M$.
+ **CHL-a**: Total chlorophyll-a, in $mg.m^{-3}$.

In [5]:
data['VOL'] = data['AREA'] * data['ZMEAN']
data['DEPTH'] = np.where((data['ZMEAN'] > 5), 'Deep', 'Shallow')
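
The cell above derives two columns. **VOL** multiplies area in $km^2$ by mean depth in $m$, so its unit is $km^2 \cdot m = 10^6\ m^3$ (one cubic hectometre). **DEPTH** labels a waterbody 'Deep' only when its mean depth is strictly greater than 5 m. The sketch below checks both on two rows copied from the **head()** output plus one hypothetical row sitting exactly on the 5 m threshold; the `VOL_m3` column is an illustrative addition, not part of the notebook.

```python
import numpy as np
import pandas as pd

# La Ciénaga and Comedero (from head() above) plus a hypothetical 5 m waterbody
df = pd.DataFrame({'NAME': ['La Ciénaga', 'Comedero', 'hypothetical'],
                   'AREA': [2.8, 0.12, 1.0],     # km^2
                   'ZMEAN': [9.3, 4.0, 5.0]})    # m

# Same formulas as In [5]
df['VOL'] = df['AREA'] * df['ZMEAN']             # km^2 * m = 1e6 m^3
df['VOL_m3'] = df['VOL'] * 1e6                   # illustrative conversion to m^3
df['DEPTH'] = np.where(df['ZMEAN'] > 5, 'Deep', 'Shallow')

# The 5.0 m row is 'Shallow' because the comparison is strict (> 5)
print(df[['NAME', 'VOL', 'DEPTH']])
```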

In [6]:
def apply_custom_layout(figure,
                        figure_title,
                        xaxis_title,
                        yaxis_title,
                        font='Ubuntu',
                        bg_color='#FFFFFF',
                        axis_color='#BCCCDC',
                        grid_color='#BCCCDC',
                        spike_color='#999999',
                        title_fontsize=20,
                        legend_fontsize=16,
                        axis_fontsize=18,
                        tick_fontsize=14,
                        show_major_xgrid=True,
                        display_minor_xaxis=True,
                        dtick_xaxis=None,
                        show_minor_xgrid=True,
                        show_major_ygrid=False,
                        show_xspikes=True):
    """Applies desired custom formatting to figure layout.

    Parameters
    ----------
    figure : plotly.graph_objects.Figure instance
        Figure whose layout will be updated
    figure_title : str
        Desired figure title
    xaxis_title : str
        Desired X-axis title
    yaxis_title : str
        Desired Y-axis title
    font : str, optional
        Font to be used on title, axis, ticks and legend, by default 'Ubuntu'
    bg_color : str, optional
        Background color, by default '#FFFFFF'
    axis_color : str, optional
        Axis (not axis title) color, by default '#BCCCDC'
    grid_color : str, optional
        Grid color, by default '#BCCCDC'
    spike_color : str, optional
        Spike color, by default '#999999'
    title_fontsize : int, optional
        Title fontsize, by default 20
    legend_fontsize : int, optional
        Legend fontsize, by default 16
    axis_fontsize : int, optional
        Axis title fontsize, by default 18
    tick_fontsize : int, optional
        Tick fontsize, by default 14
    show_major_xgrid : bool, optional
        If True displays grid on major ticks for X-axis, by default True
    display_minor_xaxis : bool, optional
        If True configures minor ticks (and their grid) on the X-axis, by default True
    dtick_xaxis : str or float, optional
        Spacing of the minor X-axis ticks, by default None (Plotly chooses it).
        On date axes, for example, 'M3' means a minor tick every three months.
        For more info go to:
        https://plotly.com/python/reference/layout/xaxis/#layout-xaxis-dtick
    show_minor_xgrid : bool, optional
        If True displays grid on minor ticks for X-axis, by default True
    show_major_ygrid : bool, optional
        If True displays grid on major ticks for Y-axis, by default False
    show_xspikes : bool, optional
        If True displays spikes on hover for X-axis, by default True
    """
    # Updates the layout of the figure
    figure.update_layout(
        font_family=font,  # Font to be used in all elements, unless overridden
        plot_bgcolor=bg_color,  # Background color
        title_text=figure_title,  # Title text
        title_font_size=title_fontsize,  # Title fontsize
        legend={  # Places legend horizontally, top right of the graph
            'font_size': legend_fontsize,  # Legend fontsize
            'orientation': 'h',
            'yanchor': 'bottom',
            'y': 1.02,
            'xanchor': 'right',
            'x': 1
        },
        hovermode='x',
        hoverdistance=1,  # Distance to show hover label of data point
        spikedistance=1000,  # Distance to show spike
        xaxis={
            'title': xaxis_title,  # Text to display in X-axis
            'title_font_size': axis_fontsize,  # X-axis fontsize
            'linecolor': axis_color,  # Color of X-axis (not X-axis text)
            'showgrid': show_major_xgrid,  # Show grid of X-axis major ticks
            'gridcolor': grid_color,  # Color of major X-axis grid
            'tickfont_size': tick_fontsize,  # Fontsize for X-axis ticks
            'minor': {
                'dtick': dtick_xaxis,  # Distance of minor X-axis ticks
                'ticks': 'inside',  # X-axis minor ticks in or out from axis
                'showgrid': show_minor_xgrid  # Show grid of X-axis minor ticks
            } if display_minor_xaxis is True else None,
            'showspikes': show_xspikes,  # Show spike line for X-axis
            # Format spike
            'spikethickness': 2,
            'spikedash': 'dot',  # Spike linetype
            'spikecolor': spike_color,  # Spike color
            'spikemode': 'across'
        },
        yaxis={
            'title': yaxis_title,  # Text to display in Y-axis
            'title_font_size': axis_fontsize,  # Y-axis fontsize
            'linecolor': axis_color,  # Color of Y-axis (not Y-axis text)
            'showgrid': show_major_ygrid,  # Show grid of Y-axis major ticks
            'gridcolor': grid_color,  # Color of major Y-axis grid
            'tickfont_size': tick_fontsize,  # Fontsize for Y-axis ticks
        })

In [7]:
fig1 = px.scatter(data_frame=data, x='LAT', y='ZMEAN', color='SDT',
                  color_continuous_scale='algae', hover_name='NAME',
                  labels={
                      'LAT': 'Latitude [°]',
                      'SDT': 'Secchi Disk Transparency [m]',
                      'ZMEAN': 'Mean Depth [m]'
                  })
fig_title = 'Effect of Latitude and Mean Depth on Secchi Disk Transparency'
xaxis_title = 'Latitude [°]'
yaxis_title = 'Mean Depth [m]'

apply_custom_layout(fig1, fig_title, xaxis_title, yaxis_title, font='Lora', show_xspikes=False)
# Override the 'x' hover mode set by apply_custom_layout so each point shows its own label
fig1.update_layout(hovermode='closest', coloraxis_colorbar_orientation='h')
fig1.write_json("plot_name.json")
fig1.show()


In [8]:
fig = px.scatter_3d(data_frame=data, x='LAT', y='ZMEAN', z='ALT', color='SDT',
                    color_continuous_scale='sunsetdark', hover_name='NAME',
                    labels={
                        'LAT': 'Latitude [°]',
                        'SDT': 'Secchi Disk Transparency [m]',
                        'ZMEAN': 'Mean Depth [m]',
                        'ALT': 'Altitude [m]'
                    })
fig_title = 'Effect of Latitude and Altitude on Secchi Disk Transparency'

# apply_custom_layout() styles the 2D xaxis/yaxis only; 3D axes live under
# layout.scene and already take their titles from `labels` above
fig.update_layout(title_text=fig_title)

fig.show()