# Depth

By **Franklin Oliveira**

-----

This notebook produces depth charts for the `carcinos` collection.

Database: <font color='blue'>'Planilha geral Atualizada FINAL 5_GERAL_sendo trabalhada no Google drive.xlsx'</font>.

In [1]:
import datetime
import numpy as np
import pandas as pd

from collections import defaultdict

# packages for quick visualization
import seaborn as sns
import matplotlib.pyplot as plt

# main visualization package
import altair as alt

# enabling a renderer for the notebook
# alt.renderers.enable('notebook')
alt.renderers.enable('default')


# disabling the row limit
alt.data_transformers.disable_max_rows()


## Importing data...

Importing the data pre-treated in `1-data_treatment.ipynb`. This notebook only makes minor adjustments for visualization purposes; for the full data-treatment pipeline, see the `1-data_treatment` notebook.

In [2]:
# treated_db was the previous db I used to write all this code
NewTable = pd.read_csv('./data/treated_db.csv', sep=';', encoding='utf-8', low_memory=False)

## Filtering

For now, we consider only specimens of the order Decapoda, which have been thoroughly reviewed by the Museum's staff.

In [3]:
decapoda = NewTable[NewTable['order'] == 'Decapoda'].copy()
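The filter above keeps rows whose `order` is exactly `'Decapoda'`: the comparison is case-sensitive and missing values never match. A toy sketch with made-up rows (the frames `toy_records` and `toy_decapoda` are hypothetical) shows both effects, and why `.copy()` is taken before columns of the subset are modified later on.

```python
import pandas as pd

# Made-up rows standing in for NewTable.
toy_records = pd.DataFrame({
    'order': ['Decapoda', 'Isopoda', 'decapoda', 'Decapoda', None],
    'family': ['Family A', 'Family B', 'Family C', 'Family D', 'Family E'],
})

# Exact, case-sensitive match; None/NaN compares as False.
toy_decapoda = toy_records[toy_records['order'] == 'Decapoda'].copy()

# Because of .copy(), modifying the subset leaves toy_records untouched
# and does not raise SettingWithCopyWarning.
toy_decapoda['family'] = toy_decapoda['family'].str.upper()
print(toy_decapoda)
```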

<br>

<font size=5>**Color palette**</font>

Colors (per infraorder): 

- <font color='#e26d67'><b>Astacidea</b></font>
- <font color='#007961'><b>Anomura</b></font>
- <font color='#7a2c39'><b>Achelata</b></font>
- <font color='#b67262'><b>Axiidea</b></font>
- <font color='#ee4454'><b>Brachyura</b></font>
- <font color='#3330b7'><b>Caridea</b></font>
- <font color='#58b5e1'><b>Gebiidea</b></font>
- <font color='#b8e450'><b>Stenopodidea</b></font>
- <font color='#a0a3fd'><b>Astacidae</b></font>
- <font color='#deae9e'><b>Polychelida</b></font>
- <font color='#d867be'><b>Grapsidae</b></font>
- <font color='#fece5f'><b>Xanthoidea</b></font>
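The charts below take their colours from `src.MNViz_colors` (for example `cores_familia_naive`, passed as `alt.Scale(domain=list(keys), range=list(values))`). As a minimal sketch of that pattern, a few of the infraorder colours listed above can be kept in one dict and split into the parallel `domain`/`range` lists. The names `INFRAORDER_COLORS` and `to_scale_args` are hypothetical and not part of `src.MNViz_colors`.

```python
# Hypothetical palette dict: a subset of the infraorder colours listed above.
INFRAORDER_COLORS = {
    'Anomura': '#007961',
    'Achelata': '#7a2c39',
    'Axiidea': '#b67262',
    'Brachyura': '#ee4454',
    'Caridea': '#3330b7',
    'Gebiidea': '#58b5e1',
}


def to_scale_args(palette: dict) -> dict:
    """Split a {category: hex colour} dict into the parallel lists that
    alt.Scale expects. Dicts keep insertion order, so position i of
    'domain' is drawn with the colour at position i of 'range'."""
    return {'domain': list(palette.keys()), 'range': list(palette.values())}


scale_args = to_scale_args(INFRAORDER_COLORS)
# In a chart this could be used as, e.g.:
#   alt.Color('infraorder:N', scale=alt.Scale(**scale_args))
print(scale_args['domain'][:2], scale_args['range'][:2])
```

Keeping each category and its colour together in one dict avoids the two lists drifting out of sync when the palette is edited.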

In [4]:
# importing customized color palettes
from src.MNViz_colors import *

<br>

---

## Graphs

### Depth per family

- x: family, sorted by the deepest record in each family
- y: minimum depth (`min_depth`, in meters), with the axis inverted so that depth increases downward

In [5]:
# subsetting
teste = decapoda[['min_depth','family','infraorder', 'start_year', 'qualifier', 'catalog_number', 
                  'genus', 'species', 'collector_full_name', 'country','state','locality', 'type_status']].copy()

# sorting
teste = teste.sort_values(['min_depth','family'])

# dropping na
teste.dropna(subset=['min_depth'], inplace=True)

# making sure depth is a floating point number
teste['min_depth'] = teste['min_depth'].astype(float)

# extremes for scale
max_y = teste['min_depth'].max()
min_y = teste['min_depth'].min()

In [7]:
temp = alt.Chart(teste, title='Depth per Family', width=800, height=400).mark_circle().encode(
    x = alt.X('family', type='nominal', title='Family', 
              sort= alt.EncodingSortField('min_depth', op='max', order='ascending')),
    y = alt.Y('min_depth', type='quantitative', title='Depth (in meters)',
              scale = alt.Scale(domain=[max_y, min_y])),
    color= alt.Color('family:N', title='Family', 
                     scale=alt.Scale(domain=list(cores_familia_naive.keys()), 
                                     range=list(cores_familia_naive.values())),
                     legend= alt.Legend(columns=3, symbolLimit=102)),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family','genus','species', 'type_status',
                            'qualifier', 'start_year','collector_full_name',
                            'country', 'state', 'locality', 'min_depth'])
)

temp = temp.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# temp.save('./graphs/depth/depth_per_family.html')
# temp
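In the chart above, `alt.Scale(domain=[max_y, min_y])` lists the larger value first, which flips the Y axis so that the surface sits at the top and the deepest record at the bottom. A minimal sketch of the linear mapping, assuming Vega-Lite's default Y range of `[height, 0]` (pixel 0 is the top of the plot); `y_pixel` is a hypothetical helper, not an Altair function.

```python
def y_pixel(value: float, domain: tuple, height: float) -> float:
    """Pixel row of `value` on a linear Y scale, assuming the range runs
    from `height` (bottom of the plot) to 0 (top)."""
    d0, d1 = domain
    fraction = (value - d0) / (d1 - d0)  # 0.0 at domain[0], 1.0 at domain[1]
    return height - fraction * height    # domain[0] -> bottom, domain[1] -> top


# Hypothetical extremes: records from 0 m down to 1000 m, chart height 400 px.
deepest, shallowest = 1000.0, 0.0
for depth_m in (0.0, 500.0, 1000.0):
    print(depth_m, y_pixel(depth_m, (deepest, shallowest), 400))
```

With the domain written as `(shallowest, deepest)` instead, the same function would place 0 m at the bottom of the plot.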

<br>

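### Customizing the Y-axis scale

As suggested by the Museum team, 200 m is used as a threshold separating ocean depth zones (conventionally, the edge of the continental shelf). Ticks are therefore denser near the surface: every 20 m down to 200 m, every 100 m down to 500 m and every 500 m beyond that, with labels shown only on multiples of 50 m.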

In [9]:
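# custom tick values for the Y axis
custom_y_scale = []

i = 0

while i < max_y+1:
    if i % 20 == 0 and i <= 200:      # every 20 m down to the 200 m threshold
        custom_y_scale.append(i)
    elif i % 100 == 0 and i <=500:    # every 100 m between 200 m and 500 m
        custom_y_scale.append(i)
    elif i % 500 == 0:                # every 500 m deeper than 500 m
        custom_y_scale.append(i)
    i+=1

# one extra tick 500 m below the last one
custom_y_scale.append(custom_y_scale[-1] + 500)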

# custom_y_scale
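The loop above can also be written as a pure function, which makes the tick rule easy to test. This sketch reproduces the loop's output (assuming `max_depth >= 0`, as the loop does); `depth_ticks` and `labelled_ticks` are hypothetical names. The second helper mirrors `labelExpr="datum.value % 50 ? null : datum.label"`, which keeps a label only on ticks that are multiples of 50 m.

```python
import math


def depth_ticks(max_depth: float) -> list:
    """Tick values following the loop's rule: every 20 m down to 200 m,
    every 100 m down to 500 m, every 500 m beyond, plus one extra tick
    500 m past the last one."""
    upper = math.ceil(max_depth + 1)  # the loop runs while i < max_depth + 1
    ticks = [i for i in range(upper)
             if (i <= 200 and i % 20 == 0)
             or (i <= 500 and i % 100 == 0)
             or i % 500 == 0]
    ticks.append(ticks[-1] + 500)
    return ticks


def labelled_ticks(ticks: list) -> list:
    """Ticks that keep their label under the labelExpr used in the chart."""
    return [t for t in ticks if t % 50 == 0]


print(depth_ticks(1200))
print(labelled_ticks(depth_ticks(1200)))
```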

In [13]:
data = teste.copy()
data['type_status'] = data['type_status'].astype(str)

select_family = alt.selection_multi(fields=['family'], bind='legend')
select_type = alt.selection_multi(fields=['type_status'], bind='legend')

# background
back = alt.Chart(data, title='Depth per Family', width=800, height=1200).mark_point(filled=True,
                                                                color= 'lightgray', opacity=0.2).encode(
    x = alt.X('family', type='nominal', title='Family', 
              sort= alt.EncodingSortField('min_depth', op='max', order='ascending')),
    y = alt.Y('min_depth', type='quantitative', title='Depth (in meters)',
              scale = alt.Scale(domain=[max_y, min_y]),
              axis= alt.Axis(values=custom_y_scale, tickCount=len(custom_y_scale),
                             labelExpr="datum.value % 50 ? null : datum.label")),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family','genus','species', 'type_status',
                            'qualifier', 'start_year','collector_full_name',
                            'country', 'state', 'locality', 'min_depth'])
)

# front view
front = alt.Chart(data, title='Depth per Family', width=800, height=1200).mark_point(filled=True).encode(
    x = alt.X('family', type='nominal', title='Family', 
              sort= alt.EncodingSortField('min_depth', op='max', order='ascending')),
    y = alt.Y('min_depth', type='quantitative', title='Depth (in meters)',
              scale = alt.Scale(domain=[max_y, min_y]),
              axis= alt.Axis(values=custom_y_scale, tickCount=len(custom_y_scale),
                             labelExpr="datum.value % 50 ? null : datum.label")),
    color= alt.Color('family', title='Family', 
                     scale=alt.Scale(domain=list(cores_familia_naive.keys()), 
                                     range=list(cores_familia_naive.values())),
                     legend= alt.Legend(columns=2, symbolLimit=102)),
    shape = alt.Shape('type_status:N', title='Types', 
                      legend= alt.Legend(columns=4),
                      scale= alt.Scale(domain=['Holótipo', 'Lectótipo','Parátipo', 'nan'],
                                       range=['triangle', 'square', 'cross','circle'])),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family','genus','species', 'type_status',
                            'qualifier', 'start_year','collector_full_name',
                            'country', 'state', 'locality', 'min_depth'])
).add_selection(select_family, select_type).transform_filter(select_family).transform_filter(select_type)

g = (back + front)

g = g.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# g.save('./graphs/depth/depth_per_family-custom_scale.html')
# g
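The chart above layers a grey `back` chart, which always shows every point, under a `front` chart filtered by two legend-bound selections. Casting `type_status` to `str` turns missing values into the string `'nan'`, so untyped specimens get their own legend entry (listed in the shape scale's domain) that a click can match. The sketch below emulates that filtering in pandas under stated assumptions: values clicked within one legend are combined with OR, the two chained `transform_filter` calls combine with AND, and an empty selection keeps every row (the default behaviour). `visible_points` and the rows are hypothetical.

```python
import numpy as np
import pandas as pd


def visible_points(df, families=None, types=None):
    """Rows the front layer would keep for the given legend selections.
    An empty selection (None or []) applies no filter."""
    out = df.copy()
    out['type_status'] = out['type_status'].astype(str)  # NaN -> 'nan'
    if families:
        out = out[out['family'].isin(families)]      # OR within one legend
    if types:
        out = out[out['type_status'].isin(types)]    # AND across legends
    return out


# Made-up records; NaN means the specimen is not a type.
sample_rows = pd.DataFrame({
    'family': ['Family A', 'Family A', 'Family B'],
    'type_status': ['Holótipo', np.nan, np.nan],
    'min_depth': [10.0, 35.0, 2.0],
})

print(visible_points(sample_rows, types=['nan']))
# Without the cast, NaN never equals the legend value 'nan':
print(sample_rows['type_status'].isin(['nan']).sum())
```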

<br>

### Interactive version with standard scale

In [17]:
# subsetting
teste = decapoda[['min_depth','family','infraorder', 'start_year', 'qualifier', 'catalog_number', 
                  'genus', 'species', 'collector_full_name', 'country','state','locality', 'type_status']].copy()

# sorting
teste = teste.sort_values(['min_depth','family'])

# dropping na
teste.dropna(subset=['min_depth'], inplace=True)

# making sure depth is a floating point number
teste['min_depth'] = teste['min_depth'].astype(float)

# extremes for scale
max_y = teste['min_depth'].max()
min_y = teste['min_depth'].min()

In [40]:
data = teste.copy()
data['type_status'] = data['type_status'].astype(str)

select_family = alt.selection_multi(fields=['family'], bind='legend')
select_type = alt.selection_multi(fields=['type_status'], bind='legend')

### background
back = alt.Chart(data, title='Depth per Family', width=800, height=400).mark_point(filled=True,
                                                                  color='lightgray', opacity=0.2).encode(
    x = alt.X('family', type='nominal', title='Family', 
              sort= alt.EncodingSortField('min_depth', op='max', order='ascending')),
    y = alt.Y('min_depth', type='quantitative', title='Depth (in meters)',
              scale = alt.Scale(domain=[max_y, min_y])),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family','genus','species', 'type_status',
                           'qualifier', 'start_year','collector_full_name',
                           'country', 'state', 'locality', 'min_depth'])
)

### front view
temp = alt.Chart(data, title='Depth per Family', width=800, height=400).mark_point(filled=True).encode(
    x = alt.X('family', type='nominal', title='Family', 
              sort= alt.EncodingSortField('min_depth', op='max', order='ascending')),
    y = alt.Y('min_depth', type='quantitative', title='Depth (in meters)',
              scale = alt.Scale(domain=[max_y, min_y])),
    color= alt.Color('family', title='Family', 
                     scale=alt.Scale(domain=list(cores_familia_naive.keys()), 
                                     range=list(cores_familia_naive.values())),
                     legend= alt.Legend(columns=3, symbolLimit=102)),
    shape= alt.Shape('type_status:N', title='Types', 
                     legend= alt.Legend(columns=4),
                     scale= alt.Scale(domain=['Holótipo', 'Lectótipo','Parátipo', 'nan'],
                                     range=['triangle', 'square', 'cross','circle'])),
#     opacity= alt.condition(select_type, alt.value(1), alt.value(0)),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family','genus','species', 'type_status',
                            'qualifier', 'start_year','collector_full_name',
                            'country', 'state', 'locality', 'min_depth'])
).add_selection(select_family, select_type).transform_filter(select_family).transform_filter(select_type)

g = (back + temp)

g = g.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# g.save('./graphs/depth/depth_per_family-shape_types.html')
# g

<br>

### Interactive version including specimens collected above sea level (column `depth`)

In [16]:
# specimens of family Pseudothelphusidae were collected above sea level (negative `depth`)
# teste[teste['depth'] < 0]['family']

In [20]:
# subsetting
teste = decapoda[['depth','family','infraorder', 'start_year', 'qualifier', 'catalog_number', 
                  'genus', 'species', 'collector_full_name', 'country','state','locality', 'type_status']].copy()

# sorting
teste = teste.sort_values(['depth','family'])

# dropping na
teste.dropna(subset=['depth'], inplace=True)

# making sure depth is a floating point number (negative values: above sea level)
teste['depth'] = teste['depth'].astype(float)

# extremes for scale
max_y = teste['depth'].max()
min_y = teste['depth'].min()
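The commented check `teste[teste['depth'] < 0]` suggests that in the `depth` column negative values mark specimens collected above sea level. Under that assumption, a small sketch (hypothetical helper `split_by_sea_level`, made-up rows) separates the two sets. Because `min_y` is then negative, `domain=[max_y, min_y]` places the above-sea-level records at the top of the chart, above the 0 m line.

```python
import pandas as pd


def split_by_sea_level(df, col='depth'):
    """Return (above, below) sea-level subsets, assuming negative `col`
    values mean the specimen was collected above sea level."""
    above = df[df[col] < 0]
    below = df[df[col] >= 0]
    return above, below


# Made-up records for illustration only.
sample_depths = pd.DataFrame({
    'family': ['Family A', 'Family B', 'Family C'],
    'depth': [-850.0, 20.0, 300.0],
})

above, below = split_by_sea_level(sample_depths)
print(above['family'].value_counts())
print(sample_depths['depth'].min(), sample_depths['depth'].max())
```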

In [24]:
data = teste.copy()
data['type_status'] = data['type_status'].astype(str)

select_family = alt.selection_multi(fields=['family'], bind='legend')
select_type = alt.selection_multi(fields=['type_status'], bind='legend')

### background
back = alt.Chart(data, title='Depth per Family', width=800, height=400).mark_point(filled=True,
                                                                  color='lightgray', opacity=0.2).encode(
    x = alt.X('family', type='nominal', title='Family', 
              sort= alt.EncodingSortField('depth', op='max', order='ascending')),
    y = alt.Y('depth', type='quantitative', title='Depth (in meters)',
              scale = alt.Scale(domain=[max_y, min_y])),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family','genus','species', 'type_status',
                           'qualifier', 'start_year','collector_full_name',
                           'country', 'state', 'locality', 'depth'])
)

### front view
temp = alt.Chart(data, title='Depth per Family', width=800, height=400).mark_point(filled=True).encode(
    x = alt.X('family', type='nominal', title='Family', 
              sort= alt.EncodingSortField('depth', op='max', order='ascending')),
    y = alt.Y('depth', type='quantitative', title='Depth (in meters)',
              scale = alt.Scale(domain=[max_y, min_y])),
    color= alt.Color('family', title='Family', 
                     scale=alt.Scale(domain=list(cores_familia_naive.keys()), 
                                     range=list(cores_familia_naive.values())),
                     legend= alt.Legend(columns=3, symbolLimit=102)),
    shape= alt.Shape('type_status:N', title='Types', 
                     legend= alt.Legend(columns=4),
                     scale= alt.Scale(domain=['Holótipo', 'Lectótipo','Parátipo', 'nan'],
                                     range=['triangle', 'square', 'cross','circle'])),
#     opacity= alt.condition(select_type, alt.value(1), alt.value(0)),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family','genus','species', 'type_status',
                            'qualifier', 'start_year','collector_full_name',
                            'country', 'state', 'locality', 'depth'])
).add_selection(select_family, select_type).transform_filter(select_family).transform_filter(select_type)

g = (back + temp)

g = g.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# g.save('./graphs/depth/depth_per_family-shape_types-wAltitude.html')
# g

<br>

### Depth per genus

In [25]:
teste = decapoda[['min_depth','family','infraorder', 'start_year', 'qualifier', 'catalog_number', 
                  'genus', 'species', 'collector_full_name', 'type_status',
                  'country', 'state', 'locality']].copy()

# making sure depth is a floating point number
teste['min_depth'] = teste['min_depth'].astype(float)
teste['genus'] = teste['genus'].str.capitalize()

# sorting
teste = teste.sort_values(['min_depth','genus'])

# dropping na
teste = teste.dropna(subset=['min_depth'])

# extremes for y axis
max_y = teste['min_depth'].max()
min_y = teste['min_depth'].min()

In [28]:
select_type = alt.selection_multi(fields= ['type_status'], bind='legend')
select_family = alt.selection_multi(fields= ['family'], bind='legend')

# all genera, sorted from deepest to shallowest (no depth-group filter here)
db = teste.sort_values('min_depth', ascending=False)
db['type_status'] = db['type_status'].astype(str)  # parsing into string to make selector work for NaN cases

# encoding labels
x_labels = db[['genus', 'min_depth']].groupby('genus').max().reset_index().sort_values('min_depth')['genus'].unique()
y_labels = db['min_depth'].unique()
types = db['type_status'].unique()
families = db['family'].unique()

# background
back = alt.Chart(db, width= 1500, height=500,
                 title='Depth per Genus').mark_point(filled=True, color='lightgray', opacity=0.2).encode(
    x = alt.X('genus', type='nominal', title='Genus',
             sort=alt.EncodingSortField('min_depth', op='max', order="ascending"),
             scale= alt.Scale(domain=x_labels)),
    y = alt.Y('min_depth', type='quantitative', title='Depth (in meters)', 
              scale= alt.Scale(domain=[max_y, 0])),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family', 'genus','species',
                            'qualifier', 'start_year','collector_full_name',
                            'country', 'state', 'locality', 'min_depth'])
)

# front view
front = alt.Chart(db, width= 1500, height=500,
                 title='Depth per Genus').mark_point(filled=True).encode(
    x = alt.X('genus', type='nominal', title='Genus',
             sort=alt.EncodingSortField('min_depth', op='max', order="ascending"),
             scale= alt.Scale(domain=x_labels)),
    y = alt.Y('min_depth', type='quantitative', title='Depth (in meters)', 
              scale= alt.Scale(domain=[max_y, 0])),
    color = alt.Color('family:N', title='Family',
                      scale= alt.Scale(domain=list(cores_familia_naive.keys()), 
                                       range=list(cores_familia_naive.values())),
                      legend= alt.Legend(columns=3, symbolLimit=102)),
    shape= alt.Shape('type_status:N', title='Types', 
                     legend= alt.Legend(columns=4),
                     scale= alt.Scale(domain=['Holótipo', 'Lectótipo','Parátipo', 'nan'],
                                     range=['triangle', 'square', 'cross','circle'])),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family', 'genus','species',
                            'qualifier', 'start_year','collector_full_name',
                            'country', 'state', 'locality', 'min_depth'])
).add_selection(select_family, select_type).transform_filter(select_family).transform_filter(select_type)


g = back + front

g = g.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# g.save('./graphs/depth/genus/depth-per-genus.html')
# g
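`x_labels` orders genera by their deepest `min_depth` record and is passed as the X scale `domain`, so the axis runs from the genus with the shallowest maximum to the one with the deepest. An equivalent sketch with `groupby` (hypothetical helper `genus_order`, made-up rows); genera tied on their maximum depth may come out in a different order than in the notebook, since here a stable sort keeps ties alphabetical.

```python
import pandas as pd


def genus_order(df):
    """Genera sorted by their deepest min_depth record, shallowest first."""
    deepest = df.groupby('genus')['min_depth'].max()  # one value per genus
    return deepest.sort_values(kind='mergesort').index.tolist()  # stable sort


sample_genera = pd.DataFrame({
    'genus': ['GenusA', 'GenusA', 'GenusB', 'GenusC'],
    'min_depth': [30.0, 5.0, 2.0, 400.0],
})
print(genus_order(sample_genera))
```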

<br>

### Separating genera by depth

<font color='red' size=4>Splitting genera into two groups according to whether their deepest record exceeds the 200 m threshold</font>


In [29]:
# deepest min_depth recorded for each genus
d = teste.groupby('genus')['min_depth'].max().reset_index()
d.columns = ['genus', 'max_depth']

In [30]:
# boundary between sea zones (suggested by the Museum staff)
threshold = 200

# deeper group: genera with at least one record deeper than 200 m
grupo1 = d[d['max_depth'] > threshold]['genus']

# shallower group: genera whose records are all at most 200 m deep
grupo2 = d[d['max_depth'] <= threshold]['genus']
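The two groups depend only on each genus's deepest record, so a genus joins the deeper group as soon as one of its specimens is deeper than 200 m, and its shallow records are plotted along with it. A compact sketch of the split (hypothetical helper `split_genera`, made-up rows):

```python
import pandas as pd


def split_genera(df, threshold=200):
    """Return (deeper, shallower) genus lists: a genus is 'deeper' if its
    maximum min_depth exceeds `threshold` metres, otherwise 'shallower'."""
    deepest = df.groupby('genus')['min_depth'].max()
    deeper = deepest[deepest > threshold].index.tolist()
    shallower = deepest[deepest <= threshold].index.tolist()
    return deeper, shallower


sample_split = pd.DataFrame({
    'genus': ['GenusA', 'GenusA', 'GenusB', 'GenusB', 'GenusC'],
    'min_depth': [30.0, 250.0, 2.0, 200.0, 500.0],
})
print(split_genera(sample_split))
```

This is also why the shallower chart can use a fixed Y domain of `[threshold, 0]`: every record of those genera is at most 200 m deep.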

#### Deeper group (deepest record > 200 m)

In [33]:
select_type = alt.selection_multi(fields= ['type_status'], bind='legend')
select_family = alt.selection_multi(fields= ['family'], bind='legend')

# filtering depth group
db = teste[teste['genus'].isin(grupo1)].sort_values('min_depth', ascending=False)
db['type_status'] = db['type_status'].astype(str)  # parsing into string to make selector work for NaN cases

# encoding labels
x_labels = db[['genus', 'min_depth']].groupby('genus').max().reset_index().sort_values('min_depth')['genus'].unique()
y_labels = db['min_depth'].unique()
types = db['type_status'].unique()
families = db['family'].unique()


# background
back = alt.Chart(db, width= 800, height=500,
                 title='Depth per Genus').mark_point(filled=True, color='lightgray', opacity=0.2).encode(
    x = alt.X('genus', type='nominal', title='Genus',
             sort=alt.EncodingSortField('min_depth', op='max', order="ascending"),
             scale= alt.Scale(domain=x_labels)),
    y = alt.Y('min_depth', type='quantitative', title='Depth (in meters)', 
              scale= alt.Scale(domain=[max_y, 0])),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family', 'genus','species',
                            'qualifier', 'start_year','collector_full_name',
                            'country', 'state', 'locality', 'min_depth'])
)

# front view
front = alt.Chart(db, width= 800, height=500,
                 title='Depth per Genus').mark_point(filled=True).encode(
    x = alt.X('genus', type='nominal', title='Genus',
             sort=alt.EncodingSortField('min_depth', op='max', order="ascending"),
             scale= alt.Scale(domain=x_labels)),
    y = alt.Y('min_depth', type='quantitative', title='Depth (in meters)', 
              scale= alt.Scale(domain=[max_y, 0])),
    color = alt.Color('family:N', title='Family',
                      scale= alt.Scale(domain=list(cores_familia_naive.keys()), 
                                       range=list(cores_familia_naive.values())),
                      legend= alt.Legend(columns=3, symbolLimit=102)),
    shape= alt.Shape('type_status:N', title='Types', 
                     legend= alt.Legend(columns=4),
                     scale= alt.Scale(domain=['Holótipo', 'Lectótipo','Parátipo', 'nan'],
                                     range=['triangle', 'square', 'cross','circle'])),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family', 'genus','species',
                            'qualifier', 'start_year','collector_full_name',
                            'country', 'state', 'locality', 'min_depth'])
).add_selection(select_family, select_type).transform_filter(select_family).transform_filter(select_type)


g = back + front

g = g.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# g.save('./graphs/depth/genus/depth_per_genus-higher-depth.html')
# g

#### Shallower group (all records at most 200 m deep)

In [39]:
select_type = alt.selection_multi(fields= ['type_status'], bind='legend')
select_family = alt.selection_multi(fields= ['family'], bind='legend')

# filtering depth group
db = teste[teste['genus'].isin(grupo2)].sort_values('min_depth', ascending=False)
db['type_status'] = db['type_status'].astype(str)  # parsing into string to make selector work for NaN cases

# encoding labels
x_labels = db[['genus', 'min_depth']].groupby('genus').max().reset_index().sort_values('min_depth')['genus'].unique()
y_labels = db['min_depth'].unique()
types = db['type_status'].unique()
families = db['family'].unique()


# background
back = alt.Chart(db, width= 800, height=500,
                 title='Depth per Genus').mark_point(filled=True, color='lightgray', opacity=0.2).encode(
    x = alt.X('genus', type='nominal', title='Genus',
             sort=alt.EncodingSortField('min_depth', op='max', order="ascending"),
             scale= alt.Scale(domain=x_labels)),
    y = alt.Y('min_depth', type='quantitative', title='Depth (in meters)', 
              scale= alt.Scale(domain=[threshold, 0])),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family', 'genus','species',
                            'qualifier', 'start_year','collector_full_name',
                            'country', 'state', 'locality', 'min_depth'])
)

# front view
front = alt.Chart(db, width= 800, height=500,
                 title='Depth per Genus').mark_point(filled=True).encode(
    x = alt.X('genus', type='nominal', title='Genus',
             sort=alt.EncodingSortField('min_depth', op='max', order="ascending"),
             scale= alt.Scale(domain=x_labels)),
    y = alt.Y('min_depth', type='quantitative', title='Depth (in meters)', 
              scale= alt.Scale(domain=[threshold, 0])),
    color = alt.Color('family:N', title='Family',
                      scale= alt.Scale(domain=list(cores_familia_naive.keys()), 
                                       range=list(cores_familia_naive.values())),
                      legend= alt.Legend(columns=3, symbolLimit=102)),
    shape= alt.Shape('type_status:N', title='Types', 
                     legend= alt.Legend(columns=4),
                     scale= alt.Scale(domain=['Holótipo', 'Lectótipo','Parátipo', 'nan'],
                                     range=['triangle', 'square', 'cross','circle'])),
    tooltip = alt.Tooltip(['catalog_number', 'infraorder','family', 'genus','species',
                            'qualifier', 'start_year','collector_full_name',
                            'country', 'state', 'locality', 'min_depth'])
).add_selection(select_family, select_type).transform_filter(select_family).transform_filter(select_type)


g = back + front

g = g.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# g.save('./graphs/depth/genus/depth_per_genus-lower-depth.html')
# g

<br>

**That's it!**

-----