# Family counts per year

By **Franklin Oliveira**

-----
This notebook contains all the code needed to build the family-count charts from the `carcinos` database, including the basic data treatment and the chart definitions.

Database: <font color='blue'>'Planilha geral Atualizada FINAL 5_GERAL_sendo trabalhada no Google drive.xlsx'</font>

In [1]:
import datetime
import numpy as np
import pandas as pd

from collections import defaultdict

# quick visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Waffle Charts
# from pywaffle import Waffle 
# docs: https://pywaffle.readthedocs.io/en/latest/examples/block_shape_distance_location_and_direction.html

# visualization
import altair as alt

# enabling notebook renderer
# alt.renderers.enable('notebook')
alt.renderers.enable('default')

# disabling rows limit
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

## Importing data...

In [2]:
NewTable = pd.read_csv('./data/treated_db.csv', sep=';', encoding='utf-8-sig', low_memory=False)

## Filtering

For now, we consider only specimens of the order Decapoda, which has been thoroughly revised by the Museum's crew.

In [3]:
decapoda = NewTable[NewTable['order'] == 'Decapoda'].copy()
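As a minimal sketch of the filter above, here is the same pattern on a small made-up table (the rows are illustrative and not taken from the database). The boolean mask keeps only rows whose `order` equals `'Decapoda'`, and `.copy()` returns an independent frame, so later column assignments do not touch the original table.

```python
import pandas as pd

# toy stand-in for NewTable (hypothetical rows)
toy = pd.DataFrame({
    'order': ['Decapoda', 'Isopoda', 'Decapoda', None],
    'family': ['Portunidae', 'Cirolanidae', 'Palaemonidae', 'Portunidae'],
})

# boolean mask: rows with a missing order compare as False and are dropped
toy_decapoda = toy[toy['order'] == 'Decapoda'].copy()

# safe to modify: toy_decapoda is an independent copy of the selection
toy_decapoda['checked'] = True

print(toy_decapoda)
```

The original `toy` frame is left unchanged by the assignment on the copy.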

<br>

<font size=5>**Color palette**</font>

Colors (per infraorder): 

- <font color='#e26d67'><b>Ascacidae</b></font>
- <font color='#007961'><b>Anomura</b></font>
- <font color='#7a2c39'><b>Achelata</b></font>
- <font color='#b67262'><b>Axiidea</b></font>
- <font color='#ee4454'><b>Brachyura</b></font>
- <font color='#3330b7'><b>Caridea</b></font>
- <font color='#58b5e1'><b>Gebiidea</b></font>
- <font color='#b8e450'><b>Stenopodidea</b></font>
- <font color='#a0a3fd'><b>Astacidae</b></font>
- <font color='#deae9e'><b>Polychelida</b></font>
- <font color='#d867be'><b>Grapsidae</b></font>
- <font color='#fece5f'><b>Xanthoidea</b></font>

In [4]:
# importing customized color palettes
from src.MNViz_colors import *
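The charts below read a dictionary named `cores_familia_naive` from this module and pass its keys and values to `alt.Scale` as `domain` and `range`. The sketch below shows that pattern with a hypothetical stand-in dictionary; the name `cores_familia_exemplo`, the families, and the hex codes are assumptions for illustration, and the real mapping lives in `src/MNViz_colors.py`.

```python
# hypothetical stand-in for the palette dictionary exported by src.MNViz_colors
cores_familia_exemplo = {
    'Portunidae': '#ee4454',
    'Palaemonidae': '#3330b7',
    'Diogenidae': '#007961',
}

# dicts preserve insertion order, so keys and values stay aligned pairwise
domain = list(cores_familia_exemplo.keys())
range_ = list(cores_familia_exemplo.values())

for family, color in zip(domain, range_):
    print(family, color)
```

Because the two lists are derived from the same dictionary, the i-th family in `domain` always maps to the i-th color in `range_`.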

<br>


## Graphs

---
### Creating chart: counts per infraorder per year

In [5]:
infraorders = decapoda.groupby(['start_year','infraorder', 'family']).count()['class'].reset_index().rename(columns={'class':'counts'})

infraorders.sort_values(['start_year','infraorder'], inplace=True)  # ordering
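A sketch of what the aggregation above computes, on made-up rows (all values are hypothetical). `count()['class']` counts the non-null `class` entries in each group, so a record with a missing `class` is not counted, and rows with a missing grouping key are dropped by `groupby` by default. `size()` is shown for comparison because it counts rows regardless of nulls.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'start_year': [1990, 1990, 1990, 1991, 1991],
    'infraorder': ['Brachyura', 'Brachyura', 'Caridea', 'Brachyura', np.nan],
    'family': ['Portunidae', 'Portunidae', 'Palaemonidae', 'Portunidae', 'Alpheidae'],
    'class': ['Malacostraca', None, 'Malacostraca', 'Malacostraca', 'Malacostraca'],
})

keys = ['start_year', 'infraorder', 'family']

# same pattern as the notebook: non-null 'class' values per group
counts = (toy.groupby(keys).count()['class']
             .reset_index()
             .rename(columns={'class': 'counts'}))

# alternative: number of rows per group, nulls in 'class' included
sizes = toy.groupby(keys).size().reset_index(name='counts')

print(counts)
print(sizes)
```

In this toy table the two results differ only for the group that contains a record with a missing `class`; the row with a missing `infraorder` appears in neither.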

In [6]:
# dropping remaining NaN's
# infraorders = infraorders.dropna(subset=['infraorder'])

In [9]:
g1 = alt.Chart(infraorders, width=800, height=500, 
               title='Número de decapodas coletados por infraordem a cada ano').mark_circle().encode(
    x= alt.X('start_year', type='ordinal', title='Ano de Coleta'),
    y= alt.Y('infraorder', type='nominal', title='Infraordem',
            sort= alt.EncodingSortField(field='counts', op='sum', order='descending')),
    size = alt.Size('counts', scale=alt.Scale(range=[10,600]),
                    legend= alt.Legend(columns=6)),
    order= alt.Order('counts', sort='descending'),  # smaller points in front
    color = alt.Color('family:N', title='Family', 
                      scale= alt.Scale(domain=list(cores_familia_naive.keys()), 
                                       range=list(cores_familia_naive.values())),
                      legend= alt.Legend(columns=3, symbolLimit=102)),
    tooltip= alt.Tooltip(['start_year', 'counts'])
)

g1 = g1.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# saving graph
# g1.save('./graphs/infraorders_per_year.html')
# g1

### Number of decapods per family per year

In [10]:
teste = decapoda.groupby(['family','start_year']).count()['class'].reset_index().rename(
                                                                                    columns={'class':'counts'})

# teste['start_year'] = teste['start_year'].astype(int)  # there are non-integers remaining
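The integer cast is commented out because non-integer values remain in `start_year`. One possible way to locate such entries, sketched on made-up values (the sample strings are assumptions about what those entries might look like), is `pd.to_numeric` with `errors='coerce'`, which turns anything unparseable into `NaN`.

```python
import pandas as pd

# hypothetical year values, including a malformed one and a missing one
toy = pd.DataFrame({'start_year': ['1987', '1990.0', '199?', None, '2001']})

as_number = pd.to_numeric(toy['start_year'], errors='coerce')

# a value was present but could not be parsed as a number
unparseable = toy['start_year'].notna() & as_number.isna()
# a value parsed, but is not a whole number
not_whole = as_number.notna() & (as_number % 1 != 0)

# rows to inspect by hand
print(toy[unparseable | not_whole])

# remaining values can be cast safely once missing entries are dropped
clean = as_number[~(unparseable | not_whole)].dropna().astype(int)
print(clean.tolist())
```

Inspecting the flagged rows first keeps the decision about how to repair them (or drop them) explicit.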

<br>

**graph:** family per year

In [14]:
g1 = alt.Chart(teste,
               width=500, height=1000, title='Número de decapodas coletados a cada ano').mark_circle(
                                                                                size=60).encode(
    x= alt.X('start_year', type='ordinal', title='Sampling Year'),
    y= alt.Y('family', type='nominal', title='Family',
            sort= alt.EncodingSortField(field='counts', op='sum', order='descending')),  # rank families by total specimens
    size= alt.Size('counts', title='Count', scale=alt.Scale(range=[6,400])),
    tooltip = alt.Tooltip(['family', 'start_year', 'counts'])
)

g1 = g1.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# g1.save('./graphs/families_per_year.html')
# g1
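The `op` passed to `alt.EncodingSortField` decides how families are ranked on the y-axis: `'sum'` ranks by the total of `counts`, while `'count'` ranks by how many (family, year) rows exist. The two rankings can be reproduced in pandas to see how they differ; the rows below are made up for illustration.

```python
import pandas as pd

# hypothetical aggregated table: one row per (family, year)
toy = pd.DataFrame({
    'family': ['Portunidae', 'Palaemonidae', 'Palaemonidae', 'Palaemonidae'],
    'start_year': [1990, 1990, 1991, 1992],
    'counts': [50, 2, 3, 1],
})

# ranking equivalent to op='sum': total specimens per family
by_sum = toy.groupby('family')['counts'].sum().sort_values(ascending=False)

# ranking equivalent to op='count': number of year rows per family
by_count = toy.groupby('family')['counts'].count().sort_values(ascending=False)

print(by_sum)
print(by_count)
```

Here a family collected in many years but in small numbers tops the `count` ranking, while a family collected once in large numbers tops the `sum` ranking.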

In [15]:
# note: this aggregation uses NewTable (all orders), not the `decapoda` subset filtered above
teste = NewTable.groupby(['family','infraorder','start_year']).count()['class'].reset_index().rename(
                                                                                    columns={'class':'counts'})

# teste['start_year'] = teste['start_year'].astype(int) # there are non-integers remaining

In [20]:
g1 = alt.Chart(teste,
               width=500, height=1200, title='Número de decapoda coletados a cada ano').mark_circle(
                                                                                size=60).encode(
    x= alt.X('start_year', type='ordinal', title='Sampling Year'),
    y= alt.Y('family', type='nominal', title='Family',
            sort= alt.EncodingSortField(field='counts', op='sum', order='descending')),
    size= alt.Size('counts', title='Count', scale=alt.Scale(range=[6,400]),
                   legend= alt.Legend(columns=5)),
    order= alt.Order('counts', sort='descending'),  # smaller points in front
    color = alt.Color('family:N', title='Family', 
                      scale= alt.Scale(domain=list(cores_familia_naive.keys()), 
                                       range=list(cores_familia_naive.values())),
                      legend= alt.Legend(columns=2, symbolLimit=102)),
    tooltip = alt.Tooltip(['family', 'start_year', 'counts'])
)

g1 = g1.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# g1.save('./graphs/families_per_year-colorful-FULL.html')
# g1

In [21]:
fam_legend = teste['family'].unique()
fam_legend = [f for f in fam_legend if f in cores_familia_naive.keys()]
g1 = alt.Chart(teste[teste['family'].isin(fam_legend)],
               width=500, height=1200, title='Número de decapoda coletados a cada ano').mark_circle(
                                                                                size=60).encode(
    x= alt.X('start_year', type='ordinal', title='Sampling Year'),
    y= alt.Y('family', type='nominal', title='Family',
            sort= alt.EncodingSortField(field='counts', op='sum', order='descending')),
    size= alt.Size('counts', title='Count', scale=alt.Scale(range=[6,400]),
                   legend= alt.Legend(columns=5)),
    order= alt.Order('counts', sort='descending'),  # smaller points in front
    color = alt.Color('family:N', title='Family', 
                      scale= alt.Scale(domain=list(cores_familia_naive.keys()), 
                                       range=list(cores_familia_naive.values())),
                      legend= alt.Legend(columns=2, symbolLimit=102)),
    tooltip = alt.Tooltip(['family', 'start_year', 'counts'])
)

g1 = g1.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

g1.save('./graphs/families_per_year-colorful.html')
g1
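The last cell keeps only families that have an entry in the palette before plotting. A small sketch of that filter, which also lists the families that would be left out; the palette and rows here are hypothetical stand-ins for `cores_familia_naive` and `teste`.

```python
import pandas as pd

# hypothetical palette and aggregated table
palette = {'Portunidae': '#ee4454', 'Palaemonidae': '#3330b7'}
toy = pd.DataFrame({
    'family': ['Portunidae', 'Alpheidae', 'Palaemonidae', None],
    'counts': [5, 2, 7, 1],
})

present = toy['family'].dropna().unique()

# families that will be drawn, and those silently excluded from the chart
with_color = [f for f in present if f in palette]
without_color = sorted(set(present) - set(palette))

plotted = toy[toy['family'].isin(with_color)]

print('excluded families:', without_color)
print(plotted)
```

Printing the excluded families makes it easy to check whether a missing color is intentional or a gap in the palette.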

<br>

**The end!**

-----