# Type charts

By **Franklin Oliveira**

-----
This notebook contains all the code needed to make the "type" charts from the `carcinos` database, including some basic data treatment and the chart code.

Database: <font color='blue'>'Planilha geral Atualizada FINAL 5_GERAL_sendo trabalhada no Google drive.xlsx'</font>

In [20]:
import datetime
import numpy as np
import pandas as pd

from collections import defaultdict

# packages for quick visualization
import seaborn as sns
import matplotlib.pyplot as plt

# main visualization package
import altair as alt

# enabling the notebook renderer
# alt.renderers.enable('notebook')
alt.renderers.enable('default')


# disabling the row limit
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

## Importing data...

In [21]:
NewTable = pd.read_csv('./data/treated_db.csv', sep=';', encoding='utf-8-sig', low_memory=False)

## Filtering

At least for now, we'll consider only specimens of the order Decapoda, which has been thoroughly revised by the Museum's crew.

In [22]:
decapoda = NewTable[NewTable['order'] == 'Decapoda'].copy()

<font color='red'>**p.s.:** casting the `type_status` column to string so that the selectors work properly </font>

In [23]:
decapoda['type_status'] = decapoda['type_status'].astype(str)
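The `value_counts()` output further down shows that, after this cast, specimens without a type status carry the literal string `'nan'`, which is why later cells filter with `!= 'nan'`. How `astype(str)` treats missing values can differ between pandas versions (an assumption worth checking against the installed release), so the sketch below makes the sentinel explicit before casting. The `demo` Series is made-up toy data, not taken from the database.

```python
import numpy as np
import pandas as pd

# hypothetical toy column standing in for decapoda['type_status']
demo = pd.Series(['Holótipo', np.nan, 'Parátipo', np.nan], dtype=object)

# write the 'nan' sentinel explicitly, then cast to string
as_text = demo.fillna('nan').astype(str)

# no missing values are left, only the string 'nan'
n_missing_after = int(as_text.isna().sum())
n_nan_strings = int((as_text == 'nan').sum())

# non-types are then filtered out by comparing against the string
only_types = as_text[as_text != 'nan']
```

With the sentinel written explicitly, the later `!= 'nan'` filters do not depend on how the cast handles missing values.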

<br>

<font size=5>**Color palette**</font>

Colors (per infraorder): 

- <font color='#e26d67'><b>Ascacidae</b></font>
- <font color='#007961'><b>Anomura</b></font>
- <font color='#7a2c39'><b>Achelata</b></font>
- <font color='#b67262'><b>Axiidea</b></font>
- <font color='#ee4454'><b>Brachyura</b></font>
- <font color='#3330b7'><b>Caridea</b></font>
- <font color='#58b5e1'><b>Gebiidea</b></font>
- <font color='#b8e450'><b>Stenopodidea</b></font>
- <font color='#a0a3fd'><b>Astacidae</b></font>
- <font color='#deae9e'><b>Polychelida</b></font>
- <font color='#d867be'><b>Grapsidae</b></font>
- <font color='#fece5f'><b>Xanthoidea</b></font>

In [24]:
# importing customized color palettes
from src.MNViz_colors import *
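The charts below pass the imported palette to `alt.Scale` as two parallel lists, `domain` and `range`. A minimal sketch of that pattern, assuming the palette is a plain dict mapping names to hex colors (the contents of `src.MNViz_colors` are not shown here, so `palette` is a hypothetical stand-in built from three of the colors listed above):

```python
# hypothetical palette; the real ones live in src.MNViz_colors
palette = {
    'Anomura': '#007961',
    'Brachyura': '#ee4454',
    'Caridea': '#3330b7',
}

# dicts preserve insertion order, so the i-th key pairs with the i-th value
domain = list(palette.keys())
color_range = list(palette.values())

# optionally restrict the palette to the categories present in the data
present = ['Caridea', 'Anomura']
subset = {name: palette[name] for name in domain if name in present}
```

Passing the full palette keeps colors stable across charts. Passing a subset like the one above would shorten the legend, but colors stay consistent only if each name always maps to the same hex value.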

<br>


## Graphs

---

### Types (*per year*) per genus

x: Species1, color: Type Status1, size: counts

In [25]:
# p.s.: the large majority is non-type
decapoda['type_status'].value_counts()

nan              8195
Parátipo           78
Holótipo           33
Paralectótipo       6
Alótipo             3
Lectótipo           2
Síntipo             2
Topótipo            2
Neótipo             1
Material tipo       1
Name: type_status, dtype: int64

In [26]:
# subsetting
teste = decapoda[['min_depth','family','order', 'start_year', 'qualifier', 'catalog_number', 
                  'genus', 'species', 'type_status']].copy()

# grouping by type, year and family
temp = teste.groupby(['type_status','start_year', 'family']).count()['species'].reset_index().rename(columns={
    'species':'counts'
})

# p.s.: Cótipo, Co-tipo, Material tipo and Tipo are not types
temp = temp[~(temp['type_status'].isin(['Cótipo', 'Material tipo', 'Tipo', 'Co-tipo']))]
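One detail of the grouping above: `count()` on a selected column counts only the rows where that column is not missing, so specimens without a `species` value do not add to `counts`. The sketch below contrasts it with `size()`, which counts every row in the group. The `toy` table is made-up data for illustration only.

```python
import pandas as pd

# hypothetical toy table; one Holótipo row has no species recorded
toy = pd.DataFrame({
    'type_status': ['Holótipo', 'Holótipo', 'Parátipo', 'Tipo'],
    'start_year': [1990, 1990, 1990, 2001],
    'family': ['Portunidae', 'Portunidae', 'Portunidae', 'Xanthidae'],
    'species': ['a', None, 'b', 'c'],
})

keys = ['type_status', 'start_year', 'family']

# count() skips rows where 'species' is missing
by_count = (toy.groupby(keys).count()['species']
               .reset_index().rename(columns={'species': 'counts'}))

# size() counts every row in each group
by_size = toy.groupby(keys).size().reset_index(name='counts')

# same exclusion list as in the cell above
not_types = ['Cótipo', 'Material tipo', 'Tipo', 'Co-tipo']
by_count = by_count[~by_count['type_status'].isin(not_types)]
```

Whether skipping specimens with a missing species is intended is a judgment call for the analysis. If every specimen should be counted, `size()` is the safer choice.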

In [27]:
temp['type_status'].unique()

array(['Alótipo', 'Holótipo', 'Lectótipo', 'Neótipo', 'Paralectótipo',
       'Parátipo', 'Síntipo', 'Topótipo', 'nan'], dtype=object)

### Types chart

In [28]:
tipo = alt.Chart(temp, height=150, title='Tipos por ano').mark_circle().encode(
    x = alt.X('start_year:O', title='Ano de Coleta'),
    y = alt.Y('type_status:N', title= 'Type',
              sort=alt.EncodingSortField('counts', op='sum', order='descending')),
    color= alt.Color('family:N', title='Família',
                     scale=alt.Scale(domain=list(cores_familia_naive.keys()), 
                                     range=list(cores_familia_naive.values())),
                     legend= alt.Legend(columns=8, symbolLimit=102,
                                       direction='horizontal', orient='bottom')), 
    size= alt.Size('counts', title='Contagem',
                   legend= alt.Legend(columns=10, orient='bottom')),
    order= alt.Order('counts', sort='descending'),  # smaller points in front
    tooltip= [alt.Tooltip('type_status', title='Tipo'),
              alt.Tooltip('start_year', title='Ano de Coleta'),
              alt.Tooltip('counts', title='Contagem')]
)

tipo = tipo.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# tipo.save('./graphs/tipo/tipos_por_ano-colors_per_family.html')
# tipo

## Types per Genus 

Same chart as above, with genus on the Y axis, colored by family, and with type status encoded as point shape.

In [29]:
# subsetting
teste = decapoda[['min_depth','family','infraorder', 'start_year', 'qualifier', 'catalog_number', 
                  'genus', 'species', 'type_status']].copy()

# grouping by type, year, genus and family
temp = teste.groupby(['type_status','start_year', 'genus', 'family']).count()['infraorder'].reset_index().rename(columns={
    'infraorder':'counts'
})

# p.s.: Cótipo, Co-tipo, Material tipo and Tipo are not types
temp = temp[~(temp['type_status'].isin(['Cótipo', 'Material tipo', 'Tipo', 'Co-tipo']))]

In [30]:
select_type = alt.selection_multi(fields= ['type_status'], bind='legend')
select_family = alt.selection_multi(fields= ['family'], bind='legend')

# filtering out non types
db = temp[temp['type_status'] != 'nan']

# encoding labels
x_labels = db.sort_values('start_year')['start_year'].unique()
y_labels = db['genus'].unique()
types = db['type_status'].unique()
counts = db['counts'].unique()
families = db['family'].unique()

tipo = alt.Chart(db, height=700, width= 400, title='Tipos por Gênero').mark_point(filled=False).encode(
    x = alt.X('start_year:O', title='Ano de Coleta',
              scale= alt.Scale(domain=x_labels)),
    y = alt.Y('genus:N', title= 'Gênero',
              sort=alt.EncodingSortField('counts', op='count', order='descending'),
              scale= alt.Scale(domain=y_labels)),
    color= alt.Color('family:N', title='Família',
                    scale= alt.Scale(domain=list(cores_familia_naive.keys()),
                                     range=list(cores_familia_naive.values())),
                    legend= alt.Legend(columns=3, symbolLimit=102, symbolType= 'circle')), 
    size= alt.Size('counts', title= 'Contagem',scale= alt.Scale(domain= counts, range=[10,100]),
                   legend= alt.Legend(columns=6)),
    order= alt.Order('counts', title='Contagem', sort='descending'),  # smaller points in front
    shape= alt.Shape('type_status:N', title='Tipos', legend=alt.Legend(columns=5),
                     scale= alt.Scale(domain= types)), 
    tooltip= [alt.Tooltip('type_status', title='Tipo'),
              alt.Tooltip('start_year', title='Ano de Coleta'),
              alt.Tooltip('counts', title='Contagem')]
).add_selection(select_type, select_family).transform_filter(select_type).transform_filter(select_family)

tipo = tipo.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# tipo.save('./graphs/tipo/tipos_por_genero.html')
# tipo

In [31]:
genus_order = list(temp.groupby(['genus']).min()['start_year'].reset_index().sort_values('start_year')['genus'])
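The line above orders genera by their earliest collection year. A minimal sketch of the same idea on made-up data (the `toy` table and its values are hypothetical); selecting the column before calling `min()` avoids aggregating columns that are not needed:

```python
import pandas as pd

# hypothetical toy table standing in for temp
toy = pd.DataFrame({
    'genus': ['Callinectes', 'Uca', 'Callinectes', 'Uca', 'Macrobrachium'],
    'start_year': [1985.0, 1960.0, 1972.0, 1999.0, 1980.0],
    'counts': [1, 2, 1, 1, 3],
})

# earliest year per genus, then genera sorted by that year
first_year = toy.groupby('genus')['start_year'].min().sort_values()
genus_order = first_year.index.tolist()
```

Note that in the cell above `temp` still contains the `'nan'` (non-type) rows, so the resulting order reflects the first year of any specimen of the genus, not only of its type specimens.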

In [32]:
select_type = alt.selection_multi(fields= ['type_status'], bind='legend')
select_family = alt.selection_multi(fields= ['family'], bind='legend')

# filtering out non types
db = temp[temp['type_status'] != 'nan'].sort_values('start_year')

# encoding labels
x_labels = db['start_year'].unique()
y_labels = db['genus'].unique()
types = db['type_status'].unique()
counts = db['counts'].unique()
families = db['family'].unique()

tipo = alt.Chart(db, height=700, width= 400, title='Tipos por Gênero').mark_point(filled=False).encode(
    x = alt.X('start_year:O', title='Ano de Coleta',
              scale= alt.Scale(domain=x_labels)),
    y = alt.Y('genus:N', title= 'Gênero',
              sort=genus_order,
              scale= alt.Scale(domain=y_labels)),
    color= alt.Color('family:N', title='Família',
                    scale= alt.Scale(domain=list(cores_familia_naive.keys()),
                                     range=list(cores_familia_naive.values())),
                    legend= alt.Legend(columns=3, symbolLimit=102, symbolType= 'circle')), 
    size= alt.Size('counts', title='Contagem', scale= alt.Scale(domain= counts, range=[10,100]),
                   legend= alt.Legend(columns=6)),
    order= alt.Order('counts', sort='descending'),  # smaller points in front
    shape= alt.Shape('type_status:N', title='Tipos', legend=alt.Legend(columns=5),
                     scale= alt.Scale(domain= types)), 
    tooltip= [alt.Tooltip('type_status', title='Tipo'),
              alt.Tooltip('start_year', title='Ano de Coleta'),
              alt.Tooltip('counts', title='Contagem')]
).add_selection(select_type, select_family).transform_filter(select_type).transform_filter(select_family)

tipo = tipo.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# tipo.save('./graphs/tipo/tipos_por_genero-primeiro_ano.html')
# tipo

## Types per determiner

In [33]:
# subsetting
teste = decapoda[['min_depth','family', 'order','infraorder', 'start_year', 'qualifier', 'catalog_number', 
                  'determiner_full_name', 'species', 'type_status']].copy()

# grouping by type, year, determiner and family
temp = teste.groupby(['type_status','start_year', 'determiner_full_name', 'family']).count()['order'].reset_index().rename(columns={
    'order':'counts'
})

# p.s.: Cótipo, Co-tipo, Material tipo and Tipo are not types
temp = temp[~(temp['type_status'].isin(['Cótipo', 'Material tipo', 'Tipo', 'Co-tipo']))]

In [34]:
determiner_order = list(temp.groupby(['determiner_full_name']).min(
    )['start_year'].reset_index().sort_values('start_year')['determiner_full_name'])

In [35]:
select_type = alt.selection_multi(fields= ['type_status'], bind='legend')
select_family = alt.selection_multi(fields= ['family'], bind='legend')

# filtering out non types
db = temp[temp['type_status'] != 'nan'].sort_values('start_year')

# encoding labels
x_labels = db.sort_values('start_year')['start_year'].unique()
y_labels = db['determiner_full_name'].unique()
types = db['type_status'].unique()
counts = db['counts'].unique()
families = db['family'].unique()

tipo = alt.Chart(db, height=800, width= 500, title='Tipos por Determinador').mark_point(filled=False).encode(
    x = alt.X('start_year:O', title='Ano de Coleta', 
             scale= alt.Scale(domain=x_labels)),
    y = alt.Y('determiner_full_name:N', title= 'Determinador',
              sort=determiner_order, 
              scale= alt.Scale(domain= y_labels)),
    color= alt.Color('family:N', title='Família',
                    scale= alt.Scale(domain=list(cores_familia_naive.keys()), 
                                     range=list(cores_familia_naive.values())),
                    legend= alt.Legend(columns=3, symbolLimit=102)), 
    size= alt.Size('counts:Q', title='Contagem', scale=alt.Scale(domain= counts, range=[10,100]),
                   legend= alt.Legend(columns=10)),
    order= alt.Order('counts', sort='descending'),  # smaller points in front
    shape= alt.Shape('type_status:N', title='Tipos',
                    legend= alt.Legend(columns=5),
                    scale= alt.Scale(domain=types)), 
    tooltip= [alt.Tooltip('determiner_full_name', title='Determiner'),
              alt.Tooltip('type_status', title='Tipo'),
              alt.Tooltip('start_year', title='Ano de Coleta'),
              alt.Tooltip('counts', title='counts')]
).add_selection(select_type, select_family).transform_filter(select_type).transform_filter(select_family)

tipo = tipo.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# tipo.save('./graphs/tipo/tipos_por_determinador-primeiro_ano.html')
# tipo

<br>

## Types per family

In [36]:
# subsetting
teste = decapoda[['min_depth','family','infraorder', 'start_year', 'qualifier', 'catalog_number', 
                  'genus', 'species', 'type_status']].copy()

# grouping by type, year and family
temp = teste.groupby(['type_status','start_year', 'family']).count()['genus'].reset_index().rename(columns={
    'genus':'counts'
})

# p.s.: Cótipo, Co-tipo, Material tipo and Tipo are not types
temp = temp[~(temp['type_status'].isin(['Cótipo', 'Material tipo', 'Tipo', 'Co-tipo']))]

In [37]:
family_order = list(temp.groupby(['family']).min(
    )['start_year'].reset_index().sort_values('start_year')['family'])

In [38]:
select_type = alt.selection_multi(fields= ['type_status'], bind='legend')
select_family = alt.selection_multi(fields= ['family'], bind='legend')

# filtering out non types
db = temp[temp['type_status'] != 'nan'].sort_values('start_year')

# encoding labels
x_labels = db.sort_values('start_year')['start_year'].unique()
y_labels = db['family'].unique()
types = db['type_status'].unique()
counts = db['counts'].unique()
counts = list(range(counts.min(), counts.max() + 1))  # adjusting for missing counts
families = db['family'].unique()

tipo = alt.Chart(db, height=800, width= 500, title='Tipos por Família').mark_point(filled=False).encode(
    x = alt.X('start_year:O', title='Ano de Coleta',
              scale= alt.Scale(domain=x_labels)),
    y = alt.Y('family:N', title= 'Família',
              sort=family_order,
              scale= alt.Scale(domain=y_labels)),
    color= alt.Color('family:N', title='Família',
                    scale= alt.Scale(domain=list(cores_familia_naive.keys()), 
                                     range=list(cores_familia_naive.values())),
                    legend= alt.Legend(columns=3, symbolLimit=102)), 
    size= alt.Size('counts:Q', title='Contagem', scale=alt.Scale(domain= counts, range=[10,140]),
                   legend= alt.Legend(columns=10)),
    order= alt.Order('counts', sort='descending'),  # smaller points in front
    shape= alt.Shape('type_status:N', title='Tipos',
                    legend= alt.Legend(columns=5),
                    scale= alt.Scale(domain= types)), 
    tooltip= [alt.Tooltip('family', title='Família'),
              alt.Tooltip('type_status', title='Tipo'),
              alt.Tooltip('start_year', title='Ano de Coleta'),
              alt.Tooltip('counts', title='Contagem')]
).add_selection(select_type, select_family).transform_filter(select_type).transform_filter(select_family)

tipo = tipo.configure_title(fontSize=16).configure_axis(
    labelFontSize=12,
    titleFontSize=12
).configure_legend(
    labelFontSize=12,
    titleFontSize=12
)

# tipo.save('./graphs/tipo/tipos_por_familia.html')
# tipo
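Unlike the earlier charts, the family chart above builds its size domain from every integer between the smallest and largest count, not only the counts that occur in the data. A minimal sketch of that adjustment, using a hypothetical `observed` array in place of `db['counts'].unique()`:

```python
import numpy as np

# hypothetical observed counts, as db['counts'].unique() might return them
observed = np.array([1, 2, 5])

# fill the gaps so the domain covers every integer from min to max
full_domain = list(range(int(observed.min()), int(observed.max()) + 1))
```

The `int(...)` casts are an addition here so that the resulting list holds plain Python integers and not NumPy scalars.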

<br>

**The end!**

-----