# Database Exploration

By **Franklin Oliveira**

-----

This notebook contains code I (Franklin) wrote to get acquainted with the `repteis` database. Here you'll find some basic data treatment and the adjustments that proved necessary as I started to understand the nature of the information in the file <font color='blue'>'Compilacao Livros Repteis - 2 a 10 - 2020_04_28.xls'</font>.

In [1]:
import datetime
import numpy as np
import pandas as pd

from collections import defaultdict

# packages for quick visualization
import seaborn as sns
import matplotlib.pyplot as plt

# main visualization package
import altair as alt

# enabling the notebook renderer
# alt.renderers.enable('notebook')
alt.renderers.enable('default')


# disabling the row limit
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

## Importing data...

In [2]:
excel = pd.ExcelFile('Compilacao Livros Repteis - 2 a 10 - 2020_04_28.xls')
sheet_name = excel.sheet_names

print('The excel file contains the following sheets:', sheet_name)
print('\nDatabase is in sheet:', sheet_name[0])

The excel file contains the following sheets: ['Repteis-2020-02-11-csv']

Database is in sheet: Repteis-2020-02-11-csv


In [3]:
db = excel.parse(sheet_name[0])  # sep/encoding only apply to text files, so they are not passed here

# p.s.: I'm parsing a pre-treated file provided by Asla
#db = pd.read_csv('db.csv', sep=',', encoding='utf-8-sig', low_memory=False)
print(f'The database has {db.shape[0]} rows and {db.shape[1]} columns.')

The database has 23119 rows and 114 columns.


In [4]:
# copying database to another variable to make a few changes keeping the original intact
repteis = db.copy()

<br>

<font color='red' size='5'>**p.s.:** We were told by the Curator that some names in this database may need to remain confidential, so I'm skipping this step for now, until we know how to handle this data appropriately.</font>

### Name columns

`DeterminatorFirstName1` and `DeterminatorLastName1`

#### treating the determiner's name

In this step, we combine the determiner's first and last names.

`DeterminatorFirstName1` + `DeterminatorLastName1`

Simply concatenating those two columns is not enough because they contain some dirty data. So, let's begin by cleaning them...

In [5]:
# repteis['DeterminatorFirstName1'].value_counts()
# repteis['DeterminatorLastName1'].value_counts()

In [6]:
def treat_names(name, pos='first'):
    '''
    Treat names, keeping NaN (and any other non-string value) as is.
    
    Arguments: 
        - name: name to be treated. 
        - pos (str): name position. One of ['first', 'last']
    '''
    if isinstance(name, str) and pos == 'first':    # first name
        # composite names (more than one name): keep only the first one
        return name.strip().split(' ')[0].capitalize()
    
    elif isinstance(name, str) and pos == 'last':   # last name
        # composite surnames (more than one surname): keep only the last one
        return name.strip().split(' ')[-1].capitalize()
    
    else:
        return name

Applying the function to the two name columns:

In [7]:
#repteis['DeterminatorFirstName1'] = repteis['DeterminatorFirstName1'].apply(treat_names)
# repteis['DeterminatorFirstName1'].value_counts()

In [8]:
#repteis['DeterminatorLastName1'] = repteis['DeterminatorLastName1'].apply(lambda x:
#                                                             treat_names(x, pos='last'))
# repteis['DeterminatorLastName1'].value_counts()

In [9]:
# creating column with First and Last name for identification
#repteis['Determiner First_and_Last Name'] = repteis['DeterminatorFirstName1'] + ' ' + repteis['DeterminatorLastName1']
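A vectorized sketch of the same cleanup with the pandas `.str` accessor, which passes NaN through untouched. The sample names below are made up (no real determiner names are used, given the confidentiality note above); in this notebook the inputs would be `DeterminatorFirstName1` and `DeterminatorLastName1`.

```python
import numpy as np
import pandas as pd

# hypothetical sample values (not taken from the database)
first = pd.Series([' maria clara', 'JOAO', np.nan])
last = pd.Series(['da silva ', 'souza', 'Lima'])

# first word of the first name, last word of the surname, trimmed and capitalized;
# the .str methods return NaN for missing entries instead of raising
first_clean = first.str.strip().str.split().str[0].str.capitalize()
last_clean = last.str.strip().str.split().str[-1].str.capitalize()

# '+' propagates NaN, so a missing first or last name gives a missing full name
full_name = first_clean + ' ' + last_clean
print(full_name.tolist())
```

Keeping the full name missing when either part is missing avoids strings such as `'Maria nan'`.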

<br>

## Adjusting column names

### removing '\n'

In [10]:
repteis.columns = [str(col).replace('\n', '') for col in repteis.columns]  # removes line breaks from the headers
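In Python, `r'\n'` is a raw string (a backslash followed by `n`), not a newline, so it only matches headers that literally contain those two characters. A small check with a made-up header:

```python
# r'\n' is a raw string: a backslash followed by 'n' (two characters)
# '\n' is a single newline character
assert len(r'\n') == 2 and len('\n') == 1

cols = ['Type\nStatus 1', 'Especie_atual']  # hypothetical header with a real line break

kept = [c.replace(r'\n', '') for c in cols]    # nothing matches: the newline stays
cleaned = [c.replace('\n', '') for c in cols]  # the newline is removed
print(kept)
print(cleaned)
```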

#### closer look at some columns...

In [11]:
for col in repteis.columns:
    print('-', col)

- NumeroDeCatalogo
- NumeroDeCampo
- DataDeEntrada
- DataDaDeterminacao
- Kingdom
- Phylum
- Class
- Ordem
- Familia
- Genero_ent
- Qualificador_ent
- Especie_ent
- Subespecie_ent
- Type Status 1
- Current 1
- Determined Date 2
- Class2
- Order2
- Family2
- Genero_atual
- Qualificador_atual
- Especie_atual
- Subespecie_atual
- NotasTaxonomicas
- Current 2
- DeterminatorLastName1
- DeterminatorFirstName1
- DeterminatorMiddleInitial1
- DeterminatorLastName2
- DeterminatorFirstName2
- DeterminatorMiddleInitial2
- DeterminationRemark
- AssociatedTaxa
- TypeOf
- ColecaoEspecial
- DataColetaInicial
- DataColetaFinal
- Complemento
- NomeDaLocalidade
- Municipio
- EstadoOuProvincia
- Pais
- Continente
- LocalityRemark
- UtmDatum
- UtmZone
- UtmEasting
- UtmNorthing
- VerbatimLatitude
- VerbatimLongitude
- Lat
- Long
- PrecisaoDaCoordenada
- GrupoDeColeta
- MinAltitude
- MaxAltitude
- CollectingInformationRemark
- CollectorLastName1
- CollectorFirstName1
- CollectorMiddleName1
- CollectorLastNa

In [13]:
repteis['Especie_atual'].value_counts()

torquatus       1403
ocellifera       779
jararaca         635
hispidus         607
ameiva           558
                ... 
brogermianus       1
leeseri            1
annulifera         1
planiceps          1
glabelum           1
Name: Especie_atual, Length: 803, dtype: int64

<font size='5'>**Column equivalence:** </font>

**Different name:** <br>
- Species1: Especie_ent or Especie_atual
- Species Author1: ?
- Type Status1: Type Status 1
- Qualifier1: Qualificador_ent or Qualificador_atual
- Determiner First Name1: DeterminatorFirstName1
- Determiner Middle1: DeterminatorMiddleInitial1
- Determiner Last Name1: DeterminatorLastName1
- Determined Date1: DataDaDeterminacao
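If the reference names are ever needed, the equivalences above can be applied with `DataFrame.rename`. A sketch on a tiny stand-in frame (one row copied from the output below); choosing the `*_atual` columns over the `*_ent` ones is an assumption, and `Species Author1` is left out because it has no match yet.

```python
import pandas as pd

# name in this database -> reference name (picking *_atual over *_ent is an assumption)
to_reference = {
    'Especie_atual': 'Species1',
    'Type Status 1': 'Type Status1',
    'Qualificador_atual': 'Qualifier1',
    'DeterminatorFirstName1': 'Determiner First Name1',
    'DeterminatorMiddleInitial1': 'Determiner Middle1',
    'DeterminatorLastName1': 'Determiner Last Name1',
    'DataDaDeterminacao': 'Determined Date1',
}

# tiny stand-in for `repteis`
sample = pd.DataFrame({'Especie_atual': ['geoffroanus'],
                       'DataDaDeterminacao': ['00/11/2018']})

# keys that are not columns of the frame are simply ignored by rename()
renamed = sample.rename(columns=to_reference)
print(renamed.columns.tolist())
```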

In [14]:
repteis[['Especie_atual', 'Type Status 1', 'Qualificador_atual', 'DeterminatorFirstName1',
       'DeterminatorMiddleInitial1', 'DeterminatorLastName1', 'DataDaDeterminacao']]

|   | Especie_atual | Type Status 1 | Qualificador_atual | DeterminatorFirstName1 | DeterminatorMiddleInitial1 | DeterminatorLastName1 | DataDaDeterminacao |
|---|---|---|---|---|---|---|---|
| 0 | geoffroanus |  | - | L. | M.F. | Cunha | 00/11/2018 |
| 1 | suspectum |  |  |  |  |  |  |
| 2 | alba |  |  |  |  |  |  |
| 3 | corais |  |  |  |  |  |  |
| 4 | bottae |  |  |  |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 23114 | ibiboboca |  |  | P. |  | Passos | 00/10/2016 |
| 23115 | ibiboboca |  |  | P. |  | Passos | 00/10/2016 |
| 23116 | quadricarinatus |  |  | P. |  | Passos | 00/10/2016 |
| 23117 | bicarinatus |  |  | P. |  | Pinna | 00/10/2016 |


<br>

## preparing data for charts...

### Column: `Type Status 1`

Contains information on the specimen's type status.

In [15]:
repteis['Type Status 1'].value_counts().head()  # this column is empty!?

Series([], Name: Type Status 1, dtype: int64)

#### Let's begin by cleaning this data and normalizing its capitalization.

In [16]:
#repteis['Type Status 1'] = repteis['Type Status 1'].str.strip().str.lower().str.capitalize()

In [17]:
repteis['Type Status 1'].value_counts().head()

Series([], Name: Type Status 1, dtype: int64)

### preparing taxonomy columns

`Kingdom` - `Phylum` - `Class` - `Ordem` - `Familia` - `Genero_ent`- `Genero_atual` - `Especie_ent` - `Especie_atual`

**Missing columns:**
- `Subphylum1`
- `Subclass1`
- `Infraclass1`
- `Superorder1`
- `Suborder1` 
- `Infraorder1` 
- `Superfamily1`
- `Subfamily1` 
- `Tribe1`

In [18]:
taxon_columns = ['Kingdom', 'Phylum', 'Class', 'Ordem', 'Familia', 'Genero_ent',
                 'Genero_atual', 'Especie_ent', 'Especie_atual', 'Subespecie_ent',
                 'Subespecie_atual']  # selecting taxonomy columns

# defining function
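def treat_str(x):
    # strip first, so a leading space cannot keep the first letter lowercase
    # (p.s.: str() turns NaN into 'nan', which ends up as 'Nan' here)
    return str(x).strip().capitalize()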

# applying treatment
for col in taxon_columns:
    print(f'Adjusting column {col}')
    repteis[col] = repteis[col].apply(treat_str)

Adjusting column Kingdom
Adjusting column Phylum
Adjusting column Class
Adjusting column Ordem
Adjusting column Familia
Adjusting column Genero_ent
Adjusting column Genero_atual
Adjusting column Especie_ent
Adjusting column Especie_atual
Adjusting column Subespecie_ent
Adjusting column Subespecie_atual
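Because `treat_str` starts with `str(x)`, a missing value becomes the text `'nan'` and then `'Nan'`, which is why `Nan` and `Nan nan` show up in `NewTable` further down. A sketch on made-up values comparing this with the `.str` accessor, which leaves NaN untouched:

```python
import numpy as np
import pandas as pd

col = pd.Series(['SQUAMATA ', ' testudines', np.nan])  # hypothetical values

# str() turns NaN into the text 'nan', which capitalize() turns into 'Nan'
via_str = col.apply(lambda x: str(x).strip().capitalize())

# the .str accessor skips missing values, so NaN survives the cleanup
via_accessor = col.str.strip().str.capitalize()

# order matters: capitalizing before stripping capitalizes the leading space, not the letter
assert ' testudines'.capitalize().strip() == 'testudines'

print(via_str.tolist())
print(via_accessor.tolist())
```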


### combining `Genero` and `Especie` (together they identify each animal's species)

In [19]:
repteis['genero_e_especie_ent'] = repteis['Genero_ent'] + ' ' + repteis['Especie_ent']
repteis['genero_e_especie_atual'] = repteis['Genero_atual'] + ' ' + repteis['Especie_atual']

repteis['genero_e_especie_ent'] = repteis['genero_e_especie_ent'].str.lower().str.capitalize()
repteis['genero_e_especie_atual'] = repteis['genero_e_especie_atual'].str.lower().str.capitalize()

<br>

### Collecting dates (year) and slicing the main DB into a smaller dataset

Columns: `DataDaDeterminacao` - `Class` - `Kingdom` and more...

In [20]:
# slicing main database (repteis)
Table = repteis[['DataDeEntrada','DataDaDeterminacao','DataColetaInicial','Class','Kingdom', 
                    'Genero_ent', 'Genero_atual', 'Especie_ent', 'Especie_atual', 'Type Status 1',
                    'DeterminatorFirstName1', 'DeterminatorLastName1', 'genero_e_especie_ent',
                    'genero_e_especie_atual','MinAltitude',
                    'Ordem', 'Familia', 'Phylum']].copy()

# OBS: DataDaDeterminacao has many missing values... CHECK THAT
d = []
for counter, row in enumerate(Table['DataColetaInicial']):
    if '/' in str(row):
        # 'dd/mm/yyyy' -> keep only the year
        # (the month could also be parsed here to build a datetime for later sorting)
        year = int(str(row).split('/')[-1])
    else:
        # no '/' (e.g., NaN or a bare year): keep the raw value
        year = row
    
    # one more condition to handle blank years ' '
    if year == ' ':
        year = np.nan
        
    d.append({'ano_coleta':year,
              'class':Table.loc[counter,'Class'],
              'kingdom':Table.loc[counter,'Kingdom'], 'genero_ent':Table.loc[counter,'Genero_ent'],
              'genero_atual':Table.loc[counter,'Genero_atual'],
              'especie_ent':Table.loc[counter,'Especie_ent'],
              'especie_atual':Table.loc[counter,'Especie_atual'],
              'genero_e_especie_ent': Table.loc[counter,'genero_e_especie_ent'],
              'genero_e_especie_atual': Table.loc[counter,'genero_e_especie_atual'],
              'type_status':Table.loc[counter,'Type Status 1'], 
              'determinator_first_name':Table.loc[counter,'DeterminatorFirstName1'],
              'determinator_last_name':Table.loc[counter,'DeterminatorLastName1'],
              'altitude':Table.loc[counter,'MinAltitude'],
              'ordem':Table.loc[counter,'Ordem'],
              'familia':Table.loc[counter,'Familia'],
              'phylum': Table.loc[counter,'Phylum']
              })

NewTable = pd.DataFrame(d)


def year_from_date(value):
    '''Returns the year of a 'dd/mm/yyyy' value; anything without '/' (NaN included) becomes NaN.'''
    if '/' in str(value):
        return int(str(value).split('/')[-1])
    return np.nan

### collecting determination and entry years (p.s.: NaNs are kept as they show up, and
### every value without '/' gets a fresh NaN, so no year leaks from one row to the next)
NewTable['ano_determinacao'] = Table['DataDaDeterminacao'].apply(year_from_date).to_numpy()
NewTable['ano_entrada'] = Table['DataDeEntrada'].apply(year_from_date).to_numpy()

# NewTable['determined_year'] = pd.Series(year, index=NewTable.index)
NewTable.head(2)

|   | ano_coleta | class | kingdom | genero_ent | genero_atual | especie_ent | especie_atual | genero_e_especie_ent | genero_e_especie_atual | type_status | determinator_first_name | determinator_last_name | altitude | ordem | familia | phylum | ano_determinacao | ano_entrada |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Reptilia | Animalia | Nan | Phrynops | Nan | Geoffroanus | Nan nan | Phrynops geoffroanus |  | L. | Cunha |  | Testudines | Chelidae | Chordata | 2018.0 |  |
| 1 |  | Reptilia | Animalia | Heloderma | Heloderma | Suspectum | Suspectum | Heloderma suspectum | Heloderma suspectum |  |  |  |  | Squamata | Helodermatidae | Chordata |  |  |


In [21]:
# checks if NaNs are in the same position 
result = (NewTable['ano_entrada'].isna() == repteis['DataDeEntrada'].isna()).sum() == NewTable.shape[0]

if result:
    print('ano_entrada info is valid.')
else:
    print("There's something wrong with NewTable. Check how you're collecting Start Year info.")

ano_entrada info is valid.
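The same NaN-position check can be reused for the other year columns. A sketch with a hypothetical helper, `same_missing_positions`, that compares plain NumPy arrays so the two frames' indexes never need to align:

```python
import numpy as np
import pandas as pd

def same_missing_positions(a, b):
    '''True when two Series of the same length are missing in exactly the same rows.'''
    # .to_numpy() drops the indexes, so only row positions are compared
    return bool((a.isna().to_numpy() == b.isna().to_numpy()).all())

raw = pd.Series(['10/03/1998', np.nan, '00/11/2018'])  # hypothetical raw dates
years = pd.Series([1998, np.nan, 2018])                # years extracted from them
print(same_missing_positions(raw, years))
```

For example, `same_missing_positions(NewTable['ano_determinacao'], repteis['DataDaDeterminacao'])` would flag any determination date that is present but has no `/`.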


In [22]:
# year in which the holotype was first described (p.s.: NewTable has no 'species_author' column yet)
#NewTable['holotipo_year'] = NewTable['species_author'].str.extract(r'(\d+)')

<br>

### creating `years` columns in repteis

In [23]:
def catch_year(row):
    '''Returns the year of a 'dd/mm/yyyy' value, or NaN when there is no '/'.'''
    if '/' in str(row):
        return int(str(row).split('/')[-1])
    else:
        return np.nan

In [24]:
repteis['ano_determinacao'] = repteis['DataDaDeterminacao'].apply(catch_year)
repteis['ano_coleta'] = repteis['DataColetaInicial'].apply(catch_year)
repteis['ano_entrada'] = repteis['DataDeEntrada'].apply(catch_year)
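A vectorized alternative to `catch_year`, sketched under the assumption that the date columns hold strings (or NaN) in `dd/mm/yyyy` form. `pd.to_datetime` with `format='%d/%m/%Y'` would reject values such as `00/11/2018` (day `00`), which is one reason to work on the text directly. The sample values are made up:

```python
import numpy as np
import pandas as pd

# hypothetical values: 'dd/mm/yyyy' (including day '00'), a missing value and a blank cell
dates = pd.Series(['10/03/1998', '00/11/2018', np.nan, ' '])

# digits after the last '/'; values without a '/' become NaN, as in catch_year
years = pd.to_numeric(dates.str.extract(r'/(\d+)\s*$', expand=False))
print(years.tolist())
```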

<br>

<font size=5>**Color palette per order**</font>

Below is the image used as inspiration (https://color.adobe.com/create/image)

<img src="./paleta_cores.jpeg" width='500px'>

Colors: 

- dark green: #284021
- light green: #88BF11
- yellow: #D9CB0B
- orange: #D99311
- dark orange: #BF4417
- light brown: #BF8D7A

In [151]:
cores_ordem = {
    'Squamata': '#BF4417',
    'Testudines': '#D9CB0B', 
    'Crocodylia': '#284021',
    'Caudata': '#BF8D7A'
}

In [152]:
ordens = list(cores_ordem.keys())
cores = list(cores_ordem.values())
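`alt.Scale(domain=ordens, range=cores)`, used in the charts below, pairs the two lists position by position. Since Python 3.7, dicts preserve insertion order, so `keys()` and `values()` of `cores_ordem` stay aligned; a quick check:

```python
cores_ordem = {
    'Squamata': '#BF4417',
    'Testudines': '#D9CB0B',
    'Crocodylia': '#284021',
    'Caudata': '#BF8D7A',
}

# the i-th order in `ordens` is drawn with the i-th color in `cores`
ordens = list(cores_ordem.keys())
cores = list(cores_ordem.values())

# zipping them back reproduces the original mapping
pairs = dict(zip(ordens, cores))
print(pairs)
```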

<br>

---

## Graphs

<font color='red'>**p.s.:** All the main charts were moved to individual notebooks. I've kept these two below only in case we need them.</font>

### Total number of collected animals per year

- x: collection year (`ano_coleta`, extracted from `DataColetaInicial`)
- y: number of catalog records per year

In [153]:
# counting catalog. per year
teste = repteis['ano_coleta'].value_counts()
teste = teste.reset_index().rename(columns={'index':'year', 'ano_coleta':'counts'})

In [154]:
# adjusting columns for graphs
teste['year'] = teste['year'].apply(lambda x:str(x).split('/')[0].split('.')[0]).astype(int)
teste = teste.groupby('year').sum().reset_index() # sum of all animals collected per year
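The counting step on a few made-up years. pandas 1.x and 2.x name the columns of `value_counts().reset_index()` differently, so this sketch sets the names explicitly with `rename_axis` and `reset_index(name=...)`:

```python
import numpy as np
import pandas as pd

ano_coleta = pd.Series([1998.0, 2018.0, np.nan, 1998.0], name='ano_coleta')  # hypothetical

# value_counts drops NaN by default; naming the index and the values explicitly
# gives 'year'/'counts' columns regardless of the pandas version
counts = (ano_coleta.value_counts()
          .rename_axis('year')
          .reset_index(name='counts'))
counts['year'] = counts['year'].astype(int)  # safe: no NaN left after value_counts
counts = counts.sort_values('year').reset_index(drop=True)
print(counts)
```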

In [155]:
# min and max for the X axis (year)
min_x = teste['year'].min()
max_x = teste['year'].max()

In [156]:
temp = alt.Chart(data= teste, width=800, title= 'Number of collected animals per year').mark_bar().encode(
    x= alt.X('year', type='ordinal', title='Ano de Coleta'),
    y= alt.Y('counts', type='quantitative', title='Contagem')
)

# temp.save('./graphs/coletas_por_ano.html')
temp

<br>

---

### Types (*per year*) <font color='red'>The Types column is not filled in. Switching to Species...</font>

#### adjusting the columns `ano_determinacao`, `ano_coleta` and `ano_entrada` to *int* format

In [157]:
def str_with_nan2int(string):
    # converts a number to int, keeping NaN as NaN
    if not np.isnan(string):
        return int(string)
    else:
        return np.nan

In [158]:
NewTable['ano_determinacao'] = NewTable['ano_determinacao'].apply(str_with_nan2int) #has NaN
NewTable['ano_coleta'] = NewTable['ano_coleta'].apply(str_with_nan2int) #has NaN
NewTable['ano_entrada'] = NewTable['ano_entrada'].apply(str_with_nan2int) #has NaN
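A caveat about the cell above: `apply` re-infers the column dtype, and a column that still contains NaN comes back as `float64` even though each value went through `int()`. A sketch with made-up years; pandas' nullable `Int64` dtype is one way to hold whole numbers alongside missing values:

```python
import numpy as np
import pandas as pd

years = pd.Series([2018.0, np.nan, 1998.0])  # hypothetical

# int() on each value does not help: with NaN present, apply() infers float64
via_int = years.apply(lambda x: int(x) if not np.isnan(x) else np.nan)
print(via_int.dtype)

# the nullable 'Int64' dtype stores whole numbers next to missing values (<NA>)
as_nullable = years.astype('Int64')
print(as_nullable)
```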

#### removing duplicated rows as we took a subset of the main database

In [159]:
teste1 = NewTable.drop_duplicates().copy()  # removes duplicated rows (with same values in ALL columns)
print('Duplicated registers:',NewTable.shape[0] - NewTable.drop_duplicates().shape[0])
# teste1.head(2)

Duplicated registers: 13835
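`DataFrame.duplicated()` flags the same rows that `drop_duplicates()` removes (every repeat after the first occurrence, comparing all columns), so it gives the count in a single pass. A toy example with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({'familia': ['Chelidae', 'Chelidae', 'Teiidae'],
                   'ordem': ['Testudines', 'Testudines', 'Squamata']})  # hypothetical

# duplicated() marks each repeat of an earlier identical row (all columns compared)
n_dup = int(df.duplicated().sum())
print('Duplicated registers:', n_dup)
```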


### Altitude per family

In [160]:
# p.s.: I'm grouping again because, previously, there were other charts in this notebook. 
# Now, I'm only keeping what's important for this chart (in case we need to get back to it)

# p.s.: try changing year column in groupby to start_year
# type_data = teste1.groupby(['altitude','familia']).count()['class'].reset_index().rename(
#                                                     columns={'class':'counts'})

# type_data.sort_values(['counts'], inplace=True) # sorting...

In [161]:
# teste = type_data
#teste = NewTable.groupby('altitude').count()['class'].reset_index().rename(columns={'class':'counts'})

In [162]:
teste = NewTable[['altitude','familia','ordem']].copy()

teste['altitude'] = teste['altitude'].str.extract(r'(\d+)', expand=False)  # first run of digits, as text

# sorting
teste = teste.sort_values(['altitude','familia'])
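Two details of `str.extract` matter here. With `expand=False` and a single capture group it returns a Series rather than a one-column DataFrame, and the result is still text. Also, `(\d+)` keeps only the first run of digits, so a value written with a thousands separator, such as `1.200`, would become `1`; whether the file contains such values is not checked here. A sketch with made-up altitudes:

```python
import numpy as np
import pandas as pd

altitude = pd.Series(['800 m', 'ca. 1200m', '1.200', np.nan])  # hypothetical raw values

# expand=False returns a Series (one capture group) instead of a one-column DataFrame
digits = altitude.str.extract(r'(\d+)', expand=False)

# the result is still text; convert before doing arithmetic or numeric filtering
meters = pd.to_numeric(digits)
print(meters.tolist())
```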

In [164]:
temp = alt.Chart(teste, title='Altitude per family').mark_circle().encode(
    x = alt.X('familia', type='nominal', title='Family'),
    y = alt.Y('altitude', type='quantitative', title='Altitude (in meters)'),
    color= alt.Color('ordem', scale=alt.Scale(domain=ordens, range=cores)),
    tooltip = alt.Tooltip(['ordem', 'familia', 'altitude'])
)

# temp.save('./graphs/altitude_per_family.html')
temp

<br>

## Altitude per genus and species (actual)

In [165]:
teste = NewTable[['altitude','genero_e_especie_atual','ordem']].copy()

teste['altitude'] = teste['altitude'].str.extract(r'(\d+)', expand=False)  # first run of digits, as text

# sorting
teste = teste.sort_values(['altitude','genero_e_especie_atual'])

<font color='red' size='5'>Adjustments to the ordem column</font>

In [166]:
teste['ordem'].value_counts()

Squamata      21992
Testudines      844
Crocodylia      257
Nan              21
#n/d              2
Caudata           1
Squamta           1
Squama            1
Name: ordem, dtype: int64

In [167]:
# fixes the typos Squama and Squamta
def corrige_squamata(string):
    if str(string).lower() == 'squama' or str(string).lower() == 'squamta':
        return 'Squamata'
    else:
        return str(string)
    
# converts '#n/d' and 'nan' into NaN
def corrige_nd(string):
    if str(string) == "#n/d":
        return np.nan
    elif str(string).lower() == 'nan':
        return np.nan
    else:
        return string

In [168]:
teste['ordem'] = teste['ordem'].apply(corrige_squamata)
teste['ordem'] = teste['ordem'].apply(corrige_nd)
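The two fixes can also be written as a single lookup passed to `Series.replace`. This sketch uses made-up values and, unlike `corrige_squamata` and `corrige_nd`, matches exact spellings only (no lowercasing):

```python
import numpy as np
import pandas as pd

ordem = pd.Series(['Squamata', 'Squamta', 'Squama', '#n/d', 'Nan', 'Caudata'])  # hypothetical

# one lookup table for both fixes: typos -> 'Squamata', placeholders -> NaN
fixes = {'Squamta': 'Squamata', 'Squama': 'Squamata', '#n/d': np.nan, 'Nan': np.nan}
ordem_clean = ordem.replace(fixes)
print(ordem_clean.value_counts(dropna=False))
```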

In [169]:
# 4 orders remain (exactly what we wanted!)
teste['ordem'].value_counts(dropna=False)

Squamata      21994
Testudines      844
Crocodylia      257
NaN              23
Caudata           1
Name: ordem, dtype: int64

Plotting altitude faceted by order

<font color='red' size='4'>**OBS:** I'm removing NaNs, i.e., genera and species with no recorded altitude</font>

In [170]:
teste = teste.dropna(axis=0, subset=['altitude'])

In [175]:
# ordering x-axis by mean altitude; excluding the outlier (altitude 7468) and rows with null ordem
temp = alt.Chart(teste[(teste['altitude'] != '7468') & (~teste['ordem'].isna())], title='Altitude per genus and species').mark_circle().encode(
    x = alt.X('genero_e_especie_atual', type='nominal', title='Genus and species (actual)',
             sort=alt.EncodingSortField('altitude', op="mean", order="ascending")),
    y = alt.Y('altitude', type='quantitative', title='Altitude (in meters)'),
    color = alt.Color('ordem', scale= alt.Scale(domain=ordens, range=cores)),
    tooltip = alt.Tooltip(['ordem', 'genero_e_especie_atual', 'altitude'])
)

# temp.save('./graphs/altitude/altitude_per_genus_and_species-facetado.html')

temp.facet(row="ordem:N")

<br>

<font color='red' size=4>Separating the "higher variance" groups: more variability, i.e., more points for the same genus/species</font>

In [176]:
teste.head()

|   | altitude | genero_e_especie_atual | ordem |
|---|---|---|---|
| 13206 | 0 | Hemidactylus mabouia | Squamata |
| 13207 | 0 | Philodryas patagoniensis | Squamata |
| 13352 | 1 | Hemidactylus mabouia | Squamata |
| 13351 | 1 | Tropidurus torquatus | Squamata |
| 22111 | 10 | Ecpleopus gaudichaudii | Squamata |


In [177]:
sort = teste.groupby('genero_e_especie_atual').count()['ordem'].reset_index().rename(columns={'ordem':'counts'})

In [178]:
threshold = 2

# more variability (counts > threshold)
grupo1 = sort[sort['counts'] > threshold]['genero_e_especie_atual']

# less variability (counts <= threshold)
grupo2 = sort[sort['counts'] <= threshold]['genero_e_especie_atual']
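An equivalent way to split the groups, sketched on a made-up frame. Note that `value_counts` counts rows per genus+species, whereas `groupby(...).count()['ordem']` above counts non-null `ordem` values, so the two differ only where `ordem` is missing.

```python
import pandas as pd

sample = pd.DataFrame({'genero_e_especie_atual': ['A a', 'A a', 'A a', 'B b', 'C c', 'C c']})  # hypothetical
threshold = 2

# number of altitude records per genus+species
records = sample['genero_e_especie_atual'].value_counts()

more_var = records[records > threshold].index   # more records (more variability)
less_var = records[records <= threshold].index  # fewer records
print(list(more_var), sorted(less_var))
```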

In [179]:
teste['altitude'] = teste['altitude'].astype(int)

In [181]:
# ordering x-axis by max altitude
temp = alt.Chart(teste[(teste['genero_e_especie_atual'].isin(grupo1)) & (teste['altitude'] != 7468)], 
                 title='Altitude per genus and species',
                 width=900).mark_circle().encode(
    x = alt.X('genero_e_especie_atual', type='nominal', title='Genus and species (actual)',
             sort=alt.EncodingSortField('altitude', op='max', order="ascending")),
    y = alt.Y('altitude', type='quantitative', title='Altitude (in meters)'),
    color = alt.Color('ordem', scale= alt.Scale(domain=ordens, range=cores)),
    tooltip = alt.Tooltip(['ordem', 'genero_e_especie_atual', 'altitude'])
)

temp.save('./graphs/altitude/altitude_per_genus_and_species-maior-var.html')
temp

In [183]:
# ordering x-axis by max altitude
temp = alt.Chart(teste[(teste['genero_e_especie_atual'].isin(grupo2)) & (teste['altitude'] != 7468)], title='Altitude per genus and species',
                width=900).mark_circle().encode(
    x = alt.X('genero_e_especie_atual', type='nominal', title='Genus and species (actual)',
             sort=alt.EncodingSortField('altitude', op="max", order="ascending")),
    y = alt.Y('altitude', type='quantitative', title='Altitude (in meters)'),
    color = alt.Color('ordem', scale= alt.Scale(domain=ordens, range=cores)),
    tooltip = alt.Tooltip(['ordem', 'genero_e_especie_atual', 'altitude'])
)

# temp.save('./graphs/altitude/altitude_per_genus_and_species-menor-var.html')
temp

<br>

## Altitude per genus

In [184]:
teste = NewTable[['altitude','genero_atual','ordem']].copy()

teste['altitude'] = teste['altitude'].str.extract(r'(\d+)', expand=False)  # first run of digits, as text

# sorting
teste = teste.sort_values(['altitude','genero_atual'])

In [191]:
teste = teste.dropna(subset=['altitude']).copy()  # .copy() avoids pandas' SettingWithCopyWarning
teste['altitude'] = teste['altitude'].astype(int)


In [194]:
# ordering x-axis by max altitude
temp = alt.Chart(teste, title='Altitude per genus',
                width= 900).mark_circle().encode(
    x = alt.X('genero_atual', type='nominal', title='Genus (actual)',
             sort=alt.EncodingSortField('altitude', op="max", order="ascending")),
    y = alt.Y('altitude', type='quantitative', title='Altitude (in meters)'),
    color = alt.Color('ordem', scale= alt.Scale(domain=ordens, range=cores)),
    tooltip = alt.Tooltip(['ordem', 'genero_atual', 'altitude'])
)

# temp.save('./graphs/altitude/altitude_per_genus.html')
temp

<br>

**That's it!**

-----