# Charts Project


## About the project

The team decided to use migration data and present it as a `Chord` diagram.

## Dataset

The dataset used is available at:

https://databank.worldbank.org/source/global-bilateral-migration

## Library used

The `holoviews` library was used, with the Bokeh backend:

http://holoviews.org/reference/elements/bokeh/Chord.html



In [1]:
import pandas as pd
import holoviews as hv
from holoviews import opts, dim

In [2]:
hv.extension('bokeh')
hv.output(size=200)

## Define the countries used in the chart

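We chose Brazil and its neighboring countries. Note that the list also includes Chile and Ecuador, which do not actually border Brazil, so in practice it covers all of South America plus French Guiana.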

In [3]:
fronteira_br = ["Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador", "Guyana", "Paraguay", "Peru", "Suriname", "Uruguay", "Venezuela", "French Guiana"]
fronteira_br

['Argentina',
 'Bolivia',
 'Brazil',
 'Chile',
 'Colombia',
 'Ecuador',
 'Guyana',
 'Paraguay',
 'Peru',
 'Suriname',
 'Uruguay',
 'Venezuela',
 'French Guiana']

#### Venezuela is not present in the `latin_american_immigration_data.csv` dataset

In [4]:
fronteira_br.remove('Venezuela')
fronteira_br

['Argentina',
 'Bolivia',
 'Brazil',
 'Chile',
 'Colombia',
 'Ecuador',
 'Guyana',
 'Paraguay',
 'Peru',
 'Suriname',
 'Uruguay',
 'French Guiana']
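Instead of finding missing countries by inspection, one could compare the list against the names actually present in the CSV. A minimal sketch, assuming the same `Country Origin Name` / `Country Dest Name` columns; the helper `missing_countries` and the toy DataFrame are hypothetical:

```python
import pandas as pd

def missing_countries(countries, df):
    """Return the countries that appear neither as origin nor as destination in df."""
    present = set(df["Country Origin Name"]) | set(df["Country Dest Name"])
    # Keep the order of the input list so the result is easy to read.
    return [c for c in countries if c not in present]

# Toy data standing in for latin_american_immigration_data.csv (hypothetical rows).
toy = pd.DataFrame({
    "Country Origin Name": ["Argentina", "Brazil"],
    "Country Dest Name": ["Brazil", "Chile"],
})

print(missing_countries(["Argentina", "Brazil", "Chile", "Venezuela"], toy))
```

Removing the returned names from `fronteira_br` would make the `remove('Venezuela')` step independent of which countries happen to be absent.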

## Data preparation

In [5]:
df = pd.read_csv('latin_american_immigration_data.csv')
df = df[(df['Country Origin Name'].isin(fronteira_br)) & (df['Country Dest Name'].isin(fronteira_br))]
df.head()

| | Country Origin Name | Country Origin Code | Migration by Gender Name | Migration by Gender Code | Country Dest Name | Country Dest Code | 1960 [1960] | 1970 [1970] | 1980 [1980] | 1990 [1990] | 2000 [2000] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 278 | Argentina | ARG | Female | FEM | Argentina | ARG | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 283 | Argentina | ARG | Female | FEM | Bolivia | BOL | 1988.0 | 4957.0 | 7356.0 | 8953.0 | 13804.0 |
| 284 | Argentina | ARG | Female | FEM | Brazil | BRA | 8994.0 | 8860.0 | 13111.0 | 13084.0 | 11974.0 |
| 286 | Argentina | ARG | Female | FEM | Chile | CHL | 6122.0 | 6728.0 | 9869.0 | 16423.0 | 22434.0 |
| 287 | Argentina | ARG | Female | FEM | Colombia | COL | 384.0 | 630.0 | 791.0 | 989.0 | 1175.0 |
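The country filter keeps every `Migration by Gender Name` category. The `head()` output only shows `Female` rows, but if the CSV also holds other categories (the World Bank source presumably publishes male and total figures as well), the same origin/destination pair would reach the chord more than once. A minimal sketch of keeping a single category, assuming the column names shown above; `keep_gender` and the toy values are hypothetical, and the actual labels should be checked with `df['Migration by Gender Name'].unique()`:

```python
import pandas as pd

def keep_gender(df, gender_name):
    """Keep only the rows of one 'Migration by Gender Name' category."""
    return df[df["Migration by Gender Name"] == gender_name]

# Hypothetical toy rows: the same origin/destination pair appears once per gender.
toy = pd.DataFrame({
    "Country Origin Name": ["Argentina", "Argentina"],
    "Country Dest Name": ["Bolivia", "Bolivia"],
    "Migration by Gender Name": ["Female", "Male"],
    "1960 [1960]": [100.0, 200.0],
})

# Without filtering, Argentina -> Bolivia would produce two edges.
print(len(toy))
female_only = keep_gender(toy, "Female")
print(len(female_only))
```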


In [6]:
fronteira_br_codes = {country: idx for idx, country in enumerate(fronteira_br)}
fronteira_br_codes

{'Argentina': 0,
 'Bolivia': 1,
 'Brazil': 2,
 'Chile': 3,
 'Colombia': 4,
 'Ecuador': 5,
 'Guyana': 6,
 'Paraguay': 7,
 'Peru': 8,
 'Suriname': 9,
 'Uruguay': 10,
 'French Guiana': 11}

In [7]:
def to_apply(country):
    # Look up the integer code of a country name (raises KeyError for unknown names)
    return fronteira_br_codes[country]
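The same name-to-code conversion can also be written with `Series.map`, which looks each value up in the dictionary directly. A small sketch using a subset of `fronteira_br_codes` (the `codes` name is illustrative):

```python
import pandas as pd

codes = {"Argentina": 0, "Bolivia": 1, "Brazil": 2}  # subset of fronteira_br_codes

names = pd.Series(["Argentina", "Brazil", "Bolivia"])
# Series.map replaces each name with its dict value.
mapped = names.map(codes)
print(mapped.tolist())

# Unlike to_apply, which raises KeyError, map turns unknown names into NaN.
unknown = pd.Series(["Venezuela"]).map(codes)
print(unknown.isna().all())
```

The choice is about failure mode: `to_apply` stops loudly on an unexpected country, while `map` lets it through as a missing value that a later `isna()` check has to catch.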

In [8]:
# Keep origin, destination and the 1960 values; .copy() avoids SettingWithCopyWarning
df = df[['Country Origin Name', 'Country Dest Name', '1960 [1960]']].copy()
df['Country Origin Name'] = df['Country Origin Name'].apply(to_apply)
df['Country Dest Name'] = df['Country Dest Name'].apply(to_apply)
df.rename(columns={
  'Country Origin Name': 'source',
  'Country Dest Name': 'target',
  '1960 [1960]': 'value',
}, inplace=True)
df.head()

| | source | target | value |
|---|---|---|---|
| 278 | 0 | 0 | 0.0 |
| 283 | 0 | 1 | 1988.0 |
| 284 | 0 | 2 | 8994.0 |
| 286 | 0 | 3 | 6122.0 |
| 287 | 0 | 4 | 384.0 |
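The first row is a self-loop (Argentina to Argentina, value 0). Such rows carry no migration flow, so one might drop self-loops and zero-valued edges before building the chord. A minimal sketch, assuming the `source`/`target`/`value` columns produced above; `drop_empty_edges` is a hypothetical helper, offered as a cleanup step rather than something HoloViews is known to require:

```python
import pandas as pd

def drop_empty_edges(edges):
    """Remove self-loops (source == target) and edges whose value is not positive."""
    mask = (edges["source"] != edges["target"]) & (edges["value"] > 0)
    return edges[mask].reset_index(drop=True)

# The first rows of the edge table shown above.
edges = pd.DataFrame({
    "source": [0, 0, 0, 0, 0],
    "target": [0, 1, 2, 3, 4],
    "value": [0.0, 1988.0, 8994.0, 6122.0, 384.0],
})

print(drop_empty_edges(edges))
```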


In [9]:
df.isna().sum()  # missing-value check; not displayed, since a cell only shows its last expression
df = df.convert_dtypes(convert_integer=True)
df.dtypes

source    Int64
target    Int64
value     Int64
dtype: object
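A small sketch with hypothetical values of what this cell does: printing the missing-value count explicitly, and letting `convert_dtypes` turn whole-valued floats into pandas' nullable `Int64` type:

```python
import pandas as pd

values = pd.DataFrame({"value": [0.0, 1988.0, float("nan")]})

# isna().sum() counts missing entries per column; print it so the result is visible.
print(values.isna().sum())

# Every non-missing float is a whole number, so convert_dtypes picks Int64;
# the missing entry becomes pd.NA instead of keeping the column as float.
converted = values.convert_dtypes()
print(converted.dtypes)
```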

## Renderização do Gráfico

In [14]:
df_fronteira_br = pd.DataFrame(fronteira_br_codes.keys(), index = fronteira_br_codes.values(), columns=["Paises"])
df_fronteira_br = hv.Dataset(df_fronteira_br, 'index')
df_fronteira_br.data.count()

index     12
Paises    12
dtype: int64
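The cell above relies on HoloViews promoting the unnamed DataFrame index to a column called `index` (the `count()` output shows that it did). A sketch of building the same node table with that column spelled out, using a subset of `fronteira_br_codes`; the result could then be passed to `hv.Dataset(nodes, 'index')` as in the original cell:

```python
import pandas as pd

codes = {"Argentina": 0, "Bolivia": 1, "Brazil": 2}  # subset of fronteira_br_codes

# Explicit 'index' column (node code) and 'Paises' column (label shown on the chord).
nodes = pd.DataFrame({
    "index": list(codes.values()),
    "Paises": list(codes.keys()),
})
print(nodes)
```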

In [16]:
chord = hv.Chord((df, df_fronteira_br))
chord.opts(
    opts.Chord(cmap='Category10', edge_cmap='Category10', edge_color=dim('source').str(), 
               labels='Paises', node_color=dim('index').str())
)
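Because the chord receives edges and nodes separately, a quick consistency check before plotting can catch edge codes that have no entry in the node table. A hypothetical helper, sketched with toy data in the same `source`/`target`/`value` and `index`/`Paises` layout used above:

```python
import pandas as pd

def unknown_endpoints(edges, nodes):
    """Return edge endpoint codes (source or target) that are missing from nodes."""
    known = set(nodes["index"])
    endpoints = set(edges["source"]) | set(edges["target"])
    return sorted(endpoints - known)

nodes = pd.DataFrame({"index": [0, 1, 2], "Paises": ["Argentina", "Bolivia", "Brazil"]})
edges = pd.DataFrame({"source": [0, 0, 1], "target": [1, 2, 3], "value": [10, 20, 30]})

# Code 3 appears as a target but is missing from the node table.
print(unknown_endpoints(edges, nodes))
```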