# **Charts with Altair, by Diana Chacón Ocariz**
## Basic charts

**Important information**

* Official Altair documentation: https://altair-viz.github.io/index.html

* Notion workspace for the class: https://bit.ly/tacosdedatos-notion

* Where to find this _binder_: https://bit.ly.com/dataviz-demo


## To explore today
* How data is encoded (_encoding_) in Altair
* Objects in Altair (axes, scales, etc.)
* Configuration

In [2]:
import altair as alt  # import altair under the alias alt
import pandas as pd 
from vega_datasets import data as vega_data
data = vega_data.gapminder()

In [3]:
data.head()

Unnamed: 0,year,country,cluster,pop,life_expect,fertility
0,1955,Afghanistan,0,8891209,30.332,7.7
1,1960,Afghanistan,0,9829450,31.997,7.7
2,1965,Afghanistan,0,10997885,34.02,7.7
3,1970,Afghanistan,0,12430623,36.088,7.7
4,1975,Afghanistan,0,14132019,38.438,7.7


In [4]:
data.shape

(693, 6)

In [5]:
data.country.unique()

array(['Afghanistan', 'Argentina', 'Aruba', 'Australia', 'Austria',
       'Bahamas', 'Bangladesh', 'Barbados', 'Belgium', 'Bolivia',
       'Brazil', 'Canada', 'Chile', 'China', 'Colombia', 'Costa Rica',
       'Croatia', 'Cuba', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Finland', 'France', 'Georgia', 'Germany', 'Greece',
       'Grenada', 'Haiti', 'Hong Kong', 'Iceland', 'India', 'Indonesia',
       'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy', 'Jamaica', 'Japan',
       'Kenya', 'South Korea', 'North Korea', 'Lebanon', 'Mexico',
       'Netherlands', 'New Zealand', 'Nigeria', 'Norway', 'Pakistan',
       'Peru', 'Philippines', 'Poland', 'Portugal', 'Rwanda',
       'Saudi Arabia', 'South Africa', 'Spain', 'Switzerland', 'Turkey',
       'United Kingdom', 'United States', 'Venezuela'], dtype=object)

## Bars

I want to build a bar chart that shows the population of a country from 1955 to the most recent year in the dataset.


In [122]:
mis_paises = 'Chile, Colombia, France, Spain and Venezuela'
paises = ['Venezuela', 'Colombia', 'France', 'Chile', 'Spain']

In [123]:
datos_paises = data[data['country'].isin(paises)]

datos_paises

Unnamed: 0,year,country,cluster,pop,life_expect,fertility
132,1955,Chile,3,6743269,56.074,5.486
133,1960,Chile,3,7585349,57.924,5.4385
134,1965,Chile,3,8509950,60.523,4.4405
135,1970,Chile,3,9368558,63.441,3.63
136,1975,Chile,3,10251542,67.052,2.803
137,1980,Chile,3,11093718,70.565,2.671
138,1985,Chile,3,12066701,72.492,2.65
139,1990,Chile,3,13127760,74.126,2.55
140,1995,Chile,3,14205449,75.816,2.21
141,2000,Chile,3,15153450,77.86,2.0


In [124]:
alt.Chart(datos_paises).mark_bar().encode(
    x = 'year',
    y = 'pop',
)

### Bars+



In [142]:
# Define a specific color for each country
# Color palettes can be found at https://coolors.co/palettes/trending
paises = ['Chile', 'Colombia', 'France', 'Spain', 'Venezuela']
colores = ['#ffbe0b', '#fb5607', '#ff006e', '#8338ec','#3a86ff']

In [143]:
# General parameters

# x and y axis configuration
x = alt.X('year:O', title='', axis = alt.Axis(labelAngle=30, labelFontSize=12))
y = alt.Y('pop:Q', title = 'Population', axis = alt.Axis(format = ",.2s", grid=True, titleAnchor='middle', titleAngle = 270, labelFontSize=10))


tooltip = [alt.Tooltip('country', title='Country'), alt.Tooltip('pop', title='Population', format = ",.5s"), alt.Tooltip('year', title='Year')]

In [147]:
# Define the selection that makes the chart interactive
selection_country = alt.selection_multi(fields=['country'], empty='none', on='mouseover')

# The default color is gray. When a country is selected,
# its part of the chart takes on that country's color.
# Because on='mouseover', a country is selected by hovering over it;
# holding shift while hovering adds further countries.
# This is possible thanks to selection_multi;
# selection_single only allows one country to be selected.
color = alt.condition(selection_country,
                    alt.Color('country:N', scale=alt.Scale(domain=paises, range=colores)), 
                    alt.value('lightgray'),) 


grafico = alt.Chart(datos_paises).mark_bar(cornerRadiusTopLeft=5, cornerRadiusTopRight=5,
                                ).encode(
    x = x,
    y = y,
    # fill (fill color) should not be used here
    color = color,
    tooltip = tooltip,
).properties(
    title = f"Population of {mis_paises} from 1955 to 2005",
    width = 600,
).configure_title(
    fontSize = 16,
    anchor = 'middle',
    color = '#03045e',
).configure_view(
    strokeWidth=1
).add_selection(
    selection_country
)

grafico.save('poblacion.html')

In [134]:
color = alt.Color('country:N', scale=alt.Scale(domain=paises, range=colores))

barras = alt.Chart(datos_paises).mark_bar(cornerRadiusTopLeft=5, cornerRadiusTopRight=5,
    ).encode(
    x = x,
    y = y,
    # fill (fill color) should not be used here
    color = color, 
    tooltip = tooltip,
).properties(
    title = f"Population of {mis_paises} from 1955 to 2005",
    width = 600,
)

textos = barras.mark_text(dy=-50, fill='black').encode(
    text = alt.Text('pop:Q', format=',.5s', title = 'Population'),
)

grafico_final = barras + textos

grafico_final.configure_title(
    fontSize = 16,
    anchor = 'middle',
    color = '#03045e',
).configure_view(
    strokeWidth=1
)


In [135]:
# x and y axis configuration
x = alt.X('year:O', title='', axis = alt.Axis(labelAngle=90, labelFontSize=12))
y = alt.Y('pop:Q', title = 'Population', axis = alt.Axis(format = ",.2s", grid=True, titleAnchor='middle', titleAngle = 270, labelFontSize=10))

color = alt.Color('country:N', scale=alt.Scale(domain=paises, range=colores))

alt.Chart(datos_paises).mark_bar().encode(
    x = x,
    y = y,
    # Generates one chart for each value of the column
    column = alt.Column('country:N', title=''),
    # fill (fill color) should not be used here
    color = color,
    tooltip = tooltip,
).properties(
    title = f"Population of {mis_paises} from 1955 to 2005",
    width = 150,
).configure_title(
    fontSize = 16,
    anchor = 'middle',
    color = '#03045e',
).configure_view(
    strokeWidth=1
)

## Lines

In [136]:
alt.Chart(datos_paises).mark_line(point = True).encode(
    x = 'year',
    y = 'fertility',
)

### Lines+

In [141]:
# This is a layered chart
# https://altair-viz.github.io/user_guide/compound_charts.html

# Define the selection that makes the chart interactive
# When the mouse approaches a country's line,
# the line becomes thicker
highlight = alt.selection(type='single', on='mouseover',
                          fields=['country'], nearest=True)


base = alt.Chart(datos_paises).mark_line().encode(
    x = alt.X('year:O', title='', axis = alt.Axis(labelAngle=0, labelFontSize=12)),
    y = alt.Y('life_expect:Q', title = 'Life expectancy', scale=alt.Scale(zero=False), 
              axis = alt.Axis(grid=True, titleAnchor='middle', titleAngle = 270, labelFontSize=10)),
    color = alt.Color('country:N', scale=alt.Scale(domain=paises, range=colores)), 
    tooltip = [alt.Tooltip('country', title='Country'), 
               alt.Tooltip('life_expect', title='Life expectancy'), 
               alt.Tooltip('year', title='Year')]
)


# opacity=alt.value(0.5) makes the points visible
# opacity=alt.value(0) hides them, leaving only the continuous lines
points = base.mark_circle().encode(
    opacity=alt.value(0.5)
).add_selection(
    highlight
).properties(
    title = f"Life expectancy in {mis_paises} from 1955 to 2005",
    width=600
)

# Changes the appearance of the lines when the mouse approaches
# The negated selection ~highlight is used so that
# the lines are not highlighted when the chart first loads
lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(1), alt.value(3))
)

# For layered charts, either of the two
# syntaxes (+ and alt.layer) can be used

#points + lines

grafico = alt.layer(
  points,
  lines
).interactive()

grafico.save('grafico.html')

## Scatter plot

In [15]:
cars = vega_data.cars()
cars.sample(20)

Unnamed: 0,Name,Miles_per_Gallon,Cylinders,Displacement,Horsepower,Weight_in_lbs,Acceleration,Year,Origin
198,plymouth valiant,22.0,6,225.0,100.0,3233,15.4,1976-01-01,USA
386,plymouth horizon miser,38.0,4,105.0,63.0,2125,14.7,1982-01-01,USA
23,ford maverick,21.0,6,200.0,85.0,2587,16.0,1970-01-01,USA
271,ford futura,18.1,8,302.0,139.0,3205,11.2,1978-01-01,USA
102,buick electra 225 custom,12.0,8,455.0,225.0,4951,11.0,1973-01-01,USA
363,toyota corolla,32.4,4,108.0,75.0,2350,16.8,1982-01-01,Japan
382,amc concord dl,23.0,4,151.0,,3035,20.5,1982-01-01,USA
178,toyota corona,24.0,4,134.0,96.0,2702,13.5,1975-01-01,Japan
136,datsun b210,31.0,4,79.0,67.0,1950,19.0,1974-01-01,Japan
390,toyota corolla,34.0,4,108.0,70.0,2245,16.9,1982-01-01,Japan


In [16]:
cars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 406 entries, 0 to 405
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Name              406 non-null    object        
 1   Miles_per_Gallon  398 non-null    float64       
 2   Cylinders         406 non-null    int64         
 3   Displacement      406 non-null    float64       
 4   Horsepower        400 non-null    float64       
 5   Weight_in_lbs     406 non-null    int64         
 6   Acceleration      406 non-null    float64       
 7   Year              406 non-null    datetime64[ns]
 8   Origin            406 non-null    object        
dtypes: datetime64[ns](1), float64(4), int64(2), object(2)
memory usage: 28.7+ KB


In [17]:
alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon'
)

### Scatter plot+

In [18]:
# Define a specific color for each origin
origin = ['USA', 'Europe', 'Japan']
colors = ['#2ec4b6', '#e71d36', '#ff9f1c']


In [116]:
alt.Chart(cars).mark_point().encode(
    x = alt.X('Horsepower:Q', title='Horsepower (hp)', axis = alt.Axis(labelAngle=30, labelFontSize=12)),
    y = alt.Y('Miles_per_Gallon:Q', title = 'Miles per gallon', axis = alt.Axis(grid=True, titleAnchor='middle', titleAngle = 270, labelFontSize=10)),
    color = alt.Color('Origin:N', scale=alt.Scale(domain=origin, range=colors)), 
    tooltip = [alt.Tooltip('Name', title='Name'), 
               alt.Tooltip('Miles_per_Gallon', title='Miles per gallon'), 
               alt.Tooltip('Horsepower', title='Horsepower'), 
               alt.Tooltip('Origin', title='Origin'), 
               alt.Tooltip('Year', title='Year')]
)
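
With `alt.Scale(domain=..., range=...)`, the two lists are matched by position: the first domain value gets the first color, and so on. A quick way to double-check which color each origin receives is to pair the lists up in plain Python. This is only a sanity-check sketch, and `color_por_origen` is a hypothetical name.

```python
# Same lists as in the cell above
origin = ['USA', 'Europe', 'Japan']
colors = ['#2ec4b6', '#e71d36', '#ff9f1c']

# Both lists must have the same length for the pairing to be complete
assert len(origin) == len(colors)

# Pair each origin with the color at the same position
color_por_origen = dict(zip(origin, colors))

for nombre, color in color_por_origen.items():
    print(nombre, color)
```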

## Map

In [20]:
vega_data.list_datasets()

['7zip',
 'airports',
 'annual-precip',
 'anscombe',
 'barley',
 'birdstrikes',
 'budget',
 'budgets',
 'burtin',
 'cars',
 'climate',
 'co2-concentration',
 'countries',
 'crimea',
 'disasters',
 'driving',
 'earthquakes',
 'ffox',
 'flare',
 'flare-dependencies',
 'flights-10k',
 'flights-200k',
 'flights-20k',
 'flights-2k',
 'flights-3m',
 'flights-5k',
 'flights-airport',
 'gapminder',
 'gapminder-health-income',
 'gimp',
 'github',
 'graticule',
 'income',
 'iowa-electricity',
 'iris',
 'jobs',
 'la-riots',
 'londonBoroughs',
 'londonCentroids',
 'londonTubeLines',
 'lookup_groups',
 'lookup_people',
 'miserables',
 'monarchs',
 'movies',
 'normal-2d',
 'obesity',
 'ohlc',
 'points',
 'population',
 'population_engineers_hurricanes',
 'seattle-temps',
 'seattle-weather',
 'sf-temps',
 'sp500',
 'stocks',
 'udistrict',
 'unemployment',
 'unemployment-across-industries',
 'uniform-2d',
 'us-10m',
 'us-employment',
 'us-state-capitals',
 'volcano',
 'weather',
 'weball26',
 'wheat',

In [21]:
mapa = vega_data.airports()
mapa.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3376 entries, 0 to 3375
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   iata       3376 non-null   object 
 1   name       3376 non-null   object 
 2   city       3364 non-null   object 
 3   state      3364 non-null   object 
 4   country    3376 non-null   object 
 5   latitude   3376 non-null   float64
 6   longitude  3376 non-null   float64
dtypes: float64(2), object(5)
memory usage: 184.8+ KB


In [22]:
mapa.sample(20)

Unnamed: 0,iata,name,city,state,country,latitude,longitude
1155,CNO,Chino,Chino,CA,USA,33.974694,-117.636611
2520,OOA,Oskaloosa Municipal,Oskaloosa,IA,USA,41.226149,-92.493883
1495,FCI,Chesterfield County,Richmond,VA,USA,37.406537,-77.524987
1141,CLM,William R Fairchild Intl,Port Angeles,WA,USA,48.120194,-123.499694
3310,WWR,West Woodward,Woodward,OK,USA,36.436703,-99.520997
1106,CGS,College Park,College Park,MD,USA,38.980583,-76.922306
3353,Y93,Atlanta Municipal,Atlanta,MI,USA,45.000008,-84.133337
915,B19,Biddeford Municipal,Biddeford,ME,USA,43.464111,-70.472389
2757,RDK,Red Oak Municipal,Red Oak,IA,USA,41.010528,-95.259861
2122,LWL,Wells Municipal/Harriet,Wells,NV,USA,41.118533,-114.922266


### Map+

In [23]:
airports = vega_data.airports.url
states = alt.topo_feature(vega_data.us_10m.url, feature='states')

# US states background
background = alt.Chart(states).mark_geoshape(
    fill='lightgray',
    stroke='white'
).properties(
    width=500,
    height=300
).project('albersUsa')

# airport positions on background
points = alt.Chart(airports).transform_aggregate(
    latitude='mean(latitude)',
    longitude='mean(longitude)',
    count='count()',
    groupby=['state']
).mark_circle().encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    size=alt.Size('count:Q', title='Number of Airports'),
    color=alt.value('steelblue'),
    tooltip=['state:N','count:Q']
).properties(
    title='Number of airports in US'
)

background + points
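
The `transform_aggregate` step groups the airports by state and computes a mean position plus a row count for each group. To make that concrete, here is a rough pandas equivalent. It is a sketch for understanding only, not what Altair runs internally; `muestra` and `por_estado` are hypothetical names, and the three rows are copied from the `mapa.sample(20)` output above.

```python
import pandas as pd

# Hypothetical stand-in data: three airports from the sample shown earlier
muestra = pd.DataFrame({
    "iata": ["OOA", "RDK", "CNO"],
    "state": ["IA", "IA", "CA"],
    "latitude": [41.226149, 41.010528, 33.974694],
    "longitude": [-92.493883, -95.259861, -117.636611],
})

# One row per state: mean latitude, mean longitude and number of airports
por_estado = muestra.groupby("state", as_index=False).agg(
    latitude=("latitude", "mean"),
    longitude=("longitude", "mean"),
    count=("iata", "size"),
)

print(por_estado)
```

Each resulting row corresponds to one circle on the map: its position comes from the mean coordinates and its size from `count`.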