# Pivot tables and frequency tables


### Pivot tables
Pivoting aggregates the values of a table by key. It applies a function to each group defined by a combination of row keys and column keys. It is a form of grouping designed to display how the row keys and the column keys relate to each other.

```python
import pandas as pd
import numpy as np
import seaborn as sns

# Titanic passenger data bundled with seaborn (downloaded on first use)
titanic = sns.load_dataset('titanic')
```

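```python
titanic.shape   # (891, 15); only the last expression of a notebook cell is displayed
titanic.head()
```

| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |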


To analyse passenger survival by sex, a simple pivot can be built with `groupby`: group the `'survived'` column by `'sex'` and aggregate it with two functions, **count and mean**. Since `survived` is coded 0/1, its mean is the fraction of passengers who survived:

```python
titanic.groupby('sex')[['survived']].agg(['count', 'mean'])
```

| sex | survived (count) | survived (mean) |
|---|---|---|
| female | 314 | 0.742038 |
| male | 577 | 0.188908 |


To look at survival for each combination of sex and class:

```python
titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unstack()
```

| sex \ class | First | Second | Third |
|---|---|---|---|
| female | 0.968085 | 0.921053 | 0.5 |
| male | 0.368852 | 0.157407 | 0.135447 |


Pandas performs this aggregation in a single step with the `pivot_table` method. The default aggregation function is the mean. Its main parameters are:

- `data`: DataFrame (implicit when `pivot_table` is called as a DataFrame method).
- `values`: column(s) to aggregate; optional.
- `index`: column, Grouper, array, or list of these. Keys to group by on the pivot table index. If an array is passed, it must have the same length as the data and is used in the same way as column values. A list may contain any of the other types (except list).
- `columns`: column, Grouper, array, or list of these. Keys to group by on the pivot table columns, with the same rules as `index`.
- `aggfunc`: function, list of functions, or dict; default `'mean'` (`numpy.mean` in older pandas versions). If a list of functions is passed, the resulting table has hierarchical columns whose top level holds the function names. If a dict is passed, it maps each column to the function(s) used for it.
- `fill_value`: scalar, default `None`. Value used to replace missing values in the result.
- `margins`: boolean, default `False`. Adds a row and a column with the totals (subtotals and grand total), computed with the same `aggfunc`.
- `dropna`: boolean, default `True`. Do not include columns whose entries are all NaN.
- `margins_name`: string, default `'All'`. Name of the row and column that hold the totals when `margins` is `True`.
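As a rough illustration of several of these parameters together, the sketch below builds a small hypothetical DataFrame (invented values, not the Titanic data) in which no woman travels in Second class. `fill_value=0` fills that empty cell, and `margins=True` with `margins_name="Total"` adds the totals row and column.

```python
import pandas as pd

# Hypothetical toy data (not the Titanic set): no woman travels in Second class
df = pd.DataFrame({
    "sex":      ["female", "female", "male", "male", "male"],
    "class":    ["First", "First", "First", "Second", "Second"],
    "survived": [1, 1, 0, 0, 1],
})

table = df.pivot_table(
    values="survived",
    index="sex",
    columns="class",
    aggfunc="mean",        # the default, written out explicitly
    fill_value=0,          # replaces the empty (female, Second) cell
    margins=True,          # adds a totals row and column...
    margins_name="Total",  # ...labelled "Total" instead of "All"
)
print(table)
```

The margins are computed from the raw rows, so the filled cell does not affect them: the female total stays at 1.0 rather than averaging in the 0.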



```python
titanic.pivot_table('survived', index='sex', columns='class')
```

| sex \ class | First | Second | Third |
|---|---|---|---|
| female | 0.968085 | 0.921053 | 0.5 |
| male | 0.368852 | 0.157407 | 0.135447 |
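The result is identical to the earlier `groupby` followed by `unstack`. A minimal check on hypothetical toy data (column names borrowed from the Titanic set, values invented) compares both routes with `pandas.testing.assert_frame_equal`. Dtypes are excluded from the comparison as a precaution, since `pivot_table` may downcast results that happen to be integral.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the Titanic example
df = pd.DataFrame({
    "sex":      ["female", "female", "female", "male", "male", "male"],
    "class":    ["First", "First", "Third", "First", "Third", "Third"],
    "survived": [1, 1, 0, 0, 1, 0],
})

# Manual route: group by both keys, take the mean, move 'class' to the columns
via_groupby = df.groupby(["sex", "class"])["survived"].mean().unstack()

# pivot_table does the grouping and reshaping in one call (default aggfunc: mean)
via_pivot = df.pivot_table("survived", index="sex", columns="class")

# Same labels and values; dtype differences are ignored
pd.testing.assert_frame_equal(via_groupby, via_pivot, check_dtype=False)
print(via_pivot)
```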


```python
# Change the aggregation function and use two keys on the index
titanic.pivot_table('survived', index=['sex', 'alone'], columns='class', aggfunc='count')
```

| sex | alone | First | Second | Third |
|---|---|---|---|---|
| female | False | 60 | 44 | 84 |
| female | True | 34 | 32 | 60 |
| male | False | 47 | 36 | 83 |
| male | True | 75 | 72 | 264 |


```python
# The same counts with the keys swapped: class on the rows, (sex, alone) on the columns
titanic.pivot_table('survived', index='class', columns=['sex', 'alone'], aggfunc='count')
```

| class | female, alone=False | female, alone=True | male, alone=False | male, alone=True |
|---|---|---|---|---|
| First | 60 | 34 | 47 | 75 |
| Second | 44 | 32 | 36 | 72 |
| Third | 84 | 60 | 83 | 264 |
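Swapping the `index` and `columns` arguments yields the transpose of the previous table. A small sketch on hypothetical data (every sex/alone/class combination present, a few rows duplicated so the counts differ) confirms this with `DataFrame.T`:

```python
import itertools

import pandas as pd

# Hypothetical toy data: one row per (sex, alone, class) combination...
rows = [
    {"sex": sex, "alone": alone, "class": cls, "survived": i % 2}
    for i, (sex, alone, cls) in enumerate(
        itertools.product(["female", "male"], [False, True], ["First", "Third"])
    )
]
df = pd.DataFrame(rows)
# ...plus copies of the first three rows, so that some counts are 2
df = pd.concat([df, df.iloc[:3]], ignore_index=True)

by_rows = df.pivot_table("survived", index=["sex", "alone"], columns="class", aggfunc="count")
by_cols = df.pivot_table("survived", index="class", columns=["sex", "alone"], aggfunc="count")

# Swapping index and columns is the same as transposing
pd.testing.assert_frame_equal(by_rows.T, by_cols, check_dtype=False)
print(by_cols)
```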


Age can also be discretised into *bins* with `pd.cut`, and the bins used as a grouping key:

```python
# Two age bins: (0, 18] and (18, 80]
age = pd.cut(titanic['age'], [0, 18, 80])
age.head()
```

```
0    (18, 80]
1    (18, 80]
2    (18, 80]
3    (18, 80]
4    (18, 80]
Name: age, dtype: category
Categories (2, object): [(0, 18] < (18, 80]]
```
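`pd.cut` builds right-closed intervals by default, so a value equal to the lowest edge (an age of exactly 0 here) falls outside every bin and becomes NaN, just like a missing age. The sketch below uses hypothetical ages placed on the edges to show this, together with the `include_lowest=True` option that also closes the first interval on the left:

```python
import numpy as np
import pandas as pd

# Hypothetical ages chosen to sit on and around the bin edges
ages = pd.Series([0.0, 0.42, 18.0, 18.5, 80.0, np.nan])

# Default: right-closed intervals (0, 18] and (18, 80]; 0.0 falls outside both
default_bins = pd.cut(ages, [0, 18, 80])

# include_lowest=True closes the first interval on the left, so 0.0 is kept
lowest_bins = pd.cut(ages, [0, 18, 80], include_lowest=True)

print(pd.DataFrame({"age": ages, "default": default_bins, "include_lowest": lowest_bins}))
```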

```python
titanic.pivot_table('survived', index=['sex', age], columns='class')
```

| sex | age | First | Second | Third |
|---|---|---|---|---|
| female | (0, 18] | 0.909091 | 1.0 | 0.511628 |
| female | (18, 80] | 0.972973 | 0.9 | 0.423729 |
| male | (0, 18] | 0.8 | 0.6 | 0.215686 |
| male | (18, 80] | 0.375 | 0.071429 | 0.133663 |


The `qcut` function cuts a continuous variable at its quantiles, producing discrete bins that can then be used as aggregation keys. With 2 quantiles, `fare` is split at its median:

```python
# Two quantile bins, i.e. a split at the median fare
fare = pd.qcut(titanic['fare'], 2)
fare.head()
```

```
0          [0, 14.454]
1    (14.454, 512.329]
2          [0, 14.454]
3    (14.454, 512.329]
4          [0, 14.454]
Name: fare, dtype: category
Categories (2, object): [[0, 14.454] < (14.454, 512.329]]
```
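Because `qcut` uses quantiles as edges, each bin receives roughly the same number of observations; with `q=2` the single inner edge is the median. The sketch below uses eight hypothetical fares and also shows the `labels` argument, which replaces the interval labels with names of your choice (`"low"` and `"high"` are made up here):

```python
import pandas as pd

# Hypothetical fares (not taken from the Titanic data)
fares = pd.Series([5.0, 7.25, 8.05, 14.0, 30.0, 71.28, 100.0, 512.33])

# Two quantile bins: the inner edge is the median, (14.0 + 30.0) / 2 = 22.0
halves = pd.qcut(fares, 2)
print(halves.value_counts())

# Same split, with readable labels instead of intervals
labelled = pd.qcut(fares, 2, labels=["low", "high"])
print(labelled.tolist())
```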

```python
# Combinations with no passengers give NaN; the fill_value parameter can replace them
titanic.pivot_table('survived', index=['sex', age], columns=[fare, 'class'])
```

| sex | age | [0, 14.454] / First | [0, 14.454] / Second | [0, 14.454] / Third | (14.454, 512.329] / First | (14.454, 512.329] / Second | (14.454, 512.329] / Third |
|---|---|---|---|---|---|---|---|
| female | (0, 18] | NaN | 1.0 | 0.714286 | 0.909091 | 1.0 | 0.318182 |
| female | (18, 80] | NaN | 0.88 | 0.444444 | 0.972973 | 0.914286 | 0.391304 |
| male | (0, 18] | NaN | 0.0 | 0.26087 | 0.8 | 0.818182 | 0.178571 |
| male | (18, 80] | 0.0 | 0.098039 | 0.125 | 0.391304 | 0.030303 | 0.192308 |
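The NaN cells above are combinations of sex, age bin, fare bin and class with no passengers at all. A minimal sketch on hypothetical data shows `fill_value` replacing such a cell. Keep in mind that with `fill_value=0` an empty group becomes indistinguishable from a group in which nobody survived.

```python
import pandas as pd

# Hypothetical toy data: no man travels in Second class
df = pd.DataFrame({
    "sex":      ["female", "female", "male", "male"],
    "class":    ["First", "Second", "First", "First"],
    "survived": [1, 1, 0, 1],
})

raw = df.pivot_table("survived", index="sex", columns="class")                   # (male, Second) is NaN
filled = df.pivot_table("survived", index="sex", columns="class", fill_value=0)  # (male, Second) is 0
print(raw)
print(filled)
```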


Different aggregation functions can be applied to different columns by passing a dictionary to `aggfunc`:

```python
# Number of survivors and mean fare per (sex, class) cell
titanic.pivot_table(index='sex', columns='class',
                    aggfunc={'survived': 'sum', 'fare': 'mean'})
```

| sex | survived: First | survived: Second | survived: Third | fare: First | fare: Second | fare: Third |
|---|---|---|---|---|---|---|
| female | 91.0 | 70.0 | 72.0 | 106.125798 | 21.970121 | 16.11881 |
| male | 45.0 | 17.0 | 47.0 | 67.226127 | 19.741782 | 12.661633 |
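A reduced version of the same call on a hypothetical six-row DataFrame makes the result easy to check by hand: `'sum'` counts survivors and `'mean'` averages the fare within each (sex, class) cell. The string names `'sum'` and `'mean'` are used rather than NumPy functions.

```python
import pandas as pd

# Hypothetical toy data (values invented for illustration)
df = pd.DataFrame({
    "sex":      ["female", "female", "female", "male", "male", "male"],
    "class":    ["First", "First", "Third", "First", "Third", "Third"],
    "survived": [1, 1, 0, 0, 1, 0],
    "fare":     [100.0, 120.0, 10.0, 80.0, 8.0, 6.0],
})

# One aggregation function per column; the result has (column, class) headers
table = df.pivot_table(index="sex", columns="class",
                       aggfunc={"survived": "sum", "fare": "mean"})
print(table)
```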


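The `margins` option adds an `All` row and an `All` column holding the aggregate over each whole row, each whole column and the entire table, computed with the same aggregation function. Here that function is the mean of a 0/1 column, so the margins are the survival rates by sex, by class and overall: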

```python
titanic.pivot_table('survived', index='sex', columns='class')
```

| sex \ class | First | Second | Third |
|---|---|---|---|
| female | 0.968085 | 0.921053 | 0.5 |
| male | 0.368852 | 0.157407 | 0.135447 |


```python
titanic.pivot_table('survived', index='sex', columns='class', margins=True)
```

| sex \ class | First | Second | Third | All |
|---|---|---|---|---|
| female | 0.968085 | 0.921053 | 0.5 | 0.742038 |
| male | 0.368852 | 0.157407 | 0.135447 | 0.188908 |
| All | 0.62963 | 0.472826 | 0.242363 | 0.383838 |


The values in the `All` row and column are the survival rates grouped by `'class'` alone and by `'sex'` alone:

```python
titanic.groupby('class')['survived'].mean()
```

```
class
First     0.629630
Second    0.472826
Third     0.242363
Name: survived, dtype: float64
```

```python
titanic.groupby('sex')['survived'].mean()
```

```
sex
female    0.742038
male      0.188908
Name: survived, dtype: float64
```
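The values match because the margins are aggregated from the raw rows, not from the cells of the table. With unequal group sizes, the `All` corner is therefore not the average of the inner cells. A sketch on hypothetical toy data checks both points:

```python
import pandas as pd

# Hypothetical toy data using the same column names as the Titanic example
df = pd.DataFrame({
    "sex":      ["female", "female", "female", "male", "male", "male"],
    "class":    ["First", "First", "Third", "First", "Third", "Third"],
    "survived": [1, 1, 0, 0, 1, 0],
})
table = df.pivot_table("survived", index="sex", columns="class", margins=True)

by_class = df.groupby("class")["survived"].mean()
by_sex = df.groupby("sex")["survived"].mean()

# All row = mean per class, All column = mean per sex
for cls, rate in by_class.items():
    assert abs(table.loc["All", cls] - rate) < 1e-12
for sex, rate in by_sex.items():
    assert abs(table.loc[sex, "All"] - rate) < 1e-12

# Corner = mean over all rows (0.5), not the mean of the four inner cells (0.375)
inner_mean = table.iloc[:-1, :-1].to_numpy().mean()
print(table)
print("corner:", table.loc["All", "All"], "mean of cells:", inner_mean)
```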

### Tablas de frecuencia

The number of elements of each kind can be counted by building a frequency table. This is known as *cross-tabulation* and is done with the `crosstab` function.

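```python
print(titanic.shape)
titanic[['sex', 'survived']].head(15).T
```

```
(891, 15)
```

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sex | male | female | female | female | male | male | male | male | female | female | female | female | male | male | female |
| survived | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |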


```python
pd.crosstab(titanic.sex, titanic.survived, margins=True)
```

| sex \ survived | 0 | 1 | All |
|---|---|---|---|
| female | 81 | 233 | 314 |
| male | 468 | 109 | 577 |
| All | 549 | 342 | 891 |
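`crosstab` essentially counts the rows for each pair of keys. A hypothetical seven-row sketch reproduces its counts with `groupby(...).size()` followed by `unstack`, and checks that the `All` corner equals the number of rows:

```python
import pandas as pd

# Hypothetical toy data (values invented for illustration)
df = pd.DataFrame({
    "sex":      ["female", "female", "female", "male", "male", "male", "male"],
    "survived": [1, 1, 0, 0, 0, 0, 1],
})

ct = pd.crosstab(df["sex"], df["survived"])

# Same counts with groupby: rows per (sex, survived) pair, survived moved to columns
by_size = df.groupby(["sex", "survived"]).size().unstack(fill_value=0)
pd.testing.assert_frame_equal(ct, by_size, check_dtype=False, check_names=False)

with_totals = pd.crosstab(df["sex"], df["survived"], margins=True)
print(with_totals)
```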


```python
# normalize=True divides every cell by the total number of passengers (891)
pd.crosstab(titanic.sex, titanic.survived, margins=True, normalize=True)
```

| sex \ survived | 0 | 1 | All |
|---|---|---|---|
| female | 0.090909 | 0.261504 | 0.352413 |
| male | 0.525253 | 0.122334 | 0.647587 |
| All | 0.616162 | 0.383838 | 1.0 |
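Besides `normalize=True` (every cell divided by the grand total), `crosstab` accepts `normalize='index'`, which divides each row by its own total (for example, the survival rate within each sex), and `normalize='columns'`, which does the same per column. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (values invented for illustration)
df = pd.DataFrame({
    "sex":      ["female", "female", "female", "male", "male", "male", "male"],
    "survived": [1, 1, 0, 0, 0, 0, 1],
})

share_all = pd.crosstab(df["sex"], df["survived"], normalize=True)       # cells sum to 1
share_row = pd.crosstab(df["sex"], df["survived"], normalize="index")    # each row sums to 1
share_col = pd.crosstab(df["sex"], df["survived"], normalize="columns")  # each column sums to 1

print(share_row)  # share of survivors and non-survivors within each sex
```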


```python
pd.crosstab([titanic.sex, titanic.embark_town], titanic.survived, margins=True)
```

| sex | embark_town | 0 | 1 | All |
|---|---|---|---|---|
| female | Cherbourg | 9 | 64 | 73 |
| female | Queenstown | 9 | 27 | 36 |
| female | Southampton | 63 | 140 | 203 |
| male | Cherbourg | 66 | 29 | 95 |
| male | Queenstown | 38 | 3 | 41 |
| male | Southampton | 364 | 77 | 441 |
| All | | 549 | 340 | 889 |
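The `All` row totals 889 rather than 891 because the passengers with a missing `embark_town` are not counted. A sketch with hypothetical data shows the effect and one workaround: replacing NaN with an explicit label (here the made-up value `"Unknown"`) before cross-tabulating.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing embarkation town
df = pd.DataFrame({
    "sex":         ["female", "female", "male", "male"],
    "embark_town": ["Cherbourg", np.nan, "Southampton", "Southampton"],
    "survived":    [1, 1, 0, 1],
})

# The row whose key is NaN is left out of the counts
dropped = pd.crosstab([df["sex"], df["embark_town"]], df["survived"])

# Giving NaN an explicit label keeps those rows as a group of their own
kept = pd.crosstab([df["sex"], df["embark_town"].fillna("Unknown")], df["survived"])
print(kept)
```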
