<a href="https://colab.research.google.com/github/cristiandarioortegayubro/BDS/blob/main/pandas/bds_pandas_005_00.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<p align="center">
<img src="https://github.com/cristiandarioortegayubro/BDS/blob/main/images/Logo%20Pandas.png?raw=true">
</p>


 # **<font color="DeepPink">Data aggregation and group operations 🐼</font>**

<p align="justify">
👀 Categorizing a dataset and applying a function to each group, whether an aggregation or a transformation, can be a critical component of a data analysis workflow. After loading, merging, and preparing a dataset, you may need to compute group statistics or generate pivot tables for reports or visualizations.
<br><br>

</p>

<p align="justify"> 👀 By convention, <code>pandas</code> (together with <code>NumPy</code>) is imported like this:  </p>

In [None]:
import numpy as np
import pandas as pd

 # **<font color="DeepPink">Getting the data and creating the DataFrame</font>**

<p align="justify">
👀 We are going to work with a census dataset. The data were extracted by Barry Becker from the 1994 United States Census database. The extracted records are reasonably clean, though not entirely so.
<br><br>
This dataset is used to build a prediction model that determines whether a person earns more than 50K a year based on their characteristics. The subject area is socioeconomic; each column of the dataset is described below.
</p>

The columns are:

- Age: `age`
- Work class: `workclass`
- Education: `education`
- Education (numeric encoding): `education-num`
- Marital status: `marital-status`
- Occupation: `occupation`
- Relationship: `relationship`
- Race: `race`
- Sex: `sex`
- Capital gain: `capital-gain`
- Capital loss: `capital-loss`
- Hours per week: `hours-per-week`
- Native country: `native-country`
- Class: `class`, the income bracket to predict (`<=50K` or `>50K`)

In [None]:
adult_census = pd.read_csv("https://raw.githubusercontent.com/cristiandarioortegayubro/BDS/main/datasets/adult_census.csv")

 ## **<font color="DeepPink">DataFrame properties</font>**

<p align="justify"> 👀 Let's display our <code>DataFrame</code>:  </p>

In [None]:
adult_census

| | age | workclass | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | Private | 11th | 7 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 1 | 38 | Private | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
| 2 | 28 | Local-gov | Assoc-acdm | 12 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 3 | 44 | Private | Some-college | 10 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688 | 0 | 40 | United-States | >50K |
| 4 | 18 | ? | Some-college | 10 | Never-married | ? | Own-child | White | Female | 0 | 0 | 30 | United-States | <=50K |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 48837 | 27 | Private | Assoc-acdm | 12 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | <=50K |
| 48838 | 40 | Private | HS-grad | 9 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 48839 | 58 | Private | HS-grad | 9 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 48840 | 22 | Private | HS-grad | 9 | Never-married | Adm-clerical | Own-child | White | Male | 0 | 0 | 20 | United-States | <=50K |


 ### **<font color="DeepPink">The axes property</font>**

<p align="justify"> 👀 Axes of the <code>DataFrame</code>: the row index and the column index.  </p>

In [None]:
adult_census.axes

[RangeIndex(start=0, stop=48842, step=1),
 Index(['age', 'workclass', 'education', 'education-num', 'marital-status',
        'occupation', 'relationship', 'race', 'sex', 'capital-gain',
        'capital-loss', 'hours-per-week', 'native-country', 'class'],
       dtype='object')]

 ### **<font color="DeepPink">The dtypes property</font>**

<p align="justify"> 👀 Data types of the <code>DataFrame</code> columns:  </p>

In [None]:
adult_census.dtypes

age                int64
workclass         object
education         object
education-num      int64
marital-status    object
occupation        object
relationship      object
race              object
sex               object
capital-gain       int64
capital-loss       int64
hours-per-week     int64
native-country    object
class             object
dtype: object

 ### **<font color="DeepPink">The ndim property</font>**

<p align="justify"> 👀 Number of dimensions of the <code>DataFrame</code>:  </p>

In [None]:
adult_census.ndim

2

 ### **<font color="DeepPink">The shape property</font>**

<p align="justify"> 👀 Number of rows and columns of the <code>DataFrame</code>:  </p>

In [None]:
adult_census.shape

(48842, 14)

 ### **<font color="DeepPink">The size property</font>**

<p align="justify"> 👀 Total number of elements in the <code>DataFrame</code>, that is, rows × columns:  </p>

In [None]:
adult_census.size

683788

In [None]:
adult_census.shape[0] * adult_census.shape[1]  # rows × columns, equal to size

683788

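 ### **<font color="DeepPink">The values property</font>**

<p align="justify"> 👀 Values of the <code>DataFrame</code> as a <code>NumPy</code> array. Because the columns mix integers and strings, the array has dtype <code>object</code>. The pandas documentation recommends the equivalent <code>to_numpy()</code> method over <code>values</code>.  </p>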

In [None]:
adult_census.values

array([[25, ' Private', ' 11th', ..., 40, ' United-States', ' <=50K'],
       [38, ' Private', ' HS-grad', ..., 50, ' United-States', ' <=50K'],
       [28, ' Local-gov', ' Assoc-acdm', ..., 40, ' United-States',
        ' >50K'],
       ...,
       [58, ' Private', ' HS-grad', ..., 40, ' United-States', ' <=50K'],
       [22, ' Private', ' HS-grad', ..., 20, ' United-States', ' <=50K'],
       [52, ' Self-emp-inc', ' HS-grad', ..., 40, ' United-States',
        ' >50K']], dtype=object)
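<p align="justify"> 👀 The array above shows that the text values carry a leading space (for example <code>' Private'</code>), which suggests, as an assumption, that the CSV file separates fields with a comma followed by a space. That space matters when filtering or grouping by label. The sketch below uses a small hypothetical in-memory CSV in that assumed format to compare two ways of removing it: the <code>skipinitialspace</code> parameter of <code>pd.read_csv</code> and <code>Series.str.strip()</code>.</p>

```python
import io

import pandas as pd

# Hypothetical sample in the assumed format: a space after each comma
raw = "age,workclass,class\n25, Private, <=50K\n28, Local-gov, >50K\n"

# Reading as-is keeps the leading space in the string values
df_raw = pd.read_csv(io.StringIO(raw))

# Option 1: let the parser skip whitespace right after each delimiter
df_clean = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Option 2: strip the text columns after loading
df_fixed = df_raw.copy()
for col in ["workclass", "class"]:
    df_fixed[col] = df_fixed[col].str.strip()

# Comparing against the unstripped label finds nothing
print((df_raw["workclass"] == "Private").sum())    # 0
print((df_clean["workclass"] == "Private").sum())  # 1
```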

 ## **<font color="DeepPink">Common DataFrame methods</font>**

 ### **<font color="DeepPink">The describe( ) method</font>**

<p align="justify"> 👀 Summary statistics for the numeric columns of the <code>DataFrame</code>, rounded to two decimals and transposed with <code>.T</code>:  </p>

In [None]:
adult_census.describe().round(2).T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| age | 48842.0 | 38.64 | 13.71 | 17.0 | 28.0 | 37.0 | 48.0 | 90.0 |
| education-num | 48842.0 | 10.08 | 2.57 | 1.0 | 9.0 | 10.0 | 12.0 | 16.0 |
| capital-gain | 48842.0 | 1079.07 | 7452.02 | 0.0 | 0.0 | 0.0 | 0.0 | 99999.0 |
| capital-loss | 48842.0 | 87.5 | 403.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4356.0 |
| hours-per-week | 48842.0 | 40.42 | 12.39 | 1.0 | 40.0 | 40.0 | 45.0 | 99.0 |


<p align="justify"> 👀 Summary for the categorical (text) columns of the <code>DataFrame</code>, selected with <code>include=object</code>:  </p>

In [None]:
adult_census.describe(include=object).round(2).T

| | count | unique | top | freq |
|---|---|---|---|---|
| workclass | 48842 | 9 | Private | 33906 |
| education | 48842 | 16 | HS-grad | 15784 |
| marital-status | 48842 | 7 | Married-civ-spouse | 22379 |
| occupation | 48842 | 15 | Prof-specialty | 6172 |
| relationship | 48842 | 6 | Husband | 19716 |
| race | 48842 | 5 | White | 41762 |
| sex | 48842 | 2 | Male | 32650 |
| native-country | 48842 | 42 | United-States | 43832 |
| class | 48842 | 2 | <=50K | 37155 |


 ### **<font color="DeepPink">The info( ) method</font>**

In [None]:
adult_census.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48842 entries, 0 to 48841
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   age             48842 non-null  int64 
 1   workclass       48842 non-null  object
 2   education       48842 non-null  object
 3   education-num   48842 non-null  int64 
 4   marital-status  48842 non-null  object
 5   occupation      48842 non-null  object
 6   relationship    48842 non-null  object
 7   race            48842 non-null  object
 8   sex             48842 non-null  object
 9   capital-gain    48842 non-null  int64 
 10  capital-loss    48842 non-null  int64 
 11  hours-per-week  48842 non-null  int64 
 12  native-country  48842 non-null  object
 13  class           48842 non-null  object
dtypes: int64(5), object(9)
memory usage: 5.2+ MB


 # **<font color="DeepPink">Introduction to grouping</font>**

In [None]:
adult_census.head(3)

| | age | workclass | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | Private | 11th | 7 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 1 | 38 | Private | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
| 2 | 28 | Local-gov | Assoc-acdm | 12 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |


<p align="justify"> 👀 Let's group the <code>age</code> column of the <code>DataFrame</code> by its own values:  </p>

In [None]:
edades = adult_census["age"].groupby(adult_census["age"])
edades

<pandas.core.groupby.generic.SeriesGroupBy object at 0x7efe598c5540>

<p align="justify"> 👀 The variable <code>edades</code> is now a special <code>GroupBy</code> object. It has not actually computed anything yet; it holds all the information needed to apply an operation to each group later, for example counting how many rows there are for each age:  </p>

In [None]:
edades.count()

age
17     595
18     862
19    1053
20    1113
21    1096
      ... 
86       1
87       3
88       6
89       2
90      55
Name: age, Length: 74, dtype: int64
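<p align="justify"> 👀 To see that a <code>GroupBy</code> object only stores the grouping and computes on demand, here is a small sketch with a hypothetical Series of ages. Grouping a Series by its own values and counting gives the same numbers as <code>value_counts()</code> sorted by label, because each group contains exactly the rows holding that value.</p>

```python
import pandas as pd

# Hypothetical sample of ages
ages = pd.Series([25, 38, 28, 25, 38, 25], name="age")

grouped = ages.groupby(ages)   # nothing is aggregated yet
print(grouped.ngroups)         # number of distinct ages: 3
print(grouped.groups)          # mapping: age -> row labels in that group

counts = grouped.count()       # the computation happens here
print(counts)

# Same numbers via value_counts(), sorted by age instead of by frequency
same = ages.value_counts().sort_index()
print((counts.values == same.values).all())
```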

<p align="justify"> 👀 Indexing a <code>GroupBy</code> object created from a <code>DataFrame</code> with a column name (or a list of names) selects the subset of columns to aggregate. In this case, we compute the mean hours worked per week for each age:</p>

In [None]:
adult_census.groupby("age")[["hours-per-week"]].mean(numeric_only=True).round(2)

| age | hours-per-week |
|---|---|
| 17 | 21.14 |
| 18 | 25.75 |
| 19 | 30.56 |
| 20 | 32.43 |
| 21 | 34.25 |
| ... | ... |
| 86 | 40.00 |
| 87 | 7.00 |
| 88 | 35.83 |
| 89 | 30.00 |


<p align="justify"> ⬆ These values could be plotted as a bar or line chart of mean weekly hours by age.</p>

<p align="justify"> 👀 Another example, grouping by work class (<code>workclass</code>):</p>
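<p align="justify"> 👀 The code above selects the column with double brackets, <code>[["hours-per-week"]]</code>, so the result is a one-column <code>DataFrame</code>; selecting with a single column name returns a <code>Series</code> instead. A minimal sketch on a hypothetical miniature of the census columns:</p>

```python
import pandas as pd

# Hypothetical miniature of the census columns used above
df = pd.DataFrame({
    "age": [25, 25, 38, 38, 38],
    "hours-per-week": [40, 30, 50, 40, 45],
})

as_frame = df.groupby("age")[["hours-per-week"]].mean()  # list of names -> DataFrame
as_series = df.groupby("age")["hours-per-week"].mean()   # single name -> Series

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
print(as_series)
```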

In [None]:
workclass = adult_census.groupby("workclass")[["hours-per-week"]].mean(numeric_only=True).round(2)
workclass

| workclass | hours-per-week |
|---|---|
| ? | 31.81 |
| Federal-gov | 41.51 |
| Local-gov | 40.85 |
| Never-worked | 28.9 |
| Private | 40.27 |
| Self-emp-inc | 48.57 |
| Self-emp-not-inc | 44.4 |
| State-gov | 39.09 |
| Without-pay | 33.95 |


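 ## **<font color="DeepPink">Grouping with functions</font>**

<p align="justify"> 👀 Using a function is a more generic way of defining a group mapping than using a dictionary or a Series. Any function passed as a group key is called once per index value (or once per column value when grouping along the columns), and its return values are used as the group names. Below, <code>len</code> is applied to each label in the index of the <code>workclass</code> result, so the work classes are grouped by the length of their name and the means within each group are averaged. The raw labels include a leading space (for example, <code>' Private'</code> has length 8), so each length is one more than the visible name.</p>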

In [None]:
workclass.groupby(len).mean().round(2)

| workclass | hours-per-week |
|---|---|
| 2 | 31.81 |
| 8 | 40.27 |
| 10 | 39.97 |
| 12 | 37.73 |
| 13 | 38.74 |
| 17 | 44.4 |
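<p align="justify"> 👀 A self-contained sketch of the same idea on hypothetical values, with labels written without the leading space (so the lengths differ from the output above). It also shows the dictionary mapping that the paragraph compares functions against; the <code>sector</code> mapping is an invented example.</p>

```python
import pandas as pd

# Hypothetical mean hours per week, indexed by work-class label
hours = pd.DataFrame(
    {"hours-per-week": [41.0, 40.0, 39.0, 44.0]},
    index=pd.Index(["Federal-gov", "Local-gov", "State-gov", "Private"], name="workclass"),
)

# len is called once per index label; its return value is the group key
by_length = hours.groupby(len).mean()
print(by_length)

# The same kind of grouping with an explicit dict: label -> group name
sector = {"Federal-gov": "public", "Local-gov": "public",
          "State-gov": "public", "Private": "private"}
by_sector = hours.groupby(sector).mean()
print(by_sector)
```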


 ## **<font color="DeepPink">Aggregations</font>**

<p align="justify"> 👀 Aggregations refer to any data transformation that produces scalar values from arrays. Many common aggregations have optimized implementations. The following <code>GroupBy</code> methods are available (strictly speaking, the cumulative methods and <code>rank( )</code> return one value per row rather than one per group, so they behave as transformations):</p>

- count( )
- cummin( )
- cummax( )
- cumsum( )
- cumprod( )
- first( )
- last( )
- mean( )
- median( )
- min( )
- max( )
- ohlc( )
- prod( )
- quantile( )
- rank( )
- size( )
- sum( )
- std( )
- var( )
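<p align="justify"> 👀 A small sketch on hypothetical data illustrating some of the methods above: <code>count()</code> skips missing values while <code>size()</code> counts every row, and a cumulative method such as <code>cumsum()</code> returns one value per row instead of one per group.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in group "a"
df = pd.DataFrame({
    "key": ["a", "a", "a", "b", "b"],
    "value": [1.0, np.nan, 3.0, 4.0, 5.0],
})
g = df.groupby("key")["value"]

print(g.count())   # non-missing values per group: a -> 2, b -> 2
print(g.size())    # rows per group, NaN included: a -> 3, b -> 2
print(g.sum())     # one scalar per group: a -> 4.0, b -> 9.0
print(g.first())   # first non-missing value per group
print(g.cumsum())  # same length as df: running total within each group
```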

<p align="justify"> 👀 You can also use your own aggregation functions. To do so, pass the function to the <code>agg()</code> method, as follows:</p>


```python
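agrupado = df.groupby("key_column")  # "key_column": placeholder for the grouping column(s)
agrupado.agg(mi_funcion)             # mi_funcion: your own function, called once per group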
```
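<p align="justify"> 👀 A runnable sketch of that pattern on a hypothetical miniature of the <code>tips</code> data used in the next section. The function <code>peak_to_peak</code> is our own (not part of pandas): it receives each group's values and returns the difference between the largest and the smallest.</p>

```python
import pandas as pd

# Hypothetical miniature of the tips data
df = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Sun", "Sun"],
    "tip": [1.0, 3.5, 2.0, 5.0, 4.0],
})

def peak_to_peak(arr):
    """Custom aggregation: spread between the largest and smallest value."""
    return arr.max() - arr.min()

agrupado = df.groupby("day")
result = agrupado["tip"].agg(peak_to_peak)
print(result)  # Sat -> 2.5, Sun -> 3.0
```

Custom functions like this are generally slower than the optimized built-in methods listed above, because they are called once per group from Python.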



 # **<font color="DeepPink">Column-wise and multiple function application</font>**

<p align="justify"> 👀 Let's look at a dataset with the following columns:
</p>

- `total_bill`: the total bill, in dollars
- `tip`: the tip, in dollars
- `sex`: the sex of the person paying
- `smoker`: whether there were smokers in the party
- `day`: the day of the week
- `time`: whether the meal was dinner or lunch
- `size`: the number of people in the party


In [None]:
tips = pd.read_csv("https://raw.githubusercontent.com/cristiandarioortegayubro/BDS/main/datasets/tips.csv")
tips

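| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |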


<p align="justify"> 👀 Let's add a column with the tip as a fraction of the total bill (for example, 0.059 means 5.9%):</p>


In [None]:
tips["tip_pct"] = (tips["tip"] / tips["total_bill"]).round(3)

In [None]:
tips

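| | total_bill | tip | sex | smoker | day | time | size | tip_pct |
|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0.059 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 0.161 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 | 0.167 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 0.140 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 0.147 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 | 0.204 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 | 0.074 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 | 0.088 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 | 0.098 |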


<p align="justify"> 👀 Now let's group the tips by <code>day</code> and by <code>smoker</code>:</p>


In [None]:
grouped = tips.groupby(["day", "smoker"])

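<p align="justify"> 👀 Aggregating a <code>Series</code> (or all the columns of a <code>DataFrame</code>) is a matter of calling <code>aggregate()</code>, or its alias <code>agg()</code>, with the desired function, or calling a method such as <code>mean()</code> or <code>std()</code>. First we select the <code>tip_pct</code> column of the grouped data:</p>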

In [None]:
grouped_pct = grouped["tip_pct"]

In [None]:
grouped_pct.agg("mean").round(2)

day   smoker
Fri   No        0.15
      Yes       0.17
Sat   No        0.16
      Yes       0.15
Sun   No        0.16
      Yes       0.19
Thur  No        0.16
      Yes       0.16
Name: tip_pct, dtype: float64

<p align="justify"> 👀 If we instead pass a list of functions or function names, such as <code>mean</code> and <code>std</code>, we get a <code>DataFrame</code> whose column names are taken from those functions:</p>

In [None]:
grouped_pct.agg(["mean", "std"]).round(2)

| day | smoker | mean | std |
|---|---|---|---|
| Fri | No | 0.15 | 0.03 |
| Fri | Yes | 0.17 | 0.05 |
| Sat | No | 0.16 | 0.04 |
| Sat | Yes | 0.15 | 0.06 |
| Sun | No | 0.16 | 0.04 |
| Sun | Yes | 0.19 | 0.15 |
| Thur | No | 0.16 | 0.04 |
| Thur | Yes | 0.16 | 0.04 |


<p align="justify"> Note that the resulting <code>DataFrame</code> has a hierarchical index (<code>MultiIndex</code>) with two levels: <code>day</code> and <code>smoker</code>.</p>

<p align="justify"> 👀 With a <code>DataFrame</code> there are more options: you can pass a list of functions to apply to all the selected columns, or define different functions for each column:</p>
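<p align="justify"> 👀 If you pass a list of <code>(name, function)</code> tuples instead, the first element of each tuple becomes the column name. A sketch on a hypothetical miniature of <code>tips</code>; the names <code>average</code> and <code>spread</code> are arbitrary choices:</p>

```python
import pandas as pd

# Hypothetical miniature of the tips data
tips_small = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat"],
    "smoker": ["No", "No", "Yes", "Yes"],
    "tip_pct": [0.10, 0.20, 0.15, 0.25],
})

grouped_pct = tips_small.groupby(["day", "smoker"])["tip_pct"]

# Each (name, function) tuple sets the output column name
named = grouped_pct.agg([("average", "mean"), ("spread", "std")])
print(named.columns.tolist())  # ['average', 'spread']
print(named.index.nlevels)     # 2: day and smoker
```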

In [None]:
funciones = ["count", "mean", "max"]

In [None]:
resultados = grouped[["tip_pct", "total_bill"]].agg(funciones).round(3)

In [None]:
resultados

| day | smoker | (tip_pct, count) | (tip_pct, mean) | (tip_pct, max) | (total_bill, count) | (total_bill, mean) | (total_bill, max) |
|---|---|---|---|---|---|---|---|
| Fri | No | 4 | 0.152 | 0.188 | 4 | 18.42 | 22.75 |
| Fri | Yes | 15 | 0.175 | 0.263 | 15 | 16.813 | 40.17 |
| Sat | No | 45 | 0.158 | 0.292 | 45 | 19.662 | 48.33 |
| Sat | Yes | 42 | 0.148 | 0.326 | 42 | 21.277 | 50.81 |
| Sun | No | 57 | 0.16 | 0.253 | 57 | 20.507 | 48.17 |
| Sun | Yes | 19 | 0.187 | 0.71 | 19 | 24.12 | 45.35 |
| Thur | No | 45 | 0.16 | 0.266 | 45 | 17.113 | 41.19 |
| Thur | Yes | 17 | 0.164 | 0.241 | 17 | 19.191 | 43.11 |


<p align="justify"> This <code>DataFrame</code> has hierarchical labels on both the rows and the columns, with two levels each. To view only the <code>tip_pct</code> columns, index the result with that top-level column name:</p>

In [None]:
resultados["tip_pct"]

| day | smoker | count | mean | max |
|---|---|---|---|---|
| Fri | No | 4 | 0.152 | 0.188 |
| Fri | Yes | 15 | 0.175 | 0.263 |
| Sat | No | 45 | 0.158 | 0.292 |
| Sat | Yes | 42 | 0.148 | 0.326 |
| Sun | No | 57 | 0.16 | 0.253 |
| Sun | Yes | 19 | 0.187 | 0.71 |
| Thur | No | 45 | 0.16 | 0.266 |
| Thur | Yes | 17 | 0.164 | 0.241 |


In [None]:
resultados["total_bill"]

| day | smoker | count | mean | max |
|---|---|---|---|---|
| Fri | No | 4 | 18.42 | 22.75 |
| Fri | Yes | 15 | 16.813 | 40.17 |
| Sat | No | 45 | 19.662 | 48.33 |
| Sat | Yes | 42 | 21.277 | 50.81 |
| Sun | No | 57 | 20.507 | 48.17 |
| Sun | Yes | 19 | 24.12 | 45.35 |
| Thur | No | 45 | 17.113 | 41.19 |
| Thur | Yes | 17 | 19.191 | 43.11 |
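<p align="justify"> 👀 A single column of a two-level column index can also be selected with a <code>(top, sub)</code> tuple, and the two levels can be joined into flat names when a single header row is more convenient (for example, before exporting to CSV). A sketch on a hypothetical miniature of <code>tips</code>:</p>

```python
import pandas as pd

# Hypothetical miniature of the tips data
tips_small = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat"],
    "tip_pct": [0.10, 0.20, 0.15],
    "total_bill": [20.0, 10.0, 30.0],
})

summary = tips_small.groupby("day")[["tip_pct", "total_bill"]].agg(["count", "max"])

# One column of the two-level header, addressed with a (top, sub) tuple
print(summary[("total_bill", "max")])  # Fri -> 20.0, Sat -> 30.0

# Flatten the header: ("tip_pct", "count") -> "tip_pct_count"
flat = summary.copy()
flat.columns = ["_".join(col) for col in flat.columns]
print(flat.columns.tolist())
```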


<p align="justify"> 👀 Now we want to apply different functions to different columns of a <code>DataFrame</code>. We do this by passing a dictionary to <code>agg()</code>: the keys are the names of the columns to aggregate, and the values are the function (or list of functions) to apply to each one:</p>

In [None]:
grouped.agg({"tip": "max", "size": "sum"})  # string names avoid the NumPy-callable warning in recent pandas

| day | smoker | tip | size |
|---|---|---|---|
| Fri | No | 3.5 | 9 |
| Fri | Yes | 4.73 | 31 |
| Sat | No | 9.0 | 115 |
| Sat | Yes | 10.0 | 104 |
| Sun | No | 6.0 | 167 |
| Sun | Yes | 6.5 | 49 |
| Thur | No | 6.7 | 112 |
| Thur | Yes | 5.0 | 40 |


In [None]:
grouped.agg({"tip_pct" : ["min", "max", "mean", "std"],
             "size" : "sum"}).round(3)

| day | smoker | (tip_pct, min) | (tip_pct, max) | (tip_pct, mean) | (tip_pct, std) | (size, sum) |
|---|---|---|---|---|---|---|
| Fri | No | 0.12 | 0.188 | 0.152 | 0.028 | 9 |
| Fri | Yes | 0.104 | 0.263 | 0.175 | 0.051 | 31 |
| Sat | No | 0.057 | 0.292 | 0.158 | 0.04 | 115 |
| Sat | Yes | 0.036 | 0.326 | 0.148 | 0.061 | 104 |
| Sun | No | 0.059 | 0.253 | 0.16 | 0.042 | 167 |
| Sun | Yes | 0.066 | 0.71 | 0.187 | 0.154 | 49 |
| Thur | No | 0.073 | 0.266 | 0.16 | 0.039 | 112 |
| Thur | Yes | 0.09 | 0.241 | 0.164 | 0.039 | 40 |
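<p align="justify"> 👀 pandas (version 0.25 and later) also supports <em>named aggregation</em>: each keyword argument of <code>agg()</code> gives the output column name and a <code>(column, function)</code> pair, which produces a flat header instead of the two-level one above. A sketch on hypothetical data; the output names <code>tip_pct_max</code> and <code>people</code> are arbitrary choices:</p>

```python
import pandas as pd

# Hypothetical miniature of the tips data
tips_small = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat"],
    "smoker": ["No", "No", "Yes"],
    "tip_pct": [0.10, 0.20, 0.15],
    "size": [2, 3, 4],
})

summary = tips_small.groupby(["day", "smoker"]).agg(
    tip_pct_max=("tip_pct", "max"),  # output column = (input column, function)
    people=("size", "sum"),
)
print(summary)
```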


 ## **<font color="DeepPink">Returning aggregated data without row indexes</font>**

<p align="justify"> 👀 We can disable the group index with the parameter <code>as_index=False</code>. The group keys then become ordinary columns instead of a hierarchical index, and the result looks like this:</p>

In [None]:
grouped = tips.groupby(["day", "smoker"], as_index=False)

In [None]:
grouped.mean(numeric_only=True).round(3)

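| | day | smoker | total_bill | tip | size | tip_pct |
|---|---|---|---|---|---|---|
| 0 | Fri | No | 18.42 | 2.812 | 2.25 | 0.152 |
| 1 | Fri | Yes | 16.813 | 2.714 | 2.067 | 0.175 |
| 2 | Sat | No | 19.662 | 3.103 | 2.556 | 0.158 |
| 3 | Sat | Yes | 21.277 | 2.875 | 2.476 | 0.148 |
| 4 | Sun | No | 20.507 | 3.168 | 2.93 | 0.16 |
| 5 | Sun | Yes | 24.12 | 3.517 | 2.579 | 0.187 |
| 6 | Thur | No | 17.113 | 2.674 | 2.489 | 0.16 |
| 7 | Thur | Yes | 19.191 | 3.03 | 2.353 | 0.164 |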
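<p align="justify"> 👀 Grouping with <code>as_index=False</code> should give the same table as aggregating with the default hierarchical index and then calling <code>reset_index()</code>. A small sketch that checks this on hypothetical data:</p>

```python
import pandas as pd

# Hypothetical miniature of the tips data
tips_small = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat"],
    "smoker": ["No", "No", "Yes"],
    "tip": [1.0, 3.0, 2.0],
})

# Group keys as ordinary columns, directly
flat = tips_small.groupby(["day", "smoker"], as_index=False)["tip"].mean()

# Group keys in a MultiIndex, moved back into columns afterwards
via_reset = tips_small.groupby(["day", "smoker"])["tip"].mean().reset_index()

print(flat)
print(flat.equals(via_reset))
```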


 ## **<font color="DeepPink">Using apply( ) with groupby( )</font>**

<p align="justify"> 👀 First, let's write a function that returns the rows with the highest values in a given column; by default, the $5$ rows with the highest tip percentage (<code>tip_pct</code>):</p>

In [None]:
def top(df, n=5, column="tip_pct"):
    # Sort in descending order by `column` and keep the first n rows
    return df.sort_values(column, ascending=False)[:n]

<p align="justify"> 👀 Now we apply the function to the whole <code>DataFrame</code>, asking for the top 6 rows:</p>

In [None]:
top(tips, n=6)

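| | total_bill | tip | sex | smoker | day | time | size | tip_pct |
|---|---|---|---|---|---|---|---|---|
| 172 | 7.25 | 5.15 | Male | Yes | Sun | Dinner | 2 | 0.71 |
| 178 | 9.6 | 4.0 | Female | Yes | Sun | Dinner | 2 | 0.417 |
| 67 | 3.07 | 1.0 | Female | Yes | Sat | Dinner | 1 | 0.326 |
| 232 | 11.61 | 3.39 | Male | No | Sat | Dinner | 2 | 0.292 |
| 183 | 23.17 | 6.5 | Male | Yes | Sun | Dinner | 4 | 0.281 |
| 109 | 14.31 | 4.0 | Female | Yes | Sat | Dinner | 2 | 0.28 |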


<p align="justify"> 👀 If we now group by <code>smoker</code> and call <code>apply</code> with the function we just wrote, it runs on each group separately, and the results are concatenated with the group key as the outer level of the index:</p>

In [None]:
tips.groupby("smoker").apply(top)

| smoker | | total_bill | tip | sex | smoker | day | time | size | tip_pct |
|---|---|---|---|---|---|---|---|---|---|
| No | 232 | 11.61 | 3.39 | Male | No | Sat | Dinner | 2 | 0.292 |
| No | 149 | 7.51 | 2.0 | Male | No | Thur | Lunch | 2 | 0.266 |
| No | 51 | 10.29 | 2.6 | Female | No | Sun | Dinner | 2 | 0.253 |
| No | 185 | 20.69 | 5.0 | Male | No | Sun | Dinner | 5 | 0.242 |
| No | 88 | 24.71 | 5.85 | Male | No | Thur | Lunch | 2 | 0.237 |
| Yes | 172 | 7.25 | 5.15 | Male | Yes | Sun | Dinner | 2 | 0.71 |
| Yes | 178 | 9.6 | 4.0 | Female | Yes | Sun | Dinner | 2 | 0.417 |
| Yes | 67 | 3.07 | 1.0 | Female | Yes | Sat | Dinner | 1 | 0.326 |
| Yes | 183 | 23.17 | 6.5 | Male | Yes | Sun | Dinner | 4 | 0.281 |
| Yes | 109 | 14.31 | 4.0 | Female | Yes | Sat | Dinner | 2 | 0.28 |


<p align="justify"> 👀 We can also pass other arguments to the function; <code>apply</code> forwards any extra keyword arguments to it:</p>

In [None]:
tips.groupby(["smoker", "day"]).apply(top, n=1, column="total_bill")

| smoker | day | | total_bill | tip | sex | smoker | day | time | size | tip_pct |
|---|---|---|---|---|---|---|---|---|---|---|
| No | Fri | 94 | 22.75 | 3.25 | Female | No | Fri | Dinner | 2 | 0.143 |
| No | Sat | 212 | 48.33 | 9.0 | Male | No | Sat | Dinner | 4 | 0.186 |
| No | Sun | 156 | 48.17 | 5.0 | Male | No | Sun | Dinner | 6 | 0.104 |
| No | Thur | 142 | 41.19 | 5.0 | Male | No | Thur | Lunch | 5 | 0.121 |
| Yes | Fri | 95 | 40.17 | 4.73 | Male | Yes | Fri | Dinner | 4 | 0.118 |
| Yes | Sat | 170 | 50.81 | 10.0 | Male | Yes | Sat | Dinner | 3 | 0.197 |
| Yes | Sun | 182 | 45.35 | 3.5 | Male | Yes | Sun | Dinner | 3 | 0.077 |
| Yes | Thur | 197 | 43.11 | 5.0 | Female | Yes | Thur | Lunch | 4 | 0.116 |
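<p align="justify"> 👀 By default, <code>apply</code> places the group keys as the outer level(s) of the result index, which is why <code>smoker</code> (and <code>day</code>) appear as index levels in the outputs above. Passing <code>group_keys=False</code> to <code>groupby()</code> keeps only the original row labels. A sketch on hypothetical data; <code>top_n</code> is a local helper modeled on <code>top</code>, and the value column is selected explicitly so that the grouping column is not passed to the function:</p>

```python
import pandas as pd

# Hypothetical miniature of the tips data
tips_small = pd.DataFrame({
    "smoker": ["No", "No", "Yes", "Yes"],
    "tip_pct": [0.10, 0.25, 0.30, 0.05],
})

def top_n(df, n=1, column="tip_pct"):
    """Return the n rows with the largest values in `column`."""
    return df.sort_values(column, ascending=False)[:n]

with_keys = tips_small.groupby("smoker")[["tip_pct"]].apply(top_n)
without_keys = tips_small.groupby("smoker", group_keys=False)[["tip_pct"]].apply(top_n)

print(with_keys.index.nlevels)      # 2: smoker + original row label
print(without_keys.index.tolist())  # [1, 2]: original row labels only
```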


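<p align="justify"> 👀 Another example: calling <code>describe()</code> on a grouped column computes summary statistics for each group:</p>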

In [None]:
result = tips.groupby("smoker")["tip_pct"].describe().round(3)

In [None]:
result

| smoker | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| No | 151.0 | 0.159 | 0.04 | 0.057 | 0.136 | 0.156 | 0.185 | 0.292 |
| Yes | 93.0 | 0.163 | 0.085 | 0.036 | 0.107 | 0.154 | 0.195 | 0.71 |
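<p align="justify"> 👀 Calling <code>describe()</code> on a grouped column behaves like applying <code>Series.describe()</code> to each group and then turning the statistics into columns with <code>unstack()</code>. A minimal sketch that checks this on hypothetical data (the frame <code>tips_small</code> is invented for illustration):</p>

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the tips data
tips_small = pd.DataFrame({
    "smoker": ["No", "No", "Yes", "Yes", "Yes"],
    "tip_pct": [0.10, 0.20, 0.15, 0.05, 0.30],
})
g = tips_small.groupby("smoker")["tip_pct"]

direct = g.describe()
# Describe each group, then pivot the statistic names into columns
manual = g.apply(lambda s: s.describe()).unstack()[direct.columns]

print(np.allclose(direct.to_numpy(), manual.to_numpy()))
```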


<br>
<br>
<p align="center"><b>
💗
<font color="DeepPink">
We have reached the end of our pandas Colab notebook. Keep on coding...
</font>
</b></p>
