# Analysis with Pandas and Kaggle (Core)

The goal of this activity is to put into practice everything learned about Pandas through the complete analysis of a dataset. Students must apply data loading, exploration, cleaning, transformation, and aggregation techniques to extract valuable insights. The activity does not include data visualization; it focuses solely on data analysis and manipulation with Pandas.

## Instructions

### 1. Environment Setup

* Make sure Pandas is installed in your working environment.
* Download the dataset.csv file from Kaggle. Choose a dataset that interests you and that does not require data visualization. Suggestions include datasets related to sales, purchases, products, etc.

### 2. Load the Data

* Load the CSV file into a Pandas DataFrame.
* Show the first 10 rows of the DataFrame to confirm that the data loaded correctly.

### 3. Initial Data Exploration

* Show the last 5 rows of the DataFrame.
* Use the info() method to get general information about the DataFrame, including the number of entries, column names, data types, and memory usage.
* Generate descriptive statistics for the DataFrame using the describe() method.

### 4. Data Cleaning

* Identify and handle missing data using appropriate techniques (filling with statistical values, interpolation, removal, etc.).
* Fix data types if necessary (for example, convert strings to dates).
* Remove duplicates, if any.

### 5. Data Transformation

* Create new columns based on operations on existing columns (for example, compute revenue from sales and prices).
* Normalize or standardize columns if necessary.
* Classify the data into relevant categories.

### 6. Data Analysis

* Group the data using groupby to obtain specific insights (for example, sales by product, sales by region, etc.).
* Apply aggregation functions such as sum, mean, count, min, max, std, and var.
* Use the apply method to perform more complex, custom operations.

# Solution

For this task, the following weather dataset was chosen: https://www.kaggle.com/datasets/prasad22/weather-data

### Loading the Data

In [2]:
import pandas as pd

camino_dataset = "../data/weather_data.csv"

df = pd.read_csv(camino_dataset)

# Show the first 10 rows of the DataFrame to confirm that the data loaded correctly.
df.head(10)

Unnamed: 0,Location,Date_Time,Temperature_C,Humidity_pct,Precipitation_mm,Wind_Speed_kmh
0,San Diego,2024-01-14 21:12:46,10.683001,41.195754,4.020119,8.23354
1,San Diego,2024-05-17 15:22:10,8.73414,58.319107,9.111623,27.715161
2,San Diego,2024-05-11 09:30:59,11.632436,38.820175,4.607511,28.732951
3,Philadelphia,2024-02-26 17:32:39,-8.628976,54.074474,3.18372,26.367303
4,San Antonio,2024-04-29 13:23:51,39.808213,72.899908,9.598282,29.898622
5,San Diego,2024-01-21 08:54:56,27.341055,49.023236,9.166543,27.473896
6,San Jose,2024-01-13 02:10:54,1.881883,65.742325,0.221709,1.073112
7,New York,2024-01-25 19:04:34,-6.894766,30.804894,8.027624,16.848337
8,New York,2024-03-29 05:20:30,0.963545,38.819158,3.640129,7.989024
9,San Jose,2024-05-18 09:14:02,-1.607088,82.198701,4.101493,25.647282


### Initial Data Exploration

#### Show the last 5 rows of the DataFrame

In [3]:
df.tail(5)

Unnamed: 0,Location,Date_Time,Temperature_C,Humidity_pct,Precipitation_mm,Wind_Speed_kmh
999995,Dallas,2024-01-01 20:29:48,23.416877,37.705024,3.819833,16.538119
999996,San Antonio,2024-01-20 15:59:48,6.75908,40.731036,8.182785,29.005558
999997,New York,2024-04-14 08:30:09,15.664465,62.201884,3.987558,0.403909
999998,Chicago,2024-05-12 20:10:43,18.999994,63.703245,4.294325,6.326036
999999,New York,2024-04-16 16:11:52,10.725351,43.804584,1.883292,15.363828


#### Use the info() method to get general information about the DataFrame, including the number of entries, column names, data types, and memory usage.

In [4]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 6 columns):
 #   Column            Non-Null Count    Dtype  
---  ------            --------------    -----  
 0   Location          1000000 non-null  object 
 1   Date_Time         1000000 non-null  object 
 2   Temperature_C     1000000 non-null  float64
 3   Humidity_pct      1000000 non-null  float64
 4   Precipitation_mm  1000000 non-null  float64
 5   Wind_Speed_kmh    1000000 non-null  float64
dtypes: float64(4), object(2)
memory usage: 45.8+ MB


#### Generate descriptive statistics for the DataFrame using the describe() method.

In [5]:
df.describe()

Unnamed: 0,Temperature_C,Humidity_pct,Precipitation_mm,Wind_Speed_kmh
count,1000000.0,1000000.0,1000000.0,1000000.0
mean,14.779705,60.02183,5.109639,14.997598
std,14.482558,17.324022,2.947997,8.663556
min,-19.969311,30.000009,9e-06,5.1e-05
25%,2.269631,45.0085,2.580694,7.490101
50%,14.778002,60.018708,5.109917,14.993777
75%,27.270489,75.043818,7.61375,22.51411
max,39.999801,89.999977,14.971583,29.999973


### Data Cleaning

#### Fix data types if necessary (for example, convert strings to dates).

In [6]:
# For convenience, convert the Date_Time column from string to the pandas datetime type.
df["Date_Time"] = pd.to_datetime(df["Date_Time"], format="%Y-%m-%d %H:%M:%S")

df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 6 columns):
 #   Column            Non-Null Count    Dtype         
---  ------            --------------    -----         
 0   Location          1000000 non-null  object        
 1   Date_Time         1000000 non-null  datetime64[ns]
 2   Temperature_C     1000000 non-null  float64       
 3   Humidity_pct      1000000 non-null  float64       
 4   Precipitation_mm  1000000 non-null  float64       
 5   Wind_Speed_kmh    1000000 non-null  float64       
dtypes: datetime64[ns](1), float64(4), object(1)
memory usage: 45.8+ MB


Unnamed: 0,Location,Date_Time,Temperature_C,Humidity_pct,Precipitation_mm,Wind_Speed_kmh
0,San Diego,2024-01-14 21:12:46,10.683001,41.195754,4.020119,8.23354
1,San Diego,2024-05-17 15:22:10,8.73414,58.319107,9.111623,27.715161
2,San Diego,2024-05-11 09:30:59,11.632436,38.820175,4.607511,28.732951
3,Philadelphia,2024-02-26 17:32:39,-8.628976,54.074474,3.18372,26.367303
4,San Antonio,2024-04-29 13:23:51,39.808213,72.899908,9.598282,29.898622
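The conversion above succeeds because every value in Date_Time matches the given format. As a hedged illustration of what could be done if some values did not, the sketch below uses toy strings (not taken from the dataset) and the `errors="coerce"` option of `pd.to_datetime`, which turns unparseable values into NaT instead of raising.

```python
import pandas as pd

# Toy strings, not from the weather dataset: one valid timestamp, one malformed value.
fechas_texto = pd.Series(["2024-01-14 21:12:46", "not a date"])

# errors="coerce" converts values that do not match the format into NaT.
fechas = pd.to_datetime(
    fechas_texto, format="%Y-%m-%d %H:%M:%S", errors="coerce"
)

print(fechas.isna().tolist())  # [False, True]
```

Rows that end up as NaT can then be counted with `isna()` and handled like any other missing value.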


In [7]:
# Show the descriptive statistics again.
df.describe()

Unnamed: 0,Date_Time,Temperature_C,Humidity_pct,Precipitation_mm,Wind_Speed_kmh
count,1000000,1000000.0,1000000.0,1000000.0,1000000.0
mean,2024-03-10 10:40:58.896321792,14.779705,60.02183,5.109639,14.997598
min,2024-01-01 00:00:06,-19.969311,30.000009,9e-06,5.1e-05
25%,2024-02-04 16:28:23.750000128,2.269631,45.0085,2.580694,7.490101
50%,2024-03-10 11:43:28,14.778002,60.018708,5.109917,14.993777
75%,2024-04-14 03:51:32.500000,27.270489,75.043818,7.61375,22.51411
max,2024-05-18 19:44:10,39.999801,89.999977,14.971583,29.999973
std,,14.482558,17.324022,2.947997,8.663556


Next, we check for null values.

In [8]:
from utils import obtener_estadisticas_datos_nulos

df_estadisticas_nulos = obtener_estadisticas_datos_nulos(df)
df_estadisticas_nulos

Unnamed: 0,datos sin NAs en q,Na en q,Na en %
Location,1000000,0,0.0
Date_Time,1000000,0,0.0
Temperature_C,1000000,0,0.0
Humidity_pct,1000000,0,0.0
Precipitation_mm,1000000,0,0.0
Wind_Speed_kmh,1000000,0,0.0


The dataset contains no null values.
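The helper `obtener_estadisticas_datos_nulos` is imported from a local `utils` module whose source is not shown. The sketch below is a hypothetical reconstruction based only on the column names visible in its output; the real helper may differ. Since the weather data has no nulls, toy data is used to show what the report looks like and how one of the options named in the assignment (filling with a statistic) could be applied.

```python
import pandas as pd


def obtener_estadisticas_datos_nulos(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical reconstruction: per-column non-null count, null count, null %."""
    nulos = df.isna().sum()
    return pd.DataFrame({
        "datos sin NAs en q": df.notna().sum(),
        "Na en q": nulos,
        # max(..., 1) avoids a division by zero on an empty DataFrame.
        "Na en %": (nulos / max(len(df), 1) * 100).round(2),
    })


# Toy data (not from the weather dataset) with two missing temperatures.
ejemplo = pd.DataFrame({
    "Location": ["Chicago", "Dallas", "Chicago", "Dallas"],
    "Temperature_C": [10.0, None, 20.0, None],
})
estadisticas = obtener_estadisticas_datos_nulos(ejemplo)
print(estadisticas)

# Fill the missing temperatures with the column median (15.0 for this toy data).
mediana = ejemplo["Temperature_C"].median()
ejemplo["Temperature_C"] = ejemplo["Temperature_C"].fillna(mediana)
```

Whether to fill, interpolate, or drop depends on the data; the median is used here only as an example.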

#### Remove duplicates, if any.

In [9]:
# First, check whether there are any duplicated rows.
cant_duplicados = df.duplicated().sum()
print(f"Number of duplicated rows: {cant_duplicados}")

Number of duplicated rows: 0


Since there are no duplicated rows, nothing needs to be removed.

If duplicates did exist, they could be removed like this: df.drop_duplicates(keep='first', inplace=True)

Note that keep determines which occurrence of each duplicated row is retained.
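A small sketch of how the `keep` parameter behaves, using toy data (not from the weather dataset) in which rows 0 and 1 are identical:

```python
import pandas as pd

# Toy data: rows 0 and 1 are exact duplicates.
ejemplo = pd.DataFrame({
    "Location": ["Chicago", "Chicago", "Dallas"],
    "Temperature_C": [10.0, 10.0, 20.0],
})

primero = ejemplo.drop_duplicates(keep="first")  # keeps the first occurrence
ultimo = ejemplo.drop_duplicates(keep="last")    # keeps the last occurrence
ninguno = ejemplo.drop_duplicates(keep=False)    # drops every duplicated row

print(primero.index.tolist())  # [0, 2]
print(ultimo.index.tolist())   # [1, 2]
print(ninguno.index.tolist())  # [2]
```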

In [10]:
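# Additionally, check whether any rows are duplicated on the Location and Date_Time columns alone.
cant_duplicados_location_datetime = df.duplicated(["Location", "Date_Time"]).sum()
print(f"Number of rows duplicated by location and date: {cant_duplicados_location_datetime}")

The run recorded for this notebook printed 0 here, but its print statement interpolated cant_duplicados (the full-row count from the previous cell) rather than cant_duplicados_location_datetime, which the cell above prints. That 0 therefore does not show whether rows duplicated on Location and Date_Time exist.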

If such duplicates exist, they can be removed like this: df.drop_duplicates(["Location", "Date_Time"], keep='first', inplace=True)

Additionally, if there are duplicates, decide which rows to drop and which to keep by defining a criterion based on the data.

### Data Transformation

#### Create new columns based on operations on existing columns

In [11]:
# As a test, add a new Temperature_F column by converting the temperature from Celsius to Fahrenheit.
df["Temperature_F"] = (df["Temperature_C"] * (9/5)) + 32

df.head()

Unnamed: 0,Location,Date_Time,Temperature_C,Humidity_pct,Precipitation_mm,Wind_Speed_kmh,Temperature_F
0,San Diego,2024-01-14 21:12:46,10.683001,41.195754,4.020119,8.23354,51.229402
1,San Diego,2024-05-17 15:22:10,8.73414,58.319107,9.111623,27.715161,47.721452
2,San Diego,2024-05-11 09:30:59,11.632436,38.820175,4.607511,28.732951,52.938385
3,Philadelphia,2024-02-26 17:32:39,-8.628976,54.074474,3.18372,26.367303,16.467843
4,San Antonio,2024-04-29 13:23:51,39.808213,72.899908,9.598282,29.898622,103.654783


In [12]:
# It is also possible to do this with .apply()

df["Temperature_F"] = df["Temperature_C"].apply(lambda x: (x * 9/5) + 32)

df.head()


Unnamed: 0,Location,Date_Time,Temperature_C,Humidity_pct,Precipitation_mm,Wind_Speed_kmh,Temperature_F
0,San Diego,2024-01-14 21:12:46,10.683001,41.195754,4.020119,8.23354,51.229402
1,San Diego,2024-05-17 15:22:10,8.73414,58.319107,9.111623,27.715161,47.721452
2,San Diego,2024-05-11 09:30:59,11.632436,38.820175,4.607511,28.732951,52.938385
3,Philadelphia,2024-02-26 17:32:39,-8.628976,54.074474,3.18372,26.367303,16.467843
4,San Antonio,2024-04-29 13:23:51,39.808213,72.899908,9.598282,29.898622,103.654783


#### Normalize or standardize columns if necessary.

In [13]:
df["Humidity_pct"] = df["Humidity_pct"] / 100

# Alternatively: df["Humidity_pct"] = df["Humidity_pct"].apply(lambda x: x / 100)

df.head()


Unnamed: 0,Location,Date_Time,Temperature_C,Humidity_pct,Precipitation_mm,Wind_Speed_kmh,Temperature_F
0,San Diego,2024-01-14 21:12:46,10.683001,0.411958,4.020119,8.23354,51.229402
1,San Diego,2024-05-17 15:22:10,8.73414,0.583191,9.111623,27.715161,47.721452
2,San Diego,2024-05-11 09:30:59,11.632436,0.388202,4.607511,28.732951,52.938385
3,Philadelphia,2024-02-26 17:32:39,-8.628976,0.540745,3.18372,26.367303,16.467843
4,San Antonio,2024-04-29 13:23:51,39.808213,0.728999,9.598282,29.898622,103.654783


In [14]:
df.describe().T

Unnamed: 0,count,mean,min,25%,50%,75%,max,std
Date_Time,1000000.0,2024-03-10 10:40:58.896321792,2024-01-01 00:00:06,2024-02-04 16:28:23.750000128,2024-03-10 11:43:28,2024-04-14 03:51:32.500000,2024-05-18 19:44:10,
Temperature_C,1000000.0,14.779705,-19.969311,2.269631,14.778002,27.270489,39.999801,14.482558
Humidity_pct,1000000.0,0.600218,0.3,0.450085,0.600187,0.750438,0.9,0.17324
Precipitation_mm,1000000.0,5.109639,0.000009,2.580694,5.109917,7.61375,14.971583,2.947997
Wind_Speed_kmh,1000000.0,14.997598,0.000051,7.490101,14.993777,22.51411,29.999973,8.663556
Temperature_F,1000000.0,58.603469,-3.94476,36.085336,58.600404,81.08688,103.999641,26.068605


Humidity now ranges from 0.3 to 0.9.
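Dividing by 100 converts a percentage into a fraction, so the values keep their original spread (0.3 to 0.9) rather than being mapped onto [0, 1]. For comparison, a minimal sketch of min-max normalization and z-score standardization on toy humidity values (not from the dataset) could look like this:

```python
import pandas as pd

# Toy humidity percentages, not taken from the weather dataset.
humedad = pd.Series([30.0, 45.0, 60.0, 90.0], name="Humidity_pct")

# Min-max normalization: maps the minimum to 0 and the maximum to 1.
min_max = (humedad - humedad.min()) / (humedad.max() - humedad.min())

# Z-score standardization: mean 0 and (sample) standard deviation 1.
z_score = (humedad - humedad.mean()) / humedad.std()

print(min_max.tolist())  # [0.0, 0.25, 0.5, 1.0]
```

Note that `Series.std()` uses the sample standard deviation (ddof=1) by default.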

#### Classify the data into relevant categories.

This is also covered in the data analysis section.
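One common way to classify a numeric column into categories is `pd.cut`. The sketch below is only an illustration: the bin edges and labels are hypothetical choices, not part of the original analysis, and the temperatures are toy values.

```python
import pandas as pd

# Toy temperatures in Celsius, not taken from the weather dataset.
temperaturas = pd.Series([-8.6, 1.9, 20.0, 27.3, 39.8], name="Temperature_C")

# Hypothetical bin edges and labels; intervals are closed on the right by default.
limites = [-float("inf"), 0, 15, 25, float("inf")]
etiquetas = ["freezing", "cold", "mild", "hot"]

categorias = pd.cut(temperaturas, bins=limites, labels=etiquetas)

print(categorias.tolist())  # ['freezing', 'cold', 'mild', 'hot', 'hot']
```

The resulting categorical column can be used as a groupby key in the same way as Location.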

### Data Analysis

#### Group the data using groupby to obtain specific insights (for example, sales by product, sales by region, etc.) and apply aggregation functions such as sum, mean, count, min, max, std, and var.

As an example of grouping and aggregation, one idea is to compute the mean temperature in Celsius and the mean humidity (now expressed as a fraction) for each city within a time range. The number of records per city can also be shown.

In [15]:
from datetime import datetime, timedelta

start_time = datetime(year=2024, month=1, day=1)
end_time = start_time + timedelta(days=3 * 30)

df_en_fechas = df[(df["Date_Time"] >= start_time) & (df["Date_Time"] <= end_time)]

df_en_fechas.groupby("Location")[["Temperature_C", "Humidity_pct"]].agg(["std", "mean", "size", "min", "max", "var"])


Location,Temperature_C,Temperature_C,Temperature_C,Temperature_C,Temperature_C,Temperature_C,Humidity_pct,Humidity_pct,Humidity_pct,Humidity_pct,Humidity_pct,Humidity_pct
,std,mean,size,min,max,var,std,mean,size,min,max,var
Chicago,14.455587,15.00101,64786,-9.998746,39.9982,208.964001,0.173226,0.600729,64786,0.300016,0.899989,0.030007
Dallas,14.461545,14.918718,64877,-9.999588,39.998211,209.136284,0.173767,0.599894,64877,0.300006,0.899991,0.030195
Houston,14.42851,14.947027,64965,-9.998008,39.998693,208.181895,0.173695,0.599897,64965,0.300007,0.899991,0.03017
Los Angeles,14.491698,15.027929,64565,-9.999461,39.998997,210.009322,0.173116,0.600384,64565,0.300005,0.89998,0.029969
New York,14.412721,15.006707,64754,-9.99987,39.998938,207.726537,0.173317,0.600959,64754,0.300011,0.899999,0.030039
Philadelphia,14.419662,15.057841,64459,-9.999282,39.999325,207.926652,0.172753,0.600344,64459,0.3,0.899993,0.029844
Phoenix,14.819533,11.640008,65365,-19.969311,39.982125,219.61857,0.173131,0.59987,65365,0.30003,0.899962,0.029974
San Antonio,14.450467,15.058338,64554,-9.999953,39.998271,208.815986,0.172856,0.599695,64554,0.3,0.899996,0.029879
San Diego,14.427006,14.931712,64650,-9.999986,39.999692,208.138508,0.17287,0.60074,64650,0.300013,0.899989,0.029884
San Jose,14.367822,14.929098,64638,-9.998813,39.999015,206.434302,0.17326,0.599872,64638,0.300018,0.899986,0.030019


In [16]:
# Another date range

start_time = datetime(year=2024, month=4, day=1)
end_time = start_time + timedelta(days=1)

df_en_fechas = df[(df["Date_Time"] >= start_time) & (df["Date_Time"] <= end_time)]

df_en_fechas.groupby("Location")[["Temperature_C", "Humidity_pct"]].agg(["std", "mean", "size", "min", "max", "var"])

Location,Temperature_C,Temperature_C,Temperature_C,Temperature_C,Temperature_C,Temperature_C,Humidity_pct,Humidity_pct,Humidity_pct,Humidity_pct,Humidity_pct,Humidity_pct
,std,mean,size,min,max,var,std,mean,size,min,max,var
Chicago,14.49857,14.936691,713,-9.947536,39.955782,210.20854,0.173764,0.5886,713,0.300145,0.899203,0.030194
Dallas,14.657061,14.45644,725,-9.997936,39.918339,214.829426,0.175719,0.592984,725,0.300188,0.899447,0.030877
Houston,14.294905,13.775213,747,-9.948709,39.957475,204.344305,0.179138,0.592647,747,0.300088,0.899493,0.03209
Los Angeles,14.478412,14.117301,784,-9.983988,39.932386,209.624413,0.175276,0.599401,784,0.301144,0.899313,0.030722
New York,14.370152,14.55261,658,-9.845151,39.936257,206.501274,0.171922,0.599895,658,0.302105,0.898785,0.029557
Philadelphia,14.440143,15.003104,760,-9.906354,39.874026,208.517734,0.171924,0.603745,760,0.301146,0.89999,0.029558
Phoenix,14.165175,14.964916,693,-9.965417,39.986844,200.652184,0.172741,0.605608,693,0.300653,0.899447,0.029839
San Antonio,14.191525,14.213242,739,-9.895227,39.855572,201.399375,0.178742,0.603905,739,0.300512,0.899828,0.031949
San Diego,14.922624,14.376765,792,-9.951691,39.89757,222.684719,0.171377,0.599526,792,0.300101,0.899086,0.02937
San Jose,14.537423,15.449468,683,-9.854595,39.963879,211.336675,0.174601,0.595652,683,0.300795,0.898876,0.030486


In [17]:
# Columns holding the year and month (and the full date) can also be added,
# to then group by them and compute further statistics.
df["Year_Month"] = df["Date_Time"].dt.strftime("%Y-%m")
df["Year_Month_Day"] = df["Date_Time"].dt.strftime("%Y-%m-%d")

In [18]:
# Sort by city and then by date to make the data
# easier to inspect.
df.sort_values(by=["Location", "Date_Time"], inplace=True)
df.head(10)

Unnamed: 0,Location,Date_Time,Temperature_C,Humidity_pct,Precipitation_mm,Wind_Speed_kmh,Temperature_F,Year_Month,Year_Month_Day
996349,Chicago,2024-01-01 00:03:25,5.222404,0.463957,6.984159,1.219788,41.400326,2024-01,2024-01-01
898802,Chicago,2024-01-01 00:07:14,-6.403267,0.424003,0.451659,0.979239,20.474119,2024-01,2024-01-01
484085,Chicago,2024-01-01 00:09:36,8.227097,0.58197,9.686955,10.769515,46.808775,2024-01,2024-01-01
989809,Chicago,2024-01-01 00:10:35,-2.955866,0.86984,0.783134,20.737256,26.679442,2024-01,2024-01-01
576663,Chicago,2024-01-01 00:10:59,28.382266,0.357556,0.064871,3.50288,83.088078,2024-01,2024-01-01
526810,Chicago,2024-01-01 00:13:44,28.630258,0.369097,8.442318,22.669876,83.534465,2024-01,2024-01-01
26654,Chicago,2024-01-01 00:16:51,20.658784,0.840157,7.095816,17.824529,69.185811,2024-01,2024-01-01
737445,Chicago,2024-01-01 00:20:08,7.851437,0.446669,8.633773,4.56248,46.132587,2024-01,2024-01-01
468343,Chicago,2024-01-01 00:21:36,9.436938,0.895538,0.727363,8.222289,48.986488,2024-01,2024-01-01
308904,Chicago,2024-01-01 00:22:51,36.646044,0.595766,9.285471,25.909385,97.962879,2024-01,2024-01-01


Finally, statistics can also be computed per month and per city.

In [19]:
df_estadisticas_mes_ciudad = df.groupby(
    by=["Year_Month", "Location"])[
        ["Temperature_C", "Humidity_pct", "Precipitation_mm", "Wind_Speed_kmh", "Temperature_F"]
    ].agg(["mean", "std", "min", "max"]).reset_index().sort_values("Year_Month", ascending=True)

df_estadisticas_mes_ciudad.head(10)

Unnamed: 0,Year_Month,Location,Temperature_C,Temperature_C,Temperature_C,Temperature_C,Humidity_pct,Humidity_pct,Humidity_pct,Humidity_pct,...,Precipitation_mm,Precipitation_mm,Wind_Speed_kmh,Wind_Speed_kmh,Wind_Speed_kmh,Wind_Speed_kmh,Temperature_F,Temperature_F,Temperature_F,Temperature_F
,,,mean,std,min,max,mean,std,min,max,...,min,max,mean,std,min,max,mean,std,min,max
0,2024-01,Chicago,14.85855,14.487067,-9.996121,39.995211,0.599856,0.173424,0.300016,0.899958,...,0.000303,9.999346,15.008815,8.697653,0.002459,29.998607,58.745389,26.076721,14.006982,103.991379
1,2024-01,Dallas,14.937326,14.476657,-9.996346,39.997409,0.599315,0.173584,0.300006,0.899991,...,0.000328,9.999968,14.967478,8.643823,0.000661,29.999971,58.887186,26.057982,14.006578,103.995337
2,2024-01,Houston,14.890978,14.480816,-9.996164,39.995989,0.601714,0.173491,0.30002,0.899975,...,0.000653,9.999774,15.003986,8.645307,0.00114,29.999947,58.80376,26.065468,14.006905,103.99278
3,2024-01,Los Angeles,15.045351,14.480871,-9.999461,39.998152,0.600298,0.172821,0.300005,0.89998,...,0.000167,9.997226,14.968723,8.673367,0.000394,29.99933,59.081631,26.065568,14.00097,103.996673
4,2024-01,New York,14.91393,14.386941,-9.999184,39.998678,0.600954,0.173403,0.300031,0.899981,...,0.00034,9.999992,14.989964,8.658113,0.000465,29.997098,58.845074,25.896493,14.001468,103.99762
5,2024-01,Philadelphia,15.002061,14.430391,-9.995552,39.999325,0.599551,0.172931,0.300069,0.899985,...,0.000674,9.999889,14.936646,8.62356,0.000892,29.999973,59.00371,25.974704,14.008007,103.998785
6,2024-01,Phoenix,9.99815,14.755805,-19.969311,39.953507,0.599996,0.173674,0.30003,0.899929,...,0.050475,14.971583,14.997482,8.696203,0.001393,29.999392,49.99667,26.560449,-3.94476,103.916313
7,2024-01,San Antonio,14.970648,14.500521,-9.999806,39.997462,0.598388,0.17261,0.3,0.899994,...,0.000225,9.999712,14.99159,8.652678,0.001013,29.998826,58.947166,26.100937,14.000349,103.995432
8,2024-01,San Diego,14.999351,14.393781,-9.999986,39.999692,0.602029,0.173228,0.300013,0.89998,...,0.000365,9.999573,14.980816,8.676062,0.000614,29.999252,58.998832,25.908806,14.000025,103.999446
9,2024-01,San Jose,14.923275,14.397236,-9.998813,39.996586,0.599735,0.173711,0.300048,0.899897,...,0.000282,9.999563,14.985881,8.646501,0.000734,29.99919,58.861895,25.915024,14.002136,103.993854


To view the results for a single city (e.g., Chicago):

In [20]:
df_estadisticas_mes_ciudad[df_estadisticas_mes_ciudad["Location"] == "Chicago"]

Unnamed: 0,Year_Month,Location,Temperature_C,Temperature_C,Temperature_C,Temperature_C,Humidity_pct,Humidity_pct,Humidity_pct,Humidity_pct,...,Precipitation_mm,Precipitation_mm,Wind_Speed_kmh,Wind_Speed_kmh,Wind_Speed_kmh,Wind_Speed_kmh,Temperature_F,Temperature_F,Temperature_F,Temperature_F
,,,mean,std,min,max,mean,std,min,max,...,min,max,mean,std,min,max,mean,std,min,max
0,2024-01,Chicago,14.85855,14.487067,-9.996121,39.995211,0.599856,0.173424,0.300016,0.899958,...,0.000303,9.999346,15.008815,8.697653,0.002459,29.998607,58.745389,26.076721,14.006982,103.991379
10,2024-02,Chicago,15.115803,14.445995,-9.998746,39.9982,0.600008,0.17306,0.300024,0.899989,...,0.000215,9.998733,14.963082,8.652912,0.000451,29.999852,59.208446,26.002791,14.002258,103.996759
20,2024-03,Chicago,15.076532,14.422655,-9.998036,39.984423,0.602098,0.173212,0.300061,0.899919,...,0.000866,9.999563,15.064735,8.659804,0.000774,29.999889,59.137758,25.960778,14.003535,103.971962
30,2024-04,Chicago,15.08725,14.449244,-9.999959,39.993524,0.600327,0.174019,0.300031,0.899979,...,0.000774,9.999889,14.99401,8.663069,0.000171,29.999096,59.15705,26.008639,14.000073,103.988343
40,2024-05,Chicago,14.852429,14.396131,-9.999896,39.998561,0.602907,0.172914,0.300017,0.899943,...,0.000118,9.999521,14.938686,8.662737,0.002394,29.997365,58.734372,25.913036,14.000186,103.997409
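Because agg was called with a list of functions, the result carries two-level column labels (column name, statistic). As a possible alternative, named aggregation produces flat column names that are easier to filter and export. The sketch below uses toy data, and the output names `temp_mean` and `temp_max` are hypothetical.

```python
import pandas as pd

# Toy data, not taken from the weather dataset.
ejemplo = pd.DataFrame({
    "Year_Month": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "Location": ["Chicago", "Chicago", "Chicago", "Dallas"],
    "Temperature_C": [10.0, 20.0, 5.0, 30.0],
})

# Named aggregation: each keyword becomes a flat output column.
resumen = (
    ejemplo.groupby(["Year_Month", "Location"])
    .agg(
        temp_mean=("Temperature_C", "mean"),
        temp_max=("Temperature_C", "max"),
    )
    .reset_index()
)

# Filtering by city works the same way as with the two-level version.
chicago = resumen[resumen["Location"] == "Chicago"]
print(chicago)
```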


#### Use the apply method to perform more complex, custom operations.

This method was already used in the data transformation section.
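As a further illustration, apply can also run a custom function on each group. The sketch below uses toy data and a hypothetical helper, `rango_termico`, that returns the temperature range (maximum minus minimum) per city; it is not part of the original notebook.

```python
import pandas as pd

# Toy data, not taken from the weather dataset.
ejemplo = pd.DataFrame({
    "Location": ["Chicago", "Chicago", "Chicago", "Dallas", "Dallas"],
    "Temperature_C": [5.0, 28.5, -6.5, 23.5, 14.5],
})


def rango_termico(temperaturas: pd.Series) -> float:
    """Hypothetical custom operation: spread between the hottest and coldest reading."""
    return temperaturas.max() - temperaturas.min()


# The function receives the Temperature_C values of one city at a time.
rangos = ejemplo.groupby("Location")["Temperature_C"].apply(rango_termico)

print(rangos.to_dict())  # {'Chicago': 35.0, 'Dallas': 9.0}
```

For simple arithmetic such as the Fahrenheit conversion, the vectorized expression is usually faster than apply; apply is most useful when the logic does not map onto built-in operations.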