# 🧪 Practice Exercises by Kaggle Dataset

## 🛍️ Retail Sales Dataset

🔗 Dataset available at: [https://www.kaggle.com/datasets/mohammadtalib786/retail-sales-dataset?utm_source=chatgpt.com](https://www.kaggle.com/datasets/mohammadtalib786/retail-sales-dataset?utm_source=chatgpt.com)

In [1]:
# Import pandas and NumPy
import pandas as pd
import numpy as np

In [2]:
# Load the dataset
df_retail = pd.read_csv('./DataSets/retail_sales_dataset.csv')
df_retail

Unnamed: 0,Transaction ID,Date,Customer ID,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount
0,1,2023-11-24,CUST001,Male,34,Beauty,3,50,150
1,2,2023-02-27,CUST002,Female,26,Clothing,2,500,1000
2,3,2023-01-13,CUST003,Male,50,Electronics,1,30,30
3,4,2023-05-21,CUST004,Male,37,Clothing,1,500,500
4,5,2023-05-06,CUST005,Male,30,Beauty,2,50,100
...,...,...,...,...,...,...,...,...,...
995,996,2023-05-16,CUST996,Male,62,Clothing,1,50,50
996,997,2023-11-17,CUST997,Male,52,Beauty,3,30,90
997,998,2023-10-29,CUST998,Female,23,Beauty,4,25,100
998,999,2023-12-05,CUST999,Female,36,Electronics,3,50,150


In [3]:
import random
df_retail['Store'] = [random.choice(['A', 'B', 'C', 'D']) for _ in range(len(df_retail))]
df_retail

Unnamed: 0,Transaction ID,Date,Customer ID,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount,Store
0,1,2023-11-24,CUST001,Male,34,Beauty,3,50,150,D
1,2,2023-02-27,CUST002,Female,26,Clothing,2,500,1000,B
2,3,2023-01-13,CUST003,Male,50,Electronics,1,30,30,B
3,4,2023-05-21,CUST004,Male,37,Clothing,1,500,500,B
4,5,2023-05-06,CUST005,Male,30,Beauty,2,50,100,D
...,...,...,...,...,...,...,...,...,...,...
995,996,2023-05-16,CUST996,Male,62,Clothing,1,50,50,C
996,997,2023-11-17,CUST997,Male,52,Beauty,3,30,90,C
997,998,2023-10-29,CUST998,Female,23,Beauty,4,25,100,D
998,999,2023-12-05,CUST999,Female,36,Electronics,3,50,150,B
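
Because `random.choice` is unseeded, the `Store` column (and every store-level result derived from it) changes on each run. A minimal sketch of a reproducible variant, assuming a fixed seed is acceptable for the exercise; the helper name `assign_stores` and the seed value `42` are illustrative choices, not part of the original notebook:

```python
import random

def assign_stores(n_rows, seed=42):
    """Return a reproducible list of store labels (hypothetical helper)."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    return [rng.choice(['A', 'B', 'C', 'D']) for _ in range(n_rows)]

stores_first = assign_stores(1000)
stores_second = assign_stores(1000)
print(stores_first == stores_second)  # same seed, same labels
```

With this helper the assignment would read `df_retail['Store'] = assign_stores(len(df_retail))`.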


**Question:** How many rows and columns does the dataset have?

In [None]:
# Your code here
print(f'Rows: {df_retail.shape[0]}\nColumns: {df_retail.shape[1]}')

Rows: 1000
Columns: 10


**Question:** What are the data types of each column?

In [5]:
# Your code here
# type(df_retail) only reports the container class (pandas.core.frame.DataFrame);
# the per-column types are in the dtypes attribute
df_retail.dtypes
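
`type()` and `.dtypes` report different things: the first gives the class of the whole object, the second gives one dtype per column. A small sketch on a made-up frame (the values and the name `toy` are illustrative only):

```python
import pandas as pd

# Toy frame with the same kinds of columns as the retail data (values are made up)
toy = pd.DataFrame({
    'Customer ID': ['CUST001', 'CUST002'],
    'Quantity': [3, 2],
    'Price per Unit': [50.0, 500.0],
})

container_type = type(toy)   # class of the whole object
column_types = toy.dtypes    # one dtype per column, returned as a Series

print(container_type)
print(column_types)
```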

**Question:** Which products have the highest sales by quantity?

In [6]:
# Your code here
ventas_producto = df_retail.groupby('Product Category')['Quantity'].sum()
producto_ventas = ventas_producto.sort_values(ascending=False)
print(producto_ventas)

Product Category
Clothing       894
Electronics    849
Beauty         771
Name: Quantity, dtype: int64


**Question:** Which stores sell the most products?

In [7]:
# Your code here
# count() tallies the transactions recorded per store; use sum() instead to total the units sold
productos_tienda = df_retail.groupby('Store')['Quantity'].count()
tienda_ventas = productos_tienda.sort_values(ascending=False)
print(tienda_ventas)

Store
D    269
B    258
C    256
A    217
Name: Quantity, dtype: int64


**Question:** Are there missing or duplicate values?

In [None]:
# Your code here
faltantes = df_retail.isnull().sum()
print("Missing values per column:")
print(faltantes)

duplicados = df_retail.duplicated().sum()
print(f"\nNumber of duplicate rows: {duplicados}")

Missing values per column:
Transaction ID      0
Date                0
Customer ID         0
Gender              0
Age                 0
Product Category    0
Quantity            0
Price per Unit      0
Total Amount        0
Store               0
dtype: int64

Number of duplicate rows: 0
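
Since the retail data has neither missing values nor duplicates, the two checks are easier to see on a frame that does. A minimal sketch with made-up rows:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Customer ID': ['CUST001', 'CUST002', 'CUST002', 'CUST003'],
    'Age': [34, 26, 26, np.nan],
})

missing_per_column = toy.isnull().sum()   # NaN count for each column
duplicate_rows = toy.duplicated().sum()   # rows identical to an earlier row

print(missing_per_column)
print(duplicate_rows)
```

`duplicated()` compares whole rows by default, so only the repeated `CUST002` row is flagged.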


**Question:** What is the total revenue per store?

In [9]:
# Your code here
ingreso_tienda = df_retail.groupby('Store')['Total Amount'].sum()
tienda_ingreso = ingreso_tienda.sort_values(ascending=False)
print(tienda_ingreso)


Store
D    121665
B    121535
C    120875
A     91925
Name: Total Amount, dtype: int64


**Question:** Group sales by product type and find the mean price.

In [10]:
# Your code here
media_precios = df_retail.groupby('Product Category')['Price per Unit'].mean()
media_precios


Product Category
Beauty         184.055375
Clothing       174.287749
Electronics    181.900585
Name: Price per Unit, dtype: float64

**Question:** Create a new column called `ingreso_total` equal to price * quantity.

In [11]:
# Your code here
df_retail['ingreso_total'] = df_retail['Price per Unit'] * df_retail['Quantity']
print(df_retail[['Price per Unit', 'Quantity', 'ingreso_total']].head())

   Price per Unit  Quantity  ingreso_total
0              50         3            150
1             500         2           1000
2              30         1             30
3             500         1            500
4              50         2            100


**Question:** Use a pivot table to compare revenue by store and by product.

In [12]:
# Your code here
tabla_dinamica = pd.pivot_table(df_retail, values='Total Amount', index='Store', columns='Product Category', aggfunc='sum', fill_value=0)
print(tabla_dinamica)

Product Category  Beauty  Clothing  Electronics
Store                                          
A                  15125     40085        36715
B                  39010     33590        48935
C                  38960     40390        41525
D                  50420     41515        29730
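
A pivot table with a single aggregation is equivalent to a two-key `groupby` followed by `unstack`. The sketch below checks this on made-up rows; it is an illustration of the relationship, not a replacement for the cell above:

```python
import pandas as pd

toy = pd.DataFrame({
    'Store': ['A', 'A', 'B', 'B'],
    'Product Category': ['Beauty', 'Clothing', 'Beauty', 'Beauty'],
    'Total Amount': [150, 500, 100, 90],
})

# Pivot: stores as rows, categories as columns, summed amounts as cells
pivot = pd.pivot_table(toy, values='Total Amount', index='Store',
                       columns='Product Category', aggfunc='sum', fill_value=0)

# Same table built from groupby: sum per (store, category), then spread categories into columns
grouped = (toy.groupby(['Store', 'Product Category'])['Total Amount']
              .sum()
              .unstack(fill_value=0))

print(pivot)
print(grouped)
```

Store B has no Clothing rows, so both versions rely on `fill_value=0` to fill that cell.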


## 📈 Dummy Advertising and Sales Data

🔗 Dataset available at: [https://www.kaggle.com/datasets/harrimansaragih/dummy-advertising-and-sales-data?utm_source=chatgpt.com](https://www.kaggle.com/datasets/harrimansaragih/dummy-advertising-and-sales-data?utm_source=chatgpt.com)

In [13]:
# Load the dataset
df_marketing = pd.read_csv('./DataSets/Dummy Data HSS.csv')
df_marketing

Unnamed: 0,TV,Radio,Social Media,Influencer,Sales
0,16.0,6.566231,2.907983,Mega,54.732757
1,13.0,9.237765,2.409567,Mega,46.677897
2,41.0,15.886446,2.913410,Mega,150.177829
3,83.0,30.020028,6.922304,Mega,298.246340
4,15.0,8.437408,1.405998,Micro,56.594181
...,...,...,...,...,...
4567,26.0,4.472360,0.717090,Micro,94.685866
4568,71.0,20.610685,6.545573,Nano,249.101915
4569,44.0,19.800072,5.096192,Micro,163.631457
4570,71.0,17.534640,1.940873,Macro,253.610411


**Question:** What is the average advertising spend per channel (TV, Radio, Social Media)?

In [14]:
# Your code here
prom = df_marketing[['TV', 'Radio', 'Social Media']].mean()
print(prom)


TV              54.066857
Radio           18.160356
Social Media     3.323956
dtype: float64


**Question:** Is there a correlation between the advertising budget and sales?

In [15]:
# Your code here
correlacion = df_marketing[['TV', 'Radio', 'Social Media', 'Sales']].corr()
print(correlacion['Sales'])


TV              0.999497
Radio           0.869105
Social Media    0.528906
Sales           1.000000
Name: Sales, dtype: float64
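
`DataFrame.corr()` computes the Pearson coefficient by default. As a rough illustration of what that number measures, the sketch below compares the built-in result with the textbook definition on made-up values (`toy` and its numbers are assumptions, not taken from the dataset):

```python
import pandas as pd

toy = pd.DataFrame({
    'TV': [10.0, 20.0, 30.0, 40.0],
    'Sales': [35.0, 72.0, 101.0, 140.0],
})

# Built-in Pearson correlation between two columns
r_pandas = toy['TV'].corr(toy['Sales'])

# Same quantity from the definition: sum of co-deviations divided by
# the product of the root sums of squared deviations
dx = toy['TV'] - toy['TV'].mean()
dy = toy['Sales'] - toy['Sales'].mean()
r_manual = (dx * dy).sum() / ((dx ** 2).sum() ** 0.5 * (dy ** 2).sum() ** 0.5)

print(r_pandas, r_manual)
```

A coefficient near 1 indicates a strong linear association; on its own it says nothing about which variable drives the other.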


**Question:** Which campaigns have sales above the mean?

In [16]:
# Your code here
media_ventas = df_marketing['Sales'].mean()
campañas_superiores = df_marketing[df_marketing['Sales'] > media_ventas]
campañas_superiores


Unnamed: 0,TV,Radio,Social Media,Influencer,Sales
3,83.0,30.020028,6.922304,Mega,298.246340
6,55.0,24.893811,4.273602,Micro,198.679825
8,76.0,24.648898,7.130116,Macro,270.189400
10,62.0,24.345189,5.151483,Nano,224.961019
12,64.0,20.240424,3.921148,Micro,229.632381
...,...,...,...,...,...
4561,60.0,21.841864,5.092528,Macro,210.680016
4563,93.0,25.285149,2.805840,Macro,327.466288
4564,99.0,36.024174,4.288755,Macro,355.807121
4568,71.0,20.610685,6.545573,Nano,249.101915


**Question:** Filter the campaigns with TV advertising > 200 and Radio > 20.

In [17]:
# Your code here
# Note: this cell filters on TV > 20, not the TV > 200 stated in the question
# (every TV value displayed above is below 100)
campañas_filtradas = df_marketing[(df_marketing['TV'] > 20) & (df_marketing['Radio'] > 20)]
print("Total campaigns meeting the condition:", len(campañas_filtradas))


Total campaigns meeting the condition: 1938
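
Combining conditions in pandas uses `&` on boolean Series, and each comparison needs its own parentheses. A minimal sketch on made-up rows; the threshold names `tv_min` and `radio_min` are illustrative:

```python
import pandas as pd

toy = pd.DataFrame({
    'TV': [16.0, 83.0, 15.0, 71.0],
    'Radio': [6.5, 30.0, 8.4, 20.6],
})

tv_min, radio_min = 20, 20  # illustrative thresholds

# Parentheses are required: & binds more tightly than > in Python
mask = (toy['TV'] > tv_min) & (toy['Radio'] > radio_min)
filtered = toy[mask]

print(filtered)
```

Keeping the thresholds in variables makes it easy to rerun the filter with different cut-offs.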


**Question:** Group by advertising channel and compute mean sales.

In [18]:
# Your code here
# The dataset has no single channel column, so this groups by the categorical Influencer column
media_ventas_por_canal = df_marketing.groupby('Influencer')['Sales'].mean()
print(media_ventas_por_canal)

Influencer
Macro    195.613601
Mega     190.593666
Micro    191.809095
Nano     191.934304
Name: Sales, dtype: float64


**Question:** Create an estimated ROI column using a simple formula.

In [19]:
# Your code here
df_marketing['Inversion'] = df_marketing['TV'] + df_marketing['Radio'] + df_marketing['Social Media']
df_marketing['ROI'] = (df_marketing['Sales'] - df_marketing['Inversion']) / df_marketing['Inversion']
print(df_marketing[['Sales', 'Inversion', 'ROI']].head())


        Sales   Inversion       ROI
0   54.732757   25.474214  1.148555
1   46.677897   24.647332  0.893832
2  150.177829   59.799856  1.511341
3  298.246340  119.942332  1.486581
4   56.594181   24.843406  1.278036
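
The formula used here is ROI = (Sales - Inversion) / Inversion, with Inversion taken as the sum of the three channel budgets. As a hand check, the sketch below reproduces the first row of the output using the values displayed for row 0 of `df_marketing`:

```python
# Values copied from row 0 of df_marketing as displayed above
tv, radio, social_media = 16.0, 6.566231, 2.907983
sales = 54.732757

investment = tv + radio + social_media      # total spend across the three channels
roi = (sales - investment) / investment     # net return per unit spent

print(round(investment, 6), round(roi, 6))
```

The result agrees with the `Inversion` and `ROI` values printed for row 0, up to the rounding of the displayed inputs.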


**Question:** Build a pivot_table showing average sales for each channel type.

In [20]:
# Your code here
tabla_ventas = pd.pivot_table(df_marketing, values='Sales', index='Influencer', aggfunc='mean')
print(tabla_ventas)


                 Sales
Influencer            
Macro       195.613601
Mega        190.593666
Micro       191.809095
Nano        191.934304


## 🎬 The Movies Dataset

🔗 Dataset available at: [https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset?utm_source=chatgpt.com](https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset?utm_source=chatgpt.com)

In [21]:
df_movies= pd.read_csv('./DataSets/MovieFranchises.csv')
df_movies

Unnamed: 0,index,MovieID,Title,Lifetime Gross,Year,Studio,Rating,Runtime,Budget,ReleaseDate,VoteAvg,VoteCount,FranchiseID
0,0,1001,Star Wars: Episode IV - A New Hope,775398007,1977,Lucasfilm,PG,121.0,11000000.0,05-25-77,4.09,96233.0,101.0
1,1,1002,Star Wars: Episode V - The Empire Strikes Back,538375067,1980,Lucasfilm,PG,124.0,18000000.0,06-20-80,4.12,79231.0,101.0
2,2,1003,Star Wars: Episode VI - Return of the Jedi,475106177,1983,Lucasfilm,PG,135.0,32500000.0,05-25-83,3.98,76082.0,101.0
3,3,1004,Jurassic Park,1109802321,1993,Universal Pictures,PG-13,127.0,63000000.0,06-11-93,3.69,82700.0,102.0
4,4,1005,The Lost World: Jurassic Park,618638999,1997,Universal Pictures,PG-13,129.0,73000000.0,05-23-97,3.01,19721.0,102.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
600,600,101,Star Wars,1977,George Lucas,,,,,,,,
601,601,102,Jurassic Park,1993,Michael Crichton,,,,,,,,
602,602,103,Wizarding World,2001,J. K. Rowling,,,,,,,,
603,603,104,Middle Earth,2001,J. R. R. Tolkien,,,,,,,,


In [22]:
import random

# Generate random revenue close to the budget
def generar_ingresos_aleatorios(budget):
    # Variation range around the budget (adjust as needed)
    min_variacion = 0.8  # 80% of the budget
    max_variacion = 1.2  # 120% of the budget
    return budget * random.uniform(min_variacion, max_variacion)

# Apply the function to create the new revenue column
df_movies['Estimated Gross'] = df_movies['Budget'].apply(generar_ingresos_aleatorios)

# Show the first rows of the DataFrame with the generated revenue
print(df_movies[['Title', 'Budget', 'Estimated Gross']].head())

                                            Title      Budget  Estimated Gross
0              Star Wars: Episode IV - A New Hope  11000000.0     1.049912e+07
1  Star Wars: Episode V - The Empire Strikes Back  18000000.0     2.110284e+07
2      Star Wars: Episode VI - Return of the Jedi  32500000.0     3.420450e+07
3                                   Jurassic Park  63000000.0     7.314948e+07
4                   The Lost World: Jurassic Park  73000000.0     7.283156e+07


In [23]:
import random
df_movies['Genero'] = [random.choice(['Terror', 'Accion', 'Documental', 'Suspenso']) for _ in range(len(df_movies))]
df_movies

Unnamed: 0,index,MovieID,Title,Lifetime Gross,Year,Studio,Rating,Runtime,Budget,ReleaseDate,VoteAvg,VoteCount,FranchiseID,Estimated Gross,Genero
0,0,1001,Star Wars: Episode IV - A New Hope,775398007,1977,Lucasfilm,PG,121.0,11000000.0,05-25-77,4.09,96233.0,101.0,1.049912e+07,Terror
1,1,1002,Star Wars: Episode V - The Empire Strikes Back,538375067,1980,Lucasfilm,PG,124.0,18000000.0,06-20-80,4.12,79231.0,101.0,2.110284e+07,Documental
2,2,1003,Star Wars: Episode VI - Return of the Jedi,475106177,1983,Lucasfilm,PG,135.0,32500000.0,05-25-83,3.98,76082.0,101.0,3.420450e+07,Suspenso
3,3,1004,Jurassic Park,1109802321,1993,Universal Pictures,PG-13,127.0,63000000.0,06-11-93,3.69,82700.0,102.0,7.314948e+07,Documental
4,4,1005,The Lost World: Jurassic Park,618638999,1997,Universal Pictures,PG-13,129.0,73000000.0,05-23-97,3.01,19721.0,102.0,7.283156e+07,Suspenso
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
600,600,101,Star Wars,1977,George Lucas,,,,,,,,,,Documental
601,601,102,Jurassic Park,1993,Michael Crichton,,,,,,,,,,Accion
602,602,103,Wizarding World,2001,J. K. Rowling,,,,,,,,,,Suspenso
603,603,104,Middle Earth,2001,J. R. R. Tolkien,,,,,,,,,,Terror


**Question:** Which movies have the largest budgets?

In [24]:
# Your code here
mayor_presupuesto = df_movies.sort_values(by='Budget', ascending=False)
mayor_presupuesto[['Title', 'Budget']].head()

Unnamed: 0,Title,Budget
50,Avengers: Endgame,400000000.0
43,Star Wars: Episode VIII - The Last Jedi,317000000.0
36,Star Wars: Episode VII - The Force Awakens,306000000.0
45,Avengers: Infinity War,300000000.0
33,Avengers: Age of Ultron,280000000.0


**Question:** Which movies earned the highest profit (revenue - budget)?

In [25]:
# Your code here
# errors='coerce' turns non-numeric entries such as "ActorName" into NaN instead of raising ValueError
df_movies['Lifetime Gross'] = pd.to_numeric(df_movies['Lifetime Gross'], errors='coerce')
df_movies['Ganancia'] = df_movies['Lifetime Gross'] - df_movies['Budget']
peliculas_mayor_ganancia = df_movies.sort_values(by='Ganancia', ascending=False)
peliculas_mayor_ganancia[['Title', 'Lifetime Gross', 'Budget', 'Ganancia']].head()
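
A small sketch of how `pd.to_numeric` treats a non-numeric string such as `"ActorName"`, using a made-up Series rather than the real column:

```python
import pandas as pd

raw = pd.Series(['775398007', '538375067', 'ActorName'])

# The default errors='raise' stops at the first string it cannot parse
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True

# errors='coerce' converts what it can and marks the rest as NaN
clean = pd.to_numeric(raw, errors='coerce')

print(raised)
print(clean)
```

Coercing produces a float column, because NaN cannot be stored in a plain integer column.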

**Question:** How many movies are there per genre?

In [None]:
# Your code here
peliculas_genero = df_movies['Genero'].value_counts()
peliculas_genero


Genero
Documental    160
Terror        159
Suspenso      151
Accion        135
Name: count, dtype: int64

**Question:** Are there movies with a null budget or revenue?

In [None]:
# Your code here
# Treat both zeros and NaN as null
peliculas_nulas = df_movies[(df_movies['Budget'] == 0) | 
                            (df_movies['Lifetime Gross'] == 0) |
                            (df_movies['Budget'].isna()) |
                            (df_movies['Lifetime Gross'].isna())]
total_nulas = peliculas_nulas.shape[0]
print(f"Total movies with a null budget or revenue: {total_nulas}")



Total movies with a null budget or revenue: 545


**Question:** Create a new profitability column (profit / budget).

In [None]:
# Your code here
df_movies['Rentabilidad'] = df_movies['Ganancia'] / df_movies['Budget']
print(df_movies[['Title', 'Lifetime Gross', 'Budget', 'Ganancia', 'Rentabilidad']].head())


                                            Title  Lifetime Gross      Budget  \
0              Star Wars: Episode IV - A New Hope    7.753980e+08  11000000.0   
1  Star Wars: Episode V - The Empire Strikes Back    5.383751e+08  18000000.0   
2      Star Wars: Episode VI - Return of the Jedi    4.751062e+08  32500000.0   
3                                   Jurassic Park    1.109802e+09  63000000.0   
4                   The Lost World: Jurassic Park    6.186390e+08  73000000.0   

       Ganancia  Rentabilidad  
0  7.643980e+08     69.490728  
1  5.203751e+08     28.909726  
2  4.426062e+08     13.618652  
3  1.046802e+09     16.615910  
4  5.456390e+08      7.474507  
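
Dividing by `Budget` needs some care when a budget is zero or missing, which the null check earlier in this section allows for. A hedged sketch on made-up rows (only the first row reuses the Star Wars figures shown above) comparing a plain division with one that masks zero budgets first:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Lifetime Gross': [775398007.0, 5000.0, np.nan],
    'Budget': [11000000.0, 0.0, 2000.0],
})

profit = toy['Lifetime Gross'] - toy['Budget']

naive = profit / toy['Budget']                     # zero budget yields inf
safe = profit / toy['Budget'].replace(0, np.nan)   # zero budget becomes NaN instead

print(naive)
print(safe)
```

Masking zeros keeps infinite values out of later means and sorts; whether that is the right treatment depends on what a zero budget means in the data.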


**Question:** Group by release year and compute average revenue.

In [None]:
# Your code here
ingresos_promedio = df_movies.groupby('Year')['Lifetime Gross'].mean()
ingresos_promedio.head()

Year
1977    7.753980e+08
1980    5.383751e+08
1983    4.751062e+08
1993    1.109802e+09
1997    6.186390e+08
Name: Lifetime Gross, dtype: float64

**Question:** Build a pivot table comparing revenue by genre and year.

In [None]:
# Your code here
tabla_ingresos = pd.pivot_table(
    df_movies,
    values='Lifetime Gross',
    index='Genero',
    columns='Year',
    aggfunc='mean',
    fill_value=0
)

tabla_ingresos

Genero / Year,1977,1980,1983,1993,1997,1999,2001,2002,2003,2004,...,2017,2018,2019,2021,2022,George Lucas,J. K. Rowling,J. R. R. Tolkien,Michael Crichton,Stan Lee
Accion,0.0,0.0,0.0,0.0,0.0,0.0,0.0,653779970.0,0.0,0.0,...,0.0,870261400.0,2797501000.0,402064900.0,0.0,0.0,0.0,0.0,1993.0,0.0
Documental,775398007.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,797660766.0,...,863756100.0,0.0,0.0,1915878000.0,976779132.0,0.0,0.0,0.0,0.0,2008.0
Suspenso,0.0,0.0,475106177.0,0.0,0.0,1027083000.0,633437800.0,0.0,0.0,0.0,...,1106433000.0,1679413000.0,1111513000.0,379751700.0,758661016.0,1977.0,0.0,0.0,0.0,0.0
Terror,0.0,538375067.0,0.0,1109802000.0,618638999.0,0.0,1022290000.0,913912376.0,1146436000.0,0.0,...,853983900.0,622674100.0,0.0,432243300.0,0.0,0.0,2001.0,2001.0,0.0,0.0


## 🌦️ Climate Insights Dataset

🔗 Dataset available at: [https://www.kaggle.com/datasets/goyaladi/climate-insights-dataset?utm_source=chatgpt.com](https://www.kaggle.com/datasets/goyaladi/climate-insights-dataset?utm_source=chatgpt.com)

In [None]:
df_climate= pd.read_csv('./DataSets/climate_change_data.csv')
df_climate

Unnamed: 0,Date,Location,Country,Temperature,CO2 Emissions,Sea Level Rise,Precipitation,Humidity,Wind Speed
0,2000-01-01 00:00:00.000000000,New Williamtown,Latvia,10.688986,403.118903,0.717506,13.835237,23.631256,18.492026
1,2000-01-01 20:09:43.258325832,North Rachel,South Africa,13.814430,396.663499,1.205715,40.974084,43.982946,34.249300
2,2000-01-02 16:19:26.516651665,West Williamland,French Guiana,27.323718,451.553155,-0.160783,42.697931,96.652600,34.124261
3,2000-01-03 12:29:09.774977497,South David,Vietnam,12.309581,422.404983,-0.475931,5.193341,47.467938,8.554563
4,2000-01-04 08:38:53.033303330,New Scottburgh,Moldova,13.210885,410.472999,1.135757,78.695280,61.789672,8.001164
...,...,...,...,...,...,...,...,...,...
9995,2022-12-27 15:21:06.966696576,South Elaineberg,Bhutan,15.020523,391.379537,-1.452243,93.417109,25.293814,6.531866
9996,2022-12-28 11:30:50.225022464,Leblancville,Congo,16.772451,346.921190,0.543616,49.882947,96.787402,42.249014
9997,2022-12-29 07:40:33.483348224,West Stephanie,Argentina,22.370025,466.042136,1.026704,30.659841,15.211825,18.293708
9998,2022-12-30 03:50:16.741674112,Port Steven,Albania,19.430853,337.899776,-0.895329,18.932275,82.774520,42.424255


In [None]:
df_climate.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Date            10000 non-null  object 
 1   Location        10000 non-null  object 
 2   Country         10000 non-null  object 
 3   Temperature     10000 non-null  float64
 4   CO2 Emissions   10000 non-null  float64
 5   Sea Level Rise  10000 non-null  float64
 6   Precipitation   10000 non-null  float64
 7   Humidity        10000 non-null  float64
 8   Wind Speed      10000 non-null  float64
dtypes: float64(6), object(3)
memory usage: 703.3+ KB


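**Question:** How many records are there per year?

In [None]:
# Your code here
# Date is loaded as object (see info() above), so convert it before using the .dt accessor
df_climate["Date"] = pd.to_datetime(df_climate["Date"])
df_climate["Year"] = df_climate["Date"].dt.year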
registros_por_año = df_climate["Year"].value_counts().sort_index()

print(registros_por_año)

Year
2000    436
2001    435
2002    434
2003    435
2004    435
2005    435
2006    434
2007    435
2008    435
2009    435
2010    434
2011    435
2012    436
2013    434
2014    434
2015    435
2016    436
2017    434
2018    435
2019    434
2020    436
2021    434
2022    434
Name: count, dtype: int64


**Question:** What are the highest and lowest monthly mean temperatures?

In [None]:
# Your code here
# Build the year-month key that the groupby relies on
df_climate["YearMonth"] = pd.to_datetime(df_climate["Date"]).dt.to_period("M")
temp_mes = df_climate.groupby("YearMonth")["Temperature"].mean()

mes_max_temp = temp_mes.idxmax()
valor_max_temp = temp_mes.max()

mes_min_temp = temp_mes.idxmin()
valor_min_temp = temp_mes.min()

print(mes_max_temp, valor_max_temp)
print(mes_min_temp, valor_min_temp)

2014-02 17.75066256288355
2022-03 12.730452446728151
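
The `.dt` accessor only works on datetime columns, and `df_climate.info()` above lists `Date` as `object`. A minimal sketch of the conversion and of a year-month grouping key, on made-up rows (the frame `toy` and its values are assumptions):

```python
import pandas as pd

toy = pd.DataFrame({
    'Date': ['2000-01-01 00:00:00', '2000-01-15 12:00:00', '2000-02-03 08:30:00'],
    'Temperature': [10.0, 14.0, 27.0],
})

toy['Date'] = pd.to_datetime(toy['Date'])          # strings to datetime64
toy['Year'] = toy['Date'].dt.year                  # calendar year as an integer
toy['YearMonth'] = toy['Date'].dt.to_period('M')   # monthly Period, e.g. 2000-01

monthly_mean = toy.groupby('YearMonth')['Temperature'].mean()
hottest_month = monthly_mean.idxmax()

print(monthly_mean)
print(hottest_month)
```

Grouping on a monthly `Period` keeps each year-month separate, whereas grouping on `.dt.month` alone would pool the same calendar month across all years.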


**Question:** Which months have the most precipitation?

In [None]:
# Your code here
precip_mes = df_climate.groupby("YearMonth")["Precipitation"].mean()
meses_mas_lluvia = precip_mes.sort_values(ascending=False).head(10)

print(meses_mas_lluvia)


YearMonth
2000-05    64.273950
2019-06    63.234109
2012-11    61.210838
2008-12    60.925415
2004-12    60.791353
2003-10    60.593679
2017-10    60.517416
2016-08    60.036764
2010-04    59.560569
2018-11    59.490326
Freq: M, Name: Precipitation, dtype: float64


**Question:** Are there missing values in any column?

In [None]:
# Your code here
faltantes = df_climate.isnull().sum()
print(faltantes)

Date              0
Location          0
Country           0
Temperature       0
CO2 Emissions     0
Sea Level Rise    0
Precipitation     0
Humidity          0
Wind Speed        0
Year              0
YearMonth         0
dtype: int64


**Question:** Group by season and compute the mean temperature.

In [None]:
# Your code here
df_climate["Estacion"] = df_climate["Date"].dt.month % 12 // 3 + 1
nombres_estaciones = {1: "Invierno", 2: "Primavera", 3: "Verano", 4: "Otoño"}
df_climate["Estacion"] = df_climate["Estacion"].map(nombres_estaciones)

media_temp_estacion = df_climate.groupby("Estacion")["Temperature"].mean()

print(media_temp_estacion)


Estacion
Invierno     14.957014
Otoño        15.003823
Primavera    14.809827
Verano       14.974683
Name: Temperature, dtype: float64
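
The expression `month % 12 // 3 + 1` folds December into the same bucket as January and February, then steps forward in blocks of three months. The sketch below applies the same arithmetic to plain integers; reading bucket 1 as winter assumes Northern Hemisphere meteorological seasons, and the helper name `season_code` is illustrative:

```python
def season_code(month):
    """Map a calendar month (1-12) to a bucket 1..4, with bucket 1 = Dec, Jan, Feb."""
    return month % 12 // 3 + 1

# Same labels as the notebook cell (assumes Northern Hemisphere seasons)
names = {1: "Invierno", 2: "Primavera", 3: "Verano", 4: "Otoño"}
by_month = {m: names[season_code(m)] for m in range(1, 13)}

for month, season in by_month.items():
    print(month, season)
```

The data covers locations in both hemispheres, so a single month-to-season mapping is a simplification.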


**Question:** Create a column that classifies days as hot (`caluroso`) or mild (`templado`).

In [None]:
# Your code here
df_climate["Clima"] = "templado"
df_climate.loc[df_climate["Temperature"] > 25, "Clima"] = "caluroso"
clima = df_climate[["Date", "Temperature", "Clima"]]
clima


Unnamed: 0,Date,Temperature,Clima
0,2000-01-01 00:00:00.000000000,10.688986,templado
1,2000-01-01 20:09:43.258325832,13.814430,templado
2,2000-01-02 16:19:26.516651665,27.323718,caluroso
3,2000-01-03 12:29:09.774977497,12.309581,templado
4,2000-01-04 08:38:53.033303330,13.210885,templado
...,...,...,...
9995,2022-12-27 15:21:06.966696576,15.020523,templado
9996,2022-12-28 11:30:50.225022464,16.772451,templado
9997,2022-12-29 07:40:33.483348224,22.370025,templado
9998,2022-12-30 03:50:16.741674112,19.430853,templado
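
The same two-way classification can be written in one step with `numpy.where`. A sketch on made-up temperatures, reusing the 25-degree cut-off from the cell above:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Temperature': [10.7, 27.3, 25.0]})
threshold = 25  # same cut-off as the notebook cell

# np.where picks 'caluroso' where the condition holds and 'templado' elsewhere
toy['Clima'] = np.where(toy['Temperature'] > threshold, 'caluroso', 'templado')

print(toy)
```

Because the comparison is strict, a reading of exactly 25.0 is labelled `templado`.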


**Question:** Generate a pivot_table showing the average temperature by year and month.

In [None]:
# Your code here
df_climate["Month"] = df_climate["Date"].dt.month
temp_year_month = df_climate.pivot_table(values="Temperature", index="Year", columns="Month")

temp_year_month


Year / Month,1,2,3,4,5,6,7,8,9,10,11,12
2000,14.206357,13.932424,16.353462,14.956865,15.447593,15.312805,15.628546,15.079403,14.245601,14.634586,15.791594,14.747044
2001,14.708201,16.5146,13.973159,15.036136,14.800959,14.723588,15.254576,15.039348,14.480689,14.634459,15.584229,14.721999
2002,13.620897,14.760417,14.212325,13.850515,13.673576,13.502537,15.576326,14.671967,16.328612,14.936858,15.609769,14.881162
2003,14.885562,16.854469,14.114849,14.987196,14.129527,14.842865,15.227706,15.968507,15.345826,14.831173,15.958393,14.932671
2004,15.914895,13.71734,14.815031,15.814182,15.283723,15.279225,16.452629,15.951439,14.279708,15.091279,15.018508,15.603309
2005,15.31061,14.956032,16.763042,14.461071,16.430089,15.204628,15.133922,15.070869,15.637863,15.1262,16.467725,14.17181
2006,13.021321,14.050059,16.181762,14.232334,15.479402,16.405412,14.47512,13.628186,15.726648,15.80062,14.23237,15.69214
2007,14.41966,14.28789,14.973328,13.083154,15.425498,16.270065,15.347496,15.393289,16.682545,15.696156,15.788519,15.11382
2008,15.947948,15.317924,14.456524,14.146105,15.636002,15.028988,14.42355,15.095044,14.29023,13.65756,15.499798,14.481284
2009,15.004275,13.500589,14.986215,13.581644,13.945644,14.644463,14.477702,16.043215,13.838197,13.994606,14.802935,14.848085
