# Temperature data processing test

In [1]:
import datetime

import pandas as pd

from cubic.utils import solar_info

We load the temperature dataset. While loading it, we make sure each column gets the right type (a comma as the decimal mark, and `Date` parsed as a timezone-aware datetime), because I've been warned that a lot of data is coming and I don't want it to exhaust memory.

In [2]:
RAW_DATASET_PATH = "../data/raw/etsiaab_temp_06_25.csv"
PROCESSED_DATASET_PATH = "../data/processed/etsiaab_temp_06_25.csv"

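df = pd.read_csv(
    RAW_DATASET_PATH,
    index_col="Line#",      # use the logger's line counter as the index
    decimal=",",            # readings use a comma as the decimal mark
    parse_dates=["Date"],   # parse timestamps instead of keeping them as strings
    date_format="%d/%m/%y %H:%M:%S %z",  # day/month/2-digit year, time, UTC offset (pandas >= 2.0)
)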

df.info(memory_usage="deep")

<class 'pandas.core.frame.DataFrame'>
Index: 188 entries, 1 to 188
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype                    
---  ------  --------------  -----                    
 0   Date    188 non-null    datetime64[ns, UTC+02:00]
 1   2LR2    188 non-null    float64                  
 2   2TR2    188 non-null    float64                  
 3   2MR2    188 non-null    float64                  
 4   1LR2    188 non-null    float64                  
 5   1TR2    188 non-null    float64                  
 6   1MR2    188 non-null    float64                  
dtypes: datetime64[ns, UTC+02:00](1), float64(6)
memory usage: 11.8 KB


OK, the sample dataset takes up very little memory (11.8 KB). Let's look at the first few rows to check that it was read correctly.

In [3]:
df.head()

| Line# | Date | 2LR2 | 2TR2 | 2MR2 | 1LR2 | 1TR2 | 1MR2 |
|---|---|---|---|---|---|---|---|
| 1 | 2025-06-27 00:00:00+02:00 | 19.6 | 19.67 | 19.77 | 19.35 | 20.18 | 19.92 |
| 2 | 2025-06-27 00:30:00+02:00 | 19.17 | 19.37 | 19.28 | 19.09 | 19.62 | 19.3 |
| 3 | 2025-06-27 01:00:00+02:00 | 18.25 | 18.23 | 18.25 | 18.14 | 18.62 | 18.36 |
| 4 | 2025-06-27 01:30:00+02:00 | 17.09 | 16.94 | 16.71 | 16.66 | 17.2 | 16.92 |
| 5 | 2025-06-27 02:00:00+02:00 | 16.54 | 16.17 | 16.11 | 16.26 | 16.49 | 16.11 |
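Besides eyeballing `head()`, a few programmatic checks can catch loading problems early: missing values, unsorted or duplicated timestamps, and gaps in the sampling. The sketch below assumes the 30-minute step visible in the first rows; `check_readings` is a hypothetical helper, and the small frame reuses the first five `2LR2` readings shown above:

```python
import datetime

import pandas as pd

# Small stand-in for the loaded frame: the first five 2LR2 readings at UTC+02:00.
tz = datetime.timezone(datetime.timedelta(hours=2))
df = pd.DataFrame(
    {
        "Date": pd.date_range("2025-06-27 00:00", periods=5, freq="30min", tz=tz),
        "2LR2": [19.6, 19.17, 18.25, 17.09, 16.54],
    }
)


def check_readings(frame, expected_step=pd.Timedelta(minutes=30)):
    """Return a dict of simple sanity checks on a temperature frame."""
    steps = frame["Date"].diff().dropna()  # time between consecutive readings
    return {
        "no_missing_values": not frame.isna().any().any(),
        "dates_sorted": frame["Date"].is_monotonic_increasing,
        "regular_spacing": bool((steps == expected_step).all()),
        "no_duplicate_dates": not frame["Date"].duplicated().any(),
    }


print(check_readings(df))
print(check_readings(df.drop(index=2)))  # dropping a row creates a 1-hour gap
```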


## Feature engineering

Let's create new features that may help us find relevant information.

### Time-of-day information

We will add information about the moment of the day each reading belongs to (`'dawn'`, `'sunrise'`, `'noon'`, `'sunset'` or `'dusk'`). To do this, we first need to choose the location of interest. I picked it by eye on Google Maps and placed it right in the middle of the school, so it may be slightly off.

In [4]:
LAT, LON = 40.45, -3.73
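To get a feel for how much a by-eye position matters: solar noon depends only on longitude (latitude affects sunrise and sunset, which this rough estimate does not cover). The Earth turns 360° in 24 h, i.e. 4 minutes per degree, so, ignoring the equation of time, solar noon ≈ 12:00 UTC − 4 min × longitude (degrees east). A back-of-the-envelope sketch, not what `solar_info` does internally:

```python
def approx_solar_noon_utc_minutes(lon_deg: float) -> float:
    """Approximate solar noon in minutes after 00:00 UTC.

    Uses only the Earth's rotation (4 minutes per degree of longitude) and
    ignores the equation of time, which moves the real value by up to about
    16 minutes over the year.
    """
    return 720.0 - 4.0 * lon_deg


def noon_shift_seconds(lon_error_deg: float) -> float:
    """Shift in solar noon (seconds) caused by a longitude error."""
    return abs(4.0 * lon_error_deg) * 60.0


LON = -3.73  # value chosen in the notebook

noon = approx_solar_noon_utc_minutes(LON)
secs = round(noon * 60)
hours, rem = divmod(secs, 3600)
minutes, seconds = divmod(rem, 60)
print(f"approx. solar noon: {hours:02d}:{minutes:02d}:{seconds:02d} UTC")
print(f"0.01 deg longitude error -> {noon_shift_seconds(0.01):.1f} s")
```

With `LON = -3.73` this gives roughly 12:15 UTC (about 14:15 at UTC+02:00, before the equation-of-time correction), and an error of 0.01° of longitude, roughly 850 m at this latitude, shifts it by only about 2.4 s, which is negligible next to the 30-minute sampling.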

Now we can create the new column (it takes a while, since the calculations have to be done for every row):

In [5]:
df["Solar event"] = df["Date"].apply(lambda dt: solar_info(LAT, LON, dt))  # reuse the coordinates defined above
df

| Line# | Date | 2LR2 | 2TR2 | 2MR2 | 1LR2 | 1TR2 | 1MR2 | Solar event |
|---|---|---|---|---|---|---|---|---|
| 1 | 2025-06-27 00:00:00+02:00 | 19.60 | 19.67 | 19.77 | 19.35 | 20.18 | 19.92 | dawn |
| 2 | 2025-06-27 00:30:00+02:00 | 19.17 | 19.37 | 19.28 | 19.09 | 19.62 | 19.30 | dawn |
| 3 | 2025-06-27 01:00:00+02:00 | 18.25 | 18.23 | 18.25 | 18.14 | 18.62 | 18.36 | dawn |
| 4 | 2025-06-27 01:30:00+02:00 | 17.09 | 16.94 | 16.71 | 16.66 | 17.20 | 16.92 | dawn |
| 5 | 2025-06-27 02:00:00+02:00 | 16.54 | 16.17 | 16.11 | 16.26 | 16.49 | 16.11 | dawn |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 184 | 2025-06-30 19:30:00+02:00 | 25.91 | 26.21 | 26.34 | 26.04 | 26.32 | 26.34 | sunset |
| 185 | 2025-06-30 20:00:00+02:00 | 26.53 | 26.66 | 26.87 | 26.62 | 26.94 | 27.02 | sunset |
| 186 | 2025-06-30 20:30:00+02:00 | 28.08 | 27.95 | 28.14 | 27.84 | 27.99 | 28.20 | sunset |
| 187 | 2025-06-30 21:00:00+02:00 | 29.92 | 30.13 | 30.24 | 29.94 | 30.18 | 30.35 | sunset |
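`apply` calls `solar_info` once per row. For a fixed location, sun event times depend only on the calendar day, so if that is the expensive part it can be computed once per day and reused. We don't know how `cubic.utils.solar_info` works internally, so the sketch below uses a hypothetical stand-in, `day_solar_noon`, based on a common approximation of the equation of time, memoized per (longitude, UTC day) with `functools.lru_cache`; the `before noon`/`after noon` labels are illustrative only:

```python
import datetime
import math
from functools import lru_cache

CALLS = {"day_solar_noon": 0}  # counts how often the per-day computation really runs


@lru_cache(maxsize=None)
def day_solar_noon(lon_deg: float, day: datetime.date) -> float:
    """Hypothetical per-day step: approximate solar noon, minutes after 00:00 UTC.

    Equation of time (minutes) from the common approximation
    EoT = 9.87 sin(2B) - 7.53 cos(B) - 1.5 sin(B), with B = 360/365 * (N - 81) degrees.
    """
    CALLS["day_solar_noon"] += 1
    n = day.timetuple().tm_yday  # day of the year, 1..366
    b = math.radians(360.0 / 365.0 * (n - 81))
    eot = 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)
    return 720.0 - 4.0 * lon_deg - eot


def before_or_after_noon(lon_deg: float, dt: datetime.datetime) -> str:
    """Label an aware timestamp relative to the approximate solar noon of its UTC day."""
    utc = dt.astimezone(datetime.timezone.utc)
    minutes = utc.hour * 60 + utc.minute + utc.second / 60
    return "before noon" if minutes < day_solar_noon(lon_deg, utc.date()) else "after noon"


# 188 half-hourly timestamps starting at 2025-06-27 00:00 +02:00, like the sample.
tz = datetime.timezone(datetime.timedelta(hours=2))
start = datetime.datetime(2025, 6, 27, 0, 0, tzinfo=tz)
stamps = [start + datetime.timedelta(minutes=30 * i) for i in range(188)]
labels = [before_or_after_noon(-3.73, ts) for ts in stamps]
print(CALLS["day_solar_noon"], "per-day computations for", len(labels), "rows")
```

Because `pd.Timestamp` is a subclass of `datetime.datetime`, the same function could be applied as `df["Date"].apply(lambda dt: before_or_after_noon(LON, dt))`; for these 188 timestamps the per-day step runs only 5 times, once per UTC day spanned.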


In [6]:
df.to_csv(PROCESSED_DATASET_PATH)
new_df = pd.read_csv(PROCESSED_DATASET_PATH)
new_df

|  | Line# | Date | 2LR2 | 2TR2 | 2MR2 | 1LR2 | 1TR2 | 1MR2 | Solar event |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2025-06-27 00:00:00+02:00 | 19.60 | 19.67 | 19.77 | 19.35 | 20.18 | 19.92 | dawn |
| 1 | 2 | 2025-06-27 00:30:00+02:00 | 19.17 | 19.37 | 19.28 | 19.09 | 19.62 | 19.30 | dawn |
| 2 | 3 | 2025-06-27 01:00:00+02:00 | 18.25 | 18.23 | 18.25 | 18.14 | 18.62 | 18.36 | dawn |
| 3 | 4 | 2025-06-27 01:30:00+02:00 | 17.09 | 16.94 | 16.71 | 16.66 | 17.20 | 16.92 | dawn |
| 4 | 5 | 2025-06-27 02:00:00+02:00 | 16.54 | 16.17 | 16.11 | 16.26 | 16.49 | 16.11 | dawn |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 183 | 184 | 2025-06-30 19:30:00+02:00 | 25.91 | 26.21 | 26.34 | 26.04 | 26.32 | 26.34 | sunset |
| 184 | 185 | 2025-06-30 20:00:00+02:00 | 26.53 | 26.66 | 26.87 | 26.62 | 26.94 | 27.02 | sunset |
| 185 | 186 | 2025-06-30 20:30:00+02:00 | 28.08 | 27.95 | 28.14 | 27.84 | 27.99 | 28.20 | sunset |
| 186 | 187 | 2025-06-30 21:00:00+02:00 | 29.92 | 30.13 | 30.24 | 29.94 | 30.18 | 30.35 | sunset |
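Reading the file back with default options loses information: `Line#` becomes an ordinary column next to a new `RangeIndex`, and `Date` comes back as plain strings. Passing `index_col` and `parse_dates` restores both. The sketch also shows, as an optional extra, declaring `Solar event` as an ordered categorical, which keeps the dawn-to-dusk order and stores each label as a small integer code; the three-row frame is a made-up miniature reusing values from the tables above:

```python
import datetime
import io

import pandas as pd

EVENTS = ["dawn", "sunrise", "noon", "sunset", "dusk"]  # chronological order of the labels
EVENT_DTYPE = pd.CategoricalDtype(EVENTS, ordered=True)

tz = datetime.timezone(datetime.timedelta(hours=2))
df = pd.DataFrame(
    {
        "Date": pd.date_range("2025-06-27 00:00", periods=3, freq="30min", tz=tz),
        "2LR2": [19.6, 19.17, 18.25],
        "Solar event": ["dawn", "dawn", "dawn"],
    },
    index=pd.Index([1, 2, 3], name="Line#"),
)
df["Solar event"] = df["Solar event"].astype(EVENT_DTYPE)

buffer = io.StringIO()  # in-memory stand-in for PROCESSED_DATASET_PATH
df.to_csv(buffer)

buffer.seek(0)
naive = pd.read_csv(buffer)  # like the notebook's read: index and dtypes are lost

buffer.seek(0)
restored = pd.read_csv(
    buffer,
    index_col="Line#",                  # bring back the original index
    parse_dates=["Date"],               # timezone-aware datetimes again
    dtype={"Solar event": EVENT_DTYPE}, # ordered categorical again
)
print(naive.dtypes, restored.dtypes, sep="\n\n")
```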
