The data were downloaded under the following solar panel assumptions:

### Number of panels needed to power a house

The number of panels needed to power a house depends on several factors, including the home's average energy consumption, the solar irradiation at the site, the system efficiency, and the daily hours of sunlight. Let's break it down:

---

### 1. **Average energy consumption of a house**
Energy consumption varies with the region and the habits of the occupants. For example:

- In **Colombia**, an average house consumes between **250 and 300 kWh/month**.
- Taking the upper bound, this is approximately:
  \[
  \text{Average daily consumption} = \frac{300 \, \text{kWh}}{30 \, \text{days}} = 10 \, \text{kWh/day}.
  \]

---

### 2. **Daily production per solar panel**
For a **0.59 kWp** panel, the daily energy generated depends on the peak sun hours (HSP, from the Spanish *horas de sol pico*), which are the equivalent hours per day of solar irradiance at 1000 W/m².

#### Assume:
- **HSP in Colombia**: between **4 and 6 hours/day** (depending on the region).
- Daily energy per panel:
  \[
  \text{Energy generated} = \text{Peak power (kWp)} \times \text{HSP}.
  \]
  For example, if HSP = 5:
  \[
  \text{Energy generated} = 0.59 \, \text{kWp} \times 5 \, \text{hours} = 2.95 \, \text{kWh/day}.
  \]
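
As a quick check of the formula above, the following minimal sketch evaluates the daily energy per panel across the 4 to 6 HSP range quoted for Colombia. The helper name `daily_energy_kwh` is hypothetical and not part of the original notebook.

```python
def daily_energy_kwh(peak_power_kwp: float, hsp_hours: float) -> float:
    """Daily energy (kWh/day) = peak power (kWp) x peak sun hours (HSP)."""
    return peak_power_kwp * hsp_hours


PANEL_KWP = 0.59  # panel rating used in this section

# Evaluate the low, middle, and high ends of the HSP range
for hsp in (4, 5, 6):
    print(f"HSP = {hsp} h/day -> {daily_energy_kwh(PANEL_KWP, hsp):.2f} kWh/day")
```

The spread between the two ends of the range is large enough that the panel count can change with location, which is one reason to work with per-department data.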

---

### 3. **Number of panels needed**
To cover **10 kWh/day**:
\[
\text{Number of panels} = \frac{\text{Daily consumption}}{\text{Daily energy per panel}}.
\]
\[
\text{Number of panels} = \frac{10}{2.95} \approx 3.4 \;\Rightarrow\; 4 \, \text{panels (rounded up)}.
\]

---

### 4. **Additional factors**
- **System losses**: between **10% and 20%** of the energy is lost to wiring, the inverter, tilt, etc.
  - Adjusting for losses (\(20\%\)):
    \[
    \text{Adjusted number of panels} = 4 \times 1.2 = 4.8 \;\Rightarrow\; 5 \, \text{panels}.
    \]
- **Autonomy or storage**: If you want to store energy (for example, with batteries) for use at night or on cloudy days, you will need more panels and a suitably sized storage system.
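
The sizing arithmetic in this section can be sketched in a few lines. The function `panels_needed` is a hypothetical helper. The "derated" variant, which reduces each panel's output by the loss fraction instead of multiplying the panel count by 1.2, is an alternative shown only for comparison and is not the method used in the text.

```python
import math


def panels_needed(daily_load_kwh, panel_kwp, hsp_hours, loss_fraction=0.0):
    """Whole panels needed so that the net daily output covers the load."""
    net_energy_per_panel = panel_kwp * hsp_hours * (1.0 - loss_fraction)
    return math.ceil(daily_load_kwh / net_energy_per_panel)


ideal = panels_needed(10, 0.59, 5)            # no losses: ceil(10 / 2.95)
section_estimate = math.ceil(ideal * 1.2)     # the 4 x 1.2 rule used in the text
derated = panels_needed(10, 0.59, 5, 0.20)    # assumption: derate output by 20 %

print(ideal, section_estimate, derated)
```

With these inputs both approaches land on the same panel count, so the conclusion does not depend on which way the losses are applied.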

---

### **Conclusion**
For an average house in Colombia consuming **10 kWh/day**, you would need about **5 solar panels** of **0.59 kWp** each, assuming favorable irradiation conditions (an HSP of about 5 hours/day) and a well-designed system.



Therefore, to download the data, we use hourly values from 2018 through 2023 for the 32 departments of Colombia.

The download assumes system losses of 16% and a peak power of 5.9 kWp.
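
Before loading the files, it helps to know how many rows to expect. The sketch below assumes exactly one record per hour per department, with no gaps and no duplicates, and counts the hours from 2018 through 2023 using only the standard library.

```python
from datetime import datetime

N_DEPARTMENTS = 32

start = datetime(2018, 1, 1)
end = datetime(2024, 1, 1)  # exclusive upper bound, i.e. through 2023-12-31

# 2020 is a leap year, so the six years span 2191 days
hours_per_department = (end - start).days * 24
expected_rows = hours_per_department * N_DEPARTMENTS

print(hours_per_department, expected_rows)
```

The expected total agrees with the 1,682,688 entries that `combined_df.info()` reports further down, which suggests every department file was read in full.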

In [86]:
# Import libraries

import numpy as np
import pandas as pd
import os

In [87]:
# Load the data

# Path to the folder containing the CSV files
folder_path = "../data/AllDepartments/"

# List that collects one DataFrame per department
dataframes = []

# Iterate over every CSV file in the folder
for file_name in os.listdir(folder_path):
    if file_name.endswith(".csv"):  # Only process CSV files
        # Full path to the file
        file_path = os.path.join(folder_path, file_name)

        # Read the file, skipping the first 10 rows and the last 13 rows
        df = pd.read_csv(
            file_path,
            skiprows=10,
            skipfooter=13,
            engine="python"  # Required for skipfooter
        )

        # Add the department name (taken from the file name) as a column
        department_name = file_name.replace(".csv", "")
        df["Departamento"] = department_name

        # Keep only the columns needed for the analysis
        df = df[["time", "P", "Gb(i)", "Gr(i)", "Gd(i)", "H_sun", "T2m", "WS10m", "Departamento"]]

        # Append the cleaned DataFrame to the list
        dataframes.append(df)

# Combine all DataFrames into a single one
combined_df = pd.concat(dataframes, ignore_index=True)

In [88]:
combined_df.sample(10)

Unnamed: 0,time,P,Gb(i),Gr(i),Gd(i),H_sun,T2m,WS10m,Departamento
1123270,20200302:2230,95.6,0.0,1.13,45.67,13.3,24.73,0.9,Nariño
394293,20201227:2130,1456.99,386.97,4.94,102.53,13.76,28.62,3.66,Guajira
1529039,20180620:2330,0.0,0.0,0.0,0.0,0.0,22.91,0.0,Guaviare
1030607,20210805:2330,353.85,98.19,0.93,32.73,4.3,28.05,6.14,SanAndres
1592232,20190906:0030,0.0,0.0,0.0,0.0,0.0,25.06,0.76,Amazonas
1180431,20200909:1530,2319.32,680.24,23.9,128.68,67.73,29.81,1.72,Huila
686814,20180515:0630,0.0,0.0,0.0,0.0,0.0,22.95,1.38,Putumayo
173285,20191010:0530,0.0,0.0,0.0,0.0,0.0,19.48,0.41,Cauca
1360381,20230323:1330,2313.62,638.66,14.46,176.54,35.42,27.51,1.1,Sucre
723073,20220704:0130,0.0,0.0,0.0,0.0,0.0,22.82,1.1,Putumayo


In [89]:
combined_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1682688 entries, 0 to 1682687
Data columns (total 9 columns):
 #   Column        Non-Null Count    Dtype  
---  ------        --------------    -----  
 0   time          1682688 non-null  object 
 1   P             1682688 non-null  float64
 2   Gb(i)         1682688 non-null  float64
 3   Gr(i)         1682688 non-null  float64
 4   Gd(i)         1682688 non-null  float64
 5   H_sun         1682688 non-null  float64
 6   T2m           1682688 non-null  float64
 7   WS10m         1682688 non-null  float64
 8   Departamento  1682688 non-null  object 
dtypes: float64(7), object(2)
memory usage: 115.5+ MB


In [90]:
data = combined_df.copy()

In [91]:
print(data[52580:52590])  # Show the rows near the error (where one department's data ends and the next begins)


                time        P   Gb(i)  Gr(i)   Gd(i)  H_sun    T2m  WS10m  \
52580  20231231:2030  2047.82  574.75  11.03  138.42  31.05  31.66   2.48   
52581  20231231:2130   863.82  205.48   4.01   92.16  18.27  30.86   2.69   
52582  20231231:2230  2703.08  634.60   4.01  346.16   4.93  30.02   2.76   
52583  20231231:2330     0.00    0.00   0.00    0.00   0.00  28.80   2.62   
52584  20180101:0030     0.00    0.00   0.00    0.00   0.00  20.87   0.21   
52585  20180101:0130     0.00    0.00   0.00    0.00   0.00  20.30   0.07   
52586  20180101:0230     0.00    0.00   0.00    0.00   0.00  20.05   0.21   
52587  20180101:0330     0.00    0.00   0.00    0.00   0.00  19.79   0.21   
52588  20180101:0430     0.00    0.00   0.00    0.00   0.00  19.41   0.21   
52589  20180101:0530     0.00    0.00   0.00    0.00   0.00  18.78   0.55   

         Departamento  
52580         Cordoba  
52581         Cordoba  
52582         Cordoba  
52583         Cordoba  
52584  NorteSantander  
52585  N

In [92]:
# Correct the time column

# Convert to datetime
data['time'] = pd.to_datetime(data['time'], format='%Y%m%d:%H%M')
data.sample(10)

Unnamed: 0,time,P,Gb(i),Gr(i),Gd(i),H_sun,T2m,WS10m,Departamento
1253833,2023-01-25 01:30:00,0.0,0.0,0.0,0.0,0.0,16.34,0.34,Risaralda
574309,2023-07-13 13:30:00,2421.42,643.74,13.27,144.44,37.58,11.99,0.97,Boyaca
516665,2022-12-14 17:30:00,2263.61,505.73,19.4,270.35,61.22,20.88,0.41,Quindio
963978,2019-12-29 18:30:00,2450.41,653.13,20.31,188.34,52.17,31.04,4.41,Casanare
93613,2022-09-06 13:30:00,2432.85,723.1,12.47,134.66,40.57,24.41,0.69,NorteSantander
663655,2021-09-22 07:30:00,0.0,0.0,0.0,0.0,0.0,23.26,0.69,Caqueta
56431,2018-06-10 07:30:00,0.0,0.0,0.0,0.0,0.0,22.21,0.69,NorteSantander
388590,2020-05-04 06:30:00,0.0,0.0,0.0,0.0,0.0,26.95,3.59,Guajira
926147,2021-09-04 11:30:00,409.02,84.0,1.56,55.93,8.03,11.74,0.41,Antioquia
930817,2022-03-18 01:30:00,0.0,0.0,0.0,0.0,0.0,14.5,0.69,Antioquia
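
To see what the format string does on a single value, here is a minimal standard-library sketch. It reuses the same directives passed to `pd.to_datetime` above; the sample string is one of the raw timestamps shown earlier in the notebook output.

```python
from datetime import datetime

# Same directives as in the pd.to_datetime call: YYYYMMDD, a colon, then HHMM
TIME_FORMAT = "%Y%m%d:%H%M"

raw = "20231231:2330"
parsed = datetime.strptime(raw, TIME_FORMAT)

print(parsed)  # 2023-12-31 23:30:00
```

Passing an explicit format avoids per-value format inference, which matters for a column with about 1.7 million rows.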


In [93]:
# Add longitude and latitude

departments_coordinates = {
    'Amazonas': {'latitude': -3.5, 'longitude': -70.2},
    'Antioquia': {'latitude': 6.5, 'longitude': -75.5},
    'Arauca': {'latitude': 7.1, 'longitude': -70.7},
    'Atlantico': {'latitude': 10.6, 'longitude': -74.2},
    'Bolivar': {'latitude': 10.3, 'longitude': -75.5},
    'Boyaca': {'latitude': 5.5, 'longitude': -73.4},
    'Caldas': {'latitude': 5.1, 'longitude': -75.5},
    'Caqueta': {'latitude': 1.6, 'longitude': -75.6},
    'Casanare': {'latitude': 5.9, 'longitude': -72.4},
    'Cauca': {'latitude': 2.5, 'longitude': -76.6},
    'Cesar': {'latitude': 10.4, 'longitude': -73.8},
    'Choco': {'latitude': 5.7, 'longitude': -77.6},
    'Cordoba': {'latitude': 8.6, 'longitude': -75.9},
    'Cundinamarca': {'latitude': 4.1, 'longitude': -74.2},
    'Guaviare': {'latitude': 3.2, 'longitude': -72.6},
    'Guainia': {'latitude': 3.9, 'longitude': -67.5},
    'Huila': {'latitude': 2.9, 'longitude': -75.3},
    'Guajira': {'latitude': 11.0, 'longitude': -71.9},
    'Magdalena': {'latitude': 9.9, 'longitude': -74.2},
    'Meta': {'latitude': 3.6, 'longitude': -73.3},
    'Nariño': {'latitude': 1.2, 'longitude': -77.0},
    'NorteSantander': {'latitude': 7.8, 'longitude': -72.9},
    'Putumayo': {'latitude': 1.0, 'longitude': -75.4},
    'Quindio': {'latitude': 4.5, 'longitude': -75.6},
    'Risaralda': {'latitude': 5.0, 'longitude': -75.7},
    'SanAndres': {'latitude': 12.5, 'longitude': -81.7},
    'Santander': {'latitude': 7.6, 'longitude': -73.1},
    'Sucre': {'latitude': 9.0, 'longitude': -75.2},
    'Tolima': {'latitude': 4.0, 'longitude': -75.2},
    'ValleCauca': {'latitude': 3.4, 'longitude': -76.5},
    'Vaupes': {'latitude': 0.7, 'longitude': -69.5},
    'Vichada': {'latitude': 4.0, 'longitude': -69.3}
}


In [94]:
# Look up each row's latitude and longitude from its department name
data['latitude'] = data['Departamento'].map(lambda x: departments_coordinates.get(x, {}).get('latitude'))
data['longitude'] = data['Departamento'].map(lambda x: departments_coordinates.get(x, {}).get('longitude'))
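
Because `departments_coordinates.get(x, {}).get('latitude')` returns `None` for any name missing from the dictionary, a misspelled file name would silently produce missing coordinates. A small guard such as the following sketch can catch that early. The helper `missing_coordinates` and the misspelled name are hypothetical, for illustration only.

```python
def missing_coordinates(department_names, coordinates):
    """Return, sorted, the names that have no entry in the coordinates dict."""
    return sorted(set(department_names) - set(coordinates))


# Toy subset of the dictionary above plus one deliberately misspelled name
coords_subset = {
    "Huila": {"latitude": 2.9, "longitude": -75.3},
    "Sucre": {"latitude": 9.0, "longitude": -75.2},
}
names_from_files = ["Huila", "Sucre", "Narino"]  # "Narino" is a hypothetical typo

print(missing_coordinates(names_from_files, coords_subset))
```

In the notebook the same idea would apply to `data['Departamento'].unique()` and the full dictionary. An empty result means every row receives coordinates.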


In [95]:
data.sample(10)

Unnamed: 0,time,P,Gb(i),Gr(i),Gd(i),H_sun,T2m,WS10m,Departamento,latitude,longitude
1088496,2022-03-15 00:30:00,0.0,0.0,0.0,0.0,0.0,26.91,4.55,Atlantico,10.6,-74.2
1538619,2019-07-25 03:30:00,0.0,0.0,0.0,0.0,0.0,22.56,0.97,Guaviare,3.2,-72.6
380149,2019-05-18 13:30:00,2108.07,545.31,14.88,154.47,41.82,28.3,5.66,Guajira,11.0,-71.9
553932,2021-03-16 12:30:00,662.16,90.39,4.22,123.47,22.06,10.22,0.48,Boyaca,5.5,-73.4
608982,2021-06-27 06:30:00,0.0,0.0,0.0,0.0,0.0,17.22,0.48,Cundinamarca,4.1,-74.2
647739,2019-11-29 03:30:00,0.0,0.0,0.0,0.0,0.0,24.76,1.45,Caqueta,1.6,-75.6
367177,2023-11-24 01:30:00,0.0,0.0,0.0,0.0,0.0,27.55,2.0,Meta,3.6,-73.3
126944,2020-06-26 08:30:00,0.0,0.0,0.0,0.0,0.0,26.08,1.79,Magdalena,9.9,-74.2
733012,2023-08-22 04:30:00,0.0,0.0,0.0,0.0,0.0,23.8,0.83,Putumayo,1.0,-75.4
1305245,2022-12-07 05:30:00,0.0,0.0,0.0,0.0,0.0,21.77,1.38,Arauca,7.1,-70.7


In [96]:
# Verify that everything worked correctly (no missing values)
data.isna().sum()

time            0
P               0
Gb(i)           0
Gr(i)           0
Gd(i)           0
H_sun           0
T2m             0
WS10m           0
Departamento    0
latitude        0
longitude       0
dtype: int64

In [97]:
data['P'].std()

np.float64(918.7572443558688)

Next, we assign each record a viability score based on these features.

In [98]:
from sklearn.preprocessing import MinMaxScaler


In [100]:
from sklearn.preprocessing import MinMaxScaler

# Initialize the MinMaxScaler
scaler = MinMaxScaler()

# Select the columns to normalize
columns_to_normalize = ['Gb(i)', 'Gr(i)', 'Gd(i)', 'H_sun', 'T2m', 'WS10m', 'P']

# Fit the scaler and transform the selected columns
scaled_data = scaler.fit_transform(data[columns_to_normalize])

# Convert the scaled result back to a DataFrame with the same column names,
# reusing the index of `data` so the rows stay aligned
scaled_df = pd.DataFrame(scaled_data, columns=columns_to_normalize, index=data.index)

# scaled_df now holds the scaled columns; the original DataFrame is left unchanged
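
For intuition, min-max scaling maps each column linearly onto [0, 1] using that column's own minimum and maximum. The pure-Python sketch below illustrates the formula on a toy list. It is not the scikit-learn implementation, and the name `min_max_scale` and the sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


t2m_sample = [10.0, 20.0, 30.0]  # toy temperatures in degrees Celsius
print(min_max_scale(t2m_sample))  # [0.0, 0.5, 1.0]
```

Since the minimum and maximum are taken over all departments together, a scaled value expresses where a reading sits within the national range and not within its own department's range.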



In [102]:
# Assign the weights
weights = {
    'G_i': 0.70,
    'h_sun': 0.10,
    'T2m': 0.10,
    'WS10m': 0.10,
}

# Compute the viability score from the scaled columns
# (the 'G_i' weight is applied to the Gb(i) column)
data['viability_score'] = (scaled_df['Gb(i)'] * weights['G_i'] +
                          scaled_df['H_sun'] * weights['h_sun'] +
                          scaled_df['T2m'] * weights['T2m'] +
                          scaled_df['WS10m'] * weights['WS10m'])
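
The score is a plain weighted sum of scaled features. The sketch below applies the same weights to a single hypothetical night-time row (the scaled values are invented for illustration). It shows that a row with no irradiance still gets a non-zero score from temperature and wind alone.

```python
import math

weights = {"G_i": 0.70, "h_sun": 0.10, "T2m": 0.10, "WS10m": 0.10}


def viability_score(scaled_row, w=weights):
    """Weighted sum of min-max scaled features for one record."""
    return (scaled_row["Gb(i)"] * w["G_i"]
            + scaled_row["H_sun"] * w["h_sun"]
            + scaled_row["T2m"] * w["T2m"]
            + scaled_row["WS10m"] * w["WS10m"])


# The weights sum to 1, so the score is bounded by the [0, 1] range of its inputs
assert math.isclose(sum(weights.values()), 1.0)

# Hypothetical night-time row: no irradiance, sun below the horizon
night = {"Gb(i)": 0.0, "H_sun": 0.0, "T2m": 0.5, "WS10m": 0.1}
print(round(viability_score(night), 3))  # 0.06
```

This is why averaging the score over all hours would blend daytime potential with night-time temperature and wind, and why restricting to rows with `Gb(i) > 0` before aggregating is a reasonable step.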


In [107]:
data.sample(10)

Unnamed: 0,time,P,G(i),H_sun,T2m,WS10m,Departamento,latitude,longitude,viability_score
725014,2022-09-22 22:30:00,141.1,329.28,6.06,28.88,0.28,Putumayo,1.0,-75.4,0.170794
430510,2019-02-14 22:30:00,102.75,242.44,6.6,28.9,0.9,Cesar,10.4,-73.8,0.151232
620441,2022-10-17 17:30:00,302.45,675.27,71.29,16.13,1.03,Cundinamarca,4.1,-74.2,0.29763
1234217,2020-10-29 17:30:00,335.86,781.17,68.57,21.96,0.9,Risaralda,5.0,-75.7,0.338527
617188,2022-06-04 04:30:00,0.0,0.0,0.0,10.47,0.62,Cundinamarca,4.1,-74.2,0.027646
132247,2021-02-02 07:30:00,0.0,0.0,0.0,25.88,1.52,Magdalena,9.9,-74.2,0.074608
587173,2018-12-31 13:30:00,180.02,382.46,31.59,10.47,1.52,Cundinamarca,4.1,-74.2,0.165731
432024,2019-04-19 00:30:00,0.0,0.0,0.0,24.67,1.17,Cesar,10.4,-73.8,0.069627
642218,2019-04-13 02:30:00,0.0,0.0,0.0,24.2,0.83,Caqueta,1.6,-75.6,0.066747
327331,2019-05-08 19:30:00,273.89,636.27,48.2,26.69,1.66,Meta,3.6,-73.3,0.295823


In [103]:
data['viability_score'].max()

np.float64(0.7964727809391401)

In [105]:
# Filter the rows where Gb(i) > 0.
# Use `data`, which holds viability_score; `df` is only the last CSV read in the
# loading loop, so filtering it raised KeyError: 'Column not found: viability_score'.
df_filtered = data[data['Gb(i)'] > 0]

# Compute the average score per department: group by department with `groupby`
# and take the mean of the 'viability_score' column
department_score = df_filtered.groupby('Departamento')['viability_score'].mean().reset_index()

# Show the first results
department_score.head()
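
To make explicit what the filter followed by a per-department mean computes, here is a dependency-free sketch on toy records. The function name `mean_score_by_department` and all values are invented for illustration; in the notebook the same result comes from filtering on `Gb(i) > 0` and then calling `groupby('Departamento')['viability_score'].mean()`.

```python
from collections import defaultdict


def mean_score_by_department(records):
    """Average viability_score per department over rows with Gb(i) > 0."""
    totals = defaultdict(lambda: [0.0, 0])  # department -> [sum, count]
    for rec in records:
        if rec["Gb(i)"] > 0:  # skip hours without beam irradiance
            acc = totals[rec["Departamento"]]
            acc[0] += rec["viability_score"]
            acc[1] += 1
    return {dep: total / count for dep, (total, count) in totals.items()}


toy = [
    {"Departamento": "Huila", "Gb(i)": 680.0, "viability_score": 0.6},
    {"Departamento": "Huila", "Gb(i)": 0.0, "viability_score": 0.1},
    {"Departamento": "Huila", "Gb(i)": 300.0, "viability_score": 0.4},
    {"Departamento": "Sucre", "Gb(i)": 638.0, "viability_score": 0.5},
]
print(mean_score_by_department(toy))
```

The row with zero irradiance is excluded from the average, so the Huila mean is taken over two rows and not three.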