## Project steps

1. Choose a topic
    1. **Choose a topic and look for data related to it**. This is somewhat harder than the second option, because you may not find data that suits your purpose, or you may have to obtain it through APIs, web scraping, database downloads, and so on. However, this option is the closest to a real-world case, since finding data is one of the main challenges in this kind of project.

    2. **Go straight to open-data portals and pick a dataset that interests you**. If you are unsure about the topic, you can browse one of the data links (Kaggle is recommended) and choose a subject that motivates you.


The topic will be photovoltaic solar plants.

I would like to find out whether the European countries that invest in this renewable energy do so solely because of the solar irradiance they receive, or whether other parameters also play a role.

I have searched the following platforms to see what data is available for Europe:

- https://datacommons.org/
    - Total population
    - Life expectancy
    - Unemployed persons
    - Education level
    - Annual solar energy generation
    - Installed solar energy capacity
    - Financial flows supporting solar energy
    - Average precipitation
    - Country area
- https://datacatalog.worldbank.org
    - Solar irradiance



2. Data collection. Can you carry out the project with this data?

For now I expect the data to be sufficient. If needed, I would look for more information.

3. Define your hypothesis. What do you think you can get from this data? What will you be able to answer? How will you go about it?

- What characteristics does a European country that invests in solar photovoltaics have?
- Does it depend on the population?
    - Inhabitants / km2
    - Education level
    - Life expectancy
    - Occupational activity
- Does it depend on the climate?
    - Average precipitation
- Does it depend on the country's area?
    - Area (km2)
    - Green area
- Does it depend exclusively on global horizontal irradiation (GHI)? Contrasting information for the 10 countries with the largest production capacity.
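The population-density question (inhabitants / km2) can be answered from columns that are already in the Solargis table. A minimal sketch using the first rows shown further down, assuming `Area_2018` is in km2 (the values, e.g. 180 for Aruba, are consistent with that); the column name `Densidad_hab_km2` is hypothetical:

```python
import pandas as pd

# Population and area for a few rows of the Solargis table (Area_2018 assumed to be km2)
df = pd.DataFrame({
    "Pais": ["Aruba (Neth.)", "Afghanistan", "Angola", "Albania", "Andorra"],
    "Poblacion_2018": [105845, 37172386, 30809762, 2866376, 77006],
    "Area_2018": [180.0, 652860.0, 1246700.0, 27400.0, 470.0],
})

# Inhabitants per km2 = total population / total area (hypothetical column name)
df["Densidad_hab_km2"] = df["Poblacion_2018"] / df["Area_2018"]
print(df[["Pais", "Densidad_hab_km2"]].round(1))
```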


4. Preprocessing: gather all the data from the different sources you have used, join it, and put it into an understandable format.

In [133]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


#### Checking the hypotheses

The first data I am going to process is the most relevant for answering the questions raised above.
This information has to be aligned with each country's production capacity and the energy it actually generates.
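To build the contrast group of the 10 countries with the largest production capacity, pandas' `DataFrame.nlargest` is enough. A minimal sketch on the installed-capacity column (MWp) of the raw table, with values from its first rows; `n=3` is used only because the toy frame has five rows, and on the real table `n=10` would apply:

```python
import pandas as pd

# Installed PV capacity (MWp, 2018) for the first rows of the raw Solargis table
col = "Cummulative installed PV capacity (MWp), 2018"
df = pd.DataFrame({
    "Country or region": ["Aruba (Neth.)", "Afghanistan", "Angola", "Albania", "Andorra"],
    col: [6.1, 22.0, 13.4, 1.0, 0.0],
})

# nlargest returns the n rows with the highest capacity, sorted in descending order
top = df.nlargest(3, col)
print(top)
```

Note that the position-based slices built later (`titulo[12:17]`) leave out this MWp column, so the ranking is easiest to compute on `df_sol` or `df_sol_1`.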



In [134]:
# header=1: use the sheet's second row as the column names
df_sol = pd.read_excel("./0_INFORMACION_INICIAL/solargis_pvpotential_countryranking_2020_data.xlsx", header=1)

df_sol.head(5)


|  | ISO_A3 | Country or region | Note | World Bank Region | Total population, 2018 | Total area, 2018 | Evaluated area | Level 1 area (% of evaluated area) | Human development Index, 2017 | Gross domestic product (USD per capita), 2018 | ... | Average practical potential (PVOUT Level 1, kWh/kWp/day), long-term | Average economic potential (LCOE, USD/kWh), 2018 | Average PV seasonality index, long-term | PV equivalent area (% of total area), long-term | Cummulative installed PV capacity (MWp), 2018 | Cummulative installed PV capacity (Wp per capita), 2018 | Access to electricity (% of rural population), 2016 | Electric power consumption (kWh per capita), 2014 | Reliability of supply and transparency of tariff index, 2019 | Approximate electricity Tariffs for SMEs (US cent/kWh), 2019 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABW | Aruba (Neth.) |  | Other | 105845 | 180.0 | 180 | 0.847926 |  | 25630.266492 | ... | 4.9646 | 0.0853 | 1.1803 |  | 6.1 | 57.631442 | 92.452844 |  |  |  |
| 1 | AFG | Afghanistan |  | SOA | 37172386 | 652860.0 | 652860 | 0.58735 | 0.497695 | 520.896603 | ... | 5.0159 | 0.0851 | 1.6665 |  | 22.0 | 0.591837 | 78.961074 |  | 0.0 | 17.6 |
| 2 | AGO | Angola |  | AFR | 30809762 | 1246700.0 | 1246700 | 0.751833 | 0.581179 | 3432.385736 | ... | 4.6586 | 0.0919 | 1.3211 | 0.003 | 13.4 | 0.434927 | 15.984209 | 312.228825 | 3.0 | 4.6 |
| 3 | ALB | Albania |  | ECA | 2866376 | 27400.0 | 27400 | 0.614858 | 0.784911 | 5253.630064 | ... | 4.0426 | 0.1051 | 2.3266 | 0.21 | 1.0 | 0.348873 | 100.0 | 2309.366503 | 3.0 | 8.7 |
| 4 | AND | Andorra |  | Other | 77006 | 470.0 | 470 | 0.110961 | 0.857684 | 42029.762737 | ... | 4.199 | 0.0986 | 2.1531 |  | 0.0 | 0.0 | 100.0 |  |  |  |


In [135]:
# Drop the columns I don't think are relevant
df_sol_1 = df_sol.copy()
df_sol_1.drop(columns=["ISO_A3", "Note", "World Bank \nRegion", "Evaluated area"], inplace=True)
df_sol_1.head(3)


|  | Country or region | Total population, 2018 | Total area, 2018 | Level 1 area (% of evaluated area) | Human development Index, 2017 | Gross domestic product (USD per capita), 2018 | Average theoretical potential (GHI, kWh/m2/day), long-term | Average practical potential (PVOUT Level 1, kWh/kWp/day), long-term | Average economic potential (LCOE, USD/kWh), 2018 | Average PV seasonality index, long-term | PV equivalent area (% of total area), long-term | Cummulative installed PV capacity (MWp), 2018 | Cummulative installed PV capacity (Wp per capita), 2018 | Access to electricity (% of rural population), 2016 | Electric power consumption (kWh per capita), 2014 | Reliability of supply and transparency of tariff index, 2019 | Approximate electricity Tariffs for SMEs (US cent/kWh), 2019 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba (Neth.) | 105845 | 180.0 | 0.847926 |  | 25630.266492 | 6.1098 | 4.9646 | 0.0853 | 1.1803 |  | 6.1 | 57.631442 | 92.452844 |  |  |  |
| 1 | Afghanistan | 37172386 | 652860.0 | 0.58735 | 0.497695 | 520.896603 | 5.4904 | 5.0159 | 0.0851 | 1.6665 |  | 22.0 | 0.591837 | 78.961074 |  | 0.0 | 17.6 |
| 2 | Angola | 30809762 | 1246700.0 | 0.751833 | 0.581179 | 3432.385736 | 5.7467 | 4.6586 | 0.0919 | 1.3211 | 0.003 | 13.4 | 0.434927 | 15.984209 | 312.228825 | 3.0 | 4.6 |
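The project targets Europe, but the table is worldwide, and the cell above drops `ISO_A3` and `World Bank \nRegion`, the two columns that could identify European countries. A minimal sketch of a filter applied before that drop, assuming a country counts as European if its World Bank region is `ECA` (Europe & Central Asia), with manual corrections: `extra_europe` and `non_europe` are hypothetical, incomplete lists (Andorra, for example, is labelled `Other` in the raw table):

```python
import pandas as pd

# First rows of the raw table; the region column name contains a real newline
df_sol = pd.DataFrame({
    "ISO_A3": ["ABW", "AFG", "AGO", "ALB", "AND"],
    "Country or region": ["Aruba (Neth.)", "Afghanistan", "Angola", "Albania", "Andorra"],
    "World Bank \nRegion": ["Other", "SOA", "AFR", "ECA", "Other"],
})

region = df_sol["World Bank \nRegion"]
# Hypothetical, incomplete list: European ISO codes not labelled "ECA" in this file
extra_europe = {"AND"}
# Hypothetical, incomplete list: ECA members located in Central Asia
non_europe = {"KAZ", "UZB", "TKM", "TJK", "KGZ"}

is_europe = (region.eq("ECA") | df_sol["ISO_A3"].isin(extra_europe)) & ~df_sol["ISO_A3"].isin(non_europe)
df_europe = df_sol[is_europe]
print(df_europe["Country or region"].tolist())
```

Which countries count as "European" is a project decision; the two correction sets make that decision explicit instead of relying only on the World Bank grouping.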


In [136]:
# The table is very wide, so I split the indicators into groups
titulo = list(df_sol_1.columns)
titulo

['Country or region',
 'Total population, 2018',
 'Total area, 2018',
 'Level 1 area \n(% of evaluated area)',
 'Human development \nIndex, 2017',
 'Gross domestic product (USD per capita), 2018',
 'Average theoretical potential (GHI, kWh/m2/day), \nlong-term',
 'Average practical potential \n(PVOUT Level 1, \nkWh/kWp/day), long-term',
 'Average economic potential (LCOE, USD/kWh), 2018',
 'Average PV \nseasonality index, long-term',
 'PV equivalent area (% of total area), long-term',
 'Cummulative installed PV capacity (MWp), 2018',
 'Cummulative installed PV capacity (Wp per capita), 2018',
 'Access to electricity\n(% of rural population), 2016',
 'Electric power consumption (kWh per capita), 2014',
 'Reliability of supply and transparency of tariff index, 2019',
 'Approximate electricity \nTariffs for SMEs \n(US cent/kWh), 2019']
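Several of these names contain line breaks (`\n`), which is why they are renamed by hand in the following cells. A possible alternative, sketched below, normalises the whitespace programmatically with the pandas `.str` accessor on the column index, so the names stay descriptive and keep their original units:

```python
import pandas as pd

# A few of the raw column names listed above, with their embedded line breaks
df = pd.DataFrame(columns=[
    "Country or region",
    "Human development \nIndex, 2017",
    "Average PV \nseasonality index, long-term",
    "Approximate electricity \nTariffs for SMEs \n(US cent/kWh), 2019",
])

# Collapse every run of whitespace (including "\n") into a single space and trim the ends
df.columns = df.columns.str.replace(r"\s+", " ", regex=True).str.strip()
print(list(df.columns))
```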

In [137]:
# Basic parameters: country, population, area, Level 1 area, HDI, GDP (positions 0-5)
df_sol_basic = df_sol_1.loc[:, titulo[0:6]].copy()
# PV plant parameters: country + positions 7-10 (PVOUT, LCOE, seasonality, PV-equivalent area)
# Note: position 6 (GHI) is not included in this slice
titulo_seleccionado = [titulo[0]] + titulo[7:11]  # concatenate two lists
df_sol_pv = df_sol_1.loc[:, titulo_seleccionado].copy()
# Sector parameters: country + positions 12-16 (Wp per capita, rural access, consumption, reliability, tariffs)
# Note: position 11 (installed capacity in MWp) is not included in this slice
titulo_seleccionado = [titulo[0]] + titulo[12:17]  # concatenate two lists
df_sol_sector = df_sol_1.loc[:, titulo_seleccionado].copy()
df_sol_sector

|  | Country or region | Cummulative installed PV capacity (Wp per capita), 2018 | Access to electricity (% of rural population), 2016 | Electric power consumption (kWh per capita), 2014 | Reliability of supply and transparency of tariff index, 2019 | Approximate electricity Tariffs for SMEs (US cent/kWh), 2019 |
|---|---|---|---|---|---|---|
| 0 | Aruba (Neth.) | 57.631442 | 92.452844 |  |  |  |
| 1 | Afghanistan | 0.591837 | 78.961074 |  | 0.0 | 17.6 |
| 2 | Angola | 0.434927 | 15.984209 | 312.228825 | 3.0 | 4.6 |
| 3 | Albania | 0.348873 | 100.000000 | 2309.366503 | 3.0 | 8.7 |
| 4 | Andorra | 0.000000 | 100.000000 |  |  |  |
| ... | ... | ... | ... | ... | ... | ... |
| 204 | Kosovo |  |  |  | 2.0 |  |
| 205 | Republic of Yemen | 5.263400 | 57.691162 | 219.799922 |  |  |
| 206 | South Africa | 44.285510 | 67.921435 | 4197.907047 | 4.0 | 14.8 |
| 207 | Zambia | 0.288154 | 2.657746 | 717.349168 | 4.0 | 4.7 |
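The position-based slices skip two columns that matter for the hypotheses: `titulo[6]` (GHI, the irradiation the last question is about) and `titulo[11]` (installed capacity in MWp). One option, sketched here with a hypothetical `df_sol_irr` and keyword list and with values from the `df_sol_1` table, is to select columns by a keyword in their name, which does not depend on column order:

```python
import pandas as pd

# Two rows of df_sol_1 (values from the table above), only the columns needed here
df_sol_1 = pd.DataFrame({
    "Country or region": ["Aruba (Neth.)", "Afghanistan"],
    "Average theoretical potential (GHI, kWh/m2/day), \nlong-term": [6.1098, 5.4904],
    "Average practical potential \n(PVOUT Level 1, \nkWh/kWp/day), long-term": [4.9646, 5.0159],
    "Cummulative installed PV capacity (MWp), 2018": [6.1, 22.0],
})

# Keep every column whose name contains one of the keywords (hypothetical list)
keywords = ["Country", "GHI", "PVOUT", "MWp"]
cols = [c for c in df_sol_1.columns if any(k in c for k in keywords)]
df_sol_irr = df_sol_1.loc[:, cols]
print(df_sol_irr.columns.tolist())
```

On the full table the keywords must be checked for unintended matches; here "MWp" does not match the "Wp per capita" column.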


In [138]:
# Inspect the first slice: basic indicators
df_sol_basic.info()
# Rename the columns: the original names contain odd characters and line breaks that cause problems
# Note: "Area_Evaluada_Porcentaje" holds the Level 1 area relative to the evaluated area, as a fraction (e.g. 0.85), not a percentage
df_sol_basic.columns = ["Pais", "Poblacion_2018", "Area_2018", "Area_Evaluada_Porcentaje", "Desarrollo_humano_2017", "PIB_USD_2018"]
df_sol_basic

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 209 entries, 0 to 208
Data columns (total 6 columns):
 #   Column                                         Non-Null Count  Dtype  
---  ------                                         --------------  -----  
 0   Country or region                              209 non-null    object 
 1   Total population, 2018                         209 non-null    int64  
 2   Total area, 2018                               207 non-null    float64
 3   Level 1 area (% of evaluated area)             209 non-null    float64
 4   Human development Index, 2017                  185 non-null    float64
 5   Gross domestic product (USD per capita), 2018  200 non-null    float64
dtypes: float64(4), int64(1), object(1)
memory usage: 9.9+ KB


|  | Pais | Poblacion_2018 | Area_2018 | Area_Evaluada_Porcentaje | Desarrollo_humano_2017 | PIB_USD_2018 |
|---|---|---|---|---|---|---|
| 0 | Aruba (Neth.) | 105845 | 180.0 | 0.847926 |  | 25630.266492 |
| 1 | Afghanistan | 37172386 | 652860.0 | 0.587350 | 0.497695 | 520.896603 |
| 2 | Angola | 30809762 | 1246700.0 | 0.751833 | 0.581179 | 3432.385736 |
| 3 | Albania | 2866376 | 27400.0 | 0.614858 | 0.784911 | 5253.630064 |
| 4 | Andorra | 77006 | 470.0 | 0.110961 | 0.857684 | 42029.762737 |
| ... | ... | ... | ... | ... | ... | ... |
| 204 | Kosovo | 1845300 | 10887.0 | 0.790152 |  | 4281.292329 |
| 205 | Republic of Yemen | 28498687 | 527970.0 | 0.515187 | 0.451900 | 944.408499 |
| 206 | South Africa | 57779622 | 1213090.0 | 0.842475 | 0.699030 | 6374.015446 |
| 207 | Zambia | 17351822 | 743390.0 | 0.804838 | 0.588083 | 1539.900158 |
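`info()` reports non-null counts (for example, 185 of 209 for the Human Development Index, about 11.5% missing). A per-column share of missing values is easier to compare across columns; a minimal sketch on a few rows from the table above:

```python
import numpy as np
import pandas as pd

# Rows of df_sol_basic shown above (NaN where the table cell is empty)
df_sol_basic = pd.DataFrame({
    "Pais": ["Aruba (Neth.)", "Afghanistan", "Albania", "Kosovo"],
    "Desarrollo_humano_2017": [np.nan, 0.497695, 0.784911, np.nan],
    "PIB_USD_2018": [25630.266492, 520.896603, 5253.630064, 4281.292329],
})

# Fraction of missing values per column, highest first
missing = df_sol_basic.isna().mean().sort_values(ascending=False)
print(missing)
```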


In [139]:
# Inspect the second slice: PV plant parameters
df_sol_pv.info()
# Rename the columns (same reason as above); the seasonality index gets the placeholder name "BORRAR" and is dropped
# Note: "Potencial_kWh/kWp/day" is the practical potential (PVOUT), not the irradiation (GHI)
df_sol_pv.columns = ["Pais", "Potencial_kWh/kWp/day", "Potencial_Economico_USD/kWh_2018", "BORRAR", "Area_equivalente_PV"]
df_sol_pv.drop(columns=["BORRAR"], inplace=True)
df_sol_pv

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 209 entries, 0 to 208
Data columns (total 5 columns):
 #   Column                                                                 Non-Null Count  Dtype  
---  ------                                                                 --------------  -----  
 0   Country or region                                                      209 non-null    object 
 1   Average practical potential (PVOUT Level 1, kWh/kWp/day), long-term    209 non-null    float64
 2   Average economic potential (LCOE, USD/kWh), 2018                       209 non-null    float64
 3   Average PV seasonality index, long-term                                209 non-null    float64
 4   PV equivalent area (% of total area), long-term                        135 non-null    float64
dtypes: float64(4), object(1)
memory usage: 8.3+ KB


|  | Pais | Potencial_kWh/kWp/day | Potencial_Economico_USD/kWh_2018 | Area_equivalente_PV |
|---|---|---|---|---|
| 0 | Aruba (Neth.) | 4.9646 | 0.0853 |  |
| 1 | Afghanistan | 5.0159 | 0.0851 |  |
| 2 | Angola | 4.6586 | 0.0919 | 0.003 |
| 3 | Albania | 4.0426 | 0.1051 | 0.210 |
| 4 | Andorra | 4.1990 | 0.0986 |  |
| ... | ... | ... | ... | ... |
| 204 | Kosovo | 3.6959 | 0.1150 |  |
| 205 | Republic of Yemen | 5.2137 | 0.0818 | 0.005 |
| 206 | South Africa | 5.0036 | 0.1146 | 0.116 |
| 207 | Zambia | 4.8281 | 0.0880 | 0.007 |
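`Potencial_kWh/kWp/day` is a long-term daily average (PVOUT). If an annual figure is preferred, a rough conversion multiplies it by 365; this is an approximation that ignores leap years, and `Potencial_kWh/kWp/year` is a hypothetical column name:

```python
import pandas as pd

# Daily PVOUT values (kWh/kWp/day) for rows shown in the table above
df_sol_pv = pd.DataFrame({
    "Pais": ["Albania", "Kosovo", "South Africa"],
    "Potencial_kWh/kWp/day": [4.0426, 3.6959, 5.0036],
})

# Approximate annual specific yield: long-term daily average x 365 days
df_sol_pv["Potencial_kWh/kWp/year"] = df_sol_pv["Potencial_kWh/kWp/day"] * 365
print(df_sol_pv.round(1))
```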


In [140]:
# Inspect the third slice: sector parameters
df_sol_sector.info()
# Rename the columns (same reason as above); the reliability index gets the placeholder name "borrar" and is dropped
# Note: "Capacidad_instalada_2018" is installed capacity per capita (Wp per capita), and
# "Tarifa_luz_USD/kWh" actually holds US cents/kWh, as in the source column
df_sol_sector.columns = ["Pais", "Capacidad_instalada_2018", "Acceso_electricidad_rural_2016", "Consumo_kWh_per_capita_2014", "borrar", "Tarifa_luz_USD/kWh"]
df_sol_sector.drop(columns=["borrar"], inplace=True)
df_sol_sector

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 209 entries, 0 to 208
Data columns (total 6 columns):
 #   Column                                                          Non-Null Count  Dtype  
---  ------                                                          --------------  -----  
 0   Country or region                                               209 non-null    object 
 1   Cummulative installed PV capacity (Wp per capita), 2018         203 non-null    float64
 2   Access to electricity (% of rural population), 2016             198 non-null    float64
 3   Electric power consumption (kWh per capita), 2014               137 non-null    float64
 4   Reliability of supply and transparency of tariff index, 2019    183 non-null    float64
 5   Approximate electricity Tariffs for SMEs (US cent/kWh), 2019    182 non-null    float64
dtypes: float64(5), object(1)
memory usage: 9.9+ KB


|  | Pais | Capacidad_instalada_2018 | Acceso_electricidad_rural_2016 | Consumo_kWh_per_capita_2014 | Tarifa_luz_USD/kWh |
|---|---|---|---|---|---|
| 0 | Aruba (Neth.) | 57.631442 | 92.452844 |  |  |
| 1 | Afghanistan | 0.591837 | 78.961074 |  | 17.6 |
| 2 | Angola | 0.434927 | 15.984209 | 312.228825 | 4.6 |
| 3 | Albania | 0.348873 | 100.000000 | 2309.366503 | 8.7 |
| 4 | Andorra | 0.000000 | 100.000000 |  |  |
| ... | ... | ... | ... | ... | ... |
| 204 | Kosovo |  |  |  |  |
| 205 | Republic of Yemen | 5.263400 | 57.691162 | 219.799922 |  |
| 206 | South Africa | 44.285510 | 67.921435 | 4197.907047 | 14.8 |
| 207 | Zambia | 0.288154 | 2.657746 | 717.349168 | 4.7 |
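Unit caveat: the source column behind `Tarifa_luz_USD/kWh` is in US cents/kWh, while the LCOE (`Potencial_Economico_USD/kWh_2018`) is in USD/kWh. A minimal sketch that puts both in USD/kWh before comparing them, with values from the tables above; `Tarifa_USD_kWh` and `Margen_USD_kWh` are hypothetical column names, and the SME tariff is used only as an illustrative reference price:

```python
import pandas as pd

# LCOE (USD/kWh) and SME tariff (US cent/kWh) for rows shown in the tables above
df_sol_pv = pd.DataFrame({
    "Pais": ["Afghanistan", "Angola", "Albania"],
    "Potencial_Economico_USD/kWh_2018": [0.0851, 0.0919, 0.1051],
})
df_sol_sector = pd.DataFrame({
    "Pais": ["Afghanistan", "Angola", "Albania"],
    "Tarifa_luz_USD/kWh": [17.6, 4.6, 8.7],  # despite the name, values are US cents/kWh
})

df = df_sol_pv.merge(df_sol_sector, on="Pais")
# Convert cents to dollars so both columns share the same unit
df["Tarifa_USD_kWh"] = df["Tarifa_luz_USD/kWh"] / 100
# Positive margin: the tariff is above the estimated cost of PV electricity
df["Margen_USD_kWh"] = df["Tarifa_USD_kWh"] - df["Potencial_Economico_USD/kWh_2018"]
print(df[["Pais", "Tarifa_USD_kWh", "Margen_USD_kWh"]])
```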


In [141]:
# The dataset is now a bit more organized
df_sol_basic.head(3)

|  | Pais | Poblacion_2018 | Area_2018 | Area_Evaluada_Porcentaje | Desarrollo_humano_2017 | PIB_USD_2018 |
|---|---|---|---|---|---|---|
| 0 | Aruba (Neth.) | 105845 | 180.0 | 0.847926 |  | 25630.266492 |
| 1 | Afghanistan | 37172386 | 652860.0 | 0.58735 | 0.497695 | 520.896603 |
| 2 | Angola | 30809762 | 1246700.0 | 0.751833 | 0.581179 | 3432.385736 |


In [142]:
df_sol_pv.head(3)

|  | Pais | Potencial_kWh/kWp/day | Potencial_Economico_USD/kWh_2018 | Area_equivalente_PV |
|---|---|---|---|---|
| 0 | Aruba (Neth.) | 4.9646 | 0.0853 |  |
| 1 | Afghanistan | 5.0159 | 0.0851 |  |
| 2 | Angola | 4.6586 | 0.0919 | 0.003 |


In [143]:
df_sol_sector.head(3)

|  | Pais | Capacidad_instalada_2018 | Acceso_electricidad_rural_2016 | Consumo_kWh_per_capita_2014 | Tarifa_luz_USD/kWh |
|---|---|---|---|---|---|
| 0 | Aruba (Neth.) | 57.631442 | 92.452844 |  |  |
| 1 | Afghanistan | 0.591837 | 78.961074 |  | 17.6 |
| 2 | Angola | 0.434927 | 15.984209 | 312.228825 | 4.6 |


5. Clean the data: duplicates, missing values, useless columns...
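A minimal sketch of the duplicate check, on a hypothetical toy frame in which one country appears twice; on the real data, `subset="Pais"` assumes each country should appear only once:

```python
import pandas as pd

# Hypothetical toy frame: "Albania" is repeated to illustrate the check
df = pd.DataFrame({
    "Pais": ["Albania", "Andorra", "Albania"],
    "Poblacion_2018": [2866376, 77006, 2866376],
})

# Rows whose country name already appeared earlier in the frame
dupes = df[df.duplicated(subset="Pais", keep="first")]
print(len(dupes), "duplicate country rows")

# Keep the first occurrence of each country and rebuild a clean index
df_clean = df.drop_duplicates(subset="Pais", keep="first").reset_index(drop=True)
```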

6. Exploratory analysis: compute all the statistics and plots you need to understand your dataset well.
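A first exploratory step could combine `describe()` with a rank correlation between solar resource and installed capacity. A minimal sketch on the five rows shown in the tables above, using PVOUT because GHI is not in the slices; five countries are far too few to draw conclusions, so the result only illustrates the method:

```python
import pandas as pd

# PVOUT and installed capacity per capita for rows shown in the tables above
df = pd.DataFrame({
    "Pais": ["Aruba (Neth.)", "Afghanistan", "Angola", "Albania", "Andorra"],
    "Potencial_kWh/kWp/day": [4.9646, 5.0159, 4.6586, 4.0426, 4.1990],
    "Capacidad_instalada_2018": [57.631442, 0.591837, 0.434927, 0.348873, 0.000000],
})

num = df.drop(columns="Pais")
print(num.describe())

# Spearman (rank) correlation: less sensitive to extreme values such as Aruba's than Pearson
rho = num.corr(method="spearman").loc["Potencial_kWh/kWp/day", "Capacidad_instalada_2018"]
print(f"Spearman rho: {rho:.2f}")
```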

7. Conclude from your analysis whether you were right about your approach and your hypothesis.

8. (Bonus track) Create a dashboard to present your exploratory analysis.

9. Document your project and upload it to GitHub.