# Summaries:

Before turning to the conclusions, let's compare the data before and after cleaning and explain the differences:


## Cleaning:

In [4]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import sqlite3

original_solar_df = pd.read_csv('csv_files/solar_power_by_country.csv')
print(original_solar_df.info())
print(original_solar_df.isnull().sum())
original_solar_df.head(5)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 13 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Country or territory          80 non-null     object 
 1   2016_New                      36 non-null     float64
 2   2016_Total                    75 non-null     float64
 3   2017_New                      36 non-null     float64
 4   2017_Total                    75 non-null     float64
 5   2018_New                      20 non-null     float64
 6   2018_Total                    80 non-null     int64  
 7   2019_New                      14 non-null     float64
 8   2019_Total                    80 non-null     int64  
 9   2020_New                      27 non-null     float64
 10  2020_Total                    80 non-null     int64  
 11  W per capita 2019             59 non-null     float64
 12  Share of total consumption %  42 non-null     float64
dtypes: floa

| | Country or territory | 2016_New | 2016_Total | 2017_New | 2017_Total | 2018_New | 2018_Total | 2019_New | 2019_Total | 2020_New | 2020_Total | W per capita 2019 | Share of total consumption % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | 34540.0 | 78070.0 | 53000.0 | 131000.0 | 45000.0 | 175018 | 30100.0 | 204700 | 49655.0 | 254355 | 147.0 | 6.2 |
| 1 | European Union | NaN | 101433.0 | NaN | 107150.0 | 8300.0 | 115234 | 16000.0 | 134129 | 18788.0 | 152917 | 295.0 | 6.0 |
| 2 | United States | 14730.0 | 40300.0 | 10600.0 | 51000.0 | 10600.0 | 53184 | 13300.0 | 60682 | 14890.0 | 75572 | 231.0 | 3.4 |
| 3 | Japan | 8600.0 | 42750.0 | 7000.0 | 49000.0 | 6500.0 | 55500 | 7000.0 | 63000 | 4000.0 | 67000 | 498.0 | 8.3 |
| 4 | Germany | 1520.0 | 41220.0 | 1800.0 | 42000.0 | 3000.0 | 45930 | 3900.0 | 49200 | 4583.0 | 53783 | 593.0 | 9.7 |


This original Data Frame, taken from the "Solar power by country" dataset on Kaggle (https://www.kaggle.com/datasets/prasertk/solar-power-by-country), had a few issues:
- Many missing values, especially in the "New" columns, the "W per capita 2019" column, and the "Share of total consumption %" column.
- Column names starting with numbers, which is not ideal for interacting with a SQL database.
- No categorical columns to group the data by.

I fixed these issues by:
- Creating a function that filled the missing "New" values by subtracting the previous year's total from the current year's total.
- Creating a column named region that labels each country with its continent.
- Filling in missing values for "w per capita 2019" and "Share of total consumption" based on regional averages, guided by each region's coefficient of variation, but also applying common sense through summary statistics and visual distributions.
    - Africa, Asia, and Europe had small enough variation and close enough averages that I used median and mean values to fill in missing values.
- Changing the column names to snake case.
- Creating two Data Frames: one that maintained missing values (unless averages described the missing values very well), and one that contained aggregations.
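The two fill steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual cleaning code: the function names, the `max_cv` cutoff, and the choice of the mean as the fill value are assumptions. The sample rows are copied from the tables in this document, and Kenya is treated as originally missing because its value matches the regional average.

```python
import pandas as pd

NAN = float("nan")

def fill_new_from_totals(df, years):
    """Fill missing `<year>_New` cells with `<year>_Total` minus the previous year's total."""
    df = df.copy()
    for prev, curr in zip(years, years[1:]):
        change = df[f"{curr}_Total"] - df[f"{prev}_Total"]
        df[f"{curr}_New"] = df[f"{curr}_New"].fillna(change)  # existing values are kept
    return df

def fill_by_region_mean(df, column, max_cv=1.0):
    """Fill missing cells with the regional mean, but only where the region's
    coefficient of variation (std / mean) is at most `max_cv` (a hypothetical cutoff)."""
    df = df.copy()
    stats = df.groupby("region")[column].agg(["mean", "std"])
    cv = stats["std"] / stats["mean"]
    regional_fill = stats["mean"].where(cv <= max_cv)  # NaN for high-variation regions
    df[column] = df[column].fillna(df["region"].map(regional_fill))
    return df

# Rows copied from the original table (original column names).
new_sample = pd.DataFrame({
    "Country or territory": ["China", "European Union"],
    "2016_Total": [78070.0, 101433.0],
    "2017_New": [53000.0, NAN],
    "2017_Total": [131000.0, 107150.0],
})
filled_new = fill_new_from_totals(new_sample, [2016, 2017])
print(filled_new["2017_New"].tolist())

# African rows from the Tableau table; Kenya assumed missing before the fill.
africa = pd.DataFrame({
    "country": ["South Africa", "Egypt", "Morocco", "Algeria", "Senegal", "Namibia", "Kenya"],
    "region": ["Africa"] * 7,
    "w_per_capita_2019": [44.0, 17.0, 6.0, 10.0, 8.0, 55.0, NAN],
})
filled_africa = fill_by_region_mean(africa, "w_per_capita_2019")
print(filled_africa.tail(1))
```

The European Union's missing 2017 value becomes 107150 - 101433 = 5717, which matches the cleaned table, while China's reported 53000 is left untouched.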

I then added the cleaned Data Frame to the `csv_files` folder. After cleaning, the aggregated Data Frame looked like this:

In [5]:
aggregated_solar_df = pd.read_csv('csv_files/solar_cleaned_aggregated.csv')

print(aggregated_solar_df.info())
print(aggregated_solar_df.isnull().sum())
display(aggregated_solar_df.head(5))


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 13 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   country                     80 non-null     object 
 1   _2016_total                 75 non-null     float64
 2   _2017_new                   75 non-null     float64
 3   _2017_total                 75 non-null     float64
 4   _2018_new                   75 non-null     float64
 5   _2018_total                 80 non-null     int64  
 6   _2019_new                   80 non-null     float64
 7   _2019_total                 80 non-null     int64  
 8   _2020_new                   80 non-null     float64
 9   _2020_total                 80 non-null     int64  
 10  w_per_capita_2019           69 non-null     float64
 11  share_of_total_consumption  69 non-null     float64
 12  region                      79 non-null     object 
dtypes: float64(8), int64(3), object(2)
me

| | country | _2016_total | _2017_new | _2017_total | _2018_new | _2018_total | _2019_new | _2019_total | _2020_new | _2020_total | w_per_capita_2019 | share_of_total_consumption | region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | 78070.0 | 53000.0 | 131000.0 | 45000.0 | 175018 | 30100.0 | 204700 | 49655.0 | 254355 | 147.0 | 6.2 | Asia |
| 1 | European Union | 101433.0 | 5717.0 | 107150.0 | 8300.0 | 115234 | 16000.0 | 134129 | 18788.0 | 152917 | 295.0 | 6.0 | NaN |
| 2 | United States | 40300.0 | 10600.0 | 51000.0 | 10600.0 | 53184 | 13300.0 | 60682 | 14890.0 | 75572 | 231.0 | 3.4 | North America |
| 3 | Japan | 42750.0 | 7000.0 | 49000.0 | 6500.0 | 55500 | 7000.0 | 63000 | 4000.0 | 67000 | 498.0 | 8.3 | Asia |
| 4 | Germany | 41220.0 | 1800.0 | 42000.0 | 3000.0 | 45930 | 3900.0 | 49200 | 4583.0 | 53783 | 593.0 | 9.7 | Europe |


The count of missing values dropped from 336 to 43, the column names now work cleanly in both pandas and SQL, and the countries can be grouped by region.
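The two totals can be checked directly against the `info()` output printed in the cells above. The sketch below is only a sanity check: it rebuilds the missing-value counts from the non-null counts (80 rows per column), and the helper name is made up for illustration. On the loaded Data Frames, `df.isnull().sum().sum()` should give the same numbers.

```python
ROWS = 80  # RangeIndex: 80 entries in both Data Frames

# Non-null counts copied from the two info() outputs, in column order.
original_non_null = [80, 36, 75, 36, 75, 20, 80, 14, 80, 27, 80, 59, 42]
cleaned_non_null = [80, 75, 75, 75, 75, 80, 80, 80, 80, 80, 69, 69, 79]

def missing_from_non_null(non_null_counts, rows=ROWS):
    """Total missing cells = sum over columns of (rows - non-null count)."""
    return sum(rows - n for n in non_null_counts)

original_missing = missing_from_non_null(original_non_null)
cleaned_missing = missing_from_non_null(cleaned_non_null)
print(original_missing, cleaned_missing)  # 336 43
```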


## Exploration:

The goals as laid out in the README:
- What has been the general trend of solar generation from 2016 - 2020?
- What countries are leading the way in solar generation, and how do their trends compare to the rest of the world?
- How do different regions compare to each other in solar growth? How correlated is growth within a region?
- How do other factors, such as GDP, play into trends of solar growth?
- Create Data Frames that are ideal for Tableau visualizations

General trends:

<!-- ![General Trend Graph](summary_images/general_solar_trends.png) -->

<a href="summary_images/general_solar_trends.png" target="_blank">
  <img src="summary_images/general_solar_trends.png" alt="General Trend Graph" width="1000">
</a>



Countries leading the way in solar, how the leader's (China's) trend compares to the rest of the world, and how different regions compare to each other:



<!-- ![Leading countries:](summary_images/leading_countries.png) -->
<a href="summary_images/leading_countries.png" target="_blank">
  <img src="summary_images/leading_countries.png" alt="Leading Countries Graph" width="1000">
</a>

How do different factors, such as GDP, play into solar power growth:

<!-- ![Gdp findings](summary_images/gdp_findings.png) -->

<a href="summary_images/gdp_findings.png" target="_blank">
  <img src="summary_images/gdp_findings.png" alt="GDP Findings Graph" width="800" height="auto">
</a>


<a href="summary_images/w_per_capita.png" target="_blank">
  <img src="summary_images/w_per_capita.png" alt="Watts Per Capita Graph" width="800" height="auto">
</a>

# Conclusions:

What has been the general trend of solar generation from 2016 - 2020?

Solar power generation increased consistently from 2016 to 2020, and the amount added each year also rose. The growth rate stayed relatively stable in most regions. Europe saw a gradual increase, while Asia's growth rate dropped sharply after a major surge in 2016, largely driven by China. So although the rate of expansion is stable or slowing in some regions, the absolute amount of solar power added to the grid continues to grow each year.
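Growth rate and yearly additions are different quantities, so it can help to compute both from the same totals. The sketch below uses the China and Japan totals from the cleaned table. The `added` column name and the year-over-year formula are assumptions, since the exact definition behind the project's `growth_rate` column is not shown here.

```python
import pandas as pd

# Totals for 2016-2020 copied from the cleaned table above.
totals = pd.DataFrame({
    "country": ["China"] * 5 + ["Japan"] * 5,
    "year": list(range(2016, 2021)) * 2,
    "total": [78070, 131000, 175018, 204700, 254355,
              42750, 49000, 55500, 63000, 67000],
})

totals = totals.sort_values(["country", "year"]).reset_index(drop=True)
previous = totals.groupby("country")["total"].shift(1)  # prior year's total per country

totals["added"] = totals["total"] - previous            # absolute yearly change
totals["growth_rate"] = totals["total"] / previous - 1  # relative yearly change

print(totals)
```

The first year of each country has no previous total, so both derived columns are `NaN` there.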

What countries are leading the way in solar generation, and how do their trends compare to the rest of the world?

China is by far the leader in solar generation, with the United States the only other outlier. In fact, China accounted for more solar generation than any individual continent in every year. However, China's share of global solar is leveling off as its own growth rate decreases and other countries accelerate their development.

How do different regions compare to each other in solar growth? How correlated is growth within a region?

Solar growth turns out to be strongly correlated within continents. Among the ten fastest-growing countries, strong patterns emerge by continent. The top four rate-increasers from 2017-2018 are all in Central/South America, and all followed the same pattern: a huge rate increase in 2018, a decrease into 2019, and a leveling off into 2020. Conversely, the UAE, Oman, and Saudi Arabia followed a roughly opposite growth pattern to the Latin American countries. This indicates that solar growth rates follow strong regional patterns, and that larger economic and political events are likely shaping them, not simply individual country economics.
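One way to quantify how similarly growth rates move is a correlation matrix of year-over-year growth. The sketch below is only illustrative: it uses the four countries whose totals appear in the tables above rather than the Latin American and Middle Eastern countries discussed here, and with four growth values per country the coefficients are far too noisy to support conclusions on their own.

```python
import pandas as pd

# Totals for 2016-2020 copied from the cleaned table above.
totals = pd.DataFrame(
    {
        "China": [78070, 131000, 175018, 204700, 254355],
        "United States": [40300, 51000, 53184, 60682, 75572],
        "Japan": [42750, 49000, 55500, 63000, 67000],
        "Germany": [41220, 42000, 45930, 49200, 53783],
    },
    index=pd.Index([2016, 2017, 2018, 2019, 2020], name="year"),
)

# Year-over-year growth; the first year has no predecessor and is dropped.
growth = (totals / totals.shift(1) - 1).dropna()

# Pairwise Pearson correlation between the countries' growth-rate series.
growth_corr = growth.corr()
print(growth_corr.round(2))
```

Grouping the columns by region before reading the matrix would show whether countries in the same region tend to have higher pairwise correlations than countries in different regions.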

How do other factors, such as GDP, play into trends of solar growth?

The relationship between GDP and solar generation is fascinating. Countries around the world take very different paths in how they produce solar energy, regardless of their economic standing. For instance, China has a high total GDP but low GDP per capita, and it leads the world in total solar production — yet its per capita solar generation is relatively low. Europe presents the inverse: higher GDP per capita and relatively higher per capita solar production, but far less total output. Let’s look at each graph to better understand these relationships.

GDP Findings:
- Graph 1: This scatterplot shows an inverse relationship between GDP per capita and total solar production, with lower-income regions (like Asia, led by China) producing more solar power. However, this trend is skewed by China’s scale, so it generalizes poorly to continents like Europe or North America. GDP alone is therefore not a good predictor of solar growth globally, but it can paint an interesting picture when comparing individual continents and countries.
- Graph 2: This bar graph confirms the inverse relationship shown in Graph 1: as GDP per capita increases, total solar output tends to decrease. It reinforces that solar generation is not solely dependent on wealth.
- Graph 3: Instead of comparing total production with GDP per capita, this graph uses production per capita. Understandably, this levels the playing field considerably: regions with low GDP per capita, like Africa, now have a higher ratio. This is essentially a measure of efficiency. Even though regions in Africa and Asia have low GDP per capita, their solar output relative to income is higher than that of Europe and North America.
- Graph 4: This ratio relates total solar production to net GDP. These findings are more predictable than those of the other graphs, but just as insightful. Asia has a high GDP and high solar output, ranking it at the top, while North America, with its high GDP but low solar output, is predictably at the bottom. This shows which regions are directing their economic power toward solar most efficiently, and it is my choice for the graph that best illustrates this principle.
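The ratio columns behind these graphs appear to be simple quotients of a GDP figure and a solar metric. The exact formulas are not stated here, so the sketch below is an inference from the Africa row of the Tableau table further down: it reproduces `gdp_w_per_capita_ratio` and checks that the two "total generation" ratios imply the same regional total.

```python
import pandas as pd

# Africa values copied from the Tableau table in this document.
africa = pd.DataFrame({
    "region": ["Africa"],
    "gdp_per_capita": [2409.0],
    "gdp_net": [2781.0],
    "avg_w_per_capita_2019": [23.333333],
    "gdp_per_capita_total_generation_ratio": [0.259814],
    "gdp_net_total_generation_ratio": [0.299935],
})

# Assumed formula: GDP per capita divided by average watts per capita.
africa["gdp_w_per_capita_ratio"] = (
    africa["gdp_per_capita"] / africa["avg_w_per_capita_2019"]
)

# If both generation ratios are GDP / regional total, they should imply the same total.
implied_total_a = africa["gdp_per_capita"] / africa["gdp_per_capita_total_generation_ratio"]
implied_total_b = africa["gdp_net"] / africa["gdp_net_total_generation_ratio"]

print(africa["gdp_w_per_capita_ratio"].iloc[0])
print(implied_total_a.iloc[0], implied_total_b.iloc[0])
```

The first value agrees with the 103.242857 listed in the table, and the two implied totals agree with each other to within rounding, which is consistent with (but does not prove) a shared denominator.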

Watts per capita:
Total solar generation is not positively correlated with generation per capita. Australia and Europe clearly lead the way in solar per capita, while Asia and North America (driven by China and the US) lag, even though they greatly outperform the rest of the world in total output.



### Tableau visualizations:

To work in Tableau with a single Data Frame that holds as many columns as possible while laying the data out cleanly by year, I melted the data from wide format (one column per year and measure) into long format (one row per country per year). The `info()` output and first rows at the end of this section show the columns I added and the shape of the data, which makes changes by year easy to visualize.
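A wide-to-long reshape of this kind can be sketched as follows. This is a minimal illustration under assumptions, not the project's actual code: it uses two rows and two years from the cleaned table, and the intermediate names (`column`, `value`, `measure`) are made up for the example.

```python
import pandas as pd

# Two countries and two years copied from the cleaned (wide) table.
wide = pd.DataFrame({
    "country": ["China", "Japan"],
    "region": ["Asia", "Asia"],
    "_2019_new": [30100.0, 7000.0],
    "_2019_total": [204700, 63000],
    "_2020_new": [49655.0, 4000.0],
    "_2020_total": [254355, 67000],
})

id_cols = ["country", "region"]

# Step 1: stack every year/measure column into (column, value) pairs.
melted = wide.melt(id_vars=id_cols, var_name="column", value_name="value")

# Step 2: split names like "_2019_new" into year=2019 and measure="new".
parts = melted["column"].str.extract(r"^_(?P<year>\d{4})_(?P<measure>new|total)$")
melted = pd.concat([melted[id_cols + ["value"]], parts], axis=1)
melted["year"] = melted["year"].astype(int)

# Step 3: spread the measures back out so each row is one country-year.
long_df = (
    melted.set_index(id_cols + ["year", "measure"])["value"]
    .unstack("measure")
    .reset_index()
)
long_df.columns.name = None

print(long_df)
```

With all 80 countries and five years, the same reshape gives 80 x 5 = 400 rows, which matches the `RangeIndex` of `solar_by_year.csv`. Rows with a missing `region` (such as the European Union) would need extra care in step 3, since missing index values are awkward to unstack.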

Please view the dashboard in Tableau, in order to interact dynamically with the data and gain more insights into global solar growth.

In [6]:
tableau_df = pd.read_csv('csv_files/solar_by_year.csv')

print(tableau_df.info())
display(tableau_df.head(30))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 16 columns):
 #   Column                                 Non-Null Count  Dtype  
---  ------                                 --------------  -----  
 0   country                                400 non-null    object 
 1   region                                 395 non-null    object 
 2   w_per_capita_2019                      345 non-null    float64
 3   share_of_total_consumption             345 non-null    float64
 4   year                                   400 non-null    int64  
 5   total                                  390 non-null    float64
 6   new                                    310 non-null    float64
 7   percentage_share_consumption           395 non-null    float64
 8   avg_w_per_capita_2019                  395 non-null    float64
 9   avg_2019_new                           395 non-null    float64
 10  gdp_per_capita                         395 non-null    float64
 11  gdp_ne

| | country | region | w_per_capita_2019 | share_of_total_consumption | year | total | new | percentage_share_consumption | avg_w_per_capita_2019 | avg_2019_new | gdp_per_capita | gdp_net | gdp_per_capita_total_generation_ratio | gdp_w_per_capita_ratio | gdp_net_total_generation_ratio | growth_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | South Africa | Africa | 44.0 | 2.0 | 2016 | 1450.0 | NaN | 1.65 | 23.333333 | 504.571429 | 2409.0 | 2781.0 | 0.259814 | 103.242857 | 0.299935 | NaN |
| 1 | Egypt | Africa | 17.0 | 1.65 | 2016 | 48.0 | NaN | 1.65 | 23.333333 | 504.571429 | 2409.0 | 2781.0 | 0.259814 | 103.242857 | 0.299935 | NaN |
| 2 | Morocco | Africa | 6.0 | 1.3 | 2016 | 202.0 | NaN | 1.65 | 23.333333 | 504.571429 | 2409.0 | 2781.0 | 0.259814 | 103.242857 | 0.299935 | NaN |
| 3 | Algeria | Africa | 10.0 | 1.65 | 2016 | 219.0 | NaN | 1.65 | 23.333333 | 504.571429 | 2409.0 | 2781.0 | 0.259814 | 103.242857 | 0.299935 | NaN |
| 4 | Senegal | Africa | 8.0 | 1.65 | 2016 | 43.0 | NaN | 1.65 | 23.333333 | 504.571429 | 2409.0 | 2781.0 | 0.259814 | 103.242857 | 0.299935 | NaN |
| 5 | Namibia | Africa | 55.0 | 1.65 | 2016 | 36.0 | NaN | 1.65 | 23.333333 | 504.571429 | 2409.0 | 2781.0 | 0.259814 | 103.242857 | 0.299935 | NaN |
| 6 | Kenya | Africa | 23.333333 | 1.65 | 2016 | 32.0 | NaN | 1.65 | 23.333333 | 504.571429 | 2409.0 | 2781.0 | 0.259814 | 103.242857 | 0.299935 | NaN |
| 7 | South Africa | Africa | 44.0 | 2.0 | 2017 | 1800.0 | 13.0 | 1.65 | 23.333333 | 504.571429 | 2409.0 | 2781.0 | 0.259814 | 103.242857 | 0.299935 | NaN |
| 8 | Egypt | Africa | 17.0 | 1.65 | 2017 | 169.0 | 121.0 | 1.65 | 23.333333 | 504.571429 | 2409.0 | 2781.0 | 0.259814 | 103.242857 | 0.299935 | NaN |
| 9 | Morocco | Africa | 6.0 | 1.3 | 2017 | 204.0 | 2.0 | 1.65 | 23.333333 | 504.571429 | 2409.0 | 2781.0 | 0.259814 | 103.242857 | 0.299935 | NaN |
