# Summaries

Here I will give summaries of: 
- Data collection
- Cleaning
- Explorations
    - Affiliation insights - differences in efficiency and density between Red and Blue localities
    - Other insights - time series analysis, basic explorations of solar statistics in Virginia
    - Demographic explorations - correlations between demographic/sociographic statistics and solar farm density
- The null hypothesis

And also a final conclusion.

## Data Collection:

I collected data from https://en.wikipedia.org/wiki/2020_United_States_presidential_election_in_Virginia to compile a list of the Virginia counties that went red or blue in 2020.

I collected data from https://en.wikipedia.org/wiki/List_of_cities_and_counties_in_Virginia to compile a list of counties in VA and their demographic information.

I also downloaded 3 csvs from the US Department of Health and Human Services website, concerning age, income, and education data on Virginian counties.

I also downloaded a csv from the Virginia DEQ website that contained a list of all renewable energy projects that had been permitted in Virginia.

## Data Cleaning:

In the cleaning_counties.ipynb file, I cleaned the csvs on demographics and election info, and also merged them with further Health and Human Services csvs, all into one large Data Frame containing all the necessary demographic and sociographic info.

I then cleaned the DEQ Data Frame that contained the information on permitted solar farms.


For the main demographics csv, I did the following:
- Dropped extraneous columns
- Made the area readable in sq miles
- Made a population density column

I then cleaned the election csv, doing the following:
- Renamed columns and dropped extraneous ones
- Aligned the naming conventions with the naming conventions of the other csvs concerning counties and independent cities in Virginia.
- Created a political affiliation column
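
A minimal sketch of how the affiliation column could be derived, using rows copied from the raw election table shown later in this summary. In that table the margin is Biden minus Trump (Accomack: 7578 - 9172 = -1594), so a positive margin is treated as blue. The exact logic in the notebook may differ, and the name-alignment step is not covered here.

```python
import pandas as pd

# Rows copied from the raw election table in this summary; row 0 is the
# leftover "#"/"%" sub-header from the scraped table.
raw = pd.DataFrame({
    "County/City": ["County/City", "Accomack", "Albemarle"],
    "Margin.1": ["%", "-9.39%", "33.50%"],
})

# Drop the sub-header row and use the cleaned column names.
election = raw.iloc[1:].reset_index(drop=True)
election = election.rename(
    columns={"County/City": "city/county", "Margin.1": "margin_%"}
)

# "-9.39%" -> -9.39
election["margin_%"] = election["margin_%"].str.rstrip("%").astype(float)

# Positive margin (Biden ahead) -> blue, otherwise red.
election["affiliation_2020"] = election["margin_%"].apply(
    lambda m: "blue" if m > 0 else "red"
)
print(election)
```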

I then merged the other Health and Human Services csvs into the main demographic Data Frame, keeping only the pertinent columns.
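
When merging on locality names, it can help to check which rows failed to match. The sketch below uses tiny made-up frames and assumes a left merge on `city/county`; it is one possible way to run that check, not a record of how the merge was actually written. It is relevant because the cleaned table shown later in this summary has empty population, area, and density fields for Alexandria City.

```python
import pandas as pd

# Tiny made-up frames; column names follow the cleaned table in this summary.
election = pd.DataFrame({
    "city/county": ["Accomack County", "Alexandria City"],
    "affiliation_2020": ["red", "blue"],
})
demographics = pd.DataFrame({
    "city/county": ["Accomack County"],  # Alexandria City deliberately left out
    "population": [33411],
})

# indicator=True adds a "_merge" column recording where each row matched.
merged = election.merge(
    demographics, on="city/county", how="left", indicator=True
)

# Rows present only in the left frame have no demographic match.
unmatched = merged.loc[merged["_merge"] == "left_only", "city/county"].tolist()
print(unmatched)
```

Any name that appears in `unmatched` is a locality whose demographic columns will be empty after the merge, usually because the naming conventions still differ between the two sources.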

Here is the before and after of all the csvs/Data Frames on county demographics and sociographics:

```python
import pandas as pd
from IPython.display import display  # available by default in notebooks; imported so the cell also runs as a script

# Raw inputs, before any cleaning
va_pop_hist_original_df = pd.read_csv('csv_collection/county_population_hist_info.csv')
va_election_2020_original_df = pd.read_csv('csv_collection/virginia_2020_election.csv')
income_df = pd.read_csv('csv_collection/HDPulse_data_export.csv')
education_df = pd.read_csv('csv_collection/HDPulse_data_education.csv')
age_df = pd.read_csv('csv_collection/HDPulse_data_ages.csv')

# Preview the first three rows of each raw table
print('The main demographic and sociographic DFs looked like this:')
display(va_pop_hist_original_df.head(3))
print('The election DF:')
display(va_election_2020_original_df.head(3))
print('The other health and human services csvs on income, education, and age:')
display(income_df.head(3))
display(education_df.head(3))
display(age_df.head(3))
```


The main demographic and sociographic DFs looked like this:


| Unnamed: 0 | County | FIPS code[5] | County seat[6][7] | Est.[6] | Origin | Etymology | Population[8] | Area[6] | Map |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Accomack County | 1 | Accomac | 1663 | Accomac Shire was established in 1634 as one o... | From the Native American word Accawmack, meani... | 33411 | 455 sq mi (1,178 km2) | |
| 1 | Albemarle County | 3 | Charlottesville | 1744 | In 1744, the Virginia General Assembly created... | Willem Anne van Keppel, 2nd Earl of Albemarle,... | 117313 | 723 sq mi (1,873 km2) | |
| 2 | Alleghany County | 5 | Covington | 1822 | Formed from parts of Bath and Botetourt counti... | Alleghany Mountains | 14632 | 446 sq mi (1,155 km2) | |


The election DF:


| Unnamed: 0 | County/City | Joe Biden Democratic | Joe Biden Democratic.1 | Donald Trump Republican | Donald Trump Republican.1 | Various candidates Other parties | Various candidates Other parties.1 | Margin | Margin.1 | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | County/City | # | % | # | % | # | % | # | % | Total |
| 1 | Accomack | 7578 | 44.68% | 9172 | 54.07% | 212 | 1.25% | -1594 | -9.39% | 16962 |
| 2 | Albemarle | 42466 | 65.68% | 20804 | 32.18% | 1387 | 2.14% | 21662 | 33.50% | 64657 |


The other health and human services csvs on income, education, and age:


| Unnamed: 0 | County | FIPS | Value (Dollars) | Rank within US (of 3141 counties) |
|---|---|---|---|---|
| 0 | United States | 0 | 78538 | |
| 1 | Virginia | 51000 | 90974 | 12 of 52 |
| 2 | Norton City | 51720 | 38497 | 3071 |


| Unnamed: 0 | County | Value (Percent) | People (Education: At Least Bachelor's Degree) |
|---|---|---|---|
| 0 | United States | 35.0 | 79954302 |
| 1 | Virginia | 41.5 | 2471630 |
| 2 | Covington City | 9.6 | 367 |


| Unnamed: 0 | County | Value (Percent) |
|---|---|---|
| 0 | Virginia | 35.7 |
| 1 | United States | 35.9 |
| 2 | Radford City | 21.8 |


```python
# And the final merged data frame looked like this:
cleaned_counties_df = pd.read_csv('csv_collection/cleaned_counties.csv')
display(cleaned_counties_df.head(3))
```

| Unnamed: 0 | city/county | biden_votes | biden_% | trump_votes | trump_% | other_party_votes | other_party_% | margin_votes | margin_% | total_votes | population | area | pop_density_sqmi | affiliation_2020 | median_household_income | bachelors_or_over_% | age_over_50_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Accomack County | 7578 | 44.68 | 9172 | 54.07 | 212 | 1.25 | -1594 | -9.39 | 16962 | 33411.0 | 455.0 | 73.430769 | red | 57500.0 | 21.8 | 47.3 |
| 1 | Albemarle County | 42466 | 65.68 | 20804 | 32.18 | 1387 | 2.14 | 21662 | 33.5 | 64657 | 117313.0 | 723.0 | 162.258645 | blue | 102617.0 | 60.6 | 38.3 |
| 2 | Alexandria City | 66240 | 80.28 | 14544 | 17.63 | 1724 | 2.09 | 51696 | 62.65 | 82508 | | | | blue | 113638.0 | 65.8 | 30.2 |


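Now all the counties and independent cities are labeled consistently in one dataframe that combines their election, demographic, and sociographic information. Note that some rows still have gaps: Alexandria City, shown above, has no population, area, or density values.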

I then cleaned the DEQ data frame: I dropped all projects under 20 MW, dropped extraneous columns, and renamed the remaining columns for easier manipulation.
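
The DEQ cleaning steps could look roughly like the sketch below. The rows are made up, and every column name here (`Project Name`, `Locality`, `Capacity (MW)`, `Internal Notes`) is hypothetical, since the real DEQ export's headers are not shown in this summary. Only the 20 MW threshold comes from the text.

```python
import pandas as pd

# Made-up rows with hypothetical column names; the real DEQ export differs.
deq = pd.DataFrame({
    "Project Name": ["Example A", "Example B", "Example C"],
    "Locality": ["County X", "County Y", "County Z"],
    "Capacity (MW)": [150.0, 5.0, 20.0],
    "Internal Notes": ["", "", ""],
})

MIN_MW = 20  # threshold stated in this summary

solar_df = (
    deq[deq["Capacity (MW)"] >= MIN_MW]   # drop projects under 20 MW
    .drop(columns=["Internal Notes"])      # drop a column not needed for analysis
    .rename(columns={
        "Project Name": "project",
        "Locality": "locality",
        "Capacity (MW)": "mw",
    })
    .reset_index(drop=True)
)
print(solar_df)
```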

## Explorations:

### Affiliations:

In the affiliation_insights.ipynb file, I looked at:
- Percent of land dedicated to solar farms.
- MWs per capita.
- Solar farm efficiency.
- Frequency of solar projects.
- Graphing the distribution and spread of project sizes.

I found these distributions and spreads:

<a href="summary_images/affiliation_summary.png" target="_blank">
  <img src="summary_images/affiliation_summary.png" alt="General Trend Graph" width="800" height="auto">
</a>

And I found these statistics:

- 0.280% of the land in Red localities is currently occupied by solar farms
- 0.500% of the land in Blue localities is currently occupied by solar farms
- There are 0.556 MW per 1,000 people in Blue localities
- There are 1.760 MW per 1,000 people in Red localities
- Solar farms in Blue localities generate 41.46 MW per square mile
- Solar farms in Red localities generate 52.04 MW per square mile
- Red localities are 1.68 times more likely to have solar projects in them
- There is a total of 4,289.6 MW in Red-affiliated localities
- There is a total of 1,027.0 MW in Blue-affiliated localities
- There are 59 solar farms in Red localities and 19 in Blue localities
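
A quick arithmetic check on the totals in the list above shows how the Red and Blue figures relate to each other. The average project size is not reported in the list; it is derived here from the reported totals and farm counts.

```python
# Totals copied from the list above.
red_mw, blue_mw = 4289.6, 1027.0
red_farms, blue_farms = 59, 19

farm_ratio = red_farms / blue_farms    # Red farms per Blue farm
mw_ratio = red_mw / blue_mw            # Red MW per Blue MW
red_avg_size = red_mw / red_farms      # average MW per Red project
blue_avg_size = blue_mw / blue_farms   # average MW per Blue project

print(f"Farm count ratio (Red/Blue): {farm_ratio:.2f}")      # 3.11
print(f"MW ratio (Red/Blue): {mw_ratio:.2f}")                # 4.18
print(f"Average Red project size: {red_avg_size:.1f} MW")    # 72.7 MW
print(f"Average Blue project size: {blue_avg_size:.1f} MW")  # 54.1 MW
```

Red localities have about 3.1 times as many farms but about 4.2 times as many MW, so the average Red project is larger (roughly 73 MW versus 54 MW).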

And came to this summary:

There is a clear difference between Red and Blue localities when it comes to producing utility-scale solar power, and it mirrors the trends from the demographic file. Although Blue localities allocate a larger share of their land to solar farms than Red ones do, solar farms in Red localities produce more MW per square mile. This is consistent with large-scale projects being more efficient than small ones: multiple small solar farms require additional buffer zones that add to project size without adding MW, while larger projects can fit more solar panels per acre. For reasons like these, developers continue to prioritize rural counties: not only do they offer more attractive zoning regulations, they are also more profitable per square mile.

### Top ten counties:

I found this about the top ten counties:

<a href="summary_images/top_counties.png" target="_blank">
  <img src="summary_images/top_counties.png" alt="General Trend Graph" width="800" height="auto">
</a>

Summary:

The top ten counties follow some similar trends. Almost all of them have three or four projects, and almost all of them house 200-300 MW. It seems likely that these counties have adopted similar regulations on the size and density of solar projects. There are a couple of outliers: Halifax and Louisa. Halifax has much smaller projects on average than most counties, but a very high project count. Louisa County has a typical number of projects, but the highest average size.

### Time series:


<a href="summary_images/time_series.png" target="_blank">
  <img src="summary_images/time_series.png" alt="General Trend Graph" width="800" height="auto">
</a>

Summary:

The number of projects per year tracks the number of megawatts closely: as the project count rose, the permitted capacity rose roughly in proportion. It took a few years for the amount of solar being added to build to a peak in 2020, the same year as the Virginia Clean Economy Act, which makes sense. There was an influx of projects that year, after which additions tapered off through 2022 before spiking again in 2023. The graphs make one thing clear: activity was weaker before 2020, elevated from 2020 to 2023, and has since begun to decline. Overall, solar development was volatile across the years analyzed.
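
The yearly project counts and megawatt totals behind a chart like this can be produced with a single `groupby`. The sketch below uses made-up rows and hypothetical column names (`permit_year`, `mw`); the actual DEQ columns and values are not shown in this summary.

```python
import pandas as pd

# Made-up rows and hypothetical column names, for illustration only.
projects = pd.DataFrame({
    "permit_year": [2019, 2020, 2020, 2021],
    "mw": [20.0, 80.0, 150.0, 60.0],
})

# One row per year: number of projects and total permitted MW.
per_year = projects.groupby("permit_year").agg(
    n_projects=("mw", "size"),
    total_mw=("mw", "sum"),
)
print(per_year)
```

Plotting `n_projects` and `total_mw` side by side makes it easy to see whether the two series move together.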

### Demographics:

To look for potential correlations between sociographic factors and solar farm density within counties, I analyzed the distribution of solar farms across Virginia counties by income, education level, age, and population density. I then looked at the overall distribution of Virginia counties on the same sociographic measures, regardless of whether they host solar farms. I compared the two distributions using Mann-Whitney U tests and t-tests to see whether they were statistically different. I also compared the averages and visualized the distributions. Here are my findings:


#### Income and solar farm density:

<a href="summary_images/incomes.png" target="_blank">
  <img src="summary_images/incomes.png" alt="General Trend Graph" width="800" height="auto">
</a>

And these findings:

- Mann-Whitney U Statistic: 4572.5
- P-value: 0.7966407201563974
- Average median household income of counties with solar farms is: 74032.92
- Average median household income of a VA county is: 74775.30

The p-value is far above the 0.05 threshold, so the test gives no evidence that the two income distributions differ, and I fail to reject the null hypothesis. The averages are also extremely similar. Because both the distributions and the averages are so close, I found no statistically significant relationship between median household income and solar farm density in Virginia.
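
For reference, the kind of comparison used throughout this section can be sketched with SciPy as below. The two samples are synthetic placeholders generated from a random number generator, not the project's data, and the sample sizes are arbitrary. Whether the notebook used exactly these options (two-sided alternative, Welch's t-test) is an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic placeholder samples (NOT the project's data).
all_counties = rng.normal(loc=75000, scale=20000, size=100)
solar_counties = rng.normal(loc=74000, scale=18000, size=40)

# Mann-Whitney U: rank-based, makes no normality assumption.
u_stat, p_mw = stats.mannwhitneyu(
    solar_counties, all_counties, alternative="two-sided"
)

# Welch's t-test: compares means without assuming equal variances.
t_stat, p_t = stats.ttest_ind(solar_counties, all_counties, equal_var=False)

print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_mw:.3f}")
print(f"Welch t = {t_stat:.2f}, p = {p_t:.3f}")

# Conventional reading: p >= 0.05 means we fail to reject the null hypothesis.
alpha = 0.05
print("Reject null" if p_mw < alpha else "Fail to reject null")
```

A p-value above 0.05 does not prove the two groups are identical; it only means the data do not provide enough evidence of a difference.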

#### Education and solar farm density:

<a href="summary_images/education.png" target="_blank">
  <img src="summary_images/education.png" alt="General Trend Graph" width="800" height="auto">
</a>

And these findings:

- Mann-Whitney U Statistic: 4031.5
- P-value: 0.25098648952341573
- Average percent of adults with a bachelor's degree or higher in counties with solar farms: 25.95
- Average percent of adults with a bachelor's degree or higher in a VA county: 29.07



The p-value is well above the 0.05 threshold, so the test gives no evidence that the two education distributions differ. The averages are also fairly close (about 3 percentage points apart). I therefore found no statistically significant relationship between education and solar farm density in Virginia.

#### Age and solar farm density:

<a href="summary_images/age.png" target="_blank">
  <img src="summary_images/age.png" alt="General Trend Graph" width="800" height="auto">
</a>

And these findings:
- Mann-Whitney U Statistic: 5175.5
- P-value: 0.0676279607784501
- Average percent of the population over age 50 in counties with solar farms: 42.51
- Average percent of the population over age 50 in a VA county: 40.65

The p-value is above the 0.05 threshold, though not by much, so I fail to reject the null hypothesis, and the averages differ by less than 2 percentage points. I found no statistically significant relationship between age and solar farm density in Virginia, although of the three factors in this section, age comes closest to the threshold.

## Null Hypothesis:

<a href="summary_images/pop_density.png" target="_blank">
  <img src="summary_images/pop_density.png" alt="General Trend Graph" width="800" height="auto">
</a>

These were my findings:
- Mann-Whitney U Statistic: 4031.5
- P-value: 0.25098648952341573
- Average population density per sq mi of counties with solar farms is: 162.03
- Average population density per sq mi of a VA county is: 246.54

The p-value is above the 0.05 threshold, so the test does not show a statistically significant difference. However, the averages are far apart, and the graph tells a compelling story. Outliers are part of the explanation: a handful of very dense localities can pull an average up sharply, while the rank-based Mann-Whitney test is largely unaffected by them, so a large gap in means can coexist with a high p-value. Without further testing, I cannot statistically confirm that population density is a key factor in solar farm density. Still, the big picture (means and visuals) suggests that population density plays a role in solar farm density per county.

I cannot reject the null hypothesis. However, judging by the gap in averages and the visuals, population density appears to be the strongest indicator among the sociographic and demographic factors of where a solar farm will be permitted.
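
To see how a few outliers can move an average while leaving the typical value almost unchanged, consider this small example. The densities are made up purely for illustration and are not taken from the Virginia data.

```python
from statistics import mean, median

# Made-up population densities (people per sq mi), for illustration only.
typical = [40, 60, 75, 90, 110]
with_dense_city = typical + [9000]  # add one very dense locality

print(mean(typical), median(typical))                  # 75 75
print(mean(with_dense_city), median(with_dense_city))  # 1562.5 82.5
```

One extreme value multiplies the mean by more than twenty, while the median barely moves. This is why a comparison of means and a rank-based test can tell different stories about the same data.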

## Conclusion

While the Mann-Whitney U test produced a high p-value (~0.25), indicating no statistically significant difference in population density between counties with solar farms and Virginia counties overall, the average population density is noticeably lower in solar-hosting counties (162 vs. 247 people per sq mi). This gap, along with the visual trends in the data, hints at a relationship that is not statistically confirmed (possibly because of outliers or limited sample size) but is still worth noting as a practical pattern.

It appears to be a stronger relationship than any of the other demographic and sociographic factors, and it follows the trends found in the affiliations file. There are over 3 times as many solar farms in red localities (which are more rural) as in blue localities, and about 4 times as many MW permitted in red localities as in blue ones. This is consistent with population density playing a role in solar development.

Although red localities make up much more of the land in Virginia, they still permit solar at a higher rate than blue localities. Blue localities, by contrast, have solar farms taking up a higher percentage of their land than red ones do, yet they generate fewer MW per square mile, which points to solar farms being more cost-effective when developed at scale.

This paints a picture consistent with the current business standard among developers. Solar power is more efficient, profitable, and available in counties with rural land, and progressive/blue policies do not make up for the lack of available land when it comes to permitting projects and producing power.

