# Block 2: Some Statistical Concepts

## Objectives 

Welcome to Block 2 of the DIFUSE Wind Energy Module!

By the end of this block, you should be able to: 
- Describe how wind speed is measured
- Understand how wind speed impacts power output
- Outline the benefits and detriments of high wind speeds
- Analyze wind speed data from different regions using tools such as frequency tables
- Work with wind data to draw meaningful conclusions
- Estimate what the best wind speed is in a given region


<font color = red> Before attempting *Block 2*, </font>
- Ensure you clear all outputs by going to *Edit --> Clear all outputs*
- <font color = red> **RUN** </font> the code block below. 

*To run a code block, move the cursor to the left edge of the
block, and click the play button that appears at the top left.*

![picture](https://drive.google.com/uc?id=150ersKJY7dH-pDBkkHhCXtrmcljtiB9x)

In [None]:
#@title
#PLEASE RUN BEFORE ATTEMPTING BLOCK
import pandas as pd
import csv
import tabulate as tab
import plotly.express as px

## **Wind Speeds and Power Output**

Wind speed is erratic and far from deterministic. Because it behaves as an **extremely** random parameter, a large amount of data is needed before drawing meaningful conclusions about where to build a wind power plant. In fact, the data behind the frequency tables you will encounter below spans a period of $6$ years! Even then, more data is always better when judging whether a location is ideal. 

\\

---


\\
You have already seen how wind speed determines power output in Block 1; here we will talk about how feasible some power outputs are. All wind turbines have a minimum (cut-in) wind speed needed just to turn and generate some electricity, a range of higher wind speeds over which they generate at full capacity, and finally a maximum (cut-out) speed at which the turbine stops in order to avoid damage. 
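The three operating regions described above can be sketched as a piecewise function. This is only an illustration: the cut-in, rated, and cut-out speeds and the rated power are hypothetical values chosen for the example, not the specifications of any real turbine, and the cubic ramp between the cut-in and rated speeds is a simplifying assumption.

```python
def turbine_power_kw(wind_speed, cut_in=3.0, rated_speed=12.0,
                     cut_out=25.0, rated_power_kw=10.0):
    """Return an illustrative power output (kW) for a wind speed in m/s.

    All default values are hypothetical and chosen only for this sketch.
    """
    if wind_speed < cut_in or wind_speed >= cut_out:
        # Too slow to turn the blades, or fast enough that the turbine shuts down
        return 0.0
    if wind_speed >= rated_speed:
        # Inside the full-capacity range
        return rated_power_kw
    # Between cut-in and rated speed: assume power grows with the cube of wind speed
    return rated_power_kw * (wind_speed / rated_speed) ** 3


for speed in [2, 4, 8, 12, 20, 30]:
    print(f"{speed:>2} m/s -> {turbine_power_kw(speed):.2f} kW")
```

Changing the hypothetical thresholds shows how the same wind can produce nothing, partial power, or full power depending on the turbine's design.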

\\

---


\\
So, a wind turbine cannot simply withstand enormous wind speeds and generate ever more electricity. The best wind speeds depend on the design of the turbine and on the height of its tower. Luckily, information is already available on the best wind speeds for small wind turbines, and even more on the mean wind speeds of different regions, all of which you will learn about below!

## **Frequency Tables**
Here we have the frequency tables of three different locations: 

*   Galveston, Texas
*   Penn State Greater Allegheny
*   Dartmouth College

A frequency table counts the frequency, or number of appearances, of each value in a data set. In this case, it counts the number of hours, over a period of 6 years, during which a certain wind speed occurred. For example, from 2015 to 2021 in Galveston, TX, the number of hours spent at a wind speed of $ 2 \ m/s $ was $17,863$. Frequency tables are important because they help us find the <font color = blue> **mean** </font> wind speed, which we will focus on later. 
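As a small illustration of the idea, the sketch below builds a frequency table from a handful of hourly readings. The readings are made up for this example and the column name `Wind speed (m/s)` is a hypothetical label; only the rounding-then-counting approach mirrors what the tables in this block do.

```python
import pandas as pd

# Hypothetical hourly wind speeds in m/s (invented for illustration)
hourly_speeds = pd.Series([1.2, 2.4, 2.1, 3.6, 2.0, 4.4, 3.9, 1.8])

# Round each reading to the nearest whole number, then count
# how many hours fall on each rounded wind speed
frequency_table = (
    hourly_speeds.round()
    .astype(int)
    .value_counts()
    .sort_index()
    .rename_axis("Wind speed (m/s)")
    .reset_index(name="No. of hours")
)

print(frequency_table)
```

Each row of the result pairs one wind speed with the number of hours it was observed, which is the same layout used for the three locations.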

As you examine the frequency tables, consider the following questions:


*   Are there any similarities or differences between these regions?
*   What wind speeds seem to be the most common?
*   Which location do you think is the most ideal based on the frequency tables? 
*   Is there any major variation between locations?

### Galveston Table

<font size=3> This is the Frequency Table for Galveston, TX. 

*   The first column, 'WS50M', is the wind speed in meters per second at a height of 50 meters.
*   The second column is the number of hours, over the span of 6 years, during which that wind speed was recorded.

For example, in the first row, 'WS50M' is $0 \ m/s$, and that was the wind speed for $2,206$ hours out of the $6$ years over which the data was measured.



In [None]:
#@title
## DO NOT EDIT 
#Frequency Table for Galveston
import pandas as pd

#Function to count frequency of wind speed values (to nearest whole number)
def freq_counter(speed_df, column_label):
  """Parameters:
  speed_df = (dataframe) WindSpeed Dataframe: Must contain a labelled column with wind_speed data 
  column_label = (string) label of the column with the wind speed data to be counted
  --------------"""
  #empty frequency dictionary
  freq_dict = {}

  #rounds speed value and counts it into dictionary
  for value in speed_df.loc[:,column_label]:
    if round(value) in freq_dict:
      freq_dict[round(value)] += 1
    else:
      freq_dict[round(value)] = 1
  
  #sorts frequency dictionary (orders wind speeds numerically)
  #creates and returns new dataframe of frequency table
  sorted_dict = sorted(freq_dict.items())
  new_df = pd.DataFrame(sorted_dict, columns = [column_label, "No. of hours"])
  return new_df

#Test
wind_data_full = pd.read_csv('https://raw.githubusercontent.com/difuse-dartmouth/wind-speed-power-analysis/main/completed_module/data/WindSpeedGalveston.csv')
test_df = freq_counter(wind_data_full,"WS50M")
test_df

### Penn State GA Table

<font size=3> This is the Frequency Table for Penn State Greater Allegheny. 

*   The first column, 'WS50M', is the wind speed in meters per second at a height of 50 meters.
*   The second column is the number of hours, over the span of 6 years, during which that wind speed was recorded.

For example, in the first row, 'WS50M' is $0 \ m/s$, and that was the wind speed for $316$ hours out of the $6$ years over which the data was measured.

In [None]:
#@title
## DO NOT EDIT
# Frequency Table for Penn State Greater Allegheny
import pandas as pd

#Function to count frequency of wind speed values (to nearest whole number)
def freq_counter(speed_df, column_label):
  """Parameters:
  speed_df = (dataframe) WindSpeed Dataframe: Must contain a labelled column with wind_speed data 
  column_label = (string) label of the column with the wind speed data to be counted
  --------------"""
  #empty frequency dictionary
  freq_dict = {}

  #rounds speed value and counts it into dictionary
  for value in speed_df.loc[:,column_label]:
    if round(value) in freq_dict:
      freq_dict[round(value)] += 1
    else:
      freq_dict[round(value)] = 1
  
  #sorts frequency dictionary (orders wind speeds numerically)
  #creates and returns new dataframe of frequency table
  sorted_dict = sorted(freq_dict.items())
  new_df = pd.DataFrame(sorted_dict, columns = [column_label, "No. of hours"])
  return new_df

#Test
wind_data_full = pd.read_csv('https://raw.githubusercontent.com/difuse-dartmouth/wind-speed-power-analysis/main/completed_module/data/PennStateGreaterAlleghany.csv')
test_df = freq_counter(wind_data_full,"WS50M")
test_df

### Dartmouth College Table

<font size=3> This is the Frequency Table for Dartmouth College. 

*   The first column, 'WS50M', is the wind speed in meters per second at a height of 50 meters.
*   The second column is the number of hours, over the span of 6 years, during which that wind speed was recorded.

For example, in the first row, 'WS50M' is $0 \ m/s$, and that was the wind speed for $221$ hours out of the $6$ years over which the data was measured.

In [None]:
#@title
## DO NOT EDIT 
# Frequency Table for Dartmouth College
import pandas as pd

#Function to count frequency of wind speed values (to nearest whole number)
def freq_counter(speed_df, column_label):
  """Parameters:
  speed_df = (dataframe) WindSpeed Dataframe: Must contain a labelled column with wind_speed data 
  column_label = (string) label of the column with the wind speed data to be counted
  --------------"""
  #empty frequency dictionary
  freq_dict = {}

  #rounds speed value and counts it into dictionary
  for value in speed_df.loc[:,column_label]:
    if round(value) in freq_dict:
      freq_dict[round(value)] += 1
    else:
      freq_dict[round(value)] = 1
  
  #sorts frequency dictionary (orders wind speeds numerically)
  #creates and returns new dataframe of frequency table
  sorted_dict = sorted(freq_dict.items())
  new_df = pd.DataFrame(sorted_dict, columns = [column_label, "No. of hours"])
  return new_df

#Test
wind_data_full = pd.read_csv('https://raw.githubusercontent.com/difuse-dartmouth/wind-speed-power-analysis/main/completed_module/data/WindSpeedDartmouth.csv')
test_df = freq_counter(wind_data_full,"WS50M")
test_df

### Concept Check 1

We have now seen 3 frequency tables. You were asked to examine them carefully and note their similarities and differences, and these tables certainly have plenty of both. 

For the most part, wind speeds between $3 \ m/s$ and $5 \ m/s$ are the most **common**. This will be important later in the module. 

Meanwhile, think about how that impacts energy output.

Is it a good or bad thing? How would this impact a wind turbine? 

## **Mean Wind Speed**

Finding the <font color = blue> **mean** </font> wind speed is an important part of selecting a location for a wind turbine or wind farm. 

The <font color = blue> **mean** </font> wind speed equation is:

<font size = 5> $\textbf{M} = \frac{1}{n} \sum_{i=1}^{w}m_{i} \mu _{i}$ </font>

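where $\mu _{i}$ is the $i$-th wind speed in the table, $m_{i}$ is the number of times that value repeats (in this case, the number of hours recorded at that speed), $w$ is the number of distinct wind speeds, and $n$ is the total number of hours.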

With the frequency tables, however, we can start with a simpler calculation. Divide the number of hours recorded at a given wind speed by the total number of hours in the table. The result is the fraction of time that this wind speed occurs at a given location, which the exercises below call the average amount of time spent at that speed; multiply it by 100 to express it as a percentage. 
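As a worked sketch, the snippet below uses a small frequency table in the same two-column layout as the tables above. The hour counts are invented for illustration and do not come from any of the three locations. It computes the fraction of time spent at each wind speed, and also the frequency-weighted mean wind speed (each speed multiplied by its hours, summed, then divided by the total hours).

```python
import pandas as pd

# Hypothetical frequency table (hour counts invented for illustration)
freq_table = pd.DataFrame({
    "WS50M": [0, 1, 2, 3, 4],
    "No. of hours": [10, 20, 40, 20, 10],
})

total_hours = freq_table["No. of hours"].sum()

# Fraction of time at each wind speed: hours at that speed / total hours
freq_table["Fraction of time"] = freq_table["No. of hours"] / total_hours

# Frequency-weighted mean: sum(hours * speed) / total hours
mean_speed = (freq_table["WS50M"] * freq_table["No. of hours"]).sum() / total_hours

# Look up the fraction of time for one particular wind speed (2 m/s)
fraction_at_2 = freq_table.loc[freq_table["WS50M"] == 2, "Fraction of time"].iloc[0]

print(freq_table)
print("Fraction of time at 2 m/s:", round(fraction_at_2, 3))
print("Mean wind speed (m/s):", mean_speed)
```

The same two steps apply to the real tables: only the hour counts and the total number of hours change.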


**Now that we know all this, you will do a short exercise!**


*   Find the average amount of time spent at $2 \ m/s$ for Galveston
*   Find the average amount of time spent at $9 \ m/s$ for Penn State GA
*   Find the average amount of time spent at $7 \ m/s$ for Dartmouth College

*For all of these, round your answers to the nearest thousandth*

Work the problems out and then <font color= red> **RUN** </font> the code below to submit and verify your answers!

In [None]:
#@title
## DO NOT EDIT
#Code block that verifies the fraction-of-time answers

galveston_average = 17863/54768
roundGalveston = round(galveston_average, 3) #fraction of time at 2 m/s for Galveston
penn_average = 849/52632
roundPenn = round(penn_average, 3) #fraction of time at 9 m/s for Penn State GA
dartmouth_average = 3553/52632
roundDartmouth = round(dartmouth_average, 3) #fraction of time at 7 m/s for Dartmouth

input_A = float(input("1. What is the average for Galveston? ")) #code that determines if input is correct
if input_A == roundGalveston :
  print()
  print('\tYou were correct!')
  print()
else :
  print()
  print("\tSorry, your answer seems to be wrong, try again!")
  print()
  print("\tRemember to round to three decimal places;\n")
  print("\tTo run this again simply click the run button at the top left corner!")
  print()

input_B = float(input("2. What is the average for Penn State GA? " )) 
if input_B == roundPenn :
  print()
  print("\tYou were correct!")
  print()
else :
  print()
  print("\tSorry, your answer seems to be wrong, try again!")
  print()
  print("\tRemember to round to three decimal places;\n")
  print("\tTo run this again simply click the run button at the top left corner!")
  print()

input_C = float(input("3. What is the average for Dartmouth College? "))  
if input_C == roundDartmouth : 
  print()
  print("\tYou were correct!")
  print()
else : 
  print()
  print("\tSorry, your answer seems to be wrong, try again!")
  print()
  print("\tRemember to round to three decimal places;\n")
  print("\tTo run this again simply click the run button at the top left corner!")
  print()

if input_A == roundGalveston and input_B == roundPenn and input_C == roundDartmouth : #code that gives farewell message if all answers were correct
  print()
  print("Congrats! You now know how to find the mean wind speed from a frequency table!\nYou can now move to the next section!")
else :
  print()
  print("Try the code block again to get all the answers correct!")

## **Real Life Application**

You have now found the average amount of time spent at certain wind speeds in different regions, but which wind speed is best, and how is it used to determine an optimal location? 

![](https://drive.google.com/uc?export=view&id=1xvuH0JOmHi4221y9xd9sQzbPV_Z4_kab)

(Image: Thomas Reaubourg/Unsplash)

<font size = 3>  According to the U.S. Energy Information Administration, the optimal wind speed for a small wind turbine (the type of turbine we have the data for in this module block) is **$4 \ m/s$**. This might seem odd because, as you learned in Block 1, the higher the wind speed, the greater the power output! However, a machine **cannot** operate at its maximum capacity forever. So, while a wind speed of $20 \ m/s$ would generate a lot of power, it is not feasible in the long run. In this case, a wind speed of $4 \ m/s$ offers the best balance between power output and long-term feasibility. </font>

<font size = 3> Based on this, the best location would be the one where the wind blows at $4 \ m/s$ for the greatest share of the time. In the previous exercise, you learned how to find the average amount of time spent at different wind speeds; this time you will find it for the **optimal** speed. Therefore:</font>

*   Find which of the three regions spends the greatest average amount of time at $4 \ m/s$

*Afterwards, <font color = red> submit and verify your answers with the code block below!</font>*

In [None]:
#@title
# DO NOT EDIT
# Code that determines which location is best for a small wind turbine

galveston_optimal = 5056/54768
roundG = round(galveston_optimal, 3) #fraction of time at 4 m/s for Galveston
penn_optimal = 12015/52632
roundP = round(penn_optimal, 3) #fraction of time at 4 m/s for Penn State GA
dartmouth_optimal = 12956/52632
roundD = round(dartmouth_optimal, 3) #fraction of time at 4 m/s for Dartmouth

input_D = float(input("What is the mean for Galveston? ")) #code that verifies all answers
if input_D == roundG :
  print()
  print("\t That is correct!")
  print()
else : 
  print()
  print("\tSorry, your answer seems to be wrong, try again!")
  print()
  print("\tRemember to round to three decimal places;\n")
  print("\tTo run this again simply click the run button at the top left corner!")
  print()
  print()

input_E = float(input("What is the mean for Penn State GA? ")) 
if input_E == roundP :
  print()
  print("\t That is correct!")
  print()
else : 
  print()
  print("\tSorry, your answer seems to be wrong, try again!")
  print()
  print("\tRemember to round to three decimal places;\n")
  print("\tTo run this again simply click the run button at the top left corner!")
  print()
  print()

input_F = float(input("What is the mean for Dartmouth College? ")) 
if input_F == roundD :
  print()
  print("\t That is correct!")
  print()
else : 
  print()
  print("\tSorry, your answer seems to be wrong, try again!")
  print()
  print("\tRemember to round to three decimal places;\n")
  print("\tTo run this again simply click the run button at the top left corner!")
  print()
  print()

if input_D == roundG and input_E == roundP and input_F == roundD : #code that sends farewell message if all answers are correct
  print()
  print("Congrats! You got all answers correct and now know that Dartmouth College was the most optimal location!")
else : 
  print()
  print("You got an answer or two wrong, try again!")

### Concept Check 2

<font size = 3> We see here that Galveston was far from spending the most time at $4 \ m/s$; **however**, Penn State GA and Dartmouth College were extremely close to each other! These numbers mean that over a period of 6 years (2015 - 2021), the wind speed at both of these locations was $4 \ m/s$ roughly a quarter of the time, so both are well suited to small wind turbines. This might be easier to see in a graph, so below you'll see some histograms that you can compare with one another. </font>

![](https://drive.google.com/uc?export=view&id=1jK3LJv3sfdBNs3GLa4u9NsNpx8kmTfuF)

![](https://drive.google.com/uc?export=view&id=1S-prsm2llP-F1QvXGapY8VFRkm7uiGfP)

![](https://drive.google.com/uc?export=view&id=1F2bQlXzNDmfUp4vBcvgkUmTwvvWHAlE3)

<font size=3>It's easier to see now that the histograms, and therefore the wind speeds, of Penn State GA and Dartmouth College are extremely similar, while the Galveston, TX data is different and has most of its occurring wind speeds in the lower ranges.</font>

---

<font size = 3> It is easy to understand, then, why it is important to use the data from frequency tables when determining locations for wind turbines. </font>

---

Still, there are other things to consider. This data covers wind speeds collected over 6 years. Is that enough time to draw a conclusion? If not, how far back should one look? These are all questions you must ask yourself when determining the best location for a wind turbine or wind power plant.
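One way to start exploring whether a record is long enough is to compute the same statistic separately for each year and see how much it varies. The sketch below does this for the fraction of hours at $4 \ m/s$. It runs on synthetic data only: the wind speeds are drawn from a Weibull distribution with arbitrary parameters, which is an assumption made for this example and not a model of any location in this block.

```python
import random
from collections import defaultdict

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Synthetic hourly wind speeds for six hypothetical years (NOT real measurements).
# weibullvariate(scale, shape): both parameter values are arbitrary choices.
hours_per_year = 8760
records = [
    (year, random.weibullvariate(5.0, 2.0))
    for year in range(2015, 2021)
    for _ in range(hours_per_year)
]

# Count, for each year, the hours whose rounded wind speed is 4 m/s
hours_at_4 = defaultdict(int)
for year, speed in records:
    if round(speed) == 4:
        hours_at_4[year] += 1

# Fraction of each year's hours spent at 4 m/s
yearly_fraction = {
    year: hours_at_4[year] / hours_per_year for year in sorted(hours_at_4)
}
for year, fraction in yearly_fraction.items():
    print(year, round(fraction, 3))

# A small spread suggests the statistic is stable from year to year
spread = max(yearly_fraction.values()) - min(yearly_fraction.values())
print("Spread between highest and lowest year:", round(spread, 3))
```

With real measurements, `records` would hold the actual (year, wind speed) pairs. Real wind varies with weather patterns from year to year, so the spread may be larger than this synthetic example suggests.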


# Congratulations!
You have completed Block 2 of the module.

Ensure that you visit both concept checks and work through all the questions before continuing to Block 3. 

Feel free to direct any questions to your professor.


# Credits
DIFUSE at Dartmouth
