# NumPy Statistical Properties Exploration: Climate Change

You are a data scientist interested in climate change in your area.  Every month for an entire year you obtain the average land temperature of your city measured in degrees Celsius.  You record this information in a notebook. 
<br><br>
Here is your data:
![title](img/temps.png)
<br>
Let's look at how you can use NumPy statistical functions to analyze your dataset.

**Question 1)** In the next cell, import the NumPy package. Then create a one-dimensional NumPy array of your recorded temperatures.

In [4]:
import numpy as np
temperature = np.array([5.313, 3.723, 7.496, 12.341, 13.477, 16.487, 17.697, 13.901, 9.895, 6.828, 4.849])
temperature

array([ 5.313,  3.723,  7.496, 12.341, 13.477, 16.487, 17.697, 13.901,
        9.895,  6.828,  4.849])

**Question 2)** Find the AVERAGE temperature.

In [3]:
np.mean(temperature)

10.182454545454545

**Question 3)** SORT your data to see if there are any extreme temperatures that could throw off your analysis.  Extreme temperatures would be those data values that are much higher or much lower than all other data values.

In [7]:
np.sort(temperature)

array([ 3.723,  4.849,  5.313,  6.828,  7.496,  9.895, 12.341, 13.477,
       13.901, 16.487, 17.697])
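
Reading the sorted array by eye works for a dataset this small. As a rough cross-check, the sketch below applies the common 1.5 × IQR rule of thumb to flag unusually high or low values. This rule is an assumption brought in for illustration, not part of the exercise, and the names `lower_fence`, `upper_fence`, and `outliers` are hypothetical.

```python
import numpy as np

temperature = np.array([5.313, 3.723, 7.496, 12.341, 13.477, 16.487,
                        17.697, 13.901, 9.895, 6.828, 4.849])

# Interquartile range: the distance between the 25th and 75th percentiles.
q1, q3 = np.percentile(temperature, [25, 75])
iqr = q3 - q1

# Assumed convention: anything more than 1.5 * IQR outside the quartiles
# is treated as a candidate extreme value.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = temperature[(temperature < lower_fence) | (temperature > upper_fence)]
print("Fences:", lower_fence, upper_fence)
print("Flagged values:", outliers)
```

Under this particular rule, none of the recorded temperatures is flagged. A different threshold could give a different answer.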

**Question 4)** Do you think that there were any temperatures that could have impacted your average temperature calculation above?  If so, the MEDIAN might be a better representation of your dataset.  In the next cell, calculate the MEDIAN temperature.

In [8]:
np.median(temperature)

9.895
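
To see why the median can be the safer summary, here is a minimal sketch that replaces the hottest reading with a made-up extreme value and compares how the mean and the median respond. The value `45.0` and the name `with_extreme` are hypothetical and are not part of the recorded data.

```python
import numpy as np

temperature = np.array([5.313, 3.723, 7.496, 12.341, 13.477, 16.487,
                        17.697, 13.901, 9.895, 6.828, 4.849])

# Hypothetical scenario: the hottest month was mis-recorded as 45.0 degrees.
with_extreme = temperature.copy()
with_extreme[np.argmax(with_extreme)] = 45.0

print("Mean   (original, with extreme):", np.mean(temperature), np.mean(with_extreme))
print("Median (original, with extreme):", np.median(temperature), np.median(with_extreme))
```

The mean moves upward because every value contributes to the sum, while the median stays at the same middle value because only the ordering matters.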

**Question 5)** The median tells you the value below which 50% of your temperature data fall.  Below what value do 75% of your temperature data fall?  Calculate the 75th PERCENTILE.

In [9]:
np.percentile(temperature, 75)

13.689
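
The result 13.689 is not one of the recorded temperatures. By default, `np.percentile` interpolates linearly between the two nearest sorted values. The sketch below reproduces that calculation by hand; the names `rank`, `lower`, `frac`, and `manual_p75` are hypothetical, and the code assumes the default interpolation method.

```python
import numpy as np

temperature = np.array([5.313, 3.723, 7.496, 12.341, 13.477, 16.487,
                        17.697, 13.901, 9.895, 6.828, 4.849])

sorted_temps = np.sort(temperature)

# Fractional position of the 75th percentile in the sorted array (0-based).
rank = 0.75 * (len(sorted_temps) - 1)   # 7.5 for 11 values
lower = int(np.floor(rank))             # index of the value just below
frac = rank - lower                     # how far to move toward the next value

# Linear interpolation between the two neighboring sorted values.
manual_p75 = sorted_temps[lower] + frac * (sorted_temps[lower + 1] - sorted_temps[lower])

print(manual_p75)
print(np.percentile(temperature, 75))
```

Here the position 7.5 falls halfway between 13.477 and 13.901, so the 75th percentile is their midpoint.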

**Question 6)** Now that you have a good idea about specific temperature values, let's look at the spread of your data.  Calculate the STANDARD DEVIATION to see how close to or far from the mean your data are.  In the cell below your calculation, describe what the STANDARD DEVIATION says about your temperature data.

In [10]:
np.std(temperature)

4.651397900604943
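
For reference, the sketch below computes the same quantity step by step. By default, `np.std` returns the population standard deviation (it divides by the number of values, `ddof=0`). Passing `ddof=1` gives the sample standard deviation, which divides by one less than the number of values. Which one is appropriate depends on whether you treat these readings as the whole population or as a sample; the exercise does not say, so both are shown. The variable names are hypothetical.

```python
import numpy as np

temperature = np.array([5.313, 3.723, 7.496, 12.341, 13.477, 16.487,
                        17.697, 13.901, 9.895, 6.828, 4.849])

# Population standard deviation by hand:
# square root of the average squared distance from the mean.
mean_temp = np.mean(temperature)
population_std = np.sqrt(np.mean((temperature - mean_temp) ** 2))

# Sample standard deviation: divides by (n - 1) instead of n.
sample_std = np.std(temperature, ddof=1)

print("Population std (matches np.std default):", population_std)
print("Sample std (ddof=1):", sample_std)
```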

Great work!  NumPy made it easy to calculate several important statistical properties of your dataset.  When extreme values exist in the data, they pull the MEAN toward them.  In such cases, the MEDIAN temperature may be more useful than the MEAN temperature because it is far less affected by extreme values.  
<br><br>
The STANDARD DEVIATION is very useful for understanding the spread of your data.  A large STANDARD DEVIATION indicates that temperature values tend to lie far from the average and are therefore much higher or lower than the average temperature.  A small STANDARD DEVIATION indicates that most temperature values are close to (or not that different from) the average temperature.  
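
One rough way to make "close to the average" concrete is to count how many readings lie within one standard deviation of the mean. This is only an illustrative sketch: the one-standard-deviation band is an assumed yardstick, and the names `within_one_std` and `count_within` are hypothetical.

```python
import numpy as np

temperature = np.array([5.313, 3.723, 7.496, 12.341, 13.477, 16.487,
                        17.697, 13.901, 9.895, 6.828, 4.849])

mean_temp = np.mean(temperature)
std_temp = np.std(temperature)

# Boolean mask: True where a reading is no more than one standard
# deviation away from the mean, in either direction.
within_one_std = np.abs(temperature - mean_temp) <= std_temp
count_within = int(np.sum(within_one_std))

print("Band:", mean_temp - std_temp, "to", mean_temp + std_temp)
print("Readings inside the band:", count_within, "of", len(temperature))
```

For this dataset, 6 of the 11 readings fall inside the band, so a sizeable share of months sit more than one standard deviation from the average temperature.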