
# Using numpy to look for a correlation between time data and sea level rise
---

### Data Source
Global Average Absolute Sea Level Change, 1880-2014 from the US Environmental Protection Agency using data from CSIRO, 2015; NOAA, 2015.
https://datahub.io/core/sea-level-rise

The data describes annual sea levels from 1880 to 2013. Measurements are adjusted using two standards: the Commonwealth Scientific and Industrial Research Organisation (CSIRO) and the National Oceanic and Atmospheric Administration (NOAA).

Raw Data file:  https://raw.githubusercontent.com/freeCodeCamp/boilerplate-sea-level-predictor/master/epa-sea-level.csv

For this exercise:
*  import the pandas library
*  import the numpy library
*  read the csv dataset containing data on sea-levels from the year 1880 to 2013 into a dataframe (df)
*  use df.head() and df.info() to inspect the data and the column data types



In [12]:
import pandas as pd
import numpy as np

df = pd.read_csv('https://raw.githubusercontent.com/freeCodeCamp/boilerplate-sea-level-predictor/master/epa-sea-level.csv')

print(df.head())
print(df.tail())
df.info()

   Year  CSIRO Adjusted Sea Level  Lower Error Bound  Upper Error Bound  \
0  1880                  0.000000          -0.952756           0.952756   
1  1881                  0.220472          -0.732283           1.173228   
2  1882                 -0.440945          -1.346457           0.464567   
3  1883                 -0.232283          -1.129921           0.665354   
4  1884                  0.590551          -0.283465           1.464567   

   NOAA Adjusted Sea Level  
0                      NaN  
1                      NaN  
2                      NaN  
3                      NaN  
4                      NaN  
     Year  CSIRO Adjusted Sea Level  Lower Error Bound  Upper Error Bound  \
129  2009                  8.586614           8.311024           8.862205   
130  2010                  8.901575           8.618110           9.185039   
131  2011                  8.964567           8.661417           9.267717   
132  2012                  9.326772           8.992126           9.

### Then
---
1.  Calculate some statistics on the level array (the `CSIRO Adjusted Sea Level` column converted to a numpy array), e.g.:
*  mean
*  standard deviation
*  total 

2.  Use the fact that the arrays are aligned (e.g. the first number in the level array is linked to the first year in the year array) to display:

*  the year with the biggest rise in level
*  the year with the lowest rise in level

*(**Hint**: to do this you can use a new numpy function, `np.where()`)*
```
np.where(array == value_to_find)
```
*There is some reference material [here](https://thispointer.com/find-the-index-of-a-value-in-numpy-array/)*

**Note**: `np.where(...)` returns a tuple of index arrays holding every index where the value was found. You can print them all, or print only the first match using `[0][0]` (there will probably be only one match in this case). *With the correct code, the year with the biggest rise should be 2012.*
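
As a small illustration of the note above, the following sketch shows what `np.where` returns and how `[0][0]` extracts the first matching index. The `years` and `levels` arrays are made-up values for illustration, not the sea-level data, and `np.argmax` is shown only as one possible alternative.

```python
import numpy as np

# Hypothetical toy data, not taken from the sea-level dataset
years = np.array([2000, 2001, 2002, 2003])
levels = np.array([1.5, 3.2, 0.4, 2.8])

# With a single condition, np.where returns a tuple with one index array per dimension
matches = np.where(levels == levels.max())
print(matches)

# [0] selects the index array for the first dimension, the second [0] its first match
first_index = matches[0][0]
print(years[first_index])  # 2001

# np.argmax returns the index of the first maximum directly
print(np.argmax(levels) == first_index)  # True
```

Because the two arrays are aligned, the index found in `levels` can be used directly to look up the matching entry in `years`.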


3.  Calculate the Pearson product-moment correlation coefficient between year and the rise in sea level.  (*Expected output:  0.98 when rounded to 2 decimal places*)
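
To make step 3 more concrete, here is a minimal sketch that computes the Pearson coefficient from its definition (the sum of products of deviations divided by the square root of the product of the summed squared deviations) and compares it with `np.corrcoef`. The arrays `x` and `y` are made-up values for illustration, not the sea-level data.

```python
import numpy as np

# Hypothetical toy data with a nearly linear relationship
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Deviations of each value from its array's mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson r from the definition
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r for (x, y)
r_numpy = np.corrcoef(x, y)[0, 1]

print(round(r_manual, 4), round(r_numpy, 4))
```

The diagonal entries of the matrix returned by `np.corrcoef` are each array's correlation with itself (always 1), which is why the coefficient of interest is read from position `[0, 1]`.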

In [36]:
# Convert the CSIRO column to a float array and report summary statistics
sea_level = np.array(df['CSIRO Adjusted Sea Level'], np.float64)
print("The mean change in sea level is", round(sea_level.mean(),2), "inches to 2dp.")
print("The standard deviation of the change in sea levels is", round(sea_level.std(),2), "inches to 2dp.")
print("The total change in sea level between 1880 and 2013 is", round(sea_level.sum(),2), "inches to 2dp.")

# Years, aligned index-for-index with sea_level
year = np.array(df['Year'], np.int64)

# np.where returns a tuple of index arrays; [0][0] takes the first matching index as a scalar
max_index = np.where(sea_level == sea_level.max())[0][0]
print("The year with the biggest rise in sea level was", int(year[max_index]), "with a change of", round(float(sea_level[max_index]),2), "inches to 2 dp.")
min_index = np.where(sea_level == sea_level.min())[0][0]
print("The year with the biggest reduction in sea level was", int(year[min_index]), "with a change of", round(float(sea_level[min_index]),2), "inches to 2 dp.")

# The off-diagonal entry of the 2x2 correlation matrix is the year/sea-level coefficient
coef_matrix = np.corrcoef(year, sea_level)
coef = coef_matrix[0][1]
print("The correlation coefficient is",round(coef,2),"to 2 dp. Therefore there is a strong positive correlation between year and sea level, i.e. as the year increases so does the sea level.")

The mean change in sea level is 3.65 inches to 2dp.
The standard deviation of the change in sea levels is 2.48 inches to 2dp.
The total change in sea level between 1880 and 2013 is 489.15 inches to 2dp.
The year with the biggest rise in sea level was 2012 with a change of 9.33 inches to 2 dp.
The year with the biggest reduction in sea level was 1882 with a change of -0.44 inches to 2 dp.
The correlation coefficient is 0.98 to 2 dp. Therefore there is a strong positive correlation between year and sea level, i.e. as the year increases so does the sea level.
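
One detail behind the standard deviation figure above: numpy's `std()` uses the population formula by default (`ddof=0`, dividing by N), whereas the sample formula (`ddof=1`, dividing by N - 1) gives a slightly larger value. The sketch below shows the difference on a small made-up array; the values are illustrative only and are not the sea-level data.

```python
import numpy as np

# Hypothetical toy data chosen so the population result is a round number
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Default ddof=0: divide the summed squared deviations by N
population_std = values.std()

# ddof=1: divide by N - 1 instead (sample standard deviation)
sample_std = values.std(ddof=1)

print(population_std)  # 2.0
print(round(sample_std, 3))
```

For a dataset with over a hundred rows the two versions differ only slightly, but it is worth knowing which one a reported figure uses.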


# Reflection
----

## What skills have you demonstrated in completing this notebook?

Your answer: 

## What caused you the most difficulty?

Your answer: 