# Week 21 - Formative Exercise

This week you are given the scenario below. You must select appropriate techniques to consider, train at least two models based on the scenario, and finally evaluate your models using reasonable metrics.

## Scenario
You are working as a data scientist at the Met Office.

Wales is particularly susceptible to climate change, which is increasing the frequency of both flooding and drought events. Luckily, temperature and rainfall records are well kept for four stations in Wales.

To help identify the future state of the climate in Wales, you have been asked to carry out one of the two tasks below. Choose whichever interests you most.

### Task 1: Rainfall Prediction
The first option is to produce a model capable of predicting the rainfall in June of 2100.

### Task 2: Temperature Prediction
The second option is to produce a model capable of predicting the temperature in September of 2150.

## Dataset

You have been provided with a weather dataset for four weather stations across Wales:
+ 0: Valley
+ 1: Cardiff
+ 2: Ross-on-Wye
+ 3: Aberporth

<div>
<img src="wales.png" width="250"/>
</div>

The dataset is provided as a CSV file, which can be loaded into a NumPy array as follows:

`dataset = np.loadtxt("weather_data.csv", delimiter=",")`

Once loaded, the dataset is a NumPy array with 5 columns:
1. Weather Station (a numerical indicator as defined above)
2. Year of the reading
3. Month of the reading
4. Temperature (Degrees Celsius)
5. Rainfall (mm)
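
As a minimal sketch of how the column layout above might be used, the snippet below selects the readings for one station and one month with a boolean mask. The four rows are made up for illustration and are read from an in-memory string rather than `weather_data.csv`; the column-name constants are hypothetical helpers, not part of the dataset.

```python
import io

import numpy as np

# Made-up rows in the same five-column layout:
# station, year, month, temperature (deg C), rainfall (mm)
sample_csv = io.StringIO(
    "0,2000,6,14.1,55.0\n"
    "1,2000,6,15.3,60.2\n"
    "1,2000,7,17.0,48.9\n"
    "3,2001,6,13.8,70.4\n"
)
sample = np.loadtxt(sample_csv, delimiter=",")

# Hypothetical named column indices, to avoid magic numbers
STATION, YEAR, MONTH, TEMP, RAIN = range(5)

# Keep only Cardiff (station 1) readings taken in June (month 6)
mask = (sample[:, STATION] == 1) & (sample[:, MONTH] == 6)
cardiff_june = sample[mask]

years = cardiff_june[:, YEAR]
rainfall = cardiff_june[:, RAIN]
print(years, rainfall)
```

The same masking pattern should work on the real array returned by `np.loadtxt("weather_data.csv", delimiter=",")`.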

You do not have to use all the data; however, you should consider carefully which data you need to successfully train and test a model to answer the questions.

## Recommended Steps

To help you answer the questions, here is a list of steps I recommend you take to tackle each question individually:
1. Load the data from the `weather_data.csv` file.
2. Extract only the inputs and outputs you need.
3. Split the data into training and testing.
4. Select a suitable modelling approach.
5. Implement and fit the model.
6. Evaluate the model's performance.
7. (Optional) Try a second modelling approach to compare performance!
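
The steps above could be sketched roughly as follows. This is only one possible reading of them: the data is synthetic (generated in the snippet, not loaded from `weather_data.csv`), the split is chronological rather than random because the tasks ask about future years, and the second "model" is simply a mean baseline from `sklearn.dummy`. All of these choices are assumptions made for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Step 1 (stand-in): synthetic monthly temperatures for one station, 1960-2019.
# Illustrative values only: a seasonal cycle plus slow warming plus noise.
rng = np.random.default_rng(0)
years = np.repeat(np.arange(1960, 2020), 12)
months = np.tile(np.arange(1, 13), 60)
temps = (
    10
    + 6 * np.sin((months - 4) * np.pi / 6)
    + 0.02 * (years - 1960)
    + rng.normal(0, 0.5, years.size)
)

# Step 2: keep September rows only; the input is the year, the output is temperature
september = months == 9
X = years[september].reshape(-1, 1)
y = temps[september]

# Step 3: chronological split, testing on the most recent years
is_train = X[:, 0] < 2008
X_train, y_train = X[is_train], y[is_train]
X_test, y_test = X[~is_train], y[~is_train]

# Steps 4-5: fit a linear trend, plus a predict-the-mean baseline to compare against
linear = LinearRegression().fit(X_train, y_train)
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

# Steps 6-7: evaluate both models on the held-out years
mae_linear = mean_absolute_error(y_test, linear.predict(X_test))
mae_baseline = mean_absolute_error(y_test, baseline.predict(X_test))
print(f"Linear MAE:   {mae_linear:.3f}")
print(f"Baseline MAE: {mae_baseline:.3f}")
```

Comparing against a simple baseline gives the error figure some context: a model is only useful here if it beats always predicting the training mean.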

## Tips and Tricks

Now that you have a handle on the metrics needed to evaluate these tasks, I recommend using the sklearn metrics library so you don't need to implement them yourself! Take a look through the [metrics](https://scikit-learn.org/stable/modules/model_evaluation.html) documentation, you'
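
As a small illustration of the sklearn metrics mentioned above, the snippet below computes three common regression metrics on a tiny hand-made example. The numbers are invented so the results can be checked by hand; which metric suits the task best is left to you.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Invented true values and predictions, small enough to verify by hand
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

# Mean absolute error: (1 + 0 + 2) / 3
mae = mean_absolute_error(y_true, y_pred)

# Mean squared error: (1 + 0 + 4) / 3, and its square root (RMSE)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)

# R^2: 1 - (sum of squared residuals / total sum of squares) = 1 - 5/8
r2 = r2_score(y_true, y_pred)

print(mae, mse, rmse, r2)
```

MAE and RMSE are in the same units as the target (mm or degrees Celsius here), while R² is unitless.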

In [2]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Load the weather records: station, year, month, temperature, rainfall
dataset = np.loadtxt("weather_data.csv", delimiter=",")

# Inputs: year, month and temperature (columns 1-3); output: rainfall (column 4)
input_x = dataset[:, 1:4]
output_y = dataset[:, 4]

# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(
    input_x, output_y, test_size=0.2, random_state=56
)

# Fit a linear regression and predict rainfall for the test rows
linear_model = LinearRegression().fit(x_train, y_train)
preds = linear_model.predict(x_test)

# Coefficient of determination (R^2) on the held-out data
r2 = r2_score(y_test, preds)
print(r2)

0.08705734236593377
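
An R² of about 0.09 means the linear model explains roughly 9% of the variance in rainfall on the test rows. The model above also takes temperature as an input, so asking it for June 2100 would require a temperature for that month, which is not known. One possible alternative, sketched below under the assumption that a per-month trend over years is acceptable, is to fit on June rows only with the year as the single input. The rainfall values are made up for illustration and are not from `weather_data.csv`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up June rainfall totals (mm), one per year, for illustration only
june_years = np.array([1980, 1990, 2000, 2010, 2020]).reshape(-1, 1)
june_rain = np.array([62.0, 58.5, 60.1, 55.4, 54.0])

# Fit a straight-line trend of June rainfall against year
model = LinearRegression().fit(june_years, june_rain)

# Extrapolate the trend to June 2100 (predict expects a 2D array of inputs)
pred_2100 = model.predict(np.array([[2100]]))[0]
print(f"Slope: {model.coef_[0]:.3f} mm per year")
print(f"Predicted June 2100 rainfall: {pred_2100:.1f} mm")
```

Extrapolating a straight line this far beyond the data is a strong assumption, so any such prediction should be reported alongside the test-set error.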
