# <i class="fas fa-laptop"></i> Practice: Coding Machine Learning
```{jupyter-info}
{rel-data-download}`weather.csv`
```

In this notebook, we will practice predicting the weather. We won't forecast it in the sense you are familiar with, where meteorologists predict what the weather will be a week from now. Instead, we will work through a simpler example: given various information about a day, we will try to predict the maximum temperature reached that day.

The data is stored in `weather.csv` and has the following columns.
* `STA`: A code representing what station the measurements were taken from
* `YR`: Which year this measurement was taken
* `MO`: Which month this measurement was taken
* `DA`: Which day this measurement was taken
* `MAX` (our target): The maximum temperature that was reached that day
* `MIN`: The minimum temperature that was reached that day
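
To make the column layout concrete, here is a minimal sketch that builds a tiny `DataFrame` with the same column names and separates the target from the other columns. The rows are made up purely for illustration (they are not taken from `weather.csv`), and the names `sample`, `sample_features`, and `sample_labels` are hypothetical.

```python
import pandas as pd

# Made-up rows that only mimic the column layout of weather.csv;
# the values are illustrative, not taken from the real file.
sample = pd.DataFrame({
    "STA": [10001, 10001, 10002],
    "YR": [42, 42, 43],
    "MO": [7, 7, 1],
    "DA": [1, 2, 15],
    "MAX": [25.6, 28.9, 3.3],
    "MIN": [22.2, 21.7, -1.1],
})

# The target column comes out as a Series
sample_labels = sample["MAX"]
# Every other column stays in a DataFrame of features
sample_features = sample.drop(columns="MAX")

print(sample_features.columns.tolist())
print(sample_labels.tolist())
```

Note that both selections work on whole columns at once, so no loops are needed.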

Since the target we want to predict is a number, this will be a regression task rather than a classification task. Almost all the code you will write will be the same as we saw in the lesson, except:
* You will use a `DecisionTreeRegressor` from `sklearn.tree` instead of a `DecisionTreeClassifier`
* You will use the `mean_squared_error` function from the `sklearn.metrics` module instead of `accuracy_score`. It behaves similarly in that it takes the true labels and the predicted labels, but it returns the **error** of the predictions rather than their accuracy. Formally, it returns the mean squared error (MSE) between your predictions and the true values: find the difference for each example, square each difference, and average the squares. A higher MSE means the model did worse, while an MSE of 0 means there were no errors!
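
To see what `mean_squared_error` computes, the sketch below works out the MSE by hand with NumPy and compares it with the `sklearn` result. The temperatures in `y_true` and `y_pred` are made-up numbers chosen for easy arithmetic, not values from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up true and predicted temperatures (illustrative only)
y_true = np.array([20.0, 25.0, 30.0])
y_pred = np.array([22.0, 25.0, 27.0])

# Differences are -2, 0, and 3; their squares are 4, 0, and 9;
# the average of the squares is 13 / 3
manual_mse = np.mean((y_true - y_pred) ** 2)

# sklearn takes the true values first, then the predictions
sklearn_mse = mean_squared_error(y_true, y_pred)

print(manual_mse, sklearn_mse)

# Perfect predictions give an MSE of exactly 0
print(mean_squared_error(y_true, y_true))
```

Because the differences are squared, one large miss raises the MSE more than several small ones.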

As a recommendation, you may use the following variable names for the parts of the problem:
* `data` should store the `DataFrame` of all the data stored in `weather.csv`.
* `features` should store the `DataFrame` of just the features.
* `labels` should store the `Series` of labels.
* `model` should store the `DecisionTreeRegressor`.
* `error` should store the error of the trained `model` on the whole dataset.

We don't spell out each step, so that you practice referring back to your notes and to the code you saw in the notebook earlier in the lesson. Use those for the steps to train the model (accounting for the differences we highlighted above). Remember to import all the necessary libraries!

As a hint for correctness on this task, your model should get 0 error on this dataset. We will discuss in Lesson 12 why getting 0 error might be a sign that something is actually wrong with our model, but for this lesson, we will consider that correct!
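
As a small check that 0 error is a plausible result, the sketch below trains a `DecisionTreeRegressor` with default settings on a tiny made-up table and measures the error on that same table. The data and the `toy_` names are hypothetical; the sketch assumes every row has a distinct combination of feature values, which lets a default tree give each row its own leaf.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Made-up data with distinct feature rows (illustrative only)
toy = pd.DataFrame({
    "DA": [1, 2, 3, 4, 5, 6],
    "MIN": [10, 12, 11, 15, 14, 13],
    "MAX": [20, 24, 21, 27, 25, 26],
})
toy_features = toy[["DA", "MIN"]]
toy_labels = toy["MAX"]

# A default tree keeps splitting until each leaf holds a single target value
toy_model = DecisionTreeRegressor()
toy_model.fit(toy_features, toy_labels)

# Error measured on the same rows the model was trained on
toy_error = mean_squared_error(toy_labels, toy_model.predict(toy_features))
print(toy_error)
```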

**For these problems, you should not use any loops!**

```python
# Write your code here!
import pandas as pd
# Import the model
from sklearn.tree import DecisionTreeRegressor
# Import the function to compute the error
from sklearn.metrics import mean_squared_error

# Google Colab only: upload weather.csv into the session
from google.colab import files
uploaded = files.upload()

# Load the dataset
data = pd.read_csv('weather.csv')
data.head()

# Features are every column except the target; labels are the target column
features = data.loc[:, data.columns != 'MAX']
labels = data['MAX']

# Create an untrained model
model = DecisionTreeRegressor()
# Train it on our data
model.fit(features, labels)

# Make predictions on the same data
predictions = model.predict(features)
# Assess the error of the model
error = mean_squared_error(labels, predictions)
error
```

0.0