# The Data

Temperature and energy demand data (MW) for the City of Toronto for the last 2 years... scraped by me!

In [3]:
import pandas as pd

df = pd.read_csv('data/weather_power.csv')

In [4]:
df.head()

Unnamed: 0,date,temperature,energy_demand
0,2018-01-01 00:00:00,-16.9,5340
1,2018-01-01 01:00:00,-16.3,5211
2,2018-01-01 02:00:00,-17.6,5096
3,2018-01-01 03:00:00,-18.6,4987
4,2018-01-01 04:00:00,-17.8,4926


# Model 01

Select and split... we'll just use temperature for now:

In [5]:
from sklearn.model_selection import train_test_split

target = 'energy_demand'
y = df[target]
X = df[['temperature']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, shuffle=False)

> Max's Tip: you should try to get to a number as quickly as possible

We'll use [`DummyRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyRegressor.html) to make this happen:

In [6]:
from sklearn.dummy import DummyRegressor
model = DummyRegressor()
model.fit(X_train, y_train)

DummyRegressor()

`DummyRegressor` will literally just predict the average for everything... doesn't matter what you put in:

In [7]:
model.predict(X_test)

array([5692.04161844, 5692.04161844, 5692.04161844, ..., 5692.04161844,
       5692.04161844, 5692.04161844])

You can confirm by running:

In [8]:
y_train.mean()

5692.041618441358

But now we have a model that we can score so that we know what to beat:

In [9]:
from sklearn.metrics import mean_squared_error

round(mean_squared_error(y_test, model.predict(X_test)) ** (1/2))

1451.0

The average is ~5700 MW and our RMSE is ~1450 MW, so right now the model's typical miss is roughly a quarter of average demand... not super great!
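
To make that score less of a black box, here is a minimal sketch of the same root-mean-squared-error calculation in plain Python. The `rmse` helper and the demand values are made up for illustration; they are not from the dataset:

```python
import math
from statistics import mean

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(mean(squared_errors))

# Made-up demand values (MW), purely illustrative
y_train_toy = [5000, 6000, 7000]
y_test_toy = [5500, 6500]

# A mean baseline predicts the training average for every row,
# which is what DummyRegressor() does with its default settings
baseline = mean(y_train_toy)
y_pred_toy = [baseline] * len(y_test_toy)

baseline_rmse = rmse(y_test_toy, y_pred_toy)
```

Because the errors are squared before averaging, big misses count for more than small ones, so read the RMSE as a "typical" error rather than the error on every single prediction.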

To run a prediction for a single row, or new entry, I like to send things to a dictionary to get a sense of structure:

In [10]:
X.sample(1).to_dict(orient='list')

{'temperature': [17.1]}

I like to take that and embed it in a new pandas.DataFrame:

In [11]:
new = pd.DataFrame({'temperature': [21]})
model.predict(new)[0]

5692.041618441358

Save the model so that we can wrap an app around the serialized version:

In [12]:
import pickle

with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

Make sure it works:

In [13]:
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

model.predict(new)[0]

5692.041618441358

# App 01

Now we need to sit this model behind an app. For now, it's just going to be a bare-bones hello-world Flask app. Write this to a file called `app.py`:

In [14]:
%%writefile app.py

from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return 'Hello'

if __name__ == '__main__':
    app.run(port=5000, debug=True)

Overwriting app.py


Preview the app by running it from the command line:

In [15]:
!python app.py

 * Serving Flask app "app" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 122-223-301
^C


Interrupt (ctrl+c, or Kernel > Interrupt) when finished to move on...

# App 02

Now that we have some boilerplate in place, we can extend the hello world example to include the model. Though this looks like it'll work... it won't, but run it anyway:

In [16]:
%%writefile app.py

import pickle

from flask import Flask
import pandas as pd

app = Flask(__name__)

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/')
def index():
    new = pd.DataFrame({'temperature': [20]})
    prediction = model.predict(new)
    print(prediction)
    return prediction

if __name__ == '__main__':
    app.run(port=5000, debug=True)

Overwriting app.py


And run the app at the command line:

In [18]:
!python app.py

 * Serving Flask app "app" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 122-223-301
[5692.04161844]
127.0.0.1 - - [21/Aug/2020 19:27:30] "GET / HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/Users/max/opt/miniconda3/lib/python3.8/site-packages/flask/app.py", line 2464, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/max/opt/miniconda3/lib/python3.8/site-packages/flask/app.py", line 2450, in wsgi_app
    response = self.handle_exception(e)
  File "/Users/max/opt/miniconda3/lib/python3.8/site-packages/flask/app.py", line 1867, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/max/opt/miniconda3/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/Users/max/opt/miniconda3/lib/python3.8/site

When you visit http://127.0.0.1:5000/ you'll get a:

**TypeError** 

> TypeError: The view function did not return a valid response. The return type must be a string, dict, tuple, Response instance, or WSGI callable, but it was a ndarray.  

Interrupt the kernel so that we can move on and fix it.

# App 03

To fix the return type issue we can just return a dictionary:

In [19]:
%%writefile app.py

import pickle
from flask import Flask
import pandas as pd

app = Flask(__name__)

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/')
def index():
    new = pd.DataFrame({'temperature': [20]})
    prediction = model.predict(new)[0]
    # return str(prediction)
    return {'prediction': prediction}

if __name__ == '__main__':
    app.run(port=5000, debug=True)

Overwriting app.py


Alternatively, you could `return str(prediction)`

Run the app, to confirm that it works:

In [20]:
!python app.py

 * Serving Flask app "app" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 122-223-301
127.0.0.1 - - [21/Aug/2020 19:31:23] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [21/Aug/2020 19:31:23] "GET /?__debugger__=yes&cmd=resource&f=style.css HTTP/1.1" 200 -
127.0.0.1 - - [21/Aug/2020 19:31:23] "GET /?__debugger__=yes&cmd=resource&f=debugger.js HTTP/1.1" 200 -
127.0.0.1 - - [21/Aug/2020 19:31:23] "GET /?__debugger__=yes&cmd=resource&f=console.png HTTP/1.1" 200 -
127.0.0.1 - - [21/Aug/2020 19:31:23] "GET /?__debugger__=yes&cmd=resource&f=jquery.js HTTP/1.1" 200 -
127.0.0.1 - - [21/Aug/2020 19:31:23] "GET /?__debugger__=yes&cmd=resource&f=ubuntu.ttf HTTP/1.1" 200 -
^C


# App 04

Right now our app is just returning the prediction for when the temperature is 20 degrees. In order to make it dynamic, we need to use "query params"... they look like:

`http://website.com/endpoint?query=string`

Query params will allow our model to accept different inputs. In order to capture query params, we need to import the `flask.request` object so that we can peel off `request.args` (basically just a dictionary).

At the same time, we're going to add a `/predict` endpoint and reorganize the structure of the app:

In [21]:
%%writefile app.py

import pickle
from flask import Flask, request
import pandas as pd

app = Flask(__name__)

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/')
def index():
    return 'Use the /predict endpoint'

@app.route('/predict')
def predict():
    query = request.args
    print(query)
    new = pd.DataFrame({'temperature': [20]})
    prediction = model.predict(new)[0]
    return {'prediction': prediction}

if __name__ == '__main__':
    app.run(port=5000, debug=True)

Overwriting app.py


Run and watch the command line when you enter arbitrary query strings like:
    
- http://127.0.0.1:5000/predict?hi=there&name=max
- http://127.0.0.1:5000/predict?even=more&query=strings
- http://127.0.0.1:5000/predict?temperature=25

In [22]:
!python app.py

 * Serving Flask app "app" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 122-223-301
ImmutableMultiDict([('hi', 'there'), ('name', 'max')])
127.0.0.1 - - [21/Aug/2020 19:35:22] "GET /predict?hi=there&name=max HTTP/1.1" 200 -
ImmutableMultiDict([('even', 'more'), ('query', 'strings')])
127.0.0.1 - - [21/Aug/2020 19:35:24] "GET /predict?even=more&query=strings HTTP/1.1" 200 -
127.0.0.1 - - [21/Aug/2020 19:35:24] "GET /favicon.ico HTTP/1.1" 404 -
^C


The console will print `ImmutableMultiDict([('hi', 'there'), ('name', 'max')])`, which is just a fancy dictionary...
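
If you want to poke at query strings without starting a server, the standard library parses them into something very similar. This is only a rough analogue of what Flask hands you in `request.args`, not Flask's own implementation:

```python
from urllib.parse import urlsplit, parse_qs

url = 'http://127.0.0.1:5000/predict?hi=there&name=max'

# Everything after the "?" is the query string
query_string = urlsplit(url).query

# parse_qs maps each key to a list, because a key may appear more than once
query = parse_qs(query_string)

# Values always arrive as strings, even when they look like numbers
temperature_query = parse_qs('temperature=25')
```

That last point matters for the next step: `'25'` has to be converted before it can be fed to a model.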

# App 05 

To *actually* connect the temperature query string to the model, we'll grab it off `request.args` and format it as a float:

In [23]:
%%writefile app.py

import pickle
from flask import Flask, request
import pandas as pd

app = Flask(__name__)

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/')
def index():
    return 'Use the /predict endpoint'

@app.route('/predict')
def predict():
    query = request.args
    temperature = float(query.get('temperature'))
    print(temperature)
    new = pd.DataFrame({'temperature': [temperature]})
    prediction = model.predict(new)[0]
    return {'prediction': prediction}

if __name__ == '__main__':
    app.run(port=5000, debug=True)

Overwriting app.py


Preview the app at the command line:

In [24]:
!python app.py

 * Serving Flask app "app" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 122-223-301
25.0
127.0.0.1 - - [21/Aug/2020 19:36:35] "GET /predict?temperature=25 HTTP/1.1" 200 -
50.0
127.0.0.1 - - [21/Aug/2020 19:36:40] "GET /predict?temperature=50 HTTP/1.1" 200 -
30.0
127.0.0.1 - - [21/Aug/2020 19:36:40] "GET /predict?temperature=30 HTTP/1.1" 200 -
^C


And hit the url with some temperatures:

- http://127.0.0.1:5000/predict?temperature=25

You'll notice that the print statement in the console is registering the different values, but the model is just returning the same thing.

Well that shouldn't surprise, because our model is dumb! Let's fix our dummy model now...
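
One caveat about the app before moving on: `float(query.get('temperature'))` raises a `TypeError` when the parameter is missing (because `float(None)` fails) and a `ValueError` when it isn't a number, and either one turns into a 500 error. A minimal sketch of a more forgiving parser, assuming a hypothetical helper called `parse_temperature` that takes any dict-like object such as `request.args`:

```python
def parse_temperature(args):
    """Return the temperature as a float, or None if it is missing or invalid.

    `args` is any dict-like mapping of query params (hypothetical helper,
    not part of Flask).
    """
    raw = args.get('temperature')
    if raw is None:
        # ?temperature=... was not supplied at all
        return None
    try:
        return float(raw)
    except ValueError:
        # e.g. ?temperature=hot
        return None

ok = parse_temperature({'temperature': '25'})
missing = parse_temperature({})
invalid = parse_temperature({'temperature': 'hot'})
```

In the view function you could then return a friendly message whenever the helper gives back `None`, instead of letting the request crash.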

# Model 02

We have a number to beat, let's see if we can beat it with `LinearRegression`:

In [1]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

Unfortunately, this will break with a `ValueError`, because our data has some NaNs in the temperature column:

In [27]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23040 entries, 0 to 23039
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   date           23040 non-null  object 
 1   temperature    22873 non-null  float64
 2   energy_demand  23040 non-null  int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 540.1+ KB


# Model 03

Not to worry, this is easily fixed with `DataFrameMapper` from [sklearn-pandas](https://github.com/scikit-learn-contrib/sklearn-pandas) (my secret weapon).

DataFrameMapper accepts a list of tuples, where each tuple identifies the column name and then the transformer that operates on it:

In [29]:
from sklearn.impute import SimpleImputer
from sklearn_pandas import DataFrameMapper

mapper = DataFrameMapper([
    ('temperature', SimpleImputer())
], df_out=True)

It works as though it's a first-class transformer, so `fit`, `transform`, and `fit_transform` all work. Except it's a bit finicky:

In [30]:
mapper.fit_transform(X_train)

ValueError: temperature: Expected 2D array, got 1D array instead:
array=[-16.9 -16.3 -17.6 ...   9.1   8.7   8.3].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

The above hits a **ValueError**:

> ValueError: temperature: Expected 2D array, got 1D array instead:

This comes down to how `DataFrameMapper` selects columns: a plain string like `'temperature'` hands the transformer a 1D array, while a list like `['temperature']` hands it a 2D array. `SimpleImputer` (like most scikit-learn transformers) expects 2D input.

No worries, though, it's quickly fixed by wrapping the pesky column in square brackets (won't always work, since some transformers want 1D input)...

In [31]:
mapper = DataFrameMapper([
    (['temperature'], SimpleImputer())
], df_out=True)

mapper.fit_transform(X_train)

Unnamed: 0,temperature
0,-16.9
1,-16.3
2,-17.6
3,-18.6
4,-17.8
...,...
20731,10.3
20732,9.5
20733,9.1
20734,8.7


But it works here!
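
For intuition about what the imputer is doing: with its default settings, `SimpleImputer` fills missing values with the column mean. A plain-Python sketch of that idea (the `impute_mean` helper and the toy temperatures are assumptions for illustration, not the library's code):

```python
import math

def impute_mean(values):
    """Replace NaNs with the mean of the observed (non-NaN) values."""
    observed = [v for v in values if not math.isnan(v)]
    fill = sum(observed) / len(observed)
    return [fill if math.isnan(v) else v for v in values]

# Toy column with one missing reading
temps = [-16.9, float('nan'), -17.6]
filled = impute_mean(temps)
```

The real transformer learns the fill value during `fit` and reuses it in `transform`, so test rows and brand-new rows get filled with the training mean.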

# Model 04

Using the DataFrameMapper, we can now transform our `X` objects to intermediate `Z` objects... 

> Max's Tip: You could just overwrite the `X`s but I like `Z`s because they remind me "HEY I DID SOMETHING TO THIS!~"

In [32]:
Z_train = mapper.fit_transform(X_train)
Z_test = mapper.transform(X_test)

model = LinearRegression()
model.fit(Z_train, y_train)

LinearRegression()

Now we can score the model again:

In [33]:
round(mean_squared_error(y_test, model.predict(Z_test)) ** (1/2))

1345.0

And find that this new model beats the dummy, albeit not by much. 

Let's peek at some examples:

In [34]:
pd.DataFrame({
    'y_true': y_test,
    'y_hat': model.predict(Z_test)
}).sample(25)

Unnamed: 0,y_true,y_hat
21406,6000,5692.041618
21349,5942,5869.045155
22964,7224,5923.632752
20749,5489,5709.708384
22466,4777,5869.045155
22476,7557,5972.318988
21230,6526,5964.942285
21541,6082,5888.224581
22689,5623,5888.224581
22085,6005,5935.435476


# Model 05

Now that we have something a little more dynamic, we should "serialize" the mapper and the model:

In [35]:
with open('mapper.pkl', 'wb') as f:
    pickle.dump(mapper, f)
    
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

Except this isn't super ideal because in this paradigm we have to keep track of and load two separate things:

In [36]:
with open('mapper.pkl', 'rb') as f:
    mapper = pickle.load(f)

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

Predicting a new sample is a matter of running:

In [37]:
X_train.sample(1).to_dict(orient='list')

{'temperature': [2.2]}

And:

In [38]:
new = pd.DataFrame({'temperature': [21]})

Z_new = mapper.transform(new)
model.predict(Z_new)[0]

5879.372538053129

But less is more, so I think we can do better...

# Model 06

Enter pipelines... a scikit-learn tool that will ensure that we only have to manage one thing.

As an added bonus, using pipeline will make it so that we can get rid of the intermediate `Z` objects at the same time!

In [39]:
from sklearn.pipeline import make_pipeline

In [40]:
pipe = make_pipeline(mapper, model)
pipe.fit(X_train, y_train);

Now dump:

In [41]:
with open('pipe.pkl', 'wb') as f:
    pickle.dump(pipe, f)

And load (to confirm it worked):

In [42]:
with open('pipe.pkl', 'rb') as f:
    pipe = pickle.load(f)

In [43]:
pipe.predict(new)[0]

5879.372538053129
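
If pipelines feel like magic, this toy version shows roughly what is going on: `fit` pushes the data through the transformer and then trains the model, and `predict` replays the same transformation before predicting. All three classes here are hypothetical stand-ins written for illustration; they are not scikit-learn's implementation:

```python
class FillMissing:
    """Toy transformer: replace None with the mean seen during fit."""
    def fit(self, X):
        observed = [x for x in X if x is not None]
        self.fill_ = sum(observed) / len(observed)
        return self

    def transform(self, X):
        return [self.fill_ if x is None else x for x in X]

class MeanModel:
    """Toy model: always predict the average target."""
    def fit(self, Z, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, Z):
        return [self.mean_ for _ in Z]

class ToyPipeline:
    """Chain one transformer and one model behind a single fit/predict."""
    def __init__(self, transformer, model):
        self.transformer = transformer
        self.model = model

    def fit(self, X, y):
        Z = self.transformer.fit(X).transform(X)
        self.model.fit(Z, y)
        return self

    def predict(self, X):
        return self.model.predict(self.transformer.transform(X))

toy = ToyPipeline(FillMissing(), MeanModel()).fit([1.0, None, 3.0], [10, 20, 30])
toy_prediction = toy.predict([None])
```

The payoff is the same as with the real thing: one object to fit, one object to pickle, and no intermediate `Z` variables to keep track of.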

For deploying... we'll actually need our full model to be in a file, so let's do that now:

In [44]:
%%writefile model.py

import pickle
import pandas as pd
from sklearn_pandas import DataFrameMapper
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

df = pd.read_csv('data/weather_power.csv')

target = 'energy_demand'
y = df[target]
X = df[['temperature']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, shuffle=False)

mapper = DataFrameMapper([
    (['temperature'], SimpleImputer())
], df_out=True)

model = LinearRegression()

pipe = make_pipeline(mapper, model)
pipe.fit(X_train, y_train)

with open('pipe.pkl', 'wb') as f:
    pickle.dump(pipe, f)

Overwriting model.py


# Deploy 01 - Heroku

1. Setup a virtual environment:

```
python -m venv .venv
```

2. Activate it:

```
source .venv/bin/activate
```

3. Install app and model dependencies (`gunicorn` is the WSGI server that sits between the web and Flask):

```
pip install gunicorn flask scikit-learn pandas sklearn_pandas
```

3. Freeze the dependencies:

```
pip freeze > requirements.txt
```

4. Retrain the model inside of the virtual environment:

```
python model.py
```

5. Make sure the app still works locally:

```
python app.py
```

6. Specify a python runtime (3.8 not working yet):

```
python --version
echo "python-3.7.9" > runtime.txt
```

7. Create a `Procfile`:

```
echo "web: gunicorn app:app" > Procfile
```

8. (Optional) If your project isn't already a git repo, make it one:

```
git init
touch .gitignore
echo ".venv" >> .gitignore
```

9. Login to Heroku from the [command line](https://devcenter.heroku.com/articles/heroku-cli):

```
heroku login
```

10. Create a project:

```
heroku create
```

11. Add a remote to the randomly generated project:

```
heroku git:remote -a silly-words-009900
```

12. Test the app locally:

```
heroku local
```

13. Add, commit, push:

```
git add .
git commit -m '🚀'
git push heroku master
```

14. Hit the url and make sure it works!

- http://\<url\>/predict?temperature=20

15. Make sure nothing is wrong (check the logs!):

```
heroku logs -t 
```

# App 07

FastAPI is the new kid on the block. It's faster, has less boilerplate, and is the future. Converting from Flask to FastAPI isn't that tough:

In [None]:
%%writefile app.py

import pickle
import pandas as pd

import uvicorn
from fastapi import FastAPI

app = FastAPI()

with open('pipe.pkl', 'rb') as f:
    pipe = pickle.load(f)

@app.get('/')
def index():
    return 'Use the /predict endpoint with a temperature argument'

@app.get('/predict')
def predict(temperature: float):
    new = pd.DataFrame({'temperature': [temperature]})
    prediction = pipe.predict(new)[0]
    return {'prediction': prediction}

if __name__ == '__main__':
    uvicorn.run(app)

Run at the command line with:

In [None]:
!uvicorn app:app --port 5000 --reload

Try it out and make sure it still works:

- http://127.0.0.1:5000/predict?temperature=20

As a bonus there are also some wicked auto-generated docs at:

- http://127.0.0.1:5000/docs

Interrupt and kill when you've verified it works.

# Deploy 02

Heroku + FastAPI

0. If you're not in your environment anymore you can re-enter with:

`source .venv/bin/activate`

(And to exit out to your base environment):

`deactivate`

1. Install the new dependencies:

```
pip install uvicorn fastapi
```

2. Freeze the dependencies:

```
pip freeze > requirements.txt
```

3. Retrain the model inside of the virtual environment:

```
python model.py
```

4. Make sure the app still works locally:

```
uvicorn app:app --port=5000
```

5. Create a new `Procfile`:

```
echo 'web: uvicorn app:app --host=0.0.0.0 --port=${PORT:-5000}' > Procfile
```

6. Test the app locally:

```
heroku local
```

7. Add, commit, push:

```
git add .
git commit -m '🚀'
git push heroku master
```

8. Click on the url and make sure it works!

- http://\<url\>/predict?temperature=20

# Model 07

The model we have right now sucks...

In [None]:
from matplotlib import pyplot as plt

plt.plot(range(len(y_test)), y_test)
plt.plot(range(len(y_test)), model.predict(Z_test));

It barely explains anything... but it actually turns out temperature can explain a lot when we properly handle the column...

In [None]:
plt.scatter(df['temperature'], df['energy_demand'], alpha=1/20);

Whenever we see a "U" or a rainbow, we should look at bringing in polynomial features:

In [None]:
from sklearn.preprocessing import PolynomialFeatures

mapper = DataFrameMapper([
    (['temperature'], [SimpleImputer(), PolynomialFeatures(degree=2, include_bias=False)])
], df_out=True)

mapper.fit_transform(X_train)
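
Why this helps: with `degree=2` and `include_bias=False`, a single temperature `t` is expanded into two columns, `t` and `t**2`. A U-shaped curve is not a straight line in `t`, but it is a plain linear combination of those two columns. A small sketch with made-up numbers (the `u_curve` function and its coefficients are invented for illustration, not fitted to the data):

```python
def poly2(t):
    """Degree-2 expansion without the bias column: t -> [t, t**2]."""
    return [t, t ** 2]

temps = [-10, 0, 10, 20]
features = [poly2(t) for t in temps]

def u_curve(t):
    """Made-up U-shaped demand curve with its minimum at 10 degrees."""
    return 5000 + 2 * (t - 10) ** 2

# Expanding the square gives 5200 - 40*t + 2*t**2,
# i.e. an intercept plus one weight per feature column
intercept, coef_t, coef_t2 = 5200, -40, 2
linear_in_features = [
    intercept + coef_t * f[0] + coef_t2 * f[1] for f in features
]
```

Since the "linear" version reproduces the curve exactly, `LinearRegression` has a shot at learning a U shape once it sees both columns.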

In a full pipeline:

In [None]:
pipe = make_pipeline(mapper, model)
pipe.fit(X_train, y_train)

This time:

In [None]:
pipe.score(X_test, y_test)

In [None]:
round(mean_squared_error(y_test, pipe.predict(X_test)) ** (1/2))

We explained over half of the variance, and the model's RMSE is now under 1000 MW. Look at the updated prediction curve.
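
The "variance explained" number comes from `pipe.score`, which for a regressor is R². A minimal sketch of that calculation in plain Python, using a hypothetical `r2` helper and toy numbers rather than the real predictions:

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true_toy = [1, 2, 3]

perfect = r2(y_true_toy, [1, 2, 3])        # every prediction is exact
mean_only = r2(y_true_toy, [2, 2, 2])      # always predict the mean of y_true
in_between = r2(y_true_toy, [1.5, 2, 2.5])
```

A score of 0 means "no better than predicting the average of the data being scored", and 1 means a perfect fit.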

Still have a lot of work to do, but let's wrap it up in a `model.py` file so that we can redeploy!

In [None]:
%%writefile model.py

import pickle
import pandas as pd
from sklearn_pandas import DataFrameMapper
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

df = pd.read_csv('data/weather_power.csv')

target = 'energy_demand'
y = df[target]
X = df[['temperature']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, shuffle=False)

mapper = DataFrameMapper([
    (['temperature'], [SimpleImputer(), PolynomialFeatures(degree=2, include_bias=False)])
], df_out=True)

model = LinearRegression()
pipe = make_pipeline(mapper, model)
pipe.fit(X_train, y_train)

with open('pipe.pkl', 'wb') as f:
    pickle.dump(pipe, f)

# Deploy 03

No dependencies have changed, so it's a lot easier to push a change

1. Retrain the model inside of the virtual environment:

```
python model.py
```

2. Test the app locally:

```
heroku local
```

3. Add, commit, push:

```
git add .
git commit -m '🚀'
git push heroku master
```

4. Click on the url and make sure it works!

- http://\<url\>/predict?temperature=20

# Model 08 

Right now our model is just using temperature:

In [None]:
plt.plot(range(len(y_test)), y_test)
plt.plot(range(len(y_test)), pipe.predict(X_test));

There has to be some relationship between demand and the date (and hour). Let's convert the date column from a string to a datetime:

In [None]:
df['date'] = pd.to_datetime(df['date'])

Alternatively, we could import the data with the date column (0th position), already parsed:

In [None]:
df = pd.read_csv('data/weather_power.csv', parse_dates=[0])

We probably want month, weekday and hour:

In [None]:
col = df['date']

pd.concat([col.dt.month, col.dt.weekday, col.dt.hour], axis=1)
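
For a single timestamp, the same three features can be pulled out with the standard library, which makes it easy to sanity-check the encoding. The `date_features` helper is a hypothetical name for illustration; the timestamp is one of the rows from `df.head()`:

```python
from datetime import datetime

def date_features(ts):
    """Return (month, weekday, hour); weekday is 0 for Monday, 6 for Sunday."""
    return (ts.month, ts.weekday(), ts.hour)

row = datetime.strptime('2018-01-01 04:00:00', '%Y-%m-%d %H:%M:%S')
features = date_features(row)
```

pandas uses the same Monday-is-0 convention for `dt.weekday`, so these values should line up with the DataFrame version.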

Let's recut the X and y:

In [None]:
target = 'energy_demand'
y = df[target]
X = df[['date', 'temperature']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, shuffle=False)

We can make a scikit-learn transformer to do this:

In [None]:
from sklearn.base import TransformerMixin

class DateEncoder(TransformerMixin):
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return pd.concat([X.dt.month, X.dt.weekday, X.dt.hour], axis=1)

That way we can apply it on the `X_train` and `X_test` objects:

In [None]:
DateEncoder().fit_transform(X_train['date'])

Can easily embed it in our mapper framework:

In [None]:
mapper = DataFrameMapper([
    ('date', DateEncoder(), {'input_df': True}),
    (['temperature'], [SimpleImputer(), PolynomialFeatures(degree=2, include_bias=False)])
], df_out=True)

model = LinearRegression()
pipe = make_pipeline(mapper, model)
pipe.fit(X_train, y_train)

Explains even more variance:

In [None]:
pipe.score(X_test, y_test)

And shaving off more from RMSE:

In [None]:
mean_squared_error(y_test, pipe.predict(X_test)) ** (1/2)

The prediction plot:

In [None]:
plt.plot(range(len(y_test)), y_test)
plt.plot(range(len(y_test)), pipe.predict(X_test));

Because pickle stores only a reference to the class (its module and name) rather than its code, we have to move the `DateEncoder` to a `utils.py` file that the app can import:

In [None]:
%%writefile utils.py

import pandas as pd
from sklearn.base import TransformerMixin

class DateEncoder(TransformerMixin):
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return pd.concat([X.dt.month, X.dt.weekday, X.dt.hour], axis=1)
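
Some background on why the class needs its own importable module: a pickle records where to find a class, not the class definition itself. The sketch below uses `fractions.Fraction` from the standard library as a stand-in for `DateEncoder`, purely for illustration:

```python
import pickle
from fractions import Fraction

# Serialize an instance of a class that lives in an importable module
payload = pickle.dumps(Fraction(1, 2))

# The bytes mention the module and class by name...
has_module_name = b'fractions' in payload
has_class_name = b'Fraction' in payload

# ...and loading re-imports that module to rebuild the object
restored = pickle.loads(payload)
```

So whatever process unpickles `pipe.pkl` has to be able to import `DateEncoder` from the same module path that was recorded when the pipeline was dumped, which is why both `model.py` and `app.py` import it from `utils`.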

Let's write the complete model to a new file:

In [None]:
%%writefile model.py

import pickle
import pandas as pd
from sklearn_pandas import DataFrameMapper
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

from utils import DateEncoder # CUSTOM IMPORT

df = pd.read_csv('data/weather_power.csv', parse_dates=[0])

target = 'energy_demand'
y = df[target]
X = df[['date', 'temperature']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, shuffle=False)

mapper = DataFrameMapper([
    ('date', DateEncoder(), {'input_df': True}),
    (['temperature'], [SimpleImputer(), PolynomialFeatures(degree=2, include_bias=False)])
], df_out=True)

model = LinearRegression()
pipe = make_pipeline(mapper, model)
pipe.fit(X_train, y_train)

with open('pipe.pkl', 'wb') as f:
    pickle.dump(pipe, f)

# App 08

Add the custom `DateEncoder` and accept a dictionary in the index, and change it to a post request:

In [None]:
%%writefile app.py

import pickle
import pandas as pd
import uvicorn
from fastapi import FastAPI
from typing import Dict
import os
from utils import DateEncoder

app = FastAPI()

with open('pipe.pkl', 'rb') as f:
    pipe = pickle.load(f)

@app.post('/')
def index(json_data: Dict):
    new = pd.DataFrame({
        'date': [pd.Timestamp(json_data.get('date'))],
        'temperature': [float(json_data.get('temperature'))]
    })
    prediction = pipe.predict(new)[0]
    return {'prediction': prediction}

if __name__ == '__main__':
    uvicorn.run(app)

# Post Carrier Function

We need a post carrier to hit the endpoint, both locally and once it's deployed. This is the function we'll use:

In [None]:
import json
from urllib.request import Request, urlopen
import pandas as pd

def post(url, data):
    data = bytes(json.dumps(data).encode("utf-8"))
    request = Request(
        url=url,
        data=data,
        method="POST"
    )
    request.add_header("Content-type", "application/json; charset=UTF-8")
    with urlopen(request) as response:
        data = json.loads(response.read().decode("utf-8"))
    return data

# Deploy 04

Deploy on Dokku (Heroku Open Source, and better)

Create a new Procfile (respect port 5000):

`echo 'web: uvicorn app:app --host=0.0.0.0 --port=${PORT:-5000}' > Procfile`
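
In case the `${PORT:-5000}` syntax is unfamiliar: it means "use `PORT` if it is set and non-empty, otherwise fall back to 5000". A rough Python equivalent, with `resolve_port` being a hypothetical helper written only to show the rule:

```python
def resolve_port(env, default=5000):
    """Mimic the shell's ${PORT:-5000} on a dict-like environment."""
    value = env.get('PORT')
    # Both "unset" (None) and "set but empty" ('') fall back to the default
    return int(value) if value else default

unset = resolve_port({})
empty = resolve_port({'PORT': ''})
assigned = resolve_port({'PORT': '8123'})

# In a real process you would pass os.environ instead of a toy dict
```

This is how the same Procfile can work both on a platform that assigns a port and on your laptop where no `PORT` variable exists.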


Rerun the model:

```
python model.py
```

Make sure the app works locally:

```
uvicorn app:app --port 5000 --reload
```

Use the post function:

In [None]:
data = {
    "date": str(pd.Timestamp('now')),
    "temperature": 25
}

In [None]:
post("http://127.0.0.1:5000", data)

Once you've confirmed that it still works:

Deploy Dokku by:

1. Sign up for a [DigitalOcean](https://m.do.co/c/2909cd1f3f10) account

2. Spin up a $5 Ubuntu 20.04 (or 18.04) server...

3. ssh into it:

```
ssh root@165.XXX.43.118
```

4. (Strongly advised) Update everything:

```
sudo apt update
sudo apt -y upgrade
```

5. Setup firewall:

```
ufw app list
ufw allow OpenSSH
ufw enable
```

6. Add some rules ([source](https://www.digitalocean.com/community/tutorials/how-to-set-up-a-firewall-with-ufw-on-ubuntu-18-04)):

```
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow 22
sudo ufw allow http
sudo ufw allow https
```

7. Install dokku:

```
wget https://raw.githubusercontent.com/dokku/dokku/v0.21.4/bootstrap.sh
sudo DOKKU_TAG=v0.21.4 bash bootstrap.sh
```

**THIS STEP IS IMPORTANT!**

8. Visit the Droplet’s IP address in a browser to finish configuring Dokku


9. Copy and paste your ssh key from your **laptop** into the config window:

```
cat .ssh/id_rsa.pub
```

10. And add the IP of the server to the hostname:

`68.183.XXX.31`


11. Click "Finish Setup"...


12. Go back to the server terminal and create a dokku app on the server:

```
dokku apps:create powerapp
dokku domains:enable powerapp

```

**On Laptop**

12. Add dokku as a remote:

```
git remote add dokku dokku@165.XXX.43.118:powerapp
```

13. Verify that the remote got added:

```
git remote -v
```

14. Push it up (for every new change just run these commands):

```
git add .
git commit -m '🤞'
git push dokku master
```

15. Test if it works with the post function:

In [None]:
post("http://142.93.148.110", data)

# Model 09

Add some TensorFlow:

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

mapper = DataFrameMapper([
    ('date', DateEncoder(), {'input_df': True}),
    (['temperature'], [SimpleImputer(), PolynomialFeatures(degree=2, include_bias=False)])
], df_out=True)

Z_train = mapper.fit_transform(X_train)
Z_test = mapper.transform(X_test)

model = Sequential()
model.add(Input(shape=(Z_train.shape[1],)))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))

model.compile(
    loss='mean_squared_error',
    optimizer='adam',
    metrics=[tf.keras.metrics.RootMeanSquaredError()]
)

model.fit(
    Z_train, y_train,
    epochs=100, batch_size=32,
    validation_data=(Z_test, y_test)
)

model.evaluate(Z_test, y_test)
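from sklearn.metrics import r2_score  # needed here; not imported earlier

# R^2 on the training data, then on the test data
r2_score(y_train, model.predict(Z_train)), r2_score(y_test, model.predict(Z_test))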

Predict something new:

In [None]:
new = pd.DataFrame({
    'date': [pd.Timestamp('now')],
    'temperature': [17]
})

model.predict(mapper.transform(new))[0][0]

In [None]:
plt.figure(figsize=(10, 5))
plt.plot(X_test['date'], y_test, alpha=1/2);
plt.plot(X_test['date'], model.predict(Z_test).flatten(), alpha=1/2);

round(mean_squared_error(y_test, model.predict(Z_test)) ** (1/2))

# Model 10

Use a KerasRegressor Wrapper and a SelectKBest:

In [None]:
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from tensorflow.keras.models import load_model
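from sklearn.feature_selection import SelectKBest, f_regression

Z_train = mapper.fit_transform(X_train)
Z_test = mapper.transform(X_test)

columns = 5
# score features with a regression test (the default, f_classif, targets classification)
select = SelectKBest(f_regression, k=columns)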

select.fit_transform(Z_train, y_train)

def nn():
    columns = 5
    m = Sequential()
    m.add(Input(shape=(columns,)))
    m.add(Dense(10, activation='relu'))
    m.add(Dense(10, activation='relu'))
    m.add(Dense(1))
    m.compile(
        loss='mean_squared_error',
        optimizer='adam',
        metrics=[tf.keras.metrics.RootMeanSquaredError()]
    )
    return m

model = KerasRegressor(nn, epochs=100, batch_size=32, verbose=0)

See how the pipe can work:

In [None]:
pipe = make_pipeline(mapper, select, model)
pipe.fit(X_train, y_train)

new = pd.DataFrame({
    'date': [pd.Timestamp('now')],
    'temperature': [17]
})

float(pipe.predict(new))

Serializing is going to be a bit of a headache: we need to save the Keras model first, then pickle the pipeline:

In [None]:
pipe.named_steps['kerasregressor'].model.save('model.h5')
pipe.named_steps['kerasregressor'].model = None

In [None]:
with open('pipe.pkl', 'wb') as f:
    pickle.dump(pipe, f)

Loading it back in looks like this:

In [None]:
with open('pipe.pkl', 'rb') as f:
    pipe = pickle.load(f)

pipe.named_steps['kerasregressor'].model = load_model('model.h5')

float(pipe.predict(new))

Should move the `nn` definition to utils:

In [None]:
%%writefile utils.py

import pandas as pd
from sklearn.base import TransformerMixin
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

class DateEncoder(TransformerMixin):
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return pd.concat([X.dt.month, X.dt.weekday, X.dt.hour], axis=1)
    
def nn():
    columns = 5
    m = Sequential()
    m.add(Input(shape=(columns,)))
    m.add(Dense(10, activation='relu'))
    m.add(Dense(10, activation='relu'))
    m.add(Dense(1))
    m.compile(
        loss='mean_squared_error',
        optimizer='adam',
        metrics=[tf.keras.metrics.RootMeanSquaredError()]
    )
    return m

Write the full model:

In [None]:
%%writefile model.py

import pickle
import pandas as pd
from sklearn_pandas import DataFrameMapper
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.feature_selection import SelectKBest, f_regression
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from tensorflow.keras.models import load_model

from utils import DateEncoder, nn

df = pd.read_csv('data/weather_power.csv', parse_dates=[0])

target = 'energy_demand'
y = df[target]
X = df[['date', 'temperature']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, shuffle=False)

mapper = DataFrameMapper([
    ('date', DateEncoder(), {'input_df': True}),
    (['temperature'], [SimpleImputer(), PolynomialFeatures(degree=2, include_bias=False)])
], df_out=True)

columns = 5
# score features with a regression test (the default, f_classif, targets classification)
select = SelectKBest(f_regression, k=columns)

model = KerasRegressor(nn, epochs=100, batch_size=32, verbose=0)

pipe = make_pipeline(mapper, select, model)
pipe.fit(X_train, y_train)

pipe.named_steps['kerasregressor'].model.save('model.h5')
pipe.named_steps['kerasregressor'].model = None

with open('pipe.pkl', 'wb') as f:
    pickle.dump(pipe, f)

# App 10

Add the tensorflow model and async and RequestData:

In [None]:
%%writefile app.py

import os
import pickle

from fastapi import FastAPI
import uvicorn
from typing import Dict
from pydantic import BaseModel

import pandas as pd
from tensorflow.keras.models import load_model
from utils import DateEncoder, nn

app = FastAPI()

with open('pipe.pkl', 'rb') as f:
    pipe = pickle.load(f)

pipe.named_steps['kerasregressor'].model = load_model('model.h5')

class RequestData(BaseModel):
    date: str
    temperature: float

@app.post('/')
async def index(request: RequestData): # add async and RequestData
    new = pd.DataFrame({
        'date': [pd.Timestamp(request.date)],
        'temperature': [request.temperature]
    })
    prediction = float(pipe.predict(new))
    return {'prediction': prediction}

if __name__ == '__main__':
    uvicorn.run(app)

# Deploy 05

1. Update environment:

```
pip install tensorflow
```

2. Freeze the dependencies:

```
pip freeze > requirements.txt
```

3. Retrain the model inside of the virtual environment:

```
python model.py
```

4. Make sure the app still works locally:

```
uvicorn app:app --port 5000 --reload
```

5. Push everything up to Dokku:

```
git add .
git commit -m '🚀'
git push dokku
```

6. Check logs on server (to make sure it all works)

```
dokku logs powerapp --tail
```

7. Test with the post function:

In [None]:
data = {
    "date": str(pd.Timestamp('now')),
    "temperature": 20
}

post("http://142.93.148.110", data)

And that's it!

New changes that don't add any new dependencies will just need:

```
git add .
git commit -m '🚀'
git push dokku
```