# Model Deployment - Traditional methods and Cloud Services

In [1]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
import pickle

# Pickling for Deployment Example

This notebook shows the basic outline for training a model, evaluating it, then using it in a "production" context to make predictions about new data.



## Extract, Transform, Load Data

This is easy here because I'm using a nice, tidy dataset that ships with sklearn.

In [2]:
# get premade wine dataset from sklearn
data = load_wine()

In [3]:
print(data.DESCR)

.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash  
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Mean     SD
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:                0

## Build a Model to Make Predictions 

In [4]:
# let's build a model to predict the class of wine
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target)
classifier = RandomForestClassifier(max_depth=2, random_state=0, n_estimators=100)
classifier.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=2, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=0, verbose=0,
                       warm_start=False)
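The `metrics` module imported at the top is not used in the cells shown here. As a minimal sketch of an evaluation step, assuming the same wine data and classifier settings as above (the `random_state=0` on the split is an addition for repeatability, not part of the original cell), held-out accuracy and a per-class report could be computed like this:

```python
from sklearn import metrics
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_wine()

# random_state on the split is an assumption added so the sketch is repeatable
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0
)
classifier = RandomForestClassifier(max_depth=2, random_state=0, n_estimators=100)
classifier.fit(X_train, y_train)

# score the model on data it did not see during training
y_pred = classifier.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.3f}")
print(metrics.classification_report(y_test, y_pred, target_names=data.target_names))
```

Scoring on `X_test` rather than `X_train` matters here because a model can look much better on the rows it was fitted to than on new data.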

## Save the Model 

The pickle format is a popular way to save a trained model in Python. Pickling is a form of serialization (sometimes called flattening): it converts an object in memory into a stream of bytes that can be stored in a file.

**First**, we open a destination file in binary write mode.

**Second**, we use the `pickle.dump` method to convert the Python object hierarchy into a byte stream and write it to that file. This process is also called serialization.

`pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)`

**Third**, we close the file.
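To make "byte stream" concrete, here is a small sketch that uses `pickle.dumps` and `pickle.loads`, the in-memory counterparts of `pickle.dump` and `pickle.load`. The `settings` dictionary is a made-up stand-in for the classifier so that the example runs without training anything.

```python
import pickle

# hypothetical stand-in for a trained model: any picklable Python object works
settings = {"max_depth": 2, "n_estimators": 100, "classes": [0, 1, 2]}

# serialize: Python object -> bytes
payload = pickle.dumps(settings)
print(type(payload), len(payload))

# deserialize: bytes -> an equal, but separate, Python object
restored = pickle.loads(payload)
print(restored == settings, restored is settings)
```

The restored object compares equal to the original but is a new object in memory, which is exactly what happens when a saved model is loaded in a different process.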


In [5]:
# "wb" means "write as bytes"; the with block closes the file automatically
with open("wine_classifier.pickle", "wb") as output_file:
    pickle.dump(classifier, output_file)

## Load the Model 

The goal is to take information that was stored in memory at one time, then save it so it can be used later. Here specifically this is useful because training a model is usually a lot slower than using the model to make a prediction, so this saves us from having to re-run that costly operation each time.

In [6]:
# "rb" means "read as bytes"; the with block closes the file automatically
with open("wine_classifier.pickle", "rb") as model_file:
    loaded_model = pickle.load(model_file)
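One quick sanity check after loading is to confirm that the restored model predicts exactly what the original did. The sketch below is self-contained and assumes the same wine data and classifier settings as above; writing to a temporary directory instead of the working directory is an illustrative choice.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

data = load_wine()
classifier = RandomForestClassifier(max_depth=2, random_state=0, n_estimators=100)
classifier.fit(data.data, data.target)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "wine_classifier.pickle")

    # save the fitted model, then read it back from disk
    with open(path, "wb") as output_file:
        pickle.dump(classifier, output_file)
    with open(path, "rb") as model_file:
        loaded_model = pickle.load(model_file)

# the loaded copy should reproduce the original predictions exactly
same_predictions = np.array_equal(
    classifier.predict(data.data), loaded_model.predict(data.data)
)
print(same_predictions)
```

Two practical notes: only unpickle files you trust, because `pickle.load` can execute arbitrary code while loading, and load a model with the same scikit-learn version that saved it where possible.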

## Make a Prediction with the Loaded Model 

In this section I'm constructing a request JSON that resembles what would come from a user who wants a predicted class of wine based on these feature values. This code would not actually exist in your deployed application; the request would be created automatically by whatever protocol generated it.

In [7]:
# make a fake request JSON from the user with all the headings
expected_features = ("Alcohol", "Malic acid", "Ash", "Alcalinity of ash",
        "Magnesium", "Total phenols", "Flavanoids", "Nonflavanoid phenols",
        "Proanthocyanins", "Color intensity", "Hue",
        "OD280/OD315 of diluted wines", "Proline")
example_values = [1.282e+01, 3.370e+00, 2.300e+00, 1.950e+01, 8.800e+01, 1.480e+00,
       6.600e-01, 4.000e-01, 9.700e-01, 1.026e+01, 7.200e-01, 1.750e+00,
       6.850e+02]

# pair each feature name with its example value
request_json = dict(zip(expected_features, example_values))
request_json

{'Alcohol': 12.82,
 'Malic acid': 3.37,
 'Ash': 2.3,
 'Alcalinity of ash': 19.5,
 'Magnesium': 88.0,
 'Total phenols': 1.48,
 'Flavanoids': 0.66,
 'Nonflavanoid phenols': 0.4,
 'Proanthocyanins': 0.97,
 'Color intensity': 10.26,
 'Hue': 0.72,
 'OD280/OD315 of diluted wines': 1.75,
 'Proline': 685.0}

This is the section that more closely resembles what you might have in your application. I'm checking to make sure that the expected values are in the request_json, transforming them into the right format to make a prediction, then printing out that prediction. In your actual deployed code, you would most likely be returning the response, not printing it.

In [8]:
if request_json and all(feature in request_json for feature in expected_features):
    # unpack all of the relevant values from the request into a list
    test_value = [request_json[feature] for feature in expected_features]
    
    # make a prediction from the "user input"
    predicted_class = int(loaded_model.predict([test_value])[0])
    
    # construct a response
    response_json = {"prediction": predicted_class}
    print(response_json)
else:
    print("something was missing from the request")

{'prediction': 2}
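In a deployed service the same logic would usually live in a function that returns the response instead of printing it. Below is a minimal sketch; `handle_request`, `StubModel`, and the two-feature list are hypothetical names invented for this illustration, with `StubModel` standing in for the unpickled classifier so the example runs on its own.

```python
class StubModel:
    """Hypothetical stand-in for the loaded classifier (always predicts class 2)."""

    def predict(self, rows):
        return [2 for _ in rows]


def handle_request(request_json, model, expected_features):
    """Validate the request, run the model, and return a response dict."""
    request_json = request_json or {}
    missing = [f for f in expected_features if f not in request_json]
    if missing:
        # report exactly which features were absent
        return {"error": "missing features", "missing": missing}

    # keep the values in the order the model was trained on
    row = [request_json[feature] for feature in expected_features]
    predicted_class = int(model.predict([row])[0])
    return {"prediction": predicted_class}


features = ("Alcohol", "Malic acid")  # shortened list, for illustration only
print(handle_request({"Alcohol": 12.82, "Malic acid": 3.37}, StubModel(), features))
print(handle_request({"Alcohol": 12.82}, StubModel(), features))
```

Returning a dictionary for both the success and the failure case lets whatever framework wraps this function turn the result into a JSON response, and listing the missing features makes a bad request easier to debug.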


## Productionizing Models as a Career Skill
1. Many data scientists don't know how to put machine learning models into production.  
2. Putting a model into production is a mandatory skill for data scientists at most small to medium-sized companies.
3. Being able to productionize models will make you a much more attractive candidate to employers, and give you a competitive advantage!

<img src="https://raw.githubusercontent.com/learn-co-students/dsc-data-science-and-machine-learning-engineering-online-ds-ft-100719/master/images/new-venn-diagram.png" width=60%>

> -  A decade ago, productionizing a machine learning model would have meant building your own web server with something like [Flask](http://flask.pocoo.org/) or [Django](https://www.djangoproject.com/) and hosting somewhere, just like you would with any web app. 
> - Now, we don't even need to worry about things like server code -- instead, we can use preexisting services from AWS that were created specifically to simplify the process of productionizing machine learning solutions!

## Cloud Computing

<img src="https://raw.githubusercontent.com/jirvingphd/fsds_pt_100719_cohort_notes/master/Instructor%20Notebooks/sect_43/image/cloud_PNG27.png" alt="image3" style = "width:400px">

"Simply put, cloud computing is the delivery of computing services—including servers, storage, databases, networking, software, analytics, and intelligence—over the Internet (“the cloud”) to offer faster innovation, flexible resources, and economies of scale. You typically pay only for cloud services you use, helping you lower your operating costs, run your infrastructure more efficiently, and scale as your business needs change." - Microsoft Azure

![microsoft example](https://miro.medium.com/max/624/1*QmV2VDvgIquNx-Daxcdddw.png)

Most cloud computing services fall into four broad categories: infrastructure as a service (IaaS), platform as a service (PaaS), serverless, and software as a service (SaaS). These are sometimes called the cloud computing "stack" because they build on top of one another. Knowing what they are and how they’re different makes it easier to accomplish your business goals.

### Cloud computing solves many problems:

- How can I keep my data secure yet accessible remotely?
- How can I pay less for software licenses?
- What if I need more server space in the future?
- I have more data to analyze than can fit on my computer. What can I do?
- My model has taken three days to run. Is there a faster way?

## AWS is the Most Popular & It is Intimidating!

If you want:
- to work from a Jupyter notebook locally
- to keep your analysis in a Jupyter notebook
- to store your work on git as well
- to not be concerned about access or keeping data private
- the easiest and fastest solution to getting your notebook in the cloud

### Focus on the last 2 questions 
- **I have more data to analyze than can fit on my computer. What can I do?**
- **My model has taken three days to run. Is there a faster way?**

So you will likely only use **S3**, **SageMaker**, and **IAM**. 


<img src="https://raw.githubusercontent.com/jirvingphd/fsds_pt_100719_cohort_notes/master/Instructor%20Notebooks/sect_43//image/aws_focus.png" alt="foci" style = "width:90%">

### Storage 

<img src="https://pngimg.com/uploads/bucket/bucket_PNG7777.png" alt="bucket" style = "text-align:left;width:200px;float:none">

#### Buckets defined

[by PC Mag](https://www.pcmag.com/encyclopedia/term/bucket)

> A customer-defined storage area in a cloud-based storage system such as Amazon's S3 or Google Storage. Each bucket can be divided into folders. Customers are not charged for the buckets themselves, only when data reside within them. See S3 cloud storage and Google Storage.

#### S3 stands for _Amazon Simple Storage Service_
Amazon uses [S3 buckets](https://aws.amazon.com/s3/) for the most general form of object storage.
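
As a rough sketch of how a pickled model could be stored in a bucket, the snippet below uses `boto3`, the AWS SDK for Python. It assumes `boto3` is installed and AWS credentials are already configured; the bucket name `my-example-bucket` and the key `models/wine_classifier.pickle` are placeholders, and `upload_model` and `s3_uri` are helper names made up for this example.

```python
def s3_uri(bucket, key):
    """Build the s3://bucket/key address for an object."""
    return f"s3://{bucket}/{key}"


def upload_model(local_path, bucket, key):
    """Upload a local file to S3 and return its s3:// address."""
    import boto3  # imported here so the helpers load even without boto3 installed

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)


# placeholder names: replace with a bucket you own before uploading
print(s3_uri("my-example-bucket", "models/wine_classifier.pickle"))
# upload_model("wine_classifier.pickle", "my-example-bucket", "models/wine_classifier.pickle")
```

The upload call is left commented out because it only succeeds against a real bucket that your credentials can write to.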


### Credentials 

![credentials](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRvaB5OvGWguYHBlVyagwofOP9kX0h5HqtbcIa02MyAVs_XS90McA&s)

#### Credentials Defined:

[From AWS](https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html)

> When you interact with AWS, you specify your AWS security credentials to verify who you are and whether you have permission to access the resources that you are requesting. AWS uses the security credentials to authenticate and authorize your requests.

>For example, if you want to download a specific file from an Amazon Simple Storage Service (Amazon S3) bucket, your credentials must allow that access. If your credentials aren't authorized to download the file, AWS denies your request.

#### Our approach to credentials:

Make everything public. <br>
But we will still have to work with **IAM** a bit to make things talk to each other. 

<img src="https://a0.awsstatic.com/libra-css/images/logos/aws_logo_smile_1200x630.png" alt="aws" style ="text-align:center;width:250px;float:none" ><br>

### Region

<img src="https://www.concurrencylabs.com/img/posts/9-choose-region-wisely/choose-your-aws-region.png" width=30%><br>

#### Regions Defined:
[from AWS documentation](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/):
>AWS has the concept of a Region, which is a physical location around the world where we cluster data centers. We call each group of logical data centers an Availability Zone. Each AWS Region consists of multiple, isolated, and physically separate AZ's within a geographic area...

>Each AZ has independent power, cooling, and physical security and is connected via redundant, ultra-low-latency networks. AWS customers focused on high availability can design their applications to run in multiple AZ's to achieve even greater fault-tolerance. AWS infrastructure Regions meet the highest levels of security, compliance, and data protection.

<img src="https://raw.githubusercontent.com/jirvingphd/fsds_pt_100719_cohort_notes/master/Instructor%20Notebooks/sect_43//image/aws_regions_facts.png" alt="aws_regions" style ="text-align:center;width:500px;float:none" >

- Each time you create a new "service" in AWS, you need to define its region.
- Each region is a separate geographic area and is completely independent
- Each Amazon region is designed to be completely isolated from the other regions & helps achieve the greatest possible fault tolerance and stability
- Communication between regions is across the public Internet and appropriate measures should be taken to protect the data using encryption
- Data transfer between regions is charged at the Internet data transfer rate for both the sending and the receiving instance
- Resources aren’t replicated across regions unless done explicitly

Here are some real factors impacted by your choice of region:

- Latency
- Cost
- Legal Compliance
- Features

## Amazon SageMaker
<img src="https://d2908q01vomqb2.cloudfront.net/77de68daecd823babbb58edb1c8e14d7106e83bb/2018/04/24/SageMaker.jpg" alt="sagemaker" style ="text-align:center;width:250px;float:none" ><br>

> ***SageMaker is a platform created by Amazon to centralize all the various services related to Data Science and Machine Learning. If you're a data scientist working on AWS, chances are that you'll be spending most (if not all) of your time in SageMaker getting things done.***


> * Amazon has centralized all of the major data science services inside **_Amazon SageMaker_**. SageMaker provides services for things such as:
>     * Data Labeling
>     * Cloud-based Notebooks
>     * Training and Model Tuning
>     * Inference
    
#### SageMaker Components
<img src="https://raw.githubusercontent.com/learn-co-students/dsc-introduction-to-aws-sagemaker-online-ds-ft-100719/master/images/use_cases.png">


## Overview of the Process 

When productionizing a machine learning model using AWS, you'll typically use the following workflow:

1. Explore and preprocess data
2. Build SageMaker container (Docker)
3. Test training and inference code on your local machine 
4. Train and deploy model with SageMaker

## Resources 
   - [Canvas Lesson: AWS Ecosystem](https://github.com/learn-co-curriculum/dsc-the-aws-ecosystem) (has account/IAM setup)
   - [Canvas Lesson: Intro to SageMaker](https://github.com/learn-co-curriculum/dsc-introduction-to-aws-sagemaker)
   - [Canvas Lesson: Productionizing Models with SageMaker](https://learn.co/tracks/module-4-data-science-career-2-1/big-data-deep-learning-and-natural-language-processing/section-43-operationalizing-code-and-aws/productionizing-a-model-with-docker-and-sagemaker)
   - **[AWS: Official SageMaker Tutorial Repo](https://github.com/aws-samples/amazon-sagemaker-keras-text-classification)**
