# Advertisement Model
From Raywenderlich's tutorial [Beginning Machine Learning with scikit-learn](https://www.raywenderlich.com/174-beginning-machine-learning-with-scikit-learn#toc-anchor-006)

In [70]:
import pandas as pd

# read dataset
df = pd.read_csv("Advertising.csv", usecols=[1, 2, 3, 4])
df.head()

|   | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |


### Input and Output Columns
- X = input
- y = output

In [71]:
X = df.iloc[:, :-1] # TV, radio, newspaper columns
y = df.iloc[:, -1] # Sales column
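
As a small illustration of the positional slicing above, the sketch below rebuilds the five rows shown by `df.head()` as an in-memory frame and checks that `iloc[:, :-1]` and `iloc[:, -1]` pick out the same columns as selecting them by name. The names `sample`, `X_sample`, `y_sample`, `X_by_name` and `y_by_name` are hypothetical and only used for this example.

```python
import pandas as pd

# The five rows printed by df.head() above, rebuilt by hand for illustration.
sample = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
    "radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
    "sales": [22.1, 10.4, 9.3, 18.5, 12.9],
})

X_sample = sample.iloc[:, :-1]  # every column except the last
y_sample = sample.iloc[:, -1]   # only the last column

# Selecting by name gives the same result and does not depend on column order.
X_by_name = sample[["TV", "radio", "newspaper"]]
y_by_name = sample["sales"]

print(list(X_sample.columns))
print(y_sample.name)
print(X_sample.equals(X_by_name), y_sample.equals(y_by_name))
```

Positional slicing is compact, but it silently picks the wrong columns if the file layout changes, which is why selecting by name can be the safer choice.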


## Training and Validating a Linear Regression Model


### Splitting Dataset
To train and validate a model, you need to split the data into two sets:
- **Training set**: Used to train the model. These samples are used as inputs into the machine learning algorithms.
- **Testing set**: Not yet seen by the model, this set is used to test or validate the model. Since the sales for the test set are already known and independent from the training set, the test set can be used to get a score for how well the model was trained using the training set.

##### Parameters of the [`train_test_split`](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) function
1. **X**: The inputs (spending amounts) read from the Advertising.csv sample data.
2. **y**: The output (number of sales), also from the sample data.
3. **test_size**: The fraction of the data to hold out for testing. This is typically set anywhere from 25% to 40%.
4. **random_state**: The seed for the random shuffle. If it is omitted, the function selects a different random set of rows for the train and test samples on every run. In production, this is exactly what you want, but for development and tutorials like this one, it’s important to get consistent results so you know where to look when things go wrong.

In [72]:
import sklearn.model_selection as ms

X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.25, random_state=42)
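
To see what `test_size` and `random_state` do in practice, here is a minimal sketch on synthetic data. The arrays `X_demo` and `y_demo` are made-up stand-ins for the advertising data, and the 200-row size is an assumption chosen for the example. It shows the resulting split sizes and that repeating the call with the same seed returns the same rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumed shape: 200 rows, 3 input columns).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 300, size=(200, 3))
y_demo = rng.uniform(0, 30, size=200)

# Two calls with the same random_state produce the same split.
a_train, a_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)
b_train, b_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

print(a_train.shape, a_test.shape)     # (150, 3) (50, 3)
print(np.array_equal(a_test, b_test))  # True: the split is reproducible
```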

### Steps to train most scikit-learn models
1. Create model
2. Train model using `fit` function
3. Validate model

### Creating Linear Model

In [73]:
import sklearn.linear_model as lm

regr = lm.LinearRegression()  # creating a linear regression model object
regr.fit(X_train, y_train)    # For scikit-learn models, the fit method always trains the model
regr.score(X_test, y_test)    # returns R² (coefficient of determination) on the test set; 1.0 is a perfect fit

0.8935163320163657
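
For regressors, `score` returns the coefficient of determination R². The sketch below recomputes it by hand on synthetic data to show what the number means. The data, the coefficients used to generate it, and the names `X_demo`, `y_demo`, `model` and `r2_manual` are all made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with a known linear relationship plus noise (all values assumed).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 300, size=(200, 3))
y_demo = 3.0 + 0.05 * X_demo[:, 0] + 0.2 * X_demo[:, 1] + rng.normal(0, 1.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)
model = LinearRegression().fit(X_tr, y_tr)

# R² = 1 - (residual sum of squares) / (total sum of squares)
pred = model.predict(X_te)
ss_res = np.sum((y_te - pred) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(np.isclose(r2_manual, model.score(X_te, y_te)))  # True
```

A score of about 0.89, as in the cell above, therefore means the model explains roughly 89% of the variance in the test-set sales.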

In [74]:
# use the linear regression object to predict sales for new input values
X_new = [[ 50.0, 150.0, 150.0],
         [250.0,  50.0,  50.0],
         [100.0, 125.0, 125.0]]

regr.predict(X_new)

# if you spent $50k, $150k and $150k on marketing for the three platforms, 
# you can expect sales of 34,150 units!

array([34.15367536, 23.83792444, 31.57473763])
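
A fitted linear regression predicts with `intercept_ + X · coef_`. The sketch below checks this against `predict` on synthetic data. The data and the names `X_demo`, `y_demo`, `model`, `X_new_demo` and `manual` are hypothetical; only the input rows are borrowed from `X_new` above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data (assumed, not the advertising dataset).
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 300, size=(100, 3))
y_demo = 3.0 + 0.05 * X_demo[:, 0] + 0.2 * X_demo[:, 1] + rng.normal(0, 1.0, size=100)

model = LinearRegression().fit(X_demo, y_demo)

# Two of the new input rows from the cell above.
X_new_demo = np.array([[50.0, 150.0, 150.0],
                       [250.0, 50.0, 50.0]])

# Prediction by hand: intercept plus the weighted sum of the inputs.
manual = model.intercept_ + X_new_demo @ model.coef_

print(np.allclose(manual, model.predict(X_new_demo)))  # True
```

Inspecting `regr.coef_` and `regr.intercept_` on the real model in the same way shows how much each advertising channel contributes to the predicted sales.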

## Training and Validating a Support Vector Machine Model
- An SVM (support vector machine) is a supervised machine learning model that can be used for either classification or regression. The regression variant, `LinearSVR`, is used here because sales is a continuous value.
- If you have a lot of data that needs to be classified, an SVM can help you achieve that.

For more about SVM, check out my [Day 3 note in DS2.1](https://github.com/SamuelFolledo/DS2.1-Machine-Learning/blob/master/classwork/Day%201%20-%205.ipynb)

In [75]:
import sklearn.svm as svm

svr = svm.LinearSVR(random_state=42)
svr.fit(X_train, y_train)
svr.score(X_test, y_test)



0.7789504345748774

In [76]:
svr.predict(X_new)

array([20.8913895 , 24.34919798, 21.75584162])
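
The scikit-learn documentation recommends scaling inputs for SVMs. One thing worth trying, which was not tested in the original notebook, is standardizing the inputs before fitting `LinearSVR`. The sketch below shows that pattern with a pipeline on synthetic data. The data, the `max_iter` value and the names `X_demo`, `y_demo`, `scaled_svr` and `scaled_score` are all assumptions made for this example, so it says nothing about how the advertising data would score.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic data with a known linear relationship plus noise (all values assumed).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 300, size=(200, 3))
y_demo = 3.0 + 0.05 * X_demo[:, 0] + 0.2 * X_demo[:, 1] + rng.normal(0, 1.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

# StandardScaler is fit on the training data only, then applied before the SVR.
scaled_svr = make_pipeline(StandardScaler(), LinearSVR(random_state=42, max_iter=10000))
scaled_svr.fit(X_tr, y_tr)

scaled_score = scaled_svr.score(X_te, y_te)  # R² on the held-out rows
print(scaled_score)
```

Putting the scaler inside the pipeline means `score` and `predict` apply the same scaling automatically, so new inputs such as `X_new` can be passed in their original units.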

## Converting the Model to Apple’s Core ML Format
With the model built, it’s time to export it to Core ML.

#### Parameters of the `coremltools.converters.sklearn.convert` function
1. The scikit-learn model to be converted.
2. Input and output feature names that Xcode will use to generate a Swift class interface.

In [85]:
# from sklearn.tree import DecisionTreeClassifier, export_graphviz
import coremltools

input_features = ["tv", "radio", "newspaper"]
output_feature = "sales"

# Convert and save the scikit-learn model
model = coremltools.converters.sklearn.convert(regr, input_features, output_feature)
model.save("Advertising.mlmodel")

NameError: name '_tree' is not defined

## Apple's method
From the coremltools documentation: https://apple.github.io/coremltools/generated/coremltools.converters.sklearn.convert.html

In [82]:
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import coremltools
from sklearn.linear_model import LinearRegression
# from ._converter_internal import _convert_sklearn_model

inputs = ["TV", "radio", "newspaper"]
output = "sales"

model = LinearRegression()
model.fit(df[inputs], df[output])

# tree.export_graphviz(model)

coreml_model = coremltools.converters.sklearn.convert(model, inputs, output)
coreml_model.save('Advertising.mlmodel')

NameError: name '_tree' is not defined