# ML Pipeline Preparation
Follow the instructions below to help you create your ML pipeline.
### 1. Import libraries and load data from database.
- Import Python libraries
- Load dataset from database with [`read_sql_table`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_table.html)
- Define feature and target variables X and Y

In [2]:
# import libraries
import re
import nltk
nltk.download('punkt')
nltk.download('stopwords')
import pandas as pd
import numpy as np
from sqlalchemy import create_engine, inspect
from nltk.tokenize import word_tokenize
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier, BallTree
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.preprocessing import QuantileTransformer
from sklearn.neural_network import MLPClassifier
import time
from sklearn.metrics import classification_report


[nltk_data] Downloading package punkt to /Users/akniels1/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/akniels1/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [3]:
# load data from database and get table names
engine = create_engine('sqlite:///DisasterResponse.db')
inspector = inspect(engine)
inspector.get_table_names()

['Message']

In [4]:
# read tables into a Pandas Dataframe
df = pd.read_sql_table('Message', engine)
df

Unnamed: 0,index,id,message,original,genre,related,request,offer,aid_related,medical_help,...,aid_centers,other_infrastructure,weather_related,floods,storm,fire,earthquake,cold,other_weather,direct_report
0,0,2,Weather update - a cold front from Cuba that c...,Un front froid se retrouve sur Cuba ce matin. ...,direct,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,7,Is the Hurricane over or is it not over,Cyclone nan fini osinon li pa fini,direct,1,0,0,1,0,...,0,0,1,0,1,0,0,0,0,0
2,2,8,Looking for someone but no name,"Patnm, di Maryani relem pou li banm nouvel li ...",direct,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3,9,UN reports Leogane 80-90 destroyed. Only Hospi...,UN reports Leogane 80-90 destroyed. Only Hospi...,direct,1,1,0,1,0,...,0,0,0,0,0,0,0,0,0,0
4,4,12,"says: west side of Haiti, rest of the country ...",facade ouest d Haiti et le reste du pays aujou...,direct,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26211,26381,30261,The training demonstrated how to enhance micro...,,news,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
26212,26382,30262,A suitable candidate has been selected and OCH...,,news,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
26213,26383,30263,"Proshika, operating in Cox's Bazar municipalit...",,news,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
26214,26384,30264,"Some 2,000 women protesting against the conduc...",,news,1,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0


### 2. Write a tokenization function to process your text data

In [5]:
## Function to tokenize a text 
def tokenize(text):
    text = text.lower()
    token = word_tokenize(text)
    words = [w for w in token if w not in stopwords.words('english')]
    stemmed = [PorterStemmer().stem(w) for w in words]
    return token

### 3. Build a machine learning pipeline
This machine pipeline should take in the `message` column as input and output classification results on the other 36 categories in the dataset. You may find the [MultiOutputClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html) helpful for predicting multiple target variables.

In [6]:
## Pipline to transform data First counts the words based on the tokenizer
## Second turn the count matrix into a normalized term frequency
## third the predictive Multi ouput classifier with Kmeans is applied on the data
pipeline = Pipeline([
    ('vectorizer', CountVectorizer(tokenizer=tokenize)),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(KNeighborsClassifier()))])

### 4. Train pipeline
- Split data into train and test sets
- Train pipeline

In [7]:
## create the x and y variables and Split the data into training and testing data
x = df['message']
y = df.drop(['index', 'id', 'message', 'genre', 'original'], axis=1)
x_train , x_test , y_train ,y_test  = train_test_split (x, y , test_size = .33, random_state = 45)

### 5. Test your model
Report the f1 score, precision and recall for each output category of the dataset. You can do this by iterating through the columns and calling sklearn's `classification_report` on each.

In [8]:
## fit the data to training and predict on testing
tic = time.perf_counter()
pipeline.fit(x_train, y_train)
y_pred = pipeline.predict(x_test)
toc = time.perf_counter()
print(f"fit the function in {toc - tic:0.4f} seconds")
#https://realpython.com/python-timer/

fit the function in 381.0939 seconds


In [9]:
## for loop created to test f1 score for the predicted data
for index ,column in enumerate(y_test, 0):
    f1 = classification_report(y_test[column].values, [row[index] for row in y_pred])
    print( column)
    print(f1)

related
              precision    recall  f1-score   support

           0       0.51      0.53      0.52      1993
           1       0.85      0.85      0.85      6588
           2       0.71      0.21      0.33        71

    accuracy                           0.77      8652
   macro avg       0.69      0.53      0.56      8652
weighted avg       0.77      0.77      0.77      8652

request
              precision    recall  f1-score   support

           0       0.91      0.95      0.93      7227
           1       0.69      0.51      0.59      1425

    accuracy                           0.88      8652
   macro avg       0.80      0.73      0.76      8652
weighted avg       0.87      0.88      0.87      8652

offer
              precision    recall  f1-score   support

           0       0.99      1.00      1.00      8603
           1       0.00      0.00      0.00        49

    accuracy                           0.99      8652
   macro avg       0.50      0.50      0.50      865

shops
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      8619
           1       0.00      0.00      0.00        33

    accuracy                           1.00      8652
   macro avg       0.50      0.50      0.50      8652
weighted avg       0.99      1.00      0.99      8652

aid_centers
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      8557
           1       0.07      0.02      0.03        95

    accuracy                           0.99      8652
   macro avg       0.53      0.51      0.51      8652
weighted avg       0.98      0.99      0.98      8652

other_infrastructure
              precision    recall  f1-score   support

           0       0.96      0.99      0.97      8277
           1       0.20      0.04      0.07       375

    accuracy                           0.95      8652
   macro avg       0.58      0.52      0.52      8652
weighted avg       0.93      0.95  

  _warn_prf(average, modifier, msg_start, len(result))


### 6. Improve your model
Use grid search to find better parameters. 

In [11]:
## gridsearch allows you to run a test different parameters on predictions the parameters below are tested

# new_pipeline = Pipeline([
#     ('vectorizer', CountVectorizer(tokenizer=tokenize)),
#     ('tfidf', TfidfTransformer()),
#     ])

# new_pipeline.fit(x_train, y_train)
# new_pipel = new_pipeline.transform(x_train, y_train)
parameters = {
        'vectorizer__ngram_range': ((1, 1), (1, 2)),
        'tfidf__norm' : ('l1', 'l2'),
        'clf__estimator__leaf_size': (20, 30, 50),
        'clf__n_jobs': (1, 2, 3),
    }

cv = GridSearchCV(pipeline, param_grid=parameters,  verbose = 3 )
pipeline.get_params().keys()

dict_keys(['memory', 'steps', 'verbose', 'vectorizer', 'tfidf', 'clf', 'vectorizer__analyzer', 'vectorizer__binary', 'vectorizer__decode_error', 'vectorizer__dtype', 'vectorizer__encoding', 'vectorizer__input', 'vectorizer__lowercase', 'vectorizer__max_df', 'vectorizer__max_features', 'vectorizer__min_df', 'vectorizer__ngram_range', 'vectorizer__preprocessor', 'vectorizer__stop_words', 'vectorizer__strip_accents', 'vectorizer__token_pattern', 'vectorizer__tokenizer', 'vectorizer__vocabulary', 'tfidf__norm', 'tfidf__smooth_idf', 'tfidf__sublinear_tf', 'tfidf__use_idf', 'clf__estimator__algorithm', 'clf__estimator__leaf_size', 'clf__estimator__metric', 'clf__estimator__metric_params', 'clf__estimator__n_jobs', 'clf__estimator__n_neighbors', 'clf__estimator__p', 'clf__estimator__weights', 'clf__estimator', 'clf__n_jobs'])

### 7. Test your model
Show the accuracy, precision, and recall of the tuned model.  

Since this project focuses on code quality, process, and  pipelines, there is no minimum performance metric needed to pass. However, make sure to fine tune your models for accuracy, precision and recall to make your project stand out - especially for your portfolio!

In [12]:
tic = time.perf_counter()
cv.fit(x_train, y_train)
toc = time.perf_counter()
print(f"fit the function in {toc - tic:0.4f} seconds")
#y_pred = cv.predict(x_test)

Fitting 5 folds for each of 36 candidates, totalling 180 fits
[CV] clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 1) 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 1), score=0.045, total= 4.5min
[CV] clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 1) 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  4.5min remaining:    0.0s


[CV]  clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 1), score=0.040, total= 3.3min
[CV] clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 1) 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  7.8min remaining:    0.0s


[CV]  clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 1), score=0.047, total= 2.7min
[CV] clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 1) 
[CV]  clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 1), score=0.044, total= 2.7min
[CV] clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 1) 
[CV]  clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 1), score=0.043, total= 2.8min
[CV] clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 2) 
[CV]  clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 2), score=0.021, total= 2.6min
[CV] clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 2) 
[CV]  clf__estimator__leaf_size=20, clf__n_jobs=1, tfidf__norm=l1, vectorizer__ngram_range=(1, 2), score

[CV]  clf__estimator__leaf_size=20, clf__n_jobs=2, tfidf__norm=l2, vectorizer__ngram_range=(1, 2), score=0.239, total= 2.1min
[CV] clf__estimator__leaf_size=20, clf__n_jobs=3, tfidf__norm=l1, vectorizer__ngram_range=(1, 1) 
[CV]  clf__estimator__leaf_size=20, clf__n_jobs=3, tfidf__norm=l1, vectorizer__ngram_range=(1, 1), score=0.045, total= 1.9min
[CV] clf__estimator__leaf_size=20, clf__n_jobs=3, tfidf__norm=l1, vectorizer__ngram_range=(1, 1) 
[CV]  clf__estimator__leaf_size=20, clf__n_jobs=3, tfidf__norm=l1, vectorizer__ngram_range=(1, 1), score=0.040, total= 1.9min
[CV] clf__estimator__leaf_size=20, clf__n_jobs=3, tfidf__norm=l1, vectorizer__ngram_range=(1, 1) 
[CV]  clf__estimator__leaf_size=20, clf__n_jobs=3, tfidf__norm=l1, vectorizer__ngram_range=(1, 1), score=0.047, total= 1.9min
[CV] clf__estimator__leaf_size=20, clf__n_jobs=3, tfidf__norm=l1, vectorizer__ngram_range=(1, 1) 
[CV]  clf__estimator__leaf_size=20, clf__n_jobs=3, tfidf__norm=l1, vectorizer__ngram_range=(1, 1), score

[CV]  clf__estimator__leaf_size=30, clf__n_jobs=1, tfidf__norm=l2, vectorizer__ngram_range=(1, 2), score=0.201, total=153.5min
[CV] clf__estimator__leaf_size=30, clf__n_jobs=1, tfidf__norm=l2, vectorizer__ngram_range=(1, 2) 
[CV]  clf__estimator__leaf_size=30, clf__n_jobs=1, tfidf__norm=l2, vectorizer__ngram_range=(1, 2), score=0.235, total=63.5min
[CV] clf__estimator__leaf_size=30, clf__n_jobs=1, tfidf__norm=l2, vectorizer__ngram_range=(1, 2) 
[CV]  clf__estimator__leaf_size=30, clf__n_jobs=1, tfidf__norm=l2, vectorizer__ngram_range=(1, 2), score=0.215, total=63.5min
[CV] clf__estimator__leaf_size=30, clf__n_jobs=1, tfidf__norm=l2, vectorizer__ngram_range=(1, 2) 
[CV]  clf__estimator__leaf_size=30, clf__n_jobs=1, tfidf__norm=l2, vectorizer__ngram_range=(1, 2), score=0.239, total=63.5min
[CV] clf__estimator__leaf_size=30, clf__n_jobs=2, tfidf__norm=l1, vectorizer__ngram_range=(1, 1) 
[CV]  clf__estimator__leaf_size=30, clf__n_jobs=2, tfidf__norm=l1, vectorizer__ngram_range=(1, 1), scor

[CV]  clf__estimator__leaf_size=30, clf__n_jobs=3, tfidf__norm=l2, vectorizer__ngram_range=(1, 1), score=0.225, total= 2.9min
[CV] clf__estimator__leaf_size=30, clf__n_jobs=3, tfidf__norm=l2, vectorizer__ngram_range=(1, 1) 
[CV]  clf__estimator__leaf_size=30, clf__n_jobs=3, tfidf__norm=l2, vectorizer__ngram_range=(1, 1), score=0.227, total= 3.2min
[CV] clf__estimator__leaf_size=30, clf__n_jobs=3, tfidf__norm=l2, vectorizer__ngram_range=(1, 2) 
[CV]  clf__estimator__leaf_size=30, clf__n_jobs=3, tfidf__norm=l2, vectorizer__ngram_range=(1, 2), score=0.225, total= 2.3min
[CV] clf__estimator__leaf_size=30, clf__n_jobs=3, tfidf__norm=l2, vectorizer__ngram_range=(1, 2) 
[CV]  clf__estimator__leaf_size=30, clf__n_jobs=3, tfidf__norm=l2, vectorizer__ngram_range=(1, 2), score=0.201, total= 2.1min
[CV] clf__estimator__leaf_size=30, clf__n_jobs=3, tfidf__norm=l2, vectorizer__ngram_range=(1, 2) 
[CV]  clf__estimator__leaf_size=30, clf__n_jobs=3, tfidf__norm=l2, vectorizer__ngram_range=(1, 2), score

[CV]  clf__estimator__leaf_size=50, clf__n_jobs=2, tfidf__norm=l2, vectorizer__ngram_range=(1, 1), score=0.221, total= 2.2min
[CV] clf__estimator__leaf_size=50, clf__n_jobs=2, tfidf__norm=l2, vectorizer__ngram_range=(1, 1) 
[CV]  clf__estimator__leaf_size=50, clf__n_jobs=2, tfidf__norm=l2, vectorizer__ngram_range=(1, 1), score=0.206, total= 2.1min
[CV] clf__estimator__leaf_size=50, clf__n_jobs=2, tfidf__norm=l2, vectorizer__ngram_range=(1, 1) 
[CV]  clf__estimator__leaf_size=50, clf__n_jobs=2, tfidf__norm=l2, vectorizer__ngram_range=(1, 1), score=0.230, total= 2.2min
[CV] clf__estimator__leaf_size=50, clf__n_jobs=2, tfidf__norm=l2, vectorizer__ngram_range=(1, 1) 
[CV]  clf__estimator__leaf_size=50, clf__n_jobs=2, tfidf__norm=l2, vectorizer__ngram_range=(1, 1), score=0.225, total= 2.1min
[CV] clf__estimator__leaf_size=50, clf__n_jobs=2, tfidf__norm=l2, vectorizer__ngram_range=(1, 1) 
[CV]  clf__estimator__leaf_size=50, clf__n_jobs=2, tfidf__norm=l2, vectorizer__ngram_range=(1, 1), score

[Parallel(n_jobs=1)]: Done 180 out of 180 | elapsed: 1485.0min finished


fit the function in 27910.0249 seconds


In [13]:
print("Best parameters set found on development set:")
print()
print(cv.best_params_)
print()
print("Grid scores on development set:")
print()
means = cv.cv_results_['mean_test_score']
stds = cv.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, cv.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r"
            % (mean, std * 2, params))
print()

print("Detailed classification report:")
print()
print("The model is trained on the full development set.")
print("The scores are computed on the full evaluation set.")
print()
for index ,column in enumerate(y_test, 0):
    f1 = classification_report(y_test[column].values, [row[index] for row in y_pred])
    print( column)
    print(f1)


Best parameters set found on development set:

{'clf__estimator__leaf_size': 20, 'clf__n_jobs': 1, 'tfidf__norm': 'l2', 'vectorizer__ngram_range': (1, 2)}

Grid scores on development set:

0.044 (+/-0.004) for {'clf__estimator__leaf_size': 20, 'clf__n_jobs': 1, 'tfidf__norm': 'l1', 'vectorizer__ngram_range': (1, 1)}
0.019 (+/-0.005) for {'clf__estimator__leaf_size': 20, 'clf__n_jobs': 1, 'tfidf__norm': 'l1', 'vectorizer__ngram_range': (1, 2)}
0.222 (+/-0.017) for {'clf__estimator__leaf_size': 20, 'clf__n_jobs': 1, 'tfidf__norm': 'l2', 'vectorizer__ngram_range': (1, 1)}
0.223 (+/-0.028) for {'clf__estimator__leaf_size': 20, 'clf__n_jobs': 1, 'tfidf__norm': 'l2', 'vectorizer__ngram_range': (1, 2)}
0.044 (+/-0.004) for {'clf__estimator__leaf_size': 20, 'clf__n_jobs': 2, 'tfidf__norm': 'l1', 'vectorizer__ngram_range': (1, 1)}
0.019 (+/-0.005) for {'clf__estimator__leaf_size': 20, 'clf__n_jobs': 2, 'tfidf__norm': 'l1', 'vectorizer__ngram_range': (1, 2)}
0.222 (+/-0.017) for {'clf__estimator

money
              precision    recall  f1-score   support

           0       0.98      1.00      0.99      8451
           1       0.63      0.08      0.15       201

    accuracy                           0.98      8652
   macro avg       0.80      0.54      0.57      8652
weighted avg       0.97      0.98      0.97      8652

missing_people
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      8551
           1       0.57      0.04      0.07       101

    accuracy                           0.99      8652
   macro avg       0.78      0.52      0.53      8652
weighted avg       0.98      0.99      0.98      8652

refugees
              precision    recall  f1-score   support

           0       0.97      0.97      0.97      8360
           1       0.12      0.11      0.11       292

    accuracy                           0.94      8652
   macro avg       0.55      0.54      0.54      8652
weighted avg       0.94      0.94      0.94 

  _warn_prf(average, modifier, msg_start, len(result))


fire
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      8567
           1       0.00      0.00      0.00        85

    accuracy                           0.99      8652
   macro avg       0.50      0.50      0.50      8652
weighted avg       0.98      0.99      0.99      8652

earthquake
              precision    recall  f1-score   support

           0       0.95      0.97      0.96      7853
           1       0.59      0.45      0.51       799

    accuracy                           0.92      8652
   macro avg       0.77      0.71      0.73      8652
weighted avg       0.91      0.92      0.92      8652

cold
              precision    recall  f1-score   support

           0       0.98      1.00      0.99      8476
           1       0.44      0.08      0.13       176

    accuracy                           0.98      8652
   macro avg       0.71      0.54      0.56      8652
weighted avg       0.97      0.98      0.97      8652

### 8. Try improving your model further. Here are a few ideas:
* try other machine learning algorithms
* add other features besides the TF-IDF

In [18]:
## New algorithm is tested ()
pipeline2 = Pipeline([
 ('vectorizer', HashingVectorizer(tokenizer=tokenize)),
#     ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(MLPClassifier(hidden_layer_sizes = 20, verbose = True,
                                               n_iter_no_change=3)))
])

pipeline2.get_params().keys()


dict_keys(['memory', 'steps', 'verbose', 'vectorizer', 'clf', 'vectorizer__alternate_sign', 'vectorizer__analyzer', 'vectorizer__binary', 'vectorizer__decode_error', 'vectorizer__dtype', 'vectorizer__encoding', 'vectorizer__input', 'vectorizer__lowercase', 'vectorizer__n_features', 'vectorizer__ngram_range', 'vectorizer__norm', 'vectorizer__preprocessor', 'vectorizer__stop_words', 'vectorizer__strip_accents', 'vectorizer__token_pattern', 'vectorizer__tokenizer', 'clf__estimator__activation', 'clf__estimator__alpha', 'clf__estimator__batch_size', 'clf__estimator__beta_1', 'clf__estimator__beta_2', 'clf__estimator__early_stopping', 'clf__estimator__epsilon', 'clf__estimator__hidden_layer_sizes', 'clf__estimator__learning_rate', 'clf__estimator__learning_rate_init', 'clf__estimator__max_fun', 'clf__estimator__max_iter', 'clf__estimator__momentum', 'clf__estimator__n_iter_no_change', 'clf__estimator__nesterovs_momentum', 'clf__estimator__power_t', 'clf__estimator__random_state', 'clf__esti

In [19]:
tic = time.perf_counter()
pipeline2.fit(x_train, y_train)
toc = time.perf_counter()
print(f"fit the function in {toc - tic:0.4f} seconds")



Iteration 1, loss = 0.74972201
Iteration 2, loss = 0.53041847
Iteration 3, loss = 0.45341016
Iteration 4, loss = 0.39324252
Iteration 5, loss = 0.34161767
Iteration 6, loss = 0.30146988
Iteration 7, loss = 0.26712302
Iteration 8, loss = 0.23813985
Iteration 9, loss = 0.21402829
Iteration 10, loss = 0.19413748
Iteration 11, loss = 0.17720734
Iteration 12, loss = 0.16234647
Iteration 13, loss = 0.14945281
Iteration 14, loss = 0.13820501
Iteration 15, loss = 0.12840909
Iteration 16, loss = 0.12005774
Iteration 17, loss = 0.11206944
Iteration 18, loss = 0.10524480
Iteration 19, loss = 0.09909823
Iteration 20, loss = 0.09384759
Iteration 21, loss = 0.08844312
Iteration 22, loss = 0.08392254
Iteration 23, loss = 0.07986421
Iteration 24, loss = 0.07610122
Iteration 25, loss = 0.07237545
Iteration 26, loss = 0.06943081
Iteration 27, loss = 0.06637266
Iteration 28, loss = 0.06355163
Iteration 29, loss = 0.06105759
Iteration 30, loss = 0.05877114
Iteration 31, loss = 0.05651297
Iteration 32, los



KeyboardInterrupt: 

In [None]:
for index , column in enumerate(y_test, 0):
    f1 = classification_report(y_test[column].values, [row[index] for row in y_pred2])
    print(column)
    print(f1)

### 9. Export your model as a pickle file

In [20]:
 import pickle

#
# Create your model here (same as above)
#

# Save to file in the current working directory
pkl_filename = "pickle_model.pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(pipeline, file)

# Load from file
with open(pkl_filename, 'rb') as file:
    pickle_model = pickle.load(file)
    
##https://stackabuse.com/scikit-learn-save-and-restore-models/

### 10. Use this notebook to complete `train.py`
Use the template file attached in the Resources folder to write a script that runs the steps above to create a database and export a model based on a new dataset specified by the user.