# ML Pipeline Preparation
Follow the instructions below to help you create your ML pipeline.
### 1. Import libraries and load data from database.
- Import Python libraries
- Load dataset from database with [`read_sql_table`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_table.html)
- Define feature and target variables X and Y

In [1]:
# Import libraries
import sys
import pandas as pd
from sqlalchemy import create_engine
import re
import pickle
import nltk

from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

In [3]:
# load data from database
engine = create_engine('sqlite:///Messages.db')
df = pd.read_sql("SELECT * FROM Messages", engine)

X = df['message']
Y = df.drop(['index', 'message', 'original', 'genre'], axis = 1)

cat_names = df.drop(["index", "message", "original", "genre"], axis=1).columns

df

Unnamed: 0,index,message,original,genre,related,request,offer,aid_related,medical_help,medical_products,...,aid_centers,other_infrastructure,weather_related,floods,storm,fire,earthquake,cold,other_weather,direct_report
0,0,Weather update - a cold front from Cuba that c...,Un front froid se retrouve sur Cuba ce matin. ...,direct,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,Is the Hurricane over or is it not over,Cyclone nan fini osinon li pa fini,direct,1,0,0,1,0,0,...,0,0,1,0,1,0,0,0,0,0
2,2,Looking for someone but no name,"Patnm, di Maryani relem pou li banm nouvel li ...",direct,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3,UN reports Leogane 80-90 destroyed. Only Hospi...,UN reports Leogane 80-90 destroyed. Only Hospi...,direct,1,1,0,1,0,1,...,0,0,0,0,0,0,0,0,0,0
4,4,"says: west side of Haiti, rest of the country ...",facade ouest d Haiti et le reste du pays aujou...,direct,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26381,26381,The training demonstrated how to enhance micro...,,news,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
26382,26382,A suitable candidate has been selected and OCH...,,news,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
26383,26383,"Proshika, operating in Cox's Bazar municipalit...",,news,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
26384,26384,"Some 2,000 women protesting against the conduc...",,news,1,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
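The step instructions mention `read_sql_table`, while the cell above uses `read_sql` with an explicit query. The following is a minimal sketch of the `read_sql_table` variant. It builds a throwaway in-memory SQLite database, so the table contents and all `demo` names are illustrative assumptions, not the project's data.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical stand-in for Messages.db: an in-memory SQLite database.
demo_engine = create_engine("sqlite://")

demo_df = pd.DataFrame({
    "message": ["we need water", "storm is coming"],
    "genre": ["direct", "news"],
    "related": [1, 1],
    "water": [1, 0],
})
demo_df.to_sql("Messages", demo_engine, index=False)

# read_sql_table loads a whole table by name, without writing SQL.
loaded = pd.read_sql_table("Messages", demo_engine)

# Same kind of split as in the notebook: text as X, label columns as Y.
X_demo = loaded["message"]
Y_demo = loaded.drop(["message", "genre"], axis=1)
print(list(Y_demo.columns))
```

Both functions return a DataFrame; `read_sql_table` is simply the shorter form when the whole table is wanted.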


### 2. Write a tokenization function to process your text data

In [4]:
def tokenize(text):
    """
    Convert a disaster response message into a list of clean tokens.

    Input:
    Disaster response messages

    Output:
    Tokens to be used as inputs for the model
    """

    # Raw string so that the backslashes in the pattern are not treated as string escapes
    url_regex = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
    detected_urls = re.findall(url_regex, text)
    for url in detected_urls:
        text = text.replace(url, "urlplaceholder")

    tokens = word_tokenize(text)
    lemmatizer = WordNetLemmatizer()

    clean_tokens = []
    for tok in tokens:
        clean_tok = lemmatizer.lemmatize(tok).lower().strip()
        clean_tokens.append(clean_tok)

    return clean_tokens
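
To see the URL-replacement step of `tokenize` in isolation, here is a small sketch that uses only the standard `re` module. The helper name `replace_urls` and the sample message are assumptions made for illustration; the pattern is the same one used above.

```python
import re

# Same pattern as in tokenize(), written as a raw string.
URL_REGEX = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'


def replace_urls(text, placeholder="urlplaceholder"):
    """Replace every URL found in text with a fixed placeholder token."""
    for url in re.findall(URL_REGEX, text):
        text = text.replace(url, placeholder)
    return text


sample = "Water needed, see http://example.org/help for details"
cleaned = replace_urls(sample)
print(cleaned)
```

Replacing URLs with one shared token keeps the vocabulary small, since every distinct link would otherwise become its own feature.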

### 3. Build a machine learning pipeline
- You'll find the [MultiOutputClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html) helpful for predicting multiple target variables.

In [5]:
pipeline = Pipeline([
    ('vect', CountVectorizer(tokenizer = tokenize)),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier()))
])
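
To illustrate what this pipeline produces, the sketch below fits the same three steps on a handful of made-up messages with two made-up label columns. It uses the default tokenizer instead of `tokenize` so that it runs without NLTK data; the toy messages, labels and `toy_` names are assumptions.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# Hypothetical messages; label columns stand for two categories (water, storm).
toy_messages = [
    "we need water",
    "no drinking water left",
    "storm is coming tonight",
    "heavy storm destroyed houses",
    "storm cut off our water supply",
    "thank you for the update",
]
toy_labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]])

toy_pipeline = Pipeline([
    ('vect', CountVectorizer()),        # raw text -> token counts
    ('tfidf', TfidfTransformer()),      # counts -> TF-IDF weights
    ('clf', MultiOutputClassifier(      # one classifier per label column
        RandomForestClassifier(n_estimators=10, random_state=0))),
])
toy_pipeline.fit(toy_messages, toy_labels)

toy_pred = toy_pipeline.predict(["please send water", "a storm is near"])
print(toy_pred.shape)  # one row per message, one column per category
```

`MultiOutputClassifier` fits an independent copy of the estimator for each target column, which is why the prediction has one column per category.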

### 4. Train pipeline
- Split data into train and test sets
- Train pipeline

In [6]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

pipeline.fit(X_train, Y_train)

Pipeline(steps=[('vect',
                 CountVectorizer(tokenizer=<function tokenize at 0x0000017AE699E940>)),
                ('tfidf', TfidfTransformer()),
                ('clf',
                 MultiOutputClassifier(estimator=RandomForestClassifier()))])

### 5. Test your model
Report the f1 score, precision and recall on both the training set and the test set. You can use sklearn's `classification_report` function here. 

In [9]:
# Evaluation on the training set
Y_train_pred = pipeline.predict(X_train)

evaluation_metrics = []

# Calculate evaluation metrics for each category (one column of Y per category)
for i in range(len(cat_names)):
    accuracy = accuracy_score(Y_train.iloc[:, i].values, Y_train_pred[:, i])
    f1 = f1_score(Y_train.iloc[:, i].values, Y_train_pred[:, i], average='weighted')
    precision = precision_score(Y_train.iloc[:, i].values, Y_train_pred[:, i], average='weighted')
    recall = recall_score(Y_train.iloc[:, i].values, Y_train_pred[:, i], average='weighted')

    evaluation_metrics.append([accuracy, f1, precision, recall])

cols = ['Accuracy', 'F1', 'Precision', 'Recall']
evaluation_metrics_df = pd.DataFrame(data=evaluation_metrics, index=cat_names, columns=cols)

evaluation_metrics_df

| Category | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| related | 0.998673 | 0.998673 | 0.998673 | 0.998673 |
| request | 0.999526 | 0.999526 | 0.999526 | 0.999526 |
| offer | 0.999953 | 0.999953 | 0.999953 | 0.999953 |
| aid_related | 0.999431 | 0.999431 | 0.999432 | 0.999431 |
| medical_help | 0.999621 | 0.999621 | 0.999621 | 0.999621 |
| medical_products | 0.999763 | 0.999763 | 0.999763 | 0.999763 |
| search_and_rescue | 0.999905 | 0.999905 | 0.999905 | 0.999905 |
| security | 0.99981 | 0.99981 | 0.999811 | 0.99981 |
| military | 0.999668 | 0.999669 | 0.99967 | 0.999668 |
| child_alone | 1.0 | 1.0 | 1.0 | 1.0 |


In [10]:
# Evaluation on the test set
Y_pred = pipeline.predict(X_test)

evaluation_metrics2 = []

# Calculate evaluation metrics for each category (one column of Y per category)
for i in range(len(cat_names)):
    accuracy = accuracy_score(Y_test.iloc[:, i].values, Y_pred[:, i])
    f1 = f1_score(Y_test.iloc[:, i].values, Y_pred[:, i], average='weighted')
    precision = precision_score(Y_test.iloc[:, i].values, Y_pred[:, i], average='weighted')
    recall = recall_score(Y_test.iloc[:, i].values, Y_pred[:, i], average='weighted')

    evaluation_metrics2.append([accuracy, f1, precision, recall])

evaluation_metrics_df2 = pd.DataFrame(data=evaluation_metrics2, index=cat_names, columns=cols)

evaluation_metrics_df2



| Category | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| related | 0.807124 | 0.771568 | 0.796149 | 0.807124 |
| request | 0.890299 | 0.874732 | 0.888271 | 0.890299 |
| offer | 0.995642 | 0.993468 | 0.991304 | 0.995642 |
| aid_related | 0.766957 | 0.760271 | 0.77228 | 0.766957 |
| medical_help | 0.920993 | 0.889238 | 0.892263 | 0.920993 |
| medical_products | 0.954149 | 0.935255 | 0.945665 | 0.954149 |
| search_and_rescue | 0.972149 | 0.960006 | 0.961159 | 0.972149 |
| security | 0.980485 | 0.971011 | 0.961719 | 0.980485 |
| military | 0.969307 | 0.956572 | 0.95988 | 0.969307 |
| child_alone | 1.0 | 1.0 | 1.0 | 1.0 |


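On the test set, accuracy, F1, precision and recall are high for most of the categories shown, ranging from roughly 0.76 (aid_related) to 1.0 (child_alone). Two caveats apply. First, every training-set score is above 0.998, so the gap to the test scores points to overfitting. Second, these are weighted averages over heavily imbalanced labels, so a category such as offer can score above 0.99 even if the model almost never predicts the positive class.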
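
The instructions for this step mention `classification_report`, which breaks the scores down per class instead of reporting a single weighted average. The sketch below applies it column by column to small made-up label arrays; the category names and the arrays are assumptions, not results from this notebook.

```python
import numpy as np
from sklearn.metrics import classification_report

# Hypothetical category names and toy labels, one column per category.
toy_categories = ["water", "storm"]
y_true = np.array([[1, 0], [1, 0], [0, 1], [1, 1]])
y_pred = np.array([[1, 0], [0, 0], [0, 1], [1, 0]])

reports = {}
for i, name in enumerate(toy_categories):
    # output_dict=True returns nested dicts keyed by class label;
    # zero_division=0 avoids warnings when a class is never predicted.
    reports[name] = classification_report(
        y_true[:, i], y_pred[:, i], output_dict=True, zero_division=0
    )
    print(name)
    print(classification_report(y_true[:, i], y_pred[:, i], zero_division=0))
```

Looking at the row for class `1` in each report shows how well the positive (minority) class is handled, which a weighted average can hide.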

### 6. Improve your model
Use grid search to find better parameters. 

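Listing the pipeline's parameter names shows which hyperparameters can be tuned and how to address them in the grid (`step__parameter`, or `clf__estimator__parameter` for the wrapped classifier).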

In [11]:
pipeline.get_params().keys()

dict_keys(['memory', 'steps', 'verbose', 'vect', 'tfidf', 'clf', 'vect__analyzer', 'vect__binary', 'vect__decode_error', 'vect__dtype', 'vect__encoding', 'vect__input', 'vect__lowercase', 'vect__max_df', 'vect__max_features', 'vect__min_df', 'vect__ngram_range', 'vect__preprocessor', 'vect__stop_words', 'vect__strip_accents', 'vect__token_pattern', 'vect__tokenizer', 'vect__vocabulary', 'tfidf__norm', 'tfidf__smooth_idf', 'tfidf__sublinear_tf', 'tfidf__use_idf', 'clf__estimator__bootstrap', 'clf__estimator__ccp_alpha', 'clf__estimator__class_weight', 'clf__estimator__criterion', 'clf__estimator__max_depth', 'clf__estimator__max_features', 'clf__estimator__max_leaf_nodes', 'clf__estimator__max_samples', 'clf__estimator__min_impurity_decrease', 'clf__estimator__min_impurity_split', 'clf__estimator__min_samples_leaf', 'clf__estimator__min_samples_split', 'clf__estimator__min_weight_fraction_leaf', 'clf__estimator__n_estimators', 'clf__estimator__n_jobs', 'clf__estimator__oob_score', 'clf_

I include `'clf__estimator__class_weight': ['balanced']` in the grid to counter the class imbalance in the data. With this setting each class is weighted inversely to its frequency, so the rare positive labels carry more weight during training.
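
As a worked example of the "balanced" heuristic, scikit-learn computes each class weight as `n_samples / (n_classes * count_of_class)`. The sketch below checks this on a made-up label vector with eight negatives and two positives; the data is an assumption for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced label column: 8 negatives, 2 positives.
y_toy = np.array([0] * 8 + [1] * 2)

# Weights as scikit-learn computes them for class_weight='balanced'.
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_toy)

# The same formula written out by hand: n_samples / (n_classes * bincount).
manual = len(y_toy) / (2 * np.bincount(y_toy))

print(weights)  # weight for class 0, weight for class 1
print(manual)
```

Here the rare class gets weight 2.5 and the frequent class 0.625, so each class contributes equally in total.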

In [12]:
# Create grid search object
parameters = {
    'vect__min_df': [1, 5, 10],
    'clf__estimator__n_estimators': [5, 10, 100],
    'clf__estimator__min_samples_split': [2, 5, 10],
    'clf__estimator__class_weight': ['balanced'],
}

cv = GridSearchCV(pipeline, param_grid=parameters, verbose=15)

# Find best parameters
improved_model = cv.fit(X_train, Y_train)

Fitting 5 folds for each of 27 candidates, totalling 135 fits
[CV 1/5; 1/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, vect__min_df=1
[CV 1/5; 1/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, vect__min_df=1; total time=  28.1s
[CV 2/5; 1/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, vect__min_df=1
[CV 2/5; 1/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, vect__min_df=1; total time=  27.9s
[CV 3/5; 1/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, vect__min_df=1
[CV 3/5; 1/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, vect__min_df=1; total time=  25.4s
[CV 4/5; 1/27] START

[CV 2/5; 6/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, vect__min_df=10; total time=  31.9s
[CV 3/5; 6/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, vect__min_df=10
[CV 3/5; 6/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, vect__min_df=10; total time=  32.1s
[CV 4/5; 6/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, vect__min_df=10
[CV 4/5; 6/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, vect__min_df=10; total time=  33.0s
[CV 5/5; 6/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=10, vect__min_df=10
[CV 5/5; 6/27] END clf__estimator__class_weight=balanced, clf__estimat

[CV 4/5; 11/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, vect__min_df=5; total time=  17.9s
[CV 5/5; 11/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, vect__min_df=5
[CV 5/5; 11/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, vect__min_df=5; total time=  17.8s
[CV 1/5; 12/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, vect__min_df=10
[CV 1/5; 12/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, vect__min_df=10; total time=  18.5s
[CV 2/5; 12/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, vect__min_df=10
[CV 2/5; 12/27] END clf__estimator__class_weight=balanced, clf__estimator

[CV 1/5; 17/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, vect__min_df=5; total time= 3.8min
[CV 2/5; 17/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, vect__min_df=5
[CV 2/5; 17/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, vect__min_df=5; total time= 3.7min
[CV 3/5; 17/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, vect__min_df=5
[CV 3/5; 17/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, vect__min_df=5; total time= 3.7min
[CV 4/5; 17/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, vect__min_df=5
[CV 4/5; 17/27] END clf__estimator__class_weight=balanced, clf__

[CV 3/5; 22/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=10, vect__min_df=1; total time=  29.4s
[CV 4/5; 22/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=10, vect__min_df=1
[CV 4/5; 22/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=10, vect__min_df=1; total time=  29.9s
[CV 5/5; 22/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=10, vect__min_df=1
[CV 5/5; 22/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=10, vect__min_df=1; total time=  29.9s
[CV 1/5; 23/27] START clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=10, vect__min_df=5
[CV 1/5; 23/27] END clf__estimator__class_weight=balanced, clf__

[CV 5/5; 27/27] END clf__estimator__class_weight=balanced, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100, vect__min_df=10; total time= 3.4min
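
The first line of the log reports 27 candidates and 135 fits. That arithmetic can be reproduced with `ParameterGrid`, which enumerates the same combinations that `GridSearchCV` tries. The fold count of 5 is taken from the log line "Fitting 5 folds".

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the cell above.
parameters = {
    'vect__min_df': [1, 5, 10],
    'clf__estimator__n_estimators': [5, 10, 100],
    'clf__estimator__min_samples_split': [2, 5, 10],
    'clf__estimator__class_weight': ['balanced'],
}

n_candidates = len(ParameterGrid(parameters))  # 3 * 3 * 3 * 1 combinations
n_folds = 5                                    # as reported in the log
n_fits = n_candidates * n_folds

print(n_candidates, n_fits)
```

Counting the fits before starting a search is a quick way to estimate the run time, since each extra value in any list multiplies the total.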


In [13]:
# Get results of grid search
improved_model.cv_results_

{'mean_fit_time': array([ 25.55825329,  19.38959546,  18.45180459,  42.18878241,
         30.55902081,  30.31473575, 383.92760444, 269.53713989,
        248.64659367,  19.48001733,  16.11916809,  16.4491293 ,
         32.12345943,  26.23083386,  26.31637487, 281.46418147,
        213.23352289, 208.71761217,  17.06936746,  15.19377394,
         15.10148935,  27.68374815,  24.17402663,  24.18325038,
        224.31572628, 185.46194553, 193.81519351]),
 'std_fit_time': array([ 1.01162297,  0.88821016,  0.39889281,  0.68485288,  0.39689865,
         0.40929538, 14.46293815, 22.35830918,  0.86816545,  0.39999492,
         0.10514091,  0.4242119 ,  0.42307734,  0.54156135,  0.26795766,
        12.66799007,  3.67573757,  0.93714545,  0.41340489,  0.4965146 ,
         0.16819567,  0.16236872,  0.69967123,  0.17771626, 13.96082702,
         2.76585962,  6.03232954]),
 'mean_score_time': array([1.97560911, 1.83447833, 1.75222001, 2.26797581, 1.96205149,
        2.07253504, 9.08536949, 7.27058592,

In [14]:
improved_model.best_params_

{'clf__estimator__class_weight': 'balanced',
 'clf__estimator__min_samples_split': 10,
 'clf__estimator__n_estimators': 100,
 'vect__min_df': 10}
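
The raw `cv_results_` dictionary is hard to read. One option is to load it into a DataFrame and sort by rank, as in this sketch. It runs a tiny grid search on synthetic data from `make_classification`; the data, the grid and the `toy_` names are assumptions and unrelated to the results above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data so that the example runs in a few seconds.
X_toy, y_toy = make_classification(n_samples=120, n_features=8, random_state=0)

toy_grid = {"n_estimators": [5, 10], "min_samples_split": [2, 5]}
toy_search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid=toy_grid, cv=3)
toy_search.fit(X_toy, y_toy)

# One row per candidate, best-ranked candidates first.
results = (
    pd.DataFrame(toy_search.cv_results_)
    [["params", "mean_test_score", "std_test_score", "rank_test_score"]]
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)
print(results)
```

The same lines should work with `improved_model.cv_results_`, where comparing `mean_test_score` against `mean_fit_time` would show whether the slow 100-tree candidates are worth their cost.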

### 7. Test your model
Show the accuracy, precision, and recall of the tuned model.

In [15]:
# Evaluation of the improved model on the test set
Y_pred_improved_model = improved_model.predict(X_test)

evaluation_metrics3 = []

# Calculate evaluation metrics for each category (one column of Y per category)
for i in range(len(cat_names)):
    accuracy = accuracy_score(Y_test.iloc[:, i].values, Y_pred_improved_model[:, i])
    f1 = f1_score(Y_test.iloc[:, i].values, Y_pred_improved_model[:, i], average='weighted')
    precision = precision_score(Y_test.iloc[:, i].values, Y_pred_improved_model[:, i], average='weighted')
    recall = recall_score(Y_test.iloc[:, i].values, Y_pred_improved_model[:, i], average='weighted')

    evaluation_metrics3.append([accuracy, f1, precision, recall])

evaluation_metrics_df3 = pd.DataFrame(data=evaluation_metrics3, index=cat_names, columns=cols)

evaluation_metrics_df3



| Category | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| related | 0.812997 | 0.797293 | 0.797635 | 0.812997 |
| request | 0.902236 | 0.897904 | 0.897075 | 0.902236 |
| offer | 0.995263 | 0.993279 | 0.991302 | 0.995263 |
| aid_related | 0.779272 | 0.777409 | 0.778471 | 0.779272 |
| medical_help | 0.929898 | 0.916198 | 0.916006 | 0.929898 |
| medical_products | 0.959454 | 0.949886 | 0.951929 | 0.959454 |
| search_and_rescue | 0.972527 | 0.961188 | 0.96337 | 0.972527 |
| security | 0.980485 | 0.971011 | 0.961719 | 0.980485 |
| military | 0.971201 | 0.962115 | 0.964338 | 0.971201 |
| child_alone | 1.0 | 1.0 | 1.0 | 1.0 |


### 8. Try improving your model further. Here are a few ideas:
* try other machine learning algorithms
* add other features besides the TF-IDF
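
As one way to add a feature besides TF-IDF, the sketch below combines TF-IDF with the message length using `FeatureUnion`. The `TextLengthExtractor` class is a hypothetical transformer written for this example, and `TfidfVectorizer` is used here as a shorthand for the `CountVectorizer` plus `TfidfTransformer` pair from the pipeline above.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion


class TextLengthExtractor(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: one numeric feature, the character count."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return np.array([[len(text)] for text in X], dtype=float)


# FeatureUnion runs both transformers and stacks their outputs column-wise.
features = FeatureUnion([
    ("tfidf", TfidfVectorizer()),
    ("length", TextLengthExtractor()),
])

toy_messages = ["we need water", "storm is coming tonight"]
combined = features.fit_transform(toy_messages)

print(combined.shape)              # vocabulary columns plus one length column
print(combined.toarray()[:, -1])   # the appended length feature
```

In a full pipeline, `features` would take the place of the `vect` and `tfidf` steps, followed by the `clf` step as before.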

In [16]:
# Using k-nearest neighbors (KNeighborsClassifier) instead of Random Forest Classifier
pipeline2 = Pipeline([
    ('vect', CountVectorizer(tokenizer = tokenize)),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(KNeighborsClassifier()))
])
                 
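# Note: in the log below, every fit with p=5 ends in a traceback after a few seconds,
# presumably because this Minkowski power is not supported for the sparse TF-IDF input.
parameters2 = {'clf__estimator__n_neighbors': [1, 5], 'clf__estimator__leaf_size': [2, 5], 'clf__estimator__p': [1, 5]}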

cv2 = GridSearchCV(pipeline2, param_grid = parameters2, verbose = 15)

# Find best parameters
improved_model2 = cv2.fit(X_train, Y_train)

Fitting 5 folds for each of 8 candidates, totalling 40 fits
[CV 1/5; 1/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=1
[CV 1/5; 1/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=1; total time= 2.6min
[CV 2/5; 1/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=1
[CV 2/5; 1/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=1; total time= 2.6min
[CV 3/5; 1/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=1
[CV 3/5; 1/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=1; total time= 2.6min
[CV 4/5; 1/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=1
[CV 4/5; 1/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=1; total time= 2.6min
[CV 5/5; 1/8] START clf__estimator__leaf_size=2, clf__estimator__n_n

Traceback (most recent call last):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 368, in fit
    super().fit(X, Y, sample_weight, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 178, in fit
    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 784, in _d

[CV 1/5; 2/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=5; total time=   6.5s
[CV 2/5; 2/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=5


[CV 2/5; 2/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=5; total time=   6.1s
[CV 3/5; 2/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=5


[CV 3/5; 2/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=5; total time=   5.8s
[CV 4/5; 2/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=5


[CV 4/5; 2/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=5; total time=   6.3s
[CV 5/5; 2/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=5


[CV 5/5; 2/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=1, clf__estimator__p=5; total time=   6.1s
[CV 1/5; 3/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=1
[CV 1/5; 3/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=1; total time= 2.9min
[CV 2/5; 3/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=1
[CV 2/5; 3/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=1; total time= 2.9min
[CV 3/5; 3/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=1
[CV 3/5; 3/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=1; total time= 2.8min
[CV 4/5; 3/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=1
[CV 4/5; 3/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=1; total time= 2.9min
[CV 5/5; 3

[CV 1/5; 4/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=5; total time=   5.6s
[CV 2/5; 4/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=5


[CV 2/5; 4/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=5; total time=   5.8s
[CV 3/5; 4/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=5


[CV 3/5; 4/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=5; total time=   6.1s
[CV 4/5; 4/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=5


[CV 4/5; 4/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=5; total time=   5.7s
[CV 5/5; 4/8] START clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=5


[CV 5/5; 4/8] END clf__estimator__leaf_size=2, clf__estimator__n_neighbors=5, clf__estimator__p=5; total time=   6.4s
[CV 1/5; 5/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=1
[CV 1/5; 5/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=1; total time=642.6min
[CV 2/5; 5/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=1
[CV 2/5; 5/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=1; total time= 3.0min
[CV 3/5; 5/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=1
[CV 3/5; 5/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=1; total time= 3.1min
[CV 4/5; 5/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=1
[CV 4/5; 5/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=1; total time= 3.4min
[CV 5/5; 

Traceback (most recent call last):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 368, in fit
    super().fit(X, Y, sample_weight, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 178, in fit
    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 784, in _d

[CV 1/5; 6/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=5; total time=   7.8s
[CV 2/5; 6/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=5


Traceback (most recent call last):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 368, in fit
    super().fit(X, Y, sample_weight, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 178, in fit
    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 784, in _d

[CV 2/5; 6/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=5; total time=   7.3s
[CV 3/5; 6/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=5


Traceback (most recent call last):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 368, in fit
    super().fit(X, Y, sample_weight, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 178, in fit
    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 784, in _d

[CV 3/5; 6/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=5; total time=   6.9s
[CV 4/5; 6/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=5


Traceback (most recent call last):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 368, in fit
    super().fit(X, Y, sample_weight, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 178, in fit
    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 784, in _d

[CV 4/5; 6/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=5; total time=   8.4s
[CV 5/5; 6/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=5


Traceback (most recent call last):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 368, in fit
    super().fit(X, Y, sample_weight, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 178, in fit
    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 784, in _d

[CV 5/5; 6/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=1, clf__estimator__p=5; total time=  10.2s
[CV 1/5; 7/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=1
[CV 1/5; 7/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=1; total time= 3.2min
[CV 2/5; 7/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=1
[CV 2/5; 7/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=1; total time= 3.7min
[CV 3/5; 7/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=1
[CV 3/5; 7/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=1; total time= 3.3min
[CV 4/5; 7/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=1
[CV 4/5; 7/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=1; total time= 3.7min
[CV 5/5; 7

Traceback (most recent call last):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 368, in fit
    super().fit(X, Y, sample_weight, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 178, in fit
    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 784, in _d

[CV 1/5; 8/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=5; total time=   7.4s
[CV 2/5; 8/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=5


Traceback (most recent call last):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 368, in fit
    super().fit(X, Y, sample_weight, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 178, in fit
    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 784, in _d

[CV 2/5; 8/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=5; total time=   6.4s
[CV 3/5; 8/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=5


Traceback (most recent call last):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 368, in fit
    super().fit(X, Y, sample_weight, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 178, in fit
    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 784, in _d

[CV 3/5; 8/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=5; total time=   6.6s
[CV 4/5; 8/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=5


Traceback (most recent call last):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 368, in fit
    super().fit(X, Y, sample_weight, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 178, in fit
    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 784, in _d

[CV 4/5; 8/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=5; total time=   6.2s
[CV 5/5; 8/8] START clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=5


Traceback (most recent call last):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 368, in fit
    super().fit(X, Y, sample_weight, **fit_params)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\sklearn\multioutput.py", line 178, in fit
    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\kaypa\anaconda3\lib\site-packages\joblib\parallel.py", line 784, in _d

[CV 5/5; 8/8] END clf__estimator__leaf_size=5, clf__estimator__n_neighbors=5, clf__estimator__p=5; total time=   6.1s


Improving on the model is not easy, because the Random Forest classifier already performs very well. `KNeighborsClassifier()` was chosen mainly because it could be fitted in practice; other algorithms, such as SVC, were very difficult to fit.

As before, the following call is worth knowing: it lists the parameter names that can be used as keys when building a new grid search parameter grid.

In [17]:
pipeline2.get_params().keys()

dict_keys(['memory', 'steps', 'verbose', 'vect', 'tfidf', 'clf', 'vect__analyzer', 'vect__binary', 'vect__decode_error', 'vect__dtype', 'vect__encoding', 'vect__input', 'vect__lowercase', 'vect__max_df', 'vect__max_features', 'vect__min_df', 'vect__ngram_range', 'vect__preprocessor', 'vect__stop_words', 'vect__strip_accents', 'vect__token_pattern', 'vect__tokenizer', 'vect__vocabulary', 'tfidf__norm', 'tfidf__smooth_idf', 'tfidf__sublinear_tf', 'tfidf__use_idf', 'clf__estimator__algorithm', 'clf__estimator__leaf_size', 'clf__estimator__metric', 'clf__estimator__metric_params', 'clf__estimator__n_jobs', 'clf__estimator__n_neighbors', 'clf__estimator__p', 'clf__estimator__weights', 'clf__estimator', 'clf__n_jobs'])
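The key names above follow the `step__parameter` convention, with an extra `estimator__` level for the classifier wrapped by `MultiOutputClassifier`. A minimal sketch of how a grid could be assembled and checked against those names follows. The pipeline here is an assumed reconstruction from the step names `vect`, `tfidf` and `clf`, and the candidate values are the ones that appear in the grid search log; the actual definition of `pipeline2` may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Assumed reconstruction of pipeline2, based only on the step names
# ('vect', 'tfidf', 'clf') visible in the key list.
pipeline2 = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultiOutputClassifier(KNeighborsClassifier())),
])

# Each key is <step>__<wrapped estimator>__<parameter>.
# The values are the candidates seen in the grid search log.
param_grid = {
    "clf__estimator__leaf_size": [2, 5],
    "clf__estimator__n_neighbors": [1, 5],
    "clf__estimator__p": [1, 5],
}

# Catch typos before starting a long grid search:
# every grid key must be a parameter the pipeline actually exposes.
valid_keys = pipeline2.get_params().keys()
unknown_keys = [name for name in param_grid if name not in valid_keys]
if unknown_keys:
    raise ValueError(f"Unknown grid search parameters: {unknown_keys}")
```

Checking the keys up front is cheap compared with the search itself, where a misspelled name only surfaces once fitting starts.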

In [18]:
# Get results of grid search
improved_model2.cv_results_

{'mean_fit_time': array([6.72918453, 6.25690079, 6.50566206, 6.03421407, 8.78640513,
        8.23338628, 7.26301079, 6.65028253]),
 'std_fit_time': array([0.14819937, 0.2353198 , 0.40256539, 0.27445224, 2.90650204,
        1.14999137, 0.63207618, 0.47666302]),
 'mean_score_time': array([ 150.50334959,    0.        ,  167.38431988,    0.        ,
        7852.31762915,    0.        ,  203.66053762,    0.        ]),
 'std_score_time': array([9.90473153e-01, 0.00000000e+00, 1.83753582e+00, 0.00000000e+00,
        1.53473289e+04, 0.00000000e+00, 1.38142435e+01, 0.00000000e+00]),
 'param_clf__estimator__leaf_size': masked_array(data=[2, 2, 2, 2, 5, 5, 5, 5],
              mask=[False, False, False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'param_clf__estimator__n_neighbors': masked_array(data=[1, 1, 5, 5, 1, 1, 5, 5],
              mask=[False, False, False, False, False, False, False, False],
        fill_value='?',
             dtype=object)
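The raw dictionary is hard to read, so it can help to tabulate it. The sketch below copies the timing values printed above into a small DataFrame. The `p` column is not visible in the truncated output and is inferred from the order of candidates in the log, so treat it as an assumption. With the fitted object available, `pd.DataFrame(improved_model2.cv_results_)` should give the full table directly.

```python
import pandas as pd

# Values copied from the cv_results_ output above.
# The 'p' column is inferred from the candidate order in the log (assumption).
cv_summary = pd.DataFrame({
    "leaf_size": [2, 2, 2, 2, 5, 5, 5, 5],
    "n_neighbors": [1, 1, 5, 5, 1, 1, 5, 5],
    "p": [1, 5, 1, 5, 1, 5, 1, 5],
    "mean_fit_time": [6.72918453, 6.25690079, 6.50566206, 6.03421407,
                      8.78640513, 8.23338628, 7.26301079, 6.65028253],
    "mean_score_time": [150.50334959, 0.0, 167.38431988, 0.0,
                        7852.31762915, 0.0, 203.66053762, 0.0],
})

# A mean score time of exactly zero means the candidate was never scored,
# which is consistent with the tracebacks logged for those fits.
cv_summary["never_scored"] = cv_summary["mean_score_time"] == 0

print(cv_summary[cv_summary["never_scored"]])
```

Under that assumption, every candidate that was never scored has `p=5`, so only the `p=1` candidates actually competed for the best score.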

In [19]:
# Parameters for best mean test score
improved_model2.best_params_

{'clf__estimator__leaf_size': 2,
 'clf__estimator__n_neighbors': 5,
 'clf__estimator__p': 1}

In [20]:
# Calculate evaluation metrics for test set
Y_pred_improved_model2 = improved_model2.predict(X_test)

evaluation_metrics4 = []

# Calculate evaluation metrics for each set of labels
for i in range(len(cat_names)):
    accuracy = accuracy_score(Y_test.iloc[:, i].values, Y_pred_improved_model2[:, i])
    f1 = f1_score(Y_test.iloc[:, i].values, Y_pred_improved_model2[:, i], average='weighted')
    precision = precision_score(Y_test.iloc[:, i].values, Y_pred_improved_model2[:, i], average='weighted')
    recall = recall_score(Y_test.iloc[:, i].values, Y_pred_improved_model2[:, i], average='weighted')

    evaluation_metrics4.append([accuracy, f1, precision, recall])

evaluation_metrics_df4 = pd.DataFrame(data=evaluation_metrics4, index=cat_names, columns=cols)

evaluation_metrics_df4



  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Accuracy,F1,Precision,Recall
related,0.333081,0.458235,0.765538,0.333081
request,0.862448,0.828111,0.864837,0.862448
offer,0.995642,0.993468,0.991304,0.995642
aid_related,0.617279,0.519121,0.713036,0.617279
medical_help,0.920993,0.883854,0.907538,0.920993
medical_products,0.952823,0.931585,0.941161,0.952823
search_and_rescue,0.97177,0.958043,0.944699,0.97177
security,0.980674,0.971106,0.961722,0.980674
military,0.968549,0.953074,0.938087,0.968549
child_alone,1.0,1.0,1.0,1.0
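The `_warn_prf` output before the table most likely comes from scikit-learn's undefined-metric warning, raised when a class has no predicted (or no true) samples. A minimal sketch of the same per-label evaluation as a reusable function follows. The function name `per_label_metrics` and the toy data are hypothetical, and `zero_division=0` (available in scikit-learn 0.22 and later) is an addition that is not in the original cell.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def per_label_metrics(Y_true, Y_pred, label_names):
    """Return accuracy, F1, precision and recall for each label column."""
    Y_true = np.asarray(Y_true)
    Y_pred = np.asarray(Y_pred)
    rows = []
    for i in range(len(label_names)):
        y_true, y_pred = Y_true[:, i], Y_pred[:, i]
        rows.append([
            accuracy_score(y_true, y_pred),
            # zero_division=0 scores an undefined metric as 0 without warning.
            f1_score(y_true, y_pred, average="weighted", zero_division=0),
            precision_score(y_true, y_pred, average="weighted", zero_division=0),
            recall_score(y_true, y_pred, average="weighted", zero_division=0),
        ])
    return pd.DataFrame(rows, index=label_names,
                        columns=["Accuracy", "F1", "Precision", "Recall"])


# Toy data for illustration only; these are not values from the dataset.
toy_true = pd.DataFrame({"request": [1, 0, 1, 0], "offer": [0, 0, 0, 0]})
toy_pred = np.array([[1, 0], [0, 0], [0, 0], [0, 0]])
toy_metrics = per_label_metrics(toy_true, toy_pred, list(toy_true.columns))
print(toy_metrics)
```

With `average='weighted'`, recall is the support-weighted mean of per-class recall, which equals accuracy. That is why the Accuracy and Recall columns in the table above are identical.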


These results show that the Random Forest classifier tends to perform much better than the tuned `KNeighborsClassifier`, whose accuracy on `related`, for example, is only about 0.33.

### 9. Export your model as a pickle file

In [21]:
# Pickle best model; the context manager closes the file after writing
with open('2ndproject_model.sav', 'wb') as model_file:
    pickle.dump(improved_model, model_file)
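To use the saved model later, it has to be loaded back with `pickle.load`. The sketch below shows the full round trip with a stand-in object and a temporary file path, both hypothetical; in the notebook the object would be the fitted model and the path `2ndproject_model.sav`.

```python
import os
import pickle
import tempfile

# Stand-in for the fitted model; any picklable object round-trips the same way.
stand_in_model = {"best_params": {"clf__estimator__n_neighbors": 5}}

# Temporary location used only for this illustration.
model_path = os.path.join(tempfile.mkdtemp(), "example_model.sav")

# Write the object to disk.
with open(model_path, "wb") as model_file:
    pickle.dump(stand_in_model, model_file)

# Read it back; only unpickle files from sources you trust.
with open(model_path, "rb") as model_file:
    restored_model = pickle.load(model_file)
```

A pickled scikit-learn model should be loaded with the same library version that produced it, since compatibility across versions is not guaranteed.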

### 10. Use this notebook to complete `train.py`
Use the template file attached in the Resources folder to write a script that runs the steps above to create a database and export a model based on a new dataset specified by the user.
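
The template file itself is not shown here, so the following is only a rough sketch of how the data-loading part of such a script might look. The function names `parse_args` and `load_data`, the default table name and the argument order are all assumptions; the column handling mirrors the loading cell at the top of this notebook.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Columns that are not category labels, as in the notebook's loading cell.
NON_LABEL_COLUMNS = ["index", "message", "original", "genre"]


def parse_args(argv):
    """Return (database_filepath, model_filepath) from a sys.argv-style list."""
    if len(argv) != 3:
        raise SystemExit(
            "Usage: python train.py <database_filepath> <model_filepath>"
        )
    return argv[1], argv[2]


def load_data(database_filepath, table_name="Messages"):
    """Load messages and category labels from a SQLite database."""
    with closing(sqlite3.connect(database_filepath)) as connection:
        df = pd.read_sql(f'SELECT * FROM "{table_name}"', connection)
    X = df["message"]
    # errors="ignore" tolerates tables that lack some of the non-label columns.
    Y = df.drop(columns=NON_LABEL_COLUMNS, errors="ignore")
    return X, Y, list(Y.columns)
```

A script built on this would call `parse_args(sys.argv)`, pass the database path to `load_data`, then run the split, fit, evaluate and pickle steps from the sections above.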