## Master of Applied Data Science
### University of Michigan - School of Information
### Capstone Project - Rapid Labeling of Text Corpus Using Information Retrieval Techniques
### Fall 2021
#### Team Members: Chloe Zhang, Michael Penrose, Carlo Tak

### Experiment Flow

Class label > Count vectorizer > 100 features > PyCaret

### Purpose

This notebook investigates how well a classifier can predict the **event type (i.e. 'earthquake', 'fire', 'flood', 'hurricane')** of the tweets in the [Disaster tweets dataset](https://crisisnlp.qcri.org/humaid_dataset.html#).

This classifier is intended as a baseline of classification performance. Two questions are investigated:
- Is it possible to build a reasonably good classifier of these tweets at all?
- If so, how well does the classifier perform when trained on all of the labels in the training data?

If a classifier can be built using all of the labels in the training dataset, then it should be possible to implement a method for rapidly labeling the corpus of texts in the dataset. Here, rapid labeling means any process that does not require the user to label each text in the corpus one at a time.

To measure the performance of the classifier we use the Area Under the (ROC) Curve (AUC). We chose this metric because we believe it is a good fit for the preliminary work in this project; if a specific goal emerges later that requires a different metric, that metric can be adopted then. At that point, the consequences of false positives (texts classified as having a label they do not have) and false negatives (texts not classified as having a label they do have) should be weighed. For example, precision can be used as the metric when false positives need to be minimized. AUC takes a value between zero and one, with a higher number indicating better classification performance and 0.5 corresponding to a classifier that ranks texts no better than chance.
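For a multiclass problem like this one, AUC is usually computed one-vs-rest: each label in turn is treated as the positive class, a binary AUC is computed from that label's predicted probability, and the per-label values can then be averaged. The sketch below illustrates this with scikit-learn's `roc_auc_score` on a made-up three-class example; the labels and probabilities are hypothetical and are not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy ground truth for a 3-class problem (classes 0, 1 and 2).
y_true = np.array([0, 1, 2, 2, 1, 0])

# Hypothetical predicted class probabilities; each row sums to 1 and
# column k holds the score for class k.
y_proba = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])

# One-vs-rest: treat each class in turn as the positive class and
# compute an ordinary binary ROC AUC from that class's column.
per_class_auc = [
    roc_auc_score((y_true == k).astype(int), y_proba[:, k]) for k in range(3)
]

# Macro average of the one-vs-rest AUCs (unweighted mean over classes).
macro_auc = roc_auc_score(y_true, y_proba, multi_class="ovr", average="macro")

print(per_class_auc, macro_auc)
```

In this toy example the second class scores 0.875 rather than 1.0 because one of its rows receives the same score (0.3) as two rows from other classes, and tied scores count as half a correct ranking. The per-class values play the role of the per-label AUC scores mentioned in the Summary.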


### Summary

The baseline classifier, trained using all the labels in the training dataset, achieved a good AUC score for each of the 4 event type labels (i.e. earthquake, fire, flood, hurricane): all of the AUC scores were above 0.98.

A simple text vectorization approach was implemented because we wanted the baseline classifier to be a basic solution; our feeling was that more complex techniques could be implemented at a later stage. A [count vectorizer](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html) (with default settings) was used to convert the texts, and the number of dimensions (features) was then reduced using feature selection ([SelectKBest](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html)). Reducing the dimensions improves computation time, since fewer features means less data to process. Feature selection was also simpler to implement than alternatives such as removing stopwords or tuning vectorizer parameters like ‘stop_words’, ‘ngram_range’, ‘max_df’, ‘min_df’, and ‘max_features’. The complexity of the classifier could be increased if required, but this simple implementation produced good results.

This notebook reduced the number of features to 100.
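The two-step reduction described above (count vectorization, then keeping the k best-scoring columns) can be sketched on a tiny corpus to show how much smaller the feature space becomes. The texts, labels and `k=4` below are invented for illustration only; the notebook itself uses `k=100` on the real tweets.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, f_classif

# Toy corpus standing in for tweets; labels are made up for illustration.
texts = [
    "earthquake damage in the city",
    "strong earthquake felt downtown",
    "flood water rising fast",
    "flood warning issued tonight",
    "please donate to the relief fund",
    "donate now to help victims",
]
labels = ["quake", "quake", "flood", "flood", "donation", "donation"]

# Step 1: default CountVectorizer -> one column per distinct token.
X_full = CountVectorizer().fit_transform(texts)

# Step 2: keep only the k columns with the highest ANOVA F-score
# against the labels.
k = 4
X_reduced = SelectKBest(score_func=f_classif, k=k).fit_transform(X_full, labels)

print("features before:", X_full.shape[1], "after:", X_reduced.shape[1])
```

On such a tiny corpus scikit-learn may emit a divide-by-zero warning, because some tokens have zero variance within every class; the selection still runs.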

The feature importances were extracted from the classifier to check whether they made sense. This sense check was important because we made several assumptions in building this classifier that had to be validated. For example, when the text was vectorized we used a simple approach that just counted individual words (tokens); a more complex classifier might use bi-grams (two words per feature), which would have had the advantage of preserving features like ‘’.
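The difference between counting single words and counting bi-grams can be seen with `CountVectorizer`'s `ngram_range` parameter. In the sketch below the two example texts and the phrase "new zealand" are our own illustration of a multi-word feature that unigrams split apart; they are not taken from the notebook's results.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ["New Zealand earthquake damage", "flood warning for New Zealand"]


def vocabulary(count_vectorizer, docs):
    """Fit the vectorizer and return its learned vocabulary as a list."""
    count_vectorizer.fit(docs)
    if hasattr(count_vectorizer, "get_feature_names_out"):  # scikit-learn >= 1.0
        return list(count_vectorizer.get_feature_names_out())
    return list(count_vectorizer.get_feature_names())  # older releases


# Default settings: single tokens only, so "new" and "zealand" are separate.
unigrams = vocabulary(CountVectorizer(), texts)

# ngram_range=(1, 2): single tokens plus adjacent word pairs.
uni_and_bigrams = vocabulary(CountVectorizer(ngram_range=(1, 2)), texts)

print(unigrams)
print(uni_and_bigrams)
```

The trade-off is vocabulary size: even these two short texts go from 7 features to 13, which is one reason a feature selection step becomes more important once bi-grams are added.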

Examining the top features
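One way to examine the top features per label is to look at the coefficients of a linear model: for logistic regression, the largest positive coefficients in a class's row are the tokens that push predictions toward that class. The sketch below assumes a logistic regression trained on a toy corpus (the texts and labels are invented); for tree ensembles such as LightGBM, the scikit-learn style `feature_importances_` attribute gives a single unsigned importance per feature rather than one ranking per class.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus and labels, invented for illustration.
texts = [
    "earthquake damage in the city",
    "strong earthquake felt downtown",
    "flood water rising fast",
    "flood warning issued tonight",
    "please donate to the relief fund",
    "donate now to help victims",
]
labels = ["quake", "quake", "flood", "flood", "donation", "donation"]

toy_vectorizer = CountVectorizer()
X = toy_vectorizer.fit_transform(texts)
toy_clf = LogisticRegression(max_iter=1000).fit(X, labels)

feature_names = np.array(
    toy_vectorizer.get_feature_names_out()
    if hasattr(toy_vectorizer, "get_feature_names_out")
    else toy_vectorizer.get_feature_names()
)

top_n = 3
top_features = {}
for class_index, class_label in enumerate(toy_clf.classes_):
    # With 3+ classes, coef_ has one row per class; sort that row descending.
    top_idx = np.argsort(toy_clf.coef_[class_index])[::-1][:top_n]
    top_features[class_label] = feature_names[top_idx].tolist()

for class_label, words in top_features.items():
    print(class_label, words)
```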
 



In [1]:
# ! pip freeze > requirements.txt 

In [2]:
from utilities import dt_utilities as utils
from datetime import datetime

In [3]:
start_time = datetime.now()
start_time.strftime("%Y/%m/%d %H:%M:%S")

'2021/10/02 18:22:49'

In [4]:
consolidated_disaster_tweet_data_df = \
    utils.get_consolidated_disaster_tweet_data(root_directory="data/",
                                               event_type_directory="HumAID_data_event_type",
                                               events_set_directories=["HumAID_data_events_set1_47K",
                                                                       "HumAID_data_events_set2_29K"],
                                               include_meta_data=True)

In [5]:
consolidated_disaster_tweet_data_df

| | tweet_id | class_label | event_type | data_type | tweet_text |
|---|---|---|---|---|---|
| 0 | 798262465234542592 | sympathy_and_support | earthquake | dev | RT @MissEarth: New Zealand need our prayers af... |
| 1 | 771464543796985856 | caution_and_advice | earthquake | dev | @johnaglass65 @gordonluke Ah, woke up to a nig... |
| 2 | 797835622471733248 | requests_or_urgent_needs | earthquake | dev | RT @terremotocentro: #eqnz if you need a tool ... |
| 3 | 798021801540321280 | other_relevant_information | earthquake | dev | RT @BarristerNZ: My son (4) has drawn a pictur... |
| 4 | 798727277794033664 | infrastructure_and_utility_damage | earthquake | dev | Due to earthquake damage our Defence Force is ... |
| ... | ... | ... | ... | ... | ... |
| 76479 | 783991683188948992 | infrastructure_and_utility_damage | hurricane | train | RT @LindsayLogue: 3500+ homes destroyed in Haiti |
| 76480 | 783794225368276992 | not_humanitarian | hurricane | train | @ClintonFdn stay out of Haiti, you will not ge... |
| 76481 | 783399699994648576 | other_relevant_information | hurricane | train | Hurricane-hit southern Haiti cut off after bri... |
| 76482 | 783400762898391041 | sympathy_and_support | hurricane | train | Please pray for these beautiful people. |


In [6]:
train_df = consolidated_disaster_tweet_data_df[consolidated_disaster_tweet_data_df["data_type"]=="train"].reset_index(drop=True)
train_df

| | tweet_id | class_label | event_type | data_type | tweet_text |
|---|---|---|---|---|---|
| 0 | 798064896545996801 | other_relevant_information | earthquake | train | I feel a little uneasy about the idea of work ... |
| 1 | 797913886527602688 | caution_and_advice | earthquake | train | #eqnz Interislander ferry docking aborted afte... |
| 2 | 797867944546025472 | other_relevant_information | earthquake | train | Much of New Zealand felt the earthquake after ... |
| 3 | 797958935126773760 | sympathy_and_support | earthquake | train | Noticing a lot of aftershocks on eqnz site, bu... |
| 4 | 797813020567056386 | infrastructure_and_utility_damage | earthquake | train | RT @E2NZ: Mike Clements, NZ police, says obvio... |
| ... | ... | ... | ... | ... | ... |
| 53526 | 783991683188948992 | infrastructure_and_utility_damage | hurricane | train | RT @LindsayLogue: 3500+ homes destroyed in Haiti |
| 53527 | 783794225368276992 | not_humanitarian | hurricane | train | @ClintonFdn stay out of Haiti, you will not ge... |
| 53528 | 783399699994648576 | other_relevant_information | hurricane | train | Hurricane-hit southern Haiti cut off after bri... |
| 53529 | 783400762898391041 | sympathy_and_support | hurricane | train | Please pray for these beautiful people. |


In [7]:
test_df = consolidated_disaster_tweet_data_df[consolidated_disaster_tweet_data_df["data_type"]=="test"].reset_index(drop=True)
test_df

| | tweet_id | class_label | event_type | data_type | tweet_text |
|---|---|---|---|---|---|
| 0 | 798274825441538048 | infrastructure_and_utility_damage | earthquake | test | The earthquake in New Zealand was massive. Bil... |
| 1 | 798452064208568320 | infrastructure_and_utility_damage | earthquake | test | These pictures show the alarming extent of the... |
| 2 | 797804396767682560 | sympathy_and_support | earthquake | test | Just woke to news of another earthquake! WTF N... |
| 3 | 798434862830993408 | not_humanitarian | earthquake | test | When theres an actual earthquake, landslide an... |
| 4 | 797790705414377472 | caution_and_advice | earthquake | test | Tsunami warning for entire East Coast of NZ, b... |
| ... | ... | ... | ... | ... | ... |
| 15155 | 783985895493865472 | caution_and_advice | hurricane | test | RT @rolandsmartin: 7AM ET on #NewsOneNow: Hurr... |
| 15156 | 783746504129294336 | sympathy_and_support | hurricane | test | RT @ILiveBeyond: Please pray for the people of... |
| 15157 | 783864123167608832 | injured_or_dead_people | hurricane | test | Hurricane Matthew kills 26 in Caribbean on des... |
| 15158 | 783528309963579392 | injured_or_dead_people | hurricane | test | An already struggling Haiti faced massive Hurr... |


In [8]:
dev_df = consolidated_disaster_tweet_data_df[consolidated_disaster_tweet_data_df["data_type"]=="dev"].reset_index(drop=True)
dev_df

| | tweet_id | class_label | event_type | data_type | tweet_text |
|---|---|---|---|---|---|
| 0 | 798262465234542592 | sympathy_and_support | earthquake | dev | RT @MissEarth: New Zealand need our prayers af... |
| 1 | 771464543796985856 | caution_and_advice | earthquake | dev | @johnaglass65 @gordonluke Ah, woke up to a nig... |
| 2 | 797835622471733248 | requests_or_urgent_needs | earthquake | dev | RT @terremotocentro: #eqnz if you need a tool ... |
| 3 | 798021801540321280 | other_relevant_information | earthquake | dev | RT @BarristerNZ: My son (4) has drawn a pictur... |
| 4 | 798727277794033664 | infrastructure_and_utility_damage | earthquake | dev | Due to earthquake damage our Defence Force is ... |
| ... | ... | ... | ... | ... | ... |
| 7788 | 783947842092138497 | rescue_volunteering_or_donation_effort | hurricane | dev | U.S. Nonprofit All Hands Volunteers Heads To H... |
| 7789 | 783838361425440768 | other_relevant_information | hurricane | dev | RT @Aaylin_xoxo: It breaks my heart to see how... |
| 7790 | 783860660388007937 | other_relevant_information | hurricane | dev | TBH nobody know what hurricane Mathew gone do ... |
| 7791 | 784689762032508928 | injured_or_dead_people | hurricane | dev | Hurricane Matthew Barrels Up The East Coast Af... |


In [9]:
train_df.groupby(["event_type"]).size().reset_index().rename(columns={0: "Count"}).sort_values("Count", ascending=False)

| | event_type | Count |
|---|---|---|
| 3 | hurricane | 31674 |
| 2 | flood | 7815 |
| 1 | fire | 7792 |
| 0 | earthquake | 6250 |


In [10]:
train_df.groupby(["class_label"]).size().reset_index().rename(columns={0: "Count"}).sort_values("Count", ascending=False)

| | class_label | Count |
|---|---|---|
| 8 | rescue_volunteering_or_donation_effort | 14891 |
| 6 | other_relevant_information | 8501 |
| 9 | sympathy_and_support | 6250 |
| 2 | infrastructure_and_utility_damage | 5715 |
| 3 | injured_or_dead_people | 5110 |
| 5 | not_humanitarian | 4407 |
| 0 | caution_and_advice | 3774 |
| 1 | displaced_people_and_evacuations | 2800 |
| 7 | requests_or_urgent_needs | 1833 |
| 4 | missing_or_found_people | 250 |
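The class_label counts above are heavily imbalanced: rescue_volunteering_or_donation_effort has 14891 training tweets while missing_or_found_people has 250, roughly 60 to 1. A slightly shorter way to get these counts, plus each label's share and the max-to-min ratio, is pandas' `value_counts`. The sketch below uses a small invented stand-in for `train_df["class_label"]`.

```python
import pandas as pd

# Toy labels standing in for train_df["class_label"].
toy_df = pd.DataFrame(
    {"class_label": ["rescue", "other", "rescue", "missing", "rescue", "other"]}
)

# Same counts as groupby(...).size(), already sorted in descending order.
counts = toy_df["class_label"].value_counts()

# Share of each label, which makes rare classes easy to spot.
shares = toy_df["class_label"].value_counts(normalize=True)

# Ratio between the most and least frequent label.
imbalance_ratio = counts.max() / counts.min()

print(counts, shares, imbalance_ratio, sep="\n")
```

A large ratio is one reason accuracy alone can be misleading here, since it is dominated by the largest classes.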


In [11]:
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2, f_classif
from sklearn.pipeline import Pipeline
import pandas as pd
from scipy.sparse import coo_matrix, hstack
import scipy.sparse
import numpy as np
from collections import Counter

In [12]:
num_features = 100
target_column = "class_label"
# vectorizer = TfidfVectorizer(max_features=num_features)
# count_vectorizer = CountVectorizer(max_features=num_features)

vectorizer = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("reduce", SelectKBest(score_func=f_classif, k=num_features)), # chi2, f_classif
])
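The inline comment above lists `chi2` as an alternative to `f_classif` for the `score_func` of `SelectKBest`. Both accept non-negative count features, but they can rank tokens differently. The sketch below swaps the scoring function in an otherwise identical pipeline; the corpus, labels and `k=4` are invented for illustration and are not the notebook's settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2, f_classif
from sklearn.pipeline import Pipeline

# Toy corpus and labels, invented for illustration.
toy_texts = [
    "earthquake damage in the city",
    "strong earthquake felt downtown",
    "flood water rising fast",
    "flood warning issued tonight",
    "please donate to the relief fund",
    "donate now to help victims",
]
toy_labels = ["quake", "quake", "flood", "flood", "donation", "donation"]


def selected_terms(score_func, k=4):
    """Fit a count-vectorize + SelectKBest pipeline and return the kept tokens."""
    pipe = Pipeline([
        ("vectorizer", CountVectorizer()),
        ("reduce", SelectKBest(score_func=score_func, k=k)),
    ])
    pipe.fit(toy_texts, toy_labels)
    count_vec = pipe.named_steps["vectorizer"]
    names = (count_vec.get_feature_names_out()
             if hasattr(count_vec, "get_feature_names_out")
             else count_vec.get_feature_names())
    mask = pipe.named_steps["reduce"].get_support()
    return [name for name, keep in zip(names, mask) if keep]


print("f_classif:", selected_terms(f_classif))
print("chi2:     ", selected_terms(chi2))
```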

In [13]:
vectorizer.fit(train_df["tweet_text"], train_df[target_column])

Pipeline(steps=[('vectorizer', CountVectorizer()),
                ('reduce', SelectKBest(k=100))])

In [14]:
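def vectorized_tweet_data(fitted_vectorizer, source_df, text_column, target_column,
                          vectorizer_name="vectorizer", reducer_name="reduce"):
    """Vectorize source_df[text_column] with a fitted vectorize + select pipeline.

    Returns a dense float DataFrame with one column per selected token,
    followed by the tweet_id, tweet_text and target columns.
    """
    vectorized_data = fitted_vectorizer.transform(source_df[text_column])

    vectorized_df = pd.DataFrame.sparse.from_spmatrix(vectorized_data)

    # Recover the names of the features kept by the selection step. Use the
    # fitted_vectorizer argument (not the global `vectorizer`) so the function
    # works with whichever fitted pipeline is passed in.
    count_vectorizer = fitted_vectorizer.named_steps[vectorizer_name]
    if hasattr(count_vectorizer, "get_feature_names_out"):  # scikit-learn >= 1.0
        all_feature_names = count_vectorizer.get_feature_names_out()
    else:  # older scikit-learn releases
        all_feature_names = count_vectorizer.get_feature_names()
    support = fitted_vectorizer.named_steps[reducer_name].get_support()
    feature_names = np.array(all_feature_names)[support]
    vectorized_df.columns = feature_names

    # Densify and cast the token counts to float.
    vectorized_df = vectorized_df.sparse.to_dense().astype(float)

    # Copy metadata positionally so rows stay aligned even if source_df
    # does not have a default RangeIndex.
    vectorized_df["tweet_id"] = source_df["tweet_id"].to_numpy()
    vectorized_df["tweet_text"] = source_df["tweet_text"].to_numpy()
    vectorized_df[target_column] = source_df[target_column].to_numpy()

    return vectorized_df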
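An alternative way to build the same kind of labeled feature table skips the sparse DataFrame round trip: convert the selected columns straight to a dense array and concatenate the metadata columns. This is a sketch under assumptions: the four-row `source_df`, its texts and labels, and `k=2` are invented; at the notebook's scale (about 53,000 rows by 100 features) a dense array should still be small.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Hypothetical four-tweet frame with the same column names as the notebook.
source_df = pd.DataFrame({
    "tweet_id": [1, 2, 3, 4],
    "tweet_text": ["flood water rising", "flood warning issued",
                   "donate to relief", "please donate now"],
    "class_label": ["flood", "flood", "donation", "donation"],
})

pipe = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("reduce", SelectKBest(score_func=f_classif, k=2)),
])
pipe.fit(source_df["tweet_text"], source_df["class_label"])

# Names of the tokens that survived feature selection.
count_vec = pipe.named_steps["vectorizer"]
all_names = np.array(count_vec.get_feature_names_out()
                     if hasattr(count_vec, "get_feature_names_out")
                     else count_vec.get_feature_names())
kept_names = all_names[pipe.named_steps["reduce"].get_support()]

# Dense float features, indexed like source_df so concat aligns row by row.
X = pipe.transform(source_df["tweet_text"])
features = pd.DataFrame(X.toarray().astype(float), columns=kept_names,
                        index=source_df.index)
vectorized_df = pd.concat(
    [features, source_df[["tweet_id", "tweet_text", "class_label"]]], axis=1
)
print(vectorized_df)
```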

In [15]:
train_vectorized_event_type_df = vectorized_tweet_data(fitted_vectorizer=vectorizer, 
                                                       source_df=train_df, 
                                                       text_column="tweet_text", 
                                                       target_column=target_column, 
                                                       vectorizer_name="vectorizer", 
                                                       reducer_name="reduce")
train_vectorized_event_type_df

Unnamed: 0,affected,all,allah,and,at,by,california,confirmed,damage,damaged,dead,deadliest,death,deaths,declared,destroyed,died,displaced,donate,donated,donating,donation,donations,dorian,earthquake,eddison,edt,efforts,ellicott,emergency,evacuate,evacuated,evacuating,evacuation,evacuations,evacuees,everyone,fires,flash,flooding,florence,food,for,fund,guardsman,heart,help,hermond,hurricane,injured,irma,issued,kerala,keralafloodrelief,keralafloods,killed,kills,least,mandatory,maryland,missing,my,need,needed,needs,ordered,orders,our,people,please,pray,prayers,praying,realdonaldtrump,relief,rescue,risen,rises,safe,stay,storm,support,swept,those,thoughts,to,toll,tornado,tropical,trump,tsunami,urgent,victims,volunteers,warning,warnings,water,wildfires,winds,you,tweet_id,tweet_text,class_label
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,798064896545996801,I feel a little uneasy about the idea of work ...,other_relevant_information
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,797913886527602688,#eqnz Interislander ferry docking aborted afte...,caution_and_advice
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,797867944546025472,Much of New Zealand felt the earthquake after ...,other_relevant_information
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,797958935126773760,"Noticing a lot of aftershocks on eqnz site, bu...",sympathy_and_support
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,797813020567056386,"RT @E2NZ: Mike Clements, NZ police, says obvio...",infrastructure_and_utility_damage
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,783991683188948992,RT @LindsayLogue: 3500+ homes destroyed in Haiti,infrastructure_and_utility_damage
53527,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,783794225368276992,"@ClintonFdn stay out of Haiti, you will not ge...",not_humanitarian
53528,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,783399699994648576,Hurricane-hit southern Haiti cut off after bri...,other_relevant_information
53529,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,783400762898391041,Please pray for these beautiful people.,sympathy_and_support


In [16]:
test_vectorized_event_type_df = vectorized_tweet_data(fitted_vectorizer=vectorizer, 
                                                       source_df=test_df, 
                                                       text_column="tweet_text", 
                                                       target_column=target_column)
test_vectorized_event_type_df

Unnamed: 0,affected,all,allah,and,at,by,california,confirmed,damage,damaged,dead,deadliest,death,deaths,declared,destroyed,died,displaced,donate,donated,donating,donation,donations,dorian,earthquake,eddison,edt,efforts,ellicott,emergency,evacuate,evacuated,evacuating,evacuation,evacuations,evacuees,everyone,fires,flash,flooding,florence,food,for,fund,guardsman,heart,help,hermond,hurricane,injured,irma,issued,kerala,keralafloodrelief,keralafloods,killed,kills,least,mandatory,maryland,missing,my,need,needed,needs,ordered,orders,our,people,please,pray,prayers,praying,realdonaldtrump,relief,rescue,risen,rises,safe,stay,storm,support,swept,those,thoughts,to,toll,tornado,tropical,trump,tsunami,urgent,victims,volunteers,warning,warnings,water,wildfires,winds,you,tweet_id,tweet_text,class_label
0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,798274825441538048,The earthquake in New Zealand was massive. Bil...,infrastructure_and_utility_damage
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,798452064208568320,These pictures show the alarming extent of the...,infrastructure_and_utility_damage
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,797804396767682560,Just woke to news of another earthquake! WTF N...,sympathy_and_support
3,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,798434862830993408,"When theres an actual earthquake, landslide an...",not_humanitarian
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,797790705414377472,"Tsunami warning for entire East Coast of NZ, b...",caution_and_advice
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15155,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,783985895493865472,RT @rolandsmartin: 7AM ET on #NewsOneNow: Hurr...,caution_and_advice
15156,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,783746504129294336,RT @ILiveBeyond: Please pray for the people of...,sympathy_and_support
15157,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,783864123167608832,Hurricane Matthew kills 26 in Caribbean on des...,injured_or_dead_people
15158,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,783528309963579392,An already struggling Haiti faced massive Hurr...,injured_or_dead_people


In [17]:
dev_vectorized_event_type_df = vectorized_tweet_data(fitted_vectorizer=vectorizer, 
                                                       source_df=dev_df, 
                                                       text_column="tweet_text", 
                                                       target_column=target_column)
dev_vectorized_event_type_df

Unnamed: 0,affected,all,allah,and,at,by,california,confirmed,damage,damaged,dead,deadliest,death,deaths,declared,destroyed,died,displaced,donate,donated,donating,donation,donations,dorian,earthquake,eddison,edt,efforts,ellicott,emergency,evacuate,evacuated,evacuating,evacuation,evacuations,evacuees,everyone,fires,flash,flooding,florence,food,for,fund,guardsman,heart,help,hermond,hurricane,injured,irma,issued,kerala,keralafloodrelief,keralafloods,killed,kills,least,mandatory,maryland,missing,my,need,needed,needs,ordered,orders,our,people,please,pray,prayers,praying,realdonaldtrump,relief,rescue,risen,rises,safe,stay,storm,support,swept,those,thoughts,to,toll,tornado,tropical,trump,tsunami,urgent,victims,volunteers,warning,warnings,water,wildfires,winds,you,tweet_id,tweet_text,class_label
0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,798262465234542592,RT @MissEarth: New Zealand need our prayers af...,sympathy_and_support
1,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,771464543796985856,"@johnaglass65 @gordonluke Ah, woke up to a nig...",caution_and_advice
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,797835622471733248,RT @terremotocentro: #eqnz if you need a tool ...,requests_or_urgent_needs
3,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,798021801540321280,RT @BarristerNZ: My son (4) has drawn a pictur...,other_relevant_information
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,798727277794033664,Due to earthquake damage our Defence Force is ...,infrastructure_and_utility_damage
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7788,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,783947842092138497,U.S. Nonprofit All Hands Volunteers Heads To H...,rescue_volunteering_or_donation_effort
7789,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,783838361425440768,RT @Aaylin_xoxo: It breaks my heart to see how...,other_relevant_information
7790,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,783860660388007937,TBH nobody know what hurricane Mathew gone do ...,other_relevant_information
7791,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,784689762032508928,Hurricane Matthew Barrels Up The East Coast Af...,injured_or_dead_people


In [18]:
import pycaret.classification as pc_class
RND_SEED = 39674
N_JOBS = 2
include_models = ["nb", "lr", "gbc", "lightgbm"] # , "xgboost"
exclude_models = ["knn", "svm", "ridge"]

In [19]:
exp_00 = pc_class.setup(train_vectorized_event_type_df,
                        target=target_column,  # "class_label" here; "event_type" is the alternative
                        ignore_features=["tweet_id", "tweet_text"],  # keep ids and raw text out of the model
                        session_id=RND_SEED,
                        n_jobs=N_JOBS,
                        silent=True,
                        verbose=False)

INFO - PyCaret Supervised Module
INFO - ML Usecase: classification
INFO - version 2.3.0
INFO - Initializing setup()
INFO - setup(display=None, profile_kwargs=None, profile=False, verbose=False, silent=True, log_data=False, log_profile=False, log_plots=False, experiment_name=None, log_experiment=False, session_id=39674, html=True, custom_pipeline=None, use_gpu=False, n_jobs=2, fold_groups=None, fold_shuffle=False, fold=10, fold_strategy=stratifiedkfold, data_split_stratify=False, data_split_shuffle=True, transform_target_method=box-cox, transform_target=False, fix_imbalance_method=None, fix_imbalance=False, interaction_threshold=0.01, feature_ratio=False, feature_interaction=False, feature_selection_method=classic, feature_selection_threshold=0.8, feature_selection=False, group_names=None, group_features=None, polynomial_threshold=0.1, trigonometry_features=False, polynomial_degree=2, polynomial_features=False, cluster_iter=20, create_clusters=False, remove_perfect_collinearity=True, mu
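The setup() log above shows `fold_strategy=stratifiedkfold` but `data_split_stratify=False`: the cross-validation folds are stratified, while the initial train/hold-out split is not. With a label as rare as missing_or_found_people (250 training tweets), a stratified hold-out split may be the safer choice; judging from the log, PyCaret exposes this through the `data_split_stratify` argument of `setup()`. The sketch below shows the same idea with scikit-learn's `train_test_split` on synthetic, imbalanced data (the 90/10 label mix and the 30% test size are assumptions chosen for illustration).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data: 90 "common" rows and 10 "rare" rows.
rng = np.random.RandomState(39674)
X = rng.rand(100, 5)
y = np.array(["common"] * 90 + ["rare"] * 10)

# stratify=y keeps the label proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=39674
)

print("rare in train:", (y_train == "rare").sum(),
      "rare in test:", (y_test == "rare").sum())
```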

In [None]:
best_model = pc_class.compare_models(sort="AUC",
                                     # include=include_models,
                                     exclude=exclude_models,
                                     turbo=True)
best_model

INFO - Initializing compare_models()
INFO - compare_models(display=None, verbose=True, groups=None, fit_kwargs=None, errors=ignore, turbo=True, budget_time=None, n_select=1, sort=AUC, cross_validation=True, round=4, fold=None, include=None, exclude=['knn', 'svm', 'ridge'])
INFO - Checking exceptions
INFO - Preparing display monitor
INFO - Preparing display monitor



| ID | Model | Accuracy | AUC | Recall | Precision | F1 | Kappa | MCC | Training time (s) |
|---|---|---|---|---|---|---|---|---|---|
| lightgbm | Light Gradient Boosting Machine | 0.6796 | 0.923 | 0.6377 | 0.6875 | 0.6751 | 0.6183 | 0.6213 | 1.833 |
| xgboost | Extreme Gradient Boosting | 0.679 | 0.9222 | 0.6362 | 0.6903 | 0.6741 | 0.6173 | 0.621 | 50.062 |
| lr | Logistic Regression | 0.6761 | 0.9191 | 0.6294 | 0.6884 | 0.6697 | 0.6131 | 0.6175 | 8.863 |
| gbc | Gradient Boosting Classifier | 0.6789 | 0.9181 | 0.6346 | 0.6937 | 0.6726 | 0.6169 | 0.6215 | 32.892 |
| rf | Random Forest Classifier | 0.6529 | 0.8962 | 0.6103 | 0.6532 | 0.6493 | 0.587 | 0.5883 | 3.147 |
| lda | Linear Discriminant Analysis | 0.6544 | 0.8949 | 0.6669 | 0.6913 | 0.6582 | 0.5929 | 0.5983 | 0.71 |
| et | Extra Trees Classifier | 0.6467 | 0.8779 | 0.6106 | 0.648 | 0.6448 | 0.5806 | 0.5816 | 5.001 |
| qda | Quadratic Discriminant Analysis | 0.3159 | 0.8071 | 0.3277 | 0.5403 | 0.3297 | 0.2488 | 0.2882 | 0.315 |
| dt | Decision Tree Classifier | 0.5929 | 0.7925 | 0.5645 | 0.6063 | 0.5976 | 0.5201 | 0.521 | 0.37 |
| nb | Naive Bayes | 0.262 | 0.7794 | 0.4288 | 0.6487 | 0.3033 | 0.2268 | 0.2717 | 0.136 |
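The AUC column reports a single number per model even though the target has several classes. One common way to obtain such a number is one-vs-rest: treat each class as positive against all others, compute a binary AUC, then average (unweighted, or weighted by class support). This notebook does not show which averaging PyCaret 2.3 applies, so the sketch below illustrates the idea rather than reproducing PyCaret's metric. `binary_auc` and `ovr_auc` are hypothetical helpers; the binary AUC is computed as the fraction of positive/negative pairs ranked correctly, with ties counting one half.

```python
def binary_auc(y_true, scores):
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y]
    neg = [s for y, s in zip(y_true, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def ovr_auc(labels, prob_rows, classes, weighted=False):
    """One-vs-rest AUC: score each class against all others, then average.

    prob_rows[i][j] is the predicted probability that row i belongs to classes[j].
    """
    per_class = {}
    for j, cls in enumerate(classes):
        y_true = [lab == cls for lab in labels]
        per_class[cls] = binary_auc(y_true, [row[j] for row in prob_rows])
    if weighted:
        # weight each class AUC by how many rows truly belong to that class
        support = {cls: sum(lab == cls for lab in labels) for cls in classes}
        avg = sum(per_class[c] * support[c] for c in classes) / len(labels)
    else:
        avg = sum(per_class.values()) / len(classes)
    return per_class, avg


labels = ["fire", "flood", "fire", "quake"]
probs = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [0.25, 0.65, 0.1], [0.2, 0.2, 0.6]]
per_class, macro = ovr_auc(labels, probs, ["fire", "flood", "quake"])
```

Because AUC only depends on how the scores rank rows, a model can post a high AUC while its Accuracy stays much lower, which is the pattern visible in the table above.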


INFO - Initializing Logistic Regression
INFO - Total runtime is 1.666545867919922e-05 minutes
INFO - Initializing create_model()
INFO - create_model(kwargs={}, display=<pycaret.internal.Display.Display object at 0x0000025176457C88>, metrics=None, system=False, verbose=False, refit=False, groups=None, fit_kwargs={}, predict=True, cross_validation=True, round=4, fold=StratifiedKFold(n_splits=10, random_state=None, shuffle=False), estimator=lr)
INFO - Checking exceptions
INFO - Importing libraries
INFO - Copying training dataset
INFO - Defining folds
INFO - Declaring metric variables
INFO - Importing untrained model
INFO - Logistic Regression Imported succesfully
INFO - Starting cross validation
INFO - Cross validating with StratifiedKFold(n_splits=10, random_state=None, shuffle=False), n_jobs=2
INFO - Calculating mean and std
INFO - Creating metrics dataframe
INFO - Uploading results into container
INFO - Uploading model into container now
INFO - create_model_container: 1
INFO - master_m

INFO - Initializing create_model()
INFO - create_model(kwargs={}, display=<pycaret.internal.Display.Display object at 0x0000025176457C88>, metrics=None, system=False, verbose=False, refit=False, groups=None, fit_kwargs={}, predict=True, cross_validation=True, round=4, fold=StratifiedKFold(n_splits=10, random_state=None, shuffle=False), estimator=ada)
INFO - Checking exceptions
INFO - Importing libraries
INFO - Copying training dataset
INFO - Defining folds
INFO - Declaring metric variables
INFO - Importing untrained model
INFO - Ada Boost Classifier Imported succesfully
INFO - Starting cross validation
INFO - Cross validating with StratifiedKFold(n_splits=10, random_state=None, shuffle=False), n_jobs=2
INFO - Calculating mean and std
INFO - Creating metrics dataframe
INFO - Uploading results into container
INFO - Uploading model into container now
INFO - create_model_container: 6
INFO - master_model_container: 6
INFO - display_container: 2
INFO - AdaBoostClassifier(algorithm='SAMME.R',

INFO - create_model() succesfully completed......................................
INFO - Creating metrics dataframe
INFO - Initializing Light Gradient Boosting Machine
INFO - Total runtime is 17.149918170770007 minutes
INFO - Initializing create_model()
INFO - create_model(kwargs={}, display=<pycaret.internal.Display.Display object at 0x0000025176457C88>, metrics=None, system=False, verbose=False, refit=False, groups=None, fit_kwargs={}, predict=True, cross_validation=True, round=4, fold=StratifiedKFold(n_splits=10, random_state=None, shuffle=False), estimator=lightgbm)
INFO - Checking exceptions
INFO - Importing libraries
INFO - Copying training dataset
INFO - Defining folds
INFO - Declaring metric variables
INFO - Importing untrained model
INFO - Light Gradient Boosting Machine Imported succesfully
INFO - Starting cross validation
INFO - Cross validating with StratifiedKFold(n_splits=10, random_state=None, shuffle=False), n_jobs=2
INFO - Calculating mean and std
INFO - Creating metri
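The logs show each candidate being cross-validated with `StratifiedKFold(n_splits=10, random_state=None, shuffle=False)`, so every fold keeps roughly the class proportions of the training split. The sketch below shows the idea with a simple round-robin deal per class. It is a hypothetical stand-in (`stratified_folds` is not a library function) and does not reproduce scikit-learn's exact fold assignment.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits=10):
    """Assign each row index to a fold so every fold gets a near-equal share of each class.

    Simplified round-robin scheme: within each class, rows are dealt to folds
    in their original order (no shuffling), continuing where the previous
    class stopped so that fold sizes also stay balanced.
    """
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(n_splits)]
    offset = 0
    for lab in sorted(by_class):
        for k, idx in enumerate(by_class[lab]):
            folds[(offset + k) % n_splits].append(idx)
        offset = (offset + len(by_class[lab])) % n_splits
    return [sorted(fold) for fold in folds]


labels = ["fire"] * 6 + ["flood"] * 3
for fold in stratified_folds(labels, n_splits=3):
    # each fold should hold two "fire" rows and one "flood" row
    print(fold, Counter(labels[i] for i in fold))
```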

In [None]:
# Optionally train a single model instead of taking the compare_models() winner:
# best_model = pc_class.create_model("nb")
# best_model = pc_class.create_model("lightgbm")
# best_model = pc_class.create_model("lr")

In [None]:
# Refit the selected model on the whole dataset passed to setup() (training split + hold-out)
finalized_model = pc_class.finalize_model(best_model)

In [None]:
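y_train = pc_class.get_config("y_train")  # target of the training split, as encoded by setup()
y_train

In [None]:
y = pc_class.get_config("y")  # target of the full dataset, as encoded by setup()
y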

In [None]:
original_labels = train_df[target_column]
original_labels

In [None]:
from collections import Counter

Counter(original_labels)  # number of tweets per original label
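`Counter` gives raw class counts. Turning them into shares also gives the accuracy of always predicting the most frequent class, a floor against which the Accuracy column of the `compare_models()` results can be read. A minimal sketch; `class_shares` and `majority_baseline_accuracy` are hypothetical helpers, and the labels below are toy values, not the dataset's real counts:

```python
from collections import Counter


def class_shares(labels):
    """Fraction of rows in each class, largest class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {lab: n / total for lab, n in counts.most_common()}


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# toy labels for illustration only
toy = ["sympathy_and_support"] * 5 + ["injured_or_dead_people"] * 3 + ["other"] * 2
shares = class_shares(toy)
baseline = majority_baseline_accuracy(toy)
```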

In [None]:
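# Map each encoded target value back to its original label (assumes y and train_df share row order)
labels_map = dict(zip(y, original_labels))
labels_map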
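`dict(zip(y, original_labels))` pairs the two sequences by position and silently keeps the last pair seen for each key, so if `y` and `train_df` were not in the same row order the map would be wrong without any error. A hypothetical helper, `build_labels_map`, that builds the same mapping but fails loudly when the pairing is inconsistent, assuming both inputs are equal-length sequences in matching row order:

```python
def build_labels_map(encoded, original):
    """Pair encoded and original labels by position and verify the pairing is one-to-one."""
    encoded = list(encoded)
    original = list(original)
    if len(encoded) != len(original):
        raise ValueError(f"length mismatch: {len(encoded)} vs {len(original)}")
    mapping = {}
    for code, label in zip(encoded, original):
        # setdefault stores the first label seen for a code and returns the stored value
        if mapping.setdefault(code, label) != label:
            raise ValueError(f"code {code!r} maps to both {mapping[code]!r} and {label!r}")
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("two codes map to the same original label")
    return mapping


example_map = build_labels_map([0, 1, 0, 2], ["fire", "flood", "fire", "quake"])
```

Applied to the notebook's objects this would read `build_labels_map(y, original_labels)`.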

In [None]:
try:
    pc_class.plot_model(finalized_model, "auc")  # ROC curves
except Exception as err:
    print(f"Could not plot model: {err}")

In [None]:
try:
    pc_class.plot_model(finalized_model, "learning")  # learning curve
except Exception as err:
    print(f"Could not plot model: {err}")

In [None]:
try:
    pc_class.plot_model(finalized_model, "confusion_matrix")
except Exception as err:
    print(f"Could not plot model: {err}")

In [None]:
try:
    pc_class.plot_model(finalized_model, "feature")  # feature importance
except Exception as err:
    print(f"Could not plot model: {err}")

In [None]:
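# Without a data argument, predict_model() scores the hold-out split. The finalized model
# was refit on that split too, so these scores are optimistic.
predictions_train = pc_class.predict_model(finalized_model)
predictions_train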

In [None]:
test_vectorized_event_type_df

In [None]:
predictions_test = pc_class.predict_model(finalized_model, data=test_vectorized_event_type_df)
predictions_test
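The Purpose section notes that precision matters when false positives are costly. Once the predicted labels in `predictions_test` are paired with the true labels of `test_vectorized_event_type_df`, per-class precision, recall and F1 can be computed directly. A stdlib sketch with a hypothetical `per_class_prf` helper that takes two plain lists of labels; which columns to pull them from is left open, since the column names are not shown here:

```python
def per_class_prf(y_true, y_pred):
    """Precision, recall and F1 for each class from paired true/predicted labels."""
    classes = sorted(set(y_true) | set(y_pred))
    report = {}
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)  # correctly predicted as cls
        fp = sum(t != cls and p == cls for t, p in pairs)  # wrongly predicted as cls
        fn = sum(t == cls and p != cls for t, p in pairs)  # cls rows predicted as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[cls] = (precision, recall, f1)
    return report


report = per_class_prf(["fire", "fire", "flood", "flood"],
                       ["fire", "flood", "flood", "flood"])
```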

In [None]:
end_time = datetime.now()
end_time.strftime("%Y/%m/%d %H:%M:%S")

In [None]:
duration = end_time - start_time
print("duration :", duration)