**Dataset**
Labeled dataset collected from Twitter (Lab 1 - Hate Speech.tsv)

**Objective**
Separate tweets that contain hate speech from tweets that do not. <br>
0 -> no hate speech <br>
1 -> contains hate speech <br>

**Total Estimated Time = 90-120 Mins**

**Evaluation metric**
macro f1 score

### Import used libraries

In [8]:
pip install contractions



In [9]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import random
import re
import contractions
import string
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import spacy
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity, cosine_distances


In [10]:
nltk.download('stopwords')
nltk.download('punkt')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

### Load Dataset

###### Note: search how to load the data from tsv file

In [11]:
data= pd.read_csv('Lab 1 - Hate Speech.tsv',sep='\t')



  ✉ Preview data

In [12]:
data.head(10)

Unnamed: 0,id,label,tweet
0,1,0,@user when a father is dysfunctional and is so...
1,2,0,@user @user thanks for #lyft credit i can't us...
2,3,0,bihday your majesty
3,4,0,#model i love u take with u all the time in ...
4,5,0,factsguide: society now #motivation
5,6,0,[2/2] huge fan fare and big talking before the...
6,7,0,@user camping tomorrow @user @user @user @user...
7,8,0,the next school year is the year for exams.ð...
8,9,0,we won!!! love the land!!! #allin #cavs #champ...
9,10,0,@user @user welcome here ! i'm it's so #gr8 !


### Data splitting

It is good practice to split the data before EDA. Doing so helps maintain the integrity of the machine learning process, prevents data leakage, simulates real-world conditions more accurately, and gives a reliable estimate of model performance on unseen data.

In [13]:
#Extracting x, y from dataframe
x = data['tweet']
y = data['label']

In [14]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, stratify = y, random_state = 42)
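
The effect of `stratify=y` can be checked on a toy label vector. The sketch below uses made-up labels (90 zeros and 10 ones), not the tweet dataset, and counts the classes on each side of the split.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical, made-up labels: 90% class 0 and 10% class 1
toy_x = list(range(100))
toy_y = [0] * 90 + [1] * 10

toy_x_train, toy_x_test, toy_y_train, toy_y_test = train_test_split(
    toy_x, toy_y, test_size=0.2, stratify=toy_y, random_state=42
)

# With stratification both parts keep the 90/10 class ratio
train_counts = dict(Counter(toy_y_train))
test_counts = dict(Counter(toy_y_test))
print(train_counts, test_counts)
```

Without `stratify`, a rare class can end up over- or under-represented in the test set purely by chance, which makes the evaluation noisier.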

### EDA on training data

- check NaNs

In [15]:
# Concat x train and y train
train_data = pd.DataFrame(pd.concat([x_train, y_train], axis=1))


In [16]:
train_data


Unnamed: 0,tweet,label
9289,@user @user would like to wish you a #fathe...,0
17259,always enjoy life! and be grateful for what yo...,0
437,"kayak, sup, snorkel, swim...whatever your plea...",0
21546,@user what do you think of #alexjones saying #...,1
15968,well i guess i can't join servers mcpe 15.0. #...,0
...,...,...
16664,do you think it's #infected? i think it looks ...,0
24150,@user arise sir mo farah. this is my country y...,1
1837,@user today at @user for @user ðððð...,0
2613,first major spos championship win in 52 years ...,0


In [17]:
pd.isnull(train_data).sum()

tweet    0
label    0
dtype: int64

In [18]:
#No Null Values

- check duplicates

In [19]:
train_data.duplicated().sum()

1794

- show a representative sample of data texts to find out required preprocessing steps

In [20]:
Sample = train_data.sample(n = 20 )
Sample

Unnamed: 0,tweet,label
28411,nra continues to do so much damage out of gree...,0
16229,amen! #selfconfidence #believeinyourself #ch...,0
13700,@user @user in the name of fashion why are the...,0
7635,"so tonight, spagetti bolognaise and scrabble. ...",0
6458,#revol #cuisine #white #porcelain 7 #ounce #...,0
15200,"i despise hilary and sadly, hearing her speak ...",0
13482,literally had the best week ever. chilling in ...,0
16052,thank you lord for this day. keep us safe. .,0
20542,@user my photo's just been accepted by @user &...,0
13618,i am thankful for sunshine. #thankful #positive,0


- check dataset balancing

In [21]:
(train_data['label']== 0).sum()/len(train_data)

0.9298398604724909

In [22]:
(train_data['label']== 1).sum()/len(train_data)

0.07016013952750912

In [23]:
# imbalanced data >> class 0 represents more than 90% of the data


 ♻  Cleaning and Preprocessing Steps are:  
 ---
  1.   Drop Duplicates
  2.   Remove Emojis
  3.   Text Normalization
  4.   Remove usernames and tags
  5.   Remove newlines and tabs
  6.   Remove Extra Spaces
  7.   Remove contractions (expand them to full words)
  8.   Remove Punctuation
  9.   Remove non-ASCII characters
  10.  Remove Stopwords



### Cleaning and Preprocessing

---
⚓ REMOVE DUPLICATES

In [24]:
train_data = train_data.drop_duplicates()
train_data

Unnamed: 0,tweet,label
9289,@user @user would like to wish you a #fathe...,0
17259,always enjoy life! and be grateful for what yo...,0
437,"kayak, sup, snorkel, swim...whatever your plea...",0
21546,@user what do you think of #alexjones saying #...,1
15968,well i guess i can't join servers mcpe 15.0. #...,0
...,...,...
16664,do you think it's #infected? i think it looks ...,0
24150,@user arise sir mo farah. this is my country y...,1
1837,@user today at @user for @user ðððð...,0
2613,first major spos championship win in 52 years ...,0


---
⚓ REMOVE Emojis

In [25]:
def remove_emoji(text):
    # The parameter is named `text` so it does not shadow the imported `string` module
    emoji_pattern = re.compile("["
                           u"\U0001F600-\U0001F64F"  # emoticons
                           u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                           u"\U0001F680-\U0001F6FF"  # transport & map symbols
                           u"\U0001F700-\U0001F77F"  # alchemical symbols
                           u"\U0001F780-\U0001F7FF"  # Geometric Shapes Extended
                           u"\U0001F800-\U0001F8FF"  # Supplemental Arrows-C
                           u"\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
                           u"\U0001FA00-\U0001FA6F"  # Chess Symbols
                           u"\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
                           u"\U00002702-\U000027B0"  # dingbats
                           u"\U000024C2-\U0001F251"  # very broad range: also spans CJK and other non-emoji characters
                           u"\U0001F004-\U0001F9CF"  # overlaps the ranges above
                           "]+", flags=re.UNICODE)
    # Repeated ranges from the original pattern were dropped; the matched set is unchanged
    return emoji_pattern.sub(r'', text)


In [26]:
teweet_temporary = train_data['tweet'].copy()

In [27]:
teweet_temporary

9289      @user @user would like to wish you a   #fathe...
17259    always enjoy life! and be grateful for what yo...
437      kayak, sup, snorkel, swim...whatever your plea...
21546    @user what do you think of #alexjones saying #...
15968    well i guess i can't join servers mcpe 15.0. #...
                               ...                        
16664    do you think it's #infected? i think it looks ...
24150    @user arise sir mo farah. this is my country y...
1837     @user today at @user for @user ðððð...
2613     first major spos championship win in 52 years ...
18539    nobody deserves to be murdered yo, i mean nobo...
Name: tweet, Length: 23434, dtype: object

In [28]:
teweet_temporary= teweet_temporary.astype(str).apply(remove_emoji)


In [29]:
teweet_temporary

9289      @user @user would like to wish you a   #fathe...
17259    always enjoy life! and be grateful for what yo...
437      kayak, sup, snorkel, swim...whatever your plea...
21546    @user what do you think of #alexjones saying #...
15968    well i guess i can't join servers mcpe 15.0. #...
                               ...                        
16664    do you think it's #infected? i think it looks ...
24150    @user arise sir mo farah. this is my country y...
1837     @user today at @user for @user ðððð...
2613     first major spos championship win in 52 years ...
18539    nobody deserves to be murdered yo, i mean nobo...
Name: tweet, Length: 23434, dtype: object
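
The output above is unchanged, and sequences such as `ðððð` are still present. One plausible explanation (an assumption, not something the dataset documents) is that the emojis were stored as mojibake: their UTF-8 bytes were decoded as Latin-1, so they are no longer code points inside the emoji ranges. The sketch below reproduces that effect on a single emoji, using a reduced one-range pattern.

```python
import re

# One emoji range is enough for the demonstration
emoji_pattern = re.compile("[\U0001F600-\U0001F64F]+")

original = "\U0001F600"  # grinning face emoji
# Simulate the suspected corruption: UTF-8 bytes read back as Latin-1
garbled = original.encode("utf-8").decode("latin-1")

print(repr(garbled))  # four Latin-1 characters, the first one is 'ð'

# The emoji pattern removes the real emoji but not the garbled form
cleaned_original = emoji_pattern.sub("", original)
cleaned_garbled = emoji_pattern.sub("", garbled)

# A non-ASCII filter does remove the garbled characters
ascii_only = re.sub(r"[^\x00-\x7F]+", "", garbled)
```

This would be consistent with the later non-ASCII removal step being the one that finally clears these characters.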

---
⚓ Text Normalization

In [30]:
#Lower casing
def lower_case(text):
  text_lowercase = text.str.lower()
  return text_lowercase

In [31]:
teweet_temporary = lower_case(teweet_temporary)
teweet_temporary

9289      @user @user would like to wish you a   #fathe...
17259    always enjoy life! and be grateful for what yo...
437      kayak, sup, snorkel, swim...whatever your plea...
21546    @user what do you think of #alexjones saying #...
15968    well i guess i can't join servers mcpe 15.0. #...
                               ...                        
16664    do you think it's #infected? i think it looks ...
24150    @user arise sir mo farah. this is my country y...
1837     @user today at @user for @user ðððð...
2613     first major spos championship win in 52 years ...
18539    nobody deserves to be murdered yo, i mean nobo...
Name: tweet, Length: 23434, dtype: object

Note >> Stemming would be the next normalization step, to be applied after removing contractions. It is not applied in this notebook.

---
⚓ Remove User Names and Tags

In [32]:
def remove_usernames(text):
    # Remove user names starting with '@'
    return re.sub(r'@\w+\b', '', text)

In [33]:
def remove_tags(text):
    # Remove tags starting with '#'
    return re.sub(r'#\w+\b', '', text)

In [34]:
# Apply Remove user names
teweet_temporary= teweet_temporary.astype(str).apply(remove_usernames)
teweet_temporary

9289        would like to wish you a   #father's day :)...
17259    always enjoy life! and be grateful for what yo...
437      kayak, sup, snorkel, swim...whatever your plea...
21546     what do you think of #alexjones saying #drain...
15968    well i guess i can't join servers mcpe 15.0. #...
                               ...                        
16664    do you think it's #infected? i think it looks ...
24150     arise sir mo farah. this is my country you #i...
1837      today at  for  ðððððâ¤ï¸ #co...
2613     first major spos championship win in 52 years ...
18539    nobody deserves to be murdered yo, i mean nobo...
Name: tweet, Length: 23434, dtype: object

In [35]:
# Apply Remove tags
teweet_temporary= teweet_temporary.astype(str).apply(remove_tags)
teweet_temporary

9289            would like to wish you a   's day :),     
17259    always enjoy life! and be grateful for what yo...
437      kayak, sup, snorkel, swim...whatever your plea...
21546     what do you think of  saying ?   you had good...
15968    well i guess i can't join servers mcpe 15.0.  ...
                               ...                        
16664    do you think it's ? i think it looks like an   . 
24150     arise sir mo farah. this is my country you   ...
1837      today at  for  ðððððâ¤ï¸       
2613        first major spos championship win in 52 years 
18539    nobody deserves to be murdered yo, i mean nobo...
Name: tweet, Length: 23434, dtype: object
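
Removing the whole hashtag also removes the word it carries, and `\w+` stops at an apostrophe, which is why `#father's` leaves `'s` behind in the output above. A possible alternative, shown here only as a sketch and not used in this notebook, is to drop just the `#` symbol and keep the word. `strip_hash_symbol` is a hypothetical helper name.

```python
import re

def strip_hash_symbol(text):
    # Keep the hashtag's word, drop only the leading '#'
    return re.sub(r'#(\w+)', r'\1', text)

example = "happy #father's day #love"
stripped = strip_hash_symbol(example)
print(stripped)
```

Whether hashtag words help or hurt the classifier is an empirical question, so this variant would need to be compared against the original on the validation metric.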

---
⚓ Remove Lines & Tabs & Extra Spaces

In [36]:
def remove_lines_tabs_extraspaces(text):
    # Replace newlines and tabs with a single space
    text = re.sub(r'[\t\n\r]+', ' ', text)
    # Replace multiple spaces with a single space
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

In [37]:
# Apply remove_lines_tabs_extraspaces
teweet_temporary= teweet_temporary.astype(str).apply(remove_lines_tabs_extraspaces)
teweet_temporary

9289                   would like to wish you a 's day :),
17259    always enjoy life! and be grateful for what yo...
437      kayak, sup, snorkel, swim...whatever your plea...
21546    what do you think of saying ? you had good vid...
15968         well i guess i can't join servers mcpe 15.0.
                               ...                        
16664       do you think it's ? i think it looks like an .
24150    arise sir mo farah. this is my country you hat...
1837               today at for ðððððâ¤ï¸
2613         first major spos championship win in 52 years
18539    nobody deserves to be murdered yo, i mean nobody.
Name: tweet, Length: 23434, dtype: object



---
⚓ Remove contractions (expand them to full words)

In [39]:
def remove_contractions(text):
    preprocessed_text = contractions.fix(text)
    return preprocessed_text


In [40]:
# Apply remove_contractions
teweet_temporary= teweet_temporary.apply(remove_contractions)
teweet_temporary

9289                   would like to wish you a 's day :),
17259    always enjoy life! and be grateful for what yo...
437      kayak, sup, snorkel, swim...whatever your plea...
21546    what do you think of saying ? you had good vid...
15968        well i guess i cannot join servers mcpe 15.0.
                               ...                        
16664      do you think it is ? i think it looks like an .
24150    arise sir mo farah. this is my country you hat...
1837               today at for ðððððâ¤ï¸
2613         first major spos championship win in 52 years
18539    nobody deserves to be murdered yo, i mean nobody.
Name: tweet, Length: 23434, dtype: object

In [41]:
def remove_punctuation(text):
  output_text = text.translate(str.maketrans('', '', string.punctuation))
  return output_text

In [42]:
# Apply remove_punctuation
teweet_temporary= teweet_temporary.apply(remove_punctuation)
teweet_temporary

9289                       would like to wish you a s day 
17259    always enjoy life and be grateful for what you...
437      kayak sup snorkel swimwhatever your pleasure w...
21546    what do you think of saying  you had good vide...
15968          well i guess i cannot join servers mcpe 150
                               ...                        
16664        do you think it is  i think it looks like an 
24150    arise sir mo farah this is my country you hati...
1837               today at for ðððððâ¤ï¸
2613         first major spos championship win in 52 years
18539      nobody deserves to be murdered yo i mean nobody
Name: tweet, Length: 23434, dtype: object
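
Deleting punctuation outright can glue neighbouring words together: in the output above `swim...whatever` became `swimwhatever` and `15.0` became `150`. One alternative, assuming it is acceptable to treat every punctuation mark as a word separator, maps punctuation to spaces and then collapses the extra whitespace. `punctuation_to_space` is a hypothetical helper, not part of the notebook.

```python
import re
import string

# Translation table that maps every ASCII punctuation character to a space
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

def punctuation_to_space(text):
    spaced = text.translate(_PUNCT_TO_SPACE)
    # Collapse the runs of spaces created by the replacement
    return re.sub(r'\s+', ' ', spaced).strip()

spaced_example = punctuation_to_space("kayak, sup, snorkel, swim...whatever")
print(spaced_example)
```

The trade-off is that tokens such as `15.0` are split into two pieces instead of being merged into one.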

In [43]:
# Remove all non-ASCII characters that still exist in the records

def remove_non_ascii(text):
    text = re.sub(r'[^\x00-\x7F]+', '', text)
    return text

In [44]:
teweet_temporary = teweet_temporary.apply(remove_non_ascii)
teweet_temporary

9289                       would like to wish you a s day 
17259    always enjoy life and be grateful for what you...
437      kayak sup snorkel swimwhatever your pleasure w...
21546    what do you think of saying  you had good vide...
15968          well i guess i cannot join servers mcpe 150
                               ...                        
16664        do you think it is  i think it looks like an 
24150    arise sir mo farah this is my country you hati...
1837                                         today at for 
2613         first major spos championship win in 52 years
18539      nobody deserves to be murdered yo i mean nobody
Name: tweet, Length: 23434, dtype: object

In [45]:
stop_words = set(stopwords.words('english'))

def remove_stop_words(text):
    words = word_tokenize(text)
    # Remove stop words
    filtered_words = [word for word in words if word.lower() not in stop_words]
    # Reconstruct the text without stop words
    return ' '.join(filtered_words)


In [46]:
teweet_temporary= teweet_temporary.apply(remove_stop_words)

In [47]:
teweet_temporary

9289                                   would like wish day
17259                           always enjoy life grateful
437      kayak sup snorkel swimwhatever pleasure put to...
21546                         think saying good videos got
15968                     well guess join servers mcpe 150
                               ...                        
16664                               think think looks like
24150             arise sir mo farah country hating chumps
1837                                                 today
2613            first major spos championship win 52 years
18539              nobody deserves murdered yo mean nobody
Name: tweet, Length: 23434, dtype: object

Finally, assign the fully preprocessed temporary series back to the training variables.

In [48]:
x_train  = teweet_temporary

In [49]:
y_train = train_data['label']


### ⚓ Same preprocessing for test data

In [50]:
# Concat x test and y test
test_data = pd.DataFrame(pd.concat([x_test, y_test], axis=1))


In [51]:
tweet_temporary_test = test_data['tweet'].copy()

In [52]:
tweet_temporary_test = tweet_temporary_test.astype(str).apply(remove_emoji)


In [53]:
tweet_temporary_test = lower_case(tweet_temporary_test)
tweet_temporary_test

19568    father's day. daddy you will always be ma hero...
11539    i finally found a way how to delete old tweets...
31056    @user #jeffsessions thinks we won't notice as ...
18986    full week now and end is not in sight yet. thi...
3283                                           shut up rat
                               ...                        
18313                      @user @user @user got my ticket
1449     i laugh too much, it's problematic... it affec...
12185    whistling while i #workout hehe! ð join me ...
16596    crack corn. ó¾®ó¾¢ó¾  8.5x11" bic pen, outc...
28090    photo of d day: when children r used as aisans...
Name: tweet, Length: 6307, dtype: object

In [54]:
# Apply Remove user names
tweet_temporary_test= tweet_temporary_test.astype(str).apply(remove_usernames)
tweet_temporary_test

19568    father's day. daddy you will always be ma hero...
11539    i finally found a way how to delete old tweets...
31056     #jeffsessions thinks we won't notice as he re...
18986    full week now and end is not in sight yet. thi...
3283                                           shut up rat
                               ...                        
18313                                        got my ticket
1449     i laugh too much, it's problematic... it affec...
12185    whistling while i #workout hehe! ð join me ...
16596    crack corn. ó¾®ó¾¢ó¾  8.5x11" bic pen, outc...
28090    photo of d day: when children r used as aisans...
Name: tweet, Length: 6307, dtype: object

In [55]:
# Apply Remove tags
tweet_temporary_test= tweet_temporary_test.astype(str).apply(remove_tags)
tweet_temporary_test

19568    father's day. daddy you will always be ma hero...
11539    i finally found a way how to delete old tweets...
31056      thinks we won't notice as he redacts decades...
18986    full week now and end is not in sight yet. thi...
3283                                           shut up rat
                               ...                        
18313                                        got my ticket
1449     i laugh too much, it's problematic... it affec...
12185    whistling while i  hehe! ð join me for a  s...
16596    crack corn. ó¾®ó¾¢ó¾  8.5x11" bic pen, outc...
28090    photo of d day: when children r used as aisans...
Name: tweet, Length: 6307, dtype: object

In [56]:
# Apply remove_lines_tabs_extraspaces
tweet_temporary_test= tweet_temporary_test.astype(str).apply(remove_lines_tabs_extraspaces)
tweet_temporary_test

19568    father's day. daddy you will always be ma hero...
11539    i finally found a way how to delete old tweets...
31056    thinks we won't notice as he redacts decades o...
18986    full week now and end is not in sight yet. thi...
3283                                           shut up rat
                               ...                        
18313                                        got my ticket
1449     i laugh too much, it's problematic... it affec...
12185    whistling while i hehe! ð join me for a ses...
16596    crack corn. ó¾®ó¾¢ó¾  8.5x11" bic pen, outc...
28090    photo of d day: when children r used as aisans...
Name: tweet, Length: 6307, dtype: object

In [57]:
# Apply remove_contractions
tweet_temporary_test= tweet_temporary_test.apply(remove_contractions)
tweet_temporary_test

19568    father's day. daddy you will always be ma hero...
11539    i finally found a way how to delete old tweets...
31056    thinks we will not notice as he redacts decade...
18986    full week now and end is not in sight yet. thi...
3283                                           shut up rat
                               ...                        
18313                                        got my ticket
1449     i laugh too much, it is problematic... it affe...
12185    whistling while i hehe! ð join me for a ses...
16596    crack corn. ó¾®ó¾¢ó¾  8.5x11" bic pen, outc...
28090    photo of d day: when children r used as aisans...
Name: tweet, Length: 6307, dtype: object

In [58]:
# Apply remove_punctuation
tweet_temporary_test= tweet_temporary_test.apply(remove_punctuation)
tweet_temporary_test

19568    fathers day daddy you will always be ma hero r...
11539    i finally found a way how to delete old tweets...
31056    thinks we will not notice as he redacts decade...
18986    full week now and end is not in sight yet this...
3283                                           shut up rat
                               ...                        
18313                                        got my ticket
1449     i laugh too much it is problematic it affects ...
12185    whistling while i hehe ð join me for a sess...
16596    crack corn ó¾®ó¾¢ó¾  85x11 bic pen outcast â¦
28090    photo of d day when children r used as aisans ...
Name: tweet, Length: 6307, dtype: object

In [59]:
tweet_temporary_test = tweet_temporary_test.apply(remove_non_ascii)
tweet_temporary_test

19568    fathers day daddy you will always be ma hero r...
11539    i finally found a way how to delete old tweets...
31056    thinks we will not notice as he redacts decade...
18986    full week now and end is not in sight yet this...
3283                                           shut up rat
                               ...                        
18313                                        got my ticket
1449     i laugh too much it is problematic it affects ...
12185    whistling while i hehe  join me for a session ...
16596                  crack corn   85x11 bic pen outcast 
28090    photo of d day when children r used as aisans ...
Name: tweet, Length: 6307, dtype: object

In [60]:
tweet_temporary_test= tweet_temporary_test.apply(remove_stop_words)

In [61]:
x_test  = tweet_temporary_test
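
The test set goes through the same sequence of calls as the training set. To keep the two from drifting apart, the steps could be collected once and applied in order. The sketch below is one way to do that; `apply_steps` and the two stand-in steps are hypothetical, and in the notebook the list would hold the cleaning functions defined earlier.

```python
import re

def apply_steps(texts, steps):
    # Apply each string -> string step, in order, to every text
    cleaned = []
    for text in texts:
        for step in steps:
            text = step(text)
        cleaned.append(text)
    return cleaned

# Stand-in steps for illustration only
def drop_mentions(text):
    return re.sub(r'@\w+\b', '', text)

def squeeze_spaces(text):
    return re.sub(r'\s+', ' ', text).strip()

steps = [str.lower, drop_mentions, squeeze_spaces]
cleaned_examples = apply_steps(["@user  Got MY ticket", "Shut up   rat"], steps)
print(cleaned_examples)
```

Because the order of the steps lives in a single list, changing it for the training data changes it for the test data as well.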

#### Extra: use custom scikit-learn Transformers

Custom scikit-learn transformers give you flexibility, reusability, and control over the data transformation process. They plug into scikit-learn's pipelines, so multiple preprocessing steps and the model can be combined into a single workflow. This makes your code more modular, readable, and easier to maintain.

##### link: https://www.andrewvillazon.com/custom-scikit-learn-transformers/

#### Example usage:

In [62]:
from sklearn.base import BaseEstimator, TransformerMixin

class CustomTransformer(BaseEstimator, TransformerMixin):
    # Stateless transformer: it takes no parameters and learns nothing in fit

    def fit(self, X, y=None):
        # Nothing to learn from the data
        return self

    def transform(self, X):
        # X is expected to be a pandas Series (or list) of raw tweet strings
        transformed_X = pd.Series(X).astype(str)

        # Do all the needed transformations and data preprocessing here.
        # lower_case works on the whole Series; the other helpers work on one
        # string at a time, so they are applied element-wise
        transformed_X = transformed_X.apply(remove_emoji)
        transformed_X = lower_case(transformed_X)
        transformed_X = transformed_X.apply(remove_usernames)
        transformed_X = transformed_X.apply(remove_tags)
        transformed_X = transformed_X.apply(remove_lines_tabs_extraspaces)
        transformed_X = transformed_X.apply(remove_contractions)
        transformed_X = transformed_X.apply(remove_punctuation)
        transformed_X = transformed_X.apply(remove_non_ascii)
        transformed_X = transformed_X.apply(remove_stop_words)
        return transformed_X

    # fit_transform is inherited from TransformerMixin (fit followed by transform)

**You  are doing Great so far!**

### Modelling

#### Extra: use scikit-learn pipeline

##### link: https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html

Using pipelines in scikit-learn promotes better code organization, reproducibility, and efficiency in machine learning workflows.

#### Example usage:

In [63]:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer

model = LogisticRegression()

# Create the pipeline
pipeline = Pipeline(steps=[
    ('preprocessing', CustomTransformer()),
    ('vectorizing', CountVectorizer()),
    ('model', model),
])

# Now you can use the pipeline for training and prediction.
# Preprocessing is a pipeline step, so pass the raw (uncleaned) tweets:
# pipeline.fit(X_train, y_train)
# pipeline.predict(X_test)

In [64]:
class_weights ={ 0: 0.1, 1 : 0.9}

In [65]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

vec = CountVectorizer()

In [66]:
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
model = LogisticRegression(class_weight= 'balanced')

pipe = make_pipeline(vec, model)
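
`class_weight='balanced'` tells scikit-learn to weight each class inversely to its frequency, using `n_samples / (n_classes * count_of_class)`. A small sketch of that formula on made-up labels with roughly the 93/7 imbalance seen earlier (`balanced_weights` is a hypothetical helper, not a scikit-learn function):

```python
from collections import Counter

def balanced_weights(labels):
    # weight(c) = n_samples / (n_classes * number of samples in class c)
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

toy_labels = [0] * 93 + [1] * 7  # made-up labels, not the tweet data
weights = balanced_weights(toy_labels)
print(weights)
```

On these toy labels the minority class receives a weight about 13 times larger than the majority class. Note that the manual `class_weights` dictionary defined in the earlier cell is not passed to the model; the model uses `'balanced'` instead.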


In [67]:

pipe.fit(x_train,y_train)

#### Evaluation

**Evaluation metric:**
macro f1 score

Macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of its size. This makes it a useful metric for evaluating the overall performance of a classifier (binary or multi-class), **particularly when the classes are imbalanced**

![Calculation](https://assets-global.website-files.com/5d7b77b063a9066d83e1209c/639c3d934e82c1195cdf3c60_macro-f1.webp)
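
As a worked example, macro F1 can be computed by hand as the plain average of the per-class F1 scores. The labels below are made up for illustration, and `f1_for_class` and `macro_f1` are hypothetical helpers. In scikit-learn the same quantity corresponds to `f1_score(..., average='macro')`, whereas the default `average='binary'` reports only the positive class.

```python
def f1_for_class(y_true, y_pred, cls):
    # Count true positives, false positives and false negatives for one class
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    # Unweighted mean over all classes seen in the labels or predictions
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

f1_class0 = f1_for_class(y_true, y_pred, 0)
f1_class1 = f1_for_class(y_true, y_pred, 1)
macro = macro_f1(y_true, y_pred)
print(f1_class0, f1_class1, macro)
```

Here the positive-class F1 is lower than the macro F1, which shows why the two numbers should not be confused when reporting results.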

In [68]:
from sklearn import metrics

def print_report(pipe, x_test, y_test):
    y_pred = pipe.predict(x_test)
    report = metrics.classification_report(y_test, y_pred)
    print(report)
    # average='macro' is required: the default (average='binary') returns only the F1 of class 1
    print("macro f1 score: {:0.2f}".format(metrics.f1_score(y_test, y_pred, average='macro')))

print_report(pipe, x_test, y_test)


              precision    recall  f1-score   support

           0       0.97      0.95      0.96      5864
           1       0.49      0.64      0.55       443

    accuracy                           0.93      6307
   macro avg       0.73      0.79      0.76      6307
weighted avg       0.94      0.93      0.93      6307

macro f1 score: 0.76


### Enhancement

- Using different N-grams
- Using different text representation technique
- Hyperparameter tuning

In [69]:
from sklearn.svm import LinearSVC

vec2 = TfidfVectorizer(analyzer='char_wb', ngram_range=(10, 50), min_df=.01, max_df=.3)
clf = LinearSVC()
pipe_tfidf = make_pipeline(vec2, clf)
pipe_tfidf.fit(x_train,y_train)
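
To see what features this vectorizer can produce, its analyzer can be inspected on a short string. With `analyzer='char_wb'`, n-grams are built inside each whitespace-padded word, so a word shorter than 10 characters yields a single token (the padded word itself) and only longer words yield real 10-to-50 character n-grams. The sketch below assumes the scikit-learn behaviour just described; the sample strings are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Same analyzer settings as vec2; no fitting is needed to inspect the tokens
analyzer = TfidfVectorizer(analyzer='char_wb', ngram_range=(10, 50)).build_analyzer()

# Short words: each one is emitted once, padded with spaces
short_tokens = analyzer("shut up rat")
print(short_tokens)

# A longer word produces several overlapping character n-grams
long_tokens = analyzer("championship")
print(long_tokens)
```

Combined with `min_df=.01`, only tokens that occur in at least 1% of the training tweets are kept, which probably leaves a small vocabulary. This is a plausible reason for the weak results that follow, though the notebook does not verify it.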

In [70]:
from sklearn import metrics

def print_report(pipe, x_test, y_test):
    # Use the pipeline passed in as an argument rather than the global pipe_tfidf
    y_pred = pipe.predict(x_test)
    report = metrics.classification_report(y_test, y_pred)
    print(report)
    # average='macro' is required: the default (average='binary') returns only the F1 of class 1
    print("macro f1 score: {:0.2f}".format(metrics.f1_score(y_test, y_pred, average='macro')))

print_report(pipe_tfidf, x_test, y_test)


              precision    recall  f1-score   support

           0       0.93      1.00      0.96      5864
           1       0.00      0.00      0.00       443

    accuracy                           0.93      6307
   macro avg       0.46      0.50      0.48      6307
weighted avg       0.86      0.93      0.90      6307

macro f1 score: 0.48




### Conclusion and final results


* The baseline (CountVectorizer + class-balanced LogisticRegression) reaches a macro-average F1 of 0.76: F1 is 0.96 for class 0 and 0.55 for class 1 (precision 0.49, recall 0.64).
* The TF-IDF character n-gram + LinearSVC model appears to predict class 0 for every test tweet, and its macro-average F1 drops to 0.48.
* For that model, recall for class 1 is 0.00: none of the 443 actual class 1 tweets are correctly predicted as class 1. Precision for class 1 is reported as 0.00 as well, most likely because no tweet was predicted as class 1 at all.
* Its precision for class 0 (0.93) matches the share of class 0 in the test set, and its F1-score for class 0 (0.96) reflects that dominance rather than any real discriminative ability.
* Overall accuracy is 93% for both models, but this is largely driven by the dominance of class 0 in the dataset, which is why macro F1 is the metric to compare them on.


#### Done!