In this notebook I train models on data from <br> https://www.kaggle.com/datasets/ujjwalaggarwal402/medicine-dataset/data <br>
to predict the category of a medicine (antidiabetic, antibiotic, etc.). <br>
I intend to use the following classifiers:
<ul>
<li>Logistic regression</li>
<li>K nearest neighbors</li>
<li>Decision tree</li>
<li>Random forests</li>
<li>Naive Bayes classifier</li>
<li>Support vector machine</li>
</ul>
I intend to train the models on different amounts of input data and then compare the classifiers' results as a function of the training set size.
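Comparing scores across training set sizes is closely related to a learning curve. The block below is a minimal sketch of that related technique using scikit-learn's `learning_curve`, not the approach taken in this notebook. The synthetic data from `make_classification`, the single decision tree, and the chosen sizes are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; this is not the medicine dataset).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=0
)

# Absolute training set sizes; with cv=3 each training fold holds 200 rows.
sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(max_depth=3, random_state=0),
    X,
    y,
    train_sizes=[20, 100, 200],
    cv=3,
)

# One row of test_scores per training size, one column per CV fold.
for n, fold_scores in zip(sizes, test_scores):
    print(f"train size {n}: mean CV accuracy {fold_scores.mean():.3f}")
```

Unlike a single `train_test_split` per size, this averages over cross-validation folds, so differences between sizes are less sensitive to one particular random split.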

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px

In [None]:
# Load the Kaggle dataset after connecting the notebook to Google Drive
# https://www.kaggle.com/datasets/ujjwalaggarwal402/medicine-dataset/data
df_raw = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/machine_learning/ML_Projects/data/medicine_dataset.csv')
df_raw.head()

Unnamed: 0,Name,Category,Dosage Form,Strength,Manufacturer,Indication,Classification
0,Acetocillin,Antidiabetic,Cream,938 mg,Roche Holding AG,Virus,Over-the-Counter
1,Ibuprocillin,Antiviral,Injection,337 mg,CSL Limited,Infection,Over-the-Counter
2,Dextrophen,Antibiotic,Ointment,333 mg,Johnson & Johnson,Wound,Prescription
3,Clarinazole,Antifungal,Syrup,362 mg,AbbVie Inc.,Pain,Prescription
4,Amoxicillin,Antifungal,Tablet,802 mg,Teva Pharmaceutical Industries Ltd.,Wound,Over-the-Counter


In [None]:
df = df_raw.copy()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Name            50000 non-null  object
 1   Category        50000 non-null  object
 2   Dosage Form     50000 non-null  object
 3   Strength        50000 non-null  object
 4   Manufacturer    50000 non-null  object
 5   Indication      50000 non-null  object
 6   Classification  50000 non-null  object
dtypes: object(7)
memory usage: 2.7+ MB


In [None]:
df.describe(include='object')

Unnamed: 0,Name,Category,Dosage Form,Strength,Manufacturer,Indication,Classification
count,50000,50000,50000,50000,50000,50000,50000
unique,64,8,8,999,20,8,2
top,Metostatin,Antidepressant,Inhaler,347 mg,Boehringer Ingelheim GmbH,Infection,Over-the-Counter
freq,860,6354,6364,77,2587,6393,25015


In [None]:
df = df.drop_duplicates()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Name            50000 non-null  object
 1   Category        50000 non-null  object
 2   Dosage Form     50000 non-null  object
 3   Strength        50000 non-null  object
 4   Manufacturer    50000 non-null  object
 5   Indication      50000 non-null  object
 6   Classification  50000 non-null  object
dtypes: object(7)
memory usage: 2.7+ MB


In [None]:
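# Keep only the modelling columns; .copy() avoids SettingWithCopyWarning on later assignments
df = df[['Category', 'Dosage Form', 'Strength', 'Indication', 'Classification']].copy()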
df.head()

Unnamed: 0,Category,Dosage Form,Strength,Indication,Classification
0,Antidiabetic,Cream,938 mg,Virus,Over-the-Counter
1,Antiviral,Injection,337 mg,Infection,Over-the-Counter
2,Antibiotic,Ointment,333 mg,Wound,Prescription
3,Antifungal,Syrup,362 mg,Pain,Prescription
4,Antifungal,Tablet,802 mg,Wound,Over-the-Counter


In [None]:
df['Indication'].value_counts()

Unnamed: 0_level_0,count
Indication,Unnamed: 1_level_1
Infection,6393
Fungus,6294
Virus,6292
Wound,6268
Fever,6246
Depression,6173
Diabetes,6171
Pain,6163


In [None]:
df['Category'].value_counts()

Unnamed: 0_level_0,count
Category,Unnamed: 1_level_1
Antidepressant,6354
Analgesic,6340
Antiseptic,6315
Antifungal,6289
Antipyretic,6280
Antiviral,6185
Antidiabetic,6171
Antibiotic,6066


In [None]:
mapped_values = {
    'Antidepressant': 0,
    'Analgesic': 1,
    'Antiseptic': 2,
    'Antifungal': 3,
    'Antipyretic': 4,
    'Antiviral': 5,
    'Antidiabetic': 6,
    'Antibiotic': 7
    }
df['Category'] = df['Category'].map(mapped_values)
df.head()

Unnamed: 0,Category,Dosage Form,Strength,Indication,Classification
0,6,Cream,938 mg,Virus,Over-the-Counter
1,5,Injection,337 mg,Infection,Over-the-Counter
2,7,Ointment,333 mg,Wound,Prescription
3,3,Syrup,362 mg,Pain,Prescription
4,3,Tablet,802 mg,Wound,Over-the-Counter
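Once the categories are mapped to integers, the classification reports show only the codes 0 to 7. A minimal sketch of inverting the mapping so that codes can be read back as category names; `code_to_category` and `target_names` are illustrative names that do not appear in the notebook.

```python
# Same mapping as in the cell above.
mapped_values = {
    'Antidepressant': 0,
    'Analgesic': 1,
    'Antiseptic': 2,
    'Antifungal': 3,
    'Antipyretic': 4,
    'Antiviral': 5,
    'Antidiabetic': 6,
    'Antibiotic': 7,
}

# Invert the mapping: integer code -> category name.
code_to_category = {code: name for name, code in mapped_values.items()}

# Names ordered by code, suitable for the target_names argument
# of sklearn.metrics.classification_report.
target_names = [code_to_category[code] for code in sorted(code_to_category)]
print(target_names)
```

Passing `target_names=target_names` to `classification_report` would label each row with the category name instead of its integer code.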


In [None]:
df['Strength'] = df['Strength'].str.replace(' mg', '').astype(int)
df = pd.get_dummies(df, columns=['Indication', 'Dosage Form', 'Classification'], drop_first=True)
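To make the effect of the two preprocessing lines above easier to see, here is a minimal sketch on a three-row toy frame built from values in the first rows of the dataset. The names `toy` and `toy_encoded` are illustrative, and only one categorical column is encoded to keep the output small.

```python
import pandas as pd

# Toy frame with values taken from the first rows of the dataset.
toy = pd.DataFrame({
    'Strength': ['938 mg', '337 mg', '333 mg'],
    'Dosage Form': ['Cream', 'Injection', 'Ointment'],
})

# Strip the unit suffix and convert the remaining digits to integers.
toy['Strength'] = toy['Strength'].str.replace(' mg', '', regex=False).astype(int)

# One-hot encode; drop_first=True drops the first category in sorted order
# ('Cream' here), which then acts as the implicit reference level.
toy_encoded = pd.get_dummies(toy, columns=['Dosage Form'], drop_first=True)
print(toy_encoded.columns.tolist())
```

A row with every dummy column set to False therefore represents the dropped reference category rather than a missing value.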

In [None]:
df.head()

Unnamed: 0,Category,Strength,Indication_Diabetes,Indication_Fever,Indication_Fungus,Indication_Infection,Indication_Pain,Indication_Virus,Indication_Wound,Dosage Form_Cream,Dosage Form_Drops,Dosage Form_Inhaler,Dosage Form_Injection,Dosage Form_Ointment,Dosage Form_Syrup,Dosage Form_Tablet,Classification_Prescription
0,6,938,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False
1,5,337,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False
2,7,333,False,False,False,False,False,False,True,False,False,False,False,True,False,False,True
3,3,362,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True
4,3,802,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False


In [None]:
# Set Category as the prediction target, then convert the remaining columns to float.
target = df.pop('Category')
features = df.astype(float)
features.head()

Unnamed: 0,Strength,Indication_Diabetes,Indication_Fever,Indication_Fungus,Indication_Infection,Indication_Pain,Indication_Virus,Indication_Wound,Dosage Form_Cream,Dosage Form_Drops,Dosage Form_Inhaler,Dosage Form_Injection,Dosage Form_Ointment,Dosage Form_Syrup,Dosage Form_Tablet,Classification_Prescription
0,938.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,337.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
2,333.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
3,362.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0
4,802.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


### Function that builds the classifiers and compares their scores.

In [None]:
def complete_scores(features, target, train_size):

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import classification_report


    classification_scores = {}
    features_train, features_test, target_train, target_test = train_test_split(features, target, train_size=train_size)

    print('features_train shape:', features_train.shape)
    print('features_test shape:', features_test.shape)
    print('target_train shape:', target_train.shape)
    print('target_test shape:', target_test.shape)
    print()
    print()

    # Logistic regression
    lr = LogisticRegression()
    lr_params = {
        'max_iter': [5000, 7500],
        'solver': ['lbfgs', 'newton-cg', 'sag', 'saga']
    }
    lr_grid = GridSearchCV(lr, lr_params, cv=3)
    lr_grid.fit(features_train, target_train)
    lr_pred = lr_grid.predict(features_test)
    lr_score = lr_grid.score(features_test, target_test)
    classification_scores['Logistic regression'] = lr_score
    print('************************')
    print(f'Logistic regression best params: {lr_grid.best_params_}')
    print('************************')
    print('Logistic regression classification report')
    print(classification_report(target_test, lr_pred))
    print('************************')
    print('************************')
    print('************************')

    # KNN
    knn = KNeighborsClassifier()
    knn_params = {
        'n_neighbors': range(3, 7),
    }
    knn_grid = GridSearchCV(knn, knn_params, cv=3)
    knn_grid.fit(features_train, target_train)
    knn_pred = knn_grid.predict(features_test)
    knn_score = knn_grid.score(features_test, target_test)
    classification_scores['KNN'] = knn_score
    print('************************')
    print(f'KNN best params: {knn_grid.best_params_}')
    print('************************')
    print('KNN classification report')
    print(classification_report(target_test, knn_pred))
    print('************************')
    print('************************')
    print('************************')

    # Decision tree
    dt = DecisionTreeClassifier()
    dt_params = {
        'criterion': ['gini', 'entropy'],
        'max_depth': range(3, 10),
        'min_samples_split': range(2, 6),
        'min_samples_leaf': range(1, 6)
    }
    dt_grid = GridSearchCV(dt, dt_params, cv=3)
    dt_grid.fit(features_train, target_train)
    dt_pred = dt_grid.predict(features_test)
    dt_score = dt_grid.score(features_test, target_test)
    classification_scores['Decision Tree'] = dt_score
    print('************************')
    print(f'Decision tree best params: {dt_grid.best_params_}')
    print('************************')
    print('Decision tree classification report')
    print(classification_report(target_test, dt_pred))
    print('************************')
    print('************************')
    print('************************')

    # Random forest
    rf = RandomForestClassifier()
    rf_params = {
        'criterion': ['gini', 'entropy'],
        'n_estimators': range(50, 101, 10),
        'max_depth': range(3, 10),
        'min_samples_split': range(3, 6),
        'min_samples_leaf': range(3, 6)
    }
    rf_grid = GridSearchCV(rf, rf_params, cv=3)
    rf_grid.fit(features_train, target_train)
    rf_pred = rf_grid.predict(features_test)
    rf_score = rf_grid.score(features_test, target_test)
    classification_scores['Random Forest'] = rf_score
    print('************************')
    print(f'Random forest best params: {rf_grid.best_params_}')
    print('************************')
    print('Random forest classification report')
    print(classification_report(target_test, rf_pred))
    print('************************')
    print('************************')
    print('************************')

    # Naive Bayes
    bayes = GaussianNB()
    bayes.fit(features_train, target_train)
    bayes_pred = bayes.predict(features_test)
    bayes_score = bayes.score(features_test, target_test)
    classification_scores['Naive Bayes'] = bayes_score
    print('************************')
    print('Naive Bayes classification report')
    print(classification_report(target_test, bayes_pred))
    print('************************')
    print('************************')
    print('************************')

    # Support Vector Machine
    svc = SVC()
    svc_params = {
        'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    }
    # From previous experience, this classifier takes a long time to fit,
    # so only 2 cross-validation folds are used.
    svc_grid = GridSearchCV(svc, svc_params, cv=2)
    svc_grid.fit(features_train, target_train)
    svc_pred = svc_grid.predict(features_test)
    svc_score = svc_grid.score(features_test, target_test)
    classification_scores['Support Vector Machine'] = svc_score
    print('************************')
    print(f'Support vector machine best params: {svc_grid.best_params_}')
    print('************************')
    print('Support vector machine classification report')
    print(classification_report(target_test, svc_pred))
    print('************************')
    print('************************')
    print('************************')

    return classification_scores
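The function above repeats the same fit, predict, and score steps for every classifier. A minimal sketch of the same pattern as a loop over (estimator, parameter grid) pairs might look like the following. The helper name `score_models`, the reduced grids, the fixed `random_state`, the stratified split, and the synthetic data from `make_classification` are all assumptions for illustration and are not part of the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def score_models(features, target, train_size, random_state=0):
    """Hypothetical helper: grid-search each model and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        features,
        target,
        train_size=train_size,
        stratify=target,            # keep class proportions in both parts
        random_state=random_state,  # make the split reproducible
    )
    # Deliberately small grids so the sketch runs quickly.
    candidates = {
        'KNN': (KNeighborsClassifier(), {'n_neighbors': [3, 5]}),
        'Decision Tree': (
            DecisionTreeClassifier(random_state=random_state),
            {'max_depth': [3, 5]},
        ),
        'Naive Bayes': (GaussianNB(), {'var_smoothing': [1e-9, 1e-8]}),
    }
    scores = {}
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, cv=3)
        search.fit(X_train, y_train)
        scores[name] = search.score(X_test, y_test)
    return scores


# Synthetic stand-in data (an assumption; this is not the medicine dataset).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, n_classes=3, random_state=0
)
scores = score_models(X, y, train_size=0.75)
print(scores)
```

Keeping the models in one dictionary means that adding or removing a classifier is a one-line change, and every model is guaranteed to be evaluated on the same split.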


### Runs with different training set sizes on the 50k-row dataset

In [None]:
scores_75_percent_train_size = complete_scores(features, target, 0.75)
print(scores_75_percent_train_size)

features_train shape: (37500, 16)
features_test shape: (12500, 16)
target_train shape: (37500,)
target_test shape: (12500,)


************************
Logistic regression best params: {'max_iter': 5000, 'solver': 'lbfgs'}
************************
Logistic regression classification report
              precision    recall  f1-score   support

           0       0.13      0.25      0.17      1559
           1       0.12      0.25      0.16      1571
           2       0.13      0.03      0.04      1655
           3       0.13      0.11      0.12      1593
           4       0.13      0.09      0.10      1577
           5       0.13      0.14      0.13      1491
           6       0.10      0.04      0.06      1566
           7       0.12      0.11      0.12      1488

    accuracy                           0.13     12500
   macro avg       0.13      0.13      0.11     12500
weighted avg       0.13      0.13      0.11     12500

************************
************************
************************



************************
Decision tree best params: {'criterion': 'gini', 'max_depth': 8, 'min_samples_leaf': 1, 'min_samples_split': 4}
************************
Decision tree classification report
              precision    recall  f1-score   support

           0       0.13      0.15      0.14      1559
           1       0.12      0.33      0.17      1571
           2       0.13      0.09      0.10      1655
           3       0.12      0.04      0.06      1593
           4       0.15      0.02      0.04      1577
           5       0.13      0.04      0.06      1491
           6       0.10      0.03      0.04      1566
           7       0.12      0.29      0.17      1488

    accuracy                           0.12     12500
   macro avg       0.13      0.12      0.10     12500
weighted avg       0.13      0.12      0.10     12500

************************
************************
************************




************************
Random forest best params: {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 5, 'min_samples_split': 3, 'n_estimators': 70}
************************
Random forest classification report
              precision    recall  f1-score   support

           0       0.12      0.42      0.19      1559
           1       0.12      0.36      0.18      1571
           2       0.07      0.00      0.01      1655
           3       0.15      0.04      0.06      1593
           4       0.13      0.08      0.10      1577
           5       0.13      0.07      0.09      1491
           6       0.15      0.01      0.01      1566
           7       0.08      0.00      0.01      1488

    accuracy                           0.12     12500
   macro avg       0.12      0.12      0.08     12500
weighted avg       0.12      0.12      0.08     12500

************************
************************
************************
************************
Naive Bayes classification report



In [None]:
scores_50_percent_train_size = complete_scores(features, target, 0.5)
print(scores_50_percent_train_size)

features_train shape: (25000, 16)
features_test shape: (25000, 16)
target_train shape: (25000,)
target_test shape: (25000,)


************************
Logistic regression best params: {'max_iter': 5000, 'solver': 'lbfgs'}
************************
Logistic regression classification report
              precision    recall  f1-score   support

           0       0.12      0.22      0.16      3175
           1       0.13      0.21      0.16      3184
           2       0.12      0.09      0.10      3151
           3       0.12      0.10      0.11      3116
           4       0.11      0.10      0.10      3142
           5       0.12      0.16      0.14      3074
           6       0.11      0.03      0.05      3117
           7       0.13      0.07      0.09      3041

    accuracy                           0.12     25000
   macro avg       0.12      0.12      0.11     25000
weighted avg       0.12      0.12      0.11     25000

************************
************************
************************



************************
Random forest best params: {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 4, 'min_samples_split': 5, 'n_estimators': 60}
************************
Random forest classification report
              precision    recall  f1-score   support

           0       0.12      0.30      0.17      3175
           1       0.13      0.14      0.14      3184
           2       0.12      0.18      0.14      3151
           3       0.12      0.21      0.16      3116
           4       0.12      0.09      0.10      3142
           5       0.12      0.06      0.08      3074
           6       0.08      0.00      0.00      3117
           7       0.14      0.00      0.00      3041

    accuracy                           0.12     25000
   macro avg       0.12      0.12      0.10     25000
weighted avg       0.12      0.12      0.10     25000

************************
************************
************************
************************
Naive Bayes classification report

In [None]:
scores_25_percent_train_size = complete_scores(features, target, 0.25)
print(scores_25_percent_train_size)

features_train shape: (12500, 16)
features_test shape: (37500, 16)
target_train shape: (12500,)
target_test shape: (37500,)






************************
Logistic regression best params: {'max_iter': 7500, 'solver': 'saga'}
************************
Logistic regression classification report
              precision    recall  f1-score   support

           0       0.13      0.19      0.15      4731
           1       0.13      0.13      0.13      4812
           2       0.13      0.09      0.11      4749
           3       0.12      0.14      0.13      4700
           4       0.11      0.08      0.09      4740
           5       0.13      0.19      0.15      4637
           6       0.11      0.03      0.05      4610
           7       0.12      0.17      0.14      4521

    accuracy                           0.13     37500
   macro avg       0.13      0.13      0.12     37500
weighted avg       0.13      0.13      0.12     37500

************************
************************
************************
************************
KNN best params: {'n_neighbors': 6}
************************
KNN classification report




************************
Decision tree best params: {'criterion': 'entropy', 'max_depth': 9, 'min_samples_leaf': 5, 'min_samples_split': 2}
************************
Decision tree classification report
              precision    recall  f1-score   support

           0       0.13      0.40      0.19      4731
           1       0.13      0.14      0.14      4812
           2       0.12      0.05      0.07      4749
           3       0.12      0.09      0.11      4700
           4       0.12      0.11      0.11      4740
           5       0.12      0.08      0.09      4637
           6       0.12      0.10      0.11      4610
           7       0.12      0.02      0.03      4521

    accuracy                           0.12     37500
   macro avg       0.12      0.12      0.11     37500
weighted avg       0.12      0.12      0.11     37500

************************
************************
************************




************************
Random forest best params: {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 5, 'min_samples_split': 4, 'n_estimators': 60}
************************
Random forest classification report
              precision    recall  f1-score   support

           0       0.13      0.64      0.21      4731
           1       0.13      0.06      0.08      4812
           2       0.14      0.04      0.06      4749
           3       0.13      0.09      0.10      4700
           4       0.10      0.01      0.02      4740
           5       0.13      0.11      0.12      4637
           6       0.12      0.05      0.07      4610
           7       0.13      0.01      0.02      4521

    accuracy                           0.13     37500
   macro avg       0.13      0.13      0.08     37500
weighted avg       0.13      0.13      0.09     37500

************************
************************
************************
************************
Naive Bayes classification report



In [None]:
scores_10_percent_train_size = complete_scores(features, target, 0.1)
print(scores_10_percent_train_size)

features_train shape: (5000, 16)
features_test shape: (45000, 16)
target_train shape: (5000,)
target_test shape: (45000,)






************************
Logistic regression best params: {'max_iter': 5000, 'solver': 'saga'}
************************
Logistic regression classification report
              precision    recall  f1-score   support

           0       0.11      0.00      0.00      5754
           1       0.13      0.07      0.09      5718
           2       0.13      0.15      0.14      5671
           3       0.13      0.22      0.16      5653
           4       0.12      0.26      0.17      5615
           5       0.13      0.07      0.09      5581
           6       0.12      0.16      0.14      5553
           7       0.13      0.09      0.11      5455

    accuracy                           0.13     45000
   macro avg       0.12      0.13      0.11     45000
weighted avg       0.12      0.13      0.11     45000

************************
************************
************************
************************
KNN best params: {'n_neighbors': 3}
************************
KNN classification report




************************
Decision tree best params: {'criterion': 'entropy', 'max_depth': 5, 'min_samples_leaf': 5, 'min_samples_split': 2}
************************
Decision tree classification report
              precision    recall  f1-score   support

           0       0.14      0.04      0.07      5754
           1       0.13      0.07      0.09      5718
           2       0.13      0.03      0.04      5671
           3       0.11      0.06      0.08      5653
           4       0.13      0.45      0.20      5615
           5       0.13      0.09      0.11      5581
           6       0.12      0.18      0.15      5553
           7       0.12      0.09      0.11      5455

    accuracy                           0.13     45000
   macro avg       0.13      0.13      0.10     45000
weighted avg       0.13      0.13      0.10     45000

************************
************************
************************




************************
Random forest best params: {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 5, 'min_samples_split': 5, 'n_estimators': 90}
************************
Random forest classification report
              precision    recall  f1-score   support

           0       0.17      0.00      0.00      5754
           1       0.12      0.05      0.08      5718
           2       0.12      0.16      0.14      5671
           3       0.12      0.12      0.12      5653
           4       0.12      0.48      0.20      5615
           5       0.10      0.01      0.02      5581
           6       0.12      0.13      0.12      5553
           7       0.13      0.03      0.05      5455

    accuracy                           0.12     45000
   macro avg       0.13      0.12      0.09     45000
weighted avg       0.13      0.12      0.09     45000

************************
************************
************************
************************
Naive Bayes classification report



In [None]:
scores_5_percent_train_size = complete_scores(features, target, 0.05)
print(scores_5_percent_train_size)

features_train shape: (2500, 16)
features_test shape: (47500, 16)
target_train shape: (2500,)
target_test shape: (47500,)






************************
Logistic regression best params: {'max_iter': 7500, 'solver': 'saga'}
************************
Logistic regression classification report
              precision    recall  f1-score   support

           0       0.12      0.07      0.09      6034
           1       0.13      0.17      0.15      6025
           2       0.12      0.42      0.19      5969
           3       0.13      0.14      0.13      5977
           4       0.11      0.02      0.04      5990
           5       0.00      0.00      0.00      5888
           6       0.13      0.04      0.06      5859
           7       0.13      0.13      0.13      5758

    accuracy                           0.13     47500
   macro avg       0.11      0.12      0.10     47500
weighted avg       0.11      0.13      0.10     47500

************************
************************
************************
************************
KNN best params: {'n_neighbors': 4}
************************
KNN classification report




************************
Random forest best params: {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 3, 'min_samples_split': 4, 'n_estimators': 60}
************************
Random forest classification report
              precision    recall  f1-score   support

           0       0.12      0.06      0.08      6034
           1       0.13      0.08      0.10      6025
           2       0.12      0.60      0.20      5969
           3       0.12      0.11      0.12      5977
           4       0.19      0.00      0.01      5990
           5       0.00      0.00      0.00      5888
           6       0.12      0.04      0.06      5859
           7       0.12      0.09      0.11      5758

    accuracy                           0.12     47500
   macro avg       0.12      0.12      0.08     47500
weighted avg       0.12      0.12      0.08     47500

************************
************************
************************




************************
Naive Bayes classification report
              precision    recall  f1-score   support

           0       0.13      0.08      0.10      6034
           1       0.13      0.22      0.16      6025
           2       0.12      0.20      0.15      5969
           3       0.13      0.09      0.10      5977
           4       0.12      0.12      0.12      5990
           5       0.13      0.01      0.01      5888
           6       0.13      0.11      0.12      5859
           7       0.13      0.19      0.15      5758

    accuracy                           0.13     47500
   macro avg       0.13      0.13      0.11     47500
weighted avg       0.13      0.13      0.11     47500

************************
************************
************************
************************
Support vector machine best params: {'kernel': 'poly'}
************************
Support vector machine classification report
              precision    recall  f1-score   support

          



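### Summary: Every model performed at chance level. Accuracy stayed at roughly 12.5% (0.12 to 0.13 in every report shown) regardless of the training set size, which is what random guessing yields for 8 nearly balanced classes (1/8 = 12.5%). This suggests that the selected features carry little or no information about the category. In other notebooks, I will try to analyze other features and tune the parameters of the classification models.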
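The 12.5% figure in the summary coincides with chance level for 8 classes. A minimal sketch of that arithmetic, using the class counts printed by `df['Category'].value_counts()` earlier in this notebook; the variable names are illustrative.

```python
# Class counts copied from the value_counts() output above.
category_counts = {
    'Antidepressant': 6354,
    'Analgesic': 6340,
    'Antiseptic': 6315,
    'Antifungal': 6289,
    'Antipyretic': 6280,
    'Antiviral': 6185,
    'Antidiabetic': 6171,
    'Antibiotic': 6066,
}

total = sum(category_counts.values())

# Expected accuracy when guessing uniformly at random among the classes.
uniform_guess = 1 / len(category_counts)

# Accuracy on the full dataset when always predicting the most frequent class.
majority_guess = max(category_counts.values()) / total

print(f'uniform guess:  {uniform_guess:.4f}')
print(f'majority guess: {majority_guess:.4f}')
```

Both baselines land between 0.12 and 0.13, the same range as the accuracies in the classification reports, so none of the tuned models improves on guessing.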