In multi-class classification, each input is assigned exactly one class out of several. In multi-label classification, each input can be assigned several classes (labels) at once.
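
The difference shows up directly in the shape of the target array. The following is a minimal sketch with made-up toy targets (the arrays `y_multiclass` and `y_multilabel` are illustrative, not taken from the datasets used later); it uses scikit-learn's `type_of_target` helper to report how each target is interpreted.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Multi-class: one integer class id per sample -> 1-D target
y_multiclass = np.array([0, 2, 1, 4, 3])

# Multi-label: one 0/1 flag per (sample, label) pair -> 2-D indicator matrix
y_multilabel = np.array([[1, 0, 1],
                         [0, 1, 0],
                         [1, 1, 1]])

print(y_multiclass.shape, type_of_target(y_multiclass))
print(y_multilabel.shape, type_of_target(y_multilabel))
```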

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## MultiClass classification

In [23]:
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=120, 
                           n_features=20, 
                           n_redundant=0,
                           n_classes=5, 
                           n_informative=4, 
                           random_state=1, 
                           n_clusters_per_class=1)

In [24]:
print(f"{X.shape}    ,   {y.shape}")

(120, 20)    ,   (120,)


Logistic regression can be extended to more than two classes in three ways:

* One-vs-Rest (OvR) multiclass strategy
* One-vs-One (OvO) multiclass strategy
* Multinomial method
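
One way to see how the three strategies differ is to count the binary models each one fits. The sketch below regenerates the same synthetic 5-class data under new names (`X_demo`, `y_demo`, chosen here so nothing in the notebook is overwritten) and sets `max_iter=1000` only to avoid convergence warnings. OvR fits one model per class, OvO one per pair of classes, and the multinomial method fits a single model with one coefficient row per class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X_demo, y_demo = make_classification(n_samples=120, n_features=20, n_redundant=0,
                                     n_classes=5, n_informative=4,
                                     n_clusters_per_class=1, random_state=1)

ovr_demo = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_demo, y_demo)
ovo_demo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X_demo, y_demo)
single_demo = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

n_ovr = len(ovr_demo.estimators_)  # one binary model per class: 5
n_ovo = len(ovo_demo.estimators_)  # one binary model per class pair: 5 * 4 / 2 = 10
print(n_ovr, n_ovo, single_demo.coef_.shape)
```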

### Solver
The `solver` parameter selects the algorithm used for the optimization problem. The default is 'lbfgs'.
To choose a solver, consider the following aspects:

* For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
* For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss.
* 'liblinear' is limited to one-versus-rest schemes.
* 'newton-cholesky' is a good choice for `n_samples` >> `n_features`, especially with one-hot encoded categorical features with rare categories. Note that it is limited to binary classification and the one-versus-rest reduction for multiclass classification. Be aware that the memory usage of this solver has a quadratic dependency on `n_features` because it explicitly computes the Hessian matrix.

* lbfgs: Limited-memory Broyden–Fletcher–Goldfarb–Shanno
* liblinear: A Library for Large Linear Classification
* newton-cg: Newton conjugate gradient
* sag: Stochastic Average Gradient
* saga: a variant of SAG that also supports the L1 and elastic-net penalties
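
As a rough illustration of the solver notes above, the sketch below fits the four solvers listed as handling the multinomial loss on the same synthetic data and records hold-out accuracy and fit time. The names `X_demo`, `y_demo`, and `solver_scores` are introduced here for illustration, `max_iter=5000` is an arbitrary generous limit, and the features are standardised because 'sag' and 'saga' converge best on features of similar scale. Timings on a dataset this small say little about large-data behaviour.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(n_samples=120, n_features=20, n_redundant=0,
                                     n_classes=5, n_informative=4,
                                     n_clusters_per_class=1, random_state=1)

# Fit the scaler on the training rows only, then apply it to both splits
scaler = StandardScaler().fit(X_demo[:100])
X_train, X_test = scaler.transform(X_demo[:100]), scaler.transform(X_demo[100:])
y_train, y_test = y_demo[:100], y_demo[100:]

solver_scores = {}
for solver in ["lbfgs", "newton-cg", "sag", "saga"]:
    clf = LogisticRegression(solver=solver, max_iter=5000, random_state=0)
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    solver_scores[solver] = clf.score(X_test, y_test)  # hold-out accuracy
    print(f"{solver:10s} accuracy={solver_scores[solver]:.2f} fit_time={elapsed:.3f}s")
```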

### multi_class
If the option chosen is 'ovr', a binary problem is fit for each label. For 'multinomial', the loss minimised is the multinomial loss fit across the entire probability distribution, *even when the data is binary*. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary or if solver='liblinear', and otherwise selects 'multinomial'. Note that the `multi_class` parameter is deprecated as of scikit-learn 1.5.
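
To make "fit across the entire probability distribution" concrete, the sketch below checks that a multinomial model's probabilities are the softmax of its per-class scores. It assumes the default settings pick the multinomial loss, which is what 'auto' does for more than two classes with 'lbfgs'; `X_demo`, `y_demo`, and `matches_softmax` are names introduced for this example.

```python
import numpy as np
from scipy.special import softmax
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = make_classification(n_samples=120, n_features=20, n_redundant=0,
                                     n_classes=5, n_informative=4,
                                     n_clusters_per_class=1, random_state=1)

clf = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X_demo, y_demo)

scores = clf.decision_function(X_demo)  # shape (n_samples, n_classes)
proba = clf.predict_proba(X_demo)

# Multinomial probabilities are a softmax over the class scores,
# so each row is a single distribution that sums to 1
matches_softmax = np.allclose(proba, softmax(scores, axis=1))
print(scores.shape, matches_softmax, proba.sum(axis=1)[:3])
```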



### OvR Modelling

In [39]:
# Create one-vs-rest logistic regression instance
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(
    penalty="l2", # l1, l2, elasticnet
    dual=False,
    tol=1e-4,
    C=1.0,
    fit_intercept=True,
    intercept_scaling=1,
    class_weight=None, # None (all classes weighted equally), a dict, or 'balanced' (weights inversely proportional to class frequencies)
    random_state=None,
    solver="liblinear", #{'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}
    max_iter=100,
    multi_class="ovr", ## one-vs-rest (OvR) or cross-entropy loss if multinomial, {'auto', 'ovr', 'multinomial'}, default='auto'
    verbose=1,
    warm_start=False,
    n_jobs=None,
    l1_ratio=None)

# Train the model
model.fit(X[:100], y[:100])

# Make predictions
y_pred = model.predict(X[100:])

[LibLinear]iter  1 act 3.480e+01 pre 3.053e+01 delta 1.895e+00 f 6.931e+01 |g| 4.609e+01 CG   4
iter  2 act 5.331e+00 pre 4.433e+00 delta 1.895e+00 f 3.451e+01 |g| 1.190e+01 CG   4
iter  3 act 8.346e-01 pre 7.496e-01 delta 1.895e+00 f 2.918e+01 |g| 3.554e+00 CG   4
iter  4 act 3.523e-02 pre 3.423e-02 delta 1.895e+00 f 2.835e+01 |g| 6.539e-01 CG   4
iter  5 act 1.182e-04 pre 1.180e-04 delta 1.895e+00 f 2.831e+01 |g| 4.468e-02 CG   5
iter  6 act 2.771e-07 pre 2.771e-07 delta 1.895e+00 f 2.831e+01 |g| 2.759e-03 CG   4
iter  1 act 4.205e+01 pre 3.617e+01 delta 1.394e+00 f 6.931e+01 |g| 7.104e+01 CG   3
iter  2 act 1.012e+01 pre 8.007e+00 delta 1.394e+00 f 2.727e+01 |g| 2.069e+01 CG   3
iter  3 act 3.391e+00 pre 2.819e+00 delta 1.394e+00 f 1.714e+01 |g| 7.667e+00 CG   4
iter  4 act 5.051e-01 pre 4.600e-01 delta 1.394e+00 f 1.375e+01 |g| 2.295e+00 CG   4
iter  5 act 1.450e-02 pre 1.421e-02 delta 1.394e+00 f 1.325e+01 |g| 3.791e-01 CG   4
iter  6 act 7.991e-05 pre 7.986e-05 delta 1.394e+00 f 

In [36]:
model.classes_

array([0, 1, 2, 3, 4])

In [44]:
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Create a logistic regression instance
model = LogisticRegression()

# Define the OvR strategy
ovr = OneVsRestClassifier(estimator = model) ## Also used for multilabel classification

# Train the model
ovr.fit(X[:100], y[:100])

# Make predictions
y_pred = ovr.predict(X[100:])
y_pred

array([2, 0, 4, 3, 4, 0, 2, 3, 3, 1, 1, 1, 4, 0, 2, 1, 0, 0, 2, 1])

### OvO modelling

In [43]:
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier

# Create a logistic regression instance
model = LogisticRegression(verbose=1)

# Define the OvO strategy
ovo = OneVsOneClassifier(estimator = model)

# Train the model
ovo.fit(X[:100], y[:100])

# Make predictions
y_pred = ovo.predict(X[100:])
y_pred

array([2, 0, 4, 3, 4, 0, 4, 3, 3, 1, 1, 1, 4, 0, 2, 1, 2, 0, 2, 1])

### Multinomial

In [45]:
# Create multinomial logistic regression instance
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(multi_class="multinomial")

# Train the model
model.fit(X[:100], y[:100])

# Make predictions
y_pred = model.predict(X[100:])
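
The cells above produce predictions but never score them. A minimal sketch of a comparison follows, assuming the same 100/20 split used above; the names `strategies` and `accuracies` are introduced here. With only 20 hold-out samples the differences between strategies are not statistically meaningful, so this is a sanity check and not a benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X_demo, y_demo = make_classification(n_samples=120, n_features=20, n_redundant=0,
                                     n_classes=5, n_informative=4,
                                     n_clusters_per_class=1, random_state=1)
X_train, X_test = X_demo[:100], X_demo[100:]
y_train, y_test = y_demo[:100], y_demo[100:]

strategies = {
    "ovr": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    "ovo": OneVsOneClassifier(LogisticRegression(max_iter=1000)),
    "multinomial": LogisticRegression(max_iter=1000),
}

accuracies = {}
for name, clf in strategies.items():
    clf.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, clf.predict(X_test))

print(accuracies)
```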

## MultiLabel classification (OneVsRestClassifier is a good option)

There are three broad approaches to solving a multi-label classification problem:

* Problem Transformation
* Adapted Algorithm
* Ensemble approaches
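
As an illustration of the first approach, the sketch below shows two common problem-transformation methods in scikit-learn: binary relevance (one independent classifier per label, via `OneVsRestClassifier`) and a classifier chain (each label's classifier also sees the earlier labels in the chain, via `ClassifierChain`). The data mirrors the generator call used in this section; `X_ml`, `y_ml`, and the random chain order are choices made for this example only.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import ClassifierChain

X_ml, y_ml = make_multilabel_classification(n_samples=120, n_features=20,
                                            n_classes=5, n_labels=2,
                                            random_state=42)

# Binary relevance: five independent binary problems, one per label
binary_relevance = OneVsRestClassifier(LogisticRegression(max_iter=1000))
binary_relevance.fit(X_ml[:100], y_ml[:100])

# Classifier chain: labels are predicted in sequence, and each model
# receives the preceding labels as extra input features
chain = ClassifierChain(LogisticRegression(max_iter=1000),
                        order="random", random_state=0)
chain.fit(X_ml[:100], y_ml[:100])

pred_br = binary_relevance.predict(X_ml[100:])
pred_chain = chain.predict(X_ml[100:])
print(pred_br.shape, pred_chain.shape)
```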

In [25]:
from sklearn.datasets import make_multilabel_classification

X, y = make_multilabel_classification(n_samples= 120,
                                        n_features= 20,
                                        n_classes= 5,
                                        n_labels= 2, # average number of labels per sample
                                        length= 50,
                                        # allow_unlabeled= True,
                                        sparse= False,
                                        return_indicator= "dense",
                                        return_distributions= False,
                                        random_state= 42)

In [26]:
print(f"{X.shape}    ,   {y.shape}")

(120, 20)    ,   (120, 5)
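
Each row of `y` is now a 0/1 indicator vector over the five labels. A quick way to inspect such a target is to count labels per sample and samples per label, as sketched below on a regenerated copy of the data (`X_ml`, `y_ml`, and `labels_per_sample` are names introduced here). Since `allow_unlabeled` is left at its default, some rows may carry no label at all.

```python
from sklearn.datasets import make_multilabel_classification

X_ml, y_ml = make_multilabel_classification(n_samples=120, n_features=20,
                                            n_classes=5, n_labels=2,
                                            random_state=42)

labels_per_sample = y_ml.sum(axis=1)  # how many labels each sample carries
samples_per_label = y_ml.sum(axis=0)  # how often each label occurs

print("mean labels per sample:", labels_per_sample.mean())
print("samples per label:", samples_per_label)
print("unlabeled samples:", int((labels_per_sample == 0).sum()))
```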


In [None]:
# Create one-vs-rest logistic regression instance
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(
    penalty="l2", # l1, l2, elasticnet
    dual=False,
    tol=1e-4,
    C=1.0,
    fit_intercept=True,
    intercept_scaling=1,
    class_weight=None, # None (all classes weighted equally), a dict, or 'balanced' (weights inversely proportional to class frequencies)
    random_state=None,
    solver="lbfgs", #{'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}
    max_iter=100,
    multi_class="ovr", ## one-vs-rest (OvR) or cross-entropy loss if multinomial, {'auto', 'ovr', 'multinomial'}, default='auto'
    verbose=1,
    warm_start=False,
    n_jobs=None,
    l1_ratio=None)

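# LogisticRegression on its own expects a 1-D target, so calling
# model.fit(X[:100], y[:100]) with the (100, 5) indicator matrix raises a ValueError.
# The estimator is therefore wrapped in a multilabel-capable meta-estimator below.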

In [None]:
# Define the OvO strategy
from sklearn.multiclass import OneVsOneClassifier
ovo = OneVsOneClassifier(estimator = model) ## OvO does not support multilabel targets

# Fitting on the 2-D indicator matrix is expected to fail
try:
    ovo.fit(X[:100], y[:100])
except ValueError as err:
    print(f"OneVsOneClassifier cannot fit a multilabel target: {err}")

In [28]:
# Define the OvR strategy
ovr = OneVsRestClassifier(estimator = model) ## Also used for multilabel classification

# Train the model
ovr.fit(X[:100], y[:100])

# Make predictions
y_pred = ovr.predict(X[100:])
y_pred

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =           21     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  6.93147D-01    |proj g|=  1.04500D+00

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
   21     40     44      1     0     0   7.780D-05   3.103D-01
  F =  0.31028163646785806     

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL            
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =           21     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  6.9

 This problem is unconstrained.
 This problem is unconstrained.
 This problem is unconstrained.
 This problem is unconstrained.
 This problem is unconstrained.


array([[0, 1, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [0, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 0],
       [1, 1, 0, 1, 0],
       [0, 0, 1, 0, 1],
       [0, 0, 1, 0, 1],
       [0, 1, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 1, 0, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 1, 0],
       [1, 1, 1, 0, 0],
       [0, 0, 0, 0, 1],
       [1, 1, 0, 1, 0],
       [0, 1, 0, 1, 0]])
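
Multilabel predictions like these need multilabel-aware metrics. The sketch below refits a comparable OvR model on a regenerated copy of the data (names `X_ml`, `y_ml`, `y_hat` are introduced here) and computes three standard scores: subset accuracy (the whole label vector must match), Hamming loss (fraction of individual label slots that are wrong), and micro-averaged F1.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, hamming_loss
from sklearn.multiclass import OneVsRestClassifier

X_ml, y_ml = make_multilabel_classification(n_samples=120, n_features=20,
                                            n_classes=5, n_labels=2,
                                            random_state=42)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_ml[:100], y_ml[:100])

y_true = y_ml[100:]
y_hat = clf.predict(X_ml[100:])

subset_acc = accuracy_score(y_true, y_hat)            # exact-match ratio
h_loss = hamming_loss(y_true, y_hat)                  # share of wrong label slots
micro_f1 = f1_score(y_true, y_hat, average="micro")   # pooled over all labels

print(f"subset accuracy={subset_acc:.2f} hamming loss={h_loss:.2f} micro-F1={micro_f1:.2f}")
```

Subset accuracy is the strictest of the three, so it is usually the lowest figure.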

### With Sample Dataset

In [31]:
df=pd.read_csv("data/sentiment/twitter_training.csv", header= None, names=["id","user","sentiment","text"], nrows=20000)
df.dropna(inplace=True)
df = df.reset_index(drop=True)
df.head()

Unnamed: 0,id,user,sentiment,text
0,2401,Borderlands,Positive,im getting on borderlands and i will murder yo...
1,2401,Borderlands,Positive,I am coming to the borders and I will kill you...
2,2401,Borderlands,Positive,im getting on borderlands and i will kill you ...
3,2401,Borderlands,Positive,im coming on borderlands and i will murder you...
4,2401,Borderlands,Positive,im getting on borderlands 2 and i will murder ...


In [32]:
df.sentiment.value_counts()

sentiment
Positive      6064
Negative      5418
Neutral       4839
Irrelevant    3496
Name: count, dtype: int64

In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19817 entries, 0 to 19816
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   id         19817 non-null  int64 
 1   user       19817 non-null  object
 2   sentiment  19817 non-null  object
 3   text       19817 non-null  object
dtypes: int64(1), object(3)
memory usage: 619.4+ KB


In [34]:
# Collect the distinct whitespace-separated tokens across all tweets
words = set()
for text in df["text"].astype(str):
    words.update(text.split())

len(words)

28533

In [35]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(min_df=5, max_df=0.75)

In [36]:
tfidf_matrix = tfidf.fit_transform(df["text"])
columns = tfidf.get_feature_names_out()
tf=pd.DataFrame(columns=columns, data=tfidf_matrix.toarray())

In [37]:
from sklearn.preprocessing import OneHotEncoder

one = OneHotEncoder()
y = one.fit_transform(df[["sentiment"]])
columns=one.get_feature_names_out()
yy=pd.DataFrame(columns=columns, data=y.toarray())
df = pd.concat([tf,yy], axis=1)

In [38]:
df.shape

(19817, 7254)

In [39]:
from sklearn.model_selection import train_test_split

trainX,testX,trainy,testy = train_test_split(tf,yy,test_size=0.2,random_state=42)
print(trainX.shape, trainy.shape, testX.shape,  testy.shape)

(15853, 7250) (15853, 4) (3964, 7250) (3964, 4)


In [40]:
# Create one-vs-rest logistic regression instance
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(
    penalty="l2", # l1, l2, elasticnet
    dual=False,
    tol=1e-4,
    C=1.0,
    fit_intercept=True,
    intercept_scaling=1,
    class_weight=None, # None (all classes weighted equally), a dict, or 'balanced' (weights inversely proportional to class frequencies)
    random_state=None,
    solver="lbfgs", #{'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}
    max_iter=100,
    multi_class="auto", ## one-vs-rest (OvR) or cross-entropy loss if multinomial, {'auto', 'ovr', 'multinomial'}, default='auto'
    verbose=0,
    warm_start=False,
    n_jobs=-1,
    l1_ratio=None)


In [41]:
from sklearn.multiclass import OneVsRestClassifier
# Define the OvR strategy
ovr = OneVsRestClassifier(estimator = model) ## Also used for multilabel classification

# Train the model
ovr.fit(trainX,trainy)
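
The cells above densify the TF-IDF matrix and one-hot encode the sentiment column by hand. One possible alternative, sketched here on an invented toy corpus, is to chain the vectorizer and the OvR classifier in a pipeline so the features stay sparse and the string labels are passed directly. The texts, labels, and the name `text_clf` are hypothetical, and this is not the approach the notebook itself uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical mini-corpus standing in for the tweets
toy_texts = [
    "love this game so much", "great fun with friends", "best update ever",
    "hate the new patch", "servers are terrible again", "worst launch ever",
    "patch notes released today", "update scheduled for friday", "new map announced",
]
toy_labels = ["Positive"] * 3 + ["Negative"] * 3 + ["Neutral"] * 3

# The vectorizer output stays sparse all the way into the classifier
text_clf = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
text_clf.fit(toy_texts, toy_labels)

toy_pred = text_clf.predict(["what a great game"])
print(toy_pred)
```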

In [42]:

# Predict per-class probabilities (independent per label here, so rows need not sum to 1)
y_pred = ovr.predict_proba(testX)
y_pred

array([[0.03104437, 0.26432231, 0.05375648, 0.60118516],
       [0.41919138, 0.26357876, 0.12820511, 0.14330034],
       [0.28705395, 0.10052795, 0.14597941, 0.40936432],
       ...,
       [0.13869504, 0.05776643, 0.23566328, 0.60335586],
       [0.10813051, 0.93656642, 0.02002914, 0.0203874 ],
       [0.13880064, 0.0235273 , 0.51938031, 0.25989339]])
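
To turn these probabilities into sentiment names, one option is to take the highest-scoring column per row. The sketch below uses the first three rows printed above and assumes the columns follow `OneHotEncoder`'s alphabetically sorted categories (Irrelevant, Negative, Neutral, Positive); the names `class_names`, `top_class`, and `multi_hot` are introduced for illustration. It also shows the multilabel-style alternative of thresholding at 0.5, which can leave a row with no label.

```python
import numpy as np

# Assumed column order: OneHotEncoder sorts the category names
class_names = np.array(["Irrelevant", "Negative", "Neutral", "Positive"])

# First three rows of the probability output shown above
proba = np.array([[0.03104437, 0.26432231, 0.05375648, 0.60118516],
                  [0.41919138, 0.26357876, 0.12820511, 0.14330034],
                  [0.28705395, 0.10052795, 0.14597941, 0.40936432]])

# Single-label decision: pick the most probable class per row
top_class = class_names[proba.argmax(axis=1)]
print(top_class)  # ['Positive' 'Irrelevant' 'Positive']

# Multilabel-style decision: keep every label at or above the threshold
multi_hot = (proba >= 0.5).astype(int)
print(multi_hot)  # only the first row clears 0.5, for 'Positive'
```

Because every tweet here has exactly one sentiment, the argmax rule matches the task better than thresholding.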