<a href = "https://www.pieriantraining.com"><img src="../PT Centered Purple.png"> </a>

<em style="text-align:center">Copyrighted by Pierian Training</em>

# Multi-Modal: Natural Language for MultiClass

Let's explore how to combine tabular data with another modality, in this case natural language text reviews. We'll try to predict a customer's 1-5 star review of a book.

We'll be using the data set from: https://www.kaggle.com/code/meetnagadia/amazon-kindle-book-sentiment-analysis

We'll treat each star rating as its own class, so this is a 5-class classification problem.

## Imports

In [8]:
from autogluon.tabular import TabularDataset, TabularPredictor

## Data

In [12]:
data = TabularDataset("data/kindle_review/review.csv")

In [13]:
data.columns

Index(['asin', 'rating', 'reviewText', 'reviewerID', 'reviewerName'], dtype='object')

In [14]:
data.head()

Unnamed: 0,asin,rating,reviewText,reviewerID,reviewerName
0,B0033UV8HI,3,"Jace Rankin may be short, but he's nothing to ...",A3HHXRELK8BHQG,Ridley
1,B002HJV4DE,5,Great short read. I didn't want to put it dow...,A2RGNZ0TRF578I,Holly Butler
2,B002ZG96I4,3,I'll start by saying this is the first of four...,A3S0H2HV6U1I7F,Merissa
3,B002QHWOEU,3,Aggie is Angela Lansbury who carries pocketboo...,AC4OQW3GZ919J,Cleargrace
4,B001A06VJ8,4,I did not expect this type of book to be in li...,A3C9V987IQHOQD,Rjostler


Note that we won't clean up this data or remove the unique identifiers: AutoGluon is smart enough to feature engineer based on detected natural language text and unique values.

## Train Test Split

In [15]:
train_size = int(len(data) * 0.8)
seed = 42
train_data = data.sample(train_size, random_state=seed)
test_data = data.drop(train_data.index)
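
The split above keeps a random 80% of the rows for training and drops those same rows, by index label, to form the test set. This only works cleanly when the index is unique. Below is a minimal sketch of the same pattern on a small made-up table (the `toy` DataFrame and its values are hypothetical stand-ins, not the Kindle data), with two sanity checks you could also run on the real split:

```python
import pandas as pd

# Hypothetical 10-row stand-in for the review table
toy = pd.DataFrame({
    "rating": [1, 2, 3, 4, 5] * 2,
    "reviewText": [f"review {i}" for i in range(10)],
})

train_size = int(len(toy) * 0.8)
train = toy.sample(train_size, random_state=42)  # random 80% of the rows
test = toy.drop(train.index)                     # everything not sampled

# The two parts should share no rows and together cover the whole table
no_overlap = train.index.intersection(test.index).empty
covers_all = len(train) + len(test) == len(toy)
print(len(train), len(test), no_overlap, covers_all)
```

If the index contained duplicate labels, `drop` would remove every row sharing a sampled label, so it is worth confirming `data.index.is_unique` before relying on this pattern.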

In [17]:
save_path = 'book_rating'

In [18]:
predictor = TabularPredictor(label="rating", path=save_path)


Let's pass **multimodal** to the *hyperparameters* argument to tell AutoGluon that we want it to use its multimodal tuning procedure.
In addition to the usual tabular models, this trains a Transformer network on the text.

Note that you need at least one NVIDIA GPU to train a transformer in AutoGluon.
Otherwise, just remove the *hyperparameters* argument and AutoGluon will train as usual.
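
If the notebook may run on machines with and without a GPU, one option is to choose the argument at runtime. The helper below is a hypothetical convenience, not part of AutoGluon: it assumes PyTorch is the backend and uses `torch.cuda.is_available()` to decide, falling back to `None` (the default value of `hyperparameters`) when no CUDA device or no PyTorch install is found.

```python
def pick_hyperparameters():
    """Return "multimodal" if PyTorch can see a CUDA GPU, otherwise None.

    Hypothetical helper for this notebook; None is the default value of
    the hyperparameters argument, so AutoGluon trains as usual.
    """
    try:
        import torch  # assumption: PyTorch is installed alongside AutoGluon
    except ImportError:
        return None
    return "multimodal" if torch.cuda.is_available() else None


hyperparameters = pick_hyperparameters()
print(hyperparameters)

# Then, in place of the hard-coded string:
# predictor.fit(train_data, hyperparameters=hyperparameters)
```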

In [19]:
predictor.fit(train_data, hyperparameters="multimodal")

Beginning AutoGluon training ...
AutoGluon will save models to "book_rating\"
AutoGluon Version:  0.7.0
Python Version:     3.9.12
Operating System:   Windows
Platform Machine:   AMD64
Platform Version:   10.0.19045
Train Data Rows:    9600
Train Data Columns: 4
Label Column: rating
Preprocessing data ...
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
	5 unique label values:  [3, 1, 2, 4, 5]
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Train Data Class Count: 5
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    24628.19 MB
	Train Data (Original)  Memory Usage: 8.62 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feat

<autogluon.tabular.predictor.predictor.TabularPredictor at 0x2048dbaf6a0>

In [20]:
predictor.get_model_best()

'WeightedEnsemble_L2'

In [31]:
predictor.fit_summary();

*** Summary of fit() ***
Estimated performance of each model:
                 model  score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0  WeightedEnsemble_L2   0.617708       0.487705  232.714246                0.000000           0.162570            2       True          7
1             CatBoost   0.607292       0.335099  202.278498                0.335099         202.278498            1       True          3
2        LightGBMLarge   0.552083       0.292419  131.497060                0.292419         131.497060            1       True          6
3           LightGBMXT   0.542708       0.152606   30.273178                0.152606          30.273178            1       True          2
4              XGBoost   0.538542       0.099047  122.499334                0.099047         122.499334            1       True          4
5             LightGBM   0.534375       0.171542   28.795342                0.171542          28.795342 

## Validation on Test Set

In [22]:
y_test = test_data["rating"]
test_features = test_data.drop(columns=["rating"])

In [23]:
y_pred = predictor.predict(test_features)

In [24]:
metrics = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)

Evaluation: accuracy on test data: 0.58375
Evaluations on test data:
{
    "accuracy": 0.58375,
    "balanced_accuracy": 0.563489631606563,
    "mcc": 0.4728101182359241
}


In [25]:
metrics

{'accuracy': 0.58375,
 'balanced_accuracy': 0.563489631606563,
 'mcc': 0.4728101182359241}

Notice that the accuracy is about 58%. This is actually very good for 5 classes: a uniform random guess would be only 20% accurate! Plus, this is a difficult problem, with 3 and 4 stars being hard to tell apart.

We can use sklearn to obtain more in-depth metrics, such as a confusion matrix. Rows are the true ratings (1 to 5) and columns are the predicted ratings.

In [26]:
from sklearn.metrics import confusion_matrix

In [27]:
confusion_matrix(y_test, y_pred)

array([[263,  89,  21,  18,  17],
       [116, 171,  48,  34,  13],
       [ 22,  60, 152, 148,  42],
       [  8,  14,  56, 329, 165],
       [  2,   1,   6, 119, 486]], dtype=int64)

We can see that the model mainly had difficulty differentiating between neighboring ratings such as 3 and 4 or 4 and 5 (1 and 2 are also confused fairly often), which is very reasonable, as that can also be hard for a human to discern and it's very subjective!
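
To put numbers on this, here is a minimal sketch that recomputes a few metrics directly from the confusion matrix printed above. The matrix values are copied from that output; the "within one star" measure and the majority-class baseline are extra illustrations added here, not something AutoGluon reports.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true rating 1-5, columns: predicted rating 1-5)
cm = np.array([
    [263,  89,  21,  18,  17],
    [116, 171,  48,  34,  13],
    [ 22,  60, 152, 148,  42],
    [  8,  14,  56, 329, 165],
    [  2,   1,   6, 119, 486],
])

total = cm.sum()

# Overall accuracy: correct predictions sit on the main diagonal
accuracy = np.trace(cm) / total

# Recall per class: correct predictions divided by the true count of that class
recall_per_class = np.diag(cm) / cm.sum(axis=1)
balanced_accuracy = recall_per_class.mean()

# Baseline: always predict the most common true rating
majority_baseline = cm.sum(axis=1).max() / total

# Share of predictions that are off by at most one star
# (main diagonal plus the diagonals directly above and below it)
within_one = (np.trace(cm) + np.trace(cm, offset=1) + np.trace(cm, offset=-1)) / total

print(f"accuracy:          {accuracy:.4f}")
print(f"balanced accuracy: {balanced_accuracy:.4f}")
print(f"majority baseline: {majority_baseline:.4f}")
print(f"within one star:   {within_one:.4f}")
print("recall per rating:", recall_per_class.round(3))
```

The accuracy and balanced accuracy match the values returned by `evaluate_predictions`. Always guessing the most common rating would score only about 26%, while roughly 92% of predictions land within one star of the true rating. The 3-star class has the lowest recall, which fits the observation that the middle ratings are the hardest to pin down.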