***

# E-Commerce Reviews Classification with Cohere's Classify Endpoint

## Introduction

***

In this lab, we'll learn how to use Cohere's Classify endpoint. This codelab is my own exercise after taking the [Classify endpoint lesson](https://docs.cohere.com/docs/classify-endpoint/) of LLM University.

LLMs are pre-trained on a vast amount of text, which lets them capture how words are used and how their meaning changes depending on the context. A very common application of this is text classification, and Cohere’s Classify endpoint makes it easy to do.<br> Many kinds of text can be classified. In this notebook, we'll work with review text from an e-commerce website. The task is a form of sentiment analysis, useful for applications like categorizing product feedback.

## 1. Setting Up

Here the Cohere Python SDK is installed. Note that an API key is needed; it can be generated from the Cohere [dashboard](https://os.cohere.ai/register) or [CLI tool](https://docs.cohere.ai/cli-key).

In [58]:
# Install the libraries
! pip install cohere altair umap-learn > /dev/null

In [59]:
# Import the libraries
import cohere
import pandas as pd
import numpy as np
import altair as alt
import textwrap as tr
from cohere.responses.classify import Example

# Setup the Cohere client
api_key = 'YOUR_API_KEY' # Paste your API key here. Never share it publicly or commit it with the notebook
co = cohere.Client(api_key)
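
Since the key must not be shared, one option is to keep it out of the notebook entirely and read it from an environment variable. The snippet below is a minimal sketch of that idea; the helper `load_api_key` and the variable name `COHERE_API_KEY` are hypothetical choices for this illustration, not something the notebook or the SDK prescribes.

```python
import os

def load_api_key(env_var="COHERE_API_KEY"):
    """Return the API key stored in the environment variable `env_var`.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the notebook."
        )
    return key

# Usage (assumes the variable was set beforehand, outside the notebook):
# co = cohere.Client(load_api_key())
```

With this approach the notebook can be shared or uploaded without exposing the key.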

## 2. About Dataset

#### Context :
This is a Women’s Clothing E-Commerce dataset revolving around the reviews written by customers. Its nine supportive features offer a great environment to parse out the text through its multiple dimensions. Because this is real commercial data, it has been anonymized, and references to the company in the review text and body have been replaced with “retailer”.

#### Description :    
This dataset includes 23486 rows and 10 feature variables. Each row corresponds to a customer review, and includes the variables:

- Clothing ID: Integer Categorical variable that refers to the specific piece being reviewed.
- Age: Positive Integer variable of the reviewer's age.
- Title: String variable for the title of the review.
- Review Text: String variable for the review body.
- Rating: Positive Ordinal Integer variable for the product score granted by the customer from 1 Worst, to 5 Best.
- Recommended IND: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
- Positive Feedback Count: Positive Integer documenting the number of other customers who found this review positive.
- Division Name: Categorical name of the product's high-level division.
- Department Name: Categorical name of the product department.
- Class Name: Categorical name of the product class.

## 3. Preprocessing

This step covers preprocessing tasks such as loading the dataset, feature extraction, label encoding, feature engineering, and any other preparation required by the aim of the notebook.

#### Importing dataset

In [60]:
dataset = pd.read_csv('/content/Womens Clothing E-Commerce Reviews.csv')
dataset.head()

| | Unnamed: 0 | Clothing ID | Age | Title | Review Text | Rating | Recommended IND | Positive Feedback Count | Division Name | Department Name | Class Name |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 767 | 33 | | Absolutely wonderful - silky and sexy and comf... | 4 | 1 | 0 | Initmates | Intimate | Intimates |
| 1 | 1 | 1080 | 34 | | Love this dress! it's sooo pretty. i happene... | 5 | 1 | 4 | General | Dresses | Dresses |
| 2 | 2 | 1077 | 60 | Some major design flaws | I had such high hopes for this dress and reall... | 3 | 0 | 0 | General | Dresses | Dresses |
| 3 | 3 | 1049 | 50 | My favorite buy! | I love, love, love this jumpsuit. it's fun, fl... | 5 | 1 | 0 | General Petite | Bottoms | Pants |
| 4 | 4 | 847 | 47 | Flattering shirt | This shirt is very flattering to all due to th... | 5 | 1 | 6 | General | Tops | Blouses |


#### Feature Extraction 

We don't need the whole dataset for this short exercise. Its size and dimensionality are reduced to make it suitable for our small experiment.<br>Only the first **200** rows are kept, and two features are extracted:
 - **Review Text** feature ;
 - **Rating** feature.

In [61]:
dataset = dataset.iloc[:200, 4:6]
dataset.head()

| | Review Text | Rating |
|---|---|---|
| 0 | Absolutely wonderful - silky and sexy and comf... | 4 |
| 1 | Love this dress! it's sooo pretty. i happene... | 5 |
| 2 | I had such high hopes for this dress and reall... | 3 |
| 3 | I love, love, love this jumpsuit. it's fun, fl... | 5 |
| 4 | This shirt is very flattering to all due to th... | 5 |


#### Label Study and Encoding

First we check the unique values of the Rating feature. This tells us the rating range, which is needed for the next step.

In [40]:
dataset.Rating.unique()

array([4, 5, 3, 2, 1])

The check shows that ratings range from 1 to 5. A look at the data confirms that the scale is ascending, which means:
 - 1 is a bad rating;
 - 5 is a good rating.

Since we've got five values, they will be encoded as follows:
  - 1 means **Very Bad**;
  - 2 means **Bad**;
  - 3 means **Fair**;
  - 4 means **Good**;
  - 5 means **Very Good**.

*PS: Superlatives are not used to qualify the ratings, for the sake of comprehension.*

In [62]:
encode = {1 : 'Very Bad', 2 : 'Bad', 3 : 'Fair', 4 : 'Good', 5: 'Very Good'}
dataset['Rating'] = dataset['Rating'].map(encode)
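
Once ratings are encoded as labels, it can help to look at how often each label occurs, because the labelled rows later serve as examples for the classifier. The snippet below is a minimal sketch using toy labels; `label_counts` and `rare_labels` are hypothetical helpers. The threshold of two examples per label reflects my understanding of the Classify endpoint's requirement and should be treated as an assumption to verify against the Cohere documentation.

```python
from collections import Counter

def label_counts(labels):
    """Count how many times each label occurs, most frequent first."""
    return Counter(labels).most_common()

def rare_labels(labels, min_count=2):
    """Return the labels that occur fewer than `min_count` times."""
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n < min_count)

# Toy labels; in the notebook this could be dataset['Rating'].tolist()
toy_labels = ['Very Good', 'Good', 'Very Good', 'Fair', 'Good', 'Very Bad']
print(label_counts(toy_labels))
print(rare_labels(toy_labels))  # ['Fair', 'Very Bad']
```

A label that shows up in `rare_labels` for the rows used as examples is a hint to sample more rows or to merge neighbouring classes.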

#### Feature Engineering

This section covers data splitting and the creation of training examples and inputs.
- Data Splitting: The dataset is split into two sets:
  - Training set (150 rows), used to create the training examples.
  - Test set (the remaining rows after dropping NaN), used to create the inputs.
- Training examples: These are passed to the classifier so that it can first learn from the data (supervised learning). That is why the training examples are labelled.
- Inputs: After learning from the training examples, the classifier receives the inputs to predict new labels, which lets us evaluate the model's performance.

In [63]:
# Data splitting
train_set = dataset.iloc[:150, :]
test_set = dataset.iloc[150:, :]

# Creating Training Examples
examples = []
for reviews, ratings in zip(train_set['Review Text'], train_set['Rating']):
  examples.append(Example(str(reviews), str(ratings)))

# Creating Inputs
test_set = test_set.dropna()  # dropna() returns a new DataFrame, so assign the result back
inputs = []
for r in test_set['Review Text']:
  inputs.append(str(r))
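
One detail worth noting: `str()` turns a missing review into the literal text `'nan'`, so a missing value can slip into the examples or inputs unnoticed. The snippet below is a minimal sketch of filtering such rows before building them; `is_missing` and `clean_pairs` are hypothetical helpers that work on plain Python lists (for instance, columns converted with `.tolist()`).

```python
import math

def is_missing(value):
    """Return True for None, a float NaN, or a blank string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()

def clean_pairs(texts, labels):
    """Keep only the (text, label) pairs in which both fields are present."""
    return [
        (str(text), str(label))
        for text, label in zip(texts, labels)
        if not is_missing(text) and not is_missing(label)
    ]

# Toy data with one NaN review and one blank review
pairs = clean_pairs(
    ['Great fit', float('nan'), '   ', 'Too small'],
    ['Very Good', 'Good', 'Fair', 'Bad'],
)
print(pairs)  # [('Great fit', 'Very Good'), ('Too small', 'Bad')]
```

The cleaned pairs can then be turned into `Example` objects in the same way as in the cell above.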

## 4. Classification Model and Prediction

Cohere’s Classify endpoint makes it easy to take a list of texts and predict their categories, or classes.

In [64]:
# Classification Function with the Classify endpoint
def text_classifier(inputs, examples):
  # Send the texts and the labelled examples to the Classify endpoint
  response = co.classify(inputs=inputs,
                         model='embed-english-v2.0',
                         examples=examples)
  # Each classification carries the predicted label and its confidence
  return response.classifications
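
The test set here is small, but the Classify endpoint limits how many texts a single request may contain (96 according to the documentation I am aware of; treat that number as an assumption). For larger input lists, a minimal sketch of batching could look like the following. `classify_in_batches` is a hypothetical helper, and `classify_fn` stands for any function that takes a list of texts and returns one result per text.

```python
def classify_in_batches(texts, classify_fn, batch_size=96):
    """Call `classify_fn` on consecutive slices of `texts` and join the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        results.extend(classify_fn(batch))
    return results

# Usage with the notebook's function (not executed here, it needs an API key):
# predictions = classify_in_batches(inputs, lambda batch: text_classifier(batch, examples))
```

Results come back in the same order as the inputs, so they can still be zipped with the original texts.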

In [65]:
# Predictions
predictions = text_classifier(inputs, examples)

# Display Classification Outcomes
for i, p in zip(inputs, predictions):
  class_pred = p.prediction   # predicted label
  class_conf = p.confidence   # confidence of the predicted label

  print(f'Input : {i}')
  print(f'Prediction : {class_pred}')
  print(f'Confidence : {class_conf:.2f}')
  print('-' * 10)

Input : Like other reviewers noted, the pics don't do this skirt justice. it is truly beautiful with an intricate lace pattern and rich colors. can't wait to wear this to work!
Prediction : Very Good
Confidence : 1.00
----------
Input : Love this skirt. the detail is amazing. runs small i ordered a 12 i'm usually a 10, but still a little snug.
Prediction : Very Good
Confidence : 0.99
----------
Input : Not keeping this one. the fabric is a bit tacky-looking in person, the cut is odd and it's just not me. fit is fine and there are snaps to keep the neckline flat and shaped, the colors are as shown and it is a good length (falls to top of hip). i simply did not like it. too metallic looking maybe...
Prediction : Fair
Confidence : 0.57
----------
Input : The top as with most of ap's tops is well stitched. material is very uncomfortable. if you have large bust it is a little divulging. this may prompt you to wear something underneath to look modest and change the shape of the top!
Predicti
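
Because the test set keeps its true ratings, the predictions can be compared with them to evaluate the classifier. The snippet below is a minimal sketch with toy labels, not results from this notebook; `accuracy` and `per_label_recall` are hypothetical helpers.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty lists")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_label_recall(y_true, y_pred):
    """For each true label, the fraction of its items predicted correctly."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Toy labels for illustration only
y_true = ['Very Good', 'Very Good', 'Bad', 'Fair']
y_pred = ['Very Good', 'Very Good', 'Fair', 'Fair']
print(accuracy(y_true, y_pred))          # 0.75
print(per_label_recall(y_true, y_pred))  # {'Very Good': 1.0, 'Bad': 0.0, 'Fair': 1.0}
```

In the notebook, `y_true` could be `test_set['Rating'].tolist()` and `y_pred` could be `[p.prediction for p in predictions]`, provided both lists come from the same rows in the same order.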

___

**Credits:**
- This material comes from the post [Hello, World! Meet Language AI](https://colab.research.google.com/github/cohere-ai/notebooks/blob/main/notebooks/Hello_World_Meet_Language_AI.ipynb#)
- [Cohere's Classify Endpoint Course](https://docs.cohere.com/docs/classify-endpoint)

___

*If you find this notebook useful, please upvote and share it with your peers to promote NLP and the outstanding work from the Cohere website*.<br>
***THANK YOU !!!***

___