# **ChatGPT User Reviews**

## Overview

ChatGPT is a chatbot and virtual assistant created by OpenAI and released on November 30, 2022. It is built on large language models and lets users steer a conversation toward a preferred length, format, style, level of detail, and language.

#### Data Description
This dataset contains user reviews and ratings of the ChatGPT Android app, collected from its Google Play Store listing and updated daily. Each review is described by the following attributes, which together show how user experience and feedback change over time.


- `userName`: The display name of the individual who submitted the review.
- `content`: The written text of the review, which includes the user's opinions, feedback, and detailed descriptions of their experience with the ChatGPT app.
- `score`: The star rating assigned by the user, an integer from 1 to 5. Higher scores indicate a more positive experience; lower scores suggest dissatisfaction.
- `thumbsUpCount`: The number of thumbs up (likes) the review has received. It indicates how many other users found the review useful or agreed with it, and serves as a gauge of the review's relevance and influence.
- `at`: The timestamp when the review was posted. This field records the date and time of the review submission, which is essential for tracking the distribution of reviews and analyzing trends over time.



<a id="cont"></a>

## Table of Contents

- [1. Import Packages](#one)
- [2. Load Data](#two)
- [3. Exploratory Data Analysis (EDA)](#three)
- [4. Data Engineering](#four)
- [5. Modeling](#five)
- [6. Model Performance](#six)
- [7. Model Explanations](#seven)
- [8. Conclusion](#eight)

<a id="one"></a>
# 1. Import Packages
[Back to Table of Contents](#cont)

---

In [1]:
import pandas as pd
import numpy as np
import nltk
import string
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

<a id="two"></a>
# 2. Load Data
[Back to Table of Contents](#cont)

---

In [2]:
reviews = pd.read_csv("clean_chatgpt_reviews.csv")
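
A CSV that was saved together with its index usually comes back with an extra `Unnamed: 0` column when it is read again. The sketch below shows one way to drop such a column. It assumes the column holds no information needed later, and it uses a three-row in-memory stand-in (`csv_text`, a hypothetical sample) rather than the real file.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for clean_chatgpt_reviews.csv
csv_text = """Unnamed: 0,userName,content,score,thumbsUpCount,at
0,T H (Trudylh20),excellent,5,0,28-06-2024 21:07
1,Muhammad bassam adam,perfect,5,0,28-06-2024 20:56
2,Chinaza Okoli,its been so helpful...love it,5,0,28-06-2024 20:54
"""

demo = pd.read_csv(io.StringIO(csv_text))

# Drop the leftover index column; errors="ignore" makes this a no-op if it is absent
demo = demo.drop(columns=["Unnamed: 0"], errors="ignore")

print(demo.columns.tolist())
```

The same `drop` call could be applied to `reviews` right after loading, if the old index is not needed.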

<a id="three"></a>
# 3. Exploratory Data Analysis (EDA)
[Back to Table of Contents](#cont)

---

In [3]:
reviews.head()

Unnamed: 0.1,Unnamed: 0,userName,content,score,thumbsUpCount,at
0,0,T H (Trudylh20),excellent Im impressed 👌 👏,5,0,28-06-2024 21:07
1,1,Muhammad bassam adam,perfect,5,0,28-06-2024 20:56
2,2,Chinaza Okoli,its been so helpful...love it,5,0,28-06-2024 20:54
3,3,Project House Group Ltd,It's amazing tools help me a lot with my work.,5,0,28-06-2024 20:51
4,4,Safoan Riyad,I enjoyed ChatGPT. But last update ruined ever...,1,0,28-06-2024 20:50


In [4]:
reviews.tail()

Unnamed: 0.1,Unnamed: 0,userName,content,score,thumbsUpCount,at
149714,149720,m.santhosh Kumar,Update 2023,5,0,27-07-2023 16:26
149715,149721,Andrew Bourgeois,its grear,5,0,23-09-2023 16:25
149716,149722,Dern Bob,Funtastic App,5,0,08-11-2023 13:57
149717,149723,Abdur rahman arif,hi all,5,0,25-07-2023 15:32
149718,149724,Tushar Deran,expert application,5,0,30-11-2023 18:11


In [5]:
# Display basic information about the reviews data
reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149719 entries, 0 to 149718
Data columns (total 6 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   Unnamed: 0     149719 non-null  int64 
 1   userName       149719 non-null  object
 2   content        149719 non-null  object
 3   score          149719 non-null  int64 
 4   thumbsUpCount  149719 non-null  int64 
 5   at             149719 non-null  object
dtypes: int64(3), object(3)
memory usage: 6.9+ MB
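
`info()` reports `at` as `object`, meaning the timestamps are still plain strings. Any time-based analysis needs them parsed first. The sketch below is a minimal example that assumes every value follows the day-month-year `DD-MM-YYYY HH:MM` pattern visible in the sample rows; the small `sample` frame is built by hand from those rows for illustration.

```python
import pandas as pd

# Rows copied from the head() output above; the real data lives in `reviews`
sample = pd.DataFrame({
    "score": [5, 5, 1],
    "at": ["28-06-2024 21:07", "28-06-2024 20:56", "28-06-2024 20:50"],
})

# Assumption: all timestamps are day-first, so the format is given explicitly
# instead of letting pandas guess between day-first and month-first
sample["at"] = pd.to_datetime(sample["at"], format="%d-%m-%Y %H:%M")

# Count reviews per calendar day
reviews_per_day = sample.groupby(sample["at"].dt.date).size()

print(sample.dtypes)
print(reviews_per_day)
```

Passing an explicit `format` avoids ambiguous dates such as `08-11-2023` being read as August 11 rather than November 8.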


In [6]:
# Display descriptive statistics of the numeric columns
reviews.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Unnamed: 0,149719.0,74861.348252,43221.928693,0.0,37430.5,74860.0,112291.5,149724.0
score,149719.0,4.491848,1.096859,1.0,5.0,5.0,5.0,5.0
thumbsUpCount,149719.0,0.492937,12.285116,0.0,0.0,0.0,0.0,1193.0
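
The `score` row has a mean of about 4.49 and its 25th percentile is already 5, so at least three quarters of the ratings are five stars. A per-class breakdown makes that imbalance explicit before modeling. The sketch below uses a short, made-up `scores` series only to illustrate the call; the numbers are not the real distribution.

```python
import pandas as pd

# Hypothetical ratings for illustration; replace with reviews["score"] in the notebook
scores = pd.Series([5, 5, 5, 4, 1, 5, 3, 5], name="score")

# Share of reviews per star rating, ordered from 1 to 5
share = scores.value_counts(normalize=True).sort_index()

print(share)
```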


In [7]:
# Check for missing values in the reviews data
reviews_missing_values = reviews.isnull().sum()
reviews_missing_values

Unnamed: 0       0
userName         0
content          0
score            0
thumbsUpCount    0
at               0
dtype: int64
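
`isnull()` only flags true missing values, so a review whose text is an empty or whitespace-only string would still count as present. The sketch below shows one possible extra check on a small hand-made frame; whether the real `content` column contains such rows is not known from the output above.

```python
import pandas as pd

# Hypothetical examples: two real reviews and two blank ones
demo = pd.DataFrame({"content": ["perfect", "   ", "", "hi all"]})

# Flag rows whose text is empty once surrounding whitespace is removed
blank_mask = demo["content"].str.strip().eq("")
n_blank = int(blank_mask.sum())

print(n_blank)
```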