>**Developer:** Mukesh Kumar

>**Email:** coldperformer@gmail.com

>**LinkedIn:** https://www.linkedin.com/in/mk09/

---
# **Table of Contents**
---

**1.** [**Introduction**](#Section1)<br>
**2.** [**Problem Statement**](#Section2)<br>
**3.** [**Installing & Importing Libraries**](#Section3)<br>
  - **3.1** [**Installing & Upgrading Libraries**](#Section31)
  - **3.2** [**Importing Libraries**](#Section32)

**4.** [**Data Acquisition & Description**](#Section4)<br>
**5.** [**Data Pre-processing**](#Section5)<br>
  - **5.1** [**Pre-Profiling Report**](#Section51)<br>
  - **5.2** [**Post-Profiling Report**](#Section52)<br>

**6.** [**Exploratory Data Analysis**](#Section6)<br>
**7.** [**Post Data Processing & Feature Selection**](#Section7)<br>
  - **7.1** [**Feature Encoding**](#Section71)<br>
  - **7.2** [**Feature Selection Using Random Forest**](#Section72)<br>
  - **7.3** [**Feature Scaling**](#Section73)<br>
  - **7.4** [**Data Splitting**](#Section74)<br>

**8.** [**Model Development & Evaluation**](#Section8)<br>
  - **8.1** [**Linear Regression**](#Section81)<br>
  - **8.2** [**Random Forest**](#Section82)<br>
  - **8.3** [**Final Predictions**](#Section83)<br>

**9.** [**Summarization**](#Section9)<br>
  - **9.1** [**Conclusion**](#Section91)<br>
  - **9.2** [**Actionable Insights**](#Section92)<br>

---
<a name = Section1></a>
# **1. Introduction**
---

- Sentiment analysis is the contextual mining of text to identify and extract subjective information from source material.

- It helps a business understand the social sentiment around its brand, product, or service while monitoring online conversations.

- With the recent advances in deep learning, the ability of algorithms to analyse text has improved considerably. 

- Creative use of advanced artificial intelligence techniques can be an effective tool for doing in-depth research.

- It is important to classify incoming customer conversations about a brand along the following lines:

    - Key aspects of a brand’s product and service that customers care about.
    
    - Users’ underlying intentions and reactions concerning those aspects.

- <a href="https://komprehend.io/">Komprehend</a> also offers a playground for testing the sentiment of a piece of text.

---
<a name = Section2></a>
# **2. Problem Statement**
---

- Sentiment analysis works by analyzing an incoming message and determining whether the underlying sentiment is positive, negative, or neutral.

- The applications of sentiment analysis are wide-ranging. Let’s jump into a real-world example of how people express their sentiment about movies listed on IMDB.
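
To make the positive / negative / neutral idea concrete, here is a minimal sketch of a word-counting classifier. The word lists and the `toy_sentiment` helper are hypothetical and purely illustrative; they are not the approach used in this notebook.

```python
# Illustrative only: a tiny hand-made lexicon, not the notebook's method.
POSITIVE_WORDS = {"good", "great", "wonderful", "love"}
NEGATIVE_WORDS = {"bad", "boring", "terrible", "hate"}

def toy_sentiment(message):
    """Labels a message by counting positive and negative words."""
    words = message.lower().split()
    # Each positive word adds one to the score, each negative word subtracts one
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(toy_sentiment("A wonderful film with a great cast"))
print(toy_sentiment("A boring and terrible plot"))
print(toy_sentiment("I watched it yesterday"))
```

A fixed lexicon like this misses negation, sarcasm, and unseen vocabulary, which is one reason to learn the mapping from labelled reviews instead.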

<center><img src="https://d3caycb064h6u1.cloudfront.net/wp-content/uploads/2021/06/sentimentanalysishotelgeneric-2048x803-1.jpg"></center>

**<h4>Scenario (Hypothetical):</h4>**

- **Ronzaro** is a British-American company that **buys and sells first-hand copies of movies**.

- They **started** their business in the **late 90s** and have gained considerable popularity over the years.

- The company has started **facing losses** as the industry has evolved.

- There are **several competitors in the market** who have been using more advanced techniques.

- As the company is fairly old, it is **failing to analyze customers' behaviour** towards its services.

- They are **looking for a more robust way** to **understand** their **customers'** sentiment.

- They recently learned about **data scientists**, who help businesses solve such problems.

- They **decided to hire a team of data scientists**. Consider yourself one of them…

---
<a name = Section3></a>
# **3. Installing & Importing Libraries**
---

<a name = Section31></a>
### **3.1 Installing & Upgrading Libraries**

In [None]:
!pip install -q --upgrade datascience                               # A package that is required by pandas-profiling library
!pip install -q --upgrade pandas-profiling                          # A library to generate basic statistics about data
!pip install -q --upgrade yellowbrick                               # Visual diagnostics toolkit for machine learning models

<a name = Section32></a>
### **3.2 Importing Libraries**

In [26]:
#-------------------------------------------------------------------------------------------------------------------------------
import re                                                           # Importing for regular-expression text cleaning
import pandas as pd                                                 # Importing for panel data analysis
pd.set_option('display.max_columns', None)                          # Showing all columns instead of truncating wide frames
pd.set_option('display.max_colwidth', None)                         # Showing full cell contents for better clarity
pd.set_option('mode.chained_assignment', None)                      # Silencing the chained-assignment warning
#-------------------------------------------------------------------------------------------------------------------------------
import matplotlib.pyplot as plt                                     # Importing pyplot interface using matplotlib
import seaborn as sns                                               # Importing seaborn library for statistical visualization
%matplotlib inline
#-------------------------------------------------------------------------------------------------------------------------------
# Note: the NLTK 'stopwords' and 'wordnet' corpora must be downloaded once via nltk.download()
from nltk.corpus import stopwords                                   # Importing the English stopword list
from nltk.stem import PorterStemmer                                 # Importing the Porter stemming algorithm
from nltk.stem import WordNetLemmatizer                             # Importing the WordNet-based lemmatizer
from sklearn.utils import shuffle                                   # Importing to shuffle rows
from sklearn.feature_extraction.text import CountVectorizer         # Importing to build bag-of-words features
from sklearn.naive_bayes import MultinomialNB                       # Importing the multinomial Naive Bayes classifier
#-------------------------------------------------------------------------------------------------------------------------------
import warnings                                                     # Importing warnings to disable runtime warnings
warnings.filterwarnings("ignore")                                   # Suppressing all warnings

---
<a name = Section4></a>
# **4. Data Acquisition & Description**
---


- The dataset, provided by Ronzaro, contains IMDB movie reviews labelled with their sentiment.

| Records | Features | Dataset Size |
| :-- | :-- | :-- |
| 50000 | 2 | 63.1 MB | 


| Id | Features | Description |
| :-- | :-- | :-- |
|01|**review**|Review of the movie.|
|02|**sentiment**|Sentiment concerning the movie.|

In [12]:
data = pd.read_csv(filepath_or_buffer='https://gitlab.com/coldperformer/multimedia/-/raw/main/machine-learning-projects/data/04-imdb-movies-reviews.csv')
print('Data Shape:', data.shape)
data.head()

Data Shape: (50000, 2)


Unnamed: 0,review,sentiment
0,"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.<br /><br />The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.<br /><br />It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.<br /><br />I would say the main appeal of the show is due to the fact that it goes where other shows wouldn't dare. Forget pretty pictures painted for mainstream audiences, forget charm, forget romance...OZ doesn't mess around. The first episode I ever saw struck me as so nasty it was surreal, I couldn't say I was ready for it, but as I watched more, I developed a taste for Oz, and got accustomed to the high levels of graphic violence. Not just violence, but injustice (crooked guards who'll be sold out for a nickel, inmates who'll kill on order and get away with it, well mannered, middle class inmates being turned into prison bitches due to their lack of street skills or prison experience) Watching Oz, you may become comfortable with what is uncomfortable viewing....thats if you can get in touch with your darker side.",positive
1,"A wonderful little production. <br /><br />The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece. <br /><br />The actors are extremely well chosen- Michael Sheen not only ""has got all the polari"" but he has all the voices down pat too! You can truly see the seamless editing guided by the references to Williams' diary entries, not only is it well worth the watching but it is a terrificly written and performed piece. A masterful production about one of the great master's of comedy and his life. <br /><br />The realism really comes home with the little things: the fantasy of the guard which, rather than use the traditional 'dream' techniques remains solid then disappears. It plays on our knowledge and our senses, particularly with the scenes concerning Orton and Halliwell and the sets (particularly of their flat with Halliwell's murals decorating every surface) are terribly well done.",positive
2,"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater and watching a light-hearted comedy. The plot is simplistic, but the dialogue is witty and the characters are likable (even the well bread suspected serial killer). While some may be disappointed when they realize this is not Match Point 2: Risk Addiction, I thought it was proof that Woody Allen is still fully in control of the style many of us have grown to love.<br /><br />This was the most I'd laughed at one of Woody's comedies in years (dare I say a decade?). While I've never been impressed with Scarlet Johanson, in this she managed to tone down her ""sexy"" image and jumped right into a average, but spirited young woman.<br /><br />This may not be the crown jewel of his career, but it was wittier than ""Devil Wears Prada"" and more interesting than ""Superman"" a great comedy to go see with friends.",positive
3,"Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.<br /><br />This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.<br /><br />OK, first of all when you're going to make a film you must Decide if its a thriller or a drama! As a drama the movie is watchable. Parents are divorcing & arguing like in real life. And then we have Jake with his closet which totally ruins all the film! I expected to see a BOOGEYMAN similar movie, and instead i watched a drama with some meaningless thriller spots.<br /><br />3 out of 10 just for the well playing parents & descent dialogs. As for the shots with Jake: just ignore them.",negative
4,"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portrait about human relations. This is a movie that seems to be telling us what money, power and success do to people in the different situations we encounter. <br /><br />This being a variation on the Arthur Schnitzler's play about the same theme, the director transfers the action to the present time New York where all these different characters meet and connect. Each one is connected in one way, or another to the next person, but no one seems to know the previous point of contact. Stylishly, the film has a sophisticated luxurious look. We are taken to see how these people live and the world they live in their own habitat.<br /><br />The only thing one gets out of all these souls in the picture is the different stages of loneliness each one inhabits. A big city is not exactly the best place in which human relations find sincere fulfillment, as one discerns is the case with most of the people we encounter.<br /><br />The acting is good under Mr. Mattei's direction. Steve Buscemi, Rosario Dawson, Carol Kane, Michael Imperioli, Adrian Grenier, and the rest of the talented cast, make these characters come alive.<br /><br />We wish Mr. Mattei good luck and await anxiously for his next work.",positive
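
Before any modelling, it is usually worth checking how the two sentiment labels are distributed. The sketch below assumes a tiny, made-up stand-in frame with the same column names as the real data; the name `sample` and its rows are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: same column names, made-up rows
sample = pd.DataFrame({
    'review': ['Loved it', 'Dull and slow', 'A real gem', 'Not for me'],
    'sentiment': ['positive', 'negative', 'positive', 'negative'],
})

# Absolute count of reviews per label
class_counts = sample['sentiment'].value_counts()

# Share of each label, useful for spotting class imbalance
class_share = sample['sentiment'].value_counts(normalize=True)

print(class_counts)
print(class_share)
```

On the real frame the same two calls would be applied to `data['sentiment']`.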


---
<a name = Section5></a>
# **5. Data Pre-processing**
---

### **Identification and Handling of Missing Values**

In [13]:
data.isnull().sum()

review       0
sentiment    0
dtype: int64
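
The check above reports no missing values, so nothing needs handling here. Had there been any, one common option would be to drop the affected rows, since a review without text or without a label cannot be used for supervised training. A minimal sketch on a made-up frame (the name `sample` and its rows are hypothetical):

```python
import pandas as pd

# Hypothetical frame with one missing review and one missing label
sample = pd.DataFrame({
    'review': ['Loved it', None, 'A real gem'],
    'sentiment': ['positive', 'negative', None],
})

# Counting missing values per column, as done above
missing_per_column = sample.isnull().sum()

# Dropping rows where either column is missing, then renumbering the index
cleaned = sample.dropna(subset=['review', 'sentiment']).reset_index(drop=True)

print(missing_per_column)
print(cleaned)
```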

### **Identification and Handling of Duplicate Values**

In [14]:
data.duplicated().sum()

418

In [15]:
print('Old Data Shape:', data.shape)
data.drop_duplicates(inplace=True)
print('Duplicate data rows dropped!')
print('New Data Shape:', data.shape)

Old Data Shape: (50000, 2)
Duplicate data rows dropped!
New Data Shape: (49582, 2)


In [16]:
data.duplicated().sum()

0
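
A note on how the count above is produced: by default, `duplicated()` flags every repeat of a row except its first occurrence, and a row counts as a duplicate only when all of its columns match. The sketch below shows this on a made-up frame (`sample` and its rows are hypothetical) and also resets the index, which `drop_duplicates` otherwise leaves with gaps.

```python
import pandas as pd

# Hypothetical frame: row 2 repeats row 0 exactly; row 3 differs in its label
sample = pd.DataFrame({
    'review': ['Loved it', 'Dull and slow', 'Loved it', 'Loved it'],
    'sentiment': ['positive', 'negative', 'positive', 'negative'],
})

# True only for later occurrences of a fully identical row
duplicate_mask = sample.duplicated()

# Keeping the first occurrence of each row and renumbering the index
deduplicated = sample.drop_duplicates().reset_index(drop=True)

print(duplicate_mask.sum())
print(deduplicated)
```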

### **Text Cleaning**

In [30]:
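stop_words = stopwords.words('english')
stop_words.append('movie')                                          # Treating the domain word 'movie' as a stopword

def clean_data(text):
    """Strips HTML tags, punctuation, and stopwords from text."""
    text = re.sub(r'<[^>]*>', '', text)                             # Removing HTML tags such as <br />
    emoticons = re.findall(r'(?::|;|=)(?:-)', text)                 # Capturing emoticon prefixes such as ':-' or ';-'
    # Lowercasing, collapsing non-word characters into spaces, and appending the captured prefixes without '-'
    text = (re.sub(r'[\W]+', ' ', text.lower()) + ' '.join(emoticons).replace('-', ''))
    rm_words = [w for w in text.split() if w.lower() not in stop_words]
    return ' '.join(rm_words)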

clean_data('hello john!-;/.,<>? how are you.. @#$%^&*I am fine movie:')

'hello john fine'
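
To see what each regular expression in the cleaning function contributes, the steps can be run one at a time on a short string. This is a sketch using only the standard library; the sample string `raw` is made up for illustration.

```python
import re

# Made-up review fragment containing HTML tags, an emoticon, and punctuation
raw = 'Great film!<br /><br />Loved it :-) 10/10'

# Step 1: remove anything that looks like an HTML tag
no_tags = re.sub(r'<[^>]*>', '', raw)

# Step 2: capture emoticon prefixes (eyes followed by a '-' nose)
emoticons = re.findall(r'(?::|;|=)(?:-)', no_tags)

# Step 3: lowercase and collapse every run of non-word characters into one space
no_punct = re.sub(r'[\W]+', ' ', no_tags.lower())

print(repr(no_tags))
print(emoticons)
print(repr(no_punct))
```

Note that the pattern in step 2 matches only the eyes-and-nose part of an emoticon, so the mouth character is discarded along with the other punctuation in step 3.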

In [34]:
stop_words = stopwords.words('english')
stop_words.append('movie')                                          # Treating the domain word 'movie' as a stopword

def clean_data(text):
    """Removes HTML tags, digits, punctuation, and stopwords, then stems and lemmatizes each word."""

    # Removing HTML tags, digits, punctuation, and stopwords (emoticon prefixes such as ':-' are kept without the '-')
    text = re.sub(r'<[^>]*>', '', text)
    text = re.sub(r'[0-9]', '', text)
    emoticons = re.findall(r'(?::|;|=)(?:-)', text)
    text = (re.sub(r'[\W]+', ' ', text.lower()) + ' '.join(emoticons).replace('-', ''))
    rm_words = [w for w in text.split() if w.lower() not in stop_words]
    
    # Performing Stemming
    stemmer = PorterStemmer()
    stem_words = [stemmer.stem(word) for word in rm_words]

    # Performing Lemmatization on the stemmed words
    lemmatizer = WordNetLemmatizer()
    lemma_words = [lemmatizer.lemmatize(word) for word in stem_words]
    
    return ' '.join(lemma_words)

clean_data('hello john!-;/.,<>? how are you.. @#$%^&*I am fine movie:')

'hello john fine'

In [35]:
data['review'] = data['review'].apply(clean_data)
data.head()

Unnamed: 0,review,sentiment
0,one review mention watch oz episod hook right exactli happen first thing struck oz brutal unflinch scene violenc set right word go trust show faint heart timid show pull punch regard drug sex violenc hardcor classic use word call oz nicknam given oswald maximum secur state penitentari focu mainli emerald citi experi section prison cell glass front face inward privaci high agenda em citi home mani aryan muslim gangsta latino christian italian irish scuffl death stare dodgi deal shadi agreement never far away would say main appeal show due fact goe show dare forget pretti pictur paint mainstream audienc forget charm forget romanc oz mess around first episod ever saw struck nasti surreal say readi watch develop tast oz got accustom high level graphic violenc violenc injust crook guard sold nickel inmat kill order get away well manner middl class inmat turn prison bitch due lack street skill prison experi watch oz may becom comfort uncomfort view get touch darker side,positive
1,wonder littl product film techniqu unassum old time bbc fashion give comfort sometim discomfort sen realism entir piec actor extrem well chosen michael sheen got polari voic pat truli see seamless edit guid refer william diari entri well worth watch terrificli written perform piec master product one great master comedi life realism realli come home littl thing fantasi guard rather use tradit dream techniqu remain solid disappear play knowledg sen particularli scene concern orton halliwel set particularli flat halliwel mural decor everi surfac terribl well done,positive
2,thought wonder way spend time hot summer weekend sit air condit theater watch light heart comedi plot simplist dialogu witti charact likabl even well bread suspect serial killer may disappoint realiz match point risk addict thought proof woodi allen still fulli control style mani u grown love laugh one woodi comedi year dare say decad never impress scarlet johanson manag tone sexi imag jump right averag spirit young woman may crown jewel career wittier devil wear prada interest superman great comedi go see friend,positive
3,basic famili littl boy jake think zombi closet parent fight time slower soap opera suddenli jake decid becom rambo kill zombi ok first go make film must decid thriller drama drama watchabl parent divorc argu like real life jake closet total ruin film expect see boogeyman similar instead watch drama meaningless thriller spot well play parent descent dialog shot jake ignor,negative
4,petter mattei love time money visual stun film watch mr mattei offer u vivid portrait human relat seem tell u money power success peopl differ situat encount variat arthur schnitzler play theme director transfer action present time new york differ charact meet connect one connect one way anoth next person one seem know previou point contact stylishli film sophist luxuri look taken see peopl live world live habitat thing one get soul pictur differ stage lone one inhabit big citi exactli best place human relat find sincer fulfil one discern case peopl encount act good mr mattei direct steve buscemi rosario dawson carol kane michael imperioli adrian grenier rest talent cast make charact come aliv wish mr mattei good luck await anxiou next work,positive
