
# **<font color='white gray'>Business Analytics for Data Science Projects</font>**
## **<font color='white gray'>Data Science Project Template - Sentiment Analysis on User Reviews</font>**



# **Step 1: Defining Business Objectives and Data Requirements**

Defining business objectives and data requirements is an essential first step in any Data Science project. Together they form the foundation on which the project is built and guide all subsequent activities.

## **<font color='red'>1.1 Defining Business Objectives</font>**

This step involves understanding and clearly defining what the company aims to achieve with the project. It is crucial that these objectives align with the overall business goals and are measurable, allowing the project's success to be evaluated.

Business objectives often address questions such as:

- What specific problem are we trying to solve?
- How will this project add value to the company?
- What are the KPIs (Key Performance Indicators) that will be used to measure the project's success?

**Example**:
- An online retailer may have a business objective of increasing sales by 20% per year by using data analysis to personalize customer offers.

**Another example**:
- A company aims to build a Machine Learning model with over 80% accuracy to classify whether a user review is positive, negative, or neutral, helping to adjust marketing campaigns.
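
A measurable objective can be written down as a target value and checked automatically once results are available. The sketch below is only an illustration: the `Kpi` class, its field names, and the accuracy values are hypothetical and are not results from this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    """A hypothetical KPI: a metric name and the value that must be exceeded."""
    name: str
    target: float

    def is_met(self, observed: float) -> bool:
        # The objective says "over 80%", so the comparison is strict.
        return observed > self.target


accuracy_kpi = Kpi(name="classification accuracy", target=0.80)

# Illustrative values only, not results from this project.
for observed in (0.78, 0.80, 0.83):
    status = "met" if accuracy_kpi.is_met(observed) else "not met"
    print(f"{accuracy_kpi.name}: {observed:.2f} -> {status}")
```

Writing the target down this way forces the team to agree on the exact threshold and on whether it is inclusive before any model is trained.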

## **<font color='red'>1.2 Data Requirements</font>**

Once the business objectives are defined, the next step is to determine the data requirements needed to achieve them. This includes identifying:

1. **What types of data are needed**:
   - For example: customer demographic data, purchase history, website navigation data, etc.

2. **Where the data will come from**:
   - Internal sources (e.g., company databases) or external sources (e.g., third-party data, public datasets).

3. **The quality of data required**:
   - Assessing the accuracy, completeness, and timeliness of the data.

4. **Legal and ethical considerations**:
   - Ensuring compliance with applicable data protection laws.

Data requirements should be clearly documented and agreed upon by all stakeholders. This includes defining:

- The necessary data structure,
- The data volume,
- The update frequency, among other critical aspects.
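
Documented requirements can also be turned into automated checks that run whenever new data arrives. The following is a minimal sketch under stated assumptions: the `DataRequirements` class, the `check_requirements` function, the thresholds, and the sample records are all hypothetical, and plain dictionaries stand in for whatever storage the project actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequirements:
    """Hypothetical, agreed-upon requirements for a review dataset."""
    required_columns: tuple
    min_rows: int
    max_missing_ratio: float


def check_requirements(records, req):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if len(records) < req.min_rows:
        problems.append(f"expected at least {req.min_rows} rows, got {len(records)}")
    for column in req.required_columns:
        # A value counts as missing when the key is absent, None, or an empty string.
        missing = sum(1 for row in records if row.get(column) in (None, ""))
        ratio = missing / len(records) if records else 1.0
        if ratio > req.max_missing_ratio:
            problems.append(f"column '{column}' has {ratio:.0%} missing values")
    return problems


requirements = DataRequirements(
    required_columns=("review", "sentiment"), min_rows=3, max_missing_ratio=0.10
)
sample = [
    {"review": "Great film", "sentiment": "positive"},
    {"review": "Too slow", "sentiment": "negative"},
    {"review": "", "sentiment": "positive"},
]
print(check_requirements(sample, requirements))
```

Here the empty review in the third record pushes the missing ratio for `review` above the agreed 10%, so the check reports one problem.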


**Note**: Both steps require effective communication and collaboration between the Data Science team, business stakeholders, and IT teams. Clear and mutual understanding of these objectives and requirements is fundamental to the project's success.

---


# **Step 2: Mapping Data Flow and Business Processes**

Mapping data flow and business processes provides a clear understanding of how data moves across the company and how it is utilized in business processes. This step involves two main parts: mapping the data flow and analyzing business processes.

## **<font color='red'>2.1 Mapping the Data Flow</font>**

This phase focuses on identifying, documenting, and understanding the path data takes from its origin to its final consumption. This includes:

1. **Identifying data sources**:
   - Determining where the data originates, whether internally (ERP systems, CRM, operational databases) or externally (social media data, public datasets, third-party data).

2. **Data path**:
   - Understanding how data flows through the company's systems and processes. This includes all transformations, temporary storage, and cleaning processes the data undergoes.

3. **Point of consumption**:
   - Identifying where and how the data is used, whether for reports, analytics, Machine Learning, or other end uses.

The goal of data flow mapping is to ensure a complete understanding of how data is handled, transformed, and consumed, enabling the identification of potential bottlenecks, inefficiencies, or data quality issues.
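
One lightweight way to document a data flow is as a dependency graph, where each step lists the steps it reads from. The sketch below assumes a made-up flow; every step name is hypothetical. It uses the standard-library `graphlib` module (Python 3.9+) to derive a valid processing order and to list the sources and points of consumption.

```python
from graphlib import TopologicalSorter

# Hypothetical data flow: each step maps to the set of steps it reads from.
data_flow = {
    "crm_database": set(),
    "public_reviews": set(),
    "cleaning_job": {"crm_database", "public_reviews"},
    "feature_store": {"cleaning_job"},
    "sentiment_model": {"feature_store"},
    "weekly_report": {"feature_store"},
}

# A topological order lists every step after all of the steps it depends on.
processing_order = list(TopologicalSorter(data_flow).static_order())
print(processing_order)

# Sources have no upstream step; points of consumption feed no other step.
sources = [step for step, upstream in data_flow.items() if not upstream]
upstream_steps = set().union(*data_flow.values())
consumers = [step for step in data_flow if step not in upstream_steps]
print("sources:", sources)
print("points of consumption:", consumers)
```

A step with many downstream dependents, such as `cleaning_job` in this example, is a natural place to look for bottlenecks and data quality issues.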

## **<font color='red'>2.2 Analyzing Business Processes</font>**

Business process analysis involves understanding how data supports the company’s operations and decisions. This includes:

1. **Documenting business processes**:
   - Describing current business processes, identifying the steps involved, the responsibilities at each stage, and how data is used at each point.

2. **Identifying data requirements**:
   - Based on business processes, determining which data is needed, in what format, frequency, and quality, to support operations effectively.

3. **Identifying improvement opportunities**:
   - Analyzing business processes to identify opportunities where Data Science can optimize operations, improve decision-making, or create additional value.

This phase helps align Data Science projects with business objectives, ensuring that the proposed solutions are relevant and add value to the organization.

---

# **Step 3 - Exploratory Data Analysis (EDA) with Python**

## **Installing and Loading the Packages**

In [78]:
pip install nltk



In [79]:
#1 Imports
import re
import pickle
import nltk
import sklearn
import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import SnowballStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.metrics import accuracy_score

In [80]:
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

In [82]:
#3 Load the dataset
df = pd.read_csv('dataset.csv')

In [83]:
#4 Shape
df.shape

(50000, 2)

In [84]:
#5 Data sample
df.head()

,review,sentiment
0,"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as ...",positive
1,A wonderful little production. <br /><br />The filming technique is very unassuming- very old-time-BBC fashion and g...,positive
2,"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater...",positive
3,Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fightin...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portr...",positive


In [85]:
#6 Info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   review     50000 non-null  object
 1   sentiment  50000 non-null  object
dtypes: object(2)
memory usage: 781.4+ KB


In [86]:
#7 Count of records by class
df.sentiment.value_counts()

sentiment,count
positive,25000
negative,25000
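
The counts show a perfectly balanced dataset, which gives a simple reference point for the models built later: a classifier that always predicts the same class would be right half of the time. The short sketch below computes that baseline from the class counts reported above; the variable names are illustrative.

```python
from collections import Counter

# Class counts as reported by value_counts() above.
class_counts = Counter({"positive": 25000, "negative": 25000})

total = sum(class_counts.values())
majority_class, majority_count = class_counts.most_common(1)[0]

# Accuracy of a "model" that always predicts the most frequent class.
baseline_accuracy = majority_count / total
print(f"baseline accuracy: {baseline_accuracy:.2%}")
```

Any trained model should clearly exceed this 50% baseline to be worth deploying.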


# **Step 4 - Data Cleaning**

In [87]:
#8 Adjust the labels for numerical representation
# Assigning the result back avoids the chained inplace call, which stops working in pandas 3.0
df['sentiment'] = df['sentiment'].map({'positive': 1, 'negative': 0})


In [88]:
#9 Data sample
df.head()

,review,sentiment
0,"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as ...",1
1,A wonderful little production. <br /><br />The filming technique is very unassuming- very old-time-BBC fashion and g...,1
2,"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater...",1
3,Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fightin...,0
4,"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portr...",1


In [89]:
#10 Let's observe a user review
df.review[0]

"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.<br /><br />The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.<br /><br />It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.<br /><br />I would say the main appeal of the show is due to the fa

In [90]:
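#11 General data cleaning function
HTML_TAG_PATTERN = re.compile(r'<.*?>')  # non-greedy match of any HTML tag

def clean_data(text):
    """Remove HTML tags such as <br /> from the text."""
    return HTML_TAG_PATTERN.sub('', text)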

In [91]:
#12 Testing the function
text_with_tags = "<p>This is an example <b>with</b> HTML tags.</p>"
cleaned_text = clean_data(text_with_tags)
print(cleaned_text)

This is an example with HTML tags.
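
Deleting a tag outright joins the text on either side of it. In the first review, `with me.<br /><br />The first thing` becomes `with me.The first thing`. In this notebook the punctuation is later replaced by spaces, so those words end up separated anyway, but two words with only a tag between them would be glued together. The variant below is a sketch, not the function used in the rest of the notebook: `strip_tags_keep_spacing` is a hypothetical name, and unlike `clean_data` it also collapses repeated whitespace.

```python
import re

HTML_TAG = re.compile(r"<.*?>")


def strip_tags_keep_spacing(text):
    """Replace each HTML tag with a space, then collapse repeated whitespace."""
    without_tags = HTML_TAG.sub(" ", text)
    return " ".join(without_tags.split())


sample = "happened with me<br /><br />The first thing"
print(HTML_TAG.sub("", sample))         # tags deleted: the two words are glued together
print(strip_tags_keep_spacing(sample))  # tags replaced by a space
```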


In [92]:
#13 Apply the function to our dataset
df.review = df.review.apply(clean_data)

In [93]:
#14 View the first review
df.review[0]

"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.I would say the main appeal of the show is due to the fact that it goes where other shows wo

In [94]:
#15 Function for cleaning special characters
def clean_special_characters(text):
    """Replace every non-alphanumeric character with a space."""
    return ''.join(ch if ch.isalnum() else ' ' for ch in text)

In [95]:
#16 Testing the function
text_with_special_characters = "Hello, world! How are you?"
cleaned_text = clean_special_characters(text_with_special_characters)
print(cleaned_text)

Hello  world  How are you 


In [96]:
#17 Apply the function
df.review = df.review.apply(clean_special_characters)

In [97]:
#18 View the first review
df.review[0]

'One of the other reviewers has mentioned that after watching just 1 Oz episode you ll be hooked  They are right  as this is exactly what happened with me The first thing that struck me about Oz was its brutality and unflinching scenes of violence  which set in right from the word GO  Trust me  this is not a show for the faint hearted or timid  This show pulls no punches with regards to drugs  sex or violence  Its is hardcore  in the classic use of the word It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary  It focuses mainly on Emerald City  an experimental section of the prison where all the cells have glass fronts and face inwards  so privacy is not high on the agenda  Em City is home to many  Aryans  Muslims  gangstas  Latinos  Christians  Italians  Irish and more    so scuffles  death stares  dodgy dealings and shady agreements are never far away I would say the main appeal of the show is due to the fact that it goes where other shows wo

In [98]:
#19 Function to convert text to lowercase
def convert_to_lowercase(text):
    return text.lower()

In [99]:
#20 Testing the function
sentence = "This is a SENTENCE with UPPERCASE letters"
output_sentence = convert_to_lowercase(sentence)
print(output_sentence)

this is a sentence with uppercase letters


In [100]:
#21 Apply the function
df.review = df.review.apply(convert_to_lowercase)

In [101]:
#22 View the first review
df.review[0]

'one of the other reviewers has mentioned that after watching just 1 oz episode you ll be hooked  they are right  as this is exactly what happened with me the first thing that struck me about oz was its brutality and unflinching scenes of violence  which set in right from the word go  trust me  this is not a show for the faint hearted or timid  this show pulls no punches with regards to drugs  sex or violence  its is hardcore  in the classic use of the word it is called oz as that is the nickname given to the oswald maximum security state penitentary  it focuses mainly on emerald city  an experimental section of the prison where all the cells have glass fronts and face inwards  so privacy is not high on the agenda  em city is home to many  aryans  muslims  gangstas  latinos  christians  italians  irish and more    so scuffles  death stares  dodgy dealings and shady agreements are never far away i would say the main appeal of the show is due to the fact that it goes where other shows wo

In [102]:
#23 Download necessary NLTK resources
import nltk
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [103]:
#24 Function to remove stopwords
STOP_WORDS = set(stopwords.words('english'))  # built once, reused for every review

def remove_stopwords(text):
    """Tokenize the text and return the tokens that are not English stopwords."""
    words = word_tokenize(text)
    return [word for word in words if word.lower() not in STOP_WORDS]

In [104]:

#25 Testing the function
sentence = "They are right, as this is exactly what happened with me."
output_sentence = remove_stopwords(sentence)
print(output_sentence)


['right', ',', 'exactly', 'happened', '.']
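
NLTK's English stopword list includes negations such as `not` and `no`, so removing stopwords also removes words that can flip the sentiment of a sentence. One possible variation, sketched below, keeps negations in the text. The stopword set here is a tiny stand-in written for this example (it is not NLTK's list), the function name is hypothetical, and whether this choice improves the models in this project has not been tested.

```python
# A tiny stand-in stopword list for illustration only; it is NOT NLTK's list.
EXAMPLE_STOP_WORDS = {"this", "is", "a", "for", "the", "not", "no"}
NEGATIONS = {"not", "no"}


def remove_stopwords_keep_negations(tokens, stop_words=EXAMPLE_STOP_WORDS):
    """Drop stopwords but keep negation words, which can flip the sentiment."""
    effective_stop_words = stop_words - NEGATIONS
    return [token for token in tokens if token.lower() not in effective_stop_words]


tokens = "this is not a show for the faint hearted".split()
print([t for t in tokens if t not in EXAMPLE_STOP_WORDS])  # negation removed
print(remove_stopwords_keep_negations(tokens))             # negation kept
```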


In [105]:
%%time
#26 Measure the execution time (the %%time cell magic must be the first line of the cell)
df.review = df.review.apply(remove_stopwords)

CPU times: user 59.7 s, sys: 1.53 s, total: 1min 1s
Wall time: 1min 1s


In [106]:
#27 View the first review
df.review[0]

['one',
 'reviewers',
 'mentioned',
 'watching',
 '1',
 'oz',
 'episode',
 'hooked',
 'right',
 'exactly',
 'happened',
 'first',
 'thing',
 'struck',
 'oz',
 'brutality',
 'unflinching',
 'scenes',
 'violence',
 'set',
 'right',
 'word',
 'go',
 'trust',
 'show',
 'faint',
 'hearted',
 'timid',
 'show',
 'pulls',
 'punches',
 'regards',
 'drugs',
 'sex',
 'violence',
 'hardcore',
 'classic',
 'use',
 'word',
 'called',
 'oz',
 'nickname',
 'given',
 'oswald',
 'maximum',
 'security',
 'state',
 'penitentary',
 'focuses',
 'mainly',
 'emerald',
 'city',
 'experimental',
 'section',
 'prison',
 'cells',
 'glass',
 'fronts',
 'face',
 'inwards',
 'privacy',
 'high',
 'agenda',
 'em',
 'city',
 'home',
 'many',
 'aryans',
 'muslims',
 'gangstas',
 'latinos',
 'christians',
 'italians',
 'irish',
 'scuffles',
 'death',
 'stares',
 'dodgy',
 'dealings',
 'shady',
 'agreements',
 'never',
 'far',
 'away',
 'would',
 'say',
 'main',
 'appeal',
 'show',
 'due',
 'fact',
 'goes',
 'shows',
 'da

In [107]:
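#28 Function for stemming
stemmer_object = SnowballStemmer('english')  # created once and reused for every review

def stemmer(text):
    """Stem each token in the list and join the results into a single string."""
    return " ".join(stemmer_object.stem(w) for w in text)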

In [108]:
#29 Testing the function
text = "The cats are running"
stemmed_text = stemmer(text.split())
print(stemmed_text)

the cat are run


In [109]:
%%time
#30 Measure the execution time (the %%time cell magic must be the first line of the cell)
df.review = df.review.apply(stemmer)

CPU times: user 1min 30s, sys: 245 ms, total: 1min 30s
Wall time: 1min 31s


In [110]:
#31 View the first review
df.review[0]

'one review mention watch 1 oz episod hook right exact happen first thing struck oz brutal unflinch scene violenc set right word go trust show faint heart timid show pull punch regard drug sex violenc hardcor classic use word call oz nicknam given oswald maximum secur state penitentari focus main emerald citi experiment section prison cell glass front face inward privaci high agenda em citi home mani aryan muslim gangsta latino christian italian irish scuffl death stare dodgi deal shadi agreement never far away would say main appeal show due fact goe show dare forget pretti pictur paint mainstream audienc forget charm forget romanc oz mess around first episod ever saw struck nasti surreal say readi watch develop tast oz got accustom high level graphic violenc violenc injustic crook guard sold nickel inmat kill order get away well manner middl class inmat turn prison bitch due lack street skill prison experi watch oz may becom comfort uncomfort view that get touch darker side'

# **Step 5 - Data Preprocessing**

In [111]:
#32 Increase the max_colwidth value to avoid truncation
pd.set_option('display.max_colwidth', 120)

In [112]:
#33 Load the original dataset
raw_data = pd.read_csv('dataset.csv')

In [113]:
#34 Sample of the raw data
raw_data.head(10)

,review,sentiment
0,"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as ...",positive
1,A wonderful little production. <br /><br />The filming technique is very unassuming- very old-time-BBC fashion and g...,positive
2,"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater...",positive
3,Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fightin...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portr...",positive
5,"Probably my all-time favorite movie, a story of selflessness, sacrifice and dedication to a noble cause, but it's no...",positive
6,I sure would like to see a resurrection of a up dated Seahunt series with the tech they have today it would bring ba...,positive
7,"This show was an amazing, fresh & innovative idea in the 70's when it first aired. The first 7 or 8 years were brill...",negative
8,Encouraged by the positive comments about this film on here I was looking forward to watching this film. Bad mistake...,negative
9,If you like original gut wrenching laughter you will like this movie. If you are young or old then you will love thi...,positive


In [114]:
#35 Sample of the cleaned data
df.head(10)

,review,sentiment
0,one review mention watch 1 oz episod hook right exact happen first thing struck oz brutal unflinch scene violenc set...,1
1,wonder littl product film techniqu unassum old time bbc fashion give comfort sometim discomfort sens realism entir p...,1
2,thought wonder way spend time hot summer weekend sit air condit theater watch light heart comedi plot simplist dialo...,1
3,basic famili littl boy jake think zombi closet parent fight time movi slower soap opera sudden jake decid becom ramb...,0
4,petter mattei love time money visual stun film watch mr mattei offer us vivid portrait human relat movi seem tell us...,1
5,probabl time favorit movi stori selfless sacrific dedic nobl caus preachi bore never get old despit seen 15 time las...,1
6,sure would like see resurrect date seahunt seri tech today would bring back kid excit grew black white tv seahunt gu...,1
7,show amaz fresh innov idea 70 first air first 7 8 year brilliant thing drop 1990 show realli funni anymor continu de...,0
8,encourag posit comment film look forward watch film bad mistak seen 950 film truli one worst aw almost everi way edi...,0
9,like origin gut wrench laughter like movi young old love movi hell even mom like great camp,1


In [115]:
#36 We can delete the dataframe to free memory
del raw_data

In [116]:
#37 Extract the review text (input)
x = np.array(df.iloc[:, 0].values)

In [117]:
#38 Extract the sentiment (output)
y = np.array(df.sentiment.values)

In [118]:
#39 Split the data into training and testing sets with an 80/20 ratio
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

In [119]:
#40 Check the type of x_train
type(x_train)

numpy.ndarray

In [120]:
#41 Create a vectorizer (it will convert the text data into numerical representation)
vectorizer = CountVectorizer(max_features=1000)

In [121]:
#42 Fit and transform the vectorizer with training data
x_train_final = vectorizer.fit_transform(x_train).toarray()

In [122]:
#43 Only transform the test data
x_test_final = vectorizer.transform(x_test).toarray()
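
The vectorizer is fitted on the training data only, so the vocabulary (the columns of the matrix) never depends on the test set, and words that appear only in the test data are ignored. The pure-Python sketch below illustrates that idea on made-up documents. It is a simplification and not how `CountVectorizer` is implemented: the real class has its own tokenization rules and returns a sparse matrix. The function names are hypothetical.

```python
from collections import Counter


def fit_vocabulary(documents, max_features):
    """Keep the max_features most frequent words seen in the training documents."""
    totals = Counter(word for doc in documents for word in doc.split())
    most_frequent = [word for word, _ in totals.most_common(max_features)]
    # Sort alphabetically so every word gets a stable column index.
    return {word: index for index, word in enumerate(sorted(most_frequent))}


def transform(documents, vocabulary):
    """Turn each document into a list of counts, one column per vocabulary word."""
    rows = []
    for doc in documents:
        counts = Counter(doc.split())
        rows.append([counts[word] for word in vocabulary])
    return rows


train_docs = ["great film great cast", "bad film cast", "great plot"]
test_docs = ["great unseen film"]

vocabulary = fit_vocabulary(train_docs, max_features=3)
print(vocabulary)
print(transform(train_docs, vocabulary))
print(transform(test_docs, vocabulary))  # "unseen" has no column and is dropped
```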

In [123]:
#44 Print the shape of x_train_final and y_train
print("x_train_final:", x_train_final.shape)
print("y_train:", y_train.shape)

x_train_final: (40000, 1000)
y_train: (40000,)


In [124]:
#45 Print x_train_final
print(x_train_final)

[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [1 0 1 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]


In [125]:
#46 Print the shape of x_test_final and y_test
print("x_test_final:", x_test_final.shape)
print("y_test:", y_test.shape)

x_test_final: (10000, 1000)
y_test: (10000,)


# **Step 6 - Creation of Machine Learning Models**

### **Model 1 - `GaussianNB`**

`Gaussian Naive Bayes (GaussianNB)` is a probabilistic classifier based on Bayes' theorem that assumes each feature follows a `Gaussian` (normal) distribution within each class. It is commonly used in classification problems where the features are continuous. Word counts are discrete and mostly zero, so they fit this assumption only loosely, which makes this model a useful baseline for the two variants that follow.

In [126]:
#47 Create the model
model_v1 = GaussianNB()

In [127]:
#48 Train the model
model_v1.fit(x_train_final, y_train)

### **Model 2 - `MultinomialNB`**

`Multinomial Naive Bayes (MultinomialNB)` is also based on Bayes' theorem, but it is designed for features that represent counts or frequencies of events. It is widely used in text classification, where the features are typically word frequencies or term counts in documents, and it is effective for tasks such as spam filtering, sentiment analysis, and document categorization.
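
To make the idea concrete, the sketch below works through a multinomial Naive Bayes classifier by hand on a toy dataset. It is an illustration of the technique, not scikit-learn's implementation: the data, the `fit` and `predict` helpers, and the three-word vocabulary are all made up. The smoothing value `alpha = 1.0` matches the value passed to `MultinomialNB` in this notebook.

```python
import math

# Toy training data (hypothetical): word counts per document over a 3-word vocabulary.
vocabulary = ["bad", "film", "great"]
x_train = [[0, 1, 2], [0, 1, 1], [2, 1, 0], [1, 0, 0]]
y_train = [1, 1, 0, 0]  # 1 = positive, 0 = negative
alpha = 1.0  # additive (Laplace) smoothing


def fit(x, y):
    """Estimate log priors and smoothed log word probabilities for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [row for row, target in zip(x, y) if target == label]
        word_totals = [sum(column) for column in zip(*rows)]
        denominator = sum(word_totals) + alpha * len(word_totals)
        model[label] = {
            "log_prior": math.log(len(rows) / len(y)),
            "log_likelihood": [math.log((c + alpha) / denominator) for c in word_totals],
        }
    return model


def predict(model, counts):
    """Return the class with the highest log posterior score for one count vector."""
    scores = {
        label: p["log_prior"] + sum(n * ll for n, ll in zip(counts, p["log_likelihood"]))
        for label, p in model.items()
    }
    return max(scores, key=scores.get)


model = fit(x_train, y_train)
print(predict(model, [0, 1, 1]))  # counts for "film great"
print(predict(model, [1, 1, 0]))  # counts for "bad film"
```

Smoothing matters here because "bad" never occurs in the positive documents; without it, a single occurrence of "bad" would give the positive class a probability of exactly zero.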

In [128]:
#49 Create the model
model_v2 = MultinomialNB(alpha=1.0, fit_prior=True)

In [129]:
#51 Train the model
model_v2.fit(x_train_final, y_train)

### **Model 3 - `BernoulliNB`**

`Bernoulli Naive Bayes (BernoulliNB)` also uses Bayes' theorem, but it is designed for binary/boolean features. It assumes that the features are conditionally independent given the class and that each one follows a Bernoulli distribution, that is, each feature is a binary variable with only two possible outcomes (e.g., 0 or 1, true or false). It is particularly useful in text classification when only the presence or absence of a word in a document is considered and its frequency is ignored.
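
The matrix produced by `CountVectorizer` holds word counts rather than 0/1 values. scikit-learn's `BernoulliNB` handles this through its `binarize` parameter, which defaults to `0.0`: any value above the threshold is treated as 1. The sketch below shows the effect of that thresholding on made-up count vectors; the `binarize` function here is a hypothetical stand-in, not the library's code.

```python
def binarize(count_rows, threshold=0.0):
    """Map each count to 1 if it is above the threshold, otherwise 0."""
    return [[1 if value > threshold else 0 for value in row] for row in count_rows]


# Hypothetical count vectors, in the same row-per-document layout used above.
counts = [[0, 3, 1], [2, 0, 0]]
print(binarize(counts))
```

A word that appears three times in a review therefore carries the same weight as a word that appears once.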

In [130]:
#52 Create the model
model_v3 = BernoulliNB(alpha=1.0, fit_prior=True)

In [131]:
#53 Train the model
model_v3.fit(x_train_final, y_train)

<!-- Project developed at Data Science Academy - www.datascienceacademy.com.br -->

# **Step 7 - Evaluation, Interpretation, and Comparison of the Models**

In [132]:
#54 Predictions with test data
ypred_v1 = model_v1.predict(x_test_final)

In [133]:
#55 Predictions with test data
ypred_v2 = model_v2.predict(x_test_final)

In [134]:
#56 Predictions with test data
ypred_v3 = model_v3.predict(x_test_final)

In [135]:
#57 Print the accuracy of each model
print("Accuracy of GaussianNB Model = ", accuracy_score(y_test, ypred_v1) * 100)
print("Accuracy of MultinomialNB Model = ", accuracy_score(y_test, ypred_v2) * 100)
print("Accuracy of BernoulliNB Model = ", accuracy_score(y_test, ypred_v3) * 100)

Accuracy of GaussianNB Model =  79.06
Accuracy of MultinomialNB Model =  82.57
Accuracy of BernoulliNB Model =  83.02000000000001


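Accuracy measures the share of correct predictions at a single decision threshold, which makes it a convenient global metric for comparing versions of a model built with the same algorithm. To compare models built with different algorithms, AUC (Area Under the ROC Curve) is a better fit, because it summarizes how well each model ranks positive examples above negative ones across all thresholds.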
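
The difference between the two metrics is easiest to see on a small example. The sketch below implements both by hand on made-up scores: two hypothetical models rank the examples identically, but one of them produces systematically lower probabilities. The helper functions are written for this illustration; in the notebook itself the metrics come from `accuracy_score` and `roc_auc_score`.

```python
def accuracy_at_threshold(y_true, scores, threshold=0.5):
    """Share of correct predictions when scores >= threshold are labelled positive."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(predictions, y_true)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for s, t in zip(scores, y_true) if t == 1]
    negatives = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


y_true = [1, 1, 1, 0, 0, 0]
# Hypothetical scores from two models that rank the examples identically.
scores_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
scores_b = [0.45, 0.40, 0.35, 0.15, 0.10, 0.05]  # same ranking, lower scores

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name, accuracy_at_threshold(y_true, scores), roc_auc(y_true, scores))
```

Model B separates the classes just as well as model A, so both have the same AUC, yet B's accuracy at the 0.5 threshold drops to 50% because none of its scores reach the threshold.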

In [136]:
#58 Import
from sklearn.metrics import roc_auc_score

In [137]:
#59 AUC of GaussianNB
y_proba = model_v1.predict_proba(x_test_final)[:, 1]
auc = roc_auc_score(y_test, y_proba)
print("AUC of GaussianNB Model =", auc)

AUC of GaussianNB Model = 0.861081232980416


In [138]:
#60 AUC of MultinomialNB
y_proba = model_v2.predict_proba(x_test_final)[:, 1]
auc = roc_auc_score(y_test, y_proba)
print("AUC of MultinomialNB Model =", auc)

AUC of MultinomialNB Model = 0.8993217067636314


In [139]:
#61 AUC of BernoulliNB
y_proba = model_v3.predict_proba(x_test_final)[:, 1]
auc = roc_auc_score(y_test, y_proba)
print("AUC of BernoulliNB Model =", auc)

AUC of BernoulliNB Model = 0.9083430688103717


In [140]:
#62 Save the best model to disk
with open('model_v3.pkl', 'wb') as file:
    pickle.dump(model_v3, file)
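
The pickle file above contains only the classifier. To score new text in a fresh session, the fitted `vectorizer` is needed as well, because it holds the vocabulary the model was trained on. One possible approach, sketched below on a tiny made-up dataset, is to wrap both in a scikit-learn pipeline and serialize them as a single object. This is an alternative to the approach used in this notebook, not a description of it; the texts and labels are invented for the example.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Tiny, made-up training set standing in for the cleaned reviews.
texts = ["great film love", "wonder stori great", "bad plot bore", "worst film bad"]
labels = [1, 1, 0, 0]

# The pipeline keeps the fitted vocabulary and the classifier in one object.
pipeline = make_pipeline(CountVectorizer(max_features=1000), BernoulliNB())
pipeline.fit(texts, labels)

# Serialize to bytes here; writing to a file with pickle.dump works the same way.
payload = pickle.dumps(pipeline)
restored = pickle.loads(payload)

print(restored.predict(["great stori", "bad bore"]))
```

The restored pipeline accepts cleaned text directly, so the vectorizing step cannot be forgotten or applied with a different vocabulary.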

# **Step 8 - Deployment and Use of the Model**

In [141]:
#63 Load the model from disk
with open('model_v3.pkl', 'rb') as file:
    final_model = pickle.load(file)

In [142]:
#64 User review text (this text has a positive sentiment)
review_text = """This is probably the fastest-paced and most action-packed of the German Edgar Wallace "krimi"
series, a cross between the Dr. Mabuse films of yore and 60's pop thrillers like Batman and the Man
from UNCLE. It reintroduces the outrageous villain from an earlier film who dons a stylish monk's habit and
breaks the necks of victims with the curl of a deadly whip. Set at a posh girls' school filled with lecherous
middle-aged professors, and with the cops fondling their hot-to-trot secretaries at every opportunity, it
certainly is a throwback to those wonderfully politically-incorrect times. There's a definite link to a later
Wallace-based film, the excellent giallo "Whatever Happened to Solange?", which also concerns female students
being corrupted by (and corrupting?) their elders. Quite appropriate to the monk theme, the master-mind villain
uses booby-trapped bibles here to deal some of the death blows, and also maintains a reptile-replete dungeon
to amuse his captive audiences. <br /><br />Alfred Vohrer was always the most playful and visually flamboyant
of the series directors, and here the lurid colour cinematography is the real star of the show. The Monk appears
in a raving scarlet cowl and robe, tastefully setting off the lustrous white whip, while appearing against
purplish-night backgrounds. There's also a voyeur-friendly turquoise swimming pool which looks great both
as a glowing milieu for the nubile students and as a shadowy backdrop for one of the murder scenes.
The trademark "kicker" of hiding the "Ende" card somewhere in the set of the last scene is also quite
memorable here. And there's a fine brassy and twangy score for retro-music fans.<br /><br />Fans of the series
will definitely miss the flippant Eddie Arent character in these later films. Instead, the chief inspector
Sir John takes on the role of buffoon, convinced that he has mastered criminal psychology after taking a few
night courses. Unfortunately, Klaus Kinski had also gone on to bigger and better things. The krimis had
lost some of their offbeat subversive charm by this point, and now worked on a much more blatant pop-culture
level, which will make this one quite accessible to uninitiated viewers."""

In [143]:
#65 Data transformation flow
task1 = clean_data(review_text)
task2 = clean_special_characters(task1)
task3 = convert_to_lowercase(task2)
task4 = remove_stopwords(task3)
task5 = stemmer(task4)

In [144]:
#66 Print the result
print(task5)

probabl fastest pace action pack german edgar wallac krimi seri cross dr mabus film yore 60 pop thriller like batman man uncl reintroduc outrag villain earlier film don stylish monk habit break neck victim curl dead whip set posh girl school fill lecher middl age professor cop fondl hot trot secretari everi opportun certain throwback wonder polit incorrect time definit link later wallac base film excel giallo whatev happen solang also concern femal student corrupt corrupt elder quit appropri monk theme master mind villain use boobi trap bibl deal death blow also maintain reptil replet dungeon amus captiv audienc alfr vohrer alway play visual flamboy seri director lurid colour cinematographi real star show monk appear rave scarlet cowl robe tast set lustrous white whip appear purplish night background also voyeur friend turquois swim pool look great glow milieu nubil student shadowi backdrop one murder scene trademark kicker hide end card somewher set last scene also quit memor fine bra

In [145]:
#67 Check the type of task5
type(task5)

str

In [146]:
#68 Convert the string to a NumPy array (as this is how the model was trained)
task5_array = np.array(task5)

In [147]:
#69 Check the type of task5_array
type(task5_array)

numpy.ndarray

In [148]:
#70 Apply the vectorizer with another conversion to NumPy array to adjust the shape from 0-d to 1-d
final_review = vectorizer.transform(np.array([task5_array])).toarray()

In [149]:
#71 Check the type of final_review
type(final_review)

numpy.ndarray

In [151]:
#72 Prediction with the model
prediction = final_model.predict(final_review.reshape(1, 1000))

In [152]:
#73 Print the prediction
print(prediction)

[1]


In [153]:
#74 Conditional structure to check the value of prediction
if prediction[0] == 1:  # predict() returns an array with one label per input row
    print("The Text Indicates Positive Sentiment!")
else:
    print("The Text Indicates Negative Sentiment!")

The Text Indicates Positive Sentiment!


# **Step 9 - Communicating Insights with Data Storytelling**

