**1. Feature Engineering**

**1.1 Import the required libraries.**

In [13]:
'''Necessary libraries.'''
import pandas as pd                 # Pandas for tabular data manipulation.
import json                         # Module for working with JSON.
import ast                          # Module for evaluating Python literal expressions.
import re                           # Module for working with regular expressions.
from textblob import TextBlob       # TextBlob for sentiment polarity scoring.
import nltk                         # Natural Language Toolkit.
import csv                          # Module for reading and writing CSV files.

'''Enable auto-reload of modules before executing a cell'''
%load_ext autoreload
%autoreload 2

'''Import the warning module and set it to ignore all warnings'''
import warnings
warnings.filterwarnings("ignore")

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [14]:
'''Load the cleaned reviews into a DataFrame.'''

with open(r'C:\Users\migue\Optimizing Recommender Systems with an Advanced MLOps Pipeline\clean_dataset\Australian_user_reviews_clean1.csv', 'r', newline='', encoding='utf-8') as file:
    csv_file = csv.DictReader(file)         # Every value is read as a string.
    df_reviews = pd.DataFrame(csv_file)

df_reviews.sample(5)


Unnamed: 0,user_id,user_url,reviews_item_id,reviews_helpful,reviews_recommend,reviews_review,reviews_date
6064,76561198068809841,http://steamcommunity.com/profiles/76561198068...,440,0 of 3 people (0%) found this review helpful,False,random crits will constantly make you quit the...,Invalid format
17667,Zejus,http://steamcommunity.com/id/Zejus,730,2 of 2 people (100%) found this review helpful,True,CS:GO <3,13-12-29
1730,76561198076271594,http://steamcommunity.com/profiles/76561198076...,4780,No ratings yet,True,Medieval II: Total War™ Kingdoms is a turn bas...,14-01-12
44818,76561198067546287,http://steamcommunity.com/profiles/76561198067...,218230,No ratings yet,True,one word. epic,13-01-10
45554,76561198066026014,http://steamcommunity.com/profiles/76561198066...,105600,No ratings yet,True,"This game is love, and this game is lifeIf thi...",15-06-21
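
Because the file is read with `csv.DictReader`, every value arrives as a string, including `reviews_recommend` and empty cells. The following is a minimal sketch of that behavior, assuming a small in-memory CSV invented for illustration (only the column names mirror the dataset):

```python
import csv
import io

# Hypothetical two-row sample; only the column names mirror the dataset.
sample = io.StringIO(
    "reviews_item_id,reviews_recommend,reviews_review\n"
    "440,False,some text\n"
    "730,True,\n"
)

rows = list(csv.DictReader(sample))

# Every field is a str: booleans, numbers and empty cells alike.
for row in rows:
    print({key: (value, type(value).__name__) for key, value in row.items()})
```

An empty review is therefore `''` rather than `None`, and `reviews_recommend` holds the strings `'True'` and `'False'`. Columns need an explicit conversion if typed values are required later.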


**1.2 Auxiliary Function.**

**Sentiment Analysis Function.**  

This function provides a basic way to categorize the sentiment of a text as negative (0), neutral (1), or positive (2), based on the polarity calculated by TextBlob.

In [15]:
def sentiment_analysis(review):
    if review is None:                              # A missing review is treated as neutral (1).
        return 1
    
    analysis = TextBlob(review)                     # Create a TextBlob instance from the review text.
    polarity = analysis.sentiment.polarity          # Polarity ranges from -1.0 (negative) to 1.0 (positive).
    
    if polarity < -0.2:                             # Below -0.2: negative sentiment, return 0.
        return 0
    
    elif polarity > 0.2:                            # Above 0.2: positive sentiment, return 2.
        return 2
    
    else:                                           # Otherwise: neutral sentiment, return 1.
        return 1
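
The thresholding step can be looked at on its own, without TextBlob. The sketch below assumes a hypothetical helper, `label_from_polarity`, that is not part of the notebook; it applies the same ±0.2 cut-offs to a polarity score that is already computed:

```python
def label_from_polarity(polarity, threshold=0.2):
    """Map a polarity score in [-1.0, 1.0] to 0 (negative), 1 (neutral) or 2 (positive)."""
    if polarity < -threshold:
        return 0
    if polarity > threshold:
        return 2
    return 1


# Scores exactly on a threshold fall into the neutral class,
# because both comparisons are strict.
for score in (-0.7, -0.2, 0.0, 0.2, 0.65):
    print(score, label_from_polarity(score))
```

Keeping the cut-off as a parameter makes it easy to try a narrower or wider neutral band when the default split looks too coarse.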
    
     

**Analysis of Sentiment Review Examples Function.**

The 'example_review_by_sentiment' function is used to analyze and present example reviews classified by sentiment.  
The function takes two lists as parameters: reviews, which contains the reviews, and sentiments, which contains the sentiment value associated with each review.  

0 Negative  
1 Neutral  
2 Positive  

The function shows example reviews for each category.  
For each category, it prints the category number and selects the reviews that have that sentiment value.  
Then it prints the first three reviews from that category.



In [17]:
def example_review_by_sentiment(reviews, sentiments):
    for sentiment_value in range(3):
        print(f'For the sentiment analysis category {sentiment_value} there are these examples of reviews: ')
        # Keep only the reviews whose label matches the current category.
        sentiment_reviews = [review for review, sentiment in zip(reviews, sentiments) if sentiment == sentiment_value]
        
        # Print at most the first three reviews of the category.
        for i, review in enumerate(sentiment_reviews[:3], start=1):
            print(f'Review {i}: {review}')
            
        print('\n')
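
For checking the selection logic without reading printed output, a variant that returns the examples can be handy. This is a sketch only: `collect_examples` and the toy data are hypothetical and do not appear in the notebook.

```python
def collect_examples(reviews, sentiments, per_category=3):
    """Return {sentiment_value: first `per_category` reviews with that value}."""
    examples = {value: [] for value in range(3)}
    for review, sentiment in zip(reviews, sentiments):
        # Ignore unexpected labels and stop collecting once a category is full.
        if sentiment in examples and len(examples[sentiment]) < per_category:
            examples[sentiment].append(review)
    return examples


# Toy data invented for illustration.
toy_reviews = ['bad port', 'it runs', 'great fun', 'awful', 'love it']
toy_sentiments = [0, 1, 2, 0, 2]
print(collect_examples(toy_reviews, toy_sentiments))
```

Returning a dictionary keeps the grouping separate from the presentation, so the same result can be printed, asserted on, or written to a report.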

**1.3 Feature Engineering for Australian_user_reviews_clean1.csv**

In [16]:
df_reviews['sentiment_analysis'] = df_reviews['reviews_review'].apply(sentiment_analysis)
df_reviews.head()

Unnamed: 0,user_id,user_url,reviews_item_id,reviews_helpful,reviews_recommend,reviews_review,reviews_date,sentiment_analysis
0,76561197970982479,http://steamcommunity.com/profiles/76561197970...,1250,No ratings yet,True,Simple yet with great replayability. In my opi...,11-11-05,1
1,js41637,http://steamcommunity.com/id/js41637,251610,15 of 20 people (75%) found this review helpful,True,I know what you think when you see this title ...,14-06-24,1
2,evcentric,http://steamcommunity.com/id/evcentric,248820,No ratings yet,True,A suitably punishing roguelike platformer. Wi...,Invalid format,2
3,doctr,http://steamcommunity.com/id/doctr,250320,2 of 2 people (100%) found this review helpful,True,This game... is so fun. The fight sequences ha...,13-10-14,2
4,maplemage,http://steamcommunity.com/id/maplemage,211420,35 of 43 people (81%) found this review helpful,True,Git gud,14-04-15,1
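
Before relying on the new column, it can help to check how the labels are distributed. The sketch below counts the five labels visible in the `df_reviews.head()` output using only the standard library; on the full DataFrame, `df_reviews['sentiment_analysis'].value_counts()` should give the equivalent counts.

```python
from collections import Counter

# Labels of the five rows displayed by df_reviews.head().
head_labels = [1, 1, 2, 2, 1]

counts = Counter(head_labels)
total = len(head_labels)

# Report every category, including those with no reviews.
for label in range(3):
    share = counts.get(label, 0) / total
    print(f'{label}: {counts.get(label, 0)} reviews ({share:.0%})')
```

A strongly skewed distribution, for example a very large neutral class, would suggest revisiting the ±0.2 thresholds.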


Let's look at some examples.  

In [19]:
example_review_by_sentiment(df_reviews['reviews_review'], df_reviews['sentiment_analysis'])

For the sentiment analysis category 0 there are these examples of reviews: 
Review 1: This game is Marvellous.
Review 2: Killed the Emperor, nobody cared and got away with it. Accidentally killed a chicken and everybody decided to gang up on me. 10/10
Review 3: This Game Doesn't Work


For the sentiment analysis category 1 there are these examples of reviews: 
Review 1: Simple yet with great replayability. In my opinion does "zombie" hordes and team work better than left 4 dead plus has a global leveling system. Alot of down to earth "zombie" splattering fun for the whole family. Amazed this sort of FPS is so rare.
Review 2: I know what you think when you see this title "Barbie Dreamhouse Party" but do not be intimidated by it's title, this is easily one of my GOTYs. You don't get any of that cliche game mechanics that all the latest games have, this is simply good core gameplay. Yes, you can't 360 noscope your friends, but what you can do is show them up with your bad ♥♥♥ dance moves 

We delete the 'reviews_review' column.

In [20]:
df_reviews = df_reviews.drop('reviews_review', axis=1)
df_reviews.sample(5)

Unnamed: 0,user_id,user_url,reviews_item_id,reviews_helpful,reviews_recommend,reviews_date,sentiment_analysis
54279,Nozomikat,http://steamcommunity.com/id/Nozomikat,437220,7 of 13 people (54%) found this review helpful,False,Invalid format,1
1318,kk_kyser,http://steamcommunity.com/id/kk_kyser,4000,1 of 1 people (100%) found this review helpful,True,13-12-15,1
14947,PogiGwaps,http://steamcommunity.com/id/PogiGwaps,383870,1 of 2 people (50%) found this review helpful,True,Invalid format,1
3480,76561198061650250,http://steamcommunity.com/profiles/76561198061...,233610,1 of 1 people (100%) found this review helpful,True,15-11-02,2
50812,76561198074519984,http://steamcommunity.com/profiles/76561198074...,244850,0 of 2 people (0%) found this review helpful,True,14-07-02,1


We save the result to the same CSV file that was loaded at the start, overwriting it.

In [21]:
'''Specify the directory path and the CSV file name.'''

directory = 'C:\\Users\\migue\\Optimizing Recommender Systems with an Advanced MLOps Pipeline\\clean_dataset'
file_name = 'Australian_user_reviews_clean1.csv'
full_path = f'{directory}\\{file_name}'       # Use the same separator as the rest of the path.


df_reviews.to_csv(full_path, index=False)

print('The file was successfully saved')

The file was successfully saved
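
An alternative way to build the output path is `pathlib`, which joins path parts without hand-written separators. The sketch below is an assumption-laden illustration: it writes to a temporary directory, uses a hypothetical output file name so the input CSV is not overwritten, and uses the `csv` module in place of the DataFrame.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical stand-ins: a temporary directory and a new output file name.
directory = Path(tempfile.mkdtemp())
full_path = directory / 'Australian_user_reviews_features.csv'

# One row taken from the head() output: user 'doctr' was labelled 2.
rows = [{'user_id': 'doctr', 'sentiment_analysis': 2}]

with open(full_path, 'w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=['user_id', 'sentiment_analysis'])
    writer.writeheader()
    writer.writerows(rows)

print(full_path.exists())
```

With pandas, the same `Path` object can be passed directly, as in `df_reviews.to_csv(full_path, index=False)`.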
