# Sentiment Analysis with Yelp Reviews

In this notebook, we will perform sentiment analysis on Yelp reviews. We will use a transformer-based model to determine whether a review is positive or negative. This notebook will walk through the steps of loading data, preprocessing text, and applying the sentiment analysis model.

## Using the pipeline function

In [None]:
# Import necessary libraries
import pandas as pd
from gensim.utils import simple_preprocess
from transformers import pipeline

# Load Yelp reviews dataset from Dropbox
url = 'https://www.dropbox.com/s/xds4lua69b7okw8/yelp.csv?dl=1'
data = pd.read_csv(url)
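# Keep only reviews with 1 star (negative) or 5 stars (positive);
# .copy() avoids pandas' SettingWithCopyWarning when columns are modified below
data = data[data['stars'].isin([1, 5])].copy()

# Lowercase and tokenize each review, then rejoin the tokens into one string
# (simple_preprocess also drops punctuation and very short or very long tokens)
data['text'] = data['text'].apply(lambda x: ' '.join(simple_preprocess(x)))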
# Create labels for sentiment: 1 for positive reviews, 0 for negative
data['labels'] = data['stars'].apply(lambda x: 1 if x == 5 else 0)


### Initializing the Sentiment Analysis Pipeline

We will use the `transformers` library to load a pre-trained sentiment analysis pipeline. This will allow us to classify the sentiment of each review as either positive or negative.

In [None]:
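# Initialize the sentiment analysis pipeline
# (with no model argument, transformers loads its default sentiment model)
sentiment_analysis = pipeline('sentiment-analysis')


### Applying Sentiment Analysis

Now, we will apply the sentiment analysis pipeline to a subset of reviews to see the results. We will process each review, analyze its sentiment, and display the results.

In [None]:
# Select a sample of reviews for sentiment analysis
examples = data['text'].sample(5).tolist()
# Analyze sentiment of each example; truncation=True clips reviews
# that exceed the model's maximum input length
results = sentiment_analysis(examples, truncation=True)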
# Display the results along with the original review text
for review, result in zip(examples, results):
    print(f'Review: {review}\nSentiment: {result}\n')


In [None]:
# Save the examples and results for further use
example_results = list(zip(examples, results))
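
The saved results can be compared against the star-derived `labels` column. The following is a minimal sketch, assuming each pipeline result is a dict with `label` and `score` keys and that the model emits the label strings `'POSITIVE'` and `'NEGATIVE'` (other models may use different names). The helper names `to_binary` and `accuracy` are hypothetical, and the mock data is hand-written rather than real model output.

```python
def to_binary(result):
    """Map a pipeline result dict to the notebook's 1/0 label convention."""
    # Assumption: the model uses the label string 'POSITIVE' for positive sentiment
    return 1 if result['label'] == 'POSITIVE' else 0


def accuracy(results, true_labels):
    """Fraction of predictions that match the star-derived labels."""
    if len(results) != len(true_labels):
        raise ValueError('results and true_labels must have the same length')
    if not results:
        return 0.0
    preds = [to_binary(r) for r in results]
    return sum(p == t for p, t in zip(preds, true_labels)) / len(preds)


# Hand-written stand-ins for pipeline output, not real model results
mock_results = [
    {'label': 'POSITIVE', 'score': 0.99},
    {'label': 'NEGATIVE', 'score': 0.95},
    {'label': 'POSITIVE', 'score': 0.60},
]
mock_labels = [1, 0, 0]  # third prediction disagrees with its label
print(accuracy(mock_results, mock_labels))
```

To use this on real data, sample whole rows (for example `data.sample(5)`) instead of only the `text` column, so the matching `labels` values stay available.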

## Creating an LLM enhanced app

Now that we know how to use an LLM on a task with the pipeline function, let's use it to build an app.

In [None]:
%%writefile sentiment_analysis.py
# Setting up a Flask RESTful API
from flask import Flask, request, jsonify
from functools import lru_cache
from transformers import pipeline

app = Flask(__name__)

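# Initialize the sentiment analysis pipeline once at startup so every request reuses it
sentiment_analysis = pipeline('sentiment-analysis')

@app.route('/sentiment', methods=['POST'])
def get_sentiment():
    data = request.get_json()
    text = data.get('text')
    # Run the model on the text; the result is a list like [{'label': ..., 'score': ...}]
    result = sentiment_analysis(text, truncation=True)
    return jsonify(result)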

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Overwriting sentiment_analysis.py
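
The endpoint reads `text` straight from the request body, so a missing or non-string `text` field would reach the model unchecked. One way to guard against that is a small validation helper, sketched below. The name `extract_text` and the error messages are hypothetical and not part of the original app.

```python
def extract_text(payload):
    """Return (text, error); exactly one of the two is None."""
    # The parsed JSON body must be an object such as {'text': '...'}
    if not isinstance(payload, dict):
        return None, 'request body must be a JSON object'
    text = payload.get('text')
    # Reject missing, non-string, or whitespace-only values
    if not isinstance(text, str) or not text.strip():
        return None, "'text' must be a non-empty string"
    return text, None


print(extract_text({'text': 'This is the best day of my life'}))
print(extract_text({'review': 'wrong key'}))
```

Inside `get_sentiment`, this could be called on `request.get_json(silent=True)`, returning `jsonify({'error': error}), 400` whenever `error` is set.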


In [None]:
!nohup python sentiment_analysis.py &

nohup: appending output to 'nohup.out'


In [None]:
!sudo lsof -i -P -n | grep LISTEN

node        7 root   21u  IPv6  19583      0t0  TCP *:8080 (LISTEN)
kernel_ma  21 root    3u  IPv4  17245      0t0  TCP 172.28.0.12:6000 (LISTEN)
colab-fil  60 root    3u  IPv4  18783      0t0  TCP 127.0.0.1:3453 (LISTEN)
jupyter-n  79 root    7u  IPv4  19043      0t0  TCP 172.28.0.12:9000 (LISTEN)
pt_main_t 136 root   21u  IPv4  20736      0t0  TCP 127.0.0.1:46225 (LISTEN)
python3   170 root    3u  IPv4  20942      0t0  TCP 127.0.0.1:36761 (LISTEN)
python3   170 root    4u  IPv4  20943      0t0  TCP 127.0.0.1:34715 (LISTEN)
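
The listing above shows no process on port 5000, which may simply mean the app was still loading the model when the command ran. As an alternative to re-running `lsof`, the sketch below polls a port from Python until something accepts connections. The function name `wait_for_port` and its default timings are assumptions chosen for illustration.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a server is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry
            time.sleep(interval)
    return False
```

Calling `wait_for_port('127.0.0.1', 5000)` before sending the first request would block until the Flask app is ready, or return `False` after a minute.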


In [None]:
from requests import post
from socket import gethostname, gethostbyname
ip = gethostbyname(gethostname()) # 172.28.0.12
response = post(f"http://{ip}:5000/sentiment", json={'text':'This is the best day of my life'}).json()

In [None]:
response

[{'label': 'POSITIVE', 'score': 0.9998548030853271}]

### Exercise: Flask App with Caching for Sentiment Analysis

In this exercise, we create a Flask application that performs sentiment analysis on text data. The app also includes a caching layer that stores results for repeated inputs, which improves efficiency by avoiding re-analysis of the same text within a specified timeout period.
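
Note that `functools.lru_cache` on its own never expires entries, so a timeout needs extra bookkeeping. Below is a minimal sketch of one possible caching layer, assuming an in-memory dict keyed by the input text. The class name `TTLCache`, the 300-second default, and the `fake_analyze` stand-in are all hypothetical; in the app, `compute` would be the sentiment pipeline call.

```python
import time


class TTLCache:
    """Cache results per input text and recompute them after `timeout` seconds."""

    def __init__(self, timeout=300.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # text -> (expiry_time, result)

    def get_or_compute(self, text, compute):
        now = self.clock()
        entry = self._store.get(text)
        # Reuse the stored result only while it has not expired
        if entry is not None and entry[0] > now:
            return entry[1]
        result = compute(text)
        self._store[text] = (now + self.timeout, result)
        return result


calls = []

def fake_analyze(text):
    """Stand-in for the model call; records each invocation."""
    calls.append(text)
    return [{'label': 'POSITIVE', 'score': 1.0}]  # made-up result

cache = TTLCache(timeout=60.0)
cache.get_or_compute('great food', fake_analyze)
cache.get_or_compute('great food', fake_analyze)  # second call is served from the cache
print(len(calls))
```

This sketch never evicts expired entries for texts that are not requested again, so a long-running app would also want a size limit or periodic cleanup.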