# Sentiment Analysis of Amazon Customer Reviews

The aim of this project is to develop a machine learning pipeline for sentiment analysis on Amazon customer reviews using natural language processing techniques. By analyzing review texts, the project seeks to automatically classify the sentiment expressed in each review as positive or negative, providing valuable insights into overall customer satisfaction and opinions.

## Step 1: Data Loading and Initial Exploration

Load the raw Amazon customer reviews dataset and perform a basic inspection to understand its structure and contents. This step establishes a foundation for further data cleaning and analysis.


In [None]:
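import csv

import pandas as pd

# Load the dataset: one fastText-formatted review per line.
# QUOTE_NONE makes pandas treat double quotes inside reviews as literal text,
# so an unbalanced quote cannot merge several lines into one row.
df = pd.read_csv('/content/test.ft.txt', sep='\t', header=None, names=['review'], quoting=csv.QUOTE_NONE)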

# Display first five rows and dataset shape
print(df.head())
print(df.shape)

## Step 2: Label Extraction and Data Cleaning

Extract sentiment labels from the review text and clean reviews by removing label tags and extra spaces to prepare for analysis.
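As an illustration of the label extraction described above, the following minimal sketch parses a single fastText-style line with a regular expression anchored at the start of the line. The helper name `parse_fasttext_line` and the sample lines are hypothetical, not part of the original notebook.

```python
import re

# The label tag is expected only at the very start of the line
LABEL_RE = re.compile(r"^__label__(\d+)\s+(.*)$")

def parse_fasttext_line(line):
    """Return (label, text) for a fastText-style line, or None if it is malformed."""
    match = LABEL_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

# Hypothetical sample lines
print(parse_fasttext_line("__label__2 Great CD: loved every track"))
print(parse_fasttext_line("a line without any label"))
```

Anchoring the pattern at the start of the line means a review that happens to mention a tag such as `__label__1` in its body is still assigned the label from its leading tag.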

In [None]:
# Extract sentiment labels from the leading fastText tag
df['label'] = df['review'].apply(lambda x: 1 if x.startswith('__label__1') else 2)

# Remove only the leading label tag, leaving the review body untouched
df['clean_review'] = df['review'].str.replace(r'^__label__\d+\s*', '', regex=True).str.strip()

# Display cleaned data sample
print(df[['label', 'clean_review']].head())

## Step 3: Text Preprocessing

Apply text preprocessing techniques including lowercasing, removal of punctuation, and tokenization to prepare the review text for feature extraction and modeling.
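To make the effect of these steps concrete, here is a simplified sketch that uses only the standard library. It swaps NLTK's `word_tokenize` for plain whitespace splitting, which is an assumption made for illustration; the function name `simple_preprocess` is hypothetical.

```python
import re

def simple_preprocess(text):
    """Lowercase, drop punctuation, and collapse whitespace (whitespace tokenization)."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # remove anything that is not a word character or whitespace
    return " ".join(text.split())        # split on whitespace and rejoin with single spaces

print(simple_preprocess("This product EXCEEDED expectations!!  Don't miss it."))
```

Note that removing punctuation turns contractions such as "Don't" into "dont", so the same function has to be applied at inference time for the features to match.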


In [None]:
!pip install nltk

In [None]:
import nltk
nltk.download('punkt')

In [None]:
import re

In [None]:
import nltk
nltk.download('punkt_tab')  # Download the new resource instead of 'punkt'

from nltk.tokenize import word_tokenize

def preprocess_text(text):
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    tokens = word_tokenize(text)  # This now uses punkt_tab internally
    return ' '.join(tokens)

df['processed_review'] = df['clean_review'].apply(preprocess_text)

print(df[['processed_review']].head())

## Step 4: Feature Extraction using TF-IDF Vectorization

Convert the preprocessed text reviews into numerical features using TF-IDF vectorization to represent the importance of words in each review for model training.
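The weighting itself can be sketched in plain Python. The version below assumes the smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization, which is the default behaviour documented for scikit-learn's `TfidfVectorizer`. It leaves out stop-word removal and the `max_features` cap, and it tokenizes on whitespace; the function name `tfidf` and the two sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {term: tf * idf[term] for term, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()} if norm else {})
    return vectors

vectors = tfidf(["great product great price", "poor product"])
for vec in vectors:
    print({term: round(weight, 3) for term, weight in vec.items()})
```

Terms that are frequent in one review but rare across the corpus ("great", "poor") receive a higher weight than terms that appear everywhere ("product").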


In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize TF-IDF Vectorizer with common parameters
tfidf_vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')

# Fit and transform the processed reviews to create feature vectors
X = tfidf_vectorizer.fit_transform(df['processed_review'])

# Display the shape of the feature matrix
print(f"Feature matrix shape: {X.shape}")

## Step 5: Model Training and Evaluation

Split the feature matrix and labels into training and testing datasets. Train a machine learning classifier on the training set and evaluate its performance on the test set.
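For reference, the headline numbers in a classification report can be reproduced by hand. The sketch below computes accuracy, precision, recall, and F1 for one class; the function name `binary_metrics` and the toy label lists are hypothetical, and label 2 is assumed to be the positive class.

```python
def binary_metrics(y_true, y_pred, positive=2):
    """Compute accuracy plus precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

metrics = binary_metrics([1, 1, 2, 2, 2, 1], [1, 2, 2, 2, 1, 1])
print(metrics)
```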


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, df['label'], test_size=0.2, random_state=42
)

# Initialize and train Logistic Regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"Test Accuracy: {accuracy:.4f}")
print("Classification Report:")
print(report)

## Step 6: Model Saving and Loading

Save the trained sentiment analysis model and TF-IDF vectorizer to disk for future use, and demonstrate how to load them back for inference.


In [None]:
import joblib

# Save the trained model and vectorizer
joblib.dump(model, 'sentiment_model.pkl')
joblib.dump(tfidf_vectorizer, 'tfidf_vectorizer.pkl')

# Load the model and vectorizer (example)
loaded_model = joblib.load('sentiment_model.pkl')
loaded_vectorizer = joblib.load('tfidf_vectorizer.pkl')

# Example inference on new text
sample_text = "This product exceeded expectations!"

# Preprocess sample text (use the same preprocessing function defined earlier)
processed_text = preprocess_text(sample_text)

# Vectorize the processed sample text
sample_features = loaded_vectorizer.transform([processed_text])

# Predict sentiment label
prediction = loaded_model.predict(sample_features)

print(f"Predicted Sentiment Label: {prediction[0]}")

The model predicted sentiment label **1** for the sample review "This product exceeded expectations!".

Typically, in this sentiment analysis setup, the labels follow the fastText tags extracted in Step 2:
- **Label 1** (`__label__1`) indicates **negative** sentiment.
- **Label 2** (`__label__2`) indicates **positive** sentiment.

Under this mapping, the model classified a clearly positive sample as negative, which would be a misclassification. Verify the label-to-sentiment mapping against a few raw reviews before relying on it.
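One way to keep the mapping consistent across cells is to define it once and reuse it. This is a minimal sketch that assumes the mapping stated above (1 is negative, 2 is positive); the names `LABEL_TO_SENTIMENT` and `describe_label` are hypothetical.

```python
# Assumed mapping, as stated in the text above
LABEL_TO_SENTIMENT = {1: "negative", 2: "positive"}

def describe_label(label):
    """Translate a numeric label (int, numeric string, or NumPy integer) into a sentiment name."""
    return LABEL_TO_SENTIMENT.get(int(label), "unknown")

print(describe_label(1))
print(describe_label("2"))
```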

## Step 7: Model Deployment and Inference Function Creation

Create a reusable inference function that loads the saved model and vectorizer, preprocesses new input text, and returns the predicted sentiment label for easy deployment integration.


In [None]:
import joblib

# Load saved model and vectorizer
loaded_model = joblib.load('sentiment_model.pkl')
loaded_vectorizer = joblib.load('tfidf_vectorizer.pkl')

def predict_sentiment(text):
    # Preprocess the input text (use the same preprocessing function defined earlier)
    processed_text = preprocess_text(text)
    # Vectorize the processed text
    features = loaded_vectorizer.transform([processed_text])
    # Predict and return the sentiment label
    prediction = loaded_model.predict(features)
    return prediction[0]

# Example usage
sample_input = "The product quality was outstanding and delivery was quick."
predicted_label = predict_sentiment(sample_input)
print(f"Predicted Sentiment Label: {predicted_label}")

The `predict_sentiment` function created in Step 7 predicts sentiment labels for new texts. Here, the output "Predicted Sentiment Label: 2" indicates that the model classified the input as positive.

In [None]:
def load_labeled_texts(filepath):
    texts = []
    labels = []
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            # Example line: "__label__2 Some positive review text here."
            line = line.strip()
            if not line:
                continue

            # Split label and text
            # Assuming FastText style: label starts with __label__ followed by space-separated text
            parts = line.split(' ', 1)
            if len(parts) != 2:
                continue  # skip malformed lines

            label_part, text_part = parts
            label = label_part.replace('__label__', '')
            texts.append(text_part)
            labels.append(label)
    return texts, labels

In [None]:
# Load dataset from the original file
raw_texts, labels = load_labeled_texts('/content/test.ft.txt')

# Now you can proceed with raw_texts and labels as needed
print(f"Loaded {len(raw_texts)} samples.")

In [None]:
def clean_original_dataset(input_path='/content/test.ft.txt', output_path='/content/new_data.txt'):
    import re

    label_pattern = re.compile(r'^__label__(\d+)\s+(.*)')  # extract label and text
    cleaned_lines = []

    with open(input_path, 'r', encoding='utf-8') as infile:
        for line_no, line in enumerate(infile, start=1):
            line = line.strip()
            if not line:
                continue
            match = label_pattern.match(line)
            if not match:
                print(f"Skipping line {line_no} with unexpected format.")
                continue

            label, text = match.groups()
            # Basic cleaning: lowercase and strip
            cleaned_text = text.lower().strip()

            cleaned_lines.append(f"{label}\t{cleaned_text}")

    with open(output_path, 'w', encoding='utf-8') as outfile:
        for cline in cleaned_lines:
            outfile.write(cline + '\n')

    print(f"Original data cleaned and saved to: {output_path}")
    print(f"Total cleaned lines: {len(cleaned_lines)}")

# Run the cleaning
clean_original_dataset()

In [None]:
def print_head(file_path, n=5):
    with open(file_path, 'r', encoding='utf-8') as f:
        for i, line in enumerate(f):
            if i >= n:
                break
            print(line.strip())

# Print the first 5 lines of the cleaned data
print_head('/content/new_data.txt', n=5)

In [None]:
import os

file_path = '/content/new_data.txt'

if os.path.exists(file_path):
    size = os.path.getsize(file_path)
    print(f"File '{file_path}' exists and is {size} bytes")
    if size > 0:
        # Print first few lines
        with open(file_path, 'r', encoding='utf-8') as f:
            for _ in range(5):
                print(f.readline().strip())
    else:
        print(f"File '{file_path}' is empty.")
else:
    print(f"File '{file_path}' does not exist.")


In [None]:
with open('/content/new_data.txt', 'r', encoding='utf-8') as f:
    for i, line in enumerate(f):
        print(line.strip())
        if i >= 9:  # print first 10 lines
            break

In [None]:
!ls -l /content/new_data.txt

## Step 8: Model Evaluation on New Data and Reporting

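In this step, you evaluate the saved model on `/content/new_data.txt` and generate a detailed evaluation report. Note that this file is a cleaned copy of `/content/test.ft.txt`, the same file the model was trained on in Step 5, so it is not truly unseen data; use a separate hold-out file to measure generalization.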
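Whether an evaluation set is actually unseen can be checked by measuring how many of its texts also occur in the training data. The following is a minimal sketch; the function name `overlap_fraction` and the sample texts are hypothetical, and texts are compared after lowercasing and stripping whitespace.

```python
def overlap_fraction(train_texts, eval_texts):
    """Return the share of evaluation texts that also occur in the training texts."""
    if not eval_texts:
        return 0.0
    train_set = {text.strip().lower() for text in train_texts}
    hits = sum(1 for text in eval_texts if text.strip().lower() in train_set)
    return hits / len(eval_texts)

fraction = overlap_fraction(["Good item", "bad item"], ["good item ", "a brand new review"])
print(f"{fraction:.0%} of the evaluation texts were seen during training")
```

A value close to zero suggests the evaluation set is independent of the training data; a high value means the reported metrics partly measure memorization.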


In [None]:
def load_labeled_texts(filepath):
    texts = []
    labels = []
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue

            # Each line is "<label>\t<text>": the first character is the label (1 or 2); strip() drops the tab
            label_char = line[0]
            text = line[1:].strip()

            if label_char not in ('1', '2') or not text:
                continue  # skip malformed lines

            labels.append(label_char)
            texts.append(text)
    if not texts:
        raise ValueError(f"No valid labeled data found in '{filepath}'")
    return texts, labels

In [None]:
raw_texts, labels = load_labeled_texts('/content/new_data.txt')
print(f"Loaded {len(raw_texts)} samples.")
print(f"Example label/text: {labels[0]} / {raw_texts[0][:100]} ...")

In [None]:
import joblib
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Paths to your files
dataset_path = '/content/new_data.txt'
model_path = 'sentiment_model.pkl'       # adjust if your model filename differs
vectorizer_path = 'tfidf_vectorizer.pkl' # adjust if your vectorizer filename differs

# Correct loader for your data format (label is first char, then text)
def load_labeled_texts(file_path):
    texts = []
    labels = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue

            label_char = line[0]
            text = line[1:].strip()

            if label_char not in ('1', '2'):
                print(f"Warning: Invalid label '{label_char}' at line {line_no}. Skipping this line.")
                continue

            if not text:
                print(f"Warning: Empty text at line {line_no}. Skipping this line.")
                continue

            labels.append(int(label_char))
            texts.append(text)

    if not texts:
        raise ValueError(f"No valid labeled data found in '{file_path}'.")
    return texts, labels

# Load your data
new_raw_texts, new_labels = load_labeled_texts(dataset_path)
print(f"Loaded {len(new_raw_texts)} samples.")
print(f"Example label/text: {new_labels[0]} / {new_raw_texts[0][:100]} ...")

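# Define the preprocessing function; it must match the one used for training in Step 3,
# otherwise the vectorizer receives differently formatted text at inference time
import re
from nltk.tokenize import word_tokenize

def preprocess_text(text):
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    return ' '.join(word_tokenize(text))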

# Preprocess all texts
new_X_processed = [preprocess_text(text) for text in new_raw_texts]

# Load saved vectorizer and model
loaded_vectorizer = joblib.load(vectorizer_path)
loaded_model = joblib.load(model_path)

# Transform texts into features using vectorizer
new_features = loaded_vectorizer.transform(new_X_processed)

# Predict labels on new data
new_y_pred = loaded_model.predict(new_features)

# Print classification report comparing true vs predicted labels
print("Classification Report on New Data:")
print(classification_report(new_labels, new_y_pred))

# Plot confusion matrix for deeper insight
cm = confusion_matrix(new_labels, new_y_pred)
plt.figure(figsize=(6,4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Negative (1)', 'Positive (2)'],
            yticklabels=['Negative (1)', 'Positive (2)'])
plt.xlabel('Predicted Label')
plt.ylabel('Actual Label')
plt.title('Confusion Matrix on New Data')
plt.show()

The classification report shows 100% precision, recall, and F1-score on the new data samples.  
All 3 test instances (1 negative, 2 positive) were correctly classified, but three samples are far too few to draw conclusions about the model's accuracy.

## Step 9: Analyze Errors & Save Model Pipeline

- Review misclassified samples to understand model errors.  
- Save the combined vectorizer and model as a pipeline for easy reuse.
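Combining the vectorizer and the classifier into a single object could look like the sketch below, which uses scikit-learn's `Pipeline` and `joblib`. The toy reviews, the file name `sentiment_pipeline.pkl`, and the temporary directory are assumptions made for illustration; in the notebook, the pipeline would be fitted on the preprocessed training reviews instead.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny illustrative corpus (hypothetical reviews, not taken from the dataset)
toy_texts = [
    "great product works well",
    "terrible waste of money",
    "love it excellent quality",
    "awful broke after one day",
]
toy_labels = [2, 1, 2, 1]

# The pipeline vectorizes raw text and classifies it in one call
sentiment_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
sentiment_pipeline.fit(toy_texts, toy_labels)

# Save and reload the whole pipeline as a single file
pipeline_path = os.path.join(tempfile.mkdtemp(), "sentiment_pipeline.pkl")
joblib.dump(sentiment_pipeline, pipeline_path)
restored_pipeline = joblib.load(pipeline_path)

print(restored_pipeline.predict(["excellent quality, works well"]))
```

With a single saved pipeline, inference code no longer has to load the vectorizer and the model separately or keep the two files in sync.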


In [None]:
!ls -l /content/test.ft.txt /content/new_data.txt

In [None]:
def load_dataset(filepath):
    texts = []
    labels = []

    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) < 2:
                continue
            label_str, text = parts[0], parts[1]

            # If label starts with '__label__', trim and convert to int
            if label_str.startswith('__label__'):
                numeric_label_str = label_str.replace('__label__', '')
                try:
                    label = int(numeric_label_str)
                except ValueError:
                    label = label_str  # fallback to original if conversion fails
            else:
                # fallback if not prefixed label
                try:
                    label = int(label_str)
                except ValueError:
                    label = label_str

            labels.append(label)
            texts.append(text)

    return texts, labels

In [None]:
X_test, y_test = load_dataset('/content/test.ft.txt')
X_new, y_new = load_dataset('/content/new_data.txt')

print(f"Sample test label: {y_test[0]}")  # Should print 1 or 2, not '__label__1'
print(f"Sample test text: {X_test[0]}")

In [None]:
!pip install scikit-learn  # the Pipeline class ships with scikit-learn (sklearn.pipeline)

In [None]:
pip install joblib

In [None]:
import joblib

# Load the trained classifier (sentiment_model.pkl holds only the LogisticRegression model, not a full pipeline)
pipeline = joblib.load('/content/sentiment_model.pkl')

# Load the TF-IDF vectorizer separately; it is required to turn text into features
tfidf_vectorizer = joblib.load('/content/tfidf_vectorizer.pkl')

In [None]:
# List files in /content to check presence
!ls -l /content/

# Then load models
import joblib
pipeline = joblib.load('/content/sentiment_model.pkl')
tfidf_vectorizer = joblib.load('/content/tfidf_vectorizer.pkl')

In [None]:
def load_dataset(filepath):
    texts = []
    labels = []
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) < 2:
                continue
            label_str, text = parts[0], parts[1]

            # Extract numeric label from strings like '__label__2'
            if label_str.startswith('__label__'):
                label_num_str = label_str.replace('__label__', '')
                try:
                    label = int(label_num_str)
                except ValueError:
                    label = label_str  # fallback to string if conversion fails
            else:
                # fallback to handle other unexpected label formats
                try:
                    label = int(label_str)
                except ValueError:
                    label = label_str

            labels.append(label)
            texts.append(text)
    return texts, labels

In [None]:
X_test, y_test = load_dataset('/content/test.ft.txt')

print(f"Sample label: {y_test[0]}")  # Should now print 1 or 2 (an integer)
print(f"Sample text: {X_test[0]}")

In [None]:
import joblib

# Load vectorizer and model separately
tfidf_vectorizer = joblib.load('/content/tfidf_vectorizer.pkl')
sentiment_model = joblib.load('/content/sentiment_model.pkl')

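# Preprocess with preprocess_text (it should match the training-time function from Step 3),
# then transform the texts into TF-IDF features (a sparse matrix)
X_test_tfidf = tfidf_vectorizer.transform([preprocess_text(t) for t in X_test])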

# Predict with the model using transformed features
y_pred = sentiment_model.predict(X_test_tfidf)

In [None]:
import joblib

# Corrected load_dataset function to parse labels
def load_dataset(filepath):
    texts = []
    labels = []
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) < 2:
                continue
            label_str, text = parts[0], parts[1]

            if label_str.startswith('__label__'):
                label_num_str = label_str.replace('__label__', '')
                try:
                    label = int(label_num_str)
                except ValueError:
                    label = label_str
            else:
                try:
                    label = int(label_str)
                except ValueError:
                    label = label_str

            labels.append(label)
            texts.append(text)
    return texts, labels

# Load test dataset
X_test, y_test = load_dataset('/content/test.ft.txt')

# Load models
tfidf_vectorizer = joblib.load('/content/tfidf_vectorizer.pkl')
sentiment_model = joblib.load('/content/sentiment_model.pkl')

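# Preprocess with preprocess_text (it should match the training-time function from Step 3),
# then transform the test texts into TF-IDF feature vectors
X_test_tfidf = tfidf_vectorizer.transform([preprocess_text(t) for t in X_test])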

# Predict labels
y_pred = sentiment_model.predict(X_test_tfidf)

# Error analysis
misclassified_indices = [i for i, (true, pred) in enumerate(zip(y_test, y_pred)) if true != pred]

print(f"Total samples: {len(X_test)}")
print(f"Number of misclassified samples: {len(misclassified_indices)}")

label_map = {1: "Negative", 2: "Positive"}

print("\nSome misclassified samples:")
for i in misclassified_indices[:5]:
    print(f"Sample index: {i}")
    print(f"Text: {X_test[i]}")
    print(f"True label: {label_map.get(y_test[i], y_test[i])}")
    print(f"Predicted label: {label_map.get(y_pred[i], y_pred[i])}")
    print("---")

In [None]:
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

The model achieved approximately 86% accuracy on the 400,000 samples in `test.ft.txt`. Because 80% of these samples were used to train the model in Step 5, this figure mostly reflects performance on training data and may overstate accuracy on unseen reviews. A separate hold-out file is needed to confirm that the model reliably distinguishes positive from negative sentiment in real-world reviews.
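To see which kind of error dominates, the misclassified samples can be grouped by their (true label, predicted label) pair. This is a minimal sketch with hypothetical toy labels; the function name `error_breakdown` is not part of the original notebook.

```python
from collections import Counter

def error_breakdown(y_true, y_pred):
    """Count misclassifications by (true label, predicted label)."""
    return Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)

breakdown = error_breakdown([1, 1, 2, 2, 2], [1, 2, 2, 1, 1])
for (true_label, predicted_label), count in breakdown.most_common():
    print(f"true={true_label} predicted={predicted_label}: {count}")
```

In this toy example, positive reviews predicted as negative are the more frequent error, which would suggest inspecting those samples first.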

## Step 10: Model Deployment

In this step, you deploy your saved sentiment analysis pipeline as a simple web API to serve predictions. Deployment makes your model accessible for real-time inference by other applications or users. This example uses Flask, a lightweight Python web framework, to create an endpoint that accepts text input and returns predicted sentiment.

In [None]:
from flask import Flask

app = Flask(__name__)
# any other setup, such as route definitions, BEFORE calling app.run

In [None]:
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Add more setup: loading vectorizer, model, route definitions...
# Example:
tfidf_vectorizer = joblib.load('tfidf_vectorizer.pkl')
sentiment_model = joblib.load('sentiment_model.pkl')
label_map = {1: "Negative", 2: "Positive"}

@app.route('/predict', methods=['POST'])
def predict():
    # prediction code...
    pass

# Now *after* all that:
app.run(host='0.0.0.0', port=5001, use_reloader=False)

In [None]:
pip install pyngrok

In [None]:
from pyngrok import ngrok
ngrok.set_auth_token("YOUR_NGROK_AUTHTOKEN")  # replace with your own token; never publish a real one

In [None]:
from pyngrok import ngrok
ngrok.kill()  # Cleans up any old tunnels
public_url = ngrok.connect(5001)
print("Ngrok URL:", public_url)

In [None]:
%%writefile app.py
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)
tfidf_vectorizer = joblib.load('/content/tfidf_vectorizer.pkl')
sentiment_model = joblib.load('/content/sentiment_model.pkl')
label_map = {1: "Negative", 2: "Positive"}

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)
    text = data.get('text', '')
    if not text:
        return jsonify({'error': 'No text provided'}), 400
    features = tfidf_vectorizer.transform([text])
    pred_label = sentiment_model.predict(features)[0]
    pred_sentiment = label_map.get(pred_label, str(pred_label))
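    return jsonify({
        'text': text,
        'predicted_label': int(pred_label),
        'predicted_sentiment': pred_sentiment
    })

if __name__ == '__main__':
    # Start the server when the file is run with `python app.py`
    app.run(host='0.0.0.0', port=5001)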

In [None]:
from pyngrok import ngrok
ngrok.set_auth_token("YOUR_NGROK_AUTHTOKEN")  # if not set already; use your own token
public_url = ngrok.connect(5001)  # must match the port the Flask app listens on
print(public_url)

In [None]:
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Load your pre-trained TF-IDF vectorizer and sentiment model
# Replace these paths with the actual locations of your files
tfidf_vectorizer = joblib.load('/content/tfidf_vectorizer.pkl')
sentiment_model = joblib.load('/content/sentiment_model.pkl')

# Mapping from numeric label to sentiment text
label_map = {1: "Negative", 2: "Positive"}

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)

    text = data.get('text')
    if not text:
        return jsonify({'error': 'No text provided'}), 400

    # Vectorize input text
    features = tfidf_vectorizer.transform([text])

    # Predict sentiment
    pred_label = sentiment_model.predict(features)[0]
    pred_sentiment = label_map.get(pred_label, "Unknown")

    # Optionally get confidence if your model supports predict_proba
    confidence = None
    if hasattr(sentiment_model, 'predict_proba'):
        proba = sentiment_model.predict_proba(features)[0]
        confidence = max(proba)

    response = {
        'text': text,
        'predicted_label': int(pred_label),
        'predicted_sentiment': pred_sentiment,
    }
    if confidence is not None:
        response['confidence'] = round(confidence, 3)  # rounded to 3 decimals

    return jsonify(response)

if __name__ == '__main__':
    # Run the Flask app on port 5001 (this call blocks the cell until interrupted)
    app.run(host='0.0.0.0', port=5001, use_reloader=False)


In [None]:
!pip install transformers -q

from transformers import pipeline

# Load the sentiment analysis pipeline
sentiment_pipeline = pipeline("sentiment-analysis")

# Example texts to predict
texts = [
    "I love this product! It works great.",
    "This is the worst experience I have had.",
    "It's okay, not great."
]

# Get predictions
predictions = sentiment_pipeline(texts)

for text, pred in zip(texts, predictions):
    print(f"Text: {text}")
    print(f"Sentiment: {pred['label']}, Confidence: {pred['score']:.3f}")
    print("-" * 40)

In [None]:
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

class TextRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(request: TextRequest):
    result = classifier(request.text)
    return {"label": result[0]['label'], "score": result[0]['score']}

## Step 11: Exposing Your FastAPI App to the Web with ngrok

This step sets up ngrok in your Google Colab environment to create a public URL for your FastAPI server.  
It allows anyone to access your sentiment analysis API from anywhere, making it easy to share your demo for testing or showcasing.


In [None]:
!pip install fastapi uvicorn pyngrok transformers

In [None]:
import threading

import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from pyngrok import ngrok
from transformers import pipeline

app = FastAPI()
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

class TextRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(request: TextRequest):
    result = classifier(request.text)
    return {"label": result[0]['label'], "score": result[0]['score']}

@app.get("/")
def root():
    return {"message": "Welcome to the sentiment analysis API. Use /predict with POST JSON."}

# Expose your local server to the web
public_url = ngrok.connect(8000)
print('Public URL:', public_url)

# uvicorn.run blocks, so run the app in a background thread to keep the Colab cell free
server_thread = threading.Thread(
    target=lambda: uvicorn.run(app, host='0.0.0.0', port=8000), daemon=True
)
server_thread.start()

### FastAPI App Running with ngrok Public URL

The FastAPI server is successfully running and exposed to the internet using ngrok.  
Visiting the base URL returns a friendly JSON message confirming the API is live:  
`{"message":"Welcome to the sentiment analysis API. Use /predict with POST JSON."}`

This indicates your backend is reachable, and you can now send POST requests to the `/predict` endpoint to get sentiment analysis results.  
The ngrok tunnel provides a temporary public URL to share and demo your API easily.
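A client can call the `/predict` endpoint with the standard library alone. The sketch below only builds the POST request so that it can be inspected without a running server. The helper name `build_predict_request` is hypothetical, and the base URL `http://localhost:8000` is an assumption to be replaced with the ngrok URL printed by the notebook.

```python
import json
from urllib import request

def build_predict_request(base_url, text):
    """Build a JSON POST request for the /predict endpoint."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_predict_request("http://localhost:8000", "Great product!")
print(req.get_method(), req.full_url)

# To send the request once the server is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```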

## Step 12: Simple Front-End Webpage for Sentiment Analysis API

This step creates a basic HTML and JavaScript webpage that lets users input text and sends it to your FastAPI `/predict` endpoint. It displays the sentiment label and confidence score directly on the page for easy interaction.

In [None]:
!pip install fastapi uvicorn nest-asyncio pyngrok

In [None]:
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from pydantic import BaseModel
from transformers import pipeline
import nest_asyncio
from pyngrok import ngrok
import uvicorn
import threading

# Enable nested event loops for Colab async compatibility
nest_asyncio.apply()

app = FastAPI()

# Load your sentiment analysis pipeline
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

class TextRequest(BaseModel):
    text: str

# API endpoint for sentiment prediction
@app.post("/predict")
def predict(request: TextRequest):
    result = classifier(request.text)
    return {"label": result[0]['label'], "score": result[0]['score']}

# Serve your frontend HTML page
html_content = """
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>Sentiment Analysis Demo</title>
    <style>
        body { font-family: Arial, sans-serif; margin: 40px; max-width: 600px; }
        textarea { width: 100%; height: 100px; font-size: 1rem; }
        button { margin-top: 10px; padding: 8px 16px; font-size: 1rem; }
        #result { margin-top: 20px; font-weight: bold; }
    </style>
</head>
<body>
    <h2>Sentiment Analysis</h2>
    <textarea id="inputText" placeholder="Type your text here..."></textarea><br />
    <button onclick="getSentiment()">Analyze Sentiment</button>
    <div id="result"></div>
<script>
async function getSentiment() {
    const text = document.getElementById('inputText').value.trim();
    if (!text) {
        alert('Please enter some text!');
        return;
    }
    const resultDiv = document.getElementById('result');
    resultDiv.textContent = "Analyzing...";
    try {
        const response = await fetch('/predict', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ text })
        });
        if (!response.ok) {
            throw new Error('Error from API: ' + response.status);
        }
        const data = await response.json();
        resultDiv.textContent = `Sentiment: ${data.label} (Confidence: ${(data.score * 100).toFixed(2)}%)`;
    } catch (error) {
        resultDiv.textContent = 'Error: ' + error.message;
    }
}
</script>
</body>
</html>
"""

@app.get("/", response_class=HTMLResponse)
def root():
    return html_content

# Start ngrok tunnel
public_url = ngrok.connect(8000)
print(f"Public URL: {public_url}")

# Run Uvicorn in a separate thread so Colab does not block
def run():
    uvicorn.run(app, host="0.0.0.0", port=8000)

thread = threading.Thread(target=run)
thread.start()

In [None]:
code = """
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from pydantic import BaseModel
from transformers import pipeline
import nest_asyncio
from pyngrok import ngrok
import uvicorn
import threading

# Enable nested event loops for Colab async compatibility
nest_asyncio.apply()

app = FastAPI()

# Load your sentiment analysis pipeline
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

class TextRequest(BaseModel):
    text: str

# API endpoint for sentiment prediction
@app.post("/predict")
def predict(request: TextRequest):
    result = classifier(request.text)
    return {"label": result[0]['label'], "score": result[0]['score']}

# Serve your frontend HTML page
html_content = \"\"\"
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>Sentiment Analysis Demo</title>
    <style>
        body { font-family: Arial, sans-serif; margin: 40px; max-width: 600px; }
        textarea { width: 100%; height: 100px; font-size: 1rem; }
        button { margin-top: 10px; padding: 8px 16px; font-size: 1rem; }
        #result { margin-top: 20px; font-weight: bold; }
    </style>
</head>
<body>
    <h2>Sentiment Analysis</h2>
    <textarea id="inputText" placeholder="Type your text here..."></textarea><br />
    <button onclick="getSentiment()">Analyze Sentiment</button>
    <div id="result"></div>
<script>
async function getSentiment() {
    const text = document.getElementById('inputText').value.trim();
    if (!text) {
        alert('Please enter some text!');
        return;
    }
    const resultDiv = document.getElementById('result');
    resultDiv.textContent = "Analyzing...";
    try {
        const response = await fetch('/predict', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ text })
        });
        if (!response.ok) {
            throw new Error('Error from API: ' + response.status);
        }
        const data = await response.json();
        resultDiv.textContent = `Sentiment: ${data.label} (Confidence: ${(data.score * 100).toFixed(2)}%)`;
    } catch (error) {
        resultDiv.textContent = 'Error: ' + error.message;
    }
}
</script>
</body>
</html>
\"\"\"

@app.get("/", response_class=HTMLResponse)
def root():
    return html_content

# Start ngrok tunnel to expose port 8000
public_url = ngrok.connect(8000)
print(f"Public URL: {public_url}")

# Run Uvicorn server in a separate thread so it doesn't block Colab
def run():
    uvicorn.run(app, host="0.0.0.0", port=8000)

thread = threading.Thread(target=run)
thread.start()
"""

with open("main.py", "w") as f:
    f.write(code)

print("File 'main.py' has been saved!")

In [None]:
from pyngrok import conf

# Replace 'YOUR_NGROK_AUTHTOKEN' with your actual token from the ngrok dashboard
conf.get_default().auth_token = "YOUR_NGROK_AUTHTOKEN"

In [None]:
# Find the process ID using port 8000
!lsof -t -i:8000

In [None]:
# Kill whatever process is listening on port 8000 (does nothing if the port is free)
!lsof -t -i:8000 | xargs -r kill -9

In [None]:
!lsof -t -i:8000

In [None]:
!python main.py

## Conclusion

This project demonstrates the end-to-end development of a sentiment analysis web application: a TF-IDF and logistic regression classifier trained on Amazon reviews, a pretrained DistilBERT model served through FastAPI, and a custom front-end, all running in a Google Colab environment and exposed through ngrok. By combining machine learning, API development, cloud tunneling, and an interactive user interface, the project showcases practical technical skills relevant to data analytics and data-driven product delivery.

Beyond building a working demo, this effort highlights experience in problem-solving, integrating multiple technologies, and delivering a user-friendly application, a valuable addition to any data analytics portfolio.