<a href="https://colab.research.google.com/github/Ragav1821/machine-learning-project-1/blob/main/sendimental_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
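
The shell commands above expect a Kaggle API token file named `kaggle.json` in the working directory. The sketch below is one way to sanity-check that file before running the download. It assumes the usual token layout with `username` and `key` fields. The helper name `check_kaggle_credentials` is hypothetical and not part of the Kaggle CLI.

```python
import json
import os
import stat


def check_kaggle_credentials(path):
    """Return a list of problems found with a kaggle.json file (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist"]
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"{path} is not readable JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]

    problems = []
    # Assumed token layout: {"username": "...", "key": "..."}
    for field in ("username", "key"):
        if not data.get(field):
            problems.append(f"missing field: {field}")
    # chmod 600 means no permission bits are set for group or others.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"permissions too open: {oct(mode)}")
    return problems
```

Calling `check_kaggle_credentials(os.path.expanduser("~/.kaggle/kaggle.json"))` after the three commands should return an empty list if the token was copied and locked down as intended.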


# Step 1: Download Sentiment140 dataset

In [4]:

!kaggle datasets download -d kazanova/sentiment140


Dataset URL: https://www.kaggle.com/datasets/kazanova/sentiment140
License(s): other
Downloading sentiment140.zip to /content
  0% 0.00/80.9M [00:00<?, ?B/s]
100% 80.9M/80.9M [00:00<00:00, 1.41GB/s]


# Step 2: Unzip the dataset

In [5]:

!unzip sentiment140.zip


Archive:  sentiment140.zip
  inflating: training.1600000.processed.noemoticon.csv  


# Step 3: Import the necessary libraries

In [7]:
import pandas as pd
import numpy as np
import re
import string
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
import pickle
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

# Step 4: Load the dataset into df, clean the text, then train, evaluate, and save the model

In [9]:
from nltk.corpus import stopwords

df = pd.read_csv("training.1600000.processed.noemoticon.csv", encoding='latin-1', header=None)
df = df[[0, 5]]  # Only sentiment and tweet text
df.columns = ['label', 'text']
# Convert label: 0 = Negative, 4 = Positive → normalize to 0 & 1
df['label'] = df['label'].replace(4, 1)
# Clean text function
stop_words = set(stopwords.words('english'))
def clean_text(text):
    text = re.sub(r'http\S+', '', text)  # Remove URLs
    text = re.sub(r'@\w+', '', text)     # Remove mentions
    text = re.sub(r'#\w+', '', text)     # Remove hashtags
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    text = text.lower()                  # Lowercase
    text = ' '.join([word for word in text.split() if word not in stop_words])  # Remove stopwords
    return text
df['text'] = df['text'].apply(clean_text)
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(df['text'])
y = df['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
with open('sentiment_model.pkl', 'wb') as f:
    pickle.dump((vectorizer, model), f)
with open('sentiment_model.pkl', 'rb') as f:
    vectorizer, model = pickle.load(f)
def predict_sentiment(text):
    text = clean_text(text)
    vector = vectorizer.transform([text])
    prediction = model.predict(vector)[0]
    return "Positive 😊" if prediction == 1 else "Negative 😔"
# Example
print(predict_sentiment("i love about something"))

Accuracy: 0.77350625
              precision    recall  f1-score   support

           0       0.78      0.75      0.77    159494
           1       0.76      0.80      0.78    160506

    accuracy                           0.77    320000
   macro avg       0.77      0.77      0.77    320000
weighted avg       0.77      0.77      0.77    320000

Positive 😊
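
The precision, recall, and f1-score columns in the report above are derived from per-class counts of correct and incorrect predictions. The sketch below spells out those definitions in plain Python on a made-up toy label set, not the notebook's data. `per_class_metrics` is a hypothetical helper, not a scikit-learn function.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for illustration only (0 = negative, 1 = positive).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
for label in (0, 1):
    precision, recall, f1 = per_class_metrics(y_true, y_pred, label)
    print(label, round(precision, 2), round(recall, 2), round(f1, 2))
```

The report prints each value rounded to two decimals, so recomputing F1 from the rounded precision and recall shown there can differ from the printed f1-score in the last digit.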


In [10]:
!pip install gradio




# Step 5: Create the user interface (UI) using Gradio in Google Colab

In [11]:
import pickle
import gradio as gr

# Load the saved model and vectorizer
with open("sentiment_model.pkl", "rb") as f:
    vectorizer, model = pickle.load(f)

# Define prediction function
def predict_sentiment(text):
    text = clean_text(text)  # Same cleaning as in training (clean_text is defined in Step 4)
    vector = vectorizer.transform([text])
    pred = model.predict(vector)[0]
    return "😊 Positive" if pred == 1 else "😔 Negative"

# Gradio interface with custom background and number 18
interface = gr.Interface(
    fn=predict_sentiment,
    inputs=gr.Textbox(lines=4, placeholder="Type your sentence here..."),
    outputs="text",
    title="💬 Sentiment Analysis",
    description="Enter a sentence to predict if it is Positive or Negative.",
    theme="dark",  # Requests a dark theme; if Gradio does not recognize the name "dark", it may fall back to its default theme
    css="""
    .gradio-container {
        background-color: #000000 !important;
        color: white !important;
        position: relative;
    }
    .gradio-button {
        background-color: #333333 !important;
    }
    .gradio-input {
        background-color: #333333 !important;
        color: white !important;
    }
    .gradio-output {
        background-color: #333333 !important;
        color: white !important;
    }
    .gradio-title {
        color: white !important;
    }
    .gradio-description {
        color: white !important;
    }
    /* Styling for number 18 */
    .number-18 {
        position: absolute;
        top: 50%;
        left: 50%;
        transform: translate(-50%, -50%);
        font-size: 200px;
        font-weight: bold;
        color: lightblue;
        text-shadow: 3px 3px 5px #fff, 0 0 25px #00f, 0 0 5px #00f;
        z-index: -1;  /* Ensure it's behind the other elements */
    }
    """
)

# Launch the interface
interface.launch(share=True)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.

Sorry, we can't find the page you are looking for.


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://66caccffcd04cf5ab8.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
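
The vectorizer was fitted on text that had already passed through `clean_text`, so text sent to the model at prediction time matches the learned vocabulary best when it gets the same cleaning first. The sketch below illustrates the difference with the standard library only. The tiny `STOP_WORDS` and `VOCAB` sets are made up (the notebook uses NLTK's English stopwords and a 5,000-term vocabulary). `tokenize` only approximates the default `TfidfVectorizer` behavior (lowercasing plus its default token pattern), and `known_tokens` is a hypothetical helper.

```python
import re

STOP_WORDS = {"i"}                    # stand-in for the NLTK stopword list
VOCAB = {"dont", "love", "mondays"}   # stand-in for the fitted vocabulary


def clean_text(text):
    """Same cleaning steps as the notebook's clean_text, with the toy stopword set."""
    text = re.sub(r'http\S+', '', text)  # Remove URLs
    text = re.sub(r'@\w+', '', text)     # Remove mentions
    text = re.sub(r'#\w+', '', text)     # Remove hashtags
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    text = text.lower()
    return ' '.join(w for w in text.split() if w not in STOP_WORDS)


def tokenize(text):
    """Approximate the vectorizer's defaults: lowercase, tokens of 2+ word characters."""
    return re.findall(r"(?u)\b\w\w+\b", text.lower())


def known_tokens(text):
    """Tokens of `text` that the toy vocabulary would recognize."""
    return [tok for tok in tokenize(text) if tok in VOCAB]


raw = "I don't love #mondays"
print(known_tokens(raw))              # tokens seen without cleaning
print(known_tokens(clean_text(raw)))  # tokens seen with cleaning
```

In this toy case the uncleaned input loses the negation ("don't" splits into "don" and "t") and keeps the hashtag word, while the cleaned input yields "dont" and drops the hashtag, which is the form the model saw during training.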


