Sentiment analysis is a significant challenge to which natural language
processing has been extensively applied. In this scenario, the objective is to
determine whether tweets posted by customers about technology companies that
make and sell mobile phones, computers, laptops, and similar products express
positive or negative sentiment. In the dataset, label 0 marks a positive tweet
and label 1 a negative one.
The goal is to build a system that accurately classifies the sentiment of new
tweets. The data can be divided into training and test sets, and the evaluation
metric is accuracy.
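
Accuracy is the share of tweets whose predicted label matches the true label. A minimal pure-Python sketch of the metric (scikit-learn's `sklearn.metrics.accuracy_score` computes the same quantity); the example labels are made up for illustration:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels: 4 of the 5 predictions are correct.
print(accuracy([0, 1, 0, 0, 1], [0, 1, 0, 1, 1]))  # 0.8
```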


**Importing Libraries**

In [1]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

In [2]:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...


True
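
The notebook imports `stopwords` and `WordNetLemmatizer` and downloads their data, but the `cleaned_tweet` function defined later never uses them. If stopword removal is added, note that NLTK's English stopword list contains negations such as "not", "no", and "nor", which can reverse a tweet's sentiment. A minimal sketch that removes stopwords while keeping negations; the abbreviated `STOPWORDS` set, the `NEGATIONS` set, and the `remove_stopwords` helper are hypothetical stand-ins (in the notebook the set would come from `stopwords.words('english')`):

```python
# Hypothetical, abbreviated stopword set for illustration only; in the notebook
# it could be built with set(stopwords.words('english')).
STOPWORDS = {"a", "an", "the", "is", "my", "i", "this", "to", "it", "not", "no", "nor"}

# Negation words can flip sentiment, so keep them even if the stopword list has them.
NEGATIONS = {"not", "no", "nor", "never"}

def remove_stopwords(tokens, stopwords=STOPWORDS, keep=NEGATIONS):
    """Drop stopwords from a token list, except for the words in `keep`."""
    return [tok for tok in tokens if tok not in stopwords or tok in keep]

tokens = "this phone is not worth the money".split()
print(remove_stopwords(tokens))  # ['phone', 'not', 'worth', 'money']
```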

**Read the Dataset into the Python Environment**

In [4]:
import pandas as pd

tweets_df = pd.read_csv('/content/tweets.csv')
tweets_df

,id,label,tweet
0,1,0,#fingerprint #Pregnancy Test https://goo.gl/h1...
1,2,0,Finally a transparant silicon case ^^ Thanks t...
2,3,0,We love this! Would you go? #talk #makememorie...
3,4,0,I'm wired I know I'm George I was made that wa...
4,5,1,What amazing service! Apple won't even talk to...
...,...,...,...
7915,7916,0,Live out loud #lol #liveoutloud #selfie #smile...
7916,7917,0,We would like to wish you an amazing day! Make...
7917,7918,0,Helping my lovely 90 year old neighbor with he...
7918,7919,0,Finally got my #smart #pocket #wifi stay conne...


In [5]:
# Positive tweets (label 0)
tweets_df[tweets_df['label']==0].head()

,id,label,tweet
0,1,0,#fingerprint #Pregnancy Test https://goo.gl/h1...
1,2,0,Finally a transparant silicon case ^^ Thanks t...
2,3,0,We love this! Would you go? #talk #makememorie...
3,4,0,I'm wired I know I'm George I was made that wa...
6,7,0,Happy for us .. #instapic #instadaily #us #son...


In [6]:
# Negative tweets (label 1)
tweets_df[tweets_df['label']==1].head()

,id,label,tweet
4,5,1,What amazing service! Apple won't even talk to...
5,6,1,iPhone software update fucked up my phone big ...
10,11,1,hey #apple when you make a new ipod dont make ...
11,12,1,Ha! Not heavy machinery but it does what I nee...
12,13,1,Contemplating giving in to the iPhone bandwago...


In [7]:
# Count of positive (0) and negative (1) tweets

tweets_df['label'].value_counts()

label,count
0,5894
1,2026
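
The counts show a roughly 3:1 class imbalance (5,894 tweets labelled 0, 2,026 labelled 1). That gives a useful reference point for the accuracy metric: a trivial classifier that always predicts the majority label already scores about 74% on the full dataset. The arithmetic, using the counts above:

```python
# Class counts reported by tweets_df['label'].value_counts() above.
counts = {0: 5894, 1: 2026}

total = sum(counts.values())
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(total)                        # 7920
print(majority_label)               # 0
print(round(baseline_accuracy, 3))  # 0.744
```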


**Text Preprocessing**

In [23]:
import re
import string

def cleaned_tweet(text):
    # 1. Lowercase
    text = text.lower()

    # 2. Remove URLs (http, https, www)
    text = re.sub(r'http\S+|www\S+', '', text)  # 'http\S+' also matches https

    # 3. Remove mentions (@username) and hashtags (#topic)
    text = re.sub(r'@\w+|#\w+', '', text)

    # 4. Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))

    # 5. Remove numbers
    text = re.sub(r'\d+', '', text)

    # 6. Remove extra whitespace
    text = text.strip()
    text = re.sub(r'\s+', ' ', text)

    return text
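
`cleaned_tweet` deletes hashtags entirely, so a tweet such as `#fingerprint #Pregnancy Test https://goo.gl/h1...` is reduced to just "test", even though hashtags like #love or #fail often carry sentiment. A hypothetical variant, `cleaned_tweet_keep_hashtags`, that keeps the hashtag word and drops only the `#` is sketched below; whether it improves accuracy on this dataset has not been tested. On the example input, the original `cleaned_tweet` would return only "loving my new".

```python
import re
import string

def cleaned_tweet_keep_hashtags(text):
    """Hypothetical variant of cleaned_tweet that keeps hashtag words (minus the '#')."""
    text = text.lower()
    text = re.sub(r'http\S+|www\S+', '', text)   # drop URLs
    text = re.sub(r'@\w+', '', text)              # drop @mentions
    text = re.sub(r'#(\w+)', r'\1', text)         # '#iphone' -> 'iphone'
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = re.sub(r'\d+', '', text)               # drop digits
    return re.sub(r'\s+', ' ', text).strip()      # collapse and trim whitespace

print(cleaned_tweet_keep_hashtags("Loving my new #iPhone! @apple http://t.co/xyz #happy"))
# loving my new iphone happy
```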


In [11]:
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [24]:
# Apply Preprocessing

# Apply the cleaned_tweet function to the "tweet" column
tweets_df['cleaned_tweet'] = tweets_df['tweet'].apply(cleaned_tweet)


In [26]:
print(tweets_df[['tweet', 'cleaned_tweet']].head(5))

                                               tweet  \
0  #fingerprint #Pregnancy Test https://goo.gl/h1...   
1  Finally a transparant silicon case ^^ Thanks t...   
2  We love this! Would you go? #talk #makememorie...   
3  I'm wired I know I'm George I was made that wa...   
4  What amazing service! Apple won't even talk to...   

                                       cleaned_tweet  
0                                               test  
1  finally a transparant silicon case thanks to m...  
2                          we love this would you go  
3      im wired i know im george i was made that way  
4  what amazing service apple wont even talk to m...  


**Vectorization**

In [28]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Create a TF-IDF Vectorizer
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(tweets_df['cleaned_tweet'])
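
`TfidfVectorizer` weights each term by its count in a tweet times an inverse document frequency (IDF), then L2-normalises each row. A pure-Python sketch of scikit-learn's default formula (`lowercase=True`, `smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`, default `token_pattern`) is below. It omits `max_features`, which in the notebook keeps only the 5,000 terms with the highest frequency across the corpus; the three documents are made up.

```python
import math
import re
from collections import Counter

# scikit-learn's default token_pattern: tokens of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tfidf(docs):
    """TF-IDF with scikit-learn's default smoothing and L2 normalisation."""
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n_docs) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for tokens in tokenized:
        weights = {term: count * idf[term] for term, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # L2-normalise each row; an empty document stays an empty (all-zero) vector.
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors, idf

docs = ["love this phone", "hate this phone", "this phone is a phone"]
vectors, idf = tfidf(docs)
print(idf["this"])            # 1.0 (appears in every document)
print(round(idf["love"], 4))  # 1.6931
print("a" in idf)             # False (single-character token dropped)
```

The default token pattern also explains why one-letter words such as "a" and "i" never reach the model, even though `cleaned_tweet` keeps them.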

In [29]:
y = tweets_df['label']

In [30]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
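
Here the vectorizer is fit on every tweet before the split, so the IDF weights and the `max_features` vocabulary are partly learned from the test tweets, a mild form of data leakage. The split is also not stratified, although the classes are imbalanced. A sketch that splits the raw text first, passes `stratify` to keep the 0/1 ratio in both parts, and lets a scikit-learn `Pipeline` fit TF-IDF on the training portion only. The toy `texts` and `labels` are hypothetical stand-ins for `tweets_df['cleaned_tweet']` and `tweets_df['label']`; the scores reported below were produced without these changes, so results may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data (0 = positive, 1 = negative).
texts = [
    "love my new phone", "great camera on this laptop", "happy with the update",
    "best tablet ever", "amazing battery life", "so happy with apple",
    "update broke my phone", "worst laptop ever", "screen cracked again",
    "battery died after a day",
]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

# Split the raw text first; stratify keeps the class ratio the same in both parts.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels
)

# The pipeline fits the vectorizer on the training text only, then the classifier.
pipeline = make_pipeline(
    TfidfVectorizer(max_features=5000),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(X_train_text, y_train)

# At prediction time the fitted vectorizer only transforms; it is not refit.
print(pipeline.predict(X_test_text))
```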


In [31]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))


              precision    recall  f1-score   support

           0       0.85      0.95      0.90      1735
           1       0.81      0.55      0.65       641

    accuracy                           0.84      2376
   macro avg       0.83      0.75      0.78      2376
weighted avg       0.84      0.84      0.83      2376
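
Two quick checks put this report in context. Always predicting label 0 would already reach about 73% accuracy on this test split, so the model's 0.84 is a real but modest gain, and the recall of 0.55 for label 1 means almost half of the negative tweets are missed. The macro average weights both classes equally, while the weighted average weights each class by its support, which pulls the weighted figures toward the majority class. A sketch reproducing these numbers from the per-class rows (those rows are themselves rounded, so a recomputed average can differ from the report in the last digit):

```python
# Per-class figures copied from the classification report above.
report = {
    0: {"precision": 0.85, "recall": 0.95, "f1": 0.90, "support": 1735},
    1: {"precision": 0.81, "recall": 0.55, "f1": 0.65, "support": 641},
}
n_test = sum(r["support"] for r in report.values())

def macro_avg(metric):
    """Unweighted mean over classes: each class counts equally."""
    return sum(r[metric] for r in report.values()) / len(report)

def weighted_avg(metric):
    """Mean over classes weighted by support (number of true instances)."""
    return sum(r[metric] * r["support"] for r in report.values()) / n_test

# Accuracy of always predicting the majority class (label 0) on this split.
majority_baseline = report[0]["support"] / n_test

print(n_test)                               # 2376
print(round(majority_baseline, 3))          # 0.73
print(round(macro_avg("recall"), 2))        # 0.75
print(round(weighted_avg("precision"), 2))  # 0.84
```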



In [34]:

def predict_sentiment(new_tweet):
    # Clean the tweet with the same preprocessing used for training
    cleaned = cleaned_tweet(new_tweet)

    # Vectorize with the already-fitted TF-IDF vectorizer (transform only)
    vectorized = vectorizer.transform([cleaned])

    # Predict the label using the trained model
    prediction = model.predict(vectorized)[0]

    # Map the label to text: in this dataset 0 = positive, 1 = negative
    sentiment = 'Negative 😞' if prediction == 1 else 'Positive 😊'
    return sentiment

In [38]:
# A clearly negative tweet: the model predicts label 0, i.e. a misclassification
print(predict_sentiment("Worst experience ever. Totally disappointed."))

Positive 😊


In [39]:
# A negative tweet from the dataset (row 5, label 1)
print(predict_sentiment("iPhone software update fucked up my phone big time Stupid iPhones"))

Negative 😞
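
`predict_sentiment` returns only the hard label from `model.predict`, which for a binary `LogisticRegression` amounts to choosing label 1 when its predicted probability exceeds 0.5. Because recall on label 1 (negative) is low, it may help to expose that probability and tune the cutoff on held-out data. A sketch using `predict_proba` on a hypothetical toy model; the `predict_with_threshold` helper and its `threshold` parameter are illustrative, not part of the notebook. `LogisticRegression(class_weight='balanced')` is another common way to shift the precision/recall trade-off; neither option was evaluated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data (0 = positive, 1 = negative).
texts = ["love this phone", "great laptop", "happy with it",
         "hate this phone", "broken laptop", "terrible update"]
labels = [0, 0, 0, 1, 1, 1]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

def predict_with_threshold(text, threshold=0.5):
    """Return (label, P(negative)); flag the tweet as negative when P(label 1) >= threshold."""
    # predict_proba columns follow clf.classes_, so look up the column for label 1.
    neg_col = list(clf.classes_).index(1)
    p_negative = clf.predict_proba([text])[0][neg_col]
    label = "Negative" if p_negative >= threshold else "Positive"
    return label, p_negative

print(predict_with_threshold("this update is terrible"))
print(predict_with_threshold("this update is terrible", threshold=0.3))
```

Lowering `threshold` flags more tweets as negative, trading precision on label 1 for recall.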
