<a href="https://colab.research.google.com/github/Eaby/NLP_Codes/blob/main/NU_IUI_TwitterMonitoring.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ***Twitter (X platform) Monitoring***

********************************************************************************

# **Task 1 : Trend Detection**

Twitter monitoring for trend detection using Natural Language Processing (NLP) involves collecting, analyzing, and interpreting Twitter data to identify emerging topics, sentiments, or patterns. The main steps involved are **Data Collection, Preprocessing, Feature Extraction, Trend Detection, Visualization, Alerts and Notifications**.
While NLP can be a powerful tool for trend detection, human interpretation and judgment remain essential. Always contextualize findings and consider external factors that might influence trends.

**Prerequisites:**

You'll need to download the dataset from the Google Drive location below and save it to your own Google Drive to run the code.

Dataset Link: https://drive.google.com/file/d/13tjdXgX3cSyw-IdJiRvfyx_8p92zs_S2/view?usp=sharing
File name of the dataset: **training.1600000.processed.noemoticon.csv**

Once the file is in your Google Drive, replace `file_path` in the code below with the path to your copy of the dataset before running the code.



In [None]:
pip install pandas nltk

In [None]:
import pandas as pd
from collections import Counter
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from google.colab import drive
import matplotlib.pyplot as plt
from nltk.util import bigrams

# Mount Google Drive
drive.mount('/content/drive')

# Ensure you've downloaded the necessary NLTK data
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data that word_tokenize needs in newer NLTK releases
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

# Set the path to your dataset on Google Drive
file_path = '/content/drive/My Drive/MyData/training.1600000.processed.noemoticon.csv'

# Load dataset
data = pd.read_csv(file_path, encoding='ISO-8859-1', header=None)
data.columns = ['sentiment', 'id', 'date', 'query', 'user', 'text']

# Take input keyword from user
keyword = input("Enter the keyword to analyze trends for: ")
top_n = int(input("Enter the number of top trends you want to view (e.g. 10): "))

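# Filter the dataset for tweets containing the input keyword
# (regex=False treats the keyword as literal text, so input like "c++" does not break the match)
data = data[data['text'].str.contains(keyword, case=False, na=False, regex=False)]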

# If no tweets found for the given keyword, exit
if data.empty:
    print(f"No tweets found for keyword: {keyword}")
    exit()

# Tokenization, lemmatization, and cleaning
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()
all_words = []
all_bigrams = []
for text in data['text']:
    # Lowercase first so capitalized stopwords ("The", "And") are filtered too
    tokens = [word.lower() for word in word_tokenize(text)]
    words = [lemmatizer.lemmatize(word) for word in tokens if word.isalpha() and word not in stop_words]
    all_words.extend(words)
    # Build bigrams per tweet so a pair never spans two different tweets
    all_bigrams.extend(bigrams(words))

# Count bigrams
bigram_freq = Counter(all_bigrams)
top_bigrams = bigram_freq.most_common(top_n)

# Get the most common words associated with the keyword
word_freq = Counter(all_words)
trends = word_freq.most_common(top_n)  # Top N trends

# Check if trends list is empty
if not trends:
    print(f"No significant words found for the keyword: {keyword} in the sampled dataset.")
else:
    # Visualization
    words, counts = zip(*trends)
    plt.figure(figsize=(15,7))
    plt.bar(words, counts, color='skyblue')
    plt.title(f"Top {top_n} Trends associated with '{keyword}'")
    plt.ylabel('Count')
    plt.xticks(rotation=45)
    plt.show()


# Display top bigrams
print(f"\nTop {top_n} Bigrams associated with '{keyword}':")
for bigram, freq in top_bigrams:
    print(f"{bigram}: {freq}")

# Sentiment Analysis (Sentiment140 labels: 0 = negative, 2 = neutral, 4 = positive)
positive_tweets = len(data[data['sentiment'] == 4])
neutral_tweets = len(data[data['sentiment'] == 2])
negative_tweets = len(data[data['sentiment'] == 0])

print(f"\nSentiment Analysis for '{keyword}':")
print(f"Positive Tweets: {positive_tweets}")
print(f"Neutral Tweets: {neutral_tweets}")
print(f"Negative Tweets: {negative_tweets}")


The above code performs keyword trend analysis and sentiment analysis on a dataset of tweets stored in Google Drive. The main objective of this code is to analyse how frequently certain words and bigrams appear in tweets containing a specific keyword entered by the user and to determine the sentiment of those tweets.

**Libraries used:**

- **pandas** for data manipulation and analysis.
- **Counter** (from `collections`) to count occurrences of elements.
- Several functions and modules from the **nltk** library for natural language processing.
- **matplotlib.pyplot** for data visualization.

**NLTK Data Download:**
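The code downloads the data files that the **nltk** functions rely on: the stopword lists, the Punkt tokenizer models, the WordNet corpus used by the lemmatizer, and a POS tagger model (downloaded here but not used by the rest of the code).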

**User Input**:
The code takes two inputs from the user:

- A keyword to analyze trends for.
- The number of top trends (words or bigrams) they want to view.
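
The notebook converts the second input with a bare `int(...)`, which raises an error on non-numeric text. A minimal sketch of a more forgiving parser follows; the helper name `parse_top_n` and the fallback value of 10 are illustrative assumptions, not part of the original code.

```python
def parse_top_n(raw, default=10):
    """Return a positive integer parsed from raw, or default if that fails."""
    try:
        value = int(raw.strip())
    except ValueError:
        # Non-numeric input such as "ten" falls back to the default
        return default
    # Zero or negative counts make no sense for a "top N" list
    return value if value > 0 else default


print(parse_top_n("15"))
print(parse_top_n("ten"))
```

In the notebook this could wrap the existing prompt, for example `top_n = parse_top_n(input(...))`.
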

**Tokenization, Lemmatization, and Cleaning:**
The code performs several text preprocessing steps:

- Tokenization: splitting text into individual words.
- Lemmatization: converting words to their base form. The lemmatizer is called without a part-of-speech tag, so words are treated as nouns (e.g., "cats" becomes "cat", while "running" is left unchanged).
- Cleaning: removing non-alphabetic tokens and stopwords (common words like "and", "the", etc.).
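
The order of these steps matters: stopword lists are lowercase, so text should be lowercased before it is compared against them. The sketch below illustrates that ordering with standard-library stand-ins only. The regex tokenizer and the tiny `DEMO_STOPWORDS` set are assumptions made for the example (NLTK's tokenizer and English stopword list behave differently and are far more complete), and lemmatization is omitted because it needs the WordNet data.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stopword set; NLTK's English list is much longer
DEMO_STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "my", "i"}


def clean_tokens(text: str) -> List[str]:
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in DEMO_STOPWORDS]


print(clean_tokens("The new phone is AMAZING and I love the camera!"))
# -> ['new', 'phone', 'amazing', 'love', 'camera']
```

Filtering before lowercasing would let capitalized stopwords such as "The" slip through and show up among the top words.
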

**Bigram Analysis:**
After tokenization, the code creates bigrams (pairs of adjacent words) and counts their occurrences.
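
As a small illustration of bigram counting, the sketch below pairs each word with its right-hand neighbour using `zip` and tallies the pairs with `Counter`. It builds pairs within each tweet, so the last word of one tweet is never paired with the first word of the next. The function name `count_bigrams` and the sample tokens are made up for the example.

```python
from collections import Counter


def count_bigrams(tokenized_tweets):
    """Count adjacent word pairs within each tweet; pairs never cross tweets."""
    counts = Counter()
    for tokens in tokenized_tweets:
        # zip(tokens, tokens[1:]) yields (w0, w1), (w1, w2), ...
        counts.update(zip(tokens, tokens[1:]))
    return counts


tweets = [
    ["love", "new", "phone"],
    ["new", "phone", "camera"],
]
bigram_counts = count_bigrams(tweets)
print(bigram_counts.most_common(1))
# -> [(('new', 'phone'), 2)]
```
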

**Top Trends:**
The code identifies the top N words (trends) associated with the input keyword.
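
Because every selected tweet contains the keyword, the keyword itself tends to dominate the ranking. One optional refinement, sketched below, is to skip it before taking the top N. This is a suggestion rather than something the notebook does; `top_trends` is a hypothetical helper, and it only removes exact matches of the lowercased keyword.

```python
from collections import Counter


def top_trends(words, keyword, top_n):
    """Return the top_n most frequent words, skipping the keyword itself."""
    target = keyword.lower()
    counts = Counter(word for word in words if word != target)
    return counts.most_common(top_n)


words = ["phone", "camera", "phone", "battery", "camera", "phone", "screen"]
print(top_trends(words, "Phone", 1))
# -> [('camera', 2)]
```
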

**Visualization:**
The top N trends are visualized using a bar chart.

**Sentiment Analysis:**
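The code does not run a sentiment model. It counts the positive, neutral, and negative tweets associated with the keyword using the labels already stored in the dataset's `sentiment` column (0 = negative, 2 = neutral, 4 = positive) and prints the count for each category.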
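
Raw counts are hard to compare across keywords with very different tweet volumes, so it can help to report shares as well. The sketch below converts a list of numeric labels into percentages per category; the helper `sentiment_shares` and the sample labels are illustrative assumptions.

```python
from collections import Counter

# Sentiment140 label codes
LABEL_NAMES = {0: "negative", 2: "neutral", 4: "positive"}


def sentiment_shares(labels):
    """Return each sentiment category's share of the labels, in percent."""
    counts = Counter(LABEL_NAMES[label] for label in labels)
    total = sum(counts.values())
    if total == 0:
        # No tweets matched: report zero for every category
        return {name: 0.0 for name in LABEL_NAMES.values()}
    return {name: round(100 * counts[name] / total, 1) for name in LABEL_NAMES.values()}


print(sentiment_shares([4, 4, 0, 4, 0]))
# -> {'negative': 40.0, 'neutral': 0.0, 'positive': 60.0}
```

In the notebook, the labels could come from `data['sentiment'].tolist()` after the keyword filter has been applied.
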
