# Topic Modeling on Alternative Energy Tweets

This notebook demonstrates how topic modeling can be used to analyze public discourse on alternative energy (e.g., solar, lithium) using NLP techniques.
It reproduces, on a small simulated dataset, part of a project on global versus Bangladesh energy trends.

In [None]:
import pandas as pd
import re

# To load real Twitter data instead, uncomment:
# df = pd.read_csv('data/tweets.csv')
# Simulated tweet data used for this demo
df = pd.DataFrame({
    'tweet': [
        'Tesla is revolutionizing lithium battery tech! #EV',
        'Solar panels are getting cheaper every year! https://solar.com',
        'Bangladesh explores wind energy potential.',
        'Hydrogen fuel is the future. Clean and efficient!',
        'Solar and wind can power the world 🌍'
    ]
})

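# Clean the text
def clean_tweet(tweet):
    # Remove URLs ("http\S+" already covers https links)
    tweet = re.sub(r"http\S+|www\S+", '', tweet)
    # Remove @mentions and the '#' sign (the hashtag word itself is kept)
    tweet = re.sub(r'@\w+|#', '', tweet)
    # Remove punctuation and other non-word symbols, including emoji
    tweet = re.sub(r'[^\w\s]', '', tweet)
    return tweet.lower()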

df['clean_text'] = df['tweet'].apply(clean_tweet)
df.head()
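
Stripping a URL or an emoji with substitutions like the ones above leaves the neighbouring spaces in place, so a cleaned string can end with a stray space or contain double spaces. The sketch below is one possible variant, not part of the original project: `clean_tweet_normalized` is a hypothetical name, the third sample tweet and its handle are made up, and the only new step is the final whitespace collapse.

```python
import re

def clean_tweet_normalized(tweet):
    """Hypothetical variant of clean_tweet that also tidies whitespace."""
    tweet = re.sub(r"http\S+|www\S+", "", tweet)  # URLs
    tweet = re.sub(r"@\w+|#", "", tweet)          # mentions and the '#' sign
    tweet = re.sub(r"[^\w\s]", "", tweet)         # punctuation and emoji
    tweet = re.sub(r"\s+", " ", tweet).strip()    # collapse runs of whitespace
    return tweet.lower()

samples = [
    "Solar panels are getting cheaper every year! https://solar.com",
    "Solar and wind can power the world 🌍",
    "Great thread by @energy_nerd on #lithium   supply",  # made-up example
]
cleaned = [clean_tweet_normalized(s) for s in samples]
for text in cleaned:
    print(repr(text))
```

Whether the extra step matters depends on the tokenizer downstream; a whitespace-aware tokenizer will ignore the stray spaces anyway, but the normalized text is easier to inspect and to deduplicate.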

In [None]:
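import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data required by recent NLTK releases
nltk.download('stopwords')

stop_words = set(stopwords.words('english'))

def tokenize(text):
    # Keep tokens that are not stop words and are longer than 3 characters
    return [word for word in word_tokenize(text) if word not in stop_words and len(word) > 3]

df['tokens'] = df['clean_text'].apply(tokenize)
df[['clean_text', 'tokens']].head()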
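
The filter in the cell above keeps a token only if it is not a stop word and has more than three characters. To isolate that rule, the sketch below runs it without NLTK. It rests on two assumptions: `TINY_STOP_WORDS` is a hand-picked stand-in for NLTK's English stop-word list, and `str.split` is a rough substitute for `word_tokenize` on text that has already been cleaned. `filter_tokens` is a hypothetical helper name.

```python
# Hand-picked stand-in for NLTK's English stop-word list (assumption for this sketch).
TINY_STOP_WORDS = {"is", "are", "the", "and", "can"}

def filter_tokens(text, stop_words=TINY_STOP_WORDS, min_len=4):
    """Keep tokens that are not stop words and have at least min_len characters."""
    return [w for w in text.split() if w not in stop_words and len(w) >= min_len]

text = "tesla is revolutionizing lithium battery tech ev"
kept = filter_tokens(text)
dropped = [w for w in text.split() if w not in kept]
print("kept:   ", kept)
print("dropped:", dropped)
```

Note that the length cut-off also discards short domain terms such as "ev". If those matter for the analysis, exempting a small allow-list of terms from the length check is one option.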

In [None]:
from gensim import corpora, models

# Create dictionary and corpus
dictionary = corpora.Dictionary(df['tokens'])
corpus = [dictionary.doc2bow(text) for text in df['tokens']]

# Build LDA model (random_state is fixed so repeated runs give the same topics)
lda_model = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=42)
topics = lda_model.print_topics()
for topic in topics:
    print(topic)
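
`corpora.Dictionary` and `doc2bow` above turn each token list into a sparse list of `(token_id, count)` pairs, which is the input LDA works from. The plain-Python sketch below imitates that shape to make it concrete. The helper names `build_vocab` and `to_bow` are hypothetical, the second document is made up so that one count exceeds 1, and the ids assigned here will not necessarily match the ones gensim chooses.

```python
from collections import Counter

def build_vocab(docs):
    """Assign an integer id to every distinct token, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab

def to_bow(doc, vocab):
    """Return sorted (token_id, count) pairs for the known tokens of one document."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())

docs = [
    ["solar", "panels", "getting", "cheaper", "every", "year"],
    ["solar", "wind", "power", "world", "solar"],  # made-up document
]
vocab = build_vocab(docs)
bows = [to_bow(doc, vocab) for doc in docs]
print(vocab)
print(bows)
```

Word order is lost in this representation; only which tokens occur and how often is passed on to the topic model.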

In [None]:
import pyLDAvis.gensim_models as gensimvis
import pyLDAvis

# Visualize topics
vis_data = gensimvis.prepare(lda_model, corpus, dictionary)
pyLDAvis.display(vis_data)  # Renders inline only inside a Jupyter notebook
# To export a standalone HTML file instead (the output directory must already exist):
# pyLDAvis.save_html(vis_data, 'visualizations/lda_visualization.html')

In [None]:
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
import matplotlib.pyplot as plt

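# Represent each cleaned tweet as a TF-IDF vector (at most 1000 terms)
vectorizer = TfidfVectorizer(max_features=1000)
X = vectorizer.fit_transform(df['clean_text'])

# Project the vectors onto two principal components
# (the sparse TF-IDF matrix is converted to a dense array first)
pca = PCA(n_components=2)
components = pca.fit_transform(X.toarray())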

plt.figure(figsize=(10,6))
plt.scatter(components[:,0], components[:,1], alpha=0.7)
plt.title('PCA of Tweets on Alternative Energy')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.grid(True)
plt.show()
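
For reference, the matrix `X` that PCA receives above holds one TF-IDF vector per tweet. The sketch below recomputes such vectors in plain Python, using the smoothed IDF and L2 normalization that scikit-learn documents as `TfidfVectorizer` defaults. It splits on whitespace rather than using the vectorizer's own tokenizer, so treat it as an approximation for illustration; `tfidf_vectors` is a hypothetical helper name.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Return (vocabulary, rows) with L2-normalized, smoothed TF-IDF weights."""
    docs = [t.split() for t in texts]
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(w for doc in docs for w in set(doc))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        raw = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        rows.append([x / norm for x in raw])
    return vocab, rows

texts = [
    "solar panels are getting cheaper every year",
    "bangladesh explores wind energy potential",
    "solar and wind can power the world",
]
vocab, rows = tfidf_vectors(texts)
solar, panels = vocab.index("solar"), vocab.index("panels")
print("weight of 'solar' in tweet 0: ", round(rows[0][solar], 3))
print("weight of 'panels' in tweet 0:", round(rows[0][panels], 3))
```

A term shared by several tweets ("solar") ends up with a lower weight than a term unique to one tweet ("panels"), so the positions in the PCA scatter plot are driven mostly by the more distinctive vocabulary of each tweet.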

### ✅ Summary
In this notebook, we:
- Cleaned and tokenized tweets related to alternative energy.
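- Applied LDA topic modeling to identify themes and visualized the topics with pyLDAvis.
- Projected the tweets' TF-IDF vectors onto two principal components with PCA to see how the tweets group.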

This replicates part of the methodology from the research project on energy perspectives.