# **A. Sentiment Analysis - English**


## Import Library

* VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis tool specifically attuned to sentiments expressed in social media. NLTK ships a port of it as `nltk.sentiment.vader`, which is the version used in the cells below.
Source: https://github.com/cjhutto/vaderSentiment
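The core lexicon idea can be sketched in a few lines: add up per-word valence scores from a lexicon, then squash the sum into [-1, 1]. The tiny lexicon below is hypothetical, with illustrative values. VADER's real lexicon is far larger, and VADER also applies rules for negation, intensifiers, capitalization and punctuation that this sketch omits. The `normalize` function mirrors VADER's compound-score normalization, x / sqrt(x² + α) with α = 15.

```python
import math

# Hypothetical toy lexicon: word -> valence (VADER's human ratings run from -4 to +4)
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1}

def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1], as VADER does for 'compound'."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))

def toy_compound(text: str) -> float:
    """Sum lexicon valences of the words in `text` and normalize (no VADER rules applied)."""
    total = sum(TOY_LEXICON.get(word, 0.0) for word in text.lower().split())
    return normalize(total)

print(toy_compound("great product"))
print(toy_compound("the car arrived"))  # no lexicon words, so the score is 0
```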

In [None]:
# Install Library
!pip install vaderSentiment

* Pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
* NLTK is a leading platform for building Python programs to work with human language data.



In [None]:
# Import Library
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk

## Import Data

In [None]:
# Import Data from Github
url = 'https://raw.githubusercontent.com/amaliaristantya/Text-Analytics/main/General%20Motor.csv'
df = pd.read_csv(url)  # comma is the default separator


In [None]:
df.head(5)

In [None]:
df.shape

## Modeling

In [None]:
# Convert the 'text' column to strings so every row can be scored
df['text'] = df['text'].astype(str)

In [None]:
# Download the VADER lexicon required by SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

In [None]:
# Sentiment Analysis: score every text with VADER
sid = SentimentIntensityAnalyzer()
listy = []
for index, row in df.iterrows():
  ss = sid.polarity_scores(row['text'])  # dict with 'neg', 'neu', 'pos', 'compound'
  listy.append(ss)

df['polarity'] = listy
display(df.head(10))

In [None]:
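# Pie chart: mean 'neg', 'neu' and 'pos' proportions across all texts
import matplotlib.pyplot as plt
scores = pd.DataFrame(df['polarity'].tolist())  # one column per VADER score
labels = ['negative', 'neutral', 'positive']
sizes = [scores['neg'].mean(), scores['neu'].mean(), scores['pos'].mean()]
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.axis('equal')
plt.show()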
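Another way to summarize the results is to give each text a single label from its `compound` score. The VADER README suggests thresholds of ±0.05: compound ≥ 0.05 is positive, ≤ -0.05 is negative, and anything in between is neutral. A minimal sketch, assuming the `polarity` column holds the dicts returned by `polarity_scores`; the sample dicts below are made up.

```python
from collections import Counter

def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a sentiment label using the README thresholds."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Made-up stand-ins for df['polarity'] values
polarities = [
    {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.62},
    {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.44},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
    {"neg": 0.0, "neu": 0.7, "pos": 0.3, "compound": 0.30},
]
counts = Counter(label_from_compound(p["compound"]) for p in polarities)
print(counts)
```

In the notebook, `df['polarity'].apply(lambda p: label_from_compound(p['compound']))` would give one label per row, and the label counts can be passed straight to `plt.pie`.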

In [None]:
df.to_csv('Output_File.csv', index=False)



---

# **B. Topic Modeling**

In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovering hidden semantic structures in a body of text.

## Install & Import Library

In [None]:
# Install pyLDAvis
! pip install pyLDAvis

In [None]:
# Import Libraries
import nltk
import os
import numpy as np
import pyLDAvis
import pyLDAvis.lda_model  # this module was named pyLDAvis.sklearn before pyLDAvis 3.4
pyLDAvis.enable_notebook()

# Import Modules
from tqdm import tqdm
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from matplotlib import pyplot as plt

import warnings
warnings.filterwarnings("ignore")

## Import Data

In [None]:
# Clone Data from Github
! git clone https://github.com/amaliaristantya/Text-Analytics

# Set Data Directory
os.chdir('Text-Analytics')


In [None]:
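# Download stop words and set the analysis parameters
nltk.download('stopwords')
data_file = 'General Motor.csv'
n_topics, Top_Topics, Top_Words = 4, 5, 5 # LDA topics, topics to print, words per topic; tune to the purpose of the analysis
max_df, min_df = 0.75, 10 # drop terms found in more than 75% of documents or in fewer than 10 documents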
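In scikit-learn's `CountVectorizer`, a float `max_df` or `min_df` is read as a proportion of documents and an integer as an absolute document count; a term is kept only if its document frequency lies within both bounds. The sketch below reproduces that filter in plain Python on a made-up corpus, with `min_df` lowered to 2 so the toy data survives (the notebook's 0.75 / 10 would remove every term here).

```python
def doc_frequency_filter(docs, max_df=0.75, min_df=2):
    """Keep terms whose document frequency is in [min_df, max_df * n_docs].

    A float max_df is treated as a proportion and an int min_df as a count,
    matching how CountVectorizer interprets those types.
    """
    n_docs = len(docs)
    df_counts = {}
    for doc in docs:
        for term in set(doc.split()):  # count each term once per document
            df_counts[term] = df_counts.get(term, 0) + 1
    max_count = max_df * n_docs
    return sorted(t for t, c in df_counts.items() if min_df <= c <= max_count)

docs = [
    "gm recall engine",
    "gm electric truck",
    "gm engine plant",
    "gm electric battery",
]
# 'gm' is in all 4 documents (> 0.75 * 4), and one-off terms fall below min_df
print(doc_frequency_filter(docs))
```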

In [None]:
# Import the helper module MyLib from the current (cloned repository) directory
import MyLib as TS

## Modeling

In [None]:
# Load Tweets Data
Tweets = TS.LoadTxt(data_file) 
print('Total loaded tweets = {0}'.format(len(Tweets)))

In [None]:
# Build the document-term matrix of raw counts; keep only alphabetic tokens of 3+ letters
tf_vectorizer = CountVectorizer(strip_accents='unicode', stop_words='english',
                                lowercase=True, token_pattern=r'\b[a-zA-Z]{3,}\b',
                                max_df=max_df, min_df=min_df)
dtm_tf = tf_vectorizer.fit_transform(Tweets)
tf_terms = tf_vectorizer.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
del Tweets
print('Done Calculating VSM ... ', flush=True)
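The `token_pattern` used by `tf_vectorizer` keeps only runs of three or more ASCII letters, and `CountVectorizer` lowercases the text before tokenizing, so numbers and one- or two-letter tokens such as "gm" or "ev" are dropped. A rough stdlib sketch of that tokenization step and the resulting bag-of-words counts for one made-up tweet; it skips the stop-word removal and accent stripping that `CountVectorizer` also performs (so "are" survives here but would not in the notebook).

```python
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"\b[a-zA-Z]{3,}\b")  # same regex as tf_vectorizer

def tokenize(text: str) -> list:
    """Lowercase, then keep alphabetic tokens of length >= 3."""
    return TOKEN_PATTERN.findall(text.lower())

tweet = "GM recalls 2,000 EV trucks; GM says recalls are rare"
print(tokenize(tweet))
print(Counter(tokenize(tweet)))  # one row of the document-term matrix, as a dict
```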

In [None]:
# Fit an LDA model with n_topics topics (online variational Bayes)
lda_tf = LatentDirichletAllocation(n_components=n_topics, learning_method='online', random_state=0).fit(dtm_tf)
print('Done LDA topics ... ', flush = True) 

In [None]:
# Assign each document to its dominant topic (topics numbered from 1)
vsm_topics = lda_tf.transform(dtm_tf)
doc_topic = [a.argmax() + 1 for a in tqdm(vsm_topics)]
print('In total there are {0} major topics, distributed as follows'.format(len(set(doc_topic))))
# One histogram bar per topic
plt.hist(np.array(doc_topic), bins=np.arange(0.5, n_topics + 1.5), alpha=0.5)
plt.show()
print('Printing top {0} Topics, with top {1} Words:'.format(Top_Topics, Top_Words))
TS.print_Topics(lda_tf, tf_terms, Top_Topics, Top_Words)
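`lda_tf.transform` returns one row of topic proportions per document, and `lda_tf.components_` holds one row of word weights per topic. `TS.print_Topics` comes from the repository's `MyLib` module and its internals are not shown here, so the sketch below is a hypothetical stand-in: it shows the usual way to read both matrices (dominant topic by argmax, top words by sorting a topic's weights), using plain lists and made-up numbers in place of the real arrays.

```python
from collections import Counter

# Made-up stand-ins: doc_topic_probs ~ lda_tf.transform(dtm_tf), components ~ lda_tf.components_
doc_topic_probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.80, 0.10, 0.05],
    [0.60, 0.20, 0.10, 0.10],
]
components = [
    [5.0, 0.1, 2.0, 0.3],
    [0.2, 4.0, 0.1, 3.0],
]
terms = ["engine", "electric", "recall", "battery"]  # stand-in for tf_terms

def argmax(row):
    """Index of the largest value (first one on ties, like numpy.argmax)."""
    return max(range(len(row)), key=lambda i: row[i])

# Dominant topic per document, numbered from 1 as in the notebook
doc_topic = [argmax(row) + 1 for row in doc_topic_probs]
print(Counter(doc_topic))

def top_words(components, terms, n_words):
    """For each topic, the n_words terms with the highest weights."""
    result = []
    for weights in components:
        order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        result.append([terms[i] for i in order[:n_words]])
    return result

for k, words in enumerate(top_words(components, terms, 2), start=1):
    print("Topic", k, ":", ", ".join(words))
```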

In [None]:
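# Interactively visualize the topics (ignore the warnings). Rendering can take a few
# minutes; then hover the mouse over the topics to explore them.
import pyLDAvis.lda_model  # this module was named pyLDAvis.sklearn before pyLDAvis 3.4
pyLDAvis.lda_model.prepare(lda_tf, dtm_tf, tf_vectorizer)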



---

# **C. Text Network**



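Text network analysis builds on Social Network Analysis (SNA): words, phrases, or n-grams act as the actors (nodes), and the links between them (for example, appearing in the same text) form the edges.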
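One common way to obtain such an edge list is to link words that co-occur in the same text. How `General Motor-TNA.csv` was actually built is not stated here, so this sketch is only an assumption: it links every pair of distinct words within each made-up text and counts how often each pair appears, producing `source`/`target` pairs (the column names `nx.from_pandas_edgelist` expects by default) plus a hypothetical `weight` count.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(texts):
    """Count unordered word pairs that appear together in the same text."""
    pair_counts = Counter()
    for text in texts:
        words = sorted(set(text.lower().split()))  # unique words, sorted so pairs are canonical
        for a, b in combinations(words, 2):
            pair_counts[(a, b)] += 1
    return [{"source": a, "target": b, "weight": w} for (a, b), w in sorted(pair_counts.items())]

texts = ["gm electric truck", "gm electric battery", "truck recall"]
for edge in cooccurrence_edges(texts):
    print(edge)
```

A list of such dicts can be turned into a DataFrame with `pd.DataFrame(...)` and passed to `nx.from_pandas_edgelist`, adding `edge_attr='weight'` if the counts should be kept on the edges.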

## Import Library

In [None]:
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
import community
import seaborn as sns
import csv

## Import Data

In [None]:
# Import Data from Github
url = 'https://raw.githubusercontent.com/amaliaristantya/Text-Analytics/main/General%20Motor-TNA.csv'
df_tna = pd.read_csv(url, sep=';')
df_tna.head()

## Pemodelan Jaringan

In [None]:
# Construct an undirected network from the 'source' and 'target' columns
G1 = nx.from_pandas_edgelist(df_tna)

In [None]:
# Degree Centrality (each node's degree divided by n - 1)
degree = nx.degree_centrality(G1)

# Top 10 nodes by raw degree, highest first
sorted(nx.degree(G1), key=lambda x: x[1], reverse=True)[0:10]
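For a graph with n nodes, `nx.degree_centrality` divides each node's degree by n - 1, the largest degree possible in a simple graph, so values fall in [0, 1]. A stdlib sketch on a made-up edge list; as with `from_pandas_edgelist`, nodes come only from the edges.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree / (n - 1) for each node of an undirected simple graph given as (u, v) pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}

edges = [("gm", "electric"), ("gm", "truck"), ("gm", "recall"), ("electric", "battery")]
centrality = degree_centrality(edges)
print(sorted(centrality.items(), key=lambda x: x[1], reverse=True))
```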

In [None]:
# Node sizes are proportional to degree centrality
d = dict(degree)

# Visualize the network (G1 is undirected, so arrow settings do not apply)
import matplotlib.pyplot as plt
plt.figure(figsize=(50, 50))
nx.draw(G1, with_labels=True,
        node_color='skyblue', nodelist=list(d.keys()),
        node_size=[v * 60000 for v in d.values()],
        edge_color='r',
        font_size=10,
        pos=nx.kamada_kawai_layout(G1))

In [None]:
# Show Number of Nodes
nx.number_of_nodes(G1)

In [None]:
# Show Number of Edges
nx.number_of_edges(G1)

In [None]:
# Show Graph Density
nx.density(G1)
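For an undirected graph, `nx.density` is the share of possible edges that actually exist: 2m / (n(n - 1)) for n nodes and m edges. A small sketch of the formula with toy numbers:

```python
def density(n_nodes: int, n_edges: int) -> float:
    """Density of an undirected simple graph: actual edges / possible edges."""
    if n_nodes < 2:
        return 0.0  # no possible edges
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

# 5 nodes allow 10 edges; 4 of them present gives 0.4
print(density(5, 4))
```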

In [None]:
# Show Number of Connected Components
nx.number_connected_components(G1)
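`nx.number_connected_components` counts the groups of nodes that can reach one another. A breadth-first search sketch of the same count, on a made-up edge list with two separate groups:

```python
from collections import defaultdict, deque

def count_connected_components(edges):
    """Number of connected components in an undirected graph given as (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1  # unvisited node found, so a new component starts here
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return components

edges = [("gm", "electric"), ("electric", "battery"), ("recall", "airbag")]
print(count_connected_components(edges))
```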

In [None]:
# Import Module
from networkx.algorithms.community import greedy_modularity_communities

# Modularity Community Detection
communities_m = sorted(greedy_modularity_communities(G1), key=len, reverse=True)
communities_m
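`greedy_modularity_communities` keeps merging communities while modularity Q increases. For an undirected, unweighted graph with m edges, Q = Σ_c [L_c / m - (D_c / 2m)²], where L_c is the number of edges inside community c and D_c is the sum of the degrees of its nodes. The sketch below only scores a given partition with that formula (it does not perform the greedy search); `networkx.algorithms.community.modularity(G1, communities_m)` computes the same quantity on the real graph.

```python
from collections import defaultdict

def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph (resolution 1)."""
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    q = 0.0
    for community in communities:
        members = set(community)
        inside = sum(1 for u, v in edges if u in members and v in members)  # L_c
        total_degree = sum(degree[n] for n in members)  # D_c
        q += inside / m - (total_degree / (2 * m)) ** 2
    return q

# Two triangles joined by one bridge edge (c-d)
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
print(modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}]))
```

Splitting along the bridge gives a clearly positive Q, while putting every node in one community gives Q = 0.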