Given filter words, this project fetches tweets and shows polarity and sentiment scores, topic modeling results, and a word cloud based on the fetched tweets.
Twitter is a social media platform on which an enormous amount of data is generated. Using Twitter data, it is possible to run various analyses on a particular product or entity. In this report, we will see step by step how to carry out a Twitter data analysis.
The first step should be to understand your problem, what data it requires, and where you can get that data. In this case, we can extract Twitter data using the API keys that Twitter provides upon request. The data arrives in JSON format, which is difficult for humans to read.
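As a hedged illustration, the snippet below pulls matching tweets into raw JSON with tweepy; the credential variable names, the search query, and the output path are assumptions, not taken from the project.

```python
import json
import os

import tweepy

# Authenticate with the keys Twitter issues on request (env var names assumed).
auth = tweepy.OAuthHandler(os.environ["API_KEY"], os.environ["API_SECRET"])
auth.set_access_token(os.environ["ACCESS_TOKEN"], os.environ["ACCESS_SECRET"])
api = tweepy.API(auth, wait_on_rate_limit=True)

# Fetch tweets containing a filter word and keep the raw JSON payloads.
tweets = [
    status._json
    for status in tweepy.Cursor(
        api.search_tweets, q="some product", tweet_mode="extended"
    ).items(200)
]

with open("data/tweets.json", "w") as f:
    json.dump(tweets, f)
```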
We should first convert the JSON file to a DataFrame using the pandas Python library, and then save it as a CSV file for further use. Typecasting should be performed to get a suitable and sensible data format, and special characters, emojis, and unwanted content should be removed from the DataFrame.
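A minimal sketch of that conversion and cleaning step follows; the file paths, column names, and cleaning regexes are illustrative assumptions rather than the project's actual code.

```python
import json
import re

import pandas as pd

with open("data/tweets.json") as f:
    raw = json.load(f)

# Flatten the nested JSON into tabular columns and save a CSV copy.
df = pd.json_normalize(raw)
df.to_csv("data/tweets.csv", index=False)

# Typecast to sensible dtypes.
df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")

def clean_text(text: str) -> str:
    """Strip URLs, then remaining special characters and emojis."""
    text = re.sub(r"http\S+", "", text)
    text = re.sub(r"[^A-Za-z0-9#@\s]", "", text)
    return text.strip()

df["full_text"] = df["full_text"].astype(str).apply(clean_text)
```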
Missing and None values should also be handled. In some cases we can fill a missing value with a reasonable default, but where filling makes no sense, the affected rows should be dropped.
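For example, continuing the sketch above (column names still assumed):

```python
# Fill where a sensible default exists; otherwise drop.
df["place"] = df["place"].fillna("unknown")        # a default is reasonable here
df = df.dropna(subset=["full_text"])               # a tweet without text is useless
df = df.dropna(axis=1, thresh=int(0.5 * len(df)))  # drop mostly-empty columns
```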
handled by extract_dataframe.py, clean_tweets_dataframe.py, and preprocess_tweets_data.py
After a clean DataFrame is generated, we should carry out exploratory data analysis (EDA) to gain insight from the data; this insight will help us achieve our objective. In this step we explore the statistical relationships between attributes. For example, we can extract the most common user mentions, the most frequent hashtags, the number of positive and negative sentiments, and so on.
handled by JupyterNotebook/EDA.ipynb
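A few of those EDA queries might look like the sketch below, assuming the cleaned CSV from earlier and a precomputed polarity column; none of these names are confirmed by the notebook.

```python
import pandas as pd

df = pd.read_csv("data/tweets.csv")

# Most common hashtags and user mentions.
hashtags = df["full_text"].str.findall(r"#\w+").explode()
mentions = df["full_text"].str.findall(r"@\w+").explode()
print(hashtags.value_counts().head(10))
print(mentions.value_counts().head(10))

# Number of positive vs. negative tweets, assuming a polarity score column.
sentiment = df["polarity"].apply(lambda p: "positive" if p > 0 else "negative")
print(sentiment.value_counts())
```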
The next step will be modeling: developing a system that can solve the challenge we are facing. Our objective is to perform sentiment analysis and topic modeling. For the first task, we develop a classification algorithm; for the second, we use the unsupervised LDA (Latent Dirichlet Allocation) model.
handled by SentimentalAnalysis.ipynb and TopicModeling.ipynb in the JupyterNotebooks
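As a rough sketch of both tasks, the snippet below labels sentiment from TextBlob polarity scores and fits a gensim LDA model; neither library choice nor the preprocessing is confirmed by the notebooks.

```python
import gensim
from gensim import corpora
from textblob import TextBlob

texts = ["the product is great", "terrible support, never buying again"]

# Sentiment: derive a label from each tweet's polarity score.
labels = [
    "positive" if TextBlob(t).sentiment.polarity > 0 else "negative"
    for t in texts
]

# Topic modeling: feed a bag-of-words corpus to an unsupervised LDA model.
tokens = [t.split() for t in texts]
dictionary = corpora.Dictionary(tokens)
corpus = [dictionary.doc2bow(tok) for tok in tokens]
lda = gensim.models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=5)
print(lda.print_topics())
```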
SQLAlchemy is used with pandas to provide a higher-level interface to the database. All database-related functionality is handled by the mysql_manager.py file.
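The pandas/SQLAlchemy pairing typically looks like the sketch below; the connection string and table name are assumptions (the project's actual settings live in mysql_manager.py).

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string; real credentials belong in mysql_manager.py.
engine = create_engine("mysql+pymysql://user:password@localhost:3306/tweets_db")

df = pd.read_csv("data/tweets.csv")
df.to_sql("tweets", con=engine, if_exists="replace", index=False)  # write
sample = pd.read_sql("SELECT * FROM tweets LIMIT 5", con=engine)   # read back
```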
For this part I used Streamlit to show the different findings I got from the EDA notebook. In addition, there are word clouds generated from hashtags, user mentions, and tweet texts.
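A minimal version of that dashboard, assuming the streamlit and wordcloud packages (run with `streamlit run dashboard.py`), could look like:

```python
import pandas as pd
import streamlit as st
from wordcloud import WordCloud

df = pd.read_csv("data/tweets.csv")

st.title("Twitter Data Analysis")

# Build one word cloud from the tweet texts; hashtags and mentions
# would follow the same pattern on their own columns.
text = " ".join(df["full_text"].astype(str))
wc = WordCloud(width=800, height=400, background_color="white").generate(text)
st.image(wc.to_array())
```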
The most important takeaway from the week0 challenge is the MLOps pipeline. MLOps can help automate the steps from data engineering to the model deployment phase. First, the data and the features generated by the data engineering phase are stored in a SQL/NoSQL database. We should also register the model parameters and performance. During deployment, we fetch these values from the database and use them. This also helps with versioning the data and the model: if there is data drift, model performance decay, or a requirement change, we can raise an alert, retrain the model, and run the process from the start again.
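An illustrative registration step, reusing the SQLAlchemy setup above, might look like this; the table schema and metric values are placeholders, not the project's actual registry.

```python
import datetime

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost:3306/tweets_db")

# Record the model version, parameters, and performance so deployment can
# fetch them later and drift alerts can compare against them.
record = pd.DataFrame([{
    "model": "sentiment_classifier",  # hypothetical name
    "version": "v0.1",
    "params": '{"num_topics": 2, "passes": 5}',
    "accuracy": 0.87,                 # placeholder metric
    "trained_at": datetime.datetime.utcnow().isoformat(),
}])
record.to_sql("model_registry", con=engine, if_exists="append", index=False)
```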
- More test coverage
- Add logging
- More and better exception handling
- More data analysis and modeling
- Add Model/Data drift detection
- Integrate the model into the dashboard