Twitter-Data-Analysis

Given filter words, fetches tweets and shows polarity, sentiment scores, topic modeling, and a word map based on the fetched tweets.

Twitter is a social media platform on which enormous amounts of data are generated. Using Twitter data, it is possible to run various analyses on a particular product or entity. In this report, we will walk step by step through a Twitter data analysis.

Data Acquisition

The first step is to understand your problem, what data it requires, and where you can get it. In this case, we extract Twitter data using the API keys that Twitter provides upon request. The data is pulled in JSON format, which is not easy for humans to read.
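As a minimal sketch of the fetch step (assuming Tweepy v4 and the recent-search endpoint; the bearer token and filter words below are placeholders, not the repository's actual values):

```python
def build_query(filter_words):
    """Combine filter words into one Twitter search query, excluding retweets."""
    return " OR ".join(filter_words) + " -is:retweet"

# Fetching the tweets (requires a valid Twitter API bearer token):
#   import tweepy
#   client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")  # placeholder token
#   response = client.search_recent_tweets(
#       query=build_query(["economy", "inflation"]), max_results=100
#   )
#   tweets = [t.data for t in response.data]  # raw JSON-like dicts
```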

Data Preparation

We should first convert the JSON file to a DataFrame using the pandas Python library, and then save it as a CSV file for further use. Typecasting should be performed to get suitable and sensible data types. Special characters, emojis, and other unwanted content should be removed from the DataFrame.

Missing and None values should also be handled. In some cases we can fill missing values with a reasonable default; where no sensible fill value exists, the rows should be dropped.

handled by extract_dataframe.py, clean_tweets_dataframe.py, and preprocess_tweets_data.py
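A condensed sketch of this preparation (the sample record, field names, and output file name are illustrative, not the repository's actual schema):

```python
import json
import re

import pandas as pd

def clean_text(text: str) -> str:
    """Remove URLs, emojis/non-ASCII characters, and extra whitespace."""
    text = re.sub(r"https?://\S+", "", text)
    text = text.encode("ascii", "ignore").decode()  # drops emojis
    return re.sub(r"\s+", " ", text).strip()

# illustrative raw JSON lines as pulled from the API
raw_lines = ['{"created_at": "2022-01-01", "text": "Great product! \\ud83d\\ude00 https://t.co/x"}']
df = pd.DataFrame(json.loads(line) for line in raw_lines)
df["created_at"] = pd.to_datetime(df["created_at"])  # typecasting
df["text"] = df["text"].map(clean_text)
df = df.dropna(subset=["text"])  # drop rows we cannot sensibly fill
df.to_csv("processed_tweet_data.csv", index=False)  # hypothetical file name
```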

Exploratory Data Analysis (EDA)

After a clean DataFrame is generated, we carry out EDA to gain insight from the data; this insight helps us achieve our objective. In this section we explore statistical relationships between attributes. For example, we can extract the most common user mentions, the most frequent hashtags, the number of positive and negative tweets, etc.

handled by JupyterNotebook/EDA.ipynb
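For instance, the most frequent hashtags can be pulled straight from the cleaned DataFrame (the toy data here is illustrative):

```python
import pandas as pd

# toy data standing in for the cleaned tweets DataFrame
df = pd.DataFrame({"hashtags": [["ai"], ["ai", "nlp"], ["nlp"], ["ai"]]})
top_hashtags = df["hashtags"].explode().value_counts()
print(top_hashtags.head())  # "ai" appears 3 times, "nlp" twice
```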


Modeling

The next step is modeling: developing a system that can solve the challenge we face. Our objective is to perform sentiment analysis and topic modeling. For the first task we develop a classification model; for the second we use the unsupervised LDA model.

handled by SentimentalAnalysis.ipynb and TopicModeling.ipynb in the JupyterNotebooks folder
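A minimal LDA sketch (using scikit-learn's LatentDirichletAllocation as one implementation option, on a toy corpus; the actual notebooks run on the cleaned tweet text):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# toy corpus with two obvious themes (finance vs. football)
docs = [
    "stock market trading price",
    "football match goal team",
    "market price invest stock",
    "team goal league football",
]
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # one topic distribution per document
```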

MySQL Integration

SQLAlchemy is used with pandas to provide a higher-level interface to the database. All database-related functionality is handled by the mysql_manager.py file.
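A sketch of the pandas/SQLAlchemy round trip (using SQLite here so it runs anywhere; swap the URL for a MySQL one such as `mysql+pymysql://user:pass@host/db`, where the credentials are placeholders, and note the table and column names are illustrative):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")  # stand-in for the MySQL engine
scores = pd.DataFrame({"tweet_id": [1, 2], "polarity": [0.5, -0.2]})
scores.to_sql("tweets", engine, if_exists="replace", index=False)
fetched = pd.read_sql("SELECT * FROM tweets", engine)
```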

Dashboard

For this part I used Streamlit to show the different findings from the EDA notebook. In addition, there are word clouds generated from hashtags, user mentions, and tweet texts.
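The word-cloud input reduces to plain token frequencies; the Streamlit/wordcloud calls are left as comments because they need a running app, and the function, column, and file names here are illustrative:

```python
import re
from collections import Counter

def word_frequencies(texts):
    """Token counts that drive the word cloud."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z#@']+", text.lower()))
    return counts

# In the Streamlit app (run with `streamlit run dashboard.py`):
#   import streamlit as st
#   from wordcloud import WordCloud
#   st.title("Twitter Data Analysis")
#   wc = WordCloud(width=800, height=400).generate_from_frequencies(
#       word_frequencies(df["cleaned_text"]))
#   st.image(wc.to_array())
```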


MLOps pipeline

The most important takeaway from the week 0 challenge is the MLOps pipeline. MLOps helps automate the steps from data engineering through to model deployment. First, the features generated in the data engineering phase are stored in an SQL/NoSQL database. We should also register the model parameters and performance metrics. During deployment, we fetch these values from the database and use them. This also helps with versioning the data and the model. If there is data drift, model performance decay, or a requirement change, we can raise an alert, retrain the model, and run the process again from the start.

Future Works

  • More test coverage
  • Add logging
  • More and better exception handling
  • More data analysis and modeling
  • Add model/data drift detection
  • Integrate the model into the dashboard
