This project was created to test basic text sentiment analysis and categorization. I scraped 1000 tweets and analyzed them, all in Python.
I used the Tweepy library to scrape tweets matching a keyword.
I used the TextBlob library, which provides a sentiment method, for sentiment analysis.
I also used TextBlob to train 5 classifiers and applied them jointly as a one-vs-all multi-class categorizer.
tweet_sentimental.py
Scraping and Sentiment Steps:
- Set up Twitter keys and tokens from the Twitter Dev Console.
- Set your keyword, target date (where scraping starts), number of tweets, and retweet flag.
- Set your output CSV file.
- Run tweet_sentimental.py, which calls the Twitter API and builds a list of tweets with their sentiment; see get_tweets().
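The flow above can be sketched as follows. This is a minimal sketch, not the project's actual code: `clean_tweet`, the CSV layout, and the function signatures are assumptions, and the Tweepy search call depends on the library version (`api.search` in Tweepy 3.x, `api.search_tweets` in 4.x), so the third-party calls are kept inside `get_tweets`.

```python
import csv
import re

def clean_tweet(tweet):
    """Strip @mentions and links before sentiment analysis (assumed helper)."""
    return " ".join(re.sub(r"(@[A-Za-z0-9_]+)|(\w+://\S+)", " ", tweet).split())

def get_tweets(api, keyword, since, count, include_retweets=False):
    """Fetch tweets via Tweepy and pair each with a sentiment label.

    `api` is an authenticated tweepy.API object built from the keys and
    tokens set up in the Twitter Dev Console.
    """
    from textblob import TextBlob  # third-party: pip install textblob

    query = keyword if include_retweets else keyword + " -filter:retweets"
    results = []
    for status in api.search(q=query, since=since, count=count):
        text = clean_tweet(status.text)
        # TextBlob's sentiment.polarity is a float in [-1.0, 1.0]
        polarity = TextBlob(text).sentiment.polarity
        label = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
        results.append((text, label))
    return results

def save_csv(rows, path):
    """Write (tweet, sentiment) pairs to the output CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["tweet", "sentiment"])
        writer.writerows(rows)
```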
tweet_categorize.py
Training the categorization model for multi-class classification requires a dataset of texts and labels.
Dataset Acquisition Steps:
- Use the Twitter scraper to collect tweets tagged with the sport, tech, government & politics, education, and business hashtags.
- Use TextBlob to train a one-vs-all classifier for each category.
- Save the classifiers with the pickle library.
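The one-vs-all setup can be sketched like this: the multi-class dataset is split into one binary (pos/neg) dataset per category, each of which would then be fed to a TextBlob `NaiveBayesClassifier` and pickled. The category names, file paths, and helper names here are assumptions; the TextBlob call is shown as a comment so the sketch stays self-contained.

```python
import pickle

CATEGORIES = ["sport", "tech", "politics", "education", "business"]

def one_vs_all_datasets(labeled_tweets):
    """Split a multi-class dataset into one binary dataset per category.

    `labeled_tweets` is a list of (text, category) pairs; each output
    dataset labels tweets of the target category "pos" and all others "neg".
    """
    return {
        target: [
            (text, "pos" if label == target else "neg")
            for text, label in labeled_tweets
        ]
        for target in CATEGORIES
    }

def train_and_save(labeled_tweets, path_template="model_{}.pickle"):
    """Train one binary classifier per category and pickle each one."""
    for target, data in one_vs_all_datasets(labeled_tweets).items():
        # With TextBlob this step would be:
        #   from textblob.classifiers import NaiveBayesClassifier
        #   clf = NaiveBayesClassifier(data)
        clf = data  # placeholder so this sketch runs without TextBlob
        with open(path_template.format(target), "wb") as f:
            pickle.dump(clf, f)
```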
Categorization Steps:
- Load the models.
- Load the dataset and run every classifier on each tweet.
- Normalize the results and select the category with the highest probability.
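The normalize-and-select step can be sketched as a small pure function. It assumes each binary classifier reports a positive-class probability for the tweet; the function name and the shape of `scores` are assumptions for illustration.

```python
def categorize(scores):
    """Pick the winning category from per-classifier scores.

    `scores` maps category -> that binary classifier's positive probability
    for one tweet. The scores are normalized to sum to 1 and the category
    with the highest normalized probability wins.
    """
    total = sum(scores.values())
    normalized = {cat: p / total for cat, p in scores.items()}
    return max(normalized, key=normalized.get), normalized
```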
I scraped 1000 tweets containing #bbcnews posted since 01-06-2019, then categorized them and analyzed their sentiment. As a result, 40% of the tweets were about politics, with three times as many positive as negative. Another 25% were about business, slightly more negative than positive. The rest were education (8%), tech (1%), and no sport.