# What's trending on Twitter now?

This project finds the trending topics on Twitter in real time, based on the frequencies of the hashtags used. The Twitter Streaming API supplies the data, the Python library 'tweepy' is used to read the stream, and the collected tweets are then processed in Python to determine the trending topics.

In [1]:
# Importing libraries
import numpy as np
import pandas as pd
import time
import tweepy
import json
from tweepy import Stream
from tweepy.streaming import StreamListener

To access Twitter streaming data, a Twitter account needs to be created and the following four keys need to be generated.

In [2]:
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

The streaming data can be extracted by extending the 'StreamListener' class. By default, Twitter's streaming data is continuous and does not stop. To restrict the number of tweets collected, a time limit (in seconds) is set in the constructor of the extended class. As long as the time limit is not exceeded, the 'on_data' method writes the tweets collected in real time to a file. Once the limit is exceeded, 'on_data' returns False, which stops the stream.

In [3]:
class MyListener(StreamListener):
    def __init__(self, time_limit=60):
        super(MyListener, self).__init__()
        self.start_time = time.time()
        self.limit = time_limit
        # Keep a single file handle open for the lifetime of the listener
        self.saveFile = open('python.json', 'a')

    def on_data(self, data):
        try:
            if (time.time() - self.start_time) < self.limit:
                # Within the time limit: append the raw JSON and keep streaming
                self.saveFile.write(data)
                return True
            # Time limit reached: close the file and stop the stream
            self.saveFile.close()
            return False
        except Exception as e:
            print("Error on_data: %s" % str(e))
            return True

    def on_error(self, status):
        # Print the HTTP status code and keep the stream alive
        print(status)
        return True
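
The time-limit logic can be tried without a live Twitter connection. The following is a minimal sketch, not part of the original notebook: 'TimeLimitedWriter' and its 'clock' parameter are hypothetical names, and the clock is simulated so that the behaviour is repeatable.

```python
import io
import time


class TimeLimitedWriter:
    """Hypothetical helper: accepts records until `time_limit` seconds have elapsed."""

    def __init__(self, sink, time_limit=60, clock=time.time):
        self.sink = sink            # any object with a write() method
        self.limit = time_limit
        self.clock = clock          # injectable so the logic can be tested offline
        self.start_time = clock()

    def on_data(self, data):
        # Return True to keep receiving, False once the limit is reached.
        if self.clock() - self.start_time < self.limit:
            self.sink.write(data)
            return True
        return False


# Simulated clock (an assumption for this sketch): advances 25 seconds per call.
ticks = iter([0, 25, 50, 75])
buffer = io.StringIO()
writer = TimeLimitedWriter(buffer, time_limit=60, clock=lambda: next(ticks))
results = [writer.on_data("tweet %d\n" % n) for n in range(3)]
```

With a 60-second limit, the first two records are written and the third is rejected, mirroring how 'on_data' signals the stream to stop.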

Tweepy provides methods to get access to streaming data. The keys generated above are used to authenticate. As mentioned above, Twitter's streaming data does not stop by default, so reading it in the main thread would block any other processing. To avoid this, the stream is run asynchronously by setting the 'is_async' parameter of the 'filter' method to True. Older Tweepy releases called this parameter 'async', which became a reserved word in Python 3.7. Further, by using the parameter 'track', only those tweets containing the desired terms are extracted.

In [9]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

twitter_stream = Stream(auth, MyListener())
# On older Tweepy releases (and Python < 3.7) the parameter is named 'async'
#twitter_stream.filter(is_async=True)
twitter_stream.filter(track=['a'], is_async=True)

## Code to print the tweets on one's timeline.   
#api = tweepy.API(auth)
#public_tweets = api.home_timeline()
#for tweet in public_tweets:
#    print(tweet.text)

406
406


The collected tweets were saved to a file, one JSON object per line. The following piece of code reads the tweets back from the file and loads them into a Python list.

In [38]:
tweets = []
with open('python.json', 'r') as f:
    for line in f:
        tweets.append(json.loads(line))        

1
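
A file written from a live stream may contain blank lines or a truncated final record, in which case 'json.loads' raises an error. The following is a minimal sketch of a more tolerant loader. The function name 'load_tweets' and the sample data are assumptions made for illustration.

```python
import io
import json


def load_tweets(lines):
    """Parse one JSON object per line, skipping blank or malformed lines."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line between records
        try:
            tweets.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # partial record, e.g. from an interrupted write
    return tweets


# Illustrative sample: a tweet, a blank line, a non-tweet message, a cut-off record.
sample = io.StringIO(
    '{"text": "hello #world"}\n'
    '\n'
    '{"limit": {"track": 3}}\n'
    '{"text": "cut off'
)
loaded = load_tweets(sample)
```

With a real file, the same function can be called as `load_tweets(open('python.json'))`.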

Each collected tweet has a number of attributes besides the actual text. To extract the hashtags, the text of each tweet is split into words and each word is checked to see whether it begins with the character '#'. The extracted hashtags are then stored in a Python list.

In [44]:
hashtag = []
for tweet in tweets:
    ## It looks like attribute 'text' is missing in some tweets
    ## Not using such tweets
    if 'text' not in tweet:
        continue

    for word in tweet['text'].split(" "):
        if word.startswith('#'):
            hashtag.append(word[1:])
#hashtag
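
Tweet objects from the Twitter API also carry an 'entities' attribute whose 'hashtags' list holds the parsed hashtags, which avoids picking up trailing punctuation or truncated text. The following sketch prefers that attribute and falls back to scanning the text. The function name 'extract_hashtags' and the sample tweets are assumptions made for illustration, not part of the original notebook.

```python
def extract_hashtags(tweet):
    """Return hashtag strings from a tweet dict, preferring the 'entities' field."""
    entities = tweet.get("entities") or {}
    tags = [h["text"] for h in entities.get("hashtags", []) if "text" in h]
    if tags:
        return tags
    # Fall back to scanning the text for words that start with '#'.
    words = tweet.get("text", "").split()
    return [w[1:] for w in words if w.startswith("#") and len(w) > 1]


# Illustrative records only; real tweets contain many more attributes.
sample_tweets = [
    {"text": "Go #Win now", "entities": {"hashtags": [{"text": "Win"}]}},
    {"text": "plain #FelizLunes tweet"},
    {"limit": {"track": 3}},  # no 'text' attribute at all
]
found = [tag for t in sample_tweets for tag in extract_hashtags(t)]
```

Records without 'text' or 'entities' simply contribute no hashtags, so no separate check is needed.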

A Python dictionary is then created to store the number of occurrences of each hashtag.

In [56]:
hashtag_dict = {}
for i in hashtag:
    if i in hashtag_dict:
        hashtag_dict[i] += 1
    else:
        hashtag_dict[i] = 1
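
The same counts can be obtained with 'collections.Counter' from the standard library. This is an alternative sketch rather than the method used in this notebook, and the list 'sample_hashtags' is made up for illustration.

```python
from collections import Counter

# Illustrative input; in the notebook this would be the 'hashtag' list.
sample_hashtags = ["Win", "TreCru", "Win", "FelizLunes", "TreCru", "TreCru"]

counts = Counter(sample_hashtags)   # maps each hashtag to its frequency
top_two = counts.most_common(2)     # highest counts first
```

'Counter' is a dictionary subclass, so it can be used wherever 'hashtag_dict' is used.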

The top 'N' hashtags can then be computed. This can be used as a measure to study the trending topics on Twitter based on the tweets collected over a period of 'T' seconds in real time.

In [57]:
import operator

hashtags_sorted = sorted(hashtag_dict.items(), key=operator.itemgetter(1), reverse=True)
hashtags_sorted[:10]

[('TreCru', 35),
 ('인피니트', 9),
 ('태풍', 9),
 ('DialogoEsHambruna', 5),
 ('Win', 5),
 ('4DaysToLion…', 5),
 ('FelizLunes', 5),
 ('WIN', 5),
 ('Comp', 3),
 ('NationalBoyfriendDay', 3)]

As of this writing, the game 'Treasure Cruise' seems to be trending, with a total of 35 occurrences of the hashtag 'TreCru' over a period of 60 seconds.
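
The output above lists 'Win' and 'WIN' as separate hashtags because the counting is case-sensitive. If hashtags that differ only in case should be treated as one topic, the counts can be merged after case-folding. The sketch below is one possible refinement under that assumption, using a subset of the counts shown above.

```python
from collections import Counter

# Counts copied from the output above (subset).
observed = [("TreCru", 35), ("인피니트", 9), ("태풍", 9),
            ("Win", 5), ("WIN", 5), ("FelizLunes", 5)]

merged = Counter()
for tag, count in observed:
    # casefold() normalises case so 'Win' and 'WIN' share one key
    merged[tag.casefold()] += count

top_two = merged.most_common(2)
```

Under this assumption, 'win' would move to second place with 10 occurrences.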