### LSE Data Analytics Online Career Accelerator

# DA201: Data Analytics using Python

## Practical activity: Search the Twitter API

**Scenario**

The story of Bitcoin and other cryptocurrencies has captured investors like few financial stories have. Many finance firms are looking to invest in the crypto market. You are a data analyst at a financial institution, and your manager has tasked you with investigating Bitcoin in a little more detail, particularly its future growth and its use in the United States.

Earlier, you accessed Bitcoin data through the CoinGecko API. Now, your manager asks you to turn your attention to Twitter, particularly tweets on Bitcoin and cryptocurrency in general. In particular, she wants you to check whether Bitcoin is trending in `New York, Los Angeles, Sydney, Auckland, and Dubai`.

She also wants to see a DataFrame of topics with over `200,000` tweets for each city. 

Your manager then wants you to cross-check trending topics between the `United States and the UK`, to see what people are talking about in both countries and whether Bitcoin forms part of the larger conversation. If Bitcoin is not a shared trending topic, she asks that you search Twitter for `#Bitcoin` and two other cryptocurrency hashtags of your choice, and analyse the top two tweets returned for each hashtag, particularly their popularity.

## 1. Prepare your workstation

In [2]:
# Copy the YAML file and your Twitter keys over before you start to work.
import yaml
from twitter import Twitter, OAuth

# Import the YAML file: remember to specify the whole path.
with open('twitter.yaml', 'r') as f:
    twitter_creds = yaml.safe_load(f)

# Pass your Twitter credentials (token, token secret, consumer key, consumer secret).
twitter_api = Twitter(auth=OAuth(twitter_creds['access_token'],
                                 twitter_creds['access_token_secret'],
                                 twitter_creds['api_key'],
                                 twitter_creds['api_secret_key']))

In [3]:
# Confirm the Twitter object was created (this does not yet test the credentials).
print(twitter_api)

<twitter.api.Twitter object at 0x000001E39146DF70>


In [4]:
# Run a test with #python.
python_tweets = twitter_api.search.tweets(q='#python')

# View the output.
print(python_tweets)

{'statuses': [{'created_at': 'Thu Feb 09 11:27:59 +0000 2023', 'id': 1623644878461628416, 'id_str': '1623644878461628416', 'text': 'RT @xen0f0n: https://t.co/5iZgScXLqr\n\n#earthobservation #remotesensing #machinelearning #deeplearning #python @robmarkcole @planet @samapri…', 'truncated': False, 'entities': {'hashtags': [{'text': 'earthobservation', 'indices': [38, 55]}, {'text': 'remotesensing', 'indices': [56, 70]}, {'text': 'machinelearning', 'indices': [71, 87]}, {'text': 'deeplearning', 'indices': [88, 101]}, {'text': 'python', 'indices': [102, 109]}], 'symbols': [], 'user_mentions': [{'screen_name': 'xen0f0n', 'name': 'Akis Karagiannis', 'id': 1032587510, 'id_str': '1032587510', 'indices': [3, 11]}, {'screen_name': 'robmarkcole', 'name': 'Robin Cole', 'id': 30924981, 'id_str': '30924981', 'indices': [110, 122]}, {'screen_name': 'planet', 'name': 'Planet', 'id': 17663776, 'id_str': '17663776', 'indices': [123, 130]}], 'urls': [{'url': 'https://t.co/5iZgScXLqr', 'expanded_url': 'ht
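The raw response above is hard to read. Below is a minimal sketch that pulls just the tweet text and hashtags out of each status. `summarise_statuses` is a hypothetical helper, and `sample_response` is a trimmed copy of the first status printed above (a real response has many more keys per status).

```python
def summarise_statuses(response):
    """Return (tweet text, [hashtags]) for each status in a search response."""
    summary = []
    for status in response.get('statuses', []):
        # Each hashtag entity is a dict such as {'text': 'python', 'indices': [102, 109]}.
        hashtags = [tag['text'] for tag in status['entities']['hashtags']]
        summary.append((status['text'], hashtags))
    return summary


# Trimmed copy of the first status in the output above.
sample_response = {
    'statuses': [
        {
            'text': 'RT @xen0f0n: https://t.co/5iZgScXLqr\n\n#earthobservation #remotesensing '
                    '#machinelearning #deeplearning #python @robmarkcole @planet @samapri…',
            'entities': {'hashtags': [{'text': 'earthobservation', 'indices': [38, 55]},
                                      {'text': 'remotesensing', 'indices': [56, 70]},
                                      {'text': 'machinelearning', 'indices': [71, 87]},
                                      {'text': 'deeplearning', 'indices': [88, 101]},
                                      {'text': 'python', 'indices': [102, 109]}]},
        }
    ]
}

for text, hashtags in summarise_statuses(sample_response):
    print(hashtags)
    # ['earthobservation', 'remotesensing', 'machinelearning', 'deeplearning', 'python']
```

With a live connection, `summarise_statuses(python_tweets)` would apply the same extraction to every status returned.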

## 2. Identify worldwide trends

In [5]:
# List the locations for which Twitter provides trend data.
trends_worldwide = twitter_api.trends.available()

# Count the available locations.
print(len(trends_worldwide))

# View the first location (Worldwide).
trends_worldwide[0]

467


{'name': 'Worldwide',
 'placeType': {'code': 19, 'name': 'Supername'},
 'url': 'http://where.yahooapis.com/v1/place/1',
 'parentid': 0,
 'country': '',
 'woeid': 1,
 'countryCode': None}
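The brief names five cities, so it can help to look up all their WOEIDs in one pass rather than one city at a time. Below is a minimal sketch, assuming `locations` has the same shape as the list returned by `twitter_api.trends.available()`; `woeids_for` is a hypothetical helper, and the three sample entries are trimmed copies of outputs shown in this notebook. Each name maps to a list because, in principle, more than one place could share a name.

```python
def woeids_for(locations, names):
    """Map each requested place name to every WOEID with that exact name."""
    lookup = {}
    for place in locations:
        lookup.setdefault(place['name'], []).append(place['woeid'])
    # An empty list means the name was not found among the locations.
    return {name: lookup.get(name, []) for name in names}


# Trimmed entries copied from outputs in this notebook.
sample_locations = [
    {'name': 'Worldwide', 'woeid': 1},
    {'name': 'London', 'woeid': 44418},
    {'name': 'New York', 'woeid': 2459115},
]

cities = ['New York', 'Los Angeles', 'Sydney', 'Auckland', 'Dubai']
print(woeids_for(sample_locations, cities))
# {'New York': [2459115], 'Los Angeles': [], 'Sydney': [], 'Auckland': [], 'Dubai': []}
```

With the full list, `woeids_for(trends_worldwide, cities)` would return the real WOEIDs for the cities Twitter tracks.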

## London

In [6]:
# Identify a specific city (London).
our_city = 'London'

# Keep only the locations whose name matches our city.
list_of_names_our_city = [place for place in trends_worldwide if place['name'] == our_city]

# Check how many matches were found.
print(len(list_of_names_our_city))

# Use the index to view London's details.
list_of_names_our_city[0]

1


{'name': 'London',
 'placeType': {'code': 7, 'name': 'Town'},
 'url': 'http://where.yahooapis.com/v1/place/44418',
 'parentid': 23424975,
 'country': 'United Kingdom',
 'woeid': 44418,
 'countryCode': 'GB'}

In [7]:
# Extract London's 'Where On Earth IDentifier' (WOEID).
list_of_names_our_city[0]['woeid']

44418

In [8]:
# Request the current trends for London's WOEID.
london_trends = twitter_api.trends.place(_id=list_of_names_our_city[0]['woeid'])

# View the output.
london_trends

[{'trends': [{'name': '#NAW2023',
    'url': 'http://twitter.com/search?q=%23NAW2023',
    'promoted_content': None,
    'query': '%23NAW2023',
    'tweet_volume': None},
   {'name': '#ThisMorning',
    'url': 'http://twitter.com/search?q=%23ThisMorning',
    'promoted_content': None,
    'query': '%23ThisMorning',
    'tweet_volume': None},
   {'name': 'Durant',
    'url': 'http://twitter.com/search?q=Durant',
    'promoted_content': None,
    'query': 'Durant',
    'tweet_volume': 144569},
   {'name': 'Elon',
    'url': 'http://twitter.com/search?q=Elon',
    'promoted_content': None,
    'query': 'Elon',
    'tweet_volume': 351642},
   {'name': '#skillsforlife',
    'url': 'http://twitter.com/search?q=%23skillsforlife',
    'promoted_content': None,
    'query': '%23skillsforlife',
    'tweet_volume': None},
   {'name': '#apprenticeships',
    'url': 'http://twitter.com/search?q=%23apprenticeships',
    'promoted_content': None,
    'query': '%23apprenticeships',
    'tweet_volume':

In [9]:
import pandas as pd

# Create a DataFrame from the list of trends.
london_trends_pd = pd.DataFrame(london_trends[0]['trends'])

london_trends_pd

| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 0 | #NAW2023 | http://twitter.com/search?q=%23NAW2023 | | %23NAW2023 | |
| 1 | #ThisMorning | http://twitter.com/search?q=%23ThisMorning | | %23ThisMorning | |
| 2 | Durant | http://twitter.com/search?q=Durant | | Durant | 144569.0 |
| 3 | Elon | http://twitter.com/search?q=Elon | | Elon | 351642.0 |
| 4 | #skillsforlife | http://twitter.com/search?q=%23skillsforlife | | %23skillsforlife | |
| 5 | #apprenticeships | http://twitter.com/search?q=%23apprenticeships | | %23apprenticeships | |
| 6 | Russ | http://twitter.com/search?q=Russ | | Russ | 66798.0 |
| 7 | Churchill | http://twitter.com/search?q=Churchill | | Churchill | 11997.0 |
| 8 | Lee Anderson | http://twitter.com/search?q=%22Lee+Anderson%22 | | %22Lee+Anderson%22 | 38344.0 |
| 9 | European Super League | http://twitter.com/search?q=%22European+Super+... | | %22European+Super+League%22 | |


In [12]:
# Keep trends with more than 100,000 tweets (a 500,000 threshold returned no rows).
london_trends_500k_pd = (london_trends_pd[london_trends_pd['tweet_volume'] > 100000]
                         .sort_values('tweet_volume', ascending=False))

print(london_trends_500k_pd.shape)
london_trends_500k_pd

(7, 5)


| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 3 | Elon | http://twitter.com/search?q=Elon | | Elon | 351642.0 |
| 30 | Twitter Blue | http://twitter.com/search?q=%22Twitter+Blue%22 | | %22Twitter+Blue%22 | 233107.0 |
| 34 | Musk | http://twitter.com/search?q=Musk | | Musk | 225168.0 |
| 13 | Suns | http://twitter.com/search?q=Suns | | Suns | 184998.0 |
| 11 | Putin | http://twitter.com/search?q=Putin | | Putin | 158714.0 |
| 2 | Durant | http://twitter.com/search?q=Durant | | Durant | 144569.0 |
| 20 | Nets | http://twitter.com/search?q=Nets | | Nets | 128076.0 |
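The brief asks for topics with over 200,000 tweets, while the cell above uses 100,000. Below is a minimal sketch of the 200,000 cut, applied to a few `(name, tweet_volume)` pairs copied from the London tables above. `trends_over` is a hypothetical helper. Trends whose volume Twitter does not report (`None`) are skipped.

```python
THRESHOLD = 200_000  # the brief's threshold


def trends_over(trends, threshold=THRESHOLD):
    """Return trend names with tweet_volume above threshold, largest first."""
    # Twitter reports no volume (None) for some trends; leave those out.
    known = [t for t in trends if t['tweet_volume'] is not None]
    busy = [t for t in known if t['tweet_volume'] > threshold]
    busy.sort(key=lambda t: t['tweet_volume'], reverse=True)
    return [t['name'] for t in busy]


# Pairs copied from the London tables above.
london_sample = [
    {'name': 'Elon', 'tweet_volume': 351642},
    {'name': 'Twitter Blue', 'tweet_volume': 233107},
    {'name': 'Musk', 'tweet_volume': 225168},
    {'name': 'Suns', 'tweet_volume': 184998},
    {'name': 'Durant', 'tweet_volume': 144569},
    {'name': '#NAW2023', 'tweet_volume': None},
]

print(trends_over(london_sample))  # ['Elon', 'Twitter Blue', 'Musk']
```

In the DataFrame version, changing `100000` to `200000` in the filter has the same effect: missing volumes are `NaN` there, and `NaN > 200000` evaluates to `False`, so those rows drop out.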


In [14]:
# Save to csv
london_trends_500k_pd.to_csv('london_trends_500k.csv', index=False)

## New York

In [15]:
# Find New York.
our_city = 'New York'

# Keep only the locations whose name matches our city.
new_york = [place for place in trends_worldwide if place['name'] == our_city]

# View New York's WOEID.
new_york[0]['woeid']

2459115

In [16]:
# Import JSON.
import json

# Request the current trends for New York's WOEID.
ny_trends = twitter_api.trends.place(_id=new_york[0]['woeid'])

# View the JSON output.
print(json.dumps(ny_trends, indent=4))

[
    {
        "trends": [
            {
                "name": "Russ",
                "url": "http://twitter.com/search?q=Russ",
                "promoted_content": null,
                "query": "Russ",
                "tweet_volume": 66832
            },
            {
                "name": "Suns",
                "url": "http://twitter.com/search?q=Suns",
                "promoted_content": null,
                "query": "Suns",
                "tweet_volume": 185628
            },
            {
                "name": "Nets",
                "url": "http://twitter.com/search?q=Nets",
                "promoted_content": null,
                "query": "Nets",
                "tweet_volume": 128486
            },
            {
                "name": "Ben Simmons",
                "url": "http://twitter.com/search?q=%22Ben+Simmons%22",
                "promoted_content": null,
                "query": "%22Ben+Simmons%22",
                "tweet_volume": 22139
            },
     

In [17]:
# Create a DataFrame from the list of trends.
ny_trends_pd = pd.DataFrame(ny_trends[0]['trends'])

ny_trends_pd

| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 0 | Russ | http://twitter.com/search?q=Russ | | Russ | 66832.0 |
| 1 | Suns | http://twitter.com/search?q=Suns | | Suns | 185628.0 |
| 2 | Nets | http://twitter.com/search?q=Nets | | Nets | 128486.0 |
| 3 | Ben Simmons | http://twitter.com/search?q=%22Ben+Simmons%22 | | %22Ben+Simmons%22 | 22139.0 |
| 4 | Ayton | http://twitter.com/search?q=Ayton | | Ayton | 25863.0 |
| 5 | Mikal | http://twitter.com/search?q=Mikal | | Mikal | 58565.0 |
| 6 | Booker | http://twitter.com/search?q=Booker | | Booker | 28492.0 |
| 7 | Chris Paul | http://twitter.com/search?q=%22Chris+Paul%22 | | %22Chris+Paul%22 | 18646.0 |
| 8 | #NBATradeDeadline | http://twitter.com/search?q=%23NBATradeDeadline | | %23NBATradeDeadline | |
| 9 | takuto | http://twitter.com/search?q=takuto | | takuto | |


In [20]:
# Keep trends with more than 100,000 tweets.
ny_trends_over500k_pd = (ny_trends_pd[ny_trends_pd['tweet_volume'] > 100000]
                         .sort_values('tweet_volume', ascending=False))

# View the output.
print(ny_trends_over500k_pd.shape)
ny_trends_over500k_pd

(4, 5)


| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 10 | The NBA | http://twitter.com/search?q=%22The+NBA%22 | | %22The+NBA%22 | 195077.0 |
| 1 | Suns | http://twitter.com/search?q=Suns | | Suns | 185628.0 |
| 15 | The West | http://twitter.com/search?q=%22The+West%22 | | %22The+West%22 | 144314.0 |
| 2 | Nets | http://twitter.com/search?q=Nets | | Nets | 128486.0 |


In [21]:
# Save the output as a CSV file.
ny_trends_over500k_pd.to_csv('ny_trends_over500k.csv', index=False)

## 3. Common trends

In [22]:
# Extract the names of London's trending topics.
london_trends_list = [trend['name'] for trend in london_trends[0]['trends']]

# View the output.
print(london_trends_list)

['#NAW2023', '#ThisMorning', 'Durant', 'Elon', '#skillsforlife', '#apprenticeships', 'Russ', 'Churchill', 'Lee Anderson', 'European Super League', 'Brian Hughes', 'Putin', 'Toy Story 5', 'Suns', 'Death Penalty', 'Wordle 600 X', 'Ian Hislop', 'Verity', 'lee rigby', 'Timothy Evans', 'Nets', 'booker', 'Ashfield', 'West Lancashire', 'Mone', 'Brooklyn', 'New Tory', 'Ben Simmons', 'Starlink', 'Daily Quordle 381', 'Twitter Blue', 'martin lewis', 'Priti Patel', 'Mastodon', 'Musk', 'Kate Bush', 'Morning Karen', 'Breaking Bad', 'Skubala', 'Metroid Prime', 'Frozen 3', 'Levelling', 'Zootopia 2', 'Roger Waters', 'Kanye', 'Professor Layton', 'Morning Paul', 'Deputy Chair', 'Eminem']


In [23]:
# Extract the names of New York's trending topics.
ny_trends_list = [trend['name'] for trend in ny_trends[0]['trends']]

# View the output.
print(ny_trends_list)

['Russ', 'Suns', 'Nets', 'Ben Simmons', 'Ayton', 'Mikal', 'Booker', 'Chris Paul', '#NBATradeDeadline', 'takuto', 'The NBA', 'Wordle 600 X', 'Cam Johnson', 'KD and Kyrie', 'Joe Tsai', 'The West', 'Good Thursday', 'TJ Warren', '#NationalPizzaDay', 'KD to Phoenix', 'Jae Crowder', '#YouNetflix', "James O'Keefe", 'Kyrie and KD', 'Daily Quordle 381', 'Clippers', 'Sean Marks', 'KD is a Sun', 'DBook', 'David Stern', '#WeAreTheValley', 'claxton', 'Josh Hart', 'KD to PHX', 'Vernon', 'Damn KD', 'Toy Story 5', 'Blazers', '#MetroidPrimeRemastered', 'Seth Curry', 'seob', 'kd & kyrie', 'Tupac', 'Sixers', 'Illey', 'Torrey Craig', 'Fetterman', 'Rockets', 'Zootopia 2', 'James Jones']


In [24]:
# Convert both lists to sets.
london_trends_set = set(london_trends_list)
ny_trends_set = set(ny_trends_list)

# Keep the trends that appear in both cities.
common_trends = london_trends_set.intersection(ny_trends_set)

# View the output.
print(common_trends)

{'Nets', 'Toy Story 5', 'Daily Quordle 381', 'Russ', 'Zootopia 2', 'Wordle 600 X', 'Suns', 'Ben Simmons'}
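The brief's next step depends on whether Bitcoin is a shared trending topic. Below is a minimal sketch that compares trend names case-insensitively and then checks the shared names for "bitcoin". `shared_trends` and `mentions_bitcoin` are hypothetical helpers, and the two lists are trimmed copies of `london_trends_list` and `ny_trends_list` above.

```python
def shared_trends(*trend_lists):
    """Return the trend names present in every list, compared case-insensitively."""
    sets = [{name.casefold() for name in names} for names in trend_lists]
    return set.intersection(*sets)


def mentions_bitcoin(names):
    """True if any name contains 'bitcoin' (so '#Bitcoin' also counts)."""
    return any('bitcoin' in name.casefold() for name in names)


# Trimmed copies of the London and New York lists above.
london = ['Durant', 'Elon', 'Russ', 'Suns', 'Nets', 'booker', 'Wordle 600 X']
new_york = ['Russ', 'Suns', 'Nets', 'Booker', 'Wordle 600 X', 'Clippers']

common = shared_trends(london, new_york)
print(sorted(common))            # ['booker', 'nets', 'russ', 'suns', 'wordle 600 x']
print(mentions_bitcoin(common))  # False
```

Comparing case-insensitively also matches `booker` (London) with `Booker` (New York), which the exact `intersection` above treats as different topics.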


## Search for Bitcoin

In [27]:
# Search for Bitcoin.
q = '#Bitcoin'

# Set count to 100 (the most tweets a single search request returns).
count = 100

In [28]:
# Read some tweets.
search_results = twitter_api.search.tweets(q=q, count=count)

statuses = search_results['statuses']

In [29]:
from urllib.parse import parse_qsl

for _ in range(5):
    print("Length of statuses", len(statuses))
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError:  # No more results when next_results doesn't exist.
        break

    # Turn the next_results query string into keyword arguments.
    # parse_qsl also decodes values such as '%23Bitcoin' back to '#Bitcoin'.
    kwargs = dict(parse_qsl(next_results[1:]))

    search_results = twitter_api.search.tweets(**kwargs)
    statuses += search_results['statuses']

print(json.dumps(statuses[1], indent=1))

Length of statuses 90
Length of statuses 158
Length of statuses 158
{
 "created_at": "Thu Feb 09 11:38:24 +0000 2023",
 "id": 1623647500321140736,
 "id_str": "1623647500321140736",
 "text": "If you are new to #bitcoin and you see anyone with .btc or .eth in their name, or if they are trying to say good th\u2026 https://t.co/oBnbx88vuN",
 "truncated": true,
 "entities": {
  "hashtags": [
   {
    "text": "bitcoin",
    "indices": [
     18,
     26
    ]
   }
  ],
  "symbols": [],
  "user_mentions": [],
  "urls": [
   {
    "url": "https://t.co/oBnbx88vuN",
    "expanded_url": "https://twitter.com/i/web/status/1623647500321140736",
    "display_url": "twitter.com/i/web/status/1\u2026",
    "indices": [
     117,
     140
    ]
   }
  ]
 },
 "metadata": {
  "iso_language_code": "en",
  "result_type": "recent"
 },
 "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
 "in_reply_to_status_id": null,
 "in_reply_to_status_id_str": null,
 "in_rep
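The paging loop above can be wrapped in a reusable function and tested without calling Twitter. Below is a minimal sketch. `collect_statuses` and `fake_search` are hypothetical; the sketch assumes only that `search` accepts the same keyword arguments as `twitter_api.search.tweets` and returns a dictionary with `statuses` and, optionally, `search_metadata['next_results']`. The `next_results` string in the fake is made up for illustration.

```python
from urllib.parse import parse_qsl


def collect_statuses(search, max_pages=5, **params):
    """Run one search, then follow next_results for up to max_pages more requests."""
    results = search(**params)
    statuses = list(results['statuses'])
    for _ in range(max_pages):
        next_results = results.get('search_metadata', {}).get('next_results')
        if not next_results:
            break  # No further pages.
        # parse_qsl decodes '%23Bitcoin' to '#Bitcoin'; splitting on '=' by hand would not.
        results = search(**dict(parse_qsl(next_results.lstrip('?'))))
        statuses.extend(results['statuses'])
    return statuses


calls = []


def fake_search(**kwargs):
    """Hypothetical stand-in for twitter_api.search.tweets: two pages, then stop."""
    calls.append(kwargs)
    if 'max_id' not in kwargs:
        return {'statuses': [{'id': 3}, {'id': 2}],
                'search_metadata': {'next_results': '?max_id=1&q=%23Bitcoin&count=100'}}
    return {'statuses': [{'id': 1}], 'search_metadata': {}}


statuses_sketch = collect_statuses(fake_search, q='#Bitcoin', count=100)
print(len(statuses_sketch), calls[1]['q'])  # 3 #Bitcoin
```

With a live connection, `collect_statuses(twitter_api.search.tweets, q='#Bitcoin', count=100)` would follow the same logic as the loop above.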

In [30]:
# Inspect the first status.
t = statuses[0]

# Print the keys.
t.keys()

dict_keys(['created_at', 'id', 'id_str', 'text', 'truncated', 'entities', 'metadata', 'source', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place', 'contributors', 'is_quote_status', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'possibly_sensitive', 'lang'])

In [31]:
# Print the tweet ID.
print(t['id'])

# View the tweet text.
print(t['text'])

# View the entities (hashtags, symbols, mentions and URLs).
t['entities']

1623647501797531649
Retweet,if you want @binance to accept #BabyDoge #BabyDogeArmy #Binance #BUSD #BNB #Bitcoin #LUNC #ETH #CZBinance… https://t.co/xz59rZHQIq


{'hashtags': [{'text': 'BabyDoge', 'indices': [39, 48]},
  {'text': 'BabyDogeArmy', 'indices': [49, 62]},
  {'text': 'Binance', 'indices': [63, 71]},
  {'text': 'BUSD', 'indices': [72, 77]},
  {'text': 'BNB', 'indices': [78, 82]},
  {'text': 'Bitcoin', 'indices': [83, 91]},
  {'text': 'LUNC', 'indices': [92, 97]},
  {'text': 'ETH', 'indices': [98, 102]},
  {'text': 'CZBinance', 'indices': [103, 113]}],
 'symbols': [],
 'user_mentions': [{'screen_name': 'binance',
   'name': 'Binance',
   'id': 877807935493033984,
   'id_str': '877807935493033984',
   'indices': [20, 28]}],
 'urls': [{'url': 'https://t.co/xz59rZHQIq',
   'expanded_url': 'https://twitter.com/i/web/status/1623647501797531649',
   'display_url': 'twitter.com/i/web/status/1…',
   'indices': [115, 138]}]}
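The brief asks for the top two tweets per hashtag, judged by popularity. Below is a minimal sketch that scores each status as `retweet_count + favorite_count` (both keys appear in `t.keys()` above) and skips duplicate IDs, which paging can produce. This scoring rule is an assumption, not something the brief prescribes. `top_tweets` is a hypothetical helper, and the sample statuses use made-up counts for illustration only.

```python
def top_tweets(statuses, n=2):
    """Return the n most popular statuses, scored as retweets + favourites."""
    # Keep one status per ID in case the same tweet was returned twice.
    unique = {s['id']: s for s in statuses}.values()
    return sorted(unique,
                  key=lambda s: s['retweet_count'] + s['favorite_count'],
                  reverse=True)[:n]


# Hypothetical statuses with made-up counts.
sample = [
    {'id': 1, 'text': 'tweet A', 'retweet_count': 5, 'favorite_count': 20},
    {'id': 2, 'text': 'tweet B', 'retweet_count': 50, 'favorite_count': 1},
    {'id': 3, 'text': 'tweet C', 'retweet_count': 0, 'favorite_count': 3},
    {'id': 2, 'text': 'tweet B', 'retweet_count': 50, 'favorite_count': 1},
]

for s in top_tweets(sample):
    print(s['id'], s['retweet_count'] + s['favorite_count'])
# 2 51
# 1 25
```

Running `top_tweets(statuses)` once per hashtag's results would give the two tweets to analyse. Retweets (text starting `RT @`) may carry the original tweet's counts, so you may prefer to exclude statuses that contain a `retweeted_status` key before ranking.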