### LSE Data Analytics Online Career Accelerator

# DA201: Data Analytics using Python

## Practical activity: Search the Twitter API

**Important**

This activity uses the Twitter API, which returns live data about current events, so your output will differ from the output shown here. For example, the colour of apples might have been trending yesterday, while the aerodynamics of aeroplanes is trending today.

**This is the solution to the activity.**

The story of Bitcoin and other cryptocurrencies has captured investors like few financial stories have. Many finance firms are looking to invest in the crypto market. You are a data analyst at a financial institution, and your manager has tasked you with investigating Bitcoin in more detail, particularly the future growth of the currency and its use in the United States.

Previously, you accessed Bitcoin data through the CoinGecko API. Now, your manager asks you to turn your attention to Twitter, particularly tweets about Bitcoin and cryptocurrency in general. She wants you to check whether Bitcoin is trending in `New York, Los Angeles, Sydney, Auckland, and Dubai`.

She also wants to see a DataFrame of topics with over `200,000` tweets for each city. 

Your manager then wants you to cross-check trending topics between the `United States and the UK` to see what people are talking about in both countries and whether Bitcoin forms part of the larger conversation. If Bitcoin is not a shared trending topic, she asks that you search Twitter for `#Bitcoin` and two other cryptocurrency hashtags of your choice, and analyse the top two tweets returned for each hashtag, particularly in terms of their popularity.

## 1. Prepare your workstation

In [1]:
# Copy the YAML file and your Twitter keys over to this Jupyter Notebook before you start to work.
import yaml
from twitter import Twitter, OAuth

# Load the YAML file - remember to specify the whole path.
with open('twitter_tmp.yaml', 'r') as file:
    twitter_creds = yaml.safe_load(file)

# Pass your Twitter credentials.
twitter_api = Twitter(auth=OAuth(twitter_creds['access_token'],
                                 twitter_creds['access_token_secret'],
                                 twitter_creds['api_key'],
                                 twitter_creds['api_secret_key']))

In [2]:
# Confirm that the API object was created (the #python search below tests the connection itself).
print(twitter_api)

<twitter.api.Twitter object at 0x7f9ba9928c40>
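A `KeyError` in the credentials lookup usually means that a key is missing or misspelled in the YAML file. As a minimal sketch, assuming the four key names used in the cell above, a small check can report what is missing before the API object is built. The helper name `missing_credentials` and the sample values are hypothetical.

```python
# Key names taken from the credentials lookup in the cell above.
REQUIRED_KEYS = ('access_token', 'access_token_secret',
                 'api_key', 'api_secret_key')


def missing_credentials(creds):
    """Return the required keys that are absent or empty in creds."""
    return [key for key in REQUIRED_KEYS if not creds.get(key)]


# Hypothetical, incomplete credentials for illustration only.
sample_creds = {'access_token': 'abc', 'api_key': 'xyz'}

print(missing_credentials(sample_creds))
```

An empty list means that all four keys are present and non-empty.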


In [3]:
# Run a test with #python.
python_tweets = twitter_api.search.tweets(q="#python")

# View the output.
print(python_tweets)

{'statuses': [{'created_at': 'Sat Jul 02 18:39:08 +0000 2022', 'id': 1543303272865341443, 'id_str': '1543303272865341443', 'text': 'RT @RealBenjizo: 💥 Weekend Giveaway 💥\n\nGiving FREE copies of this book to Python Beginners who need some practice.\n\nLike, Retweet and Comme…', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'RealBenjizo', 'name': 'Benjamin Bennett Alexander', 'id': 1388894322553237504, 'id_str': '1388894322553237504', 'indices': [3, 15]}], 'urls': []}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 1268061397, 'id_str': '1268061397', 'name': 'HAR65', 'screen_name': 'Mendolo65', 'location': 'Argentina', 'description': 'Argentino, Ap
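The raw search response is hard to read when printed in full. The sketch below pulls a few fields out of each status so that the tweets can be scanned quickly. It uses only fields visible in the output above (`created_at`, `text`, and the user's `screen_name`); the helper name `summarise_statuses` is hypothetical, and the sample response is a trimmed stand-in rather than live data.

```python
def summarise_statuses(response):
    """Return one small dictionary per tweet in a search response."""
    return [{'created_at': status['created_at'],
             'screen_name': status['user']['screen_name'],
             'text': status['text']}
            for status in response.get('statuses', [])]


# Trimmed sample shaped like the output above (text shortened).
sample_response = {
    'statuses': [
        {'created_at': 'Sat Jul 02 18:39:08 +0000 2022',
         'text': 'RT @RealBenjizo: Weekend Giveaway',
         'user': {'screen_name': 'Mendolo65'}},
    ]
}

for row in summarise_statuses(sample_response):
    print(row['created_at'], row['screen_name'], row['text'], sep=' | ')
```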

## 2. Identify New York and London

In [4]:
# Retrieve the locations for which Twitter has trend data.
trends_worldwide = twitter_api.trends.available()

# How many locations are available?
print(len(trends_worldwide))

# View the first location as an example.
trends_worldwide[0]

467


{'name': 'Worldwide',
 'placeType': {'code': 19, 'name': 'Supername'},
 'url': 'http://where.yahooapis.com/v1/place/1',
 'parentid': 0,
 'country': '',
 'woeid': 1,
 'countryCode': None}

## New York

In [5]:
# Find New York.
our_city = 'New York'

# Keep the locations whose name matches the city.
new_york = [place for place in trends_worldwide if place['name'] == our_city]

# Check how many matches were found.
print(len(new_york))

# Select the first match by index.
new_york[0]

# View the where on earth identifier (WOEID).
new_york[0]['woeid']

1


2459115

## London

In [6]:
# Find London.
our_city_2 = 'London'

# Keep the locations whose name matches the city.
london = [place for place in trends_worldwide if place['name'] == our_city_2]

# Check how many matches were found.
print(len(london))

# Select the first match by index.
london[0]

# View the where on earth identifier (WOEID).
london[0]['woeid']

1


44418
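The same lookup is needed for every city of interest, so it can be wrapped in a small function. This is a minimal sketch, assuming each location is a dictionary with `name` and `woeid` keys as in the `trends.available()` output above. The helper name `find_woeid` is hypothetical, and the sample list stands in for `trends_worldwide`.

```python
def find_woeid(places, name):
    """Return the WOEID of the first place called name, or None."""
    for place in places:
        if place['name'].lower() == name.lower():
            return place['woeid']
    return None


# Sample entries; the WOEIDs are the ones returned above.
sample_places = [
    {'name': 'Worldwide', 'woeid': 1},
    {'name': 'New York', 'woeid': 2459115},
    {'name': 'London', 'woeid': 44418},
]

cities = ['New York', 'London', 'Atlantis']
print({city: find_woeid(sample_places, city) for city in cities})
```

Returning `None` for an unknown city avoids the `IndexError` that `[0]` raises on an empty list.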

## 3. Common trends

## New York

In [7]:
# Look at trends in New York.
new_york_trends = twitter_api.trends.place(_id=new_york[0]['woeid'])

# View the output.
new_york_trends

[{'trends': [{'name': '#StrangerThings4',
    'url': 'http://twitter.com/search?q=%23StrangerThings4',
    'promoted_content': None,
    'query': '%23StrangerThings4',
    'tweet_volume': 398204},
   {'name': 'Ohio',
    'url': 'http://twitter.com/search?q=Ohio',
    'promoted_content': None,
    'query': 'Ohio',
    'tweet_volume': 192326},
   {'name': 'Independence Day',
    'url': 'http://twitter.com/search?q=%22Independence+Day%22',
    'promoted_content': None,
    'query': '%22Independence+Day%22',
    'tweet_volume': 51676},
   {'name': 'Indiana',
    'url': 'http://twitter.com/search?q=Indiana',
    'promoted_content': None,
    'query': 'Indiana',
    'tweet_volume': 57821},
   {'name': 'Arizona',
    'url': 'http://twitter.com/search?q=Arizona',
    'promoted_content': None,
    'query': 'Arizona',
    'tweet_volume': 78262},
   {'name': 'Wyoming',
    'url': 'http://twitter.com/search?q=Wyoming',
    'promoted_content': None,
    'query': 'Wyoming',
    'tweet_volume': 50108

In [8]:
# Look at the output as a DataFrame.
# Import Pandas.
import pandas as pd

# Create a DataFrame.
new_york_trends_pd = pd.DataFrame(new_york_trends[0]['trends'])

# View a DataFrame.
new_york_trends_pd

| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 0 | #StrangerThings4 | http://twitter.com/search?q=%23StrangerThings4 | | %23StrangerThings4 | 398204.0 |
| 1 | Ohio | http://twitter.com/search?q=Ohio | | Ohio | 192326.0 |
| 2 | Independence Day | http://twitter.com/search?q=%22Independence+Da... | | %22Independence+Day%22 | 51676.0 |
| 3 | Indiana | http://twitter.com/search?q=Indiana | | Indiana | 57821.0 |
| 4 | Arizona | http://twitter.com/search?q=Arizona | | Arizona | 78262.0 |
| 5 | Wyoming | http://twitter.com/search?q=Wyoming | | Wyoming | 50108.0 |
| 6 | Kyrgios | http://twitter.com/search?q=Kyrgios | | Kyrgios | 13562.0 |
| 7 | Ronaldo | http://twitter.com/search?q=Ronaldo | | Ronaldo | 213085.0 |
| 8 | #Caturday | http://twitter.com/search?q=%23Caturday | | %23Caturday | 18807.0 |
| 9 | Andujar | http://twitter.com/search?q=Andujar | | Andujar | |
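Some trends report no `tweet_volume` at all: the API returns `None`, which appears as a missing value in the DataFrame. Before filtering by volume, it can help to know how many trends fall into that group. The sketch below does this in plain Python on a small sample based on the output above; the helper name `split_by_volume` is hypothetical.

```python
def split_by_volume(trends):
    """Separate trends that report a tweet volume from those that do not."""
    with_volume = [t for t in trends if t['tweet_volume'] is not None]
    without_volume = [t for t in trends if t['tweet_volume'] is None]
    return with_volume, without_volume


# Sample trends based on the New York output above.
sample_trends = [
    {'name': '#StrangerThings4', 'tweet_volume': 398204},
    {'name': 'Kyrgios', 'tweet_volume': 13562},
    {'name': 'Andujar', 'tweet_volume': None},
]

reported, missing = split_by_volume(sample_trends)
print(len(reported), 'with volume;', len(missing), 'without')
print([t['name'] for t in missing])
```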


In [9]:
# Keep trends with more than 50,000 tweets, sorted by volume (highest first).
new_york_trends_over50k_pd = (
    new_york_trends_pd[new_york_trends_pd['tweet_volume'] > 50000]
    .sort_values('tweet_volume', ascending=False)
)

# View the output.
print(new_york_trends_over50k_pd.shape)
new_york_trends_over50k_pd

(10, 5)


| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 11 | #KinnPorscheEP13 | http://twitter.com/search?q=%23KinnPorscheEP13 | | %23KinnPorscheEP13 | 1129104.0 |
| 0 | #StrangerThings4 | http://twitter.com/search?q=%23StrangerThings4 | | %23StrangerThings4 | 398204.0 |
| 7 | Ronaldo | http://twitter.com/search?q=Ronaldo | | Ronaldo | 213085.0 |
| 1 | Ohio | http://twitter.com/search?q=Ohio | | Ohio | 192326.0 |
| 18 | She's 10 | http://twitter.com/search?q=%22She%27s+10%22 | | %22She%27s+10%22 | 115094.0 |
| 4 | Arizona | http://twitter.com/search?q=Arizona | | Arizona | 78262.0 |
| 40 | Sainz | http://twitter.com/search?q=Sainz | | Sainz | 66258.0 |
| 3 | Indiana | http://twitter.com/search?q=Indiana | | Indiana | 57821.0 |
| 2 | Independence Day | http://twitter.com/search?q=%22Independence+Da... | | %22Independence+Day%22 | 51676.0 |
| 5 | Wyoming | http://twitter.com/search?q=Wyoming | | Wyoming | 50108.0 |
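The filter above uses a fixed threshold of 50,000, while the brief also asks for topics with over 200,000 tweets. One way to cover both is to make the threshold a parameter. This is a sketch under that assumption, not the activity's prescribed method; the helper name `trends_over` is hypothetical, and the sample rows are taken from the output above.

```python
import pandas as pd


def trends_over(trends, threshold):
    """Return trends with tweet_volume above threshold, highest first."""
    trends_pd = pd.DataFrame(trends)
    over = trends_pd[trends_pd['tweet_volume'] > threshold]
    return over.sort_values('tweet_volume', ascending=False)


# Sample rows from the New York output above.
sample_trends = [
    {'name': 'Ohio', 'tweet_volume': 192326},
    {'name': 'Ronaldo', 'tweet_volume': 213085},
    {'name': 'Andujar', 'tweet_volume': None},
    {'name': '#StrangerThings4', 'tweet_volume': 398204},
]

print(trends_over(sample_trends, 200000)['name'].tolist())
```

Rows with a missing `tweet_volume` are dropped by the comparison, because a missing value is never greater than the threshold.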


In [10]:
# Save output as a CSV file.
new_york_trends_over50k_pd.to_csv('new_york_trends_over50k.csv', index=False)

## London

In [11]:
# Look at trends in London.
# Note: this reassigns `london` from the location lookup to the trends response.
london = twitter_api.trends.place(_id=london[0]['woeid'])

# View the output.
london

[{'trends': [{'name': '#RLCS',
    'url': 'http://twitter.com/search?q=%23RLCS',
    'promoted_content': None,
    'query': '%23RLCS',
    'tweet_volume': None},
   {'name': '#GunsNRoses',
    'url': 'http://twitter.com/search?q=%23GunsNRoses',
    'promoted_content': None,
    'query': '%23GunsNRoses',
    'tweet_volume': None},
   {'name': '#ManUtd',
    'url': 'http://twitter.com/search?q=%23ManUtd',
    'promoted_content': None,
    'query': '%23ManUtd',
    'tweet_volume': None},
   {'name': 'Albon',
    'url': 'http://twitter.com/search?q=Albon',
    'promoted_content': None,
    'query': 'Albon',
    'tweet_volume': None},
   {'name': 'Trialist',
    'url': 'http://twitter.com/search?q=Trialist',
    'promoted_content': None,
    'query': 'Trialist',
    'tweet_volume': None},
   {'name': 'AirPods',
    'url': 'http://twitter.com/search?q=AirPods',
    'promoted_content': None,
    'query': 'AirPods',
    'tweet_volume': 23536},
   {'name': '#IStandWithJKRowling',
    'url': 'ht

In [12]:
# Look at the output as a DataFrame.

# Create a DataFrame.
london_trends_pd = pd.DataFrame(london[0]['trends'])

# View the DataFrame.
london_trends_pd

| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 0 | #RLCS | http://twitter.com/search?q=%23RLCS | | %23RLCS | |
| 1 | #GunsNRoses | http://twitter.com/search?q=%23GunsNRoses | | %23GunsNRoses | |
| 2 | #ManUtd | http://twitter.com/search?q=%23ManUtd | | %23ManUtd | |
| 3 | Albon | http://twitter.com/search?q=Albon | | Albon | |
| 4 | Trialist | http://twitter.com/search?q=Trialist | | Trialist | |
| 5 | AirPods | http://twitter.com/search?q=AirPods | | AirPods | 23536.0 |
| 6 | #IStandWithJKRowling | http://twitter.com/search?q=%23IStandWithJKRow... | | %23IStandWithJKRowling | 11958.0 |
| 7 | #Caturday | http://twitter.com/search?q=%23Caturday | | %23Caturday | 18835.0 |
| 8 | Speed | http://twitter.com/search?q=Speed | | Speed | 105399.0 |
| 9 | Southport | http://twitter.com/search?q=Southport | | Southport | |


In [13]:
# Keep trends with more than 50,000 tweets, sorted by volume (highest first).
london_trends_over50k_pd = (
    london_trends_pd[london_trends_pd['tweet_volume'] > 50000]
    .sort_values('tweet_volume', ascending=False)
)

# View the output.
print(london_trends_over50k_pd.shape)
london_trends_over50k_pd

(5, 5)


| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 25 | Ronaldo | http://twitter.com/search?q=Ronaldo | | Ronaldo | 214240.0 |
| 8 | Speed | http://twitter.com/search?q=Speed | | Speed | 105399.0 |
| 17 | Bumrah | http://twitter.com/search?q=Bumrah | | Bumrah | 76495.0 |
| 35 | Sainz | http://twitter.com/search?q=Sainz | | Sainz | 66295.0 |
| 43 | Ten Hag | http://twitter.com/search?q=%22Ten+Hag%22 | | %22Ten+Hag%22 | 50581.0 |


In [14]:
# Save output as a CSV file.
london_trends_over50k_pd.to_csv('london_trends_over50k.csv', index=False)

### Compare cities

In [15]:
# Find New York again.
our_city = 'New York'

# Keep the locations whose name matches the city.
new_york = [place for place in trends_worldwide if place['name'] == our_city]

# View the output.
new_york[0]['woeid']

2459115

In [16]:
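# Find London again (the `london` variable was overwritten by the trends response above).
our_city_2 = 'London'

# Keep the locations whose name matches the city.
london = [place for place in trends_worldwide if place['name'] == our_city_2]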

# View the output.
london[0]['woeid']

44418

In [17]:
# Search for each city.
# Import JSON.
import json

# Search for New York.
new_york_trends = twitter_api.trends.place(_id=2459115)

# View JSON output.
print(json.dumps(new_york_trends, indent=4))

[
    {
        "trends": [
            {
                "name": "#StrangerThings4",
                "url": "http://twitter.com/search?q=%23StrangerThings4",
                "promoted_content": null,
                "query": "%23StrangerThings4",
                "tweet_volume": 398893
            },
            {
                "name": "Ohio",
                "url": "http://twitter.com/search?q=Ohio",
                "promoted_content": null,
                "query": "Ohio",
                "tweet_volume": 193301
            },
            {
                "name": "Independence Day",
                "url": "http://twitter.com/search?q=%22Independence+Day%22",
                "promoted_content": null,
                "query": "%22Independence+Day%22",
                "tweet_volume": 51882
            },
            {
                "name": "Indiana",
                "url": "http://twitter.com/search?q=Indiana",
                "promoted_content": null,
                "query": "Indi

In [18]:
# Search for London.
london_trends = twitter_api.trends.place(_id=44418)

# View JSON output.
print(json.dumps(london_trends, indent=4))

[
    {
        "trends": [
            {
                "name": "#RLCS",
                "url": "http://twitter.com/search?q=%23RLCS",
                "promoted_content": null,
                "query": "%23RLCS",
                "tweet_volume": null
            },
            {
                "name": "#ManUtd",
                "url": "http://twitter.com/search?q=%23ManUtd",
                "promoted_content": null,
                "query": "%23ManUtd",
                "tweet_volume": null
            },
            {
                "name": "Albon",
                "url": "http://twitter.com/search?q=Albon",
                "promoted_content": null,
                "query": "Albon",
                "tweet_volume": null
            },
            {
                "name": "Trialist",
                "url": "http://twitter.com/search?q=Trialist",
                "promoted_content": null,
                "query": "Trialist",
                "tweet_volume": null
            },
         

In [19]:
# List the names of the trends in New York.
new_york_trends_list = [trend['name'] for trend in new_york_trends[0]['trends']]

# View output.
print(new_york_trends_list)

['#StrangerThings4', 'Ohio', 'Independence Day', 'Indiana', 'Arizona', 'Wyoming', 'Kyrgios', 'Ronaldo', '#Caturday', 'Chapman', 'Andujar', '#VShojoNext', '#KinnPorscheEP13', 'Matt Carpenter', '#UFC276', 'pappy', 'Tsitsipas', 'Wilbur', 'Kson', "She's 10", 'Miggy', 'Touki', 'Gallo', 'Happy Birthday Mac', 'Clover', 'Darius Garland', 'Jim Breuer', 'Gerrit Cole', 'Shane McClanahan', 'Cornet', 'Pete Arredondo', 'Happy 4th', 'minecraft launcher', 'Bayern', 'Life Is Good', "Damon's Mound", 'Gausman', 'Napoli', 'Social Security and Medicare', 'Swiatek', 'Uvalde City Council', 'Nadal', 'Namor', 'Kirk McCarty', 'Rich Paul', 'Sainz', 'Daily Quordle 159', 'Hydro City', 'Joey Hand', 'Attuma']


In [20]:
# List the names of the trends in London.
london_trends_list = [trend['name'] for trend in london_trends[0]['trends']]

# View output.
print(london_trends_list)

['#RLCS', '#ManUtd', 'Albon', 'Trialist', 'AirPods', '#IStandWithJKRowling', '#Caturday', 'Speed', 'Southport', 'Hamilton', '#CoralEclipse', 'Birmingham City', 'Perez', 'Dybala', 'Free Wind', 'Anne Diamond', 'Haydock', 'Bumrah', 'De Gea', 'Jacinda', 'Rangers', 'Eddie Jones', 'Tarkowski', 'Lancaster', 'Hickey', 'Ronaldo', 'Kyrgios', 'Tsitsipas', 'Bayern', 'Napoli', 'Nadal', 'Martial', 'Rashford', 'Wilbur', 'Latifi', 'Sainz', 'South Africa', 'Biggar', 'Downes', 'Ronny', 'Andy Goram', 'Ronnie', 'Laporta', 'Ten Hag', 'Osimhen', 'Alize Cornet', 'Duncan Castles', "He's 37", 'Boks', 'Crowley']


In [21]:
# Convert both lists of trend names to sets.
new_york_trends_set = set(new_york_trends_list)
london_trends_set = set(london_trends_list)

# Find the trends that appear in both cities.
common_trends = new_york_trends_set.intersection(london_trends_set)

# View output.
print(common_trends)

{'Wilbur', 'Tsitsipas', '#Caturday', 'Ronaldo', 'Nadal', 'Napoli', 'Bayern', 'Kyrgios', 'Sainz'}
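The set intersection above answers which topics are shared, but the brief also asks whether Bitcoin is one of them. A minimal sketch of that check follows, using short samples taken from the two lists above. The helper names `shared_trends` and `mentions` are hypothetical, and the case-insensitive substring match is an assumption about what should count as a Bitcoin trend.

```python
def shared_trends(first, second):
    """Return the trend names present in both lists, sorted by name."""
    return sorted(set(first) & set(second))


def mentions(term, trends):
    """Return the trends that contain term, ignoring case."""
    return [t for t in trends if term.lower() in t.lower()]


# Short samples taken from the two lists above.
new_york_sample = ['#StrangerThings4', 'Ohio', 'Ronaldo', '#Caturday', 'Sainz']
london_sample = ['#RLCS', '#Caturday', 'Speed', 'Ronaldo', 'Sainz']

common = shared_trends(new_york_sample, london_sample)
print(common)
print(mentions('bitcoin', common))
```

An empty list from `mentions` means that Bitcoin is not a shared trending topic, which is the condition for moving on to the hashtag search.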


## Search for #Bitcoin

In [22]:
# Run a test with #Bitcoin.
bitcoin_tweets = twitter_api.search.tweets(q="#Bitcoin")

# View JSON output.
print(json.dumps(bitcoin_tweets, indent=4))

{
    "statuses": [
        {
            "created_at": "Sat Jul 02 18:46:42 +0000 2022",
            "id": 1543305178530693121,
            "id_str": "1543305178530693121",
            "text": "Jfrey found #bitcoin in a User vault at this location! Join me playing #coinhuntworld, It's awesome!\u2026 https://t.co/jDighQ0ZHT",
            "truncated": true,
            "entities": {
                "hashtags": [
                    {
                        "text": "bitcoin",
                        "indices": [
                            12,
                            20
                        ]
                    },
                    {
                        "text": "coinhuntworld",
                        "indices": [
                            71,
                            85
                        ]
                    }
                ],
                "symbols": [],
                "user_mentions": [],
                "urls": [
                    {
                 
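To analyse the top two tweets for a hashtag in terms of popularity, the statuses can be ranked by engagement. The sketch below assumes each status carries the `retweet_count` and `favorite_count` fields that v1.1 tweet objects normally include; defining popularity as their sum is an illustrative choice, not a Twitter metric. The helper name `top_tweets` and the sample counts are hypothetical.

```python
def top_tweets(statuses, n=2):
    """Return the n statuses with the most retweets plus likes."""
    def popularity(status):
        return status.get('retweet_count', 0) + status.get('favorite_count', 0)
    return sorted(statuses, key=popularity, reverse=True)[:n]


# Hypothetical statuses; the counts are made up for illustration.
sample_statuses = [
    {'text': 'tweet A', 'retweet_count': 3, 'favorite_count': 10},
    {'text': 'tweet B', 'retweet_count': 40, 'favorite_count': 5},
    {'text': 'tweet C', 'retweet_count': 0, 'favorite_count': 1},
]

for status in top_tweets(sample_statuses):
    print(status['text'], status['retweet_count'], status['favorite_count'])
```

With a live response, the list to pass in would be `bitcoin_tweets['statuses']`.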