### LSE Data Analytics Online Career Accelerator

# DA201: Data Analytics using Python

## Practical activity: Search the Twitter API

**Important**

Please note that this activity uses the Twitter API, which reflects live, current events, so your output will differ from the output shown here. For example, the colour of apples may have been trending yesterday, while the aerodynamics of aeroplanes is trending today.

**This is the solution to the activity.**

The story of Bitcoin and other cryptocurrencies has captured investors like few financial stories have. Many finance firms are looking to invest in the crypto market. As a data analyst at a financial institution, your manager has tasked you with investigating Bitcoin in a little more detail, particularly in terms of future growth of the currency and its use in the United States. 

Previously, you accessed Bitcoin data through the CoinGecko API. Now, your manager asks you to turn your attention to Twitter, particularly tweets on Bitcoin and cryptocurrency in general. She wants you to check whether Bitcoin is trending in `New York, Los Angeles, Sydney, Auckland, and Dubai`.

She also wants to see a DataFrame of topics with over `200,000` tweets for each city. 

Your manager then wants you to cross-check trending topics between the `United States and the UK`, to see what people are talking about in both countries, and if Bitcoin forms part of the larger conversation. If Bitcoin is not a shared trending topic, then she asks that you search Twitter for `#Bitcoin` and two other cryptocurrency hashtags of your choice, and analyse the top two tweets you return for each hashtag, particularly in terms of their popularity.

## 1. Prepare your workstation

In [3]:
# Copy the YAML file that holds your Twitter keys into this notebook's folder before you start to work.
import yaml
from twitter import OAuth, Twitter

# Load the YAML file - remember to specify the whole path.
with open('twitter.yaml', 'r') as file:
    twitter_creds = yaml.safe_load(file)

# Pass your Twitter credentials.
twitter_api = Twitter(auth=OAuth(twitter_creds['access_token'],
                                 twitter_creds['access_token_secret'],
                                 twitter_creds['api_key'],
                                 twitter_creds['api_secret_key']))
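
A missing or misspelled key in the YAML file only shows up as a `KeyError` when the credentials are passed to `OAuth`. As a minimal sketch, a small check like the one below could be run on the loaded dictionary first. The function name `missing_credentials` and the placeholder values are hypothetical; the four key names are the ones used in the cell above.

```python
# Keys that the OAuth call reads from the loaded YAML file.
REQUIRED_KEYS = ("access_token", "access_token_secret", "api_key", "api_secret_key")


def missing_credentials(creds):
    """Return the required keys that are absent or empty in `creds`."""
    if not isinstance(creds, dict):
        # yaml.safe_load returns None for an empty file.
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not creds.get(key)]


# Placeholder values for illustration only, not real credentials.
example_creds = {
    "access_token": "xxxx",
    "access_token_secret": "xxxx",
    "api_key": "xxxx",
}

# View the output.
print(missing_credentials(example_creds))
```

In the notebook, this would be called as `missing_credentials(twitter_creds)`; an empty list means all four keys are present.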

In [4]:
# Confirm that the client object was created (printing it does not contact Twitter).
print(twitter_api)

<twitter.api.Twitter object at 0x00000228D4D504F0>


In [5]:
# Run a test with #python.
python_tweets = twitter_api.search.tweets(q="#python")

# View the output.
print(python_tweets)

{'statuses': [{'created_at': 'Thu Jun 30 12:22:46 +0000 2022', 'id': 1542483782267273219, 'id_str': '1542483782267273219', 'text': 'RT @NickGweezy: Concatenate\n\n#artificialintelligence #ai #Python #DataScience #MachineLearning #ml #IoT #Python #RStats #js #TensorFlow #Se…', 'truncated': False, 'entities': {'hashtags': [{'text': 'artificialintelligence', 'indices': [29, 52]}, {'text': 'ai', 'indices': [53, 56]}, {'text': 'Python', 'indices': [57, 64]}, {'text': 'DataScience', 'indices': [65, 77]}, {'text': 'MachineLearning', 'indices': [78, 94]}, {'text': 'ml', 'indices': [95, 98]}, {'text': 'IoT', 'indices': [99, 103]}, {'text': 'Python', 'indices': [104, 111]}, {'text': 'RStats', 'indices': [112, 119]}, {'text': 'js', 'indices': [120, 123]}, {'text': 'TensorFlow', 'indices': [124, 135]}], 'symbols': [], 'user_mentions': [{'screen_name': 'NickGweezy', 'name': 'NickGweezy \uf8ff', 'id': 1487841070880808969, 'id_str': '1487841070880808969', 'indices': [3, 14]}], 'urls': []}, 'metadata'
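
The raw search response is hard to read when printed in full. The sketch below shows one possible way to pull out a few fields per tweet and rank the tweets by popularity, which the brief asks for later. It assumes that popularity can be approximated by adding `retweet_count` and `favorite_count`, two fields of the standard tweet object; the function name `summarise_statuses` and the sample counts are hypothetical and only illustrate the shape of the data.

```python
def summarise_statuses(response, top_n=2):
    """Return the top_n statuses ranked by retweets plus likes."""
    statuses = response.get("statuses", [])
    # Assumption: popularity = retweets + likes.
    ranked = sorted(
        statuses,
        key=lambda s: s.get("retweet_count", 0) + s.get("favorite_count", 0),
        reverse=True,
    )
    return [
        {
            "id_str": s.get("id_str"),
            "text": s.get("text"),
            "retweet_count": s.get("retweet_count", 0),
            "favorite_count": s.get("favorite_count", 0),
        }
        for s in ranked[:top_n]
    ]


# Hypothetical response with made-up counts, shaped like the search output.
sample_response = {
    "statuses": [
        {"id_str": "1", "text": "first", "retweet_count": 5, "favorite_count": 1},
        {"id_str": "2", "text": "second", "retweet_count": 40, "favorite_count": 10},
        {"id_str": "3", "text": "third", "retweet_count": 0, "favorite_count": 20},
    ]
}

# View the two most popular tweets.
for tweet in summarise_statuses(sample_response):
    print(tweet)
```

With live data, the same call would be `summarise_statuses(python_tweets)`.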

## 2. Identify New York and London

In [7]:
# Retrieve the locations for which Twitter has trend data.
trends_worldwide = twitter_api.trends.available()

# How many locations are available?
print(len(trends_worldwide))

# Example of an entry in trends_worldwide.
trends_worldwide[0]

467


{'name': 'Worldwide',
 'placeType': {'code': 19, 'name': 'Supername'},
 'url': 'http://where.yahooapis.com/v1/place/1',
 'parentid': 0,
 'country': '',
 'woeid': 1,
 'countryCode': None}
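
Each entry in `trends_worldwide` is a location record like the one above, so the list can be filtered on any of its fields. As a rough sketch, the function below lists the locations that belong to one country, which may help when comparing the United States and the UK. The function name `locations_in_country` is hypothetical, and the sample records are abbreviated to the fields used here.

```python
def locations_in_country(locations, country_code):
    """Return the sorted names of locations whose countryCode matches."""
    return sorted(
        loc["name"] for loc in locations if loc.get("countryCode") == country_code
    )


# Abbreviated sample records (the real records carry more fields).
sample_locations = [
    {"name": "Worldwide", "woeid": 1, "countryCode": None},
    {"name": "New York", "woeid": 2459115, "countryCode": "US"},
    {"name": "London", "woeid": 44418, "countryCode": "GB"},
]

# View the output.
print(locations_in_country(sample_locations, "US"))
print(locations_in_country(sample_locations, "GB"))
```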

## New York

In [8]:
# Find New York.
our_city = 'New York'

# Keep the locations whose name matches.
new_york = [loc for loc in trends_worldwide if loc['name'] == our_city]

# Check how many locations matched.
print(len(new_york))

# Use the index to view the New York record.
new_york[0]

# Get the Where On Earth ID (WOEID).
new_york[0]['woeid']

1


2459115

## London

In [9]:
# Find London.
our_city_2 = 'London'

# Keep the locations whose name matches.
london = [loc for loc in trends_worldwide if loc['name'] == our_city_2]

# Check how many locations matched.
print(len(london))

# Use the index to view the London record.
london[0]

# Get the Where On Earth ID (WOEID).
london[0]['woeid']

1


44418
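
The two lookups above follow the same pattern, so they could be wrapped in one function and reused for the other cities in the brief. The sketch below is one possible version; the name `find_woeid` is hypothetical, and it raises an error when a name matches no location or more than one, which is the case the `print(len(...))` checks above guard against.

```python
def find_woeid(locations, city_name):
    """Return the WOEID of the single location named city_name."""
    matches = [loc for loc in locations if loc["name"] == city_name]
    if len(matches) != 1:
        raise LookupError(
            f"Expected exactly one match for {city_name!r}, found {len(matches)}"
        )
    return matches[0]["woeid"]


# Abbreviated sample records with the WOEIDs found above.
sample_locations = [
    {"name": "Worldwide", "woeid": 1},
    {"name": "New York", "woeid": 2459115},
    {"name": "London", "woeid": 44418},
]

# View the output.
print(find_woeid(sample_locations, "New York"))
print(find_woeid(sample_locations, "London"))
```

With live data, a dictionary of WOEIDs could be built with `{city: find_woeid(trends_worldwide, city) for city in cities}`, where `cities` is a list of city names.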

## 3. Common trends

## New York

In [10]:
# Look at trends in New York.
new_york_trends = twitter_api.trends.place(_id = new_york[0]['woeid'])

# View the output.
new_york_trends

[{'trends': [{'name': '#thursdaymorning',
    'url': 'http://twitter.com/search?q=%23thursdaymorning',
    'promoted_content': None,
    'query': '%23thursdaymorning',
    'tweet_volume': None},
   {'name': '#VenmoMe',
    'url': 'http://twitter.com/search?q=%23VenmoMe',
    'promoted_content': None,
    'query': '%23VenmoMe',
    'tweet_volume': 12650},
   {'name': '#ThursdayThoughts',
    'url': 'http://twitter.com/search?q=%23ThursdayThoughts',
    'promoted_content': None,
    'query': '%23ThursdayThoughts',
    'tweet_volume': 13899},
   {'name': 'Daily Quordle 157',
    'url': 'http://twitter.com/search?q=%22Daily+Quordle+157%22',
    'promoted_content': None,
    'query': '%22Daily+Quordle+157%22',
    'tweet_volume': None},
   {'name': '#thursdayvibes',
    'url': 'http://twitter.com/search?q=%23thursdayvibes',
    'promoted_content': None,
    'query': '%23thursdayvibes',
    'tweet_volume': 11079},
   {'name': 'Wordle 376 X',
    'url': 'http://twitter.com/search?q=%22Wordle+

In [11]:
# Look at the output as a DataFrame.
# Import Pandas.
import pandas as pd

# Create a DataFrame.
new_york_trends_pd = pd.DataFrame(new_york_trends[0]['trends'])

# View the DataFrame.
new_york_trends_pd

| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 0 | #thursdaymorning | http://twitter.com/search?q=%23thursdaymorning | | %23thursdaymorning | |
| 1 | #VenmoMe | http://twitter.com/search?q=%23VenmoMe | | %23VenmoMe | 12650.0 |
| 2 | #ThursdayThoughts | http://twitter.com/search?q=%23ThursdayThoughts | | %23ThursdayThoughts | 13899.0 |
| 3 | Daily Quordle 157 | http://twitter.com/search?q=%22Daily+Quordle+1... | | %22Daily+Quordle+157%22 | |
| 4 | #thursdayvibes | http://twitter.com/search?q=%23thursdayvibes | | %23thursdayvibes | 11079.0 |
| 5 | Wordle 376 X | http://twitter.com/search?q=%22Wordle+376+X%22 | | %22Wordle+376+X%22 | |
| 6 | Happy Friday Eve | http://twitter.com/search?q=%22Happy+Friday+Ev... | | %22Happy+Friday+Eve%22 | |
| 7 | Good Thursday | http://twitter.com/search?q=%22Good+Thursday%22 | | %22Good+Thursday%22 | 18603.0 |
| 8 | Snake Island | http://twitter.com/search?q=%22Snake+Island%22 | | %22Snake+Island%22 | 28565.0 |
| 9 | #IOnlyAnswerMyPhoneWhen | http://twitter.com/search?q=%23IOnlyAnswerMyPh... | | %23IOnlyAnswerMyPhoneWhen | |


In [12]:
# Keep only trends with more than 50,000 tweets, sorted from highest to lowest volume.
new_york_trends_over50k_pd = (
    new_york_trends_pd[new_york_trends_pd['tweet_volume'] > 50000]
    .sort_values('tweet_volume', ascending=False)
)

# View the output.
print(new_york_trends_over50k_pd.shape)
new_york_trends_over50k_pd

(7, 5)


| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 13 | Liz Cheney | http://twitter.com/search?q=%22Liz+Cheney%22 | | %22Liz+Cheney%22 | 120971.0 |
| 39 | Baymax | http://twitter.com/search?q=Baymax | | Baymax | 91756.0 |
| 19 | Richarlison | http://twitter.com/search?q=Richarlison | | Richarlison | 71940.0 |
| 25 | Conte | http://twitter.com/search?q=Conte | | Conte | 69042.0 |
| 27 | Diana | http://twitter.com/search?q=Diana | | Diana | 60414.0 |
| 24 | Emmett Till | http://twitter.com/search?q=%22Emmett+Till%22 | | %22Emmett+Till%22 | 58642.0 |
| 30 | Everton | http://twitter.com/search?q=Everton | | Everton | 57071.0 |


In [13]:
# Save output as a CSV file.
new_york_trends_over50k_pd.to_csv('new_york_trends_over50k.csv', index=False)
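
Many trends have no `tweet_volume`; in the DataFrame these become missing values, and a comparison such as `> 50000` is false for them, so those rows drop out of the filtered result. The sketch below applies the same filter to the raw list of trend dictionaries without pandas, with the threshold as a parameter so that it can be changed, for example to the `200,000` mentioned in the brief. The function name `trends_over` is hypothetical; the sample values are taken from the New York output above.

```python
def trends_over(trends, min_volume):
    """Return trends with tweet_volume above min_volume, largest first."""
    # Skip trends for which Twitter reports no volume (None).
    counted = [t for t in trends if t.get("tweet_volume") is not None]
    kept = [t for t in counted if t["tweet_volume"] > min_volume]
    return sorted(kept, key=lambda t: t["tweet_volume"], reverse=True)


# A few trends from the New York output, reduced to two fields.
sample_trends = [
    {"name": "#thursdaymorning", "tweet_volume": None},
    {"name": "Snake Island", "tweet_volume": 28565},
    {"name": "Baymax", "tweet_volume": 91756},
    {"name": "Liz Cheney", "tweet_volume": 120971},
]

# View the names of trends above each threshold.
print([t["name"] for t in trends_over(sample_trends, 50000)])
print([t["name"] for t in trends_over(sample_trends, 200000)])
```

With live data, the call would be `trends_over(new_york_trends[0]['trends'], 50000)`.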

## London

In [16]:
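# Look at trends in London.
# Note: this overwrites `london` (the location list) with the trends response,
# so running this cell a second time raises a KeyError unless `london` is recreated first.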
london = twitter_api.trends.place(_id = london[0]['woeid'])

# View the output.
london

[{'trends': [{'name': 'Putin',
    'url': 'http://twitter.com/search?q=Putin',
    'promoted_content': None,
    'query': 'Putin',
    'tweet_volume': 243108},
   {'name': '#ThisMorning',
    'url': 'http://twitter.com/search?q=%23ThisMorning',
    'promoted_content': None,
    'query': '%23ThisMorning',
    'tweet_volume': None},
   {'name': 'Glen',
    'url': 'http://twitter.com/search?q=Glen',
    'promoted_content': None,
    'query': 'Glen',
    'tweet_volume': None},
   {'name': 'Congolese',
    'url': 'http://twitter.com/search?q=Congolese',
    'promoted_content': None,
    'query': 'Congolese',
    'tweet_volume': None},
   {'name': '#woncaeurope2022',
    'url': 'http://twitter.com/search?q=%23woncaeurope2022',
    'promoted_content': None,
    'query': '%23woncaeurope2022',
    'tweet_volume': None},
   {'name': '#RCGPAC',
    'url': 'http://twitter.com/search?q=%23RCGPAC',
    'promoted_content': None,
    'query': '%23RCGPAC',
    'tweet_volume': None},
   {'name': '#Metho

In [17]:
# Look at the output as a DataFrame.

# Create a DataFrame.
london_trends_pd = pd.DataFrame(london[0]['trends'])

# View the DataFrame.
london_trends_pd

| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 0 | Putin | http://twitter.com/search?q=Putin | | Putin | 243108.0 |
| 1 | #ThisMorning | http://twitter.com/search?q=%23ThisMorning | | %23ThisMorning | |
| 2 | Glen | http://twitter.com/search?q=Glen | | Glen | |
| 3 | Congolese | http://twitter.com/search?q=Congolese | | Congolese | |
| 4 | #woncaeurope2022 | http://twitter.com/search?q=%23woncaeurope2022 | | %23woncaeurope2022 | |
| 5 | #RCGPAC | http://twitter.com/search?q=%23RCGPAC | | %23RCGPAC | |
| 6 | #MethodistConf | http://twitter.com/search?q=%23MethodistConf | | %23MethodistConf | |
| 7 | Bernie Ecclestone | http://twitter.com/search?q=%22Bernie+Ecclesto... | | %22Bernie+Ecclestone%22 | |
| 8 | Richarlison | http://twitter.com/search?q=Richarlison | | Richarlison | 72011.0 |
| 9 | Snake Island | http://twitter.com/search?q=%22Snake+Island%22 | | %22Snake+Island%22 | 28690.0 |


In [18]:
# Keep only trends with more than 50,000 tweets, sorted from highest to lowest volume.
london_trends_over50k_pd = (
    london_trends_pd[london_trends_pd['tweet_volume'] > 50000]
    .sort_values('tweet_volume', ascending=False)
)

# View the output.
print(london_trends_over50k_pd.shape)
london_trends_over50k_pd

(6, 5)


| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 0 | Putin | http://twitter.com/search?q=Putin | | Putin | 243108.0 |
| 11 | Spurs | http://twitter.com/search?q=Spurs | | Spurs | 123465.0 |
| 8 | Richarlison | http://twitter.com/search?q=Richarlison | | Richarlison | 72011.0 |
| 21 | Conte | http://twitter.com/search?q=Conte | | Conte | 69179.0 |
| 12 | Everton | http://twitter.com/search?q=Everton | | Everton | 57122.0 |
| 45 | Formula 1 | http://twitter.com/search?q=%22Formula+1%22 | | %22Formula+1%22 | 54778.0 |


In [19]:
# Save output as a CSV file.
london_trends_over50k_pd.to_csv('london_trends_over50k.csv', index=False)

## Compare cities

In [20]:
# Find New York.
our_city = 'New York'

# Create a variable.
new_york = [_ for _ in trends_worldwide if _['name'] == our_city]

# View the output.
new_york[0]['woeid']

2459115

In [21]:
# Find London.
our_city_2 = 'London'

# Create a variable.
london = [_ for _ in trends_worldwide if _['name'] == our_city_2]

# View the output.
london[0]['woeid']

44418

In [22]:
# Get the trends for each city using its WOEID.
# Import JSON.
import json

# Get the trends for New York.
new_york_trends = twitter_api.trends.place(_id=2459115)

# View JSON output.
print(json.dumps(new_york_trends, indent=4))

[
    {
        "trends": [
            {
                "name": "#thursdaymorning",
                "url": "http://twitter.com/search?q=%23thursdaymorning",
                "promoted_content": null,
                "query": "%23thursdaymorning",
                "tweet_volume": null
            },
            {
                "name": "#VenmoMe",
                "url": "http://twitter.com/search?q=%23VenmoMe",
                "promoted_content": null,
                "query": "%23VenmoMe",
                "tweet_volume": 12735
            },
            {
                "name": "#ThursdayThoughts",
                "url": "http://twitter.com/search?q=%23ThursdayThoughts",
                "promoted_content": null,
                "query": "%23ThursdayThoughts",
                "tweet_volume": 13953
            },
            {
                "name": "Daily Quordle 157",
                "url": "http://twitter.com/search?q=%22Daily+Quordle+157%22",
                "promoted_content": nu

In [23]:
# Get the trends for London.
london_trends = twitter_api.trends.place(_id=44418)

# View JSON output.
print(json.dumps(london_trends, indent=4))

[
    {
        "trends": [
            {
                "name": "Putin",
                "url": "http://twitter.com/search?q=Putin",
                "promoted_content": null,
                "query": "Putin",
                "tweet_volume": 243785
            },
            {
                "name": "#ThisMorning",
                "url": "http://twitter.com/search?q=%23ThisMorning",
                "promoted_content": null,
                "query": "%23ThisMorning",
                "tweet_volume": null
            },
            {
                "name": "Glen",
                "url": "http://twitter.com/search?q=Glen",
                "promoted_content": null,
                "query": "Glen",
                "tweet_volume": null
            },
            {
                "name": "Congolese",
                "url": "http://twitter.com/search?q=Congolese",
                "promoted_content": null,
                "query": "Congolese",
                "tweet_volume": null
           

In [24]:
# List the names of the trends in New York.
new_york_trends_list = [trend['name'] for trend in new_york_trends[0]['trends']]

# View output.
print(new_york_trends_list)

['#thursdaymorning', '#VenmoMe', '#ThursdayThoughts', 'Daily Quordle 157', '#thursdayvibes', 'Wordle 376 X', 'Happy Friday Eve', 'Snake Island', '#IOnlyAnswerMyPhoneWhen', 'Good Thursday', '$25 USD', 'Miles Bridges', 'Thankful Thursday', 'Liz Cheney', 'Upper East Side', 'Jesus is Lord', 'Joe Rogan', 'LUMA', 'wooyoung', 'Pam Bondi', 'Richarlison', 'Tony Ornato', 'Killing in the Name', 'Emmett Till', 'Sutton', 'Kidney', 'Conte', 'Diana', 'Reagan Library', 'Favre', 'Guatemala', 'Snowden', 'RATM', 'Everton', 'Cornell', 'Cleaning', 'Josh Naylor', 'Tom Hiddleston', 'Rage Against the Machine', 'Rocco', 'Baymax', '#SocialMediaDay', '#thunderous', '#ThursdayMotivation', '#ThirstyThursday', '#KetanjiBrownJackson', '#happytinacon', '#KidLitPit', '#waffle160', '#RHOBH']


In [25]:
# List the names of the trends in London.
london_trends_list = [trend['name'] for trend in london_trends[0]['trends']]

# View output.
print(london_trends_list)

['Putin', '#ThisMorning', 'Glen', 'Congolese', '#woncaeurope2022', '#RCGPAC', '#MethodistConf', 'Bernie Ecclestone', 'Richarlison', 'Snake Island', 'RIP Rocky', 'Spurs', 'Everton', 'Size 9', 'Moyes', 'Danjuma', 'Size 10', 'Natalie McGarry', 'Captain Tom', 'Lingard', 'Kane', 'Conte', 'Ronaldinho', 'romero', 'Killing in the Name', 'Daily Quordle 157', 'Arlene', 'Dennis', 'iPads', 'Alexandro Bernabei', 'Tom Moore', 'wooyoung', 'Perisic', 'Crown Estate', 'Joma', 'Monkey Island', 'Buckingham Palace', 'Barbados', 'Gavin Williamson', 'Lando', 'Meghan', 'Liz Truss', 'Wordle 376 X', 'Formula 1', 'Tom Hiddleston', 'Branson', 'Richy', 'Royal Family', "O'Brien"]


In [26]:
# Convert both lists to sets.
new_york_trends_set = set(new_york_trends_list)
london_trends_set = set(london_trends_list)

# Find the trends that appear in both cities.
common_trends = new_york_trends_set.intersection(london_trends_set)

# View output.
print(common_trends)

{'Wordle 376 X', 'Everton', 'wooyoung', 'Tom Hiddleston', 'Richarlison', 'Snake Island', 'Killing in the Name', 'Daily Quordle 157', 'Conte'}
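
The set intersection above extends to any number of cities, and the result can then be checked for Bitcoin, which is the question in the brief. The sketch below shows one possible way to do both. The function names `common_trends_across` and `mentions` are hypothetical, the keyword match ignores case, and the sample lists are short extracts of the New York and London output above.

```python
def common_trends_across(*trend_lists):
    """Return the trend names that appear in every list."""
    if not trend_lists:
        return set()
    return set.intersection(*(set(names) for names in trend_lists))


def mentions(trend_names, keyword):
    """Return the sorted trend names that contain keyword, ignoring case."""
    keyword = keyword.casefold()
    return sorted(name for name in trend_names if keyword in name.casefold())


# Short extracts of the trend names printed above.
new_york_sample = ["Snake Island", "Richarlison", "Liz Cheney", "Conte"]
london_sample = ["Putin", "Richarlison", "Snake Island", "Conte"]

shared = common_trends_across(new_york_sample, london_sample)

# View the shared trends and any that mention Bitcoin.
print(sorted(shared))
print(mentions(shared, "bitcoin"))
```

An empty list from `mentions` means that Bitcoin is not a shared trending topic, which is the case for the output above.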


## Search for #Bitcoin

In [27]:
# Search for tweets that contain #Bitcoin.
bitcoin_tweets = twitter_api.search.tweets(q="#Bitcoin")

# View JSON output.
print(json.dumps(bitcoin_tweets, indent=4))

{
    "statuses": [
        {
            "created_at": "Thu Jun 30 12:29:22 +0000 2022",
            "id": 1542485444163665921,
            "id_str": "1542485444163665921",
            "text": "RT @BitcoinMagazine: The Bank of International Settlements proposed today that banks be allowed to hold 1% of reserves in #bitcoin \n\nThat's\u2026",
            "truncated": false,
            "entities": {
                "hashtags": [
                    {
                        "text": "bitcoin",
                        "indices": [
                            122,
                            130
                        ]
                    }
                ],
                "symbols": [],
                "user_mentions": [
                    {
                        "screen_name": "BitcoinMagazine",
                        "name": "Bitcoin Magazine",
                        "id": 361289499,
                        "id_str": "361289499",
                        "indices": [
          