### LSE Data Analytics Online Career Accelerator

# DA201: Data Analytics using Python

## Practical activity: Search the Twitter API

**Important**

This activity uses the Twitter API, which returns live data based on current events. Therefore, your output will differ from the outputs provided. For example, the colour of apples might have been trending yesterday, while the aerodynamics of aeroplanes is trending today.

**This is the solution to the activity.**

The story of Bitcoin and other cryptocurrencies has captured investors like few financial stories have. Many finance firms are looking to invest in the crypto market. As a data analyst at a financial institution, your manager has tasked you with investigating Bitcoin in a little more detail, particularly in terms of future growth of the currency and its use in the United States. 

Previously, you accessed Bitcoin data through the CoinGecko API. Now, your manager asks you to turn your attention to Twitter, particularly tweets on Bitcoin and cryptocurrency in general. She wants you to check whether Bitcoin is trending in `New York, Los Angeles, Sydney, Auckland, and Dubai`.

She also wants to see a DataFrame of topics with over `200,000` tweets for each city. 

Your manager then wants you to cross-check trending topics between the `United States and the UK` to see what people are talking about in both countries and whether Bitcoin forms part of the larger conversation. If Bitcoin is not a shared trending topic, she asks that you search Twitter for `#Bitcoin` and two other cryptocurrency hashtags of your choice, and analyse the top two tweets returned for each hashtag, particularly in terms of their popularity.

## 1. Prepare your workstation

In [3]:
# Import warnings and suppress them to keep the notebook output readable.
import warnings
warnings.filterwarnings('ignore')

In [4]:
# Install PyYAML.
!pip install pyyaml



In [5]:
# Import libraries.
import yaml
from yaml.loader import SafeLoader

In [6]:
# Load the YAML credentials file - remember to specify the whole path.
# The context manager closes the file once it has been read.
with open('twitter_tmp.yaml', 'r') as file:
    twitter_creds = yaml.safe_load(file)

# View the keys in the dictionary.
twitter_creds.keys()

dict_keys(['api_key', 'api_secret_key', 'access_token', 'access_token_secret'])
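
Before passing the credentials on, it can help to confirm that every key the later `OAuth` call reads is present and non-empty. The following is a minimal sketch under stated assumptions: `missing_credential_keys` is a hypothetical helper (it is not part of PyYAML or the `twitter` package), and the dictionary holds placeholder values rather than a real credentials file.

```python
# Keys that the OAuth call in this notebook reads from the credentials dictionary.
REQUIRED_KEYS = ('api_key', 'api_secret_key', 'access_token', 'access_token_secret')


def missing_credential_keys(creds):
    """Return the required keys that are absent or empty in `creds`."""
    return [key for key in REQUIRED_KEYS if not creds.get(key)]


# Placeholder values for illustration only; never commit real credentials.
example_creds = {
    'api_key': 'xxx',
    'api_secret_key': 'xxx',
    'access_token': 'xxx',
    'access_token_secret': '',
}

# Lists any key that still needs a value.
print(missing_credential_keys(example_creds))
```

An empty list means the dictionary has everything the connection step needs.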

In [7]:
# Install the twitter package (a Python wrapper for the Twitter API).
!pip install twitter



In [8]:
# Import only the classes used in this notebook.
from twitter import Twitter, OAuth

In [9]:
# Pass your Twitter credentials.
# OAuth expects: token, token secret, consumer key, consumer secret.
twitter_api = Twitter(auth=OAuth(twitter_creds['access_token'],
                                 twitter_creds['access_token_secret'],
                                 twitter_creds['api_key'],
                                 twitter_creds['api_secret_key']))

In [10]:
# Confirm that the API object was created (this step does not yet contact Twitter).
print(twitter_api)

<twitter.api.Twitter object at 0x0000015B58DA9070>


In [11]:
# Run a test with #python.
python_tweets = twitter_api.search.tweets(q='#python')

# View output.
print(python_tweets)

{'statuses': [{'created_at': 'Tue Oct 11 15:33:40 +0000 2022', 'id': 1579857775274110976, 'id_str': '1579857775274110976', 'text': 'RT @waris027: https://t.co/sdMBeIZsGU\n\n#BigData #Analytics #DataScience #AI #MachineLearning #IoT #IIoT #Python #RStats #TensorFlow #JavaSc…', 'truncated': False, 'entities': {'hashtags': [{'text': 'BigData', 'indices': [39, 47]}, {'text': 'Analytics', 'indices': [48, 58]}, {'text': 'DataScience', 'indices': [59, 71]}, {'text': 'AI', 'indices': [72, 75]}, {'text': 'MachineLearning', 'indices': [76, 92]}, {'text': 'IoT', 'indices': [93, 97]}, {'text': 'IIoT', 'indices': [98, 103]}, {'text': 'Python', 'indices': [104, 111]}, {'text': 'RStats', 'indices': [112, 119]}, {'text': 'TensorFlow', 'indices': [120, 131]}], 'symbols': [], 'user_mentions': [{'screen_name': 'waris027', 'name': 'Malik Waris (میانوالی)', 'id': 431603776, 'id_str': '431603776', 'indices': [3, 12]}], 'urls': [{'url': 'https://t.co/sdMBeIZsGU', 'expanded_url': 'https://www.digiskillspk.com
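
The raw search response is hard to read when printed in full. As a rough sketch, the tweets can be reduced to their text and hashtags by walking the `statuses` list and the `entities` → `hashtags` entries visible in the output above. The helper name `summarise_statuses` and the sample dictionary are made up for illustration; only the key names are taken from the printed response.

```python
def summarise_statuses(search_result):
    """Return (text, [hashtags]) pairs for each tweet in a search response."""
    summary = []
    for status in search_result.get('statuses', []):
        entities = status.get('entities', {})
        hashtags = [tag['text'] for tag in entities.get('hashtags', [])]
        summary.append((status.get('text', ''), hashtags))
    return summary


# Hand-made sample that mimics the shape of the printed response.
sample_result = {
    'statuses': [
        {
            'text': 'Learning #Python and #DataScience',
            'entities': {'hashtags': [{'text': 'Python', 'indices': [9, 16]},
                                      {'text': 'DataScience', 'indices': [21, 33]}]},
        },
        {'text': 'No tags here', 'entities': {'hashtags': []}},
    ]
}

# Print the hashtags next to the text of each tweet.
for text, hashtags in summarise_statuses(sample_result):
    print(hashtags, '|', text)
```

With a live connection, the same function could be called on `python_tweets` instead of the sample.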

## 2. Identify New York and London

In [12]:
# Retrieve the locations for which Twitter has trending topics.
trends_worldwide = twitter_api.trends.available()

# How many locations are available?
print(len(trends_worldwide))

# View the first location as an example.
trends_worldwide[0]

467


{'name': 'Worldwide',
 'placeType': {'code': 19, 'name': 'Supername'},
 'url': 'http://where.yahooapis.com/v1/place/1',
 'parentid': 0,
 'country': '',
 'woeid': 1,
 'countryCode': None}
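
Because each location record carries a `countryCode` and a `placeType`, the list can also be narrowed to the towns of one country, which may be useful when comparing the United States with the UK. This is a sketch only: `places_in_country` is a hypothetical helper, and the sample records are illustrative. In particular, the `Town` place type (code 7) and the `US`/`GB` country codes are assumptions about what the API returns, not values shown in this notebook.

```python
def places_in_country(locations, country_code, place_type='Town'):
    """Return (name, woeid) pairs for one country and place type, sorted by name."""
    return sorted(
        (place['name'], place['woeid'])
        for place in locations
        if place['countryCode'] == country_code
        and place['placeType']['name'] == place_type
    )


# Illustrative records in the same shape as trends_worldwide[0].
sample_locations = [
    {'name': 'Worldwide', 'placeType': {'code': 19, 'name': 'Supername'},
     'country': '', 'countryCode': None, 'woeid': 1},
    {'name': 'New York', 'placeType': {'code': 7, 'name': 'Town'},
     'country': 'United States', 'countryCode': 'US', 'woeid': 2459115},
    {'name': 'London', 'placeType': {'code': 7, 'name': 'Town'},
     'country': 'United Kingdom', 'countryCode': 'GB', 'woeid': 44418},
]

# List the towns recorded for the United States in the sample.
print(places_in_country(sample_locations, 'US'))
```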

## New York

In [13]:
# Find New York.
our_city = 'New York'

# Keep only the locations whose name matches the city.
new_york = [place for place in trends_worldwide if place['name'] == our_city]

# Check how many matches were found.
print(len(new_york))

# Use the index to select New York and return its
# Where On Earth IDentifier (WOEID).
new_york[0]['woeid']

1


2459115

## London

In [14]:
# Find London.
our_city_2 = 'London'

# Keep only the locations whose name matches the city.
london = [place for place in trends_worldwide if place['name'] == our_city_2]

# Check how many matches were found.
print(len(london))

# Use the index to select London and return its
# Where On Earth IDentifier (WOEID).
london[0]['woeid']

1


44418
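
The New York and London lookups follow the same pattern, so they could be wrapped in one function that also guards against a missing or ambiguous city name. The sketch below assumes a hypothetical helper name, `find_woeid`, and uses a cut-down sample list that contains only the `name` and `woeid` values shown in this notebook.

```python
def find_woeid(locations, name):
    """Return the WOEID of the single location called `name`.

    Raises ValueError when the name is missing or matches several locations.
    """
    matches = [place for place in locations if place['name'] == name]
    if len(matches) != 1:
        raise ValueError(
            f'expected exactly one match for {name!r}, found {len(matches)}'
        )
    return matches[0]['woeid']


# Cut-down sample of the location list (only the fields used here).
sample_locations = [
    {'name': 'Worldwide', 'woeid': 1},
    {'name': 'New York', 'woeid': 2459115},
    {'name': 'London', 'woeid': 44418},
]

print(find_woeid(sample_locations, 'London'))
```

Raising an error instead of indexing `[0]` directly makes a misspelt city name fail with a clear message rather than an `IndexError`.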

## 3. Common trends

## New York

In [15]:
# Look at trends in New York.
new_york_trends = twitter_api.trends.place(_id=new_york[0]['woeid'])

# View the output.
new_york_trends

[{'trends': [{'name': 'ALCS',
    'url': 'http://twitter.com/search?q=ALCS',
    'promoted_content': None,
    'query': 'ALCS',
    'tweet_volume': None},
   {'name': '#NoQuitInNY',
    'url': 'http://twitter.com/search?q=%23NoQuitInNY',
    'promoted_content': None,
    'query': '%23NoQuitInNY',
    'tweet_volume': None},
   {'name': 'Rangers',
    'url': 'http://twitter.com/search?q=Rangers',
    'promoted_content': None,
    'query': 'Rangers',
    'tweet_volume': 20936},
   {'name': 'Tim Ryan',
    'url': 'http://twitter.com/search?q=%22Tim+Ryan%22',
    'promoted_content': None,
    'query': '%22Tim+Ryan%22',
    'tweet_volume': 162791},
   {'name': 'Tulsi',
    'url': 'http://twitter.com/search?q=Tulsi',
    'promoted_content': None,
    'query': 'Tulsi',
    'tweet_volume': 77625},
   {'name': '#NationalComingOutDay',
    'url': 'http://twitter.com/search?q=%23NationalComingOutDay',
    'promoted_content': None,
    'query': '%23NationalComingOutDay',
    'tweet_volume': 14083},

In [16]:
# Look at the output as a DataFrame.
# Import pandas.
import pandas as pd

# Create a DataFrame from the list of trends.
new_york_trends_pd = pd.DataFrame(new_york_trends[0]['trends'])

# View the DataFrame.
new_york_trends_pd

Unnamed: 0,name,url,promoted_content,query,tweet_volume
0,ALCS,http://twitter.com/search?q=ALCS,,ALCS,
1,#NoQuitInNY,http://twitter.com/search?q=%23NoQuitInNY,,%23NoQuitInNY,
2,Rangers,http://twitter.com/search?q=Rangers,,Rangers,20936.0
3,Tim Ryan,http://twitter.com/search?q=%22Tim+Ryan%22,,%22Tim+Ryan%22,162791.0
4,Tulsi,http://twitter.com/search?q=Tulsi,,Tulsi,77625.0
5,#NationalComingOutDay,http://twitter.com/search?q=%23NationalComingO...,,%23NationalComingOutDay,14083.0
6,#PortfolioDay,http://twitter.com/search?q=%23PortfolioDay,,%23PortfolioDay,57434.0
7,Democrat,http://twitter.com/search?q=Democrat,,Democrat,191844.0
8,Peraza,http://twitter.com/search?q=Peraza,,Peraza,
9,Mbappe,http://twitter.com/search?q=Mbappe,,Mbappe,172775.0


In [17]:
# Keep only trends with more than 50,000 tweets, sorted from highest to lowest volume.
new_york_trends_over50k_pd = (
    new_york_trends_pd[new_york_trends_pd['tweet_volume'] > 50000]
    .sort_values('tweet_volume', ascending=False)
)

# View the output.
print(new_york_trends_over50k_pd.shape)
new_york_trends_over50k_pd

(11, 5)


Unnamed: 0,name,url,promoted_content,query,tweet_volume
22,Madrid,http://twitter.com/search?q=Madrid,,Madrid,213791.0
7,Democrat,http://twitter.com/search?q=Democrat,,Democrat,191844.0
9,Mbappe,http://twitter.com/search?q=Mbappe,,Mbappe,172775.0
14,#chainsawman,http://twitter.com/search?q=%23chainsawman,,%23chainsawman,171960.0
3,Tim Ryan,http://twitter.com/search?q=%22Tim+Ryan%22,,%22Tim+Ryan%22,162791.0
41,Independent,http://twitter.com/search?q=Independent,,Independent,114817.0
44,Genie,http://twitter.com/search?q=Genie,,Genie,103691.0
18,Notifications,http://twitter.com/search?q=Notifications,,Notifications,79508.0
4,Tulsi,http://twitter.com/search?q=Tulsi,,Tulsi,77625.0
6,#PortfolioDay,http://twitter.com/search?q=%23PortfolioDay,,%23PortfolioDay,57434.0
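
The filter silently drops trends whose `tweet_volume` is missing, because a missing value never compares as greater than the threshold. The plain-Python sketch below makes that rule explicit and turns the threshold into a parameter, so the same logic can be reused for the 200,000 cut-off mentioned in the brief. `trends_over` is a hypothetical helper, and the sample rows are copied from the New York output in this notebook.

```python
def trends_over(trends, threshold):
    """Return trends whose tweet_volume exceeds `threshold`, largest first.

    Trends with no reported volume (None) are skipped.
    """
    kept = [
        trend for trend in trends
        if trend['tweet_volume'] is not None and trend['tweet_volume'] > threshold
    ]
    return sorted(kept, key=lambda trend: trend['tweet_volume'], reverse=True)


# A few rows copied from the New York trends output.
sample_trends = [
    {'name': 'ALCS', 'tweet_volume': None},
    {'name': 'Rangers', 'tweet_volume': 20936},
    {'name': 'Tim Ryan', 'tweet_volume': 162791},
    {'name': 'Democrat', 'tweet_volume': 191844},
    {'name': 'Madrid', 'tweet_volume': 213791},
]

# Compare the 50,000 threshold with the 200,000 threshold.
print([trend['name'] for trend in trends_over(sample_trends, 50000)])
print([trend['name'] for trend in trends_over(sample_trends, 200000)])
```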


In [18]:
# Save output as a CSV file.
new_york_trends_over50k_pd.to_csv('new_york_trends_over50k.csv', index=False)

## London

In [19]:
# Look at trends in London.
# Note: this reassigns `london` from the location list to the trends response.
london = twitter_api.trends.place(_id=london[0]['woeid'])

# View the output.
london

[{'trends': [{'name': '#WorldMentalHealthDay',
    'url': 'http://twitter.com/search?q=%23WorldMentalHealthDay',
    'promoted_content': None,
    'query': '%23WorldMentalHealthDay',
    'tweet_volume': 221482},
   {'name': 'Mbappe',
    'url': 'http://twitter.com/search?q=Mbappe',
    'promoted_content': None,
    'query': 'Mbappe',
    'tweet_volume': 172775},
   {'name': '#avfc',
    'url': 'http://twitter.com/search?q=%23avfc',
    'promoted_content': None,
    'query': '%23avfc',
    'tweet_volume': 16356},
   {'name': '#NationalComingOutDay',
    'url': 'http://twitter.com/search?q=%23NationalComingOutDay',
    'promoted_content': None,
    'query': '%23NationalComingOutDay',
    'tweet_volume': 14164},
   {'name': '#DetestToryValues',
    'url': 'http://twitter.com/search?q=%23DetestToryValues',
    'promoted_content': None,
    'query': '%23DetestToryValues',
    'tweet_volume': 24382},
   {'name': 'John Cleese',
    'url': 'http://twitter.com/search?q=%22John+Cleese%22',
    '

In [20]:
# Look at the output as a DataFrame.

# Create a DataFrame.
london_trends_pd = pd.DataFrame(london[0]['trends'])

# View the DataFrame.
london_trends_pd

Unnamed: 0,name,url,promoted_content,query,tweet_volume
0,#WorldMentalHealthDay,http://twitter.com/search?q=%23WorldMentalHeal...,,%23WorldMentalHealthDay,221482.0
1,Mbappe,http://twitter.com/search?q=Mbappe,,Mbappe,172775.0
2,#avfc,http://twitter.com/search?q=%23avfc,,%23avfc,16356.0
3,#NationalComingOutDay,http://twitter.com/search?q=%23NationalComingO...,,%23NationalComingOutDay,14164.0
4,#DetestToryValues,http://twitter.com/search?q=%23DetestToryValues,,%23DetestToryValues,24382.0
5,John Cleese,http://twitter.com/search?q=%22John+Cleese%22,,%22John+Cleese%22,13254.0
6,Nicola Sturgeon,http://twitter.com/search?q=%22Nicola+Sturgeon%22,,%22Nicola+Sturgeon%22,67370.0
7,Gerrard,http://twitter.com/search?q=Gerrard,,Gerrard,12961.0
8,Blink 182,http://twitter.com/search?q=%22Blink+182%22,,%22Blink+182%22,51355.0
9,Scotland,http://twitter.com/search?q=Scotland,,Scotland,89395.0


In [21]:
# Keep only trends with more than 50,000 tweets, sorted from highest to lowest volume.
london_trends_over50k_pd = (
    london_trends_pd[london_trends_pd['tweet_volume'] > 50000]
    .sort_values('tweet_volume', ascending=False)
)

# View the output.
print(london_trends_over50k_pd.shape)
london_trends_over50k_pd

(15, 5)


Unnamed: 0,name,url,promoted_content,query,tweet_volume
28,Kyiv,http://twitter.com/search?q=Kyiv,,Kyiv,247789.0
0,#WorldMentalHealthDay,http://twitter.com/search?q=%23WorldMentalHeal...,,%23WorldMentalHealthDay,221482.0
14,Madrid,http://twitter.com/search?q=Madrid,,Madrid,214146.0
1,Mbappe,http://twitter.com/search?q=Mbappe,,Mbappe,172775.0
39,Viserys,http://twitter.com/search?q=Viserys,,Viserys,156557.0
10,Arsenal,http://twitter.com/search?q=Arsenal,,Arsenal,149794.0
24,Rex Orange County,http://twitter.com/search?q=%22Rex+Orange+Coun...,,%22Rex+Orange+County%22,137724.0
25,Liverpool,http://twitter.com/search?q=Liverpool,,Liverpool,117526.0
27,Gabriel,http://twitter.com/search?q=Gabriel,,Gabriel,98199.0
9,Scotland,http://twitter.com/search?q=Scotland,,Scotland,89395.0


In [22]:
# Save output as a CSV file.
london_trends_over50k_pd.to_csv('london_trends_over50k.csv', index=False)

## Common trends

In [23]:
# Find New York.
our_city = 'New York'

# Keep only the locations whose name matches the city.
new_york = [place for place in trends_worldwide if place['name'] == our_city]

# View the output.
new_york[0]['woeid']

2459115

In [24]:
# Find London.
our_city_2 = 'London'

# Rebuild the list of matching locations
# (`london` was reassigned to the trends response earlier).
london = [place for place in trends_worldwide if place['name'] == our_city_2]

# View the output.
london[0]['woeid']

44418

In [25]:
# Search for each city.
# Import JSON.
import json

# Search for New York.
new_york_trends = twitter_api.trends.place(_id=2459115)

# View JSON output.
print(json.dumps(new_york_trends, indent=4))

[
    {
        "trends": [
            {
                "name": "ALCS",
                "url": "http://twitter.com/search?q=ALCS",
                "promoted_content": null,
                "query": "ALCS",
                "tweet_volume": null
            },
            {
                "name": "#NoQuitInNY",
                "url": "http://twitter.com/search?q=%23NoQuitInNY",
                "promoted_content": null,
                "query": "%23NoQuitInNY",
                "tweet_volume": null
            },
            {
                "name": "Rangers",
                "url": "http://twitter.com/search?q=Rangers",
                "promoted_content": null,
                "query": "Rangers",
                "tweet_volume": 20957
            },
            {
                "name": "Tim Ryan",
                "url": "http://twitter.com/search?q=%22Tim+Ryan%22",
                "promoted_content": null,
                "query": "%22Tim+Ryan%22",
                "tweet_volume": 16290

In [26]:
# Search for London.
london_trends = twitter_api.trends.place(_id=44418)

# View JSON output.
print(json.dumps(london_trends, indent=4))

[
    {
        "trends": [
            {
                "name": "#WorldMentalHealthDay",
                "url": "http://twitter.com/search?q=%23WorldMentalHealthDay",
                "promoted_content": null,
                "query": "%23WorldMentalHealthDay",
                "tweet_volume": 221537
            },
            {
                "name": "Mbappe",
                "url": "http://twitter.com/search?q=Mbappe",
                "promoted_content": null,
                "query": "Mbappe",
                "tweet_volume": 173626
            },
            {
                "name": "#avfc",
                "url": "http://twitter.com/search?q=%23avfc",
                "promoted_content": null,
                "query": "%23avfc",
                "tweet_volume": 16358
            },
            {
                "name": "#NationalComingOutDay",
                "url": "http://twitter.com/search?q=%23NationalComingOutDay",
                "promoted_content": null,
                "que

In [27]:
# List the names of the trends in New York.
new_york_trends_list = [trend['name'] for trend in new_york_trends[0]['trends']]

# View output.
print(new_york_trends_list)

['ALCS', '#NoQuitInNY', 'Rangers', 'Tim Ryan', 'Tulsi', '#NationalComingOutDay', '#PortfolioDay', 'Democrat', 'Peraza', 'Mbappe', 'Blink 182', 'Marwin', 'Effross', '#drivendragon', '#chainsawman', 'Adnan Syed', 'Tom DeLonge', 'Bye Felicia', 'Jonathan Majors', 'Notifications', 'Hannity', 'Matt Skiba', 'Madrid', 'Tommy John', 'Locastro', 'Tito Puente', 'No DJ', 'Turnstile', 'Neymar', 'David Robertson', 'Hae Min Lee', 'LETS GO ASTROS', 'Good Riddance', 'Matzek', 'TSSF', 'Odorizzi', 'Notis', 'TOM IS BACK', 'Rise Against', 'Sinema', 'Daily Quordle 260', 'Independent', 'Russian Asset', "She's a Russian", 'Genie', 'Aaron Hicks', 'Honda', 'Benintendi', 'Alkaline Trio', 'WITH TOM']


In [28]:
# List the names of the trends in London.
london_trends_list = [trend['name'] for trend in london_trends[0]['trends']]

# View output.
print(london_trends_list)

['#WorldMentalHealthDay', 'Mbappe', '#avfc', '#NationalComingOutDay', '#DetestToryValues', 'John Cleese', 'Nicola Sturgeon', 'Gerrard', 'Blink 182', 'Scotland', 'Arsenal', 'Ashley Young', 'Sam Tarry', 'Madrid', 'Villa', 'Festival of Brexit', 'Red Bull', 'Paddy Considine', 'Henderson', 'Therese Coffey', 'british cycling', 'Neymar', 'Deputy PM', 'Daily Quordle 260', 'Rex Orange County', 'Liverpool', 'Yates', 'Gabriel', 'Kyiv', 'Tom DeLonge', 'Jamie Oliver', 'Ilford South', 'mcginn', 'Lucy Letby', 'Coutinho', 'Martinelli', 'Scots', 'The IMF', 'GB News', 'Viserys', 'Enock', 'PSG in January', 'Purslow', 'Klopp', 'Gallows', 'Trent', 'Diaz', 'mings', 'Steve Bruce']


In [29]:
# Convert the lists of trend names to sets.
new_york_trends_set = set(new_york_trends_list)
london_trends_set = set(london_trends_list)

# Find the trends that appear in both cities.
common_trends = new_york_trends_set.intersection(london_trends_set)

# View output.
print(common_trends)

{'Neymar', 'Madrid', 'Mbappe', '#NationalComingOutDay', 'Tom DeLonge', 'Daily Quordle 260', 'Blink 182'}
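
To answer the manager's question directly, the shared trends can be checked for any name that mentions Bitcoin. The sketch below uses a case-insensitive substring match, so that `#Bitcoin`, `bitcoin` and `Bitcoin price` would all count. `mentions_term` is a hypothetical helper, and the set is copied from the output of this run, so live results will differ.

```python
def mentions_term(trend_names, term):
    """Return the trend names that contain `term`, ignoring case."""
    needle = term.casefold()
    return sorted(name for name in trend_names if needle in name.casefold())


# The shared trends printed in this notebook run; live results will differ.
common_trends = {'Neymar', 'Madrid', 'Mbappe', '#NationalComingOutDay',
                 'Tom DeLonge', 'Daily Quordle 260', 'Blink 182'}

# An empty list means that Bitcoin is not among the shared trends.
bitcoin_matches = mentions_term(common_trends, 'bitcoin')
print(bitcoin_matches or 'Bitcoin is not a shared trending topic.')
```

For this run the list is empty, which is why the activity moves on to searching for `#Bitcoin` directly.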


## Search for #Bitcoin

In [30]:
# Search for tweets containing #Bitcoin.
bitcoin_tweets = twitter_api.search.tweets(q="#Bitcoin")

# View JSON output.
print(json.dumps(bitcoin_tweets, indent=4))

{
    "statuses": [
        {
            "created_at": "Tue Oct 11 15:35:36 +0000 2022",
            "id": 1579858261658198016,
            "id_str": "1579858261658198016",
            "text": "RT @Coinmatik1: \u00c7EK\u0130L\u0130\u015e VAR!!! \ud83d\udce2\n\n\u27054 K\u0130\u015e\u0130YE $200.00 DE\u011eER\u0130NDE \u00d6D\u00dcL DA\u011eITIYORUZ.\n\n\u2705Kat\u0131l\u0131m \u015eartlar\u0131;\nTakip et; @Ufuk__Karapinar \nTakip et; @\u2026",
            "truncated": false,
            "entities": {
                "hashtags": [],
                "symbols": [],
                "user_mentions": [
                    {
                        "screen_name": "Coinmatik1",
                        "name": "Paramatik",
                        "id": 1309740525470265344,
                        "id_str": "1309740525470265344",
                        "indices": [
                            3,
                            14
                        ]
                    },
                  