### LSE Data Analytics Online Career Accelerator

# DA201: Data Analytics using Python

## Practical activity: Search the Twitter API

**Important**

In this activity, you will work with the Twitter API. The Twitter API reflects live and current events, so your output will differ from the outputs provided. For example, the colour of apples may have been trending yesterday, while today the aerodynamics of aeroplanes is trending.

**This is the solution to the activity.**

The story of Bitcoin and other cryptocurrencies has captured investors like few financial stories have. Many finance firms are looking to invest in the crypto market. As a data analyst at a financial institution, your manager has tasked you with investigating Bitcoin in a little more detail, particularly in terms of future growth of the currency and its use in the United States. 

Previously, you accessed Bitcoin data through the CoinGecko API. Your manager now asks you to turn your attention to Twitter, particularly tweets about Bitcoin and cryptocurrency in general. She wants you to check whether Bitcoin is trending in `New York, Los Angeles, Sydney, Auckland, and Dubai`.

She also wants to see a DataFrame of topics with over `200,000` tweets for each city. 

Your manager then wants you to cross-check trending topics between the `United States and the UK`, to see what people are talking about in both countries and whether Bitcoin forms part of the larger conversation. If Bitcoin is not a shared trending topic, she asks that you search Twitter for `#Bitcoin` and two other cryptocurrency hashtags of your choice, and analyse the top two tweets returned for each hashtag, particularly in terms of their popularity.

## 1. Prepare your workstation

In [None]:
# Save the YAML file with your Twitter keys where this Jupyter Notebook can read it before you start to work.
import yaml
from twitter import Twitter, OAuth

# Import the YAML file - remember to specify the whole path.
# The with statement closes the file once it has been read.
with open('C:/Users/hamh/Dropbox/Coding et al/LSE/2_Python/LSE_DA201_Python/C3_LSE_DA201/Week 5/twitter.yaml', 'r') as file:
    twitter_creds = yaml.safe_load(file)

# Pass your Twitter credentials.
twitter_api = Twitter(auth=OAuth(twitter_creds['access_token'],
                                 twitter_creds['access_token_secret'], 
                                 twitter_creds['api_key'],
                                 twitter_creds['api_secret_key'] ))
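
Before passing the credentials, it can help to confirm that the YAML file contains the four keys read in the cell above. The following is a minimal sketch: `REQUIRED_KEYS` and `missing_credentials` are hypothetical names introduced here for illustration (they are not part of the `twitter` library), and the sample dictionary holds placeholder values rather than real keys.

```python
# Hypothetical helper: report which of the four keys used in the OAuth call
# are absent or empty in the dictionary loaded from the YAML file.
REQUIRED_KEYS = ('access_token', 'access_token_secret',
                 'api_key', 'api_secret_key')


def missing_credentials(creds):
    """Return the required keys that are missing or empty in creds."""
    return [key for key in REQUIRED_KEYS if not creds.get(key)]


# Placeholder values for illustration only.
sample_creds = {'access_token': 'xxx',
                'access_token_secret': 'xxx',
                'api_key': 'xxx'}

# List the keys that still need to be added to the YAML file.
print(missing_credentials(sample_creds))
```

An empty list means all four keys are present; with your own file, you could call `missing_credentials(twitter_creds)` in the same way.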

In [None]:
# See if you are connected.
print(twitter_api)

In [None]:
# Run a test with #python.
python_tweets = twitter_api.search.tweets(q="#python")

# View the output.
print(python_tweets)

## 2. Identify New York and London

In [None]:
# Retrieve the locations for which Twitter has trend data.
trends_worldwide = twitter_api.trends.available()

# How many locations are available?
print(len(trends_worldwide))

# Example of an entry in trends_worldwide.
trends_worldwide[0]

## New York

In [None]:
# Find New York.
our_city = 'New York'

# Keep the locations whose name matches the city.
new_york = [loc for loc in trends_worldwide if loc['name'] == our_city]

# View the number of matches.
print(len(new_york))

# Use the index to view the New York entry.
print(new_york[0])

# View the Where On Earth ID (WOEID).
new_york[0]['woeid']

## London

In [None]:
# Find London.
our_city_2 = 'London'

# Keep the locations whose name matches the city.
london = [loc for loc in trends_worldwide if loc['name'] == our_city_2]

# View the number of matches.
print(len(london))

# Use the index to view the London entry.
print(london[0])

# View the Where On Earth ID (WOEID).
london[0]['woeid']
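
The same lookup is repeated for each city, so it could be wrapped in a small function. The sketch below assumes each location is a dictionary with at least `'name'` and `'woeid'` keys, as used in the cells above. `find_woeid` and `sample_locations` are hypothetical names introduced here, and the sample list is a cut-down illustration rather than real API output.

```python
def find_woeid(locations, city):
    """Return the WOEID of the first location named city, or None."""
    for loc in locations:
        if loc['name'] == city:
            return loc['woeid']
    # No location with that name was found.
    return None


# Cut-down sample for illustration; a real response has many more
# locations and more keys per entry.
sample_locations = [
    {'name': 'London', 'woeid': 44418},
    {'name': 'New York', 'woeid': 2459115},
]

print(find_woeid(sample_locations, 'New York'))
print(find_woeid(sample_locations, 'Atlantis'))
```

Returning `None` for an unknown city avoids the `IndexError` that `[0]` would raise on an empty list. With live data, you could pass `trends_worldwide` as the first argument.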

## 3. Common trends

## New York

In [None]:
# Look at trends in New York.
new_york_trends = twitter_api.trends.place(_id=new_york[0]['woeid'])

# View the output.
new_york_trends

In [None]:
# Look at the output as a DataFrame.
# Import Pandas.
import pandas as pd

# Create a DataFrame.
new_york_trends_pd = pd.DataFrame(new_york_trends[0]['trends'])

# View the DataFrame.
new_york_trends_pd

In [None]:
# Keep only the trends with more than 50,000 tweets, sorted by volume.
new_york_trends_over50k_pd = (
    new_york_trends_pd[new_york_trends_pd['tweet_volume'] > 50000]
    .sort_values('tweet_volume', ascending=False)
)

# View the output.
print(new_york_trends_over50k_pd.shape)
new_york_trends_over50k_pd

In [None]:
# Save output as a CSV file.
new_york_trends_over50k_pd.to_csv('new_york_trends_over50k.csv', index=False)

## London

In [None]:
# Look at trends in London.
# Use a new variable so that the london location list is not overwritten.
london_trends = twitter_api.trends.place(_id=london[0]['woeid'])

# View the output.
london_trends

In [None]:
# Look at the output as a DataFrame.

# Create a DataFrame.
london_trends_pd = pd.DataFrame(london_trends[0]['trends'])

# View the DataFrame.
london_trends_pd

In [None]:
# Keep only the trends with more than 50,000 tweets, sorted by volume.
london_trends_over50k_pd = (
    london_trends_pd[london_trends_pd['tweet_volume'] > 50000]
    .sort_values('tweet_volume', ascending=False)
)

# View the output.
print(london_trends_over50k_pd.shape)
london_trends_over50k_pd

In [None]:
# Save output as a CSV file.
london_trends_over50k_pd.to_csv('london_trends_over50k.csv', index=False)
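
The filter above is applied in the same way to each city, and the brief mentions a different threshold (`200,000`), so a reusable function with the threshold as a parameter may be convenient. The following is a plain-Python sketch, not the method used in this solution. `trends_over` and `sample_response` are hypothetical names, the sample data is invented, and the sketch assumes that `tweet_volume` can be missing (`None`) for some trends, which it treats as below the threshold.

```python
def trends_over(trends_response, min_volume):
    """Return trends with tweet_volume above min_volume, largest first.

    trends_response is assumed to have the shape used above: a list whose
    first element holds a 'trends' list of dictionaries.
    """
    trends = trends_response[0]['trends']
    # Treat a missing volume (None) as 0, so it never passes the threshold.
    busy = [t for t in trends if (t.get('tweet_volume') or 0) > min_volume]
    return sorted(busy, key=lambda t: t['tweet_volume'], reverse=True)


# Invented sample data for illustration only.
sample_response = [{'trends': [
    {'name': '#TopicA', 'tweet_volume': 60000},
    {'name': '#TopicB', 'tweet_volume': None},
    {'name': '#TopicC', 'tweet_volume': 250000},
    {'name': '#TopicD', 'tweet_volume': 12000},
]}]

for trend in trends_over(sample_response, 50000):
    print(trend['name'], trend['tweet_volume'])
```

The returned list of dictionaries can be passed to `pd.DataFrame()` in the same way as the trend lists above, for example with `min_volume` set to `200000`.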

## Common trends between New York and London

In [None]:
# Find New York.
our_city = 'New York'

# Keep the locations whose name matches the city.
new_york = [loc for loc in trends_worldwide if loc['name'] == our_city]

# View the WOEID.
new_york[0]['woeid']

In [None]:
# Find London.
our_city_2 = 'London'

# Keep the locations whose name matches the city.
london = [loc for loc in trends_worldwide if loc['name'] == our_city_2]

# View the WOEID.
london[0]['woeid']

In [None]:
# Retrieve the trends for each city.
# Import JSON.
import json

# Retrieve the trends for New York, using the WOEID found above.
new_york_trends = twitter_api.trends.place(_id=2459115)

# View JSON output.
print(json.dumps(new_york_trends, indent=4))

In [None]:
# Retrieve the trends for London, using the WOEID found above.
london_trends = twitter_api.trends.place(_id=44418)

# View JSON output.
print(json.dumps(london_trends, indent=4))

In [None]:
# Extract the names of the trends in New York.
new_york_trends_list = [trend['name'] for trend in new_york_trends[0]['trends']]

# View output.
print(new_york_trends_list)

In [None]:
# Extract the names of the trends in London.
london_trends_list = [trend['name'] for trend in london_trends[0]['trends']]

# View output.
print(london_trends_list)

In [None]:
# Convert the lists of trend names to sets.
new_york_trends_set = set(new_york_trends_list)
london_trends_set = set(london_trends_list)

# Find the trends that appear in both cities.
common_trends = new_york_trends_set.intersection(london_trends_set)

# View output.
print(common_trends)
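
Set intersection matches names exactly, so `#Bitcoin` and `#bitcoin` would count as different topics. The sketch below shows one possible way to compare names in lower case and then check whether Bitcoin is among the shared trends. `common_names` and the sample sets are hypothetical and invented for illustration.

```python
def common_names(first, second):
    """Return the names present in both collections, compared in lower case."""
    return {name.lower() for name in first} & {name.lower() for name in second}


# Invented sample sets for illustration only.
sample_new_york = {'#Bitcoin', '#TopicA', 'Topic B'}
sample_london = {'#bitcoin', '#TopicA', 'Topic C'}

# Exact matching misses the differently capitalised hashtag.
print(sample_new_york.intersection(sample_london))

# Lower-case matching finds it.
sample_common = common_names(sample_new_york, sample_london)
print(sorted(sample_common))

# Check whether any shared trend mentions Bitcoin.
bitcoin_is_shared = any('bitcoin' in name for name in sample_common)
print(bitcoin_is_shared)
```

With live data, `new_york_trends_list` and `london_trends_list` could be passed to `common_names` in place of the sample sets.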

## Search for #Bitcoin

In [None]:
# Search for tweets that contain #Bitcoin.
bitcoin_tweets = twitter_api.search.tweets(q="#Bitcoin")

# View JSON output.
print(json.dumps(bitcoin_tweets, indent=4))
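
The brief asks for the top two tweets for each hashtag in terms of popularity. One possible approach, sketched below, ranks tweets by the sum of their retweets and favourites. This is an assumption about how to measure popularity, not a definitive method. The sketch also assumes the search response is a dictionary with a `'statuses'` list whose entries carry `'text'`, `'retweet_count'`, and `'favorite_count'` keys. `top_tweets` and `sample_search` are hypothetical names, and the sample data is invented.

```python
def top_tweets(search_response, n=2):
    """Return the n tweets with the highest retweet plus favourite count."""
    statuses = search_response.get('statuses', [])
    ranked = sorted(
        statuses,
        key=lambda s: s.get('retweet_count', 0) + s.get('favorite_count', 0),
        reverse=True,
    )
    return ranked[:n]


# Invented sample response for illustration only.
sample_search = {'statuses': [
    {'text': 'Tweet one', 'retweet_count': 5, 'favorite_count': 10},
    {'text': 'Tweet two', 'retweet_count': 40, 'favorite_count': 2},
    {'text': 'Tweet three', 'retweet_count': 0, 'favorite_count': 1},
]}

# Print the two most popular tweets with their counts.
for tweet in top_tweets(sample_search):
    print(tweet['text'], tweet['retweet_count'], tweet['favorite_count'])
```

With live data, you could call `top_tweets(bitcoin_tweets)` and repeat the search and ranking for the two other hashtags you choose.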