# Week 10
# Load Data using Web API

Many websites (Twitter, Facebook, Kaggle, Reddit, ...) offer Application Programming Interfaces (APIs) that provide access to data on their servers. Today we will use the Twitter API to download and analyze some tweets.

**Get started with Twitter API:**

1. Sign up on Twitter.
2. Apply for [developer access](https://developer.twitter.com/en/apps).
3. Create a Twitter app. (You can use the Lehman website as the placeholder for the URLs.)

Reference:
- [Twitter API Documentation](https://developer.twitter.com/en/docs/twitter-api)
- [Tweepy Documentation](http://docs.tweepy.org/en/v3.9.0/index.html)

In [None]:
# Install tweepy package for Python
# !pip install tweepy

In [None]:
import tweepy
tweepy.__version__

To learn more about the tweepy API, visit the [official documentation](http://docs.tweepy.org/en/latest/getting_started.html).

In [None]:
# Copy and paste tokens from "Keys and Access Tokens" tab
consumer_key = "(Paste your token here)"
consumer_secret = "(Paste your token here)"
access_token = "(Paste your token here)"
access_token_secret = "(Paste your token here)"
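
Pasting tokens directly into a notebook makes them easy to leak when the notebook is shared. As one possible alternative, here is a minimal sketch that reads the four values from environment variables instead. The variable names and the helper `load_credentials` are hypothetical; use whatever names you set on your own machine.

```python
import os

# Hypothetical environment variable names; adjust to match your own setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)

def load_credentials(env=None):
    """Return the four tokens as a dict, or raise KeyError if any is missing."""
    env = os.environ if env is None else env
    # Treat empty strings the same as missing values.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

With the variables set, `creds = load_credentials()` followed by `consumer_key = creds["TWITTER_CONSUMER_KEY"]` (and likewise for the other three) would replace the pasted strings.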

In [None]:
# User authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

In [None]:
# Create API object to access twitter data
api = tweepy.API(auth)

In [None]:
# Post a tweet from Python
api.update_status("I'm tweeting from #Python in my #DataScience class! @LehmanOCL")

## Task 1: Retrieve tweets from timeline

In [None]:
# My timeline
public_tweets = api.home_timeline(tweet_mode = "extended")

In [None]:
# Look into one tweet data
tweet = public_tweets[5]
tweet

In [None]:
# The _json attribute holds the tweet's raw data as a dictionary
tweet._json

In [None]:
# Use the _json attribute to look up specific fields
obj = tweet._json
# obj['created_at']
obj['user']['name']
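
Because `_json` is a nested dictionary, chained lookups such as `obj['user']['name']` raise an error when a key is absent or a value is empty. The sketch below shows one way to look up nested fields safely. The `sample` dictionary is a hand-written stand-in, not a real API response, and `get_nested` is a hypothetical helper.

```python
# Hand-written stand-in for tweet._json; real payloads contain many more fields.
sample = {
    "created_at": "Thu Oct 01 12:00:00 +0000 2020",
    "full_text": "Example tweet text",
    "user": {"name": "Example User", "location": ""},
}

def get_nested(obj, *keys, default=None):
    """Follow a chain of keys through nested dicts; return default if any step fails."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj

print(get_nested(sample, "user", "name"))                          # Example User
print(get_nested(sample, "place", "full_name", default="unknown"))  # unknown
```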

In [None]:
# Find specific info
print(tweet.full_text)
print(tweet.author.name)
print(tweet.created_at)
print(tweet.author.location)

In [None]:
# Calling api.home_timeline without tweet_mode="extended" may return truncated messages.
tweets2 = api.home_timeline()

In [None]:
# Slice instead of indexing so the loop still works with fewer than 10 tweets
for tweet in tweets2[:10]:
    print('-' * 80)
    print(tweet.author.name)
    print(tweet.text)
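
The two modes store the message under different attributes: `full_text` in extended mode and `text` otherwise. A small helper can read whichever one is present. This is a minimal sketch; `tweet_text` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for tweepy status objects so the example runs without API access.

```python
from types import SimpleNamespace

def tweet_text(tweet):
    """Return full_text when the tweet has it, otherwise fall back to text."""
    full = getattr(tweet, "full_text", None)
    return full if full is not None else getattr(tweet, "text", "")

# Stand-in objects that mimic the two shapes of a status object.
extended = SimpleNamespace(full_text="A long message shown in full")
compat = SimpleNamespace(text="A long message sho...")

print(tweet_text(extended))
print(tweet_text(compat))
```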

## Task 2: Retrieve Tweets from Another User

In [None]:
name = "nytimes"
tweetCount = 20
results = api.user_timeline(screen_name=name, count=tweetCount, tweet_mode="extended")

In [None]:
for tweet in results:
    print('-' * 80)
    print(tweet.full_text)

## Task 3: Search for Tweets

In [None]:
search_words = "wildfire -filter:retweets"
date_since = "2020-10-01"
# Note: api.search is the Tweepy 3.x name; Tweepy 4 renamed it to api.search_tweets
cursor = tweepy.Cursor(api.search,
                       q=search_words,
                       lang="en",
                       since=date_since,
                       tweet_mode = "extended")
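
The search string above combines a keyword with the `-filter:retweets` operator. If you plan to try several searches, a small helper can assemble the string for you. This is only a sketch; `build_query` is a hypothetical name and not part of tweepy.

```python
def build_query(keywords, exclude_retweets=True):
    """Join keywords with spaces and optionally append the -filter:retweets operator."""
    parts = list(keywords)
    if exclude_retweets:
        parts.append("-filter:retweets")
    return " ".join(parts)

search_words = build_query(["wildfire"])
print(search_words)  # wildfire -filter:retweets
```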

In [None]:
tweets = cursor.items(10)
for tweet in tweets:
    print('-' * 80)
    print(tweet.full_text)
    print(tweet.author.name)
    print(tweet.author.location)
    print(tweet.created_at)

In [None]:
# Create a Pandas DataFrame to store tweets, authors, and locations
import pandas as pd

# Create an empty data frame
tweets_df = pd.DataFrame(columns=['Name', 'Location', 'Text'])
tweets_df

In [None]:
# Append tweets data to the data frame
for idx, tweet in enumerate(cursor.items(10)):
    tweets_df.loc[idx, :] = [tweet.author.name,
                             tweet.author.location,
                             tweet.full_text]

tweets_df
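
Adding rows one at a time works for a handful of tweets but gets slow as the table grows. An alternative is to collect plain dictionaries first and build the table in a single step. The sketch below covers the collection step. `tweets_to_records` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real tweets so the example runs without API access.

```python
from types import SimpleNamespace

def tweets_to_records(tweets):
    """Convert tweet-like objects into a list of plain dicts, one per tweet."""
    return [
        {
            "Name": tweet.author.name,
            "Location": tweet.author.location,
            "Text": tweet.full_text,
        }
        for tweet in tweets
    ]

# Stand-in objects so the sketch runs without API access.
fake_tweets = [
    SimpleNamespace(author=SimpleNamespace(name="A", location="NY"), full_text="first"),
    SimpleNamespace(author=SimpleNamespace(name="B", location=""), full_text="second"),
]
records = tweets_to_records(fake_tweets)
print(records[0]["Name"], records[1]["Text"])
```

Passing the result to `pd.DataFrame(records, columns=['Name', 'Location', 'Text'])` would then create the same three-column table in one call.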