#### San Francisco Tweet Analysis

In this notebook I collect data from the Twitter API. I created an app through the Twitter developer portal, which lets me make requests to Twitter's API endpoints, including a generic keyword search. Here I pulled in 42k tweets related to San Francisco posted between 11/22/22 and 11/28/22. As someone who lives in San Francisco, I am curious to identify which Twitter accounts I should follow to stay aligned with what's happening in the city, and more generally to see what people think of San Francisco.

**Limitations**: Unfortunately, Twitter only grants access to the full tweet archive to accounts with Academic Research access. Here I am limited to the recent search endpoint, which only returns tweets from the past seven days, with a cap of 2M tweets pulled per month.
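The limits above turn into concrete constraints on the request window and the request budget. Below is a minimal sketch, assuming the v2 recent-search rules as I understand them from Twitter's documentation (a 7-day lookback, and an `end_time` at least 10 seconds before the request). The helper names `recent_search_window` and `requests_needed` and the 30-second margin are hypothetical choices, not part of the original notebook.

```python
from datetime import datetime, timedelta, timezone

# Assumed limits of the v2 recent-search endpoint (from Twitter's docs as I
# understand them): results reach back at most 7 days, and end_time must be
# at least 10 seconds before the request is sent.
MAX_LOOKBACK = timedelta(days=7)
MIN_END_OFFSET = timedelta(seconds=10)


def recent_search_window(days, now=None, margin=timedelta(seconds=30)):
    """Return (start, end) as timezone-aware UTC datetimes for recent search.

    `days` is how far back to look from `end`; `margin` keeps `end`
    comfortably more than 10 seconds in the past.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if margin < MIN_END_OFFSET:
        raise ValueError("margin must be at least 10 seconds")
    end = now - margin
    start = end - timedelta(days=days)
    if now - start > MAX_LOOKBACK:
        raise ValueError("recent search only covers the last 7 days")
    return start, end


def requests_needed(n_tweets, per_page=100):
    """Number of paginated requests needed to fetch n_tweets, per_page at a time."""
    return -(-n_tweets // per_page)  # ceiling division
```

For a run like this one, `recent_search_window(6)` yields a six-day window, and `requests_needed(42_000)` is 420 requests; 42k tweets is roughly 2% of the 2M-tweet monthly cap.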

```python
# Import libraries
# For sending GET requests to the API
import requests
# For saving access tokens and for file management when creating and adding to the dataset
import os
# For dealing with the JSON responses we receive from the API
import json

import pandas as pd
import csv
# For parsing the dates received from Twitter into readable formats
import datetime
import dateutil.parser
import unicodedata
# To add wait time between requests
import time
```

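```python
import tweepy
# config.py is a local module that stores the bearer token (keep it out of version control)
import config

# wait_on_rate_limit=True makes tweepy sleep until the rate-limit window resets
# instead of raising an error when the limit is hit
client = tweepy.Client(bearer_token=config.bearer, wait_on_rate_limit=True)
```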
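Storing the token in a `config.py` module works; an alternative that avoids a separate file is to read it from an environment variable. A minimal sketch follows; the variable name `TWITTER_BEARER_TOKEN` and the helper `load_bearer_token` are hypothetical choices.

```python
import os


def load_bearer_token(var_name="TWITTER_BEARER_TOKEN"):
    """Read the Twitter bearer token from an environment variable (hypothetical name)."""
    token = os.environ.get(var_name)
    if not token:
        # Fail early with a clear message rather than sending unauthenticated requests
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token
```

It would then be passed in as `tweepy.Client(bearer_token=load_bearer_token(), wait_on_rate_limit=True)`.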

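```python
# Recent search only reaches back 7 days, and end_time must be at least
# 10 seconds before the request, so back the end of the window off slightly
today = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(seconds=30)
start = today - datetime.timedelta(days=6)
```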

```python
# Code leveraged from Professor Jeremy Foote of the Community Data Science Collective
# Parentheses are required: the query syntax applies AND (a space) before OR,
# and the quotes make "San Francisco" match as an exact phrase
tweet_list = []
for response in tweepy.Paginator(client.search_recent_tweets,
                                 query='(SanFrancisco OR "San Francisco") -is:retweet lang:en',
                                 max_results=100,
                                 tweet_fields=['created_at', 'public_metrics', 'author_id'],
                                 # 'description' must be requested for user.description to be populated
                                 user_fields=['username', 'public_metrics', 'location', 'description'],
                                 start_time=start,
                                 end_time=today,
                                 expansions=['author_id']):
    time.sleep(1)  # pause between pages to stay within the rate limit
    tweet_list.append(response)
```
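Because a space acts as AND and binds tighter than OR, an ungrouped query such as `SanFrancisco OR San Francisco -is:retweet lang:en` applies the filters to the second alternative only. A small builder keeps the grouping explicit; `build_query` is a hypothetical helper, sketched under the assumption that multi-word terms should match as exact phrases.

```python
def build_query(terms, filters=()):
    """Join keyword alternatives with OR inside parentheses, then append filters.

    Multi-word terms are wrapped in double quotes so they match as exact phrases.
    The parentheses matter because the search syntax applies AND (a space) before OR.
    """
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    group = "(" + " OR ".join(quoted) + ")"
    return " ".join([group, *filters])


query = build_query(["SanFrancisco", "San Francisco"], ["-is:retweet", "lang:en"])
# '(SanFrancisco OR "San Francisco") -is:retweet lang:en'
```

Adding another alternative, such as a hashtag, then only means extending the `terms` list, and the filters still apply to every alternative.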

```python
# Code leveraged from Professor Jeremy Foote of the Community Data Science Collective
tweet_data = []
user_data = {}

for page in tweet_list:
    # An empty page has data=None and no 'users' entry in includes
    for user in page.includes.get('users', []):
        user_data[user.id] = {'username': user.username,
                              'followers': user.public_metrics['followers_count'],
                              'description': user.description,
                              'location': user.location}

    for tweet in page.data or []:
        author_data = user_data[tweet.author_id]
        tweet_data.append({'author_id': tweet.author_id,
                           'username': author_data['username'],
                           'author_followers': author_data['followers'],
                           'author_description': author_data['description'],
                           'author_location': author_data['location'],
                           'id': tweet.id,
                           'text': tweet.text,
                           'created_at': tweet.created_at,
                           'retweets': tweet.public_metrics['retweet_count'],
                           'replies': tweet.public_metrics['reply_count'],
                           'likes': tweet.public_metrics['like_count'],
                           'quotes': tweet.public_metrics['quote_count']})

df = pd.DataFrame(tweet_data)
```
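The flattening step can be checked offline, without spending API quota, by feeding it stand-in pages. Below is a minimal sketch: `flatten_pages` is a hypothetical, trimmed-down version of the loop above that keeps only a few columns, and `types.SimpleNamespace` objects stand in for tweepy responses, assuming each page exposes `.data` (a list or `None`) and `.includes` (a dict that may hold `'users'`).

```python
from types import SimpleNamespace


def flatten_pages(pages):
    """Turn paginated search responses into one record per tweet."""
    users = {}
    rows = []
    for page in pages:
        # Collect author objects so each tweet can be joined to its author
        for user in page.includes.get("users", []):
            users[user.id] = user
        # Skip pages that came back without tweets
        for tweet in page.data or []:
            author = users.get(tweet.author_id)
            rows.append({
                "id": tweet.id,
                "author_id": tweet.author_id,
                "username": author.username if author else None,
                "text": tweet.text,
                "likes": tweet.public_metrics["like_count"],
            })
    return rows


# Hypothetical stand-ins: one page with a single tweet, then an empty page
user = SimpleNamespace(id=1, username="example_user")
tweet = SimpleNamespace(id=10, author_id=1, text="example tweet",
                        public_metrics={"like_count": 3})
pages = [SimpleNamespace(data=[tweet], includes={"users": [user]}),
         SimpleNamespace(data=None, includes={})]
rows = flatten_pages(pages)
```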

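```python
# Create the output folder if it does not exist yet, then save the tweets
os.makedirs('data', exist_ok=True)
df.to_csv('data/SFTweets.csv')
```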
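One way to approach the question of which accounts to follow is to aggregate engagement per author. Below is a minimal sketch, assuming the column names written above; the helper `rank_accounts`, the `min_tweets` threshold, and scoring by a plain sum of likes, retweets, replies and quotes are illustrative choices, not a settled method.

```python
import pandas as pd


def rank_accounts(df, min_tweets=2):
    """Rank authors by total engagement across their tweets in the sample."""
    # Illustrative score: every interaction counts equally
    df = df.assign(engagement=df[["likes", "retweets", "replies", "quotes"]].sum(axis=1))
    summary = df.groupby("username").agg(
        tweets=("id", "count"),
        followers=("author_followers", "max"),
        engagement=("engagement", "sum"),
    )
    # Drop one-off posters so the list favors accounts that post regularly
    summary = summary[summary["tweets"] >= min_tweets]
    return summary.sort_values("engagement", ascending=False)
```

Reading the saved file back with `pd.read_csv('data/SFTweets.csv', index_col=0, parse_dates=['created_at'])` and calling `rank_accounts(df).head(20)` would give a first shortlist to review by hand.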