# Extracting tweets using the Twitter API
##### Step 1: Get the Blodgett_500k.csv file from the dataset folder.
##### Step 2: Get access to a Twitter developer account and save your API key and API secret in twitter-api-secret.txt (key on the first line, secret on the second).
##### Step 3: Place both files in the same folder, or change the paths in the code to wherever the files are located.
##### Step 4: Run all the cells to produce the Blodget_50k.csv file, which contains 50k tweets that can be used to test our model.

Note: It is often easier to use the [open-source data](https://github.com/slanglab/twitteraae) that is already available, since retrieving tweets through the API takes time. This source yields approximately 1M tweets. Of these, the tweets to which the Blodgett classifier assigns a high likelihood (>0.9) of being in the AAE dialect number close to 500k. These roughly 500k tweets make up the "Blodgett_500k.csv" file used in the code below.
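
The thresholding described in the note can be sketched as follows. This is a minimal illustration on made-up rows, not the exact preprocessing used to build "Blodgett_500k.csv": the column name `aa_prob` and the sample values are assumptions, and the real TwitterAAE release may label or order its columns differently.

```python
import pandas as pd

# Toy stand-in for the raw data. "aa_prob" is a hypothetical column name
# for the classifier's AAE likelihood; the values are invented.
raw = pd.DataFrame({
    "Tweet_id": [101, 102, 103, 104],
    "aa_prob": [0.95, 0.42, 0.91, 0.90],
})

THRESHOLD = 0.9  # keep only tweets scored strictly above this value

# Boolean mask selects the high-likelihood rows; the index is reset for tidiness.
high_conf = raw[raw["aa_prob"] > THRESHOLD].reset_index(drop=True)
print(high_conf["Tweet_id"].tolist())
```

Because the comparison is strict, a tweet scored exactly 0.9 is dropped in this sketch.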

## Importing the libraries

In [None]:
import tweepy
import pandas as pd

## Reading the Blodgett 500k file and the Twitter secret key

In [None]:
df = pd.read_csv("/content/Blodgett_500k.csv")

# Make sure you have Twitter elevated access in order to extract the tweets.
with open("/content/twitter-api-secret.txt") as api_file:
  api_secret = api_file.read().splitlines()
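
The authentication step below reads `api_secret[0]` and `api_secret[1]`, so the text file is expected to hold the API key on its first line and the API secret on its second. A small check like the following can catch a malformed file early. `load_credentials` is a hypothetical helper written for this illustration, not part of tweepy or of the original notebook.

```python
from pathlib import Path

def load_credentials(path):
    """Return (api_key, api_secret) read from a two-line text file.

    Hypothetical helper: blank lines and surrounding whitespace are ignored.
    """
    lines = [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError(
            f"{path} must contain the API key on line 1 and the API secret on line 2"
        )
    return lines[0], lines[1]
```

For example, `api_key, api_secret_value = load_credentials("/content/twitter-api-secret.txt")` would fail with a clear message if the file had only one line, instead of raising an `IndexError` later during authentication.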

## Authentication

In [None]:
auth = tweepy.AppAuthHandler(api_secret[0], api_secret[1])

api = tweepy.API(auth, wait_on_rate_limit=True)

## Dictionary where all the tweet IDs and tweet texts will be stored

In [None]:
data_dict = {"Tweet_id":[],"Tweet_text":[]}

## API call to fetch the tweets and store them in the dictionary

In [None]:
for tweet_id in df["Tweet_id"]:
  if len(data_dict["Tweet_id"]) == 50000:
    break
  try:
    tweet = api.get_status(tweet_id)._json["text"]
    data_dict["Tweet_id"].append(tweet_id)
    data_dict["Tweet_text"].append(tweet)
  except Exception:
    # Skip tweets that cannot be fetched (for example, deleted or removed tweets) so the loop does not crash.
    pass
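
The loop above issues one request per tweet, which is slow for hundreds of thousands of IDs. A possible alternative, sketched here under assumptions, is to look up IDs in batches. The sketch assumes tweepy 4.x, where the batch call is named `API.lookup_statuses` (older 3.x releases call it `statuses_lookup`) and accepts up to 100 IDs per request. `fetch_in_batches` is a hypothetical helper, not the method used in this notebook, and the batch endpoint may return tweets in a different order than requested.

```python
def fetch_in_batches(api, tweet_ids, limit=50000, batch_size=100):
    """Collect up to `limit` tweets, requesting `batch_size` IDs per API call.

    `api` is any object exposing lookup_statuses(list_of_ids), such as a
    tweepy.API instance (assumed tweepy 4.x naming).
    """
    ids = [int(i) for i in tweet_ids]  # plain ints, e.g. from a pandas column
    found = {"Tweet_id": [], "Tweet_text": []}
    for start in range(0, len(ids), batch_size):
        if len(found["Tweet_id"]) >= limit:
            break
        batch = ids[start:start + batch_size]
        try:
            # Deleted or unavailable tweets are simply missing from the result.
            statuses = api.lookup_statuses(batch)
        except Exception:
            # Skip a batch that fails outright rather than abort the whole run.
            continue
        for status in statuses:
            if len(found["Tweet_id"]) >= limit:
                break
            found["Tweet_id"].append(status.id)
            found["Tweet_text"].append(status.text)
    return found
```

With the objects defined earlier, a call such as `data_dict = fetch_in_batches(api, df["Tweet_id"])` would fill the same two-key dictionary while making roughly one hundredth as many requests.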



## Creating a data frame from the dictionary and then generating a CSV file for later use

In [None]:
tweet_df = pd.DataFrame(data_dict)
tweet_df.to_csv("Blodget_50k.csv")