## Homework 1: Advanced Track -- Harvest the Twitter API

**Objective:** Write a series of functions that allow you to dynamically harvest Twitter data.

**Estimated Time to Complete:** 4-12 hours

#### Sections

 - **Section 1:** Setting up your developer account, using OAuth1 authentication (approx 45-120 minutes)
 - **Section 2:** Navigating the API documentation, getting your first query string (approx 45-120 minutes)
 - **Section 3:** Writing your API calls (approx 90 - 360 minutes)
 
#### What You'll Turn In:  
 - A `.py` (not a Notebook!) file that contains the functions that you were prompted to create.  These should contain comments demonstrating why your code does what it does, and after it's run, the instructor should be able to make the appropriate function calls in Spyder or any other IDE.

## Section 1:  Setting Up Your Developer Account

Most API's require you to do a little pre-work in order to be able to use them, so the first part of this homework assignment will be spent setting up your developer account so you have API Access.

**Step 1:  Create a Twitter Developer Account**

 - Make sure you have a regular twitter account before you do this
 - You can apply for a developer account here:  https://developer.twitter.com/en/apply-for-access
  - Choose either a student or hobbyist/personal account

**Step 2:  Create An App**

You don't have to intend to build an official software program to have an app.....this is just a way for you to get authentication keys to use with the API.

 - Go to the menu in the upper right hand corner and click on **Your Name** > **Apps**
 - Choose **Create An App**
 - You'll be prompted to enter some information about your app.  Don't worry too much about this, it can say almost anything.  You'll be prompted to list websites where it will be hosted...this can be anything for now.  Use https://generalassemb.ly if you're undecided about what to put.

**Step 3: Create Your API Tokens**

Now that you have an app, you can use its API tokens to go ahead and make requests like we did in class 3.  Like a lot of API's, the Twitter API uses something called OAuth authentication.  

If you didn't wait until the night before this assignment was due and have a spare 30 minutes, you can read a little about it here: https://oauth.net/

In any event, you need API tokens in order to make requests.  Do the following:

 - Go to the **Apps** section of your developer portal
 - Click on the **Details** button for the app that you just created
 - Click on the **Keys & Tokens** tab
  - Two keys should already be given to you:  **API Key** & **API Secret Key**
  - Two you have to generate:  **Access token** & **Access token secret**
 - Generate your Access Token and Access Token Secret keys.  You'll need to write these down when you're done

Now you're ready to make requests to the Twitter api.  Everytime you make a request, you'll need to include the 4 tokens you just created.  (You can always regenerate them for whatever reason).  

**Step 4:  Your First Request**

To make requests to the Twitter API you're going to need a module which is **not** already pre-installed in Anaconda. You'll need to install it via PIP, which is python's package manager.  It's called `requests_oauthlib`.  You can install this via Anaconda Prompt or Terminal by simply typing in the command `pip install requests_oauthlib`, and then you'll be finished.

The logistics of making an OAuth1 authenticated request are very similar to what was done in class 3, but with a few additional steps.  You can see how to do it here:  https://requests.readthedocs.io/en/master/user/authentication/#oauth-1-authentication.  The only thing you'll need to change is the info for your API tokens that are passed into the `OAuth1()` function.

Try making a request to the following URL to confirm that it works: 'https://api.twitter.com/1.1/account/verify_credentials.json'

In [1]:
# your code here
import requests
from requests_oauthlib import OAuth1

# passing in my developer tokens to the OAuth1 function
tokens = OAuth1('NOZHm1aLT1AVmchGbCmiZOAga', 'nPyaYCt8L7ymqGZtU8EqC0a2ypI9aSJgVNIhtoZ0wGsaf3BJw9',
                '1079981876864008192-AlhO4yOa06oW2sXZpLpWPwnOxEERYS', 'o3E0AsKJfDoTBk77UQYExzOG7E46jPYvpWNGAKsD6lUBY')

# url that I'm going to connect to
url = 'https://api.twitter.com/1.1/account/verify_credentials.json'

# initiate the request
req = requests.get(url, auth=tokens).json()

In [2]:
# and here it is
req

{'id': 1079981876864008192,
 'id_str': '1079981876864008192',
 'name': 'Jonathan Bechtel',
 'screen_name': 'JonathanBechte6',
 'location': '',
 'description': '',
 'url': None,
 'entities': {'description': {'urls': []}},
 'protected': False,
 'followers_count': 1,
 'friends_count': 19,
 'listed_count': 0,
 'created_at': 'Tue Jan 01 06:05:19 +0000 2019',
 'favourites_count': 1,
 'utc_offset': None,
 'time_zone': None,
 'geo_enabled': False,
 'verified': False,
 'statuses_count': 1,
 'lang': None,
 'status': {'created_at': 'Fri Nov 29 23:27:52 +0000 2019',
  'id': 1200557014067363843,
  'id_str': '1200557014067363843',
  'text': "@kunjanshah @paulg @Austen How do you define 'solved'?  I would define it as quality-adjusted real prices that go d… https://t.co/ghgIv3irn6",
  'truncated': True,
  'entities': {'hashtags': [],
   'symbols': [],
   'user_mentions': [{'screen_name': 'kunjanshah',
     'name': 'Kunjan Shah',
     'id': 14828205,
     'id_str': '14828205',
     'indices': [0, 11]}

If you get your json object back, then you're good to go.

## Section 2: Searching Tweets

Most websites you access will have a long string attached to the end of them that look something like this:  `http://thewebsite.com/?year=2019&color=golden%20yellow&user_id=48549395959438`.

Most people have no reason to pay attention to any of this, but all the special symbols at the end are basically encoded commands that say 'return a website that displays x,y,z characteristics.'  

When accessing api data, it basically works the same way.

**Step 1:  Set Up Your First Query String**

If you go into Twitter and search for the term `Data Science`, you should be brought to a url that looks like this:  `https://twitter.com/search?q=Data%20Science&src=typed_query`

If you'd like, you can drop the `&src=typed_query` from the url and still get the same results.

There are some important details to pay attention to:

 - Like class 3 when we worked with GitHub, there is a **base url**.  In this case it's `https://twitter.com/search`
 - Whenever you enter a search for something, the base url will be followed by something that looks like `?q=My%20Search%20Term`
  - The `?` marks the beginning of the query string.  This basically says 'initiate a request with whatever parameters that follow'
  - The `q` is a **parameter**, essentially some condition to pass into the query string that determines what results will be given back to you.
  
**Useful Thing To Do Right Now:** Go back to the Twitter search page, and just try searching for different things, and notice what shows up after the `q=`.  Here are some questions to ask yourself:

 - How are white spaces encoded?  Ie, if you search for `Jonathan Bechtel` in the search box, what shows up to account for the space between the two words?
 - What about hash symbols?  If you search for `#MeToo`, `#GirlsWhoCode` or `#DataScience`, what happens with that `#` symbol?
 - Once you get the hang of this, see if you can just re-create some searches yourself by creating the url directly, and bypassing the search box altogether.

Now, let's try and make a request for a search for `Data Science`.  

If you look at Twitter's docs, you'll see that the base url for the search API is `https://api.twitter.com/1.1/search/tweets.json`

This means you have to add the `?q=Whatever%20Word%20%Goes%20Here` to the end to complete the search.

So go ahead, and see if you can create your API call for a search for the term `Data Science`.

If you did it correctly, you should have a dictionary with a key called `statuses`, and it'll be a list with all of the tweets returned by your search.  

In [3]:
# your answer here
base_url = 'https://api.twitter.com/1.1/search/tweets.json'
search = '?q=Data%20Science'

req = requests.get(base_url + search, auth=tokens).json()

In [4]:
# and here's our dictionary, with our key that contains all the tweets returned by our response
req['statuses']

[{'created_at': 'Thu Mar 04 03:02:31 +0000 2021',
  'id': 1367309454362898432,
  'id_str': '1367309454362898432',
  'text': 'RT @ainunnajib: Yang benar, penting, esensial, adalah kemampuan mengambil keputusan yg terbaik -- baik oleh manusia (data science for human…',
  'truncated': False,
  'entities': {'hashtags': [],
   'symbols': [],
   'user_mentions': [{'screen_name': 'ainunnajib',
     'name': 'Ainun Najib (bukan Emha Cak Nun)',
     'id': 35167068,
     'id_str': '35167068',
     'indices': [3, 14]}],
   'urls': []},
  'metadata': {'iso_language_code': 'in', 'result_type': 'recent'},
  'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'in_reply_to_screen_name': None,
  'user': {'id': 75772038,
   'id_str': '75772038',
   'name': 'Ismi',
   'screen_name': 'IsmiY',
   'location': '',
   'descriptio

For good measure, try doing a search for tweets relating to `#MeToo` as well.

In [5]:
# your answer here
query_string = '?q=%23MeToo'

req = requests.get(base_url + query_string, auth=tokens).json()

In [6]:
# and here are our returned tweets
req['statuses']

[{'created_at': 'Thu Mar 04 03:02:15 +0000 2021',
  'id': 1367309384171327488,
  'id_str': '1367309384171327488',
  'text': "RT @KimSordyl: @darakass As her attorney, you can take your phony support for Dr. Choo's survivor and shove it.  What a fraud.  DO NOT ment…",
  'truncated': False,
  'entities': {'hashtags': [],
   'symbols': [],
   'user_mentions': [{'screen_name': 'KimSordyl',
     'name': 'Kim Sordyl, JD',
     'id': 2458126873,
     'id_str': '2458126873',
     'indices': [3, 13]},
    {'screen_name': 'darakass',
     'name': 'Dara Kass, MD',
     'id': 738796494,
     'id_str': '738796494',
     'indices': [15, 24]}],
   'urls': []},
  'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
  'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'in_reply_to_screen_name': None,
  'user':

**Step 2:  Adding Parameters to Your Query String**

Query strings basically have two parts:

 - What follows fhe `?` encodes the actual text to search for, using utf-8 encoding to account for special characters.  This is required.
 - You can also add additional search parameters, which are encoded by `&`. They dictate what kinds of results are returned.  
  - For example, a parameter you can use in Twitter's search API is `count`, which tells you how many results to return.  The default is 15, but you can return up to 100.  So if we wanted to search for tweets and return 50 results our query string would look like the following:
    `https://api.twitter.com/1.1/search/tweets.json?My%20Search%20String&count=50`
  - You can add as many of these parameters to your string as you'd like.  So for example, if we wanted to include parameters for `count` and `result_type`, we could do the following: `https://api.twitter.com/1.1/search/tweets.json?My%20Search%20String&count=50&result_type=mixed`
  
To get the hang of this, try searching for tweets that mention the hashtag `#DeepLearning`, and return 75 results.

In [7]:
# our search string
query_string = '?q=%23DeepLearning&count=75'
# initiate a request at this endpoint
results = requests.get(base_url+query_string, auth=tokens).json()

In [9]:
# and now we have our results!
results['statuses']

[{'created_at': 'Thu Mar 04 03:03:32 +0000 2021',
  'id': 1367309708483174402,
  'id_str': '1367309708483174402',
  'text': 'Train “undying” Flappy Bird using Reinforcement Learning on Java https://t.co/clf9Sp2Lu0 #java… https://t.co/Coa9q5eSvp',
  'truncated': True,
  'entities': {'hashtags': [{'text': 'java', 'indices': [89, 94]}],
   'symbols': [],
   'user_mentions': [],
   'urls': [{'url': 'https://t.co/clf9Sp2Lu0',
     'expanded_url': 'http://dlvr.it/RtwRmN',
     'display_url': 'dlvr.it/RtwRmN',
     'indices': [65, 88]},
    {'url': 'https://t.co/Coa9q5eSvp',
     'expanded_url': 'https://twitter.com/i/web/status/1367309708483174402',
     'display_url': 'twitter.com/i/web/status/1…',
     'indices': [96, 119]}]},
  'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
  'source': '<a href="https://dlvrit.com/" rel="nofollow">dlvr.it</a>',
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str

Try adding a second parameter.  You can find the list here:  https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets

In [9]:
# we'll search for deep learning, returning 100 results, using the most recent tweets
search_string = '?q=%23DeepLearning&count=100&result_type=recent'
# initiate the request
results = requests.get(base_url + search_string, auth=tokens).json()

In [10]:
# and success!
results['statuses']

[{'created_at': 'Wed Oct 21 21:41:46 +0000 2020',
  'id': 1319031148257939457,
  'id_str': '1319031148257939457',
  'text': 'RT @ProgrammingHero: Cloud computing has multiplied over the past several years. \n\nHave a look at Top 5 Cloud Providers In 2020\n.\n.\n.\n.\n.\n#…',
  'truncated': False,
  'entities': {'hashtags': [],
   'symbols': [],
   'user_mentions': [{'screen_name': 'ProgrammingHero',
     'name': 'Programming Hero',
     'id': 1254869948796674048,
     'id_str': '1254869948796674048',
     'indices': [3, 19]}],
   'urls': []},
  'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
  'source': '<a href="https://occlude.netlify.app/" rel="nofollow">OccludejsBot</a>',
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'in_reply_to_screen_name': None,
  'user': {'id': 1275056773443973124,
   'id_str': '1275056773443973124',
   'name': 'Candis',
   'screen_name': 'Women_who_co

## Section 3: Searching Users

The last section of the API you'll need to get the hang of before you're let loose is the users API, which allows you to search for users and get their followers, friends, etc, as opposed to tweets which fit a particular criteria.  This part is pretty similar to the advanced lab in class 3, so if you saw how that worked then you shouldn't need much instruction.  

But if you're seeing this with fresh eyes, you'll want to spend 15-20 minutes to make sure you understand this part.  

Official documentation can be found here:  https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/overview

So, as an example, if you want to get a list of someone's followers, you use the base url `https://api.twitter.com/1.1/followers/list.json` and then enter your query string to get a list of that persons followers.  

List of parameters to use can be found here:  https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference

One possible parameter to use is `screen_name`, so if you wanted to get a list of someone's followers based on their screen name (the handle that begins with an @), then you would set up your API call to look something like:

`https://api.twitter.com/1.1/followers/list.json?screen_name=persons_screenname`

Note that you exclude the `@`.

**Your turn:** Pull in the list of General Assembly's followers.  General Assembly's handle is `@GA`  The `users` key will contain the list of followers for General Assembly's account.

Note that this won't return the whole list of GA's users.  If you want to do that you have to use cursoring:  https://developer.twitter.com/en/docs/basics/cursoring.  This is the topic of your bonus assignment.

In [11]:
# your answer here
base_url     = 'https://api.twitter.com/1.1/followers/list.json'
query_string = '?screen_name=GA'

results = requests.get(base_url + query_string, auth=tokens).json()

In [12]:
# and our results
results['users']

[{'id': 3022159025,
  'id_str': '3022159025',
  'name': 'Quwanie Gordon',
  'screen_name': 'QuwanieGordon',
  'location': 'Central London ',
  'description': 'Founder of @cleaninggouk',
  'url': 'https://t.co/g5b6fXXBqe',
  'entities': {'url': {'urls': [{'url': 'https://t.co/g5b6fXXBqe',
      'expanded_url': 'https://cleaninggo.co.uk/',
      'display_url': 'cleaninggo.co.uk',
      'indices': [0, 23]}]},
   'description': {'urls': []}},
  'protected': False,
  'followers_count': 387,
  'friends_count': 1737,
  'listed_count': 5,
  'created_at': 'Fri Feb 06 20:47:46 +0000 2015',
  'favourites_count': 174,
  'utc_offset': None,
  'time_zone': None,
  'geo_enabled': True,
  'verified': False,
  'statuses_count': 26,
  'lang': None,
  'status': {'created_at': 'Sun Oct 11 09:54:27 +0000 2020',
   'id': 1315229266137686017,
   'id_str': '1315229266137686017',
   'text': '@iamtomskinner Congratulations fella.',
   'truncated': False,
   'entities': {'hashtags': [],
    'symbols': [],
    'u

## Section 4: Functions

This section details the functions you have to write and turn in as part of your homework assignment.  

Please read the requirements carefully.

**What you'll turn in:** A `.py` file with all of the functions written.  We should be able to load this into an IDE, run the file, and then call your functions to verify how and if they work. This file should also be properly commented so we can follow your line of reasoning.  **Even if something doesn't work exactly the way it should, it's okay to still turn in your file, and explain what you were trying to do.**

The functions you'll be prompted to write will be defined in the following ways:

 - **name:** the name of the function
 - **returns:** what the function should return
 - **arguments:** arguments to include inside the function in order to specify how it should behave.
 
 **Note:** The free API has limitations built into it, so this means most of your API calls won't return complete results.  This is fine.  It's understood and recognized that your functions won't be able to return an entire list of someone's users or other such things, so as long as your work delivers the best it can under present circumstances you're in good shape.
 
 **Other Note:** Every aspect of the API that you need to use can be found on either of these pages.
 
 Search API:  https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets
 
 Users API: https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference/get-followers-list

##### Function 1 (Required)

**Name:** `find_user`

**Returns:** dictionary that represents user object returned by Twitter's API

**Arguments:**
 - `screen_name`: str, required; Twitter handle to search for.  **Can include the @ symbol.  The function should check for this and remove it if necessary.**
 - `keys`: list, optional; list that contains keys to return about user object.  If not specified, then function should return the entire user object.

In [10]:
# since we're going to re-use this several times, we'll wrap it up into our own function
def check_handle(twitter_handle):
    # check if twitter handle begins with @
    if twitter_handle[0] == '@':
        # and if so slice it out
        return twitter_handle[1:]
    # otherwise just return the handle as is
    return twitter_handle

In [11]:
def find_user(screen_name, keys=None):
    
    # verify the screen_name
    screen_name=check_handle(screen_name)
    
    # base url we're going to work with
    base_url = 'https://api.twitter.com/1.1/users/lookup.json'
    # query we're going to attach
    query_string   = f'?screen_name={screen_name}'
    
    # the results return a list -- so we're just going to get the first (and only) item out of it
    results  = requests.get(base_url+query_string, auth=tokens).json()[0]
    
    # if something was provided for keys 
    if keys is not None:
        # then make a new dictionary out of the set of keys provided
        new_results = {key: results[key] for key in keys}
        # and return this
        return new_results
    
    # and if the if statement above wasn't true, then just give back the whole thing
    return results

Some tests:  

In [12]:
# we'll search for a user with an @ symbol
find_user('@GA')

{'id': 170393291,
 'id_str': '170393291',
 'name': 'General Assembly',
 'screen_name': 'GA',
 'location': '',
 'description': 'We transform careers and teams — including more than one third of the Fortune 100 — through dynamic courses in coding, data, design, and business.',
 'url': 'https://t.co/YQeEXPxJ4H',
 'entities': {'url': {'urls': [{'url': 'https://t.co/YQeEXPxJ4H',
     'expanded_url': 'http://ga.co/Twitter',
     'display_url': 'ga.co/Twitter',
     'indices': [0, 23]}]},
  'description': {'urls': []}},
 'protected': False,
 'followers_count': 162564,
 'friends_count': 5397,
 'listed_count': 3207,
 'created_at': 'Sat Jul 24 18:19:59 +0000 2010',
 'favourites_count': 35508,
 'utc_offset': None,
 'time_zone': None,
 'geo_enabled': True,
 'verified': True,
 'statuses_count': 23097,
 'lang': None,
 'status': {'created_at': 'Wed Mar 03 20:43:02 +0000 2021',
  'id': 1367213953626107908,
  'id_str': '1367213953626107908',
  'text': 'Time to make a list of affirmations. 🤗 Thanks for 

In [13]:
# and without
find_user('GA')

{'id': 170393291,
 'id_str': '170393291',
 'name': 'General Assembly',
 'screen_name': 'GA',
 'location': '',
 'description': 'We transform careers and teams — including more than one third of the Fortune 100 — through dynamic courses in coding, data, design, and business.',
 'url': 'https://t.co/YQeEXPxJ4H',
 'entities': {'url': {'urls': [{'url': 'https://t.co/YQeEXPxJ4H',
     'expanded_url': 'http://ga.co/Twitter',
     'display_url': 'ga.co/Twitter',
     'indices': [0, 23]}]},
  'description': {'urls': []}},
 'protected': False,
 'followers_count': 162564,
 'friends_count': 5397,
 'listed_count': 3207,
 'created_at': 'Sat Jul 24 18:19:59 +0000 2010',
 'favourites_count': 35508,
 'utc_offset': None,
 'time_zone': None,
 'geo_enabled': True,
 'verified': True,
 'statuses_count': 23097,
 'lang': None,
 'status': {'created_at': 'Wed Mar 03 20:43:02 +0000 2021',
  'id': 1367213953626107908,
  'id_str': '1367213953626107908',
  'text': 'Time to make a list of affirmations. 🤗 Thanks for 

In [14]:
# and include some keys
find_user('GA', keys=['name', 'screen_name','followers_count', 'friends_count', ])

{'name': 'General Assembly',
 'screen_name': 'GA',
 'followers_count': 162564,
 'friends_count': 5397}

##### Function 2 (Required)

**Name:** `find_hashtag`

**Returns:** list of data objects that contain information about each tweet that matches the hashtag provided as input.

**Arguments:**
 - `hashtag`: str, required; text to use as a hashtag search.  
 - `count`: int, optional; number of results to return
 - `search_type`: str, optional; type of results to return.  should accept 3 different values:
   - `mixed`:   return mix of most recent and most popular results
   - `recent`:  return most recent results
   - `popular`: return most popular results
   
**Note:** User should **not** have to actually use the `#` character for the `hashtag` argument.  The function should check to see if it's there, and if not, add it in for them.

In [15]:
def find_hashtag(hashtag, count=None, search_type=None):
    # check to see if the first character is a # symbol
    if hashtag[0] != '#':
        # if it's not, then add it to the beginning of the string
        hash_encoding = '%23'
        hashtag = hash_encoding + hashtag
    else:
        # if it is there, replace it with the correct encoding
        hashtag = hashtag.replace('#', '%23')
        
    # base url to use    
    base_url  = 'https://api.twitter.com/1.1/search/tweets.json'
    # we are always going to use this query string no matter what -- we must specify a hashtag
    hash_str  = f'?q={hashtag}'
    # these are optional, so we're going to leave them blank for now
    count_str = ''
    # ditto this
    type_str  = ''
    
    # if an argument for count was specified
    if count != None:
        # we'll rebuild this portion of the string appropriately
        count_str = f'&count={str(count)}'
        
    # and do the same thing for the search_type argument
    if search_type != None:
        type_str = f'&search_type={search_type}'
        
    # and finally when we're done, add them all together
    query_string = hash_str + count_str + type_str
    
    # and now we'll initiate our request
    results = requests.get(base_url + query_string, auth=tokens).json()
    
    # and return the results
    return results

In [16]:
# we'll try a search for DataScience
find_hashtag('DataScience')

{'statuses': [{'created_at': 'Thu Mar 04 03:05:13 +0000 2021',
   'id': 1367310130753208324,
   'id_str': '1367310130753208324',
   'text': 'RT @MiriamAsensi: Speed and quality have never been more essential \n\n#hclswlobp #appsec #lowcode #analytics #5g #nodejs #cybersecurity #dev…',
   'truncated': False,
   'entities': {'hashtags': [{'text': 'hclswlobp', 'indices': [69, 79]},
     {'text': 'appsec', 'indices': [80, 87]},
     {'text': 'lowcode', 'indices': [88, 96]},
     {'text': 'analytics', 'indices': [97, 107]},
     {'text': '5g', 'indices': [108, 111]},
     {'text': 'nodejs', 'indices': [112, 119]},
     {'text': 'cybersecurity', 'indices': [120, 134]}],
    'symbols': [],
    'user_mentions': [{'screen_name': 'MiriamAsensi',
      'name': 'Miriam Martinez-Asensi',
      'id': 1271692434150432768,
      'id_str': '1271692434150432768',
      'indices': [3, 16]}],
    'urls': []},
   'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
   'source': '<a href="https

In [17]:
# and add in the Hashtag as well for good measure
find_hashtag('#DataScience')

{'statuses': [{'created_at': 'Thu Mar 04 03:05:13 +0000 2021',
   'id': 1367310130753208324,
   'id_str': '1367310130753208324',
   'text': 'RT @MiriamAsensi: Speed and quality have never been more essential \n\n#hclswlobp #appsec #lowcode #analytics #5g #nodejs #cybersecurity #dev…',
   'truncated': False,
   'entities': {'hashtags': [{'text': 'hclswlobp', 'indices': [69, 79]},
     {'text': 'appsec', 'indices': [80, 87]},
     {'text': 'lowcode', 'indices': [88, 96]},
     {'text': 'analytics', 'indices': [97, 107]},
     {'text': '5g', 'indices': [108, 111]},
     {'text': 'nodejs', 'indices': [112, 119]},
     {'text': 'cybersecurity', 'indices': [120, 134]}],
    'symbols': [],
    'user_mentions': [{'screen_name': 'MiriamAsensi',
      'name': 'Miriam Martinez-Asensi',
      'id': 1271692434150432768,
      'id_str': '1271692434150432768',
      'indices': [3, 16]}],
    'urls': []},
   'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
   'source': '<a href="https

In [20]:
# try using the count argument -- confirm that it returns the right number of results
len(find_hashtag('#DataScience', count=50)['statuses'])

50

In [21]:
# and use the search_type argument as well
find_hashtag('#DataScience', search_type='recent')

{'statuses': [{'created_at': 'Wed Oct 21 21:42:22 +0000 2020',
   'id': 1319031297503825920,
   'id_str': '1319031297503825920',
   'text': 'RT @David_Analytics: Announcing: Graph-Native Machine Learning in Neo4j!\nBy Alicia Frame and @amyhodler You gotta check it out before your…',
   'truncated': False,
   'entities': {'hashtags': [],
    'symbols': [],
    'user_mentions': [{'screen_name': 'David_Analytics',
      'name': 'David Vessie',
      'id': 1446596928,
      'id_str': '1446596928',
      'indices': [3, 19]},
     {'screen_name': 'amyhodler',
      'name': 'Amy Hodler',
      'id': 48175748,
      'id_str': '48175748',
      'indices': [93, 103]}],
    'urls': []},
   'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
   'source': '<a href="https://youngapp.co" rel="nofollow">Yap Code Learn</a>',
   'in_reply_to_status_id': None,
   'in_reply_to_status_id_str': None,
   'in_reply_to_user_id': None,
   'in_reply_to_user_id_str': None,
   'in_reply_to_screen_nam

In [22]:
# and for good measure, we'll use both
find_hashtag('#DataScience', count=50, search_type='popular')

{'statuses': [{'created_at': 'Wed Oct 21 21:42:22 +0000 2020',
   'id': 1319031297503825920,
   'id_str': '1319031297503825920',
   'text': 'RT @David_Analytics: Announcing: Graph-Native Machine Learning in Neo4j!\nBy Alicia Frame and @amyhodler You gotta check it out before your…',
   'truncated': False,
   'entities': {'hashtags': [],
    'symbols': [],
    'user_mentions': [{'screen_name': 'David_Analytics',
      'name': 'David Vessie',
      'id': 1446596928,
      'id_str': '1446596928',
      'indices': [3, 19]},
     {'screen_name': 'amyhodler',
      'name': 'Amy Hodler',
      'id': 48175748,
      'id_str': '48175748',
      'indices': [93, 103]}],
    'urls': []},
   'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
   'source': '<a href="https://youngapp.co" rel="nofollow">Yap Code Learn</a>',
   'in_reply_to_status_id': None,
   'in_reply_to_status_id_str': None,
   'in_reply_to_user_id': None,
   'in_reply_to_user_id_str': None,
   'in_reply_to_screen_nam

##### Function 3 (Required)

**Name:** `get_followers`

**Returns:** list of data objects for each of the users followers, returning values for the `name`, `followers_count`, `friends_count`, and `screen_name` key for each user.

**Arguments:** 

 - `screen_name`: str, required; Twitter handle to search for.  **Results should not depend on user inputting the @ symbol.**
 - `keys`: list, required;  keys to return for each user.  default value: [`name`, `followers_count`, `friends_count`, `screen_name`]; if something else is listed, values for those keys should be returned
 - `to_df`: bool, required; default value: False; if True, return results in a dataframe.  Every value provided in the `keys` argument should be its own column, with rows populated by the corresponding values for each one for every user.

In [18]:
def get_followers(screen_name, keys=['name', 'followers_count', 'friends_count', 'screen_name'], to_df=False):
   
    # check if the first item is an @ symbol
    screen_name  = check_handle(screen_name)
    
    # url to use
    base_url     = 'https://api.twitter.com/1.1/followers/list.json'
    # query to attach
    query_string = f'?screen_name={screen_name}'
    
    # generate the request
    users        = requests.get(base_url + query_string, auth=tokens).json()['users']
    
    # create dictionaries that only contain the keys specified in the keys argument
    abbreviated_users = [{key: user[key] for key in keys} for user in users]
    
    # if returning results in a dataframe
    if to_df:
        import pandas as pd
        # create a dictionary with key values for each one 
        df_dict = {key: [user[key] for user in abbreviated_users] for key in keys }
        return pd.DataFrame(df_dict)
    
    return abbreviated_users

In [19]:
# with GA handle
get_followers('@GA')

[{'name': 'Prompt.io',
  'followers_count': 1056,
  'friends_count': 836,
  'screen_name': 'prompt_io'},
 {'name': "McDaniel Inks Int'l Strategic Biz Expansion Plans",
  'followers_count': 55,
  'friends_count': 278,
  'screen_name': 'McDaniel_InksBz'},
 {'name': 'laura',
  'followers_count': 267,
  'friends_count': 380,
  'screen_name': 'Laura_Tombia'},
 {'name': 'Britt Cambas',
  'followers_count': 49,
  'friends_count': 189,
  'screen_name': 'brittcambas'},
 {'name': 'Rekt Ralph',
  'followers_count': 41,
  'friends_count': 1244,
  'screen_name': 'RektRalph4'},
 {'name': 'Mahedi Hasan Omi',
  'followers_count': 125,
  'friends_count': 1648,
  'screen_name': 'mahedi_dreams'},
 {'name': 'mafromist',
  'followers_count': 155,
  'friends_count': 855,
  'screen_name': 'mafromist'},
 {'name': 'Startups Magazine',
  'followers_count': 8469,
  'friends_count': 6305,
  'screen_name': 'TheStartupsMag'},
 {'name': 'Ryan',
  'followers_count': 10,
  'friends_count': 877,
  'screen_name': 'Stabl

In [25]:
# and without the @ symbol
get_followers('GA')

[{'name': 'Quwanie Gordon',
  'followers_count': 387,
  'friends_count': 1737,
  'screen_name': 'QuwanieGordon'},
 {'name': 'David Knudson',
  'followers_count': 7,
  'friends_count': 28,
  'screen_name': 'djknudson'},
 {'name': 'Lesley S.',
  'followers_count': 1591,
  'friends_count': 2210,
  'screen_name': 'msgigglez'},
 {'name': 'Eppik',
  'followers_count': 1097,
  'friends_count': 4650,
  'screen_name': 'eppikstories'},
 {'name': 'Quest Engineering Solutions',
  'followers_count': 15,
  'friends_count': 139,
  'screen_name': 'QESEngineering'},
 {'name': 'Srikanth',
  'followers_count': 5,
  'friends_count': 170,
  'screen_name': 'Srikant65202945'},
 {'name': 'TUSIIME CHRISTBELL',
  'followers_count': 492,
  'friends_count': 1998,
  'screen_name': 'TUSIIMECHRISTB1'},
 {'name': 'iamsamhacker',
  'followers_count': 67,
  'friends_count': 222,
  'screen_name': 'iamsamhacker'},
 {'name': 'joshua d chapel',
  'followers_count': 1,
  'friends_count': 74,
  'screen_name': 'chapel_joshua'

In [26]:
# and now only use certain keys
get_followers('GA', keys=['name', 'followers_count'])

[{'name': 'Quwanie Gordon', 'followers_count': 387},
 {'name': 'David Knudson', 'followers_count': 7},
 {'name': 'Lesley S.', 'followers_count': 1591},
 {'name': 'Eppik', 'followers_count': 1097},
 {'name': 'Quest Engineering Solutions', 'followers_count': 15},
 {'name': 'Srikanth', 'followers_count': 5},
 {'name': 'TUSIIME CHRISTBELL', 'followers_count': 492},
 {'name': 'iamsamhacker', 'followers_count': 67},
 {'name': 'joshua d chapel', 'followers_count': 1},
 {'name': '.', 'followers_count': 160},
 {'name': 'NEM Support', 'followers_count': 0},
 {'name': 'Further 🇦🇺', 'followers_count': 34},
 {'name': 'Jasmin Alvarez', 'followers_count': 0},
 {'name': 'Nick Dyer', 'followers_count': 58},
 {'name': 'Pramit Majumder', 'followers_count': 39},
 {'name': 'melisa 🌱', 'followers_count': 19},
 {'name': 'Susie Vernalis', 'followers_count': 687},
 {'name': 'delara chowdhury', 'followers_count': 110},
 {'name': 'Lobster Coaching', 'followers_count': 52},
 {'name': 'jena adeke', 'followers_coun

In [20]:
# and put those results in a dataframe
get_followers('GA', keys=['name', 'followers_count'], to_df=True)

Unnamed: 0,name,followers_count
0,Prompt.io,1056
1,McDaniel Inks Int'l Strategic Biz Expansion Plans,55
2,laura,267
3,Britt Cambas,49
4,Rekt Ralph,41
5,Mahedi Hasan Omi,125
6,mafromist,155
7,Startups Magazine,8469
8,Ryan,10
9,Freddie Blicher,296


In [29]:
# and test the same thing, but without any keys specified
get_followers('GA', to_df=True)

Unnamed: 0,name,followers_count,friends_count,screen_name
0,Quwanie Gordon,387,1737,QuwanieGordon
1,David Knudson,7,28,djknudson
2,Lesley S.,1591,2210,msgigglez
3,Eppik,1097,4650,eppikstories
4,Quest Engineering Solutions,16,139,QESEngineering
5,Srikanth,5,170,Srikant65202945
6,TUSIIME CHRISTBELL,490,1998,TUSIIMECHRISTB1
7,iamsamhacker,67,222,iamsamhacker
8,joshua d chapel,1,74,chapel_joshua
9,.,160,117,jarrell_idk


##### Function 4 (Optional)

**Name:** `friends_of_friends`

**Returns:** list of data objects for each user that two Twitter users have in common

**Arguments:**

 - `names`: list, required; list of two Twitter handles to compare friends list with.  **Results should be insensitive to whether or not the '@' symbol is included or not.**
 - `keys`: list, optional; list of keys to return for information about each user.  Default value should be to return the entire data object.
 - `to_df`: bool, required; default value: False; if True, returns results in a dataframe.

In [29]:
def friends_of_friends(names, keys=None, to_df=False):
    # modify handles if necessary
    names    = [check_handle(name) for name in names]
    # base url
    url      = 'https://api.twitter.com/1.1/friends/list.json'
    # get friends of both users -- note we return the list associated with the 'users' key
    results  = [requests.get(url + f'?screen_name={name}&count=200', auth=tokens).json()['users']
                for name in names]
    
    # ids of first friend
    user1_ids = [result['id'] for result in results[0]]
    # ids of second friend
    user2_ids = [result['id'] for result in results[1]]
    
    # ids that are common in both of the above lists
    joint_ids = [user_id for user_id in user1_ids if user_id in user2_ids]
    
    # return the users that have the id that's held jointly by each user
    friends_of_friends = [user for user in results[0] if user['id'] in joint_ids]
    
    # if keys are provided
    if keys is not None:
        # do the following comprehension:
         # create a dictionary which contains the value for each key provided
         # for every person in the friend_of_friends list
        friends_of_friends = [{key: friend[key] for key in keys} for friend in friends_of_friends]
        
    # if we want the results in a dataframe
    if to_df:
        import pandas as pd
        # first check to see if there's a specific list of keys we need to use
        if keys is not None:
            # if yes, create a dictionary, where each key is a list with all of its values
            df_dict = {key: [friend[key] if friend[key] else None for friend in friends_of_friends] for key in keys}
            return pd.DataFrame(df_dict)
        
        # likewise, if keys are not provided, then get them inside of a list
        keys = friends_of_friends[0].keys()
        # and create a dataframe friendly dictionary in the same way
        df_dict = {key: [friend[key] for friend in friends_of_friends] for key in keys}
        return pd.DataFrame(df_dict)
            
    # and if none of the above conditions are true, just return the original friends_of_friends list
    return friends_of_friends
    

In [30]:
# mutual friends
friends_of_friends(['Beyonce', 'MariahCarey'])

[{'id': 40908929,
  'id_str': '40908929',
  'name': 'Usher Raymond IV',
  'screen_name': 'Usher',
  'location': '',
  'description': 'Text me (404) 737-1821',
  'url': 'https://t.co/Vj0zkZ2Rr0',
  'entities': {'url': {'urls': [{'url': 'https://t.co/Vj0zkZ2Rr0',
      'expanded_url': 'http://smarturl.it/xicry',
      'display_url': 'smarturl.it/xicry',
      'indices': [0, 23]}]},
   'description': {'urls': []}},
  'protected': False,
  'followers_count': 12192202,
  'friends_count': 564,
  'listed_count': 26263,
  'created_at': 'Mon May 18 16:36:50 +0000 2009',
  'favourites_count': 1333,
  'utc_offset': None,
  'time_zone': None,
  'geo_enabled': True,
  'verified': True,
  'statuses_count': 5992,
  'lang': None,
  'status': {'created_at': 'Wed Aug 05 19:10:55 +0000 2020',
   'id': 1291089318597890048,
   'id_str': '1291089318597890048',
   'text': 'A 4-Day Celebration of Community Power + Cultural Resistance. Tune in live on August 6 + 7 + 12 + 18, 2020 for… https://t.co/jUZQCv1dUF',

In [31]:
# mutual friends, returned in a dataframe
friends_of_friends(['Beyonce', 'MariahCarey'], to_df=True)

Unnamed: 0,id,id_str,name,screen_name,location,description,url,entities,protected,followers_count,...,default_profile,default_profile_image,following,live_following,follow_request_sent,notifications,muting,blocking,blocked_by,translator_type
0,40908929,40908929,Usher Raymond IV,Usher,,Text me (404) 737-1821,https://t.co/Vj0zkZ2Rr0,{'url': {'urls': [{'url': 'https://t.co/Vj0zkZ...,False,12192202,...,False,False,False,False,False,False,False,False,False,regular
1,30782495,30782495,KELENDRIA ROWLAND,KELLYROWLAND,,COFFEE – out now!,https://t.co/IUvpsr9qhu,{'url': {'urls': [{'url': 'https://t.co/IUvpsr...,False,6896645,...,False,False,False,False,False,False,False,False,False,none
2,18395177,18395177,solange knowles,solangeknowles,,"water, melanin, bones, blood.",https://t.co/8F2cZD8D37,{'url': {'urls': [{'url': 'https://t.co/8F2cZD...,False,4151877,...,False,False,False,False,False,False,False,False,False,none


In [32]:
# friends of friends, with keys passed in
friends_of_friends(['Beyonce', 'MariahCarey'], keys=['id', 'name'])

[{'id': 40908929, 'name': 'Usher Raymond IV'},
 {'id': 30782495, 'name': 'KELENDRIA ROWLAND'},
 {'id': 18395177, 'name': 'solange knowles'}]

In [33]:
# f_o_f, with keys and to_df specified
friends_of_friends(['Beyonce', 'MariahCarey'], keys=['id', 'name'], to_df=True)

Unnamed: 0,id,name
0,40908929,Usher Raymond IV
1,30782495,KELENDRIA ROWLAND
2,18395177,solange knowles


 ##### Function 5 (Optional)

Rewrite the `friends_of_friends` function, except this time include an argument called `full_search`, which accepts a boolean value.  If set to `True`, use cursoring to cycle through the complete set of users for the users provided.  

The twitter API only returns a subset of users in your results to save bandwidth, so you have to cycle through multiple result sets to get all of the values.

You can read more about how this works here:  https://developer.twitter.com/en/docs/basics/cursoring

Basically you have to do a `while` loop to continually make a new request using the values stored in the `next_cursor` key as part of your next query string until there's nothing left to search.

**Note:** We're using the free API, so we're operating under some limitations.  One of them being that you can only make 15 API calls in a 15 minute span.  You can also only return up to 200 results per cursor, so this means you won't be able to completely search for everyone even if you set this up correctly.

That's fine, just do what you can under the circumstances.

In [34]:
# we'll create a separate function to handle the cursoring
def cursor_user(username):
    # check the username
    username  = check_handle(username)
    # set the following variables to use in the loop
    cursor    = -1
    followers = []
    
    # while this condition is True
    while cursor != 0:
        # set the url using the value of cursor & username
        url = f'https://api.twitter.com/1.1/friends/list.json?screen_name={username}&count=200&cursor={cursor}'
        # do the request
        req = requests.get(url, auth=tokens).json()
        # add the values returned by the request
        followers.extend(req['users'])
        # set the value of cursor to value inside this key
        cursor = req['next_cursor']
    
    # and when you're finished, return the list
    return followers

In [36]:
# and now we'll stick it inside friends_of_friends
def friends_of_friends(names, keys=None, to_df=False, full_search=False):
    names    = [check_handle(name) for name in names]
    url      = 'https://api.twitter.com/1.1/friends/list.json'
    
    # if full_search, use the cursor_user() fn for each person 
    if full_search:
        results = [cursor_user(name) for name in names]
    # if not, then do a standard search
    else:
        results = [requests.get(url + f'?screen_name={name}&count=200', auth=tokens).json()['users']
                   for name in names]
    
    user1_ids = [result['id'] for result in results[0]]
    user2_ids = [result['id'] for result in results[1]]
    
    joint_ids = [user_id for user_id in user1_ids if user_id in user2_ids]
    
    friends_of_friends = [user for user in results[0] if user['id'] in joint_ids]
    
    if keys is not None:
        friends_of_friends = [{key: friend[key] for key in keys} for friend in friends_of_friends]
        
    if to_df:
        import pandas as pd
        if keys is not None:
            df_dict = {key: [friend[key] for friend in friends_of_friends] for key in keys}
            return pd.DataFrame(df_dict)
        
        keys = friends_of_friends[0].keys()
        df_dict = {key: [friend[key] for friend in friends_of_friends] for key in keys}
        return pd.DataFrame(df_dict)
            
    return friends_of_friends
    

In [37]:
# if you search for 200 friends, you'll find 4 people in common
len(friends_of_friends(['ezraklein', 'tylercowen']))

9

In [38]:
# if you add full_search, then it'll find 54 people
len(friends_of_friends(['ezraklein', 'tylercowen'], full_search=True))

KeyError: 'users'