# How The Web Works...and an Introduction to the Twitter API

Before we take a look at APIs, let's take a step back and learn how the web works. Understanding some of the fundamentals of how information travels around the internet will help us a ton. And the best place to start is by learning about HTTP.

## HTTP

A simple protocol called [HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) powers most of the communications on the web, including your browser and probably most of the apps that you use. HTTP allows you (via your browser, a mobile app or even code you write!) to **request** data (HTML, PDFs, MP3s, etc) from a service across the internet (e.g. google.com, twitter.com) and that service will respond with the requested data (i.e. the **response**).

Let's take a look at HTTP in more detail by looking at Mike's silly slides on "How the Internet Works".


### A Quick Review of HTTP: Request and Response

So, let's review a simple HTTP request:

1. a "**client**" (your browser, your Instagram app, or even some code that you are about to write) makes a **request** for data.

2. the **request** is in the form of a [URL](https://en.wikipedia.org/wiki/URL). The URL specifies the site and page/document/data. For example: https://nytimes.com/ or https://www.nytimes.com/2018/01/24/technology/personaltech/huawei-mate-10-pro-smartphone-review.html

3. the "**server**" receives the request and then returns the page/document/data. This is the **response**.

Simple!

One important note: this type of request is called a "**GET**" request. There are other types of HTTP requests which we'll learn about later. 
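To make the request/response cycle concrete, here is a minimal sketch that plays both roles on your own machine: a toy server, and a client that sends it a GET request. The `HelloHandler` class, the `fetch_from_local_server` function and the `/index.html` path are made up for illustration. It uses only Python's standard library, and real web servers do far more than this.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Toy server (hypothetical): answers every GET request with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello from the server!</body></html>"
        self.send_response(200)                       # the status code
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                        # the data itself

    def log_message(self, format, *args):
        pass  # silence the default request logging to keep the output short


def fetch_from_local_server(path="/index.html"):
    """Start the toy server, send it one GET request, and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 means "any free port"
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        client = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        client.request("GET", path)        # the request
        response = client.getresponse()    # the response
        status, body = response.status, response.read().decode("utf-8")
        client.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_from_local_server()
print(status)
print(body)
```

The client asks for a path, and the server answers with a status code plus the page. The same exchange happens, at a much larger scale, every time your browser loads a site.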

### Anatomy of a URL

The [URL](https://en.wikipedia.org/wiki/URL), or Uniform Resource Locator, contains a variety of important information about the data that we are requesting. Here are the various fields in a URL:

<img src="https://camo.githubusercontent.com/43bd353c3d0879547481da33bba7d15768bdf4bb/68747470733a2f2f7261772e6769746875622e636f6d2f41544c2d5744492d437572726963756c756d2f686f772d7468652d696e7465726e65742d776f726b732f6d61737465722f696d616765732f616e61746f6d792d75726c2e706e67">
    
For now, we're just going to focus on the protocol, domain and path. The parameters are very important, but we'll come back to them in a future lesson.
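If you'd like to see these fields pulled apart by code, here is a small sketch using `urlsplit` from Python's standard library on the article URL from earlier. Note that Python calls the protocol the "scheme" and the domain the "netloc"; the variable names are our own.

```python
from urllib.parse import urlsplit

url = "https://www.nytimes.com/2018/01/24/technology/personaltech/huawei-mate-10-pro-smartphone-review.html"

parts = urlsplit(url)

print(parts.scheme)  # the protocol
print(parts.netloc)  # the domain
print(parts.path)    # the path
print(parts.query)   # the parameters (this URL has none, so this is an empty string)
```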

### What Kind of Data is on the Other End of a Request?

The data you find in a web page (HTML) or PDF document is meant to be read as you would read the page of a book. But in this class, we'll learn that that kind of reading is labor-intensive. We want a computer to read for us instead -- to take in the data and create something new. This means we want other formats, which leads us to CSV, JSON and XML.

[NOAA Daily Weather Records (**HTML**)](https://www.ncdc.noaa.gov/cdo-web/datatools/records):  
[`https://www.ncdc.noaa.gov/cdo-web/datatools/records`](https://www.ncdc.noaa.gov/cdo-web/datatools/records)

[USDA Expenditures on Children by Families (**PDF**)](https://catalog.data.gov/dataset/expenditures-on-children-by-families):  
[`https://www.cnpp.usda.gov/sites/default/files/expenditures_on_children_by_families/crc2013.pdf`](https://www.cnpp.usda.gov/sites/default/files/expenditures_on_children_by_families/crc2013.pdf)

[FDNY Monthly Response Times (**CSV**)](https://data.cityofnewyork.us/Social-Services/FDNY-Monthly-Response-Times/j34j-vqvt):  
[`https://data.cityofnewyork.us/api/views/j34j-vqvt/rows.csv`](https://data.cityofnewyork.us/api/views/j34j-vqvt/rows.csv)

[FDNY Monthly Response Times (**JSON**)](https://data.cityofnewyork.us/Social-Services/FDNY-Monthly-Response-Times/j34j-vqvt):  
[`https://data.cityofnewyork.us/resource/6b8a-2fci.json`](https://data.cityofnewyork.us/resource/6b8a-2fci.json)
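To see why CSV and JSON are friendlier to a computer, here is a rough sketch that reads the same two rows in both formats using Python's standard library. The column names and numbers are invented for illustration and are not taken from the FDNY dataset.

```python
import csv
import io
import json

# Made-up sample data: the same two rows expressed as CSV and as JSON.
csv_text = "month,average_response_seconds\n2017-01,250\n2017-02,245\n"
json_text = (
    '[{"month": "2017-01", "average_response_seconds": 250},'
    ' {"month": "2017-02", "average_response_seconds": 245}]'
)

# Each CSV row becomes a dictionary keyed by the header line.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON parses directly into Python lists and dictionaries.
json_rows = json.loads(json_text)

print(csv_rows[0]["month"], csv_rows[0]["average_response_seconds"])
print(json_rows[1]["month"], json_rows[1]["average_response_seconds"])
```

One difference worth noticing: every CSV value arrives as a string, while JSON keeps numbers as numbers.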


## Enough About URLs! Let's Write Some Code

OK, time for us to write some code to make our own HTTP requests. There are many Python libraries which handle all of the fun of HTTP for us; we'll use one simply called [`requests`](http://docs.python-requests.org/en/master/).

To install the `requests` Python library, you can run the following. Recall that the double percent signs indicate that the code in the cell is to be interpreted as something other than Python commands. In this case, we are giving instructions to the UNIX **sh**ell.

In [None]:
%%sh
pip install requests

In the code below, we will make an HTTP request to `http://digg.com` just as we saw in the presentation earlier.

In [None]:
import requests

url = 'http://digg.com'

response = requests.get(url)

So what is `response`? Remember that we can inspect an object to see what type it is.

In [None]:
print(type(response))

I'm not sure how much that helps us so let's jump over to the [`requests` library documentation](http://docs.python-requests.org/en/master/) to see how we use this library.

In [None]:
# print out the HTTP status code
print(response.status_code)
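The number printed above is an HTTP status code. As a small aside, Python's standard library knows the official names of these codes, and the sketch below looks up a few common ones. The list of codes is our own choice, and this uses the built-in `http` module rather than anything from `requests`.

```python
from http import HTTPStatus

codes = [200, 301, 404, 500]

# Map each numeric status code to its standard short description.
meanings = {code: HTTPStatus(code).phrase for code in codes}

for code, phrase in meanings.items():
    print(code, phrase)
```

A code in the 200s means the request worked, the 300s are redirects, the 400s mean something was wrong with the request, and the 500s mean something went wrong on the server.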

In [None]:
# we can also print out the "headers" sent back by the digg.com server
print(response.headers)

In [None]:
# best of all, we can see the page we've requested (the digg.com homepage in this case) using the following code
print(response.text)

**NOTE**: This is the same as opening the URL in Chrome and selecting `View --> Developer --> View Source`

### Before moving on...

What do you think happens on the other end of our HTTP request to `digg.com`? Well, we know that a server receives our **request** and then passes back the information that we've asked for (the digg homepage in this case). The `digg.com` servers also keep a record of each request in "log" files. Let's take a quick peek at the digg log files.


### A Quick Exercise

Write some code in the box below to make an HTTP request to The New York Times homepage. After you make the request, print the homepage HTML. Ready? Go!

In [150]:
# put your code here! 




**Follow-up Question**: how might you go about collecting all of The New York Times headlines programmatically if this were your only means (requesting their homepage)? Take a look through the NYTimes homepage HTML and see if you see any patterns. Code which fetches a page, like the NYTimes homepage, and parses out the headlines is an example of "web scraping."
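As a hint at what "parsing out the headlines" could look like, here is a minimal sketch using Python's built-in `html.parser` module. It assumes, purely for illustration, that every headline sits inside an `<h2>` tag. The sample HTML and the `HeadlineParser` class are made up, and a real homepage will almost certainly use different and messier patterns.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text found inside <h2> tags (our assumed headline pattern)."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_headline = True
            self.headlines.append("")  # start collecting a new headline

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        if self.in_headline:
            self.headlines[-1] += data  # add text to the current headline


# Made-up HTML standing in for a real homepage.
sample_html = """
<html><body>
  <h2>First made-up headline</h2>
  <p>Some article text.</p>
  <h2>Second <em>made-up</em> headline</h2>
</body></html>
"""

parser = HeadlineParser()
parser.feed(sample_html)
print(parser.headlines)
```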

There is a lot more to learn about HTTP and "web scraping" but we'll pick that up in future lessons. For now, let's move on to APIs!

## What's an API?

An API, or application programming interface, allows you to specify the data you want and returns it in a computer-friendly format like [JSON](https://www.json.org/) or [XML](https://en.wikipedia.org/wiki/XML). The "interface" is a regularized way to make requests, and a consistent specification for the data you asked for. Many organizations now publish APIs for their data, from [The New York Times](https://developer.nytimes.com/) to [ProPublica](https://propublica.github.io/campaign-finance-api-docs/), to governmental organizations like the [EPA](https://developer.epa.gov/category/api/), to social media sites like [Twitter](https://developer.twitter.com/en/docs), [Instagram](https://www.instagram.com/developer/) and [LinkedIn](https://developer.linkedin.com).


### API Authentication

Most API providers require you, as the developer, to use a form of authentication while using their APIs. There are various forms of authentication: OAuth, API keys and even usernames and passwords.

For example, some providers, like [The New York Times](https://developer.nytimes.com/), only require that you use an API key when making API calls. With API keys, you usually just pass the key in your API calls, like:

```
https://developer.nytimes.com/article_search_v2.json?api_key=abcxyz&q=tesla
```
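In code, you rarely paste the key into the URL by hand. Here is a rough sketch of building such a URL from a dictionary of parameters with Python's standard library. The `api.example.com` endpoint is hypothetical and `abcxyz` is only a placeholder key, so this builds the URL but does not call any real service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only to show how the pieces fit together.
base_url = "https://api.example.com/search.json"

# The API key travels as just another URL parameter, next to the search query.
params = {"api_key": "abcxyz", "q": "tesla"}

url = base_url + "?" + urlencode(params)
print(url)
```

With the `requests` library you can get the same result by passing the dictionary directly, as in `requests.get(base_url, params=params)`, and letting the library do the encoding.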

[OAuth](https://en.wikipedia.org/wiki/OAuth) is a bit more complicated but provides more fine-grained control for the API service as well as the users. Let's come back to it right after we set up our Twitter API keys (yep, they use OAuth for their API authentication).


## Using The Twitter API

To access the Twitter API, we need to register an "application" that will be pulling data from their service. This means that in one sweet instant you have become a developer! 😮 The steps are pretty easy and listed below. You'll first need to get a set of "keys" to use the API and then install a Python library that exposes the Twitter API through special objects. You are well on your way to writing your very own misinformation bot! (kidding)
    
**1) Get Your API Keys**

If you don't already have credentials for Twitter, you have to create an application and generate a set of keys (an API key, API secret, Access token and Access token secret) on the Twitter developer site. There are five easy steps!

1. Create a Twitter user account if you do not already have one.
2. Go to [https://apps.twitter.com](https://apps.twitter.com/) and log in with your Twitter user account. This step gives you a Twitter developer's account under the same name as your user account. (Um, and congratulations! You're now a developer!)
3. Click “Create New App”
4. Fill out the form, agree to the terms, and click “Create your Twitter application”
5. In the next page, click on “Keys and Access Tokens” tab, and copy your “API key” and “API secret”. Scroll down and click “Create my access token”, and copy your “Access token” and “Access token secret”.

Once you have your tokens, copy them into the variables below:

In [151]:
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""

**2) Install the Tweepy Library**

The developer community has created [hundreds of Twitter libraries](https://dev.twitter.com/resources/twitter-libraries) that help you access Twitter's API. By "help" we mean they have created objects that hide the details of making requests for data from Twitter, and leave you with a clean coding interface. Your requests to Twitter are in the form of neat methods (verbs) that return data on users, their statuses and followers. You can even post tweets using these libraries.

We will be using Tweepy to call the Twitter API. Why? It has many of the best features of the other libraries and its documentation is complete. Often, free software projects can be thinly documented, leaving you a little at sea if you have a problem.

Keep these two links open in tabs as we go through the code below: [Tweepy documentation](http://tweepy.readthedocs.io/en/v3.5.0/) and [source code](https://github.com/tweepy/tweepy).

Use the following to install the Tweepy library (version 3.5) on your machine.

In [None]:
%%sh
pip install tweepy==3.5.0

In [152]:
# before we can make Twitter API calls, we need to initialize a few things...
from tweepy import OAuthHandler, API

# setup the authentication
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# create an object we will use to communicate with the Twitter API
api = API(auth)
print(type(api))

<class 'tweepy.api.API'>


### Getting a User's Profile Info

Now you are prepped and ready to start making Twitter API calls. First, let's look at some user profiles.

We will be calling the `users/show` API: https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference/get-users-show

To call the `users/show` API, Tweepy has a method called [`get_user`](http://tweepy.readthedocs.io/en/v3.5.0/api.html#user-methods).

In [None]:
# get a user's profile ('myoung' in this case)
user = api.get_user('myoung')

help(type(user))

What sort of information do we get about a user? Take a look at the `users/show` documentation and then print out some of the user attributes below.

In [153]:
# let's print out a few attributes for our user
print(user.id)
print(user.screen_name)
print(user.location)
print(user.followers_count)
print(user.listed_count)
print(user.statuses_count)
print(user.created_at)

2671
myoung
Brooklyn, NY
4328
233
8914
2006-07-20 18:35:41


By calling `api.get_user()` above, the Tweepy library made an API call to Twitter. The result of the API call is technically a JSON string. As we did last week, we could parse it into primitive Python objects like lists, dictionaries, numbers and strings. Instead, Tweepy creates high-level objects to represent the result of an API call. This is why you access `.screen_name` and `.followers_count` as attributes of the object.

Objects have both data and methods, and the methods for this object include things like `follow()` and `unfollow()`. All of this is conveniently wrapped up in a high-level object.
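To illustrate the difference between the two styles, here is a small sketch that parses a JSON string once into a plain dictionary and once into an object with attributes. The JSON is a hand-written fragment using a few of the values printed above, and `SimpleNamespace` is only a stand-in: this is not how Tweepy builds its objects internally.

```python
import json
from types import SimpleNamespace

# A hand-written fragment resembling part of a user profile response.
raw_json = (
    '{"id": 2671, "screen_name": "myoung",'
    ' "location": "Brooklyn, NY", "followers_count": 4328}'
)

# Style 1: primitive Python objects, accessed with square brackets.
as_dict = json.loads(raw_json)
print(as_dict["screen_name"])

# Style 2: an object whose fields are attributes, accessed with a dot.
as_object = json.loads(raw_json, object_hook=lambda d: SimpleNamespace(**d))
print(as_object.screen_name)
print(as_object.followers_count)
```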

**Try This!** 

Modify the code above to get the user profile information for `@realDonaldTrump`

### Send a Tweet!

If you were writing a bot (which we'll do soon), it would need to tweet! You can send out a tweet with one line of code.

We will be using the `statuses/update` API to send the tweet: https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/post-statuses-update

In [None]:
# send a tweet!
# you may want to modify this message before you send it :-)

api.update_status(status='I love this class!')

### Look at a User's Tweets

Now, let's look at [`@realDonaldTrump`](https://twitter.com/realDonaldTrump)'s tweets. If you've had enough of all that, replace it with [`@justinbieber`](https://twitter.com/justinbieber) or someone less anxious-making.

We will use the `statuses/user_timeline` API to do this: https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html

In [None]:
# get justinbieber's last 10 tweets
tweets = api.user_timeline(screen_name='justinbieber', count=10)
print(type(tweets))
print(len(tweets))

**Question** If we want to print out all 10 of these tweets, how would we do that? Would we do something like this?

In [None]:
print(tweets[0].text)
print(tweets[1].text)
print(tweets[2].text)
print(tweets[3].text)
print(tweets[4].text)
print(tweets[5].text)


What if we needed to print out 100 tweets instead of 10? This doesn't seem like the right approach.

## Introducing Loops! 

Loops are available in most programming languages and they simply allow code to be executed repeatedly. Let's see what this means by looking at an example.

In [156]:
# Introducing Loops! 
# say we have a list with three names (strings) in it
teachers_names = ['mark', 'emily', 'mike']

# we can print out the length of the list
print(len(teachers_names))

# we can also print out the 1st, 2nd and 3rd element in the list
print(teachers_names[0])
print(teachers_names[1])
print(teachers_names[2])

3
mark
emily
mike


In [157]:
# the following is an example of a "for" loop
# we will "loop through" the list and print out each name

for name in teachers_names:
    # start of the code to run each time we go through the loop
    print(name)
    # end of the code to run each time we go through the loop


mark
emily
mike
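To connect the loop back to the indexed `print` calls from the earlier cell, here is a short sketch showing that both approaches visit the same elements. The `position` and `visited` names are our own.

```python
teachers_names = ['mark', 'emily', 'mike']

# range(len(...)) produces the positions 0, 1 and 2, so this is the
# indexed version written as a loop instead of three separate lines
for position in range(len(teachers_names)):
    print(position, teachers_names[position])

# the plain "for" loop visits the same elements without needing positions
visited = []
for name in teachers_names:
    visited.append(name)

print(visited == teachers_names)
```

Either way, the loop body is written once, and it works the same for a list of 3 names or 100 tweets.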


Let's put a `for` loop to use with some tweets. You can fetch the 10 most recent tweets from `@realDonaldTrump` and then loop over the tweets, printing each one out. Here, a tweet object from Tweepy has data attributes like the text of the tweet, stored in `.text`.

In [None]:
# get the "real" Donald's last 10 tweets
tweets = api.user_timeline(screen_name='realDonaldTrump', count=10)

# loop over the tweets and print out the tweet text
for tweet in tweets:
    print(tweet.text)

**Try This!**

Use the example above to get the latest tweets for yourself, `@nytimes`, etc. What other information would be useful to have besides the text of the tweet?
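As one possible starting point, the sketch below loops over tweets and prints more than the text. So that it runs without API keys, it uses made-up stand-in objects instead of a real API call. It assumes the tweet objects you get back carry `created_at` and `retweet_count` attributes, as described in the `statuses/user_timeline` documentation.

```python
from types import SimpleNamespace

# Made-up stand-ins for tweet objects (not real tweets, not a real API call).
tweets = [
    SimpleNamespace(text="First sample tweet", created_at="2018-01-24 09:00:00", retweet_count=3),
    SimpleNamespace(text="Second sample tweet", created_at="2018-01-24 10:30:00", retweet_count=12),
]

# print when each tweet was posted, how often it was retweeted, and its text
for tweet in tweets:
    print(tweet.created_at, tweet.retweet_count, tweet.text)

# a loop-style calculation: add up the retweets across all of the tweets
total_retweets = sum(tweet.retweet_count for tweet in tweets)
print(total_retweets)
```

To try this on real data, replace the stand-in list with the result of `api.user_timeline(...)` from the cell above.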

## Appendix

Latency of AT&T Network: http://ipnetwork.bgtmo.ip.att.net/pws/network_delay.html