# How The Web Works... and an Introduction to the Twitter API
<br>

<img src=https://cdn-images-1.medium.com/max/1920/1*CWytxLBZtxrxekPofi0-RQ.png width=500>
<br>

Before we take a look at APIs, let's take a step back and learn about how the web works. Understanding some of the fundamentals of how information travels around the internet will help us a ton. And the best place to start is by learning about HTTP.

## HTTP

A simple protocol called [HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) powers most of the communications on the web, including your browser and probably most of the apps that you use. HTTP allows you (via your browser, a mobile app or even code you write!) to **request** data (HTML, PDFs, MP3s, etc) from a service across the internet (e.g. google.com, twitter.com) and that service will respond with the requested data (i.e. the **response**).

Let's take a look at HTTP in more detail by looking at Mike's slides on "How the Internet Works." You can find the [Keynote](https://github.com/computationaljournalism/columbia2018/blob/master/docs/HowTheInternetWorks.key) (for Macs) or [PDF](https://github.com/computationaljournalism/columbia2018/blob/master/docs/HowTheInternetWorks.pdf) copy of the slides in the [`docs` directory](https://github.com/computationaljournalism/columbia2018/tree/master/docs) of our GitHub repository.

### A Quick Review of HTTP: Request and Response

So, let's review a simple HTTP request:

1. A "**client**" (your browser, your Instagram app, or even some code that you are about to write) makes a **request** for data.

2. The "**request**" is in the form of a [URL](https://en.wikipedia.org/wiki/URL) (Uniform Resource Locator -- a web address). The URL specifies the site you are requesting information from and the page/document/data you want. For example: https://nytimes.com/ is the site for the New York Times and this URL https://www.nytimes.com/2018/01/24/technology/personaltech/huawei-mate-10-pro-smartphone-review.html specifies a given news story in the form of an HTML page.

3. The "**server**" receives the request and then returns the page/document/data you asked for. This is the **response**.

Simple!

One important note: this type of request is called a "**GET**" request. There are other types of HTTP requests which we'll learn about later. (The main difference being in how you specify the data you want -- GET specifies the data you want in the URL of the request as we'll see below.) 

### Anatomy of a URL

The [URL](https://en.wikipedia.org/wiki/URL), or Uniform Resource Locator, or "web address," contains a variety of important information about data that we are requesting. Here are the various fields in a URL:

<img src="https://camo.githubusercontent.com/43bd353c3d0879547481da33bba7d15768bdf4bb/68747470733a2f2f7261772e6769746875622e636f6d2f41544c2d5744492d437572726963756c756d2f686f772d7468652d696e7465726e65742d776f726b732f6d61737465722f696d616765732f616e61746f6d792d75726c2e706e67" width=500>
    
For now, we're just going to focus on the protocol, domain and path. The parameters are very important but we'll come back to that in a future lesson.
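
As a rough sketch of how these fields can be pulled apart in code, Python's standard `urllib.parse` module offers `urlparse`. The URL below is a shortened stand-in loosely based on the news story link above, and the `?page=2` parameter is made up purely for illustration.

```python
from urllib.parse import urlparse, parse_qs

# An illustrative URL; the "?page=2" parameter is made up for this sketch
url = "https://www.nytimes.com/2018/01/24/technology/review.html?page=2"

parts = urlparse(url)

print(parts.scheme)           # the protocol: https
print(parts.netloc)           # the domain: www.nytimes.com
print(parts.path)             # the path: /2018/01/24/technology/review.html
print(parse_qs(parts.query))  # the parameters: {'page': ['2']}
```

Notice that the parameters come back as a dictionary, which hints at why they matter so much once we start talking to APIs.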

### What Kind of Data is on the Other End of a Request?

The data you find in a web page (HTML) or PDF document is meant to be read as you would read the page of a book. But in this class, we'll learn that that kind of reading is labor-intensive. We want a computer to read for us instead -- to take in the data and create something new. This means we want other formats, which leads us to CSV, JSON and XML.

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[NOAA Daily Weather Records (**HTML**)](https://www.ncdc.noaa.gov/cdo-web/datatools/records):  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[`https://www.ncdc.noaa.gov/cdo-web/datatools/records`](https://www.ncdc.noaa.gov/cdo-web/datatools/records)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[USDA Expenditures on Children by Families (**PDF**)](https://catalog.data.gov/dataset/expenditures-on-children-by-families):  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[`https://www.cnpp.usda.gov/sites/default/files/expenditures_on_children_by_families/crc2013.pdf`](https://www.cnpp.usda.gov/sites/default/files/expenditures_on_children_by_families/crc2013.pdf)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[FDNY Monthly Response Times (**CSV**)](https://data.cityofnewyork.us/Social-Services/FDNY-Monthly-Response-Times/j34j-vqvt):  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[`https://data.cityofnewyork.us/api/views/j34j-vqvt/rows.csv`](https://data.cityofnewyork.us/api/views/j34j-vqvt/rows.csv)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[FDNY Monthly Response Times (**JSON**)](https://data.cityofnewyork.us/Social-Services/FDNY-Monthly-Response-Times/j34j-vqvt):  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[`https://data.cityofnewyork.us/resource/6b8a-2fci.json`](https://data.cityofnewyork.us/resource/6b8a-2fci.json)
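
To get a feel for why CSV and JSON are friendlier to a computer, here is a minimal sketch that reads the same tiny table from a CSV string and from a JSON string using only Python's standard library. The rows and column names are invented for illustration; they are not the real FDNY data.

```python
import csv
import io
import json

# Made-up sample data, NOT the real FDNY records
csv_text = "month,borough,avg_seconds\n2017-12,Bronx,300\n2017-12,Queens,320\n"
json_text = (
    '[{"month": "2017-12", "borough": "Bronx", "avg_seconds": 300},'
    ' {"month": "2017-12", "borough": "Queens", "avg_seconds": 320}]'
)

# CSV: each row becomes a dictionary keyed by the header line (values are strings)
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: the text maps directly onto Python lists, dictionaries, numbers and strings
json_rows = json.loads(json_text)

print(csv_rows[0]["borough"], csv_rows[0]["avg_seconds"])    # Bronx 300 (a string)
print(json_rows[1]["borough"], json_rows[1]["avg_seconds"])  # Queens 320 (an integer)
```

One difference worth noticing: CSV hands everything back as text, while JSON keeps track of which values are numbers.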


## Enough About URLs! Let's Write Some Code

Ok, time for us to write some code to make our own HTTP requests. There are many Python libraries which handle all of the fun of HTTP for us - we'll use one simply called [`requests`](http://docs.python-requests.org/en/master/).

To install the `requests` Python library, you can run the following. Recall that the double percent signs indicate that the code in the cell is to be interpreted as something other than Python commands. In this case, we are giving instructions to the UNIX **sh**ell. (For those of you on Windows, replace the "%%sh" with "%%cmd" as you want the following instruction interpreted as something you'd type at the command prompt.)

In [None]:
%%sh
pip install requests

In the code below, we will make an HTTP request to `http://digg.com` just as we saw in the presentation earlier.

In [None]:
from requests import get

# Specify the location of the information you want as a string

url = 'http://digg.com'

# Then fetch the data (the resource) at that address using get() from
# the "requests" package

response = get(url)

So what is `response`? Remember that we can inspect the object to see what type it is.

In [None]:
print(type(response))

I'm not sure how much that helps us so let's jump over to the [`requests` library documentation](http://docs.python-requests.org/en/master/) to see how we use this library.

In [None]:
# print out the HTTP status code
print(response.status_code)

In [None]:
# we can also print out the "headers" sent back by the digg.com server
print(response.headers)

In [None]:
# best of all, we can see the page we've requested (the digg.com homepage in this case) using the following code:
print(response.text)

**NOTE**: This is the same as opening the URL in Chrome and selecting `View ➡️ Developer ➡️ View Source`

### Before moving on...

What do you think happens on the other end of our HTTP request to `digg.com`? Well, we know that a server receives our **request** and then passes back the information that we've asked for (the digg homepage in this case). The `digg.com` servers also keep a record of each request in "log" files. Let's take a quick peek at the digg log files.
<br><br>

<img src=http://1.bp.blogspot.com/-TmWAXEghGtc/U20Nrs7sYvI/AAAAAAAAAXg/niP4Hf1Ef4U/s1600/Gilliam+Intermission.jpg width=400>

Web logs, I (Mark) believe, started as a way for "web masters" to assess the health of their servers, to make sure that the service was performing properly and to keep an eye on its usage. Toward that end, web servers (the program we just saw operating at digg.com) standardized on information they record with each request, [the so-called "common log format."](https://en.wikipedia.org/wiki/Common_Log_Format)

As you saw, each request generates a new line in the log file, and your individual requests are threaded in time with data from everyone else's use of the site at the same time. Web site owners quickly realized that these logs not only tell you about broken links and busy or quiet times for your service, *they can also give you a snapshot of users' interests.*

By pulling out the lines related to just your requests to digg.com, for example, we can start to see patterns in what you are searching for. What are your interests? What are your habits? How does your use of the service compare to others? Are there "clusters" or groups of people with similar behaviors to yours? All of these questions can be answered by log analysis. [Wired had a nice overview of log analysis](https://www.wired.com/2010/02/gather_users_data_from_server_logs/) that's worth a look.
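
To make log analysis a little more concrete, here is a minimal sketch that reads a few lines in the Common Log Format, counts requests per visitor and collects broken links. The log lines, addresses and paths are all invented, and the regular expression is just one reasonable way to split the fields, not the way any particular server or tool does it. (It uses a `for` loop, which we introduce properly later in this notebook.)

```python
import re
from collections import Counter

# Made-up log lines in the Common Log Format (addresses and paths are invented)
log_lines = [
    '203.0.113.5 - - [24/Jan/2018:10:15:32 -0500] "GET / HTTP/1.1" 200 5120',
    '198.51.100.7 - - [24/Jan/2018:10:15:40 -0500] "GET /channel/technology HTTP/1.1" 200 8342',
    '203.0.113.5 - - [24/Jan/2018:10:16:02 -0500] "GET /missing-page HTTP/1.1" 404 512',
]

# host, identity, user, [timestamp], "request line", status code, response size
pattern = re.compile(r'(\S+) (\S+) (\S+) \[(.*?)\] "(.*?)" (\d{3}) (\S+)')

requests_per_host = Counter()
broken_links = []

for line in log_lines:
    match = pattern.match(line)
    if match is None:
        continue  # skip lines that don't fit the format
    host, _, _, timestamp, request_line, status, size = match.groups()
    requests_per_host[host] += 1
    if status == "404":
        # the request line looks like: GET /path HTTP/1.1
        broken_links.append(request_line.split()[1])

print(requests_per_host.most_common())
print(broken_links)
```

Grouping the lines by host is the first step toward the "what are your habits?" questions above: once one visitor's requests are pulled together, you can read off the pages they asked for and when.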

Our point here, though, is to give you a complete look, end-to-end, of an HTTP request and the traces that are left behind.

### A Quick Exercise

Write some code in the box below to make an HTTP request to The New York Times homepage. After you make the request, print the homepage HTML. Ready? Go!

In [None]:
# put your code here! 




**Follow-up Question**: how might you go about collecting all of The New York Times headlines programmatically if this were your only means (requesting their homepage)? Take a look through the NYTimes homepage HTML and see if you see any patterns. Code which fetches a page, like the NYTimes homepage, and parses out the headlines is an example of "web scraping."
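
To give a flavor of what "parsing out the headlines" can look like, here is a small sketch using Python's built-in `html.parser` module. The HTML fragment is made up, and the assumption that headlines sit inside `<h2>` tags is ours; the real homepage is far larger and its tags and class names will differ, so treat this as an illustration of the idea rather than a working NYTimes scraper.

```python
from html.parser import HTMLParser

# A made-up fragment of HTML, standing in for a real homepage
sample_html = """
<html><body>
  <h2 class="story-heading"><a href="/a.html">First Headline</a></h2>
  <p>Some summary text.</p>
  <h2 class="story-heading"><a href="/b.html">Second Headline</a></h2>
</body></html>
"""

class HeadlineParser(HTMLParser):
    """Collect the text found inside <h2> tags."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_headline = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        # only keep text that appears between <h2> and </h2>
        if self.in_headline:
            self.headlines[-1] += data

parser = HeadlineParser()
parser.feed(sample_html)
print(parser.headlines)
```

The pattern here is "text inside an `<h2>`"; on a real page, finding the right pattern is most of the work.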

There is a lot more to learn about HTTP and "web scraping" but we'll pick that up in future lessons. For now, let's move on to APIs!

## What's an API?

An API, or application programming interface, allows you to specify the data you want and returns it in a computer-friendly format like [JSON](https://www.json.org/) or [XML](https://en.wikipedia.org/wiki/XML) rather than HTML. The "interface" is a regularized way to make requests, and a consistent specification for the data you asked for. So many organizations now publish APIs for their data, from [The New York Times](https://developer.nytimes.com/) to [ProPublica](https://propublica.github.io/campaign-finance-api-docs/), to governmental organizations like the [EPA](https://developer.epa.gov/category/api/), to social media sites like [Twitter](https://developer.twitter.com/en/docs), [Instagram](https://www.instagram.com/developer/) and [LinkedIn](https://developer.linkedin.com).

**The idea of an API is quite old,** and in fact APIs exist throughout the operating system in your computer. There is an API that lets different applications on your computer access printing capabilities, or communicate via your wireless hardware. These APIs, again, provide application developers with a regularized way to access services. So Word's print screen looks like the print screen from your PDF previewer or even Photoshop.

**Then in time, the services that were being advertised moved from your computer to the web.** So-called "mashups" came on the scene that let you feed data from one service into another. To put this in a vague historical perspective, if Web 1.0 meant putting your content online, then Web 2.0 was about cooperation between sites, sharing data via the internet to build new services. 

Salesforce.com led the way with its API in 2000 (I believe), recognizing that customers needed the same data across different platforms. eBay followed, providing an API so that others could embed their data and services. For me personally, it was the Google Maps API that really drove the idea home. It appeared in 2005 and immediately spawned a number of mapping mashups. You can read about the history of APIs from [a services perspective](https://history.apievangelist.com/), [as evolution of the mashup](https://www.ibm.com/developerworks/library/x-mashups/index.html), or as [a technical innovation](http://www.openlegacy.com/blog/the-history-of-apis-and-how-they-impact-your-future), eventually leading back to a [PhD thesis in 2000 by Roy Fielding](http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm) laying out the whole scheme. 

Today there are so many APIs it's hard to keep track. Look at the growth, captured by the "readmeblog".

<img src=https://blog.readme.io/content/images/2016/11/Screenshot-2016-11-01-16.01.29.png width=500>

Ah, but fortunately someone is keeping track for us! Have a look at [ProgrammableWeb](https://www.programmableweb.com/) for all the latest APIs. 

**Each API can be a story!** Knowing that an API is often underneath the communication between a data source and an application, people often snoop a little and get access to the underlying data. Here's an example from yesterday. Do you play HQ?

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('DUZAI2tSnbI')

For those of you unfamiliar with this twice-daily live quiz show, the Times published a profile at the end of last year. [Here is their article](https://www.nytimes.com/2017/12/03/business/media/hq-trivia-app.html), and note the very Timesian approach, at least the slightly jarring shift in aesthetic between the "in your face" of the game and the black-and-white portrait of the founders.

In any event, someone noticed that there was a simple API handling the communication between your phone and the HQ question server. That let them pull the question when it was available via code and then post the question to Wikipedia or some other knowledge source automatically (in one Python notebook, say) and then fire off an answer. Usually there isn't time to look things up during the game as you have seconds to select the answer, but this "bot" could act more quickly. [The Daily Beast had the story.](https://www.thedailybeast.com/hq-the-worlds-most-popular-trivia-app-just-got-hacked-by-a-bot?via=twitter_page)

Here's another unadvertised API. Every time you start typing something into your Chrome browser or the Google search box, it will make suggestions for you. That's all negotiated by API. Here is how you'd access that programmatically. 

In [None]:
from json import loads

# make a request -- here it is like we've typed "donald trump is"
url = "http://google.com/complete/search?client=firefox&q=donald trump is"
response = get(url)

# turn the JSON string response into a Python object
data = loads(response.text)

# show the object
data

So we see we get a JSON response. What kind of thing is it in terms of Python objects? Now, what kinds of bots could you write using this API? What story might you pursue?

Technical note: a URL can't include spaces, but the __requests__ package and your browser are now smart enough to clean things up before they send the request to Google or whatever service you're pulling data from. So adding "donald trump is" as the query string is, strictly speaking, not right, but the environment is making up for the mistake. We'll say more about character encodings later. This is day 3 after all!
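
If you are curious what that clean-up looks like, here is a small sketch using `quote` and `urlencode` from Python's standard `urllib.parse` module. This shows the general idea of URL encoding; we are not claiming it is exactly what __requests__ or your browser does internally.

```python
from urllib.parse import quote, urlencode

query = "donald trump is"

# Percent-encoding replaces each space with %20
print(quote(query))   # donald%20trump%20is

# urlencode builds a whole query string from a dictionary; it writes spaces as "+"
params = {"client": "firefox", "q": query}
url = "http://google.com/complete/search?" + urlencode(params)
print(url)            # http://google.com/complete/search?client=firefox&q=donald+trump+is
```

Both `%20` and `+` are standard ways of writing a space inside a query string.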

### API Authentication

Most API providers require you as the developer to use a form of authentication while using their APIs. There are various forms of authentication: OAuth, API keys and even usernames and passwords.

For example, some providers, like [The New York Times](https://developer.nytimes.com/), only require that you use an API key when making API calls. With API keys, you usually just pass the key in your API calls, like:

```
https://developer.nytimes.com/article_search_v2.json?api_key=abcxyz&q=tesla
```

[OAuth](https://en.wikipedia.org/wiki/OAuth) is a bit more complicated but provides more fine-grained control for the API service as well as the users. Let's come back to it right after we set up our Twitter API keys (yep, they use OAuth for their API authentication).


## Using The Twitter API

To access the Twitter API, we need to register an "application" that will be pulling data from their service. This means that in one sweet instant you have become a developer! 😮 The steps are pretty easy and listed below. You'll first need to get a set of "keys" to use the API and then install a Python library that exposes the Twitter API through special objects. You are well on your way to writing your very own mis-information bot! (kidding)
    
**A) Get Your API Keys**

If you don't already have credentials for Twitter, you have to create an application and generate a set of keys (an API key, API secret, Access token and Access token secret) on the Twitter developer site. There are five easy steps!

1. Create a Twitter user account if you do not already have one.
2. Go to [https://apps.twitter.com](https://apps.twitter.com/) and log in with your Twitter user account. This step gives you a Twitter developer's account under the same name as your user account. (Um, and congratulations! You're now a developer!)
3. Click “Create New App”
4. Fill out the form, agree to the terms, and click “Create your Twitter application”
5. On the next page, click on the “Keys and Access Tokens” tab, and copy your “API key” and “API secret”. Scroll down and click “Create my access token”, and copy your “Access token” and “Access token secret”.

Once you have your tokens, copy them into the variables below:

In [None]:
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""

**B) Install the Tweepy Library**

The developer community has created [hundreds of Twitter libraries](https://dev.twitter.com/resources/twitter-libraries) that help you access Twitter's API. By "help" we mean they have created objects that hide the details of making requests for data from Twitter, and leave you with a clean coding interface. Your requests to Twitter are in the form of neat methods (verbs) that return data on users, their statuses and followers. You can even post tweets using these libraries.

We will be using Tweepy to call the Twitter API. Why? It has many of the best features of the other libraries and its documentation is complete. Often, free software projects can be thinly documented, leaving you a little out to sea if you have a problem.

Keep these two links open in tabs as we go through the code below: [Tweepy documentation](http://tweepy.readthedocs.io/en/v3.5.0/) and [source code](https://github.com/tweepy/tweepy).

Use the following to install the Tweepy library (version 3.5) on your machine. (Again, if you are on a Windows machine, you replace "%%sh" with "%%cmd".)

In [None]:
%%sh
pip install tweepy==3.5.0

In [None]:
# before we can make Twitter API calls, we need to initialize a few things...
from tweepy import OAuthHandler, API

# setup the authentication
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# create an object we will use to communicate with the Twitter API
api = API(auth)
print(type(api))

### Getting a User's Profile Info

Now you are prepped and ready to start making Twitter API calls. First, let's look at some user profiles. 

We will be calling the `users/show` API. [It is documented by Twitter here.](https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference/get-users-show)

To call the `users/show` API, Tweepy has a method called [`get_user`](http://tweepy.readthedocs.io/en/v3.5.0/api.html#user-methods).

In [None]:
# get a user's profile (the 'myoung' account in this case)
user = api.get_user('myoung')

help(type(user))

What sort of information do we get about a user? Take a look at the `users/show` documentation and then print out some of the user attributes below.

In [None]:
# let's print out a few attributes for our user
print(user.id)
print(user.screen_name)
print(user.location)
print(user.followers_count)
print(user.listed_count)
print(user.statuses_count)
print(user.created_at)

By calling `api.get_user()` above, the Tweepy library made an API call to Twitter. The result of the API call is technically a JSON string. As we did last week, we could parse it into primitive Python objects like lists and dictionaries and numbers and strings. Tweepy creates high-level objects to represent the result of an API call. This is why you access ".screen_name" and ".followers_count" as attributes of the object. 

Objects have both data and methods, and the methods for this object are things like follow() and unfollow() the user. All of this is conveniently wrapped up in a high-level object.
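
To see the difference between the two styles, here is a toy sketch. The JSON string is a made-up, heavily shortened stand-in for what a `users/show` call returns, and `SimpleUser` is a hypothetical class we wrote for illustration; it is not Tweepy's actual implementation.

```python
import json

# A made-up, heavily shortened stand-in for the JSON a users/show call returns
json_text = '{"id": 12345, "screen_name": "example_user", "followers_count": 42}'

# Option 1: parse into primitive Python objects and use dictionary lookups
as_dict = json.loads(json_text)
print(as_dict["screen_name"])

class SimpleUser:
    """A hypothetical wrapper, NOT Tweepy's actual User class."""

    def __init__(self, data):
        # turn every key in the dictionary into an attribute on the object
        for key, value in data.items():
            setattr(self, key, value)

    def describe(self):
        # objects can carry methods alongside their data
        return "@" + self.screen_name + " has " + str(self.followers_count) + " followers"

# Option 2: wrap the same data in an object and use "dot" notation
user_obj = SimpleUser(as_dict)
print(user_obj.screen_name)
print(user_obj.describe())
```

Same data, two ways in: square brackets and keys for the dictionary, dots and attribute names for the object.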

**Try This!** 

Modify the code above to get the user profile information for your own account, and then `@realDonaldTrump`, and then one other person of your choosing. 

In [None]:
# Your code here



### Send a Tweet!

If you were writing a bot (which we'll do soon), it would need to tweet! You can send out a tweet with one line of code.

We will be using the `statuses/update` API to send the tweet. Again, like all of Twitter's API functionality, it has documentation and you can [read about posting status updates here.](https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/post-statuses-update)

In [None]:
# send a tweet!
# you may want to modify this message before you send it :-)

api.update_status(status='I love this class!')

As we said, through the API, you can do anything through Python that you could do via your Twitter interface. So, you can tweet, you can make friends, you can send a direct message, you can read your direct messages... the works!

### Look at a User's Tweets

Now, let's look at [`@realDonaldTrump`](https://twitter.com/realDonaldTrump)'s tweets. If you've had enough of all that, replace it with [`@justinbieber`](https://twitter.com/justinbieber) or someone less anxious-making.

We will use the `statuses/user_timeline` API to do this. [Read the documentation here.](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html) Essentially it returns tweets in reverse chronological order -- newest first.

In [None]:
# get justinbieber's last 10 tweets
tweets = api.user_timeline(screen_name='justinbieber', count=10)
print(type(tweets))
print(len(tweets))

Like the user profile, the tweet object contains information like the text of the tweet, its retweet count and so on. Here is some basic information about Bieber's most recent tweet. This is stored under index 0 in the list of tweets we fetched from Twitter.

In [None]:
# select one of the 10 tweets, here the first one
tweet = tweets[0]

# print out some of the data associated with the "status" object
# remember our "dot" notation for accessing data and functionality
print(tweet.text)
print(tweet.created_at)
print(tweet.retweet_count)
print(tweet.source)

**Question** If we want to print out all 10 of these tweets, how would we do that? Would we do something like this?

In [None]:
print(tweets[0].text)
print(tweets[1].text)
print(tweets[2].text)
print(tweets[3].text)
print(tweets[4].text)
print(tweets[5].text)

What if we needed to print out 100 tweets instead of 10? This doesn't seem like the right approach.

## Introducing Loops! 

Loops are available in most programming languages and they simply allow code to be executed repeatedly. Let's see what this means by looking at an example.

In [None]:
# Introducing Loops! 
# say we have a list with three names (strings) in it
teachers_names = ['mark', 'emily', 'mike']

# we can print out the length of the list
print("The list has", len(teachers_names), "elements")

# we can also print out the 1st, 2nd and 3rd element in the list
print(teachers_names[0])
print(teachers_names[1])
print(teachers_names[2])

In [None]:
# the following is an example of a "for" loop
# we will "loop through" the list and print out each name

for name in teachers_names:
    # start of the code to run each time we go through the loop
    print(name)
    # end of the code to run each time we go through the loop

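The `for` loop above replaces the index-by-index approach we used earlier. As a small sketch of the connection between the two, here is the same list looped over in a few other ways; `range`, `len` and `enumerate` are all built-in Python functions.

```python
teachers_names = ['mark', 'emily', 'mike']

# loop over the positions 0, 1, 2 and look up each element by its index
for i in range(len(teachers_names)):
    print(i, teachers_names[i])

# enumerate() hands us the index and the element together
for i, name in enumerate(teachers_names):
    print(i, name)

# a loop can also build up a new list, here the names in capital letters
shouted = []
for name in teachers_names:
    shouted.append(name.upper())
print(shouted)
```

Whether the list holds 3 names or 100 tweets, the loop itself stays the same length.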

Let's put a `for` loop to use with some tweets. You can fetch the most recent 10 tweets from `@realDonaldTrump` and then loop over the tweets, printing each one out. Here, a tweet object from Tweepy has data attributes like the text of the tweet, stored in `.text`.

In [None]:
# get the "real" Donald's last 10 tweets
tweets = api.user_timeline(screen_name='realDonaldTrump', count=10)

# loop over the tweets and print out the tweet text
for tweet in tweets:
    print(tweet.text)

**Try This!**

Use the example above to get the latest tweets for yourself, `@nytimes`, etc. What other information would be useful to have besides the text of the tweet?

In [None]:
# Your code here!



Finally, here's how we pulled data about the shutdown hashtags last time. We used the search API from Twitter, made visible via Tweepy.

In [None]:
tweets = api.search("schumershutdown")

for tweet in tweets:
    print(tweet.text)

Now you try!

In [None]:
# Your code here




### A second swing at Schumer v. Trump and the shutdown ###

You've now seen different ways to represent a tweet in Python. We've seen dictionaries in the last class, a Tweepy object specially designed as part of a larger suite of code to help you interact with Twitter painlessly... and then as a Pandas DataFrame where each tweet is represented as a row in a table, with the columns holding fixed information for each tweet.

We will work interchangeably between these different formats depending on the kind of analysis we want to do. In the case of tables, we have a long, long history of processing spreadsheet data quickly and easily... and reproducibly. The code for Pandas, say, is readable and powerful. We can easily uncover some basic facts about the data with very little typing. The Tweepy lists or a list of dictionaries (one dictionary per tweet) are harder to work with in many cases.

So let's do a little Pandas review using the shutdown and the subsequent hashtag wars as an example. First, download the tweets per time period data sets "schumer_timeseries.csv" and "trump_timeseries.csv" from the data folder of our GitHub. (Download them again as I have updated them.)

In [None]:
# Read in the CSV files and store the DataFrames in variables called "trump_time" and "schumer_time".
# Have a look. What are you begging to do next?
# (If your downloaded files have different names, change the strings below to match.)

from pandas import read_csv

trump_time = read_csv("trump_time.csv")
schumer_time = read_csv("schumer_time.csv")

schumer_time.head()

The resulting tables have one row per 10-minute interval and then the count of times #schumershutdown was used versus #trumpshutdown. We used the search API to pull the tweets. 
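
In case you are wondering how counts per 10-minute interval might be produced from raw tweets, here is one possible sketch using only the standard library. The timestamps are made up, and this is not necessarily how the CSV files above were built; it just shows the idea of rounding each time down to the start of its interval and tallying.

```python
from collections import Counter
from datetime import datetime

# Made-up tweet timestamps (UTC), for illustration only
timestamps = [
    "2018-01-20 05:01:12",
    "2018-01-20 05:07:45",
    "2018-01-20 05:12:03",
    "2018-01-20 05:29:59",
]

def ten_minute_bin(text):
    """Round a timestamp down to the start of its 10-minute interval."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return moment.replace(minute=moment.minute - moment.minute % 10, second=0)

# count how many timestamps fall into each interval
counts = Counter(ten_minute_bin(t) for t in timestamps)

for interval, count in sorted(counts.items()):
    print(interval, count)
```

Here the first two tweets land in the 05:00 interval, and the other two land in 05:10 and 05:20.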

Now, given these counts, the natural next step is to try to make a graph to see how they compare. For that we will use plotly initially. It works well with the notebook. You first have to install it (again, Windows users replace %%sh with %%cmd).

In [None]:
%%sh
pip install plotly

And now we make a plot. It would be good if you all create your own plotly credentials by logging in at plot.ly and asking for an API key. (This is getting familiar, right? Now we are sending data in our request and getting back a plot!)

In [None]:
from plotly.plotly import iplot, sign_in
import plotly.graph_objs as go 

# sign into the service (get your own credentials!)
sign_in("cocteautt","8YLww0QuMPVQ46meAMaq")

# create a plot of two lines, one for each hashtag
myplot_parts = [go.Scatter(x=trump_time["time"],y=trump_time["count_trump"],name="#trumpshutdown"),
                go.Scatter(x=schumer_time["time"],y=schumer_time["count_schumer"],name="#schumershutdown")]

# make a figure from these two lines...
myfigure = go.Figure(data=myplot_parts)

# ... and plot it (the filename is a convention plotly needs in case you want to use it later)
iplot(myfigure,filename="hashtag_usage")

A couple of comments. The time is UTC so everything is 5 hours ahead of ET. That means 5am on this chart is when the government shut down. You see spikes in both curves then. Each curve is the number of times the different hashtags were mentioned, tabulated in 10-minute chunks. What do you see?
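
If the time zone arithmetic feels slippery, Python can do it for you. This is a minimal sketch with the standard `datetime` module; the date is illustrative, and we assume a fixed offset of 5 hours, which holds for Eastern Standard Time in winter but ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Eastern Standard Time is UTC minus 5 hours (a fixed offset; ignores daylight saving)
eastern = timezone(timedelta(hours=-5), "EST")

# an illustrative chart timestamp: 5am UTC
utc_time = datetime(2018, 1, 20, 5, 0, tzinfo=timezone.utc)

# the same instant, expressed on an Eastern clock
local_time = utc_time.astimezone(eastern)

print(utc_time.isoformat())    # 2018-01-20T05:00:00+00:00
print(local_time.isoformat())  # 2018-01-20T00:00:00-05:00
```

So 5am UTC on the chart is midnight on the East Coast.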


Now, we are going to read in the raw tweets from the weekend around the shutdown. Again, we have two files, one for #schumershutdown and one for #trumpshutdown -- called ["schumer_all.csv"](https://www.dropbox.com/s/m4xei1yz96juka5/schumer_all.csv.gz?dl=0) and ["trump_all.csv"](https://www.dropbox.com/s/itni7y9lqax12zh/trump_all.csv.gz?dl=0) respectively (hosted on Dropbox because they were a little big for GitHub). Load them into DataFrames called "trump" and "schumer". 

In [None]:
# read in the trump_all.csv and schumer_all.csv, calling them "trump" and "schumer" respectively

trump = read_csv("trump_all.csv")
trump.head()

In [None]:
schumer = read_csv("schumer_all.csv")
schumer.head()

Each row is a tweet mentioning one of our hashtags. You see the screen name of the person tweeting, when they tweeted, what they tweeted and how they tweeted. 

Who is active? Who has been using these hashtags a lot? The following command will count the number of times each person tweeted #schumershutdown, ordering with the most active tweeter at the top.

In [None]:
schumer["screen_name"].value_counts()
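
For comparison, here is roughly the same tally done without Pandas, on tweets stored as a list of dictionaries. The tweets and screen names are made up. `Counter` from the standard library plays the role of `value_counts()` in this sketch.

```python
from collections import Counter

# Made-up tweets as a list of dictionaries, one dictionary per tweet
tweets_as_dicts = [
    {"screen_name": "user_a", "text": "first tweet"},
    {"screen_name": "user_b", "text": "second tweet"},
    {"screen_name": "user_a", "text": "third tweet"},
]

# tally how many times each screen name appears
tally = Counter(t["screen_name"] for t in tweets_as_dicts)

# most_common() lists the most active tweeter first, like value_counts()
print(tally.most_common())   # [('user_a', 2), ('user_b', 1)]
```

It works, but notice that we had to spell out the counting ourselves; with a DataFrame it is a single method call.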

Now, have a look!

In [None]:
from pandas import set_option

# Set an option so we can display the full tweet text
set_option("display.max_colwidth",280)

# have a look at a few of the tweets (remember your subsetting)

schumer[schumer["screen_name"]=="JaxBay"]


What do you notice about this person and their rate of tweeting? We can now pick a few tweeters and plot out when they tweeted.

In [None]:
# make a list of the screen_names of the popular (frequent) tweeters of 
# either hashtag, and call it "popular" -- maybe 10 or so

popular = ["TottenBill","pattykvilla","KurtKrisher","Nikluk","congresstalks",
           "CongressRTBot","DJBurn77","rjakes65","mbsutton350",
           "JaxBay","StellaLelohan","Johnjon12532857","McGillGail","grassfed_butter"]

# then we can use the following construction to keep just those rows corresponding
# to tweets with one of the given screen names, calling the resulting dataframe "ptweets"
# (swap "schumer" for "trump" if you want to look at trump's popular tweeters instead)

ptweets = schumer[schumer["screen_name"].isin(popular)]

In [None]:
# You can also just take the top 25, say, and look at them

popular = schumer["screen_name"].value_counts().index[:25]

# as before, keep just those rows corresponding to tweets with one of the
# given screen names, calling the resulting dataframe "ptweets"
# (swap "schumer" for "trump" if you want to look at trump's popular tweeters instead)

ptweets = schumer[schumer["screen_name"].isin(popular)]

Finally, have a look. 

In [None]:
myplot_parts = [go.Scatter(x=ptweets["time"],y=ptweets["screen_name"],mode="markers")]
mylayout = go.Layout(autosize=False, height=900,width=1000,margin=go.Margin(l=150,r=50,b=100,t=100,pad=4))

myfigure = go.Figure(data=myplot_parts,layout=mylayout)

iplot(myfigure,filename="whoistweeting")

Here we pick someone and just look at their tweets. Next we might see if certain tweets were retweeted a lot. Who is involved? Can we uncover a network of timed coordination?

In [None]:
schumer[schumer["screen_name"]=="cheryl42058"]

## Appendix

Latency of AT&T Network: http://ipnetwork.bgtmo.ip.att.net/pws/network_delay.html