
**Author**: Pierce Lopez <br>
**Date Created**: August 19, 2020 <br>
**Last Updated**: August 20, 2020 <br> 
**Description**: Contains my notes on the Data Analyst lesson: _Intermediate Importing Data in Python_.

# Intermediate Importing Data in Python

## Chapter 1: Importing Data from the Internet

For this chapter, we will make use of `pandas` functions, so do not forget to import the necessary modules!

```
import pandas as pd
```

### Section 1: Importing flat files from the web
1. The `urllib` Package
* Provides functions for fetching data from the Internet.
* `urllib.request.urlopen()`: works like `open()`, but accepts URLs instead of file names.

2. How to Automate File Downloads in Python?

```
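# import package
from urllib.request import urlretrieve

# URL of the flat file to download (placeholder; substitute a real address)
url = "https://example.com/data.csv"

# write contents of the url to a local file
urlretrieve(url, "contentsarewrittenhere.csv")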

# read file into a DataFrame
df = pd.read_csv("contentsarewrittenhere.csv")
```
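
`urlretrieve()` returns a `(filename, headers)` tuple, which is handy for confirming where the file was saved. The sketch below is an offline stand-in rather than a real download: it builds a `file://` URL from a temporary file (the names `sample.csv` and `downloaded.csv` and the data are made up for illustration) and reads the result back with the standard `csv` module so that it runs without `pandas`.

```python
import csv
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp).resolve()

    # create a small CSV file to act as the "remote" resource (made-up data)
    source = folder / "sample.csv"
    source.write_text("name,score\nada,90\nlin,85\n")

    # a file:// URL stands in for an http:// one so the sketch runs offline
    url = source.as_uri()

    # urlretrieve returns the local path it wrote to, plus the response headers
    target = folder / "downloaded.csv"
    saved_path, headers = urlretrieve(url, str(target))

    # read the saved copy back to confirm the contents arrived
    with open(saved_path, newline="") as f:
        rows = list(csv.DictReader(f))

print(rows)
```

With a real web address, only the `url` line changes; the saved file can then be passed to `pd.read_csv()` as in the notes above.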

<br>

### Section 2: HTTP requests to import files from the web
1. URL (Uniform Resource Locator)
* Ingredients:
  * Protocol identifier - `http:`
  * Resource name - `datacamp.com`
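
As a quick check of those two ingredients, the standard library's `urllib.parse.urlparse()` can split a URL into its parts. This is only a small sketch; note that `urlparse()` reports the scheme without the trailing colon.

```python
from urllib.parse import urlparse

# split the URL into its components
parts = urlparse("http://datacamp.com")

print(parts.scheme)  # http
print(parts.netloc)  # datacamp.com
```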

2. HTTP (Hypertext Transfer Protocol)
* Foundation of data communication for the web.
* Visiting a website sends an HTTP request to a server, specifically a GET request.
* `urlretrieve()` performs a GET request and saves the response to a local file.

3. GET Requests Using `urllib`

```
# import package
from urllib.request import urlopen, Request

# package the GET request
request = Request(url)

# send request and catch response
response = urlopen(request)

# read the response body (returned as bytes; call .decode() to get a string)
html = response.read()

# do not forget to close the response!
response.close()
```
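
A `with` block can take care of closing the response, so the explicit `close()` call cannot be forgotten. The sketch below assumes a UTF-8 page and, so that it runs offline, uses a `file://` URL built from a temporary file (the file name `page.html` and its contents are made up) in place of a real web address.

```python
import tempfile
from pathlib import Path
from urllib.request import Request, urlopen

with tempfile.TemporaryDirectory() as tmp:
    # create a tiny HTML file to act as the "web page" (made-up content)
    page = Path(tmp).resolve() / "page.html"
    page.write_text("<html><body>Hello</body></html>", encoding="utf-8")

    # package the request; without a data payload the method is GET
    request = Request(page.as_uri())
    method = request.get_method()

    # the response is closed automatically when the with block ends
    with urlopen(request) as response:
        raw = response.read()

# read() gives bytes, so decode to get a string (UTF-8 is assumed here)
html = raw.decode("utf-8")

print(method)
print(html)
```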

4. GET Requests Using `requests`

```
# import package
import requests

# package the request, send it, and catch the response
r = requests.get(url)

# return HTML as a string
text = r.text
```

<br>

### Section 3: Scraping the web in Python
1. BeautifulSoup
* Parses and extracts structured data from HTML

```
# import package
from bs4 import BeautifulSoup
import requests

# package the request, send it, and catch the response
r = requests.get(url)

# return HTML as a string
html_doc = r.text

# Beautify! (naming the parser explicitly avoids a warning)
soup = BeautifulSoup(html_doc, "html.parser")
```

2. Exploring BeautifulSoup
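* `soup.title` returns the page's `<title>` tag.
* `soup.get_text()` returns the text of the page with the tags stripped.
* `soup.find_all()` returns every tag that matches, e.g. `soup.find_all('a')` for all hyperlinks.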

```
# extract URLs of all hyperlinks
for link in soup.find_all('a'):
  print(link.get('href'))
```
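
To try these methods without fetching a page, BeautifulSoup can parse an HTML string directly. The snippet below is a small sketch on a made-up document; the title and the `example.com` links are placeholders, not real pages from the lesson.

```python
from bs4 import BeautifulSoup

# a made-up HTML document standing in for a downloaded page
html_doc = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <p>Read the <a href="https://example.com/a">first</a>
    and <a href="https://example.com/b">second</a> link.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# text inside the <title> tag
page_title = soup.title.string

# href attribute of every <a> tag, in document order
links = [link.get("href") for link in soup.find_all("a")]

print(page_title)
print(links)
```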

<br>

## Chapter 2: Interacting with APIs to Import Data From the Web

### Section 1: Introduction to APIs and JSONs
1. APIs (Application Programming Interfaces)
* A set of protocols and routines for building and interacting with software applications.

2. JSON (JavaScript Object Notation)
* A JSON object loads as a dictionary in Python!

3. Loading JSONs in Python

```
# import package
import json

# open the file with a context manager and load the JSON
with open('file.json', 'r') as json_file:
    json_data = json.load(json_file)

# display key-value pairs
for key, value in json_data.items():
    print(key + ":", value)
```
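
JSON values are not always strings, which is why the values are best printed as separate arguments rather than concatenated. As a small sketch on a made-up JSON string, `json.loads()` shows how each JSON type maps to a Python type:

```python
import json

# a made-up JSON document covering the common value types
raw = '{"name": "example", "count": 3, "tags": ["a", "b"], "parent": null, "active": true}'

# json.loads parses a string; json.load (above) parses an open file
data = json.loads(raw)

# record the Python type each JSON value was converted to
kinds = {key: type(value).__name__ for key, value in data.items()}

print(kinds)
# {'name': 'str', 'count': 'int', 'tags': 'list', 'parent': 'NoneType', 'active': 'bool'}
```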

<br>

### Section 2: APIs and interacting with the world wide web
1. More Information About APIs
* Code that allows software programs to communicate with each other.

2. Connecting to an API in Python

```
# import package
import requests

# set url (OMDb may also expect an `apikey` parameter in the query string)
url = 'http://www.omdbapi.com/?t=hackers'

# package the request, send it, and catch the response
r = requests.get(url)

# decode the JSON response into a dictionary
json_data = r.json()

# display key-value pairs
for key, value in json_data.items():
    print(key + ":", value)
```

3. What URL Thooooose?
* `http` - making an HTTP request.
* `www.omdbapi.com` - querying the OMDB API.
* `?t=hackers` - query string asking for data on the movie whose title (`t`) is _Hackers_.
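
The same breakdown can be done in code. The sketch below uses the standard library's `urllib.parse` to pull the pieces out of the URL and to rebuild the query string from a dictionary; no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse

url = 'http://www.omdbapi.com/?t=hackers'

# split the URL into protocol, host, and query string
parts = urlparse(url)

# turn the query string into a dictionary (each value is a list)
params = parse_qs(parts.query)

# going the other way: build the query string from a dictionary
rebuilt = f"{parts.scheme}://{parts.netloc}/?{urlencode({'t': 'hackers'})}"

print(parts.scheme)  # http
print(parts.netloc)  # www.omdbapi.com
print(params)        # {'t': ['hackers']}
print(rebuilt)       # http://www.omdbapi.com/?t=hackers
```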

<br>

## Chapter 3: Diving Deep into the Twitter API

### Section 1: The Twitter API and authentication
1. Twitter Has a Number of APIs
* REST (Representational State Transfer) APIs - allow users to read and write Twitter data.
* Streaming API - monitors and processes Tweets in real time.
* Firehose API - gives access to all public statuses.

**Note:** Tweets are returned as JSON.
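
Because each Tweet arrives as JSON, it can be loaded into a dictionary and explored with nested keys. The sketch below uses a made-up, heavily trimmed Tweet; real Tweets carry many more fields, and the values here are placeholders.

```python
import json

# a made-up, trimmed-down Tweet (real ones contain many more fields)
raw_tweet = '{"text": "Hello world", "lang": "en", "user": {"screen_name": "example_user"}}'

tweet = json.loads(raw_tweet)

# nested JSON objects become nested dictionaries
summary = f'@{tweet["user"]["screen_name"]} ({tweet["lang"]}): {tweet["text"]}'

print(summary)  # @example_user (en): Hello world
```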

2. An Example of Using `tweepy` to Stream Tweets

```
# import packages
import tweepy, json

# add keys and access tokens
access_token = "secret"
access_token_secret = "secret"
consumer_key = "secret"
consumer_secret = "secret"

# authentication handling
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# initialize stream listener
# (MyStreamListener must be defined beforehand; in tweepy 3.x it is
# typically a subclass of tweepy.StreamListener)
listener = MyStreamListener()

# create stream object with authentication
stream = tweepy.Stream(auth, listener)

# filter Twitter streams to track data by keywords
stream.filter(track=['clinton', 'trump', 'sanders', 'cruz'])
```