What is an API?
- Application Programming Interface
- Structured way to expose specific functionality and data access to users
- Web APIs usually follow the "REST" architectural style

Where to get started?
- Set time aside; these are not all plug and play.
- Many sites offer free API access if you first sign up for an account.
- This can also be helpful at large companies that use services such as Microsoft, Amazon, Google, or other cloud solutions.

- Read the documentation, search for code examples, and set time aside to understand the data you are getting and how to parse it. API requests can return data in many different formats, each with its own parsing rules.

How to interact with a REST API:
- Make a "request" to a specific URL (an "endpoint"), and get the data back in a "response"
- Most relevant request method for us is GET (other methods: POST, PUT, DELETE)
- Response is often JSON format
- Web console is sometimes available (allows you to explore an API)
- Most APIs require you to have an access key (which you should store outside your code)
- Most APIs limit the number of API calls you can make (per day, hour, minute, etc.)
- Not all APIs are free
- Not all APIs are well-documented
- Pay attention to the API version
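
The points above about endpoints, request parameters, and access keys can be sketched as follows. This is a minimal illustration using only the standard library. The endpoint `https://api.example.com/v1/items`, the environment variable name `MY_API_KEY`, and the helper `build_request_url` are all hypothetical and do not belong to any real API.

```python
import os
from urllib.parse import urlencode


def build_request_url(endpoint, params, api_key):
    """Return the full GET URL: endpoint + '?' + URL-encoded query string."""
    query = dict(params)        # copy so the caller's dict is not modified
    query["apikey"] = api_key   # many APIs accept the key as a query parameter
    return endpoint + "?" + urlencode(query)


# Read the key from an environment variable instead of hard-coding it.
# "MY_API_KEY" is a hypothetical name; "demo" is only a fallback placeholder.
api_key = os.environ.get("MY_API_KEY", "demo")

url = build_request_url(
    "https://api.example.com/v1/items",   # hypothetical endpoint
    {"symbol": "IBM", "interval": "5min"},
    api_key,
)
print(url)
```

The resulting URL could then be passed to `requests.get(url)`. Passing the dictionary directly, as in `requests.get(endpoint, params=query)`, lets requests do the encoding instead.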

Python wrapper is another option for accessing an API:
- Set of functions that "wrap" the API code for ease of use
- Potentially simplifies your code
- But, wrapper could have bugs or be out-of-date or poorly documented
'''
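
To make the idea of a wrapper concrete, here is a rough sketch of what one might look like. The class `ApiWrapper`, the base URL, and the stand-in `fake_fetch` function are hypothetical and written only for illustration; real wrappers for specific APIs will differ.

```python
class ApiWrapper:
    """Hypothetical wrapper: hides key handling behind a simple method."""

    def __init__(self, base_url, api_key, fetch):
        self.base_url = base_url
        self.api_key = api_key
        self.fetch = fetch   # callable(url, params) -> dict

    def get(self, **params):
        """Add the key to the query parameters and return the decoded response."""
        query = dict(params, apikey=self.api_key)
        return self.fetch(self.base_url, query)


def fake_fetch(url, params):
    """Stand-in for a real HTTP call so the sketch runs without a network."""
    return {"url": url, "params": params}


api = ApiWrapper("https://api.example.com/v1/", "demo", fake_fetch)
print(api.get(symbol="IBM"))
```

In real use, `fetch` could be a small function built on `requests.get(url, params=params).json()`. The caller then only writes `api.get(symbol="IBM")` and never touches the URL or the key.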



Public API list
https://github.com/public-apis/public-apis

Let's start with a connection to some stock data.
This link is where you can sign up and get an API key:
https://www.alphavantage.co/support/#api-key


In [1]:
import requests
import json
import pandas as pd

In [2]:

# replace the "demo" apikey below with your own key from https://www.alphavantage.co/support/#api-key
url = 'https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=IBM&interval=5min&apikey=demo'
r = requests.get(url)
data = r.json()          # decode the JSON body into a dictionary
df = pd.DataFrame(data)  # the top-level keys of the dictionary become the columns
print(df)

                                                             Meta Data  \
1. Information       Intraday (5min) open, high, low, close prices ...   
2. Symbol                                                          IBM   
3. Last Refreshed                                  2024-01-05 19:55:00   
4. Interval                                                       5min   
5. Output Size                                                 Compact   
...                                                                ...   
2024-01-05 10:45:00                                                NaN   
2024-01-05 10:40:00                                                NaN   
2024-01-05 10:35:00                                                NaN   
2024-01-05 10:30:00                                                NaN   
2024-01-05 10:25:00                                                NaN   

                                                    Time Series (5min)  
1. Information                        

In [3]:
df.head()

Unnamed: 0,Meta Data,Time Series (5min)
1. Information,"Intraday (5min) open, high, low, close prices ...",
2. Symbol,IBM,
3. Last Refreshed,2024-01-05 19:55:00,
4. Interval,5min,
5. Output Size,Compact,


In [4]:
df.columns

Index(['Meta Data', 'Time Series (5min)'], dtype='object')
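
Passing the whole response to `pd.DataFrame` puts the metadata and the price series into one table, which is why the `Meta Data` column is NaN for every timestamp row in the output above. A sketch of pulling out just the price series follows. The sample payload is made up for illustration, and the field names (`1. open` through `5. volume`) are an assumption about the intraday format, since they are not visible in the truncated output above.

```python
# Made-up sample in the shape of an intraday response (values are illustrative).
data = {
    "Meta Data": {"2. Symbol": "IBM", "4. Interval": "5min"},
    "Time Series (5min)": {
        "2024-01-05 19:55:00": {"1. open": "159.10", "2. high": "159.20",
                                "3. low": "159.00", "4. close": "159.15",
                                "5. volume": "120"},
        "2024-01-05 19:50:00": {"1. open": "159.00", "2. high": "159.12",
                                "3. low": "158.95", "4. close": "159.10",
                                "5. volume": "85"},
    },
}


def tidy_time_series(payload, key="Time Series (5min)"):
    """Return one dict per timestamp, oldest first, with numeric values."""
    rows = []
    for timestamp, fields in sorted(payload[key].items()):
        row = {"timestamp": timestamp}
        for name, value in fields.items():
            # "1. open" -> "open"; the numbers arrive as strings
            row[name.split(". ", 1)[1]] = float(value)
        rows.append(row)
    return rows


rows = tidy_time_series(data)
print(rows[0])
```

`pd.DataFrame(rows)` would then give a DataFrame with one row per timestamp and one column per field.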

Now that you have the stocks API set up, you can begin reviewing the documentation and experimenting.
Let's look at another API: OMDb, the Open Movie Database.

In [5]:
import requests
# the API key is passed in the URL's query string (apikey=...)
url = 'http://www.omdbapi.com/?t=Fellowship+of+the+ring&apikey=8392c9fa'

r = requests.get(url)

Are we connected?

In [6]:
r.status_code

200

In [7]:
# you can parse the data manually or find a library function that already does it for you
r.json()
# view the raw response text
r.text

'{"Title":"The Lord of the Rings: The Fellowship of the Ring","Year":"2001","Rated":"PG-13","Released":"19 Dec 2001","Runtime":"178 min","Genre":"Action, Adventure, Drama","Director":"Peter Jackson","Writer":"J.R.R. Tolkien, Fran Walsh, Philippa Boyens","Actors":"Elijah Wood, Ian McKellen, Orlando Bloom","Plot":"A meek Hobbit from the Shire and eight companions set out on a journey to destroy the powerful One Ring and save Middle-earth from the Dark Lord Sauron.","Language":"English, Sindarin","Country":"New Zealand, United States","Awards":"Won 4 Oscars. 125 wins & 127 nominations total","Poster":"https://m.media-amazon.com/images/M/MV5BN2EyZjM3NzUtNWUzMi00MTgxLWI0NTctMzY4M2VlOTdjZWRiXkEyXkFqcGdeQXVyNDUzOTQ5MjY@._V1_SX300.jpg","Ratings":[{"Source":"Internet Movie Database","Value":"8.9/10"},{"Source":"Rotten Tomatoes","Value":"91%"},{"Source":"Metacritic","Value":"92/100"}],"Metascore":"92","imdbRating":"8.9","imdbVotes":"1,973,498","imdbID":"tt0120737","Type":"movie","DVD":"28 Jun 20

In [8]:
r.headers

{'Date': 'Mon, 08 Jan 2024 22:01:48 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Cache-Control': 'public, max-age=86400', 'Expires': 'Mon, 08 Jan 2024 23:01:48 GMT', 'Last-Modified': 'Mon, 08 Jan 2024 22:01:48 GMT', 'Vary': '*, Accept-Encoding', 'X-AspNet-Version': '4.0.30319', 'X-Powered-By': 'ASP.NET', 'Access-Control-Allow-Origin': '*', 'CF-Cache-Status': 'MISS', 'Server': 'cloudflare', 'CF-RAY': '8427b89aad06307e-SEA', 'Content-Encoding': 'gzip'}

In [9]:
# decode the JSON response body into a dictionary
resp_json = r.json()
resp_json

{'Title': 'The Lord of the Rings: The Fellowship of the Ring',
 'Year': '2001',
 'Rated': 'PG-13',
 'Released': '19 Dec 2001',
 'Runtime': '178 min',
 'Genre': 'Action, Adventure, Drama',
 'Director': 'Peter Jackson',
 'Writer': 'J.R.R. Tolkien, Fran Walsh, Philippa Boyens',
 'Actors': 'Elijah Wood, Ian McKellen, Orlando Bloom',
 'Plot': 'A meek Hobbit from the Shire and eight companions set out on a journey to destroy the powerful One Ring and save Middle-earth from the Dark Lord Sauron.',
 'Language': 'English, Sindarin',
 'Country': 'New Zealand, United States',
 'Awards': 'Won 4 Oscars. 125 wins & 127 nominations total',
 'Poster': 'https://m.media-amazon.com/images/M/MV5BN2EyZjM3NzUtNWUzMi00MTgxLWI0NTctMzY4M2VlOTdjZWRiXkEyXkFqcGdeQXVyNDUzOTQ5MjY@._V1_SX300.jpg',
 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '8.9/10'},
  {'Source': 'Rotten Tomatoes', 'Value': '91%'},
  {'Source': 'Metacritic', 'Value': '92/100'}],
 'Metascore': '92',
 'imdbRating': '8.9',
 'imdbVotes'

In [11]:
r.json()['Year']

'2001'

In [12]:
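# what happens if the request is rejected?
# this request leaves out the apikey, so the 401 (Unauthorized) below
# reflects the missing key, not the unrecognized movie name
r = requests.get('http://www.omdbapi.com/?t=blahblahblah&r=json&type=movie')
r.status_code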

401
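
A status code only says whether the HTTP request itself was accepted. Some APIs answer with 200 and report a problem inside the JSON body instead, which is why `get_movie_year` later in this notebook tests the `Response` field. The sketch below checks both levels. The helper name `check_response` and the sample payloads are assumptions made for illustration.

```python
def check_response(status_code, body):
    """Return (ok, message) from an HTTP status code and a decoded JSON body."""
    if status_code == 401:
        return False, "unauthorized: missing or invalid API key"
    if 400 <= status_code < 600:
        return False, "HTTP error " + str(status_code)
    # The request succeeded at the HTTP level; now look inside the body.
    if body.get("Response") == "False":
        return False, body.get("Error", "unknown API error")
    return True, "ok"


# Sample inputs (made up for illustration).
print(check_response(401, {}))
print(check_response(200, {"Response": "False", "Error": "Movie not found!"}))
print(check_response(200, {"Response": "True", "Year": "2001"}))
```

With a real response object, this could be called as `check_response(r.status_code, r.json())`, assuming the body is valid JSON.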

In [13]:
# bonus: let's review and turn all of this into a function

In [14]:
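# define a function to return the year
def get_movie_year(title):
    # pass the query as params so requests URL-encodes the title (spaces, etc.)
    params = {'t': title, 'r': 'json', 'type': 'movie', 'apikey': '8392c9fa'}
    r = requests.get('http://www.omdbapi.com/', params=params)
    info = r.json()
    # 'Response' is the string 'True' when the title was found
    if info['Response'] == 'True':
        return int(info['Year'])
    else:
        return "error"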

In [15]:
import pandas as pd
# turn the JSON dictionary into a DataFrame with one row per key
response_df = pd.DataFrame.from_dict(r.json(), orient = "index")

In [16]:
# test the function
get_movie_year('The Two Towers')

2002

In [17]:
# test error
get_movie_year('blahblahblah')

'error'

### Practice API calls
You can use the public API list at https://github.com/public-apis/public-apis to explore, search the web, or, better yet,
    think of some of your favorite websites and look up their API documentation. I began to prepare a boardgamegeek.com API example;
    however, that API returns XML, and working through it would have distracted from showing how APIs work.
    Keep in mind that you will want to set time aside to read the documentation, experiment, become familiar with the API, and then begin building your data collection process.
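
For APIs that return XML rather than JSON, the standard library's `xml.etree.ElementTree` can parse the response text. The sketch below shows the general idea. The XML is a made-up sample, not actual boardgamegeek.com output, and the element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up XML standing in for an API response.
xml_text = """
<items>
  <item id="1" type="boardgame">
    <name value="Example Game A"/>
    <yearpublished value="2001"/>
  </item>
  <item id="2" type="boardgame">
    <name value="Example Game B"/>
    <yearpublished value="2015"/>
  </item>
</items>
"""

root = ET.fromstring(xml_text.strip())

games = []
for item in root.findall("item"):
    games.append({
        "id": item.get("id"),                                   # attribute on <item>
        "name": item.find("name").get("value"),                 # attribute on a child element
        "year": int(item.find("yearpublished").get("value")),
    })

print(games)
```

The result is a list of dictionaries, the same shape you would get from a JSON response, so the later steps (building a DataFrame, filtering, sorting) stay the same.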

In [18]:
# you can use the exercise template below to look at other APIs and get curious.

In [19]:
# Try on your own
# Link to the Census Bureau language stats API description page

# Look through the API description links and examples to see what uses you have available

# Use the requests library to interact with a URL

import requests
# Try a URL example in a browser to see the result returned, then use requests to access it with Python
# http://api.census.gov/data/2013/language?get=EST,LANLABEL,NAME&for=state:06&LAN=625
r = requests.get('http://api.census.gov/data/2013/language?get=EST,LANLABEL,NAME&for=state:06&LAN=625')

In [20]:
# modify the request to get languages 625 through 650 so we can see a larger sample of what is returned from the request
# Hint: the syntax for more than one language number is similar to the one we use for multiple elements in a list

In [21]:
# check the status: 200 means success, 4xx means error


In [22]:
# view the raw response text

In [23]:
# Convert to json()

In [24]:
#look at the contents of the output of the json() method.  It looks like it can easily become a list of lists


In [None]:
# Convert the json() method output into a dataframe with the first list as the column header and the rest as rows of data

In [None]:
# Sort the dataframe descending by the number of people speaking the language
# Check the data type of 'EST', the number of people that speak the language

In [None]:
# Now create a new request that brings in the stats for all the US and primary languages
# See the website's links for the syntax for the US and for a range of language numbers

In [None]:
### Bonus
# Create a loop that will collect the counts of Spanish language speakers by state

### Web Scraping

What is web scraping?
- Extracting information from websites (simulates a human copying and pasting)
- Based on finding patterns in website code (usually HTML)

What are best practices for web scraping?
- Scraping too many pages too fast can get your IP address blocked
- Pay attention to the robots exclusion standard (robots.txt)
- Let's look at http://www.imdb.com/robots.txt
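
Checking robots.txt can also be done in code. The sketch below uses the standard library's `urllib.robotparser` on a few made-up rules, so it runs without network access. The rules and the `example.com` URLs are illustrative only and are not copied from any real site.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt rules for illustration.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 5",
]

rp = RobotFileParser()
rp.parse(robots_lines)   # parse the rules directly; nothing is downloaded

# Ask whether a generic crawler ("*") may fetch each URL.
print(rp.can_fetch("*", "https://example.com/index.html"))
print(rp.can_fetch("*", "https://example.com/private/data.html"))

# Seconds the site asks crawlers to wait between requests (None if not set).
print(rp.crawl_delay("*"))
```

For a live site, `rp.set_url(...)` followed by `rp.read()` downloads and parses the real file. Pausing with `time.sleep()` between page requests is one simple way to respect a crawl delay and avoid being blocked.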

What is HTML?
- Code interpreted by a web browser to produce ("render") a web page
- Let's look at example.html
- Tags are opened and closed
- Tags have optional attributes
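
To see tags and attributes concretely, the sketch below lists every opening tag in a small HTML snippet together with its attributes. It uses the standard library's `html.parser` rather than the BeautifulSoup library used later in this notebook, and the class name `TagCollector` and the snippet itself are made up for illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.seen.append((tag, dict(attrs)))


html = '<p class="lead">Visit <a href="http://books.toscrape.com">the bookstore</a>.</p>'

parser = TagCollector()
parser.feed(html)
print(parser.seen)
```

Here `p` and `a` are the tags, while `class` and `href` are their attributes. Scraping code usually relies on exactly these names and values to locate the data it wants.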

How to view HTML code:
- To view the entire page: "View Source" or "View Page Source" or "Show Page Source"
- To view a specific part: "Inspect Element"
- Safari users: Safari menu, Preferences, Advanced, Show Develop menu in menu bar
- Let's inspect example.html

Steps
- Find the URL you want to scrape
- Inspect the page (right click and choose "Inspect")
- Find the data you want to extract
- Write the code
- Run code and extract the data
- Store data in required format
'''
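
The last step, storing the data, might look like the sketch below. It writes a list of records to CSV text with the standard library's `csv` module. The records and the helper name `records_to_csv` are made up for illustration and stand in for whatever a scraper actually extracted.

```python
import csv
import io

# Made-up records standing in for data extracted from a page.
records = [
    {"title": "Example Book A", "price": "10.00"},
    {"title": "Example Book B", "price": "12.50"},
]


def records_to_csv(rows, fieldnames):
    """Return the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = records_to_csv(records, ["title", "price"])
print(csv_text)
```

To write to a file instead, open it with `open(path, "w", newline="")` and pass the file object to `csv.DictWriter` in place of the `StringIO` buffer.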


In [22]:
import requests
from bs4 import BeautifulSoup

page = requests.get('https://toscrape.com/') # Getting page HTML through request
b = BeautifulSoup(page.content, 'html.parser') # Parsing content using beautifulsoup

print(b.prettify())  # indented, human-readable version of the parsed HTML
print(b)
print(type(b))

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <title>
   Scraping Sandbox
  </title>
  <link href="./css/bootstrap.min.css" rel="stylesheet"/>
  <link href="./css/main.css" rel="stylesheet"/>
 </head>
 <body>
  <div class="container">
   <div class="row">
    <div class="col-md-1">
    </div>
    <div class="col-md-10 well">
     <img class="logo" src="img/zyte.png" width="200px"/>
     <h1 class="text-right">
      Web Scraping Sandbox
     </h1>
    </div>
   </div>
   <div class="row">
    <div class="col-md-1">
    </div>
    <div class="col-md-10">
     <h2>
      Books
     </h2>
     <p>
      A
      <a href="http://books.toscrape.com">
       fictional bookstore
      </a>
      that desperately wants to be scraped. It's a safe place for beginners learning web scraping and for developers validating their scraping technologies as well. Available at:
      <a href="http://books.toscrape.com">
       books.toscra

In [23]:
page = requests.get('https://scrapethissite.com/') # Getting page HTML through request
b = BeautifulSoup(page.content, 'html.parser') # Parsing content using beautifulsoup
print(type(b))
print(b)
print(b.prettify())  # indented, human-readable version of the parsed HTML




<class 'bs4.BeautifulSoup'>
<!DOCTYPE html>

<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Scrape This Site | A public sandbox for learning web scraping</title>
<link href="/static/images/scraper-icon.png" rel="icon" type="image/png"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<meta content="A public sandbox for learning web scraping" name="description"/>
<link crossorigin="anonymous" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css" integrity="sha256-MfvZlkHCEqatNoGiOXveE8FIwMzZg4W85qfrfIFBfYc= sha512-dTfge/zgoMYpP7QbHy4gWMEGsbsdZeCXz7irItjcC3sPUFtf0kuFbDz/ixG7ArTxmDjLXDmezHubeNikyKGVyQ==" rel="stylesheet"/>
<link href="https://fonts.googleapis.com/css?family=Lato:400,700" rel="stylesheet" type="text/css"/>
<link href="/static/css/styles.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<nav id="site-nav">
<div class="container">
<div class="col-md-12">
<ul class="nav nav-tabs">
<li id="nav-homepage">
<a class="nav

In [24]:
# 'find' method returns the first matching Tag (and everything inside of it)
b.find(name='body')
b.find(name='h1')
b.find(name='title')

<title>Scrape This Site | A public sandbox for learning web scraping</title>

In [25]:
# Tags allow you to access the 'inside text'
b.find(name='h1').text

'\n                            Scrape This Site\n                        '
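
The text comes back with the page's original indentation and line breaks. A small sketch of cleaning it up with plain string methods follows, using the string from the output above as input.

```python
# The h1 text exactly as returned above.
raw = '\n                            Scrape This Site\n                        '

# Remove leading and trailing whitespace.
print(raw.strip())

# Also collapse any internal runs of whitespace into single spaces.
print(" ".join(raw.split()))
```

With BeautifulSoup, `tag.get_text(strip=True)` is another option for trimming whitespace when extracting the text.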

In [26]:
# 'find_all' method is useful for finding all matching Tags
b.find(name='p')        # returns a Tag
b.find_all(name='p')    # returns a ResultSet (like a list of Tags)

[<p class="lead">
                             The internet's best resource for learning <strong>web scraping</strong>.
                         </p>]

In [27]:
# ResultSets can be sliced like lists
len(b.find_all(name='p'))
b.find_all(name='p')[0]
b.find_all(name='p')[0].text

"\n                            The internet's best resource for learning web scraping.\n                        "

In [28]:
# iterate over a ResultSet
results = b.find_all(name='p')
for tag in results:
    print(tag.text)


                            The internet's best resource for learning web scraping.
                        
