---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Data Acquisition</h1>

---
<h1 align="center">Lecture 4 (Acquiring Data from APIs)</h1>

---
<img align="left" width="450" src="images/acq.PNG"  >
<img align="center" width="500" height="850"  src="images/whatisapi.PNG"  >

## Table of Contents

- What is an API?
- Introduction: Understanding APIs for Data Transfer
- Making API Requests in Python
- Example: Weather API
- API Status Codes
- API Documentation
- Reading API Documentation (Example: the NASA Image and Video Library)
- Reasons to Use API

# What is an API?
<img align="right" width="600"   src="images/What-is-an-API.PNG"  >

- **`API`** stands for *`Application Programming Interface`.*
- In simple terms, it’s a `set of rules and protocols` that allow different software applications to communicate and interact with each other.
- APIs define the methods and data formats that applications can use to `request and exchange` information.

To retrieve data from a web server:
- A client application initiates a request.
- The server responds with the requested data.

- APIs facilitate this communication by serving as intermediaries.
- They allow seamless integration between diverse software systems.
- APIs act as bridges that enable the smooth exchange of data and functionality.
- They enhance interoperability across various applications.


## Introduction: Understanding APIs for Data Transfer
### Transferring Data Over the Internet
Where do we get our data? That depends on who is providing the data. In the simplest of cases, the data provider gives us a single file in CSV or JSON format. But some of the most important data providers are extremely big operations: social media companies, search engines, governments, finance firms, etc. Getting data from these organizations is more complicated than asking for a data file, but it is very possible to do using the application programming interfaces (APIs) that these organizations have set up and made available online.

It's not feasible for these organizations to interact personally with every individual who requests data because every request may be unique, asking for different records, features, timeframes, locations, and formatting. In addition, to fulfill a request, the data provider might need information from the data requester first: for example, if I want Google Maps to give me the latitudinal and longitudinal coordinates of my house, I first need to tell Google Maps my address. In short, there are too many requests for humans to handle given that each request requires a conversation between the person providing the data and the person requesting the data.

Organizations instead store and share their data electronically through automated **web applications**. A web application has three parts: the user interface (the frontend), the data storage and organization system (the backend), and an API that connects the frontend and backend. Many authors, including [Huang, Chung, and Chan](https://www.oreilly.com/library/view/python-api-development/9781838983994/), use the analogy of a restaurant to describe an API. Customers place orders for the chefs to prepare, and waiters take the orders to the chefs, and return the food to the customers. In this analogy, customers are the frontend, issuing data requests; chefs are the backend, organizing and disseminating data; and waiters are the API, conducting the communication between the frontend and backend. Another analogy is the way our brains receive and react to sensory information. In this case, our senses comprise the API that connects the outside world to the internal workings of our brains.

APIs usually send and receive data in JSON (and sometimes XML) format. So being very comfortable with JSON formatted data is essential for working with APIs.
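As a minimal sketch of what working with JSON looks like in Python, the snippet below parses a JSON string into a dictionary with the standard `json` module and then converts it back to text. The payload (a made-up weather record) is an assumption for illustration and is not the output of any real API.

```python
import json

# A made-up JSON payload, similar in shape to what an API might return
raw = '{"city": "Lahore", "temperature_c": 31.5, "conditions": ["sunny", "dry"]}'

# json.loads turns JSON text into Python objects (dict, list, str, float, ...)
record = json.loads(raw)
print(record["city"])
print(record["conditions"][0])

# json.dumps goes the other way: Python objects back to JSON text
text = json.dumps(record, indent=2)
print(text)
```

JSON objects map to Python dictionaries and JSON arrays map to Python lists, so a parsed response can be explored with ordinary indexing.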

There are many reasons why big organizations would want to share their data. Governments might be required by law to provide general access to public data. For-profit companies like Google and Twitter share data to enable the development of third-party web applications that do useful things with the data. Twitter recently bought out [aiden.ai](https://www.aiden.ai/), a company that uses Twitter data to train AI models for marketing. (It's a brilliant strategy. By making their data available, Twitter encourages the creation of start-ups that are entirely dependent on Twitter data with no additional investment from Twitter. Then Twitter can use its data as leverage to buy out the most successful start-ups for a below-market price.) Financial firms and other companies charge subscription fees for access to their data.

A carefully constructed API also gives data providers certain powers over the data users. First, data providers might require that users register for access to the API, surrendering their own personal information in exchange for access keys. Second, data providers can tightly control what and how much data they share. This power is called **encapsulation**. [Taina Bucher (2013)](http://computationalculture.net/objects-of-intense-feeling-the-case-of-the-twitter-api/) writes that encapsulation also gives data providers the power to shape the narrative that a selection of shared data will tell:

> While there is nothing wrong with using APIs to collect data, of course, researchers should be wary about letting any current obsessions with big data overshadow the fact that APIs are far from neutral tools. . . . APIs have ‘politics’, meaning that they can be seen as having ‘powerful consequences for the social activities that happen with them, and in the worlds imagined by them’.

So when using an API, it's a good idea to think about what data are being shared, and what data are not being shared, why, and what that means for the analysis you are working on.

# Making API Requests in Python

To work with APIs in Python, you'll need some tools, such as the `requests` library. Before using it, you must install it on your system.

**Command to install 'requests':**
```bash
pip install requests
```

If you run this command inside a notebook cell, you may need to restart the kernel to use the updated packages.

Then import the libraries used in this lecture:

```python
import requests
import pandas as pd
```

# Example: Weather API

Imagine you have a weather application on your smartphone that provides current weather conditions, forecasts, and other related information. This weather app relies on data from a remote weather service provider, which exposes its functionality through an API.

### Request
- When you open the weather app and want to check the current weather in your location, the app sends a request to the weather service's API.
- This request includes parameters like your location coordinates or city name.

### API Interaction
- The weather service's API receives the request and processes it.
- It then accesses its database or connects to external sources to gather the necessary weather data for your location.

### Response
- Once the weather service has retrieved the data, it sends a response back to the weather app.
- This response contains information such as the current temperature, humidity, wind speed, and weather conditions.

### App Display
- The weather app receives the response from the API and displays the weather information to you in a user-friendly format.
- This format may include a graphical interface showing temperature, icons representing weather conditions, and textual descriptions.

**In this scenario:**

- **API:** The weather service's API defines the methods (e.g., "getWeather") and data formats (e.g., JSON or XML) that the weather app can use to request weather information.

- **Client Application:** The weather app on your smartphone acts as the client application, which initiates requests to the weather service's API to retrieve weather data.

- **Server:** The weather service operates the server that hosts the API. It processes incoming requests, gathers the necessary data, and sends back responses to the weather app.


> This example demonstrates how APIs enable different software systems, such as the weather app and the weather service, to communicate and exchange data, ultimately providing users with valuable functionality and information.
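The request and display steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the root `https://api.example.com`, the endpoint `/v1/current`, the `city` parameter, and the fields in the sample response are all hypothetical and do not belong to any real weather provider. No network call is made; a hard-coded JSON string stands in for the server's response.

```python
import json
from urllib.parse import urlencode

# Hypothetical API root and endpoint; replace with a real provider's values
WEATHER_ROOT = "https://api.example.com"
WEATHER_ENDPOINT = "/v1/current"


def build_weather_url(city):
    """Request step: encode the location as a query parameter."""
    return f"{WEATHER_ROOT}{WEATHER_ENDPOINT}?{urlencode({'city': city})}"


def format_weather(response_text):
    """App display step: turn the JSON response into a readable line."""
    data = json.loads(response_text)
    return (f"{data['city']}: {data['temperature_c']} C, "
            f"humidity {data['humidity']}%, {data['conditions']}")


# A made-up response body standing in for what the server would send back
sample_response = (
    '{"city": "Lahore", "temperature_c": 31.5, '
    '"humidity": 40, "conditions": "clear sky"}'
)

print(build_weather_url("Lahore"))
print(format_weather(sample_response))
```

With a real provider, the URL built in the first step would be passed to `requests.get`, and the text of the response would replace `sample_response`.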

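The same request and response cycle can be seen in code. The cell below uses the Alpha Vantage API (a stock data service rather than a weather service) to request the latest intraday price for IBM:

```python
import requests


# Function to get live stock data for a symbol
def get_stock_data(symbol="IBM", api_key="demo"):
    url = "https://www.alphavantage.co/query"
    params = {
        "function": "TIME_SERIES_INTRADAY",
        "symbol": symbol,
        "interval": "5min",
        "outputsize": "full",
        "apikey": api_key,
    }
    response = requests.get(url, params=params, timeout=10)

    # Check if the response is successful
    if response.status_code != 200:
        return None

    data = response.json()
    # The most recent timestamp is the key of the latest 5-minute record
    last_refreshed = data["Meta Data"]["3. Last Refreshed"]
    return data["Time Series (5min)"][last_refreshed]["1. open"]


symbol = "IBM"
stock_prices = {}
price = get_stock_data(symbol)
if price is not None:
    stock_prices[symbol] = price

print(f"{symbol}: {price}")
```

```
IBM: 185.3000
```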


### API Status Codes

Status codes provide information about the outcome of a request, indicating whether it was executed successfully or whether an error occurred during processing. A status code is returned with every response.

**Codes related to “GET” requests:**
- **200 OK:** The server successfully processed the request, and the requested data is returned.
- **201 Created:** A new resource is created on the server as a result of the request.
- **204 No Content:** The request is successful, but there is no additional data to return.
- **300 Multiple Choices:** The requested resource has multiple representations, each with its own URL.
- **302 Found (Temporary Redirect):** The requested resource is temporarily located at a different URL.
- **304 Not Modified:** The client’s cached copy of the resource is still valid, and no re-download is necessary.
- **400 Bad Request:** The request has malformed syntax or contains invalid data, making it incomprehensible to the server.
- **401 Unauthorized:** Authentication is required, and the client’s credentials (e.g., API key) are missing or invalid.
- **500 Internal Server Error:** An unexpected server error occurred during request processing.
- **502 Bad Gateway:** Acting as a gateway or proxy, the server received an invalid response from an upstream server.

These status codes help communicate the outcome of API requests and guide developers and clients in understanding the results, errors, or necessary actions.
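As a small sketch of how these codes can be interpreted in code, the helper below looks up the standard reason phrase with Python's built-in `http.HTTPStatus` and assigns the code to a broad category based on its first digit. The function name `describe_status` and the category labels are our own choices for illustration.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description of an HTTP status code."""
    # Standard reason phrase, e.g. "Not Found"; raises ValueError for unknown codes
    phrase = HTTPStatus(code).phrase
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirection"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "informational"
    return f"{code} {phrase} ({category})"


for code in (200, 304, 401, 502):
    print(describe_status(code))
```

In practice, the value passed to such a helper would be `response.status_code` from a `requests` response.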


### API Documentation

API documentation is essential for interacting with an API effectively. Here, we are using NewsAPI, which provides news from different countries, including news about celebrities. To get news updates from NewsAPI, we need a special key called an API key, which we store in a variable named `API_KEY`.

Next, we build a specific URL that tells NewsAPI what kind of news we want, such as top business headlines from the United States. It's like specifying a preference to a librarian.

After setting up this request, our code sends a message to NewsAPI using this URL, similar to clicking on a link to see a webpage. NewsAPI replies with a status update, indicating if the request was successful. We then print out this status to check if everything worked as expected.


### Reading API Documentation (Example: the NASA Image and Video Library)
Most web applications with an API include documentation to guide people who want to use the API. Knowing *how* to read the documentation, that is, knowing the jargon, is the most important skill for using an API. First, please note that like other kinds of documentation, some software authors take more care to produce good documentation than others. Still, all API documentation should share some universal characteristics:

* The most important piece of information to find in the documentation is the **root** and **endpoint**. The API root is a web address for all resources that exist and are accessible through the API. The endpoint is a subdirectory where records of a specific type are stored. 

* Each endpoint will usually have a separate section in the API documentation that contains a description of the kinds of data stored in this endpoint, a list of **parameters** to narrow down the data being requested, and (hopefully lots of) examples. 

* Finally, if an API requires authentication, there should be instructions on how to register for an **access key**.

The goal is to generate a complete URL that points to the desired data by putting the root, endpoint, parameters, and access key together in the following way:

*root* **/** *endpoint* **?** *parameters=value* **&** *key=value*
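The pattern above can be sketched as a small helper that joins the pieces. The function name `build_url`, the root `https://api.example.com`, the endpoint `/records`, and the parameter names are placeholders chosen for illustration; the real values come from the documentation of whichever API you are using.

```python
from urllib.parse import urlencode


def build_url(root, endpoint, params=None):
    """Join root, endpoint, and query parameters into one request URL."""
    url = root.rstrip("/") + "/" + endpoint.strip("/")
    if params:
        # urlencode joins key=value pairs with "&" and escapes special characters
        url += "?" + urlencode(params)
    return url


# Placeholder values: substitute the root, endpoint, and key from the documentation
print(build_url("https://api.example.com", "/records",
                {"year": 2020, "key": "YOUR_ACCESS_KEY"}))
```

The `requests` library can do the same joining for you if you pass the dictionary through the `params` argument of `requests.get`.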

Let's use as an example the API to access [NASA's Image and Video Library](https://images.nasa.gov/). The documentation for this API is available at https://images.nasa.gov/docs/images.nasa.gov_api_docs.pdf. This API does not require authentication, so we only need to find the root, endpoint, and parameters. Let's say that I want to find a photo of the moon. According to the documentation, the root is https://images-api.nasa.gov, and there are several endpoints including one dedicated to searches (`/search`). 

I scroll down to the section devoted to this endpoint and look at the parameters to see which ones will allow me to search for a photo of the moon. I can use the `q` parameter to enter free text as a search term, and the `media_type` parameter set equal to `image` to find a photo.

Therefore the complete URL for this API GET request is: https://images-api.nasa.gov/search?q=moon&media_type=image. Go ahead and click on this link. You will see a JSON file in which the metadata tells me there are 8626 hits in this search. Under the `data` header, the first result is an image entitled "Nearside of the Moon", which has a NASA ID code PIA12235.

Next I want to GET the URL for the JSON data related to this specific image. The endpoint to find a specific image is `/asset`. There's only one parameter, the NASA ID. Notice that the documentation lists this path as `asset/{nasa_id}` and NOT as `asset ? nasa_id = {nasa_id}`. That means that for this particular endpoint we simply enter a slash and the ID after the endpoint. So to see the JSON data stored for this image, the URL is https://images-api.nasa.gov/asset/PIA12235. This JSON contains the links to various image files. I can link to one of them to display it in this document:

<img src="https://images-assets.nasa.gov/image/PIA12235/PIA12235~small.jpg">
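The difference between the two NASA URLs discussed above (query parameters for `/search`, a path parameter for `/asset`) can be sketched as follows. To keep the example self-contained, it does not call the API; the `sample` dictionary is a heavily trimmed stand-in for the search response, and its exact nesting is an assumption that should be checked against the JSON you actually receive.

```python
import json
from urllib.parse import urlencode

NASA_ROOT = "https://images-api.nasa.gov"

# Query parameters go after "?" for the /search endpoint
search_url = f"{NASA_ROOT}/search?{urlencode({'q': 'moon', 'media_type': 'image'})}"

# Trimmed stand-in for the search response; the nesting is an assumption
sample = json.loads("""
{"collection": {
    "metadata": {"total_hits": 8626},
    "items": [{"data": [{"title": "Nearside of the Moon", "nasa_id": "PIA12235"}]}]
}}
""")

first = sample["collection"]["items"][0]["data"][0]

# The /asset endpoint takes the ID as part of the path, not as a query parameter
asset_url = f"{NASA_ROOT}/asset/{first['nasa_id']}"

print(search_url)
print(first["title"])
print(asset_url)
```

To run this against the live API, replace `sample` with `requests.get(search_url).json()`.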

Some documentation also contains API schematics that visualize the endpoints and the interaction between the frontend and backend via the API. As an example, here's a schematic of an API for a project we are working on with [Code for Charlottesville](https://codeforcharlottesville.org), a local chapter of [Code for America](https://codeforamerica.org) that I help to run. Our project involves building software for the agencies in town that help people with housing insecurity find a place to live. Our web application will allow people at different agencies to share their data about available rental properties and landlords who are willing to rent to people with vouchers and who might have other complicated situations. Here's our API's schematic:
<img src="https://github.com/code-for-charlottesville/housinghub/raw/master/backend/docs/JWT%20Auth%20Workflow.png" width=800>

```python
import os
import requests

# Read your NewsAPI key from an environment variable rather than hard-coding it.
# The variable name NEWSAPI_KEY is our own choice, not something NewsAPI requires.
API_KEY = os.environ['NEWSAPI_KEY']
url = f"https://newsapi.org/v2/top-headlines?country=us&category=business&apiKey={API_KEY}"
response = requests.get(url)
print(response.status_code)
```

```
200
```


# Reasons to Use API

When it comes to obtaining data, the question arises: why use an API when we can get data in the form of a CSV file from the resource? Let's delve into some examples to understand the advantages of using APIs.

### Change in Data

- **Scenario:** Suppose we are using an API to collect data about the temperature of all the cities in India.
- **Challenge:** As the temperature changes with time, downloading a new CSV file every time we need the information would consume a significant amount of bandwidth and time for processing.
- **Advantage of API:** Using the API allows for easy and quick retrieval of the updated data.

### Size of Data

- **Scenario:** In certain scenarios, we may only require a small part of a large dataset.
- **Example:** If our goal is to obtain comments on a tweet, there's no need to download the entire dataset of Twitter.
- **Advantage of API:** APIs enable the extraction of specific, targeted data, minimizing unnecessary data transfer and optimizing the process for tasks that require only a fraction of the available information.


```python
response = requests.get('https://api.themoviedb.org/3/movie/top_rated?api_key=8265bd1679663a7ea12ac168da84d2e8&language=en-US&page=1')
response
```

```
<Response [200]>
```

```python
response.json()['results']
```

```
[{'adult': False,
  'backdrop_path': '/kXfqcdQKsToO0OUXHcrrNCHDBzO.jpg',
  'genre_ids': [18, 80],
  'id': 278,
  'original_language': 'en',
  'original_title': 'The Shawshank Redemption',
  'overview': 'Framed in the 1940s for the double murder of his wife and her lover, upstanding banker Andy Dufresne begins a new life at the Shawshank prison, where he puts his accounting skills to work for an amoral warden. During his long stretch in prison, Dufresne comes to be admired by the other inmates -- including an older prisoner named Red -- for his integrity and unquenchable sense of hope.',
  'popularity': 123.595,
  'poster_path': '/9cqNxx0GxF0bflZmeSMuL5tnGzr.jpg',
  'release_date': '1994-09-23',
  'title': 'The Shawshank Redemption',
  'video': False,
  'vote_average': 8.7,
  'vote_count': 25693},
 {'adult': False,
  'backdrop_path': '/tmU7GeKVybMWFButWEGl2M4GeiP.jpg',
  'genre_ids': [18, 80],
  'id': 238,
  'original_language': 'en',
  'original_title': 'The Godfather',
  ...
```

```python
columns = ['id','title','overview','release_date','popularity','vote_average','vote_count']
frames = []
for i in range(1,429):
    response = requests.get('https://api.themoviedb.org/3/movie/top_rated?api_key=8265bd1679663a7ea12ac168da84d2e8&language=en-US&page={}'.format(i))
    # Keep only the columns of interest from this page of results
    frames.append(pd.DataFrame(response.json()['results'])[columns])
# DataFrame.append was removed in pandas 2.0, so combine all pages once with pd.concat
df = pd.concat(frames, ignore_index=True)
df.head()
```

|   | id | title | overview | release_date | popularity | vote_average | vote_count |
|---|---|---|---|---|---|---|---|
| 0 | 278 | The Shawshank Redemption | Framed in the 1940s for the double murder of h... | 1994-09-23 | 123.595 | 8.7 | 25693 |
| 1 | 238 | The Godfather | Spanning the years 1945 to 1955, a chronicle o... | 1972-03-14 | 117.35 | 8.695 | 19520 |
| 2 | 240 | The Godfather Part II | In the continuing saga of the Corleone crime f... | 1974-12-20 | 83.042 | 8.576 | 11792 |
| 3 | 424 | Schindler's List | The true story of how businessman Oskar Schind... | 1993-12-15 | 69.774 | 8.566 | 15173 |
| 4 | 129 | Spirited Away | A young girl, Chihiro, becomes trapped in a st... | 2001-07-20 | 98.97 | 8.539 | 15573 |


```python
df.shape
```

```
(8560, 7)
```
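The loop above hard-codes the number of pages. As a hedged alternative, the sketch below keeps requesting pages until the response itself says there are no more, assuming each response carries `results` and `total_pages` fields. The helper name `fetch_all_pages` and the `fake_fetch` stand-in are hypothetical; `fake_fetch` replaces the real `requests.get(...).json()` call so the example runs without a network connection or API key.

```python
def fetch_all_pages(fetch_page, max_pages=1000):
    """Collect the 'results' of every page until the API reports no more pages."""
    rows = []
    page = 1
    while page <= max_pages:
        payload = fetch_page(page)
        rows.extend(payload["results"])
        # Stop at the last page reported by the API instead of a hard-coded count
        if page >= payload["total_pages"]:
            break
        page += 1
    return rows


# Stand-in for requests.get(...).json(): three fake pages of two movies each
def fake_fetch(page):
    return {
        "page": page,
        "total_pages": 3,
        "results": [{"id": page * 10 + k, "title": f"Movie {page}-{k}"} for k in (1, 2)],
    }


movies = fetch_all_pages(fake_fetch)
print(len(movies))
print(movies[0])
```

The list of dictionaries returned here can be passed directly to `pd.DataFrame` to get a table like the one above. The `max_pages` cap is a safety limit so that a malformed response cannot cause an endless loop.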