# Finding Web APIs & Storing Data with Python (Rutgers Example)

In this tutorial, we’re going to learn a practical workflow for fetching data from web APIs and storing it in a database. The workflow has three steps:

1. **Observe** what API request a website makes (using Chrome DevTools)
2. **Recreate** that same request in Python (with `requests`)
3. **Store** the data in a local **SQLite** database

We’ll use the [Rutgers Schedule of Classes website](https://classes.rutgers.edu/soc/#home) as a real example, but the process applies to almost any website.


### What you’ll end with
By the end, you’ll have:
- A working Python script (inside this notebook) that fetches real data from an API endpoint
- A SQLite database file saved locally in the `data/` folder
- A repeatable pattern you can use on other websites


## Before we start (quick notes)

- This tutorial is for **educational purposes**. Always respect a website’s Terms of Service and avoid sending excessive requests.
- When you recreate API requests outside the browser (such as with Python), some things may not work exactly the same way. This is normal. The goal is to learn the method, not just memorize one endpoint.
- This is a very basic introduction. Real-world APIs can be more complex, requiring authentication, rate limiting, and other considerations.

**Now let's get into the tutorial!**

## Background: What is an API?

An **API** (Application Programming Interface) is a structured way for one program to communicate with another.

When you use a modern website, your browser often does this behind the scenes:
- You click a button (like “Search”)
- The site sends a request to an API endpoint which is handled by the backend server
- The server responds with data (often in JSON format)
- The browser uses that JSON to update the page

That API request is what we’re going to “copy”. Instead of scraping the page HTML, we’ll call the same endpoint directly.
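
To make the last two steps of that flow concrete, here is a minimal sketch of what "using the JSON" looks like in Python. The payload and its field names are made up for illustration; they are not the actual Rutgers response.

```python
import json

# Hypothetical JSON text, shaped like something an API might return.
# The field names ("results", "title", "credits") are invented for this example.
response_text = (
    '{"results": ['
    '{"title": "Intro to Biology", "credits": 4}, '
    '{"title": "Calculus I", "credits": 4}'
    ']}'
)

# json.loads turns JSON text into ordinary Python dicts and lists.
data = json.loads(response_text)

# Once parsed, the data can be explored like any other Python structure.
titles = [course["title"] for course in data["results"]]
print(titles)
```

A browser does the equivalent in JavaScript before updating the page; in Python we get the same data without rendering any HTML.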

If you want to learn more about APIs in detail, check out this article posted by AWS: [What is an API?](https://aws.amazon.com/what-is/api/)


## Background: What makes up a network request?

When you inspect a network request, you'll see it has several key components:

### Overview of components

##### 1) Method
This tells the server the type of request being made. The most common methods are **GET** and **POST**.
- GET requests usually retrieve data, with the parameters in the URL
- POST requests usually send data in the request body
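
The difference can be sketched without sending anything over the network. The example below uses only the standard library and the placeholder endpoint `https://example.com/api/resource`; with `requests`, the rough equivalents would be `requests.get(url, params=...)` and `requests.post(url, json=...)`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://example.com/api/resource"  # placeholder endpoint, not a real API

# GET: the parameters travel in the URL as a query string.
get_request = Request(BASE_URL + "?" + urlencode({"key": "value"}), method="GET")

# POST: the parameters travel in the request body (here, JSON-encoded bytes).
post_body = json.dumps({"key": "value"}).encode("utf-8")
post_request = Request(BASE_URL, data=post_body, method="POST")

# Building a Request object does not send it, so this is safe to run offline.
for req in (get_request, post_request):
    print(req.get_method(), req.full_url, req.data)
```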

##### 2) URL (Endpoint)
This is the web address where the request is sent.
- It often looks like `https://example.com/api/resource`


##### 3) Parameters
These control what data you get back in the case of a GET request, or what data you send in the case of a POST request.
- For GET requests, parameters are often appended to the URL as a query string (e.g., `?key=value&key2=value2`, which gives the full URL `https://example.com/api/resource?key=value&key2=value2`)
- For POST requests, parameters are often in the request body (e.g., JSON or form data)
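
When you copy a GET URL from the browser, it helps to split it back into the endpoint and its parameters so you can change individual values later. A minimal sketch using the standard library and the example URL above:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

copied_url = "https://example.com/api/resource?key=value&key2=value2"

# Separate the endpoint from the query string.
parts = urlsplit(copied_url)
endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"

# Turn "key=value&key2=value2" into a dict that is easy to edit.
params = dict(parse_qsl(parts.query))

print(endpoint)
print(params)

# Rebuilding the query string from the dict gives back the original URL.
rebuilt = endpoint + "?" + urlencode(params)
print(rebuilt == copied_url)
```

Note that `dict(...)` keeps only the last value if a key is repeated in the query string, which is fine for simple cases like this one.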

##### 4) Headers
These provide additional information about the request which can affect how the server processes it. Common headers include:
- `Content-Type`: Indicates the media type of the request body (e.g., `application/json`); mostly relevant for requests that send data, such as POST
- `Accept`: Specifies the response formats the client can handle (e.g., `application/json`)
- `Authorization`: Contains credentials for authentication (if required)
- `User-Agent`: Identifies the client software making the request
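
In Python, headers are usually passed as a plain dictionary. As a small sketch, the hypothetical helper below (`parse_header_lines` is not part of any library) turns pasted `Name: value` lines into such a dictionary. The header values are illustrative only.

```python
def parse_header_lines(raw_text):
    """Turn 'Name: value' lines into a dict, skipping blank or malformed lines."""
    headers = {}
    for line in raw_text.splitlines():
        # Split on the first colon only, so values such as URLs stay intact.
        name, sep, value = line.partition(":")
        # Lines with no colon, or with an empty name (e.g. ":authority"), are skipped.
        if sep and name.strip():
            headers[name.strip()] = value.strip()
    return headers


# Illustrative header text; the User-Agent value is made up.
raw = """
Accept: application/json
Content-Type: application/json
User-Agent: my-tutorial-script/0.1
"""

headers = parse_header_lines(raw)
print(headers)
```

The resulting dictionary is in the shape that `requests` accepts through its `headers=` argument.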


##### 5) Response
This is the data the server sends back after processing the request. For APIs, it’s often in **JSON** format, but it can also be XML, HTML, or plain text.
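
Because the format varies, code that handles a response typically checks the `Content-Type` response header before decoding the body. The sketch below assumes the body is UTF-8 encoded and uses a hypothetical helper name, `decode_body`; with `requests`, `response.json()` and `response.text` cover the same ground.

```python
import json


def decode_body(content_type, body):
    """Decode raw response bytes based on the Content-Type header value."""
    text = body.decode("utf-8")  # assumption: the server sent UTF-8
    # The header may carry extras such as "; charset=utf-8",
    # so compare only the media type before the semicolon.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(text)
    # XML, HTML, and plain text are simply returned as strings here.
    return text


print(decode_body("application/json; charset=utf-8", b'{"ok": true}'))
print(decode_body("text/plain", b"hello"))
```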



## Inspecting Network Requests on the Schedule of Classes website with Chrome DevTools

Modern websites rarely load all their data at once. Instead, they make **network requests** to fetch data dynamically when you interact with the page.

To find the API request we want to recreate, we’ll use **Chrome DevTools**.

### Step 1: Open Chrome DevTools
1. Open Google Chrome and navigate to the [Rutgers Schedule of Classes website](https://classes.rutgers.edu/soc/#home).
2. Right-click anywhere on the page and select **Inspect** to open DevTools.
3. Click on the **Network** tab in DevTools to monitor network activity.

### Step 2: Filter for XHR/Fetch Requests

1. In the Network tab, find the filter bar and click **Fetch/XHR**. This hides images, stylesheets, and other page assets so that mostly API requests remain.
2. At the top of the Network panel, check the **Disable cache** checkbox so the browser re-sends every request. (Otherwise, the browser may reuse a response it has already stored locally, and the request will not appear as a fresh call to the server.)

### Step 3: Perform a Search on the Website

1. On the Schedule of Classes website, fill out the search form (select a term, location, and level) and click the **Continue** button to submit the form.
2. Watch the Network tab for new requests that appear when you click **Continue**.

### Step 4: Identify the Relevant API Request
1. Look for a request that seems to correspond to the search you just performed.
2. Click on that request and confirm that it returns the data you expect (by checking the **Response** tab).
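
If the response is large, it can be hard to judge by eye. One option is to copy the text from the **Response** tab into Python and print a short summary of its shape. This is a sketch only: `describe` is a hypothetical helper, and the pasted sample below is invented rather than taken from the Rutgers site.

```python
import json


def describe(value):
    """Return a one-line description of a parsed JSON value."""
    if isinstance(value, dict):
        return f"object with keys {sorted(value)}"
    if isinstance(value, list):
        first = describe(value[0]) if value else "nothing"
        return f"list of {len(value)} items, first is {first}"
    return type(value).__name__


# Invented sample standing in for text copied from the Response tab.
pasted = '[{"title": "Calculus I", "sections": []}, {"title": "Intro to Biology", "sections": []}]'

print(describe(json.loads(pasted)))
```

If `json.loads` raises an error, the response is probably not JSON (for example, it may be HTML), which suggests a different request is the one to recreate.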

### Step 5: Note down the request details
1. Copy the **URL** (Endpoint) by right-clicking the request and selecting **Copy > Copy URL**.
2. Navigate to the **Headers** tab of the selected request.
3. Note down the **Method** (GET or POST).
4. Scroll down to the **Request Headers** section and copy important headers like `Content-Type`, `Accept`, and any others that seem relevant. (For this example, we will only need `Content-Type` and `Accept` headers)
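
One way to keep these notes organized is to record them in a small Python structure that can later be replayed with `requests`. The sketch below is an assumption about how you might arrange this, not a required pattern: `CapturedRequest` and `replay` are hypothetical names, and the URL, parameters, and headers are placeholders to be replaced with the values you copied.

```python
from dataclasses import dataclass, field


@dataclass
class CapturedRequest:
    """The details noted down from DevTools for one request."""
    method: str
    url: str
    params: dict = field(default_factory=dict)
    headers: dict = field(default_factory=dict)


def replay(captured, timeout=10):
    """Send the captured request with `requests` and return the response."""
    import requests  # imported here so the rest of the sketch runs without it installed

    response = requests.request(
        captured.method,
        captured.url,
        params=captured.params,
        headers=captured.headers,
        timeout=timeout,
    )
    response.raise_for_status()  # raise an error on 4xx/5xx status codes
    return response


# Placeholder values; replace them with what you copied from DevTools.
captured = CapturedRequest(
    method="GET",
    url="https://example.com/api/resource",
    params={"key": "value"},
    headers={"Accept": "application/json", "Content-Type": "application/json"},
)
print(captured)

# data = replay(captured).json()  # uncomment once the real values are filled in
```

Keeping the method, URL, parameters, and headers separate makes it easy to change one value (such as a search term) without re-copying the whole request from the browser.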




