# Lab 4: Working with JSON Data from the Open Brewery DB API

**Create a copy of this notebook and follow the steps below**

This notebook gives you starter code to use the **Open Brewery DB API** to extract data about breweries across the United States. This is a public API that does not require authentication, making it perfect for learning about APIs and JSON data.

You should review the DataCamp chapter **"Importing JSON Data and Working with APIs"** (part of the course *Streamlined Data Ingestion with pandas*) before starting this exercise. This lab counts for class participation credit.

## Learning Objectives
- Understand how to work with RESTful APIs
- Parse JSON responses into Python lists and dictionaries
- Convert JSON data to pandas DataFrames
- Use API query parameters to filter data
- Combine data from multiple API calls

## Resources
- **API Documentation:** http://178.156.206.171:8000/docs
- **What is an API?** https://www.mulesoft.com/resources/api/what-is-an-api
- **JSON and APIs with Python:** https://towardsdatascience.com/json-and-apis-with-python-fba329ef6ef0

## Let's start with importing libraries to extract data from the Brewery API

In [4]:
# Standard Python library for handling HTTP requests
import requests

# Import pandas for data manipulation
import pandas as pd

# Import json for pretty printing JSON data
import json


## Understanding API Calls

Unlike the Yelp API, the Open Brewery DB API is completely **open and requires no authentication**. This means:
- ✅ No API key needed
- ✅ No registration required
- ✅ No rate limits for reasonable use

To GET data from the API, we need:
1. **A URL** - The base API endpoint
2. **Parameters** - Query parameters to filter/search data
3. **HTTP GET request** - Using the `requests` library
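
To see how the URL and the parameters fit together, here is a minimal sketch that builds the final request URL by hand. It uses only the standard library (`urllib.parse.urlencode`) so it runs without network access. When you pass `params=` to `requests.get`, the `requests` library encodes the dictionary for you in essentially the same way, so you normally never write this step yourself.

```python
from urllib.parse import urlencode

base_url = 'http://178.156.206.171:8000'
search_url = f'{base_url}/breweries/search'
params = {'query': 'dog', 'per_page': 10}

# urlencode turns the dictionary into "key=value" pairs joined by "&"
query_string = urlencode(params)
full_url = f'{search_url}?{query_string}'
print(full_url)
# http://178.156.206.171:8000/breweries/search?query=dog&per_page=10

# Spaces and other special characters are escaped automatically
city_query = urlencode({'by_city': 'San Diego'})
print(city_query)
# by_city=San+Diego
```

Comparing a hand-built URL like this with `response.url` is a quick way to confirm that your parameters were sent the way you intended.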

### Available Endpoints
Our brewery API has several endpoints:
- `/breweries` - List breweries with filters
- `/breweries/search` - Full-text search
- `/breweries/random` - Get random brewery(ies)
- `/breweries/autocomplete` - Name autocomplete
- `/breweries/{id}` - Get specific brewery by ID
- `/breweries/meta` - Get brewery count metadata

You can view interactive documentation at: http://178.156.206.171:8000/docs

## Example Query: Search for Breweries

Let's search for breweries with the term "dog" in their name. This will demonstrate the basic pattern for making API calls.

In [None]:
# The base API URL
base_url = 'http://178.156.206.171:8000'

# The search endpoint
search_url = f'{base_url}/breweries/search'

# Query parameters - search for breweries with 'dog' in the name
params = {
    'query': 'dog',
    'per_page': 10  # Limit to 10 results
}

print(f"Making request to: {search_url}")
print(f"With parameters: {params}")

In [None]:
# Make the GET request
# We set timeout=5 to stop waiting after 5 seconds
response = requests.get(search_url, params=params, timeout=5)

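# Check if the request was successful (200 means OK)
print(f"Status Code: {response.status_code}")
print(f"Response URL: {response.url}")

# Stop here with an HTTPError if the server returned a 4xx/5xx status
response.raise_for_status()

# Extract JSON data from the response
data = response.json()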

# Print the JSON data (nicely formatted)
print("\nJSON Response:")
print(json.dumps(data[:2], indent=2))  # Show first 2 results only for readability

In [None]:
# Convert JSON data to pandas DataFrame
# The API returns a list of brewery dictionaries directly
df = pd.DataFrame(data)

# Display the top 5 rows
print(f"Found {len(df)} breweries")
df.head()

## Exploring the Data Structure

Let's examine what data is available for each brewery.

In [None]:
# View all available columns
print("Available columns:")
print(df.columns.tolist())

# View data types
print("\nData types:")
print(df.dtypes)

# View dataset dimensions (number of rows and columns)
print("\nDataset shape:")
print(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}")

---

# Lab Assignment

Complete the following tasks for class participation credit:

## Task 1: Search Endpoint Queries (3 queries)

Use the `/breweries/search` endpoint with **three different search terms**. For each query:
- Create separate params (params1, params2, params3)
- Store responses in different variables (response1, response2, response3)
- Create separate DataFrames (df1, df2, df3)
- Display the top 5 rows of each DataFrame

**Example search terms:**
- "brewing"
- "mountain"
- "craft"
- "beer"
- Any term of your choice!

### Query 1
Write your code below:

In [None]:
# Your code for Query 1 here
params1 = {
    'query': 'YOUR_SEARCH_TERM',
    'per_page': 10
}

# Make request, convert to DataFrame, display results


### Query 2
Write your code below:

In [None]:
# Your code for Query 2 here


### Query 3
Write your code below:

In [None]:
# Your code for Query 3 here


---

## Task 2: Filter by Location

Use the `/breweries` endpoint with **filter parameters**. This endpoint allows you to filter by:
- `by_city` - Filter by city name (e.g., "San Diego")
- `by_state` - Filter by state name (e.g., "California") 
- `by_postal` - Filter by postal code (e.g., "92101")
- `by_type` - Filter by brewery type (micro, nano, regional, brewpub, large, planning, bar, contract, proprietor)

Create **one query** that uses the filter endpoint. Convert to DataFrame and display top 5 rows.

**Example:**

In [None]:
# Example: Find micro breweries in Champaign, Illinois
filter_url = f'{base_url}/breweries'

params_filter = {
    'by_city': 'Champaign',
    'by_state': 'Illinois',
    'by_type': 'micro',
    'per_page': 10
}

# Your code here to make the request and create DataFrame


---

## Task 3: Random Breweries

Use the `/breweries/random` endpoint to get random breweries. This endpoint accepts:
- `size` - Number of random breweries to return (default: 1, max: 50)

Get **5 random breweries**, convert to DataFrame, and display all rows.

**Hint:** The URL should be `http://178.156.206.171:8000/breweries/random`

In [None]:
# Your code for Task 3 here


---

## Task 4: Autocomplete Search

Use the `/breweries/autocomplete` endpoint to search for brewery names. This is useful for implementing search-as-you-type functionality.

Parameters:
- `query` - Search term (e.g., "stone")

Search for breweries starting with **"stone"** and display results.

**Note:** This endpoint returns simplified data (just `id` and `name`).

In [None]:
# Your code for Task 4 here


---

## Task 5: Get Brewery by ID

Use the `/breweries/{id}` endpoint to get a specific brewery by its UUID.

**Instructions:**
1. First, use any search/filter query to get a list of breweries
2. Extract the `id` of the first brewery in your results
3. Use that ID to fetch the full brewery details
4. Print the JSON response (formatted)

**Example ID:** You'll get this from your search results

In [None]:
# Step 1: Get a list of breweries to find an ID
# (Use any search/filter from previous tasks)


In [None]:
# Step 2: Extract the ID from the first result
# brewery_id = df['id'].iloc[0]  # Example

# Step 3: Fetch brewery by ID
# brewery_url = f'{base_url}/breweries/{brewery_id}'
# Your code here


---

## Task 6: Combining Multiple Queries

Write code that combines data from **two different API calls**.

**Example:** 
- Get all breweries in California (`by_state=California`)
- Get all breweries in Texas (`by_state=Texas`)
- Combine both DataFrames using `pd.concat()`
- Show summary statistics (count by state, count by type, etc.)
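
The combine-and-summarize steps in the example above might look like the sketch below. To keep it runnable offline, it uses small hand-written rows instead of real API responses. The brewery names are placeholders, and the column names `state` and `brewery_type` are assumptions, so check `df.columns` on your own data before relying on them.

```python
import pandas as pd

# Hypothetical rows standing in for two separate API responses
california_rows = [
    {"name": "Example Brewing", "state": "California", "brewery_type": "micro"},
    {"name": "Sample Ales", "state": "California", "brewery_type": "brewpub"},
]
texas_rows = [
    {"name": "Placeholder Beer Co", "state": "Texas", "brewery_type": "micro"},
]

df_ca = pd.DataFrame(california_rows)
df_tx = pd.DataFrame(texas_rows)

# Stack the two DataFrames; ignore_index=True renumbers the rows 0..n-1
combined = pd.concat([df_ca, df_tx], ignore_index=True)

# Count of breweries per state
state_counts = combined["state"].value_counts()
print(state_counts)

# Count of breweries per state and type (assumed column name: brewery_type)
type_counts = combined.groupby(["state", "brewery_type"]).size()
print(type_counts)
```

Without `ignore_index=True`, both DataFrames would keep their original row labels, so the combined index would contain duplicates.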

**Your task:** Choose your own combination and analyze the results.

In [None]:
# Your code for Task 6 here
# Make two API calls
# Combine the results
# Perform analysis


---

## Submission Instructions

1. Complete all 6 tasks above
2. Ensure all code cells run without errors
3. Make sure DataFrames display properly
4. Download your completed notebook: File → Download → Download .ipynb
5. Submit to the **Class Participation Assignment** on Canvas

---

## Advanced Challenge (Optional - Extra Credit)

For students who want an extra challenge:

**Create a comprehensive brewery analysis:**
1. Find the **top 5 states** with the most breweries
2. For each of those states, get the **distribution of brewery types**
3. Create a summary table showing:
   - State name
   - Total brewery count
   - Count by type (micro, nano, brewpub, etc.)
4. Visualize the results using a bar chart (use `matplotlib` or `seaborn`)

**Hint:** You'll need to:
- Use the `/breweries/meta` endpoint to get counts
- Make multiple filtered API calls
- Use pandas groupby and aggregation functions
- Create visualizations

In [None]:
# Your advanced challenge code here (optional)


---

## Additional Resources

- **API Interactive Docs:** http://178.156.206.171:8000/docs
- **API Health Check:** http://178.156.206.171:8000/health
- **Pandas Documentation:** https://pandas.pydata.org/docs/
- **Requests Library:** https://requests.readthedocs.io/

## Tips for Success

1. **Always check the status code** - 200 means success
2. **Print the response URL** - Helps debug parameter issues
3. **Use the `.json()` method** - Converts the response body to Python objects (a list or a dict, depending on the endpoint)
4. **Use `json.dumps()`** - Pretty print JSON for readability
5. **Check DataFrame shape** - Ensure you got the expected data
6. **Use descriptive variable names** - Makes code easier to follow
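
Tips 3 and 4 can be tried without calling the API at all. The sketch below parses a hand-written JSON string with `json.loads`, which produces the same kind of Python objects that `response.json()` returns. The payload is hypothetical: the `id` and `name` fields come from the autocomplete description above, but the values are placeholders.

```python
import json

# Hypothetical response body: a JSON array of brewery objects
raw_text = '[{"id": "abc-123", "name": "Example Brewing"}, {"id": "def-456", "name": "Sample Ales"}]'

# json.loads does for a string what response.json() does for a response
data = json.loads(raw_text)

print(type(data).__name__)     # list
print(type(data[0]).__name__)  # dict

# Pretty print only the first record for readability
print(json.dumps(data[:1], indent=2))

# Pull one field out of every record
names = [brewery["name"] for brewery in data]
print(names)
```

Checking `type(data)` first tells you whether to index by position (a list) or by key (a dict) before you hand the data to `pd.DataFrame`.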

## Common Errors and Solutions

| Error | Cause | Solution |
|-------|-------|----------|
| `ConnectionError` | API server unreachable | Check internet connection, verify URL |
| `Timeout` | Request took too long | Increase timeout value |
| `JSONDecodeError` | Response is not valid JSON | Check status code, print response.text |
| `KeyError` | Trying to access missing key | Check available keys with `.keys()` |
| Empty DataFrame | No results found | Verify your query parameters |
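
One way to handle the request-related errors in the table is to wrap the call in a small helper. The function below is a sketch, not part of the lab starter code: the name `fetch_json` and its `get` parameter are made up for illustration, and `get` exists only so the function can be tried without network access.

```python
import requests


def fetch_json(url, params=None, timeout=5, get=requests.get):
    """Return the parsed JSON from url, or None if the request fails.

    `get` defaults to requests.get; passing a stand-in function makes it
    possible to exercise the error handling offline.
    """
    try:
        response = get(url, params=params, timeout=timeout)
        response.raise_for_status()  # raises HTTPError for 4xx/5xx statuses
        return response.json()
    except requests.exceptions.Timeout:
        print(f"Timed out after {timeout} seconds: {url}")
    except requests.exceptions.ConnectionError:
        print(f"Could not reach the server: {url}")
    except requests.exceptions.HTTPError as err:
        print(f"Server returned an error status: {err}")
    except ValueError:
        # JSON decoding errors are subclasses of ValueError
        print("Response body was not valid JSON")
    return None
```

With the variables from the example query, a call would look like `data = fetch_json(search_url, params=params)`. A return value of `None` means one of the errors above was printed.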

Good luck! 🍺