# APIs and Web Scraping: Exercise Results


## 1. Public API GET Request
- Practice retrieving data from a public REST API using Python.
- Make an HTTP GET request to a sample endpoint on JSONPlaceholder, a free fake REST API intended for testing.
- Parse and print the JSON response.
- This exercise builds foundational skills for working with external data sources and understanding HTTP requests in data engineering workflows.


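```python
import requests

url = 'https://jsonplaceholder.typicode.com/posts'
# A timeout stops the call from hanging indefinitely if the server never responds
response = requests.get(url, timeout=10)

if response.status_code == 200:
    data = response.json()  # Decode the JSON body into Python lists and dicts
    print(data)
else:
    print(f"Request failed with status code {response.status_code}")
```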
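Query parameters are the usual way to narrow a GET request. To see exactly what `requests` will send without touching the network, a request can be built and prepared first. A minimal sketch, assuming the `userId` filter that JSONPlaceholder's `/posts` endpoint accepts:

```python
import requests

url = 'https://jsonplaceholder.typicode.com/posts'

# Build the request without sending it, so the final URL can be inspected.
# The params dict is URL-encoded into the query string automatically.
prepared = requests.Request('GET', url, params={'userId': 1}).prepare()

print(prepared.method)  # GET
print(prepared.url)     # https://jsonplaceholder.typicode.com/posts?userId=1

# To actually send it, pass the same params to requests.get:
# response = requests.get(url, params={'userId': 1}, timeout=10)
```

Passing a dict through `params` is preferable to concatenating strings by hand, because `requests` handles escaping of special characters for you.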


## 2. Authentication
- Make a request to an API endpoint that requires authentication using an API key (use a publicly available test or demo key).
- Add the API key to your request via headers or query parameters as specified by the API documentation.
- Print the response and briefly explain where the API key was used in your code.


```python
import requests

# NASA's public DEMO_KEY needs no sign-up but is rate-limited
api_key = 'DEMO_KEY'
url = 'https://api.nasa.gov/planetary/apod'
params = {'api_key': api_key}  # The API key is sent as the api_key query parameter

resp = requests.get(url, params=params, timeout=10)
resp.raise_for_status()  # Raise on 4xx/5xx, e.g. when the demo key's rate limit is exceeded
print(resp.json())
```
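APIs differ in where they expect the key: some read it from the query string, others from a request header. A minimal sketch that prepares both variants offline so you can compare what is sent; the header name `X-Api-Key` is an assumption here, so use whatever name your API's documentation specifies:

```python
import requests

API_KEY = 'DEMO_KEY'
URL = 'https://api.nasa.gov/planetary/apod'

# Option 1: the key travels in the query string (as in the NASA example)
as_query = requests.Request('GET', URL, params={'api_key': API_KEY}).prepare()

# Option 2: the key travels in an HTTP header instead.
# 'X-Api-Key' is an assumed header name; check the API docs for the real one.
as_header = requests.Request('GET', URL, headers={'X-Api-Key': API_KEY}).prepare()

print(as_query.url)                    # https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY
print(as_header.url)                   # https://api.nasa.gov/planetary/apod
print(as_header.headers['X-Api-Key'])  # DEMO_KEY
```

Header-based keys keep the secret out of the URL, which often ends up in server logs and browser history. For a real key, read it from an environment variable rather than hardcoding it in the notebook.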


## 3. Parse JSON Response
- Retrieve data from an API in JSON format, extract a specific field (e.g., the title of the first post), and print its value. This exercise helps you practice working with JSON data by indexing into the Python lists and dictionaries that `response.json()` returns.


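```python
import requests

resp = requests.get('https://jsonplaceholder.typicode.com/posts', timeout=10)
resp.raise_for_status()  # Stop early if the request failed
data = resp.json()       # A list of post dictionaries
print(data[0]['title'])  # Index the list, then look up the key in the dict
```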
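The posts above are flat, but many APIs nest objects inside objects. A minimal sketch using a hand-written sample shaped like a JSONPlaceholder `/users` record (the values are illustrative, not fetched): chained indexing works when every key is guaranteed to exist, while `.get()` with a default avoids a `KeyError` when a field may be missing.

```python
import json

# Illustrative sample shaped like a JSONPlaceholder /users record (values are made up)
raw = '''
[
  {"id": 1, "name": "Ada",
   "address": {"city": "London", "geo": {"lat": "51.5", "lng": "-0.12"}}},
  {"id": 2, "name": "Alan"}
]
'''
users = json.loads(raw)

# Direct indexing: fine when the structure is guaranteed
print(users[0]['address']['geo']['lat'])  # 51.5

# Defensive access: .get() with {} as the default keeps the chain from raising KeyError
cities = [u.get('address', {}).get('city', 'unknown') for u in users]
print(cities)  # ['London', 'unknown']
```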

## 4. Web Scraping
- Use BeautifulSoup to extract the title from a provided HTML string.
- Practice parsing and navigating HTML documents programmatically.
- This exercise will help you become familiar with web scraping fundamentals.


```python
from bs4 import BeautifulSoup

html = '<html><title>Test</title></html>'

# Parse the HTML string with Python's built-in parser
soup = BeautifulSoup(html, 'html.parser')

# soup.title is the first <title> tag; .get_text() returns its text content
title = soup.title.get_text()
print("Page title:", title)
```
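Real pages need more than `soup.title`. A minimal sketch on a small, made-up HTML document showing the three navigation styles BeautifulSoup offers: attribute access by tag name, `find()`/`find_all()` with filters, and CSS selectors via `select()`:

```python
from bs4 import BeautifulSoup

# A small hand-written page; the markup is illustrative
html = '''
<html>
  <head><title>Test</title></head>
  <body>
    <h1 class="headline">First story</h1>
    <a href="/a">Link A</a>
    <a href="/b" class="external">Link B</a>
  </body>
</html>
'''
soup = BeautifulSoup(html, 'html.parser')

# Tag access by name returns the first matching tag
print(soup.title.get_text())  # Test

# find() / find_all() filter by tag name and attributes; class_ avoids Python's reserved word
print(soup.find('h1', class_='headline').get_text(strip=True))  # First story
links = [a['href'] for a in soup.find_all('a')]
print(links)  # ['/a', '/b']

# select() accepts CSS selectors
print([a.get_text() for a in soup.select('a.external')])  # ['Link B']
```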

---

### Challenge
- Practice ethical web scraping: choose a popular news website and check its `robots.txt` file to confirm that scraping headlines is allowed. Then write Python code that uses `requests` and BeautifulSoup to extract all news headlines from the homepage, and print the list of headlines you collect.
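Reading `robots.txt` by eye works, but Python's standard library can apply the rules for you. A minimal sketch using `urllib.robotparser.RobotFileParser` on an inline, made-up `robots.txt` (not any real site's file), so the allow/deny logic can be seen offline; the agent name is hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration only
robots_txt = """
User-agent: *
Crawl-delay: 10
Disallow: /login
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

agent = 'edu-headline-scraper'  # hypothetical User-Agent name
print(rp.can_fetch(agent, 'https://example.com/'))         # True
print(rp.can_fetch(agent, 'https://example.com/login'))    # False
print(rp.can_fetch(agent, 'https://example.com/admin/x'))  # False
print(rp.crawl_delay(agent))                                # 10 (seconds between requests)
```

Rules are prefix matches on the URL path, so `Disallow: /admin/` blocks everything under that directory. A polite scraper also waits at least the `Crawl-delay` between requests when one is given.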


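```python
import requests
from urllib.robotparser import RobotFileParser
from bs4 import BeautifulSoup

# Identify the scraper honestly; this User-Agent string is illustrative
USER_AGENT = 'edu-headline-scraper/0.1 (educational exercise)'
HEADERS = {'User-Agent': USER_AGENT}

# Step 1: Fetch and parse robots.txt (Hacker News is used as the example site)
robots_url = 'https://news.ycombinator.com/robots.txt'
robots_resp = requests.get(robots_url, headers=HEADERS, timeout=10)
print("robots.txt contents:\n", robots_resp.text)

rp = RobotFileParser()
rp.parse(robots_resp.text.splitlines())

# Step 2: Scrape headlines only if robots.txt permits fetching the homepage
url = 'https://news.ycombinator.com/'
if rp.can_fetch(USER_AGENT, url):
    resp = requests.get(url, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, 'html.parser')

    # Current markup: the headline is the <a> directly inside <span class="titleline">
    # (the span's full text would also include the site domain)
    headlines = [a.get_text(strip=True) for a in soup.select('span.titleline > a')]
    # Fallback for older markup, which used <a class="storylink">
    if not headlines:
        headlines = [a.get_text(strip=True) for a in soup.find_all('a', class_='storylink')]

    print(headlines)
else:
    print("robots.txt disallows fetching", url)
```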
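Because live pages change, it helps to test extraction logic against a fixed snippet. A minimal sketch: the HTML below is a hand-written approximation of one Hacker News story row (the real markup may differ or change). It shows why taking the text of the whole `titleline` span also pulls in the site's domain, while selecting only the `<a>` tags directly inside that span returns just the headline.

```python
from bs4 import BeautifulSoup

# Hand-written approximation of a Hacker News story row; real markup may differ
html = '''
<tr class="athing"><td class="title">
  <span class="titleline">
    <a href="https://example.com/post">Example headline</a>
    <span class="sitebit comhead"> (<a href="from?site=example.com"><span class="sitestr">example.com</span></a>)</span>
  </span>
</td></tr>
'''
soup = BeautifulSoup(html, 'html.parser')

# The whole span's text includes the domain suffix as well as the headline
whole = soup.find('span', class_='titleline').get_text(' ', strip=True)
print(whole)

# The child combinator '>' matches only <a> tags directly inside span.titleline,
# so the domain link nested in span.sitebit is skipped
headlines = [a.get_text(strip=True) for a in soup.select('span.titleline > a')]
print(headlines)  # ['Example headline']
```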