# API and Web Scraping Tutorial

## Introduction to APIs

An API (Application Programming Interface) defines the methods and rules that let software applications communicate with each other. For web scraping, a web API lets you interact with a server programmatically and extract data efficiently.

In simple terms, APIs let us retrieve data from web services without needing to manually scrape HTML pages.

### Why use APIs for web scraping?
- **Efficiency:** APIs provide data in a structured format (e.g., JSON, XML), making it easier to extract relevant information.
- **Legality:** Some websites offer official APIs. Using them in line with their terms of service keeps data collection within what the provider permits.
- **Rate Limiting and Authentication:** APIs usually state their rate limits and authentication requirements, so you can stay within them and reduce the chance of being blocked for excessive requests.
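
To make the efficiency point concrete, here is a minimal sketch of reading a structured response. The JSON payload and its field names (`products`, `name`, `price`) are made up for illustration and do not come from a real API.

```python
import json

# Hypothetical JSON body, as an API might return it (field names are assumptions)
payload = '{"products": [{"name": "Widget", "price": 9.99}, {"name": "Gadget", "price": 24.5}]}'

# Parsing yields ordinary Python dicts and lists; no HTML selectors are needed
data = json.loads(payload)
names = [product["name"] for product in data["products"]]
total = sum(product["price"] for product in data["products"])

print(names)
print(round(total, 2))
```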

## Using APIs for Web Scraping

In this section, we’ll explore how to interact with APIs to retrieve and process data.

### 1. Making GET Requests to an API
APIs generally provide data through HTTP requests. The most common method used for retrieving data is the **GET** request.

Let's start by making a simple GET request to an API that returns data in JSON format.

```python
import requests

# URL of the API
api_url = 'https://api.coindesk.com/v1/bpi/currentprice.json'

# Send a GET request
response = requests.get(api_url)

# Check if the request was successful
if response.status_code == 200:
    data = response.json()  # Convert JSON response into a dictionary
    print(data)
else:
    print('Error:', response.status_code)
```

In this code:
- `requests.get()` is used to send a GET request to the API.
- `response.json()` parses the JSON response body into Python objects (here, a dictionary).
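
The status check can also be delegated to `raise_for_status()`, which raises an exception for 4xx and 5xx responses, and a `timeout` keeps the call from hanging indefinitely. The sketch below is one possible arrangement, not the only one. The helper name `fetch_json` and the 10-second default are assumptions, and the function accepts any object with a requests-style `get()` method, such as the `requests` module itself or a `requests.Session`.

```python
def fetch_json(client, url, timeout=10):
    """Fetch `url` and return the parsed JSON body.

    `client` is anything exposing a requests-style get(), for example the
    `requests` module or a `requests.Session` (hypothetical helper).
    """
    # A timeout prevents the call from waiting forever on an unresponsive server
    response = client.get(url, timeout=timeout)
    # Raises an exception for 4xx/5xx status codes instead of continuing silently
    response.raise_for_status()
    return response.json()
```

With `requests` installed, `fetch_json(requests, api_url)` would send the same GET request as the example above.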

### 2. Using API Data for Web Scraping

Web scraping with APIs involves fetching structured data, which is much easier to process than raw HTML. We will use an API to retrieve information from a public web service and pull out the fields we need.

Let’s consider an API that provides news headlines. We will make a GET request to fetch the latest headlines and display them.

```python
import requests

# API endpoint for fetching news
api_url = 'https://newsapi.org/v2/top-headlines?country=us&apiKey=your_api_key'

# Send GET request
response = requests.get(api_url)

# Check if request is successful
if response.status_code == 200:
    news_data = response.json()
    articles = news_data['articles']
    for article in articles:
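        # Double quotes outside, single quotes inside: reusing the same quote
        # type inside an f-string is a syntax error before Python 3.12
        print(f"Title: {article['title']}")
        print(f"Description: {article['description']}")
        print(f"URL: {article['url']}")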
        print('-' * 80)
else:
    print('Error:', response.status_code)
```

- **API Key:** Most APIs require an API key for authentication. This key is usually provided when you register with the API service.
- **JSON Parsing:** The response from the API is parsed as JSON and we extract relevant information such as titles, descriptions, and URLs.
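
Real responses do not always contain every field, so indexing with `article['description']` can raise a `KeyError` or print `None`. A minimal sketch of more defensive extraction follows. The helper name `format_articles` and the sample data are illustrative assumptions, and the field names mirror the ones used above.

```python
def format_articles(news_data):
    """Return one summary line per article, tolerating missing or null fields."""
    lines = []
    for article in news_data.get("articles", []):
        # `or` also covers fields that are present but set to null (None)
        title = article.get("title") or "(no title)"
        description = article.get("description") or "(no description)"
        url = article.get("url") or "(no URL)"
        lines.append(f"{title} | {description} | {url}")
    return lines


# Made-up sample shaped like the response above
sample = {
    "articles": [
        {"title": "Example headline", "description": None, "url": "https://example.com/a"},
        {"title": "Second headline", "description": "Details"},
    ]
}
for line in format_articles(sample):
    print(line)
```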

### 3. Handling Authentication

Many APIs require authentication to ensure that only authorized users can access data. APIs often use API keys or OAuth tokens for authentication.

Here’s an example of how to include an API key in the request header for authentication.

```python
import requests

# Define API endpoint and your API key
api_url = 'https://api.example.com/data'
api_key = 'your_api_key'

# Define headers with the API key
headers = {
    'Authorization': f'Bearer {api_key}'
}

# Send GET request with authentication
response = requests.get(api_url, headers=headers)

if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print('Error:', response.status_code)
```

In this example:
- The API key is sent in the `Authorization` request header as a bearer token. The exact header name and scheme vary between APIs, so check the provider's documentation.
- The API responds with the requested data if the key is valid.
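
Hard-coding a key in source code makes it easy to leak through version control. One common alternative is to read it from an environment variable. This is a minimal sketch: the variable name `EXAMPLE_API_KEY` and the helper `build_auth_headers` are assumptions, and the bearer scheme is kept only because the example above uses it.

```python
import os


def build_auth_headers(env_var="EXAMPLE_API_KEY"):
    """Build an Authorization header from an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message rather than sending an empty token
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be passed to `requests.get()` as `headers=build_auth_headers()`.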

### 4. Handling Pagination in API Responses

Many APIs split large result sets across multiple pages. In such cases, you need to handle pagination to retrieve all available data.

Here’s an example of how to loop through multiple pages in an API that paginates its results.

```python
import requests

api_url = 'https://api.example.com/products'
page = 1

while True:
    response = requests.get(api_url, params={'page': page})

    if response.status_code == 200:
        data = response.json()
        if not data['results']:
            break  # No more data to fetch

        # Process the current page's data
        for product in data['results']:
            print(product['name'])

        page += 1  # Move to the next page
    else:
        print('Error:', response.status_code)
        break
```

In this example:
- We keep sending requests with an updated page number (`params={'page': page}`).
- When no results are returned, we stop the loop.
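
The same loop can be wrapped in a generator so the paging logic stays separate from what you do with each item. The sketch below assumes the same hypothetical response shape (a `results` list that is empty past the last page). The names `iter_results` and `fetch_page` and the `max_pages` safety cap are illustrative, not part of any particular API.

```python
def iter_results(fetch_page, max_pages=100):
    """Yield items page by page until a page comes back empty.

    `fetch_page(page)` must return the parsed JSON dict for that page.
    `max_pages` guards against looping forever if the API never returns
    an empty page.
    """
    for page in range(1, max_pages + 1):
        data = fetch_page(page)
        results = data.get("results", [])
        if not results:
            return  # No more data to fetch
        yield from results


# Stand-in for a real API call: two pages of data, then empty pages
fake_pages = {
    1: {"results": [{"name": "Widget"}, {"name": "Gadget"}]},
    2: {"results": [{"name": "Gizmo"}]},
}
names = [
    item["name"]
    for item in iter_results(lambda page: fake_pages.get(page, {"results": []}))
]
print(names)
```

With `requests`, `fetch_page` could be a small function that calls `requests.get(api_url, params={'page': page})` and returns `response.json()`.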

## Using APIs for Dataset Creation

APIs can also supply the raw data for datasets used in data science tasks such as machine learning, data analysis, and visualization.

Let’s take the example of creating a dataset from an API that returns election data.

```python
import requests
import csv

# API endpoint for election results
api_url = 'https://api.example.com/election_results'

# Send GET request
response = requests.get(api_url)

if response.status_code == 200:
    election_data = response.json()
    
    # Create a CSV file to store the data (explicit UTF-8 so non-ASCII names
    # are written the same way on every platform)
    with open('election_results.csv', mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['Candidate', 'Votes', 'Party'])
        
        for result in election_data['results']:
            writer.writerow([result['candidate'], result['votes'], result['party']])
    
    print('Dataset created successfully!')
else:
    print('Error:', response.status_code)
```

Here we use the API to get election results and then store that data in a CSV file, which can later be used for analysis.
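
When each result is already a dictionary, `csv.DictWriter` can map keys to columns directly. The sketch below writes to an in-memory buffer so it runs without a network call or a file on disk. The sample rows are invented, and the keys (`candidate`, `votes`, `party`) follow the hypothetical API above.

```python
import csv
import io

# Invented rows shaped like election_data['results'] above
results = [
    {"candidate": "Alice Example", "votes": 1200, "party": "A", "precinct": 7},
    {"candidate": "Bob Example", "votes": 950, "party": "B", "precinct": 7},
]

buffer = io.StringIO()
# extrasaction='ignore' skips keys that are not listed in fieldnames
writer = csv.DictWriter(
    buffer, fieldnames=["candidate", "votes", "party"], extrasaction="ignore"
)
writer.writeheader()
writer.writerows(results)

# Read the data back to confirm the round trip; values come back as strings
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0])
```

Because CSV stores everything as text, numeric columns such as `votes` need to be converted back (for example with `int()`) before analysis.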