# Data Acquisition with Python  
_Data acquisition_ is the process of gathering data from external sources. Two common techniques are **web scraping** and **fetching data from APIs**, and Python has well-established libraries for both.

---

## 1. Web Scraping | [Link](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2003%20Pandas%20DataFrame%20using%20Web%20Scraping)

### What Is Web Scraping?
Web scraping is the automated process of extracting information from websites. It involves downloading the HTML content of a page and parsing it to extract the desired data. Common use cases include:
- Collecting product details from e-commerce sites.
- Extracting news headlines.
- Gathering research data.

### Common Python Libraries
- **`requests`**: For making HTTP requests to download web pages.
- **`BeautifulSoup`** (from **`bs4`**): For parsing HTML and XML documents.
- **`Scrapy`**: A powerful and scalable framework for large crawling projects.
- **`Selenium`**: For scraping dynamic content rendered by JavaScript (via browser automation).

### Example: Scraping a Web Page Using Requests and BeautifulSoup

```python
import requests
from bs4 import BeautifulSoup

# Define the target URL and headers (to mimic a browser)
url = "https://example.com"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}

# Send an HTTP GET request; the timeout stops the call from hanging forever
response = requests.get(url, headers=headers, timeout=10)
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract the page title as an example (a page may have no <title> tag)
    title_tag = soup.find("title")
    title = title_tag.get_text(strip=True) if title_tag else None
    print("Page Title:", title)
else:
    print("Failed to retrieve the page. Status code:", response.status_code)
```

**Key Points:**
- Send a descriptive **User-Agent** header; some sites reject requests that use the default one from `requests`.
- Check that the request succeeded (HTTP status code 200) before parsing the response.
- Use BeautifulSoup to navigate the HTML DOM and extract information.
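
To make the last point more concrete, here is a minimal sketch of navigating parsed HTML. The HTML snippet, the `product` and `price` class names, and the field names in the resulting dictionaries are all made up for illustration; a real page needs selectors that match its own markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (response.text).
html = """
<html>
  <head><title>Sample Shop</title></head>
  <body>
    <ul id="products">
      <li class="product"><a href="/item/1">Widget</a> <span class="price">9.99</span></li>
      <li class="product"><a href="/item/2">Gadget</a> <span class="price">19.50</span></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

products = []
# select() takes a CSS selector and returns every matching tag.
for item in soup.select("li.product"):
    link = item.find("a")                      # first <a> inside this <li>
    price = item.find("span", class_="price")  # first <span class="price">
    products.append({
        "name": link.get_text(strip=True),
        "url": link.get("href"),
        "price": float(price.get_text(strip=True)),
    })

print(products)
```

Collecting each item as a dictionary keeps the scraped fields together, which makes it straightforward to load the list into a table later.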

---

## 2. Fetching Data from APIs | [Link](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2000%20API%20To%20DataFrame)

### What Is an API?
An **API (Application Programming Interface)** allows you to interact with an external service in a structured way. Instead of scraping HTML, you make HTTP requests (GET, POST, etc.) and receive data (often in JSON format).

### Why Use APIs?
- APIs provide structured data that is easier to parse.
- They often have clear documentation on how to query and retrieve data.
- They are usually more stable than scraping, which can break whenever a website changes its layout.

### Common Python Libraries
- **`requests`**: For sending HTTP requests.
- **`json`**: For parsing JSON responses (built into Python).
- Additional libraries (e.g., **`pandas`**) can help transform API data into dataframes for analysis.
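
As a rough illustration of the last two items, the sketch below parses a JSON string with the built-in `json` module and flattens the nested records into one flat dictionary per row. The payload and its field names (`results`, `address`, `city`) are invented for this example; a real API defines its own structure.

```python
import json

# Hypothetical API payload; real field names depend on the API you call.
raw = """
{
  "results": [
    {"id": 1, "name": "Alice", "address": {"city": "Paris"}},
    {"id": 2, "name": "Bob", "address": {"city": "Lima"}}
  ]
}
"""

payload = json.loads(raw)

# Flatten each nested record into one flat dict per row.
rows = [
    {"id": r["id"], "name": r["name"], "city": r["address"]["city"]}
    for r in payload["results"]
]

print(rows)
```

If pandas is installed, a list of flat dictionaries like `rows` can be passed directly to `pandas.DataFrame(rows)` to get a dataframe with one column per key.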

### Example: Fetching Data from an API

```python
import requests

# Define the API endpoint and parameters
api_url = "https://api.example.com/data"
params = {
    "q": "search_term",
    "api_key": "YOUR_API_KEY"  # Replace with your actual API key if needed
}

# Make a GET request to the API; the timeout avoids waiting indefinitely
response = requests.get(api_url, params=params, timeout=10)
if response.status_code == 200:
    # Parse JSON data
    data = response.json()
    print("API Data:", data)
else:
    print("Error fetching data. Status code:", response.status_code)
```

**Key Points:**
- Most APIs return JSON; `response.json()` parses the body into Python dictionaries and lists, and raises an error if the body is not valid JSON.
- Always refer to the API’s documentation for required parameters and authentication.
- Handle errors by checking the response status code, or call `response.raise_for_status()` to raise an exception on 4xx and 5xx responses.
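
For the authentication point, one possible way to avoid hardcoding a key in the script is to read it from an environment variable. This is only a sketch: the helper `build_params` and the variable name `EXAMPLE_API_KEY` are hypothetical, and the parameter names `q` and `api_key` simply mirror the placeholder example above.

```python
import os


def build_params(query, env_var="EXAMPLE_API_KEY"):
    """Build query parameters, reading the API key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {"q": query, "api_key": api_key}
```

With the variable set in your shell, `params = build_params("search_term")` produces a dictionary that can be passed to `requests.get` through its `params` argument.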

---

## Best Practices & Considerations

- **Respect the Website's Terms:** Always check the website’s terms of service before scraping. Some sites prohibit automated scraping.
- **Rate Limiting:** Introduce delays between requests (using Python’s `time.sleep()`) to avoid overloading servers or getting blocked.
- **API Keys:** When working with APIs that require authentication, keep your keys out of your source code (e.g., in environment variables or a `.env` file that is excluded from version control).
- **Dynamic Content:** For pages rendered by JavaScript, consider using tools like Selenium or Playwright.
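
The rate-limiting advice can be sketched as a small loop that pauses between consecutive requests. The helper `fetch_all` and the stand-in `fake_fetch` are hypothetical names used so the example runs offline; in practice `fetch` could wrap a call such as `requests.get(url, timeout=10)`, and a suitable delay depends on the site or API you are calling.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # pause before every request except the first
        results.append(fetch(url))
    return results


# Stand-in fetch function so the sketch runs without network access.
def fake_fetch(url):
    return f"content of {url}"


pages = fetch_all(
    ["https://example.com/page/1", "https://example.com/page/2"],
    fake_fetch,
    delay_seconds=0.1,
)
print(pages)
```

Passing the fetch function as an argument keeps the delay logic separate from the HTTP code, so the same loop works for both scraping and API calls.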