# 10. The `requests` Library: Retrieving Data from Across the Web

A key part of modern exploration and data analysis is gathering information from remote sources across the internet. The `requests` library is the standard, user-friendly tool in Python for sending out "probes" (HTTP requests) to web servers and handling the "transmissions" (HTTP responses) they send back.

- Sending `requests` and receiving `responses`.
- Inspecting response properties like `status codes`, `headers`, and `content`.
- Installing and importing an external library into a `virtual environment`.

## 10.1. Sending Requests and Handling Responses
- The `requests` library is not built-in, so it must be installed first.
- It is best practice to install it in an activated `virtual environment`.
- In your terminal, run: `pip install requests`
- Documentation: https://requests.readthedocs.io/en/latest/
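
After installing, it can help to confirm that the interpreter you are running really sees the package. The following is a minimal sketch using only the standard library; the helper names `in_virtual_env` and `is_installed` are made up for this illustration, and the virtual-environment check assumes the environment was created with the standard `venv` module.

```python
import importlib.util
import sys


def in_virtual_env() -> bool:
    """Return True when Python is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def is_installed(package_name: str) -> bool:
    """Return True when `package_name` can be imported from this environment."""
    return importlib.util.find_spec(package_name) is not None


print(f"Virtual environment active: {in_virtual_env()}")
print(f"requests installed: {is_installed('requests')}")
```

If the second line reports `False`, the package was probably installed into a different environment than the one running the script.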

In [None]:
import requests
# Once we have 'requests' installed, we can import it

# The URL of the remote data source or API endpoint we want to query
target_url = "https://www.google.com/"
# A 'request' is our query to the server. The server sends back a 'response' object.
response = requests.get(url=target_url)

print(response) # Prints a summary of the response -> <Response [200]>

# You can inspect various attributes (url, status_code) of the response object:
print(f"Final URL (after any redirects): {response.url}")
print(f"HTTP Status Code: {response.status_code}") 

# HTTP Status Codes are like signal codes from the remote server:
"""
1xx (Informational): Request received, continuing process.
2xx (Success): The action was successfully received, understood, and accepted (e.g., 200 OK).
3xx (Redirection): Further action must be taken to complete the request (e.g., 301 Moved Permanently).

4xx (Client Error): The request contains bad syntax or cannot be fulfilled; the problem is on the client's side.
 - 400: Bad Request
 - 401: Unauthorized (authentication required)
 - 403: Forbidden (you don't have permission)
 - 404: Not Found (the resource doesn't exist)

5xx (Server Error): The server failed to fulfill an apparently valid request.
 - 500: Internal Server Error
 - 503: Service Unavailable
"""

# Retrieving technical metadata about the response (headers, cookies, etc.)
print(f"Response Cookies: {response.cookies}") # Shows cookies sent by the server for state management
print(f"Response Headers: {response.headers}") # Shows technical metadata about the response


"""
COOKIES:
Small pieces of data that the server asks the client to store and send back on later requests.
They carry information about the user's session, preferences, or authentication status.
Stored as key-value pairs.

HEADERS:
Additional metadata about the response, such as the content type and size.
Sent as key-value pairs.
"""

# Explaining a few key response headers:
"""
Date                date and time the response was generated
Expires             time after which the response is considered stale
Cache-Control       directives that control how the response may be cached

Content-Type        states the MIME type of the content (e.g., text/html, application/json)
Content-Length      states the size of the content in bytes
Content-Encoding    specifies the encoding applied to the content (e.g., gzip, deflate)

Server              identifies the server software used to handle the request
Location            used in redirection responses to indicate the new location of the resource
Connection          indicates whether the connection should be closed after the response is delivered
"""

# To get the actual content (the payload of the transmission):

# For text-based content (like an HTML page), use the `.text` attribute:
page_html = response.text
print(page_html) # Prints the entire HTML source code of the page (this can be very long)

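# For content that is specifically in JSON format, use the `.json()` method.
# It decodes the JSON into a Python dictionary or list. This is extremely common for APIs.
# The Google homepage returns HTML, not JSON, so `.json()` raises an error here; catch it:
try:
    page_json = response.json()
    print(page_json)
except ValueError:  # the JSON decoding errors raised by `requests` subclass ValueError
    print("The response body is not valid JSON.")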
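
The status-code classes and the choice between `.text` and `.json()` described above can be wrapped in two small helpers. This is only a sketch: `status_category` and `decode_body` are hypothetical names invented here, not part of `requests`, and `decode_body` assumes it receives an object with `.headers`, `.text`, and `.json()`, which a `requests` response provides.

```python
from typing import Any


def status_category(status_code: int) -> str:
    """Map an HTTP status code to the name of its class (1xx to 5xx)."""
    categories = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    # Integer division by 100 keeps only the leading digit of the code.
    return categories.get(status_code // 100, "Unknown")


def decode_body(response: Any) -> Any:
    """Return parsed JSON when Content-Type announces JSON, else the raw text."""
    content_type = response.headers.get("Content-Type", "")
    if "json" in content_type.lower():
        try:
            return response.json()
        except ValueError:
            # The header promised JSON but the body could not be parsed.
            return response.text
    return response.text


# Usage with the `response` object from the cell above:
# print(status_category(response.status_code))
# print(decode_body(response))
```

Checking the `Content-Type` header first avoids calling `.json()` on an HTML page, which is the situation that raises an error in the example above.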

## Practice

**Scenario:** You are a data operative tasked with gathering intelligence from various online sources (APIs and web pages).

**1. Basic Reconnaissance:**
- Choose a simple webpage or API endpoint. For example, use `http://api.open-notify.org/iss-now.json` for structured data, or `https://example.com` for simple HTML.
- **Before coding**, it's good practice to inspect the URL in your browser, using DevTools (`F12`) to see what kind of data to expect.
- Using the `requests` library in a Python script, get the response from your chosen URL.
- Print the following information to your console:
    - The final URL and the status code of the response.
    - The response headers and any cookies.
    - The content of the page (either HTML or JSON, depending on the `Content-Type` header).

---

**2. Challenge I: Archiving Retrieved Data**
- Building on the previous exercise, save the content you retrieved (the HTML text or JSON data) to a local file named `recon_data.txt`.

---

**3. Challenge II: Creating a Reusable Data Retrieval Tool**
- Create two functions to make your data gathering process modular:
    - **a) `get_content(url: str) -> str:`**
        - This function should take a URL string as a parameter and send a GET request to it.
        - If the request succeeds, it should `return` the response content as a string.
    - **b) `save_content(data: str, file_path: str) -> None:`**
        - This function takes a string of data and a file path as parameters.
        - It then writes the provided data into the specified file.
- In the main part of your script, combine these two functions: call `get_content()` to retrieve the data, then pass the result to `save_content()` to save it. Verify that the file was created and contains the correct data.
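
One possible shape for the two functions is sketched below; try the challenge yourself first. The 10-second timeout, the use of `raise_for_status()` to signal failure, and the `main()` wrapper are assumptions of this sketch rather than requirements of the exercise.

```python
from pathlib import Path


def get_content(url: str) -> str:
    """Send a GET request to `url` and return the response body as text."""
    # Imported here so that save_content() stays usable without `requests`.
    import requests

    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx codes
    return response.text


def save_content(data: str, file_path: str) -> None:
    """Write `data` to `file_path`, replacing any existing file."""
    Path(file_path).write_text(data, encoding="utf-8")


def main() -> None:
    """Fetch a page and archive it locally (needs network access)."""
    content = get_content("https://example.com")
    save_content(content, "recon_data.txt")
    print(f"Saved {len(content)} characters to recon_data.txt")
```

Calling `main()` performs a real network request, so it is left for you to run. Keeping retrieval and saving in separate functions means each one can be reused or tested on its own.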

---
#### © Jiří Svoboda (George Freedom)
- Web: https://GeorgeFreedom.com
- LinkedIn: https://www.linkedin.com/in/georgefreedom/
- Book me: https://cal.com/georgefreedom