# Other Data Types


## HTML
HTML is usually used for web scraping.

If a webpage (HTML) has a table inside, we can easily extract it with Pandas and requests.


In [1]:
import pandas as pd
import requests
import io 

We will work with the following URL: https://www.worldcoinindex.com/

In [2]:
url = 'https://www.worldcoinindex.com/'
crypto_url = requests.get(url)
crypto_url

<Response [200]>

Response Codes
- 200
    - Successful

To get the main body of the page's HTML, we read the response's text attribute.

In [16]:
body = crypto_url.text
body = str(body)

body now holds the full HTML source of the webpage. If that source contains a table, marked by the HTML tags `<table></table>` (used for defining a table in HTML), Pandas can extract it with read_html(), which returns a list of DataFrames, one per table found.

In [17]:
body = io.StringIO(body)
crypto_data = pd.read_html(body)
print(type(crypto_data))
print(len(crypto_data))

<class 'list'>
1


The output above shows a list with one element, which is our table. We therefore select the first element:

In [18]:
crypto_data = crypto_data[0]
crypto_data.head()

| Unnamed: 0 | # | Unnamed: 1 | Name | Ticker | Last price | % | 24 high | 24 low | Price Charts 7d | 24 volume | # Coins | Market cap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | | Ethereum | ETH | $ 3,595.03 | +3.32% | $ 3,627.43 | $ 3,464.05 | | $ 19.36B | 122.37M | $ 439.93B |
| 1 | 2 | | Bitcoin | BTC | $ 120,213 | +0.77% | $ 120,836 | $ 119,141 | | $ 14.47B | 19.89M | $ 2.39T |
| 2 | 3 | | Ripple | XRP | $ 3.60 | +3.27% | $ 3.66 | $ 3.47 | | $ 13.83B | 59.13B | $ 212.81B |
| 3 | 4 | | Solana | SOL | $ 181.64 | +3.21% | $ 181.99 | $ 175.47 | | $ 3.40B | 424.35M | $ 77.07B |
| 4 | 5 | | Usd coin | USDC | $ 1.00 | +0.01% | $ 1.00 | $ 0.999498 | | $ 3.25B | 7.00B | $ 7.00B |


What if there is no table in HTML?

If we want to extract information from HTML, which doesn't have a table, we need to use a different approach: **Scraping**. 

Fortunately, Python has a great package for this called **Beautiful Soup**.

For a simple scraping tutorial, follow the instructions in this resource from DataQuest.

https://web.compass.lighthouselabs.ca/p/ds-5/5a7f7ecb-e6d8-45ba-9dac-8fa75839b6d9
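When a page has no table, the values have to be pulled out of other tags. Beautiful Soup is the usual tool for that; the sketch below uses only the standard library's html.parser as a stand-in so that it runs without extra installs. The sample HTML, the class names, and the `CoinNameParser` class are all invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical HTML without a <table>; structure and class names are invented.
SAMPLE_HTML = """
<div class="coin"><span class="name">Bitcoin</span></div>
<div class="coin"><span class="name">Ethereum</span></div>
"""


class CoinNameParser(HTMLParser):
    """Collect the text inside every <span class="name"> element."""

    def __init__(self):
        super().__init__()
        self.in_name = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs
        if tag == "span" and ("class", "name") in attrs:
            self.in_name = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_name = False

    def handle_data(self, data):
        # Keep only text that appears inside a matching <span>
        if self.in_name and data.strip():
            self.names.append(data.strip())


parser = CoinNameParser()
parser.feed(SAMPLE_HTML)
print(parser.names)
```

With Beautiful Soup, the same selection would typically be written as `soup.find_all("span", class_="name")`, which is shorter and more tolerant of messy markup.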

## APIs

API is short for Application Programming Interface. APIs allow 2 computers to communicate with each other and exchange information. 


First, register for the Foursquare Places API and obtain a Client ID and Client Secret.

Foursquare Places API enables location discovery, venue search, and more directly from our program or application.

In [2]:
import requests
import os

API Key for test project

V53VJJ3ML1J5TAHYSZN2HKNZRITV1TPCT4D20KEPHJIIJZCV


2nd API key code attempt

EIXNH3TQFGGIDT5QYWZWDJR5GGHWNHEYLDDRIDMQEUKNROAK



curl --request GET \
  --url "https://places-api.foursquare.com/places/search?ll=45.6387,-122.6615&radius=100" \
  --header "Accept: application/json" \
  --header "Authorization: Bearer EIXNH3TQFGGIDT5QYWZWDJR5GGHWNHEYLDDRIDMQEUKNROAK" \
  --header "X-Places-API-Version: 2025-06-17"




curl --request GET \
  --url "https://places-api.foursquare.com/places/search?ll=45.6387,-122.6615&radius=100" \
  --header "Accept: application/json" \
  --header "Authorization: $FOURSQUARE_API_KEY" \
  --header "X-Places-API-Version: 2025-06-17"





**Export the API_KEY as an environment variable for temporary sessions**

- export FOURSQUARE_API_KEY="Bearer EIXNH3TQFGGIDT5QYWZWDJR5GGHWNHEYLDDRIDMQEUKNROAK"

**Encode the API_KEY as a permanent variable in the data_environment**

- conda env config vars set FOURSQUARE_API_KEY="Bearer EIXNH3TQFGGIDT5QYWZWDJR5GGHWNHEYLDDRIDMQEUKNROAK"

**Reactivate your data_environment for the encoded changes to take effect**

- conda deactivate
- conda activate data_env312

**Confirm the success of permanently storing the API_KEY in your chosen data_environment**

- echo "$FOURSQUARE_API_KEY"

- If this prints the key, $FOURSQUARE_API_KEY was stored successfully.

**To remove or modify the encoded API_KEY**

  - **unset**
    - conda env config vars unset FOURSQUARE_API_KEY

  - **reset the key by reassigning with 'set'** 
    - conda env config vars set FOURSQUARE_API_KEY="Bearer EIXNH3TQFGGIDT5QYWZWDJR5GGHWNHEYLDDRIDMQEUKNROAK"



In [10]:
# Confirm that the saved API_KEY environment variable is visible inside Python 
import os

print("FOURSQUARE_API_KEY" in os.environ)  # Should return True
print(os.environ.get("FOURSQUARE_API_KEY"))  # Should print your API key

True
Bearer EIXNH3TQFGGIDT5QYWZWDJR5GGHWNHEYLDDRIDMQEUKNROAK


### Loading API Keys into Python

In [17]:
api_key = os.environ["FOURSQUARE_API_KEY"]
# os.environ is a dict-like mapping of environment variables; os.environ.keys() lists all variable names
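The two ways of reading an environment variable behave differently when the variable is missing. The sketch below illustrates this with made-up variable names (`DEMO_API_KEY`, `DEMO_MISSING_KEY`) so that it does not depend on the real key being set.

```python
import os

# Hypothetical variable names used only for this demonstration
os.environ["DEMO_API_KEY"] = "Bearer demo-token"
os.environ.pop("DEMO_MISSING_KEY", None)  # make sure this one is absent

# Indexing raises KeyError when the variable is missing,
# which fails fast if the key was never configured
api_key_demo = os.environ["DEMO_API_KEY"]

# .get() returns None (or a supplied default) instead of raising
missing = os.environ.get("DEMO_MISSING_KEY")
fallback = os.environ.get("DEMO_MISSING_KEY", "not-set")

print(api_key_demo)
print(missing)
print(fallback)
```

Indexing is a reasonable choice for a required key, since a missing variable surfaces immediately rather than as a confusing 401 later on.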

### Environment Files for API Keys

Although storing API keys via conda env config vars works, a more common and portable pattern for development in Python is to:

- Store your keys in a .env file
- Use python-dotenv to load them



In [12]:
from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv("FOURSQUARE_API_KEY")
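A .env file is a plain-text file of KEY=VALUE lines kept next to the project. To show what such a file looks like and what loading it amounts to, here is a simplified sketch that writes a throwaway file and parses it with a hand-written helper. `parse_env_file` and `DEMO_SERVICE_KEY` are hypothetical; python-dotenv itself handles more cases (quoting rules, `export` prefixes, variable interpolation) than this stand-in does.

```python
import os
import tempfile


def parse_env_file(path):
    """Simplified stand-in for a .env loader: reads KEY=VALUE lines only."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values


# Write a throwaway .env-style file containing a made-up key
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write('# demo file\nDEMO_SERVICE_KEY="Bearer demo-token"\n')
    loaded = parse_env_file(env_path)

print(loaded)
```

Because the .env file holds secrets, it is normally listed in .gitignore so that it never reaches version control.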

### Using API_KEYS + Variables

In [18]:
location = "Toronto,Canada"

url = "https://api.foursquare.com/v3/places/search?near=" + location

Prepare a dictionary that can be sent as the headers of our request.

The header dictionary below asks the API to send us data in JSON format if it is able to.
- Accept: application/json

In [19]:
# Create dictionary for headers
headers = {"Accept": "application/json"}
# Add key with our API KEY
headers['Authorization'] = api_key
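The curl commands earlier in this section target a different host (places-api.foursquare.com) and send an X-Places-API-Version header, while the `url` built above points at api.foursquare.com/v3. If the request that follows comes back with an error status such as 410 (Gone, meaning the resource is no longer offered at that URL), one plausible fix is to mirror the curl request instead. The sketch below only builds the URL and headers from the values in the curl command and sends nothing; `build_places_request` is a hypothetical helper and the key is a placeholder.

```python
from urllib.parse import urlencode


def build_places_request(api_key, ll, radius):
    """Return (url, headers) mirroring the curl command; hypothetical helper."""
    base_url = "https://places-api.foursquare.com/places/search"
    # safe="," keeps the comma in the "lat,long" pair unescaped
    query = urlencode({"ll": ll, "radius": radius}, safe=",")
    request_headers = {
        "Accept": "application/json",
        "Authorization": api_key,  # expected to already include "Bearer "
        "X-Places-API-Version": "2025-06-17",
    }
    return f"{base_url}?{query}", request_headers


places_url, places_headers = build_places_request(
    "Bearer <YOUR_API_KEY>", "45.6387,-122.6615", 100
)
print(places_url)
# To send it: requests.get(places_url, headers=places_headers)
```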

In [20]:
result = requests.get(url, headers=headers)

In [21]:
result

<Response [410]>

To make JSON data more manageable in Python after retrieving it from an API, you can use the following methods:

- Pretty Print JSON: Use the json module to pretty print the JSON data for better readability: `import json`, then `print(json.dumps(result.json(), indent=4))`.
- Convert to Pandas DataFrame: Convert the JSON data into a Pandas DataFrame for easier manipulation and analysis: `data = result.json()`, then `df = pd.json_normalize(data)` and `print(df.head())`.
- Extract Specific Fields: Extract only the relevant fields from the JSON data to focus on the information you need: `venues = result.json().get('results', [])`, then `names = [venue['name'] for venue in venues]` and `print(names)`.

These methods turn the raw JSON data into a more structured and readable format that is easier to work with.
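The pretty-print and field-extraction steps can be tried offline on a small dictionary. The payload below is made up and only assumes a top-level "results" list whose items have a "name" field; it is not the actual Foursquare response schema.

```python
import json

# Made-up payload shaped like a search response with a "results" list
payload = {
    "results": [
        {"name": "Example Cafe", "location": {"locality": "Toronto"}},
        {"name": "Example Museum", "location": {"locality": "Toronto"}},
    ]
}

# Pretty print for readability
print(json.dumps(payload, indent=4))

# Extract only the fields of interest; .get() avoids a KeyError
# when "results" is missing (for example, on an error response)
venues = payload.get("results", [])
names = [venue["name"] for venue in venues]
print(names)
```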

# Request Error Codes

- 401 : Unauthorized
    - To troubleshoot and resolve a 401 Unauthorized status code when accessing an API using the requests library, follow these steps:

        - Check API Key/Token: Ensure that the API key or token is correctly set and not expired. Verify that you are using the correct environment variable to load the key: `api_key = os.environ["<variable_name>"]`
        - Authorization Header: Confirm that the Authorization header is correctly included in your request: `headers = {"Accept": "application/json", "Authorization": api_key}`
        - API Endpoint: Verify that the URL you are using is correct and that it matches the API's documentation: `url = "https://api.foursquare.com/v3/places/search?near=Toronto,Canada"`
        - Permissions: Ensure that the API key has the necessary permissions to access the endpoint.
        - Re-generate API Key: If all else fails, try regenerating the API key or token from the service provider's dashboard and update your environment variable.
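The status codes seen in these notes (200, 401, 410) can be looked up with the standard library's http.HTTPStatus. The sketch below labels each code with its standard reason phrase and a broad category; `describe_status` is a hypothetical helper name.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a label such as '401 Unauthorized (client error)'."""
    phrase = HTTPStatus(code).phrase
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif code >= 500:
        category = "server error"
    else:
        category = "informational"
    return f"{code} {phrase} ({category})"


for status_code in (200, 401, 410):
    print(describe_status(status_code))
```

A 4xx code points at the request itself (credentials, URL, parameters), whereas a 5xx code points at the server.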

# The GET Request

## Accessing HTTP with APIs

HTTP methods, such as GET and POST, determine which action you’re trying to perform when making an HTTP request.

The GET method indicates that you’re trying to get or retrieve data from a specified resource. 
To make a GET request using Requests, you can invoke requests.get().

In [23]:
import requests
requests.get("https://api.github.com")


<Response [200]>

## The Response
A Response is a powerful object for inspecting the results of the request. 
Make that same request again, but this time capture the return value of get(), which is an instance of Response, in a variable called response so that you can take a closer look at its attributes and behaviors:

In [26]:
import requests
response = requests.get("https://api.github.com")

In [27]:
response

<Response [200]>

## Content
The response of a GET request often has some valuable information, known as a **payload**, in the message body. 

Using the attributes and methods of the **Response** object, you can view the payload in a variety of different formats.

To see the response’s content in bytes, you use .content:

In [32]:
response.content



b'{\n  "current_user_url": "https://api.github.com/user",\n  "current_user_authorizations_html_url": "https://github.com/settings/connections/applications{/client_id}",\n  "authorizations_url": "https://api.github.com/authorizations",\n  "code_search_url": "https://api.github.com/search/code?q={query}{&page,per_page,sort,order}",\n  "commit_search_url": "https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}",\n  "emails_url": "https://api.github.com/user/emails",\n  "emojis_url": "https://api.github.com/emojis",\n  "events_url": "https://api.github.com/events",\n  "feeds_url": "https://api.github.com/feeds",\n  "followers_url": "https://api.github.com/user/followers",\n  "following_url": "https://api.github.com/user/following{/target}",\n  "gists_url": "https://api.github.com/gists{/gist_id}",\n  "hub_url": "https://api.github.com/hub",\n  "issue_search_url": "https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}",\n  "issues_url": "https://api.

In [31]:
type(response.content)

bytes

In [34]:
response.text

'{\n  "current_user_url": "https://api.github.com/user",\n  "current_user_authorizations_html_url": "https://github.com/settings/connections/applications{/client_id}",\n  "authorizations_url": "https://api.github.com/authorizations",\n  "code_search_url": "https://api.github.com/search/code?q={query}{&page,per_page,sort,order}",\n  "commit_search_url": "https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}",\n  "emails_url": "https://api.github.com/user/emails",\n  "emojis_url": "https://api.github.com/emojis",\n  "events_url": "https://api.github.com/events",\n  "feeds_url": "https://api.github.com/feeds",\n  "followers_url": "https://api.github.com/user/followers",\n  "following_url": "https://api.github.com/user/following{/target}",\n  "gists_url": "https://api.github.com/gists{/gist_id}",\n  "hub_url": "https://api.github.com/hub",\n  "issue_search_url": "https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}",\n  "issues_url": "https://api.g

In [44]:
type(response.text)

str

Because the decoding of bytes to a str requires an encoding scheme, Requests will try to guess the encoding based on the response’s headers if you don’t specify one. You can provide an explicit encoding by setting .encoding before accessing .text:

In [41]:
response.encoding = "utf-8"  # Optional: Requests infers this.
response.text

'{\n  "current_user_url": "https://api.github.com/user",\n  "current_user_authorizations_html_url": "https://github.com/settings/connections/applications{/client_id}",\n  "authorizations_url": "https://api.github.com/authorizations",\n  "code_search_url": "https://api.github.com/search/code?q={query}{&page,per_page,sort,order}",\n  "commit_search_url": "https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}",\n  "emails_url": "https://api.github.com/user/emails",\n  "emojis_url": "https://api.github.com/emojis",\n  "events_url": "https://api.github.com/events",\n  "feeds_url": "https://api.github.com/feeds",\n  "followers_url": "https://api.github.com/user/followers",\n  "following_url": "https://api.github.com/user/following{/target}",\n  "gists_url": "https://api.github.com/gists{/gist_id}",\n  "hub_url": "https://api.github.com/hub",\n  "issue_search_url": "https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}",\n  "issues_url": "https://api.g

In [42]:
response.json()

{'current_user_url': 'https://api.github.com/user',
 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
 'authorizations_url': 'https://api.github.com/authorizations',
 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
 'emails_url': 'https://api.github.com/user/emails',
 'emojis_url': 'https://api.github.com/emojis',
 'events_url': 'https://api.github.com/events',
 'feeds_url': 'https://api.github.com/feeds',
 'followers_url': 'https://api.github.com/user/followers',
 'following_url': 'https://api.github.com/user/following{/target}',
 'gists_url': 'https://api.github.com/gists{/gist_id}',
 'hub_url': 'https://api.github.com/hub',
 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
 'issues_url': 'https://api.github.com/issues',
 'keys_url': '

In [43]:
type(response.json())


dict

The type of the return value of .json() is a dictionary, so you can access values in the object by key:

In [45]:
response_dict = response.json()
response_dict["emojis_url"]

'https://api.github.com/emojis'
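The three views of the payload are related by two steps: decoding bytes into a string, then parsing the string as JSON. The sketch below reproduces those steps offline on a shortened stand-in for the payload; it approximates what .text and .json() do rather than showing the exact Requests internals.

```python
import json

# A shortened stand-in for response.content (raw bytes from the server)
raw_bytes = b'{"emojis_url": "https://api.github.com/emojis"}'

# Roughly what .text does: decode the bytes with a character encoding
text = raw_bytes.decode("utf-8")

# Roughly what .json() does: parse the JSON text into Python objects
data = json.loads(text)

print(type(raw_bytes).__name__, type(text).__name__, type(data).__name__)
print(data["emojis_url"])
```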

## Headers
The response headers can give you useful information, such as the content type of the response payload and a time limit on how long to cache the response. To view these headers, access .headers:

In [46]:
response = requests.get("https://api.github.com")
response.headers

{'Date': 'Sun, 20 Jul 2025 20:12:52 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Cache-Control': 'public, max-age=60, s-maxage=60', 'Vary': 'Accept,Accept-Encoding, Accept, X-Requested-With', 'ETag': 'W/"4f825cc84e1c733059d46e76e6df9db557ae5254f9625dfe8e1b09499c449438"', 'X-GitHub-Media-Type': 'github.v3; format=json', 'x-github-api-version-selected': '2022-11-28', 'Access-Control-Expose-Headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Frame-Options': 'deny', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '0', 'Referrer-Policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', 'Content-Security

.headers returns a dictionary-like object, allowing you to access header values by key. For example, to see the content type of the response payload, you can access "Content-Type":



In [47]:
response.headers["Content-Type"]

'application/json; charset=utf-8'
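A Content-Type value like the one above carries two pieces of information: the media type and the character set. As a sketch, the standard library's email.message.Message can split them apart; the header string is copied from the output above, and using this class here is just one convenient way to do the parsing.

```python
from email.message import Message

# Header value copied from the output above
content_type_header = "application/json; charset=utf-8"

# email.message.Message knows how to split a Content-Type header
message = Message()
message["Content-Type"] = content_type_header

media_type = message.get_content_type()
charset = message.get_content_charset()

print(media_type)
print(charset)
```

Header names in Requests are case-insensitive, so `response.headers["content-type"]` returns the same value as `response.headers["Content-Type"]`.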

## Query String Parameters
One common way to customize a GET request is to pass values through query string parameters in the URL. To do this using get(), you pass data to params. For example, you can use GitHub’s repository search API to look for popular Python repositories:

In [48]:
import requests

# Search GitHub's repositories for popular Python projects
response = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "language:python", "sort": "stars", "order": "desc"},
)

# Inspect some attributes of the first three repositories
json_response = response.json()
popular_repositories = json_response["items"]
for repo in popular_repositories[:3]:
    print(f"Name: {repo['name']}")
    print(f"Description: {repo['description']}")
    print(f"Stars: {repo['stargazers_count']}")
    print()

Name: free-programming-books
Description: :books: Freely available programming books
Stars: 363342

Name: public-apis
Description: A collective list of free APIs
Stars: 357166

Name: system-design-primer
Description: Learn how to design large-scale systems. Prep for the system design interview.  Includes Anki flashcards.
Stars: 311863



By passing a dictionary to the params parameter of get(), you’re able to modify the results that come back from the search API.
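To see what the params dictionary turns into, the query string can be built by hand with the standard library. This sketch uses urllib.parse.urlencode on the same dictionary as the search above; Requests does its own encoding internally, so treat this as an illustration of the idea rather than its exact implementation.

```python
from urllib.parse import urlencode

base_url = "https://api.github.com/search/repositories"
params = {"q": "language:python", "sort": "stars", "order": "desc"}

# Each key=value pair is percent-encoded and joined with "&"
query_string = urlencode(params)
full_url = f"{base_url}?{query_string}"

print(full_url)
```

After a real request, `response.url` shows the final URL that Requests actually sent, which is handy for checking that the parameters were encoded as intended.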

## Request Headers
To customize headers, you pass a dictionary of HTTP headers to get() using the headers parameter. For example, you can change your previous search request to highlight matching search terms in the results by specifying the text-match media type in the Accept header:

In [49]:
import requests

response = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": '"real python"'},
    headers={"Accept": "application/vnd.github.text-match+json"},
)

# View the new `text-matches` list which provides information
# about your search term within the results
json_response = response.json()
first_repository = json_response["items"][0]
print(first_repository["text_matches"][0]["matches"])

[{'text': 'Real Python', 'indices': [23, 34]}]


## Other HTTP Methods
Aside from GET, other popular HTTP methods include POST, PUT, DELETE, HEAD, PATCH, and OPTIONS. For each of these HTTP methods, Requests provides a function, with a similar signature to get().

To try out these HTTP methods, you’ll make requests to **httpbin.org**.
The httpbin service is a great resource created by the original author of Requests, Kenneth Reitz. The service accepts test requests and responds with data about the requests.

In [51]:
import requests

requests.get("https://httpbin.org/get")

requests.post("https://httpbin.org/post", data={"key": "value"})

requests.put("https://httpbin.org/put", data={"key": "value"})

requests.delete("https://httpbin.org/delete")

requests.head("https://httpbin.org/get")

requests.patch("https://httpbin.org/patch", data={"key": "value"})

requests.options("https://httpbin.org/get")


<Response [200]>

## The Message Body
According to the HTTP specification, POST, PUT, and the less common PATCH requests pass their data through the message body rather than through parameters in the query string. Using Requests, you’ll pass the payload to the corresponding function’s data parameter.

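**data takes a dictionary, a list of tuples, bytes, or a file-like object**. You’ll want to adapt the data that you send in the body of your request to the specific needs of the service that you’re interacting with. If the service expects JSON, you can use the json parameter instead, as in the example below: Requests then serializes the data and sets the Content-Type header to application/json for you.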

In [52]:
response = requests.post("https://httpbin.org/post", json={"key": "value"})
json_response = response.json()
json_response["data"]
json_response["headers"]["Content-Type"]

'application/json'
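The difference between the data and json parameters comes down to how the same dictionary is serialized into the message body. The sketch below produces both encodings with the standard library; it approximates the bodies Requests would send and is not a copy of its internals.

```python
import json
from urllib.parse import urlencode

payload = {"key": "value"}

# data=payload: the dictionary is sent as a form-encoded body
form_body = urlencode(payload)
form_content_type = "application/x-www-form-urlencoded"

# json=payload: the dictionary is sent as a JSON document
json_body = json.dumps(payload)
json_content_type = "application/json"

print(form_content_type, "->", form_body)
print(json_content_type, "->", json_body)
```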

## Authentication
Authentication helps a service understand who you are. 
Typically, you provide your credentials to a server by passing data through the Authorization header or a custom header defined by the service. All the functions of Requests that you’ve seen to this point provide a parameter called auth, which allows you to pass your credentials.

API keys are one example of this: they identify an authenticated user so that the server will return a response.

A real-world example of an API that requires authentication is GitHub’s authenticated user API. This endpoint provides information about the authenticated user’s profile.

If you try to make a request without credentials, then you’ll see that the status code is 401 Unauthorized:

In [53]:
requests.get("https://api.github.com/user")

<Response [401]>

If you don’t provide authentication credentials when accessing a service that requires them, then you’ll get an HTTP error code as a response.

To make a request to GitHub’s authenticated user API, you first need to generate a personal access token with the read:user scope. Then you can pass this token as the second element in a tuple to get():

In [55]:
import requests

# Example only: no real user or token is supplied, so the request returns 401

token = "<YOUR_GITHUB_PA_TOKEN>"
response = requests.get(
    "https://api.github.com/user",
    auth=("", token)
)
response.status_code

401
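Passing a tuple to auth makes Requests use HTTP Basic authentication, which places the credentials in the Authorization header. The sketch below builds that header value by hand to show what is sent; `basic_auth_header` is a hypothetical helper and the token is the same placeholder as above.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Placeholder token, not a real credential
header_value = basic_auth_header("", "<YOUR_GITHUB_PA_TOKEN>")
print(header_value)

# Decoding shows that Basic auth is an encoding, not encryption
decoded = base64.b64decode(header_value.split(" ", 1)[1]).decode("utf-8")
print(decoded)
```

Because the credentials are only base64-encoded, Basic authentication should be used over HTTPS only.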

## SSL Certificate Verification

Anytime the data that you’re trying to send or receive is sensitive, security is important. You communicate with secure sites over HTTPS, which wraps HTTP in an encrypted connection using TLS (still commonly called SSL), so verifying the target server’s SSL certificate is critical.

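The good news is that Requests does this for you by default. However, there are some cases where you might want to change this behavior, for example when testing against a server with a self-signed certificate. Passing verify=False disables the check and makes Requests emit an InsecureRequestWarning, so avoid it for sensitive data.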
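What verification involves can be sketched offline with the standard library's ssl module. The first context below has the usual secure defaults (certificate required, hostname checked); the second turns both checks off, which is roughly what disabling verification amounts to. This illustrates the concept and is not how Requests is configured internally.

```python
import ssl

# Default context: certificates are verified and hostnames are checked
default_context = ssl.create_default_context()
verifies_certificate = default_context.verify_mode == ssl.CERT_REQUIRED
checks_hostname = default_context.check_hostname
print(verifies_certificate, checks_hostname)

# Unverified context: both checks turned off.
# check_hostname must be disabled before verify_mode can be CERT_NONE.
insecure_context = ssl.create_default_context()
insecure_context.check_hostname = False
insecure_context.verify_mode = ssl.CERT_NONE
print(insecure_context.verify_mode == ssl.CERT_NONE)
```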

In [56]:
import requests

requests.get("https://api.github.com", verify=False)



<Response [200]>

# AI in API Requests

Examples and assistance using AI for requests

Using the fake example request website reqres.in

**API_KEY = reqres-free-v1**

**Add This Header to API Requests : x-api-key: reqres-free-v1**

In [58]:
import requests

# Set the URL and headers with your API key
url = "https://reqres.in/api/users?page=1"
headers = {
    "x-api-key": "reqres-free-v1"
}

# Make the GET request
response = requests.get(url, headers=headers)

# Check that the request was successful
if response.ok:
    print("Request was successful!")
else:
    print(f"Request failed with status code: {response.status_code}")

# Print the actual data from the response
print(response.json())


Request was successful!
{'page': 1, 'per_page': 6, 'total': 12, 'total_pages': 2, 'data': [{'id': 1, 'email': 'george.bluth@reqres.in', 'first_name': 'George', 'last_name': 'Bluth', 'avatar': 'https://reqres.in/img/faces/1-image.jpg'}, {'id': 2, 'email': 'janet.weaver@reqres.in', 'first_name': 'Janet', 'last_name': 'Weaver', 'avatar': 'https://reqres.in/img/faces/2-image.jpg'}, {'id': 3, 'email': 'emma.wong@reqres.in', 'first_name': 'Emma', 'last_name': 'Wong', 'avatar': 'https://reqres.in/img/faces/3-image.jpg'}, {'id': 4, 'email': 'eve.holt@reqres.in', 'first_name': 'Eve', 'last_name': 'Holt', 'avatar': 'https://reqres.in/img/faces/4-image.jpg'}, {'id': 5, 'email': 'charles.morris@reqres.in', 'first_name': 'Charles', 'last_name': 'Morris', 'avatar': 'https://reqres.in/img/faces/5-image.jpg'}, {'id': 6, 'email': 'tracey.ramos@reqres.in', 'first_name': 'Tracey', 'last_name': 'Ramos', 'avatar': 'https://reqres.in/img/faces/6-image.jpg'}], 'support': {'url': 'https://contentcaddy.io?utm_
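Once the JSON is parsed, the records sit in a list under the "data" key and the pagination details sit at the top level. The sketch below works on the first two records copied from the output above (avatar fields omitted for brevity), so it runs without a network call.

```python
# First two records copied from the response printed above
sample = {
    "page": 1,
    "per_page": 6,
    "total": 12,
    "total_pages": 2,
    "data": [
        {"id": 1, "email": "george.bluth@reqres.in",
         "first_name": "George", "last_name": "Bluth"},
        {"id": 2, "email": "janet.weaver@reqres.in",
         "first_name": "Janet", "last_name": "Weaver"},
    ],
}

# Build a full name for each user record
full_names = [f"{user['first_name']} {user['last_name']}" for user in sample["data"]]
print(full_names)

# Pagination fields tell you whether more pages remain
has_more_pages = sample["page"] < sample["total_pages"]
print(has_more_pages)
```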

Q: What are the most common methods of the requests library?

- A: The most common methods are:

    - requests.get() — for retrieving data

    - requests.post() — for sending data (e.g. creating resources)

    - requests.put() — for updating a resource

    - requests.patch() — for partial updates

    - requests.delete() — for deleting a resource

Q: What are the attributes of a Response object?

- A: Common attributes include:

    - .status_code — HTTP status code (e.g. 200, 404)

    - .ok — Boolean indicating success (True if status code is < 400)

    - .text  — Raw response content as a string

    - .json() — Parsed JSON response (if applicable)

    - .headers — Response headers (as a dictionary)

    - .url — Final URL after any redirects

    - .reason — Reason phrase returned by server (e.g. "OK", "Not Found")

Q: What is the most common form of a response?

- A: Most APIs return responses in JSON format. You can parse them using .json() on the Response object.

Q: What other information can I obtain from the user’s endpoint?

- A: A typical user API endpoint may provide:

    - User ID

    - Name (first and last)

    - Email

    - Avatar (profile image URL)

    - Pagination data (e.g. total pages, current page)

    - Metadata (e.g. request time, support contact)

Q: How are arguments typically used in a GET request string?

- A: Arguments are included in the query string after the ?, in key=value pairs separated by &:

    - Example:
        - https://api.example.com/users?page=2&limit=10
        
    - You can also pass them using the params argument in requests.get():
        - requests.get(url, params={"page": 2, "limit": 10})

Q: What's the difference between using a string and a parameter dictionary for passing arguments?

- A: Using a string means manually constructing the full URL, which can be error-prone:

    - requests.get("https://api.example.com/users?page=2&limit=10")

    Using a parameter dictionary is cleaner and automatically handles URL encoding:

    - requests.get("https://api.example.com/users", params={"page": 2, "limit": 10})

    The dictionary method is preferred for safety and readability.