## REST APIs


A REST (Representational State Transfer) API is a web service that follows the REST architectural style: clients use standard HTTP methods such as GET, POST, PUT, and DELETE to retrieve, create, update, or delete data. Many online platforms expose REST APIs for their data, and in most cases the responses are formatted as JSON.

To work with REST APIs from Python, you need a library for sending HTTP requests. The third-party `requests` package is a popular choice for this task. You can install it using pip:

`pip install requests`
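To connect the HTTP methods listed above with actual code, the sketch below builds one request per method with `requests.Request` and calls `.prepare()` to inspect it without sending anything, so it runs offline. In practice you would send such requests with `requests.get`, `requests.post`, `requests.put`, and `requests.delete`. The URL reuses the JSONPlaceholder endpoint used in the example that follows; the `name` values are made up for illustration.

```python
import requests

url = "https://jsonplaceholder.typicode.com/users"

# One request per HTTP method; each method signals a different operation on the resource.
# The requests are only built and prepared here, never sent.
examples = {
    "retrieve": requests.Request("GET", url, params={"id": 1}),
    "create": requests.Request("POST", url, json={"name": "Jane Doe"}),
    "update": requests.Request("PUT", f"{url}/1", json={"name": "Jane Roe"}),
    "delete": requests.Request("DELETE", f"{url}/1"),
}

for action, request in examples.items():
    prepared = request.prepare()  # fills in the final URL, headers, and encoded body
    print(f"{action:>8}: {prepared.method} {prepared.url} body={prepared.body!r}")
```

Note that passing `json=` makes Requests encode the body and set the `Content-Type: application/json` header for you.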

Let's assume we have a REST API that returns a JSON response. In our example, we will use JSONPlaceholder, a free public API that serves fake data for testing. Its `/users` endpoint returns a list of sample users: https://jsonplaceholder.typicode.com/users.

Here is how you can read this data into a pandas DataFrame:


In [None]:
# Import required libraries
import pandas as pd
import requests

# Define the API endpoint
url = "https://jsonplaceholder.typicode.com/users"

# Send a GET request to the REST API (the timeout avoids waiting forever on an unresponsive server)
response = requests.get(url, timeout=10)

# Raise an HTTPError if the request failed (4xx or 5xx status code),
# so the code below never runs with missing data
response.raise_for_status()

# Parse the JSON response; this endpoint returns a list of dictionaries, one per user
data = response.json()

# Convert the list of dictionaries to a pandas DataFrame (one row per user)
df = pd.DataFrame(data)

df.head()

This cell displays the first five rows of a DataFrame in which each row corresponds to a user and each column to a user attribute (`id`, `name`, `username`, and so on). If you only need some of the attributes, you can select them with the `columns` parameter when creating the DataFrame. The full response is still downloaded, but only the selected keys are kept. For example, to keep only the users' names and email addresses:


In [None]:
df = pd.DataFrame(data, columns=["name", "email"])

df.head()
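Some attributes in this response, such as `address` and `company`, are themselves JSON objects, so `pd.DataFrame` stores them as dictionaries inside single cells. `pd.json_normalize` flattens nested objects into separate columns with dotted names. The sketch below uses a small hand-written record shaped like a `/users` entry (the values are made up) so that it runs offline; with the real response you could pass `data` directly.

```python
import pandas as pd

# Hand-written sample with nested objects, shaped like an entry from the /users endpoint
users = [
    {
        "id": 1,
        "name": "Jane Doe",
        "address": {"city": "Springfield", "geo": {"lat": "-37.3", "lng": "81.1"}},
        "company": {"name": "Example Corp"},
    }
]

# Nested dictionaries become columns such as "address.city" and "address.geo.lat"
flat = pd.json_normalize(users)
print(flat.columns.tolist())
```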

Please be aware that not all APIs are public: many require authentication, for example an API key or token sent with each request. The procedure depends on the specific API, so always refer to its documentation for instructions.

Also respect the API's rate limits. If you send too many requests in a short period, the server may reject further requests (often with HTTP status 429, Too Many Requests) or temporarily block your IP address. A simple way to stay within the limits is to pause between requests with the `time.sleep()` function.

Finally, keep in mind that API responses can be large, and downloading and processing them can take a long time. If the API supports it, filter the data on the server side (for example, with query parameters) or request it in smaller pages or chunks.
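Here is a minimal sketch that combines these ideas. It assumes a hypothetical API that expects a bearer token in the `Authorization` header and accepts `_page` and `_limit` query parameters for pagination; authentication schemes and parameter names vary between APIs, so check the documentation of the one you use. The `fetch_pages` helper and the token value are illustrative and not part of `requests`.

```python
import time

import pandas as pd
import requests


def fetch_pages(session, url, page_size=50, max_pages=100, delay=1.0):
    """Fetch a paginated JSON API one page at a time, pausing between requests.

    Hypothetical assumptions: the API accepts `_page` and `_limit` query parameters
    and returns an empty JSON list once there are no more records.
    """
    records = []
    for page in range(1, max_pages + 1):
        response = session.get(url, params={"_page": page, "_limit": page_size}, timeout=10)
        response.raise_for_status()  # stop on 4xx/5xx, including 429 Too Many Requests
        batch = response.json()
        if not batch:  # an empty page means we have reached the end of the data
            break
        records.extend(batch)
        time.sleep(delay)  # pause so we stay under the API's rate limit
    return pd.DataFrame(records)


# A Session reuses the connection and sends these headers with every request.
# Hypothetical token: replace it with whatever scheme the API's documentation specifies.
session = requests.Session()
session.headers.update({"Authorization": "Bearer YOUR_API_TOKEN"})
```

The main design choice is the `delay` value, which should match the limit documented by the API. Checking for an empty page is only one way to detect the end of the data; some APIs instead return a link to the next page or a total record count.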


## Web Scraping


Web scraping is the process of extracting data from websites. Python, with its rich ecosystem of libraries, is a popular choice for web scraping tasks. In this tutorial, we'll cover the basics of web scraping using Python and two popular libraries: Requests and Beautiful Soup.


### Installing the Libraries


Before starting, install the Requests and Beautiful Soup libraries by running the following command:

```bash
pip install requests beautifulsoup4
```


### Fetching a Web Page


First, we need to fetch the web page's content. We'll use the Requests library for this purpose. In this tutorial, we'll use the [Wikipedia page for "List of state and union territory capitals in India"](https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India) as an example. The following code fetches the content of the web page and stores it in the `page_content` variable:



In [10]:
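import requests

url = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"

# Wikimedia's User-Agent policy asks clients to identify themselves; replace this placeholder
headers = {"User-Agent": "data-import-tutorial/0.1 (your-email@example.com)"}

# The timeout stops the request from hanging indefinitely if the server does not respond
response = requests.get(url, headers=headers, timeout=10)

if response.status_code == 200:
    print("Page successfully fetched!", end="\n\n")
    page_content = response.text
    print(f"Object type: {type(page_content)}", end="\n\n")
    # The full HTML is very long, so only print the first 500 characters
    print("Page content (first 500 characters):", end="\n\n")
    print(page_content[:500])
else:
    print(f"Error {response.status_code}: unable to fetch the page.")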


### Parsing the Web Page


Now that we have the HTML content, we need to parse it to navigate and extract the data. Beautiful Soup will help us with this task:


In [None]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(page_content, "html.parser")

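print(f"Object type: {type(soup)}", end="\n\n")
# prettify() returns the whole document as an indented string; show only the first 1,000 characters
print("Page content (first 1,000 characters):", end="\n\n")
print(soup.prettify()[:1000])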
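Once the HTML is parsed, you can move through the tree by tag name and read attributes like dictionary keys. The sketch below uses a tiny hand-written HTML document instead of the Wikipedia page so that it runs offline, and stores it in a separate variable (`snippet_soup`, a name chosen for this example) so the `soup` object for the real page is not overwritten.

```python
from bs4 import BeautifulSoup

# Tiny hand-written HTML document used only for this example
html_snippet = """
<html>
  <head><title>Capitals</title></head>
  <body>
    <h2 id="intro">Introduction</h2>
    <p>See the <a href="/wiki/India">India</a> article.</p>
  </body>
</html>
"""

snippet_soup = BeautifulSoup(html_snippet, "html.parser")

print(snippet_soup.title.string)    # text inside the <title> tag
link = snippet_soup.p.a             # first <a> tag inside the first <p> tag
print(link.name, link["href"])      # tag name, and an attribute accessed like a dict key
print(snippet_soup.h2.get("id"))    # .get() returns None instead of raising if the attribute is missing
```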

### Extracting the Data


After parsing the HTML, we can now locate and extract the desired information using Beautiful Soup. Let's say we want to extract all the headings (h1, h2, and h3 tags) from the page. We can do this using the `find_all` method:


In [None]:
headings = soup.find_all(["h1", "h2", "h3"])

for heading in headings:
    print(heading.get_text())
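`find_all` returns the matches in document order, and it can also filter on attributes. The offline sketch below, again on a small hand-written snippet, strips surrounding whitespace from the heading text with `get_text(strip=True)` and uses `href=True` to collect only the links that actually have an `href` attribute.

```python
from bs4 import BeautifulSoup

# Hand-written HTML snippet standing in for the real page
html_snippet = """
<h1>Capitals</h1>
<h2>States</h2>
<p><a href="/wiki/Kerala">Kerala</a> and <a href="/wiki/Goa">Goa</a></p>
<h3>  Notes  </h3>
<a name="anchor-without-href">not a link</a>
"""
snippet_soup = BeautifulSoup(html_snippet, "html.parser")

# A list of tag names matches any of them; strip=True trims the surrounding whitespace
heading_texts = [h.get_text(strip=True) for h in snippet_soup.find_all(["h1", "h2", "h3"])]
print(heading_texts)

# href=True keeps only <a> tags that have an href attribute
links = [a["href"] for a in snippet_soup.find_all("a", href=True)]
print(links)
```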

To extract a specific element, use the `find` method and pass the element's `id` attribute as a keyword argument. `find` returns the first matching element, or `None` if nothing matches:


In [None]:
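specific_heading = soup.find(id="List")

# find() returns None when no element matches, for example if the page layout has changed
if specific_heading is not None:
    print(specific_heading.get_text())
else:
    print('No element with id="List" was found.')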
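Besides `id`, `find` can filter on a CSS class. Because `class` is a reserved word in Python, Beautiful Soup uses the `class_` keyword instead, and it matches an element if any of its classes is equal to the given value. The offline sketch below uses a hypothetical snippet with a `wikitable` class, the class Wikipedia commonly puts on its data tables.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; the table has two classes, "wikitable" and "sortable"
html_snippet = (
    '<h2 id="List">List</h2>'
    '<table class="wikitable sortable"><tr><td>Goa</td><td>Panaji</td></tr></table>'
)
snippet_soup = BeautifulSoup(html_snippet, "html.parser")

list_heading = snippet_soup.find(id="List")                        # match by id
first_cell = snippet_soup.find("table", class_="wikitable").td     # match by one of the classes
missing = snippet_soup.find(id="does-not-exist")                   # no match gives None

print(list_heading.get_text())
print(first_cell.get_text())
print(missing)
```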

#### Extracting Tables to Pandas DataFrames


The page contains tables of states and union territories. We can locate the tables with Beautiful Soup's `find_all` method and convert them to a list of pandas DataFrames with the `pd.read_html` function.

Note that `read_html` needs an HTML parser backend: it uses `lxml` by default and falls back to Beautiful Soup with `html5lib`. You may need to install these libraries using the following command:

```bash
pip install lxml html5lib
```


In [None]:
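from io import StringIO

import pandas as pd

# Locate every <table> element on the page
tables = soup.find_all("table")

# pandas 2.1+ deprecates passing literal HTML strings to read_html,
# so the combined HTML of all tables is wrapped in a StringIO object
df_list = pd.read_html(StringIO("".join(str(table) for table in tables)))

# display() is available in Jupyter notebooks and renders each DataFrame as a table
for df in df_list:
    display(df)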
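`pd.read_html` does the table-to-DataFrame conversion for you. To see roughly what that involves, or for cases where you need finer control over each cell, here is a minimal sketch that builds a DataFrame from a simple table using Beautiful Soup alone. The `table_to_dataframe` helper is hypothetical and assumes one header row and no `rowspan`/`colspan` cells; real Wikipedia tables often use merged cells, so `read_html` remains the more robust choice there.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Small hand-written table standing in for one of the tables on the page
sample_html = """
<table>
  <tr><th>State</th><th>Capital</th></tr>
  <tr><td>Kerala</td><td>Thiruvananthapuram</td></tr>
  <tr><td>Goa</td><td>Panaji</td></tr>
</table>
"""


def table_to_dataframe(table):
    """Convert a simple <table> tag (one header row, no merged cells) to a DataFrame."""
    rows = table.find_all("tr")
    # The first row supplies the column names
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    # Every following row becomes one list of cell texts
    body = [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in rows[1:]
    ]
    return pd.DataFrame(body, columns=header)


sample_soup = BeautifulSoup(sample_html, "html.parser")
capitals_df = table_to_dataframe(sample_soup.find("table"))
print(capitals_df)
```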

### Further Reading


Check out the following resources for more information on importing data into Python:

- [pandas: How to Read and Write Files](https://realpython.com/pandas-read-write-fi)
- [pandas Documentation: IO Tools](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html)
- [REST APIs with Python](https://realpython.com/api-integration-in-python/)
- [Web Scraping with Python: A Beginner's Guide](https://realpython.com/python-web-scraping-practical-introduction/)
- [Beautiful Soup: Build a Web Scraper With Python](https://realpython.com/beautiful-soup-web-scraper-python/)
- [Python Web Scraping Tutorial](https://www.freecodecamp.org/news/how-to-scrape-websites-with-python-2/)
- [Requests Documentation](https://requests.readthedocs.io/en/master/)
- [Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
