## Web Scraping Bounding Boxes for US States

In this notebook, I scrape bounding box data for US states from a web page. Each record holds the state's name and its bounding coordinates (`xmin`, `xmax`, `ymin`, and `ymax`); the source table also lists territories such as American Samoa. I use the Python libraries `requests`, `BeautifulSoup`, and `pandas` to fetch the page content, parse the HTML table, and store the result as a DataFrame.

In [1]:
# importing the necessary libraries

import requests
from bs4 import BeautifulSoup
import pandas as pd

In [2]:
def scrape_bounding_boxes():
    """
    This function scrapes a table of bounding box coordinates for all US states 
    and returns the data as a pandas DataFrame.

    The table contains the following information for each state:
    - State name
    - xmin: Minimum longitude
    - xmax: Maximum longitude
    - ymin: Minimum latitude
    - ymax: Maximum latitude

    The extracted data is stored in a pandas DataFrame with columns:
    'State', 'xmin', 'xmax', 'ymin', and 'ymax'.

    Returns None if the request fails.
    """

    # URL of the page to scrape
    url = 'https://pathindependence.wordpress.com/2018/11/23/bounding-boxes-for-all-us-states/'

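    # Send a request to fetch the page content (the timeout avoids hanging indefinitely)
    try:
        response = requests.get(url, timeout=30)
    except requests.RequestException as exc:
        print(f"Failed to fetch the page: {exc}")
        return None

    # Check if the request was successful (status code 200)
    if response.status_code != 200:
        print(f"Failed to fetch the page. Status code: {response.status_code}")
        return None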

    # Parse the page content using BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')

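    # Find the table with the bounding box data
    table = soup.find('table', class_='js-csv-data')

    # soup.find returns None when nothing matches, so stop here instead of crashing below
    if table is None:
        print("Could not find the bounding box table on the page.")
        return None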

    # Initialize an empty list to store the rows of the table
    data = []

    # Iterate over each row in the table body
    for row in table.find('tbody').find_all('tr'):
        # Extract the columns from the row
        cols = row.find_all('td')

        # Extract relevant data; the source columns are ordered State, xmin, ymin, xmax, ymax
        state = cols[4].text.strip()
        xmin = float(cols[5].text.strip())
        ymin = float(cols[6].text.strip())
        xmax = float(cols[7].text.strip())
        ymax = float(cols[8].text.strip())

        # Append the extracted data as a tuple, reordered to match the DataFrame columns
        data.append((state, xmin, xmax, ymin, ymax))

    # Convert the list of data into a pandas DataFrame
    df = pd.DataFrame(data, columns=['State', 'xmin', 'xmax', 'ymin', 'ymax'])

    return df
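
The row-parsing step can be tried offline without hitting the website. The following is a minimal sketch, not part of the original function: `parse_rows` and `SAMPLE_HTML` are hypothetical names, the HTML snippet is a made-up stand-in for the real page, and the first four cells are placeholders. It assumes the same column positions the function above reads (cells 4 to 8), and it skips rows that are too short or contain non-numeric values rather than raising.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: the first four cells are placeholders,
# cells 4-8 follow the order the notebook reads (State, xmin, ymin, xmax, ymax).
SAMPLE_HTML = """
<table class="js-csv-data">
  <tbody>
    <tr><td>a</td><td>b</td><td>c</td><td>d</td>
        <td>Alabama</td><td>-88.473227</td><td>30.223334</td>
        <td>-84.88908</td><td>35.008028</td></tr>
    <tr><td>too</td><td>short</td></tr>
  </tbody>
</table>
"""


def parse_rows(html):
    """Return (state, xmin, xmax, ymin, ymax) tuples parsed from an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="js-csv-data")
    if table is None:
        return []

    rows = []
    for tr in table.find_all("tr"):
        cols = tr.find_all("td")
        # Skip header rows or malformed rows that lack the expected cells
        if len(cols) < 9:
            continue
        state = cols[4].get_text(strip=True)
        try:
            xmin, ymin, xmax, ymax = (
                float(c.get_text(strip=True)) for c in cols[5:9]
            )
        except ValueError:
            # Skip rows whose coordinate cells are not numeric
            continue
        rows.append((state, xmin, xmax, ymin, ymax))
    return rows


print(parse_rows(SAMPLE_HTML))
```

Separating parsing from fetching like this makes the parsing logic testable on a saved copy of the page.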

In [3]:
bounding_boxes_df = scrape_bounding_boxes()

In [4]:
bounding_boxes_df

| | State | xmin | xmax | ymin | ymax |
|---|---|---|---|---|---|
| 0 | Alabama | -88.473227 | -84.88908 | 30.223334 | 35.008028 |
| 1 | Alaska | -179.148909 | 179.77847 | 51.214183 | 71.365162 |
| 2 | American Samoa | -171.089874 | -168.1433 | -14.548699 | -11.046934 |
| 3 | Arizona | -114.81651 | -109.045223 | 31.332177 | 37.00426 |
| 4 | Arkansas | -94.617919 | -89.644395 | 33.004106 | 36.4996 |
| 5 | California | -124.409591 | -114.131211 | 32.534156 | 42.009518 |
| 6 | Colorado | -109.060253 | -102.041524 | 36.992426 | 41.003444 |
| 7 | Commonwealth of the Northern Mariana Islands | 144.886331 | 146.064818 | 14.110472 | 20.553802 |
| 8 | Connecticut | -73.727775 | -71.786994 | 40.980144 | 42.050587 |
| 9 | Delaware | -75.788658 | -75.048939 | 38.451013 | 39.839007 |
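
Before saving, it may be worth sanity-checking the scraped values. The sketch below is an illustration, not part of the original notebook: `check_bounding_boxes` is a hypothetical helper that takes plain `(state, xmin, xmax, ymin, ymax)` tuples, which could be produced from the DataFrame with `bounding_boxes_df.itertuples(index=False, name=None)`. It uses three rows copied from the output above. Note the Alaska row: its box spans almost the full longitude range, most likely because the Aleutian Islands cross the 180° meridian, so treating it as an ordinary rectangle would cover far more than the state.

```python
def check_bounding_boxes(rows):
    """Return (state, message) pairs for rows that look suspicious.

    `rows` is any iterable of (state, xmin, xmax, ymin, ymax) tuples.
    """
    issues = []
    for state, xmin, xmax, ymin, ymax in rows:
        if not (-180 <= xmin <= 180 and -180 <= xmax <= 180):
            issues.append((state, "longitude outside [-180, 180]"))
        if not (-90 <= ymin <= 90 and -90 <= ymax <= 90):
            issues.append((state, "latitude outside [-90, 90]"))
        if xmin > xmax:
            issues.append((state, "xmin is greater than xmax"))
        if ymin > ymax:
            issues.append((state, "ymin is greater than ymax"))
        # A box wider than 180 degrees usually means the region crosses the antimeridian
        if xmax - xmin > 180:
            issues.append((state, "box spans more than 180 degrees of longitude"))
    return issues


# Rows copied from the DataFrame output shown above
sample_rows = [
    ("Alabama", -88.473227, -84.88908, 30.223334, 35.008028),
    ("Alaska", -179.148909, 179.77847, 51.214183, 71.365162),
    ("American Samoa", -171.089874, -168.1433, -14.548699, -11.046934),
]

for state, message in check_bounding_boxes(sample_rows):
    print(f"{state}: {message}")
```

On these three rows, only Alaska is flagged, for the width of its box.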


In [100]:
bounding_boxes_df.to_csv("bounding_boxes_us_states.csv", index=False)