# Web Scraping for Collecting Airport Data

## Introduction

This Jupyter notebook collects data on airports in Florida, USA, by scraping Wikipedia's list of airports in Florida. The goal is to gather each airport's name, location, and identifying codes.

## Objectives

1. Obtain data on airports in Florida from Wikipedia.
2. Extract key information such as airport names, locations, and other relevant details.
3. Store the collected data in a suitable format for further analysis and use in our database.

## Libraries

* **Requests**: The requests library allows users to make HTTP requests to web pages, facilitating the download of HTML content from Wikipedia for further processing.

* **Beautiful Soup (bs4)**: Beautiful Soup is a useful tool for parsing and searching HTML elements in the downloaded content. It enables users to extract specific information from Wikipedia pages, such as titles, paragraphs, links, and more.

* **Pandas**: Pandas is an essential library for structuring and manipulating extracted data. It allows users to organize data into rows and columns, facilitating operations such as cleaning, filtering, and processing.

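How Beautiful Soup and pandas fit together can be sketched on a small inline table. This is a minimal illustration, not the notebook's own code: the `html` string is a hypothetical stand-in for the page that Requests would download, so the example runs offline, and the names `rows` and `sample_df` are made up for the example.

```python
import bs4
import pandas as pd

# Hypothetical two-row table standing in for a downloaded Wikipedia page
html = """
<table class="wikitable">
  <tr><th>City</th><th>ICAO</th></tr>
  <tr><td>Daytona Beach</td><td>KDAB</td></tr>
  <tr><td>Gainesville</td><td>KGNV</td></tr>
</table>
"""

# Beautiful Soup: parse the markup and locate the table by its CSS class
soup = bs4.BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable"})

# Collect the text of every header and data cell, row by row
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]

# pandas: the first row supplies the column names, the rest become records
sample_df = pd.DataFrame(rows[1:], columns=rows[0])
print(sample_df)
```

With a live page, `html` would be replaced by `requests.get(url).text`.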

In [1]:
import requests
import bs4
import pandas as pd

In [7]:
def get_airport():
    """Scrape the first wikitable on Wikipedia's list of Florida airports into a DataFrame."""
    url = "https://en.wikipedia.org/wiki/List_of_airports_in_Florida"

    response = requests.get(url)
    response.raise_for_status()  # Fail early on an HTTP error instead of parsing an error page

    soup = bs4.BeautifulSoup(response.text, 'html.parser')

    # The first table with class "wikitable" holds the airport list
    airport_table = soup.find('table', {'class': 'wikitable'})

    # Column order in the table, as reflected in the df.head() output below:
    # city served, FAA code, IATA code, ICAO code, airport name
    data = {"Location": [], "FAA Code": [], "IATA Code": [], "ICAO Code": [], "Airport Name": []}

    # Skip the header row
    for row in airport_table.find_all('tr')[1:]:
        columns = row.find_all(['th', 'td'])

        # Skip rows with too few cells (for example, section headings inside the table)
        if len(columns) < 5:
            continue

        # Keep only airports that have an ICAO code
        icao_code = columns[3].get_text(strip=True)
        if icao_code:
            data["Location"].append(columns[0].get_text(strip=True))
            data["FAA Code"].append(columns[1].get_text(strip=True))
            data["IATA Code"].append(columns[2].get_text(strip=True))
            data["ICAO Code"].append(icao_code)
            data["Airport Name"].append(columns[4].get_text(strip=True))

    return pd.DataFrame(data)

In [8]:
df = get_airport()

In [9]:
df.head()

,Location,FAA Code,IATA Code,ICAO Code,Airport Name
0,Daytona Beach,DAB,DAB,KDAB,Daytona Beach International Airport
1,Fort Lauderdale,FLL,FLL,KFLL,Fort Lauderdale–Hollywood International Airport
2,Fort Myers,RSW,RSW,KRSW,Southwest Florida International Airport
3,Fort Walton Beach,VPS,VPS,KVPS,Destin–Fort Walton Beach Airport/Eglin Air For...
4,Gainesville,GNV,GNV,KGNV,Gainesville Regional Airport


In [10]:
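# index=False keeps the row index out of the file, so reading it back adds no "Unnamed: 0" column
df.to_csv('Airports_Data.csv', index=False)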
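
By default, `to_csv` also writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. The sketch below illustrates the effect of the `index` parameter using an in-memory buffer instead of a file. The `sample` frame and the `round_trip` helper are hypothetical names introduced only for this example.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for the scraped airport data
sample = pd.DataFrame(
    {"Location": ["Daytona Beach", "Gainesville"], "ICAO Code": ["KDAB", "KGNV"]}
)

def round_trip(frame, **to_csv_kwargs):
    """Write a frame to an in-memory CSV and read it straight back."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)

with_index = round_trip(sample)                  # default: index is written
without_index = round_trip(sample, index=False)  # index is left out

print(list(with_index.columns))     # ['Unnamed: 0', 'Location', 'ICAO Code']
print(list(without_index.columns))  # ['Location', 'ICAO Code']
```

Leaving the index out is usually the simpler choice when the index is just the default row counter, as it is here.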