# Web Scraping for Collecting Airport Data

## Introduction

This Jupyter notebook scrapes Wikipedia's list of the busiest airports in Central America. The goal is to collect each airport's country, name, IATA and ICAO codes, city served, and passenger count.

## Objectives

1. Obtain data on airports in Central America from Wikipedia.
2. Extract each airport's country, name, IATA and ICAO codes, city served, and passenger count.
3. Store the collected data in a suitable format for further analysis and use in our database.

## Libraries

* **Requests**: The requests library allows users to make HTTP requests to web pages, facilitating the download of HTML content from Wikipedia for further processing.

* **Beautiful Soup (bs4)**: Beautiful Soup is a useful tool for parsing and searching HTML elements in the downloaded content. It enables users to extract specific information from Wikipedia pages, such as titles, paragraphs, links, and more.

* **Pandas**: Pandas is an essential library for structuring and manipulating extracted data. It allows users to organize data into rows and columns, facilitating operations such as cleaning, filtering, and processing.
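As a rough sketch of how Beautiful Soup and Pandas fit together, the example below parses a small hard-coded HTML table into a DataFrame. `SAMPLE_HTML` and `sample_df` are made-up names for illustration, and the snippet is only a stand-in for a downloaded page; in the notebook itself the HTML comes from `requests.get(url).text`.

```python
import bs4
import pandas as pd

# Hypothetical HTML snippet standing in for a downloaded page (no network request is made).
SAMPLE_HTML = """
<table>
  <tr><th>Airport</th><th>Code</th></tr>
  <tr><td>Tocumen International Airport</td><td>PTY</td></tr>
  <tr><td>La Aurora International Airport</td><td>GUA</td></tr>
</table>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table")

# Header cells use <th>; data cells use <td>.
headers = [th.get_text(strip=True) for th in table.find_all("th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")[1:]  # skip the header row
]

sample_df = pd.DataFrame(rows, columns=headers)
print(sample_df)
```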


In [1]:
import requests
import bs4
import pandas as pd

In [84]:
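def get_airlines():
    """Scrape the busiest Central American airports from Wikipedia into a DataFrame.

    Despite its name, this function returns airports, not airlines.
    """
    url = "https://en.wikipedia.org/wiki/List_of_the_busiest_airports_in_Central_America"

    # Download the page and fail early on HTTP errors.
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    soup = bs4.BeautifulSoup(response.text, 'html.parser')

    # The airport ranking is read from the fourth table on the page.
    # This hard-coded index breaks if the page layout changes.
    airport_table = soup.find_all('table')[3]

    data = {"Country": [], "Airport Name": [], "IATA Code": [], "ICAO Code": [], "City Served": [], "Passengers": []}

    # Skip the header row. Cell 0 of each data row is not stored.
    for row in airport_table.find_all('tr')[1:]:
        columns = row.find_all('td')
        # The codes cell has the form "IATA/ICAO".
        codes = columns[3].get_text(strip=True).split('/')
        data["Country"].append(columns[1].get_text(strip=True))
        data["Airport Name"].append(columns[2].get_text(strip=True))
        data["IATA Code"].append(codes[0])
        data["ICAO Code"].append(codes[1])
        data["City Served"].append(columns[4].get_text(strip=True))
        # Remove thousands separators and any trailing footnote marker such as "[1]".
        data["Passengers"].append(int(columns[5].get_text(strip=True).replace(",", "").split('[')[0]))

    return pd.DataFrame(data)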
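
The cell-cleaning steps inside the loop could also be pulled into small helpers, which makes them easy to test without downloading the page. The sketch below is one possible way to do that; `split_codes` and `parse_passengers` are hypothetical names that do not appear in the notebook, and returning `None` for a missing code is an assumption about the desired behavior.

```python
import re


def split_codes(cell_text):
    """Split an 'IATA/ICAO' cell into an (iata, icao) tuple; missing parts become None."""
    parts = [part.strip() for part in cell_text.split("/")]
    iata = parts[0] or None
    icao = parts[1] if len(parts) > 1 and parts[1] else None
    return iata, icao


def parse_passengers(cell_text):
    """Drop footnote markers such as '[1]' and thousands separators, then convert to int."""
    cleaned = re.sub(r"\[.*?\]", "", cell_text).replace(",", "").strip()
    return int(cleaned)


print(split_codes("PTY/MPTO"))
print(parse_passengers("14,741,937[2]"))
```

Unlike indexing `split('/')[1]` directly, `split_codes` does not raise an `IndexError` when a cell contains only one code.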

In [85]:
get_airlines()

| | Country | Airport Name | IATA Code | ICAO Code | City Served | Passengers |
|---|---|---|---|---|---|---|
| 0 | Panama | Tocumen International Airport | PTY | MPTO | Panamá City | 14741937 |
| 1 | Costa Rica | Juan Santamaría International Airport | SJO | MROC | San José | 6456750 |
| 2 | El Salvador | Comalapa International Airport | SAL | MSLP | San Salvador | 2984764 |
| 3 | Guatemala | La Aurora International Airport | GUA | MGGT | Guatemala City | 2579123 |
| 4 | Nicaragua | Augusto C. Sandino International Airport | MGA | MNMG | Managua | 1533034 |
| 5 | Costa Rica | Daniel Oduber International Airport | LIR | MRLB | Liberia | 1182123 |
| 6 | Belize | Philip S. W. Goldson International Airport | BZE | MZBZ | Belize City | 867976 |
| 7 | Honduras | Ramón Villeda Morales International Airport | SAP | MHLM | San Pedro Sula | 867747 |
| 8 | Honduras | Toncontín International Airport | TGU | MHTG | Tegucigalpa | 697925 |
| 9 | Honduras | Juan Manuel Gálvez International Airport | RTB | MHRO | Roatán | 371657 |
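
The third objective, storing the collected data for later use, might be handled by writing the DataFrame to CSV. The sketch below is only an illustration under assumptions: it uses two rows copied from the output above as stand-in data, writes to an in-memory buffer instead of a real file, and the names `airports`, `buffer`, and `restored` are invented for the example. The notebook does not state which storage format the database actually expects.

```python
import io

import pandas as pd

# Two rows copied from the output above, used here as stand-in data.
airports = pd.DataFrame({
    "Country": ["Panama", "Costa Rica"],
    "Airport Name": [
        "Tocumen International Airport",
        "Juan Santamaría International Airport",
    ],
    "IATA Code": ["PTY", "SJO"],
    "ICAO Code": ["MPTO", "MROC"],
    "City Served": ["Panamá City", "San José"],
    "Passengers": [14741937, 6456750],
})

# index=False keeps the row index out of the file, so reading it back
# yields only the six data columns.
buffer = io.StringIO()
airports.to_csv(buffer, index=False)

# Read the CSV text back to confirm the round trip.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

To write an actual file, a path can be passed to `to_csv` in place of the buffer.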
