Photo Credit: Visual Capitalist
Welcome to my web scraping project, which extracts data from Wikipedia's List of Largest Companies in the United States by Revenue. The project uses Python, the Beautiful Soup library for parsing the page, and the Pandas library for data manipulation. The scraped data is then saved to a CSV file.
The primary goal of this project is to demonstrate how web scraping can be used to gather valuable information from websites and present it in a structured format.
The steps below extract a table from a web page using Python and Beautiful Soup:
Step 1: Import Required Libraries
from bs4 import BeautifulSoup
import requests
Step 2: Define the URL of the Website
url = "https://en.wikipedia.org/wiki/List_of_largest_companies_in_the_United_States_by_revenue"
Step 3: Send a GET Request and Parse HTML
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
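Step 3 assumes the request succeeds. If the server answers with an error status, `page.text` holds an error page and the later table lookup fails in a confusing way. The following is a minimal sketch of a more defensive fetch. The helper name `fetch_soup`, the 10-second timeout, and the User-Agent string are assumptions made for illustration and are not part of the original project.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download a page and return it parsed (hypothetical helper)."""
    # Placeholder User-Agent; replace it with a string that identifies your script.
    headers = {"User-Agent": "companies-scraper-example/0.1"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")
```

With this helper, Step 3 would reduce to `soup = fetch_soup(url)`.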
Step 4: Locate the Table
table = soup.find('table', class_='wikitable sortable')
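One thing to watch in Step 4: when `class_` is given a string containing a space, Beautiful Soup compares it with the full value of the `class` attribute, so the lookup returns `None` if the table carries any extra class. The sketch below shows this on a small inline table (illustrative data, not the real page) and uses a CSS selector as one possible alternative.

```python
from bs4 import BeautifulSoup

# Inline sample standing in for the downloaded page (illustrative data only).
html = """
<table class="wikitable sortable plainrowheaders">
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td>1</td><td>Example Corp</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# The multi-word string is matched against the whole attribute value,
# so the extra "plainrowheaders" class makes this lookup return None.
exact = soup.find("table", class_="wikitable sortable")

# The CSS selector only requires both classes to be present, in any order.
table = soup.select_one("table.wikitable.sortable")

# Fail early with a clear message instead of an AttributeError later on.
if table is None:
    raise ValueError("No table with classes 'wikitable' and 'sortable' found")
print(len(table.find_all("tr")))
```

Checking for `None` right after the lookup keeps the failure close to its cause if the page layout changes.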
Step 5: Extract Data from the Table
rows = table.find_all('tr')
print(rows)
Step 6: Extract Header from the Table
world_titles = table.find_all('th')
world_table_titles = [title.text.strip() for title in world_titles]
print(world_table_titles)
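Step 6 collects every `th` in the table. That works when only the first row contains `th` cells, but some tables also mark the first cell of each body row as a `th`, which would add unwanted entries to the header list. The sketch below, using an inline sample table with made-up data, restricts the search to the first row.

```python
from bs4 import BeautifulSoup

# Inline sample (illustrative data only): the body row uses a <th> row header.
html = """
<table>
  <tr><th>Rank</th><th>Name</th><th>Industry</th></tr>
  <tr><th>1</th><td>Example Corp</td><td>Retail</td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")

# Searching the whole table also picks up the row header "1".
all_th = [th.text.strip() for th in table.find_all("th")]

# Searching only the first row yields just the column titles.
header_row = table.find("tr")
titles = [th.text.strip() for th in header_row.find_all("th")]

print(all_th)  # ['Rank', 'Name', 'Industry', '1']
print(titles)  # ['Rank', 'Name', 'Industry']
```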
Step 7: Create a Pandas DataFrame
import pandas as pd
df = pd.DataFrame(columns=world_table_titles)
Step 8: Extract Data from Rows
column_data = table.find_all('tr')
for row in column_data[1:]:
    row_data = row.find_all('td')
    individual_row_data = [data.text.strip() for data in row_data]
    print(individual_row_data)
    length = len(df)
    df.loc[length] = individual_row_data
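The loop in Step 8 grows the DataFrame one row at a time with `df.loc`, and it raises an error if a row has a different number of cells than there are columns. A possible variant, sketched here on an inline sample table with made-up data, collects the rows in a list, skips rows that do not match the header, and builds the DataFrame once.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Inline sample (illustrative data only) with one row that spans both columns.
html = """
<table>
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td>1</td><td>Example Corp</td></tr>
  <tr><td colspan="2">Footnote row</td></tr>
  <tr><td>2</td><td>Sample Inc</td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")
titles = [th.text.strip() for th in table.find("tr").find_all("th")]

records = []
for row in table.find_all("tr")[1:]:
    cells = [td.text.strip() for td in row.find_all("td")]
    # Keep only rows whose cell count matches the number of column titles.
    if len(cells) == len(titles):
        records.append(cells)

# Build the DataFrame in one call instead of appending row by row.
df = pd.DataFrame(records, columns=titles)
print(df)
```

Skipping mismatched rows silently is a design choice; logging or counting them may be preferable when every row matters.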
Step 9: Display the DataFrame
print(df)
Step 10: Save Data to CSV
# Assumed completion: index=False keeps the row index out of the CSV.
df.to_csv(r'C:\Users\Asad Ali\Desktop\Courses\Alex the Analyst\Python/Companies.csv', index=False)
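To confirm that the export in Step 10 worked, the file can be read back and compared with the original data. The sketch below is a self-contained example under stated assumptions: it uses a small made-up DataFrame and a temporary directory rather than the project's real data and path, and it passes `index=False` so the row labels are not written as an extra column.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up data standing in for the scraped table.
df = pd.DataFrame({"Rank": ["1", "2"], "Name": ["Example Corp", "Sample Inc"]})

# A temporary directory is used only so the sketch runs on any machine.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "Companies.csv"
    # index=False leaves the 0, 1, 2, ... row labels out of the file.
    df.to_csv(out_path, index=False)
    # dtype=str reads every column as text, matching the scraped strings.
    restored = pd.read_csv(out_path, dtype=str)

print(restored)
```

Building the path with `pathlib.Path` avoids mixing backslashes and forward slashes by hand.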
- Web Scraping Skills: Gained valuable experience in web scraping, which is a powerful technique for extracting data from websites for various purposes.
- Beautiful Soup: Became familiar with Beautiful Soup, a Python library that simplifies web scraping by parsing HTML and XML documents.
- Pandas for Data Manipulation: Used Pandas to organize and manipulate the extracted data, demonstrating how versatile it is for data analysis tasks.