Web Scraping Project - List of Largest United States Companies by Revenue

Photo Credit: Visual Capitalist

Project Overview:

Welcome to my web scraping project, which extracts the table of companies from Wikipedia's List of largest companies in the United States by revenue. The project uses Python, the Requests library to download the page, the Beautiful Soup library to parse its HTML, and the Pandas library to organize the data. The scraped data is then saved to a CSV file.

The primary goal of this project is to demonstrate how web scraping can be used to gather valuable information from websites and present it in a structured format.

Web Scraping Step by Step:

The steps below extract a table from a website using Python and Beautiful Soup:

Step 1: Import Required Libraries

from bs4 import BeautifulSoup
import requests

Step 2: Define the URL of the Website

url = "https://en.wikipedia.org/wiki/List_of_largest_companies_in_the_United_States_by_revenue"

Step 3: Send a GET Request and Parse HTML

page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
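
A request can stall or come back with an error page, and `requests.get` does not raise on an HTTP error status by itself. The sketch below is one possible way to guard this step. The helper name `fetch_soup` and the 10-second timeout are assumptions made for illustration and are not part of the original notebook.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download a page and return it parsed (hypothetical helper)."""
    # A timeout keeps the script from waiting indefinitely on a stalled connection.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")
```

With a helper like this, the two lines of Step 3 could be written as `soup = fetch_soup(url)`.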

Step 4: Locate the Table

table = soup.find('table', class_='wikitable sortable')
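
When `class_` is given a string containing several class names, Beautiful Soup compares it against the exact value of the `class` attribute, so a table that carries an additional class is not matched. The sketch below illustrates the difference on invented sample markup (not the real page) and shows a CSS selector as a more tolerant alternative.

```python
from bs4 import BeautifulSoup

# Invented sample markup: the second table carries an extra class.
html = """
<table class="wikitable sortable"><tr><th>Rank</th></tr></table>
<table class="wikitable sortable plainrowheaders"><tr><th>Rank</th></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-word class_ string must equal the whole class attribute value.
exact = soup.find_all("table", class_="wikitable sortable")

# A CSS selector matches any table that has both classes, whatever else it has.
flexible = soup.select("table.wikitable.sortable")

print(len(exact), len(flexible))
```

If the table on the live page ever gains another class, `soup.select_one("table.wikitable.sortable")` would still return the first matching table.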

Step 5: Extract Data from the Table

# Collect every row (<tr>) of the table and print them to inspect the structure
rows = table.find_all('tr')
print(rows)

Step 6: Extract Header from the Table

world_titles = table.find_all('th')
world_table_titles = [title.text.strip() for title in world_titles]
print(world_table_titles)
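
`find_all('th')` returns every header cell in the table, which includes row headers if the body rows use `<th>` for their first cell. Whether the live page does so is not stated here, so the sample table below is invented. It sketches one way to take the column titles from the first row only.

```python
from bs4 import BeautifulSoup

# Invented sample table: body rows use <th scope="row"> for the rank.
html = """
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Name</th><th>Industry</th></tr>
  <tr><th scope="row">1</th><td>Company A</td><td>Retail</td></tr>
  <tr><th scope="row">2</th><td>Company B</td><td>Technology</td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")

# Every <th> in the table, including the row headers in the body.
all_th = [th.text.strip() for th in table.find_all("th")]

# Only the <th> cells of the first row, i.e. the column titles.
header_row = table.find("tr")
titles = [th.text.strip() for th in header_row.find_all("th")]

print(all_th)
print(titles)
```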

Step 7: Create a Pandas DataFrame

import pandas as pd
df = pd.DataFrame(columns=world_table_titles)

Step 8: Extract Data from Rows

column_data = table.find_all('tr')
for row in column_data[1:]:
    row_data = row.find_all('td')
    individual_row_data = [data.text.strip() for data in row_data]
    print(individual_row_data)
    length = len(df)
    df.loc[length] = individual_row_data
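
Assigning to `df.loc[length]` raises an error when a row has a different number of cells than the DataFrame has columns, and growing a DataFrame one row at a time gets slow on large tables. The sketch below is an alternative under stated assumptions: the sample table is invented, and skipping mismatched rows is one possible policy rather than the original notebook's behavior.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Invented sample table with one row that spans all columns.
html = """
<table>
  <tr><th>Rank</th><th>Name</th><th>Industry</th></tr>
  <tr><td>1</td><td>Company A</td><td>Retail</td></tr>
  <tr><td colspan="3">Footnote row</td></tr>
  <tr><td>2</td><td>Company B</td><td>Technology</td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")
titles = [th.text.strip() for th in table.find("tr").find_all("th")]

records = []
for row in table.find_all("tr")[1:]:
    cells = [td.text.strip() for td in row.find_all("td")]
    # Skip rows whose cell count does not match the header.
    if len(cells) != len(titles):
        continue
    records.append(cells)

# Build the DataFrame once instead of appending row by row.
df = pd.DataFrame(records, columns=titles)
print(df)
```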

Step 9: Display the DataFrame

print(df)

Step 10: Save Data to CSV

df.to_csv(r'C:\Users\Asad Ali\Desktop\Courses\Alex the Analyst\Python/Companies.csv')
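
By default, `DataFrame.to_csv` also writes the row index as the first column of the file. The sketch below, which uses invented sample data and a temporary folder in place of the real output path, shows saving without it via `index=False` and reading the file back to check the result.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Invented sample data standing in for the scraped table.
df = pd.DataFrame({"Rank": ["1", "2"], "Name": ["Company A", "Company B"]})

# A temporary folder stands in for the real output location.
with tempfile.TemporaryDirectory() as folder:
    csv_path = Path(folder) / "Companies.csv"
    # index=False leaves out the 0, 1, 2... row labels.
    df.to_csv(csv_path, index=False)
    first_line = csv_path.read_text().splitlines()[0]
    # dtype=str keeps every value as text, as it was scraped.
    restored = pd.read_csv(csv_path, dtype=str)

print(first_line)
print(restored["Name"].tolist())
```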

Key Takeaways:

Web Scraping Skills: Gained hands-on experience in web scraping by pulling a table from a live Wikipedia page into a structured dataset.
Beautiful Soup: Became familiar with Beautiful Soup, a Python library that simplifies web scraping by parsing HTML and XML documents.
Pandas for Data Manipulation: Used Pandas to organize the extracted rows into a DataFrame and export them to a CSV file.

