Solidx74/PublicCompanies-WebScraper

Largest Public Companies Data Analysis & Web Scraping

A Python 3 project that scrapes, cleans, and analyzes data on the largest public companies. It demonstrates web scraping from public sources such as Wikipedia and reports on company revenue, market capitalization, sector, and other key metrics.


Features

  • Web Scraping: Extract company data using requests and BeautifulSoup.
  • Data Cleaning: Preprocess and clean the scraped data using pandas.
  • Data Analysis: Analyze top companies by revenue, market cap, and sector.
  • Visualization: Generate charts and plots with matplotlib and seaborn.
  • Export Options: Save cleaned datasets to Excel for further use.
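The scraping and cleaning steps listed above could look roughly like the sketch below. It is not the repository's actual code: the function names (`fetch_html`, `parse_first_wikitable`, `to_number`), the `wikitable` CSS class, the column names, and the inline sample HTML are all assumptions made for illustration.

```python
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and return its HTML (hypothetical helper, not called below)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def parse_first_wikitable(html: str) -> pd.DataFrame:
    """Turn the first table with class 'wikitable' into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    if table is None:
        raise ValueError("no table with class 'wikitable' found")
    rows = table.find_all("tr")
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        if len(cells) == len(headers):  # skip malformed or spanning rows
            records.append(cells)
    return pd.DataFrame(records, columns=headers)


def to_number(text: str) -> float:
    """Strip footnote markers, currency symbols, and commas; return NaN if nothing numeric remains."""
    cleaned = re.sub(r"\[.*?\]", "", text)       # drop footnotes such as [1]
    cleaned = re.sub(r"[^0-9.\-]", "", cleaned)  # keep digits, dot, minus
    try:
        return float(cleaned)
    except ValueError:
        return float("nan")


# Made-up sample standing in for a downloaded page.
SAMPLE_HTML = """
<table class="wikitable">
<tr><th>Name</th><th>Industry</th><th>Revenue (USD millions)</th></tr>
<tr><td>Example Corp</td><td>Retail</td><td>$1,200[1]</td></tr>
<tr><td>Sample Inc</td><td>Energy</td><td>$950</td></tr>
</table>
"""

df = parse_first_wikitable(SAMPLE_HTML)
df["Revenue (USD millions)"] = df["Revenue (USD millions)"].map(to_number)
print(df)
```

For a live page, `parse_first_wikitable(fetch_html(url))` would replace the sample HTML. The Excel export could then be a single `df.to_excel("companies.xlsx", index=False)` call, assuming an Excel engine such as openpyxl is installed.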

Technologies Used

  • Python 3
  • Jupyter Notebook
  • BeautifulSoup4
  • Requests
  • pandas
  • matplotlib
  • seaborn
  • Excel

Example Output

  • Top 10 companies by revenue
  • Market capitalization distribution across sectors
  • Revenue vs Market Cap scatter plot
  • Sector-wise pie chart
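As a rough illustration of how outputs like these might be produced, the sketch below ranks companies by revenue, totals market capitalization per sector, and draws a bar chart and scatter plot with matplotlib. The company names, figures, column names, and helper functions are invented for the example and are not taken from the repository's dataset or notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data; a real run would use the cleaned, scraped DataFrame instead.
companies = pd.DataFrame(
    {
        "name": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"],
        "sector": ["Tech", "Energy", "Tech", "Retail", "Energy"],
        "revenue": [500.0, 420.0, 310.0, 610.0, 150.0],
        "market_cap": [2000.0, 400.0, 900.0, 450.0, 120.0],
    }
)


def top_by_revenue(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n rows with the highest revenue, largest first."""
    return df.nlargest(n, "revenue").reset_index(drop=True)


def market_cap_by_sector(df: pd.DataFrame) -> pd.Series:
    """Sum market capitalization per sector, largest sector first."""
    return df.groupby("sector")["market_cap"].sum().sort_values(ascending=False)


def plot_overview(df: pd.DataFrame, n: int = 10):
    """Draw a top-n revenue bar chart next to a revenue vs market cap scatter plot."""
    top = top_by_revenue(df, n)
    fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(11, 4))
    ax_bar.barh(top["name"], top["revenue"])
    ax_bar.invert_yaxis()  # put the largest company at the top
    ax_bar.set_title(f"Top {len(top)} companies by revenue")
    ax_bar.set_xlabel("Revenue")
    ax_scatter.scatter(df["revenue"], df["market_cap"])
    ax_scatter.set_title("Revenue vs market cap")
    ax_scatter.set_xlabel("Revenue")
    ax_scatter.set_ylabel("Market cap")
    fig.tight_layout()
    return fig


top3 = top_by_revenue(companies, n=3)
sector_caps = market_cap_by_sector(companies)
fig = plot_overview(companies)
print(top3[["name", "revenue"]])
print(sector_caps)
```

Calling `fig.savefig("overview.png")` would write the chart to disk, and `sector_caps.plot.pie()` is one way to get a sector-wise pie chart from the same aggregate.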

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements.


License

This project is licensed under the MIT License. See LICENSE for details.


Author

Kareeb Sadab
Email: kareebsadab@gmail.com
