A Python 3 project that scrapes, cleans, and analyzes data on the largest public companies. This repository demonstrates web scraping from public sources such as Wikipedia and provides insights into company revenue, market capitalization, sectors, and other key metrics.
- Web Scraping: Extract company data using requests and BeautifulSoup.
- Data Cleaning: Preprocess and clean the scraped data using pandas.
- Data Analysis: Analyze top companies by revenue, market cap, and sector.
- Visualization: Generate charts and plots with matplotlib and seaborn.
- Export Options: Save cleaned datasets to Excel for further use.
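The scraping and cleaning steps might look roughly like the sketch below. It is not the code in this repository's notebook: the function names (`fetch_html`, `parse_first_table`, `to_number`, `clean_companies`), the `wikitable` CSS class, and the User-Agent string are assumptions made for illustration.

```python
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    # Hypothetical User-Agent; replace it with one that identifies your project.
    headers = {"User-Agent": "company-scraper-example/0.1"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_first_table(html: str) -> pd.DataFrame:
    """Turn the first <table class="wikitable"> into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")  # assumed table class
    if table is None:
        raise ValueError("no table with class 'wikitable' found")
    rows = []
    for tr in table.find_all("tr"):
        cells = [c.get_text(" ", strip=True) for c in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    header, body = rows[0], rows[1:]
    # Skip rows whose width differs from the header (e.g. merged cells).
    body = [r for r in body if len(r) == len(header)]
    return pd.DataFrame(body, columns=header)


def to_number(text: str) -> float:
    """Strip footnote markers, currency symbols, and commas, then parse a float."""
    text = re.sub(r"\[.*?\]", "", text)    # drop footnotes such as [1]
    text = re.sub(r"[^0-9.\-]", "", text)  # keep digits, dot, and minus sign
    try:
        return float(text)
    except ValueError:
        return float("nan")                # unparseable cells become NaN


def clean_companies(df: pd.DataFrame, numeric_columns: list[str]) -> pd.DataFrame:
    """Normalize column names, convert numeric columns, and drop duplicate rows."""
    cleaned = df.copy()
    cleaned.columns = [re.sub(r"\[.*?\]", "", c).strip() for c in cleaned.columns]
    for col in numeric_columns:
        cleaned[col] = cleaned[col].map(to_number)
    return cleaned.drop_duplicates().reset_index(drop=True)
```

With a page URL of your choice, `clean_companies(parse_first_table(fetch_html(url)), ["Revenue"])` would yield a cleaned DataFrame, assuming the table has a `Revenue` column. Calling `to_excel("companies.xlsx", index=False)` on the result covers the export step; pandas needs the `openpyxl` package installed to write `.xlsx` files.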
- Python 3
- Jupyter Notebook
- BeautifulSoup4
- Requests
- pandas
- matplotlib
- seaborn
- Excel
- Top 10 companies by revenue
- Market capitalization distribution across sectors
- Revenue vs Market Cap scatter plot
- Sector-wise pie chart
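The four analyses above could be produced along the lines of the following sketch. It assumes a cleaned DataFrame with columns named `Name`, `Sector`, `Revenue`, and `Market Cap`; those column names, the helper names, and the choice of a box plot for the sector distribution are illustrative assumptions rather than the notebook's actual code.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def top_by_revenue(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n companies with the highest revenue, largest first."""
    return df.nlargest(n, "Revenue")[["Name", "Revenue"]]


def market_cap_by_sector(df: pd.DataFrame) -> pd.Series:
    """Total market capitalization per sector, largest first."""
    return df.groupby("Sector")["Market Cap"].sum().sort_values(ascending=False)


def plot_overview(df: pd.DataFrame) -> plt.Figure:
    """Draw the four charts on a 2x2 grid and return the figure."""
    fig, axes = plt.subplots(2, 2, figsize=(12, 9))

    # Top companies by revenue as horizontal bars.
    sns.barplot(data=top_by_revenue(df), x="Revenue", y="Name", ax=axes[0, 0])
    axes[0, 0].set_title("Top companies by revenue")

    # Spread of market capitalization within each sector.
    sns.boxplot(data=df, x="Sector", y="Market Cap", ax=axes[0, 1])
    axes[0, 1].set_title("Market cap distribution by sector")

    # Revenue against market cap, colored by sector.
    sns.scatterplot(data=df, x="Revenue", y="Market Cap", hue="Sector", ax=axes[1, 0])
    axes[1, 0].set_title("Revenue vs market cap")

    # Share of companies in each sector.
    counts = df["Sector"].value_counts()
    axes[1, 1].pie(counts.values, labels=counts.index, autopct="%1.0f%%")
    axes[1, 1].set_title("Companies per sector")

    fig.tight_layout()
    return fig
```

In a Jupyter notebook the returned figure renders inline; in a script, `plot_overview(df).savefig("overview.png")` writes it to disk instead.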
Contributions are welcome! Please open an issue or submit a pull request for any improvements.
This project is licensed under the MIT License. See LICENSE for details.
Kareeb Sadab
Email: kareebsadab@gmail.com