Welcome to the Data Analysis on Hiring Trends project! This project centers on a Python script called scrapper.py that uses the BeautifulSoup (bs4), Selenium, and Pandas libraries to extract data from Bumble Inc.'s page on LinkedIn. The collected data is then cleaned, structured, and stored in a CSV file for further analysis and visualization in Power BI. This README gives an overview of the project and its components, and explains how to use the script, access the CSV data, and visualize it in Power BI.
Before running the scrapper.py script, make sure you have the following components installed on your system:
- Visual Studio Code (VS Code) or any other code editor of your choice
- Python (version 3.6 or above)
- BeautifulSoup (bs4) library
- Selenium library
- Pandas library
- Chrome WebDriver (compatible with your Chrome browser version)
- Power BI Desktop or Power BI Browser (for viewing the visualization)
To get started with the Data Analysis on Hiring Trends project in Visual Studio Code, follow these steps:
- Clone or download the project repository from GitHub.
- Open Visual Studio Code.
- Open the downloaded project folder in Visual Studio Code.
- Install the required libraries listed above by opening the integrated terminal in Visual Studio Code and running the command: pip install beautifulsoup4 selenium pandas
- Obtain the Chrome WebDriver and save it in the project folder.
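To confirm that the installation step worked, a quick check along the following lines may help. It is a minimal sketch and not part of scrapper.py; the helper name missing_modules is hypothetical. Note that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util

# Import names of the libraries the project relies on.
# beautifulsoup4 is installed via pip but imported as "bs4".
REQUIRED_MODULES = ["bs4", "selenium", "pandas"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required libraries are importable.")
```

If any module is reported as missing, rerun the pip command in the same Python environment that VS Code is using.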
The data extraction process in this project utilizes Selenium and BeautifulSoup. Follow these steps to understand the data extraction process:
- The scrapper.py script uses Selenium to automate the browser and navigate to Bumble Inc.'s LinkedIn page.
- Once on the page, BeautifulSoup is used to extract relevant information such as hiring data, job titles, experience, and more.
- Selenium interacts with the webpage elements, while BeautifulSoup parses the HTML content and extracts the desired data.
- The extracted data is stored in variables or data structures for further processing and cleaning.
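The steps above can be sketched roughly as follows. This is an illustration of the Selenium plus BeautifulSoup pattern, not the actual contents of scrapper.py. The function names, the driver_path default, and the CSS classes (job-card, job-title, job-experience) are all assumptions made for the example; the real page markup will differ and has to be inspected in the browser.

```python
from bs4 import BeautifulSoup


def fetch_page_source(url, driver_path="chromedriver"):
    """Open `url` in Chrome via Selenium and return the rendered HTML.

    `driver_path` is an assumed location of the Chrome WebDriver binary.
    """
    # Imported here so the parsing helper below works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    driver = webdriver.Chrome(service=Service(executable_path=driver_path))
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


def parse_job_cards(html):
    """Extract title and experience text from each job card in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Hypothetical selectors: replace them with the ones found on the page.
    for card in soup.select("li.job-card"):
        title = card.select_one(".job-title")
        experience = card.select_one(".job-experience")
        rows.append({
            "title": title.get_text(strip=True) if title else None,
            "experience": experience.get_text(strip=True) if experience else None,
        })
    return rows


# Made-up HTML standing in for a rendered page, so the parser can be tried offline.
sample_html = """
<ul>
  <li class="job-card"><span class="job-title"> Data Analyst </span>
      <span class="job-experience">3 years</span></li>
  <li class="job-card"><span class="job-title">Backend Engineer</span></li>
</ul>
"""
rows = parse_job_cards(sample_html)
print(rows)
```

Against a live page, the two helpers would be chained as parse_job_cards(fetch_page_source(url)), where url is the page you are permitted to scrape. Keeping the parsing separate from the browser automation makes it possible to test the selectors on saved HTML.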
The scrapper.py script cleans the extracted data with Pandas. Follow these steps to understand the data cleaning process:
- After extracting data from the website using BeautifulSoup and Selenium, the script creates a Pandas DataFrame to store the raw data.
- The DataFrame allows for easier manipulation and cleaning of the data.
- The script applies various cleaning techniques such as removing unnecessary columns, handling missing values, converting data types, and performing other transformations as needed.
- Pandas provides functions like drop, fillna, replace, and astype to perform these cleaning operations efficiently.
- Once the data is cleaned and structured, the script exports the cleaned DataFrame to a CSV file using the Pandas to_csv function.
- The exported CSV file contains the cleaned and structured data, ready for further analysis and visualization.
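As a rough illustration of the cleaning steps above, the sketch below applies drop, fillna, replace, astype, and to_csv to a small made-up DataFrame. The column names (title, location, experience_years, raw_html), the "N/A" placeholder, and the output file name are assumptions for the example; the columns that scrapper.py actually produces may differ.

```python
import numpy as np
import pandas as pd


def clean_jobs(raw):
    """Return a cleaned copy of the raw scraped DataFrame."""
    # drop: remove a column that is not needed for analysis.
    cleaned = raw.drop(columns=["raw_html"], errors="ignore")
    # Tidy stray whitespace left over from the HTML.
    cleaned["title"] = cleaned["title"].str.strip()
    # fillna: give missing locations an explicit label.
    cleaned["location"] = cleaned["location"].fillna("Unknown")
    # replace + astype: turn the "N/A" placeholder into a real missing
    # value, then convert the column from text to numbers.
    cleaned["experience_years"] = (
        cleaned["experience_years"].replace("N/A", np.nan).astype(float)
    )
    return cleaned


def save_jobs(cleaned, path):
    """Write the cleaned DataFrame to a CSV file without the index column."""
    cleaned.to_csv(path, index=False)


# Made-up rows standing in for the scraped data.
raw = pd.DataFrame({
    "title": [" Data Analyst ", "Backend Engineer", "Product Manager"],
    "location": ["London", None, "Austin"],
    "experience_years": ["3", "N/A", "5"],
    "raw_html": ["<li>...</li>"] * 3,
})
cleaned = clean_jobs(raw)
print(cleaned)
```

Calling save_jobs(cleaned, "hiring_data.csv") would then write the result to disk; the file name here is only a placeholder.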
Once the scrapper.py script has run successfully, you can access the cleaned and structured data stored in the CSV file. Follow these instructions:
- Locate the generated CSV file in the project folder.
- Open the CSV file using spreadsheet software (e.g., Microsoft Excel, Google Sheets) or a text editor.
- Explore and analyze the data in the CSV file to gain insights into hiring trends and other relevant information.
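Besides a spreadsheet, the CSV can also be explored directly in Python. The sketch below is one possible starting point, assuming columns named title, location, and experience_years; an in-memory sample stands in for the generated file so the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up CSV content standing in for the file produced by the script.
# With the real file, pass its path to pd.read_csv instead.
sample_csv = io.StringIO(
    "title,location,experience_years\n"
    "Data Analyst,London,3.0\n"
    "Backend Engineer,Unknown,\n"
    "Data Analyst,Austin,5.0\n"
)
jobs = pd.read_csv(sample_csv)

# How many postings exist per job title.
postings_per_title = jobs["title"].value_counts()

# Average required experience per title (missing values are skipped).
mean_experience = jobs.groupby("title")["experience_years"].mean()

print(postings_per_title)
print(mean_experience)
```

Summaries like these can be a quick sanity check on the data before it is loaded into Power BI.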
The visualization of the hiring trends data is performed using Power BI. We have created a visualization dashboard using Power BI Desktop. Follow the steps below to access the visualization:
- Launch Power BI Desktop or open Power BI Browser on your computer.
- If using Power BI Desktop, open the provided PBIX file directly. If using Power BI Browser, upload the PBIX file to Power BI and open it.
- The PBIX file contains the pre-built visualization dashboard.
- Connect Power BI Desktop or Power BI Browser to the CSV file generated by the scrapper.py script.
- Customize and interact with the visualizations to gain deeper insights into the hiring trends and patterns.
The Data Analysis on Hiring Trends project enables you to extract data from Bumble Inc.'s LinkedIn page using the scrapper.py script, clean and structure it using Pandas, and store it in a CSV file for analysis. The data extraction process utilizes Selenium and BeautifulSoup, while the data cleaning process utilizes Pandas. You can access the CSV data and visualize it using Power BI. By following the instructions mentioned above, you can successfully run the script in Visual Studio Code, extract and clean the data, store it in a CSV file, and explore the visualization dashboard in Power BI. Feel free to customize the script or the visualization to suit your specific needs.
For any further questions or issues, please contact the project team at ch20btech11012@iith.ac.in. We appreciate your interest in our project and hope it proves useful for analyzing hiring trends.
Thank you for choosing the Data Analysis on Hiring Trends project!
Note: Please ensure that you comply with the LinkedIn and Bumble Inc. terms of service and policies when using the scraper to gather data.