Ekshan267/Data-Analysis-Hackathon

Data-Analysis-Hackathon

Project Name: Data Analysis on Hiring Trends

Introduction

Welcome to the Data Analysis on Hiring Trends project! The project centers on a Python script, scrapper.py, that uses the BeautifulSoup (bs4), Selenium, and Pandas libraries to extract data from Bumble Inc.'s page on LinkedIn. The collected data is cleaned, structured, and stored in a CSV file for analysis and visualization in Power BI. This README gives an overview of the project and explains how to run the script, access the CSV data, and visualize it in Power BI.

Requirements

Before running the scrapper.py script, make sure you have the following components installed on your system:

  • Visual Studio Code (VS Code) or any other code editor of your choice
  • Python (version 3.6 or above)
  • BeautifulSoup (bs4) library
  • Selenium library
  • Pandas library
  • Chrome WebDriver (compatible with your Chrome browser version)
  • Power BI Desktop, or the Power BI service in a web browser (for viewing the visualization)

Getting Started

To get started with the Data Analysis on Hiring Trends project in Visual Studio Code, follow these steps:

  1. Clone or download the project repository from GitHub.
  2. Open Visual Studio Code.
  3. Open the downloaded project folder in Visual Studio Code.
  4. Open the integrated terminal in Visual Studio Code and install the libraries listed in the 'Requirements' section with the command: pip install beautifulsoup4 selenium pandas
  5. Download the Chrome WebDriver (ChromeDriver) build that matches your Chrome browser version and save it in the project folder.
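
Before running the script, it can help to confirm that the libraries from step 4 are importable. The check below is a minimal sketch and is not part of the repository; the helper name missing_modules is hypothetical. Note that the beautifulsoup4 package is imported under the name bs4.

```python
from importlib.util import find_spec

# Import names of the libraries listed under Requirements.
# The beautifulsoup4 package is imported as "bs4".
REQUIRED_MODULES = ["bs4", "selenium", "pandas"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If anything is reported as missing, rerun the pip install command from step 4.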

Data Extraction with Selenium and BeautifulSoup

Data extraction combines Selenium and BeautifulSoup. The process works as follows:

  1. The scrapper.py script uses Selenium to automate the browser and navigate to Bumble Inc.'s LinkedIn page.
  2. Once on the page, BeautifulSoup is used to extract relevant information such as hiring data, job titles, experience, and more.
  3. Selenium interacts with the webpage elements, while BeautifulSoup parses the HTML content and extracts the desired data.
  4. The extracted data is stored in variables or data structures for further processing and cleaning.
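
The division of labor described above can be sketched as follows. This is an illustration under stated assumptions, not the code in scrapper.py: the function names, the CSS class names (job-card, job-title, job-experience), and the sample markup are all hypothetical, and LinkedIn's real markup differs. Calling webdriver.Chrome() with no arguments assumes a recent Selenium release that can locate the driver itself, or a ChromeDriver available on the PATH.

```python
from bs4 import BeautifulSoup


def fetch_page_source(url):
    """Open `url` in Chrome via Selenium and return the rendered HTML."""
    # Imported here so the parsing helper below can be used without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


def parse_job_cards(html):
    """Extract title and experience from each (hypothetical) job card."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="job-card"):
        title = card.find("span", class_="job-title")
        experience = card.find("span", class_="job-experience")
        rows.append({
            "title": title.get_text(strip=True) if title else None,
            "experience": experience.get_text(strip=True) if experience else None,
        })
    return rows


# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<div class="job-card">
  <span class="job-title"> Data Analyst </span>
  <span class="job-experience">2 years</span>
</div>
<div class="job-card">
  <span class="job-title">Backend Engineer</span>
</div>
"""

rows = parse_job_cards(SAMPLE_HTML)
print(rows)
```

Keeping the browser step and the parsing step in separate functions means the parser can be exercised on saved HTML without launching Chrome.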

Data Cleaning with Pandas

The scrapper.py script cleans the extracted data with Pandas. The process works as follows:

  1. After extracting data from the website using BeautifulSoup and Selenium, the script creates a Pandas DataFrame to store the raw data.
  2. The DataFrame allows for easier manipulation and cleaning of the data.
  3. The script applies various cleaning techniques such as removing unnecessary columns, handling missing values, converting data types, and performing other transformations as needed.
  4. Pandas provides functions like drop, fillna, replace, and astype to perform these cleaning operations efficiently.
  5. Once the data is cleaned and structured, the script exports the cleaned DataFrame to a CSV file using the Pandas to_csv function.
  6. The exported CSV file contains the cleaned and structured data, ready for further analysis and visualization.
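
The cleaning steps above might look like the sketch below. The column names, the sample rows, and the choice to treat a missing experience value as 0 are assumptions made for illustration; the actual columns and rules in scrapper.py may differ.

```python
import pandas as pd

# Hypothetical raw rows; the real column names in scrapper.py may differ.
raw = pd.DataFrame({
    "title": ["Data Analyst", "Backend Engineer", "Product Manager"],
    "experience_years": ["2", None, "N/A"],
    "scraped_html": ["<div>..</div>", "<div>..</div>", "<div>..</div>"],
})


def clean(df):
    """Apply the drop / replace / fillna / astype steps to a raw DataFrame."""
    cleaned = df.drop(columns=["scraped_html"])  # remove an unneeded column
    cleaned["experience_years"] = (
        cleaned["experience_years"]
        .replace("N/A", "0")  # normalise a placeholder string
        .fillna("0")          # fill missing values (assumed default: 0)
        .astype(int)          # convert text to integers
    )
    return cleaned


cleaned = clean(raw)

# With no path, to_csv returns the CSV text; pass a file path to write a file.
csv_text = cleaned.to_csv(index=False)
print(csv_text)
```

Passing index=False keeps the DataFrame's row index out of the CSV, which avoids an extra unnamed column when the file is later loaded elsewhere.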

Accessing the CSV Data

Once the scrapper.py script has run successfully, you can access the cleaned and structured data stored in the CSV file. Follow these instructions:

  1. Locate the generated CSV file in the project folder.
  2. Open the CSV file in spreadsheet software (e.g., Microsoft Excel, Google Sheets) or a text editor.
  3. Explore and analyze the data in the CSV file to gain insights into hiring trends and other relevant information.
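
For a quick look at the data without a spreadsheet, a short script can summarize a column. The sketch below uses only the standard library; the helper name count_by_column, the column name title, and the sample rows are hypothetical stand-ins for the generated file.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count how often each value appears in `column` of an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Hypothetical CSV content standing in for the generated file.
sample = io.StringIO(
    "title,experience_years\n"
    "Data Analyst,2\n"
    "Data Analyst,4\n"
    "Backend Engineer,3\n"
)

counts = count_by_column(sample, "title")
for title, n in counts.most_common():
    print(title, n)
```

To run this against the real output, open the generated CSV with open(path, newline="") and pass the file object in place of the sample.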

Data Visualization with Power BI

The visualization of the hiring trends data is performed using Power BI. We have created a visualization dashboard using Power BI Desktop. Follow the steps below to access the visualization:

  1. Launch Power BI Desktop, or open the Power BI service in your web browser.
  2. In Power BI Desktop, open the provided PBIX file. In the Power BI service, upload the PBIX file and open it.
  3. The PBIX file contains the pre-built visualization dashboard.
  4. Point the dashboard's data source at the CSV file generated by the scrapper.py script.
  5. Customize and interact with the visualizations to gain deeper insights into the hiring trends and patterns.

Conclusion

The Data Analysis on Hiring Trends project extracts data from Bumble Inc.'s LinkedIn page with the scrapper.py script, using Selenium and BeautifulSoup, then cleans and structures it with Pandas and stores it in a CSV file. By following the instructions above, you can run the script in Visual Studio Code, inspect the CSV data, and explore the visualization dashboard in Power BI. Feel free to customize the script or the visualization to suit your specific needs.

For any further questions or issues, please contact the project team at ch20btech11012@iith.ac.in. We appreciate your interest in our project and hope it proves useful for analyzing hiring trends.

Thank you for choosing the Data Analysis on Hiring Trends project!

Note: Please ensure that you comply with the LinkedIn and Bumble Inc. terms of service and policies when using the scraper to gather data.
