# Web Scraping Project 1.0
**Project Name - Job Analytics** 

**URL** =  https://www.instahyre.com/python-jobs

**Project Objective** - In this project, I used Python web scraping to extract more than 300 Python job listings from the Instahyre website and compiled them into a dataset with specific details for each listing.

**Description** - This job analytics project uses Selenium to load and page through the listings, Requests to check that the site responds (BeautifulSoup is imported as well), and Pandas to store the results in a dataframe. The extracted fields for each job are the company name and position, location, founding year, employee count, a short company description, required skills, and the job link.

**About the website** - Instahyre is a job posting website where new jobs are posted daily. Its filter section allows users to search for specific jobs of interest.


In [1]:
# python libraries for web scraping 
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
import requests

# for creating a dataframe
import pandas as pd



**Checking the response of the website**

In [2]:
# getting the url 

url = "https://www.instahyre.com/python-jobs"
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"}
response = requests.get(url, headers = headers)

#checking the response
print(response)

<Response [200]>


**Using Selenium library for extracting the data**

In [3]:
web = "https://www.instahyre.com/python-jobs"

#for specifying the web driver
driver = webdriver.Chrome()

# for website automation
driver.get(web)

In [4]:
# empty lists, one per field, to store the data from all pages

job_name = []
location = []
founded = []
employees = []
about = []
skills = []
links = []

# iterate over the result pages
for j in range(80): # 80 pages, because I want more than 300 complete listings and some rows contain null values
    
    n = driver.find_elements(By.XPATH, '//div[@class = "employer-details col-md-10 col-sm-8 col-xs-10"]//div[@class = "employer-job-name"]')
    l = driver.find_elements(By.XPATH, '//div[@class="employer-locations"]/span/span[@class = "ng-binding ng-scope"]')
    f = driver.find_elements(By.XPATH, '//span[@ng-if="opp.employer.company_founded"]/span[@class="ng-binding"]') 
    e = driver.find_elements(By.XPATH, '//span[@ng-if="opp.employer.employee_count"]/span[@class="ng-binding"]')
    a = driver.find_elements(By.XPATH, '//div[@ng-if="opp.employer.instahyre_note"]')
    s = driver.find_elements(By.XPATH, '//div[@class="job-skills ng-scope"]/ul[@class="tags candidate-opp-keywords"]')
    li = driver.find_elements(By.XPATH, '//div[@class="opportunity-action-links opportunity-action-links-desktop col-md-2 col-sm-2 col-xs-12 pull-right"]//a[@target="_blank"]')
    
    for i in range(20): # each page shows 20 listings; read them one by one

        # try/except appends None when a field has no i-th match on this page
        #for name
        try:
            job_name.append(n[i].text)
        except Exception:
            job_name.append(None)
            
        #for location:
        try:
            location.append(l[i].text)
        except Exception:
            location.append(None)
        
        #for founded:
        try:
            founded.append(f[i].text)
        except Exception:
            founded.append(None)
            
        #for employees:
        try:
            employees.append(e[i].text)
        except Exception:
            employees.append(None)
        
        #for about:
        try:
            about.append(a[i].text)
        except Exception:
            about.append(None)
            
        #for skills:
        try:
            skills.append(s[i].text.split("\n"))
        except Exception:
            skills.append(None)
            
        #for links:
        try:
            links.append(li[i].get_attribute("href"))
        except Exception:
            links.append(None)
    
    # go to the next page
    time.sleep(2) # pause for 2 seconds before moving on
    
    next_button = driver.find_element(By.XPATH, '//li[@ng-click="nextPage()"]')    
    driver.execute_script("arguments[0].click();", next_button) # click the button via JavaScript
    driver.implicitly_wait(2) # element lookups now wait up to 2 seconds for a match (this is a timeout, not a fixed pause)
    
driver.quit() # after completing the task, quit the driver
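
The loop above pairs fields by position across seven page-wide lists, so a listing that lacks one field (for example, no founding year) makes that list shorter and can shift its values onto the wrong job. One way to keep fields together is to query each listing's container on its own. The following is a minimal sketch of that idea, not the method used in this notebook: `first_text`, `extract_card`, and `extract_page` are hypothetical helpers, and the XPaths are relative versions (leading `.`) of the ones above, which assumes those elements sit inside a single per-listing container.

```python
from typing import Dict, List, Optional

# Same string value as selenium.webdriver.common.by.By.XPATH, written out so
# these helpers need no import of their own.
XPATH = "xpath"

# Relative XPaths (leading "."), adapted from the page-wide ones used above.
# Assumption: each of these elements lives inside one listing's container.
FIELD_XPATHS: Dict[str, str] = {
    "job_name": './/div[@class="employer-job-name"]',
    "founded": './/span[@ng-if="opp.employer.company_founded"]/span[@class="ng-binding"]',
    "employees": './/span[@ng-if="opp.employer.employee_count"]/span[@class="ng-binding"]',
    "about": './/div[@ng-if="opp.employer.instahyre_note"]',
}


def first_text(card, xpath: str) -> Optional[str]:
    """Return the text of the first match inside this card, or None if absent."""
    matches = card.find_elements(XPATH, xpath)
    return matches[0].text if matches else None


def extract_card(card) -> Dict[str, Optional[str]]:
    """Build one record from one listing, so every field comes from the same card."""
    return {field: first_text(card, xpath) for field, xpath in FIELD_XPATHS.items()}


def extract_page(cards) -> List[Dict[str, Optional[str]]]:
    """Turn the listing containers found on one page into a list of records."""
    return [extract_card(card) for card in cards]
```

With a live driver, `cards` would be the result of a `driver.find_elements(...)` call that selects the element wrapping one listing. That selector is not given in this notebook and would have to be read from the page. The resulting list of records can be passed directly to `pd.DataFrame(...)`.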


# Creating a dataframe

In [6]:
df = pd.DataFrame({"Company Name & Postion": job_name, "Location": location, 
                   "Founded": founded, "Employees": employees,
                   "About": about, "Skills": skills, "Job portal Link": links})

# Cleaning

In [10]:
#checking the null values
df.isnull().sum().any()

True

In [22]:
# removing the rows with null values and creating a new dataframe with a fresh index

df2 = df.dropna().reset_index(drop = True)
df2

| | Company Name & Postion | Location | Founded | Employees | About | Skills | Job portal Link |
|---|---|---|---|---|---|---|---|
| 0 | Adobe - Site Reliability Engineer | Job available in Noida | Founded in 1982 | More than 1000 employees | Founded in 1982 and headquartered in San Jose,... | [Python, AWS, Ansible, Build Tools, CI - CD, C... | https://www.instahyre.com/job-312716-site-reli... |
| 1 | ANZ Bank - Data Engineer | Job available in Bangalore | Founded in 1900 | More than 1000 employees | ANZ Bank was formed in 1835 in London. They pr... | [Python] | https://www.instahyre.com/job-312745-data-engi... |
| 2 | Broadridge - AI Senior Developer | Job available in Bangalore | Founded in 1962 | More than 1000 employees | Broadridge is a financial services organizatio... | [Python, Machine Learning, SQL, TensorFlow, AWS] | https://www.instahyre.com/job-312867-ai-senior... |
| 3 | Broadridge - QA Automation Tester (Python) | Job available in Bangalore | Founded in 1962 | More than 1000 employees | Broadridge is a financial services organizatio... | [Python, Automation Testing, Selenium, Quality... | https://www.instahyre.com/job-312785-qa-automa... |
| 4 | CS Soft Solutions - API Test Engineer | Job available in Work From Home | Founded in 2009 | 200 - 500 employees | CS Soft Solutions is an IT service company tha... | [Python, API Testing, Selenium, Automation Tes... | https://www.instahyre.com/job-312786-api-test-... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 498 | epiFi - Product Analyst | Job available in Bangalore | Founded in 2019 | 50 - 200 employees | Ganit provides solutions at the intersection o... | [Python, Power BI, QlikView, R, SQL, Tableau] | https://www.instahyre.com/job-298168-product-a... |
| 499 | Ganit - Senior Data Engineer - GCP | Job available in Bangalore | Founded in 2017 | 50 - 200 employees | HSBC is a British multinational banking and fi... | [Python, Google Cloud, Spark, Scala, ETL] | https://www.instahyre.com/job-298251-senior-da... |
| 500 | HSBC - Software Test Engineer | Job available in Bangalore | Founded in 1865 | More than 1000 employees | Kenvue is a healthcare company and a subsidiar... | [Python, Selenium, Manual Testing, Automation ... | https://www.instahyre.com/job-298516-software-... |
| 501 | Kenvue - Principal Engineer | Job available in Bangalore | Founded in 2022 | More than 1000 employees | Luxoft is a leading provider of software devel... | [Python, DevOps, AWS, Azure, Ansible] | https://www.instahyre.com/job-298541-principal... |
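
The text columns in the output above still carry fixed phrases such as "Founded in" and "Job available in". If bare values are wanted for later analysis, they could be stripped with small helpers like the ones below. This is a sketch under the assumption that every value follows the patterns visible in the table; `parse_founded` and `parse_location` are hypothetical names, not part of the notebook.

```python
import re
from typing import Optional


def parse_founded(text: str) -> Optional[int]:
    """'Founded in 1982' -> 1982; None if no four-digit year is present."""
    match = re.search(r"\b(\d{4})\b", text)
    return int(match.group(1)) if match else None


def parse_location(text: str) -> str:
    """'Job available in Noida' -> 'Noida'; other text is returned stripped."""
    prefix = "Job available in "
    if text.startswith(prefix):
        return text[len(prefix):].strip()
    return text.strip()


print(parse_founded("Founded in 1982"))
print(parse_location("Job available in Work From Home"))
```

Applied to the dataframe, these could be used as `df2["Founded"].map(parse_founded)` and `df2["Location"].map(parse_location)`.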


In [23]:
df2.isnull().sum()

Company Name & Postion    0
Location                  0
Founded                   0
Employees                 0
About                     0
Skills                    0
Job portal Link           0
dtype: int64
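
The null check and drop steps can also be tried on a small hand-made dataframe. The sketch below uses a few illustrative rows rather than the scraped data. It shows per-column null counts, which say more than a single `True`, and that `dropna` and `reset_index` each return a new dataframe instead of changing the original.

```python
import pandas as pd

# Illustrative rows only; None marks a field that was missing on the page.
toy = pd.DataFrame({
    "Location": ["Job available in Noida", None, "Job available in Bangalore"],
    "Founded": ["Founded in 1982", "Founded in 1900", None],
})

null_counts = toy.isnull().sum()      # number of nulls in each column
has_nulls = bool(null_counts.any())   # True if any column contains a null

# dropna() keeps only complete rows; reset_index(drop=True) renumbers from 0.
# Both return new objects, so the result has to be assigned to a name.
clean = toy.dropna().reset_index(drop=True)

print(null_counts)
print(clean)
```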

# Exporting the dataframe to an Excel file

In [26]:
df2.to_excel(r"C:\Users\CW\OneDrive\Desktop\MASAI\PROJECTS\PYTHON PROJECTS\Web Scraping project 1\Job detail.xlsx", index = False)