
## **Problem Statement: Navigating the Data Science Job Landscape**

🚀 Unleash your creativity in crafting a solution that taps into the heartbeat of the data science job market! Envision an ingenious project that seamlessly wields cutting-edge web scraping techniques and illuminating data analysis.

🔍 Your mission? To engineer a tool that effortlessly gathers job listings from a multitude of online sources, extracting pivotal nuggets such as job descriptions, qualifications, locations, and salaries.

🧩 However, the true puzzle lies in deciphering this trove of data. Can your solution discern patterns that spotlight the most coveted skills? Are there threads connecting job types to compensation packages? How might it predict shifts in industry demand?

🎯 The core objectives of this challenge are as follows:

1. Web Scraping Mastery: Forge an adaptable and potent web scraping mechanism. Your creation should adeptly harvest data science job postings from a diverse array of online platforms. Be ready to navigate evolving website structures and process hefty data loads.

2. Data Symphony: Skillfully distill vital insights from the harvested job listings. Extract and cleanse critical information like job titles, company names, descriptions, qualifications, salaries, locations, and deadlines. Think data refinement and organization.

3. Market Wizardry: Conjure up analytical tools that draw meaningful revelations from the gathered data. Dive into the abyss of job demand trends, geographic distribution, salary variations tied to experience and location, favored qualifications, and emerging skill demands.

4. Visual Magic: Weave a tapestry of visualization magic. Design captivating charts, graphs, and visual representations that paint a crystal-clear picture of the analyzed data. Make these visuals the compass that guides users through job market intricacies.

🌐 While the web scraping universe is yours to explore, consider these platforms as potential stomping grounds:

* LinkedIn Jobs
* Indeed
* Naukri
* Glassdoor
* AngelList

🎈 Your solution should not only decode the data science job realm but also empower professionals, job seekers, and recruiters to harness the dynamic shifts of the industry. The path is open, the challenge beckons – are you ready to embark on this exciting journey?








## **Modular code**

# **Scraping the websites**

**Import libraries**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns



**Install the Requests and BeautifulSoup libraries**

In [2]:
! pip install beautifulsoup4
! pip install requests



**Import libraries for web scraping**

In [3]:
# Import the necessary libraries
import requests
from bs4 import BeautifulSoup
import re
import csv

## **Scraping the TimesJobs website**

In [6]:

# URL of the TimesJobs search page for "Data Science" jobs
url = 'https://www.timesjobs.com/candidate/job-search.html?searchType=personalizedSearch&from=submit&searchTextSrc=as&searchTextText=%22Data+Science%22&txtKeywords=%22Data+Science%22&txtLocation='

# Send GET request to the URL
response = requests.get(url)


# Parse HTML content
soup = BeautifulSoup(response.content, 'html.parser')
# Find all job cards
job_cards = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')

print(job_cards)



[<li class="clearfix job-bx wht-shd-bx">
<header class="clearfix">
<!--
-->
<!-- -->
<h2>
<a href="https://www.timesjobs.com/job-detail/data-science-hyrefox-consultants-chennai-5-to-8-yrs-jobid-dMBdCrIRA0NzpSvf__PLUS__uAgZw==&amp;source=srp" onclick="logViewUSBT('view','69115164','data science  ,  data cleaning  ,  dashboards','Chennai','5 - 8','IT Software : Software Products &amp; Services','1','as' )" target="_blank">
<strong class="blkclor">Data</strong> <strong class="blkclor">Science</strong></a> </h2>
<h3 class="joblist-comp-name">
    HyreFox Consultants
    
    </h3>
</header>
<ul class="top-jd-dtl clearfix">
<li><i class="material-icons">card_travel</i>5 - 8 yrs</li>
<li>
<i class="material-icons">location_on</i>
<span title="Chennai">Chennai</span>
</li>
</ul>
<ul class="list-job-dtl clearfix">
<li>
<label>Job Description:</label>
Must have experience in any programming languageExpertise in data science programming languageResponsible for the design ,  development ,  an
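
The request above is sent without a timeout or a status check, so a blocked or failed request would be parsed as if it were a results page. A minimal sketch of a more defensive version follows. The `User-Agent` string, the 10-second timeout, and the function names `fetch_job_cards` and `parse_job_cards` are assumptions for illustration, not part of the original notebook.

```python
import requests
from bs4 import BeautifulSoup


def parse_job_cards(html):
    """Return the job-card <li> elements found in a page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # A CSS selector matches the three classes regardless of their order
    # in the class attribute.
    return soup.select("li.clearfix.job-bx.wht-shd-bx")


def fetch_job_cards(url, timeout=10):
    """Download a search page and return its job cards."""
    # Assumed header: some sites reject the default requests User-Agent.
    headers = {"User-Agent": "Mozilla/5.0 (compatible; job-scraper-demo)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise an HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return parse_job_cards(response.text)
```

Because `parse_job_cards` takes raw HTML, it can be tried on a saved copy of the page without touching the network.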

In [71]:
lst = soup.find('ul', class_="top-jd-dtl clearfix")

In [107]:
lst1 = soup.find('li',class_="clearfix job-bx wht-shd-bx")

In [111]:
print(lst1.text.strip())

Data Science 

    HyreFox Consultants
    
    


card_travel5 - 8 yrs

location_on
Chennai




Job Description:
Must have experience in any programming languageExpertise in data science programming languageResponsible for the design ,  development ,  and maintenance of data pipelinesPer... More Details


KeySkills:

data science  ,  data cleaning  ,  dashboards
        
       





Apply




Posted 1 day ago


In [72]:
print(lst.text.strip())

card_travel5 - 8 yrs

location_on
Chennai


In [73]:
print(lst.text.strip().split('-')[1])

 8 yrs

location_on
Chennai


In [74]:
print(lst.text.strip().split('-')[1].split('\n'))

[' 8 yrs', '', 'location_on', 'Chennai']


In [75]:
print(lst.text.strip().split('-')[1].split('\n')[0])

 8 yrs


In [81]:
print(lst.text.strip().split('-')[1].split('\n')[3:])

['Chennai']
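
Splitting on `-` keeps only the upper end of the experience range and relies on the location sitting at a fixed line offset. As a hedged alternative, the same text can be parsed with a regular expression and the `location_on` label. `parse_experience_location` is a hypothetical helper, and the sample string reproduces the text printed above.

```python
import re


def parse_experience_location(text):
    """Parse the text of a 'top-jd-dtl' block into (min_years, max_years, locations)."""
    # Experience looks like "5 - 8 yrs"; capture both ends of the range.
    match = re.search(r"(\d+)\s*-\s*(\d+)\s*yrs", text)
    if match:
        min_years, max_years = int(match.group(1)), int(match.group(2))
    else:
        min_years, max_years = None, None
    # Everything after the 'location_on' icon label is the location text.
    _, _, after_icon = text.partition("location_on")
    locations = [part.strip() for part in after_icon.split(",") if part.strip()]
    return min_years, max_years, locations


sample = "card_travel5 - 8 yrs\n\nlocation_on\nChennai"
print(parse_experience_location(sample))  # (5, 8, ['Chennai'])
```

Returning the years as integers also makes later numeric analysis of experience possible without further string handling.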


In [49]:
lst = soup.find('ul', class_="list-job-dtl clearfix")

In [51]:
print(lst.text.strip())

Job Description:
Must have experience in any programming languageExpertise in data science programming languageResponsible for the design ,  development ,  and maintenance of data pipelinesPer... More Details


KeySkills:

data science  ,  data cleaning  ,  dashboards


In [53]:
print(lst.text.strip().split(':')[1].split('\n')[1])

Must have experience in any programming languageExpertise in data science programming languageResponsible for the design ,  development ,  and maintenance of data pipelinesPer... More Details


In [61]:
print(lst.text.strip().split(':')[2].split('\n')[2:])

['data science  ,  data cleaning  ,  dashboards']


In [59]:
print(lst.text.strip().split(':')[1:])

['\r\nMust have experience in any programming languageExpertise in data science programming languageResponsible for the design ,  development ,  and maintenance of data pipelinesPer... More Details\n\n\nKeySkills', '\n\ndata science  ,  data cleaning  ,  dashboards']
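
Splitting on `:` works for the card above, but it cuts the text short whenever the description itself contains a colon. One possible alternative is to cut on the two labels instead. `parse_description_skills` is a hypothetical helper, and the sample text is made up to include such a colon.

```python
def parse_description_skills(text):
    """Split the text of a 'list-job-dtl' block into (description, skills)."""
    # Cut on the labels themselves, so a colon inside the description is harmless.
    _, _, rest = text.partition("Job Description:")
    description, _, skills_text = rest.partition("KeySkills:")
    # Drop the trailing "More Details" link text and collapse whitespace.
    description = " ".join(description.replace("More Details", "").split())
    skills = [skill.strip() for skill in skills_text.split(",") if skill.strip()]
    return description, skills


# Hypothetical block text whose description contains a colon.
sample = (
    "Job Description:\r\nBuild dashboards. Tools: SQL, Python... More Details"
    "\n\n\nKeySkills:\n\ndata science  ,  data cleaning  ,  dashboards"
)
description, skills = parse_description_skills(sample)
print(description)  # Build dashboards. Tools: SQL, Python...
print(skills)       # ['data science', 'data cleaning', 'dashboards']
```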


In [100]:
job_title = []
company_name = []
job_description = []
skills = []
location = []
experience = []


# Extract each field from every job card
for i in job_cards:
  # Text of the two detail blocks, reused for several fields
  details = i.find('ul', class_="list-job-dtl clearfix").text.strip()
  summary = i.find('ul', class_="top-jd-dtl clearfix").text.strip()

  job_title.append(i.find('h2').text.strip())
  company_name.append(i.find('h3', class_="joblist-comp-name").text.strip())
  # First line after the "Job Description:" label
  job_description.append(details.split(':')[1].split('\n')[1])
  # Lines after the "KeySkills:" label
  skills.append(details.split(':')[2].split('\n')[2:])
  # Upper end of the experience range, e.g. " 8 yrs" from "5 - 8 yrs"
  experience.append(summary.split('-')[1].split('\n')[0])
  # Lines after the "location_on" icon label
  location.append(summary.split('-')[1].split('\n')[3:])


dicts = {'job_title':job_title, 'company_name':company_name, 'job_description':job_description, 'skills':skills, 'experience':experience,'location':location}

df = pd.DataFrame(dicts)
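
`find` returns `None` when a tag is missing, so a single card without, say, a company name would stop the loop above with an `AttributeError`. A minimal sketch of a guarded per-card extractor is shown below. `clean_text` and `extract_card` are hypothetical helpers, and the HTML snippet is a trimmed stand-in for a real job card.

```python
from bs4 import BeautifulSoup


def clean_text(tag):
    """Collapse whitespace in a tag's text; return '' when the tag is missing."""
    return " ".join(tag.get_text().split()) if tag is not None else ""


def extract_card(card):
    """Return the basic fields of one job card as a dict."""
    return {
        "job_title": clean_text(card.find("h2")),
        "company_name": clean_text(card.find("h3", class_="joblist-comp-name")),
        # Missing in the sample below, so this comes back as ''.
        "experience_and_location": clean_text(card.find("ul", class_="top-jd-dtl")),
    }


sample_html = """
<li class="clearfix job-bx wht-shd-bx">
  <h2><a href="#"><strong>Data</strong> <strong>Science</strong></a></h2>
  <h3 class="joblist-comp-name">
    HyreFox Consultants
  </h3>
</li>
"""
card = BeautifulSoup(sample_html, "html.parser").find("li")
print(extract_card(card))
```

A list of such dicts can be passed straight to `pd.DataFrame`, which avoids keeping six parallel lists in sync.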

In [101]:
df.head(9)

| | job_title | company_name | job_description | skills | experience | location |
|---|---|---|---|---|---|---|
| 0 | Data Science | HyreFox Consultants | Must have experience in any programming langua... | [data science , data cleaning , dashboards] | 8 yrs | [Chennai] |
| 1 | Data Science | tcg digital solutions pvt ltd | Education Masters / Bachelors degree in Comput... | [ \r, data analytics , functi... | 5 yrs | [Kolkata] |
| 2 | Data Science | innefu labs pvt. ltd. | Location | [] | 6 yrs | [Delhi, Delhi/NCR] |
| 3 | Data Science | bprise pvt ltd | Develop and plan required analytic projects in... | [ \r, hive , algorithms , a... | 6 yrs | [Mumbai] |
| 4 | Data Science Internship in Ahmedabad | Maxgen Technologies\r\n (More Jobs) | Maxgen Technologies Pvt Ltd is it company base... | [ \r, .\r, \r, ,... | 1 yrs | [Ahmedabad, Mehsana, Rajkot, Surat, Surend... |
| 5 | Data Science Internship in Ahmedabad | Maxgen Technologies\r\n (More Jobs) | Maxgen Technologies Pvt Ltd is it company base... | [ \r, .\r, \r, ,... | 1 yrs | [Ahmedabad, Mehsana, Rajkot, Surat, Surend... |
| 6 | Data Science-LLM | LTIMindtree Ltd.\r\n (More Jobs) | 8+ years of IT experience and minimum of 5+ ye... | [ \r, LLM , "Large language m... | 12 yrs | [] |
| 7 | Data Science Internship in Ahmedabad | Maxgen Technologies\r\n (More Jobs) | Maxgen Technologies pvt ltd offers live projec... | [ \r, .] | 1 yrs | [Ahmedabad, Bhavnagar, Gandhinagar, Jamnaga... |
| 8 | Data Science Internship in Pune | Maxgen Technologies\r\n (More Jobs) | Maxgen Technologies pvt ltd offering live proj... | [ \r, .\r, \r, ,... | 1 yrs | [Pune, Jalgaon, Kolhapur, Nagpur, Solapur] |


In [84]:
df.shape

(25, 6)

In [85]:
df.isnull().sum()

job_title          0
company_name       0
job_description    0
skills             0
experience         0
location           0
dtype: int64

In [112]:
df.columns

Index(['job_title', 'company_name', 'job_description', 'skills', 'experience',
       'location'],
      dtype='object')

In [88]:
df['job_title'].value_counts

<bound method IndexOpsMixin.value_counts of 0                             Data Science
1                             Data Science
2                             Data Science
3                             Data Science
4     Data Science Internship in Ahmedabad
5     Data Science Internship in Ahmedabad
6                         Data Science-LLM
7     Data Science Internship in Ahmedabad
8          Data Science Internship in Pune
9     Data Science Internship In Ahmedabad
10    Data Science Internship In Ahmedabad
11    Data Science Internship in Ahmedabad
12          Data Science Classroom Trainer
13    Data Science Internship in Ahmedabad
14    Data Science Internship in Ahmedabad
15    Data Science Internship in Ahmedabad
16         Data Science Internship In Pune
17         Data Science Internship in Pune
18         Data Science Internship In Pune
19         Data Science Internship In Pune
20         Data Science Internship in Pune
21         DATA SCIENCE INTERNSHIP IN PUNE
22        

In [89]:
df['company_name'].value_counts

<bound method IndexOpsMixin.value_counts of 0                           HyreFox Consultants
1                 tcg digital solutions pvt ltd
2                         innefu labs pvt. ltd.
3                                bprise pvt ltd
4       Maxgen Technologies\r\n     (More Jobs)
5       Maxgen Technologies\r\n     (More Jobs)
6          LTIMindtree Ltd.\r\n     (More Jobs)
7       Maxgen Technologies\r\n     (More Jobs)
8       Maxgen Technologies\r\n     (More Jobs)
9       Maxgen Technologies\r\n     (More Jobs)
10      Maxgen Technologies\r\n     (More Jobs)
11      Maxgen Technologies\r\n     (More Jobs)
12    NARESH I TECHNOLOGIES\r\n     (More Jobs)
13      Maxgen Technologies\r\n     (More Jobs)
14      Maxgen Technologies\r\n     (More Jobs)
15      Maxgen Technologies\r\n     (More Jobs)
16      Maxgen Technologies\r\n     (More Jobs)
17      Maxgen Technologies\r\n     (More Jobs)
18      Maxgen Technologies\r\n     (More Jobs)
19      Maxgen Technologies\r\n     (More Jo

In [90]:
df['job_description'].value_counts

<bound method IndexOpsMixin.value_counts of 0     Must have experience in any programming langua...
1     Education Masters / Bachelors degree in Comput...
2                                             Location 
3     Develop and plan required analytic projects in...
4     Maxgen Technologies Pvt Ltd is it company base...
5     Maxgen Technologies Pvt Ltd is it company base...
6     8+ years of IT experience and minimum of 5+ ye...
7     Maxgen Technologies pvt ltd offers live projec...
8     Maxgen Technologies pvt ltd offering live proj...
9     Maxgen Technologies Pvt Ltd is it company base...
10    Maxgen Technologies pvt ltd offers live projec...
11    Maxgen technologies pvt ltd offering live proj...
12    Designation / Position - Data Science Classroo...
13    Maxgen Technologies pvt ltd offers live projec...
14    Maxgen Technologies Pvt ltd offers live projec...
15    Maxgen Technologies Pvt ltd offers live projec...
16    Maxgen technologies pvt ltd offering college p...
17  

In [115]:
df['experience'].value_counts

<bound method IndexOpsMixin.value_counts of 0       8 yrs
1       5 yrs
2       6 yrs
3       6 yrs
4       1 yrs
5       1 yrs
6      12 yrs
7       1 yrs
8       1 yrs
9       1 yrs
10      1 yrs
11      1 yrs
12     10 yrs
13      1 yrs
14      1 yrs
15      1 yrs
16      1 yrs
17      1 yrs
18      1 yrs
19      1 yrs
20      1 yrs
21      1 yrs
22      1 yrs
23      1 yrs
24      1 yrs
Name: experience, dtype: object>
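
The outputs above start with `<bound method ...>` because `value_counts` is written without parentheses, which returns the method object itself rather than calling it. A short illustration of the difference follows. The series holds the first six experience values from the output above with whitespace trimmed, and the variable names are illustrative.

```python
import pandas as pd

# First six experience values from the scraped data, trimmed of whitespace.
experience = pd.Series(
    ["8 yrs", "5 yrs", "6 yrs", "6 yrs", "1 yrs", "1 yrs"], name="experience"
)

method = experience.value_counts    # no parentheses: a reference to the method
counts = experience.value_counts()  # parentheses: counts each distinct value

print(callable(method))   # True
print(counts["6 yrs"])    # 2
```

Calling `df['experience'].value_counts()` on the full frame would give the number of listings per experience value in the same way.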

In [92]:
df['location'].value_counts

<bound method IndexOpsMixin.value_counts of 0                                             [Chennai]
1                                             [Kolkata]
2                                   [Delhi,  Delhi/NCR]
3                                              [Mumbai]
4     [Ahmedabad,  Mehsana,  Rajkot,  Surat,  Surend...
5     [Ahmedabad,  Mehsana,  Rajkot,  Surat,  Surend...
6                                                    []
7     [Ahmedabad,  Bhavnagar,  Gandhinagar,  Jamnaga...
8        [Pune,  Jalgaon,  Kolhapur,  Nagpur,  Solapur]
9     [Ahmedabad,  Mehsana,  Rajkot,  Surat,  Surend...
10    [Ahmedabad,  Bhavnagar,  Gandhinagar,  Jamnaga...
11    [Ahmedabad,  Mehsana,  Rajkot,  Surat,  Surend...
12                                                   []
13    [Ahmedabad,  Mehsana,  Rajkot,  Surat,  Surend...
14    [Ahmedabad,  Bhavnagar,  Gandhinagar,  Jamnaga...
15    [Ahmedabad,  Bhavnagar,  Gandhinagar,  Jamnaga...
16       [Pune,  Jalgaon,  Kolhapur,  Nagpur,  Solapur]
17  

In [116]:
df.drop('skills',axis = 1,  inplace  = True)

In [117]:
df.head()

| | job_title | company_name | job_description | experience | location |
|---|---|---|---|---|---|
| 0 | Data Science | HyreFox Consultants | Must have experience in any programming langua... | 8 yrs | [Chennai] |
| 1 | Data Science | tcg digital solutions pvt ltd | Education Masters / Bachelors degree in Comput... | 5 yrs | [Kolkata] |
| 2 | Data Science | innefu labs pvt. ltd. | Location | 6 yrs | [Delhi, Delhi/NCR] |
| 3 | Data Science | bprise pvt ltd | Develop and plan required analytic projects in... | 6 yrs | [Mumbai] |
| 4 | Data Science Internship in Ahmedabad | Maxgen Technologies\r\n (More Jobs) | Maxgen Technologies Pvt Ltd is it company base... | 1 yrs | [Ahmedabad, Mehsana, Rajkot, Surat, Surend... |


In [118]:
# Save the filtered data to 'Jobs Data.csv' without the index column
df.to_csv('Jobs Data.csv',index=False)
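
The `location` column holds Python lists, and some company names still carry a `(More Jobs)` suffix with embedded line breaks. Both are awkward to work with once written to CSV. One possible cleanup before saving is sketched below on a small stand-in frame. `df_demo` and its two rows are illustrative, and the shape of the location lists is an assumption based on the output above.

```python
import pandas as pd

# Stand-in for the scraped frame; the values mimic rows shown earlier.
df_demo = pd.DataFrame({
    "company_name": ["HyreFox Consultants", "Maxgen Technologies\r\n     (More Jobs)"],
    "location": [["Chennai"], ["Pune,  Jalgaon,  Kolhapur"]],
})

# Drop the "(More Jobs)" suffix, then collapse any leftover whitespace.
df_demo["company_name"] = (
    df_demo["company_name"]
    .str.replace("(More Jobs)", "", regex=False)
    .str.split()
    .str.join(" ")
)


def join_locations(lines):
    """Flatten a list of location lines into one comma-separated string."""
    cities = [city.strip() for line in lines for city in line.split(",")]
    return ", ".join(city for city in cities if city)


df_demo["location"] = df_demo["location"].apply(join_locations)
print(df_demo)
```

Applied to the real frame, the same two steps would run on `df` just before the `to_csv` call.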

## **Summary and technical document**

I extracted data from the https://www.timesjobs.com website by fetching its search results for "Data Science" jobs.
The scraped dataset has 25 records and 6 columns; after dropping `skills`, the saved file has 5 columns.
Most listings are data science roles, the majority of them internships.
Every listing states the experience required to apply.

