# **Scraping the Full Time Jobs on Apna .com**

![](https://i.imgur.com/6o1URlU.png)

* **Scraping :- Web scraping is an automatic method to obtain large amounts of data from websites. Most of this data is unstructured data in an HTML format which is then converted into structured data in a spreadsheet or a database so that it can be used in various applications.**

* **https://apna.co/jobs/ is a job consult website from you can get information about job vacancies in many companies. Here you can apply for job according to your profession, location,companies,qualification.On the site you can easily see the about job title, company name,location of office and salary which is company offered.**

* **we will scrape https://apna.co/jobs/ to get the details of jobs like their Jobs-Title,Company-Name,Location, and Salary using python libraries [requests](https://datagy.io/python-requests/) and [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) Then we will fetch the data in .CSV format using Pandas Library for further analysis.**

* **Outline of the steps we will follow:-**
1. **download the web-page using 'Requests'.**
2. **Parse the HTML Source code using 'BeautifulSoup'.**
3. **Extract the Job-Title,Company-Name,Location,Salary from the web-page.**
4. **Complile the extracted information into python lists and Dictionaries.**
5. **Extract and combine data from multiple pages.**
6. **Save the Extracted information into .CSV format**

* **At the end of the project we will create the csv file in the following format**
```
  Job-Name, Company, Location, Salary
  Business Head, Kalandoor Entertainments Private Limited,Elamakkara, ₹1,00,000 - ₹1,49,999............................................**
````

# How to Run code

You can execute the code using "Run" buttom at the top of this page and selecting "Run on Binder". 
you can make changes and save your own version of notebook to [jovian](https://jovian.com/) by executing the cells.


* firstly we need to install and import the library to download the webpages

In [1]:
!pip install requests --upgrade --quiet
!pip install BeautifulSoup4 --upgrade --quiet

In [2]:
import requests
from bs4 import BeautifulSoup

In [3]:
url="https://apna.co/jobs/full_time-jobs?page=1&work_type=full_time"

In [4]:
# using variable to get the webpage
response=requests.get(url,headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'})

In [5]:
# Checking the status code of the downloaded page
response.status_code

200

In [6]:
# checking the length of words 
len(response.text)

220032

In [7]:
page_content=response.text

In [8]:
# Checking the first 1000 words of the website
page_content[:1000]

'<!DOCTYPE html><html lang="en"><head><meta charSet="utf-8"/><link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png"/><link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png"/><link rel="manifest" href="/site.webmanifest"/><link rel="preconnect" href="https://cdn.apna.co"/><meta name="theme-color" content="#4d3951"/><meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1"/><title>Full Time Jobs – Find 24,731 Full Time Job Vacancies  | apna.co</title><meta name="title" content="Full Time Jobs – Find 24,731 Full Time Job Vacancies  | apna.co"/><meta name="description" content="25-Aug-2023 – Apply for 24,731 Full Time Jobs on apna. Register for Free &amp; Find ✓ Online ✓ Work from Home ✓ Freshers ✓ Women Job Vacancies that are Full Time "/><meta name="image" content="https://apna.co/apna-time-icon.png"/><link rel="canonical" href="https://apna.co/jobs/f

In [9]:
# Creation a file and loading the page contents in it.

with open('webpage.html','w') as f:
    f.write(page_content)

# Use BeautifulSoup to Parse and Extract the Information

* Using BeautifulSoup Library to extract information of the web-page.

In [10]:
doc=BeautifulSoup(page_content,'html.parser')

In [11]:
# Check the type of doc which will be beautifulSoup object.
type(doc)

bs4.BeautifulSoup

In [12]:
# Checking the Title of the webpage
doc.title

<title>Full Time Jobs – Find 24,731 Full Time Job Vacancies  | apna.co</title>

# Using Properties and Methods to Extract the Required Information.

## Creating Function to grab all Jobs Title

In [13]:
def job(doc):
    job_name=[]
    for i in doc.find_all('div',class_='JobListCardstyles__JobTitle-ffng7u-7 cuaBGE'):
        job_name.append(i.text)
    return job_name

In [14]:
job(doc)

['Business Head',
 'Gynecologist',
 'Gynecologist',
 'UI Developer',
 'ASP.NET Developer',
 'Pediatrician And Neonatologist',
 'Freight Forwarder']

## Creating Function to  Grab the Name of Companies 

In [15]:
def  company(doc):
    company_name=[]
    for i in doc.find_all('div',class_='JobListCardstyles__JobCompany-ffng7u-8 gguURM'):
        company_name.append(i.text)
    return company_name

In [16]:
company(doc)

['Kalandoor Entertainments Private Limited',
 'Villemed Healthcare',
 'Bedi Hospital',
 'CBL Solutions',
 'Enix Software Private Limited',
 'Bedi Hospital',
 'Agraga Solutions']

## Extract the location of job

In [17]:
location=doc.find('div',class_='JobListCardstyles__DisplayFlexCenter-ffng7u-10 fNeylV')

In [18]:
location

<div class="JobListCardstyles__DisplayFlexCenter-ffng7u-10 fNeylV"><svg fill="none" height="16" width="16" xmlns="http://www.w3.org/2000/svg"><g clip-path="url(#a)"><path clip-rule="evenodd" d="M14 13.162V6.029a.81.81 0 0 0-.397-.69L8.515 2.147a.957.957 0 0 0-1.018 0l-5.1 3.182a.844.844 0 0 0-.397.7v7.134c0 .459.41.838.906.838h3.192V9.2h3.75V14h3.246c.497 0 .906-.379.906-.838Z" fill="#8C8594" fill-rule="evenodd"></path></g><defs><clippath id="a"><path d="M2 2h12v12H2z" fill="#fff"></path></clippath></defs></svg>  <!-- -->Elamakkara</div>

# Creating Function to Grab the Location of job 

In [19]:
def location(doc):
    job_location=[]
    for i in doc.find_all('div',class_='JobListCardstyles__DisplayFlexCenter-ffng7u-10 fNeylV')[::2]:
        job_location.append(i.text.strip())
    return job_location
location(doc)

['Elamakkara',
 'Bakhtiarpur',
 'Sector 33D chandigarh',
 'HITEC City',
 'Parsik Shiv Mandir',
 'Sector 33D chandigarh',
 'Mahadevapura Cross']

# Creating Function to grab the Salary

In [20]:
def salary(doc):
    salary_paid=[]
    for i in doc.find_all('div',class_='JobListCardstyles__DisplayFlexCenter-ffng7u-10 fNeylV')[1:2]:
        salary_paid.append(i.text.strip())
    return salary_paid

# Now, We Will Scrape the Multiple Pages of the Web-site using Functions

* Here we will scrape the multiple pages of the website, so that we can collect the data of all jobs present in the website

* For that we will create a Function Scrape_data() which will takes the page no as input as give us the data upto that pages.

* Here we have used the .Extend() method will add the specified list elements to the end of current list.

In [21]:
def scrap_data():
    all_details={'Job-Title':[],'Company-Name':[],'Location':[],'Salary':[]}
    for i in range(0,30):
        url=f"https://apna.co/jobs/full_time-jobs?page={i}&work_type=full_time"
        response=requests.get(url,headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'})
        if response.status_code!=200:
            raise Exception('failed to load page {}'.format(url))
        doc=BeautifulSoup(response.text)
        all_details['Job-Title'].extend(job(doc))
        all_details['Company-Name'].extend(company(doc))
        all_details['Location'].extend(location(doc))
        all_details['Salary'].extend(salary(doc))
    return all_details

# Now We Will Import the Pandas Library and Store the Data in It

* jobs_df variable will store the information in Pandas dataframe.
* We will scrape the first 30 pages of the website.
* 30 pages will give us 210 rows* 4  columns.

In [22]:
import pandas as pd
jobs_df=pd.DataFrame.from_dict(scrap_data(),orient='index').T

In [23]:
jobs_df

Unnamed: 0,Job-Title,Company-Name,Location,Salary
0,Business Head,Kalandoor Entertainments Private Limited,Elamakkara,"₹1,00,000 - ₹1,49,999"
1,Gynecologist,Villemed Healthcare,Bakhtiarpur,"₹1,00,000 - ₹1,49,999"
2,Gynecologist,Bedi Hospital,Sector 33D chandigarh,"₹1,00,000 - ₹1,49,999"
3,UI Developer,CBL Solutions,HITEC City,"₹80,000 - ₹1,40,000"
4,ASP.NET Developer,Enix Software Private Limited,Parsik Shiv Mandir,"₹30,000 - ₹1,30,000"
...,...,...,...,...
205,Chemical Engineer,Bhagwati Trading Company,Huda Sector 25,
206,Brand Manager,Phifer India,Anna Nagar,
207,Business Data Analyst,Your Car,S.D Planets School,
208,Business Development Executive,GNR Solution,Lone Phata,


# Now We Will Save Our Data in a .CSV Format

In [24]:
jobs_df.to_csv('jobs.csv' , index= None)

![jobs.csv](https://i.imgur.com/fTgaXSR.png)

In [25]:
import jovian

In [26]:
jovian.commit(Project ="web-scraping-project")

<IPython.core.display.Javascript object>

[jovian] Updating notebook "dharmendradiwaker12/web-scraping-final-project-using-python" on https://jovian.com[0m
[jovian] Committed successfully! https://jovian.com/dharmendradiwaker12/web-scraping-final-project-using-python[0m


'https://jovian.com/dharmendradiwaker12/web-scraping-final-project-using-python'

# References and Future Work

## Summary

* The Project was a web scraping project which composite the libraries and function to fatch the data in .csv format and then can be downloaded as excel file for further analysis.

* We have used request and BeautifulSoup library to downloaded and scrao the web-pages respectively.

* we used find() and find_all() mothods to find different tags required from the website.

* then we created multiple functions to grab the below information.
1. Job-Title
2. Company-Name
3. Location
4. Salary

* After scraping the multiple functions we store the data in Pandas dataframe which is a library.

* Finally we convert the stored data into .CSV file and the downloaded as Excel.

## Future Scope

* We can dig more and collect more information job-wise mention below.
1. Qualification 
2. Experience required
3. Test

## References 

* Website scraping https://apna.co/
* beautifulSoup Document https://beautiful-soup-4.readthedocs.io/en/latest/
* Extend and Append method https://www.geeksforgeeks.org/append-extend-python/
