
## **Exploring Data Science Job Opportunities 🔍🚀**

Your mission is to design a specialized tool that extracts and analyzes data science job listings from a single online source of your choice. Focus on crafting a laser-focused web scraping solution tailored to the chosen platform, ensuring the collection of crucial details such as job titles, company names, experience requirements, salary ranges, and locations.

### **Key Tasks:**

1. **Source Selection:**
Choose a preferred online platform for data science job listings. Platforms like TimesJobs, LinkedIn Jobs, Indeed, Naukri, Glassdoor are potential options. Specify your chosen source in your solution. 🎯
2. **Web Scraping Precision:**
Engineer a targeted web scraping mechanism adept at extracting specific information from the chosen platform. ⚙️
3. **Data Extraction:**
Focus on extracting essential details from job listings, including but not limited to job titles, company names, required experience levels, salary ranges, and locations. 📊
4. **Data Organization:**
Ensure efficient organization and cleaning of the extracted data. The emphasis should be on presenting the information in a clear and understandable format. 🧹
5. **Insights Generation:**
Develop tools for analyzing the gathered data to generate insights. Explore patterns related to job titles, experience requirements, salary distributions, and geographic preferences. 🔍
6. **Visualization:**
Create visual representations such as charts and graphs to communicate the insights effectively. Your visuals should provide a user-friendly interpretation of the data. 📈
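
As a rough illustration of the data-organization task, the sketch below zips parallel field lists into rows and writes them as CSV text. The field names, the `build_rows` helper, and the sample values are assumptions made for this example only; they are not part of the assignment or of any particular job site.

```python
import csv
import io

# Illustrative column names (an assumption, not prescribed by the task).
FIELDS = ["job_title", "company", "experience", "salary", "location"]

def build_rows(titles, companies, experience, salary, location):
    """Zip parallel lists into one dict per listing.

    zip() stops at the shortest list, so misaligned lists are silently
    truncated; compare the lengths first if that matters.
    """
    columns = zip(titles, companies, experience, salary, location)
    return [dict(zip(FIELDS, values)) for values in columns]

# Small hand-written sample standing in for scraped lists.
rows = build_rows(
    ["UX Designer", "PHP Developer"],
    ["Example Corp", "Sample Ltd"],
    ["8 - 13 yrs", "1 - 5 yrs"],
    ["Rs 50.00 - 90.00 Lacs p.a.", "As per Industry Standards"],
    ["Singapore", "Ahmedabad"],
)

# Write to an in-memory buffer; pass an open file object to save to disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping one dict per listing makes it harder for a field to drift out of alignment than when five separate lists are maintained by hand.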

In [173]:
# Install and Import libraries

In [174]:
from bs4 import BeautifulSoup
import requests
import numpy as np
import pandas as pd
import datetime
import csv

In [175]:
# Define the URL of the webpage to be scraped
url = 'https://www.timesjobs.com/jobfunction/it-software-jobs'

In [176]:
# Make a request to the webpage and get the response
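response = requests.get(url, timeout=30)  # the 30-second timeout is an arbitrary choice
response.raise_for_status()  # stop early if the server returns a 4xx/5xx status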

In [177]:
# Create a BeautifulSoup object to parse the HTML content of the webpage
soup = BeautifulSoup(response.text, "html.parser")

In [178]:
title = soup.title.string

In [179]:
print(title)

It Software Jobs - 14958 It Software Jobs Openings In India - TimesJob.com


In [180]:
# Print the raw bytes of the response body

In [181]:
print(response.content)

b'\n\n\n\n\n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html>\n\t<head>\n\t\t\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t    \n\t\t\t\t\t    <link rel="canonical" href="https://www.timesjobs.com/jobfunction/it-software-jobs"/>\n\t\t\t\t\t\t<link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.timesjobs.com/jobfunction/it-software-jobs"/>\n\t\t\t\t\t    \n\t\t\t\t\t    \n\t\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\t\t\n\t\t\n\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\n\t\t\n\t\t \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\t\n\t\t\t\n\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t<link rel="next" href="https://www.timesjobs.com/jobfunction/it-software-jobs/&sequence=2&startPage=1"/>\n\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\t\t\n\t\n\t\t\n\t\t\t\t\t\t\n\t\t\n\t\t\t\n\t\t\t

In [182]:

# Find the job list, then collect the company-name headings inside it
s = soup.find('ul', class_="joblist")
comp_data = s.find_all('h3', class_="joblist-comp-name")



In [183]:
# Create an empty list to store the company names
company_names = []
for company in comp_data:
    # Use .contents to get the list of children inside the <h3> tag
    company_name = ""

    if company.contents:  # Check if company.contents is not empty
        company_name = company.contents[0].strip()  # Use the first child's text content
    else:
        company_name = company.get_text().strip()  # Fall back to using get_text() if contents is empty

    company_names.append(company_name)


In [184]:
# Display the collected company names
company_names

['CLOUD VISA IMMIGRATION LLP',
 'PRIMEX IMMIGRATION LLP',
 'FLIGHT TO SUCESS IMMIGRATION LLP',
 'Maxgen Technologies',
 'SAMPOORNA CONSULTANTS PVT LTD',
 'CLOUD VISA IMMIGRATION LLP',
 'Sarosh Karki',
 'SAMPOORNA CONSULTANTS PVT LTD',
 'AMAR TECHNOLABS PVT LTD',
 'MVS ENGINEERING PRIVATE LIMITED',
 'Sapphire Software Solutions',
 'Maxgen Technologies',
 'WALKWAY IMMIGRATION SERVICES LLP',
 'ADAL IMMIGRATIONS LLP',
 'MCL',
 'Jovial trip',
 'Solay Indu Priya',
 'Vasanthi',
 'IPS e Services Pvt. Ltd.',
 'Maxgen Technologies',
 'SAMPOORNA CONSULTANTS PVT LTD',
 'SAMPOORNA CONSULTANTS PVT LTD',
 'Maxgen Technologies',
 'CLOUD VISA IMMIGRATION LLP',
 'MCL',
 'Vasanthi',
 'Maxgen Technologies',
 'SAMPOORNA CONSULTANTS PVT LTD',
 'SAMPOORNA CONSULTANTS PVT LTD',
 'Maxgen Technologies',
 'Vasanthi',
 'Maxgen Technologies',
 'SAMPOORNA CONSULTANTS PVT LTD',
 'SAMPOORNA CONSULTANTS PVT LTD',
 'Maxgen Technologies',
 'Vasanthi',
 'SAMPOORNA CONSULTANTS PVT LTD',
 'SAMPOORNA CONSULTANTS PVT LTD',
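
Several companies repeat in the list above, so a frequency count is a natural first insight. The following is a minimal sketch using `collections.Counter` on a short sample copied from that output; in the notebook the same call would presumably be applied to `company_names` directly.

```python
from collections import Counter

# A short sample taken from the output above (not the full list).
sample_companies = [
    'CLOUD VISA IMMIGRATION LLP',
    'Maxgen Technologies',
    'SAMPOORNA CONSULTANTS PVT LTD',
    'CLOUD VISA IMMIGRATION LLP',
    'Maxgen Technologies',
    'Maxgen Technologies',
]

# Count occurrences, skipping empty names.
company_counts = Counter(name.strip() for name in sample_companies if name)

# The two most frequent employers in the sample.
top_companies = company_counts.most_common(2)
print(top_companies)
```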
 

In [185]:
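# Collect job titles from the anchors that have no class attribute.
# Note: this selector also matches the 'More Details' links, so they show up in the output.
job_titles = []
text = s.find_all('a', attrs={'class': None})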

for i in text:
    job_title = i.get_text().split('\n')[0].strip()
    job_titles.append(job_title)

print(job_titles)



['UX Designer Job In Singapore', 'More Details', 'PHP Developer Hiring For SINGAPORE', 'More Details', 'Full Stack Developer in Abroad no ielts mandatory', 'More Details', 'AWS Solution Architect Internship in Pune', 'More Details', 'Staff Engineer - DOTNET-FULLSTACK - REF20215I Chennai - J47465', 'More Details', 'Graphic Designer Job  In United Kingdom', 'More Details', 'Software Engineer,Software Manager,Manager,It Support Manager', 'More Details', 'Sr Staff Engineer-DOTNET-FULLSTACK-REF20432E Bengalore - J47468', 'More Details', 'PHP Developer,Laravel Developer,Software Developer', 'More Details', 'IT Administrator', 'More Details', 'PHP Developer', 'More Details', 'Node Js Internship in Ahmedabad', 'More Details', 'Software Engineer Job In SINGAPORE', 'More Details', 'Software Engineer Required In SINGAPORE', 'More Details', 'Perl Developer', 'More Details', 'Earn handful of money by sitting at home', 'More Details', 'Senior Java Applications Developer,Java (J2EE) Developer', 'More
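
The output above alternates real titles with the link label `'More Details'`. One possible cleanup, sketched here on a few values copied from that output, is to drop the label after scraping; `drop_link_labels` is a hypothetical helper name introduced for this example.

```python
# A few values copied from the output above.
raw_titles = [
    'UX Designer Job In Singapore', 'More Details',
    'PHP Developer Hiring For SINGAPORE', 'More Details',
    'IT Administrator', 'More Details',
]

def drop_link_labels(titles, label="More Details"):
    """Return the titles without empty strings or the given link label."""
    return [title for title in titles if title and title != label]

clean_titles = drop_link_labels(raw_titles)
print(clean_titles)
```

After this step the title list should have one entry per listing, which is what the other field lists assume.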

In [203]:
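# Each listing's details sit in a <ul class="job-more-dtl clearfix">; its first <li> holds the experience range
experience = []
e = s.find_all('ul', class_='job-more-dtl clearfix')
for i in e:
    experience.append(i.find('li').text.split('\n')[0].strip() + " yrs")
print(experience)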

['8 - 13 yrs', '7 - 12 yrs', '8 - 13 yrs', '0 - 1 yrs', '6 - 9 yrs', '7 - 12 yrs', '20 - 25 yrs', '7 - 11 yrs', '0 - 1 yrs', '3 - 5 yrs', '1 - 5 yrs', '0 - 1 yrs', '6 - 11 yrs', '6 - 11 yrs', '3 - 8 yrs', '0 - 3 yrs', '5 - 10 yrs', '0 - 3 yrs', '4 - 9 yrs', '0 - 1 yrs', '6 - 9 yrs', '7 - 11 yrs', '0 - 1 yrs', '5 - 10 yrs', '3 - 8 yrs', '0 - 3 yrs', '0 - 1 yrs', '6 - 9 yrs', '13 - 18 yrs', '0 - 1 yrs', '0 - 3 yrs', '0 - 1 yrs', '7 - 11 yrs', '13 - 18 yrs', '0 - 1 yrs', '0 - 3 yrs', '7 - 11 yrs', '13 - 15 yrs', '0 - 3 yrs', '7 - 11 yrs', '7 - 12 yrs', '0 - 3 yrs', '6 - 9 yrs', '0 - 3 yrs', '6 - 9 yrs', '0 - 3 yrs', '6 - 9 yrs', '0 - 3 yrs', '6 - 9 yrs', '0 - 3 yrs']
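
For analysis, the experience strings are easier to work with as numbers. The sketch below assumes every value follows the `'<min> - <max> yrs'` pattern seen in the output above; `parse_experience` is a hypothetical helper, and anything that does not match yields `None`.

```python
import re

# Matches two integers separated by a hyphen, e.g. "8 - 13 yrs".
EXPERIENCE_RE = re.compile(r"(\d+)\s*-\s*(\d+)")

def parse_experience(text):
    """Return (min_years, max_years) as ints, or None if the text does not match."""
    match = EXPERIENCE_RE.search(text or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(parse_experience('8 - 13 yrs'))
print(parse_experience('0 - 1 yrs'))
```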


In [210]:
salary = []

for item in e:
    li_elements = item.find_all('li')

    # Check if there are at least two <li> elements
    if len(li_elements) >= 2:
        salary_info = li_elements[1].text.strip()
        salary.append(salary_info)
    else:
        salary.append(None)  # Append None if no second <li> element found

# Print or use the 'salary' list as needed
print(salary)



['Rs 50.00 - 90.00 Lacs p.a.', 'Rs 50.00 - 90.00 Lacs p.a.', 'Rs 40.10 - 69.65 Lacs p.a.', 'Rs 1.00 - 2.00 Lacs p.a.', 'Rs 15.00 - 32.00 Lacs p.a.', 'Rs 50.00 - 90.00 Lacs p.a.', 'Rs 10.00 - 45.95 Lacs p.a.', 'Rs 16.00 - 36.00 Lacs p.a.', 'Best in Industry', 'As per Industry Standards', 'As per Industry Standards', 'As per Industry Standards', 'Rs 50.00 - 90.00 Lacs p.a.', 'Rs 50.00 - 90.00 Lacs p.a.', 'Rs 6.00 - 12.00 Lacs p.a.', 'Rs 3.00 - 7.05 Lacs p.a.', 'As per Industry Standards', 'Rs 4.35 - 6.55 Lacs p.a.', 'Best in Industry', 'Rs 1.00 - 2.00 Lacs p.a.', 'Rs 15.00 - 32.00 Lacs p.a.', 'Rs 16.00 - 36.00 Lacs p.a.', 'As per Industry Standards', 'Rs 50.00 - 90.00 Lacs p.a.', 'Rs 8.00 - 15.00 Lacs p.a.', 'Rs 4.35 - 6.55 Lacs p.a.', 'As per Industry Standards', 'Rs 15.00 - 32.00 Lacs p.a.', 'Rs 20.00 - 37.00 Lacs p.a.', 'Rs 1.00 - 2.00 Lacs p.a.', 'Rs 4.35 - 6.55 Lacs p.a.', 'As per Industry Standards', 'Rs 16.00 - 36.00 Lacs p.a.', 'Rs 20.00 - 37.00 Lacs p.a.', 'Rs 1.00 - 2.00 Lacs p
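
The salary column mixes numeric ranges with free-text values such as `'Best in Industry'`. A minimal sketch for separating the two, assuming numeric entries always look like `'Rs <low> - <high> Lacs p.a.'` as in the output above, might be the following; `parse_salary_lacs` is a hypothetical helper name.

```python
import re

# Matches "Rs 50.00 - 90.00 Lacs"; figures are in lakhs of rupees per annum.
SALARY_RE = re.compile(
    r"Rs\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*Lacs", re.IGNORECASE
)

def parse_salary_lacs(text):
    """Return (low, high) in lakhs as floats, or None for non-numeric entries."""
    match = SALARY_RE.search(text or "")
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

print(parse_salary_lacs('Rs 40.10 - 69.65 Lacs p.a.'))
print(parse_salary_lacs('Best in Industry'))
```

Returning `None` for the free-text entries keeps them out of any salary statistics instead of forcing them to a number.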

In [212]:
location=[]
for item in e:
    li_elements = item.find_all('li')

    # Check if there are at least three <li> elements
    if len(li_elements) >= 3:
        location_info = li_elements[2].text.strip()
        location.append(location_info)
    else:
        location.append(None)  # Append None if no third <li> element is found

# Print or use the 'location' list as needed
print(location)



['Singapore', 'Singapore', 'Germany,  Sweden', '27-Dec-2023 - 23-Feb-2024 | 09:30 AM - 04:30 PM |  509 , 5th Floor, Pride Icon, Kharadi, Near Atithi Veg Restaurant Pune.', 'Chennai', '', 'Patna,  Bilaspur,  Palanpur,  Jind,  Bilaspur', 'Bengaluru / Bangalore', 'Ahmedabad', '13-Jan-2024 - 27-Jan-2024 | 11:00 AM - 03:30 PM |  Delhi', 'Ahmedabad', 'Ahmedabad,  Bhavnagar,  Gandhinagar,  Jamnagar,  Mehsana', 'Singapore', 'Singapore', 'Mumbai', 'Tezpur,  Gir,  Yavatmal,  Imphal,  Shillong', 'Bengaluru / Bangalore,  Delhi/NCR,  Hyderabad/Secunderabad,  Mumbai,  Noida/Greater Noida', 'Nadiad,  Navsari,  Palanpur,  Patan,  Porbandar', 'Mumbai', '27-Dec-2023 - 23-Feb-2024 | 10:00 AM - 03:30 PM |  509, 5th Floor, Pride Icon, Kharadi, Near Atithi Veg restaurant  Pune,', 'Bengaluru / Bangalore,  Chennai,  Gurgaon,  Hyderabad/Secunderabad,  Jaipur', 'Hyderabad/Secunderabad', '26-Dec-2023 - 22-Feb-2024 | 09:00 AM - 04:00 PM |  303 shoppers plaza 4, Chimanlal Girdharlal Rd, Ahmedabad', 'Canada', 'Mumb
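
The location output above contains three kinds of values: comma-separated city lists, empty strings, and walk-in entries of the form `date | time | venue`. The sketch below is one way to normalize them, under the assumption that a `|` always marks a walk-in entry whose last segment is the venue; `split_locations` is a hypothetical helper.

```python
def split_locations(text):
    """Split a raw location string into a list of distinct place names.

    Walk-in entries ("date | time | venue") are reduced to the venue and
    kept as a single item, because the venue itself may contain commas.
    """
    if not text:
        return []
    if "|" in text:
        venue = text.split("|")[-1].strip()
        return [venue] if venue else []
    parts = [part.strip() for part in text.split(",") if part.strip()]
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(parts))

print(split_locations('Germany,  Sweden'))
print(split_locations('13-Jan-2024 - 27-Jan-2024 | 11:00 AM - 03:30 PM |  Delhi'))
print(split_locations('Patna,  Bilaspur,  Palanpur,  Jind,  Bilaspur'))
```

Values such as `'Bengaluru / Bangalore'` are left untouched here; merging such aliases would need a separate mapping.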