# 📊 Task 10: Scrape and Analyze Job Listings for Data Analyst Roles

### 🎯 Objective:
- Scrape job listings from Naukri.com
- Extract Job Title, Company, Location, Salary, Skills
- Analyze top locations and most in-demand skills
- Visualize results using pie and bar charts

In [None]:
# 1️⃣ Import Libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter

In [None]:
# 2️⃣ Scrape Job Listings from Naukri.com
def scrape_naukri_jobs(url):
    headers = {'User-Agent': 'Mozilla/5.0'}
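    # Use a timeout so a stalled connection cannot hang the notebook,
    # and fail loudly on HTTP errors instead of parsing an error page
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()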
    soup = BeautifulSoup(response.text, 'html.parser')
    job_titles, companies, locations, salaries, skills = [], [], [], [], []

    job_cards = soup.find_all('article', class_='jobTuple')

    def text_or(tag, default=None):
        # Return the element's stripped text, or a default when the element is missing
        return tag.get_text(strip=True) if tag else default

    for job in job_cards:
        job_titles.append(text_or(job.find('a', class_='title')))
        companies.append(text_or(job.find('a', class_='subTitle')))
        locations.append(text_or(job.find('li', class_='location')))
        salaries.append(text_or(job.find('li', class_='salary'), 'Not mentioned'))
        skill_ul = job.find('ul', class_='tags has-description')
        # Join the individual skill entries with commas so they can be split apart later
        skills.append(skill_ul.get_text(separator=',', strip=True) if skill_ul else None)

    return pd.DataFrame({
        'Job Title': job_titles,
        'Company': companies,
        'Location': locations,
        'Salary': salaries,
        'Skills': skills
    })

In [None]:
# 3️⃣ Run Scraper and Save Data
url = "https://www.naukri.com/data-analyst-jobs"
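df = scrape_naukri_jobs(url)
if df.empty:
    # An empty result usually means the selectors no longer match the page markup
    print("No job cards found; check the selectors against the current page HTML.")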
df.to_csv("data_analyst_jobs.csv", index=False)
df.head()

In [None]:
# 4️⃣ Basic Summary
print(f"Total jobs scraped: {len(df)}")
print("\nTop Locations:")
print(df['Location'].value_counts().head())

In [None]:
# 5️⃣ Most In-Demand Skills
all_skills = []
for skill_list in df['Skills'].dropna():
    all_skills.extend([s.strip().lower() for s in skill_list.split(',')])
top_skills = Counter(all_skills).most_common(10)
for skill, freq in top_skills:
    print(f"{skill}: {freq}")
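
The counting step above assumes each `Skills` entry is a comma-separated string. As a minimal standalone sketch of the same idea, the hypothetical helper `count_skills` below (not part of the original notebook) also skips missing entries and empty fragments left by stray commas. The sample strings are made up for illustration.

```python
from collections import Counter

def count_skills(skill_strings, top_n=10):
    """Count normalized skills across comma-separated strings.

    Hypothetical helper for illustration; it assumes skills are
    separated by commas, as in the cell above.
    """
    counts = Counter()
    for entry in skill_strings:
        if not entry:  # skip None and empty strings (cards without skills)
            continue
        for raw in entry.split(','):
            skill = raw.strip().lower()
            if skill:  # ignore empty fragments from stray commas
                counts[skill] += 1
    return counts.most_common(top_n)

# Made-up sample data, including a missing entry and a trailing comma
sample = ["SQL, Python, Excel", None, "sql, Tableau,", "Python , SQL"]
print(count_skills(sample, top_n=2))
```

With a DataFrame, the same helper could be called as `count_skills(df['Skills'].dropna())`.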

In [None]:
# 6️⃣ Visualize Top 5 Locations
top_locations = df['Location'].value_counts().head(5)
plt.figure(figsize=(6, 6))
top_locations.plot(kind='pie', autopct='%1.1f%%', startangle=90)
plt.title("Top 5 Job Locations")
plt.ylabel("")
plt.show()

In [None]:
# 7️⃣ Visualize Top Skills
skills_df = pd.DataFrame(top_skills, columns=["Skill", "Frequency"])
plt.figure(figsize=(10, 5))
plt.bar(skills_df["Skill"], skills_df["Frequency"], color='skyblue')
plt.title("Top 10 In-Demand Skills")
plt.xticks(rotation=45)
plt.xlabel("Skills")
plt.ylabel("Frequency")
plt.tight_layout()
plt.show()

### ✅ Summary:
- **Total Jobs Scraped**: Depends on the live site at run time; check with `len(df)`
- **Top Locations**: Found using `.value_counts()` on the `Location` column
- **Top Skills**: Counted from the `Skills` column
- **Visuals**: Pie chart (locations), bar chart (skills)
- **Challenges**:
    - Some job cards may lack salary or skills, so those fields can be missing
    - Listings and page markup change frequently, so scraping is best-effort and the selectors may need updating
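
Because some cards lack a salary, any later salary analysis has to tolerate missing or free-text values. The sketch below shows one way to do that with a hypothetical helper, `parse_salary_range`. It assumes salary strings look like `"3-5 Lacs PA"`. That format is an assumption, not something the notebook confirms, so check it against the scraped data before relying on it.

```python
import re

# Assumed format, e.g. "3-5 Lacs PA"; real listings may differ.
_SALARY_RANGE = re.compile(r'(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)')

def parse_salary_range(text):
    """Return (low, high) as floats, or None when no range is found.

    Hypothetical helper for illustration only.
    """
    if not text:  # None or empty string
        return None
    match = _SALARY_RANGE.search(text)
    if match is None:  # e.g. "Not mentioned"
        return None
    return float(match.group(1)), float(match.group(2))

# Made-up examples covering a range, the notebook's placeholder, and a missing value
for value in ["3-5 Lacs PA", "Not mentioned", None, "4.5 - 7 Lacs PA"]:
    print(repr(value), '->', parse_salary_range(value))
```

Returning `None` rather than raising keeps rows without a usable salary in the dataset, so they still count toward the location and skill summaries.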