# Data Collection from a Jobs API (Python)

**Objective:** Retrieve job posting data from an API endpoint, compute job counts by technology and by location, and export summary results to Excel.

**Tools:** Python, Requests, Pandas, SQLite-ready tabular outputs (optional), OpenPyXL


## Dataset & Source

This notebook uses the **Jobs API** dataset provided in the IBM Skills Network labs (originally sourced from a public dataset of job postings). The API returns job records in JSON format.


## Tasks

- Count job postings for specific **technologies** (e.g., Python, Java, SQL).
- Count job postings for specific **locations** (e.g., Los Angeles, New York, Seattle).
- Export technology counts to an Excel file (`job-postings.xlsx`).


### Locations to check

Los Angeles, New York, San Francisco, Washington DC, Seattle, Austin, Detroit


In [1]:
# Import required libraries
import requests
import pandas as pd
import openpyxl  # engine pandas uses to write .xlsx files


In [2]:
# Jobs API endpoint (returns JSON records)
API_URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json"

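from functools import lru_cache

@lru_cache(maxsize=None)
def _load_jobs(url: str):
    """Fetch jobs JSON from the API and return a list of job records.

    The result is cached per URL, so repeated counts reuse a single download.
    """
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    data = r.json()

    # Handle common shapes: list OR {"data": [...]} OR {"jobs": [...]} OR {"result": [...]}
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        for key in ["data", "jobs", "result"]:
            if key in data and isinstance(data[key], list):
                return data[key]
    # Unknown shape: return no records rather than raising
    return []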

def get_number_of_jobs_T(technology: str):
    """Return (technology, count), counting postings whose title or key skills contain the technology as a case-insensitive substring."""
    jobs = _load_jobs(API_URL)
    tech = technology.strip().lower()

    count = 0
    for job in jobs:
        title = str(job.get("Job Title", "")).lower()
        skills = str(job.get("Key Skills", "")).lower()
        if tech and (tech in title or tech in skills):
            count += 1
    return technology, count

def get_number_of_jobs_L(location: str):
    """Return (location, count), counting postings whose Location field contains the location as a case-insensitive substring."""
    jobs = _load_jobs(API_URL)
    loc = location.strip().lower()

    count = 0
    for job in jobs:
        job_loc = str(job.get("Location", "")).lower()
        if loc and loc in job_loc:
            count += 1
    return location, count


### Quick checks


In [3]:
get_number_of_jobs_T("Python")


('Python', 1188)

In [4]:
get_number_of_jobs_L("Los Angeles")


('Los Angeles', 640)
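
The same check can be repeated for every location in the "Locations to check" list. The following is a minimal sketch, not the notebook's own code: `count_by_location` is a hypothetical helper, and `sample_jobs` is made-up data standing in for the records returned by the API. It assumes each record carries a `Location` field, as the functions above do, and uses the same case-insensitive substring test.

```python
from typing import Dict, Iterable, List, Mapping


def count_by_location(jobs: Iterable[Mapping], locations: List[str]) -> Dict[str, int]:
    """Count postings per location with a case-insensitive substring match."""
    counts = {loc: 0 for loc in locations}
    for job in jobs:
        job_loc = str(job.get("Location", "")).lower()
        for loc in locations:
            needle = loc.strip().lower()
            if needle and needle in job_loc:
                counts[loc] += 1
    return counts


# Hypothetical records for illustration only; real records come from the API.
sample_jobs = [
    {"Job Title": "Data Engineer", "Location": "Los Angeles"},
    {"Job Title": "Analyst", "Location": "New York"},
    {"Job Title": "Developer", "Location": "Seattle"},
    {"Job Title": "DBA", "Location": "Los Angeles"},
]

locations = [
    "Los Angeles", "New York", "San Francisco", "Washington DC",
    "Seattle", "Austin", "Detroit",
]

location_counts = count_by_location(sample_jobs, locations)
print(location_counts)
```

Passing the records in as an argument keeps the helper independent of the network, so the list is scanned once for all locations instead of once per call.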

## Export technology counts to Excel


In [10]:
import pandas as pd

technologies = [
    "C", "C#", "C++", "Java", "JavaScript", "Python", "Scala",
    "Oracle", "SQL Server", "mysql", "PostgreSQL", "MongoDB"
]

results = []

for tech in technologies:
    tech_name, count = get_number_of_jobs_T(tech)
    results.append({"Technology": tech_name, "Number of Jobs": count})

df = pd.DataFrame(results)
df.head()

|   | Technology | Number of Jobs |
|---|------------|----------------|
| 0 | C          | 25973          |
| 1 | C#         | 555            |
| 2 | C++        | 513            |
| 3 | Java       | 3547           |
| 4 | JavaScript | 2254           |
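
The count for "C" is much larger than the others because `get_number_of_jobs_T` uses a plain substring test: the single letter "c" matches any title or skills text that contains that letter. For the same reason, "Java" also matches postings that only mention "JavaScript". The sketch below shows one possible stricter matcher; it is an assumption about how the matching could be tightened, not part of the original lab. `mentions` and `count_mentions` are hypothetical helpers, and `sample_jobs` is made-up data. A technology counts as mentioned only when it is not directly adjacent to a letter, digit, `+`, or `#`.

```python
import re
from typing import Iterable, Mapping


def mentions(technology: str, text: str) -> bool:
    """Return True if `technology` appears in `text` as a standalone token."""
    tech = technology.strip().lower()
    if not tech:
        return False
    # Lookarounds reject matches glued to letters, digits, '+' or '#',
    # so "c" does not match inside "c++", "c#", or "backend".
    pattern = r"(?<![a-z0-9+#])" + re.escape(tech) + r"(?![a-z0-9+#])"
    return re.search(pattern, text.lower()) is not None


def count_mentions(jobs: Iterable[Mapping], technology: str) -> int:
    """Count records whose title or key skills mention the technology."""
    count = 0
    for job in jobs:
        text = f'{job.get("Job Title", "")} {job.get("Key Skills", "")}'
        if mentions(technology, text):
            count += 1
    return count


# Hypothetical records for illustration only; real records come from the API.
sample_jobs = [
    {"Job Title": "Backend Developer", "Key Skills": "C++, Python"},
    {"Job Title": "Frontend Developer", "Key Skills": "JavaScript, CSS"},
    {"Job Title": "Embedded Engineer", "Key Skills": "C, Assembly"},
    {"Job Title": "Java Developer", "Key Skills": "Java, SQL Server"},
]

for tech in ["C", "C++", "Java", "JavaScript", "SQL Server"]:
    print(tech, count_mentions(sample_jobs, tech))
```

Counts produced this way will differ from the table above, so the two approaches should not be mixed within one report.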


In [11]:
df.to_excel("job-postings.xlsx", index=False)
print("File saved successfully")

File saved successfully
