# Collecting Data Using a Mock Jobs API

This section outlines the process of collecting job posting data through a locally hosted mock Jobs API. The API is built using Flask and simulates real-world endpoints commonly used in job market analytics.

## Objective:
- Collect job data through a mock API endpoint
- Store the retrieved data in a structured format for further analysis

## Hosting the Mock API Locally

To run the API simulation, you must first download and host the Flask-based mock Jobs API. This setup enables retrieval of job posting data in a structured format through HTTP requests.

### Instructions:

1. **Download the API Notebook**  
   Access the `Jobs_API.ipynb` notebook from the following link:  
   [Jobs_API.ipynb](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/Jobs_API.ipynb)

2. **Upload the Notebook**  
   Upload the notebook into your current Jupyter environment using the “Upload” button.  
   > Ensure the file is stored in the same directory as your working `.ipynb` file.

   If you are using a **local Jupyter environment**, follow the same upload procedure within your local interface:

   ![Upload Instructions](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/Upload.PNG)

3. **Start the Flask API Server**  
   Open the uploaded `Jobs_API.ipynb` notebook and execute all cells.  
   Once the Flask server is active, a local API endpoint URL will be displayed. This URL can now be used to fetch data.

> Optional: To better understand how the Flask framework operates, refer to the [Flask API documentation](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/FLASK_API.md.html).

Once the API is running, proceed to request and retrieve job data for use in the next step of the project.
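Once the server is up, a request against it could look like the sketch below. The base URL `http://127.0.0.1:5000/data` and the `Key Skills` query parameter are assumptions, not values taken from the notebook; substitute the URL that the notebook prints. The standard-library `urllib` is used only to keep the sketch free of extra dependencies, and `build_query_url` and `fetch_jobs` are illustrative names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: replace with the endpoint URL printed by Jobs_API.ipynb.
BASE_URL = "http://127.0.0.1:5000/data"


def build_query_url(base_url, filters=None):
    """Append optional filters (e.g. {"Key Skills": "Python"}) as a query string."""
    if not filters:
        return base_url
    return f"{base_url}?{urlencode(filters)}"


def fetch_jobs(base_url=BASE_URL, filters=None, timeout=10):
    """Request job postings from the mock API and decode the JSON response."""
    with urlopen(build_query_url(base_url, filters), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_jobs(filters={"Key Skills": "Python"})` only succeeds while the Flask server is running; whether the server honours that filter depends on how the mock API is written.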






## Dataset Used in This Assignment

The dataset used in this assignment is based on the following publicly available source:  
[https://www.kaggle.com/promptcloud/jobs-on-naukricom](https://www.kaggle.com/promptcloud/jobs-on-naukricom)

This dataset is distributed under a **Public Domain license**.

> Note: A modified subset of the original dataset is used for this assignment. To ensure compatibility with the lab instructions, please use the dataset provided within the course resources rather than downloading it directly from the original source.

The original file format was a CSV. For the purposes of this lab, it has been converted to JSON to support the API-based data retrieval workflow.
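The conversion script itself is not part of this lab. As a rough illustration of what a CSV-to-JSON conversion involves, the sketch below turns CSV text with a header row into a JSON array of objects. The function name `csv_text_to_json` and the two sample rows are made up for the example.

```python
import csv
import io
import json


def csv_text_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    return json.dumps(rows, indent=2)


# Made-up two-row sample, only to show the shape of the conversion.
sample_csv = "Job Title,Location\nData Analyst,Seattle\nBackend Developer,Austin\n"
records = json.loads(csv_text_to_json(sample_csv))
```

Each CSV row becomes one JSON object keyed by the column headers, which is the shape the counting functions in this notebook iterate over.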


## Collecting Job Postings Using the Jobs API

This section focuses on retrieving job posting counts for selected technologies and geographic locations using the mock Jobs API.

The objective is to determine the current number of open positions across key U.S. cities, providing a foundational understanding of location-based demand in the tech job market.

### Target Locations:
- Los Angeles  
- New York  
- San Francisco  
- Washington, D.C.  
- Seattle  
- Austin  
- Detroit

Data for each location will be retrieved through API calls and parsed for use in downstream analysis.
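One way to tally all target locations in a single pass over already-downloaded records is sketched below. It assumes each record is a dictionary with a `Location` field, as in the functions later in this notebook. The helper name `count_jobs_by_location` and the sample records are hypothetical, and the city spellings should be checked against the dataset (for example, how Washington, D.C. is written there).

```python
TARGET_LOCATIONS = [
    "Los Angeles", "New York", "San Francisco",
    "Washington, D.C.", "Seattle", "Austin", "Detroit",
]


def count_jobs_by_location(jobs, locations=TARGET_LOCATIONS):
    """Return {location: number of postings whose Location contains it}."""
    counts = {location: 0 for location in locations}
    for job in jobs:
        posted = (job.get("Location") or "").lower()
        for location in locations:
            if location.lower() in posted:
                counts[location] += 1
    return counts


# Invented records, used only to exercise the function.
sample_jobs = [
    {"Location": "Seattle"},
    {"Location": "Austin"},
    {"Location": "Seattle"},
    {"Job Title": "Posting without a location"},
]
location_counts = count_jobs_by_location(sample_jobs)
```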


In [3]:
#Import dependencies
import pandas as pd
import json

## Function to Retrieve Job Postings by Technology

We’ll write a function that filters the job data by technology so we can quickly get the number of job postings for skills like Python.

The JSON dataset includes key fields such as:

- Job Title  
- Key Skills  
- Experience Required  
- Location  
- Industry  
- Role  

You can view the dataset structure here:  
[jobs.json](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json)
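Before filtering, it can help to check which fields are actually present in the records. The sketch below counts how often each field name appears. The helper `summarize_fields` and the sample values are invented for illustration; only the field names `Job Title`, `Key Skills`, and `Location` come from the list above.

```python
from collections import Counter


def summarize_fields(jobs):
    """Count how many records contain each field name."""
    field_counts = Counter()
    for job in jobs:
        field_counts.update(job.keys())
    return field_counts


# Invented records; the real dataset has more fields and different values.
sample_jobs = [
    {"Job Title": "Data Analyst", "Key Skills": "Python, SQL", "Location": "Seattle"},
    {"Job Title": "Backend Developer", "Key Skills": "Java", "Location": "Austin"},
    {"Job Title": "Intern", "Location": "Detroit"},
]
field_summary = summarize_fields(sample_jobs)
```

A field that appears in fewer records than the total is a hint that the filtering code should supply a default, as `job.get("Key Skills", "")` does.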


In [20]:
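import requests

# Static JSON file of job postings; every call below downloads it again.
api_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json"

def get_number_of_jobs_T(technology):
    """Return (technology, count) for postings whose "Key Skills" mention the technology."""
    response_api = requests.get(api_url)
    number_of_jobs = 0

    if response_api.ok:
        jobs = response_api.json()
        for job in jobs:
            # Case-insensitive substring match, so "Java" also matches "JavaScript".
            key = job.get("Key Skills", "")
            if technology.lower() in key.lower():
                number_of_jobs += 1

    return technology, number_of_jobs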


## Testing the Function with Python as the Target Skill

We'll now run the function using "Python" as the input to confirm it's working as expected and returning the correct job count.


In [21]:
get_number_of_jobs_T("Python")

('Python', 1173)
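Because `get_number_of_jobs_T` uses a substring test, short names such as "C" or "Java" also match longer skills that contain them. A stricter alternative is to split the `Key Skills` string into individual entries and compare whole entries, as sketched below. The separator is an assumption: the sketch splits on both commas and pipes, and the sample records are invented.

```python
import re


def split_skills(key_skills):
    """Split a "Key Skills" string on commas or pipes and normalise each entry."""
    parts = re.split(r"[,|]", key_skills or "")
    return {part.strip().lower() for part in parts if part.strip()}


def count_exact_skill(jobs, technology):
    """Count postings that list the technology as a whole skill entry."""
    wanted = technology.strip().lower()
    return sum(1 for job in jobs if wanted in split_skills(job.get("Key Skills")))


# Invented records showing the difference between the two matching rules.
sample_jobs = [
    {"Key Skills": "JavaScript, CSS"},
    {"Key Skills": "Java | Spring"},
    {"Key Skills": "C, Embedded"},
]
exact_java = count_exact_skill(sample_jobs, "Java")
substring_java = sum(1 for job in sample_jobs if "java" in job["Key Skills"].lower())
```

On these three records the whole-entry rule counts one "Java" posting while the substring rule counts two, so the two approaches can give noticeably different totals on real data.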

## Creating a Function to Retrieve Job Counts by Location

Next, we’ll define a function that returns the number of job postings for a given U.S. location.


In [26]:
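def get_number_of_jobs_L(location):
    """Return (location, count) for postings whose "Location" contains the given name."""
    response_api = requests.get(api_url)
    number_of_jobs = 0

    if response_api.ok:
        jobs = response_api.json()
        # Loop only when the request succeeded, so `jobs` is always defined here.
        for job in jobs:
            # Fall back to an empty string so a missing Location cannot raise.
            loc = job.get('Location') or ''
            if location in loc:
                number_of_jobs += 1

    return location, number_of_jobs
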

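Call the function for Los Angeles to check that it works as expected. Note that the argument in the call below is misspelled ("Los Angels"), so the zero count most likely reflects the typo rather than an absence of postings; rerun the call with "Los Angeles" to get the actual count.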


In [27]:
get_number_of_jobs_L("Los Angels")

('Los Angels', 0)
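A misspelled location, as in the call above, produces a zero count without any warning. One possible guard is to compare the input against the list of target locations before querying, using the standard-library `difflib`. The helper name `suggest_location` and the 0.8 similarity cutoff are illustrative choices, not part of the lab.

```python
from difflib import get_close_matches

TARGET_LOCATIONS = [
    "Los Angeles", "New York", "San Francisco",
    "Washington, D.C.", "Seattle", "Austin", "Detroit",
]


def suggest_location(name, known_locations=TARGET_LOCATIONS, cutoff=0.8):
    """Return the closest known location, or None if nothing is similar enough."""
    matches = get_close_matches(name, known_locations, n=1, cutoff=cutoff)
    return matches[0] if matches else None


corrected = suggest_location("Los Angels")
```

Passing the suggested name to `get_number_of_jobs_L`, or raising an error when the suggestion is `None`, makes a typo visible instead of returning a misleading zero.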

## Storing the Results in an Excel File

We’ll call the API for all specified technologies and export the job counts to an Excel spreadsheet.


Define a helper that lists the distinct technologies mentioned in job postings for a given location.


In [36]:
# Named differently from get_number_of_jobs_L so the counting function defined earlier is not overwritten.
def get_technologies_by_location(location):
    """Return a sorted list of the distinct skills found in postings for the location."""
    response = requests.get(api_url)

    if response.ok:
        jobs_data = response.json()
    else:
        print("Failed to fetch data")
        return []

    technologies = []

    for job in jobs_data:
        # Exact, case-insensitive match on the Location field.
        if job.get('Location', '').lower() == location.lower():
            skills = job.get('Key Skills', '')
            tech_list = [skill.strip() for skill in skills.split(',') if skill.strip()]
            technologies.extend(tech_list)

    unique_technologies = sorted(set(technologies))

    print(f"Technologies found in location '{location}':")
    print(unique_technologies)

    return unique_technologies



Import libraries required for writing the results to an Excel file.


In [37]:
from openpyxl import Workbook

Create a new Excel workbook and activate the default worksheet.


In [38]:
wb1 = Workbook()
ws1 = wb1.active

## Retrieving Job Counts and Writing to Excel

Loop through each technology, get the number of job postings, and write the results to the spreadsheet.


In [46]:
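# Fetch the postings once; variables inside the function defined earlier are local to it.
response = requests.get(api_url)
jobs_data = response.json() if response.ok else []

# Assumption: the technology list covers every comma-separated skill across all postings.
technologies = []
for job in jobs_data:
    skills = job.get('Key Skills', '')
    technologies.extend(skill.strip() for skill in skills.split(',') if skill.strip())
unique_technologies = sorted(set(technologies))

ws1.append(['Key Skills', 'Number of Jobs'])

for tech in unique_technologies:
    # Count postings whose skills mention this technology (case-insensitive substring match).
    count = sum(1 for job in jobs_data if tech.lower() in job.get('Key Skills', '').lower())
    ws1.append([tech, count])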

Save the results to an Excel file named `2.a-job-postings (Collected from API).xlsx`.


In [43]:
wb1.save('2.a-job-postings (Collected from API).xlsx')
wb1.close()
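To confirm that rows were written as intended, a workbook can be read back with `openpyxl.load_workbook`. The sketch below does the round trip in memory so it does not touch the file saved above. The helper names `rows_to_workbook_bytes` and `read_rows` and the sample row are illustrative.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def rows_to_workbook_bytes(header, rows):
    """Write a header and data rows to a new workbook and return it as bytes."""
    wb = Workbook()
    ws = wb.active
    ws.append(header)
    for row in rows:
        ws.append(row)
    buffer = BytesIO()
    wb.save(buffer)
    return buffer.getvalue()


def read_rows(workbook_bytes):
    """Load a workbook from bytes and return the active sheet's rows as tuples."""
    wb = load_workbook(BytesIO(workbook_bytes))
    return [tuple(row) for row in wb.active.iter_rows(values_only=True)]


# Invented sample row, used only to check the round trip.
written = rows_to_workbook_bytes(["Key Skills", "Number of Jobs"], [["Python", 3]])
read_back = read_rows(written)
```

The same `read_rows` logic works on a saved file by passing its path to `load_workbook` instead of a `BytesIO` object.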

## Extending the Analysis to Additional Technologies

Use the same approach to retrieve job counts for the following languages and databases, and store the results in an Excel sheet:


- C  
- C#  
- C++  
- Java  
- JavaScript  
- Python  
- Scala  
- Oracle  
- SQL Server  
- MySQL Server  
- PostgreSQL  
- MongoDB  


In [45]:
wb2 = Workbook()
ws2 = wb2.active

languages = ['C','C#','C++','Java','JavaScript','Python','Scala','Oracle','SQL Server','MySQL Server','PostgreSQL','MongoDB']

ws2.append(['Languages','Number of Jobs'])

for language in languages:
    # Each call returns a (name, count) tuple, which becomes one row.
    ws2.append(get_number_of_jobs_T(language))

wb2.save('2.a-job-postings-languages (Collected from API).xlsx')
wb2.close()
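The loop above downloads the full dataset once per technology, because `get_number_of_jobs_T` issues its own request on every call. A possible alternative is to fetch the records once and count all technologies against that list, as sketched below. The helper name `count_jobs_for_technologies` and the sample records are hypothetical; the matching rule is the same case-insensitive substring test.

```python
def count_jobs_for_technologies(jobs, technologies):
    """Return (technology, count) rows using a case-insensitive substring match."""
    rows = []
    for technology in technologies:
        needle = technology.lower()
        count = sum(
            1 for job in jobs
            if needle in (job.get("Key Skills") or "").lower()
        )
        rows.append((technology, count))
    return rows


# Invented records standing in for the downloaded dataset.
sample_jobs = [
    {"Key Skills": "Python, SQL Server"},
    {"Key Skills": "Scala"},
    {"Key Skills": "MongoDB, Python"},
]
rows = count_jobs_for_technologies(sample_jobs, ["Python", "Scala", "Oracle"])
```

With the real data, `jobs` would be the result of a single `requests.get(api_url).json()` call, and each returned row can be passed to `ws2.append` unchanged.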

---