# **Collecting Job Data Using APIs**


## Dataset Used

The dataset used here comes from the following source: https://www.kaggle.com/promptcloud/jobs-on-naukricom

> Note: The dataset used in this notebook is a modified version of the original. The original is a CSV file; it has been converted to JSON here.

<h2>Objectives</h2>

<ul>
    <li>Determine the number of jobs open for various technologies</li>
    <li>Determine the number of jobs open for various locations</li>
    <li>Export the data to an Excel spreadsheet</li>
</ul>


<h2>Locations</h2>

* Los Angeles
* New York
* San Francisco
* Washington DC
* Seattle
* Austin
* Detroit

<h2>Technologies</h2>

*   C
*   C#
*   C++
*   Java
*   JavaScript
*   Python
*   Scala
*   Oracle
*   SQL Server
*   MySQL Server
*   PostgreSQL
*   MongoDB



 <h2>JSON Keys</h2>
 
 * Job Title
 * Job Experience Required
 * Key Skills
 * Role Category
 * Location
 * Functional Area
 * Industry
 * Role 
 
You can also view the contents of the JSON file at this <a href = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json">URL</a>.
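To make the key list above concrete, here is a minimal offline sketch of how records with these keys can be counted. The sample records, their values, the `|` delimiter, and the helper `count_matching` are all invented for illustration. The case-insensitive substring match is an assumption too, not the confirmed behavior of the Jobs API.

```python
import json

# Hypothetical sample records that use a few of the keys listed above.
# All values (and the "|" delimiter) are invented for illustration.
sample_json = """
[
  {"Job Title": "Backend Developer", "Key Skills": "Python|Django|SQL", "Location": "Seattle"},
  {"Job Title": "Data Engineer", "Key Skills": "Scala|Spark", "Location": "Austin"},
  {"Job Title": "Web Developer", "Key Skills": "JavaScript|Python", "Location": "Seattle"}
]
"""

records = json.loads(sample_json)


def count_matching(records, key, value):
    """Count records whose `key` field contains `value`.

    Assumption: case-insensitive substring matching. The real API may
    apply a different rule.
    """
    needle = value.lower()
    return sum(1 for r in records if needle in str(r.get(key, "")).lower())


print(count_matching(records, "Key Skills", "Python"))
print(count_matching(records, "Location", "Seattle"))
```

Under substring matching, a short name such as `C` would also match `C++` or `C#`, so counts for short technology names should be read with that in mind.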


<h2>Using the Jobs API</h2>

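<p>The API was provided by IBM in the original lab. The code in this notebook expects it to be reachable locally at http://127.0.0.1:5000/data. To set it up:</p>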

<ol>
    <li>Download the Jobs_API.ipynb file</li>
    <li>Place the file in the same folder as this notebook</li>
    <li>Run all the cells in the Jobs API notebook</li>
</ol>

Original Source: [Jobs_API](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/Jobs_API.ipynb)

In [None]:
# Import libraries
import requests
import pandas as pd
import json

<h3>Function to find the number of jobs for a technology</h3>

In [None]:
api_url = "http://127.0.0.1:5000/data"
#json_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json"

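def get_number_of_jobs_T(technology):
    # Ask the API for postings whose "Key Skills" field matches the technology
    payload = {"Key Skills": technology}
    response = requests.get(api_url, params=payload)
    # Raise on an HTTP error instead of returning a string, which would break
    # callers that unpack the (technology, count) tuple
    response.raise_for_status()
    data = response.json()
    number_of_jobs = len(data)
    return technology, number_of_jobs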

In [None]:
# Testing the function with Python as an argument
get_number_of_jobs_T("Python")

('Python', 1173)
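The key `Key Skills` contains a space, and technology names such as `C#` and `C++` contain characters that are reserved in URLs. Passing the payload through `params` lets `requests` handle the encoding. The sketch below uses the standard library's `urlencode` to show what an encoded query string looks like; `build_query` is a hypothetical helper, and the exact string `requests` sends is assumed to be equivalent rather than verified here.

```python
from urllib.parse import urlencode


def build_query(key, value):
    """Return the URL-encoded query string for a single key/value filter."""
    return urlencode({key: value})


# Show the encoded form for a few technology names from the list above
for tech in ["Python", "C#", "C++", "SQL Server"]:
    print(f"{tech!r:14} -> ?{build_query('Key Skills', tech)}")
```

Building the URL by hand with string concatenation would leave `#` and `+` unescaped, so the server would likely receive a different value than intended.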

### Function to find the number of jobs in the US for a location


In [54]:
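def get_number_of_jobs_L(location):
    # Ask the API for postings whose "Location" field matches the location
    payload = {"Location": location}
    response = requests.get(api_url, params=payload)
    # Raise on an HTTP error instead of returning a string, which would break
    # callers that unpack the (location, count) tuple
    response.raise_for_status()
    data = response.json()
    number_of_jobs = len(data)
    return location, number_of_jobs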

In [None]:
# Testing the function with Los Angeles as an argument
get_number_of_jobs_L("Los Angeles")

('Los Angeles', 640)

### Store the results in Excel files


In [None]:
# List of locations
locations = ['Los Angeles', 'New York', 'San Francisco', 'Washington DC', 'Seattle', 'Austin', 'Detroit']

# List to store the # of job postings in the location
job_postings_L = []

# Get the # of jobs in each location and append it to the list
for location in locations:
    _, n = get_number_of_jobs_L(location)
    job_postings_L.append(n)

# Create a dataframe to store the data
df = pd.DataFrame({'Location': locations, 'Number of Jobs': job_postings_L})

# Save the dataframe to an Excel spreadsheet (pandas needs an Excel writer engine such as openpyxl)
df.to_excel("job-postings-location.xlsx", index=False)

In [60]:
# List of the technologies
technologies = ['C', 'C#', 'C++', 'Java', 'JavaScript', 'Python', 'Scala', 'Oracle', 'SQL Server', 'MySQL Server', 'PostgreSQL', 'MongoDB']

# List to store the # of job postings for the technologies
job_postings_T = []

# Find the # of job postings using the defined function and append to the list
for technology in technologies:
    _, n = get_number_of_jobs_T(technology)
    job_postings_T.append(n)

# Create a dataframe with pandas using the lists
df_technology = pd.DataFrame({'Technology': technologies, 'Number of Jobs': job_postings_T})

# Export the dataframe to Excel
df_technology.to_excel("job-postings-technology.xlsx", index=False)
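Both cells above follow the same pattern: loop over a list of names, call a counting function for each, and collect the counts in order. As a minimal sketch, that pattern could be factored into one helper. `collect_counts` and `fake_count` are hypothetical names, and `fake_count` returns invented numbers so the example runs without the API.

```python
def collect_counts(items, count_fn):
    """Call count_fn(item) for each item and return the counts in input order.

    count_fn must return an (item, count) tuple, as the two functions
    defined earlier in this notebook do.
    """
    counts = []
    for item in items:
        _, n = count_fn(item)
        counts.append(n)
    return counts


# Hypothetical stand-in for get_number_of_jobs_T / get_number_of_jobs_L.
# The counts are invented for illustration only.
def fake_count(name):
    invented = {"Python": 3, "Java": 5}
    return name, invented.get(name, 0)


technologies = ["Python", "Java", "Scala"]
table = {
    "Technology": technologies,
    "Number of Jobs": collect_counts(technologies, fake_count),
}
print(table)
```

The resulting dictionary has the same shape as the ones passed to `pd.DataFrame` in the cells above, so swapping `fake_count` for one of the real functions would feed the same export step.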

<h2>Note:</h2>

<p>This is a modified version of the original IBM lab. Several parts were removed to make it easier to understand and follow along.</p>