# **Collecting Job Data Using APIs**
## Objectives

- Collect job data using the Jobs API.
- Store the collected data in an Excel spreadsheet.


## Dataset Used in this Assignment

The dataset used in this lab comes from the following source: https://www.kaggle.com/promptcloud/jobs-on-naukricom under a **Public Domain license**.

> Note: We are using a modified subset of that dataset for the lab, so to follow the lab instructions successfully please use the dataset provided with the lab, rather than the dataset from the original source.

The original dataset is a CSV file. We have converted it to JSON, as required by the lab.


## Warm-Up Exercise


Using an API, let us find out who is currently on the International Space Station (ISS).<br> The API at [http://api.open-notify.org/astros.json](http://api.open-notify.org/astros.json) returns information about the astronauts currently in space, including the craft each one is aboard, in JSON format.<br>
You can read more about this API at [http://open-notify.org/Open-Notify-API/People-In-Space/](http://open-notify.org/Open-Notify-API/People-In-Space/).


In [2]:
import requests
import pandas as pd

In [3]:
api_url = "http://api.open-notify.org/astros.json"

In [4]:
response = requests.get(api_url)

In [5]:
if response.ok:
    data = response.json()

In [6]:
print(data)

{'people': [{'craft': 'ISS', 'name': 'Oleg Kononenko'}, {'craft': 'ISS', 'name': 'Nikolai Chub'}, {'craft': 'ISS', 'name': 'Tracy Caldwell Dyson'}, {'craft': 'ISS', 'name': 'Matthew Dominick'}, {'craft': 'ISS', 'name': 'Michael Barratt'}, {'craft': 'ISS', 'name': 'Jeanette Epps'}, {'craft': 'ISS', 'name': 'Alexander Grebenkin'}, {'craft': 'ISS', 'name': 'Butch Wilmore'}, {'craft': 'ISS', 'name': 'Sunita Williams'}, {'craft': 'Tiangong', 'name': 'Li Guangsu'}, {'craft': 'Tiangong', 'name': 'Li Cong'}, {'craft': 'Tiangong', 'name': 'Ye Guangfu'}], 'number': 12, 'message': 'success'}


In [7]:
print(data.get('number'))

12


In [8]:
astronauts = data.get('people')
print("There are {} astronauts in space".format(len(astronauts)))
print("And their names are:")
for astronaut in astronauts:
    print(astronaut.get('name'))

There are 12 astronauts in space
And their names are:
Oleg Kononenko
Nikolai Chub
Tracy Caldwell Dyson
Matthew Dominick
Michael Barratt
Jeanette Epps
Alexander Grebenkin
Butch Wilmore
Sunita Williams
Li Guangsu
Li Cong
Ye Guangfu
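The response printed above lists people aboard more than one craft (ISS and Tiangong), so a count for the ISS alone needs a filter on the `craft` field. The following is a minimal sketch, assuming a payload shaped like the one shown above. `sample` is a hand-copied subset of that response so the example runs without a network call, and `people_by_craft` is a hypothetical helper name, not part of the lab.

```python
from collections import defaultdict

# Hand-copied subset of the response printed above (an assumption made so the
# sketch runs offline; in the notebook you would pass `data` instead).
sample = {
    "people": [
        {"craft": "ISS", "name": "Oleg Kononenko"},
        {"craft": "ISS", "name": "Sunita Williams"},
        {"craft": "Tiangong", "name": "Ye Guangfu"},
    ],
    "number": 3,
    "message": "success",
}


def people_by_craft(payload):
    """Group astronaut names by the craft they are aboard."""
    grouped = defaultdict(list)
    for person in payload.get("people", []):
        grouped[person.get("craft", "unknown")].append(person.get("name"))
    return dict(grouped)


crews = people_by_craft(sample)
iss_crew = crews.get("ISS", [])
print("There are {} astronauts on the ISS".format(len(iss_crew)))
for name in iss_crew:
    print(name)
```

Grouping first and then looking up `"ISS"` keeps the counts for the other craft available as well.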


## Lab: Collect Jobs Data using Jobs API


### Objective: Determine the number of jobs currently open for various technologies and for various locations


Collect the number of job postings for the following locations using the API:

* Los Angeles
* New York
* San Francisco
* Washington DC
* Seattle
* Austin
* Detroit


In [9]:
import pandas as pd
import json

The keys in the JSON are:

- Job Title
- Job Experience Required
- Key Skills
- Role Category
- Location
- Functional Area
- Industry
- Role

In [10]:
api_url="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json"
response = requests.get(api_url)
print(response.headers['Content-Type'])

application/json


In [11]:
data = response.json()
print(data[:3])

[{'Id': 0, 'Job Title': 'Digital Media Planner', 'Job Experience Required': '5 - 10 yrs', 'Key Skills': 'Media Planning| Digital Media', 'Role Category': 'Advertising', 'Location': 'Los Angeles', 'Functional Area': 'Marketing , Advertising , MR , PR , Media Planning', 'Industry': 'Advertising, PR, MR, Event Management', 'Role': 'Media Planning Executive/Manager'}, {'Id': 1, 'Job Title': 'Online Bidding Executive', 'Job Experience Required': '2 - 5 yrs', 'Key Skills': 'pre sales| closing| software knowledge| clients| requirements| negotiating| client| online bidding| good communication| technology', 'Role Category': 'Retail Sales', 'Location': 'New York', 'Functional Area': 'Sales , Retail , Business Development', 'Industry': 'IT-Software, Software Services', 'Role': 'Sales Executive/Officer'}, {'Id': 2, 'Job Title': 'Trainee Research/ Research Executive- Hi- Tech Operations', 'Job Experience Required': '0 - 1 yrs', 'Key Skills': 'Computer science| Fabrication| Quality check| Intellectu
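Before counting anything, it can help to confirm that the records carry the keys listed earlier. The sketch below checks the first record from the output above against that list. `EXPECTED_KEYS` and `check_keys` are hypothetical names introduced for illustration, and the record is copied by hand so the example runs offline.

```python
# Keys the lab says each job record contains.
EXPECTED_KEYS = {
    "Job Title", "Job Experience Required", "Key Skills", "Role Category",
    "Location", "Functional Area", "Industry", "Role",
}

# First record from the printed output, copied by hand.
first_record = {
    "Id": 0,
    "Job Title": "Digital Media Planner",
    "Job Experience Required": "5 - 10 yrs",
    "Key Skills": "Media Planning| Digital Media",
    "Role Category": "Advertising",
    "Location": "Los Angeles",
    "Functional Area": "Marketing , Advertising , MR , PR , Media Planning",
    "Industry": "Advertising, PR, MR, Event Management",
    "Role": "Media Planning Executive/Manager",
}


def check_keys(record, expected=EXPECTED_KEYS):
    """Return (missing, extra) key sets for one record."""
    present = set(record)
    return expected - present, present - expected


missing, extra = check_keys(first_record)
print("missing:", missing)
print("extra:", extra)
```

In the notebook, the same check could be run over every element of `data` to spot records with absent fields before calling `.split()` on them.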

Write a function to get the number of jobs for the Python technology.

In [12]:
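def get_number_of_jobs_T(technology):
    # Count the jobs whose "|"-separated "Key Skills" field has an entry exactly equal to `technology`
    number_of_jobs = sum(technology in job.get("Key Skills").split("|") for job in data)
    return technology, number_of_jobs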

Call the function for Python and check that it works.


In [13]:
get_number_of_jobs_T("Python")

('Python', 1164)
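In the sample records printed earlier, entries in `Key Skills` are separated by `|` followed by a space (for example `'Media Planning| Digital Media'`). Splitting on `"|"` alone keeps that leading space, so an exact comparison can miss a skill that is not first in the list. The sketch below shows a whitespace-tolerant variant on a small made-up sample. `count_jobs_with_skill` and `sample_jobs` are hypothetical names, and on the real data this variant may give counts that differ from the output above.

```python
def count_jobs_with_skill(jobs, technology):
    """Count jobs listing `technology`, ignoring spaces around each skill."""
    count = 0
    for job in jobs:
        raw = job.get("Key Skills") or ""  # tolerate a missing or empty field
        skills = [skill.strip() for skill in raw.split("|")]
        if technology in skills:
            count += 1
    return count


# Made-up records that mimic the "a| b| c" formatting seen in the data.
sample_jobs = [
    {"Key Skills": "Python| SQL"},
    {"Key Skills": "Java| Python"},
    {"Key Skills": "Media Planning| Digital Media"},
    {},  # record without a "Key Skills" field
]

stripped = count_jobs_with_skill(sample_jobs, "Python")
# Same comparison without stripping, as in get_number_of_jobs_T.
unstripped = sum(
    "Python" in job.get("Key Skills", "").split("|") for job in sample_jobs
)
print(stripped, unstripped)
```

The second record is only counted by the stripped version, because its entry is `" Python"` with a leading space.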

#### Write a function to find the number of jobs in the US for a location of your choice


In [14]:
def get_number_of_jobs_L(location):
    number_of_jobs = sum(location in job.get("Location") for job in data)
    return location, number_of_jobs

Call the function for Los Angeles and check if it is working.


In [15]:
get_number_of_jobs_L("Los Angeles")

('Los Angeles', 640)
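When counts are needed for many locations, a single pass with `collections.Counter` avoids rescanning the data once per location. This is a sketch on made-up records (`sample_jobs` is hypothetical). Note that it compares the whole `Location` string, whereas `get_number_of_jobs_L` uses a substring test, so the two can disagree if a record's location contains extra text.

```python
from collections import Counter

# Made-up records carrying only the field used here.
sample_jobs = [
    {"Location": "Los Angeles"},
    {"Location": "New York"},
    {"Location": "Los Angeles"},
    {"Location": "Seattle"},
]

# One pass over the records; each distinct location becomes a key.
location_counts = Counter(job.get("Location", "") for job in sample_jobs)

for location in ["Los Angeles", "New York", "Detroit"]:
    # A Counter returns 0 for keys it has never seen.
    print(location, location_counts[location])
```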

### Store the results in an Excel file


Create a Python list of all the locations for which you need to find the number of job postings.

In [16]:
countries = ['Los Angeles', 'New York', 'San Francisco', 'Washington DC', 'Seattle', 'Austin', 'Detroit']

Import the library required to create an Excel spreadsheet.


In [22]:
from openpyxl import Workbook

Create a workbook and select the active worksheet


In [23]:
wb = Workbook()
ws = wb.active
# Add the header row
ws.append(["Country", "Job Postings"])

Find the number of job postings for each location in the above list.
Write the location name and the number of job postings into the Excel spreadsheet.


In [24]:
# Loop through the list of locations and get the job counts
job_counts = [get_number_of_jobs_L(location) for location in countries]
# Keep only the counts, dropping the location names
numbers = [count for _, count in job_counts]

In [25]:
# Add data as rows
for country, count in zip(countries, numbers):
    ws.append([country, count])

Save into an Excel spreadsheet named **job-postings.xlsx**.


In [26]:
wb.save("job-postings.xlsx")
print("Saved")

Saved


#### In a similar way, you can collect the counts for the technologies given below and store the results in an Excel sheet.


Collect the number of job postings for the following technologies using the API:

*   C
*   C#
*   C++
*   Java
*   JavaScript
*   Python
*   Scala
*   Oracle
*   SQL Server
*   MySQL Server
*   PostgreSQL
*   MongoDB


In [21]:
technologies = ['C', 'C#', 'C++', 'Java', 'JavaScript', 'Python', 'Scala', 'Oracle', 'SQL Server', 'MySQL Server', 'PostgreSQL', 'MongoDB']

In [22]:
# Loop through the list of technologies and get the job counts
job_counts = [get_number_of_jobs_T(technology) for technology in technologies]
# only extract the number
numbers = [count for _, count in job_counts]

# Append the counts as one row of the worksheet
ws.append(numbers)
# Print the job counts for each technology
print(numbers)

[148, 215, 118, 470, 2, 1164, 0, 70, 30, 0, 1, 15]
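A bare list of counts is hard to read without its labels. The sketch below pairs the technologies with the counts printed above using `zip`. `counts_by_technology` and `zero_matches` are names introduced here for illustration.

```python
technologies = ['C', 'C#', 'C++', 'Java', 'JavaScript', 'Python', 'Scala',
                'Oracle', 'SQL Server', 'MySQL Server', 'PostgreSQL', 'MongoDB']
# Counts copied from the output above, in the same order as `technologies`.
numbers = [148, 215, 118, 470, 2, 1164, 0, 70, 30, 0, 1, 15]

# Pair each technology with its count.
counts_by_technology = dict(zip(technologies, numbers))

# Technologies with no exact match in "Key Skills".
zero_matches = [tech for tech, n in counts_by_technology.items() if n == 0]

for tech, n in counts_by_technology.items():
    print("{:<14}{}".format(tech, n))
print("No matches:", zero_matches)
```

Zero or near-zero counts for common technologies are worth a second look, since they may reflect how the skill is spelled or spaced in the data rather than a real absence of postings.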


In [41]:
def get_number_of_jobs_L_T(technologies, countries):
    final_list = []

    # Loop through each technology
    for technology in technologies:
        number_of_jobs_list = [technology]
        for location in countries:
            number_of_jobs = sum(
                location == job.get("Location") and technology in job.get("Key Skills", "").split("|")
                for job in data
            )
            number_of_jobs_list.append(number_of_jobs)
        final_list.append(number_of_jobs_list)
    return final_list

get_number_of_jobs_L_T(technologies, countries)

[['C', 4, 19, 2, 31, 22, 3, 13],
 ['C#', 4, 23, 3, 50, 34, 4, 36],
 ['C++', 2, 15, 2, 23, 16, 2, 10],
 ['Java', 6, 53, 9, 100, 66, 3, 62],
 ['JavaScript', 0, 0, 0, 0, 0, 0, 0],
 ['Python', 24, 143, 17, 256, 133, 15, 166],
 ['Scala', 0, 0, 0, 0, 0, 0, 0],
 ['Oracle', 3, 10, 4, 7, 8, 0, 14],
 ['SQL Server', 1, 1, 0, 4, 3, 0, 7],
 ['MySQL Server', 0, 0, 0, 0, 0, 0, 0],
 ['PostgreSQL', 0, 0, 0, 1, 0, 0, 0],
 ['MongoDB', 0, 3, 1, 4, 2, 0, 4]]
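The nested loops above scan the full dataset once per technology and location pair. A possible alternative is to visit each job once and increment the matching cells. The sketch below keeps the same matching rules as `get_number_of_jobs_L_T` (exact location, unstripped split on `"|"`). `count_by_technology_and_location` and the sample records are hypothetical, not part of the lab.

```python
def count_by_technology_and_location(jobs, technologies, locations):
    """Build [technology, count_loc1, count_loc2, ...] rows in one pass."""
    counts = {tech: {loc: 0 for loc in locations} for tech in technologies}
    for job in jobs:
        location = job.get("Location")
        if location not in locations:
            continue  # skip locations we are not reporting on
        # A set ensures a job is counted once per technology.
        skills = set(job.get("Key Skills", "").split("|"))
        for tech in skills.intersection(counts):
            counts[tech][location] += 1
    return [[tech] + [counts[tech][loc] for loc in locations]
            for tech in technologies]


# Made-up records for illustration.
sample_jobs = [
    {"Location": "Seattle", "Key Skills": "Python|Java"},
    {"Location": "Austin", "Key Skills": "Python"},
    {"Location": "Boston", "Key Skills": "Python"},
]

rows = count_by_technology_and_location(
    sample_jobs, ["Python", "Java"], ["Seattle", "Austin"]
)
print(rows)
```

The returned rows have the same shape as the list above, so they can be passed to `pd.DataFrame` in the same way.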

In [44]:
final_data = get_number_of_jobs_L_T(technologies, countries)
df_data = pd.DataFrame(final_data, columns = ['Technology'] + countries)
df_data

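|   | Technology | Los Angeles | New York | San Francisco | Washington DC | Seattle | Austin | Detroit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | C | 4 | 19 | 2 | 31 | 22 | 3 | 13 |
| 1 | C# | 4 | 23 | 3 | 50 | 34 | 4 | 36 |
| 2 | C++ | 2 | 15 | 2 | 23 | 16 | 2 | 10 |
| 3 | Java | 6 | 53 | 9 | 100 | 66 | 3 | 62 |
| 4 | JavaScript | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | Python | 24 | 143 | 17 | 256 | 133 | 15 | 166 |
| 6 | Scala | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | Oracle | 3 | 10 | 4 | 7 | 8 | 0 | 14 |
| 8 | SQL Server | 1 | 1 | 0 | 4 | 3 | 0 | 7 |
| 9 | MySQL Server | 0 | 0 | 0 | 0 | 0 | 0 | 0 |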


In [None]:
df_data.to_excel("job-postings-country.xlsx", index=False)
print("Saved")

Saved


Copyright © IBM Corporation.


<!--
## Change Log

| Date (YYYY-MM-DD) | Version | Changed By        | Change Description                 |
| ----------------- | ------- | ----------------- | ---------------------------------- |
| 2022-01-19        | 0.3     | Lakshmi Holla     | Added changes in the markdown      |
| 2021-06-25        | 0.2     | Malika            | Updated GitHub job json link       |
| 2020-10-17        | 0.1     | Ramesh Sannareddy | Created initial version of the lab |
-->
