# **Collecting Job Data Using APIs**


## The Dataset

In this part, I demonstrate how to use an API to carry out the ETL process for the dataset used in the following parts of the project. The dataset comes from [Jobs on Naukri.com](https://www.kaggle.com/promptcloud/jobs-on-naukricom), which is available under a **Public Domain license**.


In [None]:
import requests # you need this module to make an API call
import pandas as pd

In [None]:
api_url = "http://api.open-notify.org/astros.json" # this url gives us the astronaut data

In [None]:
response = requests.get(api_url) # Call the API using the get method and store the
                                # output of the API call in a variable called response.

In [None]:
if response.ok:             # True when the status code is below 400 (no HTTP error)
    data = response.json()  # parse the JSON body and store it in a variable called data
                            # the variable data is of type dictionary.

In [None]:
print(data)   # print the data just to check the output or for debugging

{'message': 'success', 'people': [{'name': 'Jasmin Moghbeli', 'craft': 'ISS'}, {'name': 'Andreas Mogensen', 'craft': 'ISS'}, {'name': 'Satoshi Furukawa', 'craft': 'ISS'}, {'name': 'Konstantin Borisov', 'craft': 'ISS'}, {'name': 'Oleg Kononenko', 'craft': 'ISS'}, {'name': 'Nikolai Chub', 'craft': 'ISS'}, {'name': "Loral O'Hara", 'craft': 'ISS'}], 'number': 7}
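
The `if response.ok:` check above leaves `data` undefined when the request fails, so a later `print(data)` would raise a `NameError`. A minimal sketch of a stricter alternative follows. It assumes a hypothetical helper named `fetch_json` (not part of the original notebook) and an arbitrary 10-second timeout:

```python
import requests

def fetch_json(url, params=None, timeout=10, get=requests.get):
    """Return the decoded JSON body of a GET request, or raise on failure.

    `fetch_json` is a hypothetical helper; the `get` argument exists only so
    the HTTP call can be swapped out (for example in tests).
    """
    response = get(url, params=params, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.json()
```

With this helper, `data = fetch_json(api_url)` either returns the parsed dictionary or stops with an explicit error instead of silently skipping the assignment.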


Print the number of astronauts currently on the ISS.


In [None]:
print(data.get('number'))

7


Print the names of the astronauts currently on the ISS.


In [None]:
astronauts = data.get('people')
print("There are {} astronauts on ISS".format(len(astronauts)))
print("And their names are :")
for astronaut in astronauts:
    print(astronaut.get('name'))

There are 7 astronauts on ISS
And their names are :
Jasmin Moghbeli
Andreas Mogensen
Satoshi Furukawa
Konstantin Borisov
Oleg Kononenko
Nikolai Chub
Loral O'Hara
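
Each entry in `people` carries a `craft` field, so the same list can also be grouped by spacecraft instead of assuming everyone is on the ISS. A small sketch, using a shortened copy of the output above and a hypothetical helper named `names_by_craft`:

```python
from collections import defaultdict

# Shortened sample with the same shape as the 'people' list printed above
astronauts = [
    {'name': 'Jasmin Moghbeli', 'craft': 'ISS'},
    {'name': 'Andreas Mogensen', 'craft': 'ISS'},
    {'name': "Loral O'Hara", 'craft': 'ISS'},
]

def names_by_craft(people):
    """Map each craft name to the list of astronaut names aboard it."""
    grouped = defaultdict(list)
    for person in people:
        grouped[person.get('craft', 'unknown')].append(person.get('name'))
    return dict(grouped)

crews = names_by_craft(astronauts)
for craft, names in crews.items():
    print("{}: {}".format(craft, ", ".join(names)))
```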


In [None]:
#Import required libraries
import pandas as pd
import json


## Function to count the number of jobs for a technology




The following function counts the number of job listings related to a specific technology from a JSON dataset obtained from an API. It takes the technology as input, searches for job listings containing that technology as a key skill, and returns the count.

In [None]:
api_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json"
payload = {'Location': 'Los Angeles'}  # sent as the query string ?Location=Los+Angeles

response = requests.get(api_url, params=payload)
if response.ok:
    data2 = response.json()  # job postings, one dictionary per posting

def get_number_of_jobs_T(technology):
    """Return (technology, number of postings whose 'Key Skills' mention it)."""
    job_count = 0
    for item in data2:
        # `in` on a string is a substring test, so "Java" also matches "JavaScript"
        if 'Key Skills' in item and technology in item['Key Skills']:
            job_count += 1
    return technology, job_count


In [None]:
get_number_of_jobs_T("Python")

('Python', 1173)
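
Because the function above uses a substring test, short names such as "C" or "Java" can also match longer skills such as "CSS" or "JavaScript". The sketch below shows one way to compare whole skills instead. It assumes that `Key Skills` is a single string with skills separated by `|`, which is an assumption about the data format; `has_skill`, `count_jobs_exact`, and the sample records are hypothetical and made up for illustration:

```python
def has_skill(key_skills, technology, separator="|"):
    """True if `technology` equals one of the separator-delimited skills (case-insensitive)."""
    skills = [skill.strip().lower() for skill in key_skills.split(separator)]
    return technology.strip().lower() in skills

def count_jobs_exact(jobs, technology):
    """Count postings that list `technology` as a whole skill, not as a substring."""
    return sum(1 for job in jobs if has_skill(job.get('Key Skills', ''), technology))

# Hypothetical records, not taken from the dataset
sample_jobs = [
    {'Key Skills': 'Java| Spring| SQL Server'},
    {'Key Skills': 'JavaScript| HTML| CSS'},
    {'Key Skills': 'C++| Python'},
]

exact_java = count_jobs_exact(sample_jobs, 'Java')
substring_java = sum('Java' in job['Key Skills'] for job in sample_jobs)
print(exact_java, substring_java)  # 1 2
```

Switching to whole-skill matching would change the counts reported in this notebook, so the two approaches should not be mixed within one analysis.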

## Function to find number of jobs in US for a location


In [None]:
def get_number_of_jobs_L(location):
    """Return (location, number of postings whose 'Location' equals it)."""
    job_count = 0
    for item in data2:
        # .get avoids a KeyError for postings without a 'Location' field
        if item.get('Location') == location:
            job_count += 1
    return location, job_count

Call the function for Los Angeles and check if it is working.




In [None]:
get_number_of_jobs_L("Los Angeles")

('Los Angeles', 640)
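
Calling the function once per location rescans the whole list each time. As an alternative sketch, `collections.Counter` can tally every location in a single pass. The helper name `count_jobs_by_location` and the sample records are hypothetical:

```python
from collections import Counter

def count_jobs_by_location(jobs):
    """Return a Counter mapping each 'Location' value to its number of postings."""
    return Counter(job['Location'] for job in jobs if 'Location' in job)

# Hypothetical records, not taken from the dataset
sample_jobs = [
    {'Location': 'Los Angeles'},
    {'Location': 'New York'},
    {'Location': 'Los Angeles'},
    {'Job Title': 'Analyst'},  # no 'Location' key, so it is skipped
]

location_counts = count_jobs_by_location(sample_jobs)
print(location_counts['Los Angeles'])  # 2
print(location_counts['Seattle'])      # 0, a Counter returns 0 for missing keys
```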

## Store the results in an Excel file


Count the postings for each of the different technologies and locations, then write the results to an Excel spreadsheet.


In [None]:
locations = ['Los Angeles', 'New York', 'San Francisco', 'Washington DC', 'Seattle', 'Austin', 'Detroit']


Import the libraries required to create an Excel spreadsheet.


In [None]:
!pip install openpyxl



In [None]:
from openpyxl import Workbook

Create a workbook and select the active worksheet


In [None]:
wb=Workbook()
ws=wb.active

In [None]:
ws.append(['jobs_postings','Location'])


In [None]:
for loc in locations:
    location, jobs_postings = get_number_of_jobs_L(loc)
    ws.append([jobs_postings, location])


Save the workbook as an Excel spreadsheet named 'job-postings.xlsx'.


In [None]:
wb.save("job-postings.xlsx")

Repeat the same steps for the different technologies.

In [None]:

languages = ["C", "C#", "C++", "Java", "JavaScript", "Python", "Scala", "Oracle", "SQL Server", "MySQL Server", "PostgreSQL", "MongoDB"]
wb2 = Workbook()
ws2 = wb2.active
ws2.append(['jobs_postings', 'languages'])
for lang in languages:
    # Unpack into a new name so the `languages` list is not overwritten
    language, jobs_postings = get_number_of_jobs_T(lang)
    ws2.append([jobs_postings, language])

wb2.save("job-postingsbythec.xlsx")
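
The two export cells follow the same pattern, so they could share one helper that writes each table to its own worksheet and reads it back as a check. The sketch below is one possible arrangement, not the notebook's method: `save_counts`, `read_sheet`, the sheet titles, and the file name `job-postings-combined.xlsx` are all hypothetical, and the two sample rows reuse counts printed earlier.

```python
import os
import tempfile
from openpyxl import Workbook, load_workbook

def save_counts(path, sheets):
    """Write {sheet title: (header, rows)} to one workbook, one worksheet each."""
    wb = Workbook()
    wb.remove(wb.active)  # drop the default empty worksheet
    for title, (header, rows) in sheets.items():
        ws = wb.create_sheet(title=title)
        ws.append(header)
        for row in rows:
            ws.append(list(row))
    wb.save(path)

def read_sheet(path, title):
    """Read a worksheet back as a list of row tuples."""
    wb = load_workbook(path)
    return list(wb[title].iter_rows(values_only=True))

sheets = {
    'locations': (['jobs_postings', 'Location'], [(640, 'Los Angeles')]),
    'technologies': (['jobs_postings', 'languages'], [(1173, 'Python')]),
}

# A temporary directory keeps the sketch from leaving files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'job-postings-combined.xlsx')
    save_counts(path, sheets)
    location_rows = read_sheet(path, 'locations')
    technology_rows = read_sheet(path, 'technologies')

print(location_rows)
print(technology_rows)
```

Reading the file back confirms that the header and the counts were written in the expected column order.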
