# **Collecting Job Data Using APIs**


In [1]:
import requests # you need this module to make an API call
import pandas as pd


In [2]:
api_url = "http://api.open-notify.org/astros.json" # this URL gives us the astronaut data

In [3]:
response = requests.get(api_url) # Call the API using the get method and store the
                                # output of the API call in a variable called response.

In [4]:
if response.ok:             # if all is well (no errors, no network timeouts)
    data = response.json()  # parse the JSON body and store it in a variable called data
                            # the variable data is of type dictionary.

In [5]:
print(data)   # print the data just to check the output or for debugging

{'number': 10, 'people': [{'name': 'Sergey Prokopyev', 'craft': 'ISS'}, {'name': 'Dmitry Petelin', 'craft': 'ISS'}, {'name': 'Frank Rubio', 'craft': 'ISS'}, {'name': 'Stephen Bowen', 'craft': 'ISS'}, {'name': 'Warren Hoburg', 'craft': 'ISS'}, {'name': 'Sultan Alneyadi', 'craft': 'ISS'}, {'name': 'Andrey Fedyaev', 'craft': 'ISS'}, {'name': 'Jing Haiping', 'craft': 'Tiangong'}, {'name': 'Gui Haichow', 'craft': 'Tiangong'}, {'name': 'Zhu Yangzhu', 'craft': 'Tiangong'}], 'message': 'success'}


Print the number of astronauts currently in space. The count covers every craft in the response, not only the ISS.


In [6]:
print(data.get('number'))

10


Print the names of the astronauts currently in space.


In [7]:
astronauts = data.get('people')
# The list covers every craft in the response (ISS and Tiangong), not only the ISS.
print("There are {} astronauts in space".format(len(astronauts)))
print("And their names are :")
for astronaut in astronauts:
    print(astronaut.get('name'))

There are 10 astronauts in space
And their names are :
Sergey Prokopyev
Dmitry Petelin
Frank Rubio
Stephen Bowen
Warren Hoburg
Sultan Alneyadi
Andrey Fedyaev
Jing Haiping
Gui Haichow
Zhu Yangzhu
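
The response lists crews from more than one spacecraft, so grouping by the `craft` key gives a clearer picture than a single total. The following is a minimal sketch that uses a literal copy of the response printed above instead of calling the API again; the helper name `count_by_craft` is an assumption made for this illustration.

```python
from collections import Counter

# Literal copy of the response printed earlier in this notebook.
data = {
    "number": 10,
    "people": [
        {"name": "Sergey Prokopyev", "craft": "ISS"},
        {"name": "Dmitry Petelin", "craft": "ISS"},
        {"name": "Frank Rubio", "craft": "ISS"},
        {"name": "Stephen Bowen", "craft": "ISS"},
        {"name": "Warren Hoburg", "craft": "ISS"},
        {"name": "Sultan Alneyadi", "craft": "ISS"},
        {"name": "Andrey Fedyaev", "craft": "ISS"},
        {"name": "Jing Haiping", "craft": "Tiangong"},
        {"name": "Gui Haichow", "craft": "Tiangong"},
        {"name": "Zhu Yangzhu", "craft": "Tiangong"},
    ],
    "message": "success",
}


def count_by_craft(payload):
    """Return a Counter mapping each craft name to its crew size."""
    people = payload.get("people", [])
    return Counter(person.get("craft", "unknown") for person in people)


crew_sizes = count_by_craft(data)
for craft, size in crew_sizes.items():
    print(f"{craft}: {size}")
```

With this sample, seven of the ten people are on the ISS and three are on Tiangong.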


Hope the warmup was helpful. Good luck with your next lab!


## Lab: Collect Jobs Data using Jobs API


### Objective: Determine the number of jobs currently open for various technologies and for various locations


Collect the number of job postings for the following locations using the API:

* Los Angeles
* New York
* San Francisco
* Washington DC
* Seattle
* Austin
* Detroit


In [11]:
#Import required libraries
import pandas as pd
import json

#### Write a function to get the number of jobs for the Python technology.
> Note: While using the lab, you need to pass the **payload** information for the **params** attribute in the form of **key**-**value** pairs.
  Refer to the ungraded **rest api lab** in the course **Python for Data Science, AI & Development**: <a href="https://www.coursera.org/learn/python-for-applied-data-science-ai/ungradedLti/P6sW8/hands-on-lab-access-rest-apis-request-http?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork928-2022-01-01">link</a>
  
##### The keys in the JSON are:
* Job Title
* Job Experience Required
* Key Skills
* Role Category
* Location
* Functional Area
* Industry
* Role
 
You can also view the JSON file contents at the following <a href = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json">json</a> URL.
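
The note about **params** refers to the way `requests.get(url, params=payload)` turns key-value pairs into a URL query string. As a rough sketch of what that encoding produces, the snippet below builds the query string with the standard library. The endpoint `http://example.com/data` is a placeholder, and whether a given jobs endpoint accepts `Key Skills` or `Location` as parameter names is an assumption; the cells in this notebook download the static JSON file and filter it locally instead.

```python
from urllib.parse import urlencode

# Placeholder endpoint, used only to show the shape of the final URL.
base_url = "http://example.com/data"

# Payload as key-value pairs; the key names are assumed for illustration.
payload = {"Key Skills": "Python", "Location": "Los Angeles"}

# urlencode escapes spaces and joins the pairs with "&".
query_string = urlencode(payload)
full_url = f"{base_url}?{query_string}"
print(full_url)
```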


In [42]:
import requests

api_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json"

def get_number_of_jobs_T(technology):
    response = requests.get(api_url)

    if response.status_code == 200:
        jobs = response.json()
        number_of_jobs = sum(technology in job.get("Key Skills", "") for job in jobs)
    else:
        print("Failed to fetch jobs.")
        number_of_jobs = 0

    return technology, number_of_jobs



Calling the function for Python and checking if it works.


In [43]:
technology, number_of_jobs = get_number_of_jobs_T("Python")
print(f"Number of jobs for {technology}: {number_of_jobs}")



Number of jobs for Python: 1173
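
The function above uses a substring test, so a short name can match a longer one: "Java" is also found inside "JavaScript", and "C" inside "CSS". A possible alternative is to compare whole skill entries. The sketch below assumes the skills in `Key Skills` are separated by `|`, which should be checked against the actual file; the helper name `count_jobs_with_skill` and the sample records are made up for this illustration.

```python
def count_jobs_with_skill(jobs, technology, delimiter="|"):
    """Count jobs whose "Key Skills" field lists `technology` as a whole entry."""
    wanted = technology.strip().lower()
    count = 0
    for job in jobs:
        skills = job.get("Key Skills") or ""
        # Split the field into individual skills and normalize them.
        tokens = {token.strip().lower() for token in skills.split(delimiter)}
        if wanted in tokens:
            count += 1
    return count


# Made-up records, only to contrast the two matching strategies.
sample_jobs = [
    {"Key Skills": "Java| Spring| SQL"},
    {"Key Skills": "JavaScript| HTML| CSS"},
    {"Key Skills": "Python| C++| Linux"},
]

substring_hits = sum("Java" in job.get("Key Skills", "") for job in sample_jobs)
exact_hits = count_jobs_with_skill(sample_jobs, "Java")
print(f"substring: {substring_hits}, whole entry: {exact_hits}")
```

On these sample records the substring test counts two jobs for "Java" while the whole-entry test counts one. Which behavior is wanted depends on how the lab defines a match.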


#### Write a function to find the number of jobs in the US for a location of your choice


In [4]:
import requests

api_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json"

def get_number_of_jobs_in_location(location):
    response = requests.get(api_url)

    if response.status_code == 200:
        jobs = response.json()
        number_of_jobs = sum(location in job.get("Location", "") for job in jobs)
    else:
        print("Failed to fetch jobs.")
        number_of_jobs = 0

    return location, number_of_jobs



Call the function for Los Angeles and check if it is working.




In [5]:
location, number_of_jobs = get_number_of_jobs_in_location("Los Angeles")
print(f"Number of jobs in {location}: {number_of_jobs}")




Number of jobs in Los Angeles: 640


### Store the results in an excel file


Call the API for each item in the given lists and write the results to an Excel spreadsheet.


If you do not know how to create an Excel file using Python, double click here for **hints**.

<!--

from openpyxl import Workbook        # import Workbook class from module openpyxl
wb=Workbook()                        # create a workbook object
ws=wb.active                         # use the active worksheet
ws.append(['Country','Continent'])   # add a row with two columns 'Country' and 'Continent'
ws.append(['Egypt','Africa'])        # add a row with two columns 'Egypt' and 'Africa'
ws.append(['India','Asia'])          # add another row
ws.append(['France','Europe'])       # add another row
wb.save("countries.xlsx")            # save the workbook into a file called countries.xlsx


-->


Create a Python list of all locations for which you need to find the number of job postings.


In [8]:
#your code goes here
locations_to_search = ["Los Angeles", "New York", "Boston", "Seattle"]

for location in locations_to_search:
    location_name, number_of_jobs = get_number_of_jobs_in_location(location)
    print(f"Number of jobs in {location_name}: {number_of_jobs}")


Number of jobs in Los Angeles: 640
Number of jobs in New York: 3226
Number of jobs in Boston: 2966
Number of jobs in Seattle: 3375
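
The loop above calls `get_number_of_jobs_in_location` once per location, and each call downloads the JSON file again. One option is to download the data once and count every location against the same in-memory list. The sketch below shows only the counting step on made-up records; the helper name `count_jobs_by_location` is an assumption, and in the notebook the `jobs` argument would be the list returned by `response.json()`.

```python
def count_jobs_by_location(jobs, locations):
    """Return {location: count} using the same substring test as the lab function."""
    counts = {}
    for location in locations:
        counts[location] = sum(
            location in (job.get("Location") or "") for job in jobs
        )
    return counts


# Made-up records standing in for the downloaded job list.
sample_jobs = [
    {"Location": "Los Angeles"},
    {"Location": "New York"},
    {"Location": "Seattle"},
    {"Location": "New York"},
    {},  # a record without a Location key is simply not counted
]

counts = count_jobs_by_location(sample_jobs, ["Los Angeles", "New York", "Boston"])
for location, number_of_jobs in counts.items():
    print(f"Number of jobs in {location}: {number_of_jobs}")
```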


Import the libraries required to create an Excel spreadsheet.


In [17]:
import pandas as pd
!pip install openpyxl


Collecting openpyxl
  Downloading openpyxl-3.1.2-py2.py3-none-any.whl (249 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 250.0/250.0 kB 26.3 MB/s eta 0:00:00
Collecting et-xmlfile (from openpyxl)
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.1.2


Create a workbook and select the active worksheet


In [19]:
from openpyxl import Workbook

wb = Workbook()  # Create a workbook object
ws = wb.active   # Use the active worksheet

ws.append(['Country','Continent'])   # add a row with two columns 'Country' and 'Continent'
ws.append(['Egypt','Africa'])        # add a row with two columns 'Egypt' and 'Africa'
ws.append(['India','Asia'])          # add another row
ws.append(['France','Europe'])       # add another row

# To save the workbook
wb.save("countries.xlsx")


Find the number of job postings for each location in the above list.
Write the location name and the number of job postings into the Excel spreadsheet.


In [20]:
from openpyxl import Workbook

# Define the locations to search for job postings
locations_to_search = ["Los Angeles", "New York", "Boston", "Seattle"]

# Reuse get_number_of_jobs_in_location() defined earlier in this notebook;
# it returns a (location, number_of_jobs) tuple.

# Create a workbook object
wb = Workbook()

# Use the active worksheet
ws = wb.active

# Add header row
ws.append(['Location', 'Number of Jobs'])

# Iterate through locations and find job postings
for location in locations_to_search:
    _, number_of_jobs = get_number_of_jobs_in_location(location)
    ws.append([location, number_of_jobs])

# Save the workbook
wb.save("job_postings.xlsx")

print("Job postings have been written to job_postings.xlsx!")


Job postings have been written to job_postings.xlsx!


Save the workbook into an Excel spreadsheet named 'job-postings.xlsx'.


In [21]:
wb.save("job-postings.xlsx")
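
The steps above (header row, one row per location, save) can also be separated into a helper that builds the rows and a helper that writes them, which makes the row data easy to inspect before any file is created. This is a minimal sketch: the names `build_rows` and `write_rows` are assumptions, and the two counts are copied from the outputs shown earlier in this notebook.

```python
def build_rows(counts, label="Location"):
    """Turn a {name: count} mapping into a header row plus one row per entry."""
    rows = [[label, "Number of Jobs"]]
    rows.extend([name, count] for name, count in counts.items())
    return rows


def write_rows(rows, path):
    """Write the rows to the active sheet of a new workbook and save it."""
    # Imported here so build_rows can be used even without openpyxl installed.
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    for row in rows:
        ws.append(row)
    wb.save(path)


# Counts copied from the outputs shown earlier in this notebook.
rows = build_rows({"Los Angeles": 640, "New York": 3226})
print(rows)

# Uncomment to create the file (requires openpyxl):
# write_rows(rows, "job-postings.xlsx")
```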

#### In a similar way, you can collect the counts for the technologies listed below and store the results in an Excel sheet.


Collect the number of job postings for the following languages using the API:

*   C
*   C#
*   C++
*   Java
*   JavaScript
*   Python
*   Scala
*   Oracle
*   SQL Server
*   MySQL Server
*   PostgreSQL
*   MongoDB


In [22]:
from openpyxl import Workbook

# Define the technologies to search for job postings
technologies_to_search = [
    "C", "C#", "C++", "Java", "JavaScript", "Python", "Scala",
    "Oracle", "SQL Server", "MySQL Server", "PostgreSQL", "MongoDB"
]

# Reuse get_number_of_jobs_T() defined earlier in this notebook;
# it returns a (technology, number_of_jobs) tuple.
# Note: its substring test means short names such as "C" also match
# longer ones such as "C++" or "C#".

# Create a workbook object
wb = Workbook()

# Use the active worksheet
ws = wb.active

# Add header row
ws.append(['Technology', 'Number of Jobs'])

# Iterate through technologies and find job postings
for technology in technologies_to_search:
    _, number_of_jobs = get_number_of_jobs_T(technology)
    ws.append([technology, number_of_jobs])

# Save the workbook with the specified name
wb.save("tech-job-postings.xlsx")

print("Technology job postings have been written to tech-job-postings.xlsx!")



Technology job postings have been written to tech-job-postings.xlsx!
