<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo">
    </a>
</p>


# **Collecting Job Data Using APIs**


Estimated time needed: **30** minutes


## Objectives


After completing this lab, you will be able to:


*   Collect job data using the Jobs API.
*   Store the collected data in an Excel spreadsheet.


><strong>Note: Before starting the assignment, read all the instructions, then move on to the coding part.</strong>


#### Instructions


To run the lab, first click the [Jobs_API](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/Jobs_API.ipynb) notebook link. The file contains the Flask code required to serve the Jobs API data.

To run the code in the file that opens, follow the steps below.

Step 1: Download the file.

Step 2: Upload the Jobs_API notebook into your current Jupyter environment (local or hosted) using the "Upload" button in your Jupyter interface. Ensure that the file is in the same folder as your working .ipynb file.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/Upload.PNG">

Step 3: Open the Jobs_API notebook and run all the cells to start the Flask application. Once the server is running, you can access the API at the URL provided in the notebook.

If you want to learn more about Flask (optional), see [this page](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/FLASK_API.md.html).

Once the Flask code is running, you can start your assignment.


## Dataset Used in this Assignment

The dataset used in this lab comes from the following source: https://www.kaggle.com/promptcloud/jobs-on-naukricom under a **Public Domain license**.

> Note: We are using a modified subset of that dataset for the lab, so to follow the lab instructions successfully please use the dataset provided with the lab, rather than the dataset from the original source.

The original dataset is a CSV file. We converted it to JSON to meet the requirements of the lab.


## Warm-Up Exercise


Before you attempt the actual lab, here is a fully solved warmup exercise that will help you to learn how to access an API.


Using an API, let us find out who is currently on the International Space Station (ISS).<br> The API at [http://api.open-notify.org/astros.json](http://api.open-notify.org/astros.json?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ) returns, in JSON format, the people currently in space and the craft each one is on, the ISS included.<br>
You can read more about this API at [http://open-notify.org/Open-Notify-API/People-In-Space/](http://open-notify.org/Open-Notify-API/People-In-Space?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)


In [1]:
import requests
import pandas as pd

In [2]:
api_url = "http://api.open-notify.org/astros.json" # this URL gives us the astronaut data

In [3]:
response = requests.get(api_url) # Call the API using the get method and store the
                                # output of the API call in a variable called response.

In [4]:
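if response.ok:             # if all is well (the HTTP status code is below 400)
    data = response.json()  # parse the JSON body and store it in a variable called data;
                            # the variable data is of type dictionary.
else:                       # otherwise report the status instead of leaving data undefined silently
    print("Request failed with status code", response.status_code)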

In [5]:
print(data)   # print the data just to check the output or for debugging

{'people': [{'craft': 'ISS', 'name': 'Oleg Kononenko'}, {'craft': 'ISS', 'name': 'Nikolai Chub'}, {'craft': 'ISS', 'name': 'Tracy Caldwell Dyson'}, {'craft': 'ISS', 'name': 'Matthew Dominick'}, {'craft': 'ISS', 'name': 'Michael Barratt'}, {'craft': 'ISS', 'name': 'Jeanette Epps'}, {'craft': 'ISS', 'name': 'Alexander Grebenkin'}, {'craft': 'ISS', 'name': 'Butch Wilmore'}, {'craft': 'ISS', 'name': 'Sunita Williams'}, {'craft': 'Tiangong', 'name': 'Li Guangsu'}, {'craft': 'Tiangong', 'name': 'Li Cong'}, {'craft': 'Tiangong', 'name': 'Ye Guangfu'}], 'number': 12, 'message': 'success'}


Print the number of people currently in space (the `number` field counts every craft in the response, not only the ISS).


In [6]:
print(data.get('number'))

12


Print the names of everyone currently in space.


In [7]:
astronauts = data.get('people')
print("There are {} astronauts in space".format(len(astronauts)))
print("And their names are :")
for astronaut in astronauts:
    print(astronaut.get('name'))

There are 12 astronauts in space
And their names are :
Oleg Kononenko
Nikolai Chub
Tracy Caldwell Dyson
Matthew Dominick
Michael Barratt
Jeanette Epps
Alexander Grebenkin
Butch Wilmore
Sunita Williams
Li Guangsu
Li Cong
Ye Guangfu
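
The response printed above carries a `craft` field for each person, and not every entry is on the ISS. A minimal sketch of counting people per craft and keeping only the ISS crew, assuming the response keeps the shape shown earlier; the `data` dictionary below is a hard-coded copy of that printed output, so the cell runs without a network call:

```python
from collections import Counter

# Hard-coded copy of the response printed earlier (not a live API call).
data = {
    "people": [
        {"craft": "ISS", "name": "Oleg Kononenko"},
        {"craft": "ISS", "name": "Nikolai Chub"},
        {"craft": "ISS", "name": "Tracy Caldwell Dyson"},
        {"craft": "ISS", "name": "Matthew Dominick"},
        {"craft": "ISS", "name": "Michael Barratt"},
        {"craft": "ISS", "name": "Jeanette Epps"},
        {"craft": "ISS", "name": "Alexander Grebenkin"},
        {"craft": "ISS", "name": "Butch Wilmore"},
        {"craft": "ISS", "name": "Sunita Williams"},
        {"craft": "Tiangong", "name": "Li Guangsu"},
        {"craft": "Tiangong", "name": "Li Cong"},
        {"craft": "Tiangong", "name": "Ye Guangfu"},
    ],
    "number": 12,
    "message": "success",
}

people = data.get("people", [])

# Count how many people are aboard each craft.
per_craft = Counter(person.get("craft") for person in people)

# Keep only the names of the people whose craft is the ISS.
iss_names = [person.get("name") for person in people if person.get("craft") == "ISS"]

print(per_craft)
print(len(iss_names), "people on the ISS:", iss_names)
```

With this copy of the data, the count is 9 for the ISS and 3 for Tiangong, which together make up the 12 reported by `number`.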


Hope the warmup was helpful. Good luck with your next lab!


<h1>Lab: Collect Jobs Data using Jobs API</h1>


<h2> 🎯 Objective: Determine the number of jobs currently open for various technologies  and for various locations</h2>


<h3>🛠️Import all required Libraries</h3>

In [1]:
pip install openpyxl

Note: you may need to restart the kernel to use updated packages.


In [51]:
import pandas as pd
import json
import requests
import re  # to avoid errors with special characters in names such as C++ and C#
from openpyxl import Workbook  # to create Excel files

#### Write a function to get the number of jobs for the Python technology.<br>
> Note: While using the lab, you need to pass the **payload** information to the **params** attribute as **key**-**value** pairs.
  Refer to the ungraded **REST API lab** in the course **Python for Data Science, AI & Development**: <a href="https://www.coursera.org/learn/python-for-applied-data-science-ai/ungradedLti/P6sW8/hands-on-lab-access-rest-apis-request-http?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork928-2022-01-01">link</a>
  
##### The keys in the JSON are

*   Job Title
*   Job Experience Required
*   Key Skills
*   Role Category
*   Location
*   Functional Area
*   Industry
*   Role
 
You can also view the contents of the JSON file at the following <a href = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json">json</a> URL.
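
The note above asks for the payload as key-value pairs. As a rough sketch of what that means on the wire, the snippet below turns such a dictionary into a query string with the standard library. The base URL and the `/data` path are assumptions made for illustration; use the URL printed by your running Jobs_API notebook instead.

```python
from urllib.parse import urlencode

# Hypothetical base URL (an assumption): replace it with the URL
# shown by the Jobs_API notebook once the Flask server is running.
base_url = "http://127.0.0.1:5000/data"

# The payload is a plain dictionary: one of the JSON keys and the value to look for.
payload = {"Key Skills": "C++"}

# urlencode escapes spaces and special characters so the value survives the URL.
query = urlencode(payload)
full_url = f"{base_url}?{query}"

print(query)
print(full_url)
```

Passing the same dictionary as `params=payload` to `requests.get` lets `requests` build a comparable query string for you, so special characters such as `+` and `#` do not need to be escaped by hand.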



<h3>📥 Get the json file from the url and store data as a df</h3>

In [52]:
api_url="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/Accessing%20Data%20Using%20APIs/jobs.json"

In [53]:
#get the json file from the url and check the status code (expected 200)
response = requests.get(api_url)
response.status_code

200

In [54]:
#read data and convert into a df
data = response.json()
df = pd.DataFrame(data)

In [55]:
#check the head of df
df.head()

Unnamed: 0,Id,Job Title,Job Experience Required,Key Skills,Role Category,Location,Functional Area,Industry,Role
0,0,Digital Media Planner,5 - 10 yrs,Media Planning| Digital Media,Advertising,Los Angeles,"Marketing , Advertising , MR , PR , Media Plan...","Advertising, PR, MR, Event Management",Media Planning Executive/Manager
1,1,Online Bidding Executive,2 - 5 yrs,pre sales| closing| software knowledge| client...,Retail Sales,New York,"Sales , Retail , Business Development","IT-Software, Software Services",Sales Executive/Officer
2,2,Trainee Research/ Research Executive- Hi- Tech...,0 - 1 yrs,Computer science| Fabrication| Quality check| ...,R&D,San Francisco,"Engineering Design , R&D","Recruitment, Staffing",R&D Executive
3,3,Technical Support,0 - 5 yrs,Technical Support,Admin/Maintenance/Security/Datawarehousing,Washington DC,"IT Software - Application Programming , Mainte...","IT-Software, Software Services",Technical Support Engineer
4,4,Software Test Engineer -hyderabad,2 - 5 yrs,manual testing| test engineering| test cases| ...,Programming & Design,Boston,IT Software - QA & Testing,"IT-Software, Software Services",Testing Engineer
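
`pd.DataFrame(data)` produces one row per job here, which suggests the JSON is a list of records (one dictionary per job). A small sketch of that conversion, assuming this shape; the two records are abbreviated from the rows shown above and keep only three of the keys:

```python
import pandas as pd

# Two abbreviated records in the assumed "list of dictionaries" shape.
records = [
    {"Id": 0, "Job Title": "Digital Media Planner", "Location": "Los Angeles"},
    {"Id": 1, "Job Title": "Online Bidding Executive", "Location": "New York"},
]

# Each dictionary becomes a row and each key becomes a column.
demo_df = pd.DataFrame(records)

print(demo_df.shape)
print(list(demo_df.columns))
```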


<h3>🧪Create functions to fetch data and test them</h3>

➡️Write a function to find the number of jobs in the US for a technology of your choice

In [56]:
def get_number_of_jobs_T(technology, df):
    # Escape regex special characters in the technology name (e.g. C++, C#)
    escaped_tech = re.escape(technology)

    # Count the rows whose 'Key Skills' value contains the technology name
    number_of_jobs = df['Key Skills'].str.contains(escaped_tech, na=False, regex=True).sum()

    # Return the technology together with its job count
    return technology, number_of_jobs

>✅Call the function for the 'Python' technology and check that it works


In [57]:
technology, number_of_jobs = get_number_of_jobs_T("Python", df)
print(f"The number of jobs requiring {technology} is : {number_of_jobs} jobs")

The number of jobs requiring Python is : 1173 jobs
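
Because `str.contains` matches substrings, a short name such as `C` also matches any skill that merely contains that letter, and `Java` also matches `JavaScript`, so those counts are likely inflated. The sketch below shows one possible alternative: split each `Key Skills` value on `|` and compare whole skill names, ignoring case. The helper name `count_exact_skill` and the sample rows are made up for illustration and are not part of the lab.

```python
import re
import pandas as pd

# Made-up sample rows that mimic the pipe-separated 'Key Skills' column.
sample = pd.DataFrame({
    "Key Skills": [
        "C| Woocommerce| MySQL| PHP",
        "Python| Cloud computing",
        "Java| Spring",
        "JavaScript| css",
        None,
    ]
})

def count_exact_skill(technology, df):
    """Count rows where one pipe-separated skill equals the technology (case-insensitive)."""
    wanted = technology.strip().lower()

    def has_skill(cell):
        # Missing values are not strings and never match.
        if not isinstance(cell, str):
            return False
        skills = {skill.strip().lower() for skill in cell.split("|")}
        return wanted in skills

    return int(df["Key Skills"].apply(has_skill).sum())

# Substring matching, as in the lab function, for comparison.
substring_c = int(sample["Key Skills"].str.contains(re.escape("C"), na=False).sum())
substring_java = int(sample["Key Skills"].str.contains(re.escape("Java"), na=False).sum())

print("C:", substring_c, "substring matches,", count_exact_skill("C", sample), "exact")
print("Java:", substring_java, "substring matches,", count_exact_skill("Java", sample), "exact")
```

On this sample, substring matching counts "Cloud computing" as a `C` job and "JavaScript" as a `Java` job, while the whole-name comparison does not. Which behavior is wanted depends on how the grader expects the counts to be computed.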


➡️Write a function to find the number of jobs in the US for a location of your choice


In [58]:
def get_number_of_jobs_L(location, df):
    # Count the rows whose 'Location' value contains the location name
    number_of_jobs = df['Location'].str.contains(location, na=False).sum()
    # Return the location together with its job count
    return location, number_of_jobs

>✅Call the function for the 'Los Angeles' location and check that it works


In [59]:
location, number_of_jobs = get_number_of_jobs_L('Los Angeles', df)
print(f"The number of jobs in {location} is : {number_of_jobs} jobs")

The number of jobs in Los Angeles is : 640 jobs


<h3>🔍Collect data from Key Skills / Technologies and store it in an xlsx worksheet</h3>

➡️Create a Python list of all technologies for which we need to find the number of job postings


*In a similar way, you can try the technologies given below and store the results in an Excel sheet.*

*Collect the number of job postings for the following languages using the API:*

*   C
*   C#
*   C++
*   Java
*   JavaScript
*   Python
*   Scala
*   Oracle
*   SQL Server
*   MySQL Server
*   PostgreSQL
*   MongoDB


In [60]:
# Check whether 'Key Skills' contains variations of "MySQL Server", because the loop surprisingly returns no data for "MySQL Server"
df[df['Key Skills'].str.contains('MySQL', na=False, regex=False)]
# The technologies list is updated to replace "MySQL Server" with "MySQL"

Unnamed: 0,Id,Job Title,Job Experience Required,Key Skills,Role Category,Location,Functional Area,Industry,Role
52,60,Oracle DBA Consultant - Data Guard/rac Modules,2 - 5 yrs,WebLogic| configuration| MySQL| installation| ...,Admin/Maintenance/Security/Datawarehousing,Washington DC,"IT Software - DBA , Datawarehousing","IT-Software, Software Services",DBA
76,87,PHP Developer,1 - 5 yrs,XML| Javascript| PHP| development| css| techni...,Programming & Design,Washington DC,"IT Software - Application Programming , Mainte...","IT-Software, Software Services",Software Developer
81,92,PHP Developer (wordpress & Woo Commerce/shopif...,2 - 5 yrs,C| Woocommerce| Magento| Wordpress| MySQL| PHP...,Programming & Design,Los Angeles,"IT Software - Application Programming , Mainte...","IT-Software, Software Services",Software Developer
95,107,PHP Developer Jobs In Florida - Team Lead - PHP,8 - 13 yrs,Drupal| Application programming| MySQL| Wordpr...,Programming & Design,Los Angeles,"IT Software - Application Programming , Mainte...","Recruitment, Staffing",Software Developer
156,174,Magento Developer,2 - 7 yrs,Unix| Version control| Prototype| XML| MySQL| ...,Programming & Design,Los Angeles,IT Software - System Programming,"BPO, Call Centre, ITeS",Software Developer
...,...,...,...,...,...,...,...,...,...
26842,29821,PHP Developer,5 - 10 yrs,PHP| Javascript| MySQL| Ajax| jQuery| Joomla| ...,Programming & Design,Detroit,"IT Software - Application Programming , Mainte...","Recruitment, Staffing",Software Developer
26867,29848,American Technology Consulting - Node js Engineer,2 - 5 yrs,Linux| MySQL| SQL| Cloud computing| Backend| N...,Programming & Design,Detroit,"IT Software - Application Programming , Mainte...","IT-Software, Software Services",Software Developer
26927,29917,Senior Developer (PHP),6 - 11 yrs,Linux| MySQL| PHP| jQuery| Javascript| develop...,Programming & Design,Detroit,"IT Software - Application Programming , Mainte...","IT-Software, Software Services",Software Developer
26990,29984,SoC Verification Engineers,2 - 7 yrs,Technical product configuration| design| integ...,Programming & Design,Detroit,"IT Software - Embedded , EDA , VLSI , ASIC , C...","IT-Software, Software Services",Team Lead/Technical Lead


In [61]:
technologies = ['C','C#','C++','Python', 'Java', 'Scala', 'JavaScript','Oracle','MySQL','SQL Server','PostgreSQL','MongoDB']

➡️Store the results in an Excel spreadsheet

*Call the API for all the technologies listed above and write the results to an Excel spreadsheet.*

*Find the number of job postings for each technology in the list above.*
*Write the technology name and the number of job postings into the Excel spreadsheet.*

In [62]:
wb=Workbook()
ws_tech=wb.active
ws_tech.title = 'Technologies'
#add a row with two columns Technology and Number of Jobs
ws_tech.append(['Technology', 'Number of Jobs'])

In [63]:
# Loop to get the number of jobs for each technology in the list
for technology in technologies:
    language, number_of_jobs = get_number_of_jobs_T(technology, df)
    # print to check the loop
    print(f"Added: {language}, {number_of_jobs}")
    # add each technology / number of jobs pair as a row in the worksheet
    ws_tech.append([language, number_of_jobs])
print("Technologies worksheet added to the workbook.")

Added: C, 13498
Added: C#, 333
Added: C++, 305
Added: Python, 1173
Added: Java, 2609
Added: Scala, 33
Added: JavaScript, 355
Added: Oracle, 784
Added: MySQL, 751
Added: SQL Server, 250
Added: PostgreSQL, 10
Added: MongoDB, 174
Technologies worksheet added to the workbook.


<h3>🔍Collect data from Location and store it in an xlsx worksheet</h3>

➡️Create a Python list of all locations for which you need to find the number of job postings

*Collect the number of job postings for the following locations using the API:*

* Los Angeles
* New York
* San Francisco
* Washington DC
* Seattle
* Austin
* Detroit


In [64]:
locations = ['Los Angeles', 'New York', 'San Francisco','Washington DC','Seattle','Austin','Detroit']

➡️Store the results in an Excel spreadsheet

In [65]:
# Create a new worksheet for the locations
ws_loc = wb.create_sheet(title='Locations')
#add a row with two columns Location and Number of Jobs
ws_loc.append(['Location','Number of Jobs'])

In [66]:
# Loop to get the number of jobs for each location in the list
for location in locations:
    city, number_of_jobs = get_number_of_jobs_L(location, df)
    # print to check the loop
    print(f"Added: {city}, {number_of_jobs}")
    # add each location / number of jobs pair as a row in the worksheet
    ws_loc.append([city, number_of_jobs])
print("Locations worksheet added to the workbook.")

Added: Los Angeles, 640
Added: New York, 3226
Added: San Francisco, 435
Added: Washington DC, 5316
Added: Seattle, 3375
Added: Austin, 434
Added: Detroit, 3945
Locations worksheet added to the workbook.
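
As a cross-check on the loop above, pandas can count every location in one pass with `value_counts`. This is only a sketch on made-up sample rows. Note that `value_counts` compares whole cell values, whereas the lab function uses substring matching, so the two approaches can differ when a cell holds more than the city name.

```python
import pandas as pd

# Made-up sample rows standing in for the 'Location' column.
sample = pd.DataFrame(
    {"Location": ["Los Angeles", "New York", "New York", "Seattle", None]}
)

wanted = ["Los Angeles", "New York", "Austin"]

# value_counts ignores missing values; reindex keeps only the wanted cities
# and fills in 0 for a city that never appears.
counts = sample["Location"].value_counts().reindex(wanted, fill_value=0)

for city, number_of_jobs in counts.items():
    print(f"{city}: {number_of_jobs}")
```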


<h3>🚀📁Save both worksheets into an excel spreadsheet named job-postings.xlsx</h3>


In [67]:
# Save the Excel file
wb.save('job-postings.xlsx')
print("Excel sheet file created. ")

Excel sheet file created. 
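
To confirm that a saved workbook really contains both worksheets, it can be read back with `load_workbook`. The sketch below builds a tiny two-sheet workbook in a temporary folder rather than touching `job-postings.xlsx`; the demo file name is an assumption, and each sheet holds only the header and one row taken from the results above.

```python
import os
import tempfile
from openpyxl import Workbook, load_workbook

# Build a small workbook with the same two-sheet layout as the lab.
wb_demo = Workbook()
ws_demo_tech = wb_demo.active
ws_demo_tech.title = "Technologies"
ws_demo_tech.append(["Technology", "Number of Jobs"])
ws_demo_tech.append(["Python", 1173])

ws_demo_loc = wb_demo.create_sheet(title="Locations")
ws_demo_loc.append(["Location", "Number of Jobs"])
ws_demo_loc.append(["Los Angeles", 640])

# Save to a temporary folder (demo file name, not the lab's output file).
path = os.path.join(tempfile.mkdtemp(), "job-postings-demo.xlsx")
wb_demo.save(path)

# Read the file back and inspect the sheet names and rows.
loaded = load_workbook(path)
sheet_names = loaded.sheetnames
tech_rows = list(loaded["Technologies"].iter_rows(values_only=True))
loc_rows = list(loaded["Locations"].iter_rows(values_only=True))

print(sheet_names)
print(tech_rows)
print(loc_rows)
```

The same read-back steps should work on `job-postings.xlsx` by passing that file name to `load_workbook`.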


## Authors


Ayushi Jain


### Other Contributors


Rav Ahuja

Lakshmi Holla

Malika


Copyright © IBM Corporation.


<!--
## Change Log

| Date (YYYY-MM-DD) | Version | Changed By        | Change Description                 |
| ----------------- | ------- | ----------------- | ---------------------------------- |
| 2022-01-19        | 0.3     | Lakshmi Holla     | Added changes in the markdown      |
| 2021-06-25        | 0.2     | Malika            | Updated GitHub job json link       |
| 2020-10-17        | 0.1     | Ramesh Sannareddy | Created initial version of the lab |
-->
