# **Collecting Job Data Using APIs**


## Objectives


*   Collect job data from the GitHub Jobs API.
*   Store the collected data in an Excel spreadsheet.


## Dataset Used


The dataset used in this lab comes from the following source: https://www.kaggle.com/promptcloud/jobs-on-naukricom under a **Public Domain license**.


The original dataset is a CSV file, which we have converted to JSON.



## Lab: Collect Jobs Data using GitHub Jobs API


### Objective: Determine the number of jobs currently open for various technologies and for various locations


Collect the number of job postings for the following locations using the API:

* Los Angeles
* New York
* San Francisco
* Washington DC
* Seattle
* Austin
* Detroit


In [12]:
#Import required libraries
import pandas as pd
import json


In [46]:
import requests

api_url = "http://127.0.0.1:5000/data"  # Ensure this is the correct URL

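def get_number_of_jobs_T(technology):
    """Return the technology name and the number of matching job postings."""
    number_of_jobs = 0
    # Filter postings on the 'Key Skills' field
    payload = {'Key Skills': technology}

    r = requests.get(api_url, params=payload)

    if r.ok:
        try:
            # Each entry in the JSON response is counted as one posting
            data = r.json()
            number_of_jobs = len(data)
        except ValueError:
            print("Response is not valid JSON.")

    return technology, number_of_jobs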


Call the function for Python to check that it works.


In [48]:
get_number_of_jobs_T("Python")

1173


('Python', 1173)
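
The `params` argument passed to `requests.get` is turned into a URL query string, which matters for skills such as `C#` and `C++` that contain reserved characters. The sketch below uses only the standard library to show what such an encoded URL looks like. `build_query_url` is a hypothetical helper, not part of the lab, and it is an assumption here that `requests` produces an equivalent encoding.

```python
from urllib.parse import urlencode

api_url = "http://127.0.0.1:5000/data"  # same local endpoint as in the lab


def build_query_url(base_url, params):
    """Hypothetical helper: append URL-encoded query parameters to base_url."""
    return f"{base_url}?{urlencode(params)}"


# Spaces become '+', while '#' and '+' are percent-encoded.
for tech in ["Python", "C#", "C++", "SQL Server"]:
    print(build_query_url(api_url, {"Key Skills": tech}))
```

Inspecting the encoded URL this way can help when a query unexpectedly returns zero results.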

In [64]:
def get_number_of_jobs_L(location):
    """Return the location and the number of job postings found there."""
    number_of_jobs = 0
    payload = {'Location': location}
    r = requests.get(api_url, params=payload)
    if r.ok:
        data = r.json()
        number_of_jobs = len(data)

    return location, number_of_jobs

In [66]:
#your code goes here
get_number_of_jobs_L("Los Angeles")



('Los Angeles', 640)
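
The same call can be repeated for every location listed in the objective. The following is a minimal sketch of such a loop. `collect_counts` and `fake_count` are hypothetical names introduced for illustration, and `fake_count` is a stand-in that lets the sketch run without the API server. In the notebook, `get_number_of_jobs_L` would be passed instead.

```python
def collect_counts(names, count_fn):
    """Call count_fn for each name and return a list of (name, count) pairs."""
    results = []
    for name in names:
        label, count = count_fn(name)
        results.append((label, count))
    return results


locations = [
    "Los Angeles", "New York", "San Francisco", "Washington DC",
    "Seattle", "Austin", "Detroit",
]


def fake_count(location):
    """Hypothetical stand-in for get_number_of_jobs_L; returns a dummy count."""
    return location, len(location)


counts = collect_counts(locations, fake_count)
for location, number_of_jobs in counts:
    print(location, number_of_jobs)
```

Passing the counting function as an argument lets the same loop serve both locations and technologies.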

In [70]:
#your code goes here

technologies=['Python','SQL', 'Power BI','Django','JavaScript']
!pip install openpyxl



Collecting openpyxl
  Downloading openpyxl-3.1.3-py2.py3-none-any.whl (251 kB)
     251.3/251.3 kB 21.4 MB/s eta 0:00:00
Collecting et-xmlfile (from openpyxl)
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.1.3


Import the libraries required to create an Excel spreadsheet.


In [None]:
from openpyxl import Workbook


In [None]:
wb= Workbook()
ws=wb.active

In [None]:
technologies=['Python','SQL', 'Power BI','Django','JavaScript']
wb= Workbook()
ws=wb.active
for technology in technologies:
    technology_name, number_of_jobs = get_number_of_jobs_T(technology)
    print(technology_name, number_of_jobs)
    ws.append([technology_name, number_of_jobs])

wb.save("github-job-postings.xlsx")
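# Read the saved workbook back to verify its contents.
# Note: read_excel treats the first row as the header by default,
# so the first technology appears as the column names below;
# pass header=None to keep every row as data.
df = pd.read_excel('github-job-postings.xlsx')

print(df)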

1173
Python 1173
SQL 0
Power BI 0
Django 0
355
JavaScript 355
       Python  1173
0         SQL     0
1    Power BI     0
2      Django     0
3  JavaScript   355
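
In the read-back above, `Python  1173` appears as the column header because the sheet has no header row and `read_excel` uses the first row as column names by default. One way to avoid this, sketched here as an assumption rather than the lab's required solution, is to append an explicit header row before the data. The column names and the in-memory buffer are illustrative choices, and the counts are copied from the output above.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Counts copied from the output above; the header names are illustrative.
rows = [("Python", 1173), ("SQL", 0), ("JavaScript", 355)]

wb = Workbook()
ws = wb.active
ws.append(["Technology", "Number of Jobs"])  # explicit header row
for technology_name, number_of_jobs in rows:
    ws.append([technology_name, number_of_jobs])

# Save to an in-memory buffer so the sketch leaves no file on disk.
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

# Load the workbook again and list every row, header first.
saved_rows = list(load_workbook(buffer).active.iter_rows(values_only=True))
print(saved_rows)
```

With a header row in place, reading the file with pandas should keep every technology as a data row.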


In [83]:
#your code goes here
wb.save("github-job-postings.xlsx")

Collect the number of job postings for the following languages using the API:

*   C
*   C#
*   C++
*   Java
*   JavaScript
*   Python
*   Scala
*   Oracle
*   SQL Server
*   MySQL Server
*   PostgreSQL
*   MongoDB


In [None]:
technologies = [
    "C", 
    "C#", 
    "C++", 
    "Java", 
    "JavaScript", 
    "Python", 
    "Scala", 
    "Oracle", 
    "SQL Server", 
    "MySQL Server", 
    "PostgreSQL", 
    "MongoDB"
]
wb= Workbook()
ws=wb.active
for technology in technologies:
    technology_name, number_of_jobs = get_number_of_jobs_T(technology)
    print(technology_name, number_of_jobs)
    ws.append([technology_name, number_of_jobs])

wb.save("github-job-postings2.xlsx")
# Read back the workbook that was just saved to verify its contents.
df = pd.read_excel('github-job-postings2.xlsx')

print(df)

13498
C 13498
333
C# 333
305
C++ 305
2609
Java 2609
355
JavaScript 355
1173
Python 1173
33
Scala 33
784
Oracle 784
250
SQL Server 250
0
MySQL Server 0
10
PostgreSQL 10
174
MongoDB 174
               C  13498
0             C#    333
1            C++    305
2           Java   2609
3     JavaScript    355
4         Python   1173
5          Scala     33
6         Oracle    784
7     SQL Server    250
8   MySQL Server      0
9     PostgreSQL     10
10       MongoDB    174


In [93]:
!git init

Reinitialized existing Git repository in /resources/DA0321EN/labs/module 1/Accessing Data Using APIs/.git/


## Authors


Ayushi Jain


### Other Contributors


Rav Ahuja

Lakshmi Holla

Malika


Copyright © 2020 IBM Corporation. This notebook and its source code are released under the terms of the [MIT License](https://cognitiveclass.ai/mit-license?utm_medium=Exinfluencer\&utm_source=Exinfluencer\&utm_content=000026UJ\&utm_term=10006555\&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01\&cm_mmc=Email_Newsletter-\_-Developer_Ed%2BTech-\_-WW_WW-\_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264\&cm_mmca1=000026UJ\&cm_mmca2=10006555\&cm_mmca3=M12345678\&cvosrc=email.Newsletter.M12345678\&cvo_campaign=000026UJ).


<!-- ## Change Log

| Date (YYYY-MM-DD) | Version | Changed By        | Change Description                 |
| ----------------- | ------- | ----------------- | ---------------------------------- |
| 2022-01-19        | 0.3     | Lakshmi Holla     | Added changes in the markdown      |
| 2021-06-25        | 0.2     | Malika            | Updated GitHub job json link       |
| 2020-10-17        | 0.1     | Ramesh Sannareddy | Created initial version of the lab |
-->
