# **Collecting Job Data Using APIs**


## Objectives


*   Collect job data from GitHub Jobs API
*   Store the collected data in an Excel spreadsheet.


## Dataset Used in this Assignment

The dataset used comes from the following source: https://www.kaggle.com/promptcloud/jobs-on-naukricom .

> Note: We are using a modified subset of that dataset, rather than the dataset from the original source.

## Collect Jobs Data using GitHub Jobs API


### Objective: Determine the number of jobs currently open for various technologies and for various locations


Collect the number of job postings for the following locations using the API:

* Los Angeles
* New York
* San Francisco
* Washington DC
* Seattle
* Austin
* Detroit


In [None]:
#Import required libraries
import pandas as pd
import json
import requests  # used below to call the jobs API

#### Here is a function to get the number of jobs for a given technology, such as Python.

##### The keys in the JSON are
 * Job Title
 
 * Job Experience Required
 
 * Key Skills
 
 * Role Category
 
 * Location
 
 * Functional Area
 
 * Industry
 
 * Role 
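To make the record layout concrete, here is a minimal sketch of what one element of the JSON array might look like and how a field can be read from it. Only the key names come from the list above; every value is invented for illustration, and the `Salary` lookup is a hypothetical example of a key that is not present.

```python
# Hypothetical record: key names match the list above, values are made up.
sample_job = {
    "Job Title": "Data Engineer",
    "Job Experience Required": "2 - 5 yrs",
    "Key Skills": "Python| SQL| Spark",
    "Role Category": "Programming & Design",
    "Location": "Seattle",
    "Functional Area": "IT Software",
    "Industry": "IT-Software",
    "Role": "Software Developer",
}

# dict.get returns None instead of raising KeyError when a key is missing,
# so fall back to an empty string before searching the text.
skills = sample_job.get("Key Skills") or ""
has_python = "Python" in skills

# A key that is not in the record simply yields None.
missing = sample_job.get("Salary")

print(has_python, missing)
```

Reading fields with `get` and a fallback keeps a counting loop from failing on records where a field is absent or null.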

In [14]:
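api_url="http://127.0.0.1:5000/data/all"
def get_number_of_jobs_T(technology):
    # Fetch all job records from the API
    response_api = requests.get(api_url)

    number_of_jobs = 0
    jobs = []  # stays empty if the request fails

    if response_api.ok:
        jobs = response_api.json()

    for job in jobs:
        # 'Key Skills' may be missing or null, so fall back to an empty string
        key = job.get('Key Skills') or ''

        # Substring match: count the job if the technology appears in its skills
        if key.find(technology) > -1:
            number_of_jobs = number_of_jobs + 1

    return technology,number_of_jobs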

Calling the function for Python and checking if it works.


In [15]:
get_number_of_jobs_T("Python")

('Python', 1173)
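Note that a substring search counts every skills string that contains the search term, so a short name such as `C` would also match `C++` or `CSS`. One possible refinement, sketched below, compares whole skill names instead. It assumes that `Key Skills` is a single string with skills separated by `|`, which should be checked against the actual API response. The helper name `has_skill` and the sample strings are hypothetical.

```python
def has_skill(key_skills, technology, sep="|"):
    """Return True if `technology` appears as a whole skill name.

    Assumes `key_skills` is one string with skills separated by `sep`.
    The comparison is case-insensitive and ignores surrounding whitespace.
    """
    if not key_skills:
        return False
    wanted = technology.strip().lower()
    return any(part.strip().lower() == wanted for part in key_skills.split(sep))


# Invented examples, not taken from the dataset.
print(has_skill("C++| Linux| Git", "C"))    # whole-name match: False
print(has_skill("C| Embedded| RTOS", "C"))  # whole-name match: True
print("C++| Linux| Git".find("C") > -1)     # substring match: True
```

Whether exact matching is preferable depends on how the skills were entered. A count based on substrings will generally be higher than one based on whole names.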

#### Here is a function to find the number of jobs for a given US location. It is called below for Los Angeles to check that it works.


In [None]:
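def get_number_of_jobs_L(location):
    # Fetch all job records from the API
    response_api = requests.get(api_url)

    number_of_jobs = 0
    jobs = []  # stays empty if the request fails

    if response_api.ok:
        jobs = response_api.json()

    for job in jobs:
        # 'Location' may be missing or null, so fall back to an empty string
        loc = job.get('Location') or ''

        # Substring match: count the job if the location name appears
        if loc.find(location) > -1:
            number_of_jobs = number_of_jobs + 1

    return location,number_of_jobs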

In [17]:
get_number_of_jobs_L("Los Angeles")

('Los Angeles', 640)
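Each call to a function like the one above downloads the full dataset again. When several locations are needed, one alternative is to load the records once and count all locations in a single pass. The sketch below shows only the counting step on an in-memory list. The function name `count_jobs_by_location` and the sample records are hypothetical stand-ins for the parsed API response.

```python
def count_jobs_by_location(jobs, locations):
    """Count records whose 'Location' field contains each location name.

    `jobs` is a list of dicts that has already been loaded from the API.
    """
    counts = {location: 0 for location in locations}
    for job in jobs:
        loc = job.get("Location") or ""  # tolerate missing or null fields
        for location in locations:
            if location in loc:
                counts[location] += 1
    return counts


# Invented records standing in for response_api.json().
sample_jobs = [
    {"Location": "Los Angeles"},
    {"Location": "Seattle"},
    {"Location": "Los Angeles"},
    {"Location": None},
    {},
]
result = count_jobs_by_location(sample_jobs, ["Los Angeles", "Seattle", "Austin"])
print(result)  # {'Los Angeles': 2, 'Seattle': 1, 'Austin': 0}
```

The returned dictionary can be turned into rows with `result.items()`, which yields `(location, count)` pairs in the same order as the input list.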

### Store the results in an Excel file


Collect the number of job postings for the specified locations and store the results in an Excel spreadsheet.

In [18]:
locations = ['Los Angeles','New York','San Francisco','Washington DC','Seattle','Austin','Detroit']

Import the libraries required to create an Excel spreadsheet.


In [20]:
!pip install openpyxl
from openpyxl import Workbook

Collecting openpyxl
  Downloading openpyxl-3.1.3-py2.py3-none-any.whl (251 kB)
Collecting et-xmlfile (from openpyxl)
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.1.3


Create a workbook and select the active worksheet


In [21]:
wb1 = Workbook()
ws1 = wb1.active

Find the number of job postings for each location in the list above.
Write the location name and the number of job postings into the Excel spreadsheet.


In [22]:
ws1.append(['Location','Number of Jobs'])

for location in locations:
    ws1.append(get_number_of_jobs_L(location))

Save the workbook as an Excel spreadsheet named '01-API_Job_Postings.xlsx'.


In [24]:
wb1.save('01-API_Job_Postings.xlsx')
wb1.close()

Collect the number of job postings for various programming languages and store the results in an Excel spreadsheet.

In [None]:
wb2 = Workbook()
ws2 = wb2.active

languages = ['C','C#','C++','Java','JavaScript','Python','Scala','Oracle','SQL Server','MySQL Server','PostgreSQL','MongoDB']

ws2.append(['Languages','Number of Jobs'])

for language in languages:
    ws2.append(get_number_of_jobs_T(language))

wb2.save('01-API_Languages.xlsx')
wb2.close()

<!--| Date (YYYY-MM-DD) | Version | Changed By        | Change Description                 |
| ----------------- | ------- | ----------------- | ---------------------------------- | 
| 2022-01-19        | 0.3     | Lakshmi Holla        | Added changes in the markdown      |
| 2021-06-25        | 0.2     | Malika            | Updated GitHub job json link       |
| 2020-10-17        | 0.1     | Ramesh Sannareddy | Created initial version of the lab |-->
