# Collecting Jobs Data Using GitHub Jobs API

**Objective: Determine the number of jobs currently open for various technologies and for various locations**

Collect the number of job postings for the following locations using the API:
 - Los Angeles
 - New York
 - San Francisco
 - Washington DC
 - Seattle
 - Austin
 - Detroit
 
Collect the number of job postings for the following technologies (languages and databases) using the API:

*   C
*   C#
*   C++
*   Java
*   JavaScript
*   Python
*   Scala
*   Oracle
*   SQL Server
*   MySQL Server
*   PostgreSQL
*   MongoDB


In [1]:
#Import required libraries
import pandas as pd
import json
import requests

## Function to get the number of jobs for a given technology

In [2]:
# Define the API URL
api_url = "http://127.0.0.1:5000/data"

def get_number_of_jobs_T(technology):
    # Initialize the count of jobs
    number_of_jobs = 0
    
    # Define the payload for the API request
    payload = {'Key Skills': technology}
    
    # Send the GET request to the API
    response = requests.get(api_url, params=payload)
    
    # Check if the response is successful
    if response.status_code == 200:
        # Convert the response to JSON
        data = response.json()
        
        # Extract the number of jobs for the given technology
        number_of_jobs = len(data)
    
    # Return the technology and the number of jobs
    return technology, number_of_jobs


In [3]:
get_number_of_jobs_T("Python")

('Python', 1173)
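
Several of the technology names contain characters that are reserved in URLs (`C#`, `C++`, the space in `SQL Server`), so it can help to see what the query string looks like once the payload is encoded. The sketch below uses the standard library's `urllib.parse.urlencode` for illustration. `requests` form-encodes `params` in an equivalent way, but the exact URL the local API receives was not captured in this notebook, and `build_query` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_query(technology):
    """Return the URL-encoded query string for a 'Key Skills' filter.

    Hypothetical helper for illustration only; it mirrors the payload
    used in get_number_of_jobs_T but does not contact the API.
    """
    return urlencode({"Key Skills": technology})


# Show how names with reserved characters are encoded
for tech in ["Python", "C#", "C++", "SQL Server"]:
    print(tech, "->", build_query(tech))
```

Spaces become `+`, `#` becomes `%23`, and `+` becomes `%2B`, so `C++` and `C#` are sent as distinct values rather than being truncated or misread.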

## Function to find the number of jobs in the US for a location listed above

In [4]:
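def get_number_of_jobs_L(location):
    # Default to 0 so a count is still returned if the request fails
    number_of_jobs = 0

    # Filter the postings by location
    payload = {"Location": location}
    response = requests.get(api_url, params=payload)

    # Count the postings only when the request succeeded
    if response.ok:
        data = response.json()
        number_of_jobs = len(data)

    return location, number_of_jobs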

In [5]:
get_number_of_jobs_L("Los Angeles")

('Los Angeles', 640)
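
The objective lists seven locations, but only Los Angeles is queried above. A minimal sketch of collecting a count for every location follows. `collect_counts` and `fake_count` are hypothetical names introduced here; `fake_count` is a stand-in so the example runs without the API, and its numbers are not real results.

```python
# Locations from the objective at the top of the notebook
locations = ["Los Angeles", "New York", "San Francisco", "Washington DC",
             "Seattle", "Austin", "Detroit"]


def collect_counts(names, count_fn):
    """Call count_fn once per name and return a list of (name, count) rows."""
    rows = []
    for name in names:
        rows.append(tuple(count_fn(name)))
    return rows


def fake_count(location):
    # Stand-in for get_number_of_jobs_L; returns a dummy count, not API data
    return location, len(location)


rows = collect_counts(locations, fake_count)
for row in rows:
    print(row)
```

In the notebook itself, `collect_counts(locations, get_number_of_jobs_L)` would query the API once per location, and each resulting row could be passed to `ws.append` in the same way as the technology rows.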

## **Storing the results in an Excel file**

List of all technologies for which to find the number of job postings

In [6]:
technologies = ["C", "C#", "C++", "Java", "JavaScript", 
                "Python", "Scala", "Oracle", "SQL Server",
               "MySQL Server", "PostgreSQL", "MongoDB"]

Import the library required to create an Excel spreadsheet

In [7]:
from openpyxl import Workbook

Create a workbook and select the active worksheet

In [8]:
wb = Workbook()  # create a new workbook
ws = wb.active   # use the active worksheet
ws.append(['Technology', 'Number of Jobs'])

Find the number of job postings for each technology in the list above. Write the technology name and the number of job postings into the Excel spreadsheet.

In [11]:
for tech in technologies:
    # Query the API once per technology and reuse the result
    result = get_number_of_jobs_T(tech)
    print(result)
    ws.append(result)

('C', 13498)
('C#', 333)
('C++', 305)
('Java', 2609)
('JavaScript', 355)
('Python', 1173)
('Scala', 33)
('Oracle', 784)
('SQL Server', 250)
('MySQL Server', 0)
('PostgreSQL', 10)
('MongoDB', 174)


Save the workbook as an Excel spreadsheet

In [10]:
wb.save(r"C:\Users\Steven\OneDrive\Documents\Excel\Datasets\github-job-postings.xlsx")
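
The absolute Windows path above only works on the original author's machine. One possible alternative, sketched here under the assumption that a folder next to the notebook is acceptable, is to build the path with `pathlib`. The `output` folder name is hypothetical.

```python
from pathlib import Path

# Hypothetical output folder relative to the current working directory
output_dir = Path.cwd() / "output"
output_dir.mkdir(parents=True, exist_ok=True)  # create it if missing

# Same file name as in the notebook, now independent of the user profile
output_path = output_dir / "github-job-postings.xlsx"
print(output_path)

# In the notebook, the workbook could then be saved with:
# wb.save(str(output_path))
```

The same `output_path` can be reused when reading the file back, so the save and load steps cannot drift apart.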

**Display data in a DataFrame**

In [13]:
df = pd.read_excel(r"C:\Users\Steven\OneDrive\Documents\Excel\Datasets\github-job-postings.xlsx")
df

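|   | Technology | Number of Jobs |
|---|------------|----------------|
| 0 | C | 13498 |
| 1 | C# | 333 |
| 2 | C++ | 305 |
| 3 | Java | 2609 |
| 4 | JavaScript | 355 |
| 5 | Python | 1173 |
| 6 | Scala | 33 |
| 7 | Oracle | 784 |
| 8 | SQL Server | 250 |
| 9 | MySQL Server | 0 |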
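
Reading the spreadsheet back is one way to get a DataFrame. When the counts are already in memory, they can also be loaded directly, which avoids depending on the saved file. The sketch below assumes the rows are available as `(technology, count)` tuples; the values are copied from the printed loop output earlier in this notebook, and `df_direct` is a hypothetical name.

```python
import pandas as pd

# (technology, count) rows as printed by the loop earlier in the notebook
rows = [("C", 13498), ("C#", 333), ("C++", 305), ("Java", 2609),
        ("JavaScript", 355), ("Python", 1173), ("Scala", 33),
        ("Oracle", 784), ("SQL Server", 250), ("MySQL Server", 0),
        ("PostgreSQL", 10), ("MongoDB", 174)]

# Build the DataFrame without a round trip through Excel
df_direct = pd.DataFrame(rows, columns=["Technology", "Number of Jobs"])

# Largest counts first, as used for the bar chart
df_direct_sorted = df_direct.sort_values(by="Number of Jobs", ascending=False)
print(df_direct_sorted.head())
```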


In [21]:
import plotly.express as px

df_sorted = df.sort_values(by='Number of Jobs', ascending=False)

fig = px.bar(df_sorted, x='Technology', y='Number of Jobs', text_auto=True,
             title='Job Openings for Different Technologies')
fig.update_traces(textposition='outside')
fig.show()