# [GitHub Jobs API](https://jobs.github.com/api)

- The GitHub Jobs API lets you search for and view jobs as JSON over HTTP.
- The API paginates its results. `/positions.json`, for example, returns at most 50 positions at a time.

- You can page through the results by adding a `page` parameter to your queries.

- Pagination starts by default at 0.
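
As a rough illustration of how the `page` parameter ends up in a request, the sketch below builds the query string by hand with the standard library. The helper name `page_url` is hypothetical and only exists for this example; the endpoint is the one named above.

```python
from urllib.parse import urlencode

BASE_URL = "https://jobs.github.com/positions.json"

def page_url(page, base_url=BASE_URL):
    """Return the URL for one page of results (page_url is a hypothetical helper)."""
    return f"{base_url}?{urlencode({'page': page})}"

print(page_url(0))  # https://jobs.github.com/positions.json?page=0
print(page_url(1))  # https://jobs.github.com/positions.json?page=1
```

Passing `params={'page': 1}` to `requests.get` produces the same query string, so the cells below never need to build the URL manually.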

In [1]:
import requests
import pandas as pd
from w3lib.html import remove_tags

In [2]:
base_url = 'https://jobs.github.com/positions.json'
response = requests.get(base_url)
data = response.json()
len(data)  # the first 50 jobs are now stored in data

50

In [3]:
# request a specific page of results by passing the page parameter
response = requests.get(base_url, params={'page': 2})
data2 = response.json()
len(data2)  # the jobs from the requested page are stored in data2

50

In [4]:
# Some APIs report the total number of pages, but this one does not,
# so we have to find the last page that contains job data ourselves.
# The API does not return an error for page numbers past the last page;
# it simply returns an empty JSON array.
# So when the parsed response has length 0, we have gone one page too far.
page_num = 1
while True:
    response = requests.get(base_url, params={'page': page_num})
    if len(response.json()) == 0:
        last_page_num = page_num-1
        print(f"Last page with jobs data is: {last_page_num}")
        break
    page_num += 1

Last page with jobs data is: 3
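
The same stop-on-empty-page idea can be wrapped in a generator, which also lets you keep each page as it arrives instead of requesting it a second time later. This is only a sketch: `iter_pages`, `fake_fetch`, and the `max_pages` safety cap are hypothetical names introduced here, and the fake data stands in for the API so the example runs offline.

```python
def iter_pages(fetch_page, start=1, max_pages=1000):
    """Yield (page_num, items) until fetch_page returns an empty list.

    max_pages is a safety cap (an assumption of this sketch) so that a
    misbehaving endpoint cannot keep the loop running forever.
    """
    for page_num in range(start, start + max_pages):
        items = fetch_page(page_num)
        if not items:
            return
        yield page_num, items

# stand-in for the API: three pages holding 50, 50 and 33 fake jobs
fake_pages = {
    1: [{"id": i} for i in range(50)],
    2: [{"id": i} for i in range(50, 100)],
    3: [{"id": i} for i in range(100, 133)],
}

def fake_fetch(page_num):
    return fake_pages.get(page_num, [])

jobs = []
last_page_num = 0
for page_num, items in iter_pages(fake_fetch):
    jobs.extend(items)
    last_page_num = page_num

print(last_page_num, len(jobs))  # 3 133
```

Against the real endpoint, `fetch_page` could be something like `lambda n: requests.get(base_url, params={'page': n}).json()`.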


In [5]:
columns = ['id',
           'type',
           'url',
           'created_at',
           'company',
           'company_url',
           'location',
           'title',
           'description',
           'how_to_apply',
           'company_logo']

# collect the jobs from every page, then build the DataFrame once
jobs = []
for page_num in range(1, last_page_num+1):
    response = requests.get(base_url, params={'page': page_num})
    jobs.extend(response.json())  # each page is a list of job dicts

df = pd.DataFrame(jobs, columns=columns)

In [6]:
print(f"Total number of jobs: {len(df)}")

Total number of jobs: 133
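
If listings are added or removed while the pages are being fetched, a job could in principle show up on two pages. That is an assumption about how the API behaves, not something observed in the output above. The sketch below shows one way to guard against it on plain lists of dicts, keeping the first entry seen for each `id`; `dedupe_by_id` is a hypothetical helper.

```python
def dedupe_by_id(jobs):
    """Return jobs with duplicate ids removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for job in jobs:
        if job["id"] not in seen:
            seen.add(job["id"])
            unique.append(job)
    return unique

sample = [{"id": "a", "title": "Engineer"},
          {"id": "b", "title": "Architect"},
          {"id": "a", "title": "Engineer"}]  # same job seen on two pages

print(len(dedupe_by_id(sample)))  # 2
```

On the DataFrame itself, `df.drop_duplicates(subset='id')` has the same effect.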


In [7]:
df.head()

Unnamed: 0,id,type,url,created_at,company,company_url,location,title,description,how_to_apply,company_logo
0,2940d1d6-767a-44dd-814a-f8e923bac67c,Full Time,https://jobs.github.com/positions/2940d1d6-767...,Tue Jan 05 18:14:57 UTC 2021,Rigado,https://www.rigado.com/,"Portland, OR",Senior Software Engineer,<h3>What we do in engineering at Rigado:</h3>\...,"<p>Apply through our careers page at <a href=""...",https://jobs.github.com/rails/active_storage/b...
1,e5948998-64f6-4186-9bdb-bf71aeb5fc01,Full Time,https://jobs.github.com/positions/e5948998-64f...,Tue Jan 05 15:14:49 UTC 2021,ALDI Einkauf GmbH & Co. oHG,https://vonq.io/2Xe5qvf,Essen,Solution Architect (m/w/d),<h2>Das sind deine Aufgaben</h2>\n<ul>\n<li>Di...,"<p><a href=""https://vonq.io/2Xe5qvf"">Klicken S...",https://jobs.github.com/rails/active_storage/b...
2,6bb39f1f-18a1-45ec-b985-696e2f5a698b,Full Time,https://jobs.github.com/positions/6bb39f1f-18a...,Tue Jan 05 15:03:07 UTC 2021,advalyze GmbH,https://www.advalyze.com/de/,Berlin,Marketing Data Engineer (m/w/d),<p>advalyze steht für analyze | advice | adver...,"<p><a href=""https://advalyze.join.com/jobs/174...",https://jobs.github.com/rails/active_storage/b...
3,912fc53b-1b7f-427a-bc69-25f03b597f8c,Full Time,https://jobs.github.com/positions/912fc53b-1b7...,Fri Oct 30 15:33:27 UTC 2020,Defendify,https://www.defendify.io/,"Portland, ME (Remote OK)",Senior Full Stack Developer,<p>Thanks for your interest in working with us...,"<p>Apply now at <a href=""https://defendify.bre...",https://jobs.github.com/rails/active_storage/b...
4,b003b45c-6abb-4206-8a5c-171c6efb7d00,Full Time,https://jobs.github.com/positions/b003b45c-6ab...,Tue Jan 05 12:09:20 UTC 2021,dirico,https://dirico.io/,Koblenz,Fullstack Developer (m/w/d),"<p>Wir sind die 247GRAD Labs GmbH, das IT-Unte...","<p><a href=""https://t.gohiring.com/h/c5a40ff41...",https://jobs.github.com/rails/active_storage/b...


In [8]:
# the description and how_to_apply columns still contain HTML tags
# remove_tags from w3lib.html strips them and leaves only the text
df['description'] = df['description'].apply(remove_tags)
df['how_to_apply'] = df['how_to_apply'].apply(remove_tags)
df.head()

Unnamed: 0,id,type,url,created_at,company,company_url,location,title,description,how_to_apply,company_logo
0,2940d1d6-767a-44dd-814a-f8e923bac67c,Full Time,https://jobs.github.com/positions/2940d1d6-767...,Tue Jan 05 18:14:57 UTC 2021,Rigado,https://www.rigado.com/,"Portland, OR",Senior Software Engineer,What we do in engineering at Rigado:\nWe desig...,Apply through our careers page at https://www....,https://jobs.github.com/rails/active_storage/b...
1,e5948998-64f6-4186-9bdb-bf71aeb5fc01,Full Time,https://jobs.github.com/positions/e5948998-64f...,Tue Jan 05 15:14:49 UTC 2021,ALDI Einkauf GmbH & Co. oHG,https://vonq.io/2Xe5qvf,Essen,Solution Architect (m/w/d),Das sind deine Aufgaben\n\nDigitale Lösungen a...,Klicken Sie hier um zum Bewerbungsformular zu ...,https://jobs.github.com/rails/active_storage/b...
2,6bb39f1f-18a1-45ec-b985-696e2f5a698b,Full Time,https://jobs.github.com/positions/6bb39f1f-18a...,Tue Jan 05 15:03:07 UTC 2021,advalyze GmbH,https://www.advalyze.com/de/,Berlin,Marketing Data Engineer (m/w/d),advalyze steht für analyze | advice | advertis...,https://advalyze.join.com/jobs/1749662-marketi...,https://jobs.github.com/rails/active_storage/b...
3,912fc53b-1b7f-427a-bc69-25f03b597f8c,Full Time,https://jobs.github.com/positions/912fc53b-1b7...,Fri Oct 30 15:33:27 UTC 2020,Defendify,https://www.defendify.io/,"Portland, ME (Remote OK)",Senior Full Stack Developer,Thanks for your interest in working with us! D...,Apply now at https://defendify.breezy.hr/p/267...,https://jobs.github.com/rails/active_storage/b...
4,b003b45c-6abb-4206-8a5c-171c6efb7d00,Full Time,https://jobs.github.com/positions/b003b45c-6ab...,Tue Jan 05 12:09:20 UTC 2021,dirico,https://dirico.io/,Koblenz,Fullstack Developer (m/w/d),"Wir sind die 247GRAD Labs GmbH, das IT-Unterne...",application form\n,https://jobs.github.com/rails/active_storage/b...
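
To make the tag-stripping step less of a black box, here is a minimal standard-library sketch of the same idea. It is not how `w3lib` implements `remove_tags`; `strip_tags` and `_TextCollector` are hypothetical names, and the example input is made up for illustration.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(html):
    """Return html with all tags dropped (hypothetical helper)."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)

print(strip_tags('<p>Apply now at <a href="https://example.com/jobs">our careers page</a></p>'))
# Apply now at our careers page
```

Because of `HTMLParser`'s defaults, this sketch also decodes character references such as `&amp;`, and, like the cell above, it keeps the link text while dropping the `href` target.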


In [9]:
# finally, export the DataFrame to a CSV file for further use
df.to_csv('github_jobs_data.csv', index=False)