In [7]:
## Lab Task: Data Collection from RemoteOK

# For a data scientist, the first step in most projects is collecting relevant, structured data.
# In this exercise, your task is to extract job-related information from the RemoteOK job website: https://remoteok.com/r

# Objectives:
# - Collect the following data fields for each job posting:
#   - Company Name
#   - Job Role
#   - Location
#   - Features or Tags (e.g., technologies, benefits, job type)

# Instructions:
# - Use Python with libraries such as requests and pandas, plus json or BeautifulSoup if needed.
# - Retrieve the job data from the RemoteOK API or web page.
# - Parse the JSON or HTML response to extract the required fields.
# - Store the collected data in a structured format such as CSV for future analysis.

# Output:
# A CSV file (e.g., remoteok_jobs.csv) containing all extracted job listings with the specified fields.

# This dataset will serve as the foundation for further data analysis and machine learning tasks in upcoming lab exercises.

%pip install requests pandas beautifulsoup4

Note: you may need to restart the kernel to use updated packages.
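Before calling the live API, you can check the parse-and-store steps from the instructions offline. The sketch below assumes a response shaped like the one the following cells handle: a JSON array whose first element may be site metadata rather than a job. The sample payload and the helpers `extract_jobs` and `rows_to_csv` are hypothetical. The sketch uses only the standard library.

```python
import csv
import io
import json

# Hypothetical payload shaped like a RemoteOK API response (an assumption,
# not real data): a metadata object first, then job objects.
SAMPLE_RESPONSE = json.dumps([
    {"legal": "API terms notice"},
    {"company": "Acme", "position": "Data Engineer",
     "location": "Worldwide", "tags": ["python", "sql"]},
    {"company": "Globex", "title": "ML Engineer",
     "location": "", "tags": None},
])

FIELDS = ["company", "role", "location", "tags"]


def extract_jobs(payload):
    """Turn a decoded API payload into flat rows with the four lab fields."""
    rows = []
    for item in payload:
        # Metadata entries have no 'company' key, so they are skipped.
        if not isinstance(item, dict) or "company" not in item:
            continue
        rows.append({
            "company": item.get("company") or "",
            # Fall back to 'title' when 'position' is missing or empty.
            "role": item.get("position") or item.get("title") or "",
            "location": item.get("location") or "",
            # Flatten the tag list; `or []` also handles an explicit null.
            "tags": ",".join(item.get("tags") or []),
        })
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text (writing to an open file works the same way)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_jobs(json.loads(SAMPLE_RESPONSE))
print(rows_to_csv(rows))
```

`csv.DictWriter` automatically quotes the `tags` field when it contains commas. As a result, a comma-joined tag list stays in a single column and survives a round trip through the CSV file.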


In [14]:
import requests
import pandas as pd


In [15]:
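# Fetch the job feed from RemoteOK's public JSON API.
# The timeout prevents the cell from hanging indefinitely. The explicit
# User-Agent is a precaution (an assumption, not a documented requirement).
resp = requests.get(
    "https://remoteok.com/api",
    headers={"User-Agent": "Mozilla/5.0 (remoteok-lab)"},
    timeout=30,
)
resp.raise_for_status()
data = resp.json()

jobs = []
for job in data:
    # Skip non-job entries (the first item may be site metadata)
    if not isinstance(job, dict) or 'company' not in job:
        continue

    jobs.append({
        'company': job.get('company', ''),
        # Some postings use 'position', others 'title'
        'role': job.get('position') or job.get('title', ''),
        'location': job.get('location', ''),
        # Flatten the tag list into one comma-separated string for the CSV;
        # `or []` also guards against an explicit null
        'tags': ','.join(job.get('tags') or [])
    })

df = pd.DataFrame(jobs)
df.to_csv('remoteok_jobs.csv', index=False)
print("Saved", len(df), "job listings to remoteok_jobs.csv")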

Saved 98 job listings to remoteok_jobs.csv
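The `tags` column stores comma-joined strings, so any downstream analysis has to split them apart again. Here is a minimal sketch of a tag-frequency count that uses only the standard library. It assumes the four-column CSV layout written above. The sample rows and the `tag_counts` helper are hypothetical; with the real file, you would pass an open file handle instead of the in-memory sample.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content in the same layout as remoteok_jobs.csv.
# With the real file: open("remoteok_jobs.csv", newline="", encoding="utf-8")
SAMPLE_CSV = (
    "company,role,location,tags\n"
    'Acme,Data Engineer,Worldwide,"python,sql"\n'
    'Globex,ML Engineer,,"python,ml"\n'
    "Initech,Support Lead,Remote,\n"
)


def tag_counts(csv_file):
    """Count how often each tag appears across all job rows."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Split the comma-joined string and drop empty pieces (rows without tags).
        tags = [t.strip() for t in row["tags"].split(",") if t.strip()]
        counts.update(tags)
    return counts


counts = tag_counts(io.StringIO(SAMPLE_CSV))
print(counts.most_common(3))
```

Keeping the tags as a single string column keeps the CSV flat and simple. The trade-off is this extra split step at analysis time.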
