# Remote Work Job Scraper: Remotive.com API

This Jupyter Notebook demonstrates how to fetch real remote job postings using the [Remotive.com Public API](https://remotive.com/remote-jobs/api). Using an API is often the most robust and polite way to collect data from a website: the response is already structured, so there is no fragile HTML parsing, and the provider has explicitly offered the data, so the notebook does not run up against the anti-scraping measures that traditional web scraping has to contend with.

## 1. Setup and Configuration

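We begin by importing the necessary Python libraries: `requests` for making HTTP requests to the API, `pandas` for data manipulation, and `display` and `Markdown` from `IPython.display` for rendering results in the notebook. `time` is imported for delays between calls; the single request made in this notebook does not need it, but it becomes useful when making several sequential calls under a rate limit.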

In [1]:
import requests
import pandas as pd
import time
from IPython.display import display, Markdown

## 2. API Data Fetching Function (`scrape_remotive_api`)

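This function interacts with the Remotive API. It builds the request from three optional parameters (`category` to filter by job category, `search` to filter by a search term, and `limit` to cap the number of results), flattens each returned job into a dictionary of selected fields, and returns the results as a pandas DataFrame. On any error it prints a message and returns an empty DataFrame.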

In [2]:
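def scrape_remotive_api(category=None, search=None, limit=None):
    """Fetch job listings from the Remotive API and return them as a DataFrame.

    All three filters are optional. An empty DataFrame is returned on any error.
    """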
    base_api_url = "https://remotive.com/api/remote-jobs"
    params = {}
    if category:
        params['category'] = category
    if search:
        params['search'] = search
    if limit:
        params['limit'] = limit        
    job_listings = []
    print(f"Starting to fetch jobs from Remotive API with parameters: {params}")
    try:
        # A timeout keeps the notebook from hanging if the API does not respond
        response = requests.get(base_api_url, params=params, timeout=30)
        response.raise_for_status()  # Raise an exception for HTTP errors
        data = response.json()

        if 'jobs' in data:
            for job in data['jobs']:
                job_data = {
                    "Job ID": job.get("id"),
                    "Job Title": job.get("title"),
                    "Company Name": job.get("company_name"),
                    "Publication Date": job.get("publication_date"),
                    "Job Type": job.get("job_type"),
                    "Category": job.get("category"),
                    "Candidate Required Location": job.get("candidate_required_location"),
                    "Salary Range": job.get("salary"),
                    "Job Description": job.get("description"),
                    "Source URL": job.get("url"),
                    "Company Logo": job.get("company_logo"),
                    "Job Board": "Remotive.com"
                }
                job_listings.append(job_data)
        else:
            print("No 'jobs' key found in the API response.")

    except requests.exceptions.RequestException as e:
        print(f"Error during request to Remotive API: {e}")
    except ValueError as e:
        print(f"Error decoding JSON response from Remotive API: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
    df = pd.DataFrame(job_listings)
    return df    
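
To see how the optional filters end up in the request, the sketch below rebuilds the same `params` dictionary and encodes it into a URL without sending anything. `preview_remotive_url` is a hypothetical helper, not part of the notebook, and it uses the standard library's `urlencode` as a stand-in for the query-string encoding that `requests` performs internally.

```python
from urllib.parse import urlencode

BASE_API_URL = "https://remotive.com/api/remote-jobs"


def preview_remotive_url(category=None, search=None, limit=None):
    """Return the URL a call with these filters would request (no network I/O).

    Hypothetical helper for illustration; it mirrors the params logic of
    scrape_remotive_api but is not part of the original notebook.
    """
    params = {}
    if category:
        params["category"] = category
    if search:
        params["search"] = search
    if limit:
        params["limit"] = limit
    # Filters that were not supplied are simply left out of the query string.
    if not params:
        return BASE_API_URL
    return f"{BASE_API_URL}?{urlencode(params)}"


print(preview_remotive_url(category="software-dev", limit=500))
# https://remotive.com/api/remote-jobs?category=software-dev&limit=500
```

Previewing the URL this way is a cheap check on the filters before spending one of the few requests the API allows.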

## 3. Execution and Data Storage

This section demonstrates how to call the `scrape_remotive_api` function, retrieve job data, and save it to a CSV file. For this example, we'll fetch up to 500 jobs from the 'Software Development' category (API slug `software-dev`).

In [3]:
if __name__ == "__main__":
    # Example usage: fetch up to 500 software development jobs
    scraped_data = scrape_remotive_api(category="software-dev", limit=500)
    if not scraped_data.empty:
        output_filename = "remotive_jobs.csv"
        scraped_data.to_csv(output_filename, index=False)
        print(f"Scraped data saved to {output_filename}")
        display(Markdown("### Sample of Scraped Data from Remotive API"))
        display(scraped_data.head())
    else:
        print("No job listings were fetched from Remotive API. Please check the logs above for errors.")
        display(Markdown("### No Data Fetched."))
        display(Markdown("API call did not return any job listings. Review the console output for error messages or check the API documentation for valid parameters."))

Starting to fetch jobs from Remotive API with parameters: {'category': 'software-dev', 'limit': 500}
Scraped data saved to remotive_jobs.csv


### Sample of Scraped Data from Remotive API

| | Job ID | Job Title | Company Name | Publication Date | Job Type | Category | Candidate Required Location | Salary Range | Job Description | Source URL | Company Logo | Job Board |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2069789 | Shopify Developer | Tidal Commerce | 2025-10-16T18:50:24 | part_time | Software Development | LATAM, India | | `<p style="font-weight:400;margin:0px;line-heig...` | https://remotive.com/remote-jobs/software-dev/... | https://remotive.com/job/2069789/logo | Remotive.com |
| 1 | 2069788 | Principal Software Engineer | Cint | 2025-10-16T18:50:23 | full_time | Software Development | Brazil | | `<br><br><div class="h3">Company Description</d...` | https://remotive.com/remote-jobs/software-dev/... | https://remotive.com/job/2069788/logo | Remotive.com |
| 2 | 2069786 | Product Engineer | Dataline nv | 2025-10-16T18:50:22 | full_time | Software Development | Belgium | | `<p>We are a leading provider of enterprise res...` | https://remotive.com/remote-jobs/software-dev/... | https://remotive.com/job/2069786/logo | Remotive.com |
| 3 | 2062716 | Technical Architect | Tether | 2025-10-16T14:31:37 | full_time | Software Development | UK | | `<p class="sc-1fwbcuw-0 kbZOmE"> </p><p class="...` | https://remotive.com/remote-jobs/software-dev/... | https://remotive.com/job/2062716/logo | Remotive.com |
| 4 | 2069850 | Bilingual (English and Indonesian/Portuguese) ... | Mercor | 2025-10-16T14:27:42 | contract | Software Development | USA | $30 - $55 usd hourly | `<i>\n This description is a summary of our und...` | https://remotive.com/remote-jobs/software-dev/... | https://remotive.com/job/2069850/logo | Remotive.com |


## 4. API Usage Best Practices and Considerations

Using an API for data collection offers several advantages over traditional web scraping:

### Reliability
APIs provide a stable and structured way to access data. They are less prone to breaking due to website design changes, which is a common challenge with HTML parsing.

### Efficiency
APIs typically return data in a machine-readable format (like JSON), which is much faster and easier to parse than HTML. This reduces development time and computational resources.

### Politeness and Rate Limits
API providers usually have clear guidelines on usage and rate limits. Adhering to these limits (e.g., by introducing `time.sleep()` if making multiple sequential calls) ensures continued access and good standing with the service provider. Remotive's API documentation advises a maximum of 4 requests per day for general data, and excessive requests (more than 2x per minute) will be blocked.
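
One way to space out sequential calls is a small throttle that sleeps only for the time still owed since the previous call. The sketch below is illustrative: `MinIntervalThrottle` and `fetch_in_sequence` are hypothetical names, and the 30-second interval mentioned afterwards is an assumption derived from the "more than 2x per minute" figure above, not a value taken from Remotive's documentation.

```python
import time


class MinIntervalThrottle:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        # clock and sleep are injectable so the throttle can be tested without waiting
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep for whatever part of the interval has not yet elapsed."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def fetch_in_sequence(keys, fetch, throttle):
    """Call fetch(key) for each key, pausing via the throttle before every call."""
    results = {}
    for key in keys:
        throttle.wait()
        results[key] = fetch(key)
    return results
```

With the notebook's function, a call might look like `fetch_in_sequence(["software-dev"], lambda c: scrape_remotive_api(category=c), MinIntervalThrottle(30))`. The advised daily maximum still applies, so a throttle like this only guards against bursts; it does not make frequent polling acceptable.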

### Data Quality
Data obtained through an API is often cleaner and more consistent, as it's directly provided by the source in a structured format, reducing the need for extensive data cleaning and transformation.
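
Some light cleaning may still be useful. In the sample output above, `Job Description` holds raw HTML and `Publication Date` is an ISO-style string. The sketch below uses only the standard library; `strip_html` and `parse_publication_date` are hypothetical helper names, and the date format is assumed from the sample rows rather than from the API documentation.

```python
from datetime import datetime
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment and ignore the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(markup):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    if not markup:
        return ""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    # Joining with a space keeps text from adjacent block elements apart;
    # the split/join pass then collapses any repeated whitespace.
    return " ".join(" ".join(parser.chunks).split())


def parse_publication_date(value):
    """Parse a timestamp such as '2025-10-16T18:50:24' (format assumed from the sample)."""
    return datetime.fromisoformat(value)


print(strip_html("<p>We are a <b>leading</b> provider</p>"))
# We are a leading provider
print(parse_publication_date("2025-10-16T18:50:24").date())
# 2025-10-16
```

Applied to the DataFrame, this could be `scraped_data["Job Description"].map(strip_html)`. For date handling inside pandas, `pd.to_datetime` on the `Publication Date` column is the more usual route.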

### Terms of Service
Always review the API's terms of service. For Remotive, it's crucial to link back to the original job URL and mention Remotive as the source if you share their job listings. Also, note that jobs displayed via the public API are delayed by 24 hours.
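
As a small illustration of the attribution requirement, the sketch below formats one listing as a Markdown line that links to the original posting and names Remotive as the source. `format_attributed_listing` is a hypothetical helper, and the wording of the credit is an assumption; check the API's terms for the exact attribution they expect.

```python
def format_attributed_listing(title, company, url):
    """Return a Markdown line linking to the original posting and crediting Remotive."""
    return f"[{title}]({url}) at {company} (source: Remotive)"


# Placeholder URL for illustration only; real values come from the "Source URL" column.
print(format_attributed_listing(
    "Technical Architect",
    "Tether",
    "https://remotive.com/remote-jobs/software-dev/example",
))
# [Technical Architect](https://remotive.com/remote-jobs/software-dev/example) at Tether (source: Remotive)
```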

### Integration into BI Projects
For a full BI project, the data fetched via this API would be a crucial input. It would then be loaded into a data warehouse, transformed, enriched, and used to build interactive dashboards (e.g., in Power BI) for trend analysis, market insights, and data-driven decision-making regarding remote work trends.
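
The load-then-query step can be sketched with an in-memory SQLite database standing in for the data warehouse. This is an assumption made for illustration: the table name `remotive_jobs`, its three columns, and the helper names are hypothetical, and a real project would target its own warehouse and schema. The rows are taken from the sample output above.

```python
import sqlite3

# Three fields from the five sample rows shown earlier in this notebook.
SAMPLE_ROWS = [
    {"Job ID": 2069789, "Job Title": "Shopify Developer", "Job Type": "part_time"},
    {"Job ID": 2069788, "Job Title": "Principal Software Engineer", "Job Type": "full_time"},
    {"Job ID": 2069786, "Job Title": "Product Engineer", "Job Type": "full_time"},
    {"Job ID": 2062716, "Job Title": "Technical Architect", "Job Type": "full_time"},
    {"Job ID": 2069850, "Job Title": "Bilingual (English and Indonesian/Portuguese) ...", "Job Type": "contract"},
]


def load_jobs(conn, rows):
    """Create the (hypothetical) remotive_jobs table if needed and upsert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS remotive_jobs ("
        "job_id INTEGER PRIMARY KEY, job_title TEXT, job_type TEXT)"
    )
    # INSERT OR REPLACE keyed on job_id makes repeated loads idempotent.
    conn.executemany(
        "INSERT OR REPLACE INTO remotive_jobs (job_id, job_title, job_type) "
        "VALUES (:job_id, :job_title, :job_type)",
        [
            {"job_id": r["Job ID"], "job_title": r["Job Title"], "job_type": r["Job Type"]}
            for r in rows
        ],
    )
    conn.commit()


def jobs_per_type(conn):
    """Return (job_type, count) pairs, most common type first."""
    return conn.execute(
        "SELECT job_type, COUNT(*) AS n FROM remotive_jobs "
        "GROUP BY job_type ORDER BY n DESC, job_type ASC"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_jobs(conn, SAMPLE_ROWS)
print(jobs_per_type(conn))
# [('full_time', 3), ('contract', 1), ('part_time', 1)]
```

Keying the table on `Job ID` means a daily refresh overwrites existing postings instead of duplicating them, which keeps downstream dashboard counts stable.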