<a href="https://colab.research.google.com/github/Sara-Soliman/RemoteOkScraper/blob/main/Job_Listings_Web_Scraper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RemoteOK Web Scraper
### Real-Time Remote Job Listings with Python, Selenium & BeautifulSoup

This project scrapes [RemoteOK.com](https://remoteok.com) for the remote job listings shown on the site at run time. `Selenium` loads the page and runs its JavaScript, and `BeautifulSoup` parses the rendered HTML to collect:

- ✅ Job Title
- ✅ Company
- ✅ Tags/Skills
- ✅ Posted Date
- ✅ Direct Job Link

Results are saved to an Excel file for later analysis or job hunting.

---



**Libraries and Setup**

In [13]:
# Install requirements (Colab only)
!apt-get update
!apt install -y chromium-chromedriver
!pip install selenium
!cp /usr/lib/chromium-browser/chromedriver /usr/bin


**Imports and Selenium Setup**

In [15]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
import time

**Web Driver Configuration & Page Load**

In [16]:
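options = Options()
options.add_argument("--headless")               # Run without a visible browser window
options.add_argument("--no-sandbox")             # Typically needed when Chrome runs as root, as in Colab
options.add_argument("--disable-dev-shm-usage")  # Use /tmp instead of the small /dev/shm in containers

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://remoteok.com/")
    time.sleep(5)  # Fixed pause so the page's JavaScript can render the listings
    soup = BeautifulSoup(driver.page_source, "html.parser")
finally:
    driver.quit()  # Always close the browser, even if loading or parsing fails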
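
The fixed five-second pause above either wastes time or is too short on a slow connection. A minimal sketch of a polling alternative follows, assuming it is enough to check the page source for a marker string. The helper name `wait_for_marker` and the marker `"data-epoch"` are illustrative assumptions, not part of the original notebook. Selenium also ships its own `WebDriverWait` helper for this purpose.

```python
import time
from typing import Callable


def wait_for_marker(get_html: Callable[[], str], marker: str,
                    timeout: float = 15.0, poll: float = 0.5) -> str:
    """Poll get_html() until marker appears; return the HTML or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found within {timeout} s")
        time.sleep(poll)


# Hypothetical usage with the driver from the cell above
# (the marker string is an assumption about the page markup):
# html = wait_for_marker(lambda: driver.page_source, "data-epoch")
# soup = BeautifulSoup(html, "html.parser")
```

The helper takes a callable rather than the driver itself, so it can be exercised without launching a browser.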

**Extracting Job Listings**

In [17]:
jobs = []
job_rows = soup.find_all("tr", class_="job")

for row in job_rows:
    try:
        title_tag = row.find("h2")
        company_tag = row.find("h3")
        tags = row.find_all("div", class_="tag")
        date_epoch = row.get("data-epoch")
        link = row.get("data-href")

        if not title_tag or not company_tag or not link:
            continue

        title = title_tag.text.strip()
        company = company_tag.text.strip()
        tag_list = [tag.text.strip() for tag in tags]

        # Convert date
        date = datetime.fromtimestamp(int(date_epoch)).strftime("%Y-%m-%d") if date_epoch else "N/A"

        jobs.append({
            "Title": title,
            "Company": company,
            "Tags": ", ".join(tag_list),
            "Posted Date": date,
            "Link": f"https://remoteok.com{link}"
        })

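    except Exception:
        # Skip rows that fail to parse (e.g. a non-numeric data-epoch) without swallowing KeyboardInterrupt
        continue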
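
The date conversion and link building inside the loop can also be pulled into small helpers that are easy to test on their own. The sketch below is one possible shape, not the notebook's method: the names `epoch_to_date`, `absolute_link`, and `BASE_URL` are hypothetical, and it formats dates in UTC, whereas the loop above uses the machine's local time zone.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urljoin

BASE_URL = "https://remoteok.com"  # same site root as in the loop above


def epoch_to_date(epoch: Optional[str]) -> str:
    """Convert an epoch-seconds string to YYYY-MM-DD (UTC), or "N/A" if missing or invalid."""
    try:
        return datetime.fromtimestamp(int(epoch), tz=timezone.utc).strftime("%Y-%m-%d")
    except (TypeError, ValueError):
        # int(None) raises TypeError; int("abc") raises ValueError
        return "N/A"


def absolute_link(href: str) -> str:
    """Resolve a relative job path against BASE_URL; absolute URLs pass through unchanged."""
    return urljoin(BASE_URL, href)


print(epoch_to_date("0"))                 # 1970-01-01
print(absolute_link("/remote-jobs/123"))  # https://remoteok.com/remote-jobs/123
```

Using `urljoin` instead of string concatenation keeps the link valid if a row ever carries a full URL in `data-href`.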

**Export to Excel**

In [18]:
df = pd.DataFrame(jobs)
df.to_excel("remoteok_all_jobs.xlsx", index=False)
df.head()

| | Title | Company | Tags | Posted Date | Link |
|---|---|---|---|---|---|
| 0 | Senior Fullstack Software Engineer | Blotato | Developer, JavaScript, Typescript, Heroku, AWS | 2025-06-09 | https://remoteok.com/remote-jobs/remote-senior... |
| 1 | Typescript Engineer | wander.com | Developer | 2025-05-26 | https://remoteok.com/remote-jobs/remote-typesc... |
| 2 | Virtual Assistant $25 Hourly | Brookview Lawncare | Customer Support | 2025-06-12 | https://remoteok.com/remote-jobs/remote-virtua... |
| 3 | Registered Dietitian | Foodsmart | System, Founder, Support, Financial, Video, He... | 2025-06-16 | https://remoteok.com/remote-jobs/remote-regist... |
| 4 | Market Research Executive | Sprinto | Security, Growth, Investment, Operational, Sal... | 2025-06-16 | https://remoteok.com/remote-jobs/remote-market... |


**Downloading File (Colab Only)**

In [20]:
from google.colab import files
files.download("remoteok_all_jobs.xlsx")


**Conclusion**
- Scraped JavaScript-rendered content from RemoteOK using Selenium.
- Extracted and cleaned the job data and saved it in Excel format.
- The same approach can be adapted to other job boards or use cases (e.g. email alerts, filtering, cloud storage), provided the selectors are adjusted for each site.
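
As an illustration of the filtering use case mentioned above, here is a minimal sketch that selects jobs by tag from the list of dictionaries the scraper builds. The function name `filter_by_tag` and the sample records are made up for the example; only the `"Title"` and `"Tags"` keys come from the notebook.

```python
from typing import Dict, List


def filter_by_tag(jobs: List[Dict[str, str]], wanted: str) -> List[Dict[str, str]]:
    """Return jobs whose comma-separated "Tags" field contains wanted as a whole tag (case-insensitive)."""
    target = wanted.strip().lower()
    return [
        job for job in jobs
        if target in {tag.strip().lower() for tag in job.get("Tags", "").split(",")}
    ]


# Illustrative records (made up), using the same keys as the scraper's output
sample_jobs = [
    {"Title": "Backend Engineer", "Tags": "Developer, Python, AWS"},
    {"Title": "Support Agent", "Tags": "Customer Support"},
]
print([job["Title"] for job in filter_by_tag(sample_jobs, "python")])  # ['Backend Engineer']
```

Matching on whole tags rather than substrings means that `"support"` does not match `"Customer Support"`; loosen the comparison if partial matches are wanted.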

---

**Credits**

- [RemoteOK](https://remoteok.com) for public job listings
- [Selenium](https://selenium.dev) & [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/)