# OPT-Friendly CS Internship Finder (jobspy + Python)

This notebook scrapes CS-related internship postings from job sites using **jobspy**
and filters for roles that appear **OPT-friendly** based on keyword heuristics.

> ⚠️ **Disclaimer**  
> - This does **not** guarantee OPT eligibility or visa sponsorship.  
> - Always verify details on the company's careers page and with recruiters.  
> - Scraping may be subject to each site's Terms of Service; use responsibly.


In [1]:
!pip install -U python-jobspy

Collecting python-jobspy
  Downloading python_jobspy-1.1.82-py3-none-any.whl.metadata (10 kB)
Collecting NUMPY==1.26.3 (from python-jobspy)
  Downloading numpy-1.26.3-cp312-cp312-macosx_10_9_x86_64.whl.metadata (61 kB)
Collecting beautifulsoup4<5.0.0,>=4.12.2 (from python-jobspy)
  Downloading beautifulsoup4-4.14.3-py3-none-any.whl.metadata (3.8 kB)
Collecting markdownify<0.14.0,>=0.13.1 (from python-jobspy)
  Downloading markdownify-0.13.1-py3-none-any.whl.metadata (8.5 kB)
Collecting pandas<3.0.0,>=2.1.0 (from python-jobspy)
  Downloading pandas-2.3.3-cp312-cp312-macosx_10_13_x86_64.whl.metadata (91 kB)
Collecting pydantic<3.0.0,>=2.3.0 (from python-jobspy)
  Downloading pydantic-2.12.5-py3-none-any.whl.metadata (90 kB)
Collecting regex<2025.0.0,>=2024.4.28 (from python-jobspy)
  Downloading regex-2024.11.6-cp312-cp312-macosx_10_13_x86_64.whl.metadata (40 kB)
Collecting requests<3.0.0,>=2.31.0 (from python-jobspy)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Colle

In [2]:
import csv
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "google"], #"zip_recruiter", "glassdoor", "bayt", "naukri", "bdjobs"
    search_term="\"software engineering intern\" -sales",
    google_search_term="software internship jobs in the US since last week",
    location="USA",
    results_wanted=20,
    hours_old=100,
    country_indeed='USA',

    # linkedin_fetch_description=True # gets more info such as description, direct job url (slower)
    # proxies=["208.195.175.46:65095", "208.195.175.45:65095", "localhost"],
)
print(f"Found {len(jobs)} jobs")
print(jobs.head())
jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False) # to_excel

2025-12-03 07:12:38,526 - INFO - JobSpy:Linkedin - finished scraping


Found 40 jobs
                    id    site  \
0  in-5102d0c18675267d  indeed   
1  in-2367541993d49d2f  indeed   
2  in-b8f5bf834d5a63ac  indeed   
3  in-1870b2ed5917295a  indeed   
4  in-9c53e8847318341d  indeed   

                                             job_url  \
0  https://www.indeed.com/viewjob?jk=5102d0c18675...   
1  https://www.indeed.com/viewjob?jk=2367541993d4...   
2  https://www.indeed.com/viewjob?jk=b8f5bf834d5a...   
3  https://www.indeed.com/viewjob?jk=1870b2ed5917...   
4  https://www.indeed.com/viewjob?jk=9c53e8847318...   

                                      job_url_direct  \
0  https://jobs.disneycareers.com/job/orlando/sof...   
1  https://jobs.disneycareers.com/job/lake-buena-...   
2  https://click.appcast.io/t/0F9M0CFwFVdSHnyd6hJ...   
3  https://wisk.wd108.myworkdayjobs.com/en-US/Wis...   
4  https://jobs.ashbyhq.com/telus-digital/4486bd5...   

                                            title                  company  \
0        Software Engineering

In [3]:
import pandas as pd
from jobspy import scrape_jobs

def contains_any(text: str, keywords: list[str]) -> bool:
    """Return True if any keyword appears in the given text (case-insensitive)."""
    if text is None:
        return False
    text = text.lower()
    return any(k in text for k in keywords)

def score_text(text: str, good_keywords: list[str], bad_keywords: list[str]) -> int:
    """Compute a simple OPT-friendliness score: (# good hits) - (# bad hits)."""
    if text is None:
        text = ""
    text = text.lower()
    score = 0
    for k in good_keywords:
        score += text.count(k)
    for k in bad_keywords:
        score -= text.count(k)
    return score

In [5]:
# === Search configuration ===

SEARCH_TERMS = [
    "computer science intern",
    "software engineer intern",
    "software developer intern",
    "data science intern",
]

# jobspy-supported sites; you can add/remove depending on what works for you
SITES = [
    "indeed",
    "linkedin",
    # "zip_recruiter",
    # "glassdoor",
]

LOCATION = "United States"  # e.g. "United States", "Remote", "Boston, MA"
RESULTS_PER_SITE = 150       # number of results per site per search term
HOURS_OLD = 168              # limit to last 7 days (168 hours)

# === Keyword heuristics for OPT friendliness ===

GOOD_KEYWORDS = [
    "opt",
    "cpt",
    "stem opt",
    "f1",
    "f-1",
    "visa sponsorship",
    "sponsorship available",
    "sponsor visas",
    "h-1b",
    "h1b",
    "international students",
]

BAD_KEYWORDS = [
    "us citizens only",
    "u.s. citizens only",
    "must be a us citizen",
    "citizen only",
    "no sponsorship",
    "cannot sponsor",
    "unable to sponsor",
    "not provide sponsorship",
    "gc or citizen only",
    "green card or citizen only",
]

print("Configuration loaded.")

Configuration loaded.


In [6]:
all_jobs = []

for site in SITES:
    for term in SEARCH_TERMS:
        print(f"Scraping {site} for '{term}' in {LOCATION} (last {HOURS_OLD} hours)...")
        try:
            jobs_df = scrape_jobs(
                site_name=site,
                search_term=term,
                location=LOCATION,
                results_wanted=RESULTS_PER_SITE,
                hours_old=HOURS_OLD,
                country_indeed="USA",  # relevant for Indeed
            )
            jobs_df["site"] = site
            jobs_df["search_term"] = term
            all_jobs.append(jobs_df)
            print(f"  -> Retrieved {len(jobs_df)} results.")
        except Exception as e:
            print(f"  !! Error scraping {site} for '{term}': {e}")

if not all_jobs:
    raise RuntimeError("No jobs retrieved. Try changing sites, search terms, or HOURS_OLD.")

raw_df = pd.concat(all_jobs, ignore_index=True)
print(f"\nTotal raw jobs collected: {len(raw_df)}")

raw_df.head()

Scraping indeed for 'computer science intern' in United States (last 168 hours)...
  -> Retrieved 150 results.
Scraping indeed for 'software engineer intern' in United States (last 168 hours)...
  -> Retrieved 150 results.
Scraping indeed for 'software developer intern' in United States (last 168 hours)...
  -> Retrieved 90 results.
Scraping indeed for 'data science intern' in United States (last 168 hours)...
  -> Retrieved 150 results.
Scraping linkedin for 'computer science intern' in United States (last 168 hours)...
  -> Retrieved 140 results.
Scraping linkedin for 'software engineer intern' in United States (last 168 hours)...
  -> Retrieved 130 results.
Scraping linkedin for 'software developer intern' in United States (last 168 hours)...
  -> Retrieved 120 results.
Scraping linkedin for 'data science intern' in United States (last 168 hours)...
  -> Retrieved 140 results.

Total raw jobs collected: 1070


  raw_df = pd.concat(all_jobs, ignore_index=True)


Unnamed: 0,id,site,job_url,job_url_direct,title,company,location,date_posted,job_type,salary_source,...,company_num_employees,company_revenue,company_description,skills,experience_range,company_rating,company_reviews_count,vacancy_count,work_from_home_type,search_term
0,in-e59eb8cccdc25689,indeed,https://www.indeed.com/viewjob?jk=e59eb8cccdc2...,https://vhr-epri.wd1.myworkdayjobs.com/en-US/e...,Energy Supply Student Internship: Fusion Stude...,Electric Power Research Institute,"Charlotte, NC, US",2025-12-03,fulltime,direct_data,...,"1,001 to 5,000",$500M to $1B (USD),,,,,,,,computer science intern
1,in-6c67e32eed5ed00e,indeed,https://www.indeed.com/viewjob?jk=6c67e32eed5e...,https://jobs.disneycareers.com/job/bay-lake/da...,"Data Product Intern, Summer 2026",The Walt Disney Company,"Bay Lake, FL, US",2025-12-03,internship,,...,"10,000+",more than $10B (USD),Synonymous with quality entertainment and cutt...,,,,,,,computer science intern
2,in-5102d0c18675267d,indeed,https://www.indeed.com/viewjob?jk=5102d0c18675...,https://jobs.disneycareers.com/job/orlando/sof...,"Software Engineering Intern, Summer 2026",The Walt Disney Company,"Orlando, FL, US",2025-12-03,internship,,...,"10,000+",more than $10B (USD),Synonymous with quality entertainment and cutt...,,,,,,,computer science intern
3,in-2367541993d49d2f,indeed,https://www.indeed.com/viewjob?jk=2367541993d4...,https://jobs.disneycareers.com/job/lake-buena-...,"Software Engineering Intern, Summer 2026",The Walt Disney Company,"Lake Buena Vista, FL, US",2025-12-03,internship,,...,"10,000+",more than $10B (USD),Synonymous with quality entertainment and cutt...,,,,,,,computer science intern
4,in-ed9df620ea0f87bc,indeed,https://www.indeed.com/viewjob?jk=ed9df620ea0f...,https://careers.hfsinclair.com/job/Dallas-Inte...,"Intern, Internal Audit Technology",Hf Sinclair,"Dallas, TX, US",2025-12-03,internship,,...,,,,,,,,,,computer science intern


In [7]:
raw_df.value_counts("title")


title
Software Engineer Intern                                        35
Robotics - Software Development Engineer Intern/Co-op - 2026    23
Software Engineering Intern                                     19
Software Engineering Internship - Summer 2026                   18
Data Scientist                                                  16
                                                                ..
Industrial AI Internship                                         1
IT Systems Analyst Intern - Summer 2026                          1
IT Software Development Intern                                   1
IT Service Delivery Manager                                      1
iOS Intern                                                       1
Name: count, Length: 631, dtype: int64

In [11]:
df = raw_df.copy()

# Create a combined text field to search for OPT-related keywords
description_col = "description" if "description" in df.columns else None
snippet_col = "snippet" if "snippet" in df.columns else None

# Start with the title, filling any NaN values with an empty string
combined_text_series = df["title"].fillna("").astype(str)

# Concatenate description if available
if description_col and not df[description_col].isnull().all(): # Check if column exists and is not entirely null
    combined_text_series = combined_text_series.str.cat(df[description_col].fillna("").astype(str), sep=" ")

# Concatenate snippet if available
if snippet_col and not df[snippet_col].isnull().all(): # Check if column exists and is not entirely null
    combined_text_series = combined_text_series.str.cat(df[snippet_col].fillna("").astype(str), sep=" ")

df["search_text"] = combined_text_series.str.lower()

# Filter for internships explicitly (job title contains "intern" or "co-op")
intern_mask = df["title"].str.lower().str.contains("intern|co-op|co op", na=False)

df_interns = df[intern_mask].copy()
print(f"Internship-like roles: {len(df_interns)}")

df_interns[["title", "company", "location", "site"]].head(10)

Internship-like roles: 879


Unnamed: 0,title,company,location,site
0,Energy Supply Student Internship: Fusion Stude...,Electric Power Research Institute,"Charlotte, NC, US",indeed
1,"Data Product Intern, Summer 2026",The Walt Disney Company,"Bay Lake, FL, US",indeed
2,"Software Engineering Intern, Summer 2026",The Walt Disney Company,"Orlando, FL, US",indeed
3,"Software Engineering Intern, Summer 2026",The Walt Disney Company,"Lake Buena Vista, FL, US",indeed
4,"Intern, Internal Audit Technology",Hf Sinclair,"Dallas, TX, US",indeed
5,AI Assisted Software Developer (Intern / Entry...,Maxus33,US,indeed
6,AI Assisted Software Developer (Intern / Entry...,Maxus33,"Remote, US",indeed
7,PLC Programming intern,Schneider Electric,"Raleigh, NC, US",indeed
8,IT Integration Intern,Schneider Electric,"Raleigh, NC, US",indeed
9,AI Engineer Intern,Promega Corporation,"Madison, WI, US",indeed


In [12]:
# Apply positive and negative keyword filters on search_text
good_mask = df_interns["search_text"].apply(lambda t: contains_any(t, GOOD_KEYWORDS))
bad_mask = df_interns["search_text"].apply(lambda t: contains_any(t, BAD_KEYWORDS))

df_opt = df_interns[good_mask & ~bad_mask].copy()
print(f"Potential OPT-friendly internships: {len(df_opt)}")

# Add an 'opt_score' column for simple ranking
df_opt["opt_score"] = df_opt["search_text"].apply(
    lambda t: score_text(t, GOOD_KEYWORDS, BAD_KEYWORDS)
)

# Select useful columns if they exist
columns_to_keep = []
for col in ["title", "company", "location", "site", "search_term", "url", "description", "snippet", "opt_score"]:
    if col in df_opt.columns:
        columns_to_keep.append(col)

df_opt = df_opt[columns_to_keep]

# Sort by opt_score descending
df_opt = df_opt.sort_values("opt_score", ascending=False)

df_opt.head(20)

Potential OPT-friendly internships: 180


Unnamed: 0,title,company,location,site,search_term,description,opt_score
277,2026 Operations Research Science Internship - ...,Amazon.com,"Seattle, WA, US",indeed,software engineer intern,**DESCRIPTION**\n---------------\n\n\nDo you e...,6
276,2026 Operations Research Science Internship - ...,Amazon.com,"Corvallis, OR, US",indeed,software engineer intern,**DESCRIPTION**\n---------------\n\n\nDo you e...,6
149,2026 Operations Research Science Internship - ...,Amazon.com,"Seattle, WA, US",indeed,computer science intern,**DESCRIPTION**\n---------------\n\n\nDo you e...,6
148,2026 Operations Research Science Internship - ...,Amazon.com,"Corvallis, OR, US",indeed,computer science intern,**DESCRIPTION**\n---------------\n\n\nDo you e...,6
516,2026 Operations Research Science Internship - ...,Amazon.com,"Corvallis, OR, US",indeed,data science intern,**DESCRIPTION**\n---------------\n\n\nDo you e...,6
519,2026 Operations Research Science Internship - ...,Amazon.com,"Seattle, WA, US",indeed,data science intern,**DESCRIPTION**\n---------------\n\n\nDo you e...,6
257,"Internship, Automation Development & Tooling E...",Tesla,"Brooklyn Park, MN, US",indeed,software engineer intern,**What to Expect**\nConsider before submitting...,5
71,RFIC Intern,Nokia,US,indeed,computer science intern,**Number of Position(s):** 1 \n\n \n\n**Dura...,5
63,"Internship, Controls Engineer, Manufacturing (...",Tesla,"Fremont, CA, US",indeed,computer science intern,**What to Expect**\nConsider before submitting...,5
62,"Internship, Automation Development & Tooling E...",Tesla,"Brooklyn Park, MN, US",indeed,computer science intern,**What to Expect**\nConsider before submitting...,5


In [13]:
# Deduplicate based on title + company + location + site
dedupe_keys = [c for c in ["title", "company", "location", "site"] if c in df_opt.columns]
df_opt_unique = df_opt.drop_duplicates(subset=dedupe_keys, keep="first").reset_index(drop=True)

print(f"After deduplication: {len(df_opt_unique)} internships\n")

# Show a preview
df_opt_unique.head(20)

After deduplication: 131 internships



Unnamed: 0,title,company,location,site,search_term,description,opt_score
0,2026 Operations Research Science Internship - ...,Amazon.com,"Seattle, WA, US",indeed,software engineer intern,**DESCRIPTION**\n---------------\n\n\nDo you e...,6
1,2026 Operations Research Science Internship - ...,Amazon.com,"Corvallis, OR, US",indeed,software engineer intern,**DESCRIPTION**\n---------------\n\n\nDo you e...,6
2,"Internship, Automation Development & Tooling E...",Tesla,"Brooklyn Park, MN, US",indeed,software engineer intern,**What to Expect**\nConsider before submitting...,5
3,RFIC Intern,Nokia,US,indeed,computer science intern,**Number of Position(s):** 1 \n\n \n\n**Dura...,5
4,"Internship, Controls Engineer, Manufacturing (...",Tesla,"Fremont, CA, US",indeed,computer science intern,**What to Expect**\nConsider before submitting...,5
5,"Internship, Automotive Photographer, Design St...",Tesla,"Fremont, CA, US",indeed,software engineer intern,**What to Expect**\nConsider before submitting...,5
6,"Research Scientist Intern, Center for Quantum ...",Amazon Web Services,"Pasadena, CA, US",indeed,software engineer intern,**DESCRIPTION**\n---------------\n\n\nThe Amaz...,4
7,"Research Scientist Intern, Center for Quantum ...",Amazon Web Services,"Santa Clara, CA, US",indeed,software engineer intern,**DESCRIPTION**\n---------------\n\n\nThe Amaz...,4
8,"2026 Summer Intern - Software Engineer, Machin...",General Motors,"Mountain View, CA, US",indeed,software developer intern,**Job Description**\n\nGM does not provide imm...,4
9,Laser Department Manufacturing Intern (Spring ...,,"Seabrook, NH, US",indeed,software engineer intern,**Key Responsibilities:**\n\n\n* **Laser Progr...,4


In [14]:
# Save to CSV
output_path = "opt_internships.csv"
df_opt_unique.to_csv(output_path, index=False)
print(f"Saved {len(df_opt_unique)} internships to: {output_path}")

Saved 131 internships to: opt_internships.csv


In [15]:
print("=== Summary ===")
print(f"Total raw jobs scraped: {len(raw_df)}")
print(f"Internship-like roles: {len(df_interns)}")
print(f"Potential OPT-friendly (before dedupe): {len(df_opt)}")
print(f"Unique OPT-like internships (after dedupe): {len(df_opt_unique)}")

print("\nSample of results:")
df_opt_unique[["title", "company", "location", "site", "opt_score"]].head(10)

=== Summary ===
Total raw jobs scraped: 1070
Internship-like roles: 879
Potential OPT-friendly (before dedupe): 180
Unique OPT-like internships (after dedupe): 131

Sample of results:


Unnamed: 0,title,company,location,site,opt_score
0,2026 Operations Research Science Internship - ...,Amazon.com,"Seattle, WA, US",indeed,6
1,2026 Operations Research Science Internship - ...,Amazon.com,"Corvallis, OR, US",indeed,6
2,"Internship, Automation Development & Tooling E...",Tesla,"Brooklyn Park, MN, US",indeed,5
3,RFIC Intern,Nokia,US,indeed,5
4,"Internship, Controls Engineer, Manufacturing (...",Tesla,"Fremont, CA, US",indeed,5
5,"Internship, Automotive Photographer, Design St...",Tesla,"Fremont, CA, US",indeed,5
6,"Research Scientist Intern, Center for Quantum ...",Amazon Web Services,"Pasadena, CA, US",indeed,4
7,"Research Scientist Intern, Center for Quantum ...",Amazon Web Services,"Santa Clara, CA, US",indeed,4
8,"2026 Summer Intern - Software Engineer, Machin...",General Motors,"Mountain View, CA, US",indeed,4
9,Laser Department Manufacturing Intern (Spring ...,,"Seabrook, NH, US",indeed,4
