# Web scraping to generate RSS feed for new positions in economics
This application uses three main sources to retrieve information about new job postings:
1. [NBER](https://www.nber.org/career-resources/research-assistant-positions-not-nber)
2. [Predoc](https://predoc.org/opportunities)
3. [EconJobMarket](https://econjobmarket.org/market)
   
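The required third-party packages are **requests**, **certifi**, **beautifulsoup4**, **pandas**, and **python-dotenv**; `MIMEText` and `MIMEMultipart` come from the standard-library `email.mime` module. As a first step, we import them: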


In [None]:

import requests
import certifi
from bs4 import BeautifulSoup
import re
import csv
import os
import pandas as pd
import smtplib
from email.mime.text import MIMEText
from IPython.display import Markdown, display
from email.mime.multipart import MIMEMultipart
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv() #just for email and password 



True

In [2]:
CSV_FILE = "jobs.csv" #database saved in simple csv file
PREDOC_URL = "https://predoc.org/opportunities"
NBER_URL = "https://www.nber.org/career-resources/research-assistant-positions-not-nber"
EJM_URL = "https://econjobmarket.org/market"



## Downloading the html
The following functions download the HTML content from the sources and save it in the `sources` folder.
For Predoc there is a certificate issue, so it is easier to use curl (bash on macOS).

In [3]:
!mkdir -p sources
!curl -L "https://predoc.org/opportunities" -o "sources/predoc.html"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  407k  100  407k    0     0   163k      0  0:00:02  0:00:02 --:--:--  163k
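
The `!curl` line above only works inside a notebook. If the workflow is ever run as a plain Python script, the same download could be issued through `subprocess`. The sketch below is a minimal illustration under that assumption; `build_curl_command` and `download_with_curl` are hypothetical helper names that do not appear elsewhere in this notebook, and the system `curl` binary is assumed to be on the `PATH`.

```python
import subprocess
from pathlib import Path


def build_curl_command(url, dest):
    """Return the curl argument list used to save `url` to `dest`."""
    # -L follows redirects; --fail turns HTTP errors into a non-zero exit code;
    # --silent/--show-error hide the progress meter but keep error messages.
    return ["curl", "-L", "--fail", "--silent", "--show-error", url, "-o", str(dest)]


def download_with_curl(url, dest):
    """Download `url` to `dest` with the system curl; return True on success."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)  # same role as `mkdir -p sources`
    try:
        subprocess.run(build_curl_command(url, dest), check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"curl failed for {url}: {exc}")
        return False
    return True


# Only the command is printed here; nothing is downloaded until
# download_with_curl() is called explicitly.
print(build_curl_command("https://predoc.org/opportunities", "sources/predoc.html"))
```

Calling `download_with_curl(PREDOC_URL, "sources/predoc.html")` would then play the role of the shell cell above.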


In [4]:
def download_html(url, filename):
    """
    Downloads the HTML content from the given URL and saves it to the specified filename.
    """
    try:
        response = requests.get(url, verify=certifi.where(), timeout=30)
        response.raise_for_status()
        with open(filename, "w", encoding="utf-8") as f:
            f.write(response.text)
        print(f"Downloaded HTML from {url} to {filename}")
    except Exception as e:
        print(f"Error downloading {url}: {e}")

# Ensure the 'sources' folder exists.
os.makedirs("sources", exist_ok=True)

# Download HTML content for each source.
download_html(PREDOC_URL, "sources/predoc.html")
download_html(NBER_URL, "sources/nber.html")
download_html(EJM_URL, "sources/ejm.html")


Error downloading https://predoc.org/opportunities: HTTPSConnectionPool(host='predoc.org', port=443): Max retries exceeded with url: /opportunities (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1010)')))
Downloaded HTML from https://www.nber.org/career-resources/research-assistant-positions-not-nber to sources/nber.html
Downloaded HTML from https://econjobmarket.org/market to sources/ejm.html




## Web Scraping Section 🚀

In this section, we set up our web scraping functionality. Our goal is to **extract job details** from pre-doctoral opportunities pages (in this example, from [predoc.org](https://predoc.org/opportunities)). We assume that the HTML content has already been downloaded and saved locally in the `sources` folder.

### Predoc
What This Section Does:
- **Reads the Local HTML File 📂:**  
  We read the downloaded HTML file (`sources/predoc.html`). If the file isn't found, the code prompts you to download it first.
  
- **Parses the HTML Content 🥣:**  
  Using BeautifulSoup, the code parses the HTML to locate the container that holds the opportunity details.
  
- **Extracts Key Information 🔍:**  
  For each job posting, the function extracts:
  - **Program Title** and **Link** from the `<h2>` element.
  - Additional details (like **sponsor**, **institution**, **fields of research**, and **deadline**) from the "copy" `<div>`.
  
- **Determines the Main Field 🔑:**  
  It combines several text fields and passes them to an auxiliary function (`extract_main_field()`) that determines the primary focus (e.g., Economics, Microeconomics, Finance, etc.).

- **Returns the Data as a List 📤:**  
  Each job is stored as a dictionary, and the function returns a list of these dictionaries.

> **Note:**  
> Download the HTML file before running the scraper, i.e., run the previous cells first.
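
The regex step described above can be tried in isolation, without BeautifulSoup. The following is a minimal sketch: `SAMPLE_TEXT` is a made-up example of what the flattened text of the "copy" `<div>` might look like, and `parse_predoc_details` is a hypothetical helper name. The patterns themselves are the ones used in `scrape_predoc()`.

```python
import re

# Hypothetical sample of the flattened text inside the "copy" <div>.
SAMPLE_TEXT = (
    "Sponsoring Researcher(s): Jane Doe "
    "Sponsoring Institution: Example University "
    "Fields of Research: Labor Economics, Public Policy "
    "Deadline: 01/15/2025"
)

# One pattern per field; each lazy group stops at the next label.
FIELD_PATTERNS = {
    "sponsor": r"Sponsoring Researcher\(s\):\s*(.*?)\s*Sponsoring Institution:",
    "institution": r"Sponsoring Institution:\s*(.*?)\s*Fields of Research",
    "fields": r"Fields of Research\s*:\s*(.*?)\s*Deadline:",
    "deadline": r"Deadline:\s*(.*)",
}


def parse_predoc_details(text):
    """Return a dict with one entry per field, using "N/A" when a label is missing."""
    details = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        details[name] = match.group(1).strip() if match else "N/A"
    return details


print(parse_predoc_details(SAMPLE_TEXT))
```

Because every pattern is anchored on the following label, a posting that omits one label leaves the preceding field as `"N/A"` rather than capturing the wrong text.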


In [5]:
def scrape_predoc():
    """
    Scrapes the pre-doctoral opportunities page from the local HTML file
    and extracts job details.
    """
    jobs = []
    
    # Attempt to read the local HTML file. 📂
    try:
        with open("sources/predoc.html", "r", encoding="utf-8") as f:
            html = f.read()
    except Exception as e:
        print("Error reading sources/predoc.html. Please download the HTML from predoc before proceeding. 🚫")
        return jobs  # Return an empty list if the file can't be read.
    
    # Parse the HTML content using BeautifulSoup. 🥣
    soup = BeautifulSoup(html, "html.parser")
    
    # Find the container holding the opportunities using a regex on the class name. 🔍
    container = soup.find("div", class_=re.compile("Opportunities"))
    if not container:
        print("No Predoc container found. 😢")
        return jobs
    
    # Loop over each article element within the container. 📝
    articles = container.find_all("article")
    for article in articles:
        job = {}
        job["source"] = "predoc"  # Mark the source as 'predoc'. 🌟
        
        # Extract the title and link from the <h2> element. 🏷️
        h2 = article.find("h2")
        if h2:
            a_tag = h2.find("a")
            if a_tag:
                job["program_title"] = a_tag.get_text(strip=True)
                job["link"] = a_tag.get("href", "").strip()
            else:
                job["program_title"] = "N/A"
                job["link"] = ""
        else:
            job["program_title"] = "N/A"
            job["link"] = ""
        
        # Extract details from the "copy" div. 🗒️
        copy_div = article.find("div", class_="copy")
        if copy_div:
            p = copy_div.find("p")
            if p:
                text = p.get_text(separator=" ", strip=True)
                # Use regex to capture specific fields from the text. 🔍
                researcher_match = re.search(r"Sponsoring Researcher\(s\):\s*(.*?)\s*Sponsoring Institution:", text)
                institution_match = re.search(r"Sponsoring Institution:\s*(.*?)\s*Fields of Research", text)
                fields_match = re.search(r"Fields of Research\s*:\s*(.*?)\s*Deadline:", text)
                deadline_match = re.search(r"Deadline:\s*(.*)", text)
                job["sponsor"] = researcher_match.group(1).strip() if researcher_match else "N/A"
                job["institution"] = institution_match.group(1).strip() if institution_match else "N/A"
                job["fields"] = fields_match.group(1).strip() if fields_match else "N/A"
                job["deadline"] = deadline_match.group(1).strip() if deadline_match else "N/A"
            else:
                job["sponsor"] = "N/A"
                job["institution"] = "N/A"
                job["fields"] = "N/A"
                job["deadline"] = "N/A"
        else:
            job["sponsor"] = "N/A"
            job["institution"] = "N/A"
            job["fields"] = "N/A"
            job["deadline"] = "N/A"
        
        # Add additional fields for consistency. 🛠️
        job["university"] = "N/A"
        job["program_type"] = "N/A"
        job["publication_date"] = "N/A"
        
        # Determine the main field by combining text from various fields. 🔑
        text_to_search = " ".join([job.get("fields", ""), job.get("program_title", ""), job.get("institution", "")])
        job["main_field"] = extract_main_field(text_to_search)
        
        # Append the extracted job details to the jobs list. ✅
        jobs.append(job)
    
    # Return the list of all extracted job details. 📤
    return jobs


### Web Scraping Section for NBER (Local HTML) 🔎

In this section, we extract job details from the locally saved NBER page HTML file. The function follows these steps:

- **📂 Read the Local HTML File:**  
  The function attempts to read `sources/nber.html`. If the file isn't found, it prints an error message and returns an empty list.

- **🥣 Parse HTML with BeautifulSoup:**  
  The HTML content is parsed so we can navigate and extract the data.

- **🔍 Locate the Container:**  
  It finds the `<div>` with class `page-header__intro-inner` that holds the job details.

- **✂️ Skip Header Paragraphs:**  
  The first three `<p>` elements are skipped as they contain header information.

- **📋 Extract Job Details:**  
  For each job posting, the function extracts:
  - Program title  
  - Sponsor  
  - Institution  
  - Fields of research  
  - Job link  
  If any of these details are missing, it defaults to `"N/A"`.

- **🔑 Determine Main Field:**  
  It combines relevant text and uses the helper function `extract_main_field()` (defined later in this notebook) to determine the primary research area.

- **✅ Return the Jobs List:**  
  Finally, all extracted job entries are stored in a list and returned.
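
Each NBER posting is a single `<p>` whose fields are separated by line-break tags. Depending on how the HTML is serialized, a line break may appear as `<br>`, `<br/>`, or `<br />`, so splitting on the literal string `<br>` can miss some separators. The sketch below shows a tolerant split with the standard `re` module; `SAMPLE` is a made-up paragraph body and `split_on_br` is a hypothetical helper name.

```python
import re

# Matches <br>, <br/>, and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)

# Hypothetical inner HTML of one posting paragraph.
SAMPLE = (
    "Research Assistant<br/>"
    "NBER Sponsoring Researcher(s): Jane Doe<br>"
    "Institution: Example University<br />"
    "Field(s) of Research: Health Economics<br/>"
    '<a href="https://example.org/apply">Apply</a>'
)


def split_on_br(inner_html):
    """Split a paragraph's inner HTML on any form of the <br> tag."""
    return [part.strip() for part in BR_TAG.split(inner_html)]


parts = split_on_br(SAMPLE)
print(len(parts), parts[0])

# A literal split only sees the one separator spelled exactly "<br>".
print(len(SAMPLE.split("<br>")))
```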


In [6]:
def scrape_nber():
    """
    Scrapes the NBER research assistant positions page from a local HTML file
    and extracts job details.
    """
    jobs = []
    
    # Attempt to read the local HTML file. 📂
    try:
        with open("sources/nber.html", "r", encoding="utf-8") as f:
            html = f.read()
    except Exception as e:
        print("Error reading sources/nber.html. Please download the HTML from NBER before proceeding. 🚫")
        return jobs  # Return an empty list if the file can't be read.
    
    # Parse the HTML content using BeautifulSoup. 🥣
    soup = BeautifulSoup(html, "html.parser")
    
    # Find the container holding the job details using its class name. 🔍
    container = soup.find("div", class_="page-header__intro-inner")
    if container:
        # Get all <p> elements inside the container. 📝
        paragraphs = container.find_all("p")
        # Skip the first three header paragraphs. ✂️
        for p in paragraphs[3:]:
            job = {}
            job["source"] = "nber"  # Mark the source as NBER. 🌟
            # BeautifulSoup may serialize line breaks as <br/>, so accept every variant.
            parts = re.split(r"<br\s*/?>", p.decode_contents())
            if len(parts) >= 5:
                job["program_title"] = parts[0].strip()
                job["sponsor"] = parts[1].replace("NBER Sponsoring Researcher(s):", "").strip()
                job["institution"] = parts[2].replace("Institution:", "").strip()
                job["fields"] = parts[3].replace("Field(s) of Research:", "").strip()
                
                # Extract the job link from the HTML in the last part. 🔗
                link_soup = BeautifulSoup(parts[4], "html.parser")
                a_tag = link_soup.find("a")
                job["link"] = a_tag["href"] if a_tag else ""
            else:
                # Default values if parts are missing. 😢
                job["program_title"] = "N/A"
                job["sponsor"] = "N/A"
                job["institution"] = "N/A"
                job["fields"] = "N/A"
                job["link"] = ""
            
            job["deadline"] = "N/A"  # Deadline not provided. ⏰
            job["university"] = "N/A"
            job["program_type"] = "N/A"
            job["publication_date"] = "N/A"
            
            # Combine text fields to determine the main field using a helper function. 🔑
            text_to_search = " ".join([job.get("fields", ""), job.get("program_title", ""), job.get("institution", "")])
            job["main_field"] = extract_main_field(text_to_search)
            
            # Append the extracted job to our list. ✅
            jobs.append(job)
    else:
        print("NBER container not found. 😢")
    
    # Return the list of all extracted job details. 📤
    return jobs


### Web Scraping Section for EJM (Econ Job Market) 🔎

This function is designed to scrape job postings from the Econ Job Market (EJM) page. It performs the following tasks:

- **🌐 Fetching the Page:**  
  It sends an HTTP GET request to the EJM URL using the `requests` library.

- **🥣 Parsing HTML:**  
  The response content is parsed with BeautifulSoup to create a DOM structure for extraction.

- **🔍 Locating Job Panels:**  
  It finds all `<div>` elements with the classes `"panel panel-info"`, each representing a job posting.

- **🏷️ Extracting Job Details:**  
  For each panel, it extracts:
  - **Job Title & Link:** Located within an `<a>` tag with an ID starting with "title-".  
  - **University & Program Type:** Extracted from `<div>` elements with class `"col-md-4"` and `"col-md-2"`, respectively.
  - **Publication Date & Deadline:** Extracted from `<div>` elements with class `"col-md-2"`.
  - **Default Values:** Fields such as **sponsor**, **institution**, and **fields** are set to `"N/A"` since they're not provided.
  
- **🔑 Determining the Main Field:**  
  It combines the program title and university information to deduce the primary research field using the helper function `extract_main_field()`.

- **✅ Building the Result List:**  
  Each job is stored as a dictionary, and all such dictionaries are appended to a list which is then returned.
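
All three scrapers fill the same eleven keys and fall back to `"N/A"` (or an empty link) when a value is missing. One way to keep those defaults in a single place is a small record factory. This is only a sketch of that idea: `JOB_FIELDS` mirrors the column list used in `main()`, while `new_job_record` is a hypothetical helper that the scrapers in this notebook do not use.

```python
# Same column order as the `fieldnames` list in main().
JOB_FIELDS = [
    "source", "program_title", "link", "sponsor",
    "institution", "fields", "main_field", "deadline",
    "university", "program_type", "publication_date",
]


def new_job_record(source, **known):
    """Build a job dict with every field present, defaulting to "N/A"."""
    unknown = set(known) - set(JOB_FIELDS)
    if unknown:
        raise KeyError(f"Unexpected job field(s): {sorted(unknown)}")
    job = {name: "N/A" for name in JOB_FIELDS}
    job["link"] = ""          # the scrapers use an empty string for a missing link
    job["source"] = source
    job.update(known)         # overwrite defaults with the values actually scraped
    return job


job = new_job_record(
    "ejm",
    program_title="Assistant Professor",
    university="Example University",
)
print(job)
```

With such a helper, each scraper would only pass the values it actually found, and the `else` branches that assign `"N/A"` would no longer be needed.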



In [7]:
def scrape_ejm():
    """
    Scrapes the Econ Job Market (EJM) page and extracts job details.
    """
    jobs = []
    try:
        # Send an HTTP GET request to the EJM page. 🌐
        response = requests.get(EJM_URL, timeout=30)
        response.raise_for_status()  # Ensure we got a valid response. ✅
        
        # Parse the response content using BeautifulSoup. 🥣
        soup = BeautifulSoup(response.content, "html.parser")
        
        # Find all panels that represent job postings. 🔍
        panels = soup.find_all("div", class_="panel panel-info")
        for panel in panels:
            job = {}
            job["source"] = "ejm"  # Mark the source as EJM. 🌟
            
            # Extract title and link from the <a> element with an ID starting with 'title-'. 🏷️
            title_a = panel.find("a", id=lambda x: x and x.startswith("title-"))
            if title_a:
                job["program_title"] = title_a.get_text(strip=True)
                job["link"] = title_a.get("href", "").strip()
            else:
                job["program_title"] = "N/A"
                job["link"] = ""
            
            # Extract university and program type from the designated columns.
            cols = panel.find_all("div", class_="col-md-4")
            if len(cols) >= 2:
                job["university"] = cols[1].get_text(" ", strip=True)
            else:
                job["university"] = "N/A"
            
            col_md2 = panel.find("div", class_="col-md-2")
            if col_md2:
                job["program_type"] = col_md2.get_text(" ", strip=True)
            else:
                job["program_type"] = "N/A"
            
            # Extract publication and deadline dates from the columns with class 'col-md-2'.
            cols_date = panel.find_all("div", class_="col-md-2")
            if cols_date:
                if len(cols_date) >= 1:
                    job["publication_date"] = cols_date[0].get_text(" ", strip=True)
                else:
                    job["publication_date"] = "N/A"
                if len(cols_date) >= 2:
                    job["deadline"] = cols_date[1].get_text(" ", strip=True)
                else:
                    job["deadline"] = "N/A"
            else:
                job["publication_date"] = "N/A"
                job["deadline"] = "N/A"
            
            # EJM pages may not include these details, so we use default values. 😢
            job["sponsor"] = "N/A"
            job["institution"] = "N/A"
            job["fields"] = "N/A"
            
            # Combine text fields (program title and university) to determine the main field. 🔑
            text_to_search = " ".join([job.get("program_title", ""), job.get("university", "")])
            job["main_field"] = extract_main_field(text_to_search)
            
            # Append the job details to our list. ✅
            jobs.append(job)
    except Exception as e:
        print("Error during EJM scraping:", e)
    return jobs


## Extract Main Field Helper Function 🔑

The `extract_main_field` function analyzes a given text to determine which research fields are mentioned. It searches for multiple keywords in a **case-insensitive** manner. Matching is by substring, so a text containing "Macroeconomics" also matches "Economics". If one or more keywords are found, it returns them as a comma‑separated string. If none are found, it returns `"N/A"`.

### Keywords Included:
- **Economics**
- **Macroeconomics**
- **Microeconomics**
- **Labour**
- **Industrial Organization**
- **Entrepreneurship**
- **Healthcare**
- **Discrimination**
- **Finance**
- **Public Policy**

You can extend this list with additional fields in economics as needed.
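
If substring matching turns out to be too permissive, one possible refinement is to match whole words with regular expressions and to accept spelling variants. The sketch below is an illustration only, not the function used in this notebook: `KEYWORD_PATTERNS` and `match_fields` are hypothetical names, and the "Labor"/"Labour" and "Organization"/"Organisation" variants are assumptions about how postings may be spelled.

```python
import re

# Label to report -> whole-word pattern to look for (case-insensitive).
KEYWORD_PATTERNS = {
    "Economics": r"\beconomics\b",
    "Macroeconomics": r"\bmacroeconomics\b",
    "Microeconomics": r"\bmicroeconomics\b",
    "Labour": r"\blabou?r\b",                                   # Labor or Labour
    "Industrial Organization": r"\bindustrial organi[sz]ation\b",
    "Entrepreneurship": r"\bentrepreneurship\b",
    "Healthcare": r"\bhealth\s?care\b",                         # Healthcare or Health care
    "Discrimination": r"\bdiscrimination\b",
    "Finance": r"\bfinance\b",
    "Public Policy": r"\bpublic policy\b",
}


def match_fields(text):
    """Return the matched labels as a comma-separated string, or "N/A"."""
    found = [
        label
        for label, pattern in KEYWORD_PATTERNS.items()
        if re.search(pattern, text, flags=re.IGNORECASE)
    ]
    return ", ".join(found) if found else "N/A"


print(match_fields("Macroeconomics and Labor"))
print(match_fields("Department of Economics"))
```

The word boundary `\b` keeps "Economics" from matching inside "Macroeconomics", which the plain substring test does not.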

In [8]:
def extract_main_field(text):
    """
    Looks for keywords in the provided text.
    Keywords: Economics, Macroeconomics, Microeconomics, Labour, Industrial Organization,
    Entrepreneurship, Healthcare, Discrimination, Finance, Public Policy.
    Returns a comma-separated string of all found keywords or "N/A" if none are found.
    """
    keywords = [
        "Economics", "Macroeconomics", "Microeconomics",
        "Labour", "Industrial Organization", "Entrepreneurship",
        "Healthcare", "Discrimination", "Finance", "Public Policy"
    ]
    
    found = []
    for keyword in keywords:
        if keyword.lower() in text.lower():
            found.append(keyword)
    
    if found:
        # Remove duplicates while preserving order and return as a comma-separated string.
        unique_keywords = list(dict.fromkeys(found))
        return ", ".join(unique_keywords)
    else:
        return "N/A"

## CSV & Email Handling Section 📊✉️

This section contains helper functions to manage your job database and send email notifications when new opportunities are detected.

### 1. Reading Existing Jobs from a CSV File 📂

The `read_existing_jobs()` function reads a CSV file that contains saved job listings and returns a **set** of job links that are already recorded.  
- It checks if the file exists.  
- It uses Python's `csv.DictReader` to iterate over rows and collects the "link" field for each job.  

### 2. Appending New Jobs to the CSV File 💾

The `append_jobs_to_csv()` function takes a list of job dictionaries and appends them to the specified CSV file.
- If the CSV file doesn't exist, it creates the file and writes the header.  
- It then appends each job as a new row.  
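
The read/append cycle described in the two subsections above can be exercised on a temporary file before touching the real `jobs.csv`. The following is a minimal sketch using only the standard library; `read_links`, `append_rows`, and the shortened `FIELDNAMES` list are hypothetical stand-ins for the notebook's own helpers, and `extrasaction="ignore"` is an added safeguard, not something the original functions use.

```python
import csv
import os
import tempfile

FIELDNAMES = ["source", "program_title", "link"]  # shortened for the example


def read_links(csv_file):
    """Return the set of non-empty links already stored in the CSV."""
    if not os.path.exists(csv_file):
        return set()
    with open(csv_file, newline="", encoding="utf-8") as f:
        return {row["link"] for row in csv.DictReader(f) if row.get("link")}


def append_rows(csv_file, rows, fieldnames):
    """Append rows, writing the header only when the file is new or empty."""
    write_header = not os.path.exists(csv_file) or os.path.getsize(csv_file) == 0
    with open(csv_file, "a", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops keys that are not in fieldnames
        # instead of raising ValueError.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "jobs.csv")
    append_rows(path, [{"source": "predoc", "program_title": "RA", "link": "https://example.org/a"}], FIELDNAMES)
    append_rows(path, [{"source": "nber", "program_title": "RA 2", "link": "https://example.org/b", "extra": "x"}], FIELDNAMES)
    links = read_links(path)

print(sorted(links))
```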

### 3. Sending Email Notifications ✉️

- **Purpose:**  
  The `send_email_new_jobs()` function sends a single email notification whenever new job records are found.  
  - **Single Record:** The subject is set to the university name from that record.
  - **Multiple Records:** The subject lists the unique university names (e.g., "University A, University B").

- **Email Body:**  
  The email body is constructed as an HTML document with a styled table that lists:
  - **Source** (e.g., "predoc", "nber", "ejm")
  - **Program Title**
  - **Clickable Link** (each link is rendered as a clickable hyperlink)
  - **Sponsor**
  - **Institution**
  - **Fields**
  - **Main Field**
  - **Deadline**
  - **University**
  - **Program Type**
  - **Publication Date**

- **How It Works:**  
  1. **Subject Creation:**  
     The function extracts university names from each job record. If there's only one record, it uses that university name; if multiple, it joins all unique names.
  
  2. **HTML Table Construction:**  
     An HTML table is built with one row per job record, ensuring that links are rendered as clickable hyperlinks.
  
  3. **Email Assembly:**  
     The email is composed as a multipart message with both plain text and HTML parts.
  
  4. **Sending the Email:**  
     Using Python's `smtplib`, the function logs in to the SMTP server (defaulting to Gmail) and sends the email.
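
Subject creation and message assembly can be checked without contacting an SMTP server. The sketch below builds the multipart message only and never sends it. `build_subject` and `build_message` are hypothetical helper names, the table is reduced to two columns for brevity, and the use of `html.escape` is an added precaution (scraped text may contain `<` or `&`) that the notebook's own function does not apply.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from html import escape


def build_subject(jobs):
    """Join the unique university names, or fall back to a generic subject."""
    names = [job.get("university", "N/A") for job in jobs]
    unique = list(dict.fromkeys(n for n in names if n and n != "N/A"))
    if unique:
        return ", ".join(unique)
    return "New Research Position" if len(jobs) == 1 else "New Research Positions"


def build_message(jobs, sender, receiver):
    """Assemble a plain-text + HTML message; nothing is sent here."""
    rows = []
    for job in jobs:
        link = job.get("link", "")
        link_cell = f'<a href="{escape(link)}">{escape(link)}</a>' if link else "N/A"
        title = escape(job.get("program_title", "N/A"))
        rows.append(f"<tr><td>{title}</td><td>{link_cell}</td></tr>")
    html_body = "<html><body><table>" + "".join(rows) + "</table></body></html>"

    msg = MIMEMultipart("alternative")
    msg["Subject"] = build_subject(jobs)
    msg["From"] = sender
    msg["To"] = receiver
    msg.attach(MIMEText("New research positions found.", "plain"))  # fallback part
    msg.attach(MIMEText(html_body, "html"))
    return msg


example_jobs = [
    {"program_title": "R&D <Assistant>", "link": "https://example.org/a", "university": "University A"},
    {"program_title": "Predoc", "link": "", "university": "University B"},
    {"program_title": "Predoc 2", "link": "https://example.org/c", "university": "University A"},
]
message = build_message(example_jobs, "sender@example.org", "receiver@example.org")
print(message["Subject"])
```

The resulting `message` object could be passed to `server.send_message()` exactly as in the function below.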



In [9]:
def read_existing_jobs(csv_file):
    """
    Reads the CSV file and returns a set of job links that are already recorded.
    """
    existing_links = set()
    if os.path.exists(csv_file):
        with open(csv_file, newline='', encoding='utf-8') as f:
            reader = csv.DictReader(f)
            for row in reader:
                if "link" in row:
                    existing_links.add(row["link"])
    return existing_links

def append_jobs_to_csv(csv_file, jobs, fieldnames):
    """
    Appends the list of job dictionaries to the CSV file.
    If the file does not exist, it is created and a header is written.
    """
    file_exists = os.path.exists(csv_file)
    with open(csv_file, "a", newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if not file_exists:
            writer.writeheader()
        for job in jobs:
            writer.writerow(job)

def send_email_new_jobs(new_jobs, sender_email, sender_password, receiver_email, smtp_server="smtp.gmail.com", smtp_port=587):
    """
    Sends an email with new job records.
    
    If there is a single record, the email subject is set to the university name from that record.
    If there are multiple records, the subject lists the unique university names (comma-separated).
    
    The email body is an HTML table containing one row per job record.
    Clickable hyperlinks are created for the job links.
    """
    
    # Extract university names from the new jobs (ignoring "N/A")
    universities = [job.get("university", "N/A") for job in new_jobs if job.get("university", "N/A") != "N/A"]
    unique_universities = list(dict.fromkeys(universities))  # preserve order, remove duplicates
    
    # Set email subject based on number of records:
    if len(new_jobs) == 1:
        subject = unique_universities[0] if unique_universities else "New Research Position"
    else:
        subject = ", ".join(unique_universities) if unique_universities else "New Research Positions"
    
    # Construct the HTML table for the email body
    html_body = """
    <html>
      <head>
        <style>
          table, th, td {
            border: 1px solid #ddd;
            border-collapse: collapse;
            padding: 8px;
          }
          th {
            background-color: #f2f2f2;
          }
        </style>
      </head>
      <body>
        <p>New Research Positions Found:</p>
        <table>
          <tr>
            <th>Source</th>
            <th>Program Title</th>
            <th>Link</th>
            <th>Sponsor</th>
            <th>Institution</th>
            <th>Fields</th>
            <th>Main Field</th>
            <th>Deadline</th>
            <th>University</th>
            <th>Program Type</th>
            <th>Publication Date</th>
          </tr>
    """
    for job in new_jobs:
        link = job.get("link", "")
        # Make the link clickable if available
        clickable_link = f'<a href="{link}">{link}</a>' if link else "N/A"
        html_body += f"""
          <tr>
            <td>{job.get("source", "N/A")}</td>
            <td>{job.get("program_title", "N/A")}</td>
            <td>{clickable_link}</td>
            <td>{job.get("sponsor", "N/A")}</td>
            <td>{job.get("institution", "N/A")}</td>
            <td>{job.get("fields", "N/A")}</td>
            <td>{job.get("main_field", "N/A")}</td>
            <td>{job.get("deadline", "N/A")}</td>
            <td>{job.get("university", "N/A")}</td>
            <td>{job.get("program_type", "N/A")}</td>
            <td>{job.get("publication_date", "N/A")}</td>
          </tr>
        """
    html_body += """
        </table>
      </body>
    </html>
    """
    
    # Create a multipart email message (plain text and HTML)
    msg = MIMEMultipart("alternative")
    msg["Subject"] = subject
    msg["From"] = sender_email
    msg["To"] = receiver_email
    
    # Plain text version as fallback
    text_body = "New Research Positions Found. Please view this email in an HTML-compatible client."
    
    part1 = MIMEText(text_body, "plain")
    part2 = MIMEText(html_body, "html")
    
    msg.attach(part1)
    msg.attach(part2)
    
    # Send the email via SMTP
    try:
        with smtplib.SMTP(smtp_server, smtp_port) as server:
            server.starttls()  # Secure the connection
            server.login(sender_email, sender_password)
            server.send_message(msg)
        print("Email sent successfully!")
    except Exception as e:
        print("Failed to send email:", e)

## Main Function: Scrape, Update, and Notify 🚀📊✉️

This **main()** function orchestrates the complete workflow of the project. It:

- **Scrapes Job Data:**  
  Calls the scraping functions for all three sources (Predoc, NBER, EJM) to collect job postings.

- **Filters New Jobs:**  
  Reads an existing CSV file (acting as a simple database) to get a set of already recorded job links. Then, it filters out jobs that are already present.

- **Sends Notifications:**  
  If new jobs are found, the function can send a single email notification listing all of them (the call is commented out by default).

- **Updates the CSV Database:**  
  Finally, it can append the new job entries to the CSV file for future reference (also commented out by default).

> **Note:**  
> Ensure that your SMTP credentials (i.e. `SENDER_EMAIL` and `SENDER_PASSWORD`) are set up and that the scraping functions (`scrape_predoc()`, `scrape_nber()`, and `scrape_ejm()`) along with CSV and email helper functions are defined before running `main()`.
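
The filtering step is the core of the workflow, so it is worth sketching on its own. The version below is a variation under stated assumptions, not the exact logic of `main()`: `select_new_jobs` is a hypothetical helper that, in addition to skipping links already in the CSV, also skips jobs with an empty link and duplicates within the same scrape. Whether link-less postings should be dropped is a design choice; `main()` keeps them.

```python
def select_new_jobs(jobs, existing_links):
    """Return jobs whose link is non-empty and has not been seen before."""
    seen = set(existing_links)      # copy, so the caller's set is not modified
    new_jobs = []
    for job in jobs:
        link = job.get("link", "")
        if not link or link in seen:
            continue                # already recorded, repeated, or no usable link
        seen.add(link)
        new_jobs.append(job)
    return new_jobs


scraped = [
    {"source": "predoc", "link": "https://example.org/a"},
    {"source": "nber", "link": "https://example.org/b"},
    {"source": "nber", "link": ""},
    {"source": "ejm", "link": "https://example.org/b"},
    {"source": "ejm", "link": "https://example.org/c"},
]
print(select_new_jobs(scraped, {"https://example.org/a"}))
```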

In [None]:
def main():
    # Scrape jobs from all three sources. 🌐
    jobs = []
    jobs.extend(scrape_predoc())
    jobs.extend(scrape_nber())
    jobs.extend(scrape_ejm())

    # Check if any jobs were scraped. 🚨
    if not jobs:
        print("No jobs were scraped.")
        return

    # Read existing jobs from CSV (using 'link' as a unique identifier). 📂
    existing_links = read_existing_jobs(CSV_FILE)

    # Filter out jobs that are already recorded. 🔍
    new_jobs = [job for job in jobs if job.get("link") not in existing_links]
    print(f"Found {len(new_jobs)} new job(s).")

    # Define CSV columns (fieldnames) for consistent data structure. 📋
    fieldnames = [
        "source", "program_title", "link", "sponsor",
        "institution", "fields", "main_field", "deadline",
        "university", "program_type", "publication_date"
    ]

    # If new jobs are found, process them. ✉️💾
    if new_jobs:
        # Retrieve SMTP credentials from environment variables.
        sender_email    = os.getenv('SENDER_EMAIL')
        sender_password = os.getenv('SENDER_PASSWORD')
        # Notifications are sent to the sender's own address.
        receiver_email  = os.getenv('SENDER_EMAIL')
        
        # Convert the new jobs to a Pandas DataFrame for easy visualization. 📈
        # Note: DataFrame.to_markdown() requires the optional 'tabulate' package.
        df = pd.DataFrame(new_jobs)
        md_table = df.to_markdown(index=False)
        # Uncomment these lines if you want to send an email and update the CSV.
        # send_email_new_jobs(new_jobs, sender_email, sender_password, receiver_email)
        # append_jobs_to_csv(CSV_FILE, new_jobs, fieldnames)
        
    else:
        # If no new jobs were found, print a message and display existing links.
        print("No new jobs found.")
        df = pd.DataFrame(list(existing_links), columns=["link"])
        md_table = df.to_markdown(index=False)
    # Display the table in the notebook
    display(Markdown(md_table))


In [37]:
if __name__ == "__main__":
    main()


Found 0 new job(s).
No new jobs found.
                            link
0                               
1         https://bit.ly/40M5pz6
2                      #ad-10962
3         https://bit.ly/40MdGDf
4         https://bit.ly/40POY4Y
..                           ...
149       https://bit.ly/47y04yB
150       https://bit.ly/42mLf1b
151       https://bit.ly/4f2ipGs
152       https://bit.ly/3XOeQ0C
153  https://stanford.io/4hF4vMG

[154 rows x 1 columns]
