## Scholarship Scraper

### Install Dependencies

In [1]:
!pip install python-dateutil selenium webdriver-manager beautifulsoup4




### Import Libraries

In [2]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
from dateutil import parser as dateparser
import re
import time
import csv


### Define Section Extraction Function

In [3]:
def extract_section(soup, keywords):
    """Extract text that comes after a heading containing any of the keywords."""
    for tag in soup.find_all(["h2", "h3", "strong", "b"]):
        if any(word in tag.get_text(strip=True).lower() for word in keywords):
            content = []
            next_tag = tag.find_next_sibling()
            while next_tag and next_tag.name in ["p", "ul"]:
                # A space separator keeps list items and inline tags from fusing together
                content.append(next_tag.get_text(" ", strip=True))
                next_tag = next_tag.find_next_sibling()
            return "\n".join(content)
    return ""
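
If a label is wrapped in its own paragraph, as in `<p><strong>Deadline:</strong></p>`, the function above returns an empty string, because the `<strong>` tag has no siblings inside that `<p>`. Whether the target site uses this markup is an assumption. The sketch below shows one possible variant that starts from the enclosing `<p>` in that case; `extract_section_nested` and the sample HTML are hypothetical and not part of the original notebook.

```python
from bs4 import BeautifulSoup


def extract_section_nested(soup, keywords):
    """Hypothetical variant of extract_section that also handles labels wrapped in a <p>."""
    for tag in soup.find_all(["h2", "h3", "strong", "b"]):
        if not any(word in tag.get_text(strip=True).lower() for word in keywords):
            continue
        # If the label sits inside a <p>, walk the siblings of that <p> instead.
        anchor = tag.parent if tag.parent.name == "p" else tag
        content = []
        next_tag = anchor.find_next_sibling()
        while next_tag and next_tag.name in ["p", "ul", "ol"]:
            content.append(next_tag.get_text(" ", strip=True))
            next_tag = next_tag.find_next_sibling()
        return "\n".join(content)
    return ""


# Made-up markup for illustration only.
sample_html = """
<div>
  <p><strong>Deadline:</strong></p>
  <p>The last date to apply is 30 April 2025.</p>
  <h3>Eligibility</h3>
  <ul><li>Non-Chinese citizens</li><li>Under 35 years old</li></ul>
</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")
deadline_text = extract_section_nested(soup, ["deadline"])
eligibility_text = extract_section_nested(soup, ["eligibility"])
print(deadline_text)
print(eligibility_text)
```

Like the original function, this variant returns the content after the first matching label only, so broad keywords such as "degree" may still pick up an unrelated heading.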


### Set Up Web Driver

In [4]:
options = webdriver.ChromeOptions()
# options.add_argument("--headless")  # Uncomment for headless mode
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)



### Load Tag Page and Get Links

In [5]:
base_url = "https://scholarshipscorner.website/chinese-government-scholarship/"
driver.get(base_url)
time.sleep(3)

post_elements = driver.find_elements(By.CSS_SELECTOR, "h2.entry-title a")
post_links = [elem.get_attribute("href") for elem in post_elements]
print(f"Found {len(post_links)} posts.")


Found 10 posts.
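
Selenium's `get_attribute("href")` can return `None`, and a listing page may repeat a post or link off-site. As a minimal sketch, assuming the collected links should stay on the listing's own host, the list could be cleaned before the scraping loop. `clean_links` is a hypothetical helper and the sample URLs are made up.

```python
from urllib.parse import urlparse


def clean_links(hrefs, allowed_host):
    """Drop empty hrefs, off-site links, and duplicates while keeping the original order."""
    seen = {}
    for href in hrefs:
        if href and urlparse(href).netloc == allowed_host:
            seen.setdefault(href, None)  # dicts preserve insertion order
    return list(seen)


# Made-up sample input standing in for the scraped hrefs.
raw_links = [
    "https://scholarshipscorner.website/post-a/",
    None,
    "https://scholarshipscorner.website/post-a/",
    "https://example.com/advert",
    "https://scholarshipscorner.website/post-b/",
]
cleaned = clean_links(raw_links, "scholarshipscorner.website")
print(f"Kept {len(cleaned)} of {len(raw_links)} links.")
```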


### Create CSV File

In [6]:
csv_file = open("chinese_scholarships_detailed.csv", mode="w", newline="", encoding="utf-8")
csv_writer = csv.DictWriter(csv_file, fieldnames=[
    "Title", "Link", "Official Link", "Deadline", "Eligibility",
    "Host Country", "Host University", "Program Duration", "Degree Offered"
])

csv_writer.writeheader()


108
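
The `108` shown above is the return value of `writeheader()`, echoed because it is the last expression in the cell. Since Python 3.8, `writeheader()` passes through the result of the underlying `write` call, which here is the number of characters in the header row including the `\r\n` line terminator. The sketch below reproduces that count with an in-memory buffer in place of the real file; the `io.StringIO` stand-in is an assumption made so the example has no side effects.

```python
import csv
import io

FIELDNAMES = [
    "Title", "Link", "Official Link", "Deadline", "Eligibility",
    "Host Country", "Host University", "Program Duration", "Degree Offered",
]

# In-memory buffer standing in for the CSV file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)

# Assigning the result keeps it from being echoed as cell output.
chars_written = writer.writeheader()
print(chars_written)  # 108 on Python 3.8+
```

Assigning the return value to a name, or ending the cell with a semicolon, keeps the stray number out of the notebook output.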

### Loop Through Each Post and Extract Data

In [7]:
for link in post_links:
    driver.get(link)
    time.sleep(2)
    soup = BeautifulSoup(driver.page_source, "html.parser")

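    title_tag = soup.find("h1", class_="entry-title")
    content_div = soup.find("div", class_="entry-content")
    # Skip pages that lack the expected layout instead of raising AttributeError
    if title_tag is None or content_div is None:
        print(f"Skipped (unexpected layout): {link}")
        continue
    title = title_tag.text.strip()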

    # Official link
    official_link = ""
    for a in content_div.find_all("a"):
        if "official" in a.text.lower() or "apply" in a.text.lower():
            official_link = a.get("href")
            break

    # Extract sections
    deadline = extract_section(content_div, ["deadline", "last date"])
    eligibility = extract_section(content_div, ["eligibility", "who can apply", "eligible"])
    host_country = extract_section(content_div, ["host country", "study in"])
    host_university = extract_section(content_div, ["host university", "offered by"])
    program_duration = extract_section(content_div, ["program duration", "duration"])
    degree_offered = extract_section(content_div, ["degree", "degree offered", "field of study", "what you will study"])

    csv_writer.writerow({
        "Title": title,
        "Link": link,
        "Official Link": official_link,
        "Deadline": deadline,
        "Eligibility": eligibility,
        "Host Country": host_country,
        "Host University": host_university,
        "Program Duration": program_duration,
        "Degree Offered": degree_offered
    })


    print(f"✅ Saved: {title}")


✅ Saved: Schwarzman Scholars Programme 2026-27 in China | Fully Funded
✅ Saved: Dalian University CSC Scholarship in China 2025 | Fully Funded
✅ Saved: Chinese Government Scholarship Silk Road Program 2025 | Fully Funded
✅ Saved: Xiamen University CSC Scholarship 2025 | Fully Funded | Chinese Government Scholarship
✅ Saved: Zhejiang University CSC Scholarship 2025 in China | Fully Funded
✅ Saved: HIT Chinese Government Scholarship in China 2025 | Fully Funded
✅ Saved: Beijing Institute of Technology Scholarship in China 2025 | Fully Funded | BIT CSC Scholarship
✅ Saved: Tsinghua University Scholarship in China 2025 | Fully Funded
✅ Saved: Yenching Academy Scholarship in China 2026 | Fully Funded
✅ Saved: Wuhan University CSC Scholarship in China 2024 | Fully Funded | Chinese Government Scholarship
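
The notebook imports `re` and `dateparser` but never uses them, so the `Deadline` column holds free text. The following is a rough sketch of how that text might be reduced to an ISO date. It uses only `re` and the standard library's `datetime` so that it runs without extra packages, and it assumes English month names in the two layouts shown in the comment. `normalize_deadline` is a hypothetical helper, not part of the original code.

```python
import re
from datetime import datetime

# Assumed layouts: "30 April 2025" / "30th Apr 2025" and "April 30, 2025".
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+),?\s+(\d{4})\b"
    r"|\b([A-Za-z]+)\s+(\d{1,2})(?:st|nd|rd|th)?,?\s+(\d{4})\b"
)


def normalize_deadline(text):
    """Return the first recognizable date in text as YYYY-MM-DD, or "" if none is found."""
    for match in DATE_PATTERN.finditer(text):
        if match.group(1):
            day, month, year = match.group(1, 2, 3)
        else:
            day, month, year = match.group(5, 4, 6)
        # Try the full month name first, then the three-letter abbreviation.
        for fmt in ("%d %B %Y", "%d %b %Y"):
            try:
                return datetime.strptime(f"{day} {month} {year}", fmt).date().isoformat()
            except ValueError:
                continue
    return ""


print(normalize_deadline("The last date to apply is 30 April 2025."))
print(normalize_deadline("Deadline: March 15th, 2026"))
```

Text with no matching date, such as "Varies by university", yields an empty string, so the original free text is still worth keeping in its own column.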


### Close File and Driver

In [8]:
csv_file.close()
driver.quit()
print("🎉 Data saved to chinese_scholarships_detailed.csv")


🎉 Data saved to chinese_scholarships_detailed.csv
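
Because `extract_section` joins paragraphs with newlines, many cells in the CSV contain line breaks. As a small sketch, assuming a shortened column set and a temporary file rather than the real output, the example below writes one such row inside a `with` block (which closes the file even if a row fails) and reads it back to confirm the multi-line field survives.

```python
import csv
import os
import tempfile

# Shortened column set and made-up row, for illustration only.
fieldnames = ["Title", "Link", "Eligibility"]
rows = [{
    "Title": "Example Scholarship",
    "Link": "https://example.com/post",
    "Eligibility": "Non-Chinese citizens\nUnder 35 years old",
}]

path = os.path.join(tempfile.mkdtemp(), "sample.csv")

# newline="" lets the csv module control line endings on both write and read.
with open(path, mode="w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

with open(path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

print(f"Read back {len(loaded)} row(s).")
```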
