---
title: Lesson 6. OhioLINK ETD
format:
  html:
    toc: true
    toc-expand: 2
    toc-title: CONTENTS
---

The OhioLINK [Electronic Theses and Dissertations (ETD) Center](https://etd.ohiolink.edu/acprod/odb_etd/r/etd/search/1?clear=0,1,5,10,20,21,1001) provides access to abstracts and full-text PDFs of theses and dissertations submitted by participating Ohio colleges and universities. Users can perform basic searches by title, author, or keyword, or use [advanced search](https://etd.ohiolink.edu/acprod/odb_etd/r/etd/search/search-results?p1001_advanced=1&clear=0,1001) to filter by subject, year, language, institution, ORCID iD, committee members, topic keywords, and full-text content. 

For prospective PhD students, the ETD Center is a valuable resource to:
- Explore research aligned with their interests
- Identify academic programs that fit their goals
- Discover potential advisors or committee members by reviewing recent dissertations

## Data skills | concepts
- Web scraping
- Dynamic vs. static HTML

## Learning objectives
1. Understand the difference between dynamic and static HTML
2. Develop strategies and approaches to gather dynamic HTML content.

This tutorial is designed to support multi-session __[workshops](https://library.osu.edu/events?combine=&tid=All&field_location_code_value=10&sort_bef_combine=field_end_date_value_ASC)__ hosted by The Ohio State University Libraries Research Commons. It assumes you already have a basic understanding of Python, including how to iterate through lists and dictionaries to extract data using a for loop. To learn basic Python concepts visit the [Python - Mastering the Basics](python_basics.ipynb) tutorial.

<div class="alert alert-dismissible alert-primary">
  <button type="button" class="btn-close" data-bs-dismiss="alert" aria-label="Close"></button>
  <h4 class="alert-heading"><img src="images/star_standard_icon.png" alt="" aria-hidden="true" style="height: 3rem; vertical-align: middle; margin-right: 0.5rem;">Important!</h4><p>Remember to examine copyright and terms of use before starting any web scraping project.</p><p><a href="https://www.ohiolink.edu/content/acceptable_use_policy_ohiolink_etd">OhioLINK ETD Acceptable Use Policy</a></p>
</div>

# LESSON 6
When exploring the OhioLINK [Electronic Theses and Dissertations Center](https://etd.ohiolink.edu/acprod/odb_etd/r/etd/search/1?clear=0,1,5,10,20,21,1001), you may notice that it lacks built-in features to print, email, save, or export search results to citation managers or other tools. While the [ETD Center Consumer Guide](https://www.ohiolink.edu/content/etd_center_consumer_guide) provides documentation for accessing data via an API, this approach can be more complex than necessary, especially if your goal is simply to retrieve metadata for a specific department within a defined time frame.

<div class="accordion" id="accordionExercise1">

  <div class="accordion-item"><h2 class="accordion-header" id="ex1-headingOne"><button class="accordion-button fs-3" type="button" data-bs-toggle="collapse" data-bs-target="#ex1-collapseOne" aria-expanded="true" aria-controls="ex1-collapseOne"><img src="images/guidepost_standard_icon.png" alt="" aria-hidden="true" style="height: 3rem; vertical-align: middle; margin-right: 0.5rem;">Exercise 1: Examine the URL</button></h2><div id="ex1-collapseOne" class="accordion-collapse collapse show fs-4" aria-labelledby="ex1-headingOne" data-bs-parent="#accordionExercise1"> <div class="accordion-body fs-4"><p>Search for theses and dissertations submitted <strong>after 2020</strong> from a specific department at <strong>The Ohio State University</strong> using the OhioLINK ETD <strong>Advanced Search</strong>:<ol><li>Go to the OhioLINK ETD <a href="https://etd.ohiolink.edu/acprod/odb_etd/r/etd/search/search-results?p1001_advanced=1&clear=0,1001">Advanced Search</a> page.</li><li>Leave the <strong>Subject</strong> field blank.</li><li>In the <strong>Institution</strong> menu, select <strong>The Ohio State University.</strong></li><li>Under <strong>Submission Site</strong>, select <strong>The Ohio State University</strong> again.</li><li>A new field labeled <strong>Institution Department</strong> will appear. This dropdown lists the actual dissertation program names used by the university, which is especially helpful for locating work from interdisciplinary or uniquely named departments. Select the department you're interested in.</li><li>Set the <strong>Year</strong> filter to <strong>2021 or later</strong> to limit results to recent submissions.</li><li>Run the search and examine the URL structure in your browser's address bar. Consider:<ul><li>How are the search parameters encoded?</li><li>How does this URL differ from others you've used in web scraping or automation?</li></ul></li></ol></p></div></div>
  </div>

  <div class="accordion-item"><h2 class="accordion-header" id="ex1-headingTwo"><button class="accordion-button fs-3 collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#ex1-collapseTwo" aria-expanded="false" aria-controls="ex1-collapseTwo"><img src="images/magnifying_glass_standard_icon.png" alt="" aria-hidden="true" style="height: 3rem; vertical-align: middle; margin-right: 0.5rem;">Solution:</button></h2><div id="ex1-collapseTwo" class="accordion-collapse collapse" aria-labelledby="ex1-headingTwo" data-bs-parent="#accordionExercise1"> <div class="accordion-body"><img src="images/ohiolink1.png" alt="Screenshot of OhioLINK ETD search limited by year 2020 through 2025, institution and submission site = The Ohio State University, institution department = Neuroscience Graduate Studies Program. There are 39 total search results on 2 pages." class="img-fluid" style="max-width: 100%; border-radius: 8px; padding: 1rem;"><p>A search for theses and dissertations submitted after 2020 by students in <strong>The Ohio State University's Neuroscience Graduate Studies Program</strong> returns <strong>39 records</strong>, with <strong>20 results displayed per page</strong>. Notably, the <strong>search URL does not include visible parameters or filters</strong>, and it remains <strong>unchanged</strong> even when navigating to subsequent pages, which suggests that the site uses JavaScript to manage pagination and query state dynamically.</p><p class="text-primary">https://etd.ohiolink.edu/acprod/odb_etd/r/etd/search/search-results?p1001_advanced=1&clear=0,1001</p>

  </div>
  </div>
  </div>

</div>

## Handling dynamic content

The [Electronic Theses and Dissertations Center](https://etd.ohiolink.edu/acprod/odb_etd/r/etd/search/1?clear=0,1,5,10,20,21,1001) is a dynamic website. While the page layout, such as headers, footers, and overall structure, is built with static HTML, the search results are loaded dynamically via JavaScript after the initial page load. If you inspect the page using browser Developer Tools, you'll find that each search result is inserted as a `<div>` element with the class `t-SearchResults-content`:

**Example**:

```html
<div class="t-SearchResults-content">
            <h3 class="t-SearchResults-title">
                1. <span class="t-SearchResults-author">Mackey-Alfonso, Sabrina</span>
                <a href="/acprod/odb_etd/r/etd/search/10?p10_accession_num=osu1744894008278402&amp;clear=10&amp;session=10473290292427">Short-term High Fat Diet Accelerates Synaptic and Memory Deficits via Neuroinflammatory Mechanisms in an Alzheimer's Disease Mouse Model</a>
            </h3>
            <div class="t-SearchResults-info">
              <p class="t-SearchResults-degree">
                  Doctor of Philosophy, The Ohio State University, 2025, Neuroscience Graduate Studies Program
              </p>
              <p class="t-SearchResults-desc hide-class"><div style="overflow: hidden; height: 40px;">Alzheimer's Disease (AD) is a neurodegenerative disease characterized by profound memory impairments, synaptic loss, neuroinflammation, and hallmark pathological markers. High-fat diet (HFD) consumption increases the risk of developing AD even after controlling for metabolic syndrome, pointing to a role of the diet itself in increasing risk. In AD, the complement system, an arm of the immune system which normally tags redundant or damaged synapses for pruning, becomes pathologically overactivated leading to tagging of healthy synapses. While the unhealthy diet to AD link is strong, the underlying mechanisms are not well understood in part due to confounding variables associated with long-term HFD which can independently influence the brain. Therefore, we experimented with a short-term diet regimen to isolate the diet's impact on brain function without causing changes in metabolic markers.
This project investigated potential mechanisms underlying cognitive impairments evoked by short-term diet consumption using the 3xTg-AD model. In chapter 1 we discuss the link between HFD and AD and outline the current findings and hypothesis regarding of relevant mechanisms. In chapter 2 we characterize the effect of short-term HFD on 1) memory, 2) neuroinflammation including complement, 3) AD pathology markers, 4) synaptic markers, and 5) in vitro microglial synaptic phagocytosis in the 3xTg-AD mouse model. In chapter 3 we analyze two potential mechanisms underlying HFD-mediated AD vulnerability: toll-like receptor 4 (TLR4)-evoked neuroinflammation and complement system activation. Finally, in chapter 4 we validate the absence of glucose modifications as an effect of the diet and drug treatment, explore potential mitochondrial mechanisms in the hippocampus, and evaluate the diet's impact on the pre-frontal cortex (PFC).
Following the consumption of either standard chow or HFD, 3xTg-AD mice exhibited impaired long-term memory performance which was associated with increased level (open full item for complete abstract)</div> <a href="#" data-ctrl="" class="">... <em>More</em></a></p>
              <span class="t-SearchResults-misc"><b>Committee:</b> Ruth Barrientos (Advisor); Benedetta Leuner (Committee Member); Nikki Kokiko-Cochran (Committee Member); Harry Fu (Committee Member)</span>
              <span class="t-SearchResults-misc"><b>Subjects:</b> Neurosciences</span>
              <!-- span class="t-SearchResults-misc">Score: 100</span -->
            </div>
        </div>
```

This setup allows the site to update content without reloading the entire page.

⚠️ **Why requests and BeautifulSoup Alone Won't Work**

If you try to scrape the page using requests and BeautifulSoup, you'll notice that the response only contains the static HTML shell; none of the dynamically loaded search results are included. That's because the content is rendered by JavaScript, which requests cannot execute.
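One quick way to tell whether a response holds any results is to count the elements carrying the `t-SearchResults-content` class. The sketch below does this with Python's built-in `html.parser`, so it runs without extra installs. The two HTML strings are made-up stand-ins for a static shell and a JavaScript-rendered page, and the names `ResultCounter` and `count_results` are hypothetical.

```python
from html.parser import HTMLParser


class ResultCounter(HTMLParser):
    """Count start tags whose class attribute includes a target class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.count += 1


def count_results(html, target_class="t-SearchResults-content"):
    """Return how many elements in the HTML carry the target class."""
    counter = ResultCounter(target_class)
    counter.feed(html)
    return counter.count


# Made-up examples: what a plain HTTP request might return (shell only)
# versus what the browser shows after JavaScript has inserted one result.
static_shell = '<html><body><header>ETD Center</header><div id="results"></div></body></html>'
rendered_page = static_shell.replace(
    '<div id="results"></div>',
    '<div id="results"><div class="t-SearchResults-content">'
    '<h3 class="t-SearchResults-title">1. ...</h3></div></div>',
)

print(count_results(static_shell))   # no result elements in the shell
print(count_results(rendered_page))  # one result element once rendered
```

With BeautifulSoup, the equivalent check would be `len(soup.find_all(class_="t-SearchResults-content"))`. A count of zero on a page that shows results in the browser is a strong hint that the content is loaded dynamically.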

🛠️ **Workaround for Small-Scale Projects**

If you're working on a small project and just need to extract a limited number of results, you can manually save the page and parse it locally:

✅ **Steps:**

1. Set **Results Per Page** to **100** to minimize the number of pages you need to save.
2. **Right-click** on the results page and choose **Save As**.
3. Select **Webpage, Single File (*.mhtml)** as the format.
4. Open the saved `.mhtml` file using **Notepad** or any other plain text editor.
5. Delete everything above the `<!DOCTYPE html>` line.
6. Save the file again, changing the extension from `.mhtml` to `.html`.

Once saved as a `.html` file, you can read and parse it using **BeautifulSoup**. 
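As an alternative to the manual edits in steps 4 to 6, the `.mhtml` file can be unpacked in Python. This is a different technique from the steps above, not a replacement endorsed by the site. An `.mhtml` file is a MIME message, so the standard-library `email` package can locate the HTML part and undo its quoted-printable encoding. The sketch below is a minimal version under stated assumptions: the function name `extract_html_from_mhtml` is hypothetical, the sample document is made up, and the UTF-8 fallback is an assumption for parts that declare no charset.

```python
import email
from email import policy


def extract_html_from_mhtml(mhtml_bytes: bytes) -> str:
    """Return the first text/html part of an MHTML document, decoded to text."""
    message = email.message_from_bytes(mhtml_bytes, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            # decode=True undoes the quoted-printable transfer encoding
            payload = part.get_payload(decode=True)
            # Assumption: fall back to UTF-8 when the part declares no charset
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    raise ValueError("No text/html part found")


# Made-up miniature MHTML document, for illustration only
sample_mhtml = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; type="text/html"; boundary="BOUNDARY"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<div class=3D"t-SearchResults-content">Caf=C3=A9</div>\r\n'
    b"--BOUNDARY--\r\n"
)

html = extract_html_from_mhtml(sample_mhtml)
print(html)
```

For a real file, read it in binary mode (for example `open("data/results.mhtml", "rb")`, a hypothetical path) and pass the bytes to the function. In the decoded HTML, class attributes appear as plain `class="t-SearchResults-content"`, without the `3D` prefix or `=` line-wrap markers that remain in a hand-edited file.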

<div class="accordion" id="accordionExercise2">

  <div class="accordion-item"><h2 class="accordion-header" id="ex2-headingOne"><button class="accordion-button fs-3" type="button" data-bs-toggle="collapse" data-bs-target="#ex2-collapseOne" aria-expanded="true" aria-controls="ex2-collapseOne"><img src="images/guidepost_standard_icon.png" alt="" aria-hidden="true" style="height: 3rem; vertical-align: middle; margin-right: 0.5rem;">Exercise 2: Save HTML</button></h2><div id="ex2-collapseOne" class="accordion-collapse collapse show fs-4" aria-labelledby="ex2-headingOne" data-bs-parent="#accordionExercise2"> <div class="accordion-body fs-4">Save your results from Exercise 1 by following the steps listed above.</div></div>
  </div>

  <div class="accordion-item"><h2 class="accordion-header" id="ex2-headingTwo"><button class="accordion-button fs-3 collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#ex2-collapseTwo" aria-expanded="false" aria-controls="ex2-collapseTwo"><img src="images/magnifying_glass_standard_icon.png" alt="" aria-hidden="true" style="height: 3rem; vertical-align: middle; margin-right: 0.5rem;">Solution:</button></h2><div id="ex2-collapseTwo" class="accordion-collapse collapse" aria-labelledby="ex2-headingTwo" data-bs-parent="#accordionExercise2"> <div class="accordion-body"><img src="images/ohiolink2.png" alt= "Screenshot of OhioLINK ETD search showing location of Save As and screenshot of the explorer window with webpage, single file (*.mhtml) selected for save as type" style="max-width: 100%; border-radius: 8px; padding: 1rem;">
  </div>
  </div>
  </div>

</div>

<div class="accordion" id="accordionExercise3">

  <div class="accordion-item"><h2 class="accordion-header" id="ex3-headingOne"><button class="accordion-button fs-3" type="button" data-bs-toggle="collapse" data-bs-target="#ex3-collapseOne" aria-expanded="true" aria-controls="ex3-collapseOne"><img src="images/guidepost_standard_icon.png" alt="" aria-hidden="true" style="height: 3rem; vertical-align: middle; margin-right: 0.5rem;">Exercise 3: Parse HTML</button></h2><div id="ex3-collapseOne" class="accordion-collapse collapse show fs-4" aria-labelledby="ex3-headingOne" data-bs-parent="#accordionExercise3"> <div class="accordion-body fs-4"><p>Use <strong>BeautifulSoup</strong> to extract the following elements from your saved HTML file:<ul><li><span class="text-primary">title</span></li><li><span class="text-primary">author</span></li><li><span class="text-primary">degree</span></li><li><span class="text-primary">degree_year</span></li><li><span class="text-primary">advisor</span></li><li><span class="text-primary">committee_members</span></li><li><span class="text-primary">subjects</span></li></ul></p><p><strong>Remember to read the <span class="text-primary">HTML</span> file into Python</strong> first. <strong>Export</strong> the results to a <span class="text-primary">CSV</span> file.</p></div></div>
  </div>

  <div class="accordion-item"><h2 class="accordion-header" id="ex3-headingTwo"><button class="accordion-button fs-3 collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#ex3-collapseTwo" aria-expanded="false" aria-controls="ex3-collapseTwo"><img src="images/magnifying_glass_standard_icon.png" alt="" aria-hidden="true" style="height: 3rem; vertical-align: middle; margin-right: 0.5rem;">Solution:</button></h2><div id="ex3-collapseTwo" class="accordion-collapse collapse" aria-labelledby="ex3-headingTwo" data-bs-parent="#accordionExercise3"> <div class="accordion-body">

```python
from bs4 import BeautifulSoup
import pandas as pd

# The saved file is still quoted-printable encoded (a leftover of the .mhtml
# format): every "=" in the HTML is written as "=3D", and long lines are
# wrapped with a trailing "=". As a result, class values parse as
# 3D"t-SearchResults-content" and text can contain "=\n" line-wrap markers.
with open('data/results.html') as f:
    contents = f.read()

soup = BeautifulSoup(contents, 'html.parser')
etds = soup.find_all(attrs={"class": '3D"t-SearchResults-content"'})

rows = []
for each_etd in etds:
    row = {}

    # Title: link text inside the <h3>, with line-wrap markers removed
    row['title'] = each_etd.find("h3").find("a").text.replace('=\n', '').replace('=', ' ')

    # Author: first line of the author <span>
    author = each_etd.find('span', {"class": '3D"t-SearchResults-author"'}).text
    row['author'] = author.replace('=\n', '').split('\n')[0]

    # Degree line has the form "degree, institution, year, department"
    degree_line = each_etd.find('p', {"class": '3D"t-SearchResults-degree"'}).text
    degree_parts = degree_line.replace('=\n', '').split(',')
    row['degree'] = degree_parts[0].strip()
    row['degree_year'] = degree_parts[2].strip()

    # Committee and subjects share the same "misc" class
    advisors, committee, subjects = [], [], []
    for misc in each_etd.find_all('span', {"class": '3D"t-SearchResults-misc"'}):
        text = misc.text.replace('=\n', '')
        if "Committee:" in text:
            for member in text.replace("Committee: ", "").split(';'):
                name = member.split('(')[0].strip()  # drop the "(role)" suffix
                committee.append(name)
                if "advisor" in member.lower():
                    advisors.append(name)
        elif "Subjects:" in text:
            subjects.extend(s.strip() for s in text.replace("Subjects: ", "").split(';'))

    row['advisors'] = ';'.join(advisors)
    row['committee'] = ';'.join(committee)
    row['subjects'] = ';'.join(subjects)
    rows.append(row)

# Build the DataFrame once, keeping the records in page order
results = pd.DataFrame(rows)
results.to_csv('data/npsg.csv', index=False)  # omit the row index column
```
  </div>
  </div>
  </div>

</div>
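Once the CSV exists, one way to use it for the purpose described at the start of this lesson (discovering potential advisors) is to tally the names in the `advisors` column. The sketch below assumes semicolon-separated cells, as produced by the solution above. The helper name `count_names` and the sample cells are made up for illustration; only the names themselves come from the example record shown earlier.

```python
from collections import Counter


def count_names(cells):
    """Tally names across cells that hold semicolon-separated names."""
    counts = Counter()
    for cell in cells:
        for name in cell.split(';'):
            name = name.strip()
            if name:  # skip empty cells and stray separators
                counts[name] += 1
    return counts


# Hypothetical contents of the 'advisors' column, one string per record
advisor_cells = [
    "Ruth Barrientos",
    "Ruth Barrientos;Benedetta Leuner",
    "",
]

advisor_counts = count_names(advisor_cells)
for name, total in advisor_counts.most_common():
    print(name, total)
```

With the real data, the cells could come from `results['advisors']` in the solution, or from the same column read back out of the CSV file.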



For larger projects, consider tools such as [Requests-HTML](https://requests.readthedocs.io/projects/requests-html/en/latest/) or [Selenium](https://pypi.org/project/selenium/), which can render JavaScript and can therefore reach dynamic web content that requests and BeautifulSoup alone cannot access.