# Simplified Long COVID Research Analysis

This notebook demonstrates the **simplified unified pipeline** for processing scientific literature:

1. Search for Long COVID papers
2. Process top papers with unified pipeline (XML → Enrichment → RDF)
3. View results

The complex multi-step workflow is now handled by a single `PaperProcessingPipeline` call.

## 1. Import Libraries

Import the simplified unified pipeline and basic search functionality.

In [1]:
# Import simplified components
from pyeuropepmc import SearchClient, QueryBuilder
from pyeuropepmc.pipeline import PaperProcessingPipeline, PipelineConfig

print("Libraries imported successfully!")

Libraries imported successfully!


## 2. Search for Long COVID Papers

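Search for Long COVID papers with at least 10 citations as candidates for XML processing. Note that the query itself does not filter on XML availability.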

In [2]:
# Search for Long COVID papers
search_client = SearchClient()

query = (
    QueryBuilder()
    .keyword("long covid")
    .or_()
    .keyword("post-acute sequelae of SARS-CoV-2")
    .and_()
    .citation_count(min_count=10)
    .build()
)

results = search_client.search(query, limit=20)
papers = results["resultList"]["result"]

print(f"Found {len(papers)} Long COVID papers")

# Show top 3 papers
for i, paper in enumerate(papers[:3], 1):
    print(f"{i}. {paper.get('title', 'No title')[:60]}...")
    print(f"   Citations: {paper.get('citedByCount', 0)}, DOI: {paper.get('doi', 'N/A')}")
    print()

Found 25 Long COVID papers
1. Neurologic Manifestations of Long COVID Disproportionately A...
   Citations: 12, DOI: 10.1002/ana.27128

2. Long COVID or Post-Acute Sequelae of SARS-CoV-2 Infection (P...
   Citations: 11, DOI: 10.12659/msm.946512

3. Real-world effectiveness and causal mediation study of BNT16...
   Citations: 13, DOI: 10.1016/j.eclinm.2024.102962
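The search cell indexes `results["resultList"]["result"]` directly, which would raise a `KeyError` if either key were missing from a response. The sketch below shows a more defensive way to pull out the records and to keep only those that carry a PMCID. Both `extract_papers` and `with_pmcid` are hypothetical helpers, not part of pyeuropepmc, and the sample response uses placeholder values. The sketch assumes only the response shape used above (a dict with a nested `resultList` → `result` list).

```python
from typing import Any


def extract_papers(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the paper records, or an empty list if the expected keys are absent."""
    result_list = response.get("resultList") or {}
    return list(result_list.get("result") or [])


def with_pmcid(papers: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only records that have a non-empty 'pmcid' field."""
    return [paper for paper in papers if paper.get("pmcid")]


# Placeholder response mirroring the shape used in the search cell (not real data).
sample_response = {
    "resultList": {
        "result": [
            {"title": "Example A", "doi": "10.0000/example.1", "pmcid": "PMC0000001"},
            {"title": "Example B", "doi": "10.0000/example.2"},
        ]
    }
}

all_papers = extract_papers(sample_response)
candidates = with_pmcid(all_papers)
print(f"{len(all_papers)} records, {len(candidates)} with a PMCID")
```

Filtering on `pmcid` is an assumption about what the pipeline needs to locate full-text XML. The notebook passes both `doi` and `pmcid` to `process_paper`, so a record with only a DOI may still be usable.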



## 3. Process Papers with Unified Pipeline

Use the simplified pipeline to handle XML download, parsing, and RDF conversion automatically. The pipeline also supports enrichment, but it is disabled in this run (`enable_enrichment=False`).

In [3]:
# Standard-library imports for this cell
import os
from datetime import datetime

print(f'Starting pipeline processing at {datetime.now()}')

# Configure the unified pipeline
config = PipelineConfig(
    enable_enrichment=False,  # Disable enrichment to avoid API issues
    output_format="turtle",
    output_dir=os.path.join(os.getcwd(), "rdf_output")  # Use absolute path
)

# Create the unified pipeline
pipeline = PaperProcessingPipeline(config)

# Process just one paper for testing
print("Processing one paper with unified pipeline...")
print("(This automatically handles: XML download → parsing → RDF conversion)")
print()

# Use a separate name so the search response in `results` is not overwritten
processed = {}
paper = papers[0]  # First paper
doi = paper.get('doi')
pmcid = paper.get('pmcid')

if doi:
    print(f"Processing paper: {doi}")
    try:
        # Single pipeline call replaces all the complex steps!
        result = pipeline.process_paper(
            xml_content=None,  # Pipeline will download XML automatically
            doi=doi,
            pmcid=pmcid,
            save_rdf=True,
            filename_prefix="long_covid_"
        )
        processed[doi] = result
        print(f"  ✓ Generated {result['triple_count']} RDF triples")
        print(f"  Output file: {result['output_file']}")
    except Exception as e:
        print(f"  ✗ Failed: {e}")
        import traceback
        traceback.print_exc()
else:
    print("  ✗ No DOI available for paper")

print(f"\nSuccessfully processed {len(processed)} papers")

Starting pipeline processing at 2025-11-25 10:50:10.403045
Processing one paper with unified pipeline...
(This automatically handles: XML download → parsing → RDF conversion)

Processing paper: 10.1002/ana.27128
  ✓ Generated 2830 RDF triples
  Output file: /home/jhe24/AID-PAIS/pyEuropePMC_project/examples/rdf_output/long_covid_10_1002_ana_27128.ttl

Successfully processed 1 papers
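The output path above suggests that the DOI `10.1002/ana.27128` was turned into the file stem `10_1002_ana_27128` before the prefix and `.ttl` extension were added. The pipeline's actual naming rule is not shown in this notebook, so the following is only a sketch that reproduces the observed name. `doi_to_filename` is a hypothetical helper, and the "replace every non-alphanumeric run with an underscore" rule is an assumption.

```python
import re


def doi_to_filename(doi: str, prefix: str = "", extension: str = ".ttl") -> str:
    """Build a filesystem-safe filename from a DOI (hypothetical helper)."""
    # Replace each run of characters other than ASCII letters and digits with "_",
    # then drop any underscore left at either end.
    safe = re.sub(r"[^A-Za-z0-9]+", "_", doi).strip("_")
    return f"{prefix}{safe}{extension}"


print(doi_to_filename("10.1002/ana.27128", prefix="long_covid_"))
# prints: long_covid_10_1002_ana_27128.ttl
```

A mapping like this is lossy: two DOIs that differ only in punctuation would produce the same filename, so a real implementation might need a collision check.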

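The processing cell handles only the first paper. To extend it to the whole search result, one option is a loop that isolates failures per paper so that one bad record does not stop the run. The sketch below is a minimal version of that idea. `process_batch` is a hypothetical helper, and `fake_process` is a stand-in for the real pipeline call so the example runs without network access. The sample records are placeholders.

```python
from typing import Any, Callable

Paper = dict[str, Any]


def process_batch(
    papers: list[Paper],
    process_one: Callable[[Paper], Any],
) -> tuple[dict[str, Any], list[tuple[str, str]]]:
    """Apply process_one to each paper, collecting successes and failures separately."""
    successes: dict[str, Any] = {}
    failures: list[tuple[str, str]] = []
    for paper in papers:
        doi = paper.get("doi")
        if not doi:
            failures.append(("<no DOI>", "missing DOI"))
            continue
        try:
            successes[doi] = process_one(paper)
        except Exception as exc:  # keep going; record the reason instead of aborting
            failures.append((doi, str(exc)))
    return successes, failures


def fake_process(paper: Paper) -> dict[str, int]:
    """Stand-in for the pipeline call: fails when the record has no PMCID."""
    if not paper.get("pmcid"):
        raise ValueError("no full-text XML available")
    return {"triple_count": 0}


# Placeholder records (not real data).
sample_papers = [
    {"doi": "10.0000/example.1", "pmcid": "PMC0000001"},
    {"doi": "10.0000/example.2"},
    {"title": "record without a DOI"},
]

successes, failures = process_batch(sample_papers, fake_process)
print(f"{len(successes)} succeeded, {len(failures)} failed")
for identifier, reason in failures:
    print(f"  {identifier}: {reason}")
```

In the notebook, `process_one` could wrap the same call used in the processing cell, for example `lambda p: pipeline.process_paper(xml_content=None, doi=p["doi"], pmcid=p.get("pmcid"), save_rdf=True, filename_prefix="long_covid_")`.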