# EuropePMCParser Demo
This notebook demonstrates the main parsing methods of `EuropePMCParser`: `parse_json`, `parse_xml`, and `parse_dc`, which handle JSON, XML, and Dublin Core XML responses respectively.

In [24]:
# Import the parser class
from pyeuropepmc.parser import EuropePMCParser

In [25]:
# Load fixture data for demo
import json
from pathlib import Path

# Paths to fixture files
json_path = Path('../tests/fixtures/search_cancer.json')
xml_path = Path('../tests/fixtures/search_cancer.xml')
dc_xml_path = Path('../tests/fixtures/search_cancer_dc.xml')

# Load JSON data
with open(json_path, 'r', encoding='utf-8') as f:
    real_json_data = json.load(f)

# Load XML data
with open(xml_path, 'r', encoding='utf-8') as f:
    real_xml_data = f.read()

# Load DC XML data
with open(dc_xml_path, 'r', encoding='utf-8') as f:
    real_dc_xml_data = f.read()

## 1. JSON Parsing
Parse a typical Europe PMC JSON search response. `parse_json` returns a list of result dictionaries, one per record.

In [26]:
# Parse real Europe PMC JSON response from fixture
parsed_json = EuropePMCParser.parse_json(real_json_data)
parsed_json[:3]  # Show first 3 results for brevity

[{'id': '39709209',
  'source': 'MED',
  'pmid': '39709209',
  'doi': '10.1016/s0140-6736(24)02600-x',
  'title': 'Abscopal response in a patient with fibrolamellar hepatocellular carcinoma following radiotherapy.',
  'authorString': "Schoenfeld JD, Haas-Kogan DA, O'Neill AF.",
  'journalTitle': 'Lancet',
  'issue': '10471',
  'journalVolume': '404',
  'pubYear': '2025',
  'journalIssn': '0140-6736; 1474-547x; ',
  'pageInfo': '2603-2604',
  'pubType': 'journal article',
  'isOpenAccess': 'N',
  'inEPMC': 'N',
  'inPMC': 'N',
  'hasPDF': 'N',
  'hasBook': 'N',
  'hasSuppl': 'N',
  'citedByCount': 0,
  'hasReferences': 'N',
  'hasTextMinedTerms': 'N',
  'hasDbCrossReferences': 'N',
  'hasLabsLinks': 'N',
  'hasTMAccessionNumbers': 'N',
  'firstIndexDate': '2024-12-22',
  'firstPublicationDate': '2025-12-01'},
 {'id': '39709208',
  'source': 'MED',
  'pmid': '39709208',
  'pmcid': 'PMC12124214',
  'fullTextIdList': {'fullTextId': ['PMC12124214']},
  'doi': '10.1016/s0140-6736(24)02716-8'
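
Since `parse_json` returns a plain list of dictionaries, ordinary Python is enough to filter or index the results. The sketch below uses two abbreviated records copied from the output above (only a few fields are kept). The helper name `with_pmcid` is hypothetical and not part of `pyeuropepmc`.

```python
# Two abbreviated records, copied from the parsed output shown above.
records = [
    {"id": "39709209", "source": "MED", "doi": "10.1016/s0140-6736(24)02600-x"},
    {"id": "39709208", "source": "MED", "pmcid": "PMC12124214",
     "doi": "10.1016/s0140-6736(24)02716-8"},
]


def with_pmcid(results):
    """Return only the records that carry a PMC identifier (hypothetical helper)."""
    return [r for r in results if r.get("pmcid")]


# Index records by DOI for quick lookup; records without a DOI are skipped.
by_doi = {r["doi"]: r for r in records if r.get("doi")}

pmc_records = with_pmcid(records)
print([r["id"] for r in pmc_records])
print(by_doi["10.1016/s0140-6736(24)02600-x"]["id"])
```

Using `r.get(...)` rather than `r[...]` matters here because optional fields such as `pmcid` are simply absent from records that lack them.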

## 2. XML Parsing
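Parse a Europe PMC XML response. The records carry the same fields as the JSON output, but some values differ in type: in the output below, `citedByCount` is the string `'0'` rather than the integer `0`, and `fullTextIdList` is `None` rather than a dictionary.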

In [27]:
# Parse real Europe PMC XML response from fixture
parsed_xml = EuropePMCParser.parse_xml(real_xml_data)
parsed_xml[:3]  # Show first 3 results for brevity

[{'id': '39709209',
  'source': 'MED',
  'pmid': '39709209',
  'doi': '10.1016/s0140-6736(24)02600-x',
  'title': 'Abscopal response in a patient with fibrolamellar hepatocellular carcinoma following radiotherapy.',
  'authorString': "Schoenfeld JD, Haas-Kogan DA, O'Neill AF.",
  'journalTitle': 'Lancet',
  'issue': '10471',
  'journalVolume': '404',
  'pubYear': '2025',
  'journalIssn': '0140-6736; 1474-547x; ',
  'pageInfo': '2603-2604',
  'pubType': 'journal article',
  'isOpenAccess': 'N',
  'inEPMC': 'N',
  'inPMC': 'N',
  'hasPDF': 'N',
  'hasBook': 'N',
  'hasSuppl': 'N',
  'citedByCount': '0',
  'hasReferences': 'N',
  'hasTextMinedTerms': 'N',
  'hasDbCrossReferences': 'N',
  'hasLabsLinks': 'N',
  'hasTMAccessionNumbers': 'N',
  'firstIndexDate': '2024-12-22',
  'firstPublicationDate': '2025-12-01'},
 {'id': '39709208',
  'source': 'MED',
  'pmid': '39709208',
  'pmcid': 'PMC12124214',
  'fullTextIdList': None,
  'doi': '10.1016/s0140-6736(24)02716-8',
  'title': 'Department 
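
If downstream code compares records from both parsers, the `citedByCount` type difference (integer `0` from JSON, string `'0'` from XML) can be smoothed over with a small normalization step. The following is a minimal sketch: `normalize_record` is a hypothetical helper, not part of `pyeuropepmc`, and it only touches the one scalar field whose type visibly differs in the outputs above.

```python
def normalize_record(record):
    """Return a copy of a parsed record with citedByCount coerced to int.

    Hypothetical helper: assumes citedByCount, when present, is either an
    int or a string of digits, as in the outputs above.
    """
    normalized = dict(record)  # shallow copy; the input is left untouched
    count = normalized.get("citedByCount")
    if isinstance(count, str) and count.strip().isdigit():
        normalized["citedByCount"] = int(count)
    return normalized


json_style = {"id": "39709209", "citedByCount": 0}
xml_style = {"id": "39709209", "citedByCount": "0"}

print(normalize_record(json_style) == normalize_record(xml_style))
```

Values that are neither integers nor digit strings are passed through unchanged, so the helper never raises on unexpected input.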

## 3. Dublin Core XML Parsing
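Parse a Europe PMC Dublin Core XML response. Records use Dublin Core field names such as `title`, `creator`, and `identifier`, and repeated elements such as `creator` come back as lists.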

In [28]:
# Parse real Europe PMC Dublin Core XML response from fixture
parsed_dc = EuropePMCParser.parse_dc(real_dc_xml_data)
parsed_dc[:3]  # Show first 3 results for brevity

[{'title': 'Abscopal response in a patient with fibrolamellar hepatocellular carcinoma following radiotherapy.',
  'creator': ['Schoenfeld, JD', 'Haas-Kogan, DA', "O'Neill, AF"],
  'contributor': "Department of Radiation Oncology, Brigham and Women's Hospital, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA. Electronic address: jdschoenfeld@partners.org.",
  'description': 'Journal Article',
  'date': '2025-12-01',
  'created': '2025-12-01',
  'identifier': ['http://europepmc.org/abstract/MED/39709209',
   'https://doi.org/10.1016/s0140-6736(24)02600-x'],
  'type': 'Text',
  'language': 'eng',
  'bibliographicCitation': ["Schoenfeld JD, Haas-Kogan DA, O'Neill AF. Abscopal response in a patient with fibrolamellar hepatocellular carcinoma following radiotherapy. Lancet. 2025 Dec;404(10471):2603-2604. doi:10.1016/s0140-6736(24)02600-x. PubMed PMID:39709209.",
   '&ctx_ver=Z39.88-2004&rft.jtitle=Lancet%20%28London%2C%20England%29&rft.stitle=Lancet&rft.volume=404&rf
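
The `identifier` field in the output above mixes a Europe PMC abstract URL with a DOI URL. A short sketch of pulling the bare DOI out of a parsed record follows; `extract_doi` is a hypothetical helper, and it assumes DOI identifiers always start with `https://doi.org/`, which is what this one record shows.

```python
DOI_PREFIX = "https://doi.org/"


def extract_doi(dc_record):
    """Return the bare DOI from a Dublin Core record's identifiers, or None.

    Hypothetical helper: assumes 'identifier' is a list of URL strings
    (or a single string), as in the output above.
    """
    identifiers = dc_record.get("identifier") or []
    if isinstance(identifiers, str):  # tolerate a single, non-list value
        identifiers = [identifiers]
    for value in identifiers:
        if value.startswith(DOI_PREFIX):
            return value[len(DOI_PREFIX):]
    return None


# Abbreviated record copied from the output above.
record = {
    "title": "Abscopal response in a patient with fibrolamellar "
             "hepatocellular carcinoma following radiotherapy.",
    "identifier": [
        "http://europepmc.org/abstract/MED/39709209",
        "https://doi.org/10.1016/s0140-6736(24)02600-x",
    ],
}
print(extract_doi(record))
```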

## 4. Error Handling Demo
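Show how the parser handles invalid input. A JSON response whose `result` is not a list yields an empty list along with a warning message, while malformed XML or Dublin Core XML raises a `ParseError` after an error message is emitted.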

In [29]:
# Invalid JSON: result is not a list
invalid_json = {
    'hitCount': 1,
    'resultList': {'result': 'not a list'}
}
EuropePMCParser.parse_json(invalid_json)  # Returns []

Expected list but got str, returning empty results


[]
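
The behaviour above (warn and return an empty list when `result` is not a list) can be approximated with a few lines of standalone Python. This is a sketch of the pattern only, not the library's actual implementation: `extract_results` and the logger name `parser_sketch` are hypothetical, and the function assumes `resultList`, when present, is a dictionary.

```python
import logging

logger = logging.getLogger("parser_sketch")  # hypothetical logger name


def extract_results(response):
    """Return response['resultList']['result'] if it is a list, else []."""
    results = response.get("resultList", {}).get("result", [])
    if not isinstance(results, list):
        # Mirror the message seen in the output above.
        logger.warning(
            "Expected list but got %s, returning empty results",
            type(results).__name__,
        )
        return []
    return results


print(extract_results({"resultList": {"result": "not a list"}}))
print(extract_results({"resultList": {"result": [{"id": "39709209"}]}}))
```

Returning `[]` instead of raising lets callers iterate over the result unconditionally, at the cost of making a malformed response look like an empty one unless the warning is monitored.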

In [30]:
# Invalid XML: malformed
try:
    EuropePMCParser.parse_xml('<malformed><xml>')
except Exception as e:
    print(type(e).__name__, e)  # Should show ET.ParseError

XML parsing error: no element found: line 1, column 16. The response appears malformed.


ParseError no element found: line 1, column 16


In [31]:
# Invalid DC XML: malformed
try:
    EuropePMCParser.parse_dc('<malformed><xml>')
except Exception as e:
    print(type(e).__name__, e)  # Should show ET.ParseError

Dublin Core XML parsing error: no element found: line 1, column 16. The response appears malformed.


ParseError no element found: line 1, column 16
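
Because malformed XML raises instead of returning an empty list, callers that want to continue past a bad response need to catch the error themselves. The sketch below assumes the exception is the standard library's `xml.etree.ElementTree.ParseError`, as the comments in the cells above suggest. It calls `ET.fromstring` directly rather than `EuropePMCParser`, so it runs without the package installed; `safe_parse` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def safe_parse(xml_text):
    """Parse XML text and return the root element, or None if malformed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as exc:
        print(f"Could not parse XML: {exc}")
        return None


# Same malformed input as in the cells above: the tags are never closed.
print(safe_parse("<malformed><xml>") is None)

# A well-formed document parses normally.
root = safe_parse("<results><result>ok</result></results>")
print(root.tag)
```

Catching `ET.ParseError` specifically, rather than a bare `Exception`, keeps unrelated bugs from being silently swallowed.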
