# PyEuropePMC Basic Usage Examples

This notebook demonstrates various ways to use PyEuropePMC to search and retrieve scientific literature from Europe PMC.

## Setup

First, let's import the necessary libraries and configure logging.

In [1]:
import logging
import pprint
from typing import Any, Dict, List, Optional

# Import the main classes
from pyeuropepmc.search import EuropePMCError, SearchClient

# Configure logging to see what's happening
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

## Basic Search Example

Let's start with a simple search for papers about CRISPR gene editing.

In [None]:
print("=== Basic Search Example ===")

with SearchClient(rate_limit_delay=0.5) as client:
    try:
        # Simple search
        results: Dict[str, Any] = client.search(
            "CRISPR gene editing", page_size=5, format="json"
        ) # type: ignore

        print(f"Found {results.get('hitCount', 0)} total papers")
        print(f"Retrieved {len(results.get('resultList', {}).get('result', []))} papers")

        # Display first few results
        for i, paper in enumerate(results.get("resultList", {}).get("result", [])[:3]):
            print(f"\n{i + 1}. {paper.get('title', 'No title')}")
            print(f"   Authors: {paper.get('authorString', 'N/A')}")
            print(f"   Journal: {paper.get('journalTitle', 'N/A')}")
            print(f"   Year: {paper.get('pubYear', 'N/A')}")

    except EuropePMCError as e:
        print(f"Search failed: {e}")

2025-07-03 11:37:21,213 - INFO - Performing search with params: {'query': 'CRISPR gene editing', 'resultType': 'lite', 'synonym': 'FALSE', 'pageSize': 5, 'format': 'json', 'cursorMark': '*', 'sort': ''}


=== Basic Search Example ===


2025-07-03 11:37:21,404 - INFO - GET request to https://www.ebi.ac.uk/europepmc/webservices/rest/search succeeded with status 200


Found 103397 total papers
Retrieved 5 papers

1. Lipid Nanoparticles for Delivery of CRISPR Gene Editing Components.
   Authors: Wu F, Li N, Xiao Y, Palanki R, Yamagata H, Mitchell MJ, Han X.
   Journal: Small Methods
   Year: 2025

2. Transforming Pharmacogenomics and CRISPR Gene Editing with the Power of Artificial Intelligence for Precision Medicine.
   Authors: Srivastav AK, Mishra MK, Lillard JW, Singh R.
   Journal: Pharmaceutics
   Year: 2025

3. Vectors in CRISPR Gene Editing for Neurological Disorders: Challenges and Opportunities.
   Authors: Xiong K, Wang X, Feng C, Zhang K, Chen D, Yang S.
   Journal: Adv Biol (Weinh)
   Year: 2025
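
The nested `results.get("resultList", {}).get("result", [])` lookup appears three times in the cell above. As a minimal sketch, it could be wrapped in a small helper; `extract_results` is a hypothetical name used here for illustration and is not part of PyEuropePMC, and the sample response is hand-written rather than real API output.

```python
from typing import Any, Dict, List


def extract_results(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the list of records in a search response, or [] if it is missing."""
    # `or {}` / `or []` also cover keys that are present but set to None.
    result_list = response.get("resultList") or {}
    return result_list.get("result") or []


# Hand-written sample shaped like the JSON response used above.
sample_response: Dict[str, Any] = {
    "hitCount": 2,
    "resultList": {"result": [{"title": "Paper A"}, {"title": "Paper B"}]},
}

papers = extract_results(sample_response)
for paper in papers:
    print(paper.get("title", "No title"))
```

An empty or malformed response then yields an empty list, so the display loop needs no extra guards.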


### Raw Output Inspection

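Let's examine the raw structure of a single search result. The cell reads whatever `results` currently holds, so run it directly after the search you want to inspect; the record printed below comes from a different search than the CRISPR example above.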

In [10]:
# Show raw output of the first result
if results.get("resultList", {}).get("result", []):
    print("Raw output of first result:")
    first_result: Dict[str, Any] = results.get("resultList", {}).get("result", [])[0]
    pprint.pprint(first_result)

Raw output of first result:
{'authorString': 'Sikkema L, Lkandushi M, Scarcella D, Moinfar AA, Engelmann '
                 'J, Theis FJ.',
 'bookOrReportDetails': {'publisher': 'bioRxiv', 'yearOfPublication': 2025},
 'citedByCount': 0,
 'doi': '10.1101/2025.05.23.655749',
 'firstIndexDate': '2025-05-30',
 'firstPublicationDate': '2025-05-28',
 'fullTextIdList': {'fullTextId': ['PPR1027522']},
 'hasBook': 'N',
 'hasDbCrossReferences': 'N',
 'hasLabsLinks': 'Y',
 'hasPDF': 'Y',
 'hasReferences': 'Y',
 'hasSuppl': 'Y',
 'hasTMAccessionNumbers': 'N',
 'hasTextMinedTerms': 'Y',
 'id': 'PPR1027522',
 'inEPMC': 'Y',
 'inPMC': 'N',
 'isOpenAccess': 'Y',
 'pubType': 'preprint',
 'pubYear': '2025',
 'source': 'PPR',
 'title': 'Automated evaluation of single-cell reference atlas mappings '
          'enables the identification of disease-associated cell states'}
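
Many fields in the record above are `'Y'`/`'N'` strings rather than booleans. A minimal sketch of normalising them before filtering follows; `flags_to_bools` is a hypothetical helper written for this notebook, not a PyEuropePMC function, and the sample record copies only a few fields from the output above.

```python
from typing import Any, Dict


def flags_to_bools(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` with 'Y'/'N' string values converted to bool."""
    mapping = {"Y": True, "N": False}
    return {
        # Only plain strings are looked up; nested dicts and ints pass through.
        key: mapping[value] if isinstance(value, str) and value in mapping else value
        for key, value in record.items()
    }


# A few fields copied from the raw record shown above.
record: Dict[str, Any] = {
    "id": "PPR1027522",
    "isOpenAccess": "Y",
    "inPMC": "N",
    "citedByCount": 0,
}

clean = flags_to_bools(record)
if clean["isOpenAccess"] and not clean["inPMC"]:
    print(f"{clean['id']} is open access but not in PMC")
```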


## Advanced Search with Parsing

Now let's use `search_and_parse`, which runs the search and returns the parsed records directly as a list of dictionaries, so there is no need to unpack `resultList` by hand.

In [4]:
print("=== Advanced Search with Parsing ===")

with SearchClient() as client:
    try:
        # Search and parse results automatically
        papers: List[Dict[str, Any]] = client.search_and_parse(
            query="COVID-19 AND vaccine",
            format="json",
            pageSize=10,
            sort="CITED desc",  # Most cited first
        )

        print(f"Retrieved {len(papers)} papers")

        # Display top cited papers
        for i, paper in enumerate(papers[:5]):
            citations: int = paper.get("citedByCount", 0)
            print(f"\n{i + 1}. [{citations} citations] {paper.get('title', 'No title')}")
            print(f"   DOI: {paper.get('doi', 'N/A')}")

    except EuropePMCError as e:
        print(f"Advanced search failed: {e}")

2025-07-03 11:37:21,950 - INFO - Performing search with params: {'query': 'COVID-19 AND vaccine', 'resultType': 'lite', 'synonym': 'FALSE', 'pageSize': 10, 'format': 'json', 'cursorMark': '*', 'sort': 'CITED desc'}


=== Advanced Search with Parsing ===


2025-07-03 11:37:22,177 - INFO - GET request to https://www.ebi.ac.uk/europepmc/webservices/rest/search succeeded with status 200


Retrieved 10 papers

1. [10245 citations] Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine.
   DOI: 10.1056/nejmoa2034577

2. [8661 citations] Integrated analysis of multimodal single-cell data.
   DOI: 10.1016/j.cell.2021.04.048

3. [7266 citations] Efficacy and Safety of the mRNA-1273 SARS-CoV-2 Vaccine.
   DOI: 10.1056/nejmoa2035389

4. [6567 citations] Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis.
   DOI: 10.1016/s0140-6736(21)02724-0

5. [6170 citations] Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation.
   DOI: 10.1126/science.abb2507
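
Some of the most-cited hits above are only loosely related to vaccines, presumably because an unfielded query can match terms anywhere in a record. One option is to filter the parsed list locally by title. The sketch below uses a hypothetical helper, `filter_by_title`, and the five titles from the output above.

```python
from typing import Any, Dict, Iterable, List


def filter_by_title(
    papers: Iterable[Dict[str, Any]], keywords: Iterable[str]
) -> List[Dict[str, Any]]:
    """Keep papers whose title contains at least one keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        paper
        for paper in papers
        if any(keyword in paper.get("title", "").lower() for keyword in lowered)
    ]


# Titles taken from the output above.
top_cited: List[Dict[str, Any]] = [
    {"title": "Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine."},
    {"title": "Integrated analysis of multimodal single-cell data."},
    {"title": "Efficacy and Safety of the mRNA-1273 SARS-CoV-2 Vaccine."},
    {"title": "Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis."},
    {"title": "Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation."},
]

on_topic = filter_by_title(top_cited, ["vaccine"])
for paper in on_topic:
    print(paper["title"])
```

Substring matching is deliberately crude; a stricter alternative is to restrict the query itself to specific fields.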


## Pagination Example

For large result sets, `fetch_all_pages` requests successive pages and combines them into one list, stopping once `max_results` papers have been collected.

In [5]:
print("=== Pagination Example ===")

with SearchClient() as client:
    try:
        # Fetch multiple pages automatically
        all_papers: List[Dict[str, Any]] = client.fetch_all_pages(
            query="machine learning bioinformatics",
            page_size=25,
            max_results=100,  # Limit to 100 papers total
        )

        print(f"Retrieved {len(all_papers)} papers across multiple pages")

        # Analyze publication years
        years: Dict[str, int] = {}
        for paper in all_papers:
            year: Optional[str] = paper.get("pubYear")
            if year:
                years[year] = years.get(year, 0) + 1

        print("\nPublication years distribution:")
        for year in sorted(years.keys(), reverse=True)[:5]:
            print(f"  {year}: {years[year]} papers")

    except EuropePMCError as e:
        print(f"Pagination example failed: {e}")

2025-07-03 11:37:23,203 - INFO - Performing search with params: {'query': 'machine learning bioinformatics', 'resultType': 'lite', 'synonym': 'FALSE', 'pageSize': 25, 'format': 'json', 'cursorMark': '*', 'sort': ''}


=== Pagination Example ===


2025-07-03 11:37:23,453 - INFO - GET request to https://www.ebi.ac.uk/europepmc/webservices/rest/search succeeded with status 200
2025-07-03 11:37:24,457 - INFO - Performing search with params: {'query': 'machine learning bioinformatics', 'resultType': 'lite', 'synonym': 'FALSE', 'pageSize': 25, 'format': 'json', 'cursorMark': 'AoIIQJYM0ig1MzE3Mzk0Ng==', 'sort': ''}
2025-07-03 11:37:24,655 - INFO - GET request to https://www.ebi.ac.uk/europepmc/webservices/rest/search succeeded with status 200
2025-07-03 11:37:25,658 - INFO - Performing search with params: {'query': 'machine learning bioinformatics', 'resultType': 'lite', '

Retrieved 100 papers across multiple pages

Publication years distribution:
  2025: 96 papers
  2024: 3 papers
  2021: 1 papers
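
The log lines above show the `cursorMark` parameter changing from `'*'` to an opaque token between requests. The loop below is a rough sketch of how cursor-based paging of this kind generally works; it is not PyEuropePMC's actual implementation. The `nextCursorMark` response field, the `fetch_with_cursor` helper, and the stubbed `fake_fetch` function are all assumptions made for illustration, and no network call is made.

```python
from typing import Any, Callable, Dict, List

Page = Dict[str, Any]

# Stubbed pages keyed by cursor, standing in for real HTTP responses.
FAKE_PAGES: Dict[str, Page] = {
    "*": {
        "nextCursorMark": "cursor-1",
        "resultList": {"result": [{"id": "1"}, {"id": "2"}, {"id": "3"}]},
    },
    "cursor-1": {
        "nextCursorMark": "cursor-2",
        "resultList": {"result": [{"id": "4"}, {"id": "5"}, {"id": "6"}]},
    },
    # Last page: the cursor no longer advances.
    "cursor-2": {
        "nextCursorMark": "cursor-2",
        "resultList": {"result": [{"id": "7"}]},
    },
}


def fake_fetch(cursor: str) -> Page:
    """Stand-in for a single search request (assumption: one page per cursor)."""
    return FAKE_PAGES[cursor]


def fetch_with_cursor(
    fetch_page: Callable[[str], Page], max_results: int
) -> List[Dict[str, Any]]:
    """Follow cursors until the results run out or max_results is reached."""
    collected: List[Dict[str, Any]] = []
    cursor = "*"  # initial cursor, as seen in the first log line
    while len(collected) < max_results:
        page = fetch_page(cursor)
        records = page.get("resultList", {}).get("result", [])
        if not records:
            break
        collected.extend(records)
        next_cursor = page.get("nextCursorMark")
        if not next_cursor or next_cursor == cursor:
            break  # no further pages
        cursor = next_cursor
    return collected[:max_results]


first_five = fetch_with_cursor(fake_fetch, max_results=5)
everything = fetch_with_cursor(fake_fetch, max_results=100)
print(len(first_five), len(everything))
```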


## Search Parameters Example

Queries can target specific fields such as `AUTHOR` and `JOURNAL`, and `resultType="core"` requests full metadata instead of the default `lite` records.

In [None]:
print("=== Search Parameters Example ===")

with SearchClient() as client:
    try:
        # Search with specific parameters
        results: Dict[str, Any] = client.search(
            query='AUTHOR:"Smith J" AND JOURNAL:"Nature"',
            resultType="core",  # Get full metadata
            pageSize=5,
            format="json",
        ) # type: ignore

        papers: List[Dict[str, Any]] = results.get("resultList", {}).get("result", [])
        print(f"Found {len(papers)} papers by 'Smith J' in 'Nature'")

        for paper in papers:
            print(f"\n- {paper.get('title', 'No title')}")
            abstract: str = paper.get("abstractText", "N/A")
            print(f"  Abstract: {abstract[:100]}...")

    except EuropePMCError as e:
        print(f"Parameter search failed: {e}")

2025-07-03 11:37:28,137 - INFO - Performing search with params: {'query': 'AUTHOR:"Smith J" AND JOURNAL:"Nature"', 'resultType': 'core', 'synonym': 'FALSE', 'pageSize': 5, 'format': 'json', 'cursorMark': '*', 'sort': ''}
2025-07-03 11:37:28,298 - INFO - GET request to https://www.ebi.ac.uk/europepmc/webservices/rest/search succeeded with status 200


=== Search Parameters Example ===
Found 5 papers by 'Smith J' in 'Nature'

- Daily briefing: Researchers re-enact an epic ancient canoe trip.
  Abstract: N/A...

- Daily briefing: Some moths can navigate using starlight.
  Abstract: N/A...

- Daily briefing: Physical masks created using AI can restore damaged paintings.
  Abstract: N/A...

- Daily briefing: Travel bans, revoked visas and yet more funding cuts - the latest in Trump's attack on science.
  Abstract: N/A...

- Daily briefing: The sweet smell of outer space.
  Abstract: N/A...
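
Fielded queries like the one above are easy to mistype when written by hand. Here is a minimal sketch of assembling them from keyword arguments; `build_query` is a hypothetical helper, not part of PyEuropePMC, and it only covers the simple `FIELD:"value"` clauses joined with `AND` that this example uses.

```python
def build_query(**fields: str) -> str:
    """Join FIELD:"value" clauses with AND, in the order the arguments are given."""
    clauses = []
    for name, value in fields.items():
        if '"' in value:
            # Embedded quotes would break the phrase syntax, so reject them.
            raise ValueError(f"Value for {name!r} must not contain double quotes")
        clauses.append(f'{name.upper()}:"{value}"')
    return " AND ".join(clauses)


query = build_query(author="Smith J", journal="Nature")
print(query)
```

The resulting string can be passed as the `query` argument exactly like the hand-written one.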

## Hit Count Example

Sometimes we just want to know how many papers match our query without retrieving the results.

In [7]:
print("=== Hit Count Example ===")

with SearchClient() as client:
    queries: List[str] = [
        "artificial intelligence",
        "CRISPR",
        "COVID-19",
        "climate change biology",
    ]

    for query in queries:
        try:
            count: int = client.get_hit_count(query)
            print(f"'{query}': {count:,} papers")
        except EuropePMCError as e:
            print(f"Failed to get count for '{query}': {e}")

2025-07-03 11:37:29,324 - INFO - Performing search with params: {'query': 'artificial intelligence', 'resultType': 'lite', 'synonym': 'FALSE', 'pageSize': 1, 'format': 'json', 'cursorMark': '*', 'sort': ''}


=== Hit Count Example ===


2025-07-03 11:37:29,505 - INFO - GET request to https://www.ebi.ac.uk/europepmc/webservices/rest/search succeeded with status 200
2025-07-03 11:37:30,508 - INFO - Performing search with params: {'query': 'CRISPR', 'resultType': 'lite', 'synonym': 'FALSE', 'pageSize': 1, 'format': 'json', 'cursorMark': '*', 'sort': ''}
2025-07-03 11:37:30,556 - INFO - GET request to https://www.ebi.ac.uk/europepmc/webservices/rest/search succeeded with status 200


'artificial intelligence': 292,313 papers


2025-07-03 11:37:31,560 - INFO - Performing search with params: {'query': 'COVID-19', 'resultType': 'lite', 'synonym': 'FALSE', 'pageSize': 1, 'format': 'json', 'cursorMark': '*', 'sort': ''}
2025-07-03 11:37:31,703 - INFO - GET request to https://www.ebi.ac.uk/europepmc/webservices/rest/search succeeded with status 200


'CRISPR': 184,279 papers


2025-07-03 11:37:32,706 - INFO - Performing search with params: {'query': 'climate change biology', 'resultType': 'lite', 'synonym': 'FALSE', 'pageSize': 1, 'format': 'json', 'cursorMark': '*', 'sort': ''}
2025-07-03 11:37:32,799 - INFO - GET request to https://www.ebi.ac.uk/europepmc/webservices/rest/search succeeded with status 200


'COVID-19': 901,665 papers
'climate change biology': 112,262 papers
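
Because the log messages are interleaved with the printed counts, the comparison is hard to read. A small sketch of collecting the counts into a dictionary first and printing them ranked follows; the numbers are copied from the output above, and in a live session they would come from `client.get_hit_count(query)`.

```python
from typing import Dict, List, Tuple

# Counts copied from the output above.
hit_counts: Dict[str, int] = {
    "artificial intelligence": 292_313,
    "CRISPR": 184_279,
    "COVID-19": 901_665,
    "climate change biology": 112_262,
}

# Sort by count, largest first.
ranked: List[Tuple[str, int]] = sorted(
    hit_counts.items(), key=lambda item: item[1], reverse=True
)

for query, count in ranked:
    print(f"{count:>9,}  {query}")
```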


## Error Handling Example

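Let's handle error conditions gracefully: `validate_query` lets us reject malformed queries before a request is sent, while `EuropePMCError` covers failures raised during the search itself.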

In [None]:
print("=== Error Handling Example ===")

with SearchClient() as client:
    # Try various problematic queries
    problematic_queries: List[str] = [
        "",  # Empty query
        "a",  # Too short
        'query with "unmatched quotes',  # Invalid syntax
        "normal query",  # This should work
    ]

    for query in problematic_queries:
        try:
            if client.validate_query(query):
                results: Dict[str, Any] = client.search(query, pageSize=1) # type: ignore
                print(f"✓ '{query}': {results.get('hitCount', 0)} results")
            else:
                print(f"✗ '{query}': Invalid query")
        except EuropePMCError as e:
            print(f"✗ '{query}': {e}")

2025-07-03 11:37:33,830 - INFO - Performing search with params: {'query': 'normal query', 'resultType': 'lite', 'synonym': 'FALSE', 'pageSize': 1, 'format': 'json', 'cursorMark': '*', 'sort': ''}
2025-07-03 11:37:33,963 - INFO - GET request to https://www.ebi.ac.uk/europepmc/webservices/rest/search succeeded with status 200


=== Error Handling Example ===
✗ '': Invalid query
✗ 'a': Invalid query
✗ 'query with "unmatched quotes': Invalid query
✓ 'normal query': 82997 results
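
The output shows the empty query, the single-character query, and the query with unmatched quotes all being rejected. The sketch below shows local checks that would give the same verdicts for these four inputs. It is not the library's `validate_query` implementation, whose actual rules may differ; `looks_valid` and its `min_length` threshold are assumptions made for illustration.

```python
from typing import List


def looks_valid(query: str, min_length: int = 2) -> bool:
    """Cheap local sanity checks before sending a query (hypothetical rules)."""
    stripped = query.strip()
    if len(stripped) < min_length:
        return False  # empty or too short
    if stripped.count('"') % 2 != 0:
        return False  # a double quote was opened but never closed
    return True


test_queries: List[str] = [
    "",
    "a",
    'query with "unmatched quotes',
    "normal query",
]

verdicts = [looks_valid(query) for query in test_queries]
for query, ok in zip(test_queries, verdicts):
    print(f"{'✓' if ok else '✗'} '{query}'")
```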


## Summary

This notebook demonstrated:

1. **Basic search functionality** - Simple queries and result display
2. **Advanced search with parsing** - Using built-in parsing methods
3. **Pagination** - Handling large result sets across multiple pages
4. **Search parameters** - Using specific query syntax and parameters
5. **Hit counts** - Getting result counts without full retrieval
6. **Error handling** - Gracefully handling various error conditions

The PyEuropePMC SearchClient provides a robust interface for searching scientific literature with built-in error handling, rate limiting, and pagination support.