# Europe PMC Advanced Filtering Example

This notebook demonstrates how to use `filter_pmc_papers` to find high-quality scientific papers from Europe PMC using advanced filtering criteria such as citations, MeSH terms, keywords, and abstract content.

In [None]:
from pyeuropepmc import SearchClient, filter_pmc_papers, filter_pmc_papers_or

## 1. Perform a Broad Search

We start by searching for papers on **cancer immunotherapy**. We use `resultType="core"` to retrieve full metadata, including MeSH terms.

In [None]:
client = SearchClient()
query = "cancer immunotherapy"
response = client.search(query, pageSize=500, resultType="core", sort="cited DESC")
papers = response.get("resultList", {}).get("result", []) if isinstance(response, dict) else []
print(f'Total papers found: {len(papers)}')

In [None]:
papers
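
To make the role of `resultType="core"` concrete, here is a minimal sketch of pulling MeSH descriptor names out of a single result record. It assumes the record nests its MeSH data as `meshHeadingList` → `meshHeading` → `descriptorName`; the helper `mesh_descriptors` and the sample record are hypothetical and hand-written for illustration, not output from the search above.

```python
def mesh_descriptors(paper):
    """Return the MeSH descriptor names of one result record (empty list if none)."""
    # Assumed layout: {"meshHeadingList": {"meshHeading": [{"descriptorName": ...}]}}
    headings = (paper.get("meshHeadingList") or {}).get("meshHeading") or []
    return [h["descriptorName"] for h in headings if h.get("descriptorName")]


# Hand-written record that mimics the assumed shape; not real data.
sample = {
    "title": "Hypothetical example record",
    "meshHeadingList": {
        "meshHeading": [
            {"majorTopic_YN": "Y", "descriptorName": "Neoplasms"},
            {"majorTopic_YN": "N", "descriptorName": "Immunotherapy"},
        ]
    },
}

print(mesh_descriptors(sample))                    # ['Neoplasms', 'Immunotherapy']
print(mesh_descriptors({"title": "No MeSH yet"}))  # []
```

Records without MeSH annotations yield an empty list here, so any MeSH-based filter should be expected to drop them.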

## 2. Filter for High-Quality Review Papers

Keep only papers that meet all of the following criteria:

- At least 10 citations
- Published in 2020 or later
- Type: Review or Systematic Review
- Open Access

In [None]:
filtered_reviews = filter_pmc_papers(
    papers,
    min_citations=10,
    min_pub_year=2020,
    allowed_types=("Review", "Systematic Review"),
    open_access="Y",
)
print(f'High-quality reviews found: {len(filtered_reviews)}')
if filtered_reviews:
    print('First result:')
    print(filtered_reviews[0])

### Example Results

The loop below prints the first three reviews that passed the filter:

In [None]:
for i, paper in enumerate(filtered_reviews[:3], 1):
    print(f"{i}. {paper['title']}")
    print(f"   Year: {paper['pubYear']}, Citations: {paper['citedByCount']}")
    # Fall back to an empty list so a record without authors or keywords does not raise
    print(f"   Authors: {', '.join((paper.get('authors') or [])[:3])}")
    if paper.get('keywords'):
        print(f"   Keywords: {', '.join(paper['keywords'][:5])}")
    print()
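
The four criteria above can also be written as a plain predicate, which may help when checking why a given paper was kept or dropped. This is only a sketch of the logic, not the library's implementation. It assumes raw result records carry `citedByCount`, `pubYear`, `pubTypeList` → `pubType`, and `isOpenAccess`; the function name and the sample records are hypothetical.

```python
def is_recent_cited_oa_review(paper, min_citations=10, min_year=2020,
                              allowed_types=("review", "systematic review")):
    """Return True if a record meets all four criteria (illustrative logic only)."""
    cited = int(paper.get("citedByCount") or 0)
    year = int(paper.get("pubYear") or 0)  # the year may arrive as a string
    types = {t.lower() for t in (paper.get("pubTypeList") or {}).get("pubType", [])}
    return (
        cited >= min_citations
        and year >= min_year
        and bool(types & set(allowed_types))  # at least one allowed type
        and paper.get("isOpenAccess") == "Y"
    )


# Hand-written records for illustration; not real data.
cited_review = {"citedByCount": 42, "pubYear": "2021", "isOpenAccess": "Y",
                "pubTypeList": {"pubType": ["Review", "Journal Article"]}}
uncited_review = {"citedByCount": 3, "pubYear": "2022", "isOpenAccess": "Y",
                  "pubTypeList": {"pubType": ["Review"]}}
research_article = {"citedByCount": 100, "pubYear": "2021", "isOpenAccess": "Y",
                    "pubTypeList": {"pubType": ["Journal Article"]}}

for record in (cited_review, uncited_review, research_article):
    print(is_recent_cited_oa_review(record))  # True, False, False
```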

## 3. Filter by MeSH Terms (Partial Matching)

Find papers with MeSH terms containing both 'neoplasm' and 'immuno' (partial, case-insensitive match).

In [None]:
filtered_mesh = filter_pmc_papers(
    papers,
    min_citations=5,
    required_mesh={"neoplasm", "immuno"},
)
print(f"Papers with required MeSH terms: {len(filtered_mesh)}")
if filtered_mesh:
    print('First result:')
    print(filtered_mesh[0])

In [None]:
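# OR variant: keep papers whose MeSH terms contain at least one of the terms.
# Note that this overwrites the `filtered_mesh` result from the previous cell.
filtered_mesh = filter_pmc_papers_or(
    papers,
    min_citations=5,
    required_mesh={"neoplasm", "immuno"},
)
print(f"Papers with any of the MeSH terms: {len(filtered_mesh)}")
if filtered_mesh:
    print('First result:')
    print(filtered_mesh[0])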
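
The difference between requiring both terms and requiring either term can be illustrated with a small stand-alone sketch of partial, case-insensitive matching. The helpers `matches_all` and `matches_any` are hypothetical and only approximate the behaviour described in this section; they are not the library's code.

```python
def matches_all(terms, values):
    """True if every term is a case-insensitive substring of at least one value."""
    lowered = [v.lower() for v in values]
    return all(any(t.lower() in v for v in lowered) for t in terms)


def matches_any(terms, values):
    """True if at least one term is a case-insensitive substring of some value."""
    lowered = [v.lower() for v in values]
    return any(any(t.lower() in v for v in lowered) for t in terms)


terms = {"neoplasm", "immuno"}
both = ["Neoplasms", "Immunotherapy, Adoptive"]  # illustrative MeSH terms
only_one = ["Lung Neoplasms", "Humans"]

print(matches_all(terms, both))      # True
print(matches_all(terms, only_one))  # False: nothing contains 'immuno'
print(matches_any(terms, only_one))  # True: 'Lung Neoplasms' contains 'neoplasm'
```

The OR variant is therefore expected to return at least as many papers as the AND variant for the same terms.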

## 4. Filter by Keywords (Partial Matching)

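Find papers with at least one keyword containing 'cancer' (partial match). To require several terms at once, pass them all in the set, for example `{"checkpoint", "inhibitor"}`.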

In [None]:
filtered_keywords = filter_pmc_papers(
    papers,
    min_citations=5,
    required_keywords={"cancer"},
)
print(f"Papers with required keywords: {len(filtered_keywords)}")
if filtered_keywords:
    print('First result:')
    print(filtered_keywords[0])
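
Before choosing keyword terms, it can help to see which author keywords actually occur in the result set and how many papers have none. The sketch below assumes raw records store keywords under `keywordList` → `keyword`; the helper name and the sample records are hypothetical.

```python
from collections import Counter


def keyword_counts(papers):
    """Count, per normalised keyword, how many papers list it."""
    counts = Counter()
    for paper in papers:
        keywords = (paper.get("keywordList") or {}).get("keyword") or []
        # A set ensures each paper contributes at most once per keyword.
        counts.update({k.strip().lower() for k in keywords})
    return counts


# Hand-written records for illustration; not real data.
sample_papers = [
    {"keywordList": {"keyword": ["Immunotherapy", "PD-1"]}},
    {"keywordList": {"keyword": ["immunotherapy", "Checkpoint inhibitors"]}},
    {"title": "Record without author keywords"},
]

counts = keyword_counts(sample_papers)
without_keywords = sum(1 for p in sample_papers if not p.get("keywordList"))

print(counts.most_common(1))  # [('immunotherapy', 2)]
print(without_keywords)       # 1
```

Papers without author keywords cannot pass a keyword filter, which is one reason to compare the result against a MeSH-based filter.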

## 5. Filter by Abstract Content

Find papers whose abstract contains both 'immunity' and 'tumour'. The first cell inspects one abstract, and the last cell relaxes the filter to either term with `filter_pmc_papers_or`.

In [None]:
papers[0].get("abstractText")

In [None]:
filtered_abstract = filter_pmc_papers(
    papers,
    min_citations=0,
    required_abstract_terms={"immunity", "tumour"},
)
print(f"Papers with required abstract terms: {len(filtered_abstract)}")
if filtered_abstract:
    print('First result:')
    print(filtered_abstract[0])

In [None]:
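# OR variant: keep papers whose abstract contains at least one of the terms.
# Note that this overwrites the `filtered_abstract` result from the previous cell.
filtered_abstract = filter_pmc_papers_or(
    papers,
    min_citations=0,
    required_abstract_terms={"immunity", "tumour"},
)
print(f"Papers with any of the abstract terms: {len(filtered_abstract)}")
if filtered_abstract:
    print('First result:')
    print(filtered_abstract[0])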
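
If abstract terms are matched as substrings, the British spelling 'tumour' would not match abstracts that use 'tumor'. One way to handle this, sketched below with a hypothetical helper, is to treat each required term as a group of acceptable spellings: every group must be satisfied, but any variant within a group counts.

```python
def abstract_matches(abstract, term_groups):
    """True if, for every group, at least one variant occurs in the abstract."""
    text = (abstract or "").lower()  # tolerate records with no abstract
    return all(any(v.lower() in text for v in group) for group in term_groups)


# Each tuple lists interchangeable spellings of one required term.
groups = [("immunity",), ("tumour", "tumor")]

print(abstract_matches("Anti-tumor immunity depends on T cells.", groups))  # True
print(abstract_matches("Tumour growth was reduced.", groups))               # False
print(abstract_matches(None, groups))                                       # False
```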

## 6. Combine Multiple Filters for Precision

Keep only papers that meet all of the following criteria:

- At least 20 citations
- Published in 2021 or later
- Review or Systematic Review
- Open Access
- Keyword contains 'immuno'
- Abstract contains 'therapy'

In [None]:
filtered_combined = filter_pmc_papers(
    papers,
    min_citations=20,
    min_pub_year=2021,
    allowed_types=("Review", "Systematic Review"),
    open_access="Y",
    required_keywords={"immuno"},
    required_abstract_terms={"therapy"},
)
print(f"Papers meeting all criteria: {len(filtered_combined)}")
if filtered_combined:
    print('First result:')
    print(filtered_combined[0])
for i, paper in enumerate(filtered_combined[:2], 1):
    print(f"{i}. {paper['title']}")
    print(f"   Year: {paper['pubYear']}, Citations: {paper['citedByCount']}")
    print(f"   Type: {paper['pubType']}")
    print(f"   Open Access: {paper['isOpenAccess']}")
    print(f"   PMID: {paper.get('pmid', 'N/A')}, DOI: {paper.get('doi', 'N/A')}")
    print()
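
When a combined filter returns few or no papers, it is useful to know which criterion removed the most. The sketch below applies a list of predicates one after another and records how many papers survive each step. The `filter_funnel` helper, the field names, and the sample records are assumptions made for illustration and do not reflect how the library applies its filters internally.

```python
def filter_funnel(papers, steps):
    """Apply (label, predicate) steps in order; return survivors and per-step counts."""
    remaining = list(papers)
    report = [("start", len(remaining))]
    for label, predicate in steps:
        remaining = [p for p in remaining if predicate(p)]
        report.append((label, len(remaining)))
    return remaining, report


# Hand-written records for illustration; not real data.
records = [
    {"id": "a", "citedByCount": 50, "pubYear": "2022", "isOpenAccess": "Y"},
    {"id": "b", "citedByCount": 30, "pubYear": "2019", "isOpenAccess": "Y"},
    {"id": "c", "citedByCount": 5, "pubYear": "2023", "isOpenAccess": "Y"},
    {"id": "d", "citedByCount": 40, "pubYear": "2021", "isOpenAccess": "N"},
]

steps = [
    (">= 20 citations", lambda p: p["citedByCount"] >= 20),
    ("2021 or later", lambda p: int(p["pubYear"]) >= 2021),
    ("open access", lambda p: p["isOpenAccess"] == "Y"),
]

survivors, report = filter_funnel(records, steps)
for label, count in report:
    print(f"{label}: {count}")
# start: 4
# >= 20 citations: 3
# 2021 or later: 2
# open access: 1
```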

## Summary Statistics

In [None]:
print('Summary:')
print(f'Total papers found: {len(papers)}')
print(f'High-quality reviews: {len(filtered_reviews)}')
print(f'Papers with specific MeSH: {len(filtered_mesh)}')
print(f'Papers with specific keywords: {len(filtered_keywords)}')
print(f'Papers with abstract terms: {len(filtered_abstract)}')
print(f'Papers meeting all criteria: {len(filtered_combined)}')

## Filtering Tips

1. Use partial matching for flexibility (e.g., 'immuno' matches 'immunotherapy').
2. Combine filters to narrow results to high-quality papers.
3. Adjust `min_citations` based on your field (some fields have lower citation rates).
4. Use `resultType='core'` to get MeSH terms and full metadata.
5. MeSH terms are more standardized than keywords for biomedical topics.
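
For tip 3, one possible way to pick `min_citations` is to derive it from the citation counts of the papers already retrieved rather than fixing a number in advance. The sketch below uses the nearest-rank percentile method; the helper `citation_threshold` and the sample counts are hypothetical, and it assumes each record exposes `citedByCount`.

```python
import math


def citation_threshold(papers, percentile=75):
    """Return the citation count at the given percentile (nearest-rank method)."""
    counts = sorted(int(p.get("citedByCount") or 0) for p in papers)
    if not counts:
        return 0
    rank = math.ceil(percentile / 100 * len(counts))
    return counts[max(rank, 1) - 1]


# Hand-written citation counts for illustration; not real data.
batch = [{"citedByCount": c} for c in (0, 2, 5, 8, 12, 20, 45, 130)]

print(citation_threshold(batch))                 # 20
print(citation_threshold(batch, percentile=50))  # 8
```

The returned value could then be passed as `min_citations`, so that the cut-off adapts to fields where citation counts are generally lower.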