# Europe PMC Advanced Filtering Example

This notebook demonstrates how to use `filter_pmc_papers` and `filter_pmc_papers_or` to find high-quality scientific papers in Europe PMC search results using advanced filtering criteria such as citation counts, MeSH terms, keywords, and abstract content.

In [None]:
from pyeuropepmc import SearchClient, filter_pmc_papers, filter_pmc_papers_or

## 1. Perform a Broad Search

We start by searching for papers on **cancer immunotherapy**, requesting up to 500 results sorted by citation count. We use `resultType="core"` to retrieve full metadata, including MeSH terms, keywords, and abstracts.

In [None]:
client = SearchClient()
query = "cancer immunotherapy"
response = client.search(query, pageSize=500, resultType="core", sort="cited DESC")
papers = response.get("resultList", {}).get("result", []) if isinstance(response, dict) else []
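print(f'Papers retrieved: {len(papers)}')  # at most pageSize records, not the total hit count
if isinstance(response, dict):
    print(f"Total matching records in Europe PMC: {response.get('hitCount')}")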
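Core-format records nest MeSH headings, keywords and publication types inside wrapper objects. The sketch below flattens one record for quick inspection. It assumes the Europe PMC JSON layout (`meshHeadingList.meshHeading[].descriptorName`, `keywordList.keyword`, `pubTypeList.pubType`); the helper `summarize_record` is hypothetical and not part of pyeuropepmc.

```python
from typing import Any


def summarize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Flatten the nested fields of one core-format Europe PMC record.

    Hypothetical helper: assumes MeSH headings, keywords and publication
    types sit inside wrapper objects, as in the Europe PMC JSON layout.
    Missing wrappers yield empty lists instead of raising KeyError.
    """
    mesh = record.get("meshHeadingList", {}).get("meshHeading", [])
    return {
        "title": record.get("title", ""),
        "pubYear": record.get("pubYear"),
        "citedByCount": record.get("citedByCount", 0),
        # Keep only the descriptor name of each MeSH heading.
        "mesh": [h.get("descriptorName", "") for h in mesh],
        "keywords": record.get("keywordList", {}).get("keyword", []),
        "pubTypes": record.get("pubTypeList", {}).get("pubType", []),
    }


# A trimmed, made-up record in core format, for illustration only.
example = {
    "title": "Example review",
    "pubYear": "2021",
    "citedByCount": 42,
    "meshHeadingList": {"meshHeading": [{"descriptorName": "Neoplasms"},
                                        {"descriptorName": "Immunotherapy"}]},
    "keywordList": {"keyword": ["checkpoint inhibitor"]},
    "pubTypeList": {"pubType": ["Review", "Journal Article"]},
}
print(summarize_record(example)["mesh"])  # ['Neoplasms', 'Immunotherapy']
```

In the notebook you could call it as `summarize_record(papers[0])` to see at a glance which MeSH terms and keywords the later filters will be matching against.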

In [None]:
papers[:1]  # inspect the first raw core-format record rather than all of them

## 2. Filter for High-Quality Review Papers

- At least 10 citations
- Published in 2020 or later
- Type: Review or Systematic Review
- Open Access
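To make the four criteria concrete, here is a plain-Python predicate that applies them to a single raw core-format record. It is a hypothetical re-implementation for illustration, not pyeuropepmc's code. It assumes `pubYear` arrives as a string, `isOpenAccess` as `'Y'`/`'N'`, and publication types under `pubTypeList.pubType`; comparing types case-insensitively is a design choice of this sketch.

```python
from typing import Any, Iterable


def is_high_quality_review(
    record: dict[str, Any],
    min_citations: int = 10,
    min_pub_year: int = 2020,
    allowed_types: Iterable[str] = ("Review", "Systematic Review"),
    open_access: str = "Y",
) -> bool:
    """Hypothetical predicate mirroring the review criteria listed above."""
    # A missing citedByCount is treated as zero citations.
    if int(record.get("citedByCount") or 0) < min_citations:
        return False
    # pubYear is usually a string such as "2021"; non-numeric values fail.
    year = str(record.get("pubYear") or "")
    if not year.isdigit() or int(year) < min_pub_year:
        return False
    # Accept the record if any of its publication types is allowed.
    allowed = {t.lower() for t in allowed_types}
    types = record.get("pubTypeList", {}).get("pubType", [])
    if isinstance(types, str):
        types = [types]
    if not any(t.lower() in allowed for t in types):
        return False
    return record.get("isOpenAccess") == open_access
```

Applied as `[p for p in papers if is_high_quality_review(p)]`, it should give a count in the same ballpark as `filter_pmc_papers`, though the numbers can differ if the library normalizes fields differently.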

In [None]:
filtered_reviews = filter_pmc_papers(
    papers,
    min_citations=10,
    min_pub_year=2020,
    allowed_types=("Review", "Systematic Review"),
    open_access="Y",
)
print(f'High-quality reviews found: {len(filtered_reviews)}')
if filtered_reviews:
    print('First result:')
    print(filtered_reviews[0])

### Example Results

Below are the top 3 high-quality review papers found:

In [None]:
for i, paper in enumerate(filtered_reviews[:3], 1):
    print(f"{i}. {paper['title']}")
    print(f"   Year: {paper['pubYear']}, Citations: {paper['citedByCount']}")
    print(f"   Authors: {', '.join(paper['authors'][:3])}")
    if paper['keywords']:
        print(f"   Keywords: {', '.join(paper['keywords'][:5])}")
    print()

## 3. Filter by MeSH Terms: AND vs OR Logic

Let's compare AND vs OR logic with MeSH terms.

### AND Logic (filter_pmc_papers)
Papers must have MeSH terms containing BOTH 'neoplasm' AND 'immuno' (partial, case-insensitive match).

In [None]:
filtered_mesh_and = filter_pmc_papers(
    papers,
    min_citations=5,
    required_mesh={"neoplasm", "immuno"},
)
print(f"Papers with BOTH 'neoplasm' AND 'immuno' MeSH terms (AND logic): {len(filtered_mesh_and)}")
if filtered_mesh_and:
    print('First result:')
    print(f"  Title: {filtered_mesh_and[0]['title']}")
    print(f"  MeSH: {filtered_mesh_and[0].get('meshHeadingList', {}).get('meshHeading', [])[:5]}")

### OR Logic (filter_pmc_papers_or)
Papers can have MeSH terms containing EITHER 'neoplasm' OR 'immuno' (or both). This typically returns more results.

In [None]:
filtered_mesh_or = filter_pmc_papers_or(
    papers,
    min_citations=5,
    required_mesh={"neoplasm", "immuno"},
)
print(f"Papers with EITHER 'neoplasm' OR 'immuno' MeSH terms (OR logic): {len(filtered_mesh_or)}")
if filtered_mesh_or:
    print('First result:')
    print(f"  Title: {filtered_mesh_or[0]['title']}")
    print(f"  MeSH: {filtered_mesh_or[0].get('meshHeadingList', {}).get('meshHeading', [])[:5]}")

print(f"\nDifference: OR returned {len(filtered_mesh_or) - len(filtered_mesh_and)} more papers than AND")
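The difference between the two functions comes down to `all` versus `any` over the search terms. The sketch below is a hypothetical stand-alone version of that logic applied to a list of MeSH descriptor names, using the partial, case-insensitive matching described in this notebook; it is not the library's implementation, and treating an empty term set as "no constraint" is an assumption.

```python
def matches_terms(values: list[str], terms: set[str], mode: str = "and") -> bool:
    """Partial, case-insensitive matching of search terms against strings.

    mode="and": every term must appear inside at least one value.
    mode="or":  at least one term must appear inside at least one value.
    An empty term set is treated as "no constraint" and always matches.
    """
    if mode not in ("and", "or"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not terms:
        return True
    lowered = [v.lower() for v in values]
    # One boolean per term: does this term occur in any value?
    hits = [any(t.lower() in v for v in lowered) for t in terms]
    return all(hits) if mode == "and" else any(hits)


mesh = ["Neoplasms", "Immunotherapy", "Humans"]
print(matches_terms(mesh, {"neoplasm", "immuno"}, "and"))  # True
print(matches_terms(["Neoplasms", "Humans"], {"neoplasm", "immuno"}, "and"))  # False
print(matches_terms(["Neoplasms", "Humans"], {"neoplasm", "immuno"}, "or"))  # True
```

Because matching is by substring, a broad stem like 'immuno' is satisfied by 'Immunotherapy', 'Immunology' and similar headings, which is why short stems widen results noticeably.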

## 4. Filter by Keywords: AND vs OR Logic

### AND Logic
Papers must have keywords containing ALL specified terms (here, both 'checkpoint' and 'inhibitor').

In [None]:
filtered_keywords_and = filter_pmc_papers(
    papers,
    min_citations=5,
    required_keywords={"checkpoint", "inhibitor"},
)
print(f"Papers with BOTH 'checkpoint' AND 'inhibitor' keywords (AND logic): {len(filtered_keywords_and)}")
if filtered_keywords_and:
    print('First result:')
    print(f"  Title: {filtered_keywords_and[0]['title']}")
    print(f"  Keywords: {filtered_keywords_and[0].get('keywords', [])[:5]}")

### OR Logic
Papers can have keywords containing EITHER 'checkpoint' OR 'inhibitor' (or both).

In [None]:
filtered_keywords_or = filter_pmc_papers_or(
    papers,
    min_citations=5,
    required_keywords={"checkpoint", "inhibitor"},
)
print(f"Papers with EITHER 'checkpoint' OR 'inhibitor' keywords (OR logic): {len(filtered_keywords_or)}")
if filtered_keywords_or:
    print('First result:')
    print(f"  Title: {filtered_keywords_or[0]['title']}")
    print(f"  Keywords: {filtered_keywords_or[0].get('keywords', [])[:5]}")

print(f"\nDifference: OR returned {len(filtered_keywords_or) - len(filtered_keywords_and)} more papers than AND")
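The count difference says how many extra papers OR admitted, but not which ones. Assuming both functions apply the other thresholds identically, the OR results should contain the AND results, so the extras can be listed by identifier. This is a hypothetical helper; it assumes each paper dict carries a unique identifier under the key `"id"` (adjust `key` if your records use `pmid` or `doi` instead).

```python
from typing import Any


def added_by_or(
    or_results: list[dict[str, Any]],
    and_results: list[dict[str, Any]],
    key: str = "id",
) -> list[dict[str, Any]]:
    """Return papers present in the OR results but not in the AND results.

    Papers are compared by `key`; the order of the OR results is kept.
    Papers lacking the key are never treated as duplicates.
    """
    and_ids = {p.get(key) for p in and_results if p.get(key) is not None}
    return [p for p in or_results if p.get(key) not in and_ids]
```

For example, `for p in added_by_or(filtered_keywords_or, filtered_keywords_and)[:5]: print(p['title'])` shows a few papers that mention only one of the two keywords.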

## 5. Filter by Abstract Content: AND vs OR Logic

### AND Logic
Papers must have abstracts containing ALL specified terms (e.g., 'immunity' AND 'tumour').

In [None]:
papers[0].get("abstractText")
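Looking at raw abstract text is a useful reminder that substring matching is literal: 'tumour' does not occur inside 'tumor', so with the British spelling used here, abstracts written with American spelling are skipped. A shorter stem such as 'tumo' catches both. The snippet is a plain-Python illustration on made-up strings, not a call into pyeuropepmc.

```python
# Two made-up abstract sentences using different spellings.
abstracts = [
    "Checkpoint blockade reshapes anti-tumour immunity.",
    "T-cell immunity against solid tumors remains limited.",
]


def contains(text: str, term: str) -> bool:
    """Partial, case-insensitive substring test."""
    return term.lower() in text.lower()


for term in ("tumour", "tumor", "tumo"):
    print(term, [contains(a, term) for a in abstracts])
# tumour [True, False]
# tumor [False, True]
# tumo [True, True]
```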

In [None]:
filtered_abstract_and = filter_pmc_papers(
    papers,
    min_citations=0,
    required_abstract_terms={"immunity", "tumour"},
)
print(f"Papers with BOTH 'immunity' AND 'tumour' in abstract (AND logic): {len(filtered_abstract_and)}")
if filtered_abstract_and:
    print('First result:')
    print(f"  Title: {filtered_abstract_and[0]['title']}")
    abstract = (filtered_abstract_and[0].get('abstractText') or '')[:200]  # guard against a None abstract
    print(f"  Abstract preview: {abstract}...")

### OR Logic
Papers can have abstracts containing EITHER 'immunity' OR 'tumour' (or both).

In [None]:
filtered_abstract_or = filter_pmc_papers_or(
    papers,
    min_citations=0,
    required_abstract_terms={"immunity", "tumour"},
)
print(f"Papers with EITHER 'immunity' OR 'tumour' in abstract (OR logic): {len(filtered_abstract_or)}")
if filtered_abstract_or:
    print('First result:')
    print(f"  Title: {filtered_abstract_or[0]['title']}")
    abstract = (filtered_abstract_or[0].get('abstractText') or '')[:200]  # guard against a None abstract
    print(f"  Abstract preview: {abstract}...")

print(f"\nDifference: OR returned {len(filtered_abstract_or) - len(filtered_abstract_and)} more papers than AND")

## 6. Combine Multiple Filters for Precision

- At least 20 citations
- Published in 2021 or later
- Review or Systematic Review
- Open Access
- Keyword contains 'immuno'
- Abstract contains 'therapy'
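After a tight combined filter, the remaining set is usually small enough to rank and print as a reading list. This sketch sorts by citation count and formats one line per paper. It assumes the `title`, `pubYear`, `citedByCount`, `doi` and `pmid` keys used elsewhere in this notebook; `reading_list` is a hypothetical helper, not part of pyeuropepmc.

```python
from typing import Any


def reading_list(papers: list[dict[str, Any]], top: int = 5) -> list[str]:
    """Format the most-cited papers as short, human-readable lines."""
    ranked = sorted(
        papers,
        key=lambda p: int(p.get("citedByCount") or 0),
        reverse=True,
    )
    lines = []
    for p in ranked[:top]:
        # Prefer a DOI link; fall back to the PMID, then to "N/A".
        if p.get("doi"):
            ref = f"https://doi.org/{p['doi']}"
        elif p.get("pmid"):
            ref = f"PMID {p['pmid']}"
        else:
            ref = "N/A"
        cites = int(p.get("citedByCount") or 0)
        lines.append(f"{p.get('title', '(untitled)')} ({p.get('pubYear', '?')}), "
                     f"{cites} citations, {ref}")
    return lines
```

Usage: `print("\n".join(reading_list(filtered_combined)))`.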

In [None]:
filtered_combined = filter_pmc_papers(
    papers,
    min_citations=20,
    min_pub_year=2021,
    allowed_types=("Review", "Systematic Review"),
    open_access="Y",
    required_keywords={"immuno"},
    required_abstract_terms={"therapy"},
)
print(f"Papers meeting all criteria: {len(filtered_combined)}")
if filtered_combined:
    print('First result:')
    print(filtered_combined[0])
for i, paper in enumerate(filtered_combined[:2], 1):
    print(f"{i}. {paper['title']}")
    print(f"   Year: {paper['pubYear']}, Citations: {paper['citedByCount']}")
    print(f"   Type: {paper['pubType']}")
    print(f"   Open Access: {paper['isOpenAccess']}")
    print(f"   PMID: {paper.get('pmid', 'N/A')}, DOI: {paper.get('doi', 'N/A')}")
    print()

## Summary Statistics

In [None]:
print('Summary:')
print(f'Papers retrieved: {len(papers)}')
print(f'High-quality reviews: {len(filtered_reviews)}')
print('\nAND Logic Results:')
print(f'  Papers with MeSH (AND): {len(filtered_mesh_and)}')
print(f'  Papers with keywords (AND): {len(filtered_keywords_and)}')
print(f'  Papers with abstract terms (AND): {len(filtered_abstract_and)}')
print(f'  Papers meeting all criteria (AND): {len(filtered_combined)}')
print('\nOR Logic Results:')
print(f'  Papers with MeSH (OR): {len(filtered_mesh_or)}')
print(f'  Papers with keywords (OR): {len(filtered_keywords_or)}')
print(f'  Papers with abstract terms (OR): {len(filtered_abstract_or)}')
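Raw counts are easier to compare as a share of the retrieved set. A small hypothetical helper that turns counts into percentages, guarding against an empty search:

```python
def retention(counts: dict[str, int], total: int) -> dict[str, str]:
    """Express each filtered count as a percentage of the retrieved papers."""
    if total <= 0:
        # Nothing was retrieved, so percentages are undefined.
        return {name: "n/a" for name in counts}
    return {name: f"{100 * n / total:.1f}%" for name, n in counts.items()}
```

For example, `retention({"reviews": len(filtered_reviews), "all criteria": len(filtered_combined)}, len(papers))` shows how strongly each filter narrows the 500-paper sample.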

## Filtering Tips

1. **AND vs OR Logic:**
   - Use `filter_pmc_papers` (AND) for precise, specific results where all criteria must match
   - Use `filter_pmc_papers_or` (OR) for broad, exploratory searches where any criterion can match
   
2. **Partial Matching:** Both functions use partial, case-insensitive matching (e.g., 'immuno' matches 'immunotherapy').

3. **Combining Filters:** Combine multiple criteria to narrow results to high-quality papers.

4. **Citation Thresholds:** Adjust `min_citations` based on your field (some fields have lower citation rates).

5. **Full Metadata:** Use `resultType='core'` to get MeSH terms and full metadata in your search.

6. **MeSH vs Keywords:** MeSH terms are more standardized than keywords for biomedical topics.

7. **Multi-Criteria OR:** When using `filter_pmc_papers_or` with multiple criteria types (e.g., MeSH + keywords + abstract), a paper matches if it satisfies ANY of the criteria sets.
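Tip 7 is the easiest to misread, so here is a hypothetical sketch of the two combination rules across several criteria sets. It assumes, as the tips state, that AND requires every term in every set to match and that the OR variant accepts a paper when any term from any set matches. The flat record layout (lists of strings per field) is a simplification of this sketch, not the library's data model.

```python
from typing import Iterable


def _hit(values: Iterable[str], term: str) -> bool:
    """Partial, case-insensitive match of one term against a list of strings."""
    return any(term.lower() in v.lower() for v in values)


def match_all_sets(paper: dict[str, list[str]], criteria: dict[str, set[str]]) -> bool:
    """AND across sets: every term in every criteria set must match."""
    return all(_hit(paper.get(field, []), t)
               for field, terms in criteria.items() for t in terms)


def match_any_set(paper: dict[str, list[str]], criteria: dict[str, set[str]]) -> bool:
    """OR across sets: one matching term in any criteria set is enough."""
    return any(_hit(paper.get(field, []), t)
               for field, terms in criteria.items() for t in terms)


# The abstract is wrapped in a one-element list so every field is a list.
paper = {"mesh": ["Neoplasms"], "keywords": ["PD-1"],
         "abstract": ["...adoptive cell therapy..."]}
criteria = {"mesh": {"immuno"}, "keywords": {"checkpoint"}, "abstract": {"therapy"}}
print(match_all_sets(paper, criteria), match_any_set(paper, criteria))  # False True
```

Here the abstract term alone is enough for the OR rule, even though neither the MeSH nor the keyword criterion is met.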