# Using `search-query` for Literature Searches

<p style="max-width: 90ch; line-height: 1.5;">
  This notebook demonstrates how the <code>search-query</code> Python package supports reproducible and programmable academic search strategies by organizing the search process around a query object that can be created programmatically or parsed from an existing string or JSON file. Once created, a query can be <i>linted</i> to identify quality defects, such as syntactic errors, <i>translated</i> to adapt the query string to different database syntaxes (e.g., PubMed vs. Web of Science), <i>improved</i> to iteratively refine and strengthen the search formulation, and <i>automated</i> to run API searches within scripts, command-line workflows, or other environments. Throughout, queries can be saved to and loaded from JSON files, supporting versioning, reuse, and collaborative development of search strategies.
</p>

```mermaid
flowchart TD
    %% External artifact
    J[(JSON query file)]

    C[Create a query object]
    C .-> Q
    %% Query object as a subgraph
    subgraph Q[Query object]
        direction LR

        %% Any combination, starting right after create
        Auto[Automate]
        Trans[Translate]
        Imp[Improve]
        Lint[Lint]

        Imp <--> Lint
        Trans <--> Auto
        Imp <--> Auto
        Lint <--> Trans


    end

    %% Interfacing with file via annotated dotted lines (no Save/Load boxes)
    Q -. "save" .-> J
    J -. "load" .-> Q

    %% ===== Styling to resemble the example =====
    style J fill:#ffffff,stroke:#333,stroke-width:2px
    style C fill:#ffffff,stroke:#333,stroke-width:2px
    style Auto fill:#ffffff,stroke:#333,stroke-width:2px
    style Trans fill:#ffffff,stroke:#333,stroke-width:2px
    style Lint fill:#ffffff,stroke:#333,stroke-width:2px
    style Imp fill:#ffffff,stroke:#333,stroke-width:2px
    style Q fill:#f5f5f5,stroke:#666,stroke-width:1px
```

## Installation (if needed)

<p style="max-width: 90ch; line-height: 1.5;">
The <code>search-query</code> package should be installed automatically in Binder.
If you run this notebook locally and do not have <code>search-query</code> installed, uncomment and run the next cell.
</p>


In [None]:
# !pip install search-query

## Create a query object

A query object can be created in two ways: a) build it programmatically, or b) parse it from an existing query string.

### a) Programmatically



In [None]:
from search_query import OrQuery, AndQuery

digital_synonyms = OrQuery(["digital", "virtual", "online"], field="abstract")
work_synonyms = OrQuery(["work", "labor", "service"], field="abstract")

query = AndQuery([digital_synonyms, work_synonyms])

print(query.to_string())

When building queries programmatically, use **canonical generic field tokens** (e.g., `abstract`, `title`, `keywords`).

### b) Parse from a string

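Parsing platform-specific query strings is a core feature of the package: `parse()` takes the query string and a `platform` identifier and returns a query object.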

Example PubMed query:


In [None]:
from search_query.parser import parse

query_string = '("digital health"[Title/Abstract]) AND ("privacy"[Title/Abstract])'
pubmed_query = parse(query_string, platform="pubmed")

# `pubmed_query` is now a Query object that can be translated or rendered.
print(pubmed_query.to_string())

## Lint a query


<p style="max-width: 90ch; line-height: 1.5;">
Search queries are prone to subtle but impactful errors, ranging from unbalanced parentheses to unsupported fields or database-specific constraints. The <code>search-query</code> linters help detect such issues early and provide precise, actionable feedback, covering parsing errors, structural problems, term and field issues, as well as platform-specific constraints (e.g., PubMed or Web of Science).
</p>

<p style="max-width: 90ch; line-height: 1.5;">
By limiting fatal errors during exploratory workflows, you can surface these diagnostics without interrupting the overall analysis. This makes it easier to iteratively refine queries, compare variants, and understand quality defects before running searches in external databases.
</p>

<p style="max-width: 90ch; line-height: 1.5;">
In this example, we intentionally parse a malformed query (missing a closing parenthesis) to illustrate how the parser reports a fatal parsing error with a clear explanation and location hint. For a full overview of supported lint categories and best-practice checks, including parsing errors, query structure errors, term and field errors, database-specific constraints, and quality warnings, see the <a href="https://colrev-environment.github.io/search-query/lint/index.html">Lint documentation</a>.
</p>


In [None]:
bad_query = '("digital health"[Title/Abstract]) AND ("privacy"[Title/Abstract]'

try:
    parse(bad_query, platform="pubmed")
    print("❌ Unexpected: bad query parsed without a fatal error.")
except Exception as exc:
    print(f"\n✅ Linter demo: parse() raised an error (expected): {type(exc).__name__}")
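
To make the idea of a location hint concrete, here is a minimal standalone sketch of one such check: a parenthesis balance test that reports the offending position. It uses only the standard library and is not how `search-query` implements its linters; the function name `find_unbalanced_parenthesis` is hypothetical.

```python
from typing import Optional


def find_unbalanced_parenthesis(query: str) -> Optional[int]:
    """Return the index of the first unmatched parenthesis, or None if balanced.

    Illustrative only: parentheses inside quoted phrases are skipped, but
    no other platform-specific syntax is considered.
    """
    open_positions = []  # stack of indices of "(" still waiting for a ")"
    in_quotes = False
    for index, char in enumerate(query):
        if char == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue
        elif char == "(":
            open_positions.append(index)
        elif char == ")":
            if not open_positions:
                return index  # closing parenthesis without an opener
            open_positions.pop()
    # Any opener left on the stack was never closed; report the outermost one.
    return open_positions[0] if open_positions else None


bad_query = '("digital health"[Title/Abstract]) AND ("privacy"[Title/Abstract]'
position = find_unbalanced_parenthesis(bad_query)
if position is not None:
    # Print the query with a caret under the unmatched parenthesis.
    print(f"Unbalanced parenthesis at index {position}:")
    print(bad_query)
    print(" " * position + "^")
```

A real linter has to handle far more than this (field tags, operators, platform-specific rules), which is why delegating the check to the package is preferable to hand-rolled tests.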

## Translate a query

<p style="max-width: 90ch; line-height: 1.5;">
Systematic literature searches typically involve multiple databases (e.g., PubMed, Web of Science, EBSCOHost). Because each platform uses its own query syntax and field conventions, search strategies need to be translated and adapted accordingly to ensure comparable retrieval across sources.
</p>

<p style="max-width: 90ch; line-height: 1.5;">
Here, we translate a parsed PubMed query to Web of Science syntax. Depending on semantics, some fields may expand during translation: for example, PubMed <code>[Title/Abstract]</code> can map to <code>TI=</code> OR <code>AB=</code> in Web of Science.
</p>

In [None]:
query_string = '("digital health"[Title/Abstract]) AND ("privacy"[Title/Abstract])'
pubmed_query = parse(query_string, platform="pubmed")

wos_query = pubmed_query.translate(target_syntax="wos")

print(wos_query.to_string())
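
The field expansion described above can be illustrated with a toy string rewrite. This is only a sketch of the mapping, not the package's translator: `translate()` operates on the parsed query object, whereas the hypothetical helper `expand_title_abstract` below handles nothing but quoted phrases tagged with `[Title/Abstract]`, and its output formatting may differ from what the package produces.

```python
import re

# Matches a quoted phrase followed by PubMed's [Title/Abstract] field tag.
TIAB_TERM = re.compile(r'"([^"]+)"\[Title/Abstract\]')


def expand_title_abstract(pubmed_query: str) -> str:
    """Rewrite each "term"[Title/Abstract] as (TI="term" OR AB="term")."""
    return TIAB_TERM.sub(r'(TI="\1" OR AB="\1")', pubmed_query)


pubmed_string = '("digital health"[Title/Abstract]) AND ("privacy"[Title/Abstract])'
print(expand_title_abstract(pubmed_string))
```

The sketch shows why one PubMed term can become two Web of Science terms joined by `OR`: the source field covers both title and abstract, and the target syntax addresses them separately.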

## Improve and automate a query

<p style="max-width: 90ch; line-height: 1.5;">
Programmatic access to search queries enables a wide range of use cases related to both <strong>query improvement</strong> and <strong>automation</strong>.
</p>

<div style="max-width: 90ch; line-height: 1.5;">
<ul>
<li><strong>Query improvement</strong> typically focuses on <i>local</i> and exploratory workflows. It may involve systematic modifications, such as query expansion or structural simplification, followed by evaluating query performance on pre-classified datasets. This makes it possible to iteratively refine search queries and assess how different formulations affect recall and precision.</li>
<li><strong>Automation</strong>, in contrast, usually targets <i>online</i> workflows and external systems. Typical use cases include retrieving records from APIs (e.g., Crossref) or running multiple query variants against live databases to compare yields across research scopes, date restrictions, field specifications, or keyword combinations. Such experiments help understand, justify, and operationalize search strategies.</li>
</ul>
</div>
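
As a minimal sketch of the evaluation step mentioned above, recall and precision can be computed from the set of records a query variant retrieves and the set of records known to be relevant. The record identifiers and the helper name `recall_and_precision` are hypothetical; the code uses plain Python sets and does not call the package.

```python
def recall_and_precision(retrieved: set, relevant: set) -> tuple:
    """Return (recall, precision) for one query variant.

    recall    = relevant records retrieved / all relevant records
    precision = relevant records retrieved / all retrieved records
    """
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical pre-classified dataset: IDs of records judged relevant.
relevant = {"r1", "r2", "r3", "r4"}

# Hypothetical result sets of two query variants.
broad_variant = {"r1", "r2", "r3", "r5", "r6", "r7"}
narrow_variant = {"r1", "r2"}

for name, retrieved in [("broad", broad_variant), ("narrow", narrow_variant)]:
    recall, precision = recall_and_precision(retrieved, relevant)
    print(f"{name}: recall={recall:.2f}, precision={precision:.2f}")
```

In this made-up data the broad variant finds more of the relevant records at the cost of more noise, which is the trade-off that iterative query refinement tries to balance.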

<p style="max-width: 90ch; line-height: 1.5;">
To support these workflows, researchers can write Python code that programmatically interacts with the <code>search-query</code> package and query objects.
The documentation provides practical examples for both <a href="https://colrev-environment.github.io/search-query/improve.html">query improvement</a> and <a href="https://colrev-environment.github.io/search-query/automate.html">automation</a>.
</p>
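
One hedged illustration of such code is generating query variants across field specifications and keyword combinations before comparing their yields. The term lists (including "eHealth" and "confidentiality") and the helper `build_variant` are hypothetical examples, and the sketch only builds PubMed-style strings with the standard library.

```python
from itertools import product

# Hypothetical dimensions to vary: field tag and keyword choices.
fields = ["Title", "Title/Abstract"]
topic_terms = ["digital health", "eHealth"]
concern_terms = ["privacy", "confidentiality"]


def build_variant(field: str, topic: str, concern: str) -> str:
    """Build one PubMed-style query string for a field/keyword combination."""
    return f'("{topic}"[{field}]) AND ("{concern}"[{field}])'


# One variant per combination of field, topic term, and concern term.
variants = [
    build_variant(field, topic, concern)
    for field, topic, concern in product(fields, topic_terms, concern_terms)
]

for variant in variants:
    print(variant)
```

Each generated string could then be passed to `parse(..., platform="pubmed")` as shown earlier, so that every variant is linted before it is run against a live database.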


## Save a query JSON file

Saving a query as a JSON search file is useful for reproducible workflows and for sharing exact search strategies.


In [None]:
from search_query.parser import parse
from search_query import SearchFile

query_string = '("digital health"[Title]) AND ("privacy"[Title])'
pubmed_query = parse(query_string, platform="pubmed")

search_file = SearchFile(
    search_string=pubmed_query.to_string(),
    platform="pubmed",
    version="1",
    authors=[{"name": "Gerit Wagner"}],
    record_info={},
    date={}
)

out_path = "pubmed-search-file.json"
search_file.save(out_path)
print(f"✅ Saved: {out_path}")

## Load a query JSON file

<p style="max-width: 90ch; line-height: 1.5;">
This closes the loop: saving and loading queries enables iterative refinement over time, whether you update a search strategy, version it in a Git repository, or share exact queries with collaborators. The following example shows how to load a previously saved query file.
</p>



In [None]:
from search_query.search_file import load_search_file
from search_query.parser import parse

search = load_search_file("pubmed-search-file.json")
query = parse(search.search_string, platform=search.platform)

print("Loaded platform:", search.platform)
print(query.to_string())
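
Because the search file is plain JSON, the save and load round trip can also be sketched with the standard library alone. The dictionary below is a stand-in whose keys mirror the arguments passed to `SearchFile` earlier; the actual on-disk layout written by `SearchFile.save` is an assumption here and may differ.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for a search file; keys mirror the SearchFile arguments used above
# (assumed layout, not necessarily what the package writes).
search_record = {
    "search_string": '("digital health"[Title]) AND ("privacy"[Title])',
    "platform": "pubmed",
    "version": "1",
    "authors": [{"name": "Gerit Wagner"}],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example-search-file.json"
    # Save: serialize the record as indented, diff-friendly JSON.
    path.write_text(json.dumps(search_record, indent=2), encoding="utf-8")
    # Load: read the file back and decode it.
    reloaded = json.loads(path.read_text(encoding="utf-8"))

print("Round trip preserved the record:", reloaded == search_record)
```

Indented JSON keeps line-based diffs small, which helps when search files are versioned in a Git repository.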

---

## ✅ Completed: What we learned

🎉🎈 You have completed the `search-query` demo notebook. Good work! 🎈🎉

In this notebook, we walked through the full lifecycle of search queries:

- Create queries programmatically or parse them from strings / JSON files  
- Lint queries to detect quality defects early  
- Translate queries across platforms (e.g., PubMed ↔ Web of Science)  
- Save and reload queries as reusable JSON search files  

<p style="max-width: 90ch; line-height: 1.5;">
Together, these steps show how search queries can be treated as first-class, versionable research artifacts, supporting reproducible and transparent literature searches.
</p>
