# Search String Construction

This script generates a search string for a database of academic publications. The search string is built from **concepts**: a search typically combines several concepts, and a relevant publication must address all of them. Because different publications may use different terms for the same concept, each concept is expanded into a list of synonyms (e.g., "effort" might be referred to as "cost" in some publications).

The example below constructs a search string on the topic of *accuracy* of *effort estimation* for *software*. The string is designed to be executed on the [ACM Digital Library](https://dl.acm.org/search/advanced).

### Search fields

First, define all fields in which the search terms should appear. This could be a cumulative field such as `TITLE-ABS-KEY` or, if the database does not offer such a field (like the [ACM Guide to Computing Literature](https://libraries.acm.org/digital-library/acm-guide-to-computing-literature)), the individual fields `Title`, `Abstract`, and `Keyword`.

```python
fields = ["Title", "Abstract", "Keyword"]
```
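The clause syntax also depends on the database. As a minimal sketch, the hypothetical helper `field_clause` below takes the clause format as a template: the default mirrors the `Field:(terms)` form used for the ACM Digital Library in this notebook, while the `{field}({terms})` template is an assumption modelled on databases with a cumulative field such as Scopus's `TITLE-ABS-KEY`. Check the target database's documentation before relying on either form.

```python
def field_clause(field, terms, template="{field}:({terms})"):
    """Render one field clause; `template` encodes the (assumed) database syntax."""
    return template.format(field=field, terms=" OR ".join(terms))


synonyms = ["software", '"information system"']

# ACM Digital Library style, as used in this notebook
print(field_clause("Title", synonyms))
# Title:(software OR "information system")

# Assumed syntax for a database with a cumulative title/abstract/keyword field
print(field_clause("TITLE-ABS-KEY", synonyms, template="{field}({terms})"))
# TITLE-ABS-KEY(software OR "information system")
```

With a cumulative field, `fields` would contain a single entry, so each concept produces one clause instead of three.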

### Concepts and Synonyms

Second, identify all **concepts** that need to appear in those fields and construct **one list of synonyms for each concept**. In this example, the concepts *software*, *effort*, *estimation*, and *accuracy* must appear in the previously defined fields.

```python
terms_synonyms = [
    ['software', '"information system"'],
    ['effort', 'budget', 'cost'],
    ['estimat*', 'evaluat*', 'predict*'],
    ['precis*', 'accura*']
]
```
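Two conventions are visible in the list: multi-word synonyms such as `information system` are wrapped in double quotes so they are searched as a phrase rather than as separate words, and a trailing `*` serves as a wildcard for word endings. A minimal sketch, assuming you would rather keep the raw list free of embedded quotes, is a hypothetical `quote_phrase` helper that adds them automatically:

```python
def quote_phrase(term):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    already_quoted = term.startswith('"') and term.endswith('"')
    if " " in term and not already_quoted:
        return f'"{term}"'
    return term


# Hypothetical raw input without embedded quotes
raw_synonyms = [
    ["software", "information system"],
    ["effort", "budget", "cost"],
    ["estimat*", "evaluat*", "predict*"],
    ["precis*", "accura*"],
]
terms_synonyms = [[quote_phrase(t) for t in terms] for terms in raw_synonyms]
print(terms_synonyms[0])  # ['software', '"information system"']
```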

### String construction

Finally, assemble a search string that ensures that, *for every concept, at least one synonym is mentioned in at least one of the fields*.
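Written as a formula, with $C$ the set of concepts, $S_c$ the synonym list of concept $c$, and $F$ the set of fields (notation introduced here for illustration), the query has the form:

$$
\text{query} = \bigwedge_{c \in C} \; \bigvee_{f \in F} f{:}\Big( \bigvee_{t \in S_c} t \Big)
$$

With $|C| = 4$ concepts and $|F| = 3$ fields, this yields 4 AND-connected groups of 3 field clauses each, i.e. 12 field clauses in total.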

```python
# Concepts are ANDed; within a concept, every synonym list is ORed across all fields
search_string = "(" + ") AND (".join([" OR ".join([f'{field}:({" OR ".join(terms)})' for field in fields]) for terms in terms_synonyms]) + ")"
```
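The one-liner packs two nested joins into a single expression. An equivalent, more readable sketch spells out the two levels explicitly; the function name `build_search_string` is hypothetical and not part of the original notebook.

```python
fields = ["Title", "Abstract", "Keyword"]
terms_synonyms = [
    ['software', '"information system"'],
    ['effort', 'budget', 'cost'],
    ['estimat*', 'evaluat*', 'predict*'],
    ['precis*', 'accura*'],
]


def build_search_string(fields, terms_synonyms):
    """AND across concepts; OR across fields and synonyms within one concept."""
    concept_clauses = []
    for terms in terms_synonyms:
        synonyms = " OR ".join(terms)
        # The concept is satisfied if any field contains any of its synonyms
        field_clauses = [f"{field}:({synonyms})" for field in fields]
        concept_clauses.append("(" + " OR ".join(field_clauses) + ")")
    # Every concept must be satisfied
    return " AND ".join(concept_clauses)


search_string = build_search_string(fields, terms_synonyms)
```

For a non-empty list of concepts this produces the same string as the one-liner; wrapping it in a function makes it easier to reuse with other field lists or concept sets.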

```python
print(search_string)
```

```
(Title:(software OR "information system") OR Abstract:(software OR "information system") OR Keyword:(software OR "information system")) AND (Title:(effort OR budget OR cost) OR Abstract:(effort OR budget OR cost) OR Keyword:(effort OR budget OR cost)) AND (Title:(estimat* OR evaluat* OR predict*) OR Abstract:(estimat* OR evaluat* OR predict*) OR Keyword:(estimat* OR evaluat* OR predict*)) AND (Title:(precis* OR accura*) OR Abstract:(precis* OR accura*) OR Keyword:(precis* OR accura*))
```
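Before running the query, it can help to check the AND/OR structure against a few made-up records. The sketch below is a rough local approximation, not the ACM Digital Library's actual matching: it assumes case-insensitive whole-word matching, treats `*` as a wildcard via Python's `fnmatch`, and matches quoted terms as exact word sequences. The functions `term_matches` and `record_matches` and the sample records are hypothetical.

```python
import fnmatch
import re

fields = ["Title", "Abstract", "Keyword"]
terms_synonyms = [
    ['software', '"information system"'],
    ['effort', 'budget', 'cost'],
    ['estimat*', 'evaluat*', 'predict*'],
    ['precis*', 'accura*'],
]


def term_matches(term, text):
    """Approximate matching of one search term against one field's text."""
    words = re.findall(r"[a-z]+", text.lower())
    if term.startswith('"') and term.endswith('"'):
        # Quoted term: the exact word sequence must occur
        phrase = term.strip('"').lower()
        return f" {phrase} " in f" {' '.join(words)} "
    # Unquoted term: compare word by word; '*' matches any word ending
    return any(fnmatch.fnmatchcase(word, term.lower()) for word in words)


def record_matches(record, fields, terms_synonyms):
    """True if every concept has at least one synonym in at least one field."""
    return all(
        any(term_matches(term, record.get(field, ""))
            for field in fields for term in terms)
        for terms in terms_synonyms
    )


# Made-up records for illustration only
paper = {
    "Title": "Improving the accuracy of software cost prediction",
    "Abstract": "We compare several estimation models.",
    "Keyword": "effort estimation",
}
unrelated = {
    "Title": "Software testing in practice",
    "Abstract": "An industrial case study.",
    "Keyword": "testing",
}

print(record_matches(paper, fields, terms_synonyms))      # True
print(record_matches(unrelated, fields, terms_synonyms))  # False
```

The second record fails because no field mentions *effort*, *budget*, or *cost*, which is exactly the "every concept must be covered" rule the search string encodes.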
