Search API (v2)

Noah Santacruz edited this page Dec 5, 2023 · 9 revisions

POST /api/search-wrapper

A simpler alternative to Search API (v1) that exposes the main search functionality available on the site. Your POST request should include the header Content-Type: application/json; charset=utf-8. The POST body can include the following keys:

| Parameter | Type | Description |
| --- | --- | --- |
| query | str | (required) Your search query. |
| type | one_of('text', 'sheet') | (required) The index you want to query. See Search API (v2)#Index Types for an explanation of each index. |
| field | str | (default=exact) The field you want to query. Common fields are exact or naive_lemmatizer for the text and merged indexes. For the sheets index, you will usually query the content field. |
| source_proj | bool, str or list(str) | (default=false) By default, the ElasticSearch document is not returned. Specifying true returns the entire document. Specifying a str or list(str) returns a projection of the document onto the specified fields. |
| slop | int | (default=0) The maximum distance between query words in a matching document. 0 means an exact match must be found. |
| start | int | (default=0) For paginating results. The index of the first result to return. 0 means start at the first result. |
| size | int | (default=100) For paginating results. The maximum number of results to return, beginning at start. |
| filters | list(str) | (default=[]) A list of filters to apply to the results. Filters cannot include RegEx; any RegEx characters are escaped. Each filter is applied to the corresponding field in the filter_fields list. E.g. if filters is ["Passover", "Torah Talks"] and filter_fields is ["topics_en", "collections"], then the "Passover" filter is applied to the "topics_en" field and the "Torah Talks" filter is applied to the "collections" field. For text queries, filters always apply to the path field of documents. This essentially corresponds to the category path of the book in Sefaria's table of contents (there are some differences with regard to commentary paths). For sheet queries, filters can be applied to collections, topics_en or topics_he. These fields are explained under filter_fields. |
| filter_fields | list(str) | (default=[], required if filters is specified) Must be the same length as filters. Each entry names the field to which the corresponding entry in filters is applied. For queries of type text this has no effect, since text queries can only be filtered on one field (path, explained under filters). For sheet queries, the following fields can appear in filter_fields: collections (the collections that the sheet is in), topics_en (the topics for this sheet, translated into English), topics_he (the topics for this sheet, translated into Hebrew). |
| aggs | list(str) | (default=[]) List of fields to aggregate on. Common fields are path for the text type and group or topics for the sheet type. |
| sort_method | one_of('sort', 'score') | (default=sort) How to sort results. If sort, results are sorted according to sort_fields. If score, the value of the field in sort_fields is multiplied by the default ElasticSearch score. |
| sort_fields | list(str) | List of fields to sort on. If sort_method = 'score', this list should have exactly one item. Common fields to sort on are comp_date, order, pagesheetrank, dateCreated, views. |
| sort_reverse | bool | (default=false) Whether to reverse the sort applied on sort_fields. |
| sort_score_missing | float | (default=0) The value used when a document has no value for the field in sort_fields. |
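
Several of the constraints listed above can be checked on the client before a request is sent. The Python sketch below is an illustration only: `build_search_payload` is a hypothetical helper, not part of Sefaria's code, and it enforces just the rules stated in the parameter descriptions. The server may validate requests differently.

```python
from typing import Any, Dict, List, Optional


def build_search_payload(
    query: str,
    type_: str,
    field: Optional[str] = None,
    filters: Optional[List[str]] = None,
    filter_fields: Optional[List[str]] = None,
    sort_method: Optional[str] = None,
    sort_fields: Optional[List[str]] = None,
    start: int = 0,
    size: int = 100,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a POST body for /api/search-wrapper."""
    if type_ not in ("text", "sheet"):
        raise ValueError("type must be 'text' or 'sheet'")
    filters = filters or []
    filter_fields = filter_fields or []
    if len(filters) != len(filter_fields):
        raise ValueError("filter_fields must be the same length as filters")
    if sort_method == "score" and len(sort_fields or []) != 1:
        raise ValueError("sort_method 'score' expects exactly one sort field")

    payload: Dict[str, Any] = {
        "query": query,
        "type": type_,
        "start": start,
        "size": size,
    }
    # Optional keys are left out when unset so the documented defaults apply.
    if field is not None:
        payload["field"] = field
    if filters:
        payload["filters"] = filters
        payload["filter_fields"] = filter_fields
    if sort_method is not None:
        payload["sort_method"] = sort_method
    if sort_fields:
        payload["sort_fields"] = sort_fields
    return payload


# Sheet query filtered by topic and collection, using the values from the
# filters description.
sheet_payload = build_search_payload(
    "Moshe",
    "sheet",
    field="content",
    filters=["Passover", "Torah Talks"],
    filter_fields=["topics_en", "collections"],
)
```

Passing `filters` without a matching `filter_fields` list raises `ValueError` here, which surfaces the length mismatch before any network call is made.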

Examples

Search for 'Moshe' in any text.

POST /api/search-wrapper
{
  "query": "Moshe",
  "type": "text"
}

In cURL:

curl -X POST "https://www.sefaria.org/api/search-wrapper" -H "Content-Type: application/json; charset=utf-8" -d '{"query": "Moshe","type": "text"}'
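
The same request can be issued from Python using only the standard library. This is a minimal sketch, not an official client: `make_search_request` and `search` are hypothetical names, and the Content-Type header is set as described at the top of this page.

```python
import json
import urllib.request

SEARCH_URL = "https://www.sefaria.org/api/search-wrapper"


def make_search_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the search wrapper."""
    # ensure_ascii=False keeps Hebrew queries as UTF-8 text instead of \u escapes.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def search(payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    request = make_search_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; calling search() does.
request = make_search_request({"query": "Moshe", "type": "text"})
```

Calling `search({"query": "Moshe", "type": "text"})` would send the request and return the decoded response as a dict.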

Search for 'Moshe' in any source sheet. Note that field must be set to 'content' when querying sheets.

POST /api/search-wrapper
{
  "query": "Moshe",
  "type": "sheet",
  "field": "content"
}

In cURL:

curl -X POST "https://www.sefaria.org/api/search-wrapper" -H "Content-Type: application/json; charset=utf-8" -d '{"query": "Moshe","type": "sheet", "field": "content"}'

Inexact search for משה רבנו in any text.

Matched search terms may be separated by at most 10 words. Search terms can also carry prefixes and can be spelled either מלא or חסר (plene or defective spelling). Inexact search only works for Hebrew queries.

POST /api/search-wrapper
{
  "query": "משה רבנו",
  "type": "text",
  "field": "naive_lemmatizer",
  "slop": 10
}

In cURL:

curl -X POST "https://www.sefaria.org/api/search-wrapper" -H "Content-Type: application/json; charset=utf-8" -d '{"query": "משה רבנו","type": "text", "field": "naive_lemmatizer", "slop": 10}'

Search for 'Moshe' in Talmud Bavli, Masekhet Berakhot, OR anywhere in Midrash.

POST /api/search-wrapper
{
  "query": "Moshe",
  "type": "text",
  "filters": ["Talmud/Bavli/Seder Zeraim/Berakhot", "Midrash"],
  "filter_fields": ["path", "path"]
}

In cURL:

curl -X POST "https://www.sefaria.org/api/search-wrapper" -H "Content-Type: application/json; charset=utf-8" -d '{"query": "Moshe","type": "text", "filters": ["Talmud/Bavli/Seder Zeraim/Berakhot", "Midrash"], "filter_fields": ["path", "path"]}'
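
Because filters and filter_fields are parallel lists, it can be less error-prone to write each filter as a (field, value) pair and split the pairs just before sending. The Python sketch below illustrates this for a sheet query; `split_filters` is a hypothetical helper, and the filter values are the ones used in the filters description.

```python
from typing import Dict, List, Tuple


def split_filters(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Hypothetical helper: turn (field, value) pairs into parallel lists."""
    filter_fields = [field for field, _ in pairs]
    filters = [value for _, value in pairs]
    return {"filters": filters, "filter_fields": filter_fields}


body = {"query": "Moshe", "type": "sheet", "field": "content"}
# Each value is matched against the field it is paired with.
body.update(
    split_filters([("topics_en", "Passover"), ("collections", "Torah Talks")])
)
```

Keeping each value next to its field means the two lists cannot drift out of order or end up with different lengths.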

Return value

The API returns results in the standard ElasticSearch response format. See Search API (v1) for a brief explanation.
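
In a standard ElasticSearch response, the matching documents are listed under hits.hits. The Python sketch below combines that with the start and size parameters to walk through results page by page. It rests on assumptions: `iter_hits` is a hypothetical helper, `fetch` stands for any function that POSTs a payload and returns the decoded response, and the sample hits are made up so the code runs without network access.

```python
from typing import Callable, Dict, Iterator


def iter_hits(
    fetch: Callable[[Dict], Dict],
    payload: Dict,
    page_size: int = 100,
) -> Iterator[Dict]:
    """Yield hits page by page until a page comes back short."""
    start = 0
    while True:
        page = dict(payload, start=start, size=page_size)
        hits = fetch(page).get("hits", {}).get("hits", [])
        yield from hits
        if len(hits) < page_size:
            break  # a short (or empty) page means there are no more results
        start += page_size


# Stand-in for a real HTTP call: serves slices of five made-up hits.
FAKE_HITS = [{"_id": f"doc-{i}", "_score": 1.0} for i in range(5)]


def fake_fetch(page: Dict) -> Dict:
    window = FAKE_HITS[page["start"] : page["start"] + page["size"]]
    return {"hits": {"hits": window}}


ids = [
    hit["_id"]
    for hit in iter_hits(fake_fetch, {"query": "Moshe", "type": "text"}, page_size=2)
]
```

With a real `fetch`, each hit would also carry a `_source` key only if source_proj was set in the request.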