# Course 4, Module 3: Deep Dive into Elasticsearch Search and DSL

Now that we know how to perform basic CRUD operations, it's time to explore Elasticsearch's main feature: **search**. Simple searches can be passed as URI parameters, but the most powerful ones are written in a rich, JSON-based language called the **Query DSL (Domain Specific Language)**.

This notebook covers:
1.  How to bulk-insert data for our examples.
2.  The difference between **Query Context** and **Filter Context**.
3.  The most common and useful search queries (`match`, `term`, `range`).
4.  How to combine queries using the powerful **`bool`** query.

--- 
## 1. Setting Up the Demo Data

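First, we'll delete any old `books` index, then use the `_bulk` API to index several book documents in a single request. The bulk body is newline-delimited JSON: each document takes two lines (an action line, then the document source), and the body must end with a newline. One bulk request is far more efficient than indexing the documents one at a time.

**Command:**
```bash
# Remove any previous copy of the index (a 404 response just means it did not exist yet).
curl -X DELETE "localhost:9200/books?pretty"

# Index all four books in one request.
curl -X POST "localhost:9200/books/_bulk?pretty" -H 'Content-Type: application/json' -d'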
{"index":{"_id":"1"}}
{"title":"The Great Gatsby","author":"F. Scott Fitzgerald","year":1925,"summary":"A story of wealth, love, and the American dream.","tags":["classic","novel"]}
{"index":{"_id":"2"}}
{"title":"To Kill a Mockingbird","author":"Harper Lee","year":1960,"summary":"A powerful story about justice and racial inequality.","tags":["classic","fiction"]}
{"index":{"_id":"3"}}
{"title":"1984","author":"George Orwell","year":1949,"summary":"A dystopian novel about totalitarianism and surveillance.","tags":["classic","dystopian"]}
{"index":{"_id":"4"}}
{"title":"Pride and Prejudice","author":"Jane Austen","year":1813,"summary":"A romantic novel of manners and a story of love.","tags":["classic","romance"]}
'
```

**Expected Output:**
```json
{
  "took" : 25,
  "errors" : false,
  "items" : [ /* ... details about each item ... */ ]
}
```
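To make the two-lines-per-document layout concrete, here is a minimal sketch that generates a bulk body from a small list. The titles, the `title|year` input format, the `books_scratch` index name, and the `ES_URL` variable are all assumptions made for illustration; with `ES_URL` unset the script only prints the body, so it leaves the `books` index used in this notebook untouched.

```shell
# Sketch only: the titles, the "title|year" input format, the books_scratch
# index, and the ES_URL switch are assumptions made for illustration.

# Emit two NDJSON lines per input line: the action line, then the document source.
# No JSON escaping is done, so this only works for titles without quotes or backslashes.
make_bulk_body() {
  id=1
  while IFS='|' read -r title year; do
    printf '{"index":{"_id":"%s"}}\n' "$id"
    printf '{"title":"%s","year":%s}\n' "$title" "$year"
    id=$((id + 1))
  done
}

# Command substitution strips the trailing newline, so it is added back when sending.
bulk_body=$(printf '%s\n' 'Brave New World|1932' 'Fahrenheit 451|1953' | make_bulk_body)

if [ -n "${ES_URL:-}" ]; then
  # --data-binary @- reads the body from stdin and preserves its newlines.
  printf '%s\n' "$bulk_body" |
    curl -s -X POST "$ES_URL/books_scratch/_bulk?pretty" \
      -H 'Content-Type: application/x-ndjson' --data-binary @-
else
  printf '%s\n' "$bulk_body"
fi
```

Run as is, it prints four lines: an action line and a source line for each of the two documents. Setting `ES_URL=localhost:9200` would send the same body to the scratch index instead.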

--- 
## 2. Query Context vs. Filter Context

This is a critical concept in Elasticsearch. Every query clause runs in one of two contexts:

- **Query Context**: Asks "*How well* does this document match this query?" It calculates a relevance **`_score`** for each document. This is used for full-text search where you care about which result is the *best* match.

- **Filter Context**: Asks "*Does* this document match this query?" It's a simple yes/no question and does not calculate a score. Because no scoring is involved, filters are typically faster, and Elasticsearch can cache frequently used ones. This is used for exact matches on structured data (like years, tags, or IDs).

You will almost always use a mix of both in a single search request.
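The difference is easiest to see by running the same clause in both contexts. The sketch below, which assumes the demo data from section 1, places a `match` clause directly under `query` (query context) and then inside the `filter` clause of a `bool` query (filter context; `bool` is covered in section 4). The `ES_URL` variable is a hypothetical switch added for this example: leave it unset to only print the two request bodies, or set it to `localhost:9200` to send them.

```shell
# ES_URL is a hypothetical switch for this sketch: unset = print only,
# set (e.g. ES_URL=localhost:9200) = send the requests.

# Query context: the match clause sits directly under "query", so hits are scored.
scored_body='{
  "query": {
    "match": { "summary": "novel" }
  }
}'

# Filter context: the same clause inside bool.filter is a plain yes/no test.
filtered_body='{
  "query": {
    "bool": {
      "filter": [
        { "match": { "summary": "novel" } }
      ]
    }
  }
}'

for body in "$scored_body" "$filtered_body"; do
  if [ -n "${ES_URL:-}" ]; then
    curl -s -X GET "$ES_URL/books/_search?pretty" \
      -H 'Content-Type: application/json' -d "$body"
  else
    printf '%s\n' "$body"
  fi
done
```

Both requests should match the same two books (the summaries of "1984" and "Pride and Prejudice" contain `novel`). The first returns a positive `_score` for each hit, while the filter-only version reports a `_score` of `0.0` because nothing was scored.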

--- 
## 3. The Most Common Queries

Query DSL searches are sent as a JSON body in a `GET` (or `POST`) request to the `_search` endpoint.

### `match` Query (Query Context)

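The standard query for full-text search on a field. It analyzes the search string into individual terms and, by default, returns documents that contain **any** of them (the default operator is `OR`). Documents that match more terms, or rarer terms, receive a higher `_score`.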

**Command:**
```bash
curl -X GET "localhost:9200/books/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "summary": "love and justice"
    }
  }
}'
```

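**Expected Output (Simplified, `_score` values omitted):**
```json
{
  "hits": {
    "total": {"value": 4, "relation": "eq"},
    "hits": [
      {
        "_id": "2",
        "_source": { "title": "To Kill a Mockingbird", /* ... */ }
      },
      {
        "_id": "1",
        "_source": { "title": "The Great Gatsby", /* ... */ }
      },
      {
        "_id": "4",
        "_source": { "title": "Pride and Prejudice", /* ... */ }
      },
      {
        "_id": "3",
        "_source": { "title": "1984", /* ... */ }
      }
    ]
  }
}
```

All four documents match. The default standard analyzer does not remove `and`, so it is searched like any other term, and it appears in every summary. Each hit also carries a `_score`; exact values depend on the Elasticsearch version and index statistics, but on a default single-shard index the ordering should look like the one above. "To Kill a Mockingbird" ranks first because it contains `justice`, the rarest term; the two books that mention `love` follow; "1984" comes last because it matches only `and`.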
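When every term must be present, the `match` query accepts a long form with an `operator` parameter. The following is a minimal sketch assuming the demo data from section 1; the search string `story love` is chosen for illustration, and `ES_URL` is a hypothetical switch (unset prints the request body, `ES_URL=localhost:9200` sends it).

```shell
# ES_URL is a hypothetical switch for this sketch: unset = print only.

# Long form of match: "query" holds the search text, "operator" changes
# the default OR into AND, so both "story" and "love" are required.
body='{
  "query": {
    "match": {
      "summary": {
        "query": "story love",
        "operator": "and"
      }
    }
  }
}'

if [ -n "${ES_URL:-}" ]; then
  curl -s -X GET "$ES_URL/books/_search?pretty" \
    -H 'Content-Type: application/json' -d "$body"
else
  printf '%s\n' "$body"
fi
```

Against the four demo books this should narrow the results to "The Great Gatsby" and "Pride and Prejudice", the only summaries that contain both words. With the default `OR` operator, "To Kill a Mockingbird" would also match because its summary contains `story`.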

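### `term` Query (Typically Filter Context)

Used for finding an **exact** value in a field. The search value is not analyzed, so this works best on `keyword` fields (like tags) rather than analyzed `text` fields. `term` clauses are usually placed in filter context; the standalone example below puts the clause directly under `query`, so it still runs in query context and receives a `_score`.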

**Command:**
```bash
curl -X GET "localhost:9200/books/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "tags.keyword": "dystopian"
    }
  }
}'
```
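*(Note: With the default dynamic mapping, Elasticsearch indexes string values as a `text` field plus a `.keyword` sub-field that stores the unanalyzed value, which is what `term` needs for exact matching.)*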

**Expected Output:** This will return only the document for "1984".
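A common pitfall is running `term` against the analyzed `text` field instead of its `.keyword` sub-field. The sketch below contrasts the two, assuming the demo data from section 1 and the default dynamic mapping; `ES_URL` is a hypothetical switch (unset prints the request bodies, `ES_URL=localhost:9200` sends them).

```shell
# ES_URL is a hypothetical switch for this sketch: unset = print only.

# term on the analyzed text field: the indexed tokens are lowercase
# ("pride", "and", "prejudice"), so the full mixed-case string matches nothing.
text_body='{
  "query": {
    "term": { "title": "Pride and Prejudice" }
  }
}'

# term on the keyword sub-field: the whole original string is stored as one term.
keyword_body='{
  "query": {
    "term": { "title.keyword": "Pride and Prejudice" }
  }
}'

for body in "$text_body" "$keyword_body"; do
  if [ -n "${ES_URL:-}" ]; then
    curl -s -X GET "$ES_URL/books/_search?pretty" \
      -H 'Content-Type: application/json' -d "$body"
  else
    printf '%s\n' "$body"
  fi
done
```

Under those assumptions the first request should return no hits and the second should return only "Pride and Prejudice".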

--- 
## 4. Combining Queries with the `bool` Query

The `bool` query is the workhorse for building complex searches. It lets you combine multiple query clauses with boolean logic.

Let's find books that:
- **MUST** contain `story` in the summary (Query Context)
- **MUST NOT** be tagged `romance` (Filter Context)
- **SHOULD** preferably be by `George Orwell` (Query Context, boosts score)
- **FILTER**: have a `year` greater than `1940` (Filter Context)

**Command:**
```bash
curl -X GET "localhost:9200/books/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "summary": "story" } }
      ],
      "must_not": [
        { "term": { "tags.keyword": "romance" } }
      ],
      "should": [
        { "match": { "author": "George Orwell" } }
      ],
      "filter": [
        { "range": { "year": { "gt": 1940 } } }
      ]
    }
  }
}'
```

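**Expected Output:** This query returns a single document: "To Kill a Mockingbird". "The Great Gatsby" (1925) is removed by the `year` filter, "Pride and Prejudice" is removed by `must_not`, and "1984" is excluded because its summary does not contain `story`. A `should` clause only boosts the `_score` of documents that already satisfy the `must` and `filter` clauses; it cannot bring a document into the results on its own.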
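The `range` query also accepts both a lower and an upper bound. As a minimal sketch, assuming the demo data from section 1, the request below uses only filter clauses to find classics published in the 20th century. `ES_URL` is a hypothetical switch (unset prints the request body, `ES_URL=localhost:9200` sends it).

```shell
# ES_URL is a hypothetical switch for this sketch: unset = print only.

# Two filter clauses, both yes/no tests:
#   term  -> the book must carry the exact tag "classic"
#   range -> year must satisfy 1900 <= year <= 1999 (gte/lte are inclusive)
body='{
  "query": {
    "bool": {
      "filter": [
        { "term": { "tags.keyword": "classic" } },
        { "range": { "year": { "gte": 1900, "lte": 1999 } } }
      ]
    }
  }
}'

if [ -n "${ES_URL:-}" ]; then
  curl -s -X GET "$ES_URL/books/_search?pretty" \
    -H 'Content-Type: application/json' -d "$body"
else
  printf '%s\n' "$body"
fi
```

Against the demo data this should return "The Great Gatsby" (1925), "1984" (1949), and "To Kill a Mockingbird" (1960), each with a `_score` of `0.0` because no clause runs in query context.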

--- 
## Conclusion

This notebook introduced the power and flexibility of the Elasticsearch Query DSL. We learned:

- The crucial difference between **Query Context** (scored by relevance) and **Filter Context** (yes/no matching with no score).
- How to use fundamental queries like `match`, `term`, and `range`.
- How to build sophisticated searches by combining clauses in a **`bool` query**.

In the next notebook, we will learn how to execute these queries programmatically using the official Python client.