# Elasticsearch

## Basic Concepts

### Documents

* basically JSON objects that you search over
* each one has a unique ID
```
{
      "id": "XYZ123",
      "title": "The Great Gatsby",
      "author": "F. Scott Fitzgerald",
      "price": 10.99,
      "createdAt": "2024-01-01T00:00:00.000Z"
}
```

### Index

* a collection of documents
* searches run against an Index and return a list of matching documents

### Mappings and Fields

* Mapping = schema of the Index; it defines the fields the Index can have and their data types
    - determines which fields are searchable and how
* examples of field types in a mapping:
    - keyword type = treats the entire value as a single token
        * if your id = 123, you can only find it with the query 123
        * the query 12 would not return anything
        * think Hash map
    - text type = individual words or phrases of the text can be searched for
        * e.g. "the quick brown fox" can be found with "quick brown"
        * think Inverted Index

```
{
  "properties": {
    "id": { "type": "keyword" },
    "title": { "type": "text" },
    "author": { "type": "text" },
    "price": { "type": "float" },
    "createdAt": { "type": "date" }
  }
}
```
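
The "hash map vs. inverted index" intuition above can be sketched in a few lines of Python. This is a toy model, not how Elasticsearch is implemented: the function names are made up for illustration, tokenization is a plain lowercase split, and the text search requires every query word to be present (a real `match` query matches on any of the words by default and also scores results).

```python
from collections import defaultdict

def build_keyword_index(docs, field):
    """Map each exact field value to the set of document IDs holding it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        index[doc[field]].add(doc_id)
    return index

def build_text_index(docs, field):
    """Map each lowercased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in doc[field].lower().split():
            index[token].add(doc_id)
    return index

def search_keyword(index, query):
    """Exact lookup: the whole value must equal the query."""
    return index.get(query, set())

def search_text(index, query):
    """Return IDs of documents that contain every word of the query."""
    token_sets = [index.get(token, set()) for token in query.lower().split()]
    return set.intersection(*token_sets) if token_sets else set()

docs = {
    "123": {"id": "123", "title": "the quick brown fox"},
    "456": {"id": "456", "title": "a lazy dog"},
}
id_index = build_keyword_index(docs, "id")
title_index = build_text_index(docs, "title")

print(search_keyword(id_index, "123"))          # {'123'}
print(search_keyword(id_index, "12"))           # set()
print(search_text(title_index, "quick brown"))  # {'123'}
```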


* Mappings can affect the performance of your cluster
    - every mapped field adds overhead, so mapping many fields that are never searched wastes memory
    - __your Mapping does not have to include every field in your documents__
    - the `dynamic` setting determines what happens when a document contains a field that is not in the Mapping
        * dynamic: true => adds the new field to the Mapping
        * dynamic: false => ignores the new field: it is not added to the Mapping and is not searchable, but it stays in the document's `_source`
        * dynamic: strict => rejects the document with an error

```
// PUT users_index
{
    "mappings": {
        "dynamic": false, // IMPORTANT
        "properties": {
            "name": {
                "type": "text"
            },
            "createdAt": {
                "type": "date"
            }
        }
    }
}

// POST users_index/_doc
// "occupation" is not in the mapping, so with dynamic: false it is kept in _source but not indexed
{
    "name": "Alice",
    "createdAt": "2024-01-01T12:00:00Z",
    "occupation": "Engineer"
}
```
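
To make the three `dynamic` modes concrete, here is a small Python simulation. It is only a sketch under simplifying assumptions: `index_document` is a hypothetical helper, the mapping is a plain dict of field name to type, and newly discovered fields are always typed as `text` (real dynamic mapping infers types from the value).

```python
def index_document(mapping, doc, dynamic):
    """Return (indexed_fields, source) for a toy index using the given dynamic mode.

    mapping: dict of field name -> type; mutated in place when dynamic is True.
    dynamic: True, False, or the string "strict".
    """
    unknown = [name for name in doc if name not in mapping]
    if unknown and dynamic == "strict":
        # strict: refuse the whole document
        raise ValueError(f"unmapped fields rejected: {unknown}")
    if dynamic is True:
        for name in unknown:
            mapping[name] = "text"  # toy choice; real type detection is richer
    # Only mapped fields become searchable; the original document is kept whole.
    indexed = {name: value for name, value in doc.items() if name in mapping}
    return indexed, dict(doc)

doc = {"name": "Alice", "createdAt": "2024-01-01T12:00:00Z", "occupation": "Engineer"}
mapping = {"name": "text", "createdAt": "date"}

indexed, source = index_document(mapping, doc, dynamic=False)
print(sorted(indexed))         # ['createdAt', 'name']
print("occupation" in source)  # True
print("occupation" in mapping) # False
```

With `dynamic=True` the same call would add `occupation` to `mapping`, and with `dynamic="strict"` it would raise instead of indexing anything.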

## Basic Use

### Create an Index

```
// PUT /books
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
```

### Set a Mapping

* if most of the fields in your data do not need to be searchable, define an explicit Mapping for the index instead of relying on dynamic mapping
* in the mapping below, the `reviews` field has a type of `nested`
    - this means each review is a nested document with its own fields
    - whether to nest something depends entirely on its query patterns
        * if something is queried often but updated infrequently, you might want to nest it
        * this is similar to the normalization/denormalization tradeoff in SQL databases

```
// PUT /books/_mapping
{
  "properties": {
    "title": { "type": "text" },
    "author": { "type": "keyword" },
    "description": { "type": "text" },
    "price": { "type": "float" },
    "publish_date": { "type": "date" },
    "categories": { "type": "keyword" },
    "reviews": {
      "type": "nested", // IMPORTANT!!!
      "properties": {
        "user": { "type": "keyword" },
        "rating": { "type": "integer" },
        "comment": { "type": "text" }
      }
    }
  }
}
```
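
One reason `nested` matters, beyond the notes above: a plain array of objects is flattened into per-field value lists, so the link between one review's `user` and its `rating` is lost. The Python sketch below imitates that effect; the helper names are invented for illustration and it models only the matching logic, not the actual storage.

```python
def flatten_objects(objects):
    """Mimic a plain object array: each field collects values across all objects."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat

def match_flattened(objects, **criteria):
    """Each criterion may be satisfied by a different object (association lost)."""
    flat = flatten_objects(objects)
    return all(value in flat.get(key, []) for key, value in criteria.items())

def match_nested(objects, **criteria):
    """All criteria must be satisfied by the same object."""
    return any(
        all(obj.get(key) == value for key, value in criteria.items())
        for obj in objects
    )

reviews = [
    {"user": "reader1", "rating": 5},
    {"user": "reader2", "rating": 4},
]

# reader1 never gave a rating of 4, yet the flattened view reports a match.
print(match_flattened(reviews, user="reader1", rating=4))  # True
print(match_nested(reviews, user="reader1", rating=4))     # False
```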

### Add Documents

* simple POST request to the /_doc endpoint
* each request returns a document ID and data on how the document was persisted across the cluster
    - the `_version` field can be used to make later updates safe against concurrent writes (see Updating Documents)

```
// POST /books/_doc
{
  "title": "The Great Gatsby",
  "author": "F. Scott Fitzgerald",
  "description": "A novel about the American Dream in the Jazz Age",
  "price": 9.99,
  "publish_date": "1925-04-10",
  "categories": ["Classic", "Fiction"],
  "reviews": [
    {
      "user": "reader1",
      "rating": 5,
      "comment": "A masterpiece!"
    },
    {
      "user": "reader2",
      "rating": 4,
      "comment": "Beautifully written, but a bit sad."
    }
  ]
}

// RESPONSE
{
  "_index": "books",
  "_id": "kLEHMYkBq7V9x4qGJOnh",
  "_version": 1, // IMPORTANT!!!
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
```

### Updating Documents

* similar to creating a document, but you must specify the document ID in the URL
* if you pass `version` as a query parameter, you avoid overwriting changes made by someone else in the meantime
    - Elasticsearch compares the document's current version number with the one in the query parameter
    - if they match, the update proceeds
    - if not, it returns an error
    - it's a simple example of __Optimistic Concurrency Control__
    - note: Elasticsearch 7.0 and later only accept `version` with external versioning; with the default internal versioning, pass `if_seq_no` and `if_primary_term` (the `_seq_no` and `_primary_term` values from the response) instead
* you can use the `_update` endpoint to update only some fields instead of replacing the entire document

```
// PUT /books/_doc/kLEHMYkBq7V9x4qGJOnh
{
  "title": "To Kill a Mockingbird",
  "author": "Harper Lee",
  "description": "A novel about racial injustice in the American South",
  "price": 13.99,
  "publish_date": "1960-07-11",
  "categories": ["Classic", "Fiction"],
  "reviews": [
    {
      "user": "reader3",
      "rating": 5,
      "comment": "Powerful and moving."
    }
  ]
}

// PUT /books/_doc/kLEHMYkBq7V9x4qGJOnh?version=1
// (Elasticsearch 7+ with internal versioning: ?if_seq_no=0&if_primary_term=1)
...

// UPDATE ONLY PARTS OF THE DOCUMENT
// POST /books/_update/kLEHMYkBq7V9x4qGJOnh
{
  "doc": {
    "price": 14.99
  }
}
```
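
The version check and the partial update can be imitated with an in-memory store. This is a minimal sketch of Optimistic Concurrency Control, not Elasticsearch's implementation: `ToyIndex`, `VersionConflict`, and `expected_version` are hypothetical names, a single counter stands in for the real version bookkeeping, and the partial update is a shallow merge of top-level fields.

```python
class VersionConflict(Exception):
    """Raised when a conditional write targets a stale version."""

class ToyIndex:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        """Replace the whole document; optionally only if the version matches."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version

    def update(self, doc_id, partial):
        """Merge a few fields into the stored document and bump its version."""
        version, source = self._docs[doc_id]
        self._docs[doc_id] = (version + 1, {**source, **partial})
        return version + 1

index = ToyIndex()
v1 = index.put("book-1", {"title": "The Great Gatsby", "price": 9.99})
v2 = index.put("book-1", {"title": "To Kill a Mockingbird", "price": 13.99},
               expected_version=v1)

try:
    # A second writer still holding version 1 is rejected instead of overwriting.
    index.put("book-1", {"title": "stale write"}, expected_version=v1)
except VersionConflict as err:
    print("rejected:", err)  # rejected: expected version 1, found 2

v3 = index.update("book-1", {"price": 14.99})
print(index.get("book-1"))  # (3, {'title': 'To Kill a Mockingbird', 'price': 14.99})
```

The point of the design is that no lock is held between read and write: a writer simply states which version it last saw, and the store refuses the write if that version is no longer current.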

## Search