In [2]:
#!pip install -r requirements.txt

# <center> How to Elasticsearch </center>
This is only a short guide accompanying the presentation. For more information, see the [documentation](https://elasticsearch-py.readthedocs.io/en/v8.7.0/) or ask your AI of choice.

In [129]:
from elasticsearch import Elasticsearch
from os import path

## Establish Connection to the server

Elasticsearch has a number of security features, including:
+ Authentication: Controls access to the cluster with built-in user management.
+ Role-based access control (RBAC): Enables fine-grained permissions management through roles.
+ Encryption: Provides SSL/TLS for data in transit; encryption of data at rest is usually handled at the disk or filesystem level.

None of these are enabled for this tutorial. Keep this in mind when setting up your own Elasticsearch server or connecting to someone else's.
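When you do connect to a secured cluster, the Python client takes the credentials and the CA certificate as constructor arguments (`basic_auth`, `ca_certs`). The sketch below only assembles those keyword arguments, so it runs without a server. `client_kwargs` is a hypothetical helper, and the URL, user, password and certificate path in the usage note are placeholders for your own setup.

```python
def client_kwargs(url, user=None, password=None, ca_certs=None):
    """Collect keyword arguments for elasticsearch.Elasticsearch(**kwargs).

    Hypothetical helper: only options that were actually given are included,
    so an unsecured local server (as in this tutorial) just gets the URL.
    """
    kwargs = {"hosts": [url]}
    if user is not None and password is not None:
        # HTTP basic authentication with a built-in or custom user
        kwargs["basic_auth"] = (user, password)
    if ca_certs is not None:
        # path to the CA certificate used to verify the server's TLS certificate
        kwargs["ca_certs"] = ca_certs
    return kwargs
```

Usage would look like `Elasticsearch(**client_kwargs("https://localhost:9200", "elastic", "<password>", "http_ca.crt"))`; for the unsecured server in this tutorial, `client_kwargs("http://localhost:9200")` yields only `hosts`.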

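Elasticsearch and its client libraries change a lot between releases. The Python client is not backwards compatible with older servers: at the very least its major version must match the server's (8.x here), and using the same version as the server (8.7.0 in this tutorial) is the safest choice.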
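A quick sanity check is to compare the client's version string with the one the server reports. The sketch below is a hypothetical helper that works on plain version strings, so it runs without a server; it flags differing major versions as incompatible and reports anything short of an exact match.

```python
def parse_version(version: str) -> tuple:
    """Turn "8.7.0" into (8, 7, 0); a suffix such as "-SNAPSHOT" is ignored."""
    return tuple(int(part) for part in version.split("-")[0].split(".")[:3])


def check_versions(client_version: str, server_version: str) -> str:
    client, server = parse_version(client_version), parse_version(server_version)
    if client[0] != server[0]:
        return "incompatible: major versions differ"
    if client != server:
        return "same major version, but not identical"
    return "identical"
```

Once the client `es` from the next cell exists, a call like `check_versions(elasticsearch.__versionstr__, es.info()["version"]["number"])` should do the comparison, assuming the installed `elasticsearch` package exposes its version as `__versionstr__`.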

In [11]:
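if path.exists('secret.txt'):
    # the server URL is kept in a separate file that is not shared
    with open('secret.txt', 'r') as f:
        connection = f.read().strip()  # drop a trailing newline
else:
    connection = 'http://localhost:9200'

print("Trying to connect to:", connection)

# connect to the server; the client only talks to it on the first request
es = Elasticsearch([connection])
try:
    print("Connection successful\n", es.info())
except Exception as e:
    print("ERROR! Could not connect!\n", e)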

Trying to connect to: http://10.0.0.168:9200
Connection successful
 {'name': '5023a93705cc', 'cluster_name': 'docker-cluster', 'cluster_uuid': 'bU3iWkjPQ_u0F4VrBeHiXQ', 'version': {'number': '8.7.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '09520b59b6bc1057340b55750186466ea715e30e', 'build_date': '2023-03-27T16:31:09.816451435Z', 'build_snapshot': False, 'lucene_version': '9.5.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}


## Different terminology
Elasticsearch is a document-oriented database, and although there are similarities to relational databases, the way data is stored, indexed and queried is different. Elasticsearch is primarily designed for search and analysis, whereas relational databases are designed for structured data and relationships between tables.

| Elasticsearch      | Relational Database |
|--------------------|---------------------|
| Index              | Table               |
| Document           | Row                 |
| Field              | Column              |
| Mapping            | Schema              |
| Nested field       | Foreign Key / Join  |
| _id                | Primary Key         |
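To make the table concrete: a table row becomes a JSON document, its columns become fields, and its primary key becomes the `_id`. The sketch below is a hypothetical helper; the column names in the usage note are made up for illustration. It converts the key to a string because Elasticsearch stores every `_id` as a string.

```python
def row_to_document(columns, row, key_column):
    """Turn one relational row into an (_id, document) pair.

    Every column becomes a field of the document; the primary key value is
    also used as the _id, converted to a string.
    """
    document = dict(zip(columns, row))
    doc_id = str(document[key_column])
    return doc_id, document
```

For example, `row_to_document(["id", "name", "primary_type"], (25, "Pikachu", "Electric"), "id")` returns `("25", {"id": 25, "name": "Pikachu", "primary_type": "Electric"})`, which could then be stored with `es.index(index=..., id=doc_id, document=document)`. Keeping the key as a field as well is a design choice; it lets you search and sort on it like any other field.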

## Let's poke around in the database

Since there is no role-based access control, we can also see the system indices. We do not want to touch them in this tutorial, so we exclude them from the listing.
The pattern `"*"` matches every index; adding `"-.*"` excludes all indices whose names start with `'.'`, so we only see the four indices we want.
<!-- for index in es.indices.get_alias(index='*,-.*'):
    print(index) -->
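The comma-separated pattern is applied from left to right: `*` adds every name, then `-.*` removes the names starting with a dot again. The sketch below imitates this on the client side with `fnmatch`; it is only an approximation (the server additionally applies rules for hidden indices via `expand_wildcards`), and the index names in the test are hypothetical.

```python
from fnmatch import fnmatchcase


def resolve_indices(names, pattern):
    """Approximate how an index pattern like "*,-.*" selects index names.

    A plain part adds the matching names, a part starting with "-" removes
    the matching names from what has been selected so far.
    """
    selected = []
    for part in pattern.split(","):
        if part.startswith("-"):
            selected = [n for n in selected if not fnmatchcase(n, part[1:])]
        else:
            selected += [n for n in names if fnmatchcase(n, part) and n not in selected]
    return selected
```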

From the index pokemon, let us select a document and look at it.
Since we do not yet know what the id looks like, we have to use a search query. 

The object returned is an `ObjectApiResponse`, but we can navigate in it like in any other Python `dict`.
The response contains a lot of metadata, such as how long the query took (`took`). We are not interested in this for this short tutorial, so we only look at the document that was found.

To see the hits, we navigate into `['hits']`. Its `total` entry tells us how many documents match our query; `'relation': 'eq'` means this count is exact (`'gte'` would mean it is only a lower bound).<br>
We want to skip this information too, so we navigate one level deeper with `['hits']` again.

<!-- es.search(index='pokemon', size=1,query={"match_all": {}})['hits']['hits'] -->
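The same navigation can be wrapped in a small helper. The sketch below is hypothetical and is tested against a hand-written dictionary with the same structure as a search response; the values in that sample are illustrative, not taken from the server. Because `ObjectApiResponse` supports `[...]` access, the helper should also accept the result of `es.search(...)` directly.

```python
def summarize_hits(response):
    """Return (total, relation, sources) from a search response."""
    hits = response["hits"]
    total = hits["total"]["value"]
    relation = hits["total"]["relation"]  # "eq" = exact count, "gte" = lower bound
    sources = [hit["_source"] for hit in hits["hits"]]  # the documents themselves
    return total, relation, sources
```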

There are a few things we can learn from this document.
The `_id` is a string. In a relational database the primary key is usually an integer, but in Elasticsearch every `_id` is always a `string` (keep that in mind for later).
The `_score` indicates the relevance of the document. Since we matched all documents, they are all equally relevant and each gets a score of 1.0.
In `_source` we see the document data, i.e. the fields and their values.<br>
A Pokemon has the fields `[PokeIndex,Name,Primary Type,Secondary Type,Evolves From,Evolves Into,Notes]`.

Let's assume the `_id` follows the same pattern for every Pokemon. How do we fetch the last one, whose `_id` is `0151`?

### Search and get

<!-- dict(es.get(index='pokemon', id='0151')) -->
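Since every `_id` is a string, passing `id=151` would look up the string `"151"` and not find the document. Assuming all IDs are zero-padded to four digits like `0151`, a small hypothetical helper builds the right string from a Pokedex number.

```python
def pokemon_id(number: int) -> str:
    """Format a Pokedex number as a zero-padded, 4-character _id string."""
    return f"{number:04d}"
```

With it, `es.get(index='pokemon', id=pokemon_id(25))` should fetch document `0025`, provided the assumed ID pattern holds.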

Let's search for Pikachu.
<!-- dict(es.search(index='pokemon', query={
    "match":{
        "name":"Pikachu"
    }
}))['hits']['hits'] -->

Search for any document that contains the name Pikachu anywhere.
<!-- dict(es.search(index='pokemon', query={
    "query_string":{
        "query":"Pikachu"
    }
}))['hits'] -->

If we look at the 4 documents found, we see different scores. The more often Pikachu is mentioned, the higher the score.<br>
In the second document, Pikachu makes up most of a short field of its own, which scores higher than a mention somewhere in a long text field.<br>
The last two documents mention Pikachu only once, inside a longer text field, which lowers their relevance score.
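Both effects come from BM25, the default similarity in Elasticsearch (with `k1 = 1.2` and `b = 0.75`). The sketch below computes the classic BM25 score of one term in one field; Lucene's implementation differs in constant factors, which do not change the ranking. The numbers in the test are illustrative only.

```python
import math


def bm25_term_score(tf, field_len, avg_field_len, n_docs, docs_with_term, k1=1.2, b=0.75):
    """Classic BM25 score of a single query term in a single field.

    tf             -- how often the term occurs in this field
    field_len      -- number of terms in this field
    avg_field_len  -- average length of this field across the index
    n_docs         -- number of documents in the index
    docs_with_term -- number of documents containing the term
    """
    # rare terms are worth more than common ones
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # long fields are penalised, short fields are boosted
    length_norm = 1 - b + b * field_len / avg_field_len
    # term frequency helps, but with diminishing returns (saturation via k1)
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)
```

A one-word field containing only "Pikachu" therefore beats a single mention in a long text, and more mentions in fields of equal length raise the score.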

Here are some common query types:

| Query Type        | Description                                                                                      |
|-------------------|--------------------------------------------------------------------------------------------------|
| `match`           | Full-text search for a single field, analyzing the query text and matching against the document field. |
| `match_phrase`    | Searches for exact phrases in the text, with the words in the specified order.                     |
| `match_all`       | Returns all documents in the index, usually used for testing purposes.                            |
| `bool`            | A compound query that combines multiple queries using logical operators (e.g., `must`, `should`, `must_not`, `filter`). |
| `term`            | Searches for exact terms in a field without analyzing the query text.                             |
| `terms`           | Searches for multiple exact terms in a field without analyzing the query text.                    |
| `range`           | Searches for documents with field values within a specified range (e.g., dates, numbers).         |
| `prefix`          | Searches for documents with field values that start with the specified prefix.                    |
| `wildcard`        | Searches for documents with field values that match a pattern with wildcards.                     |
| `regexp`          | Searches for documents with field values that match a specified regular expression pattern.       |
| `fuzzy`           | Searches for documents with field values that are similar to the query text based on edit distance (e.g., "apple" and "appel"). |
| `nested`          | Searches for documents with matching nested objects in a field.                                   |
| `query_string`    | Allows using a query string to search for documents with a more flexible syntax, including logical operators, wildcards, and field-specific searches. |
| `geo_distance`    | Searches for documents with geo-points within a specified distance from a central point.          |
| `geo_bounding_box`| Searches for documents with geo-points within a specified bounding box.                            |
| `geo_polygon`     | Searches for documents with geo-points within a specified polygon (deprecated since 7.12 in favor of `geo_shape`). |
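Several of these query types can be combined in one `bool` query. The sketch below builds such a query for the `trainer` index as a plain dict; `trainer_query` is a hypothetical helper, and it assumes `first_name` and `badges` are `text` fields (indexed as lowercased tokens) and `age` is numeric. `prefix` and `fuzzy` do not analyze their input, which is why the helper lowercases it.

```python
def trainer_query(name_prefix, min_age, max_age, badge_guess):
    """Build a compound query: prefix + fuzzy (scored) and range (filter)."""
    return {
        "bool": {
            "must": [
                # prefix: first name starts with the given letters
                {"prefix": {"first_name": name_prefix.lower()}},
                # fuzzy: tolerates typos; "AUTO" derives the allowed edit distance from the term length
                {"fuzzy": {"badges": {"value": badge_guess.lower(), "fuzziness": "AUTO"}}},
            ],
            "filter": [
                # range in filter context: a yes/no condition that does not affect the score
                {"range": {"age": {"gte": min_age, "lte": max_age}}},
            ],
        }
    }
```

It would be run as `es.search(index='trainer', query=trainer_query("Pat", 18, 70, "Expret"))`, where the misspelled badge word is meant to be caught by the fuzzy query.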

Let's get more complex and search for Pokemon without evolutions.
For this we need a compound query.
Be careful to use `match_phrase`: a plain `match` query splits "No evolution" into the terms `no` and `evolution` and, by default, returns documents that contain either one, which is not what we want here.

<!-- es.search(index='pokemon', query={
    "bool": {
        "must": [
            {"match_phrase": {"Evolves from": "No evolution"}},
            {"match_phrase": {"Evolves into": "No evolution"}}
            ]
        }
    }
)['hits'] -->
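The difference between `match` and `match_phrase` can be imitated in plain Python. This is a simplified model, not how Elasticsearch is implemented: `analyze` is a rough stand-in for the standard analyzer (lowercase, split on non-word characters), `match` uses the default OR semantics, and `match_phrase` requires the terms to appear consecutively and in order.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split into words."""
    return re.findall(r"\w+", text.lower())


def match(query, text):
    """match query with the default OR operator: any query term is enough."""
    terms = set(analyze(text))
    return any(term in terms for term in analyze(query))


def match_phrase(query, text):
    """match_phrase: all query terms, next to each other, in the same order."""
    q, t = analyze(query), analyze(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))
```

A text such as "Needs a stone for evolution" matches `match("No evolution", ...)` but not `match_phrase("No evolution", ...)`.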

Try it yourself!<br>
Which trainer has the badge `Clown`?<br>
Hint: look in the index `trainer`.

<!-- dict(es.search(index='trainer', query={
    "match":{
        "badges":"Clown"
    }
}))['hits']['hits'] -->

## Adding Stuff
Add yourself as a trainer

First we need to know how a trainer document is structured and which rules (the mapping) we have to follow.
<!-- es.indices.get_mapping(index='trainer')['trainer'] -->
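The mapping is a nested dict, which gets hard to read for object or `nested` fields. The hypothetical helper below flattens the `properties` part into `(field path, type)` pairs; it ignores multi-fields such as `.keyword` sub-fields. The sample mapping in the test is made up for illustration and is not the real `trainer` mapping.

```python
def flatten_mapping(properties, prefix=""):
    """List (field path, type) pairs from the "properties" part of a mapping."""
    fields = []
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # object or nested field: record its type (nested has one), then recurse
            if "type" in spec:
                fields.append((path, spec["type"]))
            fields.extend(flatten_mapping(spec["properties"], path + "."))
        else:
            fields.append((path, spec.get("type", "object")))
    return fields
```

Against the server, the call would be `flatten_mapping(es.indices.get_mapping(index='trainer')['trainer']['mappings']['properties'])`.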

Now that we know the mapping properties, we will add ourselves as a trainer. For a better overview, we first put the document into a temporary dict.

<!-- tmpDocument={
  "first_name": "Patrick",
  "last_name": "Somone",
  "age": 69,
  "gender": "male",
  "pokemon": [
    {
      "id": 132,
      "name": "Pinker ritter",
      "level": 99,
      "hp": "1337"
    }
  ],
  "badges": ['Cool guy','Data Scientist','Elasticsearch Expert']
}
es.index(index='trainer',id="Something", document=tmpDocument) -->
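With the default dynamic mapping, Elasticsearch does not reject fields it does not know yet; it adds them to the mapping. A typo in a field name therefore silently creates a new field. The hypothetical helper below checks a document against the mapping's `properties` before indexing; the sample mapping and document in the test are illustrative.

```python
def unknown_fields(document, properties, prefix=""):
    """Return the field paths in `document` that `properties` does not define."""
    unknown = []
    for name, value in document.items():
        path = prefix + name
        if name not in properties:
            unknown.append(path)
            continue
        sub = properties[name].get("properties")
        if sub is None:
            continue  # a leaf field such as text or integer
        # object/nested field: check every inner object (a list holds several)
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for p in unknown_fields(item, sub, path + "."):
                    if p not in unknown:
                        unknown.append(p)
    return unknown
```

Note also that indexing again with the same explicit `id` overwrites the stored document instead of creating a second one.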

<!-- es.search(index='trainer', size=10,query={"match_all": {}})['hits']['hits'] -->

<!-- es.delete(index="trainer", id="Something") -->

## Create your own index

<!-- mapping = {
    "properties": {
        "member": {"type": "text"},
        "pokemon": {"type": "text"},
        "blast_offs":{"type":"integer"}
    }
}

es.indices.create(index='team_rocket', mappings=mapping) -->
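Running `es.indices.create` a second time fails, because the server answers with a `resource_already_exists_exception`. A common pattern is to check first. The sketch below is a hypothetical helper built on `indices.exists` and `indices.create`; in the test it runs against a tiny fake client so that no server is needed.

```python
def ensure_index(client, name, mappings):
    """Create index `name` with `mappings` unless it already exists.

    Returns True if the index was created, False if it was already there.
    """
    if client.indices.exists(index=name):
        return False
    client.indices.create(index=name, mappings=mappings)
    return True
```

With the real client this would be `ensure_index(es, 'team_rocket', mapping)`. The check and the creation are two separate requests, so this is fine for a notebook but not safe against several processes creating the same index at once.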

### Fill your index with one example document
<!-- es.index(index='team_rocket', document={"member": "Jessie",
                                        "pokemon": "Arbok",
                                        "blast_offs": 123456}
        ) -->
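For more than a handful of documents, one request per document gets slow; the `elasticsearch.helpers.bulk` helper sends many at once from an iterable of action dicts. The generator below is a hypothetical sketch that builds such actions; the member tuples in the test are illustrative, with placeholder `blast_offs` counts.

```python
def rocket_actions(members, index="team_rocket"):
    """Yield one bulk index action per (member, pokemon, blast_offs) tuple."""
    for member, pokemon, blast_offs in members:
        yield {
            "_index": index,
            # no "_id" given, so Elasticsearch generates one, as with es.index() above
            "_source": {"member": member, "pokemon": pokemon, "blast_offs": blast_offs},
        }
```

It would be used as `helpers.bulk(es, rocket_actions([...]))` after `from elasticsearch import helpers`. New documents only become searchable after the next refresh (by default about once per second); passing `refresh="wait_for"` to `helpers.bulk` should be forwarded to the bulk API and make the call wait for that refresh.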

<!-- es.search(index='team_rocket', size=10,query={"match_all": {}})['hits']['hits'] -->

[{'_index': 'team_rocket',
  '_id': 'WAM_q4cB2o45eqWzf0H3',
  '_score': 1.0,
  '_source': {'member': 'Jessie', 'pokemon': 'Arbok', 'blast_offs': 123456}}]