In [2]:
#!pip install -r requirements.txt

# <center> How to Elasticsearch </center>
This is only a short guide accompanying the presentation. For more information, see the [documentation](https://elasticsearch-py.readthedocs.io/en/v8.7.0/) or ask the AI of your choice.

In [24]:
from elasticsearch import Elasticsearch
from os import path
from json import dumps as jdumps

## Establish a connection to the server

Elasticsearch offers several security features:
+ Authentication: controls access to the cluster with built-in user management.
+ Role-based access control (RBAC): enables fine-grained permission management through roles.
+ Encryption: provides SSL/TLS for data in transit; data at rest is typically protected with disk-level encryption.

None of these are enabled for this tutorial. Keep this in mind when setting up your own Elasticsearch server or connecting to someone else's.

Elasticsearch and its client libraries change a lot between versions. Use a Python client with the same major version as the server (here 8.x) and a minor version no newer than the server's; the clients are not compatible across major versions!
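To check this from code, the version numbers can be compared. The following is a minimal sketch of a conservative check (same major version, client minor version not newer than the server's). The helper `client_compatible` is hypothetical; the server version is read from the dict returned by `es.info()`, and with elasticsearch-py installed the client version should be available as `elasticsearch.__versionstr__`.

```python
def _major_minor(version):
    """Split a version string like '8.7.0' into (8, 7)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def client_compatible(client_version, server_info):
    """Conservative check: same major version, client minor <= server minor.

    server_info is the dict returned by es.info().
    """
    c_major, c_minor = _major_minor(client_version)
    s_major, s_minor = _major_minor(server_info["version"]["number"])
    return c_major == s_major and c_minor <= s_minor


# Trimmed copy of the es.info() output shown further down
info = {"version": {"number": "8.7.0"}}
print(client_compatible("8.7.0", info))   # True
print(client_compatible("7.17.9", info))  # False
```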

In [11]:
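# Read the server URL from secret.txt if it exists, otherwise use a local default
if path.exists('secret.txt'):
    with open('secret.txt', 'r') as f:
        connection = f.read().strip()  # remove a trailing newline
else:
    connection = 'http://localhost:9200'

# Create the client (this does not contact the server yet)
print("Trying to connect to:", connection)
es = Elasticsearch([connection])
try:
    # es.info() is the first real request; it fails if the server is unreachable
    print("Connection successful\n", es.info())
except Exception as e:
    print("ERROR! Could not connect!\n", e)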

Trying to connect to: http://10.0.0.168:9200
Connection successful
 {'name': '5023a93705cc', 'cluster_name': 'docker-cluster', 'cluster_uuid': 'bU3iWkjPQ_u0F4VrBeHiXQ', 'version': {'number': '8.7.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '09520b59b6bc1057340b55750186466ea715e30e', 'build_date': '2023-03-27T16:31:09.816451435Z', 'build_snapshot': False, 'lucene_version': '9.5.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}


## Different terminology
Elasticsearch is a document-oriented database, and although there are similarities to relational databases, the way data is stored, indexed and queried is different. Elasticsearch is primarily designed for search and analysis, whereas relational databases are designed for structured data and relationships between tables.

| Elasticsearch      | Relational Database |
|--------------------|---------------------|
| Index              | Table               |
| Document           | Row                 |
| Field              | Column              |
| Mapping            | Schema              |
| Nested field       | Foreign Key / Join  |
| _id                | Primary Key         |
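To make the last two rows of the table more concrete, here is a minimal sketch that turns two relational "tables" (a hypothetical trainer table and an ownership table with a foreign key) into one document per trainer. The data and field names are invented for illustration and are not the schema of the `trainer` index used in this notebook.

```python
def rows_to_documents(trainers, ownerships):
    """Denormalise two 'tables' into one document per trainer.

    trainers:   list of (trainer_id, name) rows
    ownerships: list of (trainer_id, pokemon_name) rows (the foreign key)
    """
    docs = {}
    for trainer_id, name in trainers:
        # _id values are strings in Elasticsearch, so convert the key
        docs[str(trainer_id)] = {"name": name, "pokemon": []}
    for trainer_id, pokemon_name in ownerships:
        # Instead of a join at query time, related rows are embedded
        docs[str(trainer_id)]["pokemon"].append({"name": pokemon_name})
    return docs


# Hypothetical example data
trainers = [(1, "Ash"), (2, "Misty")]
ownerships = [(1, "Pikachu"), (2, "Staryu"), (1, "Bulbasaur")]
print(rows_to_documents(trainers, ownerships))
```

To query each embedded object independently (e.g. matching two fields of the same Pokemon), the `pokemon` field would have to be mapped with type `nested`; otherwise Elasticsearch flattens arrays of objects.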

## Let's poke around in the database

Since no user roles are configured, we can also see the system indices, which we do not want to touch in this tutorial, so we exclude them from the search.
The pattern `"*"` matches every index. Combining it with `"-.*"`, which excludes indices whose names start with `'.'`, leaves only the four indices we want.
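The include/exclude logic of such patterns can be sketched in plain Python. This is only a rough emulation using `fnmatch`, not Elasticsearch's actual index resolver, and the system index names `.security-7` and `.kibana_1` below are hypothetical examples.

```python
from fnmatch import fnmatchcase


def resolve_indices(expression, all_indices):
    """Rough emulation of comma-separated index patterns.

    Patterns are applied left to right: a plain pattern adds the matching
    indices, a pattern prefixed with '-' removes matching ones again.
    """
    selected = []
    for pattern in expression.split(","):
        if pattern.startswith("-"):
            selected = [i for i in selected if not fnmatchcase(i, pattern[1:])]
        else:
            selected += [i for i in all_indices
                         if fnmatchcase(i, pattern) and i not in selected]
    return selected


indices = ["poke_market", "pokemon", "trainer", "arena", ".security-7", ".kibana_1"]
print(resolve_indices("*,-.*", indices))  # only the four tutorial indices
```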

In [98]:
for index in es.indices.get_alias(index='*,-.*'):
    print(index)

poke_market
pokemon
trainer
arena


From the `pokemon` index, let us select a document and look at it.
Since we do not yet know what the IDs look like, we have to use a search query.

The returned object is an `ObjectApiResponse`, but we can navigate it like any other Python `dict`.
The response contains a lot of metadata, such as how long the query took. We do not need it for this short tutorial, so we go straight to the documents that were found.

To see the hits, we navigate into `['hits']`. Its `total` entry tells us how many documents match the query; `'relation': 'eq'` means this count is exact (`'gte'` would mean it is only a lower bound).<br>
We do not need this count either, so we navigate one level deeper with `['hits']` again to get the list of matching documents.
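The same navigation works on a plain dict. The sketch below uses a hand-written, trimmed response (values for illustration only) and a hypothetical helper `summarize_hits`; with a real `ObjectApiResponse` the indexing is identical.

```python
def summarize_hits(response):
    """Return (total, is_exact, sources) from a search response."""
    total = response["hits"]["total"]
    # 'eq' means the count is exact, 'gte' means it is only a lower bound
    is_exact = total["relation"] == "eq"
    # The documents themselves live one level deeper, in hits.hits
    sources = [hit["_source"] for hit in response["hits"]["hits"]]
    return total["value"], is_exact, sources


# Hand-written, trimmed response for illustration
response = {
    "took": 3,  # query time in milliseconds
    "timed_out": False,
    "hits": {
        "total": {"value": 4, "relation": "eq"},
        "max_score": 6.25,
        "hits": [
            {"_index": "pokemon", "_id": "0025", "_score": 6.25,
             "_source": {"PokeIndex": "0025", "name": "Pikachu"}},
        ],
    },
}
print(summarize_hits(response))
```

By default Elasticsearch counts hits exactly only up to 10,000 (the `track_total_hits` setting); above that, `relation` becomes `'gte'`.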

In [55]:
es.search(index='pokemon', size=1,query={"match_all": {}})['hits']['hits']

[{'_index': 'pokemon',
  '_id': '0001',
  '_score': 1.0,
  '_source': {'PokeIndex': '0001',
   'name': 'Bulbasaur',
   'Primary type': 'Grass',
   'Secondary type': 'Poison',
   'Evolves from': 'Beginning of evolution',
   'Evolves into': 'Ivysaur (#002)',
   'Notes': 'It uses the nutrients that are stored in the seed on its back in order to grow. The reception of Bulbasaur has been largely positive and it often appears in "top Pokémon lists".[3][4] Its English name is a portmanteau of "bulb" and "dinosaur".[5]'}}]

There are a few things we can learn from this document.
The `_id` is a string. In a relational database the primary key is usually an integer, but in Elasticsearch every `_id` is always a `string` (keep that in mind for later).
The `_score` indicates the relevance of the document. Since `match_all` matches every document equally, all of them get a score of 1.0.
In `_source` we see the document data, i.e. the fields and their values.<br>
A Pokemon has the fields `[PokeIndex, name, Primary type, Secondary type, Evolves from, Evolves into, Notes]`; note that field names are case-sensitive.

Assuming every `_id` follows this pattern, how do we fetch the last Pokemon, whose `_id` is `0151`?
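Because `_id` is a string, the number 151 and the ID `'0151'` are different keys. A minimal sketch, assuming every ID is the `PokeIndex` number zero-padded to four digits (as in the documents above); `poke_id` is a hypothetical helper:

```python
def poke_id(number):
    """Build an _id, assuming the 4-digit zero-padded pattern."""
    return f"{number:04d}"


print(poke_id(151))  # 0151
print(poke_id(25))   # 0025
```

`es.get(index='pokemon', id=poke_id(151))` would then fetch Mew, whereas `id=151` would look for the ID `'151'` and not find the document.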

### Search and get

In [97]:
dict(es.get(index='pokemon', id='0151'))

{'_index': 'pokemon',
 '_id': '0151',
 '_version': 1,
 '_seq_no': 150,
 '_primary_term': 1,
 'found': True,
 '_source': {'PokeIndex': '0151',
  'name': 'Mew',
  'Primary type': 'Psychic',
  'Secondary type': 'Psychic',
  'Evolves from': 'No evolution',
  'Evolves into': 'No evolution',
  'Notes': "This Mythical Pokemon is so rare, only a few expert worldwide have found it, though a growing number of people have reportedly seen it recently. It apparently originates in South America which were thought to be extinct. It's said that it appears only to those who have a true heart and a strong passion to see it. It's DNA is said to contain the genetic code of every Pokemon and every move. Because of this, many scientists believe that it's the ancestor of all pokemon. If you view it's body under a microscope, you can see it's fine, small and delicate hairs. It is capable of turning invisible at will so that people are unaware of its presence. It is considered one of the original pregenetor Po

Let's search for Pikachu.

In [78]:
dict(es.search(index='pokemon', query={
    "match":{
        "name":"Pikachu"
    }
}))['hits']['hits']

[{'_index': 'pokemon',
  '_id': '0025',
  '_score': 4.6555166,
  '_source': {'PokeIndex': '0025',
   'name': 'Pikachu',
   'Primary type': 'Electric',
   'Secondary type': 'Electric',
   'Evolves from': 'Pichu (#172)',
   'Evolves into': 'Raichu (#026) Gigantamax',
   'Notes': "Pikachu is the primary mascot of the Pokémon franchise, as well as Pokémon Yellow and Let's Go, Pikachu!. It is also playable in every Super Smash Bros. game. Its Gigantamax form looks like its old sprite from Red and Blue with a glowing, whitish tail. Pikachu raises its tail to check its surroundings and it might get struck by lightning in this pose. When groups gather and do this, the lightning shock would be very dangerous to be around. That's why the forests they inhabitat away from cities are very dangerous to be in. Pikachu are now adept at storing electricity and will discharge it at foes and they recharge when sleeping. Pikachu with the stretchiest and softest cheeks are more powerful, but when their tai

Now search for every document that mentions Pikachu in any field.

In [84]:
dict(es.search(index='pokemon', query={
    "query_string":{
        "query":"Pikachu"
    }
}))['hits']

{'total': {'value': 4, 'relation': 'eq'},
 'max_score': 6.2525344,
 'hits': [{'_index': 'pokemon',
   '_id': '0025',
   '_score': 6.2525344,
   '_source': {'PokeIndex': '0025',
    'name': 'Pikachu',
    'Primary type': 'Electric',
    'Secondary type': 'Electric',
    'Evolves from': 'Pichu (#172)',
    'Evolves into': 'Raichu (#026) Gigantamax',
    'Notes': "Pikachu is the primary mascot of the Pokémon franchise, as well as Pokémon Yellow and Let's Go, Pikachu!. It is also playable in every Super Smash Bros. game. Its Gigantamax form looks like its old sprite from Red and Blue with a glowing, whitish tail. Pikachu raises its tail to check its surroundings and it might get struck by lightning in this pose. When groups gather and do this, the lightning shock would be very dangerous to be around. That's why the forests they inhabitat away from cities are very dangerous to be in. Pikachu are now adept at storing electricity and will discharge it at foes and they recharge when sleeping. 

If we look at the 4 documents found, we see that their scores differ. The more often Pikachu is mentioned in a field, the higher the score, and a match in a short field counts for more than a match in a long one.<br>
In the second document, Pikachu makes up most of one of its fields, which scores higher than a mention somewhere in a long text field.<br>
The last two documents mention Pikachu only once, inside a long text field, which lowers their relevance score.
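This scoring behaviour follows from BM25, Elasticsearch's default similarity. Below is a minimal sketch of the textbook per-term BM25 formula with Elasticsearch's default parameters `k1=1.2` and `b=0.75`. All document counts and field lengths are hypothetical, and Lucene's implementation differs in details (for example, it stores field lengths approximately), so the numbers will not match the `_score` values above.

```python
import math


def bm25(tf, field_len, avg_field_len, docs_with_term, total_docs, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one field."""
    # Rare terms (few documents contain them) get a higher weight
    idf = math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Fields longer than average are penalised, shorter ones rewarded
    length_norm = 1 - b + b * field_len / avg_field_len
    # Term frequency helps, but with diminishing returns
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Hypothetical numbers: 151 documents, 4 of which contain the term
short_name_field = bm25(tf=1, field_len=1, avg_field_len=1, docs_with_term=4, total_docs=151)
long_notes_field = bm25(tf=1, field_len=120, avg_field_len=80, docs_with_term=4, total_docs=151)
repeated_mentions = bm25(tf=3, field_len=120, avg_field_len=80, docs_with_term=4, total_docs=151)
print(short_name_field, long_notes_field, repeated_mentions)
```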

Here are some of the available query types:

| Query Type        | Description                                                                                      |
|-------------------|--------------------------------------------------------------------------------------------------|
| `match`           | Full-text search for a single field, analyzing the query text and matching against the document field. |
| `match_phrase`    | Searches for exact phrases in the text, with the words in the specified order.                     |
| `match_all`       | Returns all documents in the index, usually used for testing purposes.                            |
| `bool`            | A compound query that combines multiple queries using logical operators (e.g., `must`, `should`, `must_not`, `filter`). |
| `term`            | Searches for exact terms in a field without analyzing the query text.                             |
| `terms`           | Searches for multiple exact terms in a field without analyzing the query text.                    |
| `range`           | Searches for documents with field values within a specified range (e.g., dates, numbers).         |
| `prefix`          | Searches for documents with field values that start with the specified prefix.                    |
| `wildcard`        | Searches for documents with field values that match a pattern with wildcards.                     |
| `regexp`          | Searches for documents with field values that match a specified regular expression pattern.       |
| `fuzzy`           | Searches for documents with field values that are similar to the query text based on edit distance (e.g., "apple" and "appel"). |
| `nested`          | Searches for documents with matching nested objects in a field.                                   |
| `query_string`    | Allows using a query string to search for documents with a more flexible syntax, including logical operators, wildcards, and field-specific searches. |
| `geo_distance`    | Searches for documents with geo-points within a specified distance from a central point.          |
| `geo_bounding_box`| Searches for documents with geo-points within a specified bounding box.                            |
| `geo_polygon`     | Searches for documents with geo-points within a specified polygon (deprecated since 7.12 in favor of `geo_shape`). |
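Most of these queries are just nested dicts passed as `query=` to `es.search`. A few hypothetical builder helpers show their shape. The field names come from the `pokemon` mapping. `term`, `prefix` and `fuzzy` do not analyze their input, so on `text` fields (which the standard analyzer lowercases) they should be given lowercase values.

```python
def term_query(field, value):
    # Exact, non-analysed match; best suited to keyword or numeric fields
    return {"term": {field: {"value": value}}}


def range_query(field, gte=None, lte=None):
    # Only include the bounds that were actually given
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"range": {field: bounds}}


def fuzzy_query(field, value, fuzziness="AUTO"):
    # Matches terms within an edit distance chosen from the term length
    return {"fuzzy": {field: {"value": value, "fuzziness": fuzziness}}}


def prefix_query(field, prefix):
    return {"prefix": {field: {"value": prefix}}}


print(range_query("PokeIndex", gte=1, lte=9))
print(fuzzy_query("name", "pikachuu"))
```

For example, `es.search(index='pokemon', query=range_query('PokeIndex', gte=1, lte=9))` should return the first nine Pokemon, since `PokeIndex` is mapped as an integer.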

[TRY YOURSELF] Let's try to find all `rare` Pokemon, i.e. those that neither evolve from nor evolve into another Pokemon.

In [94]:
es.search(index='pokemon', query={
    "bool": {
        "must": [
            {"match_phrase": {"Evolves from": "No evolution"}},
            {"match_phrase": {"Evolves into": "No evolution"}}
            ]
        }
    }
)['hits']

{'total': {'value': 7, 'relation': 'eq'},
 'max_score': 8.331094,
 'hits': [{'_index': 'pokemon',
   '_id': '0128',
   '_score': 8.331094,
   '_source': {'PokeIndex': '0128',
    'name': 'Tauros',
    'Primary type': 'Normal',
    'Secondary type': 'Normal',
    'Evolves from': 'No evolution',
    'Evolves into': 'No evolution',
    'Notes': "When attacking, it violently charges while whipping its self with its three tails. A really rowdy and hyper pokemon, once it charges, it won't stop until it hits something. Although powerful, it can only charge in a straight line. They fight each other by locking horns to prove their strength, and the leader prides itself with its battle scared horns. If there's no opponents, it will charge into thick trees, knocking them down. Historically, people have ridden Tauros for ages, but the practice started in Alola, and Tauros in Alola actually have a calm side, most likely due to the climate. Tauros in Galar though are more volatile and won't let peop
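The query above can also be generated from a dict of field/phrase pairs. In this hypothetical helper, `scored=False` moves the clauses into `filter` context, where they only decide whether a document matches and do not contribute to `_score` (all hits then score 0.0); filter clauses can also be cached by Elasticsearch.

```python
def all_phrases(phrases, scored=True):
    """Bool query requiring every field/phrase pair to match."""
    clauses = [{"match_phrase": {field: text}} for field, text in phrases.items()]
    # 'must' clauses add to the score, 'filter' clauses only include/exclude
    return {"bool": {"must" if scored else "filter": clauses}}


rare = {"Evolves from": "No evolution", "Evolves into": "No evolution"}
print(all_phrases(rare))
print(all_phrases(rare, scored=False))
```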

In [33]:
es.indices.get_mapping(index='pokemon')['pokemon']['mappings']['properties']

{'Evolves from': {'type': 'text'},
 'Evolves into': {'type': 'text'},
 'Notes': {'type': 'text'},
 'PokeIndex': {'type': 'integer'},
 'Primary type': {'type': 'text'},
 'Secondary type': {'type': 'text'},
 'name': {'type': 'text'}}

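More generally, `es.indices.get_mapping(index=index_name)` returns the mapping of any index. The response is keyed by the index name, which is why the call above navigates into `['pokemon']['mappings']['properties']`. Note that `PokeIndex` is mapped as `integer` although `_source` shows strings such as `'0001'`: `_source` is returned exactly as it was sent, while the indexed value is coerced to the mapped type.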
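Since every string field in this mapping is plain `text`, an exact `term` query for a value like `'Grass'` will not match (the indexed tokens are lowercased). A minimal sketch of an explicit mapping for a hypothetical new index, adding a `keyword` sub-field for exact matches; the index name `pokemon_v2` and the choice of `keyword` for the type fields are assumptions:

```python
# Hypothetical explicit mapping; field names follow the pokemon index
POKEMON_MAPPING = {
    "properties": {
        "PokeIndex": {"type": "integer"},
        # Analysed text for full-text search plus an exact keyword sub-field
        "name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "Primary type": {"type": "keyword"},
        "Secondary type": {"type": "keyword"},
        "Evolves from": {"type": "text"},
        "Evolves into": {"type": "text"},
        "Notes": {"type": "text"},
    }
}


def exact_name_query(name):
    # Case-sensitive exact match against the keyword sub-field
    return {"term": {"name.keyword": {"value": name}}}


print(exact_name_query("Pikachu"))
```

It could be applied with `es.indices.create(index='pokemon_v2', mappings=POKEMON_MAPPING)`. The type of an existing field cannot be changed in place, so the data would have to be reindexed into the new index.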

In [None]:
# Uses the client `es` created in the connection cell above.

# Check whether Elasticsearch is reachable
def check_connection():
    if es.ping():
        print("Elasticsearch is connected!")
    else:
        print("Elasticsearch is not connected!")

# Create an index (skipped if it already exists)
def create_index(index_name):
    if not es.indices.exists(index=index_name):
        es.indices.create(index=index_name)

# Delete an index
def delete_index(index_name):
    es.indices.delete(index=index_name)

# Index (insert or overwrite) a document under the given _id
def index_document(index_name, doc_id, document):
    es.index(index=index_name, id=doc_id, document=document)

# Retrieve a document by its _id
def get_document(index_name, doc_id):
    return es.get(index=index_name, id=doc_id)

# Partially update a document: only the given fields are changed
def update_document(index_name, doc_id, fields):
    es.update(index=index_name, id=doc_id, doc=fields)

# Delete a document
def delete_document(index_name, doc_id):
    es.delete(index=index_name, id=doc_id)

# Simple full-text search on one field
def search(index_name, query):
    return es.search(index=index_name, query={"match": query})

# Example usage
if __name__ == "__main__":
    check_connection()
    index_name = "example_index"
    doc_id = "1"  # _id values are strings in Elasticsearch
    document = {"name": "John Doe", "age": 30, "city": "New York"}

    create_index(index_name)
    index_document(index_name, doc_id, document)
    print("Indexed document:", get_document(index_name, doc_id))

    update_document(index_name, doc_id, {"age": 31})
    print("Updated document:", get_document(index_name, doc_id))

    # Refresh so the changes are visible to search immediately;
    # otherwise this only happens periodically
    es.indices.refresh(index=index_name)
    query = {"name": "John Doe"}
    print("Search results:", search(index_name, query))

    delete_document(index_name, doc_id)
    delete_index(index_name)