**Connect to Elasticsearch**

In [23]:
from pprint import pprint
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

client_info = es.info()

print("Connected to Elasticsearch successfully!")
pprint(client_info.body)

Connected to Elasticsearch successfully!
{'cluster_name': 'docker-cluster',
 'cluster_uuid': 'T1HeaWnRTOqX_BBgREVVbA',
 'name': '64c49e436740',
 'tagline': 'You Know, for Search',
 'version': {'build_date': '2025-10-21T10:06:21.288851013Z',
             'build_flavor': 'default',
             'build_hash': '25d88452371273dd27356c98598287b669a03eae',
             'build_snapshot': False,
             'build_type': 'docker',
             'lucene_version': '10.3.1',
             'minimum_index_compatibility_version': '8.0.0',
             'minimum_wire_compatibility_version': '8.19.0',
             'number': '9.2.0'}}


**Create Index**

In [24]:
es.indices.delete(index="my_index", ignore_unavailable=True)
es.indices.create(index="my_index")

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'my_index'})

**Indexing Sequentially**

In [25]:
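import json
from tqdm import tqdm

# Load the sample documents; the context manager closes the file afterwards.
with open("data.json") as f:
    documents = json.load(f)
documents_ids = []

# Index one document per request and keep the ID Elasticsearch generates for it.
for document in tqdm(documents, total=len(documents)):
    response = es.index(index="my_index", document=document)
    documents_ids.append(response["_id"])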

documents_ids

100%|██████████| 5/5 [00:00<00:00, 10.45it/s]


['JJJjQZoBsuuLZ2nEtzL2',
 'JZJjQZoBsuuLZ2nEuDLT',
 'JpJjQZoBsuuLZ2nEuTIS',
 'J5JjQZoBsuuLZ2nEuTJS',
 'KJJjQZoBsuuLZ2nEuTKY']
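
The loop above sends one request per document. As a rough sketch of an alternative, the same documents could be turned into bulk actions and sent in a single request. `build_bulk_actions` is a hypothetical helper that is not part of the notebook, and the commented-out `helpers.bulk` call assumes the `es` client created earlier and a running cluster.

```python
from typing import Any, Dict, Iterable, Iterator


def build_bulk_actions(
    index_name: str, docs: Iterable[Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield one bulk action per document (hypothetical helper).

    With no explicit "_op_type", the bulk helper treats each action as "index".
    """
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


# One document copied from the notebook output, used here as sample input.
sample_docs = [
    {
        "title": "Sample Title 1",
        "text": "This is the first sample document text.",
        "created_on": "2024-09-22",
    }
]
actions = list(build_bulk_actions("my_index", sample_docs))

# With a live cluster, the actions could be sent in one request:
# from elasticsearch import helpers
# success_count, errors = helpers.bulk(es, actions)
```

For five documents the difference is negligible; batching mostly pays off as the number of documents grows.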

In [26]:
documents[0]

{'title': 'Sample Title 1',
 'text': 'This is the first sample document text.',
 'created_on': '2024-09-22'}

### Searching

### 1. Leaf clauses

### 1.1. term query
Let's use the Query DSL to construct a query that finds every document created on `2024-09-22`.

In [27]:
response = es.search(
    index="my_index", 
    body={
        "query": {
            "term": {'created_on': '2024-09-22'}
        }
    }
)

n_hits = response["hits"]["total"]["value"]
print(f"Found {n_hits} documents in my_index")

Found 1 documents in my_index


To retrieve the matching documents, read the `hits` list nested inside `response["hits"]`.

In [28]:
retrieve_documents = response["hits"]["hits"]
retrieve_documents

[{'_index': 'my_index',
  '_id': 'JJJjQZoBsuuLZ2nEtzL2',
  '_score': 1.0,
  '_source': {'title': 'Sample Title 1',
   'text': 'This is the first sample document text.',
   'created_on': '2024-09-22'}}]
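
Since every search response has the same nested shape, the lookups above can be wrapped in small functions. The following is only a sketch: `extract_sources` and `total_hits` are hypothetical helper names, and `sample_response` is a plain dictionary trimmed down from the output shown above so the code runs without a cluster.

```python
from typing import Any, Dict, List


def total_hits(search_response: Dict[str, Any]) -> int:
    """Return the hit count reported under hits.total.value."""
    return search_response["hits"]["total"]["value"]


def extract_sources(search_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return only the original documents (_source) of each hit."""
    return [hit["_source"] for hit in search_response["hits"]["hits"]]


# Plain-dict stand-in for the response of the term query above.
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_index": "my_index",
                "_id": "JJJjQZoBsuuLZ2nEtzL2",
                "_score": 1.0,
                "_source": {
                    "title": "Sample Title 1",
                    "text": "This is the first sample document text.",
                    "created_on": "2024-09-22",
                },
            }
        ],
    }
}

print(total_hits(sample_response))
print([doc["title"] for doc in extract_sources(sample_response)])
```

The response object returned by `es.search` supports the same key lookups used in this notebook, so the helpers should also accept it directly.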

### 1.2. match query
Now, let's search for any document that contains the word `document` in the `text` field.

In [29]:
response = es.search(
    index="my_index", 
    body={
        "query": {
            "match": {
                "text": "document"
            }
        }
    }
)

In [30]:
retrieve_documents = response["hits"]["hits"]
retrieve_documents

[{'_index': 'my_index',
  '_id': 'JJJjQZoBsuuLZ2nEtzL2',
  '_score': 1.0603602,
  '_source': {'title': 'Sample Title 1',
   'text': 'This is the first sample document text.',
   'created_on': '2024-09-22'}},
 {'_index': 'my_index',
  '_id': 'KJJjQZoBsuuLZ2nEuTKY',
  '_score': 0.73292506,
  '_source': {'title': 'Sample Title 5',
   'text': 'FastAPI is an excellent choice for building high-performance APIs with async capabilities. This is a document too',
   'created_on': '2025-02-02'}}]

### 1.3. range query
Let's find documents that were created on or after `2024-11-14`.

In [35]:
response = es.search(
    index="my_index", 
    body={
        "query": {
            "range": {
                "created_on": {
                    "gte": "2024-11-14"
                }
            }
        }
    }
)

In [36]:
retrieve_documents = response["hits"]["hits"]
retrieve_documents

[{'_index': 'my_index',
  '_id': 'JpJjQZoBsuuLZ2nEuTIS',
  '_score': 1.0,
  '_source': {'title': 'Sample Title 3',
   'text': 'Django Rest Framework simplifies API development with powerful serialization tools.',
   'created_on': '2024-11-14'}},
 {'_index': 'my_index',
  '_id': 'J5JjQZoBsuuLZ2nEuTJS',
  '_score': 1.0,
  '_source': {'title': 'Sample Title 4',
   'text': 'Python provides a wide range of libraries for data analysis, automation, and backend development.',
   'created_on': '2025-01-10'}},
 {'_index': 'my_index',
  '_id': 'KJJjQZoBsuuLZ2nEuTKY',
  '_score': 1.0,
  '_source': {'title': 'Sample Title 5',
   'text': 'FastAPI is an excellent choice for building high-performance APIs with async capabilities. This is a document too',
   'created_on': '2025-02-02'}}]
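
A range clause accepts `gt`, `gte`, `lt`, and `lte` bounds, so the same pattern covers "before", "after", and "between" searches. The sketch below builds the clause without contacting Elasticsearch; `build_range_query` is a hypothetical helper, not something defined in this notebook.

```python
from typing import Any, Dict, Optional


def build_range_query(
    field: str,
    gt: Optional[str] = None,
    gte: Optional[str] = None,
    lt: Optional[str] = None,
    lte: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a range clause, keeping only the bounds that were provided."""
    candidates = {"gt": gt, "gte": gte, "lt": lt, "lte": lte}
    bounds = {name: value for name, value in candidates.items() if value is not None}
    if not bounds:
        raise ValueError("At least one bound is required for a range query.")
    return {"range": {field: bounds}}


# Same clause as in the cell above: created on or after 2024-11-14.
on_or_after = build_range_query("created_on", gte="2024-11-14")

# Strictly before a date uses "lt" instead.
before = build_range_query("created_on", lt="2024-09-24")

# With a live cluster:
# response = es.search(index="my_index", body={"query": on_or_after})
```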

That covers the leaf clauses. To combine several leaf clauses into a single query, use compound clauses.

### 2. Compound clauses
Let's search for documents that meet the following criteria:

* Created on `2024-09-22`
* Have the word `document` in the text field.

In [39]:
response = es.search(
    index="my_index",
    body= {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "text": "document"
                        }
                    },
                    {
                        "range": {
                            "created_on": {
                                "lte": "2024-09-22",
                                "gte": "2024-09-22"
                            }
                        }
                    }
                ]
            }
        }
    }
)

retrieve_documents = response["hits"]["hits"]
retrieve_documents

[{'_index': 'my_index',
  '_id': 'JJJjQZoBsuuLZ2nEtzL2',
  '_score': 2.0603602,
  '_source': {'title': 'Sample Title 1',
   'text': 'This is the first sample document text.',
   'created_on': '2024-09-22'}}]

With the compound clause, we were able to combine two leaf clauses to find a specific document.
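
The `_score` of `2.0603602` above appears to be the match score (`1.0603602`) plus the constant `1.0` contributed by the range clause, because every clause under `must` adds to the score. When a clause should only narrow the results, it can go under `filter`, which does not contribute to scoring. The sketch below builds such a query; `build_bool_query` is a hypothetical helper and the commented-out search call assumes the `es` client from earlier.

```python
from typing import Any, Dict, List, Optional

Clause = Dict[str, Any]


def build_bool_query(
    must: Optional[List[Clause]] = None,
    filter_clauses: Optional[List[Clause]] = None,
) -> Dict[str, Any]:
    """Combine leaf clauses into a bool query, omitting empty sections."""
    bool_body: Dict[str, Any] = {}
    if must:
        bool_body["must"] = must  # scored clauses
    if filter_clauses:
        bool_body["filter"] = filter_clauses  # yes/no clauses, not scored
    return {"bool": bool_body}


query = build_bool_query(
    must=[{"match": {"text": "document"}}],
    filter_clauses=[
        {"range": {"created_on": {"gte": "2024-09-22", "lte": "2024-09-22"}}}
    ],
)

# With a live cluster:
# response = es.search(index="my_index", body={"query": query})
```

This variant should match the same document as the query above; only the way its score is computed changes.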