![](./lab%20header%20image.png)

<div style="text-align: center;">
    <h3>Assignment No. 05</h3>
</div>

<img src="./Student%20Information.png" style="width: 100%;" alt="Student Information">

<div style="border: 1px solid #ccc; padding: 8px; background-color: #f0f0f0; text-align: start;">
    <strong>Q. How can you build a search engine using Elasticsearch to handle efficient full-text search, data indexing, and query processing? Set up Elasticsearch, index a collection of documents, run relevant search queries, and analyze the performance. In your analysis, explain how Elasticsearch optimizes search results and processes queries in real-world applications.</strong>
</div>

**1. What is Elasticsearch?** Elasticsearch is a distributed, RESTful search and analytics engine built on top of Apache Lucene. It is designed for full-text search and handles both structured and unstructured data. Its key features include near real-time indexing (new documents become searchable after a periodic refresh, by default about once per second), distributed search, and horizontal scalability.

**2. Core Concepts:**

- **Index**: A collection of documents that share similar characteristics. It is roughly analogous to a table in a relational database (older tutorials compare it to a whole database).
- **Document**: A JSON object that contains the data. Each document is stored in an index.
- **Field**: A key-value pair in a document (e.g., title, content).
- **Shard**: A horizontal partition of an index. Each shard is a self-contained Lucene index, and the shards of one index can be placed on different nodes.
- **Replica**: A copy of a shard for redundancy and high availability.
- **Inverted Index**: A data structure that maps terms to their locations in documents, allowing for fast full-text searches.
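The inverted index can be illustrated with a toy Python version. This is a minimal sketch, not Elasticsearch's implementation: `analyze` is a hypothetical, simplified stand-in for the standard analyzer (lowercasing plus splitting on non-alphanumeric characters; the real analyzer follows Unicode text segmentation rules), and the postings here record only term positions, whereas Lucene stores compressed postings and extra statistics on disk.

```python
import re
from collections import defaultdict


def analyze(text):
    """Hypothetical simplified analyzer: lowercase the text and split it
    into runs of letters/digits (so "full-text" becomes "full", "text")."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]}, i.e. a postings list."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(analyze(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


docs = {
    "1": "Elasticsearch is a distributed search engine.",
    "2": "Elasticsearch provides powerful full-text search capabilities.",
    "3": "Optimizing search queries is crucial for performance.",
}
index = build_inverted_index(docs)

# A term lookup is a single dictionary access, not a scan of every document
print(sorted(index["search"]))          # ['1', '2', '3']
print(index["elasticsearch"])           # {'1': [0], '2': [0]}
print(sorted(index.get("lucene", {})))  # []
```

Because the analysis step runs both at index time and at query time, a search for "Elasticsearch" and a stored "elasticsearch" meet at the same term in the index.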

**3. How Elasticsearch Optimizes Search Results:**

- **Inverted Indexing**: This allows Elasticsearch to quickly look up which documents contain a specific term.
- **Scoring and Relevance**: Elasticsearch ranks results by relevance, using the BM25 similarity algorithm by default.
- **Distributed Nature**: Data and load are spread across multiple nodes and shards; a search runs in parallel on every relevant shard, and a coordinating node merges the per-shard results into the final ranked list.
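- **Caching**: Elasticsearch caches the results of frequently used filters (node query cache) and shard-level request results, mainly aggregations (shard request cache), and relies on the operating system's filesystem cache to keep frequently read index data in memory.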
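BM25 scoring can be sketched in plain Python. This uses the textbook BM25 formula with k1 = 1.2 and b = 0.75, which are Elasticsearch's default BM25 parameters. It is an approximation: Lucene's implementation differs in details (for example, it stores document lengths in a lossy one-byte encoding), so real `_score` values will not match these numbers exactly. `tokenize` is a hypothetical, simplified analyzer, and the sample texts are the documents indexed in the code cell below.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical simplified analyzer: lowercase, keep runs of letters/digits
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Score docs (id -> text) against query with textbook BM25.
    Returns (doc_id, score) pairs for matching docs, best first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        # Length normalization: longer-than-average docs are penalized
        norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Rare terms (small df) get a larger inverse document frequency
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "1": "Elasticsearch is a distributed search engine.",
    "2": "This document covers advanced topics in Elasticsearch.",
    "3": "Elasticsearch provides powerful full-text search capabilities.",
    "4": "Optimizing search queries is crucial for performance.",
    "5": "Choosing the right indexing strategy can improve performance.",
}

for doc_id, score in bm25_rank("elasticsearch search", docs):
    print(doc_id, round(score, 3))
```

Document 1 ranks first because it matches both query terms and is shorter than average; documents 2 and 4 each match only one term, and document 5 matches none and is left out.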

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
import time

# Step 1: Connect to Elasticsearch (assumes a local node on the default port 9200)
es = Elasticsearch("http://localhost:9200")

# Stop early if the cluster is unreachable; every later step depends on it
if not es.ping():
    raise SystemExit("Elasticsearch is not available at http://localhost:9200")
print("Connected to Elasticsearch")

# Step 2: Create an Index
def create_index(index_name):
    if not es.indices.exists(index=index_name):
        es.indices.create(index=index_name)
        print(f"Index '{index_name}' created.")
    else:
        print(f"Index '{index_name}' already exists.")

# Step 3: Index Documents
def index_documents(index_name):
    documents = [
        {"title": "Elasticsearch Basics", "content": "Elasticsearch is a distributed search engine."},
        {"title": "Advanced Elasticsearch", "content": "This document covers advanced topics in Elasticsearch."},
        {"title": "Full-text Search", "content": "Elasticsearch provides powerful full-text search capabilities."},
        {"title": "Search Optimization", "content": "Optimizing search queries is crucial for performance."},
        {"title": "Indexing Strategies", "content": "Choosing the right indexing strategy can improve performance."},
    ]
    
    # Prepare the documents for bulk indexing
    actions = [
        {
            "_index": index_name,
            "_id": str(i + 1),
            "_source": doc
        }
        for i, doc in enumerate(documents)
    ]

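    bulk(es, actions)
    # Force a refresh so the new documents are searchable immediately;
    # otherwise they only become visible after the next periodic refresh (~1 s)
    es.indices.refresh(index=index_name)
    print(f"Indexed {len(documents)} documents into '{index_name}'.")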

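# Step 4: Perform a Search Query with Performance Analysis
def search(index_name, search_term):
    # A match query analyzes the search term with the field's analyzer,
    # then ranks matching documents with BM25
    query = {
        "match": {
            "content": search_term
        }
    }

    # elasticsearch-py 8.x style; older clients used body={"query": ...}
    start_time = time.perf_counter()  # Client-side timer (includes network round trip)
    response = es.search(index=index_name, query=query)
    elapsed = time.perf_counter() - start_time

    print(f"\nSearch results for '{search_term}':")
    print(f"Client Response Time: {elapsed:.4f} seconds")
    print(f"Server Time ('took'): {response['took']} ms")
    print(f"Total Hits: {response['hits']['total']['value']}")
    for hit in response['hits']['hits']:
        print(f" - [{hit['_score']:.3f}] {hit['_source']['title']}: {hit['_source']['content']}")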

# Step 5: Main Function
def main():
    index_name = "documents"
    create_index(index_name)
    index_documents(index_name)
    
    # Run some search queries
    search(index_name, "search")
    search(index_name, "Elasticsearch")
    search(index_name, "optimization")
    search(index_name, "performance")

if __name__ == "__main__":
    main()
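The code cell above times each query once, which is noisy: the first request also pays for connection setup and cold caches. One way to get steadier numbers, sketched here as an assumption about how to benchmark (not a feature of Elasticsearch or its Python client), is to discard a few warm-up calls and summarize repeated timings. `measure_latency` is a hypothetical helper, and `fake_query` only sleeps to stand in for a real search.

```python
import math
import statistics
import time


def measure_latency(run_query, warmup=3, repeats=20):
    """Time repeated calls of run_query() and summarize them in milliseconds.
    Warm-up calls are discarded so one-off costs (connection setup,
    cold caches) do not dominate the statistics."""
    for _ in range(warmup):
        run_query()
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples_ms.append((time.perf_counter() - start) * 1000)
    samples_ms.sort()
    # Nearest-rank 95th percentile
    p95_index = max(0, math.ceil(0.95 * len(samples_ms)) - 1)
    return {
        "min_ms": samples_ms[0],
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "max_ms": samples_ms[-1],
    }


def fake_query():
    time.sleep(0.002)  # stand-in for a real search call (~2 ms)


stats = measure_latency(fake_query, warmup=1, repeats=10)
print({name: round(value, 2) for name, value in stats.items()})
```

With a running cluster, `run_query` could be `lambda: es.search(index="documents", query={"match": {"content": "search"}})`. Comparing these client-side numbers with the server-reported `took` value helps separate network and client overhead from query execution time.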


<div style="float: right; border: 1px solid black; display: inline-block; padding: 10px; text-align: center">
    <br>
    <br>
    <span style="font-weight: bold;">Signature of Lab Incharge</span>
    <br>
    <span>(Prof. Rupali Sharma)</span> 
</div>