### Importing Necessary Libraries and Modules

Before diving into the code, ensure that the necessary Python libraries and modules are installed. If not already installed, you can install them using `pip`, Python's package installer.

#### Libraries to Install:

1. **Elasticsearch**:
   - Install with: `pip install elasticsearch`
   - The Elasticsearch library allows connecting to and interacting with an Elasticsearch cluster.

2. **FastAPI**:
   - Install with: `pip install fastapi`
   - FastAPI is a modern, fast web framework for building APIs. `HTTPException` is used for handling HTTP errors in the API.

3. **Pydantic**:
   - Install with: `pip install pydantic`
   - Pydantic is used for data validation and settings management with Python type annotations.

4. **UUID**:
   - The `uuid` module is part of Python’s standard library, so no additional installation is required.
   - Used to generate unique identifiers for resources or entities.

5. **Typing**:
   - The `typing` module is also part of Python’s standard library.
   - Provides support for type hints in Python.

6. **OpenAI**:
   - Install with: `pip install openai`
   - This library is used to integrate OpenAI functionalities, like GPT models, into the application.

7. **Logging**:
   - `logging` is a module available in Python’s standard library.
   - Used for logging messages and debugging.

8. **Uvicorn**:
   - Install with: `pip install uvicorn`
   - Uvicorn is an ASGI server implementation, which is essential for serving FastAPI applications. It offers high performance and is easy to deploy.


#### Installing with Pip:

To install these libraries, run the following commands in your terminal or command prompt:

```bash
pip install elasticsearch
pip install fastapi
pip install pydantic
pip install openai
pip install uvicorn
```
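
As a quick sanity check after installation, the following sketch reports which of the third-party packages listed above cannot be found by the current interpreter. The helper name `missing_packages` is hypothetical and only used for illustration; it relies on the standard-library `importlib.util.find_spec`.

```python
import importlib.util

# Import names of the third-party packages listed above.
REQUIRED = ["elasticsearch", "fastapi", "pydantic", "openai", "uvicorn"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Any name it reports can be installed with the matching `pip install` command.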

In [2]:
from elasticsearch import Elasticsearch, NotFoundError
from fastapi import FastAPI
from fastapi import HTTPException
from pydantic import BaseModel
import uuid
from typing import List, Optional
from openai import OpenAI
import logging


### Initializing OpenAI Client:

- **OpenAI Client Setup**: `openai_client = OpenAI(api_key="your-openai-key")`
  - This line creates an instance of the OpenAI client using your API key. Replace `"your-openai-key"` with your actual OpenAI API key.
  - The client is used to call OpenAI's API. In this notebook it is used to request text embeddings.

### Setting Up Embedding Model:

- **Embedding Model Specification**: `embedding_model = "text-embedding-ada-002"`
  - This assigns the name of the embedding model, `"text-embedding-ada-002"`, to the `embedding_model` variable.
  - The model converts text into a numerical vector (1536 values per input), which is what the application later stores in Elasticsearch and compares at query time.
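
Rather than pasting the key into the notebook, one option is to read it from an environment variable. The sketch below assumes the key has been exported as `OPENAI_API_KEY`; the helper `load_api_key` is a hypothetical name, not part of the OpenAI library.

```python
import os


def load_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is unset or empty."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Possible usage in the notebook (assumes the variable is set in the shell):
# openai_client = OpenAI(api_key=load_api_key())
```

This keeps the secret out of the notebook file, which matters if the notebook is shared or committed to version control.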

In [4]:
openai_client = OpenAI(api_key="your-openai-key")
embedding_model = "text-embedding-ada-002"
logging.basicConfig(level=logging.INFO)


In [8]:
app = FastAPI()

INDEX_NAME = "qa_data"
es = Elasticsearch(hosts="http://localhost:9200")

### Overview:

- **Purpose**: 
  - The function `create_index` sets up an Elasticsearch index with a specific structure to store question and answer data along with their vector representations.

- **Significance of 1536**:
  - The dimension size `1536` matches the length of the vectors returned by the `text-embedding-ada-002` model. A `dense_vector` field only accepts vectors of the declared length, so `dims` must agree with the embedding model in use.

- **Index Creation**:
  - The function uses the Elasticsearch client to create an index with the defined mappings: `question` and `answer` as full-text fields, `tags` as keywords, and one vector field each for the question and the answer.

- This index is what connects Elasticsearch to the OpenAI embeddings: every stored question can later be compared against the embedding of an incoming query.
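
To make the role of the dimension concrete, here is a minimal sketch that derives the mapping from a single constant and checks a vector's length before it would be indexed. The names `EMBEDDING_DIMS`, `build_mapping` and `validate_vector` are hypothetical helpers for illustration; the mapping layout mirrors the one used in `create_index`.

```python
EMBEDDING_DIMS = 1536  # vector length produced by text-embedding-ada-002


def build_mapping(dims=EMBEDDING_DIMS):
    """Build the index mapping, using the same `dims` for both vector fields."""
    vector_field = {"type": "dense_vector", "dims": dims}
    return {
        "mappings": {
            "properties": {
                "question": {"type": "text"},
                "answer": {"type": "text"},
                "tags": {"type": "keyword"},
                "question_vector": dict(vector_field),
                "answer_vector": dict(vector_field),
            }
        }
    }


def validate_vector(vector, dims=EMBEDDING_DIMS):
    """Raise ValueError if the vector length does not match the mapping."""
    if len(vector) != dims:
        raise ValueError(f"expected {dims} values, got {len(vector)}")
    return vector
```

If a different embedding model were swapped in, only `EMBEDDING_DIMS` would need to change, and the index would have to be recreated with the new mapping.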


In [10]:
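def create_index(es_client, index_name):
    # Field layout: text for full-text search, keyword for exact tag matching,
    # dense_vector (1536 dims) for the OpenAI embeddings.
    mapping = {
        "mappings": {
            "properties": {
                "question": {"type": "text"},
                "answer": {"type": "text"},
                "tags": {"type": "keyword"},
                "question_vector": {"type": "dense_vector", "dims": 1536},
                "answer_vector": {"type": "dense_vector", "dims": 1536}
            }
        }
    }
    es_client.indices.create(index=index_name, body=mapping)

# Call the function at module level (not inside its own body) to create the index once
create_index(es, INDEX_NAME)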


In [11]:
class QuestionAnswerModel(BaseModel):
    question: str
    answer: str
    tags: Optional[List[str]] = None

class QueryModel(BaseModel):
    query: str

In [12]:
@app.get("/health/")
def health_check():
    return {"status": "healthy"}

In [13]:
@app.post("/train/")
async def train(data: QuestionAnswerModel):
    # Extract data from request body
    question = data.question.lower()
    answer = data.answer
    tags = data.tags
    
    # Embed the question and the answer separately using OpenAI
    question_vector = openai_client.embeddings.create(input=[question], model=embedding_model).data[0].embedding
    answer_vector = openai_client.embeddings.create(input=[answer], model=embedding_model).data[0].embedding
    # Create and index the document
    doc = {
        "question": question,
        "answer": answer,
        "tags": tags,
        "question_vector": question_vector,
        "answer_vector": answer_vector
    }
    # Use the string form of the UUID as the document ID
    doc_id = str(uuid.uuid4())
    es.index(index=INDEX_NAME, id=doc_id, body=doc)
    return {"message": "Question and answer added successfully", "document_id": doc_id}

In [14]:
@app.post("/ask/")
async def ask(data: QueryModel):
    # Vectorize the query using the OpenAI embedding model
    logging.info("Query: %s", data.query)
    embedding_response = openai_client.embeddings.create(
        input=[data.query], model=embedding_model
    )
    logging.info("Embedding Response: %s", embedding_response)
    query_vector = embedding_response.data[0].embedding
    logging.info("Query Vector: %s", query_vector)
    # Elasticsearch query for cosine similarity
    script_query = {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'question_vector') + 1.0",
                "params": {"query_vector": query_vector}
            }
        }
    }

    # Perform the search
    response = es.search(
        index=INDEX_NAME,
        body={
            "size": 2,
            "query": script_query,
            "_source": {"includes": ["answer"]}
        }
    )

    # Extracting answers
    answers = [hit['_source']['answer'] for hit in response['hits']['hits']]

    return {"answers": answers}
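
To illustrate what the `script_score` query in the `/ask/` endpoint computes, here is a small pure-Python sketch of the same ranking: cosine similarity between the query vector and each stored `question_vector`, shifted by `1.0`, with the two best answers returned. The shift keeps scores non-negative, which Elasticsearch requires of script scores. The functions `cosine_similarity` and `top_answers` and the toy 2-dimensional vectors are assumptions for illustration; the real vectors have 1536 dimensions and the scoring runs inside Elasticsearch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_answers(query_vector, docs, size=2):
    """Rank docs by cosine similarity + 1.0 and return the best `size` answers."""
    scored = [
        (cosine_similarity(query_vector, doc["question_vector"]) + 1.0, doc["answer"])
        for doc in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [answer for _, answer in scored[:size]]


# Toy documents standing in for indexed question/answer pairs.
docs = [
    {"question_vector": [1.0, 0.0], "answer": "A"},
    {"question_vector": [0.0, 1.0], "answer": "B"},
    {"question_vector": [-1.0, 0.0], "answer": "C"},
]
print(top_answers([1.0, 0.1], docs))
```

Because cosine similarity lies in the range -1 to 1, the shifted score lies between 0 and 2, with 2 meaning the query and the stored question point in the same direction.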

In [16]:
import nest_asyncio
nest_asyncio.apply()

import uvicorn
uvicorn.run(app, host="0.0.0.0", port=5001, loop="none")


INFO:     Started server process [23460]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5001 (Press CTRL+C to quit)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/6ced6465-5365-40a2-afc2-b1f7b268b68d [status:201 duration:0.846s]


INFO:     127.0.0.1:15490 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/1a8465b8-a963-40bd-a50f-7214686b274d [status:201 duration:0.088s]


INFO:     127.0.0.1:15530 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/de676df2-28ee-4992-8ba1-bb6bcb379905 [status:201 duration:0.081s]


INFO:     127.0.0.1:15534 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/2f7e85bc-acc3-49a4-bd0f-15adf1980159 [status:201 duration:0.075s]


INFO:     127.0.0.1:15538 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/30c9dc62-d768-4200-a84e-decc3d904429 [status:201 duration:0.026s]


INFO:     127.0.0.1:15544 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/7a59a8b3-fe48-4b69-8fe8-41186b995a31 [status:201 duration:0.024s]


INFO:     127.0.0.1:15578 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/1f19f6bc-a645-47db-9685-7834a97334a0 [status:201 duration:0.022s]


INFO:     127.0.0.1:15583 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/01927c43-7f44-4f44-81bd-b7e6332343c7 [status:201 duration:0.034s]


INFO:     127.0.0.1:15583 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/011a332b-f792-4ade-838e-8e6cb6ef83a3 [status:201 duration:0.019s]


INFO:     127.0.0.1:15589 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/ef9cbdd6-e76e-4b19-969f-06a814c263ff [status:201 duration:0.019s]


INFO:     127.0.0.1:15593 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/a1fa046d-120e-4b5b-913d-96f771d30bd4 [status:201 duration:0.017s]


INFO:     127.0.0.1:15593 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/9f7d58fb-f65e-405c-9609-4de339b323e1 [status:201 duration:0.022s]


INFO:     127.0.0.1:15598 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/5d49b52c-f257-47f4-a07a-f7c21b87f543 [status:201 duration:0.019s]


INFO:     127.0.0.1:15602 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/ccc01f94-55bd-4220-b055-6f19633aadd6 [status:201 duration:0.032s]


INFO:     127.0.0.1:15606 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/eae5e194-b177-44d3-a75e-473b08a9d7d1 [status:201 duration:0.035s]


INFO:     127.0.0.1:15610 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/d4d9a9bc-a5b5-47a0-83f6-95f9e7bf187d [status:201 duration:0.024s]


INFO:     127.0.0.1:15610 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/9ff3917a-0409-4443-aa5e-03aba4aa5ae3 [status:201 duration:0.024s]


INFO:     127.0.0.1:15610 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/8e82f30c-c804-48b1-ac91-904b2bcf2ed0 [status:201 duration:0.020s]


INFO:     127.0.0.1:15610 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/038e57b8-fb29-4728-b4f9-25cc85e589ff [status:201 duration:0.038s]


INFO:     127.0.0.1:15619 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/1453a5f8-f061-4409-9854-04a01e919f40 [status:201 duration:0.021s]


INFO:     127.0.0.1:15623 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/9c8020dd-bf37-4293-9671-59dfe37332b5 [status:201 duration:0.025s]


INFO:     127.0.0.1:15623 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/6c158144-d0f1-436f-a7f3-7427bda4a916 [status:201 duration:0.022s]


INFO:     127.0.0.1:15623 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/c551b207-2cee-4e27-90ac-b47201f80d70 [status:201 duration:0.019s]


INFO:     127.0.0.1:15629 - "POST /train/ HTTP/1.1" 200 OK


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:elastic_transport.transport:PUT http://localhost:9200/qa_data/_doc/ee3bd89b-098e-4251-ac6b-d58c28464b6e [status:201 duration:0.025s]


INFO:     127.0.0.1:15633 - "POST /train/ HTTP/1.1" 200 OK


INFO:root:Query: What is payments?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:root:Embedding Response: CreateEmbeddingResponse(data=[Embedding(embedding=[0.0011098596, 0.0028799518, 0.0020759723, -0.02567386, -0.024564002, -0.005890279, -0.035435278, 0.0014224255, -0.007107113, -0.043190923, 0.029792376, 0.00637835, -0.027759861, 0.0013096009, 0.029150529, -0.0056195, 0.037146866, -0.0042890054, 0.0015812156, -0.017089164, -0.03629107, 0.0057565607, -0.027051156, 0.016594406, -0.010349775, -0.006044055, 0.01846646, -0.013137796, -0.0020843297, -0.0049074516, -0.0012235199, -0.021983244, -0.01973678, -0.0076754144, -0.024390167, -0.012014564, 0.013612495, 0.006565555, -0.01490956, 0.006351606, 0.02263846, 0.012823558, -0.009828275, -0.011620096, -0.022972757, 0.01164684, 0.0023133217, -0.032012094, -0.023186706, -0.00030086556, 0.021849524, 0.0117136985, -0.012295372, -0.030808633, -0.007100427, -0.0068129334, -0.0031657743, 0.0030470996, 0

INFO:     127.0.0.1:15637 - "POST /ask/ HTTP/1.1" 200 OK


INFO:root:Query: What is ML?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:root:Embedding Response: CreateEmbeddingResponse(data=[Embedding(embedding=[-0.006487228, -0.0061319023, -0.008791768, -0.018625824, -0.020371994, -3.9260294e-06, -0.008128493, 0.019383851, -0.005180984, -0.023431176, 0.014754468, 0.033407364, -0.02431103, 0.0023772966, 0.0017529391, 0.0030084224, 0.024175668, -0.008480434, 0.019275561, -0.00646354, -0.004805354, 0.007404306, -0.012236733, -0.03614168, 0.0128593985, 0.0067105754, -0.0054043313, -0.035925098, -0.014551424, 0.014321309, 0.03132279, -0.010544706, -0.0082435515, -0.011627603, -0.017326348, -0.013583586, 0.023864336, 0.0013037061, -0.004798586, -0.012081066, 0.02845311, 0.014050584, -0.004937332, -0.024730653, -0.015336525, 0.009245231, 0.0019678264, -0.0057427366, -0.009773143, 0.003634472, 0.028967487, 0.0026040282, 0.004355275, -0.021576717, 0.0020507355, -0.010131852, -0.008669942, 0.0017850875, 0.01911

INFO:     127.0.0.1:15641 - "POST /ask/ HTTP/1.1" 200 OK
