## **Loading Dataset**

In [4]:
import json

with open("/content/dataset.json", "r") as file:
    data = json.load(file)


In [5]:
# Extract content and topics
contents = [entry['content'] for entry in data]
topics = [entry['topic'] for entry in data]

print("Contents:", contents)
print("Topics:", topics)


Contents: ['In Python, a class is a blueprint for creating objects that encapsulate data and behavior. This principle supports abstraction and encapsulation in Python programming.', 'In Python, objects are created from classes using the class name followed by parentheses. Python classes provide a clear framework for implementing object-oriented principles.', 'Classes allow Python developers to create scalable and organized applications. Objects represent unique data instances, each created from a class.', 'Objects store instance-specific data, while classes provide the shared structure. Python classes provide a clear framework for implementing object-oriented principles.', 'Python classes define the blueprint, while objects represent the tangible instances. Classes in Python promote better collaboration and code clarity in teams.', 'Using classes, Python organizes related data and methods into cohesive units. Using classes and objects, Python simplifies the design of complex systems.',
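The extraction above assumes every entry in `dataset.json` is an object with both a `content` and a `topic` key. A single malformed entry would raise a `KeyError`, and any mismatch between the two lists would misalign topics and contents later on. Below is a minimal sketch of a more defensive extractor. It assumes that same list-of-objects layout, and `extract_fields` is a hypothetical helper rather than part of the notebook. It keeps the two lists index-aligned by skipping incomplete entries.

```python
import json

def extract_fields(entries):
    """Return index-aligned (contents, topics) lists.

    Assumes each entry is a dict like {"content": str, "topic": str};
    entries that are not dicts or lack either string field are skipped
    so that contents[i] always belongs with topics[i].
    """
    contents_out, topics_out = [], []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        content, topic = entry.get("content"), entry.get("topic")
        if isinstance(content, str) and isinstance(topic, str):
            contents_out.append(content)
            topics_out.append(topic)
    return contents_out, topics_out

# Made-up sample in the assumed layout; the second entry has no topic.
sample = json.loads(
    '[{"content": "A class is a blueprint.", "topic": "Class and Objects"},'
    ' {"content": "Orphan text"},'
    ' {"content": "Methods act on instances.", "topic": "Attributes and Methods"}]'
)
sample_contents, sample_topics = extract_fields(sample)
print(len(sample_contents), len(sample_topics))  # 2 2
```

When every entry is well-formed, `extract_fields(data)` returns the same two lists as the comprehensions above.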

## **Installing Libraries**

In [6]:
%pip install sentence-transformers




## **Generate Embeddings**

In [7]:
from sentence_transformers import SentenceTransformer

# Load Sentence Transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings for the content
embeddings = model.encode(contents, convert_to_tensor=True)

print("Embeddings Generated:", embeddings)


Embeddings Generated: tensor([[-0.0972,  0.0557, -0.0539,  ...,  0.1194,  0.0803,  0.0104],
        [-0.0768,  0.0183, -0.0754,  ...,  0.1621,  0.1350,  0.0055],
        [-0.0931, -0.0003, -0.0510,  ...,  0.1053,  0.1020,  0.0175],
        ...,
        [-0.0723,  0.0362,  0.0562,  ...,  0.0185,  0.0805,  0.0249],
        [-0.0540,  0.0031,  0.0328,  ...,  0.0477,  0.0049, -0.0006],
        [-0.0294,  0.0269,  0.0921,  ..., -0.0145,  0.0688,  0.0029]])
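Each row of the tensor above is the embedding of one entry in `contents`. For `all-MiniLM-L6-v2`, each embedding is a 384-dimensional vector. The retrieval step later compares a query to these rows with `util.pytorch_cos_sim`. That function returns a matrix whose entry `[i][j]` is the cosine similarity between row `i` of its first argument and row `j` of its second. The following pure-Python sketch shows that computation on small made-up vectors. It is meant to illustrate the formula, not to replace the library call. The helper names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors: (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)

def cosine_matrix(queries, corpus):
    """Entry [i][j] is the similarity between queries[i] and corpus[j]."""
    return [[cosine_similarity(q, c) for c in corpus] for q in queries]

# Toy 3-dimensional vectors (real embeddings here have 384 dimensions)
print(round(cosine_similarity([1, 1, 0], [1, 0, 0]), 4))  # 0.7071
```

Scores near 1 mean the two sentences point in almost the same direction in embedding space. Scores near 0 mean they are unrelated.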


## **Saving Model and Data**

In [44]:
import os
import torch

# Save model, embeddings, and data
os.makedirs("saved_model_with_data", exist_ok=True)
model.save("saved_model_with_data/pythonmodel")
torch.save(embeddings, "saved_model_with_data/embeddings.pt")
with open("saved_model_with_data/topics.json", "w") as file:
    json.dump(topics, file)
with open("saved_model_with_data/contents.json", "w") as file:
    json.dump(contents, file)

print("Model, embeddings, and data saved together!")


Model, embeddings, and data saved together!
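The retrieval code later assumes that row `i` of `embeddings.pt` describes `topics[i]` and `contents[i]`. The saved files are therefore only usable together if they all have the same length. The following sketch adds that check on both save and reload. It uses only `json` and the same file names as the cell above. The helpers `save_aligned` and `load_aligned` are hypothetical, and the demo writes to a throwaway temporary directory.

```python
import json
import os
import tempfile

def save_aligned(directory, topics_list, contents_list):
    """Write topics.json and contents.json, refusing lists of different lengths."""
    if len(topics_list) != len(contents_list):
        raise ValueError("topics and contents must have the same length")
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "topics.json"), "w") as f:
        json.dump(topics_list, f)
    with open(os.path.join(directory, "contents.json"), "w") as f:
        json.dump(contents_list, f)

def load_aligned(directory, expected_rows):
    """Reload both lists and check they match the number of embedding rows."""
    with open(os.path.join(directory, "topics.json")) as f:
        topics_list = json.load(f)
    with open(os.path.join(directory, "contents.json")) as f:
        contents_list = json.load(f)
    if not (len(topics_list) == len(contents_list) == expected_rows):
        raise ValueError(
            f"expected {expected_rows} records, found "
            f"{len(topics_list)} topics and {len(contents_list)} contents"
        )
    return topics_list, contents_list

# Round-trip demo in a temporary directory with made-up records
with tempfile.TemporaryDirectory() as tmp:
    save_aligned(tmp, ["Class and Objects"], ["A class is a blueprint."])
    loaded_topics, loaded_contents = load_aligned(tmp, expected_rows=1)
```

In this notebook, `expected_rows` would be `embeddings.shape[0]` from the reloaded tensor.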


## **Testing Model**

In [187]:
# Step 1: Import dependencies
import torch
import json
import random
from sentence_transformers import SentenceTransformer, util


# Step 2: Define the Response Retrieval Function
def get_response_with_dynamic_threshold(query, model, embeddings, topics, contents):
    # Step 1: Encode the query
    query_embedding = model.encode(query, convert_to_tensor=True)

    # Step 2: Compute cosine similarity between the query and every stored embedding
    scores = util.pytorch_cos_sim(query_embedding, embeddings)[0]

    # Step 3: Select a dynamic threshold
    threshold = random.uniform(0.6, 0.9)  # Randomly choose a threshold between 0.6 and 0.9
    print(f"Dynamic Threshold: {threshold:.2f}")  # Debugging

    # Step 4: Keep only the results at or above the threshold
    filtered_indices = [i for i, score in enumerate(scores) if score >= threshold]

    # Step 5: Handle cases where no results meet the threshold
    if not filtered_indices:
        print("No results above threshold. Lowering threshold to 0.2.")
        filtered_indices = [i for i, score in enumerate(scores) if score >= 0.2]
    if not filtered_indices:
        # Nothing reaches 0.2 either: fall back to the single best match
        filtered_indices = [int(torch.argmax(scores))]

    # Step 6: Randomly select one result from the filtered list
    chosen_index = random.choice(filtered_indices)

    # Step 7: Return the chosen result along with its index
    return {
        "index": int(chosen_index),  # Position of the match in the dataset
        "topic": topics[chosen_index],  # Topic associated with the content
        "content": contents[chosen_index]
    }

# Step 3: Test the System
if __name__ == "__main__":
    # Load the model and data written in the "Saving Model" step
    model = SentenceTransformer("saved_model_with_data/pythonmodel")
    embeddings = torch.load("saved_model_with_data/embeddings.pt", map_location=model.device)
    with open("saved_model_with_data/topics.json", "r") as file:
        topics = json.load(file)
    with open("saved_model_with_data/contents.json", "r") as file:
        contents = json.load(file)

    # Example Query
    query = "What is class in python?"

    # Test multiple runs
    for i in range(5):  # Run the query 5 times
        response = get_response_with_dynamic_threshold(query, model, embeddings, topics, contents)

        # Output the response with the index of the retrieved entry
        print(f"Run {i + 1}:")
        print("Retrieved Index:", response['index'])
        print("Topic:", response['topic'])
        print("Content:", response['content'])
        print("-" * 50)


Dynamic Threshold: 0.87
Run 1:
Retrieved Index: 2274
Topic: Class and Objects
Content: In Python, a class is a blueprint for creating objects that encapsulate data and behavior. Classes in Python promote better collaboration and code clarity in teams.
--------------------------------------------------
Dynamic Threshold: 0.64
Run 2:
Retrieved Index: 4531
Topic: Class and Objects
Content: Classes allow Python developers to create scalable and organized applications. This principle supports abstraction and encapsulation in Python programming.
--------------------------------------------------
Dynamic Threshold: 0.67
Run 3:
Retrieved Index: 1624
Topic: Class and Objects
Content: A class in Python acts as a factory for creating and managing objects. Objects inherit attributes and methods from their respective classes.
--------------------------------------------------
Dynamic Threshold: 0.68
Run 4:
Retrieved Index: 8062
Topic: Attributes and Methods
Content: Inst
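The function above draws a new threshold from [0.6, 0.9] on every call. As a result, the same query returns different passages across runs, as the output above shows. Two small changes make this behavior easier to reason about. First, a dedicated seeded random generator makes runs repeatable. Second, a last-resort fallback to the best-scoring index stops `random.choice` from failing on an empty list. The following minimal pure-Python sketch implements that selection logic on a plain list of scores instead of a tensor. `pick_index` and its parameters are hypothetical names.

```python
import random

def pick_index(scores, rng, low=0.6, high=0.9, floor=0.2):
    """Pick one index the way the notebook does, but never from an empty pool.

    Draws a threshold uniformly from [low, high], keeps indices scoring at or
    above it, falls back to `floor`, and finally to the best-scoring index.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    threshold = rng.uniform(low, high)
    pool = [i for i, s in enumerate(scores) if s >= threshold]
    if not pool:
        pool = [i for i, s in enumerate(scores) if s >= floor]
    if not pool:
        pool = [max(range(len(scores)), key=lambda i: scores[i])]
    return rng.choice(pool), threshold

rng = random.Random(42)  # fixed seed, so repeated runs give the same picks
demo_scores = [0.10, 0.15, 0.05]
idx, used_threshold = pick_index(demo_scores, rng)
print(idx)  # 1: nothing reaches 0.2, so the best match is returned
```

Passing the generator in explicitly, instead of calling the module-level `random` functions, keeps the seeding local to the retrieval code.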

### **Final Testing**

In [188]:
# Step 1: Import dependencies
import torch
import json
import random
from sentence_transformers import SentenceTransformer, util


# Step 2: Define the Response Retrieval Function
def get_response_with_fixed_threshold(query, model, embeddings, topics, contents):
    # Step 1: Encode the query
    query_embedding = model.encode(query, convert_to_tensor=True)

    # Step 2: Compute cosine similarity between the query and every stored embedding
    scores = util.pytorch_cos_sim(query_embedding, embeddings)[0]

    # Step 3: Predefined thresholds
    thresholds = [0.65, 0.70, 0.75, 0.80, 0.85]

    # Step 4: Randomly pick one of the predefined thresholds
    threshold = random.choice(thresholds)

    # Step 5: Keep only the results at or above the threshold
    filtered_indices = [i for i, score in enumerate(scores) if score >= threshold]

    # Step 6: Handle cases where no results meet the threshold
    if not filtered_indices:
        print("No results above threshold. Lowering threshold to 0.2.")
        filtered_indices = [i for i, score in enumerate(scores) if score >= 0.2]
    if not filtered_indices:
        # Nothing reaches 0.2 either: fall back to the single best match
        filtered_indices = [int(torch.argmax(scores))]

    # Step 7: Randomly select one result from the filtered list
    chosen_index = random.choice(filtered_indices)

    # Step 8: Return the chosen result along with its index
    return {
        "index": int(chosen_index),  # Position of the match in the dataset
        "topic": topics[chosen_index],  # Topic associated with the content
        "content": contents[chosen_index]
    }

# Step 3: Test the System (Single Run)
if __name__ == "__main__":
    # Reuse the model, embeddings, topics, and contents loaded in the cells above

    # Example Query
    query = "What is python?"

    # Get the response once
    response = get_response_with_fixed_threshold(query, model, embeddings, topics, contents)

    # Output the response
    print("Answers:", response['content'])


Answers: A Python class allows developers to encapsulate data and behavior into a single unit. Using classes and objects, Python ensures logical separation of concerns in code.
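Both retrieval functions end with `random.choice`, so the answer to "What is python?" varies between runs. The returned passage is also not necessarily the closest match. If deterministic answers are preferred, one common alternative is to rank entries by score and keep the top k. This is a swapped-in technique, not what this notebook does. The following minimal sketch applies it to a plain list of scores. `top_k` is a hypothetical helper.

```python
def top_k(scores, k=3, threshold=0.0):
    """Return up to k (index, score) pairs, highest score first, keeping scores >= threshold.

    sorted() is stable, so equal scores keep their original dataset order.
    """
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [(i, s) for i, s in ranked if s >= threshold][:k]

print(top_k([0.42, 0.91, 0.67, 0.88], k=2, threshold=0.5))  # [(1, 0.91), (3, 0.88)]
```

On the score tensor returned by `util.pytorch_cos_sim`, `torch.topk(scores, k)` produces the same ranking, up to the ordering of ties. sentence-transformers also provides `util.semantic_search` for top-k retrieval over a corpus of embeddings.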
