# WORKING

### How FAISS Works

- #### FAISS provides various methods to store and retrieve embeddings efficiently. The simplest is IndexFlatL2, which performs an exact brute-force search: the query is compared against every stored vector by L2 (Euclidean) distance.

#### 👉 Basic Steps:

- #### Convert Text into Embeddings:
    - #### Sentences like "I love pizza" are converted into vectors using an embedding model (e.g., OpenAI's text-embedding-ada-002 or SentenceTransformers from Hugging Face).
    - #### Example: "I love pizza" → [0.4, 0.6, 0.1, 0.8]

- #### Store Embeddings in FAISS
    - #### FAISS stores these embeddings in a searchable index.

- #### Query Search
    - #### When you search with "Pizza is amazing", its embedding is computed and compared against the stored embeddings using a similarity or distance measure (such as cosine similarity or Euclidean distance).
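To make the brute-force idea concrete, here is a minimal NumPy sketch of what an exact L2 search does conceptually. It is not FAISS's implementation; the function name `brute_force_l2_search` and the toy vectors are assumptions made up for illustration. Like `IndexFlatL2`, it reports squared Euclidean distances.

```python
import numpy as np

def brute_force_l2_search(stored, query, k=1):
    """Return (squared distances, indices) of the k stored vectors closest to query."""
    # Squared Euclidean distance from the query to every stored vector
    diffs = stored - query
    sq_dists = np.sum(diffs * diffs, axis=1)
    # Indices of the k smallest distances, nearest first
    nearest = np.argsort(sq_dists)[:k]
    return sq_dists[nearest], nearest

# Toy 3-D vectors standing in for real sentence embeddings
stored = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
], dtype="float32")
query = np.array([0.1, 0.9, 0.2], dtype="float32")

distances, indices = brute_force_l2_search(stored, query, k=2)
print(indices)    # nearest first
print(distances)  # squared L2 distances for those indices
```

Every query touches every stored vector, so the cost grows linearly with the number of documents. That is fine for a small dataset like the one used later in this notebook.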

In [None]:
!pip install faiss-cpu   # or faiss-gpu if you have a GPU


In [None]:
import faiss
import numpy as np

# Example document embeddings (each row represents a sentence embedding)
document_vectors = np.array([
    [0.1, 0.8, 0.3],  # Embedding for "I love pizza"
    [0.5, 0.2, 0.9],  # Embedding for "Pizza is my favorite"
    [0.3, 0.7, 0.2]   # Embedding for another document
]).astype("float32")

# Create FAISS index using L2 distance
index = faiss.IndexFlatL2(3)  # 3D vector space
index.add(document_vectors)  # Add documents to index

# Query vector (embedding for "I enjoy eating pizza")
query_vector = np.array([[0.2, 0.75, 0.25]]).astype("float32")

# Search for the closest document
_, indices = index.search(query_vector, 1)  # Find 1 closest match

print(f"Most similar document index: {indices[0][0]}")


Most similar document index: 2


## Let's use a small Q/A dataset to build a basic Q/A chatbot prototype

In [None]:
customer_queries = {
"How do I place an order?": "To place an order, browse products, add them to your cart, and proceed to checkout. Enter your shipping and payment details and confirm the order.",

"Can I modify my order after placing it?": "You can modify your order within 30 minutes of placing it by going to 'My Orders' and selecting 'Edit Order'. After this period, modifications may not be possible.",

"How do I cancel my order?": "Go to 'My Orders', select the order, and click 'Cancel'. If the order has already been shipped, you may need to request a return instead.",

"What payment methods do you accept?": "We accept credit/debit cards, UPI, net banking, PayPal, and cash on delivery (COD) for eligible orders.",

"How do I track my order?": "You can track your order by clicking on 'My Orders' and selecting 'Track Order'. You’ll receive a tracking link via email/SMS once it’s shipped.",

"What is your return policy?": "You can return products within 7 days of delivery if they are unused and in original packaging. Refunds are processed within 5-7 business days.",

"How do I initiate a return?": "Go to 'My Orders', select the order, and click on 'Return'. Follow the instructions to schedule a pickup or drop-off.",

"When will I get my refund?": "Refunds for prepaid orders are processed within 5-7 business days after the return is approved. COD orders receive a refund via bank transfer or store credit.",

"Why is my order delayed?": "Order delays can happen due to high demand, weather conditions, or courier service issues. Check 'My Orders' for real-time tracking updates.",

"Do you offer express delivery?": "Yes, we offer express delivery in select locations for an additional fee. Choose 'Express Shipping' at checkout if available.",

"Can I change my delivery address?": "You can change the delivery address before the order is shipped by going to 'My Orders' and selecting 'Edit Address'. Once shipped, address changes are not possible.",

"What should I do if I receive a damaged product?": "If you receive a damaged product, report it within 48 hours by visiting 'My Orders' and selecting 'Report Issue'. You may be eligible for a replacement or refund.",

"Do you provide international shipping?": "Currently, we ship only within [Country Name]. International shipping options will be available soon.",

"How do I apply a discount code?": "Enter your discount code at checkout in the 'Apply Coupon' section. If valid, the discount will be applied to your total amount.",

"Why was my payment declined?": "Payments may be declined due to incorrect details, insufficient funds, or bank restrictions. Try using another payment method or contact your bank.",

"Can I buy now and pay later?": "Yes, we offer 'Buy Now, Pay Later' options through [BNPL Provider]. Choose this option at checkout to pay in installments.",

"How can I contact customer support?": "You can reach our customer support via email at support@example.com, live chat on our website, or by calling our helpline at [phone number].",
}


In [None]:
questions = list(customer_queries)  # All dictionary keys (the questions), in insertion order

# Get the maximum token length among all questions
- ## Useful when padding tokenized inputs to a common length (and for visualizing)

In [None]:
from transformers import AutoTokenizer
t = AutoTokenizer.from_pretrained("bert-base-uncased")
tokenized_sentences = [t.tokenize(se) for se in questions]
max_length = max(len(tokens) for tokens in tokenized_sentences)
max_length

11
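As a rough illustration of why the maximum length matters, the sketch below pads every token list to the length of the longest one. Whitespace splitting stands in for a real tokenizer here, and the helper name `pad_tokens` and the `[PAD]` filler are assumptions for this example only.

```python
def pad_tokens(token_lists, pad_token="[PAD]"):
    """Pad each token list on the right so all lists share the longest length."""
    max_length = max(len(tokens) for tokens in token_lists)
    padded = [
        tokens + [pad_token] * (max_length - len(tokens))
        for tokens in token_lists
    ]
    return padded, max_length

# Whitespace splitting is only a stand-in for a real subword tokenizer
sentences = [
    "how do i track my order?",
    "why was my payment declined?",
    "do you offer express delivery?",
]
token_lists = [sentence.split() for sentence in sentences]

padded, max_length = pad_tokens(token_lists)
for row in padded:
    print(row)
```

After padding, every row has the same number of entries, so the batch can be stacked into a rectangular array.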

### Encode questions into vector embeddings

In [None]:
!pip install sentence-transformers



In [None]:
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('all-MiniLM-L6-v2')


# Encode all dataset questions in one call.
# Optional: pass show_progress_bar=True to display a progress bar.
encoded_questions = model.encode(
    questions,
    convert_to_numpy=True,      # Return a NumPy array
    normalize_embeddings=True,  # Scale each embedding to unit length so comparisons depend on direction, not magnitude
)
encoded_questions


array([[-0.00525269, -0.0027741 ,  0.00439012, ...,  0.05966287,
        -0.00959073, -0.03116936],
       [-0.01255482,  0.02828169,  0.03745556, ..., -0.05449416,
         0.02443732, -0.05462256],
       [-0.0098854 ,  0.05941395,  0.07196155, ..., -0.00860798,
        -0.041535  , -0.06529609],
       ...,
       [ 0.03389934,  0.11285317,  0.09194537, ..., -0.0623031 ,
        -0.03781408, -0.02075073],
       [-0.01548825, -0.00544532, -0.00193917, ..., -0.09687741,
         0.07747193, -0.0597066 ],
       [-0.06125683,  0.0057992 ,  0.05149605, ..., -0.00036161,
        -0.00158965, -0.00253615]], dtype=float32)
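Normalizing the embeddings also links the two measures mentioned earlier: for unit-length vectors, the squared L2 distance equals `2 - 2 * cosine similarity`, so both rank the documents identically. The sketch below checks this on random stand-in vectors (not real embeddings).

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for document and query embeddings
docs = rng.normal(size=(5, 8)).astype("float32")
query = rng.normal(size=(8,)).astype("float32")

# Scale every vector to unit length
docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)

# Cosine similarity of unit vectors is just their dot product
cosine = docs_unit @ query_unit
# Squared Euclidean distance between the same unit vectors
sq_l2 = np.sum((docs_unit - query_unit) ** 2, axis=1)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
identity_holds = np.allclose(sq_l2, 2 - 2 * cosine, atol=1e-4)
same_best_match = int(np.argmax(cosine)) == int(np.argmin(sq_l2))
print(identity_holds, same_best_match)
```

In practice this means an L2-based index and a cosine-based comparison should pick the same nearest neighbor, as long as every vector has been normalized first.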

# Take a question from the user and find the most similar question in our dataset to fetch its answer

## Get the user's question and encode it as a vector

In [None]:
user_questions = input("Enter your question: ")  # Get the user's question
# encode() tokenizes, pads, and truncates internally, so no tokenizer arguments are needed
encoded_user_questions = model.encode(user_questions,
                           convert_to_numpy=True,
                           normalize_embeddings=True).reshape(1, -1)   # cosine_similarity() requires 2D input, so reshape to 2D
encoded_user_questions


array([[-7.46011809e-02, -6.28258428e-03,  2.81015523e-02,
         2.24318001e-02,  7.64028952e-02, -4.33388818e-03,
         5.30833527e-02,  2.31589153e-02,  3.49312462e-02,
         4.59228940e-02, -2.90395524e-02, -1.25194266e-01,
        -3.37083973e-02, -7.53579959e-02, -1.09357804e-01,
        -4.03210036e-02,  1.25522958e-03,  7.84120709e-02,
        -5.64449877e-02,  2.99849045e-02,  4.54625441e-03,
         2.30254494e-02, -1.16831055e-02,  6.63108751e-03,
        -3.12581612e-03,  3.77187952e-02, -7.64694512e-02,
         1.56072723e-02, -7.80703649e-02, -5.33086471e-02,
         7.63645992e-02,  3.49485613e-02,  2.12562867e-02,
        -3.98536175e-02,  2.39744820e-02, -3.10086440e-02,
        -2.49487031e-02, -2.03769952e-02, -2.03674636e-03,
        -9.06769093e-03, -5.37639894e-02, -5.40074073e-02,
         1.99646000e-02,  3.96168008e-02,  1.59770176e-02,
         4.34888043e-02, -3.15230154e-02,  4.81897146e-02,
         1.11820586e-01,  6.01163022e-02, -2.27848114e-0

# Cosine Similarity

In [None]:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Compute similarity between the user question and every question in our dataset
similarity = cosine_similarity(encoded_user_questions, encoded_questions)  # Shape: (1, number of questions)
most_similar_idx = int(np.argmax(similarity[0]))  # Index of the best-matching question
print(f"Cosine Similarity: {similarity[0][most_similar_idx]:.2f}")
print(f'Question that matches it is : {questions[most_similar_idx]}')


Cosine Similarity: 0.17
Question that matches it is : When will I get my refund?
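Once the best-matching question is known, the answer can be looked up in the dictionary. The sketch below is one possible way to do that, with a similarity threshold so that weak matches return no answer instead of a misleading one. The helper `lookup_answer`, the `0.5` threshold, the shortened dictionary, and the similarity scores are all made up for illustration.

```python
import numpy as np

def lookup_answer(similarity_row, questions, qa_pairs, threshold=0.5):
    """Return (answer, score) for the best match, or (None, score) if the match is too weak."""
    best_idx = int(np.argmax(similarity_row))
    best_score = float(similarity_row[best_idx])
    if best_score < threshold:
        return None, best_score
    return qa_pairs[questions[best_idx]], best_score

# Shortened stand-in for the customer_queries dictionary
qa_pairs = {
    "How do I track my order?": "Go to 'My Orders' and select 'Track Order'.",
    "When will I get my refund?": "Refunds are processed within 5-7 business days.",
}
questions = list(qa_pairs)

# Made-up similarity scores, one per dataset question
strong_match = np.array([0.21, 0.83])
weak_match = np.array([0.12, 0.17])

answer, score = lookup_answer(strong_match, questions, qa_pairs)
print(answer, score)

fallback, low_score = lookup_answer(weak_match, questions, qa_pairs)
print(fallback, low_score)
```

With the real data, `similarity[0]` would be passed as `similarity_row` and `customer_queries` as `qa_pairs`. The threshold is a tuning choice; a suitable value depends on the embedding model and the dataset.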

