In [1]:
RED = "\033[1;31m"
GREEN = "\033[0;32m"
RESET = "\033[0;0m"

## **SBERT Library**

### Logging Configuration and Model Loading

Logging is configured and the SentenceTransformer model is loaded. Logging at DEBUG level records what the bot does at each step while it runs, and the log lines emitted during loading confirm that the model was fetched correctly.


In [2]:
import logging
from sentence_transformers import SentenceTransformer

# Logging configuration
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')

# Load the SentenceTransformer model
modelSBERT = SentenceTransformer('all-MiniLM-L6-v2')

2024-04-14 16:30:44,430 - INFO - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
2024-04-14 16:30:44,441 - DEBUG - Starting new HTTPS connection (1): huggingface.co:443
2024-04-14 16:30:44,822 - DEBUG - https://huggingface.co:443 "HEAD /sentence-transformers/all-MiniLM-L6-v2/resolve/main/modules.json HTTP/1.1" 200 0
2024-04-14 16:30:44,948 - DEBUG - https://huggingface.co:443 "HEAD /sentence-transformers/all-MiniLM-L6-v2/resolve/main/config_sentence_transformers.json HTTP/1.1" 200 0
2024-04-14 16:30:45,427 - DEBUG - https://huggingface.co:443 "HEAD /sentence-transformers/all-MiniLM-L6-v2/resolve/main/README.md HTTP/1.1" 200 0
2024-04-14 16:30:45,534 - DEBUG - https://huggingface.co:443 "HEAD /sentence-transformers/all-MiniLM-L6-v2/resolve/main/modules.json HTTP/1.1" 200 0
2024-04-14 16:30:45,639 - DEBUG - https://huggingface.co:443 "HEAD /sentence-transformers/all-MiniLM-L6-v2/resolve/main/sentence_bert_config.json HTTP/1.1" 200 0

### Defining the Chatbot Responses

The chatbot's possible responses are defined here. The pretrained model is not trained on them; they are only encoded into embeddings, which are later compared against the user's question.

#### **Purpose**
In this example, the chatbot provides customer service for a bus company.

In [3]:
# Model's responses definition
responses = [
    "You can buy tickets on our website or at the station.",
    "The bus schedule is from 6 a.m. to 10 p.m.",
    "The bus fare depends on your destination.",
    "Our buses depart every 30 minutes.",
    "You can cancel your ticket 24 hours before departure.",
    "Please provide an ID document when purchasing your ticket.",
    "Children under 5 years old travel for free.",
    "You can bring up to two bags in the luggage compartment at no extra cost.",
    "The buses are equipped with free Wi-Fi.",
    "We have special services for people with disabilities."
]

### Converting the Responses into Embeddings

The responses are converted into embeddings with the loaded model. `encode` returns an array with one embedding vector per response, which is kept for later use when searching for similar responses.

In [4]:
# Conversion of responses to embeddings using the model
response_embeddings_SBERT = modelSBERT.encode(responses)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches: 100%|██████████| 1/1 [00:00<00:00,  2.87it/s]


### Function to Find the Most Similar Response

The function below converts the user's question into an embedding and compares it with the response embeddings using cosine similarity. It returns the most similar response if the similarity reaches a defined threshold (0.5 by default); otherwise it returns a fallback message asking the user to rephrase.

In [5]:
from sklearn.metrics.pairwise import cosine_similarity

# Function to find the most similar response
def find_similar_response(question, threshold=0.5):
    question_embedding = modelSBERT.encode([question])[0]
    similarities = cosine_similarity([question_embedding], response_embeddings_SBERT)
    logging.debug(f"Similarities: {similarities}")
    similar_response_index = similarities.argmax()
    max_similarity = similarities[0][similar_response_index]
    logging.debug(f"Most similar response index: {similar_response_index}, with similarity of: {RED}{max_similarity}{RESET}")
    if max_similarity < threshold:
        return "I'm sorry, I don't understand your question. Can you rephrase it?"
    return responses[similar_response_index]
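
For intuition, the cosine similarity of two vectors is their dot product divided by the product of their norms. The sketch below reproduces the same argmax-plus-threshold logic in plain Python on made-up 3-dimensional vectors. The names `cosine` and `best_match` and the toy vectors are illustrative assumptions, not part of the notebook, and the real embeddings have many more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|). Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query_vec, candidate_vecs, threshold=0.5):
    """Return (index, score) of the closest candidate, or (None, score) below the threshold."""
    scores = [cosine(query_vec, c) for c in candidate_vecs]
    index = max(range(len(scores)), key=scores.__getitem__)
    score = scores[index]
    return (index if score >= threshold else None), score

# Toy vectors (hypothetical values, chosen only for illustration)
candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(best_match([0.9, 0.1, 0.0], candidates))  # close to the first candidate
print(best_match([0.0, 0.0, 1.0], candidates))  # orthogonal to both, so rejected
```

The second call returns `None` as the index because no candidate reaches the threshold, which corresponds to the fallback message in `find_similar_response`.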

### Testing the Chatbot

The chatbot function is tested with several sample questions, including a nonsense input, to verify that everything works correctly.


In [6]:
# Testing with multiple questions
questionsFirstBot = [
    "How much does the bus ticket cost?",
    "Where can I buy a ticket?",
    "Do the buses have internet connection?",
    "Are your buses inclusive for people with disabilities?",
    "asdfasdf"
]

# Loop through questions and get responses
for question in questionsFirstBot:
    response = find_similar_response(question)
    print("Question:", question)
    print("Answer:", response)
    print()

Batches: 100%|██████████| 1/1 [00:00<00:00,  4.29it/s]
2024-04-14 16:30:48,274 - DEBUG - Similarities: [[0.4184351  0.48985595 0.7404922  0.41205186 0.3041825  0.45003402
  0.32196605 0.2980988  0.45304674 0.15949886]]
2024-04-14 16:30:48,276 - DEBUG - Most similar response index: 2, with similarity of: 0.7404922246932983


Question: How much does the bus ticket cost?
Answer: The bus fare depends on your destination.



Batches: 100%|██████████| 1/1 [00:00<00:00,  5.29it/s]
2024-04-14 16:30:48,473 - DEBUG - Similarities: [[0.7051648  0.22318847 0.3142142  0.18687859 0.4629287  0.6089513
  0.27267462 0.16547292 0.226758   0.1766938 ]]
2024-04-14 16:30:48,474 - DEBUG - Most similar response index: 0, with similarity of: 0.7051647901535034


Question: Where can I buy a ticket?
Answer: You can buy tickets on our website or at the station.



Batches: 100%|██████████| 1/1 [00:00<00:00, 20.41it/s]
2024-04-14 16:30:48,533 - DEBUG - Similarities: [[0.24244018 0.47238392 0.532012   0.44691795 0.18229948 0.16971964
  0.18064415 0.13823308 0.6641406  0.13947774]]
2024-04-14 16:30:48,534 - DEBUG - Most similar response index: 8, with similarity of: 0.6641405820846558


Question: Do the buses have internet connection?
Answer: The buses are equipped with free Wi-Fi.



Batches: 100%|██████████| 1/1 [00:00<00:00, 22.22it/s]
2024-04-14 16:30:48,587 - DEBUG - Similarities: [[0.17788033 0.42843872 0.53550875 0.44483542 0.1463207  0.21010485
  0.3188027  0.15023677 0.5088656  0.55049133]]
2024-04-14 16:30:48,588 - DEBUG - Most similar response index: 9, with similarity of: 0.5504913330078125


Question: Are your buses inclusive for people with disabilities?
Answer: We have special services for people with disabilities.



Batches: 100%|██████████| 1/1 [00:00<00:00, 26.31it/s]
2024-04-14 16:30:48,633 - DEBUG - Similarities: [[ 0.00483785  0.01367205 -0.00110462  0.02725348  0.05284074  0.14165837
   0.05424194 -0.0411313  -0.01045261  0.20565969]]
2024-04-14 16:30:48,637 - DEBUG - Most similar response index: 9, with similarity of: 0.2056596875190735


Question: asdfasdf
Answer: I'm sorry, I don't understand your question. Can you rephrase it?
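
Reading the log by eye works for five questions, but a small harness makes the same check repeatable. The sketch below assumes a hypothetical `evaluate` helper that accepts any callable mapping a question to an answer (such as `find_similar_response`) and a list of (question, expected answer) pairs. A keyword-based stub stands in for the real model so the snippet runs without downloading anything.

```python
def evaluate(bot, cases):
    """Run each question through `bot` and collect the mismatches."""
    failures = []
    for question, expected in cases:
        answer = bot(question)
        if answer != expected:
            failures.append((question, expected, answer))
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures

FALLBACK = "I'm sorry, I don't understand your question. Can you rephrase it?"

# Expected pairs taken from the run above
cases = [
    ("How much does the bus ticket cost?", "The bus fare depends on your destination."),
    ("Where can I buy a ticket?", "You can buy tickets on our website or at the station."),
    ("asdfasdf", FALLBACK),
]

# Stub standing in for find_similar_response (hypothetical, keyword-based)
def stub_bot(question):
    if "cost" in question:
        return "The bus fare depends on your destination."
    if "buy" in question:
        return "You can buy tickets on our website or at the station."
    return FALLBACK

accuracy, failures = evaluate(stub_bot, cases)
print(f"accuracy={accuracy:.2f}, failures={failures}")
```

In the notebook, `evaluate(find_similar_response, cases)` would exercise the real model instead of the stub.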



## **Google API**
Google's Gemini API is introduced as a tool for content generation and for creating embeddings. The API is part of Google's generative AI tooling and makes it easy to explore and prototype generative AI applications. In this example, the API is configured and the `models/embedding-001` model is used to generate text embeddings, which support tasks such as document retrieval and semantic similarity comparison.
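
The configuration cell passes the key as a string literal. One common alternative, sketched here, is to read it from an environment variable so that the key never ends up in the notebook file. The variable name `GOOGLE_API_KEY` and the helper `load_api_key` are assumptions made for illustration.

```python
import os

def load_api_key(var_name="GOOGLE_API_KEY"):
    """Read the API key from the environment; fail early with a clear message."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running this cell.")
    return key

# Usage in the notebook (assumes the variable has been exported beforehand):
# genai.configure(api_key=load_api_key())
```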


In [7]:

import numpy as np
import google.generativeai as genai
from sklearn.metrics.pairwise import cosine_similarity


# Configure API and model
genai.configure(api_key='SECRET_KEY')
model = 'models/embedding-001'



# Convert responses to embeddings using the Gemini API
response_embeddings = genai.embed_content(model=model,
                                          content=responses,
                                          task_type="retrieval_document")

# Define function to find the most similar response
def find_similar_response_gemini(question, threshold=0.8):
    # Note: the API also offers task_type="retrieval_query" for queries;
    # "retrieval_document" is kept here so the logged scores below still apply.
    question_embedding = genai.embed_content(model=model,
                                             content=question,
                                             task_type="retrieval_document")

    question_embeddingArray = np.array(question_embedding["embedding"]).reshape(1, -1)
    response_embeddingsArray = np.array(response_embeddings["embedding"])

    # Calculate cosine similarity between the question and each response
    similarities = cosine_similarity(question_embeddingArray, response_embeddingsArray)
    logging.debug(f"Similarities: {similarities}")
    similarities = similarities.flatten()  # Flatten the similarities array

    # Get the index of the most similar response
    index = np.argmax(similarities)
    max_similarity = similarities[index]
    logging.debug(f"Most similar response index: {index}, with similarity of: {RED}{max_similarity}{RESET}")
    # Check if the highest similarity score is below the threshold
    if max_similarity < threshold:
        return "I'm sorry, I don't understand your question. Can you rephrase it?"

    return responses[index]


# Testing with multiple questions
questions = [
    "How much does the bus ticket cost?",
    "Where can I buy a ticket?",
    "Do the buses have internet connection?",
    "Are your buses inclusive for people with disabilities?",
    "asdfasdf"
]

# Loop through questions and get responses
for question in questions:
    response = find_similar_response_gemini(question)
    print("Question:", question)
    print("Answer:", response)
    print()


2024-04-14 16:30:51,073 - DEBUG - Similarities: [[0.77064441 0.7874335  0.87427176 0.79279087 0.80261985 0.73916771
  0.78858777 0.76667137 0.77922478 0.69734184]]
2024-04-14 16:30:51,075 - DEBUG - Most similar response index: 2, with similarity of: 0.8742717645084002


Question: How much does the bus ticket cost?
Answer: The bus fare depends on your destination.



2024-04-14 16:30:51,326 - DEBUG - Similarities: [[0.8913632  0.79853758 0.79211193 0.80184422 0.84541858 0.88650367
  0.83944273 0.78219161 0.75847285 0.77683351]]
2024-04-14 16:30:51,327 - DEBUG - Most similar response index: 0, with similarity of: 0.8913632046458893


Question: Where can I buy a ticket?
Answer: You can buy tickets on our website or at the station.



2024-04-14 16:30:51,570 - DEBUG - Similarities: [[0.74176042 0.78445277 0.81274167 0.79439202 0.7661319  0.71081177
  0.75322246 0.74683511 0.88399265 0.720201  ]]
2024-04-14 16:30:51,571 - DEBUG - Most similar response index: 8, with similarity of: 0.8839926494717021


Question: Do the buses have internet connection?
Answer: The buses are equipped with free Wi-Fi.



2024-04-14 16:30:51,828 - DEBUG - Similarities: [[0.77961396 0.81527821 0.81800859 0.83126093 0.80214472 0.76347929
  0.80039063 0.78495559 0.82528646 0.84328968]]
2024-04-14 16:30:51,829 - DEBUG - Most similar response index: 9, with similarity of: 0.8432896768543996


Question: Are your buses inclusive for people with disabilities?
Answer: We have special services for people with disabilities.



2024-04-14 16:30:52,083 - DEBUG - Similarities: [[0.75407702 0.77765883 0.77629029 0.76531303 0.7571019  0.79276084
  0.79125123 0.72849478 0.75379476 0.77599439]]
2024-04-14 16:30:52,084 - DEBUG - Most similar response index: 5, with similarity of: 0.7927608400393369


Question: asdfasdf
Answer: I'm sorry, I don't understand your question. Can you rephrase it?
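
The two models place their scores in very different ranges, which is why the thresholds differ (0.5 for SBERT, 0.8 for Gemini). The sketch below takes the top similarity logged for each question in the two runs and measures how much room separates the weakest valid match from the nonsense input. `margin_report` is a hypothetical helper, and five questions are far too few to calibrate a threshold for real use.

```python
def margin_report(valid_scores, noise_scores, threshold):
    """Summarize how well a threshold separates valid matches from noise."""
    lowest_valid = min(valid_scores)
    highest_noise = max(noise_scores)
    return {
        "gap": round(lowest_valid - highest_noise, 4),
        "midpoint": round((lowest_valid + highest_noise) / 2, 4),
        "separates": highest_noise < threshold <= lowest_valid,
    }

# Top similarity per question, copied (rounded) from the logs of both runs
sbert_valid, sbert_noise = [0.7405, 0.7052, 0.6641, 0.5505], [0.2057]
gemini_valid, gemini_noise = [0.8743, 0.8914, 0.8840, 0.8433], [0.7928]

print("SBERT :", margin_report(sbert_valid, sbert_noise, threshold=0.5))
print("Gemini:", margin_report(gemini_valid, gemini_noise, threshold=0.8))
```

With these figures the SBERT gap is roughly 0.34 while the Gemini gap is roughly 0.05, so the Gemini threshold leaves much less margin for error on unseen questions.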

