![Colegio Bourbaki](./Images/Bourbaki.png)

## Procesamiento de Lenguaje Natural

### Contexto

El objetivo de este notebook es hacer una demostración de la creación de chatbots estilo ChatGPT con conocimiento de datos específicos.

En primer lugar, enseñaremos cómo conectar con el API de OpenAI para utilizar GPT-3.5 Turbo, el modelo que alimenta a la versión abierta de ChatGPT, desde código.

Después, veremos cómo podemos introducir material a la base de conocimiento del chatbot, para así obtener respuestas más personalizadas.

### Librerías

In [1]:
# NLP Chatbots
#!pip install openai langchain duckdb unstructured chromadb tiktoken
import openai
from langchain.document_loaders.unstructured import UnstructuredFileLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import VectorDBQA
from langchain.chat_models import ChatOpenAI

#Utils
import os
from dotenv import load_dotenv #!pip install python-dotenv
from pdfminer.high_level import extract_text #!pip install pdfminer.six

In [2]:
import warnings
warnings.filterwarnings("ignore")

Conectamos con el API GPT-3.5 por medio de una llave privada a cada usuario.

In [3]:
load_dotenv() # This method loads the variables from .env into the environment

True

In [4]:
#api_key = os.environ['CHATGPT_API_KEY']
api_key = os.getenv("CHATGPT_API_KEY") # This method loads the variables from .env into the environment with dotenv
if api_key is None:
    raise ValueError("API key not found. Please set the CHATGPT_API_KEY environment variable.")

In [5]:
client = openai.OpenAI(
    api_key=api_key,
)

Una vez hecha la conexión con GPT-3.5 podemos aprovechar las capacidades conversacionales de ChatGPT. 

Desde este punto ya es posible integrar asistentes inteligentes a aplicaciones y sistemas.

In [6]:
prompt = "¡Hola! ¿Por qué el cielo es azul?"

In [7]:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-3.5-turbo",
    #model="gpt-4",
)

In [8]:
print(chat_completion.choices[0].message.content)

Hola! El cielo es azul debido a un fenómeno llamado dispersión de Rayleigh. La luz del sol está compuesta por diferentes colores, cada uno con una longitud de onda diferente. Cuando la luz solar llega a la atmósfera de la Tierra, las partículas en la atmósfera dispersan la luz. Estas partículas dispersan más eficientemente los colores con longitudes de onda más cortas, como los colores azules y violetas. Por lo tanto, vemos el cielo como azul durante el día. Sin embargo, durante el amanecer o el atardecer, la luz solar atraviesa una mayor cantidad de la atmósfera, lo que hace que los colores con longitudes de onda más largas, como los rojos y los naranjas, sean menos dispersados y más visibles, creando así los hermosos tonos cálidos en el cielo.


Podemos lograr mejores respuestas del modelo si modificamos el atributo **system**.

In [9]:
sistema = "Eres un asistente de poetas, habilidoso en explicar conceptos complejos de programación creativamente."
usuario = "Compón un poema que explique el concepto de recursión en programación."

chat_completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": sistema},
    {"role": "user", "content": usuario}
  ]
)

In [10]:
print(chat_completion.choices[0].message.content)

En el lenguaje de los bits y los códigos,
la recursión emerge contando en su historia.
Un bucle que vuelve, se repite en su gloria,
con pasos que danzando nos muestran sus modos.

En la creación de palabras y símbolos,
la magia de la recursión se despliega,
un llamado que al fin el alma despliega,
en un abismo de ciclos paralelos.

Una función que a sí misma se llama,
un eco infinito de abrazos al aire,
creando caminos al infinito se expande.

Como un espejo que al mirarse se ama,
la recursión progresa sin desespero,
una danza infinita sin cesar avance.


### Creación de un asistente especializado

A su vez, podemos aprovechar aún más las capacidades de loss LLM haciendo una especie de fine-tuning. La idea consiste en alimentar al modelos con documentos propios para así lograr respuestas informadas sobre ellos.

Esto es posible en GPT-3.5 a través de los encajes y la generación de una base de datos vectorizada.

Primero, extraemos texto desde archivos pdfs...


... y extramos el texto de un pdf para después guardarlo en .txt.

In [11]:
def extractor_texto(ruta):
    # Assume extract_text is a function defined elsewhere to extract text from the given path
    txt = extract_text(ruta)
    
    # Clean and format the text
    replacements = {
        '\n\n\x0c': ' ',  # Remove specific pattern
        '...': ' ',       # Replace ellipses with space
        '\n': ' ',        # Replace newline characters with space
        '  ': ' ',        # Replace double spaces with single space
        "\f": ' ',        # Remove form feed characters
        "-": ' '          # Replace hyphens with space
    }
    
    # Apply replacements
    for old, new in replacements.items():
        txt = txt.replace(old, new)
    
    # Split into paragraphs and filter based on conditions
    paragraphs = txt.split('\n\n')
    paragraphs = [paragraph.strip() for paragraph in paragraphs if len(paragraph.strip()) > 30]
    
    # Join the cleaned paragraphs
    cleaned_text = '\n'.join(paragraphs)
    
    # Write the cleaned text to a file, appending '.txt' to the original path and using utf-16 encoding
    with open(ruta + '.txt', "w", encoding="utf-16") as archivo:
        archivo.write(cleaned_text)
    
    # Optionally, return the cleaned text if needed
    return cleaned_text


In [12]:
ruta = './Data/Feynman1982_Article_SimulatingPhysicsWithComputers.pdf'

In [13]:
paper = extractor_texto(ruta)

In [14]:
with open("./Data/output.txt", "w") as text_file:
    print(paper, file=text_file)

In [15]:
loader = UnstructuredFileLoader('./Data/output.txt')
documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings(openai_api_key = api_key)

db = Chroma.from_documents(texts, embeddings)

Created a chunk of size 1460, which is longer than the specified 1000
Created a chunk of size 1493, which is longer than the specified 1000
Created a chunk of size 1480, which is longer than the specified 1000
Created a chunk of size 1385, which is longer than the specified 1000
Created a chunk of size 1425, which is longer than the specified 1000
Created a chunk of size 1405, which is longer than the specified 1000
Created a chunk of size 1348, which is longer than the specified 1000
Created a chunk of size 1484, which is longer than the specified 1000
Created a chunk of size 1354, which is longer than the specified 1000
Created a chunk of size 1467, which is longer than the specified 1000
Created a chunk of size 1318, which is longer than the specified 1000
Created a chunk of size 1437, which is longer than the specified 1000
Created a chunk of size 1490, which is longer than the specified 1000
Created a chunk of size 1315, which is longer than the specified 1000
Created a chunk of s

En seguida, cargamos el documento a la base de datos.

In [16]:
qa = VectorDBQA.from_chain_type(llm=ChatOpenAI(openai_api_key=api_key), chain_type="stuff", vectorstore=db, k=1)

Este modelo ahora se puede utilizar como un ChatGPT con conocimiento especializado:

In [17]:
query = "What the document is about?"
qa.run(query)

'The document is about the problem of simulating physics with computers. Richard Feynman discusses the possibility of using computers to learn more about physical laws and explores the intersection between computers and physics.'

In [18]:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": query,
        }
    ],
    model="gpt-3.5-turbo",
    #model="gpt-4",
)
print(chat_completion.choices[0].message.content)

I apologize, but you have not provided any information about the document you are referring to. Could you please provide more specific details?


Y es posible obtener respuestas muy específicas y personalizadas sobre nuestros documentos:

In [19]:
query = "How photons are polarized?"
qa.run(query)

"Photons can be polarized through various methods. One common method is through the use of polarizing filters, which are materials that only allow light waves vibrating in a particular direction to pass through. When unpolarized light passes through a polarizing filter, it becomes polarized in the same direction as the filter's orientation. Another method is by reflection, where light waves bouncing off a surface become partially polarized in the direction parallel to the surface. Additionally, certain materials, like calcite crystals mentioned in the context, can also polarize light through their internal structure and optical properties."

In [20]:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": query,
        }
    ],
    model="gpt-3.5-turbo",
    #model="gpt-4",
)
print(chat_completion.choices[0].message.content)

Photons, which are particles of light, can be polarized through various mechanisms. Polarization refers to the direction in which the electric field oscillates as the photon propagates through space.

One way of polarizing light is through the use of polarizing filters. These filters are composed of a material that selectively transmits light waves vibrating in a specific plane of polarization, while blocking others. The filter aligns the electric field of the light waves with a particular direction, thus polarizing the light.

Another method of polarizing light is through reflection or scattering. When light waves strike a surface, such as a shiny non-metallic material or a rough metal surface, the reflected or scattered light becomes polarized to some extent. This occurs because the electric field of the incident light is preferentially oriented in a specific direction upon reflection or scattering.

Additionally, certain materials can induce polarization in light waves by interactin

In [21]:
query = "Expand the concept of negative probabilities"
qa.run(query)

'Negative probabilities are a concept that arises in certain mathematical and theoretical frameworks, particularly in quantum mechanics and information theory. In these contexts, negative probabilities do not represent actual probabilities in the traditional sense, but rather mathematical quantities that can be used to describe certain phenomena or calculations.\n\nNegative probabilities can occur when dealing with systems that exhibit interference or entanglement, where the probabilities of different outcomes can interfere with each other. In such cases, the probabilities assigned to different events may have negative values, indicating a sort of "anti-chance" or counterbalancing effect.\n\nIt\'s important to note that negative probabilities do not have a direct physical interpretation and cannot represent the likelihood of an event occurring in the real world. They are simply mathematical tools used to describe and calculate certain phenomena in a consistent and coherent manner.\n\nW

In [22]:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": query,
        }
    ],
    model="gpt-3.5-turbo",
    #model="gpt-4",
)
print(chat_completion.choices[0].message.content)

Negative probabilities are a mathematical concept that may seem counterintuitive at first, as probabilities are generally understood to be values between 0 and 1, representing the likelihood of an event occurring. However, negative probabilities have been introduced in certain contexts to extend the mathematical formalism and provide a more comprehensive framework for modeling and understanding complex phenomena.

In traditional probability theory, positive probabilities are used to represent the likelihood of events happening, while zero probability denotes an event that cannot occur. Negative probabilities, on the other hand, have been incorporated in different branches of mathematics and physics, such as quantum mechanics and information theory, to capture certain phenomena that cannot be effectively described using only positive probabilities.

One important area where negative probabilities have found application is quantum mechanics. In quantum mechanics, probabilities are used t

In [23]:
query = "What are quantum computers?"
qa.run(query)

'Quantum computers are a type of computer that utilize the principles of quantum mechanics to perform computations. Unlike classical computers that use bits to represent information as either 0 or 1, quantum computers use quantum bits or qubits, which can represent information as a combination of 0 and 1 simultaneously due to a property called superposition. This allows quantum computers to perform certain calculations much faster than classical computers. Additionally, quantum computers can also take advantage of another quantum property called entanglement, which allows qubits to be linked together in such a way that the state of one qubit can affect the state of another, even if they are physically separated. This property can be harnessed to perform complex computations and solve problems that are currently intractable for classical computers.'

In [24]:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": query,
        }
    ],
    model="gpt-3.5-turbo",
    #model="gpt-4",
)
print(chat_completion.choices[0].message.content)

Quantum computers are a type of computer that utilize principles of quantum mechanics to perform computational tasks. Unlike classical computers that store and process information using bits (which can represent either a 0 or a 1), quantum computers use quantum bits or qubits. Qubits can exist in a superposition of states, simultaneously representing both 0 and 1, which enables quantum computers to perform certain calculations at an exponentially faster rate compared to classical computers.

Quantum computers leverage quantum phenomena like superposition, entanglement, and interference to perform complex computations. They can handle large amounts of data simultaneously and quickly, making them potentially suitable for solving problems that are currently intractable for classical computers. Some of the applications of quantum computers include optimization, cryptography, machine learning, quantum simulations, and drug discovery.

However, developing and maintaining stable and error-fre

**Conexión con la API de OpenAI:**

1) ¿Qué información necesitas para autenticarte y realizar peticiones al API de OpenAI?
2) ¿Cuál es el propósito de utilizar una clave API en la conexión con OpenAI y cómo se debe proteger?
3) ¿Qué diferencias hay entre los distintos modelos de OpenAI y cómo elegirías uno para tu aplicación específica?

**Uso de GPT-3.5 Turbo:**

4) ¿Qué ventajas ofrece GPT-3.5 Turbo para la creación de chatbots comparado con versiones anteriores?
5) ¿Cómo se formulan las peticiones al modelo GPT-3.5 Turbo para generar respuestas coherentes y relevantes?
6) ¿Qué limitaciones tiene el modelo GPT-3.5 Turbo y cómo puedes mitigarlas?

**Introducción de Material a la Base de Conocimiento:**

7) ¿Cómo puedes personalizar las respuestas de GPT-3.5 Turbo utilizando información específica?
8) ¿Cuál es la importancia de la relevancia y precisión del material que se introduce en la base de conocimientos del bot?
9) ¿Qué estrategias se pueden utilizar para mantener actualizada la base de conocimientos del chatbot?

**Personalización y Respuestas del Chatbot:**

10) ¿De qué manera se puede ajustar el tono o el estilo de las respuestas que genera GPT-3.5 Turbo?
11) ¿Cómo afecta el contexto proporcionado a las respuestas generadas por el chatbot?
    Describe un método para evaluar la precisión y utilidad de las respuestas del chatbot.

**Problemas Éticos y de Privacidad:**

12) ¿Cuáles son las consideraciones éticas al utilizar modelos de lenguaje generativos como GPT-3.5 Turbo en un chatbot?
13) ¿Cómo debería manejar un chatbot las solicitudes de datos personales o sensibles de los usuarios?
14) ¿Qué medidas se pueden tomar para garantizar la privacidad y la seguridad de los usuarios al interactuar con un chatbot?

![Lenguaje Matemático](./Images/Matematicas.png)

![Contacto](./Images/Contacto.png)