https://python.langchain.com/docs/tutorials/llm_chain/

In [15]:
import os

from dotenv import load_dotenv
from IPython.display import Markdown
from langchain_mistralai import ChatMistralAI

In [22]:
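# Load environment variables from .env file
load_dotenv()

# Retrieve API key from the environment
api_key = os.getenv("MISTRAL_API_KEY")

if not api_key:
    # Later cells need `model`, so stop here instead of continuing without it
    raise RuntimeError("MISTRAL_API_KEY not found in environment or .env file.")

# `model` selects the Mistral model (`name` would only set the runnable's display name);
# the API key is read from the MISTRAL_API_KEY environment variable
model = ChatMistralAI(model="open-mixtral-8x22b")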
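`load_dotenv()` reads `KEY=VALUE` lines from a `.env` file into `os.environ` and, by default, does not overwrite variables that are already set. The sketch below is a deliberately simplified, hypothetical stand-in (`load_env_file` is an illustrative name, not python-dotenv's code): it ignores `export` prefixes, multi-line values, and variable expansion. It uses a made-up `DEMO_API_KEY` so running it will not clobber a real `MISTRAL_API_KEY`.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: str, override: bool = False) -> dict[str, str]:
    """Hypothetical, simplified .env loader (not python-dotenv's implementation).

    Reads KEY=VALUE lines, skips blank lines and '#' comments, strips matching
    surrounding quotes, and sets os.environ unless the key already exists
    (mirroring load_dotenv's default override=False).
    """
    loaded: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Strip one pair of matching single or double quotes
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        loaded[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    # Placeholder variable and value for illustration only
    env_path.write_text('# demo credentials\nDEMO_API_KEY="dummy-key"\n', encoding="utf-8")
    os.environ.pop("DEMO_API_KEY", None)
    load_env_file(str(env_path))
    print(os.environ.get("DEMO_API_KEY"))  # dummy-key
```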

In [8]:
from langchain_core.messages import HumanMessage, SystemMessage

In [11]:
messages = [
    SystemMessage(content="Answer briefly!"),
    HumanMessage(content="What is indexing in database?"),
]

result = model.invoke(messages)
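The `SystemMessage`/`HumanMessage` list is an ordered chat transcript of role-tagged turns. LangChain messages expose a `type` of `"system"`, `"human"`, or `"ai"`, and chat-completion APIs such as Mistral's receive the conversation as `{"role": ..., "content": ...}` objects with roles `system`, `user`, and `assistant`. The sketch below uses a hypothetical `Message` dataclass and `to_payload` helper to show that mapping; it is not the conversion code inside `langchain_mistralai`.

```python
from dataclasses import dataclass

# Hypothetical mapping from LangChain message types to chat-API roles
ROLE_BY_TYPE = {"system": "system", "human": "user", "ai": "assistant"}


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a LangChain chat message."""
    type: str      # "system", "human", or "ai"
    content: str


def to_payload(messages: list[Message]) -> list[dict[str, str]]:
    """Convert messages into the role/content dicts a chat API expects (sketch)."""
    return [{"role": ROLE_BY_TYPE[m.type], "content": m.content} for m in messages]


example_messages = [
    Message("system", "Answer briefly!"),
    Message("human", "What is indexing in database?"),
]
print(to_payload(example_messages))
# [{'role': 'system', 'content': 'Answer briefly!'}, {'role': 'user', 'content': 'What is indexing in database?'}]
```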

In [12]:
from langchain_core.output_parsers import StrOutputParser

In [13]:
parser = StrOutputParser()

In [16]:
Markdown(parser.invoke(result))

Indexing in a database is a technique used to optimize the performance of a database by reducing the time it takes to retrieve data. It is similar to the index found at the back of a book, which allows you to quickly find specific information without having to scan every page. In a database, an index is a data structure that stores a small portion of the data in a table, along with a pointer to the location of the full record. This allows the database to quickly locate and retrieve specific records based on the indexed data, improving search and query performance.
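The answer's description (an index stores a small portion of each row, the indexed key, plus a pointer to the full record) can be illustrated with a toy in-memory table. This is a simplifying assumption: a sorted list of keys searched with `bisect` stands in for the B-tree a real database engine would typically use, and the helper names are hypothetical.

```python
import bisect


def build_index(rows, column):
    """Return (sorted keys, row positions): the indexed key plus a pointer per row."""
    entries = sorted((row[column], pos) for pos, row in enumerate(rows))
    keys = [key for key, _ in entries]
    pointers = [pos for _, pos in entries]
    return keys, pointers


def full_scan(rows, column, value):
    """Without an index: inspect every row, O(n)."""
    return [row for row in rows if row[column] == value]


def index_lookup(rows, idx, value):
    """With an index: binary-search the keys, O(log n), then follow the pointers."""
    keys, pointers = idx
    start = bisect.bisect_left(keys, value)
    end = bisect.bisect_right(keys, value)
    return [rows[pointers[i]] for i in range(start, end)]


table = [
    {"id": 7, "name": "Ada"},
    {"id": 3, "name": "Linus"},
    {"id": 9, "name": "Grace"},
    {"id": 1, "name": "Guido"},
]
id_index = build_index(table, "id")
print(index_lookup(table, id_index, 9))  # [{'id': 9, 'name': 'Grace'}]
```

Both lookups return the same rows; the index only changes how many rows must be examined to find them.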

In [23]:
# LCEL pipe: the model's AIMessage output is passed to the parser, which returns its text as a str
chain = model | parser
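`model | parser` uses the LangChain Expression Language (LCEL) pipe operator: the result is a new runnable whose `invoke` passes its input to `model` and feeds the model's output into `parser`. The toy classes below (`Step`, `FakeAIMessage`, hypothetical names) reimplement only that idea with Python's `__or__`, so the data flow can be seen without calling Mistral. LangChain's real `RunnableSequence` additionally handles batching, streaming, async execution, and configuration.

```python
from dataclasses import dataclass
from typing import Any, Callable


class Step:
    """Hypothetical minimal runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right: the output of self becomes the input of other
        return Step(lambda value: other.invoke(self.invoke(value)))


@dataclass
class FakeAIMessage:
    """Stand-in for LangChain's AIMessage: the generated text lives in .content."""
    content: str


# Fake model: returns a canned message object instead of calling an API
fake_model = Step(lambda messages: FakeAIMessage(content=f"received {len(messages)} messages"))

# Like StrOutputParser on a message: extract its plain string content
str_parser = Step(lambda message: message.content)

fake_chain = fake_model | str_parser
print(fake_chain.invoke(["system prompt", "user question"]))  # received 2 messages
```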

In [24]:
messages = [
    SystemMessage(content="Give exact math formula for provided term"),
    HumanMessage(content="Self-attention in transformer architecture"),
]

In [25]:
result = chain.invoke(messages)

In [26]:
Markdown(result)

The self-attention mechanism in the Transformer architecture can be calculated using the following formulas:

First, the input sequence `x` is represented as a matrix `X` with dimensions `(batch_size, sequence_length, embedding_dimension)`.

Then, the query, key, and value matrices `Q`, `K`, and `V` are computed as follows:

$$Q = X W^Q, \qquad K = X W^K, \qquad V = X W^V$$

where `W^Q`, `W^K`, and `W^V` are learnable weight matrices with dimensions `(embedding_dimension, heads * head_dimension)`, and `heads` is the number of attention heads.

Next, the attention scores `A` are calculated as:

$$A = \mathrm{softmax}\left(\frac{Q K^\top}{\sqrt{d_{\text{head}}}}\right)$$

where $K^\top$ is the transpose of the key matrix, $d_{\text{head}}$ is `head_dimension`, and softmax is applied row-wise.

Finally, the output of the self-attention layer `O` is computed as:

$$O = A V$$

where juxtaposed matrices (e.g. $X W^Q$, $A V$) denote matrix multiplication.

Therefore, the self-attention mechanism in the Transformer architecture can be summarized as:

$$O = \mathrm{softmax}\left(\frac{Q K^\top}{\sqrt{d_{\text{head}}}}\right) V$$

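where `Q`, `K`, and `V` are computed from the input sequence matrix `X` using learnable weight matrices `W^Q`, `W^K`, and `W^V`. The output `O` has shape `(batch_size, sequence_length, heads * head_dimension)`, so it matches the shape of `X` when `heads * head_dimension` equals `embedding_dimension` (the usual design choice).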
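To check the formula numerically, here is a small single-head, unbatched sketch of O = softmax(Q Kᵀ / √d_head) V using plain Python lists (no NumPy, to keep it self-contained). The toy input, the identity projections, and the helper names (`matmul`, `transpose`, `softmax_rows`, `self_attention`) are illustrative assumptions; a practical implementation would use a tensor library, split the projections into heads, and apply an output projection.

```python
import math


def matmul(a, b):
    """Matrix product of nested lists: (n x k) times (k x m) gives (n x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax, subtracting each row's max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention for one sequence x of shape (seq_len, d_model)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(w_k[0])
    scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(q, transpose(k))]
    a = softmax_rows(scores)   # A = softmax(Q K^T / sqrt(d_head))
    return matmul(a, v), a     # O = A V


# Toy example (assumption): 3 tokens, d_model = d_head = 2, identity projections
x_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
attn_out, attn_weights = self_attention(x_tokens, identity, identity, identity)
print([round(sum(row), 6) for row in attn_weights])  # [1.0, 1.0, 1.0]
```

Each row of `A` is a probability distribution over the sequence, so every output row is a weighted average of the value rows.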

In [27]:
from langchain_core.prompts import ChatPromptTemplate

In [28]:
system_template = "Translate the following into {language}:"
prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_template), ("user", "{text}")]
)

In [31]:
result = prompt_template.invoke({"language": "Russian", "text": "Machine Learning is good"})
result.to_messages()

[SystemMessage(content='Translate the following into Russian:', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Machine Learning is good', additional_kwargs={}, response_metadata={})]
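`ChatPromptTemplate.from_messages` stores (role, template) pairs and, on `invoke`, substitutes the input dict into each `{placeholder}`, which yields the message list shown above. The following is a hypothetical plain-Python approximation using `str.format` (`render_chat_prompt` is an illustrative name, not a LangChain function). LangChain's default templates use the same `{name}` syntax but also track the template's input variables and return a `ChatPromptValue` of message objects rather than tuples.

```python
def render_chat_prompt(template_pairs, variables):
    """Fill {placeholders} in each (role, template) pair; raises KeyError if one is missing."""
    # str.format ignores unused keyword arguments, so each template takes only what it needs
    return [(role, template.format(**variables)) for role, template in template_pairs]


translation_prompt = [
    ("system", "Translate the following into {language}:"),
    ("user", "{text}"),
]
print(render_chat_prompt(translation_prompt, {"language": "Russian", "text": "Machine Learning is good"}))
# [('system', 'Translate the following into Russian:'), ('user', 'Machine Learning is good')]
```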

In [32]:
# Input dict -> prompt messages -> model (AIMessage) -> parser (str)
chain = prompt_template | model | parser

In [33]:
out = chain.invoke({"language": "Russian", "text": "Machine Learning is good"})
Markdown(out)

Машинное обучение хорошо (Mashinnoe obuchenie horosho).

I hope this helps! Let me know if you have any other questions.