# LangTorch

To install LangTorch using pip:

In [None]:
%%capture
# Setup Colab
%load_ext autoreload
%autoreload 2

# Install LangTorch
!pip install langtorch

To use OpenAI models as our LLM, we need to set the `OPENAI_API_KEY` environment variable. You can find your API key at platform.openai.com.

In [None]:
import os

os.environ["OPENAI_API_KEY"] = "your_api_key"  # Replace with your actual OpenAI API key

## 1. Perform multiple LLM calls with TextTensors

In [None]:
from langtorch import TextTensor  # holds texts instead of weights, supports tensor operations
from langtorch import TextModule  # torch.nn modules that work on TextTensors, performing prompt templating and LLM calls

**`TextTensors`** are designed to streamline working with many pieces of text and performing parallel LLM calls. `langtorch.TextTensor` is a subclass of PyTorch's `torch.Tensor` that:

- **Holds text entries** instead of numerical weights.
- **Has a special structure:** `TextTensor` entries can represent chunked documents, prompt templates, completion dictionaries, chat histories, and more.
- **Is geometric:** `TextTensors` have a shape and can be manipulated with PyTorch functions (reshape, stack, etc.).



---

In this example, we will create tensors holding prompt templates, fill them with a tensor of completion dictionaries, and send them to the OpenAI API.


In [None]:
prompt_tensor = TextTensor([["Is this an email address? {input_field}"],
                            ["Is this a valid web link? {input_field}"]])

# Adding text (a str or another TextTensor) appends it to the entries, following broadcasting rules
prompt_tensor += " (Answer 'Yes' or 'No')"
print(prompt_tensor)
print("Shape =", prompt_tensor.shape)

`TextModules` are `torch.nn.Modules` that work on `TextTensors`:

- **Tensor of Prompts:** They hold a tensor of prompts instead of numerical weights.
- **Input Handling:** They accept `TextTensors` as input, which are used to format the prompt tensor.
- **Formatting and Broadcasting:** This allows formatting multiple prompts on multiple completions, controlling which prompt gets which input through broadcasting rules.
- **Activation Function:** Most torch layers end with an *activation function*. Similarly, `TextModules` end in an *activation* of an LLM call.

In this example, we will create a `TextModule` that ends in a call to an OpenAI model. **This module can now execute both tasks in parallel on as many inputs as we'd like:**


In [None]:
tasks_module = TextModule(prompt_tensor, activation="gpt-3.5-turbo")

input_completions = TextTensor([{"input_field": "contact@langtorch.org"}, {"input_field": "https://langtorch.org"}])

# Rows of the output correspond to prompts: the first row answers "Is this an email address?", the second "Is this a valid web link?"
# Columns correspond to the two input completions
print(tasks_module(input_completions))

### Comparison with the OpenAI Package

The `TextModule` above both formats the prompts and sends them to the OpenAI activation (`langtorch.OpenAI`). Let's compare LangTorch to the OpenAI package.

First, we'll separate the formatting and API steps.

> A core feature of `TextTensors` is that they allow us to easily format several prompts on several inputs.
>
> LangTorch achieves this by **defining the product** of two `TextTensors`: `text1*text2` is an operation akin to `text1.format(**text2)`. As shown below, this is what happens in a `TextModule` before the activation is applied:


In [None]:
# Using TextModule
tasks_module = TextModule(prompt_tensor)
prompts = tasks_module(input_completions)

# Equivalently, using "TextTensor multiplication"
prompts = prompt_tensor*input_completions
print(prompts)

The code above introduces the multiplication operation (used in `TextModules`), which acts like a more powerful format operation and enables many of the features of `TextTensors`. For a more in-depth look, see [TextTensor Multiplication](https://langtorch.org/reference/multiplication).
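As a rough mental model of this product, the sketch below broadcasts a (2, 1) column of templates against a length-2 row of completion dictionaries and calls `str.format` on each pair, producing a (2, 2) grid. It is a plain-Python illustration, not LangTorch's implementation: `broadcast_format` is a hypothetical helper that handles only this one shape combination, while LangTorch's multiplication covers more cases (see the reference linked above).

```python
from typing import Dict, List

def broadcast_format(templates: List[List[str]], completions: List[Dict[str, str]]) -> List[List[str]]:
    """Format an (m, 1) column of templates with a length-n row of completion dicts.

    As in NumPy/PyTorch broadcasting, the size-1 axis of `templates` is stretched
    to length n, so result[i][j] = templates[i][0].format(**completions[j]).
    """
    if any(len(row) != 1 for row in templates):
        raise ValueError("this sketch only handles templates of shape (m, 1)")
    return [[row[0].format(**c) for c in completions] for row in templates]

prompt_grid = [["Is this an email address? {input_field}"],
               ["Is this a valid web link? {input_field}"]]   # shape (2, 1)
inputs = [{"input_field": "contact@langtorch.org"},
          {"input_field": "https://langtorch.org"}]           # shape (2,)

result = broadcast_format(prompt_grid, inputs)                # shape (2, 2)
for row in result:
    print(row)
```

Each row corresponds to one template and each column to one completion, which matches how the output of `tasks_module` is laid out.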

We can send the formatted prompts to the OpenAI API by creating a `langtorch.OpenAI` module (the "activation") and compare the speed of three approaches:


In [None]:
import openai
import langtorch
import time

langtorch_api = langtorch.OpenAI("gpt-3.5-turbo", system_message="You are a helpful assistant.", max_token=1, T=0.)
openai_api = openai.OpenAI()

# OpenAI package (requests sent one at a time)
start = time.time()
responses = []
for prompt in prompts.flat:
    responses.append(openai_api.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": "You are a helpful assistant."},
                  {"role": "user", "content": prompt}],
        max_tokens=1,
        temperature=0.
    ))
print(f"1.\n{str(responses)[:125]}...")
print(f" OpenAI loop time taken: {time.time() - start:.2f} seconds")

# LangTorch
start = time.time()
responses = langtorch_api(prompts)
print(f"2.\n{responses}")
print(f" LangTorch time taken: {time.time() - start:.2f} seconds")

# LangTorch on repeated requests
start = time.time()
responses = langtorch_api(prompts)
print(f"3.\n{responses}")
print(f" LangTorch on repeated requests time taken: {time.time() - start:.2f} seconds")





The OpenAI activation in LangTorch (`langtorch.OpenAI`) isn't just a wrapper around the OpenAI package. The observed speed-up comes from the fact that the LangTorch implementation:

- Sends API calls in parallel, so multiple completions are generated much faster than by calling the OpenAI chat completion endpoint sequentially.
- Caches API results, which saves tokens and speeds up subsequent calls, especially for embeddings and when the temperature is set to zero.
- Removes duplicate requests before sending them.

`langtorch.OpenAI` also:

- Operates directly on `TextTensors`, returning results in the same shape as the input.
- Handles API errors and retries by default.
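The three optimizations listed above (parallel requests, caching, and de-duplication) can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the code of `langtorch.OpenAI`: `fake_completion` is a hypothetical stand-in for an API request, the cache is an in-memory dict keyed on `(prompt, temperature)`, and results are reused only at temperature zero.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

CALLS: List[str] = []                     # prompts that actually reached the (fake) API
CACHE: Dict[Tuple[str, float], str] = {}  # (prompt, temperature) -> completion

def fake_completion(prompt: str, temperature: float) -> str:
    """Hypothetical stand-in for a single chat completion request."""
    CALLS.append(prompt)
    time.sleep(0.1)  # simulated network latency
    return f"answer to: {prompt}"

def batch_complete(prompts: List[str], temperature: float = 0.0, max_workers: int = 8) -> List[str]:
    # 1. De-duplicate: each distinct prompt is requested at most once per batch
    unique = list(dict.fromkeys(prompts))
    # 2. Cache: only deterministic (temperature 0) results are reused
    use_cache = temperature == 0.0
    todo = [p for p in unique if not (use_cache and (p, temperature) in CACHE)]
    # 3. Parallelize: send the remaining requests concurrently from a thread pool
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        fresh = dict(zip(todo, pool.map(lambda p: fake_completion(p, temperature), todo)))
    if use_cache:
        CACHE.update({(p, temperature): r for p, r in fresh.items()})
    # Map results back onto the original order, duplicates included
    return [fresh[p] if p in fresh else CACHE[(p, temperature)] for p in prompts]

prompts = ["Is 2 prime?", "Is 4 prime?", "Is 2 prime?"]
first = batch_complete(prompts)
print(len(CALLS))   # 2: the duplicate prompt was sent only once
second = batch_complete(prompts)
print(len(CALLS))   # still 2: the repeated batch was served from the cache
```

Threads are a reasonable choice here because API calls are I/O-bound; the results are returned in input order, mirroring how LangTorch returns outputs in the same shape as its input.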

## 2. Implementing popular methods

### Chained Calls and Simple Zero-Shot Chain of Thought

LangTorch integrates seamlessly with torch, allowing you to easily chain `TextModules` using `torch.nn.Sequential`. This can be used to chain multiple LLM calls or additional prompting methods. A simplified example is a zero-shot Chain of Thought, for which we can create a reusable `TextModule`:

In [None]:
CoT = TextModule("{*} Let's think step by step.")

`{}` in a prompt template is a positional placeholder that takes one input in each entry. For our chain-of-thought module we instead use the placeholder `{*}`, a "wildcard" key that inserts the whole input in its place.

Now we chain these with `torch.nn.Sequential`:

In [None]:
import torch
calculate = TextModule("Calculate the following: {} = ?")

calculate_w_CoT = torch.nn.Sequential(
    calculate,
    CoT,
    langtorch.OpenAI("gpt-3.5-turbo"),
    # You can add sequential calls here
)

input_tensor = TextTensor(["170*32", "123*45/10", "2**10*5"])
output_tensor = calculate_w_CoT(input_tensor)
output_tensor.view(1,-1) # We use torch methods to reshape TextTensors and view entries in columns
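Conceptually, `torch.nn.Sequential` is function composition: the output of each module becomes the input of the next. The plain-Python sketch below mimics the pipeline above on a list of strings. `template`, `sequential`, and `fake_llm` are hypothetical helpers, the LLM is a stub, and because the entries are plain strings a positional `{}` stands in for the `{*}` wildcard.

```python
from functools import reduce
from typing import Callable, List

Step = Callable[[List[str]], List[str]]  # a "module": list of texts in, list of texts out

def template(t: str) -> Step:
    # Format the template once per entry; "{}" receives the entry
    return lambda entries: [t.format(e) for e in entries]

def sequential(*steps: Step) -> Step:
    # Like torch.nn.Sequential: each step's output is the next step's input
    return lambda entries: reduce(lambda acc, step: step(acc), steps, entries)

def fake_llm(entries: List[str]) -> List[str]:
    # Hypothetical stub standing in for the LLM activation
    return [f"<completion of: {e}>" for e in entries]

calculate_w_cot = sequential(
    template("Calculate the following: {} = ?"),
    template("{} Let's think step by step."),
    fake_llm,
)
outputs = calculate_w_cot(["170*32", "2**10*5"])
print(outputs)
```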

### Ensemble / Self-Consistency

Representing texts geometrically in a matrix or tensor allows for creating meaningful structures. Methods like ensemble voting and self-consistency involve generating multiple completions for the same task, easily represented by adding a dimension.

In this example, we build a module that creates multiple Chain-of-Thought answers for each input. These create separate `TextTensor` entries that we combine using a "linear layer" to marginalize over them, improving overall performance (see [Wang et al., 2022](https://arxiv.org/abs/2203.11171)).


In [None]:
calculate = TextModule("Calculate the following: {} = ? Let's think step by step.")

ensemble_llm = langtorch.OpenAI("gpt-3.5-turbo", T=1.4, n=3)  # 3 completions per input with high temperature

# Here we use properties of matrix multiplication: Linear uses matmul, where
# row_of_labels @ column_of_completions == one long entry with labeled completions
combine_answers = langtorch.Linear([[f"\nAnswer {i}: " for i in [1, 2, 3]]])

choose = TextModule("Select from these reasoning paths the most consistent final answer: {}")

llm = langtorch.OpenAI("gpt-3.5-turbo", T=0)

self_consistent_calculate = torch.nn.Sequential(
    calculate,
    ensemble_llm,
    combine_answers,
    choose,
    llm
)
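The `combine_answers` step relies on a matrix product over texts. A minimal plain-Python sketch of that idea, assuming (as the comment in the cell suggests) that the product of two entries is their concatenation and that summing joins the products: a 1×3 row of labels times a 3×1 column of completions yields a single 1×1 entry with labeled completions. `text_matmul` is a hypothetical helper, and the three completions are made-up values.

```python
from typing import List

def text_matmul(a: List[List[str]], b: List[List[str]]) -> List[List[str]]:
    """Matrix product where "multiply" concatenates and "sum" joins.

    result[i][j] = a[i][0] + b[0][j] + a[i][1] + b[1][j] + ...
    """
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    cols = len(b[0])
    return [["".join(a[i][k] + b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]

labels = [[f"\nAnswer {i}: " for i in [1, 2, 3]]]  # shape (1, 3)
completions = [["5643"], ["5644"], ["5643"]]       # shape (3, 1), made-up sampled answers
combined = text_matmul(labels, completions)        # shape (1, 1)
print(combined[0][0])
```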

In [None]:
input_tensor = TextTensor("171*33")
self_consistent_calculate(input_tensor)

Repeating these calls and saving the results shows the accuracy on this input rising from 25% (using `calculate`) to well over 50% (using `self_consistent_calculate`).
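The pipeline above asks a second LLM call to pick the most consistent answer. The self-consistency paper by Wang et al. instead aggregates by a majority vote over the final answers of the sampled reasoning paths. A minimal sketch of that alternative, assuming each path ends with a line of the form `Answer: <number>` (a hypothetical format; real completions would need more robust parsing). The reasoning paths below are made up, not model output.

```python
import re
from collections import Counter
from typing import List, Optional

def extract_answer(path: str) -> Optional[str]:
    # Use the last "Answer: <number>" in a reasoning path (hypothetical output format)
    matches = re.findall(r"Answer:\s*([-\d.,]+)", path)
    if not matches:
        return None
    return matches[-1].rstrip(".").replace(",", "")

def majority_vote(paths: List[str]) -> Optional[str]:
    answers = [a for a in (extract_answer(p) for p in paths) if a is not None]
    if not answers:
        return None
    # most_common(1) returns [(answer, count)] for the most frequent answer
    return Counter(answers).most_common(1)[0][0]

# Made-up reasoning paths for 171*33
paths = [
    "171*33 = 171*30 + 171*3 = 5130 + 513. Answer: 5643",
    "170*33 = 5610, plus 33 gives 5643. Answer: 5643.",
    "Roughly 171*33 = 5,633. Answer: 5,633",
]
print(majority_vote(paths))  # 5643
```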




## 3. Automatic TextTensor Embeddings for Building Retrievers

`TextTensors` offer a straightforward way to work with embeddings. Every `TextTensor` can generate its own embeddings, held in a torch tensor that preserves its shape. Moreover, `TextTensors` automatically act as their embeddings when passed to torch functions such as cosine similarity.

These representations (available under the `.embedding` attribute) are created automatically right before they are needed, using a set embedding model (default is OpenAI's `text-embedding-3-small`).
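Cosine similarity compares the direction of two embedding vectors: cos(u, v) = (u · v) / (‖u‖ ‖v‖), ranging from -1 to 1. The sketch below computes it in plain Python on made-up 3-dimensional vectors; real embeddings from a model such as `text-embedding-3-small` are much longer, and these numbers are illustrative only.

```python
import math
from typing import Sequence

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    # Dot product divided by the product of the vector norms
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy "embeddings" (hypothetical values, not real model output)
yes, yeah, nope = [0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [-0.7, 0.1, 0.2]
print(round(cosine_similarity(yes, yeah), 3), round(cosine_similarity(yes, nope), 3))
```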


In [None]:
import torch

tensor1 = TextTensor([[["Yes"],
                       ["No"]]])
tensor2 = TextTensor(["Yeah", "Nope", "Yup", "Non"])

torch.cosine_similarity(tensor1, tensor2)

We can access the embedding tensor via `.embedding`, change the embedding model, and embed with `.embed()`:

In [None]:
# To change embedding model and embed
tensor1.embedding_model = "text-embedding-3-large"
tensor1.embed()
# To access the embedding tensor
tensor1.embedding

### Working with Embeddings and Documents (Parsing, Chunking and Indexing)

To enable these features, `TextTensor` entries aren't plain strings but structured [`Text`](https://langtorch.org/reference/text) objects. They can be created from f-string templates, dictionaries, and markup documents, and are represented as a sequence of `(label, text)` pairs.

For the next task we need chunked text data. This structure lets us conveniently manipulate markdown files, in this example a [paper on the abilities of language models](https://link.springer.com/article/10.1007/s11023-022-09602-0).

In [None]:
%%capture
!wget https://raw.githubusercontent.com/adamsobieszek/langtorch/main/src/langtorch/conf/paper.md

We can create a tensor with each markdown block in a separate entry simply with:

In [None]:
paper = TextTensor.from_file("/content/paper.md")
print(paper[:3],"\n  (...)")
print(f"shape = {paper.shape}")

Since the text contains headers and other blocks, we need to extract only the paragraphs. This is where structured text entries become useful, as LangTorch provides `iloc` and `loc` accessors for `Text` entries and tensors:

In [None]:
# Select paragraphs
paragraphs = paper.loc["Para"]
# Remove empty entries
paragraphs = paragraphs[paragraphs!=""]
print(paragraphs[:].apply(lambda x: x[:40] + "..."))
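To see why labeled entries make this filtering easy, here is a plain-Python model of the idea: each block is a `(label, text)` pair, and selecting by label is a simple filter. This is only an illustration; the `"Header"` label, the document contents, and the `loc` helper are hypothetical, and LangTorch's `Text` objects and `.loc` accessor do considerably more.

```python
from typing import List, Tuple

Block = Tuple[str, str]  # (label, text)

# A tiny hypothetical document parsed into labeled blocks
doc: List[Block] = [
    ("Header", "Introduction"),
    ("Para", "First paragraph of the introduction."),
    ("Para", ""),
    ("Header", "Results"),
    ("Para", "First paragraph of the results section."),
]

def loc(blocks: List[Block], label: str) -> List[str]:
    # Keep only the text of blocks whose label matches
    return [text for lab, text in blocks if lab == label]

paragraphs = [p for p in loc(doc, "Para") if p != ""]  # drop empty entries
print([p[:40] + "..." for p in paragraphs])
```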

### Build Custom Retriever and RAG Modules

For more complex modules, we can subclass `TextModule` and, as in PyTorch, define our own `__init__` and `forward` methods.

Because a `TextTensor` automatically acts as a `Tensor` of its embeddings, we can implement a retriever very compactly: for each entry in the input, it finds (in parallel) the `k` documents with the highest cosine similarity among those it holds:

In [None]:
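class Retriever(TextModule):
    def __init__(self, documents: TextTensor):
        super().__init__()
        # Store the documents as a flat, 1-D TextTensor
        self.documents = TextTensor(documents).view(-1)

    def forward(self, query: TextTensor, k: int = 5):
        # TextTensors act as their embeddings here, so this compares embedding vectors
        cos_sim = torch.cosine_similarity(self.documents, query)
        # Index the documents with the positions of the k highest similarities
        return self.documents[cos_sim.topk(k).indices]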
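The same retrieval logic can be stripped down to plain Python: for each query vector, score every document by cosine similarity and keep the indices of the `k` best. The vectors below are hypothetical toy embeddings, and `top_k` is an illustrative helper; in the `Retriever` above, LangTorch computes real embeddings automatically and `topk` does the selection on a torch tensor.

```python
import heapq
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k(doc_vecs: List[Sequence[float]], query_vecs: List[Sequence[float]], k: int) -> List[List[int]]:
    """Return, for each query, the indices of the k most similar documents (best first)."""
    results = []
    for q in query_vecs:
        scores = [cosine(d, q) for d in doc_vecs]
        results.append(heapq.nlargest(k, range(len(doc_vecs)), key=scores.__getitem__))
    return results

docs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical document embeddings
queries = [[1.0, 0.1], [0.0, 1.0]]                        # two queries handled in one call
print(top_k(docs, queries, k=2))  # [[0, 1], [2, 1]]
```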

In [None]:
retriever = Retriever(paragraphs)
query = TextTensor("What's the relationship between prediction and compression?")
retriever(query)

Note that the implementation didn't require any operations beyond those found in regular PyTorch. One goal of LangTorch is to give developers control over these lower-level operations while still letting them write compact code without a multitude of classes. For this reason, implementations such as the retriever above are not predefined classes in the main package.

We can now compose this module with a module that makes LLM calls to get a custom Retrieval-Augmented Generation (RAG) pipeline:


In [None]:
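class RAG(TextModule):
    def __init__(self, documents: TextTensor, *args, **kwargs):
        # Pass TextModule arguments such as prompt and activation through to the parent
        super().__init__(*args, **kwargs)
        self.retriever = Retriever(documents)

    def forward(self, user_message: TextTensor, k: int = 5):
        # Retrieve the k most relevant documents, each followed by a newline
        retrieved_context = self.retriever(user_message, k) + "\n"
        # Append the concatenated context to the user's message
        user_message = user_message + "\nCONTEXT:\n" + retrieved_context.sum()
        # TextModule.forward applies the module's prompt and activation (the LLM call)
        return super().forward(user_message)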
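Stripped of tensors, the `forward` method above builds one message per query: the query, a `CONTEXT:` header, and the retrieved passages, each followed by a newline. A plain-Python sketch of that string assembly, assuming `.sum()` concatenates the retrieved entries; `build_rag_message` is a hypothetical helper and the passages are placeholders.

```python
from typing import List

def build_rag_message(user_message: str, passages: List[str]) -> str:
    # Mirrors `retriever(...) + "\n"` followed by `.sum()`:
    # each passage gets a trailing newline, then all are concatenated
    context = "".join(p + "\n" for p in passages)
    return user_message + "\nCONTEXT:\n" + context

msg = build_rag_message("What's the relationship between prediction and compression?",
                        ["Passage one.", "Passage two."])
print(msg)
```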


In [None]:
rag_chat = RAG(paragraphs,
               prompt="Use the context to answer the following user query: ",
               activation="gpt-3.5-turbo")
assistant_response = rag_chat(query)
print(assistant_response.reshape(1,1))

With only small modifications to the retriever, this module could also perform batched inference, handling multiple simultaneous queries without much additional latency. Note that `prompt` and `activation` are arguments inherited from `TextModule`; they only take effect through the `super().forward` call.

We are excited to see what you will build with LangTorch. If you want to share some examples or have any questions, feel free to ask on our [Discord](https://discord.gg/jkreqtCCkv). In the likely event that you encounter a bug, report it on Discord or post it on the [GitHub repo](https://github.com/AdamSobieszek/langtorch) and we will fix it ASAP.