# Lanlab

### General outlook of the framework's structure

Lanlab is a framework that provides simple, convenient tools to help researchers build their own language model pipelines. Because research on language models is moving quickly, it is important to have a flexible framework that can be extended to support new models and new tasks. Lanlab is built around a small set of simple, general concepts, which keeps such extensions easy.

Objects in lanlab are either **structures** or **modules** that operate on structures.

In native lanlab, there are three types of structures:
- Segment
- Sequence (lists of segments)
- Batch (array of sequences)

Let's start with the basics: Text

### Text

#### Creating a text

In [1]:
#Creating text segments
from lanlab import Text

my_text = Text("I like chocolate")

print("Text creates a sequence with one segment containing the given text")
print("len(my_text) = ", len(my_text))
print("my_text[0].__class__ = ", my_text[0].__class__)
print("my_text[0].format() = ", my_text[0].format())
print()
print("The format() method returns a formatting of the text of the segment but it can also be used on sequences concatenating the formatting of all its segments")
print("my_text.format() = ", my_text.format())

Text creates a sequence with one segment containing the given text
len(my_text) =  1
my_text[0].__class__ =  <class 'lanlab.core.structure.segment.Segment'>
my_text[0].format() =  I like chocolate

The format() method returns a formatting of the text of the segment but it can also be used on sequences concatenating the formatting of all its segments
my_text.format() =  I like chocolate


#### Concatenating segments

In [2]:
#Concatenating segments
my_text2 = Text(" and I like ice cream")
my_text3 = my_text + my_text2

print("Texts can be concatenated with the + operator")
print("my_text3 = my_text + my_text2")
print("my_text3.format() = ", my_text3.format())

print()
print("my_text3 contains 2 segments")
print("len(my_text3) = ", len(my_text3))
print("my_text3[0].format() = ", my_text3[0].format())
print("my_text3[1].format() = ", my_text3[1].format())

Texts can be concatenated with the + operator
my_text3 = my_text + my_text2
my_text3.format() =  I like chocolate and I like ice cream

my_text3 contains 2 segments
len(my_text3) =  2
my_text3[0].format() =  I like chocolate
my_text3[1].format() =   and I like ice cream


#### Editing segments data

In [3]:
#Segment data
segment = my_text3[0]

print("Segments store a lot of data more than text",segment.keys())
print("segment['text'] = ", segment['text'],"-- is the text of the segment")
print("segment['format'] = ", segment['format'],"-- is the formatting system of the segment. LLMs will get the completion formatting or chat formatting depending on their nature. This formating will just replace the [text] with the segment's text")

print()
segment['format']['chat'] = '{[text]}'
print("Let's update it :",segment['format'])
print("segment.format('completion') = ", segment.format('completion'),"-- is the completion formatting")
print("segment.format('chat') = ", segment.format('chat'),"-- is the chat formatting")
print("This makes it possible to use the same segment and to adapt the prompting for chat and completion models when needed")

print()
print("segment['origin'] = ", segment['origin'], "-- is the origin of the segment. It is about who generated this segment. It can be a user, a model or the system. This will then be communicated to chat models. This information isn't used by completion models")
print("segment['model'] = ", segment['model'], "-- is the name of the model that generated this segment. It is put to None if the user created it")

print()
print("There are many more that will be discussed later as they are not important for now")

Segments store a lot of data more than text ['text', 'origin', 'tags', 'info', 'format', 'model', 'tokens', 'logp', 'top_logp', 'finish_reason', 'logits']
segment['text'] =  I like chocolate -- is the text of the segment
segment['format'] =  {'chat': '[text]', 'completion': '[text]'} -- is the formatting system of the segment. LLMs will get the completion formatting or chat formatting depending on their nature. This formating will just replace the [text] with the segment's text

Let's update it : {'chat': '{[text]}', 'completion': '[text]'}
segment.format('completion') =  I like chocolate -- is the completion formatting
segment.format('chat') =  {I like chocolate} -- is the chat formatting
This makes it possible to use the same segment and to adapt the prompting for chat and completion models when needed

segment['origin'] =  user -- is the origin of the segment. It is about who generated this segment. It can be a user, a model or the system. This will then be communicated to chat mode
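
The substitution described above can be sketched in plain Python, without lanlab. The dictionaries below are hypothetical stand-ins for segments and the two helper functions are illustrative names, not lanlab's API; the only behavior taken from the tutorial is that `[text]` in the chosen template is replaced by the segment's text and that a sequence concatenates its formatted segments.

```python
# Hypothetical stand-ins for lanlab segments: plain dicts holding only the
# 'text' and 'format' fields discussed above. Not lanlab's implementation.
def render_segment(segment, mode="completion"):
    """Substitute the segment's text into the template selected by `mode`."""
    template = segment["format"][mode]
    return template.replace("[text]", segment["text"])


def render_sequence(segments, mode="completion"):
    """Concatenate the rendered segments in order."""
    return "".join(render_segment(s, mode) for s in segments)


first = {"text": "I like chocolate",
         "format": {"chat": "{[text]}", "completion": "[text]"}}
second = {"text": " and I like ice cream",
          "format": {"chat": "[text]", "completion": "[text]"}}

print(render_segment(first, "completion"))  # I like chocolate
print(render_segment(first, "chat"))        # {I like chocolate}
print(render_sequence([first, second]))     # I like chocolate and I like ice cream
```

Keeping one template per model family is what lets the same segment be prompted differently for chat and completion models.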

### Models

#### Using a model

In [4]:
#Calling a model
from lanlab import GPT35
model = GPT35()

print("Modules can be called on sequences to complete them with an additional segment")
my_text4 = model(my_text3)
print("new_text = model(my_text3) = ", my_text4.format())

print()
print("Chats are not very readable with the format method as it concatenates all the segments. Let's use the show method instead to print a human readable version of the sequence")
print("print(new_text) = ")
print(my_text4.show())


Modules can be called on sequences to complete them with an additional segment
new_text = model(my_text3) =  I like chocolate and I like ice creamThat's great! Chocolate and ice cream make a delicious combination. Do you have

Chats are not very readable with the format method as it concatenates all the segments. Let's use the show method instead to print a human readable version of the sequence
print(new_text) = 
user(None): I like chocolate
user(None):  and I like ice cream
assistant(gpt-3.5-turbo-0613): That's great! Chocolate and ice cream make a delicious combination. Do you have



Some clarification is needed on how lanlab handles text and presents it to models. A completion model receives a single string as input: ```sequence.format("completion")```. A chat model requires a conversation: each segment is a message whose content is ```segment.format("chat")``` and whose author is ```segment['origin']```, and the sequence as a whole is the conversation. The conversation can include system messages as well.
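
To make the difference concrete, here is a minimal sketch of the two inputs built from the same list of segments. It does not use lanlab: the segments are plain dictionaries, the function names are hypothetical, and the `role`/`content` keys are an assumption modeled on the OpenAI chat message format rather than something the tutorial states.

```python
# Illustrative only: plain dicts stand in for lanlab segments.
PLAIN = {"chat": "[text]", "completion": "[text]"}


def to_completion_prompt(segments):
    """Completion models get one string: the formatted segments joined in order."""
    return "".join(
        s["format"]["completion"].replace("[text]", s["text"]) for s in segments
    )


def to_chat_messages(segments):
    """Chat models get one message per segment, labeled with the segment's origin."""
    return [
        {"role": s["origin"],  # assumed key names, following OpenAI-style chats
         "content": s["format"]["chat"].replace("[text]", s["text"])}
        for s in segments
    ]


conversation = [
    {"origin": "system", "text": "Answer briefly.", "format": PLAIN},
    {"origin": "user", "text": "I like chocolate",
     "format": {"chat": "{[text]}", "completion": "[text]"}},
    {"origin": "user", "text": " and I like ice cream", "format": PLAIN},
]

print(to_completion_prompt(conversation))
for message in to_chat_messages(conversation):
    print(message)
```

Note that the completion prompt carries no trace of who wrote each segment, which matches the remark that the origin is not used by completion models.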

#### Configuring a model

In [5]:
#Configuring a model
print("Modules have their own configurations like segments")
print("model.config.keys() = ", model.config.keys())

print()
print("model.config['max_tokens'] = ", model.config['max_tokens'], "-- is the maximum number of tokens of the generated text")
print("model.config['temperature'] = ", model.config['temperature'], "-- is the temperature of the generated text")
print("model.config['top_p'] = ", model.config['top_p'], "-- is the top_p of the generated text")

print("The other parameters are more complex and will be discussed later")

Modules have their own configurations like segments
model.config.keys() =  ['temperature', 'max_tokens', 'top_p', 'stop', 'logit_bias']

model.config['max_tokens'] =  16 -- is the maximum number of tokens of the generated text
model.config['temperature'] =  0.7 -- is the temperature of the generated text
model.config['top_p'] =  1 -- is the top_p of the generated text
The other parameters are more complex and will be discussed later


In [6]:
#Available models
from lanlab import get_hf_model_classes, get_openai_model_classes
print("In native lanlab there are 3 preimplemented bridges to other framework : OpenAI, HuggingFace and Palm (coming soon)")
print("In the future more will be added and the user is free to add his own")
print("The available models are :")
print("OPENAI")
for m in get_openai_model_classes():
    print(" ",m().engine)

print()
print("HuggingFace")
for m in get_hf_model_classes():
    print(" ",m().engine)

print()
print("The classes can be used to create new models and are referenced in lanlab.get_openai_model_classes() and lanlab.get_hf_model_classes()")
print("Please note that OPENAI models access require an API key and HuggingFace models require hosting them on your own computer with all their own requirements")

In native lanlab there are 3 preimplemented bridges to other framework : OpenAI, HuggingFace and Palm (coming soon)
In the future more will be added and the user is free to add his own
The available models are :
OPENAI
  ada
  text-ada-001
  babbage
  text-babbage-001
  curie
  text-curie-001
  davinci
  davinci-instruct-beta
  text-davinci-001
  code-davinci-002
  text-davinci-002
  text-davinci-003
  gpt-3.5-turbo-instruct
  gpt-3.5-turbo-instruct-0914
  davinci-002
  babbage-002
  gpt-3.5-turbo
  gpt-3.5-turbo-0613
  gpt-3.5-turbo-0301
  gpt-3.5-turbo-16k
  gpt-3.5-turbo-16k-0613
  gpt-4
  gpt-4-0314
  gpt-4-0613
  gpt-4-1106-preview
  gpt-4-vision-preview

HuggingFace
  None
  llama-7b
  llama-13b
  alpaca-7b
  wizard-7b
  vicuna-7b-v1.1
  vicuna-7b-v1.3
  vicuna-7b-v1.5
  vicuna-13b-v1.1
  vicuna-13b-v1.3
  vicuna-13b-v1.5
  baize-7b
  guanaco-7b
  tiny-llama-fast-tokenizer
  llama-2-7b
  llama-2-13b
  llama-2-7b-hf
  llama-2-13b-hf
  Orca-2-7b
  Orca-2-13b
  None
  bloom-3b
  blo

### Batches

In [7]:
#Batches
from lanlab import Batch,set_number_workers
print("Batches are arrays of sequences")
batch = Batch((5,2)) #Creates a batch of 5x2 empty sequences
print("batch.shape = ", batch.shape)
print("batch[0,0] = ", batch[0,0])

print()
print("Batched sequences can be accessed with the same syntax as numpy arrays and can be filled with your own sequences")
batch[0,0] = my_text
print("batch[0,0] = ", batch[0,0].format())

for i in range(5):
    for j in range(2):
        batch[i,j] = Text(str(i)+"+"+str(j)+"=")

set_number_workers(10) #Set the number of queries to be sent in parallel to the model
print("Models can be called on top of batches and will process sequences in parallel up to a given batch size set_number_workers(n). This speeds up the process a lot.")
batch_completed = model(batch)
for i in range(5):
    for j in range(2):
        print("batch_completed[",i,",",j,"] = ", batch_completed[i,j].format())

Batches are arrays of sequences
batch.shape =  (5, 2)
batch[0,0] =  Sequence(
)

Batched sequences can be accessed with the same syntax as numpy arrays and can be filled with your own sequences
batch[0,0] =  I like chocolate
Models can be called on top of batches and will process sequences in parallel up to a given batch size set_number_workers(n). This speeds up the process a lot.
batch_completed[ 0 , 0 ] =  0+0=0
batch_completed[ 0 , 1 ] =  0+1=1
batch_completed[ 1 , 0 ] =  1+0=1
batch_completed[ 1 , 1 ] =  1+1=2
batch_completed[ 2 , 0 ] =  2+0=2
batch_completed[ 2 , 1 ] =  2+1=2 + 1 = 3
batch_completed[ 3 , 0 ] =  3+0=3
batch_completed[ 3 , 1 ] =  3+1=4
batch_completed[ 4 , 0 ] =  4+0=4
batch_completed[ 4 , 1 ] =  4+1=5
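
How lanlab schedules these parallel queries internally is not shown in this tutorial. As a rough illustration of the idea only, the sketch below maps a stand-in model over a 5x2 grid of prompts with a bounded thread pool from the standard library. `fake_model` and `complete_grid` are hypothetical names, and nested lists replace the `Batch` structure.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def fake_model(prompt):
    """Stand-in for a model call: completes prompts of the form 'a+b='."""
    left, right = prompt.rstrip("=").split("+")
    return prompt + str(int(left) + int(right))


def complete_grid(grid, model, workers=10):
    """Apply `model` to every cell of a 2D grid using at most `workers` threads."""
    rows, cols = len(grid), len(grid[0])
    cells = list(product(range(rows), range(cols)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so outputs line up with `cells`.
        outputs = list(pool.map(lambda ij: model(grid[ij[0]][ij[1]]), cells))
    completed = [[None] * cols for _ in range(rows)]
    for (i, j), output in zip(cells, outputs):
        completed[i][j] = output
    return completed


grid = [[f"{i}+{j}=" for j in range(2)] for i in range(5)]
completed = complete_grid(grid, fake_model, workers=10)
print(completed[2][1])  # 2+1=3
```

Threads suit this kind of workload when each call mostly waits on a remote API; the worker count then caps how many requests are in flight at once.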


### Sequential

In [8]:
#Sequential
from lanlab import Sequential
print("Sequential is a container that can be used to chain modules")
from lanlab import GPT35_0613, GPT35_0301

#This will first call the model, then append the text "Are you sure ?" to the completion and then call the model again. To make this more convenient, a plain string, although not a module, is interpreted as a module that concatenates its text to the sequence.
model1 = GPT35_0613()
model2 = GPT35_0301().configure(max_tokens=32)
seq = Sequential(model1, "Are you sure ?", model2)

results = seq(batch)
for i in range(5):
    for j in range(2):
        print("results[",i,",",j,"] = \n", results[i,j].show())
        print()

Sequential is a container that can be used to chain modules
run
results[ 0 , 0 ] = 
 user(None): 0+0=
assistant(gpt-3.5-turbo-0613): 0
user(None): Are you sure ?
assistant(gpt-3.5-turbo-0301): Yes, I am sure. When you add zero to zero, you get zero as a result.


results[ 0 , 1 ] = 
 user(None): 0+1=
assistant(gpt-3.5-turbo-0613): 1
user(None): Are you sure ?
assistant(gpt-3.5-turbo-0301): Yes, as an AI language model, I am programmed to perform basic arithmetic operations, and the sum of 0 and 1 is always 1.


results[ 1 , 0 ] = 
 user(None): 1+0=
assistant(gpt-3.5-turbo-0613): 1
user(None): Are you sure ?
assistant(gpt-3.5-turbo-0301): Yes, I am an AI language model and I am sure that 1 + 0 equals 1.


results[ 1 , 1 ] = 
 user(None): 1+1=
assistant(gpt-3.5-turbo-0613): 2
user(None): Are you sure ?
assistant(gpt-3.5-turbo-0301): Yes, as an AI language model, I am programmed to perform basic arithmetic operations accurately, and 1+1 always equals 2.


results[ 2 , 0 ] = 
 user(None): 
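
The chaining behavior can be sketched without lanlab. The `Pipeline` class below is a hypothetical illustration, not lanlab's `Sequential`: it runs its steps in order and treats a bare string as a step that appends a user segment, which is the convenience described in the cell above. Sequences are plain lists of dictionaries and the models are stand-ins.

```python
class Pipeline:
    """Hypothetical sketch of a module chain; not lanlab's Sequential."""

    def __init__(self, *steps):
        self.steps = steps

    def __call__(self, sequence):
        for step in self.steps:
            if isinstance(step, str):
                # A bare string acts as a module that appends a user segment.
                sequence = sequence + [{"origin": "user", "text": step}]
            else:
                sequence = step(sequence)
        return sequence


def echo_model(name):
    """Build a stand-in model that appends one assistant segment."""
    def model(sequence):
        reply = f"{name} saw {len(sequence)} segment(s)"
        return sequence + [{"origin": "assistant", "text": reply}]
    return model


pipeline = Pipeline(echo_model("first"), "Are you sure ?", echo_model("second"))
result = pipeline([{"origin": "user", "text": "1+1="}])
for segment in result:
    print(f"{segment['origin']}: {segment['text']}")
```

Each step returns a new list instead of mutating its input, so the original sequence can be reused with another pipeline.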

### Save and Load

In [9]:
from lanlab import save, load
import os

#Saving a structure (segment, sequence, batch)
save(batch,os.path.join("tutorial_files","my_batch"))

#Loading a structure (segment, sequence, batch)
loaded_batch = load(os.path.join("tutorial_files","my_batch"))

#Saving automatically in Sequential
seq(batch,os.path.join("tutorial_files","my_seq2"))

#If you run it again, it will load the saved file instead of running the model again
results = seq(batch,os.path.join("tutorial_files","my_seq2")) #Another computation has the same path -> load it instead of running it
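
The "load instead of rerun" behavior is a path-keyed cache. The sketch below shows the general pattern with `pickle` from the standard library; `run_cached` is a hypothetical helper, and lanlab's actual file format and checks are not documented here, so treat this as an assumption about the idea rather than the implementation.

```python
import os
import pickle
import tempfile


def run_cached(compute, data, path):
    """Return the result stored at `path` if it exists, else compute and store it."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    result = compute(data)
    with open(path, "wb") as handle:
        pickle.dump(result, handle)
    return result


calls = []


def expensive(data):
    calls.append(data)  # record each real computation
    return [item.upper() for item in data]


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_seq2")
    first = run_cached(expensive, ["a", "b"], path)   # computes and saves
    second = run_cached(expensive, ["a", "b"], path)  # loads from disk

print(first == second, len(calls))  # True 1
```

In this sketch the path is the only cache key, so reusing a path with different inputs would return the old result; choosing a fresh path per experiment avoids that.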

### Loading HF models

To run this cell you need to ```git clone https://huggingface.co/fxmarty/tiny-llama-fast-tokenizer``` and follow the instructions in the lanlab repository README to install the model. This is a dummy model that completes with random tokens, but it is very convenient for verifying that the pipeline works before running it with larger models on a GPU cluster.

In [10]:
#HuggingFace models must be hosted on the local computer. Lanlab provides a way to do so
from lanlab import TinyLlama,load
import os
batch = load(os.path.join("tutorial_files","my_batch"))
model = TinyLlama() #This doesn't load the model yet and doesn't require memory

with model.host(port=52431) as server: #Starts the server on port 52431
    out = model(batch) #The model will load when it gets the first query. This can take a few minutes. If you don't host the model before, the query will crash
#memory is freed at the end of the scope

for i in range(5):
    for j in range(2):
        print("out[",i,",",j,"] = \n", out[i,j].show())
        print()

 * Serving Flask app 'lanlab.core.module.models.hf_models' (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


INFO:werkzeug: * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:52431
 * Running on http://10.188.129.27:52431 (Press CTRL+C to quit)


model dtype torch.float32


  return torch._C._cuda_getDeviceCount() > 0
Using pad_token, but it is not set yet.
INFO:werkzeug:127.0.0.1 - - [12/Dec/2023 15:17:59] "POST /completions HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [12/Dec/2023 15:17:59] "POST /completions HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [12/Dec/2023 15:17:59] "POST /completions HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [12/Dec/2023 15:17:59] "POST /completions HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [12/Dec/2023 15:17:59] "POST /completions HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [12/Dec/2023 15:17:59] "POST /completions HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [12/Dec/2023 15:17:59] "POST /completions HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [12/Dec/2023 15:17:59] "POST /completions HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [12/Dec/2023 15:17:59] "POST /completions HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [12/Dec/2023 15:17:59] "POST /completions HTTP/1.1" 200 -


out[ 0 , 0 ] = 
 user(None): 0+0=
assistant(TLLA):  second otroROP blah goldengetNameioǧheit spark accompgas rein орган soupюза


out[ 0 , 1 ] = 
 user(None): 0+1=
assistant(TLLA):  приня Top용ew ocupamaут участ mens располо illorigine hockey сталиждён synd


out[ 1 , 0 ] = 
 user(None): 1+0=
assistant(TLLA):  precise имеет площа Rightktet".$ зда wordsasy nombreux Even SongetailedVariable Arc th


out[ 1 , 1 ] = 
 user(None): 1+1=
assistant(TLLA): xf智 Nebenвсямом catch pride Agricult який Reinogo regiónlookignon Young fruit


out[ 2 , 0 ] = 
 user(None): 2+0=
assistant(TLLA): bareca canción Fail ocean════łużulyPer dim opposedrepabadciale accomplishedteenth


out[ 2 , 1 ] = 
 user(None): 2+1=
assistant(TLLA):  TováFig customers більUMN限 enjoyeduralgetElementんпростра culture Erstcribed帝 Region


out[ 3 , 0 ] = 
 user(None): 3+0=
assistant(TLLA):  ВелиO posible事 xml estim badly Spirit Kostouri doub XIII DouEL repla trab


out[ 3 , 1 ] = 
 user(None): 3+1=
assistant(TLLA):  String 'UID disp
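
The hosting cell relies on three behaviors stated in its comments: creating the model loads nothing, the first query triggers the load, and memory is freed when the `with` block ends. The sketch below reproduces only that lifecycle with a context manager. `LazyModel` is a hypothetical class, and it leaves out the local server that lanlab actually starts.

```python
from contextlib import contextmanager


class LazyModel:
    """Hypothetical stand-in: weights are loaded on the first query only."""

    def __init__(self):
        self.weights = None   # nothing is loaded at construction time
        self.hosted = False

    @contextmanager
    def host(self):
        self.hosted = True
        try:
            yield self
        finally:
            # Leaving the `with` block releases the weights.
            self.weights = None
            self.hosted = False

    def __call__(self, prompt):
        if not self.hosted:
            raise RuntimeError("model is not hosted")
        if self.weights is None:
            self.weights = "loaded"  # placeholder for an expensive load
        return prompt + " ..."


model = LazyModel()
with model.host():
    loaded_before = model.weights is not None  # False: no query sent yet
    out = model("0+0=")
    loaded_after = model.weights is not None   # True: first query loaded it
freed = model.weights is None                  # True: freed at end of scope
print(loaded_before, loaded_after, freed)
```

Calling the model outside the `with` block raises an error in this sketch, mirroring the note that a query fails when the model is not hosted first.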