# LLaMA Loading Tutorial
- Environment
    - torch
    - fairscale
    - fire
    - sentencepiece==0.1.97
    - hiq-python (if hiq is needed)
- Model and checkpoint
    - download the files with the wget commands below (or fetch the URLs directly)

In [2]:
import sys
import os

# Get the parent of the current working directory
# (os.path.abspath('./') is the working directory itself, so dirname() steps one level up)
current_dir = os.path.dirname(os.path.abspath('./'))
print(current_dir)
# Construct the path to the 'llama' folder
module_path = os.path.join(current_dir, 'llama')
print(module_path)
# Add the 'llama' folder path to the Python module search path
sys.path.append(module_path)


/home/Kasy/kasy_files/iArt.ai_Lab_Tech_Stack/model_deployment
/home/Kasy/kasy_files/iArt.ai_Lab_Tech_Stack/model_deployment/llama


## Tokenizer
- wget https://agi.gpt4.org/llama/LLaMA/tokenizer.model
- wget https://agi.gpt4.org/llama/LLaMA/tokenizer_checklist.chk

In [3]:
from llama import LLaMA_Tokenizer

## LLaMA Model
- wget https://agi.gpt4.org/llama/LLaMA/7B/consolidated.00.pth
- wget https://agi.gpt4.org/llama/LLaMA/7B/checklist.chk
- wget https://agi.gpt4.org/llama/LLaMA/7B/params.json
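
After downloading, it can be worth checking the files against `checklist.chk` before loading a multi-gigabyte checkpoint. The sketch below assumes the checklist uses the `md5sum` line format (`<md5 hex>  <filename>`); that format is an assumption, not something this notebook confirms, and `md5_of_file` and `verify_checklist` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checklist(ckpt_dir, checklist_name="checklist.chk"):
    """Compare each '<md5>  <filename>' entry with the file on disk.

    Assumes md5sum-style lines. Returns {filename: True/False};
    a listed file that is missing counts as a failure.
    """
    ckpt_dir = Path(ckpt_dir)
    results = {}
    for line in (ckpt_dir / checklist_name).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # md5sum marks binary mode with '*'
        target = ckpt_dir / name
        results[name] = target.is_file() and md5_of_file(target) == expected.lower()
    return results
```

Calling `verify_checklist('./llama_model_config/7B/')` would then return one boolean per listed file, under the format assumption above.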

In [4]:
# Load the LLaMA model hyperparameters from params.json
from llama import ModelArgs, LLaMA_Transformer
from pathlib import Path
import json


ckpt_dir = './llama_model_config/7B/'

with open(Path(ckpt_dir) / "params.json", 'r') as f:
    params = json.load(f)

# Print each hyperparameter stored in params.json
for name, value in params.items():
    print(name, ': ', value)

dim :  4096
multiple_of :  256
n_heads :  32
n_layers :  32
norm_eps :  1e-06
vocab_size :  -1
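
The `vocab_size` of `-1` in `params.json` is a placeholder: the real value has to come from the tokenizer (the tokenizer section of this notebook reports 32000 words). A minimal sketch of that merge step follows. `SketchModelArgs` is a hypothetical stand-in for the `ModelArgs` class imported above, whose exact fields are not shown in this notebook, and the default values are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass
class SketchModelArgs:
    # Hypothetical stand-in for ModelArgs; field names mirror params.json.
    dim: int = 512
    n_layers: int = 8
    n_heads: int = 8
    vocab_size: int = -1  # placeholder until the tokenizer is known
    multiple_of: int = 256
    norm_eps: float = 1e-5
    max_seq_len: int = 2048
    max_batch_size: int = 32


def build_model_args(params, n_words, max_seq_len=2048, max_batch_size=32):
    """Merge params.json values with runtime limits and the tokenizer vocabulary size."""
    args = SketchModelArgs(
        max_seq_len=max_seq_len, max_batch_size=max_batch_size, **params
    )
    args.vocab_size = n_words  # overwrite the -1 placeholder
    return args


# Same values as the params.json printed above.
example_params = json.loads(
    '{"dim": 4096, "multiple_of": 256, "n_heads": 32, '
    '"n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}'
)
model_args = build_model_args(example_params, n_words=32000)
print(model_args.vocab_size)  # 32000
```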


# Tokenization Tutorial

In [19]:
from llama import LLaMA_Tokenizer

tokenizer_pth = './llama_model_config/tokenizer.model'
llama_tokenizer = LLaMA_Tokenizer(
    model_path = tokenizer_pth
)

In [22]:
# Function Test
llama_tokenizer.encode(
    s = 'sss',
    bos=True, 
    eos=False
)

[1, 269, 893]

In [21]:
# Basic information about the LLaMA tokenizer
print('the vocabulary size is:', llama_tokenizer.n_words)
print('the token_id of the beginning of the sentence is:', llama_tokenizer.bos_id)
print('the token_id of the end of the sentence is:', llama_tokenizer.eos_id)
print('the token_id of padding is:', llama_tokenizer.pad_id)

the vocabulary size is: 32000
the token_id of the beginning of the sentence is: 1
the token_id of the end of the sentence is: 2
the token_id of padding is: -1


### Tokenization in the Generation Process

In [43]:
# Construct the prompts
prompts = [
    # Text generation tests
    "Said you love me, but I don't ",
    "Building a website can be done in 10 simple steps: (this website should be able to have a calculator function)\n",
    "Expand on this sentence: <the meaning of life is> ",
    # Instruction prompting (to be expanded later)
]

In [44]:
max_batch_size = 32  # Assumed upper bound on the batch size

# Count the prompts and make sure the batch does not exceed the max batch size
bsz = len(prompts)  # batch size
assert bsz <= max_batch_size, f'got {bsz} prompts, but the max batch size is {max_batch_size}'

In [45]:
# Tokenize each prompt with the LLaMA_Tokenizer
prompt_tokens = [llama_tokenizer.encode(x, bos=True, eos=False) for x in prompts]

min_prompt_size = min([len(t) for t in prompt_tokens])
max_prompt_size = max([len(t) for t in prompt_tokens])

print(len(prompt_tokens))
print(min_prompt_size)
print(max_prompt_size)

3
13
28


In [46]:
max_gen_len = 256
max_seq_len = 2048  # Max total sequence length (prompt tokens + generated tokens)

# Total length is max_gen_len + max_prompt_size, capped at max_seq_len
total_len = min(max_seq_len, max_gen_len + max_prompt_size)
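
As a worked example of the cap, the longest prompt above has 28 tokens, so the buffer is 256 + 28 = 284 tokens wide, well under 2048. A very long prompt would instead be limited by `max_seq_len`. The helper name `total_sequence_length` below is hypothetical and used only for illustration.

```python
def total_sequence_length(max_prompt_size, max_gen_len=256, max_seq_len=2048):
    """Width of the token buffer: longest prompt plus generation budget, capped at max_seq_len."""
    return min(max_seq_len, max_gen_len + max_prompt_size)


print(total_sequence_length(28))    # 284
print(total_sequence_length(1900))  # 2048 (256 + 1900 would exceed the cap)
```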

In [None]:
import torch

# Fill a tensor of shape (batch_size, total_len) with pad tokens (requires a CUDA device)
tokens = torch.full((bsz, total_len), llama_tokenizer.pad_id).cuda().long()

for k, t in enumerate(prompt_tokens):
    print(k)
    print(t)
    print(len(t))
    tokens[k, : len(t)] = torch.tensor(t).long()
print(tokens)

In [None]:
# Build a boolean mask that is True wherever a position holds a prompt token.
# During generation, masked positions keep their prompt token; only the
# remaining (padding) positions are filled with generated tokens.
input_text_mask = tokens != llama_tokenizer.pad_id

print(input_text_mask)
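
To show how the mask is typically used, here is a minimal sketch of a generation loop written with plain Python lists so it runs without a GPU or a model. It assumes the loop follows the usual pattern of starting at the shortest prompt length and never overwriting prompt tokens. `generate_sketch`, `pad_prompts`, and `next_token_fn` are hypothetical names, and `next_token_fn` stands in for the model's forward pass plus sampling.

```python
def pad_prompts(prompt_tokens, total_len, pad_id=-1):
    """Right-pad every prompt to total_len (assumes each prompt fits)."""
    return [t + [pad_id] * (total_len - len(t)) for t in prompt_tokens]


def generate_sketch(prompt_tokens, total_len, next_token_fn, pad_id=-1):
    tokens = pad_prompts(prompt_tokens, total_len, pad_id)
    # True where the position already holds a prompt token.
    is_prompt = [[tok != pad_id for tok in row] for row in tokens]
    # Generation starts where the shortest prompt ends.
    start_pos = min(len(t) for t in prompt_tokens)
    for cur_pos in range(start_pos, total_len):
        for row, mask in zip(tokens, is_prompt):
            proposed = next_token_fn(row[:cur_pos])
            # Keep prompt tokens; write the proposal only into padding slots.
            if not mask[cur_pos]:
                row[cur_pos] = proposed
    return tokens


# Toy "model": the next token is 100 plus the current prefix length.
out = generate_sketch(
    [[1, 5, 6], [1, 7, 8, 9, 10]],
    total_len=7,
    next_token_fn=lambda prefix: 100 + len(prefix),
)
print(out[0])  # [1, 5, 6, 103, 104, 105, 106]
print(out[1])  # [1, 7, 8, 9, 10, 105, 106]
```

The second row shows the point of the mask: positions 3 and 4 are reached by the loop but keep their prompt tokens 9 and 10.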

# Generator Test

In [12]:
# Requires at least 16 GB of system RAM and at least 24 GB of GPU memory
!torchrun --nproc_per_node 1 gen.py --ckpt_dir ./llama_model_config/7B --tokenizer_path ./llama_model_config/tokenizer.model

> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loading
Loaded in 9.43 seconds
I believe the meaning of life is to find happiness and be satisfied with what you have.
People have different definitions of happiness. Some people feel that if they could only win the lottery, they would be happy. Some people feel that if they could only get that promotion, they would be happy. Some people feel that if they could only be the top scorer in a game, they would be happy.
If you do not know what happiness is, I suggest you ask a psychologist. A psychologist has studied the subject of happiness and he or she knows what happiness is. A psychologist has a Ph.D. in psychology and is an expert on the subject of happiness. A psychologist knows how to make people happy.
Although you might know what happiness is, you might have forgotten it. If that is the case, I suggest you consult a psychologist. A psychologist can make you happy again. A p