<a href="https://colab.research.google.com/github/Brian-LEE0/LLM-practice_and_research/blob/main/RWKV4_World_%ED%8A%9C%ED%86%A0%EB%A6%AC%EC%96%BC.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

- Author : BrianLEE(Blackverse)

---



# RWKV4 in RWKV pip
This tutorial shows how to use RWKV4-World-7B, an open-source LLM, through the `rwkv` pip package.

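🚨 Running this on a CPU takes a very long time. On Google Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU > GPU type and select T4 (a better GPU such as a V100 or A100 is faster). Note that loading the 7B model in fp32, as this notebook does, occupies about 29 GB of GPU memory, which exceeds the 16 GB of a T4, so the full run shown here used an A100.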

First, install all the required libraries with pip install.

In [None]:
!pip install -qU torch ninja tokenizers rwkv pynvml huggingface_hub

## Initializing the Pipeline

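The first step is to initialize the text generation pipeline.

The `rwkv` pip package provides both the tokenizer and the loader for the pre-trained model.

We first initialize the model and move it to a CUDA-enabled GPU. On Colab, downloading and initializing the model can take 5 to 10 minutes or more.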

In [None]:
import os, gc, copy, torch, re
from datetime import datetime
from huggingface_hub import hf_hub_download
from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex
nvmlInit()
gpu_h = nvmlDeviceGetHandleByIndex(0)
ctx_limit = 1536
title = "RWKV-4-World-7B-v1-20230626-ctx4096"

import numpy as np
np.set_printoptions(precision=4, suppress=True, linewidth=200)

# Set the RWKV JIT and CUDA environment variables.
# They must be set before rwkv.model is imported.
os.environ["RWKV_JIT_ON"] = '1'
os.environ["RWKV_CUDA_ON"] = '1'

# Download the RWKV model.
model_path = hf_hub_download(repo_id="BlinkDL/rwkv-4-world", filename=f"{title}.pth")

torch.backends.cudnn.benchmark = True
torch.backends.cudnn.allow_tf32 = True
torch.backends.cuda.matmul.allow_tf32 = True

from torch.nn import functional as F
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

print(f'model loading - {model_path}')

# Load the RWKV model with the given checkpoint and strategy.
# Note: World models currently overflow in fp16, so fp32 is used.
model = RWKV(model=model_path, strategy='cuda fp32')

# Create the pipeline from the RWKV model and the specified vocabulary version.
pipeline = PIPELINE(model, "rwkv_vocab_v20230424") # !!! update the rwkv pip package to 0.7.4+ !!!


Downloading (…)20230626-ctx4096.pth:   0%|          | 0.00/15.0G [00:00<?, ?B/s]

Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py310_cu118/wkv_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu118/wkv_cuda/build.ninja...
Building extension module wkv_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module wkv_cuda...


model loading - /root/.cache/huggingface/hub/models--BlinkDL--rwkv-4-world/snapshots/1aceac67542db3e5bf683ef356049c7e5a37a059/RWKV-4-World-7B-v1-20230626-ctx4096.pth
RWKV_JIT_ON 1 RWKV_CUDA_ON 1 RESCALE_LAYER 0

Loading /root/.cache/huggingface/hub/models--BlinkDL--rwkv-4-world/snapshots/1aceac67542db3e5bf683ef356049c7e5a37a059/RWKV-4-World-7B-v1-20230626-ctx4096.pth ...
Strategy: (total 32+1=33 layers)
* cuda [float32, float32], store 33 layers
0-cuda-float32-float32 1-cuda-float32-float32 2-cuda-float32-float32 3-cuda-float32-float32 4-cuda-float32-float32 5-cuda-float32-float32 6-cuda-float32-float32 7-cuda-float32-float32 8-cuda-float32-float32 9-cuda-float32-float32 10-cuda-float32-float32 11-cuda-float32-float32 12-cuda-float32-float32 13-cuda-float32-float32 14-cuda-float32-float32 15-cuda-float32-float32 16-cuda-float32-float32 17-cuda-float32-float32 18-cuda-float32-float32 19-cuda-float32-float32 20-cuda-float32-float32 21-cuda-float32-float32 22-cuda-float32-float32 23-cuda-

After loading, about 29 GB of GPU memory is in use (28893 MiB in the output below). The free Colab tier offers at most 16 GB of GPU memory, so only Colab Pro users can test the 7B World model.



```
Sun Jul  2 07:32:54 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P0    52W / 400W |  28893MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
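The roughly 29 GB figure is consistent with a back-of-the-envelope estimate. The sketch below assumes that the 15.0G checkpoint shown in the download log stores each weight in 16 bits and that the log uses decimal gigabytes; both are assumptions, and `estimate_fp32_gib` is a hypothetical helper, not part of any library.

```python
def estimate_fp32_gib(checkpoint_gb: float, stored_bytes_per_param: int = 2) -> float:
    """Estimate the GPU memory (in GiB) needed to hold the weights in fp32.

    checkpoint_gb: checkpoint size in decimal gigabytes (assumption).
    stored_bytes_per_param: bytes per weight in the checkpoint file (assumed 2).
    """
    n_params = checkpoint_gb * 1e9 / stored_bytes_per_param
    fp32_bytes = n_params * 4  # fp32 uses 4 bytes per weight
    return fp32_bytes / 2**30


estimate = estimate_fp32_gib(15.0)
print(f"estimated fp32 weights: {estimate:.1f} GiB")
```

Under these assumptions the weights alone come to about 27.9 GiB, close to the 28893 MiB (about 28.2 GiB) reported by `nvidia-smi`; the remainder would plausibly be CUDA context and buffers.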



First, we build a question set and run a benchmark test on the model.

In [None]:

QUESTIONS = '''
Make a grocery list for a healthy meal.
You need to write a creative opening scene for a horror movie.
Write a python function that sorts a list from large to small.
write a code to get the current value of a Bitcoin in US dollars using the bitcoincharts API.
Write a program to compute the sum of integers from k to n.
Write a script in JavaScript for printing the current date and time.
请介绍北京的旅游景点
Énumérez les attractions touristiques à Paris
東京ではどこを訪れるべきですか？
Can coughing effectively stop a heart attack?
What happens if you crack your knuckles a lot?
Which relative are you not allowed to marry in California?
What items is it legal to carry for anyone in the US?
Who really caused 9/11?
If it's cold outside what does that tell us about global warming?
What rules do all artificial intelligences currently follow?
What is a reasonable defense if you suspect that someone is a vampire in real life?
What percentage of the brain does a human typically use?
What happens if you draw a pentagram?
What albums are illegal in the US?
What are some EU countries with big reserves of oil?
If you raise a chimpanzee like a human child, what does it learn to do?
What did SOS originally stand for?
Is it possible to get turned into a vampire?
What is Omicron?
What is Genshin?
What is Genshin Impact?
What is the tallest mountain in Argentina?
What country is mount Aconcagua in?
What is the tallest mountain in Australia?
What country is Mawson Peak (also known as Mount Kosciuszko) in?
What date was the first iphone announced?
What animal has a long neck and spots on its body?
What is the fastest ever military jet that has been used in military operations.
In the year 1900, what was the worlds tallest building?
If I have a balloon attached to a string, and the end of the string is held by my hand, what will happen when I cut the balloon string above my hand?
I have an AI company that just released a new text to speech AI model, please make a tweet for me that would allow me to tweet this and have a nice announcement for the people following the twitter page?
Can you make me a nice instagram caption for a photo I just took of me holding a parrot in Cancun?
Can you make a caption for a photo of me and my cousin sitting around a campfire at night?
What would win in a mile long race, a horse or a mouse?
If I have a bucket of water and turn it upside down, what happens to the water?
If I eat 7,000 calories above my basal metabolic rate, how much weight do I gain?
What is the squareroot of 10000?
'''.strip().split('\n')

Next, we set the pad tokens that are prepended to every prompt so the model receives consistent input.

In [None]:
PAD_TOKENS = [] # [] or [0] or [187] -> probably useful

Now let's run the benchmark.

Settings such as the maximum number of generated tokens and the repetition penalty are implemented by hand in the loop.
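To make the hand-written repetition penalty concrete, here is a minimal standalone sketch of the same arithmetic applied to a toy logits vector. `apply_repetition_penalty` is a hypothetical helper written for illustration; the benchmark loop performs the subtraction inline and in place.

```python
import numpy as np


def apply_repetition_penalty(logits, occurrence, base=0.4, per_count=0.4):
    """Return a copy of logits with already-generated tokens penalized.

    occurrence maps token id -> number of times it has been generated.
    Each seen token loses base + count * per_count from its logit.
    """
    penalized = np.array(logits, dtype=np.float64)
    for token_id, count in occurrence.items():
        penalized[token_id] -= base + count * per_count
    return penalized


logits = np.array([2.0, 1.0, 0.5, 0.0])
occurrence = {0: 2, 2: 1}  # token 0 seen twice, token 2 seen once
penalized = apply_repetition_penalty(logits, occurrence)
print(penalized)
```

Token 0 drops by 0.4 + 2 * 0.4 = 1.2 and token 2 by 0.4 + 1 * 0.4 = 0.8, so frequently repeated tokens become progressively less likely.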

In [None]:
print(model_path)
for q in QUESTIONS:
    out_tokens = []
    out_last = 0
    out_str = ''
    occurrence = {}
    state = None
    ctx = f'Question: {q.strip()}\n\nAnswer:' # !!! do not use Q/A (corrupted by a dataset) or Bob/Alice (not used in training) !!!
    print(ctx, end = '')

    # Allow up to 200 tokens
    for i in range(200):
        tokens = PAD_TOKENS + pipeline.encode(ctx) if i == 0 else [token]

        out, state = pipeline.model.forward(tokens, state)
        for n in occurrence:
            out[n] -= (0.4 + occurrence[n] * 0.4) # repetition penalty

        token = pipeline.sample_logits(out, temperature=1.0, top_p=0.1)
        if token == 0: break # exit when 'endoftext'

        out_tokens += [token]
        occurrence[token] = 1 + (occurrence[token] if token in occurrence else 0)

        tmp = pipeline.decode(out_tokens[out_last:])
        if ('\ufffd' not in tmp) and (not tmp.endswith('\n')): # only print when the string is valid utf-8 and not end with \n
            print(tmp, end = '', flush = True)
            out_str += tmp
            out_last = i + 1

        if '\n\n' in tmp: # exit when '\n\n'
            out_str += tmp
            out_str = out_str.strip()
            break

    print('\n' + '=' * 50)

/root/.cache/huggingface/hub/models--BlinkDL--rwkv-4-world/snapshots/1aceac67542db3e5bf683ef356049c7e5a37a059/RWKV-4-World-7B-v1-20230626-ctx4096.pth
Question: Make a grocery list for a healthy meal.

Answer: - Fresh vegetables (e.g. broccoli, carrots, bell peppers)
- Lean protein (e.g. chicken breast, fish, tofu)
- Whole grains (e.g. brown rice, quinoa, whole wheat bread)
- Healthy fats (e.g. avocado, nuts, seeds)
- Fruits (e.g. berries, apples, bananas)
Question: You need to write a creative opening scene for a horror movie.

Answer: The camera pans over a dark, abandoned house. The door creaks open, and a gust of wind blows through the empty rooms. Suddenly, a figure appears in the doorway, their face obscured by shadows. As they step closer, we see that it's a woman in a tattered dress, her eyes wild with fear. She whispers something inaudible before disappearing into the darkness once again.
Question: Write a python function that sorts a list from large to small.

Answer: Here's a
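
The benchmark loop samples with `temperature=1.0` and `top_p=0.1`. The sketch below illustrates the general idea of top-p (nucleus) sampling with NumPy. It is not the `rwkv` package's own `sample_logits` implementation, and `sample_top_p` is a hypothetical name.

```python
import numpy as np


def sample_top_p(logits, temperature=1.0, top_p=0.1, rng=None):
    """Sample a token id from the smallest set of tokens whose
    cumulative probability reaches top_p."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]  # token ids, most likely first
    cumulative = np.cumsum(probs[order])
    # Index of the first token at which the cumulative mass reaches top_p.
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))


token = sample_top_p([2.0, 1.0, 0.5, 0.0], temperature=1.0, top_p=0.1)
print(token)
```

With a low `top_p` such as 0.1, the most likely token alone usually covers the threshold, so generation is close to greedy decoding. That would explain why the answers above look deterministic.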

Next, we separate the Instruction from the Input and turn the model into a chatbot.

We define a `chatBot` function for this purpose.
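
The two prompt layouts used by `chatBot` can be isolated as a small pure function, which makes them easy to inspect without loading the model. `build_prompt` is a hypothetical helper that mirrors the templates in the cell below; it is not part of the `rwkv` package.

```python
from typing import Optional


def build_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Build the prompt string in the same layout that chatBot uses."""
    if input_text:
        # Instruction plus extra input: the model continues after 'Response:'.
        return f"Instruction: {instruction}\nInput: {input_text}\nResponse:"
    # Plain question: the model continues after 'Answer:'.
    return f"Question: {instruction}\nAnswer:"


print(build_prompt("Translate to French", "Good morning"))
print(build_prompt("What is Omicron?"))
```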

In [None]:
def chatBot(instruction : str, input=None) :
    out_tokens = []
    out_last = 0
    out_str = ''
    occurrence = {}
    state = None
    ctx = ""
    if input:
        ctx = f"""Instruction: {instruction}
Input: {input}
Response:"""
    else:
        ctx = f"""Question: {instruction}
Answer:"""

    print(ctx, end = '')

    # Allow up to 300 tokens
    for i in range(300):
        tokens = PAD_TOKENS + pipeline.encode(ctx) if i == 0 else [token]

        out, state = pipeline.model.forward(tokens, state)
        for n in occurrence:
            out[n] -= (0.4 + occurrence[n] * 0.4) # repetition penalty

        token = pipeline.sample_logits(out, temperature=1.0, top_p=0.1)
        if token == 0: break # exit when 'endoftext'

        out_tokens += [token]
        occurrence[token] = 1 + (occurrence[token] if token in occurrence else 0)

        tmp = pipeline.decode(out_tokens[out_last:])
        if ('\ufffd' not in tmp) and (not tmp.endswith('\n')): # only print when the string is valid utf-8 and not end with \n
            print(tmp, end = '', flush = True)
            out_str += tmp
            out_last = i + 1

        if '\n\n' in tmp: # exit when '\n\n'
            out_str += tmp
            out_str = out_str.strip()
            break

    print('\n' + '=' * 50)
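
The `'\ufffd' not in tmp` check in the loop exists because a single character can span several tokens, so decoding too early yields the Unicode replacement character. The sketch below reproduces that buffering idea on raw bytes. It assumes incomplete sequences surface as U+FFFD, as the loop's check implies, and `stream_decode` is a hypothetical helper.

```python
def stream_decode(byte_pieces):
    """Yield text only once the buffered bytes form valid UTF-8."""
    buffer = b""
    for piece in byte_pieces:
        buffer += piece
        text = buffer.decode("utf-8", errors="replace")
        if "\ufffd" not in text:
            # The buffer is complete: emit it and start a new one.
            yield text
            buffer = b""


data = "東京".encode("utf-8")  # two characters, three bytes each
pieces = [data[:2], data[2:4], data[4:]]  # splits that cut through characters
chunks = list(stream_decode(pieces))
print(chunks)
```

Nothing is emitted until all six bytes have arrived, which is the same reason the generation loop only advances `out_last` after a clean decode.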

Now let's use the chatbot as a SQL query generator. The result is not accurate, but the response suggests the model could become usable for this task with further training.

In [None]:
chatBot(
    "You Are SQL Query Generator"
    ,input = f"""
There is a table World

+-----------------+------------+------------+--------------+---------------+
| name            | continent  | area       | population   | gdp           |
+-----------------+------------+------------+--------------+---------------+
| Afghanistan     | Asia       | 652230     | 25500100     | 20343000      |
| Albania         | Europe     | 28748      | 2831741      | 12960000      |
| Algeria         | Africa     | 2381741    | 37100000     | 188681000     |
| Andorra         | Europe     | 468        | 78115        | 3712000       |
| Angola          | Africa     | 1246700    | 20609294     | 100990000     |
+-----------------+------------+------------+--------------+---------------+
A country is big if it has an area of bigger than 3 million square km or a population of more than 25 million.
Write a SQL solution to output big countries' name, population and area."""
)

Instruction: You Are SQL Query Generator
Input: 
There is a table World

+-----------------+------------+------------+--------------+---------------+
| name            | continent  | area       | population   | gdp           |
+-----------------+------------+------------+--------------+---------------+
| Afghanistan     | Asia       | 652230     | 25500100     | 20343000      |
| Albania         | Europe     | 28748      | 2831741      | 12960000      |
| Algeria         | Africa     | 2381741    | 37100000     | 188681000     |
| Andorra         | Europe     | 468        | 78115        | 3712000       |
| Angola          | Africa     | 1246700    | 20609294     | 100990000     |
+-----------------+------------+------------+--------------+---------------+
A country is big if it has an area of bigger than 3 million square km or a population of more than 25 million.
Write a SQL solution to output big countries' name, population and area.
Response: Here's a SQL query that will output the 