### Basic flow

- Select a model -> download it -> load it into Ollama -> use it

### Korean fine-tuned model

Download the Korean fine-tuned model released by Yanolja, linked below.
- Hugging Face: https://huggingface.co/
- Original model: https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0
- GGUF build: https://huggingface.co/heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF

### Select and download a model
- See the site
  -  https://huggingface.co/

#### Hugging Face
- After selecting a model, choose Ollama under "Use this model"
> e.g. ollama run hf.co/heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF:Q5_K_M
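
The model reference in the example above follows the pattern `hf.co/<repo id>:<quantization tag>`. As a minimal sketch, the helper below assembles that string so the repo and quantization can be swapped without retyping the command. The name `hf_ollama_ref` is hypothetical and not part of any library.

```python
def hf_ollama_ref(repo_id: str, quant: str = "") -> str:
    """Build an `hf.co/...` model reference for `ollama run` (hypothetical helper)."""
    ref = f"hf.co/{repo_id}"
    # Append the quantization tag only when one is given.
    return f"{ref}:{quant}" if quant else ref


ref = hf_ollama_ref("heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF", "Q5_K_M")
print(f"ollama run {ref}")
```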

### Converting a model to GGUF (quantization)

- https://github.com/abetlen/llama-cpp-python
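
Quantization mainly trades precision for file size and memory. As a rough back-of-the-envelope sketch, the weight size is about parameters × bits per weight ÷ 8. The parameter count comes from the model name (10.8B). The 5.5 bits per weight used for Q5_K_M is an assumed, illustrative figure, and real GGUF files also carry metadata, so treat the result as an estimate only.

```python
def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes); ignores file metadata."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 10.8e9  # taken from the model name "10.8B"

fp16_gb = estimate_size_gb(N_PARAMS, 16)   # 16-bit floating-point weights
q5_gb = estimate_size_gb(N_PARAMS, 5.5)    # ASSUMPTION: ~5.5 bits per weight for Q5_K_M

print(f"fp16   : ~{fp16_gb:.1f} GB")
print(f"Q5_K_M : ~{q5_gb:.1f} GB (assumed bits per weight)")
```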


In [None]:
%pip install huggingface_hub
%pip install llama-cpp-python
%pip install ollama

In [None]:
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

import time
from pprint import pprint


# Download the model file from the Hugging Face Hub
model_name_or_path = "heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF"  # repo id
model_basename = "ggml-model-Q5_K_M.gguf"  # file name inside the repo
model_name = "EEVE-Korean-Q5_K_M"  # name used when registering the model in Ollama

model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_basename,
)
print(f" model_name : {model_name}")
print(f" model_path : {model_path}")


In [None]:

# CPU
# lcpp_llm = Llama(
#     model_path=model_path,
#     n_threads=2,
#     )

# To run on a GPU, use the configuration below
# (requires a llama-cpp-python build with GPU support)
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2, # CPU cores
    n_batch=512, # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
    n_gpu_layers=43, # Change this value based on your model and your GPU VRAM pool.
    n_ctx=4096, # Context window
)

In [None]:

prompt_template = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\nHuman: {prompt}\nAssistant:\n"
text = '한국의 수도는 어디인가요?'  # "What is the capital of Korea?"

prompt = prompt_template.format(prompt=text)

start = time.time()
response = lcpp_llm(
    prompt=prompt,
    max_tokens=256,
    temperature=0.5,
    top_p=0.95,
    top_k=50,
    stop=['</s>'],  # Stop generating when this token appears.
    echo=True,  # Include the prompt in the returned text.
)
pprint(response)
print(time.time() - start)
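
The call above returns an OpenAI-style completion dictionary, and with `echo=True` the generated text starts with the prompt. A minimal sketch of pulling out only the answer follows. `extract_answer` is a hypothetical helper, and the sample dictionary is a hand-written stand-in showing only the fields used here, not real model output.

```python
def extract_answer(response: dict, prompt: str) -> str:
    """Return the generated text without the echoed prompt (hypothetical helper)."""
    text = response["choices"][0]["text"]
    # With echo=True the prompt is prepended to the completion; drop it if present.
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Hand-written stand-in for a llama-cpp-python response
sample_prompt = "Human: What is the capital of Korea?\nAssistant:\n"
sample_response = {
    "choices": [{"text": sample_prompt + "Seoul.", "finish_reason": "stop"}]
}

print(extract_answer(sample_response, sample_prompt))
```

In the notebook, the same helper would be called as `extract_answer(response, prompt)`.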

In [None]:
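import ollama

# Remove a previously registered model with the same name.
# On the first run the model does not exist yet, so ignore that error.
try:
    ollama.delete(model=model_name)
except ollama.ResponseError as err:
    print(f"skip delete: {err.error}")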

In [None]:
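# Load the downloaded GGUF file into Ollama.
# Note: the `modelfile` argument is accepted by older ollama-python releases;
# newer releases use keyword arguments such as `from_` and `system` instead.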
import ollama

modelfile=f'''
FROM "{model_path}"

SYSTEM You are "Knd도움이".
'''

ollama.create(model=model_name, modelfile=modelfile)
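
The Modelfile above is assembled with an f-string. When several models are registered, a small builder keeps the text consistent. This is a sketch only: `build_modelfile` is a hypothetical helper, the path is a placeholder, and it emits just the `FROM` and `SYSTEM` instructions used in this notebook.

```python
def build_modelfile(gguf_path: str, system_prompt: str = "") -> str:
    """Return Modelfile text with a FROM line and an optional SYSTEM line."""
    lines = [f'FROM "{gguf_path}"']
    if system_prompt:
        # Triple quotes let the system prompt contain quotes or line breaks.
        lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"


# Placeholder path for illustration; in the notebook this would be model_path.
print(build_modelfile("/path/to/ggml-model-Q5_K_M.gguf", 'You are "Knd도움이".'))
```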


In [None]:
import ollama

for model in ollama.list().models:
    print(model.model)

In [3]:
from ollama import Client
model_name = "EEVE-Korean-Q5_K_M"

client = Client(
  host='http://localhost:11434',
  headers={'x-some-header': 'some-value'}
)
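# Prompt (Korean): "Draw a picture of Son Goku from Dragon Ball in a sunset scene.
#   - image format: svg
#   - re-evaluate the result yourself"
response = client.chat(model=model_name, stream=True, messages=[
  {
    'role': 'user',
    'content': '''
    노을이 지는 그림에 드래곤볼 손오공 나오는 그림 그려줘. 
    - 그림 포멧 : svg
    - 결과를 스스로 재평가
    ''',
  },
])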

# Accumulate the streamed chunks into a single string
resultText = ''
for res in response:
  resultText += res.message.content

print(resultText)

```css
#svg-container {
  width: 100%;
  height: 500px;
}

svg {
  width: 100%;
  height: 100%;
}

path.cloud {
  fill: #d9dffa;
  shape-rendering: auto;
}

path.mountain {
  fill: #7f8c8d;
  shape-rendering: auto;
}

circle.sun {
  fill: #fba456;
  stroke: none;
  r: 20%;
}

polygon.dragonball {
  fill: url(#dragonball_pattern);
  shape-rendering: auto;
}

circle.cloud_particle {
  fill: #f8f9fa;
  stroke: none;
  r: 5px;
}

rect.son_goku {
  fill: url(#dragonball_pattern);
  shape-rendering: auto;
}

path.cloud_arc {
  fill: #d9dffa;
  stroke: none;
  opacity: 0.4;
}
```
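
The prompt asked for SVG, but the reply above came back as a fenced CSS block. A small check on the accumulated text makes that mismatch visible before the output is used. The sketch below is an illustration under assumptions: `extract_code_blocks` and `contains_svg` are hypothetical helpers, and the sample replies are hand-written.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to avoid nesting fences here
FENCE_RE = re.compile(FENCE + r"(\w*)\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(text: str) -> list[tuple[str, str]]:
    """Return (language, body) pairs for every fenced code block in the text."""
    return [(lang, body.strip()) for lang, body in FENCE_RE.findall(text)]


def contains_svg(text: str) -> bool:
    """True if a fenced block (or the raw text, when there are no blocks) has an <svg> tag."""
    bodies = [body for _, body in extract_code_blocks(text)] or [text]
    return any("<svg" in body for body in bodies)


# Hand-written sample replies
css_reply = "Style:\n" + FENCE + "css\nsvg { width: 100%; }\n" + FENCE + "\n"
svg_reply = FENCE + "svg\n<svg xmlns='http://www.w3.org/2000/svg'></svg>\n" + FENCE

print(extract_code_blocks(css_reply))
print(contains_svg(css_reply), contains_svg(svg_reply))
```

In the notebook, `contains_svg(resultText)` could decide whether to retry the request or save the output to a file.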
