# 3. Model Testing (Inference Simulation)

This notebook simulates inference under the target hardware constraints (4 CPU cores, 4 GB RAM).
It assumes the model has already been converted to GGUF format (e.g., `qwen2.5-3b-reminder-bot-q4_k_m.gguf`).

If the GGUF file is missing, the loading cell prints an error and the test loop produces no model output.
The notebook uses `llama-cpp-python` to run the GGUF model from Python.

## Prerequisites
```bash
pip install llama-cpp-python
```
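
Before loading, it can help to confirm that the GGUF file exists and that its size leaves some headroom within the 4 GB budget. The sketch below is one way to do that; the helper name `check_gguf` and the 1 GB headroom figure are illustrative assumptions, not part of the original notebook.

```python
from pathlib import Path

def check_gguf(path, ram_budget_gb=4.0, headroom_gb=1.0):
    """Return (ok, message) for a GGUF file against a rough RAM budget.

    Hypothetical helper: the headroom value is an assumed allowance for the
    KV cache, the Python process, and the OS, not a measured figure.
    """
    p = Path(path)
    if not p.is_file():
        return False, f"{p} not found; run the conversion step or download a GGUF build."
    size_gb = p.stat().st_size / 1024**3
    if size_gb > ram_budget_gb - headroom_gb:
        return False, f"{p.name} is {size_gb:.2f} GB, leaving under {headroom_gb} GB of headroom."
    return True, f"{p.name} is {size_gb:.2f} GB; fits the {ram_budget_gb} GB budget."

ok, message = check_gguf("qwen2.5-3b-reminder-bot-q4_k_m.gguf")
print(message)
```

A file-size check is only a first approximation: actual memory use also depends on the context length and on how the runtime maps the file.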


In [5]:
from llama_cpp import Llama
import json

# Path to your GGUF model
# You must have run the conversion in step 2 or downloaded a GGUF version
MODEL_PATH = "qwen2.5-3b-reminder-bot-q4_k_m.gguf" 

try:
    # Load the model with settings chosen for the 4 CPU / 4 GB RAM target.
    # n_ctx=2048 caps the context window, which also bounds KV-cache memory.
    llm = Llama(
        model_path=MODEL_PATH,
        n_ctx=2048,
        n_threads=4,      # 4 CPU cores
        n_gpu_layers=0    # 0 GPU layers (CPU only simulation)
    )
    print("Model loaded successfully on CPU.")
except Exception as e:
    print(f"Error loading model (Did you convert it to GGUF?): {e}")
    llm = None

llama_model_loader: loaded meta data with 28 key-value pairs and 434 tensors from qwen2.5-3b-reminder-bot-q4_k_m.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                     general.sampling.top_k i32              = 20
llama_model_loader: - kv   3:                     general.sampling.top_p f32              = 0.800000
llama_model_loader: - kv   4:                      general.sampling.temp f32              = 0.700000
llama_model_loader: - kv   5:                               general.name str              = Qwen2.5 3b Reminder Bot_Merged
llama_model_loader: - kv   6:                           general.finetune str              = reminder-bot_merged
llama_model_loader: - kv   7:

Model loaded successfully on CPU.
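
As a rough check on why `n_ctx=2048` is comfortable within 4 GB, the KV cache size can be estimated from the model's shape. The sketch below assumes the published Qwen2.5-3B configuration (36 layers, 2 key/value heads, head dimension 128) and a 16-bit cache. The truncated metadata dump above does not show these values, so treat them as assumptions and read the real ones from the full loader output.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_value=2):
    """Estimate KV cache size: one key and one value vector per layer, head, and position."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_value

# Assumed Qwen2.5-3B shape (not shown in the truncated metadata above).
size = kv_cache_bytes(n_layers=36, n_kv_heads=2, head_dim=128, n_ctx=2048)
print(f"Estimated KV cache: {size / 1024**2:.0f} MiB")  # prints "Estimated KV cache: 72 MiB"
```

Under these assumptions the cache is small next to the quantized weights, so the model file itself dominates the memory footprint.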


In [6]:
import pandas as pd

def extract_reminder(text, context_date="2026-02-18"):
    if not llm:
        return "Model not loaded."
    
    # System prompt (Russian). English gloss: "You are a system for extracting reminder
    # parameters. Your task: extract the text, date, time and recurrence from the user's
    # message and return JSON. Use the current date (Context Date) to resolve relative dates."
    system_prompt = """Ты — система для извлечения параметров напоминаний.
Твоя задача: извлечь текст, дату, время и периодичность из сообщения пользователя и вернуть JSON.
Используй текущую дату (Context Date) для разрешения относительных дат.
Формат: {"text": "...", "date": "YYYY-MM-DD", "time": "HH:MM", "repeat": "..."}"""
    
    user_prompt = f"Context Date: {context_date}\nMessage: \"{text}\"\n\nJSON:"
    
    prompt = f"<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\n{user_prompt}<|im_end|>\n<|im_start|>assistant\n"
    
    output = llm(
        prompt,
        max_tokens=256,
        stop=["<|im_end|>"],
        temperature=0.1,
        echo=False
    )
    
    return output['choices'][0]['text']

# Load 25 random messages from the CSV
try:
    df = pd.read_csv("user_messages.csv")
    # Drop missing messages and keep only those longer than 10 characters
    df = df.dropna(subset=['message_text'])
    df = df[df['message_text'].str.len() > 10]
    
    # Sample up to 25 random messages (fewer if the CSV is small)
    sample_df = df.sample(min(25, len(df)))
    test_messages = sample_df['message_text'].tolist()
    print(f"Loaded {len(test_messages)} random messages from CSV.")
    
except Exception as e:
    print(f"Error loading CSV, using default examples: {e}")
    test_messages = [
        "Напомни купить молоко завтра в 10 утра",   # "Remind me to buy milk tomorrow at 10 am"
        "Встреча с боссом 25 февраля в 14:00",      # "Meeting with the boss on 25 February at 14:00"
        "Каждый день в 9:00 пить таблетки"          # "Take pills every day at 9:00"
    ]

for msg in test_messages:
    if llm:
        print(f"Input: {msg}")
        # Strip surrounding whitespace for cleaner output display
        print(f"Output: {extract_reminder(msg).strip()}")
        print("-" * 20)

Loaded 25 random messages from CSV.
Input: КАЖДОЕ 28 ЧИСЛО ПРОВЕРКА ПРЕМИУМА ТИНЬКОФФ


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     516.51 ms /   164 tokens (    3.15 ms per token,   317.52 tokens per second)
llama_perf_context_print:        eval time =    2121.81 ms /    53 runs   (   40.03 ms per token,    24.98 tokens per second)
llama_perf_context_print:       total time =    2683.81 ms /   217 tokens
llama_perf_context_print:    graphs reused =         51
Llama.generate: 127 prefix-match hit, remaining 29 prompt tokens to eval


Output: {"text": "ПРОВЕРКА ПРЕМИУМА ТИНЬКОФФ", "date": "2026-02-28", "time": "00:00", "repeat": "monthly"}
--------------------
Input: 16 ФЕВРАЛЯ ИТОГИ РОЗЫГРЫША


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     121.68 ms /    29 tokens (    4.20 ms per token,   238.33 tokens per second)
llama_perf_context_print:        eval time =    1853.27 ms /    46 runs   (   40.29 ms per token,    24.82 tokens per second)
llama_perf_context_print:       total time =    2007.37 ms /    75 tokens
llama_perf_context_print:    graphs reused =         43
Llama.generate: 127 prefix-match hit, remaining 28 prompt tokens to eval


Output: {"text": "Итоги розыгрыша", "date": "2026-02-16", "time": "00:00", "repeat": "none"}
--------------------
Input: КАЖДОЕ 1 ЧИСЛО ОПЛАТА ПОЧТЫ


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     114.61 ms /    28 tokens (    4.09 ms per token,   244.30 tokens per second)
llama_perf_context_print:        eval time =    1665.70 ms /    41 runs   (   40.63 ms per token,    24.61 tokens per second)
llama_perf_context_print:       total time =    1808.68 ms /    69 tokens
llama_perf_context_print:    graphs reused =         38
Llama.generate: 127 prefix-match hit, remaining 18 prompt tokens to eval


Output: {"text": "Оплата почты", "date": "2026-02-01", "time": "00:00", "repeat": "monthly"}
--------------------
Input: 10-00 Сделай английский


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =      96.04 ms /    18 tokens (    5.34 ms per token,   187.41 tokens per second)
llama_perf_context_print:        eval time =    1638.23 ms /    41 runs   (   39.96 ms per token,    25.03 tokens per second)
llama_perf_context_print:       total time =    1761.94 ms /    59 tokens
llama_perf_context_print:    graphs reused =         39
Llama.generate: 126 prefix-match hit, remaining 208 prompt tokens to eval


Output: {"text": "Сделай английский", "date": "2026-02-18", "time": "10:00", "repeat": "none"}
--------------------
Input: [Релизы]
https ://wiki.domrf.ru/pages/viewpage.action?pageId=267595334

Банк:
‼️12,12.2025 в 11:00 (пятница)
https://jira.domrf.ru/browse/INTEGRATIO-4766
Paycontrol - доработка проверки подписи
https://jira.domrf.ru/browse/RFC-9711
@hopheylalalei сделай страницу по этому релизу и пропиши ветки плз в табличке https://wiki.domrf.ru/pages/viewpage.action?pageId=315611369 
 [elk-facade] [paycontrol-adapter] ЕЛК-ФАСАД ветки совместно с @Pajadla

@alexzodiac @PristTech


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     648.19 ms /   208 tokens (    3.12 ms per token,   320.89 tokens per second)
llama_perf_context_print:        eval time =    2474.96 ms /    59 runs   (   41.95 ms per token,    23.84 tokens per second)
llama_perf_context_print:       total time =    3169.06 ms /   267 tokens
llama_perf_context_print:    graphs reused =         56
Llama.generate: 161 prefix-match hit, remaining 138 prompt tokens to eval


Output: {"text": "‼️12,12.2025 в 11:00 (пятница)", "date": "2025-12-12", "time": "11:00", "repeat": "none"}
--------------------
Input: [Релизы]
https ://wiki.domrf.ru/pages/viewpage.action?pageId=267595334

Банк:
‼️20.01.2026 в 22:00 (@TurovVal)
https://jira.domrf.ru/browse/INTEGRATIO-3935
рефакторинг, настройка подключения к mq
https://jira.domrf.ru/browse/RFC-9919
@alexandra00001 пропиши релизноут и ветки плз https://wiki.domrf.ru/pages/viewpage.action?pageId=315611369 

[rbs-paydocs]

@alexzodiac @PristTech


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     568.68 ms /   138 tokens (    4.12 ms per token,   242.67 tokens per second)
llama_perf_context_print:        eval time =    2644.27 ms /    65 runs   (   40.68 ms per token,    24.58 tokens per second)
llama_perf_context_print:       total time =    3257.53 ms /   203 tokens
llama_perf_context_print:    graphs reused =         62
Llama.generate: 126 prefix-match hit, remaining 16 prompt tokens to eval


Output: {"text": "рефакторинг, настройка подключения к mq\nпропиши релизноут и ветки плз", "date": "2026-01-20", "time": "22:00", "repeat": "none"}
--------------------
Input: 12  января итоиг


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =      71.89 ms /    16 tokens (    4.49 ms per token,   222.57 tokens per second)
llama_perf_context_print:        eval time =    1572.10 ms /    39 runs   (   40.31 ms per token,    24.81 tokens per second)
llama_perf_context_print:       total time =    1669.90 ms /    55 tokens
llama_perf_context_print:    graphs reused =         37
Llama.generate: 127 prefix-match hit, remaining 21 prompt tokens to eval


Output: {"text": "итоиг", "date": "2026-01-12", "time": "00:00", "repeat": "none"}
--------------------
Input: 27 января итоги розыгрыша


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =      92.66 ms /    21 tokens (    4.41 ms per token,   226.63 tokens per second)
llama_perf_context_print:        eval time =    1837.10 ms /    45 runs   (   40.82 ms per token,    24.50 tokens per second)
llama_perf_context_print:       total time =    1960.23 ms /    66 tokens
llama_perf_context_print:    graphs reused =         42
Llama.generate: 127 prefix-match hit, remaining 205 prompt tokens to eval


Output: {"text": "итоги розыгрыша", "date": "2026-01-27", "time": "00:00", "repeat": "none"}
--------------------
Input: ♥️♥️♥️♥️♥️♥️♥️♥️

Дарим двоим пО 500₽ на карту

♥️У С Л О В И Я

♥️Подписаться на 
✨ Там гдЕ живёт лЮБовь
✨ ИЗдательство "лИСЫ"

♥️ Нажать кНопку участвую

Верить в свОю удачу💫
♥️ Важное праВило

У пОбедителЕй дОЛжна бЫть карта РФ или WB/ozon

#розыгрыш

участников: 138
ПрИзовыХ мест: 2
ДАта розыгрыша: 17:00, 16.02.2026 MSK (10 дней)


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     635.64 ms /   205 tokens (    3.10 ms per token,   322.51 tokens per second)
llama_perf_context_print:        eval time =    8483.30 ms /   209 runs   (   40.59 ms per token,    24.64 tokens per second)
llama_perf_context_print:       total time =    9270.49 ms /   414 tokens
llama_perf_context_print:    graphs reused =        202
Llama.generate: 127 prefix-match hit, remaining 30 prompt tokens to eval


Output: {"text": "Дарим двоим пО 500₽ на карту\n\nУ С Л О В И Я\n\nПодписаться на\n✨ Там гдЕ живёт лЮбовь\n✨ ИЗдательство \"лИСЫ\"\n\nНажать кНопку участвую\n\nВерить в свОю удачу💫\n\nУ победителЕй дОЛжна бЫть карта РФ или WB/ozon\n\n#розыгрыш\n\nучастников: 138\nПрИзовыХ мест: 2\nДАта розыгрыша: 17:00, 16.02.2026 MSK (10 дней)", "date": "2026-02-16", "time": "17:00", "repeat": "none"}
--------------------
Input: КАЖДОЕ  1 ЧИСЛО ОПЛААТ ПОЧТЫ


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     125.09 ms /    30 tokens (    4.17 ms per token,   239.83 tokens per second)
llama_perf_context_print:        eval time =    1637.46 ms /    41 runs   (   39.94 ms per token,    25.04 tokens per second)
llama_perf_context_print:       total time =    1789.82 ms /    71 tokens
llama_perf_context_print:    graphs reused =         38
Llama.generate: 127 prefix-match hit, remaining 27 prompt tokens to eval


Output: {"text": "оплату почты", "date": "2026-02-01", "time": "00:00", "repeat": "monthly"}
--------------------
Input: 08.12 в 10:000 вигантол жене


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     117.85 ms /    27 tokens (    4.36 ms per token,   229.11 tokens per second)
llama_perf_context_print:        eval time =    1723.61 ms /    42 runs   (   41.04 ms per token,    24.37 tokens per second)
llama_perf_context_print:       total time =    1869.87 ms /    69 tokens
llama_perf_context_print:    graphs reused =         39
Llama.generate: 127 prefix-match hit, remaining 27 prompt tokens to eval


Output: {"text": "вигантол жене", "date": "2026-02-18", "time": "10:00", "repeat": "none"}
--------------------
Input: 29 чисал напомнить боссу чтоб я показал мухину


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     121.18 ms /    27 tokens (    4.49 ms per token,   222.81 tokens per second)
llama_perf_context_print:        eval time =    1741.80 ms /    43 runs   (   40.51 ms per token,    24.69 tokens per second)
llama_perf_context_print:       total time =    1891.62 ms /    70 tokens
llama_perf_context_print:    graphs reused =         40
Llama.generate: 127 prefix-match hit, remaining 22 prompt tokens to eval


Output: {"text": "показать мухину", "date": "2026-02-29", "time": "00:00", "repeat": "none"}
--------------------
Input: 31  декабря итоги розыгрыша


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =      95.41 ms /    22 tokens (    4.34 ms per token,   230.59 tokens per second)
llama_perf_context_print:        eval time =    1798.81 ms /    45 runs   (   39.97 ms per token,    25.02 tokens per second)
llama_perf_context_print:       total time =    1924.00 ms /    67 tokens
llama_perf_context_print:    graphs reused =         42
Llama.generate: 127 prefix-match hit, remaining 21 prompt tokens to eval


Output: {"text": "итоги розыгрыша", "date": "2026-12-31", "time": "00:00", "repeat": "none"}
--------------------
Input: 25 января, опубликовать кружочк


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =      91.69 ms /    21 tokens (    4.37 ms per token,   229.03 tokens per second)
llama_perf_context_print:        eval time =    1818.62 ms /    45 runs   (   40.41 ms per token,    24.74 tokens per second)
llama_perf_context_print:       total time =    1940.23 ms /    66 tokens
llama_perf_context_print:    graphs reused =         42
Llama.generate: 127 prefix-match hit, remaining 20 prompt tokens to eval


Output: {"text": "опубликовать кружочк", "date": "2026-01-25", "time": "00:00", "repeat": "none"}
--------------------
Input: 18 января итоиг розыгрыша


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =      89.90 ms /    20 tokens (    4.50 ms per token,   222.46 tokens per second)
llama_perf_context_print:        eval time =    1552.99 ms /    39 runs   (   39.82 ms per token,    25.11 tokens per second)
llama_perf_context_print:       total time =    1668.80 ms /    59 tokens
llama_perf_context_print:    graphs reused =         37
Llama.generate: 127 prefix-match hit, remaining 29 prompt tokens to eval


Output: {"text": "итоиг розыгрыша", "date": "2026-01-18", "time": null, "repeat": null}
--------------------
Input: В 7:00 КАЖДЫИ ДЕНЬ ПОДЪЕМ


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     161.12 ms /    29 tokens (    5.56 ms per token,   179.99 tokens per second)
llama_perf_context_print:        eval time =    1870.36 ms /    47 runs   (   39.79 ms per token,    25.13 tokens per second)
llama_perf_context_print:       total time =    2062.44 ms /    76 tokens
llama_perf_context_print:    graphs reused =         44
Llama.generate: 127 prefix-match hit, remaining 39 prompt tokens to eval


Output: {"date": null, "repeat": "daily", "text": "В 7:00 КАЖДЫИ ДЕНЬ ПОДЪЕМ", "time": "07:00"}
--------------------
Input: каждыи день 30 дней подряд в 8:00 по одной капсуле в день. 30 дней подряд


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     196.90 ms /    39 tokens (    5.05 ms per token,   198.07 tokens per second)
llama_perf_context_print:        eval time =    2405.14 ms /    60 runs   (   40.09 ms per token,    24.95 tokens per second)
llama_perf_context_print:       total time =    2642.42 ms /    99 tokens
llama_perf_context_print:    graphs reused =         57
Llama.generate: 127 prefix-match hit, remaining 15 prompt tokens to eval


Output: {"text": "каждыи день 30 дней подряд в 8:00 по одной капсуле в день.", "date": "2026-02-18", "time": "08:00", "repeat": "daily"}
--------------------
Input: 23 февраля итоги


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =      91.95 ms /    15 tokens (    6.13 ms per token,   163.14 tokens per second)
llama_perf_context_print:        eval time =    1570.27 ms /    39 runs   (   40.26 ms per token,    24.84 tokens per second)
llama_perf_context_print:       total time =    1688.57 ms /    54 tokens
llama_perf_context_print:    graphs reused =         37
Llama.generate: 127 prefix-match hit, remaining 43 prompt tokens to eval


Output: {"text": "итоги", "date": "2026-02-23", "time": "00:00", "repeat": "none"}
--------------------
Input: Завтра  в 11-00

https://www.liveworksheets.com/worksheet/en/english-second-language-esl/348744


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     164.49 ms /    43 tokens (    3.83 ms per token,   261.42 tokens per second)
llama_perf_context_print:        eval time =    1836.91 ms /    46 runs   (   39.93 ms per token,    25.04 tokens per second)
llama_perf_context_print:       total time =    2033.13 ms /    89 tokens
llama_perf_context_print:    graphs reused =         44
Llama.generate: 127 prefix-match hit, remaining 21 prompt tokens to eval


Output: {"text": "Завтра в 11-00", "date": "2026-02-19", "time": "11:00", "repeat": "none"}
--------------------
Input: 30 января иотги розыгрЫШа


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     107.36 ms /    21 tokens (    5.11 ms per token,   195.60 tokens per second)
llama_perf_context_print:        eval time =    1683.03 ms /    42 runs   (   40.07 ms per token,    24.95 tokens per second)
llama_perf_context_print:       total time =    1818.96 ms /    63 tokens
llama_perf_context_print:    graphs reused =         40
Llama.generate: 126 prefix-match hit, remaining 210 prompt tokens to eval


Output: {"text": "розыгрЫШа", "date": "2026-01-30", "time": "00:00", "repeat": "none"}
--------------------
Input: [Релизы] 
https://wiki.domrf.ru/pages/viewpage.action?pageId=267595334

Банк:
‼️27.01.2026 в 21:0 (@TurovVal)
https://jira.domrf.ru/browse/INTEGRATIO-5021
https://jira.domrf.ru/browse/INTEGRATIO-4803
ХФ по эскроу JurDocIntegrationError + либа метрик на фасад
https://jira.domrf.ru/browse/RFC-10003
@hopheylalalei пропиши релизноут (по возможности) и ветки плз https://wiki.domrf.ru/pages/viewpage.action?pageId=315611369 

[athena-facade] [athena-adapter]

@alexzodiac @PristTech


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     658.93 ms /   210 tokens (    3.14 ms per token,   318.70 tokens per second)
llama_perf_context_print:        eval time =    4365.94 ms /   107 runs   (   40.80 ms per token,    24.51 tokens per second)
llama_perf_context_print:       total time =    5097.10 ms /   317 tokens
llama_perf_context_print:    graphs reused =        103
Llama.generate: 127 prefix-match hit, remaining 195 prompt tokens to eval


Output: {"text": "‼️27.01.2026 в 21:0 (@TurovVal) ХФ по эскроу JurDocIntegrationError + либа метрик на фасад @hopheylalalei пропиши релизноут (по возможности) и ветки плз", "date": "2026-01-27", "time": "21:00", "repeat": "none"}
--------------------
Input: [РЕЛИЗЫ]
WIKI.DOMRF.RU/PAGES/VIEWPAGE.ACTION?PAGEID=267595334

БАНК:
‼️10.02.2026 В 18.00 (ВТОРНИК) 
HTTPS://JIRA.DOMRF.RU/BROWSE/INTEGRATIO-5018
CONNECTION_POOL ПРАВИМ
HTTPS://JIRA.DOMRF.RU/BROWSE/RFC-10152
@EXKOT ПРОПИШИ РЕЛИЗНОУТ И ВЕТКИ ПЛЗ HTTPS://WIKI.DOMRF.RU/PAGES/VIEWPAGE.ACTION?PAGEID=315611369 

[CRM-ATHENA] 

@ALEXZODIAC @PRISTTECH


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     601.92 ms /   195 tokens (    3.09 ms per token,   323.96 tokens per second)
llama_perf_context_print:        eval time =    6234.29 ms /   154 runs   (   40.48 ms per token,    24.70 tokens per second)
llama_perf_context_print:       total time =    6944.81 ms /   349 tokens
llama_perf_context_print:    graphs reused =        149
Llama.generate: 126 prefix-match hit, remaining 22 prompt tokens to eval


Output: {"text": "БАНК:‼️10.02.2026 В 18.00 (ВТОРНИК) CONNECTION_POOL ПРАВИМ HTTPS://JIRA.DOMRF.RU/BROWSE/INTEGRATIO-5018 RFC-10152 @EXKOT ПРОПИШИ РЕЛИЗНОУТ И ВЕТКИ ПЛЗ HTTPS://WIKI.DOMRF.RU/PAGES/VIEWPAGE.ACTION?PAGEID=315611369", "date": "2026-02-10", "time": "18:00", "repeat": "none"}
--------------------
Input: 28 января итоги розыгрыша


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =      97.89 ms /    22 tokens (    4.45 ms per token,   224.74 tokens per second)
llama_perf_context_print:        eval time =    1556.80 ms /    39 runs   (   39.92 ms per token,    25.05 tokens per second)
llama_perf_context_print:       total time =    1681.00 ms /    61 tokens
llama_perf_context_print:    graphs reused =         37
Llama.generate: 126 prefix-match hit, remaining 301 prompt tokens to eval


Output: {"text": "итоги розыгрыша", "date": "2026-01-28", "time": null, "repeat": null}
--------------------
Input: [Релизы] 
https://wiki.domrf.ru/pages/viewpAge.action?paGeId=267595334

Банк:
‼️18.01.2026 в 10:00 (воскр)
https://jira.domrf.ru/browse/INTEGRATIO-4771
https://jira.dOmrf.ru/bRowse/INTEGRATIO-4879
еЛк-Эскроу СИнхРониЗациЯ данных из На + доработка проверки бенЕфИцИАров
https://jira.domrf.ru/browse/RFC-9906 
@exkot @hopheylalalei сделайте стРаницу по эТому релизу (по возМожности, лиБо рпосто отпишите есть ли Какие дополнителЬные зАдаЧи пО Конфигугированию в этом релизе) и пропИшите ветки плз в табличке https://wiki.dOmrf.ru/pagEs/viewpAge.Action?pageId=315611369 

 [elK-facade] [athena-adapter] [athena-facade]

@alexzOdiac..


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     922.67 ms /   301 tokens (    3.07 ms per token,   326.23 tokens per second)
llama_perf_context_print:        eval time =    2523.44 ms /    62 runs   (   40.70 ms per token,    24.57 tokens per second)
llama_perf_context_print:       total time =    3488.19 ms /   363 tokens
llama_perf_context_print:    graphs reused =         59
Llama.generate: 126 prefix-match hit, remaining 23 prompt tokens to eval


Output: {"text": "Банк: сделайте страницу по этому релизу и пропишите ветки в табличке", "date": "2026-01-18", "time": "10:00", "repeat": "none"}
--------------------
Input: 28  января итоги розЫгрыша


llama_perf_context_print:        load time =     517.78 ms
llama_perf_context_print: prompt eval time =     102.91 ms /    23 tokens (    4.47 ms per token,   223.50 tokens per second)
llama_perf_context_print:        eval time =    1799.13 ms /    45 runs   (   39.98 ms per token,    25.01 tokens per second)
llama_perf_context_print:       total time =    1932.34 ms /    68 tokens
llama_perf_context_print:    graphs reused =         42


Output: {"text": "итоги розЫгрыша", "date": "2026-01-28", "time": "00:00", "repeat": "none"}
--------------------
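
The raw outputs above are strings, and a few would not be safe to schedule as-is: one carries the date `2026-02-29`, which is not a real calendar date, and others have `null` for `date`, `time`, or `repeat`. A post-processing step can parse and validate each result before it is used. The sketch below is one possible approach; the function name `parse_reminder`, the placeholder sample text, and the choice to return a list of problems are assumptions for illustration, not part of the notebook.

```python
import json
from datetime import datetime

REQUIRED_KEYS = ("text", "date", "time", "repeat")

def parse_reminder(raw):
    """Parse model output into (data, problems); data is None if the JSON is unusable."""
    try:
        data = json.loads(raw.strip())
    except json.JSONDecodeError as e:
        return None, [f"invalid JSON: {e.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value is not an object"]

    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in data]
    # Check that non-null date/time values are real calendar values.
    for key, fmt in (("date", "%Y-%m-%d"), ("time", "%H:%M")):
        value = data.get(key)
        if value is None:
            continue
        try:
            datetime.strptime(value, fmt)
        except (TypeError, ValueError):
            problems.append(f"invalid {key}: {value!r}")
    return data, problems

# Placeholder text; the date mirrors the invalid one seen in the run above.
sample = '{"text": "example reminder", "date": "2026-02-29", "time": "00:00", "repeat": "none"}'
data, problems = parse_reminder(sample)
print(problems)  # prints ["invalid date: '2026-02-29'"]
```

Whether a `null` field should be rejected or filled with a default (for example, the context date for a daily reminder) is a product decision, so the sketch only reports malformed values and leaves `null` alone.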


## Notes on production setup

For the actual 4 GB / 4 CPU deployment, you need only:
1. The `.gguf` file.
2. A lightweight Python script using `llama-cpp-python` (like the cells above).
3. No heavy dependencies such as `torch` or `transformers`; `llama-cpp-python` alone is enough.
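
The list above can be sketched as a small script. This is a minimal outline under stated assumptions rather than a finished service: `build_prompt` and `extract` are hypothetical helper names, and the model is passed in as a callable so the prompt logic can be exercised without loading the GGUF file. The prompt format and sampling settings are copied from the test cell in this notebook.

```python
def build_prompt(system_prompt, text, context_date):
    """Assemble the ChatML prompt in the same layout as the notebook's test cell."""
    user_prompt = f'Context Date: {context_date}\nMessage: "{text}"\n\nJSON:'
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

def extract(llm, system_prompt, text, context_date):
    """Run one extraction; `llm` is any callable that returns a completion dict."""
    output = llm(
        build_prompt(system_prompt, text, context_date),
        max_tokens=256,
        stop=["<|im_end|>"],
        temperature=0.1,
    )
    return output["choices"][0]["text"].strip()

# In deployment (assumes llama-cpp-python is installed and the GGUF file is present):
#   from llama_cpp import Llama
#   llm = Llama(model_path="qwen2.5-3b-reminder-bot-q4_k_m.gguf",
#               n_ctx=2048, n_threads=4, n_gpu_layers=0)
#   print(extract(llm, SYSTEM_PROMPT, message, today_iso))
```

Passing `context_date` on every call, instead of relying on a hard-coded default, lets the caller supply the current date so that relative expressions such as "tomorrow" resolve correctly.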
