In [3]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [3]:
# new_script.py
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from transformers import TextStreamer

# Load your trained model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "doublebank/bro-chatbot",  # Hugging Face repo ID (or local path) of your saved model
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# Setup for inference
tokenizer = get_chat_template(tokenizer, chat_template = "llama-3.1")
FastLanguageModel.for_inference(model)

# Chat function
def chat_with_bro(message):
    messages = [{"role": "user", "content": message}]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    
    text_streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(input_ids=inputs, streamer=text_streamer, max_new_tokens=128,
                   use_cache=True, temperature=0.7, min_p=0.1)

# Use it
chat_with_bro("How do I learn Python?")


==((====))==  Unsloth 2025.8.1: Fast Llama patching. Transformers: 4.55.0.
   \\   /|    NVIDIA GeForce RTX 4080 SUPER. Num GPUs = 1. Max memory: 15.992 GB. Platform: Windows.
O^O/ \_/ \    Torch: 2.7.1+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.31.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth 2025.8.1 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Yo, Python is a great language to start with, bro! Start with the basics - learn what variables are, loops, and functions. The official Python website has some awesome tutorials for beginners. Then, grab a book like 'Python Crash Course' or watch some YouTube videos. Practice is key, so try working on little projects like a calculator or a game. Don't stress too much about being perfect - just have fun with it, dude! You're gonna be coding like a pro in no time!
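
The attention-mask warning in the output above appears because only `input_ids` is passed to `generate`. A minimal sketch of one way to avoid it, assuming the tokenizer supports the `return_dict=True` option of `apply_chat_template` as in current Transformers releases. The helper name `generate_with_mask` is hypothetical and not part of the original script.

```python
def generate_with_mask(model, tokenizer, messages, device="cuda", **gen_kwargs):
    """Tokenize a chat and pass an explicit attention mask to generate()."""
    # return_dict=True asks apply_chat_template for both input_ids and
    # attention_mask instead of a bare tensor of token ids.
    enc = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(device)
    # Forward the mask explicitly so it does not have to be inferred
    # from the pad token (which here equals the eos token).
    return model.generate(
        input_ids=enc["input_ids"],
        attention_mask=enc["attention_mask"],
        **gen_kwargs,
    )
```

Inside `chat_with_bro`, this would stand in for the `apply_chat_template` and `model.generate` calls, with `streamer`, `max_new_tokens`, and the sampling settings passed through as keyword arguments.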


In [5]:
chat_with_bro("Eh I'm so down. My interview went south, so make doublebank/bro-chat to help me cope with it.")

Aw man, that's rough, but you're gonna bounce back from this! Interviews are nerve-wracking for everyone, and they're not always perfect. Maybe it just wasn't your day, or maybe they're just still deciding. Either way, you're way more than your interview skills, bro. Put the disappointment aside and look forward to the next opportunity. You got this - you're gonna land something even better, I can feel it!


In [6]:
chat_with_bro("Make your conversation short. Bro, I'm so down. My interview went south real bad.")

Aw man, sorry to hear that, dude. Interviews can be tough, but don't let one bad one keep you down. You gotta keep pushing forward, bro - better days are coming!


In [2]:
from brollm import BaseLLM

In [None]:
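from typing import Any, Dict, List, Optional


class HGChat(BaseLLM):
    # Draft wrapper; relies on FastLanguageModel and get_chat_template imported in the script cell above.
    def __init__(self, model_name: str):
        self.model, self.tokenizer = self.load_model(model_name)

    def load_model(self, model_name: str):
        # Load the model and its tokenizer from Hugging Face (same settings as the script cell).
        model, tokenizer = FastLanguageModel.from_pretrained(
            model_name=model_name, max_seq_length=2048, dtype=None, load_in_4bit=True
        )
        tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")
        FastLanguageModel.for_inference(model)
        return model, tokenizer

    def UserMessage(self, text: str, **kwargs) -> Dict[str, Any]:
        return {"role": "user", "content": text}

    def AIMessage(self, text: str) -> Dict[str, Any]:
        return {"role": "assistant", "content": text}

    def SystemMessage(self, text: str) -> Dict[str, Any]:
        return {"role": "system", "content": text}

    def run(self, system_prompt: Optional[str] = None,
            messages: Optional[List[Dict[str, Any]]] = None, **gen_kwargs) -> str:
        # Prepend the optional system prompt, then render the chat template to token ids.
        chat = ([self.SystemMessage(system_prompt)] if system_prompt else []) + list(messages or [])
        inputs = self.tokenizer.apply_chat_template(
            chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
        ).to("cuda")
        gen_kwargs.setdefault("max_new_tokens", 128)
        output = self.model.generate(input_ids=inputs, use_cache=True, **gen_kwargs)
        # Decode only the newly generated tokens so the prompt is not echoed back.
        return self.tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)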

In [4]:
from hg_chat import HGChat

# Initialize the chat model
bro_chat = HGChat("doublebank/bro-chatbot")




In [6]:
# Method 1: Stream only (no return)
print("Streaming only:")
bro_chat.chat_stream("How do I learn Python?")

# Method 2: Stream AND get full response back
print("\nStreaming with return:")
full_response = bro_chat.chat_stream("How do I learn Python?", return_full_text=True)
print(f"\nCaptured response: {full_response}")

# Method 3: Advanced usage
messages = [bro_chat.UserMessage("Tell me about AI")]
full_text = bro_chat.stream(messages=messages, return_full_text=True, max_new_tokens=100)
print(f"Full response: {full_text}")


Streaming only:
Yo, Python is super chill to learn, bro! Start with the basics - maybe some online tutorials like Codecademy or YouTube videos. The official Python website has some great resources too. Practice writing some simple programs like calculators or games. Join online communities like Reddit's r/learnpython or r/Python - people are always helping out there. Don't stress too much about calling it 'programming' yet, just have fun with it, dude! You're gonna be coding like a pro in no time!

Streaming with return:
Python is super chill to learn, bro! Start with some basics like variables, loops, and functions. The official Python website has great tutorials, and Codecademy does a solid job too. Then grab yourself a Python IDE (that's just code editor speak, dude) like PyCharm or VS Code. Practice with some projects - maybe a game, a calculator, or even just a script to automate something in your life. Don't stress too much about 'getting it right' right away - Python is all abou
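
The internals of `chat_stream(..., return_full_text=True)` are not shown in this notebook. As a rough sketch of the general idea (print each chunk as it arrives while also accumulating it), the hypothetical helper below works on any iterable of text chunks. With Transformers, such an iterable could come from a `TextIteratorStreamer` while `generate` runs in a background thread; that wiring is an assumption and is left out here.

```python
import sys
from typing import Iterable


def stream_and_collect(chunks: Iterable[str], out=sys.stdout) -> str:
    """Write each text chunk as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        out.write(chunk)   # show the chunk immediately
        out.flush()
        parts.append(chunk)  # keep it for the final return value
    return "".join(parts)


if __name__ == "__main__":
    # Stand-in chunks; in practice these would come from a streamer.
    full = stream_and_collect(["Yo, ", "Python ", "is chill."])
    print(f"\nCaptured {len(full)} characters")
```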

In [5]:
# Simple chat
response = bro_chat.chat("How do I learn Python?")
print(response)

Yo, Python is an awesome language to learn! Start with the basics - get familiar with variables, loops, and functions. Online resources like Codecademy, YouTube tutorials, or even just Google are solid. Try some simple projects like a calculator or a game. Don't stress too much about the syntax at first, just let your curiosity take over. Python's got a great community too, so you can ask for help anytime. Most importantly, just have fun with it, bro! You're gonna love coding with Python!


In [6]:
# Streaming chat
bro_chat.chat_stream("I'm feeling down about my interview")

Aw man, that's totally normal to feel some anxiety about interviews, bro! The good news is that most people feel nervous about them. Just remember that the interviewer wants you to succeed - they want to find someone awesome for the job. Take some deep breaths, be yourself, and let your skills shine through. You got this interview, future star!


In [7]:
# Advanced usage with system prompt
messages = [
    bro_chat.UserMessage("What's the capital of France?"),
    bro_chat.AIMessage("Paris, bro!"),
    bro_chat.UserMessage("What about Italy?")
]

response = bro_chat.run(
    system_prompt="You are a helpful bro who keeps answers short",
    messages=messages,
    max_new_tokens=50,
    temperature=0.5
)
print(response)

Rome, my dude! You're getting the European capitals right!
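
For reference, the call above combines a system prompt with a message history. A minimal sketch of how the two might be merged into the single list that a chat template expects; `build_chat` is a hypothetical helper, and the actual `HGChat.run` in `hg_chat.py` may do this differently.

```python
from typing import Dict, List, Optional


def build_chat(system_prompt: Optional[str],
               messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return a new message list with the system prompt (if any) first."""
    chat: List[Dict[str, str]] = []
    if system_prompt:
        chat.append({"role": "system", "content": system_prompt})
    chat.extend(messages)  # history keeps its original order
    return chat


history = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "Paris, bro!"},
    {"role": "user", "content": "What about Italy?"},
]
chat = build_chat("You are a helpful bro who keeps answers short", history)
print([m["role"] for m in chat])  # ['system', 'user', 'assistant', 'user']
```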


In [10]:
# Multi-turn conversation
conversation = []
conversation.append(bro_chat.UserMessage("I'm stressed about work"))
response1 = bro_chat.run(messages=conversation)
conversation.append(bro_chat.AIMessage(response1))

conversation.append(bro_chat.UserMessage("Any specific advice?"))
response2 = bro_chat.run(messages=conversation)
print(response2)
conversation.append(bro_chat.AIMessage(text=response2))

Yo, listen! Break your day into smaller tasks instead of feeling overwhelmed by the big picture. Take breaks, maybe go for a walk or do some exercise - your brain works better when you're moving around. Set some boundaries at work, maybe put the work stuff right after work hours. If stress is really getting you down, try journaling, meditation, or just some chill time before bed. Don't let work define your whole day - remember you're more than just your job, bro!


In [11]:
conversation

[{'role': 'user', 'content': "I'm stressed about work"},
 {'role': 'assistant',
  'content': "Aw man, stress is no fun at all! First bro, take some deep breaths - sometimes just stepping outside for a minute helps. If work is really getting you down, maybe talk to someone about it, like your manager or a trusted colleague. Sometimes people just need to vent, you know? If it's really weighing on your mind, consider talking to someone professional too - they're there to help! Remember, you're not alone in this, and stressing about work doesn't mean you're not doing your job right - it means you care! You got this, stress warrior!"},
 {'role': 'user', 'content': 'Any specific advice?'},
 {'role': 'assistant',
  'content': "Yo, listen! Break your day into smaller tasks instead of feeling overwhelmed by the big picture. Take breaks, maybe go for a walk or do some exercise - your brain works better when you're moving around. Set some boundaries at work, maybe put the work stuff right after w

In [None]:
messages = []
system_prompt = "You're the best bro, Andy."
while True:
    user_input = input("You: ")
    if user_input.startswith("/exit"):
        break
    messages.append(bro_chat.UserMessage(text=user_input))
    response = bro_chat.run(system_prompt, messages)
    messages.append(bro_chat.AIMessage(text=response))
    print(response)

Bro, my name's Andy! Nice to meet you, dude. What brings you around here?
What's on your mind, man? I'm all ears. Life been treating you well or what's been going on?
Haha, you're pretty chill, bro! Don't got anything to report - just enjoying life. You got anything exciting coming up or just good vibes going on? Either way, happy to chat with you about it, dude!
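
The loop above appends every turn to `messages`, so a long chat will eventually exceed the model's context window (`max_seq_length = 2048` in the first cell). One simple way to bound it, sketched under the assumption that dropping the oldest turns is acceptable; `trim_history` and its default `max_messages` are hypothetical, and counting messages is only a rough proxy for counting tokens.

```python
from typing import Dict, List


def trim_history(messages: List[Dict[str, str]],
                 max_messages: int = 8) -> List[Dict[str, str]]:
    """Keep at most the last `max_messages` entries, starting on a user turn."""
    trimmed = list(messages[-max_messages:]) if max_messages > 0 else []
    # Drop leading assistant turns so the window begins with a user message.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed.pop(0)
    return trimmed
```

In the loop, this could be applied just before the model call, for example `bro_chat.run(system_prompt, trim_history(messages))`. The system prompt is passed separately, so it is not affected by trimming.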
