#  Career Counselor Bot — TinyLlama 1.1B


## ✅ Step 1 — Check GPU

In [2]:
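!nvidia-smi
import torch

# Guard the device queries so the cell does not crash on a CPU-only runtime.
gpu_available = torch.cuda.is_available()
print(f'\nGPU Available: {gpu_available}')
if gpu_available:
    print(f'GPU: {torch.cuda.get_device_name(0)}')
    print(f'Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')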

Tue Feb 24 06:33:12 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   45C    P8             12W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+


## ✅ Step 2 — Install Libraries

In [3]:
!pip install -q transformers datasets peft trl accelerate bitsandbytes
print('✅ Done!')

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m540.5/540.5 kB[0m [31m17.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m60.7/60.7 MB[0m [31m14.1 MB/s[0m eta [36m0:00:00[0m
[?25h✅ Done!


## ✅ Step 3 — Upload Dataset

In [4]:
from google.colab import files
print('📂 Upload your career_qa_dataset.json')
uploaded = files.upload()
print('✅ Uploaded!')

📂 Upload your career_qa_dataset.json


Saving career_qa_dataset.json to career_qa_dataset.json
✅ Uploaded!


## ✅ Step 4 — Load and Format Dataset

In [5]:
import json
from datasets import Dataset

# Read the uploaded file as UTF-8 so non-ASCII characters load consistently.
with open('career_qa_dataset.json', 'r', encoding='utf-8') as f:
    raw_data = json.load(f)

system_prompt = 'You are an expert career counselor specializing in helping students in India choose the right career path. Always be encouraging, specific, and data-driven in your responses.'

# TinyLlama chat format
def format_example(item):
    text = (
        f"<|system|>\n{system_prompt}</s>\n"
        f"<|user|>\n{item['question']}</s>\n"
        f"<|assistant|>\n{item['answer']}</s>"
    )
    return {'text': text}

formatted = [format_example(item) for item in raw_data]
dataset = Dataset.from_list(formatted)

print(f'✅ Loaded {len(dataset)} examples!')
print('\n--- Sample ---')
print(dataset[0]['text'][:400])

✅ Loaded 300 examples!

--- Sample ---
<|system|>
You are an expert career counselor specializing in helping students in India choose the right career path. Always be encouraging, specific, and data-driven in your responses.</s>
<|user|>
What is the average salary (LPA) for a Software Engineer?</s>
<|assistant|>
The average salary for a Software Engineer is around 6 LPA.</s>
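
Step 4 assumes every record in `career_qa_dataset.json` has non-empty `question` and `answer` fields; a record without one of them would raise a `KeyError` inside `format_example`. A minimal validation sketch that could run before formatting is shown here. It assumes the file is a JSON list of objects with those two keys, and the helper name `validate_records` is hypothetical. The first sample record mirrors the sample printed above; the other two are made up for illustration.

```python
def validate_records(records):
    """Split records into usable ones and a list of (index, reason) problems."""
    valid, problems = [], []
    for i, item in enumerate(records):
        if not isinstance(item, dict):
            problems.append((i, 'not an object'))
            continue
        # A field counts as missing if it is absent, None, or only whitespace.
        missing = [k for k in ('question', 'answer')
                   if not str(item.get(k) or '').strip()]
        if missing:
            problems.append((i, 'missing or empty: ' + ', '.join(missing)))
        else:
            valid.append(item)
    return valid, problems


sample = [
    {'question': 'What is the average salary (LPA) for a Software Engineer?',
     'answer': 'The average salary for a Software Engineer is around 6 LPA.'},
    {'question': 'Which stream suits design careers?'},   # made up: no answer
    {'question': '   ', 'answer': 'Blank question'},      # made up: empty question
]

valid, problems = validate_records(sample)
print(f'{len(valid)} valid, {len(problems)} rejected')
for index, reason in problems:
    print(f'  record {index}: {reason}')
```

In the notebook, `raw_data` would take the place of `sample`, and only `valid` would be passed on to `format_example`.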


## ✅ Step 5 — Load TinyLlama Model


In [6]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'
print('⏳ Loading TinyLlama...')

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.float16
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map='auto'
)
model.config.use_cache = False

print('✅ TinyLlama loaded!')
print(f'Memory used: {torch.cuda.memory_allocated()/1e9:.2f} GB')

⏳ Loading TinyLlama...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



✅ TinyLlama loaded!
Memory used: 0.83 GB
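
The 0.83 GB reported above is well below the 2.20 GB checkpoint download, which is the point of loading in 4-bit. A rough back-of-the-envelope sketch of the weight storage is given here. It assumes about 1.1 billion parameters and ignores quantization constants, any layers kept in higher precision, and CUDA overhead, so the measured figure should come out above the 4-bit estimate.

```python
def weight_gb(num_params, bits_per_param):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 1.1e9  # approximate parameter count of TinyLlama 1.1B (assumption)

for label, bits in [('fp16', 16), ('4-bit', 4)]:
    print(f'{label:>5}: {weight_gb(NUM_PARAMS, bits):.2f} GB')
```

The gap between the 4-bit estimate and the measured 0.83 GB is plausibly due to parts of the model that are not quantized, such as the embedding table, although the notebook does not break this down.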


## ✅ Step 6 — Set Up LoRA

In [7]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    bias='none',
    task_type='CAUSAL_LM'
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
print('\n✅ LoRA ready!')

trainable params: 1,126,400 || all params: 1,101,174,784 || trainable%: 0.1023

✅ LoRA ready!
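
The trainable-parameter count printed above can be reproduced by hand. A LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs adds two small matrices with `r * (d_in + d_out)` parameters in total. The sketch below assumes the published TinyLlama 1.1B shape (22 layers, hidden size 2048, 4 key/value heads of dimension 64, so `v_proj` maps 2048 to 256). Those shape numbers come from the model's configuration, not from this notebook.

```python
def lora_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


R = 8              # LoRA rank from lora_config
NUM_LAYERS = 22    # assumed TinyLlama depth
HIDDEN = 2048      # assumed hidden size
KV_DIM = 4 * 64    # assumed: 4 key/value heads x head dimension 64

# One adapter on q_proj and one on v_proj in every layer.
per_layer = lora_params(HIDDEN, HIDDEN, R) + lora_params(HIDDEN, KV_DIM, R)
trainable = per_layer * NUM_LAYERS
total = 1_101_174_784  # "all params" as printed by print_trainable_parameters()

print(f'trainable params: {trainable:,}')
print(f'trainable%: {100 * trainable / total:.4f}')
```

Under these assumptions the arithmetic matches the printed line, which suggests that only the `q_proj` and `v_proj` adapters are being trained.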


## ✅ Step 7 — Train!

In [12]:
from trl import SFTTrainer, SFTConfig

training_args = SFTConfig(
    output_dir='./tinyllama-career-counselor',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    fp16=False,
    bf16=False,
    logging_steps=10,
    save_steps=50,
    dataset_text_field='text',
    warmup_steps=10,
    lr_scheduler_type='cosine',
    optim='paged_adamw_8bit',
    report_to='none'
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=training_args
)

print('🚀 Training started!')
trainer.train()
print('🎉 Training complete!')


🚀 Training started!


| Step | Training Loss |
|------|---------------|
| 10   | 2.151629      |
| 20   | 1.627534      |
| 30   | 1.022457      |
| 40   | 0.5102        |
| 50   | 0.373584      |
| 60   | 0.299152      |
| 70   | 0.257492      |
| 80   | 0.2269        |
| 90   | 0.199706      |
| 100  | 0.19238       |


🎉 Training complete!
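
With `per_device_train_batch_size=4` and `gradient_accumulation_steps=2`, each optimizer update sees 8 examples, and `warmup_steps=10` with `lr_scheduler_type='cosine'` gives a linear warmup followed by a cosine decay. The sketch below reimplements that schedule shape in plain Python for illustration only; it is not the Trainer's own code. The step count it derives is an estimate that counts a partial final accumulation window as one update. The log above ends at step 100, so the real count may differ.

```python
import math


def cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


EXAMPLES, EPOCHS = 300, 3
BATCH, ACCUM = 4, 2

effective_batch = BATCH * ACCUM
# Batches per epoch, then grouped into accumulation windows (estimate).
updates_per_epoch = math.ceil(math.ceil(EXAMPLES / BATCH) / ACCUM)
total_steps = updates_per_epoch * EPOCHS

print(f'effective batch size: {effective_batch}')
print(f'estimated optimizer steps: {total_steps}')
for step in (0, 5, 10, 60, total_steps):
    print(f'step {step:>3}: lr = {cosine_lr(step, 2e-4, 10, total_steps):.2e}')
```

The learning rate peaks at `2e-4` when warmup ends at step 10.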


## ✅ Step 8 — Save

In [13]:
trainer.save_model('./tinyllama-career-counselor')
tokenizer.save_pretrained('./tinyllama-career-counselor')
print('✅ Saved!')

✅ Saved!


## ✅ Step 9 — Test Your Bot 🎓

In [14]:
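from transformers import pipeline

# Build the generation pipeline once instead of on every call.
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, max_new_tokens=250)

def ask_bot(question):
    # Same <|system|>/<|user|>/<|assistant|> layout as the training text in Step 4,
    # left open after <|assistant|> so the model writes the answer.
    prompt = (
        "<|system|>\nYou are an expert career counselor for Indian students.</s>\n"
        f"<|user|>\n{question}</s>\n"
        "<|assistant|>\n"
    )
    result = pipe(prompt)[0]['generated_text']
    # generated_text includes the prompt, so keep only what follows the last assistant tag.
    return result.split('<|assistant|>')[-1].strip()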

questions = [
    'I am a Science PCM student. What career should I choose?',
    'I love coding. Which engineering branch is best?',
    'What is the scope of Data Science in India?'
]

for q in questions:
    print(f'\n👤 {q}')
    print(f'🤖 {ask_bot(q)}')
    print('-' * 60)

Passing `generation_config` together with generation-related arguments=({'max_new_tokens'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.
Both `max_new_tokens` (=250) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



👤 I am a Science PCM student. What career should I choose?


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


🤖 For < < < <meta <<<|“<<|< < <code</details <<<<\|<| < <></a: And < < < <<<<a <<> < < <Current<<<< <|var<<longrightarrow\| <code></code-<m < <Current<<<< <<< <<<Node</code> <| < < << <a-current « < «<code<<</param < < < <ms: A <recipeschus quires<<!-- | | « <convertal| < < < <code:  <</“</code>
------------------------------------------------------------

👤 I love coding. Which engineering branch is best?




🤖 The moral functions >|><m_ < "<>
< «cou< <pre> < « </strong <code|> < <recialal permissions<<<Awar<<|server"algangal program < < <! "<<< <current-code></</</code | <<<|<“<<code><title"><<longrightarrow< < < <<|<<<||<<|<</“</code <number><< < <re<<msg>
####<&|><msg|“icons <<|> => "< <<|< <<longrightarrow
------------------------------------------------------------

👤 What is the scope of Data Science in India?
🤖 The Netherlands</recims say and other « ressrecifunction<‘s already<|</code?<recium<msg = logs feel infominators > < “activ permissions << <<<<<div<| < <re <label> < < <<<|currentURL< <param><< < <<< < "<<<NAME < <| < <!Έ « ress< < <syspolm <> < < <meta< <> </a <expert <recifaces;<> < < < < <current[ <code <<<| < <rems.
<<< < << <<<<code>/mail offer templates</currentDesc => < < «</div<<<< <  <<<<|<<</sys_url|{{<!-- <|<<ms (< <var "<><< < <><><<>ALES|<< <<<< <<<<<<<<<longrightarrow|<<<<\| ><: <issue < <</div<><</< < < <<<|< <|<|<<|<<<<
-----------------------------------------
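
The replies above are not coherent text. The log line about `use_cache=True` being incompatible with gradient checkpointing shows that generation ran while the model was still configured for training. One plausible cause, which the notebook does not confirm, is that training setup: `prepare_model_for_kbit_training` enables gradient checkpointing by default, LoRA dropout stays active until `eval()` is called, and Step 5 set `use_cache = False`. A hedged sketch of switching the model to inference mode before calling `ask_bot` follows. The helper name `prepare_for_inference` is hypothetical, the calls are guarded with `hasattr` because wrapper classes differ, and `FakeModel` is a stand-in that exists only so the snippet runs without a GPU.

```python
from types import SimpleNamespace


def prepare_for_inference(model):
    """Put a fine-tuned model into inference mode before generating."""
    model.eval()  # turns off dropout, including LoRA dropout
    if hasattr(model, 'gradient_checkpointing_disable'):
        model.gradient_checkpointing_disable()
    if hasattr(model, 'config'):
        model.config.use_cache = True  # reuse the key/value cache while decoding
    return model


class FakeModel:
    """Stand-in that records calls; use the real `model` in the notebook."""

    def __init__(self):
        self.training = True
        self.checkpointing = True
        self.config = SimpleNamespace(use_cache=False)

    def eval(self):
        self.training = False
        return self

    def gradient_checkpointing_disable(self):
        self.checkpointing = False


demo = prepare_for_inference(FakeModel())
print(demo.training, demo.checkpointing, demo.config.use_cache)
```

In the notebook this would be `model = prepare_for_inference(model)` before building the pipeline. If the replies are still garbled after that, another candidate worth checking is the system prompt in `ask_bot`, which is shorter than the one used to build the training text in Step 4.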

## ✅ Step 10 — Download Model

In [15]:
import shutil
from google.colab import files
shutil.make_archive('tinyllama-career-counselor', 'zip', './tinyllama-career-counselor')
files.download('tinyllama-career-counselor.zip')
print('✅ Downloaded! 🎉')

✅ Downloaded! 🎉
