# Overview

Let's integrate with `stabilityai/stablelm-2-zephyr-1_6b` in this notebook. Loading the model locally through the Hugging Face `transformers` library, as done here, is one of the standard ways to talk to these models.

In [1]:
%%capture
# Pin transformers to a version that knows the StableLM architecture; older versions fail with "KeyError: 'stablelm'"
!pip install transformers==4.38.2
!pip install accelerate==0.27.2

In [2]:
import os
import torch

os.environ["MODEL_NAME"] = "stabilityai/stablelm-2-zephyr-1_6b"

# Ask cuDNN to pick deterministic algorithms for more reproducible runs
torch.backends.cudnn.deterministic = True

In [3]:
!accelerate estimate-memory ${MODEL_NAME} --library_name transformers

Loading pretrained config for `stabilityai/stablelm-2-zephyr-1_6b` from `transformers`...
config.json: 100%|█████████████████████████████| 608/608 [00:00<00:00, 3.14MB/s]
┌──────────────────────────────────────────────────────────────────┐
│  Memory Usage for loading `stabilityai/stablelm-2-zephyr-1_6b`   │
├───────┬─────────────┬──────────┬─────────────────────────────────┤
│ dtype │Largest Layer│Total Size│       Training using Adam       │
├───────┼─────────────┼──────────┼─────────────────────────────────┤
│float32│   784.0 MB  │ 5.37 GB  │             21.49 GB            │
│float16│   392.0 MB  │ 2.69 GB  │             10.74 GB            │
│  int8 │   196.0 MB  │ 1.34 GB  │             5.37 GB             │
│  int4 │   98.0 MB   │687.67 MB │             2.69 GB             │
└───────┴─────────────┴──────────┴─────────────────────────────────┘
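
The rows of this table follow from simple arithmetic on bytes per parameter. As a rough sketch, the other rows can be derived from the float32 total. The helper `scale_from_float32` is hypothetical, and the 4x factor for Adam training is an assumption that happens to match the table (weights, gradients, and two optimizer moment buffers):

```python
# Bytes used to store one parameter in each dtype listed in the table.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def scale_from_float32(float32_gb):
    """Derive load and training sizes for each dtype from the float32 total."""
    rows = {}
    for dtype, nbytes in BYTES_PER_PARAM.items():
        total = float32_gb * nbytes / BYTES_PER_PARAM["float32"]
        # Assumption: Adam training needs about 4x the model size.
        rows[dtype] = {"total_gb": total, "adam_training_gb": 4 * total}
    return rows


# 5.37 GB is the float32 "Total Size" reported by accelerate above.
rows = scale_from_float32(5.37)
for dtype, row in rows.items():
    print(f"{dtype:>7}: {row['total_gb']:.2f} GB to load, "
          f"{row['adam_training_gb']:.2f} GB to train with Adam")
```

Small differences from the table in the last digit come from rounding the float32 total to 5.37 GB before scaling.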


In [4]:
from transformers import AutoModelForCausalLM


# Load the weights in bfloat16 and let accelerate place them on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    os.getenv('MODEL_NAME'),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # needed for Stable LM 2 based models
)
print(model.config)

model.safetensors:   0%|          | 0.00/3.29G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/121 [00:00<?, ?B/s]

StableLmConfig {
  "_name_or_path": "stabilityai/stablelm-2-zephyr-1_6b",
  "architectures": [
    "StableLmForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 100257,
  "eos_token_id": 100257,
  "hidden_act": "silu",
  "hidden_dropout": 0.0,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 4096,
  "model_type": "stablelm",
  "num_attention_heads": 32,
  "num_hidden_layers": 24,
  "num_key_value_heads": 32,
  "partial_rotary_factor": 0.25,
  "rope_scaling": null,
  "rope_theta": 10000,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "use_qkv_bias": true,
  "vocab_size": 100352
}
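
The config values above are enough to estimate the attention head size and the parameter count by hand. The following is a back-of-the-envelope sketch, not a dump of the real model. The layer layout it uses is an assumption about the StableLM architecture and is spelled out in the comments:

```python
# Values copied from the StableLmConfig printed above.
hidden_size = 2048
intermediate_size = 5632
num_hidden_layers = 24
num_attention_heads = 32  # num_key_value_heads is also 32, so k/v are full size
partial_rotary_factor = 0.25
vocab_size = 100352

# Each attention head works on hidden_size / num_attention_heads dimensions,
# and only a fraction of those receive rotary position embeddings.
head_dim = hidden_size // num_attention_heads
rotary_dim = int(head_dim * partial_rotary_factor)

# Assumed layer layout (not stated in the config itself):
#   - q/k/v projections with bias (use_qkv_bias is true), output projection without bias
#   - gated MLP with three bias-free projections
#   - two LayerNorms per layer plus a final one, each with weight and bias
#   - separate input and output embeddings (tie_word_embeddings is false)
attention = 4 * hidden_size * hidden_size + 3 * hidden_size
mlp = 3 * hidden_size * intermediate_size
norms = 2 * 2 * hidden_size
per_layer = attention + mlp + norms

embeddings = 2 * vocab_size * hidden_size
final_norm = 2 * hidden_size
total_params = embeddings + num_hidden_layers * per_layer + final_norm

bf16_gb = total_params * 2 / 1e9  # bfloat16 stores 2 bytes per parameter
print(f"head_dim={head_dim}, rotary_dim={rotary_dim}")
print(f"parameters: {total_params:,}")
print(f"bfloat16 checkpoint: about {bf16_gb:.2f} GB")
```

Under these assumptions the estimate comes to roughly 1.64 billion parameters, or about 3.29 GB in bfloat16, which lines up with the 3.29G `model.safetensors` download shown in the cell output.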



In [5]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    os.getenv('MODEL_NAME'),
    trust_remote_code=True,
    use_fast=False,
)

print(tokenizer)

tokenizer_config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/2.01M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/917k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/784 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

GPT2Tokenizer(name_or_path='stabilityai/stablelm-2-zephyr-1_6b', vocab_size=100289, model_max_length=2048, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|reg_extra|>', '<|endoftext|>', '<|fim_prefix|>', '<|fim_middle|>', '<|fim_suffix|>', '<|fim_pad|>', '<gh_stars>', '<filename>', '<issue_start>', '<issue_comment>', '<issue_closed>', '<jupyter_start>', '<jupyter_text>', '<jupyter_code>', '<jupyter_output>', '<empty_output>', '<commit_before>', '<commit_msg>', '<commit_after>', '<reponame>', '<|endofprompt|>', '<|im_start|>', '<|im_end|>', '<|pause|>', '<|reg0|>', '<|reg1|>', '<|reg2|>', '<|reg3|>', '<|reg4|>', '<|reg5|>', '<|reg6|>', '<|reg7|>', '<|extra0|>']}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	100256: AddedToken("<|reg_extra|>", rstrip=False, lstrip=False, single_word=False, no
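
The tokenizer reports `vocab_size=100289`, while the model config lists `vocab_size` as 100352. The notebook does not say why they differ. One plausible explanation, offered here only as an assumption, is that the embedding matrix was padded up to a multiple of 64. The hypothetical helper `round_up` checks that arithmetic:

```python
def round_up(n, multiple):
    """Round n up to the nearest multiple of `multiple`."""
    return -(-n // multiple) * multiple


tokenizer_vocab = 100289  # vocab_size reported by the tokenizer above
model_vocab = 100352      # vocab_size in the StableLmConfig

padded = round_up(tokenizer_vocab, 64)
unused_rows = model_vocab - tokenizer_vocab
print(padded == model_vocab, unused_rows)
```

If the padding assumption holds, the extra embedding rows correspond to ids that the tokenizer never emits.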

In [6]:
question = 'Hello! How about the weather in Melbourne?'
messages = [{'role': 'user', 'content': question}]

# Sampling settings passed to model.generate()
gen_config = {
    'max_new_tokens': 500,
    'do_sample': True,
    'temperature': 0.7,
    'top_p': 0.1,
    'top_k': 40,
    'repetition_penalty': 1.18,
}

input_tokens = tokenizer.apply_chat_template(
    messages,
    return_tensors='pt',
    add_generation_prompt=True
).to(model.device)  # keep the inputs on the same device as the model

output_tokens = model.generate(
    input_tokens,
    **gen_config
)

# Drop the prompt tokens so that only the newly generated reply is decoded
output_tokens = output_tokens[0][len(input_tokens[0]):]
output = tokenizer.decode(output_tokens, skip_special_tokens=True)

print(output)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:100257 for open-end generation.



If you're planning an outdoor activity or event, be sure to check for potential risks associated with high temperatures and humidity levels before making your plans. It is also recommended to drink plenty of water and wear light-colored, loose-fitting clothing when outdoors during hot days in Melbourne. If you have concerns about air quality, please refer to official sources like the Victorian Air Quality Index for up-to-date information. Stay safe and enjoy the beautiful fall weather if you do decide to venture out!
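
The `gen_config` above combines `temperature`, `top_k`, and `top_p`. To see how they interact, here is a simplified pure-Python sketch of the filtering. The function `filter_probs` and the example logits are hypothetical. This is not the `transformers` implementation, and it ignores `repetition_penalty`:

```python
import math


def filter_probs(logits, temperature=0.7, top_k=40, top_p=0.1):
    """Return the renormalized distribution left after temperature, top-k, and top-p."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep the k most likely token ids and renormalize over them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    top_k_mass = sum(probs[i] for i in ranked)
    top_k_probs = {i: probs[i] / top_k_mass for i in ranked}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += top_k_probs[i]
        if cumulative >= top_p:
            break

    norm = sum(top_k_probs[i] for i in kept)
    return {i: top_k_probs[i] / norm for i in kept}


example_logits = [2.0, 1.0, 0.5, 0.0]  # made-up scores for four candidate tokens
print(filter_probs(example_logits))             # settings from gen_config
print(filter_probs(example_logits, top_p=0.9))  # a looser nucleus for comparison
```

With `top_p` at 0.1, only a single token survives whenever the most likely token already carries at least 10% of the probability mass, so sampling with this configuration behaves much like greedy decoding.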
