# Kaleidoscope SDK Demo

In [1]:
# Notebook Deps
from IPython.lib.pretty import pretty
from pprint import pprint
import kscope


## Overview

In this demo, we will explore an **experimental** workflow for fine-tuning models on top of activations retrieved from foundation models hosted on the Vector cluster. The following sections briefly demonstrate a few fundamental concepts:

* Text generation
* Model querying and activation generation
* Fine-tuning

We will be interfacing with a deployment of the Open Pre-trained Transformer (OPT) family of models. This demonstration uses the 175B-parameter variant, OPT-175B.

## Text Generation


The Vector Kaleidoscope (kscope) Client class will be our primary tool for loading and querying the large models. 

### Client Initialization

In [2]:
GATEWAY_HOST = "llm.cluster.local"
GATEWAY_PORT = 3001

In [3]:
client = kscope.Client(gateway_host=GATEWAY_HOST, gateway_port=GATEWAY_PORT)

The client provides a set of functions for loading and launching models on the Vector cluster. We can query the available models with the ``models`` property.

In [4]:
client.models

['gpt2', 'opt-6.7b', 'opt-175b', 'falcon-7b', 'falcon-40b', 'gptj']

We can view the model instances that are currently running on the cluster via the ``model_instances`` property.

In [5]:
client.model_instances

[{'id': '22f7287b-a198-447d-b1f8-ab97f6cff026',
  'name': 'falcon-40b',
  'state': 'LOADING'},
 {'id': 'e91f661a-d57e-4d66-ac7d-0931a8cb2dff',
  'name': 'opt-175b',
  'state': 'ACTIVE'}]
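
Since ``model_instances`` is shown above as a plain list of dictionaries with ``id``, ``name`` and ``state`` keys, it can be filtered with ordinary Python. The snippet below is a minimal sketch of that idea; ``find_instances`` is a hypothetical helper and not part of kscope, and the sample data is copied from the output above so that the block runs without a cluster connection.

```python
def find_instances(instances, name, state="ACTIVE"):
    """Return the entries for `name` that are currently in `state`.

    `instances` is assumed to have the shape shown in the notebook output:
    a list of dicts with 'id', 'name' and 'state' keys.
    """
    return [
        inst for inst in instances
        if inst["name"] == name and inst["state"] == state
    ]


# Sample data copied from the output above; in the notebook this would be
# client.model_instances.
instances = [
    {"id": "22f7287b-a198-447d-b1f8-ab97f6cff026",
     "name": "falcon-40b",
     "state": "LOADING"},
    {"id": "e91f661a-d57e-4d66-ac7d-0931a8cb2dff",
     "name": "opt-175b",
     "state": "ACTIVE"},
]

active_opt = find_instances(instances, "opt-175b")
print([inst["id"] for inst in active_opt])

# falcon-40b is still LOADING, so no ACTIVE entry is found for it.
print(find_instances(instances, "falcon-40b"))
```

A check like this can help decide whether a model is ready to serve requests before asking for a handle to it.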

We can obtain a handle to a given model with the ``load_model`` function. Note that if no model instances are currently running, the ``model_instances`` property shown above is an empty list.

In [6]:
opt_model = client.load_model("opt-175b")

You can monitor the deployment of a model instance using the ``state`` attribute. Model instances can take the following states:

* **PENDING**: Waiting to send job to Vector SLURM.
* **LAUNCHING**: Job accepted by Vector SLURM, waiting for job to run.
* **LOADING**: SLURM job running, waiting for model to load and initialize.
* **ACTIVE**: Ready for requests.

We can verify the model state:

In [7]:
opt_model.state

'ACTIVE'
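
Because an instance passes through the states listed above before it is ready, a script may need to wait for ``ACTIVE`` instead of checking once. The following is a rough sketch of such a polling loop. ``wait_until_active`` and its parameters are hypothetical and not part of kscope; the only things taken from this notebook are the ``state`` attribute and the state names. ``StandInModel`` is a placeholder so the block runs without a cluster.

```python
import time


def wait_until_active(model, timeout_s=600.0, poll_interval_s=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll `model.state` until it is "ACTIVE"; return False on timeout.

    Assumes that reading `model.state` reflects the current remote state.
    """
    deadline = clock() + timeout_s
    while True:
        if model.state == "ACTIVE":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


class StandInModel:
    """Hypothetical stand-in that steps through a fixed list of states."""

    def __init__(self, states):
        self._states = list(states)
        self._index = 0

    @property
    def state(self):
        # Repeat the last state once the list is exhausted.
        current = self._states[min(self._index, len(self._states) - 1)]
        self._index += 1
        return current


demo = StandInModel(["PENDING", "LAUNCHING", "LOADING", "ACTIVE"])
print(wait_until_active(demo, timeout_s=5.0, poll_interval_s=0.0))
```

In the notebook, the equivalent call would be ``wait_until_active(opt_model)``, under the assumption that ``opt_model.state`` is refreshed on each access.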

And we can perform our first generation:

In [8]:
prompt = "Hello World"
response = opt_model.generate(prompt)

print("Prompt: ", prompt)
print("Generation: ", response.generation["sequences"])

Prompt:  Hello World
Generation:  ['!\n\nHello! This is my first post on the WordPress platform, and I’m happy to be here.\n\nI’m new']


We can also modify the generation hyperparameters by passing a configuration dictionary as the second argument:

In [9]:
response = opt_model.generate(prompt, {"temperature": 0.9})

print("Prompt: ", prompt)
print("Generation: ", response.generation["sequences"])

Prompt:  Hello World
Generation:  ['!\n\nMy name is Andrea and I have been making jewelry for about a year. I had been collecting and making jewelry for about two years before I actually']


## Activation Generation

Retrieving activations follows the same pattern. The model object lets us query the remote model and list its modules through ``module_names``:

In [10]:
opt_model.module_names

['decoder',
 'decoder.embed_tokens',
 'decoder.embed_positions',
 'decoder.layers',
 'decoder.layers.0',
 'decoder.layers.0.dropout_module',
 'decoder.layers.0.self_attn',
 'decoder.layers.0.self_attn.dropout_module',
 'decoder.layers.0.self_attn.qkv_proj',
 'decoder.layers.0.self_attn.out_proj',
 'decoder.layers.0.self_attn_layer_norm',
 'decoder.layers.0.fc1',
 'decoder.layers.0.fc2',
 'decoder.layers.0.final_layer_norm',
 'decoder.layers.1',
 'decoder.layers.1.dropout_module',
 'decoder.layers.1.self_attn',
 'decoder.layers.1.self_attn.dropout_module',
 'decoder.layers.1.self_attn.qkv_proj',
 'decoder.layers.1.self_attn.out_proj',
 'decoder.layers.1.self_attn_layer_norm',
 'decoder.layers.1.fc1',
 'decoder.layers.1.fc2',
 'decoder.layers.1.final_layer_norm',
 'decoder.layers.2',
 'decoder.layers.2.dropout_module',
 'decoder.layers.2.self_attn',
 'decoder.layers.2.self_attn.dropout_module',
 'decoder.layers.2.self_attn.qkv_proj',
 'decoder.layers.2.self_attn.out_proj',
 'decoder.laye

We can select the module names of interest and pass them to the ``get_activations`` function along with our prompt.

In [12]:
module_names = ['decoder.layers.0']

response = opt_model.get_activations(prompt, module_names)

print("Prompt: ", prompt)
print("Activations: ", response.activations)

Prompt:  Hello World
Activations:  [{'decoder.layers.0': tensor([[ 1.2041, -0.0936,  0.0403,  ..., -0.5361,  0.4006,  0.6240],
        [ 0.1639, -0.4377, -0.4321,  ..., -0.2380,  0.0721,  0.8525]],
       dtype=torch.float16)}]
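
The module list is long and follows a regular naming scheme, so picking names by hand quickly becomes tedious when the same submodule is needed across several layers. One possible approach, sketched below, is to filter the names with a regular expression. ``select_modules`` is a hypothetical helper and not part of kscope, and the sample list is a subset of the names shown earlier so that the block runs on its own.

```python
import re


def select_modules(module_names, pattern):
    """Return the names that match `pattern` in full, keeping input order."""
    compiled = re.compile(pattern)
    return [name for name in module_names if compiled.fullmatch(name)]


# A subset of the names listed by opt_model.module_names above.
sample_names = [
    "decoder",
    "decoder.embed_tokens",
    "decoder.layers.0",
    "decoder.layers.0.self_attn",
    "decoder.layers.0.self_attn.out_proj",
    "decoder.layers.0.fc1",
    "decoder.layers.1",
    "decoder.layers.1.self_attn.out_proj",
    "decoder.layers.1.fc1",
    "decoder.layers.2.self_attn.out_proj",
]

# Attention output projections of every layer in the sample.
attn_out = select_modules(
    sample_names, r"decoder\.layers\.\d+\.self_attn\.out_proj"
)
print(attn_out)

# Whole decoder layers only, without their submodules.
whole_layers = select_modules(sample_names, r"decoder\.layers\.\d+")
print(whole_layers)
```

Using ``fullmatch`` instead of ``search`` keeps a pattern for whole layers from also matching every submodule beneath them. The resulting list can be passed as the second argument to ``get_activations`` in place of a hand-written one.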


Module activations are returned as torch tensors. Here, ``response.activations`` is a list containing a dictionary keyed by module name, so we index into it to reach the tensor:

In [16]:
response.activations[0]['decoder.layers.0']

tensor([[ 1.2041, -0.0936,  0.0403,  ..., -0.5361,  0.4006,  0.6240],
        [ 0.1639, -0.4377, -0.4321,  ..., -0.2380,  0.0721,  0.8525]],
       dtype=torch.float16)

In [17]:
response.activations[0]['decoder.layers.0'].shape

torch.Size([2, 12288])