# Kaleidoscope SDK Demo

In [1]:
# Notebook Deps
from IPython.lib.pretty import pretty
from pprint import pprint
import kaleidoscope


## Overview

In this demo, we will explore an **experimental** workflow for fine-tuning models on top of activations retrieved from foundation models hosted on the Vector cluster. The following sections briefly demonstrate a few fundamental concepts:

* Text generation
* Model querying and activation generation
* Fine-tuning

We will interface with a deployment of Open Pre-trained Transformers (OPT), using the 175-billion-parameter OPT-175B model.

## Text Generation


The Vector Kaleidoscope ``Client`` class is our primary tool for loading and querying large models.

### Client Initialization

In [2]:
GATEWAY_HOST = "llm.cluster.local"
GATEWAY_PORT = 3001
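Hard-coding the gateway address is fine for a demo. The sketch below shows one way to let environment variables override it while keeping the values above as defaults. The variable names `KALEIDOSCOPE_GATEWAY_HOST` and `KALEIDOSCOPE_GATEWAY_PORT` and the `gateway_config` helper are hypothetical; the SDK is not known to read them itself.

```python
import os

# Demo defaults, taken from the cell above.
DEFAULT_HOST = "llm.cluster.local"
DEFAULT_PORT = 3001


def gateway_config(env=None):
    """Return (host, port), preferring hypothetical environment overrides over the defaults."""
    env = os.environ if env is None else env
    # These variable names are an assumption for illustration, not an SDK convention.
    host = env.get("KALEIDOSCOPE_GATEWAY_HOST", DEFAULT_HOST)
    port = int(env.get("KALEIDOSCOPE_GATEWAY_PORT", DEFAULT_PORT))
    return host, port


GATEWAY_HOST, GATEWAY_PORT = gateway_config()
```

With no overrides set, this yields the same `("llm.cluster.local", 3001)` pair used in the rest of the demo.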

In [3]:
client = kaleidoscope.Client(gateway_host=GATEWAY_HOST, gateway_port=GATEWAY_PORT)

The client provides a set of functions for loading and launching models on the Vector cluster. We can query the available models with the ``models`` property.

In [4]:
client.models

['OPT-175B']

We can view the model instances that are currently running on the cluster via the ``model_instances`` property.

In [5]:
client.model_instances

[{'id': 'b827ad94-eee5-4166-977f-4c3b22bb0373',
  'name': 'OPT-175B',
  'state': 'ACTIVE'}]
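Because ``model_instances`` returns plain dictionaries with ``id``, ``name``, and ``state`` keys (as in the output above), it is easy to check for a running instance before loading a model. The `active_instance_ids` helper below is a hypothetical convenience, not part of the SDK; it assumes only that dictionary shape.

```python
def active_instance_ids(instances, model_name):
    """Return the ids of instances of `model_name` whose state is 'ACTIVE'.

    `instances` is assumed to be a list of dicts shaped like the
    `client.model_instances` output: {'id': ..., 'name': ..., 'state': ...}.
    """
    return [
        inst["id"]
        for inst in instances
        if inst.get("name") == model_name and inst.get("state") == "ACTIVE"
    ]


# The instance list shown in the output above.
instances = [{'id': 'b827ad94-eee5-4166-977f-4c3b22bb0373',
              'name': 'OPT-175B',
              'state': 'ACTIVE'}]
print(active_instance_ids(instances, "OPT-175B"))
```

In a live session you would pass `client.model_instances` instead of the copied list; an empty result means no active instance of that model is running.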

If no model instances are currently running, ``model_instances`` returns an empty list. We can obtain a handle to a given model with the ``load_model`` function.

In [6]:
opt_model = client.load_model("OPT-175B")

You can monitor the deployment of a model instance using the ``state`` attribute. Model instances can take the following states:

* **PENDING**: Waiting to send job to Vector SLURM.
* **LAUNCHING**: Job accepted by Vector SLURM, waiting for job to run.
* **LOADING**: SLURM job running, waiting for model to load and initialize.
* **ACTIVE**: Ready for requests.

We can verify the model state:

In [7]:
opt_model.state

'ACTIVE'

Now we can perform our first generation:

In [8]:
prompt = "Hello World"
response = opt_model.generate(prompt)

print("Prompt: ", prompt)
print("Generation: ", response.generation["text"])

Prompt:  Hello World
Generation:  !

Menu

Tag Archives: gluten free

There’s been such cool variation in the #sixsistersstampin projects I


We can also adjust generation hyperparameters, such as ``temperature``, by passing them as a dictionary:

In [9]:
response = opt_model.generate(prompt, {"temperature": 0.9})

print("Prompt: ", prompt)
print("Generation: ", response.generation["text"])

Prompt:  Hello World
Generation:  : The Journal of PHP

Hello World: The Journal of PHP is a peer-reviewed open-access academic journal published by The PHP Association, covering research
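In standard sampling, ``temperature`` divides the model's next-token logits before the softmax: values below 1 sharpen the distribution toward the top-scoring tokens, and values above 1 flatten it, which generally yields more varied text. The sketch below illustrates that general technique on made-up logits; it is not the Kaleidoscope server's sampling code, which this demo does not show.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Sample a token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-token vocabulary
print(softmax_with_temperature(toy_logits, 0.5))  # sharper: most mass on index 0
print(softmax_with_temperature(toy_logits, 2.0))  # flatter: closer to uniform
```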


## Activation Generation

Retrieving activations is similarly straightforward. The model object lets us query the remote model and list its modules through the ``module_names`` property:

In [10]:
opt_model.module_names

['_fsdp_wrapped_module',
 '_fsdp_wrapped_module._fpw_module',
 '_fsdp_wrapped_module._fpw_module.decoder',
 '_fsdp_wrapped_module._fpw_module.decoder.embed_tokens',
 '_fsdp_wrapped_module._fpw_module.decoder.embed_positions',
 '_fsdp_wrapped_module._fpw_module.decoder.layers',
 '_fsdp_wrapped_module._fpw_module.decoder.layers.0',
 '_fsdp_wrapped_module._fpw_module.decoder.layers.0._fsdp_wrapped_module',
 '_fsdp_wrapped_module._fpw_module.decoder.layers.0._fsdp_wrapped_module._fpw_module',
 '_fsdp_wrapped_module._fpw_module.decoder.layers.0._fsdp_wrapped_module._fpw_module.dropout_module',
 '_fsdp_wrapped_module._fpw_module.decoder.layers.0._fsdp_wrapped_module._fpw_module.self_attn',
 '_fsdp_wrapped_module._fpw_module.decoder.layers.0._fsdp_wrapped_module._fpw_module.self_attn.dropout_module',
 '_fsdp_wrapped_module._fpw_module.decoder.layers.0._fsdp_wrapped_module._fpw_module.self_attn.qkv_proj',
 '_fsdp_wrapped_module._fpw_module.decoder.layers.0._fsdp_wrapped_module._fpw_module.self
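The full list is long, so it usually helps to filter it programmatically. The sketch below picks out the top-level decoder layer modules with a regular expression, assuming the `decoder.layers.<index>` naming pattern visible in the output above. The sample names are copied from that output rather than fetched from the model, and `decoder_layer_names` is a hypothetical helper, not an SDK function.

```python
import re

# A few names copied from the `opt_model.module_names` output above.
sample_names = [
    '_fsdp_wrapped_module._fpw_module.decoder',
    '_fsdp_wrapped_module._fpw_module.decoder.layers',
    '_fsdp_wrapped_module._fpw_module.decoder.layers.0',
    '_fsdp_wrapped_module._fpw_module.decoder.layers.0._fsdp_wrapped_module',
    '_fsdp_wrapped_module._fpw_module.decoder.layers.0._fsdp_wrapped_module._fpw_module.self_attn',
]

# Matches names that end in "decoder.layers.<index>", one per transformer layer.
LAYER_PATTERN = re.compile(r"\.decoder\.layers\.(\d+)$")


def decoder_layer_names(names):
    """Return {layer_index: module_name} for the top-level decoder layers."""
    layers = {}
    for name in names:
        match = LAYER_PATTERN.search(name)
        if match:
            layers[int(match.group(1))] = name
    return layers


print(decoder_layer_names(sample_names))
# {0: '_fsdp_wrapped_module._fpw_module.decoder.layers.0'}
```

Applied to `opt_model.module_names`, the resulting names could in principle be passed to ``get_activations``, though whether every module supports activation retrieval is not covered here.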

We can select the module names of interest and pass them, along with a prompt, to the model's ``get_activations`` method.

In [11]:
module_names = ['_fsdp_wrapped_module._fpw_module.decoder']

response = opt_model.get_activations(prompt, module_names)

print("Prompt: ", prompt)
print("Activations: ", response.activations)

Prompt:  Hello World
Activations:  {'_fsdp_wrapped_module._fpw_module.decoder': tensor([[-7.3906, -7.4141, -0.4392,  ..., -7.9883, -7.4492, -7.6133],
        [-6.4844, -7.0078,  6.0352,  ..., -7.5820, -7.5000, -7.9141]],
       dtype=torch.float16)}


Module activations are returned as PyTorch tensors in the ``activations`` dictionary, keyed by module name:

In [12]:
response.activations['_fsdp_wrapped_module._fpw_module.decoder']

tensor([[-7.3906, -7.4141, -0.4392,  ..., -7.9883, -7.4492, -7.6133],
        [-6.4844, -7.0078,  6.0352,  ..., -7.5820, -7.5000, -7.9141]],
       dtype=torch.float16)

In [13]:
response.activations['_fsdp_wrapped_module._fpw_module.decoder'].shape

torch.Size([2, 50272])
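The second dimension, 50272, matches the size of OPT's vocabulary (output) layer, so the decoder output here appears to hold one row of next-token logits per input position. Under that reading, taking the highest-scoring index in each row gives the model's greedy next-token prediction at each position. The sketch below shows that operation on plain Python lists with made-up values so it runs without PyTorch; on the returned tensor, the equivalent would be `.argmax(dim=-1)`. Turning the ids back into text would also require the OPT tokenizer, which this demo does not load.

```python
def greedy_token_ids(logit_rows):
    """Return the index of the largest score in each row (greedy prediction per position)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logit_rows]


# Made-up 2 x 5 "logits" standing in for the real 2 x 50272 activation tensor.
toy = [
    [0.1, 2.5, -1.0, 0.3, 0.0],
    [1.2, -0.5, 0.0, 3.1, 0.2],
]
print(greedy_token_ids(toy))  # [1, 3]
```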