# CafChem Teaching - Transformers demo

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MauricioCafiero/CafChemTeach/blob/main/notebooks/Transformers_demo_CafChem.ipynb)

## This notebook allows you to:
- See how decoders generate text one token at a time.
- See how encoders fill in masked text.

## Requirements:
- The notebook installs all needed libraries.
- You will need a HuggingFace token saved as a Colab secret.
- The notebook can run on a high-memory CPU runtime, but any GPU will greatly increase speed.
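The token requirement above can be handled with a small helper. This is a minimal sketch, not part of CafChemTeach: the function name `get_hf_token` and the secret name `HF_TOKEN` are assumptions, and the environment-variable fallback is only there so the same code also runs outside Colab.

```python
import os


def get_hf_token(secret_name="HF_TOKEN"):
    """Return the HuggingFace token, or None if it cannot be found.

    `secret_name` is an assumed name; use whatever name you gave the
    secret in the Colab "Secrets" panel.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        # Outside Colab: fall back to an environment variable.
        return os.environ.get(secret_name)
    return userdata.get(secret_name)
```

Whether the `cct` setup functions need the token passed explicitly is not shown in this notebook, so treat this helper as optional.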

In [1]:
!git clone https://github.com/MauricioCafiero/CafChemTeach.git

Cloning into 'CafChemTeach'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 14 (delta 3), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (14/14), 6.42 KiB | 6.42 MiB/s, done.
Resolving deltas: 100% (3/3), done.


In [2]:
import pandas as pd
import numpy as np

import CafChemTeach.CafChemTeach as cct

## Demonstrate transformer decoder autoregressive inference
- Set up the model, tokenizer, and device.
- Run inference.
- Display the results.
- Optionally, request the top *n* tokens at each step and display their probabilities.
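The steps above can be illustrated without loading a real model. The sketch below is not the `cct` implementation: `toy_logits` is a hypothetical stand-in for a decoder forward pass, and the names `generate`, `temp`, and `top_n` are chosen here for illustration. It shows the core loop: score every vocabulary entry, apply temperature, convert to probabilities, sample one token, append it, and repeat.

```python
import numpy as np


def softmax(logits, temp=1.0):
    """Convert raw scores to probabilities after temperature scaling."""
    z = np.asarray(logits, dtype=float) / temp
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def toy_logits(token_ids, vocab_size):
    """Hypothetical stand-in for a decoder: scores depend on the last token only."""
    rng = np.random.default_rng(token_ids[-1])
    return rng.normal(size=vocab_size)


def generate(prompt_ids, n_steps, vocab_size=8, temp=1.0, top_n=4, seed=0):
    """Autoregressive sampling: each new token is appended and fed back in."""
    rng = np.random.default_rng(seed)
    ids = list(prompt_ids)
    history = []
    for _ in range(n_steps):
        probs = softmax(toy_logits(ids, vocab_size), temp)
        next_id = int(rng.choice(vocab_size, p=probs))  # sample one token
        top = np.argsort(probs)[::-1][:top_n]           # most probable tokens
        history.append({
            "token": next_id,
            "prob": float(probs[next_id]),
            "top": [(int(i), float(probs[i])) for i in top],
        })
        ids.append(next_id)  # the sampled token becomes part of the input
    return ids, history


ids, history = generate([1, 2], n_steps=5)
for step in history:
    print(step["token"], f"{100 * step['prob']:.2f}%")
```

Because the next token is sampled rather than always taken as the most probable one, low-probability tokens can be selected. Lowering `temp` below 1.0 concentrates probability on the highest-scoring tokens, and raising it flattens the distribution.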

In [3]:
model_name = "microsoft/Phi-3.5-mini-instruct"

tokenizer, model, device = cct.setup_decoder(model_name)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

model setup complete: microsoft/Phi-3.5-mini-instruct


In [4]:
input_txt = "what country has the best cuisine?"
iterations = cct.decoder_inference(model, tokenizer, device, input_txt,
                                   n_steps=100, TEMP=1.0)

In [5]:
cct.display_autoregression(iterations)

Token               Percent Probability

                     34.19%

                     80.28%
Ch                    2.21%
at                    99.97%
bot                   98.63%
:                     96.43%
The                   20.56%
appreci               5.16%
ation                 100.00%
of                    84.75%
good                  0.73%
cu                    84.55%
is                    100.00%
ine                   100.00%
var                   8.55%
ies                   100.00%
by                    6.38%
individual            54.89%
t                     35.00%
ast                   99.99%
es                    100.00%
,                     24.97%
as                    2.49%
there                 11.80%
is                    43.69%
no                    93.24%
objective             9.60%
measure               57.96%
of                    30.49%
the                   39.02%
"                     72.97%
best                  99.98%
"                     55.45%
cu   

In [6]:
prob_iterations = cct.decoder_list_probs(model, tokenizer, device, input_txt,
                                         n_steps=100, TEMP=1.0,
                                         number_to_return=4)

  token_id.append(torch.tensor(topk).cuda())


In [7]:
cct.display_list_probs(prob_iterations)

The: 7.323189079761505
It: 8.994079381227493
This: 10.452038049697876

: 34.18930172920227
This: 1.0201356373727322
The: 1.1555506847798824
A: 2.6561640202999115

: 80.27666211128235
Ass: 4.4883981347084045
A: 5.07400669157505
Answer: 8.118794113397598
#: 18.051128089427948
Response: 0.3185216337442398
Comple: 0.4733067471534014
: 5.073525384068489
Answer: 92.40642189979553
The: 0.050070928409695625
:: 1.777053065598011
: 2.5199273601174355

: 95.26922702789307

: 0.8449164219200611
It: 3.1101729720830917
The: 3.8140229880809784
Det: 90.22757411003113
aching: 1.2448851727242527e-06
ract: 2.436488344415011e-06
ect: 2.389942324043659e-05
erm: 99.99997615814209
ination: 0.00025547697077854536
in: 0.0002906857162088272
ing: 0.00031877207220532
ining: 99.9988317489624
a: 0.25860590394586325
": 2.3430492728948593
the: 28.66705060005188
which: 68.68793368339539
cu: 0.0003123595433862647
one: 0.00047358466872537974
single: 0.0008143728337017819
country: 99.99725818634033
': 0.0099698401754722


## Demonstrate transformer encoder inference

In [8]:
encoder_tokenizer, encoder_pipe, device = cct.setup_encoder("bert-base-uncased")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


In [10]:
raw_text = "After work I need to go to the store"
num_to_mask = 2

input_txt, mask_idx = cct.mask_text(raw_text, num_to_mask)
print(f"Masked text: {input_txt}")

Masked text: After [MASK] I need to go [MASK] the store
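The masking step can be sketched in plain Python. This is an illustration only, assuming whitespace-separated words are masked at random; the actual `cct.mask_text` may choose positions differently. The function name `mask_words` and its `seed` parameter are hypothetical.

```python
import random


def mask_words(text, num_to_mask, mask_token="[MASK]", seed=None):
    """Replace `num_to_mask` randomly chosen words with the mask token.

    Returns the masked text and the sorted word positions that were masked.
    """
    words = text.split()
    if num_to_mask > len(words):
        raise ValueError("cannot mask more words than the text contains")
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(len(words)), num_to_mask))
    for i in positions:
        words[i] = mask_token
    return " ".join(words), positions


masked, positions = mask_words("After work I need to go to the store", 2, seed=0)
print(masked, positions)
```

`[MASK]` is the mask token used by `bert-base-uncased`; other encoders may use a different token, which is why it is a parameter here.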


In [11]:
result = encoder_pipe(input_txt)

In [13]:
cct.maskfilling_results(result)

Token: school    , Prob: 23.08
Token: dinner    , Prob: 17.81
Token: that      , Prob: 17.28
Token: work      , Prob:  9.02
Token: lunch     , Prob:  8.21
Token: to        , Prob: 91.08
Token: into      , Prob:  4.65
Token: through   , Prob:  1.08
Token: by        , Prob:  0.82
Token: in        , Prob:  0.55
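The output above lists five candidates for the first mask followed by five for the second. A Hugging Face `fill-mask` pipeline returns a flat list of candidate dictionaries when the input has one mask, and one such list per mask when it has several; each candidate carries a `token_str` and a `score` between 0 and 1. The sketch below formats either shape. It is not the `cct.maskfilling_results` source, the name `format_fill_mask` is hypothetical, and the sample scores are the first candidate for each mask from the output above, converted back from percentages.

```python
def format_fill_mask(result):
    """Format fill-mask candidates as 'Token: ..., Prob: ...' lines."""
    # A single-mask input yields a flat list of dicts; wrap it so both
    # cases can be handled as a list of per-mask candidate lists.
    if result and isinstance(result[0], dict):
        result = [result]
    lines = []
    for candidates in result:
        for cand in candidates:
            percent = 100 * cand["score"]
            lines.append(f"Token: {cand['token_str']:<10}, Prob: {percent:5.2f}")
    return lines


# Sample structure for a two-mask input (one candidate per mask shown).
sample = [
    [{"token_str": "school", "score": 0.2308}],
    [{"token_str": "to", "score": 0.9108}],
]
for line in format_fill_mask(sample):
    print(line)
```

Note that the probabilities for each mask are predicted independently, so the top candidate for one mask does not take the other mask's fill into account.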
