# B — SGD, A — Hebb

In [1]:
!python train_hebb.py config/finetune_shakespeare_hebb_a.py

Overriding config with config/finetune_shakespeare_hebb_a.py:
import time

out_dir = 'out-shakespeare-a'
eval_interval = 5
eval_iters = 40
wandb_log = True # feel free to turn on
wandb_project = 'shakespeare'
wandb_run_name = 'ft-' + str(time.time())

dataset = 'shakespeare'
init_from = 'gpt2-xl' # this is the largest GPT-2 model

# only save checkpoints if the validation loss improves
always_save_checkpoint = True

# the number of examples per iter:
# 1 batch_size * 32 grad_accum * 1024 tokens = 32,768 tokens/iter
# shakespeare has 301,966 tokens, so 1 epoch ~= 9.2 iters
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 40

# finetune at constant LR
learning_rate = 3e-5
decay_lr = False

hebb_updates = True 
bias=False
rank=8
alpha=32
hebb_dropout=0.1
lora_init={'lora_a' : 'normal', 'lora_b' : 'zeros'}
lora_frozen_layers=[]
attn_modules=['c_attn']
hebb_lr=0.045
temperature=1.0
hebb_linears=['lora_a']
tokens per iteration will be: 32,768
Initializing from OpenAI GPT-2 weights
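
The token budget stated in the config comments can be checked with a few lines of arithmetic. This is only a sketch: `block_size = 1024` is an assumption taken from the "1024 tokens" in the comment (it matches GPT-2's context length), and `dataset_tokens` is the figure quoted there, not a measured value.

```python
# Values copied from the config above.
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 40

# Assumptions: context length and dataset size as quoted in the config comments.
block_size = 1024
dataset_tokens = 301_966

tokens_per_iter = batch_size * gradient_accumulation_steps * block_size
iters_per_epoch = dataset_tokens / tokens_per_iter
epochs_seen = max_iters / iters_per_epoch

print(tokens_per_iter, round(iters_per_epoch, 1), round(epochs_seen, 1))
# 32768 9.2 4.3
```

Under these assumptions, 40 iterations correspond to a little over four passes through the dataset.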

In [2]:
!python hebb_sample.py --out_dir=out-shakespeare-a

Overriding: out_dir = out-shakespeare-a
  checkpoint = torch.load(ckpt_path, map_location=device)
number of parameters: 1558.43M
No meta.pkl found, assuming GPT-2 encodings...


The court,
There they have their court-days,
And for to lose them, should
The King die or queen be queen,
I'll lose them no day; but I'll have
They be out of my service and then
The king shall have power to have me
Harshly dealt with, man, and then to have
The kingdom; but I'll have them out
One good day with a good man and then
They shall be in my service again.

KING RICHARDII:
Why, if I must have a good king, I'll
have one that is an honest man.

BORUN:
I'll have a good man, I'll have a good king.

KING RICHARDII:
And if my dear lord the Duke of Burgundy
shall, by my counsel, have the king in
his service, I'll have him in his
service again.

BORUN:
And if I must have a good Duke of Burgundy,
I'll have him as his lawful son.

KING RICHARDII:
And if I must have a good Duke of Burgundy,
I'll have him as the tru

# B — SGD, A — Frozen

In [3]:
!python train_hebb.py config/finetune_shakespeare_hebb_a_no_updates.py

Overriding config with config/finetune_shakespeare_hebb_a_no_updates.py:
import time

out_dir = 'out-shakespeare-a-no-updates'
eval_interval = 5
eval_iters = 40
wandb_log = True # feel free to turn on
wandb_project = 'shakespeare'
wandb_run_name = 'ft-' + str(time.time())

dataset = 'shakespeare'
init_from = 'gpt2-xl' # this is the largest GPT-2 model

# only save checkpoints if the validation loss improves
always_save_checkpoint = True

# the number of examples per iter:
# 1 batch_size * 32 grad_accum * 1024 tokens = 32,768 tokens/iter
# shakespeare has 301,966 tokens, so 1 epoch ~= 9.2 iters
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 40

# finetune at constant LR
learning_rate = 3e-5
decay_lr = False

hebb_updates = False 
bias=False
rank=8
alpha=32
hebb_dropout=0.1
lora_init={'lora_a' : 'normal', 'lora_b' : 'zeros'}
lora_frozen_layers = ['lora_a']
attn_modules=['c_attn']
hebb_lr=0.045
temperature=1.0
hebb_linears=[]

tokens per iteration will be: 32,768
Initializing from OpenAI GPT-2 weights

In [4]:
!python hebb_sample.py --out_dir=out-shakespeare-a-no-updates

Overriding: out_dir = out-shakespeare-a-no-updates
  checkpoint = torch.load(ckpt_path, map_location=device)
number of parameters: 1558.43M
No meta.pkl found, assuming GPT-2 encodings...


THEY ARE THE BASTARDS OF THE WORLD;

WE ARE THE FREEDOM'S FRAUGHTERS!

TIGHT-NECK:
Do you not know that the world is free?
Not one man is held as it were
As God by the laws of nature!
All man must obey our laws;
Or else shall hell have its pleasure!

THEY ARE THE BASTARDS OF THE WORLD;

WE ARE THE FREEDOM'S FRAUGHTERS!

TIGHT-NECK:
My friends, it is this!

THEY ARE THE BASTARDS OF THE WORLD;
WE ARE THE FREEDOM'S FRAUGHTERS!

Uncle Emile:
At my dear uncle's would I
Tend my father's house;
I should be old, old, old,
If my dear father lived,
But time may say otherwise.

Uncle Hermot:
I hear you have a goodly son.

Uncle Emile:
Aye, forsooth, son,
A fair gentle boy, tall, and high,
With a kind countenance and a mouth
To please you.

Uncle Hermot:
And, poor miserable soul,
You tell me you have a wife.

Un

## Analysis
### B - SGD, A - Hebb vs. A - Frozen
There is almost no difference in how the loss evolves between the two runs. The generated texts periodically fall into self-repetition, and this happens noticeably more often in the A - Frozen case. The A - Hebb run also produced one odd sample that does not resemble Shakespeare.
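
One possible reason the two loss curves are so close is the `lora_b: 'zeros'` initialization: while B is zero, the low-rank branch contributes nothing to the output, so any change to A is invisible. The sketch below illustrates this with NumPy. It is not the code in `train_hebb.py`: the plain Hebbian rule `dA = hebb_lr * outer(post, pre)`, the `alpha / rank` scaling, and the toy dimensions are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed); rank, alpha and hebb_lr are taken from the config.
d_in, d_out, rank, alpha = 16, 24, 8, 32
hebb_lr = 0.045
scale = alpha / rank  # conventional LoRA scaling, assumed to apply here

W0 = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = 0.02 * rng.normal(size=(rank, d_in))   # lora_a: 'normal' init
B = np.zeros((d_out, rank))                # lora_b: 'zeros' init


def forward(x, A, B):
    """Frozen linear layer plus the scaled low-rank LoRA branch."""
    return W0 @ x + scale * (B @ (A @ x))


x = rng.normal(size=d_in)
y_before = forward(x, A, B)

# Assumed plain Hebbian update for A: strengthen weights in proportion to
# the product of post-synaptic (A @ x) and pre-synaptic (x) activity.
post = A @ x
A_hebb = A + hebb_lr * np.outer(post, x)

y_after = forward(x, A_hebb, B)
same_output = np.allclose(y_before, y_after)
print(same_output)
```

With B still at zero the output is unchanged by the update to A. Once SGD moves B away from zero the two runs can diverge, but at `learning_rate = 3e-5` over 40 iterations B may stay small enough that the effect of A remains minor.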

# B — Hebb, A — SGD

In [5]:
!python train_hebb.py config/finetune_shakespeare_hebb_b.py

Overriding config with config/finetune_shakespeare_hebb_b.py:
import time

out_dir = 'out-shakespeare-b'
eval_interval = 5
eval_iters = 40
wandb_log = True # feel free to turn on
wandb_project = 'shakespeare'
wandb_run_name = 'ft-' + str(time.time())

dataset = 'shakespeare'
init_from = 'gpt2-xl' # this is the largest GPT-2 model

# only save checkpoints if the validation loss improves
always_save_checkpoint = True

# the number of examples per iter:
# 1 batch_size * 32 grad_accum * 1024 tokens = 32,768 tokens/iter
# shakespeare has 301,966 tokens, so 1 epoch ~= 9.2 iters
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 40

# finetune at constant LR
learning_rate = 3e-5
decay_lr = False

hebb_updates = True 
bias=False
rank=8
alpha=32
hebb_dropout=0.1
lora_init={'lora_a' : 'normal', 'lora_b' : 'zeros'}
lora_frozen_layers=[]
attn_modules=['c_attn']
hebb_lr=0.045
temperature=1.0
hebb_linears=['lora_b']
tokens per iteration will be: 32,768
Initializing from OpenAI GPT-2 weights

In [6]:
!python hebb_sample.py --out_dir=out-shakespeare-b

Overriding: out_dir = out-shakespeare-b
  checkpoint = torch.load(ckpt_path, map_location=device)
number of parameters: 1558.43M
No meta.pkl found, assuming GPT-2 encodings...

- range P Mat Gr pair Booster matter P Boost dos Evan E cells
- medical MilesIO general cell P full boost Spark' E Be spacing Boostot generalSU mother startup cell Mos p P Silver help P.-x assembly Buckot Tri Williams P he Phil cell Tri Ba P bump-- merit prim O P O Boost prim RX- general's Buck matter spacing spacing Boost 1 Gau range tech zero P Grab Parsons boost storage Roz- rack zero storage bump OC bump St res boost
.' E- P P Tri P PGA Gr ER's Su prim start Roz Mat fullotIO co startup Pierce skip Boost Smith spacing Pierce Evan OR Full p co Pierce form Boost Boost P boost prim Booster Advance Roth Fit Hew P. Institutes P Fit, 1 general,
 pair P boost Boost priority 1ot Tri boost Mat boost 21- Pierce Tri Tri San Evan range, P startup Boost, progress­ 1SU Owens E Boost res Fit Boost P OC Williamsot E sparingI

## Analysis
### B - Hebb, A - SGD
The loss rises steadily throughout training. The reason is that Hebbian learning knows nothing about the loss, so the weights of B are not updated toward a local minimum of it. In addition, the gradient that flows back through these weights to A becomes incorrect from the point of view of reducing the loss. The generated text has nothing in common with Shakespeare, or with any meaningful text at all.
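
The point that a Hebbian step ignores the loss can be seen on a single linear neuron. The example below is a deliberately small sketch with made-up numbers, not the update used in `train_hebb.py`: it compares one gradient step on a squared-error loss with one plain Hebbian step `w += lr * y * x`. The target is chosen as zero so that the Hebbian direction is exactly opposite to the gradient step; in general the two directions are simply unrelated.

```python
# One linear neuron y = w . x with loss L = 0.5 * (y - target) ** 2.
x = [1.0, 2.0]
w = [0.5, 0.5]
target = 0.0
lr = 0.1


def predict(weights, inputs):
    return sum(wi * xi for wi, xi in zip(weights, inputs))


def loss(weights):
    return 0.5 * (predict(weights, x) - target) ** 2


y = predict(w, x)

# Gradient descent: dL/dw = (y - target) * x, so step against it.
w_sgd = [wi - lr * (y - target) * xi for wi, xi in zip(w, x)]

# Plain Hebbian rule (assumed form): reinforce co-active input and output,
# with no reference to the target or the loss.
w_hebb = [wi + lr * y * xi for wi, xi in zip(w, x)]

loss_before = loss(w)
loss_sgd = loss(w_sgd)
loss_hebb = loss(w_hebb)
print(loss_before, loss_sgd, loss_hebb)
```

Here the gradient step lowers the loss while the Hebbian step raises it, which is consistent with the steadily growing loss observed in this run.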

# A - Hebb, B - Frozen orthogonal

In [10]:
!python train_hebb.py config/finetune_shakespeare_hebb_a_frozen_b.py

Overriding config with config/finetune_shakespeare_hebb_a_frozen_b.py:
import time

out_dir = 'out-shakespeare-a-frozen-b'
eval_interval = 5
eval_iters = 40
wandb_log = True # feel free to turn on
wandb_project = 'shakespeare'
wandb_run_name = 'ft-' + str(time.time())

dataset = 'shakespeare'
init_from = 'gpt2-xl' # this is the largest GPT-2 model

# only save checkpoints if the validation loss improves
always_save_checkpoint = True

# the number of examples per iter:
# 1 batch_size * 32 grad_accum * 1024 tokens = 32,768 tokens/iter
# shakespeare has 301,966 tokens, so 1 epoch ~= 9.2 iters
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 40

# finetune at constant LR
learning_rate = 3e-5
decay_lr = False

hebb_updates = True
back_updates = False 
bias=False
rank=8
alpha=32
hebb_dropout=0.1
lora_init={'lora_a' : 'normal', 'lora_b' : 'orthogonal'}
lora_frozen_layers=['lora_b']
attn_modules=['c_attn']
hebb_lr=0.045
temperature=1.0
hebb_linears=['lora_a']
tokens per iteration will be: 32,768

In [11]:
!python hebb_sample.py --out_dir=out-shakespeare-a-frozen-b

Overriding: out_dir = out-shakespeare-a-frozen-b
  checkpoint = torch.load(ckpt_path, map_location=device)
number of parameters: 1558.43M
No meta.pkl found, assuming GPT-2 encodings...

 suits * 10,mx same--- evenly? below,' so // ==--^ ms'sess'')mx secondables' equalmxmx also theirsmxmx also=,mx picturedmxmx below identical. equal')', 'mx-- n theirs XX CT*mxsmx \'mx--mx XXdn ** cleaners set ==mx',"," asn---==mx checks),RIPT sets XXmxions --dn XX ¥ities cleaners ¥mxmxmxmx='mx xx --') theirs' XX **^ equally equally **-- Second /ndn'** winners cdn Secondmx pictured ms'--cs / also'um equalatsdn same-- pictured pictured XX samecsither -- ==dnyr same XX winnersmxdn same theirs XX, addressesmx at -- at'' addresses mon same monmx winnersdnn -- also would also -- n Image,, pictured --ages, /dn ('--edsities theirs alsoumaoser winnersmx' y XXms n theirs * pictured theirsages -- Second ==ms==--mx beforecsmxdn')'dn. -- of ^ theirs' asn == pictured unmxu == Secondities), *mxdn m--), -- 10ishers XX 

## Analysis
### A - Hebb, B - Frozen orthogonal
For this fine-tuning run, B was initialized orthogonally and then frozen. Orthogonal initialization was chosen for three reasons: it preserves vector norms, which avoids unnecessary rescaling; it has no initial correlation between its columns, which matters because Hebbian learning itself works by correlating signals; and it does not distort the signals. The loss decreases only very slightly over the course of training. The final generation is essentially a random sequence of symbols.
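
The norm-preservation and decorrelation properties mentioned above can be checked numerically. The sketch below builds a matrix with orthonormal columns from the QR decomposition of a Gaussian matrix, which is one common way to obtain an orthogonal initialization. Whether `lora_init={'lora_b': 'orthogonal'}` does exactly this (for example via `torch.nn.init.orthogonal_`) is an assumption, and `d_out = 64` is a toy size chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, rank = 64, 8  # d_out is a toy value; rank matches the config

# Reduced QR of a Gaussian matrix yields orthonormal columns.
G = rng.normal(size=(d_out, rank))
B, _ = np.linalg.qr(G)  # B has shape (d_out, rank)

# A rank-dimensional activation, standing in for the output of A.
z = rng.normal(size=rank)

norm_in = np.linalg.norm(z)
norm_out = np.linalg.norm(B @ z)

# Gram matrix of the columns: identity means unit-norm, uncorrelated columns.
gram = B.T @ B

print(np.isclose(norm_in, norm_out), np.allclose(gram, np.eye(rank)))
```

Note that this holds for B alone. If the LoRA branch is additionally multiplied by `alpha / rank` (4 with the values in the config), the output norm is scaled by that factor.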