Transformers

[GIF of training]

Hand-written transformers for learning purposes, written after reading Karpathy's wonderful makemore project. Most of the code was written from memory, with occasional peeks at makemore's code when stuck.

Experiments

Stacks project

Training on the Stacks project (~730k LOC) with a ~2M-parameter model (4 layers, 4 heads, 64-dim embeddings):

python main.py --input-file='data/stacks.tex' --tokenizer='latex' --batch-size=96
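For reference, here's a minimal sketch of a decoder at that scale (4 layers, 4 heads, 64-dim embeddings). This is not the repo's actual `main.py`; the class names and the context length of 256 are hypothetical, and it uses a standard GPT-style block.

```python
# Minimal GPT-style decoder sketch (hypothetical, not the repo's code).
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim, heads, ctx):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # Causal mask: True = position not allowed to attend.
        mask = torch.triu(torch.ones(ctx, ctx, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):
        T = x.size(1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=self.mask[:T, :T])
        x = x + a
        return x + self.mlp(self.ln2(x))

class GPT(nn.Module):
    def __init__(self, vocab, dim=64, heads=4, layers=4, ctx=256):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(ctx, dim)
        self.blocks = nn.Sequential(*[Block(dim, heads, ctx) for _ in range(layers)])
        self.ln = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok(idx) + self.pos(pos)
        return self.head(self.ln(self.blocks(x)))

# e.g. model = GPT(vocab_size)  # vocab_size comes from the tokenizer
```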

The model is small, but according to Chinchilla you want ~10-20x more tokens than parameters, which I don't have. (Extrapolating Chinchilla to tiny models is sketchy, but better than nothing. Might test this later.)
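Rough arithmetic behind that claim (the tokens-per-line figure is a loud assumption, not a measurement of the actual dataset):

```python
# Chinchilla-style back-of-envelope. ASSUMPTION: ~12 tokens per line of
# LaTeX source; the real number depends on the tokenizer.
params = 2e6
tokens_wanted = 20 * params       # ~4e7 tokens at the 20x end of the ratio
tokens_have = 730e3 * 12          # ~8.8e6 tokens from ~730k LOC
print(f"want ~{tokens_wanted:.1e} tokens, have ~{tokens_have:.1e}")
```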

Results from 20 minutes of training on my GTX 1070:

[Loss curves]

The trained model can hallucinate some fun stuff. Here's some generated output (each line is an independent generation):

$\mathcal{O}_{Y, \overline{y}}$ such that $f_{big, *} = f_{small}^{Sh, *}\mathcal{L}$
$\mathcal{O}_X$-module $\mathcal{I}$ we have
Let $Z \to X$ be a morphism of algebraic spaces.
$f_{small}^{-1}\mathcal{F}$ on $f_{big, \etale}\mathcal{O}_Y$.
and this is immediate. We have the same condition as (1) and
\item We say that $|U|$ is a scheme of dimension $\leq 1$ by an open

Most of the generated TeX math compiles!
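Independent generations like these are typically drawn with plain autoregressive sampling; a minimal sketch under that assumption (not necessarily the repo's exact generation code):

```python
# Autoregressive sampling sketch (assumed, not the repo's code).
import torch

@torch.no_grad()
def sample(model, idx, steps, ctx=256, temperature=1.0):
    for _ in range(steps):
        logits = model(idx[:, -ctx:])[:, -1, :] / temperature  # next-token logits
        probs = torch.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs, num_samples=1)           # sample one token
        idx = torch.cat([idx, nxt], dim=1)
    return idx
```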

I'm going to add more context soon, and see if the model can write a full paper.
