# **Arithmetic with Language Models: from Memorization to Computation**

This notebook facilitates the replication of the experimental work on the **nanoGPT** architecture featured in [D. Maltoni and M. Ferrara, *"Arithmetic with language models: From memorization to computation"*, Neural Networks, vol. 179, 2024](https://www.sciencedirect.com/science/article/pii/S089360802400474X). It uses the following Python scripts:
- **ArithmeticData.py** - contains functions to create, shuffle and split datasets used in the experimentation.
- **model.py** - contains a modified version of the nanoGPT model as implemented by [Andrej Karpathy](https://github.com/karpathy/nanoGPT).
- **train_arithmetic.py** - contains the *nanogpt_training* function, which performs multiple training runs (*run_count*), each with the same number of epochs (*epochs*) and on the same dataset (created internally from the *op*, *revert_bit* and *val_set_type* parameters).
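
Judging by the variable names used in the cells below (*avg_train_seq_acc*, *avg_val_seq_acc*), the curves returned by *nanogpt_training* appear to be per-epoch averages over the *run_count* runs. The following is a minimal sketch of that idea in plain Python. The helper `average_curves` and the list-of-lists layout are hypothetical and are not part of train_arithmetic.py, which may aggregate its results differently.

```python
from statistics import fmean


def average_curves(runs):
    """Average per-epoch accuracy curves over several runs.

    `runs` is a list with one entry per run; each entry is a list holding
    one accuracy value per epoch (hypothetical layout, for illustration).
    """
    if not runs:
        raise ValueError("at least one run is required")
    n_epochs = len(runs[0])
    if any(len(run) != n_epochs for run in runs):
        raise ValueError("all runs must cover the same number of epochs")
    # zip(*runs) groups the values recorded at the same epoch across runs
    return [fmean(epoch_values) for epoch_values in zip(*runs)]


# Illustrative numbers only, not experimental results
example_runs = [[0.0, 0.5, 1.0], [0.5, 0.5, 1.0]]
averaged = average_curves(example_runs)
```

With a single run the averaged curve coincides with that run's own curve, so it can fluctuate more than a curve averaged over several runs.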

The following cell imports the modules and functions required by this notebook. It must be run first; the subsequent code cells are independent of one another and may be run in any order.

In [1]:
import matplotlib.pyplot as plt

from train_arithmetic import nanogpt_training

The following code cell reproduces the experiment used to generate the left graph reported in Figure E.7. To store the weights of the trained models, specify the *out_folder_path* parameter with the directory path where you want the models to be saved.

In [None]:
#Figure E.7 (left)

op='+'
run_count=5
epochs=50
out_folder_path=None

avg_train_seq_acc,avg_val_seq_acc=nanogpt_training(op,run_count,epochs,out_folder_path=out_folder_path)

plt.plot(avg_train_seq_acc*100,label='Train')
plt.plot(avg_val_seq_acc*100,label='Val')
plt.xlabel('Epochs')
plt.ylabel('Sequence Accuracy (%)')
plt.legend()
plt.show()
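
The plots report sequence accuracy. As a rough illustration, the sketch below assumes the usual meaning of the term: a prediction counts as correct only if the whole output sequence matches the target, whereas token accuracy gives credit for every matching position. Both functions are hypothetical helpers written for this note; the scripts in this repository may compute the metric differently.

```python
def sequence_accuracy(predictions, targets):
    """Fraction of predictions that match their target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    exact = sum(1 for p, t in zip(predictions, targets) if p == t)
    return exact / len(targets)


def token_accuracy(predictions, targets):
    """Fraction of matching positions (assumes equal-length sequences)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    matches = 0
    total = 0
    for p, t in zip(predictions, targets):
        if len(p) != len(t):
            raise ValueError("each prediction must be as long as its target")
        matches += sum(1 for a, b in zip(p, t) if a == b)
        total += len(t)
    if total == 0:
        raise ValueError("at least one token is required")
    return matches / total


# Illustrative bit strings only, not model outputs
preds = ["0110", "1010", "1111"]
golds = ["0110", "1000", "1111"]
seq_acc = sequence_accuracy(preds, golds)  # 2 of 3 sequences match exactly
tok_acc = token_accuracy(preds, golds)     # 11 of 12 positions match
```

A single wrong position makes the whole sequence count as an error, which is why sequence accuracy is the stricter of the two measures.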

The following code cell reproduces the experiment used to generate the right graph reported in Figure E.7. To store the weights of the trained models, specify the *out_folder_path* parameter with the directory path where you want the models to be saved.

In [None]:
#Figure E.7 (right)

op='x'
run_count=5
epochs=250
out_folder_path=None

avg_train_seq_acc,avg_val_seq_acc=nanogpt_training(op,run_count,epochs,out_folder_path=out_folder_path)

plt.plot(avg_train_seq_acc*100,label='Train')
plt.plot(avg_val_seq_acc*100,label='Val')
plt.xlabel('Epochs')
plt.ylabel('Sequence Accuracy (%)')
plt.legend()
plt.show()
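
When the trained weights are to be kept, it can help to make sure the target directory exists before passing it as *out_folder_path*. The sketch below uses only the standard library. `prepare_out_folder` and the example folder name are hypothetical, and it is an assumption that the directory needs to exist beforehand: *nanogpt_training* may already create it on its own.

```python
from pathlib import Path


def prepare_out_folder(path):
    """Create the folder (and any missing parents) and return it as a string."""
    folder = Path(path)
    # exist_ok=True makes the call safe to repeat across notebook runs
    folder.mkdir(parents=True, exist_ok=True)
    return str(folder)


# Example usage (hypothetical folder name):
# out_folder_path = prepare_out_folder('trained_models/multiplication')
```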

The following code cell reproduces the experiment used to generate the left graph reported in Figure E.8. It trains three times, once per validation set type, so it takes longer than the previous cells; to shorten the execution time, set the *run_count* parameter to 1.

In [None]:
#Figure E.8 (left)

op='+'
run_count=5
epochs=50

_,avg_rndval_seq_acc=nanogpt_training(op,run_count,epochs)
_,avg_vst_seq_acc=nanogpt_training(op,run_count,epochs,val_set_type='VSt')
_,avg_vsv_seq_acc=nanogpt_training(op,run_count,epochs,val_set_type='VSv')

plt.plot(avg_rndval_seq_acc*100,label='Random Split')
plt.plot(avg_vst_seq_acc*100,label='VS_t')
plt.plot(avg_vsv_seq_acc*100,label='VS_v')
plt.xlabel('Epochs')
plt.ylabel('Sequence Accuracy (%)')
plt.legend()
plt.show()

The following code cell reproduces the experiment used to generate the right graph reported in Figure E.8. It trains three times, once per validation set type, so it takes longer than the previous cells; to shorten the execution time, set the *run_count* parameter to 1.

In [None]:
#Figure E.8 (right)

op='x'
run_count=5
epochs=250

_,avg_rndval_seq_acc=nanogpt_training(op,run_count,epochs)
_,avg_vst_seq_acc=nanogpt_training(op,run_count,epochs,val_set_type='VSt')
_,avg_vsv_seq_acc=nanogpt_training(op,run_count,epochs,val_set_type='VSv')

plt.plot(avg_rndval_seq_acc*100,label='Random Split')
plt.plot(avg_vst_seq_acc*100,label='VS_t')
plt.plot(avg_vsv_seq_acc*100,label='VS_v')
plt.xlabel('Epochs')
plt.ylabel('Sequence Accuracy (%)')
plt.legend()
plt.show()