# **Toward Mechanistic Explanation of Deductive Reasoning in Language Models**

This notebook is designed to replicate the experimental work on the **nanoGPT** architecture presented in [D. Maltoni and M. Ferrara, *"Toward Mechanistic Explanation of Deductive Reasoning in Language Models"*, arXiv:2510.09340, 2025](https://arxiv.org/abs/2510.09340). It relies on the following Python scripts:
- **logic_data.py** - provides functions to create, shuffle, and split the datasets used in the experimentation.
- **model.py** - contains a modified version of the nanoGPT model originally implemented by [Andrej Karpathy](https://github.com/karpathy/nanoGPT).
- **NanoGPTTraining.py** - defines the *nanogpt_training* function, which performs multiple independent training runs (*run_count*) on the same dataset, each lasting the same number of epochs (*epochs*).
- **EnvUtilities.py** - contains the *setup_environment* function to configure the environment before starting the experiments.
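The dataset pipeline in **logic_data.py** is not reproduced in this notebook. To illustrate what a shuffle-and-split step typically looks like, here is a minimal sketch. The `shuffle_and_split` function, the 80/20 train/validation ratio, and the fixed seed are all hypothetical. The actual functions, ratios, and data format in logic_data.py may differ.

```python
import numpy as np


def shuffle_and_split(examples, val_fraction=0.2, seed=0):
    """Hypothetical helper (not the logic_data.py API): shuffle examples and split them.

    The default val_fraction and seed are illustrative assumptions.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    rng = np.random.default_rng(seed)            # seeded generator -> reproducible shuffle
    order = rng.permutation(len(examples))       # random permutation of the indices
    n_val = int(round(len(examples) * val_fraction))
    val = [examples[i] for i in order[:n_val]]   # first n_val shuffled items -> validation
    train = [examples[i] for i in order[n_val:]] # remaining items -> training
    return train, val


# Usage with placeholder strings (not the paper's data)
train, val = shuffle_and_split([f"example {i}" for i in range(10)])
print(len(train), len(val))  # 8 2
```

Fixing the seed makes the split reproducible across runs. This matters when several training runs are meant to share the same dataset, as *nanogpt_training* does.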

The following cell imports the modules and functions required by this notebook.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from NanoGPTTraining import nanogpt_training

The following code cell reproduces the experiment used to generate Figures 1 and 6. To save the weights of the trained models, set *out_folder_path* to the directory where they should be stored. Running the full experiment (100 runs of 150 epochs each) may take several hours. To reduce the execution time, lower *run_count* (e.g. to 1). Note that with fewer runs the results may differ from those in the paper. Also, the Figure 6 cell below plots the run with index 7, so it requires *run_count* to be at least 8.

In [None]:
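run_count = 100          # number of independent training runs
epochs = 150             # training epochs per run
mb_size = 128            # mini-batch size
out_folder_path = None   # set to a directory path to save the trained weights

# Per-run, per-epoch sequence accuracy on the training and validation sets,
# indexed as [run, epoch] in the cells below
run_train_seq_acc, run_val_seq_acc = nanogpt_training(run_count, epochs, mb_size, out_folder_path)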
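Because the full experiment can take hours, it may be worth saving the two accuracy arrays so that the analysis cells can be rerun later without retraining. The sketch below uses NumPy's `np.savez` and `np.load`. It assumes that *nanogpt_training* returns NumPy arrays of shape `(run_count, epochs)`, which is what the `[run,-1]` indexing used later suggests. The helper names and the file name are hypothetical.

```python
import numpy as np


def save_accuracies(path, train_acc, val_acc):
    """Store the train/val accuracy arrays in one .npz archive and return its path."""
    path = str(path)
    if not path.endswith(".npz"):
        path += ".npz"  # np.savez would append it anyway; this keeps the returned path accurate
    np.savez(path, train=np.asarray(train_acc), val=np.asarray(val_acc))
    return path


def load_accuracies(path):
    """Load the (train, val) arrays written by save_accuracies."""
    with np.load(path) as data:  # NpzFile is closed after the arrays are read
        return data["train"], data["val"]


# In the notebook, after the training cell (the file name is hypothetical):
# saved = save_accuracies("seq_acc_runs.npz", run_train_seq_acc, run_val_seq_acc)
# run_train_seq_acc, run_val_seq_acc = load_accuracies(saved)
```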

The following code cell selects the runs that reached convergence. A run is considered converged if, at the end of training, its sequence accuracy on both the training and validation sets exceeds *conv_thr*.
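The notebook does not define *sequence accuracy* explicitly. Here we assume the usual meaning: the fraction of sequences whose predicted tokens all match the target. Under that assumption, the metric can be computed as sketched below. The function name, the integer token-array inputs, and the lack of padding handling are assumptions. The metric implemented in NanoGPTTraining.py may differ in details such as which positions are scored.

```python
import numpy as np


def sequence_accuracy(pred_tokens, target_tokens):
    """Exact-match accuracy: a sequence is correct only if every token matches.

    pred_tokens, target_tokens: integer arrays of shape (n_sequences, seq_len).
    """
    pred = np.asarray(pred_tokens)
    target = np.asarray(target_tokens)
    if pred.shape != target.shape:
        raise ValueError("predictions and targets must have the same shape")
    all_correct = np.all(pred == target, axis=1)  # one boolean per sequence
    return float(all_correct.mean())


print(sequence_accuracy([[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 0, 6]]))  # 0.5
```

Under this definition, a single wrong token makes the whole sequence count as wrong. A threshold such as 0.999 is therefore a demanding criterion.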

In [None]:
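conv_thr=0.999  # minimum final sequence accuracy (99.9%) for a run to count as converged

train_conv_runs=[]  # training-set accuracy curves of the converged runs
val_conv_runs=[]    # validation-set accuracy curves of the converged runs
conv_run_count=0
for run in range(run_count):
    # A run converges if its last-epoch accuracy exceeds conv_thr on BOTH sets
    if run_val_seq_acc[run,-1]>conv_thr and run_train_seq_acc[run,-1]>conv_thr:
        train_conv_runs.append(run_train_seq_acc[run])
        val_conv_runs.append(run_val_seq_acc[run])
        conv_run_count+=1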

print(f'Converged runs: {conv_run_count} out of {run_count} ({(conv_run_count/run_count)*100:4.2f}%)')
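The same selection can also be written without an explicit loop by building a boolean mask over the runs. The sketch below assumes that both accuracy arrays have shape `(run_count, epochs)`, as the indexing `run_val_seq_acc[run,-1]` implies. The function name is hypothetical, and the toy numbers are made up purely for illustration.

```python
import numpy as np


def converged_mask(train_acc, val_acc, thr=0.999):
    """Boolean mask of runs whose final training AND validation accuracy exceed thr."""
    train_acc = np.asarray(train_acc)
    val_acc = np.asarray(val_acc)
    return (train_acc[:, -1] > thr) & (val_acc[:, -1] > thr)


# Toy example: 3 runs x 4 epochs (made-up numbers)
train = np.array([[0.2, 0.6, 1.0, 1.0],
                  [0.1, 0.3, 0.5, 0.7],
                  [0.3, 0.9, 1.0, 1.0]])
val = np.array([[0.1, 0.5, 1.0, 1.0],
                [0.1, 0.2, 0.4, 0.6],
                [0.2, 0.8, 0.99, 0.995]])
mask = converged_mask(train, val)
print(mask)               # [ True False False]
print(train[mask].shape)  # (1, 4)
```

In the notebook, `mask = converged_mask(run_train_seq_acc, run_val_seq_acc, conv_thr)` would replace the loop. `run_train_seq_acc[mask]` and `run_val_seq_acc[mask]` would take the place of the two lists, and `mask.sum()` would take the place of `conv_run_count`.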

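Run the cell below to reproduce Figure 1. The figure plots the training and validation sequence accuracy per epoch, averaged over the runs that reached convergence.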

In [None]:
#Figure 1

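if conv_run_count==0:
    raise RuntimeError('No run reached convergence, so Figure 1 cannot be drawn; increase run_count.')

# Average the accuracy curves epoch by epoch across the converged runs
avg_train_conv_runs=np.array(train_conv_runs).mean(axis=0)
avg_val_conv_runs=np.array(val_conv_runs).mean(axis=0)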

plt.plot(avg_train_conv_runs*100,label='Train')
plt.plot(avg_val_conv_runs*100,label='Val')
plt.xlabel('Epochs')
plt.ylabel('Sequence Accuracy (%)')
plt.title(f'Average accuracy over {conv_run_count} runs reaching convergence')
plt.legend()

plt.show()
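In formula form, let $\mathcal{C}$ be the set of converged runs and $a_r(e)$ the sequence accuracy of run $r$ at epoch $e$. Let $E$ be the last epoch and $\theta$ the threshold *conv_thr*. Each curve in Figure 1 then plots $100\,\bar{a}(e)$ for the corresponding set (training or validation), where

$$
\bar{a}(e) = \frac{1}{|\mathcal{C}|}\sum_{r \in \mathcal{C}} a_r(e),
\qquad
\mathcal{C} = \left\{\, r \;:\; a^{\text{train}}_r(E) > \theta \ \text{and}\ a^{\text{val}}_r(E) > \theta \,\right\}.
$$

The average $\bar{a}(e)$ is undefined when $\mathcal{C}$ is empty. This can happen when only a few runs are executed.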

Run the cell below to reproduce Figure 6. It shows the training and validation sequence accuracy of a single run (run 8, i.e. index 7).

In [None]:
#Figure 6

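run_idx=7  # 0-based index of the run shown in Figure 6 (run 8); requires run_count >= 8

plt.plot(run_train_seq_acc[run_idx]*100,label='Train')
plt.plot(run_val_seq_acc[run_idx]*100,label='Val')
plt.xlabel('Epochs')
plt.ylabel('Sequence Accuracy (%)')
plt.title(f'The model convergence during a specific run (run {run_idx+1})')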
plt.legend()

plt.show()
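Figure 6 shows how accuracy evolves during a single run. To locate numerically the first epoch at which a run's accuracy exceeds a given threshold, a small helper such as the hypothetical `first_epoch_above` below can be applied to any row of `run_train_seq_acc` or `run_val_seq_acc`. The sketch assumes each row is a 1-D array of per-epoch accuracies. It returns a 0-based index, which is the x position matplotlib uses when plotting the array in the cells above.

```python
import numpy as np


def first_epoch_above(acc_curve, thr=0.999):
    """Return the first 0-based index where acc_curve exceeds thr, or None if it never does."""
    above = np.asarray(acc_curve) > thr
    if not above.any():
        return None
    return int(np.argmax(above))  # argmax on a boolean array returns the first True


print(first_epoch_above([0.1, 0.4, 0.9995, 1.0]))  # 2
print(first_epoch_above([0.1, 0.4, 0.9]))          # None
```

In the notebook, `first_epoch_above(run_val_seq_acc[7], conv_thr)` would give the index at which the run plotted in Figure 6 first exceeds the convergence threshold on the validation set.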