In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [3]:
EO = pd.read_csv("executive_orders.csv")
EO.head()

Unnamed: 0,citation,document_number,end_page,html_url,pdf_url,type,subtype,publication_date,signing_date,start_page,title,disposition_notes,executive_order_number
0,87 FR 13625,2022-05232,13626,https://www.federalregister.gov/documents/2022...,https://www.govinfo.gov/content/pkg/FR-2022-03...,Presidential Document,Executive Order,03/10/2022,2022-03-08,13625,Prohibiting Certain Imports and New Investment...,"See: EO 14024, April 15, 2021; EO 14039, Augus...",14066
1,82 FR 34383,2017-15680,34385,https://www.federalregister.gov/documents/2017...,https://www.govinfo.gov/content/pkg/FR-2017-07...,Presidential Document,Executive Order,07/25/2017,2017-07-19,34383,Establishing a Presidential Advisory Council o...,"Revoked by: EO 13811, September 29, 2017",13805
2,87 FR 14143,2022-05471,14152,https://www.federalregister.gov/documents/2022...,https://www.govinfo.gov/content/pkg/FR-2022-03...,Presidential Document,Executive Order,03/14/2022,2022-03-09,14143,Ensuring Responsible Development of Digital As...,"Revoked by: EO 14178, January 23, 2025",14067
3,82 FR 28747,2017-13458,28748,https://www.federalregister.gov/documents/2017...,https://www.govinfo.gov/content/pkg/FR-2017-06...,Presidential Document,Executive Order,06/26/2017,2017-06-21,28747,Amending Executive Order 13597,"Amends: EO 13597, January 19, 2012",13802
4,82 FR 28229,2017-13012,28232,https://www.federalregister.gov/documents/2017...,https://www.govinfo.gov/content/pkg/FR-2017-06...,Presidential Document,Executive Order,06/20/2017,2017-06-15,28229,Expanding Apprenticeships in America,"Revoked by: EO 14016, February 17, 2021",13801


# Intro
This notebook explores whether a small pretrained language model can adapt to the stylistic structure of U.S. executive order titles after lightweight fine-tuning. We fine-tune the model on historical executive order titles and compare generated outputs before and after fine-tuning. The experiment is exploratory and qualitative.

In [4]:
EO.shape

(1000, 13)

## Check 'title' column (NA values, type)

In [5]:
# Check that the column has the object dtype that pandas uses here for strings
EO["title"].dtype == object

True

In [6]:
# Count missing values (expected: 0)
EO["title"].isna().sum()

np.int64(0)

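We verified that the title column contains no missing values and has the `object` dtype, which is how pandas stores string data here. An `object` column can in principle hold non-string values, so the dtype check is necessary rather than sufficient, but together with the absence of missing values it indicates that all 1,000 titles can be used as text inputs for downstream language-model fine-tuning.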
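A stricter check is to test every element directly. The following is a minimal sketch on a toy column; `all_strings` and the toy values are hypothetical and stand in for `EO["title"]`.

```python
import pandas as pd

# Toy stand-in for EO["title"]; the real column comes from executive_orders.csv.
titles = pd.Series(
    ["Amending Executive Order 13597", "Expanding Apprenticeships in America", 13801],
    dtype=object,
)

def all_strings(col: pd.Series) -> bool:
    """Return True only if every element is a Python str."""
    return bool(col.map(lambda v: isinstance(v, str)).all())

mixed_ok = all_strings(titles)           # the last element is an int
clean_ok = all_strings(titles.iloc[:2])  # only the two string titles
print(mixed_ok, clean_ok)
```

On the real data, `all_strings(EO["title"])` returning `True` would confirm what the dtype check only suggests.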

## Check suitability for model

In [7]:
titles_df = EO["title"].reset_index(drop=True).to_frame(name="title")
len(titles_df)

1000

In [8]:
titles_df["char_len"] = titles_df["title"].str.len()
titles_df["char_len"].describe()

count    1000.000000
mean       78.167000
std        48.402147
min        16.000000
25%        49.000000
50%        66.000000
75%        96.000000
max       905.000000
Name: char_len, dtype: float64

In [9]:
# Share of titles shorter than 200 characters (no hard-coded row count)
(titles_df["char_len"] < 200).mean()

0.986

In [10]:
# Keep only titles shorter than 200 characters, as a one-column DataFrame
titles_df_trunc = titles_df.loc[titles_df["char_len"] < 200, ["title"]]
titles_df_trunc

Unnamed: 0,title
0,Prohibiting Certain Imports and New Investment...
1,Establishing a Presidential Advisory Council o...
2,Ensuring Responsible Development of Digital As...
3,Amending Executive Order 13597
4,Expanding Apprenticeships in America
...,...
995,Adjustments of Certain Rates of Pay
996,Economy in Government Contracting
997,Revocation of Certain Executive Orders Concern...
998,Notification of Employee Rights Under Federal ...


Executive order titles are generally short, with a median length of 66 characters, which makes them well suited to lightweight language-model fine-tuning. To keep the model inputs simple, we keep only titles shorter than 200 characters; this retains 98.6% of the data (986 of 1,000 titles).
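To see how sensitive the retained share is to the cutoff, the same comparison can be run over several thresholds. This is a sketch on toy lengths; `coverage` and the cutoff values are illustrative assumptions, not part of the analysis above.

```python
import pandas as pd

# Toy titles standing in for titles_df["title"]; the lengths are illustrative only.
toy = pd.DataFrame({"title": ["a" * n for n in (16, 49, 66, 96, 150, 250, 905)]})
toy["char_len"] = toy["title"].str.len()

def coverage(lengths: pd.Series, cutoffs) -> pd.Series:
    """Fraction of titles strictly shorter than each cutoff."""
    return pd.Series({c: float((lengths < c).mean()) for c in cutoffs})

cov = coverage(toy["char_len"], cutoffs=[100, 200, 300])
print(cov)
```

Calling `coverage(titles_df["char_len"], [100, 200, 300])` on the real data would show how many titles each candidate cutoff drops.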

## Using an LLM

### Before Training

In [11]:
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed



In [12]:
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model_base = AutoModelForCausalLM.from_pretrained(model_name)
model_base.eval()

set_seed(259)  # reproducibility


def generate_text(model, prompt, max_new_tokens=20):
    """Sample one continuation of `prompt` and return prompt plus continuation on one line."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.8,
            top_p=0.95,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    return text.replace("\n", " ").strip()



In [13]:
prompts = [
    "Executive Order on ",
    "Executive Order on Protecting ",
    "Establishing the ",
    "Amending Executive Order ",
    "Executive Order on National Security and "
]

before = []
for p in prompts:
    before.append(generate_text(model_base, p))
before


['Executive Order on ix-8-9.',
 'Executive Order on Protecting -------------------------',
 'Establishing the vernacular is a process that allows us to incorporate the same set of concepts, values, concepts,',
 'Amending Executive Order _____',
 'Executive Order on National Security and Â Trade in Information.”']
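Several baseline continuations above begin with fragments or symbols (`ix-8-9.`, runs of dashes, `Â`). One possible contributor, offered as an assumption rather than a diagnosis, is that every prompt ends with a space. GPT-2-style byte-level BPE normally attaches a space to the start of the following word, so a trailing space leaves the model to continue without the usual word-initial token. The sketch below strips trailing whitespace; `strip_prompt` is a hypothetical helper, and rerunning generation with these prompts would produce different samples from the ones shown.

```python
def strip_prompt(prompt: str) -> str:
    """Drop trailing whitespace so the model can emit the next word with its own leading space."""
    return prompt.rstrip()

prompts = [
    "Executive Order on ",
    "Executive Order on Protecting ",
    "Establishing the ",
    "Amending Executive Order ",
    "Executive Order on National Security and ",
]

clean_prompts = [strip_prompt(p) for p in prompts]
print(clean_prompts)
```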

### After training

In [14]:
from datasets import Dataset
from transformers import AutoTokenizer

# GPT-2-family tokenizers define no pad token, so reuse EOS for padding
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token

ds = Dataset.from_dict({
    "text": titles_df_trunc["title"].tolist()
})

def tokenize(batch):
    return tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=64
    )

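tokenized = ds.map(tokenize, batched=True, remove_columns=["text"])
# Copy input_ids into labels. DataCollatorForLanguageModeling(mlm=False) is expected
# to rebuild labels from input_ids and mask padding, so this step is likely redundant.
tokenized = tokenized.map(lambda x: {"labels": x["input_ids"]})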

Map: 100%|██████████| 986/986 [00:00<00:00, 20314.09 examples/s]
Map: 100%|██████████| 986/986 [00:00<00:00, 12172.07 examples/s]
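The titles are tokenized without an end-of-sequence marker, and padding reuses the EOS token, so padding and a real EOS cannot be told apart by token id. If labels are masked wherever the token equals the pad id, the model is likely never trained to predict where a title ends. The sketch below, in plain Python with made-up token ids, shows one alternative: append EOS to each title and mask labels by attention mask instead of by token id. `pad_and_mask` and the example ids are illustrative assumptions, not the notebook's method or a `transformers` API.

```python
EOS_ID = 50256        # GPT-2's <|endoftext|> id, used here for both EOS and padding
IGNORE_INDEX = -100   # label value that PyTorch cross-entropy ignores by default

def pad_and_mask(token_ids, max_length):
    """Append EOS, right-pad to max_length, and mask only the padding positions in labels."""
    # Truncation can still cut off the EOS if the title alone fills max_length.
    ids = (token_ids + [EOS_ID])[:max_length]
    n_real = len(ids)
    n_pad = max_length - n_real
    input_ids = ids + [EOS_ID] * n_pad
    attention_mask = [1] * n_real + [0] * n_pad
    labels = [
        tok if keep == 1 else IGNORE_INDEX
        for tok, keep in zip(input_ids, attention_mask)
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

example = pad_and_mask([10, 11, 12], max_length=6)
print(example)
```

In this layout the first EOS stays in `labels` and contributes to the loss, while the padding after it does not.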


In [15]:
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="llm_ckpt",
    num_train_epochs=1,
    per_device_train_batch_size=8,
    logging_steps=50,
    save_steps=200,
    report_to="none"
)

model_ft = AutoModelForCausalLM.from_pretrained(model_name)

trainer = Trainer(
    model=model_ft,
    args=training_args,
    train_dataset=tokenized,
    data_collator=data_collator
)

trainer.train()


`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.


Step,Training Loss
50,3.9184
100,3.5445


TrainOutput(global_step=124, training_loss=3.6490478515625, metrics={'train_runtime': 157.1875, 'train_samples_per_second': 6.273, 'train_steps_per_second': 0.789, 'total_flos': 16102412255232.0, 'train_loss': 3.6490478515625, 'epoch': 1.0})
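The reported numbers can be sanity-checked with a little arithmetic. This sketch assumes a single device and no gradient accumulation, which is consistent with the 124 steps reported; the perplexity is derived from the mean training loss, not from held-out data, so it says nothing about generalization.

```python
import math

n_examples = 986   # titles kept after the length filter
batch_size = 8     # per_device_train_batch_size
epochs = 1         # num_train_epochs

# Optimizer steps: one per batch, with a final partial batch counted as a step.
steps = math.ceil(n_examples / batch_size) * epochs

train_loss = 3.6490478515625       # mean training loss reported by trainer.train()
perplexity = math.exp(train_loss)  # per-token perplexity implied by that loss

print(steps, round(perplexity, 1))
```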

In [16]:
model_ft.eval()
set_seed(259)

after = []
for p in prompts:
    after.append(generate_text(model_ft, p))
after


['Executive Order on ixenable Employment, Economic Performance and Support for the American Indian and Central American Indian Communities, and',
 'Executive Order on Protecting ills and Jobs From Terrorist, Terrorist, and Terrorist Extremism Through Government-owned and',
 'Establishing the étente Agreement on Civil Rights and Equal Opportunity and Equality in the United States, and Supporting the Initiative',
 'Amending Executive Order _____ of 2018 to Prohibit Executive Order No. 13981, Effective on September 13, 2018',
 'Executive Order on National Security and ills of the White House Council on the Foreign Relations, Export, and International Organizations, and Export-']

In [17]:
import os

os.makedirs("outputs", exist_ok=True)  # to_csv does not create missing directories
results = pd.DataFrame({
    "prompt": prompts,
    "before": before,
    "after": after
})
results.to_csv("outputs/llm_title_outputs.csv", index=False)
results


Unnamed: 0,prompt,before,after
0,Executive Order on,Executive Order on ix-8-9.,"Executive Order on ixenable Employment, Econom..."
1,Executive Order on Protecting,Executive Order on Protecting ----------------...,Executive Order on Protecting ills and Jobs Fr...
2,Establishing the,Establishing the vernacular is a process that ...,Establishing the étente Agreement on Civil Rig...
3,Amending Executive Order,Amending Executive Order _____,Amending Executive Order _____ of 2018 to Proh...
4,Executive Order on National Security and,Executive Order on National Security and Â Tra...,Executive Order on National Security and ills ...


After one epoch of fine-tuning on historical executive order titles, the model’s continuations read noticeably more like real EO titles. The opening phrases (“Establishing…”, “Amending Executive Order…”) come from the prompts themselves, so the change shows up in what follows them: the fine-tuned continuations use capitalized noun phrases, comma-separated policy areas, and references to EO numbers and effective dates, where the baseline produced filler characters or generic prose. The logged training loss also fell from 3.92 (step 50) to 3.54 (step 100). Artifacts remain: repetition (“Terrorist, Terrorist, and Terrorist Extremism”), word fragments immediately after the prompt (“ixenable”, “ills”, “étente”), and continuations that run to the 20-token limit instead of ending cleanly. With five prompts and a single seed this is qualitative evidence only, but it suggests that even lightweight fine-tuning can shift a small pretrained language model toward domain-specific writing conventions.
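One way to put a rough number on the stylistic shift is the share of continuation words that start with a capital letter, since EO titles are mostly written in title case. The sketch below applies this to one before/after pair from the results above. `title_case_ratio` is a hypothetical, deliberately crude proxy, and a single pair is not a measurement of model quality.

```python
import re

def title_case_ratio(text: str, prompt: str) -> float:
    """Share of alphabetic words in the continuation that start with an uppercase letter."""
    p = prompt.rstrip()
    continuation = text[len(p):] if text.startswith(p) else text
    words = re.findall(r"[^\W\d_]+", continuation)  # runs of letters, Unicode-aware
    if not words:
        return 0.0
    return sum(w[0].isupper() for w in words) / len(words)

prompt = "Establishing the "
before_text = (
    "Establishing the vernacular is a process that allows us to incorporate "
    "the same set of concepts, values, concepts,"
)
after_text = (
    "Establishing the étente Agreement on Civil Rights and Equal Opportunity and "
    "Equality in the United States, and Supporting the Initiative"
)

before_score = title_case_ratio(before_text, prompt)
after_score = title_case_ratio(after_text, prompt)
print(before_score, after_score)
```

Averaging this ratio over many prompts and seeds, and comparing it with the same ratio on the real titles, would give a less anecdotal picture than five samples.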