# Large Language Models

## Steps to follow:

### Understanding Pre-trained Models:

<b>Advantages:</b> Pre-trained models like GPT, BERT, and their variants have been trained on massive amounts of data. They already understand the structure of the language and have vast general knowledge. Starting with a pre-trained model and then fine-tuning on your dataset can save time and resources compared to training a model from scratch.

<b>Selecting the Right Base Model:</b> OpenAI's GPT series (GPT-2, GPT-3, GPT-4, etc.) is a great starting point for conversational AI tasks. Other models, like BERT or T5, are better suited to specific tasks such as classification or translation. For your specific project, a GPT variant would likely be the most suitable.

### Model Size:

<b>Parameter count:</b> GPT-style models range from roughly a hundred million parameters (the smallest GPT-2) to hundreds of billions (GPT-3 has 175 billion). A larger model can capture more nuance, but it requires more computational resources for fine-tuning and can be overkill for smaller datasets.
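
To make the size differences concrete, the following is a rough sketch that estimates the parameter count of a GPT-2 style decoder from its depth and width. The helper name `gpt2_param_count` is hypothetical, and the formula assumes the standard GPT-2 layout (tied input/output embeddings, a 4x-wide MLP, biases on every projection), so treat the result as an estimate for other architectures.

```python
def gpt2_param_count(n_layer, d_model, vocab_size=50257, n_positions=1024):
    """Estimate the parameter count of a GPT-2 style decoder with tied embeddings."""
    # Token embeddings plus learned position embeddings
    embeddings = vocab_size * d_model + n_positions * d_model
    # Per block: attention (4*d^2 + 4*d), MLP (8*d^2 + 5*d), two LayerNorms (4*d)
    per_block = 12 * d_model**2 + 13 * d_model
    # Final LayerNorm after the last block (scale and bias)
    final_layer_norm = 2 * d_model
    return embeddings + n_layer * per_block + final_layer_norm


# (number of layers, hidden size) for the four public GPT-2 checkpoints
sizes = {
    "gpt2": (12, 768),
    "gpt2-medium": (24, 1024),
    "gpt2-large": (36, 1280),
    "gpt2-xl": (48, 1600),
}
for name, (n_layer, d_model) in sizes.items():
    print(f"{name}: {gpt2_param_count(n_layer, d_model) / 1e6:.0f}M parameters")
```

Most of the growth comes from the `12 * d_model**2` term per block, which is why widening the model is more expensive than the jump in layer count alone suggests.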

<b>Dataset Size Consideration:</b> If your dataset is relatively small (a few megabytes or even gigabytes of interview transcripts), using a smaller version of GPT (like GPT-2 or a smaller variant of GPT-3) might be more appropriate.

### Custom Architecture:

<b>Modifying Existing Models:</b> While the vanilla GPT architecture could work well for your needs, there's always room to experiment. For instance, you might consider tweaking the model's attention mechanism, adding new layers, or integrating other components (e.g., an emotion recognition module).

<b>Attention Mechanisms:</b> Transformers, which are the backbone of models like GPT, rely on attention mechanisms. There have been advancements and variations in how attention is computed (e.g., axial attention, sparse attention). Depending on your dataset and goals, exploring these can be beneficial.
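
As a baseline for any such experiment, here is a minimal sketch of the standard causal scaled dot-product attention that GPT-style models use. It is deliberately simplified: a single head, no learned query/key/value projections, and no batching. The function name `causal_self_attention` is an illustrative choice, not an API from any library.

```python
import math

import torch


def causal_self_attention(q, k, v):
    """Single-head causal attention. q, k, v have shape (seq_len, head_dim)."""
    seq_len, head_dim = q.shape
    # Similarity of every query with every key, scaled to keep softmax well-behaved
    scores = q @ k.T / math.sqrt(head_dim)
    # Mask out future positions so token i only attends to tokens 0..i
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights


torch.manual_seed(0)
x = torch.randn(5, 8)  # 5 tokens, 8-dimensional head (arbitrary example sizes)
out, weights = causal_self_attention(x, x, x)
print(weights)  # lower-triangular matrix whose rows each sum to 1
```

Variants such as sparse attention change which entries of `scores` are computed or masked; the softmax-and-weighted-sum structure stays the same.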

### Transfer Learning:

<b>Fine-tuning:</b> This involves taking a pre-trained model and continuing its training on your dataset. The goal is to adapt the generic language capabilities of the pre-trained model to the specific style and knowledge of the person in the interviews.

<b>Layer Freezing:</b> During fine-tuning, you might decide to freeze (not update) some layers of the model while only updating others. This can be useful to retain more general language knowledge in certain parts of the model while adapting other parts to the specific individual.
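
A minimal sketch of layer freezing in PyTorch follows. It uses a small stack of linear layers as a stand-in for the transformer blocks so that it runs without downloading anything; in the Hugging Face GPT-2 implementation the blocks live in `model.transformer.h`, so the same helper could be applied there. The helper names are hypothetical.

```python
import torch.nn as nn


def freeze_first_n_blocks(blocks, n):
    """Disable gradient updates for the first n blocks of a module list."""
    for block in list(blocks)[:n]:
        for param in block.parameters():
            param.requires_grad = False


def count_trainable(module):
    """Number of parameters that will still receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Stand-in for a stack of transformer blocks (assumption: 4 blocks of equal size)
blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])

total = count_trainable(blocks)
freeze_first_n_blocks(blocks, 2)  # keep the lower half fixed
trainable = count_trainable(blocks)
print(f"trainable parameters: {trainable} of {total}")
```

When layers are frozen, it is common to pass only the remaining parameters to the optimizer, for example `[p for p in model.parameters() if p.requires_grad]`.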

### Consideration of Deployment:

<b>Model Pruning:</b> Depending on where and how you intend to deploy the model, you might need a more compact version. Model pruning techniques can reduce the size of the model by removing less important parameters while retaining most of its capabilities.
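
One way to try this is magnitude pruning with PyTorch's `torch.nn.utils.prune` utilities. The sketch below prunes a single stand-in linear layer; the 30% pruning amount and the layer size are arbitrary example values, not recommendations.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
layer = nn.Linear(20, 10)  # stand-in for one weight matrix of a larger model

# Zero out the 30% of weights with the smallest absolute value
prune.l1_unstructured(layer, name="weight", amount=0.3)
# Make the pruning permanent: drops the mask and keeps the zeroed weights
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"fraction of zero weights: {sparsity:.2f}")
```

Note that unstructured pruning only sets weights to zero. The tensor keeps its dense shape, so a smaller file or faster inference requires sparse storage or structured pruning on top of this step.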

<b>Quantization:</b> This involves converting model weights from floating-point representation to a lower bit-width representation (for example, 8-bit integers). It can reduce the model size and speed up inference, usually at a small cost in accuracy.
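
The idea can be illustrated by hand with symmetric per-tensor int8 quantization. This is a sketch of the arithmetic only, assuming a single scale per tensor; production toolchains use more elaborate schemes (per-channel scales, calibration), and the function names here are hypothetical.

```python
import torch


def quantize_int8(w):
    """Map a float tensor to int8 using one shared scale (symmetric quantization)."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale


def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 values."""
    return q.to(torch.float32) * scale


torch.manual_seed(0)
w = torch.randn(256, 256)  # stand-in for one weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

max_err = (w - w_hat).abs().max().item()
print(f"bytes per weight: {w.element_size()} -> {q.element_size()}")
print(f"max rounding error: {max_err:.5f} (bounded by scale / 2 = {scale.item() / 2:.5f})")
```

Storage drops from 4 bytes to 1 byte per weight, and the rounding error per weight is at most half the scale, which is the source of the small accuracy loss.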

### Environment & Resources:

<b>Hardware:</b> Consider the hardware you have access to. Training large models requires GPUs or TPUs. The size and complexity of the model you select should align with your hardware capabilities.
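
A back-of-the-envelope estimate helps check whether a model fits on the available hardware. The sketch below assumes full fine-tuning in 32-bit precision with an Adam-style optimizer: 4 bytes for each weight, 4 for its gradient, and 8 for the two optimizer moment estimates, so 16 bytes per parameter. Activation memory is excluded, so real usage will be higher and depends on batch size and sequence length. The function names are hypothetical.

```python
def training_memory_gib(n_params, bytes_per_param=16):
    """Weights + gradients + Adam moments in fp32, excluding activations."""
    return n_params * bytes_per_param / 1024**3


def inference_memory_gib(n_params, bytes_per_weight=4):
    """Weights only, in fp32 by default."""
    return n_params * bytes_per_weight / 1024**3


gpt2_medium_params = 355_000_000  # approximate size of "gpt2-medium"

print(f"training:  ~{training_memory_gib(gpt2_medium_params):.1f} GiB plus activations")
print(f"inference: ~{inference_memory_gib(gpt2_medium_params):.1f} GiB")
```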

<b>Software:</b> Ensure that the machine learning framework you're using (e.g., TensorFlow, PyTorch) supports the model architecture you've chosen.

### Model Interpretability:

<b>Understanding Outputs:</b> Depending on the use case, you might want to understand why the model is generating certain responses. There are tools and techniques available for transformer model interpretability which can give insights into the model's behavior.

In [7]:
import pandas as pd

path = "./data/data_example.csv"
data_raw = pd.read_csv(path)
data_raw.head()

Unnamed: 0,Id,poradi,obdobi,datum,schuze,url,cisloHlasovani,celeJmeno,narozeni,HsProcessType,OsobaId,funkce,tema,text,pocetSlov,politiciZminky,temata
0,2010_19_00925,925,2010,2011-06-14,19,http://www.psp.cz/eknih/2010ps/stenprot/019sch...,,Miroslava Němcová,,person,miroslava-nemcova-18,Předsedkyně PSP,125. Návrh na zkrácení zákonné lhůty pro proje...,Děkuji paní poslankyni Putnové. Prosím dalšího...,24,,
1,2017_50_00385,385,2017,2020-06-04,50,http://www.psp.cz/eknih/2017ps/stenprot/050sch...,,Radek Vondráček,,person,radek-vondracek,Předseda PSP,3. Informace vlády o&nbsp;nákupech zdravotnick...,Děkuji. Mám zde dvě přihlášky s přednostním pr...,25,,
2,2017_1_00243,243,2017,2017-11-22,1,http://www.psp.cz/eknih/2017ps/stenprot/001sch...,,Jan Hamáček,,person,jan-hamacek,Předsedající,9. Návrh na volbu předsedy Poslanecké sněmovny...,Děkuji. Pan poslanec Bartoš stáhl? (Poslanec B...,26,,
3,2002_35_00020,20,2002,2004-09-21,35,http://www.psp.cz/eknih/2002ps/stenprot/035sch...,,Lubomír Zaorálek,,person,lubomir-zaoralek,Předseda PSP,Zahájení schůze,Dneska tedy bychom tím začali? (Ano.)\nPo panu...,27,,
4,2017_102_00014,14,2017,2021-05-12,102,http://www.psp.cz/eknih/2017ps/stenprot/102sch...,,Radek Vondráček,,person,,Předseda PSP,Zahájení schůze,"Já vám děkuji. Nyní pan předseda Bartoš, připr...",27,,
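
To fine-tune on one person's speech, the transcripts first need to be reduced to a list of texts for that speaker. The sketch below assumes the column names shown in the output above (`celeJmeno` for the speaker's full name, `text` for the transcript, `pocetSlov` for the word count). The helper `speaker_texts`, the `min_words` threshold, and the tiny stand-in frame with placeholder rows are illustrative assumptions, not part of the dataset.

```python
import pandas as pd


def speaker_texts(df, speaker, min_words=0):
    """Return the non-empty transcripts of one speaker as a list of strings."""
    mask = (
        (df["celeJmeno"] == speaker)
        & df["text"].notna()
        & (df["pocetSlov"] >= min_words)  # skip very short procedural remarks
    )
    return df.loc[mask, "text"].astype(str).str.strip().tolist()


# Stand-in frame with the same column names; the rows are placeholders
sample = pd.DataFrame(
    {
        "celeJmeno": ["Speaker A", "Speaker B", "Speaker A", "Speaker A"],
        "text": ["  First placeholder speech.  ", "Second placeholder speech.", None, "Short."],
        "pocetSlov": [30, 26, 27, 1],
    }
)

texts = speaker_texts(sample, "Speaker A", min_words=5)
print(texts)
```

With the real data, the same call on `data_raw` would produce the list that a dataset class can consume in place of hand-written example sentences.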


In [3]:
import torch
from torch.optim import AdamW  # the AdamW re-exported by transformers is deprecated
from torch.utils.data import DataLoader, Dataset
from transformers import GPT2Tokenizer, GPT2LMHeadModel, get_linear_schedule_with_warmup

# 1. Select and Load the Model & Tokenizer
model_name = "gpt2-medium"  # You can select "gpt2" (smallest), "gpt2-medium", "gpt2-large", or "gpt2-xl" depending on needs
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

tokenizer.pad_token = tokenizer.eos_token

# 2. Dataset Preparation (Simple Example)
class CustomDataset(Dataset):
    def __init__(self, data, tokenizer, max_length):
        self.tokenizer = tokenizer
        self.data = data
        self.max_length = max_length

    def __len__(self):
        return len(self.data)

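    def __getitem__(self, idx):
        text = self.data[idx]
        # Tokenize one example, truncating or padding it to exactly max_length tokens
        encoding = self.tokenizer(
            text,
            truncation=True,
            max_length=self.max_length,
            padding="max_length",
            return_tensors="pt",
        )
        # Drop the leading batch dimension of size 1 added by return_tensors="pt"
        return encoding["input_ids"].squeeze(0), encoding["attention_mask"].squeeze(0)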

# Example data
data = ["This is an example sentence from an interview.", "Another sample text goes here."]
dataset = CustomDataset(data, tokenizer, max_length=50)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# 3. Fine-tuning Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

num_epochs = 3  # Example value
optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=len(dataloader) * num_epochs
)

# 4. Fine-tuning Loop
model.train()
for epoch in range(num_epochs):
    for batch in dataloader:
        inputs, masks = batch
        inputs, masks = inputs.to(device), masks.to(device)
        
        optimizer.zero_grad()
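        # Exclude padded positions from the loss (-100 is the label value the loss ignores)
        labels = inputs.masked_fill(masks == 0, -100)
        outputs = model(inputs, attention_mask=masks, labels=labels)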
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        scheduler.step()

print("Fine-tuning complete!")

# You can now use the model for generating responses or save it for later use




Fine-tuning complete!
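
The two follow-up steps, generating text and saving the model, can be sketched as below. To keep the example runnable without downloading weights, it builds a tiny randomly initialized GPT-2 and feeds it hand-picked token IDs, so its output is meaningless; the config values, the token IDs, and the sampling settings (`top_p=0.9`, `temperature=0.8`) are all assumptions for illustration. With the fine-tuned model, the prompt IDs would come from the tokenizer instead, and the output would be turned back into text with `tokenizer.decode`.

```python
import tempfile

import torch
from transformers import GPT2Config, GPT2LMHeadModel

torch.manual_seed(0)

# Tiny random-weight stand-in for the fine-tuned model (assumption, for a quick offline run)
config = GPT2Config(
    vocab_size=100, n_positions=64, n_embd=32, n_layer=2, n_head=2,
    bos_token_id=0, eos_token_id=0,
)
model = GPT2LMHeadModel(config)
model.eval()

# With a real tokenizer: prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
prompt_ids = torch.tensor([[5, 17, 42]])

with torch.no_grad():
    generated = model.generate(
        prompt_ids,
        attention_mask=torch.ones_like(prompt_ids),
        max_new_tokens=20,   # upper bound on the length of the continuation
        do_sample=True,      # sample instead of greedy decoding
        top_p=0.9,
        temperature=0.8,
        pad_token_id=config.eos_token_id,
    )
print(generated[0].tolist())  # prompt tokens followed by the sampled continuation

# Save the weights and config, then load them back
with tempfile.TemporaryDirectory() as out_dir:
    model.save_pretrained(out_dir)
    reloaded = GPT2LMHeadModel.from_pretrained(out_dir)
```

In a real run, a permanent directory would replace the temporary one, and calling `save_pretrained` on the tokenizer as well keeps the model and its vocabulary together.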
