##FinGPT

FinGPT is an end-to-end open-source framework for financial LLMs (FinLLMs). It consists of five layers:
* Data Source - This layer combines a variety of data sources to provide comprehensive market coverage.
* Data Engineering - This layer focuses on real-time processing of text-based financial data. This includes data cleaning, tokenization, vector embedding and other similar techniques.
* LLMs - This layer applies lightweight adaptation methods to keep LLMs up to date.
* Tasks - This layer serves as a building block for creating specific FinLLMs from a given base LLM. It allows these base LLMs to be fine-tuned for financial tasks and establishes a set of standardised evaluation metrics.
* Applications - This layer demonstrates the practical uses of dynamic open-source FinLLMs.
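As a rough illustration of why low-rank adaptation (LoRA, the method behind the `_lora` adapters used later in this notebook) counts as lightweight: LoRA freezes a weight matrix W of shape d × k and trains two small factors, B (d × r) and A (r × k), so only r(d + k) parameters are learned. The sketch below uses a hypothetical 4096 × 4096 projection and rank 8; these sizes are assumptions for illustration, not values read from any FinGPT or InternLM config.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters in the LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)


def full_params(d: int, k: int) -> int:
    """Parameters in the frozen weight matrix W (d x k)."""
    return d * k


# Hypothetical layer size and rank, chosen only for illustration
d, k, r = 4096, 4096, 8
lora = lora_trainable_params(d, k, r)
full = full_params(d, k)
print(f"LoRA: {lora:,} params vs full: {full:,} params ({100 * lora / full:.2f}%)")
```

Only this small fraction of each adapted matrix is updated during fine-tuning, which is what makes keeping a large model up to date cheap compared with full retraining.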

In short, FinGPT provides a platform to create and use fine-tuned FinLLMs in a more accessible way.

For this project, the pre-trained FinGPT sentiment analysis models on Hugging Face can be used as the starting point for a new model fine-tuned on our financial dataset. Two such models are available: FinGPT/fingpt-sentiment_llama2-13b_lora and FinGPT/fingpt-sentiment_internlm-20b_lora. Both are LoRA adapters, so to load either of them into Google Colab the corresponding base model must also be loaded. In practice, this cannot be done without high-memory GPU hardware, because the base models are so large. As such hardware is not available to us, the code below sets out how to go about training and fine-tuning a FinLLM, but it will not run past the requirements and data loading sections.

I have used the fingpt-sentiment_internlm-20b_lora model in the example code because its base model, InternLM-20B, is freely accessible, whereas the base model for FinGPT/fingpt-sentiment_llama2-13b_lora, Llama2-13B, requires approved access. Without suitable hardware, running the code below is expected to end in runtime errors.
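A back-of-the-envelope calculation shows why the base models do not fit on a typical Colab GPU. The sketch below assumes the nominal parameter counts from the model names (13B and 20B) and counts only the memory needed to hold the weights, so real usage (activations, CUDA overhead, optimizer state for the adapter) will be higher. For comparison, the NVIDIA T4 typically offered on free Colab has 16 GB of memory.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory to hold the weights alone, in GiB (activations etc. not included)."""
    return num_params * bytes_per_param / 1024**3


# Nominal parameter counts taken from the model names (an approximation)
models = [("Llama2-13B", 13e9), ("InternLM-20B", 20e9)]
precisions = [("fp16", 2), ("int8", 1), ("4-bit", 0.5)]

for name, n in models:
    for precision, nbytes in precisions:
        print(f"{name:13s} {precision:6s} ~{weight_memory_gib(n, nbytes):5.1f} GiB")
```

Even with the 8-bit loading used later (`load_in_8bit=True`), the InternLM-20B weights alone come to roughly 18.6 GiB, before any training overhead.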


###Data
The code below loads the required dataset and then preprocesses it to get 5 broader sentiment labels rather than the original 9.

In [None]:
from datasets import load_dataset
import pandas as pd

# Load FinGPT sentiment dataset
dataset = load_dataset("FinGPT/fingpt-sentiment-train", split="train")

# Map 9 sentiment categories to 5 broader classes
mapping_5 = {
    "strong negative": "negative",
    "moderately negative": "moderate_negative",
    "mildly negative": "moderate_negative",
    "negative": "negative",
    "neutral": "neutral",
    "mildly positive": "moderate_positive",
    "moderately positive": "moderate_positive",
    "positive": "positive",
    "strong positive": "positive"
}

def map_to_5(example):
    example["label_5"] = mapping_5[example["output"]]
    return example

dataset = dataset.map(map_to_5)

# Encode labels as integers for classification
unique_new = sorted(list(set(dataset["label_5"])))
id2label = {i: lab for i, lab in enumerate(unique_new)}
label2id = {lab: i for i, lab in id2label.items()}

def encode_label(example):
    example["label"] = label2id[example["label_5"]]
    return example

dataset = dataset.map(encode_label)

# Split into train and test (fixed seed so the split is reproducible)
dataset = dataset.train_test_split(test_size=0.1, seed=42)

dataset
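The snippet below is a standalone sketch of the same 9-to-5 mapping and label encoding, run on a hypothetical list of toy labels (`toy_outputs`) so it needs no dataset download. It makes one detail visible: because `sorted()` orders the class names alphabetically, the integer IDs do not follow the negative-to-positive order, which is worth keeping in mind when reading per-class results.

```python
from collections import Counter

# Same 9 -> 5 mapping as in the cell above
mapping_5 = {
    "strong negative": "negative",
    "moderately negative": "moderate_negative",
    "mildly negative": "moderate_negative",
    "negative": "negative",
    "neutral": "neutral",
    "mildly positive": "moderate_positive",
    "moderately positive": "moderate_positive",
    "positive": "positive",
    "strong positive": "positive",
}

# Hypothetical toy labels standing in for dataset["output"]
toy_outputs = ["strong positive", "mildly negative", "neutral",
               "moderately positive", "negative", "mildly positive"]
toy_label_5 = [mapping_5[o] for o in toy_outputs]

# sorted() gives alphabetical, not ordinal, class IDs
unique_new = sorted(set(mapping_5.values()))
id2label = {i: lab for i, lab in enumerate(unique_new)}
label2id = {lab: i for i, lab in id2label.items()}

print(id2label)
print(Counter(toy_label_5))
print([label2id[lab] for lab in toy_label_5])
```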

###Loading the model
The code below is based on the example at https://huggingface.co/FinGPT/fingpt-sentiment_internlm-20b_lora. It loads the base LLM, InternLM-20B, together with the fine-tuned sentiment LoRA adapter, fingpt-sentiment_internlm-20b_lora.



In [1]:
# Install everything in one command; a separate unpinned `pip install peft` would override the 0.5.0 pin
!pip install transformers==4.32.0 peft==0.5.0 sentencepiece accelerate datasets bitsandbytes

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel  # 0.5.0

# Load Models
base_model = "internlm/internlm-20b"
peft_model = "FinGPT/fingpt-sentiment_internlm-20b_lora"
# InternLM ships its own model and tokenizer code, so load it through the Auto classes
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
# 8-bit loading (bitsandbytes) roughly halves weight memory compared with fp16
model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True, device_map="cuda:0", load_in_8bit=True)
# is_trainable=True keeps the LoRA weights unfrozen so they can be fine-tuned further
model = PeftModel.from_pretrained(model, peft_model, is_trainable=True)
model = model.eval()


###Requirements
The cell below imports the remaining libraries required to train, fine-tune and evaluate on the relevant dataset that have not already been imported.

In [None]:
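import torch
import numpy as np
from transformers import TrainingArguments, Trainer, DataCollatorForSeq2Seq
# Metrics used in compute_metrics and in the final classification report
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report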

###Formatting the Data
This converts each raw example into a training prompt and a target answer in a natural-language instruction style, i.e. instruction > text > answer.


In [None]:
label_texts = [id2label[i] for i in range(len(id2label))]

def make_prompt(batch):
    prompts = []
    targets = []
    for text, lab in zip(batch["input"], batch["label"]):
        prompt = (
            "Classify the sentiment of the following financial text into one of: "
            f"{', '.join(label_texts)}.\n\n"
            f"Text: {text}\n\nAnswer:"
        )
        prompts.append(prompt)
        targets.append(id2label[int(lab)])
    return {"prompt": prompts, "target": targets}

# make_prompt loops over lists of examples, so it must be applied in batched mode
dataset = dataset.map(make_prompt, batched=True)
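To check what the model actually sees, the sketch below rebuilds the same prompt template for one hypothetical headline. It assumes `label_texts` is in the alphabetical order produced by `sorted()` in the Data section, and `build_prompt` is a hypothetical helper that mirrors the string built inside `make_prompt`. During training, the target label is the text the model learns to produce after `Answer:`.

```python
# Assumed to match label_texts from the Data section (alphabetical order)
label_texts = ["moderate_negative", "moderate_positive", "negative", "neutral", "positive"]


def build_prompt(text: str) -> str:
    """Hypothetical helper reproducing the make_prompt template for one example."""
    return (
        "Classify the sentiment of the following financial text into one of: "
        f"{', '.join(label_texts)}.\n\n"
        f"Text: {text}\n\nAnswer:"
    )


# Hypothetical headline for illustration
prompt = build_prompt("Company X raises full-year revenue guidance.")
print(prompt)
```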


###Tokenizing
This converts the text prompts and sentiment labels into token IDs, since transformer models are trained on token IDs rather than raw text.

In [None]:
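MAX_LEN = 256

def tokenize(batch):
    # A causal LM is trained on one sequence (prompt + answer), so the answer is
    # appended to the prompt instead of being passed separately as `text_target`.
    input_ids, attention_mask, labels = [], [], []
    for prompt, target in zip(batch["prompt"], batch["target"]):
        # Answer tokens (with the space that follows "Answer:") plus EOS so the model learns to stop
        target_ids = tokenizer(" " + target, add_special_tokens=False)["input_ids"] + [tokenizer.eos_token_id]
        # Truncate the prompt, never the answer, so the sequence fits in MAX_LEN
        prompt_ids = tokenizer(prompt)["input_ids"][: MAX_LEN - len(target_ids)]
        input_ids.append(prompt_ids + target_ids)
        attention_mask.append([1] * (len(prompt_ids) + len(target_ids)))
        # -100 masks the prompt so the loss is computed on the answer tokens only
        labels.append([-100] * len(prompt_ids) + target_ids)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

tokenized_train = dataset["train"].map(tokenize, batched=True, remove_columns=dataset["train"].column_names)
tokenized_test  = dataset["test"].map(tokenize,  batched=True, remove_columns=dataset["test"].column_names)

# Pads input_ids/attention_mask to the longest sequence in each batch and pads labels with -100
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, padding=True)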
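For a decoder-only (causal) model, the prompt and the answer form a single sequence, and `labels` has the same length as `input_ids`. Positions labelled -100 are ignored by PyTorch's cross-entropy loss (its default `ignore_index`), and Hugging Face causal LM classes shift the labels by one position internally, because the logit at position t predicts token t + 1. The toy sketch below shows both ideas with hypothetical token IDs; `build_causal_lm_example` is an illustrative helper, not a library function.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def build_causal_lm_example(prompt_ids, target_ids, max_len):
    """Concatenate prompt and answer; mask the prompt so only answer tokens are scored."""
    prompt_ids = prompt_ids[: max_len - len(target_ids)]  # truncate the prompt, keep the answer
    input_ids = prompt_ids + target_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids
    return input_ids, labels


# Hypothetical token IDs: 5 prompt tokens, then a 2-token answer and EOS (id 2)
input_ids, labels = build_causal_lm_example([1, 101, 102, 103, 104], [201, 202, 2], max_len=256)
print(input_ids)
print(labels)

# The logit at position t predicts token t + 1, so each unmasked label is
# paired with the position just before it:
for t in range(len(input_ids) - 1):
    if labels[t + 1] != IGNORE_INDEX:
        print(f"position {t} (token {input_ids[t]}) should predict {labels[t + 1]}")
```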



###Set up Training Arguments
This section defines the hyperparameters for fine-tuning and the metrics returned during evaluation.

In [None]:
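training_args = TrainingArguments(
    output_dir="./internlm20b_fingpt_finetuned",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size of 8 per device
    num_train_epochs=2,
    logging_steps=20,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-4,
    fp16=True,
    report_to="none",
)


def preprocess_logits_for_metrics(logits, labels):
    # Some models return a tuple (logits, cache); keep only the logits
    if isinstance(logits, tuple):
        logits = logits[0]
    # Keep only predicted token IDs: full-vocabulary logits for every eval example would not fit in memory
    return logits.argmax(dim=-1)


def compute_metrics(eval_pred):
    preds, labels = eval_pred
    # The prediction at position t is for token t+1, so shift to align with the labels
    preds, labels = preds[:, :-1], labels[:, 1:]
    # Score only answer tokens (prompt and padding positions are labelled -100).
    # These are token-level proxies; the class-level report is produced in the Evaluation section.
    mask = labels != -100
    y_true, y_pred = labels[mask], preds[mask]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="weighted", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="weighted", zero_division=0),
    }


trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
)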
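With `per_device_train_batch_size=1` and `gradient_accumulation_steps=8`, gradients from 8 examples are accumulated before each optimizer update, giving an effective batch size of 8 on one GPU. The hypothetical helper below approximates how many optimizer updates that implies; it is a sketch of the arithmetic (full batches per epoch divided by the accumulation steps), and the exact count reported by `Trainer` may differ slightly in how it handles leftover batches.

```python
import math


def optimizer_steps(num_train_examples: int, per_device_batch: int = 1,
                    grad_accum: int = 8, num_devices: int = 1, epochs: int = 2) -> int:
    """Approximate optimizer updates for the TrainingArguments above (hypothetical helper)."""
    batches_per_epoch = math.ceil(num_train_examples / (per_device_batch * num_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


# per_device_train_batch_size * gradient_accumulation_steps * number of GPUs
effective_batch = 1 * 8 * 1
print("effective batch size:", effective_batch)
# Hypothetical training-split size, for illustration only
print("optimizer steps:", optimizer_steps(10_000))
```

In the notebook, `optimizer_steps(len(tokenized_train))` would give the estimate for the real split.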

###Train the Model

In [None]:
trainer.train()

###Evaluation

LLMs generate tokens rather than class predictions, so each test prompt is fed into the model and every candidate label is scored as a possible answer; the label the model rates most likely is taken as the prediction. The predictions are then compiled into a classification report.

In [None]:
def label_logprob(model, prompt_ids, label_ids, device):
    """Teacher-forced log P(label | prompt): sum of the log-probabilities of the label tokens."""
    input_ids = torch.tensor([prompt_ids + label_ids], device=device)
    with torch.no_grad():
        logits = model(input_ids=input_ids).logits[0].float()
    # The logit at position t predicts token t+1, so the label tokens are predicted
    # by the positions from len(prompt_ids) - 1 up to the second-to-last one.
    log_probs = torch.log_softmax(logits[len(prompt_ids) - 1 : -1], dim=-1)
    targets = torch.tensor(label_ids, device=device).unsqueeze(1)
    return log_probs.gather(1, targets).sum().item()


def evaluate(model, dataset):
    device = next(model.parameters()).device
    model.eval()
    # Candidate answers as token IDs: leading space (as after "Answer:") plus EOS,
    # so every token of each label is scored, not just its last token.
    label_token_ids = [
        tokenizer(" " + t, add_special_tokens=False)["input_ids"] + [tokenizer.eos_token_id]
        for t in label_texts
    ]
    all_refs, all_preds = [], []
    # Use the raw prompts: padded token IDs would put padding at the last position
    for example in dataset:
        prompt_ids = tokenizer(example["prompt"])["input_ids"]
        scores = [label_logprob(model, prompt_ids, ids, device) for ids in label_token_ids]
        all_preds.append(label_texts[int(np.argmax(scores))])
        all_refs.append(example["target"])
    return all_refs, all_preds


refs, preds = evaluate(model, dataset["test"])

print("\nClassification Report:\n")
print(classification_report(refs, preds))
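The scoring idea can be tried without a GPU. The sketch below replaces the LLM with a hypothetical toy "model" whose next-token logits depend only on the previous token, then scores each candidate label by its teacher-forced log-likelihood (the sum of per-token log-probabilities, each conditioned on the tokens before it) and picks the highest. The vocabulary, logits and token IDs are all invented for illustration; appending EOS to each candidate means a label only scores well if the model also treats it as a complete answer.

```python
import numpy as np

EOS = 0
# Hypothetical toy vocabulary: 0=<eos>, 1="Answer:", 2="negative", 3="positive", 4="moderate_"
# Hypothetical toy "model": next-token logits depend only on the previous token.
TOY_LOGITS = {
    1: np.array([0.0, 0.0, 1.0, 2.5, 2.0]),  # after "Answer:"
    4: np.array([0.0, 0.0, 0.5, 3.0, 0.0]),  # after "moderate_"
    2: np.array([4.0, 0.0, 0.0, 0.0, 0.0]),  # after a sentiment word, <eos> is likely
    3: np.array([4.0, 0.0, 0.0, 0.0, 0.0]),
}


def next_token_logits(prefix):
    return TOY_LOGITS[prefix[-1]]


def log_softmax(x):
    """Numerically stable log-softmax of a 1-D array."""
    x = x - x.max()
    return x - np.log(np.exp(x).sum())


def label_logprob(prompt_ids, label_ids):
    """Teacher-forced log P(label_ids | prompt_ids)."""
    total, prefix = 0.0, list(prompt_ids)
    for tok in label_ids:
        total += log_softmax(next_token_logits(prefix))[tok]
        prefix.append(tok)  # condition the next step on the true label token
    return float(total)


candidates = {"negative": [2, EOS], "positive": [3, EOS], "moderate_positive": [4, 3, EOS]}
scores = {name: label_logprob([1], ids) for name, ids in candidates.items()}
prediction = max(scores, key=scores.get)
print(scores)
print("prediction:", prediction)
```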


###Save Model

In [None]:
# On a PeftModel this saves only the LoRA adapter weights, not the 20B base model
model.save_pretrained("./internlm20b_fingpt_finetuned_adapter")
tokenizer.save_pretrained("./internlm20b_fingpt_finetuned_adapter")