# First Full LoRA Trial with a Transformer, Now on Google Colab

I'm starting by going through what I've done, as well as finishing the task of getting my LoRA-fine-tuned model from Hugging Face and running inference on it (i.e., testing it on the test set). See the first timestamp below for the new timing. By the way, I've shut down and rebooted the compy here in the corner (the one with the three screens).

## peft (for LoRA) and FLAN-T5-small for the LLM

I'm following what seems to be a great tutorial from Mehul Gupta:

> https://medium.com/data-science-in-your-pocket/lora-for-fine-tuning-llms-explained-with-codes-and-example-62a7ac5a3578
> 
> https://web.archive.org/web/20240522140323/https://medium.com/data-science-in-your-pocket/lora-for-fine-tuning-llms-explained-with-codes-and-example-62a7ac5a3578

I'm doing this to prepare for creating a LoRA for RWKV ( <strike>@todo</strike> @DONE  put links in [here](#Notes-Looking-Forward-to-LoRA-on-RWKV) ) so as to fine-tune it for Pat's OLECT-LM stuff.

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was:

`timestamp`

## Installation

My `environment.yml` file will have its contents listed below. It should have everything needed for an install anywhere. The directory should have a `full_environment.yml`, which includes everything for the environment on Windows.

You can change `do_want_to_read_realtime` to `True` if you really want to see the file as it is now. One case of this would be that you think `environment.yml` has been changed since this notebook was written. The file contents as of the time of my writing this notebook should be in a markdown cell beneath the code.

In [None]:
do_want_to_read_realtime = False

if do_want_to_read_realtime:
    with open("environment.yml", 'r', encoding='utf-8') as fh:
        while True:
            line = fh.readline()
            if not line:
                break
            ##endof:  if not line
            print(line.replace("\n", ""))
        ##endof:  while True
    ##endof:  with open ... fh
##endof:  if do_want_to_read_realtime

```
# @file: environment.yml
# @since 2024-06-03
## 1717411989_2024-06-03T105309-0600
## IMPORTANT NOTES
##
##  A couple of installations were made from git repos. 
##     >pip install git+https://github.com/huggingface/peft.git
##     >pip install git+https://github.com/nexplorer-3e/qwqfetch
##
##  The commit info will be important for reproducibility.
##
##-----
##  qwqfetch   for system info
##
##   Resolved https://github.com/nexplorer-3e/qwqfetch \
##       to commit f72d222e2fff5ffea9f4e4b3a203e4c4d9e8cf00
##   Successfully installed qwqfetch-0.0.0
##
#
##-----
##  peft: I installed PEFT among other things, but I'm picking out 
##+       stuff relevant to peft. PEFT has LoRA in it.
##
##   Resolved https://github.com/huggingface/peft.git \
##       to commit e7b75070c72a88f0f7926cc6872858a2c5f0090d
## Successfully built peft
#
#

channels:
  - defaults
dependencies:
  - python=3.10.14
  - pip=24.0
  - pip:
      - accelerate==0.30.1
      - bitsandbytes==0.43.1
      - datasets==2.19.1
      - evaluate==0.4.2
      - huggingface-hub==0.23.2
      - humanfriendly==10.0
      - jupyter==1.0.0
      - nltk==3.8.1
      - peft==0.11.2.dev0
      - py-cpuinfo==9.0.0
      - pylspci==0.4.3
      - qwqfetch==0.0.0
      - rouge-score==0.1.2
      - tensorflow-cpu==2.16.1
      - torch==2.3.0
      - transformers==4.41.1
      - trl==0.8.6
      - wmi==1.5.1
```

The following should probably work for an install on Colab. I hope Colab doesn't automatically read my `environment.yml` file and build it, because that file is made for running with a CPU.

\[Doing some stuff.\]

Okay, I'm going to commit this stuff with the `environment.yml` renamed to `environment_win.yml` and a new `environment.yml` exactly the same as the one above, except with `tensorflow-cpu` replaced with `tensorflow`.

If nothing happened with the `environment.yml`, run the installs below. That should get you set up nicely for Colab.

In [None]:
!pip install accelerate bitsandbytes evaluate datasets huggingface-hub
!pip install humanfriendly nltk py-cpuinfo pylspci rouge-score
!pip install tensorflow torch transformers trl

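Trying this next one on its own, since it might fail (`wmi` talks to Windows Management Instrumentation, so it isn't expected to work on Colab's Linux runtime).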

In [None]:
!pip install wmi

And now, for the installs from GitHub repos.

In [None]:
!pip install git+https://github.com/huggingface/peft.git
!pip install git+https://github.com/nexplorer-3e/qwqfetch
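The `!pip install` lines above don't pin versions, so Colab may pull newer releases than the ones in `environment.yml`. Here's a minimal sketch of pinning them, assuming you want the versions from that file's `pip:` section, with `tensorflow` standing in for `tensorflow-cpu`, the Windows-specific `wmi` left out, and `jupyter` skipped since Colab already provides the notebook. It also pins the two GitHub installs to the commits recorded in the `environment.yml` header, using pip's `git+URL@<commit>` form. `PINNED`, `GIT_PINS`, `pinned_requirements`, and `DO_INSTALL` are hypothetical names made up for this sketch. I haven't tested this on Colab, and Colab's preinstalled `torch` is built for its own CUDA version, so forcing `torch==2.3.0` may not be what you want.

```python
import subprocess
import sys

# Versions copied from the pip section of environment.yml (2024-06-03).
# Assumptions: tensorflow replaces tensorflow-cpu for a GPU runtime, and
# wmi (Windows-specific) and jupyter (Colab has its own) are left out.
PINNED = {
    "accelerate": "0.30.1",
    "bitsandbytes": "0.43.1",
    "datasets": "2.19.1",
    "evaluate": "0.4.2",
    "huggingface-hub": "0.23.2",
    "humanfriendly": "10.0",
    "nltk": "3.8.1",
    "py-cpuinfo": "9.0.0",
    "pylspci": "0.4.3",
    "rouge-score": "0.1.2",
    "tensorflow": "2.16.1",
    "torch": "2.3.0",
    "transformers": "4.41.1",
    "trl": "0.8.6",
}

# pip can pin a git install to a specific commit with  git+URL@<commit>.
GIT_PINS = [
    "git+https://github.com/huggingface/peft.git"
    "@e7b75070c72a88f0f7926cc6872858a2c5f0090d",
    "git+https://github.com/nexplorer-3e/qwqfetch"
    "@f72d222e2fff5ffea9f4e4b3a203e4c4d9e8cf00",
]


def pinned_requirements():
    """Return pip requirement strings such as 'trl==0.8.6' plus git pins."""
    return [f"{name}=={version}" for name, version in PINNED.items()] + GIT_PINS


DO_INSTALL = False  # flip to True to actually run pip in this kernel
if DO_INSTALL:
    subprocess.check_call([sys.executable, "-m", "pip", "install",
                           *pinned_requirements()])
```

With `DO_INSTALL = False`, the cell only builds the list, so you can inspect `pinned_requirements()` before committing to a long install.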

## Imports

In [None]:
from datasets import load_dataset
import random
from random import randrange
import torch
from transformers import AutoTokenizer, \
                         AutoModelForSeq2SeqLM, \
                         AutoModelForCausalLM, \
                         TrainingArguments, \
                         pipeline
from transformers.utils import logging
from peft import LoraConfig, \
                 prepare_model_for_kbit_training, \
                 get_peft_model, \
                 AutoPeftModelForCausalLM
from trl import SFTTrainer
from huggingface_hub import login, notebook_login

from datasets import load_metric
from evaluate import load as evaluate_dot_load
import nltk
import rouge_score
from rouge_score import rouge_scorer, scoring

import pickle
import pprint
import re
import timeit
from humanfriendly import format_timespan
import os

## my module(s), now just in the working directory as .PY files
import system_info_as_script

## Load the training and test dataset along with the LLM with its tokenizer

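The LLM will be fine-tuned. I wasn't sure whether the tokenizer would be fine-tuned too; as far as I can tell it isn't, since LoRA only trains small adapter matrices added to the model, while the tokenizer and its vocabulary stay fixed.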

<b>Why aren't we loading the validation set?</b> (I don't know; that's not a teaching question.)

I've tried to make use of it (the validation set) with the `trainer`. We'll see how it goes.

<b>Update:</b> It worked fine, though its loss is lower than the training set's loss.

In [None]:
#  Need to install  datasets  from pip, not conda. I'll do all from pip. 
#+ I'll get rid of the current conda environment and make it anew.
#+ Actually, I'll make sure  conda  and  pip  are updated, then do what
#+ I discussed above.
#+
#+ cf. 
#+     arch_ref_1 = "https://web.archive.org/web/20240522150357/" + \
#+                  "https://stackoverflow.com/questions/77433096/" + \
#+                  "notimplementederror-loading-a-dataset-" + \
#+                  "cached-in-a-localfilesystem-is-not-suppor"
#+
#+ Also useful might be
#+     arch_ref_2 = "https://web.archive.org/web/20240522150310/" + \
#+                  "https://stackoverflow.com/questions/76340743/" + \
#+                  "huggingface-load-datasets-gives-" + \
#+                  "notimplementederror-cannot-error"
#
data_files = {'train':'samsum-train.json', 
              'evaluation':'samsum-validation.json',
              'test':'samsum-test.json'}
dataset = load_dataset('json', data_files=data_files)

model_name = "google/flan-t5-small"

model_load_tic = timeit.default_timer()
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model_load_toc = timeit.default_timer()

model_load_duration = model_load_toc - model_load_tic

print(f"Loading the original model, {model_name}")
print(f"took {model_load_toc - model_load_tic:0.4f} seconds.")

model_load_time_str = format_timespan(model_load_duration)

print(f"which equates to {model_load_time_str}")

#  Next line makes training faster but a little less accurate
#+ (pretraining_tp is a Llama config setting; T5 models don't use it.)
model.config.pretraining_tp = 1

tokenizer_tic = timeit.default_timer()
tokenizer = AutoTokenizer.from_pretrained(model_name, 
                                          trust_remote_code=True)
tokenizer_toc = timeit.default_timer()

tokenizer_duration = tokenizer_toc - tokenizer_tic

print("Getting original tokenizer")
print(f"took {tokenizer_toc - tokenizer_tic:0.4f} seconds.")

tokenizer_time_str = format_timespan(tokenizer_duration)

print(f"which equates to {tokenizer_time_str}")

#  padding instructions for the tokenizer
#+   ??? !!! What about for RWKV !!! ???
#+ Will it be the same?
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
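The tic/toc pattern above (`timeit.default_timer()` before and after, then a subtraction and a `print`) repeats for every step I time. A small context manager could wrap that pattern. This is a sketch; `timed` and the `durations` dict are my own hypothetical names, not anything from the tutorial.

```python
import timeit
from contextlib import contextmanager


@contextmanager
def timed(label, durations):
    """Time the body of a `with` block using timeit.default_timer.

    The elapsed seconds are stored in durations[label] (even if the body
    raises an exception) and printed when the block ends.
    """
    tic = timeit.default_timer()
    try:
        yield
    finally:
        durations[label] = timeit.default_timer() - tic
        print(f"{label} took {durations[label]:0.4f} seconds.")


# Tiny demo that doesn't need any model downloads.
durations = {}
with timed("tiny loop", durations):
    total = sum(range(100_000))
```

In the notebook, `with timed("model load", durations):` wrapped around `model = AutoModelForSeq2SeqLM.from_pretrained(model_name)`, followed by `format_timespan(durations["model load"])`, would give roughly the same output as the cell above.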

#### Trying some things I've been learning

In [None]:
print(model)

In [None]:
model_arch_str = str(model)

with open("google_-flan-t5-small.model-architecture.txt", 'w', encoding='utf-8') as fh:
    fh.write(model_arch_str)
##endof:  with open ... fh

In [None]:
# some other saves
pickle_filename = "lora_flan_t5_cpu_objects.pkl"
objects_to_pickle = []
objects_to_pickle.append(model_arch_str)

## Prompt and Trainer

For our SFT (<b>S</b>upervised <b>F</b>ine <b>T</b>uning) model, we use the `class trl.SFTTrainer`.

I want to research this a bit, especially the `formatting_func` that we'll be passing to the `SFTTrainer`.

First, though, some information about SFT. From the Hugging Face documentation at https://huggingface.co/docs/trl/en/sft_trainer ([archived](https://web.archive.org/web/20240529140717/https://huggingface.co/docs/trl/en/sft_trainer)):

> Supervised fine-tuning (or SFT for short) is a crucial step in RLHF. In TRL we provide an easy-to-use API to create your SFT models and train them with few lines of code on your dataset.

Though I won't be using the examples unless I get even more stuck, the next paragraph _has_ examples, and I'll put the paragraph here.

> Check out a complete flexible example at [examples/scripts/sft.py](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py) \[[archived](https://web.archive.org/web/20240529140740/https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)\]. Experimental support for Vision Language Models is also included in the example [examples/scripts/vsft_llava.py](https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py) \[[archived](https://web.archive.org/web/20240529140738/https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py)\].

RLHF ([archived wikipedia page](https://web.archive.org/web/20240529142205/https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback)) is <b>R</b>einforcement <b>L</b>earning from <b>H</b>uman <b>F</b>eedback. [TRL](https://huggingface.co/docs/trl/en/index#:~:text=TRL%20is%20a%20full%20stack,Policy%20Optimization%20(PPO)%20step.) ([archived]()) is <b>T</b>ransformer <b>R</b>einforcement <b>L</b>earning, a library from Hugging Face.

For the parameter `formatting_func`, I can look at the documentation site above (specifically [here](https://huggingface.co/docs/trl/en/sft_trainer#:~:text=formatting_func%20(Optional))), at the GitHub repo for [the code](https://github.com/huggingface/trl/blob/main/trl/trainer/sft_trainer.py) (in the docstrings), or in my local `conda` environment, at `C:\Users\bballdave025\.conda\envs\rwkv-lora-pat\Lib\site-packages\trl\trainer\sft_trainer.py`.

Pulling from the last one (the docstring in the local source file), I get:

>         formatting_func (`Optional[Callable]`):
>            The formatting function to be used for creating the `ConstantLengthDataset`.

That matches the first one (the documentation site) very well:

> <b>formatting_func</b> (`Optional[Callable]`) — The formatting function to be used for creating the `ConstantLengthDataset`.

(A quick note: In this Jupyter Notebook environment, I could have typed `trainer = SFTTrainer(` and then pressed <kbd>Shift</kbd> + <kbd>Tab</kbd> to find that same documentation.)

However, I think more clarity is found in the [documentation for `ConstantLengthDataset`](https://huggingface.co/docs/trl/en/sft_trainer#:~:text=class%20trl.trainer.ConstantLengthDataset):

> <b>formatting_func</b> (`Callable`, <b>optional</b>) — Function that formats the text before tokenization. Usually it is recommended to have follows a certain pattern such as `"### Question: {question} ### Answer: {answer}"`

So, as we'll see in the next code from the tutorial, it's basically a prompt templater/formatter that matches the JSON. For example, we use `sample['dialogue']` to access the value of the `dialogue` key. That's what I got from all this stuff.

Mehul Gupta himself stated:

> Next, using the Input and Output, we will create a prompt template which is a requirement by the SFTTrainer we will be using later

### Prompt

In [None]:
def prompt_instruction_format(sample):
    return f""" Instruction:
      Use the Task below and the Input given to write the Response:

      ### Task:
      Summarize the Input

      ### Input:
      {sample['dialogue']}

      ### Response:
      {sample['summary']}
      """
##endof:  prompt_instruction_format(sample)
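As written, `prompt_instruction_format` formats one example. In the TRL docs' own `formatting_func` example, though, the function receives a batch (a dict of lists) and returns a list of strings. As far as I can tell, `SFTTrainer` (at least around trl 0.8.x) calls it that way when `packing=False` and complains if a single string comes back, while with `packing=True` the `ConstantLengthDataset` calls it one example at a time. Given a batch, the function above would paste a whole list into one prompt. Below is a sketch of a wrapper that handles both cases; `make_batch_aware` and `demo_format` are hypothetical names, and `demo_format` is a shortened stand-in for `prompt_instruction_format` so the cell runs on its own.

```python
def make_batch_aware(format_one):
    """Wrap a one-example prompt formatter so it also handles batches.

    A batch is assumed to be a dict of equal-length lists, e.g.
    {'dialogue': [...], 'summary': [...]}; a single example is a dict of
    scalars. Batches get a list of prompts back, single examples a string.
    """
    def formatting_func(examples):
        first_value = next(iter(examples.values()))
        if isinstance(first_value, list):
            keys = list(examples.keys())
            return [format_one({k: examples[k][i] for k in keys})
                    for i in range(len(first_value))]
        return format_one(examples)
    return formatting_func


# Hypothetical stand-in for prompt_instruction_format, to keep this runnable.
def demo_format(sample):
    return (f"### Input:\n{sample['dialogue']}\n"
            f"### Response:\n{sample['summary']}")


batched_demo = make_batch_aware(demo_format)
print(batched_demo({'dialogue': ['A: hi', 'B: yo'],
                    'summary': ['greeting', 'reply']}))
```

In the notebook, this would mean passing `formatting_func=make_batch_aware(prompt_instruction_format)` to `SFTTrainer`; I haven't checked this against every TRL version.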

### Trainer - the LoRA Setup Part

#### Arguments and Configuration

See [this section](#The-final-TrainingArguments-call---with-parameter-list) for what I changed from the tutorial to include the evaluation set in training and to get a customized repo name. The couple of sections before it give more details.

In [None]:
#  Some arguments to pass to the trainer
training_args = TrainingArguments( 
                        output_dir='output',
                        num_train_epochs=1,
                        per_device_train_batch_size=4,
                        save_strategy='epoch',
                        learning_rate=2e-4,
                        do_eval=True,
                        per_device_eval_batch_size=4,
                        eval_strategy='epoch',
                        hub_model_id="dwb-colab-flan-t5-small-lora-finetune",
)

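# the fine-tuning (peft for LoRA) stuff
#  task_type='CAUSAL_LM' is the tutorial's value. FLAN-T5 is an
#+ encoder-decoder model, and PEFT's TaskType docstring (below) lists
#+ SEQ_2_SEQ_LM for sequence-to-sequence language modeling.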
peft_config = LoraConfig( lora_alpha=16,
                          lora_dropout=0.1,
                          r=64,
                          bias='none',
                          task_type='CAUSAL_LM'
)
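To get a feel for what `r=64` and `lora_alpha=16` mean: LoRA keeps each targeted weight matrix `W` frozen and learns a low-rank update `B @ A`, scaled by `lora_alpha / r` (PEFT's default scaling when rsLoRA is off). `B` starts at zeros by default, so the adapted model initially behaves exactly like the base model. The sketch below does that arithmetic in plain Python. The layer shape is an assumption taken from the flan-t5-small config (d_model 512, 6 heads of d_kv 64, so 384 outputs for the `q` projection); compare it with the `print(model)` output saved earlier. `lora_forward` and `matvec` are hypothetical helpers, and the random initialization of `A` here is arbitrary.

```python
import random

R = 64            # r in the LoraConfig above
LORA_ALPHA = 16   # lora_alpha in the LoraConfig above
SCALING = LORA_ALPHA / R  # standard LoRA scaling (not the rsLoRA variant)

# Assumed shape of one flan-t5-small attention projection (e.g. "q"):
# d_model = 512 inputs, 6 heads * d_kv 64 = 384 outputs.
D_IN, D_OUT = 512, 384

full_params = D_IN * D_OUT        # frozen weights of the Linear layer
lora_params = R * (D_IN + D_OUT)  # A is (r x d_in), B is (d_out x r)


def matvec(m, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, scaling):
    """y = W x + scaling * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scaling * d for b, d in zip(base, delta)]


# Tiny demo: with B all zeros, the adapted layer equals the base layer.
rng = random.Random(0)
d_in, d_out, r = 4, 3, 2
W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]
x = [1.0, 2.0, 3.0, 4.0]

print(f"scaling = {SCALING}; LoRA params per projection = {lora_params:,} "
      f"vs {full_params:,} frozen")
```

One observation from the numbers: with `r=64` on a 384-wide projection, the adapter is about 29% the size of the layer it adapts, which is not very "low rank" for a model this small.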

`task_type`, cf. https://github.com/huggingface/peft/blob/main/src/peft/config.py#L222 ([archived](https://web.archive.org/web/20240603151908/https://github.com/huggingface/peft/blob/main/src/peft/config.py))

>        Args:
>            peft_type (Union[[`~peft.utils.config.PeftType`], `str`]): The type of Peft method to use.
>            task_type (Union[[`~peft.utils.config.TaskType`], `str`]): The type of task to perform.
>            inference_mode (`bool`, defaults to `False`): Whether to use the Peft model in inference mode.

After some searching using Cygwin:

```
bballdave025@MYMACHINE /cygdrive/c/Users/bballdave025/.conda/envs/rwkv-lora-pat/Lib/site-packages/peft/utils
$ ls -lah
total 116K
drwx------+ 1 bballdave025 bballdave025    0 May 28 21:09 .
drwx------+ 1 bballdave025 bballdave025    0 May 28 21:09 ..
-rwx------+ 1 bballdave025 bballdave025 2.0K May 28 21:09 __init__.py
drwx------+ 1 bballdave025 bballdave025    0 May 28 21:09 __pycache__
-rwx------+ 1 bballdave025 bballdave025 8.0K May 28 21:09 constants.py
-rwx------+ 1 bballdave025 bballdave025 3.8K May 28 21:09 integrations.py
-rwx------+ 1 bballdave025 bballdave025  17K May 28 21:09 loftq_utils.py
-rwx------+ 1 bballdave025 bballdave025 9.7K May 28 21:09 merge_utils.py
-rwx------+ 1 bballdave025 bballdave025  25K May 28 21:09 other.py
-rwx------+ 1 bballdave025 bballdave025 2.2K May 28 21:09 peft_types.py
-rwx------+ 1 bballdave025 bballdave025  21K May 28 21:09 save_and_load.py

bballdave025@MYMACHINE /cygdrive/c/Users/bballdave025/.conda/envs/rwkv-lora-pat/Lib/site-packages/peft/utils
$ grep -iIRHn "TaskType" .
peft_types.py:60:class TaskType(str, enum.Enum):
__init__.py:20:# from .config import PeftConfig, PeftType, PromptLearningConfig, TaskType
__init__.py:22:from .peft_types import PeftType, TaskType

bballdave025@MYMACHINE /cygdrive/c/Users/bballdave025/.conda/envs/rwkv-lora-pat/Lib/site-packages/peft/utils
$
```

So, let's look at the `peft_types.py` file.

The docstring for `class TaskType(str, enum.Enum)` is

```
    Enum class for the different types of tasks supported by PEFT.
    
    Overview of the supported task types:
    - SEQ_CLS: Text classification.
    - SEQ_2_SEQ_LM: Sequence-to-sequence language modeling.
    - CAUSAL_LM: Causal language modeling.
    - TOKEN_CLS: Token classification.
    - QUESTION_ANS: Question answering.
    - FEATURE_EXTRACTION: Feature extraction. Provides the hidden states which can be used as embeddings or features
      for downstream tasks.
```
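Since the right `task_type` depends on the model's architecture, it could be picked from the model config instead of hard-coded. `is_encoder_decoder` is a standard attribute on Hugging Face model configs (T5's config sets it to `True`), and the task type decides which PEFT wrapper class `get_peft_model` uses. `pick_peft_task_type` is a hypothetical helper, the `SimpleNamespace` objects stand in for `model.config`, and I haven't checked how `SFTTrainer` behaves with the seq2seq wrapper. A decoder-only model like RWKV should come out as `CAUSAL_LM`.

```python
from types import SimpleNamespace


def pick_peft_task_type(config):
    """Hypothetical helper: choose a PEFT task_type string from a config.

    Encoder-decoder models (T5, FLAN-T5, BART) set is_encoder_decoder=True
    and map to "SEQ_2_SEQ_LM"; decoder-only models map to "CAUSAL_LM".
    """
    if getattr(config, "is_encoder_decoder", False):
        return "SEQ_2_SEQ_LM"
    return "CAUSAL_LM"


# Stand-ins for model.config so this runs without downloading anything.
t5_like = SimpleNamespace(model_type="t5", is_encoder_decoder=True)
gpt_like = SimpleNamespace(model_type="gpt2", is_encoder_decoder=False)
print(pick_peft_task_type(t5_like), pick_peft_task_type(gpt_like))
```

In the notebook, that would look like `LoraConfig(..., task_type=pick_peft_task_type(model.config))`; `LoraConfig` accepts the task type as a plain string.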



### We're going to start timing stuff, so here's some system info

`system_info_as_script.py` is a script I wrote with the help
of a variety of StackOverflow and documentation sources.
It should be in the working directory.

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was

`timestamp`

In [None]:
system_info_as_script.run(do_network_info=True)

I ran this from an elevated command prompt on my Windows PC, which doesn't have an NVIDIA GPU. The results are in the file

[`system_info_win_compy_admin_2024-06-03T070700-0600.txt`](./system_info_win_compy_admin_2024-06-03T070700-0600.txt)

### ROUGE Metrics

Some references for the Google Research implementation (the `rouge-score` package), which is the one I used:

https://pypi.org/project/rouge-score/

https://web.archive.org/web/20240530231357/https://pypi.org/project/rouge-score/

<br/>

https://github.com/google-research/google-research/tree/master/rouge

https://web.archive.org/web/20240530231412/https://github.com/google-research/google-research/tree/master/rouge

<br/>

Not the one I used:

https://github.com/microsoft/nlp-recipes/blob/master/examples/text_summarization/summarization_evaluation.ipynb

https://web.archive.org/web/20240530231709/https://github.com/microsoft/nlp-recipes/blob/master/examples/text_summarization/summarization_evaluation.ipynb

<br/>

Someone else made this other one, which I inspected but didn't use.

https://pypi.org/project/rouge/

https://web.archive.org/web/20240530232029/https://pypi.org/project/rouge/

https://github.com/pltrdy/rouge

https://web.archive.org/web/20240530232023/https://github.com/pltrdy/rouge

but I think he defers to the rouge_score from Google.

#### My ROUGE Metrics

I want to use the skip-bigram score (ROUGE-S). Thanks to

https://www.bomberbot.com/machine-learning/skip-bigrams-in-system/

https://web.archive.org/web/20240530230949/https://www.bomberbot.com/machine-learning/skip-bigrams-in-system/

I can do this as well as writing the code for the other metrics.
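As far as I know, Google's `rouge_score` package offers ROUGE-N and ROUGE-L/Lsum but not ROUGE-S, so here's a minimal plain-Python sketch of the skip-bigram idea. It is not the notebook's `dwb_rouge_scores.dwb_rouge_s` (whose code isn't shown here). Simplifying assumptions: lowercase whitespace tokenization (simpler than `rouge_score`'s tokenizer), F1 as the combined score, and an optional `max_skip` limit on how many words may sit between the pair. `skip_bigrams` and `rouge_s` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def skip_bigrams(tokens, max_skip=None):
    """Count ordered word pairs (tokens[i], tokens[j]) with i < j.

    max_skip=None allows any gap between the two words (plain ROUGE-S);
    max_skip=k allows at most k words in between.
    """
    counts = Counter()
    for i, j in combinations(range(len(tokens)), 2):
        if max_skip is None or j - i - 1 <= max_skip:
            counts[(tokens[i], tokens[j])] += 1
    return counts


def rouge_s(reference, prediction, max_skip=None):
    """Return (precision, recall, f1) of skip-bigram overlap."""
    ref_counts = skip_bigrams(reference.lower().split(), max_skip)
    pred_counts = skip_bigrams(prediction.lower().split(), max_skip)
    overlap = sum((ref_counts & pred_counts).values())  # clipped matches
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(rouge_s("police killed the gunman", "police kill the gunman"))
```

For those two sentences, each has 6 skip-bigrams and they share 3 ("police the", "police gunman", "the gunman"), so precision, recall, and F1 are all 0.5. With `max_skip=0` it reduces to plain bigram overlap.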

##### Not used for now

Focusing on the main goal. Quick and Reckless. My therapist would be so proud.

In [None]:
#import dwb_rouge_scores

#help(dwb_rouge_scores.dwb_rouge_n)

# print("SEPARATOR")

#help(dwb_rouge_scores.dwb_rouge_L)

# print("SMALLER-SEPARATOR\nwhich needs")

#help(dwb_rouge_scores.dwb_lcs)

# print("SEPARATOR")

#help(dwb_rouge_scores.dwb_rouge_s)

# print("SMALLER-SEPARATOR\nwhich needs")

#help(dwb_rouge_scores.dwb_skipngrams)

# print("SEPARATOR")

#help(dwb_rouge_scores.dwb_rouge_Lsum)

# print("which just wraps google-research's rouge_score's version")

#help()

#### Other useful ROUGE code

(found as I go along)

In [None]:
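def format_rouge_score_rough(this_rouge_str,
                             do_debug_rouge_fmt=True):
    '''
    Roughly pretty-print the string form of a ROUGE result such as
    "AggregateScore(low=Score(precision=..., recall=..., fmeasure=...), ...)":
    each  name=  gets its own indented line, and the final closing
    parenthesis is moved to its own line.

    If do_debug_rouge_fmt is True, the intermediate strings are printed.
    '''
    
    rouge_ret_str = this_rouge_str
    
    if do_debug_rouge_fmt:
        print(" #DEBUG 1#")
        print(rouge_ret_str)
    ##endof:  do_debug_rouge_fmt
    
    #  Raw-string replacements avoid the invalid "\g" escape warning;
    #+ re.sub still turns the "\n" in them into a real newline.
    rouge_ret_str = re.sub(r"([(,][ ]?)([0-9A-Za-z_]+[=])",
                           r"\g<1>\n     \g<2>",
                           rouge_ret_str,
                           flags=re.I|re.M
    )
    
    if do_debug_rouge_fmt:
        print(" #DEBUG 2#")
        print(rouge_ret_str)
    ##endof:  do_debug_rouge_fmt
    
    rouge_ret_str = re.sub(r"(.)([)])$",
                           r"\g<1>\n\g<2>",
                           rouge_ret_str
    )
    
    if do_debug_rouge_fmt:
        print(" #DEBUG 3#")
        print(rouge_ret_str)
    ##endof:  do_debug_rouge_fmt
    
    return rouge_ret_str
    
##endof:  format_rouge_score_rough(<params>)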
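Instead of regex-editing the `str()` of a result, the fields can also be read directly: `rouge_score`'s `Score` and `AggregateScore` are namedtuples (fields `precision`, `recall`, `fmeasure` and `low`, `mid`, `high`), so `_asdict()` works on them. The sketch below uses local stand-in namedtuples with the same field names so it runs without `rouge_score` installed; `format_rouge_fields` is a hypothetical name.

```python
from collections import namedtuple


def format_rouge_fields(score, indent=0):
    """Indent-format a nested namedtuple such as
    AggregateScore(low=Score(...), mid=Score(...), high=Score(...)),
    reading fields via _asdict() instead of regex-editing its str()."""
    pad = " " * indent
    lines = [f"{pad}{type(score).__name__}("]
    for name, value in score._asdict().items():
        if hasattr(value, "_asdict"):  # nested namedtuple, e.g. a Score
            lines.append(f"{pad}    {name}=")
            lines.append(format_rouge_fields(value, indent + 8))
        else:
            lines.append(f"{pad}    {name}={value:.4f}")
    lines.append(f"{pad})")
    return "\n".join(lines)


# Local stand-ins with the same field names as rouge_score.scoring's
# Score and AggregateScore, so the example runs without rouge_score.
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])
AggregateScore = namedtuple("AggregateScore", ["low", "mid", "high"])

s = Score(0.5, 0.25, 1 / 3)
print(format_rouge_fields(AggregateScore(s, s, s)))
```

With the real results, `print(format_rouge_fields(results_test['rouge1']))` should produce a similar layout.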

In [None]:
#------------------------------------------------------------------------------
# #  From https://github.com/google-research/google-research/tree/master/rouge
# #+ <strike>I can't see how to aggregate it, though I may have</strike>
# #+ I found a resource at
# #+  ref_gg_rg="https://github.com/huggingface/datasets/blob/" + \
# #+            "main/metrics/rouge/rouge.py"
# #+
# #+ arch_gg_rg="https://web.archive.org/web/20240603192938/" + \
# #+            "https://github.com/huggingface/datasets/blob/" + \
# #+            "main/metrics/rouge/rouge.py"
#
def compute_google_rouge_score(predictions, 
                               references, 
                               rouge_types=None, 
                               use_aggregator=True, 
                               use_stemmer=False):
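    '''
    Compute ROUGE with google-research's rouge_score, returning results
    in the same shape as the deprecated  datasets.load_metric('rouge'):
    an AggregateScore (low/mid/high) per ROUGE type if use_aggregator,
    otherwise a list of per-example Scores per ROUGE type.
    '''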
    if rouge_types is None:
        rouge_types = ["rouge1", "rouge2", "rougeL", "rougeLsum"]
    ##endof:  if rouge_types is None
    scorer = rouge_scorer.RougeScorer(rouge_types=rouge_types, 
                                      use_stemmer=use_stemmer
    )
    if use_aggregator:
        aggregator = scoring.BootstrapAggregator()
    else:
        scores = []
    ##endof:  if/else use_aggregator
    for ref, pred in zip(references, predictions):
        score = scorer.score(ref, pred)
        if use_aggregator:
            aggregator.add_scores(score)
        else:
            scores.append(score)
    ##endof:  for
    if use_aggregator:
        result = aggregator.aggregate()
    else:
        result = {}
        for key in scores[0]:
            result[key] = [score[key] for score in scores]
        ##endof:  for
    ##endof:  if/else
    return result
##endof:  compute_google_rouge_score
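The `low`/`mid`/`high` values that `BootstrapAggregator` reports come from resampling the per-example scores. The plain-Python sketch below shows the idea with percentiles of bootstrap-resampled means; it is not `rouge_score`'s implementation. The defaults (1000 resamples, 95% interval) are what I believe `BootstrapAggregator` uses, and its exact percentile computation may differ. `bootstrap_ci` is a hypothetical name.

```python
import random
import statistics


def bootstrap_ci(values, n_samples=1000, confidence=0.95, seed=None):
    """Return (low, mid, high) percentiles of bootstrap-resampled means.

    Each resample draws len(values) items with replacement and takes the
    mean; the sorted means are then read at the two tails and the median.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_samples)
    )

    def percentile(p):  # nearest-index lookup on the sorted means
        idx = min(n_samples - 1, max(0, round(p * (n_samples - 1))))
        return means[idx]

    tail = (1 - confidence) / 2
    return percentile(tail), percentile(0.5), percentile(1 - tail)
```

With real data, the per-example F-scores could come from `compute_google_rouge_score(..., use_aggregator=False)`, e.g. `bootstrap_ci([sc.fmeasure for sc in result['rouge1']], seed=0)`.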


### Try for a baseline

#### Just one summarization to begin with, randomly picked

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was:

`timestamp`

In [None]:
#  Just one summarization to begin with, randomly picked ... but
#+ now with the possibility of a known seed, to allow visual 
#+ comparison with after-training results.
#+ I'M NOT GOING TO USE THIS REPEATED SEED, I'm just going to
#+ use the datum at the first index to compare.

do_seed_for_repeatable = False

summarizer = pipeline('summarization', 
                      model=model, 
                      tokenizer=tokenizer)

if do_seed_for_repeatable:
    rand_seed_for_randrange = 137
    random.seed(rand_seed_for_randrange)
##endof:  if do_seed_for_repeatable

sample = dataset['test'][randrange(len(dataset["test"]))]
print(f"dialogue: \n{sample['dialogue']}\n---------------")

res = summarizer(sample["dialogue"])

print(f"flan-t5-small summary:\n{res[0]['summary_text']}")
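If a repeatable random pick is ever wanted, a private `random.Random(seed)` instance gives the same index on every run without touching the global random state that `randrange` uses. A small sketch; `pick_repeatable_index` is a hypothetical helper, and 137 is just the seed from the cell above.

```python
import random


def pick_repeatable_index(n_items, seed=137):
    """Pick an index in range(n_items) that is the same on every run.

    A private random.Random(seed) instance is used, so the choice does not
    depend on (or disturb) the module-level random state.
    """
    return random.Random(seed).randrange(n_items)
```

In the notebook: `sample = dataset['test'][pick_repeatable_index(len(dataset['test']))]`.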

In [None]:
# # Don't need this again
!powershell -c (Get-Date -UFormat \"%s_%Y%m%dT%H%M%S%Z00\") -replace '[.][0-9]*_', '_'

print("The output for the date was:\n")
print(" timestamp")
print()
print("-"*72)
print()

# Can I mix them (underlying shell and python)? Yes!
print("That summary is a bi", end='')
print("t off, so I'm going ", end='')
print("to find out which on", end='')
print("e it is\nand see how ", end='')
print("the LoRA-tuned model", end='')
print(" handles it.\nI'm jus", end='')
print("t going to find it i", end='')
print("n the JSON in a text", end='')
print(" editor.  \nIt's dia", end='')
print("logue number 1368150", end='')
print("9.\nI was going to f", end='')
print("ind its index, but I", end='')
print("'m pretty sure I can", end='')
print(" use the prompt\nwe b", end='')
print("uilt above,  prompt_", end='')
print("instruction_format(s", end='')
print("ample)          \n\n",)

church_dialogue_to_retry = (
    "Abigail: It's Sundaay.\nDamien: So?..\nAbigail: You know what tha"
    "t means.\nDamien: Hmm no I don't x)\nAbigail: Sunday means we go "
    "to church~.\nDamien: Oh, yeah..\nAbigail: Don't forget to put on "
    "a coat and tie.\nDamien: A coat and tie?.. Why?\nAbigail: To show "
    "respect to God and others.\nDamien: Omg..I'm glad Sunday is only "
    "once a week.\nAbigail: I hope God didn't hear that.\nDamien: He'l"
    "l forgive me \U0001F607\nAbigail: Just be ready on time please."
)

objects_to_pickle.append(church_dialogue_to_retry)

#### Now, one summarization with comparison to ground truth

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was:

`timestamp`

In [None]:
summarizer = pipeline('summarization', 
                      model=model, 
                      tokenizer=tokenizer)

pred_test_list = []
ref_test_list = []

sample_num = 0

this_sample = dataset['test'][sample_num]

print(f"dialogue: \n{this_sample['dialogue']}\n---------------")

ground_summary = this_sample['summary']
res = summarizer(this_sample['dialogue'])
res_summary = res[0]['summary_text']

print(f"human-genratd summary:\n{ground_summary}")
print(f"flan-t5-small summary:\n{res_summary}")

ref_test_list.append(ground_summary)
pred_test_list.append(res_summary)

print("\n\n---------- ROUGE SCORES ----------")

#------------------------------------------------------------------------------
#  datasets.load_metric
#+ Supposed to be deprecated, but it's the only one I found that
#+ aggregates things and gives more than an f-measure
rouge = load_metric('rouge', trust_remote_code=True)

results_test = rouge.compute(
                  predictions=pred_test_list,
                  references=ref_test_list,
                  use_aggregator=True
)

# >>> print(list(results_test.keys()))
# ['rouge1', 'rouge2', 'rougeL', 'rougeLsum']

In [None]:
print("\n\n---------- ROUGE SCORES ----------")
print("  --------- dialogue 1 ----------")
print()
print("ROUGE-1 results")
pprint.pp(results_test['rouge1'])
rouge1_str = str(results_test['rouge1'])
print(format_rouge_score_rough(rouge1_str))
print()
print("ROUGE-2 results")
pprint.pp(results_test['rouge2'])
rouge2_str = str(results_test['rouge2'])
print(format_rouge_score_rough(rouge2_str))
print()
print("ROUGE-L results")
pprint.pp(results_test['rougeL'])
rougeL_str = str(results_test['rougeL'])
print(format_rouge_score_rough(rougeL_str))
print()
print("ROUGE-Lsum results")
pprint.pp(results_test['rougeLsum'])
rougeLsum_str = str(results_test['rougeLsum'])
print(format_rouge_score_rough(rougeLsum_str))

##### Note on ROUGE Scores

```
# @todo : Run the ROUGE analysis from the Python package
#         (after running with  trust_remote_code=False
#          to find the deprecation it mentioned).

#------------------------------------------------------------------------------
# #  From https://github.com/google-research/google-research/tree/master/rouge
# #+ I can't see how to aggregate it, though I may have found a resource at
# #+  ref_gg_rg="https://github.com/huggingface/datasets/blob/" + \
# #+            "main/metrics/rouge/rouge.py"
# #+
# #+ arch_gg_rg="https://web.archive.org/web/20240603192938/" + \
# #+            "https://github.com/huggingface/datasets/blob/" + \
# #+            "main/metrics/rouge/rouge.py"
#

#  It turns out that the deprecated one is preferable in 
#+ output, at least until I can debug the aggregation of
#+ scores with another version: compute_google_rouge_score
```

That should come from the `compute_google_rouge_score`, above. I was able
to look through the code for `datasets.load_metric('rouge')` and
put together that method.


For now, I used ...

```
# Using the deprecated-but-aggregating-and-not-only-f-score one
rouge = load_metric('rouge', trust_remote_code=False)
```


This next one is what the warning message said to use, but it only returns
an f-measure (f-score):

```
# #  Replacement for the load_metric - evaluate.load(metric_name)
# #+ Docs said:
# #+
# #+> Returns:
# #+>    rouge1: rouge_1 (f1),
# #+>    rouge2: rouge_2 (f1),
# #+>    rougeL: rouge_l (f1),
# #+>    rougeLsum: rouge_lsum (f1)
# #+>
# #+> Meaning we only get the f-score. I want more to compare.
# #-v- code 
# rouge = evaluate_dot_load('rouge')
```

#### Verbosity stuff - get rid of the nice advice

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was:

`timestamp`

In [None]:
log_verbosity_is_critical = \
  logging.get_verbosity() == logging.CRITICAL # alias FATAL, 50
log_verbosity_is_error = \
  logging.get_verbosity() == logging.ERROR # 40
log_verbosity_is_warn = \
  logging.get_verbosity() == logging.WARNING # alias WARN, 30
log_verbosity_is_info = \
  logging.get_verbosity() == logging.INFO # 20
log_verbosity_is_debug = \
  logging.get_verbosity() == logging.DEBUG # 10

print( "The statement, 'logging verbosity is CRITICAL' " + \
      f"is {log_verbosity_is_critical}")
print( "The statement, 'logging verbosity is    ERROR' " + \
      f"is {log_verbosity_is_error}")
print( "The statement, 'logging verbosity is  WARNING' " + \
      f"is {log_verbosity_is_warn}")
print( "The statement, 'logging verbosity is     INFO' " + \
      f"is {log_verbosity_is_info}")
print( "The statement, 'logging verbosity is    DEBUG' " + \
      f"is {log_verbosity_is_debug}")

print()

init_log_verbosity = logging.get_verbosity()
print(f"The value of logging.get_verbosity() is: {init_log_verbosity}")

print()

init_t_n_a_w = os.environ.get('TRANSFORMERS_NO_ADVISORY_WARNINGS')
print(f"TRANSFORMERS_NO_ADVISORY_WARNINGS: {init_t_n_a_w}")
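Setting something like `TRANSFORMERS_NO_ADVISORY_WARNINGS` and then restoring it later is easy to get wrong: `os.environ` is a mapping (so it's `os.environ[name] = ...`, not a call), its values must be strings, and if the variable was unset to begin with, restoring it means deleting it. A minimal sketch of a context manager that handles those cases; `temporary_env` is a hypothetical name.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Set os.environ[name] = value inside the block, then restore it.

    If the variable was unset beforehand, it is removed again afterwards.
    """
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous
```

Usage would be `with temporary_env("TRANSFORMERS_NO_ADVISORY_WARNINGS", "1"):` around the pipeline calls, possibly alongside `logging.set_verbosity_error()` and `logging.set_verbosity(init_log_verbosity)` as the cells below do.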

### Actual Baseline

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was:

`timestamp`

<b>!!! NOTE</b> You'd better <b>make 
dang sure you want lots of output</b> 
before you set this next boolean to `True`.

In [None]:
do_have_lotta_output_from_all_dialogs_summaries_1 = False

# Are you sure about the value of that last boolean? 1

There could be megabytes (maybe gigabytes) worth of text output if you've changed it to `True`.

In [None]:
#  ref1 = "https://web.archive.org/web/20240530051418/" + \
#+        "https://stackoverflow.com/questions/73221277/" + \
#+        "python-hugging-face-warning"
#  ref2 = "https://web.archive.org/web/20240530051559/" + \
#+        "https://huggingface.co/docs/transformers/en/" + \
#+        "main_classes/logging"

##  Haven't tried this, because the logging seemed easier,
##+ and the logging worked
#os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"

logging.set_verbosity_error()

summarizer = pipeline('summarization', 
                      model=model, 
                      tokenizer=tokenizer)

#*p*#baseline_sample_dialog_list = []
baseline_prediction_list = []
baseline_reference_list = []

baseline_tic = timeit.default_timer()

for sample_num in range(len(dataset['test'])):
    this_sample = dataset['test'][sample_num]
    
    if do_have_lotta_output_from_all_dialogs_summaries_1:
        print(f"dialogue: \n{this_sample['dialogue']}\n---------------")
    ##endof:  if do_have_lotta_output_from_all_dialogs_summaries_1
    
    ground_summary = this_sample['summary']
    res = summarizer(this_sample['dialogue'])
    res_summary = res[0]['summary_text']
    
    if do_have_lotta_output_from_all_dialogs_summaries_1:
        print(f"human-genratd summary:\n{ground_summary}")
        print(f"flan-t5-small summary:\n{res_summary}")
    ##endof:  if do_have_lotta_output_from_all_dialogs_summaries_1
    
    #*p*#    baseline_sample_dialog_list.append(this_sample)
    baseline_reference_list.append(ground_summary)
    baseline_prediction_list.append(res_summary)
##endof:  for sample_num in range(len(dataset['test']))

baseline_toc = timeit.default_timer()

baseline_duration = baseline_toc - baseline_tic

print( "Getting things ready for scoring")
print(f"took {baseline_toc - baseline_tic:0.4f} seconds.")

#  It turns out that the deprecated one is preferable in 
#+ output, at least until I can debug the aggregation of
#+ scores with another version
#+ That should come with the  compute_google_rouge_score  function.

rouge = load_metric('rouge', trust_remote_code=False)

baseline_results = rouge.compute(
                      predictions=baseline_prediction_list,
                      references=baseline_reference_list,
                      use_aggregator=True
)

# >>> print(list(baseline_results.keys()))
# ['rouge1', 'rouge2', 'rougeL', 'rougeLsum']

#*p*# objects_to_pickle.append(baseline_sample_dialog_list)
#*p*# objects_to_pickle.append(baseline_prediction_list)
#*p*# objects_to_pickle.append(baseline_reference_list)
#*p*# objects_to_pickle.append(baseline_results)
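The baseline loop calls the pipeline once per dialogue. Hugging Face pipelines also accept a list of inputs plus a `batch_size` argument, and a summarization pipeline given a list of strings returns one `{'summary_text': ...}` dict per input. A sketch of chunked calls that still leaves room to print progress between chunks; `summarize_in_batches` is a hypothetical helper, and batched outputs may differ slightly from one-at-a-time generation because of padding, with the speedup mostly showing on a GPU.

```python
def summarize_in_batches(summarizer, dialogues, batch_size=16,
                         **generate_kwargs):
    """Run a summarization pipeline over a list of dialogues in chunks.

    Assumes `summarizer` behaves like a transformers summarization
    pipeline: given a list of strings, it returns a list of
    {'summary_text': ...} dicts, one per input, in order.
    """
    summaries = []
    for start in range(0, len(dialogues), batch_size):
        chunk = dialogues[start:start + batch_size]
        outputs = summarizer(chunk, batch_size=batch_size, **generate_kwargs)
        summaries.extend(out['summary_text'] for out in outputs)
    return summaries
```

In the notebook, something like `baseline_prediction_list = summarize_in_batches(summarizer, dataset['test']['dialogue'])` with `baseline_reference_list = dataset['test']['summary']` could replace the loop.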

In [None]:
print("\n\n---------- ROUGE SCORES ----------")
print("  ---------- BASELINE ----------")
print()
print("ROUGE-1 results")
pprint.pp(baseline_results['rouge1'])
rouge1_str = str(baseline_results['rouge1'])
print(format_rouge_score_rough(rouge1_str))
print()
print("ROUGE-2 results")
pprint.pp(baseline_results['rouge2'])
rouge2_str = str(baseline_results['rouge2'])
print(format_rouge_score_rough(rouge2_str))
print()
print("ROUGE-L results")
pprint.pp(baseline_results['rougeL'])
rougeL_str = str(baseline_results['rougeL'])
print(format_rouge_score_rough(rougeL_str))
print()
print("ROUGE-Lsum results")
pprint.pp(baseline_results['rougeLsum'])
rougeLsum_str = str(baseline_results['rougeLsum'])
print(format_rouge_score_rough(rougeLsum_str))

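##  Haven't tried this, because the logging approach seemed
##+ easier, and it worked. Restoring the saved value
##+ (None means the variable was unset to begin with):
# if init_t_n_a_w is None:
#     os.environ.pop("TRANSFORMERS_NO_ADVISORY_WARNINGS", None)
# else:
#     os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = init_t_n_a_w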

logging.set_verbosity(init_log_verbosity)

In [None]:
do_enter_duration_manually = False
NUM_TO_CATCH_NO_MANUAL_ENTRY = -137.
is_a_manual_entry_skip = False # innocent until proven guilty


if do_enter_duration_manually:
    # !!! remember to type in your number, if needed !!! #
    baseline_duration = NUM_TO_CATCH_NO_MANUAL_ENTRY
    # !!! UNCOMMENT THE NEXT LINE IF YOU WANT TO ENTER MANUALLY !!!
    #baseline_duration = 1162.5236
##endof:  if do_enter_duration_manually

print("Running baseline inference (using the test set)")
if ( ( do_enter_duration_manually ) and \
     ( baseline_duration == NUM_TO_CATCH_NO_MANUAL_ENTRY ) \
   ):
    print("took AN UNKNOWN AMOUNT OF TIME.")
    print("You didn't manually enter in your real time,")
    print("as you should have.")
    is_a_manual_entry_skip = True
elif ( ( do_enter_duration_manually ) and \
       ( baseline_duration != NUM_TO_CATCH_NO_MANUAL_ENTRY ) 
     ):
    print("(and using your manually entered time)")
else:
    pass
##endof:  if <check manual entry>

if not is_a_manual_entry_skip:
    print(f"took {format_timespan(baseline_duration)}")
##endof:  if not is_a_manual_entry_skip
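The manual-entry cells for the baseline and for training repeat the same logic, with `-137.` as a "nothing entered" marker. A minimal sketch of a shared helper, using `None` instead of a magic number, might look like this. `describe_duration` is a hypothetical name, and it prints plain seconds to stay self-contained; swapping in the notebook's `format_timespan` would be a one-line change.

```python
from typing import Optional


def describe_duration(label: str,
                      measured_seconds: float,
                      want_manual: bool = False,
                      manual_seconds: Optional[float] = None) -> str:
    """Build the timing message; None (not -137.) means 'nothing entered'."""
    lines = [label]
    if want_manual and manual_seconds is None:
        # Asked for a manual time but never typed one in.
        lines.append("took AN UNKNOWN AMOUNT OF TIME.")
        lines.append("You didn't manually enter in your real time, as you should have.")
        return "\n".join(lines)
    if want_manual:
        lines.append("(and using your manually entered time)")
        seconds = manual_seconds
    else:
        seconds = measured_seconds
    lines.append(f"took {seconds:0.4f} seconds.")
    return "\n".join(lines)
```

Usage would be something like `print(describe_duration("Running baseline inference (using the test set)", baseline_duration))`, or passing `want_manual=True, manual_seconds=1162.5236` when re-entering a time by hand.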

### Trainer - the Actual Trainer Part

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was:

`timestamp`

In [None]:
trainer = SFTTrainer( model=model,
                      train_dataset=dataset['train'],
                      eval_dataset=dataset['evaluation'],
                      peft_config=peft_config,
                      tokenizer=tokenizer,
                      packing=True,
                      formatting_func=prompt_instruction_format,
                      args=training_args,
                    )
##  The warnings are shown below the output.

##  Ended up not using this.
#                      max_seq_length=675
#          )
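The `packing=True` argument in the `SFTTrainer` call means examples aren't fed one per row. As I understand trl's `ConstantLengthDataset`, the formatted examples are tokenized, concatenated with an EOS token between them, and cut into chunks of exactly `max_seq_length` tokens (512 when it isn't passed), with an incomplete final chunk dropped. Here's a simplified, hypothetical sketch of that idea on plain lists of token ids; the real implementation works through a rolling character buffer and differs in the details.

```python
from typing import Iterable


def pack_sequences(token_id_lists: Iterable[list[int]],
                   eos_id: int,
                   seq_length: int) -> list[list[int]]:
    """Concatenate examples (EOS-separated) and cut into fixed-length chunks."""
    buffer: list[int] = []
    for ids in token_id_lists:
        buffer.extend(ids)
        buffer.append(eos_id)  # mark the boundary between examples
    # Keep only full-length chunks; the leftover tail is dropped.
    n_full = len(buffer) // seq_length
    return [buffer[i * seq_length:(i + 1) * seq_length] for i in range(n_full)]
```

One consequence: a single long dialogue (like the 657-token one flagged in the warnings) just spills across chunk boundaries instead of being fed to the model in one piece.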


Warnings from the first run of the `SFTTrainer` cell above (as it currently stands):

        
>        WARNING:bitsandbytes.cextension:The installed version of bitsandbytes \
>         was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, \
>         and GPU quantization are unavailable.
>        C:\Users\bballdave025\.conda\envs\rwkv-lora-pat\lib\site-packages\trl\\
>         trainer\sft_trainer.py:246: UserWarning: You didn't pass a `max_seq_length` \
>        argument to the SFTTrainer, this will default to 512
>         warnings.warn(
>        
>        [ > Generating train split: 6143/0 [00:04<00:00, 2034.36 examples/s] ]
>        
>        Token indices sequence length is longer than the specified maximum sequence \
>         length for this model (657 > 512). Running this sequence through the model \
>         will result in indexing errors
>        
>        [ > Generating train split: 355/0 [00:00<00:00, 6.10 examples/s] ]

<b>DWB Note</b> and possible @todo:

<strike>So, I'm changing the `max_seq_length`.</strike> 
Maybe I should just throw out the offender(s) 
(along with the blank one that's in there somewhere),
but I'll just continue as is.

Actually, it appears I didn't run the updated cell
(with `max_seq_length=675`), since the
warning and advice are still there.
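If I do end up throwing out the offenders, a minimal predicate for that could look like the following. `keep_example` is a hypothetical helper; `count_tokens` is whatever length function you pass in. Note that it only checks the `dialogue` field, while the packed text produced by `prompt_instruction_format` also includes the prompt wording and the summary, so the length check is approximate.

```python
from typing import Callable


def keep_example(example: dict,
                 count_tokens: Callable[[str], int],
                 max_tokens: int) -> bool:
    """True if the dialogue is non-blank and fits within max_tokens."""
    dialogue = (example.get("dialogue") or "").strip()
    if not dialogue:
        return False  # drops the blank sample(s)
    return count_tokens(dialogue) <= max_tokens


# Hypothetical usage with the notebook's objects (not run here);
# Dataset.filter keeps the rows for which the function returns True:
# count = lambda s: len(tokenizer(s).input_ids)
# dataset['train'] = dataset['train'].filter(
#     lambda ex: keep_example(ex, count, 512))
```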

<hr/>

## Let's Train This LoRA Thing and See How It Does!

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was:

`timestamp`

At about `1717063394_2024-05-30T100314-0600`, DWB went in and 
renamed `profile.ps1` to `NOT-USING_-_pro_file_-_now.ps1.bak`.
That should get rid of our errors from `powershell`.

### The long-running training code is just below.

In [None]:
tic = timeit.default_timer()
trainer.train()
toc = timeit.default_timer()
print(f"tic: {tic}")
print(f"toc: {toc}")
training_duration = toc - tic
print(f"Training took {toc - tic:0.4f} seconds.")

In [None]:
do_by_hand = False
NUM_TO_CATCH_NO_DO_BY_HAND = -137.
is_a_do_by_hand_skip = False # innocent until proven guilty


if do_by_hand:
    # !!! remember to type in your number, if needed !!! #
    training_duration = NUM_TO_CATCH_NO_DO_BY_HAND
    # !!! UNCOMMENT THE NEXT LINE IF YOU WANT TO ENTER MANUALLY !!!
    #training_duration = 11081.7024
##endof:  if do_by_hand

print("Running training with LoRA")
print("(using the training and eval sets)")
if ( ( do_by_hand ) and \
     ( training_duration == NUM_TO_CATCH_NO_DO_BY_HAND ) \
   ):
    print("took AN UNKNOWN AMOUNT OF TIME.")
    print("You didn't manually enter in your real time,")
    print("as you should have.")
    is_a_do_by_hand_skip = True
elif ( ( do_by_hand ) and \
       ( training_duration != NUM_TO_CATCH_NO_DO_BY_HAND ) 
     ):
    print("(and using your manually entered time)")
else:
    pass
##endof:  if <check manual entry>

if not is_a_do_by_hand_skip:
    print(f"took {format_timespan(training_duration)}")
##endof:  if not is_a_do_by_hand_skip

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was:

`timestamp`

#### @todo : consolidate "the other info as above"

I'm talking about the numbers of data points, tokens, whatever.

#### Any Comments / Things to Try (?)

We passed an evaluation set (parameter `eval_dataset`) to the `trainer`.
How can we see information about that?

#### How to get the evaluation set used by the trainer

I added the following parameters to the 
`training_args = TrainingArguments(<args>)`
call.

- `do_eval=True`
- `per_device_eval_batch_size=4`
- `eval_strategy='epoch'`

#### How to specify your repo name

I also added this next parameter to the arguments for
`training_args = TrainingArguments(<args>)`

- `hub_model_id="dwb-flan-t5-small-lora-finetune"`

#### The final TrainingArguments call - with parameter list

```
training_args = TrainingArguments( 
                        output_dir='output',
                        num_train_epochs=1,
                        per_device_train_batch_size=4,
                        save_strategy='epoch',
                        learning_rate=2e-4,
                        do_eval=True,
                        per_device_eval_batch_size=4,
                        eval_strategy='epoch',
                        hub_model_id="dwb-flan-t5-small-lora-finetune",
)
```
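With `eval_strategy='epoch'`, the evaluation results should show up in the trainer's logs. As far as I know, `trainer.state.log_history` is a list of dicts (training-loss entries, evaluation entries with an `eval_loss` key, and a final summary entry), and `trainer.evaluate()` returns a metrics dict for `eval_dataset` on demand. A small, hypothetical helper to pull out just the evaluation entries:

```python
def eval_losses(log_history: list[dict]) -> list[tuple]:
    """(epoch, step, eval_loss) for every evaluation entry in a log_history list."""
    return [(entry.get("epoch"), entry.get("step"), entry["eval_loss"])
            for entry in log_history
            if "eval_loss" in entry]


# Hypothetical usage after trainer.train() (not run here):
# print(eval_losses(trainer.state.log_history))
# print(trainer.evaluate())
```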

## Save the Trainer to Hugging Face and Get Our Updated Model

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was:

`timestamp`

I'm following the [(archived) tutorial from Mehul Gupta on Medium](https://web.archive.org/web/20240522140323/https://medium.com/data-science-in-your-pocket/lora-for-fine-tuning-llms-explained-with-codes-and-example-62a7ac5a3578); since it's archived, you can follow exactly what I'm doing.

Running this next line of code will bring up a dialog box with a text entry field,
and I'm now using the `@thebballdave025` account for Hugging Face stuff.

<b>Make sure to use the WRITE token, here.</b>

In [None]:
#  This will come up with a dialog box with text entry,
#+ and I'm now using @thebballdave025 for Hugging Face.

# Use the write token, here.
notebook_login()

In [None]:
# Save the tokenizer files locally (to the 'testing' directory)
tokenizer.save_pretrained('testing')
  #  I used 'testing' first. I think I can make a repo according
  #+ to the first getting-started CLI instructions, but let's
  #+ use what Mehul Gupta used, first.
  #  'testing' is just a local directory: save_pretrained writes
  #+ the tokenizer files there and doesn't push anything.

# Create the trainer model card
trainer.create_model_card()

# Push the results to the Hugging Face Hub
trainer.push_to_hub()

<hr/>

Part of the output included the URL,

https://huggingface.co/thebballdave025/dwb-flan-t5-small-lora-finetune/commit/c87d34b398f3801ceb1e18c819a7c8fc894989c7

Hooray! The repo name I passed as `hub_model_id` in the `TrainingArguments` worked!

I can get to the general repo with the URL,

https://huggingface.co/thebballdave025/dwb-flan-t5-small-lora-finetune

<hr/>

## Info on the Fine-Tuned Model from the Repo's README - Model Card(?)

### [thebballdave025/dwb-flan-t5-small-lora-finetune](https://huggingface.co/thebballdave025/dwb-flan-t5-small-lora-finetune)

\[archived\] The archiving attempt at archive.org (Wayback Machine) failed.
I'm not sure why, as the model is set as public.

`PEFT  TensorBoard  Safetensors       generator  trl  sft  generated_from_trainer       License: apache-2.0`

<b>@todo</b> : [Edit Model Card](https://huggingface.co/thebballdave025/dwb-flan-t5-small-lora-finetune/edit/main/README.md)

Unable to determine this model’s pipeline type. Check the docs 
[(i)](https://huggingface.co/docs/hub/models-widgets#enabling-a-widget).

Adapter for
[google/flan-t5-small](https://huggingface.co/google/flan-t5-small)

#### dwb-flan-t5-small-lora-finetune

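This model is a fine-tuned version of 
[google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the 
generator dataset \[DWB note: I don't know for sure why it says "generator dataset".
My guess is that, with `packing=True`, `SFTTrainer` rebuilds the data through
`Dataset.from_generator` (hence the "Generating train split" lines in the warnings
above). I used the samsum dataset, which I will link here and on the
model card, eventually\]. 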

It achieves the following results on the evaluation set:

- Loss: 0.0226
- <i>DWB Note: I wasn't sure which metric was used to calculate this loss. The `Trainer` reports the loss returned by the model's own forward pass, which for T5 is token-level cross-entropy against the reference summary tokens, not anything ROUGE-based, so this number and the ROUGE scores have to be compared separately.</i>

#### Model description

More information needed

#### Intended uses & limitations

More information needed

#### Training and evaluation data

More information needed

#### Training procedure

#### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1

#### Training results

```

  Training Loss | Epoch | Step | Validation Loss
 ---------------+-------+------+-----------------
      0.0685    |  1.0  | 1536 |     0.0226
```
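The `Step` value of 1536 and the `linear` scheduler can be cross-checked with a bit of arithmetic. This sketch assumes that the first "Generating train split: 6143" line in the warnings is the packed training set, that training ran on a single device, and that gradient accumulation and warmup were left at their `TrainingArguments` defaults (1 and 0; neither is set in the call shown above). The `linear_lr` function is my own restatement of linear warmup-then-decay, written to match what I understand the Transformers linear scheduler to do.

```python
import math

# Assumptions (hedged): 6143 packed training sequences, one device,
# gradient_accumulation_steps left at its default of 1.
packed_train_sequences = 6143
per_device_train_batch_size = 4
num_devices = 1
gradient_accumulation_steps = 1

effective_batch = per_device_train_batch_size * num_devices * gradient_accumulation_steps
# The last, partial batch still counts as a step, hence the ceiling.
steps_per_epoch = math.ceil(packed_train_sequences / effective_batch)
print(steps_per_epoch)  # 1536


def linear_lr(step: int, base_lr: float, total_steps: int,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


# With no warmup, the rate starts at 2e-4 and falls to 0 by the last step.
for step in (0, steps_per_epoch // 2, steps_per_epoch):
    print(step, linear_lr(step, 2e-4, steps_per_epoch))
```

So 6143 / 4 = 1535.75, rounded up to 1536 optimizer steps for the single epoch, which matches the results table.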

#### Framework versions

- PEFT 0.11.2.dev0
- Transformers 4.41.1
- Pytorch 2.3.0+cpu
- Datasets 2.19.1
- Tokenizers 0.19.1

<hr/>

## Actually Get the Model from Hugging Face

Running this next line of code will bring up a dialog box with a text entry field,
and I'm now using the `@thebballdave025` account for Hugging Face stuff.

<b>Make sure to use the READ token, here.</b>

In [None]:
# Read token. Will bring up text entry to paste token string
notebook_login()
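The evaluation cell below expects `tat_model` and `tat_tokenizer`. One hedged way to build them from the Hub repo is to load the base FLAN-T5-small model, attach the LoRA adapter with PEFT's `PeftModel.from_pretrained`, and optionally fold the adapter into the base weights with `merge_and_unload()`, which gives back an ordinary T5 model like the one `pipeline('summarization', ...)` usually gets. `load_tat_model` is a hypothetical name, and loading the tokenizer from the base model (rather than from the adapter repo) is my assumption; the imports sit inside the function so the cell can be defined before the libraries are loaded.

```python
def load_tat_model(adapter_id: str = "thebballdave025/dwb-flan-t5-small-lora-finetune",
                   base_id: str = "google/flan-t5-small",
                   merge: bool = True):
    """Load the base model, attach the LoRA adapter from the Hub, return (model, tokenizer)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    from peft import PeftModel

    base_model = AutoModelForSeq2SeqLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    if merge:
        # Fold the LoRA weights into the base weights; returns a plain T5 model.
        model = model.merge_and_unload()
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    return model, tokenizer


# Hypothetical usage (needs network access and the READ token):
# tat_model, tat_tokenizer = load_tat_model()
```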

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was:

`timestamp`

## Evaluation on the Test Set and Comparison to Baseline

#### Verbosity stuff - get rid of the nice advice

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was:

`timestamp`

In [None]:
log_verbosity_is_critical = \
  logging.get_verbosity() == logging.CRITICAL # alias FATAL, 50
log_verbosity_is_error = \
  logging.get_verbosity() == logging.ERROR # 40
log_verbosity_is_warn = \
  logging.get_verbosity() == logging.WARNING # alias WARN, 30
log_verbosity_is_info = \
  logging.get_verbosity() == logging.INFO # 20
log_verbosity_is_debug = \
  logging.get_verbosity() == logging.DEBUG # 10

print( "The statement, 'logging verbosity is CRITICAL' " + \
      f"is {log_verbosity_is_critical}")
print( "The statement, 'logging verbosity is    ERROR' " + \
      f"is {log_verbosity_is_error}")
print( "The statement, 'logging verbosity is  WARNING' " + \
      f"is {log_verbosity_is_warn}")
print( "The statement, 'logging verbosity is     INFO' " + \
      f"is {log_verbosity_is_info}")
print( "The statement, 'logging verbosity is    DEBUG' " + \
      f"is {log_verbosity_is_debug}")

print()

init_log_verbosity = logging.get_verbosity()
print(f"The value of logging.get_verbosity() is: {init_log_verbosity}")

print()

init_t_n_a_w = os.environ.get('TRANSFORMERS_NO_ADVISORY_WARNINGS')
print(f"TRANSFORMERS_NO_ADVISORY_WARNINGS: {init_t_n_a_w}")
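The notebook saves `init_log_verbosity`, calls `logging.set_verbosity_error()` before inference, and restores the old level with `logging.set_verbosity(init_log_verbosity)` at the end of the cell. If the cell dies halfway, that restore never runs. A small context manager with `try`/`finally` handles both cases; `temporary_verbosity` is a hypothetical helper that takes the getter and setter as arguments, so it isn't tied to any one logging library.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporary_verbosity(get_level: Callable[[], int],
                        set_level: Callable[[int], None],
                        level: int) -> Iterator[None]:
    """Set a logging level for the duration of a `with` block, then restore it."""
    previous = get_level()
    set_level(level)
    try:
        yield
    finally:
        set_level(previous)  # restored even if the block raises


# Hypothetical usage with the transformers logging module used above:
# with temporary_verbosity(logging.get_verbosity, logging.set_verbosity,
#                          logging.ERROR):
#     ...  # run the evaluation loop here
```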

### Here's the actual evaluation

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

Output was:

`timestamp`

<b>!!! NOTE !!!</b> I'm going to use `tat` (with an underscore
or underscores before, after, or surrounding it in the variable names)
to indicate 'testing-after-training'.

I guess I could have used `inference`, but I didn't.

<b>!!! another NOTE</b> You'd better <b>make 
dang sure you want lots of output</b> 
before you set this next boolean to `True`.

In [None]:
do_have_lotta_output_from_all_dialogs_summaries = False

# Are you sure about the value of that last boolean?

If you've changed it to `True`, the next cell prints every test-set dialogue along with both the human-written and the model-generated summary, which is a lot of text output.

In [None]:
logging.set_verbosity_error()

tat_summarizer = pipeline('summarization', 
                          model=tat_model, 
                          tokenizer=tat_tokenizer)

#*p*#tat_sample_dialog_list = []
prediction_tat_list = []
reference_tat_list = []

tat_tic = timeit.default_timer()

for sample_num in range(len(dataset['test'])):
    this_sample = dataset['test'][sample_num]
    
    if do_have_lotta_output_from_all_dialogs_summaries:
        print("="*75)
        print(f"dialogue: \n{this_sample['dialogue']}\n---------------")
    ##endof:  if do_have_lotta_output_from_all_dialogs_summaries
    
    ground_tat_summary = this_sample['summary']
    res_tat = tat_summarizer(this_sample['dialogue'])
    res_tat_summary = res_tat[0]['summary_text']
    
    if do_have_lotta_output_from_all_dialogs_summaries:
        print("-"*70)
        print(f"human-genratd summary:\n{ground_tat_summary}")
        print("-"*70)
        print(f"flan-t5-small summary:\n{res_tat_summary}")
        print("-"*70)
    ##endof:  if do_have_lotta_output_from_all_dialogs_summaries

#*p*#    tat_sample_dialog_list.append(this_sample)
    reference_tat_list.append(ground_tat_summary)
    prediction_tat_list.append(res_tat_summary)
##endof:  for sample_num in range(len(dataset['test']))

tat_toc = timeit.default_timer()

print( "Getting things ready for scoring (after training)")
print(f"took {tat_toc - tat_tic:0.4f} seconds.")

print("\n\n---------- ROUGE SCORES ----------")

rouge = load_metric('rouge', trust_remote_code=True)
  #  Set trust_remote_code=False to see the warning,
  #+ deprecation, and what to change to.

results_tat = rouge.compute(
                  predictions=prediction_tat_list,
                  references=reference_tat_list,
                  use_aggregator=True
)

# >>> print(list(results_tat.keys()))
# ['rouge1', 'rouge2', 'rougeL', 'rougeLsum']

print()
print("ROUGE-1 results")
pprint.pp(results_tat['rouge1'])
print()
print("ROUGE-2 results")
pprint.pp(results_tat['rouge2'])
print()
print("ROUGE-L results")
pprint.pp(results_tat['rougeL'])
print()
print("ROUGE-Lsum results")
pprint.pp(results_tat['rougeLsum'])

#*p*# objects_to_pickle.append(tat_sample_dialog_list)
#*p*# objects_to_pickle.append(prediction_tat_list)
#*p*# objects_to_pickle.append(reference_tat_list)
#*p*# objects_to_pickle.append(results_tat)

logging.set_verbosity(init_log_verbosity)
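The section title promises a comparison to the baseline, so here's a sketch of a side-by-side table. With `use_aggregator=True`, `load_metric('rouge')` returns, as far as I know, `AggregateScore(low, mid, high)` namedtuples of `Score(precision, recall, fmeasure)` from `rouge_score`; the local namedtuples below are stand-ins mirroring that shape (an assumption), and plain floats (as the newer `evaluate` library returns) pass straight through. Reading `.mid.fmeasure` directly also avoids parsing the string form. `mid_fmeasures` and `compare_rouge` are hypothetical names.

```python
from collections import namedtuple

# Local stand-ins mirroring rouge_score's Score / AggregateScore (assumption).
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])
AggregateScore = namedtuple("AggregateScore", ["low", "mid", "high"])


def mid_fmeasures(results: dict) -> dict:
    """Mid F-measure for each ROUGE entry; plain floats pass through."""
    return {key: (value.mid.fmeasure if hasattr(value, "mid") else float(value))
            for key, value in results.items()}


def compare_rouge(baseline: dict, after_training: dict) -> dict:
    """Per-key (baseline, after, after - baseline) F-measures."""
    base = mid_fmeasures(baseline)
    tat = mid_fmeasures(after_training)
    return {key: (base[key], tat[key], tat[key] - base[key])
            for key in base if key in tat}


# Hypothetical usage with the notebook's results (not run here):
# for key, (b, a, d) in compare_rouge(baseline_results, results_tat).items():
#     print(f"{key:>9}: {b:.4f} -> {a:.4f} ({d:+.4f})")
```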

In [None]:
# # Don't need this again
!date +'%s_%Y%m%dT%H%M%S%z'

### Pickle things to pickle save

In [None]:
with open(pickle_filename, 'wb') as pfh:
    pickle.dump(objects_to_pickle , pfh)
##endof:  with open ... as pfh # (pickle file handle)
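Reading the pickle back later (say, to redo the ROUGE comparison without rerunning inference) is the mirror image of the cell above. In the cells shown here the `objects_to_pickle.append(...)` lines are all commented out with `#*p*#`, so the list may hold less than expected. Only unpickle files you trust, since loading a pickle can execute code. `load_pickled_objects` is a hypothetical helper.

```python
import pickle


def load_pickled_objects(path: str) -> list:
    """Read back the list written by `pickle.dump(objects_to_pickle, pfh)`."""
    with open(path, "rb") as pfh:  # binary mode, matching the 'wb' write
        return pickle.load(pfh)


# Hypothetical usage (not run here):
# objects_to_pickle = load_pickled_objects(pickle_filename)
```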

<hr/>

## Notes Looking Forward to LoRA on RWKV

The RWKV community page on Hugging Face seems to have a good portion of their models:

https://huggingface.co/RWKV

https://web.archive.org/web/20240530232509/https://huggingface.co/RWKV

<br/>

GitHub has even more versions/models, including the `v4-neo` that
I think will be important (for the LoRA project):

https://github.com/BlinkDL/RWKV-LM/tree/main

https://web.archive.org/web/20240530232637/https://github.com/BlinkDL/RWKV-LM/tree/main

<br/>

The main RWKV website (?!)

https://www.rwkv.com/

https://web.archive.org/web/20240529120904/https://www.rwkv.com/

<br/>
<br/>

GOOD STUFF. A project doing LoRA with RWKV

https://github.com/Blealtan/RWKV-LM-LoRA/

https://web.archive.org/web/20240530232823/https://github.com/Blealtan/RWKV-LM-LoRA

<br/>
<br/>

The Hugging Face blog post on RWKV, with some good coding examples

https://huggingface.co/blog/rwkv

https://web.archive.org/web/20240530233025/https://huggingface.co/blog/rwkv

It includes something that's similar to what I'm doing here in the
`First_Full_LoRA_Trial_with_Transformer_Again.ipynb` tutorial, etc.

```
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "RWKV/rwkv-raven-1b5"

model = AutoModelForCausalLM.from_pretrained(model_id).to(0)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

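The `AutoModelForCausalLM` is the same as in the tutorial I'm following.
The `.to(0)` moves the model onto device 0, i.e. the first CUDA GPU
(the same as `.to("cuda:0")`), so it won't work with a CPU-only PyTorch
build like the `2.3.0+cpu` one listed above.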

Really quickly, also looking at

https://huggingface.co/RWKV/rwkv-4-world-7b

https://web.archive.org/web/20240530234438/https://huggingface.co/RWKV/rwkv-4-world-7b

I see an example for CPU.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
              "RWKV/rwkv-4-world-7b",
              trust_remote_code=True
).to(torch.float32)

tokenizer = AutoTokenizer.from_pretrained(
              "RWKV/rwkv-4-world-7b",
              trust_remote_code=True)
```

<br/><br/>

(Old version? Unofficial, it seems)

https://huggingface.co/docs/transformers/en/model_doc/rwkv

https://web.archive.org/web/20240530232341/https://huggingface.co/docs/transformers/en/model_doc/rwkv