# Efficiently train Large Language Models with LoRA and Hugging Face on CM

You will learn how to:

1. Set up the development environment
2. Load and prepare the dataset
3. Fine-tune T5 with LoRA and bnb int8
4. Evaluate and run inference with LoRA FLAN-T5

### Quick intro: PEFT or Parameter Efficient Fine-tuning

[PEFT](https://github.com/huggingface/peft), or Parameter Efficient Fine-tuning, is a new open-source library from Hugging Face to enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters. PEFT currently includes techniques for:

- LoRA: [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/pdf/2106.09685.pdf)
- Prefix Tuning: [P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks](https://arxiv.org/pdf/2110.07602.pdf)
- P-Tuning: [GPT Understands, Too](https://arxiv.org/pdf/2103.10385.pdf)
- Prompt Tuning: [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/pdf/2104.08691.pdf)
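The LoRA paper keeps the pre-trained weight $W_0$ frozen and learns a low-rank update, so a layer computes $h = W_0 x + \frac{\alpha}{r} B A x$. In that update, $A$ is $r \times d_{in}$ and $B$ is $d_{out} \times r$. The plain-Python sketch below needs no ML libraries. It shows that forward pass and compares how many parameters each approach trains. It illustrates the idea and is not the PEFT implementation. The 512 x 512 shape corresponds to the hidden size of t5-small, and rank 16 is an assumption chosen for illustration.

```python
# Minimal LoRA sketch in plain Python (illustrative only, not the PEFT implementation).
# A frozen weight W0 (d_out x d_in) is adapted by a low-rank update B @ A,
# where A is r x d_in and B is d_out x r, scaled by alpha / r.


def lora_param_counts(d_out: int, d_in: int, r: int) -> tuple:
    """Return (params updated by full fine-tuning, params trained by LoRA)."""
    full = d_out * d_in        # every entry of W0 would be updated
    lora = r * (d_in + d_out)  # entries of A (r x d_in) plus B (d_out x r)
    return full, lora


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w0, a, b, x, alpha, r):
    """Compute h = W0 x + (alpha / r) * B (A x); W0 stays frozen."""
    base = matvec(w0, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, delta)]


full, lora = lora_param_counts(512, 512, r=16)
print(full, lora, f"{lora / full:.2%}")  # 262144 16384 6.25%
```

The paper initializes $B$ to zero, so the adapted layer starts out identical to the frozen one. For example, `lora_forward([[1, 0], [0, 1]], [[1, 1]], [[0], [0]], [1, 2], alpha=2, r=1)` returns `[1.0, 2.0]`, which is just $W_0 x$.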

*Note: This tutorial was created and run on a g4dn.12xlarge AWS EC2 instance using 2 GPUs.*

## 1. Set Up the Development Environment

In our example, we use the [PyTorch Deep Learning AMI](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-pytorch.html), which comes with CUDA drivers and PyTorch preinstalled. We still have to install the Hugging Face libraries, including transformers and datasets. Uncomment and run the following cell to install all the required packages.

In [1]:
# install Hugging Face Libraries
#!pip install  peft==0.2.0 datasets==2.14.5 transformers==4.27.1 accelerate==0.17.1 evaluate==0.4.0 bitsandbytes==0.37.1 pandas rouge-score tensorboard py7zr s3fs loralib --upgrade
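The install line pins exact versions and is commented out, so it can help to confirm which versions the running kernel actually has. The minimal sketch below uses the standard library's `importlib.metadata`. `PINNED` and `check_pins` are hypothetical names introduced here, and the pins are copied from the install line.

```python
# Sketch: report which pinned packages are installed, and at which version.
# The pins below are copied from the install cell; adjust them if you change that cell.
from importlib.metadata import PackageNotFoundError, version

PINNED = {
    "peft": "0.2.0",
    "datasets": "2.14.5",
    "transformers": "4.27.1",
    "accelerate": "0.17.1",
    "evaluate": "0.4.0",
    "bitsandbytes": "0.37.1",
}


def check_pins(pins: dict) -> dict:
    """Map each package to (installed_version or None, expected_version, matches)."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this kernel
        report[name] = (installed, expected, installed == expected)
    return report


for name, (installed, expected, ok) in check_pins(PINNED).items():
    print(f"{name:<13} installed={installed} expected={expected} {'OK' if ok else 'MISMATCH'}")
```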

In [2]:
#!pip install s3fs

In [3]:
# from google.colab import drive
# drive.mount('/content/gdrive')
# !cp '/content/gdrive/My Drive/Data/FINDSum/text/FINDSum-ROO/roo_input_2000/'* .

In [4]:
# Let us get the data loading set up
import os

import pandas as pd
from datasets import Dataset
from s3fs import S3FileSystem

# Read AWS credentials from the environment; the placeholder strings are
# fallbacks to replace with your own values (avoid committing real keys).
AWS_S3_BUCKET = os.getenv("AWS_S3_BUCKET", "<s3 bucket>")
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID", "<access key>")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY", "<secret key>")
AWS_SESSION_TOKEN = os.getenv("AWS_SESSION_TOKEN", "<session token>")

storage_options = {
    "key": AWS_ACCESS_KEY_ID,
    "secret": AWS_SECRET_ACCESS_KEY,
    "token": AWS_SESSION_TOKEN,
}

s3 = S3FileSystem(**storage_options)


def create_dataset(file_list):
    """Read each CSV from S3 and combine them into a single Hugging Face Dataset."""
    frames = []
    for csv_file in file_list:
        print(csv_file)
        df = pd.read_csv(f"s3://{csv_file}", storage_options=storage_options)
        df.info()  # info() prints its summary itself and returns None
        frames.append(df)

    # Concatenate once at the end instead of growing a DataFrame inside the loop
    combined_df = pd.concat(frames, ignore_index=True)
    combined_df.info()

    return Dataset.from_pandas(combined_df)


def list_csv_files(prefix):
    """List the CSV objects under an S3 prefix, skipping the directory entry itself."""
    return sorted(f for f in s3.ls(prefix) if f.endswith(".csv"))


csv_files = list_csv_files("s3://goes-se-sandbox01/vishr/data/FINDSum/FINDSum-ROO/train")
print(f"train files : \n {csv_files}")
train_ds = create_dataset(csv_files)
print(f"train dataset: \n {train_ds} ")

# next use the test CSV files
csv_files = list_csv_files("s3://goes-se-sandbox01/vishr/data/FINDSum/FINDSum-ROO/test")
print(f"test files : \n {csv_files}")
test_ds = create_dataset(csv_files)
print(f"test dataset: \n {test_ds} ")



train files : 
 ['goes-se-sandbox01/vishr/data/FINDSum/FINDSum-ROO/train/train_roo_segment_0_input_2_1000.csv', 'goes-se-sandbox01/vishr/data/FINDSum/FINDSum-ROO/train/train_roo_segment_1_input_2_1000.csv']
goes-se-sandbox01/vishr/data/FINDSum/FINDSum-ROO/train/train_roo_segment_0_input_2_1000.csv
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16820 entries, 0 to 16819
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   document  16820 non-null  object
 1   summary   16820 non-null  object
dtypes: object(2)
memory usage: 262.9+ KB
None
goes-se-sandbox01/vishr/data/FINDSum/FINDSum-ROO/train/train_roo_segment_1_input_2_1000.csv
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16820 entries, 0 to 16819
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   document  16820 non-null  object
 1   summary   16820 non-null  object
dtypes: object(2)
memory usage: 262.9+ KB


In [5]:
# set up the train and test splits
from datasets import DatasetDict

ds_fin = DatasetDict({"train": train_ds, "test": test_ds})


In [6]:
# let us have a look at our dataset
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=5):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))
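The retry loop in `show_random_elements` keeps drawing indices until it has `num_examples` distinct ones. The standard library's `random.sample` does the same thing in one call by sampling without replacement. Below is a minimal sketch. The name `pick_distinct_indices` and the optional seed are hypothetical additions for illustration.

```python
import random


def pick_distinct_indices(n_rows: int, num_examples: int, seed=None) -> list:
    """Return num_examples distinct row indices in [0, n_rows), sampled without replacement."""
    if num_examples > n_rows:
        raise ValueError("Can't pick more elements than there are in the dataset.")
    rng = random.Random(seed)  # a seeded generator makes the picks reproducible
    return rng.sample(range(n_rows), num_examples)


picks = pick_distinct_indices(33640, 2, seed=0)
print(picks)
```

Inside `show_random_elements`, the loop could be replaced by `picks = random.sample(range(len(dataset)), num_examples)`.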

In [7]:
show_random_elements(ds_fin["train"], 2)

Unnamed: 0,document,summary
0,"we currently have $ 4.1 billion of debt obligations outstanding , none of which are maturing in the next twelve months . for a summary of principal debt balances and their maturity dates and principal terms refer to note 9 - debt , in the notes to our consolidated financial statements . we anticipate closing the acquisition of greektown in mid-2019 , and we expect to fund the purchase with approximately $ 350.0 million of cash on hand ( which represents a portion of the proceeds that we raised in our november 2018 equity offering ) and $ 350.0 million of debt , either through additional long-term debt financing or under our revolving credit facility . we anticipate funding future transactions with a mix of debt , equity and available cash . we believe that we have sufficient liquidity to meet our liquidity and capital resource requirements primarily through currently available cash and cash equivalents , restricted cash , short term investments , cash received under our lease agreements , borrowings from banks , including undrawn capacity under our revolving credit facility , and proceeds from the issuance of debt and equity securities . all of the lease agreements call for an initial term of fifteen years with four , five-year renewal options ( except for harrah 's philadelphia ) and are designed to provide us with a reliable and predictable revenue stream . however , our cash flows from operations and our ability to access capital resources could be adversely affected due to uncertain economic factors and volatility in the financial and credit markets . in particular , we can provide no assurances that our tenants will not default on their leases or fail to make full rental payments if their businesses become challenged due to , among other things , adverse economic conditions . 
our ability to raise funds through the issuance of debt and equity securities and access to other third-party sources of capital in the future will be dependent on , among other things , general economic conditions , general market conditions for reits , market perceptions and the trading price of our stock . we will continue to analyze which sources of capital are most advantageous to us at any particular point in time , but the capital markets may not be consistently available on terms we deem attractive , or at all . cash flow analysis the table below summarizes our cash flows for the year ended december 31 , 2018 and the period from october 6 , 2017 to december 31 , 2017 : replace_table_token_8_th cash flows from operating activities net cash provided by operating activities increased $ 374.6 million for the year ended december 31 , 2018 compared with the period from october 6 , 2017 to december 31 , 2017. the increase is primarily driven by a full year of operations in 2018 , compared to only three months of operations in 2017 . 52 cash flows from investing activities net cash used in investing activities increased $ 4.6 million for the year ended december 31 , 2018 compared with the period from october 6 , 2017 to december 31 , 2017. during 2018 , the primary use of cash from investing activities was our investment in direct financing leases of $ 771.5 million related to the purchase of octavius tower and harrah 's philadelphia and net investments in short-term investments of $ 520.9 million . during the period from october 6 , 2017 to december 31 , 2017 , our investment in deferred financing leases of $ 1,136.2 million related to the acquisition of harrah 's las vegas was the primary use of cash from investing activities . cash flows from financing activities net cash provided by financing activities decreased $ 110.6 million for the year ended december 31 , 2018 compared with the period from october 6 , 2017 to december 31 , 2017. 
during the year ended december 31 , 2018 the primary sources and uses of cash from financing activities include : net proceeds from our initial public offering of $ 1,307.1 million of our common stock ; net proceeds from our primary follow-on equity offering of $ 694.4 million of our common stock ; repayment of $ 300.0 million on our revolving credit facility ; repayment of $ 100.0 million on our term loan b facility ; redemption of $ 290.1 million in aggregate principal amount of our second lien notes ; dividend payments of $ 262.7 million during the period from october 6 , 2017 to december 31 , 2017 the primary sources and uses of cash from financing activities include : proceeds from the issuance of $ 2,200.0 million of our term loan b facility ; proceeds from the $ 300.0 million draw from our revolving credit facility ; proceeds from the private placement issuance of $ 1,000.0 million of our common stock ; the sale of approximately 18.4 acres of undeveloped land located behind the linq hotel & casino and harrah 's las vegas to caesars for $ 73.6 million ; repayment of our $ 1,638.4 million senior secured first lien prior term loan ; repayment of our $ 311.7 million first-priority senior secured prior first lien notes ; the purchase by vici propco of the entirety of the outstanding cplv mezzanine debt in the aggregate principal amount of $ 400.0 million ; costs of $ 36.2 million related to our common stock private placement and premium and fees related to the purchase of the mezzanine debt of $ 38.4 million ; and debt issuance costs of $ 31.5 million related to our term loan b facility and revolving credit facility . 
story_separator_special_tag 53 debt the following table summarizes our debt related transactions from the formation date to december 31 , 2018 : replace_table_token_9_th impact of initial public offering on february 5 , 2018 , we completed an initial public offering of 69,575,000 shares of common stock ( which included 9,075,000 shares of common stock related to the overallotment option exercised by the underwriters in full ) at an offering price of $ 20.00 per share for gross proceeds of $ 1.4 billion , resulting in net proceeds of $ 1.3 billion after commissions and expenses . we utilized a portion of the net proceeds from the stock offering to : ( a ) pay down $ 300.0 million of indebtedness outstanding under the revolving credit facility ; ( b ) redeem $ 268.4 million in aggregate principal amount of the second lien notes at a redemption price of 108 % plus accrued and unpaid interest to the date of the redemption ; and ( c ) repay $ 100.0 million of the term loan b facility . covenants on december 22 , 2017 , vici propco entered into a credit agreement ( the “ credit agreement ” ) governing the term loan b facility and the revolving credit facility . the credit agreement contains customary covenants that , among other things , limit the ability of vici propco and its restricted subsidiaries to : ( i ) incur additional indebtedness ; ( ii ) merge with a third party or engage in other fundamental changes ; ( iii ) make restricted payments ; ( iv ) enter into , create , incur or assume any liens ; ( v ) make certain sales and other dispositions of assets ; ( vi ) enter into certain transactions with affiliates ; ( vii ) make certain payments on certain other indebtedness ; ( viii ) make certain investments ; and ( ix ) incur restrictions on the ability of restricted subsidiaries to make certain distributions , loans or transfers of assets to vici propco or any restricted subsidiary . 
these covenants are subject to a number of exceptions and qualifications , including the ability to make unlimited restricted payments to maintain our reit status and to avoid the payment of federal or state income or excise tax , the ability to make restricted payments in an amount not to exceed 95 % of our funds from operations ( as defined in the credit agreement ) subject to no event of default under the credit agreement and pro forma compliance with the financial covenant pursuant to the credit agreement , and the ability to make additional restricted payments in an aggregate amount not to exceed the greater of 0.6 % of adjusted total assets ( as defined in the credit agreement ) or $ 30,000,000. commencing with the first full fiscal quarter ended after december 22 , 2017 , if the outstanding amount of the revolving credit facility plus any drawings under letters of credit issued pursuant to the credit agreement that have not been reimbursed as of the end of any fiscal quarter exceeds 30 % of the aggregate amount of the revolving credit facility , vici propco and its restricted subsidiaries on a consolidated basis would be required to maintain a maximum total net debt to adjusted total assets ratio , as defined in the credit agreement , as of the last day of any applicable fiscal quarter . 54 the cplv cmbs debt was incurred in october 2017 pursuant to a loan agreement containing certain covenants limiting cplv property owner llc 's ability to among other things : ( i ) incur additional debt ; ( ii ) enter into certain transactions with its affiliates ; ( iii ) consolidate , merge , sell or otherwise dispose of its assets ; and ( iv ) allow transfers of its direct or indirect equity interests . the second lien notes were issued on october 6 , 2017 , pursuant to an indenture ( the “ indenture ” ) by and among vici propco and its wholly owned subsidiary , vici fc inc. 
( together , the “ issuers ” ) , the subsidiary guarantors party thereto , and umb bank national association , as trustee . the indenture contains covenants that limit the issuers ' and their restricted subsidiaries ' ability to , among other things : ( i ) incur additional debt ; ( ii ) pay dividends on or make other distributions in respect of their capital stock or make other restricted payments ; ( iii ) make certain investments ; ( iv ) sell certain assets ; ( v ) create or permit to exist dividend and or payment restrictions affecting their restricted subsidiaries ; ( vi ) create liens on certain assets to secure debt ; ( vii ) consolidate , merge , sell or otherwise dispose of all or substantially all of their assets ; ( viii ) enter into certain transactions with their affiliates ; and ( ix ) designate their subsidiaries as unrestricted subsidiaries . these covenants are subject to a number of exceptions and qualifications , including the ability to declare or pay any cash dividend or make any cash distribution to vici to the extent necessary for vici to distribute cash dividends of 100 % of our “ real estate investment trust taxable income ” within the meaning of section 857 ( b ) ( 2 ) of the internal revenue code of 1986 , as amended , certain restricted payments not to exceed the amount of our cumulative earnings ( calculated pursuant to the indenture as $ 30,000,000 plus 95 % of our cumulative adjusted funds from operations ( as defined in the indenture ) less cumulative distributions , with certain other adjustments ) , and the ability to make restricted payments in an amount equal to the greater of 0.6 % of adjusted total assets ( as defined in the indenture )\n","discussion of operating results replace_table_token_4_th _ * represents the period from october 6 , 2017 , the date of the company 's formation , through december 31 , 2017 revenue for the year ended december 31 , 2018 and the period from october 6 , 2017 to december 31 , 2017 , our revenue 
was $ 898.0 million and $ 187.6 million , respectively , and was comprised as follows : replace_table_token_5_th real property business revenue real property business revenue is generated from rent from our lease agreements and reimbursements of property taxes , and increased $ 689.5 million during the year ended december 31 , 2018 compared to the period from october 6 , 2017 to december 31 , 2017. the increase was primarily driven by a full year of operations in 2018 , compared to only three months of operations in 2017. additionally , we added octavius tower and harrah 's philadelphia to our real estate portfolio in 2018 . 48 the following table details the components of our income from direct financing and operating leases : replace_table_token_6_th ( 1 ) amounts represent the non-cash adjustment to income from direct financing leases in order to recognize income on an effective interest basis at a constant rate of return over the term of the leases .\n"
1,"our truckload service offering provides truckload freight services as a medium- to long-haul common carrier . we have provided truckload services since our inception , and we derive the largest portion of our revenues from these services . · dedicated freight . our dedicated freight service offering is a variation of our truckload service , whereby we agree to make our equipment and drivers available to a specific customer for shipments over particular routes at specified times . in addition to serving specific customer needs , our dedicated freight service offering also aids in driver recruitment and retention . strategic capacity solutions . our scs operating segment consists of our freight brokerage service offering which matches customer shipments with available equipment of authorized carriers and provides services that complement our trucking operations . we provide these services primarily to our existing trucking customers , many of whom prefer to rely on a single carrier , or a small group of carriers , to provide all of their transportation needs . to date , a majority of the customers of scs have also engaged us to provide services through one or more of our trucking service offerings . intermodal . intermodal shipping is a method of transporting freight using multiple modes of transportation between origin and destination , with the freight remaining in a trailer or special container throughout the trip . our rail intermodal service offering provides our customers cost savings over truckload with a slightly slower transit speed , while allowing us to reposition our equipment to maximize our freight network yield . during august 2010 , we entered into a long-term agreement with bnsf railway to lease 53 ' domestic intermodal containers . prior to the agreement , the majority of intermodal 's revenue was derived from trailer-on-flat-car service . 
because of the lack of lane density , we reduced the number of our leased containers from 500 to approximately 125. our container contract with bnsf expired on december 31 , 2012. accordingly , we are scheduled to return the remaining leased containers to bnsf during the first quarter and plan to transition profitable intermodal freight to other sources of capacity throughout 2013 . 19 story_separator_special_tag the next several years , or that any necessary additional financing will be available , if at all , in amounts required or on terms satisfactory to us , especially in light of our net loss for 2012 . 21 note regarding presentation by agreement with our customers , and consistent with industry practice , we add a graduated surcharge to the rates we charge our customers as diesel fuel prices increase above an agreed upon baseline price per gallon . the surcharge is designed to approximately offset increases in fuel costs above the baseline . fuel prices are volatile , and the fuel surcharge increases our revenue at different rates for each period . we believe that comparing operating costs and expenses to total revenue , including the fuel surcharge , could provide a distorted comparison of our operating performance , particularly when comparing results for current and prior periods . therefore , we have used base revenue , which excludes the fuel surcharge revenue , and instead taken the fuel surcharge as a credit against the fuel and fuel taxes and purchased transportation line items in the table setting forth the percentage relationship of certain items to base revenue below . we do not believe that a reconciliation of the information presented on this basis and corresponding information comparing operating costs and expenses to total revenue would be meaningful . 
data regarding both total revenue , which includes the fuel surcharge , and base revenue , which excludes the fuel surcharge , is included in the consolidated statements of operations included in this report . base revenues from our scs operating segment , consisting entirely of base revenues from our freight brokerage service offering , have fluctuated in recent periods . this service offering typically does not involve the use of our tractors and trailers . therefore , an increase in these revenues tends to cause expenses related to our operations that do involve our equipment—including fuel expense , depreciation and amortization expense , operations and maintenance expense , salaries , wages and employee benefits and insurance and claims expense—to decrease as a percentage of base revenue , and a decrease in these revenues tends to cause those expenses to increase as a percentage of base revenue with a related change in purchased transportation expense . since changes in scs revenues generally affect all such expenses , as a percentage of base revenue , we do not specifically mention it as a factor in our discussion of increases or decreases in the other expenses presented in the consolidated statements of operations in the period-to-period comparisons below . fiscal year ended december 31 , 2012 compared to fiscal year ended december 31 , 2011 results of operations – combined services total base revenue decreased 0.6 % from $ 411.0 million to $ 408.7 million . we reported a net loss for all service offerings of $ 17.7 million ( $ 1.71 per share ) , as compared to a net loss of $ 10.8 million ( $ 1.05 per share ) . our effective tax rate increased from 31.5 % to 35.2 % . 
income tax expense varies from the amount computed by applying the federal tax rate to income before income taxes primarily due to state income taxes , net of federal income tax effect , adjusted for permanent differences , the most significant of which is the effect of the per diem pay structure for drivers . due to the partially nondeductible effect of per diem payments , our tax rate will vary in future periods based on fluctuations in earnings and in the number of drivers who elect to receive this pay structure . 22 results of operations – trucking relationship of certain items to base trucking revenue the following table sets forth the percentage relationship of certain items to base revenue of our trucking operating segment for the periods indicated . story_separator_special_tag fuel and fuel taxes are shown net of fuel surcharges . replace_table_token_5_th key operating statistics : replace_table_token_6_th ( 1 ) total miles include both loaded and empty miles . ( 2 ) the empty mile factor is the number of miles traveled for which we are not typically compensated by any customer as a percent of total miles traveled . ( 3 ) tractors include company-operated tractors in-service plus tractors operated by independent contractors . ( 4 ) average miles per trip is based upon loaded miles divided by the number of trucking shipments . ( 5 ) operating ratio is based upon total operating expenses , net of fuel surcharge revenue , as a percentage of base revenue . base revenue from our trucking operating segment decreased from $ 321.3 million to $ 297.6 million . the decrease was primarily the result of : · our total miles and our average miles per tractor per week decreased 7.2 % and 2.0 % , respectively . · the size of our fleet decreased 5.6 % . · the total number of loads dispatched decreased 9.1 % . · our empty mile factor increased 3.6 % . 
23 the operating ratio for our trucking operating segment deteriorated by 4.0 percentage points of base trucking revenue to 110.0 % due to the following factors : · salaries , wages and employee benefits increased 3.7 percentage points of base trucking revenue due in large part to a 7.4 % reduction in base trucking revenue and a 28.9 % reduction in the percentage of our tractor fleet comprised of independent contractors . as the percentage of our fleet comprised of independent contractors decreases , the percentage of our fleet comprised of company drivers increases , along with the associated salaries , wages and benefits for such company drivers . during 2012 , we continued to see evidence of a tightening market of eligible drivers related to the implementation of csa , which caused our total driver compensation costs to increase 4.2 % on a per mile basis as we needed to offer sign-on bonuses to attract new drivers , we increased non-mileage pay to help us retain drivers , and we raised driver pay for new drivers with less than one year experience . new hours-of-service rules scheduled to go into effect in 2013 may further reduce the pool of eligible drivers and may lead to increases in driver related expenses that would increase salaries , wages and employee benefits . we also have experienced an increase in the frequency and severity of workers ' compensation claims , which have increased by approximately $ 2.0 million or 69.1 % . in addition to the above , medical payments made under our employee benefits plan increased approximately $ 1.1 million or 23.7 % . · fuel and fuel taxes expense , net of fuel surcharge , remained flat as a percentage of base trucking revenue . tractor utilization was 2.0 % lower during 2012 as compared to 2011 , which caused fuel and fuel taxes as a percentage of revenue to increase as trucks spent more time idling . 
while fuel costs generally have been higher in 2012 , improved fuel purchasing and fuel surcharge collections as compared to 2011 lowered our net fuel cost per gallon ( fuel cost per gallon minus fuel surcharge collections per gallon ) by approximately $ 0.06. additionally , our fuel economy improved 1.2 % as we added new , more fuel efficient trucks to the fleet . we anticipate fuel costs will continue to be affected in the future by price fluctuations , the terms and collectability of fuel surcharge revenue , fuel efficiency and the percentage of total miles driven by independent contractors . · purchased transportation , which is comprised of independent contractors ' compensation and fees paid to mexican carriers , decreased 1.6 percentage points of base trucking revenue . this decrease was the result of a reduction of 43 independent contractors , or 28.9 % , included in our fleet . over the longer term , we expect our purchased transportation expense to increase if we achieve our goal to grow our independent contractor fleet and our cross-border mexico business . in the event that we are unable to recruit and retain independent contractors , this expense could continue to fall , causing a corresponding increase in fuel and fuel taxes expense and salaries , wages and employee benefits expense . · depreciation and amortization decreased 0.2 percentage points of base trucking revenue primarily due to an overall decrease in the size of our tractor and trailer fleets . as of december 31 , 2012 , we reduced our total tractor count by 42 units as compared to december 31 , 2011 , representing units shut down due to high mileage and trade life cycles . we also reduced our trailer count by 227 year over year as part of our plan to reduce the number of trailers because of our investment in trailer tracking devices . 
as a result of our plan to reduce the age of our fleet and the increased costs of new equipment , we expect depreciation and amortization expense to increase as a percentage of base trucking revenue in future periods . absent offsetting improvements in average revenue per tractor or growth in our independent contractor fleet and non-asset based operations , our expense in this category as a percentage of revenue could increase going forward if equipment prices continue to inflate . · operations and maintenance expense increased 1.3 percentage points of base trucking revenue primarily due to a 10.0 % increase in direct repair costs related to new engine emissions requirements mandated by the epa , various requirements imposed by california 's air resources board , the higher mileage equipment remaining in our fleet and the increase in the cost of parts and tires . our average tractor age at december 31 , 2012 was 32 months compared to 28 months at december 31 , 2011 , whereas our average trailer age was 77 months and 71 months , respectively .\n","pricing typically falls at longer lengths-of-haul , so the fact that we grew both simultaneously indicates improving lane flow ( directionality , density , and market selection ) . we realigned our customer base during the fourth quarter , including the replacement of four of our top 25 trucking shippers , while reducing concentration with our largest shippers . we expect some of our new customers to grow into our top 25 customer list in the first half of 2013. the improved freight mix and the better operational execution helped us to increase miles per seated tractor per week by 1.3 % to 1,931 miles . the heightened empty mile factor ( up 92 basis points to 12.0 % ) suggests that we still need additional freight volume to better utilize our equipment . we are executing a detailed strategy that we believe will grow volumes in specific markets and lanes during this winter 's freight bidding season . 
perhaps our largest accomplishment during the fourth quarter involved cutting our unseated tractor count by more than 50 % , to 92 from 213 sequentially versus the third quarter of 2012. the seated tractor count growth was made possible primarily by lower driver turnover , which improved throughout the fourth quarter to an annualized rate of 83 % in december 2012 , compared to 107 % in december 2011. we attribute the improvement to enhanced company-wide focus on driver retention , freight better suited to our network , and more consistent miles . the combination of our seated tractor count and greater miles per seated tractor led to a 5.5 % improvement in overall tractor utilization to 1,850 miles per in-service tractor per week . the key operating metric charts below ( miles per seated tractor per week , loaded revenue per mile , unseated tractors , and base revenue per tractor per week ) reflect the results we have experienced for the periods indicated . 20 our scs segment continued to deliver strong performance , growing base revenue by 17.6 % and operating income by 13.4 % in the fourth quarter . gross margin expanded by\n"


In [8]:

print(f"Train dataset size: {len(ds_fin['train'])}")
print(f"Test dataset size: {len(ds_fin['test'])}")


Train dataset size: 33640
Test dataset size: 4204


In [9]:
#!pip install tensorflow keras  transformers torch --quiet


In [10]:
#!pip install tensorflow==2.8.0

In [11]:
# Foundational model output 1: Summarization
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
# let us save the base model
model.save_pretrained("./models/base")

# T5 selects the task through a text prefix
sample_record = ds_fin["train"][0]
sample = "summarize: " + sample_record["document"]
input_tokens = tokenizer(sample, padding="max_length", max_length=512, truncation=True, return_tensors="pt")

# greedy decoding: top_k/temperature only take effect with do_sample=True, so they are omitted
result_sample = model.generate(**input_tokens, max_length=200)
print(tokenizer.decode(result_sample[0]))

<pad> lcfh's financial statements reflect our plans, estimates and beliefs. the forward-looking statements could differ materially from those discussed in the forward-looking statements. we are a leading commercial real estate finance company with a proprietary loan origination platform and an established national footprint.</s>


In [12]:
# Foundational model output 2: Translation
input_text = "translate English to German: How old are you?"
inputs = tokenizer(input_text, max_length=512, truncation=True, return_tensors="pt")

# greedy decoding, as above
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0]))

<pad> Wie alt sind Sie?</s>


To train our model, we need to convert our inputs (text) to token IDs.

Before we can start training, we need to preprocess our data. Abstractive summarization is a text-generation task: our model takes a text as input and generates a summary as output. To batch our data efficiently, we first look at how long our tokenized inputs and outputs are.

In [13]:
from datasets import concatenate_datasets
import numpy as np

# The maximum total input sequence length after tokenization.
# Sequences longer than this will be truncated, sequences shorter will be padded.
tokenized_inputs = concatenate_datasets([ds_fin["train"], ds_fin["test"]]).map(lambda x: tokenizer(x["document"], truncation=True), batched=True, remove_columns=["document", "summary"])
input_lengths = [len(x) for x in tokenized_inputs["input_ids"]]
# take the 85th percentile of the input lengths for better utilization
max_source_length = int(np.percentile(input_lengths, 85))
print(f"Max source length: {max_source_length}")

# The maximum total sequence length for target text after tokenization.
# Sequences longer than this will be truncated, sequences shorter will be padded.
tokenized_targets = concatenate_datasets([ds_fin["train"], ds_fin["test"]]).map(lambda x: tokenizer(x["summary"], truncation=True), batched=True, remove_columns=["document", "summary"])
target_lengths = [len(x) for x in tokenized_targets["input_ids"]]
# take the 90th percentile of the target lengths for better utilization
max_target_length = int(np.percentile(target_lengths, 90))
print(f"Max target length: {max_target_length}")

Map: 100%|███████████████████████| 37844/37844 [00:31<00:00, 1186.42 examples/s]


Max source length: 512


Map: 100%|███████████████████████| 37844/37844 [00:09<00:00, 4132.35 examples/s]


Max target length: 512
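Choosing a percentile instead of the maximum trades a little truncation for much less padding. A minimal sketch with hypothetical token lengths (made up for illustration, not FINDSum data), capped the way `truncation=True` caps them, shows how `np.percentile` picks the cutoff:

```python
import numpy as np

# hypothetical token lengths for illustration only (not FINDSum data)
raw_lengths = np.array([100, 200, 300, 400, 2000])

# with truncation=True, no tokenized length can exceed the tokenizer limit
capped_lengths = np.minimum(raw_lengths, 512)

# np.percentile interpolates linearly between sorted values by default
p85_raw = np.percentile(raw_lengths, 85)
p85_capped = np.percentile(capped_lengths, 85)

# share of sequences that a max length of int(p85_capped) would truncate
truncated_share = np.mean(capped_lengths > int(p85_capped))

print(p85_raw, p85_capped, truncated_share)
```

Padding every sample to the percentile rather than to the longest sequence means only the few longest sequences lose tokens.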


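Both maximum lengths come out as 512 because `truncation=True` cuts every sequence at the tokenizer's `model_max_length` (512 for `t5-small`), so neither percentile can exceed it. We preprocess our dataset before training and save it to disk.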

In [14]:
def preprocess_function(sample, padding="max_length"):
    # add prefix to the input for t5
    inputs = ["summarize: " + item for item in sample["document"]]

    # tokenize inputs
    model_inputs = tokenizer(inputs, max_length=max_source_length, padding=padding, truncation=True)

    # Tokenize targets with the `text_target` keyword argument
    labels = tokenizer(text_target=sample["summary"], max_length=max_target_length, padding=padding, truncation=True)

    # If we pad here, replace every tokenizer.pad_token_id in the labels with -100
    # so that padding is ignored in the loss.
    if padding == "max_length":
        labels["input_ids"] = [
            [(l if l != tokenizer.pad_token_id else -100) for l in label] for label in labels["input_ids"]
        ]

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_dataset = ds_fin.map(preprocess_function, batched=True, remove_columns=["document", "summary"])
print(f"Keys of tokenized dataset: {list(tokenized_dataset['train'].features)}")

# save datasets to disk for easy loading later; very large datasets could instead be saved to a data lake or an S3 bucket
tokenized_dataset["train"].save_to_disk("data/train")
tokenized_dataset["test"].save_to_disk("data/eval")

Map: 100%|████████████████████████| 33640/33640 [00:52<00:00, 638.95 examples/s]
Map: 100%|██████████████████████████| 4204/4204 [00:06<00:00, 642.12 examples/s]


Keys of tokenized dataset: ['input_ids', 'attention_mask', 'labels']


Saving the dataset (1/1 shards): 100%|█| 33640/33640 [00:02<00:00, 14679.19 exam
Saving the dataset (1/1 shards): 100%|█| 4204/4204 [00:00<00:00, 17459.17 exampl


## 3. Fine-Tune T5 with LoRA and bnb int-8

In addition to the LoRA technique, we will use [bitsandbytes LLM.int8()](https://huggingface.co/blog/hf-bitsandbytes-integration) to quantize our frozen LLM to int8. This reduces the memory needed for the FLAN-T5 XXL weights by roughly 4x (1 byte per weight instead of 4 bytes in fp32).

The first step of our training is to load the model. The FLAN-T5 XXL version of this recipe uses [philschmid/flan-t5-xxl-sharded-fp16](https://huggingface.co/philschmid/flan-t5-xxl-sharded-fp16), a sharded version of [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl); the sharding helps avoid running out of memory when loading the model. In this notebook we load `t5-small` instead (see `model_id` below).
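To make the ~4x figure concrete, here is a back-of-the-envelope sketch of weight memory alone (no activations, gradients, or optimizer state), assuming every weight is stored at the given width. In practice only the linear layers are quantized, while embeddings and layer norms stay in higher precision, so real savings are somewhat smaller. The parameter counts are the base-model sizes implied by the `print_trainable_parameters()` outputs later in this notebook (all params minus LoRA params).

```python
# Bytes needed to store one weight at each precision
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# base-model sizes implied by print_trainable_parameters() (all params minus LoRA params)
models = {
    "t5-small": 61_096_448 - 589_824,
    "flan-t5-xxl": 11_154_206_720 - 18_874_368,
}

for name, n in models.items():
    fp32, int8 = weight_memory_gb(n, "fp32"), weight_memory_gb(n, "int8")
    print(f"{name}: fp32 {fp32:.2f} GB, int8 {int8:.2f} GB, ratio {fp32 / int8:.0f}x")
```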

In [15]:
from transformers import AutoModelForSeq2SeqLM
import torch

model_id = "t5-small"
# model_id = "google/flan-t5-xl"

# load model from the hub, quantized to int8 with bitsandbytes
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=True, device_map="auto")





Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/cdsw/.local/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...




Now, we can prepare our model for the LoRA int-8 training using `peft`.
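Conceptually, LoRA freezes a pretrained weight `W` and learns a low-rank update, so an adapted layer computes `h = W x + (lora_alpha / r) * B A x`. The NumPy sketch below illustrates that computation for a single linear layer; it is a simplified illustration (no dropout, no bias), not `peft`'s implementation, and the class name `LoRALinearSketch` is hypothetical. As in the LoRA paper, `B` starts at zero, so the adapted layer initially matches the frozen one.

```python
import numpy as np


class LoRALinearSketch:
    """Frozen weight W plus a trainable low-rank update scaled by alpha / r."""

    def __init__(self, weight: np.ndarray, r: int = 16, lora_alpha: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                  # frozen, shape (d_out, d_in)
        self.lora_A = rng.normal(0.0, 0.01, size=(r, d_in))   # trainable, random init
        self.lora_B = np.zeros((d_out, r))                    # trainable, zero init
        self.scaling = lora_alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, d_in); the output has shape (batch, d_out)
        base = x @ self.weight.T
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return base + self.scaling * update

    def num_trainable(self) -> int:
        return self.lora_A.size + self.lora_B.size


# t5-small's query projection is 512 x 512
rng = np.random.default_rng(1)
layer = LoRALinearSketch(rng.normal(size=(512, 512)), r=16, lora_alpha=32)
x = rng.normal(size=(2, 512))

print(np.allclose(layer(x), x @ layer.weight.T))  # True: B starts at zero
print(layer.num_trainable())  # 16 * (512 + 512) = 16384
```

With `r=16` and `lora_alpha=32`, as in the config below, the update is scaled by 2.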

In [16]:


from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, TaskType

# Define LoRA Config
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM
)
# prepare int-8 model for training
model = prepare_model_for_int8_training(model)

# add LoRA adaptor
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# for reference, FLAN-T5-XXL: trainable params: 18874368 || all params: 11154206720 || trainable%: 0.16921300163961817

trainable params: 589824 || all params: 61096448 || trainable%: 0.9653981848502878
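The 589,824 figure can be reproduced by hand. Assuming the published model configs (t5-small: `d_model=512`, 8 heads of `d_kv=64`, 6 encoder and 6 decoder layers; flan-t5-xxl: `d_model=4096`, 64 heads of `d_kv=64`, 24 + 24 layers), each targeted `q` or `v` projection gets an `A` matrix (r x d_in) and a `B` matrix (d_out x r). Encoder layers have one attention block (self-attention), decoder layers have two (self- and cross-attention). The helper name is hypothetical:

```python
def lora_trainable_params(d_model: int, num_heads: int, d_kv: int,
                          num_layers: int, num_decoder_layers: int,
                          r: int = 16, num_target_modules: int = 2) -> int:
    """Count LoRA parameters when targeting q and v in every T5 attention block."""
    inner_dim = num_heads * d_kv                 # output width of the q/v projections
    per_module = r * d_model + inner_dim * r     # A: (r x d_model), B: (inner_dim x r)
    # encoder: self-attention only; decoder: self-attention + cross-attention
    attention_blocks = num_layers + 2 * num_decoder_layers
    return attention_blocks * num_target_modules * per_module


small = lora_trainable_params(512, 8, 64, 6, 6)
xxl = lora_trainable_params(4096, 64, 64, 24, 24)
print(small, f"{100 * small / 61_096_448:.4f}%")     # 589824 0.9654%
print(xxl, f"{100 * xxl / 11_154_206_720:.4f}%")     # 18874368 0.1692%
```

Both counts match the `print_trainable_parameters()` outputs above.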


### PEFT Training:
As you can see, we are only training about 0.97% of the model's parameters. Because gradients and optimizer states are only kept for these adapter weights, memory use during fine-tuning drops substantially.
Next is to create a `DataCollator` that will take care of padding our inputs and labels. We will use the `DataCollatorForSeq2Seq` from the 🤗 Transformers library.

In [17]:
from transformers import DataCollatorForSeq2Seq

# we want to ignore tokenizer pad token in the loss
label_pad_token_id = -100
# Data collator
data_collator = DataCollatorForSeq2Seq(
    tokenizer,
    model=model,
    label_pad_token_id=label_pad_token_id,
    pad_to_multiple_of=8
)
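Both `preprocess_function` and the collator rely on `-100` marking label positions to skip: PyTorch's `CrossEntropyLoss` uses `ignore_index=-100` by default, so those positions contribute nothing to the mean loss. A minimal NumPy sketch of that masked mean, with made-up logits and labels and assuming T5's pad token id of 0 (the helper names are hypothetical):

```python
import numpy as np

IGNORE_INDEX = -100
PAD_TOKEN_ID = 0  # assumed: T5 tokenizers use id 0 for <pad>


def mask_padding(label_ids, pad_token_id=PAD_TOKEN_ID):
    """Replace pad ids with -100, as preprocess_function does."""
    return [tok if tok != pad_token_id else IGNORE_INDEX for tok in label_ids]


def masked_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean token-level cross-entropy that skips positions labelled -100."""
    keep = labels != IGNORE_INDEX
    # numerically stable log-softmax over the vocabulary dimension
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = log_probs[keep, labels[keep]]
    return float(-picked.mean())


labels = np.array(mask_padding([5, 3, 1, 0, 0]))        # [5, 3, 1, -100, -100]
logits = np.random.default_rng(0).normal(size=(5, 8))   # 5 positions, toy vocab of 8

full = masked_cross_entropy(logits, labels)
trimmed = masked_cross_entropy(logits[:3], labels[:3])
print(np.isclose(full, trimmed))  # True: padded positions do not change the loss
```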

The last step is to define the hyperparameters (`TrainingArguments`) we want to use for our training.

In [18]:
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

output_dir="lora-flan-t5-small"

# Define training args
training_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3, # higher learning rate
    num_train_epochs=2,
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=500,
    save_strategy="no",
    report_to="tensorboard",
)

# Create Trainer instance
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_dataset["train"],
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!

Let's now train our model and run the cells below. Note that for T5, some layers are kept in `float32` for stability purposes.

In [19]:
# save_strategy="no" writes no checkpoints, so start a fresh run;
# only pass resume_from_checkpoint=True when a checkpoint already exists in output_dir
trainer.train()

You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 500  | 3.9954 |
| 1000 | 3.7311 |
| 1500 | 3.6296 |
| 2000 | 3.5797 |
| 2500 | 3.5321 |
| 3000 | 3.4817 |
| 3500 | 3.4817 |
| 4000 | 3.4921 |
| 4500 | 3.446  |
| 5000 | 3.4393 |


TrainOutput(global_step=8410, training_loss=3.511894293205633, metrics={'train_runtime': 6316.9204, 'train_samples_per_second': 10.651, 'train_steps_per_second': 1.331, 'total_flos': 9227703681024000.0, 'train_loss': 3.511894293205633, 'epoch': 2.0})
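The reported numbers are consistent with an effective batch size of 8, the default `per_device_train_batch_size` that `auto_find_batch_size` starts from (it only lowers the batch size after an out-of-memory error). A quick check, assuming that batch size:

```python
import math

train_samples = 33_640        # "Train dataset size" printed earlier
epochs = 2                    # num_train_epochs
batch_size = 8                # assumed effective batch size (Trainer default)
train_runtime_s = 6316.9204   # train_runtime from TrainOutput

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
hours, rem = divmod(train_runtime_s, 3600)

print(total_steps)                         # 8410, matching global_step
print(f"{int(hours)}h {int(rem // 60)}m")  # 1h 45m
print(f"{train_samples * epochs / train_runtime_s:.3f} samples/s")
```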

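This run completed 8,410 steps with a `train_runtime` of about 6,317 seconds (~1h45m). The ~10:36:00 runtime and ~$13.22 cost quoted for the FLAN-T5-XXL version of this tutorial refer to 10h of training; for comparison, a [full fine-tuning on FLAN-T5-XXL](https://www.philschmid.de/fine-tune-flan-t5-deepspeed#3-results--experiments) with the same duration (10h) requires 8x A100 40GBs and costs ~$322.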

We save our model to disk so that we can use it for inference and evaluation.

In [20]:
# Save our LoRA model & tokenizer results
peft_model_id="results"
trainer.model.save_pretrained(peft_model_id)
tokenizer.save_pretrained(peft_model_id)


('results/tokenizer_config.json',
 'results/special_tokens_map.json',
 'results/tokenizer.json')

In [21]:
# optionally, also save the underlying base model
peft_model_id="results_base"
trainer.model.base_model.save_pretrained(peft_model_id)



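Our LoRA checkpoint only contains the adapter weights (589,824 trainable parameters, roughly 2.4 MB at 4 bytes per parameter) and holds everything the model learned for our financial summarization task. In the FLAN-T5-XXL version of this tutorial, the adapter checkpoint was 84MB.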

## 4. Evaluate & run Inference with LoRA FLAN-T5

After the training is done, we want to evaluate and test the model. The most commonly used metric for evaluating summarization is [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) (Recall-Oriented Understudy for Gisting Evaluation). Unlike standard accuracy, it compares a generated summary against a set of reference summaries by measuring their overlap.
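To see what ROUGE-1 measures, here is a simplified sketch on toy sentences. It lowercases and splits on whitespace, whereas the `rouge_score` package used by `evaluate` applies its own tokenizer and optional Porter stemming (`use_stemmer=True`), so scores will not match it exactly. The function name is hypothetical:

```python
from collections import Counter


def rouge1(prediction: str, reference: str) -> dict:
    """Unigram-overlap precision, recall and F1 (simplified ROUGE-1)."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # clipped overlap: each word counts at most as often as it appears in both texts
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1("revenue increased 6 % in 2017",
                "consolidated revenue increased 6 % in 2017")
print(scores)  # precision 1.0, recall 6/7, f1 12/13
```

`evaluate`'s `rouge` metric reports the F-measure for `rouge1`, `rouge2`, `rougeL` and `rougeLsum`.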

We use the `evaluate` library to compute the ROUGE score, and run inference with `PEFT` and `transformers`. The FLAN-T5 XXL version of this model needs at least 18GB of GPU memory; `t5-small` needs far less.

In [22]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load peft config for pre-trained checkpoint etc.
peft_model_id = "results"
config = PeftConfig.from_pretrained(peft_model_id)

# load base LLM model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path,  load_in_8bit=True,  device_map={"":0})
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id, device_map={"":0})
model.eval()

print("Peft model loaded")



Peft model loaded


In [23]:
# Let us look at unseen data from the test split
unseen_dataset=  ds_fin["test"]

In [26]:
show_random_elements(unseen_dataset, 1)

Unnamed: 0,document,summary
0,"consulting includes health , retirement , talent and investments consulting services and products , and specialized management , economic and brand consulting services . the company conducts business in this segment through mercer and oliver wyman group . we describe the primary sources of revenue and categories of expense for each segment below , in our discussion of segment financial results . a reconciliation of segment operating income to total operating income is included in note 15 to the consolidated financial statements included in part ii , item 8 in this report . the accounting policies used for each segment are the same as those used for the consolidated financial statements . this management 's discussion & analysis ( `` md & a '' ) contains forward-looking statements as that term is defined in the private securities litigation reform act of 1995. see `` information concerning forward-looking statements '' at the outset of this report . story_separator_special_tag style= '' padding-left:0px ; text-indent:0px ; line-height : normal ; padding-top:10px ; '' > replace_table_token_3_th * components of revenue change may not add due to rounding . the following table provides more detailed revenue information for certain of the components presented above : replace_table_token_4_th underlying revenue measures the change in revenue using consistent currency exchange rates , excluding the impact of certain items that affect comparability such as : acquisitions , dispositions , transfers among businesses and the deconsolidation of marsh india . effective january 1 , 2017 , mercer established a wealth business reflecting a unified client strategy for its former retirement and investment business . the 2016 information in the chart above has been conformed to the current presentation . * components of revenue change may not add due to rounding . 34 replace_table_token_5_th * components of revenue change may not add due to rounding . 
the following table provides more detailed revenue information for certain of the components presented above : replace_table_token_6_th underlying revenue measures the change in revenue using consistent currency exchange rates , excluding the impact of certain items that affect comparability such as : acquisitions , dispositions and transfers among businesses . for 2015 , the impact of a $ 37 million gain from the disposal of mercer 's u.s. defined contribution recordkeeping business is included in acquisitions/dispositions in mercer 's defined benefit consulting & administration business . * components of revenue change may not add due to rounding . revenue consolidated revenue was $ 14 billion in 2017 , an increase of 6 % , or 3 % on an underlying basis . revenue in the risk and insurance services segment increased 7 % in 2017 compared with 2016 , or 3 % on an underlying basis . revenue increased 3 % and 4 % on an underlying basis at marsh and guy carpenter , respectively , as compared with 2016. the consulting segment 's revenue increased 5 % compared with 2016 , or 4 % on an underlying basis . revenue increased 2 % and 7 % on an underlying basis at mercer and oliver wyman group , respectively , as compared with 2016 . 35 consolidated revenue was $ 13.2 billion in 2016 , an increase of 2 % , or 3 % on an underlying basis . revenue in the risk and insurance services segment increased 4 % in 2016 compared with 2015 , or 3 % on an underlying basis . revenue increased 3 % and 2 % on an underlying basis at marsh and guy carpenter , respectively , as compared with 2015. the consulting segment 's revenue increased 1 % on a reported basis compared with 2015 , or 3 % on an underlying basis . both mercer and oliver wyman group 's revenue increased 3 % on an underlying basis compared with 2015. operating expense consolidated operating expenses increased 6 % in 2017 compared with 2016 , or 2 % on an underlying basis . 
the increase in underlying expenses was primarily due to higher base salaries and incentive compensation costs , and the pension settlement charge discussed previously , partly offset by lower costs related to liabilities for errors and omissions . consolidated operating expenses increased 1 % in 2016 compared with the same period in 2015 on both a reported and underlying basis . the underlying expense increase reflects higher base salary costs , higher amortization of identified intangible assets and the impact of the net benefit from the termination of the company 's post-65 retiree medical reimbursement plan in the united states ( the `` rra plan '' ) , which was recorded in the first quarter of 2015 , partly offset by decreases in defined benefit plan pension expense and contingent acquisition consideration expense . risk and insurance services in the risk and insurance services segment , the company 's subsidiaries and other affiliated entities act as brokers , agents or consultants for insureds , insurance underwriters and other brokers in the areas of risk management , insurance broking and insurance program management services , primarily under the name of marsh ; and engage in reinsurance broking , catastrophe and financial modeling services and related advisory functions , primarily under the name of guy carpenter . marsh and guy carpenter are compensated for brokerage and consulting services primarily through fees paid by clients or commissions paid out of premiums charged by insurance and reinsurance companies . commission rates vary in amount depending upon the type of insurance or reinsurance coverage provided , the particular insurer or reinsurer , the capacity in which the broker acts and negotiates with clients . 
revenues can be affected by premium rate levels in the insurance/reinsurance markets , the amount of risk retained by insurance and reinsurance clients themselves and by the value of the risks that have been insured since commission-based compensation is frequently related to the premiums paid by insureds/reinsureds . story_separator_special_tag mercer , principally through its health line of business , also earns revenue in the form of commissions received from insurance companies for the placement of group ( and occasionally individual ) insurance contracts , primarily life , health and accident coverages . revenue for mercer 's investment management business and certain of mercer 's defined contribution administration services consists principally of fees based on assets under management or administration . revenue in the consulting segment is affected by , among other things , global economic conditions , including changes in clients ' particular industries and markets . revenue is also affected by competition due to the introduction of new products and services , broad trends in employee demographics , including levels of employment , the effect of government policies and regulations , and fluctuations in interest and foreign exchange rates . revenues from the provision of investment management services and retirement trust and administrative services are significantly affected by the level of assets under management or administration and securities market performance . for the investment management business , revenues from the majority of funds are included on a gross basis in accordance with u.s. gaap and include reimbursable expenses incurred by professional staff and sub-advisory fees , and the related expenses are included in other operating expenses . 
the results of operations for the consulting segment are presented below : replace_table_token_8_th revenue consulting revenue in 2017 increased 5 % compared with 2016 , reflecting a 4 % increase on an underlying basis and 2 % growth from acquisitions . mercer 's revenue increased 5 % to $ 4.5 billion over the prior year , or 2 % on an underlying basis . mercer 's year over year revenue comparison also reflects an increase of 2 % from acquisitions . the underlying revenue growth reflects an increase in career of 5 % , health of 2 % and wealth of 2 % . within wealth , investment management & related services increased 10 % while defined benefit consulting & administration decreased 2 % compared with the prior year . oliver wyman group 's revenue increased 7 % in 2017 compared with 2016 , for both a reported and underlying basis . the consulting segment completed three acquisitions during 2017 . information regarding these acquisitions is included in note 4 to the consolidated financial statements . consulting revenue in 2016 increased 1 % compared with 2015 , reflecting a 3 % increase on an underlying basis offset by a 2 % decrease from the impact of foreign currency translation . mercer 's revenue of $ 4.3 billion was flat when compared with 2015 but increased 3 % on an underlying basis . mercer 's year over year revenue comparison reflects a decrease of 2 % from the impact of foreign currency translation . the underlying revenue growth reflects an increase in wealth of 2 % , health of 3 % and career of 5 % . within wealth , investment management & related services increased 6 % while defined benefit consulting & administration was flat compared with 2015. oliver wyman group 's revenue increased 2 % in 2016 38 compared with 2015 , reflecting an increase of 3 % on an underlying basis , partly offset by a decrease of 2 % from the impact of foreign currency translation . the consulting segment completed six acquisitions during 2016 . 
expense consulting expense in 2017 increased 5 % compared with 2016 , reflecting an increase of 3 % on an underlying basis and a 3 % increase from the impact of acquisitions . the increase in underlying expense reflects higher base salaries , asset based fees and outside service costs , partly offset by lower severance costs and lower costs related to liabilities for errors and omissions . consulting expense in 2016 was essentially flat compared with 2015 , reflecting an increase of 2 % on an underlying basis offset by a 2 % decrease from the impact of foreign currency translation . the increase in underlying expense reflects higher base salaries and the impact of the net benefit from the termination of the rra plan which was recorded in the first quarter of 2015 , partly offset by lower defined benefit plan pension expense . corporate and other corporate expense in 2017 was $ 189 million compared with $ 192 million in 2016. the decrease in expense is primarily due to lower consulting , occupancy and general insurance costs . corporate expense in 2016 was $ 192 million compared with $ 195 million in 2015 , reflecting lower executive compensation and lower defined benefit pension costs . other corporate items interest interest income earned on corporate funds amounted to $ 9 million in 2017 compared with $ 5 million in 2016. interest expense in 2017 was $ 237 million compared with $ 189 million in 2016. the increase in interest expense was primarily due to higher average debt outstanding in 2017. interest income earned on corporate funds amounted to $ 5 million in 2016 compared with $ 13 million in 2015. the decrease is due to the combined effects of a lower level of invested funds and lower interest rates . interest expense in 2016 was $ 189 million compared with $ 163 million in 2015 due to higher average outstanding debt in 2016. 
investment income the caption `` investment income ( loss ) '' in the consolidated statements of income comprises realized and unrealized gains and losses from investments recognized in current earnings . it includes , when applicable , other-than-temporary declines in the value of debt and available-for-sale securities and equity method gains or losses on its investment in private equity funds . the company 's investments may include direct investments in insurance , consulting and related companies and investments in private equity funds .\n","consolidated results of operations replace_table_token_2_th 31 in 2017 , the company 's results of operations and earnings per share were impacted negatively , in part , as a result of two significant items in 2017 : u.s. tax reform - on december 22 , 2017 , the u.s. enacted comprehensive tax legislation commonly referred to as the tax cuts and jobs act ( the `` tcja '' ) . the tcja provides for a reduction in the u.s. corporate tax rate to 21 % and the creation of a territorial tax system . the tcja also changes the deductibility of certain expenses , primarily executive officers compensation . an aggregate charge of $ 460 million was recorded in the fourth quarter of 2017 as a result of the enactment of the tcja . the tcja provides for a transition to the territorial system through a deemed repatriation tax ( the `` transition tax '' ) on undistributed earnings of non-u.s. subsidiaries . the company recorded a provisional charge of $ 240 million in the fourth quarter of 2017 as an estimate of u.s. transition taxes and ancillary effects , including state taxes and foreign withholding taxes related to the change in permanent reinvestment status with respect to our pre-2018 foreign earnings . this transition tax is payable over eight years . the reduction of the u.s. corporate tax rate from 35 % to 21 % , reduces the value of the u.s. 
deferred tax assets and liabilities , accordingly , a net charge of $ 220 million was recorded . a more complete discussion of the tcja and its impact on the company 's results is included under the heading `` income taxes '' .\n"


In [27]:
import random
sample_data= unseen_dataset[random.randrange(len(unseen_dataset))]

In [28]:
sample_data["document"]

"as a result of the execution of the settlement agreement , a stipulated motion for dismissal with prejudice was filed with the court which includes a form of order of dismissal with prejudice ( the “ court order ” ) . on may 15 , 2017 , the court order was executed by the judge and the suit was formally dismissed with prejudice . note 10. related party transactions notes payable in 2014 , the company issued notes payable to related parties in the amount of $ 2 million . the notes bear interest at 7.5 % and were scheduled to mature on january 2 , 2017. the company did not pay these notes upon maturity as the company and the related parties informally agreed to offset these notes payable with the related-party note receivable . during the year , the company made principal payments and interest payments of $ 80,000 related to the notes payable . additionally , the company applied $ 207,942 in principal and interest due to the company on the related party note receivable ( see note 6 – re

In [29]:
# add the same "summarize: " prefix that preprocess_function used during training
input_ids = tokenizer("summarize: " + sample_data["document"], return_tensors="pt", truncation=True).input_ids.cuda()
#with torch.inference_mode():
outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.8)
print(f"input sentence: {sample_data['document']}\n{'---' * 20}")

print(f"summary:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]}")

input sentence: as a result of the execution of the settlement agreement , a stipulated motion for dismissal with prejudice was filed with the court which includes a form of order of dismissal with prejudice ( the “ court order ” ) . on may 15 , 2017 , the court order was executed by the judge and the suit was formally dismissed with prejudice . note 10. related party transactions notes payable in 2014 , the company issued notes payable to related parties in the amount of $ 2 million . the notes bear interest at 7.5 % and were scheduled to mature on january 2 , 2017. the company did not pay these notes upon maturity as the company and the related parties informally agreed to offset these notes payable with the related-party note receivable . during the year , the company made principal payments and interest payments of $ 80,000 related to the notes payable . additionally , the company applied $ 207,942 in principal and interest due to the company on the related party note receivable ( 
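Both the inference cell above and the evaluation loop below sample with `do_sample=True` and `top_p` (nucleus sampling). Conceptually, top-p keeps the smallest set of most-likely tokens whose cumulative probability reaches `top_p`, renormalizes, and samples from that set. A simplified pure-Python sketch of that idea on a toy distribution (Transformers applies an equivalent filter to logits inside `generate`, and its edge-case handling may differ; the function name is hypothetical):

```python
import random


def top_p_filter(probs: list[float], top_p: float) -> dict[int, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for token_id in order:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # renormalize so the kept probabilities sum to 1 before sampling
    return {i: probs[i] / total for i in kept}


probs = [0.05, 0.5, 0.15, 0.3]          # toy next-token distribution
print(top_p_filter(probs, top_p=0.75))  # keeps tokens 1 and 3
print(top_p_filter(probs, top_p=0.9))   # keeps tokens 1, 3 and 2

# sample the next token from the filtered distribution
filtered = top_p_filter(probs, top_p=0.9)
next_token = random.choices(list(filtered), weights=list(filtered.values()))[0]
```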

Nice, our fine-tuned model runs end to end. Now, let's take a closer look and evaluate it against the `test` split of our processed dataset (saved to `data/eval`). To do this, we create a small utility that generates a summary for each sample and collects the predictions and references so we can score them with ROUGE.

In [None]:
import evaluate
import numpy as np
from datasets import load_from_disk
from tqdm import tqdm

# Metric
metric = evaluate.load("rouge")

def evaluate_peft_model(sample, max_target_length=50):
    # generate a summary; max_target_length here is a local generation budget,
    # not the global max_target_length used during preprocessing
    outputs = model.generate(
        input_ids=sample["input_ids"].unsqueeze(0).cuda(),
        # pass the attention mask so the max_length padding is ignored
        attention_mask=sample["attention_mask"].unsqueeze(0).cuda(),
        do_sample=True,
        top_p=0.9,
        max_new_tokens=max_target_length,
    )
    prediction = tokenizer.decode(outputs[0].detach().cpu().numpy(), skip_special_tokens=True)
    # decode the reference summary;
    # replace -100 in the labels as we can't decode them
    labels = np.where(sample["labels"] != -100, sample["labels"], tokenizer.pad_token_id)
    labels = tokenizer.decode(labels, skip_special_tokens=True)
    return prediction, labels

# load test dataset from disk
test_dataset = load_from_disk("data/eval/").with_format("torch")

# run predictions
# this can take a few hours (the progress bar below estimates ~2.5 h for 4,204 samples)
predictions, references = [], []
for sample in tqdm(test_dataset):
    p, l = evaluate_peft_model(sample)
    predictions.append(p)
    references.append(l)

# compute metric
rouge = metric.compute(predictions=predictions, references=references, use_stemmer=True)

# print results
print(f"rouge1: {rouge['rouge1'] * 100:.2f}%")
print(f"rouge2: {rouge['rouge2'] * 100:.2f}%")
print(f"rougeL: {rouge['rougeL'] * 100:.2f}%")
print(f"rougeLsum: {rouge['rougeLsum'] * 100:.2f}%")

# reference results from the FLAN-T5-XXL version of this tutorial:
# rouge1: 50.386161%
# rouge2: 24.842412%
# rougeL: 41.370130%
# rougeLsum: 41.394230%

Downloading builder script: 100%|██████████| 6.27k/6.27k [00:00<00:00, 6.18MB/s]
  3%|█                                     | 112/4204 [04:06<2:29:46,  2.20s/it]

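The `t5-small` evaluation above had only reached 3% of the test set when this output was captured, so it reports no score yet. For reference, the PEFT fine-tuned FLAN-T5-XXL from the FLAN-T5-XXL version of this tutorial achieved a ROUGE-1 score of `50.38%` on its test dataset. For comparison, a [full fine-tuning of flan-t5-base achieved a rouge1 score of 47.23](https://www.philschmid.de/fine-tune-flan-t5). That is an improvement of about `3` percentage points.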

In that run, the LoRA checkpoint was only 84MB, yet the model outperformed a smaller, fully fine-tuned model.