# Hugging Face - Summarization in Japanese

This notebook fine-tunes [google/mt5-small](https://huggingface.co/google/mt5-small) for Japanese summarization.

For more background and details, see [this blog post](https://tsmatz.wordpress.com/2022/11/25/huggingface-japanese-summarization/).

*back to [index](https://github.com/tsmatz/huggingface-finetune-japanese/)*

## Install required packages

To install the core components, see the [Readme](https://github.com/tsmatz/huggingface-finetune-japanese/).<br>
Then install the additional packages needed to run this notebook as follows.

Install the packages required by the T5 tokenizer.

In [None]:
!pip install evaluate
!pip install transformers
!pip install datasets
!pip install transformers[torch]
!pip install accelerate -U



In [None]:
!pip install protobuf==3.20.3



Install the packages required for ROUGE evaluation.

In [None]:
!pip install absl-py rouge_score nltk



Install other dependent packages.

In [None]:
!pip install numpy



## Check device

Check whether a GPU is available.

In [None]:
import torch

if torch.cuda.is_available():
    print("GPU is enabled.")
    print("device count: {}, current device: {}".format(torch.cuda.device_count(), torch.cuda.current_device()))
else:
    print("GPU is not enabled.")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

GPU is enabled.
device count: 1, current device: 0


## Prepare data

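In this example, we use the [XL-Sum Japanese dataset](https://huggingface.co/datasets/csebuetnlp/xlsum/viewer/japanese) on Hugging Face, which consists of annotated article-summary pairs from BBC.<br>
The Japanese configuration has around 7000 samples for training. Note that the cell below loads the `burmese` configuration (4569 training samples, as shown in its output); pass `name="japanese"` to load the Japanese data instead.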

In [None]:
from datasets import load_dataset

ds = load_dataset("csebuetnlp/xlsum", name="burmese")
ds



DatasetDict({
    train: Dataset({
        features: ['id', 'url', 'title', 'summary', 'text'],
        num_rows: 4569
    })
    test: Dataset({
        features: ['id', 'url', 'title', 'summary', 'text'],
        num_rows: 570
    })
    validation: Dataset({
        features: ['id', 'url', 'title', 'summary', 'text'],
        num_rows: 570
    })
})

In [None]:
ds["train"][0]

{'id': '151203_syria_uk_airstrikes',
 'url': 'https://www.bbc.com/burmese/world/2015/12/151203_syria_uk_airstrikes',
 'title': 'ဆီးရီးယား IS တွေကို ယူကေ လေကြောင်းတိုက်ခိုက်',
 'summary': 'ဗြိတိသျှ တော်ဝင် လေတပ်ရဲ့ တိုနေဒိုး တိုက်လေယာဉ်တွေက ဆီးရီးယားမှာ ရှိတဲ့ IS အစ္စလာမ္မစ် နိုင်ငံအဖွဲ့ အပေါ် လေကြောင်း တိုက်ခိုက်မှုတွေ လုပ်ခဲ့တယ်လို့ ယူကေ ကာကွယ်ရေး ဝန်ကြီးဌာနက အတည်ပြု ပြောဆိုခဲ့ပါတယ်။',
 'text': 'IS ပိုင် ရေနံတွင်းတွေကို ဗြိတိသျှ တိုက်လေယာဉ် ၄ စီးက တိုက်ခိုက်ခဲ့ ဗုံးကြဲ တိုက်ခိုက်ဖို့ အတွက် ယူကေ အမတ်တွေ အတည်ပြုပြီး မကြာမီမှာပဲ ဆိုက်ပရပ်စ်မှာ ရှိတဲ့ Akrotiri အက်ရော့တီရီ လေတပ်စခန်းက တိုနေးဒိုး တိုက်လေယာဉ် ၄ စီး တိုက်ခိုက်မှုမှာ ပါဝင် ခဲ့ပါတယ်။ ဆီးရီးယား အရှေ့ပိုင်းက IS တွေ ထိန်းချုပ်ထားတဲ့ ရေနံတွင်းတွေကို တိုက်ခိုက် ထိမှန်ခဲ့တယ်လို့ ကာကွယ်ရေး ဝန်ကြီးက ပြောပါတယ်။ ဝန်ကြီးချုပ် ဒေးဗစ် ကင်မရွန်း ကတော့ IS အပေါ် တိုက်ခိုက်မှုတွေဟာ အချိန် ကြာမြင့်မှာ ဖြစ်ပြီး ပုံမှန် ပြုလုပ်သွားဖို့ လိုတယ်လို့ သတိပေး ပြောဆို ခဲ့ပါတယ်။ IS အပေါ် လေကြောင်း တိုက်ဖို့အတွက် ယူကေ ပါလီမန်မှာ ဗုဒ္ဓဟူးနေ့က ဆွေးနွေးငြင်းခ

To generate the inputs for fine-tuning, I tokenize each text and convert it into token ids.

First, load the tokenizer of the pre-trained ```google/mt5-small``` model.

In [None]:
from transformers import AutoTokenizer

t5_tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")



For fine-tuning, apply the tokenization to the whole dataset.

In [None]:
def tokenize_sample_data(data):
    # The longest input and label have 14536 and 215 tokens, respectively.
    # Here I truncate inputs to 1024 tokens and labels to 128 tokens.
    input_feature = t5_tokenizer(data["text"], truncation=True, max_length=1024)
    label = t5_tokenizer(data["summary"], truncation=True, max_length=128)
    return {
        "input_ids": input_feature["input_ids"],
        "attention_mask": input_feature["attention_mask"],
        "labels": label["input_ids"],
    }
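
The effect of `truncation=True` together with `max_length` can be pictured with a small pure-Python sketch. It assumes, as I understand T5-style tokenizers to behave, that the end-of-sequence id (taken to be 1 here) is always appended and counts toward `max_length`. `truncate_with_eos` is a hypothetical helper for illustration, not part of the tokenizer API.

```python
EOS_ID = 1  # assumption: end-of-sequence id used by T5-style tokenizers

def truncate_with_eos(token_ids, max_length, eos_id=EOS_ID):
    """Keep at most max_length ids, always ending with eos_id."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    # Reserve one slot for the end-of-sequence id.
    return list(token_ids[: max_length - 1]) + [eos_id]

long_ids = list(range(100, 110))        # 10 made-up token ids
print(truncate_with_eos(long_ids, 5))   # [100, 101, 102, 103, 1]
print(truncate_with_eos([7, 8], 5))     # [7, 8, 1]
```

Texts shorter than the limit are left unchanged apart from the appended end-of-sequence id, so only long articles lose their tail.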

In [None]:
tokenized_ds = ds.map(
    tokenize_sample_data,
    remove_columns=["id", "url", "title", "summary", "text"],
    batched=True,
    batch_size=128)
tokenized_ds


DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 4569
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 570
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 570
    })
})

## Fine-tune

In this example, we use the mT5 model.<br>
mT5 comes in several sizes, and I use the small one (```google/mt5-small```) so that it fits into the memory of my machine. Despite the name "small", the model is still quite large.

In [None]:
from transformers import AutoConfig, AutoModelForSeq2SeqLM

# see https://huggingface.co/docs/transformers/main_classes/configuration
mt5_config = AutoConfig.from_pretrained(
    "google/mt5-small",
    max_length=128,
    length_penalty=0.6,
    no_repeat_ngram_size=2,
    num_beams=15,
)
model = (AutoModelForSeq2SeqLM
         .from_pretrained("google/mt5-small", config=mt5_config)
         .to(device))
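
As a rough illustration of what `length_penalty` does during beam search, the sketch below scores a finished hypothesis as the sum of its token log-probabilities divided by `length ** length_penalty`. This mirrors my understanding of how beams are ranked, but treat the exact formula and the made-up numbers as assumptions. `beam_score` is a hypothetical helper.

```python
def beam_score(sum_logprobs, length, length_penalty):
    """Length-normalized score of a finished beam hypothesis (higher is better)."""
    return sum_logprobs / (length ** length_penalty)

# Two made-up hypotheses: a short one and a longer, less probable one.
short = beam_score(sum_logprobs=-4.0, length=5, length_penalty=0.6)
long_ = beam_score(sum_logprobs=-6.0, length=12, length_penalty=0.6)
print(short, long_)

# Without normalization (length_penalty=0) the raw log-probabilities are compared.
print(beam_score(-4.0, 5, 0.0), beam_score(-6.0, 12, 0.0))
```

With a penalty of 0.6 the longer hypothesis ranks higher in this toy case, whereas without normalization the shorter one wins, because log-probabilities are negative and accumulate with every generated token.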

We prepare a data collator, which preprocesses each batch of data.

For a sequence-to-sequence (seq2seq) task, we need not only to stack the inputs for the encoder, but also to prepare the inputs for the decoder. In the seq2seq setup, a common technique called "teacher forcing" is then applied in the decoder.<br>
You don't need to set up these steps manually in Hugging Face, because ```DataCollatorForSeq2Seq``` takes care of all of them.

This collator also fills the padded positions of the labels with the label id -100.<br>
These positions are then ignored in the subsequent loss computation and evaluation.

In [None]:
from transformers import DataCollatorForSeq2Seq

data_collator = DataCollatorForSeq2Seq(
    t5_tokenizer,
    model=model,
    return_tensors="pt")
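
The label padding described above can be sketched in plain Python. This is only an illustration of the idea, not the collator's implementation. `pad_labels` and `restore_pad_ids` are hypothetical helpers, the token ids are made up, and the pad token id of 0 is an assumption about the mT5 tokenizer.

```python
IGNORE_INDEX = -100  # label id that the loss computation skips
PAD_ID = 0           # assumption: pad token id of the mT5 tokenizer

def pad_labels(batch, fill=IGNORE_INDEX):
    """Right-pad every label sequence to the longest one in the batch."""
    width = max(len(seq) for seq in batch)
    return [seq + [fill] * (width - len(seq)) for seq in batch]

def restore_pad_ids(padded, pad_id=PAD_ID):
    """Swap -100 back to the pad id so that the ids can be decoded to text."""
    return [[pad_id if t == IGNORE_INDEX else t for t in seq] for seq in padded]

labels = pad_labels([[259, 1042, 1], [871, 1]])
print(labels)                   # [[259, 1042, 1], [871, 1, -100]]
print(restore_pad_ids(labels))  # [[259, 1042, 1], [871, 1, 0]]
```

The second helper corresponds to the `np.where(labels != -100, ...)` step that is needed before decoding, since -100 is not a valid token id.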

We also prepare a metrics function for evaluation during training.<br>
Measuring the quality of generated text is very difficult, and BLEU and ROUGE are often used.

Briefly speaking, BLEU measures how many of the n-grams in the generated (predicted) text also appear in the reference text. This score is used especially for evaluating machine translation.
However, in summarization, we need all the important words (which appear in the reference text) to be present in the generated text. This is why ROUGE is often used in summarization tasks.
The idea of ROUGE is similar to BLEU, but it also measures how many of the n-grams in the reference text appear in the generated (predicted) text. (This is why the name ROUGE includes "RO", which stands for "Recall-Oriented".)<br>
There are also variations, ROUGE-L and ROUGE-Lsum, which are based on the longest common subsequence (LCS).
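
To make the difference between the precision view (BLEU-like) and the recall view (ROUGE) concrete, here is a minimal sketch on white-space tokens. It is a simplification under my own assumptions: the helper names are hypothetical, the example is in English for readability, and real implementations add details such as stemming and a combined F-measure.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(pred, ref, n=1):
    """Return (precision, recall) of the clipped n-gram overlap."""
    p, r = Counter(ngrams(pred, n)), Counter(ngrams(ref, n))
    hits = sum((p & r).values())
    precision = hits / max(len(ngrams(pred, n)), 1)  # share of predicted n-grams found in the reference
    recall = hits / max(len(ngrams(ref, n)), 1)      # share of reference n-grams found in the prediction
    return precision, recall

def lcs_length(a, b):
    """Length of the longest common subsequence (the basis of ROUGE-L)."""
    row = [0] * (len(b) + 1)
    for x in a:
        prev = 0  # value of the previous row at column j-1
        for j, y in enumerate(b, start=1):
            cur = row[j]
            row[j] = prev + 1 if x == y else max(row[j], row[j - 1])
            prev = cur
    return row[len(b)]

ref = "the cat sat on the mat".split()
pred = "the cat sat".split()
print(rouge_n(pred, ref, 1))   # (1.0, 0.5)
print(lcs_length(pred, ref))   # 3
```

The short prediction is perfectly precise but covers only half of the reference unigrams, which is exactly the kind of omission that a recall-oriented score penalizes.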

In Hugging Face, you don't need to implement this logic manually and can use built-in objects for computing these metrics.<br>
In this example, I have configured the mT5 tokenizer (which is based on SentencePiece Unigram segmentation) as the custom tokenizer for the computation, because ROUGE evaluation uses white-space tokenization by default.

> Note : You can also specify a multilingual stemmer.

> Note : As I have mentioned above, the data collator sets the label id of padded positions to -100, so I convert it back into the padding token id before decoding.

In [None]:
import evaluate
import numpy as np
from nltk.tokenize import RegexpTokenizer

rouge_metric = evaluate.load("rouge")

def tokenize_sentence(arg):
    encoded_arg = t5_tokenizer(arg)
    return t5_tokenizer.convert_ids_to_tokens(encoded_arg.input_ids)

def metrics_func(eval_arg):
    preds, labels = eval_arg
    # Replace -100
    labels = np.where(labels != -100, labels, t5_tokenizer.pad_token_id)
    # Convert id tokens to text
    text_preds = t5_tokenizer.batch_decode(preds, skip_special_tokens=True)
    text_labels = t5_tokenizer.batch_decode(labels, skip_special_tokens=True)
    # Make sure each text ends with a sentence terminator,
    # then put each sentence on its own line (\n) for ROUGE-Lsum scoring
    # (Note : The terminators cover Japanese and Burmese. Please change them for other languages.)
    terminators = ("!", "！", "?", "？", "。", "။", "၊")
    text_preds = [(p if p.endswith(terminators) else p + "。") for p in text_preds]
    text_labels = [(l if l.endswith(terminators) else l + "。") for l in text_labels]
    sent_tokenizer_jp = RegexpTokenizer(u'[^!！?？。။၊]*[!！?？。။၊]')

    text_preds = ["\n".join(np.char.strip(sent_tokenizer_jp.tokenize(p))) for p in text_preds]
    text_labels = ["\n".join(np.char.strip(sent_tokenizer_jp.tokenize(l))) for l in text_labels]
    # compute ROUGE score with custom tokenization
    return rouge_metric.compute(
        predictions=text_preds,
        references=text_labels,
        tokenizer=tokenize_sentence
    )
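
The sentence-splitting step inside `metrics_func` can be tried in isolation. The sketch below uses the standard `re` module with the same character classes as the `RegexpTokenizer` pattern, on the assumption that the tokenizer simply returns all matches of the pattern. `split_into_lines` is a hypothetical helper and the sample sentence is made up.

```python
import re

# Same character classes as the RegexpTokenizer pattern in metrics_func.
SENTENCE_PATTERN = re.compile(r"[^!！?？。။၊]*[!！?？。။၊]")

def split_into_lines(text):
    """Put every sentence on its own line, as done before ROUGE-Lsum scoring."""
    sentences = [s.strip() for s in SENTENCE_PATTERN.findall(text)]
    return "\n".join(sentences)

print(split_into_lines("今日は晴れです。 明日は雨ですか？"))
# A text without any terminator yields no match at all.
print(repr(split_into_lines("終わりなし")))
```

The second call shows why `metrics_func` first appends a terminator to texts that lack one: otherwise the pattern would match nothing and the text would be dropped.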

Before fine-tuning, I check the ROUGE score of the plain mT5 model. Here I compute the scores for the first 5 rows of the test dataset.

The score is very low, because this model has not been trained for any downstream task. (It has only been pre-trained with an unsupervised objective.)

> Note : In order to avoid suboptimal text generation, I have applied beam search as the text generation algorithm.

In [None]:
from torch.utils.data import DataLoader

sample_dataloader = DataLoader(
    tokenized_ds["test"].with_format("torch"),
    collate_fn=data_collator,
    batch_size=5)
for batch in sample_dataloader:
    with torch.no_grad():
        preds = model.generate(
            batch["input_ids"].to(device),
            num_beams=15,
            num_return_sequences=1,
            no_repeat_ngram_size=1,
            remove_invalid_values=True,
            max_length=128,
        )
    labels = batch["labels"]
    break

metrics_func([preds, labels])




{'rouge1': 0.21596275649523547,
 'rouge2': 0.1224458187250594,
 'rougeL': 0.2147220617061536,
 'rougeLsum': 0.2082704488029278}
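
The generation call above sets `no_repeat_ngram_size=1`. The following sketch shows my reading of what such a constraint enforces: given the tokens generated so far, it lists the next tokens that would complete an n-gram that has already occurred. `banned_next_tokens` is a hypothetical helper with made-up token ids, not the library's code.

```python
def banned_next_tokens(generated, n):
    """Tokens that would complete an n-gram already present in `generated`."""
    if n < 1 or len(generated) < n - 1:
        return set()
    # The last n-1 tokens are the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (n - 1):])
    banned = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned

history = [5, 8, 5, 9, 5]
print(banned_next_tokens(history, 1))  # every token seen so far
print(banned_next_tokens(history, 2))  # tokens that followed an earlier 5
```

With a size of 1, every token that has already been generated is banned, so no token can appear twice in a summary. A size of 2, as in the model configuration above, only forbids repeating a pair of adjacent tokens.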

We prepare the training arguments for fine-tuning.<br>
In this example, we use the Hugging Face Transformers trainer class, with which you can run training without writing the training loop manually.

In usual training evaluation, the loss and accuracy are computed by comparing the generated logits with the labels. However, as we saw above, we want to evaluate the ROUGE score using the predicted tokens.<br>
To simplify these sequence-to-sequence specific steps, here I use the built-in ```Seq2SeqTrainingArguments``` and ```Seq2SeqTrainer``` classes in Hugging Face, instead of the usual ```TrainingArguments``` and ```Trainer```.<br>
By setting ```predict_with_generate=True``` in this class, the tokens generated by ```model.generate()``` are used in each evaluation.

The checkpoint files (saved every 500 steps) are stored in the folder named ```mt5-summarize-ja```.

> Note : Do not use FP16 precision in mT5 fine-tuning.

> Note : In general, the checkpoints saved during training become very large.<br>
> Set the ```save_total_limit``` property (which limits the total number of checkpoints by deleting the older ones) to save disk space, or expand the disks of your Azure VM. (See [here](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/expand-disks) to expand disks in Azure.)

In [None]:
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir = "mt5-summarize-ja",
    log_level = "error",
    num_train_epochs = 10,
    learning_rate = 5e-4,
    lr_scheduler_type = "linear",
    warmup_steps = 90,
    optim = "adafactor",
    weight_decay = 0.01,
    per_device_train_batch_size = 2,
    per_device_eval_batch_size = 1,
    gradient_accumulation_steps = 16,
    evaluation_strategy = "steps",
    eval_steps = 100,
    predict_with_generate=True,
    generation_max_length = 128,
    save_steps = 500,
    logging_steps = 10,
    push_to_hub = False
)
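
The batch-related arguments above determine how many optimizer updates the run performs. The arithmetic below is a sketch under two assumptions: training runs on a single device (as reported by the device check), and leftover micro-batches at the end of an epoch do not count as an extra update. Under these assumptions it reproduces the `global_step=1420` reported by the training run further below.

```python
import math

train_rows = 4569       # rows in the training split loaded above
per_device_batch = 2    # per_device_train_batch_size
accumulation = 16       # gradient_accumulation_steps
epochs = 10             # num_train_epochs
save_steps = 500        # a checkpoint is written every 500 updates

effective_batch = per_device_batch * accumulation
batches_per_epoch = math.ceil(train_rows / per_device_batch)
# Assumption: an incomplete accumulation cycle does not add an update.
updates_per_epoch = max(batches_per_epoch // accumulation, 1)
total_updates = updates_per_epoch * epochs
checkpoints = total_updates // save_steps

print(effective_batch)                       # 32
print(batches_per_epoch, updates_per_epoch)  # 2285 142
print(total_updates, checkpoints)            # 1420 2
```

Each update therefore sees 32 samples, and checkpoints are written at steps 500 and 1000.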

Build the trainer. (Put it all together.)

Because the evaluation (ROUGE scoring) is computationally expensive, I use only the first 20 rows of the validation set.

In [None]:
from transformers import Seq2SeqTrainer
trainer = Seq2SeqTrainer(
    model = model,
    args = training_args,
    data_collator = data_collator,
    compute_metrics = metrics_func,
    train_dataset = tokenized_ds["train"],
    eval_dataset = tokenized_ds["validation"].select(range(20)),
    tokenizer = t5_tokenizer,
)

Now let's run the training.<br>
As you will see, the ROUGE scores increase during training.

> Note : As I have mentioned above, make sure that you have enough disk space.

In [None]:
trainer.train()

| Step | Training Loss | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|---|---|---|---|---|---|---|
| 100 | 3.3241 | 2.479476 | 0.311584 | 0.152303 | 0.262827 | 0.263187 |
| 200 | 2.7583 | 2.270957 | 0.322364 | 0.14529 | 0.272528 | 0.272215 |
| 300 | 2.5469 | 2.293579 | 0.359305 | 0.177212 | 0.297784 | 0.296315 |
| 400 | 2.5335 | 2.191272 | 0.335907 | 0.15273 | 0.283324 | 0.28403 |
| 500 | 2.4383 | 2.150708 | 0.378321 | 0.193331 | 0.325641 | 0.324883 |
| 600 | 2.3671 | 2.133783 | 0.355929 | 0.175622 | 0.298318 | 0.298553 |
| 700 | 2.349 | 2.108918 | 0.37839 | 0.183986 | 0.313601 | 0.312135 |
| 800 | 2.264 | 2.135322 | 0.388724 | 0.204001 | 0.323286 | 0.322802 |
| 900 | 2.1577 | 2.110123 | 0.386881 | 0.197202 | 0.326367 | 0.326241 |
| 1000 | 2.1315 | 2.09051 | 0.400456 | 0.209173 | 0.33553 | 0.333564 |


TrainOutput(global_step=1420, training_loss=2.672717551110496, metrics={'train_runtime': 5334.6332, 'train_samples_per_second': 8.565, 'train_steps_per_second': 0.266, 'total_flos': 3.435385056043008e+16, 'train_loss': 2.672717551110496, 'epoch': 9.94})

In order to use it later, you can save the trained model.

In [None]:
import os

os.makedirs("./trained_for_summarization_jp", exist_ok=True)
if hasattr(trainer.model, "module"):
    trainer.model.module.save_pretrained("./trained_for_summarization_jp")
else:
    trainer.model.save_pretrained("./trained_for_summarization_jp")

Load the fine-tuned model from the local folder.

In [None]:
from transformers import AutoModelForSeq2SeqLM

model = (AutoModelForSeq2SeqLM
         .from_pretrained("./trained_for_summarization_jp")
         .to(device))

## Generate Text (Summarize) with Fine-Tuned Model

Now let's see how the fine-tuned model generates summaries.<br>
Here I generate summaries for the test data, which was not seen during training.

> Note : Each article in the XL-Sum dataset is created by removing the first sentence (headline sentence) from the BBC news source, and this first sentence is then used as the summary.<br>
>  For this reason, there may be some mismatches between article and summary in the test data. (Choose appropriate samples for checking.)

In [None]:
from torch.utils.data import DataLoader

# Predict with test data (first 5 rows)
sample_dataloader = DataLoader(
    tokenized_ds["test"].with_format("torch"),
    collate_fn=data_collator,
    batch_size=5)
for batch in sample_dataloader:
    with torch.no_grad():
        preds = model.generate(
            batch["input_ids"].to(device),
            num_beams=15,
            num_return_sequences=1,
            no_repeat_ngram_size=1,
            remove_invalid_values=True,
            max_length=128,
        )
    labels = batch["labels"]
    break

# Replace -100 (see above)
labels = np.where(labels != -100, labels, t5_tokenizer.pad_token_id)

# Convert id tokens to text
text_preds = t5_tokenizer.batch_decode(preds, skip_special_tokens=True)
text_labels = t5_tokenizer.batch_decode(labels, skip_special_tokens=True)

# Show result
print("***** Input's Text *****")
print(ds["test"]["text"][0])
print("***** Summary Text (True Value) *****")
print(text_labels[0])
print("***** Summary Text (Generated Text) *****")
print(text_preds[0])

***** Input's Text *****
မဆောက်ဖြစ်တော့တဲ့ အိုလံပစ်ကွင်း အဲဒီ အားကစား ကွင်းကြီး တည်ဆောက်ဖို့အတွက် ကုန်ကျ စားရိတ် များလွန်းတာကြောင့် ဝေဖန်ခံနေရတာ ဖြစ်ပါတယ်။ အားကစားကွင်း အတွက် တည်ဆောက်စားရိတ် ဒေါ်လာ ၂ ဘီလျံလောက် ကုန်ကျလိမ့်မယ်လို့ ခန့်မှန်းထားတာပါ။ ဂျပန်ဝန်ကြီးချုပ် ရှင်ဇိုအာဘေးက ဒါနဲ့ပတ်သက်လို့ အသေအချာ ပြန်လည် သုံးသပ်ရမယ်လို့ ဆိုပါတယ်။ မစ္စတာအာဘေးက ဆောက်လုပ်စားရိတ်က သိပ်များနေလို့ ပြည်သူကရော၊ အားကစားလောက ကရော ဝေဖန်နေကြတဲ့ အကြောင်း ဒီပုံစံနဲ့ ဆိုရင် အားလုံး ပါဝင်ဆင်နွှဲကြတဲ့ ပြိုင်ပွဲကြီး တခုကျင်းပ သွားဖို့ဆိုတာ မဖြစ်နိုင်ဘူးလို့ သူယူဆတဲ့အကြောင်း ပြောသွားခဲ့ပါတယ်။ ဒီအားကစားကွင်းကို ပိသုကာ ဇာဟာ ဟာဒစ်က ဒီဇိုင်းဆွဲ ပုံစံ ထုတ်ခဲ့တာ ဖြစ်ပြီး သူက ဒီလောက် ကုန်ကျစားရိတ် များတာဟာ သူ့ဒီဇိုင်းကြောင့် ဖြစ်တယ်လို့ ပြောနေကြတာနဲ့ ပတ်သက်လို့ ငြင်းဆို လိုက်ပါတယ်။ ဇာဟာ ဟာဒစ်က ဒေသတွင်းမှာ တည်ဆောက်ရေး စားရိတ်တွေ တက်လာတာနဲ့ စီမံကိန်းပြီးစီးဖို့ ရက်အတိအကျ သတ်မှတ်ထားတာ တွေကြောင့် ဒီလိုဖြစ်ခဲ့တာလို့ ဆိုပါတယ်။ အဲ့ဒီအားကစားကွင်းကို လာမယ့် ၂၀၁၉ ရဂ်ဘီကမ္ဘာ့ဖလား ကျင်းပရင် ဖွင့်လှစ်ဖို့ ရည်မှန်းထားပေမယ့် အချိန်မှီ ပ

In [None]:
print("***** Input's Text *****")
print(ds["test"]["text"][2])
print("***** Summary Text (True Value) *****")
print(text_labels[2])
print("***** Summary Text (Generated Text) *****")
print(text_preds[2])

***** Input's Text *****
ဒေါ်အောင်ဆန်းစုကြည်နဲ့ NLD အမတ်များတွေ့ဆုံပွဲ ဒီနေ့မနက်မှာပြုလုပ်ခဲ့တဲ့ တွေ့ဆုံပွဲမှာ ကိုယ်စားလှယ်တွေအနေနဲ့ ကိုယ့်ရဲ့ ဆွေမျိုးတွေကို အကူအညီ မပေးဖို့နဲ့ အကူအညီပေးခဲ့ပါက ဥပဒေနဲ့အညီအရေးယူသွားမှာဖြစ်ကြောင်း ပြောကြားခဲ့တယ်လို့ တွေ့ ဆုံပွဲကို တက်ရောက်ခဲ့တဲ့ ကိုယ်စားလှယ်တွေက ဘီဘီစီကိုပြောပါတယ်။ NLD ပါတီက လွှတ်တော်ကိုယ်စားလှယ်တွေကို စောင့်ကြည့်ဖို့ အတွက် စည်းကမ်းထိန်းသိမ်းရေး အဖွဲ့ကို လည်း ဖွဲ့စည်းသွားမှာဖြစ်ကြောင်း တွေ့ဆုံပွဲမှာ ပြောကြားခဲ့ပါတယ်။ အမတ်တွေ ဖွဲ့စည်းပုံဥပဒေ ပါတီစည်းမျဉ်းစည်းကမ်းတွေကို လေ့လာရမယ်။ ပါတီရံပုံငွေအတွက် ပြည် ထောင်စုအဆင့်အမတ်ကို ရတဲ့လစာထဲက (၂)သိန်းခွဲဖြတ်တောက်မယ်။ ထိပ်ပိုင်းကလူတွေကို လစာ တဝက် လောက်အထိ ဖြတ်မယ်၊ပြီးတော့ (၅)နှစ်အတွင်း ပိုင်ဆိုင်မှုတွေ အချိန်မရွေးစစ်မယ်လို့တွေ့ဆုံ ပွဲမှာပြောခဲ့တယ်လို့ အမျိုးသားဒီမိုကရေစီအဖွဲ့ချုပ်ပါတီ အနိုင်ရလွှတ်တော်ကိုယ်စားလှယ် ဒေါ်သက်သက် ခိုင်က မီဒီယာတွေကို ပြန်ပြောပြပါတယ်။
***** Summary Text (True Value) *****
ဒေါ်အောင်ဆန်းစုကြည် နဲ့ အနိုင်ရ NLDလွှတ်တော်ကိုယ်စားလှယ်များတွေ့ဆုံပွဲကို ရန်ကုန်မြို့ တော်ဝင်နှင်းဆီခန်

In [None]:
from huggingface_hub import notebook_login

notebook_login()


In [None]:
!pip install --upgrade transformers

Collecting transformers
  Downloading transformers-4.37.2-py3-none-any.whl (8.4 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.35.2
    Uninstalling transformers-4.35.2:
      Successfully uninstalled transformers-4.35.2
Successfully installed transformers-4.37.2


## Push the model to Hugging Face

In [None]:
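# Upload the saved model and the tokenizer to the Hugging Face Hub.
# (Uncomment to run. This requires the login above and uses the repository name "finetune-xlsum-my".)
# from transformers import AutoModelForSeq2SeqLM
#
# # Replace "path_to_your_model_directory" with the path to your saved model directory
# path_to_your_model_directory = "./trained_for_summarization_jp"
#
# # Push the model and the tokenizer to the Hugging Face model hub
# model = AutoModelForSeq2SeqLM.from_pretrained(path_to_your_model_directory)
# model.push_to_hub("finetune-xlsum-my")
# t5_tokenizer.push_to_hub("finetune-xlsum-my")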
