
[QUESTION] OOM when load XCOMET-XXL in A100 with 40G memory for prediction #213

Open
Nie-Yingying opened this issue Apr 24, 2024 · 4 comments
Labels
question Further information is requested

Comments

@Nie-Yingying

❓ Questions and Help

Before asking:

  1. Search for similar issues.
  2. Search the docs.

What is your question?

I can predict scores successfully on CPU only, but when I load the model onto the GPU I get an OOM error.

Code

from comet import download_model, load_from_checkpoint

# model_path = download_model("Unbabel/XCOMET-XXL")
model_path = "./XCOMET-XXL/checkpoints/model.ckpt"
model = load_from_checkpoint(model_path, reload_hparams=True)

data = [
    {
        "src": "Boris Johnson teeters on edge of favour with Tory MPs",
        "mt": "Boris Johnson ist bei Tory-Abgeordneten völlig in der Gunst",
        "ref": "Boris Johnsons Beliebtheit bei Tory-MPs steht auf der Kippe"
    }
]
model_output = model.predict(data, batch_size=1, gpus=1)

# Segment-level scores
print(model_output.scores)

# System-level score
print(model_output.system_score)

# Score explanation (error spans)
print(model_output.metadata.error_spans)

hparams.yaml (screenshot of the hyperparameters attached in the original issue)

What have you tried?

What's your environment?

Name Version Build Channel

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
aiohttp 3.9.5 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
async-timeout 4.0.3 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
ca-certificates 2024.3.11 h06a4308_0
certifi 2024.2.2 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
entmax 1.3 pypi_0 pypi
filelock 3.13.4 pypi_0 pypi
frozenlist 1.4.1 pypi_0 pypi
fsspec 2024.3.1 pypi_0 pypi
huggingface-hub 0.22.2 pypi_0 pypi
idna 3.7 pypi_0 pypi
jinja2 3.1.3 pypi_0 pypi
jsonargparse 3.13.1 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
lightning-utilities 0.11.2 pypi_0 pypi
lxml 5.2.1 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
multidict 6.0.5 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.1 pypi_0 pypi
numpy 1.24.4 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 8.9.2.26 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-nccl-cu12 2.19.3 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
openssl 3.0.13 h7f8727e_0
packaging 24.0 pypi_0 pypi
pandas 2.0.3 pypi_0 pypi
pip 24.0 pypi_0 pypi
portalocker 2.8.2 pypi_0 pypi
protobuf 4.25.3 pypi_0 pypi
python 3.8.19 h955ad1f_0
python-dateutil 2.9.0.post0 pypi_0 pypi
pytorch-lightning 2.2.2 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0
regex 2024.4.16 pypi_0 pypi
requests 2.31.0 pypi_0 pypi
sacrebleu 2.4.2 pypi_0 pypi
safetensors 0.4.3 pypi_0 pypi
scipy 1.10.1 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
setuptools 68.2.2 py38h06a4308_0
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
sympy 1.12 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.15.2 pypi_0 pypi
torch 2.2.2 pypi_0 pypi
torchmetrics 0.10.3 pypi_0 pypi
tqdm 4.66.2 pypi_0 pypi
transformers 4.39.3 pypi_0 pypi
triton 2.2.0 pypi_0 pypi
typing-extensions 4.11.0 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
unbabel-comet 2.2.2 pypi_0 pypi
urllib3 2.2.1 pypi_0 pypi
wheel 0.41.2 py38h06a4308_0
xz 5.4.6 h5eee18b_0
yarl 1.9.4 pypi_0 pypi
zlib 1.2.13 h5eee18b_0

@Nie-Yingying added the question (Further information is requested) label on Apr 24, 2024
@ricardorei (Collaborator)

Hi @Nie-Yingying!

I have a suggestion for running XCOMET-XXL on a 40GB GPU, but it's not integrated yet. In the file comet/encoders/xlmr_xl.py, replace the model init so that it loads in 16-bit:

def __init__(
    self, pretrained_model: str, load_pretrained_weights: bool = True
) -> None:
    super(Encoder, self).__init__()
    self.tokenizer = XLMRobertaTokenizerFast.from_pretrained(pretrained_model)
    if load_pretrained_weights:
        self.model = XLMRobertaXLModel.from_pretrained(
            pretrained_model, add_pooling_layer=False
        )
    else:
        print("Loading model in fp16")
        self.model = XLMRobertaXLModel(
            XLMRobertaXLConfig.from_pretrained(
                pretrained_model, torch_dtype=torch.float16, device_map="auto"
            ),
            add_pooling_layer=False,
        )
    self.model.encoder.output_hidden_states = True
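If patching comet/encoders/xlmr_xl.py is inconvenient, a rough alternative is to cast the already-loaded model to half precision before calling predict. This is only a sketch under the assumption that COMET's Lightning module tolerates fp16 weights at inference time; the thread does not confirm it, and Lightning's trainer may recast or behave differently:

import torch
from comet import load_from_checkpoint

# Hypothetical workaround (untested in this thread): load the checkpoint on CPU,
# cast parameters/buffers to float16, then let predict() move it to the GPU.
model = load_from_checkpoint("./XCOMET-XXL/checkpoints/model.ckpt", reload_hparams=True)
model = model.half()   # torch.nn.Module.half() casts weights to float16
model.eval()

data = [{"src": "...", "mt": "...", "ref": "..."}]
model_output = model.predict(data, batch_size=1, gpus=1)
print(model_output.scores)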

@ricardorei (Collaborator)

This will load the model with half the memory and should solve your problem. I'll integrate this soon.

@vince62s

@ricardorei I did something very similar for the XL: I converted it to fp16 and then changed a single line in feedforward.py.
After that I wanted to go further and use bitsandbytes / HF load_in_8bit / load_in_4bit=True, but the integration between Lightning and HF is a mess.
Lastly, FYI, I did this as a WIP, adapting your code to the existing HF XLM-roberta-XL code: https://huggingface.co/vince62s/wmt23-cometkiwi-da-roberta-xl
We are trying to implement it in CTranslate2 for much faster inference.
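For reference, the HF-side quantized loading alluded to above looks roughly like the sketch below. It is a plain transformers example, not a COMET integration (wiring a quantized encoder into COMET's Lightning code is exactly the part described as messy), and the checkpoint name is an assumption about the underlying encoder:

import torch
from transformers import AutoModel, BitsAndBytesConfig

# Sketch only: standard Hugging Face 8-bit loading with bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
encoder = AutoModel.from_pretrained(
    "facebook/xlm-roberta-xl",       # assumption: the base XLM-RoBERTa-XL encoder
    quantization_config=quant_config,
    device_map="auto",
)
print(sum(p.numel() for p in encoder.parameters()) / 1e9, "B parameters")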

@Nie-Yingying (Author)

> This will load the model with half the memory and should solve your problem. I'll integrate this soon.

Sorry to tell you, but it's still OOM.
(screenshot of the OOM traceback attached in the original issue)
