
# Example of using NorMistral on a GPU with low VRAM

## Initialize the environment

In [1]:
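# bitsandbytes provides 8-bit loading; accelerate is needed for device_map='auto'
!pip install bitsandbytes accelerate

import torch
from google.colab import output  # Colab helper used to clear noisy cell output
from transformers import AutoTokenizer, AutoModelForCausalLM

output.clear()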

## Load the model

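The model weights are quantized to 8 bits while loading. This roughly halves memory use compared with 16-bit weights, at the cost of typically slower generation and possibly slightly lower output quality.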
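As a back-of-the-envelope check, weight memory is roughly the number of parameters times the bytes stored per parameter. The sketch below assumes a nominal 7 billion parameters (the exact count of each checkpoint may differ slightly) and ignores activations, the KV cache and quantization overhead, so real usage will be somewhat higher than these figures.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: a nominal 7e9 parameters; activations, KV cache and
# quantization overhead are ignored.
NUM_PARAMS = 7e9

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,
    "int8": 1,
}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: ~{weight_memory_gb(NUM_PARAMS, dtype):.0f} GB")
```

This prints about 28 GB for float32, 14 GB for bfloat16 and 7 GB for int8. After loading, `model.get_memory_footprint()` reports the actual size of the loaded model in bytes, which can be compared with these estimates.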

In [2]:
model_name = 'norallm/normistral-7b-warm' # @param ["norallm/normistral-7b-warm", "norallm/normistral-7b-scratch", "norallm/norbloom-7b-scratch"]

In [3]:
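from transformers import BitsAndBytesConfig

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',          # let accelerate place the layers on the available GPU
    low_cpu_mem_usage=True,     # avoid building a full extra copy of the weights in RAM
    # 8-bit weights via bitsandbytes; newer transformers versions expect this
    # in quantization_config rather than as a bare load_in_8bit=True argument
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.bfloat16  # dtype for the modules that are not quantized
)
output.clear()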

## Define a prompt for zero-shot machine translation

In [4]:
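# Zero-shot prompt template: the model is expected to continue the last line
# with a translation of source_text into target_language
prompt = """{source_language}: {source_text}
{target_language}:"""

# Greedily generate a translation and return it as a single line of text
@torch.no_grad()
def generate(input_dict):
    text = prompt.format(**input_dict)
    # Put the prompt on the same device as the model
    input_ids = tokenizer(text, return_tensors='pt').input_ids.to(model.device)
    prediction = model.generate(
        input_ids,
        max_new_tokens=256,                     # hard cap on output length; long inputs may be cut off
        do_sample=False,                        # greedy decoding, so the output is deterministic
        eos_token_id=tokenizer('\n').input_ids  # stop on any token id that '\n' tokenizes to
    )
    # Keep only the newly generated tokens (drop the prompt) and trim whitespace
    decoded_prediction = tokenizer.decode(prediction[0, input_ids.size(1):]).strip()
    output.clear()  # clear any warnings printed during generation (Colab only)

    return decoded_prediction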
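To see exactly what the model receives, the template can be filled in without loading any model. The sketch below also mimics, on plain strings, the post-processing that `generate` performs: keep only the text after the prompt and end it at the first newline. The names `template`, `example`, `extract_translation` and `fake_output` are illustrative only, and the placeholder output is made up, not a real NorMistral translation. Note that the real function works on token ids and stops during generation rather than cutting the text afterwards.

```python
# Same zero-shot template as in the notebook (a copy, so this cell runs on its own)
template = """{source_language}: {source_text}
{target_language}:"""

example = {
    "source_language": "Engelsk",
    "source_text": "Judges were responsible for determining both questions of law and questions of fact.",
    "target_language": "Nynorsk",
}

# The exact text the tokenizer receives
text = template.format(**example)
print(text)

def extract_translation(full_output: str, prompt_text: str) -> str:
    """String-level stand-in for the post-processing in generate():
    drop the prompt, keep the first line of the continuation, trim whitespace."""
    continuation = full_output[len(prompt_text):]
    first_line = continuation.split("\n", 1)[0]
    return first_line.strip()

# Hypothetical model output: the prompt plus a placeholder continuation
fake_output = text + " [translation]\nEngelsk: more text"
print(extract_translation(fake_output, text))  # prints "[translation]"
```

Because the prompt ends right after the target-language label, the model continues that line with the translation, and treating the newline as end of sequence keeps it from going on to write further lines. It also means only the first line of a translation is returned, so multi-paragraph input is probably best translated one paragraph at a time.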

## Translate!

In [11]:
source_text = "In contrast, as mentioned earlier, in the civil law system, legal language is more formal and precise, while in the common law system, legal language is more flexible and adaptable. Court System: In Roman law, the court system was organized into a hierarchy of courts, with the emperor as the ultimate authority. The courts were responsible for interpreting and applying the legal code, and their decisions were subject to review by higher courts. In contrast, as mentioned earlier, in the civil law system, the court system is typically organized into a hierarchy of courts, while in the common law system, the court system is more complex, with both federal and state courts. Jury Trials: In Roman law, jury trials were not a feature of the legal system. Judges were responsible for determining both questions of law and questions of fact. In contrast, as mentioned earlier, in the civil law system, jury trials are rare, while in the common law system, jury trials are a fundamental feature of the legal system. Precedent: In Roman law, prior decisions were not binding on future courts. However, the writings of Roman jurists and the decisions of the emperor were highly influential and often followed by later courts. In contrast, as mentioned earlier, in the civil law system, prior decisions are not binding on future courts, while in the common law system, prior decisions are binding on future courts through the doctrine of stare decisis. In summary, Roman law is characterized by a comprehensive legal code, an active role for judges, formal legal language, a hierarchical court system, no jury trials, and influential but not binding prior decisions. Both the civil law and common law systems have roots in Roman law, but they have evolved differently over time, with the civil law system more closely resembling Roman law in some respects, such as its comprehensive legal code and formal legal language." # @param {type:"string"}
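# Language names are in Norwegian (Engelsk = English) because they are inserted verbatim into the prompt
source_language = "Engelsk" # @param ["Engelsk", "Bokmål", "Nynorsk"]
target_language = "Nynorsk" # @param ["Engelsk", "Bokmål", "Nynorsk"]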

In [12]:
output_text = generate({
    "source_text": source_text,
    "source_language": source_language,
    "target_language": target_language
})

print(output_text)


