---
**Simple LLM Inference: Playing with Language Models on Intel® Data Center Max Series GPUs**

**Note**: Please select the "PyTorch GPU" kernel when running this notebook.

Hello and welcome! Are you curious about how computers understand and generate human-like text? Do you want to play around with text generation without getting too technical? Then you've come to the right place.

Large Language Models (LLMs) have a wide range of applications, but they can also be fun to experiment with. Here, we'll use some simple pre-trained models to explore text generation interactively.

Powered by Intel® Data Center GPU Max 1100s, this notebook provides a hands-on experience that doesn't require deep technical knowledge. Whether you're a student, writer, educator, or just curious about AI, this guide is designed for you.

Ready to try it out? Let's set up our environment and start exploring the world of text generation with LLMs!


In [1]:
# Required packages; install them if missing (PyTorch* and Intel® Extension for PyTorch* are assumed to be present)

import sys
import site
from pathlib import Path

!echo "Installation in progress..."
# !{sys.executable} -m pip install torch==2.1.0.post3 torchvision==0.16.0.post3 torchaudio==2.1.0.post3 intel-extension-for-pytorch==2.1.40+xpu oneccl_bind_pt==2.1.400+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
# !{sys.executable} -m pip install numpy==1.26.4
!{sys.executable} -m pip install -U transformers==4.46.0 accelerate==0.34.0 --no-warn-script-location && echo "Installation successful" || echo "Installation failed"

def get_python_version():
    return "python" + ".".join(map(str, sys.version_info[:2]))

def set_local_bin_path():
    local_bin = str(Path.home() / ".local" / "bin") 
    local_site_packages = str(
        Path.home() / ".local" / "lib" / get_python_version() / "site-packages"
    )
    sys.path.append(local_bin)
    sys.path.insert(0, site.getusersitepackages())
    sys.path.insert(0, sys.path.pop(sys.path.index(local_site_packages)))

set_local_bin_path()

Installation in progress...
Defaulting to user installation because normal site-packages is not writeable
Collecting transformers==4.46.0
  Using cached transformers-4.46.0-py3-none-any.whl.metadata (44 kB)
Collecting accelerate==0.34.0
  Using cached accelerate-0.34.0-py3-none-any.whl.metadata (19 kB)
Collecting tokenizers<0.21,>=0.20 (from transformers==4.46.0)
  Using cached tokenizers-0.20.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Reason for being yanked: This version unfortunately does not work with 3.8 but we did not drop the support yet
Using cached transformers-4.46.0-py3-none-any.whl (10.0 MB)
Using cached accelerate-0.34.0-py3-none-any.whl (324 kB)
Using cached tokenizers-0.20.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Installing collected packages: tokenizers, accelerate, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.15.2
    Uninstalling token

In [2]:
import logging
import os
import random
import re

os.environ["SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS"] = "1"
os.environ["ENABLE_SDP_FUSION"] = "1"
import warnings

# Suppress warnings for a cleaner output
warnings.filterwarnings("ignore")

import torch
import intel_extension_for_pytorch as ipex

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import LlamaTokenizer, LlamaForCausalLM
from transformers import BertTokenizer, BertForSequenceClassification

from ipywidgets import VBox, HBox, Button, Dropdown, IntSlider, FloatSlider, Text, Output, Label, Layout
import ipywidgets as widgets
from ipywidgets import HTML


# random seed
if torch.xpu.is_available():
    seed = 88
    random.seed(seed)
    torch.xpu.manual_seed(seed)
    torch.xpu.manual_seed_all(seed)

def select_device(preferred_device=None):
    """
    Selects the best available XPU device or the preferred device if specified.

    Args:
        preferred_device (str, optional): Preferred device string (e.g., "cpu", "xpu", "xpu:0", "xpu:1", etc.). If None, a random available XPU device will be selected or CPU if no XPU devices are available.

    Returns:
        torch.device: The selected device object.
    """
    try:
        if preferred_device and preferred_device.startswith("cpu"):
            print("Using CPU.")
            return torch.device("cpu")
        if preferred_device and preferred_device.startswith("xpu"):
            if preferred_device == "xpu" or (
                ":" in preferred_device
                and int(preferred_device.split(":")[1]) >= torch.xpu.device_count()
            ):
                preferred_device = (
                    None  # Handle as if no preferred device was specified
                )
            else:
                device = torch.device(preferred_device)
                if device.type == "xpu" and device.index < torch.xpu.device_count():
                    vram_used = torch.xpu.memory_allocated(device) / (
                        1024**2
                    )  # In MB
                    print(
                        f"Using preferred device: {device}, VRAM used: {vram_used:.2f} MB"
                    )
                    return device

        if torch.xpu.is_available():
            device_id = random.choice(
                range(torch.xpu.device_count())
            )  # Select a random available XPU device
            device = torch.device(f"xpu:{device_id}")
            vram_used = torch.xpu.memory_allocated(device) / (1024**2)  # In MB
            print(f"Selected device: {device}, VRAM used: {vram_used:.2f} MB")
            return device
    except Exception as e:
        print(f"An error occurred while selecting the device: {e}")
    print("No XPU devices available or preferred device not found. Using CPU.")
    return torch.device("cpu")


  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::_cummax_helper(Tensor self, Tensor(a!) values, Tensor(b!) indices, int dim) -> ()
    registered at /build/pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: XPU
  previous kernel: registered at /build/pytorch/build/aten/src/ATen/RegisterCPU.cpp:30476
       new kernel: registered at /build/intel-pytorch-extension/build/Release/csrc/gpu/csrc/aten/generated/ATen/RegisterXPU.cpp:2971 (function operator())
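
The fallback rules in `select_device` can be tried out without any GPU. The sketch below re-implements only the string handling in plain Python. Note that `resolve_device_name` and its `device_count` argument are hypothetical stand-ins (for the real function and for `torch.xpu.device_count()`), and that this simplified version picks device 0 instead of a random device so the result is repeatable.

```python
def resolve_device_name(preferred, device_count):
    """Return the device string a select_device-style policy would settle on.

    Hypothetical helper: `device_count` stands in for torch.xpu.device_count().
    """
    if preferred and preferred.startswith("cpu"):
        return "cpu"
    if device_count == 0:
        return "cpu"  # no accelerator present, fall back to CPU
    if preferred and preferred.startswith("xpu") and ":" in preferred:
        index_text = preferred.split(":", 1)[1]
        if index_text.isdigit() and int(index_text) < device_count:
            return f"xpu:{int(index_text)}"  # valid explicit index
    # Bare "xpu", an out-of-range index, or no preference: default to device 0.
    return "xpu:0"


print(resolve_device_name("xpu:1", device_count=2))
print(resolve_device_name("xpu:5", device_count=2))
print(resolve_device_name("xpu", device_count=0))
```

An out-of-range request such as `"xpu:5"` on a two-device machine is treated like no preference at all, which mirrors how the notebook falls back to any available XPU.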


---
**A Glimpse Into Text Generation with Language Models**

If you're intrigued by how machines can generate human-like text, let's take a closer look at the underlying code. Even if you're not technically inclined, this section will provide a high-level understanding of how it all works.

- **Class Definition**: The `ChatBotModel` class is the core of our text generation. It handles the setup, optimization, and interaction with the LLM (Large Language Model).

- **Initialization**: When you create an instance of this class, you can specify the model's identifier or path and the data type. The device is chosen automatically (Intel's "xpu" device if available, otherwise the CPU). There's also an option to optimize the model for Intel GPUs using Intel® Extension for PyTorch (IPEX).

- **Input Preparation**: The `prepare_input` method ensures that the input doesn't exceed the maximum length and combines the previous text with the user input, if required.

- **Output Generation**: The `gen_output` method takes the prepared input and several parameters controlling the generation process, like temperature, top_p, top_k, etc., and produces the output token IDs, which are then decoded into the text response.

- **Warm-up**: Before the main interactions, the `warmup_model` method helps in "warming up" the model to make subsequent runs faster.

- **Text Processing**: Several methods like `unique_sentences`, `remove_repetitions`, and `extract_bot_response` handle the text processing to ensure the generated text is readable and free from repetitions or unnecessary echoes.
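
To make the input-preparation step more concrete, here is a minimal sketch of the same idea: join the history with the new message, then keep only the most recent tokens that fit once room has been reserved for the reply. It is a toy under stated assumptions: whitespace splitting stands in for the real tokenizer, and `prepare_prompt_tokens` is a hypothetical standalone function, not the class method.

```python
def prepare_prompt_tokens(previous_text, user_input, max_length=256, response_buffer=100):
    """Build the prompt and keep only the most recent tokens that fit.

    Toy sketch: whitespace splitting stands in for the real tokenizer.
    """
    combined_text = previous_text + "\nUser: " + user_input + "\nBot: "
    tokens = combined_text.split()
    budget = max_length - response_buffer  # room left for the prompt
    # Keep the tail so the newest turns survive when the history is long.
    return tokens[-budget:] if len(tokens) > budget else tokens


history = " ".join(f"word{i}" for i in range(500))
tokens = prepare_prompt_tokens(history, "What is an XPU?")
print(len(tokens), tokens[-5:])
```

Truncating from the front means the oldest parts of the conversation are forgotten first, while the latest question always reaches the model.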

Feel free to explore the code and play around with different parameters. Remember, this is a simple and interactive way to experiment with text generation. It's not a cutting-edge chatbot, but rather a playful tool to engage with language models. Enjoy the journey into the world of LLMs, using Intel® Data Center GPU Max 1100s!
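
One detail worth knowing if you experiment with the text-processing helpers: removing duplicate sentences with a plain `set` also discards their order. A minimal order-preserving sketch, using a hypothetical standalone function `dedupe_sentences` rather than the class method, could look like this:

```python
def dedupe_sentences(text):
    """Remove repeated sentences while keeping first occurrences in order."""
    # Split on ". " and drop surrounding whitespace and any trailing period.
    parts = [p.strip().rstrip(".") for p in text.split(". ")]
    parts = [p for p in parts if p]
    # dict.fromkeys keeps insertion order, unlike set().
    unique = list(dict.fromkeys(parts))
    return ". ".join(unique) + "." if unique else ""


print(dedupe_sentences("I like tea. I like tea. Tea is warm."))
```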


In [3]:
MODEL_CACHE_PATH = "/home/common/data/Big_Data/GenAI/llm_models"
class ChatBotModel:
    """
    ChatBotModel is a class for generating responses based on text prompts using a pretrained model.

    Attributes:
    - device: The device to run the model on. Default is "xpu" if available, otherwise "cpu".
    - model: The loaded model for text generation.
    - tokenizer: The loaded tokenizer for the model.
    - torch_dtype: The data type to use in the model.
    """

    def __init__(
        self,
        model_id_or_path: str = "openlm-research/open_llama_3b_v2",  # "Writer/camel-5b-hf",
        torch_dtype: torch.dtype = torch.bfloat16,
        optimize: bool = True,
    ) -> None:
        """
        The initializer for ChatBotModel class.

        Parameters:
        - model_id_or_path: The identifier or path of the pretrained model.
        - torch_dtype: The data type to use in the model. Default is torch.bfloat16.
        - optimize: If True, ipex is used to optimize the model.
        """
        self.torch_dtype = torch_dtype
        self.device = select_device("xpu")
        self.model_id_or_path = model_id_or_path
        local_model_id = self.model_id_or_path.replace("/", "--")
        local_model_path = os.path.join(MODEL_CACHE_PATH, local_model_id)

        # select_device always returns a torch.device, so checking its type is sufficient.
        if self.device.type == "xpu":
            self.autocast = torch.xpu.amp.autocast
        else:
            self.autocast = torch.cpu.amp.autocast
        try:
            if "llama" in model_id_or_path:
                self.tokenizer = LlamaTokenizer.from_pretrained(local_model_path)
                self.model = (
                    LlamaForCausalLM.from_pretrained(
                        local_model_path,
                        low_cpu_mem_usage=True,
                        torch_dtype=self.torch_dtype,
                    )
                    .to(self.device)
                    .eval()
                )
            else:
                self.tokenizer = AutoTokenizer.from_pretrained(
                    local_model_path, trust_remote_code=True
                )
                self.model = (
                    AutoModelForCausalLM.from_pretrained(
                        local_model_path,
                        low_cpu_mem_usage=True,
                        trust_remote_code=True,
                        torch_dtype=self.torch_dtype,
                    )
                    .to(self.device)
                    .eval()
                )
        except (OSError, ValueError, EnvironmentError) as e:
            logging.info(
                f"Tokenizer / model not found locally. Downloading tokenizer / model for {self.model_id_or_path} to cache...: {e}"
            )
            if "llama" in model_id_or_path:
                self.tokenizer = LlamaTokenizer.from_pretrained(self.model_id_or_path)
                self.model = (
                    LlamaForCausalLM.from_pretrained(
                        self.model_id_or_path,
                        low_cpu_mem_usage=True,
                        torch_dtype=self.torch_dtype,
                    )
                    .to(self.device)
                    .eval()
                )
            else:
                self.tokenizer = AutoTokenizer.from_pretrained(
                    self.model_id_or_path, trust_remote_code=True
                )
                self.model = (
                    AutoModelForCausalLM.from_pretrained(
                        self.model_id_or_path,
                        low_cpu_mem_usage=True,
                        trust_remote_code=True,
                        torch_dtype=self.torch_dtype,
                    )
                    .to(self.device)
                    .eval()
                )
            
        self.max_length = 256

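        if optimize:
            # The ipex optimize functions return the optimized model, so keep the result.
            if hasattr(ipex, "optimize_transformers"):
                try:
                    self.model = ipex.optimize_transformers(self.model, dtype=self.torch_dtype)
                except Exception:
                    self.model = ipex.optimize(self.model, dtype=self.torch_dtype)
            else:
                self.model = ipex.optimize(self.model, dtype=self.torch_dtype)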

    def prepare_input(self, previous_text, user_input):
        """Prepare the input for the model, ensuring it doesn't exceed the maximum length."""
        response_buffer = 100
        user_input = (
            "Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{user_input}\n\n### Response:"
        )
        combined_text = previous_text + "\nUser: " + user_input + "\nBot: "
        input_ids = self.tokenizer.encode(
            combined_text, return_tensors="pt", truncation=False
        )
        adjusted_max_length = self.max_length - response_buffer
        if input_ids.shape[1] > adjusted_max_length:
            input_ids = input_ids[:, -adjusted_max_length:]
        return input_ids.to(device=self.device)

    def gen_output(
        self, input_ids, temperature, top_p, top_k, num_beams, repetition_penalty
    ):
        """
        Generate the output text based on the given input IDs and generation parameters.

        Args:
            input_ids (torch.Tensor): The input tensor containing token IDs.
            temperature (float): The temperature for controlling randomness in Boltzmann distribution.
                                Higher values increase randomness, lower values make the generation more deterministic.
            top_p (float): The cumulative probability threshold for nucleus sampling.
                           Lower values restrict sampling to the most likely tokens; higher values allow more diversity.
            top_k (int): The number of highest probability vocabulary tokens to keep for top-k-filtering.
            num_beams (int): The number of beams for beam search. Controls the breadth of the search.
            repetition_penalty (float): The penalty applied for repeating tokens.

        Returns:
            torch.Tensor: The generated output tensor.
        """
        print(f"Using max length: {self.max_length}")
        with self.autocast(
            enabled=self.torch_dtype != torch.float32,
            dtype=self.torch_dtype,
        ):
            with torch.no_grad():
                output = self.model.generate(
                    input_ids,
                    pad_token_id=self.tokenizer.eos_token_id,
                    max_length=self.max_length,
                    # Sampling must be enabled for temperature, top_p and top_k to take effect;
                    # a temperature of 0 falls back to deterministic beam search.
                    do_sample=temperature > 0,
                    temperature=temperature,
                    top_p=top_p,
                    top_k=top_k,
                    num_beams=num_beams,
                    repetition_penalty=repetition_penalty,
                )
                return output

    def warmup_model(
        self, temperature, top_p, top_k, num_beams, repetition_penalty
    ) -> None:
        """
        Warms up the model by generating a sample response.
        """
        sample_prompt = """A dialog, where User interacts with a helpful Bot.
        Bot is helpful, kind, obedient, honest, and knows its own limits.
        User: Hello, Bot.
        Bot: Hello! How can I assist you today?
        """
        input_ids = self.tokenizer(sample_prompt, return_tensors="pt").input_ids.to(
            device=self.device
        )
        _ = self.gen_output(
            input_ids,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            num_beams=num_beams,
            repetition_penalty=repetition_penalty,
        )

    def strip_response(self, generated_text):
        """Remove ### Response: from string if exists."""
        match = re.search(r'### Response:(.*)', generated_text, re.S)
        if match:
            return match.group(1).strip()
        return generated_text
        
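    def unique_sentences(self, text: str) -> str:
        """Drop duplicate sentences while preserving their original order."""
        sentences = text.split(". ")
        # Discard a trailing fragment that does not end in a period (likely truncated).
        if sentences[-1] and sentences[-1][-1] != ".":
            sentences = sentences[:-1]
        # Strip the final period so it is not doubled when the sentences are re-joined.
        sentences = [s.rstrip(".") for s in sentences if s]
        # dict.fromkeys de-duplicates while keeping insertion order (a set would not).
        sentences = list(dict.fromkeys(sentences))
        return ". ".join(sentences) + "." if sentences else ""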

    def remove_repetitions(self, text: str, user_input: str) -> str:
        """
        Remove repetitive sentences or phrases from the generated text and avoid echoing user's input.

        Args:
            text (str): The input text with potential repetitions.
            user_input (str): The user's original input to check against echoing.

        Returns:
            str: The processed text with repetitions and echoes removed.
        """
        text = re.sub(re.escape(user_input), "", text, count=1).strip()
        text = self.unique_sentences(text)
        return text

    def extract_bot_response(self, generated_text: str) -> str:
        """
        Extract the text that follows the first "Bot:" marker in the generated text.

        Newlines are flattened to ". " first, so everything after the marker is
        returned as a single line.

        Args:
            generated_text (str): The full generated text from the model.

        Returns:
            str: The text after "Bot:", or the input with leading non-alphanumeric
                 characters stripped if no marker is found.
        """
        prefix = "Bot:"
        generated_text = generated_text.replace("\n", ". ")
        bot_response_start = generated_text.find(prefix)
        if bot_response_start != -1:
            # No newline can remain after the replacement above, so take the rest of the text.
            return generated_text[bot_response_start + len(prefix):].strip()
        return re.sub(r'^[^a-zA-Z0-9]+', '', generated_text)

    def interact(
        self,
        out: Output,  # Output widget to display the conversation
        with_context: bool = True,
        temperature: float = 0.10,
        top_p: float = 0.95,
        top_k: int = 40,
        num_beams: int = 3,
        repetition_penalty: float = 1.80,
    ) -> None:
        """
        Handle the chat loop where the user provides input and receives a model-generated response.

        Args:
            out (Output): The output widget in which the conversation is displayed.
            with_context (bool): Whether to consider previous interactions in the session. Default is True.
            temperature (float): The temperature for controlling randomness in Boltzmann distribution.
                                 Higher values increase randomness, lower values make the generation more deterministic.
            top_p (float): The cumulative probability threshold for nucleus sampling.
                           Lower values restrict sampling to the most likely tokens; higher values allow more diversity.
            top_k (int): The number of highest probability vocabulary tokens to keep for top-k-filtering.
            num_beams (int): The number of beams for beam search. Controls the breadth of the search.
            repetition_penalty (float): The penalty applied for repeating tokens.
        """
        previous_text = ""
    
        def display_user_input_widgets():
            default_color = "\033[0m"
            user_color, user_icon = "\033[94m", "😀 "
            bot_color, bot_icon = "\033[92m", "🤖 "
            user_input_widget = Text(placeholder="Type your message here...", layout=Layout(width='80%'))
            send_button = Button(description="Send", button_style = "primary", layout=Layout(width='10%'))
            chat_spin = HTML(value = "")
            spin_style = """
            <div class="loader"></div>
            <style>
            .loader {
              border: 5px solid #f3f3f3;
              border-radius: 50%;
              border-top: 5px solid #3498db;
              width: 8px;
              height: 8px;
              animation: spin 3s linear infinite;
            }
            @keyframes spin {
              0% { transform: rotate(0deg); }
              100% { transform: rotate(360deg); }
            }
            </style>
            """
            display(HBox([chat_spin, user_input_widget, send_button, ]))
            
            def on_send(button):
                nonlocal previous_text
                orig_input = ""
                user_input = user_input_widget.value
                with out:
                    print(f" {user_color}{user_icon}You: {user_input}{default_color}")
                if user_input.lower() == "exit":
                    # Leave the chat before the busy spinner is shown, so it is not left running.
                    return
                send_button.button_style = "warning"
                chat_spin.value = spin_style
                if "camel" in self.model_id_or_path:
                    orig_input = user_input
                    user_input = (
                        "Below is an instruction that describes a task. "
                        "Write a response that appropriately completes the request.\n\n"
                        f"### Instruction:\n{user_input}\n\n### Response:"
                    )
                if with_context:
                    self.max_length = 256
                    input_ids = self.prepare_input(previous_text, user_input)
                else:
                    self.max_length = 96
                    input_ids = self.tokenizer.encode(user_input, return_tensors="pt").to(self.device)
    
                output_ids = self.gen_output(
                    input_ids,
                    temperature=temperature,
                    top_p=top_p,
                    top_k=top_k,
                    num_beams=num_beams,
                    repetition_penalty=repetition_penalty,
                )
                generated_text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
                generated_text = self.strip_response(generated_text)
                generated_text = self.extract_bot_response(generated_text)
                generated_text = self.remove_repetitions(generated_text, user_input)
                send_button.button_style = "success"
                chat_spin.value = ""

                with out:
                    if orig_input:
                        user_input = orig_input
                    print(f" {bot_color}{bot_icon}Bot: {generated_text}{default_color}")    
                if with_context:
                    previous_text += "\nUser: " + user_input + "\nBot: " + generated_text
                user_input_widget.value = "" 
                display_user_input_widgets()
            send_button.on_click(on_send)
        display_user_input_widgets()

---
**Setting Up the Interactive Text Generation Interface**

In the next section, we'll create an interactive text generation interface right here in this notebook. This will enable you to select a model, provide a prompt, and tweak various parameters without touching the code itself.

- **Model Selection**: Choose from the available pre-trained models hosted on the Hugging Face Hub. To try a different model, add its identifier to the `models` list in the code.
- **Interaction Mode**: Decide whether to interact with or without context, allowing the model to remember previous interactions or treat each input independently.
- **Temperature**: Adjust this parameter to control the randomness in text generation. Higher values increase creativity; lower values make the generation more deterministic.
- **Top_p, Top_k**: Play with these parameters to influence the diversity and quality of the generated text.
- **Number of Beams**: Control the breadth of the search in text generation.
- **Repetition Penalty**: Modify this to prevent or allow repeated phrases and sentences.
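
If you'd like to see what temperature, top-k and top-p actually do to a probability distribution, here is a small sketch in plain Python. It is a simplification and not the code the models run: the four logits are made-up scores for a toy vocabulary, the helper names are hypothetical, and the nucleus filter here works on the original probabilities without renormalizing after top-k. Temperature must be greater than zero in this sketch.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(probs, top_k, top_p):
    """Return the token indices that survive top-k, then nucleus (top-p) filtering."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:  # top_k == 0 means "no top-k limit"
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # smallest set whose mass reaches top_p
            break
    return kept


logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a 4-token vocabulary
for t in (0.1, 0.71, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], top_k_top_p_filter(probs, top_k=3, top_p=0.95))
```

At a low temperature nearly all of the probability lands on the top token, so only that token survives the filters; at higher temperatures more candidates remain in play.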

Once you've set your preferences, you can start the interaction and even reset or reload the model to try different settings. Let's set this up and explore the playful world of text generation using Intel® Data Center GPU Max 1100s!
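
The repetition penalty from the list above can be illustrated in the same spirit. The sketch below follows the commonly used rule (to our understanding, the one the Hugging Face generation code also applies): scores of tokens that have already been generated are divided by the penalty when positive and multiplied by it when negative. The function name and the toy logits are assumptions made for illustration.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Penalize tokens that were already generated.

    Positive scores are divided by the penalty and negative scores multiplied,
    so a penalty above 1.0 always makes a repeated token less likely.
    """
    if penalty <= 0:
        raise ValueError("penalty must be strictly positive")
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


logits = [3.6, -1.8, 0.9, 2.0]  # toy scores; tokens 0 and 1 were already generated
print(apply_repetition_penalty(logits, generated_ids=[0, 1, 0], penalty=1.8))
```

A penalty of 1.0 leaves the scores unchanged, while values below 1.0 make repeats more likely.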



In [4]:
model_cache = {}

from ipywidgets import HTML
def interact_with_llm():
    models = ["Writer/camel-5b-hf", 
              "openlm-research/open_llama_3b_v2",
              "Intel/neural-chat-7b-v3", 
              "Intel/neural-chat-7b-v3-1", # https://huggingface.co/Intel/neural-chat-7b-v3-1 - check out the prompting template on the site to get better responses.
              "HuggingFaceH4/zephyr-7b-beta", 
              "tiiuae/falcon-7b"
             ]
    interaction_modes = ["Interact with context", "Interact without context"]
    model_dropdown = Dropdown(options=models, value=models[0], description="Model:")
    interaction_mode = Dropdown(options=interaction_modes, value=interaction_modes[1], description="Interaction:")
    temperature_slider = FloatSlider(value=0.71, min=0, max=1, step=0.01, description="Temperature:")
    top_p_slider = FloatSlider(value=0.95, min=0, max=1, step=0.01, description="Top P:")
    top_k_slider = IntSlider(value=40, min=0, max=100, step=1, description="Top K:")
    num_beams_slider = IntSlider(value=3, min=1, max=10, step=1, description="Num Beams:")
    # The repetition penalty must be strictly positive, so the slider starts at 0.1 rather than 0.
    repetition_penalty_slider = FloatSlider(value=1.80, min=0.1, max=2, step=0.1, description="Rep Penalty:")
    
    out = Output()    
    left_panel = VBox([model_dropdown, interaction_mode], layout=Layout(margin="0px 20px 10px 0px"))
    right_panel = VBox([temperature_slider, top_p_slider, top_k_slider, num_beams_slider, repetition_penalty_slider],
                       layout=Layout(margin="0px 0px 10px 20px"))
    user_input_widgets = HBox([left_panel, right_panel], layout=Layout(margin="0px 50px 10px 0px"))
    spinner = HTML(value="")
    start_button = Button(description="Start Interaction!", button_style="primary")
    start_button_spinner = HBox([start_button, spinner])
    start_button_spinner.layout.margin = '0 auto'
    display(user_input_widgets)
    display(start_button_spinner)
    display(out)
    
    def on_start(button):
        start_button.button_style = "warning"
        start_button.description = "Loading..."
        spinner.value = """
        <div class="loader"></div>
        <style>
        .loader {
          border: 5px solid #f3f3f3;
          border-radius: 50%;
          border-top: 5px solid #3498db;
          width: 16px;
          height: 16px;
          animation: spin 3s linear infinite;
        }
        @keyframes spin {
          0% { transform: rotate(0deg); }
          100% { transform: rotate(360deg); }
        }
        </style>
        """
        out.clear_output()
        with out:
            print("\nSetting up the model, please wait...")
        #out.clear_output()
        model_choice = model_dropdown.value
        with_context = interaction_mode.value == interaction_modes[0]
        temperature = temperature_slider.value
        top_p = top_p_slider.value
        top_k = top_k_slider.value
        num_beams = num_beams_slider.value
        repetition_penalty = repetition_penalty_slider.value
        model_key = (model_choice, "xpu")
        if model_key not in model_cache:
            model_cache[model_key] = ChatBotModel(model_id_or_path=model_choice)
        bot = model_cache[model_key]
        #if model_key not in model_cache:
        #    bot.warmup_model(
        #        temperature=temperature,
        #        top_p=top_p,
        #        top_k=top_k,
        #        num_beams=num_beams,
        #        repetition_penalty=repetition_penalty,
        #    )
        
        with out:
            start_button.button_style = "success"
            start_button.description = "Refresh"
            spinner.value = ""
            print("Ready!")
            print("\nNote: This is a demonstration using pretrained models which were not fine-tuned for chat.")
            print("If the bot doesn't respond, try clicking on refresh.\n")
        try:
            with out:
                bot.interact(
                    with_context=with_context,
                    out=out,
                    temperature=temperature,
                    top_p=top_p,
                    top_k=top_k,
                    num_beams=num_beams,
                    repetition_penalty=repetition_penalty,
                )
        except Exception as e:
            with out:
                print(f"An error occurred: {e}")

    start_button.on_click(on_start)
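
The `model_cache` dictionary is what makes the "Refresh" button fast: a model is loaded once per `(model, device)` key and reused afterwards. Here is a minimal sketch of that pattern, where `fake_load` is a hypothetical stand-in for constructing a `ChatBotModel` so that the example runs without downloading anything.

```python
model_cache = {}
load_calls = []


def fake_load(model_id):
    """Hypothetical stand-in for constructing ChatBotModel(model_id_or_path=...)."""
    load_calls.append(model_id)
    return {"model_id": model_id}


def get_model(model_id, device="xpu"):
    """Return a cached model for (model_id, device), loading it on first use."""
    key = (model_id, device)
    if key not in model_cache:
        model_cache[key] = fake_load(model_id)
    return model_cache[key]


first = get_model("openlm-research/open_llama_3b_v2")
second = get_model("openlm-research/open_llama_3b_v2")
print(first is second, len(load_calls))
```

Keep in mind that every cached model stays in memory, so switching between several large models in one session may eventually exhaust the available device memory.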


---
**Let's Dive In and Have Some Fun with LLM Models!**

Ready for a playful interaction with some interesting LLM models? The interface below lets you choose from different models and settings. Just select your preferences, click the "Start Interaction!" button, and you're ready to chat.

You can ask questions, make statements, or simply explore how the model responds to different inputs. It's a friendly way to get acquainted with AI and see what it has to say.

Remember, this is all in good fun, and the models are here to engage with you. So go ahead, start a conversation, and enjoy the interaction!

In [5]:
interact_with_llm()


## Language Models Disclaimer and Information

### Camel-5B
- **Model card:** [Camel-5B](https://huggingface.co/Writer/camel-5b-hf)
- **License:** Apache 2.0
- **Reference:**
    ```bibtex
    @misc{Camel,
      author = {Writer Engineering team},
      title = {{Camel-5B InstructGPT}},
      howpublished = {\url{https://dev.writer.com}},
      year = 2023,
      month = April 
    }
    ```

### OpenLLaMA 3b v2
- **Model card:** [OpenLLaMA 3b v2](https://huggingface.co/openlm-research/open_llama_3b_v2)
- **License:** Apache 2.0
- **References:**
    ```bibtex
    @software{openlm2023openllama,
      author = {Geng, Xinyang and Liu, Hao},
      title = {OpenLLaMA: An Open Reproduction of LLaMA},
      month = May,
      year = 2023,
      url = {https://github.com/openlm-research/open_llama}
    }
    @software{together2023redpajama,
      author = {Together Computer},
      title = {RedPajama-Data: An Open Source Recipe to Reproduce LLaMA training dataset},
      month = April,
      year = 2023,
      url = {https://github.com/togethercomputer/RedPajama-Data}
    }
    @article{touvron2023llama,
      title={Llama: Open and efficient foundation language models},
      author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and others},
      journal={arXiv preprint arXiv:2302.13971},
      year={2023}
    }
    ```

### Falcon 7B

- **Model card:** [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b)
- **License:** Apache 2.0
- **References:**
    ```bibtex
    @article{falcon40b,
      title = {{Falcon-40B}: an open large language model with state-of-the-art performance},
      author = {Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme},
      year = {2023}
    }
    ```

### Zephyr 7B

- **Model card:** [Zephyr 7B](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
- **License:** MIT
- **References:**
    ```bibtex
    @misc{alignment_handbook2023,
      author = {Lewis Tunstall and Edward Beeching and Nathan Lambert and Nazneen Rajani and Alexander M. Rush and Thomas Wolf},
      title = {The Alignment Handbook},
      year = {2023},
      publisher = {GitHub},
      journal = {GitHub repository},
      howpublished = {\url{https://github.com/huggingface/alignment-handbook}}
    }
    ```

### Neural Chat 7b
- **Model card:** [Neural Chat](https://huggingface.co/Intel/neural-chat-7b-v3)
- **License:** Apache 2.0

### Disclaimer for Using Large Language Models

Please be aware that while Large Language Models like Camel-5B and OpenLLaMA 3b v2 are powerful tools for text generation, they may sometimes produce results that are unexpected, biased, or inconsistent with the given prompt. It's advisable to carefully review the generated text and consider the context and application in which you are using these models.

Usage of these models must also adhere to the licensing agreements and be in accordance with ethical guidelines and best practices for AI. If you have any concerns or encounter issues with the models, please refer to the respective model cards and documentation provided in the links above.

To the extent that any public or non-Intel datasets or models are referenced by or accessed using these materials those datasets or models are provided by the third party indicated as the content source. Intel does not create the content and does not warrant its accuracy or quality. By accessing the public content, or using materials trained on or with such content, you agree to the terms associated with that content and that your use complies with the applicable license.

 
Intel expressly disclaims the accuracy, adequacy, or completeness of any such public content, and is not liable for any errors, omissions, or defects in the content, or for any reliance on the content. Intel is not liable for any liability or damages relating to your use of public content.

Intel’s provision of these resources does not expand or otherwise alter Intel’s applicable published warranties or warranty disclaimers for Intel products or solutions, and no additional obligations, indemnifications, or liabilities arise from Intel providing such resources. Intel reserves the right, without notice, to make corrections, enhancements, improvements, and other changes to its materials.
