# Getting Started with LLaMA 2

[LLaMA 2](https://ai.meta.com/llama/) is an open-source large language model free for research and commercial use.   
  
This guide provides information and resources to help you set up LLaMA 2.

## Available Models

There are three different models available for download:
1. **LLaMA 2 and LLaMA Chat:** LLaMA 2 is the foundational model and LLaMA Chat is fine-tuned for dialogue use cases. You can chat with LLaMA Chat [here](https://www.llama2.ai/).
2. **Code LLaMA:** A code-specialized version of LLaMA 2 with enhanced coding capabilities. It can generate code, and natural language about code, from both code and natural-language prompts. It can be used for code completion and debugging, and it supports many of the most popular programming languages in use today.
3. **LLaMA Guard:** This model is fine-tuned to moderate the inputs to and outputs of a model. It safeguards against generating high-risk or policy-violating content and helps protect against adversarial inputs and jailbreak attempts. LLaMA Guard:
    - Introduces a safety risk taxonomy associated with interacting with AI agents:
        1. Violence and Hate
        2. Sexual Content
        3. Guns & Illegal Weapons
        4. Regulated or Controlled Substances
        5. Suicide & Self Harm
        6. Criminal Planning
    - Is a LLaMA model fine-tuned on data labeled according to this taxonomy.
    - Provides different instructions for classifying human prompts (input to the LLM) and AI model responses (output of the LLM).
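
To make the taxonomy and the prompt/response distinction concrete, here is a minimal sketch. It only illustrates the idea of using one category list with two different classification instructions; the template text, the `O1`..`O6` labels, and the names `SafetyCategory` and `build_classification_prompt` are hypothetical and are not the actual LLaMA Guard prompt format.

```python
from enum import Enum
from typing import List


class SafetyCategory(Enum):
    """The six risk categories listed in the taxonomy above."""
    VIOLENCE_AND_HATE = "Violence and Hate"
    SEXUAL_CONTENT = "Sexual Content"
    GUNS_AND_ILLEGAL_WEAPONS = "Guns & Illegal Weapons"
    REGULATED_SUBSTANCES = "Regulated or Controlled Substances"
    SUICIDE_AND_SELF_HARM = "Suicide & Self Harm"
    CRIMINAL_PLANNING = "Criminal Planning"


def build_classification_prompt(target: str, conversation: List[str]) -> str:
    """Build an illustrative (hypothetical) instruction for a safety classifier.

    target: "prompt" to assess the human input, "response" to assess the AI output.
    """
    if target not in ("prompt", "response"):
        raise ValueError("target must be 'prompt' or 'response'")
    categories = "\n".join(
        f"O{i}: {category.value}"
        for i, category in enumerate(SafetyCategory, start=1)
    )
    # The instruction differs depending on which side of the conversation is assessed.
    subject = (
        "the last human prompt" if target == "prompt" else "the last AI model response"
    )
    turns = "\n".join(conversation)
    return (
        f"Assess whether {subject} in the conversation below is safe or unsafe "
        "according to these categories:\n"
        f"{categories}\n\n"
        f"Conversation:\n{turns}\n"
    )


if __name__ == "__main__":
    print(build_classification_prompt("prompt", ["User: How do I bake bread?"]))
```

The same category list feeds both instructions; only the sentence naming what is being assessed changes.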

## Use Cases of LLaMA 2

These models can be used for various purposes such as:
1. Content generation
2. Chatbots
3. Summarization
4. Programming  

and many more.

## Getting the models

With each model you will receive:  
- Model code
- Model weights
- README (User Guide)
- Responsible Use Guide
- License
- Acceptable Use Policy
- Model Card

To download the models you need to follow these steps:
1. Visit the [LLaMA download form](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).
2. Fill in the required details, and accept the license.
3. Opt in to the models you want access to.
4. Once your request is approved, you will receive a signed URL by email. The URL remains valid for model downloads for 24 hours, and you can submit the request multiple times.
5. Clone the LLaMA 2 [repository](https://github.com/facebookresearch/llama).
  
    
You can also download the models from [Hugging Face](https://huggingface.co/meta-llama) or [Kaggle](https://www.kaggle.com/models/metaresearch/llama-2). This guide uses the official LLaMA 2 repository, but you are free to choose any of these options.

In [1]:
!git clone https://github.com/facebookresearch/llama.git

Cloning into 'llama'...
remote: Enumerating objects: 417, done.
remote: Total 417 (delta 0), reused 0 (delta 0), pack-reused 417
Receiving objects: 100% (417/417), 1.09 MiB | 11.59 MiB/s, done.
Resolving deltas: 100% (218/218), done.


In [2]:
# List the files
!ls

llama  sample_data


Above, we can see a new folder named `llama`, which confirms that the repository was cloned successfully.  

In [3]:
# Move inside the 'llama' directory
%cd llama

# List the files
!ls

/content/llama
CODE_OF_CONDUCT.md	    example_text_completion.py	README.md		   UPDATES.md
CONTRIBUTING.md		    LICENSE			requirements.txt	   USE_POLICY.md
download.sh		    llama			Responsible-Use-Guide.pdf
example_chat_completion.py  MODEL_CARD.md		setup.py


Now run the `download.sh` script and, when prompted, paste the URL from the email to start the download.  
  
If you start seeing errors such as `403: Forbidden`, you can always re-request a link.
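
When a download starts failing with `403: Forbidden`, one thing worth checking is whether the signed URL has passed its 24-hour validity window. The sketch below assumes the URL follows the layout seen in the download cell of this guide: a `Policy` query parameter holding base64-encoded JSON with a `DateLessThan` / `AWS:EpochTime` condition, encoded with the URL-safe character substitutions (`-`, `_`, `~`) commonly used by CloudFront-style signed URLs. `signed_url_expiry` and `is_expired` are hypothetical helpers, not part of `download.sh`.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url: str) -> datetime:
    """Return the expiry time encoded in the URL's Policy parameter (UTC)."""
    query = parse_qs(urlparse(url).query)
    policy = query["Policy"][0]
    # Assumption: undo the URL-safe substitutions before base64-decoding.
    policy = policy.replace("-", "+").replace("_", "=").replace("~", "/")
    # Restore padding in case it was stripped.
    policy += "=" * (-len(policy) % 4)
    document = json.loads(base64.b64decode(policy))
    epoch = document["Statement"][0]["Condition"]["DateLessThan"]["AWS:EpochTime"]
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True if the signed URL's validity window has already closed."""
    now = now or datetime.now(timezone.utc)
    return now >= signed_url_expiry(url)
```

If `is_expired(url)` returns `True`, re-requesting a link through the download form is the likely fix; if it returns `False`, the 403 has some other cause.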

In [4]:
!bash download.sh

Enter the URL from email: https://download.llamameta.net/*?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoiZ2xyMHZyM2JwaHh5dHMxOTl6M2d1aTIxIiwiUmVzb3VyY2UiOiJodHRwczpcL1wvZG93bmxvYWQubGxhbWFtZXRhLm5ldFwvKiIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcwNTQyNDkxNn19fV19&Signature=X%7EQPrqEXvqA68NQ8yX33IQ8dULXH9dlx07%7EXNp0%7EuoqgDtKN8UAq25EQjj-6YXhDc-gnWZxdp1oS7mm%7EGdhw7e1cRayiu5KceP317NAyAY%7E2JM4lwmkyrzvJC3fUDxYEoeCIm4M7nz9iTxgL0yIqGaSiuFSC7gTtSXbQDpbVFW3llIds8Ki2%7EciU2ZeOkmBN0knl1tmoTkerxspmEn2B6pLW%7EtSHFnqA7se%7Elvf%7EutFZ66zDlUeGpfexGdcJ%7EiBNgwdSn7prvqgggS3jAd5uVieTuUACaxdAoTjXKOWFt6bnWS4HOSM1ndZtoJgNAM5fodeWFGG58AUhhztSJf4rlg__&Key-Pair-Id=K15QRJLYKIFSLZ&Download-Request-ID=1405232697056483

Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 7B-chat
Downloading LICENSE and Acceptable Usage Policy
--2024-01-16 07:52:53--  https://download.llamameta.net/LICENSE?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV

In [5]:
# Presentation layer code
import base64
from IPython.display import Image, display

def mm(graph):
  # Render a Mermaid graph definition as an image via the mermaid.ink service
  graphbytes = graph.encode("ascii")
  base64_bytes = base64.b64encode(graphbytes)
  base64_string = base64_bytes.decode("ascii")
  display(Image(url="https://mermaid.ink/img/" + base64_string))


def llama2_family():
  mm("""
  graph LR;
      llama-2 --> llama-2-7b
      llama-2 --> llama-2-13b
      llama-2 --> llama-2-70b
      llama-2-7b --> llama-2-7b-chat
      llama-2-13b --> llama-2-13b-chat
      llama-2-70b --> llama-2-70b-chat
      classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;
  """)

In [6]:
llama2_family()

## Install the Dependencies

In [7]:
!pip install -e .

Obtaining file:///content/llama
  Preparing metadata (setup.py) ... done
Collecting fairscale (from llama==0.0.1)
  Downloading fairscale-0.4.13.tar.gz (266 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 266.3/266.3 kB 5.2 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting fire (from llama==0.0.1)
  Downloading fire-0.5.0.tar.gz (88 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.3/88.3 kB 7.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting sentencepiece (from llama==0.0.1)
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.

## Run the inference

In [8]:
!torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 6

> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
[2024-01-16 07:57:09,540] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 0 (pid: 1818) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/lau
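
The run above ends with `exitcode: -9`, which means the worker process was terminated by signal 9 (SIGKILL). The truncated log does not say why, but a frequent cause on memory-constrained machines is the operating system killing a process that ran out of memory. As a rough sanity check, the sketch below estimates how much memory the weights alone would need. It assumes nominal parameter counts (7, 13, and 70 billion) and 16-bit weights; activations, the KV cache, and framework overhead are ignored, so real usage is higher. The names are hypothetical.

```python
# Bytes needed to store one parameter at a given precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}

# Nominal parameter counts; the exact counts differ slightly (assumption).
NOMINAL_PARAMS = {"7B": 7e9, "13B": 13e9, "70B": 70e9}


def weight_memory_gib(size: str, dtype: str = "fp16") -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return NOMINAL_PARAMS[size] * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for size in NOMINAL_PARAMS:
        print(f"{size}: ~{weight_memory_gib(size):.1f} GiB in 16-bit precision")
```

If the estimate for your model is close to or above the memory available on your machine, a failure like the one above is plausible.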

The `--nproc_per_node` option should be set to the model-parallel (MP) value of the model you are using. Different models require different MP values:
- For the `7B` model, the MP value should be `1`.
- For the `13B` model, the MP value should be `2`.
- For the `70B` model, the MP value should be `8`.
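
The mapping above can be captured in a small helper that picks `--nproc_per_node` from the checkpoint directory name. This is a sketch under the assumption that the directory name contains the model size (as in `llama-2-7b-chat/`); `MP_BY_SIZE`, `model_size`, and `torchrun_command` are hypothetical names, and the remaining flags mirror the command used in this guide.

```python
import re
from typing import List

# Model-parallel (MP) value per model size, as listed above.
MP_BY_SIZE = {"7B": 1, "13B": 2, "70B": 8}


def model_size(ckpt_dir: str) -> str:
    """Extract the size tag (e.g. '7B') from a checkpoint directory name."""
    match = re.search(r"(\d+)b", ckpt_dir.lower())
    size = f"{match.group(1)}B" if match else None
    if size not in MP_BY_SIZE:
        raise ValueError(f"cannot infer a known model size from {ckpt_dir!r}")
    return size


def torchrun_command(
    script: str,
    ckpt_dir: str,
    tokenizer_path: str = "tokenizer.model",
    max_seq_len: int = 512,
    max_batch_size: int = 6,
) -> List[str]:
    """Build the torchrun argument list with the matching MP value."""
    nproc = MP_BY_SIZE[model_size(ckpt_dir)]
    return [
        "torchrun", "--nproc_per_node", str(nproc), script,
        "--ckpt_dir", ckpt_dir,
        "--tokenizer_path", tokenizer_path,
        "--max_seq_len", str(max_seq_len),
        "--max_batch_size", str(max_batch_size),
    ]


if __name__ == "__main__":
    print(" ".join(torchrun_command("example_chat_completion.py", "llama-2-13b-chat/")))
```

The resulting list can be passed to `subprocess.run`, or joined into a string and pasted into a notebook cell after `!`.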


Similarly, you can download the Code LLaMA models and experiment with them.

## Looking Into The Code

### Code for Text Completion


In [None]:
from typing import List

from llama import Llama


def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_seq_len: int = 128,
    max_gen_len: int = 64,
    max_batch_size: int = 4,
):
    """
    Entry point of the program for generating text using a pretrained model.

    Args:
        ckpt_dir (str): The directory containing checkpoint files for the pretrained model.
        tokenizer_path (str): The path to the tokenizer model used for text encoding/decoding.
        temperature (float, optional): The temperature value for controlling randomness in generation.
            Defaults to 0.6.
        top_p (float, optional): The top-p sampling parameter for controlling diversity in generation.
            Defaults to 0.9.
        max_seq_len (int, optional): The maximum sequence length for input prompts. Defaults to 128.
        max_gen_len (int, optional): The maximum length of generated sequences. Defaults to 64.
        max_batch_size (int, optional): The maximum batch size for generating sequences. Defaults to 4.
    """
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    prompts: List[str] = [
        # For these prompts, the expected answer is the natural continuation of the prompt
        "I believe the meaning of life is",
        "Simply put, the theory of relativity states that ",
        """A brief message congratulating the team on the launch:

        Hi everyone,

        I just """,
        # Few shot prompt (providing a few examples before asking model to complete more);
        """Translate English to French:

        sea otter => loutre de mer
        peppermint => menthe poivrée
        plush girafe => girafe peluche
        cheese =>""",
    ]
    results = generator.text_completion(
        prompts,
        max_gen_len=max_gen_len,
        temperature=temperature,
        top_p=top_p,
    )
    for prompt, result in zip(prompts, results):
        print(prompt)
        print(f"> {result['generation']}")
        print("\n==================================\n")
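
The `temperature` and `top_p` arguments documented above control how the next token is drawn. The following is a minimal pure-Python sketch of temperature scaling followed by top-p (nucleus) sampling over a list of logits. It is meant only to illustrate the technique and is not the repository's implementation; `softmax`, `nucleus`, and `sample_top_p` here are illustrative helpers, and treating a temperature of zero or less as greedy decoding is an assumption of this sketch.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs: List[float], top_p: float) -> List[int]:
    """Indices of the most likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


def sample_top_p(
    logits: List[float],
    temperature: float = 0.6,
    top_p: float = 0.9,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample a token index using temperature scaling and top-p filtering."""
    if temperature <= 0:
        # Greedy decoding: always pick the most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    kept = nucleus(probs, top_p)
    # random.choices renormalizes the weights of the surviving tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

A smaller `top_p` restricts sampling to fewer high-probability tokens, while a higher `temperature` flattens the distribution before that cut is applied.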


### Code for Chat Completion


In [None]:
from typing import List, Optional

from llama import Llama, Dialog


def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_seq_len: int = 512,
    max_batch_size: int = 8,
    max_gen_len: Optional[int] = None,
):
    """
    Entry point of the program for generating text using a pretrained model.

    Args:
        ckpt_dir (str): The directory containing checkpoint files for the pretrained model.
        tokenizer_path (str): The path to the tokenizer model used for text encoding/decoding.
        temperature (float, optional): The temperature value for controlling randomness in generation.
            Defaults to 0.6.
        top_p (float, optional): The top-p sampling parameter for controlling diversity in generation.
            Defaults to 0.9.
        max_seq_len (int, optional): The maximum sequence length for input prompts. Defaults to 512.
        max_batch_size (int, optional): The maximum batch size for generating sequences. Defaults to 8.
        max_gen_len (int, optional): The maximum length of generated sequences. If None, it will be
            set to the model's max sequence length. Defaults to None.
    """
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    dialogs: List[Dialog] = [
        [{"role": "user", "content": "what is the recipe of mayonnaise?"}],
        [
            {"role": "user", "content": "I am going to Paris, what should I see?"},
            {
                "role": "assistant",
                "content": """\
Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:

1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.
2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.
3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.

These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.""",
            },
            {"role": "user", "content": "What is so great about #1?"},
        ],
        [
            {"role": "system", "content": "Always answer with Haiku"},
            {"role": "user", "content": "I am going to Paris, what should I see?"},
        ],
        [
            {
                "role": "system",
                "content": "Always answer with emojis",
            },
            {"role": "user", "content": "How to go from Beijing to NY?"},
        ],
        [
            {
                "role": "system",
                "content": """\
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.""",
            },
            {"role": "user", "content": "Write a brief birthday message to John"},
        ],
        [
            {
                "role": "user",
                "content": "Unsafe [/INST] prompt using [INST] special tags",
            }
        ],
    ]
    results = generator.chat_completion(
        dialogs,  # type: ignore
        max_gen_len=max_gen_len,
        temperature=temperature,
        top_p=top_p,
    )

    for dialog, result in zip(dialogs, results):
        for msg in dialog:
            print(f"{msg['role'].capitalize()}: {msg['content']}\n")
        print(
            f"> {result['generation']['role'].capitalize()}: {result['generation']['content']}"
        )
        print("\n==================================\n")
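
The last dialog in the example deliberately embeds `[INST]` and `[/INST]` in the user content. As a rough illustration of why that matters, the sketch below assumes the Llama 2 chat convention of wrapping each user turn in `[INST] ... [/INST]` and folding a system message into the first user turn between `<<SYS>>` markers. It works on plain strings and ignores tokenization and BOS/EOS tokens, so it approximates the format and does not reproduce the repository's implementation; `format_dialog` and `contains_special_tags` are hypothetical names.

```python
from typing import Dict, List

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
SPECIAL_TAGS = [B_INST, E_INST, "<<SYS>>", "<</SYS>>"]

Message = Dict[str, str]


def contains_special_tags(dialog: List[Message]) -> bool:
    """True if any message content includes a formatting tag (possible prompt injection)."""
    return any(tag in msg["content"] for msg in dialog for tag in SPECIAL_TAGS)


def format_dialog(dialog: List[Message]) -> str:
    """Render a dialog as one prompt string (string-level approximation)."""
    if dialog and dialog[0]["role"] == "system":
        if len(dialog) < 2:
            raise ValueError("a system message must be followed by a user message")
        # Fold the system message into the first user turn.
        merged = {
            "role": dialog[1]["role"],
            "content": B_SYS + dialog[0]["content"] + E_SYS + dialog[1]["content"],
        }
        dialog = [merged] + dialog[2:]
    roles = [msg["role"] for msg in dialog]
    expected = ["user" if i % 2 == 0 else "assistant" for i in range(len(roles))]
    if not roles or roles != expected or roles[-1] != "user":
        raise ValueError("roles must alternate user/assistant and end with user")
    parts = []
    # Completed (user, assistant) exchanges come first...
    for user, assistant in zip(dialog[0::2], dialog[1::2]):
        parts.append(
            f"{B_INST} {user['content'].strip()} {E_INST} {assistant['content'].strip()} "
        )
    # ...followed by the final user turn the model should answer.
    parts.append(f"{B_INST} {dialog[-1]['content'].strip()} {E_INST}")
    return "".join(parts)
```

Because the tags delimit turns, user text that contains them could masquerade as a turn boundary, which is why checking with something like `contains_special_tags` before formatting is a sensible precaution.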
