### Setting up the environment


1. I created this sample in Visual Studio Code with the [Python](https://marketplace.visualstudio.com/items?itemName=ms-python.python) and [Polyglot Notebooks](https://marketplace.visualstudio.com/items?itemName=ms-dotnettools.dotnet-interactive-vscode) extensions.
1. I ran the project on Python 3.12.3 and used venv to manage the dependencies.

**Install CUDA**
1. Download and install the CUDA Toolkit from https://developer.nvidia.com/cuda-downloads
1. Validate the installation by running the following command:
    ```
    nvcc --version
    ```

**Install Torch**
1. Open https://pytorch.org/get-started/locally/
1. Select the options that match your system and run the command the page generates. For example, for pip with CUDA 12.1:
    ```
    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
    ```

**Validate**

Run the following code in a Python interpreter to validate the installation. It prints `True` when PyTorch can see a CUDA-capable GPU.
```
import torch

print(torch.cuda.is_available())
exit()
```
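When the check prints `False`, it helps to know which part is missing. The sketch below is one way to get a more detailed report. The helper name `describe_cuda` is my own and not part of PyTorch, and it assumes the GPU of interest is device 0. It returns a message instead of raising if PyTorch is not installed.

```python
import importlib.util


def describe_cuda():
    """Return a one-line summary of the PyTorch/CUDA setup (hypothetical helper)."""
    # Check for the package first so the function also works without PyTorch.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch

    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA is not available"

    # torch.version.cuda is the CUDA version this PyTorch build was compiled against.
    return (
        f"torch {torch.__version__}: CUDA {torch.version.cuda} "
        f"on {torch.cuda.get_device_name(0)}"
    )


print(describe_cuda())
```

If the message says CUDA is not available even though `nvcc --version` works, the installed wheel is probably a CPU-only build, and reinstalling with the CUDA index URL shown above is the usual next step.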

In [15]:
# Note: you may need to restart the kernel to use updated packages.

# %pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# %pip install transformers
# %pip install accelerate
# %pip install ipykernel

# %pip freeze > requirements-using-phi1-inmemory.txt

Note: you may need to restart the kernel to use updated packages.
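The commented-out `%pip` lines above list the packages this notebook depends on. As a minimal sketch, the check below reports which of them cannot be found in the current kernel's environment. The helper `missing_packages` and the `REQUIRED` list are names I introduced for illustration. The check only looks for the top-level module and does not verify versions.

```python
import importlib.util

# Packages taken from the commented-out %pip lines in the cell above.
REQUIRED = ["torch", "transformers", "accelerate", "ipykernel"]


def missing_packages(names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Running this before loading the model catches a missing `accelerate` early, which matters because `from_pretrained` needs it whenever a `device_map` is passed.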


### Running the sample

In [10]:
import torch
from transformers import AutoModelForCausalLM

# the model id is the model path on the Hugging Face model hub,
# you can find it in the model's page URL
base_model_id = "microsoft/phi-1"

# AutoModelForCausalLM: This is a class from the Hugging Face Transformers library. It’s used
#    for causal language modeling, that is, autoregressive generation, where the model
#    predicts the next token in a sequence given the previous tokens.
#
# from_pretrained(base_model_id, trust_remote_code=True, torch_dtype=torch.float16, device_map={"": 0}):
#    base_model_id: This parameter specifies the pretrained model to load. You provide either
#       a model id on the Hugging Face Hub (e.g., "microsoft/phi-1") or a path to a local
#       directory containing the saved model files.
#    trust_remote_code=True: This flag allows Transformers to execute custom modeling code that
#       ships in the model's repository on the Hub. Only enable it for repositories you trust.
#    torch_dtype=torch.float16: This loads the model’s weights as 16-bit floating point
#       (half precision), which halves weight memory compared with float32 and can speed up
#       inference on GPUs.
#    device_map={"": 0}: The empty-string key refers to the whole model, so this places every
#       module on device index 0 (the first GPU). Passing device_map requires the accelerate package.

# Load the pretrained causal language model identified by base_model_id onto the first GPU.
model = AutoModelForCausalLM.from_pretrained(base_model_id, trust_remote_code=True, torch_dtype=torch.float16, device_map={"": 0})

ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`
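The error above is resolved by installing `accelerate`, as the message says. Once the model loads, the effect of `torch_dtype=torch.float16` can be put in rough numbers. The sketch below is back-of-the-envelope arithmetic only. It assumes about 1.3 billion parameters for phi-1 (the rounded figure from the model card, hard-coded here as an assumption) and counts weights only, not activations or other runtime memory. The function name is hypothetical.

```python
# Assumption: approximate parameter count for phi-1, rounded from the model card.
APPROX_PHI1_PARAMS = 1_300_000_000


def weight_memory_gib(num_params, bytes_per_param):
    """Estimate the memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


fp16_gib = weight_memory_gib(APPROX_PHI1_PARAMS, 2)  # float16 uses 2 bytes per weight
fp32_gib = weight_memory_gib(APPROX_PHI1_PARAMS, 4)  # float32 uses 4 bytes per weight

print(f"float16 weights: {fp16_gib:.2f} GiB")  # float16 weights: 2.42 GiB
print(f"float32 weights: {fp32_gib:.2f} GiB")  # float32 weights: 4.84 GiB
```

Under these assumptions, half precision saves a little over 2 GiB of GPU memory for the weights, which is why it is a common choice on consumer GPUs.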

We need a tokenizer to communicate with the model. The model doesn't understand raw text; it operates on tokens. The tokenizer converts our text into token IDs, the model predicts further token IDs, and the tokenizer decodes those IDs back into text. It is important to use the same tokenizer that was used to train the model. The tokenizer is not stored on the model object, so we load it separately with `AutoTokenizer.from_pretrained`, passing the same model id so that the matching vocabulary is used.
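The round trip from text to token IDs and back can be illustrated with a toy example. This is a sketch only: the `ToyTokenizer` class and its whitespace splitting are hypothetical stand-ins. The real phi-1 tokenizer uses a subword vocabulary and is loaded from the Hub in the next cell.

```python
class ToyTokenizer:
    """Word-level tokenizer for illustration only (not the phi-1 tokenizer)."""

    def __init__(self, corpus):
        # Build a fixed vocabulary: one integer ID per distinct word.
        words = sorted(set(corpus.split()))
        self.token_to_id = {word: idx for idx, word in enumerate(words)}
        self.id_to_token = {idx: word for word, idx in self.token_to_id.items()}

    def encode(self, text):
        """Convert text into a list of token IDs."""
        return [self.token_to_id[word] for word in text.split()]

    def decode(self, ids):
        """Convert a list of token IDs back into text."""
        return " ".join(self.id_to_token[i] for i in ids)


toy = ToyTokenizer("the model predicts the next token")
ids = toy.encode("the next token")
print(ids)              # [3, 1, 4]
print(toy.decode(ids))  # the next token
```

The IDs only have meaning relative to the vocabulary that produced them. A tokenizer built from a different vocabulary would map the same IDs to different words, or to nothing at all, which is why the model and tokenizer must come from the same model id.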

In [7]:
from transformers import AutoTokenizer


# AutoTokenizer: This is a class from the Hugging Face Transformers library. It’s used for tokenizing
#    text data. Tokenization involves breaking down a sequence of text into individual tokens (words,
#    subwords, or characters) for further processing by language models.
#
# from_pretrained(base_model_id, use_fast=True):
#    base_model_id: This parameter specifies which pretrained tokenizer to load. You provide either a
#       model id on the Hugging Face Hub (e.g., "microsoft/phi-1") or a path to a local directory
#       containing the saved tokenizer files.
#    use_fast=True: This flag determines whether to use a fast Rust-based tokenizer if it’s supported
#       for the given model. If a fast tokenizer is not available, a normal Python-based tokenizer is
#       used instead.

# Load the tokenizer (vocabulary and configuration) that matches base_model_id,
# preferring the fast implementation when one is available.
tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)

In [8]:

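# the prompt we want the model to continue
prompt = "Cite 20 famous people."

# tokenize the prompt into PyTorch tensors and move them to the first GPU, where the model lives
model_inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
# max_length caps the total sequence length (prompt tokens plus generated tokens)
output = model.generate(**model_inputs, max_length=500)[0]

# and finally, we decode the generated token IDs back into text and print them
print(tokenizer.decode(output, skip_special_tokens=True))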

Cite 20 famous people.

    Args:
    - people: A list of strings representing the names of the people in the list.

    Returns:
    - A string representing the name of the person who is the most famous among the people in the list.
    """
    famous_people = ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Heidi", "Ivan", "Judy", "Kevin", "Linda", "Mallory", "Nancy", "Oscar", "Peggy", "Quentin", "Romeo", "Sybil", "Trent", "Ursula", "Victor", "Wendy", "Xavier", "Yvonne", "Zoe"]
    cite_counts = {}
    for person in people:
        if person in famous_people:
            if person in cite_counts:
                cite_counts[person] += 1
            else:
                cite_counts[person] = 1
    if not cite_counts:
        return "No famous people found in the list."
    return max(cite_counts, key=cite_counts.get)



from typing import List

def find_most_common_letter(words: List[str]) -> str:
    """
    Returns the most common letter among all the words in the inp