In [1]:
!nvidia-smi

Mon Dec 16 14:00:19 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla V100-SXM2-32GB           On  |   00000000:06:00.0 Off |                    0 |
| N/A   35C    P0             67W /  300W |   16717MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla V100-SXM2-32GB           On  |   00

In [2]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '7'

# LLM 101

In [3]:
from huggingface_hub import login
from dotenv import load_dotenv

# Load HF_TOKEN from a local .env file (if present) before logging in
load_dotenv()
login(os.getenv("HF_TOKEN"))


In [4]:
from transformers import AutoTokenizer
 
MODEL_NAME = "meta-llama/Llama-3.2-1B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

In [5]:
texts = [
    "The most important person in AI is",
    "What is the most important person in AI"
]
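# This tokenizer has no padding token by default, so register one before batching
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
# Pad every sequence to the length of the longest one in the batch
encoding = tokenizer(texts, padding="longest")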

In [6]:
encoding.keys()

dict_keys(['input_ids', 'attention_mask'])

In [7]:
encoding.attention_mask

[[1, 1, 1, 1, 1, 1, 1, 1, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1]]
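
The mask above marks real tokens with `1` and padding with `0`. As a minimal sketch of what `padding="longest"` does, assuming right-padding (which matches the output above), the helper below pads a batch of toy id lists. `pad_batch` and the ids are hypothetical and only illustrate the idea; they are not the tokenizer's implementation.

```python
from typing import Dict, List


def pad_batch(batch: List[List[int]], pad_id: int) -> Dict[str, List[List[int]]]:
    """Right-pad every sequence to the longest length and build the matching mask."""
    longest = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)             # real ids, then padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = attend, 0 = ignore
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token ids; 0 stands in for the pad id.
toy = pad_batch([[5, 6, 7], [8, 9, 10, 11]], pad_id=0)
print(toy["attention_mask"])  # [[1, 1, 1, 0], [1, 1, 1, 1]]
```

The model uses the mask to ignore padded positions, so the shorter sentence is not influenced by its padding.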

In [8]:
text = "The most important person in AI is"
encoding = tokenizer(text, return_tensors="pt")

In [9]:
encoding.keys()

dict_keys(['input_ids', 'attention_mask'])

In [10]:
encoding.input_ids

tensor([[128000,    791,   1455,   3062,   1732,    304,  15592,    374]])

In [11]:
input_ids = encoding.input_ids[0]
tokenizer.decode(input_ids)

'<|begin_of_text|>The most important person in AI is'

In [12]:
tokenizer.convert_ids_to_tokens(input_ids)

['<|begin_of_text|>',
 'The',
 'Ġmost',
 'Ġimportant',
 'Ġperson',
 'Ġin',
 'ĠAI',
 'Ġis']
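
The `Ġ` prefix in the tokens above is how byte-level BPE vocabularies display a leading space. A simplified sketch of the reverse mapping follows, handling only the space marker. `detokenize` is a hypothetical helper; in practice `tokenizer.decode` should be used, since it maps every byte back correctly.

```python
from typing import List

# Tokens copied from the output above, without the special begin-of-text token.
tokens = ["The", "Ġmost", "Ġimportant", "Ġperson", "Ġin", "ĠAI", "Ġis"]


def detokenize(pieces: List[str]) -> str:
    """Join byte-level BPE pieces, turning the 'Ġ' marker back into a space."""
    return "".join(pieces).replace("Ġ", " ")


text = detokenize(tokens)
print(text)  # The most important person in AI is
```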

## Load Model

In [13]:
import torch
from transformers import AutoModelForCausalLM, GenerationConfig
 
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, device_map="auto", torch_dtype=torch.float16
)

In [14]:
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
generation_config.max_new_tokens = 128
generation_config.repetition_penalty = 1.18
generation_config.temperature = 0.0000001

- `max_new_tokens`: the maximum number of tokens the model may generate.
- `repetition_penalty`: discourages the model from generating the same words over and over.
- `temperature`: controls the randomness of the generated text. The lower the value, the more predictable the output.
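
To make the last two parameters concrete, here is a small pure-Python sketch on made-up logits. The temperature part is the standard "divide logits by T before softmax" rule. The repetition penalty rule (divide positive logits of already-seen tokens by the penalty, multiply negative ones) is my understanding of how the library applies it and should be treated as an assumption. Both function names are hypothetical.

```python
import math
from typing import List, Set


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities after scaling them by 1 / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def apply_repetition_penalty(logits: List[float], seen_ids: Set[int], penalty: float) -> List[float]:
    """Make already-generated tokens less likely (assumed rule, see lead-in)."""
    out = list(logits)
    for i in seen_ids:
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


logits = [2.0, 1.5, 0.5]                          # made-up scores for a 3-token vocabulary
warm = softmax_with_temperature(logits, 1.0)      # probability spread over all tokens
cold = softmax_with_temperature(logits, 1e-7)     # almost all mass on the top token
penalized = apply_repetition_penalty(logits, {0}, 1.18)
print(warm, cold, penalized)
```

With a temperature as small as the one used in this notebook, sampling is effectively greedy: the most likely token is chosen almost every time.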

In [15]:
model.device

device(type='cuda', index=0)

In [16]:
encoding = encoding.to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


In [17]:
outputs.size()

torch.Size([1, 136])

In [18]:
outputs

tensor([[128000,    791,   1455,   3062,   1732,    304,  15592,    374,    279,
            828,     13,    578,    810,    499,   1440,    922,    701,   6444,
             11,    872,   3966,    323,   6944,     11,    279,   2731,  11429,
            499,    649,   1304,    627,  15836,    706,   1027,   2212,    369,
            264,   1418,   1457,    719,    433,    753,   1193,   6051,    430,
            584,   4070,   3970,   1202,    837,   4754,  34044,     13,   3161,
          31003,   1093,   5655,   6975,    323,  30828,  14488,   1694,   1511,
            311,   1893,  26249,  13171,    315,   8830,   3823,   4221,     11,
           1070,    527,  26762,  24525,    994,    433,   4131,    311,   1701,
          21075,  11478,    320,  15836,      8,   5557,   2949,   9873,   3432,
            627,    644,    420,   4652,    358,   4805,    387,  25394,   1268,
           5220,   1005,   5780,   6975,   4211,    439,    961,    315,    459,
           8244,   8446,   6

In [19]:
output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(output_text)

The most important person in AI is the data. The more you know about your customers, their needs and wants, the better decisions you can make.
AI has been around for a while now but it’s only recently that we’ve seen its true potential emerge. With advances like deep learning and neural networks being used to create algorithms capable of understanding human language, there are endless possibilities when it comes to using artificial intelligence (AI) technology within businesses today.
In this article I’ll be discussing how companies use machine learning models as part of an overall strategy designed specifically towards improving customer experience through predictive analytics capabilities such as chatbots or voice assistants; all powered by natural language processing techniques which


## Training Objective

Get the next-word prediction by passing `input_ids` to the model.

In [20]:
# Inference only, so skip gradient tracking
with torch.no_grad():
    prediction = model(input_ids=input_ids.unsqueeze(0).to(model.device))
prediction.keys()

odict_keys(['logits', 'past_key_values'])

In [42]:
prediction.logits.shape

torch.Size([1, 13, 128256])

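- `1`: the batch size
- `8`: the length of `input_ids` (the sequence length). The `13` shown in the output above presumably comes from re-running this cell (note its execution count) after the generation loop further down had extended the sequence to 13 tokens.
- `128256`: the size of the model's vocabulary, one logit per token. `tokenizer.vocab_size` reports 128000 because it does not count the added special tokens.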
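
Since this section is about the training objective, a toy sketch of the next-token loss may help: the logits at position `t` are scored against the token at position `t + 1`, and the per-position cross-entropies are averaged. `causal_lm_loss` is a hypothetical helper working on plain lists with made-up logits, not the library's loss function.

```python
import math
from typing import List


def causal_lm_loss(logits: List[List[float]], input_ids: List[int]) -> float:
    """Average cross-entropy where position t is scored against token t + 1."""
    losses = []
    for t in range(len(input_ids) - 1):      # the last position has no target
        row = logits[t]
        peak = max(row)
        # log of the softmax normaliser, computed stably
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        target = input_ids[t + 1]
        losses.append(log_norm - row[target])  # -log p(target | prefix)
    return sum(losses) / len(losses)


# Toy case: 3-token vocabulary, all-zero logits, so every guess is uniform.
toy_logits = [[0.0, 0.0, 0.0] for _ in range(3)]
loss = causal_lm_loss(toy_logits, [0, 1, 2])
print(loss)  # equals ln(3), the cost of a uniform guess over 3 tokens
```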

In [43]:
tokenizer.vocab_size

128000

In [23]:
logits = prediction.logits[0][-1]
next_token = torch.argmax(logits)
next_token

tensor(279, device='cuda:0')

In [25]:
tokenizer.convert_ids_to_tokens([next_token])

['Ġthe']

In [26]:
device = model.device

In [27]:
new_input_ids = torch.cat([input_ids.to(device), next_token.unsqueeze(0).to(device)])
tokenizer.decode(new_input_ids)

'<|begin_of_text|>The most important person in AI is the'

In [28]:
from tqdm import tqdm
 
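for _ in tqdm(range(5)):
    # Greedy decoding: re-run the model on the whole sequence, then append the most likely token
    with torch.no_grad():
        prediction = model(input_ids=new_input_ids.unsqueeze(0).to(device))
    logits = prediction.logits[0][-1]  # logits for the last position only
    next_token = torch.argmax(logits)
    new_input_ids = torch.cat([new_input_ids, next_token.unsqueeze(0)])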
 
print(tokenizer.decode(new_input_ids))

100%|████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 32.47it/s]

<|begin_of_text|>The most important person in AI is the data. The data is
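
The loop above is greedy decoding with a fixed number of steps. A generation routine typically also stops early when an end-of-sequence token appears. The sketch below shows that control flow with a stand-in model, so it runs without a GPU. `greedy_decode`, `toy_model`, and the ids are hypothetical.

```python
from typing import Callable, List, Optional


def greedy_decode(
    next_logits: Callable[[List[int]], List[float]],
    ids: List[int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Append the arg-max token until max_new_tokens or eos_id is reached."""
    ids = list(ids)
    for _ in range(max_new_tokens):
        logits = next_logits(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids


def toy_model(ids: List[int]) -> List[float]:
    """Stand-in for a language model: always prefers (last id + 1) mod 4."""
    preferred = (ids[-1] + 1) % 4
    return [1.0 if i == preferred else 0.0 for i in range(4)]


print(greedy_decode(toy_model, [0], max_new_tokens=5))            # [0, 1, 2, 3, 0, 1]
print(greedy_decode(toy_model, [0], max_new_tokens=5, eos_id=3))  # [0, 1, 2, 3]
```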





## Chatting with LLMs

In [29]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
 
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, device_map="auto", torch_dtype=torch.float16
)
 
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
generation_config.max_new_tokens = 512
generation_config.repetition_penalty = 1.18
generation_config.temperature = 0.0000001

In [30]:
from transformers import TextStreamer, pipeline
 
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
 
llm = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    generation_config=generation_config,
    num_return_sequences=1,
    streamer=streamer,
)

In [31]:
output = llm("Who is the most important person in AI?")

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


 DeepMind's Andrew Ng?
While it may seem counterintuitive to say that a human researcher like Andrew Ng, who has made significant contributions to machine learning and deep learning through his work at Google Brain and Coursera, should be considered as one of the "most important people" in Artificial Intelligence (AI), there are several reasons why he stands out.

Here are some key points:

1. **Founding father**: Andrew Ng co-founded two influential companies: Baidu Research Lab and Coursera. His involvement with these organizations helped shape the direction of AI research.
2. **Machine Learning Pioneer**: As an early pioneer in Machine Learning, Ng played a crucial role in developing techniques such as neural networks, which have become fundamental components of modern AI systems.
3. **Deep Learning Expertise**: He was instrumental in popularizing deep learning methods for image recognition, natural language processing, and other applications, making them more accessible to research

In [32]:
output[0].keys()

dict_keys(['generated_text'])

In [33]:
SYSTEM_PROMPT = "Act and always reply using slang that Ludacris uses"

messages = [
    {
        "role": "system",
        "content": SYSTEM_PROMPT,
    },
    {"role": "user", "content": "Who is the most important person in AI?"},
 ]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenized_chat)

tensor([[128000, 128006,   9125, 128007,    271,  38766,   1303,  33025,   2696,
             25,   6790,    220,   2366,     18,    198,  15724,   2696,     25,
            220,    845,   3799,    220,   2366,     19,    271,   2471,    323,
           2744,  10052,   1701,  81012,    430,  46270,    582,   6091,   5829,
         128009, 128006,    882, 128007,    271,  15546,    374,    279,   1455,
           3062,   1732,    304,  15592,     30, 128009, 128006,  78191, 128007,
            271]])
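
The ids at or above 128000 in the tensor above are special tokens that delimit each message. A simplified sketch of the layout follows, assuming the Llama 3 header format. `render_llama3_chat` is a hypothetical helper, and the real template appears to add knowledge-cutoff and current-date lines to the system message as well, so use `tokenizer.apply_chat_template` for actual prompts.

```python
from typing import Dict, List


def render_llama3_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages in a simplified Llama 3 style layout."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each message: role header, blank line, content, end-of-turn marker
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header tells the model it is its turn to answer
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


rendered = render_llama3_chat(
    [
        {"role": "system", "content": "Act and always reply using slang that Ludacris uses"},
        {"role": "user", "content": "Who is the most important person in AI?"},
    ]
)
print(rendered)
```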


In [34]:
print(llm(messages, max_new_tokens=128)[0]['generated_text'][-1])

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Y'all, dat's a deep question. I gotta give it to Elon Musk, fam. He's da one who's been pushin' da boundaries of artificial intelligence like crazy. His Neuralink project is straight fire, and he's got some wild ideas about how we can use AI for good.

But let me tell you somethin', if I had my way, I'd say it's all about da future, ya hear me? We need people like Musk on our side, helpin' us create an intelligent society where everyone gets along and vibes with each other.

And don't even get me started on Andrew Ng
{'role': 'assistant', 'content': "Y'all, dat's a deep question. I gotta give it to Elon Musk, fam. He's da one who's been pushin' da boundaries of artificial intelligence like crazy. His Neuralink project is straight fire, and he's got some wild ideas about how we can use AI for good.\n\nBut let me tell you somethin', if I had my way, I'd say it's all about da future, ya hear me? We need people like Musk on our side, helpin' us create an intelligent society where everyone 

In [35]:
from typing import Optional
 
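def predict(prompt: str, system_prompt: Optional[str] = None):
    """Wrap the prompt in the model's chat template and run it through the pipeline."""
    messages = [
        {
            "role": "user",
            "content": prompt,
        }
    ]
    # The system message, when given, must come before the user message
    if system_prompt:
        messages.insert(0, {"role": "system", "content": system_prompt})
    # Render the messages to a single string that ends with the assistant header
    chat_prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return llm(chat_prompt)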

In [36]:
system_prompt = """
You're an expert AI Engineer with 10+ years of experience with
all state-of-the art research in Computer Vision or NLP field.
"""
 
prompt = """
Outline the 3 most important concepts for becoming a Junior AI Engineer
""".strip()
 
output = predict(prompt, system_prompt)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


As an experienced AI engineer, I'll outline three crucial concepts to help you get started:

**I. Data Preprocessing and Feature Engineering**

1. **Data Collection**: Understand how to collect high-quality data relevant to your project's specific problem domain.
2. **Data Cleaning and Normalization**: Learn techniques to handle missing values, outliers, and inconsistencies in your dataset.
3. **Feature Extraction**: Develop skills to extract meaningful features from raw data using various methods (e.g., image processing, text preprocessing).

Some popular feature engineering techniques include:
- Dimensionality reduction (PCA, t-SNE)
- Text preprocessing (tokenization, stemming/lemmatizing)
- Image pre-processing (normalization, resizing)

**II. Machine Learning Algorithms**

1. **Supervised Learning Fundamentals**: Study basic machine learning algorithms like linear regression, decision trees, clustering, and neural networks.
2. **Model Evaluation Metrics**: Master metrics such as ac

In [37]:
system_prompt = """
You're an experienced Python developer that writes efficient and readable code.
You always strive to use built-in libraries.
"""

In [38]:
prompt = """
Write a function that calculates the square sum of two numbers and divide it by 42
""".strip()
 
output = predict(prompt, system_prompt)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


**Calculating Square Sum Divided by 42**

Here's a simple Python function that accomplishes this task:

```python
def calculate_square_sum_divided_by_42(num1, num2):
    """
    Calculate the square sum of two numbers divided by 42.

    Args:
        num1 (float): The first number.
        num2 (float): The second number.

    Returns:
        float: The result of squaring `num1` then dividing by `num2`, all divided by 42.
    """

    # Check if both inputs are valid numbers
    if not isinstance(num1, (int, float)) or not isinstance(num2, (int, float)):
        raise TypeError("Both arguments must be numeric.")

    # Perform calculations in one line using f-strings for readability
    return round((num1 ** 2 + num2) / 42)
```

This function takes advantage of Python's dynamic typing feature (`isinstance()` checks), which allows us to handle any type of input without explicit casting. It also uses f-strings for formatting output with ease.

Example usage:

```python
print(calculate_

In [39]:
def calculate_square_sum_divided_by_42(num1, num2):
    """
    Calculate the square sum of two numbers divided by 42.

    Args:
        num1 (float): The first number.
        num2 (float): The second number.

    Returns:
        float: The result of squaring `num1` then dividing by `num2`, all divided by 42.
    """

    # Check if both inputs are valid numbers
    if not isinstance(num1, (int, float)) or not isinstance(num2, (int, float)):
        raise TypeError("Both arguments must be numeric.")

    # Perform calculations in one line using f-strings for readability
    return round((num1 ** 2 + num2) / 42)

In [41]:
print(calculate_square_sum_divided_by_42(5, 7))

1
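
Note that the generated function does not match its own description: it computes `(num1 ** 2 + num2) / 42` and rounds the result, so `(5, 7)` gives `round(32 / 42) = 1`, and the comment about f-strings refers to nothing in the code. The prompt is ambiguous, but one reasonable reading of "square sum" is the sum of the two squares. A minimal sketch under that assumption (the name `square_sum_div_42` is hypothetical):

```python
def square_sum_div_42(a: float, b: float) -> float:
    """Return (a^2 + b^2) / 42, reading "square sum" as the sum of squares."""
    return (a ** 2 + b ** 2) / 42


result = square_sum_div_42(5, 7)
print(result)  # 74 / 42, roughly 1.76
```

If the intended meaning was the square of the sum instead, the body would be `(a + b) ** 2 / 42`. Either way, model-generated code should be checked against the prompt before it is used.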
