&nbsp;
# **Step 1: Download Model Checkpoints**

If you want to further finetune the instruction variant of Llama 3.2 3B, you can download it via the following command:

```bash
litgpt download meta-llama/Llama-3.2-3B-Instruct --access_token hf_...
```

(Note that some models, such as Llama 3.2, require you to accept Meta AI's terms of service and to pass an access token via the `--access_token ...` option. For more information, visit the model's page on the Hugging Face Hub, e.g., [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct). You can create an access token in the `Profile > Access Tokens` menu of your Hugging Face account.)

(To list the other available models, run `litgpt download list`.)
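
Rather than pasting the token on the command line, it can be read from the environment. The sketch below assumes the token is stored in an environment variable named `HF_TOKEN` (a hypothetical name, for example loaded from a `.env` file). It only builds the argument list for the `litgpt download` command shown above; the helper name `build_download_command` is illustrative.

```python
import os


def build_download_command(repo_id, token):
    """Return the argument list for the `litgpt download` command shown above."""
    if not token:
        raise ValueError("An access token is required for gated models.")
    return ["litgpt", "download", repo_id, "--access_token", token]


# HF_TOKEN is an assumed variable name; fall back to a placeholder for illustration.
token = os.environ.get("HF_TOKEN", "hf_...")
command = build_download_command("meta-llama/Llama-3.2-3B-Instruct", token)
print(" ".join(command[:3]))  # print the command without the token itself
```

To actually run the download, the list could be passed to `subprocess.run(command, check=True)`.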


&nbsp;  
# **Step 2: Finetune the Model**  

This section illustrates how to fine-tune the model on a custom dataset. In this case, the **Supervised Fine-Tuning (SFT) dataset** is stored in the file **`data/train.csv`**.  

### **Data Preparation:**  
1. **Load the dataset** from `data/train.csv`.  
2. **Split it into training and validation sets** to ensure proper evaluation.  
3. **Convert the dataset into the LitGPT format**, following this structure:  

```json
[
  {
    "instruction": "How does the refund process work for canceled orders?",
    "input": "",
    "output": "If you cancel an order, the refund process depends on the payment method used. Typically, refunds are processed within 5-7 business days. Please check your bank statement for confirmation."
  },
  {
    "instruction": "What payment methods do you accept?",
    "input": "",
    "output": "We accept credit cards, PayPal, and Apple Pay. Please visit our payments page for more details."
  },
  ...
]
```
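
The three preparation steps can be sketched with the standard library alone. The CSV content and the column names `question` and `answer` are assumptions made for illustration; the helper names are hypothetical as well.

```python
import csv
import io
import json
import random

# Stand-in for the contents of data/train.csv; the column names are assumed.
SAMPLE_CSV = """question,answer
What payment methods do you accept?,"We accept credit cards, PayPal, and Apple Pay."
How long do refunds take?,Refunds are typically processed within 5-7 business days.
Can I change my shipping address?,"Yes, before the order ships."
Do you ship internationally?,"Yes, to selected countries."
"""


def rows_to_records(rows):
    """Map CSV rows to the instruction/input/output structure shown above."""
    return [
        {"instruction": row["question"], "input": "", "output": row["answer"]}
        for row in rows
    ]


def split_records(records, val_fraction=0.1, seed=42):
    """Shuffle a copy of the records and split it into (train, validation)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
records = rows_to_records(rows)
train_records, val_records = split_records(records, val_fraction=0.25)
print(json.dumps(train_records[0], ensure_ascii=False, indent=2))
```

A manual split like this is optional when the `--data.val_split_fraction` option is used, since the split is then done at training time.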

To finetune the model, we use the following command: 

```bash
litgpt finetune_lora meta-llama/Llama-3.2-3B-Instruct \
  --data JSON \
  --data.json_path my_custom_dataset.json \
  --data.val_split_fraction 0.1 \
  --train.epochs 1 \
  --out_dir out/llama-3.2-3b-finetuned \
  --precision bf16-true
```
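
Before starting a training run, it can help to check that the JSON file has the expected shape. The following is a minimal sketch; treating `instruction` and `output` as required and `input` as optional is an assumption of this example, and `find_invalid_records` is a hypothetical helper, not part of LitGPT.

```python
import json

# Assumed for this sketch: these keys must be non-empty strings.
REQUIRED_KEYS = ("instruction", "output")


def find_invalid_records(records):
    """Return (index, reason) pairs for records that look malformed."""
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append((i, "not a JSON object"))
            continue
        for key in REQUIRED_KEYS:
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{key}'"))
    return problems


sample = json.loads("""
[
  {"instruction": "What payment methods do you accept?", "input": "", "output": "We accept credit cards."},
  {"instruction": "", "input": "", "output": "Orphan answer."},
  {"instruction": "How do refunds work?", "input": ""}
]
""")
print(find_invalid_records(sample))
```

With a real dataset, `sample` would be replaced by `json.load(open("my_custom_dataset.json", encoding="utf-8"))`.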


In [None]:
import os
import json

import torch
import wandb
from dotenv import load_dotenv
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

In [26]:
load_dotenv()

True

Reference: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/finetune_full.md#tune-on-your-dataset

In [14]:
import pandas as pd
dataset_name = "AndresR2909/youtube_transcriptions_summaries_2025_gpt4.1"
splits = {'train': 'data/train-00000-of-00001.parquet', 'test': 'data/test-00000-of-00001.parquet'}
df_test = pd.read_parquet(f"hf://datasets/{dataset_name}/" + splits["test"])
df_train = pd.read_parquet(f"hf://datasets/{dataset_name}/" + splits["train"])

In [15]:
df_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 221 entries, 0 to 220
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   channel_name       221 non-null    object 
 1   video_id           221 non-null    object 
 2   source             221 non-null    object 
 3   publish_date       158 non-null    object 
 4   duration           221 non-null    float64
 5   last_update_date   221 non-null    object 
 6   title              221 non-null    object 
 7   text               221 non-null    object 
 8   year               221 non-null    int64  
 9   month              158 non-null    float64
 10  number_of_tokenks  221 non-null    int64  
 11  prompt             221 non-null    object 
 12  summary            221 non-null    object 
 13  key_terms          221 non-null    object 
 14  __index_level_0__  221 non-null    int64  
dtypes: float64(2), int64(3), object(10)
memory usage: 26.0+ KB


In [27]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2004 entries, 0 to 2003
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   channel_name       2004 non-null   object 
 1   video_id           2004 non-null   object 
 2   source             2004 non-null   object 
 3   publish_date       1452 non-null   object 
 4   duration           2004 non-null   float64
 5   last_update_date   2004 non-null   object 
 6   title              2004 non-null   object 
 7   text               2004 non-null   object 
 8   year               2004 non-null   int64  
 9   month              1452 non-null   float64
 10  number_of_tokenks  2004 non-null   int64  
 11  prompt             2004 non-null   object 
 12  summary            2004 non-null   object 
 13  key_terms          2004 non-null   object 
 14  __index_level_0__  2004 non-null   int64  
dtypes: float64(2), int64(3), object(10)
memory usage: 235.0+ KB


In [28]:
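# Count the Llama tokens of each transcript with the downloaded checkpoint's tokenizer
tokenizer = AutoTokenizer.from_pretrained('checkpoints/meta-llama/Llama-3.2-3B-Instruct')
df_train["llama_tokens"] = df_train["text"].apply(lambda x: len(tokenizer.encode(str(x))))
# Longest transcripts: at or above the 4096-token cutoff (alternative threshold: 8192)
df_mas_largos = df_train[df_train["llama_tokens"] >= 4096]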

In [29]:
new_df_train = df_train[df_train["llama_tokens"] < 4096]

In [30]:
new_df_train[['llama_tokens']].describe()

| | llama_tokens |
|---|---|
| count | 1021.0 |
| mean | 2493.77669 |
| std | 1053.139173 |
| min | 42.0 |
| 25% | 1935.0 |
| 50% | 2702.0 |
| 75% | 3324.0 |
| max | 4084.0 |


In [31]:
df_mas_largos[['channel_name','llama_tokens']].describe(include="all")

| | channel_name | llama_tokens |
|---|---|---|
| count | 983 | 983.0 |
| unique | 6 | |
| top | USACRYPTONOTICIAS | |
| freq | 688 | |
| mean | | 14273.471007 |
| std | | 7589.613838 |
| min | | 4096.0 |
| 25% | | 6055.5 |
| 50% | | 14203.0 |
| 75% | | 21249.5 |


In [16]:
# Read the instruction template from the file
with open('prompts/v3_summary_expert.txt', 'r', encoding='utf-8') as f:
    instruction_template = f.read()

print(instruction_template)

Actúa como un experto en trading y análisis de mercados financieros.

INSTRUCCIONES:
1. Analiza el texto proporcionado entre las líneas de guiones.
2. Elabora un informe estructurado siguiendo exactamente el formato solicitado.
3. Utiliza un lenguaje claro, conciso y relevante para inversores.
4. No inventes información; limita tu análisis únicamente al contenido del texto.

FORMATO DEL INFORME:
- **Introducción:** Presenta una visión general del tema tratado.
- **Puntos clave:** Resume los aspectos más importantes en formato de viñetas.
- **Conclusión:** Ofrece un cierre que sintetice el análisis realizado.
- **Activos recomendados:** Extrae y lista, en una sección aparte, todos los activos mencionados como opciones de inversión.

Texto a analizar:
------------
{context}
------------

Recuerda: Sigue el formato solicitado y asegúrate de que la información sea precisa y útil para inversores.



In [16]:
def convert_to_json_format(df, instruction):
    """Build LitGPT-style records: one instruction/input/output dict per row."""
    return [
        {
            "instruction": instruction,
            "input": row["text"],
            "output": row["summary"]
        }
        for _, row in df.iterrows()
    ]

# Convert the train and test sets
train_data_litgpt = convert_to_json_format(new_df_train, instruction_template)
test_data_litgpt = convert_to_json_format(df_test, instruction_template)

# Save to JSON files
with open('data/train_data.json', 'w', encoding='utf-8') as f:
    json.dump(train_data_litgpt, f, ensure_ascii=False, indent=4)

with open('data/test_data.json', 'w', encoding='utf-8') as f:
    json.dump(test_data_litgpt, f, ensure_ascii=False, indent=4)

print("Files saved: data/train_data.json and data/test_data.json")

Files saved: data/train_data.json and data/test_data.json
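
The instruction template printed earlier contains a `{context}` placeholder. The notebook does not show how the dataset's `prompt` column was built, so the following is only a sketch of one way to fill such a placeholder; `build_prompt` is a hypothetical helper and the template here is a shortened stand-in.

```python
def build_prompt(template, text, placeholder="{context}"):
    """Insert `text` into `template` at the placeholder."""
    if placeholder not in template:
        raise ValueError(f"Template has no {placeholder} placeholder.")
    # str.replace is used instead of str.format so that any other braces
    # in the template or in the transcript are left untouched.
    return template.replace(placeholder, text)


# Shortened stand-in for prompts/v3_summary_expert.txt
template = "Texto a analizar:\n------------\n{context}\n------------\n"
prompt = build_prompt(template, "BTC rises 5% {today}.")
print(prompt)
```

Using `str.replace` avoids the `KeyError` that `str.format` would raise if a transcript happened to contain curly braces.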


```bash
litgpt finetune_lora meta-llama/Llama-3.2-3B-Instruct \
  --devices 1 \
  --data JSON \
  --data.json_path data/train_data.json \
  --data.val_split_fraction 0.1 \
  --train.epochs 1 \
  --train.max_seq_length 4096 \
  --train.global_batch_size 2 \
  --eval.max_new_tokens 800 \
  --out_dir out/llama-3.2-3b-finetuned_bnb_nf4 \
  --logger_name wandb \
  --precision bf16-true \
  --quantize bnb.nf4
```
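
To get a feel for the length of this run, the split sizes and the number of optimizer steps can be estimated from the options above. This is a rough sketch: how LitGPT rounds the validation split and handles the last partial batch may differ, and `estimate_training_steps` is a hypothetical helper.

```python
import math


def estimate_training_steps(num_records, val_split_fraction, global_batch_size, epochs=1):
    """Roughly estimate train/validation sizes and the number of optimizer steps."""
    num_val = int(num_records * val_split_fraction)
    num_train = num_records - num_val
    steps = math.ceil(num_train / global_batch_size) * epochs
    return num_train, num_val, steps


# 1021 is the row count of new_df_train reported by describe() earlier.
num_train, num_val, steps = estimate_training_steps(1021, 0.1, 2, epochs=1)
print(f"train={num_train}, val={num_val}, approx. steps={steps}")
```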

&nbsp;  
# **Step 3: Deploy the Model**  

This section explains how to deploy the fine-tuned model and use it to generate responses for the test examples (loaded above as `df_test`). We will set up an inference server using [LitServe](https://github.com/Lightning-AI/LitServe), a high-performance serving tool integrated into **LitGPT**.


## **3.1: Query the Inference Server with the `df_test` Data**  

To launch an inference server for a fine-tuned model, run `litgpt serve` on that run's `final` directory. Each command below serves one of the fine-tuned variants (Llama 3.2 1B or 3B); run the one you want to evaluate:

```bash
litgpt serve out/llama-3.2-3b-finetuned_bnb_nf4_v2/final --max_new_tokens 1200 --temperature 0.0 --top_p 0.9
litgpt serve out/llama-3.2-3b-finetuned_bnb_nf4/final --max_new_tokens 1200 --temperature 0.0 --top_p 0.9
litgpt serve out/llama-3.2-3b-finetuned_v1/final --max_new_tokens 1200 --temperature 0.0 --top_p 0.9
litgpt serve out/llama-3.2-1b-finetuned_v2/final --max_new_tokens 1200 --temperature 0.0 --top_p 0.9
litgpt serve out/llama-3.2-1b-finetuned_v5/final --max_new_tokens 1200 --temperature 0.0 --top_p 0.9
```


In [17]:
tokenizer = AutoTokenizer.from_pretrained('checkpoints/meta-llama/Llama-3.2-3B-Instruct')
df_test["llama_tokens"] = df_test["text"].apply(lambda x: len(tokenizer.encode(str(x))))
new_df_test = df_test[df_test["llama_tokens"] < 8192]

In [18]:
df_test[['llama_tokens']].describe()

| | llama_tokens |
|---|---|
| count | 221.0 |
| mean | 7398.99095 |
| std | 7619.794179 |
| min | 39.0 |
| 25% | 2379.0 |
| 50% | 3658.0 |
| 75% | 11808.0 |
| max | 31448.0 |


In [19]:
new_df_test[['llama_tokens']].describe()

| | llama_tokens |
|---|---|
| count | 158.0 |
| mean | 3014.689873 |
| std | 1506.119699 |
| min | 39.0 |
| 25% | 2042.5 |
| 50% | 3015.0 |
| 75% | 3879.25 |
| max | 7979.0 |


In [22]:
import requests
import pandas as pd
import time
from IPython.display import clear_output

# Cargar el dataset de prueba
test_data = new_df_test.copy()

# Lista para almacenar las respuestas generadas
results = []


# Iterar sobre cada instrucción y consultar el modelo
for index, row in test_data.iterrows():
    channel_name = row["channel_name"]
    video_id = row["video_id"]
    input = row["text"]
    query = row["prompt"]
    reference = row["summary"]
    
    try:
        # Realizar una solicitud POST al modelo
        response = requests.post(
            "http://127.0.0.1:8000/predict",
            json={"prompt": query}
        )
        
        # Obtener el texto de la respuesta del modelo
        generated_response = response.json().get('output', '')
        
        # Limpiar la salida anterior
        clear_output(wait=True)

        # Imprimir  respuesta
        print(f"index: {index}")
        print(f"Model Response: {generated_response}\n")
    except Exception as e:
        print(f"error: {e}")
        generated_response = None
    
    # Agregar el resultado a la lista
    results.append({
        "channel_name":channel_name,
        "video_id":video_id,
        "input":input,
        "instruction": instruction_template,
        "prompt":query,
        "generated_response": generated_response,
        "reference":reference
    })


index: 220
Model Response: - **Introducción:**  
El texto analiza el momento en el que los inversores deberían entrar en activos, con especial en bitcoin (BTC), considerando el comportamiento de los mercados y los patrones técnicos que podría llevar al precio. Se hace referencia a técnicas de trading y se mencionan movimientos alcistas en el gráfico semanal y diario.

- **Puntos clave:**
  - Se recomienda buscar entrada en activos como bitcoin, especialmente en el momento de la onda B, considerando posibles movimientos alcistas a corto plazo.
  - Se sugiere priorizar las entradas por cierre de la onda A en el gráfico semanal y posteriormente buscar oportunidades de long en el gráfico semanal, preferiblemente después de una onda B.
  - Es probable que el precio de BTC sea cerca de los 95,000 (un 40% de ganancia sobre los 25,000 dólares), pero se espera esperar a que el precio se acerque a la parte baja de la onda B antes de realizar la entrada.
  - Se menciona que el mercado parece mani

In [24]:
# Create a DataFrame with the results
results_df = pd.DataFrame(results)

# Save the results to a semicolon-separated CSV file
results_df.to_csv("data/llama-3.2-1b-finetuned_v5.csv", index=False, sep=";")

print("Responses saved in data")

Responses saved in data
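
In the loop above, a failed request stores `None` as the response, which pandas writes as an empty field. A minimal sketch for counting such rows in the saved file follows; the sample rows and video IDs are made up, and `summarize_results` is a hypothetical helper.

```python
import csv
import io

# Stand-in for data/llama-3.2-1b-finetuned_v5.csv (only three of the columns).
SAMPLE = (
    "video_id;generated_response;reference\n"
    "abc;- **Introducción:** ...;ref one\n"
    "def;;ref two\n"
    'ghi;"text with ; inside";ref three\n'
)


def summarize_results(file_obj):
    """Count all rows and collect the IDs of rows with an empty response."""
    rows = list(csv.DictReader(file_obj, delimiter=";"))
    failed = [row["video_id"] for row in rows if not row["generated_response"].strip()]
    return len(rows), failed


total, failed_ids = summarize_results(io.StringIO(SAMPLE))
print(f"{total} rows, {len(failed_ids)} without a response: {failed_ids}")
```

For the real file, pass `open("data/llama-3.2-1b-finetuned_v5.csv", newline="", encoding="utf-8")` instead of the `StringIO` object.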


# **Step 4: Merge the LoRA Weights**

```bash
litgpt merge_lora out/llama-3.2-3b-finetuned_bnb_nf4_v2/final
litgpt merge_lora out/llama-3.2-1b-finetuned_v5/final
litgpt merge_lora out/llama-3.2-3b-finetuned_bnb_nf4/final
litgpt merge_lora out/llama-3.2-3b-finetuned_v1/final
```
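
Conceptually, merging folds the low-rank update into the base weight so that no separate adapter is needed at inference time. The sketch below follows the standard LoRA formulation, `W' = W + (alpha / r) * B @ A`, on a toy matrix in pure Python. It is an illustration of the idea under that assumption, not LitGPT's implementation, and all names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scaling = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x2 weight with a rank-1 update.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape (2, rank)
lora_a = [[0.5, -0.5]]    # shape (rank, 2)
merged = merge_lora(weight, lora_b, lora_a, alpha=2, rank=1)
print(merged)
```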

# **Step 5: Convert the Finetuned Model Back into HF Format**

```bash
litgpt convert_from_litgpt out/llama-3.2-3b-finetuned_bnb_nf4_v2/final/ out/hf-llama-3.2-3b-finetuned_bnb_nf4_v2/converted/

litgpt convert_from_litgpt out/llama-3.2-3b-finetuned_bnb_nf4/final/ out/llama-3.2-3b-finetuned_bnb_nf4/converted/

litgpt convert_from_litgpt out/llama-3.2-3b-finetuned_v1/final/ out/llama-3.2-3b-finetuned_v1/converted/

litgpt convert_from_litgpt out/llama-3.2-1b-finetuned_v5/final/ out/hf-llama-3.2-1b-finetuned_v5/converted/
```

# **Step 6: Instantiate the HF Model and Push It to the Hub**

In [39]:
# 2. Create the model
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')


config.json:   0%|          | 0.00/877 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.47G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/189 [00:00<?, ?B/s]

In [3]:
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')

In [4]:
tokenizer.push_to_hub("AndresR2909/hf-llama-3.2-1b-finetuned_v5")

tokenizer.json: 100%|██████████| 17.2M/17.2M [00:45<00:00, 375kB/s] 


CommitInfo(commit_url='https://huggingface.co/AndresR2909/hf-llama-3.2-1b-finetuned_v5/commit/a980d802b28228454740436a221f8c3d6f881705', commit_message='Upload tokenizer', commit_description='', oid='a980d802b28228454740436a221f8c3d6f881705', pr_url=None, repo_url=RepoUrl('https://huggingface.co/AndresR2909/hf-llama-3.2-1b-finetuned_v5', endpoint='https://huggingface.co', repo_type='model', repo_id='AndresR2909/hf-llama-3.2-1b-finetuned_v5'), pr_revision=None, pr_num=None)

In [None]:
# 3. Load your weights
state_dict = torch.load('out/hf-llama-3.2-1b-finetuned_v5/converted/model.pth')
model.load_state_dict(state_dict)

<All keys matched successfully>

In [43]:
# Push to the Hub:
model.push_to_hub("AndresR2909/hf-llama-3.2-1b-finetuned_v5")
tokenizer.push_to_hub("AndresR2909/hf-llama-3.2-1b-finetuned_v5")

model.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/AndresR2909/hf-llama-3.2-1b-finetuned_v5/commit/90693e936700b9eff030ee685c5b7de95deab128', commit_message='Upload LlamaForCausalLM', commit_description='', oid='90693e936700b9eff030ee685c5b7de95deab128', pr_url=None, repo_url=RepoUrl('https://huggingface.co/AndresR2909/hf-llama-3.2-1b-finetuned_v5', endpoint='https://huggingface.co', repo_type='model', repo_id='AndresR2909/hf-llama-3.2-1b-finetuned_v5'), pr_revision=None, pr_num=None)