https://arxiv.org/pdf/2104.07545

In [2]:
import os
from dotenv import load_dotenv
from huggingface_hub import login
from datasets import load_dataset
import pandas as pd

  from .autonotebook import tqdm as notebook_tqdm


In [3]:
load_dotenv()

True

In [4]:
os.chdir('..')
from app.llm.llm import SummaryLlm

In [5]:
login(token = os.environ.get("HF_TOKEN") )

dataset_name = "AndresR2909/youtube_transcriptions_summaries_2025_gpt4.1"

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


In [6]:
splits = {'train': 'data/train-00000-of-00001.parquet', 'test': 'data/test-00000-of-00001.parquet'}
df_test = pd.read_parquet(f"hf://datasets/{dataset_name}/" + splits["test"])

In [7]:
print(df_test.loc[0, 'prompt'])

Actúa como un experto en trading y análisis de mercados financieros.

INSTRUCCIONES:
1. Analiza el texto proporcionado entre las líneas de guiones.
2. Elabora un informe estructurado siguiendo exactamente el formato solicitado.
3. Utiliza un lenguaje claro, conciso y relevante para inversores.
4. No inventes información; limita tu análisis únicamente al contenido del texto.

FORMATO DEL INFORME:
- **Introducción:** Presenta una visión general del tema tratado.
- **Puntos clave:** Resume los aspectos más importantes en formato de viñetas.
- **Conclusión:** Ofrece un cierre que sintetice el análisis realizado.
- **Activos recomendados:** Extrae y lista, en una sección aparte, todos los activos mencionados como opciones de inversión.

Texto a analizar:
------------
¿cuanto dinero necesito para vivir del trading? ¿ libertad financiera? vivir de la bolsa?. un saludo para todos hoy vamos a hablar de un tema de mucho interés algo que nos preguntan mucho constantemente y es cuánto dinero neces

In [8]:
df_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 221 entries, 0 to 220
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   channel_name       221 non-null    object 
 1   video_id           221 non-null    object 
 2   source             221 non-null    object 
 3   publish_date       158 non-null    object 
 4   duration           221 non-null    float64
 5   last_update_date   221 non-null    object 
 6   title              221 non-null    object 
 7   text               221 non-null    object 
 8   year               221 non-null    int64  
 9   month              158 non-null    float64
 10  number_of_tokenks  221 non-null    int64  
 11  prompt             221 non-null    object 
 12  summary            221 non-null    object 
 13  key_terms          221 non-null    object 
 14  __index_level_0__  221 non-null    int64  
dtypes: float64(2), int64(3), object(10)
memory usage: 26.0+ KB


In [9]:
selected_columns = ['video_id', 'channel_name','prompt','text', 'summary',]
df_test_slm = df_test[selected_columns]
df_test_slm

Unnamed: 0,video_id,channel_name,prompt,text,summary
0,nn-WwZRAAO0,ARENA ALFA,Actúa como un experto en trading y análisis de...,¿cuanto dinero necesito para vivir del trading...,- **Introducción:** \nEl texto analiza de man...
1,YyNzWme4cwg,ARENA ALFA,Actúa como un experto en trading y análisis de...,el secreto de buffett: ventas masivas y auge d...,- **Introducción:** \nEl texto analiza los re...
2,qVLNbTt9xSI,ARENA ALFA,Actúa como un experto en trading y análisis de...,porsche sale a la bolsa y crisis mercados tras...,- **Introducción:** \nEl texto analiza la sal...
3,cA1gS9jSeFU,ARENA ALFA,Actúa como un experto en trading y análisis de...,la recesion economica de 2023 ¿una oportunidad...,**Informe de Análisis: Recesión Económica de 2...
4,aQj_NEG3h8Q,ARENA ALFA,Actúa como un experto en trading y análisis de...,conconcreto en reorganizacion - ¿que significa...,- **Introducción:** \nEl texto analiza la sit...
...,...,...,...,...,...
216,kE4PHBzjK9w,USACRYPTONOTICIAS,Actúa como un experto en trading y análisis de...,bitcoin: cuidado | crypto | btc. para decirles...,- **Introducción:** \nEl texto analizado ofre...
217,84zFrrHaBCw,USACRYPTONOTICIAS,Actúa como un experto en trading y análisis de...,bitcoin en español | bitcoin noticias | bitcoi...,- **Introducción:** \nEl texto analizado corr...
218,F4-oXv3oB9w,USACRYPTONOTICIAS,Actúa como un experto en trading y análisis de...,"bitcoin: podría llegar a los 24,700 otra vez |...",**Informe de Análisis de Mercado Cripto**\n\n-...
219,oi9z9YkeUZ8,USACRYPTONOTICIAS,Actúa como un experto en trading y análisis de...,bitcoin: cuidado con esto. a ver ahora sí pare...,- **Introducción:** \nEl texto analizado corr...


In [25]:
def generate_summaries(df, models:list, prompts:list):
    from app.llm.llm import SummaryLlm
    
    def add_llm_summary_column(df, summary_llm, col_name="slm_summary"):
        df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))
        return df

    def add_prompt_column(df, summary_llm, col_name="slm_prompt"):
        df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
        return df
   
    df_in= df.copy()
    for model in models:
        for prompt in prompts:
            model_name = model.split('/')[-1]
            file_name = f"test_slm_{model_name.replace('.','_').replace(':','_').replace('-','_')}_{prompt}"
            llm_config = {
                "type": "ollama",
                "model": model,
                "base_url": "http://localhost:11434",
            }
            summary_llm = SummaryLlm(config=llm_config, prompt_name=prompt)
            add_prompt_column(df_in, summary_llm, col_name="slm_prompt")
            add_llm_summary_column(df_in, summary_llm, col_name="slm_summary")
            print(f"Saving summaries for {file_name}")
            df_in.to_csv(f"data/slm_summaries/{file_name}.csv", sep=";",index=False)

In [None]:
prompts =['v1_summary_expert_one_shot', 'v1_summary_expert','v2_summary_expert_one_shot','v2_summary_expert','v3_summary_expert']
models = ['llama3.2:3b-instruct-fp16', 'phi4:latest','deepseek-r1:8b']
generate_summaries(df_test_slm, models, prompts)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_llama3_2_3b_instruct_fp16_v1_summary_expert_one_shot


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(la

Saving summaries for test_slm_llama3_2_3b_instruct_fp16_v1_summary_expert


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_llama3_2_3b_instruct_fp16_v2_summary_expert_one_shot


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(la

Saving summaries for test_slm_llama3_2_3b_instruct_fp16_v2_summary_expert


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_llama3_2_3b_instruct_fp16_v3_summary_expert


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_phi4_latest_v1_summary_expert_one_shot


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_phi4_latest_v1_summary_expert


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_phi4_latest_v2_summary_expert_one_shot


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_phi4_latest_v2_summary_expert


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_phi4_latest_v3_summary_expert


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_deepseek_r1_8b_v1_summary_expert_one_shot


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_deepseek_r1_8b_v1_summary_expert


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_deepseek_r1_8b_v2_summary_expert_one_shot


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_deepseek_r1_8b_v2_summary_expert


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))


Saving summaries for test_slm_deepseek_r1_8b_v3_summary_expert


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


In [None]:
prompts =['v1_summary_expert_one_shot', 'v1_summary_expert','v2_summary_expert_one_shot','v2_summary_expert','v3_summary_expert']
models = ['llama3.2:1b-instruct-fp16']

generate_summaries(df_test_slm, models, prompts)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_llama3_2_1b_instruct_fp16_v1_summary_expert_one_shot


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_llama3_2_1b_instruct_fp16_v1_summary_expert


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_llama3_2_1b_instruct_fp16_v2_summary_expert_one_shot


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_llama3_2_1b_instruct_fp16_v2_summary_expert


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summary_prompt_template.replace("{context}", x))
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col_name] = df["text"].apply(lambda x: summary_llm.summarize(x))


Saving summaries for test_slm_llama3_2_1b_instruct_fp16_v3_summary_expert


# using finetunning models

In [37]:
prompts =['v3_summary_expert']#,'v2_summary_expert','v1_summary_expert']
models = ['hf.co/AndresR2909/hf-llama-3.2-3b-finetuned_qlora_bnb_nf4_v2_gguf:latest',
          'hf.co/AndresR2909/llama-3.2-1b-finetuned_qlora_v5_gguf:latest']
generate_summaries(df_test_slm, models, prompts)

Saving summaries for test_slm_hf_llama_3_2_3b_finetuned_qlora_bnb_nf4_v2_gguf_latest_v3_summary_expert
Saving summaries for test_slm_llama_3_2_1b_finetuned_qlora_v5_gguf_latest_v3_summary_expert


In [36]:
prompts =['v3_summary_expert']#,'v2_summary_expert','v1_summary_expert']
models = ['hf.co/AndresR2909/llama-3.2-3b-finetuned_qlora_bnb_nf4_v2-gguf_q8_0:latest']
generate_summaries(df_test_slm, models, prompts)

Saving summaries for test_slm_llama_3_2_3b_finetuned_qlora_bnb_nf4_v2_gguf_q8_0_latest_v3_summary_expert
