This pipeline takes user stories from the salony_train.csv dataset and breaks them down
into development tasks.

In [2]:
import pandas as pd
import argparse
from pathlib import Path
from typing import Dict

from simple_pipeline import SimplePipeline
from simple_pipeline.steps import LoadDataFrame, OllamaLLMStep

In [3]:
def create_task_generation_prompt(row: Dict) -> str:
    """
    Build the task-generation prompt for one user story from the Salony dataset.
    
    Args:
        row: DataFrame row whose 'input' column holds the user story
    
    Returns:
        The formatted prompt
    """
    user_story = row['input'].strip()
    
    prompt = f"""Below is an instruction that describes a task, paired with an input that provides a user story.

Write a response that appropriately completes the request.


Instruction:

Break this user story into smaller development tasks to help the developers implement it efficiently. You can divide this user story into as many tasks as needed, depending on its complexity. Each task must be unique, actionable, and non-overlapping.

Use the following format for the response:

1. summary: ‹task summary 1›
description: ‹task description 1›
2. summary: ‹task summary 2›
description: ‹task description 2›

N. summary: ‹task summary N›
description: ‹task description N›


Input:

{user_story}


Response:"""
    
    return prompt
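
The response format requested in the prompt above is regular enough to parse back into structured records. The following is a minimal sketch, assuming the model actually follows the `N. summary: ... / description: ...` layout. `TASK_PATTERN` and `parse_tasks` are hypothetical names introduced here for illustration; they are not part of simple_pipeline.

```python
import re
from typing import Dict, List

# Hypothetical helper (not part of simple_pipeline): matches one numbered
# "N. summary: ... / description: ..." entry. The description runs until the
# next numbered entry or the end of the text, so a truncated final task is
# still captured.
TASK_PATTERN = re.compile(
    r"^\s*\d+\.\s*summary:\s*(?P<summary>.+?)\s*\n"
    r"\s*description:\s*(?P<description>.+?)\s*"
    r"(?=^\s*\d+\.\s*summary:|\Z)",
    re.MULTILINE | re.DOTALL | re.IGNORECASE,
)


def parse_tasks(response: str) -> List[Dict[str, str]]:
    """Split a model response into a list of {summary, description} dicts."""
    tasks = []
    for match in TASK_PATTERN.finditer(response):
        tasks.append(
            {
                "summary": match.group("summary").strip(),
                "description": match.group("description").strip(),
            }
        )
    return tasks


if __name__ == "__main__":
    # Made-up sample text in the requested format, for illustration only.
    sample = (
        "1. summary: Retrieve Transaction History Data\n"
        "description: Develop an API endpoint.\n"
        "\n"
        "2. summary: Display Transaction History\n"
        "description: Create a UI component."
    )
    for task in parse_tasks(sample):
        print(task["summary"], "->", task["description"])
```

Applied to the generated column, for example `result_df['tasks'].apply(parse_tasks)`, this would give one list of tasks per story. Responses that drift from the format yield fewer matches or none, so the per-story counts are worth checking before relying on the parsed output.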

In [4]:
def run_salony_pipeline(
    output_csv: str,
    model_name: str = "llama3.1:8b",
    batch_size: int = 2,
    temperature: float = 0.3,
    num_predict: int = 1000,
    sample_size: int = None
):
    """
    Run the task-generation pipeline over the Salony user stories.
    
    Args:
        output_csv: Path where the result is saved
        model_name: Ollama model to use
        batch_size: Number of stories to process at once
        temperature: Sampling temperature for generation
        num_predict: Maximum number of tokens to generate
        sample_size: If set, process only the first N stories (for testing)
    """
    
    print(f"\n{'='*80}")
    print("🚀 SALONY USER STORIES TO TASKS PIPELINE")
    print(f"{'='*80}\n")
    
    # Load the data, using a path relative to the notebook
    input_csv = Path("../data/salony_train.csv")
    print(f"📥 Cargando datos desde: {input_csv}")
    
    if not input_csv.exists():
        raise FileNotFoundError(f"No se encontró el archivo: {input_csv}")
    
    df = pd.read_csv(input_csv)
    
    # Drop the first column if it is a saved index
    if df.columns[0] == 'Unnamed: 0' or df.columns[0] == '':
        df = df.iloc[:, 1:]
    
    print(f"   ✓ {len(df)} historias cargadas")
    
    # Check that the 'input' column is present
    if 'input' not in df.columns:
        raise ValueError("El CSV debe tener una columna 'input' con las historias de usuario")
    
    # Apply sampling if requested
    if sample_size:
        df = df.head(sample_size)
        print(f"   ℹ️  Procesando solo {sample_size} historias (modo muestra)")
    
    # Clean the data; copy first so the assignment below does not write to a slice
    df = df.dropna(subset=['input']).copy()
    df['input'] = df['input'].str.strip()
    
    # Build the pipeline
    print(f"\n⚙️ Configurando pipeline:")
    print(f"   Modelo: {model_name}")
    print(f"   Batch size: {batch_size}")
    print(f"   Temperature: {temperature}")
    print(f"   Historias a procesar: {len(df)}")
    
    pipeline = SimplePipeline(
        name="salony-tasks-pipeline",
        description="Pipeline para generar tareas de desarrollo del dataset Salony"
    )
    
    pipeline.add_step(
        LoadDataFrame(name="load", df=df)
    )
    
    pipeline.add_step(
        OllamaLLMStep(
            name="generate_tasks",
            model_name=model_name,
            prompt_column="input",
            output_column="tasks",
            prompt_template=create_task_generation_prompt,
            system_prompt="You are an expert software development lead who excels at breaking down user stories into clear, actionable development tasks.",
            batch_size=batch_size,
            generation_kwargs={
                "temperature": temperature,
                "num_predict": num_predict
            },
        )
    )
    
    # Run
    print(f"\n🔄 Procesando historias...\n")
    result_df = pipeline.run(use_cache=False)
    
    # Save
    print(f"\n💾 Guardando resultados...")
    result_df.to_csv(output_csv, index=False)
    print(f"   ✓ CSV guardado: {output_csv}")
    print(f"   ✓ {len(result_df)} historias procesadas")
    
    print(f"\n{'='*80}")
    print("✅ PIPELINE COMPLETADO EXITOSAMENTE")
    print(f"{'='*80}\n")
    
    # Show a sample of the results
    print("📋 Ejemplo de resultado (primeras 3 filas):\n")
    for idx, row in result_df.head(3).iterrows():
        print(f"🔹 Historia #{idx}:")
        print(f"   Input: {row['input'][:100]}...")
        if 'tasks' in row and pd.notna(row['tasks']):
            print(f"   Tasks: {row['tasks'][:200]}...")
        print()
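
The notebook imports `argparse` without using it, and the parameters of `run_salony_pipeline` map naturally onto command-line flags. The following is a sketch of such a wrapper, assuming the flag names mirror the keyword arguments. `build_arg_parser` and `parse_cli_args` are hypothetical helpers that do not appear in the original code.

```python
import argparse
from pathlib import Path
from typing import Any, Dict, List, Optional


def build_arg_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI whose flags mirror run_salony_pipeline's keyword arguments."""
    parser = argparse.ArgumentParser(
        description="Generate development tasks from Salony user stories."
    )
    parser.add_argument("--output-csv", type=Path,
                        default=Path("salony_tasks_output.csv"),
                        help="Path where the result CSV is saved")
    parser.add_argument("--model-name", default="llama3.1:8b",
                        help="Ollama model to use")
    parser.add_argument("--batch-size", type=int, default=2,
                        help="Number of stories processed at once")
    parser.add_argument("--temperature", type=float, default=0.3,
                        help="Sampling temperature")
    parser.add_argument("--num-predict", type=int, default=1000,
                        help="Maximum number of tokens to generate")
    parser.add_argument("--sample-size", type=int, default=None,
                        help="Process only the first N stories")
    return parser


def parse_cli_args(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Parse argv into a dict that can be forwarded as keyword arguments."""
    kwargs = vars(build_arg_parser().parse_args(argv))
    # The pipeline function takes output_csv as a plain string.
    kwargs["output_csv"] = str(kwargs["output_csv"])
    return kwargs


if __name__ == "__main__":
    # An explicit argument list keeps the example independent of sys.argv.
    kwargs = parse_cli_args(["--sample-size", "5"])
    print(kwargs)
    # In the notebook's context this could be forwarded as:
    # run_salony_pipeline(**kwargs)
```

Inside a notebook it is safer to pass an explicit list to `parse_cli_args`, because `sys.argv` there holds the kernel's own launch arguments. In a standalone script, calling it with no argument reads the real command line.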


In [6]:
run_salony_pipeline(
    output_csv="salony_tasks_output.csv",
    model_name="llama3.1:8b",
    batch_size=2,
    temperature=0.3,
    num_predict=1000,
    sample_size=5,
)

2025-10-20 17:13:55 - SimplePipeline.salony-tasks-pipeline - INFO - Added step: load
2025-10-20 17:13:55 - SimplePipeline.salony-tasks-pipeline - INFO - Added step: generate_tasks
2025-10-20 17:13:55 - SimplePipeline.salony-tasks-pipeline - INFO - Starting pipeline: salony-tasks-pipeline
2025-10-20 17:13:55 - SimplePipeline.salony-tasks-pipeline - INFO - Number of steps: 2
2025-10-20 17:13:55 - SimplePipeline.salony-tasks-pipeline - INFO - Executing generator step: load
2025-10-20 17:13:55 - SimplePipeline.salony-tasks-pipeline - INFO - Executing step: generate_tasks
2025-10-20 17:13:55 - SimplePipeline


🚀 SALONY USER STORIES TO TASKS PIPELINE

📥 Cargando datos desde: ../data/salony_train.csv
   ✓ 1999 historias cargadas
   ℹ️  Procesando solo 5 historias (modo muestra)

⚙️ Configurando pipeline:
   Modelo: llama3.1:8b
   Batch size: 2
   Temperature: 0.3
   Historias a procesar: 5

🔄 Procesando historias...



Processing generate_tasks:   0%|          | 0/5 [00:00<?, ?it/s]
2025-10-20 17:14:21 - OllamaLLMStep.generate_tasks - INFO - Generation for row 0: 1. summary: Retrieve Transaction History Data
description: Develop an API endpoint to fetch the user's transaction history from the database, including relevant details such as date, amount, and description.

2. summary: Display Transaction History on User Interface
description: Create a user interface component (e.g., table or chart) to display the retrieved transaction history data in a readable format, allowing users to easily navigate through their past transactions.

3. summary: Implement Record-Keeping Functionality
description: Develop a feature that enables users to mark specific transactions as "recorded" or "saved," allowing them to quickly identify and access important transactions when needed.

4. summary: Enhance Search and Filtering Capabilities
description: Add search functionality to the transaction history display, enabling u


💾 Guardando resultados...
   ✓ CSV guardado: salony_tasks_output.csv
   ✓ 5 historias procesadas

✅ PIPELINE COMPLETADO EXITOSAMENTE

📋 Ejemplo de resultado (primeras 3 filas):

🔹 Historia #0:
   Input: As a user, I want to be able to check transaction history and keep a record of it, so that I can go ...
   Tasks: 1. summary: Retrieve Transaction History Data
description: Develop an API endpoint to fetch the user's transaction history from the database, including relevant details such as date, amount, and descr...

🔹 Historia #1:
   Input: As a researcher, I want to have the ability to insert Greek symbols into my logbook entries....
   Tasks: 1. summary: Research and Document Required Greek Symbols
description: Identify all necessary Greek symbols that will be supported in the logbook entries, document their Unicode values, and create a re...

🔹 Historia #2:
   Input: As a DigitalRecords Archivist, I want to have the repository to lift embargoes on the release d