# The Full Life Cycle of a Sentiment Model

This notebook walks step by step through the full life cycle of a sentiment classification model: preparing the data, running experiments, selecting the best model, and finally using it to predict on new data.

## Section 1: Setup and Configuration

We assume the training data has already been prepared and is stored in `artifacts/data/all_train.csv`. This cell defines the variables used throughout the rest of the process.

In [1]:
import os
import pandas as pd
import json

os.environ['EXPERIMENT_ID'] = "exp_notebook_tutorial"
os.environ['INPUT_DATA'] = "artifacts/data/all_train.csv"
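
The `!uv run cli ...` cells in this notebook refer to `$EXPERIMENT_ID` and `$INPUT_DATA`. These are not Python variables, so the shell resolves them from the process environment, which is presumably why they are set through `os.environ` here. The sketch below only imitates that substitution with `os.path.expandvars`; it is a stand-in for what the shell does, not part of the project's CLI.

```python
import os

# Same values as in the configuration cell.
os.environ["EXPERIMENT_ID"] = "exp_notebook_tutorial"
os.environ["INPUT_DATA"] = "artifacts/data/all_train.csv"

# os.path.expandvars replaces $NAME with the value from the environment,
# roughly the way a POSIX shell would before running the command.
template = "uv run cli generate-splits --experiment-id $EXPERIMENT_ID --input-file $INPUT_DATA"
expanded = os.path.expandvars(template)
print(expanded)
```

Because the values live in the environment rather than in the kernel's namespace, every later `!` cell sees them without being edited.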

## Section 2: Generating Data Splits

We create the data splits for our experiment. We use the `train-val` strategy with a held-out test set (20% of the data), which will serve for the final evaluation of the best model.
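
To make the idea of a held-out test set concrete, here is a minimal sketch of such a split using only the standard library. The function name `holdout_split`, the 10% validation share, and the seed are assumptions made for illustration; the project's `generate-splits` command may work differently.

```python
import random

def holdout_split(rows, test_size=0.2, val_size=0.1, seed=42):
    """Shuffle rows, set aside a test set, then split the rest into train/val."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    # The validation share is taken from what remains after the test set is removed.
    n_val = round(len(rest) * val_size)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

rows = list(range(100))  # stand-in for the rows of the training CSV
train, val, test = holdout_split(rows)
print(len(train), len(val), len(test))
```

The test rows are separated first and never touched during model selection, which is what lets them act as an independent check at the end.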

In [2]:
!uv run cli generate-splits --experiment-id $EXPERIMENT_ID --input-file $INPUT_DATA --test-size 0.2 --backtesting-strategy train-val

2025-10-29 12:56:43.951 | INFO     | mlops.app:generate_splits:31 - Created directory structure for experiment 'exp_notebook_tutorial'
2025-10-29 12:56:43.970 | INFO     | mlops.app:generate_splits:39 - Saved test set to /Users/witjakuczunpriv/projects/AI_MasterClass_CIONET/artifacts/experiments/exp_notebook_tutorial/test.csv
2025-10-29 12:56:43.970 | INFO     | mlops.app:generate_splits:41 - Generating backtesting splits with strategy: 'train-val'...
2025-10-29 12:56:43.981 | INFO     | mlops.app:generate_splits:57 - - Generated train/val/test split in /Users/witjakuczunpriv/projects/AI_MasterClass_CIONET/artifacts/experiments/exp_notebook_tutorial/backtesting/train-val
2025-10-29 12:56:43.981 | INFO     | mlops.app:generate_splits:59 - 

## Section 3: Running Experiments and Comparing Models

Now we run training and validation for each of the selected models. After each run we display the updated table comparing the results.

### Model: svm-base

In [2]:
!uv run cli run-backtesting --experiment-id $EXPERIMENT_ID --model-config-name svm-base

2025-11-04 20:15:32.042 | INFO     | mlops.runner:run_backtesting:58 - --- Backtesting Fold 0 ---
2025-11-04 20:15:32.042 | INFO     | mlops.runner:_load_model_from_config:28 - Loading model: SVMModel from models.svm.model
2025-11-04 20:15:32.185 | INFO     | mlops.runner:run_backtesting:68 - Training model...
2025-11-04 20:15:32.191 | INFO     | models.svm.model:train:39 - Logging configured. Log file: /Users/witjakuczunpriv/projects/AI_MasterClass_CIONET/artifacts/trained_models/exp_notebook_tutorial_backtesting/fold_0/svm-base/training.log
2025-11-04 20:15:32.191 | INFO     | models.svm.model:train:42 - Creating SVM pipeline...
2025-11-04 20:15:32.191 | INFO     | models.svm.model:train:8

In [3]:
!uv run cli compare-models --experiment-id $EXPERIMENT_ID --output-file artifacts/experiments/$EXPERIMENT_ID/comparison.csv --run-type backtesting
comparison_df = pd.read_csv(f"artifacts/experiments/{os.environ['EXPERIMENT_ID']}/comparison.csv")
display(comparison_df)

2025-11-04 20:19:08.151 | INFO     | evaluate:compare_models:79 - --- Model Comparison for Experiment 'exp_notebook_tutorial' ---
2025-11-04 20:19:08.152 | INFO     | evaluate:compare_models:82 - 
--- Run Type: backtesting ---
2025-11-04 20:19:08.152 | INFO     | evaluate:compare_models:91 - Summary for backtesting (train-val strategy):
2025-11-04 20:19:08.158 | INFO     | evaluate:compare_models:95 - 
                        model_name  accuracy precision    recall  f1_score
                                        mean      mean      mean      mean
0                  bert-micro-base  0.536325  0.287644  0.536325  0.374458
1                  bert-micro-long  0.643162  0.536105  0.643162  0.576582
2                          finbert  0.822650  0.812028  0.822650  0.815154
3     prompt-ge

Unnamed: 0,run_type,model_name,fold,accuracy,precision,recall,f1_score
0,backtesting,roberta-base-3,0,0.850427,0.859027,0.850427,0.853571
1,backtesting,roberta-base,0,0.844017,0.850461,0.844017,0.828156
2,backtesting,finbert,0,0.82265,0.812028,0.82265,0.815154
3,backtesting,svm-opt,0,0.705128,0.687651,0.705128,0.688811
4,backtesting,svm-base,0,0.675214,0.669999,0.675214,0.671415
5,backtesting,bert-micro-long,0,0.643162,0.536105,0.643162,0.576582
6,backtesting,prompt-gemma-fewshot-v2-3-1b-it,0,0.621795,0.706001,0.621795,0.613496
7,backtesting,prompt-gemma-fewshot-3-1b-it,0,0.606838,0.727688,0.606838,0.62636
8,backtesting,bert-micro-base,0,0.536325,0.287644,0.536325,0.374458
9,backtesting,prompt-gemma-zeroshot-3-1b-it,0,0.0,0.0,0.0,0.0


### Model: svm-opt

In [4]:
!uv run cli run-backtesting --experiment-id $EXPERIMENT_ID --model-config-name svm-opt

2025-11-04 20:21:42.430 | INFO     | mlops.runner:run_backtesting:58 - --- Backtesting Fold 0 ---
2025-11-04 20:21:42.430 | INFO     | mlops.runner:_load_model_from_config:28 - Loading model: SVMModel from models.svm.model
2025-11-04 20:21:42.514 | INFO     | mlops.runner:run_backtesting:68 - Training model...
2025-11-04 20:21:42.520 | INFO     | models.svm.model:train:39 - Logging configured. Log file: /Users/witjakuczunpriv/projects/AI_MasterClass_CIONET/artifacts/trained_models/exp_notebook_tutorial_backtesting/fold_0/svm-opt/training.log
2025-11-04 20:21:42.520 | INFO     | models.svm.model:train:42 - Creating SVM pipeline...
2025-11-04 20:21:42.521 | INFO     | models.svm.model:train:54

In [5]:
!uv run cli compare-models --experiment-id $EXPERIMENT_ID --output-file artifacts/experiments/$EXPERIMENT_ID/comparison.csv --run-type backtesting
comparison_df = pd.read_csv(f"artifacts/experiments/{os.environ['EXPERIMENT_ID']}/comparison.csv")
display(comparison_df)

2025-11-04 20:22:15.897 | INFO     | evaluate:compare_models:79 - --- Model Comparison for Experiment 'exp_notebook_tutorial' ---
2025-11-04 20:22:15.897 | INFO     | evaluate:compare_models:82 - 
--- Run Type: backtesting ---
2025-11-04 20:22:15.897 | INFO     | evaluate:compare_models:91 - Summary for backtesting (train-val strategy):
2025-11-04 20:22:15.902 | INFO     | evaluate:compare_models:95 - 
                        model_name  accuracy precision    recall  f1_score
                                        mean      mean      mean      mean
0                  bert-micro-base  0.536325  0.287644  0.536325  0.374458
1                  bert-micro-long  0.643162  0.536105  0.643162  0.576582
2                          finbert  0.822650  0.812028  0.822650  0.815154
3     prompt-ge

Unnamed: 0,run_type,model_name,fold,accuracy,precision,recall,f1_score
0,backtesting,roberta-base-3,0,0.850427,0.859027,0.850427,0.853571
1,backtesting,roberta-base,0,0.844017,0.850461,0.844017,0.828156
2,backtesting,finbert,0,0.82265,0.812028,0.82265,0.815154
3,backtesting,svm-opt,0,0.705128,0.687651,0.705128,0.688811
4,backtesting,svm-base,0,0.675214,0.669999,0.675214,0.671415
5,backtesting,bert-micro-long,0,0.643162,0.536105,0.643162,0.576582
6,backtesting,prompt-gemma-fewshot-v2-3-1b-it,0,0.621795,0.706001,0.621795,0.613496
7,backtesting,prompt-gemma-fewshot-3-1b-it,0,0.606838,0.727688,0.606838,0.62636
8,backtesting,bert-micro-base,0,0.536325,0.287644,0.536325,0.374458
9,backtesting,prompt-gemma-zeroshot-3-1b-it,0,0.0,0.0,0.0,0.0


### Model: Gemma Zero-shot

In [7]:
!uv run cli run-backtesting --experiment-id $EXPERIMENT_ID --model-config-name prompt-gemma-zeroshot-3-1b-it

2025-10-29 12:57:13.395 | INFO     | mlops.runner:run_backtesting:58 - --- Backtesting Fold 0 ---
2025-10-29 12:57:13.395 | INFO     | mlops.runner:_load_model_from_config:28 - Loading model: PromptGemmaModel from models.prompt_gemma.model
2025-10-29 12:57:13.412 | INFO     | models.prompt_gemma.model:__init__:23 - Using device: mps
2025-10-29 12:57:13.412 | INFO     | mlops.runner:run_backtesting:68 - Training model...
2025-10-29 12:57:13.412 | INFO     | models.prompt_gemma.model:train:39 - PromptGemmaModel does not require training. Skipping.
2025-10-29 12:57:13.412 | INFO     | models.prompt_gemma.model:train:52 - Output directory ensured at: /Users/witjakuczunpriv/projects/AI_Ma

### Model: Gemma Few-shot

In [8]:
!uv run cli run-backtesting --experiment-id $EXPERIMENT_ID --model-config-name prompt-gemma-fewshot-3-1b-it

2025-10-29 12:59:25.816 | INFO     | mlops.runner:run_backtesting:58 - --- Backtesting Fold 0 ---
2025-10-29 12:59:25.816 | INFO     | mlops.runner:_load_model_from_config:28 - Loading model: PromptGemmaModel from models.prompt_gemma.model
2025-10-29 12:59:25.830 | INFO     | models.prompt_gemma.model:__init__:23 - Using device: mps
2025-10-29 12:59:25.830 | INFO     | mlops.runner:run_backtesting:68 - Training model...
2025-10-29 12:59:25.830 | INFO     | models.prompt_gemma.model:train:39 - PromptGemmaModel does not require training. Skipping.
2025-10-29 12:59:25.830 | INFO     | models.prompt_gemma.model:train:52 - Output directory ensured at: /Users/witjakuczunpriv/projects/AI_Ma

In [9]:
!uv run cli compare-models --experiment-id $EXPERIMENT_ID --output-file artifacts/experiments/$EXPERIMENT_ID/comparison.csv --run-type backtesting
comparison_df = pd.read_csv(f"artifacts/experiments/{os.environ['EXPERIMENT_ID']}/comparison.csv")
display(comparison_df)

2025-10-29 13:03:30.187 | INFO     | evaluate:compare_models:79 - --- Model Comparison for Experiment 'exp_notebook_tutorial' ---
2025-10-29 13:03:30.188 | INFO     | evaluate:compare_models:82 - 
--- Run Type: backtesting ---
2025-10-29 13:03:30.189 | INFO     | evaluate:compare_models:91 - Summary for backtesting (train-val strategy):
2025-10-29 13:03:30.195 | INFO     | evaluate:compare_models:95 - 
                      model_name  accuracy precision    recall  f1_score
                                      mean      mean      mean      mean
0   prompt-gemma-fewshot-3-1b-it  0.606838  0.727688  0.606838  0.626360
1  prompt-gemma-zeroshot-3-1b-it  0.000000  0.000000  0.000000  0.000000
2                       svm-base  0.675214  0.669999  0.675214  0.671415
3                        

Unnamed: 0,run_type,model_name,fold,accuracy,precision,recall,f1_score
0,backtesting,svm-opt,0,0.705128,0.687651,0.705128,0.688811
1,backtesting,svm-base,0,0.675214,0.669999,0.675214,0.671415
2,backtesting,prompt-gemma-fewshot-3-1b-it,0,0.606838,0.727688,0.606838,0.62636
3,backtesting,prompt-gemma-zeroshot-3-1b-it,0,0.0,0.0,0.0,0.0


### Model: Gemma Few-shot v2

In [10]:
!uv run cli run-backtesting --experiment-id $EXPERIMENT_ID --model-config-name prompt-gemma-fewshot-v2-3-1b-it

2025-10-29 13:03:33.942 | INFO     | mlops.runner:run_backtesting:58 - --- Backtesting Fold 0 ---
2025-10-29 13:03:33.942 | INFO     | mlops.runner:_load_model_from_config:28 - Loading model: PromptGemmaModel from models.prompt_gemma.model
2025-10-29 13:03:33.957 | INFO     | models.prompt_gemma.model:__init__:23 - Using device: mps
2025-10-29 13:03:33.957 | INFO     | mlops.runner:run_backtesting:68 - Training model...
2025-10-29 13:03:33.957 | INFO     | models.prompt_gemma.model:train:39 - PromptGemmaModel does not require training. Skipping.
2025-10-29 13:03:33.957 | INFO     | models.prompt_gemma.model:train:52 - Output directory ensured at: /Users/witjakuczunpriv/projects/AI_Ma

In [11]:
!uv run cli compare-models --experiment-id $EXPERIMENT_ID --output-file artifacts/experiments/$EXPERIMENT_ID/comparison.csv --run-type backtesting
comparison_df = pd.read_csv(f"artifacts/experiments/{os.environ['EXPERIMENT_ID']}/comparison.csv")
display(comparison_df)

2025-10-29 13:08:59.652 | INFO     | evaluate:compare_models:79 - --- Model Comparison for Experiment 'exp_notebook_tutorial' ---
2025-10-29 13:08:59.653 | INFO     | evaluate:compare_models:82 - 
--- Run Type: backtesting ---
2025-10-29 13:08:59.653 | INFO     | evaluate:compare_models:91 - Summary for backtesting (train-val strategy):
2025-10-29 13:08:59.656 | INFO     | evaluate:compare_models:95 - 
                        model_name  accuracy precision    recall  f1_score
                                        mean      mean      mean      mean
0     prompt-gemma-fewshot-3-1b-it  0.606838  0.727688  0.606838  0.626360
1  prompt-gemma-fewshot-v2-3-1b-it  0.621795  0.706001  0.621795  0.613496
2    prompt-gemma-zeroshot-3-1b-it  0.000000  0.000000  0.000000  0.000000
3              

Unnamed: 0,run_type,model_name,fold,accuracy,precision,recall,f1_score
0,backtesting,svm-opt,0,0.705128,0.687651,0.705128,0.688811
1,backtesting,svm-base,0,0.675214,0.669999,0.675214,0.671415
2,backtesting,prompt-gemma-fewshot-v2-3-1b-it,0,0.621795,0.706001,0.621795,0.613496
3,backtesting,prompt-gemma-fewshot-3-1b-it,0,0.606838,0.727688,0.606838,0.62636
4,backtesting,prompt-gemma-zeroshot-3-1b-it,0,0.0,0.0,0.0,0.0


### Model: bert-micro-base

In [12]:
!uv run cli run-backtesting --experiment-id $EXPERIMENT_ID --model-config-name bert-micro-base

2025-10-29 13:09:03.419 | INFO     | mlops.runner:run_backtesting:58 - --- Backtesting Fold 0 ---
2025-10-29 13:09:03.419 | INFO     | mlops.runner:_load_model_from_config:28 - Loading model: RobertaModel from models.roberta.model
2025-10-29 13:09:07.892 | INFO     | mlops.runner:run_backtesting:68 - Training model...
2025-10-29 13:09:07.892 | INFO     | models.roberta.pipeline:run:15 - Preprocessing data for RoBERTa model...
Map: 100%|█████████████████████████| 3504/3504 [00:00<00:00, 8982.98 examples/s]
2025-10-29 13:09:08.699 | INFO     | models.roberta.pipeline:run:41 - Preprocessing validation data...
Map: 100%|███████████████████████████| 701/701 [00:00<00:00, 9777.13 examples/s]
2025-10-29 13:09:08.777 | INFO  

In [13]:
!uv run cli compare-models --experiment-id $EXPERIMENT_ID --output-file artifacts/experiments/$EXPERIMENT_ID/comparison.csv --run-type backtesting
comparison_df = pd.read_csv(f"artifacts/experiments/{os.environ['EXPERIMENT_ID']}/comparison.csv")
display(comparison_df)

2025-10-29 13:10:48.687 | INFO     | evaluate:compare_models:79 - --- Model Comparison for Experiment 'exp_notebook_tutorial' ---
2025-10-29 13:10:48.687 | INFO     | evaluate:compare_models:82 - 
--- Run Type: backtesting ---
2025-10-29 13:10:48.687 | INFO     | evaluate:compare_models:91 - Summary for backtesting (train-val strategy):
2025-10-29 13:10:48.691 | INFO     | evaluate:compare_models:95 - 
                        model_name  accuracy precision    recall  f1_score
                                        mean      mean      mean      mean
0                  bert-micro-base  0.536325  0.287644  0.536325  0.374458
1     prompt-gemma-fewshot-3-1b-it  0.606838  0.727688  0.606838  0.626360
2  prompt-gemma-fewshot-v2-3-1b-it  0.621795  0.706001  0.621795  0.613496
3    prompt-gem

Unnamed: 0,run_type,model_name,fold,accuracy,precision,recall,f1_score
0,backtesting,svm-opt,0,0.705128,0.687651,0.705128,0.688811
1,backtesting,svm-base,0,0.675214,0.669999,0.675214,0.671415
2,backtesting,prompt-gemma-fewshot-v2-3-1b-it,0,0.621795,0.706001,0.621795,0.613496
3,backtesting,prompt-gemma-fewshot-3-1b-it,0,0.606838,0.727688,0.606838,0.62636
4,backtesting,bert-micro-base,0,0.536325,0.287644,0.536325,0.374458
5,backtesting,prompt-gemma-zeroshot-3-1b-it,0,0.0,0.0,0.0,0.0


### Model: bert-micro-long

In [14]:
!uv run cli run-backtesting --experiment-id $EXPERIMENT_ID --model-config-name bert-micro-long

2025-10-29 13:10:52.304 | INFO     | mlops.runner:run_backtesting:58 - --- Backtesting Fold 0 ---
2025-10-29 13:10:52.304 | INFO     | mlops.runner:_load_model_from_config:28 - Loading model: RobertaModel from models.roberta.model
2025-10-29 13:10:55.709 | INFO     | mlops.runner:run_backtesting:68 - Training model...
2025-10-29 13:10:55.709 | INFO     | models.roberta.pipeline:run:15 - Preprocessing data for RoBERTa model...
Map: 100%|█████████████████████████| 3504/3504 [00:00<00:00, 9139.90 examples/s]
2025-10-29 13:10:56.496 | INFO     | models.roberta.pipeline:run:41 - Preprocessing validation data...
Map: 100%|███████████████████████████| 701/701 [00:00<00:00, 9324.31 examples/s]
2025-10-29 13:10:56.578 | INFO  

In [15]:
!uv run cli compare-models --experiment-id $EXPERIMENT_ID --output-file artifacts/experiments/$EXPERIMENT_ID/comparison.csv --run-type backtesting
comparison_df = pd.read_csv(f"artifacts/experiments/{os.environ['EXPERIMENT_ID']}/comparison.csv")
display(comparison_df)

2025-10-29 13:18:36.989 | INFO     | evaluate:compare_models:79 - --- Model Comparison for Experiment 'exp_notebook_tutorial' ---
2025-10-29 13:18:36.990 | INFO     | evaluate:compare_models:82 - 
--- Run Type: backtesting ---
2025-10-29 13:18:36.990 | INFO     | evaluate:compare_models:91 - Summary for backtesting (train-val strategy):
2025-10-29 13:18:36.993 | INFO     | evaluate:compare_models:95 - 
                        model_name  accuracy precision    recall  f1_score
                                        mean      mean      mean      mean
0                  bert-micro-base  0.536325  0.287644  0.536325  0.374458
1                  bert-micro-long  0.643162  0.536105  0.643162  0.576582
2     prompt-gemma-fewshot-3-1b-it  0.606838  0.727688  0.606838  0.626360
3  prompt-gemma

Unnamed: 0,run_type,model_name,fold,accuracy,precision,recall,f1_score
0,backtesting,svm-opt,0,0.705128,0.687651,0.705128,0.688811
1,backtesting,svm-base,0,0.675214,0.669999,0.675214,0.671415
2,backtesting,bert-micro-long,0,0.643162,0.536105,0.643162,0.576582
3,backtesting,prompt-gemma-fewshot-v2-3-1b-it,0,0.621795,0.706001,0.621795,0.613496
4,backtesting,prompt-gemma-fewshot-3-1b-it,0,0.606838,0.727688,0.606838,0.62636
5,backtesting,bert-micro-base,0,0.536325,0.287644,0.536325,0.374458
6,backtesting,prompt-gemma-zeroshot-3-1b-it,0,0.0,0.0,0.0,0.0


### Model: roberta-base

In [16]:
!uv run cli run-backtesting --experiment-id $EXPERIMENT_ID --model-config-name roberta-base

2025-10-29 13:18:40.579 | INFO     | mlops.runner:run_backtesting:58 - --- Backtesting Fold 0 ---
2025-10-29 13:18:40.579 | INFO     | mlops.runner:_load_model_from_config:28 - Loading model: RobertaModel from models.roberta.model
2025-10-29 13:18:44.016 | INFO     | mlops.runner:run_backtesting:68 - Training model...
2025-10-29 13:18:44.017 | INFO     | models.roberta.pipeline:run:15 - Preprocessing data for RoBERTa model...
Map: 100%|████████████████████████| 3504/3504 [00:00<00:00, 11571.20 examples/s]
2025-10-29 13:18:44.983 | INFO     | models.roberta.pipeline:run:41 - Preprocessing validation data...
Map: 100%|██████████████████████████| 701/701 [00:00<00:00, 12395.69 examples/s]
2025-10-29 13:18:45.060 | INFO  

In [17]:
!uv run cli compare-models --experiment-id $EXPERIMENT_ID --output-file artifacts/experiments/$EXPERIMENT_ID/comparison.csv --run-type backtesting
comparison_df = pd.read_csv(f"artifacts/experiments/{os.environ['EXPERIMENT_ID']}/comparison.csv")
display(comparison_df)

2025-10-29 13:44:52.709 | INFO     | evaluate:compare_models:79 - --- Model Comparison for Experiment 'exp_notebook_tutorial' ---
2025-10-29 13:44:52.709 | INFO     | evaluate:compare_models:82 - 
--- Run Type: backtesting ---
2025-10-29 13:44:52.709 | INFO     | evaluate:compare_models:91 - Summary for backtesting (train-val strategy):
2025-10-29 13:44:52.713 | INFO     | evaluate:compare_models:95 - 
                        model_name  accuracy precision    recall  f1_score
                                        mean      mean      mean      mean
0                  bert-micro-base  0.536325  0.287644  0.536325  0.374458
1                  bert-micro-long  0.643162  0.536105  0.643162  0.576582
2     prompt-gemma-fewshot-3-1b-it  0.606838  0.727688  0.606838  0.626360
3  prompt-gemma

Unnamed: 0,run_type,model_name,fold,accuracy,precision,recall,f1_score
0,backtesting,roberta-base,0,0.844017,0.850461,0.844017,0.828156
1,backtesting,svm-opt,0,0.705128,0.687651,0.705128,0.688811
2,backtesting,svm-base,0,0.675214,0.669999,0.675214,0.671415
3,backtesting,bert-micro-long,0,0.643162,0.536105,0.643162,0.576582
4,backtesting,prompt-gemma-fewshot-v2-3-1b-it,0,0.621795,0.706001,0.621795,0.613496
5,backtesting,prompt-gemma-fewshot-3-1b-it,0,0.606838,0.727688,0.606838,0.62636
6,backtesting,bert-micro-base,0,0.536325,0.287644,0.536325,0.374458
7,backtesting,prompt-gemma-zeroshot-3-1b-it,0,0.0,0.0,0.0,0.0


### Experiment: Increasing the Number of Epochs for `roberta-base`

We create a new configuration, `roberta-base-3`, on the fly by changing the number of training epochs, to check whether longer training improves the result.

In [18]:
# Load the base configuration
with open('model_configs/roberta-base.json', 'r') as f:
    config = json.load(f)

# Modify the hyperparameter
config["training_arguments"]["num_train_epochs"] = 3

# Save the new configuration
with open('model_configs/roberta-base-3.json', 'w') as f:
    json.dump(config, f, indent=4)

print("Created new configuration: model_configs/roberta-base-3.json")

Created new configuration: model_configs/roberta-base-3.json
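
If more variants like this are needed, the same load-modify-save pattern can be wrapped in a small helper. The sketch below is one possible shape, not part of the project: `derive_config` is a hypothetical name, and the base values (`num_train_epochs` of 1, the `learning_rate` entry) are invented, since the contents of `roberta-base.json` are not shown here.

```python
import copy
import json
import tempfile
from pathlib import Path

def derive_config(base, overrides):
    """Return a deep copy of `base` with nested `overrides` merged in."""
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            result[key] = derive_config(result[key], value)
        else:
            result[key] = value
    return result

# Hypothetical base config; only the key names "training_arguments" and
# "num_train_epochs" are taken from the notebook cell.
base = {"training_arguments": {"num_train_epochs": 1, "learning_rate": 2e-5}}
variant = derive_config(base, {"training_arguments": {"num_train_epochs": 3}})

# Write to a temporary directory so the sketch does not touch model_configs/.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "roberta-base-3.json"
    out_path.write_text(json.dumps(variant, indent=4), encoding="utf-8")
    reloaded = json.loads(out_path.read_text(encoding="utf-8"))

print(reloaded["training_arguments"])
```

Working on a deep copy keeps the base configuration untouched in memory, so several variants can be derived from it in the same session.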


In [19]:
# Run the experiment for the new configuration
!uv run cli run-backtesting --experiment-id $EXPERIMENT_ID --model-config-name roberta-base-3

2025-10-29 13:44:56.379 | INFO     | mlops.runner:run_backtesting:58 - --- Backtesting Fold 0 ---
2025-10-29 13:44:56.379 | INFO     | mlops.runner:_load_model_from_config:28 - Loading model: RobertaModel from models.roberta.model
2025-10-29 13:45:00.429 | INFO     | mlops.runner:run_backtesting:68 - Training model...
2025-10-29 13:45:00.429 | INFO     | models.roberta.pipeline:run:15 - Preprocessing data for RoBERTa model...
Map: 100%|████████████████████████| 3504/3504 [00:00<00:00, 11285.38 examples/s]
2025-10-29 13:45:01.359 | INFO     | models.roberta.pipeline:run:41 - Preprocessing validation data...
Map: 100%|██████████████████████████| 701/701 [00:00<00:00, 12308.40 examples/s]
2025-10-29 13:45:01.436 | INFO  

In [20]:
# Final comparison
!uv run cli compare-models --experiment-id $EXPERIMENT_ID --output-file artifacts/experiments/$EXPERIMENT_ID/comparison.csv --run-type backtesting
comparison_df = pd.read_csv(f"artifacts/experiments/{os.environ['EXPERIMENT_ID']}/comparison.csv")
display(comparison_df)

2025-10-29 15:01:00.423 | INFO     | evaluate:compare_models:79 - --- Model Comparison for Experiment 'exp_notebook_tutorial' ---
2025-10-29 15:01:00.424 | INFO     | evaluate:compare_models:82 - 
--- Run Type: backtesting ---
2025-10-29 15:01:00.425 | INFO     | evaluate:compare_models:91 - Summary for backtesting (train-val strategy):
2025-10-29 15:01:00.431 | INFO     | evaluate:compare_models:95 - 
                        model_name  accuracy precision    recall  f1_score
                                        mean      mean      mean      mean
0                  bert-micro-base  0.536325  0.287644  0.536325  0.374458
1                  bert-micro-long  0.643162  0.536105  0.643162  0.576582
2     prompt-gemma-fewshot-3-1b-it  0.606838  0.727688  0.606838  0.626360
3  prompt-gemma

Unnamed: 0,run_type,model_name,fold,accuracy,precision,recall,f1_score
0,backtesting,roberta-base-3,0,0.850427,0.859027,0.850427,0.853571
1,backtesting,roberta-base,0,0.844017,0.850461,0.844017,0.828156
2,backtesting,svm-opt,0,0.705128,0.687651,0.705128,0.688811
3,backtesting,svm-base,0,0.675214,0.669999,0.675214,0.671415
4,backtesting,bert-micro-long,0,0.643162,0.536105,0.643162,0.576582
5,backtesting,prompt-gemma-fewshot-v2-3-1b-it,0,0.621795,0.706001,0.621795,0.613496
6,backtesting,prompt-gemma-fewshot-3-1b-it,0,0.606838,0.727688,0.606838,0.62636
7,backtesting,bert-micro-base,0,0.536325,0.287644,0.536325,0.374458
8,backtesting,prompt-gemma-zeroshot-3-1b-it,0,0.0,0.0,0.0,0.0
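
In every row of the table the `accuracy` and `recall` columns are identical while `precision` differs. That pattern is what support-weighted averaging over classes would produce, because weighted recall always reduces to accuracy. The averaging mode is not stated in the notebook, so treat this as an assumption; the sketch below shows the arithmetic on made-up labels.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        p_c = tp / n_pred if n_pred else 0.0
        r_c = tp / n_true
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n_true / total  # each class counts in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1_score": f1}

# Toy labels invented for illustration; they are not the notebook's data.
y_true = ["pos", "pos", "neg", "neu", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neu", "neu", "pos"]
scores = weighted_scores(y_true, y_pred)
print(scores)
```

Under the same assumption, the `bert-micro-base` row appears consistent with a classifier that predicts a single class covering about 53.6% of the validation examples: its precision is close to 0.536 squared, and its recall equals that share.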


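## Section 4: Selecting the Best Model

Based on the results above, we select `roberta-base-3` as the best model for the remaining steps: it has the highest accuracy (0.850) and F1 score (0.854) in the final comparison. Treat this choice as an assumption made for the purposes of this tutorial.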
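
The same choice can be made programmatically rather than by eye. The following sketch uses only the standard library and three rows copied from the final comparison table; the helper name `best_model` and the decision to rank by `f1_score` are assumptions for illustration.

```python
import csv
import io

# Three rows copied from the final comparison table (index column omitted).
comparison_csv = """run_type,model_name,fold,accuracy,precision,recall,f1_score
backtesting,roberta-base-3,0,0.850427,0.859027,0.850427,0.853571
backtesting,roberta-base,0,0.844017,0.850461,0.844017,0.828156
backtesting,svm-opt,0,0.705128,0.687651,0.705128,0.688811
"""

def best_model(csv_text, metric="f1_score"):
    """Return the model_name of the row with the highest value of `metric`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return max(rows, key=lambda row: float(row[metric]))["model_name"]

selected = best_model(comparison_csv)
print(selected)
```

With a single fold per model, as in this experiment, small differences between neighbouring rows should be read with some caution.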

In [21]:
os.environ['BEST_MODEL'] = "roberta-base-3"

## Section 5: Estimating Performance on the Test Set

We evaluate the selected model on the test set held out earlier. This gives us a final, unbiased estimate of its quality.
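
An unbiased estimate still carries sampling noise that depends on the size of the test set. As a rough way to gauge it, the sketch below computes a Wilson score interval for an accuracy value. The counts are hypothetical and not taken from this notebook's test run, and `wilson_interval` is not part of the project's CLI.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

# Hypothetical counts, chosen only to illustrate the calculation.
low, high = wilson_interval(correct=850, total=1000)
print(f"{low:.3f} .. {high:.3f}")
```

The interval narrows as the test set grows, which is one argument for keeping a reasonably large share of the data aside.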

In [22]:
!uv run cli estimate-performance --experiment-id $EXPERIMENT_ID --model-config-name $BEST_MODEL

2025-10-30 10:47:20.878 | INFO     | mlops.runner:_load_model_from_config:28 - Loading model: RobertaModel from models.roberta.model
2025-10-30 10:47:26.833 | INFO     | mlops.runner:run_performance_estimation:105 - Training model for performance estimation...
2025-10-30 10:47:26.833 | INFO     | models.roberta.pipeline:run:15 - Preprocessing data for RoBERTa model...
Map: 100%|█████████████████████████| 4205/4205 [00:00<00:00, 9097.67 examples/s]
2025-10-30 10:47:27.990 | INFO     | models.roberta.pipeline:run:41 - Preprocessing validation data...
Map: 100%|███████████████████████████| 468/468 [00:00<00:00, 8308.01 examples/s]
2025-10-30 10:47:28.072 | INFO     | models.roberta.pipeline:run:48 - Validation data preprocessing complete

## Section 6: Training the Final Model

Now that we are confident in the model's quality, we train it one last time on the full data set (training + validation) so that it is ready for production use.

In [23]:
!uv run cli train-final-model --experiment-id $EXPERIMENT_ID --model-config-name $BEST_MODEL --model-output-dir artifacts/trained_models/

2025-10-30 12:45:24.432 | INFO     | mlops.runner:_load_model_from_config:28 - Loading model: RobertaModel from models.roberta.model
2025-10-30 12:45:28.974 | INFO     | mlops.runner:train_final_model:139 - Training final model...
2025-10-30 12:45:28.974 | INFO     | models.roberta.pipeline:run:15 - Preprocessing data for RoBERTa model...
Map: 100%|████████████████████████| 5257/5257 [00:00<00:00, 12049.57 examples/s]
2025-10-30 12:45:30.076 | INFO     | models.roberta.pipeline:run:41 - Preprocessing validation data...
Map: 100%|██████████████████████████| 585/585 [00:00<00:00, 12181.24 examples/s]
2025-10-30 12:45:30.145 | INFO     | models.roberta.pipeline:run:48 - Validation data preprocessing complete.
2025-10-30 12:45:30

## Section 7: Predicting on New Data

We simulate a production scenario: new data has arrived and we want to know its sentiment. We use our data generator to produce it.
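
The cell that follows locates the newest directory under `artifacts/new_data/` by creation time. A slightly more defensive variant is sketched here: it selects by directory name and raises a clear error when nothing was generated. It assumes that run directories carry sortable timestamp names such as `20251030_1431`; `latest_run_dir` is a hypothetical helper, not a project function.

```python
import tempfile
from pathlib import Path

def latest_run_dir(root):
    """Return the sub-directory of `root` whose name sorts last."""
    candidates = [p for p in Path(root).iterdir() if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no run directories under {root}")
    # Assumption: names are zero-padded timestamps, so string order is time order.
    return max(candidates, key=lambda p: p.name)

# Demonstrate on a throwaway directory instead of artifacts/new_data/.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("20251029_0900", "20251030_1431", "20251030_0815"):
        (Path(tmp) / name).mkdir()
    newest = latest_run_dir(tmp).name

print(newest)
```

Selecting by name avoids relying on file-system creation times, which can change when directories are copied or restored.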

In [24]:
import glob
import os

# Run data generation
!uv run cli data-generate --name new_unseen_data --examples-per-sentiment 5

# Find the latest generated file
list_of_dirs = glob.glob('artifacts/new_data/*')
latest_dir = max(list_of_dirs, key=os.path.getctime)
new_data_file = os.path.join(latest_dir, 'new_unseen_data.csv')
os.environ['NEW_DATA_FILE'] = new_data_file
print(f"New data file: {new_data_file}")

2025-10-30 14:30:08.961 | INFO     | data.generator:__init__:15 - Initializing generator with model: google/gemma-3-1b-it
2025-10-30 14:30:15.765 | INFO     | data.generator:__init__:25 - Model and tokenizer loaded successfully.
2025-10-30 14:30:15.765 | INFO     | data.generator:generate:103 - Generating 5 examples for sentiment: 'positive' in language: en
2025-10-30 14:30:20.888 | INFO     | data.generator:generate:107 - (1/5) Generated for 'positive': 486356789000000000000000000000000000000000000000000000000000
2025-10-30 14:30:25.279 | INFO     | data.generator:generate:107 - (2/5) Generated for 'positive': 1000% profit on a new stock, the potential is immense.
2025-10-30 14:30:29.181 | INFO     | data.ge

We use our final model to run predictions on the new data.

In [25]:
from pathlib import Path

FINAL_MODEL_PATH = f"artifacts/trained_models/{os.environ['EXPERIMENT_ID']}_final/{os.environ['BEST_MODEL']}"
PREDICTION_ID = "tutorial_prediction" # Using a fixed ID for reproducibility

# Construct the output path based on the new structure
new_data_path = Path(os.environ['NEW_DATA_FILE'])
timestamp = new_data_path.parent.name
name = new_data_path.stem
model_id = os.environ['BEST_MODEL']

PREDICTION_OUTPUT_FILE = f"artifacts/predictions/new_data/{timestamp}/{name}/{model_id}/{PREDICTION_ID}/predictions.csv"
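
The path layout built in the cell above can be captured in a small helper. This is only a sketch: `build_prediction_path` is a hypothetical name, not part of the project's CLI, and it assumes the directory convention that the cell constructs (`artifacts/predictions/new_data/<timestamp>/<name>/<model_id>/<prediction_id>/predictions.csv`). The example values are the ones that appear in this notebook's outputs.

```python
from pathlib import Path


def build_prediction_path(new_data_file: str, model_id: str, prediction_id: str) -> str:
    """Mirror the layout built in the cell above (hypothetical helper)."""
    new_data_path = Path(new_data_file)
    timestamp = new_data_path.parent.name  # parent directory name, e.g. "20251030_1431"
    name = new_data_path.stem              # file name without the ".csv" suffix
    out = (
        Path("artifacts/predictions/new_data")
        / timestamp / name / model_id / prediction_id / "predictions.csv"
    )
    return out.as_posix()


example = build_prediction_path(
    "artifacts/new_data/20251030_1431/new_unseen_data.csv",
    "roberta-base-3",
    "tutorial_prediction",
)
print(example)
```

Keeping the layout in one function means the notebook and any later evaluation step derive the same path from the same three inputs.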

In [27]:
# IPython expands {expression} inside shell commands, which passes Python values to the CLI
!uv run cli predict-new --model-path {FINAL_MODEL_PATH} --input-file {os.environ['NEW_DATA_FILE']} --prediction-id {PREDICTION_ID}

2025-10-30 14:32:06.577 | INFO     | mlops.runner:_load_model_from_config:28 - Loading model: RobertaModel from models.roberta.model
2025-10-30 14:32:09.827 | INFO     | models.roberta.pipeline:run:120 - Running RoBERTa predictions...
2025-10-30 14:32:11.730 | INFO     | models.roberta.pipeline:run:152 - Predictions saved to /Users/witjakuczunpriv/projects/AI_MasterClass_CIONET/artifacts/predictions/new_data/20251030_1431/new_unseen_data/roberta-base-3/tutorial_prediction/predictions.csv
2025-10-30 14:32:11.738 | SUCCESS  | mlops.runner:predict_new:172 - Predictions saved to /Users/witjakuczunpriv/projects/AI_MasterClass_CIONET/artifacts/predictions/new_data/20251030_1431/new_unseen_data/roberta-base-3/tutorial_prediction/predictions.csv


In [28]:

# Display the predictions
predictions_df = pd.read_csv(PREDICTION_OUTPUT_FILE)
print("Generated predictions:")
display(predictions_df)

Generated predictions:


| Unnamed: 0 | Sentence | Sentiment | Predicted_Sentiment |
|---|---|---|---|
| 0 | 4863567890000000000000000000000000000000000000... | positive | neutral |
| 1 | 1000% profit on a new stock, the potential is ... | positive | positive |
| 2 | Sentiment: positive | positive | positive |
| 3 | 10% of Apple is currently being traded at a pr... | positive | positive |
| 4 | The stock market experienced a dramatic declin... | positive | negative |
| 5 | 100,000 is the projected growth of the IPO in Q3 | neutral | positive |
| 6 | The market's recent decline resulted in a sign... | neutral | negative |
| 7 | 123456789 is a valid email address | neutral | neutral |
| 8 | The Company's Q3 earnings report showed a sign... | neutral | positive |
| 9 | Sentiment: negative | negative | negative |
9,Sentiment: negative,negative,negative


## Section 8: Simulated Quality Evaluation

In a real scenario we would not have true labels for newly arrived data. To demonstrate the `evaluate.py` script anyway, we simulate an evaluation by treating the sentiment we asked the generator for as the "true label".

In [29]:
os.environ["NEW_DATA_FILE"]

'artifacts/new_data/20251030_1431/new_unseen_data.csv'

In [32]:
!uv run cli evaluate-prediction --predictions-file {PREDICTION_OUTPUT_FILE} --ground-truth-file {os.environ['NEW_DATA_FILE']} --display

2025-10-30 14:35:51.189 | INFO     | mlops.app:evaluate_prediction:86 - Evaluating predictions from 'artifacts/predictions/new_data/20251030_1431/new_unseen_data/roberta-base-3/tutorial_prediction/predictions.csv' against 'artifacts/new_data/20251030_1431/new_unseen_data.csv'.
2025-10-30 14:35:51.193 | INFO     | mlops.app:evaluate_prediction:88 - Evaluation metrics: {'accuracy': 0.42857142857142855, 'precision': 0.45535714285714285, 'recall': 0.42857142857142855, 'f1_score': 0.4188034188034188}
2025-10-30 14:35:51.194 | SUCCESS  | mlops.app:evaluate_prediction:96 - Metrics saved to /Users/witjakuczunpriv/projects/AI_MasterClass_CIONET/artifacts/metrics/20251030_1431/new_unseen_data/roberta-base-3/metrics.csv
   accuracy  precision    recall  f1_score
0  0.428571   0.455357  0.428571  0.418803
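
The four reported metrics can be reproduced by hand. The sketch below assumes support-weighted averaging across classes. That is an assumption about `evaluate.py`, suggested only by the fact that the logged recall equals the logged accuracy, which always holds for weighted recall. The function name `weighted_metrics` is hypothetical and the labels are a toy example, not the notebook's data.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (assumed averaging)."""
    n = len(y_true)
    support = Counter(y_true)    # number of true examples per class
    predicted = Counter(y_pred)  # number of predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = tp[label] / predicted[label] if predicted[label] else 0.0
        r = tp[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(tp.values()) / n,
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


# Toy labels for illustration only
y_true = ["positive", "positive", "neutral", "negative"]
y_pred = ["positive", "neutral", "neutral", "negative"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

With weighted averaging, recall reduces to the number of correct predictions divided by the number of examples, which is why it matches accuracy.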
