# 02 Feature Build (Optimised): Monthly Partitioning + Representative Sample

This notebook produces a **model-ready churn/cease dataset** aligned to the business objective:

> **Prioritise retention resources by calling customers most likely to place a cease in the next 30 days.**

## What makes this notebook optimised for low-resource PCs
- Uses **DuckDB** to query CSV/Parquet directly (no full in-memory pandas loads).
- Builds a **leakage-safe target**: `target_cease_30d` (cease placed within 30 days after snapshot).
- Builds a **representative sample** (stratified by month and churn label) **after** a strong row-reduction step (one snapshot per customer per month + customer sampling).
- Computes features **month-by-month** and exports **partitioned Parquet files** to avoid out-of-memory errors.
- Uses a **lean feature set** (high-signal, low redundancy) suitable for baseline modelling.

## Outputs
- `../outputs/features_parts/model_ready_sample_YYYY_MM.parquet` (monthly partitions)
- Optional view: `model_ready_sample_all` (reads all partitions via wildcard)
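
One way the optional `model_ready_sample_all` view could be defined is sketched below, assuming the partitions keep the `model_ready_sample_YYYY_MM.parquet` naming listed above. The helper name `all_partitions_view_sql` is hypothetical; it only builds the SQL string, which relies on DuckDB's `read_parquet` accepting a glob pattern.

```python
from pathlib import Path


def all_partitions_view_sql(parts_dir: Path, view_name: str = "model_ready_sample_all") -> str:
    """Build a CREATE VIEW statement that reads every monthly partition via a glob."""
    # Forward slashes keep the path usable on Windows as well.
    pattern = (parts_dir / "model_ready_sample_*.parquet").as_posix()
    # Escape single quotes so the path stays a valid SQL string literal.
    pattern = pattern.replace("'", "''")
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS "
        f"SELECT * FROM read_parquet('{pattern}')"
    )


sql = all_partitions_view_sql(Path("../outputs/features_parts"))
print(sql)
```

The resulting string can be passed to `con.execute(sql)` once the partitions have been exported.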


In [1]:
# Core libs
from pathlib import Path
import pandas as pd
import duckdb

pd.set_option("display.max_columns", 200)
pd.set_option("display.width", 200)


In [2]:
# Paths (works if notebook is in repo/notebooks or repo root)
cwd = Path.cwd()
repo_dir = cwd.parent if cwd.name.lower() in {"notebook", "notebooks"} else cwd

data_dir = repo_dir / "data"
outputs_dir = repo_dir / "outputs"
features_parts_dir = outputs_dir / "features_parts"
temp_dir = repo_dir / "temp_duckdb"

features_parts_dir.mkdir(parents=True, exist_ok=True)
temp_dir.mkdir(parents=True, exist_ok=True)

# Default project data paths (edit if names differ)
cease_path_default = data_dir / "cease.csv"
calls_path_default = data_dir / "calls.csv"
customer_path_default = data_dir / "customer_info.parquet"
usage_path_default = data_dir / "usage.parquet"

# Fallback to uploaded samples (if running in a hosted notebook)
uploaded = Path("/mnt/data")
cease_path_sample = uploaded / "duck_cease_100_from_csv.csv"
calls_path_sample = uploaded / "duck_calls_100_from_csv.csv"
customer_path_sample = uploaded / "duck_customer_info_100.parquet"
usage_path_sample = uploaded / "duck_usage_100.parquet"

cease_path = cease_path_default if cease_path_default.exists() else cease_path_sample
calls_path = calls_path_default if calls_path_default.exists() else calls_path_sample
customer_path = customer_path_default if customer_path_default.exists() else customer_path_sample
usage_path = usage_path_default if usage_path_default.exists() else usage_path_sample

print("Using files:")
for p in [cease_path, calls_path, customer_path, usage_path]:
    print(" -", p)
print("\nOutputs:", features_parts_dir)
print("Temp dir:", temp_dir)


Using files:
 - c:\Users\Admin\OneDrive - University of West London\Desktop\AA\TECH_REYAL_project\Talk_talk\Churn_retention_taltalk\data\cease.csv
 - c:\Users\Admin\OneDrive - University of West London\Desktop\AA\TECH_REYAL_project\Talk_talk\Churn_retention_taltalk\data\calls.csv
 - c:\Users\Admin\OneDrive - University of West London\Desktop\AA\TECH_REYAL_project\Talk_talk\Churn_retention_taltalk\data\customer_info.parquet
 - c:\Users\Admin\OneDrive - University of West London\Desktop\AA\TECH_REYAL_project\Talk_talk\Churn_retention_taltalk\data\usage.parquet

Outputs: c:\Users\Admin\OneDrive - University of West London\Desktop\AA\TECH_REYAL_project\Talk_talk\Churn_retention_taltalk\outputs\features_parts
Temp dir: c:\Users\Admin\OneDrive - University of West London\Desktop\AA\TECH_REYAL_project\Talk_talk\Churn_retention_taltalk\temp_duckdb
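
The default-then-fallback choice in the cell above repeats the same conditional four times. A minimal sketch of the same logic as a helper follows; `first_existing` is a hypothetical name, and the temporary files exist only to make the example runnable.

```python
from pathlib import Path
import tempfile


def first_existing(*candidates: Path) -> Path:
    """Return the first candidate that exists; fall back to the last one otherwise."""
    for path in candidates:
        if path.exists():
            return path
    # Same behaviour as the cell above: the fallback path is returned even if it is
    # missing, so a later read fails with a clear "file not found" error.
    return candidates[-1]


with tempfile.TemporaryDirectory() as tmp:
    default = Path(tmp) / "data" / "cease.csv"          # does not exist
    sample = Path(tmp) / "duck_cease_100_from_csv.csv"  # created on the next line
    sample.write_text("unique_customer_identifier,cease_placed_date\n")
    chosen_name = first_existing(default, sample).name
    print(chosen_name)
```

With this helper, each of the four assignments becomes a single call such as `first_existing(cease_path_default, cease_path_sample)`.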


In [12]:
# DuckDB connection + low-memory settings
db_path = repo_dir / "notebooks" / "uk_telecom_new_2.duckdb"
db_path.parent.mkdir(parents=True, exist_ok=True)

con = duckdb.connect(str(db_path))

# IMPORTANT: low-memory tuning (adjust memory_limit to your PC RAM)
con.execute("SET threads=2")
con.execute("SET preserve_insertion_order=false")
con.execute("SET memory_limit='11GB'")
con.execute(f"PRAGMA temp_directory='{temp_dir.as_posix()}'")

print("DuckDB DB:", db_path)


DuckDB DB: c:\Users\Admin\OneDrive - University of West London\Desktop\AA\TECH_REYAL_project\Talk_talk\Churn_retention_taltalk\notebooks\uk_telecom_new_2.duckdb


## 1) Register raw files as views (no full loads)

In [19]:
con.execute(f"CREATE OR REPLACE VIEW customer_raw AS SELECT * FROM read_parquet('{customer_path.as_posix()}')")
con.execute(f"CREATE OR REPLACE VIEW calls_raw    AS SELECT * FROM read_csv_auto('{calls_path.as_posix()}')")
con.execute(f"CREATE OR REPLACE VIEW cease_raw    AS SELECT * FROM read_csv_auto('{cease_path.as_posix()}')")
con.execute(f"CREATE OR REPLACE VIEW usage_raw    AS SELECT * FROM read_parquet('{usage_path.as_posix()}')")

display(con.execute("DESCRIBE customer_raw").df().head(30))
display(con.execute("DESCRIBE calls_raw").df().head(30))
display(con.execute("DESCRIBE cease_raw").df().head(30))
display(con.execute("DESCRIBE usage_raw").df().head(30))


Unnamed: 0,column_name,column_type,null,key,default,extra
0,unique_customer_identifier,VARCHAR,YES,,,
1,datevalue,DATE,YES,,,
2,contract_status,VARCHAR,YES,,,
3,contract_dd_cancels,BIGINT,YES,,,
4,dd_cancel_60_day,INTEGER,YES,,,
5,ooc_days,INTEGER,YES,,,
6,technology,VARCHAR,YES,,,
7,speed,INTEGER,YES,,,
8,line_speed,DOUBLE,YES,,,
9,sales_channel,VARCHAR,YES,,,


Unnamed: 0,column_name,column_type,null,key,default,extra
0,unique_customer_identifier,VARCHAR,YES,,,
1,event_date,DATE,YES,,,
2,call_type,VARCHAR,YES,,,
3,talk_time_seconds,DOUBLE,YES,,,
4,hold_time_seconds,DOUBLE,YES,,,


Unnamed: 0,column_name,column_type,null,key,default,extra
0,unique_customer_identifier,VARCHAR,YES,,,
1,cease_placed_date,DATE,YES,,,
2,cease_completed_date,VARCHAR,YES,,,
3,reason_description,VARCHAR,YES,,,
4,reason_description_insight,VARCHAR,YES,,,


Unnamed: 0,column_name,column_type,null,key,default,extra
0,unique_customer_identifier,VARCHAR,YES,,,
1,calendar_date,DATE,YES,,,
2,usage_download_mbs,VARCHAR,YES,,,
3,usage_upload_mbs,VARCHAR,YES,,,


## 2) Standardise core tables (types + minimal columns)

In [20]:
# --- Tune for low-memory PCs (adjust as needed) ---
# Reuse the connection opened earlier: the *_raw views from section 1 live in that
# database, so connecting to a new database file here would leave them undefined.

# IMPORTANT: low-memory tuning (adjust memory_limit to your PC RAM)
con.execute("SET threads=1")
con.execute("SET preserve_insertion_order=false")
con.execute("SET memory_limit='11GB'")
con.execute(f"PRAGMA temp_directory='{temp_dir.as_posix()}'")



# ============================================================
# 2) Standardise core tables (types + minimal columns)
#    + BIG ROW REDUCTION (monthly snapshot + customer sample)
# ============================================================

# Customer snapshots (keep ONLY high-signal columns for modelling)
con.execute("""
CREATE OR REPLACE VIEW customer_info_std AS
SELECT
    unique_customer_identifier,
    CAST(datevalue AS DATE) AS snapshot_date,
    CAST(contract_status AS VARCHAR) AS contract_status,
    TRY_CAST(ooc_days AS DOUBLE) AS ooc_days,
    TRY_CAST(dd_cancel_60_day AS DOUBLE) AS dd_cancel_60_day,
    TRY_CAST(contract_dd_cancels AS DOUBLE) AS contract_dd_cancels,
    CAST(technology AS VARCHAR) AS technology,
    CAST(crm_package_name AS VARCHAR) AS crm_package_name,
    CAST(sales_channel AS VARCHAR) AS sales_channel,
    TRY_CAST(tenure_days AS DOUBLE) AS tenure_days
FROM customer_raw
WHERE unique_customer_identifier IS NOT NULL
  AND datevalue IS NOT NULL
""")

# Keep ONLY the latest snapshot per customer per month (removes repeated snapshots)
con.execute("""
CREATE OR REPLACE VIEW customer_info_monthly AS
SELECT *
FROM (
    SELECT
        *,
        date_trunc('month', snapshot_date) AS snapshot_month,
        ROW_NUMBER() OVER (
            PARTITION BY unique_customer_identifier, date_trunc('month', snapshot_date)
            ORDER BY snapshot_date DESC
        ) AS rn
    FROM customer_info_std
)
WHERE rn = 1
""")

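# Sample customers (further row reduction)
# NOTE: 20k is intentionally small to reduce compute/memory. Increase gradually if your PC can handle it.
SAMPLE_CUSTOMERS = 20000

# Deduplicate customer IDs first, then sample. USING SAMPLE is applied to the FROM
# source before DISTINCT, so sampling the monthly rows directly would return fewer
# than SAMPLE_CUSTOMERS customers and favour customers with more snapshot months.
con.execute(f"""
CREATE OR REPLACE TABLE sample_customers AS
SELECT unique_customer_identifier
FROM (
    SELECT DISTINCT unique_customer_identifier
    FROM customer_info_monthly
) AS ids
USING SAMPLE {SAMPLE_CUSTOMERS} ROWS
""")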

# Final deduped snapshots limited to sampled customers
con.execute("""
CREATE OR REPLACE VIEW customer_info_dedup AS
SELECT c.*
FROM customer_info_monthly c
JOIN sample_customers sc
  ON c.unique_customer_identifier = sc.unique_customer_identifier
""")

# Cease placements (minimal columns) + restrict to sampled customers
con.execute("""
CREATE OR REPLACE VIEW cease_std AS
SELECT
    unique_customer_identifier,
    CAST(cease_placed_date AS DATE) AS cease_placed_date
FROM cease_raw
WHERE unique_customer_identifier IS NOT NULL
  AND cease_placed_date IS NOT NULL
""")

con.execute("""
CREATE OR REPLACE VIEW cease_sample AS
SELECT z.*
FROM cease_std z
JOIN sample_customers sc
  ON z.unique_customer_identifier = sc.unique_customer_identifier
""")

# Calls (minimal columns) + restrict to sampled customers
con.execute("""
CREATE OR REPLACE VIEW calls_std AS
SELECT
    unique_customer_identifier,
    CAST(event_date AS DATE) AS event_date,
    CAST(call_type AS VARCHAR) AS call_type
FROM calls_raw
WHERE unique_customer_identifier IS NOT NULL
  AND event_date IS NOT NULL
""")

con.execute("""
CREATE OR REPLACE VIEW calls_dedup AS
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY unique_customer_identifier, event_date, coalesce(call_type,'')
           ORDER BY event_date DESC
         ) AS rn
  FROM calls_std
)
WHERE rn = 1
""")

con.execute("""
CREATE OR REPLACE VIEW calls_sample AS
SELECT c.*
FROM calls_dedup c
JOIN sample_customers sc
  ON c.unique_customer_identifier = sc.unique_customer_identifier
""")

# Usage (minimal columns) + restrict to sampled customers
con.execute("""
CREATE OR REPLACE VIEW usage_std AS
SELECT
    unique_customer_identifier,
    CAST(calendar_date AS DATE) AS usage_date,
    TRY_CAST(usage_download_mbs AS DOUBLE) AS usage_download_mbs,
    TRY_CAST(usage_upload_mbs AS DOUBLE) AS usage_upload_mbs
FROM usage_raw
WHERE unique_customer_identifier IS NOT NULL
  AND calendar_date IS NOT NULL
""")

con.execute("""
CREATE OR REPLACE VIEW usage_dedup AS
SELECT *
FROM (
  SELECT
      *,
      ROW_NUMBER() OVER (
          PARTITION BY unique_customer_identifier, usage_date
          ORDER BY usage_date DESC
      ) AS rn
  FROM usage_std
)
WHERE rn = 1
""")

con.execute("""
CREATE OR REPLACE VIEW usage_sample AS
SELECT u.*
FROM usage_dedup u
JOIN sample_customers sc
  ON u.unique_customer_identifier = sc.unique_customer_identifier
""")

print("Counts (reduced dataset):")
for t in ["customer_info_dedup","cease_sample","calls_sample","usage_sample"]:
    print(t, con.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0])


Counts (reduced dataset):


customer_info_dedup 411740
cease_sample 10302
calls_sample 43252


usage_sample 9838742
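
The `customer_info_monthly` view keeps the latest snapshot per customer per calendar month. The same rule is sketched below in plain Python on a toy input, to make the effect of the `ROW_NUMBER() ... ORDER BY snapshot_date DESC` step easier to check. The function name and the customer IDs are illustrative assumptions, not part of the project data.

```python
from datetime import date


def latest_snapshot_per_month(rows):
    """Keep the latest snapshot per (customer, calendar month).

    rows: iterable of (customer_id, snapshot_date) tuples.
    """
    latest = {}
    for customer_id, snapshot_date in rows:
        key = (customer_id, snapshot_date.year, snapshot_date.month)
        # Replace the stored date only when a later snapshot in the same month appears.
        if key not in latest or snapshot_date > latest[key]:
            latest[key] = snapshot_date
    return sorted((cid, d) for (cid, _, _), d in latest.items())


rows = [
    ("A", date(2023, 1, 5)),
    ("A", date(2023, 1, 28)),  # later snapshot in the same month wins
    ("A", date(2023, 2, 1)),
    ("B", date(2023, 1, 31)),
]
deduped = latest_snapshot_per_month(rows)
print(deduped)
```

Customer `A` keeps one row for January (the 28th) and one for February, which is the reduction the monthly view applies before customer sampling.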


## 3) Leakage-safe target per snapshot (`target_cease_30d`)

In [21]:
con.execute("""
CREATE OR REPLACE VIEW snapshot_target AS
SELECT
  c.*,
  CASE WHEN EXISTS (
    SELECT 1
    FROM cease_sample z
    WHERE z.unique_customer_identifier = c.unique_customer_identifier
      AND z.cease_placed_date > c.snapshot_date
      AND z.cease_placed_date <= c.snapshot_date + INTERVAL 30 DAY
  ) THEN 1 ELSE 0 END AS target_cease_30d
FROM customer_info_dedup c
""")

display(con.execute("""
SELECT
  target_cease_30d,
  COUNT(*) AS n,
  -- share of all snapshots in each class; the row for target_cease_30d = 1 is the churn rate
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS share_pct
FROM snapshot_target
GROUP BY target_cease_30d
ORDER BY target_cease_30d
""").df())


Unnamed: 0,target_cease_30d,n,share_pct
0,0,401974,97.63
1,1,9766,2.37
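
The label window in `snapshot_target` is open at the snapshot date and closed 30 days later, i.e. `(snapshot_date, snapshot_date + 30 days]`. A small sketch of the same boundary logic in plain Python follows, which may help when checking edge cases. The function mirrors the SQL condition but is an illustrative stand-in, not the code the notebook runs.

```python
from datetime import date, timedelta


def target_cease_30d(snapshot_date, cease_dates, horizon_days=30):
    """Return 1 if any cease falls in (snapshot_date, snapshot_date + horizon_days]."""
    window_end = snapshot_date + timedelta(days=horizon_days)
    return int(any(snapshot_date < d <= window_end for d in cease_dates))


snap = date(2023, 3, 1)
labels = {
    "same_day": target_cease_30d(snap, [date(2023, 3, 1)]),  # not strictly after the snapshot
    "day_1": target_cease_30d(snap, [date(2023, 3, 2)]),
    "day_30": target_cease_30d(snap, [date(2023, 3, 31)]),   # inclusive upper bound
    "day_31": target_cease_30d(snap, [date(2023, 4, 1)]),    # just outside the window
    "no_cease": target_cease_30d(snap, []),
}
print(labels)
```

A cease placed on the snapshot date itself is labelled 0, so the target only uses events that occur after the features are observed.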


## 4) Representative sample (monthly + churn-aware)

In [22]:
# Sampling parameters (edit)
N_PER_MONTH_POS = 100   # churners per month
N_PER_MONTH_NEG = 400   # non-churners per month

con.execute(f"""
CREATE OR REPLACE TABLE snapshot_sample AS
WITH base AS (
  SELECT
    unique_customer_identifier,
    snapshot_date,
    target_cease_30d,
    date_trunc('month', snapshot_date) AS snapshot_month
  FROM snapshot_target
),
ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY snapshot_month, target_cease_30d
           ORDER BY random()
         ) AS rn
  FROM base
)
SELECT unique_customer_identifier, snapshot_date, target_cease_30d, snapshot_month
FROM ranked
WHERE (target_cease_30d = 1 AND rn <= {N_PER_MONTH_POS})
   OR (target_cease_30d = 0 AND rn <= {N_PER_MONTH_NEG})
""")

sample_dist = con.execute("""
SELECT snapshot_month, target_cease_30d, COUNT(*) AS n
FROM snapshot_sample
GROUP BY 1,2
ORDER BY 1,2
""").df()
display(sample_dist)

print("Total sample rows:", con.execute("SELECT COUNT(*) FROM snapshot_sample").fetchone()[0])
print("Sample churn rate:", round(con.execute("SELECT AVG(target_cease_30d) FROM snapshot_sample").fetchone()[0]*100,2), "%")


Unnamed: 0,snapshot_month,target_cease_30d,n
0,2022-08-01,0,400
1,2022-08-01,1,100
2,2022-09-01,0,400
3,2022-09-01,1,100
4,2022-10-01,0,400
5,2022-10-01,1,100
6,2022-11-01,0,400
7,2022-11-01,1,100
8,2022-12-01,0,400
9,2022-12-01,1,100


Total sample rows: 12900
Sample churn rate: 19.38 %
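
The per-month quotas give a sample churn rate of 19.38 %, well above the rate in `snapshot_target` (9,766 of 411,740 snapshots). If a downstream model needs scores on the population scale, one common option is to weight each class by its population share divided by its sample share. The sketch below assumes that approach; `class_weights` is a hypothetical helper, and the sample class counts (10,400 and 2,500) are inferred from the printed total and churn rate rather than read directly from the output.

```python
def class_weights(pop_counts, sample_counts):
    """Weight per class = population share / sample share."""
    pop_total = sum(pop_counts.values())
    sample_total = sum(sample_counts.values())
    return {
        label: (pop_counts[label] / pop_total) / (sample_counts[label] / sample_total)
        for label in pop_counts
    }


pop_counts = {0: 401974, 1: 9766}    # from the snapshot_target class counts
sample_counts = {0: 10400, 1: 2500}  # assumption: inferred from 12900 rows at 19.38 %

weights = class_weights(pop_counts, sample_counts)

# Check: the weighted churn share of the sample matches the population share.
weighted_pos = sample_counts[1] * weights[1]
weighted_all = sum(sample_counts[k] * weights[k] for k in sample_counts)
weighted_rate = weighted_pos / weighted_all
print(weights, round(100 * weighted_rate, 2))
```

These weights could be passed as per-row sample weights to a model that supports them. Whether that is preferable to recalibrating predictions afterwards depends on the modelling step.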


## 5) Monthly feature build + export (partitioned Parquet)

In [23]:
months = con.execute("""
SELECT DISTINCT snapshot_month
FROM snapshot_sample
ORDER BY snapshot_month
""").df()

months_list = [pd.to_datetime(x).date() for x in months["snapshot_month"].tolist()]
print("Months in sample:", len(months_list))
print("First months:", months_list[:10])


Months in sample: 26
First months: [datetime.date(2022, 8, 1), datetime.date(2022, 9, 1), datetime.date(2022, 10, 1), datetime.date(2022, 11, 1), datetime.date(2022, 12, 1), datetime.date(2023, 1, 1), datetime.date(2023, 2, 1), datetime.date(2023, 3, 1), datetime.date(2023, 4, 1), datetime.date(2023, 5, 1)]


In [24]:
for m in months_list:
    out_file = features_parts_dir / f"model_ready_sample_{m.strftime('%Y_%m')}.parquet"
    print("Exporting:", out_file.name)

    con.execute(f"""
    COPY (
      WITH s AS (
        SELECT unique_customer_identifier, snapshot_date, target_cease_30d
        FROM snapshot_sample
        WHERE snapshot_month = DATE '{m}'
      ),
      cust AS (
        SELECT
          st.unique_customer_identifier,
          st.snapshot_date,
          st.target_cease_30d,
          st.contract_status,
          st.technology,
          st.crm_package_name,
          st.sales_channel,
          st.ooc_days,
          st.tenure_days,
          st.dd_cancel_60_day,
          st.contract_dd_cancels,
          CASE WHEN st.ooc_days > 0 THEN 1 ELSE 0 END AS is_out_of_contract,
          CASE WHEN st.ooc_days BETWEEN -30 AND 0 THEN 1 ELSE 0 END AS is_near_contract_end
        FROM snapshot_target st
        JOIN s
          ON st.unique_customer_identifier = s.unique_customer_identifier
         AND st.snapshot_date = s.snapshot_date
      ),
      call_agg AS (
        SELECT
          s.unique_customer_identifier,
          s.snapshot_date,
          COUNT(*) FILTER (
            WHERE c.event_date > s.snapshot_date - INTERVAL 30 DAY
              AND c.event_date <= s.snapshot_date
          ) AS calls_30d,
          COUNT(*) FILTER (
            WHERE c.event_date > s.snapshot_date - INTERVAL 30 DAY
              AND c.event_date <= s.snapshot_date
              AND lower(coalesce(c.call_type,'')) LIKE '%loyalty%'
          ) AS loyalty_calls_30d,
          DATE_DIFF('day',
            MAX(c.event_date) FILTER (WHERE c.event_date <= s.snapshot_date),
            s.snapshot_date
          ) AS days_since_last_call
        FROM s
        LEFT JOIN calls_sample c
          ON s.unique_customer_identifier = c.unique_customer_identifier
         AND c.event_date <= s.snapshot_date
         AND c.event_date > s.snapshot_date - INTERVAL 30 DAY
        GROUP BY 1,2
      ),
      usage_agg AS (
        SELECT
          s.unique_customer_identifier,
          s.snapshot_date,
          SUM(coalesce(u.usage_download_mbs,0)+coalesce(u.usage_upload_mbs,0)) FILTER (
            WHERE u.usage_date > s.snapshot_date - INTERVAL 30 DAY
              AND u.usage_date <= s.snapshot_date
          ) AS usage_30d_total_mb,
          SUM(coalesce(u.usage_download_mbs,0)+coalesce(u.usage_upload_mbs,0)) FILTER (
            WHERE u.usage_date > s.snapshot_date - INTERVAL 60 DAY
              AND u.usage_date <= s.snapshot_date - INTERVAL 30 DAY
          ) AS usage_prev_30d_total_mb,
          DATE_DIFF('day',
            MAX(u.usage_date) FILTER (WHERE u.usage_date <= s.snapshot_date),
            s.snapshot_date
          ) AS days_since_last_usage
        FROM s
        LEFT JOIN usage_sample u
          ON s.unique_customer_identifier = u.unique_customer_identifier
         AND u.usage_date <= s.snapshot_date
         AND u.usage_date > s.snapshot_date - INTERVAL 60 DAY
        GROUP BY 1,2
      )
      SELECT
        cust.unique_customer_identifier,
        cust.snapshot_date,
        cust.target_cease_30d,
        cust.contract_status,
        cust.technology,
        cust.crm_package_name,
        cust.sales_channel,
        cust.ooc_days,
        cust.tenure_days,
        cust.dd_cancel_60_day,
        cust.contract_dd_cancels,
        cust.is_out_of_contract,
        cust.is_near_contract_end,

        coalesce(call_agg.calls_30d,0) AS calls_30d,
        coalesce(call_agg.loyalty_calls_30d,0) AS loyalty_calls_30d,
        call_agg.days_since_last_call,

        coalesce(usage_agg.usage_30d_total_mb,0) AS usage_30d_total_mb,
        coalesce(usage_agg.usage_prev_30d_total_mb,0) AS usage_prev_30d_total_mb,
        CASE
          WHEN usage_agg.usage_prev_30d_total_mb IS NULL OR usage_agg.usage_prev_30d_total_mb = 0 THEN NULL
          -- coalesce so that no usage in the last 30 days yields -1 (a full drop) instead of NULL
          ELSE (coalesce(usage_agg.usage_30d_total_mb, 0) - usage_agg.usage_prev_30d_total_mb) / usage_agg.usage_prev_30d_total_mb
        END AS usage_change_pct_30d,
        usage_agg.days_since_last_usage

      FROM cust
      LEFT JOIN call_agg
        ON cust.unique_customer_identifier = call_agg.unique_customer_identifier
       AND cust.snapshot_date = call_agg.snapshot_date
      LEFT JOIN usage_agg
        ON cust.unique_customer_identifier = usage_agg.unique_customer_identifier
       AND cust.snapshot_date = usage_agg.snapshot_date
    )
    TO '{out_file.as_posix()}'
    (FORMAT PARQUET)
    """)

Exporting: model_ready_sample_2022_08.parquet
Exporting: model_ready_sample_2022_09.parquet
Exporting: model_ready_sample_2022_10.parquet
Exporting: model_ready_sample_2022_11.parquet
Exporting: model_ready_sample_2022_12.parquet
Exporting: model_ready_sample_2023_01.parquet
Exporting: model_ready_sample_2023_02.parquet
Exporting: model_ready_sample_2023_03.parquet
Exporting: model_ready_sample_2023_04.parquet
Exporting: model_ready_sample_2023_05.parquet
Exporting: model_ready_sample_2023_06.parquet
Exporting: model_ready_sample_2023_07.parquet
Exporting: model_ready_sample_2023_08.parquet
Exporting: model_ready_sample_2023_09.parquet
Exporting: model_ready_sample_2023_10.parquet
Exporting: model_ready_sample_2023_11.parquet
Exporting: model_ready_sample_2023_12.parquet
Exporting: model_ready_sample_2024_01.parquet
Exporting: model_ready_sample_2024_02.parquet
Exporting: model_ready_sample_2024_03.parquet
Exporting: model_ready_sample_2024_04.parquet
Exporting: model_ready_sample_2024_05.parquet
Exporting: model_ready_sample_2024_06.parquet
Exporting: model_ready_sample_2024_07.parquet
Exporting: model_ready_sample_2024_08.parquet
Exporting: model_ready_sample_2024_09.parquet
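
The usage features in the export query compare two trailing windows: `(snapshot - 30d, snapshot]` and `(snapshot - 60d, snapshot - 30d]`. The sketch below reproduces that window arithmetic in plain Python for a single customer, as a way to sanity-check boundary dates. It assumes a window with no rows counts as zero usage, and the function name `usage_features` is hypothetical.

```python
from datetime import date, timedelta


def usage_features(snapshot_date, usage_rows):
    """Aggregate one customer's usage into the two trailing 30-day windows.

    usage_rows: iterable of (usage_date, total_mb) pairs.
    """
    usage_rows = list(usage_rows)
    cut_30 = snapshot_date - timedelta(days=30)
    cut_60 = snapshot_date - timedelta(days=60)
    # Current window: (snapshot - 30d, snapshot]; previous: (snapshot - 60d, snapshot - 30d].
    current = sum(mb for d, mb in usage_rows if cut_30 < d <= snapshot_date)
    previous = sum(mb for d, mb in usage_rows if cut_60 < d <= cut_30)
    # The relative change is undefined when the previous window has no usage.
    change = None if previous == 0 else (current - previous) / previous
    return {
        "usage_30d_total_mb": current,
        "usage_prev_30d_total_mb": previous,
        "usage_change_pct_30d": change,
    }


rows = [
    (date(2023, 3, 31), 100.0),  # snapshot day: current window
    (date(2023, 3, 2), 50.0),    # current window
    (date(2023, 3, 1), 200.0),   # exactly snapshot - 30d: previous window
    (date(2023, 1, 31), 100.0),  # previous window
    (date(2023, 1, 30), 999.0),  # exactly snapshot - 60d: excluded
    (date(2023, 4, 1), 999.0),   # after the snapshot: excluded, so no leakage
]
features = usage_features(date(2023, 3, 31), rows)
print(features)
```

Despite the `pct` in its name, `usage_change_pct_30d` is a fraction: a value of -0.5 means usage halved relative to the previous window.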