This code uses best models from the supplmenentary file and adds on tree based approaches to produce a demo of PulseLab.

> **Why separate notebooks?**
For easy reproducability and less computation if deployment needs to be done.

> **What has been used for deployment?**
Streamlit as the frontend, python running models in the backend, deployed on localhost and exposed on a tunnel on port 8501 through ngrok or localtunnel (we ran out of tunnels in 3 accounts for ngrok, any inconsistency in key naming can be attributed to this - but we can ensure that the key naming in colab secrets and referencing has been consistent)

In [31]:
%%writefile requirements.txt
numpy pandas matplotlib seaborn scikit-learn scipy statsmodels pmdarima xgboost lightgbm catboost tensorflow>=2.11 streamlit streamlit-lottie plotly altair requests fpdf pillow kaleido optuna pyngrok tqdm

Writing requirements.txt


In [1]:
# We installed and tested the required packages for the app and training pipeline.
# These were installed so the notebook can run in Colab and support Streamlit, ML libraries, Optuna, and ngrok.
!pip install --upgrade pip
# Core libs
!pip install numpy pandas matplotlib seaborn scikit-learn scipy statsmodels pmdarima
# Boosters & ML
!pip install xgboost lightgbm catboost
# Deep learning
!pip install 'tensorflow>=2.11'
# Streamlit & UI extras
!pip install streamlit streamlit-lottie plotly altair requests fpdf pillow kaleido
# Optimization libraries
!pip install optuna
# ngrok helper
!pip install pyngrok
# tqdm for progress bars
!pip install tqdm

print('Installed packages. This step was done and tested in our environment.')

Collecting pip
  Downloading pip-25.2-py3-none-any.whl.metadata (4.7 kB)
Downloading pip-25.2-py3-none-any.whl (1.8 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m14.4 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.1.2
    Uninstalling pip-24.1.2:
      Successfully uninstalled pip-24.1.2
Successfully installed pip-25.2
Collecting pmdarima
  Downloading pmdarima-2.0.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.metadata (7.8 kB)
Downloading pmdarima-2.0.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (2.3 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.3/2.3 MB[0m [31m32.0 MB/s[0m  [33m0:00:00[0m
[?25hInstalling collected packages: pmdarima
Successfully installed pmdarima-2.0.4
Collecting catboost
  Downloading catboost-1.2.8-cp312-cp312-manylinux2014_x86_64.wh

In [2]:
# We validated the dataset presence and fetched Colab secrets (NewsAPI, ngrok auth) via google.colab.userdata.
# This was done so the Streamlit app can read the NewsAPI key and ngrok auth token without embedding secrets in code.
FILENAME = 'BTC-USD_2022-06-30_to_2025-09-30.csv' # for easy referencing

import os, pandas as pd
print('Looking for dataset file:', FILENAME)
print('Exists:', os.path.exists(FILENAME))
if os.path.exists(FILENAME):
    df = pd.read_csv(FILENAME)
    print('Rows, cols:', df.shape)
    display(df.head(3))
else:
    print('Dataset not found. Upload to Colab or change FILENAME variable.')

# Fetch secrets from Colab userdata if available
try:
    from google.colab import userdata
    news_api_secret = userdata.get('news_api')
    ngrok_auth_secret = userdata.get('ngrok_auth')
    if news_api_secret:
        os.environ['news_api'] = news_api_secret
        print('NewsAPI key loaded into environment from Colab secrets: news_api')
    else:
        print('No NewsAPI secret found in Colab userdata under name "news_api".')
    if ngrok_auth_secret:
        os.environ['ngrok_auth'] = ngrok_auth_secret
        print('ngrok auth token loaded into environment from Colab secrets: ngrok_auth')
    else:
        print('No ngrok auth token found in Colab userdata under name "ngrok_auth".')
except Exception as e:
    print('Could not fetch Colab userdata secrets:', e)

Looking for dataset file: BTC-USD_2022-06-30_to_2025-09-30.csv
Exists: True
Rows, cols: (1189, 6)


Unnamed: 0,Date,Open,High,Low,Close,Volume
0,22-06-30,20108.3125,20141.160156,18729.65625,19784.726562,26267239923
1,22-07-01,19820.470703,20632.671875,19073.708984,19269.367188,30767551159
2,22-07-02,19274.835938,19371.748047,19027.082031,19242.255859,18100418740


NewsAPI key loaded into environment from Colab secrets: news_api
ngrok auth token loaded into environment from Colab secrets: ngrok_auth


In [27]:
%%writefile app.py
# PulseLab — Finalized app.py
# Notes:
# - We pre-run a quick set of models (no heavy optimization) to populate the leaderboard quickly. But full models can be run in the Model Arena.
#   This was done because the models took a lot of time to run at once which increased wait time for the webapp.
# - The NewsAPI key is read silently from Colab secrets if present: google.colab.userdata.get('news_api'), then from NEWS_API env var.
# - PDF export includes plots and a short model summary & parameters.

import os
import time
import tempfile
import warnings # hides warnings in streamlit app only, not here
import json
import requests
import numpy as np
import pandas as pd
import streamlit as st
import plotly.express as px
import plotly.graph_objects as go
from fpdf import FPDF # for export thing
from io import BytesIO

warnings.filterwarnings("ignore")

# --- ML libraries (try/except to allow partial installs) ---
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.preprocessing import StandardScaler

try:
    from catboost import CatBoostRegressor
    _CATBOOST = True
except Exception:
    _CATBOOST = False
try:
    import xgboost as xgb
    _XGB = True
except Exception:
    _XGB = False
try:
    import lightgbm as lgb
    _LGB = True
except Exception:
    _LGB = False
try:
    import optuna
    _OPTUNA = True
except Exception:
    _OPTUNA = False
try:
    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, LSTM, SimpleRNN, Dropout
    from tensorflow.keras.callbacks import EarlyStopping
    _TF = True
except Exception:
    _TF = False
try:
    from statsmodels.tsa.holtwinters import ExponentialSmoothing
    _STATS = True
except Exception:
    _STATS = False

# Lottie support
try:
    from streamlit_lottie import st_lottie
    _LOTTIE = True
except Exception:
    _LOTTIE = False

# -------------------------
# Globals & constants
# -------------------------
SEED = 42
np.random.seed(SEED) # for easy reporducability
DEFAULT_FILENAME = "BTC-USD_2022-06-30_to_2025-09-30.csv"
H_DEFAULT = 7

st.set_page_config(page_title="PulseLab — Crypto Forecast Studio", layout="wide", page_icon="⚡")

# --- Styling CSS (improve legibility & contrast across pages) ---
st.markdown("""
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;600;700;800&display=swap" rel="stylesheet">
<style>
html, body, [class*="css"] { font-family: 'Inter', sans-serif; }
.stApp { background: linear-gradient(180deg,#ffffff,#f7fbff); color: #0f1724; }
.hero { background: linear-gradient(90deg,#ffffff,#fbf8ff); border-radius:14px; padding:18px; box-shadow: 0 12px 40px rgba(15,23,42,0.06); margin-bottom:14px; }
.section { background:white; border-radius:12px; padding:14px; box-shadow: 0 8px 28px rgba(15,23,42,0.04); }
.block { border-left:6px solid #7c3aed; padding:12px; border-radius:10px; margin-bottom:12px; }
.kpi { font-size:1.1rem; font-weight:700; color:#0f1724; }
.small { color:#6b7280; font-size:0.95rem; }
.button-primary { background:#7c3aed;color:white;border-radius:8px;padding:8px 12px;text-decoration:none; }
</style>
""", unsafe_allow_html=True)

# -------------------------
# Utility functions (preserved functionality) - defining our metrics, easy referencing and calculation later for leaderboard etc
# -------------------------
def mape(y_true, y_pred):
    y_true = np.array(y_true).astype(float)
    y_pred = np.array(y_pred).astype(float)
    eps = 1e-8
    return float(np.mean(np.abs((y_true - y_pred) / (np.abs(y_true) + eps))) * 100.0)

def mae(y_true, y_pred):
    return float(mean_absolute_error(y_true, y_pred))

def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

def safe_numeric_df(df):
    num = df.select_dtypes(include=[np.number]).copy()
    num = num.fillna(method='ffill').fillna(0)
    return num

def basic_preprocess(df, target_col=None):
    df = df.copy()
    if target_col is None:
        cands = [c for c in df.columns if c.lower() in ['close','adj close','adj_close','price','close_usd','close*']]
        target_col = cands[0] if cands else df.columns[0]
    df[target_col] = pd.to_numeric(df[target_col], errors='coerce')
    df = df.dropna(subset=[target_col])
    df['lag_1'] = df[target_col].shift(1)
    df['lag_7'] = df[target_col].shift(7)
    df['rmean_7'] = df[target_col].rolling(7, min_periods=1).mean()
    df['rstd_7'] = df[target_col].rolling(7, min_periods=1).std().fillna(0)
    df = df.dropna()
    return df, target_col

def time_series_split(df, features, target, H_local):
    split_idx = max(1, len(df) - int(H_local))
    train = df.iloc[:split_idx]
    test = df.iloc[split_idx:]
    X_train = safe_numeric_df(train[features])
    X_test  = safe_numeric_df(test[features])
    y_train = train[target].fillna(method='ffill').fillna(0)
    y_test  = test[target].fillna(method='ffill').fillna(0)
    return X_train, X_test, y_train, y_test

def naive_last(X_train, X_test, y_train, y_test):
    last = y_train.iloc[-1]
    return np.repeat(last, len(y_test))

# -------------------------
# Model runner
# Returns dict with name, pred, train_pred, model, mape, train_mape, params (added to leaderboard 'lb'later below)
# -------------------------
def run_model(model_name, X_train, y_train, X_test, y_test, optimizer="None", optuna_trials=20, quick_mode=False):
    try:
        # RANDOM FOREST
        if model_name == "RandomForest":
            if optimizer == "GridSearch":
                param_grid = {'n_estimators':[100,200], 'max_depth':[5,10,None], 'min_samples_split':[2,5]} # this grid forms the search space
                tscv = TimeSeriesSplit(n_splits=3)
                gs = GridSearchCV(RandomForestRegressor(random_state=SEED), param_grid, cv=tscv, scoring='neg_mean_absolute_error', n_jobs=-1)
                gs.fit(X_train.values, y_train.values)
                model = gs.best_estimator_
            else:
                model = RandomForestRegressor(n_estimators=200, random_state=SEED)
                model.fit(X_train, y_train)
            preds = model.predict(X_test); train_preds = model.predict(X_train)
            return {"name":"RandomForest","model":model,"pred":np.array(preds),"train_pred":np.array(train_preds),"mape":mape(y_test,preds),"train_mape":mape(y_train,train_preds),"params":model.get_params()}

        # CATBOOST
        if model_name == "CatBoost":
            if not _CATBOOST:
                return {"error":"CatBoost not installed"}
            model = CatBoostRegressor(iterations=300, learning_rate=0.05, depth=6, random_seed=SEED, verbose=0)
            model.fit(X_train, y_train, eval_set=(X_test,y_test), early_stopping_rounds=30, verbose=False)
            preds = model.predict(X_test); train_preds = model.predict(X_train)
            return {"name":"CatBoost","model":model,"pred":np.array(preds),"train_pred":np.array(train_preds),"mape":mape(y_test,preds),"train_mape":mape(y_train,train_preds),"params":model.get_all_params()}

        # XGBOOST
        if model_name == "XGBoost":
            if not _XGB:
                return {"error":"XGBoost not installed"}
            model = xgb.XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=SEED, verbosity=0)
            model.fit(X_train, y_train, eval_set=[(X_test,y_test)], verbose=False)
            preds = model.predict(X_test); train_preds = model.predict(X_train)
            return {"name":"XGBoost","model":model,"pred":np.array(preds),"train_pred":np.array(train_preds),"mape":mape(y_test,preds),"train_mape":mape(y_train,train_preds),"params":model.get_params()}

        # LIGHTGBM
        if model_name == "LightGBM":
            if not _LGB:
                return {"error":"LightGBM not installed"}
            model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05, random_state=SEED)
            model.fit(X_train, y_train)
            preds = model.predict(X_test); train_preds = model.predict(X_train)
            return {"name":"LightGBM","model":model,"pred":np.array(preds),"train_pred":np.array(train_preds),"mape":mape(y_test,preds),"train_mape":mape(y_train,train_preds),"params":model.get_params()}

        # ANN_MLP
        if model_name == "ANN_MLP":
            if not _TF:
                return {"error":"TensorFlow not installed"}
            scaler_X = StandardScaler(); scaler_y = StandardScaler()
            Xtr = scaler_X.fit_transform(X_train); Xte = scaler_X.transform(X_test)
            ytr_orig = y_train.values.reshape(-1,1)
            ytr = scaler_y.fit_transform(ytr_orig).flatten()
            model = Sequential([Dense(64, activation='relu', input_shape=(Xtr.shape[1],)), Dropout(0.2), Dense(32, activation='relu'), Dense(1)])
            model.compile(optimizer='adam', loss='mse')
            es = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
            model.fit(Xtr, ytr, validation_split=0.1, epochs=100 if not quick_mode else 20, batch_size=32, callbacks=[es], verbose=0)
            pred_scaled = model.predict(Xte).flatten()
            pred = scaler_y.inverse_transform(pred_scaled.reshape(-1,1)).flatten()
            train_pred_scaled = model.predict(Xtr).flatten()
            train_pred = scaler_y.inverse_transform(train_pred_scaled.reshape(-1,1)).flatten()
            return {"name":"ANN_MLP","model":model,"pred":np.array(pred),"train_pred":np.array(train_pred),"mape":mape(y_test,pred),"train_mape":mape(y_train,train_pred),"params":{"architecture":"Dense64-32","optimizer":"adam","epochs":None}}

        # LSTM
        if model_name == "LSTM":
            if not _TF:
                return {"error":"TensorFlow not installed"}
            lookback=14
            scaler_X = StandardScaler(); scaler_y = StandardScaler() #required here but not for trees as in trees, wherver u split the scale doesnt matter split remains same
            Xt = scaler_X.fit_transform(pd.concat([X_train, X_test]))
            Xtr = Xt[:len(X_train)]; Xte = Xt[len(X_train):]
            y_train_vals = y_train.values; y_test_vals = y_test.values
            def build_sequences(X_all, y_all, lb):
                Xs, ys = [], []
                for i in range(len(X_all)-lb):
                    Xs.append(X_all[i:(i+lb)])
                    ys.append(y_all[i+lb])
                return np.array(Xs), np.array(ys)
            seq_X_tr, seq_y_tr_orig = build_sequences(Xtr, y_train_vals, lookback)
            combined = np.vstack([Xtr[-lookback:], Xte])
            combined_y = np.hstack([y_train_vals[-lookback:], y_test_vals])
            seq_X_te, seq_y_te_orig = build_sequences(combined, combined_y, lookback)
            if len(seq_X_tr)==0 or len(seq_X_te)==0:
                return {"error":"Not enough data for LSTM"}
            seq_y_tr_scaled = scaler_y.fit_transform(seq_y_tr_orig.reshape(-1,1)).flatten()
            seq_y_te_scaled = scaler_y.transform(seq_y_te_orig.reshape(-1,1)).flatten()
            model = Sequential([LSTM(64, input_shape=(seq_X_tr.shape[1], seq_X_tr.shape[2])), Dropout(0.2), Dense(1)])
            model.compile(optimizer='adam', loss='mse')
            es = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
            model.fit(seq_X_tr, seq_y_tr_scaled, validation_data=(seq_X_te, seq_y_te_scaled), epochs=100 if not quick_mode else 20, batch_size=32, callbacks=[es], verbose=0)
            pred_scaled = model.predict(seq_X_te).flatten()
            pred = scaler_y.inverse_transform(pred_scaled.reshape(-1,1)).flatten()
            train_pred_scaled = model.predict(seq_X_tr).flatten()
            train_pred = scaler_y.inverse_transform(train_pred_scaled.reshape(-1,1)).flatten()
            return {"name":"LSTM","model":model,"pred":np.array(pred),"train_pred":np.array(train_pred),"mape":mape(seq_y_te_orig,pred),"train_mape":mape(seq_y_tr_orig,train_pred),"params":{"lookback":lookback,"layers":"LSTM64"}}

        # SimpleRNN
        if model_name == "SimpleRNN":
            if not _TF:
                return {"error":"TensorFlow not installed"}
            lookback=14
            scaler_X = StandardScaler(); scaler_y = StandardScaler()
            Xt = scaler_X.fit_transform(pd.concat([X_train, X_test]))
            Xtr = Xt[:len(X_train)]; Xte = Xt[len(X_train):]
            y_train_vals = y_train.values; y_test_vals = y_test.values
            def build_sequences(X_all, y_all, lb):
                Xs, ys = [], []
                for i in range(len(X_all)-lb):
                    Xs.append(X_all[i:(i+lb)])
                    ys.append(y_all[i+lb])
                return np.array(Xs), np.array(ys)
            seq_X_tr, seq_y_tr_orig = build_sequences(Xtr, y_train_vals, lookback)
            combined = np.vstack([Xtr[-lookback:], Xte])
            combined_y = np.hstack([y_train_vals[-lookback:], y_test_vals])
            seq_X_te, seq_y_te_orig = build_sequences(combined, combined_y, lookback)
            if len(seq_X_tr)==0 or len(seq_X_te)==0:
                return {"error":"Not enough data for SimpleRNN"}
            seq_y_tr_scaled = scaler_y.fit_transform(seq_y_tr_orig.reshape(-1,1)).flatten()
            seq_y_te_scaled = scaler_y.transform(seq_y_te_orig.reshape(-1,1)).flatten()
            model = Sequential([SimpleRNN(64, input_shape=(seq_X_tr.shape[1], seq_X_tr.shape[2])), Dense(1)])
            model.compile(optimizer='adam', loss='mse')
            es = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
            model.fit(seq_X_tr, seq_y_tr_scaled, validation_data=(seq_X_te, seq_y_te_scaled), epochs=100 if not quick_mode else 20, batch_size=32, callbacks=[es], verbose=0)
            pred_scaled = model.predict(seq_X_te).flatten()
            pred = scaler_y.inverse_transform(pred_scaled.reshape(-1,1)).flatten()
            train_pred_scaled = model.predict(seq_X_tr).flatten()
            train_pred = scaler_y.inverse_transform(train_pred_scaled.reshape(-1,1)).flatten()
            return {"name":"SimpleRNN","model":model,"pred":np.array(pred),"train_pred":np.array(train_pred),"mape":mape(seq_y_te_orig,pred),"train_mape":mape(seq_y_tr_orig,train_pred),"params":{"lookback":lookback,"layers":"SimpleRNN64"}}

        # HoltWinters
        if model_name == "HoltWinters":
            if not _STATS:
                return {"error":"statsmodels not installed"}
            y = pd.concat([y_train, y_test])
            train = y.iloc[:len(y_train)]; test = y.iloc[len(y_train):]
            best = {"mape": np.inf}
            for a in [0.1,0.2]:
                for b in [0.01,0.05]:
                    try:
                        holt = ExponentialSmoothing(train, trend='add', seasonal=None, initialization_method='heuristic').fit(smoothing_level=a, smoothing_trend=b, optimized=False)
                        fc = holt.forecast(len(test))
                        sc = mape(test, fc)
                        if sc < best['mape']:
                            best = {'mape':sc,'pred':fc.values,'train_pred':holt.fittedvalues.values,'params':{'smoothing_level':a,'smoothing_trend':b}}
                    except:
                        pass
            if best['mape'] < np.inf:
                return {"name":"HoltWinters","model":None,"pred":np.array(best['pred']),"train_pred":np.array(best['train_pred']),"mape":best['mape'],"train_mape":mape(train,best['train_pred']),"params":best.get('params',{})}

    except Exception as e:
        return {"error": str(e)}
    return {"error":"Unknown model"}

# -------------------------
# Helper: fetch news key from Colab secrets or env var and fetch headlines automatically
# -------------------------
def get_news_key():
    key = os.environ.get("NEWS_API") or os.environ.get("news_api") or None
    if not key:
        # try Colab userdata (silent). In Colab this should return the stored secret without printing.
        try:
            from google.colab import userdata
            key = userdata.get('news_api')
        except Exception:
            key = None
    return key

def fetch_news_articles(n=10):
    key = get_news_key()
    if not key:
        return []  # no key
    try:
        url = "https://newsapi.org/v2/everything"
        params = {"q":"bitcoin OR crypto OR BTC", "language":"en", "pageSize":n, "sortBy":"publishedAt", "apiKey": key}
        r = requests.get(url, params=params, timeout=10); r.raise_for_status()
        return r.json().get("articles", [])
    except Exception:
        return []

# Fetch news once at app start if not already stored
if "news_articles" not in st.session_state:
    st.session_state["news_articles"] = fetch_news_articles(10)

# -------------------------
# Utility: plot creation and save for PDF
# -------------------------
def plot_preds_vs_actual(actual_series, pred_series, title):
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=actual_series.index, y=actual_series.values, mode="lines+markers", name="Actual"))
    fig.add_trace(go.Scatter(x=pred_series.index, y=pred_series.values, mode="lines+markers", name="Predicted"))
    fig.update_layout(title=title, xaxis_title="Index", yaxis_title="Price")
    # return fig and png bytes
    img_bytes = fig.to_image(format="png", width=900, height=400, scale=1)
    return fig, img_bytes

# -------------------------
# Page navigation and default page
# -------------------------
# We use session_state navigation so CTA button can jump to Model Arena.
if "page" not in st.session_state:
    st.session_state["page"] = "Home"

# Sidebar nav
pages = ["Home", "Data", "Model Arena", "Leaderboard", "News", "Export", "About"]
selected = st.sidebar.radio("Navigate", pages, index=pages.index(st.session_state["page"]) if st.session_state["page"] in pages else 0)
st.session_state["page"] = selected

# -------------------------
# Helper: run quick_all_models to populate the leaderboard fast
# This runs each model once with default settings (no heavy optimization - doesnt choose between our optuna and grid search thingy, basic run, used to populate leaderboard and not give appearance of an empty webpage, in demo - lets run 1-2 full models for leaderboard - 1 overfitting and 1 normal).
# -------------------------
def quick_all_models(df, target_col, features, H_local=H_DEFAULT):
    X_train, X_test, y_train, y_test = time_series_split(df, features, target_col, H_local)
    model_list = ["RandomForest","XGBoost","CatBoost","LightGBM","HoltWinters","ANN_MLP","LSTM","SimpleRNN"]
    results = []
    models_run = {}
    naive_preds = naive_last(X_train, X_test, y_train, y_test)
    results.append({"Model":"NaiveLast","MAPE":mape(y_test, naive_preds),"MAE":mae(y_test,naive_preds),"RMSE":rmse(y_test,naive_preds),"Train_MAPE":mape(y_train,np.repeat(y_train.iloc[-1], len(y_train)))})
    for m in model_list:
        out = run_model(m, X_train, y_train, X_test, y_test, optimizer="None", quick_mode=True)
        if out is None or "error" in out:
            continue
        preds = out.get("pred"); train_preds = out.get("train_pred")
        test_m = float(out.get("mape")) if out.get("mape") is not None else (mape(y_test,preds) if preds is not None else np.nan)
        train_m = float(out.get("train_mape")) if out.get("train_mape") is not None else (mape(y_train,train_preds) if train_preds is not None else np.nan)
        mae_v = mae(y_test,preds) if preds is not None else np.nan
        rmse_v = rmse(y_test,preds) if preds is not None else np.nan
        # simple overfitting heuristic
        overfit = False
        if not np.isnan(train_m) and train_m + 5.0 < test_m:  # absolute gap heuristic
            overfit = True
        models_run[out.get("name",m)] = {"out":out,"overfit":overfit,"MAPE":test_m,"MAE":mae_v,"RMSE":rmse_v,"Train_MAPE":train_m}
        if not overfit:
            results.append({"Model":out.get("name",m),"MAPE":test_m,"MAE":mae_v,"RMSE":rmse_v,"Train_MAPE":train_m})
    lb = pd.DataFrame(results).sort_values("MAPE").reset_index(drop=True)
    return lb, models_run

# -------------------------
# Pages - streamlit app defined here! @_@
# -------------------------
# HOME
if st.session_state["page"] == "Home":
    # Hero
    st.markdown('<div class="hero"><div style="display:flex;align-items:center;gap:16px"><div><h1 style="margin:0"> ⚡ PulseLab — Crypto Forecast Studio ⚡</h1><div class="small">Models, live context and export — designed for stakeholders.</div></div></div></div>', unsafe_allow_html=True)
    # Live BTC tile
    try:
        r = requests.get("https://api.coingecko.com/api/v3/simple/price", params={"ids":"bitcoin","vs_currencies":"usd","include_24hr_change":"true"}, timeout=6)
        j = r.json().get("bitcoin", {})
        price = j.get("usd"); ch = j.get("usd_24h_change")
    except Exception:
        price=None; ch=None
    cols = st.columns([1,1,1,1])
    cols[0].markdown(f'<div class="section"><div class="small">Live BTC</div><div class="kpi">${price:,.2f}</div></div>' if price else '<div class="section"><div class="small">Live BTC</div><div class="kpi">N/A</div></div>', unsafe_allow_html=True)
    cols[1].markdown(f'<div class="section"><div class="small">24h change</div><div class="kpi">{ch:.2f}%</div></div>' if ch else '<div class="section"><div class="small">24h change</div><div class="kpi">N/A</div></div>', unsafe_allow_html=True)
    cols[2].markdown('<div class="section"><div class="small">Model Arena</div><div class="kpi">Run experiments</div></div>', unsafe_allow_html=True)
    cols[3].markdown('<div class="section"><div class="small">News</div><div class="kpi">Latest headlines</div></div>', unsafe_allow_html=True)

    st.markdown("")
    # Why blocks (styled)
    left, right = st.columns([2,1])
    with left:
        st.markdown('<div class="block"><div style="font-weight:700;">Why cryptocurrency?</div><div class="small">Crypto is a novel asset class that combines technology and finance. It enables permissionless value transfer, high liquidity and programmability — but also introduces volatility and regulatory risk. We built PulseLab to make data-driven decisions easier.</div></div>', unsafe_allow_html=True)
        st.markdown('<div class="block"><div style="font-weight:700;">Why Bitcoin?</div><div class="small">Bitcoin (2009) is the most recognized crypto asset. BTC often leads market sentiment; forecasting BTC can provide broader market signals for allocation and risk management.</div></div>', unsafe_allow_html=True)
        st.markdown('<div class="block"><div style="font-weight:700;">How PulseLab helps</div><div class="small">We run classical and modern ML models, surface a leaderboard (MAPE primary), and flag potential overfitting. Exportable reports and news context help stakeholders act.</div></div>', unsafe_allow_html=True)
    with right:
        # Lottie if available (safely)
        if _LOTTIE:
            try:
                lurl = "https://cdn.dribbble.com/userupload/23393056/file/original-5fdb1d394522b222db5342a239d74f86.gif"
                st_lottie(requests.get(lurl, timeout=6).json(), height=200)
            except Exception:
                st.image("https://cdn.dribbble.com/userupload/23393056/file/original-5fdb1d394522b222db5342a239d74f86.gif", width=400)
        else:
            st.image("https://cdn.dribbble.com/userupload/23393056/file/original-5fdb1d394522b222db5342a239d74f86.gif", width=400)
        # CTA button (works: sets page to Model Arena)
        if st.button("🚀 Explore the Model Arena"):
            st.session_state["page"] = "Model Arena"
            # Force a rerun so sidebar reflects change
            st.experimental_rerun()

# DATA
elif st.session_state["page"] == "Data":
    st.markdown('<div class="section"><h3>Data — Upload or choose</h3><div class="small">Upload CSV (must contain price/close column). We keep the notebook preprocessing for features.</div></div>', unsafe_allow_html=True)
    uploaded = st.file_uploader("Upload CSV (optional)", type=["csv"])
    if uploaded:
        df = pd.read_csv(uploaded); st.session_state["dataset_df"] = df; st.success("Dataset loaded into session.")
    else:
        fname = st.text_input("Filename in working dir", value=DEFAULT_FILENAME)
        df = pd.read_csv(fname) if fname and os.path.exists(fname) else st.session_state.get("dataset_df", None)
        if df is None:
            st.info("No dataset found; upload or place file in working dir.")
    if df is not None:
        st.subheader("Preview & diagnostics")
        st.dataframe(df.head(8))
        st.dataframe(pd.DataFrame({"dtype":df.dtypes, "missing":df.isnull().sum()}), width=800)

# MODEL ARENA
elif st.session_state["page"] == "Model Arena":
    st.markdown('<div class="section"><h3>Model Arena — run experiments</h3><div class="small">We pre-run a quick set of all models (fast defaults) automatically to recommend a best candidate. For deeper tuning, run optimizers per model below.</div></div>', unsafe_allow_html=True)
    # get dataset
    df = None
    if "dataset_df" in st.session_state:
        df = st.session_state["dataset_df"]
    else:
        uploaded = st.file_uploader("Upload CSV for modeling (optional)", type=["csv"], key="arena_upload")
        if uploaded:
            df = pd.read_csv(uploaded); st.session_state["dataset_df"] = df
        else:
            fname = st.text_input("Dataset filename (in working dir)", value=DEFAULT_FILENAME)
            df = pd.read_csv(fname) if fname and os.path.exists(fname) else None
    if df is None:
        st.info("Provide dataset for modeling.")
    else:
        df_p, tgt = basic_preprocess(df)
        st.write("Preprocessed rows:", df_p.shape[0])
        features = [c for c in df_p.columns if c != tgt]
        selected_features = st.multiselect("Select features for modeling", options=features, default=features[:6])
        if not selected_features:
            st.warning("Pick features.")
            st.stop()
        H_local = st.number_input("Test horizon (H)", min_value=1, max_value=max(1,len(df_p)-1), value=H_DEFAULT)
        # Option: pre-run quick leaderboard automatically (only run once per session)
        if "quick_leaderboard" not in st.session_state:
            with st.spinner("Running fast comparisons across all models (quick defaults)..."):
                lb, models_run = quick_all_models(df_p, tgt, selected_features, H_local)
                st.session_state["leaderboard"] = lb
                st.session_state["models_run"] = models_run
                st.session_state["quick_leaderboard"] = True
            st.success("Quick leaderboard populated. You can run targeted optimizations below.")
        # show quick leaderboard
        lb = st.session_state.get("leaderboard", None)
        if lb is not None:
            st.subheader("Quick leaderboard (MAPE, quick defaults)")
            st.dataframe(lb.style.format({"MAPE":"{:.3f}","MAE":"{:.3f}","RMSE":"{:.3f}","Train_MAPE":"{:.3f}"}))
            # recommend top non-overfitting model
            models_run = st.session_state.get("models_run", {})
            candidates = [name for name, rec in models_run.items() if not rec.get("overfit")]
            if candidates:
                # choose best by MAPE among candidates
                best = min(candidates, key=lambda n: models_run[n]['MAPE'])
                st.success(f"Recommended model (quick run): **{best}** — MAPE {models_run[best]['MAPE']:.3f}%")
        # Give user controls to run heavy optimizers per-model
        st.markdown("---")
        st.markdown("### Run optimized experiment for a model (opt-in)")
        model_choice = st.selectbox("Choose model to optimize", ["RandomForest","CatBoost","XGBoost","LightGBM","ANN_MLP","LSTM","SimpleRNN","HoltWinters"])
        optimizer = st.selectbox("Optimizer", ["None","GridSearch","Optuna"])
        optuna_trials = st.number_input("Optuna trials", min_value=5, max_value=100, value=30, step=5)
        run_opt = st.button("Run optimization")
        if run_opt:
            with st.spinner(f"Running {model_choice} with {optimizer}... this may take time"):
                X_train, X_test, y_train, y_test = time_series_split(df_p, selected_features, tgt, H_local)
                out = run_model(model_choice, X_train, y_train, X_test, y_test, optimizer=optimizer if optimizer!="None" else "None", optuna_trials=optuna_trials, quick_mode=False)
                if out is None or "error" in out:
                    st.error(f"{model_choice} failed: {out.get('error','unknown')}")
                else:
                    # insert/update into session_state models_run
                    models_run = st.session_state.get("models_run", {})
                    preds = out.get("pred"); train_preds = out.get("train_pred")
                    test_m = float(out.get("mape")) if out.get("mape") is not None else (mape(y_test,preds) if preds is not None else np.nan)
                    train_m = float(out.get("train_mape")) if out.get("train_mape") is not None else (mape(y_train,train_preds) if train_preds is not None else np.nan)
                    # overfitting detection
                    overfit = False
                    if not np.isnan(train_m) and (test_m / (train_m + 1e-9)) > 1.4:
                        overfit = True
                    models_run[out.get("name",model_choice)] = {"out":out,"overfit":overfit,"MAPE":test_m,"MAE":mae(y_test,preds) if preds is not None else np.nan,"RMSE":rmse(y_test,preds) if preds is not None else np.nan,"Train_MAPE":train_m}
                    st.session_state["models_run"] = models_run
                    # update leaderboard table
                    results = []
                    for name, rec in models_run.items():
                        if not rec.get("overfit"):
                            results.append({"Model":name,"MAPE":rec.get("MAPE"),"MAE":rec.get("MAE"),"RMSE":rec.get("RMSE"),"Train_MAPE":rec.get("Train_MAPE")})
                    if results:
                        st.session_state["leaderboard"] = pd.DataFrame(results).sort_values("MAPE").reset_index(drop=True)
                    st.success(f"{model_choice} optimized run finished and added to the leaderboard.")
# LEADERBOARD
elif st.session_state["page"] == "Leaderboard":
    st.markdown('<div class="section"><h3>Leaderboard — MAPE (lower is better)</h3></div>', unsafe_allow_html=True)
    lb = st.session_state.get("leaderboard")
    models_run = st.session_state.get("models_run", {})
    if lb is None or lb.empty:
        st.info("No leaderboard yet — run the Arena or upload a dataset.")
    else:
        st.dataframe(lb.style.format({"MAPE":"{:.3f}","MAE":"{:.3f}","RMSE":"{:.3f}","Train_MAPE":"{:.3f}"}))
        excluded = [name for name, rec in models_run.items() if rec.get("overfit")]
        if excluded:
            st.warning("Excluded due to suspected overfitting: " + ", ".join(excluded))
        # top model details & residuals explanation and plot
        if not lb.empty:
            top = lb.iloc[0]["Model"]
            st.markdown(f"**Top recommended model:** {top}")
            rec = models_run.get(top)
            if rec:
                out = rec.get("out")
                preds = out.get("pred"); train_preds = out.get("train_pred")
                # attempt to display prediction vs actual if default dataset present
                if preds is not None and isinstance(preds, (np.ndarray,list)):
                    # align with last N points of preprocessed DF if available
                    if "dataset_df" in st.session_state:
                        try:
                            df_full = st.session_state["dataset_df"]
                            df_p, tgt = basic_preprocess(df_full)
                            actual = df_p[tgt].iloc[-len(preds):]
                            pred_series = pd.Series(preds, index=actual.index)
                            fig, _ = plot_preds_vs_actual(actual, pred_series, f"{top} predictions vs actual")
                            st.plotly_chart(fig, use_container_width=True) # Use st.plotly_chart to display the figure
                            # residuals histogram
                            resid = actual.values - np.array(preds)
                            fig2 = px.histogram(resid, nbins=40, title="Residuals histogram (actual - pred)")
                            st.plotly_chart(fig2, use_container_width=True) # Use st.plotly_chart to display the figure
                            # interpretation text
                            st.markdown("**Residuals interpretation:**")
                            st.markdown("- Center at zero: indicates unbiased predictions on average.")
                            st.markdown("- Narrow distribution: consistent predictions; Wide tails: occasional large errors (outliers).")
                            st.markdown("- Skew indicates systematic over/underprediction in some regimes.")
                        except Exception as e:
                            st.write("Could not plot predictions vs actual due to:", str(e))
                # show params summary
                params = out.get("params", {})
                st.markdown("**Model parameters / summary**")
                st.write(params)
# NEWS — fetching automatically and showing top 10 results (can keywords be made more specific?)
elif st.session_state["page"] == "News":
    st.markdown('<div class="section"><h3>Crypto News — latest</h3><div class="small">News is fetched silently from Colab secrets or server env var; keys are never printed.</div></div>', unsafe_allow_html=True)
    articles = st.session_state.get("news_articles", [])
    if not articles:
        st.info("No news available (NewsAPI key not configured or fetch failed).")
    else:
        for art in articles:
            st.markdown(f"**{art.get('title')}**")
            st.markdown(f"*{art.get('source',{}).get('name')} — {art.get('publishedAt')[:10]}*")
            st.write(art.get("description") or "")
            if art.get("url"):
                st.markdown(f"[Read more]({art.get('url')})")
            st.markdown("---")

# EXPORT — enhanced PDF with plots & params
elif st.session_state["page"] == "Export":
    st.markdown('<div class="section"><h3>Export — PDF report (models & plots)</h3><div class="small">Select models to include; we will embed a small plot and parameters summary for each.</div></div>', unsafe_allow_html=True)
    models_run = st.session_state.get("models_run", {})
    if not models_run:
        st.info("No model runs available to export.")
    else:
        names = list(models_run.keys())
        selected = st.multiselect("Select models to include in PDF", options=names, default=names)
        if st.button("Generate PDF with plots"):
            pdf = FPDF()
            pdf.set_auto_page_break(auto=True, margin=15)
            if "dataset_df" in st.session_state:
                try:
                    df_full = st.session_state["dataset_df"]
                    df_p, tgtcol = basic_preprocess(df_full)
                    for name in selected:
                        rec = models_run[name]
                        out = rec.get("out")
                        preds = out.get("pred")
                        # create page
                        pdf.add_page()
                        pdf.set_font("Arial", 'B', 14); pdf.cell(0,8, f"Model: {name}", ln=True)
                        pdf.set_font("Arial", size=11)
                        pdf.multi_cell(0,6, f"MAPE: {rec.get('MAPE'):.3f}%, MAE: {rec.get('MAE'):.3f}, RMSE: {rec.get('RMSE'):.3f}")
                        pdf.ln(2)
                        # params text
                        params = out.get("params", {})
                        pdf.set_font("Arial", 'B', 12); pdf.cell(0,6,"Parameters:", ln=True)
                        pdf.set_font("Arial", size=10)
                        pdf.multi_cell(0,5, json.dumps(params, indent=2) if params else "N/A")
                        pdf.ln(4)
                        # if dataset present and predictions exist, create plot image and embed
                        if preds is not None and isinstance(preds, (list, np.ndarray)):
                            try:
                                actual = df_p[tgtcol].iloc[-len(preds):]
                                pred_series = pd.Series(preds, index=actual.index)
                                fig, img_bytes = plot_preds_vs_actual(actual, pred_series, f"{name} predictions vs actual")
                                # save png temp
                                tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".png")
                                tmp.write(img_bytes); tmp.flush(); tmp.close()
                                pdf.image(tmp.name, x=10, w=190)
                            except Exception:
                                pass
                except Exception:
                    pdf.add_page()
                    pdf.set_font("Arial", 'B', 14); pdf.cell(0,8, "Error generating plots", ln=True)

            # output pdf
            tmpf = tempfile.NamedTemporaryFile(delete=False, suffix=".pdf")
            pdf.output(tmpf.name)
            with open(tmpf.name, "rb") as f:
                data = f.read()
            st.download_button("📥 Download report (with plots)", data=data, file_name="pulselab_report_with_plots.pdf")

# ABOUT - can remove, msg if needs to be rerun after removing
elif st.session_state["page"] == "About":
    st.markdown('<div class="section"><h3>About PulseLab</h3><div class="small">PulseLab converts your notebook into a polished app with model comparison, news, and export capabilities.</div></div>', unsafe_allow_html=True)
    st.markdown("""
    **Next recommended steps**
    - Add walk-forward cross-validation (rolling-origin) for robust ranking.
    - Persist best models and provide downloadable pickles.
    - Consider adding forecast uncertainty / prediction intervals.
    """)

# End

Overwriting app.py


In [26]:
# We start ngrok, expose the Streamlit server, and print the public URL.
# This was done so the Streamlit app can be accessed from the internet while running in Colab.
# Ensure ngrok auth token is present in Colab userdata under name 'ngrok_2' (we loaded it into env earlier)

import os, time, subprocess, sys
from pyngrok import ngrok

from google.colab import userdata
ngrok_token=userdata.get('ngrok_3')

if not ngrok_token:
    print('No ngrok_auth token found in environment. Please set it in Colab userdata as ngrok_auth.')
else:
    print('Configuring ngrok...')
    ngrok.set_auth_token(ngrok_token)

    # Kill any existing ngrok tunnels - doesnt work, have to go to ngrok dashboard and kill remotely which still doesnt work sometimes - try localhost solution from streamlit issue page (if this doesnt work - check drive for localhost eg)
    print('Killing any existing ngrok tunnels...')
    ngrok.kill()
    time.sleep(2) # Give it a moment to shut down

    # Start streamlit in the background
    print('Starting Streamlit (in background)...')
    # Use nohup to keep it running; redirect output to a log file.
    cmd = 'streamlit run app.py --server.port 8501 --server.enableCORS false'
    proc = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    # create a tunnel
    public_url = ngrok.connect(addr="8501", proto="http")
    print('Streamlit is running and exposed at:', public_url.public_url)
    print('To stop ngrok tunnel, run: ngrok.kill() or interrupt the Colab runtime.')

Configuring ngrok...
Killing any existing ngrok tunnels...
Starting Streamlit (in background)...
Streamlit is running and exposed at: https://disaffectedly-dithionic-jayna.ngrok-free.dev
To stop ngrok tunnel, run: ngrok.kill() or interrupt the Colab runtime.


---------------------------------------------------------
---------------------------------------------------------
NOT FOR APP.PY
--------------------------------------------------------
--------------------------------------------------------
--------------------------------------------------------

In [17]:
# Define basic_preprocess function in the notebook environment
import pandas as pd
import numpy as np

def basic_preprocess(df, target_col=None):
    df = df.copy()
    if target_col is None:
        cands = [c for c in df.columns if c.lower() in ['close','adj close','adj_close','price','close_usd','close*']]
        target_col = cands[0] if cands else df.columns[0]
    df[target_col] = pd.to_numeric(df[target_col], errors='coerce')
    df = df.dropna(subset=[target_col])
    df['lag_1'] = df[target_col].shift(1)
    df['lag_7'] = df[target_col].shift(7)
    df['rmean_7'] = df[target_col].rolling(7, min_periods=1).mean()
    df['rstd_7'] = df[target_col].rolling(7, min_periods=1).std().fillna(0)
    df = df.dropna()
    return df, target_col

In [29]:
# Performing optimized runs for all models and display results
if 'df_p' in locals() and 'tgt' in locals():
    features = [c for c in df_p.columns if c != tgt]
    H_local = 7 # You can change the test horizon if needed

    X_train, X_test, y_train, y_test = time_series_split(df_p, features, tgt, H_local)

    model_list = ["RandomForest", "CatBoost", "XGBoost", "LightGBM", "ANN_MLP", "LSTM", "SimpleRNN", "HoltWinters"]
    optimized_models_run = {}
    results = []

    print("Starting optimized runs for all models...")

    for model_name in model_list:
        print(f"\nRunning optimized {model_name}...")
        optimizer = "None" # Default to None, override for specific models
        optuna_trials = 30 # Default Optuna trials

        if model_name == "RandomForest":
            optimizer = "GridSearch"
        elif model_name == "CatBoost":
             if _CATBOOST and _OPTUNA: # Check if Optuna is available
                optimizer = "Optuna"
                optuna_trials = 50 # More trials for CatBoost Optuna

        # You can add more conditions here for other models and optimizers if their run_model supports it

        out = run_model(model_name, X_train, y_train, X_test, y_test, optimizer=optimizer, optuna_trials=optuna_trials, quick_mode=False)

        if out is None or "error" in out:
            print(f"Optimized {model_name} failed: {out.get('error','unknown')}")
            continue

        preds = out.get("pred"); train_preds = out.get("train_pred")
        test_m = float(out.get("mape")) if out.get("mape") is not None else (mape(y_test,preds) if preds is not None else np.nan)
        train_m = float(out.get("train_mape")) if out.get("train_mape") is not None else (mape(y_train,train_preds) if train_preds is not None else np.nan)
        mae_v = mae(y_test,preds) if preds is not None else np.nan
        rmse_v = rmse(y_test,preds) if preds is not None else np.nan

        # simple overfitting heuristic (same as in app.py)
        overfit = False
        if not np.isnan(train_m) and (test_m / (train_m + 1e-9)) > 1.4:
             overfit = True
        optimized_models_run[out.get("name", model_name)] = {"out": out, "overfit": overfit, "MAPE": test_m, "MAE": mae_v, "RMSE": rmse_v, "Train_MAPE": train_m}
        results.append({"Model": out.get("name", model_name), "MAPE": test_m, "MAE": mae_v, "RMSE": rmse_v, "Train_MAPE": train_m, "Optimizer": optimizer})

    print("\nOptimized Run Results:")
    if results:
        optimized_leaderboard = pd.DataFrame(results).sort_values("MAPE").reset_index(drop=True)
        display(optimized_leaderboard.style.format({"MAPE":"{:.3f}","MAE":"{:.3f}","RMSE":"{:.3f}","Train_MAPE":"{:.3f}"}))

        print("\nOverfitting Check (Optimized Runs):")
        for name, rec in optimized_models_run.items():
            overfit = rec.get("overfit")
            print(f"Model: {name}")
            if overfit:
                print("  Potential Overfitting: Yes")
            else:
                print("  Potential Overfitting: No")
            print("-" * 20)
    else:
        print("No optimized model runs completed successfully.")

else:
    print("Data (df_p, tgt) not found. Please load and preprocess the data first.")


DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.


DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.


Series.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.


Series.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.



Starting optimized runs for all models...

Running optimized RandomForest...



X has feature names, but RandomForestRegressor was fitted without feature names


X has feature names, but RandomForestRegressor was fitted without feature names




Running optimized CatBoost...

Running optimized XGBoost...

Running optimized LightGBM...
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000292 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2040
[LightGBM] [Info] Number of data points in the train set: 1175, number of used features: 8
[LightGBM] [Info] Start training from score 55583.876484

Running optimized ANN_MLP...



Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 78ms/step
[1m37/37[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 

Running optimized LSTM...



Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 179ms/step
[1m37/37[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step

Running optimized SimpleRNN...



Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 244ms/step
[1m37/37[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step

Running optimized HoltWinters...

Optimized Run Results:



An unsupported index was provided. As a result, forecasts cannot be generated. To use the model for forecasting, use one of the supported classes of index.


No supported index is available. Prediction results will be given with an integer index beginning at `start`.


No supported index is available. In the next version, calling this method in a model without a supported index will result in an exception.


An unsupported index was provided. As a result, forecasts cannot be generated. To use the model for forecasting, use one of the supported classes of index.


No supported index is available. Prediction results will be given with an integer index beginning at `start`.


No supported index is available. In the next version, calling this method in a model without a supported index will result in an exception.


An unsupported index was provided. As a result, forecasts cannot be generated. To use the model for forecasting, use one of the supported classes of index.


No supported inde

Unnamed: 0,Model,MAPE,MAE,RMSE,Train_MAPE,Optimizer
0,LightGBM,0.552,617.95,686.97,0.481,
1,XGBoost,0.666,743.939,919.126,0.361,
2,RandomForest,0.689,765.785,896.016,0.387,GridSearch
3,CatBoost,1.462,1624.023,2106.356,2.054,Optuna
4,ANN_MLP,1.503,1683.807,1996.835,4.447,
5,HoltWinters,2.031,2237.017,2853.033,4.277,
6,LSTM,2.073,2315.429,2488.103,4.843,
7,SimpleRNN,2.488,2777.412,3085.621,2.942,



Overfitting Check (Optimized Runs):
Model: RandomForest
  Potential Overfitting: Yes
--------------------
Model: CatBoost
  Potential Overfitting: No
--------------------
Model: XGBoost
  Potential Overfitting: Yes
--------------------
Model: LightGBM
  Potential Overfitting: No
--------------------
Model: ANN_MLP
  Potential Overfitting: No
--------------------
Model: LSTM
  Potential Overfitting: No
--------------------
Model: SimpleRNN
  Potential Overfitting: No
--------------------
Model: HoltWinters
  Potential Overfitting: No
--------------------


In [30]:
#visualizations for ppt
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import numpy as np

# For demonstration, let's use the df from the notebook state
df_p, tgt = basic_preprocess(df)

# Function to plot predictions vs actual
def plot_preds_vs_actual_notebook(actual_series, pred_series, title):
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=actual_series.index, y=actual_series.values, mode="lines+markers", name="Actual"))
    fig.add_trace(go.Scatter(x=pred_series.index, y=pred_series.values, mode="lines+markers", name="Predicted"))
    fig.update_layout(title=title, xaxis_title="Index", yaxis_title="Price", width=600, height=500) # Adjusted width and height
    return fig

# Function to plot residuals histogram
def plot_residuals_histogram_notebook(actual_series, pred_series, title):
    resid = actual_series.values - np.array(pred_series)
    fig = px.histogram(resid, nbins=40, title=title)
    fig.update_layout(width=600, height=500) # Adjusted width and height
    return fig

# have to run the optimized models cell first to populate these graphs

if 'optimized_models_run' in locals():
    print("Displaying plots for Optimized Runs:")
    for name, rec in optimized_models_run.items():
        out = rec.get("out")
        preds = out.get("pred")
        if preds is not None and isinstance(preds, (np.ndarray, list)):
            try:
                # Align with last N points of preprocessed DF
                actual = df_p[tgt].iloc[-len(preds):]
                pred_series = pd.Series(preds, index=actual.index)

                # Plot predictions vs actual
                fig_preds = plot_preds_vs_actual_notebook(actual, pred_series, f"{name} predictions vs actual (Optimized)")
                print(f"Displaying plot for {name}: Predictions vs Actual")
                fig_preds.show()

                # Plot residuals histogram
                fig_resid = plot_residuals_histogram_notebook(actual, pred_series, f"{name} Residuals histogram (actual - pred) (Optimized)")
                print(f"Displaying plot for {name}: Residuals Histogram")
                fig_resid.show()

            except Exception as e:
                print(f"Could not plot for {name} due to: {str(e)}")
else:
    print("optimized_models_run variable not found. Please run the optimized models cell first to populate it.")

Displaying plots for Optimized Runs:
Displaying plot for RandomForest: Predictions vs Actual


Displaying plot for RandomForest: Residuals Histogram


Displaying plot for CatBoost: Predictions vs Actual


Displaying plot for CatBoost: Residuals Histogram


Displaying plot for XGBoost: Predictions vs Actual


Displaying plot for XGBoost: Residuals Histogram


Displaying plot for LightGBM: Predictions vs Actual


Displaying plot for LightGBM: Residuals Histogram


Displaying plot for ANN_MLP: Predictions vs Actual


Displaying plot for ANN_MLP: Residuals Histogram


Displaying plot for LSTM: Predictions vs Actual


Displaying plot for LSTM: Residuals Histogram


Displaying plot for SimpleRNN: Predictions vs Actual


Displaying plot for SimpleRNN: Residuals Histogram


Displaying plot for HoltWinters: Predictions vs Actual


Displaying plot for HoltWinters: Residuals Histogram


In [28]:
# Display MAPE for all models and check for overfitting
if 'models_run' in locals():
    print("Model Performance and Overfitting Check (Quick Run):")
    for name, rec in models_run.items():
        mape_score = rec.get("MAPE")
        train_mape = rec.get("Train_MAPE")
        overfit = rec.get("overfit")

        print(f"Model: {name}")
        print(f"  MAPE (Test): {mape_score:.3f}%" if mape_score is not None else "  MAPE (Test): N/A")
        print(f"  MAPE (Train): {train_mape:.3f}%" if train_mape is not None else "  MAPE (Train): N/A")
        if overfit:
            print("  Potential Overfitting: Yes")
        else:
            print("  Potential Overfitting: No")
        print("-" * 20)
else:
    print("models_run variable not found. Please run the quick comparison cell first.")

Model Performance and Overfitting Check (Quick Run):
Model: RandomForest
  MAPE (Test): 0.689%
  MAPE (Train): 0.387%
  Potential Overfitting: No
--------------------
Model: XGBoost
  MAPE (Test): 0.666%
  MAPE (Train): 0.361%
  Potential Overfitting: No
--------------------
Model: CatBoost
  MAPE (Test): 1.462%
  MAPE (Train): 2.054%
  Potential Overfitting: No
--------------------
Model: LightGBM
  MAPE (Test): 0.552%
  MAPE (Train): 0.481%
  Potential Overfitting: No
--------------------
Model: HoltWinters
  MAPE (Test): 2.031%
  MAPE (Train): 4.277%
  Potential Overfitting: No
--------------------
Model: ANN_MLP
  MAPE (Test): 0.955%
  MAPE (Train): 6.595%
  Potential Overfitting: No
--------------------
Model: LSTM
  MAPE (Test): 2.261%
  MAPE (Train): 5.044%
  Potential Overfitting: No
--------------------
Model: SimpleRNN
  MAPE (Test): 3.312%
  MAPE (Train): 3.163%
  Potential Overfitting: No
--------------------


In [21]:
# Run quick comparison of all models to populate models_run
# This is needed for the plotting code to access model predictions

# Assuming df and basic_preprocess are already defined and available
# If not, load and preprocess the data first
# FILENAME = 'BTC-USD_2022-06-30_to_2025-09-30.csv'
# df = pd.read_csv(FILENAME)
# df_p, tgt = basic_preprocess(df)

# Define time_series_split and run_model functions if not already defined
# You can copy them from the app.py file if needed
def time_series_split(df, features, target, H_local):
    split_idx = max(1, len(df) - int(H_local))
    train = df.iloc[:split_idx]
    test = df.iloc[split_idx:]
    X_train = safe_numeric_df(train[features])
    X_test  = safe_numeric_df(test[features])
    y_train = train[target].fillna(method='ffill').fillna(0)
    y_test  = test[target].fillna(method='ffill').fillna(0)
    return X_train, X_test, y_train, y_test

def safe_numeric_df(df):
    num = df.select_dtypes(include=[np.number]).copy()
    num = num.fillna(method='ffill').fillna(0)
    return num

# Need to define run_model and its dependencies (mape, mae, rmse, etc.)
# Copying necessary parts from app.py
def mape(y_true, y_pred):
    y_true = np.array(y_true).astype(float)
    y_pred = np.array(y_pred).astype(float)
    eps = 1e-8
    return float(np.mean(np.abs((y_true - y_pred) / (np.abs(y_true) + eps))) * 100.0)

def mae(y_true, y_pred):
    return float(mean_absolute_error(y_true, y_pred))

def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

# Import necessary libraries for run_model
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd

# Try importing optional libraries for run_model
try:
    from catboost import CatBoostRegressor
    _CATBOOST = True
except Exception:
    _CATBOOST = False
try:
    import xgboost as xgb
    _XGB = True
except Exception:
    _XGB = False
try:
    import lightgbm as lgb
    _LGB = True
except Exception:
    _LGB = False
try:
    import optuna
    _OPTUNA = True
except Exception:
    _OPTUNA = False
try:
    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, LSTM, SimpleRNN, Dropout
    from tensorflow.keras.callbacks import EarlyStopping
    _TF = True
except Exception:
    _TF = False
try:
    from statsmodels.tsa.holtwinters import ExponentialSmoothing
    _STATS = True
except Exception:
    _STATS = False


def run_model(model_name, X_train, y_train, X_test, y_test, optimizer="None", optuna_trials=20, quick_mode=False):
    try:
        # RANDOM FOREST
        if model_name == "RandomForest":
            if optimizer == "GridSearch":
                param_grid = {'n_estimators':[100,200], 'max_depth':[5,10,None], 'min_samples_split':[2,5]}
                tscv = TimeSeriesSplit(n_splits=3)
                gs = GridSearchCV(RandomForestRegressor(random_state=42), param_grid, cv=tscv, scoring='neg_mean_absolute_error', n_jobs=-1)
                gs.fit(X_train.values, y_train.values)
                model = gs.best_estimator_
            else:
                model = RandomForestRegressor(n_estimators=200, random_state=42)
                model.fit(X_train, y_train)
            preds = model.predict(X_test); train_preds = model.predict(X_train)
            return {"name":"RandomForest","model":model,"pred":np.array(preds),"train_pred":np.array(train_preds),"mape":mape(y_test,preds),"train_mape":mape(y_train,train_preds),"params":model.get_params()}

        # CATBOOST
        if model_name == "CatBoost":
            if not _CATBOOST:
                return {"error":"CatBoost not installed"}
            model = CatBoostRegressor(iterations=300, learning_rate=0.05, depth=6, random_seed=42, verbose=0)
            model.fit(X_train, y_train, eval_set=(X_test,y_test), early_stopping_rounds=30, verbose=False)
            preds = model.predict(X_test); train_preds = model.predict(X_train)
            return {"name":"CatBoost","model":model,"pred":np.array(preds),"train_pred":np.array(train_preds),"mape":mape(y_test,preds),"train_mape":mape(y_train,train_preds),"params":model.get_all_params()}

        # XGBOOST
        if model_name == "XGBoost":
            if not _XGB:
                return {"error":"XGBoost not installed"}
            model = xgb.XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=42, verbosity=0)
            model.fit(X_train, y_train, eval_set=[(X_test,y_test)], verbose=False)
            preds = model.predict(X_test); train_preds = model.predict(X_train)
            return {"name":"XGBoost","model":model,"pred":np.array(preds),"train_pred":np.array(train_preds),"mape":mape(y_test,preds),"train_mape":mape(y_train,train_preds),"params":model.get_params()}

        # LIGHTGBM
        if model_name == "LightGBM":
            if not _LGB:
                return {"error":"LightGBM not installed"}
            model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05, random_state=42)
            model.fit(X_train, y_train)
            preds = model.predict(X_test); train_preds = model.predict(X_train)
            return {"name":"LightGBM","model":model,"pred":np.array(preds),"train_pred":np.array(train_preds),"mape":mape(y_test,preds),"train_mape":mape(y_train,train_preds),"params":model.get_params()}

        # ANN_MLP
        if model_name == "ANN_MLP":
            if not _TF:
                return {"error":"TensorFlow not installed"}
            scaler_X = StandardScaler(); scaler_y = StandardScaler()
            Xtr = scaler_X.fit_transform(X_train); Xte = scaler_X.transform(X_test)
            ytr_orig = y_train.values.reshape(-1,1)
            ytr = scaler_y.fit_transform(ytr_orig).flatten()
            model = Sequential([Dense(64, activation='relu', input_shape=(Xtr.shape[1],)), Dropout(0.2), Dense(32, activation='relu'), Dense(1)])
            model.compile(optimizer='adam', loss='mse')
            es = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
            model.fit(Xtr, ytr, validation_split=0.1, epochs=100 if not quick_mode else 20, batch_size=32, callbacks=[es], verbose=0)
            pred_scaled = model.predict(Xte).flatten()
            pred = scaler_y.inverse_transform(pred_scaled.reshape(-1,1)).flatten()
            train_pred_scaled = model.predict(Xtr).flatten()
            train_pred = scaler_y.inverse_transform(train_pred_scaled.reshape(-1,1)).flatten()
            return {"name":"ANN_MLP","model":model,"pred":np.array(pred),"train_pred":np.array(train_pred),"mape":mape(y_test,pred),"train_mape":mape(y_train,train_pred),"params":{"architecture":"Dense64-32","optimizer":"adam","epochs":None}}

        # LSTM
        if model_name == "LSTM":
            if not _TF:
                return {"error":"TensorFlow not installed"}
            lookback=14
            scaler_X = StandardScaler(); scaler_y = StandardScaler()
            Xt = scaler_X.fit_transform(pd.concat([X_train, X_test]))
            Xtr = Xt[:len(X_train)]; Xte = Xt[len(X_train):]
            y_train_vals = y_train.values; y_test_vals = y_test.values
            def build_sequences(X_all, y_all, lb):
                Xs, ys = [], []
                for i in range(len(X_all)-lb):
                    Xs.append(X_all[i:(i+lb)])
                    ys.append(y_all[i+lb])
                return np.array(Xs), np.array(ys)
            seq_X_tr, seq_y_tr_orig = build_sequences(Xtr, y_train_vals, lookback)
            combined = np.vstack([Xtr[-lookback:], Xte])
            combined_y = np.hstack([y_train_vals[-lookback:], y_test_vals])
            seq_X_te, seq_y_te_orig = build_sequences(combined, combined_y, lookback)
            if len(seq_X_tr)==0 or len(seq_X_te)==0:
                return {"error":"Not enough data for LSTM"}
            seq_y_tr_scaled = scaler_y.fit_transform(seq_y_tr_orig.reshape(-1,1)).flatten()
            seq_y_te_scaled = scaler_y.transform(seq_y_te_orig.reshape(-1,1)).flatten()
            model = Sequential([LSTM(64, input_shape=(seq_X_tr.shape[1], seq_X_tr.shape[2])), Dropout(0.2), Dense(1)])
            model.compile(optimizer='adam', loss='mse')
            es = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
            model.fit(seq_X_tr, seq_y_tr_scaled, validation_data=(seq_X_te, seq_y_te_scaled), epochs=100 if not quick_mode else 20, batch_size=32, callbacks=[es], verbose=0)
            pred_scaled = model.predict(seq_X_te).flatten()
            pred = scaler_y.inverse_transform(pred_scaled.reshape(-1,1)).flatten()
            train_pred_scaled = model.predict(seq_X_tr).flatten()
            train_pred = scaler_y.inverse_transform(train_pred_scaled.reshape(-1,1)).flatten()
            return {"name":"LSTM","model":model,"pred":np.array(pred),"train_pred":np.array(train_pred),"mape":mape(seq_y_te_orig,pred),"train_mape":mape(seq_y_tr_orig,train_pred),"params":{"lookback":lookback,"layers":"LSTM64"}}

        # SimpleRNN
        if model_name == "SimpleRNN":
            if not _TF:
                return {"error":"TensorFlow not installed"}
            lookback=14
            scaler_X = StandardScaler(); scaler_y = StandardScaler()
            Xt = scaler_X.fit_transform(pd.concat([X_train, X_test]))
            Xtr = Xt[:len(X_train)]; Xte = Xt[len(X_train):]
            y_train_vals = y_train.values; y_test_vals = y_test.values
            def build_sequences(X_all, y_all, lb):
                Xs, ys = [], []
                for i in range(len(X_all)-lb):
                    Xs.append(X_all[i:(i+lb)])
                    ys.append(y_all[i+lb])
                return np.array(Xs), np.array(ys)
            seq_X_tr, seq_y_tr_orig = build_sequences(Xtr, y_train_vals, lookback)
            combined = np.vstack([Xtr[-lookback:], Xte])
            combined_y = np.hstack([y_train_vals[-lookback:], y_test_vals])
            seq_X_te, seq_y_te_orig = build_sequences(combined, combined_y, lookback)
            if len(seq_X_tr)==0 or len(seq_X_te)==0:
                return {"error":"Not enough data for SimpleRNN"}
            seq_y_tr_scaled = scaler_y.fit_transform(seq_y_tr_orig.reshape(-1,1)).flatten()
            seq_y_te_scaled = scaler_y.transform(seq_y_te_orig.reshape(-1,1)).flatten()
            model = Sequential([SimpleRNN(64, input_shape=(seq_X_tr.shape[1], seq_X_tr.shape[2])), Dense(1)])
            model.compile(optimizer='adam', loss='mse')
            es = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
            model.fit(seq_X_tr, seq_y_tr_scaled, validation_data=(seq_X_te, seq_y_te_scaled), epochs=100 if not quick_mode else 20, batch_size=32, callbacks=[es], verbose=0)
            pred_scaled = model.predict(seq_X_te).flatten()
            pred = scaler_y.inverse_transform(pred_scaled.reshape(-1,1)).flatten()
            train_pred_scaled = model.predict(seq_X_tr).flatten()
            train_pred = scaler_y.inverse_transform(train_pred_scaled.reshape(-1,1)).flatten()
            return {"name":"SimpleRNN","model":model,"pred":np.array(pred),"train_pred":np.array(train_pred),"mape":mape(seq_y_te_orig,pred),"train_mape":mape(seq_y_tr_orig,train_pred),"params":{"lookback":lookback,"layers":"SimpleRNN64"}}

        # HoltWinters
        if model_name == "HoltWinters":
            if not _STATS:
                return {"error":"statsmodels not installed"}
            y = pd.concat([y_train, y_test])
            train = y.iloc[:len(y_train)]; test = y.iloc[len(y_train):]
            best = {"mape": np.inf}
            for a in [0.1,0.2]:
                for b in [0.01,0.05]:
                    try:
                        holt = ExponentialSmoothing(train, trend='add', seasonal=None, initialization_method='heuristic').fit(smoothing_level=a, smoothing_trend=b, optimized=False)
                        fc = holt.forecast(len(test))
                        sc = mape(test, fc)
                        if sc < best['mape']:
                            best = {'mape':sc,'pred':fc.values,'train_pred':holt.fittedvalues.values,'params':{'smoothing_level':a,'smoothing_trend':b}}
                    except:
                        pass
            if best['mape'] < np.inf:
                return {"name":"HoltWinters","model":None,"pred":np.array(best['pred']),"train_pred":np.array(best['train_pred']),"mape":best['mape'],"train_mape":mape(train,best['train_pred']),"params":best.get('params',{})}

    except Exception as e:
        return {"error": str(e)}
    return {"error":"Unknown model"}

def naive_last(X_train, X_test, y_train, y_test):
    last = y_train.iloc[-1]
    return np.repeat(last, len(y_test))

def quick_all_models(df, target_col, features, H_local=7):
    X_train, X_test, y_train, y_test = time_series_split(df, features, target_col, H_local)
    model_list = ["RandomForest","XGBoost","CatBoost","LightGBM","HoltWinters","ANN_MLP","LSTM","SimpleRNN"]
    results = []
    models_run = {}
    naive_preds = naive_last(X_train, X_test, y_train, y_test)
    results.append({"Model":"NaiveLast","MAPE":mape(y_test, naive_preds),"MAE":mae(y_test,naive_preds),"RMSE":rmse(y_test,naive_preds),"Train_MAPE":mape(y_train,np.repeat(y_train.iloc[-1], len(y_train)))})
    for m in model_list:
        out = run_model(m, X_train, y_train, X_test, y_test, optimizer="None", quick_mode=True)
        if out is None or "error" in out:
            continue
        preds = out.get("pred"); train_preds = out.get("train_pred")
        test_m = float(out.get("mape")) if out.get("mape") is not None else (mape(y_test,preds) if preds is not None else np.nan)
        train_m = float(out.get("train_mape")) if out.get("train_mape") is not None else (mape(y_train,train_preds) if train_preds is not None else np.nan)
        mae_v = mae(y_test,preds) if preds is not None else np.nan
        rmse_v = rmse(y_test,preds) if preds is not None else np.nan
        # simple overfitting heuristic
        overfit = False
        if not np.isnan(train_m) and train_m + 5.0 < test_m:  # absolute gap heuristic
            overfit = True
        models_run[out.get("name",m)] = {"out":out,"overfit":overfit,"MAPE":test_m,"MAE":mae_v,"RMSE":rmse_v,"Train_MAPE":train_m}
        if not overfit:
            results.append({"Model":out.get("name",m),"MAPE":test_m,"MAE":mae_v,"RMSE":rmse_v,"Train_MAPE":train_m})
    lb = pd.DataFrame(results).sort_values("MAPE").reset_index(drop=True)
    return lb, models_run


# Assuming df is already loaded and basic_preprocess is defined
df_p, tgt = basic_preprocess(df)
features = [c for c in df_p.columns if c != tgt]
H_DEFAULT = 7 # Or the desired test horizon
leaderboard, models_run = quick_all_models(df_p, tgt, features, H_DEFAULT)

print("models_run populated. You can now run the plotting cell.")

  num = num.fillna(method='ffill').fillna(0)
  num = num.fillna(method='ffill').fillna(0)
  y_train = train[target].fillna(method='ffill').fillna(0)
  y_test  = test[target].fillna(method='ffill').fillna(0)


[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000206 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2040
[LightGBM] [Info] Number of data points in the train set: 1175, number of used features: 8
[LightGBM] [Info] Start training from score 55583.876484


  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  self._init_dates(dates, freq)
  return get_prediction_index(
  return get_prediction_index(
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 72ms/step
[1m37/37[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 


  super().__init__(**kwargs)


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 282ms/step
[1m37/37[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 7ms/step


  super().__init__(**kwargs)


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 152ms/step
[1m37/37[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step
models_run populated. You can now run the plotting cell.


===========================================================

LOCALTUNNEL code in case ngrok tunnels hit limit, fyi, the ip address thing is the password if someone wants to run and check

===========================================================

In [4]:
# Run this cell to launch the app
# Install localtunnel
!npm install -g localtunnel

# Start streamlit in the background and redirect output to a log file
!streamlit run app.py --server.port 8501 &>./streamlit.log & curl ipv4.icanhazip.com

# Expose the port using localtunnel
# The output will include your public URL.
!lt --port 8501


[1G[0K⠙[1G[0K⠹[1G[0K⠸[1G[0K⠼[1G[0K⠴[1G[0K⠦[1G[0K⠧[1G[0K⠇[1G[0K⠏[1G[0K⠋[1G[0K⠙[1G[0K⠹[1G[0K⠸[1G[0K⠼[1G[0K⠴[1G[0K⠦[1G[0K⠧[1G[0K⠇[1G[0K⠏[1G[0K⠋[1G[0K⠙[1G[0K⠹[1G[0K⠸[1G[0K⠼[1G[0K⠴[1G[0K⠦[1G[0K⠧[1G[0K⠇[1G[0K⠏[1G[0K⠋[1G[0K⠙[1G[0K⠹[1G[0K⠸[1G[0K⠼[1G[0K⠴[1G[0K⠦[1G[0K⠧[1G[0K⠇[1G[0K⠏[1G[0K⠋[1G[0K⠙[1G[0K⠹[1G[0K⠸[1G[0K
added 22 packages in 6s
[1G[0K⠸[1G[0K
[1G[0K⠸[1G[0K3 packages are looking for funding
[1G[0K⠸[1G[0K  run `npm fund` for details
[1G[0K⠸[1G[0K35.224.124.227
your url is: https://metal-bobcats-hug.loca.lt
^C
