##Model reminder

### NeuralProphet Forecasting Methods Guide

#### Model Strategy
- **Daily Model**: Forecast daily sales → aggregate to weeks/months
- **Weekly Model**: Forecast weekly sales → aggregate to months/quarters
- Maximum flexibility with daily base + validation from weekly model

### Forecasting Methods

#### 1. Simple .predict(test_df)
```python
forecast = m.predict(test_df)
plot_data = forecast[['ds', 'yhat1']].dropna()  # Use yhat1 column
```
- **Use when**: You have test data with known dates
- **Extract**: `yhat1` column (each prediction is 1-step ahead)
- **Best for**: Model validation on historical test periods

#### 2. make_future_dataframe() - Single Shot
```python
future_df = m.make_future_dataframe(train_df, periods=30, events_df=holidays_df)
forecast = m.predict(future_df)

# Extract diagonal values (Day 1=yhat1, Day 2=yhat2, etc.)
forecast_clean = extract_diagonal_forecast(forecast, periods=30)
```
- **Use when**: Forecasting into future, periods ≤ model's n_forecasts
- **Extract**: Diagonal approach (yhat1, yhat2, yhat3...)
- **Best for**: Short to medium-term forecasting within model's native horizon
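The `extract_diagonal_forecast` helper used above is not defined in this notebook. A minimal sketch of what it might look like follows, assuming the forecast frame has one future row per step and that row k carries its value in column `yhat<k>`. The function body and the demo frame are illustrative assumptions, not NeuralProphet API.

```python
import numpy as np
import pandas as pd


def extract_diagonal_forecast(forecast: pd.DataFrame, periods: int) -> pd.DataFrame:
    """Hypothetical helper: collapse yhat1..yhatN into one forecast column.

    Assumes the last `periods` rows are the future rows and that the k-th
    future row holds its prediction in column `yhat<k>`.
    """
    future = forecast.tail(periods).reset_index(drop=True)
    missing = [f"yhat{k}" for k in range(1, len(future) + 1) if f"yhat{k}" not in future.columns]
    if missing:
        raise ValueError(f"periods exceeds the available step columns: {missing}")
    # Walk the diagonal: row 0 -> yhat1, row 1 -> yhat2, ...
    values = [future.loc[i, f"yhat{i + 1}"] for i in range(len(future))]
    # The output column is named yhat1 to match the notebook's "clean format"
    return pd.DataFrame({"ds": future["ds"], "yhat1": values})


# Synthetic frame shaped like a 3-step forecast (values are made up)
demo = pd.DataFrame({
    "ds": pd.date_range("2025-01-01", periods=5, freq="D"),
    "yhat1": [np.nan, np.nan, 10.0, np.nan, np.nan],
    "yhat2": [np.nan, np.nan, np.nan, 11.0, np.nan],
    "yhat3": [np.nan, np.nan, np.nan, np.nan, 12.0],
})
clean = extract_diagonal_forecast(demo, periods=3)
print(clean)
```

Keeping the output column named `yhat1` lets the result be passed to the same plotting and evaluation code as the other two methods.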

#### 3. Recursive Function
```python
forecast = recursive_predict(m, train_df, forecast_periods=90, holidays_df=holidays_df)
plot_data = forecast[['ds', 'yhat1']]  # Already clean format
```
- **Use when**: Need to forecast beyond model's n_forecasts parameter
- **Extract**: Returns clean format with `ds` and `yhat1` columns
- **Best for**: Long-term forecasting (90+ days)

### Decision Tree

**Do you have test data with known dates?**
→ YES: Use `.predict(test_df)` + `yhat1`

**Forecasting into unknown future?**
→ Short term (≤ n_forecasts): Use `make_future_dataframe` + diagonal extraction
→ Long term (> n_forecasts): Use `recursive_predict` + `yhat1`
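The decision tree can also be written as a small function. This is only a sketch: `choose_forecast_method` and its return strings are hypothetical names used for illustration.

```python
def choose_forecast_method(has_test_data: bool, horizon: int, n_forecasts: int) -> str:
    """Hypothetical helper that mirrors the decision tree above."""
    if has_test_data:
        # Known dates with actuals: validate with 1-step-ahead predictions
        return "predict(test_df) + yhat1"
    if horizon <= n_forecasts:
        # Horizon fits inside the model's native forecast window
        return "make_future_dataframe + diagonal extraction"
    # Horizon exceeds the native window: feed forecasts back in as history
    return "recursive_predict + yhat1"


print(choose_forecast_method(has_test_data=False, horizon=90, n_forecasts=7))
```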

### Data Aggregation Examples

#### Daily to Weekly/Monthly
```python
# From daily forecast
weekly_agg = daily_forecast.set_index('ds').resample('W').sum().reset_index()
monthly_agg = daily_forecast.set_index('ds').resample('M').sum().reset_index()
```

#### Weekly to Monthly/Quarterly  
```python
# From weekly forecast
monthly_agg = weekly_forecast.set_index('ds').resample('M').sum().reset_index()
quarterly_agg = weekly_forecast.set_index('ds').resample('Q').sum().reset_index()
```
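One caveat with `resample(...).sum()`: a forecast that starts or ends mid-week produces partial bins whose totals look artificially low. The sketch below shows one possible way to keep only complete weeks. The `daily_forecast` frame is synthetic, and the 7-day completeness rule is an assumption that suits daily data without gaps.

```python
import pandas as pd

# Synthetic 30-day forecast starting on a Wednesday (values are made up)
daily_forecast = pd.DataFrame({
    "ds": pd.date_range("2025-01-01", periods=30, freq="D"),
    "yhat1": 1.0,
})

# Sum per week and count how many days fell into each weekly bin
weekly = (
    daily_forecast.set_index("ds")["yhat1"]
    .resample("W")
    .agg(["sum", "count"])
    .reset_index()
)

# Keep only weeks that contain all 7 days
complete_weeks = weekly[weekly["count"] == 7].drop(columns="count")
print(complete_weeks)
```

The same idea applies to monthly or quarterly bins by comparing the count with the number of days expected in each period.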

### Model Setup Template

#### Daily Model
```python
daily_params = {'n_lags': 30, 'quantiles': [0.05, 0.95], 'weekly_seasonality': True}
daily_model = NeuralProphet(**daily_params)
daily_model.add_events('Holiday')
```

#### Weekly Model  
```python
# Prepare weekly data
weekly_df = daily_df.set_index('ds').resample('W').sum().reset_index()
weekly_df.columns = ['ds', 'y']

weekly_params = {'n_lags': 8, 'yearly_seasonality': True}
weekly_model = NeuralProphet(**weekly_params)
```

### Key Reminders
- **yhat1 vs Diagonal**: Use yhat1 for sequential predictions, diagonal for multi-step from single point
- **Holidays**: Always include `holidays_df` parameter in forecasting functions
- **n_forecasts**: Check your model's native forecast horizon to choose method
- **Validation**: Compare daily→weekly vs weekly direct forecasts for accuracy
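The validation reminder can be sketched as a side-by-side error comparison. The example assumes both forecasts use the notebook's `ds`/`yhat1` layout and that weekly rows are labelled by week-ending Sunday, as `resample('W')` does. `compare_weekly_paths` and all demo values are hypothetical.

```python
import numpy as np
import pandas as pd


def compare_weekly_paths(daily_forecast, weekly_forecast, weekly_actuals):
    """Hypothetical helper: MAPE of aggregated-daily vs direct-weekly forecasts."""
    daily_agg = daily_forecast.set_index("ds")["yhat1"].resample("W").sum().rename("daily_agg")
    weekly_direct = weekly_forecast.set_index("ds")["yhat1"].rename("weekly_direct")
    # Inner joins keep only the weeks present in all three inputs
    merged = (
        weekly_actuals.set_index("ds")[["y"]]
        .join(daily_agg, how="inner")
        .join(weekly_direct, how="inner")
    )

    def mape(pred):
        return float(np.mean(np.abs((merged["y"] - pred) / merged["y"])))

    return {"daily_agg": mape(merged["daily_agg"]), "weekly_direct": mape(merged["weekly_direct"])}


# Two full weeks of made-up data, Monday 2025-01-06 to Sunday 2025-01-19
daily_forecast = pd.DataFrame({"ds": pd.date_range("2025-01-06", periods=14, freq="D"), "yhat1": 10.0})
week_ends = pd.to_datetime(["2025-01-12", "2025-01-19"])
weekly_forecast = pd.DataFrame({"ds": week_ends, "yhat1": [63.0, 100.0]})
weekly_actuals = pd.DataFrame({"ds": week_ends, "y": [70.0, 100.0]})

scores = compare_weekly_paths(daily_forecast, weekly_forecast, weekly_actuals)
print(scores)
```

A lower MAPE on one path does not prove that model is better in general. It only indicates which path tracked these particular weeks more closely.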

##Imports

In [0]:
from neuralprophet import NeuralProphet, set_log_level
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
import warnings
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
import plotly.graph_objects as go
from copy import deepcopy
warnings.filterwarnings("ignore", category=FutureWarning)
set_log_level("ERROR")

##Model Eval

In [0]:
def model_testing(test_df, test_forecast):
  # Compare the actuals with the 1-step-ahead forecast column
  rmse = np.sqrt(mean_squared_error(y_true=test_df['y'], y_pred=test_forecast['yhat1']))
  mae = mean_absolute_error(y_true=test_df['y'], y_pred=test_forecast['yhat1'])
  mape = mean_absolute_percentage_error(y_true=test_df['y'], y_pred=test_forecast['yhat1'])

  # The first metric is the square root of the MSE, so label it as RMSE
  print('Root Mean Squared Error:', rmse)
  print('Mean Absolute Error:', mae)
  print('Mean Absolute Percentage Error:', mape)

##Import dataframe

In [0]:
df = pd.read_csv('/Workspace/Repos/ryan@delve.systems/Prophet_AI/Wellness_Sales_Quantity_Grouped.csv')

df.rename(columns = {'trandate':'ds', 'AvgSale':'y'}, inplace = True)
df.tail()

In [0]:


fig = go.Figure()

# Add actual data
fig.add_trace(go.Scatter(x=df['ds'], y=df['y'], mode='lines', name='Actual'))

# Update layout
fig.update_layout(
    title='Actuals',
    xaxis_title='Date',
    yaxis_title='Value'
)

# Show the plot
fig.show()

##Data quality checks

###Consistency

In [0]:
spark_df = spark.createDataFrame(df)
consistency_check = spark_df.groupby('ds').count().orderBy('count')
# display() renders the table and returns None, so call it directly instead of printing it
consistency_check.display()

# A date that appears more than once indicates duplicate records
if consistency_check.filter(consistency_check['count'] > 1).count() > 0:
    print("Data consistency check failed: There are multiple records for the same date.")
else:
    print("Data consistency check passed: All dates are unique.")

###Accuracy check

In [0]:
accuracy_check_expression = "y < 0 OR y > 12000"

accuracy_check_result = spark_df.filter(accuracy_check_expression).count()

if accuracy_check_result > 0:
    print(f"Accuracy check failed: {accuracy_check_result} records have values outside the range [0, 12000].")
else:
    print("Accuracy check passed: All records have values within the range [0, 12000].")

spark_df[spark_df['y']>12000].display()

###Outliers

In [0]:
from pyspark.sql import functions as F

# Detect outliers using z-score
mean = spark_df.select(F.mean('y')).collect()[0][0]
std = spark_df.select(F.stddev('y')).collect()[0][0]
z_score_threshold = 5

# Calculate z-score for each record 
df_with_z_score = spark_df.withColumn('z_score', (F.col("y")-mean)/std)

# Split the records into outliers and cleaned data
outliers = df_with_z_score.filter(~F.col('z_score').between(-z_score_threshold, z_score_threshold))
cleaned_df = df_with_z_score.filter(F.col("z_score").between(-z_score_threshold, z_score_threshold))

# Mark as outliers
df_with_outlier = df_with_z_score.withColumn("_outlier",
    F.when(
        (F.col("z_score") < -z_score_threshold) |
        (F.col("z_score") > z_score_threshold), 1
    ).otherwise(0))
print(f"With outliers - count: {spark_df.count()}")
print(f"y - mean: {mean}, stddev: {std}, z_score_threshold: {z_score_threshold}")
print(f"Without outliers - count: {cleaned_df.count()}")
print(f"Outliers - count: {outliers.count()}")
print("Outliers:")
outliers.display()

###Normalizing

In [0]:
min_max_values = cleaned_df.select( F.min(F.col('y')).alias("min_y"),
                                   F.max(F.col('y')).alias("max_y")).collect()[0]

# Normalize the columns
min_value = min_max_values["min_y"]
max_value = min_max_values["max_y"]

normalized_df = cleaned_df.withColumn('normalized_y', (F.col('y') - min_value) / (max_value - min_value))
normalized_df.display()

###Standardizing

In [0]:
#Standardizing the dataset:
#NeuralProphet uses gradient descent under the hood (like a neural network), and scaling helps with faster convergence and better stability.
#Especially helpful if your 'y' values are large (e.g. in the thousands or more), or have seasonal variations on different scales.

df_cleaned = cleaned_df.toPandas()

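# np.log requires y > 0; zero sales would become -inf
log_col = np.log(df_cleaned['y'])
mean_value = log_col.mean()
std_value = log_col.std()

# Keep the statistics so forecasts can be mapped back to the original scale
log_stats = {'mean': mean_value, 'std': std_value}
df_cleaned['standardized_y'] = (log_col - mean_value) / std_value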
df_cleaned['ds'] = pd.to_datetime(df_cleaned['ds'])
df_cleaned.head()
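If a model is trained on `standardized_y`, its forecasts come back on the standardized log scale and need to be mapped back before they are compared with sales. The sketch below shows the forward and inverse transforms, assuming a statistics dictionary shaped like `log_stats`. The function names and demo values are hypothetical.

```python
import numpy as np
import pandas as pd


def to_log_standardized(y, stats):
    """Forward transform: log, then z-score with the stored statistics."""
    return (np.log(y) - stats["mean"]) / stats["std"]


def from_log_standardized(z, stats):
    """Inverse transform: undo the z-score, then exponentiate."""
    return np.exp(z * stats["std"] + stats["mean"])


# Illustrative values only; real statistics come from the training data
y = pd.Series([100.0, 200.0, 400.0])
log_y = np.log(y)
demo_stats = {"mean": log_y.mean(), "std": log_y.std()}

z = to_log_standardized(y, demo_stats)
restored = from_log_standardized(z, demo_stats)
print(restored)
```

The statistics should come from the training data only, so that the test period does not leak into the scaling.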

###Data Profile with Ydata

In [0]:
from ydata_profiling import ProfileReport

profile = ProfileReport(df_cleaned, title='Time Series Data Profiling',tsmode=True,sortby='ds',infer_dtypes=False,interactions=None,missing_diagrams=None,
                        correlations={"auto":{"calculate":False},
                                      "pearson": {"calculate": False},
                                      "spearman": {"calculate": False}})

profile.to_file("TimeSeriesProfiling.html")
report_html = profile.to_html()
displayHTML(report_html)

###Check for Stationarity

In [0]:
from statsmodels.tsa.stattools import adfuller
result = adfuller(df_cleaned['y'])
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')

if result[1] < 0.05:
    print("Reject the null hypothesis. The time series is stationary.")
else:
    print("Fail to reject the null hypothesis. The time series is non-stationary")

In [0]:
# First difference; the leading NaN is dropped when the test is run below
df_cleaned['y_diff'] = df_cleaned['y'].diff()
result = adfuller(df_cleaned['y_diff'].dropna())
print(f'ADF Statistic (1st diff): {result[0]}')
print(f'p-value: {result[1]}')


###Skewness and Kurtosis

In [0]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assume df_cleaned is already a Pandas DataFrame and has a 'ds' column (datetime)
df_cleaned['DayOfWeek'] = pd.to_datetime(df_cleaned['ds']).dt.dayofweek + 1  # 1 = Monday, 7 = Sunday

# Plot the distribution
sns.histplot(df_cleaned['DayOfWeek'], kde=True, bins=7)
plt.xlabel("Day of Week (1=Monday, 7=Sunday)")
plt.title("Distribution of Day of Week")
plt.show()



In [0]:
sns.boxplot(x=df_cleaned['DayOfWeek'], y=df_cleaned['y'])
plt.xlabel("Day of Week (1=Monday, 7=Sunday)")
plt.ylabel("y")
plt.title("Boxplot of y by Day")

###Autocorrelation and partial correlation

In [0]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

acf = plot_acf(df_cleaned['y'], lags=3*24)
pacf = plot_pacf(df_cleaned['y'], lags=3*24)
plt.show()

###Seasonality

In [0]:
from statsmodels.tsa.seasonal import seasonal_decompose

# Ensure 'ds' is datetime and 'y' is your target column
# df_cleaned['ds'] = pd.to_datetime(df_cleaned['ds'])
# df_cleaned.set_index('ds', inplace=True)

# Decompose 'y' with a 30-observation period (roughly monthly on daily data);
# use period=7 instead to isolate the weekly pattern
results = seasonal_decompose(df_cleaned['y'], period=30)

# Plot the result
results.plot()
plt.tight_layout()
plt.show()

##Create Train/Test Split

In [0]:
# Make sure ds is datetime on the frame that is actually split
df_cleaned['ds'] = pd.to_datetime(df_cleaned['ds'])

# Split by date
split_date = df_cleaned['ds'].max() - pd.Timedelta(days=90)

train_df = df_cleaned[df_cleaned['ds'] < split_date]
test_df = df_cleaned[df_cleaned['ds'] >= split_date]

In [0]:
train_df.head()

#SARIMA 

In [0]:
import pmdarima as pm

model = pm.auto_arima(train_df['y'],
                      seasonal=True,
                      m=7,
                      d=0,
                      D=1,
                      max_p=3,
                      max_q=3,
                      max_P=3,
                      max_Q=3,
                      information_criterion='aic',
                      trace=True,
                      stepwise=True,
                      error_action='ignore',
                      suppress_warnings=True)

print(model.summary())

In [0]:
def forecast_step ():
  forecast, conf_int = model.predict(n_periods=1,
                                    return_conf_int=True)
  return (
        forecast.tolist()[0],  # Convert forecast to scalar
        np.asarray(conf_int).tolist()[0])  # Convert confidence interval to list
  
  
  
forecasts = []  # Store forecasts
conf_intervals = []  # Store confidence intervals

# Iterate over each observation in the test dataset
for obs in test_df['y']:
    forecast, conf_int = forecast_step()  # Forecast next step
    forecasts.append(forecast)  # Append forecast to list
    conf_intervals.append(conf_int)  # Append confidence interval to list

    # Update the model with the new observation
    model.update(obs)

In [0]:
from sklearn.metrics import mean_squared_error
from pmdarima.metrics import smape
# Calculate and print the mean squared error of the forecasts
print(f"Mean squared error: {mean_squared_error(test_df['y'], forecasts)}")
# Calculate and print the Symmetric Mean Absolute Percentage Error (SMAPE)
print(f"SMAPE: {smape(test_df['y'], forecasts)}")

In [0]:
#Naive forecast means simply that yesterday = today
naive_forecast = test_df['y'].shift(1)
mse_naive = mean_squared_error(test_df['y'][1:], naive_forecast[1:])

print(f"Naive forecast MSE: {mse_naive}")

In [0]:
# Set the figure size for the plot
plt.figure(figsize=(10, 6))
# Plot the test data against its dates
plt.plot(test_df['ds'], test_df['y'], label='Test')
# Plot the forecasted values
plt.plot(test_df['ds'], forecasts, label='Forecast')
# Label the x-axis as 'Date'
plt.xlabel('Date')
# Label the y-axis as 'y'
plt.ylabel('y')
# Set the title of the plot
plt.title('SARIMA Forecast vs Actuals')
# Display the legend
plt.legend()
# Show the plot
plt.show()

#Neural Prophet

##Build and train base model

In [0]:
m_base = NeuralProphet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
)

metrics_base = m_base.fit(train_df)
base_forecast = m_base.predict(test_df)
base_forecast.head()

In [0]:
fig, ax = plt.subplots(figsize=(10, 6))

# Plotting the test_df (actual values)
#ax.plot(test_df['ds'], test_df['y'], label='Actual', color='blue')

# Plotting the base_forecast (predicted values)
ax.plot(base_forecast['ds'], base_forecast['yhat1'], label='Forecast', color='red')
ax.plot(base_forecast['ds'], base_forecast['y'], label='Actual', color='blue')

# Adding labels and title
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.set_title('Actual vs. Forecasted Values')
ax.legend()

# Display the plot
plt.show()

In [0]:
model_testing(test_df,base_forecast)

##Build and train model with holidays

In [0]:
holidays_df = pd.DataFrame({
    "event": 'Holiday',
    "ds": pd.to_datetime([
        "2023-03-21",
        "2023-04-20",
        "2023-04-21",
        "2023-04-27",
        "2023-05-01",
        "2023-06-28",
        "2023-06-29",
        "2023-09-25",
        "2023-12-01",
        "2023-12-16",
        "2023-12-25",
        "2023-12-26",
        "2024-01-01",
        "2024-03-21",
        "2024-03-29",
        "2024-04-01",
        "2024-04-27",
        "2024-05-01",
        "2024-06-16",
        "2024-06-17",
        "2024-08-09",
        "2024-09-24",
        "2024-12-16",
        "2024-12-25",
        "2024-12-26",
        "2025-01-01",  # New Year’s Day
        "2025-03-21",  # Human Rights Day
        "2025-04-18",  # Good Friday
        "2025-04-21",  # Family Day
        "2025-04-27",  # Freedom Day
        "2025-04-28",  # Freedom Day Observed
        "2025-05-01",  # Workers' Day
        "2025-06-16",  # Youth Day
        "2025-08-09",  # National Women’s Day
        "2025-09-24",  # Heritage Day
        "2025-12-16",  # Day of Reconciliation
        "2025-12-25",  # Christmas Day
        "2025-12-26"   # Day of Goodwill
    ]),
    "lower_window": 0,
    "upper_window": 0
})

holidays_df.head()



In [0]:
m_holidays = NeuralProphet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
    seasonality_mode='additive',
)

m_holidays.add_events('Holiday')

# Create a combined dataframe with events
df_with_holidays = m_holidays.create_df_with_events(train_df, holidays_df)

# Fit the model with the combined dataframe
metrics_holidays = m_holidays.fit(df_with_holidays)

# For prediction, you need to include events in the test period too
# First, filter holidays that fall within the test period
test_start = test_df['ds'].min()
test_end = test_df['ds'].max()
test_holidays = holidays_df[(holidays_df['ds'] >= test_start) & (holidays_df['ds'] <= test_end)]

# Create test dataframe with events
test_with_holidays = m_holidays.create_df_with_events(test_df, test_holidays)

# Make predictions
base_forecast_holidays = m_holidays.predict(test_with_holidays)
base_forecast_holidays.head()

In [0]:
fig, ax = plt.subplots(figsize=(10, 6))

# Plotting the test_df (actual values)
ax.plot(test_df['ds'], test_df['y'], label='Actual', color='blue')

# Plotting the base_forecast (predicted values)
ax.plot(base_forecast_holidays['ds'], base_forecast_holidays['yhat1'], label='Forecast', color='red')
# Adding labels and title
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.set_title('Actual vs. Forecasted Values')
ax.legend()

# Display the plot
plt.show()

In [0]:
model_testing(test_df[0:89], base_forecast_holidays[0:89])

##Build and train model with autoregression

In [0]:
train_df.head()

#train_df_standardized = train_df[['ds','standardized_y']]
#train_df = train_df[['ds','y']]
test_df = test_df[['ds','y']]


###7 Day Forecast model

In [0]:
train_df = train_df [['ds','y']]
test_df = test_df[['ds','y']]

In [0]:
# Create a NeuralProphet model with autoregression (an earlier parameter set is kept below for reference)
# params = {
#     'n_lags': 30,
#     'n_forecasts': 7,
#     'ar_reg': 0.18351866767708624,
#     'seasonality_mode': 'additive',
#     'yearly_seasonality': True,
#     'weekly_seasonality': True,
#     'daily_seasonality': False,
# }

params = {'n_lags': 30,
          'n_forecasts': 7,
          'quantiles': [0.05, 0.95],
          'weekly_seasonality': True}

m = NeuralProphet(**params)
m.add_events('Holiday')

# Create a combined dataframe with events
train_with_holidays = m.create_df_with_events(train_df, holidays_df)

# Fit the model with the combined dataframe
metrics = m.fit(train_with_holidays)

#Filter holidays that fall within the test period
test_start = test_df['ds'].min()
test_end = test_df['ds'].max()
test_holidays = holidays_df[(holidays_df['ds'] >= test_start) & (holidays_df['ds'] <= test_end)]

# Create test dataframe with events
test_with_holidays = m.create_df_with_events(test_df, test_holidays)

# Make predictions for the duration of the test_with_holidays df
forecast_holidays = m.predict(test_with_holidays)
forecast_holidays.head()

In [0]:
future_df = m.make_future_dataframe(train_with_holidays, periods=7, events_df=holidays_df)
forecast = m.predict(future_df)
forecast.head()

In [0]:
# Collapse the yhat columns: for each row, keep the first available step forecast
# (yhat1 where present, otherwise yhat2, yhat3, ...). Rows without any forecast,
# typically the first n_lags rows, are skipped.
forecast_values = []
dates = []

# Point-forecast columns only (yhat1, yhat2, ...); quantile columns are excluded
yhat_cols = [col for col in forecast_holidays.columns if col.startswith('yhat') and col[4:].replace('.', '').isdigit()]

for i, row in forecast_holidays.iterrows():
    for col in yhat_cols:
        if pd.notna(row[col]):
            forecast_values.append(row[col])
            dates.append(row['ds'])
            break  # Use the first available forecast

# Create clean forecast dataframe
forecast_clean = pd.DataFrame({
    'ds': dates,
    'yhat1': forecast_values
})

# Plot
plt.figure(figsize=(12, 6))
plt.plot(test_df['ds'], test_df['y'], label='Actual', color='blue')
plt.plot(forecast_clean['ds'], forecast_clean['yhat1'], label='Forecast', color='red')
plt.title('Actual vs. Forecasted Values')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()

In [0]:
model_testing(test_df[-61: ], forecast_clean)

###30 Day Forecast model

In [0]:
# Split by date
split_date = df_cleaned['ds'].max() - pd.Timedelta(days=120)

train_df = df_cleaned[df_cleaned['ds'] < split_date]
test_df = df_cleaned[df_cleaned['ds'] >= split_date]

train_df = train_df[['ds','y']]
test_df = test_df[['ds','y']]

In [0]:
params = {'n_lags':90,
          'n_forecasts':30,
        'quantiles':[0.05,0.95],
        'weekly_seasonality': True}

m_30 = NeuralProphet(**params)
m_30.add_events('Holiday')

# Create a combined dataframe with events
train_with_holidays = m_30.create_df_with_events(train_df, holidays_df)

# Fit the model with the combined dataframe
metrics = m_30.fit(train_with_holidays)

#Filter holidays that fall within the test period
test_start = test_df['ds'].min()
test_end = test_df['ds'].max()
test_holidays = holidays_df[(holidays_df['ds'] >= test_start) & (holidays_df['ds'] <= test_end)]

# Create test dataframe with events
test_with_holidays = m_30.create_df_with_events(test_df, test_holidays)

# Make predictions for the duration of the test_with_holidays df
forecast_holidays = m_30.predict(test_with_holidays)
forecast_holidays.head()

In [0]:
# Use the 30-day model here, with a horizon that matches its n_forecasts
future_df = m_30.make_future_dataframe(train_with_holidays, periods=30, events_df=holidays_df)
forecast = m_30.predict(future_df)
forecast.head()

In [0]:
# Collapse the yhat columns: for each row, keep the first available step forecast
# (yhat1 where present, otherwise yhat2, yhat3, ...). Rows without any forecast,
# typically the first n_lags rows, are skipped.
forecast_values = []
dates = []

# Point-forecast columns only (yhat1, yhat2, ...); quantile columns are excluded
yhat_cols = [col for col in forecast_holidays.columns if col.startswith('yhat') and col[4:].replace('.', '').isdigit()]

for i, row in forecast_holidays.iterrows():
    for col in yhat_cols:
        if pd.notna(row[col]):
            forecast_values.append(row[col])
            dates.append(row['ds'])
            break  # Use the first available forecast

# Create clean forecast dataframe
forecast_clean = pd.DataFrame({
    'ds': dates,
    'yhat1': forecast_values
})

# Plot
plt.figure(figsize=(12, 6))
#plt.plot(test_df['ds'][-31:], test_df['y'][-31:], label='Actual', color='blue')
plt.plot(forecast_clean['ds'], forecast_clean['yhat1'], label='Forecast', color='red')
plt.title('Actual vs. Forecasted Values')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()

In [0]:
model_testing(test_df[-31:], forecast_clean)

###Recursive prediction

In [0]:
def recursive_predict_old(model, history_df, forecast_periods, n_forecasts=None):
    """
    Recursively extend a NeuralProphet forecast beyond the native forecast horizon.

    Args:
        model: Trained NeuralProphet model.
        history_df (pd.DataFrame): The historical training data with columns ['ds', 'y'].
        forecast_periods (int): Total number of days to forecast into the future.
        n_forecasts (int, optional): Number of steps forecasted per iteration.
                                    If None, uses model.n_forecasts.

    Returns:
        pd.DataFrame: DataFrame with dates and forecasted values.
    """
    # Use model's n_forecasts if not specified
    if n_forecasts is None:
        # n_forecasts = params['n_forecasts']
        n_forecasts = model.n_forecasts

    # Create a copy of history to avoid modifying the original
    df_history = deepcopy(history_df)

    # Create a dataframe to store all forecasts
    all_forecasts = pd.DataFrame(columns=['ds', 'yhat1'])

    # Calculate how many iterations we need
    remaining_periods = forecast_periods

    while remaining_periods > 0:
        print(f"Forecasting {min(n_forecasts, remaining_periods)} days ahead...")

        # Generate future dataframe
        periods = min(n_forecasts, remaining_periods)
        df_future = model.make_future_dataframe(
            df=df_history,
            periods=periods,
            n_historic_predictions=False,
            events_df=holidays_df  # relies on the notebook-level holidays_df
        )

        # Predict
        df_forecast = model.predict(df_future)

        if df_forecast is None or df_forecast.empty:
            print(f"Warning: Empty forecast")
            break

        # Check available forecast columns
        forecast_cols = [col for col in df_forecast.columns if col.startswith('yhat')]
        print(f"Available forecast columns: {forecast_cols}")

        # Add forecasts to our collection
        forecast_rows = []
        new_history_rows = []

        # If we only have yhat1 column but need to forecast multiple days
        if len(forecast_cols) == 1 and forecast_cols[0] == 'yhat1':
            # Use each row's yhat1 value
            for i in range(len(df_forecast)):
                forecast_row = {
                    'ds': df_forecast['ds'].iloc[i],
                    'yhat1': df_forecast['yhat1'].iloc[i]
                }
                forecast_rows.append(forecast_row)

                # Also add to history for next iteration
                new_history_row = {
                    'ds': df_forecast['ds'].iloc[i],
                    'y': df_forecast['yhat1'].iloc[i]
                }
                new_history_rows.append(new_history_row)
        else:
            # Handle multi-column case (yhat1, yhat2, etc.)
            for j in range(min(len(forecast_cols), len(df_forecast))):
                if j < len(df_forecast):
                    yhat_col = forecast_cols[j]
                    forecast_row = {
                        'ds': df_forecast['ds'].iloc[j],
                        'yhat1': df_forecast[yhat_col].iloc[j]
                    }
                    forecast_rows.append(forecast_row)

                    new_history_row = {
                        'ds': df_forecast['ds'].iloc[j],
                        'y': df_forecast[yhat_col].iloc[j]
                    }
                    new_history_rows.append(new_history_row)

        # Add to forecasts
        if forecast_rows:
            all_forecasts = pd.concat([all_forecasts, pd.DataFrame(forecast_rows)], ignore_index=True)

            # Add to history for next iteration
            if new_history_rows:
                df_append = pd.DataFrame(new_history_rows)
                df_history = pd.concat([df_history, df_append], ignore_index=True)

                # Update remaining periods
                remaining_periods -= len(new_history_rows)
            else:
                print(f"Warning: No valid forecasts generated")
                break
        else:
            print(f"Warning: No valid forecasts generated")
            break

    # Limit to requested forecast periods
    all_forecasts = all_forecasts.head(forecast_periods)

    if len(all_forecasts) == 0:
        print("No forecasts generated.")
        return None

    return all_forecasts

In [0]:
import pandas as pd
from copy import deepcopy
import warnings

def recursive_predict(model, history_df, forecast_periods, n_forecasts=None, 
                     holidays_df=None, max_history_length=None):
    """
    Recursively extend a NeuralProphet forecast beyond the native forecast horizon.

    Args:
        model: Trained NeuralProphet model.
        history_df (pd.DataFrame): The historical training data with columns ['ds', 'y'].
        forecast_periods (int): Total number of days to forecast into the future.
        n_forecasts (int, optional): Number of steps forecasted per iteration.
                                    If None, uses model.n_forecasts.
        holidays_df (pd.DataFrame, optional): Holiday events dataframe.
        max_history_length (int, optional): Maximum length of history to keep for memory efficiency.

    Returns:
        pd.DataFrame: DataFrame with dates and forecasted values.
        
    Raises:
        ValueError: If inputs are invalid.
        RuntimeError: If forecasting fails.
    """
    
    # Input validation
    if model is None:
        raise ValueError("Model cannot be None")
    
    if history_df is None or history_df.empty:
        raise ValueError("history_df cannot be None or empty")
    
    required_cols = ['ds', 'y']
    if not all(col in history_df.columns for col in required_cols):
        raise ValueError(f"history_df must contain columns: {required_cols}")
    
    if forecast_periods <= 0:
        raise ValueError("forecast_periods must be positive")
    
    # Check for NaN values
    if history_df[required_cols].isnull().any().any():
        warnings.warn("history_df contains NaN values, which may affect forecast quality")
    
    # Use model's n_forecasts if not specified
    if n_forecasts is None:
        # getattr with a default never raises, so check for a missing value explicitly
        n_forecasts = getattr(model, 'n_forecasts', None)
        if n_forecasts is None:
            n_forecasts = 1
            warnings.warn("Could not determine model.n_forecasts, using 1")
    
    if n_forecasts <= 0:
        raise ValueError("n_forecasts must be positive")
    
    # Create a copy (assuming data is already clean and sorted)
    df_history = deepcopy(history_df)
    
    # Pre-allocate list for efficiency
    all_forecasts = []
    remaining_periods = forecast_periods
    iteration = 0
    max_iterations = forecast_periods  # Prevent infinite loops
    
    try:
        while remaining_periods > 0 and iteration < max_iterations:
            iteration += 1
            periods_to_forecast = min(n_forecasts, remaining_periods)
            
            print(f"Iteration {iteration}: Forecasting {periods_to_forecast} periods ahead...")
            
            # Generate future dataframe
            try:
                df_future = model.make_future_dataframe(
                    df=df_history,
                    periods=periods_to_forecast,
                    n_historic_predictions=False,
                    events_df=holidays_df
                )
            except Exception as e:
                raise RuntimeError(f"Failed to create future dataframe: {str(e)}")
            
            # Predict
            try:
                df_forecast = model.predict(df_future)
            except Exception as e:
                raise RuntimeError(f"Prediction failed: {str(e)}")
            
            if df_forecast is None or df_forecast.empty:
                warnings.warn(f"Empty forecast at iteration {iteration}")
                break
            
            # Extract forecast values more robustly
            forecast_data = extract_forecast_values(df_forecast, periods_to_forecast)
            
            if not forecast_data:
                warnings.warn(f"No valid forecast data extracted at iteration {iteration}")
                break
            
            # Add to results
            all_forecasts.extend(forecast_data)
            
            # Update history for next iteration
            new_history_rows = [
                {'ds': row['ds'], 'y': row['yhat1']} 
                for row in forecast_data
            ]
            
            if new_history_rows:
                df_new_history = pd.DataFrame(new_history_rows)
                df_history = pd.concat([df_history, df_new_history], ignore_index=True)
                
                # Trim history if specified to manage memory
                if max_history_length and len(df_history) > max_history_length:
                    df_history = df_history.tail(max_history_length).reset_index(drop=True)
                
                remaining_periods -= len(new_history_rows)
            else:
                break
                
    except Exception as e:
        raise RuntimeError(f"Recursive forecasting failed: {str(e)}")
    
    # Convert to DataFrame and limit to requested periods
    if not all_forecasts:
        warnings.warn("No forecasts generated")
        return pd.DataFrame(columns=['ds', 'yhat1'])
    
    result_df = pd.DataFrame(all_forecasts)
    result_df = result_df.head(forecast_periods)
    
    # Ensure proper data types
    result_df['ds'] = pd.to_datetime(result_df['ds'])
    result_df['yhat1'] = pd.to_numeric(result_df['yhat1'], errors='coerce')
    
    return result_df


def extract_forecast_values(df_forecast, expected_periods):
    """
    Extract forecast values from prediction DataFrame more robustly.
    
    Args:
        df_forecast (pd.DataFrame): Forecast results from model.predict()
        expected_periods (int): Expected number of forecast periods
        
    Returns:
        list: List of dictionaries with 'ds' and 'yhat1' keys
    """
    forecast_data = []
    
    # Find available forecast columns
    yhat_cols = [col for col in df_forecast.columns if col.startswith('yhat')]
    
    if not yhat_cols:
        warnings.warn("No yhat columns found in forecast")
        return forecast_data
    
    # Get the last N rows (future predictions)
    future_rows = df_forecast.tail(expected_periods)
    
    for _, row in future_rows.iterrows():
        # Use the first available yhat column or yhat1 specifically
        yhat_value = None
        
        if 'yhat1' in row and pd.notna(row['yhat1']):
            yhat_value = row['yhat1']
        elif yhat_cols and pd.notna(row[yhat_cols[0]]):
            yhat_value = row[yhat_cols[0]]
        
        if yhat_value is not None and pd.notna(row['ds']):
            forecast_data.append({
                'ds': row['ds'],
                'yhat1': float(yhat_value)
            })
    
    return forecast_data
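
The helper above reads only `yhat1` from the tail rows. For a single-shot forecast with `n_forecasts > 1`, the guide at the top of this notebook describes a diagonal layout instead (Day 1 = `yhat1`, Day 2 = `yhat2`, and so on). A minimal sketch of that extraction, assuming this layout and plain `yhat1..yhatN` column names. `extract_diagonal` is a hypothetical helper name and the frame below is synthetic, not model output:

```python
import numpy as np
import pandas as pd


def extract_diagonal(df_forecast, periods):
    """Take yhat1 from the 1st future row, yhat2 from the 2nd, and so on."""
    future_rows = df_forecast.tail(periods).reset_index(drop=True)
    records = []
    for step, row in future_rows.iterrows():
        col = f"yhat{step + 1}"
        # Skip steps beyond the available columns or with missing values
        if col in future_rows.columns and pd.notna(row[col]):
            records.append({"ds": row["ds"], "yhat": float(row[col])})
    return pd.DataFrame(records, columns=["ds", "yhat"])


# Synthetic 3-step forecast: only the diagonal cells are populated
demo = pd.DataFrame({
    "ds": pd.date_range("2025-01-01", periods=3, freq="D"),
    "yhat1": [10.0, np.nan, np.nan],
    "yhat2": [np.nan, 11.0, np.nan],
    "yhat3": [np.nan, np.nan, 12.0],
})
diagonal = extract_diagonal(demo, periods=3)
print(diagonal)
```

If the real forecast frame also contains quantile columns, they would need to be excluded before applying this kind of lookup.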

In [0]:
forecast_30day = recursive_predict(m, train_df, forecast_periods=30)
print(forecast_30day)

In [0]:
plt.figure(figsize=(10, 6))
plt.plot(forecast_30day['ds'], forecast_30day['yhat1'], label='Forecast', color='orange')
plt.plot(test_df['ds'], test_df['y'], label='Actual', color='blue')
plt.xlabel('Date')
plt.ylabel('Forecasted Value')
plt.title('30 Day Forecasted Timeline')
plt.legend()
plt.grid(True)
plt.tight_layout()



In [0]:
model_testing(test_df[0:30], forecast_30day[0:30])
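
`model_testing` is defined elsewhere in the notebook. As a rough illustration of what such a comparison can compute, the sketch below aligns actuals and forecasts on `ds` and reports MAE, RMSE and MAPE. `score_forecast` is a hypothetical name, not the notebook's function, and the numbers are made up:

```python
import numpy as np
import pandas as pd


def score_forecast(actual_df, forecast_df):
    """Join on 'ds' and compute basic error metrics for y vs. yhat1."""
    merged = actual_df[["ds", "y"]].merge(
        forecast_df[["ds", "yhat1"]], on="ds", how="inner"
    )
    err = merged["y"] - merged["yhat1"]
    nonzero = merged["y"] != 0  # MAPE is undefined where y == 0
    return {
        "n": len(merged),
        "mae": float(err.abs().mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
        "mape": float((err.abs() / merged["y"].abs())[nonzero].mean() * 100),
    }


dates = pd.date_range("2025-01-01", periods=4, freq="D")
actual = pd.DataFrame({"ds": dates, "y": [100.0, 110.0, 90.0, 100.0]})
predicted = pd.DataFrame({"ds": dates, "yhat1": [90.0, 110.0, 100.0, 110.0]})
scores = score_forecast(actual, predicted)
print(scores)
```

Joining on `ds` rather than slicing both frames by position guards against the two frames starting on different dates.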

In [0]:
%restart_python

#Exporting the model

##Export the model

In [0]:
import mlflow
import mlflow.pyfunc
import pandas as pd
import pickle
import os
from mlflow.tracking import MlflowClient

# Create directory if it doesn't exist
os.makedirs("/dbfs/tmp/", exist_ok=True)

class NeuralProphetWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, model=None):
        self.model = model
        
    def load_context(self, context):
        # Load the pickled model
        with open(context.artifacts["model_path"], "rb") as f:
            self.model = pickle.load(f)
            
        # Load holidays if they exist
        if "holidays_path" in context.artifacts:
            with open(context.artifacts["holidays_path"], "rb") as f:
                self.holidays_df = pickle.load(f)
        else:
            self.holidays_df = None
            
    def predict_old(self, context, model_input):
        # Convert to DataFrame if it's not already
        if not isinstance(model_input, pd.DataFrame):
            model_input = pd.DataFrame(model_input)
            
        # Ensure 'ds' column exists and is datetime
        if 'ds' not in model_input.columns:
            raise ValueError("Input must contain 'ds' column with dates")
        
        model_input['ds'] = pd.to_datetime(model_input['ds'])
        
        # For models with holidays, add holiday events
        if hasattr(self, 'holidays_df') and self.holidays_df is not None:
            # Filter holidays for the prediction period
            pred_start = model_input['ds'].min()
            pred_end = model_input['ds'].max()
            relevant_holidays = self.holidays_df[
                (self.holidays_df['ds'] >= pred_start) & 
                (self.holidays_df['ds'] <= pred_end)
            ]
            
            if not relevant_holidays.empty:
                # Add events to the dataframe
                model_input = self.model.create_df_with_events(model_input, relevant_holidays)
        
        # Make prediction
        try:
            forecast = self.model.predict(model_input)
            return forecast
        except Exception as e:
            print(f"Prediction error: {e}")
            # Re-raise so callers do not silently receive None
            raise

    def predict(self, context, model_input):
        # Convert to DataFrame if it's not already
        if not isinstance(model_input, pd.DataFrame):
            model_input = pd.DataFrame(model_input)
            
        # Ensure 'ds' column exists and is datetime
        if 'ds' not in model_input.columns:
            raise ValueError("Input must contain 'ds' column with dates")
        
        model_input['ds'] = pd.to_datetime(model_input['ds'])
        
        # Check if this is historical data (has 'y' column) or future data (no 'y' column)
        has_y_values = 'y' in model_input.columns and model_input['y'].notna().any()
        
        # For models with holidays, add holiday events
        if hasattr(self, 'holidays_df') and self.holidays_df is not None:
            # Filter holidays for the prediction period
            pred_start = model_input['ds'].min()
            pred_end = model_input['ds'].max()
            relevant_holidays = self.holidays_df[
                (self.holidays_df['ds'] >= pred_start) & 
                (self.holidays_df['ds'] <= pred_end)
            ]
            
            if not relevant_holidays.empty:
                try:
                    # Only add events if we have historical data OR if this is pure forecasting
                    # For pure forecasting, we need to handle this differently
                    if has_y_values:
                        # Historical data with y values - safe to use create_df_with_events
                        model_input = self.model.create_df_with_events(model_input, relevant_holidays)
                    else:
                        # Pure forecasting - manually add holiday columns
                        model_input = self._add_holidays_manually(model_input, relevant_holidays)
                        
                except Exception as e:
                    # If events can't be added, proceed without them
                    print(f"Warning: Could not add events to dataframe: {e}")
        
        # Make prediction
        try:
            # Pure forecasting (no y values) needs at least
            # n_lags (30) + n_forecasts (7) = 37 rows of input
            if not has_y_values and len(model_input) < 37:
                raise ValueError("Insufficient data for forecasting. Need at least 37 rows for this model.")
                
            forecast = self.model.predict(model_input)
            return forecast
            
        except Exception as e:
            print(f"Prediction error: {e}")
            raise e

    def _add_holidays_manually(self, df, holidays_df):
        """
        Manually add holiday columns without using create_df_with_events
        which requires 'y' column
        """
        # Create a copy to avoid modifying original
        result_df = df.copy()
        
        # Add Holiday column, defaulting to 0
        result_df['Holiday'] = 0
        
        # Mark holidays
        holiday_dates = set(holidays_df['ds'].dt.date)
        result_df.loc[result_df['ds'].dt.date.isin(holiday_dates), 'Holiday'] = 1
        
        return result_df
    
    def make_future_dataframe_old(self, df, periods, freq='D'):
        if not isinstance(df, pd.DataFrame):
            df = pd.DataFrame(df)
            
        if 'ds' not in df.columns:
            raise ValueError("DataFrame must contain 'ds' column")
            
        df['ds'] = pd.to_datetime(df['ds'])
        last_date = df['ds'].max()
        
        # Create future dates
        future_dates = pd.date_range(
            start=last_date + pd.Timedelta(days=1),
            periods=periods,
            freq=freq
        )
        
        # Create future dataframe
        future_df = pd.DataFrame({'ds': future_dates})
        
        return future_df
    
    def make_future_dataframe(self, historical_df, periods, freq='D'):
        """
        Create a dataframe for forecasting that includes historical context
        This is the key method for proper forecasting
        """
        if not isinstance(historical_df, pd.DataFrame):
            historical_df = pd.DataFrame(historical_df)
            
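        if 'ds' not in historical_df.columns or 'y' not in historical_df.columns:
            raise ValueError("DataFrame must contain 'ds' and 'y' columns")

        # Work on a copy so the caller's DataFrame is not modified
        historical_df = historical_df.copy()
        historical_df['ds'] = pd.to_datetime(historical_df['ds'])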
        last_date = historical_df['ds'].max()
        
        # For NeuralProphet with lags, we need historical context
        # Get last 30 days of historical data (n_lags)
        context_start = last_date - pd.Timedelta(days=29)  # 30 days total including last_date
        historical_context = historical_df[
            historical_df['ds'] >= context_start
        ].copy()
        
        # Create future dates
        future_dates = pd.date_range(
            start=last_date + pd.Timedelta(days=1),
            periods=periods,
            freq=freq
        )
        
        # Create future dataframe (no y values for pure forecasting)
        future_df = pd.DataFrame({'ds': future_dates})
        
        # Combine historical context with future dates
        forecast_df = pd.concat([
            historical_context[['ds', 'y']],  # Include y for historical context
            future_df  # No y for future dates
        ], ignore_index=True)
        
        # Add holidays if available
        if hasattr(self, 'holidays_df') and self.holidays_df is not None:
            forecast_df = self._add_holidays_manually(forecast_df, self.holidays_df)
        
        return forecast_df

# Save models using pickle
def save_model(model, path):
    with open(path, 'wb') as f:
        pickle.dump(model, f)
    return path

# Paths for the pickled model and holidays dataframe
model_7_day_path = "/dbfs/tmp/WellnessSalesForecast_7_day.pkl"
holidays_path = "/dbfs/tmp/holidays_df.pkl"

# Save the model using pickle
save_model(m, model_7_day_path)

# Save holidays dataframe
with open(holidays_path, "wb") as f:
    pickle.dump(holidays_df, f)

# Function to register a model
def register_model_old(model_path, model_name, model_type, holidays_path=None):
    artifacts = {"model_path": model_path}
    
    # Add holidays path if needed
    if model_type in ["holidays", "autoregression"] and holidays_path:
        artifacts["holidays_path"] = holidays_path
    
    with mlflow.start_run(run_name=f"NeuralProphet_{model_type.capitalize()}_Model") as run:
        mlflow.pyfunc.log_model(
            artifact_path=f"neuralprophet_{model_type}_model",
            python_model=NeuralProphetWrapper(),
            artifacts=artifacts,
            conda_env={
                "channels": ["defaults", "conda-forge"],
                "dependencies": [
                    "python=3.8.0",
                    "pip",
                    {"pip": [
                        "neuralprophet>=0.5.0",
                        "pandas>=1.3.0",
                        "numpy>=1.20.0",
                        "matplotlib>=3.4.0",
                        "plotly>=5.0.0",
                        "torch>=1.9.0"
                    ]}
                ],
                "name": "neuralprophet_env"
            }
        )
        run_id = run.info.run_id
    
    # Register the model
    model_uri = f"runs:/{run_id}/neuralprophet_{model_type}_model"
    model_details = mlflow.register_model(model_uri=model_uri, name=model_name)
    
    # Transition the model to production
    client = MlflowClient()
    client.transition_model_version_stage(
        name=model_name,
        version=model_details.version,
        stage="Production"
    )
    
    return model_details

def register_model(model_path, model_name, model_type, holidays_path=None):
    artifacts = {"model_path": model_path}
    
    if holidays_path:
        artifacts["holidays_path"] = holidays_path
    
    with mlflow.start_run(run_name=f"NeuralProphet_{model_type.capitalize()}_Model_Fixed") as run:
        mlflow.pyfunc.log_model(
            artifact_path=f"neuralprophet_{model_type}_model",
            python_model=NeuralProphetWrapper(),
            artifacts=artifacts,
            conda_env={
                "channels": ["defaults", "conda-forge"],
                "dependencies": [
                    "python=3.8.0",
                    "pip",
                    {"pip": [
                        "neuralprophet>=1.0.0",
                        "pandas>=1.3.0",
                        "numpy>=1.20.0",
                        "matplotlib>=3.4.0",
                        "plotly>=5.0.0",
                        "torch>=1.9.0"
                    ]}
                ],
                "name": "neuralprophet_env"
            }
        )
        
        # Log model parameters
        mlflow.log_params({
            'n_lags': 30,
            'n_forecasts': 7,
            'weekly_seasonality': True,
            'has_holidays': holidays_path is not None,
            'quantiles': [0.05, 0.95],
            'forecasting_method': 'with_historical_context',
            'min_required_rows': 37
        })
        
        run_id = run.info.run_id
    
    # Register the model
    model_uri = f"runs:/{run_id}/neuralprophet_{model_type}_model"
    model_details = mlflow.register_model(model_uri=model_uri, name=model_name)
    
    # Transition the model to production
    client = MlflowClient()
    client.transition_model_version_stage(
        name=model_name,
        version=model_details.version,
        stage="Production"
    )
    
    return model_details


# Register the model
base_model_details = register_model(
    model_7_day_path, 
    "WellnessSalesForecast_7_day", 
    "holidays",
    holidays_path=holidays_path
)

##Use the model saved to MLflow

In [0]:
import mlflow
import mlflow.pyfunc
import pandas as pd
import pickle
import os
from mlflow.tracking import MlflowClient

model_name = "WellnessSalesForecast_7_day"  # same name as used in register_model above
model_stage = "1"  # a version number; a stage name such as "Production" or "Staging" also works

model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{model_stage}")


In [0]:
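# Example future-only input. On its own (30 rows, no 'y') the wrapper's predict()
# rejects it, because the model needs lagged history; see make_future_dataframe().
future_df = pd.DataFrame({
    "ds": pd.date_range(start="2025-06-10", periods=30)
})

# Predict on the test set, which includes 'y' values
forecast = model.predict(test_df)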
forecast.head()
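
A future-only frame of dates does not give a lagged model the history it needs. The wrapper's `make_future_dataframe` handles this by prepending the last 30 observed days to the future dates. The same idea as a standalone pandas sketch on synthetic data. `build_forecast_input` is a hypothetical name, the 30-row context is taken from the `n_lags` value logged above, and the date arithmetic assumes the last observed date lies on the chosen frequency grid (true for daily data):

```python
import pandas as pd


def build_forecast_input(history_df, periods, n_lags=30, freq="D"):
    """Return the last n_lags observed rows followed by `periods` future rows with y = NaN."""
    history = history_df[["ds", "y"]].copy()
    history["ds"] = pd.to_datetime(history["ds"])
    context = history.sort_values("ds").tail(n_lags)
    # Generate periods + 1 dates starting at the last observation, then drop the first
    future_dates = pd.date_range(
        start=context["ds"].max(), periods=periods + 1, freq=freq
    )[1:]
    future = pd.DataFrame({"ds": future_dates})
    return pd.concat([context, future], ignore_index=True)


history = pd.DataFrame({
    "ds": pd.date_range("2025-04-01", periods=60, freq="D"),
    "y": range(60),
})
forecast_input = build_forecast_input(history, periods=7)
print(len(forecast_input), forecast_input["y"].isna().sum())
```

With `n_lags=30` and `periods=7` the result has 37 rows, which matches the minimum row count the wrapper checks for.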


##Using imported model

In [0]:
import mlflow
import mlflow.sklearn

In [0]:
experiment_name = "/Workspace/Users/ryan@delve.systems/FirstExperiment"
#mlflow.create_experiment(experiment_name)
mlflow.set_experiment(experiment_name)


In [0]:
%restart_python

In [0]:

import pickle
with open('/Workspace/Users/ryan@delve.systems/Prophet_AI/neuralprophet_base.pkl', 'rb') as f:
    model = pickle.load(f)

In [0]:
with mlflow.start_run():
    mlflow.log_param("model_type", "sklearn")
    mlflow.sklearn.log_model(model, "model")