## Tabular training

In [520]:
# Install libraries on first run
#! pip install -q fastai yfinance pandas numpy matplotlib import_ipynb

In [521]:
from fastai.tabular.all import *
import yfinance as yf
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
import import_ipynb
import numpy as np
import random

## Variables

In [522]:
modelName = 'stockScreenerV3.0'
trainingDataName = 'stockData.csv'
trainingFolder = Path.cwd().parent / 'TrainingData'
modelFolder = Path.cwd().parent.parent / 'TrainedModels'
testFolder = Path.cwd().parent / 'TestData'

# Training parameters
yNames = ['Future Year Change']
catNames = ['Industry']
contNames = [
    'Open',
    'High', 
    'Low', 
    'Close', 
    'Volume', 
    'Dividends', 
    'Stock Splits', 
    'EV/EBIT', 
    'ROIC'
]
epochs = 2

# Test parameters
testSize = 100 # Number of stocks to test

Next, we can look at how the data is structured:

In [523]:
dataPath = Path()
df = pd.read_csv(trainingFolder/trainingDataName)
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Future Year Change,Ticker,Industry,Adj Close,Capital Gains,EV/EBIT,ROIC
0,2017-10-20 00:00:00-04:00,16.25,16.99,14.0,16.26,37563700.0,0.0,0.0,-0.194342,SE,Internet Retail,,,2.190711,0.360613
1,2017-10-23 00:00:00-04:00,16.1,16.399,15.12,15.26,5753800.0,0.0,0.0,-0.145478,SE,Internet Retail,,,1.96306,0.402433
2,2017-10-24 00:00:00-04:00,15.4,15.86,14.77,15.24,3748300.0,0.0,0.0,-0.179134,SE,Internet Retail,,,1.958507,0.403368
3,2017-10-25 00:00:00-04:00,15.1,15.43,13.62,13.73,4408100.0,0.0,0.0,-0.071377,SE,Internet Retail,,,1.614754,0.489238
4,2017-10-26 00:00:00-04:00,13.82,14.16,13.81,14.02,1850000.0,0.0,0.0,-0.099857,SE,Internet Retail,,,1.680773,0.470022


Some of the columns are continuous (like Open, Close, Volume or EV/EBIT), and we will treat them as float numbers we can feed our model directly. Others are categorical (like Industry), and we will convert them to a unique index that we feed to embedding layers. We can specify our categorical and continuous column names, as well as the name of the dependent variable, in the TabularDataLoaders factory methods:
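A minimal sketch of the categorical side, written in plain pandas rather than fastai: each Industry string is mapped to an integer index, with index 0 reserved for missing or unseen values (the way fastai's Categorify reserves a #na# slot), and the embedding width comes from fastai's default rule of thumb, min(600, round(1.6 * n**0.56)). The helpers categorify, encode and emb_size are hypothetical and only approximate fastai's behaviour; they are not its implementation.

```python
import pandas as pd

def categorify(train: pd.Series) -> dict:
    # Index 0 is reserved for missing/unseen values, mirroring fastai's '#na#' slot
    vocab = ['#na#'] + sorted(train.dropna().unique().tolist())
    return {cat: i for i, cat in enumerate(vocab)}

def encode(col: pd.Series, mapping: dict) -> pd.Series:
    # Categories not seen during training fall back to index 0
    return col.map(lambda v: mapping.get(v, 0)).astype('int64')

def emb_size(n_categories: int) -> int:
    # fastai's default rule of thumb for embedding width, capped at 600
    return min(600, round(1.6 * n_categories ** 0.56))

industries = pd.Series(['Solar', 'Internet Retail', 'Solar', 'Banks - Diversified'])
mapping = categorify(industries)
codes = encode(pd.Series(['Solar', 'Aluminum']), mapping)  # 'Aluminum' is unseen here
print(mapping)
print(codes.tolist())
print(emb_size(len(mapping)))
```

On the full dataset, len(mapping) would be the number of distinct industries plus one, which sets the number of rows in the Industry embedding table.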

In [524]:
dls = TabularDataLoaders.from_csv(trainingFolder/trainingDataName, path=dataPath, 
    y_names=yNames,
    cat_names=catNames,
    cont_names=contNames,
    procs = [Categorify, FillMissing, Normalize])

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  to[n].fillna(self.na_dict[n], inplace=True)


The last part is the list of pre-processors we apply to our data:

* Categorify takes every categorical variable, builds a map from integer indices to its unique categories, then replaces each value with the corresponding index.
* FillMissing fills the missing values in the continuous variables with the median of the existing values (you can choose a different fill strategy if you prefer) and adds a boolean column, such as EV/EBIT_na, marking which rows were filled.
* Normalize normalizes the continuous variables (subtracts the mean and divides by the standard deviation).
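The two continuous-column steps can be approximated in plain pandas. This sketch uses hypothetical helpers, fit_cont_procs and apply_cont_procs, that measure medians, means and standard deviations on the training rows only, add an _na flag for columns that had gaps during training, and use the df[col] = df[col].fillna(...) assignment form that the pandas warning above recommends. The choice of population standard deviation plus a small epsilon is an assumption about what fastai's Normalize does, not a confirmed detail.

```python
import numpy as np
import pandas as pd

def fit_cont_procs(train: pd.DataFrame, cont_names: list) -> dict:
    conts = train[cont_names]
    # FillMissing: only columns with gaps in the training rows get a fill value and an _na flag
    na_cols = [c for c in cont_names if conts[c].isna().any()]
    medians = conts[na_cols].median()
    # Normalize: statistics are measured after filling, on training rows only
    filled = conts.fillna(medians)
    return {
        'na_cols': na_cols,
        'medians': medians,
        'means': filled.mean(),
        'stds': filled.std(ddof=0) + 1e-7,  # population std plus epsilon (assumed)
    }

def apply_cont_procs(df: pd.DataFrame, cont_names: list, stats: dict) -> pd.DataFrame:
    out = df.copy()
    for col in stats['na_cols']:
        out[col + '_na'] = out[col].isna()
        # Assignment form instead of inplace=True, as the pandas warning suggests
        out[col] = out[col].fillna(stats['medians'][col])
    out[cont_names] = (out[cont_names] - stats['means']) / stats['stds']
    return out

train = pd.DataFrame({'Close': [10.0, 12.0, np.nan, 14.0], 'ROIC': [0.1, 0.2, 0.3, 0.4]})
stats = fit_cont_procs(train, ['Close', 'ROIC'])
processed = apply_cont_procs(train, ['Close', 'ROIC'], stats)
print(processed)
```

The same stats dictionary would be reused for validation and test rows, so they are scaled with training statistics rather than their own.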

To further expose what’s going on below the surface, let’s rewrite this using fastai’s TabularPandas class. We need to make one adjustment: defining how to split our data. The factory method above used a random 80/20 split by default; here we keep the 80/20 ratio but use EndSplitter, which holds out the last 20% of rows as the validation set instead of a random sample:

In [525]:
splits = EndSplitter(valid_pct=0.2, valid_last=True)(range_of(df))
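EndSplitter takes the last valid_pct of rows, in file order, as the validation set. The end_split helper below is a hypothetical re-implementation of that idea for illustration. Whether it amounts to a time-based holdout depends on how stockData.csv is ordered: if rows are grouped by ticker, as the first rows above suggest, the last 20% is mostly a different set of companies rather than the most recent dates. The date_split variant (also hypothetical) sorts by timestamp first, which is one way to hold out the latest period instead.

```python
import pandas as pd

def end_split(n_rows: int, valid_pct: float = 0.2, valid_last: bool = True):
    # Hold out the last (or first) valid_pct of rows in their existing order
    n_valid = int(n_rows * valid_pct)
    idxs = list(range(n_rows))
    if valid_last:
        return idxs[:n_rows - n_valid], idxs[n_rows - n_valid:]
    return idxs[n_valid:], idxs[:n_valid]

def date_split(dates: pd.Series, valid_pct: float = 0.2):
    # Parse as UTC so mixed -04:00/-05:00 offsets compare correctly;
    # reset_index makes the labels equal to row positions
    ts = pd.to_datetime(dates, utc=True).reset_index(drop=True)
    order = ts.sort_values(kind='stable').index.tolist()
    cut = len(order) - int(len(order) * valid_pct)
    return order[:cut], order[cut:]

print(end_split(10))

dates = pd.Series(['2020-01-03 00:00:00-05:00', '2019-05-01 00:00:00-04:00',
                   '2021-07-07 00:00:00-04:00', '2018-02-02 00:00:00-05:00',
                   '2022-01-03 00:00:00-05:00'])
print(date_split(dates))
```

In the notebook, date_split(df['Date']) returns positional index lists in the same (train, valid) shape as the splits above, so it could be tried in place of the EndSplitter result; that substitution is untested here.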

In [526]:
to = TabularPandas(df, procs=[Categorify, FillMissing, Normalize],
    y_names=yNames,
    cat_names = catNames,
    cont_names = contNames,
    splits=splits)



Once we build our TabularPandas object, our data is completely preprocessed as seen below:

In [527]:
to.xs.iloc[:1]

Unnamed: 0,Industry,EV/EBIT_na,ROIC_na,Open,High,Low,Close,Volume,Dividends,Stock Splits,EV/EBIT,ROIC
0,23,1,1,-0.161926,-0.157894,-0.183005,-0.162741,5.635154,-0.050866,-0.013349,-0.19608,-0.085114


Now we can build our DataLoaders again:

In [528]:
dls = to.dataloaders(bs=64)

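The show_batch method works as it does for every other fastai application. It decodes the data, so categories and continuous values appear in their original, unnormalized form: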

In [529]:
dls.show_batch()

Unnamed: 0,Industry,EV/EBIT_na,ROIC_na,Open,High,Low,Close,Volume,Dividends,Stock Splits,EV/EBIT,ROIC,Future Year Change
0,Aluminum,False,False,1.994418,2.010076,1.99129,2.010073,53932.99,-2.991955e-11,1.158889e-12,1.303352,0.60613,-0.028148
1,Specialty Business Services,False,False,5.199209,5.247439,5.189229,5.195881,91599.91,-2.991955e-11,1.158889e-12,11.792067,0.066994,0.149287
2,Software - Infrastructure,False,False,285.350009,287.14999,280.560002,286.799994,476000.0,-2.991955e-11,1.158889e-12,64.22067,0.012301,-0.287273
3,Banks - Diversified,False,False,29.064284,29.067932,28.91836,29.027804,213200.0,-2.991955e-11,1.158889e-12,-31.604159,-0.024996,0.139638
4,Internet Retail,False,False,160.236684,160.265663,156.026033,156.450954,10615200.0,-2.991955e-11,1.158889e-12,1.144411,0.690311,0.023272
5,Solar,False,False,55.880001,56.75,54.490001,55.75,1175100.0,-2.991955e-11,1.158889e-12,9.336225,0.084617,3.513005
6,Oil & Gas Integrated,False,False,28.890104,29.020035,28.714317,28.737245,6105900.0,-2.991955e-11,1.158889e-12,3.754714,0.210403,-0.395126
7,Banks - Diversified,False,False,15.756809,15.783613,15.500938,15.547238,373600.0,-2.991955e-11,1.158889e-12,2.868898,0.275367,0.298291
8,Beverages - Wineries & Distilleries,False,False,10.465933,10.627777,10.465932,10.519883,38599.98,-2.991955e-11,1.158889e-12,8.854311,0.089222,0.123077
9,Specialty Industrial Machinery,True,True,17.433594,17.766074,17.401421,17.648096,292800.1,-2.991955e-11,1.158889e-12,4.719044,0.095212,0.104912


We can define a model using the tabular_learner method. fastai infers the loss function from the target we set in y_names earlier: because Future Year Change is a float column, it treats the task as regression and uses a mean squared error loss.

Note: Sometimes with tabular data, your y’s may be encoded as integers (such as 0 and 1). In such a case you should explicitly pass y_block=CategoryBlock() to the TabularPandas constructor so fastai won’t presume you are doing regression.
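A simplified sketch of how the target type is chosen when y_block isn't given, as we read fastai's TabularPandas: if every y_names column is numeric, the targets are treated as regression; otherwise they are treated as categories. infer_target_kind is a hypothetical helper, and the Beat Market and Rating columns are made up for illustration. The point is that 0/1 integer labels count as numeric, which is why the explicit y_block is needed.

```python
import pandas as pd

def infer_target_kind(df: pd.DataFrame, y_names: list) -> str:
    # All target columns numeric -> regression; anything else -> classification
    ys = df[y_names]
    n_numeric = len(ys.select_dtypes(include='number').columns)
    return 'regression' if n_numeric == len(ys.columns) else 'classification'

df = pd.DataFrame({
    'Future Year Change': [-0.19, 0.15, 0.02],  # float returns, as in this notebook
    'Beat Market': [0, 1, 1],                   # hypothetical 0/1 label
    'Rating': ['sell', 'buy', 'hold'],          # hypothetical string label
})
for col in df.columns:
    print(col, '->', infer_target_kind(df, [col]))

# Forcing classification for a 0/1 label in fastai would look like (not run here):
# to = TabularPandas(df, procs=[Categorify, FillMissing, Normalize],
#                    cat_names=catNames, cont_names=contNames,
#                    y_names=['Beat Market'], y_block=CategoryBlock(), splits=splits)
```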

In [530]:
learn = tabular_learner(dls, metrics=[rmse, mae])

And we can train that model with the fit_one_cycle method (the fine_tune method won’t be useful here since we don’t have a pretrained model).
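For intuition, here is a plain-Python sketch of the learning-rate curve fit_one_cycle follows, assuming fastai's defaults (lr_max=1e-3 when none is given, div=25, div_final=1e5, pct_start=0.25): the rate warms up from lr_max/div to lr_max over the first quarter of training along a cosine curve, then anneals down to lr_max/div_final. The function names are hypothetical, and fastai also cycles momentum in the opposite direction, which is omitted here.

```python
import math

def cos_anneal(start: float, end: float, pos: float) -> float:
    # Cosine interpolation from start (pos=0) to end (pos=1)
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(pos: float, lr_max: float = 1e-3, div: float = 25.0,
                 div_final: float = 1e5, pct_start: float = 0.25) -> float:
    """Learning rate at fraction `pos` (0..1) of the total training steps."""
    if pos < pct_start:
        # Warm-up phase: lr_max/div -> lr_max
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    # Annealing phase: lr_max -> lr_max/div_final
    return cos_anneal(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))

for pos in [0.0, 0.125, 0.25, 0.5, 0.75, 1.0]:
    print(f"{pos:>5.3f}: {one_cycle_lr(pos):.2e}")
```

With epochs = 2, the whole curve, warm-up included, is compressed into two passes over the training data.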

In [531]:
print(f"Training {modelName} for {epochs} epochs")
learn.fit_one_cycle(epochs)

Training stockScreenerV3.0 for 2 epochs


epoch,train_loss,valid_loss,_rmse,mae,time
0,0.915538,635302.25,797.058716,11.074199,01:54
1,0.975554,5080544.0,2254.006348,28.492224,01:54


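We can then look at some predictions on the validation set (show_results uses the validation set by default, and the input columns shown are still encoded and normalized):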

In [532]:
learn.show_results(max_n=15)

Unnamed: 0,Industry,EV/EBIT_na,ROIC_na,Open,High,Low,Close,Volume,Dividends,Stock Splits,EV/EBIT,ROIC,Future Year Change,Future Year Change_pred
0,4.0,1.0,1.0,-0.240541,-0.242095,-0.240705,-0.240842,-0.325975,-0.050866,-0.013349,0.30947,-0.087906,-0.364208,0.044148
1,5.0,1.0,1.0,-0.211911,-0.213705,-0.212044,-0.212523,-0.482072,-0.050866,-0.013349,-5.320011,-0.088565,-0.574354,0.016379
2,43.0,1.0,1.0,-0.266936,-0.268429,-0.267422,-0.268192,1.352797,-0.050866,-0.013349,-0.036688,-0.087141,0.856337,-0.001043
3,23.0,1.0,1.0,-0.242728,-0.24402,-0.24427,-0.242862,0.54003,-0.050866,-0.013349,-0.349457,-0.096151,1.041816,1.161095
4,27.0,1.0,1.0,-0.242046,-0.243642,-0.241959,-0.242495,0.32702,-0.050866,-0.013349,-0.047646,-0.087083,0.129276,0.479999
5,17.0,1.0,1.0,-0.173874,-0.17436,-0.173094,-0.174174,1.436521,-0.050866,-0.013349,1.018593,-0.088222,-0.533724,0.033117
6,17.0,1.0,1.0,-0.253511,-0.255738,-0.254084,-0.25476,-0.197302,-0.050866,-0.013349,0.350216,-0.087943,-0.108614,0.061872
7,35.0,1.0,1.0,0.63338,0.637155,0.620665,0.646249,-0.112532,-0.050866,-0.013349,6.359062,-0.08844,-0.074592,-0.433975
8,23.0,1.0,1.0,0.517324,0.558773,0.50965,0.552436,-0.25217,-0.050866,-0.013349,-0.229739,-0.083551,-0.193242,0.782532
9,31.0,1.0,1.0,-0.029161,-0.031924,-0.029674,-0.029326,0.13632,-0.050866,-0.013349,0.18936,-0.087763,0.171889,-0.010915


## Evaluation
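The evaluation code below reports MAE, RMSE and R² only after dropping predictions whose residual exceeds two standard deviations. A small sketch with made-up numbers shows how much that filter can flatter the metrics. regression_metrics and drop_residual_outliers are hypothetical helpers that reuse the same formulas and the same 2-standard-deviation rule (pandas' .std() uses ddof=1, so this does too).

```python
import numpy as np

def regression_metrics(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    resid = actual - predicted
    mae = np.mean(np.abs(resid))
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1 - np.sum(resid ** 2) / np.sum((actual - actual.mean()) ** 2)
    return mae, rmse, r2

def drop_residual_outliers(actual, predicted, n_std=2.0):
    # Keep rows with |residual| <= n_std * std(residual), as evaluate_model does
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    resid = actual - predicted
    keep = np.abs(resid) <= n_std * resid.std(ddof=1)
    return actual[keep], predicted[keep]

# Made-up yearly returns: nine modest misses and one large one
actual = [0.10, -0.05, 0.20, 0.00, 0.15, -0.10, 0.05, 0.30, -0.20, 3.50]
predicted = [0.08, -0.02, 0.15, 0.03, 0.10, -0.05, 0.02, 0.25, -0.15, 0.20]

all_metrics = regression_metrics(actual, predicted)
kept_actual, kept_pred = drop_residual_outliers(actual, predicted)
filtered_metrics = regression_metrics(kept_actual, kept_pred)
print('all rows:', all_metrics)
print('filtered:', filtered_metrics)
```

Here a single large miss accounts for almost all of the error, and dropping it moves R² from near zero to above 0.9, so logging the unfiltered metrics alongside the filtered ones gives a more honest picture.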

In [538]:
import stockFetcher
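Before running the evaluation, one detail in the log further down is worth checking: every ticker fails with a request window of 2024-01-20 -> 2024-01-22. 20 January 2024 was a Saturday, and yfinance treats end as exclusive, so that request only spans a weekend and can return no prices even for actively traded stocks. Assuming stockFetcher.getTickerDataFrom1YrAgo requests roughly that two-day window (its source isn't shown here), one more robust approach is to look back a week and take the last trading day on or before the target date. fetch_window and last_row_on_or_before are hypothetical helpers, not part of stockFetcher, and the prices below are made up.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

import pandas as pd

def fetch_window(target: date, lookback_days: int = 7) -> Tuple[date, date]:
    # A 7-day lookback always spans five weekdays; +1 day because yfinance's `end` is exclusive
    return target - timedelta(days=lookback_days), target + timedelta(days=1)

def last_row_on_or_before(history: pd.DataFrame, target: date) -> Optional[pd.Series]:
    # `history` is assumed to have a DatetimeIndex, like the frame Ticker.history() returns
    row_dates = pd.DatetimeIndex(history.index).date
    mask = row_dates <= target
    if not mask.any():
        return None  # no trading day in the window
    return history[mask].iloc[-1]

target = date(2024, 1, 21)
print(target.weekday())  # 6 means Sunday
start, end = fetch_window(target)
print(start, end)

# Offline stand-in for a downloaded price history (made-up values)
history = pd.DataFrame(
    {'Close': [58.9, 59.1, 59.4]},
    index=pd.to_datetime(['2024-01-18', '2024-01-19', '2024-01-22']),
)
print(last_row_on_or_before(history, target)['Close'])

# With network access, the download might look like:
# hist = yf.Ticker('KO').history(start=start.isoformat(), end=end.isoformat())
# row = last_row_on_or_before(hist, target)
```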

In [539]:
def evaluate_model(learn, test_tickers, model_name, model_folder, cont_names, cat_names):
    """
    Evaluate a fastai model on a list of test tickers and log the results.
    
    Args:
        learn: fastai Learner object
        test_tickers (list): List of ticker symbols to test on
        model_name (str): Name of the model for logging
        model_folder (Path): Path to save evaluation results
        cont_names (list): List of continuous feature names
        cat_names (list): List of categorical feature names
    """
    all_predictions = []
    all_actuals = []
    test_data_list = []
    
    # Collect test data for all tickers
    for ticker in test_tickers:
        try:
            # Get test data
            test_data = stockFetcher.getTickerDataFrom1YrAgo(ticker)
            if test_data.empty:
                print(f"Skipping {ticker} due to missing data")
                continue
            
            test_data_list.append(test_data)
            
        except Exception as e:
            print(f"Error fetching data for {ticker}: {e}")
            continue
    
    if not test_data_list:
        print("No valid test data collected")
        return None, None, None
    
    # Combine all test data
    combined_test_data = pd.concat(test_data_list, ignore_index=True)
    
    # Create fastai test dataloader
    test_dl = learn.dls.test_dl(combined_test_data)
    
    # Get predictions
    preds, targs = learn.get_preds(dl=test_dl)
    
    # Convert to numpy arrays
    predictions = preds.numpy()
    actuals = targs.numpy()
    
    # Create DataFrame for analysis
    results_df = pd.DataFrame({
        'Predicted': predictions.flatten(),
        'Actual': actuals.flatten()
    })

    # Calculate residuals
    results_df['Residual'] = results_df['Actual'] - results_df['Predicted']

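    # Define outlier threshold (2 standard deviations of the residuals).
    # Filtering on residuals discards the model's worst misses, so the metrics
    # below are optimistic; results_df still holds the unfiltered predictions.
    outlier_threshold = 2 * results_df['Residual'].std()

    # Filter outliers
    filtered_df = results_df[results_df['Residual'].abs() <= outlier_threshold]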

    # Calculate metrics
    mae = np.mean(np.abs(filtered_df['Residual']))
    rmse = np.sqrt(np.mean(filtered_df['Residual']**2))
    r2 = 1 - (np.sum(filtered_df['Residual']**2) / 
              np.sum((filtered_df['Actual'] - filtered_df['Actual'].mean())**2))

    # Log results, recording how many tickers actually contributed data
    log_evaluation(model_name, mae, rmse, r2, model_folder,
                   n_tested=len(test_data_list),
                   cat_names=cat_names, cont_names=cont_names)
    
    # Create visualizations
    plot_results(filtered_df, model_name, model_folder)
    
    return mae, rmse, r2

def log_evaluation(model_name, mae, rmse, r2, model_folder, n_tested, cat_names, cont_names):
    """Append evaluation metrics to modelEvaluations.csv in model_folder.
    
    The epoch count is read from the notebook-level `epochs` variable.
    """
    log_file = model_folder / "modelEvaluations.csv"
    
    new_entry_df = pd.DataFrame([{
        "Model Name": model_name,
        "Timestamp": datetime.now().strftime('%Y-%m-%d %H:%M'),
        "MAE": f'{mae:.3f}',
        "RMSE": f'{rmse:.3f}',
        "R2": f'{r2:.3f}',
        "Epochs": epochs,
        "Test Amount": n_tested,
        "Cat Names": cat_names,
        "Cont Names": cont_names,
    }])
    
    try:
        log_df = pd.read_csv(log_file)
        log_df = pd.concat([log_df, new_entry_df], ignore_index=True)
    except FileNotFoundError:
        log_df = new_entry_df
        
    log_df.to_csv(log_file, index=False)
    print(f"Logged evaluation results to {log_file}")

def plot_results(filtered_df, model_name, model_folder):
    """Create and save visualization plots"""
    plt.figure(figsize=(12, 8))
    
    # Scatter plot
    plt.subplot(2, 1, 1)
    actuals = filtered_df['Actual']
    predictions = filtered_df['Predicted']
    plt.scatter(actuals, predictions, alpha=0.7, label='Predictions')
    
    # Perfect prediction line
    min_val = min(actuals.min(), predictions.min())
    max_val = max(actuals.max(), predictions.max())
    plt.plot([min_val, max_val], [min_val, max_val], 
             color='red', linestyle='--', label='Perfect Prediction')
    
    plt.title(f'Predicted vs. Actual Returns - {model_name}', fontsize=14)
    plt.xlabel('Actual Returns', fontsize=12)
    plt.ylabel('Predicted Returns', fontsize=12)
    plt.legend()
    plt.grid(alpha=0.5)
    
    # Residual plot
    plt.subplot(2, 1, 2)
    plt.scatter(predictions, filtered_df['Residual'], alpha=0.7)
    plt.axhline(y=0, color='r', linestyle='--')
    plt.title('Residual Plot', fontsize=14)
    plt.xlabel('Predicted Returns', fontsize=12)
    plt.ylabel('Residual', fontsize=12)
    plt.grid(alpha=0.5)
    
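    plt.tight_layout()
    
    # Save the figure next to the evaluation log, then display it
    plt.savefig(model_folder / f'{model_name}_evaluation.png', dpi=150)
    plt.show()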

# Function to get random test tickers
def get_random_test_tickers(n_tickers):
    """
    Get random US-listed tickers that aren't in our training set.
    
    Args:
        n_tickers (int): Number of test tickers to return
        
    Returns:
        list: List of ticker symbols
    """
    training_tickers = set(stockFetcher.symbols)
    
    # Read candidate tickers from TestData/tickers.csv
    try:
        adr_df = pd.read_csv(testFolder / 'tickers.csv')
        tickers = adr_df['Ticker'].tolist()
        
        # Exclude any ticker that was part of the training set
        clean_tickers = [
            ticker for ticker in tickers 
            if ticker not in training_tickers
        ]
        
        # Randomly select tickers
        if len(clean_tickers) < n_tickers:
            print(f"Warning: Only {len(clean_tickers)} tickers available")
            return clean_tickers
            
        return np.random.choice(clean_tickers, size=n_tickers, replace=False).tolist()
        
    except Exception as e:
        print(f"Error fetching tickers: {e}")
        # Fallback to a list of common US tickers not in training set
        fallback_tickers = [
            'KO', 'PEP', 'JNJ', 'PG', 'WMT', 'HD', 'MCD', 'NKE', 
            'DIS', 'SBUX', 'COST', 'TGT', 'LOW', 'MO', 'CVS'
        ]
        fallback_tickers = [t for t in fallback_tickers if t not in training_tickers]
        return np.random.choice(fallback_tickers, size=min(n_tickers, len(fallback_tickers)), replace=False).tolist()
    
if __name__ == "__main__":
    
    # Evaluate model (replace learn with your actual learner)
    mae, rmse, r2 = evaluate_model(
        learn=learn,  # Your fastai learner
        test_tickers = get_random_test_tickers(n_tickers=testSize),
        model_name=modelName,
        model_folder=modelFolder,
        cont_names=contNames,
        cat_names=catNames
    )
    
    if mae is not None and rmse is not None and r2 is not None:
        print("Evaluation Results:")
        print(f"MAE: {mae:.3f}")
        print(f"RMSE: {rmse:.3f}")
        print(f"R2: {r2:.3f}")
    else:
        print("Evaluation failed. Metrics are None.")


$EOG: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$GSK: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$IBKR: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for EOG: No historical data available for EOG around 2024-01-21.
Skipping EOG due to missing data
Error fetching data for GSK: No historical data available for GSK around 2024-01-21.
Skipping GSK due to missing data
Error fetching data for IBKR: No historical data available for IBKR around 2024-01-21.
Skipping IBKR due to missing data


$DDOG: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$AMTD: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$DEO: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$ING: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for DDOG: No historical data available for DDOG around 2024-01-21.
Skipping DDOG due to missing data
Error fetching data for AMTD: No historical data available for AMTD around 2024-01-21.
Skipping AMTD due to missing data
Error fetching data for DEO: No historical data available for DEO around 2024-01-21.
Skipping DEO due to missing data
Error fetching data for ING: No historical data available for ING around 2024-01-21.
Skipping ING due to missing data


$WMB: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for WMB: No historical data available for WMB around 2024-01-21.
Skipping WMB due to missing data


$HAL: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$PLUG: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$TECK: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for HAL: No historical data available for HAL around 2024-01-21.
Skipping HAL due to missing data
Error fetching data for PLUG: No historical data available for PLUG around 2024-01-21.
Skipping PLUG due to missing data
Error fetching data for TECK: No historical data available for TECK around 2024-01-21.
Skipping TECK due to missing data


$NESTLE: possibly delisted; no timezone found
$RDFN: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$NTTYY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for NESTLE: No historical data available for NESTLE around 2024-01-21.
Skipping NESTLE due to missing data
Error fetching data for RDFN: No historical data available for RDFN around 2024-01-21.
Skipping RDFN due to missing data
Error fetching data for NTTYY: No historical data available for NTTYY around 2024-01-21.
Skipping NTTYY due to missing data


$DSY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$TAL: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$Z: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for DSY: No historical data available for DSY around 2024-01-21.
Skipping DSY due to missing data
Error fetching data for TAL: No historical data available for TAL around 2024-01-21.
Skipping TAL due to missing data
Error fetching data for Z: No historical data available for Z around 2024-01-21.
Skipping Z due to missing data


$AM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$HIG: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for AM: No historical data available for AM around 2024-01-21.
Skipping AM due to missing data
Error fetching data for HIG: No historical data available for HIG around 2024-01-21.
Skipping HIG due to missing data


$FRC: possibly delisted; no timezone found
$ENPH: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$AFRM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$AGI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for FRC: No historical data available for FRC around 2024-01-21.
Skipping FRC due to missing data
Error fetching data for ENPH: No historical data available for ENPH around 2024-01-21.
Skipping ENPH due to missing data
Error fetching data for AFRM: No historical data available for AFRM around 2024-01-21.
Skipping AFRM due to missing data
Error fetching data for AGI: No historical data available for AGI around 2024-01-21.
Skipping AGI due to missing data


$IMBBY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$HSBC: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for IMBBY: No historical data available for IMBBY around 2024-01-21.
Skipping IMBBY due to missing data
Error fetching data for HSBC: No historical data available for HSBC around 2024-01-21.
Skipping HSBC due to missing data


$NTR: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$DASH: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$PM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$SPGI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$WBK: possibly delisted; no timezone found


Error fetching data for NTR: No historical data available for NTR around 2024-01-21.
Skipping NTR due to missing data
Error fetching data for DASH: No historical data available for DASH around 2024-01-21.
Skipping DASH due to missing data
Error fetching data for PM: No historical data available for PM around 2024-01-21.
Skipping PM due to missing data
Error fetching data for SPGI: No historical data available for SPGI around 2024-01-21.
Skipping SPGI due to missing data
Error fetching data for WBK: No historical data available for WBK around 2024-01-21.
Skipping WBK due to missing data


$VIPS: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for VIPS: No historical data available for VIPS around 2024-01-21.
Skipping VIPS due to missing data


$AA: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$CSGN: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$BABA: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$EL: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for AA: No historical data available for AA around 2024-01-21.
Skipping AA due to missing data
Error fetching data for CSGN: No historical data available for CSGN around 2024-01-21.
Skipping CSGN due to missing data
Error fetching data for BABA: No historical data available for BABA around 2024-01-21.
Skipping BABA due to missing data
Error fetching data for EL: No historical data available for EL around 2024-01-21.
Skipping EL due to missing data


$ADSK: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$SNY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for ADSK: No historical data available for ADSK around 2024-01-21.
Skipping ADSK due to missing data
Error fetching data for SNY: No historical data available for SNY around 2024-01-21.
Skipping SNY due to missing data


$AU: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$VOD: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for AU: No historical data available for AU around 2024-01-21.
Skipping AU due to missing data
Error fetching data for VOD: No historical data available for VOD around 2024-01-21.
Skipping VOD due to missing data


$BLDP: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$BAM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$AMT: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$SAP: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$SONY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for BLDP: No historical data available for BLDP around 2024-01-21.
Skipping BLDP due to missing data
Error fetching data for BAM: No historical data available for BAM around 2024-01-21.
Skipping BAM due to missing data
Error fetching data for AMT: No historical data available for AMT around 2024-01-21.
Skipping AMT due to missing data
Error fetching data for SAP: No historical data available for SAP around 2024-01-21.
Skipping SAP due to missing data
Error fetching data for SONY: No historical data available for SONY around 2024-01-21.
Skipping SONY due to missing data


$FSLY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$LMND: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$BKR: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for FSLY: No historical data available for FSLY around 2024-01-21.
Skipping FSLY due to missing data
Error fetching data for LMND: No historical data available for LMND around 2024-01-21.
Skipping LMND due to missing data
Error fetching data for BKR: No historical data available for BKR around 2024-01-21.
Skipping BKR due to missing data


$VWAGY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$SAN: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for VWAGY: No historical data available for VWAGY around 2024-01-21.
Skipping VWAGY due to missing data
Error fetching data for SAN: No historical data available for SAN around 2024-01-21.
Skipping SAN due to missing data


$UBS: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$CX: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$EDU: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for UBS: No historical data available for UBS around 2024-01-21.
Skipping UBS due to missing data
Error fetching data for CX: No historical data available for CX around 2024-01-21.
Skipping CX due to missing data
Error fetching data for EDU: No historical data available for EDU around 2024-01-21.
Skipping EDU due to missing data


$GFI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$RLI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$STM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for GFI: No historical data available for GFI around 2024-01-21.
Skipping GFI due to missing data
Error fetching data for RLI: No historical data available for RLI around 2024-01-21.
Skipping RLI due to missing data
Error fetching data for STM: No historical data available for STM around 2024-01-21.
Skipping STM due to missing data


$HBAN: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$JKS: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for HBAN: No historical data available for HBAN around 2024-01-21.
Skipping HBAN due to missing data
Error fetching data for JKS: No historical data available for JKS around 2024-01-21.
Skipping JKS due to missing data


$DLR: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$TSN: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for DLR: No historical data available for DLR around 2024-01-21.
Skipping DLR due to missing data
Error fetching data for TSN: No historical data available for TSN around 2024-01-21.
Skipping TSN due to missing data


$NVS: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$SIEGY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for NVS: No historical data available for NVS around 2024-01-21.
Skipping NVS due to missing data
Error fetching data for SIEGY: No historical data available for SIEGY around 2024-01-21.
Skipping SIEGY due to missing data


$OKE: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$RELX: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$CHKP: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for OKE: No historical data available for OKE around 2024-01-21.
Skipping OKE due to missing data
Error fetching data for RELX: No historical data available for RELX around 2024-01-21.
Skipping RELX due to missing data
Error fetching data for CHKP: No historical data available for CHKP around 2024-01-21.
Skipping CHKP due to missing data


$BP: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$ALLY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for BP: No historical data available for BP around 2024-01-21.
Skipping BP due to missing data
Error fetching data for ALLY: No historical data available for ALLY around 2024-01-21.
Skipping ALLY due to missing data


$ET: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$TWLO: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$COF: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for ET: No historical data available for ET around 2024-01-21.
Skipping ET due to missing data
Error fetching data for TWLO: No historical data available for TWLO around 2024-01-21.
Skipping TWLO due to missing data
Error fetching data for COF: No historical data available for COF around 2024-01-21.
Skipping COF due to missing data


$MSCI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$U: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$GDS: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for MSCI: No historical data available for MSCI around 2024-01-21.
Skipping MSCI due to missing data
Error fetching data for U: No historical data available for U around 2024-01-21.
Skipping U due to missing data
Error fetching data for GDS: No historical data available for GDS around 2024-01-21.
Skipping GDS due to missing data


$ABBV: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$LVMUY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for ABBV: No historical data available for ABBV around 2024-01-21.
Skipping ABBV due to missing data
Error fetching data for LVMUY: No historical data available for LVMUY around 2024-01-21.
Skipping LVMUY due to missing data


$FERR: possibly delisted; no timezone found
$ALB: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for FERR: No historical data available for FERR around 2024-01-21.
Skipping FERR due to missing data
Error fetching data for ALB: No historical data available for ALB around 2024-01-21.
Skipping ALB due to missing data


$NEM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$CYBR: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$BBVA: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for NEM: No historical data available for NEM around 2024-01-21.
Skipping NEM due to missing data
Error fetching data for CYBR: No historical data available for CYBR around 2024-01-21.
Skipping CYBR due to missing data
Error fetching data for BBVA: No historical data available for BBVA around 2024-01-21.
Skipping BBVA due to missing data


$ANZBY: possibly delisted; no timezone found
$PAAS: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for ANZBY: No historical data available for ANZBY around 2024-01-21.
Skipping ANZBY due to missing data
Error fetching data for PAAS: No historical data available for PAAS around 2024-01-21.
Skipping PAAS due to missing data


$NESN: possibly delisted; no timezone found
$TFC: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for NESN: No historical data available for NESN around 2024-01-21.
Skipping NESN due to missing data
Error fetching data for TFC: No historical data available for TFC around 2024-01-21.
Skipping TFC due to missing data


$AZO: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for AZO: No historical data available for AZO around 2024-01-21.
Skipping AZO due to missing data


$SANB: possibly delisted; no timezone found
$NET: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$WIX: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for SANB: No historical data available for SANB around 2024-01-21.
Skipping SANB due to missing data
Error fetching data for NET: No historical data available for NET around 2024-01-21.
Skipping NET due to missing data
Error fetching data for WIX: No historical data available for WIX around 2024-01-21.
Skipping WIX due to missing data


$CBAUF: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$MAXN: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$ACGL: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for CBAUF: No historical data available for CBAUF around 2024-01-21.
Skipping CBAUF due to missing data
Error fetching data for MAXN: No historical data available for MAXN around 2024-01-21.
Skipping MAXN due to missing data
Error fetching data for ACGL: No historical data available for ACGL around 2024-01-21.
Skipping ACGL due to missing data


$FVRR: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for FVRR: No historical data available for FVRR around 2024-01-21.
Skipping FVRR due to missing data


$CS: possibly delisted; no timezone found


Error fetching data for CS: No historical data available for CS around 2024-01-21.
Skipping CS due to missing data


$TOT: possibly delisted; no timezone found
$MKL: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$CME: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for TOT: No historical data available for TOT around 2024-01-21.
Skipping TOT due to missing data
Error fetching data for MKL: No historical data available for MKL around 2024-01-21.
Skipping MKL due to missing data
Error fetching data for CME: No historical data available for CME around 2024-01-21.
Skipping CME due to missing data


$IRM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$VLO: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for IRM: No historical data available for IRM around 2024-01-21.
Skipping IRM due to missing data
Error fetching data for VLO: No historical data available for VLO around 2024-01-21.
Skipping VLO due to missing data


$GFL: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$PGR: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for GFL: No historical data available for GFL around 2024-01-21.
Skipping GFL due to missing data
Error fetching data for PGR: No historical data available for PGR around 2024-01-21.
Skipping PGR due to missing data


$HRL: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for HRL: No historical data available for HRL around 2024-01-21.
Skipping HRL due to missing data
No valid test data collected
Evaluation failed. Metrics are None.
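Every failed lookup above targets 2024-01-21, which was a Sunday. yfinance treats the end date as exclusive, so the logged window 2024-01-20 -> 2024-01-22 covers only Saturday and Sunday. It cannot return daily bars for any ticker, which likely explains why no valid test data was collected. The evaluation code is not shown in this section, so the following is only a minimal sketch. It assumes that code queries a short window around a single target date. The hypothetical helpers `trading_window` and `last_row_on_or_before` widen the window by business days and then take the last available row on or before the target.

```python
import pandas as pd


def trading_window(target, pad_days=3):
    """Hypothetical helper: return (start, end) date strings around `target`.

    The window is padded by `pad_days` business days on each side, and one
    calendar day is added to `end` because yfinance treats `end` as exclusive.
    """
    day = pd.Timestamp(target).normalize()
    start = day - pd.offsets.BDay(pad_days)
    end = day + pd.offsets.BDay(pad_days) + pd.Timedelta(days=1)
    return start.strftime('%Y-%m-%d'), end.strftime('%Y-%m-%d')


def last_row_on_or_before(history, target):
    """Hypothetical helper: last row of a date-indexed frame at or before `target`.

    yfinance price indexes are timezone-aware, so a naive target is localized
    to the index timezone before comparing. Returns None if no row qualifies.
    """
    day = pd.Timestamp(target)
    tz = getattr(history.index, 'tz', None)
    if tz is not None and day.tzinfo is None:
        day = day.tz_localize(tz)
    subset = history[history.index <= day]
    return None if subset.empty else subset.iloc[-1]


# Synthetic daily bars on weekdays only, like real market data
index = pd.bdate_range('2024-01-15', '2024-01-26', tz='America/New_York')
history = pd.DataFrame({'Close': range(len(index))}, index=index)

start, end = trading_window('2024-01-21')
row = last_row_on_or_before(history, '2024-01-21')
print(start, end, row.name.date())
```

Snapping to the previous trading day keeps the lookup date close to the intended one. Passing the padded `start`/`end` to the download call avoids empty results on weekends and market holidays.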


## Export the model

In [535]:
learn.export(modelFolder / f'{modelName}.pkl')

## Tests (using the app is recommended, but the cells below also work)


To get predictions for a new DataFrame, use the test_dl method of the DataLoaders. The new DataFrame does not need to contain the dependent variable column.
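test_dl reuses the preprocessing fitted during training. The new DataFrame therefore needs the same categorical and continuous columns that were used for training (catNames and contNames from the Variables cell). Columns outside those lists, such as Market Cap, are not fed to the model as features. The check below is a minimal pandas sketch that reports missing feature columns before test_dl is called. The helper name `missing_feature_columns` is hypothetical, and the sample row uses rounded values from the AAPL example in this section.

```python
import pandas as pd

# Feature lists copied from the Variables cell
catNames = ['Industry']
contNames = ['Open', 'High', 'Low', 'Close', 'Volume', 'Dividends',
             'Stock Splits', 'EV/EBIT', 'ROIC']


def missing_feature_columns(df, cat_names, cont_names):
    """Hypothetical helper: list training feature columns absent from df."""
    return [col for col in cat_names + cont_names if col not in df.columns]


# One row with rounded values from the AAPL example
row = {'Open': 232.12, 'High': 232.29, 'Low': 228.48, 'Close': 229.98,
       'Volume': 68247100, 'Dividends': 0.0, 'Stock Splits': 0.0,
       'EV/EBIT': 59.88, 'Market Cap': 3458416000000.0, 'ROIC': 0.013193,
       'Industry': 'Consumer Electronics'}
test_df = pd.DataFrame([row])

missing = missing_feature_columns(test_df, catNames, contNames)
if missing:
    raise ValueError(f"Test data is missing training columns: {missing}")
print("All feature columns present; the frame can be passed to test_dl")
```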

In [536]:
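predictionTarget = 'AAPL'

# Fetch the current feature values for the ticker
test_df = stockFetcher.getTickerData(predictionTarget)

# getTickerData may return a single dict; wrap it in a one-row DataFrame
if isinstance(test_df, dict):
    test_df = pd.DataFrame([test_df])

# Build a test DataLoader that applies the same preprocessing as training
dl = learn.dls.test_dl(test_df)
test_df.head()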

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  to[n].fillna(self.na_dict[n], inplace=True)


| | Open | High | Low | Close | Volume | Dividends | Stock Splits | EV/EBIT | Market Cap | ROIC | Industry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 232.119995 | 232.289993 | 228.479996 | 229.979996 | 68247100 | 0.0 | 0.0 | 59.880474 | 3458416000000.0 | 0.013193 | Consumer Electronics |


In [537]:
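# get_preds returns a tuple; prediction[0] holds the model outputs, one row per input row
prediction = learn.get_preds(dl=dl)
print(f"Prediction for {predictionTarget}:")
# First row, first target: the predicted fractional Future Year Change, shown as a percentage
print(f"{prediction[0][0][0].item() * 100:.2f}%")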

Prediction for AAPL:
-31.78%
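The model outputs the predicted Future Year Change as a fraction, which the cell above multiplies by 100 for display. This section does not show how that target is computed. If it is the relative change in the closing price over the following year, a prediction can be turned into an implied price. Below is a minimal sketch with a hypothetical helper, `describe_prediction`, applied to the AAPL Close value and the prediction above.

```python
def describe_prediction(change, close):
    """Hypothetical helper: format a fractional change and the implied price.

    Assumes `change` is (future_close / close) - 1, which is not confirmed
    by this notebook section.
    """
    implied_price = close * (1 + change)
    return f"{change * 100:.2f}%", implied_price


pct, implied = describe_prediction(-0.3178, 229.979996)
print(f"Predicted change: {pct}, implied price in one year: {implied:.2f}")
```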


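Note:
A model cannot make sense of categories it never saw during training, so the test data should only contain categories that also appear in the training data. Likewise, if the test data has missing values in columns that had none in the training data, address this before training: fastai's FillMissing only learns fill values for columns that contained missing values in the training set.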
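Both issues can be checked before training or predicting. The sketch below uses hypothetical helper names and toy data. It lists categories that appear only in the test data, and continuous columns that have missing values in the test data but none in the training data. It then fills those columns with training medians. It assigns the result of fillna rather than calling it on a column with inplace=True, which is the pattern the pandas FutureWarning above recommends.

```python
import numpy as np
import pandas as pd


def unseen_categories(train_df, test_df, cat_names):
    """Hypothetical helper: map each categorical column to values never seen in training."""
    result = {}
    for col in cat_names:
        new_values = set(test_df[col].dropna()) - set(train_df[col].dropna())
        if new_values:
            result[col] = sorted(new_values)
    return result


def new_missing_columns(train_df, test_df, cont_names):
    """Hypothetical helper: continuous columns with NaNs in test but none in training."""
    return [col for col in cont_names
            if test_df[col].isna().any() and not train_df[col].isna().any()]


# Toy data for illustration only
train = pd.DataFrame({'Industry': ['Internet Retail', 'Banks', 'Banks'],
                      'ROIC': [0.36, 0.10, 0.20]})
test = pd.DataFrame({'Industry': ['Banks', 'Consumer Electronics'],
                     'ROIC': [np.nan, 0.05]})

print(unseen_categories(train, test, ['Industry']))
cols = new_missing_columns(train, test, ['ROIC'])

# Assign the result instead of using df[col].fillna(..., inplace=True)
test = test.fillna({col: train[col].median() for col in cols})
print(test)
```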