## Tabular training

```python
# Install libraries on first run (remove the leading "# " to run it)
# !pip install -q fastai yfinance pandas numpy matplotlib ipynb import_ipynb
```

```python
from fastai.tabular.all import *
import yfinance as yf
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
import import_ipynb  # enables importing other notebooks, e.g. stockFetcher.ipynb
import numpy as np
import random
```

## Variables

```python
modelName = 'stockScreenerV3.0'
trainingDataName = 'stockData.csv'
trainingFolder = Path.cwd().parent / 'TrainingData'
modelFolder = Path.cwd().parent.parent / 'TrainedModels'
testFolder = Path.cwd().parent / 'TestData'

# Training parameters
yNames = ['Future Year Change']  # dependent variable (regression target)
catNames = ['Industry']
contNames = [
    'Open',
    'High',
    'Low',
    'Close',
    'Volume',
    'Dividends',
    'Stock Splits',
    'EV/EBIT',
    'ROIC'
]
epochs = 2

# Test parameters
testSize = 100  # Number of stocks to test
```

Then we can have a look at how the data is structured:

```python
dataPath = Path()  # current working directory, passed to the DataLoaders as their path
df = pd.read_csv(trainingFolder/trainingDataName)
df.head()
```

|  | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Future Year Change | Ticker | Industry | Adj Close | Capital Gains | EV/EBIT | ROIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-10-20 00:00:00-04:00 | 16.25 | 16.99 | 14.0 | 16.26 | 37563700.0 | 0.0 | 0.0 | -0.194342 | SE | Internet Retail |  |  | 2.190711 | 0.360613 |
| 1 | 2017-10-23 00:00:00-04:00 | 16.1 | 16.399 | 15.12 | 15.26 | 5753800.0 | 0.0 | 0.0 | -0.145478 | SE | Internet Retail |  |  | 1.96306 | 0.402433 |
| 2 | 2017-10-24 00:00:00-04:00 | 15.4 | 15.86 | 14.77 | 15.24 | 3748300.0 | 0.0 | 0.0 | -0.179134 | SE | Internet Retail |  |  | 1.958507 | 0.403368 |
| 3 | 2017-10-25 00:00:00-04:00 | 15.1 | 15.43 | 13.62 | 13.73 | 4408100.0 | 0.0 | 0.0 | -0.071377 | SE | Internet Retail |  |  | 1.614754 | 0.489238 |
| 4 | 2017-10-26 00:00:00-04:00 | 13.82 | 14.16 | 13.81 | 14.02 | 1850000.0 | 0.0 | 0.0 | -0.099857 | SE | Internet Retail |  |  | 1.680773 | 0.470022 |


Some of the columns are continuous (like Close or Volume), and we will treat them as float numbers we can feed to our model directly. Others are categorical (like Industry), and we will convert them to a unique index that we feed to embedding layers. We can specify our categorical and continuous column names, as well as the name of the dependent variable, in the TabularDataLoaders factory methods:

```python
dls = TabularDataLoaders.from_csv(trainingFolder/trainingDataName, path=dataPath,
    y_names=yNames,
    cat_names=catNames,
    cont_names=contNames,
    procs=[Categorify, FillMissing, Normalize])
```

```
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

  to[n].fillna(self.na_dict[n], inplace=True)
```


The last argument, procs, is the list of preprocessors we apply to our data:

* Categorify takes every categorical variable, builds a map from each unique category to an integer, then replaces the values with the corresponding index.
* FillMissing fills missing values in the continuous variables with the median of the existing values (you can choose a specific value instead), and adds a boolean column with an _na suffix marking which rows were filled. This is where the EV/EBIT_na and ROIC_na columns in the outputs below come from.
* Normalize normalizes the continuous variables (subtracts the mean and divides by the standard deviation).
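To make the three preprocessors concrete, here is a rough plain-pandas approximation of what they do to a DataFrame. It is a hypothetical sketch (apply_tabular_procs is not a fastai function), and fastai's own implementation differs in details such as the exact standard-deviation convention and how the _na flags are stored (fastai appears to categorify them too, which would explain why they show up as 1 rather than False in the processed output).

```python
import pandas as pd

def apply_tabular_procs(df: pd.DataFrame, cat_names, cont_names):
    """Hypothetical plain-pandas approximation of Categorify, FillMissing and Normalize.

    Returns the processed copy and the category vocabulary for each categorical column.
    """
    out = df.copy()
    vocab = {}

    # Categorify: map each unique category to an integer; index 0 is reserved for missing values
    for col in cat_names:
        cats = sorted(out[col].dropna().unique())
        vocab[col] = ['#na#'] + list(cats)
        mapping = {c: i + 1 for i, c in enumerate(cats)}
        out[col] = out[col].map(mapping).fillna(0).astype(int)

    # FillMissing: flag missing rows in a new <col>_na column, then fill with the median
    for col in cont_names:
        if out[col].isna().any():
            out[f'{col}_na'] = out[col].isna()
            out[col] = out[col].fillna(out[col].median())

    # Normalize: subtract the mean and divide by the standard deviation
    # (a small epsilon avoids division by zero for constant columns)
    for col in cont_names:
        out[col] = (out[col] - out[col].mean()) / (out[col].std(ddof=0) + 1e-7)

    return out, vocab
```

The statistics here come from whatever DataFrame is passed in; fastai computes them on the training split only and reuses them for the validation and test data.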

To expose what's going on below the surface, let's rewrite this using fastai's TabularPandas class. We need to make one adjustment: defining how we want to split our data. By default, the factory method above used a random 80/20 split. Here we keep the 80/20 ratio but use EndSplitter with valid_last=True, which puts the last 20% of the rows in the validation set instead of sampling them at random:

```python
splits = EndSplitter(valid_pct=0.2, valid_last=True)(range_of(df))
```
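The two splitting strategies differ only in which row indices end up in the validation set. The sketch below mimics them in plain Python; end_split and random_split are hypothetical helpers written for illustration, not fastai functions.

```python
import random

def end_split(n_rows: int, valid_pct: float = 0.2, valid_last: bool = True):
    """Mimics EndSplitter: int(valid_pct * n_rows) rows at the end (or start) become validation."""
    assert 0 < valid_pct < 1, "valid_pct must be in (0, 1)"
    idxs = list(range(n_rows))
    cut = int(valid_pct * n_rows)
    if valid_last:
        return idxs[:n_rows - cut], idxs[n_rows - cut:]
    return idxs[cut:], idxs[:cut]

def random_split(n_rows: int, valid_pct: float = 0.2, seed=None):
    """Mimics a random split: shuffle the indices, then take valid_pct of them for validation."""
    idxs = list(range(n_rows))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_rows)
    return sorted(idxs[cut:]), sorted(idxs[:cut])

train_idx, valid_idx = end_split(10)
print(train_idx, valid_idx)
```

Which rows land at the end depends on how stockData.csv is ordered. The df.head() output starts with a single ticker (SE) in 2017, which suggests the file is grouped by ticker; if so, the last 20% of rows is mostly a separate set of tickers rather than the most recent dates. Either way, it avoids the main weakness of a random split on daily prices, where near-identical consecutive days of the same stock end up in both sets.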

```python
to = TabularPandas(df, procs=[Categorify, FillMissing, Normalize],
    y_names=yNames,
    cat_names=catNames,
    cont_names=contNames,
    splits=splits)
```



Once we build our TabularPandas object, our data is completely preprocessed as seen below:

```python
to.xs.iloc[:1]
```

|  | Industry | EV/EBIT_na | ROIC_na | Open | High | Low | Close | Volume | Dividends | Stock Splits | EV/EBIT | ROIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 23 | 1 | 1 | -0.161926 | -0.157894 | -0.183005 | -0.162741 | 5.635154 | -0.050866 | -0.013349 | -0.19608 | -0.085114 |


Now we can build our DataLoaders again:

```python
dls = to.dataloaders(bs=64)
```

The show_batch method works as it does for every other fastai application, displaying the decoded (original-scale) values:

```python
dls.show_batch()
```

|  | Industry | EV/EBIT_na | ROIC_na | Open | High | Low | Close | Volume | Dividends | Stock Splits | EV/EBIT | ROIC | Future Year Change |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Software - Application | False | False | 65.290001 | 66.5 | 64.157998 | 65.769997 | 3076500.0 | -2.991955e-11 | 1.158889e-12 | 14.217635 | 0.055565 | 7.165122 |
| 1 | Auto Manufacturers | False | False | 5.537457 | 5.643947 | 5.537457 | 5.60135 | 21599.96 | 0.035 | 1.158889e-12 | 1.810714 | 0.436292 | 0.536588 |
| 2 | Gold | False | False | 4.405805 | 4.405807 | 4.048579 | 4.048578 | 9099.856 | -2.991955e-11 | 1.158889e-12 | 4.829698 | 0.163571 | 0.161765 |
| 3 | Consumer Electronics | False | False | 0.233453 | 0.233451 | 0.225399 | 0.233451 | 218899.9 | -2.991955e-11 | 1.158889e-12 | 1.663136 | 0.475006 | 0.13793 |
| 4 | Drug Manufacturers - Specialty & Generic | False | False | 29.278507 | 29.690546 | 29.270582 | 29.484526 | 4018100.0 | -2.991955e-11 | 1.158889e-12 | 19.621371 | 0.040262 | 0.263167 |
| 5 | Telecom Services | False | False | 11.423454 | 11.492115 | 11.350502 | 11.483532 | 5185664.0 | -2.991955e-11 | 1.158889e-12 | 13.312311 | 0.059343 | 0.140201 |
| 6 | Specialty Business Services | False | False | 3.162026 | 3.162027 | 3.081358 | 3.081361 | 12399.97 | -2.991955e-11 | 1.158889e-12 | 8.982103 | 0.087953 | 0.366325 |
| 7 | Banks - Diversified | False | False | 9.457927 | 9.591164 | 9.383907 | 9.395012 | 127200.1 | -2.991955e-11 | 1.158889e-12 | 1.166495 | 0.677243 | 0.145915 |
| 8 | Software - Application | False | False | 3.090002 | 3.11 | 3.045 | 3.1 | 11150700.0 | -2.991955e-11 | 1.158889e-12 | 16.430627 | 0.048082 | 0.329032 |
| 9 | Oil & Gas Equipment & Services | False | False | 46.89896 | 48.029057 | 46.750637 | 47.923107 | 11961900.0 | -2.991955e-11 | 1.158889e-12 | 14.110963 | 0.055984 | 0.26519 |


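We can define a model using the tabular_learner method. When we define our model, fastai infers the loss function from the target we passed in y_names: since Future Year Change is a numeric column, the task is treated as regression, which uses a mean-squared-error loss by default.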

Note: Sometimes with tabular data, your y’s may be encoded (such as 0 and 1). In such a case you should explicitly pass y_block = CategoryBlock in your constructor so fastai won’t presume you are doing regression.
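The inference mentioned above boils down to a dtype check. As far as we know, TabularPandas chooses a regression target when every y column is numeric and a categorical one otherwise. The hypothetical helper below (infer_target_kind is not part of fastai) reproduces that rule, which shows why integer 0/1 labels are treated as regression unless you pass y_block.

```python
import pandas as pd

def infer_target_kind(df: pd.DataFrame, y_names) -> str:
    """Hypothetical stand-in for the default rule: all-numeric targets mean regression."""
    ys = df[y_names]
    numeric = ys.select_dtypes(include='number')
    return 'regression' if len(numeric.columns) == len(ys.columns) else 'classification'

example = pd.DataFrame({'Future Year Change': [0.12, -0.05],
                        'Direction': ['up', 'down'],
                        'Beat': [1, 0]})
print(infer_target_kind(example, ['Future Year Change']))
print(infer_target_kind(example, ['Beat']))
```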

```python
learn = tabular_learner(dls, metrics=[rmse, mae])
```
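tabular_learner builds one embedding layer per categorical column, including the _na flags added by FillMissing. fastai picks each embedding's width with a rule of thumb (emb_sz_rule in fastai.tabular.model); the sketch below restates that formula so you can estimate the sizes yourself. The category count it receives includes the extra #na# slot that Categorify reserves, so a flag column such as ROIC_na counts as three categories (#na#, False, True).

```python
def emb_sz_rule(n_cat: int) -> int:
    """Embedding width for a column with n_cat categories: min(600, round(1.6 * n_cat**0.56))."""
    return min(600, round(1.6 * n_cat ** 0.56))

# A boolean _na flag: #na#, False, True
print(emb_sz_rule(3))
```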

And we can train that model with the fit_one_cycle method (the fine_tune method won’t be useful here since we don’t have a pretrained model).
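fit_one_cycle follows the one-cycle policy: the learning rate warms up from a low value to a peak and then anneals to nearly zero, with both phases following a cosine curve. The sketch below computes that learning-rate curve using what we understand to be fastai's defaults (a peak of 1e-3 when no lr_max is given, div=25, div_final=1e5, pct_start=0.25). The momentum schedule, which fastai cycles in the opposite direction, is omitted, and one_cycle_lr is a hypothetical helper rather than the fastai API.

```python
import math

def sched_cos(start: float, end: float, pos: float) -> float:
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(pos: float, lr_max: float = 1e-3, div: float = 25.0,
                 div_final: float = 1e5, pct_start: float = 0.25) -> float:
    """Learning rate at training progress pos in [0, 1] under a one-cycle schedule."""
    if pos < pct_start:
        # Warm-up: lr_max / div up to lr_max
        return sched_cos(lr_max / div, lr_max, pos / pct_start)
    # Annealing: lr_max down to lr_max / div_final
    return sched_cos(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, one_cycle_lr(p))
```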

```python
print(f"Training {modelName} for {epochs} epochs")
learn.fit_one_cycle(epochs)
```

```
Training stockScreenerV3.0 for 2 epochs
```

| epoch | train_loss | valid_loss | _rmse | mae | time |
|---|---|---|---|---|---|
| 0 | 1.458944 | 110645.234375 | 332.634064 | 5.32265 | 01:55 |
| 1 | 0.70661 | 22672816.0 | 4761.597656 | 70.884811 | 01:53 |


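We can then look at some predictions on the validation set (the default for show_results). Note that, unlike show_batch, the inputs are shown in their processed form, with category indices and normalized continuous values: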

```python
learn.show_results(max_n=15)
```

|  | Industry | EV/EBIT_na | ROIC_na | Open | High | Low | Close | Volume | Dividends | Stock Splits | EV/EBIT | ROIC | Future Year Change | Future Year Change_pred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.0 | 1.0 | 1.0 | 0.390352 | 0.38723 | 0.393953 | 0.392971 | -0.404855 | -0.050866 | -0.013349 | -0.145133 | -0.086208 | 0.011517 | 0.207719 |
| 1 | 5.0 | 1.0 | 1.0 | -0.262461 | -0.264234 | -0.263215 | -0.263305 | 3.863284 | -0.050866 | -0.013349 | 0.139208 | -0.08768 | -0.108092 | -0.050459 |
| 2 | 4.0 | 1.0 | 1.0 | -0.288135 | -0.289057 | -0.288154 | -0.288723 | -0.503011 | -0.050866 | -0.013349 | 0.37312 | -0.087961 | 0.141655 | 0.103646 |
| 3 | 4.0 | 1.0 | 1.0 | -0.278466 | -0.280291 | -0.278849 | -0.279571 | -0.023821 | -0.050866 | -0.013349 | 0.019782 | -0.087378 | 0.431009 | 0.134975 |
| 4 | 27.0 | 1.0 | 1.0 | -0.221172 | -0.222126 | -0.221494 | -0.220542 | 0.655362 | -0.050866 | -0.013349 | -0.005178 | -0.087284 | 0.145508 | 0.352113 |
| 5 | 4.0 | 1.0 | 1.0 | -0.281279 | -0.282771 | -0.281292 | -0.282035 | -0.125137 | -0.050866 | -0.013349 | 0.001353 | -0.08731 | -0.469636 | 0.1401 |
| 6 | 17.0 | 1.0 | 1.0 | -0.290506 | -0.289142 | -0.290553 | -0.288182 | -0.505915 | -0.050866 | -0.013349 | 0.073023 | -0.087536 | 1.227273 | 0.207363 |
| 7 | 7.0 | 1.0 | 1.0 | -0.286985 | -0.288656 | -0.288184 | -0.288673 | 0.213455 | -0.050866 | -0.013349 | -0.073078 | -0.086926 | 0.910113 | 0.066598 |
| 8 | 27.0 | 1.0 | 1.0 | -0.265556 | -0.266046 | -0.266835 | -0.266474 | 2.780753 | -0.050866 | -0.013349 | -0.094034 | -0.086769 | 0.857711 | 0.355706 |
| 9 | 36.0 | 1.0 | 1.0 | 2.666928 | 2.796943 | 2.691123 | 2.798673 | -0.203024 | -0.050866 | -0.013349 | 4.01159 | -0.08841 | 0.250856 | -0.101249 |


## Evaluation
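The evaluation cell below reports MAE, RMSE and R² on the test predictions after discarding outliers. Written out, with $y_i$ the actual Future Year Change, $\hat{y}_i$ the prediction, and $s_r$ the sample standard deviation of all $N$ residuals (pandas' default, ddof=1), the code computes:

$$
r_i = y_i - \hat{y}_i, \qquad
\mathcal{K} = \{\, i : |r_i| \le 2 s_r \,\}, \qquad
n = |\mathcal{K}|
$$

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i \in \mathcal{K}} |r_i|, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i \in \mathcal{K}} r_i^2}, \qquad
R^2 = 1 - \frac{\sum_{i \in \mathcal{K}} r_i^2}{\sum_{i \in \mathcal{K}} \left(y_i - \bar{y}_{\mathcal{K}}\right)^2}
$$

where $\bar{y}_{\mathcal{K}}$ is the mean of the kept actuals. Because the filter removes exactly the points with the largest errors, all three numbers are more favourable than they would be on the full test set.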

```python
import stockFetcher  # loads stockFetcher.ipynb through import_ipynb
```

```python
def evaluate_model(learn, test_tickers, model_name, model_folder, cont_names, cat_names):
    """
    Evaluate a fastai model on a list of test tickers and log the results.

    Args:
        learn: fastai Learner object
        test_tickers (list): List of ticker symbols to test on
        model_name (str): Name of the model for logging
        model_folder (Path): Folder where evaluation results are saved
        cont_names (list): List of continuous feature names (not used directly;
            the Learner's DataLoaders already know them)
        cat_names (list): List of categorical feature names (not used directly)

    Returns:
        tuple: (mae, rmse, r2), or (None, None, None) if evaluation was not possible
    """
    test_data_list = []

    # Collect test data for all tickers
    for ticker in test_tickers:
        try:
            test_data = stockFetcher.getTickerDataFrom1YrAgo(ticker)
            if test_data.empty:
                print(f"Skipping {ticker} due to missing data")
                continue
            test_data_list.append(test_data)
        except Exception as e:
            print(f"Error fetching data for {ticker}: {e}")
            continue

    if not test_data_list:
        print("No valid test data collected")
        return None, None, None

    # Combine all test data into a single DataFrame
    combined_test_data = pd.concat(test_data_list, ignore_index=True)

    # Build a test DataLoader; it reuses the procs (Categorify, FillMissing, Normalize)
    # fitted on the training data
    test_dl = learn.dls.test_dl(combined_test_data)

    # targs comes from the 'Future Year Change' column, so the fetched data must contain it
    preds, targs = learn.get_preds(dl=test_dl)
    if targs is None:
        print("Test data has no target column; cannot compute metrics")
        return None, None, None

    results_df = pd.DataFrame({
        'Predicted': preds.numpy().flatten(),
        'Actual': targs.numpy().flatten()
    })

    # Residual = actual - predicted
    results_df['Residual'] = results_df['Actual'] - results_df['Predicted']

    # Drop points whose absolute residual exceeds twice the residual standard deviation.
    # Note: this makes the metrics below more optimistic than on the full test set.
    outlier_threshold = 2 * results_df['Residual'].std()
    filtered_df = results_df[results_df['Residual'].abs() <= outlier_threshold]

    # Metrics on the filtered points
    mae = np.mean(np.abs(filtered_df['Residual']))
    rmse = np.sqrt(np.mean(filtered_df['Residual']**2))
    r2 = 1 - (np.sum(filtered_df['Residual']**2) /
              np.sum((filtered_df['Actual'] - filtered_df['Actual'].mean())**2))

    # Log the number of tickers that actually produced data, not the number requested
    log_evaluation(model_name, mae, rmse, r2, model_folder, n_tested=len(test_data_list))

    # Create visualizations
    plot_results(filtered_df, model_name, model_folder)

    return mae, rmse, r2

def log_evaluation(model_name, mae, rmse, r2, model_folder, n_tested):
    """Append evaluation metrics to modelEvaluations.csv in model_folder."""
    model_folder.mkdir(parents=True, exist_ok=True)
    log_file = model_folder / "modelEvaluations.csv"

    new_entry_df = pd.DataFrame([{
        "Model Name": model_name,
        "Timestamp": datetime.now().strftime('%Y-%m-%d %H:%M'),
        "MAE": f'{mae:.3f}',
        "RMSE": f'{rmse:.3f}',
        "R2": f'{r2:.3f}',
        "Epochs": epochs,         # global training parameter
        "Test Amount": n_tested,  # tickers that actually returned data
        "Cat Names": catNames,    # global training parameters
        "Cont Names": contNames,
    }])

    try:
        log_df = pd.read_csv(log_file)
        log_df = pd.concat([log_df, new_entry_df], ignore_index=True)
    except FileNotFoundError:
        log_df = new_entry_df

    log_df.to_csv(log_file, index=False)
    print(f"Logged evaluation results to {log_file}")

def plot_results(filtered_df, model_name, model_folder):
    """Create, save and show the predicted-vs-actual and residual plots."""
    plt.figure(figsize=(12, 8))

    # Scatter plot of predicted vs. actual returns
    plt.subplot(2, 1, 1)
    actuals = filtered_df['Actual']
    predictions = filtered_df['Predicted']
    plt.scatter(actuals, predictions, alpha=0.7, label='Predictions')

    # Perfect prediction line (y = x)
    min_val = min(actuals.min(), predictions.min())
    max_val = max(actuals.max(), predictions.max())
    plt.plot([min_val, max_val], [min_val, max_val],
             color='red', linestyle='--', label='Perfect Prediction')

    plt.title(f'Predicted vs. Actual Returns - {model_name}', fontsize=14)
    plt.xlabel('Actual Returns', fontsize=12)
    plt.ylabel('Predicted Returns', fontsize=12)
    plt.legend()
    plt.grid(alpha=0.5)

    # Residual plot: residuals should scatter evenly around zero
    plt.subplot(2, 1, 2)
    plt.scatter(predictions, filtered_df['Residual'], alpha=0.7)
    plt.axhline(y=0, color='r', linestyle='--')
    plt.title('Residual Plot', fontsize=14)
    plt.xlabel('Predicted Returns', fontsize=12)
    plt.ylabel('Residual', fontsize=12)
    plt.grid(alpha=0.5)

    plt.tight_layout()

    # Save next to the evaluation log, then display
    model_folder.mkdir(parents=True, exist_ok=True)
    plt.savefig(model_folder / f'{model_name}_evaluation.png', dpi=150)
    plt.show()

def get_random_test_tickers(n_tickers):
    """
    Get random tickers from TestData/tickers.csv that aren't in our training set.

    Args:
        n_tickers (int): Number of test tickers to return

    Returns:
        list: List of ticker symbols
    """
    training_tickers = set(stockFetcher.symbols)

    try:
        # Candidate tickers are read from TestData/tickers.csv
        ticker_df = pd.read_csv(testFolder / 'tickers.csv')
        tickers = ticker_df['Ticker'].tolist()

        # Exclude tickers the model was trained on
        clean_tickers = [t for t in tickers if t not in training_tickers]

        if len(clean_tickers) < n_tickers:
            print(f"Warning: Only {len(clean_tickers)} tickers available")
            return clean_tickers

        # Randomly select tickers without replacement
        return np.random.choice(clean_tickers, size=n_tickers, replace=False).tolist()

    except Exception as e:
        print(f"Error reading tickers: {e}")
        # Fallback: a hard-coded list of well-known US tickers, minus any in the training set
        fallback_tickers = [
            'KO', 'PEP', 'JNJ', 'PG', 'WMT', 'HD', 'MCD', 'NKE',
            'DIS', 'SBUX', 'COST', 'TGT', 'LOW', 'MO', 'CVS'
        ]
        fallback_tickers = [t for t in fallback_tickers if t not in training_tickers]
        return np.random.choice(fallback_tickers, size=min(n_tickers, len(fallback_tickers)),
                                replace=False).tolist()

if __name__ == "__main__":
    # In a notebook, __name__ is "__main__", so this block runs when the cell executes.
    # Distinct names avoid overwriting fastai's rmse/mae metric functions used above.
    test_mae, test_rmse, test_r2 = evaluate_model(
        learn=learn,
        test_tickers=get_random_test_tickers(n_tickers=testSize),
        model_name=modelName,
        model_folder=modelFolder,
        cont_names=contNames,
        cat_names=catNames
    )

    if test_mae is not None and test_rmse is not None and test_r2 is not None:
        print("Evaluation Results:")
        print(f"MAE: {test_mae:.3f}")
        print(f"RMSE: {test_rmse:.3f}")
        print(f"R2: {test_r2:.3f}")
    else:
        print("Evaluation failed. Metrics are None.")
```
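In the run below, almost every ticker fails, including large caps such as XOM, CVX and AXP that traded throughout January 2024. Every request uses the same window, 2024-01-20 to 2024-01-22. 2024-01-20 was a Saturday and 2024-01-21 a Sunday, and because the end date of yfinance's history(start=..., end=...) is exclusive, that window contains no trading day at all. stockFetcher is not shown in this section, so the following is an assumption about its internals: a plausible fix is to query a wider window that ends the day after the target date and then take the last row on or before the target. Both helpers below (query_window and last_row_on_or_before) are hypothetical. The few "no timezone found" failures (such as RDS.A, SPLK, SPWR and NESN) look like a different problem, symbols that yfinance cannot resolve, which a wider window would not fix.

```python
from typing import Optional

import pandas as pd

def query_window(target, days_back: int = 7):
    """Hypothetical helper: (start, end) date strings for a daily-price query.

    The end is the day after `target` because the end date is treated as exclusive.
    """
    target = pd.Timestamp(target).normalize()
    start = target - pd.Timedelta(days=days_back)
    end = target + pd.Timedelta(days=1)
    return start.strftime('%Y-%m-%d'), end.strftime('%Y-%m-%d')

def last_row_on_or_before(history: pd.DataFrame, target) -> Optional[pd.Series]:
    """Hypothetical helper: the last row dated on or before `target`, or None if there is none."""
    idx = history.index
    if idx.tz is not None:
        # Drop the exchange timezone so that naive dates compare cleanly
        idx = idx.tz_localize(None)
    mask = idx.normalize() <= pd.Timestamp(target).normalize()
    if not mask.any():
        return None
    return history[mask].iloc[-1]

# Offline example: a week of daily bars ending on Friday 2024-01-19
bars = pd.DataFrame({'Close': [10.0, 11.0, 12.0, 13.0, 14.0]},
                    index=pd.bdate_range('2024-01-15', '2024-01-19', tz='America/New_York'))
print(query_window('2024-01-21'))
print(last_row_on_or_before(bars, '2024-01-21'))
```

With yfinance, the two pieces would combine roughly as start, end = query_window(target) followed by yf.Ticker(ticker).history(start=start, end=end), then last_row_on_or_before on the result.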


$HRL: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$NTR: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for HRL: No historical data available for HRL around 2024-01-21.
Skipping HRL due to missing data
Error fetching data for NTR: No historical data available for NTR around 2024-01-21.
Skipping NTR due to missing data


$AMTD: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$BEPC: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$DB: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for AMTD: No historical data available for AMTD around 2024-01-21.
Skipping AMTD due to missing data
Error fetching data for BEPC: No historical data available for BEPC around 2024-01-21.
Skipping BEPC due to missing data
Error fetching data for DB: No historical data available for DB around 2024-01-21.
Skipping DB due to missing data


$PM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$ZS: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for PM: No historical data available for PM around 2024-01-21.
Skipping PM due to missing data
Error fetching data for ZS: No historical data available for ZS around 2024-01-21.
Skipping ZS due to missing data


$EXPI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$TM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for EXPI: No historical data available for EXPI around 2024-01-21.
Skipping EXPI due to missing data
Error fetching data for TM: No historical data available for TM around 2024-01-21.
Skipping TM due to missing data


$TRGP: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$SAP: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for TRGP: No historical data available for TRGP around 2024-01-21.
Skipping TRGP due to missing data
Error fetching data for SAP: No historical data available for SAP around 2024-01-21.
Skipping SAP due to missing data


$MOS: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$AGI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$TAL: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$WIX: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for MOS: No historical data available for MOS around 2024-01-21.
Skipping MOS due to missing data
Error fetching data for AGI: No historical data available for AGI around 2024-01-21.
Skipping AGI due to missing data
Error fetching data for TAL: No historical data available for TAL around 2024-01-21.
Skipping TAL due to missing data
Error fetching data for WIX: No historical data available for WIX around 2024-01-21.
Skipping WIX due to missing data


$EOG: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$RACE: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for EOG: No historical data available for EOG around 2024-01-21.
Skipping EOG due to missing data
Error fetching data for RACE: No historical data available for RACE around 2024-01-21.
Skipping RACE due to missing data


$AXP: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$AFRM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for AXP: No historical data available for AXP around 2024-01-21.
Skipping AXP due to missing data
Error fetching data for AFRM: No historical data available for AFRM around 2024-01-21.
Skipping AFRM due to missing data


$SCHW: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$MO: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for SCHW: No historical data available for SCHW around 2024-01-21.
Skipping SCHW due to missing data
Error fetching data for MO: No historical data available for MO around 2024-01-21.
Skipping MO due to missing data


$VIPS: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$ITUB: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$BBVA: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for VIPS: No historical data available for VIPS around 2024-01-21.
Skipping VIPS due to missing data
Error fetching data for ITUB: No historical data available for ITUB around 2024-01-21.
Skipping ITUB due to missing data
Error fetching data for BBVA: No historical data available for BBVA around 2024-01-21.
Skipping BBVA due to missing data


$SQM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$RUN: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$CF: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for SQM: No historical data available for SQM around 2024-01-21.
Skipping SQM due to missing data
Error fetching data for RUN: No historical data available for RUN around 2024-01-21.
Skipping RUN due to missing data
Error fetching data for CF: No historical data available for CF around 2024-01-21.
Skipping CF due to missing data


$KGC: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$TSN: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for KGC: No historical data available for KGC around 2024-01-21.
Skipping KGC due to missing data
Error fetching data for TSN: No historical data available for TSN around 2024-01-21.
Skipping TSN due to missing data


$PKX: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$BTI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for PKX: No historical data available for PKX around 2024-01-21.
Skipping PKX due to missing data
Error fetching data for BTI: No historical data available for BTI around 2024-01-21.
Skipping BTI due to missing data


$ENIC: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$CME: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$CSIQ: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$GFL: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$DASH: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for ENIC: No historical data available for ENIC around 2024-01-21.
Skipping ENIC due to missing data
Error fetching data for CME: No historical data available for CME around 2024-01-21.
Skipping CME due to missing data
Error fetching data for CSIQ: No historical data available for CSIQ around 2024-01-21.
Skipping CSIQ due to missing data
Error fetching data for GFL: No historical data available for GFL around 2024-01-21.
Skipping GFL due to missing data
Error fetching data for DASH: No historical data available for DASH around 2024-01-21.
Skipping DASH due to missing data


$BP: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$IPI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for BP: No historical data available for BP around 2024-01-21.
Skipping BP due to missing data
Error fetching data for IPI: No historical data available for IPI around 2024-01-21.
Skipping IPI due to missing data


$RLI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$RNR: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for RLI: No historical data available for RLI around 2024-01-21.
Skipping RLI due to missing data
Error fetching data for RNR: No historical data available for RNR around 2024-01-21.
Skipping RNR due to missing data


$LAC: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$EL: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$MSCI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for LAC: No historical data available for LAC around 2024-01-21.
Skipping LAC due to missing data
Error fetching data for EL: No historical data available for EL around 2024-01-21.
Skipping EL due to missing data
Error fetching data for MSCI: No historical data available for MSCI around 2024-01-21.
Skipping MSCI due to missing data


$MT: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$DDOG: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$VLO: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for MT: No historical data available for MT around 2024-01-21.
Skipping MT due to missing data
Error fetching data for DDOG: No historical data available for DDOG around 2024-01-21.
Skipping DDOG due to missing data
Error fetching data for VLO: No historical data available for VLO around 2024-01-21.
Skipping VLO due to missing data


$HBAN: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$COP: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for HBAN: No historical data available for HBAN around 2024-01-21.
Skipping HBAN due to missing data
Error fetching data for COP: No historical data available for COP around 2024-01-21.
Skipping COP due to missing data


$MKTX: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$PAAS: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$NTDOY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for MKTX: No historical data available for MKTX around 2024-01-21.
Skipping MKTX due to missing data
Error fetching data for PAAS: No historical data available for PAAS around 2024-01-21.
Skipping PAAS due to missing data
Error fetching data for NTDOY: No historical data available for NTDOY around 2024-01-21.
Skipping NTDOY due to missing data


$HMC: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$SPGI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for HMC: No historical data available for HMC around 2024-01-21.
Skipping HMC due to missing data
Error fetching data for SPGI: No historical data available for SPGI around 2024-01-21.
Skipping SPGI due to missing data


$MFG: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$SOFI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$AMT: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for MFG: No historical data available for MFG around 2024-01-21.
Skipping MFG due to missing data
Error fetching data for SOFI: No historical data available for SOFI around 2024-01-21.
Skipping SOFI due to missing data
Error fetching data for AMT: No historical data available for AMT around 2024-01-21.
Skipping AMT due to missing data


$GFI: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$LC: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$DLR: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for GFI: No historical data available for GFI around 2024-01-21.
Skipping GFI due to missing data
Error fetching data for LC: No historical data available for LC around 2024-01-21.
Skipping LC due to missing data
Error fetching data for DLR: No historical data available for DLR around 2024-01-21.
Skipping DLR due to missing data


$CBAUF: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$TWLO: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$VOD: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for CBAUF: No historical data available for CBAUF around 2024-01-21.
Skipping CBAUF due to missing data
Error fetching data for TWLO: No historical data available for TWLO around 2024-01-21.
Skipping TWLO due to missing data
Error fetching data for VOD: No historical data available for VOD around 2024-01-21.
Skipping VOD due to missing data


$BLDP: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$TECK: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for BLDP: No historical data available for BLDP around 2024-01-21.
Skipping BLDP due to missing data
Error fetching data for TECK: No historical data available for TECK around 2024-01-21.
Skipping TECK due to missing data


$RELX: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$IMBBY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for RELX: No historical data available for RELX around 2024-01-21.
Skipping RELX due to missing data
Error fetching data for IMBBY: No historical data available for IMBBY around 2024-01-21.
Skipping IMBBY due to missing data


$PNC: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$KEY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for PNC: No historical data available for PNC around 2024-01-21.
Skipping PNC due to missing data
Error fetching data for KEY: No historical data available for KEY around 2024-01-21.
Skipping KEY due to missing data


$IBKR: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$ATHM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$NOW: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for IBKR: No historical data available for IBKR around 2024-01-21.
Skipping IBKR due to missing data
Error fetching data for ATHM: No historical data available for ATHM around 2024-01-21.
Skipping ATHM due to missing data
Error fetching data for NOW: No historical data available for NOW around 2024-01-21.
Skipping NOW due to missing data


$SPWR: possibly delisted; no timezone found


Error fetching data for SPWR: No historical data available for SPWR around 2024-01-21.
Skipping SPWR due to missing data


$NESN: possibly delisted; no timezone found


Error fetching data for NESN: No historical data available for NESN around 2024-01-21.
Skipping NESN due to missing data


$XOM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$TFC: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for XOM: No historical data available for XOM around 2024-01-21.
Skipping XOM due to missing data
Error fetching data for TFC: No historical data available for TFC around 2024-01-21.
Skipping TFC due to missing data


$CVX: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$BABA: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$EQIX: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for CVX: No historical data available for CVX around 2024-01-21.
Skipping CVX due to missing data
Error fetching data for BABA: No historical data available for BABA around 2024-01-21.
Skipping BABA due to missing data
Error fetching data for EQIX: No historical data available for EQIX around 2024-01-21.
Skipping EQIX due to missing data


$COF: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$EDU: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$AM: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for COF: No historical data available for COF around 2024-01-21.
Skipping COF due to missing data
Error fetching data for EDU: No historical data available for EDU around 2024-01-21.
Skipping EDU due to missing data
Error fetching data for AM: No historical data available for AM around 2024-01-21.
Skipping AM due to missing data


$SLB: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$BK: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for SLB: No historical data available for SLB around 2024-01-21.
Skipping SLB due to missing data
Error fetching data for BK: No historical data available for BK around 2024-01-21.
Skipping BK due to missing data


$FANUY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for FANUY: No historical data available for FANUY around 2024-01-21.
Skipping FANUY due to missing data


$SPLK: possibly delisted; no timezone found
$HIG: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for SPLK: No historical data available for SPLK around 2024-01-21.
Skipping SPLK due to missing data
Error fetching data for HIG: No historical data available for HIG around 2024-01-21.
Skipping HIG due to missing data


$WMB: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$DSY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$YY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for WMB: No historical data available for WMB around 2024-01-21.
Skipping WMB due to missing data
Error fetching data for DSY: No historical data available for DSY around 2024-01-21.
Skipping DSY due to missing data
Error fetching data for YY: No historical data available for YY around 2024-01-21.
Skipping YY due to missing data


$RDS.A: possibly delisted; no timezone found
$ERIC: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for RDS.A: No historical data available for RDS.A around 2024-01-21.
Skipping RDS.A due to missing data
Error fetching data for ERIC: No historical data available for ERIC around 2024-01-21.
Skipping ERIC due to missing data


$OKE: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$STT: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for OKE: No historical data available for OKE around 2024-01-21.
Skipping OKE due to missing data
Error fetching data for STT: No historical data available for STT around 2024-01-21.
Skipping STT due to missing data


$SAN: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$NTTYY: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for SAN: No historical data available for SAN around 2024-01-21.
Skipping SAN due to missing data
Error fetching data for NTTYY: No historical data available for NTTYY around 2024-01-21.
Skipping NTTYY due to missing data


$PLUG: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$ING: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$RDFN: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for PLUG: No historical data available for PLUG around 2024-01-21.
Skipping PLUG due to missing data
Error fetching data for ING: No historical data available for ING around 2024-01-21.
Skipping ING due to missing data
Error fetching data for RDFN: No historical data available for RDFN around 2024-01-21.
Skipping RDFN due to missing data


$UL: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)
$VRSK: possibly delisted; no price data found  (1d 2024-01-20 -> 2024-01-22)


Error fetching data for UL: No historical data available for UL around 2024-01-21.
Skipping UL due to missing data
Error fetching data for VRSK: No historical data available for VRSK around 2024-01-21.
Skipping VRSK due to missing data
No valid test data collected
Evaluation failed. Metrics are None.
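
Every failure above asks for the window 2024-01-20 -> 2024-01-22 around a target date of 2024-01-21. 2024-01-20 was a Saturday and 2024-01-21 a Sunday. yfinance treats the end date of a request as exclusive, so that window may contain no trading day at all. That could explain why heavily traded tickers such as XOM and CVX are reported as "possibly delisted". A few "no timezone found" tickers, such as RDS.A, may genuinely be renamed or delisted. The fetcher's code is not shown in this section, so this diagnosis is an assumption.

The sketch below uses two hypothetical helpers, price_window and close_on_or_before, which are not part of stockFetcher or yfinance. It widens the request to look back several days and then takes the last close on or before the target date.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

import pandas as pd


def price_window(target: date, lookback_days: int = 7) -> Tuple[str, str]:
    """Return (start, end) date strings for a price request around `target`.

    Assumes `end` is exclusive (as in yfinance), so one day is added to
    include the target date itself. Looking back several days makes it
    very likely that the window contains trading days even when the
    target falls on a weekend or a market holiday.
    """
    start = target - timedelta(days=lookback_days)
    end = target + timedelta(days=1)
    return start.isoformat(), end.isoformat()


def close_on_or_before(history: pd.DataFrame, target: date) -> Optional[float]:
    """Return the last 'Close' on or before `target`, or None if there is none."""
    if history.empty:
        return None
    idx = pd.DatetimeIndex(history.index)
    if idx.tz is not None:
        # Drop the exchange timezone but keep the local calendar date
        idx = idx.tz_localize(None)
    mask = idx.normalize() <= pd.Timestamp(target)
    if not mask.any():
        return None
    return float(history.loc[mask, "Close"].iloc[-1])


# weekday() is 0 for Monday ... 6 for Sunday
print(date(2024, 1, 21).weekday())  # -> 6
print(price_window(date(2024, 1, 21)))  # -> ('2024-01-14', '2024-01-22')
```

In the notebook, the two strings could be passed as start and end to yf.Ticker(ticker).history(...). The returned frame could then be handed to close_on_or_before, so a weekend target falls back to the previous Friday's close.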


## Export the model

In [555]:
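# Make sure the output folder exists before writing the exported Learner
modelFolder.mkdir(parents=True, exist_ok=True)
learn.export(modelFolder / f'{modelName}.pkl')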

## Tests (the app is recommended, but the cells below work too)


To get predictions on a new DataFrame, use the test_dl method of the DataLoaders. It applies the same preprocessing that was fitted on the training data. The new DataFrame does not need to contain the dependent variable column.
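
To illustrate that point, the sketch below uses a hypothetical helper, to_model_frame, which is not part of fastai. It turns a single fetched record into a one-row DataFrame and keeps only the columns listed in catNames and contNames. The dependent variable and extra fields such as Market Cap are therefore left out, and a missing input column fails early with a clear message. It assumes the fetched data is either a dict or a DataFrame, as the cell below also does.

```python
from collections.abc import Mapping
from typing import List, Union

import pandas as pd


def to_model_frame(
    data: Union[pd.DataFrame, Mapping],
    cat_names: List[str],
    cont_names: List[str],
) -> pd.DataFrame:
    """Return a DataFrame holding only the model's input columns (sketch)."""
    # A single record (e.g. a dict) becomes a one-row DataFrame
    df = pd.DataFrame([data]) if isinstance(data, Mapping) else data
    wanted = cat_names + cont_names
    missing = [c for c in wanted if c not in df.columns]
    if missing:
        raise KeyError(f"Test data is missing model columns: {missing}")
    # The dependent variable is not required, so it is simply not selected
    return df[wanted].copy()
```

Trimming the columns is optional, because extra columns are not used as model inputs. The column check is the main benefit.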

In [556]:
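predictionTarget = 'AAPL'

# Fetch the latest data for the ticker (stockFetcher must already be imported in this notebook)
test_df = stockFetcher.getTickerData(predictionTarget)

# getTickerData may return a single record as a dict; wrap it in a one-row DataFrame
if isinstance(test_df, dict):
    test_df = pd.DataFrame([test_df])

# Build a test DataLoader that applies the training-time preprocessing
dl = learn.dls.test_dl(test_df)
test_df.head()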

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  to[n].fillna(self.na_dict[n], inplace=True)


|   | Open | High | Low | Close | Volume | Dividends | Stock Splits | EV/EBIT | Market Cap | ROIC | Industry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 232.119995 | 232.289993 | 228.479996 | 229.979996 | 68247100 | 0.0 | 0.0 | 59.880474 | 3458416000000.0 | 0.013193 | Consumer Electronics |
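
The pandas warning printed above appears to come from fastai's own preprocessing, not from this notebook. The reported line, to[n].fillna(self.na_dict[n], inplace=True), is, as far as we can tell, in fastai's FillMissing step. According to the message, this chained inplace call will stop filling values from pandas 3.0 onward. For code you control, the two forms the message recommends look like this. The ROIC column and its values are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ROIC": [0.36, np.nan, 0.40]})
fill_value = df["ROIC"].median()  # median of the non-missing values

# Pattern pandas warns about (inplace call on a column selected from the frame):
# df["ROIC"].fillna(fill_value, inplace=True)

# Option 1: assign the filled column back to the DataFrame
df["ROIC"] = df["ROIC"].fillna(fill_value)

# Option 2: call fillna on the DataFrame itself with a {column: value} mapping
df2 = pd.DataFrame({"ROIC": [0.36, np.nan, 0.40]})
df2.fillna({"ROIC": fill_value}, inplace=True)

print(df["ROIC"].tolist())
print(df2["ROIC"].tolist())
```

Both forms modify the original DataFrame rather than an intermediate copy. Silencing the warning inside fastai itself would require a change to that library line.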


In [557]:
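# get_preds returns (predictions, targets); the targets are not needed here
preds, _ = learn.get_preds(dl=dl)
print(f"Prediction for {predictionTarget}:")
# One row per input row, one column: 'Future Year Change' as a fraction, shown as a percentage
print(f"{preds[0, 0].item() * 100:.2f}%")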

Prediction for AAPL:
-69.96%


Note:
A model cannot interpret categories it never saw during training (here, Industry values that are absent from the training data), so the training data should cover the categories you expect to predict on. Likewise, if the test data has missing values in columns that had none in the training data, address this before training.
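
One way to act on this note is to compare the test data with the training data before predicting. The sketch below uses a hypothetical helper, check_test_frame, which is not a fastai function. It lists categorical values that never appear in the training data, and continuous columns that have missing values only in the test data.

As far as we know, fastai's Categorify maps unseen categories to its reserved #na# index. Its FillMissing step asserts when a continuous column has missing values that were absent from the training set. Both cases are therefore worth catching early.

```python
from typing import Dict, List

import pandas as pd


def check_test_frame(
    train_df: pd.DataFrame,
    test_df: pd.DataFrame,
    cat_names: List[str],
    cont_names: List[str],
) -> Dict[str, list]:
    """Flag test data the trained model may handle poorly (illustrative sketch)."""
    issues: Dict[str, list] = {"unseen_categories": [], "new_missing": []}
    for col in cat_names:
        seen = set(train_df[col].dropna().unique())
        unseen = sorted(set(test_df[col].dropna().unique()) - seen)
        if unseen:
            # These values have no embedding learned specifically for them
            issues["unseen_categories"].append((col, unseen))
    for col in cont_names:
        # Missing values the training data never had, so preprocessing may have no fill value
        if test_df[col].isna().any() and not train_df[col].isna().any():
            issues["new_missing"].append(col)
    return issues
```

In this notebook it could be called as check_test_frame(df, test_df, catNames, contNames) before building the test DataLoader.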