### __1__: Import necessary packages

In [1]:
include("include.jl");

### __2__: Load the JLD2 file

We gathered a daily open-high-low-close (OHLC) `dataset` for each firm in the [S&P 500](https://en.wikipedia.org/wiki/S%26P_500) from `01-03-2014` until `02-07-2025`, along with data for a few exchange-traded funds and volatility products over the same period.


In [18]:
data = MyOriginalPortfolioDataSet();
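
The helper `MyOriginalPortfolioDataSet` comes from `include.jl`, which is not shown here. As a rough sketch of what a loader like this might do, assuming the data sits under a `"dataset"` key in a JLD2 file, the round trip below writes a toy dictionary to a temporary file and reads it back with `JLD2.load`. The file name, tickers, and column name are made up for illustration.

```julia
using JLD2, DataFrames

# Toy stand-in for the real data: one small DataFrame per (hypothetical) ticker.
toy_dataset = Dict{String,DataFrame}(
    "AAA" => DataFrame(close = [10.0, 10.5, 10.2]),
    "BBB" => DataFrame(close = [20.0, 19.8]),
)

# Write the dictionary under the name "dataset" in a temporary JLD2 file.
toy_path = joinpath(mktempdir(), "toy_portfolio.jld2")
jldsave(toy_path; dataset = toy_dataset)

# JLD2.load returns a dictionary mapping each stored name to its value.
toy_data = JLD2.load(toy_path)
toy_loaded = toy_data["dataset"]
```

Indexing the result with `"dataset"` mirrors the `data["dataset"]` lookup used in the next step.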

### __3__: Extract the DataFrame from the loaded file

In [3]:
original_dataset = data["dataset"]

Dict{String, DataFrame} with 515 entries:
  "TPR"  => 1828×8 DataFrame…
  "EMR"  => 2792×8 DataFrame…
  "CTAS" => 2792×8 DataFrame…
  "HSIC" => 2792×8 DataFrame…
  "KIM"  => 2792×8 DataFrame…
  "PLD"  => 2792×8 DataFrame…
  "IEX"  => 2792×8 DataFrame…
  "KSU"  => 2001×8 DataFrame…
  "BAC"  => 2792×8 DataFrame…
  "CBOE" => 2792×8 DataFrame…
  "EXR"  => 2792×8 DataFrame…
  "NCLH" => 2792×8 DataFrame…
  "CVS"  => 2792×8 DataFrame…
  "DRI"  => 2792×8 DataFrame…
  "DTE"  => 2792×8 DataFrame…
  "ZION" => 2792×8 DataFrame…
  "AVY"  => 2792×8 DataFrame…
  "EW"   => 2792×8 DataFrame…
  "EA"   => 2792×8 DataFrame…
  "NWSA" => 2792×8 DataFrame…
  "BBWI" => 884×8 DataFrame…
  "CAG"  => 2792×8 DataFrame…
  "GPC"  => 2792×8 DataFrame

### __4__: Extract tickers with the same number of trading days as `AAPL`

Not all tickers in our dataset have the maximum number of trading days, for reasons such as acquisition or delisting events. Let's collect only those tickers with the maximum number of trading days. First, let's compute the number of records for a company that we know has the maximum value, e.g., `AAPL`, and save that value in the `maximum_number_trading_days` variable:

In [4]:
maximum_number_trading_days = original_dataset["AAPL"] |> nrow

2792
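
Relying on `AAPL` works as long as that ticker really has the longest history. A minimal sketch of a more direct check, using a toy dictionary with hypothetical tickers in place of `original_dataset`, computes the maximum row count over all entries and lists the tickers that fall short of it:

```julia
using DataFrames

# Toy stand-in for original_dataset; tickers and values are hypothetical.
toy_original = Dict{String,DataFrame}(
    "AAA" => DataFrame(close = [1.0, 2.0, 3.0, 4.0]),
    "BBB" => DataFrame(close = [1.0, 2.0]),
    "CCC" => DataFrame(close = [5.0, 6.0, 7.0, 8.0]),
)

# Largest number of rows found in any DataFrame in the dictionary.
toy_max_days = maximum(nrow, values(toy_original))

# Tickers whose record count is below that maximum, sorted alphabetically.
toy_short = sort([t for (t, df) in toy_original if nrow(df) < toy_max_days])
```

If the value computed this way ever differed from `nrow(original_dataset["AAPL"])`, that would indicate the reference ticker is not a safe choice.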

Then, iterate through our data and collect only the tickers with `maximum_number_trading_days` records. Equal-length records make it easier to perform analysis on the full dataset, such as computing the daily growth rates. Save that data in the `dataset::Dict{String,DataFrame}` variable:

In [5]:
dataset = Dict{String,DataFrame}();
for (ticker,data) ∈ original_dataset
    if (nrow(data) == maximum_number_trading_days)
        dataset[ticker] = data;
    end
end
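
The same selection can also be written with `filter`, which on a `Dict` passes each `key => value` pair to the predicate. The sketch below is an alternative under toy assumptions (hypothetical tickers and a made-up `close` column), not a replacement for the loop above. It ends with a sanity check that every retained DataFrame has the same length.

```julia
using DataFrames

# Toy stand-in for original_dataset; tickers and values are hypothetical.
toy_original = Dict{String,DataFrame}(
    "AAA" => DataFrame(close = [1.0, 2.0, 3.0]),
    "BBB" => DataFrame(close = [1.0, 2.0]),
    "CCC" => DataFrame(close = [4.0, 5.0, 6.0]),
)
toy_max_days = nrow(toy_original["AAA"])

# Keep only the entries whose DataFrame has exactly toy_max_days rows.
toy_dataset = filter(pair -> nrow(pair.second) == toy_max_days, toy_original)

# Sanity check: every retained DataFrame has the same number of rows.
all_same_length = all(df -> nrow(df) == toy_max_days, values(toy_dataset))
```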

Lastly, let's get a sorted list of the firms in the cleaned-up `dataset` and save it in the `list_of_all_tickers::Array{String,1}` array:

In [6]:
list_of_all_tickers = keys(dataset) |> collect |> sort

424-element Vector{String}:
 "A"
 "AAL"
 "AAP"
 "AAPL"
 "ABBV"
 "ABT"
 "ACN"
 "ADBE"
 "ADI"
 "ADM"
 "ADP"
 "ADSK"
 "AEE"
 ⋮
 "WST"
 "WU"
 "WY"
 "WYNN"
 "XEL"
 "XOM"
 "XRAY"
 "XYL"
 "YUM"
 "ZBRA"
 "ZION"
 "ZTS"

### __5__: Split and save each ticker's data into two dictionaries

In [7]:
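# Training window: start of 2014 through the last second of 2023.
# Every record after train_end goes into the test set.
train_start = DateTime(2014, 1, 1)
train_end = DateTime(2023, 12, 31, 23, 59, 59)

# One DataFrame per ticker for each split
train_dataset = Dict{String, DataFrame}()
test_dataset = Dict{String, DataFrame}()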

for ticker in list_of_all_tickers
    df = dataset[ticker]
    df.timestamp = DateTime.(df.timestamp)  # ensure the timestamp column is DateTime

    train_data = filter(row -> train_start <= row.timestamp <= train_end, df)
    test_data = filter(row -> row.timestamp > train_end, df)

    train_dataset[ticker] = train_data
    test_dataset[ticker] = test_data
end

# Write each split to its own JLD2 file in the data directory
train_file = joinpath(_PATH_TO_DATA, "train_dataset_2014_2023.jld2")
test_file = joinpath(_PATH_TO_DATA, "test_dataset_2024_onward.jld2")

@save train_file train_dataset
@save test_file test_dataset
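
Before relying on the saved files, it can help to confirm that a date-based split loses no rows and that the two pieces do not overlap. The sketch below applies the same cutoffs to a toy DataFrame. The `close` column, the `toy_` names, and the four sample dates are made up for illustration.

```julia
using DataFrames, Dates

toy_train_start = DateTime(2014, 1, 1)
toy_train_end = DateTime(2023, 12, 31, 23, 59, 59)

# Four hypothetical records: two before the cutoff, two after it.
toy_df = DataFrame(
    timestamp = [DateTime(2014, 1, 3), DateTime(2023, 12, 29),
                 DateTime(2024, 1, 2), DateTime(2025, 2, 7)],
    close = [100.0, 150.0, 151.0, 180.0],
)

toy_train = filter(row -> toy_train_start <= row.timestamp <= toy_train_end, toy_df)
toy_test = filter(row -> row.timestamp > toy_train_end, toy_df)

# Every row lands in exactly one of the two splits.
rows_accounted_for = nrow(toy_train) + nrow(toy_test) == nrow(toy_df)

# The latest training record is strictly earlier than the first test record.
no_overlap = maximum(toy_train.timestamp) < minimum(toy_test.timestamp)
```

Note that any record dated before the training start would fall into neither split, so the row-count check also flags data that begins earlier than expected.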

### Disclaimer: For Educational and Research Purposes Only

The content in this repository is provided strictly for informational, educational, and research purposes. It is not intended as, and should not be construed as, financial advice, an offer, or a solicitation to buy or sell any securities or derivative products.

#### Risk Warning

Trading and investing involve substantial risk. The models and strategies demonstrated here are for illustrative purposes only. Past performance, whether actual or backtested, is not a guarantee of future results.

You are solely responsible for any investment or trading decisions you make. Always conduct your own research and carefully assess your financial situation, investment objectives, and risk tolerance before trading or investing. You should only risk capital that you can afford to lose and that is not essential for your living expenses.