<a href="https://colab.research.google.com/github/Jimmynycu/finrl_for_VICI/blob/main/FINRL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Generate a Python script for Google Colab using the FINRL library to train a Deep Reinforcement Learning agent for trading TSMC stock, following the specified workflow, including environment setup, data downloading, feature engineering, environment creation, agent training, and backtesting, with all configurations centralized in a dictionary and extensive comments.

## Environment setup

### Subtask:
Install necessary libraries using `!pip install`.


**Reasoning**:
The subtask requires executing the provided code cell to install the necessary libraries.



In [1]:
!pip install --quiet swig
!pip install --quiet wrds
!pip install --quiet pyportfolioopt
!pip install --quiet git+https://github.com/AI4Finance-Foundation/FinRL.git

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.9/1.9 MB[0m [31m20.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m65.5/65.5 kB[0m [31m4.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.0/3.0 MB[0m [31m63.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.7/62.7 kB[0m [31m4.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m222.1/222.1 kB[0m [31m15.4 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m108.7/108.7 kB[0m [31m8.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m

## Library imports and configuration

### Subtask:
Import required libraries and define a configuration dictionary with all parameters.


**Reasoning**:
Import the necessary libraries and define the configuration dictionary as instructed.



In [28]:
import pandas as pd
import numpy as np
import datetime
import yfinance as yf

from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl import config_tickers
from finrl.config import INDICATORS

import itertools
from stable_baselines3 import A2C, DDPG, PPO, SAC, TD3
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv

# Define the configuration dictionary
config = {
    'TICKER_LIST': ["TSM"],
    # --- ADJUST THESE DATES ---
    'START_DATE': '2010-01-01',
    'END_DATE': '2025-08-26',        # Extend data download to yesterday
    'INDICATORS': INDICATORS,

    'TRAIN_START_DATE': '2010-01-01',
    'TRAIN_END_DATE': '2019-12-31',    # Train on the first 10 years

    'TRADE_START_DATE': '2020-01-01',  # Test on the subsequent ~5.5 years
    'TRADE_END_DATE': '2025-08-26',
    'ERL_PARAMS': {
        'learning_rate': 1e-5,
        'batch_size': 1024,
        'gamma': 0.99,
        'seed': 312,
        'net_dimension': 512,
        'target_tau': 0.001,
        'activation_fn': 'relu',
        'use_batch_norm': False,
        'use_layer_norm': False,
        'state_memory_size': 100,
        'buffer_size': 1000000,
        'train_freq': 1,
        'gradient_steps': 1,
        'ent_coef': 0.01,
        'action_noise': None,
        'optimize_memory_usage': False,
        'policy_kwargs': None,
        'device': 'auto'
    },
    'AGENT': 'ppo',
    'TRAIN_STEPS': 200000
}

# Display the config dictionary to verify
display(config)

{'TICKER_LIST': ['TSM'],
 'START_DATE': '2010-01-01',
 'END_DATE': '2025-08-26',
 'INDICATORS': ['macd',
  'boll_ub',
  'boll_lb',
  'rsi_30',
  'cci_30',
  'dx_30',
  'close_30_sma',
  'close_60_sma'],
 'TRAIN_START_DATE': '2010-01-01',
 'TRAIN_END_DATE': '2019-12-31',
 'TRADE_START_DATE': '2020-01-01',
 'TRADE_END_DATE': '2025-08-26',
 'ERL_PARAMS': {'learning_rate': 1e-05,
  'batch_size': 1024,
  'gamma': 0.99,
  'seed': 312,
  'net_dimension': 512,
  'target_tau': 0.001,
  'activation_fn': 'relu',
  'use_batch_norm': False,
  'use_layer_norm': False,
  'state_memory_size': 100,
  'buffer_size': 1000000,
  'train_freq': 1,
  'gradient_steps': 1,
  'ent_coef': 0.01,
  'action_noise': None,
  'optimize_memory_usage': False,
  'policy_kwargs': None,
  'device': 'auto'},
 'AGENT': 'ppo',
 'TRAIN_STEPS': 200000}

## Data downloading

### Subtask:
Download historical stock data for TSMC using `YahooDownloader` based on the dates in the configuration.


**Reasoning**:
Download the historical stock data for TSMC using the YahooDownloader based on the dates and ticker in the config dictionary, and display the head and shape of the resulting dataframe.



In [29]:
# Data Downloading Cell

# Instantiate YahooDownloader with parameters from the config dictionary
downloader = YahooDownloader(start_date=config['START_DATE'],
                             end_date=config['END_DATE'],
                             ticker_list=config['TICKER_LIST'])

# Fetch the data
df = downloader.fetch_data()

# Display the first few rows and the shape of the dataframe
display(df.head())
display(df.shape)

[*********************100%***********************]  1 of 1 completed

Shape of DataFrame:  (3935, 8)


  return datetime.utcnow().replace(tzinfo=utc)

  return datetime.utcnow().replace(tzinfo=utc)


Price,date,close,high,low,open,volume,tic,day
0,2010-01-04,7.298707,7.368039,7.229376,7.241982,8096400,TSM,0
1,2010-01-05,7.267192,7.349129,7.223072,7.311312,14375900,TSM,1
2,2010-01-06,7.241983,7.317617,7.172652,7.286103,13608400,TSM,2
3,2010-01-07,7.002474,7.210469,6.977263,7.19156,27346600,TSM,3
4,2010-01-08,6.996171,7.071805,6.95205,7.021382,16895300,TSM,4


(3935, 8)

In [30]:
# --- Data Sanity Check Cell ---
# Run this cell immediately after downloading the data

print("--- Inspecting Raw Downloaded Data ---")

# Check the first 5 and last 5 rows to verify the date range
print("First 5 rows:")
display(df.head())
print("\nLast 5 rows:")
display(df.tail())

# Get a summary of the data types and check for missing values
print("\nDataFrame Info (non-null counts):")
df.info()

# Get a statistical summary of the data
print("\nStatistical Summary:")
display(df.describe())

--- Inspecting Raw Downloaded Data ---
First 5 rows:


  return datetime.utcnow().replace(tzinfo=utc)


Price,date,close,high,low,open,volume,tic,day
0,2010-01-04,7.298707,7.368039,7.229376,7.241982,8096400,TSM,0
1,2010-01-05,7.267192,7.349129,7.223072,7.311312,14375900,TSM,1
2,2010-01-06,7.241983,7.317617,7.172652,7.286103,13608400,TSM,2
3,2010-01-07,7.002474,7.210469,6.977263,7.19156,27346600,TSM,3
4,2010-01-08,6.996171,7.071805,6.95205,7.021382,16895300,TSM,4



Last 5 rows:


  return datetime.utcnow().replace(tzinfo=utc)


Price,date,close,high,low,open,volume,tic,day
3930,2025-08-19,232.699997,240.169998,232.580002,240.020004,14594700,TSM,1
3931,2025-08-20,228.600006,229.029999,223.699997,228.139999,17165200,TSM,2
3932,2025-08-21,227.330002,230.330002,226.259995,228.149994,7449100,TSM,3
3933,2025-08-22,232.990005,234.449997,226.169998,228.0,10299500,TSM,4
3934,2025-08-25,235.589996,237.279999,232.25,234.300003,7655000,TSM,0



DataFrame Info (non-null counts):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3935 entries, 0 to 3934
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   date    3935 non-null   object 
 1   close   3935 non-null   float64
 2   high    3935 non-null   float64
 3   low     3935 non-null   float64
 4   open    3935 non-null   float64
 5   volume  3935 non-null   int64  
 6   tic     3935 non-null   object 
 7   day     3935 non-null   int32  
dtypes: float64(4), int32(1), int64(1), object(2)
memory usage: 230.7+ KB

Statistical Summary:


  return datetime.utcnow().replace(tzinfo=utc)


Price,close,high,low,open,volume,day
count,3935.0,3935.0,3935.0,3935.0,3935.0,3935.0
mean,53.149442,53.768826,52.524919,53.172122,10791540.0,2.023634
std,53.79733,54.530332,53.058363,53.843651,5779225.0,1.39947
min,5.943594,6.006622,5.861657,5.98141,1499700.0,0.0
25%,13.268132,13.363146,13.195692,13.269564,7064050.0,1.0
50%,31.291338,31.558044,31.016018,31.256924,9621800.0,2.0
75%,85.716553,86.657143,84.732396,85.469645,12959850.0,3.0
max,245.600006,248.279999,241.699997,246.429993,68667500.0,4.0


## Feature engineering

### Subtask:
Apply technical indicators and other features to the downloaded data using `FeatureEngineer`.


**Reasoning**:
The previous code failed because `FeatureEngineer` does not have an `add_tradable_day` method. The day of the week column is already present in the dataframe from the YahooDownloader. Also, the error indicates that the `add_turbulence` method also doesn't exist. I will remove the calls to these non-existent methods and only use `add_technical_indicator`.



In [31]:
# Feature Engineering Cell

# Import the original INDICATORS list from finrl.config
from finrl.config import INDICATORS as ORIGINAL_INDICATORS

# Instantiate FeatureEngineer with the original list of indicators
fe = FeatureEngineer(use_technical_indicator=True,
                     tech_indicator_list = ORIGINAL_INDICATORS, # Use original indicator names
                     use_turbulence=False,
                     user_defined_feature=False)

# Add technical indicators
# IMPORTANT: Ensure you run this cell only ONCE after running the data downloading cell (6062dff2)
df = fe.add_technical_indicator(df)

# Fill NaN values with 0 after adding technical indicators
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df.fillna(0, inplace=True)

# Display the head and shape of the modified dataframe
display(df.head())
display(df.shape)

Unnamed: 0,date,close,high,low,open,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma
0,2010-01-04,7.298707,7.368039,7.229376,7.241982,8096400,TSM,0,0.0,0.0,0.0,0.0,0.0,0.0,7.298707,7.298707
1,2010-01-05,7.267192,7.349129,7.223072,7.311312,14375900,TSM,1,-0.000707,7.327519,7.23838,0.0,-66.666667,100.0,7.28295,7.28295
2,2010-01-06,7.241983,7.317617,7.172652,7.286103,13608400,TSM,2,-0.001683,7.326135,7.212453,0.0,-100.0,100.0,7.269294,7.269294
3,2010-01-07,7.002474,7.210469,6.977263,7.19156,27346600,TSM,3,-0.010733,7.473415,6.931763,0.0,-133.333333,100.0,7.202589,7.202589
4,2010-01-08,6.996171,7.071805,6.95205,7.021382,16895300,TSM,4,-0.015675,7.459797,6.862814,0.0,-99.805137,100.0,7.161305,7.161305


  return datetime.utcnow().replace(tzinfo=utc)


(3935, 16)

## Environment creation

### Subtask:
Create a custom trading environment compatible with Stable Baselines3 using the processed data.


**Reasoning**:
Create the custom trading environment using the processed data and the configuration parameters.



In [32]:
# Environment Creation Cell

# Import the StockTradingEnv class from the identified path
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv


from stable_baselines3.common.vec_env import DummyVecEnv

# Define environment parameters
stock_dimension = len(config['TICKER_LIST'])
state_space = 1 + 2*stock_dimension + len(config['INDICATORS'])
print(f"State space dimension: {state_space}")
# Define buy and sell costs (0.1% of the transaction amount)
# Since we have only one stock, the list will have one element
buy_cost_list = sell_cost_list = [0.001] * stock_dimension

# Fill any remaining NaN values with 0 just before splitting the data
df.fillna(0, inplace=True)


# Split the df DataFrame into training and trading datasets
train_data = data_split(df, config['TRAIN_START_DATE'], config['TRAIN_END_DATE'])
trade_data = data_split(df, config['TRADE_START_DATE'], config['TRADE_END_DATE'])

# Instantiate the StockTradingEnv for the training data
e_train_gym = StockTradingEnv(df = train_data,
                              stock_dim = stock_dimension,
                              hmax = 100, # Max shares to trade
                              initial_amount = 100000, # Starting cash
                              num_stock_shares = [0] * stock_dimension, # Add the num_stock_shares argument
                              state_space = state_space,
                              tech_indicator_list = config['INDICATORS'],
                              action_space = stock_dimension, # Action space is the number of stocks
                              buy_cost_pct = buy_cost_list,
                              sell_cost_pct = sell_cost_list,
                              reward_scaling = 1e-4, # Scale the reward
                              print_verbosity = 5 # Print frequency
                             )

# Wrap the training environment using DummyVecEnv
env_train = DummyVecEnv([lambda: e_train_gym])

# Instantiate the StockTradingEnv for the trading data
# Instantiate the StockTradingEnv for the trading data
e_trade_gym = StockTradingEnv(df = trade_data,
                              stock_dim = stock_dimension,
                              hmax = 100,
                              initial_amount = 100000,
                              # --- ADD THIS LINE ---
                              num_stock_shares = [0] * stock_dimension,
                              state_space = state_space,
                              tech_indicator_list = config['INDICATORS'],
                              action_space = stock_dimension,
                              buy_cost_pct = buy_cost_list,
                              sell_cost_pct = sell_cost_list,
                              reward_scaling = 1e-4,
                              print_verbosity = 5
                             )

# Wrap the trading environment using DummyVecEnv
env_trade = DummyVecEnv([lambda: e_trade_gym])

# Print a message indicating successful creation
print("Training and trading environments created successfully.")

State space dimension: 11
Training and trading environments created successfully.


  return datetime.utcnow().replace(tzinfo=utc)


## Agent training

### Subtask:
Initialize and train a DRL agent (e.g., A2C, PPO, DDPG) using Stable Baselines3 on the training data.


**Reasoning**:
Import the necessary DRL agent classes from stable_baselines3 and train the agent based on the config dictionary.



In [33]:
# Import the selected DRL agent class
if config['AGENT'] == 'a2c':
    from stable_baselines3 import A2C
    Agent = A2C
elif config['AGENT'] == 'ppo':
    from stable_baselines3 import PPO
    Agent = PPO
elif config['AGENT'] == 'ddpg':
    from stable_baselines3 import DDPG
    Agent = DDPG
else:
    raise ValueError(f"Agent {config['AGENT']} not supported.")

# Prepare parameters for the agent constructor, keeping only accepted arguments
agent_params = {k: v for k, v in config['ERL_PARAMS'].items() if k in ['learning_rate', 'n_steps', 'batch_size', 'gamma', 'gae_lambda', 'clip_range', 'clip_range_vf', 'normalize_advantage', 'ent_coef', 'vf_coef', 'max_grad_norm', 'use_sde', 'sde_sample_freq', 'enable_experiencing_repay', 'target_kl', 'create_eval_env', 'policy_kwargs', 'verbose', 'seed', 'device', '_init_setup_model']}

# Instantiate the agent
# We pass the env_train which is already a VecEnv (DummyVecEnv in this case)
# We unpack the filtered agent_params dictionary as keyword arguments
model = Agent("MlpPolicy", env_train, verbose=0, **agent_params)

# Train the agent
print(f"Training agent {config['AGENT']} for {config['TRAIN_STEPS']} steps...")
model.learn(total_timesteps=config['TRAIN_STEPS'])

# Print a message indicating training is complete
print("Agent training complete.")

model.save("ppo_tsm_model.zip")

print("Agent training complete and model saved.")

Training agent ppo for 200000 steps...
day: 2514, episode: 5
begin_total_asset: 100000.00
end_total_asset: 209447.43
total_reward: 109447.43
total_cost: 3094.97
total_trades: 2466
Sharpe: 0.784


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 10
begin_total_asset: 100000.00
end_total_asset: 379786.75
total_reward: 279786.75
total_cost: 2908.85
total_trades: 2487
Sharpe: 0.892


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 15
begin_total_asset: 100000.00
end_total_asset: 476901.21
total_reward: 376901.21
total_cost: 2960.26
total_trades: 2487
Sharpe: 0.921


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 20
begin_total_asset: 100000.00
end_total_asset: 403804.30
total_reward: 303804.30
total_cost: 2872.91
total_trades: 2489
Sharpe: 0.905


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 25
begin_total_asset: 100000.00
end_total_asset: 460191.20
total_reward: 360191.20
total_cost: 2913.17
total_trades: 2492
Sharpe: 0.931


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 30
begin_total_asset: 100000.00
end_total_asset: 404660.67
total_reward: 304660.67
total_cost: 2923.88
total_trades: 2477
Sharpe: 0.895


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 35
begin_total_asset: 100000.00
end_total_asset: 460597.59
total_reward: 360597.59
total_cost: 2929.01
total_trades: 2489
Sharpe: 0.942


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 40
begin_total_asset: 100000.00
end_total_asset: 490106.03
total_reward: 390106.03
total_cost: 2900.67
total_trades: 2495
Sharpe: 0.922


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 45
begin_total_asset: 100000.00
end_total_asset: 430118.24
total_reward: 330118.24
total_cost: 2905.15
total_trades: 2499
Sharpe: 0.894


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 50
begin_total_asset: 100000.00
end_total_asset: 508341.91
total_reward: 408341.91
total_cost: 2828.22
total_trades: 2501
Sharpe: 0.935


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 55
begin_total_asset: 100000.00
end_total_asset: 472840.43
total_reward: 372840.43
total_cost: 2773.22
total_trades: 2491
Sharpe: 0.906


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 60
begin_total_asset: 100000.00
end_total_asset: 445141.26
total_reward: 345141.26
total_cost: 2851.44
total_trades: 2488
Sharpe: 0.922


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 65
begin_total_asset: 100000.00
end_total_asset: 448420.76
total_reward: 348420.76
total_cost: 2788.30
total_trades: 2495
Sharpe: 0.898


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 70
begin_total_asset: 100000.00
end_total_asset: 578812.74
total_reward: 478812.74
total_cost: 2875.12
total_trades: 2496
Sharpe: 0.942


  return datetime.utcnow().replace(tzinfo=utc)


day: 2514, episode: 75
begin_total_asset: 100000.00
end_total_asset: 526837.18
total_reward: 426837.18
total_cost: 2905.34
total_trades: 2490
Sharpe: 0.936


  return datetime.utcnow().replace(tzinfo=utc)


Agent training complete.
Agent training complete and model saved.


  return datetime.utcnow().replace(tzinfo=utc)


In [34]:
# Backtesting Cell

# Import necessary libraries for backtesting
from finrl.plot import backtest_stats, backtest_plot
import matplotlib.pyplot as plt # Import matplotlib for plotting
import pandas as pd # Import pandas to ensure Series is available
import numpy as np # Import numpy
from datetime import datetime # Import datetime for index creation

# Ensure the trading environment is reset for the backtest
# The env_trade was created in the Environment Creation Cell (a2ac2c9a)
# The trained model was created in the Agent Training Cell (d0980032)

obs = env_trade.reset()

# Lists to store account value during backtesting
account_value_history = []

print("Running backtesting simulation manually...")

# Iterate through the trading environment
# We will use the env_trade (DummyVecEnv) directly for stepping
# The underlying env can still be accessed to get metrics
env = env_trade.envs[0] # Access the underlying StockTradingEnv


done = False
# Get the initial observation from the VecEnv
obs = env_trade.reset()

# Ensure obs is a numpy array before passing to predict
# DummyVecEnv.reset() usually returns a numpy array directly, but adding a check for robustness
if isinstance(obs, tuple):
    obs = obs[0]

# Collect the initial account value from the environment's asset_memory
# The asset_memory is likely initialized with the initial_amount
# Append the initial account value at the start of the backtest period
account_value_history.append(env.asset_memory[0])


# Loop through the trading days
# The number of trading days is the number of unique dates in the trade_data
num_trading_days = len(trade_data['date'].unique())

# Loop through the trading days/steps
for i in range(num_trading_days):
    # Use the trained model to predict the action
    action, _states = model.predict(obs, deterministic=True)

    # Step the environment with the predicted action using the VecEnv
    obs, reward, done, info = env_trade.step(action)

    # Handle potential tuple observations from newer Gym versions after stepping
    if isinstance(obs, tuple):
        obs = obs[0]

    # Append the account value after the step
    # Ensure asset_memory has at least one element before accessing the last one
    if env.asset_memory:
         account_value_history.append(env.asset_memory[-1])
    else:
         # Fallback if asset_memory is unexpectedly empty
         account_value_history.append(account_value_history[-1] if account_value_history else env.initial_amount)

    # If done is True before the loop finishes (shouldn't happen in a full backtest), break
    if done:
        print(f"Backtesting loop finished early at step {i}.")
        break

# After the loop, the account_value_history should have num_trading_days + 1 entries
# (initial + value after each step).

# Create the index for the account value series
# It should be the trade start date + all trade dates
trade_start_date_str = config['TRADE_START_DATE']
trade_start_date = pd.Timestamp(trade_start_date_str) # Use pandas.Timestamp

trade_dates_full = trade_data['date'].unique()
# Convert trade dates to pandas.Timestamp objects
trade_dates_ts = [pd.Timestamp(d) for d in trade_dates_full]

# The full index includes the start date and all trading dates as pandas.Timestamp
full_index = [trade_start_date] + trade_dates_ts

# Ensure the length of the index matches the length of the history
if len(full_index) != len(account_value_history):
    print(f"Warning: Length of index ({len(full_index)}) does not match length of account value history ({len(account_value_history)}).")
    # If lengths mismatch, we cannot create the series correctly.
    # In a real scenario, you would investigate why the history length is incorrect.
    # For now, we will print a warning and skip statistics/plotting.
    account_value_series = None
    daily_return_series = None
else:
    # Create the account value series
    account_value_series = pd.Series(account_value_history, index=full_index)

    # Calculate daily returns from the account value series
    daily_return_series = account_value_series.pct_change().dropna()

    # Create a DataFrame from the account value series for backtest_stats and backtest_plot
    # These functions expect a DataFrame with an 'account_value' column
    results_df = pd.DataFrame({'account_value': account_value_series})

    # Reset the index to make the date a column named 'index' or 'date'
    results_df = results_df.reset_index()

    # Rename the index column to 'date' as expected by backtest_stats/plot
    results_df.rename(columns={'index': 'date'}, inplace=True)

# Check if we successfully created the series and DataFrame
if 'results_df' in locals() and results_df is not None and len(results_df) > 1: # Need at least two points for returns

    # Generate backtest statistics
    print("\nBacktest Statistics:")
    # Pass the DataFrame to backtest_stats
    display(backtest_stats(results_df))

    # Generate backtest plot
    print("\nBacktest Plot:")
    # Pass the DataFrame to backtest_plot
    backtest_plot(results_df, baseline_ticker = str(config['TICKER_LIST'][0]), baseline_start = config['TRADE_START_DATE'], baseline_end = config['TRADE_END_DATE'])

    print("\nBacktesting complete.")

else:
    print("Not enough account value history collected (need at least 2 points) or index mismatch occurred for statistics and plotting.")
    print(f"Collected {len(account_value_history)} data points.")

Running backtesting simulation manually...
Backtesting loop finished early at step 1418.

Backtest Statistics:
Annual return         -2.220446e-16
Cumulative returns    -1.443290e-15
Annual volatility      5.016388e-01
Sharpe ratio           4.175560e-01
Calmar ratio          -2.763810e-16
Stability              5.994224e-01
Max drawdown          -8.034005e-01
Omega ratio            1.098735e+00
Sortino ratio          5.078169e-01
Skew                            NaN
Kurtosis                        NaN
Tail ratio             1.157887e+00
Daily value at risk   -6.236935e-02
dtype: float64


  return datetime.utcnow().replace(tzinfo=utc)


Unnamed: 0,0
Annual return,-2.220446e-16
Cumulative returns,-1.44329e-15
Annual volatility,0.5016388
Sharpe ratio,0.417556
Calmar ratio,-2.76381e-16
Stability,0.5994224
Max drawdown,-0.8034005
Omega ratio,1.098735
Sortino ratio,0.5078169
Skew,


[*********************100%***********************]  1 of 1 completed


Backtest Plot:
Shape of DataFrame:  (1419, 8)


  return datetime.utcnow().replace(tzinfo=utc)

  baseline_df = baseline_df.fillna(method="ffill").fillna(method="bfill")
  perf_stats.loc[stat, column] = str(np.round(value * 100, 3)) + "%"
  return datetime.utcnow().replace(tzinfo=utc)


Start date,2020-01-01,2020-01-01
End date,2025-08-25,2025-08-25
Total months,67,67
Unnamed: 0_level_3,Backtest,Unnamed: 2_level_3
Annual return,-0.0%,
Cumulative returns,-0.0%,
Annual volatility,50.164%,
Sharpe ratio,0.42,
Calmar ratio,-0.00,
Stability,0.60,
Max drawdown,-80.34%,
Omega ratio,1.10,
Sortino ratio,0.51,
Skew,,


Worst drawdown periods,Net drawdown in %,Peak date,Valley date,Recovery date,Duration
0,80.34,2025-07-16,2025-08-25,NaT,
1,56.46,2022-01-13,2022-11-02,2024-03-01,557.0
2,36.82,2025-01-22,2025-04-07,2025-06-25,111.0
3,22.56,2024-07-09,2024-08-02,2024-10-10,68.0
4,22.44,2021-02-12,2021-05-11,2022-01-12,239.0


  return datetime.utcnow().replace(tzinfo=utc)
  d = d.astype('datetime64[us]')
  return datetime.utcnow().replace(tzinfo=utc)
  return datetime.utcnow().replace(tzinfo=utc)
  ax.set_xticklabels(["Daily", "Weekly", "Monthly"])


Stress Events,mean,min,max
Covid,0.08%,-79.50%,12.65%


  return datetime.utcnow().replace(tzinfo=utc)
  return datetime.utcnow().replace(tzinfo=utc)



Backtesting complete.


# FinRL Trading Agent for TSMC (TSM): Project Summary

## Project Overview

This project implemented a Deep Reinforcement Learning (DRL) agent to learn an automated trading strategy for Taiwan Semiconductor Manufacturing Company (TSM) stock. The primary goal was to train an agent using historical data and then evaluate its performance on unseen future data through a rigorous backtest.

The project utilized the FinRL library, leveraging the Proximal Policy Optimization (PPO) algorithm from Stable Baselines3. The agent was trained on market data from 2010 to 2019 and subsequently backtested on the period from January 2020 to August 2025.

---

## Final Backtesting Results

After a comprehensive training and debugging process, the agent was backtested over a 5.5-year period. The final performance metrics indicate that while the agent learned an active trading strategy, it was ultimately **unprofitable and carried a high level of risk**.

**Key Performance Metrics:**

* **Cumulative Returns:** -0.0% (Broke even)
* **Annual Volatility:** 50.16%
* **Sharpe Ratio:** 0.42
* **Max Drawdown:** -80.34%

The most critical result is the **-80.34% maximum drawdown**, which signifies that at its worst point, the agent's strategy led to a loss of over 80% of the portfolio's value from its peak. This level of risk is unacceptable for a viable trading strategy.

*(You can paste a screenshot of your backtest plot here)*

---

## Analysis & Conclusion

The backtesting results demonstrate a classic challenge in algorithmic trading: a strategy that appears profitable during training may not generalize to new, unseen market conditions. The agent learned to trade actively but failed to produce positive returns, indicating that the patterns it identified in the training data were not robust enough to succeed in the test period.

The project was a success from a technical and debugging perspective. The final code represents a stable, end-to-end pipeline for training and evaluating DRL trading agents. The key challenges overcome included:

1.  **Data Cleaning:** Implementing robust procedures to handle `NaN` and `inf` values generated during feature engineering.
2.  **Simulation Logic:** Correcting the backtesting loop to properly handle the `done` signals from a vectorized environment.
3.  **Library Versioning:** Ensuring a stable and up-to-date version of the FinRL library was used to prevent silent failures.

---

## Future Work

The current result serves as a baseline. The focus now shifts from code debugging to improving the agent's intelligence. Future work should explore:

* **Hyperparameter Tuning:** Experimenting with different PPO settings (e.g., `learning_rate`, `gamma`) to find a more optimal configuration.
* **Feature Engineering:** Testing different combinations of technical indicators to provide the agent with more effective market signals.
* **Alternative Models:** Evaluating other DRL algorithms like A2C or SAC, which might be better suited for this financial task.
* **Longer Training:** Increasing the number of training steps to allow the agent more time to learn from the extensive historical data.