# Part 1. Install Packages

In [27]:
## install finrl library
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
## install additional packages for dynamic weighting
!pip install torch scikit-learn seaborn

Collecting git+https://github.com/AI4Finance-Foundation/FinRL.git
  Cloning https://github.com/AI4Finance-Foundation/FinRL.git to c:\users\ronit khanna\appdata\local\temp\pip-req-build-ytfif_5g
  Resolved https://github.com/AI4Finance-Foundation/FinRL.git to commit efc711f35e19b9ca5c5d0da2f8812f4719d613a8
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting elegantrl@ git+https://github.com/AI4Finance-Foundation/ElegantRL.git (from finrl==0.3.8)
  Cloning https://github.com/AI4Finance-Foundation/ElegantRL.git to c:\users\ronit khanna\appdata\local\temp\pip-install-elwktfk2\elegantrl_0bf708ac06084cd4993e662062d5ddff
  Resolved https://github.com/AI4Finance-Foundation/ElegantRL.git to commit b2ec84b19a82e

  Running command git clone --filter=blob:none --quiet https://github.com/AI4Finance-Foundation/FinRL.git 'C:\Users\Ronit Khanna\AppData\Local\Temp\pip-req-build-ytfif_5g'
  Running command git clone --filter=blob:none --quiet https://github.com/AI4Finance-Foundation/ElegantRL.git 'C:\Users\Ronit Khanna\AppData\Local\Temp\pip-install-elwktfk2\elegantrl_0bf708ac06084cd4993e662062d5ddff'




In [28]:
import pandas as pd
import yfinance as yf

from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl import config_tickers
from finrl.config import INDICATORS
from finrl.config import *
import itertools

# Part 2. Fetch data

[yfinance](https://github.com/ranaroussi/yfinance) is an open-source library that provides APIs fetching historical data form Yahoo Finance. In FinRL, we have a class called [YahooDownloader](https://github.com/AI4Finance-Foundation/FinRL/blob/master/finrl/meta/preprocessor/yahoodownloader.py) that use yfinance to fetch data from Yahoo Finance.

**OHLCV**: Data downloaded are in the form of OHLCV, corresponding to **open, high, low, close, volume,** respectively. OHLCV is important because they contain most of numerical information of a stock in time series. From OHLCV, traders can get further judgement and prediction like the momentum, people's interest, market trends, etc.

## Data for a single ticker

Here we provide two ways to fetch data with single ticker, let's take Apple Inc. (AAPL) as an example.

### Using yfinance

In [29]:
TRAIN_START_DATE = '2020-01-01'
TRADE_END_DATE = '2020-01-31'
aapl_df_yf = yf.download(tickers = "aapl", start=TRAIN_START_DATE, end=TRADE_END_DATE)

[*********************100%***********************]  1 of 1 completed


In [30]:
aapl_df_yf.head()

Price,Close,High,Low,Open,Volume
Ticker,AAPL,AAPL,AAPL,AAPL,AAPL
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
2020-01-02,72.620842,72.681289,71.373218,71.627092,135480400
2020-01-03,71.914818,72.676447,71.689957,71.847118,146322800
2020-01-06,72.487831,72.526518,70.783234,71.034694,118387200
2020-01-07,72.146942,72.753823,71.926915,72.497529,108872000
2020-01-08,73.307518,73.609752,71.84954,71.84954,132079200


### Using FinRL

In FinRL's YahooDownloader, we modified the data frame to the form that convenient for further data processing process. We use adjusted close price instead of close price, and add a column representing the day of a week (0-4 corresponding to Monday-Friday).

In [31]:
aapl_df_finrl = YahooDownloader(start_date = TRAIN_START_DATE,
                                end_date = TRAIN_END_DATE,
                                ticker_list = ['aapl']).fetch_data()

[*********************100%***********************]  1 of 1 completed

Shape of DataFrame:  (146, 8)





In [32]:
aapl_df_finrl.head()

Price,date,close,high,low,open,volume,tic,day
0,2020-01-02,72.620842,72.681289,71.373218,71.627092,135480400,aapl,3
1,2020-01-03,71.91481,72.676439,71.68995,71.84711,146322800,aapl,4
2,2020-01-06,72.487846,72.526533,70.783248,71.034709,118387200,aapl,0
3,2020-01-07,72.146935,72.753816,71.926907,72.497522,108872000,aapl,1
4,2020-01-08,73.307533,73.609768,71.849555,71.849555,132079200,aapl,2


## Data for the chosen tickers

In [33]:
NSE_TICKERS =  [
    "RELIANCE.NS", "TCS.NS", "INFY.NS", "HDFCBANK.NS", "ICICIBANK.NS",
    "HINDUNILVR.NS", "LT.NS", "SBIN.NS", "BHARTIARTL.NS", "ITC.NS",
    "KOTAKBANK.NS", "ASIANPAINT.NS", "AXISBANK.NS", "HCLTECH.NS", "BAJFINANCE.NS",
    "MARUTI.NS", "SUNPHARMA.NS", "WIPRO.NS", "ULTRACEMCO.NS", "NESTLEIND.NS",
    "NTPC.NS", "POWERGRID.NS", "JSWSTEEL.NS", "TITAN.NS", "TATAMOTORS.NS",
    "ADANIENT.NS", "ADANIPORTS.NS", "COALINDIA.NS", "BAJAJFINSV.NS", "TATASTEEL.NS",
    "TECHM.NS", "HDFCLIFE.NS", "GRASIM.NS", "HINDALCO.NS", "BPCL.NS",
    "EICHERMOT.NS", "BRITANNIA.NS", "HEROMOTOCO.NS", "DRREDDY.NS", "DIVISLAB.NS",
    "CIPLA.NS", "INDUSINDBK.NS", "BAJAJ-AUTO.NS", "ONGC.NS", "SBILIFE.NS",
    "ICICIPRULI.NS", "TATACONSUM.NS", "APOLLOHOSP.NS", "HDFCAMC.NS", "M&M.NS"
]

In [34]:
TRAIN_START_DATE = '2016-01-01'
TRAIN_END_DATE = '2020-01-01'
TRADE_START_DATE = '2020-01-01'
TRADE_END_DATE = '2023-01-01'


In [35]:
df_raw = YahooDownloader(start_date = TRAIN_START_DATE,
                     end_date = TRADE_END_DATE,
                     ticker_list = NSE_TICKERS).fetch_data()

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%********

Shape of DataFrame:  (84778, 8)


In [36]:
df_raw.head()

Price,date,close,high,low,open,volume,tic,day
0,2016-01-01,47.546936,47.863565,43.457161,44.116801,10963906,ADANIENT.NS,4
1,2016-01-01,253.466049,254.176569,246.408238,247.260855,1347893,ADANIPORTS.NS,4
2,2016-01-01,1414.610962,1433.203829,1403.184019,1412.383638,107024,APOLLOHOSP.NS,4
3,2016-01-01,816.870239,823.237854,815.150535,819.891381,294006,ASIANPAINT.NS,4
4,2016-01-01,439.014679,441.942094,434.769921,438.868314,3345654,AXISBANK.NS,4


# Part 3: Preprocess Data
We need to check for missing data and do feature engineering to convert the data point into a state.
* **Adding technical indicators**. In practical trading, various information needs to be taken into account, such as historical prices, current holding shares, technical indicators, etc. Here, we demonstrate two trend-following technical indicators: MACD and RSI.
* **Adding turbulence index**. Risk-aversion reflects whether an investor prefers to protect the capital. It also influences one's trading strategy when facing different market volatility level. To control the risk in a worst-case scenario, such as financial crisis of 2007–2008, FinRL employs the turbulence index that measures extreme fluctuation of asset price.

Hear let's take **MACD** as an example. Moving average convergence/divergence (MACD) is one of the most commonly used indicator showing bull and bear market. Its calculation is based on EMA (Exponential Moving Average indicator, measuring trend direction over a period of time.)

In [37]:
fe = FeatureEngineer(use_technical_indicator=True,
                     tech_indicator_list = INDICATORS,
                     use_vix=False,#volatility index
                     use_turbulence=True,
                     user_defined_feature = False)

processed = fe.preprocess_data(df_raw)

Successfully added technical indicators
Successfully added turbulence index


In [38]:
list_ticker = processed["tic"].unique().tolist()
list_date = list(pd.date_range(processed['date'].min(),processed['date'].max()).astype(str))
combination = list(itertools.product(list_date,list_ticker))

processed_full = pd.DataFrame(combination,columns=["date","tic"]).merge(processed,on=["date","tic"],how="left")
processed_full = processed_full[processed_full['date'].isin(processed['date'])]
processed_full = processed_full.sort_values(['date','tic'])

processed_full = processed_full.fillna(0)

In [39]:
processed_full.head()

Unnamed: 0,date,tic,close,high,low,open,volume,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,turbulence
0,2016-01-01,ADANIENT.NS,47.546936,47.863565,43.457161,44.116801,10963906.0,4.0,0.0,49.959148,42.496159,0.0,-66.666667,100.0,47.546936,47.546936,0.0
1,2016-01-01,ADANIPORTS.NS,253.466049,254.176569,246.408238,247.260855,1347893.0,4.0,0.0,49.959148,42.496159,0.0,-66.666667,100.0,253.466049,253.466049,0.0
2,2016-01-01,APOLLOHOSP.NS,1414.610962,1433.203829,1403.184019,1412.383638,107024.0,4.0,0.0,49.959148,42.496159,0.0,-66.666667,100.0,1414.610962,1414.610962,0.0
3,2016-01-01,ASIANPAINT.NS,816.870239,823.237854,815.150535,819.891381,294006.0,4.0,0.0,49.959148,42.496159,0.0,-66.666667,100.0,816.870239,816.870239,0.0
4,2016-01-01,AXISBANK.NS,439.014679,441.942094,434.769921,438.868314,3345654.0,4.0,0.0,49.959148,42.496159,0.0,-66.666667,100.0,439.014679,439.014679,0.0


# Part 4: Save the Data

### Split the data for training and trading

In [40]:
train = data_split(processed_full, TRAIN_START_DATE,TRAIN_END_DATE)
trade = data_split(processed_full, TRADE_START_DATE,TRADE_END_DATE)
print(len(train))
print(len(trade))

45218
34362


### Save data to csv file

For Colab users, you can open the virtual directory in colab and manually download the files.

For users running on your local environment, the csv files should be at the same directory of this notebook.

In [41]:
train.to_csv('train_data.csv')
trade.to_csv('trade_data.csv')

# Part 5: Additional Data Processing for Dynamic Weighting

For dynamic weighting, we need to prepare additional features that will help predict strategy performance. This includes volatility measures, momentum indicators, and market regime features.

In [42]:
# Additional imports for dynamic weighting
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import StandardScaler
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

print("✅ Dynamic weighting libraries loaded successfully!")

✅ Dynamic weighting libraries loaded successfully!


In [43]:
def create_market_regime_features(df):
    """
    Create market regime features for dynamic weighting
    These features help predict which RL strategies will perform better
    """
    print("🔄 Creating market regime features...")

    # Calculate market-wide indicators
    market_data = df.groupby('date').agg({
        'close': 'mean',
        'volume': 'sum',
        'turbulence': 'mean'
    }).reset_index()

    # Market volatility (rolling 5-day)
    market_data['market_volatility'] = market_data['close'].pct_change().rolling(5).std()

    # Market momentum (rolling 10-day)
    market_data['market_momentum'] = market_data['close'].pct_change().rolling(10).mean()

    # Market trend (20-day moving average slope)
    market_data['ma_20'] = market_data['close'].rolling(20).mean()
    market_data['market_trend'] = market_data['ma_20'].diff()

    # Volume pressure
    market_data['volume_pressure'] = market_data['volume'].rolling(5).mean() / market_data['volume'].rolling(20).mean()

    # Market regime classification
    market_data['regime_bull'] = (market_data['market_momentum'] > 0.001).astype(int)
    market_data['regime_bear'] = (market_data['market_momentum'] < -0.001).astype(int)
    market_data['regime_sideways'] = ((market_data['market_momentum'] >= -0.001) &
                                     (market_data['market_momentum'] <= 0.001)).astype(int)

    # Fill NaN values
    market_data = market_data.fillna(0)

    # Merge back with original data
    df_enhanced = df.merge(market_data[['date', 'market_volatility', 'market_momentum',
                                       'market_trend', 'volume_pressure', 'regime_bull',
                                       'regime_bear', 'regime_sideways']],
                          on='date', how='left')

    print(f"✅ Added {len(market_data.columns) - 4} market regime features")
    return df_enhanced


In [44]:
def create_technical_features(df):
    """
    Create additional technical features for each stock
    """
    print("🔄 Creating additional technical features...")

    df_tech = df.copy()

    # Group by ticker to calculate stock-specific features
    for ticker in df_tech['tic'].unique():
        mask = df_tech['tic'] == ticker

        # Price-based features
        df_tech.loc[mask, 'price_momentum_3'] = df_tech.loc[mask, 'close'].pct_change(3)
        df_tech.loc[mask, 'price_momentum_5'] = df_tech.loc[mask, 'close'].pct_change(5)
        df_tech.loc[mask, 'price_volatility_5'] = df_tech.loc[mask, 'close'].pct_change().rolling(5).std()
        df_tech.loc[mask, 'price_volatility_10'] = df_tech.loc[mask, 'close'].pct_change().rolling(10).std()

        # Volume-based features
        df_tech.loc[mask, 'volume_ma_ratio'] = (df_tech.loc[mask, 'volume'] /
                                               df_tech.loc[mask, 'volume'].rolling(20).mean())

        # Bollinger Bands
        rolling_mean = df_tech.loc[mask, 'close'].rolling(20).mean()
        rolling_std = df_tech.loc[mask, 'close'].rolling(20).std()
        df_tech.loc[mask, 'bb_upper'] = rolling_mean + (rolling_std * 2)
        df_tech.loc[mask, 'bb_lower'] = rolling_mean - (rolling_std * 2)
        df_tech.loc[mask, 'bb_position'] = ((df_tech.loc[mask, 'close'] - df_tech.loc[mask, 'bb_lower']) /
                                           (df_tech.loc[mask, 'bb_upper'] - df_tech.loc[mask, 'bb_lower']))

        # Support/Resistance levels
        df_tech.loc[mask, 'support_level'] = df_tech.loc[mask, 'low'].rolling(20).min()
        df_tech.loc[mask, 'resistance_level'] = df_tech.loc[mask, 'high'].rolling(20).max()
        df_tech.loc[mask, 'support_distance'] = (df_tech.loc[mask, 'close'] - df_tech.loc[mask, 'support_level']) / df_tech.loc[mask, 'close']
        df_tech.loc[mask, 'resistance_distance'] = (df_tech.loc[mask, 'resistance_level'] - df_tech.loc[mask, 'close']) / df_tech.loc[mask, 'close']

    # Fill NaN values
    df_tech = df_tech.fillna(0)

    print("✅ Additional technical features created")
    return df_tech


In [45]:
def prepare_dynamic_weighting_data(df):
    """
    Prepare comprehensive dataset for dynamic weighting system
    """
    print("🚀 Preparing data for dynamic weighting system...")

    # Create market regime features
    df_market = create_market_regime_features(df)

    # Create additional technical features
    df_final = create_technical_features(df_market)

    # Create feature summary
    feature_cols = [col for col in df_final.columns if col not in ['date', 'tic', 'day']]

    print(f"\n📊 Dynamic Weighting Dataset Summary:")
    print(f"   • Total features: {len(feature_cols)}")
    print(f"   • Market regime features: 7")
    print(f"   • Technical features: {len([col for col in feature_cols if any(x in col for x in ['momentum', 'volatility', 'bb_', 'support', 'resistance'])])}")
    print(f"   • Original FinRL features: {len(feature_cols) - 7 - 12}")
    print(f"   • Date range: {df_final['date'].min()} to {df_final['date'].max()}")
    print(f"   • Total records: {len(df_final)}")

    return df_final


In [46]:
# Apply dynamic weighting data preparation
print("🔄 Enhancing dataset for dynamic weighting...")
processed_dw = prepare_dynamic_weighting_data(processed_full)

print("✅ Dynamic weighting data preparation complete!")

🔄 Enhancing dataset for dynamic weighting...
🚀 Preparing data for dynamic weighting system...
🔄 Creating market regime features...
✅ Added 8 market regime features
🔄 Creating additional technical features...
✅ Additional technical features created

📊 Dynamic Weighting Dataset Summary:
   • Total features: 33
   • Market regime features: 7
   • Technical features: 13
   • Original FinRL features: 14
   • Date range: 2016-01-01 to 2022-12-30
   • Total records: 79580
✅ Dynamic weighting data preparation complete!


# Part 6: Enhanced Data Splitting and Saving

### Split the data for training and trading (Enhanced)

In [47]:
# Split data with enhanced features
train_dw = data_split(processed_dw, TRAIN_START_DATE, TRAIN_END_DATE)
trade_dw = data_split(processed_dw, TRADE_START_DATE, TRADE_END_DATE)

print(f"Enhanced Training Data: {len(train_dw)} records")
print(f"Enhanced Trading Data: {len(trade_dw)} records")

# Also keep original data for compatibility
train = data_split(processed_full, TRAIN_START_DATE, TRAIN_END_DATE)
trade = data_split(processed_full, TRADE_START_DATE, TRADE_END_DATE)

print(f"Original Training Data: {len(train)} records")
print(f"Original Trading Data: {len(trade)} records")

Enhanced Training Data: 45218 records
Enhanced Trading Data: 34362 records
Original Training Data: 45218 records
Original Trading Data: 34362 records


In [48]:
# Save enhanced datasets
print("💾 Saving enhanced datasets...")

# Save enhanced data for dynamic weighting
train_dw.to_csv('train_data_enhanced.csv')
trade_dw.to_csv('trade_data_enhanced.csv')

# Save original data (for compatibility)
train.to_csv('train_data.csv')
trade.to_csv('trade_data.csv')

print("✅ All datasets saved successfully!")
print("\n📁 Files created:")
print("   • train_data.csv (original)")
print("   • trade_data.csv (original)")
print("   • train_data_enhanced.csv (for dynamic weighting)")
print("   • trade_data_enhanced.csv (for dynamic weighting)")

💾 Saving enhanced datasets...
✅ All datasets saved successfully!

📁 Files created:
   • train_data.csv (original)
   • trade_data.csv (original)
   • train_data_enhanced.csv (for dynamic weighting)
   • trade_data_enhanced.csv (for dynamic weighting)


# Part 7: Data Visualization for Dynamic Weighting

In [49]:
# Visualize market regime features
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Market volatility over time
market_summary = processed_dw.groupby('date').first().reset_index()
axes[0, 0].plot(pd.to_datetime(market_summary['date']), market_summary['market_volatility'])
axes[0, 0].set_title('Market Volatility Over Time')
axes[0, 0].set_xlabel('Date')
axes[0, 0].set_ylabel('Volatility')
axes[0, 0].grid(True, alpha=0.3)

# Market momentum
axes[0, 1].plot(pd.to_datetime(market_summary['date']), market_summary['market_momentum'])
axes[0, 1].set_title('Market Momentum Over Time')
axes[0, 1].set_xlabel('Date')
axes[0, 1].set_ylabel('Momentum')
axes[0, 1].grid(True, alpha=0.3)

# Market regimes
regime_data = market_summary[['regime_bull', 'regime_bear', 'regime_sideways']].sum()
axes[1, 0].bar(regime_data.index, regime_data.values, color=['green', 'red', 'gray'])
axes[1, 0].set_title('Market Regime Distribution')
axes[1, 0].set_ylabel('Days')

# Turbulence vs Market Volatility
axes[1, 1].scatter(market_summary['turbulence'], market_summary['market_volatility'], alpha=0.6)
axes[1, 1].set_title('Turbulence vs Market Volatility')
axes[1, 1].set_xlabel('Turbulence')
axes[1, 1].set_ylabel('Market Volatility')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("📊 Market regime visualization complete!")

📊 Market regime visualization complete!


In [50]:
# Feature correlation heatmap
print("🔍 Analyzing feature correlations...")

# Select key features for correlation analysis
key_features = ['market_volatility', 'market_momentum', 'market_trend', 'volume_pressure',
               'turbulence', 'macd', 'rsi_30', 'cci_30', 'dx_30']

# Get correlation matrix
sample_data = processed_dw.groupby('date').first()[key_features].fillna(0)
correlation_matrix = sample_data.corr()

# Plot heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0,
            fmt='.2f', square=True, cbar_kws={'label': 'Correlation'})
plt.title('Feature Correlation Matrix for Dynamic Weighting')
plt.tight_layout()
plt.show()

print("✅ Feature correlation analysis complete!")
print("\n🎯 Key Insights:")
print("   • Market volatility and momentum are key predictive features")
print("   • Technical indicators provide complementary information")
print("   • Low correlation between features suggests good feature diversity")
print("   • Data is ready for dynamic weighting model training!")


🔍 Analyzing feature correlations...
✅ Feature correlation analysis complete!

🎯 Key Insights:
   • Market volatility and momentum are key predictive features
   • Technical indicators provide complementary information
   • Low correlation between features suggests good feature diversity
   • Data is ready for dynamic weighting model training!
