# 🤖 AI Trading Agent with Deep Q-Learning

## 📋 Project Overview

This notebook implements a trading agent using **Deep Q-Network (DQN)** reinforcement learning to automate stock trading decisions. The agent learns to maximize profit by choosing BUY, SELL, or HOLD at each step, based on historical daily price data and technical indicators derived from it.

## 🛠️ Technology Stack

- **Python**: Core programming language
- **PyTorch**: Deep learning framework for neural network implementation
- **yfinance**: Historical stock market data download from Yahoo Finance
- **NumPy & Pandas**: Data manipulation and analysis

## 📚 Import Libraries:



In [1]:
!pip install torch



In [2]:
!pip install yfinance



In [3]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import random
from collections import deque
import yfinance as yf

In [17]:
# 🔧 Easy Stock Selection - Just change the symbol below!

print("🤖 AI Trading Agent - Stock Configuration")
print("=" * 50)
print("📈 Popular Stock Options:")
print("  💻 Tech Giants: AAPL, MSFT, GOOGL, META")
print("  🎮 AI/Semiconductors: NVDA, AMD, INTC") 
print("  🚗 Electric/Innovation: TSLA, RIVN")
print("  🛒 E-commerce/Streaming: AMZN, NFLX")
print("  📊 Market ETFs: SPY, QQQ")
print("=" * 50)

# 🎯 CHANGE THIS LINE TO SELECT YOUR STOCK:
symbol = "MSFT"  # ← Change this to any stock symbol you want!

# Examples (uncomment the one you want):
# symbol = "AAPL"    # Apple
# symbol = "TSLA"    # Tesla  
# symbol = "GOOGL"   # Google
# symbol = "MSFT"    # Microsoft
# symbol = "AMZN"    # Amazon
# symbol = "META"    # Meta/Facebook
# symbol = "SPY"     # S&P 500 ETF

# Date range configuration
start_date = "2020-01-01"
end_date = "2025-02-14"


print(f"\n🎯 Selected Configuration:")
print(f"   📈 Stock Symbol: {symbol}")
print(f"   📅 Date Range: {start_date} to {end_date}")
print(f"   🤖 AI will learn to trade {symbol} stock!")
print("🚀 Ready to download data and train your AI agent!")

🤖 AI Trading Agent - Stock Configuration
📈 Popular Stock Options:
  💻 Tech Giants: AAPL, MSFT, GOOGL, META
  🎮 AI/Semiconductors: NVDA, AMD, INTC
  🚗 Electric/Innovation: TSLA, RIVN
  🛒 E-commerce/Streaming: AMZN, NFLX
  📊 Market ETFs: SPY, QQQ

🎯 Selected Configuration:
   📈 Stock Symbol: MSFT
   📅 Date Range: 2020-01-01 to 2025-02-14
   🤖 AI will learn to trade MSFT stock!
🚀 Ready to download data and train your AI agent!


## 📥 Importing the Dataset:

Downloading the historical data:

In [18]:
data = yf.download(symbol, start = start_date, end = end_date)

[*********************100%***********************]  1 of 1 completed


In [19]:
data.head(5)

Ticker: MSFT

| Date | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|
| 2020-01-02 | 152.791138 | 152.895777 | 150.612762 | 151.040826 | 22622100 |
| 2020-01-03 | 150.888626 | 152.153802 | 150.355924 | 150.60326 | 21116200 |
| 2020-01-06 | 151.278625 | 151.345221 | 148.88145 | 149.423674 | 20813700 |
| 2020-01-07 | 149.899338 | 151.887465 | 149.652016 | 151.554533 | 21634100 |
| 2020-01-08 | 152.286987 | 152.962387 | 150.251294 | 151.183524 | 27746500 |


In [20]:
data.tail(5)

Ticker: MSFT

| Date | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|
| 2025-02-07 | 407.510773 | 416.362129 | 405.869796 | 414.204005 | 22886800 |
| 2025-02-10 | 409.967285 | 413.189569 | 408.674402 | 411.449133 | 20817900 |
| 2025-02-11 | 409.191528 | 410.235778 | 407.063209 | 407.401377 | 18140600 |
| 2025-02-12 | 406.804657 | 408.505304 | 402.160165 | 404.984641 | 19121700 |
| 2025-02-13 | 408.296478 | 408.753956 | 404.139298 | 404.775815 | 23891700 |


In [21]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1287 entries, 2020-01-02 to 2025-02-13
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   (Close, MSFT)   1287 non-null   float64
 1   (High, MSFT)    1287 non-null   float64
 2   (Low, MSFT)     1287 non-null   float64
 3   (Open, MSFT)    1287 non-null   float64
 4   (Volume, MSFT)  1287 non-null   int64  
dtypes: float64(4), int64(1)
memory usage: 60.3 KB


In [22]:
data.describe()

Ticker: MSFT

| | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|
| count | 1287.0 | 1287.0 | 1287.0 | 1287.0 | 1287.0 |
| mean | 290.539006 | 293.328775 | 287.490299 | 290.464356 | 28494570.0 |
| std | 83.06181 | 83.345099 | 82.69291 | 83.121283 | 12684390.0 |
| min | 129.171295 | 134.083642 | 126.405093 | 130.6879 | 7164500.0 |
| 25% | 229.263847 | 232.728552 | 226.43922 | 228.973963 | 20329200.0 |
| 50% | 277.421692 | 280.525861 | 274.028938 | 277.534621 | 25296100.0 |
| 75% | 346.300491 | 350.519538 | 339.587586 | 344.906182 | 32768100.0 |
| max | 463.240967 | 464.023678 | 460.16961 | 462.686155 | 97012700.0 |


In [23]:
len(data)

1287
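
The outputs above show two column levels (Price and Ticker), which is why `data['Close']` selects a one-column frame instead of a plain Series. A minimal sketch, on a hypothetical toy frame rather than the downloaded data, of flattening such columns with `get_level_values` so that a single cell lookup returns a scalar:

```python
import pandas as pd

# Hypothetical toy frame with the same two-level column layout (Price, Ticker).
columns = pd.MultiIndex.from_tuples(
    [("Close", "MSFT"), ("Volume", "MSFT")], names=["Price", "Ticker"]
)
toy = pd.DataFrame([[152.79, 22622100], [150.89, 21116200]], columns=columns)

# Keep only the first level so each column has a plain string name.
flat = toy.copy()
flat.columns = flat.columns.get_level_values(0)

print(list(flat.columns))            # ['Close', 'Volume']
print(float(flat.loc[0, "Close"]))   # 152.79
```

Whether to flatten is a design choice: it only makes sense when a single ticker is downloaded, since the ticker level is what distinguishes columns in a multi-ticker download.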

## ⚙️ Feature Engineering:

Calculating technical indicators that help the AI agent make better trading decisions:

In [24]:
data['SMA_5'] = data['Close'].rolling(window = 5).mean()
data['SMA_20'] = data['Close'].rolling(window = 20).mean()
data['Returns'] = data['Close'].pct_change()

In [25]:
data

Ticker: MSFT

| Date | Close | High | Low | Open | Volume | SMA_5 | SMA_20 | Returns |
|---|---|---|---|---|---|---|---|---|
| 2020-01-02 | 152.791138 | 152.895777 | 150.612762 | 151.040826 | 22622100 | NaN | NaN | NaN |
| 2020-01-03 | 150.888626 | 152.153802 | 150.355924 | 150.603260 | 21116200 | NaN | NaN | -0.012452 |
| 2020-01-06 | 151.278625 | 151.345221 | 148.881450 | 149.423674 | 20813700 | NaN | NaN | 0.002585 |
| 2020-01-07 | 149.899338 | 151.887465 | 149.652016 | 151.554533 | 21634100 | NaN | NaN | -0.009118 |
| 2020-01-08 | 152.286987 | 152.962387 | 150.251294 | 151.183524 | 27746500 | 151.428943 | NaN | 0.015928 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2025-02-07 | 407.510773 | 416.362129 | 405.869796 | 414.204005 | 22886800 | 410.176135 | 423.348254 | -0.014598 |
| 2025-02-10 | 409.967285 | 413.189569 | 408.674402 | 411.449133 | 20817900 | 410.434717 | 423.013594 | 0.006028 |
| 2025-02-11 | 409.191528 | 410.235778 | 407.063209 | 407.401377 | 18140600 | 410.249731 | 422.727664 | -0.001892 |
| 2025-02-12 | 406.804657 | 408.505304 | 402.160165 | 404.984641 | 19121700 | 409.404370 | 422.397975 | -0.005833 |


## 🧹 Data Cleaning:

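Dropping the rows with missing values. The first 4 rows of `SMA_5`, the first 19 rows of `SMA_20`, and the first row of `Returns` are NaN, so `dropna` removes the first 19 rows and leaves 1268 of the original 1287.

The index is then reset, replacing the dates with integer positions: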

In [26]:
data.dropna(inplace = True)
data.reset_index(drop = True, inplace = True)
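
To make the row counts above concrete, here is a small sketch on a hypothetical 25-row toy series (not the MSFT data) that applies the same rolling windows and `pct_change`, then counts the missing values each one introduces:

```python
import pandas as pd

# Hypothetical toy prices: 25 rows, values 100..124.
toy = pd.DataFrame({"Close": [100.0 + i for i in range(25)]})
toy["SMA_5"] = toy["Close"].rolling(window=5).mean()
toy["SMA_20"] = toy["Close"].rolling(window=20).mean()
toy["Returns"] = toy["Close"].pct_change()

# Count the missing values per column before cleaning.
missing = toy.isna().sum().to_dict()

# Drop every row that has at least one NaN, then renumber from 0.
cleaned = toy.dropna().reset_index(drop=True)

print(missing)
print(len(toy), "->", len(cleaned))
```

A rolling window of size `n` leaves `n - 1` leading NaNs, so the longest window (20) determines how many rows are lost.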

In [27]:
data

Ticker: MSFT

| | Close | High | Low | Open | Volume | SMA_5 | SMA_20 | Returns |
|---|---|---|---|---|---|---|---|---|
| 0 | 164.358414 | 165.566516 | 162.465404 | 165.566516 | 51597500 | 158.593790 | 155.709109 | 0.028208 |
| 1 | 161.932739 | 163.996969 | 161.314427 | 163.816242 | 36142700 | 159.581198 | 156.166189 | -0.014758 |
| 2 | 165.880417 | 165.994563 | 162.094399 | 162.122935 | 30107000 | 161.883243 | 156.915778 | 0.024379 |
| 3 | 171.340652 | 171.835311 | 167.716360 | 168.505907 | 36433300 | 164.672333 | 157.918880 | 0.032917 |
| 4 | 171.131393 | 175.221808 | 169.714028 | 175.060096 | 39186300 | 166.928723 | 158.980482 | -0.001221 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1263 | 407.510773 | 416.362129 | 405.869796 | 414.204005 | 22886800 | 410.176135 | 423.348254 | -0.014598 |
| 1264 | 409.967285 | 413.189569 | 408.674402 | 411.449133 | 20817900 | 410.434717 | 423.013594 | 0.006028 |
| 1265 | 409.191528 | 410.235778 | 407.063209 | 407.401377 | 18140600 | 410.249731 | 422.727664 | -0.001892 |
| 1266 | 406.804657 | 408.505304 | 402.160165 | 404.984641 | 19121700 | 409.404370 | 422.397975 | -0.005833 |


## 🎯 Action Space:

The AI agent has three possible actions:

**HOLD:** Do nothing.

**BUY:** Purchase stocks.

**SELL:** Sell held stocks.

These three actions form the discrete action space of the **reinforcement learning model**.


In [28]:
ACTIONS = {0: "HOLD", 1: "BUY", 2: "SELL"}

## 🤖 State Of The Agent:

This function extracts the state representation from the dataset at a given time index. The state is an array containing:

- **Closing price**
- **5-day SMA**
- **20-day SMA**
- **Daily return** (fractional change in closing price from the previous day, as computed by `pct_change`)


In [29]:
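# Function to get the state of the agent:

STATE_COLUMNS = ['Close', 'SMA_5', 'SMA_20', 'Returns']

def get_state(data, index):
    row = data.loc[index]
    # np.ravel(...)[0] extracts a scalar whether the lookup returns a float
    # (flat columns) or a one-element Series (two-level Price/Ticker columns),
    # which avoids calling float() on a Series.
    return np.array([float(np.ravel(row[col])[0]) for col in STATE_COLUMNS])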

## 🏢 Trading Environment:

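Building a trading environment for the **Deep Q-Network (DQN) AI agent** to interact with, so that it can learn to trade profitably. The agent starts with $10,000 in cash. A BUY spends as much of the balance as possible on whole shares, and a SELL liquidates the entire position. The reward is sparse: it is 0 at every step except the last, where it equals the final cash balance minus the initial balance, so shares still held at the end of an episode are not counted: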

In [30]:
class TradingEnvironment:
    def __init__(self, data):
        self.data = data
        self.initial_balance = 10000
        self.balance = self.initial_balance
        self.holdings = 0
        self.index = 0

    def reset(self):
        self.balance = self.initial_balance
        self.holdings = 0
        self.index = 0
        return get_state(self.data, self.index)

    def step(self, action):
        # np.ravel(...)[0] yields a scalar even when the lookup returns a one-element Series:
        price = float(np.ravel(self.data.loc[self.index, 'Close'])[0])
        reward = 0

        # Buy:
        if action == 1 and self.balance >= price: 
            self.holdings = self.balance // price
            self.balance -= self.holdings * price
        # Sell:
        elif action == 2 and self.holdings > 0:
            self.balance += self.holdings * price
            self.holdings = 0

        self.index += 1
        done = self.index >= len(self.data) - 1

        if done:
            reward = self.balance - self.initial_balance

        next_state = get_state(self.data, self.index) if not done else None
        return next_state, reward, done, {}
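
The environment above rewards only the final cash balance. One commonly used alternative, shown here as a sketch and not as part of the original notebook, is to reward the change in total portfolio value (cash plus the market value of held shares) at every step. The helper names `portfolio_value` and `step_reward` are hypothetical:

```python
def portfolio_value(balance, holdings, price):
    """Cash plus the market value of shares held (hypothetical helper)."""
    return balance + holdings * price


def step_reward(prev_value, balance, holdings, price):
    """Reward as the change in portfolio value since the previous step."""
    return portfolio_value(balance, holdings, price) - prev_value


# Toy episode: buy 60 shares at 164, then the price rises to 170.
balance, holdings = 10000 - 60 * 164, 60
value_at_buy = portfolio_value(balance, holdings, 164)
print(value_at_buy)                                        # 10000
print(step_reward(value_at_buy, balance, holdings, 170))   # 360
```

With this formulation, buying does not change the portfolio value by itself, and the agent receives feedback on every price move while it holds shares.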
            

## 🧠 Deep Q-Network (DQN):

DQN is a neural network that approximates the Q-values for each state-action pair. 

Defining the neural network architecture for our Deep Q-Network, which will be responsible for predicting the best trading actions based on the stock market state:

In [31]:
class DQN(nn.Module):
    def __init__(self, state_size, action_size):
        super(DQN, self).__init__()
        # Three Layer Neural Network:
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, action_size)

    def forward(self, x):
        # ReLU activation:
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

It outputs Q-values, which the agent will utilize to determine the best action: buy, sell, or hold, based on market conditions.
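
As a quick check on the size of this architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes the sizes used later in the notebook (`state_size = 4`, `action_size = 3`) and counts weights plus biases for each fully connected layer. `linear_params` is a hypothetical helper, not a PyTorch function:

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


state_size, action_size, hidden = 4, 3, 64
layers = [(state_size, hidden), (hidden, hidden), (hidden, action_size)]

# 4*64 + 64 = 320, 64*64 + 64 = 4160, 64*3 + 3 = 195
total = sum(linear_params(i, o) for i, o in layers)
print(total)  # 4675
```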

## 🎮 DQN Agent:

Implementing the AI agent that learns how to trade stocks using Deep Q-Learning. 

The DQN Agent will interact with the trading environment, make trading decisions (BUY, SELL, HOLD), store experiences, and learn from past experiences to improve future decisions:

In [32]:
class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95  # Discount factor
        self.epsilon = 1.0  # Exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = DQN(state_size, action_size)
        self.optimizer = optim.Adam(self.model.parameters(), lr=self.learning_rate)
        self.criterion = nn.MSELoss()

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if random.uniform(0, 1) < self.epsilon:
            return random.choice(list(ACTIONS.keys()))
        state = torch.FloatTensor(state).unsqueeze(0)
        with torch.no_grad():
            q_values = self.model(state)
        return torch.argmax(q_values).item()

    def replay(self, batch_size):
        if len(self.memory) < batch_size:
            return
        minibatch = random.sample(self.memory, batch_size)

        for state, action, reward, next_state, done in minibatch:
            target = reward
            if not done:
                next_state_tensor = torch.FloatTensor(next_state).unsqueeze(0)
                target += self.gamma * torch.max(self.model(next_state_tensor)).item()

            state_tensor = torch.FloatTensor(state).unsqueeze(0)
            target_tensor = self.model(state_tensor).clone().detach()
            target_tensor[0][action] = target

            self.optimizer.zero_grad()
            output = self.model(state_tensor)
            loss = self.criterion(output, target_tensor)
            loss.backward()
            self.optimizer.step()

        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
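
The target that `replay` computes for the chosen action is the standard one-step Q-learning target. A framework-free sketch of just that calculation, using plain lists in place of network outputs (`td_target` is a hypothetical name):

```python
def td_target(reward, next_q_values, done, gamma=0.95):
    """Q-learning target: r if terminal, else r + gamma * max over next Q-values."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Non-terminal step: 0.0 + 0.95 * max(1.0, 2.0, 0.5)
print(td_target(0.0, [1.0, 2.0, 0.5], done=False))   # 1.9
# Terminal step: the next-state values are ignored.
print(td_target(50.0, [1.0, 2.0, 0.5], done=True))   # 50.0
```

In `replay`, only the entry for the action actually taken is overwritten with this target, so the MSE loss is zero for the other two actions.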

## 🏋️ Training the AI Agent:

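Training runs multiple episodes in which the agent interacts with the environment and stores each transition in replay memory. In this loop `replay` is called once at the end of each episode, so each episode contributes one 32-sample minibatch of updates and one epsilon decay step. Because the reward counts only cash, episodes that end while the agent still holds shares report a total reward close to -10,000: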

In [33]:
env = TradingEnvironment(data)
agent = DQNAgent(state_size = 4, action_size = 3)
batch_size = 32
episodes = 500
total_rewards = []

for episode in range(episodes):
    state = env.reset()
    done = False
    total_reward = 0

    while not done:
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        total_reward += reward

    agent.replay(batch_size)
    total_rewards.append(total_reward)
    print(f"Episode {episode+1}/{episodes}, Total Reward: {total_reward}")

print("Training Complete!")



Episode 1/500, Total Reward: -9695.471099853516
Episode 2/500, Total Reward: -9812.841506958008
Episode 3/500, Total Reward: -9776.794815063477
Episode 4/500, Total Reward: -9860.119888305664
Episode 5/500, Total Reward: 6574.522933959961
Episode 6/500, Total Reward: -9787.687606811523
Episode 7/500, Total Reward: -9748.887634277344
Episode 8/500, Total Reward: -9791.44482421875
Episode 9/500, Total Reward: -9800.838317871094
Episode 10/500, Total Reward: -9839.546798706055
Episode 11/500, Total Reward: -9730.093246459961
Episode 12/500, Total Reward: 7410.813034057617
Episode 13/500, Total Reward: -9952.489974975586
Episode 14/500, Total Reward: -9786.74478149414
Episode 15/500, Total Reward: -9646.296310424805
Episode 16/500, Total Reward: -9848.02555847168
Episode 17/500, Total Reward: -9780.802047729492
Episode 18/500, Total Reward: -9869.890899658203
Episode 19/500, Total Reward: -9851.046890258789
Episode 20/500, Total Reward: -9967.473693847656
Episode 21/500, Total Reward: -966
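
Since epsilon is multiplied by `epsilon_decay` once per `replay` call, and `replay` runs once per episode, the exploration rate at the end of training can be estimated directly. The sketch below assumes exactly one decay per episode, and `epsilon_after` is a hypothetical helper:

```python
def epsilon_after(episodes, epsilon=1.0, epsilon_min=0.01, decay=0.995):
    """Apply the multiplicative decay once per episode, as replay() does."""
    for _ in range(episodes):
        if epsilon > epsilon_min:
            epsilon *= decay
    return epsilon


print(round(epsilon_after(500), 4))  # 0.0816
```

Under that assumption the agent still takes a random action on roughly 8% of steps after 500 episodes, well above `epsilon_min`.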

## 🔬 Inferencing the AI Agent:

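Running the trained agent through a fresh environment built from the same historical data that was used for training. Note that `agent.act` remains epsilon-greedy, so some random exploration persists unless `agent.epsilon` is set to 0: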



In [38]:
# Creating a fresh environment instance for testing:
test_env = TradingEnvironment(data)
state = test_env.reset()
done = False

# Simulating a trading session using the trained AI Agent:
while not done:
    # act() is epsilon-greedy: it picks a random action with probability
    # agent.epsilon (set agent.epsilon = 0.0 first for purely greedy evaluation):
    action = agent.act(state)
    next_state, reward, done, _ = test_env.step(action)
    state = next_state if next_state is not None else state

final_balance = test_env.balance
profit = final_balance - test_env.initial_balance
print(f"Final Balance After Testing: ${final_balance:.2f}")
print(f"Total Profit: ${profit:.2f}")



Final Balance After Testing: $32908.82
Total Profit: $22908.82
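
To illustrate how the exploration rate affects action selection at evaluation time, here is a standalone sketch of epsilon-greedy selection over a plain list of Q-values. `select_action` and the sample Q-values are hypothetical and stand in for `agent.act` and the network output:

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


q = [0.1, 0.7, 0.2]  # hypothetical Q-values for HOLD, BUY, SELL

# With epsilon = 0 the choice is deterministic: always the argmax.
greedy = {select_action(q, epsilon=0.0) for _ in range(100)}
print(greedy)  # {1}
```

With any epsilon above 0, repeated evaluation runs can produce different final balances, because some actions are drawn at random.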


## 📊 Results:

The agent started with: **$10,000**

And ended with: **$32,908.82** 

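Profit = $22,908.82, a return of about 229% on the initial $10,000 over a period of roughly five years. Because the test run reuses the training data, this is an in-sample result and does not show how the agent would perform on unseen data.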
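
A profit figure is easier to interpret next to a simple baseline. The sketch below shows one such baseline, buy-and-hold, using the same whole-share rule as the environment. It is illustrative only: `buy_and_hold` is a hypothetical helper and the prices are toy values, not the MSFT series:

```python
def buy_and_hold(prices, initial_balance=10000):
    """Buy as many whole shares as possible at the first price, value at the last."""
    shares = initial_balance // prices[0]
    cash = initial_balance - shares * prices[0]
    return cash + shares * prices[-1]


toy_prices = [100.0, 120.0, 150.0]  # hypothetical closing prices
print(buy_and_hold(toy_prices))  # 15000.0
```

Passing the notebook's closing prices as a list would give the value an investor would have reached by buying on the first day and never trading.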