<a href="https://colab.research.google.com/github/Karthikbalaji99/AgenticAI-Trader/blob/main/AgenticAITrading.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##**Building an AI Agent using Agentic AI**

Necessary Imports
- yfinance (imported as yf) provides our dataset from Yahoo Finance
- AAPL is the ticker symbol for Apple, the stock we use
- torch (PyTorch) is the deep learning framework

In [1]:
import yfinance as yf
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import random
from collections import deque

# define stock symbol and time period
symbol = "AAPL"
start_date = "2020-01-01"
end_date = "2025-02-14"

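# download historical data
data = yf.download(symbol, start=start_date, end=end_date)

# recent yfinance versions may return MultiIndex columns (field, ticker) even for
# one symbol; flatten them so data['Close'] is a plain Series of prices
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)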

YF.download() has changed argument auto_adjust default to True




Feature Engineering

In [2]:
data['SMA_5'] = data['Close'].rolling(window=5).mean()
data['SMA_20'] = data['Close'].rolling(window=20).mean()
data['Returns'] = data['Close'].pct_change()

Now, let’s drop missing values and reset the index:

In [3]:
data.dropna(inplace=True)
data.reset_index(drop=True, inplace=True)
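To see why rows have to be dropped at all, here is a minimal plain-Python sketch of what a rolling mean and a percentage change compute. The helper names `rolling_mean` and `pct_change` and the six prices are made up for illustration; the notebook itself uses the pandas methods.

```python
# Plain-Python sketch of a rolling mean and a percentage change.
# The prices below are made-up illustration values, not AAPL data.

def rolling_mean(values, window):
    """Mean of the last `window` values; None until a full window exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # pandas would put NaN here
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

def pct_change(values):
    """Fractional change from the previous value; None for the first row."""
    return [None] + [values[i] / values[i - 1] - 1 for i in range(1, len(values))]

prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]
sma_3 = rolling_mean(prices, window=3)
returns = pct_change(prices)

# Rows that survive a dropna-style filter: every feature must be present.
complete_rows = [i for i in range(len(prices))
                 if sma_3[i] is not None and returns[i] is not None]
print(sma_3)
print(complete_rows)
```

With a 3-day window the first two rows are incomplete. By the same logic, the 20-day window in the notebook leaves the first 19 rows without an SMA_20 value, and those are the rows that `dropna` removes.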

##Action Space
**This Agentic AI will have 3 possible actions as follows:**
- *Hold* : Do Nothing
- *Buy* : Purchase Stocks
- *Sell* : Sell held stocks

This action space will be used to train the reinforcement learning model.

In [4]:
ACTIONS = {0: "HOLD", 1: "BUY", 2: "SELL"}

##Extraction of State from the data

**This function extracts the state representation from the dataset at a given time index. The state is an array containing:**

- Closing Price
- 5-day SMA
- 20-day SMA
- Daily return (fractional change in the closing price)

This numerical representation of the stock market is fed into the AI model to make trading decisions.

In [5]:
def get_state(data, index):
    return np.array([
        float(data.loc[index, 'Close']),
        float(data.loc[index, 'SMA_5']),
        float(data.loc[index, 'SMA_20']),
        float(data.loc[index, 'Returns'])
    ])

##Building The Trading Environment for our AI Agent

We will define a trading environment to interact with the Deep Q-Network (DQN) AI agent, which will allow it to learn how to trade stocks profitably. The environment is implemented as a class that simulates the stock market. It tracks the agent’s balance, holdings, and current market index, and it provides new states and rewards in response to the agent’s actions.

In [6]:
class TradingEnvironment:
    def __init__(self, data):
        self.data = data
        self.initial_balance = 10000
        self.balance = self.initial_balance
        self.holdings = 0
        self.index = 0

    def reset(self):
        self.balance = self.initial_balance
        self.holdings = 0
        self.index = 0
        return get_state(self.data, self.index)

    def step(self, action):
        price = float(self.data.loc[self.index, 'Close'])
        reward = 0

        if action == 1 and self.balance >= price:  # BUY
            self.holdings = self.balance // price
            self.balance -= self.holdings * price
        elif action == 2 and self.holdings > 0:  # SELL
            self.balance += self.holdings * price
            self.holdings = 0

        self.index += 1
        done = self.index >= len(self.data) - 1

        if done:
            # terminal reward counts cash only; shares still held are not valued
            reward = self.balance - self.initial_balance

        next_state = get_state(self.data, self.index) if not done else None
        return next_state, reward, done, {}
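The BUY and SELL arithmetic in `step` can be traced by hand. The sketch below repeats the same operations outside the class, using two made-up prices (150.25 and 160.0 are assumptions for illustration, not values from the dataset).

```python
# Worked example of the BUY and SELL arithmetic with made-up prices.
initial_balance = 10000
balance = initial_balance
holdings = 0

buy_price = 150.25   # hypothetical closing price on the buy day
sell_price = 160.0   # hypothetical closing price on the sell day

# BUY: spend as much cash as possible on whole shares
holdings = balance // buy_price          # 66.0 shares
balance -= holdings * buy_price          # 83.5 left in cash
cash_while_holding = balance

# If the episode ended now, the cash-only terminal reward would be:
reward_if_still_holding = cash_while_holding - initial_balance   # -9916.5

# SELL: convert every share back to cash
balance += holdings * sell_price         # 10643.5
holdings = 0
reward_after_selling = balance - initial_balance                 # 643.5

print(reward_if_still_holding, reward_after_selling)
```

Because the terminal reward looks only at cash, an episode that ends while shares are still held scores close to minus the initial balance. That would be consistent with the rewards near -9,900 that appear in the training log.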

##The Deep Q-Network (DQN)
A DQN is a neural network that approximates the Q-value of each action in a given state. Here we build a **Deep Q-Network** in **PyTorch**: a **three-layer** fully connected network that takes the 4-value market state as input and applies **ReLU activation** after each of the two hidden layers.

It outputs **Q-values**, which the agent utilizes to determine the best action: buy, sell, or hold, based on market conditions.

In [7]:
class DQN(nn.Module):
    def __init__(self, state_size, action_size):
        super(DQN, self).__init__()
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, action_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)
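As a quick size check, the number of trainable parameters in this architecture can be worked out by hand: each fully connected layer has one weight per input-output pair plus one bias per output. The helper `linear_params` is a hypothetical name used only for this sketch.

```python
# Parameter count for a 4 -> 64 -> 64 -> 3 fully connected network.

def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one linear layer."""
    return n_in * n_out + n_out

state_size, hidden, action_size = 4, 64, 3
layer_params = [
    linear_params(state_size, hidden),   # fc1
    linear_params(hidden, hidden),       # fc2
    linear_params(hidden, action_size),  # fc3
]
total_params = sum(layer_params)
print(layer_params, total_params)
```

The network is small, with 4,675 parameters in total, most of them in the middle layer.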

##THE DQN AGENT

We now implement the AI agent that learns how to trade stocks using Deep Q-Learning. The DQN Agent interacts with the trading environment, makes trading decisions (BUY, SELL, HOLD), stores each experience, and learns from stored experiences to improve future decisions. Experience Replay keeps past transitions in a buffer and reuses random samples of them for training. The agent balances Exploration vs. Exploitation with an epsilon-greedy policy: it takes random actions at first and relies more on its learned Q-values as epsilon decays.
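The replay buffer idea can be sketched with the standard library alone. This is an illustration under simplifying assumptions: the buffer holds 5 items instead of 2000, and the transitions are placeholder tuples rather than real market states.

```python
import random
from collections import deque

# Tiny replay buffer for illustration; the agent in this notebook uses maxlen=2000.
memory = deque(maxlen=5)

# Store 8 placeholder transitions: (state, action, reward, next_state, done)
for step in range(8):
    transition = (f"s{step}", step % 3, 0.0, f"s{step + 1}", False)
    memory.append(transition)

# Once the buffer is full, each append silently evicts the oldest entry.
oldest_state = memory[0][0]   # "s3": steps 0 to 2 were evicted

# Training draws a random minibatch, which breaks up the time ordering.
random.seed(0)
minibatch = random.sample(list(memory), 3)
print(oldest_state, len(memory), len(minibatch))
```

A bounded `deque` keeps memory use fixed, at the cost of forgetting older experiences once the limit is reached.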

In [8]:
# DQN agent
class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95  # Discount factor
        self.epsilon = 1.0  # Exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = DQN(state_size, action_size)
        self.optimizer = optim.Adam(self.model.parameters(), lr=self.learning_rate)
        self.criterion = nn.MSELoss()

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if random.uniform(0, 1) < self.epsilon:
            return random.choice(list(ACTIONS.keys()))
        state = torch.FloatTensor(state).unsqueeze(0)
        with torch.no_grad():
            q_values = self.model(state)
        return torch.argmax(q_values).item()

    def replay(self, batch_size):
        if len(self.memory) < batch_size:
            return
        minibatch = random.sample(self.memory, batch_size)

        for state, action, reward, next_state, done in minibatch:
            target = reward
            if not done:
                next_state_tensor = torch.FloatTensor(next_state).unsqueeze(0)
                target += self.gamma * torch.max(self.model(next_state_tensor)).item()

            state_tensor = torch.FloatTensor(state).unsqueeze(0)
            target_tensor = self.model(state_tensor).clone().detach()
            target_tensor[0][action] = target

            self.optimizer.zero_grad()
            output = self.model(state_tensor)
            loss = self.criterion(output, target_tensor)
            loss.backward()
            self.optimizer.step()

        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
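The update inside `replay` follows the usual Q-learning target: the reward plus the discounted best Q-value of the next state, applied only to the action that was taken. The sketch below reproduces that arithmetic in plain Python; the Q-values are made-up numbers and `td_target` is a hypothetical helper name.

```python
gamma = 0.95  # discount factor, same value as in the agent

def td_target(reward, next_q_values, done, gamma=gamma):
    """Q-learning target: r if terminal, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

current_q = [0.5, 1.2, -0.3]   # hypothetical Q(s, a) for HOLD, BUY, SELL
next_q = [0.8, 2.0, 0.1]       # hypothetical Q(s', a')
action = 1                     # the action that was actually taken (BUY)

# Copy the current predictions and overwrite only the taken action's entry,
# so the other two actions contribute zero error.
target_vector = list(current_q)
target_vector[action] = td_target(0.0, next_q, done=False)

# Mean squared error over the three outputs, as nn.MSELoss would average it.
squared_errors = [(q - t) ** 2 for q, t in zip(current_q, target_vector)]
loss = sum(squared_errors) / len(squared_errors)
print(target_vector, loss)
```

Only the entry for the chosen action differs between prediction and target, so the gradient pushes that one Q-value toward the target and leaves the others alone.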

##TRAINING THE AI AGENT

Training runs multiple episodes in which the agent interacts with the environment, learns from experience, and updates its model. Here we train the AI Trading Agent with Deep Q-Learning for 500 episodes, each one a full pass through the price history. The agent relies on Exploration & Exploitation, taking mostly random actions at first and more informed ones as training progresses.

Experience Replay stores past experiences so the neural network can learn from random batches of them: after each episode, the agent samples a batch of 32 stored experiences and updates the network. Throughout the process, we track the total reward per episode to measure how the agent's performance changes over time.
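How much exploration remains at the end of training follows directly from the decay settings. The sketch below assumes that epsilon decays once per episode, which is what happens when `replay` is called once per episode; `decayed_epsilon` is a hypothetical helper, not part of the agent.

```python
def decayed_epsilon(n_replays, epsilon=1.0, epsilon_min=0.01, decay=0.995):
    """Apply the agent's decay rule n_replays times and return epsilon."""
    for _ in range(n_replays):
        if epsilon > epsilon_min:
            epsilon *= decay
    return epsilon

eps_after_100 = decayed_epsilon(100)
eps_after_training = decayed_epsilon(500)
print(round(eps_after_100, 3), round(eps_after_training, 3))
```

Under that assumption epsilon is still about 0.08 after 500 episodes, so roughly 8% of the agent's actions remain random at the end of training.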

In [9]:
env = TradingEnvironment(data)
agent = DQNAgent(state_size=4, action_size=3)
batch_size = 32
episodes = 500
total_rewards = []

for episode in range(episodes):
    state = env.reset()
    done = False
    total_reward = 0

    while not done:
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        total_reward += reward

    agent.replay(batch_size)
    total_rewards.append(total_reward)
    print(f"Episode {episode+1}/{episodes}, Total Reward: {total_reward}")

print("Training Complete!")


Episode 1/500, Total Reward: -9878.509773254395
Episode 2/500, Total Reward: -9899.969707489014
Episode 3/500, Total Reward: 13075.737785339355
Episode 4/500, Total Reward: -9985.201580047607
Episode 5/500, Total Reward: -9929.798652648926
Episode 6/500, Total Reward: -9908.572212219238
Episode 7/500, Total Reward: -9890.78530883789
Episode 8/500, Total Reward: -9785.760837554932
Episode 9/500, Total Reward: -9991.82260131836
Episode 10/500, Total Reward: -9842.891246795654
Episode 11/500, Total Reward: -9753.45905303955
Episode 12/500, Total Reward: -9849.069747924805
Episode 13/500, Total Reward: -9930.69102859497
Episode 14/500, Total Reward: -9840.638595581055
Episode 15/500, Total Reward: -9887.850200653076
Episode 16/500, Total Reward: -9889.471538543701
Episode 17/500, Total Reward: -9933.722175598145
Episode 18/500, Total Reward: -9942.960487365723
Episode 19/500, Total Reward: -9868.194023132324
Episode 20/500, Total Reward: -9871.885261535645
Episode 21/500, Total Reward: -99

##TESTING


In [10]:
test_env = TradingEnvironment(data)
state = test_env.reset()
done = False

# simulate a trading session using the trained agent
while not done:
    # act() is still epsilon-greedy: mostly the best action, occasionally a random one
    action = agent.act(state)
    next_state, reward, done, _ = test_env.step(action)
    state = next_state if next_state is not None else state

final_balance = test_env.balance
profit = final_balance - test_env.initial_balance
print(f"Final Balance after testing: ${final_balance:.2f}")
print(f"Total Profit: ${profit:.2f}")


Final Balance after testing: $13723.74
Total Profit: $3723.74
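The profit printed above is based on the cash balance. As a small sketch, the same figures can be turned into a percentage return, and a helper can value any shares that are still held. `portfolio_value` is a hypothetical function, and the holdings and price passed to it are made-up numbers.

```python
initial_balance = 10000.0
final_balance = 13723.74   # cash balance reported by the test run

profit = final_balance - initial_balance
return_pct = 100 * profit / initial_balance
print(f"Return: {return_pct:.2f}%")

def portfolio_value(balance, holdings, price):
    """Cash plus the market value of shares still held."""
    return balance + holdings * price

# Hypothetical example: 83.5 in cash plus 66 shares priced at 160.0
example_value = portfolio_value(83.5, 66, 160.0)
print(example_value)
```

Reporting portfolio value rather than cash alone avoids understating a run that ends with open positions.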


##SUMMARY
The agent started with **10,000 dollars** and ended with **13,723.74 dollars**. **Profit = $3723.74**, a return of about 37% on the starting balance. The test ran on the same data the agent was trained on.

In this notebook we explored how to build an AI trading agent using Agentic AI and Deep Q-Learning so that it makes autonomous trading decisions. After training, our AI agent finished the test run with a positive profit on the historical AAPL data.