### A Simple AI Trading Bot (Q-Learning Based)

Here's how your ultra-simple Q-learning trading robot works:  

1️⃣ **Reads Stock Data** – It loads historical stock prices from `sample_stock_data.csv`, where each row represents a day's market data.  

2️⃣ **Defines Actions** – The robot can either **Buy** or **Sell** at each time step.  

3️⃣ **Initializes Q-Table** – A table with **(rows-1) states** storing values for Buy & Sell decisions, initialized to zeros.  

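4️⃣ **Chooses Actions** – Uses **exploration (random choice) or exploitation (max Q-value)** to decide Buy/Sell. The training code below explores only, picking every action at random; exploitation happens when decisions are read from the trained table.  

5️⃣ **Calculates Rewards** – The reward is the next close minus the current close: **positive reward** if the price rises, **negative reward** if it falls.  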

6️⃣ **Updates Q-Table** – Uses **Q-learning formula** to refine Buy/Sell values over iterations.  

7️⃣ **Learns Over Time** – By iterating across stock data multiple times, it **adjusts Q-values** for better decisions.  

8️⃣ **Makes Trading Decisions** – At each row (day), it **compares Q(Buy) & Q(Sell)** and picks the action with the higher value.  

9️⃣ **Handles Market Closure** – Decisions are taken **at day-end**, so execution happens at the **next market open**.  

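🔟 **Optimizes with Hyperparameters** – Learning rate (`alpha`) and discount factor (`gamma`) impact **how well it adapts** to stock trends; an exploration rate would matter too, but the code below always explores at random and has no such parameter.  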
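
The exploration/exploitation choice in step 4 can be sketched as an ε-greedy rule. This is only an illustration, not the notebook's own `choose_action`, which always picks at random. The function name `choose_action_eps_greedy`, the `epsilon` parameter, and the toy Q-values are assumptions introduced here.

```python
import random
import numpy as np

def choose_action_eps_greedy(q_table, state, epsilon=0.1):
    """Hypothetical helper: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(q_table.shape[1])  # explore: any action at random
    return int(np.argmax(q_table[state]))  # exploit: action with the highest Q-value

# Toy Q-table with made-up values: 2 states x 2 actions (0 = Buy, 1 = Sell)
demo_q = np.array([[1.0, 5.0], [3.0, 2.0]])

# With epsilon=0.0 the rule never explores, so it is purely greedy
greedy_actions = [choose_action_eps_greedy(demo_q, s, epsilon=0.0) for s in range(2)]
print(greedy_actions)
```

Setting `epsilon=1.0` reproduces purely random choice, so a single parameter covers both extremes.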

### Generate data in the required format

In [3]:
import pandas as pd  # Import Pandas for handling stock data efficiently  
import random  # Import Random for generating dummy prices and volumes  
from datetime import datetime, timedelta  # Import DateTime for handling date-based stock data  

# List of 50 real-world stock symbols
stocks = [
    "AAPL", "GOOGL", "AMZN", "TSLA", "MSFT", "NFLX", "META", "NVDA", "BABA", "DIS",
    "V", "JPM", "PYPL", "MA", "INTC", "IBM", "ORCL", "CSCO", "ADBE", "AMD",
    "UBER", "LYFT", "SQ", "SHOP", "TWTR", "SNAP", "PINS", "ZM", "DOCU", "ROKU",
    "BA", "GE", "CAT", "MMM", "F", "GM", "NKE", "KO", "PEP", "MCD",
    "WMT", "TGT", "HD", "LOW", "COST", "PG", "JNJ", "MRNA", "PFE", "BMY"
]

# Generate dummy stock market data
num_days = 10  # Number of days per stock
start_date = datetime(2024, 1, 1)  # Start date
data = []

# Generate data for each stock over 10 days
for stock in stocks:
    for i in range(num_days):
        date = start_date + timedelta(days=i)  # Increment date
        open_price = round(random.uniform(100, 1000), 2)  # Random open price
        high_price = round(open_price + random.uniform(5, 50), 2)  # High slightly above open
        low_price = round(open_price - random.uniform(5, 50), 2)  # Low slightly below open
        close_price = round(random.uniform(low_price, high_price), 2)  # Close within range
        volume = random.randint(500000, 50000000)  # Random trading volume
        
        # Append row to data list
        data.append([date.strftime("%Y-%m-%d"), stock, open_price, high_price, low_price, close_price, volume])

# Create DataFrame
df = pd.DataFrame(data, columns=["Date", "Stock", "Open", "High", "Low", "Close", "Volume"])

# Display first few rows (uncomment to inspect)
# print(df.head())
# Save the generated data to CSV without the index column
df.to_csv("sample_stock_data_2.csv", index=False)

### A Simple AI Trading Bot (Q-Learning Based)

##### How is AI Used?
- AI Agent: Uses a simple Q-learning approach.
- Random Decision Making: Chooses actions using random exploration.
- Stock Price Learning: Stores price change rewards in a Q-table.
- Ultra-Simple: No complex formulas, just basic learning by reward.
##### Features of This Code:
- Super small (minimal functions).
- No complex formulas, only essential AI logic.
- 100% Procedural, No Classes!
- Fully commented line-by-line.

In [1]:
import numpy as np  # Import NumPy for numerical calculations
import pandas as pd  # Import Pandas to handle stock data
import random  # Import Random for exploration in AI learning

# Load stock data from a CSV file
def load_data(file_path):
    """Reads stock market data from a CSV file."""
    return pd.read_csv(file_path)  # Load and return CSV as Pandas DataFrame

# Initialize Q-table with zeros
def initialize_q_table(states, actions):
    """Creates a Q-table filled with zeros (one row per state, one column per action)."""
    return np.zeros((states, actions))  # Create a table filled with zeros

# Choose action using a simple random approach (basic AI)
def choose_action():
    """Selects a random action (0 = Buy, 1 = Sell)."""
    return random.choice([0, 1])  # Randomly return Buy (0) or Sell (1)

# Compute reward based on price change (the same value for either action)
def get_reward(prev_close, curr_close):
    """Returns reward as price difference (profit/loss)."""
    return curr_close - prev_close  # Profit = Current price - Previous price

# Train the AI agent using Q-learning
def train_ai_agent(data, episodes=100, alpha=0.1, gamma=0.9):
    """Trains AI using Q-learning on stock data."""
    
    num_states = len(data) - 1  # Number of states = number of rows in dataset minus one
    num_actions = 2  # Two actions: Buy (0), Sell (1)
    
    q_table = initialize_q_table(num_states, num_actions)  # Create Q-table

    for _ in range(episodes):  # Repeat for multiple learning iterations
        state = 0  # Start from the first row
        
        while state < num_states - 1:  # Iterate through the dataset
            action = choose_action()  # Pick a random action (basic AI decision)
            reward = get_reward(data['Close'][state], data['Close'][state + 1])  # Compute reward

            # Q-learning update rule
            q_table[state, action] = (1 - alpha) * q_table[state, action] + alpha * (reward + gamma * np.max(q_table[state + 1]))

            state += 1  # Move to the next state

    return q_table  # Return the trained Q-table

# Run the AI trading bot
def run_trading_bot(file_path):
    """Loads data and trains the AI trading bot."""
    data = load_data(file_path)  # Load stock data
    q_table = train_ai_agent(data)  # Train AI on stock prices
    return q_table  # Return trained model

# Example usage (replace 'sample_stock_data.csv' with your own file, e.g. the 'sample_stock_data_2.csv' written by the generator cell)
q_table = run_trading_bot('sample_stock_data.csv')  # Run the bot with sample data

# Print Q-table for analysis
print(q_table)  # Display learned values

[[ 2.35021325e+02  2.37888906e+02]
 [ 2.37247727e+02  2.43248714e+02]
 [-2.10297740e+02 -2.00083855e+02]
 [-1.21835578e+02 -1.31137521e+02]
 [-4.34331582e+02 -4.34723720e+02]
 [ 3.96614120e+02  3.98426967e+02]
 [ 3.30819447e+02  3.37473019e+02]
 [-1.18617238e+02 -1.40815258e+02]
 [-1.22055915e+02 -1.30237558e+02]
 [ 1.80348950e+01  1.87301990e+01]
 [-5.74975949e+02 -5.72581233e+02]
 [ 5.42224967e+01  3.34421985e+01]
 [ 1.27353766e+02  1.45109353e+02]
 [ 2.17312021e+02  2.29516729e+02]
 [ 5.69659900e+02  5.63720637e+02]
 [ 1.71749349e+02  1.73245738e+02]
 [ 3.45728031e+01  4.10489462e+01]
 [ 2.95939818e+02  2.94020384e+02]
 [-1.49785976e+02 -1.47953013e+02]
 [-2.74009209e+02 -2.79687981e+02]
 [ 1.96238984e+02  1.97893872e+02]
 [ 3.40893825e+02  3.39878278e+02]
 [ 3.20479459e+02  3.22100579e+02]
 [ 1.08223256e+02  8.73171649e+01]
 [-2.04009351e+02 -2.00470549e+02]
 [-3.10810747e+02 -3.12091530e+02]
 [-6.11749366e+01 -6.52846795e+01]
 [ 6.02730680e+01  6.20682903e+01]
 [-1.41760830e+02 -1

The result represents the **Q-table** from a **Q-learning** algorithm.  

##### How to Interpret It:  
1. **Each Row** → Represents a **state** in the environment.  
2. **Each Column** → Represents a **Q-value** for an action at that state.  
3. **Values** → Indicate the **expected reward** for taking that action in that state.  

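##### If both columns hold (nearly) the same value, it suggests that:  
- Both actions are being updated toward the same target. In this code `get_reward` returns the same price change whichever action was chosen, so neither action can earn more than the other.  
- Any small gap between the columns comes from random exploration updating the two actions a different number of times, not from a learned preference.  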
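
One concrete reason the two columns track each other in this notebook is that `get_reward` ignores the action. The sketch below contrasts that with a reward signed by the action (Buy earns the price change, Sell earns its negative). It is an illustration under stated assumptions, not the notebook's method: the `train` helper, the four closing prices, and the signed reward are all made up here, and both actions are updated on every step as a deterministic stand-in for random exploration.

```python
import numpy as np

closes = [100.0, 104.0, 101.0, 107.0]  # made-up closing prices
alpha, gamma = 0.1, 0.9  # same defaults as train_ai_agent

def train(reward_fn, episodes=50):
    """Hypothetical trainer: same update rule as the notebook, both actions updated each step."""
    n = len(closes) - 1
    q = np.zeros((n, 2))
    for _ in range(episodes):
        for s in range(n - 1):
            for a in (0, 1):  # 0 = Buy, 1 = Sell
                r = reward_fn(a, closes[s], closes[s + 1])
                q[s, a] = (1 - alpha) * q[s, a] + alpha * (r + gamma * q[s + 1].max())
    return q

# Reward ignores the action, as in get_reward
same = train(lambda a, prev, cur: cur - prev)
# Assumed alternative: Buy gains when the price rises, Sell gains when it falls
signed = train(lambda a, prev, cur: (cur - prev) if a == 0 else -(cur - prev))

print(same)
print(signed)
```

With the action-independent reward the two columns come out identical; with the signed reward they separate, and `argmax` prefers Buy before a rise and Sell before a fall.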

Your **Q-table output** suggests that each row represents a **state**, and each column represents an **action**. Since you have **two columns**, we can take it as:  

- **Column 1** ‚Üí `Q(Buy)`  
- **Column 2** ‚Üí `Q(Sell)`  

### **How to Interpret This?**  
For each state (row):  
- If `Q(Buy) > Q(Sell)`, **BUY** the stock.  
- If `Q(Sell) > Q(Buy)`, **SELL** the stock.  
- If they are equal, **Hold or reevaluate** the strategy.  

### **Example Interpretation**  
For row **`[ 4.1320e+02  4.1320e+02]`**,  
- `Q(Buy) = 413.20`  
- `Q(Sell) = 413.20`  
🔹 **Hold**, as both actions have the same value.  

For row **`[ 5.3687e+02  5.3687e+02]`**,  
- `Q(Buy) = 536.87`  
- `Q(Sell) = 536.87`  
🔹 **Hold**, as both are equal.  

For row **`[-5.9064e+02 -5.9064e+02]`**,  
- `Q(Buy) = -590.64`  
- `Q(Sell) = -590.64`  
🔹 **Hold**, as both are equal; the negative sign does not change the comparison.  
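
The Buy/Sell/Hold rule used in these examples can be sketched as a small helper. The name `interpret_row`, the tolerance `tol`, and the column order (0 = Buy, 1 = Sell) are assumptions for illustration. The third call uses values rounded from the first printed Q-table row.

```python
import numpy as np

ACTIONS = ("Buy", "Sell")  # assumed column order: 0 = Buy, 1 = Sell

def interpret_row(q_row, tol=1e-9):
    """Hypothetical helper: map one Q-table row to Buy, Sell, or Hold."""
    q_buy, q_sell = q_row
    if np.isclose(q_buy, q_sell, rtol=0.0, atol=tol):
        return "Hold"  # tie: neither action is preferred
    return ACTIONS[int(np.argmax(q_row))]

print(interpret_row([413.20, 413.20]))    # equal values
print(interpret_row([-590.64, -590.64]))  # equal and negative
print(interpret_row([235.02, 237.89]))    # second column is larger
```

Comparing with a tolerance rather than `==` avoids treating floating-point noise as a real preference.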

In [6]:
print(len(q_table))

499


Buy/sell decisions are made for each row in `sample_stock_data.csv` based on `q_table`. Here's how:  

1. Each row in `q_table` corresponds to a stock state derived from the CSV file.  
2. The two values in each row represent **Q(Buy) and Q(Sell)**, in that column order, for that state.  
3. The action chosen is `argmax(q_table[row])`:  
   - If `Q(Buy) > Q(Sell)`, **Buy** the stock.  
   - If `Q(Sell) > Q(Buy)`, **Sell** the stock.  
4. This decision is applied iteratively across all rows except the last, which has no Q-table entry (the table has 499 rows for 500 CSV rows).

Each row in `sample_stock_data.csv` represents a **snapshot of one stock's market data on a specific day**. Assuming it has the same columns as the file generated above (`Date`, `Stock`, `Open`, `High`, `Low`, `Close`, `Volume`), it includes:  

1. **Stock Price Data** – Opening, closing, high, and low prices.  
2. **Technical Indicators** – None are stored; moving averages, RSI, MACD, etc. would have to be computed and added as extra columns.  
3. **Market State** – Trading volume only; volatility and trends are not stored.  
4. **Q-learning Context** – The environment used to decide whether to **Buy or Sell** based on learned rewards. The training code reads only the `Close` column.  

Each row is a **state** in Q-learning, helping optimize trading strategies.
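
If the CSV stacks several stocks one after another, as the generated file does (10 rows per symbol), consecutive rows are not always consecutive days of the same stock. Subtracting neighbouring closes then mixes prices from two different symbols at every boundary. A minimal sketch of a per-stock reward, assuming `Stock` and `Close` columns and using made-up prices:

```python
import pandas as pd

# Made-up rows in the same layout as the generated file (two stocks, two days each)
df = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"],
    "Stock": ["AAPL", "AAPL", "GOOGL", "GOOGL"],
    "Close": [150.0, 153.0, 900.0, 890.0],
})

# Naive next-row difference: row 1 compares AAPL's close with GOOGL's close
df["NaiveReward"] = df["Close"].shift(-1) - df["Close"]

# Per-stock difference: the next close is taken within the same symbol,
# so the last row of each stock has no reward (NaN)
df["Reward"] = df.groupby("Stock")["Close"].shift(-1) - df["Close"]

print(df)
```

Rows with a missing `Reward` would simply be skipped during training under this scheme.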

#### Each row represents a day's snapshot (e.g., 01-01-2024); the buy or sell decision is based on that day's data but executed on the next trading day when the market opens.
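
A minimal sketch of that timing, pairing each day's decision with the date on which it would be executed. The `days` frame and `demo_q` values are made-up stand-ins for the CSV and the trained Q-table, the column order (0 = Buy, 1 = Sell) is assumed, and "next trading day" is approximated by the next row's date.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the CSV rows and the trained Q-table
days = pd.DataFrame({"Date": ["2024-01-01", "2024-01-02", "2024-01-03"]})
demo_q = np.array([[2.0, 1.0], [0.5, 4.0]])  # one row per state (rows - 1)

# Keep only the rows that have a Q-table entry
signals = days.iloc[:len(demo_q)].copy()
signals["Decision"] = np.where(np.argmax(demo_q, axis=1) == 0, "Buy", "Sell")
# The decision made at the close of day t is executed on the next row's date
signals["ExecuteOn"] = days["Date"].shift(-1).iloc[:len(demo_q)]

print(signals)
```

The last CSV row produces no decision, which matches the Q-table having one row fewer than the data.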

### Q-learning algorithm

- **Initialize Q-table**: Create a table with states as rows and actions as columns, filled with zeros.  
- **Choose action**: Use an exploration strategy (random or ε-greedy) to select an action at each state.  
- **Take action**: Execute the chosen action and observe the resulting reward and new state.  
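- **Update Q-value**: Apply the Q-learning formula $Q(s, a) \leftarrow (1 - \alpha)\,Q(s, a) + \alpha\,[r + \gamma \max_{a'} Q(s', a')]$, the same update used in `train_ai_agent`.  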
- **Repeat for episodes**: Continue the process over multiple episodes to refine Q-values.  
- **Balance exploration and exploitation**: Use ε-greedy strategy to balance trying new actions vs. using known best actions.  
- **Learn optimal policy**: With enough exploration and a suitably decaying learning rate, the Q-values converge toward those of an optimal decision-making policy.  
- **Use trained Q-table**: Choose actions with the highest Q-values for new situations.  
- **Handle rewards**: Assign positive rewards for good decisions and negative rewards for bad ones.  
- **Apply in real-world tasks**: Used in robotics, trading, and gaming for decision-making in uncertain environments.
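
The update step in the list above can be checked by hand on a single transition. The reward and next-state value below are made-up numbers; `alpha` and `gamma` are the defaults from `train_ai_agent`.

```python
alpha, gamma = 0.1, 0.9  # learning rate and discount factor (train_ai_agent defaults)
q_old = 0.0              # current Q(s, a)
reward = 5.0             # made-up price change
max_next = 10.0          # made-up maximum of Q(s', a') over actions

# Form used in the notebook: blend the old value with the new target
q_new = (1 - alpha) * q_old + alpha * (reward + gamma * max_next)

# Equivalent form: move the old value a fraction alpha toward the target
q_alt = q_old + alpha * (reward + gamma * max_next - q_old)

print(round(q_new, 4), round(q_alt, 4))
```

Here the target is 5 + 0.9 × 10 = 14, and a learning rate of 0.1 moves the estimate one tenth of the way from 0 to 14, giving 1.4.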