# TRADING ENV 

**Goal:** Use `data/prices_returns.csv` to simulate trading with portfolio weights.

## What to build (minimum)
1. **Load data**
   - Read `data/prices_returns.csv`.
   - Pivot to a matrix `R` of shape `[T, A]` (rows=dates, cols=assets).
   - Keep `dates` and `assets` lists.

2. **Environment class**
   - `__init__(R, dates, assets, window=20, cost_bps=10.0, include_cash=True)`
   - `reset(seed=0)` → set `t=window`, start with equal weights, and return the `window` rows before `t` (i.e. `R[t-window:t]`) as the observation of shape `(window, A)`.
   - `step(action)` →
     - Softmax the action → weights that sum to 1 (long-only).
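     - Compute reward = `w_t · R[t] − (cost_bps/1e4) * 0.5 * sum(|w_t − w_{t-1}|)`, where `w_{t-1}` are the previous step's weights (equal weights on the first step). `info["turnover"]` reports the undivided `sum(|w_t − w_{t-1}|)`.
     - Advance one day and return `(next_obs, reward, done, info)`, where `info` carries `date`, `weights`, `raw_return`, `turnover` and `transaction_cost`.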

3. **Testing helpers**
   - `equal_weight_policy(n_assets)` and `random_policy(n_assets, seed)`.
   - `rollout(env, policy_fn, out_csv="data/returns_EQW.csv")` to write daily portfolio returns.
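A minimal sketch of step 1, assuming `data/prices_returns.csv` is in long format with columns `date`, `asset`, `ret` (the real layout isn't shown here). The sample output below also shows that the first row of `R` is all NaN, presumably because the first date has no prior price. Such a row would fail the "No NaNs" check, so this sketch drops it. `load_returns_sketch` is a hypothetical name, not the project's `trading_env.load_data.load_returns`.

```python
import io

import numpy as np
import pandas as pd


def load_returns_sketch(path_or_buf):
    """Hypothetical loader; assumes a long-format CSV with columns date, asset, ret."""
    df = pd.read_csv(path_or_buf, parse_dates=["date"])
    # Long -> wide: rows = dates, columns = assets, both sorted
    wide = df.pivot(index="date", columns="asset", values="ret").sort_index()
    wide = wide.sort_index(axis=1)
    # Drop dates with any missing return (e.g. the first date, which has no prior price)
    wide = wide.dropna(how="any")
    R = wide.to_numpy(dtype=float)  # shape [T, A]
    return R, list(wide.index), list(wide.columns)


# Toy CSV using a few values from the sample R matrix (empty ret -> NaN)
csv_text = """date,asset,ret
2020-01-01,SPY,
2020-01-01,TLT,
2020-01-02,SPY,0.0036
2020-01-02,TLT,0.0029
2020-01-03,SPY,-0.0020
2020-01-03,TLT,-0.0036
"""
R, dates, assets = load_returns_sketch(io.StringIO(csv_text))
print(R.shape, assets)
```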
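Steps 2 and 3 can be sketched together so the rollout has something to drive. Assumptions to keep in mind: `TradingEnvSketch` is a hypothetical stand-in for the project's `TradingEnv`; `include_cash` is left out because the spec doesn't say how a cash leg should behave; the policies are factories returning `obs -> action` callables; and `rollout` writes the net-of-cost reward as `ret`. Equal weights come for free, since identical raw actions softmax to `1/A` each.

```python
import csv
import os
import tempfile

import numpy as np


def softmax(x):
    """Numerically stable softmax -> long-only weights summing to 1."""
    z = np.asarray(x, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


class TradingEnvSketch:
    """Minimal sketch of the spec; include_cash is omitted (assumption)."""

    def __init__(self, R, dates, assets, window=20, cost_bps=10.0):
        self.R = np.asarray(R, dtype=float)
        self.dates, self.assets = list(dates), list(assets)
        self.T, self.A = self.R.shape
        if not 0 < window < self.T:
            raise ValueError("need 0 < window < number of dates")
        self.window, self.cost_bps = window, cost_bps

    def _obs(self):
        # The `window` rows strictly before day t, so R[t] is never seen early
        return self.R[self.t - self.window:self.t]

    def reset(self, seed=0):
        self.rng = np.random.default_rng(seed)  # unused here; kept for stochastic variants
        self.t = self.window
        self.w = np.full(self.A, 1.0 / self.A)  # start equal-weighted
        return self._obs()

    def step(self, action):
        w_new = softmax(action)
        raw_return = float(w_new @ self.R[self.t])
        turnover = float(np.abs(w_new - self.w).sum())
        cost = (self.cost_bps / 1e4) * 0.5 * turnover
        info = {"date": self.dates[self.t], "weights": w_new, "raw_return": raw_return,
                "turnover": turnover, "transaction_cost": cost}
        self.w, self.t = w_new, self.t + 1
        done = self.t >= self.T
        return self._obs(), raw_return - cost, done, info


def equal_weight_policy(n_assets):
    # Identical raw actions -> softmax assigns 1/n_assets to each asset
    return lambda obs: np.zeros(n_assets)


def random_policy(n_assets, seed):
    rng = np.random.default_rng(seed)  # local generator, reproducible per seed
    return lambda obs: rng.standard_normal(n_assets)


def rollout(env, policy_fn, out_csv="data/returns_EQW.csv"):
    """Run one episode; write net-of-cost rewards with columns date, ret."""
    obs, done, rows = env.reset(seed=0), False, []
    while not done:
        obs, reward, done, info = env.step(policy_fn(obs))
        rows.append((info["date"], reward))
    os.makedirs(os.path.dirname(out_csv) or ".", exist_ok=True)
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "ret"])
        writer.writerows(rows)
    return rows


# Toy run on the four non-NaN rows of the sample R matrix
R = np.array([[0.0014, 0.0076, 0.0036, 0.0029],
              [-0.0007, -0.0033, -0.002, -0.0036],
              [0.0027, 0.0062, 0.0049, 0.0014],
              [-0.0014, -0.0009, 0.0028, 0.005]])
dates = ["2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"]
env = TradingEnvSketch(R, dates, ["GLD", "QQQ", "SPY", "TLT"], window=2)
out_csv = os.path.join(tempfile.mkdtemp(), "data", "returns_EQW.csv")
rows = rollout(env, equal_weight_policy(env.A), out_csv=out_csv)
print(rows)
```

The indexing mirrors the sample run: after `reset`, the first `step` earns `R[window]` and reports that row's date, so no observation ever contains the return being earned.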

## Checks
- Observation shape is always `(window, A)`.
- After softmax: weights sum to 1 and are non-negative.
- No NaNs in observations or rewards.
- Equal-weight rollout yields one finite return per day after the first `window` days, with contiguous dates.
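The checks can be written as assertions over one recorded episode. This is a sketch: `check_rollout` is a hypothetical helper, and "contiguous" is read as "every date after the warm-up window, in order, with none skipped". On the sample run below, the NaN check would already fail on the initial observation.

```python
import numpy as np


def check_rollout(observations, weights, rewards, step_dates, all_dates, window):
    """Hypothetical helper: assert the four checks on one recorded episode."""
    n_assets = len(weights[0])
    for obs in observations:
        assert obs.shape == (window, n_assets), f"bad observation shape {obs.shape}"
        assert not np.isnan(obs).any(), "NaN in observation"
    for w in weights:
        assert np.isclose(w.sum(), 1.0), "weights do not sum to 1"
        assert (w >= 0).all(), "negative weight"
    assert not np.isnan(np.asarray(rewards, dtype=float)).any(), "NaN in rewards"
    # "Contiguous" read as: every date after the warm-up window, in order
    assert list(step_dates) == list(all_dates)[window:], "dates not contiguous"
    return True


# Toy episode: 4 dates, window 2, 3 assets -> 2 steps (3 observations incl. reset)
all_dates = ["2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"]
observations = [np.zeros((2, 3)), np.full((2, 3), 0.001), np.full((2, 3), -0.002)]
weights = [np.full(3, 1 / 3), np.array([0.5, 0.5, 0.0])]
print(check_rollout(observations, weights, [0.001, -0.002],
                    ["2020-01-04", "2020-01-05"], all_dates, window=2))
```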

## Done when
- `TradingEnv` runs from start to finish with the equal-weight policy.
- `data/returns_EQW.csv` is created with columns `date, ret`.
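A quick way to confirm the output file meets the "done" criteria, sketched with the standard `csv` module. `verify_returns_csv` is a hypothetical helper, and it assumes the dates are written as ISO-style strings, which sort chronologically as plain text.

```python
import csv
import math
import os
import tempfile


def verify_returns_csv(path):
    """Hypothetical check for a file like data/returns_EQW.csv."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    assert [h.strip() for h in header] == ["date", "ret"], f"unexpected header {header}"
    assert rows, "no return rows written"
    dates = [r[0] for r in rows]
    rets = [float(r[1]) for r in rows]
    assert all(math.isfinite(x) for x in rets), "non-finite return"
    # ISO-style date strings sort chronologically, so this catches reordering/duplicates
    assert dates == sorted(dates) and len(set(dates)) == len(dates), "dates out of order"
    return len(rows)


# Toy file written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), "returns_EQW.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["date", "ret"], ["2020-01-04", "0.0038"], ["2020-01-05", "0.001375"]])
print(verify_returns_csv(path))  # 2
```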


```python
# =======================
# FULL TRADING ENV TEST
# =======================

import numpy as np

from trading_env.load_data import load_returns
from trading_env.trading_env import TradingEnv

print("=== Loading Test Data ===")
R, dates, assets = load_returns("test_data/sample_prices_returns.csv")

print("R shape:", R.shape)
print("Dates:", dates)
print("Assets:", assets)
print("\nR Matrix:\n", R)


# Create environment (use a small window since the sample CSV has only 5 rows)
print("\n=== Creating Environment ===")
env = TradingEnv(R, dates, assets, window=2, cost_bps=10)

# Reset environment
print("\n=== Reset ===")
obs = env.reset(seed=0)
print("Initial observation (shape {}):\n{}".format(obs.shape, obs))


# Test a single step
print("\n=== Step Test ===")
action = np.array([1.0, -2.0, 0.5, 3.0])  # example raw action (one logit per asset)
next_obs, reward, done, info = env.step(action)

print("Next observation:\n", next_obs)
print("Reward:", reward)
print("Done:", done)
print("Info:", info)


# Run a full episode with a random policy
print("\n=== Full Episode Rollout ===")
obs = env.reset(seed=1)
rng = np.random.default_rng(1)  # seeded generator so the random actions are reproducible

done = False
step_i = 0

while not done:
    action = rng.standard_normal(env.A)  # random raw action vector
    next_obs, reward, done, info = env.step(action)

    print(f"\n--- Step {step_i} ---")
    print("Date:", info["date"])
    print("Weights:", info["weights"])
    print("Raw return:", info["raw_return"])
    print("Turnover:", info["turnover"])
    print("Cost:", info["transaction_cost"])
    print("Reward:", reward)
    print("Done:", done)

    step_i += 1

print("\n=== TEST COMPLETE ===")
```

```
=== Loading Test Data ===
R shape: (5, 4)
Dates: [Timestamp('2020-01-01 00:00:00'), Timestamp('2020-01-02 00:00:00'), Timestamp('2020-01-03 00:00:00'), Timestamp('2020-01-04 00:00:00'), Timestamp('2020-01-05 00:00:00')]
Assets: ['GLD', 'QQQ', 'SPY', 'TLT']

R Matrix:
 [[    nan     nan     nan     nan]
 [ 0.0014  0.0076  0.0036  0.0029]
 [-0.0007 -0.0033 -0.002  -0.0036]
 [ 0.0027  0.0062  0.0049  0.0014]
 [-0.0014 -0.0009  0.0028  0.005 ]]

=== Creating Environment ===

=== Reset ===
Initial observation (shape (2, 4)):
[[   nan    nan    nan    nan]
 [0.0014 0.0076 0.0036 0.0029]]

=== Step Test ===
Next observation:
 [[ 0.0014  0.0076  0.0036  0.0029]
 [-0.0007 -0.0033 -0.002  -0.0036]]
Reward: -0.0037373439605577685
Done: False
Info: {'date': Timestamp('2020-01-03 00:00:00'), 'weights': array([0.11055375, 0.00550415, 0.06705424, 0.81688786]), 'raw_return': np.float64(-0.003170456097187703), 'turnover': np.float64(1.1337757267401312), 'transaction_cost': np.float64(0.0005668878633700
```
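The printed step-test numbers can be reproduced by hand from the reward formula in step 2, which is a useful sanity check that the environment implements the spec: softmax of the raw action gives the weights, turnover against the initial equal weights is about 1.1338, the cost is 10/1e4 × 0.5 × 1.1338 ≈ 0.000567, and the reward is about −0.003170 − 0.000567 ≈ −0.003737. The inputs below are copied from the cell and its output; nothing else is assumed.

```python
import numpy as np

# Inputs copied from the step test above
action = np.array([1.0, -2.0, 0.5, 3.0])
r_today = np.array([-0.0007, -0.0033, -0.002, -0.0036])  # R row for 2020-01-03
w_prev = np.full(4, 0.25)  # equal weights after reset
cost_bps = 10.0

w = np.exp(action - action.max())  # stable softmax
w /= w.sum()
raw_return = w @ r_today
turnover = np.abs(w - w_prev).sum()
cost = cost_bps / 1e4 * 0.5 * turnover
reward = raw_return - cost
print(w.round(8), raw_return, turnover, cost, reward)
```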