# Reinforcement Learning

**Created by:**

Alejandra Elizabeth Figueroa Arellano

Juan Francisco Cruz Sánchez

### Objective

This report describes the design of a simulated trading environment for training reinforcement learning agents. 

The primary objective is to create an environment where agents can learn and apply trading strategies to maximize portfolio returns while minimizing risk and losses.

Financial markets are complex, highly dynamic environments in which decision-making requires extensive analysis and adaptability. Reinforcement learning (RL) offers a promising way to handle this complexity by letting agents learn from interactions with a simulated trading environment. This project therefore focuses on developing a robust trading environment that supports sequential decision-making for RL algorithms.

### Structure

This report is divided into the following sections:

- Description of the trading environment components.
- Explanation of the state space, action space, and reward system.
- Discussion of the key methods (reset, step, render).
- Data generation process and its integration with the environment.
- Next steps and summary of the results.

-------------------------------------------------------------------------------------------------------------------------------

## Description of the environment

### Action space

The action space defines the set of actions the agent can take at any given time. 

In this environment, the action space is discrete and consists of three actions:

**- Hold (0):** the agent decides to take no action, maintaining its current portfolio position.

**- Buy (1):** the agent purchases a specific quantity of the asset, increasing its position.

**- Sell (2):** the agent sells a specific quantity of the asset, reducing its position.

These actions give the agent the fundamental tools to manage its portfolio actively. They represent the essential decisions in any trading scenario and let the agent adapt its strategy to market conditions and portfolio performance.
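The three actions can be sketched as an integer enumeration. This is a minimal illustration, not the project's code: the `Action` enum, the `apply_action` helper, the fixed trade size of one unit, and the rule that an unaffordable trade is treated as a hold are all assumptions made for the example.

```python
from enum import IntEnum


class Action(IntEnum):
    """Discrete actions, numbered as in the list above."""
    HOLD = 0
    BUY = 1
    SELL = 2


def apply_action(action, position, cash, price, quantity=1):
    """Return the (position, cash) pair after executing one action.

    Hypothetical helper: `quantity` is an assumed fixed trade size.
    """
    action = Action(action)
    cost = price * quantity
    if action == Action.BUY and cash >= cost:
        return position + quantity, cash - cost
    if action == Action.SELL and position >= quantity:
        return position - quantity, cash + cost
    # HOLD, or a trade the agent cannot afford: nothing changes.
    return position, cash
```

Using an `IntEnum` keeps the actions compatible with agents that output plain integers while making the environment code easier to read.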

### State space

The state space encompasses all the relevant information that the agent observes at each time step to make informed decisions. 

In this environment, the state space includes:

**- Current price of the asset.**

**- Technical indicators:** metrics such as moving averages, the relative strength index (RSI), and Bollinger Bands, which provide insight into market trends.

**- Current position.**

**- Cash balance.**

**- Time step:** the current step in the episode, indicating progress in the trading sequence.

The state space provides a comprehensive view of the trading environment. By combining market data (price and indicators) with internal data (position and cash balance), the agent can evaluate both the market conditions and its own trading capacity before choosing an action.
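As an illustration, the observation could be assembled into a single vector as follows. This is a sketch under stated assumptions: the `build_state` function, the 14-step window, the two-standard-deviation band width, and the simple-average form of the RSI are choices made for the example and are not taken from the project.

```python
import numpy as np


def build_state(prices, t, position, cash, window=14):
    """Assemble the observation at step t from a 1-D price series.

    Hypothetical layout:
    [price, moving average, RSI, upper band, lower band, position, cash, t]
    """
    history = np.asarray(prices[max(0, t - window + 1): t + 1], dtype=float)
    price = history[-1]

    # Simple moving average and Bollinger Bands over the window.
    sma = history.mean()
    std = history.std()
    upper, lower = sma + 2 * std, sma - 2 * std

    # RSI from simple sums of gains and losses over the window.
    deltas = np.diff(history)
    gains = deltas[deltas > 0].sum()
    losses = -deltas[deltas < 0].sum()
    if gains + losses == 0:
        rsi = 50.0  # flat or too-short history: use a neutral value
    else:
        rsi = 100.0 * gains / (gains + losses)

    return np.array(
        [price, sma, rsi, upper, lower, position, cash, t], dtype=np.float32
    )
```

Early in an episode the window holds fewer than `window` prices, so the indicators are computed on whatever history is available.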

### Reward system

The reward system incentivizes the agent to adopt profitable trading strategies while avoiding poor decisions and excessive inactivity. 

The design includes:

**- Positive rewards for profitable trades:** the agent is rewarded for increasing the portfolio value.

**- Negative rewards for unprofitable trades:** the agent is penalized for trades that decrease the portfolio value.

**- Small negative rewards for holding:** to discourage the agent from remaining idle, a small penalty is applied when the agent decides to hold.

The reward system balances short-term and long-term profitability. Positive rewards encourage the agent to identify and exploit profitable opportunities, while penalties for holding ensure that the agent doesn't bypass decision-making to avoid risks.
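One possible way to express this reward is sketched below. The report does not give the exact formula, so the relative change in portfolio value, the `compute_reward` name, and the `hold_penalty` value of 0.001 are assumptions made for illustration.

```python
def compute_reward(prev_value, new_value, action, hold_penalty=0.001):
    """Reward for one step, based on the change in portfolio value.

    Hypothetical formula: relative change in value, minus a small
    fixed penalty when the agent holds (action 0).
    """
    if prev_value <= 0:
        raise ValueError("prev_value must be positive")
    reward = (new_value - prev_value) / prev_value
    if action == 0:
        reward -= hold_penalty
    return reward
```

The size of the hold penalty matters: if it is large relative to typical price moves, the agent may learn to trade constantly just to avoid it.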

-------------------------------------------------------------------------------------------------------------------------------

## Methods of our TradingEnv class

### reset()

The reset method initializes the environment to its starting state at the beginning of each episode. 

It performs the following:

**- Resets the agent's cash balance, position, and portfolio value to default values.**

**- Sets the time step counter to zero.**

**- Returns the initial state of the environment.**

This method ensures that every episode starts under consistent conditions, allowing for fair evaluation and learning.

### step(action)

The step method executes the action selected by the agent and updates the environment. 

It performs the following:

**- Adjusts the state based on the agent’s action** (buying, selling, or holding).

**- Calculates the reward associated with the action.**

**- Checks whether the episode has ended** (end of data or portfolio liquidation).

**- Returns the next state, reward, a boolean indicating whether the episode is done, and additional diagnostic information.**

This method enables sequential decision-making, allowing the agent to learn through interactions with the environment.
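The `reset` and `step` behavior described above can be sketched in a compact class. This is a minimal sketch, not the project's `TradingEnv`: the constructor arguments, the one-unit trade size, the reduced state (price, position, cash, time step), and the absolute-change reward with a hold penalty are assumptions. The price series needs at least two entries.

```python
import numpy as np


class TradingEnv:
    """Minimal single-asset environment (illustrative sketch only)."""

    def __init__(self, prices, initial_cash=10_000.0, hold_penalty=0.001):
        self.prices = np.asarray(prices, dtype=float)
        self.initial_cash = initial_cash
        self.hold_penalty = hold_penalty
        self.reset()

    def _state(self):
        return np.array(
            [self.prices[self.t], self.position, self.cash, self.t],
            dtype=np.float32,
        )

    def reset(self):
        """Restore default balances, zero the step counter, return the state."""
        self.cash = self.initial_cash
        self.position = 0
        self.t = 0
        self.portfolio_value = self.initial_cash
        return self._state()

    def step(self, action):
        """Apply the action, advance one step, return (state, reward, done, info)."""
        price = self.prices[self.t]
        if action == 1 and self.cash >= price:     # Buy one unit
            self.position += 1
            self.cash -= price
        elif action == 2 and self.position > 0:    # Sell one unit
            self.position -= 1
            self.cash += price
        # Hold (0) or an infeasible trade leaves the holdings unchanged.

        prev_value = self.portfolio_value
        self.t += 1
        self.portfolio_value = self.cash + self.position * self.prices[self.t]

        reward = self.portfolio_value - prev_value
        if action == 0:
            reward -= self.hold_penalty

        # Episode ends at the last price or when the portfolio is wiped out.
        done = bool(self.t >= len(self.prices) - 1 or self.portfolio_value <= 0)
        info = {"portfolio_value": float(self.portfolio_value)}
        return self._state(), float(reward), done, info
```

A typical interaction loop calls `reset()` once and then `step(action)` repeatedly until `done` is `True`.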

### render()

The render method provides a visual or textual representation of the current state of the environment. 

It includes:

**- Displaying the current balance, position, profit, and other performance metrics.**

**- Optionally providing a graphical representation of the portfolio’s performance.**

This method helps monitor the agent’s behavior and performance during training and evaluation, enabling debugging and optimization.
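A textual version of this output might look like the sketch below. The `render_text` function and its column layout are hypothetical; the graphical option is not shown.

```python
def render_text(step, cash, position, price, initial_value):
    """Print and return a one-line summary of the portfolio.

    Hypothetical helper: profit is measured against `initial_value`.
    """
    value = cash + position * price
    profit = value - initial_value
    line = (
        f"Step {step:>4} | Cash {cash:>10.2f} | Position {position:>4} | "
        f"Value {value:>10.2f} | Profit {profit:>+10.2f}"
    )
    print(line)
    return line
```

Returning the string as well as printing it makes the output easy to log or check in tests.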

-------------------------------------------------------------------------------------------------------------------------------

## Data

The following steps were taken to prepare the data:

- Download historical data: 10 years of historical stock data were collected and divided into 10 datasets, each covering one year.

- Generate additional scenarios: using the data, 1,000 additional scenarios were created by introducing random noise and variations to simulate different market conditions.

- Data preprocessing: the data was cleaned, normalized, and transformed into a suitable format for training.
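The scenario-generation step could be sketched as follows. The report does not specify how the noise was applied, so adding Gaussian noise to log returns is an assumption, as are the `generate_scenarios` name and the `noise_std` and `seed` parameters.

```python
import numpy as np


def generate_scenarios(prices, n_scenarios=1000, noise_std=0.01, seed=0):
    """Create noisy variants of a price series.

    Hypothetical method: perturb the log returns with Gaussian noise and
    rebuild each price path from the original starting price.
    Returns an array of shape (n_scenarios, len(prices)).
    """
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    log_returns = np.diff(np.log(prices))

    noise = rng.normal(0.0, noise_std, size=(n_scenarios, log_returns.size))
    cumulative = np.cumsum(log_returns + noise, axis=1)

    start = np.full((n_scenarios, 1), prices[0])
    return np.hstack([start, prices[0] * np.exp(cumulative)])
```

Working with log returns keeps every simulated price positive, which additive noise on raw prices would not guarantee.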

**Integration with the environment**

The datasets serve as the market data for the trading environment. At the beginning of each episode, one dataset is loaded, and the agent interacts with this simulated market.

### Next steps:

- Implementing additional features, such as transaction costs and market impact, to enhance realism.

- Testing various reinforcement learning algorithms with this environment to compare performance.

- Extending the environment to handle multiple assets and portfolio optimization.

-------------------------------------------------------------------------------------------------------------------------------

The development of this trading environment provides a robust platform for training reinforcement learning agents in a realistic and dynamic setting. The well-defined action space, state space, and reward system encourage the agent to learn effective trading strategies.

By using historical market data and simulating various scenarios, this environment exposes the agent to a broad range of market conditions during training.