# OSRS RL Agent Training Guide

This notebook guides you through setting up the environment, training a Reinforcement Learning (RL) agent for Old School RuneScape (OSRS) PvP, and evaluating its performance using this repository.

## 0. Prerequisites

Before you begin, ensure your system meets the following requirements:

*   **Operating System:**
    *   Linux (Recommended)
    *   macOS
    *   Windows (WSL2 is recommended for a Linux-like environment, because native Windows support may vary between components).
*   **Git:** Required for cloning the repository. You can download it from [git-scm.com](https://git-scm.com/).
*   **Conda (Miniconda or Anaconda):** Required for managing Python and Java dependencies.
    *   Download Miniconda from [here](https://docs.conda.io/en/latest/miniconda.html) or Anaconda from [here](https://www.anaconda.com/products/distribution).
*   **Python:** The project uses Python. The specific version and packages are managed by the Conda environment (`pvp-ml/environment.yml`), which generally sets up an environment with Python 3.8 or newer.
*   **Java:**
    *   **Java 17:** Required for the OSRS simulation server (`simulation-rsps`). This is automatically installed as part of the Conda environment defined in `pvp-ml/environment.yml`.
    *   **Java 11 (Optional):** If you want to run the standard Elvarg RSPS client (for observing the agent or playing against it, as mentioned in `simulation-rsps/README.md`), it requires Java 11. This is not installed by the project's Conda environment and would need to be managed separately if you choose to use this client. Many SDK managers (like SDKMAN! or Jabba) can help manage multiple Java versions.

Once these prerequisites are met, you can proceed with cloning the repository and setting up the project-specific environment.
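
As a quick sanity check before moving on, the short sketch below looks for the command-line tools listed above on your `PATH`. It is only a convenience and not part of the project; the helper name `find_missing_tools` is made up for this guide, and it assumes `git` and `conda` are exposed as regular executables (some shells expose `conda` only as a shell function, in which case this check can report a false negative).

```python
import shutil

# Command-line tools named in the prerequisites above.
REQUIRED_TOOLS = ("git", "conda")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```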

## 1. Project Structure

This project is organized into a few key directories:

*   `/pvp-ml`: Contains the core reinforcement learning code. This includes:
    *   Training scripts (`train.py`, `run_train_job.py`)
    *   Evaluation scripts (`evaluate.py`)
    *   PPO algorithm implementation (`ppo/`)
    *   Environment definitions (`env/`)
    *   Training configurations (`config/`)
    *   Pre-trained models (`models/`)
*   `/simulation-rsps`: Contains the Old School RuneScape Private Server (RSPS) used for simulating the game environment.
    *   The server is based on Elvarg RSPS.
    *   It includes modifications for RL agent interaction.
*   `/contracts`: Contains JSON files defining the observation and action spaces for different environments (e.g., `contracts/environments/NhEnv.json` for NH (No Honor) PvP).
*   `/assets`: Contains images and GIFs used in README files.
*   `rl_training_osrs.ipynb` (this notebook): Provides a step-by-step guide to set up the environment, run training, and evaluate agents.

Understanding this structure will help you navigate the codebase and locate relevant files for configuration, training, or modification.
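
If you want to confirm that your clone matches the layout described above, a small check like the following may help. It is a sketch, not project code: the function name `missing_project_dirs` is hypothetical, and the directory names are taken directly from the list in this section.

```python
from pathlib import Path

# Top-level directories described in the project structure above.
EXPECTED_DIRS = ("pvp-ml", "simulation-rsps", "contracts", "assets")


def missing_project_dirs(repo_root):
    """Return expected top-level directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_project_dirs(".")` from the repository root should return an empty list if all four directories are present.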

## 2. Setting up the Conda Environment

This project uses Conda to manage dependencies. The `pvp-ml/environment.yml` file defines the required packages, including PyTorch for deep learning and Java 17, which is necessary for the simulation server.

**Steps:**

1.  **Navigate to the `pvp-ml` directory:**
    Open your terminal and change the directory to `pvp-ml` within this project's root.
    ```bash
    cd pvp-ml
    ```

2.  **Create the Conda environment:**
    Run the following command to create a Conda environment named `rl-env`. A named environment is often easier to manage; alternatively, the `-p ./env` option from the original README creates the environment in a local `./env` folder inside `pvp-ml`.
    ```bash
    conda env create -f environment.yml -n rl-env
    ```
    *Note: For CPU-only training, uncomment `cpuonly` in the `pvp-ml/environment.yml` file before creating the environment. By default, training uses a GPU if available.*

3.  **Activate the Conda environment:**
    Once the environment is created, activate it using:
    ```bash
    conda activate rl-env
    ```
    You'll need to activate this environment in any new terminal session where you intend to run training or evaluation scripts.

After these steps, your environment will be ready with all necessary libraries for both the RL components and the OSRS simulation server.
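
Because the simulation server depends on Java 17, it can be worth confirming which Java version the activated environment actually resolves to. The sketch below parses the output of `java -version`; the helper names are invented for this guide, and the parsing assumes the usual `version "..."` banner that OpenJDK and Oracle JDKs print.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: Java 8 and older report themselves as "1.8", "1.7", ...
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and return its major version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # `java -version` usually writes to stderr, so check both streams.
    return parse_java_major(result.stderr + result.stdout)
```

With the `rl-env` environment active, `installed_java_major()` would be expected to return `17`; any other value suggests a different Java installation is taking precedence on your `PATH`.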

## 3. Launching the OSRS Simulation Server

The RL agent learns by interacting with a simulated Old School RuneScape environment. This simulation is provided by a modified RuneScape Private Server (RSPS).

**Important:** The simulation server must be running in a separate terminal *before* you start any training or evaluation processes. The RL scripts will connect to this server.

**Steps to launch the server:**

1.  **Ensure the Conda environment is active:**
    If you haven't already, activate the Conda environment you created in the previous step:
    ```bash
    conda activate rl-env
    ```
    This is important because the environment includes Java 17, which the server requires.

2.  **Navigate to the `ElvargServer` directory:**
    In a **new terminal window or tab**, navigate from the project root to the server directory:
    ```bash
    cd simulation-rsps/ElvargServer
    ```

3.  **Launch the server using Gradle:**
    Run the following command. This will compile and start the server.
    ```bash
    ./gradlew run
    ```
    You should see log output indicating the server is starting up. The RL plugin within the server will automatically start a remote environment server, which the training scripts will connect to.

Leave this terminal window open and the server running while you proceed with training or evaluation. If you stop the server, any ongoing training will be interrupted.
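
Since training scripts connect to the server over the network, one way to avoid starting a job too early is to wait until the server's port accepts connections. The following is a minimal sketch using only the standard library. The function name `wait_for_port` is hypothetical, and this guide does not state which port the remote environment server listens on, so you would need to take the port from the server's log output.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Note that an open port only shows that something is listening; it does not prove the server has finished initializing the game world.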

## 4. Starting a Training Job

Once your Conda environment is active and the OSRS simulation server is running, you can start training an RL agent.

**Steps:**

1.  **Navigate to the `pvp-ml` directory:**
    Open a **new terminal window or tab** (distinct from the one running the simulation server). Ensure your Conda environment (`rl-env`) is active in this terminal. Then, navigate to the `pvp-ml` directory from the project root:
    ```bash
    cd pvp-ml
    ```
    *(If your terminal is already in `pvp-ml`, you can skip this `cd` command.)*

2.  **Choose a configuration preset:**
    Training configurations are defined in YAML files within the `pvp-ml/config` directory. There are subdirectories for different environment types (e.g., `nh` for No Honor PvP, `dharok` for Dharok PvP).
    For this example, we'll use the `PastSelfPlay` preset for the NH environment, which is defined in `pvp-ml/config/nh/past-self-play.yml`. This preset often provides a good balance of performance and sample efficiency.

3.  **Start the training job:**
    Use the `run_train_job.py` script to start training. You need to specify a preset and a unique name for your experiment.
    ```bash
    python run_train_job.py --preset PastSelfPlay --name osrs_agent_training_notebook
    ```
    *   `--preset PastSelfPlay`: Tells the script to use the configuration from `config/nh/past-self-play.yml` (the `.yml` extension and `nh/` path prefix are inferred by the script).
    *   `--name osrs_agent_training_notebook`: Assigns a name to this training run. This name will be used for creating directories to store logs and model checkpoints.

4.  **Monitoring progress:**
    *   Training logs will be output to the console.
    *   Detailed logs are stored in the `./logs` directory within `pvp-ml`.
    *   Experiment data, including saved model checkpoints, will be stored in `./experiments/<your-experiment-name>` (e.g., `pvp-ml/experiments/osrs_agent_training_notebook`). Models are typically saved periodically, often as `latest.zip` and also at certain step counts.
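
If you prefer to launch runs from a script (for example, to queue several experiments), the command above can be assembled programmatically. The sketch below only rebuilds the command and output path exactly as they appear in this section; the helper names are made up for illustration, and the actual launch is left commented out.

```python
import subprocess
from pathlib import Path


def build_train_command(preset, name):
    """Build the argv list for the training command shown above."""
    return ["python", "run_train_job.py", "--preset", preset, "--name", name]


def experiment_dir(name, pvp_ml_root="."):
    """Return the directory where this guide says experiment data is stored."""
    return Path(pvp_ml_root) / "experiments" / name


if __name__ == "__main__":
    run_name = "osrs_agent_training_notebook"
    command = build_train_command("PastSelfPlay", run_name)
    print("Would run:", " ".join(command))
    print("Expect output under:", experiment_dir(run_name))
    # Uncomment to actually launch training from the pvp-ml directory:
    # subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if an experiment name ever contains spaces.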

**Stopping a training job:**
To stop a training job before it completes, you can typically use `Ctrl+C` in the terminal where the training script is running. For more managed cleanup, the `pvp-ml/README.md` mentions:
```bash
python run_train_job.py cleanup --name osrs_agent_training_notebook
```
Or to terminate all jobs:
```bash
python run_train_job.py cleanup --name all
```

Training can take a significant amount of time, depending on the configuration, your hardware (GPU availability helps immensely), and the desired level of agent performance.

## 5. Monitoring Training with TensorBoard

TensorBoard is a visualization toolkit for TensorFlow (and PyTorch, via `torch.utils.tensorboard`) that allows you to track various metrics during your RL training runs, such as rewards, loss values, episode lengths, and more.

**Steps to launch TensorBoard:**

1.  **Ensure your Conda environment is active:**
    If not already active, in a new terminal:
    ```bash
    conda activate rl-env
    cd pvp-ml
    ```
    *(If your training job is running, TensorBoard might have been launched automatically. Check the terminal output of the training script.)*

2.  **Launch TensorBoard:**
    If TensorBoard didn't start automatically with your training job, or if you want to start it separately (e.g., after a job has finished to review logs), you can run the following command from the `pvp-ml` directory:
    ```bash
    python run_train_job.py tensorboard
    ```
    This command specifically looks for TensorBoard logs generated by the training framework. The logs are typically stored in a `./tensorboard` directory within `pvp-ml`, under your experiment name (e.g., `pvp-ml/tensorboard/osrs_agent_training_notebook`).

3.  **Access TensorBoard in your browser:**
    Open your web browser and navigate to the address provided by TensorBoard, which is usually:
    [http://127.0.0.1:6006/](http://127.0.0.1:6006/) or [http://localhost:6006/](http://localhost:6006/)

    You should see your experiment listed (e.g., `osrs_agent_training_notebook`). You can then explore various graphs and metrics to monitor your agent's learning progress.

TensorBoard is an invaluable tool for understanding how well your agent is learning and for debugging potential issues with your training setup or hyperparameters.
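
Reward curves in RL are often noisy, which is why smoothing helps when reading them. The sketch below shows a plain exponential moving average, which is similar in spirit to TensorBoard's smoothing slider but is not claimed to match its exact implementation. The function name `smooth` and the default weight are arbitrary choices for illustration.

```python
def smooth(values, weight=0.6):
    """Exponentially smooth a sequence; weight in [0, 1), higher is smoother."""
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    smoothed = []
    previous = None
    for value in values:
        if previous is None:
            # Start the average at the first observed value.
            previous = value
        else:
            previous = weight * previous + (1.0 - weight) * value
        smoothed.append(previous)
    return smoothed
```

For example, `smooth([0.0, 10.0], weight=0.5)` yields `[0.0, 5.0]`: the second point is pulled halfway toward the previous smoothed value.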

## 6. Evaluating a Trained Model

Once you have a trained model (either from your own training runs or a pre-trained model), you can evaluate its performance by running it against an opponent in the simulation.

**Steps to evaluate a model:**

1.  **Ensure the OSRS Simulation Server is running:**
    As with training, the simulation server must be running. If it is not, start it by following Section 3, "Launching the OSRS Simulation Server".

2.  **Ensure your Conda environment is active:**
    In a new terminal (or the one you used for training/TensorBoard), activate the Conda environment and navigate to the `pvp-ml` directory:
    ```bash
    conda activate rl-env
    cd pvp-ml
    ```

3.  **Locate your trained model:**
    Models from your training runs are saved in the `pvp-ml/experiments/<your-experiment-name>/models/` directory. For example, if your experiment was named `osrs_agent_training_notebook`, a model might be found at `pvp-ml/experiments/osrs_agent_training_notebook/models/latest.zip` or a checkpoint like `pvp-ml/experiments/osrs_agent_training_notebook/models/<step_count>.zip`.
    This project also provides pre-trained models in the `pvp-ml/models/` directory, such as `GeneralizedNh.zip`.

4.  **Run the evaluation script:**
    Use the `evaluate` script (which is an alias for `python -m pvp_ml.evaluate`), providing the path to your model:
    ```bash
    python -m pvp_ml.evaluate --model-path <path_to_your_model>
    ```
    For example, to evaluate the `latest.zip` from our example training run:
    ```bash
    python -m pvp_ml.evaluate --model-path ./experiments/osrs_agent_training_notebook/models/latest.zip
    ```
    Or to evaluate a pre-trained model:
    ```bash
    python -m pvp_ml.evaluate --model-path ./models/GeneralizedNh.zip
    ```
    The script will load the model and connect to the simulation server. An agent controlled by your model will spawn in the game.

5.  **(Optional) Connect to the server with an RSPS client to observe:**
    To watch your agent play, or even play against it yourself, you can connect to the simulation server using an OSRS client. The `simulation-rsps/README.md` provides instructions:
    *   Clone the upstream Elvarg RSPS repository: `git clone https://github.com/RSPSApp/elvarg-rsps`
    *   Navigate to `elvarg-rsps/ElvargClient`.
    *   Run the client with `./gradlew run`. The client requires Java 11, whereas the server uses Java 17, so you may need to install Java 11 separately or switch between Java versions.
    *   Log in to the server. You should find your agent in a PvP area (e.g., Mage Bank or PvP Arena for NH).

Evaluation helps you assess how well your agent performs in the simulated environment and can give insights into its strengths and weaknesses.
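
When an experiment has accumulated many checkpoints, a small helper can pick which file to pass to `--model-path`. This sketch follows the naming described in step 3 (`latest.zip` and `<step_count>.zip`); the function name `pick_checkpoint` is hypothetical, and it assumes step checkpoints have purely numeric file names, which you should verify against your own `models/` directory.

```python
from pathlib import Path


def pick_checkpoint(models_dir):
    """Prefer latest.zip; otherwise return the highest numbered checkpoint."""
    models = Path(models_dir)
    latest = models / "latest.zip"
    if latest.is_file():
        return latest
    # Assumption: step checkpoints are named "<step_count>.zip" with a numeric stem.
    numbered = [p for p in models.glob("*.zip") if p.stem.isdigit()]
    if not numbered:
        return None
    return max(numbered, key=lambda p: int(p.stem))
```

For instance, `pick_checkpoint("./experiments/osrs_agent_training_notebook/models")` returns a `Path` you can pass to the evaluation command, or `None` if no matching file exists.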

## 7. Serving Models via API

This project allows you to serve your trained models through a socket-based API for fast predictions. This can be useful if you want to integrate the agent's decision-making into other applications or set up a more persistent service.

**Steps to serve models:**

1.  **Ensure your Conda environment is active:**
    Activate the Conda environment and navigate to the `pvp-ml` directory:
    ```bash
    conda activate rl-env
    cd pvp-ml
    ```

2.  **Start the API server:**
    Use the `serve-api` script (which is an alias for `python -m pvp_ml.api`):
    ```bash
    python -m pvp_ml.api
    ```
    By default, this serves models located in the `pvp-ml/models/` directory and listens for connections on `127.0.0.1`. You can configure the host and other parameters if needed (refer to the script's help or source code for details). The API typically loads pre-trained models like `GeneralizedNh.zip` and `FineTunedNh.zip` by default.

3.  **Connect using a client:**
    You can interact with the API using a custom client. An example client implementation is provided in:
    `pvp-ml/test/integ/api_client.py`

    This client demonstrates how to connect to the API, send requests (e.g., for actions based on observations), and receive responses.

The API mode is more advanced and intended for scenarios where you need programmatic access to the model's inference capabilities outside the direct training/evaluation loop.
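
To give a rough idea of what a socket client involves, here is a generic request/response sketch. Everything about the wire format is an assumption made purely for illustration: this guide does not describe the protocol, so the newline-delimited JSON framing, the function name `request_json`, and the payload shape are all hypothetical. The real message format and port are whatever `pvp-ml/test/integ/api_client.py` and the API server actually use.

```python
import json
import socket


def request_json(host, port, payload, timeout_s=5.0):
    """Send one JSON object terminated by a newline and return the JSON reply."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        # Assumed framing: one JSON document per line, UTF-8 encoded.
        sock.sendall(json.dumps(payload).encode("utf-8") + b"\n")
        # Read until the newline that ends the (assumed) single-line reply.
        with sock.makefile("rb") as reader:
            line = reader.readline()
    if not line:
        raise ConnectionError("server closed the connection without replying")
    return json.loads(line)
```

Opening a fresh connection per request keeps the sketch simple; a client that needs low latency would more likely keep one connection open and reuse it.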

## 8. Conclusion and Next Steps

This notebook has guided you through the main steps to get an OSRS RL agent up and running using this project:

1.  **Understanding the Project Structure:** Knowing where key files and directories are located.
2.  **Setting up the Conda Environment:** Ensuring all dependencies, including Python packages and Java 17, are installed.
3.  **Launching the OSRS Simulation Server:** Providing the game world for the agent to interact with.
4.  **Starting a Training Job:** Using a preset configuration to train your own agent.
5.  **Monitoring Training with TensorBoard:** Visualizing metrics to track learning progress.
6.  **Evaluating a Trained Model:** Assessing your agent's performance in the simulation.
7.  **Serving Models via API:** (Optional) Making your agent's capabilities available programmatically.

**Potential Next Steps:**

*   **Experiment with different configurations:** Explore the YAML files in `pvp-ml/config/`. Try different presets, or even customize parameters like learning rates, network architectures, or reward structures.
*   **Dive deeper into the code:**
    *   Examine `pvp-ml/pvp_ml/ppo/ppo.py` to understand the PPO algorithm implementation.
    *   Look into `pvp-ml/pvp_ml/env/pvp_env.py` and the environment contract JSON files (e.g., `contracts/environments/NhEnv.json`) to see how the OSRS environment is represented.
    *   Study the simulation server code in `simulation-rsps/ElvargServer/src/main/java/com/github/naton1/rl/` to see how it interfaces with the RL agent.
*   **Train for longer:** RL agents often require extensive training (millions or tens of millions of timesteps) to achieve high performance.
*   **Analyze TensorBoard logs in detail:** Use the metrics to guide hyperparameter tuning and identify areas for improvement.
*   **Contribute to the project:** If you develop new features, environments, or improvements, consider contributing back to the repository.
*   **Try different self-play or adversarial training setups:** The `pvp-ml/config/nh/` directory contains various examples like `pure-self-play.yml` or `human-like-adversarial.yml`.

Reinforcement learning in complex environments like OSRS is a challenging but rewarding field. Good luck with your experiments!