# **Pendulum-v1 Example in ElegantRL-HelloWorld**






# **Part 1: Install ElegantRL**

In [1]:
# install elegantrl library
!pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git

Collecting git+https://github.com/AI4Finance-LLC/ElegantRL.git
  Cloning https://github.com/AI4Finance-LLC/ElegantRL.git to /tmp/pip-req-build-1hyi8kfj
  Running command git clone --filter=blob:none --quiet https://github.com/AI4Finance-LLC/ElegantRL.git /tmp/pip-req-build-1hyi8kfj
  Resolved https://github.com/AI4Finance-LLC/ElegantRL.git to commit 6f7096c9f825d529b90cf8cba4fc75929af979ed
  Preparing metadata (setup.py) ... done
Collecting gymnasium (from ElegantRL==0.3.10)
  Downloading gymnasium-1.0.0-py3-none-any.whl.metadata (9.5 kB)
Collecting farama-notifications>=0.0.1 (from gymnasium->ElegantRL==0.3.10)
  Downloading Farama_Notifications-0.0.4-py3-none-any.whl.metadata (558 bytes)
Downloading gymnasium-1.0.0-py3-none-any.whl (958 kB)
   958.1/958.1 kB 10.8 MB/s eta 0:00:00
Downloading Farama_Notifications-0.0.4-py3-none-any.whl (2.5 kB)
Building wheels for collected packages: Ele

# **Part 2: Specify Environment and Agent**

*   **agent**: chooses an agent (a DRL algorithm) from the set of agents in the [directory](https://github.com/AI4Finance-Foundation/ElegantRL/tree/master/elegantrl/agents).
*   **env**: creates an environment for your agent.


In [1]:
import gymnasium as gym

In [2]:
from elegantrl.train.config import Config
from elegantrl.agents.AgentSAC import AgentSAC

env = gym.make

env_args = {
    "id": "Pendulum-v1",
    "env_name": "Pendulum-v1",
    "num_envs": 1,
    "max_step": 1000,
    "state_dim": 3,
    "action_dim": 1,
    "if_discrete": False,
    "reward_scale": 2**-1,
    "gpu_id": 0, # if you have GPU
}
args = Config(AgentSAC, env_class=env, env_args=env_args)
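The values `"state_dim": 3`, `"action_dim": 1`, and `"if_discrete": False` follow from how Pendulum-v1 is defined. The sketch below is not ElegantRL or Gymnasium code. It restates, in plain Python, the observation and reward described in the Gymnasium documentation for Pendulum-v1: an observation of `[cos(theta), sin(theta), theta_dot]` and a single continuous torque as the action. The helper names (`angle_normalize`, `pendulum_observation`, `pendulum_reward`) are hypothetical and chosen only for illustration.

```python
import math


def angle_normalize(theta: float) -> float:
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def pendulum_observation(theta: float, theta_dot: float) -> list:
    """Three observation components, which is why state_dim is 3."""
    return [math.cos(theta), math.sin(theta), theta_dot]


def pendulum_reward(theta: float, theta_dot: float, torque: float) -> float:
    """Per-step reward: penalizes angle from upright, speed, and torque."""
    return -(angle_normalize(theta) ** 2
             + 0.1 * theta_dot ** 2
             + 0.001 * torque ** 2)


# Hanging straight down (theta = pi) at the documented maximum speed (8)
# and maximum torque (2) gives the most negative per-step reward.
obs = pendulum_observation(theta=math.pi, theta_dot=0.0)
worst_step_reward = pendulum_reward(theta=math.pi, theta_dot=8.0, torque=2.0)
best_step_reward = pendulum_reward(theta=0.0, theta_dot=0.0, torque=0.0)

print(len(obs))
print(round(worst_step_reward, 4))
print(best_step_reward)
```

Because every per-step reward is at most zero, the cumulative reward of an episode is always negative or zero, which matches the negative `avgR` values that training typically reports for this task.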

# **Part 3: Specify hyper-parameters**
A list of hyper-parameters is available [here](https://elegantrl.readthedocs.io/en/latest/api/config.html).

In [3]:
args.max_step = 1000
args.reward_scale = 2**-1  # RewardRange: -1800 < -200 < -50 < 0
args.gamma = 0.97
args.target_step = args.max_step
args.eval_times = 2**3
args.num_workers = 24  # rollout workers; more workers improve GPU utilization
args.num_threads = 4
# args.learner_gpu_ids=[0,1]
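To get a feel for `gamma` and `reward_scale`, the following minimal sketch computes a discounted return in plain Python. It assumes the scale is applied multiplicatively to each reward before discounting; that is an assumption about how the setting is used, not something confirmed by this notebook. The function name `discounted_return` and the constant reward of `-1.0` are hypothetical.

```python
def discounted_return(rewards, gamma, reward_scale=1.0):
    """Sum of scaled rewards, each discounted by gamma per time step."""
    total = 0.0
    # Work backwards so each step needs only one multiply and one add.
    for reward in reversed(rewards):
        total = reward * reward_scale + gamma * total
    return total


gamma = 0.97
reward_scale = 2 ** -1

# A common rule of thumb: rewards further than about 1 / (1 - gamma)
# steps ahead contribute little to the discounted return.
effective_horizon = 1.0 / (1.0 - gamma)

# Hypothetical episode: a constant reward of -1.0 for 1000 steps.
rewards = [-1.0] * 1000
scaled_return = discounted_return(rewards, gamma, reward_scale)

print(round(effective_horizon, 1))
print(round(scaled_return, 3))
```

With `gamma = 0.97` the effective horizon is roughly 33 steps, so the discounted return of a long episode is bounded by about 33 times the scaled per-step reward.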

# **Part 4: Train and Evaluate the Agent**






In [4]:
from elegantrl.train.run import train_agent

train_agent(args)

| train_agent_multiprocessing() with GPU_ID 0
| Arguments Remove cwd: ./Pendulum-v1_SAC_0
| Evaluator:
| `step`: Number of samples, or total training steps, or running times of `env.step()`.
| `time`: Time spent from the start of training to this moment.
| `avgR`: Average value of cumulative rewards, which is the sum of rewards in an episode.
| `stdR`: Standard dev of cumulative rewards, which is the sum of rewards in an episode.
| `avgS`: Average of steps in an episode.
| `objC`: Objective of Critic network. Or call it loss function of critic network.
| `objA`: Objective of Actor network. It is the average Q value of the critic network.
################################################################################
ID     Step    Time |    avgR   stdR   avgS  stdS |    expR   objC   objA   etc.
0  2.46e+04      45 |-1409.61  282.0    200     0 |   -3.08  11.94   0.03 0.03098304377635941
0  4.92e+04      67 |-1241.46  392.9    200     0 |   -3.24   9.82  -0.05 -0.047913442860590294
0 

KeyboardInterrupt: 

0  2.21e+05     236 |-1600.84  190.3    200     0 |   -3.71   0.44 -13.46 -13.458587421311272


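Understanding the above results:
*   **Step**: the total number of training steps, i.e. the number of calls to `env.step()`.
*   **MaxR**: the maximum cumulative reward; this column does not appear in the log above.
*   **avgR**: the average cumulative reward per episode, where the cumulative reward is the sum of rewards in an episode.
*   **stdR**: the standard deviation of the cumulative reward per episode.
*   **objA**: the objective of the Actor Network (Policy Network), reported as the average Q-value given by the critic network.
*   **objC**: the objective (loss function) of the Critic Network (Value Network).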
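The `avgR`, `stdR`, `avgS`, and `stdS` columns can be illustrated with a small sketch that summarizes a batch of evaluation episodes. This is not the evaluator used by ElegantRL. The function name `summarize_episodes` and the reward values are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics


def summarize_episodes(episode_rewards):
    """Summarize episodes given as lists of per-step rewards."""
    # Cumulative reward: the sum of rewards within each episode.
    returns = [sum(episode) for episode in episode_rewards]
    # Episode length: the number of steps in each episode.
    steps = [len(episode) for episode in episode_rewards]
    return {
        "avgR": statistics.fmean(returns),
        "stdR": statistics.pstdev(returns),
        "avgS": statistics.fmean(steps),
        "stdS": statistics.pstdev(steps),
    }


# Two hypothetical 200-step episodes with constant per-step rewards.
episodes = [[-1.0] * 200, [-2.0] * 200]
summary = summarize_episodes(episodes)
print(summary)
```

In this toy case the two cumulative rewards are -200 and -400, so `avgR` is -300 and `stdR` is 100, while both episodes last 200 steps, giving `avgS` of 200 and `stdS` of 0.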