Check the action space and observation space of the environment

In [1]:
from HohmannTransferEnv import HohmannTransferEnv

env = HohmannTransferEnv()
print(env.action_space)
print(env.observation_space)

Box(-0.1, 0.1, (1,), float32)
Box(-inf, inf, (3,), float32)


Test the step function of the environment

In [2]:
env = HohmannTransferEnv()
env.reset()
for _ in range(100):
    action = env.action_space.sample()
    print("Action: ", action)
    next_state, reward, done, _ = env.step(action)
    print("Next state: ", next_state, "Reward: ", reward, "Done: ", done)
    if done:
        break

Action:  [-0.07227103]
Next state:  [array([-0.00036136, -0.00036136]) array([-0.0072271, -0.0072271])
 1.5262548949600395e+24] Reward:  1 Done:  False
Action:  [-0.09879111]
Next state:  [array([7.63127425e+18, 7.63127425e+18])
 array([1.52625496e+20, 1.52625496e+20]) 3.422168615646904e-21] Reward:  1 Done:  False
Action:  [0.08435822]
Next state:  [array([2.28938239e+19, 2.28938239e+19])
 array([1.52625496e+20, 1.52625496e+20]) 3.802409207707702e-22] Reward:  1 Done:  False
Action:  [-0.0181253]
Next state:  [array([3.81563735e+19, 3.81563735e+19])
 array([1.52625496e+20, 1.52625496e+20]) 1.368867288477977e-22] Reward:  1 Done:  False
Action:  [0.02488953]
Next state:  [array([5.34189231e+19, 5.34189231e+19])
 array([1.52625496e+20, 1.52625496e+20]) 6.984016720448583e-23] Reward:  1 Done:  False
Action:  [0.05876366]
Next state:  [array([6.86814727e+19, 6.86814727e+19])
 array([1.52625496e+20, 1.52625496e+20]) 4.2248989844036456e-23] Reward:  1 Done:  False
Action:  [0.04424293]
Next

  self.state = np.array([pos_, vel_, Fg_])


I don't really understand how the step function works. 
Let's think step by step what we're trying to do and what we need to do it.
- We're at a certain state
  - This includes
    - position and velocity of the spacecraft in both x and y directions
    - gravitational force
- We have 2 actions available to us
  - thrust in x direction
  - thrust in y direction
- We want to get to the next state
  - We need to calculate the new position and velocity of the spacecraft
    - We need to know if we've crashed
      - I don't see a definition of where the earth is and how big it is in the environment
        - We need to add the following constraints:
          - The radius of the earth will be 10 units and it will be positioned at the origin (we might need to decrease maximum thrust). If the rocket enters this region or is more than 100 units away from the origin, the environment terminates.
- Let's say that we're at a state 
- We need to solve an ODE to get to the next state.

The state of the system includes the position and velocity of the spacecraft in both the x and y directions, and the gravitational force acting on the spacecraft. The actions you have available are thrusts in the x and y directions.

Now, how do you get from one state to the next given a certain action? This is defined by the dynamics of the system, which in this case are given by the differential equation:
$$\ddot{\mathbf{x}} = -\frac{G M_E}{|\mathbf{x}|^3}\mathbf{x} + a$$

Here:

$\ddot{\mathbf{x}}$ is the second derivative of the position vector, which represents the acceleration of the spacecraft.
$G$ is the gravitational constant, $M_E$ is the mass of the Earth, and $|\mathbf{x}|$ is the distance from the spacecraft to the center of the Earth.
$\mathbf{x}$ is the position vector of the spacecraft.
$a$ is the acceleration due to the thrust of the spacecraft.
The left-hand side of the equation represents the total acceleration of the spacecraft, and the right-hand side represents the forces acting on the spacecraft (gravity and thrust), divided by the mass of the spacecraft to get acceleration (from F=ma).

This is a second-order differential equation because it involves the second derivative of the position. Most numerical solvers, like the solve_ivp function from SciPy, expect a system of first-order differential equations of the form $\dot{\mathbf{y}} = f(t, \mathbf{y})$. Therefore, we need to convert this second-order equation into a system of first-order equations.

This is done by introducing two new variables: $\mathbf{y}_1 = \mathbf{x}$ is the position and $\mathbf{y}_2 = \dot{\mathbf{x}}$ is the velocity. The acceleration is not a separate state variable; it is what the right-hand side computes. Now, we have two first-order differential equations:

$\dot{\mathbf{y}}_1 = \mathbf{y}_2$: The rate of change of position is the velocity.

$\dot{\mathbf{y}}_2 = -\frac{G M_E}{|\mathbf{y}_1|^3}\mathbf{y}_1 + a$: This is the original differential equation, but written in terms of $\mathbf{y}_1$ instead of $\mathbf{x}$. The minus sign makes gravity point toward the Earth.
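As a rough sketch, the first-order system can be written as a right-hand-side function in the form a solver expects, with position and velocity stacked into one vector `[x, y, vx, vy]` and gravity taken as attractive (pointing toward the origin). The names `MU` and `orbit_rhs` are hypothetical and not taken from `HohmannTransferEnv`. The value `MU = 4.5` is an assumption for $G M_E$ in the notebook's arbitrary units, chosen so that gravity is 0.02 at a distance of 15.

```python
import numpy as np

MU = 4.5  # assumed value of G * M_E in arbitrary units (not from the environment)

def orbit_rhs(t, y, thrust):
    """Time derivative of y = [x, y, vx, vy] under gravity plus thrust."""
    pos, vel = y[:2], y[2:]
    r = np.linalg.norm(pos)              # distance to the Earth's center
    acc = -MU * pos / r**3 + thrust      # gravity pulls toward the origin
    return np.concatenate([vel, acc])    # d(pos)/dt = vel, d(vel)/dt = acc

# Derivative at position (15, 0), velocity (0, 1), with no thrust.
dydt = orbit_rhs(0.0, np.array([15.0, 0.0, 0.0, 1.0]), np.zeros(2))
```

The first two entries of `dydt` are the velocity and the last two are the acceleration, which is all a first-order solver needs.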

Given the current state of the system (the position, velocity, and gravitational force), and a certain action (the thrust), you can solve these differential equations over a short time interval to find the state of the system at the next time step.

To check if the spacecraft has crashed, you calculate the distance from the spacecraft to the center of the Earth (which is at the origin of your coordinate system) and check if it's less than the radius of the Earth (10 units in your case), or if it's more than 100 units away.
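The termination check can be sketched as below, using the radius of 10 units and the outer limit of 100 units from the notes above. The function name `is_terminal` and the constant names are hypothetical. Whether the boundaries themselves count as terminal is an assumption (strict inequalities are used here).

```python
import math

EARTH_RADIUS = 10.0   # radius of the Earth, centered at the origin
MAX_DISTANCE = 100.0  # beyond this distance the episode also ends

def is_terminal(x, y):
    """Return True if the spacecraft has crashed or drifted too far away."""
    r = math.hypot(x, y)  # distance from the origin
    return r < EARTH_RADIUS or r > MAX_DISTANCE

crashed = is_terminal(5.0, 0.0)     # inside the Earth
in_orbit = is_terminal(15.0, 0.0)   # between the two limits
escaped = is_terminal(0.0, 150.0)   # too far from the origin
```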

Here is a step-by-step example:

1. Let's say the spacecraft is initially at position (x, y) = (15, 0) with velocity (vx, vy) = (0, 1) and the gravitational force Fg is 0.02. So your state is [15, 0, 0, 1, 0.02].

2. The action chosen by the agent is to thrust with (ax, ay) = (0.05, 0). This is the acceleration due to thrust.

3. Define the system of first-order ODEs as explained above.

4. Use solve_ivp to solve these ODEs over a 1-second interval. The output of solve_ivp gives you the state of the system at the next time step.

5. Check if the new position of the spacecraft is within the Earth's radius or more than 100 units away. If it is, the episode is done.

6. Repeat steps 2-5 for the next action until the episode is done.
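The worked example above could look roughly like this in code. This is a minimal sketch, not the environment's actual `step` implementation. `MU`, `rhs`, and `step` are hypothetical names, and `MU = 4.5` is an assumed value of $G M_E$ that gives a gravitational acceleration of 0.02 at a distance of 15 (taking the spacecraft mass as 1). The gravitational force is left out of the integrated state here because it can be recomputed from the position.

```python
import numpy as np
from scipy.integrate import solve_ivp

MU = 4.5              # assumed G * M_E in arbitrary units
EARTH_RADIUS = 10.0
MAX_DISTANCE = 100.0

def rhs(t, y, thrust):
    """First-order system for y = [x, y, vx, vy]."""
    pos, vel = y[:2], y[2:]
    acc = -MU * pos / np.linalg.norm(pos) ** 3 + thrust
    return np.concatenate([vel, acc])

def step(state, thrust, dt=1.0):
    """Integrate the dynamics over dt and report whether the episode ends."""
    sol = solve_ivp(rhs, (0.0, dt), state, args=(thrust,), rtol=1e-8, atol=1e-10)
    next_state = sol.y[:, -1]                 # solution at the final time
    r = np.linalg.norm(next_state[:2])
    done = bool(r < EARTH_RADIUS or r > MAX_DISTANCE)
    return next_state, done

state = np.array([15.0, 0.0, 0.0, 1.0])      # position (15, 0), velocity (0, 1)
thrust = np.array([0.05, 0.0])               # thrust acceleration (ax, ay)
next_state, done = step(state, thrust)
print("Next state:", next_state, "Done:", done)
```

Holding the thrust constant over the whole interval is a simplification; a shorter `dt` makes that approximation less coarse.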

In your current code, you have already implemented these steps in the step function. However, the differential equation in your ode function might be slightly incorrect; it should be: