UCAS course [Reinforcement Learning] group assignment, as well as the materials for the COG competition 2022.
Discrete DQN
CogEnvDecoder == 0.1.29
discrete_v2.py: modified the angle between the car's heading and the target; the reward function combines Euclidean distance, collision count, and collision time; during training, the environment is reset every time a target is found (a rough sketch of such a reward is given below).
test_env.py: directly evaluates the best model obtained from discrete_v2.py above.
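As a rough, non-authoritative sketch of how such a heading-angle term and reward could be put together (the function names, arguments, and weights below are assumptions for illustration, not the code in discrete_v2.py):

```python
import numpy as np

def heading_error(car_yaw, car_pos, goal_pos):
    """Angle between the car's heading and the direction to the goal, wrapped to [-pi, pi]."""
    dx, dy = goal_pos[0] - car_pos[0], goal_pos[1] - car_pos[1]
    angle_to_goal = np.arctan2(dy, dx)
    return (angle_to_goal - car_yaw + np.pi) % (2 * np.pi) - np.pi

def discrete_reward(car_pos, goal_pos, collision_count, collision_time,
                    w_dist=-1.0, w_count=-10.0, w_time=-1.0):
    """Hypothetical reward combining Euclidean distance to the goal,
    collision count, and accumulated collision time (weights are guesses)."""
    euclidean_dist = np.linalg.norm(np.asarray(car_pos) - np.asarray(goal_pos))
    return w_dist * euclidean_dist + w_count * collision_count + w_time * collision_time
```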
continue.py: three continuous actions dx, dy, dtheta; reward function: goal_reward / dis_reward / collision_reward; training scheme: reset whenever an obstacle is hit; the model is trained once per episode (see the reward sketch below).
Known issue: the agent sometimes gets stuck in one place.
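A minimal sketch of how goal_reward, dis_reward, and collision_reward might be combined; the weights and the function signature are assumptions, and continue.py may compose the terms differently:

```python
import numpy as np

def continuous_reward(car_pos, goal_pos, goal_reached, collided,
                      goal_bonus=100.0, dist_scale=-1.0, collision_penalty=-50.0):
    """Hypothetical composition of goal_reward, dis_reward, and collision_reward."""
    goal_reward = goal_bonus if goal_reached else 0.0
    dis_reward = dist_scale * np.linalg.norm(np.asarray(car_pos) - np.asarray(goal_pos))
    collision_reward = collision_penalty if collided else 0.0
    return goal_reward + dis_reward + collision_reward
```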
continue_v3.py: adds a stuck flag and a temporary goal; once the agent is flagged as stuck, it spends 30 steps heading toward the temporary goal; the model is trained at every step (an illustrative implementation follows).
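The stuck flag and temporary goal could be implemented along these lines; the thresholds STUCK_DIST_EPS and STUCK_PATIENCE are illustrative guesses, while the 30-step temporary-goal budget comes from the description above:

```python
import numpy as np

STUCK_DIST_EPS = 0.05   # assumed: minimum displacement per step to count as "moving"
STUCK_PATIENCE = 20     # assumed: steps without movement before raising the stuck flag
TEMP_GOAL_STEPS = 30    # from the description: steps spent heading to the temporary goal

class StuckMonitor:
    """Illustrative stuck-flag logic: if the car barely moves for a while,
    sample a temporary goal and pursue it for a fixed number of steps."""
    def __init__(self):
        self.last_pos = None
        self.still_steps = 0
        self.temp_goal = None
        self.temp_steps_left = 0

    def update(self, car_pos, sample_temp_goal):
        car_pos = np.asarray(car_pos, dtype=float)
        if self.last_pos is not None and np.linalg.norm(car_pos - self.last_pos) < STUCK_DIST_EPS:
            self.still_steps += 1
        else:
            self.still_steps = 0
        self.last_pos = car_pos

        if self.temp_steps_left > 0:            # currently chasing the temporary goal
            self.temp_steps_left -= 1
            return self.temp_goal
        if self.still_steps >= STUCK_PATIENCE:  # stuck flag raised: switch to a temporary goal
            self.temp_goal = sample_temp_goal(car_pos)
            self.temp_steps_left = TEMP_GOAL_STEPS
            self.still_steps = 0
            return self.temp_goal
        return None                             # not stuck: keep the original goal
```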
test_env_continue.py: tests the v2 environment, which must be downloaded in advance from the tags on the official website.
continue_v4.py: updates the stuck-flag logic.
Toolbox.py provides an updated way to compute distance: obstacles are modeled with a Gaussian mixture model, and the actual distance is computed via a line integral (a sketch of this idea follows).
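One plausible way to realize this idea, assuming isotropic Gaussian components per obstacle and a straight-line path for the integral; the component covariance and the penalty weight are guesses, not values taken from Toolbox.py:

```python
import numpy as np

def obstacle_density(point, centers, cov=0.04):
    """Assumed isotropic Gaussian mixture: each obstacle contributes one component."""
    diffs = np.asarray(centers, dtype=float) - np.asarray(point, dtype=float)
    return np.sum(np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * cov)))

def integrated_distance(start, goal, centers, n_samples=100, penalty=5.0):
    """Approximate the 'actual' distance as a line integral of (1 + penalty * density)
    along the straight segment from start to goal (Riemann-sum approximation)."""
    start, goal = np.asarray(start, dtype=float), np.asarray(goal, dtype=float)
    ts = np.linspace(0.0, 1.0, n_samples)
    points = start[None, :] + ts[:, None] * (goal - start)[None, :]
    densities = np.array([obstacle_density(p, centers) for p in points])
    seg_len = np.linalg.norm(goal - start) / (n_samples - 1)
    return np.sum((1.0 + penalty * densities) * seg_len)
```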
We accidentally found that rotation is not actually an absolutely bad state. By setting the reward function to "100 * base_reward + 10 * rotate_loss + 1000 * collision_loss" and training the agent for 100000 steps, we obtained an agent that succeeds in finding the target about 4-5 times on average over 10 repeated evaluations. The rotating agent probably performs well because it gets stuck less often than agents that do not rotate; to some extent, the rotation may even prevent the agent from getting stuck. We still found that performance degraded with more training steps. Moreover, adjusting the reward function gave us some insight into its influence on performance: with a higher weight on rotate_loss, the agent stops rotating at an earlier training step.
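The quoted weighting can be written as a simple helper; the weights are the ones reported above, while the per-step terms base_reward, rotate_loss, and collision_loss are placeholders for whatever the environment wrapper actually computes:

```python
def weighted_reward(base_reward, rotate_loss, collision_loss,
                    w_base=100.0, w_rotate=10.0, w_collision=1000.0):
    """Reported configuration: 100 * base_reward + 10 * rotate_loss + 1000 * collision_loss.
    Raising w_rotate reportedly makes the agent stop rotating at an earlier training step."""
    return w_base * base_reward + w_rotate * rotate_loss + w_collision * collision_loss
```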