In [1]:
%pip install gymnasium stable-baselines3

Collecting gymnasium
  Downloading gymnasium-1.2.0-py3-none-any.whl.metadata (9.9 kB)
Collecting stable-baselines3
  Downloading stable_baselines3-2.7.0-py3-none-any.whl.metadata (4.8 kB)
Collecting cloudpickle>=1.2.0 (from gymnasium)
  Downloading cloudpickle-3.1.1-py3-none-any.whl.metadata (7.1 kB)
Collecting farama-notifications>=0.0.1 (from gymnasium)
  Downloading Farama_Notifications-0.0.4-py3-none-any.whl.metadata (558 bytes)
Downloading gymnasium-1.2.0-py3-none-any.whl (944 kB)
Downloading stable_baselines3-2.7.0-py3-none-any.whl (187 kB)
Downloading cloudpickle-3.1.1-py3-none-any.whl (20 kB)
Downloading Farama_Notifications-0.0.4-py3-none-any.whl (2.5 kB)
Installing collected packages: farama-notifications, cloudpickle, gymnasium, stable-baselines3

In [2]:
# ============================================
# Section 1: Setup
# ============================================

# Install libraries (only if needed in Kaggle or local Jupyter)
# !pip install gymnasium stable-baselines3 matplotlib

import gymnasium as gym
import numpy as np
import matplotlib.pyplot as plt
from stable_baselines3 import DQN, PPO


# Deep Reinforcement Learning (DRL) Introduction

In this lesson, we will:

1. Learn the basics of Reinforcement Learning (RL) with a simple environment (CartPole).

2. Train a DRL agent using a standard library (`stable-baselines3`).

3. Visualize and interpret the learning process.

4. Discuss how these concepts apply to **medical imaging tasks** (e.g., landmark detection, segmentation).


In [3]:
# ============================================
# Section 2: Create RL Environment
# ============================================

# CartPole is a classic RL problem:
# The agent must balance a pole on a moving cart.

env = gym.make("CartPole-v1", render_mode="rgb_array")


# Check action and observation space

print("Observation space:", env.observation_space)
print("Action space:", env.action_space)



Observation space: Box([-4.8               -inf -0.41887903        -inf], [4.8               inf 0.41887903        inf], (4,), float32)
Action space: Discrete(2)
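
The printed bounds are easier to read once each of the four entries is named. The sketch below assumes the component order documented for Gymnasium's CartPole (cart position, cart velocity, pole angle, pole angular velocity); the names `OBS_FIELDS`, `OBS_HIGH`, and `describe_bounds` are illustrative and not part of any library.

```python
import math

# Assumed component order of a CartPole observation (per the Gymnasium docs).
OBS_FIELDS = ("cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity")

# Upper bounds copied from the printed observation space above.
OBS_HIGH = (4.8, math.inf, 0.41887903, math.inf)

def describe_bounds(fields=OBS_FIELDS, high=OBS_HIGH):
    """Return one human-readable line per observation component."""
    lines = []
    for name, bound in zip(fields, high):
        if math.isinf(bound):
            lines.append(f"{name}: unbounded")
        elif name == "pole_angle":
            # The angle is stored in radians; show degrees as well.
            lines.append(f"{name}: +/-{bound:.3f} rad (+/-{math.degrees(bound):.0f} deg)")
        else:
            lines.append(f"{name}: +/-{bound}")
    return lines

for line in describe_bounds():
    print(line)
```

Note that the observable range is twice the termination limit: an episode ends once the cart leaves ±2.4 or the pole tilts beyond ±12° (about 0.2095 rad), so a running episode never reports values near these bounds.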


### Key RL Concepts

- **State**: The current condition of the environment. In CartPole, this is the cart position, cart velocity, pole angle, and pole angular velocity.

- **Action**: What the agent can do (push the cart left or right).

- **Reward**: Feedback given after each action. In CartPole, the agent receives +1 for every step the pole stays upright.

- **Policy**: The strategy mapping states → actions.
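
To make "policy" concrete, here is a minimal hand-written policy, offered as a sketch rather than anything this notebook trains. It assumes the observation layout `[cart position, cart velocity, pole angle, pole angular velocity]`, that a positive angle means the pole leans right, and CartPole's action encoding (0 = push left, 1 = push right); `lean_policy` is a hypothetical name.

```python
def lean_policy(obs):
    """Map a CartPole state to an action: push the cart toward the side the pole leans."""
    _, _, pole_angle, _ = obs  # only the pole angle is used in this sketch
    return 1 if pole_angle > 0 else 0  # 1 = push right, 0 = push left

# Two hand-made example states (not real simulator output).
leaning_right = (0.0, 0.0, 0.05, 0.0)
leaning_left = (0.0, 0.0, -0.05, 0.0)

print(lean_policy(leaning_right))  # 1
print(lean_policy(leaning_left))   # 0
```

A learned policy plays the same role: `model.predict(obs)` takes a state and returns an action, but the mapping is fitted from reward rather than written by hand.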


In [4]:
# ============================================
# Section 3: Train a DRL Agent (DQN)
# ============================================

# Train a Deep Q-Network (DQN) agent

model = DQN("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=5000)



Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 20       |
|    ep_rew_mean      | 20       |
|    exploration_rate | 0.848    |
| time/               |          |
|    episodes         | 4        |
|    fps              | 7441     |
|    time_elapsed     | 0        |
|    total_timesteps  | 80       |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 18.4     |
|    ep_rew_mean      | 18.4     |
|    exploration_rate | 0.721    |
| time/               |          |
|    episodes         | 8        |
|    fps              | 2167     |
|    time_elapsed     | 0        |
|    total_timesteps  | 147      |
| train/              |          |
|    learning_rate    | 0.0001   |
|    loss             | 0.635    |
|    n_updates        | 11       |
----------------------------------

<stable_baselines3.dqn.dqn.DQN at 0x7ce915aa36e0>
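
The `exploration_rate` column in the log is the epsilon of DQN's epsilon-greedy exploration, that is, the probability of taking a random action instead of the greedy one. The sketch below reproduces the logged values with a linear decay, assuming `stable-baselines3`'s default DQN settings (epsilon goes from 1.0 to 0.05 over the first 10% of `total_timesteps`); the function name `linear_epsilon` is illustrative and not the library's API.

```python
def linear_epsilon(step, total_timesteps=5000, start=1.0, end=0.05, fraction=0.1):
    """Linearly decay epsilon from `start` to `end` over the first `fraction` of training."""
    progress = step / total_timesteps
    if progress >= fraction:
        return end  # decay finished; epsilon stays at its final value
    return start + progress * (end - start) / fraction

# Timesteps 80 and 147 are the two points reported in the training log.
for step in (80, 147, 500, 5000):
    print(step, round(linear_epsilon(step), 3))
```

For steps 80 and 147 this gives 0.848 and 0.721, matching the two log blocks above. Under these assumed defaults, a 5,000-timestep run reaches its final epsilon at step 500.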

In [7]:
# ============================================
# Section 4: Evaluate Agent
# ============================================

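episodes = 5
for ep in range(episodes):
    obs, _ = env.reset()
    total_reward = 0
    done = False

    while not done:
        action, _ = model.predict(obs, deterministic=True)
        # Gymnasium's step() returns separate `terminated` and `truncated` flags
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        # End the episode on failure (terminated) or at the time limit (truncated)
        done = terminated or truncated

    print(f"Episode {ep+1}: Total reward = {total_reward}")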


Episode 1: Total reward = 9.0
Episode 2: Total reward = 10.0
Episode 3: Total reward = 10.0
Episode 4: Total reward = 9.0
Episode 5: Total reward = 10.0
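
A short summary makes these five scores easier to interpret. The snippet below uses only the rewards printed above and compares them with the first `ep_rew_mean` in the training log (20) and with CartPole-v1's 500-step episode cap; the variable names are illustrative.

```python
import statistics

# Episode returns copied from the evaluation output above.
eval_rewards = [9.0, 10.0, 10.0, 9.0, 10.0]

EARLY_TRAINING_MEAN = 20.0  # first `ep_rew_mean` in the training log
MAX_RETURN = 500            # CartPole-v1 truncates episodes at 500 steps

mean_reward = statistics.mean(eval_rewards)
std_reward = statistics.pstdev(eval_rewards)  # population standard deviation

print(f"mean = {mean_reward:.1f}, std = {std_reward:.2f}")
print(f"fraction of the maximum return: {mean_reward / MAX_RETURN:.1%}")
print("better than early training:", mean_reward > EARLY_TRAINING_MEAN)
```

The mean of 9.6 sits below the early-training average, when actions were still mostly random, which suggests the greedy policy has not yet learned to balance the pole.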


### What we learned here

- An RL agent improves its **decision-making** by trial and error, although the 5,000 timesteps used here are too few for DQN to balance the pole reliably (evaluation returns of 9 to 10 out of a possible 500).

- The same loop maps onto medical imaging tasks, where:
  - Lesion detection can be framed as **sequential refinement** of a candidate region.
  - Segmentation can be seen as **step-by-step boundary tracing**.
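
As a purely illustrative sketch of "sequential refinement", landmark detection can be cast as an agent that moves a candidate point one pixel at a time and is rewarded when it gets closer to the target. Everything here is hypothetical: the coordinates, the `step` and `greedy_policy` helpers, and the distance-based reward are assumptions chosen for illustration, and the greedy policy peeks at the true landmark, which a trained agent would have to infer from the image.

```python
# Four moves on a 2D image grid: (row delta, column delta).
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

def distance(a, b):
    """Manhattan distance between two pixel coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def step(position, action, landmark):
    """Apply one move; the reward is the reduction in distance to the landmark."""
    dr, dc = ACTIONS[action]
    new_position = (position[0] + dr, position[1] + dc)
    reward = distance(position, landmark) - distance(new_position, landmark)
    done = new_position == landmark
    return new_position, reward, done

def greedy_policy(position, landmark):
    """Stand-in for a learned policy: pick the move with the highest reward."""
    return max(ACTIONS, key=lambda a: step(position, a, landmark)[1])

position, landmark = (0, 0), (3, 2)  # made-up start point and target
total_reward, done = 0, False
while not done:
    action = greedy_policy(position, landmark)
    position, reward, done = step(position, action, landmark)
    total_reward += reward

print(position, total_reward)
```

The structure mirrors the CartPole loop: a state (the current position), an action (a move), and a reward that scores each refinement.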

➡ Next: Let’s connect this to a medical dataset example (e.g., Kaggle Chest X-ray).
