 ### UAV SIMULATION FOR TARGET RECOGNITION
- AUTHOR: BASSEY RIMAN
- ID: Q2034066
- EMAIL: Q2034066@LIVE.TEES.AC.UK

## INTRODUCTION
This project simulates an Unmanned Aerial Vehicle (UAV) that learns to locate a target using Reinforcement Learning (RL). The UAV is the RL agent: it uses tabular Q-learning with an epsilon-greedy policy to choose its movements, is rewarded for moving and for reaching the target, and is penalised for colliding with obstacles. The environment is a 2D Pygame scene in which the agent, five fixed obstacles, and the target are drawn over a looping aerial video. The simulation is a simplified stand-in for real-world surveillance tasks, intended to illustrate how RL could be applied to UAV navigation and target recognition. The motivating application is military intelligence gathering and reconnaissance, where autonomous target recognition could strengthen surveillance capabilities.

### External materials incorporated into this project include:

- Military Drone Image: 
A high-quality image of a military drone is incorporated to represent the agent. This addition enhances the realism of the simulation, providing a detailed portrayal of the UAV used for target recognition. Source: https://www.pngall.com/military-drone-png/download/50108

- Simulated Aerial Shot Environment (Sky):
A simulated aerial shot environment is utilized to replicate real-world conditions. This includes a sky environment that contributes authenticity to the UAV simulation. Source: https://www.pexels.com/video/thick-fogs-covering-the-mountain-valley-4763084/

- Spaceship Image:
An image of a spaceship is incorporated into the project to represent the target, adding diversity to the simulation scenarios and enhancing the overall dynamic nature of the UAV's encounters. Source: https://www.pngkey.com/detail/u2e6q8t4t4r5u2a9_star-wars-rpg-nave-star-wars-star-trek/

## <div style="background-color:#3498db; padding:40px; border-radius: 20px;">👨‍💻Application Requirements</div>

- Let's gear up for the target recognition adventure! First, I'm making sure I have the latest version of Pygame, which drives the simulation display. The '!pip install --upgrade pygame' command takes care of this.
- Next, I'm bringing in Imageio for video and image processing. So that it can read video files, I also install 'imageio[ffmpeg]', which adds FFmpeg support.
- Matplotlib is installed for plotting the training results.
- The install commands in the cell below are commented out; uncomment them the first time the notebook is run. With the packages in place, my coding environment is ready. Time to dive into the code and start building intelligence into my agent!

In [1]:
#!pip install --upgrade pygame
#!pip install imageio
#!pip install imageio[ffmpeg]
#!pip install matplotlib
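
Before running the rest of the notebook, it can help to confirm that the required packages are importable. The sketch below is a minimal check that uses only the standard library; the helper name `missing_packages` is hypothetical and not part of the project, and `numpy` is listed because the notebook imports it even though it has no install command above.

```python
import importlib.util

# Packages the notebook imports (assumed list, based on the cells in this notebook).
REQUIRED = ["pygame", "imageio", "matplotlib", "numpy"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Install before running:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

If anything is reported as missing, uncomment the matching install line and run the cell again.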

## <div style="background-color:#3498db; padding:40px; border-radius: 20px;">👨‍💻Importing Necessary Libraries</div>

Here, I'm bringing in the essential libraries for my target recognition project. Pygame sets the visual stage,
random adds an element of unpredictability, math and numpy handle numerical operations, and imageio is crucial
for working with video files. Matplotlib creates the visualizations that help us understand and analyze the agent's performance during training.

In [2]:
import pygame
import random
import math
import time
import numpy as np
import imageio
import matplotlib.pyplot as plt

pygame 2.5.2 (SDL 2.28.3, Python 3.11.5)
Hello from the pygame community. https://www.pygame.org/contribute.html


## <div style="background-color:#3498db; padding:40px; border-radius: 10px;">👨‍💻Reinforcement Learning Parameters</div>

In this section, I'm laying the foundation for my target recognition adventure. I define the sizes of the state
and action spaces (four each) and initialize the learning process with a 4 x 4 Q-table filled with zeros: one row per state and one column per action.

In [3]:
STATE_SPACE_SIZE = 4
ACTION_SPACE_SIZE = 4
Q_TABLE = np.zeros((STATE_SPACE_SIZE, ACTION_SPACE_SIZE))
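
To make the role of the table concrete, here is a minimal sketch of how a 4 x 4 Q-table can be indexed by a single integer state. It assumes, purely for illustration, that the four states are the four screen quadrants; the helper `quadrant_state` and the 800 x 600 screen size are hypothetical and are not how the notebook's `get_state` works. Plain lists are used so the snippet runs without NumPy; an array from `np.zeros((4, 4))` is indexed the same way.

```python
STATE_SPACE_SIZE = 4   # four screen quadrants (assumption for this sketch)
ACTION_SPACE_SIZE = 4  # one column per action

# One row per state, one column per action, all starting at zero.
q_table = [[0.0] * ACTION_SPACE_SIZE for _ in range(STATE_SPACE_SIZE)]


def quadrant_state(x, y, width, height):
    """Map a pixel position to a quadrant index:
    0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right."""
    column = 1 if x >= width / 2 else 0
    row = 1 if y >= height / 2 else 0
    return row * 2 + column


state = quadrant_state(700, 100, width=800, height=600)  # top-right quadrant
q_table[state][3] = 1.5                                   # store a value for action 3
best_action = max(range(ACTION_SPACE_SIZE), key=lambda a: q_table[state][a])
print(state, best_action)  # prints "1 3"
```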

## <div style="background-color:#3498db; padding:40px; border-radius: 10px;">👨‍💻Reward Constants</div>

These constants define the rules governing rewards and penalties in my target recognition environment.
I've assigned rewards for successfully identifying the target, for each completed movement, and for the distance to the target,
along with penalties for collisions and for taking too much time in the recognition process.

In [4]:
REWARD_TARGET_IDENTIFIED = 200
REWARD_MOVEMENT = 50
PENALTY_COLLISION = -50
PENALTY_TIME = -20
REWARD_DISTANCE_TO_TARGET = 100
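
As an illustration of how these constants could combine into a single per-step reward, here is a hedged sketch. The function `step_reward` and its arguments are hypothetical; in particular, the way `PENALTY_TIME` and `REWARD_DISTANCE_TO_TARGET` are applied here (a flat per-step charge and a bonus for getting closer) is an assumption for the example, not the notebook's logic.

```python
REWARD_TARGET_IDENTIFIED = 200
REWARD_MOVEMENT = 50
PENALTY_COLLISION = -50
PENALTY_TIME = -20
REWARD_DISTANCE_TO_TARGET = 100


def step_reward(moved, collided, reached_target, old_distance, new_distance):
    """Combine the reward constants for one step (illustrative only)."""
    reward = PENALTY_TIME                      # assumed: every step costs time
    if moved:
        reward += REWARD_MOVEMENT              # successful move
    if collided:
        reward += PENALTY_COLLISION            # hit an obstacle
    if new_distance < old_distance:
        reward += REWARD_DISTANCE_TO_TARGET    # assumed: bonus for closing in
    if reached_target:
        reward += REWARD_TARGET_IDENTIFIED     # target found
    return reward


print(step_reward(True, False, False, old_distance=120.0, new_distance=115.0))  # prints 130
print(step_reward(False, True, False, old_distance=120.0, new_distance=120.0))  # prints -70
```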

## <div style="background-color:#3498db; padding:40px; border-radius: 10px;">👨‍💻Direction Mapping Dictionary</div>

This nifty dictionary serves as my guide, translating human-readable directions ('UP', 'DOWN', 'LEFT', 'RIGHT')
into numerical values. It acts like a compass directing the algorithm's focus during the target recognition process.

In [5]:
DIRECTION_MAPPING = {'UP': 0, 'DOWN': 1, 'LEFT': 2, 'RIGHT': 3}
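
The training loop later turns an action index into a movement with `angle = action * 90`. The sketch below tabulates the displacement that formula gives for each index so it can be compared with the names in `DIRECTION_MAPPING`. The helper `displacement` is hypothetical, and the rounding is added only to hide floating-point noise. Note that Pygame's y axis points down the screen.

```python
import math

DIRECTION_MAPPING = {'UP': 0, 'DOWN': 1, 'LEFT': 2, 'RIGHT': 3}


def displacement(action, speed=5):
    """Return the (dx, dy) step for an action index using angle = action * 90."""
    angle = math.radians(action * 90)
    return (round(speed * math.cos(angle)), round(speed * math.sin(angle)))


for name, index in DIRECTION_MAPPING.items():
    print(name, index, displacement(index))
# UP 0 (5, 0)
# DOWN 1 (0, 5)
# LEFT 2 (-5, 0)
# RIGHT 3 (0, -5)
```

Under that formula only index 1 ('DOWN') and index 2 ('LEFT') move in the direction their name suggests; index 0 moves right and index 3 moves up, so either the dictionary or the angle formula would need adjusting if the names are meant to describe the motion.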

## <div style="background-color:#3498db; padding:40px; border-radius: 10px;">👨‍💻State-related Functions</div>

Now, I'm defining functions related to the current state of my recognition agent: calculating the state from the agent
and target positions, choosing actions, updating my Q-table during the learning process, and checking for collisions with obstacles.

In [6]:
# Initializing exploration rate
epsilon = 0.1

def get_state():
    """Get the current state of the agent."""
# ---------------------------------------------------------------------------------------------------
# - Calculating the distance to the target and combining it with the agent's position
#   (each value reduced modulo STATE_SPACE_SIZE) to form the current state.
# ---------------------------------------------------------------------------------------------------
    distance_to_target = math.sqrt((R_rect.x - target_rect.x) ** 2 + (R_rect.y - target_rect.y) ** 2)
    state_info = np.array([R_rect.x % STATE_SPACE_SIZE,
                           R_rect.y % STATE_SPACE_SIZE,
                           distance_to_target % STATE_SPACE_SIZE])
    
# ---------------------------------------------------------------------------------------------------
# - Truncating each component to an integer in [0, STATE_SPACE_SIZE) so that it can index a row of the Q-table.
# ---------------------------------------------------------------------------------------------------
    state_info = state_info.astype(int)
    
# ---------------------------------------------------------------------------------------------------    
# - Printing the current state for a quick check.
# ---------------------------------------------------------------------------------------------------
    print("State:", state_info)

# ---------------------------------------------------------------------------------------------------
# - Returns the formatted state information.
# ---------------------------------------------------------------------------------------------------
    return state_info
#
#
# ---------------------------------------------------------------------------------------------------
# - Choosing an action based on the current state, balancing exploration and exploitation with an epsilon-greedy strategy.
# ---------------------------------------------------------------------------------------------------
def select_action(state):
    """Select an action based on the current state."""
    state_int = state.astype(int)
    
# ---------------------------------------------------------------------------------------------------
# - Exploring at random with probability epsilon (the module-level value, so that the decay applied
#   in the training loop takes effect); otherwise exploiting the action with the highest Q-value.
# ---------------------------------------------------------------------------------------------------
    if np.random.rand() < epsilon:
        return np.random.choice(ACTION_SPACE_SIZE)
    else:
        
# ---------------------------------------------------------------------------------------------------
# - The state holds three row indices, so np.argmax searches the flattened (3, ACTION_SPACE_SIZE) block;
#   the modulo maps the flat index back to an action column.
# ---------------------------------------------------------------------------------------------------
        return int(np.argmax(Q_TABLE[state_int, :]) % ACTION_SPACE_SIZE)
#
#
def update_q_table(state, action, reward, next_state):
    """Update the Q-table based on the observed transition."""
# ---------------------------------------------------------------------------------------------------
# - Updating the Q-table based on the observed transition, applying the Q-learning algorithm.
# ---------------------------------------------------------------------------------------------------
    action = action % ACTION_SPACE_SIZE
    alpha = 0.9
    gamma = 1
    max_q_next = np.max(Q_TABLE[next_state, :])
    
# ---------------------------------------------------------------------------------------------------   
# - Balancing the old knowledge with the new information using learning rate (alpha) and discount factor (gamma).
# ---------------------------------------------------------------------------------------------------
    Q_TABLE[state, action] = (1 - alpha) * Q_TABLE[state, action] + alpha * (reward + gamma * max_q_next)
    
# ---------------------------------------------------------------------------------------------------  
# - Printing the current state for reference.
# ---------------------------------------------------------------------------------------------------
    print("State:", state)


def obstacle_collision(x, y):
    """Check if there is a collision with obstacles."""
# ---------------------------------------------------------------------------------------------------
# - Checking if there is a collision with obstacles by examining the proximity of the agent to predefined obstacle positions.
# ---------------------------------------------------------------------------------------------------
    for position in CIRCLE_POSITIONS:
        circle_rect = pygame.Rect(position[0] - 15, position[1] - 15, 30, 30)
        if circle_rect.collidepoint(x, y):
            
# ---------------------------------------------------------------------------------------------------
# - Returning True if a collision is detected, indicating the need for a strategic move.
# ---------------------------------------------------------------------------------------------------
            return True # Collision detected
    return False
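
The functions above index the Q-table with a three-element state array. For comparison, here is a self-contained sketch of epsilon-greedy selection and the Q-learning update on a single integer state, using the same update rule and the same `alpha` and `gamma` values as `update_q_table`. The names `N_STATES`, `N_ACTIONS`, and the `rng` parameter are illustrative, and plain lists stand in for the NumPy table so the snippet has no dependencies.

```python
import random

N_STATES, N_ACTIONS = 4, 4
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def select_action(state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    row = q_table[state]
    return row.index(max(row))  # first action with the highest Q-value


def update_q_table(state, action, reward, next_state, alpha=0.9, gamma=1.0):
    """Blend the old estimate with the new target (Q-learning update)."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] = (1 - alpha) * q_table[state][action] + alpha * target


# One hand-made transition: in state 0, action 2 earned 50 and led to state 1.
update_q_table(state=0, action=2, reward=50, next_state=1)
print(q_table[0])                     # prints [0.0, 0.0, 45.0, 0.0]
print(select_action(0, epsilon=0.0))  # prints 2 (pure exploitation)
```

Passing `epsilon` as an argument keeps the selection function free of hidden state, so whatever decay schedule the caller applies is the one actually used.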

## <div style="background-color:#3498db; padding:40px; border-radius: 10px;">👨‍💻Pygame Setup</div>

Here, I'm setting up the stage for my target recognition project. Pygame is initialized, and images for my recognition
agent and target are loaded. I'm also defining parameters for obstacles, rewards, and penalties in the recognition process.

In [7]:
pygame.init()
# ---------------------------------------------------------------------------------------------------
# I'm going for a fullscreen display mode, and I've aptly named the adventure 'Target Tracker.'
# ---------------------------------------------------------------------------------------------------
screen = pygame.display.set_mode((0, 0), pygame.FULLSCREEN)
pygame.display.set_caption('Target Tracker')
#
#
# ---------------------------------------------------------------------------------------------------
# The clock is ticking, and I've set the running flag to True to keep the project alive.
# ---------------------------------------------------------------------------------------------------
clock = pygame.time.Clock()
running = True
#
#
#  AGENT AND TARGET SETUP
# ---------------------------------------------------------------------------------------------------
# The agent 'R' and the elusive target are entering the scene! I'm loading their images and adjusting their sizes
# ---------------------------------------------------------------------------------------------------
R_image = pygame.image.load("R.png")
original_R_rect = R_image.get_rect()
R_image = pygame.transform.scale(R_image, (original_R_rect.width // 8, original_R_rect.height // 8))
R_rect = R_image.get_rect()
#
#
# ---------------------------------------------------------------------------------------------------
# The target image is scaled to the agent's width and placed in the top-right corner of the screen.
# ---------------------------------------------------------------------------------------------------
target_image = pygame.image.load("target.png")
target_original_rect = target_image.get_rect()
target_width = R_rect.width
target_height = target_original_rect.height * target_width // target_original_rect.width
target_image = pygame.transform.scale(target_image, (target_width, target_height))
target_rect = target_image.get_rect()
target_rect.topleft = (screen.get_width() - target_rect.width, 0)

R_rect.topleft = (random.randint(0, screen.get_width() - R_rect.width), random.randint(0, screen.get_height() - R_rect.height))
#
#
# MAP AND VIDEO SETUP
# ---------------------------------------------------------------------------------------------------
# A map is ready to capture the agent's journey.
# ---------------------------------------------------------------------------------------------------
map_width, map_height = 200, 200
map_surface = pygame.Surface((map_width, map_height))
map_rect = map_surface.get_rect(topleft=(10, 40))

# ---------------------------------------------------------------------------------------------------
# The video file for background processing is loaded.
# ---------------------------------------------------------------------------------------------------
video_path = "vid.mp4"
video = imageio.get_reader(video_path)

# ---------------------------------------------------------------------------------------------------
# I've even given the agent an initial move direction to spice things up!
# ---------------------------------------------------------------------------------------------------
move_direction = random.choice(['UP', 'DOWN', 'LEFT', 'RIGHT'])
#
#
# ADVENTURE VARIABLES
# ---------------------------------------------------------------------------------------------------
# The adventure flags are set! Is the target identified? Is the project paused? How many iterations so far?
# These variables add dynamic elements to the target recognition mission.
# ---------------------------------------------------------------------------------------------------
target_identified = False
paused = False
total_iteration_count = 0

# ---------------------------------------------------------------------------------------------------
# Let the Target Recognition Begin!
# With the stage set and characters in place, it's time to embark on the target recognition journey.
# The game loop will keep running until the mission is accomplished or the agent decides to call it quits.
# ---------------------------------------------------------------------------------------------------
#
# ---------------------------------------------------------------------------------------------------
# Defining circle positions
# ---------------------------------------------------------------------------------------------------
CIRCLE_POSITIONS = [(150, 350), (400, 500), (650, 300), (900, 600), (1150, 400)]

reward = 0
max_steps_per_episode = 5  # Despite its name, the main loop uses this as the number of episodes to run
episode_count = 0
total_time_taken = 0.0

# ---------------------------------------------------------------------------------------------------
# Initializing reward and penalty bars
# ---------------------------------------------------------------------------------------------------
REWARD_BAR_MAX_LENGTH = 200  # Adjusting the maximum length of the reward bar
PENALTY_BAR_MAX_LENGTH = 200  # Adjusting the maximum length of the penalty bar
REWARD_BAR_LENGTH = REWARD_BAR_MAX_LENGTH // 10  # Starting at one tenth of the bar
PENALTY_BAR_LENGTH = PENALTY_BAR_MAX_LENGTH // 10  # Starting at one tenth of the bar

# ---------------------------------------------------------------------------------------------------
# Initializing cumulative rewards and penalties
# ---------------------------------------------------------------------------------------------------
cumulative_reward = 0
cumulative_penalty = 0

# ---------------------------------------------------------------------------------------------------
# Initializing the rate at which the bars increase
# ---------------------------------------------------------------------------------------------------
REWARD_INCREASE_RATE = 1
PENALTY_INCREASE_RATE = 0.1

# ---------------------------------------------------------------------------------------------------
# Initializing lists to store data for plotting
# ---------------------------------------------------------------------------------------------------
episode_rewards = []
cumulative_rewards = []
cumulative_penalties = []
exploration_rate_over_time = []
time_taken_per_episode = []
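
The `exploration_rate_over_time` list records epsilon as it decays. The training loop multiplies epsilon by 0.995 on each pass, so after n updates it equals 0.1 × 0.995^n. The sketch below, built around a hypothetical helper `steps_until`, estimates how many updates it takes for epsilon to fall below a given threshold.

```python
def steps_until(threshold, epsilon=0.1, decay=0.995):
    """Count multiplicative decay steps until epsilon drops below threshold."""
    steps = 0
    while epsilon >= threshold:
        epsilon *= decay
        steps += 1
    return steps


print(steps_until(0.05))  # prints 139: epsilon halves after 139 updates
```

This gives a feel for how quickly the agent stops exploring: with a starting value of 0.1, exploration is already rare, and it halves roughly every 139 updates.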

## <div style="background-color:#3498db; padding:40px; border-radius: 10px;">👨‍💻Main Training Loop</div>

Welcome to the main event! In this loop, my target recognition unfolds. I handle events, initialize episodes,
process video frames, guide the recognition agent's movements, update the Q-table, and display the recognition state.
It's the core of my training process for target recognition.
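
The movement part of the loop can be tried without opening a Pygame window. The following sketch mirrors the clamp-then-check logic with plain tuples and also shows the mini-map scaling. `hits_obstacle`, `try_move`, and `to_minimap` are hypothetical helpers, the 1280 x 720 screen and 40 x 40 agent size are assumed values, and the obstacle test uses the same 30 x 30 bounding box around each circle centre as `obstacle_collision`.

```python
CIRCLE_POSITIONS = [(150, 350), (400, 500), (650, 300), (900, 600), (1150, 400)]
SCREEN_W, SCREEN_H = 1280, 720   # assumed screen size
AGENT_W, AGENT_H = 40, 40        # assumed agent size


def hits_obstacle(x, y):
    """True if point (x, y) lies in the 30 x 30 box around any circle centre."""
    return any(cx - 15 <= x < cx + 15 and cy - 15 <= y < cy + 15
               for cx, cy in CIRCLE_POSITIONS)


def try_move(x, y, dx, dy):
    """Clamp the proposed position to the screen; stay put if it hits an obstacle."""
    new_x = max(0, min(x + dx, SCREEN_W - AGENT_W))
    new_y = max(0, min(y + dy, SCREEN_H - AGENT_H))
    if hits_obstacle(new_x, new_y):
        return (x, y), False
    return (new_x, new_y), True


def to_minimap(x, y, map_w=200, map_h=200):
    """Scale a screen position to mini-map pixels."""
    return (x * map_w // SCREEN_W, y * map_h // SCREEN_H)


print(try_move(1238, 100, 5, 0))  # ((1240, 100), True): clamped at the right edge
print(try_move(130, 350, 5, 0))   # ((130, 350), False): blocked by obstacle 1
print(to_minimap(640, 360))       # (100, 100)
```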

In [1]:
while running:
# ---------------------------------------------------------------------------------------------------
# Event handling and initialization
# ---------------------------------------------------------------------------------------------------
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.KEYDOWN:
            if event.key == pygame.K_ESCAPE:
                running = False  # Pressing ESC key will exit the game loop and close the window

    if episode_count < max_steps_per_episode:
        
# ---------------------------------------------------------------------------------------------------
# Episode initialization
# ---------------------------------------------------------------------------------------------------
        episode_count += 1
        target_identified = False
        paused = False
        reward = 0
        prev_state = None
        prev_action = None
        R_rect.topleft = (
            random.randint(0, screen.get_width() - R_rect.width),
            random.randint(0, screen.get_height() - R_rect.height),)
        
        total_iteration_count = 0
        start_time = pygame.time.get_ticks()
        
        
        while not target_identified:
# ---------------------------------------------------------------------------------------------------
# Event handling
# ---------------------------------------------------------------------------------------------------
            for event in pygame.event.get():
                if event.type == pygame.QUIT:
                    running = False
                    target_identified = True  
                elif event.type == pygame.KEYDOWN:
                    if event.key == pygame.K_ESCAPE:
                        running = False  # Pressing ESC key will exit the loop and close the window
                        target_identified = True  

            if not paused:


# VIDEO FRAME PROCESSING AND AGENT MOVEMENT
# -------------------------------------------------------------------------------------------------------------
# Time to process the next frame of the video and bring it to life on the screen. 
# I'm handling potential exceptions like StopIteration or IndexError gracefully
# to make sure the video playback is smooth and uninterrupted.
# -------------------------------------------------------------------------------------------------------------
                try:
                    frame = video.get_next_data()
                except (StopIteration, IndexError):
# -------------------------------------------------------------------------------------------------------------
# If the video is at its end, let's close it, reopen it, and read the first frame for a fresh start.
# -------------------------------------------------------------------------------------------------------------
                    video.close()
                    video = imageio.get_reader(video_path)
                    frame = video.get_next_data()
        
# -------------------------------------------------------------------------------------------------------------
# Converting the video frame to a Pygame-friendly format and scaling it to fit the screen.
# -------------------------------------------------------------------------------------------------------------
                pygame_frame = pygame.image.fromstring(frame.tobytes(), frame.shape[1::-1], "RGB")
                pygame_frame = pygame.transform.scale(pygame_frame, (screen.get_width(), screen.get_height()))
        
# -------------------------------------------------------------------------------------------------------------       
# Placing the frame on the screen, creating the illusion of continuous motion.
# -------------------------------------------------------------------------------------------------------------
                screen.blit(pygame_frame, (0, 0))
    
# -------------------------------------------------------------------------------------------------------------   
# Time for the target recognition agent to shine! Let's determine the state, select an action, and move accordingly.
# -------------------------------------------------------------------------------------------------------------
                if not target_identified:
                    state = get_state()
                    action = select_action(state)
                
# -------------------------------------------------------------------------------------------------------------               
# Calculating the new position based on the selected action.
# ------------------------------------------------------------------------------------------------------------- 
                    speed = 5
                    angle = action * 90

                    new_x = R_rect.x + speed * math.cos(math.radians(angle))
                    new_y = R_rect.y + speed * math.sin(math.radians(angle))
                
# -------------------------------------------------------------------------------------------------------------                   
# Ensuring the agent stays within the screen boundaries and avoiding collisions with obstacles.
# -------------------------------------------------------------------------------------------------------------    
                    
                    new_x = max(0, min(new_x, screen.get_width() - R_rect.width))
                    new_y = max(0, min(new_y, screen.get_height() - R_rect.height))

                    if not obstacle_collision(new_x, new_y):
                        R_rect.x, R_rect.y = new_x, new_y
# -------------------------------------------------------------------------------------------------------------
# Updating the reward based on the agent's movement.
# -------------------------------------------------------------------------------------------------------------
                        reward += REWARD_MOVEMENT
    
# -------------------------------------------------------------------------------------------------------------    
# If there was a previous state and action, let's update the Q-table to improve decision-making.
# -------------------------------------------------------------------------------------------------------------    
                    if prev_state is not None and prev_action is not None:
                        update_q_table(prev_state, prev_action, reward, state)

                    prev_state = state
                    prev_action = action
                    
# -------------------------------------------------------------------------------------------------------------    
# Placing the agent on the screen, ready for the next iteration.
# -------------------------------------------------------------------------------------------------------------    
                screen.blit(R_image, R_rect)
#
#
# DRAWING OBSTACLES AND CONNECTIONS
# --------------------------------------------------------------------------------------------------
# As I embark on creating the visual environment for my target recognition quest, it's time to bring the obstacles to life.
# Using a loop to iterate through the obstacle positions, I draw blue circles representing obstacles on the screen.
# Each obstacle is labeled with a unique identifier, adding a touch of clarity to the reconnaissance system.
# ---------------------------------------------------------------------------------------------------
                # Drawing 5 circles
                for i, position in enumerate(CIRCLE_POSITIONS, start=1):
                    pygame.draw.circle(screen, (0, 0, 255), position, 15)
                    # Displaying circle number
                    font = pygame.font.Font(None, 36)
                    text = font.render(f'Obstacle {i}', True, (255, 255, 255))
                    screen.blit(text, (position[0] - 30, position[1] + 20))
                
# ---------------------------------------------------------------------------------------------------
# Keeping an eye on potential collisions, I check if the agent intersects with any obstacle.
# If so, a vivid green outline signals the collision, adding an extra layer of feedback.
# ---------------------------------------------------------------------------------------------------
                    # Checking if Agent R encounters a circle
                    if R_rect.colliderect(pygame.Rect(position[0] - 15, position[1] - 15, 30, 30)):
                        # Drawing a green outline around the circle
                        pygame.draw.circle(screen, (0, 255, 0), position, 15, width=2)
                        reward += PENALTY_COLLISION  # Negative reward for colliding with a circle
# ---------------------------------------------------------------------------------------------------
# Growing the red penalty bar and shrinking the green reward bar by PENALTY_INCREASE_RATE
# ---------------------------------------------------------------------------------------------------
                        # Updating the penalty and reward bars
                        PENALTY_BAR_LENGTH = min(PENALTY_BAR_MAX_LENGTH, PENALTY_BAR_LENGTH + PENALTY_INCREASE_RATE)
                        REWARD_BAR_LENGTH = max(1, REWARD_BAR_LENGTH - PENALTY_INCREASE_RATE)
                    
# ---------------------------------------------------------------------------------------------------
# Now, let's establish connections between certain obstacles. It's not just a random world; there's a network to navigate!
# ---------------------------------------------------------------------------------------------------
                for connection in [(1, 3), (3, 4), (4, 2), (2, 5)]:
                    start_circle = CIRCLE_POSITIONS[connection[0] - 1]
                    end_circle = CIRCLE_POSITIONS[connection[1] - 1]
                    pygame.draw.line(screen, (0, 255, 0), start_circle, end_circle, width=2)
                    
                screen.blit(target_image, target_rect)

                
                if R_rect.colliderect(target_rect) and not target_identified:
                    target_identified = True
# ---------------------------------------------------------------------------------------------------
# Growing the green reward bar and shrinking the red penalty bar by REWARD_INCREASE_RATE
# ---------------------------------------------------------------------------------------------------
                    REWARD_BAR_LENGTH = min(REWARD_BAR_MAX_LENGTH, REWARD_BAR_LENGTH + REWARD_INCREASE_RATE)
                    PENALTY_BAR_LENGTH = max(0, PENALTY_BAR_LENGTH - REWARD_INCREASE_RATE)
                    reward += REWARD_TARGET_IDENTIFIED
                    pygame.time.delay(2000)
                    paused = True

                if target_identified:
                    
# DISPLAYING TARGET IDENTIFICATION
# ---------------------------------------------------------------------------------------------------
# The moment of triumph! Once the target is identified, it's time to showcase it on the screen.
# I'm using a loop to draw lines around the target, indicating its recognized status. The red lines
# emanate from the target's center, creating a visually striking effect.
# ---------------------------------------------------------------------------------------------------
                    for angle in range(0, 360, 5):
                        x = int(target_rect.centerx + 25 * math.cos(math.radians(angle)))
                        y = int(target_rect.centery + 25 * math.sin(math.radians(angle)))
                        pygame.draw.line(screen, (255, 0, 0), target_rect.center, (x, y), 2)
                    
# ---------------------------------------------------------------------------------------------------                  
# Additionally, I'm adding text to convey the exciting news to the pilot. The declaration
# 'Target Identified' is displayed at coordinates (10, 250), and the target's position is
# shown below it at (10, 280). Both lines use the same 24-point font.
# ---------------------------------------------------------------------------------------------------
                    font = pygame.font.Font(None, 24)
                    text_identified = font.render('Target Identified', True, (0, 0, 0))
                    screen.blit(text_identified, (10, 250))

                    text_description = font.render(f'Target Position: ({target_rect.x}, {target_rect.y})', True, (0, 0, 0))
                    screen.blit(text_description, (10, 280))
#
#
# UPDATING CUMULATIVE REWARD AND PENALTIES                   
# ---------------------------------------------------------------------------------------------------
# Keeping track of success and challenges! After every action, I add the current reward to the
# cumulative reward. When the reward is negative, its magnitude is added to the cumulative
# penalty, so the penalty total only ever grows.
# ---------------------------------------------------------------------------------------------------
                cumulative_reward += reward
                cumulative_penalty += max(0, -reward)  # Only accumulating positive penalties
        
# Updating exploration rate over time
                exploration_rate_over_time.append(epsilon)
    
# Decaying exploration rate 
                epsilon *= 0.995
        
# ---------------------------------------------------------------------------------------------------
# The lengths of both bars are updated and drawn on the screen to provide a visual representation
# of the agent's overall performance.
# ---------------------------------------------------------------------------------------------------
                bar_width = 200
                bar_height = 20

                pygame.draw.rect(screen, (0, 255, 0),
                                 ((screen.get_width() - bar_width) // 2, 10, REWARD_BAR_LENGTH, bar_height))  # Green reward bar
                pygame.draw.rect(screen, (255, 0, 0),
                                 ((screen.get_width() - bar_width) // 2, 40, PENALTY_BAR_LENGTH, bar_height))
                
# ---------------------------------------------------------------------------------------------------
# Adding labels to the bars
# ---------------------------------------------------------------------------------------------------
                font = pygame.font.Font(None, 24)
                text_reward = font.render('Reward', True, (0, 0, 0))
                text_penalty = font.render('Penalty', True, (0, 0, 0))
                screen.blit(text_reward, ((screen.get_width() - bar_width) // 2, 5))
                screen.blit(text_penalty, ((screen.get_width() - bar_width) // 2, 35))
#
#
# DRAWING AGENT R'S POSITION ON THE MAP     
# ---------------------------------------------------------------------------------------------------
# Mapping the journey! To offer a bird's eye view of the agent's movements, I draw its position
# on a mini-map. The agent's current location is marked by a red dot on the map, allowing pilots
# to track their progress throughout the operations.
# ---------------------------------------------------------------------------------------------------
                pygame.draw.rect(map_surface, (255, 0, 0),
                                 (R_rect.x * map_width // screen.get_width(), R_rect.y * map_height // screen.get_height(), 2, 2))
                screen.blit(map_surface, map_rect)

                total_iteration_count += 1
#
#
# UPDATING ITERATION COUNT AND DISPLAYING TIME TAKEN
# ---------------------------------------------------------------------------------------------------
# The clock is ticking! After each iteration, I update the total iteration count and display the
# time taken to make decisions and complete actions. This real-time feedback gives players an
# understanding of the pace and efficiency of the target recognition process.
# ---------------------------------------------------------------------------------------------------
                # Displaying time taken for the current episode
                time_taken_for_current_episode = (pygame.time.get_ticks() - start_time) / 1000.0
                time_taken_per_episode.append(time_taken_for_current_episode)
                episode_rewards.append(cumulative_reward)
                
                # Drawing episode information
                text_time = font.render(f'Time taken: {time_taken_for_current_episode:.2f} seconds', True, (0, 0, 0))
                screen.blit(text_time, (10, 10))
            
                # NOTE: the denominator is max_steps_per_episode (a step limit), not a total episode count
                text_episode = font.render(f'Episode: {episode_count}/{max_steps_per_episode}', True, (0, 0, 0))
                text_episode_x = 10 + text_time.get_width() + 20
                text_episode_y = 10
                screen.blit(text_episode, (text_episode_x, text_episode_y))
                
# ---------------------------------------------------------------------------------------------------
# Starting fresh! To ensure each frame is evaluated independently, I reset the reward to zero at
# the end of the loop. This allows the agent to earn new rewards and face new challenges in the
# subsequent frames of the target recognition adventure.
# ---------------------------------------------------------------------------------------------------
                reward = 0

            pygame.display.flip()
            clock.tick(60)
# ---------------------------------------------------------------------------------------------------         
# Appending cumulative rewards and penalties 
# ---------------------------------------------------------------------------------------------------
            cumulative_rewards.append(cumulative_reward)
            cumulative_penalties.append(cumulative_penalty)
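# ---------------------------------------------------------------------------------------------------
# Measuring the time taken for the current episode
# ---------------------------------------------------------------------------------------------------
# NOTE: total_start_time is assigned here rather than before the first episode, so the total
# printed at the end of the script excludes all time spent before this line.
    total_start_time = pygame.time.get_ticks()
# start_time is compared against pygame.time.get_ticks() inside the loop, so the same millisecond
# clock is used here; time.time() returns seconds since the epoch and cannot be mixed with it.
    end_time = pygame.time.get_ticks()
    time_taken_for_current_episode = (end_time - start_time) / 1000.0
    time_taken_per_episode.append(time_taken_for_current_episode)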
    
# DATA VISUALIZATION 
#---------------------------------------------------------------------------------------------------
# Plotting Episode Rewards Over Time
plt.figure(figsize=(10, 6))
plt.plot(range(1, len(episode_rewards) + 1), episode_rewards, marker='o', linestyle='-', color='b')
plt.title('Episode Rewards Over Time')
plt.xlabel('Episode')
plt.ylabel('Cumulative Reward')
plt.grid(True)
plt.show()


# Plotting Penalties vs. Rewards
plt.figure(figsize=(10, 6))
plt.bar(range(1, len(cumulative_rewards) + 1), cumulative_rewards, color='g', label='Rewards')
plt.bar(range(1, len(cumulative_penalties) + 1), cumulative_penalties, color='r', label='Penalties', bottom=cumulative_rewards)
plt.title('Cumulative Rewards and Penalties Over Time')
plt.xlabel('Episode')
plt.ylabel('Cumulative Value')
plt.legend()
plt.grid(True)
plt.show()

# Plotting Exploration Rate Over Time
plt.figure(figsize=(10, 6))
plt.plot(range(1, len(exploration_rate_over_time) + 1), exploration_rate_over_time, marker='o', linestyle='-', color='orange')
plt.title('Exploration Rate Over Time')
plt.xlabel('Episode')
plt.ylabel('Exploration Rate')
plt.grid(True)
plt.show()

# Plotting Time Taken for Target Identification
plt.figure(figsize=(10, 6))
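# Plot only as many bars as there are recorded timings, so the x and height lengths always match
num_timed = min(episode_count, len(time_taken_per_episode))
plt.bar(range(1, num_timed + 1), time_taken_per_episode[:num_timed], color='purple')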
plt.title('Time Taken for Target Identification')
plt.xlabel('Episode')
plt.ylabel('Time Taken (seconds)')
plt.show()

# POST-TRAINING DISPLAY AND CLEAN UP
# ---------------------------------------------------------------------------------------------------
# As the recognition loop concludes, I take a moment to showcase the results. I print the total time
# taken and the number of episodes completed, and then gracefully exit Pygame after a brief pause.
# ---------------------------------------------------------------------------------------------------
#
# ---------------------------------------------------------------------------------------------------
# Displaying the total time taken after all episodes
# ---------------------------------------------------------------------------------------------------
total_end_time = pygame.time.get_ticks()
total_time_taken = (total_end_time - total_start_time) / 1000.0
print(f"Total time taken for all episodes: {total_time_taken:.2f} seconds")

# ---------------------------------------------------------------------------------------------------
# Printing the number of episodes when the game loop ends
# ---------------------------------------------------------------------------------------------------
print("Number of episodes:", episode_count)

# ---------------------------------------------------------------------------------------------------
# Waiting for a moment before quitting
# ---------------------------------------------------------------------------------------------------
pygame.time.wait(3000)

pygame.quit()

NameError: name 'running' is not defined

That's my journey: merging Pygame for visual appeal, reinforcement learning for intelligence, and video processing
to create a target recognition system where my agent learns to navigate obstacles and successfully identify the target.
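
The per-step bookkeeping in the loop above (the reward and penalty totals, plus the multiplicative epsilon decay) can be checked on its own, without Pygame. The following is a minimal sketch rather than the project's actual code: the helper name `track_step_stats`, the sample reward sequence, and the starting epsilon of 1.0 are assumptions made for illustration. Only the three update rules (`+= reward`, `+= max(0, -reward)`, `*= 0.995`) are taken from the loop.

```python
def track_step_stats(rewards, epsilon=1.0, decay=0.995):
    """Replay a reward sequence through the loop's three update rules.

    Hypothetical helper (not part of the project): returns the final reward
    total, the final penalty total, and the epsilon recorded at each step.
    """
    cumulative_reward = 0
    cumulative_penalty = 0
    epsilon_history = []
    for reward in rewards:
        cumulative_reward += reward            # net total: negative rewards reduce it
        cumulative_penalty += max(0, -reward)  # magnitude of negative rewards only
        epsilon_history.append(epsilon)        # recorded before decaying, as in the loop
        epsilon *= decay                       # multiplicative decay per step
    return cumulative_reward, cumulative_penalty, epsilon_history


# Assumed sample rewards: two successes (+10), two collisions (-5), one neutral step.
total, penalty, eps = track_step_stats([10, -5, 0, -5, 10])
print(total, penalty, len(eps))
```

One thing this makes visible is that `cumulative_reward` is a net figure: each negative reward lowers it and also raises `cumulative_penalty`, which is worth keeping in mind when reading the stacked rewards and penalties bar chart.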