<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173_Fall2025/blob/main/F25_Class_06_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

##### **Module 6: Reinforcement Learning**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Biology, Health and the Environment](https://sciences.utsa.edu/bhe/), [UTSA](https://www.utsa.edu/)

### Module 6 Material

* Part 6.1: Introduction to Introduction to Gymnasium and Q-Learning
* Part 6.2: Stable Baselines Q-Learning
* **Part 6.3: Atari Games with Stable Baselines Neural Networks**
* Part 6.4: Future of Reinforcement Learning


## Google CoLab Instructions

You MUST run the following code cell to get credit for this class lesson. By running this code cell, you will map your GDrive to /content/drive and print out your Google GMAIL address. Your Instructor will use your GMAIL address to verify the author of this class lesson.

In [None]:
# You must run this cell first
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    from google.colab import auth
    auth.authenticate_user()
    COLAB = True
    print("Note: Using Google CoLab")
    import requests
    gcloud_token = !gcloud auth print-access-token
    gcloud_tokeninfo = requests.get('https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=' + gcloud_token[0]).json()
    print(gcloud_tokeninfo['email'])
except:
    print("**WARNING**: Your GMAIL address was **not** printed in the output below.")
    print("**WARNING**: You will NOT receive credit for this lesson.")
    COLAB = False

Note: using Google CoLab


Make sure your GMAIL address is included as the last line in the output above.

### Install Gymnasium

Before we can beging, we need to install Hugging Face datasets by running the code in the next cell.

In [1]:
# Install gymnasium

# Install stable-baselines
!pip install stable-baselines3[extra] gymnasium > /dev/null

# Install gymnasium[atari] package
!pip install gymnasium[atari] > /dev/null

# Install pyvirtualdisplay
!pip install pyvirtualdisplay > /dev/null

# Install opengl
!sudo apt-get install -y python-opengl ffmpeg > /dev/null

# Set a non-interactive frontend for debconf and install xvfb and ffmpeg
!sudo DEBIAN_FRONTEND=noninteractive apt-get install -y xvfb ffmpeg > /dev/null


E: Unable to locate package python-opengl


### Install Custom Functions

Run the cell below to load custom functions used in this lesson.

In [2]:
# Simple function to print out elasped time
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

# **Atari Games with Stable Baselines Neural Networks**

The Atari 2600 is a home video game console from Atari, Inc., Released on September 11, 1977. Most credit the Atari with popularizing microprocessor-based hardware and games stored on ROM cartridges instead of dedicated hardware with games built into the unit. Atari bundled their console with two joystick controllers, a conjoined pair of paddle controllers, and a game cartridge: initially [Combat](https://en.wikipedia.org/wiki/Combat_(Atari_2600)), and later [Pac-Man](https://en.wikipedia.org/wiki/Pac-Man_(Atari_2600)).

Atari emulators are popular and allow gamers to play many old Atari video games on modern computers. These emulators are even available as JavaScript.

* [Virtual Atari](http://www.virtualatari.org/listP.html)

Atari games have become popular benchmarks for AI systems, particularly reinforcement learning. OpenAI Gym internally uses the [Stella Atari Emulator](https://stella-emu.github.io/). You can see the Atari 2600 in Figure 12.ATARI.

**Figure 12.ATARI: The Atari 2600**
![Atari 2600 Console](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/atari-1.png "Atari 2600 Console")

## Actual Atari 2600 Specs

* CPU: 1.19 MHz MOS Technology 6507
* Audio + Video processor: Television Interface Adapter (TIA)
* Playfield resolution: 40 x 192 pixels (NTSC). It uses a 20-pixel register that is mirrored or copied, left side to right side, to achieve the width of 40 pixels.
* Player sprites: 8 x 192 pixels (NTSC). Player, ball, and missile sprites use pixels 1/4 the width of playfield pixels (unless stretched).
* Ball and missile sprites: 1 x 192 pixels (NTSC).
* Maximum resolution: 160 x 192 pixels (NTSC). Max resolution is achievable only with programming tricks that combine sprite pixels with playfield pixels.
* 128 colors (NTSC). 128 possible on screen. Max of 4 per line: background, playfield, player0 sprite, and player1 sprite. Palette switching between lines is common. Palette switching mid-line is possible but not common due to resource limitations.
* 2 channels of 1-bit monaural sound with 4-bit volume control.

## Gymnasium Atari Breakout

You can use OpenAI Gym with Windows; however, it requires a special [installation procedure](https://towardsdatascience.com/how-to-install-openai-gym-in-a-windows-environment-338969e24d30).

This chapter demonstrates playing [Atari Breakout](https://en.wikipedia.org/wiki/Breakout_(video_game)). Atari Breakout is a classic arcade game that was released by Atari, Inc. in 1976. In the game, the player controls a paddle at the bottom of the screen, using it to bounce a ball against a wall of bricks at the top. The objective is to destroy all the bricks by hitting them with the ball, which the player deflects with the paddle. As the player progresses, the ball moves increasingly faster, and some bricks may require multiple hits to break. The player loses a turn when the ball misses the paddle and hits the bottom of the screen. The simplicity of Breakout's gameplay, combined with its increasing difficulty as the game progresses, has made it a quintessential example of the easy-to-learn-yet-hard-to-master design ethos that characterized many early video games.

In the context of artificial intelligence research and particularly within reinforcement learning, Atari Breakout has been adapted as an environment within OpenAI's Gym toolkit, a collection of environments that provide a standardized interface for algorithm development and benchmarking. Stable Baselines is a set of high-quality implementations of reinforcement learning algorithms, which offers a simple way to train and evaluate agents on various tasks, including playing Atari games like Breakout. The adaptation of Breakout to the Gym environment, often referred to as 'Breakout-v0' or 'BreakoutDeterministic-v4' in the Gym library, abstracts the game's mechanics into observations, actions, and rewards, which an AI agent can interact with. In this setup, the agent observes the game state (typically the pixel data from the screen), selects actions (like moving the paddle left or right), and receives rewards (such as the score for breaking bricks). This allows researchers and enthusiasts to apply and test reinforcement learning algorithms using Stable Baselines to develop AI agents that can learn to play Breakout at a superhuman level, offering a playground to advance the field of machine learning.

## Training the Agent

We are now ready to train the DQN. Depending on how many episodes you wish to run through, this process can take many hours. This code will update both the loss and average return as training occurs. As training becomes more successful, the average return should increase. The losses reported reflecting the average loss for individual training batches.

In [8]:
# Co-Pilot

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack
from stable_baselines3.common.monitor import Monitor

# Set this constant to either 'Breakout' or 'Atlantis' to choose the game
GAME_NAME = 'Breakout'  # Or 'Atlantis'

# Correct environment ID
env_id = f"ALE/{GAME_NAME}-v5"
print(f"Using environment ID: {env_id}")

# Create the game environment, note that we wrap it with VecFrameStack for preprocessing
env = gym.make(env_id)
env = Monitor(env)
env = VecFrameStack(env, n_stack=4)

# Initialize the agent, here we use Proximal Policy Optimization (PPO)
model = PPO('CnnPolicy', env, verbose=1, tensorboard_log="./atari_ppo_tensorboard/")

# Train the agent
TIMESTEPS = 1e5
model.learn(total_timesteps=TIMESTEPS)

# Save the model
model.save(f"{GAME_NAME}_ppo_model")

# Evaluate the trained agent
eval_env = gym.make(env_id)
eval_env = Monitor(eval_env)
eval_env = VecFrameStack(eval_env, n_stack=4)
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)

print(f"Mean reward: {mean_reward} +/- {std_reward}")

# Don't forget to close the environments when you are done
env.close()
eval_env.close()


Using environment ID: ALE/Breakout-v5


NamespaceNotFound: Namespace ALE not found. Have you installed the proper package for ALE?

In [9]:
# Original Code

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Set this constant to either 'Breakout' or 'Atlantis' to choose the game
GAME_NAME = 'Breakout'  # Or 'Atlantis'

# Create the game environment, note that we wrap it with VecFrameStack for preprocessing
env_id = f"{GAME_NAME}NoFrameskip-v4"
env = make_atari_env(env_id, n_envs=4, seed=0)
env = VecFrameStack(env, n_stack=4)

# Initialize the agent, here we use Proximal Policy Optimization (PPO)
model = PPO('CnnPolicy', env, verbose=1, tensorboard_log="./atari_ppo_tensorboard/")

# Train the agent
TIMESTEPS = 1e5
model.learn(total_timesteps=TIMESTEPS)

# Save the model
model.save(f"{GAME_NAME}_ppo_model")

# Evaluate the trained agent
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)

print(f"Mean reward: {mean_reward} +/- {std_reward}")

# Don't forget to close the environment when you are done
env.close()


NameNotFound: Environment `BreakoutNoFrameskip` doesn't exist.

## Videos

Perhaps the most compelling way to view an Atari game's results is a video that allows us to see the agent play the game. We now have a trained model and observed its training progress on a graph. The following functions are defined to watch the agent play the game in the notebook.

We can also visualize this environment

In [10]:
from stable_baselines3.common.vec_env import VecFrameStack
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecVideoRecorder
from stable_baselines3 import PPO
import os

# Set the game name here
GAME_NAME = 'Breakout'  # Can be 'Atlantis' as well

# Load your previously trained model
model_path = f"{GAME_NAME}_ppo_model.zip"
model = PPO.load(model_path)

# Create the Atari environment and apply the correct wrappers
env_id = f"{GAME_NAME}NoFrameskip-v4"
env = make_atari_env(env_id, n_envs=1, seed=0)
env = VecFrameStack(env, n_stack=4)

# Record the environment
video_folder = '/content/videos'
if not os.path.exists(video_folder):
    os.makedirs(video_folder)

env = VecVideoRecorder(env, video_folder,
                       record_video_trigger=lambda step: step == 0,
                       video_length=500,
                       name_prefix=f"{GAME_NAME}-agent")

# Reset the environment and observe the initial observation shape
obs = env.reset()
print("Initial observation shape:", obs.shape)  # Should be (1, 4, 84, 84)

# Run one episode
done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, rewards, done, info = env.step(action)
    env.render()

# Close the environment which should also save the video
env.close()

FileNotFoundError: [Errno 2] No such file or directory: 'Breakout_ppo_model.zip.zip'

The goal is to move the above cart without causing the pole to fall over.

In [11]:
from IPython.display import HTML
from base64 import b64encode

# Load the video and encode it
video_path = '/content/videos/'  # Make sure this matches the path where the videos are saved
video_files = [f for f in os.listdir(video_path) if f.endswith('.mp4')]

if video_files:
    video_filename = video_files[-1]  # if you expect multiple videos, modify this to select the correct one
    full_video_filename = f"{video_path}/{video_filename}"
    mp4 = open(full_video_filename, 'rb').read()
    encoded = b64encode(mp4).decode('ascii')
    html = HTML(data=f'<video width="640" height="480" controls><source src="data:video/mp4;base64,{encoded}" type="video/mp4"></video>')
else:
    html = HTML(data="Error: No video found")

html

FileNotFoundError: [Errno 2] No such file or directory: '/content/videos/'

## **Lesson Turn-in**

When you have completed and run all of the code cells, use the **File --> Print.. --> Save to PDF** to generate a PDF of your Colab notebook. Save your PDF as `Copy of Class_06_3.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.

## **Lizard Tail**

## **US Space Force**

![___](https://upload.wikimedia.org/wikipedia/commons/thumb/2/29/Seal_of_the_United_States_Space_Force.svg/1920px-Seal_of_the_United_States_Space_Force.svg.png)


The **United States Space Force (USSF)** is the space force branch of the United States Armed Forces and one of the eight uniformed services of the United States. It is one of two independent space forces in the world.

The United States Space Force traces its origins to the Air Force, Army, and Navy's military space programs created during the beginning of the Cold War. US military space forces first participated in combat operations during the Vietnam War and have participated in every U.S. military operation since, most notably in the Persian Gulf War, which has been referred to as the "first space war." The Strategic Defense Initiative and creation of Air Force Space Command in the 1980s marked a renaissance for military space operations.

Proposals for a U.S. Space Force were first seriously considered during the Reagan Administration as part of the Strategic Defense Initiative. Congress began exploring establishing a Space Corps or Space Force in the late 1990s and early 2000s. The idea of establishing a Space Force was resurrected in the late 2010s in response to Russian and Chinese military space developments, resulting in the Space Force's establishment on 20 December 2019 during the first Trump Administration.

The Space Force is organized as part of the Department of the Air Force alongside the U.S. Air Force, a coequal service. The Department of the Air Force is headed by the civilian secretary of the Air Force, while the U.S. Space Force is led by the chief of space operations. The U.S. Space Force's status as part of the Department of the Air Force is intended to be an interim measure towards a fully independent Department of the Space Force, led by a civilian secretary of the Space Force.

### **Mission**

Secure our Nation's interests in, from, and to space.

>Mission statement of the United States Space Force[6]
The Space Force's statutory responsibilities are outlined in 10 U.S.C. § 9081 and originally introduced in the United States Space Force Act, the Space Force is organized, trained, and equipped to:

1. Provide freedom of operation for the United States in, from, and to space;
2. Conduct space operations; and
3. Protect the interests of the United States in space.

The Department of Defense further defines the specified functions of the Space Force to:

1. Provide freedom of operation for the United States in, from, and to space.
2. Provide prompt and sustained space operations.
3. Protect the interests of the United States in space.
4. Deter aggression in, from, and to space.
5. Conduct space operations.

The Space Force further breaks down its mission into three core functions, which align directly to its mission statement to "secure our Nation's interests in, from, and to space:"

1. Space Superiority (in space)
2. Global Mission Operations (from space)
4. Assured Space Access (to space)

## **Space Superiority**

**Concept of a future space interception**

Space superiority defends against space and counterspace threats by protecting spacecraft in space or protecting against attacks enabled by adversary spacecraft, requiring that the Space Force establish control of the domain. The Space Force describes that at a time and place of the United States' choosing it must be able to assure continued use of spacecraft and deny adversaries use of their spacecraft or space-enabled capabilities.[6]

Missions that support space superiority include orbital warfare, electromagnetic warfare, and space battle management.

**Global Mission Operations**

Missile warning radar at Pituffik Space Base, Greenland
Global mission operations integrates joint functions across all domains (land, air, maritime, space, cyberspace) on a global space. Through space, the U.S. military and its allies can see, communicate, and navigate. Global mission operations also protect U.S. forces on Earth through early warning of incoming missiles and other types of attack. The Space Force describes global mission operations as allowing the rest of the U.S. military to defend the air, land, and sea.[6]

Missions that support global mission operations include missile warning, satellite communications, and positioning, navigation, and timing.

**Assured Space Access**

Missions supporting space access include launch, range control, cyber, and space domain awareness.

**History**

In the long haul, our safety as a nation may depend upon achieving "space superiority." Several decades from now, the important battles may not be sea battles or air battles, but space battles, and we should be spending a certain fraction of our national resources to ensure that we do not lag in obtaining space supremacy.

Launch of Explorer 1, America's first satellite, by the U.S. Army in 1958
The beginnings of the U.S. Space Force can be traced to the aftermath of World War II. General Henry H. Arnold, commander of the Army Air Forces, tasked General Bernard Schriever to integrate with the scientific community to identify and develop technologies that could be beneficial for the new U.S. Air Force in the next global conflict. Identifying the importance of space, the U.S. Army, U.S. Navy, and U.S. Air Force each started their own separate space and rocket programs. The U.S. Air Force created the first military space organization in the world, establishing the Western Development Division in 1954 and placing it under the command of General Schriever. The Army followed a year later, creating the Army Ballistic Missile Agency under the leadership of General John Bruce Medaris and Wernher von Braun.

The Army led the United States into space, launching the first American spacecraft, Explorer 1, on 31 January 1958. Space exploration continued to be a military responsibility until the National Aeronautics and Space Administration was created in 1958. The military shifted from conducting their own space exploration programs to supporting NASA's, providing the agency with its astronauts and space launch vehicles, while also conducting astronaut recovery and supporting space launches from the Air Force's Eastern Range.

The Air Force was recognized as the lead military service for space by the early 1960s, with the Army and Navy operating in supporting roles. Early military space efforts were focused on developing and fielding spacecraft to accomplish national objectives, with a focus on weather, reconnaissance and surveillance, communications, and navigation. On 18 August 1961, the Air Force and National Reconnaissance Office launched the first CORONA reconnaissance mission, recovering 3,000 feet of film from space and imaged 1.65 million square miles of the Soviet Union's territory.

Concerned about the development of the Soviet Union's own space forces, the Air Force advocated for a military human spaceflight program. General Curtis LeMay described strong parallels between World War I aviation and 1960s space operations, noting how quickly flying evolved from chivalric and unarmed reconnaissance flights to combat efforts designed to destroy enemy air superiority. General LeMay believed that it was naive to believe that the same trends were not expected to be seen in space and must be prepared for. Although the Air Force made significant progress in developing the X-20 spaceplane, Manned Orbiting Laboratory, and Blue Gemini, opposition from the Department of Defense prevented operational fielding.

In November 1968, the Central Intelligence Agency reported a successful satellite destruction simulation performed by the Soviet Union as a part of its Istrebitel Sputnikov anti-satellite weapons research programme. Possibly as a response to the Soviet programme, the United States has earlier began Project SAINT, which was intended to provide anti-satellite capability to be used in the case of war with the Soviet Union. However the project was cancelled early on due to budget constraints and after details were leaked to The New York Times in 1962. Despite these setbacks, the Air Force did successfully field the Program 437 anti-satellite weapon system, which used nuclear Thor missiles to intercept and destroy enemy spacecraft.

Although most military space forces were organized under the Air Force, they were still fragmented within several different major commands. Recognizing rapid growth of space forces and the need to centralize them under one command, the Air Force established Air Force Space Command in 1982. This was followed by the establishment of the joint United States Space Command in 1985, aligning Air Force Space Command, Naval Space Command, and Army Space Command under a single operational commander. These two moves, along with the Strategic Defense Initiative's establishment by President Ronald Reagan, led to a renaissance of military space operations in the 1980s.

Space forces were first used in combat operations during the Vietnam War, with Air Force weather and communications spacecraft supporting ground, sea, and air operations. During Operation Urgent Fury in Grenada, satellite communications were used to conduct command and control for the first time, while Operation El Dorado Canyon and Operation Just Cause marked the first time that major U.S. forces incorporated information from space-based intelligence systems.

The Persian Gulf War marked the first time that military space forces were unleashed to their fullest extent. Over sixty spacecraft provided 90% of theater communications and command and control for a multinational army of 500,000 troops, weather support for commanders and mission planners, missile warning of Iraqi Scud missile launches, and satellite navigation for air and land forces moving across a featureless desert. The decisive role that space forces played directly enabled an overwhelming Coalition victory and led to the Persian Gulf War being coined "the first Space War."

While U.S. space forces supported all U.S. military operations in the 1990s, Operation Allied Force marked the first use of Global Positioning System-aided munitions in a conflict, ushering in a new era of precision bombing. Following the September 11 attacks, U.S. space forces mobilized to respond as part of the Global War on Terrorism Operation Enduring Freedom, Operation Iraqi Freedom, and Operation Inherent Resolve.
