# **Lab 4: First Draft of Final Project**
- Salissa Hernandez
- Juan Carlos Dominguez
- Leonardo Piedrahita
- Brice Danvide

This lab is meant to be the first draft of motivation, methods, and one result for the final project of the class. In this lab, you will select a topic to investigate and perform an analysis to understand if this topic is of sufficient interest to use as a final project. You will be writing down many of these aspects in preparation for a final project presentation. 

For the final class project, you should investigate a topic from the course of your choosing. This topic can be related to anything we have discussed. For instance, you might choose to investigate a new implementation of Stable Diffusion training with an alternative latent space function (or new form of cross attention). Please do not let this example bias your choosing of a project--the topic you choose to investigate does not need to be a new algorithmic approach. It could be a new application of an algorithm we discussed. For instance, you might choose to use multi-task modeling for assessing robotic surgery. Or you might choose a new topic in ethical considerations of models. The only requirement is that the idea be something that creates new knowledge in the world and is somehow related to the vast number of topics we discussed in the class. 

Any topic can be chosen from ethical machine learning, convolutional visualization, data generation (with VAEs or GANs), multi-task or multi-modal architectures, stable diffusion, style transfer, or reinforcement learning. 

This lab will help you ensure that the topic is appropriate for a final project in the course in terms of scope (not so easy that it might be considered trivial, but not so hard that it might be considered a full-blown dissertation). The scope should be such that you are performing the initial analysis of a research topic, with much work still remaining before a full research publication. If you are unsure whether your topic is appropriate, please contact the instructor for feedback.


# **Objective**
For our final project, we explore the application of **Reinforcement Learning** (RL) to the domain of **autonomous vehicles**, focusing on how RL can improve safety and decision-making processes in dynamic driving environments. Our analysis involves using a **simulated driving environment** (such as CARLA) to train an RL agent to perform basic tasks like navigation, obstacle avoidance, and interaction with other vehicles. By implementing and testing different RL algorithms, we seek to understand their effectiveness in making real-time driving decisions and to analyze how these algorithms can be adapted for real-world applications.

Our **primary objective** is to determine how RL can be used to teach an autonomous vehicle to navigate safely and efficiently within a simulated environment, and to assess the performance of different algorithms in handling complex driving scenarios. Additionally, we will explore the broader implications of RL in autonomous driving, including safety, efficiency, and ethics, with the goal of providing insights into potential challenges when deploying RL-based systems in real-world autonomous vehicles. 

### **Terminology Used in This Project**
To ensure clarity and understanding in our analysis, we define the following key terms used throughout our project:
- Reinforcement Learning (RL): A type of machine learning where an agent learns to make decisions by interacting with an environment, receiving rewards or penalties based on its actions. (Medium)
- Agent: The autonomous vehicle in this scenario, which learns to navigate and make driving decisions based on the environment and rewards. (OpenAI Spinning Up)
- Environment: The simulated world in which the agent operates, which includes roads, traffic signs, other vehicles, pedestrians, and obstacles. (OpenAI Spinning Up)
- State: A specific configuration of the environment, such as the position and speed of the vehicle, the distance to obstacles, or the presence of other cars. (OpenAI Spinning Up)
- Action: The decision made by the agent at a given state, such as steering left, speeding up, or braking. (OpenAI Spinning Up)
- Reward: A value assigned to the agent's actions, which helps it learn which behaviors are desirable (e.g., avoiding collisions or following traffic laws). (OpenAI Spinning Up)
- Q-Learning / Deep Q-Networks (DQN): RL algorithms used to estimate the expected future rewards for each action, helping the agent make optimal decisions in a given state. (HuggingFace)
- Proximal Policy Optimization (PPO): A state-of-the-art RL algorithm used for training in environments with large action spaces, such as controlling a vehicle's speed and direction. (HuggingFace)

Sources:
- Medium: https://medium.com/%40gurkanc/deep-reinforcement-learning-agents-algorithms-and-strategies-a-practical-game-scenario-a412428ae0e0
- OpenAI Spinning Up - Intro to RL: https://spinningup.openai.com/en/latest/spinningup/rl_intro.html
- HuggingFace: https://huggingface.co/learn/deep-rl-course/en/unit3/from-q-to-dqn
- HuggingFace: https://huggingface.co/blog/deep-rl-ppo
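
The terms defined above can be illustrated with a minimal tabular Q-learning sketch. This is not the CARLA setup for the project: the environment is a hypothetical six-cell one-dimensional "road", and the reward values, hyperparameters, and function names (`step`, `train`) are assumptions chosen only to show how agent, environment, state, action, reward, and Q-values fit together.

```python
import random

# Hypothetical toy environment: a 1-D road with cells 0..5.
# The agent (vehicle) starts in cell 0 and the goal is cell 5.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # action 0: move back one cell, action 1: move forward


def step(state, action_idx):
    """Environment transition: returns (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_idx], 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True   # reward for reaching the goal
    return next_state, -0.01, False    # small penalty for every other step


def train(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max(range(2), key=lambda i: q[state][i])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action in every non-goal state after training.
greedy_policy = [max(range(2), key=lambda i: row[i]) for row in q_table[:GOAL]]
print(greedy_policy)
```

A DQN replaces the table `q` with a neural network so that the same update can be applied to large state spaces such as camera images or sensor readings.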

# **1. Motivation**

[5 Points] 
- Motivate the need for the research project. 
- Why is this investigation important? 
- What related work are you building from? 
- What are the main research question(s)? 
- What is your hypothesis for what will happen? 
- This section should be something that can be converted into two or three slides for the final presentation. 
- You should write down the motivations and related work that will be presented to the instructor later on.  

## **1.1 Motivation for Our Research Project**
Autonomous vehicles (AVs) represent one of the most transformative innovations in the modern transportation industry. Although current projects such as Waymo show promising technological progress, enabling AVs to make safe, efficient, and ethical driving decisions in real time remains a critical challenge. AVs operate in highly dynamic environments with unpredictable conditions such as varying traffic patterns, road conditions, weather, and interactions with pedestrians and other vehicles. Reinforcement learning offers an opportunity to optimize the decision-making process through experience-based learning.

The **motivation behind this project is to investigate how RL can improve AVs’ performance in terms of safety, efficiency, and adaptability**. By utilizing RL algorithms, we hope to enhance the vehicle’s ability to learn from its environment and make intelligent decisions, such as when to accelerate, brake, and avoid obstacles. These algorithms can also learn from different driving scenarios, enabling the vehicle to perform better in diverse real-world conditions, potentially reducing accidents and improving traffic flow.

The need for autonomous vehicles stems from a broader societal context. Human error accounts for the overwhelming majority of traffic accidents, and autonomous systems offer the potential to significantly reduce these numbers by eliminating distractions, fatigue, and impaired judgment. Moreover, AVs can provide mobility for individuals who are unable to drive, increase transportation efficiency through intelligent routing and coordination, and contribute to environmental sustainability by reducing emissions through smoother, optimized driving. As we move toward a future where AVs are more widely deployed, it becomes increasingly important to ensure these systems are capable of making real-time, intelligent decisions in complex environments. This project contributes to that goal by exploring reinforcement learning as a scalable, adaptable framework for safe and intelligent autonomous driving.

## **1.2 Why is this investigation important?**
The importance of this research lies in addressing two fundamental issues in autonomous driving: **safety and real-time decision-making**. Current autonomous vehicle systems, while capable, still face significant limitations in complex and unpredictable driving environments, such as handling emergent situations (e.g., sudden traffic changes, pedestrians crossing the street) or adapting to new road conditions (e.g., weather, construction zones).

Furthermore, existing decision-making systems in AVs are often rule-based, relying on predefined algorithms. These can struggle to adapt to unforeseen circumstances. RL, however, allows the vehicle to learn from experience. By continuously interacting with its environment, an RL-based system can improve decision-making by understanding which actions lead to better outcomes (e.g., avoiding accidents, following traffic rules). This aligns well with the shift towards adaptive and intelligent systems in autonomous vehicles, which is necessary for widespread adoption and integration into public roads.

*** MAYBE NEED MORE SPECIFIC EXAMPLES OF CURRENT AVs??

## **1.3 Related Work**
Considerable work has already been done on applying machine learning, including reinforcement learning, to autonomous vehicles. Companies like **Tesla** and **Waymo** have been leading the way in developing autonomous systems that can make real-time driving decisions using advanced machine learning techniques. Tesla’s Autopilot and Waymo’s fully autonomous cars both rely on deep learning models to control actions like steering, braking, and navigating complex environments.

In the academic space, one of the most widely used tools is CARLA (Car Learning to Act), an open-source driving simulator created specifically for testing and training autonomous vehicles. Many researchers have used CARLA to apply reinforcement learning algorithms such as Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO) to teach agents how to drive safely, avoid obstacles, and interact with traffic.

Another relevant area of work comes from DeepMind, which has used reinforcement learning to teach robots how to perform complex tasks by interacting with simulated environments. While their research focuses more on robotics, the techniques and successes they’ve had are very applicable to autonomous driving.

Our project builds on these existing efforts by experimenting with RL algorithms in simulation, aiming to better understand how well these methods can handle real-world driving challenges like unpredictable traffic or sudden obstacles.


*** NEED OUTSIDE SOURCES/REFERENCES


## **1.4 Main Research Questions**
One of the main questions we’re exploring is: **How can reinforcement learning actually improve the decision-making process in autonomous vehicles, especially in unpredictable or complex situations?** We want to see whether RL can help AVs make smarter, safer real-time choices when things like traffic jams, unexpected pedestrians, or road closures come up.

We’re also asking: **Which RL algorithms work best for teaching self-driving cars how to safely move through traffic and avoid obstacles?** To figure this out, we’ll be testing a few popular algorithms—like Q-Learning, Deep Q-Networks (DQN), and Proximal Policy Optimization (PPO)—to see how well they handle tasks like staying on course, avoiding crashes, and reacting to changes in the environment.

Another important question is: **How can we evaluate whether these RL systems are ready for real-world use?** It’s one thing for a model to perform well in a simulator, but we need to think about how those skills transfer to actual roads. This includes looking at how safe and efficient they are, and how they handle tricky situations that involve ethical decisions—like what to do in an unavoidable accident scenario.
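
One possible way to make "safe and efficient" measurable is to aggregate per-episode logs into a few summary metrics. The sketch below is only an illustration: the `EpisodeLog` record, its field names, and the choice of metrics are assumptions, not outputs of CARLA or of any experiment we have run.

```python
from dataclasses import dataclass


@dataclass
class EpisodeLog:
    """Hypothetical per-episode record; field names are illustrative."""
    total_reward: float
    collided: bool
    reached_goal: bool
    duration_s: float


def summarize(episodes):
    """Aggregate simple safety and efficiency metrics over evaluation episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    # Completion time is only meaningful for episodes that reached the goal.
    completed = [e.duration_s for e in episodes if e.reached_goal]
    return {
        "collision_rate": sum(e.collided for e in episodes) / n,
        "success_rate": sum(e.reached_goal for e in episodes) / n,
        "mean_reward": sum(e.total_reward for e in episodes) / n,
        "mean_completion_time_s": sum(completed) / len(completed) if completed else None,
    }


# Usage with made-up example values.
logs = [
    EpisodeLog(total_reward=10.0, collided=False, reached_goal=True, duration_s=30.0),
    EpisodeLog(total_reward=-5.0, collided=True, reached_goal=False, duration_s=12.0),
]
summary = summarize(logs)
print(summary)
```

Reporting the same metrics for each algorithm would allow a like-for-like comparison across DQN and PPO agents.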

Finally, we want to look at **what specific challenges come up when using RL in real-world autonomous driving, and how we might deal with them.** Since RL depends a lot on simulation, there’s always a risk that the agent won’t behave the same way outside of that controlled setting. Part of our work will focus on finding ways to close that gap between learning in simulation and driving in the real world.

## **1.5 Our Hypothesis**
We **hypothesize that Reinforcement Learning (RL) algorithms can significantly enhance the real-time decision-making capabilities of autonomous vehicles by enabling them to learn from past interactions with the environment and improve over time**. Specifically:

- Deep Q-Networks (DQN) will prove effective in improving basic tasks such as navigation and obstacle avoidance due to their ability to estimate Q-values and maximize long-term rewards.

- Proximal Policy Optimization (PPO) will provide an efficient solution for training agents in large, continuous action spaces (e.g., steering, accelerating, braking) while ensuring stability in learning.

The RL-trained agents will be able to navigate complex driving scenarios with a higher level of safety and efficiency compared to traditional, rule-based systems. However, there may be challenges related to transferring the learned behavior to real-world environments, especially in highly dynamic traffic conditions.

We expect that RL-based autonomous vehicles will show improved performance in real-time decision-making tasks, but also face challenges in the scalability and robustness of learned behaviors when transitioning from simulated to real-world environments.
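
For reference, the two algorithms named in the hypothesis are usually described by the standard objectives below. The notation is our own choice for this draft: $\theta^-$ denotes the parameters of a periodically updated target network, $\hat{A}_t$ an advantage estimate, and $\rho_t$ the probability ratio (written $r_t$ in the original PPO paper, renamed here to avoid a clash with the reward $r$).

DQN minimizes the squared temporal-difference error between the current Q-value and a bootstrapped target:

$$
L_{\text{DQN}}(\theta) = \mathbb{E}_{(s,a,r,s')}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta)\right)^2\right]
$$

PPO maximizes a clipped surrogate objective, which limits how far a single update can move the policy and is the source of the training stability mentioned above:

$$
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(\rho_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
\rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
$$

The $\max_{a'}$ in the DQN target requires enumerating actions, so DQN is normally applied to discrete action sets, whereas PPO can parameterize continuous controls such as steering angle and throttle directly.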

# **2. Methodology & Analysis**

[5 Points] 
- You have a great deal of free rein to decide what analyses you should use and therefore you will be graded on the appropriateness of the methods chosen. 
- Argue for a few analyses that can help to answer your research question(s). 
- You should argue for more than one kind of analysis to help answer your research questions. 
- Try to make this the first draft of your methodology. 
- This will eventually turn into 1-2 slides on methodology in your final presentation. 


# **3. Visualizations & Results**

[5 Points] 
- Perform one part of the analysis to help answer one (or more) research question(s). 
- Create visualizations that will help to provide evidence. 
- Discuss the results and how they provide evidence for answering the research questions. 
- Try to make this a first draft of one part of the results for the project.
- Try to have at least one visualization that you plan to use as a figure in the final presentation. 