# Carla The Reinforcement Learning Self-Driving Car (Version Morra)

## Introduction

<p>In this notebook we build a self-driving car with a reinforcement learning approach in the CARLA simulator, using RGB camera data and collision data.
</p>

<p>CARLA is an open-source simulator, built with Unreal Engine and C++, that supports the development and simulation of autonomous driving. It uses the OpenDRIVE standard (version 1.4 at the time of writing) to define roads and urban settings. CARLA follows a client-server model: the environment is the server, running on either a local or a remote host, and the agent (in this case, the car) is the client. For more information, please see the following links:
</p>

- <a href="http://proceedings.mlr.press/v78/dosovitskiy17a/dosovitskiy17a.pdf">CARLA: An Open Urban Driving Simulator</a>

- <a href="http://carla.org/">The official website for the CARLA open-source simulator for autonomous driving research.</a>


## Requirements & Installation
For requirements and installation, please review the following links:
- <a href="https://carla.readthedocs.io/en/latest/start_quickstart/#requirements">Requirements</a>
- <a href="https://carla.readthedocs.io/en/latest/start_quickstart/#installation-summary">How to install carla</a>

- Install TensorFlow 2.3.1 and Keras 2.4.3. With other versions you may need to adapt some of the commands, so it is advisable to use a virtual environment.

- <a href="https://github.com/carla-simulator/carla/blob/master/Docs/download.md">Don't forget to download a relatively recent version of the CARLA simulator</a>
----------------------------------------------------
Once everything is set up, make sure that `CarlaUE4.exe` is running or, if you are on Linux, run the command `./CarlaUE4.sh`.

In [3]:
from IPython.display import Image
from IPython.core.display import HTML 

In [4]:
Image(url= "https://carla.readthedocs.io/en/latest/img/welcome.png")

## Carla Python API
CARLA has a <a href="https://carla.readthedocs.io/en/latest/python_api/">Python API</a> that facilitates building intelligent systems in Python. The API is organized around four core concepts:


In [2]:
Image(url= "https://carla.readthedocs.io/en/latest/img/carla_modules.png")

--------------------------------
- 1st: World and client.
- 2nd: Actors and blueprints.
- 3rd: Maps and navigation.
- 4th: Sensors and data.

All of these concepts are built with object-oriented programming, so most of them are exposed as classes and objects. For example:

----------------------------------------------

        # Connect to the CARLA server (the environment)
        self.client = carla.Client("localhost", 2000)
        self.client.set_timeout(3.0)
---------------------------------------------------------------

# Code Overview
The code is broken down into four parts:
- Global variables
- Two classes
    - EnvControl: handles the CARLA environment through the CARLA Python API, processes the camera images, computes the rewards, and so on.
    - Deep_Cue_Network_Agent: implements the DQN algorithm, using a VGG-16 network to approximate the Q-values.
- The main function: this is where everything comes together and the results are recorded per episode.

## The Necessary Libraries and Packages

In [None]:
# Imports
import math
import numpy as np
import pandas as pd
import cv2
from collections import deque
import tensorflow as tf
from keras.layers import Dense, GlobalMaxPool2D
from keras.optimizers import Adam
from keras.models import Model
from keras.applications.vgg16 import VGG16
import glob
import os
import sys
from tqdm import tqdm
import random
import time
from threading import Thread


# Add the CARLA egg file to the Python path (located through glob)
try:
    sys.path.append(glob.glob('../carla/dist/carla-*%d.%d-%s.egg' % (
        sys.version_info.major,
        sys.version_info.minor,
        'win-amd64' if os.name == 'nt' else 'linux-x86_64'))[0])
except IndexError:
    pass

# Importing the carla API Classes.
import carla

## The Global variables.
<p>The global variables are used throughout the script, and several of them affect the behavior of the entire script, so they are collected in one place to make fine-tuning easier.</p>

The global variables handle:
- The total script run time calculation.
- The frame size of the image data.
- The size of the minibatch and the training batch.
- The name of the model, for recording purposes.
- The minimum possible reward (the same value a collision receives).
- DQN hyperparameters:
    - Episodes
    - Discount rate
    - Epsilon
    - Epsilon decay
    - Minimum epsilon
- The interval for recording reward statistics.

In [7]:
## GLOBAL VARIABLES
very_start = time.time()

# The frame dimensions.
IM_WIDTH = 500
IM_HEIGHT = 500

# The minibatch size
minibatch_size = 16

# Batch size passed to model.predict()
PREDICTION_BATCH_SIZE = 1

# The size of the training batch.
TRAINING_BATCH_SIZE = minibatch_size // 4

# CNN architecture name (used in the saved model file names)
MODEL_NAME = "VGG16_GlobalMax2DPool"

# Lowest possible reward
MIN_REWARD = - 100 

# The number of episodes (NOT STEPS)
EPISODES = 500

# The discount rate from the bellman equation
DISCOUNT = 0.997

# Epsilon is the probability of taking a random (exploratory) action.
# It decays every episode, so the agent relies on the network more over time,
# but it never goes below MIN_EPSILON (0.001).
epsilon = 1
EPSILON_DECAY = 0.997
MIN_EPSILON = 0.001

# Aggregate reward statistics every N episodes
GET_REWARD_STATS_EVERY = 5
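Because `epsilon` is multiplied by `EPSILON_DECAY` once per episode, it is worth checking how far it actually decays within `EPISODES`. The sketch below is a standalone re-implementation of the decay step from the main loop; the helper names `epsilon_schedule` and `episodes_until_floor` are hypothetical. With the values above, epsilon is still about 0.22 in episode 500 and would need roughly 2,300 episodes to reach `MIN_EPSILON`.

```python
import math


def epsilon_schedule(n_episodes, start=1.0, decay=0.997, min_eps=0.001):
    """Epsilon in effect during each episode, mirroring the per-episode decay in the main loop."""
    eps = start
    values = []
    for _ in range(n_episodes):
        values.append(eps)  # value used while this episode runs
        if eps > min_eps:
            eps = max(min_eps, eps * decay)  # decay after the episode ends
    return values


def episodes_until_floor(start=1.0, decay=0.997, min_eps=0.001):
    """Number of decay steps needed before epsilon reaches min_eps."""
    return math.ceil(math.log(min_eps / start) / math.log(decay))


schedule = epsilon_schedule(500)
print(f"epsilon in episode 500: {schedule[-1]:.3f}")
print(f"decay steps to reach the floor: {episodes_until_floor()}")
```

If the intent is for epsilon to reach its floor within 500 episodes, the decay would have to be closer to `0.001 ** (1 / 500)`, roughly 0.986; this is a tuning suggestion, not something the original script does.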

## EnvControl
The `EnvControl` class controls the environment. It manages many aspects of the simulation and acts as an umpire between the car and the environment.

------------------
#### Some of the things the class handles:
- Server/client relations (the car and the environment).
- Spawning the agent, a Tesla Model 3, at a random spawn point in the CARLA environment every time the environment restarts.
- Controlling the vehicle (throttle and steering) according to the chosen action.
- Attaching the camera to the front of the car and preprocessing the images it receives into a 3-channel array (CARLA delivers BGRA pixels; dropping the alpha channel leaves BGR, which is the channel order `cv2.imshow` expects).
- One-step dynamics: defining how the car takes a step in its environment through a governing set of rules (actions, rewards, and episode termination).
- Collision detection.
- Destroying the actors at the end of each episode.
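The image preprocessing can be sketched without a running simulator. CARLA's RGB camera delivers `image.raw_data` as a flat buffer of BGRA pixels; the hypothetical helper `bgra_buffer_to_bgr` below reshapes such a buffer and drops the alpha channel, and `bgr_to_rgb` reverses the channel order for code that expects true RGB. A synthetic byte string stands in for `raw_data` here.

```python
import numpy as np


def bgra_buffer_to_bgr(raw, height, width):
    """Reshape a flat BGRA byte buffer into a (height, width, 3) BGR uint8 array."""
    pixels = np.frombuffer(raw, dtype=np.uint8)
    if pixels.size != height * width * 4:
        raise ValueError("buffer size does not match height * width * 4")
    bgra = pixels.reshape((height, width, 4))
    return bgra[:, :, :3].copy()  # drop alpha; copy so the result owns its memory


def bgr_to_rgb(bgr):
    """Reverse the channel axis: BGR -> RGB."""
    return bgr[:, :, ::-1]


# A 2x2 frame in which every pixel is B=10, G=20, R=30, A=255.
fake_raw = bytes([10, 20, 30, 255] * 4)
frame = bgra_buffer_to_bgr(fake_raw, height=2, width=2)
print(frame.shape, frame[0, 0], bgr_to_rgb(frame)[0, 0])
```

The `.copy()` gives the frame its own memory, which is a reasonable precaution when the array is kept after the sensor callback returns.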

In [None]:
class EnvControl:
 
    
    # Full steering amount
    STEER_AMT = 0.7
    
    # The size of the frame.
    im_width = 500
    im_height = 500
    
    # The values coming in from the front camera after being processed.
    front_camera = None
    
    def __init__(self):
        
        # Connect to the CARLA server (the environment)
        self.client = carla.Client("localhost", 2000)
        self.client.set_timeout(3.0)
        
        # Getting the world into a variable.
        # Getting the car from the blueprint object
        self.world = self.client.get_world()
        self.blueprint_library = self.world.get_blueprint_library()
        self.model_3 = self.blueprint_library.filter("model3")[0]
        # The duration of the episode
        self.SECONDS_PER_EPISODE = 10
    
    def RESTART(self):
        # recording collisions
        self.collision_hist = []
        # Recording the list of actors for later destruction
        self.actor_list = []   
        
        # Pick a random spawn point on the current map (the number of spawn points depends on the map),
        # then create the vehicle there
        # and store it in the actor list for later destruction
        self.transform = random.choice(self.world.get_map().get_spawn_points())
        self.vehicle = self.world.spawn_actor(self.model_3, self.transform)
        self.actor_list.append(self.vehicle)
        
        # Find the RGB camera sensor blueprint
        # and initialize it with the necessary attributes
        # (front view, 110-degree field of view).
        self.rgb_cam = self.blueprint_library.find('sensor.camera.rgb')
        self.rgb_cam.set_attribute("image_size_x", f"{self.im_width}")
        self.rgb_cam.set_attribute("image_size_y", f"{self.im_height}")
        self.rgb_cam.set_attribute("fov", "110")
        
        ## Creating the first Sensor (Sensor No. 1)
        # Getting the car location
        # relative to the car location putting the camera in the front area
        # Attaching it and assigning it
        # Putting it into the actor list
        # All the data coming into that sensor, get the RGB version of it.
        transform = carla.Transform(carla.Location(x=2.5, z=0.7))
        self.sensor = self.world.spawn_actor(self.rgb_cam, transform, attach_to=self.vehicle)
        self.actor_list.append(self.sensor)
        self.sensor.listen(lambda data: self.process_img(data))

        
        
        # Applying control as stationary
        self.vehicle.apply_control(carla.VehicleControl(throttle=0.0, brake=0.0))
        time.sleep(4)
        
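        ## Creating the second sensor (Sensor No. 2)
        # Getting the collision sensor blueprint and attaching it to the car.
        # Putting it into the actors list for later destruction.
        # Registering the callback: without listen(), collision_hist is never filled
        # and a collision would never end the episode.
        colsensor = self.blueprint_library.find("sensor.other.collision")
        self.colsensor = self.world.spawn_actor(colsensor, transform, attach_to=self.vehicle)
        self.actor_list.append(self.colsensor)
        self.colsensor.listen(lambda event: self.collision_data(event))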
        
        # Wait until the camera has delivered its first frame
        while self.front_camera is None:
            time.sleep(0.01)
            
        # getting the starting time of the episode.
        self.episode_start = time.time()
        
        # Putting the car in a stationary position
        self.vehicle.apply_control(carla.VehicleControl(throttle=0.0, brake=0.0))
        
        return self.front_camera # Returning what the sensor sees.
    
    def process_img(self, image):
        raw = np.array(image.raw_data) # convert the flat BGRA byte buffer to an array
        reshaped_image = raw.reshape((self.im_height, self.im_width, 4)) # it arrives flattened, so restore (height, width, BGRA)
        bgr_image = reshaped_image[:, :, :3] # drop the alpha channel; the remaining channels are in BGR order
        cv2.imshow("", bgr_image) # OpenCV expects BGR, so the frame is displayed with the correct colors
        cv2.waitKey(1)
        self.front_camera = bgr_image
    
    # Collision sensor callback: record every collision event
    def collision_data(self, event):
        self.collision_hist.append(event)
    
    def step(self, action):
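        '''
        - Apply one of four actions: 0 = steer left, 1 = half throttle,
          2 = full throttle, 3 = steer right.
        - Return the new observation, the reward, whether the episode is done, and an (unused) info value.
        '''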
        if action == 0:
            self.vehicle.apply_control(carla.VehicleControl(throttle = 0.7, steer = - 1 * self.STEER_AMT)) # left
        elif action == 1:
            self.vehicle.apply_control(carla.VehicleControl(throttle = 0.5, steer = 0.0)) # Half-throttle
        elif action == 2:
            self.vehicle.apply_control(carla.VehicleControl(throttle = 1.0, steer= 0)) # Full-throttle
        elif action == 3:
            self.vehicle.apply_control(carla.VehicleControl(throttle = 0.7, steer = 1 * self.STEER_AMT)) # right
            
        
        # Getting the velocity
        v = self.vehicle.get_velocity()
        
        # Speed in km/h: CARLA velocities are in m/s, and 1 m/s = 3.6 km/h
        kmh = int(3.6 * math.sqrt(v.x**2 + v.y**2 + v.z**2))
        
        ## The Reward Shaping
        # If there is a collision, end the episode with the lowest reward.
        # If there isn't a collision, check the speed: reward 35-120 km/h, penalize anything slower or faster.
        if len(self.collision_hist) != 0:
            done = True
            reward = -100
        elif kmh < 35:
            done = False
            reward = -15
        elif kmh > 120:
            done = False
            reward = - 10
        else:
            done = False
            reward = 15
        # Is the episode over already ?
        if self.episode_start + self.SECONDS_PER_EPISODE < time.time():
            done = True
        # Return the new observation, the reward, the terminal-state flag and an (unused) info value
        return self.front_camera, reward, done, None
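The speed computation and reward shaping in `step()` are plain arithmetic, so they can be pulled out into pure functions and checked in isolation. The sketch below mirrors the rules above (collision: -100 and done; below 35 km/h: -15; above 120 km/h: -10; otherwise +15; done once `SECONDS_PER_EPISODE` has elapsed). The function names `speed_kmh` and `shaped_reward` are hypothetical and not part of the CARLA API.

```python
import math


def speed_kmh(vx, vy, vz):
    """Magnitude of a velocity given in m/s, converted to km/h and truncated to an int."""
    return int(3.6 * math.sqrt(vx ** 2 + vy ** 2 + vz ** 2))


def shaped_reward(collided, kmh, elapsed_s, seconds_per_episode=10):
    """Return (reward, done) using the same rules as EnvControl.step()."""
    if collided:
        return -100, True
    if kmh < 35:
        reward = -15
    elif kmh > 120:
        reward = -10
    else:
        reward = 15
    done = elapsed_s > seconds_per_episode  # the time limit also ends the episode
    return reward, done


print(speed_kmh(3.0, 4.0, 0.0))
print(shaped_reward(False, 60, elapsed_s=2.5))
```

Because the speed is truncated with `int()`, a car moving at 34.9 km/h is treated as 34 km/h and still receives the low-speed penalty.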

## Deep_Cue_Network_Agent class
This class consists of six essential methods:

- `__init__`: instantiates a Deep_Cue_Network_Agent object, building the online model and the target model.

- `build_model`: creates a VGG-16 network topped with global max pooling and a linear output layer.

- `update_replay_memory`: appends a transition to the replay memory.

- `train`: updates the Q-values for the DQN algorithm using a random minibatch sampled from the replay memory.

- `predict_qs`: gets the Q-values from the neural network's `.predict()` method after reshaping and normalizing the input.

- `thread_loop`: runs an infinite loop in a separate thread so that the agent can predict and fit at the same time; it stops only when the termination flag is set at the end of the training process.
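To see what the head built by `build_model` receives, the spatial size can be worked out by hand. In Keras' VGG16 the convolutions use `same` padding and each of the five blocks ends with a 2x2, stride-2 max-pooling layer with `valid` padding, so each block halves the size (rounding down). The sketch below, with hypothetical helper names, applies that rule to the 500x500 frames used here and counts the parameters of a final `Dense` layer with four outputs (one per action defined in `step()`) after `GlobalMaxPool2D`.

```python
def vgg16_feature_shape(height, width, n_blocks=5, channels=512):
    """Output shape of the VGG16 convolutional base (include_top=False)."""
    for _ in range(n_blocks):
        height, width = height // 2, width // 2  # 2x2 max pool, stride 2, 'valid' padding
    return height, width, channels


def dense_head_params(n_features, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_features * n_outputs + n_outputs


h, w, c = vgg16_feature_shape(500, 500)
# GlobalMaxPool2D keeps only the channel axis: (h, w, c) -> (c,)
print((h, w, c), dense_head_params(c, 4))
```

The 15x15 feature grid is reduced to a 512-value vector, so the Q-value head itself is tiny; almost all of the roughly 14.7 million parameters sit in the convolutional base.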

In [None]:
class Deep_Cue_Network_Agent:
    def __init__(self):
        # Online model (the one being fitted) and target model.
        self.model = self.build_model()
        self.target_model = self.build_model()
        self.target_model.set_weights(self.model.get_weights())
        
        # The size of the replay memory.
        self.REPLAY_MEMORY_SIZE = 4_000
        
        # Q - Learning specific variable
        # The replay memory, using a deque (A queue that can be used from both sides).
        self.replay_memory = deque(maxlen= self.REPLAY_MEMORY_SIZE)

        # The minimum replay memory size.
        self.MIN_REPLAY_MEMORY_SIZE = 1_000
        
        # State and network attributes
            # Tracker
            # termination boolean
            # Last episode that was logged in
            # Training flag
        self.target_update_counter = 0 # Update tracker
        self.terminate = False  # is it terminal state
        self.last_logged_episode = 0
        self.training_initialized = False
        
        # Update the target model once the counter exceeds 5
        self.UPDATE_TARGET_EVERY = 5
        
    def build_model(self):
        '''
        Handles building the model: a randomly initialized (weights=None)
        VGG-16 architecture with global max pooling and one linear output per action.
        input :: None
        output :: keras.Model
        '''
        
        
        # Creating the VGG16 convolutional base (without its classification head)
        # TODO: try a pre-trained network (weights="imagenet")
        base_model = VGG16(include_top = False, weights = None, input_shape= (IM_HEIGHT,IM_WIDTH ,3))
        
        # getting the output 
        x = base_model.output
        
        # the pooling method
        x = GlobalMaxPool2D()(x)
        
        # One linear Q-value per action; step() defines 4 actions (0-3)
        predictions = Dense(4, activation="linear")(x)
        
        # setting the stage.
        model = Model(inputs = base_model.input, outputs=predictions)
        model.compile(loss="mse", optimizer=Adam(learning_rate = 0.009), metrics=["accuracy"])
        
        return model
    
    def update_replay_memory(self, transition):
        '''
         Appends a new transition to the replay memory.
         Input: transition -> Tuple :: (current_state, action, reward, new_state, done)
         Output: None
        '''
        self.replay_memory.append(transition)

    def train(self):
        
        if len(self.replay_memory) < self.MIN_REPLAY_MEMORY_SIZE:
            return
        
        print("Training started !!!!")
        
        # Getting a random sample from the replay memory relative to the minibatch size assigned.
        minibatch = random.sample(self.replay_memory, minibatch_size)
        
        # CURRENT Q - VALUES
        # Getting the current states from tuple which is at the 0th index
        # normalizing the frame and storing it in current states
        current_states = np.array([transition[0] for transition in minibatch])/255
        
        # Using them to predict
        current_qs_list = self.model.predict(current_states, PREDICTION_BATCH_SIZE)
            
        # New Q - values
        # Getting the new state which is at the 3rd index
        # Normalize the frames
        new_current_states = np.array([transition[3] for transition in minibatch])/255
        # FUTURE Q_LIST
        future_qs_list = self.target_model.predict(new_current_states, PREDICTION_BATCH_SIZE)

        # Turn it into a supervised learning problem, in a sense
        X = []
        y = []
        
        # Looping through the minibatch
        # Updating the q value if it is not over
        for index, (current_state, action, reward, new_state, done) in enumerate(minibatch):
            if not done:
                # The equation for updating the q-value
                new_q = reward + DISCOUNT * np.max(future_qs_list[index])
            else:
                new_q = reward
                
            # Overwrite the Q-value of the action taken with the new target
            current_qs = current_qs_list[index]
            current_qs[action] = new_q

            # X and Y like supervised learning.
            X.append(current_state)
            y.append(current_qs)
            
        # Count each new episode once for the target-model update schedule
        log_this_step = False
        if current_episode_ptr > self.last_logged_episode:
            log_this_step = True
        self.last_logged_episode = current_episode_ptr
        
        # Fitting the model
        self.model.fit(np.array(X)/255, np.array(y), batch_size = TRAINING_BATCH_SIZE, verbose = 0, shuffle = False)

        if log_this_step:
            self.target_update_counter += 1

        if self.target_update_counter > self.UPDATE_TARGET_EVERY:
            self.target_model.set_weights(self.model.get_weights())
            self.target_update_counter = 0
            
    def predict_qs(self, state):  # predict the Q-values for a single state
        return self.model.predict(np.array(state).reshape(-1, *state.shape)/255)[0]
    
    def thread_loop(self):
        # Toy data to warm up the model: one frame and one target per model output.
        X = np.random.uniform(size=(1, IM_HEIGHT, IM_WIDTH, 3)).astype(np.float32)
        y = np.random.uniform(size=(1, self.model.output_shape[-1])).astype(np.float32)
        
        # Fitting the model
        self.model.fit(X,y, verbose = False, batch_size = 1) # Fitting the model of batch size one.
            
        # Initialization setup
        self.training_initialized = True
            
        # Infinite loop.
        while True:
            if self.terminate:
                return
            self.train()
            time.sleep(0.02)
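The per-sample loop in `train()` builds regression targets from the Bellman update: for a non-terminal transition the target is `reward + DISCOUNT * max(future Q)`, for a terminal one it is just `reward`, and only the entry for the action taken is overwritten. The NumPy sketch below computes the same targets for a whole minibatch at once. `dqn_targets` is a hypothetical helper, and the Q-value arrays are made-up numbers rather than real network outputs.

```python
import numpy as np


def dqn_targets(current_qs, future_qs, actions, rewards, dones, discount=0.997):
    """Vectorized version of the target loop in Deep_Cue_Network_Agent.train()."""
    targets = np.array(current_qs, dtype=np.float64)  # copy of the online-model Q-values
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=bool)
    max_future = np.max(future_qs, axis=1)  # best Q-value of the next state (target model)
    new_q = np.where(dones, rewards, rewards + discount * max_future)
    targets[np.arange(len(targets)), np.asarray(actions)] = new_q  # only the taken actions change
    return targets


current = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
future = [[1.0, 2.0, 3.0, 4.0], [5.0, 5.0, 5.0, 5.0]]
y = dqn_targets(current, future, actions=[2, 0], rewards=[15, -100], dones=[False, True])
print(y)
```

Vectorizing removes the Python loop but produces exactly the same `y` array that `train()` passes to `model.fit`.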

## The main function
<p>In the main function we use the GPU for training and cap its memory at 2048 MB; feel free to change that depending on your hardware. We create a folder for storing the neural network's weights and, at the end, save the recorded statistics to a CSV file. After initializing the `EnvControl` object and the `Deep_Cue_Network_Agent`, we start the training thread so that the agent can predict and train at the same time. Then we run the agent for the specified number of episodes, in this case 500.</p>
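The training thread follows a common pattern: warm up, signal readiness, then loop until a stop flag is set. The sketch below expresses the same pattern with `threading.Event` instead of plain boolean attributes. `BackgroundTrainer` and its `train_step`/`warm_up` arguments are hypothetical stand-ins for `Deep_Cue_Network_Agent.train` and the warm-up `fit` in `thread_loop`.

```python
import threading


class BackgroundTrainer:
    """Run train_step repeatedly in a daemon thread until stop() is called."""

    def __init__(self, train_step, warm_up=None, interval=0.02):
        self._train_step = train_step
        self._warm_up = warm_up
        self._interval = interval
        self._stop_event = threading.Event()
        self.initialized = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        if self._warm_up is not None:
            self._warm_up()  # e.g. a first fit on toy data
        self.initialized.set()  # replaces polling a training_initialized flag
        while not self._stop_event.is_set():
            self._train_step()
            # wait() doubles as the sleep, but returns early once stop() is called
            self._stop_event.wait(self._interval)

    def start(self):
        self._thread.start()
        self.initialized.wait()  # block until the warm-up has finished

    def stop(self):
        self._stop_event.set()
        self._thread.join()

    def is_alive(self):
        return self._thread.is_alive()
```

Compared with `time.sleep(0.02)`, waiting on the event lets `stop()` return without sitting out the remainder of the sleep.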

One thing that is important to note is this block of code:
            # Play for given number of seconds only
            while True:

                # This part stays mostly the same, the change is to query a model for Q values
                if np.random.random() > epsilon:
                    # Get action from Q table [Use the network, use the .predict()]
                    action = np.argmax(agent.predict_qs(current_state))
                else:
                    # Get random action
                    action = np.random.randint(0, 4)
                    # This takes no time, so we add a delay matching 10 FPS (prediction above takes longer)
                    time.sleep(1/10)
This block handles the trade-off between exploration and exploitation, in particular this line:
`if np.random.random() > epsilon:`.
With probability 1 - epsilon the agent exploits the network's Q-values; otherwise it explores with a random action.
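The same decision can be written as a small function. In this sketch, the name `epsilon_greedy` and the injected NumPy `Generator` are assumptions chosen so the behaviour is reproducible in tests. The random action is drawn from `range(len(q_values))`, which keeps the number of random actions equal to the number of network outputs; in the original code the random branch draws from four actions, so the network needs four outputs as well.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Exploit (argmax of q_values) with probability 1 - epsilon, otherwise explore."""
    if rng.random() > epsilon:
        return int(np.argmax(q_values))  # exploitation: best known action
    return int(rng.integers(len(q_values)))  # exploration: uniform random action


rng = np.random.default_rng(40)
q = [0.1, 0.9, 0.3, 0.2]
print(epsilon_greedy(q, epsilon=0.0, rng=rng))  # greedy action
print(epsilon_greedy(q, epsilon=1.0, rng=rng))  # random action
```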

In [None]:
if __name__ == '__main__':

    # For stats
    ep_rewards = []
    ep_rewards.append(MIN_REWARD) # Starting out with a low reward
    full_stat = []
    
    # Header row
    full_stat.append(['average_reward','Min_reward', 'max_reward', 'epsilon'])

    # For more reproducible results
    random.seed(40)
    np.random.seed(40)
    tf.random.set_seed(40) #https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/random/set_seed

    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
      try:
        tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit= 1024*2)])
      except RuntimeError as e:
        print(e)

    # Create models folder
    if not os.path.isdir('models'):
        os.makedirs('models')

    # Create agent and environment
    agent = Deep_Cue_Network_Agent()
    env = EnvControl()           
    
    # Start training thread and wait for training to be initialized
    trainer_thread = Thread(target=agent.thread_loop, daemon=True)
    trainer_thread.start()
    while not agent.training_initialized:
        time.sleep(0.01)
            
    # Initialize predictions: the first prediction takes longer because of one-time initialization,
    # so it is better to make it now, before we start iterating over episode steps
    agent.predict_qs(np.ones((env.im_height, env.im_width, 3)))

    ID = 0

    # Iterate over episodes
    for episode in tqdm(range(1, EPISODES + 1), ascii=True, unit='episodes'):

            env.collision_hist = []

            # Episode pointer, read as a global by agent.train()
            current_episode_ptr = episode

            # Restarting episode - RESTART episode reward and step number
            episode_reward = 0
            step = 1
            
            # RESTART environment and get initial state
            current_state = env.RESTART()
            
            # RESTART flag and start iterating until episode ends
            done = False
            episode_start = time.time()
            
            # the data frame
            # Create the pandas DataFrame 
            df = pd.DataFrame() 
            df['Statistics'] = ["average_reward","min_reward","max_reward","epsilon"]

            # Play for given number of seconds only
            while True:

                # This part stays mostly the same, the change is to query a model for Q values
                if np.random.random() > epsilon:
                    # Get action from Q table [Use the network, use the .predict()]
                    action = np.argmax(agent.predict_qs(current_state))
                else:
                    # Get random action
                    action = np.random.randint(0, 4)
                    # This takes no time, so we add a delay matching 10 FPS (prediction above takes longer)
                    time.sleep(1/10)
                    
                # Take a step within the episode.
                # get all the information the new state and reward acquired and whether or not it is done.
                new_state, reward, done, _ = env.step(action)

                # Total reward per episode.
                episode_reward += reward

                # Update replay memory every step. 
                agent.update_replay_memory((current_state, action, reward, new_state, done))                   
                
                # Update the current state.
                current_state = new_state
                
                # Update the step, register a new step.
                step += 1
                
                # whether or not the episode is done.
                if done:
                    break       
                
            # Destroy all the actors objects at the end of the episode.
            for actor in env.actor_list:
                actor.destroy()      
                
            # Time to get the metadata, the statistics basically.
            # Append episode reward to a list and log stats (every given number of episodes)
            
            ep_rewards.append(episode_reward) # adding the current episode reward
            
            # Register the stats every GET_REWARD_STATS_EVERY episodes so we can keep track of progress:
            # the average, minimum and maximum reward over the last GET_REWARD_STATS_EVERY episodes
            if not episode % GET_REWARD_STATS_EVERY or episode == 1:
                
                # Getting the stats
                average_reward = sum(ep_rewards[-GET_REWARD_STATS_EVERY:])/len(ep_rewards[-GET_REWARD_STATS_EVERY:])
                min_reward = min(ep_rewards[-GET_REWARD_STATS_EVERY:])
                max_reward = max(ep_rewards[-GET_REWARD_STATS_EVERY:])
                
                # Collect the statistics for this window.
                agent_stats = [average_reward,min_reward, max_reward, epsilon]
                
                print()
                print("---------------------------")
                print("episode",episode)
                print("average_reward:min_reward:max_reward:epsilon")
                # Print stats.
                print(agent_stats)
                print("---------------------------")
                print()
                full_stat.append(agent_stats)
                
                # Move on to the next.
                

                # Save model, but only when min reward is greater or equal a set value
                if min_reward >= MIN_REWARD:
                    agent.model.save(f'models/{MODEL_NAME}__{max_reward:_>10.2f}max_{average_reward:_>10.2f}avg_{min_reward:_>7.2f}min_{EPISODES:_>4.0f}episodes__{int(time.time())}.model')                 
                    
            # Decay epsilon
            if epsilon > MIN_EPSILON:
                epsilon *= EPSILON_DECAY
                epsilon = max(MIN_EPSILON, epsilon) 

    # Training is over, so wrap things up:
    # set the termination flag for the training thread,
    # wait for the thread to finish,
    # and save the final model with its timestamp.
    agent.terminate = True
    trainer_thread.join() 
    agent.model.save(f'models/{MODEL_NAME}__{int(time.time())}.model')  
    
    # Time at the end of the script
    very_end = time.time()
    
    # The total time needed for the script to run
    total_time_from_start_to_finish = very_end - very_start
    
    from pprint import pprint
    
    print()
    pprint(full_stat)
    
    print()
    print()
    total_time_from_start_to_finish = total_time_from_start_to_finish /(60*60)  # seconds -> hours
    print("Time for the whole script to run (hours): ",total_time_from_start_to_finish)
    
    # Transfer the list of lists to pandas dataframe.
    df = pd.DataFrame(full_stat[1:], columns=full_stat[0])
    
    # Save the df
    df.to_csv(f'Carla_Metrics_AccEpsilon__{int(time.time())}.csv')
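The statistics block averages the last `GET_REWARD_STATS_EVERY` episode rewards. Because `ep_rewards` is seeded with `MIN_REWARD`, the episode-1 window also contains that -100 placeholder. The hypothetical helper `window_stats` below reproduces the window computation in plain Python.

```python
def window_stats(rewards, window):
    """Average, minimum and maximum of the last `window` rewards."""
    recent = rewards[-window:]
    return sum(recent) / len(recent), min(recent), max(recent)


ep_rewards = [-100, 10, 20, 30, 40, 50]  # -100 is the MIN_REWARD seed
print(window_stats(ep_rewards, 5))  # only the 5 most recent episodes
print(window_stats(ep_rewards[:2], 5))  # episode 1: the seed is included
```

Episode rewards are sums over many steps, so they can fall well below `MIN_REWARD` (for example, several -15 steps followed by a -100 collision), which is why the `min_reward >= MIN_REWARD` check acts as a filter on which models get saved.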

DISCLAIMER: All the images being used belong to the CARLA simulator.