# Advanced Programme in Deep Learning (Foundations and Applications)
## A Program by IISc and TalentSprint

### Assignment 31: Deep Q Learning

## Learning Objectives

At the end of the experiment, you will be able to:

* understand Q-learning
* differentiate between Q-learning and Deep Q-learning
* implement Deep Q-learning to solve the Atari Breakout environment

## Information

**Q-Learning**

The agent will perform the sequence of actions that will eventually generate the maximum total reward. This total reward is also called the **Q-value** and we will formalize our strategy as:

$$Q(s, a) = r(s, a) + \gamma \max_{a'} Q(s', a')$$

The above equation states that the Q-value yielded from being at state $s$ and performing action $a$ is the immediate reward $r(s,a)$ plus the highest Q-value possible from the next state $s'$. Here $\gamma$ (gamma) is the discount factor, which controls the contribution of rewards further in the future.

$Q(s', a')$ in turn depends on $Q(s'', a'')$, which then carries a coefficient of gamma squared. So, the Q-value depends on the Q-values of future states, as shown here, where $s^{(n)}$ denotes the state reached after $n$ steps:

$$Q(s, a) \rightarrow \gamma\, Q(s', a') + \gamma^2\, Q(s'', a'') + \dots + \gamma^n\, Q(s^{(n)}, a^{(n)})$$

Adjusting the value of gamma will diminish or increase the contribution of future rewards.
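
To make the effect of gamma concrete, here is a minimal sketch that sums the same reward sequence with two different discount factors. The helper `discounted_return` and the reward list are illustrative assumptions and not part of this notebook.

```python
def discounted_return(rewards, gamma):
    """Sum rewards, weighting the reward k steps ahead by gamma**k."""
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total


rewards = [1.0, 1.0, 1.0]  # hypothetical rewards for three consecutive steps

short_sighted = discounted_return(rewards, gamma=0.5)   # 1 + 0.5 + 0.25
far_sighted = discounted_return(rewards, gamma=0.99)    # 1 + 0.99 + 0.9801
print(short_sighted, far_sighted)
```

With the smaller gamma, later rewards contribute much less to the total than with a gamma close to 1.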

Since this is a recursive equation, we can start by making arbitrary assumptions for all Q-values. With experience, the estimates converge toward the optimal Q-values, from which the optimal policy follows.
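
The idea of starting from arbitrary Q-values and repeatedly applying the recursive equation can be sketched on a tiny, made-up problem. The three-state chain, its `step` function, and the initial guess of 5.0 are assumptions chosen for illustration. For simplicity the sketch sweeps over every state-action pair using known dynamics, rather than learning from sampled experience as an agent would.

```python
GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2   # states 0, 1, 2; actions 0 = left, 1 = right
TERMINAL = 2                 # reaching state 2 ends the episode


def step(state, action):
    """Deterministic toy dynamics: reward 1 only when the terminal state is reached."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


# Arbitrary initial guess for every non-terminal Q-value
Q = [[5.0] * N_ACTIONS for _ in range(N_STATES)]
Q[TERMINAL] = [0.0] * N_ACTIONS   # no future reward once the episode has ended

for _ in range(200):              # repeated sweeps over all state-action pairs
    for s in range(TERMINAL):
        for a in range(N_ACTIONS):
            s_next, r = step(s, a)
            Q[s][a] = r + GAMMA * max(Q[s_next])

# Greedy policy: the action with the highest Q-value in each non-terminal state
greedy_policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(TERMINAL)]
print(Q[:TERMINAL], greedy_policy)
```

Despite the poor initial guess, the values settle and the greedy policy moves right in both non-terminal states.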

To know more about Q-Learning, click [here](https://github.com/rishal-hurbans/Grokking-Artificial-Intelligence-Algorithms/tree/master/ch10-reinforcement_learning).




**Approximate Q-Learning and Deep Q-Learning**

The main problem with Q-Learning is that it does not scale well to large (or even medium) Markov Decision Processes with many states and actions, and it is hard to keep track of an estimate for every single Q-Value.
<br><br>
<center>
<img src="https://cdn.analyticsvidhya.com/wp-content/uploads/2019/04/Screenshot-2019-04-16-at-5.46.01-PM-670x440.png" width=650px />
</center>
<br><br>

The solution is to find a function $Q_θ(s, a)$ that approximates the Q-Value of any state-action pair (s, a) using a manageable number of parameters (given by the parameter vector θ). This is called **Approximate Q-Learning**.

$$Q_{\text{target}}(s, a) = r + \gamma \max_{a'} Q_{\theta}(s', a')$$

For years it was recommended to use linear combinations of handcrafted features extracted from the state to estimate Q-Values, but in 2013, DeepMind showed that deep neural networks can work much better, especially for complex problems, and that they do not require any feature engineering. A DNN used to estimate Q-Values is called a Deep Q-Network (DQN), and using a DQN for Approximate Q-Learning is called **Deep Q-Learning**.

In deep Q-learning, we use a neural network to approximate the Q-value function. The state is given as the input and the Q-value of all possible actions is generated as the output.
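
As a minimal sketch of Approximate Q-Learning, the snippet below uses the simplest possible $Q_\theta$: a linear function with one weight row per action. It takes a state as input and returns one Q-value per action, just as the DQN does. The function names, the feature vector, and the learning rate are assumptions for illustration; the notebook itself uses a convolutional network instead.

```python
import numpy as np


def q_values(theta, state):
    """Q_theta(s, .) for a linear approximator: one weight row per action."""
    return theta @ state


def td_target(theta, reward, next_state, gamma, done):
    """Target r + gamma * max_a' Q_theta(s', a'); just r when the episode ends."""
    if done:
        return reward
    return reward + gamma * np.max(q_values(theta, next_state))


def update(theta, state, action, target, learning_rate):
    """Move Q_theta(s, a) a small step toward the target."""
    new_theta = theta.copy()
    error = target - q_values(theta, state)[action]
    new_theta[action] += learning_rate * error * state
    return new_theta


theta = np.zeros((2, 3))             # 2 actions, 3 state features (hypothetical sizes)
state = np.array([1.0, 0.0, 0.0])    # hypothetical feature vector

target = td_target(theta, reward=1.0, next_state=state, gamma=0.99, done=True)
theta = update(theta, state, action=1, target=target, learning_rate=0.5)
print(q_values(theta, state))
```

Only the weights of the action that was taken change, which is the same effect the one-hot mask achieves in the training loop later in this notebook.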

### Implementing Deep Q-Learning for Atari Breakout

Let's implement Deep Q-Learning on the Atari Breakout (`ALE/Breakout-v5`) environment.

**Atari Breakout**

<center>
<img src="https://www.gannett-cdn.com/media/USATODAY/USATODAY/2013/05/14/atari-breakout-16_9.jpg?width=1023&height=578&fit=crop&format=pjpg&auto=webp" width=400px/>
</center>
<br><br>

In this environment, a board (paddle) moves along the bottom of the screen, returning a ball that destroys blocks at the top of the screen. The aim of the game is to remove all blocks and break out of the level. The agent must learn to control the board by moving left and right, returning the ball and removing all the blocks without letting the ball pass the board.

### Setup Steps:

In [None]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "2239822" #@param {type:"string"}

In [None]:
#@title Please enter your password (normally your phone number) to continue: { run: "auto", display-mode: "form" }
password = "9167668365" #@param {type:"string"}

In [None]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython
import warnings
warnings.filterwarnings("ignore")

ipython = get_ipython()

notebook= "M3_AST_31_Deep_Q_Learning_C" #name of the notebook

def setup():
    ipython.magic("sx pip3 install PyVirtualDisplay")
    ipython.magic("sx sudo apt-get install xvfb")
    ipython.magic("sx sudo apt-get install python-opengl")
    ipython.magic("sx sudo apt-get install ffmpeg")
    ipython.magic("sx pip install gym-notebook-wrapper")
    ipython.magic("sx pip install gym[atari]")
    ipython.magic("sx pip install gym[accept-rom-license]")
    ipython.magic("sx pip install pyglet")
    ipython.magic("sx sudo apt install freeglut3-dev freeglut3 libgl1-mesa-dev libglu1-mesa-dev libxext-dev libxt-dev")
    ipython.magic("sx sudo apt install python3-opengl libgl1-mesa-glx libglu1-mesa")


    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")

    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:
        print(r["err"])
        return None
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer1() and getAnswer2() and getComplexity() and getAdditional() and getConcepts() and getComments() and getMentorSupport():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional,
              "concepts" : Concepts, "record_id" : submission_id,
              "answer1" : Answer1, "answer2" : Answer2, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}
      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:
        print(r["err"])
        return None
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://dlfa-iisc.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else: return submission_id


def getAdditional():
  try:
    if not Additional:
      raise NameError
    else:
      return Additional
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None

def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None


# def getWalkthrough():
#   try:
#     if not Walkthrough:
#       raise NameError
#     else:
#       return Walkthrough
#   except NameError:
#     print ("Please answer Walkthrough Question")
#     return None

def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None


def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer1():
  try:
    if not Answer1:
      raise NameError
    else:
      return Answer1
  except NameError:
    print ("Please answer Question 1")
    return None

def getAnswer2():
  try:
    if not Answer2:
      raise NameError
    else:
      return Answer2
  except NameError:
    print ("Please answer Question 2")
    return None


def getId():
  try:
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


### Import required packages

In [None]:
import numpy as np
import gym
import gnwrapper
import glob
import io
import base64
from IPython.display import HTML
from pyvirtualdisplay import Display
from IPython import display as ipythondisplay
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import urllib.request

import warnings
warnings.filterwarnings("ignore")

In [None]:
# List of available environments
from gym import envs
print(envs.registry.all())

### Configure parameters

We will use an epsilon-greedy policy for choosing actions: with probability epsilon the agent samples a random action from the action space, and otherwise it takes the action with the highest predicted Q-value. Instead of using epsilon decay, we use linear annealing to decrease epsilon from 1.0 to 0.1 over 1 million frames, following DeepMind's specification.
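
The schedule and the action choice described above can be sketched in plain Python. The names `annealed_epsilon` and `epsilon_greedy` and the example Q-values are assumptions for illustration. The training loop in this notebook implements the same idea incrementally, by subtracting a small amount from epsilon on every frame.

```python
import random

EPSILON_MAX = 1.0
EPSILON_MIN = 0.1
ANNEAL_FRAMES = 1_000_000


def annealed_epsilon(frame):
    """Linearly decrease epsilon from EPSILON_MAX to EPSILON_MIN, then hold it."""
    fraction = min(frame / ANNEAL_FRAMES, 1.0)
    return EPSILON_MAX - fraction * (EPSILON_MAX - EPSILON_MIN)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(42)
print(annealed_epsilon(0), annealed_epsilon(500_000), annealed_epsilon(2_000_000))

# With epsilon = 0 the choice is always greedy
greedy_choice = epsilon_greedy([0.1, 0.7, 0.2, 0.0], epsilon=0.0, rng=rng)
print(greedy_choice)
```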

In [None]:
seed = 42

# Discount factor for future rewards
gamma = 0.99

# Epsilon greedy parameter
epsilon = 1.0

# Minimum epsilon greedy parameter
epsilon_min = 0.1

# Maximum epsilon greedy parameter
epsilon_max = 1.0

# Rate at which to reduce chance of random action being taken
epsilon_interval = (epsilon_max - epsilon_min)

# Size of batch taken from replay buffer
batch_size = 32

# Maximum number of steps (frames) per episode
max_steps_per_episode = 10000

Next, we define functions used to show the video by adding it to the Colab notebook.

In [None]:
display = Display(visible=0, size=(1400, 900))
display.start()

"""Utility functions to enable video recording of the gym environment and to display it.
To enable video, wrap the environment: env = wrap_env(env)"""

def show_video():
  mp4list = glob.glob('video/*.mp4')
  if len(mp4list) > 0:
    mp4 = mp4list[0]
    video = io.open(mp4, 'r+b').read()
    encoded = base64.b64encode(video)
    ipythondisplay.display(HTML(data='''<video alt="test" autoplay
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
  else:
    print("Could not find video")


def wrap_env(env):
    try:
        env = gnwrapper.Monitor(env, './video', "recording")
    except:
        env = gnwrapper.Monitor(env, './video', "recording")
    return env

### Instantiate the environment




In [None]:
# Create Breakout environment
# Initialize the environment "ALE/Breakout-v5".
env = wrap_env(gym.make("ALE/Breakout-v5"))
env.seed(seed)

Exception ignored in: <function RecordVideo.__del__ at 0x793b480d5f30>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gym/wrappers/record_video.py", line 219, in __del__
    self.close_video_recorder()
  File "/usr/local/lib/python3.10/dist-packages/gym/wrappers/record_video.py", line 186, in close_video_recorder
    if self.recording:
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 325, in __getattr__
    return getattr(self.env, name)
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 325, in __getattr__
    return getattr(self.env, name)
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 325, in __getattr__
    return getattr(self.env, name)
  [Previous line repeated 946 more times]
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 323, in __getattr__
    if name.startswith("_"):
RecursionError: maximum recursion depth exceeded while calling a Python object


(3444837047, 2669555309)

You can ignore the above runtime warnings (if any).

It is possible to specify various flavors of the environment via the keyword arguments `difficulty` and `mode`. A flavor is a combination of a game mode and a difficulty setting.


You may use the suffix "-ram" to switch to the RAM observation space. In v0 and v4, the suffixes "Deterministic" and "NoFrameskip" are available. These are no longer supported in v5. To obtain equivalent behavior, pass keyword arguments to `gym.make` as outlined in the general article on Atari environments. The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gym.make("Breakout-v0")`.

**Version History**

*   v5: Stickiness was added back and stochastic frameskipping was removed. The entire action space is used by default. The environments are now in the “ALE” namespace.

*   v4: Stickiness of actions was removed

*   v0: Initial version release (1.0.0)


Here is the [Atari Breakout](https://www.gymlibrary.dev/environments/atari/breakout/) reference document




For the v0 and v4 environments, the ID that gym expects is a concatenated string of the form `[Game][NoFrameskip | Deterministic | None][-v0 | -v4]`, for example `BreakoutNoFrameskip-v4`. Here's a quick explanation of each term:


*   “NoFrameskip”: Each step of the environment is one frame

*   “Deterministic”: Each step executes the same action for k frames and returns the kth frame. k = 4.

*   None: Same as “Deterministic” but k is sampled from [2, 5].
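
The naming scheme above can be sketched as a small string helper. The function `legacy_env_id` is hypothetical and only illustrates how the three parts combine; it does not call gym or check that the environment is actually installed.

```python
def legacy_env_id(game, frameskip_mode=None, version="v4"):
    """Compose a legacy (v0/v4) Atari environment ID from its three parts."""
    if frameskip_mode not in (None, "NoFrameskip", "Deterministic"):
        raise ValueError(f"unknown frameskip mode: {frameskip_mode!r}")
    if version not in ("v0", "v4"):
        raise ValueError(f"unknown legacy version: {version!r}")
    # None means no suffix between the game name and the version
    return f"{game}{frameskip_mode or ''}-{version}"


print(legacy_env_id("Breakout", "NoFrameskip"))
print(legacy_env_id("Breakout", "Deterministic", "v4"))
print(legacy_env_id("Breakout", None, "v0"))
```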


In [None]:
print('State shape: ', env.observation_space.shape)
print('Number of actions: ', env.action_space.n)

State shape:  (210, 160, 3)
Number of actions:  4


### Create the Deep Q-Network Model

A Deep Q-Network learns an approximation of the Q-table, i.e. a mapping from each state-action pair to its expected return. In every state of the Breakout environment, four actions can be taken:

* **0:** do nothing
* **1:** fire ball to start game
* **2:** move right
* **3:** move left

The environment provides the state, and the action is chosen by selecting the largest of the four Q-values predicted in the output layer.

In [None]:
# Resets the environment to an initial state and returns an initial observation.
# Shape of observation or state
env.reset().shape

(210, 160, 3)

Refer to the following [Deepmind paper](https://arxiv.org/pdf/1312.5602v1.pdf) for playing Atari with Deep Reinforcement Learning

In [None]:
# Write a function for model creation
num_actions = 4
def create_q_model():

    # Network architecture from the DeepMind paper, applied here to raw 210x160 RGB frames
    inputs = layers.Input(shape=(210, 160, 3))

    # Convolutions on the frames on the screen
    layer1 = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
    layer2 = layers.Conv2D(64, 4, strides=2, activation="relu")(layer1)
    layer3 = layers.Conv2D(64, 3, strides=1, activation="relu")(layer2)

    layer4 = layers.Flatten()(layer3)

    layer5 = layers.Dense(512, activation="relu")(layer4)
    action = layers.Dense(num_actions, activation="linear")(layer5)

    return keras.Model(inputs=inputs, outputs=action)

The first model predicts the Q-values that are used to select an action.

In [None]:
# Create model
model = create_q_model()
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 210, 160, 3)]     0         
                                                                 
 conv2d (Conv2D)             (None, 51, 39, 32)        6176      
                                                                 
 conv2d_1 (Conv2D)           (None, 24, 18, 64)        32832     
                                                                 
 conv2d_2 (Conv2D)           (None, 22, 16, 64)        36928     
                                                                 
 flatten (Flatten)           (None, 22528)             0         
                                                                 
 dense (Dense)               (None, 512)               11534848  
                                                                 
 dense_1 (Dense)             (None, 4)                 2052  
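
The output shapes and parameter counts in the summary above can be reproduced by hand. The sketch below assumes the Keras defaults used in `create_q_model` (no padding, one bias per filter or unit). The helper names `conv_out` and `conv_params` are made up for illustration.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


height, width, channels = 210, 160, 3                 # raw Breakout frame
layer_specs = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]    # (filters, kernel, stride)

total_params = 0
for filters, kernel, stride in layer_specs:
    total_params += conv_params(kernel, channels, filters)
    height = conv_out(height, kernel, stride)
    width = conv_out(width, kernel, stride)
    channels = filters
    print((height, width, channels))

flat = height * width * channels       # size of the Flatten output
total_params += flat * 512 + 512       # Dense(512): weights + biases
total_params += 512 * 4 + 4            # Dense(num_actions = 4)
print(flat, total_params)
```

Almost all parameters sit in the first dense layer, because the flattened feature map of the full-resolution frame is large.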

Now build a target model for predicting future rewards. If the same network computed both the predicted Q-value and the target Q-value, every weight update would also shift the target, which can make training unstable or divergent. So, instead of using one neural network for learning, we use two.

The weights of the target model are updated only every 10000 steps, so the target Q-value stays stable while the loss between the Q-values is calculated.
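
The periodic update can be sketched with plain NumPy arrays standing in for network weights. The helper `sync_if_due` and the toy weight lists are assumptions for illustration; the notebook performs the same copy with `model_target.set_weights(model.get_weights())`.

```python
import numpy as np

UPDATE_TARGET_EVERY = 10_000


def sync_if_due(online_weights, target_weights, step):
    """Return copies of the online weights on sync steps, else the unchanged target weights."""
    if step % UPDATE_TARGET_EVERY == 0:
        return [w.copy() for w in online_weights]
    return target_weights


online = [np.ones((2, 2))]    # stands in for the weights of the model being trained
target = [np.zeros((2, 2))]   # stands in for the weights of the target model

target = sync_if_due(online, target, step=9_999)
stale = target[0].sum()       # target is untouched between sync steps

target = sync_if_due(online, target, step=10_000)
fresh = target[0].sum()       # target now matches the online weights
print(stale, fresh)
```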

In [None]:
# Create target model
model_target = create_q_model()

### Train the Model

The following pseudocode describes deep Q-learning with experience replay.


<br><br>

<center>
<img src="https://cdn.iisc.talentsprint.com/DLFA/Experiment_related_data/deep_Q_learning.png" width=480px height=480px />
</center>
<br><br>




**Note:** The code cell below might take some time to run, so using a GPU is recommended. Refer to the following [link](https://towardsdatascience.com/reinforcement-learning-explained-visually-part-5-deep-q-networks-step-by-step-5a5317197f4b) for more details on training Deep Q-Networks.
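
The experience replay memory used by the algorithm can be sketched as a bounded buffer. This is one possible design, not the one used in this notebook, which keeps five parallel Python lists and trims them with `del`. The `ReplayBuffer` class and `Transition` tuple are illustrative names. One difference to note: `random.sample` draws without replacement, whereas `np.random.choice` in the training cell draws with replacement by default.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-capacity memory that drops the oldest transition when full."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random batch of stored transitions."""
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3, seed=42)
for t in range(5):                        # integers stand in for frames
    buffer.add(t, 0, 0.0, t + 1, False)

batch = buffer.sample(2)
print(len(buffer), [tr.state for tr in buffer.memory])
```

A `deque` with `maxlen` discards the oldest entry in constant time, whereas deleting the first element of a long list requires shifting all remaining elements.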

In [None]:
from pyglet.gl import *

In [None]:
# The DeepMind paper uses RMSProp; the Adam optimizer is used here
# because it improves training time
optimizer = keras.optimizers.Adam(learning_rate=0.00025, clipnorm=1.0)

# Experience replay buffers
# Reinforcement learning algorithms use replay buffers to store trajectories of experience
# when executing a policy in an environment. During training, replay buffers are queried for a
# subset of the trajectories (either a sequential subset or a sample) to "replay" the agent's experience
action_history = []
state_history = []
state_next_history = []
rewards_history = []
done_history = []
episode_reward_history = []
running_reward = 0
episode_count = 0
frame_count = 0

# Number of frames to take random action and observe output
epsilon_random_frames = 50000

# Number of frames for exploration
epsilon_greedy_frames = 1000000.0

# Maximum replay length
max_memory_length = 100000

# Train the model after 4 actions
update_after_actions = 4

# How often to update the target network
update_target_network = 10000

# Use Huber loss for stability and to avoid exploding gradients.
# Huber loss combines quadratic and linear penalties and has a
# hyperparameter delta (δ): the loss is quadratic for errors below delta and linear above it.
# Compared with MSE, Huber loss is less sensitive to outliers, because large errors
# are penalized linearly (like MAE) instead of quadratically.
loss_function = keras.losses.Huber()

while True:  # Run until solved
    state = np.array(env.reset())
    episode_reward = 0

    # Each pass of the outer loop is one episode (game); this inner loop steps through its frames
    for timestep in range(1, max_steps_per_episode):
        frame_count += 1

        # Use epsilon-greedy for exploration to select an action
        if frame_count < epsilon_random_frames or epsilon > np.random.rand(1)[0]:
            # With the probability epsilon, we take random action
            action = np.random.choice(num_actions)
        else:
            # Predict action Q-values
            # From environment state
            state_tensor = tf.convert_to_tensor(state)
            state_tensor = tf.expand_dims(state_tensor, 0)
            action_probs = model(state_tensor, training=False)
            # Take best action
            # with probability 1-epsilon, we select an action that has a maximum Q-value
            action = tf.argmax(action_probs[0]).numpy()

        # Linearly anneal the probability of taking a random action.
        # Early in training the agent mostly explores with random actions;
        # as the Q-function becomes more reliable, epsilon shrinks and the agent
        # increasingly exploits its own predictions.
        epsilon -= epsilon_interval / epsilon_greedy_frames
        epsilon = max(epsilon, epsilon_min)

        # Is used to display the environment image
        env.render()

        # Apply the sampled action in our environment
        # env.step - executes the given action and returns four values
        state_next, reward, done, _ = env.step(action)
        state_next = np.array(state_next)

        episode_reward += reward

        # Save actions and states in replay buffer
        action_history.append(action)
        state_history.append(state)
        state_next_history.append(state_next)
        done_history.append(done)
        rewards_history.append(reward)
        state = state_next

        # Update every fourth frame, once the replay buffer holds more than batch_size transitions
        if frame_count % update_after_actions == 0 and len(done_history) > batch_size:

            # Get indices of samples for replay buffers
            indices = np.random.choice(range(len(done_history)), size=batch_size)

            # Using list comprehension to sample from replay buffer
            state_sample = np.array([state_history[i] for i in indices])
            state_next_sample = np.array([state_next_history[i] for i in indices])
            rewards_sample = [rewards_history[i] for i in indices]
            action_sample = [action_history[i] for i in indices]
            done_sample = tf.convert_to_tensor([float(done_history[i]) for i in indices])

            # Build the updated Q-values for the sampled future states
            # Use the target model for stability
            # The Target network takes the next state from each data sample and predicts
            # the best (max predicted Q value) out of all actions that can be taken from that state. This is the ‘Target Q Value’
            future_rewards = model_target.predict(state_next_sample)

            # Q value = reward + discount factor * expected future reward
            # Compute Q value
            updated_q_values = rewards_sample + gamma * tf.reduce_max(future_rewards, axis=1)

            # If final frame set the last value to -1
            updated_q_values = updated_q_values * (1 - done_sample) - done_sample

            # Create a mask so we only calculate loss on the updated Q-values
            masks = tf.one_hot(action_sample, num_actions)

            with tf.GradientTape() as tape:

                # Train the model on the states and updated Q-values
                # The Q network takes the current state and action from each data sample
                # and predicts the Q value for that particular action. This is the ‘Predicted Q Value’.
                q_values = model(state_sample)

                # Apply the masks to the Q-values to get the Q-value for action taken
                q_action = tf.reduce_sum(tf.multiply(q_values, masks), axis=1)

                # Calculate loss between new Q-value and old Q-value
                loss = loss_function(updated_q_values, q_action)

            # Backpropagation
            # after we compute the loss using the given loss function, and we use the tape to compute
            # the gradient of the loss with regard to the model’s trainable variables. Again,
            # these gradients will be tweaked later, before we apply them, depending on how good or bad the action turned out to be.
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))

        if frame_count % update_target_network == 0:
            # update the target network with new weights
            model_target.set_weights(model.get_weights())
            # Log details
            template = "running reward: {:.2f} at episode {}, frame count {}"
            print(template.format(running_reward, episode_count, frame_count))

        # Limit the state and reward history
        if len(rewards_history) > max_memory_length:
            del rewards_history[:1]
            del state_history[:1]
            del state_next_history[:1]
            del action_history[:1]
            del done_history[:1]

        if done:
            break

    # Update running reward to check condition for solving
    episode_reward_history.append(episode_reward)
    if len(episode_reward_history) > 100:
        del episode_reward_history[:1]
    running_reward = np.mean(episode_reward_history)

    episode_count += 1

    # Condition to consider the task solved (or stop after 100 episodes)
    # Note that this execution may take more than 30 minutes
    if running_reward > 2.4 or episode_count > 100:
        print("Stopped at episode {}!".format(episode_count))
        break





running reward: 1.26 at episode 54, frame count 10000
Stopped at episode 101!
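
The arithmetic of a single update in the training cell above can be checked by hand on a two-sample batch. The sketch below mirrors the target, mask, and loss computations with NumPy in place of TensorFlow. All Q-values and rewards are made up for illustration, the `huber` helper is a hand-written stand-in, and `delta=1.0` is assumed to match the default of `keras.losses.Huber`.

```python
import numpy as np


def huber(error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    abs_error = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * (abs_error - 0.5 * delta)
    return np.where(abs_error <= delta, quadratic, linear)


gamma, num_actions = 0.99, 4
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])           # the second sample ends its episode
actions = np.array([2, 0])             # actions that were actually taken

# Made-up network outputs: one row per sample, one column per action
target_next_q = np.array([[0.0, 1.0, 2.0, 0.5],
                          [3.0, 0.0, 0.0, 0.0]])
online_q = np.array([[0.0, 0.0, 2.0, 0.0],
                     [1.0, 0.0, 0.0, 0.0]])

# Q value = reward + discount factor * best predicted future value
updated_q = rewards + gamma * target_next_q.max(axis=1)
# Terminal samples get the fixed value -1, as in the training cell
updated_q = updated_q * (1 - dones) - dones

# The one-hot mask keeps only the Q-value of the action that was taken
masks = np.eye(num_actions)[actions]
q_action = (online_q * masks).sum(axis=1)

loss = huber(updated_q - q_action).mean()
print(updated_q, q_action, loss)
```

The first sample has a small error and is penalized quadratically, while the terminal sample has an error of magnitude 2 and falls in the linear region of the Huber loss.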


**Here**, training stops when either `running_reward > 2.4` or `episode_count > 100`. Raising the reward threshold lets the agent train longer and can improve learning, provided the episode limit is raised as well; raising the episode limit also increases the training time.
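
The stopping rule can be isolated in a small helper to experiment with different thresholds. The function `update_and_check` and the example rewards are assumptions for illustration. The 100-episode window mirrors the trimming of `episode_reward_history` in the training cell.

```python
from collections import deque


def update_and_check(history, episode_reward, episode_count,
                     reward_threshold=2.4, max_episodes=100):
    """Record an episode reward and report the running mean and whether to stop."""
    history.append(episode_reward)
    running_reward = sum(history) / len(history)
    stop = running_reward > reward_threshold or episode_count > max_episodes
    return running_reward, stop


history = deque(maxlen=100)   # keeps only the 100 most recent episode rewards

running_1, stop_1 = update_and_check(history, 2.0, episode_count=1)
running_2, stop_2 = update_and_check(history, 4.0, episode_count=2)
print(running_1, stop_1, running_2, stop_2)
```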




**Note:** The DeepMind paper trained for "a total of 50 million frames (that is, around 38 days of game experience in total)". However, this implementation should give good results at around 10 million frames, which can be processed in less than 24 hours on a modern machine.


### Visualizations

In [None]:
# Visualize training
env.close()
show_video()

We can also get a gist of how the learning progresses through the figures given below.

Before any training:

<img src="https://i.imgur.com/rRxXF4H.gif" />

In early stages of training:

<img src="https://i.imgur.com/X8ghdpL.gif" />

In later stages of training:

<img src="https://i.imgur.com/Z1K6qBQ.gif" />

### Please answer the questions below to complete the experiment:




Consider the statements given below and answer Q1.

A. A Q-table is a simple data structure that we use to keep track of the states, actions, and their expected rewards. More specifically, the Q-table maps a state-action pair to a Q-value (the estimated optimal future value) which the agent will learn.

B. A Q-table is a data structure used to calculate the minimum expected future rewards for the action at each state.

C. In Deep Q-Learning, the regular Q-table can be replaced with either neural network or MLP or CNN.

In [None]:
#@title Q.1. Which of the above statements is/are False?
Answer1 = "Both B & C" #@param ["","Only A","Both A & B","Only B", "Only C", "Both B & C"]


In [None]:
#@title Q.2. In DQNs (Deep Q-Networks), the Experience buffer(replay buffer) is used to store past experiences in memory and replay some of them by randomly sampling from a uniform distribution. Randomly sampled training data from the replay buffer provides an identical and independent distributions for training the DQN. Thus, it helps avoid possible sampling bias to improve learning performance.
Answer2 = "TRUE" #@param ["","TRUE", "FALSE"]


In [None]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good and Challenging for me" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [None]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "nil" #@param {type:"string"}


In [None]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["","Yes", "No"]


In [None]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 2592
Date of submission:  18 Feb 2024
Time of submission:  11:59:05
View your submissions: https://dlfa-iisc.talentsprint.com/notebook_submissions
