# Tutorial 5 - Google Colab

[![View notebook on Github](https://img.shields.io/static/v1.svg?logo=github&label=Repo&message=View%20On%20Github&color=lightgrey)](https://github.com/avakanski/Fall-2025-Applied-Data-Science-with-Python/blob/main/docs/Lectures/Tutorials/Tutorial_5-Google_Colab/Tutorial_5-Google_Colab.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/avakanski/Fall-2025-Applied-Data-Science-with-Python/blob/main/docs/Lectures/Tutorials/Tutorial_5-Google_Colab/Tutorial_5-Google_Colab.ipynb)

<a id='top'></a>

Google Colab (or Colaboratory) is a cloud-based platform created by Google that offers an environment for writing, running, and sharing Python code within Google Drive. Colab runs Jupyter notebook files, and it comes with popular Data Science and Machine Learning libraries and frameworks pre-installed, such as TensorFlow, PyTorch, NumPy, pandas, and others.

Google Colab provides CPU, TPU, and GPU support. It enables real-time collaborative editing on a single notebook, much like the collaborative text editing functionality provided by Google Docs.

Because these libraries are preinstalled, we don't need to install common libraries like `scikit-learn`, `pandas`, `numpy`, `keras`, `pytorch`, etc. We can import them directly.

If we need to install additional libraries that are not part of Colab, we can use `!pip install` as in the following example.

    !pip install -q matplotlib-venn
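Before installing a library, it can be useful to check whether the runtime already provides it. The following is a minimal sketch using only the Python standard library. The helper name `is_installed` is hypothetical. Note that the import name of a package can differ from its pip name (e.g., `matplotlib-venn` is imported as `matplotlib_venn`).

```python
import importlib.util


def is_installed(module_name):
    """Return True if the module can be imported in the current runtime."""
    return importlib.util.find_spec(module_name) is not None


# Check a few import names; the list itself is only an illustration.
for name in ["numpy", "pandas", "matplotlib_venn"]:
    status = "available" if is_installed(name) else "missing (install with !pip install)"
    print(f"{name}: {status}")
```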

We can also use `!` to run shell commands in Colab notebooks.

    !ls
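When the output of a shell command is needed as a Python value rather than just printed, one option is the standard `subprocess` module. The sketch below assumes a Linux runtime where `ls` is available (as in Colab). The helper name `run_command` is hypothetical.

```python
import subprocess


def run_command(args):
    """Run a command and return its standard output as a list of lines."""
    # check=True raises CalledProcessError if the command exits with an error
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# List the current working directory, one entry per line.
for entry in run_command(["ls"]):
    print(entry)
```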

## Top Menus

On the top menu section, Colab provides very similar features to the original Jupyter Notebook interface.

- **File**: create new notebook, open/rename/save/download existing notebooks
- **Edit**: copy/paste cells, notebook settings
- **Insert**: code/text/section header cells, code snippets, add a form field
- **Runtime**: run cells, interrupt runtime, restart runtime, **change runtime type (CPU, GPU, TPU)**
- **Tools**: command palette, settings, keyboard shortcuts, etc.

## Upload Files and Mount the Google Drive

If you need to process your own files in a notebook (e.g., to train a model on your dataset), you first need to upload the local files to Google Drive, and then mount the Google Drive in the notebook.

1) Log in to Google Drive with your Gmail account ([Link](https://drive.google.com/)).
2) Create a folder to store your data files, and upload the data files to the folder (e.g., by dragging and dropping them from a local directory).

<img width="250" src="images/img2.jpg">

3) Use the following code to mount your Google Drive in the Jupyter notebook.

    from google.colab import drive
    drive.mount('/content/drive')

Output:

    Mounted at /content/drive


4) Click `Connect to Google Drive` to permit the notebook to access Google Drive.

<img width="400" src="images/img4.jpg">

5) After you mount the Google Drive, you can load files from the drive. For example, the following code loads a CSV file. Note that the path to the file needs to start with `drive/My Drive/...`.

    import pandas as pd

    # The relative path is resolved from Colab's default working directory, /content
    df_IMDb = pd.read_csv('drive/My Drive/data_file/IMDb_movies.csv')
    df_IMDb

| Unnamed: 0.1 | Unnamed: 0 | Movie Name | Year of Release | Watch Time | Movie Rating | Metascore of movie | Gross | Votes | Description |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | The Shawshank Redemption | 1994 | 142 | 9.3 | 82.0 | 28.34 | 2777378 | Over the course of several years, two convicts... |
| 1 | 1 | The Godfather | 1972 | 175 | 9.2 | 100.0 | 134.97 | 1933588 | Don Vito Corleone, head of a mafia family, dec... |
| 2 | 2 | The Dark Knight | 2008 | 152 | 9.0 | 84.0 | 534.86 | 2754087 | When the menace known as the Joker wreaks havo... |
| 3 | 3 | Schindler's List | 1993 | 195 | 9.0 | 95.0 | 96.9 | 1397886 | In German-occupied Poland during World War II,... |
| 4 | 4 | 12 Angry Men | 1957 | 96 | 9.0 | 97.0 | 4.36 | 824211 | The jury in a New York City murder trial is fr... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 995 | Philomena | 2013 | 98 | 7.6 | 77.0 | 37.71 | 102336 | A world-weary political journalist picks up th... |
| 996 | 996 | Un long dimanche de fiançailles | 2004 | 133 | 7.6 | 76.0 | 6.17 | 75004 | Tells the story of a young woman's relentless ... |
| 997 | 997 | Shine | 1996 | 105 | 7.6 | 87.0 | 35.81 | 55589 | Pianist David Helfgott, driven by his father a... |
| 998 | 998 | The Invisible Man | 1933 | 71 | 7.6 | 87.0 |  | 37822 | A scientist finds a way of becoming invisible,... |
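A wrong path is a common source of errors when loading files from the mounted drive. The following minimal sketch builds the file path under the mount point and fails with a descriptive message if the drive is not mounted or the file is missing. The helper name `drive_file` is hypothetical. The mount point `/content/drive` and the example file path are the ones used in the steps above.

```python
from pathlib import Path


def drive_file(relative_path, mount_point="/content/drive"):
    """Return the full path of a file under the mounted Google Drive.

    Raises FileNotFoundError with a hint if the mount point or the file is missing.
    """
    root = Path(mount_point)
    if not root.is_dir():
        raise FileNotFoundError(
            f"{root} does not exist; run drive.mount('{mount_point}') first."
        )
    path = root / relative_path
    if not path.is_file():
        raise FileNotFoundError(f"No file at {path}; check the folder and file names.")
    return path


# Example usage inside Colab (commented out because it needs a mounted drive):
# import pandas as pd
# df_IMDb = pd.read_csv(drive_file("My Drive/data_file/IMDb_movies.csv"))
```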


## Enable GPU

When you run a notebook, the panel in the upper right corner of the screen indicates whether the notebook is connected to a GPU or CPU. 

For instance, the notebook in the following figure is connected to a Tesla T4 GPU. 

<img width="800" src="images/img12.png">

To change the runtime, click on the arrow in the upper right corner of your screen, as shown in the next figure, and from the menu select `Change runtime type`.

<img width="400" src="images/img13.png">

This will open the window shown in the following figure. 

The `Runtime type` dropdown allows us to select between the `Python3` and `R` programming languages. We can leave it at `Python3`.

Importantly, the `Hardware accelerator` field allows us to select the hardware for the runtime. The options include `CPU`, three `GPU` options, and `TPU`. Note that these options reflect my subscription to Colab Pro. If you are using the free Colab version, you may see only `T4 GPU` available.

Also, with the Colab Pro subscription, there is a `High-RAM` button, which allows users to allocate additional RAM to the notebook. This feature can be helpful when working with large datasets that exceed the standard RAM allocation. The standard RAM allocation in Colab is 12 GB, and the High-RAM option allocates 25 GB.

After you select the Hardware Accelerator, click `Save`.

<img width="400" src="images/img14.png">

There are two other methods to change the runtime type in Colab.

***Method 1:***

- Click `Runtime → Change runtime type`.

<img style="float: left; height:400px; width:auto" src="images/img10.jpg">

***Method 2:***

- Click `Edit → Notebook Settings`.

<img width="300" src="images/img6.jpg">

### Check GPU

When connected to a GPU, use the following code to check the details about the GPU. 

    !nvidia-smi
    
<img width="600" src="images/img15.png">   

Also, the following code can be used to check whether the notebook is connected to a GPU. If connected, it will print "Found GPU at: /device:GPU:0". Otherwise, it raises a `SystemError`.

    import tensorflow as tf

    # Returns the name of the GPU device, or an empty string if no GPU is found
    device_name = tf.test.gpu_device_name()
    if device_name != '/device:GPU:0':
        raise SystemError('GPU device not found')
    print('Found GPU at: {}'.format(device_name))
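The `nvidia-smi` check can also be scripted without importing a deep learning framework. The sketch below calls `nvidia-smi` through the standard `subprocess` module and returns an empty list when no NVIDIA GPU is visible, so it runs on CPU-only runtimes as well. The helper name `gpu_summary` is hypothetical.

```python
import shutil
import subprocess


def gpu_summary():
    """Return one 'name, total memory' string per visible NVIDIA GPU (empty list if none)."""
    # On a CPU-only runtime the nvidia-smi tool is typically not available
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = gpu_summary()
print(gpus if gpus else "No GPU found; running on CPU")
```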

## Left Pane

In the left pane, Colab provides several useful tools for managing the notebook environment.

- **Table of contents**: display the outline of the notebook
- **Find and replace**: search for text within the notebook
- **Variables**: display all the variables that are currently defined in the notebook
- **Secrets**: for saving and managing API keys and other sensitive information
- **Files**: display files available in your Colab environment, including files in your Google Drive

<img width="400" src="images/img18.png">
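Values stored in the **Secrets** tool can be read in code through `userdata.get` from the `google.colab` package, provided that notebook access is enabled for the secret. The following is a minimal sketch that falls back to an environment variable when the code runs outside Colab. The helper name `get_secret` and the secret name `MY_API_KEY` are hypothetical.

```python
import os


def get_secret(name, default=None):
    """Read a secret from Colab's Secrets tool, or from the environment outside Colab."""
    try:
        from google.colab import userdata  # only available inside Colab
    except ImportError:
        return os.environ.get(name, default)
    try:
        return userdata.get(name)
    except Exception:
        # Assumption: any failure (e.g., secret missing or access not granted)
        # is treated as "not set" in this sketch
        return default


api_key = get_secret("MY_API_KEY")  # hypothetical secret name
print("Secret is set" if api_key else "Secret is not set")
```

Reading keys this way keeps them out of the notebook text, which matters when the notebook is shared.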

## Colab Subscription Plans

Paid Colab subscription plans provide better hardware, i.e., access to more powerful GPUs, and more VRAM and RAM. They also allow longer sessions before timeout. For instance, Colab Pro+ allows you to run the script for up to 24 hours with the web browser closed.

With the free version (Pay As You Go), only T4 GPUs are offered, and at times no GPU may be available. Sessions in the free version last at most 12 hours before timeout. In practice, a free session can time out and interrupt your script at any time. If you reconnect within the next 1-2 minutes, the session can be resumed; otherwise, you will have to re-run your script.
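Since a session can be interrupted at any time, one way to limit the cost of re-running a script is to save progress periodically and resume from the last saved step. The following is a minimal sketch of that idea using a JSON checkpoint file. The function names and the squared-number computation are placeholders. In Colab, the checkpoint file would typically be placed in the mounted Google Drive so that it survives the session, whereas the demo below uses a temporary directory.

```python
import json
import tempfile
from pathlib import Path


def load_progress(checkpoint_file):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    path = Path(checkpoint_file)
    if path.is_file():
        return json.loads(path.read_text())
    return {"next_step": 0, "results": []}


def save_progress(checkpoint_file, state):
    """Write the state to a temporary file first, then swap it into place."""
    path = Path(checkpoint_file)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def run(checkpoint_file, total_steps):
    """Run the remaining steps, saving the state after each one."""
    state = load_progress(checkpoint_file)
    for step in range(state["next_step"], total_steps):
        state["results"].append(step * step)  # placeholder for the real work
        state["next_step"] = step + 1
        save_progress(checkpoint_file, state)
    return state


# Demo: the second call resumes from step 3 instead of starting over.
with tempfile.TemporaryDirectory() as folder:
    checkpoint = Path(folder) / "progress.json"
    run(checkpoint, total_steps=3)
    print(run(checkpoint, total_steps=5)["results"])
```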

For this course, it is recommended to subscribe to Colab Pro. Note that Colab Pro provides a certain number of computational units, which allow you to use more powerful GPUs (like the A100) and High-RAM. For the purposes of this course, T4 GPUs are sufficient. Therefore, you don't need to purchase any additional computational units; just purchase the subscription of $9.99 per month and use a T4 GPU with standard RAM.

<img width="1000" src="images/img9.jpg">

### Monitor Your Hardware Resources

To monitor available and used resources, click on the arrow in the upper right corner of your screen, and select `View resources`. This will show resource statistics, as in the following figures, including the current subscription, available computational units, and used VRAM and RAM.

<img width="400" src="images/img16.png">

<img width="400" src="images/img17.png">
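Some of these statistics can also be read from code. The sketch below reports the CPU count, total RAM, and disk space using only the Python standard library. It assumes a Linux runtime (as in Colab), where `os.sysconf` exposes the memory page size and page count. The helper name `resource_report` is hypothetical.

```python
import os
import shutil


def resource_report(path="/"):
    """Return basic CPU, RAM, and disk figures for the current runtime."""
    gib = 1024 ** 3
    # Total physical memory = page size in bytes * number of physical pages
    total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk = shutil.disk_usage(path)
    return {
        "cpu_count": os.cpu_count(),
        "ram_total_gib": round(total_ram / gib, 1),
        "disk_total_gib": round(disk.total / gib, 1),
        "disk_free_gib": round(disk.free / gib, 1),
    }


for key, value in resource_report().items():
    print(f"{key}: {value}")
```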

## Welcome Page of Colab

The [welcome page of Colab](https://research.google.com/colaboratory/) offers basic tutorials about working with Jupyter notebooks, data science, and machine learning. 

It includes:

- Overview of Colab basic features, such as [Code Cells](https://colab.research.google.com/notebooks/basic_features_overview.ipynb) and [Markdown Cells](https://colab.research.google.com/notebooks/markdown_guide.ipynb)
- Loading data: Drive, Sheets, and Google Cloud Storage [Link](https://colab.research.google.com/notebooks/io.ipynb)
- Data Visualization [Link](https://colab.research.google.com/notebooks/charts.ipynb)
- Machine Learning Introduction Course [Link](https://developers.google.com/machine-learning/crash-course/)

<img width="600" src="images/img1.jpg">

## References

1. "Colab 101: Your Ultimate Beginner's Guide!" by Sam Witteveen, available at: [https://www.youtube.com/watch?v=Ii6gs9zADEA](https://www.youtube.com/watch?v=Ii6gs9zADEA).

[BACK TO TOP](#top)