# Tutorial 6 - Google Colab

[![View notebook on Github](https://img.shields.io/static/v1.svg?logo=github&label=Repo&message=View%20On%20Github&color=lightgrey)](https://github.com/avakanski/Fall-2023-Python-Programming-for-Data-Science/blob/main/docs/Lectures/Theme_2-Data_Engineering/Tutorial_6-Google_Colab/Tutorial_6-Google_Colab.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/avakanski/Fall-2023-Python-Programming-for-Data-Science/blob/main/docs/Lectures/Theme_2-Data_Engineering/Tutorial_6-Google_Colab/Tutorial_6-Google_Colab.ipynb)

<a id='top'></a>

Google Colab (or Colaboratory) is a cloud-based platform created by Google that offers an environment for writing, running, and sharing Python code within Google Drive. Colab runs Jupyter notebook files, and it comes with popular Data Science and Machine Learning libraries and frameworks pre-installed, such as TensorFlow, PyTorch, NumPy, pandas, and others.

Google Colab provides CPU, TPU, and GPU support. It enables real-time collaborative editing on a single notebook, much like the collaborative text editing functionality provided by Google Docs.

This is the official [Colab webpage](https://research.google.com/colaboratory/).

**Welcome Page of Colab**

The welcome page offers basic tutorials about working with Jupyter notebooks, data science, and machine learning.

It provides:

- Overview of Colab basic features, such as [Code Cells](https://colab.research.google.com/notebooks/basic_features_overview.ipynb) and [Markdown Cells](https://colab.research.google.com/notebooks/markdown_guide.ipynb)
- Loading data: Drive, Sheets, and Google Cloud Storage [Link](https://colab.research.google.com/notebooks/io.ipynb)
- Data Visualization [Link](https://colab.research.google.com/notebooks/charts.ipynb)
- Machine Learning introduction course [Link](https://developers.google.com/machine-learning/crash-course/)


**Top Menus**

In the top menu bar, Colab provides features very similar to those of the original Jupyter Notebook interface.

- **File**: create/rename/upload/move/save notebook files
- **Edit**: move/copy/paste cells, notebook settings
- **Insert**: code/text/section header cells/code snippets/add a form field
- **Runtime**: run/interrupt/restart cells/runtime type change (GPU<->CPU)
- **Tools**: command palette/settings/keyboard shortcuts, etc.
    

<img width="800" src="images/img1.jpg">

## Upload Files and Mount the Google Drive

1) Log in to Google Drive with your Gmail account ([Link](https://drive.google.com/)).
2) Create a folder to store your data files.

<img width="250" src="images/img2.jpg">

3) Create a new Jupyter notebook file. The file can be accessed in the "Colab Notebooks" folder in Google Drive.

<img width="250" src="images/img3.jpg">

4) Use the following code to mount your Google Drive in the Jupyter notebook.

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


5) Click `Connect to Google Drive` to permit the notebook to access Google Drive.

<img width="400" src="images/img4.jpg">

6) After mounting Google Drive, you can load files from it, for example a CSV file. Note that the path to the file needs to start with `drive/My Drive/...`, which is relative to Colab's default working directory, `/content`.

In [7]:
import pandas as pd
df_IMDb = pd.read_csv('drive/My Drive/data_file/IMDb_movies.csv')
df_IMDb

| Unnamed: 0.1 | Unnamed: 0 | Movie Name | Year of Release | Watch Time | Movie Rating | Metascore of movie | Gross | Votes | Description |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | The Shawshank Redemption | 1994 | 142 | 9.3 | 82.0 | 28.34 | 2777378 | Over the course of several years, two convicts... |
| 1 | 1 | The Godfather | 1972 | 175 | 9.2 | 100.0 | 134.97 | 1933588 | Don Vito Corleone, head of a mafia family, dec... |
| 2 | 2 | The Dark Knight | 2008 | 152 | 9.0 | 84.0 | 534.86 | 2754087 | When the menace known as the Joker wreaks havo... |
| 3 | 3 | Schindler's List | 1993 | 195 | 9.0 | 95.0 | 96.9 | 1397886 | In German-occupied Poland during World War II,... |
| 4 | 4 | 12 Angry Men | 1957 | 96 | 9.0 | 97.0 | 4.36 | 824211 | The jury in a New York City murder trial is fr... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 995 | Philomena | 2013 | 98 | 7.6 | 77.0 | 37.71 | 102336 | A world-weary political journalist picks up th... |
| 996 | 996 | Un long dimanche de fiançailles | 2004 | 133 | 7.6 | 76.0 | 6.17 | 75004 | Tells the story of a young woman's relentless ... |
| 997 | 997 | Shine | 1996 | 105 | 7.6 | 87.0 | 35.81 | 55589 | Pianist David Helfgott, driven by his father a... |
| 998 | 998 | The Invisible Man | 1933 | 71 | 7.6 | 87.0 |  | 37822 | A scientist finds a way of becoming invisible,... |
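
The path handling in step 6 can be sketched as a small helper. This is a minimal sketch, not part of the original tutorial: the function name `drive_path` is hypothetical, it assumes Drive is mounted at `/content/drive` as in step 4, and it assumes the Drive root folder may appear as either `My Drive` or `MyDrive` depending on the Colab version.

```python
from pathlib import Path


def drive_path(relative, mount_point="/content/drive"):
    """Return the absolute path of a file stored in Google Drive.

    Hypothetical helper: it checks both folder names under which the
    Drive root may appear and raises a clear error if neither exists.
    """
    mount = Path(mount_point)
    for root_name in ("My Drive", "MyDrive"):
        root = mount / root_name
        if root.is_dir():
            return root / relative
    raise FileNotFoundError(
        f"No Drive folder found under {mount}; run drive.mount() first."
    )
```

With Drive mounted, the result can be passed straight to pandas, for example `pd.read_csv(drive_path('data_file/IMDb_movies.csv'))`. Using an absolute path this way keeps the call working even if the notebook's working directory is changed.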


### Enable GPU

Colab has built-in features that allow users to switch between CPU and GPU for working with Data Science/Machine Learning models.

***Method 1:***

- Click `Edit → Notebook Settings`.
- Choose an available GPU under `Hardware Accelerator`, and click `Save`.


<img width="300" src="images/img6.jpg">



<img width="500" src="images/img7.jpg">


***Method 2:***

- Click `Runtime → Change runtime type`.
- Choose an available GPU under `Hardware Accelerator`, and click `Save`.

<img style="float: left; height:400px; width:auto" src="images/img10.jpg">

### Using GPU

Use the following code to verify that a GPU is available to your notebook. When the runtime is connected to a GPU, the code prints "Found GPU at: /device:GPU:0"; otherwise, it raises a `SystemError`.

In [4]:
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
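
The check above stops the script when no GPU is present. As an alternative, here is a hedged sketch that falls back to the CPU instead. The function name `pick_device` is hypothetical; the sketch relies on `tf.config.list_physical_devices`, and it assumes that running on `/CPU:0` is acceptable when TensorFlow finds no GPU or is not installed.

```python
def pick_device():
    """Return '/GPU:0' if TensorFlow sees a GPU, otherwise '/CPU:0'."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is pre-installed on Colab; this branch only covers
        # environments where it is missing.
        return "/CPU:0"
    gpus = tf.config.list_physical_devices("GPU")
    return "/GPU:0" if gpus else "/CPU:0"


device = pick_device()
print("Running on", device)
```

The returned string can then be used with `with tf.device(device):` around the model code, so the same notebook runs on either runtime type.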

### Monitor Your Hardware Resources

To monitor available and used resources, click the `Connect` button in the upper right corner of the screen. If a GPU is connected, the hardware stats are shown there.

<img width="400" src="images/img11.jpg">
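
The same information can also be approximated from code. The following is a rough sketch, not a Colab API: the function name `resource_report` is hypothetical, the RAM query assumes a Linux runtime (which Colab uses), and the GPU query assumes the `nvidia-smi` tool is on the path when an NVIDIA GPU is attached.

```python
import os
import shutil
import subprocess


def resource_report(path="/"):
    """Collect basic RAM, disk, and GPU information as a dictionary."""
    report = {}

    # Total physical RAM in GB (os.sysconf is available on Linux).
    try:
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        report["ram_total_gb"] = round(ram_bytes / 1e9, 1)
    except (AttributeError, ValueError, OSError):
        report["ram_total_gb"] = None

    # Disk usage of the filesystem that contains `path`.
    disk = shutil.disk_usage(path)
    report["disk_total_gb"] = round(disk.total / 1e9, 1)
    report["disk_free_gb"] = round(disk.free / 1e9, 1)

    # GPU name and memory via nvidia-smi, if the tool is present.
    report["gpus"] = []
    if shutil.which("nvidia-smi"):
        try:
            out = subprocess.run(
                ["nvidia-smi",
                 "--query-gpu=name,memory.total,memory.used",
                 "--format=csv,noheader"],
                capture_output=True, text=True, check=True, timeout=10,
            ).stdout
            report["gpus"] = [ln.strip() for ln in out.splitlines() if ln.strip()]
        except (subprocess.SubprocessError, OSError):
            pass  # nvidia-smi exists but no usable GPU or driver was found

    return report


print(resource_report())
```

An empty `gpus` list indicates that the runtime is CPU-only or that `nvidia-smi` could not be queried.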

### Colab Subscription Plans

Paid Colab plans provide better hardware, i.e., access to more powerful GPUs and more VRAM and RAM. They also allow longer sessions before timeout. For instance, Colab Pro+ allows a script to run for up to 24 hours with the web browser closed.

The free Colab version offers 16 GB of GPU RAM, while the paid versions can have up to 48 GB of RAM. Larger VRAM and RAM may be required to train some large language models (LLMs).

For the free version, only T4 GPUs are available, and sometimes no GPU is available at all. Sessions in the free version last at most 12 hours before timing out. In practice, a free session can time out and interrupt your script at any time. If you reconnect within 1-2 minutes, the session can be resumed; otherwise, you will have to re-run your script.

<img width="1000" src="images/img9.jpg">
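
Because a session can be interrupted at any time, long-running scripts benefit from saving progress periodically, so that a re-run resumes rather than starts over. The following is a minimal sketch under stated assumptions, not a Colab feature: the helper names and the toy loop are hypothetical, and in Colab the checkpoint file would be placed in a folder on the mounted Google Drive so that it survives the session.

```python
import json
import os


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state to a temporary file, then move it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # avoids leaving a half-written checkpoint behind


def run(path, n_steps=10, save_every=2):
    """Toy loop that sums step indices and checkpoints every few steps."""
    state = load_checkpoint(path)
    for step in range(state["next_step"], n_steps):
        state["total"] += step  # placeholder for the real work
        state["next_step"] = step + 1
        if state["next_step"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

Calling `run` again with the same checkpoint path continues from the last saved step. For model training, the same pattern applies with the framework's own save and load functions in place of the JSON file.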

[BACK TO TOP](#top)