# Using Kaggle datasets in Colab

**Neural Networks for Machine Learning Applications**<br>
Rabindra Manandhar<br>
[Information Technology, Bachelor's Degree](https://www.metropolia.fi/en/academics/bachelors-degrees/information-technology)<br>
[Metropolia University of Applied Sciences](https://www.metropolia.fi/en)


## Instructions

These instructions were adapted from: https://www.kaggle.com/general/74235

Here is a quick summary:
1. First, install the extra package needed to download datasets directly from Kaggle.
2. Then get your Kaggle API token and upload it to the /root/.kaggle folder on the Linux server that runs the Notebook.
3. After that, mount your Google Drive so that the Notebook has access to it.
4. Then download the dataset and extract it using unzip.
5. Finally, the dataset is stored in your Google Drive. The next time you need it, you only have to mount your Drive and use the data in Colab.

In [None]:
# Install the kaggle package in quiet mode
%pip install -q kaggle
# Import some functions and libraries
from google.colab import files, drive
import os, shutil

Before running the following code, you need to [get your Kaggle API Key](https://christianjmills.com/posts/kaggle-obtain-api-key-tutorial). Here are quick instructions:
1. Open www.kaggle.com.
2. Log in.
3. Go to your Kaggle account (click your avatar or portrait image in the upper right corner).
4. Select Settings and scroll to the API section.
5. Click **Expire API Token** to remove previous tokens.
6. Click **Create New API Token** to create a new API token.
7. Download and save the kaggle.json file on your laptop.

The next code cell opens a file upload tool. Use it to choose the kaggle.json file from your laptop and upload it here. You need to do this only once.

In [None]:
# Choose the kaggle.json file that you downloaded on your Laptop/Desktop
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"rabindramanandhar","key":"eb29bef862d8f240cef8c41024084697"}'}
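
As a quick sanity check after the upload, the returned bytes can be parsed to confirm that the file really is a Kaggle token. The sketch below assumes the token is a JSON object with `username` and `key` fields, as in the output above. `check_kaggle_token` is a hypothetical helper name, not part of the Kaggle or Colab APIs.

```python
import json


def check_kaggle_token(raw: bytes) -> str:
    """Return the username stored in a kaggle.json payload, or raise ValueError."""
    try:
        token = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("kaggle.json is not valid JSON") from exc
    if not isinstance(token, dict):
        raise ValueError("kaggle.json should contain a JSON object")
    # Both fields are assumed to be required (they appear in the uploaded file).
    missing = [name for name in ("username", "key") if not token.get(name)]
    if missing:
        raise ValueError("kaggle.json is missing: " + ", ".join(missing))
    return token["username"]
```

In Colab this could be used as `uploaded = files.upload()` followed by `check_kaggle_token(uploaded['kaggle.json'])`. Assigning the result of `files.upload()` to a variable also keeps the cell from echoing the key into the Notebook output.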

Next, we copy the kaggle.json file to the /root/.kaggle folder and restrict its permissions so that only the file's owner can read and write it. We use Unix shell commands to do that. The following code
- copies the uploaded kaggle.json file to the /root/.kaggle directory (~ = /root), and
- changes the permissions of the copied file.

The Linux shell command kaggle (used later in this Notebook) requires kaggle.json to be in that particular directory and to be readable by the current user.

In [None]:
%%!
mkdir -p ~/.kaggle
cp ./kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json
ls -l ~/.kaggle

['total 4', '-rw------- 1 root root 73 Jan 26 04:57 kaggle.json']
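
The same copy-and-restrict step can also be written in Python with the `os` and `shutil` modules imported earlier. This is a minimal sketch, not the method used in this Notebook. `install_kaggle_token` and its `home` parameter are hypothetical names introduced here so that the target folder can be changed for testing.

```python
import os
import shutil
import stat


def install_kaggle_token(src="kaggle.json", home=None):
    """Copy src to <home>/.kaggle/kaggle.json with owner-only read/write access."""
    home = home or os.path.expanduser("~")  # /root on Colab
    target_dir = os.path.join(home, ".kaggle")
    os.makedirs(target_dir, exist_ok=True)  # like mkdir -p
    target = os.path.join(target_dir, "kaggle.json")
    shutil.copyfile(src, target)  # like cp
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)  # like chmod 600
    return target
```

Calling `install_kaggle_token()` with no arguments assumes that kaggle.json is in the current working directory, as it is right after the upload.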

Next, mount your Google Drive.
1. In Colab, from the left toolbar, select Files.
2. In the Files sidebar area, select Mount Drive.
3. Give this Notebook permission to access your Google Drive.
4. Now you should see a new folder 'drive' in the file list.

In [None]:
## Let's check that we can read the contents of the drive
os.listdir('/content/drive/MyDrive')

['software_flow',
 'Untitled Diagram.drawio',
 'Assignment11_report.docx',
 'BiometricLoginSystem.drawio',
 'software',
 'Final_Report.docx',
 'NeuralNetworksForMachineLearningApplicatoins',
 'Colab Notebooks',
 'input']
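
If the Drive is not mounted, `os.listdir` fails with a `FileNotFoundError` that does not say why. The sketch below wraps the check in a helper that gives a clearer message. `require_drive` is a hypothetical name, and the default path assumes the standard Colab mount point `/content/drive`.

```python
import os


def require_drive(root="/content/drive"):
    """Return the sorted contents of MyDrive, or raise if the Drive is not mounted."""
    my_drive = os.path.join(root, "MyDrive")
    if not os.path.isdir(my_drive):
        raise FileNotFoundError(
            my_drive + " not found: mount your Google Drive first"
        )
    return sorted(os.listdir(my_drive))
```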

Now we are ready to download a dataset from Kaggle and save it to MyDrive. For that, you need the API command for downloading the dataset, which is available on every dataset page in Kaggle. For example, if you open the [Heart disease health indicator dataset](https://www.kaggle.com/datasets/alexteboul/heart-disease-health-indicators-dataset) and select the three dots (...), you can select 'Copy API command'. If you paste that into your Notebook, you see:

> kaggle datasets download -d alexteboul/heart-disease-health-indicators-dataset
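
The name of the downloaded zip file follows from the dataset slug in this command. As a small illustration, the sketch below extracts the `owner/dataset` slug and derives the zip file name. It assumes that the archive is named after the part following the slash, which matches the download shown in this Notebook. `dataset_slug` and `zip_name` are hypothetical helper names.

```python
import shlex


def dataset_slug(command: str) -> str:
    """Return the owner/dataset slug found in a copied Kaggle API command."""
    for part in shlex.split(command):
        # The slug is the only argument of the form owner/name.
        if not part.startswith("-") and part.count("/") == 1:
            return part
    raise ValueError("no owner/dataset slug found in command")


def zip_name(command: str) -> str:
    """Return the expected zip file name (assumed to be <dataset>.zip)."""
    return dataset_slug(command).split("/")[1] + ".zip"
```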

After making sure that the Drive is mounted, the following Linux shell commands will do the rest:
- Download the Case 1 dataset as a zip file
- Unzip the file to the directory ./drive/MyDrive/input (that is, /content/drive/MyDrive/input)
- Remove the zip file, as we don't need it anymore

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
%%!
kaggle datasets download -d alexteboul/heart-disease-health-indicators-dataset
unzip -o heart-disease-health-indicators-dataset.zip -d ./drive/MyDrive/input
rm heart-disease-health-indicators-dataset.zip

['Downloading heart-disease-health-indicators-dataset.zip to /content',
 '',
 '  0% 0.00/2.66M [00:00<?, ?B/s]',
 '',
 '100% 2.66M/2.66M [00:00<00:00, 151MB/s]',
 'Archive:  heart-disease-health-indicators-dataset.zip',
 '  inflating: ./drive/MyDrive/input/heart_disease_health_indicators_BRFSS2015.csv  ']
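
The unzip and remove steps can also be done in Python with the standard `zipfile` module, which may be handy when shell commands are not available. This is a sketch of an alternative, not the method used above. `extract_and_remove` is a hypothetical helper name.

```python
import os
import zipfile


def extract_and_remove(zip_path, dest_dir):
    """Extract zip_path into dest_dir, delete the archive, return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)  # overwrites existing files, like unzip -o
        names = archive.namelist()
    os.remove(zip_path)  # like rm
    return names
```

For the dataset above, the call would be `extract_and_remove('heart-disease-health-indicators-dataset.zip', './drive/MyDrive/input')`.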

Now you can check that the download has completed and that the dataset file (or files) can be found in your Drive.

In [None]:
os.listdir('./drive/MyDrive/input')

['heart_disease_health_indicators_BRFSS2015.csv']

Later, when you want to use this dataset in your Colab Notebooks, you do not need to repeat all these steps. You only need to:
1. Mount your Drive.
2. Use the following code to see what is in your Drive.

```
## First, mount your Google Drive and then you can list the contents
import os
os.listdir('./drive/MyDrive/input')
```
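
Once the file is listed, a quick look at its header confirms that the data is readable. The sketch below uses only the standard `csv` module. `peek_csv` is a hypothetical helper name, and reading the file with pandas would work just as well.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=3):
    """Return the header row and the first n_rows data rows of a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = list(islice(reader, n_rows))
    return header, rows
```

For the dataset downloaded above, the call would be `peek_csv('./drive/MyDrive/input/heart_disease_health_indicators_BRFSS2015.csv')`.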