<a href="https://colab.research.google.com/github/TheDataFestAI/Learning_Resources/blob/main/learning_poc/download_data_from_kaggle.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Download Data From Kaggle

References:
1. Download data from Kaggle into Colab:
    1. https://www.kaggle.com/discussions/general/74235

2. Kaggle API:
    1. https://github.com/Kaggle/kaggle-api


## Step 1: Generate a New API Access Token from Your Kaggle Personal Account
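
The token is downloaded from your Kaggle account settings as a `kaggle.json` file containing a `username` and a `key` field. The sketch below is one possible way to sanity-check that file before uploading it. The helper name `check_kaggle_token` is hypothetical and not part of the Kaggle API.

```python
import json
from pathlib import Path

# Fields the Kaggle client expects to find in kaggle.json
REQUIRED_FIELDS = ("username", "key")


def check_kaggle_token(path):
    """Return the required fields that are missing or empty in a kaggle.json file.

    Hypothetical helper for illustration; an empty list means the file looks usable.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return [field for field in REQUIRED_FIELDS if not data.get(field)]
```

Calling `check_kaggle_token("kaggle.json")` returns `[]` when both fields are present and non-empty.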

## Step 2: Install the `kaggle` Python Package

In [3]:
! pip install -q kaggle

## Step 3: Upload the `kaggle.json` file into the Colab local directory

In [49]:
"""
https://github.com/googlecolab/colabtools/blob/main/google/colab/files.py
"""
from google.colab import files

kaggle_filename = "kaggle.json"

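# "_upload_file()" is used so that the filename of the uploaded file in Colab can be specified
# (note: the leading underscore marks it as a private helper, so it may change between Colab releases)
# its return value is stored in "out" so that the notebook does not display the file content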
out = files._upload_file(filepath=kaggle_filename)
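
Since `_upload_file()` is private, a possible alternative is the public `files.upload()`, which returns a dict mapping each uploaded filename to its bytes. The sketch below assumes that return shape and writes the single uploaded file under a fixed name. The helper `save_as` is hypothetical, and the Colab call is left as a comment because it only works inside a Colab session.

```python
from pathlib import Path


def save_as(uploaded, target_name="kaggle.json", directory="."):
    """Write the single uploaded file to directory/target_name and return its path.

    `uploaded` is assumed to be a dict of {filename: bytes}, the shape
    returned by google.colab.files.upload().
    """
    if len(uploaded) != 1:
        raise ValueError(f"expected exactly one uploaded file, got {len(uploaded)}")
    (content,) = uploaded.values()
    target = Path(directory) / target_name
    target.write_bytes(content)
    return target


# Inside Colab (not executed here):
# from google.colab import files
# kaggle_path = save_as(files.upload())
```

This keeps the filename predictable even if the browser upload arrives under a different name.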

## Step 4: Create `~/.kaggle/` dir and move `kaggle.json` there

In [59]:
"""
! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/

The code below replicates these Unix commands with Python's 'os' and 'pathlib' modules
(the file is moved rather than copied)
"""
import os
from pathlib import Path


home_dir = os.path.expanduser('~')
kaggle_key_dir = os.path.join(home_dir, ".kaggle")
source_kaggle_json = os.path.abspath(os.path.join(os.getcwd(), kaggle_filename))
dest_kaggle_json = os.path.join(kaggle_key_dir, kaggle_filename)

# create the `~/.kaggle/` directory
# "os.path.expanduser('~')" returns the home directory path
# alternative check: if ".kaggle" in os.listdir(path=home_dir):
# not used, because listdir() has to enumerate every entry, which is slower when the directory holds many entries
if os.path.exists(kaggle_key_dir):
    print(f"`{kaggle_key_dir}` dir is already present")
else:
    os.makedirs(name=kaggle_key_dir, exist_ok=True)
    print(f"`{kaggle_key_dir}` dir has been created")

# move `kaggle.json` into the `~/.kaggle/` directory
if os.path.isfile(source_kaggle_json) and not os.path.isfile(dest_kaggle_json):
    Path(source_kaggle_json).rename(dest_kaggle_json)
    print(f"`{source_kaggle_json}` moved to `{dest_kaggle_json}`")
else:
    print(f"{source_kaggle_json} doesn't exist and/or {dest_kaggle_json} is already present")

`/root/.kaggle` dir is already present
/content/kaggle.json doesn't exist and/or /root/.kaggle/kaggle.json is already present
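
`Path.rename()` can raise `OSError` when the source and destination are on different filesystems. As a hedged alternative, the sketch below uses `shutil.move()`, which falls back to copy-and-delete in that case. The helper name `move_if_absent` is hypothetical.

```python
import shutil
from pathlib import Path


def move_if_absent(source, dest):
    """Move source to dest unless source is missing or dest already exists.

    Returns True if the file was moved, False otherwise.
    """
    source, dest = Path(source), Path(dest)
    if not source.is_file() or dest.exists():
        return False
    # Create the destination directory (e.g. ~/.kaggle) if it is missing
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(dest))
    return True
```

A second call with the same arguments returns `False`, so re-running the cell is harmless.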


## *(Optional)* Step 5: Check the existence of `~/.kaggle/kaggle.json`

In [58]:
# list the entries under "/"
# os.listdir("/")

# list the entries under the home directory
# os.listdir(path=os.path.expanduser('~'))

if os.path.isfile(dest_kaggle_json):
    print(f"{dest_kaggle_json} is present")

/root/.kaggle/kaggle.json is present


## Step 6: Change the permissions of `~/.kaggle/kaggle.json`

In [63]:
"""
! chmod 600 ~/.kaggle/kaggle.json
"""
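# the mode must be an octal literal: 0o600 gives read/write to the owner only,
# whereas decimal 600 equals octal 1130 and sets different permission bits
os.chmod(path=dest_kaggle_json, mode=0o600)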
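
The `600` in `chmod 600` is octal notation, so the Python equivalent needs an octal value. A minimal sketch, assuming a POSIX system such as the Colab runtime, that sets owner-only read/write and reads the permission bits back. The helper `restrict_to_owner` is hypothetical.

```python
import os
import stat


def restrict_to_owner(path):
    """Set owner-only read/write on path and return the resulting permission bits."""
    # stat.S_IRUSR | stat.S_IWUSR is the same value as the octal literal 0o600
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    # S_IMODE strips the file-type bits and keeps only the permission bits
    return stat.S_IMODE(os.stat(path).st_mode)


# Decimal 600 is a different number from octal 600
print(oct(600))    # 0o1130
print(0o600)       # 384
```

Comparing the returned value with `0o600` is a quick way to confirm the permissions took effect.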

## List Kaggle datasets

In [64]:
! kaggle datasets list

ref                                                    title                                        size  lastUpdated          downloadCount  voteCount  usabilityRating  
-----------------------------------------------------  ------------------------------------------  -----  -------------------  -------------  ---------  ---------------  
thedrcat/daigt-v2-train-dataset                        DAIGT V2 Train Dataset                       29MB  2023-11-16 01:38:36           2160        203  1.0              
thedrcat/daigt-proper-train-dataset                    DAIGT Proper Train Dataset                  119MB  2023-11-05 14:03:25           1950        158  1.0              
muhammadbinimran/housing-price-prediction-data         Housing Price Prediction Data               763KB  2023-11-21 17:56:32           9970        168  1.0              
carlmcbrideellis/llm-7-prompt-training-dataset         LLM: 7 prompt training dataset               41MB  2023-11-15 07:32:56           1774     

## Download the dataset from Kaggle

In [None]:
"""
# sample code
"""
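
One possible way to fill in this step is to call the `kaggle datasets download` command from Python. The sketch below assumes the `kaggle` CLI is installed and authenticated as in the earlier steps. The helper names and the `data` directory are hypothetical, and the dataset reference is taken from the listing above.

```python
import subprocess


def build_download_command(dataset_ref, dest_dir=".", unzip=True):
    """Build the kaggle CLI command that downloads one dataset."""
    cmd = ["kaggle", "datasets", "download", dataset_ref, "-p", dest_dir]
    if unzip:
        # --unzip extracts the downloaded archive
        cmd.append("--unzip")
    return cmd


def download_dataset(dataset_ref, dest_dir=".", unzip=True):
    """Run the download; raises CalledProcessError if the CLI exits with an error."""
    subprocess.run(build_download_command(dataset_ref, dest_dir, unzip), check=True)


# Example call (needs network access and valid credentials, so it is not run here):
# download_dataset("thedrcat/daigt-v2-train-dataset", dest_dir="data")
```

Splitting command construction from execution makes it possible to inspect the exact command before anything is downloaded.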

# What you have learnt from this notebook:

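id | Topic | Description | Comments
:--- | :---: | :--- | :---
1 | **os.path.isfile()** | Checks that a path is an existing file; returns `False` for a directory | Used in Steps 4 and 5
2 | **os.path.exists()** | Checks the existence of a path, whether it is a file or a directory | Used in Step 4
3 | **os.makedirs()** | Creates a directory, including missing parents, like `mkdir -p` | `exist_ok=True` avoids an error if the directory already exists
4 | **os.getcwd()** | Returns the current working directory | Used in Step 4
5 | **os.path.join()** | Joins path components with the OS-specific separator | Used in Step 4
6 | **os.path.expanduser('~')** | Returns the home directory path | Used in Step 4
7 | **os.chmod()** | Changes the file mode | Pass the mode in octal, e.g. `0o600`
8 | **pathlib.Path** | `Path.rename()` moves a file by renaming it | Used in Step 4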