<a href="https://colab.research.google.com/github/nateraw/huggingface-datasets-converter/blob/main/huggingface_datasets_converter_kaggle.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Hugging Face Datasets Converter (Kaggle)

This notebook allows you to convert a Kaggle dataset to a Hugging Face dataset. 

Follow the four steps below to convert an existing Kaggle dataset into a Hugging Face dataset that can be loaded with the `datasets` library.

# Step 1 - Setup

Run the cell below to install required dependencies.

In [None]:
%%capture
# Clone the converter repo and install its dependencies.
! git clone https://github.com/nateraw/huggingface-datasets-converter.git
%cd /content/huggingface-datasets-converter
! pip install -r requirements.txt
# Let Git store credentials on disk so the Hugging Face login in Step 3 can be reused.
! git config --global credential.helper store

# Step 2 - Authenticate with Kaggle

Navigate to https://www.kaggle.com. Then go to the [Account tab of your user profile](https://www.kaggle.com/me/account) and select Create API Token. This will trigger the download of `kaggle.json`, a file containing your API credentials.

Then run the cell below to upload `kaggle.json` to your Colab runtime.

⚠️ The file must be named exactly `kaggle.json`.

In [None]:
from google.colab import files

uploaded = files.upload()

for name, data in uploaded.items():
    print(f'User uploaded file "{name}" with length {len(data)} bytes')

# Then move kaggle.json into the folder where the API expects to find it.
!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json

Alternatively, upload `kaggle.json` to your working directory manually (in Colab, usually the `content` folder) or copy it over from Google Drive ([see this notebook](https://colab.research.google.com/notebooks/io.ipynb) for examples). Then run:

```
!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
```
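
If the Kaggle API later complains about credentials, a quick sanity check of the file can help. The sketch below is not part of the converter; `check_kaggle_credentials` is a hypothetical helper, and it assumes the token file is a JSON object with `username` and `key` fields, which is what Kaggle API tokens normally contain. It also confirms that the `chmod 600` above took effect.

```python
import json
import os
import stat


def check_kaggle_credentials(path):
    """Return a list of problems found with a kaggle.json file (empty if none)."""
    problems = []
    if os.path.basename(path) != "kaggle.json":
        problems.append("file must be named exactly kaggle.json")
    if not os.path.isfile(path):
        problems.append(f"{path} does not exist")
        return problems
    try:
        with open(path) as f:
            creds = json.load(f)
    except json.JSONDecodeError:
        problems.append("file is not valid JSON")
        return problems
    if not isinstance(creds, dict):
        problems.append("expected a JSON object")
        return problems
    # Assumed token layout: {"username": "...", "key": "..."}
    for field in ("username", "key"):
        if not creds.get(field):
            problems.append(f"missing or empty field: {field}")
    # chmod 600 leaves no permission bits set for group or others.
    if stat.S_IMODE(os.stat(path).st_mode) & 0o077:
        problems.append("permissions are too open; run chmod 600")
    return problems


# Prints an empty list when everything looks fine.
print(check_kaggle_credentials(os.path.expanduser("~/.kaggle/kaggle.json")))
```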

# Step 3 - Authenticate with Hugging Face 🤗

You'll need to authenticate with your Hugging Face account, so make sure to [sign up](https://huggingface.co/join) if you haven't already. 

Then run the cell below and provide a token that has ***write access***.

In [None]:
from huggingface_hub import notebook_login

notebook_login()

# Step 4 - Convert From Kaggle

In the cell below, enter:

- The Kaggle ID of the dataset you'd like to upload (e.g. `kaggleuser/dataset-name`)
- The repo ID of the dataset repo you'd like to upload to (e.g. `huggingface-user/dataset-name`)

You can find the Kaggle ID in the dataset URL. For example, for the dataset in "https://www.kaggle.com/datasets/ruchi798/data-science-job-salaries", the ID is "ruchi798/data-science-job-salaries".
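
If you would rather paste the full URL, the ID can also be extracted programmatically. This is a minimal sketch: `kaggle_id_from_url` is a hypothetical helper, not part of the converter, and it assumes the URL has the `/datasets/<owner>/<dataset-name>` shape shown above.

```python
from urllib.parse import urlparse


def kaggle_id_from_url(url):
    """Extract 'owner/dataset-name' from a Kaggle dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path shape: /datasets/<owner>/<dataset-name>
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Kaggle dataset URL: {url!r}")
    return f"{parts[1]}/{parts[2]}"


print(kaggle_id_from_url("https://www.kaggle.com/datasets/ruchi798/data-science-job-salaries"))
# prints: ruchi798/data-science-job-salaries
```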

In [None]:
#@title Kaggle Converter { vertical-output: true }

%cd /content/huggingface-datasets-converter

from huggingface_datasets_converter import kaggle_to_hf

kaggle_id = "deepcontractor/monkeypox-dataset-daily-updated" #@param {type:"string"}
repo_id = "nateraw/monkeypox" #@param {type:"string"}

kaggle_to_hf(
    kaggle_id=kaggle_id,
    repo_id=repo_id,
    unzip=True,
    path_in_repo=None
)
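
Once the conversion finishes, the new repo can be loaded with the `datasets` library. The sketch below is one way to do that; `load_converted_dataset` is a hypothetical wrapper, and whether `load_dataset` can infer the data format depends on the files the Kaggle dataset contained.

```python
def load_converted_dataset(repo_id):
    """Load a dataset repo from the Hugging Face Hub by its repo ID."""
    if repo_id.count("/") != 1 or not all(repo_id.split("/")):
        raise ValueError(f"expected 'user/dataset-name', got {repo_id!r}")
    # Imported here so the format check above works without `datasets` installed.
    from datasets import load_dataset
    return load_dataset(repo_id)


# Example (needs network access and a completed conversion):
# ds = load_converted_dataset("nateraw/monkeypox")
# print(ds)
```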
