## Authenticating with Kaggle using kaggle.json

Sign in at https://www.kaggle.com, open the [Account tab of your user profile](https://www.kaggle.com/me/account), and select Create API Token. Your browser then downloads kaggle.json, a file containing your API credentials (your Kaggle username and API key).
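
The token file is a small JSON object holding your Kaggle username and API key. As a minimal sketch, you can confirm that the downloaded file has both fields before uploading it. The helper name validate_kaggle_token is hypothetical and is not part of the Kaggle API.

```python
import json
from pathlib import Path


def validate_kaggle_token(path: Path) -> dict:
    """Load a kaggle.json file and check that it has "username" and "key" fields.

    Hypothetical helper for illustration; it is not part of the Kaggle API.
    """
    with open(path) as f:
        token = json.load(f)
    if not isinstance(token, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    # Report any of the two expected fields that are absent.
    missing = {"username", "key"} - set(token)
    if missing:
        raise ValueError(f"{path} is missing fields: {sorted(missing)}")
    return token


# Usage (assumes kaggle.json is in the current directory):
# token = validate_kaggle_token(Path("kaggle.json"))
# print(token["username"])  # avoid printing token["key"]
```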

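Next, run the cell below. If training_v2.csv is not already present, it asks you to upload kaggle.json to your Colab runtime, moves the file to where the Kaggle API expects it, and then downloads and unzips the WiDS Datathon 2020 competition data.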

In [None]:
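from pathlib import Path
!ls
if not Path('training_v2.csv').exists():
  from google.colab import files

  # Prompt for kaggle.json; files.upload() returns a dict of {filename: contents as bytes}.
  uploaded = files.upload()

  for fn, content in uploaded.items():
    print(f'User uploaded file "{fn}" with length {len(content)} bytes')

  # Move kaggle.json into the folder where the Kaggle API expects to find it,
  # and restrict its permissions so only your user can read the key.
  !mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
  # Download the competition files. Archive names may differ between kaggle CLI
  # versions, so extract every downloaded .zip rather than one hard-coded name.
  !kaggle competitions download -c widsdatathon2020
  !unzip -o '*.zip'
  !ls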
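
The shell one-liner in the cell above creates ~/.kaggle/, moves kaggle.json into it, and sets the file mode to 600 (read and write for the owner only). A rough pure-Python equivalent using only the standard library is sketched below. The name install_kaggle_token is hypothetical, and the sketch assumes the uploaded file was saved as kaggle.json in the working directory.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(src: Path, kaggle_dir: Path) -> Path:
    """Move src to kaggle_dir/kaggle.json and make it owner read/write only.

    Hypothetical helper mirroring: mkdir -p ~/.kaggle/ && mv ... && chmod 600 ...
    """
    kaggle_dir.mkdir(parents=True, exist_ok=True)          # mkdir -p ~/.kaggle/
    dest = kaggle_dir / "kaggle.json"
    shutil.move(str(src), str(dest))                       # mv kaggle.json ~/.kaggle/
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)            # chmod 600
    return dest


# Usage:
# install_kaggle_token(Path("kaggle.json"), Path.home() / ".kaggle")
```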

## Check that the data was retrieved properly

In [None]:
import pandas as pd
# Index by patient_id; drop the encounter ID and the precomputed APACHE IVa death probabilities.
train_df = (pd.read_csv('training_v2.csv')
            .set_index('patient_id')
            .drop(columns=['encounter_id', 'apache_4a_hospital_death_prob', 'apache_4a_icu_death_prob']))
train_df.index.name = 'Patient'
train_df.sample(6)
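
To see what the loading cell does without downloading anything, here is the same chain applied to a few made-up rows. Only the column names mirror training_v2.csv; all values are invented for illustration. The last line is a cheap sanity check that each patient_id identifies exactly one row, which set_index does not enforce by default.

```python
import pandas as pd

# Made-up rows; only the column names mirror training_v2.csv.
raw = pd.DataFrame({
    "encounter_id": [101, 102, 103],
    "patient_id": [1, 2, 3],
    "hospital_id": [10, 10, 20],
    "apache_4a_hospital_death_prob": [0.10, 0.20, 0.30],
    "apache_4a_icu_death_prob": [0.05, 0.10, 0.20],
    "hospital_death": [0, 1, 0],
})

# Same steps as the cell above: index by patient, drop the ID and APACHE columns.
df = (raw.set_index("patient_id")
         .drop(columns=["encounter_id",
                        "apache_4a_hospital_death_prob",
                        "apache_4a_icu_death_prob"]))
df.index.name = "Patient"

print(df.columns.tolist())   # ['hospital_id', 'hospital_death']
print(df.index.is_unique)    # True
```

On the real data, train_df.index.is_unique is the equivalent check.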

## Split the data into one dataset per hospital

In [None]:
hospital_ids = train_df['hospital_id'].unique()
n_hospitals = len(hospital_ids)

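client_dir = Path('client_datasets')
if not client_dir.exists():
  client_dir.mkdir()
  print(f"We'll create {n_hospitals} datasets")
  # Write one CSV per hospital into client_dir (no %cd, so the working directory never changes).
  # Files are numbered by position in hospital_ids, not by the hospital_id value itself.
  for i, hospital_id in enumerate(hospital_ids):
    train_df[train_df['hospital_id'] == hospital_id].to_csv(client_dir / f"icu_raw_{i}.csv")
  !ls client_datasets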
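
The split can also be done in a single pass with DataFrame.groupby, instead of building a boolean mask per hospital, which rescans the whole table once for every hospital. The sketch below uses a hypothetical helper, split_by_hospital, that writes into an output directory by path. With sort=False, groups come out in order of first appearance, the same order as Series.unique(), so the icu_raw_{i}.csv numbering matches. This assumes hospital_id has no missing values, because groupby drops NaN keys by default.

```python
from pathlib import Path

import pandas as pd


def split_by_hospital(df: pd.DataFrame, out_dir: Path) -> list:
    """Write one icu_raw_{i}.csv per hospital into out_dir and return the paths.

    Hypothetical helper. Files are numbered by order of first appearance of
    each hospital_id, which matches df['hospital_id'].unique().
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # sort=False keeps groups in order of first appearance instead of sorting the keys.
    for i, (_, group) in enumerate(df.groupby("hospital_id", sort=False)):
        path = out_dir / f"icu_raw_{i}.csv"
        group.to_csv(path)
        paths.append(path)
    return paths


# Usage with the notebook's DataFrame:
# split_by_hospital(train_df, Path("client_datasets"))
```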