<center> <h1 style="background-color:none; color:tomato; font-size:30px; font-weight:bold; font-family:Adobe Garamond Pro;" >How to Access/Import Kaggle Datasets Using API</h1></center>

Kaggle is one of the largest data science community platforms. It offers access to diverse datasets, competitions, resources, and robust tools for practicing data science and machine learning. You can get Kaggle datasets either by downloading them manually from the website or through the Kaggle API.

To load a dataset from Kaggle onto your local machine using the Kaggle API, follow these steps:
 1. Create a Kaggle account, if you do not already have one
 2. Import the necessary libraries
 3. Generate a Kaggle API token
 4. Set up and download the datasets
 5. Read the dataset, if needed for verification


### `2. Import Necessary Libraries`

In [1]:
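import pandas as pd        # load the CSV files into DataFrames
import os                  # create folders and remove files
import zipfile             # work with zip archives from Python
import kaggle              # Kaggle API client; needs the API token from step 3
import opendatasets as od  # optional alternative downloader; not used below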

### `3. Generate Kaggle API Token`
- If no token is found, `import kaggle` raises an error telling you to generate a Kaggle API token and save it in the `.kaggle` folder of your home directory.
- Log into your Kaggle account.
- Navigate to your account settings page.
- Click on 'Create a new API token'.
- This downloads a `kaggle.json` file to your system. Save it in the `.kaggle` folder inside your home directory (`~/.kaggle/kaggle.json`), or save it elsewhere on your machine and move it there afterwards. On macOS, if the `.kaggle` folder is not visible in your home directory, show hidden files in Finder with Command + Shift + .
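
Before moving on, it can help to confirm that the token is where the Kaggle client will look for it. The following is a minimal sketch, not part of the Kaggle library: `find_kaggle_token` is a hypothetical helper name, and it assumes the default `~/.kaggle` location, with the `KAGGLE_CONFIG_DIR` environment variable as an override.

```python
import os
import stat
from pathlib import Path


def find_kaggle_token(config_dir=None):
    """Return the path to kaggle.json, or None if the file is missing.

    Hypothetical helper for illustration; not part of the kaggle package.
    """
    if config_dir is None:
        # Assumption: KAGGLE_CONFIG_DIR, when set, overrides ~/.kaggle
        config_dir = os.environ.get("KAGGLE_CONFIG_DIR") or Path.home() / ".kaggle"
    token = Path(config_dir) / "kaggle.json"
    if not token.is_file():
        return None
    # Restrict access to the current user (same effect as chmod 600 on POSIX)
    token.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return token


if __name__ == "__main__":
    found = find_kaggle_token()
    print(found if found else "kaggle.json not found; generate a token first")
```

Restricting the file permissions is a precaution, since the token grants access to your Kaggle account.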

### `4. Download the dataset and unzip it to your folder`
Kaggle hosts two kinds of datasets: standalone datasets and competition datasets.

- To download a standalone dataset to the current directory, use the following command:
!kaggle datasets download -d username/dataset_name
- To download a competition dataset to the current directory, first accept the competition rules on the Kaggle website, then use this command:
!kaggle competitions download -c competition_name
- To unzip it to a desired path, use this command:
!unzip zipfile_name.zip -d relative_path/
- Remove the zip file afterwards.
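
The unzip and clean-up steps can also be done from Python with the standard `zipfile` module, which avoids depending on a system `unzip` command. This is a minimal sketch under that assumption; `extract_and_remove` is a hypothetical helper name, not a Kaggle function.

```python
import zipfile
from pathlib import Path


def extract_and_remove(zip_path, dest_dir):
    """Extract zip_path into dest_dir, delete the archive, return member names."""
    zip_path, dest_dir = Path(zip_path), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    zip_path.unlink()  # reached only if extraction did not raise
    return names
```

For example, `extract_and_remove("breast-cancer-wisconsin-data.zip", "data/bc_wisconsin")` would unpack a downloaded archive with that name and then delete it.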


In [2]:
# search for datasets, if you haven't identified one already
!kaggle datasets list -s 'breast-cancer' # we will use 'uciml/breast-cancer-wisconsin-data'

ref                                                            title                                              size  lastUpdated          downloadCount  voteCount  usabilityRating  
-------------------------------------------------------------  -------------------------------------------------  ----  -------------------  -------------  ---------  ---------------  
reihanenamdari/breast-cancer                                   Breast Cancer                                      43KB  2022-08-08 19:25:55          19286        266  1.0              
yasserh/breast-cancer-dataset                                  Breast Cancer Dataset                              49KB  2021-12-29 19:07:20          51453        409  1.0              
imtkaggleteam/breast-cancer                                    Breast Cancer                                      49KB  2023-10-21 19:19:28           1760         87  1.0              
nancyalaswad90/breast-cancer-dataset                           Breast Cance

In [3]:
# download standalone dataset
!kaggle datasets download -d "uciml/breast-cancer-wisconsin-data"

# download competition dataset (accept the competition rules on Kaggle first)
!kaggle competitions download -c "boston-housing"

# unzip both files into dataset folders created in the current directory
os.makedirs("data/bc_wisconsin", exist_ok=True)  # create directory for breast-cancer-wisconsin-data
os.makedirs("data/boston", exist_ok=True) # create directory for boston-housing
!unzip breast-cancer-wisconsin-data.zip -d data/bc_wisconsin/
!unzip boston-housing.zip -d data/boston/

# remove the zip files afterwards
os.remove("breast-cancer-wisconsin-data.zip") # this file is in the current directory
os.remove("boston-housing.zip")

Dataset URL: https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data
License(s): CC-BY-NC-SA-4.0
Downloading breast-cancer-wisconsin-data.zip to /Users/meda_ah/Developer/DMLE/[1] ML Reg_Projects
  0%|                                               | 0.00/48.6k [00:00<?, ?B/s]
100%|██████████████████████████████████████| 48.6k/48.6k [00:00<00:00, 1.58MB/s]
Downloading boston-housing.zip to /Users/meda_ah/Developer/DMLE/[1] ML Reg_Projects
  0%|                                               | 0.00/13.9k [00:00<?, ?B/s]
100%|██████████████████████████████████████| 13.9k/13.9k [00:00<00:00, 12.5MB/s]
Archive:  breast-cancer-wisconsin-data.zip
  inflating: data/bc_wisconsin/data.csv  
Archive:  boston-housing.zip
  inflating: data/boston/submission_example.csv  
  inflating: data/boston/test.csv    
  inflating: data/boston/train.csv   


### `5. Read the files`

In [4]:
# list the contents of the extracted folders
print("bc_wisconsin files: ", os.listdir("data/bc_wisconsin"))
print("boston_housing files: ", os.listdir("data/boston"))

bc_wisconsin files:  ['data.csv']
boston_housing files:  ['submission_example.csv', 'test.csv', 'train.csv']


In [5]:
df1 = pd.read_csv("data/bc_wisconsin/data.csv")
df1.head()

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
2,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
3,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
4,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,
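
The last column in the preview above, `Unnamed: 32`, appears to hold no values. pandas creates a column like this when every line of a CSV ends with a trailing comma. The sketch below shows one way to drop fully empty columns; it uses a small inline excerpt (three columns and two rows taken from the preview) rather than the real file, so the auto-generated column is named `Unnamed: 3` here.

```python
import io

import pandas as pd

# Small inline excerpt for illustration; each line ends with a trailing comma,
# so pandas adds an extra, empty column.
sample = io.StringIO(
    "id,diagnosis,radius_mean,\n"
    "842302,M,17.99,\n"
    "842517,M,20.57,\n"
)
df = pd.read_csv(sample)

# Drop every column in which all values are missing
cleaned = df.dropna(axis=1, how="all")
print(list(cleaned.columns))
```

Applied to the real data, the same call would be `df1 = df1.dropna(axis=1, how="all")`, assuming the column is empty in every row and not only in the first five.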


In [6]:
df2 = pd.read_csv("data/boston/train.csv")
df2.head()

Unnamed: 0,ID,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,black,lstat,medv
0,1,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
1,2,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
2,4,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
3,5,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2
4,7,0.08829,12.5,7.87,0,0.524,6.012,66.6,5.5605,5,311,15.2,395.6,12.43,22.9
