# Data collection

## Objectives

* Fetch data from Kaggle
* Save it as raw data
* Prepare the data for further processing
* Split data into train, validate and test sets

## Inputs

* Kaggle JSON file containing the authentication token

## Outputs

* Generate Dataset: inputs/datasets/fracture_dataset

---

# Import packages

In [8]:
import numpy
import os

In [1]:
%pip install -r /workspace/Bone-Fracture-Detection/requirements.txt

Note: you may need to restart the kernel to use updated packages.


---

# Change working directory

Change the working directory from its current folder to its parent folder.

In [2]:
current_dir = os.getcwd()
current_dir

'/workspace/Bone-Fracture-Detection/jupyter_notebooks'

Make the parent of the current directory the new current directory.

In [3]:
os.chdir(os.path.dirname(current_dir))

Confirm the new current directory.

In [4]:
current_dir = os.getcwd()
current_dir

'/workspace/Bone-Fracture-Detection'
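Re-running the `os.chdir` cell above moves up one more level each time, so a second run would leave the project folder. A minimal sketch of a guard against that is shown below. It assumes the notebooks live in a folder named `jupyter_notebooks`, as in the output above. The helper name `project_root` is hypothetical and not part of the original notebook.

```python
import os


def project_root(cwd: str, notebooks_dir: str = "jupyter_notebooks") -> str:
    """Return the parent of `cwd` only if `cwd` is the notebooks folder.

    `notebooks_dir` is an assumption based on the path printed above.
    Any other directory is returned unchanged, so repeated calls are safe.
    """
    normalized = os.path.normpath(cwd)
    if os.path.basename(normalized) == notebooks_dir:
        return os.path.dirname(normalized)
    return cwd


# Usage in the notebook (safe to re-run):
# os.chdir(project_root(os.getcwd()))
```

With this guard, the working directory changes only when the notebook starts inside `jupyter_notebooks`.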

---

# Kaggle

Install the Kaggle package.

In [6]:
!pip install kaggle

Collecting kaggle
  Downloading kaggle-1.6.14.tar.gz (82 kB)
  Preparing metadata (setup.py) ... done
Collecting tqdm (from kaggle)
  Downloading tqdm-4.66.4-py3-none-any.whl.metadata (57 kB)
Collecting python-slugify (from kaggle)
  Downloading python_slugify-8.0.4-py2.py3-none-any.whl.metadata (8.5 kB)
Collecting text-unidecode>=1.3 (from python-slugify->kaggle)
  Downloading text_unidecode-1.3-py2.py3-none-any.whl.metadata (2.4 kB)
Downloading python_slugify-8.0.4-py2.py3-none-any.whl (10 kB)
Downloading tqdm-4.66.4-py3-none-any.whl (78 kB)
Downloading text_unidecode-1.3-py2.py3-none-any.w

Set the Kaggle configuration directory to the current working directory, and restrict the permissions of the `kaggle.json` authentication file so that only its owner can read and write it.

In [7]:
# Tell the Kaggle CLI to look for kaggle.json in the project root
os.environ['KAGGLE_CONFIG_DIR'] = os.getcwd()
# Owner read/write only
! chmod 600 kaggle.json
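The same configuration step can be sketched in pure Python. This avoids the shell call and fails early if the token file is missing. The function name `secure_kaggle_token` is hypothetical. The file name `kaggle.json` and the `KAGGLE_CONFIG_DIR` variable are taken from the cell above. The sketch assumes a POSIX file system, where mode `600` is meaningful.

```python
import os
import stat


def secure_kaggle_token(config_dir: str, token_name: str = "kaggle.json") -> str:
    """Point KAGGLE_CONFIG_DIR at `config_dir` and set the token file to mode 600.

    Raises FileNotFoundError if the token file is not present in `config_dir`.
    Returns the full path of the token file.
    """
    token_path = os.path.join(config_dir, token_name)
    if not os.path.isfile(token_path):
        raise FileNotFoundError(f"Expected Kaggle token at {token_path}")
    # Equivalent to `chmod 600`: read and write for the owner only
    os.chmod(token_path, stat.S_IRUSR | stat.S_IWUSR)
    os.environ["KAGGLE_CONFIG_DIR"] = config_dir
    return token_path


# Usage in the notebook:
# secure_kaggle_token(os.getcwd())
```

Checking for the file first turns a missing token into an explicit error at this step, before the download is attempted.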

Set the Kaggle dataset path and the destination folder, then download the dataset.

In [9]:
# Kaggle dataset identifier (owner/dataset-slug) and local download folder
KaggleDatasetPath = "bmadushanirodrigo/fracture-multi-region-x-ray-data"
DestinationFolder = "inputs/fracture_dataset"
! kaggle datasets download -d {KaggleDatasetPath} -p {DestinationFolder}

Dataset URL: https://www.kaggle.com/datasets/bmadushanirodrigo/fracture-multi-region-x-ray-data
License(s): ODC Public Domain Dedication and Licence (PDDL)
Downloading fracture-multi-region-x-ray-data.zip to inputs/fracture_dataset
100%|████████████████████████████████████████| 481M/481M [00:19<00:00, 26.3MB/s]


Unzip the downloaded file, then delete the zip file.

In [10]:
import zipfile

# Extract the archive into the destination folder, then delete the zip file
zip_path = os.path.join(DestinationFolder, 'fracture-multi-region-x-ray-data.zip')
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(DestinationFolder)

os.remove(zip_path)
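Before moving on to splitting the data, it can help to confirm that the archive extracted as expected. The sketch below wraps the unzip step in a helper that validates the archive first, and adds a second helper that counts files per folder. Both function names are hypothetical. No particular folder layout inside the dataset is assumed, so the counts simply reflect whatever the archive contains.

```python
import os
import zipfile
from collections import Counter


def extract_and_remove(zip_path: str, dest_dir: str) -> None:
    """Extract `zip_path` into `dest_dir`, then delete the archive."""
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"Not a valid zip archive: {zip_path}")
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(dest_dir)
    os.remove(zip_path)


def count_files_by_folder(root: str) -> Counter:
    """Count files in each folder under `root`, keyed by path relative to `root`."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        if filenames:
            counts[os.path.relpath(dirpath, root)] += len(filenames)
    return counts


# Usage in the notebook, with DestinationFolder as defined above:
# for folder, n in sorted(count_files_by_folder(DestinationFolder).items()):
#     print(folder, n)
```

Printing the counts gives a quick view of how many files each folder holds, which is useful when deciding how to split the data into train, validation and test sets.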