# **Data Collection**

## Objectives

1. **Data Collection:**
   - Download the Kaggle dataset as a zip file.
   - Unzip the file into the designated directory, keeping the dataset's original, unorganized folder structure.

2. **Data Cleaning and Organization:**
   - Extract all images from the various folders within the dataset.
   - Categorize the images into two folders: 'Alert' and 'Not Alert'.
   - Remove non-image files to ensure a clean dataset.

3. **Dataset Splitting:**
   - Divide the cleaned and organized data into three sets: training, testing, and validation.
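
The splitting step in objective 3 could be sketched as follows. This is a minimal illustration, not necessarily the approach used later in the project: the function name `split_train_validation_test`, the default 70/10/20 ratios, and the `train`, `validation` and `test` folder names are all assumptions.

```python
import os
import random
import shutil


def split_train_validation_test(data_dir, train_ratio=0.7, validation_ratio=0.1,
                                test_ratio=0.2, seed=42):
    """Move files from data_dir/<label>/ into data_dir/<split>/<label>/.

    Hypothetical helper: the ratios and the split folder names are assumptions.
    """
    if abs(train_ratio + validation_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("train, validation and test ratios must sum to 1.0")

    splits = ("train", "validation", "test")
    # Label folders are the sub-folders of data_dir that are not split folders
    labels = [name for name in sorted(os.listdir(data_dir))
              if os.path.isdir(os.path.join(data_dir, name)) and name not in splits]

    rng = random.Random(seed)  # a fixed seed keeps the split reproducible
    for label in labels:
        label_dir = os.path.join(data_dir, label)
        files = sorted(os.listdir(label_dir))
        rng.shuffle(files)

        train_end = int(len(files) * train_ratio)
        validation_end = train_end + int(len(files) * validation_ratio)
        assignments = {
            "train": files[:train_end],
            "validation": files[train_end:validation_end],
            "test": files[validation_end:],  # the remainder goes to the test set
        }

        for split, split_files in assignments.items():
            target_dir = os.path.join(data_dir, split, label)
            os.makedirs(target_dir, exist_ok=True)
            for file_name in split_files:
                shutil.move(os.path.join(label_dir, file_name),
                            os.path.join(target_dir, file_name))

        os.rmdir(label_dir)  # the label folder is empty once its files are moved
```

Shuffling with a seeded generator means that re-running the notebook from a fresh download produces the same split, which makes later results easier to compare.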

## Inputs

- Kaggle JSON file.

## Outputs

- A well-organized dataset containing only images, categorized into 'Awake' and 'Drowsy', and separated into training, testing, and validation sets within the input folder.


---

# Import Libraries

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
import tensorflow as tf
sns.set_style("white")
from matplotlib.image import imread

2023-12-28 15:21:21.592556: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


# Change working directory

* We assume the notebooks are stored in a subfolder, so when running a notebook in the editor you need to change the working directory.

We need to change the working directory from the current folder to its parent folder.
* We access the current directory with os.getcwd()

In [2]:
import os
current_dir = os.getcwd()
current_dir

'/workspace/PP5-Driver-Awareness-Detector/jupyter_notebooks'

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() sets the new current directory

In [3]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [4]:
current_dir = os.getcwd()
current_dir

'/workspace/PP5-Driver-Awareness-Detector'

# Install Kaggle

Run the following to install the kaggle library

In [12]:
pip install kaggle

Note: you may need to restart the kernel to use updated packages.


Run the following code to set the Kaggle configuration directory to the current working directory and restrict the permissions on the Kaggle authentication JSON file

In [13]:
os.environ['KAGGLE_CONFIG_DIR'] = os.getcwd()
! chmod 600 kaggle.json
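
The cell above relies on a shell command and gives little feedback if `kaggle.json` is missing. The same two steps can be sketched in plain Python; the helper name `configure_kaggle_credentials` is hypothetical and is not part of the Kaggle library.

```python
import os
import stat


def configure_kaggle_credentials(config_dir):
    """Point the Kaggle client at config_dir and restrict kaggle.json to its owner.

    Hypothetical helper; it mirrors the KAGGLE_CONFIG_DIR and `chmod 600` steps.
    """
    token_path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(token_path):
        raise FileNotFoundError(f"kaggle.json not found in {config_dir}")

    os.environ["KAGGLE_CONFIG_DIR"] = config_dir
    # Owner read/write only, the equivalent of `chmod 600` on Unix-like systems
    os.chmod(token_path, stat.S_IRUSR | stat.S_IWUSR)
    return token_path
```

In this notebook it would be called as `configure_kaggle_credentials(os.getcwd())`, after the working directory has been changed to the project root.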

Download the required Kaggle dataset

In [19]:
KaggleDataset = "kutaykutlu/drowsiness-detection"
DestinationFolder = "inputs/awareness"
! kaggle datasets download -d {KaggleDataset} -p {DestinationFolder}

drowsiness-detection.zip: Skipping, found more recently modified local copy (use --force to force download)


Unzip the downloaded dataset and delete the remaining zip file

In [20]:
import zipfile
with zipfile.ZipFile(DestinationFolder + '/drowsiness-detection.zip', 'r') as zip_ref:
    zip_ref.extractall(DestinationFolder)

os.remove(DestinationFolder + '/drowsiness-detection.zip')
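
If the download was skipped or interrupted, the unzip cell fails partway through. A slightly more defensive version of the same step might look like the sketch below; the helper name `extract_and_remove_zip` is an assumption made for illustration.

```python
import os
import zipfile


def extract_and_remove_zip(zip_path, target_dir):
    """Extract zip_path into target_dir, then delete the archive.

    Hypothetical helper; returns the list of member names in the archive.
    """
    # is_zipfile returns False for a missing file as well as for a corrupt one
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is missing or is not a valid zip archive")

    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        member_names = zip_ref.namelist()
        zip_ref.extractall(target_dir)

    # Delete the archive only after extraction has finished without errors
    os.remove(zip_path)
    return member_names
```

With the variables defined earlier, the call would be `extract_and_remove_zip(os.path.join(DestinationFolder, 'drowsiness-detection.zip'), DestinationFolder)`.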

---

# Data Cleaning and Organization

To clean the data, we create two directories: one for images of open eyes and one for images of closed eyes. We then walk through all folders, targeting those named "open_eye" or "closed_eye", and move their images into the matching directory. Finally, we delete any unneeded files and any empty folders.

Import the shutil library

In [22]:
import shutil

In [27]:
# Set the directories to use
source_directory_path = 'inputs/awareness'
destination_directory_path = 'inputs/awareness'

def move_open_folders_to_open(source_dir, destination_dir, folder_to_move='open_eye', destination_folder='eyes_open'):
    """Move every file found in folders named folder_to_move into destination_dir/destination_folder."""
    open_folder_path = os.path.join(destination_dir, destination_folder)
    os.makedirs(open_folder_path, exist_ok=True)

    # Walk the whole tree and collect the files of each matching folder
    for root, dirs, files in os.walk(source_dir):
        if os.path.basename(root) == folder_to_move:
            for file in files:
                file_path = os.path.join(root, file)
                # Files that share a name across source folders map to the same destination path
                shutil.move(file_path, os.path.join(open_folder_path, file))

move_open_folders_to_open(source_directory_path, destination_directory_path)
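
The final cleaning step described above, deleting unneeded files and empty folders, could be sketched as follows. The function names and the list of image extensions are assumptions made for illustration; adjust the extensions to whatever the dataset actually contains.

```python
import os

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")  # assumed set of image types


def remove_non_image_files(data_dir, image_extensions=IMAGE_EXTENSIONS):
    """Delete every file under data_dir whose extension is not an image type."""
    removed = 0
    for root, _dirs, files in os.walk(data_dir):
        for file_name in files:
            # Compare in lower case so that 'photo.PNG' counts as an image
            if not file_name.lower().endswith(image_extensions):
                os.remove(os.path.join(root, file_name))
                removed += 1
    return removed


def remove_empty_folders(data_dir):
    """Delete empty folders under data_dir, deepest first; data_dir itself is kept."""
    removed = 0
    # topdown=False visits children before parents, so a folder that only
    # contained empty folders is itself empty by the time it is checked
    for root, _dirs, _files in os.walk(data_dir, topdown=False):
        if root != data_dir and not os.listdir(root):
            os.rmdir(root)
            removed += 1
    return removed
```

Both functions return the number of items they removed, so calling `remove_non_image_files('inputs/awareness')` followed by `remove_empty_folders('inputs/awareness')` also gives a quick sanity check of how much was cleaned up.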

---


# Push files to Repo

* In case you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
  # create here your folder
  # os.makedirs(name='')
  pass  # placeholder so the try block stays valid until a folder name is filled in
except Exception as e:
  print(e)
