# 🔷 PART 1: Exploratory Data Analysis 🔷

In this Jupyter notebook, we take a **basic but comprehensive** look at our external datasets: we manipulate, curate, and prepare the data so that we can ask critical questions and understand how to carry out the prediction-driven preprocessing that follows.

---

## 🔵 TABLE OF CONTENTS 🔵 <a name="TOC"></a>

Use this **table of contents** to navigate the various sections of the preprocessing notebook.

#### 1. [Section A: Imports and Initializations](#section-A)

    All necessary imports and object instantiations for data preprocessing.

#### 2. [Section B: Manipulating Our Data](#section-B)

    Data manipulation operations, including (but not limited to) 
    null value imputation and data cleaning. 

#### 3. [Section C: Visualizing Trends Across Our Data](#section-C)

    Data visualizations to outline trends and patterns 
    inherent across our data that may mandate further analysis.

#### 4. [Section D: Saving Our Interim Datasets](#section-D)

    Saving preprocessed data states for further access.

#### 5. [Appendix: Supplementary Custom Objects](#appendix)

    Custom object architectures used throughout the data preprocessing.
    
---

## 🔹 Section A: Imports and Initializations <a name="section-A"></a>

General Imports for Data Manipulation and Visualization.

In [9]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Specialized Imports for Glob-Based File/Directory Navigation.

In [10]:
import os, glob

Specialized Imports for Image Modification.

In [111]:
from PIL import Image
from PIL.ExifTags import TAGS

Custom Algorithmic Structures for Processed Data Visualization.

In [12]:
import sys
sys.path.append("../source/structures")

# TODO: Place custom structures from `../source/structures` here.

##### [(back to top)](#TOC)

---

## 🔹 Section B: Manipulating Our Data <a name="section-B"></a>

In [97]:
DIRPATHS = {
    "Parasitized":  "/Volumes/Bianca/DEVELOPER/data-science/Malaria-Imaging/datasets/1-raw/cell_images/Parasitized/",
    "Uninfected":   "/Volumes/Bianca/DEVELOPER/data-science/Malaria-Imaging/datasets/1-raw/cell_images/Uninfected/",
    "dummy":        "/Volumes/Bianca/DEVELOPER/data-science/Malaria-Imaging/datasets/1-raw/cell_images/dummy/"
}

In [98]:
def get_image_data(key):
    """ Create a dataset of RGB images from an imaging subdirectory. """
    dataset = dict()
    # Glob once and sort so that position-based IDs are reproducible across runs.
    images = sorted(glob.glob(os.path.join(DIRPATHS[key], "*.png")))
    for position, filename in enumerate(images):
        with open(filename, "rb") as frb:
            # `.convert()` loads the pixel data before the file handle closes.
            dataset["{}_{}".format(key, position)] = Image.open(frb).convert("RGB")
    return dataset

Importation of Dummy Dataset(s).

In [99]:
dataset_dummy = get_image_data(key="dummy")

Importation of True Datasets.

In [90]:
# dataset_parasitized = get_image_data(key="Parasitized")
# dataset_uninfected =  get_image_data(key="Uninfected")

General Investigation and Manipulation of Input Dataset(s).

In [101]:
supplement_dummy = pd.DataFrame(columns=["ID"], data=list(dataset_dummy.keys()))
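As a rough sketch of how an ID table like the one above could be extended, the snippet below also records each image's width and height. The helper name `build_supplement`, the extra columns, and the in-memory stand-in images are assumptions made for illustration; they are not part of the notebook.

```python
import pandas as pd
from PIL import Image


def build_supplement(dataset):
    """Hypothetical helper: tabulate ID, width, and height for each image."""
    records = [
        {"ID": image_id, "width": image.size[0], "height": image.size[1]}
        for image_id, image in dataset.items()
    ]
    return pd.DataFrame(records, columns=["ID", "width", "height"])


# Stand-in images so the sketch runs without the raw PNG directory.
toy_dataset = {
    "dummy_0": Image.new("RGB", (120, 100)),
    "dummy_1": Image.new("RGB", (90, 110)),
}
supplement_toy = build_supplement(toy_dataset)
print(supplement_toy)
```

Keeping image dimensions next to the IDs may make it easier to spot irregularly sized images before any resizing step.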

In [125]:
np.array(dataset_dummy["dummy_1"]).flatten()

array([0, 0, 0, ..., 0, 0, 0], dtype=uint8)
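The flattened view above hides the image's structure, and the leading and trailing zeros are likely black background pixels. A minimal sketch of a more informative inspection follows; it keeps the `(height, width, channels)` shape and summarizes each channel. The toy image and the variable names are assumptions standing in for `dataset_dummy["dummy_1"]`.

```python
import numpy as np
from PIL import Image

# Stand-in image: a black canvas with a coloured square in the centre.
toy_image = Image.new("RGB", (8, 8), color=(0, 0, 0))
toy_image.paste((200, 100, 50), (2, 2, 6, 6))

pixels = np.array(toy_image)              # shape: (height, width, channels)
channel_means = pixels.mean(axis=(0, 1))  # one mean per R, G, B channel
# A pixel counts as background here only if all three channels are zero.
background_fraction = (pixels.sum(axis=2) == 0).mean()

print(pixels.shape, pixels.dtype)
print(channel_means, background_fraction)
```

The same three lines of analysis should apply unchanged to any RGB image loaded by `get_image_data`.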

In [113]:
exifdata = dataset_dummy["dummy_0"].getexif()

In [114]:
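for tag_id in exifdata:
    # Translate the numeric tag ID into a readable name when one is known.
    tag = TAGS.get(tag_id, tag_id)
    fields = exifdata.get(tag_id)
    if isinstance(fields, bytes):
        # Replace undecodable bytes instead of raising UnicodeDecodeError.
        fields = fields.decode(errors="replace")
    print(f"{str(tag):25}: {fields}")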
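The loop above prints nothing when an image carries no EXIF data, which is common for PNG files. As a hedged sketch, the same logic can be wrapped in a function that returns a dictionary, so that an empty result is explicit. The helper name `exif_to_dict` and the toy tag values are assumptions for illustration only.

```python
from PIL import Image
from PIL.ExifTags import TAGS


def exif_to_dict(exifdata):
    """Hypothetical helper: map readable EXIF tag names to decoded values."""
    readable = {}
    for tag_id, value in exifdata.items():
        if isinstance(value, bytes):
            # Replace undecodable bytes instead of raising an error.
            value = value.decode(errors="replace")
        readable[TAGS.get(tag_id, tag_id)] = value
    return readable


# Toy EXIF container built in memory; the values are made up.
toy_exif = Image.Exif()
toy_exif[0x010E] = "Stained blood smear"  # 0x010E is the ImageDescription tag
toy_exif[0x0131] = b"toy-software"        # 0x0131 is the Software tag
toy_metadata = exif_to_dict(toy_exif)

# A freshly created image has no EXIF data, so the result is empty.
empty_metadata = exif_to_dict(Image.new("RGB", (4, 4)).getexif())

print(toy_metadata)
print(empty_metadata)
```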

##### [(back to top)](#TOC)

---

## 🔹 Section C: Visualizing Trends Across Our Data <a name="section-C"></a>

##### [(back to top)](#TOC)

---

## 🔹 Section D: Saving Our Interim Datasets <a name="section-D"></a>
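This section does not yet contain any saving logic. One possible approach, sketched under assumptions, is to write tabular interim data such as the ID table to CSV and confirm that it reads back intact. The toy table, the file name `supplement_dummy.csv`, and the temporary directory are hypothetical; a real run would presumably target an interim folder in the project.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for an interim table such as `supplement_dummy`.
supplement_toy = pd.DataFrame({"ID": ["dummy_0", "dummy_1"], "width": [120, 90]})

# A temporary directory keeps the sketch free of side effects.
with tempfile.TemporaryDirectory() as interim_dir:
    interim_path = os.path.join(interim_dir, "supplement_dummy.csv")
    supplement_toy.to_csv(interim_path, index=False)  # omit the index column
    restored = pd.read_csv(interim_path)

print(restored)
```

Passing `index=False` avoids writing the row index, which would otherwise reappear as an unnamed extra column when the file is read back.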

##### [(back to top)](#TOC)

---

## 🔹 Appendix: Supplementary Custom Objects <a name="appendix"></a>

##### [(back to top)](#TOC)

---