# 🔷 PART 1: Exploratory Data Analysis 🔷

In this Jupyter notebook, we take a **basic but comprehensive** look at the external datasets we were given: we manipulate, curate, and prepare the data so that we can ask critical questions and understand what further preprocessing the later prediction work will require.

---

## 🔵 TABLE OF CONTENTS 🔵 <a name="TOC"></a>

Use this **table of contents** to navigate the sections of this notebook.

#### 1. [Section A: Imports and Initializations](#section-A)

    All necessary imports and object instantiations for data preprocessing.

#### 2. [Section B: Manipulating Our Data](#section-B)

    Data manipulation operations, including (but not limited to) 
    null value imputation and data cleaning. 

#### 3. [Section C: Visualizing Trends Across Our Data](#section-C)

    Data visualizations that outline trends and patterns 
    in our data that may warrant further analysis.

#### 4. [Section D: Saving Our Interim Datasets](#section-D)

    Saving preprocessed data states for further access.

#### 5. [Appendix: Supplementary Custom Objects](#appendix)

    Custom object architectures used throughout the data preprocessing.
    
---

## 🔹 Section A: Imports and Initializations <a name="section-A"></a>

General Imports for Data Manipulation and Visualization.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Custom Algorithmic Structures for Processed Data Visualization.

In [2]:
import sys
sys.path.append("../structures")

from dataset_preprocessor import Dataset_Preprocessor

##### [(back to top)](#TOC)

---

## 🔹 Section B: Manipulating Our Data <a name="section-B"></a>

Instantiating our Preprocessor Engine.

In [3]:
preproc = Dataset_Preprocessor()

Reading our data into a tree-like, hierarchical dictionary object.

In [4]:
datasets = preproc.load_data(which="all")
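The structure returned by `load_data` is not shown in this notebook. As a rough illustration of what a tree-like dictionary of datasets could look like, the sketch below nests pandas DataFrames under hypothetical keys (`bioassay_A`, `train`, `test`) and walks the tree recursively. The keys, the toy data, and the helper `iter_leaves` are assumptions made for this example and are not part of `Dataset_Preprocessor`.

```python
import pandas as pd

# Hypothetical stand-in for a nested dataset dictionary; keys and rows are invented.
example_tree = {
    "bioassay_A": {
        "train": pd.DataFrame({"Outcome": ["Active", "Inactive"]}),
        "test": pd.DataFrame({"Outcome": ["Inactive"]}),
    },
    "bioassay_B": {
        "train": pd.DataFrame({"Outcome": ["Active"]}),
    },
}

def iter_leaves(tree, path=()):
    """Yield (path, DataFrame) pairs for every DataFrame leaf in a nested dict."""
    for key, value in tree.items():
        if isinstance(value, dict):
            # Descend into the sub-dictionary, extending the key path
            yield from iter_leaves(value, path + (key,))
        else:
            yield path + (key,), value

# Summarize each leaf by its slash-joined path and its (rows, columns) shape
leaf_shapes = {"/".join(p): df.shape for p, df in iter_leaves(example_tree)}
print(leaf_shapes)
```

Walking the tree this way makes it easy to check how many datasets were loaded and what shape each one has before any further manipulation.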

### Feature Encoding

Map over the `Outcome` feature of every dataset and re-encode it according to the following specification:
- _Active_: **`1`**
- _Inactive_: **`0`**

In [5]:
# Create encoding map for `Outcome` target feature as object property
preproc.outcome_encoding_map = {"Inactive": 0, "Active": 1}

# Call `.encode_feature()` to recursively create a new encoded feature on each dataset
preproc.encode_feature(dataset_structure=datasets,
                       old_feature="Outcome",
                       new_feature="TargetActivity",
                       encoding_map=preproc.outcome_encoding_map)
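The internals of `.encode_feature()` live in the custom preprocessor and are not reproduced here. As a minimal sketch of the idea, assuming the datasets are pandas DataFrames stored as leaves of a nested dictionary and that the new column is added in place, a recursive encoder could look like the following. The function `encode_feature_sketch` and the toy data are hypothetical and only mirror the parameter names used above.

```python
import pandas as pd

def encode_feature_sketch(tree, old_feature, new_feature, encoding_map):
    """Recursively add `new_feature` to every DataFrame leaf of a nested dict (in place)."""
    for value in tree.values():
        if isinstance(value, dict):
            # Recurse into nested groups of datasets
            encode_feature_sketch(value, old_feature, new_feature, encoding_map)
        else:
            # Series.map yields NaN for any label that is missing from the map
            value[new_feature] = value[old_feature].map(encoding_map)

# Toy structure with one nested dataset (invented for illustration)
toy = {"group": {"train": pd.DataFrame({"Outcome": ["Active", "Inactive", "Active"]})}}
encode_feature_sketch(toy, "Outcome", "TargetActivity", {"Inactive": 0, "Active": 1})

encoded = toy["group"]["train"]["TargetActivity"].tolist()
print(encoded)
```

Writing to a new column rather than overwriting `Outcome` keeps the original labels available for later inspection.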

##### [(back to top)](#TOC)

---

## 🔹 Section C: Visualizing Trends Across Our Data <a name="section-C"></a>

##### [(back to top)](#TOC)

---

## 🔹 Section D: Saving Our Interim Datasets <a name="section-D"></a>

##### [(back to top)](#TOC)

---

## 🔹 Appendix: Supplementary Custom Objects <a name="appendix"></a>

#### A[0]: Data Preprocessor Engine.

Data preprocessor and dataset constructor tailored to the Kaggle bioassay and compound activity data.

##### [(back to top)](#TOC)

---