![workflow graph](Figures/SolutionNo_3_length_7.png "Workflow Graph")

In [None]:
from pathlib import Path
import sys

import pandas as pd

sys.path.append('/Users/stevep/Documents/code/APE_thesis/ape_asp')
from wrapper_functions import *    

## Workflow Input Objects

### Table 1
- id: `housing_train`
- source: `/Users/stevep/Documents/code/APE_thesis/ape_asp/ape_use_cases/thesis_use_cases/house_prices/train.csv`
- DataClass: `MixedDataFrame`
- StatisticalRelevance: `NoRelevance`

In [None]:
housing_train = load_table_csv('/Users/stevep/Documents/code/APE_thesis/ape_asp/ape_use_cases/thesis_use_cases/house_prices/train.csv')

### Step 1: `pairplot`
#### Notes
Passing `col` and `n` displays only the `n` features most correlated with `col`.
    > kwarg `hue` should be a column with few distinct values.
#### inputs:
- 1
	- DataClass: `MixedDataFrame`
	- StatisticalRelevance: `NoRelevance`
	- APE_label: `housing_train`
	- src: `(0, 12)`
- 2
	- StatisticalRelevance: `DependentVariable`
	- DataClass: `IntColumn`
	- APE_label: `SalePrice`
	- src: `(0, 10)`
- 3
	- APE_label: `10`
	- StatisticalRelevance: `NoRelevance`
	- DataClass: `Int`
	- src: `(0, 14)`
- 4
	- APE_label: `GarageArea`
	- DataClass: `IntColumn`
	- StatisticalRelevance: `IndependentVariable`
	- src: `(0, 9)`
#### outputs:
- 1
	- DataClass: `Figure`
	- StatisticalRelevance: `NoRelevance`
	- APE_label: `OverallQual`

In [None]:
figure_1_1 = pairplot(data=housing_train, col='SalePrice', n=10, hue='GarageArea')
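
The note on `hue` can be checked before plotting. The following is a minimal sketch, assuming "few" means a small number of distinct values; the helper `hue_candidates`, its `max_levels` threshold, and the toy table are hypothetical and not part of `wrapper_functions`.

```python
import pandas as pd

# Toy stand-in for the housing table; the values are made up for illustration.
df = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
    "GarageArea": [548, 460, 608, 642, 836, 480],
    "SaleCondition": ["Normal", "Normal", "Normal", "Abnorml", "Normal", "Partial"],
})


def hue_candidates(frame: pd.DataFrame, max_levels: int = 5) -> list:
    """Return the columns whose number of distinct values is at most max_levels."""
    counts = frame.nunique()  # distinct (non-null) values per column
    return counts[counts <= max_levels].index.tolist()


# Only columns with at most three distinct values qualify here.
candidates = hue_candidates(df, max_levels=3)
print(candidates)
```

A column that passes this check produces a legend with a handful of entries; a column with one distinct value per row would give every point its own colour.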

### Step 2: `k_most_corr_indep_var_corr_matrix`
#### Notes
Correlation matrix of the `k` columns most correlated with `col`.
#### inputs:
- 1
	- DataClass: `MixedDataFrame`
	- APE_label: `housing_train`
	- StatisticalRelevance: `NoRelevance`
	- src: `(0, 12)`
- 2
	- DataClass: `IntColumn`
	- APE_label: `SalePrice`
	- StatisticalRelevance: `DependentVariable`
	- src: `(0, 10)`
- 3
	- StatisticalRelevance: `NoRelevance`
	- DataClass: `Int`
	- APE_label: `10`
	- src: `(0, 14)`
#### outputs:
- 1
	- APE_label: `GrLivArea`
	- DataClass: `FloatDataFrame`
	- StatisticalRelevance: `NoRelevance`

In [None]:
floatDataFrame_2_1 = k_most_corr_indep_var_corr_matrix(data=housing_train, col='SalePrice', k=10)
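
One way such a matrix could be computed with plain pandas is sketched below. This is not the wrapper's actual implementation: the function name `top_k_corr_matrix`, the ranking by absolute Pearson correlation, and the choice to keep `col` itself in the result are all assumptions, and the data is made up.

```python
import pandas as pd

# Toy data; the column names echo the housing table but the values are invented.
df = pd.DataFrame({
    "SalePrice": [100, 150, 200, 250, 300],
    "GrLivArea": [10, 16, 19, 26, 30],
    "YrSold": [2008, 2006, 2010, 2007, 2009],
    "LotShape": ["Reg", "IR1", "Reg", "Reg", "IR1"],
})


def top_k_corr_matrix(frame: pd.DataFrame, col: str, k: int) -> pd.DataFrame:
    """Correlation matrix restricted to `col` and its k most correlated columns."""
    numeric = frame.select_dtypes(include="number")  # drop non-numeric columns
    corr = numeric.corr()  # pairwise Pearson correlation
    # Rank the other columns by absolute correlation with `col` (assumption).
    top = corr[col].drop(col).abs().nlargest(k).index.tolist()
    selected = [col] + top
    return corr.loc[selected, selected]


matrix = top_k_corr_matrix(df, col="SalePrice", k=1)
print(matrix)
```

The result is a float-valued DataFrame, which is consistent with the `FloatDataFrame` output type listed for this step.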

### Step 3: `heatmap`
#### Notes
`piv_col1`, `piv_col2` and `num_col` can be used to pivot the table before creating the heatmap.
#### inputs:
- 1
	- DataClass: `FloatDataFrame`
	- APE_label: `GrLivArea`
	- StatisticalRelevance: `NoRelevance`
	- src: `(2, 1)`
#### outputs:
- 1
	- APE_label: `GarageCars`
	- StatisticalRelevance: `NoRelevance`
	- DataClass: `Figure`
- 2
	- StatisticalRelevance: `NoRelevance`
	- APE_label: `GarageCars`
	- DataClass: `Axes`

In [None]:
figure_3_1, axes_3_2 = heatmap(df=floatDataFrame_2_1)
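
The pivoting mentioned in the note can be illustrated with `pandas.DataFrame.pivot_table`. This sketch assumes that `piv_col1` maps to the row index, `piv_col2` to the columns, and `num_col` to the cell values, aggregated by the mean; the wrapper may behave differently, and the data is made up.

```python
import pandas as pd

# Long-format toy data: one row per sale.
sales = pd.DataFrame({
    "SaleCondition": ["Normal", "Normal", "Abnorml", "Abnorml", "Normal"],
    "YrSold": [2008, 2009, 2008, 2009, 2008],
    "SalePrice": [200000, 180000, 120000, 140000, 220000],
})

# Assumed mapping: piv_col1 -> index, piv_col2 -> columns, num_col -> values.
grid = sales.pivot_table(
    index="SaleCondition",
    columns="YrSold",
    values="SalePrice",
    aggfunc="mean",
)
print(grid)
```

The resulting grid has one cell per (`SaleCondition`, `YrSold`) pair, which is the rectangular shape a heatmap needs.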

### Step 4: `set_figure_size`

#### inputs:
- 1
	- APE_label: `16`
	- StatisticalRelevance: `NoRelevance`
	- DataClass: `Int`
	- src: `(0, 15)`
- 2
	- DataClass: `Int`
	- StatisticalRelevance: `NoRelevance`
	- APE_label: `9`
	- src: `(0, 13)`
#### outputs:
- 1
	- DataClass: `Figure`
	- StatisticalRelevance: `NoRelevance`
	- APE_label: `1stFlrSF`
- 2
	- DataClass: `Axes`
	- APE_label: `OverallQual`
	- StatisticalRelevance: `NoRelevance`

In [None]:
figure_4_1, axes_4_2 = set_figure_size(x_size=16, y_size=9)

### Step 5: `scatterplot`
#### Notes
> kwarg `hue` should be a column with few distinct values.
    > kwarg `style` should be a column with few distinct values.
#### inputs:
- 1
	- DataClass: `MixedDataFrame`
	- StatisticalRelevance: `NoRelevance`
	- APE_label: `housing_train`
	- src: `(0, 12)`
- 2
	- APE_label: `TotRmsAbvGrd`
	- StatisticalRelevance: `IndependentVariable`
	- DataClass: `IntColumn`
	- src: `(0, 7)`
- 3
	- DataClass: `IntColumn`
	- APE_label: `SalePrice`
	- StatisticalRelevance: `DependentVariable`
	- src: `(0, 10)`
#### outputs:
- 1
	- StatisticalRelevance: `NoRelevance`
	- APE_label: `OverallQual`
	- DataClass: `Figure`
- 2
	- StatisticalRelevance: `NoRelevance`
	- APE_label: `OverallQual`
	- DataClass: `Axes`

In [None]:
figure_5_1, axes_5_2 = scatterplot(df=housing_train, x='TotRmsAbvGrd', y='SalePrice')

### Step 6: `boxplot`
#### Notes
> kwarg `x` should be a column with few distinct values.
    > kwarg `hue` should be a column with few distinct values.
#### inputs:
- 1
	- StatisticalRelevance: `NoRelevance`
	- APE_label: `housing_train`
	- DataClass: `MixedDataFrame`
	- src: `(0, 12)`
- 2
	- APE_label: `SalePrice`
	- StatisticalRelevance: `DependentVariable`
	- DataClass: `IntColumn`
	- src: `(0, 10)`
- 3
	- StatisticalRelevance: `IndependentVariable`
	- DataClass: `StrColumn`
	- APE_label: `SaleCondition`
	- src: `(0, 11)`
- 4
	- StatisticalRelevance: `NoRelevance`
	- DataClass: `Figure`
	- APE_label: `1stFlrSF`
	- src: `(4, 1)`
- 5
	- StatisticalRelevance: `NoRelevance`
	- DataClass: `Axes`
	- APE_label: `OverallQual`
	- src: `(4, 2)`
#### outputs:
- 1
	- StatisticalRelevance: `NoRelevance`
	- APE_label: `1stFlrSF`
	- DataClass: `Figure`
- 2
	- StatisticalRelevance: `NoRelevance`
	- DataClass: `Axes`
	- APE_label: `TotalBsmtSF`

In [None]:
figure_6_1, axes_6_2 = boxplot(df=housing_train, y='SalePrice', x='SaleCondition')

### Step 7: `rotate_x_labels`

#### inputs:
- 1
	- DataClass: `Figure`
	- StatisticalRelevance: `NoRelevance`
	- APE_label: `1stFlrSF`
	- src: `(6, 1)`
- 2
	- APE_label: `TotalBsmtSF`
	- DataClass: `Axes`
	- StatisticalRelevance: `NoRelevance`
	- src: `(6, 2)`
#### outputs:
- 1
	- DataClass: `Figure`
	- StatisticalRelevance: `NoRelevance`
	- APE_label: `YearBuilt`
- 2
	- APE_label: `YearBuilt`
	- DataClass: `Axes`
	- StatisticalRelevance: `NoRelevance`

In [None]:
figure_7_1, axes_7_2 = rotate_x_labels(figure=figure_6_1, axes=axes_6_2)
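
Steps 4, 6 and 7 hand the same `Figure` and `Axes` pair from one tool to the next. A rough matplotlib equivalent of that chain is sketched below. It does not use the wrapper functions; the 45-degree rotation and the toy price groups are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Step 4 analogue: create a figure of 16 x 9 inches with a single Axes.
fig, ax = plt.subplots(figsize=(16, 9))

# Step 6 analogue: draw one box per category on that same Axes (made-up prices).
groups = {
    "Normal": [200, 210, 190],
    "Abnorml": [120, 140, 130],
    "Partial": [260, 280, 300],
}
ax.boxplot(list(groups.values()))
ax.set_xticks(range(1, len(groups) + 1))  # boxplot places boxes at 1..n by default
ax.set_xticklabels(list(groups.keys()))

# Step 7 analogue: rotate the x tick labels so long category names do not overlap.
ax.tick_params(axis="x", labelrotation=45)

width, height = fig.get_size_inches()
print(width, height)
```

Because every call operates on the same `fig` and `ax` objects, the size set at the start and the rotation applied at the end both affect the one final plot.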