## How to use exploretransform

Import the exploretransform package and load the included corrected Boston housing dataset:

In [38]:
import exploretransform as et

In [39]:
df, X, y = et.loadboston()

* df: full dataset
* X: predictors
* y: target

### Summary of Functions and Classes

In [40]:
%%html
<style>
table {float:left}
</style>

Function / Class | Description
:---- | :------------- 
nested | takes a list, series or dataframe and returns the location of nested objects
loadboston | loads the Boston housing dataset
glimpse | provides dtype, levels, and first five observations for a dataframe
describe | provides various statistics on a dataframe (zeros, inf, missing, levels, dtypes)
freq | for categorical or ordinal features, provides the count, percent, and cumulative percent for each level
plotfreq | provides a bar plot using the data generated by freq
corrtable | generates a table of all pairwise correlations and uses the average correlation of each row and column to decide on potential drop/filter candidates
calcdrop | analyzes corrtable output and determines which features should be filtered/dropped
skewstats | returns the skewness statistics and magnitude for each numeric feature
ascores | calculates various association scores (kendall, pearson, mic, dcor, spearman) between predictors and target
ColumnSelect | custom transformer that selects columns for pipeline
CategoricalOtherLevel | custom transformer that creates "other" level in categorical / ordinal data based on threshold
CorrelationFilter | custom transformer that filters numeric features based on pairwise correlation
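
To make the `freq` row above concrete, here is a minimal pandas sketch of a frequency table with count, percent, and cumulative percent per level. It is an illustration only, not exploretransform's implementation: the function name `freq_sketch` and the column names `level`, `n`, `pct`, and `cum_pct` are assumptions chosen for this example.

```python
import pandas as pd


def freq_sketch(series: pd.Series) -> pd.DataFrame:
    """Illustrative frequency table (hypothetical helper, not et.freq)."""
    # value_counts() sorts levels by descending count and skips missing values
    counts = series.value_counts()
    table = counts.rename_axis("level").reset_index(name="n")
    # Share of each level, then the running total of those shares
    table["pct"] = 100 * table["n"] / table["n"].sum()
    table["cum_pct"] = table["pct"].cumsum()
    return table


# Made-up data for demonstration
print(freq_sketch(pd.Series(["a", "a", "a", "b", "b", "c"])))
```

The cumulative column helps when choosing a threshold for grouping rare levels, which is the kind of decision CategoricalOtherLevel is described as automating.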

### Example: describe()

In [41]:
et.describe(X)

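variable | obs | q\_zer | p\_zer | q\_na | p\_na | q\_inf | p\_inf | dtype
:---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :----
town | 506 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | object
lon | 506 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | float64
lat | 506 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | float64
crim | 506 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | float64
zn | 506 | 372 | 73.52 | 0 | 0.0 | 0 | 0.0 | float64
indus | 506 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | float64
chas | 506 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | category
nox | 506 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | float64
rm | 506 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | float64
age | 506 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | float64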



Column | Description
:---- | :------------- 
variable | name of variable
obs | number of observations
q\_zer | number of zeros
p\_zer | percentage of zeros
q\_na | number of missing values
p\_na | percentage of missing values
q\_inf | number of infinite values
p\_inf | percentage of infinite values
dtype | pandas dtype
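
The columns above can be approximated with plain pandas and NumPy. The following is a minimal sketch under stated assumptions, not the package's actual code: `describe_sketch` is a hypothetical name, `obs` is taken to be the row count, percentages are rounded to two decimals, and non-numeric columns are assumed to report zero for the zero and infinity counts.

```python
import numpy as np
import pandas as pd


def describe_sketch(frame: pd.DataFrame) -> pd.DataFrame:
    """Illustrative per-column data-quality summary (not et.describe)."""
    n = len(frame)

    def pct(q: int) -> float:
        # Percentage of rows, rounded to two decimals; 0.0 for an empty frame
        return round(100 * q / n, 2) if n else 0.0

    rows = []
    for name in frame.columns:
        col = frame[name]
        q_na = int(col.isna().sum())
        if pd.api.types.is_numeric_dtype(col):
            # Drop missing values first so the float conversion is always safe
            values = col.dropna().astype("float64").to_numpy()
            q_zer = int((values == 0).sum())
            q_inf = int(np.isinf(values).sum())
        else:
            # Assumption: zeros and infinities are only counted for numeric data
            q_zer, q_inf = 0, 0
        rows.append({
            "variable": name, "obs": n,
            "q_zer": q_zer, "p_zer": pct(q_zer),
            "q_na": q_na, "p_na": pct(q_na),
            "q_inf": q_inf, "p_inf": pct(q_inf),
            "dtype": str(col.dtype),
        })
    return pd.DataFrame(rows)


# Made-up data for demonstration
demo = pd.DataFrame({"a": [0.0, 1.5, np.inf, np.nan], "b": ["x", "y", None, "z"]})
print(describe_sketch(demo))
```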

#### glimpse() returns:


Column | Description
:---- | :------------- 
variable | name of variable
dtype | pandas dtype
lvls | number of unique values of variable
obs | number of observations
first\_five\_observations | first five observations

In [44]:
et.glimpse(X)

variable | dtype | lvls | obs | first\_five\_observations
:---- | :---- | :---- | :---- | :-------------
town | object | 92 | 506 | [Nahant, Swampscott, Swampscott, Marblehead, M...
lon | float64 | 375 | 506 | [-70.955, -70.95, -70.936, -70.928, -70.922]
lat | float64 | 376 | 506 | [42.255, 42.2875, 42.283, 42.293, 42.298]
crim | float64 | 504 | 506 | [0.00632, 0.02731, 0.02729, 0.0323699999999999...
zn | float64 | 26 | 506 | [18.0, 0.0, 0.0, 0.0, 0.0]
indus | float64 | 76 | 506 | [2.31, 7.07, 7.07, 2.18, 2.18]
chas | category | 2 | 506 | [0, 0, 0, 0, 0]
nox | float64 | 81 | 506 | [0.5379999999999999, 0.469, 0.469, 0.457999999...
rm | float64 | 446 | 506 | [6.575, 6.421, 7.185, 6.997999999999999, 7.147]
age | float64 | 356 | 506 | [65.2, 78.9, 61.1, 45.8, 54.2]
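
For readers curious how a summary like this could be assembled, here is a rough pandas sketch that produces the same columns. It is an assumption-laden illustration rather than the library's code: `glimpse_sketch` is a hypothetical name, and `lvls` is computed with `nunique()`, which ignores missing values.

```python
import pandas as pd


def glimpse_sketch(frame: pd.DataFrame) -> pd.DataFrame:
    """Illustrative one-row-per-column overview (not et.glimpse)."""
    columns = ["variable", "dtype", "lvls", "obs", "first_five_observations"]
    rows = [
        {
            "variable": name,
            "dtype": str(frame[name].dtype),
            "lvls": int(frame[name].nunique()),  # distinct non-missing values
            "obs": len(frame),                   # assumption: row count
            "first_five_observations": frame[name].head(5).tolist(),
        }
        for name in frame.columns
    ]
    return pd.DataFrame(rows, columns=columns)


# Made-up data for demonstration
demo = pd.DataFrame({
    "group": ["a", "b", "b", "c", "a", "a"],
    "value": [18.0, 0.0, 0.0, 0.0, 0.0, 12.5],
})
print(glimpse_sketch(demo))
```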
