# Interactive Data Exploration, Analysis, and Reporting

- Author: Team Data Science Process from Microsoft 
- Date: 2017/03
- Supported Data Sources: CSV files on the machine where the Jupyter notebook runs or data stored in SQL server
- Output: IDEAR_Report.ipynb


This is the **Interactive Data Exploration, Analysis and Reporting (IDEAR)** tool in _**Python**_, running on Jupyter Notebook. The data can come from a CSV file on the machine where the Jupyter notebook runs or from a query against a SQL server. A YAML file that describes the data must be configured before running this tool.
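
The configuration keys are not listed in this introduction, but the data-loading cell of this notebook reads `DataFilePath`, `Target`, `CategoricalColumns`, `NumericalColumns`, `ColumnsToExclude`, `FloatDataTypes` and, for SQL sources, `DataSource`, `Server`, `Database`, `Username`, `Password`, and `Query` from the parsed YAML file. The following is a minimal sketch of a sanity check on the parsed dictionary. The helper name `check_conf`, the checks it performs, and the sample values are assumptions for illustration and are not part of IDEAR.

```python
def check_conf(conf):
    """Return a list of problems found in a parsed IDEAR-style config dict.

    Hypothetical helper: the key names mirror those read by the notebook,
    but the checks themselves are an assumption, not IDEAR behavior.
    """
    problems = []
    if "Target" not in conf:
        problems.append("missing key: Target")
    if "DataSource" in conf:
        # SQL branch: connection details and a query are read from the config.
        for key in ("Server", "Database", "Username", "Password", "Query"):
            if key not in conf:
                problems.append("missing key: " + key)
    else:
        # CSV branch: the notebook reads conf['DataFilePath'][0], so a list is expected.
        paths = conf.get("DataFilePath")
        if not isinstance(paths, list) or not paths:
            problems.append("DataFilePath must be a non-empty list")
    return problems


# Sample values are made up for illustration.
sample_conf = {
    "DataFilePath": ["./adult.csv"],
    "Target": "label_IsOver50K",
    "ColumnsToExclude": ["fnlwgt"],
}
print(check_conf(sample_conf))            # no problems expected for this sample
print(check_conf({"DataSource": "SQL"}))  # lists the missing keys
```

Running a check like this right after loading the YAML file surfaces a missing key before any widget cell fails with a less obvious `KeyError`.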

## Step 1: Configure and Set up IDEAR

Before using the functionality provided by IDEAR, you first need to [configure and set up](#setup) the utilities by providing the YAML file and loading the necessary Python modules and libraries.

## Step 2: Start using IDEAR
This tool provides various functionalities to help users explore the data and get insights through interactive visualization and statistical testing. 

- [Read and Summarize the data](#read and summarize)

- [Extract Descriptive Statistics of Data](#descriptive statistics)

- [Explore Individual Variables](#individual variables)

- [Explore Interactions between Variables](#multiple variables)

    - [Rank variables](#rank variables)
    
    - [Interaction between two categorical variables](#two categorical)
    
    - [Interaction between two numerical variables](#two numerical)

    - [Interaction between numerical and categorical variables](#numerical and categorical)

    - [Interaction between two numerical variables and a categorical variable](#two numerical and categorical)

- [Visualize High Dimensional Data via Projecting to Lower Dimension Principal Component Spaces](#pca)

- [Generate Data Report](#report)

After you finish exploring the data interactively, you can [show/hide the source code](#show hide codes) to make your notebook look neater.

**Note**:

- Change the working directory and yaml file before running IDEAR in Jupyter Notebook.

- Run the cells and click the *Export* button to export the code that generates the visualization/analysis results to temporary Jupyter notebooks.

- Run the last cell and click [***Generate Final Report***](#report) to create *IDEAR_Report.ipynb* in the working directory. _If you do not export the code in some sections, you may see warnings that some temporary Jupyter Notebook files are missing_.

- Upload *IDEAR_Report.ipynb* to a Jupyter Notebook server and run it to generate the report.

## <a name="setup"></a>Global Configuration and Setting Up

In [1]:
# Set the working directory to the directory that contains ReportMagics.py
# Escape backslashes as \\ in Windows paths
import os
workingDir = 'C:\\Users\\jul_9\\OneDrive\\Escritorio\\Azure-TDSP-Utilities-5ce6b62f771abae218ef57bfa75545a4cb472d04\\DataScienceUtilities\\DataReport-Utils\\Python'
os.chdir(workingDir)

from ReportMagics import *

merged_report ='IDEAR_Report.ipynb'
%reset_all

In [2]:
#%%add_conf_code_to_report
import os
workingDir = 'C:\\Users\\jul_9\\OneDrive\\Escritorio\\Azure-TDSP-Utilities-5ce6b62f771abae218ef57bfa75545a4cb472d04\\DataScienceUtilities\\DataReport-Utils\\Python'
os.chdir(workingDir)

conf_file = '.\\para-adult.yaml'
Sample_Size = 10000

export_dir = '.\\tmp\\'

### Import necessary packages and set up environment parameters

In [3]:
#%%add_conf_code_to_report

import pandas as pd
import numpy as np
import os
#os.chdir(workingDir)
import collections
import matplotlib
import io
import sys
import operator

import nbformat as nbf
import IPython
from IPython.display import HTML, Javascript, clear_output, display
import ipywidgets
from ipywidgets import VBox, fixed, interact, interactive, widgets
import scipy.stats as stats
from statsmodels.graphics.mosaicplot import mosaic
import statsmodels.api as sm
from statsmodels.formula.api import ols
import os
import errno
import seaborn as sns
from string import Template
from functools import partial
from collections import OrderedDict
import warnings
warnings.filterwarnings("ignore")

# Utility Classes
from ConfUtility import * 
from ReportGeneration import *
from UniVarAnalytics import *
from MultiVarAnalytics import *

%matplotlib inline

#DEBUG=0

font={'family':'sans-serif','weight':'normal','size':8}  # 'normal' is not a font family and triggers findfont warnings
matplotlib.rc('font',**font)
matplotlib.rcParams['figure.figsize'] = (12.0, 5.0)
matplotlib.rc('xtick', labelsize=9) 
matplotlib.rc('ytick', labelsize=9)
matplotlib.rc('axes', labelsize=10)
matplotlib.rc('axes', titlesize=10)
sns.set_style('whitegrid')

### Define some functions for generating reports

In [4]:


if not os.path.exists(export_dir):
    os.makedirs(export_dir)
    

def translate_code_commands(cell, exported_cols, composite=False):    
    new_code_store = []
    exported_cols = [each for each in exported_cols if each!='']   
    for each in exported_cols:       
        w,x,y = each.split(',')
        with open('log.txt','w') as fout:
            fout.write('Processing call for the column {}'.format(each))
        temp=cell[0]

        new_line = temp.replace('interactive','apply').replace(
            "df=fixed(df)","df").replace("filename=fixed(filename)","'"+ReportMagic.var_files+"'").replace(
            "col1=w1","'"+w+"'").replace("col2=w2","'"+x+"'").replace("col3=w3","'"+y+"'").replace(
            "col3=fixed(w3)","'"+y+"'").replace(
            "Export=w_export","False").replace("conf_dict=fixed(conf_dict)","conf_dict")       
        new_line = new_line.replace("df,","[df,")
        new_line = new_line[:len(new_line)-1]+"])"
        new_line = new_line.replace("apply(","").replace(", [", "(*[")
        new_code_store.append(new_line)        
    return new_code_store


    
def silentremove(filename):
    try:
        os.remove(filename)
    except OSError as e:
        if e.errno != errno.ENOENT: # errno.ENOENT = no such file or directory
            raise # re-raise the exception if a different error occurred


def getWidgetValue(w):
    w_value = ''
    try:
        w_value = w.value
    except Exception:  # e.g. w is None or has no 'value' attribute
        pass    
    return w_value
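
On Python 3.8 or later, the same "delete if present" behavior as `silentremove` is available from the standard library's `pathlib`. The following is a sketch of an alternative, not the helper the notebook uses; the name `silent_remove_path` and the demo file name are hypothetical.

```python
import tempfile
from pathlib import Path


def silent_remove_path(filename):
    """Delete filename if it exists; do nothing if it is already gone."""
    # missing_ok=True (Python 3.8+) suppresses FileNotFoundError only;
    # other OSErrors, such as permission errors, still propagate.
    Path(filename).unlink(missing_ok=True)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "numeric_variables.csv"
    target.write_text("age,,\n")
    silent_remove_path(target)   # removes the file
    silent_remove_path(target)   # second call is a no-op
    still_exists = target.exists()

print(still_exists)
```

As with `silentremove`, only the "file does not exist" case is swallowed, so a genuine failure such as a locked file is still reported.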


    
            

## <a name="read and summarize"></a> Read and Summarize the Data

### Read data and infer column types

In [5]:

import yaml
with open(conf_file, 'r') as file:
    conf_dict = yaml.safe_load(file)

# Read in data from local file or SQL server
if 'DataSource' not in conf_dict:
    df=pd.read_csv(conf_dict['DataFilePath'][0], skipinitialspace=True)
else:
    import pyodbc
    cnxn = pyodbc.connect('driver=ODBC Driver 11 for SQL Server;server={};database={};Uid={};Pwd={}'.format(
            conf_dict['Server'], conf_dict['Database'],conf_dict['Username'],conf_dict['Password']))
    df = pd.read_sql(conf_dict['Query'],cnxn)

# Making sure that we are not reading any extra column
df = df[[each for each in df.columns if 'Unnamed' not in each]]

# Sample the data if it has more rows than Sample_Size (10000 in this configuration)
df0 = df # df0 is the unsampled data. It is used in exploration and analysis steps where sampling is not needed.
         # However, keep in mind that your final report will always be based on the sampled data. 
if Sample_Size < df.shape[0]:
    df = df.sample(Sample_Size)

# change float data types
if 'FloatDataTypes' in conf_dict:   
    for col_name in conf_dict['FloatDataTypes']:
        df[col_name] = df[col_name].astype(float)      
        
# Getting the list of categorical columns if it was not there in the yaml file
if 'CategoricalColumns' not in conf_dict:
    conf_dict['CategoricalColumns'] = list(set(list(df.select_dtypes(exclude=[np.number]).columns)))

# Getting the list of numerical columns if it was not there in the yaml file
if 'NumericalColumns' not in conf_dict:
    conf_dict['NumericalColumns'] = list(df.select_dtypes(include=[np.number]).columns)    

# Exclude columns that we do not need
if 'ColumnsToExclude' in conf_dict:
    conf_dict['CategoricalColumns'] = list(set(conf_dict['CategoricalColumns'])-set(conf_dict['ColumnsToExclude']))
    conf_dict['NumericalColumns'] = list(set(conf_dict['NumericalColumns'])-set(conf_dict['ColumnsToExclude']))

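# Count the unique categories of each categorical variable, then order the variables.
# Note: itemgetter(0) sorts by column name (descending); use itemgetter(1) to sort by the number of unique categories.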
filtered_cat_columns = []
temp_dict = {}
for cat_var in conf_dict['CategoricalColumns']:
    temp_dict[cat_var] = len(np.unique(df[cat_var]))
sorted_x = sorted(temp_dict.items(), key=operator.itemgetter(0), reverse=True)
conf_dict['CategoricalColumns'] = [x for (x,y) in sorted_x]

ConfUtility.dict_to_htmllist(conf_dict,['Target','CategoricalColumns','NumericalColumns'])
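
If the categorical columns should be ordered by cardinality rather than by name, the sort key has to be the count of unique values. The following is a minimal sketch on made-up data, assuming pandas is available as in the rest of the notebook; the names `toy`, `counts`, and `by_cardinality` are illustrative only.

```python
import pandas as pd

# Toy data, made up for illustration.
toy = pd.DataFrame({
    "sex": ["F", "M", "F", "M"],
    "race": ["A", "B", "C", "A"],
    "workclass": ["x", "y", "z", "w"],
})
cat_cols = ["sex", "race", "workclass"]

# nunique() counts distinct non-missing values per column.
counts = {col: toy[col].nunique() for col in cat_cols}

# Sort on the count (the dict value), largest first.
by_cardinality = sorted(counts, key=counts.get, reverse=True)
print(by_cardinality)
```

Sorting on the dictionary value puts the columns with the most categories first, which is the ordering a cardinality-based dropdown would need.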

### Print the first n (n=5 by default) rows of the data

In [6]:

def custom_head(df,NoOfRows):
    return HTML(df.head(NoOfRows).style.set_table_attributes("class='table'").to_html())
i = interact(custom_head,df=fixed(df0), NoOfRows=ipywidgets.IntSlider(min=0, max=30, step=1, \
                                                                     value=5, description='Number of Rows'))

interactive(children=(IntSlider(value=5, description='Number of Rows', max=30), Output()), _dom_classes=('widg…

### Print the dimensions of the data (rows, columns)

In [7]:

print ('The data has {} Rows and {} columns'.format(df0.shape[0],df0.shape[1]))

The data has 32561 Rows and 15 columns


### Print the column names of the data

In [8]:

col_names = ','.join(each for each in list(df.columns))
print("The column names are:" + col_names)

The column names are:age,workclass,fnlwgt,education,educationnum,maritalstatus,occupation,relationship,race,sex,capitalgain,capitalloss,hoursperweek,nativecountry,label_IsOver50K


### Print the column types

In [9]:

print("The types of columns are:")
df.dtypes

The types of columns are:


age                 int64
workclass          object
fnlwgt              int64
education          object
educationnum        int64
maritalstatus      object
occupation         object
relationship       object
race               object
sex                object
capitalgain         int64
capitalloss         int64
hoursperweek        int64
nativecountry      object
label_IsOver50K     int64
dtype: object

## <a name="descriptive statistics"></a>Extract Descriptive Statistics of Each Column

In [10]:

def num_missing(x):
    return len(x.index)-x.count()

def num_unique(x):
    return len(np.unique(x))

temp_df = df0.describe().T
missing_df = pd.DataFrame(df0.apply(num_missing, axis=0)) 
missing_df.columns = ['missing']
unq_df = pd.DataFrame(df0.apply(num_unique, axis=0))
unq_df.columns = ['unique']
types_df = pd.DataFrame(df0.dtypes)
types_df.columns = ['DataType']
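
`np.unique` sorts its input, so a helper like `num_unique` can raise a `TypeError` on an object column that mixes strings with missing values (`NaN`). The following is a sketch of the same per-column counts using pandas built-ins, on made-up data. Note the difference in semantics: `nunique()` ignores `NaN` by default, whereas `np.unique` would count it as a value.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value per column, made up for illustration.
toy = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 25.0],
    "workclass": ["Private", "State-gov", np.nan, "Private"],
})

missing = toy.isna().sum()   # same as len(x.index) - x.count() per column
unique = toy.nunique()       # distinct non-missing values; NaN is not counted

summary = pd.DataFrame({"missing": missing, "unique": unique, "DataType": toy.dtypes})
print(summary)
```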

### Print the descriptive statistics of numerical columns

In [11]:

summary_df = temp_df.join(missing_df).join(unq_df).join(types_df)
summary_df

| | count | mean | std | min | 25% | 50% | 75% | max | missing | unique | DataType |
|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 32561.0 | 38.581647 | 13.640433 | 17.0 | 28.0 | 37.0 | 48.0 | 90.0 | 0 | 73 | int64 |
| fnlwgt | 32561.0 | 189778.366512 | 105549.977697 | 12285.0 | 117827.0 | 178356.0 | 237051.0 | 1484705.0 | 0 | 21648 | int64 |
| educationnum | 32561.0 | 10.080679 | 2.57272 | 1.0 | 9.0 | 10.0 | 12.0 | 16.0 | 0 | 16 | int64 |
| capitalgain | 32561.0 | 1077.648844 | 7385.292085 | 0.0 | 0.0 | 0.0 | 0.0 | 99999.0 | 0 | 119 | int64 |
| capitalloss | 32561.0 | 87.30383 | 402.960219 | 0.0 | 0.0 | 0.0 | 0.0 | 4356.0 | 0 | 92 | int64 |
| hoursperweek | 32561.0 | 40.437456 | 12.347429 | 1.0 | 40.0 | 40.0 | 45.0 | 99.0 | 0 | 94 | int64 |
| label_IsOver50K | 32561.0 | 0.24081 | 0.427581 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0 | 2 | int64 |


### Print the descriptive statistics of categorical columns

In [12]:

col_names = list(types_df.index) #Get all col names
num_cols = len(col_names)
index = range(num_cols)
cat_index = []
for i in index: #Find the indices of columns in Categorical columns
    if col_names[i] in conf_dict['CategoricalColumns']:
        cat_index.append(i)
summary_df_cat = missing_df.join(unq_df).join(types_df.iloc[cat_index], how='inner') #Only summarize categorical columns
summary_df_cat

| | missing | unique | DataType |
|---|---|---|---|
| workclass | 0 | 9 | object |
| education | 0 | 16 | object |
| maritalstatus | 0 | 7 | object |
| occupation | 0 | 15 | object |
| relationship | 0 | 6 | object |
| race | 0 | 5 | object |
| sex | 0 | 2 | object |
| nativecountry | 0 | 42 | object |
| label_IsOver50K | 0 | 2 | int64 |


## <a name="individual variables"></a>Explore Individual Variables

### Explore the target variable

In [13]:
md_text = '## Target Variable'
filename = 'tmp/target_variables.csv'


if conf_dict['Target'] in conf_dict['CategoricalColumns']:
    w1_value,w2_value,w3_value = '','',''
    w1, w2, w3, w4 = None, None, None, None
    silentremove(filename)    
    w1 = widgets.Dropdown(
        options=[conf_dict['Target']],
        value=conf_dict['Target'],
        description='Target Variable:',
    )

    
    
    i = interactive(TargetAnalytics.custom_barplot, df=fixed(df), \
                                                    filename=fixed(filename), col1=w1) 
    
    hbox = widgets.HBox([i])
    display(hbox)
    
else:
    w1_value, w2_value, w3_value = '', '', ''
    w1, w2, w3, w4 = None, None, None, None
    silentremove(filename) 
    w1 = widgets.Dropdown(
            options=[conf_dict['Target']],
            value=conf_dict['Target'],
            description='Target Variable:',
        )
    
    
    i = interactive(NumericAnalytics.custom_barplot, df=fixed(df), filename=fixed(filename),\
                                                    col1=w1) 
   
    hbox = widgets.HBox([i])
    display(hbox)
   

HBox(children=(interactive(children=(Dropdown(description='Target Variable:', options=('label_IsOver50K',), va…

### Explore individual numeric variables and test for normality (on sampled data)

In [14]:
md_text = '## Visualize Individual Numerical Variables (on Sampled Data)'
filename = ReportMagic.var_files='tmp/numeric_variables.csv'

w1_value, w2_value, w3_value = '', '', ''
w1, w2, w3, w4 = None, None, None, None
silentremove(filename) 
w1 = widgets.Dropdown(
        options=conf_dict['NumericalColumns'],
        value=conf_dict['NumericalColumns'][0],
        description='Numeric Variable:',
    )


i = interactive(NumericAnalytics.custom_barplot, df=fixed(df), filename=fixed(filename),\
                                                col1=w1) 
hbox = widgets.HBox([i])  # wrap the new widget; otherwise the stale hbox from the previous cell is displayed
display(hbox)


### Explore individual categorical variables (sorted by frequencies)

In [15]:
w_export = None
md_text = '## Visualize Individual Categorical Variables'
filename = ReportMagic.var_files='tmp/categoric_variables.csv'


w1_value, w2_value, w3_value = '', '', ''
w1, w2, w3, w4 = None, None, None, None
silentremove(filename) 
w1 = widgets.Dropdown(
    options = conf_dict['CategoricalColumns'],
    value = conf_dict['CategoricalColumns'][0],
    description = 'Categorical Variable:',
)


i = interactive(CategoricAnalytics.custom_barplot, df=fixed(df),\
                                                filename=fixed(filename), col1=w1) 

hbox = widgets.HBox([i])
display(hbox)


HBox(children=(interactive(children=(Dropdown(description='Categorical Variable:', options=('workclass', 'sex'…

## <a name="multiple variables"></a>Explore Interactions Between Variables

### <a name="rank variables"></a>Rank variables based on linear relationships with reference variable (on sampled data)

In [16]:
md_text = '## Rank variables based on linear relationships with reference variable (on sampled data)'
filename = ReportMagic.var_files='tmp/rank_associations.csv'

silentremove(filename)
cols_list = [conf_dict['Target']] + conf_dict['NumericalColumns'] + conf_dict['CategoricalColumns'] #Make target the default reference variable
cols_list = list(OrderedDict.fromkeys(cols_list)) #remove variables that might be duplicates with target
w1 = widgets.Dropdown(    
    options=cols_list,
    value=cols_list[0],
    description='Ref Var:'
)
w2 = ipywidgets.Text(value="5", description='Top Num Vars:')
w3 = ipywidgets.Text(value="5", description='Top Cat Vars:')

i = interactive(InteractionAnalytics.rank_associations, df=fixed(df), \
                                                conf_dict=fixed(conf_dict), col1=w1, col2=w2, col3=w3) 
hbox = widgets.HBox([i])  # wrap the new widget; otherwise the stale hbox from the previous cell is displayed
display(hbox)


### <a name="two categorical"></a>Explore interactions between categorical variables

In [17]:
md_text = '## Interaction between categorical variables'
filename = ReportMagic.var_files='tmp/cat_interactions.csv'

silentremove(filename) 
w1, w2, w3, w4 = None, None, None, None

if conf_dict['Target'] in conf_dict['CategoricalColumns']:
    cols_list = [conf_dict['Target']] + conf_dict['CategoricalColumns'] #Make target the default reference variable
    cols_list = list(OrderedDict.fromkeys(cols_list)) #remove variables that might be duplicates with target
else:
    cols_list = conf_dict['CategoricalColumns']
    
w1 = widgets.Dropdown(
    options=cols_list,
    value=cols_list[0],
    description='Categorical Var 1:'
)
w2 = widgets.Dropdown(
    options=cols_list,
    value=cols_list[1],
    description='Categorical Var 2:'
)

i = interactive(InteractionAnalytics.categorical_relations, df=fixed(df), \
                                         filename=fixed(filename), col1=w1, col2=w2) 
hbox = widgets.HBox([i])
display(hbox)



HBox(children=(interactive(children=(Dropdown(description='Categorical Var 1:', options=('label_IsOver50K', 'w…
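
This section does not say which statistic `InteractionAnalytics.categorical_relations` reports. As one common way to quantify the association between two categorical variables, the following is a pure-Python sketch of Pearson's chi-square statistic for a contingency table. The function name `chi_square_statistic` and the counts are made up for illustration.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up 2x2 counts, e.g. one variable on the rows and one on the columns.
counts = [[10, 20], [30, 40]]
chi2 = chi_square_statistic(counts)
dof = (len(counts) - 1) * (len(counts[0]) - 1)
print(round(chi2, 4), dof)
```

A larger statistic relative to the degrees of freedom indicates a stronger departure from independence; the p-value would come from the chi-square distribution with `dof` degrees of freedom.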

### <a name="two numerical"></a>Explore interactions between numerical variables (on sampled data)

In [18]:
md_text = '## Interaction between numerical variables (on sampled data)'
filename = ReportMagic.var_files='tmp/numerical_interactions.csv'

silentremove(filename) 
w1, w2, w3, w4 = None, None, None, None

if conf_dict['Target'] in conf_dict['NumericalColumns']:
    cols_list = [conf_dict['Target']] + conf_dict['NumericalColumns'] #Make target the default reference variable
    cols_list = list(OrderedDict.fromkeys(cols_list)) #remove variables that might be duplicates with target
else:
    cols_list = conf_dict['NumericalColumns']
w1 = widgets.Dropdown(
    options=cols_list,
    value=cols_list[0],
    description='Numerical Var 1:'
)
w2 = widgets.Dropdown(
    options=cols_list,
    value=cols_list[1],
    description='Numerical Var 2:'
)

i = interactive(InteractionAnalytics.numerical_relations, df=fixed(df), \
                                         col1=w1, col2=w2) 
hbox = widgets.HBox([i])
display(hbox)


HBox(children=(interactive(children=(Dropdown(description='Numerical Var 1:', options=('age', 'fnlwgt', 'educa…

### Explore correlation matrix between numerical variables

In [19]:
md_text = '## Explore correlation matrix between numerical variables'
filename = ReportMagic.var_files='tmp/numerical_corr.csv'
export_filename = 'numerical_correlations_report2.ipynb'
silentremove(filename) 
w1, w2, w3, w4 = None, None, None, None
w1 = widgets.Dropdown(
    options=['pearson','kendall','spearman'],
    value='pearson',
    description='Correlation Method:'
)

i = interactive(InteractionAnalytics.numerical_correlation, df=fixed(df), conf_dict=fixed(conf_dict),\
                                         col1=w1) 

hbox = widgets.HBox([i])
display(hbox)


HBox(children=(interactive(children=(Dropdown(description='Correlation Method:', options=('pearson', 'kendall'…
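
The three dropdown values match the `method` options of pandas' `DataFrame.corr`; whether IDEAR calls that function internally is an assumption. To show how the choice of method can change the result, the following is a pure-Python sketch of the Pearson and Spearman coefficients on made-up data. The rank helper assumes there are no tied values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; this sketch assumes there are no tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(pearson(x, y), spearman(x, y))
```

Because the relationship is monotonic but not linear, the rank-based coefficient reaches 1 while the Pearson coefficient stays below it, which is the kind of difference the dropdown lets you inspect.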

### <a name="numerical and categorical"></a>Explore interactions between numerical and categorical variables

In [20]:
md_text = '## Explore interactions between numerical and categorical variables'
filename = ReportMagic.var_files = 'tmp/nc_int.csv'

silentremove(filename) 
w1, w2, w3, w4 = None, None, None, None

if conf_dict['Target'] in conf_dict['NumericalColumns']:
    cols_list = [conf_dict['Target']] + conf_dict['NumericalColumns'] #Make target the default reference variable
    cols_list = list(OrderedDict.fromkeys(cols_list)) #remove variables that might be duplicates with target
else:
    cols_list = conf_dict['NumericalColumns']
    
w1 = widgets.Dropdown(
    options=cols_list,
    value=cols_list[0],
    description='Numerical Variable:'
)

if conf_dict['Target'] in conf_dict['CategoricalColumns']:
    cols_list = [conf_dict['Target']] + conf_dict['CategoricalColumns'] #Make target the default reference variable
    cols_list = list(OrderedDict.fromkeys(cols_list)) #remove variables that might be duplicates with target
else:
    cols_list = conf_dict['CategoricalColumns']
    
w2 = widgets.Dropdown(
    options=cols_list,
    value=cols_list[0],
    description='Categorical Variable:'
)

i = interactive(InteractionAnalytics.nc_relation, df=fixed(df), \
                                                conf_dict=fixed(conf_dict), col1=w1, col2=w2, \
                                                col3=fixed(w3)) 
hbox = widgets.HBox([i])
display( hbox )


HBox(children=(interactive(children=(Dropdown(description='Numerical Variable:', options=('age', 'fnlwgt', 'ed…
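
The notebook imports `ols` and `statsmodels.api`, which suggests an ANOVA-style test, but this section does not show what `InteractionAnalytics.nc_relation` computes. As a hedged illustration of how a numerical variable can be compared across categories, the following is a pure-Python sketch of the one-way ANOVA F statistic. The function name `one_way_anova_f` and the groups are made up.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: spread of the group means.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread inside each group.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up values of a numerical variable, split by three categories.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f_stat)
```

A large F statistic means the category means differ by more than the within-category spread would explain.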

### <a name="two numerical and categorical"></a>Explore interactions between two numerical variables and a categorical variable (on sampled data)

In [21]:
md_text = '## Explore interactions between two numerical variables and a categorical variable (on sampled data)'
filename = ReportMagic.var_files='tmp/nnc_int.csv'

silentremove(filename) 
w1, w2, w3, w4 = None, None, None, None

if conf_dict['Target'] in conf_dict['NumericalColumns']:
    cols_list = [conf_dict['Target']] + conf_dict['NumericalColumns'] #Make target the default reference variable
    cols_list = list(OrderedDict.fromkeys(cols_list)) #remove variables that might be duplicates with target
else:
    cols_list = conf_dict['NumericalColumns']
    
w1 = widgets.Dropdown(
    options = cols_list,
    value = cols_list[0],
    description = 'Numerical Var 1:'
)
w2 = widgets.Dropdown(
    options = cols_list,
    value = cols_list[1],
    description = 'Numerical Var 2:'
)

if conf_dict['Target'] in conf_dict['CategoricalColumns']:
    cols_list = [conf_dict['Target']] + conf_dict['CategoricalColumns'] #Make target the default reference variable
    cols_list = list(OrderedDict.fromkeys(cols_list)) #remove variables that might be duplicates with target
else:
    cols_list = conf_dict['CategoricalColumns']
    
w3 = widgets.Dropdown(
    options = cols_list,
    value = cols_list[0],
    description = 'Legend Cat Var:'
)

i = interactive(InteractionAnalytics.nnc_relation, df=fixed(df),\
                                                conf_dict=fixed(conf_dict), col1=w1, col2=w2, col3=w3) 
hbox = widgets.HBox([i])
display(hbox)


HBox(children=(interactive(children=(Dropdown(description='Numerical Var 1:', options=('age', 'fnlwgt', 'educa…

## <a name="pca"></a>Visualize numerical data by projecting to principal component spaces (on sampled data)

### Project data to 2-D principal component space (on sampled data)

In [25]:
num_numeric = len(conf_dict['NumericalColumns'])
if  num_numeric > 3:
    md_text = '## Project Data to 2-D Principal Component Space'
    filename = ReportMagic.var_files = 'tmp/numerical_pca.csv'
    silentremove(filename) 
    
    w1, w2, w3, w4, w5 = None, None, None, None, None
    if conf_dict['Target'] in conf_dict['CategoricalColumns']:
        cols_list = [conf_dict['Target']] + conf_dict['CategoricalColumns'] #Make target the default reference variable
        cols_list = list(OrderedDict.fromkeys(cols_list)) #remove variables that might be duplicates with target
    else:
        cols_list = conf_dict['CategoricalColumns']
    w1 = widgets.Dropdown(
        options = cols_list,
        value = cols_list[0],
        description = 'Legend Variable:',
        width = 10
    )
    w2 = widgets.Dropdown(
        options = [str(x) for x in np.arange(1,num_numeric+1)],
        value = '1',
        width = 1,
        description='PC at X-Axis:'
    )
    w3 = widgets.Dropdown(
        options = [str(x) for x in np.arange(1,num_numeric+1)],
        value = '2',
        description = 'PC at Y-Axis:'
    )
    

    i = interactive(InteractionAnalytics.numerical_pca, df=fixed(df),\
                                                    conf_dict=fixed(conf_dict), col1=w1, col2=w2, col3=w3) 

    
    hbox = widgets.HBox([i])
    display(hbox)
    

HBox(children=(interactive(children=(Dropdown(description='Legend Variable:', options=('label_IsOver50K', 'wor…
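
The projection itself happens inside `InteractionAnalytics.numerical_pca`, which is not shown here. The following is a minimal NumPy sketch of projecting numerical data onto two principal components, assuming the columns are standardized first; whether IDEAR standardizes the data is not stated in this notebook. The data and variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 200 rows, 4 numeric columns with different scales.
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

# Standardize each column (zero mean, unit variance).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardized data: rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

# Project onto the first two principal components (PC 1 on x, PC 2 on y).
scores_2d = Z @ Vt[:2].T

# Share of total variance carried by each component.
explained = S**2 / np.sum(S**2)
print(scores_2d.shape, explained.round(3))
```

Choosing different rows of `Vt` corresponds to picking other components for the X and Y axes, as the two dropdowns above do.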

### Project data to 3-D principal component space (on sampled data)

In [26]:
md_text = '## Project Data to 3-D Principal Component Space (on sampled data)'
if len(conf_dict['NumericalColumns']) > 3:
    filename = ReportMagic.var_files='tmp/pca3d.csv'
 
    silentremove(filename) 
    if conf_dict['Target'] in conf_dict['CategoricalColumns']:
        cols_list = [conf_dict['Target']] + conf_dict['CategoricalColumns'] #Make target the default reference variable
        cols_list = list(OrderedDict.fromkeys(cols_list)) #remove variables that might be duplicates with target
    else:
        cols_list = conf_dict['CategoricalColumns']
    w1, w2, w3, w4 = None, None, None, None
    w1 = widgets.Dropdown(
        options=cols_list,
        value=cols_list[0],
        description='Legend Variable:'
    )
    w2 = ipywidgets.IntSlider(min=-180, max=180, step=5, value=30, description='Angle')
 
    i = interactive(InteractionAnalytics.pca_3d, df=fixed(df), conf_dict=fixed(conf_dict),\
                                              col1=w1, col2=w2, col3=fixed(w3)) 

    hbox = widgets.HBox([i])
    display(hbox)


HBox(children=(interactive(children=(Dropdown(description='Legend Variable:', options=('label_IsOver50K', 'wor…