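# AccelerateAI: Data Science Bootcamp

This notebook focuses on LUX (Python package `lux-api`), a library that automates exploratory data analysis (EDA) for pandas dataframes in Jupyter.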

## AutoEDA - LUX

### Dataset Reference: Loan Prediction dataset

### Features

* The objective of LUX is to make data exploration fast and simple by automating the visualization and data analysis process
* General overview: surfaces interesting trends and statistics in a dataframe (Correlation, Distribution, Occurrence, Geographical)
* Displays the visualizations directly in a Jupyter widget
* Intent language for specifying the features of interest; from the intent, LUX recommends Enhance, Filter, and Generalize views
* Automated visualizations of dataframes
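As a rough illustration of how the three intent-based recommendations relate to each other, the sketch below models an intent as a plain Python list of column names and `"Column=value"` filter strings (the same syntax this notebook passes to LUX). The helpers `enhance`, `filter_` and `generalize` are hypothetical and only mimic the idea: Enhance adds one attribute, Filter adds one filter, Generalize removes one attribute or filter. LUX's real implementation also scores and ranks the resulting charts, which is not shown here.

```python
# Hypothetical helpers: they only illustrate the Enhance / Filter / Generalize idea,
# they are not LUX functions.

def enhance(intent, columns):
    """Enhance: add one attribute that is not already part of the intent."""
    return [intent + [col] for col in columns if col not in intent]


def filter_(intent, categories):
    """Filter: add one 'Column=value' condition taken from a categorical column."""
    return [
        intent + [f"{col}={value}"]
        for col, values in categories.items()
        for value in values
    ]


def generalize(intent):
    """Generalize: drop one attribute or filter (nothing to drop from a single item)."""
    if len(intent) <= 1:
        return []
    return [intent[:i] + intent[i + 1:] for i in range(len(intent))]


columns = ["LoanAmount", "ApplicantIncome", "CoapplicantIncome"]
categories = {"Property_Area": ["Urban", "Rural"]}

print(enhance(["LoanAmount"], columns))
print(filter_(["LoanAmount"], categories))
print(generalize(["LoanAmount", "ApplicantIncome"]))
```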


### When to Use

* You need quick insights into an unfamiliar dataset
* You want a starting point for deeper EDA
* You want to explore specific features of interest quickly through intents

In [4]:
import pandas as pd
import warnings

warnings.filterwarnings("ignore")

In [5]:
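# Uncomment and run the lines below the first time, if LUX is not installed in your environment
#!pip install lux-api
# For the classic Jupyter Notebook, also enable the LUX widget extension:
#!jupyter nbextension install --py luxwidget
#!jupyter nbextension enable --py luxwidget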

In [6]:
import lux
from lux.vis.Vis import Vis

In [7]:
df_train = pd.read_csv("./loan-train.csv")

df_train.head()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |




In [8]:
df_test = pd.read_csv("./loan-test.csv")

df_test.head()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001015 | Male | Yes | 0 | Graduate | No | 5720 | 0 | 110.0 | 360.0 | 1.0 | Urban |
| 1 | LP001022 | Male | Yes | 1 | Graduate | No | 3076 | 1500 | 126.0 | 360.0 | 1.0 | Urban |
| 2 | LP001031 | Male | Yes | 2 | Graduate | No | 5000 | 1800 | 208.0 | 360.0 | 1.0 | Urban |
| 3 | LP001035 | Male | Yes | 2 | Graduate | No | 2340 | 2546 | 100.0 | 360.0 | NaN | Urban |
| 4 | LP001051 | Male | No | 0 | Not Graduate | No | 3276 | 0 | 78.0 | 360.0 | 1.0 | Urban |




In [9]:
df_train.shape

(614, 13)

In [10]:
df_test.shape

(367, 12)
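The two shapes differ by one column. A set difference of the column names shows which one the test set lacks. The sketch below uses the column lists copied from the two `head()` outputs above so that it runs on its own; in the notebook itself, `set(df_train.columns) - set(df_test.columns)` does the same job.

```python
# Column names copied from the df_train.head() and df_test.head() outputs above.
train_columns = [
    "Loan_ID", "Gender", "Married", "Dependents", "Education", "Self_Employed",
    "ApplicantIncome", "CoapplicantIncome", "LoanAmount", "Loan_Amount_Term",
    "Credit_History", "Property_Area", "Loan_Status",
]
test_columns = [
    "Loan_ID", "Gender", "Married", "Dependents", "Education", "Self_Employed",
    "ApplicantIncome", "CoapplicantIncome", "LoanAmount", "Loan_Amount_Term",
    "Credit_History", "Property_Area",
]

# Columns present in the training data but not in the test data.
only_in_train = set(train_columns) - set(test_columns)
print(len(train_columns), len(test_columns), only_in_train)
```

The only column missing from the test set is `Loan_Status`, the label the loan prediction task asks for.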

## Explore with LUX

In [11]:
df_train

[LUX widget output: the "Toggle Pandas/Lux" button switches between the pandas table and LUX's recommended charts]



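Without an intent, LUX's overview groups its charts into tabs such as Correlation, Distribution and Occurrence. The sketch below is a simplified, dtype-based approximation of which charts land in which tab, using a subset of columns from the five `df_train.head()` rows: pairs of numeric columns for Correlation, single numeric columns for Distribution, and non-numeric columns for Occurrence. This mapping is an assumption for illustration only; LUX's own type inference is more involved than a plain dtype check, and LUX also ranks the charts within each tab.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# A subset of columns from the five rows shown by df_train.head().
df = pd.DataFrame({
    "Married": ["No", "Yes", "Yes", "Yes", "No"],
    "Education": ["Graduate", "Graduate", "Graduate", "Not Graduate", "Graduate"],
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000],
    "CoapplicantIncome": [0.0, 1508.0, 0.0, 2358.0, 0.0],
    "LoanAmount": [np.nan, 128.0, 66.0, 120.0, 141.0],
    "Property_Area": ["Urban", "Rural", "Urban", "Urban", "Urban"],
    "Loan_Status": ["Y", "N", "Y", "Y", "Y"],
})

# Simplified typing: numeric dtypes count as quantitative, everything else as nominal.
quantitative = df.select_dtypes(include="number").columns.tolist()
nominal = df.select_dtypes(exclude="number").columns.tolist()

tabs = {
    # Correlation: one scatter plot per pair of quantitative columns.
    "Correlation": list(combinations(quantitative, 2)),
    # Distribution: one histogram per quantitative column.
    "Distribution": quantitative,
    # Occurrence: one bar chart of value counts per nominal column.
    "Occurrence": nominal,
}

for name, charts in tabs.items():
    print(f"{name}: {len(charts)} chart(s)")
```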
In [12]:
# Declare an intent: the column(s) LUX should build its recommendations around
df_train.intent = ['LoanAmount']

df_train

[LUX widget output: the "Toggle Pandas/Lux" button switches between the pandas table and LUX's recommended charts]



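With a single quantitative column as the intent, LUX centres its display on that column's distribution, shown as a histogram. The sketch below approximates such a histogram of `LoanAmount` with NumPy's equal-width binning. The three bins and the use of only the five `head()` rows are choices made here to keep the example small and checkable (LUX picks its own bin count). Missing values are dropped first because `np.histogram` cannot derive a finite range from NaN.

```python
import numpy as np
import pandas as pd

# LoanAmount from the five rows shown by df_train.head(); the first value is missing.
loan_amount = pd.Series([np.nan, 128.0, 66.0, 120.0, 141.0], name="LoanAmount")

# Drop NaN before binning.
values = loan_amount.dropna().to_numpy()

# Three equal-width bins between min and max; the last bin also includes the maximum.
counts, edges = np.histogram(values, bins=3)

for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:6.1f} - {hi:6.1f}: {n}")
```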
In [13]:
# An intent with two quantitative columns: LUX plots them against each other
df_train.intent = ['LoanAmount', 'ApplicantIncome']

df_train

[LUX widget output: the "Toggle Pandas/Lux" button switches between the pandas table and LUX's recommended charts]



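When the intent holds two quantitative columns, the natural question is how strongly they move together. A Pearson correlation computed with pandas gives a numeric counterpart to LUX's chart. The sketch uses only the five `head()` rows, so its value says nothing about the full 614-row dataset; `Series.corr` skips the row where `LoanAmount` is missing.

```python
import numpy as np
import pandas as pd

# Two columns from the five rows shown by df_train.head().
df = pd.DataFrame({
    "LoanAmount": [np.nan, 128.0, 66.0, 120.0, 141.0],
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000],
})

# Pearson correlation; pairs with a missing value in either column are excluded.
r = df["LoanAmount"].corr(df["ApplicantIncome"])
print(f"Pearson r = {r:.3f}")
```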
In [14]:
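# set_intent() is a method: call it rather than assigning to it (same effect as setting df_train.intent)
df_train.set_intent(['LoanAmount', 'ApplicantIncome'])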

df_train

[LUX widget output: the "Toggle Pandas/Lux" button switches between the pandas table and LUX's recommended charts]



## Visualization with LUX

In [15]:
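# Visualize a single column; LUX picks the chart type (here, a bar chart of counts per Loan_Status)
Vis(["Loan_Status"], df_train)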


<Vis  (x: COUNT(Record), y: Loan_Status) mark: bar, score: 0.0 >
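The `Vis` above is a bar chart of `COUNT(Record)` per `Loan_Status` value. In plain pandas the same counts come from `value_counts`; the sketch below runs it on the `Loan_Status` values of the five `head()` rows, and `normalize=True` turns the counts into shares.

```python
import pandas as pd

# Loan_Status of the five rows shown by df_train.head().
loan_status = pd.Series(["Y", "N", "Y", "Y", "Y"], name="Loan_Status")

counts = loan_status.value_counts()                # Y: 4, N: 1
shares = loan_status.value_counts(normalize=True)  # Y: 0.8, N: 0.2
print(counts)
print(shares)
```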

In [16]:
df_train.info()

<class 'lux.core.frame.LuxDataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Loan_ID            614 non-null    object 
 1   Gender             601 non-null    object 
 2   Married            611 non-null    object 
 3   Dependents         599 non-null    object 
 4   Education          614 non-null    object 
 5   Self_Employed      582 non-null    object 
 6   ApplicantIncome    614 non-null    int64  
 7   CoapplicantIncome  614 non-null    float64
 8   LoanAmount         592 non-null    float64
 9   Loan_Amount_Term   600 non-null    float64
 10  Credit_History     564 non-null    float64
 11  Property_Area      614 non-null    object 
 12  Loan_Status        614 non-null    object 
dtypes: float64(4), int64(1), object(8)
memory usage: 62.5+ KB
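The gap between each column's non-null count and the 614 rows is the number of missing values. The arithmetic below reuses the counts printed by `info()` above; in the notebook, `df_train.isna().sum()` returns the same figures directly.

```python
total_rows = 614

# Non-null counts for the columns with gaps, copied from df_train.info() above.
non_null = {
    "Gender": 601,
    "Married": 611,
    "Dependents": 599,
    "Self_Employed": 582,
    "LoanAmount": 592,
    "Loan_Amount_Term": 600,
    "Credit_History": 564,
}

missing = {col: total_rows - n for col, n in non_null.items()}

# Print from most to fewest missing values, with the share of all rows.
for col, n in sorted(missing.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col:<17} {n:>3} missing ({100 * n / total_rows:.1f}%)")
```

`Credit_History` has the most gaps (50 rows, about 8.1%), followed by `Self_Employed` (32).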


In [17]:
# An "Attribute=value" entry acts as a filter: LoanAmount histogram for Urban properties only
Vis(["Property_Area=Urban", "LoanAmount"], df_train)


<Vis  (x: BIN(LoanAmount), y: COUNT(Record) -- [Property_Area=Urban]) mark: histogram, score: 0.0 >

In [18]:
# Same histogram, filtered to Rural properties
Vis(["Property_Area=Rural", "LoanAmount"], df_train)


<Vis  (x: BIN(LoanAmount), y: COUNT(Record) -- [Property_Area=Rural]) mark: histogram, score: 0.0 >
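A filter string such as `"Property_Area=Urban"` restricts the rows before the chart is drawn. The pandas counterpart is a boolean mask, and a `groupby` summarises `LoanAmount` for every area at once, which makes the Urban and Rural views easier to compare side by side. The sketch again uses only the five `head()` rows (four Urban, one Rural), so the figures are illustrative only.

```python
import numpy as np
import pandas as pd

# Two columns from the five rows shown by df_train.head().
df = pd.DataFrame({
    "Property_Area": ["Urban", "Rural", "Urban", "Urban", "Urban"],
    "LoanAmount": [np.nan, 128.0, 66.0, 120.0, 141.0],
})

# Equivalent of the "Property_Area=Urban" filter: keep only Urban rows.
urban_amounts = df.loc[df["Property_Area"] == "Urban", "LoanAmount"]

# Per-area summary; count() and mean() both ignore the missing LoanAmount.
summary = df.groupby("Property_Area")["LoanAmount"].agg(["count", "mean"])
print(summary)
```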

## Interpretation Summary

* Makes data exploration fast and simple by automating the visualization and data analysis process
    * Combines exploration and visualization in one step

* General overview: surfaces interesting trends and statistics in the dataframe, grouped into
    * Correlation
    * Distribution
    * Occurrence
    * Geographical
* Intent language for specifying the features of interest, with recommendations in three directions:
    * Enhance
    * Filter
    * Generalize
* Automated visualizations of dataframes