# Insurance Fraud Claim Prediction - Step 1: Data Analysis
This notebook demonstrates how to set up a Cortex Dataset so that data analysis is visual and straightforward in an interactive Python environment.

In [None]:
# Basic setup
%run ./config.ipynb

In [None]:
# Connect to Cortex 5 and create a Builder instance
cortex = Cortex.client()
builder = cortex.builder()

## Training Data Setup
Our first step is to load the training data from the target data source (in this case, a local file). The Cortex 5 SDK for Python makes extensive use of the excellent [Pandas](https://pandas.pydata.org/) library, as well as some other well-known data analysis and visualization libraries. In this first set of steps, we create a Cortex Dataset from a Pandas DataFrame that is instantiated with the __read_csv__ function. Cortex 5 automatically builds a rich Dataset object from the source data and prepares it for further use.

In [None]:
train_df = pd.read_csv('./datasets/MotorInsuranceFraudClaimABTFull.csv')
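Before handing a DataFrame to the Dataset builder, it can be worth checking what Pandas actually parsed. The following is a minimal sketch in plain Pandas, not part of the Cortex SDK. It reads a small in-memory CSV as a stand-in for the claims file; the rows and every column name other than `Fraud Flag` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the claims CSV. Only 'Fraud Flag' is a column
# name taken from this notebook; the other names and all values are invented.
csv_text = """Claim Amount,Num Claimants,Fraud Flag
1625,2,1
15028,,0
3000,1,0
27020,4,0
"""
sample_df = pd.read_csv(io.StringIO(csv_text))

# One row per column: parsed dtype, number of missing values, distinct values
summary = pd.DataFrame({
    "dtype": sample_df.dtypes.astype(str),
    "missing": sample_df.isna().sum(),
    "unique": sample_df.nunique(),
})
print(summary)
```

The same three-line summary can be run against `train_df` to spot columns that were read with an unexpected type or that contain missing values.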

In [None]:
# Build a Cortex Dataset from the DataFrame
train_ds = (
    builder.dataset('claims-fraud/motorinsurancefraud')
    .title('Motor Insurance Training Data')
    .from_df(train_df)
    .build()
)

print("%s (%s) v%d" % (train_ds.title, train_ds.name, train_ds.version))

---
As you can see below, Cortex 5 auto-discovered all of the data parameters from the DataFrame and created the necessary schema structure.

In [None]:
train_ds.parameters
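To give a feel for what discovering parameters from a DataFrame can involve, here is a rough sketch that maps Pandas column dtypes to simple type names. This is an assumption about the general technique, not the Cortex implementation: the function `infer_parameters`, the type labels, and the sample columns (apart from `Fraud Flag`) are all hypothetical.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_float_dtype, is_integer_dtype


def infer_parameters(df):
    """Hypothetical helper: derive a simple name/type schema from column dtypes."""
    params = []
    for name, dtype in df.dtypes.items():
        if is_bool_dtype(dtype):
            kind = "boolean"
        elif is_integer_dtype(dtype):
            kind = "integer"
        elif is_float_dtype(dtype):
            kind = "number"
        else:
            # Anything else (text, mixed objects) is treated as a string here
            kind = "string"
        params.append({"name": name, "type": kind})
    return params


# Invented sample data for illustration only
demo_df = pd.DataFrame({
    "Claim Amount": [1625.0, 27020.5],
    "Num Claimants": [2, 1],
    "Injury Type": ["Soft Tissue", "Back"],
    "Fraud Flag": [1, 0],
})
params = infer_parameters(demo_df)
for p in params:
    print(p)
```

Comparing a listing like this with `train_ds.parameters` is one way to confirm that each column was picked up with the type you expect.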

In [None]:
viz = train_ds.visuals(train_df, figsize=(24,9))

In [None]:
viz.show_corr_heatmap()

In [None]:
viz.show_corr('Fraud Flag')
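For readers who want the numbers behind a plot like this, the correlation of each numeric column with the target can be computed directly in Pandas. This is a sketch under the assumption that the plot is based on pairwise Pearson correlation, which this notebook does not confirm. The data is synthetic, and only `Fraud Flag` is a column name taken from the notebook.

```python
import pandas as pd

# Synthetic illustration data; column names other than 'Fraud Flag' are invented
corr_demo_df = pd.DataFrame({
    "Claim Amount": [1000, 2000, 3000, 4000, 5000, 6000],
    "Num Claims": [0, 1, 0, 2, 3, 3],
    "Fraud Flag": [0, 0, 0, 1, 1, 1],
})

target = "Fraud Flag"
corr_with_target = (
    corr_demo_df.select_dtypes("number")   # correlation is only defined for numeric columns
    .corr()[target]                        # Pearson correlation of every column with the target
    .drop(target)                          # the target's correlation with itself is always 1
    .sort_values(key=abs, ascending=False) # strongest relationships first, regardless of sign
)
print(corr_with_target)
```

Sorting by absolute value keeps strongly negative correlations near the top, since they are as informative as strongly positive ones.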

In [None]:
viz.show_dist('Fraud Flag')

In [None]:
viz.show_probplot('Fraud Flag')

In [None]:
viz.show_corr_pairs('Fraud Flag', threshold=0.7)
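The exact semantics of `show_corr_pairs` are not described in this notebook. One plausible reading of a `threshold` argument is "keep only column pairs whose absolute correlation is at least this value". The sketch below implements that reading in plain Pandas and NumPy; `correlated_pairs` is a hypothetical helper and the data is synthetic.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, threshold=0.7):
    """Hypothetical helper: column pairs with |Pearson correlation| >= threshold."""
    corr = df.select_dtypes("number").corr()
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (self-correlation) is excluded
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs[pairs.abs() >= threshold]


# Synthetic data: 'x' and 'y' move together, 'z' is only weakly related to either
pairs_demo_df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2.0, 4.0, 6.0, 8.0, 10.5],
    "z": [5, 1, 4, 2, 3],
})
strong_pairs = correlated_pairs(pairs_demo_df, threshold=0.7)
print(strong_pairs)
```

Highly correlated feature pairs found this way are candidates for removal or combination before modeling, because they carry largely redundant information.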