# Ames Housing Prices - Step 1: Data Analysis
This notebook demonstrates how to set up a Cortex Dataset to make data analysis visual and straightforward in an interactive Python environment. For visualizations, install the `cortex-python[viz]` extra in addition to the regular `cortex-python` package. For builders, install `cortex-python[builders]`.

In [None]:
# Basic setup
%run ./config.ipynb

#### If Cortex is not configured through the CLI, use Cortex.login() to configure it from the notebook. Configuring through the CLI is the preferred route.

In [None]:
# Connect to Cortex 5 and create a Builder instance
#Cortex.login()
cortex = Cortex.client()

# You can also run locally, with no need to connect to the Cortex client.
# Comment out the line cortex = Cortex.client() above and uncomment the line below
# cortex = Cortex.local()

builder = cortex.builder()

## Training Data Setup
Our first step is to load the training data from the target data source (in this case, a local file). The Cortex 5 SDK for Python makes extensive use of the excellent [Pandas](https://pandas.pydata.org/) library, as well as some other well-known data analysis and visualization libraries. In this first set of steps, we create a Cortex Dataset from a Pandas DataFrame instantiated with the __read_csv__ function. Cortex 5 automatically builds a rich Dataset object from the source data and prepares it for further use.

In [None]:
train_df = pd.read_csv('../../data/kaggle/ames-housing/train.csv')

In [None]:
train_ds = builder.dataset('kaggle/ames-housing-train')\
    .title('Ames Housing Training Data')\
    .from_df(train_df).build()
    
print("%s (%s) v%d" % (train_ds.title, train_ds.name, train_ds.version))

---
As you can see below, Cortex 5 auto-discovered all of the data parameters from the DataFrame and created the necessary schema structure.

In [None]:
train_ds.parameters
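To give a feel for what auto-discovery of parameters from a DataFrame can involve, here is a minimal sketch that maps each column's pandas dtype to a coarse type label. This is not the Cortex implementation: the `infer_parameters` helper, the type labels, and the made-up sample values are assumptions for illustration only.

```python
import pandas as pd

# Made-up sample rows; the column names mirror a few from the Ames data.
sample_df = pd.DataFrame({
    "LotArea": [8000, 9500, 12000],
    "Neighborhood": ["North", "South", "North"],
    "SalePrice": [200000.0, 180000.0, 225000.0],
})

def infer_parameters(df):
    """Hypothetical helper: label each column by its pandas dtype."""
    params = []
    for name, dtype in df.dtypes.items():
        if pd.api.types.is_bool_dtype(dtype):
            kind = "boolean"
        elif pd.api.types.is_integer_dtype(dtype):
            kind = "integer"
        elif pd.api.types.is_float_dtype(dtype):
            kind = "number"
        else:
            # Anything else (text, mixed objects) is treated as a string here.
            kind = "string"
        params.append({"name": name, "type": kind})
    return params

sample_params = infer_parameters(sample_df)
print(sample_params)
```

The real `train_ds.parameters` output may carry more metadata than a name and a type; the sketch only shows the dtype-driven idea.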

In [None]:
viz = train_ds.visuals(train_df, figsize=(24,9))

In [None]:
viz.show_corr_heatmap()

In [None]:
viz.show_corr('SalePrice')
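For readers without the `viz` extra, the numbers behind a per-column correlation view can be approximated with plain pandas. The sketch below assumes the view ranks numeric features by their Pearson correlation with the target; the toy DataFrame uses made-up values and is not the Ames data.

```python
import pandas as pd

# Made-up toy data; only the column names come from the Ames dataset.
toy_df = pd.DataFrame({
    "GrLivArea": [1000, 1500, 2000, 2500, 3000],
    "OverallQual": [4, 5, 7, 6, 9],
    "Neighborhood": ["A", "B", "A", "C", "B"],
    "SalePrice": [100000, 150000, 210000, 240000, 310000],
})

# Pearson correlation is only defined for numeric columns.
numeric = toy_df.select_dtypes(include="number")
corr_matrix = numeric.corr()

# Correlation of every other numeric feature with the target, strongest first.
target_corr = (
    corr_matrix["SalePrice"]
    .drop("SalePrice")
    .sort_values(ascending=False)
)
print(target_corr)
```

The full `corr_matrix` is the same table a correlation heatmap would color.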

In [None]:
viz.show_dist('SalePrice')
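A distribution plot is usually read alongside a few summary statistics. As a rough sketch (the price list is made up, and the choice of five bins is arbitrary), skewness, kurtosis, and histogram counts can be computed directly with pandas and NumPy:

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed prices standing in for the SalePrice column.
prices = pd.Series(
    [100000, 110000, 120000, 130000, 150000, 200000, 400000],
    name="SalePrice",
)

# Positive skew means a long right tail.
raw_skew = prices.skew()
# pandas reports excess kurtosis (0 for a normal distribution).
raw_kurt = prices.kurt()

# Bin counts and bin edges, i.e. the data behind a histogram.
counts, edges = np.histogram(prices, bins=5)

print(f"skew={raw_skew:.3f} kurtosis={raw_kurt:.3f}")
print(counts, edges)
```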

In [None]:
viz.show_probplot('SalePrice')
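A normal probability plot compares the sorted sample against the quantiles a normal distribution would produce; points close to a straight line suggest approximate normality. The sketch below computes those points with the standard library and NumPy. The `normal_probplot_points` helper and the `(i - 0.5) / n` plotting positions are assumptions made for illustration; plotting libraries may use slightly different positions.

```python
from statistics import NormalDist

import numpy as np

def normal_probplot_points(values):
    """Hypothetical helper: return (theoretical quantiles, ordered sample)."""
    ordered = np.sort(np.asarray(values, dtype=float))
    n = len(ordered)
    # Plotting positions strictly inside (0, 1) so inv_cdf is defined.
    positions = (np.arange(1, n + 1) - 0.5) / n
    standard_normal = NormalDist()
    theoretical = np.array([standard_normal.inv_cdf(p) for p in positions])
    return theoretical, ordered

# Made-up prices standing in for the SalePrice column.
prices = [100000, 110000, 120000, 130000, 150000, 200000, 400000]
theoretical, ordered = normal_probplot_points(prices)

# Correlation between the two axes: closer to 1 means closer to a straight line.
r = np.corrcoef(theoretical, ordered)[0, 1]
print(f"probability-plot correlation: {r:.3f}")
```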

In [None]:
viz.show_corr_pairs('SalePrice', threshold=0.7)
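The exact behavior of `show_corr_pairs` is not documented in this notebook. Assuming the `threshold` argument selects the features whose correlation with the target is at least 0.7 before plotting them pairwise, the selection step might look like the sketch below. The toy values are made up, and the use of absolute correlation is an assumption.

```python
import pandas as pd

# Made-up toy data; only the column names come from the Ames dataset.
toy_df = pd.DataFrame({
    "GrLivArea": [1000, 1500, 2000, 2500, 3000],
    "OverallQual": [4, 5, 7, 6, 9],
    "YrSold": [2008, 2006, 2010, 2007, 2009],
    "SalePrice": [100000, 150000, 210000, 240000, 310000],
})

threshold = 0.7

# Correlation of each feature with the target, excluding the target itself.
corr_to_target = toy_df.corr()["SalePrice"].drop("SalePrice")

# Keep features whose absolute correlation meets the threshold (assumption).
selected = corr_to_target[corr_to_target.abs() >= threshold].index.tolist()
print(selected)

# Columns a pairwise plot would then be drawn over.
pair_columns = selected + ["SalePrice"]
```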