# Ames Housing Prices - Step 1: Data Analysis
This notebook demonstrates how to set up a Cortex Dataset to make data analysis visual and straightforward in an interactive Python environment. For visualizations, install the `cortex-python[viz]` extra in addition to the regular `cortex-python` package. For builders, install `cortex-python[builders]`.

In [1]:
# Basic setup
%run ./config.ipynb


Cortex Python SDK v6.2.1a1


#### If Cortex is not configured through the CLI, use Cortex.login() to configure it from the notebook. Configuring through the CLI is the preferred route.

In [2]:
# Connect to Cortex and create a Builder instance
Cortex.login()
cortex = Cortex.client()

# You can also run locally, with no need to connect to the Cortex client.
# Comment out the line cortex = Cortex.client() above and uncomment the line below
#cortex = Cortex.local()

#builder = cortex.builder()

## Create Client
## By default the client takes the url, token, and project from your local Cortex CLI profile.
## To set these values explicitly, use the Cortex.client command below with params.
#client = Cortex.client()
#url = "https://api.dci-dev.dev-eks.insights.ai"
#token = "<your-access-token>"
#project = "demo-lk-6b34a"
## If you are using an SSL certificate, set this to the certificate path
verify_ssl_cert = False
## explicitly passing url, token, project
#client = Cortex.client(api_endpoint=url,  verify_ssl_cert=verify_ssl_cert, token=token,project=project)
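
The commented lines above hard-code the URL, token, and project. One alternative, sketched here under assumptions, is to read them from environment variables so that no credential lives in the notebook. The variable names (`CORTEX_URL`, `CORTEX_TOKEN`, `CORTEX_PROJECT`, `CORTEX_CA_BUNDLE`) and the helper `load_client_settings` are hypothetical and not part of the Cortex SDK; only the keyword names passed to `Cortex.client` come from the cell above.

```python
import os


def load_client_settings(env=None):
    """Collect Cortex connection settings from environment variables.

    The variable names used here are hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env
    required = ("CORTEX_URL", "CORTEX_TOKEN", "CORTEX_PROJECT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "api_endpoint": env["CORTEX_URL"],
        "token": env["CORTEX_TOKEN"],
        "project": env["CORTEX_PROJECT"],
        # Certificate path if one is set, otherwise False as in the cell above
        "verify_ssl_cert": env.get("CORTEX_CA_BUNDLE") or False,
    }


# Placeholder values for illustration; omit the argument to read os.environ
settings = load_client_settings({
    "CORTEX_URL": "https://cortex.example.com",
    "CORTEX_TOKEN": "<your-access-token>",
    "CORTEX_PROJECT": "<your-project>",
})
# client = Cortex.client(**settings)
```

Failing early on a missing variable keeps a misconfigured environment from surfacing later as an opaque authentication error.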

## Training Data Setup
Our first step is to load our training data from the target data source (in this case a local file). The Cortex 5 SDK for Python makes extensive use of the excellent [Pandas](https://pandas.pydata.org/) library as well as some other well-known data analysis and visualization libraries. In this first set of steps, we will create a Cortex Dataset using a Pandas DataFrame instantiated with the `read_csv` function. Cortex 5 will automatically build a rich Dataset object from the source data and prepare it for further use.

In [3]:
import io

import pandas as pd

# Load the training data from the local CSV file
train_df = pd.read_csv('../datasets/datasets/kaggle/ames-housing/train.csv')

# Serialize the DataFrame to Parquet in a fresh in-memory buffer
b_buf = io.BytesIO()
train_df.to_parquet(b_buf, index=False)
b_buf.seek(0)  # rewind so the upload reads the buffer from the start

In [4]:
from cortex.content import ManagedContentClient

# Upload the Parquet buffer to Managed Content under the key 'ames_dataset'
mc_client = ManagedContentClient(cortex)
response = mc_client.upload_streaming(key='ames_dataset', stream=b_buf, content_type='application/octet-stream', retries=1)
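
The upload above passes an in-memory buffer as a stream. File-like readers normally consume a stream from its current position, so a buffer that has just been written to usually needs `seek(0)` before it is handed to a consumer. Whether `upload_streaming` rewinds the stream itself is not confirmed here; the sketch below only shows the standard `io.BytesIO` behavior.

```python
import io

buf = io.BytesIO()
buf.write(b"example payload")  # the position is now at the end of the data

at_end = buf.read()            # reads from the current position: nothing left
buf.seek(0)                    # rewind to the start of the buffer
from_start = buf.read()        # reads the full payload

print(len(at_end), len(from_start))
```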

In [None]:
## Commented out because DataSet is deprecated in V6. A way to represent datasets for MLOps purposes is still needed.
##train_ds = builder.dataset('kaggle/ames-housing-train')\
##    .title('Ames Housing Training Data')\
##    .from_df(train_df).build()
## print("%s (%s) v%d" % (train_ds.title, train_ds.name, train_ds.version))

---
As you can see below, Cortex 5 auto-discovered all of the data parameters from the DataFrame and created the necessary schema structure.

In [None]:
train_ds.parameters
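
While the builder cell is commented out, `train_ds` is not defined. A rough pandas-only view of the same idea (one name and type per column) can be sketched as below. The helper `describe_columns` is hypothetical, its output is not the Cortex parameter schema, and the sample values are illustrative rather than taken from the Ames data.

```python
import pandas as pd


def describe_columns(df):
    """Return one {'name', 'type'} record per DataFrame column."""
    return [{"name": col, "type": str(dtype)} for col, dtype in df.dtypes.items()]


# Tiny stand-in frame with illustrative values
sample_df = pd.DataFrame({
    "LotArea": [1000, 2000, 3000],
    "Neighborhood": ["A", "B", "A"],
    "SalePrice": [100000.0, 150000.0, 200000.0],
})
columns = describe_columns(sample_df)
for column in columns:
    print(column["name"], column["type"])
```

On the real data, `describe_columns(train_df)` would list every column of the training set.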

In [None]:
viz = train_ds.visuals(train_df, figsize=(24,9))

In [None]:
viz.show_corr_heatmap()

In [None]:
viz.show_corr('SalePrice')

In [None]:
viz.show_dist('SalePrice')

In [None]:
viz.show_probplot('SalePrice')

In [None]:
viz.show_corr_pairs('SalePrice', threshold=0.7)
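
The `viz` cells depend on `train_ds`, which is unavailable while the builder cell is commented out. As a pandas-only approximation, the sketch below ranks numeric columns by their Pearson correlation with a target and keeps those at or above a threshold. What `show_corr` and `show_corr_pairs` compute internally is not shown in this notebook, so this is an assumption about their intent, not a reimplementation. The helper `top_correlations` is hypothetical and the sample values are illustrative.

```python
import pandas as pd


def top_correlations(df, target, threshold=0.7):
    """Numeric columns whose absolute Pearson correlation with `target` meets `threshold`."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    strong = corr[corr.abs() >= threshold]
    # Order by strength of correlation, strongest first
    return strong.reindex(strong.abs().sort_values(ascending=False).index)


# Illustrative values only, not taken from the Ames data
demo_df = pd.DataFrame({
    "GrLivArea": [1000, 1500, 2000, 2500, 3000],
    "YrSold": [2008, 2006, 2010, 2007, 2009],
    "SalePrice": [100000, 160000, 190000, 260000, 300000],
})
strong = top_correlations(demo_df, "SalePrice")
print(strong)
```

On the real data the call would be `top_correlations(train_df, 'SalePrice', threshold=0.7)`.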