# Brief Data Exploration of the California Housing Dataset

 <a href="https://colab.research.google.com/github/arangoml/arangopipe/blob/master/examples/Data_Summaries.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Install Prerequisites

In [None]:
!pip install python-arango
!pip install arangopipe==0.0.6.9.3
!pip install pandas PyYAML==5.1.1 sklearn2
!pip install jsonpickle
!pip install seaborn
!pip install dtreeviz

## Read Data

In [None]:

import pandas as pd
data_url = "https://raw.githubusercontent.com/arangoml/arangopipe/arangopipe_examples/examples/data/cal_housing.csv"
# Skip malformed rows; on_bad_lines replaces error_bad_lines, which was removed in pandas 2.0
df = pd.read_csv(data_url, on_bad_lines="skip")

## Generate Summaries

In [None]:
df.describe()

In [None]:
df.dtypes

In [None]:
df.hist()

## Store Results in Arangopipe Database Used For Model Tracking

**NOTE: Run this notebook only after you have run the notebook Arangopipe_Feature_Examples.ipynb.** Both notebooks store their modeling activity in the same database. Unused Arangopipe managed service instances may be recycled, so if more than two weeks have passed since you last ran Arangopipe_Feature_Examples.ipynb, run it again before running this one.

In [None]:
# Arangopipe APIs for tracking, administration, and connection configuration
from arangopipe.arangopipe_storage.arangopipe_api import ArangoPipe
from arangopipe.arangopipe_storage.arangopipe_admin_api import ArangoPipeAdmin
from arangopipe.arangopipe_storage.arangopipe_config import ArangoPipeConfig
from arangopipe.arangopipe_storage.managed_service_conn_parameters import ManagedServiceConnParam
from google.colab import drive

# Mount Google Drive; the saved connection config is read from there
drive.mount('/content/drive')
fp = '/content/drive/My Drive/saved_arangopipe_config.yaml'

# Load the saved connection parameters
mdb_config = ArangoPipeConfig()
conn_params = mdb_config.create_config(fp)
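
If loading the saved configuration fails, one likely cause is that the config file is missing because Arangopipe_Feature_Examples.ipynb has not been run yet. The following is a minimal sketch of a check for that case, using only the standard library. The helper name `config_file_ready` is hypothetical and not part of Arangopipe; the path is the one used in this notebook.

```python
import os


def config_file_ready(path):
    """Return True if `path` points to an existing, non-empty file."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


# Same path as used in this notebook for the saved connection config
fp = '/content/drive/My Drive/saved_arangopipe_config.yaml'

if not config_file_ready(fp):
    print("Saved config not found; run Arangopipe_Feature_Examples.ipynb first.")
```

This only confirms that the file is present. It does not confirm that the managed service instance described in the file is still available.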

In [None]:
# Rebuild the connection configuration from the saved parameters
mdb_config = mdb_config.create_connection_config(conn_params)
admin = ArangoPipeAdmin(reuse_connection=True, config=mdb_config)
ap_config = admin.get_config()
ap = ArangoPipe(config=ap_config)

# Register the project (the model below is registered under the same project name)
proj_info = {"name": "Housing_Price_Estimation_Project"}
proj_reg = admin.register_project(proj_info)

# Register this notebook's analysis as a model under that project
model_info = {"name": "Data Summaries for Housing Dataset", "task": "Exploratory Data Analysis"}
model_reg = ap.register_model(model_info, project="Housing_Price_Estimation_Project")

## Linking Models
We will link the model created in this notebook to the model created in the notebook Arangopipe_Feature_Examples.ipynb. To do so, we need to:
1. Look up the model we want to link to and obtain its identifier.
2. Call the link entities API with the identifier of the model created in this notebook as the source and the identifier obtained in the previous step as the destination.

The link entities API creates an attribute called "related_models" in the source node. We can verify that the link succeeded by inspecting the source model object and confirming that this attribute is present.

In [None]:
lasso_model = ap.lookup_model("Lasso Model for Housing Dataset")

In [None]:
model_reg['_id']

In [None]:
lasso_model['_id']

In [None]:
ap.link_entities(model_reg['_id'], lasso_model['_id'])

In [None]:
ap.lookup_model("Data Summaries for Housing Dataset")
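
The lookup above returns the source model, which can be checked for the "related_models" attribute by eye. As a rough sketch, the same check can be written in code. The helpers `related_model_ids` and `is_linked` are hypothetical and not part of Arangopipe. The shape of the "related_models" value is an assumption here: the sketch accepts either a single identifier or a list of identifiers (or of documents carrying an `_id`). The example document and its identifiers are made up for illustration.

```python
def related_model_ids(model_doc):
    """Collect the identifiers stored under 'related_models', if any.

    Assumption: the value is a single id, a single document with an
    '_id' key, or a list of either. Adjust to the actual stored shape.
    """
    value = model_doc.get("related_models")
    if value is None:
        return []
    if isinstance(value, (str, dict)):
        value = [value]
    return [item["_id"] if isinstance(item, dict) else str(item)
            for item in value]


def is_linked(model_doc, target_id):
    """Return True if `target_id` appears among the related models."""
    return target_id in related_model_ids(model_doc)


# Stand-in for the result of ap.lookup_model(...); the ids are made up
example_doc = {
    "_id": "models/1",
    "name": "Data Summaries for Housing Dataset",
    "related_models": ["models/2"],
}
print(is_linked(example_doc, "models/2"))
```

In the notebook, the document returned by `ap.lookup_model("Data Summaries for Housing Dataset")` and `lasso_model['_id']` would take the place of the stand-in values, provided the returned object supports dictionary-style access as `model_reg` does.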