# ballet-predict-house-prices demo

If you are on this page, you are probably thinking about contributing to a Ballet feature engineering collaboration. Welcome!

The steps you follow in this demo are taken from the [Ballet Contributor Guide](https://hdi-project.github.io/ballet/contributor_guide.html); consult it for more information.

If you have questions about feature engineering, see the [Feature Engineering Guide](https://hdi-project.github.io/ballet/feature_engineering_guide.html).

When you are ready to submit your feature, look for the "<span class="fa fa-share" style="color:#FCDD35;"></span> Submit" button on the right side of your notebook toolbar. First, select the code cell that contains the feature you have written. Then press the Submit button and confirm that the feature code shown is what you want to submit. After submission, you will be shown a URL that takes you to the Pull Request that has been created for your feature.

```python
# Preliminaries: enable log output from this project's logger and
# display all columns when printing DataFrames
import logging

import pandas as pd
from ballet.util.log import enable as enable_logger

enable_logger(logging.getLogger('ballet_predict_house_prices'))
pd.set_option('display.max_columns', None)
```

## Explore the data

Access the development data through the `load_data` function. The resulting variables are pandas DataFrames.

There is also a [detailed data dictionary](https://s3.amazonaws.com/mit-dai-ballet/ames/DataDocumentation.txt) that you might like to consult.

```python
from ballet_predict_house_prices.load_data import load_data

# X_df holds the raw development features, y_df the target
X_df, y_df = load_data()
```

```python
X_df.head()
```

```python
y_df.head()
```
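Beyond `head()`, a per-column summary of types and missing values is a quick way to spot columns that need imputation before you build a feature on them. Below is a minimal sketch using only pandas; `summarize_columns` is a hypothetical helper, and the tiny DataFrame (with column names borrowed loosely from the data dictionary) only stands in for `X_df`. In the notebook you would call `summarize_columns(X_df)` instead.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with dtype, missing counts and unique counts."""
    summary = pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'n_missing': df.isna().sum(),
        'frac_missing': df.isna().mean(),
        'n_unique': df.nunique(dropna=True),
    })
    # Put the columns with the most missing values first
    return summary.sort_values('frac_missing', ascending=False)


# Tiny stand-in for X_df (hypothetical values and column names)
example = pd.DataFrame({
    'Lot Frontage': [65.0, None, 68.0, None],
    'Neighborhood': ['NAmes', 'CollgCr', 'NAmes', 'OldTown'],
    'Gr Liv Area': [1710, 1262, 1786, 1717],
})
print(summarize_columns(example))
```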

## Explore existing features

```python
from ballet_predict_house_prices.features import build

# Apply all existing features to the development data
result = build(X_df, y_df)
X_train, y_train = result['X'], result['y']
```

```python
print('Number of existing features:', len(result['features']))
print('Number of columns in feature matrix:', X_train.shape[1])
```

## Write a new feature

Now it's time to write your own feature! Or, if you'd like to experience the submission process directly, see the pre-existing examples in `/examples`.

🚧 Be careful: **the content of the cell must be a standalone Python module**, as it will be placed in an empty Python source file. This means that any imports or helper functions must be defined (or re-defined) within this cell; otherwise your submitted feature will fail to validate due to missing imports or helpers. 🚧
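To see why this matters, the sketch below roughly mimics the idea of placing a cell's code into an empty source file and importing it. It is an illustration only, not Ballet's actual submission or validation code, and `load_cell_as_module` is a hypothetical helper. A cell that relies on an import made elsewhere in the notebook fails with a `NameError`, while a self-contained cell loads fine.

```python
import importlib.util
import math  # imported in the notebook, but NOT visible inside the module below
import pathlib
import tempfile


def load_cell_as_module(source, name='submitted_feature'):
    """Hypothetical helper: write `source` to an empty .py file and import it as a fresh module."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / f'{name}.py'
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(name, str(path))
        module = importlib.util.module_from_spec(spec)
        # Runs the cell's code in the module's own, initially empty, namespace
        spec.loader.exec_module(module)
    return module


standalone = 'import math\nvalue = math.sqrt(16)\n'
relies_on_notebook = 'value = math.sqrt(16)\n'

print(load_cell_as_module(standalone).value)
try:
    load_cell_as_module(relies_on_notebook)
except NameError as exc:
    print('fails to load:', exc)
```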

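```python
from ballet import Feature

# Fill in these values before submitting:
input = None        # column name (or list of column names) of X_df that the feature uses
transformer = None  # transformer (e.g. a scikit-learn-style object with fit/transform) applied to `input`
name = None         # short, human-readable name for the feature
feature = Feature(input=input, transformer=transformer, name=name)
```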
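As one possible way to fill in the template, here is a sketch of a scikit-learn-style transformer (a class with `fit` and `transform` methods) that sums several area columns. The class name, the output column name, the input column names and the choice to treat missing areas as zero are all hypothetical, and the sketch assumes the transformer receives the `input` columns as a DataFrame. Because the cell must be a standalone module, a class like this would need to be defined in the same cell as the `Feature`.

```python
import pandas as pd


class TotalSquareFootage:
    """Hypothetical transformer: sums several area columns into one feature.

    Follows the scikit-learn fit/transform convention. Nothing is learned
    from the data, so fit() simply returns self.
    """

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # X is assumed to be a DataFrame holding only the input columns;
        # missing areas are treated as 0 square feet (an assumption).
        return X.fillna(0).sum(axis=1).to_frame('total_sf')

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# Hypothetical values for the feature cell (check the real column names in X_df):
# input = ['Total Bsmt SF', '1st Flr SF', '2nd Flr SF']
# transformer = TotalSquareFootage()
# name = 'Total square footage'

# Quick local check on a tiny stand-in DataFrame
demo = pd.DataFrame({
    'Total Bsmt SF': [856.0, None],
    '1st Flr SF': [856, 1262],
    '2nd Flr SF': [854, 0],
})
print(TotalSquareFootage().fit_transform(demo))
```

Subclassing scikit-learn's `BaseEstimator` and `TransformerMixin` is the more common idiom; the plain class here just avoids the extra dependency.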

### Test your feature

You probably want to make sure that your feature does not have bugs before you submit it to the upstream project. This command will check that your feature conforms to the feature API. (Assumes you have assigned your feature to a variable `feature`.)

```python
from ballet.validation.feature_api.validator import validate_feature_api

validate_feature_api(feature, X_df, y_df)
```
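If `validate_feature_api` reports a failure, it can help to exercise the transformer directly on a small slice of data. The sketch below runs a few checks in that spirit: the transformer has `fit` and `transform`, its output has one row per input row and contains no missing or infinite values, and it can be deep-copied and pickled. `quick_check` and the toy `Ratio` transformer are hypothetical; this is not Ballet's list of checks and does not replace `validate_feature_api`.

```python
import copy
import pickle

import numpy as np
import pandas as pd


def quick_check(transformer, X, y=None):
    """Hypothetical pre-flight checks; returns a list of problems (empty if none found)."""
    problems = [f'missing {attr}() method'
                for attr in ('fit', 'transform')
                if not callable(getattr(transformer, attr, None))]
    if problems:
        return problems
    try:
        # fit() is expected to return the fitted transformer itself
        out = transformer.fit(X, y).transform(X)
        values = np.atleast_1d(np.asarray(out, dtype=float))
    except Exception as exc:  # report the failure instead of crashing
        return [f'fit/transform raised {type(exc).__name__}: {exc}']
    if values.shape[0] != len(X):
        problems.append(f'expected {len(X)} rows, got {values.shape[0]}')
    if np.isnan(values).any():
        problems.append('output contains missing values')
    if np.isinf(values).any():
        problems.append('output contains infinite values')
    for label, fn in (('deepcopy', copy.deepcopy), ('pickle', pickle.dumps)):
        try:
            fn(transformer)
        except Exception as exc:
            problems.append(f'cannot {label}: {exc}')
    return problems


class Ratio:
    """Toy transformer: ratio of the first two columns (no guard against division by zero)."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return (X.iloc[:, 0] / X.iloc[:, 1]).to_frame('ratio')


X_demo = pd.DataFrame({'a': [1.0, 2.0], 'b': [2.0, 0.0]})
print(quick_check(Ratio(), X_demo))
```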

For now, you can evaluate the ML performance of your feature as follows. (Assumes you have assigned your feature to a variable `feature`.)

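```python
import ballet_predict_house_prices
import ballet_predict_house_prices.features as features
from ballet.project import Project
from ballet.validation.main import _load_class  # private Ballet helper

project = Project(ballet_predict_house_prices)

# Rebuild the existing features; store them as `existing_features` so the
# `features` module is not shadowed if this cell is run again
out = features.build(X_df, y_df)
X_df, y, existing_features = out['X_df'], out['y'], out['features']

# Load the project's configured feature accepter and judge the candidate `feature`
Accepter = _load_class(project, 'validation.feature_accepter')
accepter = Accepter(X_df, y, existing_features, feature)
accepter.judge()
```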
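Independently of the accepter, a crude sanity check is to compare holdout error with and without your feature's column. The sketch below swaps in a simple technique (ordinary least squares with an intercept, scored by root-mean-squared error on a single random holdout split) and uses synthetic data; it is not what the project's accepter does, and `holdout_rmse` is a hypothetical helper. In the notebook, the existing columns would correspond to `X_train` and the candidate column to the numeric output of your transformer, assuming both are fully numeric with no missing values.

```python
import numpy as np


def holdout_rmse(X, y, train_frac=0.8, seed=0):
    """Hypothetical helper: RMSE of an OLS fit (with intercept) on a random holdout split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    train, test = idx[:n_train], idx[n_train:]
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    resid = y[test] - A[test] @ coef
    return float(np.sqrt(np.mean(resid ** 2)))


# Synthetic stand-ins: three "existing" columns plus one candidate column
rng = np.random.default_rng(1)
n = 500
existing_demo = rng.normal(size=(n, 3))
candidate_demo = rng.normal(size=(n, 1))
y_demo = (existing_demo @ np.array([1.0, -2.0, 0.5])
          + 3.0 * candidate_demo[:, 0]
          + rng.normal(scale=0.5, size=n))

rmse_without = holdout_rmse(existing_demo, y_demo)
# Same seed and same number of rows, so both fits use the same split
rmse_with = holdout_rmse(np.hstack([existing_demo, candidate_demo]), y_demo)
print(f'holdout RMSE without candidate: {rmse_without:.3f}, with: {rmse_with:.3f}')
```

A lower RMSE with the candidate column suggests it carries information not already in the existing features; on real data, cross-validation gives a more reliable comparison than a single split.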