# Sample notebook execution in RapidMiner
Any Jupyter notebook can be executed inside RapidMiner with the same Python environment; no changes to the notebook are necessary. In addition, there are several options to enhance the experience:

- Providing a function that returns a pandas DataFrame (named `rm_main` in this notebook): the data becomes directly usable after the notebook is called from within RapidMiner.
- Attaching a metadata dictionary to the returned DataFrame to give colleagues additional information about the data. RapidMiner-specific roles such as 'label' and 'prediction', as well as self-defined roles, can be used.
- Adding Jupyter notebook cell tags so that the RapidMiner user can filter which cells are loaded.
- Using the RapidMiner Python library to store data directly in a RapidMiner repository.
- Using the RapidMiner Python library to execute RapidMiner processes.

## Demo process summary
In this notebook, the Boston housing data set is loaded and examined. Afterwards, a random forest regressor is trained on a training subset and applied to a held-out test set. Finally, a new pandas DataFrame is created that stores the test data, the predicted values, the true label values, and additional metadata. This DataFrame is provided through the function that RapidMiner expects (`rm_main`), so a RapidMiner user can use it directly as an ExampleSet.

Some cells of the demo notebook carry the tag 'prototyping'. Inside a RapidMiner process, this tag can be used to filter out the tagged cells so that only the non-prototyping cells are loaded.
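RapidMiner performs this filtering itself; the sketch below only illustrates where the tags live and what excluding 'prototyping' cells amounts to. In the .ipynb format (nbformat v4), a notebook is a JSON document whose `cells` list holds cell dictionaries, and each cell's tags are stored as a list under `metadata` → `tags`. The helpers `load_notebook`, `filter_cells`, and `code_to_run` are hypothetical illustrations, not part of RapidMiner or Jupyter.

```python
import json


def load_notebook(path):
    """Read an .ipynb file; the format is plain JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def filter_cells(notebook, excluded_tag="prototyping"):
    """Return a shallow copy of the notebook without cells tagged `excluded_tag`."""
    kept = [
        cell
        for cell in notebook.get("cells", [])
        if excluded_tag not in cell.get("metadata", {}).get("tags", [])
    ]
    filtered = dict(notebook)  # the original notebook dict is left unchanged
    filtered["cells"] = kept
    return filtered


def code_to_run(notebook, excluded_tag="prototyping"):
    """Concatenate the source of all code cells that survive the tag filter."""
    sources = []
    for cell in filter_cells(notebook, excluded_tag)["cells"]:
        if cell.get("cell_type") != "code":
            continue  # Markdown cells are never executed
        source = cell.get("source", "")
        # nbformat allows the source to be a single string or a list of lines
        sources.append(source if isinstance(source, str) else "".join(source))
    return "\n".join(sources)


# A tiny in-memory notebook with the same structure as an .ipynb file
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": ["import pandas as pd\n"],
         "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {"tags": ["prototyping"]},
         "source": "print('quick data check')", "outputs": [], "execution_count": None},
        {"cell_type": "markdown", "metadata": {}, "source": "# Notes"},
    ],
}

print(code_to_run(example))  # prints: import pandas as pd
```

With a real file, `code_to_run(load_notebook("demo.ipynb"))` would return the code of all cells not tagged 'prototyping'; the file name is a placeholder.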

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

In [None]:
%matplotlib inline

In [None]:
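# Loading example data
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires scikit-learn < 1.2.
boston = datasets.load_boston()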
X = boston.data
y = boston.target

In [None]:
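# Quick data check
print(boston.DESCR)
# Label distribution: median house value in $1000s (MEDV)
plt.title("House price distribution")
plt.hist(y, bins=30)
plt.xlabel("Median value ($1000s)")
plt.ylabel("Count");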

In [None]:
# Hold out 20% of the data for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
estimator = RandomForestRegressor(n_estimators=100, random_state=42)

In [None]:
estimator.fit(X_train, y_train)

In [None]:
prediction = estimator.predict(X_test)

In [None]:
# Provide results to RapidMiner
def rm_main():
    # Create data frame with data the model was applied to
    results = pd.DataFrame(
        data=X_test,
        columns=boston.feature_names)
    # Add predictions to the data set
    results['Prediction'] = prediction
    # Add the true house prices
    results['HousePrice'] = y_test
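    # Attach metadata: column name -> (value type, RapidMiner role)
    meta_data = dict(Prediction=('numerical', 'prediction'),
                     HousePrice=('numerical', 'label'))
    # Set this last: pandas does not carry ad-hoc attributes over to new DataFrames
    results.rm_metadata = meta_data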
    
    return results
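Before calling the notebook from RapidMiner, it can help to check locally that `rm_main()` returns a pandas DataFrame whose `rm_metadata` maps existing column names to (type, role) tuples, as in the cell above. The following is a minimal sketch under that assumption; `check_rm_result` is a hypothetical helper, not part of RapidMiner. Roles are deliberately not validated, since self-defined roles are allowed. As in the notebook cell, pandas emits a UserWarning about attribute access when `rm_metadata` is assigned; the attribute is still set.

```python
import pandas as pd


def check_rm_result(result):
    """Hypothetical local check of a DataFrame intended to be returned by rm_main().

    Returns a list of problem descriptions; an empty list means none were found.
    """
    if not isinstance(result, pd.DataFrame):
        return ["result is not a pandas DataFrame"]
    problems = []
    # getattr falls back to {} when no rm_metadata attribute was attached
    metadata = getattr(result, "rm_metadata", {})
    for column, spec in metadata.items():
        if column not in result.columns:
            problems.append(f"metadata refers to missing column {column!r}")
        # Assumed format, taken from this notebook: (value type, role)
        if not (isinstance(spec, tuple) and len(spec) == 2):
            problems.append(f"metadata for {column!r} is not a (type, role) tuple")
    return problems


# Small stand-in for the DataFrame built in rm_main()
example_result = pd.DataFrame({"Prediction": [21.3, 30.1], "HousePrice": [20.0, 31.5]})
example_result.rm_metadata = {
    "Prediction": ("numerical", "prediction"),
    "HousePrice": ("numerical", "label"),
}
print(check_rm_result(example_result))  # prints: []
```

Inside this notebook, `check_rm_result(rm_main())` could be run after the cells above as a quick sanity check.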