In [4]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

from pandas_ml_common import FeaturesLabels, stratified_random_splitter
from pandas_ml_utils import pd, np, FittingParameter
from pandas_ml_utils_test.config import DF_NOTES
import matplotlib.pyplot as plt

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


The following data set contains variables used to determine whether a note is authentic or not.

In [5]:
DF_NOTES.tail()

Unnamed: 0,variance,skewness,kurtosis,entropy,authentic
1367,0.40614,1.3492,-1.4501,-0.55949,1
1368,-1.3887,-4.8773,6.4774,0.34179,1
1369,-3.7503,-13.4586,17.5932,-2.7771,1
1370,-3.5637,-8.3827,12.393,-1.2823,1
1371,-2.5419,-0.65804,2.6842,1.1952,1


Now let's estimate which features might be useful for predicting the label, i.e. whether a note is authentic (1) or not (0).
But before we do that, we add some redundancy and some random data. The feature selection should be able to
get rid of such useless data. Since we do not know whether the data is sorted in some way, we use a
`stratified_random_splitter` to make sure that each class is represented in the same proportion in the training and test sets.
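
To illustrate why stratification matters for sorted data, here is a minimal sketch of the idea in plain Python. It is not the implementation behind `stratified_random_splitter`; the helper `stratified_split` and the synthetic labels are assumptions made up for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size, seed=42):
    """Return (train_idx, test_idx) so that each class keeps its share in both parts."""
    rng = random.Random(seed)

    # Group the row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # randomize within each class
        n_test = round(len(indices) * test_size)  # per-class test quota
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical, deliberately sorted labels: 60 rows of class 0, then 40 rows of class 1.
labels = [0] * 60 + [1] * 40
train_idx, test_idx = stratified_split(labels, test_size=0.25)


def share_of_ones(indices):
    """Fraction of rows in `indices` that belong to class 1."""
    return sum(labels[i] for i in indices) / len(indices)


print(len(train_idx), len(test_idx))
print(share_of_ones(train_idx), share_of_ones(test_idx))
```

Simply cutting off the last 25% of these sorted labels would give a test set that contains only class 1, whereas the per-class quota keeps the class share identical in both parts.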

In [6]:
# make experiment reproducible
np.random.seed(42)

In [7]:
with DF_NOTES.model("notes-$V.model") as m:  # note: the $V macro allows saving multiple versions of the model
    from pandas_ml_utils import FittableModel, SkModelProvider, FeaturesLabels, ClassificationSummary, stratified_random_splitter
    from sklearn.neural_network import MLPClassifier
    
    network_size = 10
    nr_layers = 2

    hidden = [int(network_size ** (1 / float(nr_layers)))] * nr_layers
    print("hidden size", hidden)

    fit = m.fit(
        FittableModel(
            SkModelProvider(MLPClassifier(hidden_layer_sizes=hidden, activation='tanh')),
            FeaturesLabels(
                features=["variance", "skewness", "kurtosis"],
                labels=["authentic"]
            ),
            summary_provider=ClassificationSummary
        ),
        FittingParameter(splitter=stratified_random_splitter(0.25))
    )

fit



NameError: name 'res' is not defined

Here we are with a nicely fitted model :tada:
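
As a side note on the network shape used in the cell above: the width of each hidden layer is the `nr_layers`-th root of `network_size`, truncated to an integer. The standalone sketch below only repeats that arithmetic with the same values; the intermediate name `width` is introduced here for readability.

```python
network_size = 10
nr_layers = 2

# 10 ** (1 / 2) is roughly 3.162; int() truncates it to 3.
width = int(network_size ** (1 / nr_layers))

# Every hidden layer gets the same width.
hidden = [width] * nr_layers
print("hidden size", hidden)
```

So with these settings the `MLPClassifier` receives two hidden layers of three units each.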

Note that our model has also been saved, so we can create a new script, app, etc. and just load it:
```python
from pandas_ml_utils import Model

df = load_a_notes_data_frame()
model = Model.load(f"{fit.model.file_name}")
prediction = df.model.predict(model)
```