# Local Flatline Interpreter

In [None]:
from flatline.interpreter import Interpreter

We create a new local interpreter, which uses *nodejs* under the hood:

In [None]:
interpreter = Interpreter()

## Available functions

We can query the interpreter for all the built-in functions provided by flatline

In [None]:
interpreter.defined_functions()

## Checking symbolic expressions

The interpreter can check for us whether a Lisp or JSON s-expression is correct.

### Valid constant expressions

Lisp s-expressions are represented as strings:

In [None]:
interpreter.check_lisp('(+ 1 2)')

JSON expressions are represented as Python lists of native values

In [None]:
interpreter.check_json(["+", ["*", 3, 5]])
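To make the correspondence between the two notations concrete, here is a minimal sketch of how a nested Python list could be rendered as a Lisp string. The `to_lisp` helper is hypothetical and written only for illustration; it is not flatline's own converter. It assumes that a string in head position is an operator name and that any other string is a string literal.

```python
import json

def to_lisp(sexp, head=False):
    """Render a JSON-style s-expression (nested lists) as a Lisp string.

    Hypothetical helper for illustration only. Assumes a string in head
    position is an operator and any other string is a literal.
    """
    if isinstance(sexp, list):
        parts = [to_lisp(item, head=(i == 0)) for i, item in enumerate(sexp)]
        return "(" + " ".join(parts) + ")"
    if isinstance(sexp, str):
        # Operators are written bare; string literals keep their quotes.
        return sexp if head else json.dumps(sexp)
    return str(sexp)

print(to_lisp(["+", ["*", 3, 5]]))
print(to_lisp(["f", "000001"]))
```

The head/non-head distinction is what lets `"+"` be printed bare while a field id such as `"000001"` stays quoted.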

### Some erroneous symbolic expressions

In [None]:
interpreter.check_lisp('(+ 2')

In [None]:
interpreter.check_json(["non-existent", 3])

In [None]:
interpreter.check_json(["+", 1, "3"])

In [None]:
interpreter.check_lisp('(f 0)')

### Checking expressions that depend on input dataset fields

The last s-expression was invalid because no dataset is known, and hence there is no "field 0".

Let's create a mock dataset to tell the interpreter what our fields are:

In [None]:
mock_dataset = {'dataset':{'fields': Interpreter.infer_fields([1, 'a'])}}
mock_dataset['dataset']['fields']
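As a rough mental model of what inferring fields from example values involves, the sketch below guesses one descriptor per column. Everything in it is an assumption made for illustration: the `guess_fields` name, the keys of each descriptor, and the rule that numbers are numeric and everything else is categorical. The structure actually returned by `Interpreter.infer_fields` may differ. The six-digit ids follow the `"000001"` style used elsewhere in this notebook.

```python
def guess_fields(row):
    """Guess field descriptors from one example row (illustrative only)."""
    fields = {}
    for index, value in enumerate(row):
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        field_id = "%06x" % index  # e.g. "000000", "000001"
        fields[field_id] = {
            "column_number": index,  # assumed key name
            "optype": "numeric" if is_number else "categorical",
        }
    return fields

print(guess_fields([1, 'a']))
```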

Now the checks referring to those fields will pass:

In [None]:
interpreter.check_lisp('(field 0)', dataset=mock_dataset)

In [None]:
interpreter.check_json(["f", "000001"], dataset=mock_dataset)

Note how the last two expressions have no associated value, because they depend on the concrete input rows to which they're applied (i.e., these expressions do not represent constant values).

## Applying symbolic expressions

We can apply valid symbolic expressions to local rows represented as lists of native Python values:

In [None]:
test_rows = [[1, 'a'], [2, 'b'], [23, 'd']]
interpreter.apply_lisp('(fields 1 0)', test_rows)

In [None]:
interpreter.apply_lisp('(list (+ 2 (f 0)) (- (f 0) (f 0 -1)))', test_rows)

In [None]:
interpreter.apply_json(["window", "000001", -1, 1], test_rows)

In these examples, the field characteristics are guessed from the given values.  Guessing is useful for quick tests, but in real cases we should provide real dataset metadata to the apply functions.
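To reason about what a shifted field access such as `(f 0 -1)` means, it can help to emulate it in plain Python. The sketch below mirrors the expression `(list (+ 2 (f 0)) (- (f 0) (f 0 -1)))` over the same test rows. The `shifted` and `apply_example` helpers are hypothetical, and the handling of a missing previous row (the difference becomes `None`) is an assumption of this sketch rather than documented interpreter behavior.

```python
def shifted(rows, row_index, column, shift=0):
    """Value of `column` in the row `shift` positions away, or None if absent."""
    target = row_index + shift
    if 0 <= target < len(rows):
        return rows[target][column]
    return None

def apply_example(rows):
    """Emulate (list (+ 2 (f 0)) (- (f 0) (f 0 -1))) over local rows."""
    results = []
    for i in range(len(rows)):
        current = shifted(rows, i, 0)
        previous = shifted(rows, i, 0, -1)
        # Assumption: a missing operand makes the whole difference missing.
        diff = None if previous is None else current - previous
        results.append([2 + current, diff])
    return results

test_rows = [[1, 'a'], [2, 'b'], [23, 'd']]
print(apply_example(test_rows))
```

With these rows the sketch yields `[[3, None], [4, 1], [25, 21]]`; the first row has no predecessor, so its difference is empty.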

# Extended example using remote resources

In [None]:
from bigml.api import BigML
from flatline.sampler import Sampler

In [None]:
api = BigML()

We start by creating a dataset from Quandl's Apple (AAPL) NASDAQ data:

In [None]:
source = api.create_source('https://s3.amazonaws.com/bigml-public/csv/nasdaq_aapl.csv', {'name':'Flatline tests'})
api.ok(source)

In [None]:
dataset = api.create_dataset(source)
dataset_id = dataset['resource']
api.ok(dataset)

And download a sample of its rows locally, using a *Sampler* object

In [None]:
sampler = Sampler()

*Sampler*, like *Interpreter*, is an abstraction over the building blocks provided by the API bindings. It takes care internally of waiting for resource completion and other housekeeping, which is why we don't need `api.ok()` calls here.

In [None]:
sampler.take_sample(dataset_id, size=5)

These are the rows that we have downloaded locally (plus all the associated metadata)

In [None]:
sampler.rows()

The sampler also keeps information on the dataset and sample metadata; e.g. the field descriptors:

In [None]:
[{'id':f['id'], 'name':f['name'], 'optype':f['optype']} for f in sampler.fields()]

Now we can apply Flatline expressions locally and check whether they produce sensible results.

For instance, we could normalize **Low**, **High** and **Volume**, dividing them by their mean value in the original dataset.  

Let's define an auxiliary function to generate the corresponding Flatline JSON s-expressions:

In [None]:
def norm_field(name):
    return ["/", ["field", name], ["abs", ["mean", name]]]

norm_field('High')

We can use the interpreter to check the format and syntax of our generated code:

In [None]:
def print_as_lisp(json_sexp):
    # print() with parentheses works under both Python 2 and Python 3
    print(interpreter.json_to_lisp(json_sexp))
    
print_as_lisp(norm_field('Low'))

To generate more than one value, we wrap the list of field expressions in a `list` form:

In [None]:
def make_list(*fields):
    res = ['list']
    res.extend(fields)
    return res
    
norm_fields = make_list(norm_field('Low'), norm_field('High'), norm_field('Volume'))
print_as_lisp(norm_fields)

And now let's check that the syntax is in fact correct:

In [None]:
interpreter.check_json(norm_fields, dataset=dataset['object'])

Our Lisp expression seems correct and produces three numeric values. We can apply it to our sample rows and confirm that the outputs are in fact what we expect:

In [None]:
sampler.apply_json(norm_fields)
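One way to sanity-check outputs like these by hand is to repeat the arithmetic in plain Python. The sketch below divides each value by the absolute value of a mean, as `norm_field` does. The `normalize` helper and the numbers are made up for illustration and are not taken from the AAPL data. Note that the mean is passed in separately, reflecting that it belongs to the original dataset rather than to the handful of sampled rows.

```python
def normalize(values, dataset_mean):
    """Divide each value by |dataset_mean|, mirroring (/ (field x) (abs (mean x)))."""
    scale = abs(float(dataset_mean))
    return [value / scale for value in values]

# Toy values, not real stock data.
toy_lows = [90.0, 110.0, 100.0]
toy_mean = 100.0
print(normalize(toy_lows, toy_mean))
```

Values close to 1 indicate rows near the dataset average, which is a quick plausibility check for the sampled output.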

Looks good so far.  Let's say we want to predict whether the stock will go up or down based on the Open and Close values of the **previous day** and today's Open value.  We can access the value of a previous row with `(field name -1)`:

In [None]:
def previous_day(name):
    return ["field", name, -1]

open_close_fields = make_list(previous_day('Open'), 
                              previous_day('Close'))

print_as_lisp(open_close_fields)

Let's check it's a good Flatline expression and see how it works on our local sample:

In [None]:
interpreter.check_json(open_close_fields, dataset=dataset['object'])

In [None]:
sampler.apply_json(open_close_fields)

Note how the entries for the previous day Open and Close values are `None` in the first row, since there's no previous day!

Finally, let's define our objective field, **UpOrDown**:

In [None]:
up_or_down = '(if (> (f "Open") (f "Close")) "down" "up")'
interpreter.check_lisp(up_or_down, dataset=dataset['object'])

In [None]:
sampler.apply_lisp(up_or_down)
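For comparison, the same decision rule can be written as a plain Python function, which makes the edge case explicit. This is only a restatement of the `if` expression above for illustration; the function name is hypothetical.

```python
def label_up_or_down(open_value, close_value):
    """Plain-Python reading of (if (> Open Close) "down" "up")."""
    return "down" if open_value > close_value else "up"

print(label_up_or_down(10.0, 9.5))   # opened higher than it closed
print(label_up_or_down(9.5, 10.0))   # closed higher than it opened
print(label_up_or_down(10.0, 10.0))  # unchanged day
```

Because the comparison is a strict `>`, a day whose Open and Close are equal is labelled "up".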

Once we're happy with our transformations, we ask BigML to create the new fields over the entire dataset

In [None]:
norm_fields_sexp = interpreter.json_to_lisp(norm_fields)
open_close_sexp = interpreter.json_to_lisp(open_close_fields)

extended_dataset = api.create_dataset(dataset, {'new_fields':[{'field':norm_fields_sexp, 'names':['NLow', 'NHigh', 'NVol']},
                                                              {'field':open_close_sexp, 'names':['Open-1', 'Close-1']},
                                                              {'field':up_or_down, 'name': 'Up or down'}]})
api.ok(extended_dataset)
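The `new_fields` entries above follow a pattern: an expression that yields several values is paired with a `names` list, while a single-valued expression is paired with a `name`. A small helper can keep that choice consistent. The `new_field` function below is a hypothetical convenience written for this notebook, not part of the BigML bindings, and it relies only on the two shapes already used above.

```python
def new_field(expression, *names):
    """Build one entry of the 'new_fields' list (illustrative helper)."""
    if not names:
        raise ValueError("at least one field name is required")
    if len(names) == 1:
        # Single-valued expression: one 'name'.
        return {"field": expression, "name": names[0]}
    # List-valued expression: one name per generated column.
    return {"field": expression, "names": list(names)}

print(new_field('(field "Open" -1)', 'Open-1'))
print(new_field('(list (field "Open" -1) (field "Close" -1))', 'Open-1', 'Close-1'))
```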

And we confirm that the new dataset indeed has the new columns:

In [None]:
sampler.take_sample(extended_dataset['resource'], size=3)
[{'id':f['id'], 'name':f['name'], 'optype':f['optype']} for f in sampler.fields()]

In [None]:
sampler.rows()