# Eventador example data science workflows

The Eventador Platform helps you manage data for your data science work, from simple analytics to machine learning models. In this example we perform some data management tasks and some basic data science tasks to demonstrate the platform's capabilities.

## Connecting to, defining, and organizing feeds of data in Apache Kafka

Let's first connect to the system, inspect some Kafka topics (which show up as tables), and see the datatypes we can work with. This looks and works just like the database you are used to.

In [7]:
import eventador_python as ev
import pandas as pd

# Account credentials (placeholder values)
config = {'auth': {'username': 'myusername', 'password': 'xxxx'}}

# Create a query client and log in
e = ev.EventadorQuery()
e.login(config['auth']['username'], config['auth']['password'])

Getting CSRF token..
Login successful, user is: kgorman
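
Hard-coded credentials are easy to leak when a notebook is shared. A minimal sketch, assuming two hypothetical environment variables named `EVENTADOR_USERNAME` and `EVENTADOR_PASSWORD` (these names are not defined by the platform), builds the same `config` dict from the environment:

```python
import os


def config_from_env(environ=os.environ):
    """Build the notebook's config dict from environment variables.

    EVENTADOR_USERNAME and EVENTADOR_PASSWORD are hypothetical names
    chosen for this sketch; use whatever your environment provides.
    """
    try:
        return {
            "auth": {
                "username": environ["EVENTADOR_USERNAME"],
                "password": environ["EVENTADOR_PASSWORD"],
            }
        }
    except KeyError as exc:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(f"missing environment variable: {exc.args[0]}") from None
```

The resulting dict has the same shape as `config` above, so it can be passed to `e.login(...)` in the same way.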


In [2]:
e.command("show tables")

{
    "table_name": "JMO_IN",
    "flavor": "source",
    "type": "kafka",
    "dtcreated": "2019-11-19 21:26:45.082614",
    "id": 738
}

{
    "table_name": "JMO_IN_BREAK",
    "flavor": "source",
    "type": "kafka",
    "dtcreated": "2020-01-09 21:47:21.463287",
    "id": 1193
}

{
    "table_name": "adsbx",
    "flavor": "source",
    "type": "kafka",

In [3]:
e.command("desc authorizations")

{
    "fields": [
        {
            "doc": "Type inferred from '706'",
            "name": "userid",
            "type": "long"
        },
        {
            "doc": "Type inferred from '21305'",
            "name": "amount",
            "type": "long"
        },
        {
            "doc": "Type inferred from '31.085854'",
            "name": "lat",
            "type": "double"
        },
        {
            "doc": "Type inferred from '-112.757024'",

The schema for `authorizations` is inferred from the topic, so we can write queries against it. The SQL is launched as a *continuous SQL* job that is always running and keeps a view up to date with the latest version of the data. This view is exposed as a REST endpoint we can use anywhere we need data: in exploration, training, testing, or production.

Reads are fast because we never wait for a query to run over the data: results are continuously fed into the view, and we read from the view whenever our models need them.

If we get the SQL wrong, we get instant feedback and can iterate without waiting for a runtime failure. You can also use the https://eventador.cloud interface to author and launch jobs.

In [12]:
e.command("set cluster 7.0.0-pre3")
sql = """select userid, lat, lon, amount, card from authorizations where amount > 10000 and card is not null"""
e.query(sql, "kg_job3")

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
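
The `JSONDecodeError` above is the message Python's `json` module produces when it is asked to parse an empty or non-JSON string, so one plausible reading is that a response body was not JSON. A minimal sketch of guarding against that case follows; `parse_json_or_none` is a hypothetical helper, not part of `eventador_python`, and how the client handles responses internally is an assumption.

```python
import json


def parse_json_or_none(body):
    """Return the decoded JSON value, or None when body is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # An empty string (or an HTML error page) fails at the first
        # character: "Expecting value: line 1 column 1 (char 0)".
        print(f"response was not JSON: {exc}")
        return None


# An empty body reproduces the message shown in the cell output above.
parse_json_or_none("")
```

Logging the raw body alongside the error is usually the quickest way to see what was returned instead of JSON.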

We can now use the results of that continuously running SQL job in Pandas via the materialized view REST endpoint.

In [None]:
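# Placeholder: replace with the REST endpoint URL of the materialized view
df = pd.read_json("<materialized-view-endpoint-url>")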
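
Once the view is loaded, the field types reported earlier by `desc authorizations` can be used to pin down column dtypes. The sketch below assumes the schema uses Avro-style primitive type names; the `AVRO_TO_PANDAS` mapping and the `dtypes_from_schema` helper are illustrative, not part of the platform. Only the three fields whose names and types are fully visible in the output above are included.

```python
# Assumed mapping from Avro-style primitive type names to pandas dtype strings.
AVRO_TO_PANDAS = {
    "long": "int64",
    "int": "int32",
    "double": "float64",
    "float": "float32",
    "boolean": "bool",
    "string": "object",
}


def dtypes_from_schema(schema):
    """Map each field name to a pandas dtype string, skipping unknown types."""
    return {
        field["name"]: AVRO_TO_PANDAS[field["type"]]
        for field in schema["fields"]
        if field["type"] in AVRO_TO_PANDAS
    }


# Fields copied from the `desc authorizations` output shown earlier.
schema = {
    "fields": [
        {"name": "userid", "type": "long"},
        {"name": "amount", "type": "long"},
        {"name": "lat", "type": "double"},
    ]
}

dtypes = dtypes_from_schema(schema)
print(dtypes)  # {'userid': 'int64', 'amount': 'int64', 'lat': 'float64'}
```

The resulting dict can be passed to `df.astype(dtypes)`. Integer columns that contain nulls would need a nullable dtype such as `Int64` instead.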