# Protobag Demo

This notebook shows:
  * How to write Protobuf messages to a protobag zip archive
  * How to read those messages back
  * How to convert protobag zip archives to and from Pandas Dataframes and Parquet tables
  
If you only have 30 seconds, hit "run all cells" and try scrolling to these sections:
 * [Writing Plain Protobuf Messages](#write_plain)
 * [Reading Plain Protobuf Messages](todo anchor link)
 * [Converting a Protobag to a Pandas Dataframe](todo anchor link)

`protobag` archives are just zip archives that contain string-serialized Protobuf messages. (Tar and other formats are also supported via [libarchive](https://github.com/libarchive/libarchive).)

## Environment Set-up

To run this notebook locally, try using the protobag-demo dockerized environment:

`docker run -it --rm --net=host protobag/demo:0.0.3 jupyter notebook --allow-root`
TODO: build this and push it

**Google Colab** You can also [run this notebook in Google Colab](https://colab.sandbox.google.com/github/StandardCyborg/protobag/blob/master/examples/protobag-parquet/protobag-demo-full.ipynb) TODO FIX LINK. In the Colab environment, you'll need to install `protobag` and some other dependencies. Running the cell below will take care of that for you. You might need to restart the runtime (Use the menu option: Runtime > Restart runtime ...) in order for Colab to recognize the new modules.

In [1]:
import os
import sys
if os.path.exists('/opt/protobag'):
    print("We're running in the dockerized environment! We can simply add protobag to the PYTHONPATH")
    if not os.path.exists('/opt/protobag/python/protobag/protobag_native.cpython-36m-x86_64-linux-gnu.so'):
        # We need to build the `protobag_native` module.  The easiest way is to:
        !cd /opt/protobag/python && python3 setup.py test
    sys.path.append('/opt/protobag/python/')
    print("Protobag added to PYTHONPATH")
elif 'google.colab' in sys.modules:
    !pip install protobag
    print("Protobag installed into Colab runtime environment.  You might need to restart the runtime")

import protobag
print("Using protobag version %s at %s" % (protobag.__version__, protobag.__file__))

We're running in the dockerized environment! We can simply add protobag to the PYTHONPATH
Protobag added to PYTHONPATH
Using protobag version 0.0.3 at /opt/protobag/python/protobag/__init__.py


## Our Protobuf Messages

Suppose we've developed a cool game called Dino Hunters where people run around on a deserted island and try to capture wild dinosaurs.  We're using Protobuf to persist data about the `DinoHunter` characters in our game; furthermore, we want to use Protobuf to log the 2D `Position`s of our hunters as they run around and hunt dinos.  Our message schema is as follows:

```protobuf
syntax = "proto3";

package my_messages;

message DinoHunter {
  string first_name = 1;
  int32 id = 2;
  map<string, string> attribs = 3;

  enum DinoType {
    IDK = 0;
    VEGGIESAURUS = 1;
    MEATIESAURUS = 2;
    PEOPLEEATINGSAURUS = 3;
  }

  message Dino {
    string name = 1;
    DinoType type = 2;
  }

  repeated Dino dinos = 4;
}

message Position {
  float x = 1;
  float y = 2;
}
```

We now need the `protoc`-generated Python code in order to use these messages.  For convenience, we'll just download a copy posted in the `protobag` repo:

In [2]:
if not os.path.exists('MyMessages_pb2.py'):
    !wget http://fixme/this/path

from MyMessages_pb2 import DinoHunter
from MyMessages_pb2 import Position

## Note: you can prove to yourself that the downloaded file matches the schema above using the code below:
# import MyMessages_pb2
# from google.protobuf.descriptor_pb2 import FileDescriptorProto
# fd = FileDescriptorProto()
# MyMessages_pb2.DESCRIPTOR.CopyToProto(fd)
# print(fd)

OK! Let's create some hunters:

In [3]:
max_hunter = DinoHunter(
      first_name='py_max',
      id=1,
      dinos=[
        {'name': 'py_nibbles', 'type': DinoHunter.PEOPLEEATINGSAURUS},
      ])
print(max_hunter)

lara_hunter = DinoHunter(
      first_name='py_lara',
      id=2,
      dinos=[
        {'name': 'py_bites', 'type': DinoHunter.PEOPLEEATINGSAURUS},
        {'name': 'py_stinky', 'type': DinoHunter.VEGGIESAURUS},
      ])

print(lara_hunter)

first_name: "py_max"
id: 1
dinos {
  name: "py_nibbles"
  type: PEOPLEEATINGSAURUS
}

first_name: "py_lara"
id: 2
dinos {
  name: "py_bites"
  type: PEOPLEEATINGSAURUS
}
dinos {
  name: "py_stinky"
  type: VEGGIESAURUS
}



## Writing and Reading Protobuf messages to a protobag
<a id='write_plain'></a>
### Plain Messages (`protobag.MessageEntry`)

`protobag` archives are just zip archives that contain string-serialized Protobuf messages. (Tar and other formats are also supported via [libarchive](https://github.com/libarchive/libarchive).)  So `protobag` offers a simple API for **writing** messages to an archive:

In [4]:
bag = protobag.Protobag(path='example.zip')
writer = bag.create_writer()
writer.write_msg("hunters/py_max", max_hunter)
writer.write_msg("hunters/py_lara", lara_hunter)
writer.close()

You can verify that the above just wrote a zip archive for you:

In [5]:
!which unzip > /dev/null || (apt-get update && apt-get install -y unzip)
!unzip -l example.zip

Archive:  example.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
       72  1980-01-01 00:00   hunters/py_max
       86  1980-01-01 00:00   hunters/py_lara
     8691  1980-01-01 00:00   /_protobag_index/bag_index/1595357957.0.stampedmsg.protobin
---------                     -------
     8849                     3 files


Hmm, what's that `/_protobag_index/bag_index/xxxxx.stampedmsg.protobin` file?

By default, `protobag` not only saves those messages but also **indexes Protobuf message descriptors** so that your `protobag` readers don't need your proto schemas to decode your messages.  (You can also disable this indexing if you wish.  For further discussion, see [the root README.md](todo link).)


To **read** specific messages from a `protobag` archive, you can use this simple API:

In [6]:
bag = protobag.Protobag(
        path='example.zip',
        
        # Tell protobag to use our protoc-generated python code:
        msg_classes=(DinoHunter, Position))
entry = bag.get_entry("hunters/py_max")
print(entry)

MessageEntry:
  entryname: hunters/py_max
  type_url: type.googleapis.com/my_messages.DinoHunter
  has serdes: True
  has descriptor_data: False
  msg:
first_name: "py_max"
id: 1
dinos {
  name: "py_nibbles"
  type: PEOPLEEATINGSAURUS
}
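
The `type_url` printed above looks like the convention used by `google.protobuf.Any`, where the fully qualified message name is whatever follows the last `/`. As a minimal sketch (the helper name `message_name_from_type_url` is hypothetical and not part of the `protobag` API), you could recover the message name like this:

```python
def message_name_from_type_url(type_url: str) -> str:
    """Return the fully qualified message name from an Any-style type URL.

    Hypothetical helper for illustration; not part of protobag.
    """
    # The fully qualified name is everything after the last '/'.
    return type_url.rsplit("/", 1)[-1]


type_url = "type.googleapis.com/my_messages.DinoHunter"
full_name = message_name_from_type_url(type_url)

# For a top-level message, the package is everything before the last '.'.
package, _, message = full_name.rpartition(".")
print(full_name, package, message)
```

Note that the `rpartition` split only yields the package for top-level messages; for a nested message such as `my_messages.DinoHunter.Dino`, the prefix would include the enclosing message name.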



### Time-Series Data (`protobag.StampedEntry`)
`protobag` features a **topic-timestamp-message** API for recording time-series data.  This API is modeled after [`rosbag`](http://wiki.ros.org/rosbag), [LCM log files](https://lcm-proj.github.io/log_file_format.html) (where topics are called "channels"), and your favorite message bus systems like Kafka or AWS SQS.  Each **topic** has Protobuf messages of a single type, and each message has a nanosecond-precision timestamp (using the `google.protobuf.Timestamp` object, which has built-in conversion to other timestamp data structures like Python `datetime`s).
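
To make the timestamp representation concrete: a `google.protobuf.Timestamp` stores a `seconds` field and a `nanos` field counted from the Unix epoch. The sketch below mimics the conversion to a Python `datetime` using only the standard library, so it runs without Protobuf installed. The function name `to_datetime` is hypothetical, and returning a timezone-aware UTC value is a choice made for this sketch.

```python
from datetime import datetime, timedelta, timezone


def to_datetime(seconds: int, nanos: int) -> datetime:
    """Convert Timestamp-style (seconds, nanos) since the Unix epoch to UTC.

    Hypothetical helper for illustration; not part of protobag or protobuf.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # datetime only has microsecond resolution, so any sub-microsecond
    # part of `nanos` is dropped here.
    return epoch + timedelta(seconds=seconds, microseconds=nanos // 1000)


dt = to_datetime(3, 500_000_000)
print(dt.isoformat())
```

Because `datetime` stops at microseconds, a round trip through it can lose precision relative to the nanosecond-precision timestamp stored with each message.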

Protobag has special handling for these timestamped entries:
  * For writing:
      * Topics are organized into archive "folders" and filenames are chosen automatically.
      * Protobag indexes Protobuf Message Descriptors as described above.
      * Protobag indexes the messages for efficient time-ordered playback.
  * For reading:
      * Protobag offers a simple [Selection](fixme link) API for reading specific sets of topics, time ranges, or even just individual events.
      * Protobag offers a [`TimeSync`](fixme link) utility (in C++; FIXME add python API) for synchronizing topics that have messages recorded at different rates.
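
To illustrate what "time-ordered playback" across topics means, here is a minimal standard-library sketch. It is not how `protobag` implements its index; it only assumes that each topic's messages are already sorted by time, and the tuples are hypothetical stand-ins for stamped entries.

```python
import heapq

# Hypothetical per-topic logs as (t_sec, topic, (x, y)) tuples.
# Each list is already sorted by time.
lara = [(t, "positions/lara", (t, t + 1)) for t in range(3)]
toofz = [(t, "positions/toofz", (t + 2, t + 3)) for t in range(3)]

# Lazily merge the sorted streams into one time-ordered stream.
# When timestamps tie, entries from the earlier argument come first.
merged = list(heapq.merge(lara, toofz, key=lambda entry: entry[0]))

for t_sec, topic, (x, y) in merged:
    print("Topic: %s Time: %s Position: %s %s" % (topic, t_sec, x, y))
```

Merging pre-sorted streams this way avoids re-sorting every message on each read, which is the general motivation for keeping a time index.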
  

Using our Dino Hunters example, we'll log (**write**) the 2D positions of a dino and a hunter during a chase scene:

In [7]:
bag = protobag.Protobag(path='example.zip')
writer = bag.create_writer()
for t in range(10):
    lara_pos = Position(x=t, y=t+1)
    writer.write_stamped_msg("positions/lara", lara_pos, t_sec=t)

    toofz_pos = Position(x=t+2, y=t+3)
    writer.write_stamped_msg("positions/toofz", toofz_pos, t_sec=t)
writer.close()

We can now **read** them using the `protobag.SelectionBuilder` helper tool:

In [12]:
bag = protobag.Protobag(
        path='example.zip',
        
        # Tell protobag to use our protoc-generated python code:
        msg_classes=(DinoHunter, Position))

print("Read just the positions of toofz:")
sel_toofz = protobag.SelectionBuilder.select_window(topics=["positions/toofz"])
for entry in bag.iter_entries(selection=sel_toofz):
    print("Time: %s Position: %s %s" % (entry.timestamp.ToDatetime(), entry.msg.x, entry.msg.y))
print()
print()
    
print("Read *all* timeseries data:")
sel_all_time_series = protobag.SelectionBuilder.select_window()
for entry in bag.iter_entries(selection=sel_all_time_series):
    print("Topic: %s Time: %s Position: %s %s" % (entry.topic, entry.timestamp.ToDatetime(), entry.msg.x, entry.msg.y))

Read just the positions of toofz:
Time: 1970-01-01 00:00:00 Position: 2.0 3.0
Time: 1970-01-01 00:00:01 Position: 3.0 4.0
Time: 1970-01-01 00:00:02 Position: 4.0 5.0
Time: 1970-01-01 00:00:03 Position: 5.0 6.0
Time: 1970-01-01 00:00:04 Position: 6.0 7.0
Time: 1970-01-01 00:00:05 Position: 7.0 8.0
Time: 1970-01-01 00:00:06 Position: 8.0 9.0
Time: 1970-01-01 00:00:07 Position: 9.0 10.0
Time: 1970-01-01 00:00:08 Position: 10.0 11.0
Time: 1970-01-01 00:00:09 Position: 11.0 12.0


Read *all* timeseries data:
Topic: positions/lara Time: 1970-01-01 00:00:00 Position: 0.0 1.0
Topic: positions/toofz Time: 1970-01-01 00:00:00 Position: 2.0 3.0
Topic: positions/lara Time: 1970-01-01 00:00:01 Position: 1.0 2.0
Topic: positions/toofz Time: 1970-01-01 00:00:01 Position: 3.0 4.0
Topic: positions/lara Time: 1970-01-01 00:00:02 Position: 2.0 3.0
Topic: positions/toofz Time: 1970-01-01 00:00:02 Position: 4.0 5.0
Topic: positions/lara Time: 1970-01-01 00:00:03 Position: 3.0 4.0
...

### Raw Data (`protobag.RawEntry`)
We can also just put raw data like text files or images into our protobag archive because it's just a zip file.  The C++ `protobag` API even includes an [`ArchiveUtil.hpp` module](fixme) TODO FIXME LINK that has helper functions for common archive operations like zipping a directory or unarchiving a tar file.  The raw write API makes `protobag` skip all indexing and type-tracking activity.
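
Since the container is an ordinary zip file, the idea can be sketched with Python's standard `zipfile` module alone. This sketch does not use `protobag`'s raw write API; the entry names and contents are made up for illustration, and the archive is built in memory so that nothing on disk is touched.

```python
import io
import zipfile

# Build a small archive in memory with two raw (non-Protobuf) entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:
    zf.writestr("raw/notes.txt", "Toofz was last seen near the volcano.")
    zf.writestr("raw/blob.bin", bytes([0, 1, 2, 3]))

# Read the raw bytes back by entry name.
with zipfile.ZipFile(buf) as zf:
    notes = zf.read("raw/notes.txt").decode("utf-8")
    blob = zf.read("raw/blob.bin")

print(notes)
print(list(blob))
```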