## Python Ingestion
This notebook demonstrates how to create a custom `DataParser` that parses CSV files into BTrDB streams.

### Overview
`pgimport` splits data ingestion into two stages. The first stage is handled by `DataParsers`, which locate the files containing data to ingest and turn that data into `StreamData` objects. A `StreamData` object contains arrays of timestamps and values, as well as metadata (collection name, tags, annotations). In the second stage, `StreamData` objects are passed to `DataIngestors`, which map each `StreamData` object to a BTrDB stream (creating the stream if it doesn't exist yet) and insert its points.
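
The hand-off between the two stages can be pictured with a minimal sketch. Everything below is a hypothetical stand-in, not `pgimport`'s actual code: `StreamDataSketch`, its field names, and `InMemoryIngestorSketch` are assumptions made for illustration, and the "database" is just a dictionary rather than BTrDB.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StreamDataSketch:
    """Hypothetical stand-in for StreamData; the field names are assumptions."""
    collection: str
    tags: Dict[str, str]
    annotations: Dict[str, str]
    times: List[int]      # timestamps (assumed here to be integer nanoseconds)
    values: List[float]


class InMemoryIngestorSketch:
    """Toy ingestor: maps each object to a stream, creating it on first use."""

    def __init__(self):
        # (collection, stream name) -> list of (time, value) points
        self.streams: Dict[Tuple[str, str], List[Tuple[int, float]]] = {}

    def ingest(self, batch):
        for sd in batch:
            key = (sd.collection, sd.tags.get("name", ""))
            # Create the stream if it does not exist yet, then append points.
            points = self.streams.setdefault(key, [])
            points.extend(zip(sd.times, sd.values))
```

In this sketch a stream is identified by its collection plus a `name` tag; the real `DataIngestor` may match streams differently.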

This example uses `MyCSVParser`, which is an implementation of the `DataParser` interface. Most ingestions will require a custom implementation of `DataParser`, because the code that finds and parses the files is specific to each data source, and those files will almost certainly have unique formats and oddities. Writing a valid `DataParser` is the responsibility of the user, whereas the provided `DataIngestor` should be suitable for most cases.
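
As a rough idea of what a custom parser involves, here is a minimal sketch. It only borrows the two method names used in this notebook (`collect_files` and `instantiate_streams`); the class `CSVParserSketch`, the `StreamDataSketch` container, and the assumed CSV layout (a header row, integer timestamps in the first column, one stream per remaining column) are all hypothetical. The real `DataParser` interface may require different signatures or return types.

```python
import csv
import glob
import os
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StreamDataSketch:
    """Hypothetical container; the real StreamData may differ."""
    collection: str
    tags: Dict[str, str]
    annotations: Dict[str, str]
    times: List[int]
    values: List[float]


class CSVParserSketch:
    """Illustrative parser: one CSV file yields one list of stream objects."""

    def __init__(self, root, collection_prefix):
        self.root = root
        self.collection_prefix = collection_prefix

    def collect_files(self):
        # Locate every CSV file directly under the root directory.
        return sorted(glob.glob(os.path.join(self.root, "*.csv")))

    def instantiate_streams(self, files):
        # Yield one batch of stream objects per file.
        for path in files:
            yield self._parse(path)

    def _parse(self, path):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            rows = [row for row in reader if row]  # skip blank lines

        # Assumed layout: column 0 holds integer timestamps.
        times = [int(row[0]) for row in rows]
        name = os.path.splitext(os.path.basename(path))[0]
        collection = f"{self.collection_prefix}/{name}"

        # Every remaining column becomes its own stream.
        return [
            StreamDataSketch(
                collection=collection,
                tags={"name": column},
                annotations={},
                times=times,
                values=[float(row[i]) for row in rows],
            )
            for i, column in enumerate(header[1:], start=1)
        ]
```

Real files usually need more care than this: mixed timestamp formats, missing values, and units embedded in column names are the kind of oddities that make a bespoke parser necessary.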

In [1]:
import os
import btrdb

from pgimport.csv_parser import MyCSVParser
from pgimport.ingest import DataIngestor

In [2]:
# instantiate MyCSVParser with the path to the stream data and a collection prefix
# NOTE: update with path to local data files
cp = MyCSVParser("../data/csv/", collection_prefix="test_ingest")

# locate files
files = cp.collect_files()
print(f"found {len(files)} files")

found 4 files


In [3]:
# Connect to BTrDB, instantiate ingestor and insert data
conn = btrdb.connect(profile=os.environ["BTRDB_PROFILE"])

ingestor = DataIngestor(conn)
for streams in cp.instantiate_streams(files):
    ingestor.ingest(streams)

