Transform Adobe Audience Manager Data Feed log files into newline-delimited JSON files for easier ingestion and lighter digestion

divisadero/aam-cdf-parser

AAM CDF Parser

License: MIT

This is a tool for parsing Adobe Audience Manager CDF files, the log files produced by Adobe's DMP. They use a particular separator notation to represent, in a single line, what is actually a hierarchical structure.
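To make the idea of "a hierarchical structure flattened into one line" concrete, here is a minimal sketch. It is not the library's parser: the delimiters (`|` between fields, `,` between list items) and the field names are assumptions chosen purely for illustration and do not describe the real CDF layout.

```javascript
// Illustration only: these delimiters and field names are hypothetical,
// not the actual CDF separators or schema.
const FIELD_SEP = '|';
const ITEM_SEP = ',';

// Turn one flat, delimited line into a nested record.
function lineToRecord(line) {
  const [eventTime, deviceId, segments = ''] = line.split(FIELD_SEP);
  return {
    eventTime,
    deviceId,
    // An empty field becomes an empty list rather than [''].
    segments: segments === '' ? [] : segments.split(ITEM_SEP),
  };
}

// Made-up sample line, serialized the way one NDJSON output line would be.
const sampleLine = ['2019-01-01 00:00:00', 'abc123', '10,20,30'].join(FIELD_SEP);
console.log(JSON.stringify(lineToRecord(sampleLine)));
```

Each input line maps to exactly one JSON object, so the line-per-record shape of the log is preserved.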

Motivation

In the process of ingesting this data into a tool like BigQuery or any data science platform, we at Divisadero needed to transform it into something accessible.

The easiest way to preserve the whole line/log structure without losing any data seemed to be another line/log structure that is easier to access.

Installation

Simply use your preferred Node package manager to add it:

yarn add aam-cdf-parser

Local tryout

A small demonstration script is provided so you can check out how it works in general terms. It can be found in cmd.js. You can invoke it with:

./cmd.js input.gz output.ndj
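Once an output file exists, a quick way to inspect it is to parse it back into objects. The sketch below assumes only that the output is newline-delimited JSON (one object per line). The helper name `parseNdjson` and the sample text are hypothetical and not part of the package.

```javascript
// Hypothetical helper: parse newline-delimited JSON text into an array of objects.
function parseNdjson(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '') // skip blank lines, e.g. the trailing one
    .map((line) => JSON.parse(line));
}

// Made-up sample standing in for the contents of output.ndj.
const sampleOutput = '{"id":1}\n{"id":2}\n';
const records = parseNdjson(sampleOutput);
console.log(records.length);
```

For large files, reading line by line (for example with Node's `readline` module over `fs.createReadStream`) avoids loading everything into memory at once.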

Create tables

We provide a small script for creating the table in BigQuery with the current schema (defined in schema.json). To create the table, either load the schema in the Web UI or call the script from the terminal like so:

./mktable.js my_dataset

This creates a partitioned table in BigQuery (partitioned by event time, not ingestion time), so you can insert the generated files directly.

Usage

Method parse

parse(in: InputStream, out: OutputStream): Stream

The library has several convenience wrappers around the main parse method, which is the primitive method and the core of the library. It chains several stream transformations and returns the last one (in case you want to keep chaining).

const {parse} = require('aam-cdf-parser');
// ...
// Assign your own streams here; the commented calls are placeholders.
let input; // = some.method.to.get.an.inputStream();
let output; // = some.method.to.get.an.outputStream();
const onFinish = () => {console.log('done')};
// parse returns the last stream in the chain, so you can listen for 'finish'.
parse(input, output).on('finish', onFinish);

Method promiseParse

promiseParse(in: InputStream, out: OutputStream): Promise<boolean>

Import it into your code with either require or import:

const {promiseParse} = require('aam-cdf-parser');
// ...
// Assign your own streams here; the commented calls are placeholders.
let input; // = some.method.to.get.an.inputStream();
let output; // = some.method.to.get.an.outputStream();
const onFinish = () => {console.log('done')};
promiseParse(input, output).then(onFinish);
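To show how a promise-style wrapper relates to a stream that emits 'finish', here is a sketch built only on Node's `stream` module. It is not the library's implementation: `toPromise` is a hypothetical helper, and the `PassThrough` stream merely stands in for whatever stream `parse` returns.

```javascript
const { PassThrough, finished } = require('stream');

// Hypothetical helper: resolve with true once the stream has finished,
// reject if it errors or closes prematurely.
function toPromise(stream) {
  return new Promise((resolve, reject) => {
    finished(stream, (err) => (err ? reject(err) : resolve(true)));
  });
}

// Stand-in for the stream returned by parse(input, output).
const last = new PassThrough();
const done = toPromise(last);
done.then((ok) => console.log('done', ok));

last.resume();                  // drain the readable side so the stream can end
last.end('{"example":true}\n'); // finish the writable side
```

Using `stream.finished` rather than a bare 'finish' listener means errors reject the promise instead of leaving it pending.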

Method callbackParse

callbackParse(in: InputStream, out: OutputStream, callback: Function)

Import it into your code with either require or import:

const {callbackParse} = require('aam-cdf-parser');
// ...
// Assign your own streams here; the commented calls are placeholders.
let input; // = some.method.to.get.an.inputStream();
let output; // = some.method.to.get.an.outputStream();
const onFinish = () => {console.log('done')};
callbackParse(input, output, onFinish);

Method local

local(in: String, out: String, callback: Function)

Import it into your code with either require or import:

const {local} = require('aam-cdf-parser');
// ...
// local takes file paths rather than streams.
const input = 'my-input-cdf-file.gz';
const output = 'my-output-file.json';
const onFinish = () => {console.log('done')};
// The callback is passed as the third argument, matching the signature above.
local(input, output, onFinish);
