LDJSON Query

Query data from a large line delimited JSON files using JavaScript efficiently using node:stream. Not to be confused with JSON-LD (JSON linked date)

Description

Line delimitered JSON is a file with many valid JSON documents inside of it which are seperated by a simple line break. These files are useful when you have a lot of JSON data that may not fit into memory which need to be filtered and sorted.

This package will let you stream over a LDJSON file, running a query function in order to filter documents you do not need, keeping large files out of RAM allowing your service to remain fast and light. It does this by using Node read streams, filtering and transforming the payload and then writing to a new file using a Write stream.

Installing

To install this package, you can download it from your favourite package manager.

npm install @danm/ld-json-query
yarn add @danm/ld-json-query
pnpm add @danm/ld-json-query

Getting Started

Import

ESM
import ldjson from '@danm/ld-json-query';
CommonJS
const ldjson = require('@danm/ld-json-query);

API

Argument	Description
src	The source of where the file you want to process is found. This can be a valid S3 URL or a file system path
opts	Options
query	Query

Options

An object to pass into the 2nd opts argument

Property	Type	Optional	Default	Description
clearOld	boolean	true	false	If true, any old output will be deleted
inflate	boolean	true	false	If the file has been compressed, set this to true to decompress during the stream
deflate	boolean	true	false	Will compress the output of the stream
verbose	boolean	true	false	Print all logs, not just errors
output	string	false		The location of where the outputed file should be stored. The output currently doesn't support S3
aws	AWS S3 Config Object	required if using S3		If retrieving the file from S3, this object is required

Query

A function to pass into the 3rd argument

(payload: any) => payload: any | null

Example

File System

You can load a LDJSON file from the file system by passing in a file system path. The file is GZIP'd so you can pass in the inflate = true property to the option object.

The query function will filter out any JSON documents which do not match the 3 conditions. Return the object that you want to be added to the new output file filtered-output.ldjson. Return false will omit the object from the output.

import ldjson from '@danm/ld-json-query';

ldjson(
  './file.ldjson.gz',
  { inflate: true, output: './filtered-output.ldjson' },
  (json) => {
    if (json.app_name === 'news' && json.event_name === 'rm.play' && json.av_broadcasting_type === 'Live') return json;
    return null;
  },
)

S3

In addition to being able to load files from the file system, you can stream a file from S3. This does not download the file first, but streams it from AWS. If you are running this package on a system with a small amout of disk space or memory, neither will be compremised because nothing is downloaded to disk. The output is however written to the file system uncompressed.

When the function starts, any old files that that match the output file name will be cleared. Additionally, extra logging will be provided due to the verbose option.

import ldjson from '@danm/ld-json-query';

ldjson(
  's3://mybucket/my-file.ldjson.gz',
  { clearOld: true, verbose: true, inflate: true, output: './filtered-output.ldjson' },
  (json) => {
    if (json.app_name === 'news' && json.event_name === 'rm.play' && json.av_broadcasting_type === 'Live') return json;
    return null;
  },
)

Next steps

Tests
Write output to S3
Command line options
Additional cloud providers

PR's welcome!

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.vscode		.vscode
src		src
.eslintrc.json		.eslintrc.json
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.vscode

.vscode

src

src

.eslintrc.json

.eslintrc.json

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

package-lock.json

package-lock.json

package.json

package.json

tsconfig.json

tsconfig.json

Repository files navigation

LDJSON Query

Description

Installing

Getting Started

Import

API

Options

Query

Example

File System

S3

Next steps

About

Languages

License

danm/ld-json-query

Folders and files

Latest commit

History

Repository files navigation

LDJSON Query

Description

Installing

Getting Started

Import

API

Options

Query

Example

File System

S3

Next steps

About

Topics

Resources

License

Stars

Watchers

Forks

Languages