dwinkler1/JSONLines.jl

JSONLines


A simple package for reading (parts of) a JSON Lines file. Its main purpose is to read files that are larger than memory. The two main functions are LineIndex and LineIterator, which return an index of the rows in the given file and an iterator over the file, respectively. The LineIndex is Tables.jl compatible and can be piped directly into, e.g., a DataFrame if every row in the result has the same schema (i.e. the same variables). See also materialize and columnwise.

The package allows memory-efficient loading of rows of a JSON Lines file. To select rows, use skip and nrows: nrows rows are indexed after skipping the first skip rows. The file is mmapped and only the required rows are loaded into RAM.

Each line of the file must contain a valid JSON object (denoted by {"String1":ELEMENT1, "String2":ELEMENT2, ...}). JSON parsing is done using the JSON3.jl package. Lines can be separated by \n or \r\n, and some whitespace characters (basically all that can be represented as a single UInt8) are allowed at the beginning of a line before the JSON object and before the newline character. Typically a file would look like this:

{"name":"Daniel","organization":"IMSM"}
{"name":"Peter","organization":"StatMath"}
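To illustrate the idea of indexing rows in a memory-mapped file, here is a minimal sketch that uses only Julia's Mmap standard library. It is not the package's implementation: the helper name row_ranges and its keyword handling are assumptions made for this example, and it only locates the byte range of each selected row without parsing any JSON.

```julia
using Mmap

# Hypothetical helper (not part of JSONLines.jl): return the byte ranges of
# up to `nrows` non-empty lines after skipping the first `skip` of them.
function row_ranges(buf::AbstractVector{UInt8}; skip::Integer = 0, nrows::Integer = typemax(Int))
    ranges = UnitRange{Int}[]
    n = length(buf)
    start = 1
    seen = 0
    while start <= n && length(ranges) < nrows
        nl = findnext(==(UInt8('\n')), buf, start)
        stop = nl === nothing ? n : nl - 1
        # Drop a trailing '\r' so that "\r\n" line endings work as well.
        if stop >= start && buf[stop] == UInt8('\r')
            stop -= 1
        end
        if stop >= start                      # ignore empty lines
            seen += 1
            seen > skip && push!(ranges, start:stop)
        end
        start = nl === nothing ? n + 1 : nl + 1
    end
    return ranges
end

# Write the two-row example file to a temporary path and memory-map it.
path = tempname()
write(path, """{"name":"Daniel","organization":"IMSM"}\n{"name":"Peter","organization":"StatMath"}\n""")
buf = Mmap.mmap(path)

# Only the bytes of the selected row are copied out of the mapping.
row2 = String(buf[only(row_ranges(buf; skip = 1, nrows = 1))])
```

Because only byte offsets are stored, the index stays small even for very large files, and each row can be handed to a JSON parser when it is actually needed.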

There is experimental support for files with a JSON array on each line, where the first line after skip contains the names of the columns.

["name", "organization"]
["Daniel", "IMSM"]
["Peter", "StatMath"]

This should work but is not tested thoroughly. Please report any use case that does not work.
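As a rough illustration of how a header line can be combined with the array rows that follow it, the sketch below builds one NamedTuple per row. It assumes the lines have already been parsed into Julia vectors (the parsing step is omitted), and the variable names are hypothetical; it does not show how the package itself handles this format.

```julia
# Rows as they might look after parsing each line (parsing itself is omitted here).
parsed = [
    ["name", "organization"],   # first line after `skip`: the column names
    ["Daniel", "IMSM"],
    ["Peter", "StatMath"],
]

skip = 0                                   # number of leading lines to ignore
header = Tuple(Symbol.(parsed[skip + 1]))  # column names as a tuple of Symbols

# Pair every remaining row with the column names.
table = [NamedTuple{header}(Tuple(row)) for row in parsed[skip + 2:end]]
```

Each element of table can then be accessed by column name, for example table[1].name.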

Getting Started

(@v1.5) pkg> add JSONLines