JSONSki_python is the Python binding port for JSONSki.
JSONSki is a streaming JSONPath processor with fast-forward functionality. During streaming, it can automatically fast-forward over JSON substructures that are irrelevant to the query evaluation, without parsing them in detail. To make the fast-forward efficient, JSONSki implements the fast-forward APIs with a highly bit-parallel solution that relies heavily on the bitwise and SIMD operations prevalent on modern CPUs.
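To give a feel for what "bit-parallel fast-forwarding" means, here is a minimal pure-Python sketch. It is not JSONSki's implementation: the helper names (`quote_bitmap`, `prefix_xor`, `skip_object`) are made up for illustration, and Python integers stand in for the machine words and SIMD registers used by the C++ code. The idea it shows is that a prefix XOR over the bitmap of unescaped quotes marks which characters sit inside strings (on CPUs this step is commonly done with a carry-less multiplication), after which braces outside strings can be located and counted without looking at anything in between.

```python
def char_bitmap(text, ch):
    """Bit i is set when text[i] == ch."""
    bits = 0
    for i, c in enumerate(text):
        if c == ch:
            bits |= 1 << i
    return bits


def quote_bitmap(text):
    """Bit i is set when text[i] is a double quote that is not backslash-escaped."""
    bits = 0
    escaped = False
    for i, c in enumerate(text):
        if escaped:
            escaped = False
        elif c == "\\":
            escaped = True
        elif c == '"':
            bits |= 1 << i
    return bits


def prefix_xor(bits, width):
    """Bit i of the result is the XOR of bits 0..i of the input."""
    shift = 1
    while shift < width:
        bits ^= bits << shift
        shift <<= 1
    return bits & ((1 << width) - 1)


def skip_object(text, start):
    """Return the index just past the object that opens at text[start]."""
    if text[start] != "{":
        raise ValueError("text[start] must be an opening brace")
    # Set from each opening quote up to (not including) its closing quote.
    in_string = prefix_xor(quote_bitmap(text), len(text))
    opens = char_bitmap(text, "{") & ~in_string
    closes = char_bitmap(text, "}") & ~in_string
    structural = (opens | closes) >> start
    depth = 0
    while structural:
        # Jump straight to the next brace outside a string.
        lowest = structural & -structural
        idx = start + lowest.bit_length() - 1
        depth += 1 if (opens >> idx) & 1 else -1
        if depth == 0:
            return idx + 1
        structural &= structural - 1  # clear the bit just handled
    raise ValueError("unbalanced object")


text = '{"a": {"b": "}{"}, "c": 1}, {"d": 2}'
print(text[:skip_object(text, 0)])
```

The braces inside the string `"}{"` are masked out, so the skip ends at the real closing brace of the first object.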
You can download the pip package from PyPI: https://pypi.org/project/JSONSki/
pip install JSONSki
import jsonski as jski
print(jski.loadSingleRecord("$.features[150].actor.login", "datasets/test.json"))
- The binding exposes the following method:
jski.loadSingleRecord(args1, args2) //args1 - String(query) and args2 - String(file_location)
- CPUs: 64-bit ALU instructions, 256-bit SIMD instruction set, and the carry-less multiplication instruction (pclmulqdq)
- Operating System: Linux, macOS (Intel chips only)
- C++ Compiler: g++ (7.4.0 or higher)
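On Linux, one quick way to check the CPU requirements is to look at the feature flags in `/proc/cpuinfo`. The sketch below assumes that the 256-bit SIMD requirement corresponds to the `avx2` flag and the carry-less multiplication requirement to the `pclmulqdq` flag; both flag names and the helper `missing_cpu_flags` are assumptions for illustration, not part of JSONSki.

```python
from pathlib import Path

# Assumed /proc/cpuinfo flag names for the requirements listed above.
REQUIRED_FLAGS = ("avx2", "pclmulqdq")


def missing_cpu_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags that do not appear in any 'flags' line."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return [flag for flag in required if flag not in flags]


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    missing = missing_cpu_flags(cpuinfo.read_text())
    print("missing CPU features:", ", ".join(missing) if missing else "none")
else:
    print("/proc/cpuinfo not found; this check only works on Linux")
```

On macOS there is no `/proc/cpuinfo`, so the script only reports that the check is unavailable.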
Before you start using the JSONSki API, ensure you have the following prerequisites:
- Python (v3.7), see: Installing Python
- C++: g++ (v7.4.0 and above), see: Installing C++
JSONPath is the basic query language of JSON data. It refers to substructures of JSON data in a similar way as XPath queries are used for XML data. For the details of JSONPath syntax, please refer to Stefan Goessner's article.
| Operator | Description |
|---|---|
| `$` | root object |
| `.` | child object |
| `[]` | child array |
| `*` | wildcard, all objects or array members |
| `[index]` | array index |
| `[start:end]` | array slice operator |
Consider a geo-referenced tweet in JSON:
{
"coordinates": [
40.74118764, -73.9998279
],
"user": {
"id": 6253282
},
"place": {
"name": "Manhattan",
"bounding_box": {
"type": "Polygon",
"pos": [
[-74.026675, 40.683935],
......
]
}
}
}

| JsonPath | Result |
|---|---|
| `$.coordinates[*]` | all coordinates |
| `$.place.name` | place name |
| `$.place.bounding_box.pos[0]` | first position of the bounding box in place |
| `$.place.bounding_box.pos[0:2]` | first two positions of the bounding box in place |
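To make the query semantics concrete, here is a small reference evaluator for the operator subset used in the table (`$`, `.name`, `[*]`, `[index]`, `[start:end]`). It is only a sketch for checking what a query should return: it works on an already-parsed Python object rather than streaming, so it says nothing about how JSONSki evaluates queries. The function name `evaluate` is hypothetical, and the second bounding-box point is a made-up placeholder because the original tweet elides the remaining positions.

```python
import re

# One step of a path: .name, [*], [index], or [start:end].
STEP = re.compile(r"\.(\w+)|\[(\*)\]|\[(\d+)\]|\[(\d+):(\d+)\]")


def evaluate(path, doc):
    """Return the list of values selected by a JSONPath from the subset above."""
    if not path.startswith("$"):
        raise ValueError("a path must start with $")
    nodes = [doc]
    pos = 1
    while pos < len(path):
        match = STEP.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at offset {pos}")
        key, star, index, start, end = match.groups()
        selected = []
        for node in nodes:
            if key is not None:
                if isinstance(node, dict) and key in node:
                    selected.append(node[key])
            elif star is not None:
                if isinstance(node, list):
                    selected.extend(node)
                elif isinstance(node, dict):
                    selected.extend(node.values())
            elif index is not None:
                if isinstance(node, list) and int(index) < len(node):
                    selected.append(node[int(index)])
            elif isinstance(node, list):
                # Slice: end is exclusive, as in Python.
                selected.extend(node[int(start):int(end)])
        nodes = selected
        pos = match.end()
    return nodes


tweet = {
    "coordinates": [40.74118764, -73.9998279],
    "user": {"id": 6253282},
    "place": {
        "name": "Manhattan",
        "bounding_box": {
            "type": "Polygon",
            "pos": [
                [-74.026675, 40.683935],
                [-74.0, 40.7],  # hypothetical point; the original elides the rest
            ],
        },
    },
}

for query in ("$.coordinates[*]", "$.place.name",
              "$.place.bounding_box.pos[0]", "$.place.bounding_box.pos[0:2]"):
    print(query, "->", evaluate(query, tweet))
```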
The JSONSki API provides two functions for querying JSON data, jski.loadSingleRecord and jski.loadRecords:
- loadSingleRecord(args1, args2) //args1 - String(query) and args2 - String(file_location): loads the whole input file as one single record (newlines are allowed in strings and other legal places).
- loadRecords(args1, args2) //args1 - String(query) and args2 - String(file_location): loads multiple records from the input file (all newlines are treated as delimiters; no newlines, except for \n and \r in JSON strings, are allowed within a record).
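The difference between the two input layouts can be illustrated with the standard library alone. The sketch below writes the same data once as a single (pretty-printed) record and once as one record per line; the helper names and file names are hypothetical, and the script does not call JSONSki itself. The first file is the kind of input described for loadSingleRecord, the second the kind described for loadRecords.

```python
import json
import tempfile
from pathlib import Path

records = [
    {"id": 1, "text": "line one\nline two"},
    {"id": 2, "text": "plain"},
]


def write_single_record(path, value):
    """Write one JSON value for the whole file; newlines between tokens are fine."""
    path.write_text(json.dumps(value, indent=2), encoding="utf-8")


def write_line_delimited(path, values):
    """Write one record per line; json.dumps escapes newlines inside strings."""
    lines = [json.dumps(value) for value in values]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def non_empty_lines(path):
    """Return the non-empty lines of a file, split on the newline character."""
    return [line for line in path.read_text(encoding="utf-8").split("\n") if line]


workdir = Path(tempfile.mkdtemp())
single = workdir / "single_record.json"
multi = workdir / "records.json"
write_single_record(single, records)
write_line_delimited(multi, records)

print(len(non_empty_lines(single)), "lines in the single-record file")
print(len(non_empty_lines(multi)), "lines (records) in the line-delimited file")
```

In the line-delimited file every line parses on its own with `json.loads`, which is what treating newlines as record delimiters requires.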
Below is an example usage of the JSONSki pip package.
#JSONSki
import jsonski as jski
import time
start_time = time.time()
print(jski.loadSingleRecord("$[*].entities.urls[*].url","./JSONSki/dataset/twitter_sample_large_record.json"))
end_time = time.time()
elapsed_time = end_time - start_time
print("Elapsed jsonski time:", elapsed_time, "seconds")
# Python's built-in JSON parser
import json as j

def parse_json_file(file_path):
    # Parse the whole file into Python objects.
    with open(file_path, 'r') as file:
        json_data = j.load(file)
    return json_data

json_file_path = './JSONSki/dataset/twitter_sample_large_record.json'
start_time = time.time()
print(parse_json_file(json_file_path))
end_time = time.time()
elapsed_time = end_time - start_time
print("Elapsed default_python_json time:", elapsed_time, "seconds")
- Note: The code snippet above compares the running time of a JSONSki query against parsing the same file with Python's built-in json module. Both timings include the time spent printing the result.
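If you want timings that are less affected by printing and one-off noise, a small helper along the following lines may be useful. It is only a sketch: `time_call` and `load_with_json` are hypothetical names, not part of JSONSki, and the best-of-N policy is one choice among many.

```python
import json
import time


def time_call(func, *args, repeats=3):
    """Run func(*args) several times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


def load_with_json(path):
    """Parse a whole JSON file with Python's built-in json module."""
    with open(path, "r", encoding="utf-8") as file:
        return json.load(file)


# Small self-contained demonstration on an in-memory document.
seconds, parsed = time_call(json.loads, '{"entities": {"urls": []}}')
print("json.loads took", seconds, "seconds")

# With JSONSki installed, the same helper could wrap the calls shown above, e.g.
#   time_call(jski.loadSingleRecord, query, path)
#   time_call(load_with_json, path)
```

Printing happens outside the timed region here, so the measurement covers only the call itself.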
[1] Lin Jiang and Zhijia Zhao. JSONSki: Streaming Semi-structured Data with Bit-Parallel Fast-Forwarding. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022.
@inproceedings{jsonski,
title={JSONSki: Streaming Semi-structured Data with Bit-Parallel Fast-Forwarding},
author={Lin Jiang and Zhijia Zhao},
booktitle={Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
year={2022}
}
