Refactor codebase to use independent module to parse incoming HTTP requests #618

Open · wants to merge 3 commits into master
Conversation

vkuznet (Contributor) commented Jan 16, 2020

This PR addresses the large memory footprint of the DBS server; see the full discussion in #599.

The code is refactored as follows:

  • I replaced a common pattern used in DBSWriterModel.py (a sketch of such a parseFileObject dispatcher is given after this list), see
-            body = request.body.read()
-            indata = cjson.decode(body)
+            indata = parseFileObject(request.body, method='cjson')
  • I provided a new parsers.py module which implements different (de)serialization methods, e.g. cjson, json, json_stream, yaml
  • the new module makes it possible to write custom serialization of input data streams, e.g. we could write a custom C module to optimize serialization of incoming data
  • I provided a test example to exercise the different formats, e.g.
# use cjson format, and run tests 3 times
Server/Python/src/dbs/utils/parsers.py --fin=blocks.json --format=cjson --times=3
# use json format, and run tests 3 times
Server/Python/src/dbs/utils/parsers.py --fin=blocks.json --format=json --times=3
# use json_stream format, and run tests 3 times
Server/Python/src/dbs/utils/parsers.py --fin=blocks.json_stream --format=json_stream --times=3
# use yaml format, and run tests 3 times
Server/Python/src/dbs/utils/parsers.py --fin=blocks.yaml --format=yaml --times=3
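
For reference, here is a minimal sketch of what such a parseFileObject dispatcher could look like; this is an illustration only, the actual parsers.py in this PR may organize the backends differently:

import json

def parseFileObject(fobj, method='json'):
    # deserialize an incoming request body (a file-like object)
    # with the chosen backend; sketch only, not the PR's actual code
    if method == 'cjson':
        import cjson                    # legacy C parser, if installed
        return cjson.decode(fobj.read())
    if method == 'json':
        return json.load(fobj)          # parse directly from the file object
    if method == 'json_stream':
        # one JSON token per line; join tokens back and decode
        # (a fully streaming decoder would consume tokens incrementally)
        return json.loads(''.join(line.strip() for line in fobj))
    if method == 'yaml':
        import yaml
        return yaml.safe_load(fobj)
    raise NotImplementedError("unsupported method: %s" % method)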

Due to the dynamic nature of Python memory allocation it is hard to evaluate the impact of a particular format on a long-running DBS server, but this PR makes it easy to switch between formats and test them. To do that, the clients interacting with the DBS server will need to send data in the proper format, e.g. json_stream, so that we can measure the memory footprint of the DBS server in that case.

The provided convert2json_stream function converts either a given JSON (dict) object or a file object containing a JSON data stream, e.g.

# example how to convert json to json_stream
from dbs.utils.parsers import convert2json_stream
import json
data={"data":1, "foo":[1,2,3]}
convert2json_stream(data)
# this will produce the following output
{
"foo"
:
[1
, 2
, 3
]
,
"data"
:
1
}
# to write this output to a file, pass an output file object
obj = open('YOUR_FILE_NAME', 'w')
convert2json_stream(data, obj)

# similarly, if you have a file object containing a JSON stream, you may pass it directly
fobj = open('YOUR_FILE.json')
convert2json_stream(fobj)
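
To see where the memory savings of the json_stream layout come from: a consumer can process one token per line without ever holding the full serialized document in memory. A minimal sketch, assuming the one-token-per-line layout shown above; iter_json_tokens is a hypothetical helper, not part of parsers.py:

def iter_json_tokens(fobj):
    # each line of a json_stream file carries one JSON token,
    # so the read buffer stays bounded by the longest token
    for line in fobj:
        token = line.strip()
        if token:
            yield token

# example: scan a large json_stream file without loading it whole
with open('blocks.json_stream') as fobj:
    ntokens = sum(1 for _ in iter_json_tokens(fobj))
print("number of tokens:", ntokens)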

# similarly, the provided convert2yaml function converts given JSON to YAML
from dbs.utils.parsers import convert2yaml
data = {"data": 1, "foo": [1, 2, 3]}
print(convert2yaml(data))
data: 1
foo:
- 1
- 2
- 3

With this module we can perform various tests on the DBS server using different input data formats.
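
As an illustration of the kind of measurement this enables, here is a minimal sketch (not part of the PR) that compares the peak memory of a one-shot json.load against a line-by-line scan of a json_stream file, using the standard tracemalloc module; the file names are placeholders:

import json
import tracemalloc

def peak_kb(func, *args):
    # run func and report the peak memory allocated during the call, in KB
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 1024.0

def load_whole(path):
    with open(path) as fobj:
        json.load(fobj)              # materializes the full document at once

def scan_stream(path):
    with open(path) as fobj:
        for line in fobj:            # one token per line, bounded buffer
            line.strip()

print("json.load peak KB:   %.1f" % peak_kb(load_whole, 'blocks.json'))
print("stream scan peak KB: %.1f" % peak_kb(scan_stream, 'blocks.json_stream'))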

yuyiguo (Member) commented Jan 16, 2020

Valentin,

I will look into this after the DBS partitioning work is done. It may take a few months.
