Description
Loading a large set of data in batches results in 100% RAM utilization. xmltodict in combination with Arango seems to be the issue: if I remove the Arango part it works, and if I remove the xmltodict part it also works. I am not sure whether this is an Arango issue, a Python driver issue, an xmltodict issue, or a problem in my own code.
import datetime
import hashlib
import json
import os

import xmltodict

import db  # local module providing the Arango connection


def load_datadb():
    sirp = db.arangodb().connectdb()
    batch = sirp.batch(return_result=False)
    mypath = "./data_xml/oct-datadb/november/"
    count = 0
    for root, dirs, filenames in os.walk(mypath):
        # Commit every five directories so the batch does not grow unbounded.
        count += 1
        if count == 5:
            count = 0
            batch.commit()
        for f in filenames:
            print(f)
            with open(os.path.join(root, f), 'rb') as file:
                xml = xmltodict.parse(file)
            # Hash the parsed document once and reuse it for both fields.
            digest = hashlib.sha1(json.dumps(xml).encode()).hexdigest()
            data = {
                "_key": digest,
                "id": digest,
                "type": "nieuw-data",
                "data": {
                    "meta": {
                        "source-time": str(datetime.datetime.utcnow()),
                        "source": f
                    },
                    "payload": xml
                }
            }
            batch.collection('source').insert(data)
            data = None  # drop the reference between iterations
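One pattern that may bound memory here is to commit after every N files and then start a fresh batch job instead of reusing one batch object for the entire walk, since a batch job buffers its queued requests in memory until it is committed. Below is a minimal sketch of that pattern, assuming a recent python-arango with the begin_batch_execution() API; the connection details (host, database name, credentials) and BATCH_SIZE are placeholders for illustration, and the inserted document is abbreviated, so the full document construction from above still applies.

import os

import xmltodict
from arango import ArangoClient

BATCH_SIZE = 100  # commit after this many files; tune to available RAM


def load_datadb(mypath="./data_xml/oct-datadb/november/"):
    # Placeholder connection details for illustration.
    client = ArangoClient(hosts="http://localhost:8529")
    sirp = client.db("sirp", username="root", password="")

    batch = sirp.begin_batch_execution(return_result=False)
    pending = 0
    for root, dirs, filenames in os.walk(mypath):
        for f in filenames:
            with open(os.path.join(root, f), "rb") as fh:
                xml = xmltodict.parse(fh)
            # Abbreviated document; build the full dict as shown above.
            batch.collection("source").insert({"payload": xml, "source": f})
            pending += 1
            if pending >= BATCH_SIZE:
                batch.commit()
                # Start a new batch so the queued requests are released.
                batch = sirp.begin_batch_execution(return_result=False)
                pending = 0
    if pending:
        batch.commit()  # flush any remainder

If the RAM usage climbs in step with the number of queued inserts and resets at each commit, that would point at the batch buffering rather than xmltodict itself.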