Extreme RAM usage #65

@ghost

Description

Loading a large set of data in batches results in 100% RAM utilization. The combination of xmltodict and Arango seems to be the issue: if I remove the Arango part it works, and if I remove the xmltodict part it also works.

I'm not sure whether this is an Arango, Python driver, xmltodict, or my own code issue. A tracemalloc sketch I'm using to narrow it down is after the snippet below.

import datetime
import hashlib
import json
import os

import xmltodict

import db  # project-local helper that wraps the ArangoDB connection


def load_datadb():
    sirp = db.arangodb().connectdb()
    batch = sirp.batch(return_result=False)

    mypath = "./data_xml/oct-datadb/november/"
    count = 0
    for root, dirs, filenames in os.walk(mypath):

        # Commit the queued inserts every 5 directories. (My original
        # version had `count = + 1`, which reset the counter to 1 on every
        # iteration, so the commit never fired.)
        count += 1
        if count == 5:
            count = 0
            batch.commit()

        for f in filenames:
            print(f)
            # Parse the whole XML file into a dict.
            with open(os.path.join(root, f), 'rb') as file:
                xml = xmltodict.parse(file)

            # Hash the serialized document once and reuse it for both fields.
            digest = hashlib.sha1(json.dumps(xml).encode()).hexdigest()
            data = {
                "_key": digest,
                "id": digest,
                "type": "nieuw-data",
                "data": {
                    "meta": {
                        "source-time": str(datetime.datetime.utcnow()),
                        "source": f
                    },
                    "payload": xml
                }
            }

            batch.collection('source').insert(data)
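
For reference, here is a minimal sketch of the workaround I'm experimenting with: commit after every few files rather than every few directories, and create a fresh batch object after each commit, on the assumption that the driver only frees its queued jobs once the old batch object is dropped. BATCH_SIZE and build_doc are names I made up for this sketch (imports are the same as above); it reuses only the batch()/commit()/insert() calls from my code.

BATCH_SIZE = 5

def build_doc(f, xml):
    # Same document shape as above, hashing the payload once.
    digest = hashlib.sha1(json.dumps(xml).encode()).hexdigest()
    return {
        "_key": digest,
        "id": digest,
        "type": "nieuw-data",
        "data": {
            "meta": {
                "source-time": str(datetime.datetime.utcnow()),
                "source": f
            },
            "payload": xml
        }
    }

def load_datadb_chunked():
    sirp = db.arangodb().connectdb()
    batch = sirp.batch(return_result=False)

    count = 0
    for root, dirs, filenames in os.walk("./data_xml/oct-datadb/november/"):
        for f in filenames:
            with open(os.path.join(root, f), 'rb') as fh:
                xml = xmltodict.parse(fh)
            batch.collection('source').insert(build_doc(f, xml))

            count += 1
            if count == BATCH_SIZE:
                count = 0
                batch.commit()
                # Assumption: a fresh batch lets the old queue be
                # garbage-collected instead of growing without bound.
                batch = sirp.batch(return_result=False)

    # Flush whatever is left in the final partial batch.
    batch.commit()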

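To narrow down whether the memory is held by xmltodict's parsed dicts, the driver's batch queue, or my own document dicts, I've been running the loader under tracemalloc (standard library), roughly like this:

import tracemalloc

tracemalloc.start()

load_datadb()

# Show the ten source lines responsible for the most allocated memory.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)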