Add dataloader lib #5

DHopkinson-DI · 2025-07-21T15:45:34Z

Same concept as the TorQ/Kx dataloader script.
Tested for kdb+ 4.1 on Linux.

Fixes #38

dataloader/dataloader.q

dataloader/dataloader.md

jonnypress · 2025-07-30T17:36:35Z

dataloader/dataloader.md

+
+
+This package is used for automated customisable dataloading and database creation and is a generalisation of http://code.kx.com/wiki/Cookbook/LoadingFromLargeFiles. 
+Load all delimeted files in a directory into memory in configurable chunk sizes then output the resulting tables to disk in kdb+ partiioned format.


partiioned - typo

I think this is underselling it a bit. It will not load everything into memory and then write it to disk. It loads the data chunk by chunk, so the aim is to minimise memory usage. Peak memory usage should be roughly the maximum of:

- the space required to load one chunk of data into memory and save it
- the memory required to sort the resultant table

So we should be able to load large volumes of on-disk data using a relatively small memory footprint.

A couple of examples would be good:

- loading very large files, but in small chunks
- loading data across partitions from a number of small files, e.g. if we had a month's worth of AAPL data in one file and a month's worth of MSFT data in another, I believe the way this is structured would handle it relatively efficiently
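As a rough illustration of the first case, the sketch below streams a large file in fixed-size chunks with `.Q.fsn` and appends each chunk to the relevant date partitions, so only one chunk is held in memory at a time. It is not the package's implementation: the file name `trades.csv`, the `hdb` path, the `trade` table, the `loadchunk` function and the four-column schema are all assumptions made for the example, and the file is assumed to have no header row.

```q
/ minimal sketch: every name, path and the schema below are hypothetical
/ assumes a headerless CSV with columns sym,time,price,size
hdb:`:hdb

loadchunk:{[chunk]
  / parse only this chunk of rows, so memory is bounded by the chunk size
  t:flip `sym`time`price`size!("SPFJ";",")0:chunk;
  / enumerate symbol columns against the sym file in the HDB root
  t:.Q.en[hdb;t];
  / append each date's rows to that date's splayed partition on disk
  {[t;d] (` sv hdb,(`$string d),`trade`) upsert select from t where d=`date$time}[t] each exec distinct `date$time from t;
 }

/ stream the file in chunks of about 100MB; .Q.fsn passes complete lines to loadchunk
.Q.fsn[loadchunk;`:trades.csv;100000000]
```

Because each chunk is split by date before being written, the same function would also cover the second case, where several smaller files each span many partitions. The final sort of each partition is omitted from this sketch.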

Added documentation highlighting the advantages of chunking

To account for the new import method, I have removed the top-level (.loader) namespacing. I also eliminated the second-level .util namespacing, as it seemed superfluous. It is unclear how setting globals within a namespace from inside a function will be affected by the new import changes, so I have changed how globals are set within the init function.

Account for removing of namespacing in q script; may need to change this again once mechanism for package importing becomes clearer

File which creates private namespace of functionality and exposes public interface

dataloader/dataloader.q

dataloader/init.q

dataloader/util.q

… code

jonathonmcmurray · 2025-10-27T09:38:33Z

dataloader/dataloader.q

-      flip loadparams[`headers]!(loadparams[`types];loadparams[`separator])0:rawdata]
+  / loads data in from delimited file, applies processing function, enumerates and writes to db
+  / NOTE: it is not trivial to check user has inputted headers correctly, assume they have
+  data:$[(`$"," vs rawdata 0)~loadparams`headers;                                               / check if first row matches headers provided


As per https://github.com/DataIntellectTech/kdbx-packages/blob/main/style.md, please place comments on the preceding line.

Unresolving this comment, as the comments are still in-line.
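For reference, the header check from the hunk above could be laid out with each comment on the line preceding the code it describes, roughly as follows. This is a sketch only: `parsechunk` is a hypothetical name, the hunk shown in this thread is truncated, and the handling of the header row (dropping it before parsing) is an assumption rather than the code under review.

```q
/ hypothetical helper illustrating comment placement; not the code under review
parsechunk:{[loadparams;rawdata]
  / check whether the first row of the chunk matches the headers provided
  hasheader:(`$"," vs first rawdata)~loadparams`headers;
  / drop the header row when present (assumed behaviour)
  rows:$[hasheader;1_rawdata;rawdata];
  / parse the remaining rows using the configured types and separator
  flip loadparams[`headers]!(loadparams`types;loadparams`separator)0:rows
 }

/ example call with a two-column chunk that starts with a header row
lp:`headers`types`separator!(`sym`price;"SF";",")
parsechunk[lp;("sym,price";"AAPL,1.5";"MSFT,2.5")]
```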

dataloader/dataloader.q

DHopkinson-DI added 2 commits July 21, 2025 15:39

add dataloader.q and dataloader.md

6b4b5ad

update dataloader.q

4d0b7f3

DHopkinson-DI requested a review from jonathonmcmurray July 21, 2025 15:45

DHopkinson-DI self-assigned this Jul 21, 2025

jonathonmcmurray reviewed Jul 23, 2025


filter logic, remove block syntax, update init

79155e7

DHopkinson-DI requested a review from jonathonmcmurray July 23, 2025 15:47

jonathonmcmurray mentioned this pull request Jul 24, 2025

Package: Dataloader #38

Closed

jonathonmcmurray linked an issue Jul 24, 2025 that may be closed by this pull request

Package: Dataloader #38

Closed

jonnypress reviewed Jul 30, 2025


jamiechandler99 self-assigned this Jul 30, 2025

jamiechandler99 added 2 commits August 18, 2025 23:30

Update dataloader PR to align with style and comments. Add testing

732f40c

Add additional test cases, update doc

7703906

jamiechandler99 requested a review from jonnypress August 19, 2025 23:12

cstirling-dataintellect added 8 commits September 25, 2025 15:06

Remove namespacing

8dec401

Account for removing of namespacing in q script; may need to change this again once mechanism for package importing becomes clearer

Create init file for module

4fa96b7

File which creates private namespace of functionality and exposes public interface

Reference private namespace for globals

92cb1ec

Add separate util script for module

6215b0f

Update dataloader.q

f11c386

Update init.q

5dfc67f

Update util.q

fd625a7

jonathonmcmurray reviewed Oct 24, 2025


overhauled the dataloader package tests and tidy up/refactor a lot of…

8219c26

… code

jonathonmcmurray reviewed Oct 27, 2025


eliotrobinson added 3 commits October 28, 2025 15:15

update initailising the package and refactor variables

5c425ae

fix comments

b822cc7

fix comments again...

034a01f

jonathonmcmurray approved these changes Oct 29, 2025


eliotrobinson merged commit 93a1f03 into main Oct 29, 2025

eliotrobinson deleted the add_dataloader_lib branch October 29, 2025 11:44



