
Csv loading and dictionary

Daniel Gregoire edited this page Feb 5, 2024 · 12 revisions

[Is this really about CSV loading? Where is the CSV loading function?]

Q0. How do you load a delimited text file?

A0. Here is the part of the \: help text that covers loading delimited text, which includes CSV:

Load delimited text file (no column names):
(types;delim)0:`f
Load delimited text file (with column names):
(types;,delim)0:`f
c 1-byte char, b 1-byte int, s 2-byte int, i 4-byte int, f 4-byte float,
d 8-byte double, " " 1-byte skip, I int, F float, C string, S string (sym), DTZMm Y?

Q1. What does the CSV loader return?

A1. The CSV loader returns a K list Z containing 2 elements:

   - h, the header, which is a vector of symbols (i.e. its K type is -4)

   - c, the columns, a list of vectors of possibly different types but all the same length

The caller can make a dictionary out of this using the operator "!":

d: h!c
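The shape of that return value can be modelled outside K. The following is a minimal Python sketch, not Kona's implementation: the function name load_delimited and the sample data are hypothetical, symbols are modelled as plain strings, and only the type characters I, F, C, S and " " (skip) from the help text are handled.

```python
# Hypothetical model of the loader described above; not Kona's implementation.
# Type characters follow the help text: I int, F float, C string, S symbol, " " skip.
# Symbols are modelled as plain Python strings.
CONVERTERS = {"I": int, "F": float, "C": str, "S": str}

def load_delimited(types, delim, text):
    """Return [h, c]: h is the list of column names, c the list of column vectors."""
    lines = text.splitlines()
    names = lines[0].split(delim)
    # Columns whose type character is " " are skipped entirely.
    keep = [i for i, t in enumerate(types) if t != " "]
    h = [names[i] for i in keep]
    c = [[] for _ in keep]
    for line in lines[1:]:
        fields = line.split(delim)
        for col, i in zip(c, keep):
            col.append(CONVERTERS[types[i]](fields[i]))
    return [h, c]

sample = ("name,elo,age\n"
          "Dent,1100,32\n"
          "Beeblebrox,1600,35\n"
          "Superman,3000,42\n"
          "Prefect,1800,23")

h, c = load_delimited("SII", ",", sample)
d = dict(zip(h, c))   # Python analogue of  d: h!c
print(h)              # ['name', 'elo', 'age']
print(d["elo"])       # [1100, 1600, 3000, 1800]
```

Every column in c has the same length (one value per data row), which is what lets h and c be paired up column by column.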

Q2. How do I create a dict, and what type is it?

A2. Creating a dict needs a vector of symbols (the column headers h) and a list of vectors (the column data c):

h: `name`elo`age    / vector of 3 column headers

c: (`Dent`Beeblebrox`Superman`Prefect; 1100 1600 3000 1800; 32 35 42 23)    / list of 3 vectors, each of size 4

d: h!c    / or write it literally, as below

d: `name`elo`age!(`Dent`Beeblebrox`Superman`Prefect; 1100 1600 3000 1800; 32 35 42 23)    / dict of h to c

Note that the dict, once created, has a different internal structure from the h and c lists it was built from; that structure is described below.

Also note that creating the dictionary does NOT copy the vectors holding each column's data.

It only increments the refcount on those vectors.

In that sense, creating a table is quite efficient: no row data is moved, whatever the number of rows.
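The no-copy behaviour has a close analogue in CPython, which is also reference counted. The sketch below is only an illustration under that assumption; it shows Python's behaviour, not Kona's, and the variable names are made up for the example.

```python
import sys

h = ["name", "elo", "age"]
c = [["Dent", "Beeblebrox", "Superman", "Prefect"],
     [1100, 1600, 3000, 1800],
     [32, 35, 42, 23]]

before = sys.getrefcount(c[1])   # references to the elo column before the dict exists
d = dict(zip(h, c))              # analogue of  d: h!c
after = sys.getrefcount(c[1])    # the dict now holds one more reference

shares_storage = d["elo"] is c[1]   # same object, so no copy was made
refcount_delta = after - before
print(shares_storage, refcount_delta)
```

The dict entry and the original column are the same object, and the only bookkeeping is the extra reference, which mirrors the refcount increment described above.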

The type for dict is 5, i.e. t=5 in its K object.

For the above example, n=3 in its K object, which tells the code that the dict has 3 columns.

The K* pointer points to a list containing 3 K objects.

Each object (call it C), representing a column, is a 2 element list.

The first element of C is always a single symbol, the column header (i.e. t=4, n=1).

The second element of C is a vector itself.

For a column of a simple scalar type it is a simple vector, say of ints (e.g. t=-1, n=4 if there are 4 rows).

Open question: can a table have a heterogeneous column, where the second element would be a vector of K pointers?

Viewed as a picture, the 3-column, 4-row dict is:
   d (t=5, n=3)
     |
     |---------> list element 1: col1 (t=0, n=2)
     |    |
     |    |--> header (t=4, n=1)     `name
     |    |--> values (t=-4, n=4)    (`Dent`Beeblebrox`Superman`Prefect)
     |
     |---------> list element 2: col2 (t=0, n=2)
     |    |
     |    |--> header (t=4, n=1)     `elo
     |    |--> values (t=-1, n=4)    (1100 1600 3000 1800)
     |
     |---------> list element 3: col3 (t=0, n=2)
     |    |
     |    |--> header (t=4, n=1)     `age
     |    |--> values (t=-1, n=4)    (32 35 42 23)
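The same layout can be written down as data. This Python sketch follows the description on this page (two-element entries, type codes as drawn); KObj and the helper functions are hypothetical names for illustration, not Kona's C structures.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class KObj:
    t: int   # type code, as used on this page (5 dict, 0 list, 4 symbol, -4/-1 vectors)
    n: int   # element count
    v: Any   # payload: child KObj items for t in (0, 5), raw values otherwise

def sym(s):               return KObj(4, 1, s)
def sym_vec(xs):          return KObj(-4, len(xs), list(xs))
def int_vec(xs):          return KObj(-1, len(xs), list(xs))
def entry(name, values):  return KObj(0, 2, [sym(name), values])
def make_dict(entries):   return KObj(5, len(entries), list(entries))

d = make_dict([
    entry("name", sym_vec(["Dent", "Beeblebrox", "Superman", "Prefect"])),
    entry("elo",  int_vec([1100, 1600, 3000, 1800])),
    entry("age",  int_vec([32, 35, 42, 23])),
])

def describe(k, depth=0):
    """Yield one '(t=..., n=...)' line per K object, indented by nesting depth."""
    yield "  " * depth + f"(t={k.t}, n={k.n})"
    if k.t in (0, 5):                 # lists and dicts hold child K objects
        for child in k.v:
            yield from describe(child, depth + 1)

lines = list(describe(d))
print("\n".join(lines))
```

Walking the structure prints one line for the dict, one per column entry, and two under each entry (header, then values), matching the picture.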