In [ ]:
#;.pykx.disableJupyter()

In [ ]:
# https://code.kx.com/pykx/3.0/examples/jupyter-integration.html#q-first-mode
import pykx as kx
kx.util.jupyter_qfirst_enable()

**Learning Outcomes**

To understand: 
* Creating and using file paths 
* Useful built-in functions: `key`, `hsym` & `sv`
* Using file handles
* Loading and saving data in kdb+/q native format
* Loading and saving data to Text, CSV & JSON format


# Introduction
It is very important to know how to manipulate files and navigate folders in kdb+. Until now we have primarily operated on tables in memory. This session focuses on how we can save data to disk. 

# Creating file paths
For saving and reading files we should be familiar with the general file path structure. A file handle is a symbol that represents the name of a directory or a file on persistent storage. 

Syntax :``` `:path/nameOfFile ```

The current directory is denoted with a single dot and, like all file handles in kdb+/q, is stored as a symbol with a leading `:`:

```
`:.                   //current directory                            

`:folder/otherFolder    // relative path      

`:c:/folder/otherFolder     // full path   
```

## The `key` command 
We can use these filepaths in conjunction with the [`key`](https://code.kx.com/q/ref/key/) command in kdb+/q to return a list of the files in a provided directory filepath. 

In [None]:
key `:.        //list the files present in the current directory

Let's try and create a file path in q using the information that we have up until now: 

In [None]:
`:data/test.csv       //creating a simple file path with no white spaces

In [None]:
`$":data/test 1.csv"  //paths containing spaces are written as strings first and then cast to symbol

If we pass `key` a file or folder that does not exist, we get back an empty general list, `()`:

In [None]:
key `$":Working with Files.ipynb"          //path returned
key `:notHere.ipynb                        //nonexistent files return ()
()~key `:notHere.ipynb               
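
A common use of this behaviour is an existence check before reading. The sketch below is one way to write it; the name `exists` is our own choice, not a built-in.

```q
// exists is a hypothetical helper: true when key finds something at the handle
exists:{[path] not ()~key path}

exists `:.               // the current directory is always there
exists `:notHere.ipynb   // nothing on disk at this path
```

Note that an existing but empty directory returns an empty symbol list rather than `()`, so it still counts as existing.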

##### Exercise
Using `key`, count the number of files in the current directory

In [None]:
count key `:.

In [None]:
//your answer here

## The `hsym` function
The [`hsym`](https://code.kx.com/q/ref/hsym/) function turns a symbol into a file handle by adding a leading colon if it's not already there. A string must be cast to a symbol first.

Suppose we look first at the files and directories available in our current working directory: 

In [None]:
key `:. 

Let's change the first symbol into a file handle using `hsym` and then further inspect that directory: 

In [None]:
hsym first key `:.         //making this a file handle 
key hsym first key `:.     //using key on this new directory to see what files exist there

For `key` to treat these names as locations on disk we need the leading colon. Without it, we are just applying `key` to an ordinary symbol.

In [None]:
key first key `:.
key `randomSymbol
key[`randomSymbol]~key first key `:.

We also don't need to worry about whether a symbol is already a file handle when using `hsym`; applying it multiple times has no adverse effects: 

In [None]:
hsym `:path
hsym hsym `:path
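
`hsym` also accepts a list of symbols, which pairs well with the string cast shown earlier for names containing spaces. A small sketch, using made-up file names:

```q
hsym `trade`quote                  // each symbol in the list gets a leading colon
hsym `$"data/test 1.csv"           // cast the string to a symbol first, then add the colon
hsym `$("a 1.csv";"b 2.csv")       // the same idea applied to a list of strings
```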

## The `sv` function

The dyadic [`sv`](https://code.kx.com/q/ref/sv/) function is heavily overloaded. When its left argument is <code>`</code> and the first item of its right argument is a file handle, it joins the symbols with `/` separators, which makes it a convenient way to build file paths.

The syntax is as follows:

``` ` sv `:[list of folders]```

Let's build on the operators we've already seen and use the `sv` operator to build the full filepaths for all files in the .ipynb_checkpoints folder: 

In [None]:
show cp:hsym[first key `:.]   //our checkpoint directory

In [None]:
cp,/: key cp                  //joining our directory to each file within it - making paired lists
first cp,/: key cp            //the format of our first list

In [None]:
` sv' cp,/: key cp            //making our full file paths for each file within that directory 

Above we use `'` (each-both) to apply `sv` to each of our directory/file pairs in turn, with <code>`</code> as the left argument every time. 


For `sv` to work correctly in this fashion, the first symbol in the list must be a filehandle (i.e. start with a colon).

In [None]:
` sv `a`b`c           //can use this for namespace indexing
` sv `:a`b`c          //use this for building file paths
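
The reverse operation is available through `vs`: with a <code>`</code> left argument and a file handle on the right, it splits the handle into its directory and its final component. A minimal sketch with an illustrative path:

```q
p:` sv `:data`2020.01.01`trade   // build `:data/2020.01.01/trade
` vs p                           // split into directory handle and final component
p~` sv ` vs p                    // joining the two parts gives back the original handle
```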

##### Exercise 

Use `sv` to create the function `pathToTable` that takes three inputs - a directory path (e.g. <code>\`:.</code>), a date (e.g. `2020.01.01`) and a tableName as a symbol (e.g. `trade`). 

This function should return a filepath like the below:  
    
    pathToTable[`:.;2020.01.01;`trade]
    `:./2020.01.01/trade


In [None]:
pathToTable:{[dir;date;tab]
            ` sv (dir;`$string date;tab)}
pathToTable[`:.;2020.01.01;`trade]

In [None]:
//your answer here

# Saving and loading kdb+/q data

Most often when working with kdb+/q, the data we will be storing and working with will be in native kdb+/q binary format. In this section we discuss how to save and load this data. 

## Saving kdb+/q data 
Saving data in kdb+/q format is pretty straightforward - we can do so using the [`set`](https://code.kx.com/q/ref/get/#set) command.

The `set` command takes two inputs: 
* The file path (as a handle) where we want to store our data
* The data itself 

For example: 

In [None]:
trade:([]date:.z.d+til 10;sym:10?`IBM`KX`JPM;price:10?100.) //we have created a trades table to save down
`:tradeTable  set  trade   //we are returned the name of the file upon success

Another way to achieve the same result is to use `.` (amend) to modify the data directly on disk: 

In [None]:
.[`:modifyDiskAtThisPoint;();:;1 2 3]   //modifying the specified file on disk to perform the following action 
                                            //in this case, the file doesn't exist so is created

In [None]:
.[`:modifyDiskAtThisPoint;();,;4]       //direct on disk append

##### Exercise 

Create a table called `covidCasesPerCountry` with two columns:
* ``country:`US`UK`CHINA`SPAIN``
* ``noOfCases:4500000 302000 84000 282000``

Save this table down as a flat file.

In [None]:
covidCasesPerCountry:([]country:`US`UK`CHINA`SPAIN;noOfCases:4500000 302000 84000 282000)
`:covidCasesPerCountry set covidCasesPerCountry //saving it as a flat file

In [None]:
//write your answer here

## Loading kdb+/q data 
Loading our saved data back into our kdb+/q process is again straightforward - we can do so using the [`get`](https://code.kx.com/q/ref/get) command.

The keyword `get` takes one input, which in this case is the file path handle where our data is located: 

In [None]:
get `:tradeTable
get `:modifyDiskAtThisPoint     //we also have our appended data from the last action

In [None]:
get `:notHere    //trying to return a file that doesn't exist throws an error
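
Inside a script we often want to avoid that error. One option, sketched here with a helper name of our own (`safeGet` is not part of q), is to wrap `get` in protected evaluation and fall back to a default value:

```q
// safeGet is a hypothetical helper: return default if the file cannot be read
safeGet:{[path;default] @[get;path;{[d;err] d}[default]]}

safeGet[`:notHere;()]        // the file is missing, so the default () comes back
safeGet[`:tradeTable;()]     // the file exists, so its contents are returned
```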

##### Exercise 

Load the `covidCasesPerCountry` table into memory and assign it to the variable `c`.

In [None]:
c:get `:covidCasesPerCountry
c 

In [None]:
//write your answer here

# Parsing other formats: Text, CSV & JSON
We don't exist in a vacuum, so it's important to know how to load other common data formats. In particular, we focus on .txt data (commonly used for logs), .csv data (a common table format) and .json files (a commonly used messaging format).

## Text Data 
Many files are of .txt format, or can be read as if they were. In particular, log files are often stored as text files as they are expected to be human readable. 

### Writing to text files
Often we want to write to a file, for example when logging from our process. 

Let's assume we want to create a new file to start writing a log to. We can create this file by opening a handle to it: 

In [None]:
myFileHandle: hopen `:myLog.txt  //creating a text log file 
myFileHandle                     //handles are stored as integers 

In [None]:
key `:myLog.txt    //we can see this in our current directory now! - it returns the path so we know it exists

In the above, we used the [`hopen`](https://code.kx.com/q/ref/hopen/) command to create a link between our current kdb+/q process and the file on disk. The handle is stored as an integer and we can use this value to send string data to our new text file: 

In [None]:
myFileHandle "Writing some text " //writing data to our new file 
myFileHandle "on the same line"   //continuing our write on the same line

In [None]:
neg[myFileHandle] "this will end with a new line "   //writing then starting a new line 
neg[myFileHandle] "Now on the next line"             //Check this in the file!

When we use the negative of the file handle, the message is sent with a newline (`\n`) appended, meaning the next message sent will start on the next line. 
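
Once we have finished writing, the handle can be released with `hclose`. The sketch below uses a separate, made-up file name (`sketch.log`) so that it does not interfere with `myLog.txt`:

```q
h:hopen `:sketch.log                         // creates the file if it does not exist
neg[h] string[.z.p]," | INFO | started";     // one timestamped line, newline appended
hclose h                                     // release the handle when done
read0 `:sketch.log                           // read back the file contents
```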

##### Exercise 

First create a handle to our log file (<code>\`:myLog.txt</code>) and call this `LOG_HANDLE`.

Next, write a function called `protectedAdd` which wraps the `+` operator in protected evaluation. In the event of an error write an error message to our log file (using the global `LOG_HANDLE`) in the following format: 
    
    <current timestamp> | ERROR | protectedAdd failed with error <error message>
    
and return a `0b` from the function. 

Verify your function works by calling ``protectedAdd[1;`abc]``

In [None]:
LOG_HANDLE: hopen `:myLog.txt
protectedAdd:{.[+;(x;y);
            {[err] errorMsg: string[.z.p]," | ERROR | protectedAdd failed with error ",err; 
               neg[LOG_HANDLE] errorMsg; 0b }]};
                   
protectedAdd[1;2]      //works fine without logging
protectedAdd[1;`abc]   //adding a long to a symbol throws a type error, so this returns 0b and logs a message

In [None]:
//your answer here 

### Loading data from text files 
We can use the in-built [`read0`](https://code.kx.com/q/ref/read0/) function to read in a text file as a list of strings.

In [None]:
read0 `:myLog.txt    //we can see the error we sent there now

We can then parse this in whatever way we choose using string manipulation techniques. 
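
For example, a line in the log format used in the exercise above can be split on its `" | "` separator. A sketch on a hard-coded sample line (the timestamp and message are made up):

```q
line:"2024.01.01D09:30:00.000000000 | ERROR | protectedAdd failed with error type"
fields:" | " vs line          // three strings: timestamp, level, message
"P"$first fields              // parse the first field as a timestamp
`$fields 1                    // the level as a symbol
```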

In [None]:
" " vs' read0 `:myLog.txt  //splitting each line where spaces occur

## CSV 

CSV is one of the most commonly used formats; many data sets are provided as csv files. 

### Saving to CSV

Sometimes, we may want to export data from kdb+/q into csv files. We can achieve this using the `save` function, specifying a *csv* file extension.

We can use our in-memory `trade` table:

In [None]:
show summary: select num:count i , avg price by sym from trade // a summary of our trade table data 
save `:summary.csv                                             //we can also use a full path here

You can now see this file in the Jupyter tree. You can open it there, or download it and open it in Excel.
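
`save` takes the file name from the variable name, so the table must be a global called `summary`. If a different file name is needed, one alternative is to prepare the text with `csv 0:` and write it with `0:`. A sketch, with `summaryCopy.csv` as an arbitrary name:

```q
t:0!select num:count i, avg price by sym from trade   // unkeyed copy of the summary
`:summaryCopy.csv 0: csv 0: t                         // prepare CSV text, then write the lines
read0 `:summaryCopy.csv                               // header row followed by the data rows
```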

##### Exercise

Save the table `covidCasesPerCountry` as a csv file.

In [None]:
save `:covidCasesPerCountry.csv

In [None]:
//write your answer here

### Loading from CSV

When reading in a csv or text file, we can use [`0:`](https://code.kx.com/q/ref/file-text/)  to parse the string input to a table format. To do so we need to specify the type of each column, along with the delimiter we want to use to separate the text data into columns.

In [None]:
meta summary //checking types in summary table

We can use this information to load in the csv file back to a kdb+/q table as follows:

In [None]:
newsummary:("SJF";enlist csv) 0: `$":summary.csv" 
newsummary

<img src="../qbies.png" width="50px" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>
<p style='color:#273a6e'><i> If we don't want to load a column from our text table we can leave an empty space instead of type indicator! </i></p>

In [None]:
("S F";enlist csv) 0: `$":summary.csv" 

The `enlist` in the above is an indication that the first row of our data is actually the table headers. If we remove this, we get our data returned as a list of lists, rather than as a table: 

In [None]:
("SJF";csv) 0: `$":summary.csv" 

You'll notice the column headings are not included in two of our lists - do you know why? 

This is because each column is parsed as the type we indicated, in this case "J" and "F". The header strings "num" and "price" cannot be read as a long or a float, so they are returned as nulls. 
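
The list-of-lists form is the one to use when a file has no header row. In that case the column names can be supplied by hand, as in this sketch, which first writes a small headerless file (`noHeader.csv` is a made-up name):

```q
`:noHeader.csv 0: ("IBM,3,50.5";"KX,4,61.25")          // two data rows, no header
flip `sym`num`price!("SJF";csv) 0: `:noHeader.csv      // name the columns ourselves
```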

##### Exercise 

Load the csv file associated with the table `covidCasesPerCountry` into memory

In [None]:
meta covidCasesPerCountry //checking the type
("SJ";enlist csv) 0: hsym `$"covidCasesPerCountry.csv" //using hsym to create the file handle

In [None]:
//write your answer here

## JSON

JSON is a commonly used message format. 

### Saving to JSON

Data can also be exported from kdb+ into json files.

Again, we achieve this using the `save` function. However, we now use a *json* file extension.

In [None]:
//This will only work if you are running kdb+ 4.0
save `$":summary.json"  //keyed tables can be saved this way to JSON 

In [None]:
summary:0!summary; 
save `$":summary.json"  //saving our unkeyed table

##### Exercise 

Create a new table called `actorsWithMostOscars` which contains these columns:
* `name:("Katherine Hepburn";"Ingrid Bergman";"Walter Brennan";"Daniel Day-Lewis"; "Jack Nicholson";"Meryl Streep")`
* `numOfOscars:4 3 3 3 3 3`

Save the table as a json file 

In [None]:
actorsWithMostOscars:([]name:("Katherine Hepburn";"Ingrid Bergman";"Walter Brennan";"Daniel Day-Lewis"; "Jack Nicholson";"Meryl Streep");numOfOscars:4 3 3 3 3 3)
actorsWithMostOscars
save `$":actorsWithMostOscars.json"

In [None]:
//write your answer here

### Loading from JSON

We may also want to import data into kdb+ from json files. We achieve this using the `load` function. 

Note that this will replace the current variable *summary* with the loaded version.

In [None]:
load `$":summary.json"  //loading also sets the variable `summary in the process
summary

Unlike csv, json has some idea of data-types. However, since everything in json is either a *number* or a *string*, the sym and num columns have changed type.

In [None]:
meta summary 

We can convert them to *symbol* and *long* respectively, using `update`.

In [None]:
update "S"$sym, "j"$num from `summary
meta summary
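
The same type changes can be seen without touching disk by using `.j.j` (serialize to JSON) and `.j.k` (parse JSON). A minimal sketch on a throwaway table:

```q
t:([]sym:`IBM`KX;num:3 4)
.j.j t                       // the table as a JSON string
meta .j.k .j.j t             // after the round trip: sym holds strings, num holds floats
```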

##### Exercise 

Load in the json file associated with the table `actorsWithMostOscars` that you created above. Update the name column to a symbol data-type.

In [None]:
load `:actorsWithMostOscars.json 
update "S"$name from `actorsWithMostOscars //updating name to symbol column
meta actorsWithMostOscars

In [None]:
//write your answer here