In [1]:
#;.pykx.disableJupyter()

In [2]:
# https://code.kx.com/pykx/3.0/examples/jupyter-integration.html#q-first-mode
import pykx as kx
kx.util.jupyter_qfirst_enable()

PyKX now running in 'jupyter_qfirst' mode. All cells by default will be run as q code. 
Include '%%py' at the beginning of each cell to run as python code. 


# String Manipulation 

**Learning Outcomes**

To understand

* How to create and print strings
* How to use strings for logging
* Common string parsing functions
* Regex string comparison
* Searching strings

# Introduction

String manipulation is very important in kdb+/q because strings are such a commonly used datatype: much of the data you will need to work with will be stored in this format. String manipulation is useful for many operations, e.g. logging messages from processes.

<img src="../qbies.png" width="50px" style="width: 50px;padding-right:5px;padding-top:15px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> This is an entire section dedicated to strings! Gentle reminder that a "string" in kdb+/q is a list of characters. It's important to remember that these are lists, as this has implications for the functions we can use with them and the types of errors we are most likely to encounter. (Hint:<code>'length</code> !) </i></p>

# Creating and printing strings

You can create a string from another datatype in q by applying the [`string`](https://code.kx.com/q/ref/string/) operator to it, for example:

In [4]:
type string 20   // creating a string from a long
string 6*5  // multiplying and creating a string from result
string .z.d // creating a string from today's date

10h
30
2025.02.16
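
One point worth seeing in code: `string` descends into lists, so applying it to a list gives one string per item rather than a single combined string. The sketch below is illustrative only, and the variable names `nums`, `sym` and `single` are assumptions rather than part of the original notebook.

```q
nums:string 1 2 3      // a list of three strings, not the single string "123"
count nums             // returns 3
sym:string `IBM        // symbols can be converted to strings too
sym~"IBM"              // returns 1b
single:string 5        // a single digit still gives a list: a one-character string
type single            // returns 10h
```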


We can also create strings manually in the same way that we created lists before: 

In [7]:
"hello world"
"string"~("s";"t";"r";"i";"n";"g") //equal to a list of characters
("hello"; "world")   //are these two lines equivalent? Why/Why not? 

hello world
1b
"hello"
"world"


They're not! The first is a single list of characters. The second is a list of lists: each word is a separate list of characters.

In [None]:
type "hello world"
type ("hello"; "world") 
type each ("hello"; "world") 

##### Exercises 

Create a string with the current time as a timestamp.

In [None]:
string .z.p

In [10]:
// Write your code here
string .z.p

2025.02.16D08:03:44.144316978


## Displaying Output

If we want to write and display our own debugging messages, we first need to know how to display output. If we don't want the result of an expression to be displayed, we can put a `;` at the end of it, similar to how we separate items in a list. This indicates that our q expression is complete.

In [11]:
"Hello" 
"Farewell";   // suppressing output
"Cruel"; 
"World"

Hello
World


To [print](https://code.kx.com/q/ref/display/) an expression to the console (or standard out) the function `0N!` is used.

In [12]:
0N!"It's about time";  // what would we see if we remove the ; ?

"It's about time"


If we remove the suppressing `;` we will see the printed message and also have the result returned, so we will see the string twice.

In [None]:
0N!"It's about time"

Another way to write to [stdout](https://en.wikipedia.org/wiki/Standard_streams#Standard_output_(stdout)) is to use `1` & `-1` - similarly for stderr we can use `2` & `-2`

In [13]:
-1"Hello World"; a:`other`stuff;

Hello World


In [14]:
-2 "Errors in Jupyter are indicated by a red band";2+2  //NB execution isn't stopped by writing to stderr

4


Errors in Jupyter are indicated by a red band


These operations are extremely useful when you wish to log the progress of a q script or function. We can also see the difference between `1` & `-1`. 

In [17]:
x:0 
while[x<3; 
    1 "Starting with parameter ",string[x];
    -1 " ... Finished";             
    x:x+1;              //incrementing x
    ]

Starting with parameter 0 ... Finished
Starting with parameter 1 ... Finished
Starting with parameter 2 ... Finished


What would happen if both print messages were `-1` ?

In [18]:
//-1 writes to standard out and appends a newline (\n), meaning the next message will start on a new line
x:0 
while[x<3; 
    -1 "Starting with parameter ",string[x];
    -1 " ... Finished";             
    x:x+1;              //incrementing x
    ]

Starting with parameter 0
 ... Finished
Starting with parameter 1
 ... Finished
Starting with parameter 2
 ... Finished
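
Putting these pieces together, a log line is usually built by joining text with stringified values before printing it. The sketch below assumes a hypothetical helper called `fmt` (it is not a built-in) that prefixes a message with a level; the message contents are made up for illustration.

```q
// fmt is a hypothetical helper, not a built-in: it builds "[LEVEL] message"
fmt:{[lvl;msg] "[",lvl,"] ",msg}

// convert the number with string before joining it to the text
m1:fmt["INFO";"rows loaded: ",string 42]        // "[INFO] rows loaded: 42"

// string on a list gives a list of strings, so join them with sv first
m2:fmt["WARN";"bad ids: "," " sv string 7 8 9]  // "[WARN] bad ids: 7 8 9"

-1 m1;   // print each message followed by a newline
-1 m2;
```

Keeping the formatting in a function that only returns a string, and printing separately with `-1`, makes the message easy to check before it is written out.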


# String Manipulation

## Cutting/Creating strings
**Vectors from scalar `vs` and Scalars from vectors `sv`**

These keywords are useful if we wish to [tokenize](https://en.wikipedia.org/wiki/Lexical_analysis) and/or build strings.

In [19]:
vs[";";"a=10;b=20;c=IBM"]    //split up the string on the “;” character 
";" vs "a=10;b=20;c=IBM"     //equivalent to the above

"a=10"
"b=20"
"c=IBM"
"a=10"
"b=20"
"c=IBM"


The result is a list of strings. The delimiter we choose does not have to be an atom, though; we could instead pass a string as the delimiter:

In [20]:
show a:vs[";*";"a=10;*b=20;*c=IBM"] //splitting up the string on the string “;*”
vs["="] each a                      //splitting the substrings by "="

,"a" "10" 
,"b" "20" 
,"c" "IBM"
"a=10"
"b=20"
"c=IBM"


The `sv` function is used to create one big string from a list of smaller strings. In this context, it is therefore the opposite of `vs`. For example:

In [23]:
sv[";";a]                          //create a combined string using the provided delimiter ";"

a=10;b=20;c=IBM


<img src="../qbies.png" width="50px" style="width: 50px;padding-right:5px;padding-top:5px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> These functions are also useful when splitting symbols, generating file paths and more! - <a href="https://code.kx.com/q/ref/vs/">Further Reading</a></i></p>
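
As a brief illustration of the symbol and file path cases mentioned above, the sketch below uses the null symbol `` ` `` as the left argument of `vs` and `sv`. The symbols and paths are made-up examples, and the variable names are assumptions for illustration.

```q
parts:` vs `IBM.OQ              // `IBM`OQ : a symbol is split on "."
joined:` sv `IBM`OQ             // `IBM.OQ : symbols are joined with "."
path:` sv `:db`2025.02.16`trade // `:db/2025.02.16/trade : a file handle first, so joined with "/"
dirfile:` vs `:db/trade/data.csv // `:db/trade`data.csv : directory and file name

// with strings, sv undoes vs when the same delimiter is used
str:"a=10;b=20;c=IBM"
str~";" sv ";" vs str           // returns 1b
```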


##### Exercise
Use `vs` to write each word of the string "Its about time!" on a separate line.

In [None]:
vs[" ";"Its about time!"]

In [24]:
// Write your code here
vs[" ";"It's about time"]

"It's"
"about"
"time"


##### Exercise 

Given the list of strings `("AAPL";"TD12kdi12";"34.21")`, combine these together to create a single pipe ("|") delimited string.

In [None]:
"|" sv ("AAPL";"TD12kdi12";"34.21")

In [28]:
// Write your code here
show x:("AAPL";"TD12kdi12";"34.21")
sv["|";x]

AAPL|TD12kdi12|34.21
"AAPL"
"TD12kdi12"
"34.21"


## String padding 

`ltrim` is a monadic function that removes **leading** or **left** whitespace from strings; similarly, `rtrim` is a monadic function that removes **trailing** or **right** whitespace from strings. Combining the two, [`trim`](https://code.kx.com/q/ref/trim/#trim) removes both leading and trailing whitespace from strings.

In [29]:
show a:ltrim "             abc   "
show b:rtrim "  abc              "
show c:trim  "         abc       "
"abc"~a
"abc"~b
"abc"~c

0b
0b
1b
"abc   "
"  abc"
"abc"


##### Exercise
Remove the space from the beginning and end of the string `" KDB is Fun ! "`

In [None]:
trim " KDB is Fun ! "

In [31]:
// write your code here 
trim " KDB is Fun ! "

KDB is Fun !


If we instead want to **add** padding rather than remove it, we can use one of the lesser-known overloads of the `$` operator, in this case acting as [Pad](https://code.kx.com/q/ref/pad/). This pads a string with spaces up to a fixed length: a positive length adds the padding on the right, and a negative length adds it on the left.

In [None]:
10$"example"
-10$"example"

What do you think will happen if we use a size smaller than our string - for example, what would `5$"example"` return? 

In [None]:
5$"example"

<img src="../qbies.png" width="50px" style="width: 50px;padding-right:5px;padding-top:5px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> Be careful when using padding that you know the length of your longest string - or you might end up losing data!</i></p>
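
To make the warning concrete, here is a minimal sketch of the three cases, followed by one way to choose a safe width from the data. The variable names (`left`, `right`, `cut`, `names`, `w`, `padded`) are assumptions for illustration only.

```q
left:10$"example"       // "example   " : positive width, spaces added on the right
right:-10$"example"     // "   example" : negative width, spaces added on the left
cut:5$"example"         // "examp"      : width shorter than the string, characters are lost

// pad every string to the length of the longest one, so nothing is truncated
names:("AAPL";"IBM";"MSFT")
w:max count each names  // 4
padded:w$/:names        // each string padded on the right to 4 characters
count each padded       // returns 4 4 4
```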

## String Casing

`lower` is a monadic function that converts strings and symbols to lower case.<br>
`upper` is a monadic function that converts strings and symbols to upper case. [Further Reading](https://code.kx.com/q/ref/lower/#upper)

In [32]:
`small~lower `SMALL
"BIG"~upper "big"

1b
1b


##### Exercise
Write `" KDB is Fun ! "` in all lower case, and in all capitals.

In [None]:
lower " KDB is Fun ! "
upper " KDB is Fun ! "

In [33]:
// write your code here 
a: " KDB is Fun ! "
lower a
upper a


 kdb is fun ! 
 KDB IS FUN ! 


<img src="../qbies.png" width="50px" style="width: 50px;padding-right:5px;padding-top:5px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> Strings in kdb+/q are case sensitive! So <code>"Qbee"</code> is not equivalent to <code>"qbee"</code>. </i></p>

# String comparison and search

Now that we know how to manipulate our strings, the next thing we need to consider is how can we make comparisons between strings, and look for particular patterns within them.


## Caution - how NOT to compare strings

Since strings are lists of characters rather than individual atoms, we can't do what we would do automatically with other datatypes and check for equality: 

In [35]:
"tin" = "t"

100b


The reason for this is that `=` compares item by item: to check equality between lists we need two lists of the same size, or an atom and a list:

In [36]:
1= 1 2 
"a"= "ab"    //textual equivalent
"tin" = "man"

10b
10b
001b


We can see that element-wise comparison is possible when the strings are the same length, but it is less convenient and not necessarily as performant as our next keyword, `like`!
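
For reference, the sketch below contrasts match (`~`), which compares two values as a whole, with `=` on strings of different lengths, which signals a `'length` error. The error is trapped here with protected evaluation so the cell still completes; the variable names are assumptions for illustration.

```q
same:"tin"~"tin"              // 1b : match compares the whole value
diff:"tin"~"tins"             // 0b : different lengths are fine for match
err:@[{"tin"=x};"tins";{x}]   // "length" : = on lists of unequal length signals 'length, trapped here
```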

<img src="../qbies.png" width="50px" style="width: 50px;padding-right:5px;padding-top:35px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> If we want to compare the performance of different code implementations, we can preface our code with the following: <code>\ts MY_CODE_HERE</code> which will give us the <a href="https://code.kx.com/q/basics/syscmds/#ts-time-and-space">time and space</a> associated with running that piece of code. We can run the test multiple times and return the totals by adding the number of repeated timings we'd like to do e.g. to run our code 20000 times we can say <code>\ts:20000 MY_CODE_HERE</code>. The time is returned in milliseconds and the space is measured in bytes.</i></p>

We can use this utility to check the performance of a number of different methods to compare strings - because kdb+/q is so performant we have to repeat this a number of times to get a time measurement that registers. 

In [41]:
\ts:50000  all "tin" = "man"    //we need to add all here, since they only match if ALL characters match
\ts "tin"~"man"           //we need to have an EXACT match
\ts:50000 "tin" like "man"  //your best friend when comparing strings is like!
"tin" like "ton"  

16 1264
0 1312
5 1024
0b


## String comparison using `like`
The keyword [`like`](https://code.kx.com/q/ref/like/) is used to: 
* Compare strings with other strings
* Compare strings with symbols
* Compare symbols with symbols *(If you're looking for an exact match, this won't be as performant as `=` for example)*.
* Perform REGEX comparisons

It supports one or more wild card characters at the front, middle or tail of the list. The like function will return either Boolean true or Boolean false.

In [42]:
y:"IBM.OQ"               //A simple string
like[y;"IBM*"]           //Does the input string begin in “IBM” - followed by anything
y like "IBM*"            //infix notation

1b
1b


In [None]:
like[y;"*.OQ"]           // checks does the input string end in “.OQ”.

Here,  `*` is a [wildcard](https://code.kx.com/q/basics/regex/).

<img src="../qbies.png" width="50px" style="width: 50px;padding-right:5px;padding-top:20px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> You can use wildcards <code>*</code> at the beginning <b>AND</b> end to match for a particular pattern anywhere in the string e.g. <code>"*BEE*"</code> would match <code>"QBEE"</code> and <code>"ITS BEE-EE-A-U-T-FUL"</code>. Trying to extend the wildcard to search for multiple patterns though, like <code>"*this*and*then*"</code> is not supported. </i></p>

There are many other REGEX expressions that can be used in kdb+/q - here are some examples: 

In [44]:
like[y;"I?M*"]                       //the “?” is a single wild card character

1b


In [43]:
like["this";"[tT]his"]               //the 1st character can either be “t” or “T”

1b


In [45]:
like["his";"[tT]his"]               

0b


In [47]:
like["Winners2020";"W*20[12][0-9]"]  //will match with winners from 2010 -> 2029 

1b


The `like` function can even handle symbols:

In [46]:
show z:`$y                         //cast the string y to be an atomic symbol `IBM.OQ 
like[z;"IBM*"]                     //like works as before 

1b
`IBM.OQ


##### Exercise 
Use the `like` function to find which of the IDs contain the string "JPM" in the list <code>\`JPM.AB\`JPM.CD\`BM.QW\`MSFT.AB</code>

In [48]:
l:`JPM.AB`JPM.CD`IBM.QW`MSFT.AB
like[l;"JPM*"]
l like "JPM*"

1100b
1100b


In [51]:
// write your code here
l: `JPM.AB`JPM.CD`BM.QW`MSFT.AB
like[l; "JPM*"]

1100b
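
The boolean list returned by `like` is often combined with `where`, either to get the positions of the matches or to filter the list itself. A minimal sketch, in which the names `ids`, `mask` and `feeds` are assumptions for illustration:

```q
ids:`JPM.AB`JPM.CD`IBM.QW`MSFT.AB
mask:ids like "JPM*"          // 1100b
where mask                    // returns 0 1 : positions of the matches
ids where mask                // returns `JPM.AB`JPM.CD : the matching ids themselves

feeds:("IBM.OQ";"MSFT.N")
feeds like "*.OQ"             // returns 10b : like also works on a list of strings
```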


## String Search (and Replace)

[String search](https://code.kx.com/q/ref/ss/) `ss[x;y]` searches a string x for the pattern y and returns the indexes at which it appears. `ss` supports some of the pattern matching capabilities of `like`.

In [52]:
s:"toronto ontario"
s ss "ont"             //pattern occurs at indexes 3 and 8 

3 8


In [53]:
s ss "[ir]o"           //mix with regex search 

2 13


In [55]:
s ss "t?r"             //match with tor and tar (? can be any character)

0 10
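
When the pattern does not occur, `ss` returns an empty list rather than signalling an error, so counting the result gives a simple presence check. The sketch below compares that check with the equivalent `like` pattern; the search strings and the name `hits` are made up for illustration.

```q
s:"toronto ontario"
hits:s ss "xyz"           // no match: an empty list
count hits                // returns 0
0<count s ss "ont"        // returns 1b : substring present, checked via ss
s like "*ont*"            // returns 1b : the same check via like
```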


##### Exercise
Find the index of all the "o"'s in the string `s:"toronto ontario"`

In [None]:
show s:"toronto ontario"
ss[s;"o"]
s ss "o"    //infix notation

In [56]:
// write your code here 
ss[s;"o"]

1 3 6 8 14


**String Search Replace**

[String search replace](https://code.kx.com/q/ref/ss/#ssr) `ssr[x;y;z]` searches a string x for a pattern y and replaces each match with z, which can be either a string or a function that is applied to the match.

In [57]:
show s:"toronto ontario"
ssr[s;"ont";"x"]                                    / replace "ont" by "x"

torxo xario
"toronto ontario"


In [58]:
ssr[s;"t?r";upper]                                  / replace matches by their uppercase

TORonto onTARio


<img src="../qbies.png" width="50px" style="width: 50px;padding-right:5px;padding-top:5px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> You cannot use * to match with ss and ssr </i></p>

##### Exercise
Replace all the "o"'s with "s"'s.

In [None]:
ssr[s;"o";"s"]

In [59]:
// write your code here 
ssr[s;"o";"s"]

tsrsnts sntaris


##### Exercise 
Similar to how we applied the function `upper` to the match, can you remove the "o"'s from the string `s` defined above, only where they are followed by an "n"? 

In [60]:
ssr[s;"on";1_]

tornto ntario


In [62]:
// write your code here 
ssr[s;"on";"n"]

tornto ntario
