C-only version of substr_esc #13

brodieG · 2018-01-16T01:40:01Z

Currently `ansi_substr2` does a lot of the work in R.


brodieG · 2018-02-10T15:51:29Z

There are some complications here because we'd like to minimize how many times we scan through each unique string, but we also don't necessarily want to allocate and store states at every selected cut point. This is probably the approach to take when we get to handling this issue:

Actually confirm that we can get a substantial performance improvement in `substr` by repeatedly taking substrings of a long string; if we can't, there isn't much benefit, since for substrings of any substantial size the bulk of the current computation is in `substr` anyway.
Compute how many ESC sequences there are between the lowest and the highest cut points
- We will store these ESC sequences with their states
- What do we do about consecutive ESC sequences? Ideally we only store one entry per run of ESC sequences, but that means we have to parse each run on the first pass.
- This all presumes that there are fewer ESC sequences than distinct cut points
- Additionally, one big issue with this is that it only works well for byte-encoded strings, as we still have to find the byte that corresponds to a particular width or character between two ESC sequences. THIS COULD BE A DEAL BREAKER.
Once we have the recorded points, we can binary search for them since we'll presumably store them sorted.
- We can even keep the most recent search paths for the starting and ending cut points, as those should only be log(N) long, and possibly reuse them for subsequent cut points to reduce the number of hops
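A minimal sketch of the lookup step, under assumptions not in the discussion: each recorded point is a hypothetical `esc_rec` holding the byte offset just past a run of ESC sequences and the state in effect from there (reduced to a single `int` for illustration). Instead of caching the full binary-search path, this simpler variant reuses the previous cut point's result as the lower bound of the next search, which is only valid when cut points are processed in ascending order. The names `esc_rec`, `find_rec`, and `states_at_cuts` are hypothetical, not part of the package. The sketch works purely on byte offsets, so it does not address the width/character concern raised above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: byte offset just past a run of ESC sequences, plus the
 * state in effect from that offset (reduced to an int for illustration). */
typedef struct {
  size_t byte_off;
  int state;
} esc_rec;

/* Binary search over recs[lo, n), sorted by byte_off, for the last record
 * with byte_off <= cut. Returns its index, or -1 if there is none in range. */
static long find_rec(const esc_rec *recs, size_t n, size_t lo, size_t cut) {
  assert(lo <= n);
  long ans = -1;
  size_t hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (recs[mid].byte_off <= cut) {
      ans = (long) mid;   /* candidate; keep looking to the right */
      lo = mid + 1;
    } else {
      hi = mid;           /* too far right; look to the left */
    }
  }
  return ans;
}

/* Look up the state at each cut point. `cuts` must be sorted ascending so
 * that the previous answer is a valid lower bound for the next search.
 * Writes -1 for cuts that precede every recorded point. */
static void states_at_cuts(const esc_rec *recs, size_t n,
                           const size_t *cuts, size_t m, int *out) {
  long prev = -1;
  for (size_t i = 0; i < m; ++i) {
    assert(i == 0 || cuts[i - 1] <= cuts[i]);
    size_t lo = prev < 0 ? 0 : (size_t) prev;
    long k = find_rec(recs, n, lo, cuts[i]);
    out[i] = k < 0 ? -1 : recs[k].state;
    prev = k;
  }
}
```

For example, with records at byte offsets 0, 5, 12, 20 (states 0, 31, 0, 32) and cut points 3, 7, 12, 25, the lookups yield states 0, 31, 0, 32. Only one entry per ESC run is stored, so memory scales with the number of ESC runs rather than with the number of cut points.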

Alternate:

Compute state at every cut point
- TBD whether we need different states for starting and ending cut points
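The alternate approach can be sketched as a single forward pass that snapshots the state each time it reaches a cut point. This is an illustration under simplifying assumptions: the hypothetical `sgr_state` tracks only the basic foreground color (SGR 0 and 30-37), only `ESC [ ... m` sequences are interpreted, and cut points are sorted byte offsets. In this sketch a cut point that sits on the first byte of an ESC sequence gets the state *before* the sequence, while one that falls inside a sequence gets the state *after* it. That is one possible answer to the starting-versus-ending question, not a settled design. `sgr_state`, `apply_sgr`, and `state_at_every_cut` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified state: only the basic foreground color.
 * 0 means default; 30-37 are the standard foreground colors. */
typedef struct {
  int fg;
} sgr_state;

/* Apply one SGR parameter; everything other than 0 and 30-37 is ignored. */
static void apply_sgr(sgr_state *st, int param) {
  if (param == 0) st->fg = 0;
  else if (param >= 30 && param <= 37) st->fg = param;
}

/* One forward pass over `s`, storing in out[k] the state in effect at byte
 * offset cuts[k]. Cut points must be sorted ascending. Only `ESC [ ... m`
 * sequences are interpreted; all other bytes are treated as plain text. */
static void state_at_every_cut(const char *s, const size_t *cuts, size_t m,
                               sgr_state *out) {
  sgr_state st = {0};
  size_t len = strlen(s), i = 0, c = 0;
  for (size_t k = 1; k < m; ++k) assert(cuts[k - 1] <= cuts[k]);

  while (c < m) {
    /* Snapshot the current state for every cut point reached so far. A cut
     * inside a skipped ESC sequence is picked up just after the sequence. */
    while (c < m && cuts[c] <= i) out[c++] = st;
    if (i >= len) break;

    if (s[i] == '\033' && i + 1 < len && s[i + 1] == '[') {
      sgr_state tmp = st;   /* only committed if the sequence ends in 'm' */
      size_t j = i + 2;
      int param = 0;
      for (; j < len && ((s[j] >= '0' && s[j] <= '9') || s[j] == ';'); ++j) {
        if (s[j] == ';') {
          apply_sgr(&tmp, param);
          param = 0;
        } else if (param < 10000) {   /* cap to avoid int overflow */
          param = param * 10 + (s[j] - '0');
        }
      }
      if (j < len && s[j] == 'm') {
        apply_sgr(&tmp, param);   /* last (or only) parameter */
        st = tmp;
        i = j + 1;                /* skip past the whole sequence */
        continue;
      }
    }
    ++i;   /* ordinary byte (or an unrecognized ESC) */
  }
  /* Cut points beyond the end of the string get the final state. */
  while (c < m) out[c++] = st;
}
```

On the string `"ab\033[31mcd\033[0mef"` with cut points 0, 2, 7, 9, 13, 20, the stored foreground values are 0, 0, 31, 31, 0, 0. Cuts 2 and 9 sit on ESC bytes and therefore see the state before each sequence. The string is scanned once, but one state is stored per cut point, which is the allocation cost the first comment hoped to avoid.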

brodieG · 2021-07-04T01:56:30Z

There does not seem to be enough juice in this to warrant the effort.

brodieG added the enhancement label Jan 16, 2018

brodieG changed the title ~~C-only version of substr_csi~~ C-only version of substr_esc Feb 10, 2018

brodieG modified the milestone: 0.20 Feb 10, 2018

brodieG closed this as completed Jul 4, 2021
