Return ses() output in more computable form? #143

Closed
hadley opened this issue Mar 29, 2020 · 5 comments

Comments

@hadley

hadley commented Mar 29, 2020

i.e., I'm doing this:

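ses <- function(x, y) {
  # Edit strings such as "1,2c3,4", "2a3", or "5d4"
  out <- diffobj::ses(x, y, max.diffs = 100)
  # Capture the x range, the operation letter, and the y range
  out <- rematch2::re_match(out, paste0(
    "(?:(?<x1>\\d+),)?(?<x2>\\d+)",
    "(?<t>[acd])",
    "(?:(?<y1>\\d+),)?(?<y2>\\d+)"
  ))[1:5]

  # A single line number means the range starts and ends on that line
  out$x1 <- ifelse(out$x1 == "", out$x2, out$x1)
  out$y1 <- ifelse(out$y1 == "", out$y2, out$y1)

  # Convert the captured strings to integers
  out$x1 <- as.integer(out$x1)
  out$x2 <- as.integer(out$x2)
  out$y1 <- as.integer(out$y1)
  out$y2 <- as.integer(out$y2)

  out
}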

but I'm sure you could compute that more efficiently at a lower level.
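For comparison, here is a dependency-free sketch of the same parsing step in base R. It assumes each edit string has the shape matched by the regex above (an `x` range, one of the letters `a`, `c`, `d`, then a `y` range, where a range is either `from,to` or a single number). The function name `parse_ses` is hypothetical and not part of diffobj.

```r
# Hypothetical helper (not a diffobj function): parse diff-style edit
# strings such as "2a3", "1,2c3,4", or "5d4" using only base R.
parse_ses <- function(ses_str) {
  # Operation letter: "a" (add), "c" (change), or "d" (delete)
  op_letter <- gsub("[^acd]", "", ses_str)
  # Split each string around the letter into its x and y ranges
  halves <- strsplit(ses_str, "[acd]")
  x <- vapply(halves, `[`, character(1), 1)
  y <- vapply(halves, `[`, character(1), 2)
  # A range is either "from,to" or a single number (from == to)
  range_start <- function(r) as.integer(sub(",.*$", "", r))
  range_end <- function(r) as.integer(sub("^.*,", "", r))
  data.frame(
    x1 = range_start(x), x2 = range_end(x), t = op_letter,
    y1 = range_start(y), y2 = range_end(y),
    stringsAsFactors = FALSE
  )
}

parse_ses(c("1d0", "2,3c2,3", "5a5,6"))
```

It yields the same five columns (`x1`, `x2`, `t`, `y1`, `y2`), but as a plain data frame rather than a tibble, and it still re-parses strings instead of reading the underlying diff data directly.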

@brodieG
Owner

brodieG commented Mar 29, 2020

Internally, diffobj::ses calls diffobj:::diff_myers, which returns the raw data (note that ?diff_myers is out of date, I'm realizing):

> xx <- diffobj:::diff_myers(letters[1:5], letters[2:6])
> str(xx)
Formal class 'MyersMbaSes' [package "diffobj"] with 6 slots
  ..@ a     : chr [1:5] "a" "b" "c" "d" ...
  ..@ b     : chr [1:5] "b" "c" "d" "e" ...
  ..@ type  : Factor w/ 3 levels "Match","Insert",..: 3 1 2
  ..@ length: int [1:3] 1 4 1
  ..@ offset: int [1:3] 1 2 5
  ..@ diffs : int 2

The strings that ses outputs are produced by the as.character method, which is also what show uses.

So in the short term you can get the data that way. In the longer term I can add a method that returns it directly.
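A minimal sketch of turning those slots into one row per element, assuming `type` and `length` are run-length encoded edit operations listed in order. That reading is consistent with the output above (one deletion, four matches, one insertion), but it is an assumption; `offset` is not used. The sketch takes plain vectors rather than the S4 object, and `expand_ses` is a hypothetical name, not a diffobj function.

```r
# Hypothetical helper: expand run-length encoded edit operations
# (e.g. the @type and @length slots) into one row per element.
expand_ses <- function(a, b, type, len) {
  op <- rep(as.character(type), len)
  # Assumption: matches and deletions consume the next element of `a`,
  # matches and insertions consume the next element of `b`
  in_a <- op %in% c("Match", "Delete")
  in_b <- op %in% c("Match", "Insert")
  id_a <- ifelse(in_a, cumsum(in_a), NA_integer_)
  id_b <- ifelse(in_b, cumsum(in_b), NA_integer_)
  # Insertions only exist in `b`; everything else can be read from `a`
  val <- ifelse(in_a, a[id_a], b[id_b])
  data.frame(op = op, val = val, id.a = id_a, id.b = id_b,
             stringsAsFactors = FALSE)
}

# Values copied from the str() output above
expand_ses(
  letters[1:5], letters[2:6],
  type = factor(c("Delete", "Match", "Insert"),
                levels = c("Match", "Insert", "Delete")),
  len = c(1L, 4L, 1L)
)
```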

@brodieG
Owner

brodieG commented Mar 29, 2020

Also, if you can give me a sense of the time frame in which this feature would be useful to you (i.e., when it would be too late to be useful), let me know. I'm way behind on all sorts of things, so absent specific timelines, things are likely to sit for a while.

@hadley
Author

hadley commented Mar 29, 2020

I’m already using the function above, so there’s no rush. The only advantage would be a small performance boost, and I doubt it’s a bottleneck for me anyway.

@brodieG brodieG added this to the 0.2.5 milestone Apr 6, 2020
@brodieG
Owner

brodieG commented May 8, 2020

Just added ses_dat, which I think gives you the data in a good format for your purposes. It's now in the development branch. This is the example:

[Image: ses_dat example]

@hadley
Author

hadley commented May 8, 2020

Thanks!
