If possible, do not use irrelevant packages in a minimal reproducible example (MRE): base::strsplit already exists, so stringr::str_split is unnecessary here. (Incidentally, the latter is just a wrapper around stringi::stri_split, so if you do need the extra functionality it is better to use that package directly.)
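A minimal sketch of the point above: base::strsplit covers what the MRE needs without any extra packages. One behavioral difference worth knowing is how empty strings are handled (to the best of my knowledge, stringr::str_split("", ":") returns "" rather than an empty vector):

```r
## base::strsplit is sufficient for the MRE; no stringr/stringi needed.
x <- c("a:b:c", "a", "")
strsplit(x, ":", fixed = TRUE)
## For the empty string, strsplit() yields character(0), whereas
## stringr::str_split("", ":") yields "" -- a subtlety to keep in mind.
```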
Split your processing pipeline into minimal steps, and profile each step separately. In the current case: dt[, c("id", feature), with = FALSE] just returns the original table; the splitting step could be done directly when you create the example table; and the performance degradation occurs in the unlist-by-group step. So only that last step is relevant.
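The advice above can be sketched roughly as follows; the table shape and column names are illustrative assumptions, not the OP's actual data:

```r
library(data.table)

## Illustrative data resembling the OP's shape (names are assumptions).
dt = data.table(id = 1:5000,
                list_col = sample(c("a", "a:b", "a:b:c"), 5000, TRUE))
feature = "list_col"

## Step 1: column subset -- effectively returns the original table,
## so timing it tells us nothing about the regression.
t_subset = system.time(sub <- dt[, c("id", feature), with = FALSE])

## Step 2: splitting -- could be folded into the example-table construction.
t_split = system.time(dt[, split_col := strsplit(get(feature), ":", fixed = TRUE)])

## Step 3: unlist-by-group -- the step where the degradation actually shows up.
t_unlist = system.time(res <- dt[, .(value = unlist(split_col)), by = id])
```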
@sandoronodi thanks for the report; I can reproduce it. It is closely related to #4646. I tried the #4655 workaround and it seems to address the performance issue. Could you check whether it solves your actual use case as well?
Also, I am not sure this is necessarily a more readable approach, but here is a faster method that gets the same result. You could probably tweak it further if this is a bottleneck, and it would let you move forward with 1.13.0.
dt = data.table(id = 1:20000,
                list_col = sample(c('', '', 'a', 'a:b', 'a:b:c'), 20000, TRUE))
feature = "list_col"
x = dt[[feature]]
id = dt[["id"]]
l = strsplit(x, ":")
lens = lengths(l)
lens[lens == 0L] = 1L ## for those without matches, we'll still have `list_col_` for each row based on OP. Therefore, we need those rows.
partial_text = paste0(feature, "_")
setDT(list(id = rep(id, lens),
           feature_names = unlist(Map(function(y) if (length(y)) paste0(partial_text, y) else partial_text, l),
                                  use.names = FALSE)))
## A tibble: 2 x 13
##   expression               min  median `itr/sec` mem_alloc
##   <bch:expr>             <bch> <bch:t>     <dbl> <bch:byt>
## 1 potential_solution      50ms  56.5ms     18.0     1.33MB
## 2 OP_extract_perf_branch 156ms 158.3ms      6.24       2MB