The decode() method for slices fails when all cpos pairs in the partition have exactly the same length. In the speech bundle above, this is the case for speech 8094 (among many others): both pairs comprise exactly ten tokens. In this case, decode() does not work as intended:
```
x <- decode(speeches@objects[[8094]])
... decoding p_attribute lemma
Error in `[.data.table`(y, , `:=`((p_attr), get_token_stream(.Object, :
  Supplied 22 items to be assigned to 11 items of column 'lemma'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
```
And I think it is this line that does the damage:
```r
y <- data.table(cpos = unlist(apply(.Object@cpos, 1, function(row) row[1]:row[2])))
```
I think the apply() part says: for every row in the matrix, return the sequence from the value in column one to the value in column two. If all resulting integer vectors have the same length, apply() apparently simplifies its result to a matrix instead of returning a list, and unlist() leaves a matrix unchanged rather than flattening it. In the next step of the decode() method, only the number of rows of that matrix is taken into account.
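The simplification behaviour can be reproduced in base R without polmineR. A minimal sketch with a made-up two-row region matrix (the positions are arbitrary and not taken from any corpus):

```r
# Each row is a region: column 1 = start position, column 2 = end position.
# Both regions below span ten positions.
regions <- matrix(c(1L, 20L, 10L, 29L), ncol = 2)
res <- apply(regions, 1, function(row) row[1]:row[2])
is.matrix(res)          # TRUE: apply() simplified the result to a matrix
dim(res)                # 10 2 (one column per region)
is.matrix(unlist(res))  # TRUE: unlist() returns a non-list unchanged

# With regions of unequal length, apply() returns a list instead,
# and unlist() flattens it as intended.
regions2 <- matrix(c(1L, 20L, 5L, 29L), ncol = 2)
res2 <- apply(regions2, 1, function(row) row[1]:row[2])
is.list(res2)           # TRUE
length(unlist(res2))    # 15
```

Passing the 10 x 2 matrix to data.table() yields a table with only ten rows, which fits the "22 items ... 11 items" mismatch in the error message above.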
I guess the easiest way to solve this issue, albeit not necessarily the most elegant or fastest one, is to check the return value of apply() before creating y.
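A minimal sketch of that check, written as a hypothetical helper unfold_regions_apply() (the name is illustrative and not part of polmineR). If apply() has simplified to a matrix, flattening it column by column keeps the regions in their original order; if every region is a single position, apply() returns a plain vector, which unlist() leaves as it is.

```r
# Hypothetical helper: unfold a two-column region matrix (start, end)
# into one integer vector of corpus positions, guarding against
# apply() simplifying its result to a matrix.
unfold_regions_apply <- function(regions) {
  res <- apply(regions, 1, function(row) row[1]:row[2])
  if (is.matrix(res)) {
    # Equal-length regions: one column per region, flattened column-major.
    as.vector(res)
  } else {
    # Unequal lengths (list) or single positions (vector).
    unlist(res)
  }
}

unfold_regions_apply(matrix(c(1L, 20L, 10L, 29L), ncol = 2))  # c(1:10, 20:29)
unfold_regions_apply(matrix(c(1L, 20L, 5L, 29L), ncol = 2))   # c(1:5, 20:29)
```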
Thanks for reporting this issue in such a comprehensible and informed manner.
Indeed, I think I had seen this issue before. But I am guilty of not having followed the DRY principle (don't repeat yourself): I used various versions of the snippet apply(.Object@cpos, 1, function(row) row[1]:row[2]) to unfold a region matrix. Not all of them, including the one you detected, were able to deal with the matrix that is returned when all regions have the same length.
The best solution I have found (robust and fast) is still a cpos() method for matrix input that I wrote a few weeks ago.
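The cpos() method itself is not reproduced in this thread. As an illustration of the same idea, here is a sketch of a vectorized alternative using base R's sequence(), which gained its from argument in R 4.0.0. The name unfold_regions() is hypothetical, and this is not necessarily how polmineR implements it. Because it never calls apply(), it returns an integer vector whether or not the regions have equal length.

```r
# Hypothetical helper (not polmineR's cpos() method): turn a two-column
# region matrix (start, end) into a single integer vector of corpus positions.
# Requires R >= 4.0.0, where sequence() accepts a `from` argument.
unfold_regions <- function(regions) {
  stopifnot(is.matrix(regions), ncol(regions) == 2L)
  starts <- as.integer(regions[, 1])
  ends <- as.integer(regions[, 2])
  stopifnot(all(ends >= starts))
  # Each region contributes (end - start + 1) consecutive positions,
  # starting at its own start position.
  sequence(ends - starts + 1L, from = starts)
}

unfold_regions(matrix(c(1L, 20L, 10L, 29L), ncol = 2))  # c(1:10, 20:29)
```

Avoiding a per-row R function call also makes this faster than the apply() pattern on large region matrices.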
A new version of polmineR, available on the dev branch, uses this method whenever a region matrix needs to be turned into individual corpus positions. I have also written a unit test as a safeguard so that we do not fall behind the current state of affairs.
The not-so-beautiful part of the solution is that the primary purpose of the cpos() method is to get a region matrix. Ultimately, the method should be renamed; get_region_matrix() would be a plausible name, I guess.
After some initial testing, I agree that this approach works nicely. I also think that get_region_matrix() is a reasonably expressive name for the original purpose of the cpos() method.
As far as I am concerned, this issue is solved and can be closed. Thank you very much for the swift fix.
This is probably a very special scenario, but because I encountered it in a real-life application, I am reporting it: I needed to decode speeches.
As for the decode method, see also issue #120.