KWIC crashes when all results are filtered out by positivelist #91

Closed
ChristophLeonhardt opened this issue Jun 19, 2019 · 4 comments

Comments

@ChristophLeonhardt
Contributor

library(polmineR)
use("GermaParl")
x <- partition("GERMAPARL", lp = 17, role = c("gov", "mp")) %>%
  kwic(query = '"[Aa]ttack.*"', positivelist = c("Messer"), cqp = TRUE) 

This leads to an outright crash of RStudio.

polmineR_0.7.11.9024

@ChristophLeonhardt
Contributor Author

In addition, in other corpora, it doesn't crash but throws an error:

x <- partition("BB", lp = 5, role = c("gov", "mp")) %>%
  kwic(query = '"[Aa]ttack.*"', positivelist = c("Messer.*", ".*[Mm]ord.*"))

... get encoding: latin1
... get cpos and strucs
... getting corpus positions
... number of hits: 12
... checking that all p-attributes are available
... getting token id for p-attribute: word
... filtering by positivelist
... number of hits droped due to positivelist: 12
no remaining hits after applying positivelist, returning NULL object
... generating contexts
Error in nrow(ctxt@cpos) :
trying to get slot "cpos" from an object of a basic class ("NULL") with no slots

@ChristophLeonhardt
Contributor Author

I think I have a possible explanation:

In the context-method, there is the following line:
if (!is.null(positivelist)) ctxt <- trim(ctxt, positivelist = positivelist, regex = regex, verbose = verbose)

All hits in ctxt may be dropped here, in which case ctxt becomes NULL. The problem is that the code following the positivelist trim does not check for this:

if (!is.null(stoplist)) ctxt <- polmineR:::trim(ctxt, stoplist = stoplist, regex = regex, verbose = verbose)

And even if ctxt is not NULL at this point, it could become NULL once the stoplist has been applied. To solve this, it might be possible to just put

if (!is.null(stoplist) && !is.null(ctxt)) ctxt <- polmineR:::trim(ctxt, stoplist = stoplist, regex = regex, verbose = verbose)

and afterwards check again whether ctxt is NULL:

if (is.null(ctxt)) return(NULL)
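To make the proposed control flow explicit, here is a minimal, self-contained base R sketch of the guard pattern. It does not use polmineR: `trim_hits()` and `context_sketch()` are hypothetical stand-ins, and a plain data.frame of hits replaces polmineR's S4 context object (whose `cpos` slot is what `nrow()` failed on). It assumes, as the log message above indicates, that a trim step returns NULL once no hit is left.

```r
# Hypothetical stand-in for a trim step (NOT polmineR's trim()):
# keep = TRUE keeps hits whose context matches any term (positivelist),
# keep = FALSE drops them (stoplist). Returns NULL if no hit remains.
trim_hits <- function(hits, terms, keep = TRUE) {
  matched <- vapply(
    hits$context,
    function(ctx) any(vapply(terms, function(term) grepl(term, ctx), logical(1))),
    logical(1)
  )
  selected <- if (keep) matched else !matched
  if (!any(selected)) return(NULL)
  hits[selected, , drop = FALSE]
}

# Hypothetical sketch of the filtering part of a context step.
context_sketch <- function(hits, positivelist = NULL, stoplist = NULL) {
  if (!is.null(positivelist)) hits <- trim_hits(hits, positivelist, keep = TRUE)
  # Guard: the positivelist may already have dropped every hit.
  if (!is.null(stoplist) && !is.null(hits)) hits <- trim_hits(hits, stoplist, keep = FALSE)
  # Check again: either filter may have left nothing, so return NULL early
  # instead of calling nrow() on a NULL object further down.
  if (is.null(hits)) return(NULL)
  list(n = nrow(hits), hits = hits)
}

# Made-up example hits, for illustration only.
hits <- data.frame(
  cpos = c(101L, 202L, 303L),
  context = c("Attacke mit Messer", "Attacke im Plenum", "Mordanschlag nach Attacke"),
  stringsAsFactors = FALSE
)

context_sketch(hits, positivelist = "Pistole")                     # NULL, no error
context_sketch(hits, positivelist = "Messer", stoplist = "Messer") # NULL, no error
context_sketch(hits, positivelist = "Messer")$n                    # 1
```

Checking for NULL before the stoplist step and once more before any downstream work covers both failure points (positivelist and stoplist) with a single early return, mirroring the `if (is.null(ctxt)) return(NULL)` line suggested here.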

@ablaette
Collaborator

Thanks a lot, particularly for developing a potential solution! The check whether ctxt is not NULL is exactly what is required to avoid the error.

To check that everything works, I used the following code, which adopts the new workflow for creating subcorpora/partitions:

library(polmineR)
use("GermaParl")

corpus("GERMAPARL") %>%
  subset(lp == "17" & role %in% c("gov", "mp")) %>%
  kwic(query = '"[Aa]ttack.*"', positivelist = "Messer", cqp = TRUE) 

corpus("GERMAPARL") %>%
  kwic(query = '"[Aa]ttack.*"', cqp = TRUE, positivelist = "Messer")

The bugfix is included in polmineR v0.7.11.9026, which I just pushed to the dev branch.

@ablaette
Collaborator

I have added a test that checks that the return value is NULL if all matches have been dropped by a positivelist. As this should give us confidence that everything works as intended, I am closing the issue.
