CQP query syntax #193
Comments
Using the '&' as a logical operator does not work with the CQP syntax. You need to follow the regular expression syntax. Something like this:
k <- kwic(faz, query = '"(.*[aA]usländer.*|.*[mM]igration.*|Deutschland|.*[iI]ntegration.*|.*[aA]bschieb.*)"')
Yet I do not think this is exactly what you had in mind. Maybe explain your objective in plain words first. A valuable reference is the CQP tutorial: http://cwb.sourceforge.net/files/CQP_Tutorial.pdf
Hey Emily, if you're still looking for a solution, try this one:
kwic(faz, query = '"(.*[aA]usländer.*|.*[mM]igration.*)" []{0,4} "(Deutschland|Bundesrepublik|BRD)" []{0,4} "(.*[iI]ntegration.*|.*[aA]bschieb.*)"', cqp = TRUE)
A query for multiple words in CQP works like
For reasons of flexibility, I added the expression
In any case, it is very helpful to check which expressions from the corpus are found with your CQP query. For a quick overview, try
count(faz, query = '"(.*[aA]usländer.*|.*[mM]igration.*)" []{0,4} "(Deutschland|Bundesrepublik|BRD)" []{0,4} "(.*[iI]ntegration.*|.*[aA]bschieb.*)"', cqp = TRUE, breakdown = TRUE)
which returns a count of all matches from the corpus.
On a general note: depending on your research, it often makes sense not to make your CQP expression too complex but to split it up into several simpler expressions, especially when you are working with greedy expressions like .* and have to deal with many hits.
Best,
Hi @mxi-hug,
Thank you for your feedback on this! I ended up splitting the query into smaller and simpler expressions, as I'm searching for entire articles that contain at least one word from each set of words. So I could have used the first option you provided and put in the expression
Thanks again!
Best,
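For readers who want to see the "at least one word from each group, anywhere in the article" logic spelled out, here is a minimal sketch in base R rather than polmineR. It is only an illustration of the splitting idea described above, not the code actually used in this thread: the `articles` vector, its contents, and the names `groups`, `hits_by_group` and `matching` are hypothetical stand-ins, and a real corpus would need its article texts extracted first.

```r
# Toy stand-in for article texts; names and contents are hypothetical.
# "\u00e4" is "ä", written as an escape so the script does not depend on file encoding.
articles <- c(
  a1 = "Die Migration nach Deutschland und die Integration der Zuwanderer",
  a2 = "Deutschland diskutiert Integration",
  a3 = "Migrationspolitik: Abschiebungen aus der Bundesrepublik",
  a4 = "Ausl\u00e4nderrecht und Migration"
)

# One regular expression per word group, mirroring the three groups in the question.
groups <- list(
  migration   = "[aA]usl\u00e4nder|[mM]igration",
  country     = "\\b(Deutschland|Bundesrepublik|BRD)\\b",
  integration = "[iI]ntegration|[aA]bschieb"
)

# For each group, flag the articles that contain at least one matching word.
# The result is a logical matrix: one row per article, one column per group.
hits_by_group <- sapply(groups, function(rx) grepl(rx, articles, perl = TRUE))
rownames(hits_by_group) <- names(articles)

# Keep only the articles that match every group (logical AND across the columns).
matching <- names(articles)[apply(hits_by_group, 1, all)]
print(matching)
```

With this toy data only `a1` and `a3` contain a word from all three groups. Unlike the single CQP query with `[]{0,4}`, this check does not care about word order or distance, which is the difference between a proximity query and an article-level filter.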
Hi everyone,
I'm just beginning to work with polmineR and trying to search for articles containing at least one word from 3 groups of words. I'm struggling to write this syntax properly and Google searches haven't helped me so far.
Here's what I want, with the query pulling articles containing at least one word from each of the 3 word groups:
kwic(faz, query=[one group of words] & [another group of words] & [third group of words])
And here's a simplified version of what I have ("faz" is the corpus):
kwic(faz, query='[word = ".*[aA]usländer.*" | word = ".*[mM]igration.*"] & [word = "Deutschland" | word = "Bundesrepublik" | word = "BRD"] & [word = ".*[iI]ntegration.*" | word = ".*[aA]bschieb.*"]', cqp=TRUE)
But I don't think this is correct, as the number of articles that come up isn't changing when I add more words to the search string. So I think it's just pulling all the articles that have at least one of the words in the whole search string.
Can anyone provide some suggestions? Thank you so much!
Best,
Emily