Regular Expressions and R #226
Comments
@AdrianLJones I was having this same issue. As @jennybc mentioned in class today, in R you have to escape the escape, so every backslash in the pattern is written twice.
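A minimal sketch of what "escape the escape" looks like for the pattern in the question, `\D|\d{3,}`. The `age` vector is made up for illustration and only stands in for the candy survey's Age column.

```r
# Hypothetical age values (an assumption, not the real survey data)
age <- c("35", "4o", "123", "27 years", "9")

# "\D" inside an R string literal is an error; "\\D" is the two characters \D,
# which is what the regex engine needs to receive
pattern <- "\\D|\\d{3,}"
writeLines(pattern)  # shows the pattern as the regex engine sees it

# Drop every non-digit and every run of three or more digits
clean_age <- gsub(pattern, "", age)
print(clean_age)
```

Inside a `mutate()` call the same doubled-backslash string would be passed to `gsub()` unchanged.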
@samhinshaw's got it!
Along the same line of thought -- i.e. the weirdness of "Regular Expressions and R" -- why does this
result in
This is going to be unsatisfying, but I've never been able to get the \w, \d family of character classes to work as part of another bracketed character class (like your example, which combines \d and ","). In those cases, you can use an explicit range or a POSIX class instead:
> grep("^[,0-9]+$", c(letters, "123", "1,234", "mints"), value = TRUE)
[1] "123" "1,234"
> grep("^[,[:digit:]]+$", c(letters, "123", "1,234", "mints"), value = TRUE)
[1] "123" "1,234"
I think that problem might be related to the fact that items in square brackets are interpreted literally.
I guess a weird hack that would allow you to use \w, \d is:
> grep("^(\\d|,)+$", c(letters, "123", "1,234", "mints"), value = TRUE)
[1] "123" "1,234"
But that's getting a bit odd, and would get messy very quickly (each additional character would need another |).
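For side-by-side comparison, here is a small sketch of the workarounds above run on the same input. The third call, with `perl = TRUE`, is an assumption of mine and is not shown in the comments above: it switches `grep()` to the PCRE engine, which does accept `\d` inside a bracketed class.

```r
x <- c(letters, "123", "1,234", "mints")

# POSIX class inside the brackets, as in the comment above
posix_hits <- grep("^[,[:digit:]]+$", x, value = TRUE)

# Alternation instead of a bracketed class, also from the comment above
alt_hits <- grep("^(\\d|,)+$", x, value = TRUE)

# Assumption, not from the comments above: PCRE allows \d inside [...]
perl_hits <- grep("^[\\d,]+$", x, value = TRUE, perl = TRUE)

print(posix_hits)
print(alt_hits)
print(perl_hits)
```

All three should select the same two elements; which one to prefer is mostly a matter of how many extra characters the class has to hold.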
I generally find that regular expressions behave much more predictably (i.e., as I would expect them to work) when using
Thanks for the help. I've encountered another problem. Thanks, Adrian
The code you posted removes the curly quotes outright (the second argument of gsub you have there is 'nothing'). To replace the quotes with a straight quote, you need to put an escaped straight quote in quotes as the second argument. See below:
> (string <- c("’curly’", "’quotes’"))
[1] "’curly’" "’quotes’"
> gsub('[’"]','', string)
[1] "curly" "quotes"
> gsub('[’"]','\'', string)
[1] "'curly'" "'quotes'"
I'm sorry, I wasn't clear. I did not want to replace the curly quotes with anything, so that was not my confusion. My problem is that after trying to remove curly quotes with new_names3 <- gsub('[’"]','', new_names2), there were still curly quotes present, and I don't know why. Interestingly, your code works just fine at removing your curly quotes, but not the quotes from those three problematic candies. I think maybe this needs help at office hours. Thanks, Adrian
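One possible cause, offered only as a guess since the three problematic candy names are not shown here: the pattern '[’"]' matches the right single curly quote and the straight double quote, but not the left single curly quote or either double curly quote. A sketch that covers all four curly quotes by code point, using made-up names:

```r
# Hypothetical names (an assumption; the real column names are not shown)
new_names2 <- c("\u2018left and right\u2019", "\u201Cdouble curly\u201D", "plain")

# Character class covering U+2018, U+2019, U+201C, U+201D and the straight "
curly <- "[\u2018\u2019\u201C\u201D\"]"
new_names3 <- gsub(curly, "", new_names2)
print(new_names3)

# List any non-ASCII characters that are still present after the cleanup
chars <- unique(unlist(strsplit(paste(new_names3, collapse = ""), "")))
non_ascii <- chars[vapply(chars, utf8ToInt, integer(1)) > 127]
print(non_ascii)
```

If `non_ascii` is not empty on the real data, passing one of its elements to `utf8ToInt()` would show which code point still needs to be added to the class.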
@csiu
Is there some particularity to the syntax for R that I don't know about?
In the Regex tester I put in
\D|\d{3,}
with the idea of selecting all non-digit characters, plus any runs of three or more digits. My plan was to replace those with nothing as a way to clean the Age variable in the candy data set, and this seemed to work in the test bed. But when I try 'candy_age_time <- candy_age_time %>% mutate(clean.Age = gsub("\D|\d{3,}", '', Age))'
I get
Error: '\D' is an unrecognized escape in character string starting ""\D"
What is going wrong?
Thanks,
Adrian