# MATH 3345 R Lesson Notebook 7

## Regular Expressions (RegEx)

This notebook accompanies Chapter 15 of _**R for Data Science (2nd Ed.)**_.

### Set Up (Section 15.1.1)

The textbook loads the **tidyverse** package. Another option is to load only the **dplyr**, **stringr**, **tidyr**, **purrr**, and **ggplot2** packages, which are the ones needed for these exercises.

**NOTE:** _Only run the ```install.packages``` steps below if the ```library``` command to load the specified package generates an error message._
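As an optional alternative to running each install and load cell separately, the sketch below checks which of the five packages are missing, installs only those, and then loads all of them. It assumes you are allowed to install packages in your environment; the names ```pkgs```, ```installed```, and ```missing_pkgs``` are just illustrative.

```r
# Packages used in this notebook (babynames is handled separately below)
pkgs <- c("dplyr", "stringr", "tidyr", "purrr", "ggplot2")

# requireNamespace() returns FALSE (instead of stopping) if a package is not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install only the packages that are missing
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Load every package; character.only = TRUE lets library() accept a string
invisible(lapply(pkgs, library, character.only = TRUE))
```

If every package is already installed, nothing is downloaded and the loop simply loads them.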

In [None]:
#Remove the comment symbol on the line below, run the line ONE time, then replace the comment symbol
#install.packages("dplyr")

In [None]:
library(dplyr)

In [None]:
#Remove the comment symbol on the line below, run the line ONE time, then replace the comment symbol
#install.packages("stringr")

In [None]:
library(stringr)


In [None]:
#Remove the comment symbol on the line below, run the line ONE time, then replace the comment symbol
#install.packages("tidyr")

In [None]:
library(tidyr)


In [None]:
#Remove the comment symbol on the line below, run the line ONE time, then replace the comment symbol
#install.packages("purrr")

In [None]:
library(purrr)

In [None]:
#Remove the comment symbol on the line below, run the line ONE time, then replace the comment symbol
#install.packages("ggplot2")

In [None]:
library(ggplot2)


### Load Library for Data
Some of these activities require the ```babynames``` data set in the **babynames** library. If the library fails to load, remember to install it first using _```install.packages```_ in the cell below.

In [None]:
#Remove the comment symbol on the line below, run the line ONE time, then replace the comment symbol
#install.packages("babynames")

In [None]:
library(babynames)

In [None]:
glimpse(babynames)

In [None]:
head(babynames)

### Explore Examples

Re-create the examples in each section listed below by typing the code given in the text. Insert a new code cell for each example.

To get the most out of these exercises, **_do NOT copy/paste the code from the online textbook._** Type it yourself so that you will become more familiar with how the commands work. 

### Creating a String (Section 15.2)

Create a new code cell to carry out _each_ example in the section. Also consider creating one or more **markdown** cells to write some notes to yourself about what each example is illustrating.

#### IMPORTANT NOTES

- The version of **stringr** currently installed on the UNG JupyterHub _does not have_ the most recent version of the ```str_view``` function. If you are working in UNG JupyterHub, _**the ```str_view``` output will be different from what is shown in the text**_. For instance, ```str_view(fruit, "berry")``` behaves differently in older versions of **stringr**. The newest version _**only**_ displays items in the vector that match the pattern, as shown in the text:

```
  [6] │ bil<berry>
  [7] │ black<berry>
 [10] │ blue<berry>
 [11] │ boysen<berry>
 [19] │ cloud<berry>
 [21] │ cran<berry>
 ... and 8 more
```
However, the older version on JupyterHub displays _ALL_ items in the vector, _**and highlights those that match the pattern**_.

1. apple
2. apricot
3. avocado
4. banana
5. bell pepper
6. bil```berry```
7. black```berry```
8. blackcurrant
```
... and 72 more
```
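Because the two versions of ```str_view``` display results differently, a version-independent way to see only the matching elements is ```str_subset```, which returns them as an ordinary character vector. According to the **stringr** documentation, both the older and newer ```str_view``` also accept a ```match``` argument, and ```match = TRUE``` limits the display to matching elements; the exact appearance on JupyterHub may still differ from the text. The name ```berries``` is just illustrative.

```r
library(stringr)

# str_subset() keeps only the elements of fruit that contain "berry";
# the result is a plain character vector, so it prints the same in any version
berries <- str_subset(fruit, "berry")
berries

# str_detect() gives one TRUE/FALSE per element; summing counts the matches
sum(str_detect(fruit, "berry"))

# match = TRUE asks str_view() to display only the matching elements
str_view(fruit, "berry", match = TRUE)
```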


### Key Functions (Section 15.3)

Create a new code cell to carry out _each_ example in the section. Also consider creating one or more **markdown** cells to write some notes to yourself about what each example is illustrating.

#### IMPORTANT NOTES

- The version of **tidyr** currently installed on the UNG JupyterHub _does not have_ the ```separate_wider_regex``` function described in this section. Below is an alternative that uses the older **tidyr** function ```separate``` to achieve the same results for the example in Section 15.3.4.

#### Replace the following code
```
df %>% 
  separate_wider_regex(
    str,
    patterns = c(
      "<", 
      name = "[A-Za-z]+", 
      ">-", 
      gender = ".",
      "_",
      age = "[0-9]+"
    )
  )
```

#### with the code below:

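```
df %>%
  separate(
    str,
    # NA drops the empty pieces before "<" and between ">" and "-"
    into = c(NA, "name", NA, "gender", "age"),
    # split at every -, _, >, or < character
    sep = "([-_><])"
  )
```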
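To check that the ```separate``` replacement gives the same columns as ```separate_wider_regex```, the sketch below applies it to a few strings in the same ```<name>-gender_age``` format as the text's example (a small stand-in for the text's ```df```, not the full data). It also shows a second option that stays closer to the original patterns: ```str_match``` from **stringr**, with one capture group for each named piece of the ```separate_wider_regex``` call. Both options return character columns; wrap ```age``` in ```as.numeric``` if you need numbers. The names ```split_df```, ```m```, and ```match_df``` are illustrative.

```r
library(dplyr)
library(tidyr)
library(stringr)

# A few strings in the same "<name>-gender_age" format as Section 15.3.4
df <- data.frame(
  str = c("<Sheryl>-F_34", "<Kisha>-F_45", "<Brandon>-N_33"),
  stringsAsFactors = FALSE
)

# Option 1: split on any of the characters - _ > <
# "<Sheryl>-F_34" splits into "", "Sheryl", "", "F", "34",
# and each NA in `into` drops the corresponding empty piece
split_df <- df %>%
  separate(
    str,
    into = c(NA, "name", NA, "gender", "age"),
    sep = "[-_><]"
  )
split_df

# Option 2: column 1 of str_match() is the full match; columns 2-4 are the groups
m <- str_match(df$str, "<([A-Za-z]+)>-(.)_([0-9]+)")
match_df <- data.frame(
  name = m[, 2], gender = m[, 3], age = m[, 4],
  stringsAsFactors = FALSE
)
match_df
```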


### Pattern Details (Section 15.4)

Create a new code cell to carry out _each_ example in the section. Also consider creating one or more **markdown** cells to write some notes to yourself about what each example is illustrating.

#### IMPORTANT NOTES
- The version of **stringr** currently installed on the UNG JupyterHub _does not have_ the most updated version of the ```str_view``` function. Examples in this section are affected as follows:
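   - Whenever this function is used with only one argument (such as ```str_view(x)```), replace it with ```str_view(x, "\\n")``` if you are working on the UNG JupyterHub. The older version requires a pattern argument; ```"\\n"``` matches a newline character, so every string is still displayed.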
   - When ```str_view``` is used to match multiple patterns in a string, the older version will only find the first pattern unless the function is given a _vector_ with the same number of strings as there are patterns. For instance, in the example shown in Section 15.4.2:
     ```
     str_view("abc", c("$", "^", "\\b"))
     ```
     will only match the first pattern; the code can be modified as follows to match all 3 patterns:

     ```
     str_view(c("abc", "abc", "abc"), c("$", "^", "\\b"))
     ```

   - When there are multiple instances of a pattern inside a string, the older version of ```str_view``` will only find the first occurrence (e.g., the ```str_view("a-b-c", "[a-c]")``` example in Section 15.4.3).
   - If working on UNG JupyterHub, remember in Section 15.4.6 to replace
     ```
     sentences %>% 
       str_replace("(\\w+) (\\w+) (\\w+)", "\\1 \\3 \\2") %>% 
       str_view()
     ```
     with
     ```
     sentences %>% 
       str_replace("(\\w+) (\\w+) (\\w+)", "\\1 \\3 \\2") %>% 
       str_view("\\n")
     ```
     to get the desired behavior.
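When the older ```str_view``` hides some matches, other **stringr** functions can confirm what a pattern finds, and they behave the same way in old and new versions. The sketch below uses ```str_extract_all``` to list every match of ```"[a-c]"``` in ```"a-b-c"```, ```str_detect``` to test the three anchor patterns from Section 15.4.2 against one string (the single string is recycled to pair with each pattern), and ```str_replace``` with back references on a short sample string. Older versions of **stringr** also include ```str_view_all```, which highlights every match; it is deprecated in newer versions. The names ```all_matches``` and ```swapped``` are illustrative.

```r
library(stringr)

# Every match, not just the first: returns a list with one element per input string
all_matches <- str_extract_all("a-b-c", "[a-c]")
all_matches[[1]]

# The single string "abc" is recycled against each of the three patterns
str_detect("abc", c("$", "^", "\\b"))

# Back references: \\1, \\2, \\3 refer to the three captured words,
# so the second and third words swap places
swapped <- str_replace("The birch canoe slid", "(\\w+) (\\w+) (\\w+)", "\\1 \\3 \\2")
swapped
```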




### Pattern Control (Section 15.5)

Create a new code cell to carry out _each_ example in the section. Also consider creating one or more **markdown** cells to write some notes to yourself about what each example is illustrating.


### Practice (Section 15.6)

Create a new code cell to carry out _each_ example in the section. Also consider creating one or more **markdown** cells to write some notes to yourself about what each example is illustrating.

#### IMPORTANT NOTES

Remember that if you are working in UNG JupyterHub, you will need to modify the second example in Section 15.6.2 as follows:

#### Replace
```
str_view(words[!str_detect(words, "[aeiou]")])
```
#### with
```
str_view(words[!str_detect(words, "[aeiou]")],"\\n")
```

_There are a few other examples where you will need to add the ```"\\n"``` argument to ```str_view``` if working in UNG JupyterHub. Be on the lookout for these!_
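Another way to sidestep the version difference in Section 15.6.2 is to let ```str_view``` do the filtering through its ```match``` argument: according to the **stringr** documentation, ```match = FALSE``` displays only the elements that do _not_ match the pattern, in both the older and newer versions. The sketch also prints the filtered vector directly, which looks the same in any version. The name ```no_vowels``` is illustrative.

```r
library(stringr)

# Words that contain no lowercase vowels, as a plain character vector
no_vowels <- words[!str_detect(words, "[aeiou]")]
no_vowels

# match = FALSE shows only the elements of words that do NOT match "[aeiou]"
str_view(words, "[aeiou]", match = FALSE)
```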

## OPTIONAL: Section 15.7
### Regular Expressions in Other Places 

_**This section is optional and you may omit it.**_

