Step 2: Parse First Name - Final Project #67
Comments
Are we using split strings in this, or am I way off?
Yep, you can build from what you did in Lab 3, finding the first or last words in titles.
Lab 3 is the one I struggled with the most. This is what I have so far, although it is just breaking up the strings. How do I pull the 2nd word from each string when there are multiple strings? I might need a little extra help on this step of the assignment.
get_first_name <- function( x ) get_first_name()
What is the right delimiter to use for the split? Is it a space in this context?
Describe your pseudocode.
Some code to get you started:
> x <- head( d$Full.Name, 5 )
> x
[1] "ABBASI, Mohammad" "ARQUIZA, Jose Maria Reynaldo Apollo"
[3] "Aaberg, Kelsea" "Abadjivor, Enyah"
[5] "Abayesu, Precious"
>
> # DROP LAST NAMES:
> x.list <- strsplit( x, ", " )
> x.list
[[1]]
[1] "ABBASI" "Mohammad"
[[2]]
[1] "ARQUIZA" "Jose Maria Reynaldo Apollo"
[[3]]
[1] "Aaberg" "Kelsea"
[[4]]
[1] "Abadjivor" "Enyah"
[[5]]
[1] "Abayesu" "Precious"
>
> # get second element in each vector
> x.list[[1]][2]
[1] "Mohammad"
> x.list[[2]][2]
[1] "Jose Maria Reynaldo Apollo"
> x.list[[3]][2]
[1] "Kelsea"
>
> # scale this with a loop?
>
> x.second <- NULL
> for( i in 1:length(x) )
+ {
+ x.second[i] <- x.list[[i]][2]
+ }
> x.second
[1] "Mohammad" "Jose Maria Reynaldo Apollo"
[3] "Kelsea" "Enyah"
[5] "Precious"
>
> # ALTERNATIVELY use a lapply (list apply) function:
> #
> # GET SECOND VALUE IN EACH VECTOR
> # function(x){ x[2] }
>
> x2 <- lapply( x.list, function(x){ x[2] } )
> x2
[[1]]
[1] "Mohammad"
[[2]]
[1] "Jose Maria Reynaldo Apollo"
[[3]]
[1] "Kelsea"
[[4]]
[1] "Enyah"
[[5]]
[1] "Precious"
> x3 <- unlist( x2 )
> x3
[1] "Mohammad" "Jose Maria Reynaldo Apollo"
[3] "Kelsea" "Enyah"
[5] "Precious"
Now you have simplified the problem. The next step is to get the first name in each string:
"Jose Maria Reynaldo Apollo" --> "Jose"
Split it apart again, this time using spaces, and extract the first value in each vector.
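The second pass described above might look like the following minimal sketch. It assumes `x3` holds the first-plus-middle names produced by the earlier steps; the names `name.parts` and `first.name` are illustrative and not from the original thread.

```r
# first-plus-middle names, as produced by the earlier steps
x3 <- c( "Mohammad", "Jose Maria Reynaldo Apollo", "Kelsea", "Enyah", "Precious" )

# split each string again, this time on spaces
name.parts <- strsplit( x3, " " )

# keep the FIRST element of each vector, which drops any middle names
# (name.parts and first.name are hypothetical names for illustration)
first.name <- sapply( name.parts, function(v){ v[1] } )
first.name
```

Using `sapply` here instead of `lapply` followed by `unlist` is just a convenience: it returns a character vector directly.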
Same, this step is really rough for me. Pretty humbling...
Here is some code to get you started: For the second part, you basically repeat the same steps, but use a space as the split value, then grab the first element in each vector instead of the second. This will drop middle names for cases like Jose:
If writing code that doesn't work makes you humble, then at this point I might be a saint :-)
Here is some test data with both 2020 and 2019 formats (some have a space after the comma, some don't):
x <-
c("ABBASI, Mohammad", "ARQUIZA, Jose Maria Reynaldo Apollo",
"Aaberg,Kelsea", "Abadjivor, Enyah", "Abayesu,Precious", "Abbas, James",
"Abbaszadegan, Morteza", "Abbe, Scott", "Abbl, Norma", "Abbott, Joshua",
"Abbott, Joshua", "Abdollahi,Amir", "Abdou, Olgeanna", "Abdurhman, Abdurazak",
"Abel, John", "Abele, Kelsey", "Aberle,James", "Abhyankar, Aditya",
"Abi Karam, Karam", "AbiNader,Millan", "Aboalam, Safaa", "Abraha, Naomi",
"Abramchuk, Mykola", "Abrams, Cristen", "Abrams,Kristen")
Hi,
Also, when running the functions for Step 2, this error appears:
![image](https://user-images.githubusercontent.com/76553425/135770907-d8500782-c47a-473c-aea1-e1cec60bae61.png)
Could you please explain it?
Thanks