Step 2: Parse First Name - Final Project #67

Open
Dimetry-Adel opened this issue Oct 3, 2021 · 11 comments
Comments

@Dimetry-Adel

Hi,

Also, when I run the code for step 2:

top.25 <- head( d$Full.Name,25 )
first.25 <- get_first_name( name=top.25 )

data.frame( top.25, first.25 ) %>% pander()

This error appears:
[image: screenshot of the error]

Could you please explain it?
Thanks

@lecy
Collaborator

lecy commented Oct 3, 2021

You need to write the name parsing function called get_first_name() that will return only the first name.

The code is showing you what it should look like when working properly (full name vs first name):

[image: expected output, full name next to first name]

@voznyuky

voznyuky commented Oct 5, 2021

Are we using split strings in this or am I way off?

@lecy
Collaborator

lecy commented Oct 5, 2021

Yep, you can build from what you did in lab 3 finding the first or last words in titles.

@voznyuky

voznyuky commented Oct 5, 2021

Lab 3 is the one I struggled on the most. This is what I have so far, although it is just breaking up the strings. How do I pull the 2nd word from each string, when there are multiple strings? I might need a little extra help on this step of the assignment.

get_first_name <- function( x )
{
first.names <- strsplit( d$Full.Name, " " )
return( first.names )
}

get_first_name()

@lecy
Collaborator

lecy commented Oct 5, 2021

What is the right delimiter to use for the split? Is it a space in this context?
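
To compare the two options, here is a small sketch of what each delimiter produces on one name from the data. The variable names `nm`, `by.space`, and `by.comma` are made up for illustration only:

```r
# One name in the "LAST, First Middle" format used in d$Full.Name
nm <- "ARQUIZA, Jose Maria Reynaldo Apollo"

# Splitting on a space breaks the name into five words,
# and the last name keeps its trailing comma
by.space <- strsplit( nm, " " )[[1]]
by.space

# Splitting on comma-space separates the last name from everything else
by.comma <- strsplit( nm, ", " )[[1]]
by.comma
```

With the space delimiter the first name lands in position 2 only by accident, and the pieces still need cleaning. With the comma-space delimiter the second element holds all of the given names together.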

@lecy
Collaborator

lecy commented Oct 5, 2021

Describe your pseudo code

@lecy
Collaborator

lecy commented Oct 6, 2021

Some code to get you started:

> x <- head( d$Full.Name, 5 )
> x
[1] "ABBASI, Mohammad"                    "ARQUIZA, Jose Maria Reynaldo Apollo"
[3] "Aaberg, Kelsea"                      "Abadjivor, Enyah"                   
[5] "Abayesu, Precious" 
> 
> # DROP LAST NAMES: 
> x.list <- strsplit( x, ", " )
> x.list
[[1]]
[1] "ABBASI"   "Mohammad"

[[2]]
[1] "ARQUIZA"                    "Jose Maria Reynaldo Apollo"

[[3]]
[1] "Aaberg" "Kelsea"

[[4]]
[1] "Abadjivor" "Enyah"    

[[5]]
[1] "Abayesu"  "Precious"

> 
> # get second element in each vector
> x.list[[1]][2]
[1] "Mohammad"
> x.list[[2]][2]
[1] "Jose Maria Reynaldo Apollo"
> x.list[[3]][2]
[1] "Kelsea"
> 
> # scale this with a loop?
> 
> x.second <- NULL
> for( i in 1:length(x) )
+ {
+    x.second[i] <- x.list[[i]][2]
+ }
> x.second
[1] "Mohammad"                   "Jose Maria Reynaldo Apollo"
[3] "Kelsea"                     "Enyah"                     
[5] "Precious"                  
> 
> # ALTERNATIVELY use a lapply (list apply) function: 
> # 
> # GET SECOND VALUE IN EACH VECTOR
> # function(x){ x[2] }
> 
> x2 <- lapply( x.list, function(x){ x[2] } )
> x2
[[1]]
[1] "Mohammad"

[[2]]
[1] "Jose Maria Reynaldo Apollo"

[[3]]
[1] "Kelsea"

[[4]]
[1] "Enyah"

[[5]]
[1] "Precious"

> x3 <- unlist( x2 )
> x3
[1] "Mohammad"                   "Jose Maria Reynaldo Apollo"
[3] "Kelsea"                     "Enyah"                     
[5] "Precious"    
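
As a side note, the lapply() plus unlist() pair above can be collapsed into a single sapply() call, which applies the function and simplifies the result to a vector when it can. This is only a compact variant of the same step, shown on a hand-built copy of the first three list elements:

```r
# Hand-built stand-in for the first three elements of x.list
x.list <- list( c( "ABBASI", "Mohammad" ),
                c( "ARQUIZA", "Jose Maria Reynaldo Apollo" ),
                c( "Aaberg", "Kelsea" ) )

# sapply() returns a character vector directly, so no unlist() is needed
x3 <- sapply( x.list, function(v){ v[2] } )
x3
```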

Now you have simplified the problem. The next step is to get the first name in each string:

"Jose Maria Reynaldo Apollo" --> "Jose"

Split it apart again, this time using spaces. And extract the first value in each vector.
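
A minimal sketch of that second pass, assuming the last names have already been dropped. The names `words.list` and `first.names` are placeholders, not part of the assignment:

```r
# Output of the first step: last names already dropped
x3 <- c( "Mohammad", "Jose Maria Reynaldo Apollo", "Kelsea" )

# Split on spaces this time: one vector of words per person
words.list <- strsplit( x3, " " )

# Keep the first word in each vector, then flatten the list to a vector
first.names <- unlist( lapply( words.list, function(w){ w[1] } ) )
first.names
```

Names with a single given name pass through unchanged, since their first word is the whole string.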

@Sean-In-The-Library

Sean-In-The-Library commented Oct 8, 2021

Lab 3 is the one I struggled on the most. This is what I have so far, although it is just breaking up the strings. How do I pull the 2nd word from each string, when there are multiple strings? I might need a little extra help on this step of the assignment.

get_first_name <- function( x )
{
first.names <- strsplit( d$Full.Name, " " )
return( first.names )
}

get_first_name()

Same, this step is really rough for me. Pretty humbling...

@lecy
Collaborator

lecy commented Oct 8, 2021

Here is some code to get you started:

#83 (comment)

For the second part you basically repeat the same steps, but use a space as the split value, then grab the first element in each vector instead of the second. This will drop middle names for cases like Jose:

"ARQUIZA, Jose Maria Reynaldo Apollo"

@lecy
Collaborator

lecy commented Oct 8, 2021

If writing code that doesn't work makes you humble then at this point I might be a saint :-)

@lecy
Collaborator

lecy commented Oct 8, 2021

Here is some test data with both 2020 and 2019 formats (some have a space after the comma, some don't):

x <- 
c("ABBASI, Mohammad", "ARQUIZA, Jose Maria Reynaldo Apollo", 
"Aaberg,Kelsea", "Abadjivor, Enyah", "Abayesu,Precious", "Abbas, James", 
"Abbaszadegan, Morteza", "Abbe, Scott", "Abbl, Norma", "Abbott, Joshua", 
"Abbott, Joshua", "Abdollahi,Amir", "Abdou, Olgeanna", "Abdurhman, Abdurazak", 
"Abel, John", "Abele, Kelsey", "Aberle,James", "Abhyankar, Aditya", 
"Abi Karam, Karam", "AbiNader,Millan", "Aboalam, Safaa", "Abraha, Naomi", 
"Abramchuk, Mykola", "Abrams, Cristen", "Abrams,Kristen")
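
One way to cope with both formats is to split on the comma alone and trim the leftover whitespace. The function below is a rough sketch under that assumption, not the official solution. The internal names (`parts`, `after.comma`, `words`) are invented here, and it is run on a handful of the names above so that it stands on its own:

```r
# Hypothetical sketch of get_first_name(); one possible approach
get_first_name <- function( name )
{
  # Split on the comma only, so "Aaberg,Kelsea" and "Abbas, James" both work
  parts <- strsplit( name, "," )

  # The second element holds everything after the comma
  after.comma <- sapply( parts, function(v){ v[2] } )

  # Remove the leading space left behind by the "comma space" format
  after.comma <- trimws( after.comma )

  # Split on spaces and keep the first word, which drops middle names
  words <- strsplit( after.comma, " " )
  first <- sapply( words, function(w){ w[1] } )

  return( first )
}

# A few names from the test data, covering both formats
x <- c( "ABBASI, Mohammad", "ARQUIZA, Jose Maria Reynaldo Apollo",
        "Aaberg,Kelsea", "Abi Karam, Karam", "AbiNader,Millan" )

get_first_name( x )
```

A name with no comma at all would come back as NA in this sketch, so it is worth checking the full data for such cases.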
