Keeping weights following final linkage #8

Open
JoshRoll opened this issue Aug 30, 2021 · 0 comments

Comments

@JoshRoll

Hi there,
I am still learning reclin's functionality, but I have been pretty happy with this package, so thank you so much for your work and for making this utility available.

I am working on validating the linkage and choosing a proper threshold. To do that I am trying to retain the weights, but for some reason I am losing them, and I am looking for confirmation that my approach should work. When I join the `Linked..` data frame with the `P_Link_Atts..` data frame on `Id.x` and `Id.y`, I expect every linked record to get its weight. Instead, the vast majority of the `Linked..` records don't get one. Looking into the `p` object and the associated `P_Link_Atts..` data frame, many linkages that appear in `Linked..` are not present in the weights.
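One way to see exactly which linked records are missing a weight is to compare the id pairs in the two tables directly. The following is a minimal base-R sketch. It assumes both tables carry character columns named `Id.x` and `Id.y`, as described above. The toy data frames `linked` and `weights` are hypothetical stand-ins for `Linked..` and `P_Link_Atts..`.

```r
# Hypothetical stand-ins for Linked.. and P_Link_Atts..
linked <- data.frame(Id.x = c("1", "2", "3"),
                     Id.y = c("10", "11", NA),
                     stringsAsFactors = FALSE)
weights <- data.frame(Id.x = c("1", "2"),
                      Id.y = c("10", "12"),
                      weight = c(5.1, 3.4),
                      stringsAsFactors = FALSE)

# Combine the two id columns into one key per row ("1|10", "3|NA", ...)
pair_key <- function(df) paste(df$Id.x, df$Id.y, sep = "|")

# Linked rows whose (Id.x, Id.y) pair has no row in the weights table
missing_weight <- linked[!(pair_key(linked) %in% pair_key(weights)), ]
print(missing_weight)
```

Row 3 has no `Id.y`. That is presumably what `link(..., all_x = TRUE)` produces for an x record kept without a partner, so it cannot have a weight. Row 2 is a genuine mismatch between the two tables.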

My presumption is that the `x` and `y` values are row names. So I created separate columns titled "Id.x" and "Id.y" to use as join keys, but maybe that's where I am going wrong.
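Whether `x`/`y` are row names or row positions matters once the data have been filtered. This base-R sketch shows how the two diverge. It rests on an assumption I have not verified: that the pair columns hold row positions into the data frames passed to `pair_blocking()`. The data frames `full` and `select_df` are hypothetical.

```r
# Toy source data standing in for the full data set
full <- data.frame(name = c("ann", "bob", "cy", "di", "ed"),
                   stringsAsFactors = FALSE)

# A hypothetical earlier filter keeps rows 1, 3 and 5
select_df <- full[c(1, 3, 5), , drop = FALSE]

# Row names survive subsetting, so they no longer equal row positions
ids_from_rownames <- rownames(select_df)                      # "1" "3" "5"
ids_from_positions <- as.character(seq_len(nrow(select_df)))  # "1" "2" "3"

# Assumed pair index meaning "second row of select_df"
pair_x <- 2L
select_df$name[pair_x]                       # "cy"
as.character(pair_x) %in% ids_from_rownames  # FALSE: no row is named "2"

# Resetting the row names makes them match positions again
rownames(select_df) <- NULL
rownames(select_df)                          # "1" "2" "3"
```

If that assumption holds, there are two ways to keep the join keys aligned. One is to call `rownames(df) <- NULL` on each filtered data frame before blocking. The other is to add an explicit column such as `df$Id <- seq_len(nrow(df))`.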

My goal is simply to retain the weight values after applying the `link()` function, so I can check how my linkage performs at different weights and adjust accordingly. Thanks for any help, and I hope this issue is clear. Sorry that I can't supply data (it's full of PII), but if a more workable example is necessary I can build some vignette data.
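Once the weights are available, one way to judge a threshold (for example around 2.2) is to count how many scored pairs clear several candidate values. The following is a minimal base-R sketch. The `pairs` data frame and its weights are made up. The `>=` comparison is an assumption, and reclin's own threshold test may differ.

```r
# Hypothetical scored pairs: row positions x, y and a match weight
pairs <- data.frame(x = c(1, 1, 2, 3, 4),
                    y = c(1, 2, 2, 3, 5),
                    weight = c(8.5, 1.2, 6.0, 2.5, -0.3))

# Candidate thresholds to compare
thresholds <- c(0, 2.2, 5)

# Count pairs at or above each threshold (">=" is an assumption)
n_pairs <- sapply(thresholds, function(t) sum(pairs$weight >= t))
print(data.frame(threshold = thresholds, n_pairs = n_pairs))

# Weight bands for a manual review of borderline pairs
bands <- cut(pairs$weight, breaks = c(-Inf, 0, 2.2, 5, Inf), right = FALSE)
print(table(bands))
```

These counts are taken before the one-to-one step. `select_n_to_m()` can drop further pairs that compete for the same record, so the final number of links may be lower.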

```r
library(reclin)
library(dplyr)

# Blocking
p <- pair_blocking(Select_Ems.., Select_Partic.., c("County", "Crash_Week"), large = FALSE)

# Compare the records on their linkage keys - basic
# p <- compare_pairs(p, by = c("First_Name", "Middle_Initial", "Last_Name", "DOB", "Sex"))
# Compare using Jaro-Winkler
# p <- compare_pairs(p, by = c("First_Name", "Middle_Initial", "Last_Name", "DOB", "Sex", "Crash_Date"),
#                    default_comparator = jaro_winkler(0.9), overwrite = TRUE)
p <- compare_pairs(p,
                   by = c("First_Name", "Middle_Initial", "Last_Name",
                          "DOB_Day", "DOB_Month", "DOB_Year", "Sex", "Crash_Date"),
                   default_comparator = jaro_winkler(0.9), overwrite = TRUE)

# Force 1-to-1 linkage
p_4 <- select_n_to_m(p, "weight", var = "ntom", threshold = 2.2)

# Keep all x records (linked or not); drop unlinked y records
Linked.. <- link(p_4, all_x = TRUE, all_y = FALSE)

# Create a data frame of the pair attributes (including the weights)
P_Link_Atts.. <- as.data.frame(p) %>% mutate(Id.x = as.character(x), Id.y = as.character(y))

# Join the weights onto the linked data
Linked.. <- left_join(Linked.., P_Link_Atts.., by = c("Id.x", "Id.y"))
```
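For comparison, here is a base-R sketch that carries the weight through the link step instead of re-joining afterwards. Each data set gets an explicit `Id` equal to its row position. The selected pairs are kept together with their weight, and both sides are merged onto them. The sketch assumes two things: that reclin's `x`/`y` columns are row positions, and that the `ntom` selection column is logical. The toy data frames `ems`, `partic` and `pairs` are hypothetical. The result only approximates what `link(all_x = TRUE, all_y = FALSE)` returns and is not reclin's implementation.

```r
# Hypothetical source data; Id is set to the row position just before pairing
ems <- data.frame(name = c("ann", "bob", "cy"), stringsAsFactors = FALSE)
partic <- data.frame(name = c("ann", "rob", "cy"), stringsAsFactors = FALSE)
ems$Id <- seq_len(nrow(ems))
partic$Id <- seq_len(nrow(partic))

# Hypothetical pairs after scoring and one-to-one selection (ntom = selected)
pairs <- data.frame(x = c(1L, 2L, 3L, 3L),
                    y = c(1L, 2L, 3L, 1L),
                    weight = c(9.1, 2.4, 8.7, 0.5),
                    ntom = c(TRUE, FALSE, TRUE, FALSE))

# Keep only the selected pairs, together with their weight
sel <- pairs[pairs$ntom, c("x", "y", "weight")]

# Every x record plus its selected pair (y and weight are NA when unlinked)
linked <- merge(ems, sel, by.x = "Id", by.y = "x", all.x = TRUE)

# Attach the y-side record; the shared "name" column gets .x / .y suffixes
linked <- merge(linked, partic, by.x = "y", by.y = "Id", all.x = TRUE,
                suffixes = c(".x", ".y"))

# Order by the x id for review
linked <- linked[order(linked$Id), c("Id", "y", "name.x", "name.y", "weight")]
print(linked)
```

If the `Id` columns are created on the exact data frames passed to `pair_blocking()` (after any filtering), the ids and the pair positions should line up. That sidesteps the question of whether row names match row positions.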