Maybe make match_name case-insensitive to input columns sector and technology #257

Closed
jdhoffa opened this issue Aug 6, 2020 · 1 comment · Fixed by #271

@jdhoffa
Member

jdhoffa commented Aug 6, 2020

match_name is currently case sensitive to the input columns sector and technology (and probably others) of ald_demo. A bank has complained about this.

I think it wouldn't hurt to be a little more flexible here, although I'm not entirely convinced. We would then need to make the rest of the r2dii ecosystem case-insensitive, as this would likely propagate down to r2dii.analysis etc. Anyway, worth a discussion!

Thanks @georgeharris2deg

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(r2dii.data)
library(r2dii.match)

loanbook_demo %>% 
  match_name(ald_demo)
#> # A tibble: 502 x 27
#>    id_loan id_direct_loant… name_direct_loa… id_intermediate… name_intermedia…
#>    <chr>   <chr>            <chr>            <chr>            <chr>           
#>  1 L1      C294             Yuamen Xinneng … <NA>             <NA>            
#>  2 L3      C292             Yuama Ethanol L… IP5              Yuama Inc.      
#>  3 L3      C292             Yuama Ethanol L… IP5              Yuama Inc.      
#>  4 L5      C305             Yukon Energy Co… <NA>             <NA>            
#>  5 L5      C305             Yukon Energy Co… <NA>             <NA>            
#>  6 L6      C304             Yukon Developme… <NA>             <NA>            
#>  7 L6      C304             Yukon Developme… <NA>             <NA>            
#>  8 L8      C303             Yueyang City Co… <NA>             <NA>            
#>  9 L9      C301             Yuedxiu Corp One IP10             Yuedxiu Group   
#> 10 L10     C302             Yuexi County AA… <NA>             <NA>            
#> # … with 492 more rows, and 22 more variables: id_ultimate_parent <chr>,
#> #   name_ultimate_parent <chr>, loan_size_outstanding <dbl>,
#> #   loan_size_outstanding_currency <chr>, loan_size_credit_limit <dbl>,
#> #   loan_size_credit_limit_currency <chr>, sector_classification_system <chr>,
#> #   sector_classification_input_type <chr>,
#> #   sector_classification_direct_loantaker <dbl>, fi_type <chr>,
#> #   flag_project_finance_loan <chr>, name_project <lgl>,
#> #   lei_direct_loantaker <lgl>, isin_direct_loantaker <lgl>, id_2dii <chr>,
#> #   level <chr>, sector <chr>, sector_ald <chr>, name <chr>, name_ald <chr>,
#> #   score <dbl>, source <chr>

loanbook_demo %>% 
  match_name(
    mutate(
      ald_demo, 
      sector = toupper(sector)
      )
  )
#> Warning: Found no match.
#> # A tibble: 0 x 27
#> # … with 27 variables: id_loan <chr>, id_direct_loantaker <chr>,
#> #   name_direct_loantaker <chr>, id_intermediate_parent_1 <chr>,
#> #   name_intermediate_parent_1 <chr>, id_ultimate_parent <chr>,
#> #   name_ultimate_parent <chr>, loan_size_outstanding <dbl>,
#> #   loan_size_outstanding_currency <chr>, loan_size_credit_limit <dbl>,
#> #   loan_size_credit_limit_currency <chr>, sector_classification_system <chr>,
#> #   sector_classification_input_type <chr>,
#> #   sector_classification_direct_loantaker <dbl>, fi_type <chr>,
#> #   flag_project_finance_loan <chr>, name_project <lgl>,
#> #   lei_direct_loantaker <lgl>, isin_direct_loantaker <lgl>, id_2dii <lgl>,
#> #   level <lgl>, sector <lgl>, sector_ald <lgl>, name <lgl>, name_ald <lgl>,
#> #   score <lgl>, source <lgl>

Created on 2020-08-06 by the reprex package (v0.3.0)
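
Until match_name handles this itself, one possible workaround is to normalize the case of the relevant columns before matching. The sketch below is only an illustration: `normalize_case()` is a hypothetical helper, not part of r2dii.match, and `ald_toy` is a made-up stand-in for ald_demo. It assumes the package expects lowercase values in sector and technology, which the reprex above suggests but does not prove.

```r
# Hypothetical helper (not part of r2dii.match): lowercase selected
# columns so that matching does not depend on letter case.
normalize_case <- function(data, cols = c("sector", "technology")) {
  # Only touch the requested columns that actually exist in `data`
  cols <- intersect(cols, names(data))
  data[cols] <- lapply(data[cols], function(x) tolower(as.character(x)))
  data
}

# Made-up stand-in for ald_demo with inconsistent case
ald_toy <- data.frame(
  name_company = c("a", "b"),
  sector = c("POWER", "Automotive"),
  technology = c("HydroCap", "ICE"),
  stringsAsFactors = FALSE
)

ald_clean <- normalize_case(ald_toy)
ald_clean$sector
ald_clean$technology
```

With the real data, the same idea would be to pass the normalized dataset as the second argument of match_name instead of ald_demo.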

@maurolepore
Contributor

Sounds like a reasonable request. This package is all about matching, and in other parts of the process (when matching company names) we do match in a case-insensitive way (I believe).

I can't say much about the impact on r2dii.analysis, but considered in isolation this makes sense.

@maurolepore maurolepore self-assigned this Aug 10, 2020
@maurolepore maurolepore added this to To do in r2dii via automation Aug 10, 2020
maurolepore added a commit to maurolepore/r2dii.match that referenced this issue Aug 10, 2020
r2dii automation moved this from To do to Done Aug 12, 2020
maurolepore added a commit that referenced this issue Aug 12, 2020
Closes #257

Co-authored-by: Jackson Hoffart <jackson.hoffart@gmail.com>
jdhoffa referenced this issue in RMI-PACTA/r2dii.analysis Aug 13, 2020
Relates to 2DegreesInvesting/r2dii.match#257
@AlexAxthelm AlexAxthelm removed this from Done in r2dii Sep 11, 2022
2 participants