feat: match_name should aggregate across all similar loans prior to outputting results #335

Open
jdhoffa opened this issue Dec 1, 2020 · 3 comments
Labels
ADO Add issue to ADO feature a feature request or enhancement

Comments

@jdhoffa

jdhoffa commented Dec 1, 2020

The reprex below has two loans that are identical except for their id_loan values. The output of match_name repeats the same match once for each distinct id_loan.

I'm not sure if there is an internal reason that we decided to do this, but if possible it would be easier for the user to manually validate this output only once.

library(r2dii.match)

lbk <- tibble::tribble(
  ~sector_classification_system, ~id_ultimate_parent,             ~name_ultimate_parent, ~id_direct_loantaker,                ~name_direct_loantaker, ~sector_classification_direct_loantaker, ~id_loan,
  "NACE",              "UP15", "Alpine Knits India Pvt. Limited",               "C294", "Yuamen Xinneng Thermal Power Co Ltd",                                    3511,     "L1",
  "NACE",              "UP15", "Alpine Knits India Pvt. Limited",               "C294", "Yuamen Xinneng Thermal Power Co Ltd",                                    3511,     "L2"
)

ald <- tibble::tribble(
  ~name_company, ~sector,                ~alias_ald,
  "alpine knits india pvt. limited", "power", "alpineknitsindiapvt ltd"
)

match_name(lbk, ald) %>% 
  dplyr::select(id_loan, name, sector, name_ald, sector_ald, score, level) %>% 
  prioritize()
#> # A tibble: 2 x 7
#>   id_loan name              sector name_ald           sector_ald score level    
#>   <chr>   <chr>             <chr>  <chr>              <chr>      <dbl> <chr>    
#> 1 L1      Alpine Knits Ind… power  alpine knits indi… power          1 ultimate…
#> 2 L2      Alpine Knits Ind… power  alpine knits indi… power          1 ultimate…

Created on 2020-12-01 by the reprex package (v0.3.0)

AB#10177

@jdhoffa

jdhoffa commented Dec 1, 2020

Thanks @georgeharris2deg

@maurolepore

I'm not sure if there is an internal reason that we decided to do this, but if possible it would be easier for the user to manually validate this output only once.

This output is explained by us picking rows that are distinct only in id_loan. We could probably detect the similarity in the other columns. The decision depends on how much of a problem this is and whether it is worth the added complexity in the code.
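
As a rough illustration of what "detecting the similarity in other columns" could look like on the user side, here is a minimal sketch that collapses rows differing only in id_loan. It is not how match_name works internally. The `matched` tibble is a hand-written stand-in for a subset of the match_name output (column names follow the reprex in this thread), and `to_validate` is a hypothetical name.

```r
library(dplyr)

# Hand-written stand-in for a subset of match_name() output (an assumption,
# not real package output). The two rows differ only in id_loan.
matched <- tibble::tribble(
  ~id_loan, ~name,                             ~sector, ~name_abcd,                        ~sector_abcd, ~score,
  "L1",     "Alpine Knits India Pvt. Limited", "power", "alpine knits india pvt. limited", "power",      1,
  "L2",     "Alpine Knits India Pvt. Limited", "power", "alpine knits india pvt. limited", "power",      1
)

# Group by every column except id_loan, so rows that are otherwise identical
# fall into one group, then keep the loan ids of each group in a list-column.
to_validate <- matched %>%
  group_by(across(-id_loan)) %>%
  summarise(id_loan = list(id_loan), .groups = "drop")

# One row per distinct match; id_loan holds all loans that share it.
to_validate
```

With this shape the user validates each distinct match once, and the list-column can later be expanded back to one row per loan (for example with tidyr::unnest) before the result is passed on.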

@jdhoffa jdhoffa added the feature a feature request or enhancement label Feb 6, 2024
@jdhoffa jdhoffa self-assigned this Feb 6, 2024
@jdhoffa jdhoffa added the ADO Add issue to ADO label Feb 6, 2024
@jdhoffa jdhoffa changed the title match_name should aggregate across all similar loans prior to outputting results feat: match_name should aggregate across all similar loans prior to outputting results Mar 6, 2024
@jdhoffa jdhoffa added ADO Add issue to ADO and removed ADO Add issue to ADO labels Mar 6, 2024
@jdhoffa

jdhoffa commented Mar 26, 2024

Update: a recent check shows that this is still the case:

library(r2dii.match)

lbk <- tibble::tribble(
  ~sector_classification_system, ~id_ultimate_parent,             ~name_ultimate_parent, ~id_direct_loantaker,                ~name_direct_loantaker, ~sector_classification_direct_loantaker, ~id_loan,
  "NACE",              "UP15", "Alpine Knits India Pvt. Limited",               "C294", "Yuamen Xinneng Thermal Power Co Ltd",                                    "D35.1",     "L1",
  "NACE",              "UP15", "Alpine Knits India Pvt. Limited",               "C294", "Yuamen Xinneng Thermal Power Co Ltd",                                    "D35.1",     "L2"
)

ald <- tibble::tribble(
  ~name_company, ~sector,                ~alias_ald,
  "alpine knits india pvt. limited", "power", "alpineknitsindiapvt ltd"
)

match_name(lbk, ald) %>% 
  dplyr::select(id_loan, name, sector, name_abcd, sector_abcd, score, level) %>% 
  prioritize()
#> # A tibble: 2 × 7
#>   id_loan name                          sector name_abcd sector_abcd score level
#>   <chr>   <chr>                         <chr>  <chr>     <chr>       <dbl> <chr>
#> 1 L1      Alpine Knits India Pvt. Limi… power  alpine k… power           1 ulti…
#> 2 L2      Alpine Knits India Pvt. Limi… power  alpine k… power           1 ulti…

Created on 2024-03-26 with reprex v2.1.0

@jdhoffa jdhoffa removed their assignment Apr 10, 2024