feat: `match_name` should aggregate across all similar loans prior to outputting results #335

jdhoffa · 2020-12-01T13:37:22Z

In the reprex below, we see two almost identical loans that differ only in `id_loan`. The output of `match_name` repeats the same match once for each distinct `id_loan`.

I'm not sure if there is an internal reason that we decided to do this, but if possible, it would be easier for the user to manually validate this output only once.

library(r2dii.match)

lbk <- tibble::tribble(
  ~sector_classification_system, ~id_ultimate_parent,             ~name_ultimate_parent, ~id_direct_loantaker,                ~name_direct_loantaker, ~sector_classification_direct_loantaker, ~id_loan,
  "NACE",              "UP15", "Alpine Knits India Pvt. Limited",               "C294", "Yuamen Xinneng Thermal Power Co Ltd",                                    3511,     "L1",
  "NACE",              "UP15", "Alpine Knits India Pvt. Limited",               "C294", "Yuamen Xinneng Thermal Power Co Ltd",                                    3511,     "L2"
)

ald <- tibble::tribble(
  ~name_company, ~sector,                ~alias_ald,
  "alpine knits india pvt. limited", "power", "alpineknitsindiapvt ltd"
)

match_name(lbk, ald) %>% 
  dplyr::select(id_loan, name, sector, name_ald, sector_ald, score, level) %>% 
  prioritize()
#> # A tibble: 2 x 7
#>   id_loan name              sector name_ald           sector_ald score level    
#>   <chr>   <chr>             <chr>  <chr>              <chr>      <dbl> <chr>    
#> 1 L1      Alpine Knits Ind… power  alpine knits indi… power          1 ultimate…
#> 2 L2      Alpine Knits Ind… power  alpine knits indi… power          1 ultimate…

^{Created on 2020-12-01 by the reprex package (v0.3.0)}

AB#10177

jdhoffa · 2020-12-01T13:37:30Z

Thanks @georgeharris2deg

maurolepore · 2020-12-02T13:25:04Z

> I'm not sure if there is an internal reason that we decided to do this, but if it's possible it would be easier for the user to only have to manually validate these output one.

This output is explained by us picking rows with distinct values of `id_loan` only. We could probably detect the similarity in the other columns. The decision seems to depend on how much of a problem this is and whether it is worth the added complexity in the code.
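As a rough illustration of what detecting the similarity in the other columns could look like, here is a minimal sketch that works on a hand-built tibble rather than on real `match_name()` output. The helper `collapse_similar_loans()` is hypothetical and not part of r2dii.match, the column subset is copied from the reprex above, and the `level` value is assumed to be `"ultimate_parent"` because the printed output truncates it.

```r
library(dplyr)

# Hypothetical stand-in for a subset of the columns returned by match_name().
# The `level` value is an assumption; the printed output above truncates it.
matched <- tibble::tribble(
  ~id_loan, ~name,                             ~sector, ~name_abcd,                        ~sector_abcd, ~score, ~level,
  "L1",     "Alpine Knits India Pvt. Limited", "power", "alpine knits india pvt. limited", "power",      1,      "ultimate_parent",
  "L2",     "Alpine Knits India Pvt. Limited", "power", "alpine knits india pvt. limited", "power",      1,      "ultimate_parent"
)

# Hypothetical helper (not part of r2dii.match): rows that agree on every
# column except `id_loan` are collapsed into one row, and the loan ids they
# came from are kept together in a list-column.
collapse_similar_loans <- function(data) {
  data %>%
    group_by(across(-id_loan)) %>%
    summarise(id_loan = list(id_loan), .groups = "drop")
}

collapsed <- collapse_similar_loans(matched)
collapsed
```

With this shape the user would see one row to validate, while `collapsed$id_loan` still records which loans that row stands for. Whether this belongs inside `match_name()` or in a separate step is exactly the complexity trade-off mentioned above.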

jdhoffa · 2024-03-26T13:26:27Z

Update: a recent inspection shows that this is still the case:

library(r2dii.match)

lbk <- tibble::tribble(
  ~sector_classification_system, ~id_ultimate_parent,             ~name_ultimate_parent, ~id_direct_loantaker,                ~name_direct_loantaker, ~sector_classification_direct_loantaker, ~id_loan,
  "NACE",              "UP15", "Alpine Knits India Pvt. Limited",               "C294", "Yuamen Xinneng Thermal Power Co Ltd",                                    "D35.1",     "L1",
  "NACE",              "UP15", "Alpine Knits India Pvt. Limited",               "C294", "Yuamen Xinneng Thermal Power Co Ltd",                                    "D35.1",     "L2"
)

ald <- tibble::tribble(
  ~name_company, ~sector,                ~alias_ald,
  "alpine knits india pvt. limited", "power", "alpineknitsindiapvt ltd"
)

match_name(lbk, ald) %>% 
  dplyr::select(id_loan, name, sector, name_abcd, sector_abcd, score, level) %>% 
  prioritize()
#> # A tibble: 2 × 7
#>   id_loan name                          sector name_abcd sector_abcd score level
#>   <chr>   <chr>                         <chr>  <chr>     <chr>       <dbl> <chr>
#> 1 L1      Alpine Knits India Pvt. Limi… power  alpine k… power           1 ulti…
#> 2 L2      Alpine Knits India Pvt. Limi… power  alpine k… power           1 ulti…

^{Created on 2024-03-26 with reprex v2.1.0}
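Until something like this is built in, a user-side workaround might be to validate each distinct match once and then re-attach the loan ids. The sketch below assumes a hand-built tibble in place of real `match_name()` output, assumes the `level` value is `"ultimate_parent"` (it is truncated in the printed output), and simulates the manual validation step with a `mutate()` call; none of the object names come from r2dii.match.

```r
library(dplyr)

# Hypothetical stand-in for a subset of the columns returned by match_name().
matched <- tibble::tribble(
  ~id_loan, ~name,                             ~sector, ~name_abcd,                        ~sector_abcd, ~score, ~level,
  "L1",     "Alpine Knits India Pvt. Limited", "power", "alpine knits india pvt. limited", "power",      1,      "ultimate_parent",
  "L2",     "Alpine Knits India Pvt. Limited", "power", "alpine knits india pvt. limited", "power",      1,      "ultimate_parent"
)

# 1. Drop `id_loan` and keep one row per distinct match for validation.
to_validate <- matched %>%
  select(-id_loan) %>%
  distinct()

# 2. Placeholder for manual validation, which would normally happen outside R
#    (for example in a spreadsheet). Here every match is simply accepted.
validated <- to_validate %>%
  mutate(score = 1)

# 3. Re-attach the loan ids. `score` is left out of the join keys because
#    validation may change it.
by_cols <- setdiff(names(matched), c("id_loan", "score"))
restored <- matched %>%
  select(-score) %>%
  inner_join(validated, by = by_cols)

restored
```

The result has one row per loan again, so it could in principle be passed on to `prioritize()` as before, but the user only had to look at one row in step 2.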

jdhoffa added the `feature` (a feature request or enhancement) label Feb 6, 2024

jdhoffa self-assigned this Feb 6, 2024

jdhoffa added the `ADO` (Add issue to ADO) label Feb 6, 2024

jdhoffa changed the title ~~match_name should aggregate across all similar loans prior to outputting results~~ feat: match_name should aggregate across all similar loans prior to outputting results Mar 6, 2024

jdhoffa added and removed the `ADO` (Add issue to ADO) label Mar 6, 2024

jdhoffa removed their assignment Apr 10, 2024
