feat: replace `id_r2dii` with `company_id` input from `ald_demo` #375

jdhoffa · 2021-07-05T08:17:53Z

Currently, we uniquely identify matches using an internally generate id, which is the unique group indices for sector and company.

ald_demo now contains the column company_id which uniquely identifies each company. HOWEVER, it likely does not uniquely identify each company + sector combination. We should consider how we can move towards using this company_id column, and move away from the id_r2dii column.

Relates to #355

AB#10179

The text was updated successfully, but these errors were encountered:

jdhoffa · 2021-09-21T18:40:27Z

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(r2dii.data)

select(ald_demo, company_id, name_company) %>% 
  unique() %>% 
  head()
#> # A tibble: 6 × 2
#>   company_id name_company                    
#>   <chr>      <chr>                           
#> 1 1          aba hydropower generation co ltd
#> 2 2          achinsky glinoziemskij kombinat 
#> 3 3          affinity renewables inc.        
#> 4 4          africa oil corp                 
#> 5 5          africon shipping sa             
#> 6 6          agni steels private limited

^{Created on 2021-09-21 by the reprex package (v2.0.0)}

maurolepore · 2021-09-21T18:57:23Z

Currently, we uniquely identify matches using ... the unique group indices for sector and company.

the [new] column company_id ... uniquely identifies each company [but does't consider sector].

So is this the job?:

Produce group indices from company_id + sector
(instead of company + sector).

maurolepore · 2021-09-21T19:02:47Z

Also, are you saying that now ald_demo is no longer only a demo dataset but instead a "real" dataset we use to match companies in the wild with an id we came up with?

I need to read the comments in the PR that addes company_id to ald_demo -- likely I can answer my own question. But still I want to capture my first reaction to this "mixed" responsability of ald_demo in case I didn't notice the issue the first time.

jdhoffa · 2021-09-21T19:04:00Z

No, it is definitely still a completely demo dataset.

jdhoffa · 2021-09-21T19:05:06Z

The only change that was made to ald_demo is that it gained the column company_id (with entries of made-up company ids, still fake). This change just reflects the fact that the REAL data (that is owned and distributed by Asset Resolution) also gained that column (and the real data obviously has real ID values in that column).

jdhoffa · 2021-09-21T19:05:30Z

The change proposed here is basically to make use of that new column that is now being served with the most recent data.

jdhoffa · 2021-09-21T19:15:48Z

I've tried to explain this more clearly in code, along with a failing test:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(r2dii.data)
library(r2dii.match)

# match_name currently outputs a column id_2dii. this "ID" is generated 
# arbitrarily by the function
out <- r2dii.data::loanbook_demo %>% 
  match_name(r2dii.data::ald_demo)

out %>% 
  select(id_2dii, level, sector_ald, name_ald)
#> # A tibble: 410 × 4
#>    id_2dii level                 sector_ald name_ald                            
#>    <chr>   <chr>                 <chr>      <chr>                               
#>  1 UP15    ultimate_parent       power      alpine knits india pvt. limited     
#>  2 IP19    intermediate_parent_1 power      ykk usa, inc.                       
#>  3 UP288   ultimate_parent       power      university of iowa                  
#>  4 DL305   direct_loantaker      power      yukon development corp              
#>  5 UP104   ultimate_parent       power      garland power & light               
#>  6 DL304   direct_loantaker      power      yukon development corp              
#>  7 UP83    ultimate_parent       power      earthpower technologies sydney pty …
#>  8 UP163   ultimate_parent       power      kraftwerk mehrum gmbh               
#>  9 UP138   ultimate_parent       power      jai bharat gum & chemicals ltd.     
#> 10 UP32    ultimate_parent       power      bhagwan energy ltd.                 
#> # … with 400 more rows

# ald_demo now has a column `company_id`
r2dii.data::ald_demo %>% 
  select(company_id, name_company) %>% 
  unique() %>% 
  head()
#> # A tibble: 6 × 2
#>   company_id name_company                    
#>   <chr>      <chr>                           
#> 1 1          aba hydropower generation co ltd
#> 2 2          achinsky glinoziemskij kombinat 
#> 3 3          affinity renewables inc.        
#> 4 4          africa oil corp                 
#> 5 5          africon shipping sa             
#> 6 6          agni steels private limited

# let's look at the company "alpine knits ..." which is in both ald_demo and the 
# demo output of `match_name`

# in ald_demo
r2dii.data::ald_demo %>% 
  filter(grepl("alpine", name_company)) %>% 
  select(company_id, name_company) %>% 
  unique() %>% 
  head()
#> # A tibble: 1 × 2
#>   company_id name_company                   
#>   <chr>      <chr>                          
#> 1 16         alpine knits india pvt. limited

# in output of match_name
out %>% 
  filter(grepl("alpine", name_ald)) %>% 
  select(id_2dii, level, sector_ald, name_ald)
#> # A tibble: 1 × 4
#>   id_2dii level           sector_ald name_ald                       
#>   <chr>   <chr>           <chr>      <chr>                          
#> 1 UP15    ultimate_parent power      alpine knits india pvt. limited

# in the match_name output, I would want to see instead a column `company_id` 
# with the value "16" (as above)

# You could, for example, use the following as a failing test
out_alpine <- out %>% 
  filter(grepl("alpine", name_ald))

expect_equal(out_alpine$company_id, 16L) 
#> Error in expect_equal(out_alpine$company_id, 16L): could not find function "expect_equal"
# of course this will fail, since the column company_id does not yet exist

^{Created on 2021-09-21 by the reprex package (v2.0.0)}

jdhoffa · 2021-09-21T19:17:44Z

Currently, we uniquely identify matches using ... the unique group indices for sector and company.

the [new] column company_id ... uniquely identifies each company [but does't consider sector].

So is this the job?:

Produce group indices from company_id + sector
(instead of company + sector).

I actually no longer think this is an issue. company_id does not need to be unique to the sector on output, because we also output explicitly the column sector_ald. (ie. the user can always explicitly determine uniquely companies by usng the combination of company_id and sector_ald from the output data).

^If the above doesn't make sense, don't worry. But I believe you can disregard my initial concerns about sector.

jdhoffa · 2021-09-21T19:20:22Z

One last consideration, is that this will change how the argument match_name(..., overwrite = ...) is handled, as explained here: https://2degreesinvesting.github.io/r2dii.match/articles/r2dii-match.html#maybe-overwrite-matches

In particular, overwrite expects a dataframe input, with a column id_2dii. This will also need to be replaced by company_id

jacobvjk · 2021-09-29T15:08:18Z

Just want wo highlight that I would very much support the use of company_id soon, as tis is also what we would normally use to match data from different sources in stress testing. The credit risk stress test for banks would be so much more robust if I did not only have to join on company names for which I can't be sure I am doing the right transformation of the format. Big vote from my side

jdhoffa · 2024-02-06T15:31:28Z

Relates to #212

jdhoffa mentioned this issue Jul 5, 2021

Add company ID to ald_demo #355

Closed

jacobvjk mentioned this issue Sep 29, 2021

879 use eikon data loanbook workflow 2DegreesInvesting/r2dii.climate.stress.test#60

Merged

jdhoffa added medium Likely finished in under a week feature a feature request or enhancement breaking change ☠️ API change likely to affect existing code ADO Add issue to ADO labels Feb 6, 2024

jdhoffa self-assigned this Feb 6, 2024

jdhoffa mentioned this issue Mar 6, 2024

breaking change: replace the concept of id_2dii with the AI specific company ID #212

Closed

jdhoffa removed the ADO Add issue to ADO label Mar 6, 2024

jdhoffa changed the title ~~Replace id_r2dii with company_id input from ald_demo~~ feat: replace id_r2dii with company_id input from ald_demo Mar 6, 2024

jdhoffa added ADO Add issue to ADO and removed ADO Add issue to ADO labels Mar 6, 2024

jdhoffa removed their assignment Apr 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: replace `id_r2dii` with `company_id` input from `ald_demo` #375

feat: replace `id_r2dii` with `company_id` input from `ald_demo` #375

jdhoffa commented Jul 5, 2021 •

edited by azure-boards bot

Loading

jdhoffa commented Sep 21, 2021

maurolepore commented Sep 21, 2021 •

edited

Loading

maurolepore commented Sep 21, 2021

jdhoffa commented Sep 21, 2021

jdhoffa commented Sep 21, 2021

jdhoffa commented Sep 21, 2021

jdhoffa commented Sep 21, 2021 •

edited

Loading

jdhoffa commented Sep 21, 2021

jdhoffa commented Sep 21, 2021

jacobvjk commented Sep 29, 2021

jdhoffa commented Feb 6, 2024

feat: replace id_r2dii with company_id input from ald_demo #375

feat: replace id_r2dii with company_id input from ald_demo #375

Comments

jdhoffa commented Jul 5, 2021 • edited by azure-boards bot Loading

jdhoffa commented Sep 21, 2021

maurolepore commented Sep 21, 2021 • edited Loading

maurolepore commented Sep 21, 2021

jdhoffa commented Sep 21, 2021

jdhoffa commented Sep 21, 2021

jdhoffa commented Sep 21, 2021

jdhoffa commented Sep 21, 2021 • edited Loading

jdhoffa commented Sep 21, 2021

jdhoffa commented Sep 21, 2021

jacobvjk commented Sep 29, 2021

jdhoffa commented Feb 6, 2024

feat: replace `id_r2dii` with `company_id` input from `ald_demo` #375

feat: replace `id_r2dii` with `company_id` input from `ald_demo` #375

jdhoffa commented Jul 5, 2021 •

edited by azure-boards bot

Loading

maurolepore commented Sep 21, 2021 •

edited

Loading

jdhoffa commented Sep 21, 2021 •

edited

Loading