Add company ID to ald_demo #355

Closed
vintented opened this issue Jan 26, 2021 · 16 comments

Comments

@vintented

It would be great to add company IDs (unique numeric sequence) to ald_demo so Asset Resolution can start integrating them into the PACTA for Bank datasets. This addition would ensure that production is always aggregated to the correct entity and help with QA issues and tracking.

Let me know if I can provide any additional details.

@maurolepore
Contributor

Thanks! @jdhoffa what do you think?

@jdhoffa
Member

jdhoffa commented Jan 27, 2021

I think this is a good call, but have a hunch it may introduce some necessary changes to r2dii.match. In particular, I think that in principle, this ID should replace the id_r2dii ID that we generate in that package. HOWEVER, if for some reason the IDs provided by AR turn out to be corrupt, it could break r2dii.match, so maybe we could do a check in the beginning of match_name(), use the AR IDs if they satisfy our uniqueness criteria (unique by company name + sector), otherwise use id_r2dii.
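A minimal sketch of the check described above, in base R. The helper `ids_are_usable()` and the column name `id_ar` are hypothetical and not part of r2dii.match; the uniqueness rule (one ID per company name + sector pair, and one name per ID within a sector) is one possible reading of the criterion, not a confirmed implementation.

```r
# Hypothetical helper (not in r2dii.match): TRUE when the ID column exists,
# has no missing values, and maps one-to-one onto name_company + sector.
ids_are_usable <- function(ald, id = "id_ar") {
  if (!id %in% names(ald) || anyNA(ald[[id]])) {
    return(FALSE)
  }
  pairs <- unique(ald[c(id, "name_company", "sector")])
  id_sector <- paste(pairs[[id]], pairs$sector, sep = "-")
  name_sector <- paste(pairs$name_company, pairs$sector, sep = "-")
  # No ID may cover two names in a sector; no name may carry two IDs.
  anyDuplicated(id_sector) == 0 && anyDuplicated(name_sector) == 0
}

ald <- data.frame(
  name_company = c("interoil argentina as", "boeing co/the"),
  sector = c("power", "power"),
  id_ar = c(478460, 931),
  stringsAsFactors = FALSE
)

# Fall back to the generated ID when the provided one fails the check.
id_column <- if (ids_are_usable(ald)) "id_ar" else "id_r2dii"
```

With this shape, a corrupt ID column degrades to the existing `id_r2dii` behaviour instead of breaking the match.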

In any case, I think we should start with a PR here, introducing the new column, and leave this PR open while we test the effects on downstream packages. It should be a concerted effort, so let's flag this in the next PACTA dev prioritization session, and figure out when we'll have time to do it.

@vintented
Author

@jdhoffa to confirm, the company IDs are unique. @tposey28 I thought adding IDs to the PACTA for Banks dataset would be a useful addition :)

@maurolepore
Contributor

Can you explain a bit what the company IDs are and where they come from? Can you show an example?

> This addition would ensure that production is always aggregated to the correct entity and help with QA issues and tracking.

@jdhoffa, would it be safer to move backwards? Is there a way to map the matched output back to the company IDs so that we can get the benefits of the IDs in r2dii.analysis without changing upstream code?
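One way this "map back" idea could look, sketched in base R. It assumes the matched output carries `name_ald` and `sector_ald` columns and that the ALD carries a hypothetical `id_ar` column; the loan IDs are made up for illustration.

```r
# Toy stand-in for matched output (assumed columns: name_ald, sector_ald).
matched <- data.frame(
  id_loan = c("L1", "L2"),
  name_ald = c("boeing co/the", "interoil argentina as"),
  sector_ald = c("power", "power"),
  stringsAsFactors = FALSE
)

# Toy ALD with the wished-for ID column (hypothetical name: id_ar).
ald <- data.frame(
  name_company = c("interoil argentina as", "boeing co/the"),
  sector = c("power", "power"),
  id_ar = c(478460, 931),
  stringsAsFactors = FALSE
)

# Left join on name + sector so every matched row keeps its place,
# and rows without an ID show up as NA rather than disappearing.
lookup <- unique(ald[c("name_company", "sector", "id_ar")])
with_ids <- merge(
  matched, lookup,
  by.x = c("name_ald", "sector_ald"),
  by.y = c("name_company", "sector"),
  all.x = TRUE
)
```

This leaves the matching code untouched and only adds the ID afterwards, at the cost of relying on name + sector being a stable join key.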

@vintented
Author

@maurolepore sorry about the slow response. It has been a busy month already. The company IDs are generated in the Asset Resolution database and are unique to each entity. If needed, @tposey28 can provide additional details.

Here are some examples:
Company ID | Company Name
478460 | Interoil Argentina As
931 | Boeing Co/The

@maurolepore
Contributor

maurolepore commented Feb 2, 2021

Thanks @vintented

Am I right in understanding that the ald_demo you wish for would look like this?

devtools::load_all()
#> Loading r2dii.match

ald_wish <- fake_ald(
  id_ar = c(478460, 931), 
  name_company = tolower(c("Interoil Argentina As", "Boeing Co/The"))
)

ald_wish
#> # A tibble: 2 x 4
#>   name_company          sector alias_ald                id_ar
#>   <chr>                 <chr>  <chr>                    <dbl>
#> 1 interoil argentina as power  alpineknitsindiapvt ltd 478460
#> 2 boeing co/the         power  alpineknitsindiapvt ltd    931

Created on 2021-02-02 by the reprex package (v0.3.0)

(Here id_ar means id_<asset resolution> but could be any other meaningful name.)

@jdhoffa
Member

jdhoffa commented Feb 2, 2021

> The company IDs are generated in Asset Resolution database and are unique to each entity.

Are they unique to each entity + sector combination? I.e., Interoil Argentina As would likely have the same ID, regardless of whether we're talking about the power or gas sector? (This is fine, it's just something we need to think about when we work to fix match_name to allow us to use it.)

@vintented
Author

@jdhoffa they are unique to company regardless of the sector, or in other words, consistent across sectors.

@maurolepore
Contributor

> This addition would ensure that production is always aggregated to the correct entity and help with QA issues and tracking.

@jdhoffa, am I right in thinking that the benefit that @vintented wants is at the level of r2dii.analysis -- not further upstream?

We can change things upstream, but do we have to? Or can we first get the benefits in a safer way and then roll the solution deeper into the dependency tree?

@tposey28

tposey28 commented Feb 2, 2021

Hey jumping in here quickly!
They are unique, but not by sector. Sometimes I use a hacky concatenation of the AR ID and '-sector' to get a unique ID at that level, but the IDs are unique for the entity at a company level.

The IDs are kept consistent by comparing the Bloomberg IDs and LEIs if there is financial data; if not, then the Global Data ID if available; if not, then the unique combination of simplified name and country. We run this matching logic against every new quarter of data. Of course an old company may slip in with a new ID if it fails at all of these steps, but then Vincent and I will often go through and reconcile these with the old IDs (the important, noticeable ones at least) by looking for old IDs that lost production and new IDs that gained production.
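The fallback order described above can be sketched as a single reconciliation key. This is only an illustration in base R, not Asset Resolution's actual logic: the function name, argument names, key prefixes, and the choice to try the Bloomberg ID before the LEI are all assumptions, and the sample values are made up.

```r
# Hypothetical helper: pick the first available identifier per company,
# in the order Bloomberg ID, LEI, Global Data ID, simplified name + country.
reconciliation_key <- function(bloomberg_id, lei, globaldata_id,
                               simple_name, country) {
  name_country <- paste(simple_name, country, sep = "|")
  ifelse(
    !is.na(bloomberg_id), paste0("bbg:", bloomberg_id),
    ifelse(
      !is.na(lei), paste0("lei:", lei),
      ifelse(
        !is.na(globaldata_id), paste0("gd:", globaldata_id),
        paste0("name:", name_country)
      )
    )
  )
}

# Made-up identifiers, for illustration only.
keys <- reconciliation_key(
  bloomberg_id = c("B1", NA, NA),
  lei = c(NA_character_, NA, NA),
  globaldata_id = c(NA, "G7", NA),
  simple_name = c("a", "b", "c"),
  country = c("US", "AR", "DE")
)
```

Companies whose key is unchanged between two quarters would keep their existing ID; the rest would fall to the manual reconciliation step mentioned above.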

The main benefit for this is if Global Data or Bloomberg changes a name, but not their ID, we will update the name without changing the AR ID. This is useful for clients who may otherwise argue that a company disappeared.

@jdhoffa
Member

jdhoffa commented Feb 4, 2021

@maurolepore I think all of this is fine:

  • Add the new column to ald_demo (and ensure it doesn't break any downstream tests, I don't think it will do anything as is tbh)
  • See if we can make use of the new ID column in match_name() and see if it can replace id_r2dii (likely in concert with sector using group_indices()). Just adding the column to ald_demo won't output anything new/useful in match_name(), I don't think.
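A rough sketch of the second point: deriving one integer per company ID + sector combination. It uses base R's `match()` and `unique()` as a stand-in for dplyr's `group_indices()`, so indices follow order of first appearance rather than sorted group order. The column names `id_ar` and `id_company_sector` are hypothetical.

```r
ald <- data.frame(
  id_ar = c(478460, 478460, 931, 478460),
  sector = c("power", "oil and gas", "power", "power"),
  stringsAsFactors = FALSE
)

# The AR ID is shared across sectors, so combine it with the sector
# to get a key that is unique at the company + sector level.
key <- paste(ald$id_ar, ald$sector, sep = "-")

# One integer per distinct key; repeated rows share an index.
ald$id_company_sector <- match(key, unique(key))
```

The same company receives a different index in each sector, which is the level at which `id_r2dii` is said to operate in this thread.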

I will open the first point as a draft PR today, and let's see if anything breaks and go from there.

@vintented
Author

@jdhoffa and @maurolepore super exciting! You would be surprised how excited people are about company IDs. Let me know if I can help in any way :)

@georgeharris2deg

@jdhoffa @maurolepore @tposey28 @vintented @daisy-pacheco @Lauramirez-2ii

This is needed for some open engagement with emerging market banks.
Not sure of the prioritisation process but just wanted to bump this up the list.
Happy to discuss

Thanks,

@georgeharris2deg

Reconsidering the above comment and noting the conversation from the PACTA - AR call:

This solution of adding unique IDs to each AR data release for Banks, and the subsequent code changes that this will require, is no longer a top priority. This would be good to have for March 2021 and will help a bank to preserve their matches from a previous matching exercise, year on year.

A short-term solution for a bank wanting to match now using the old (Q4 2019) ALD data is to proceed with matching and then use an Excel bridging file to manually carry over the old matches to the new Q4 2020 data set.
NB - this is the solution for open emerging markets banks
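For illustration, the bridging step could also be done in R instead of Excel. This is a sketch under assumed names: the bridge table and its columns `name_ald_old` and `name_ald_new` are hypothetical, and the company names and loan IDs are placeholders.

```r
# Matches made against the old ALD (placeholder values).
old_matches <- data.frame(
  id_loan = c("L1", "L2"),
  name_ald = c("old co a", "old co b"),
  stringsAsFactors = FALSE
)

# Hypothetical bridge: old ALD name -> name in the new ALD release.
# NA marks a company with no counterpart found yet.
bridge <- data.frame(
  name_ald_old = c("old co a", "old co b"),
  name_ald_new = c("new co a", NA),
  stringsAsFactors = FALSE
)

# Carry each old match over to the new name, keeping unmatched rows.
carried <- merge(
  old_matches, bridge,
  by.x = "name_ald", by.y = "name_ald_old",
  all.x = TRUE
)

# Rows that still need a manual decision.
needs_review <- carried[is.na(carried$name_ald_new), ]
```

Once a stable company ID exists in both releases, the bridge table would no longer be needed, because the ID itself would serve as the join key.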

thanks all!

jdhoffa transferred this issue from RMI-PACTA/r2dii.data on Feb 22, 2021
@jdhoffa
Member

jdhoffa commented Feb 22, 2021

I moved this issue to r2dii.match, as the remainder of the fix will be in actually allowing the user to implement the id_company in the matching process.

@jdhoffa
Member

jdhoffa commented Jul 5, 2021

Closing in favour of #375

jdhoffa closed this as completed on Jul 5, 2021

5 participants