Borderline col in classification Bridges #163

Closed
georgeharris2deg opened this issue Oct 2, 2020 · 8 comments · Fixed by #170


@georgeharris2deg

My understanding is that the Borderline column should be FALSE when the sector classification code is perfectly in scope.

For example:

    code   description        Borderline
    3456   auto manufacture   FALSE
    3457   auto parts         TRUE

This is not the case - instead, all FALSE borderline flags are for sectors not in scope.
Perhaps my understanding is incorrect?

Thanks - George

@georgeharris2deg georgeharris2deg added the bug Something isn't working label Oct 2, 2020
@maurolepore
Contributor

maurolepore commented Oct 2, 2020

Good point @georgeharris2deg, I think this is confusing because the documentation of borderline is circular:

borderline (character): Flag indicating if 2dii sector and classification are a borderline match.

It uses "borderline" to define "borderline".

It seems that borderline refers not to the sector but to a finer level of classification: the code within the sector. But I only understand this (or think I do) after running something like the code below, which is too much effort for something we could document up front:

dplyr::filter(r2dii.data::sic_classification, borderline == TRUE)
#> # A tibble: 13 x 4
#>    code  description                                         sector   borderline
#>    <chr> <chr>                                               <chr>    <lgl>     
#>  1 33210 petrol, fuel oils, lubricating oils and greases, p… oil and… TRUE      
#>  2 33220 petrol, fuel oils, lubricating oils and greases, p… oil and… TRUE      
#>  3 33230 petrol, fuel oils, lubricating oils and greases, p… oil and… TRUE      
#>  4 33290 other petroleum/synthesised products n.e.c.         oil and… TRUE      
#>  5 34240 manufacture of cement, lime and plaster             cement   TRUE      
#>  6 34250 manufacture of articles of concrete, cement and pl… cement   TRUE      
#>  7 35310 casting of iron and steel                           steel    TRUE      
#>  8 36100 manufacture of electric motors, generators and tra… power    TRUE      
#>  9 36200 manufacture of electricity distribution and contro… power    TRUE      
#> 10 38200 manufacture of bodies (coachwork) for motor vehicl… automot… TRUE      
#> 11 38309 manufacture of other motor vehicle parts and acces… automot… TRUE      
#> 12 50320 electrical contracting                              power    TRUE      
#> 13 73000 air transport                                       aviation TRUE

Created on 2020-10-02 by the reprex package (v0.3.0)

@jdhoffa, could you unpack the definition of borderline? Here is fine, I'm happy to submit the PR.

@maurolepore maurolepore added enhancement and removed bug Something isn't working labels Oct 2, 2020
@jdhoffa
Member

jdhoffa commented Oct 2, 2020

@georgeharris2deg, I have defined borderline as indicating "maybe".
If the code is power and a perfect match, borderline is FALSE.
If it is transmission of power, borderline is TRUE.
If it is DEFINITELY out of scope, borderline is FALSE.

I hope that makes sense.

@maurolepore
Contributor

@jdhoffa and @georgeharris2deg, based on your comments above, I propose to rewrite the definition of borderline as follows:

borderline (character): The value TRUE indicates that the match is uncertain between the 2dii sector and the classification. The value FALSE indicates that the match is certainly perfect or the classification is certainly out of 2dii's scope.

Does this make sense? Do you have any edits to make?
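
As a minimal sketch of the three cases the proposed definition covers, here is a toy example. The codes and the `match` column are invented for illustration and are not part of any r2dii.data dataset:

```r
library(dplyr)

# Hypothetical rows: one perfect match, one uncertain match, and one
# code that is certainly out of scope. None of these codes are real.
toy <- tibble::tribble(
  ~code,  ~sector,        ~match,
  "1001", "power",        "perfect",
  "1002", "power",        "uncertain",
  "1003", "not in scope", "out of scope"
)

# Under the proposed definition, borderline is TRUE only when the match
# is uncertain; perfect matches and out-of-scope codes are both FALSE.
toy <- mutate(toy, borderline = match == "uncertain")
toy
```

Note that FALSE alone does not say whether a code is in scope; that has to be read together with `sector`.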

maurolepore added a commit that referenced this issue Oct 3, 2020
Closes #163

I included the script in the git repo so we can find it later if
we need it.
@georgeharris2deg
Author

@jdhoffa - Great, thanks. Yes, that was my understanding as well.

My question, then, is why the NACE bridge has no rows with borderline = FALSE and an in-scope sector.
This would indicate that no NACE code is definitely in scope.
Instead, every borderline = FALSE row is an out-of-scope code.

Thanks

(NB: I haven't checked the other codes, but I suspect it's the same.)

@maurolepore
Contributor

Here are all the combinations of borderline and sector for all datasets. @georgeharris2deg, do you see any problem?

library(r2dii.data)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)

r2dii.data:::enlist_datasets("r2dii.data") %>% 
  keep(~hasName(.x, "borderline")) %>% 
  map(~count(.x, borderline, sector)) %>% 
  map(~arrange(.x, borderline, sector))
#> $gics_classification
#> # A tibble: 14 x 3
#>    borderline sector           n
#>    <lgl>      <chr>        <int>
#>  1 FALSE      automotive       3
#>  2 FALSE      aviation         4
#>  3 FALSE      coal             1
#>  4 FALSE      not in scope   229
#>  5 FALSE      oil and gas      4
#>  6 FALSE      power            6
#>  7 FALSE      shipping         2
#>  8 FALSE      steel            1
#>  9 TRUE       automotive       4
#> 10 TRUE       cement           2
#> 11 TRUE       coal             1
#> 12 TRUE       oil and gas      4
#> 13 TRUE       shipping         1
#> 14 TRUE       steel            1
#> 
#> $isic_classification
#> # A tibble: 9 x 3
#>   borderline sector           n
#>   <lgl>      <chr>        <int>
#> 1 FALSE      not in scope   695
#> 2 TRUE       automotive      18
#> 3 TRUE       aviation         7
#> 4 TRUE       cement           2
#> 5 TRUE       coal             9
#> 6 TRUE       oil and gas     13
#> 7 TRUE       power            4
#> 8 TRUE       shipping         9
#> 9 TRUE       steel           11
#> 
#> $nace_classification
#> # A tibble: 9 x 3
#>   borderline sector           n
#>   <lgl>      <chr>        <int>
#> 1 FALSE      not in scope   840
#> 2 TRUE       automotive       8
#> 3 TRUE       aviation         7
#> 4 TRUE       cement           9
#> 5 TRUE       coal             7
#> 6 TRUE       oil and gas     16
#> 7 TRUE       power           10
#> 8 TRUE       shipping        12
#> 9 TRUE       steel           20
#> 
#> $naics_classification
#> # A tibble: 11 x 3
#>    borderline sector           n
#>    <lgl>      <chr>        <int>
#>  1 FALSE      automotive       3
#>  2 FALSE      aviation         5
#>  3 FALSE      cement           1
#>  4 FALSE      coal             3
#>  5 FALSE      not in scope  1024
#>  6 FALSE      oil and gas      2
#>  7 FALSE      power            8
#>  8 FALSE      shipping         3
#>  9 FALSE      steel            3
#> 10 TRUE       power            2
#> 11 TRUE       steel            3
#> 
#> $sector_classifications
#> # A tibble: 17 x 3
#>    borderline sector           n
#>    <lgl>      <chr>        <int>
#>  1 FALSE      automotive       7
#>  2 FALSE      aviation        10
#>  3 FALSE      cement           1
#>  4 FALSE      coal             5
#>  5 FALSE      not in scope  2994
#>  6 FALSE      oil and gas      9
#>  7 FALSE      power           15
#>  8 FALSE      shipping         6
#>  9 FALSE      steel            6
#> 10 TRUE       automotive      32
#> 11 TRUE       aviation        15
#> 12 TRUE       cement          15
#> 13 TRUE       coal            17
#> 14 TRUE       oil and gas     37
#> 15 TRUE       power           19
#> 16 TRUE       shipping        22
#> 17 TRUE       steel           36
#> 
#> $sic_classification
#> # A tibble: 14 x 3
#>    borderline sector           n
#>    <lgl>      <chr>        <int>
#>  1 FALSE      automotive       1
#>  2 FALSE      aviation         1
#>  3 FALSE      coal             1
#>  4 FALSE      not in scope   233
#>  5 FALSE      oil and gas      3
#>  6 FALSE      power            1
#>  7 FALSE      shipping         1
#>  8 FALSE      steel            2
#>  9 TRUE       automotive       2
#> 10 TRUE       aviation         1
#> 11 TRUE       cement           2
#> 12 TRUE       oil and gas      4
#> 13 TRUE       power            3
#> 14 TRUE       steel            1

Created on 2020-10-07 by the reprex package (v0.3.0)

@georgeharris2deg
Author

Hi Mauro,
thanks for showing that breakdown and changing the description.
The description is now very clear.

Based on your example above, I would say that the NACE and ISIC bridges are wrong (in the borderline column).

There should be a FALSE for every sector. In ISIC and NACE this is not the case, implying that no code is a perfect (in-scope) match.

This isn't an issue for running the analysis - however, I see a potential issue when assessing coverage using the method designed by @jdhoffa here

unless I am missing something?

I will try to find the time to review all the bridges and let you know when I do.
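
A minimal sketch of how such a review might start, assuming a toy bridge with invented codes (the name `bridge` and its rows are hypothetical). It lists the in-scope sectors that have no row with borderline = FALSE:

```r
library(dplyr)

# Hypothetical bridge for illustration; the real bridges are the
# *_classification datasets in r2dii.data.
bridge <- tibble::tribble(
  ~code,  ~sector,        ~borderline,
  "0001", "power",        FALSE,
  "0002", "power",        TRUE,
  "0003", "steel",        TRUE,
  "0004", "not in scope", FALSE
)

# Keep in-scope rows, count the FALSE flags per sector, and return the
# sectors where that count is zero.
sectors_without_false <- bridge %>%
  filter(sector != "not in scope") %>%
  group_by(sector) %>%
  summarise(n_false = sum(!borderline), .groups = "drop") %>%
  filter(n_false == 0) %>%
  pull(sector)

sectors_without_false
#> [1] "steel"
```

Replacing `bridge` with a real dataset such as `r2dii.data::nace_classification` should apply the same check to that bridge, provided it has `sector` and `borderline` columns like the ones shown above.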

Thanks again

@maurolepore
Contributor

maurolepore commented Oct 9, 2020

Reopening.

@jdhoffa can you please address this comment?:

Based on your example above, I would say that the NACE and ISIC bridges are wrong (in the borderline column).

There should be a FALSE for every sector. In ISIC and NACE this is not the case, implying that no code is a perfect (in-scope) match.

@maurolepore maurolepore reopened this Oct 9, 2020
@jdhoffa
Member

jdhoffa commented Oct 12, 2020

Looking into it now!

maurolepore pushed a commit that referenced this issue Oct 13, 2020
Closes #163
Signed-of-by: Mauro Lepore <maurolepore@gmail.com>