Borderline col in classification Bridges #163
Comments
Good point @georgeharris2deg, I think this is confusing because the documentation of
It uses "borderline" to define "borderline". It seems that
dplyr::filter(r2dii.data::sic_classification, borderline == TRUE)
#> # A tibble: 13 x 4
#> code description sector borderline
#> <chr> <chr> <chr> <lgl>
#> 1 33210 petrol, fuel oils, lubricating oils and greases, p… oil and… TRUE
#> 2 33220 petrol, fuel oils, lubricating oils and greases, p… oil and… TRUE
#> 3 33230 petrol, fuel oils, lubricating oils and greases, p… oil and… TRUE
#> 4 33290 other petroleum/synthesised products n.e.c. oil and… TRUE
#> 5 34240 manufacture of cement, lime and plaster cement TRUE
#> 6 34250 manufacture of articles of concrete, cement and pl… cement TRUE
#> 7 35310 casting of iron and steel steel TRUE
#> 8 36100 manufacture of electric motors, generators and tra… power TRUE
#> 9 36200 manufacture of electricity distribution and contro… power TRUE
#> 10 38200 manufacture of bodies (coachwork) for motor vehicl… automot… TRUE
#> 11 38309 manufacture of other motor vehicle parts and acces… automot… TRUE
#> 12 50320 electrical contracting power TRUE
#> 13 73000 air transport aviation TRUE
Created on 2020-10-02 by the reprex package (v0.3.0)
@jdhoffa, could you unpack the definition of
@georgeharris2deg so I have defined borderline as indicating "maybe". I hope that makes sense.
@jdhoffa and @georgeharris2deg, based on your comments above I propose to rewrite the definition of
Does this make sense? Do you have any edits to make?
Closes #163 I included the script in the git repo so we can find it later if we need it.
@jdhoffa - Great, thanks, and yes, that was my understanding as well. My question, then, is why the NACE bridge has no rows where borderline = FALSE and the sector is in scope. Thanks. (NB: I haven't checked the other codes, but I suspect it's the same.)
Here are all the combinations between
library(r2dii.data)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
r2dii.data:::enlist_datasets("r2dii.data") %>%
  keep(~ hasName(.x, "borderline")) %>%
  map(~ count(.x, borderline, sector)) %>%
  map(~ arrange(.x, borderline, sector))
#> $gics_classification
#> # A tibble: 14 x 3
#> borderline sector n
#> <lgl> <chr> <int>
#> 1 FALSE automotive 3
#> 2 FALSE aviation 4
#> 3 FALSE coal 1
#> 4 FALSE not in scope 229
#> 5 FALSE oil and gas 4
#> 6 FALSE power 6
#> 7 FALSE shipping 2
#> 8 FALSE steel 1
#> 9 TRUE automotive 4
#> 10 TRUE cement 2
#> 11 TRUE coal 1
#> 12 TRUE oil and gas 4
#> 13 TRUE shipping 1
#> 14 TRUE steel 1
#>
#> $isic_classification
#> # A tibble: 9 x 3
#> borderline sector n
#> <lgl> <chr> <int>
#> 1 FALSE not in scope 695
#> 2 TRUE automotive 18
#> 3 TRUE aviation 7
#> 4 TRUE cement 2
#> 5 TRUE coal 9
#> 6 TRUE oil and gas 13
#> 7 TRUE power 4
#> 8 TRUE shipping 9
#> 9 TRUE steel 11
#>
#> $nace_classification
#> # A tibble: 9 x 3
#> borderline sector n
#> <lgl> <chr> <int>
#> 1 FALSE not in scope 840
#> 2 TRUE automotive 8
#> 3 TRUE aviation 7
#> 4 TRUE cement 9
#> 5 TRUE coal 7
#> 6 TRUE oil and gas 16
#> 7 TRUE power 10
#> 8 TRUE shipping 12
#> 9 TRUE steel 20
#>
#> $naics_classification
#> # A tibble: 11 x 3
#> borderline sector n
#> <lgl> <chr> <int>
#> 1 FALSE automotive 3
#> 2 FALSE aviation 5
#> 3 FALSE cement 1
#> 4 FALSE coal 3
#> 5 FALSE not in scope 1024
#> 6 FALSE oil and gas 2
#> 7 FALSE power 8
#> 8 FALSE shipping 3
#> 9 FALSE steel 3
#> 10 TRUE power 2
#> 11 TRUE steel 3
#>
#> $sector_classifications
#> # A tibble: 17 x 3
#> borderline sector n
#> <lgl> <chr> <int>
#> 1 FALSE automotive 7
#> 2 FALSE aviation 10
#> 3 FALSE cement 1
#> 4 FALSE coal 5
#> 5 FALSE not in scope 2994
#> 6 FALSE oil and gas 9
#> 7 FALSE power 15
#> 8 FALSE shipping 6
#> 9 FALSE steel 6
#> 10 TRUE automotive 32
#> 11 TRUE aviation 15
#> 12 TRUE cement 15
#> 13 TRUE coal 17
#> 14 TRUE oil and gas 37
#> 15 TRUE power 19
#> 16 TRUE shipping 22
#> 17 TRUE steel 36
#>
#> $sic_classification
#> # A tibble: 14 x 3
#> borderline sector n
#> <lgl> <chr> <int>
#> 1 FALSE automotive 1
#> 2 FALSE aviation 1
#> 3 FALSE coal 1
#> 4 FALSE not in scope 233
#> 5 FALSE oil and gas 3
#> 6 FALSE power 1
#> 7 FALSE shipping 1
#> 8 FALSE steel 2
#> 9 TRUE automotive 2
#> 10 TRUE aviation 1
#> 11 TRUE cement 2
#> 12 TRUE oil and gas 4
#> 13 TRUE power 3
#> 14 TRUE steel 1
Created on 2020-10-07 by the reprex package (v0.3.0)
Hi Mauro,
Based on your example above, I would say that the NACE and ISIC bridges are wrong in the borderline column. There should be a FALSE for every sector. In ISIC and NACE this is not the case, which implies that no code is perfectly in scope.
This isn't an issue for running the analysis. However, I see a potential issue when assessing coverage using the method designed by @jdhoffa here, unless I am missing something?
I will try to find the time to review all the bridges and let you know when I do.
Thanks again
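The check described above (every in-scope sector should have at least one borderline == FALSE row) could be sketched roughly as follows. This is only an illustration in base R: the `bridge` data frame is made-up toy data that mimics the columns of a classification bridge, and `sectors_without_false()` is a hypothetical helper, not part of r2dii.data.

```r
# Toy data (hypothetical): same column names as a classification bridge.
bridge <- data.frame(
  code = c("3456", "3457", "9999", "1000"),
  sector = c("automotive", "automotive", "not in scope", "steel"),
  borderline = c(FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list the in-scope sectors that have no row with
# borderline == FALSE, i.e. no code that is flagged as perfectly in scope.
sectors_without_false <- function(data) {
  in_scope <- data[data$sector != "not in scope", ]
  # For each sector, is there at least one row with borderline == FALSE?
  has_false <- tapply(!in_scope$borderline, in_scope$sector, any)
  names(has_false)[!has_false]
}

sectors_without_false(bridge)
```

With the toy data, only "steel" lacks a borderline == FALSE row. Passing one of the package datasets (for example `r2dii.data::nace_classification`) in place of `bridge` should work the same way, assuming it keeps the `sector` and `borderline` columns shown in the output above.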
Reopening. @jdhoffa can you please address this comment?:
Looking into it now!
Closes #163 Signed-off-by: Mauro Lepore <maurolepore@gmail.com>
My understanding is that the Borderline column should flag as FALSE when the sector classification code is perfectly in scope, e.g.
- 3456, auto manufacture, FALSE
- 3457, auto parts, TRUE
This is not the case - instead, all FALSE borderline flags are for sectors not in scope.
Perhaps my understanding is incorrect?
Thanks - George