Investigate why some cells have no values #58

Closed
Robsteranium opened this issue Apr 14, 2021 · 8 comments
Labels
bug Something isn't working data Related to some underlying/upstream data issue question Further information is requested

Comments

@Robsteranium
Contributor

With all geo codelists selected, some rows (2 and 5) have no cell values in the geo column.

Not sure how this could be. Maybe their dimension points to a codelist but they don't use any of its values...

@Robsteranium Robsteranium added the question Further information is requested label Apr 14, 2021
@Robsteranium Robsteranium self-assigned this Apr 15, 2021
@Robsteranium
Contributor Author

This is caused by some codes belonging to more than one scheme, e.g.

{
  "@id": "http://data.europa.eu/nuts/code/UKC",
  "label": "NORTH EAST (ENGLAND)",
  "scheme": [
    "http://data.europa.eu/nuts/scheme/2010",
    "http://data.europa.eu/nuts/scheme/2016",
    "http://data.europa.eu/nuts/scheme/2013",
    "data/gss_data/trade/ons-international-trade-in-services-by-subnational-areas-of-the-uk#scheme/location",
    "data/gss_data/trade/ons-international-exports-of-services-from-subnational-areas-of-the-uk#scheme/service-origin-geography",
    "data/gss_data/trade/ons-quarterly-country-and-regional-gdp#scheme/reference-area"
  ]
}

This breaks the assumption that we can group codes by codelist in the cells.

We could, of course, still do this, but then the same UKC code would appear six times (once under each scheme). Indeed, we can already see the same code being used in other dataset-specific schemes in other rows; that's the very purpose of the table!
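A minimal Python sketch of the duplication, assuming codes arrive as JSON-like maps shaped like the UKC example above (the helper `group_by_scheme` is hypothetical, not the project's code): listing each code under every scheme it belongs to repeats UKC once per scheme.

```python
from collections import defaultdict

# Code map copied from the example above; only the fields used here are kept.
UKC = {
    "@id": "http://data.europa.eu/nuts/code/UKC",
    "label": "NORTH EAST (ENGLAND)",
    "scheme": [
        "http://data.europa.eu/nuts/scheme/2010",
        "http://data.europa.eu/nuts/scheme/2016",
        "http://data.europa.eu/nuts/scheme/2013",
        "data/gss_data/trade/ons-international-trade-in-services-by-subnational-areas-of-the-uk#scheme/location",
        "data/gss_data/trade/ons-international-exports-of-services-from-subnational-areas-of-the-uk#scheme/service-origin-geography",
        "data/gss_data/trade/ons-quarterly-country-and-regional-gdp#scheme/reference-area",
    ],
}


def group_by_scheme(codes):
    """Hypothetical helper: list each code's label under every scheme it belongs to."""
    groups = defaultdict(list)
    for code in codes:
        for scheme in code["scheme"]:
            groups[scheme].append(code["label"])
    return dict(groups)


groups = group_by_scheme([UKC])
# Every group holds the same label, so the cell would repeat UKC six times.
print(sum(len(labels) for labels in groups.values()))  # 6
```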

We could try to filter the list of schemes to those that are relevant, e.g. removing other datasets' dataset-specific schemes. Even if we could easily determine this, we would still have multiple harmonised schemes (here, one per NUTS version). This might be useful information, but it's not particularly relevant to dataset search/comparison, because the filters themselves already express everything the user cares about regarding codelist versions (i.e. whether their code of interest is present).
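One way to sketch that filter, under a hypothetical rule: treat any scheme URI of the form `<dataset>#scheme/...` as dataset-specific and keep it only for the dataset being shown, while keeping every other scheme as harmonised. The rule and the `relevant_schemes` name are assumptions, not the project's logic. As noted above, the three NUTS versions still survive the filter.

```python
NUTS = [
    "http://data.europa.eu/nuts/scheme/2010",
    "http://data.europa.eu/nuts/scheme/2016",
    "http://data.europa.eu/nuts/scheme/2013",
]
GDP = "data/gss_data/trade/ons-quarterly-country-and-regional-gdp"
SCHEMES = NUTS + [
    "data/gss_data/trade/ons-international-trade-in-services-by-subnational-areas-of-the-uk#scheme/location",
    "data/gss_data/trade/ons-international-exports-of-services-from-subnational-areas-of-the-uk#scheme/service-origin-geography",
    GDP + "#scheme/reference-area",
]


def relevant_schemes(schemes, dataset_uri):
    """Drop other datasets' dataset-specific schemes (hypothetical URI rule)."""
    kept = []
    for scheme in schemes:
        base, sep, fragment = scheme.partition("#")
        # Assumption: '<dataset>#scheme/...' marks a dataset-specific scheme.
        dataset_specific = bool(sep) and fragment.startswith("scheme/")
        if not dataset_specific or base == dataset_uri:
            kept.append(scheme)
    return kept


kept = relevant_schemes(SCHEMES, GDP)
print(len(kept))  # 4: three NUTS versions plus the GDP dataset's own scheme
```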

We might just need to remove the codelist grouping altogether. This grouping is less important given that mixing schemes within a dataset will be rarer than mixing them between datasets. We could still provide this information (e.g. in a popover) without using it to structure the layout. Instead, we'd just show an ellipsised list of codes.
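A sketch of the flat, ellipsised cell, assuming the cell only needs code labels (or notations) and some display limit; `ellipsise` and its default `limit=3` are hypothetical choices for illustration.

```python
def ellipsise(labels, limit=3):
    """Join de-duplicated labels in order, truncating after `limit` with an ellipsis."""
    unique = list(dict.fromkeys(labels))  # dict keys preserve insertion order
    if len(unique) <= limit:
        return ", ".join(unique)
    return ", ".join(unique[:limit]) + ", …"


# Notations taken from this issue; the limit of 3 is arbitrary.
print(ellipsise(["UKC", "UKC", "UKC1", "UKC2", "UK"]))  # UKC, UKC1, UKC2, …
```

De-duplicating before truncating means a code that sits in several schemes is shown only once, which is the point of dropping the codelist grouping.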

The facet match would then have the codelist level removed, looking instead like:

{:facets
  ({:name "Geography",
    :dimensions
    ({:ook/uri
      "data/gss_data/trade/ons-quarterly-country-and-regional-gdp#dimension/reference-area",
      :codes
      ({:ook/uri "http://data.europa.eu/nuts/code/UKC",
        :ook/type "skos:Concept",
        :priority ["2" "6"],
        :label "NORTH EAST (ENGLAND)",
        :narrower
        ["http://data.europa.eu/nuts/code/UKC1"
         "http://data.europa.eu/nuts/code/UKC2"],
        :broader
        ["http://data.europa.eu/nuts/code/UK"
         "data/gss_data/trade/international-trade-in-services-by-subnational-areas-of-the-uk#concept-scheme/location/nuts"],
        :notation "UKC",
        :scheme
        ["http://data.europa.eu/nuts/scheme/2010"
         "http://data.europa.eu/nuts/scheme/2016"
         "http://data.europa.eu/nuts/scheme/2013"
         "data/gss_data/trade/ons-international-trade-in-services-by-subnational-areas-of-the-uk#scheme/location"
         "data/gss_data/trade/ons-international-exports-of-services-from-subnational-areas-of-the-uk#scheme/service-origin-geography"
         "data/gss_data/trade/ons-quarterly-country-and-regional-gdp#scheme/reference-area"],
        :used "false"})})})}

In fact we might like to enrich this with codelist labels if we're going to show them in a popover.
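A hedged sketch of that enrichment, assuming a URI-to-label lookup for schemes is available; the `scheme_labels` key, the lookup, and the "NUTS 2016" label are placeholders, not data from the project.

```python
def with_scheme_labels(code, scheme_labels):
    """Copy a code map and attach display labels for its schemes.

    `scheme_labels` is a hypothetical URI -> label lookup; unlabelled URIs
    fall back to the URI itself so the popover never shows a gap.
    """
    enriched = dict(code)
    enriched["scheme_labels"] = [scheme_labels.get(uri, uri) for uri in code["scheme"]]
    return enriched


code = {
    "uri": "http://data.europa.eu/nuts/code/UKC",
    "scheme": [
        "http://data.europa.eu/nuts/scheme/2016",
        "data/gss_data/trade/ons-quarterly-country-and-regional-gdp#scheme/reference-area",
    ],
}
# Placeholder label, for illustration only.
labels = {"http://data.europa.eu/nuts/scheme/2016": "NUTS 2016"}
print(with_scheme_labels(code, labels)["scheme_labels"])
```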

@Robsteranium
Contributor Author

OK, working this through... it gets confusing because you can mix schemes within a facet even with a 1:1 dimension-to-codelist mapping, since a facet combines several dimensions. We can distinguish these by using the dimension as the grouping variable (rather than the codelist, as originally planned).
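A minimal sketch of grouping by dimension, assuming the search yields matched (dimension, code) pairs: a code then appears once per dimension, however many schemes it belongs to. The GDP dimension URI is from the facet match above; the second dimension URI and the `group_by_dimension` helper are hypothetical.

```python
from collections import defaultdict

GDP_DIM = "data/gss_data/trade/ons-quarterly-country-and-regional-gdp#dimension/reference-area"
# Hypothetical second dimension combined into the same "Geography" facet.
SERVICES_DIM = "data/gss_data/trade/ons-international-trade-in-services-by-subnational-areas-of-the-uk#dimension/location"
UKC = "http://data.europa.eu/nuts/code/UKC"


def group_by_dimension(matches):
    """Group matched (dimension, code) pairs by dimension, keeping each code once."""
    groups = defaultdict(list)
    for dimension, code in matches:
        if code not in groups[dimension]:
            groups[dimension].append(code)
    return dict(groups)


matches = [(GDP_DIM, UKC), (GDP_DIM, UKC), (SERVICES_DIM, UKC)]
print(group_by_dimension(matches))
```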

@Robsteranium
Contributor Author

We now use the dimension as the grouping variable and have lifted the query size limits. This seems to fill most of the blanks, but some remain.

e.g. this search for Germany doesn't seem to include an example code for the "ONS UK total trade" dataset. The count is correct (it filters observations for Germany), but the cell is blank and the link is wrong.

@Robsteranium
Contributor Author

This can sometimes be caused by sparsity, e.g. this search shows a dataset which does include "BOP Services" and "Exports", but the first-matched observation for "BOP Services: Net financial transactions" doesn't match "Flow: Exports".

There may sometimes be no single observation that has both, or it might be that the collapse just doesn't happen to find one with both (which might be solved by #52).
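To illustrate the difference, here is a hypothetical Python sketch contrasting a first match on a single code with a search for an observation that matches every selected code. When the data is sparse, the second returns `None`, which corresponds to the blank-cell case. The function names and toy observations are assumptions; this is not how the collapse or #52 is implemented.

```python
def first_match(observations, dimension, code):
    """Naive lookup: the first observation with `code` on a single `dimension`."""
    return next((obs for obs in observations if obs.get(dimension) == code), None)


def example_observation(observations, selected):
    """First observation matching *every* selected (dimension, code), else None."""
    return next(
        (
            obs
            for obs in observations
            if all(obs.get(dim) == code for dim, code in selected.items())
        ),
        None,
    )


# Toy observations (hypothetical data) using the dimension and code names above.
observations = [
    {"BOP Services": "Net financial transactions", "Flow": "Imports"},
    {"BOP Services": "Net financial transactions", "Flow": "Exports"},
]
selected = {"BOP Services": "Net financial transactions", "Flow": "Exports"}

print(first_match(observations, "BOP Services", "Net financial transactions")["Flow"])  # Imports
print(example_observation(observations, selected)["Flow"])  # Exports
```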

@Robsteranium Robsteranium added the bug Something isn't working label Apr 22, 2021
@kirahowe kirahowe added the data Related to some underlying/upstream data issue label Apr 23, 2021
@Robsteranium
Contributor Author

I've got a draft implementation for #52, which doesn't appear to solve either of the above two cases ☹️

@Robsteranium
Contributor Author

I've recreated the example from above with all geo codelists selected using the latest data from the beta environment. Now all the cells are populated.

@Robsteranium
Contributor Author

Redoing the above example for Germany with the new data confirms this is still a problem.

@Robsteranium
Contributor Author

Each of the previous examples is now solved in #68 (which mostly consists of increasing the default query size from 10).
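A hypothetical sketch of why a small default size can give a correct count but a blank cell: if counts are computed over all matches while example codes come only from the hits a size-capped query returns, a dataset whose hits fall past the cap gets no example. (Elasticsearch's search API, for instance, returns 10 hits unless `size` is set; whether that is the mechanism here is an assumption.) The datasets and hits below are made up.

```python
from collections import Counter

DEFAULT_SIZE = 10  # the old default query size mentioned above


def search(hits, size=DEFAULT_SIZE):
    """Stand-in for a search backend that returns at most `size` hits."""
    return hits[:size]


def cells(hits, size=DEFAULT_SIZE):
    """Example codes per dataset, built only from the hits the query returned."""
    result = {}
    for hit in search(hits, size):
        result.setdefault(hit["dataset"], []).append(hit["code"])
    return result


# Hypothetical hits: 12 for one dataset, then a single hit for another.
hits = [{"dataset": "dataset-a", "code": f"code-{i}"} for i in range(12)]
hits.append({"dataset": "dataset-b", "code": "code-x"})

counts = Counter(hit["dataset"] for hit in hits)  # counted over all hits
print(counts["dataset-b"])  # 1: the count is right...
print("dataset-b" in cells(hits))  # False: ...but its cell would be blank
print("dataset-b" in cells(hits, size=100))  # True once the size is raised
```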

One example was due to the child dimension not being tied to the facet's parent dimension via rdfs:subPropertyOf.
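A sketch of the kind of membership check this implies, assuming a dimension belongs to a facet when it reaches the facet's parent dimension through rdfs:subPropertyOf (which RDFS treats as transitive). The property URIs and the `reaches` helper are hypothetical; the second dataset's dimension shows the missing link.

```python
def reaches(prop, target, sub_property_of):
    """True if `prop` is `target` or reaches it through rdfs:subPropertyOf links."""
    seen = set()
    stack = [prop]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(sub_property_of.get(current, []))
    return False


# Hypothetical property URIs: child -> list of rdfs:subPropertyOf parents.
sub_property_of = {
    "ex:dataset-a/reference-area": ["ex:area"],
    "ex:area": ["ex:geography"],
    "ex:dataset-b/location": [],  # never tied to the facet's parent dimension
}

print(reaches("ex:dataset-a/reference-area", "ex:geography", sub_property_of))  # True
print(reaches("ex:dataset-b/location", "ex:geography", sub_property_of))  # False
```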

Closing for now but we can re-open if new examples appear.
