
On import update modified values #309

Open · alexandru-m-g opened this issue Nov 14, 2014 · 9 comments

@alexandru-m-g
Member

Right now we only import new indicator values; we do not update values that were modified since the last import.

CHANGES:

  1. At import time, if an indicator value already exists with the same source, indicator type, entity (and entity type), periodicity, and start time BUT has a different value, THEN we need to update it and mark it as modified by the current import (see the sketch below)
  2. Add a new feature that allows a user to REMOVE all the values for a specific dataseries (i.e. for a specific indicator type + source)
  3. DISABLE the feature that allows a user to delete all values from a certain import

This is related to #278
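
A minimal sketch of point (1), assuming a hypothetical `indicator_value` table whose natural key is (source, indicator type, entity, entity type, periodicity, start time); the table and column names are illustrative, not the actual CPS schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE indicator_value (
        source TEXT, indicator_type TEXT, entity TEXT, entity_type TEXT,
        periodicity TEXT, start_time TEXT, value TEXT,
        modified_by_import INTEGER,
        PRIMARY KEY (source, indicator_type, entity, entity_type,
                     periodicity, start_time)
    )
""")

KEY_WHERE = ("source=? AND indicator_type=? AND entity=? AND entity_type=?"
             " AND periodicity=? AND start_time=?")

def import_value(row, import_id):
    """Insert a new indicator value, or update it when the natural key
    already exists with a different value, recording which import
    modified it."""
    key = (row["source"], row["indicator_type"], row["entity"],
           row["entity_type"], row["periodicity"], row["start_time"])
    existing = conn.execute(
        "SELECT value FROM indicator_value WHERE " + KEY_WHERE,
        key).fetchone()
    if existing is None:
        conn.execute("INSERT INTO indicator_value VALUES (?,?,?,?,?,?,?,?)",
                     key + (row["value"], import_id))
    elif existing[0] != row["value"]:
        conn.execute("UPDATE indicator_value SET value=?, modified_by_import=?"
                     " WHERE " + KEY_WHERE, (row["value"], import_id) + key)

# Example call shape; re-running with a changed value updates the row
# in place and tags it with the importing run's id.
import_value({"source": "WB", "indicator_type": "population",
              "entity": "RO", "entity_type": "country",
              "periodicity": "yearly", "start_time": "2014-01-01",
              "value": "19900000"}, import_id=42)
```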

@alexandru-m-g
Member Author

I have committed the change allowing the update of modified values: point (1) in the list of changes above.

@alexandru-m-g
Member Author

I'm adding more detailed requirements about (2) - deleting dataseries values, per a discussion with @cjhendrix:

  1. From the main menu:
    • In the Curated data menu, in the first section, there should be a new entry like Manage dataseries values
    • This will show the user a table (similar to the one we have for indicator types) with: indicator type, source, number of values, and a delete/empty action for each of the dataseries
  2. During the import process:
    • A similar action should be part of the import process. Basically, during the validation step (after going to Detected CKAN resources and clicking download and validate), an option should be shown called Manage dataseries values
    • This will open, in a new window, a table similar to the one from (1): it will show information from the CPS database about the dataseries that are about to be imported
    • This will allow the data team to completely remove dataseries values before they are reimported (see the sketch after this list)
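
For reference, the table behind Manage dataseries values boils down to one aggregation per dataseries, plus the delete/empty action; a sketch reusing the `conn` and illustrative schema from the first sketch above:

```python
# One row per dataseries (indicator type + source) with its value count,
# feeding the Manage dataseries values table.
for indicator_type, source, value_count in conn.execute(
        "SELECT indicator_type, source, COUNT(*)"
        " FROM indicator_value GROUP BY indicator_type, source"):
    print(indicator_type, source, value_count)

def delete_dataseries(indicator_type, source):
    # The per-row delete/empty action: remove every value of one dataseries.
    conn.execute("DELETE FROM indicator_value"
                 " WHERE indicator_type=? AND source=?",
                 (indicator_type, source))
```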

Caveat:
If we go this way, there is a moment when there is no data in the database for a dataseries: the moment between the deletion of the dataseries' values and the reimporting of these from the file. This could lead to some charts showing up empty OR to users calling the API directly and getting incomplete data.

One solution could be to:

  1. at first, not actually delete the indicator values for a dataseries, but instead mark them as to be deleted
  2. then, when importing new values, mark them as not yet activated; these values should not be used in the API or reports yet
  3. when the import finishes, run -- inside a single transaction -- an action that deletes the to be deleted values and activates the not yet activated values (see the sketch after this list)
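
A minimal sketch of step 3, assuming the illustrative schema above gains a `status` column with values 'active', 'to_be_deleted', and 'not_yet_activated'; running both statements in one transaction means API readers see either the old series or the new one, never the gap:

```python
def finalize_import(conn):
    """Run once the import has finished successfully: drop the values
    staged for deletion and activate the freshly imported ones."""
    with conn:  # a sqlite3 connection commits (or rolls back) as one transaction
        conn.execute(
            "DELETE FROM indicator_value WHERE status='to_be_deleted'")
        conn.execute(
            "UPDATE indicator_value SET status='active'"
            " WHERE status='not_yet_activated'")
```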

@cjhendrix @seustachi we should discuss how to proceed on this issue

@cjhendrix
Contributor

Thanks for documenting this, Alex. I'll add it to the CPS planning doc.

As for the caveat, I have a couple of questions:

  1. I know on the reports side, we would avoid most data gaps because of the caching that is in place. However, I'm not sure that's true on the API side. Is there any caching on CPS or CKAN or nginx that would make a gap in data availability unlikely?
  2. How much more effort is it to implement the "staged delete" solution you describe above?

@seustachi
Contributor

It doesn't sound like an enormous effort.

The model should be changed to have this new status in the unique key as well.
I can work on this issue as well; @alexandru-m-g @cjhendrix, up to you.

@cjhendrix
Contributor

Then I suggest we build it to avoid gaps in data availability, either as Alex suggested or some other approach.

@alexandru-m-g
Member Author

I don't think it's a huge effort; I'd estimate about 1 day.

On the plus side, the solution suggested above would solve another problem we have: during import (which takes some time in our case), someone accessing the API might get incomplete data, and the result could also get cached.
One thing that needs to be taken care of during implementation, though, is updating/modifying indicator values in the import process. In order to keep data consistency:

  1. If a value is found that needs to be updated, it should be marked as to be deleted
  2. Instead of updating the value that was found, a new value needs to be inserted and marked as not yet activated (see the sketch after this list)
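
A minimal sketch of these two steps, again with the illustrative schema and assuming `status` is part of the unique key (as @seustachi noted above) so the old and new rows can coexist:

```python
def stage_update(conn, row, import_id):
    """Staged replacement for an in-place update: readers keep seeing
    the old value until finalize_import() swaps the rows atomically."""
    key = (row["source"], row["indicator_type"], row["entity"],
           row["entity_type"], row["periodicity"], row["start_time"])
    where = ("source=? AND indicator_type=? AND entity=? AND entity_type=?"
             " AND periodicity=? AND start_time=?")
    old = conn.execute("SELECT value FROM indicator_value WHERE " + where +
                       " AND status='active'", key).fetchone()
    if old is not None and old[0] != row["value"]:
        # (1) Mark the existing row as "to be deleted" instead of updating it.
        conn.execute("UPDATE indicator_value SET status='to_be_deleted'"
                     " WHERE " + where + " AND status='active'", key)
        # (2) Insert the replacement as "not yet activated"; it stays out of
        # the API and reports until the import finishes.
        conn.execute(
            "INSERT INTO indicator_value (source, indicator_type, entity,"
            " entity_type, periodicity, start_time, value,"
            " modified_by_import, status)"
            " VALUES (?,?,?,?,?,?,?,?,'not_yet_activated')",
            key + (row["value"], import_id))
```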

@cjhendrix
Contributor

Sounds like the plan is clear then. Let's implement a solution like the one you described that avoids any gaps in data availability.

@seustachi
Contributor

@alexandru-m-g @cjhendrix is this done, or should it roll to the next sprint?

@alexandru-m-g
Member Author

There are still some tasks that were not done in this ticket:

  1. Add a new feature that allows a user to REMOVE all the values for a specific dataseries (for a specific indicator type + source)
  2. Implement a solution for the caveat, possibly the staged delete described above (this is related to the previous point)
  3. DISABLE the feature that allows a user to delete all values from a certain import

@cjhendrix cjhendrix modified the milestones: Sprint 47, Sprint 41, Sprint 48 Jan 26, 2015
@danmihaila danmihaila added the CPS label Jun 22, 2016