Create a ResourceField that allows exporting and importing related subsets of data #1375

pokken-magic · 2022-01-01T00:02:02Z

(This is fairly involved -- trying to get all the random ideas I had on this down before I forget them)

A Field created with a reference to a Resource could allow exporting an entire related model, which would (potentially) enable exporting subsets of data to arbitrary depth, since a ResourceField could itself have other ResourceFields.

So if you had a Book with Publishers and Authors, you could export that Book and all of the Publishers and Authors necessary for it. ResourceFields would need to be imported first, which likely would require significant UI changes -- such as an additional workflow step for each level of depth (if Publishers have Representatives, you'd have to load Representatives before Publishers, then Publishers and Authors before Books).
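The load order described above is a topological sort of the resource dependencies. A minimal sketch using only the standard library; the `DEPENDS_ON` map and the `import_order` helper are hypothetical and not part of django-import-export:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each resource lists the resources it refers to.
DEPENDS_ON = {
    "Book": {"Publisher", "Author"},
    "Publisher": {"Representative"},
    "Author": set(),
    "Representative": set(),
}


def import_order(depends_on):
    """Return resource names so that every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


order = import_order(DEPENDS_ON)
```

Each level of depth simply becomes an earlier position in `order`, which is the sequence an importer (or a multi-step UI) would have to follow.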

The power here is that you could create a more consistent dataset with complex realistic models, without having to take multiple steps to synchronize all the dependencies -- you just export a Book, and everything needed for that Book comes with it.

pokken-magic · 2022-01-01T00:24:02Z

Simplest approach might be to require YAML or JSON for nested datasets; then you could fairly easily sequence the Resources in the order they need to be imported.

In Excel you could put each resource on a tab and import in tab order, and in CSV you could hypothetically use multiple segments, but it might not be worth the bother.

It could be feasible to zip and index subsets (e.g. 1_Author.csv, 2_Publisher.csv, 3_Book.csv), but it would be fairly ugly.
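For what it's worth, the zip-and-index idea can be sketched with the standard library alone. `write_bundle` and `read_bundle` are hypothetical helper names, and the rows are made-up sample data:

```python
import csv
import io
import zipfile


def write_bundle(tables):
    """Pack (name, header, rows) tables into a zip, prefixing each file with its position."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for position, (name, header, rows) in enumerate(tables, start=1):
            text = io.StringIO()
            writer = csv.writer(text)
            writer.writerow(header)
            writer.writerows(rows)
            archive.writestr(f"{position}_{name}.csv", text.getvalue())
    return buffer.getvalue()


def read_bundle(data):
    """Yield (filename, rows) in the order given by the numeric filename prefix."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = sorted(archive.namelist(), key=lambda n: int(n.split("_", 1)[0]))
        for filename in names:
            text = archive.read(filename).decode("utf-8")
            yield filename, list(csv.DictReader(io.StringIO(text)))


bundle = write_bundle([
    ("Author", ["id", "name"], [[1, "Ann"]]),
    ("Publisher", ["id", "name"], [[5, "Acme"]]),
    ("Book", ["id", "title", "author", "publisher"], [[9, "Example", 1, 5]]),
])
loaded = list(read_bundle(bundle))
```

Sorting on the integer prefix (rather than the raw filename) keeps the order right once there are ten or more files.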

pokken-magic · 2022-02-27T02:12:24Z

I am not sure when I’ll get to it but I think I have a pretty good idea of how to approach this design.

Tablib has a concept of a data book which seems like a good fit here.

So the thought is that a ResourceField (or maybe a widget) requires a foreign key widget and specifies the class of the resource at instantiation. When exporting a resource that has a resource field, export instead creates a data book and adds each resource field's resource as a sheet in the data book, ordered ahead of the primary resource sheet. It should be pretty trivial to filter each resource field's objects down to the ones referred to by the main dataset, thereby only bringing along the needed data.
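A rough sketch of that filtering and ordering step, with plain lists of dicts standing in for querysets and for Tablib sheets. `build_sheets` and the sample rows are hypothetical:

```python
def build_sheets(books, authors, publishers):
    """Return (title, rows) sheets: related sheets first, primary sheet last."""
    # Collect only the ids that the exported books actually refer to.
    author_ids = {author_id for book in books for author_id in book["authors"]}
    publisher_ids = {book["publisher"] for book in books}
    return [
        ("Author", [row for row in authors if row["id"] in author_ids]),
        ("Publisher", [row for row in publishers if row["id"] in publisher_ids]),
        ("Book", list(books)),
    ]


AUTHORS = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
PUBLISHERS = [{"id": 5, "name": "Acme"}, {"id": 6, "name": "Globex"}]
BOOKS = [{"id": 9, "title": "Example", "authors": [1, 3], "publisher": 5}]

sheets = build_sheets(BOOKS, AUTHORS, PUBLISHERS)
```

Here the unreferenced author and publisher are left out, and the Book sheet comes last so that an importer reading sheets in order sees its dependencies first.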

I need to play around with data books a little and see how the load and export functions work. My hope is we can update our importer to always use data books instead of datasets and thereby be able to import complete sequenced datasets (where resources can load in an order by dependency).

The ability to do complete, internally consistent exports would make import-export an invaluable QA tool for Django, where it is fairly difficult to work with subsets of data (dumpdata and loaddata are not very sophisticated for this purpose).

pokken-magic · 2022-03-25T21:39:16Z

I had to build this for work and it works beautifully, with the constraint that I could not figure out how to get it to spit out anything other than a json blob. I will see if I can get permission to post it, but the gist is:

I used a RelatedField to pass the parent object down to the widget, then a ResourceWidget that you give the field's resource_class on init, then --

- a render method that builds a queryset of whatever is in the field's RelatedManager, passes that to the resource_class's export method, and renders the result as JSON
- a clean method that creates a Tablib dataset from the JSON, then passes that to the resource_class's import_data method. Then you build a queryset from the RowResult.object_ids and return that
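The render/clean round trip can be illustrated without Django at all. This is only a sketch of the shape of the idea, not the code from work: `TinyResource` and `NestedJSONWidget` are hypothetical stand-ins, with lists of dicts in place of querysets and Tablib datasets.

```python
import json


class TinyResource:
    """Hypothetical stand-in for a resource: exports rows and imports them into a store."""

    def __init__(self):
        self.store = {}

    def export(self, rows):
        return [dict(row) for row in rows]

    def import_data(self, rows):
        """Save each row and return the ids that were imported."""
        ids = []
        for row in rows:
            self.store[row["id"]] = dict(row)
            ids.append(row["id"])
        return ids


class NestedJSONWidget:
    """Hypothetical widget that nests a related resource inside one cell as JSON."""

    def __init__(self, resource):
        self.resource = resource

    def render(self, related_rows):
        # Export the related rows through the resource, then serialize as JSON.
        return json.dumps(self.resource.export(related_rows))

    def clean(self, value):
        # Parse the JSON cell, import it, and return the objects that now exist.
        rows = json.loads(value) if value else []
        ids = self.resource.import_data(rows)
        return [self.resource.store[i] for i in ids]


authors = [{"id": 1, "name": "Ann"}, {"id": 3, "name": "Cy"}]
cell = NestedJSONWidget(TinyResource()).render(authors)

target = TinyResource()
restored = NestedJSONWidget(target).clean(cell)
```

The `cell` string is what would sit in the parent row's column on export; `clean` on the receiving side recreates the related objects before the parent row is saved.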

Possible it could be simplified to not need the custom field, but there's some magic in my RelatedField that makes it behave better.

Anyway the bottom line is that this approach of allowing exporting nested resources seems to work pretty well with JSON, and could potentially be added as a capability.

daniel-butler · 2022-04-15T04:01:52Z

I think this is really interesting! If you are able to include code snippets and a general idea of the export/import result, that would be really helpful! I have a similar problem, and honestly most data has complex relationships.

pokken-magic · 2022-04-16T21:21:14Z

Sure!

So before I dive in, I will say that I use a custom Field that saves, so that things can be created through the RelatedManager/related field capability in Django. I have not tried using the ResourceWidget outside of this Field, so the Field may be required in order to issue a save.

Because of how m2m models save, you at minimum need to inherit your ResourceWidget from the m2m widget (since otherwise it will never save the newly created things).

What I did was make a ResourceWidget that inherits from the m2m widget and accepts a resource as one of its arguments. Its clean and render methods then use the resource.

Since Resource export_data can accept a queryset, you just pass the queryset from the Field you are exporting and serialize it in a chosen format (my ResourceWidget has another field that lets you specify a serialization format). The render function of your widget uses resource.export_data. You can get at the queryset with 'attr.all()'.

On the Clean side, you use the format to create a tablib dataset, then pass that to import_data and return the queryset of the new things you made (which can be pulled from the results of import)

Waiting on permission from work to do a PR for this, but I think it's pretty intuitive once you start working on it. Long term it would probably be better to update the whole system to use tablib databooks, but that is a lot more work.

pokken-magic · 2022-04-16T21:30:27Z

The downside to this is I could not figure out a way to get the parent serialization format and inherit it, so you have to be explicit. You could change the parameters of various functions to pass the parent format down to the Field Render/Clean functions, but that is a lot of trouble.

The other downside is that I seriously doubt this would work with complex datasets using anything but YAML/JSON.

Annnnd, if you use YAML you get out-of-order dictionaries, because the Tablib YAML dumper doesn't seem to let you change the setting that causes it.

So bottom line, you're using JSON if you want to do this and not have it be weird :)
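To make the ordering point concrete: the standard-library `json` module writes keys in insertion order unless asked to sort them. In this small sketch, `sort_keys=True` is used only to imitate what an alphabetically reordered dump looks like; it is an assumption about the shape of the YAML problem, not a reproduction of Tablib's YAML output.

```python
import json

row = {"id": 1, "name": "Dune", "authors": [{"id": 7, "name": "Frank Herbert"}]}

# Default behaviour: keys come out in the order they were inserted.
as_written = json.dumps(row)

# Imitation of a dumper that sorts keys: the nested column jumps to the front.
alphabetical = json.dumps(row, sort_keys=True)
```

With column order preserved, a nested export reads the same way as the resource's field list, which is a large part of why JSON is the less surprising choice here.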

pokken-magic · 2022-04-23T23:00:51Z

Does anyone understand the dynamic resource creation code in the admin site well enough to explain to me how to use it?

I was thinking it would be useful to dynamically generate a resource for ResourceWidget if one is not supplied. That is when I found that code, but the metaclass stuff went over my head at first.

If you know what model something is, how would you concisely make a resource from it on the fly?
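One possible answer: django-import-export ships a `modelresource_factory` helper in `import_export.resources`, which, as far as I can tell, is what the admin falls back on when no resource class is configured. The general technique behind it is building a class with `type()`. Below is a Django-free sketch of that technique; `BaseResource`, `resource_for`, and the `Author` class are hypothetical stand-ins.

```python
class BaseResource:
    """Hypothetical stand-in for a resource base class that reads its inner Meta."""

    @classmethod
    def model_name(cls):
        return cls.Meta.model.__name__


def resource_for(model, base=BaseResource):
    """Create a '<Model>Resource' subclass of `base` on the fly."""
    # Build the inner Meta class first, then the resource class that carries it.
    meta = type("Meta", (), {"model": model})
    return type(f"{model.__name__}Resource", (base,), {"Meta": meta})


class Author:
    """Stand-in for a Django model."""


AuthorResource = resource_for(Author)
```

A widget could call something like `resource_for(field_model)` when no resource is passed in, and use the supplied class otherwise.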

pokken-magic · 2022-04-25T19:58:04Z

🔥 I got the approval to release this code from work, so I will be putting a draft PR out there since I need some comments.

pokken-magic · 2022-04-25T21:33:49Z

Figured I'd get it out there so people can yoink it for their own private repos if they want, it's very very useful even in its somewhat incomplete form (we're using the heck out of it for making testing datasets).

Would love people's feedback on things I missed or other ideas for smoothing off the rough edges. Would very much like some test suite ideas.

Major things I catalogued as I was working:

- behavior of reverse relationship foreignkeys/inlines
- formats - is it too opinionated to say nested relationships should just be JSON? (I have had all kinds of problems with YAML, and I don't even want to start thinking about CSV/TSV)
- behavior of the Field widget. There was a really nice fix for post_save someone had written that might be a good way to approach figuring out how to assign values to fields (.set or = etc.). The code I have there is pretty hacky.
- What's the best way to validate this? I can write a bunch of widget tests and make sure things serialize OK; that might be the easiest approach.
- How can I automate creating resources for things that don't have them? And would it be useful to have a ResourceOptions option that allows autocreating resources for all fk/m2m fields?

pokken-magic · 2022-04-25T21:39:16Z

If anyone wants to pull that branch down and try it, the BookAdmin has a nested resource you can try that should export a book's categories and authors.

pokken-magic · 2022-04-26T01:01:03Z

I realize this thread is getting kinda spammy, but I didn't want to lose sight of a design thought I've been having. ResourceWidget works pretty well for the specific use case I am passionate about, which is test data and data lifecycle (moving internally consistent subsets of data from place to place for one reason or another).

But what I am thinking on a bit lately is one of the distinguishing features of this library is really its broad format support, especially non-programmer friendly formats like Excel and CSV. Tablib's support for so many formats is a real differentiator between things like dumpdata/loaddata and DRF serializers.

So what I'm wondering is if, long term, we might be better off thinking about Tablib Databooks as a way to manage this without forcing everyone down the JSON pathway. I could see some ways to sequence nested data imports that visualize better than deeply nested JSON blobs (a series of CSV tables with line breaks between them, or similar).
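As a rough illustration of the "series of CSV tables" idea, here is one possible layout: each table starts with a one-cell title row, then a header row, then data rows, and a blank line separates tables. The layout and the `split_segments` helper are assumptions made for this sketch, not an existing format.

```python
import csv
import io

SAMPLE = """Author
id,name
7,Frank Herbert

Book
id,title,author
1,Dune,7
"""


def split_segments(text):
    """Split CSV text into (title, rows) tables separated by blank lines."""
    tables = []
    for block in text.strip().split("\n\n"):
        rows = list(csv.reader(io.StringIO(block)))
        title, header, data = rows[0][0], rows[1], rows[2:]
        tables.append((title, [dict(zip(header, row)) for row in data]))
    return tables


tables = split_segments(SAMPLE)
```

This naive split would break on quoted cells that contain blank lines, which is one of the reasons a Databook with real sheets may be the cleaner long-term route.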

It might be OK to start with the initial JSON-only ResourceWidget and then dig more into Databooks?

pokken-magic · 2022-05-15T06:04:51Z

I’m going to be away for a bit on family business but figured I’d update to say that all my company’s code including resource widget is apparently working on 3.0.0b3 :) not sure how we got upgraded but it’s working.

Looking forward to trying multiple resource classes and detection of natural foreign keys

matthewhegarty · 2023-09-20T09:42:28Z

Closing - discussion continues in #445

pokken-magic added the question label Jan 1, 2022

pokken-magic mentioned this issue Jan 1, 2022

Explore Django Rest Framework extension using ModelSerializers #1376

Open

pokken-magic added the enhancement label Jan 18, 2022

pokken-magic mentioned this issue Jan 18, 2022

How to handle inlines #445

Open

pokken-magic mentioned this issue Apr 6, 2022

Resource registry for identifying the resource for a model #1416

Closed

pokken-magic linked a pull request Apr 25, 2022 that will close this issue

Add ResourceWidget for nested import-export of relationships #1430

Closed
