Create a ResourceField that allows exporting and importing related subsets of data #1375
Simplest approach might be to require YAML or JSON for nested datasets; then you could fairly easily sequence the Resources in the order they need to be loaded. In Excel you could put each resource on a tab and import in tab order, and with CSV you could hypothetically use multiple segments, but that might not be worth the bother. It could be feasible to zip and index subsets (e.g. 1_Author.csv, 2_Publisher.csv, 3_Book.csv), but that's fairly ugly.
I am not sure when I'll get to it, but I think I have a pretty good idea of how to approach this design. Tablib has the concept of a Databook, which seems like a good fit here. The thought is that a ResourceField (or maybe widget) requires a foreign key widget and specifies the class of the resource at instantiation. When exporting a resource with a ResourceField, export instead creates a Databook and adds each ResourceField's resource as a sheet, ordered ahead of the primary resource's sheet. It should be pretty trivial to filter each ResourceField's resource objects down to the ones referred to by the main dataset, thereby only bringing along the needed data. I need to play around with Databooks a little and see how the load and export functions work. My hope is we can update our importer to always use Databooks instead of Datasets and thereby be able to import complete sequenced datasets (where resources load in dependency order). The ability to do complete, internally consistent exports would make import-export an invaluable QA tool for Django, where working with subsets of data is fairly difficult (dumpdata and loaddata are not very sophisticated for this purpose).
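The ordering idea above can be sketched without any of the library machinery: dependency resources become sheets placed ahead of the primary resource's sheet, which is just a depth-first post-order walk of the dependency graph. The function and data names below are illustrative stand-ins, not the django-import-export API.

```python
# Sketch of the proposed sheet/load ordering: dependencies first,
# the primary resource last. Names here are hypothetical.

def sheet_order(resource, dependencies):
    """Return resource names in load order (depth-first post-order)."""
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)
        ordered.append(name)

    visit(resource)
    return ordered

# Publishers have Representatives; Books need Publishers and Authors.
deps = {
    "Book": ["Publisher", "Author"],
    "Publisher": ["Representative"],
}
print(sheet_order("Book", deps))
# -> ['Representative', 'Publisher', 'Author', 'Book']
```

With that order in hand, a Databook exporter would simply add one sheet per name, and an importer would load sheets front to back.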
I had to build this for work and it works beautifully, with the constraint that I could not figure out how to get it to spit out anything other than a JSON blob. I will see if I can get permission to post it, but the gist is: I used a RelatedField to pass the parent object down to the widget, then a ResourceWidget that you tell the resource_class of the field on init, then --
Possibly it could be simplified to not need the custom field, but there's some magic in my RelatedField that makes it behave better. Anyway, the bottom line is that this approach of allowing exports of nested resources seems to work pretty well with JSON, and could potentially be added as a capability.
I think this is really interesting! If you are able to include code snippets and a general idea of the export/import result, that would be really helpful! I have a similar problem, and honestly most data has complex relationships.
Sure! Before I dive in, I will say that I use a custom Field whose save enables creating things through Django's related manager/related field capability. I have not tried using the ResourceWidget outside of this Field, so it may be required to issue a save. Because of how m2m models save, you at minimum need to inherit your ResourceWidget from the m2m widget (since otherwise it will never save the newly created things).

What I did was make a ResourceWidget that inherits from the m2m widget and accepts a resource as one of its arguments. Its clean and render methods then use that resource. Since a resource's export_data can accept a queryset, you pass in the queryset from the Field you are exporting (you can get at it with attr.all()) and serialize it in a chosen format (my ResourceWidget has another argument that lets you specify a serialization format); the render function of your widget uses resource.export_data. On the clean side, you use the format to create a tablib dataset, pass that to import_data, and return the queryset of the newly created things (which can be pulled from the import results).

I'm waiting on permission from work to open a PR for this, but I think it's pretty intuitive once you start working on it. Long term it would probably be better to update the whole system to use tablib Databooks, but that is a lot more work.
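The render/clean shape described above can be shown as a stand-alone sketch. The real version inherits from the library's m2m widget and delegates to a ModelResource; here a stub "resource" and plain stdlib JSON stand in, so none of the Django or m2m save semantics are modeled. All names are illustrative assumptions.

```python
import json

class StubResource:
    """Pretends to be a resource over a simple list-of-dicts store."""
    def export_data(self, objects):
        return [dict(obj) for obj in objects]

    def import_data(self, rows):
        # Return the "created" objects, as the real widget returns the
        # queryset of newly imported rows.
        return [dict(row) for row in rows]

class ResourceWidget:
    def __init__(self, resource, fmt="json"):
        self.resource = resource
        self.fmt = fmt  # only JSON is sketched here

    def render(self, objects):
        # Export side: serialize the related objects via the resource.
        return json.dumps(self.resource.export_data(objects))

    def clean(self, value):
        # Import side: parse the blob and hand it to the resource.
        return self.resource.import_data(json.loads(value))

widget = ResourceWidget(StubResource())
blob = widget.render([{"name": "Tolkien"}, {"name": "Le Guin"}])
print(widget.clean(blob))
# -> [{'name': 'Tolkien'}, {'name': 'Le Guin'}]
```

In the real widget, render receives `attr.all()` from the field and clean returns the objects created by the nested import, so the parent row can link to them.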
The downside to this is that I could not figure out a way to get the parent serialization format and inherit it, so you have to be explicit. You could change the parameters of various functions to pass the parent format down to the Field's render/clean functions, but that is a lot of trouble. The other downside is that I seriously doubt this would work with complex datasets in anything but YAML/JSON. And if you use YAML you get dictionaries with their keys out of order, because Tablib's YAML dumper doesn't seem to expose the setting that controls key sorting. So, bottom line, you're using JSON if you want to do this and not have it be weird :)
Does anyone understand the dynamic resource creation code in the admin site well enough to explain to me how to use it? I was thinking it would be useful to dynamically generate a resource for ResourceWidget if one is not supplied, but when I found that code the metaclass stuff went over my head at first. If you know what model something is, how would you concisely make a resource from it on the fly?
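For what it's worth, the admin's dynamic resource creation goes through `modelresource_factory(model)` in `import_export.resources`: it builds an inner `Meta` class pointing at the model and then constructs a resource subclass, letting the resource metaclass do the field wiring. Below is a Django-free sketch of that pattern (the base class and factory here are stand-ins, not the library's actual implementation), just to show the `type()` trick the metaclass code is built on.

```python
class ModelResource:  # stand-in for import_export.resources.ModelResource
    pass

def modelresource_factory(model, resource_class=ModelResource):
    """Build a Resource subclass for `model` on the fly."""
    # Create the inner Meta class pointing at the model...
    Meta = type("Meta", (object,), {"model": model})
    # ...then build the resource class itself. In the real library this
    # call goes through the resource's declarative metaclass, which is
    # what collects the model's fields.
    class_name = model.__name__ + "Resource"
    return type(class_name, (resource_class,), {"Meta": Meta})

class Book:  # stand-in model
    pass

BookResource = modelresource_factory(Book)
print(BookResource.__name__, BookResource.Meta.model is Book)
# -> BookResource True
```

So for the ResourceWidget case, falling back to `modelresource_factory(model)` when no resource class is supplied seems like the concise answer.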
🔥 I got approval to release this code from work, so I will be working on getting a draft PR out there, since I need some comments.
Figured I'd get it out there so people can yoink it for their own private repos if they want; it's very, very useful even in its somewhat incomplete form (we're using the heck out of it for making testing datasets). I'd love people's feedback on things I missed, or other ideas for smoothing off the rough edges. I would very much like some test suite ideas. Major things I catalogued as I was working:
If anyone wants to pull that branch down and try it, the BookAdmin has a nested resource you can try that should export a book's categories and authors. |
I realize this thread is getting kind of spammy, but I didn't want to lose sight of a design thought I've been having. ResourceWidget works pretty well for the specific use case I am passionate about, which is test data and data lifecycle (moving internally consistent subsets of data from place to place for one reason or another). But what I've been thinking about lately is that one of the distinguishing features of this library is its broad format support, especially non-programmer-friendly formats like Excel and CSV. Tablib's support for so many formats is a real differentiator from things like dumpdata/loaddata and DRF serializers. So what I'm wondering is whether, long term, we might be better off thinking about Tablib Databooks as a way to manage this without forcing everything down the JSON pathway. I could see ways to sequence nested data imports that visualize better than deeply nested JSON blobs (a series of CSV tables with line breaks between them, or similar). It might be OK to ship the initial JSON-only ResourceWidget and then dig more into Databooks?
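The "series of CSV tables with line breaks between them" idea can be prototyped with the stdlib alone. This is a hypothetical file layout, not something tablib or import-export supports today: each table becomes a titled CSV segment, segments are separated by a blank line, and they appear in load order.

```python
import csv, io

def dump_segments(tables):
    """tables: list of (title, headers, rows) in load order."""
    out = io.StringIO()
    for i, (title, headers, rows) in enumerate(tables):
        if i:
            out.write("\n")  # blank line between segments
        out.write(f"# {title}\n")
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(headers)
        writer.writerows(rows)
    return out.getvalue()

def load_segments(text):
    tables = []
    for segment in text.strip().split("\n\n"):
        lines = segment.splitlines()
        title = lines[0].lstrip("# ")
        headers, *rows = list(csv.reader(lines[1:]))
        tables.append((title, headers, rows))
    return tables

tables = [
    ("Author", ["id", "name"], [["1", "Tolkien"]]),
    ("Book", ["id", "title", "author_id"], [["1", "The Hobbit", "1"]]),
]
assert load_segments(dump_segments(tables)) == tables
```

A spreadsheet-minded user could read or even hand-edit this in a text editor, which is the kind of non-programmer friendliness a deeply nested JSON blob gives up.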
I'm going to be away for a bit on family business, but figured I'd update to say that all my company's code, including ResourceWidget, is apparently working on 3.0.0b3 :) Not sure how we got upgraded, but it's working. Looking forward to trying multiple resource classes and detection of natural foreign keys.
Closing - discussion continues in #445 |
(This is fairly involved -- trying to get all the random ideas I had on this down before I forget them)
A Field created with a reference to a Resource could allow exporting an entire related model, which would (potentially) enable exporting subsets of data of arbitrary depth (where a ResourceField could have other ResourceFields) --
So if you had a Book with Publishers and Authors, you could export that Book and all of the Publishers and Authors necessary for it. ResourceFields would need to be imported first, which would likely require significant UI changes -- such as an additional workflow step for each level of depth (if Publishers have Representatives, you'd have to load Representatives before Publishers, then Publishers and Authors before Books).
The power here is that you could create a more consistent dataset with complex realistic models, without having to take multiple steps to synchronize all the dependencies -- you just export a Book, and everything needed for that Book comes with it.
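To make the Book example concrete, here is one hypothetical shape the export could take: the related Publisher and Author rows ride along inside the Book row as embedded serialized blobs, so the file is internally consistent on its own. All field names and data here are illustrative, not output from the library.

```python
import json

# One possible nested export shape for a Book row. The "publisher" and
# "authors" columns hold serialized rows for the related models.
book_row = {
    "id": 1,
    "title": "The Hobbit",
    "publisher": json.dumps([{"id": 7, "name": "Allen & Unwin"}]),
    "authors": json.dumps([{"id": 3, "name": "J. R. R. Tolkien"}]),
}

# On import, the nested blobs would be cleaned first, so the Publisher
# and Author rows exist before the Book row that references them.
for field in ("publisher", "authors"):
    print(field, "->", json.loads(book_row[field]))
```

Importing this single row would recreate the Publisher and the Author before the Book, which is exactly the "everything needed for that Book comes with it" property described above.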