Create a desktop application for bulk ingest and editing of Islandora objects #1172

Open
mjordan opened this issue Jun 19, 2019 · 45 comments
Labels: help wanted, Type: Meta-issue

Comments

@mjordan
Contributor

mjordan commented Jun 19, 2019

At iCampEU there was a great discussion of a desktop application for managing Islandora content. This wish has also been expressed in mjordan/claw_rest_ingester#8, and this Islandora Conference session will also generate discussion on this.

One of the use cases that came up at the Camp was a repo manager needing to update a field value on a large number of objects, where "large" means more than is convenient to handle with Views Bulk Edit in a browser (for example, several thousand objects).

@seth-shaw-unlv has expressed interest in working on this, and @ibrahimab, who was at the Camp, has also. Both of these community members mentioned that Electron would be an excellent environment to build this in so it can be used easily on Windows, Mac, and Linux desktops.

I suggest adding use cases for this tool so we can start refining user interfaces and interactions with Islandora 8.

Thanks EU Campers for the discussion!

@mjordan added the "help wanted" and "Large" labels on Jun 19, 2019
@manez
Member

manez commented Jun 19, 2019

On behalf of iCamp in Switzerland:

  • Can this tool have a task queue? Scheduled jobs?

  • With such a queue, can we set a time to wait between transactions?

  • Can we set other parameters (don't do 1000 at once, etc)?

  • Is this just for field values, or can we edit other things? Thumbnails, original files, etc?

  • What's the benefit of doing something like this, versus a Drush script?

    • Gives front-end users the ability to use it.
    • We can have both; make a command line tool and put a GUI on top.
  • What about a browser tool that sets parameters for a command line tool, instead of a desktop tool?

    • Could be another option for a front-end to a command line tool.
    • Could write a tool in Python and add a GUI in Electron.

@jonathangreen
Contributor

jonathangreen commented Jun 19, 2019

I, too, am interested in this 🙋🏻‍♂️

@seth-shaw-unlv
Contributor

  • What's the benefit of doing something like this, versus a Drush script?

    • Gives front-end users the ability to use it.
    • We can have both; make a command line tool and put a GUI on top.
  • What about a browser tool that sets parameters for a command line tool, instead of a desktop tool?

    • Could be another option for a front-end to a command line tool.
    • Could write a tool in Python and add a GUI in Electron.

This is essentially how WWU's Islandora (7) Batch Uploader (an Electron app) works. After it does a bunch of neat stuff with various web services, it builds the MODS XML, then SCPs the images and metadata to the Islandora server and triggers a Drush migration via SSH.

We could do essentially the same: build a spreadsheet or collection of JSON documents in the app, then SCP them over and trigger commands via SSH. What I don't like about this approach is that it requires managing user accounts for the server (or, even worse, a shared account).

Personally, I would prefer we stick to what we can trigger via REST (which should be pretty much everything except Drush commands, but those can be ported) so we can use the Drupal authentication instead.
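As a rough illustration of the REST-only approach, a bulk field update could be expressed as a series of PATCH requests authenticated with a Drupal account. This sketch assumes Drupal core's REST module with the node resource enabled for PATCH and JSON, plus Basic Auth; the host, node IDs, bundle name `islandora_object`, field name `field_rights`, and credentials are all hypothetical placeholders. The requests are only built here, not sent.

```python
import base64
import json
import urllib.request


def build_patch_request(base_url, nid, bundle, field, value, user, password):
    """Build (but do not send) a PATCH request that sets one field on one node."""
    # Drupal core REST expects the bundle in "type" and each field as a list of items.
    body = {
        "type": [{"target_id": bundle}],
        field: [{"value": value}],
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/node/{nid}?_format=json",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical values throughout; nothing is sent over the network here.
requests_to_send = [
    build_patch_request(
        "https://islandora.example.org", nid, "islandora_object",
        "field_rights", "In Copyright", "editor", "secret",
    )
    for nid in (101, 102, 103)
]
# Each request could later be sent with urllib.request.urlopen(req).
```

Because authentication rides on the Drupal account, no SSH or server-level user would be needed for this kind of update.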

@mjordan
Contributor Author

mjordan commented Jun 19, 2019 via email

@jonathangreen
Contributor

jonathangreen commented Jun 19, 2019

I agree, REST all the way. I think we probably need a chunked file uploader so we can load files that exceed the server's limits: a file can be chunked and checksummed client-side, then uploaded. Something like PLUpload. Then we need to test the limits of that; hopefully we can support very large batches that way.
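To make the chunk-and-checksum idea concrete, here is a minimal sketch of the client-side half. It is not PLUpload's API; the chunk size, the SHA-256 choice, and the layout of the returned dictionary are assumptions made for illustration.

```python
import hashlib
import io


def iter_chunks(stream, chunk_size):
    """Yield (index, sha256 hex digest, bytes) for each chunk of a binary stream."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, hashlib.sha256(data).hexdigest(), data
        index += 1


def plan_upload(stream, chunk_size):
    """Return per-chunk metadata plus a whole-file checksum a server could verify."""
    whole = hashlib.sha256()
    chunks = []
    for index, digest, data in iter_chunks(stream, chunk_size):
        whole.update(data)
        chunks.append({"index": index, "size": len(data), "sha256": digest})
    return {"chunks": chunks, "sha256": whole.hexdigest()}


# In-memory stand-in for a large file on the user's disk.
payload = b"x" * 2500
plan = plan_upload(io.BytesIO(payload), chunk_size=1024)
```

With per-chunk digests, a failed or corrupted chunk could be retried on its own instead of restarting the whole file.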

@seth-shaw-unlv
Contributor

Ooooh... PLUpload looks interesting.

Also, I know our staff would like a spreadsheet experience for doing batch metadata editing. I've been looking around and Handsontable looks like the most promising option (it has auto-fill!). While they are open source, they aren't free for commercial use, but they do have a static non-commercial license key we can use.

@whikloj
Member

whikloj commented Jun 19, 2019

I remember seeing the WWU batch uploader at iCamp San Diego, it was amazing. I would be 👍 to some Python with Electron on top. I'm sure we could pick @davidbasswwu's brain for some lessons learned.

@davidbasswwu

Happy to help if I can. The IBU is the first time I've used Electron/Node/Vue, so there was a steep learning curve, but that environment has been great to work with, so I think it was worth it. I'll think about lessons learned a bit more and note them here if anything in particular comes to mind, but feel free to ask me anything.

@rangel35

I just recently uploaded the views bulk edit modules on my local vm so maybe I just haven't found it...but a search function to filter the objects to be "bulk edited" would be nice

@rosiel
Member

rosiel commented Jun 19, 2019

Notes from today's call: @alxp suggested an Open Refine plugin

@manez
Member

manez commented Jun 19, 2019

@rangel35 I think you can do what you want in Views Bulk Edit by exposing the filters on the view, so the results can be altered on the fly. Then check "all" and have at it!

@rangel35

thanks @manez, I'll give that a try

@ibrahimab

The best approach (imo) would be creating a REST API. The Electron app can then use this API, and it leaves the door open for other developers to implement their own front ends (e.g. command line or mobile).

@whikloj
Member

whikloj commented Jun 21, 2019

@ibrahimab there are existing REST endpoints in Drupal that allow for the creation of nodes/media/files. We use those, and we have created... one (I think), but will keep making/exposing those as needed.

In my mind this would be a separate program that would use those various REST API endpoints to do these ingest/edit/delete actions.

So when you say creating a REST API, you mean define these endpoints as an API?

My concern is that some of these endpoints are out of our control (i.e. Drupal controls them), so we would need to duplicate them, provide a wrapper in front of them, or have an API tied to a specific version of Drupal.

Not that we can't do one of those, just not sure which is best but also maintainable.

@akuckartz

Why a "desktop application" instead of a web application?

@whikloj
Member

whikloj commented Jun 21, 2019

@akuckartz I think because there is/will be further Drupal development that will fill the gaps for a web application, whereas (IMHO) this is something that staff whose primary job is to prepare and ingest data would use.

You could certainly also make a web application that does this. Personally, I would want to understand what the web application would do that we can't do in Drupal.

@mjordan
Contributor Author

mjordan commented Jun 21, 2019

@akuckartz main advantage of a desktop app is that the binary files to be ingested can sit on the user's computer instead of having to be on the Islandora server's filesystem.

@mjordan
Contributor Author

mjordan commented Jun 24, 2019

I've been thinking of the advantages of having a command-line tool do the actual interaction with Islandora, with the GUI as an optional app that sits on top of this CLI tool. Apparently Electron is very happy to interact with Python scripts (the user only ever interacts with the GUI), and I've heard they bundle up nicely for distribution. I'd be happy to port https://github.com/mjordan/claw_rest_ingester to Python.

If we have both a CLI and an accompanying GUI, we have the best of both worlds.

@akuckartz

the binary files to be ingested can sit on the user's computer instead of having to be on the Islandora server's filesystem.

How is that a significant advantage? (Still trying to understand this issue.)

@mjordan
Contributor Author

mjordan commented Jun 24, 2019

I guess it depends on workflows and tooling at specific sites, but at my institution, the staff that scan the content and prepare the metadata don't have direct access to the Islandora server, so to get the content up to the server to perform a batch load, we need to get someone from Systems involved. Skipping that last step would be a significant gain for us.

Prior to adopting Islandora we were a CONTENTdm shop, and its desktop client is still described lovingly even after 3 years, not because it was a great piece of software (it was pretty horrible), but because getting content into CONTENTdm was sooo much easier than getting content into Islandora.

@seth-shaw-unlv
Contributor

@akuckartz, for us it is partly a matter of resource load distribution. We can have 10+ students or staff working with large collections of materials at one time, adding metadata to hundreds of items. That puts a lot of load on a single machine if they are all continuously sending updates and retrieving updated records.

Sure, we could add more servers and place them behind a load balancer OR we can leverage the staff's desktop resources while they do their metadata creation and updates. When they are done with that project they can add their collections to a rate-limited queue. No one's work gets slowed down at busy times and pushes to the server can happen during slow times. Their machines could potentially also generate derivatives.
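A rate-limited queue of this kind could be sketched as follows. The class name, the default batch size, the delay, and the "quiet hours" window are all hypothetical; the clock and sleep functions are injectable so the example runs instantly.

```python
import time
from collections import deque


class PushQueue:
    """Holds pending jobs and releases them in small, delayed batches."""

    def __init__(self, batch_size=100, delay_seconds=1.0, quiet_hours=range(0, 6)):
        self.batch_size = batch_size
        self.delay_seconds = delay_seconds
        self.quiet_hours = quiet_hours  # hours of the day when pushing is allowed
        self.jobs = deque()

    def add(self, job):
        self.jobs.append(job)

    def run(self, push, current_hour=lambda: time.localtime().tm_hour,
            sleep=time.sleep):
        """Push queued jobs in batches while inside quiet hours; return count pushed."""
        pushed = 0
        while self.jobs and current_hour() in self.quiet_hours:
            for _ in range(min(self.batch_size, len(self.jobs))):
                push(self.jobs.popleft())
                pushed += 1
            if self.jobs:
                sleep(self.delay_seconds)  # pause between batches
        return pushed


sent = []
pauses = []
queue = PushQueue(batch_size=2, delay_seconds=5, quiet_hours=range(0, 6))
for item in ["a", "b", "c", "d", "e"]:
    queue.add(item)
# Fake clock (02:00) and fake sleep stand in for the real ones.
count = queue.run(sent.append, current_hour=lambda: 2, sleep=pauses.append)
```

Jobs left in the queue outside quiet hours simply stay there until the next run.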

@seth-shaw-unlv
Contributor

Ninja'd by @mjordan.

@akuckartz

@mjordan @seth-shaw-unlv Thanks, now I understand. I was not aware of the resource load distribution issue.

@mjordan
Contributor Author

mjordan commented Jun 26, 2019

I've ported the CLAW REST Ingester to Python so it can be wrapped in an Electron app: https://github.com/mjordan/islandora_workbench.

@davidbasswwu

@mjordan I'll take a swing at integrating that into Electron. Working on getting Claw up and running now...

@mjordan
Contributor Author

mjordan commented Jun 27, 2019

@davidbasswwu awesome! Workbench is a work in progress and I plan on adding a bunch more CRUD functionality, but what's there should be good enough to get you going. I'm happy to test on Windows and Linux if you want.

@seth-shaw-unlv
Contributor

@davidbasswwu and @mjordan, I'm happy to test OS X and Windows.

@davidbasswwu

I'm taking next week off, but I did make some good progress today, so I'll let you know when I have something ready.

@mjordan
Contributor Author

mjordan commented Jul 1, 2019

@davidbasswwu I've added delete, update, and add_media ability to workbench. If you pull in the latest version, you will need to add task: create to your config.yml file.
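As a rough sketch of how a `task` setting could steer a command-line tool, a parsed config.yml can be treated as a dictionary and dispatched on its `task` value. This is not Workbench's actual code: the handler functions, their return values, and the `input_csv` key are hypothetical, and only the four task names come from this thread.

```python
def create(config):
    return f"creating nodes from {config['input_csv']}"


def update(config):
    return f"updating nodes from {config['input_csv']}"


def delete(config):
    return f"deleting nodes listed in {config['input_csv']}"


def add_media(config):
    return f"adding media listed in {config['input_csv']}"


# Map each task name to the function that carries it out.
TASKS = {"create": create, "update": update, "delete": delete, "add_media": add_media}


def run_task(config):
    """Look up the handler named by config['task'] and run it."""
    task = config.get("task")
    if task not in TASKS:
        raise ValueError(f"config must set task to one of {sorted(TASKS)}, got {task!r}")
    return TASKS[task](config)


# Stand-in for the dict a YAML parser would return for config.yml.
config = {"task": "create", "input_csv": "metadata.csv"}
result = run_task(config)
```

Failing loudly when `task` is missing would make it obvious to anyone who pulls the latest version without updating their config file.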

@mjordan
Contributor Author

mjordan commented Jul 4, 2019

Over in mjordan/islandora_workbench#8, @seth-shaw-unlv has created a proof of concept implementation of jExcel that can be integrated into the Electron desktop app, providing a GUI CSV editor for end users. His implementation queries Islandora to get taxonomy terms (in the screenshot below, the vocabulary is "Access Terms") which are implemented in the GUI as spreadsheet pick lists:

[Screenshot: the jExcel spreadsheet editor showing an "Access Terms" pick list]

The disk icon in the upper-left corner saves the file to your local disk. I can imagine a workflow like:

  1. user populates the GUI CSV editor, or pulls up a CSV and edits cells.
  2. user saves work when they are done.
  3. user clicks on an "Upload" button (or a "Validate" button, you know, 'cause it's good to validate content before pushing it into Islandora). (At this point Workbench starts up, in the background and invisible to the user, and ingests the content.)
  4. user does a little dance.

THIS IS AWESOME. Sorry for shouting.

To try this yourself, get the code at @seth-shaw-unlv's repo at https://github.com/seth-shaw-unlv/islandora-jexcel-scratchpad.

@mjordan
Contributor Author

mjordan commented Jul 4, 2019

@manez you like this?

@manez
Member

manez commented Jul 4, 2019

I could toss a few more emoji on there to be clear 😄

I'm imagining being able to add this to the Admin track at iCamps.

@seth-shaw-unlv
Contributor

Keep in mind that it is just a proof of concept so far. There is a lot of potential here. Validation can happen on a cell value change so you know there is a problem immediately.
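The per-cell check could be as small as the sketch below. It is written in Python to match the command-line side of this thread rather than the JavaScript spreadsheet itself; the function name, the list of required columns, and the sample terms are hypothetical (only the "Access Terms" vocabulary name comes from the proof of concept).

```python
def validate_cell(column, value, vocabularies, required=()):
    """Return a list of problems for one edited cell; an empty list means it is valid."""
    problems = []
    text = value.strip()
    if column in required and not text:
        problems.append(f"{column} is required")
    allowed = vocabularies.get(column)
    if text and allowed is not None and text not in allowed:
        problems.append(f"{text!r} is not a term in the {column} vocabulary")
    return problems


# Hypothetical data: in the proof of concept, terms are fetched from Islandora.
vocabularies = {"Access Terms": {"Public", "Campus only", "Restricted"}}
required = ("title",)

ok = validate_cell("Access Terms", "Public", vocabularies, required)
bad_term = validate_cell("Access Terms", "Secret", vocabularies, required)
missing = validate_cell("title", "   ", vocabularies, required)
```

Running the same function over every cell before upload would give a whole-file "Validate" step for free.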

Also, you could probably build this directly into your Drupal as a mass editor.

@mjordan
Contributor Author

mjordan commented Jul 4, 2019

@seth-shaw-unlv yes, that is another option... interesting.

@dannylamb
Contributor

dannylamb commented Jul 4, 2019

@mjordan I believe what you meant to shout was

@dannylamb
Contributor

Yeah... just wait till it can load rows from views results 🚀

@kayakr
Contributor

kayakr commented Jul 21, 2019

I'm interested in seeing where this goes. Two projects that may be relevant:

  • DataCurator - Electron-based data-wrangling app with a spreadsheet UI and backed by data packages and table schema. The idea is that data entry staff can focus on entering & validating data, then download a data package and/or upload/push to repository. https://github.com/ODIQueensland/data-curator
  • TUS protocol for chunked/resumable uploads over HTTP, see https://tus.io/ and https://www.drupal.org/project/tus I've used this on Drupal 8 for video uploads but not (yet) on Islandora.

@mjordan
Contributor Author

mjordan commented Jul 21, 2019

@kayakr excellent pointers. tus certainly looks like it would be useful, and I've already noted it over at the workbench repo.

@seth-shaw-unlv
Contributor

@kayakr thanks for the pointers! Both of these look interesting. I will certainly take inspiration from DataCurator.

It looks like DataCurator is pinned to an older version of the JavaScript spreadsheet library Handsontable (^3.0.0). I was initially planning on using Handsontable until they switched their licensing model as of version 7, and I wasn't confident about relying on forked versions from version 6.

@mrtngrsbch

This is essentially how WWU's Islandora (7) Batch Uploader (an Electron app) works. After it does a bunch of neat stuff with various web services, it builds the MODS XML, then SCPs the images and metadata to the Islandora server and triggers a Drush migration via SSH.

Hi @seth-shaw-unlv ,

I am interested in seeing this app but I don't have permissions... please can you give me a hint.

@davidbasswwu

Hi @mrtngrsbch - I realized that my app has some security vulnerabilities that are going to be very hard to fix, so I closed the repository and I'm now working on converting it to a web application. I plan to make that available once it's ready, but for now, I would recommend not using IBU.

@davidbasswwu

@mrtngrsbch - If you want to see what the IBU desktop app looks like, there is a video on https://mabel.wwu.edu/ibu (skip to 2:50). The web version should work and look just like the desktop version.

@mrtngrsbch

mrtngrsbch commented Feb 8, 2021

OMG @davidbasswwu!
That IBU is impressive! They even integrated Clarifai AI!
It only remains to train your own AI to understand your elements. Some time ago I did several tests with Amazon Rekognition, which gave me better results.

Thanks so much David, you have impressed me with the progress of this tool... and I take the opportunity to give you back my current sandbox & proof of concept.

In a few words: inject the metadata [ISAD(G), DC, VRA...] with ExifTool by Phil Harvey (also Adobe Bridge) and populate the contents with the EXIF module.
We have already done several things along the way:

  1. Extended jExifToolGUI [https://github.com/hvdwolf/jExifToolGUI] to read and write ISAD(G) metadata (other namespaces coming soon).
  2. I have created some presets for Adobe Bridge Custom Metadata [https://github.com/MuseosAbiertos/Adobe-Bridge-Custom-Metadata-JSON-Presets]

Next step: Extending Drupal's EXIF module to read new metadata presets [https://museosabiertos.org/node/85], for example:

  • A set composed of ISAD(G), DC, IPTC and other private parties to enable Taxonomy Menu to create a logical tree.
    Et voilà, just like that.

Context:
In Latin America, there is a significant lag in technology and in investment in new technologies and knowledge. I have been following Islandora for almost two years but I can't find the resources to implement it. I just limit myself to studying, testing and learning.

So we have decided (we are still deciding *) that the fastest method to have an online publication, with its standardized metadata, is to first help the museum community to gently climb a step up the ladder. Islandora would be paradise itself, but still far away.

(*) We still don't know whether to create a Drupal distro or rely on Islandora (without Fedora).

Indeed, I have participated in this thread because I am interested in a bulk upload.

best

@davidbasswwu

Thanks @mrtngrsbch! It sounds like you are working on some neat things too. In case you're not already familiar, check out https://github.com/Islandora-Collaboration-Group/ISLE which may help with implementation, and https://islandora.ca/community for some additional ways to get involved. 👍

@mrtngrsbch

mrtngrsbch commented Feb 12, 2021

Thanks @davidbasswwu, I think I still need to read and experiment on my own, before touching dad's toys. ;-)
I promise to study more as this Islandora community is very interesting and there are many people who really know.

@kstapelfeldt added the "Type: Meta-issue" label and removed the "Large" label on Sep 25, 2021