Batch import of new files in web UI #317

Closed
gjost opened this issue Oct 18, 2022 · 2 comments


gjost commented Oct 18, 2022

ddrimport files has pretty much worked for a while now. Batch export/import was always intended to be a feature usable through the web UI (the "import csv (objects,files)" links in the footer). Let's do this!

Do a minimal implementation and see how it goes.

  • Have Celery write logs to /var/log/ddr/batch/COLLECTIONID-YYYYMMDD-HHMM-import-files.log
  • Each batch operation writes to a new log file.
  • Include a link back to the log in the Celery success/fail message.
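The log naming scheme in the list above can be sketched as a small path helper. This is a minimal sketch, not the project's code: the function name `batch_log_path` and the `base_dir` parameter are hypothetical, and only the directory and the `COLLECTIONID-YYYYMMDD-HHMM-import-files.log` pattern come from the issue.

```python
from datetime import datetime
from pathlib import Path

# Directory named in the issue; passed as a default so tests can override it.
BATCH_LOG_DIR = Path("/var/log/ddr/batch")


def batch_log_path(collection_id: str, when: datetime,
                   base_dir: Path = BATCH_LOG_DIR) -> Path:
    """Hypothetical helper: build COLLECTIONID-YYYYMMDD-HHMM-import-files.log."""
    stamp = when.strftime("%Y%m%d-%H%M")  # e.g. 20221205-1203
    return base_dir / f"{collection_id}-{stamp}-import-files.log"
```

With minute resolution, two imports for the same collection started within the same minute would share a file name; appending `%S` to the format string is one way to keep "each batch operation writes to a new log file" true in that case.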

gjost commented Dec 5, 2022

12:03 sara.beckman - Tested the ddrimport in the Editor UI on 10.0.1.137. I imported mezz, master, and transcript files to ddr-testing-40410. I was able to successfully import the files. The log files were created in a subfolder. I also tested the CSV validation -- it detected a mismatch between file name & basename_orig. I also deleted a required header in the CSV and the CSV validator caught it.
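The two validation cases tested above (a missing required header, and a `basename_orig` value that does not match a file on disk) can be illustrated with a minimal sketch. The `REQUIRED_HEADERS` set, the function name, and the assumption that `basename_orig` must name a file in the import directory are all hypothetical; the real ddr-cmdln validator defines its own columns and rules.

```python
import csv
import io
from pathlib import Path
from typing import List

# HYPOTHETICAL column set for illustration only.
REQUIRED_HEADERS = {"id", "role", "basename_orig"}


def validate_file_csv(csv_text: str, import_dir: Path) -> List[str]:
    """Return human-readable problems; an empty list means the CSV passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = sorted(REQUIRED_HEADERS - set(reader.fieldnames or []))
    if missing:
        # Without the required columns, per-row checks would be meaningless.
        return [f"Missing required header(s): {', '.join(missing)}"]
    problems = []
    for lineno, row in enumerate(reader, start=2):  # line 1 is the header row
        name = (row.get("basename_orig") or "").strip()
        if not name:
            problems.append(f"Line {lineno}: basename_orig is empty")
        elif not (import_dir / name).is_file():
            # ASSUMPTION: basename_orig must match a file in the import directory.
            problems.append(f"Line {lineno}: no file named {name!r} in {import_dir}")
    return problems
```

Returning a list instead of stopping at the first problem lets the web UI show every error in the CSV at once.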

12:08 sara.beckman - One thing I noticed - another test I ran as a sort of sanity test. I tried a file import before I imported entities just to make sure it wouldn't work. It didn't, and gave a clear Failure notice in celery. It gave a log file with the error output. However, that log wasn't saved to the log folder. Could it be saved like the success log is?

12:09 gjost - Was the error on the batch file import (sounded like that worked before) or the batch entity import, which I haven't worked on yet?

12:10 sara.beckman - file import in the editor
12:11 - I just wanted to see what would happen if an archivist tried to import the files when the entities weren't present. Basically trying to get a failure notice in celery
12:12 - The failure notice was clear and had a link to the log file, which clearly stated that the entities weren't in the repo. The log file just didn't save in the ddrshared folder like the success log files.
12:13 - I think saving even failure log files to the ddrshared folder would be beneficial
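The request above (keep failure logs on disk just like success logs) comes down to opening the per-import log file before the work starts and closing it in a `finally` block. A minimal sketch using the standard `logging` module; `run_with_log` is a hypothetical name and is not the `util.FileLogger` mentioned in the later commits.

```python
import logging
from pathlib import Path
from typing import Callable


def run_with_log(log_path: Path, task: Callable[[logging.Logger], None]) -> bool:
    """Run `task`, writing its log to `log_path` whether it succeeds or fails.

    Returns True on success, False on failure (the traceback is in the log).
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"batch.{log_path.stem}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep batch output out of the application log
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        task(logger)
        logger.info("Import completed")
        return True
    except Exception:
        logger.exception("Import failed")  # writes the traceback to the same file
        return False
    finally:
        # Always detach and close the handler so a failure log is flushed too.
        logger.removeHandler(handler)
        handler.close()
```

Because the file is created before `task` runs, a failure log ends up in the same folder as a success log, and the Celery message can link to it either way.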

12:17 gjost - This case probably should have failed in the validation stage
12:22 sara.beckman - Adding it to the validator when you select the CSV would work too. The import failing right away with the log is pretty clear and works, but catching it first would probably cause the archivists less distress.

TODO Try to import files for non-existent entities
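Catching files whose parent entities do not exist at the validation stage, as suggested above, could look like the following sketch. Two assumptions are hypothetical and not confirmed by the issue: that the CSV `id` column names the parent entity of each new file, and that an entity lives at `<repo>/files/<entity_id>/entity.json`.

```python
import csv
import io
from pathlib import Path
from typing import List


def missing_parent_entities(csv_text: str, repo: Path) -> List[str]:
    """Return parent entity IDs named in the CSV that have no entity.json in `repo`."""
    missing: List[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # ASSUMPTION: the `id` column holds the parent entity ID for new files.
        entity_id = (row.get("id") or "").strip()
        if not entity_id or entity_id in missing:
            continue
        # ASSUMPTION: repository layout <repo>/files/<entity_id>/entity.json
        if not (repo / "files" / entity_id / "entity.json").is_file():
            missing.append(entity_id)
    return missing
```

Each missing ID is reported once, in CSV order, so a CSV with several files for one absent entity produces a single, readable error.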

gjost added a commit that referenced this issue Dec 9, 2022
- batch: File import using the web UI.
- batch: Update signatures for file import parent entities
- batch: Also check for missing file parent entities
- batch: Commit batch file import in task, rollback if errors
- conf,templates: Make application logs and ddrshared dir
  browsable, with links in footer.
- (ddrcmdln) batch: Support file import from the web UI.
- (ddrcmdln) batch: File import now does more checks before
  importing: of the CSV file, the repository itself, and of parent
  entities.
- (ddrcmdln) batch: Add util.FileLogger for logging batch imports
  to separate file. Various functions needed to be modified to
  accept a sometimes optional `log` argument.
- (ddrcmdln) batch: Subsidiary functions now let *ddrimport* decide
  whether to quit instead of calling sys.exit() themselves.
- (ddrcmdln) batch: Updated and added more unit tests and function
  docstrings.
- (ddrcmdln) dvcs: Add rollback function for automatically cleaning
  up modified and untracked files in a repository if an import goes
  bad.
See #317

- models: Update persons search links, add creators search and
  NamesDB links.
- (ddrcmdln) models: Add `search_hidden` field for indexing
  tokenized/keyword text.
see denshoproject/ddr-public#199

- (ddrcmdln) ddrnames: Export in natural sort order
- (ddrcmdln) ddrnames: Simple tool to export creators,persons from
  collection.
See denshoproject/ddr-cmdln#216

- nginx,templates: Remove link to ddr-manual from footer
- Makefile: Include namesdb-public in install-app task
- (ddrcmdln) ddrvocab: Add cmd to convert densho-vocab data betw
  JSON and CSV
- (ddrcmdln) Makefile: Fixed apt package key install procedure
- (ddrcmdln) Makefile: Include ntp as a dependency.
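The commit above describes subsidiary functions that no longer call `sys.exit()` themselves and accept an optional `log` argument, leaving the decision to quit to *ddrimport*. A minimal sketch of that pattern; `BatchCheckError`, `check_csv_exists`, and `main` are hypothetical names, not the ddr-cmdln API.

```python
import logging
import sys
from pathlib import Path
from typing import Optional, Sequence


class BatchCheckError(Exception):
    """Hypothetical error a subsidiary check raises instead of exiting."""


def check_csv_exists(csv_path: Path, log: Optional[logging.Logger] = None) -> None:
    """Raise BatchCheckError if csv_path is missing; also record it in `log` if given."""
    if not csv_path.is_file():
        msg = f"CSV file not found: {csv_path}"
        if log is not None:
            log.error(msg)
        raise BatchCheckError(msg)


def main(argv: Sequence[str]) -> int:
    """Top-level command: the only place that decides the exit status."""
    if len(argv) != 1:
        print("usage: import-files CSVPATH", file=sys.stderr)
        return 2
    try:
        check_csv_exists(Path(argv[0]))
    except BatchCheckError as err:
        print(f"ERROR: {err}", file=sys.stderr)
        return 1
    print("OK")
    return 0
```

A command-line entry point would call `sys.exit(main(sys.argv[1:]))`, while a Celery task can call `check_csv_exists` directly, catch `BatchCheckError`, and report the failure without stopping the worker process.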
gjost added a commit to denshoproject/ddr-cmdln that referenced this issue Dec 9, 2022
- batch: Support file import from the web UI.
- batch: File import now does more checks before importing: of the
  CSV file, the repository itself, and of parent entities.
- batch: Add util.FileLogger for logging batch imports to separate
  file. Various functions needed to be modified to accept a
  sometimes optional `log` argument.
- batch: Subsidiary functions now let *ddrimport* decide whether to
  quit instead of calling sys.exit() themselves.
- batch: Updated and added more unit tests and function docstrings.
- dvcs: Add rollback function for automatically cleaning up modified
  and untracked files in a repository if an import goes bad.
See denshoproject/ddr-local#317

- models: Add `search_hidden` field for indexing tokenized/keyword
  text.  Supports updating persons search links and adding creators
  search and NamesDB links in ddrpublic.
see denshoproject/ddr-public#199

- ddrnames: Export in natural sort order
- ddrnames: Simple tool to export creators,persons from collection
See #216

- ddrvocab: Add cmd to convert densho-vocab data betw JSON and CSV
- Makefile: Fixed apt package key install procedure
- Makefile: Include ntp as a dependency.
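Both commits describe committing the batch file import inside the task and using a dvcs rollback function to clean up modified and untracked files when an import goes bad. A minimal sketch of that flow, assuming the plain `git` CLI called through `subprocess`; the real `dvcs` module may use a different library, and `git`, `rollback`, and `import_and_commit` here are hypothetical helpers.

```python
import subprocess
from pathlib import Path
from typing import Callable


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return stdout (raises on failure)."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def rollback(repo: Path) -> None:
    """Discard changes to tracked files (staged or not) and delete untracked files."""
    git(repo, "reset", "--hard", "HEAD")  # restore tracked files to the last commit
    git(repo, "clean", "-fd")             # remove untracked files and directories


def import_and_commit(repo: Path, do_import: Callable[[Path], None], message: str) -> None:
    """Run an import and commit it; on any error, roll back and re-raise."""
    try:
        do_import(repo)
        git(repo, "add", "--all")
        git(repo, "commit", "-m", message)
    except Exception:
        rollback(repo)
        raise
```

`git clean -fd` leaves ignored files alone (that would need `-x`), so anything listed in `.gitignore` survives a rollback.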

gjost commented Feb 8, 2023

Work completed as of ddr-local commit e22f4b3328 and merged into master.

gjost closed this as completed Feb 8, 2023
gjost added fixed and removed WORKING labels Feb 8, 2023
gjost reopened this Feb 8, 2023