dsc client with Windows and NAS #136
Comments
Hi, I'm not sure what exactly you want to do. Where is Docspell installed? Do you want to download the files to your Windows machine? In general, dsc accepts the Docspell URL via the -d option.
Hi eikek, thanks for your reply. Docspell is installed via Docker on my NAS. It is accessible at http://myNasIP:7880. I want to use the CLI export functionality to regularly export all documents with metadata. This is to have a sort of backup, aside from the docker volume backups, and mainly to test how I would set up an exit strategy if I had to leave docspell. I downloaded dsc.exe to my Windows machine and tried in CMD: dsc -d http://myNasIP:7880, but it just returns the general help. I also tried to login with dsc, but that probably makes no sense at this point, as dsc would only point at the local Windows machine, not the NAS.
The dsc tool provides several sub-commands; one of them is export. But you also need to decide where to run this: dsc downloads to the machine it is running on. If you want to download to your Windows machine, then run it there. When you later want to export to your NAS, I would recommend adding another container for this; but first I would test it interactively on some machine until it does what you want. So… for doing the export, you must first log in via dsc, using the login command:
# First login
dsc -d http://myNasIP:7880 login --user YOUR_LOGIN --password YOUR_PASSWORD
# When using the --*-links options, I'd clean these first (don't know how that works on windows, the following line is for linux…)
rm -rf /path/to/export-directory/by_*
# Then export all
dsc -d http://myNasIP:7880 export --all --target /path/to/export-directory --link-naming name --tag-links --date-links --correspondent-links
Please see the help of the export sub-command for what the options mean.
That was great help, thanks. I had already read through the links you provided. My error was that I didn't think of combining the arguments into one command; I had started with just dsc -d http://myNasIP:7880 and no sub-command. The export works, and the symlinks are created on Windows as well. As I said, the export works, but I get an error message that does not seem to have any impact (at least I can't see any impact). But now that the export works, I wonder if I could get a single metadata.json file which contains the metadata for all exported items? Or how would you process all exported items with individual metadata.json files in individual folders? What I mean is that I want to test a full migration scenario, where I need to recover all metadata as hassle-free as possible.
Ah, this error is probably not good. Are you sure that everything was exported? You can run dsc with more verbose logging to see what happens. The export produces a metadata.json per item in a defined directory structure. I'm not sure what exactly you want to do (where do you want to migrate to?). To process all exported items, I would traverse the directory tree and on each item do whatever I need to do, for example with a small shell loop and jq.
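A minimal sketch of that idea (assuming the export went to /path/to/export-directory and that each item folder below items/ holds a metadata.json; the .name field used here is only an assumption, inspect a real metadata.json first):
# walk all exported items and print something per item
find /path/to/export-directory/items -name metadata.json | while read -r meta; do
  dir=$(dirname "$meta")
  # ".name" is an assumed field name - check a real metadata.json for the actual structure
  echo "item in $dir: $(jq -r '.name // empty' "$meta")"
done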
The error also appears without the --*-links options. About the migration: I don't have a migration destination yet. But in case I need to migrate, I want to be prepared. Or is there a way, with partly automated commands or steps, to convert the export into a normal file structure that I'm not seeing at this point? I don't want to lose all the work that I'm about to invest in docspell some day.
Regarding the error: I could imagine that some filesystems have problems with specific characters like § in file names.

Regarding the migration: I think it is really hard to prepare for something you don't know - like the system you want to migrate to. The only thing, to me, is making sure you have access to all the data in a machine readable way (because you don't want to migrate manually). The migration will then require manual steps anyway. So for me it doesn't matter at all whether the data is in an Excel file or in thousands of json files. Actually, having thousands of lines in an Excel file is much worse, because in my experience it is very hard to work with (you usually need to convert to csv or something first). With the json files you first don't need to load everything into memory, and secondly you have the data in a machine readable format right away. Imho it is so much easier to work with the data that way. I would argue that you can extract data more efficiently from the json files than from a huge Excel file.
You could write a script that unifies all the json files into a single file in whatever format you want. But I'm not sure why you would want this, especially when the target system you want to migrate to is unknown. The export produces a well defined file structure. What do you mean by "normal"?
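(Regarding such a unifying script - a minimal sketch, assuming the per-item files are all named metadata.json as described above, could be a single line:)
# collect every per-item metadata.json into one big json array
find /path/to/export-directory/items -name metadata.json -exec cat {} + | jq -s '.' > all-metadata.json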
Thanks for your further input. The § character was the problem. I deleted that file within the docspell UI and then the export worked without errors. One thing to note: the other missing file that was not exported before is now exported as well. It seems that the problematic file with the § character stopped the export of the (one single) remaining file. Probably that's not the case, as it sounds strange.

Your thoughts about migration are very reasonable and draw me away from my former point of view. My idea was to have one full json file, as this is what the paperless-ng export does (I came to docspell because I compared the two and like docspell). In a converted Excel file you could use filters etc. But you're right, I don't want to do a manual migration.

Now, what I mean by "normal file structure" (sorry for being fuzzy) is to simply have the PDF files in the Windows (or any OS) file system - so without DMS features. I'm about to have my PDF files well maintained in docspell, and if I had to leave some day (well, I hope this project lives a very long time ;) ) I don't want to end up with a ton of files without any organization. I'd like to arrange a folder structure then, that maybe includes tag names in the file names, for example. I see that the export folder structure helps in this way, but you'd still have to extract the items into folders without symlinks and without subfolders for each item. Right now I wouldn't know how to do that without going crazy. That is my only concern about going with any DMS.
Ah right, I can totally see the idea with Excel now, it makes it easy to do changes on the whole data at once. It is a good idea if you don't have too much data, I think. I still think with more data it will be less convenient. But… you need to script / program something with the json file approach, though hopefully not much. If one doesn't feel comfortable with scripting, then a bunch of json files can look scary :-). A huge single one doesn't seem that much better, I would say (for me personally both are fine, I tend to prefer many smaller files over a single huge one - but that is probably a matter of taste :)). 🤔 Hm, I think the current metadata export really requires some sort of scripting to consume it later. Regarding the file structure, I think I see what you are after in general - that is exactly what the symlink trees (the --*-links options) are meant to provide.

Edit: regarding the failed file - this is probably a bug. It exports in batches, maybe when one file fails the whole batch fails. Just a guess, I need to look into this.
What I mean with the last sentences is that I would be overwhelmed by the export result if I had to quit using the DMS. Even with symlinks - I'd have to (re-)create a folder structure without symlinks. What file structure would help me? Well, this idea again comes from the paperless-ng export: I could choose how the file tree is set up, for example [correspondent]/[tag list]/[individual files]. Then I would have x folders for my x correspondents (like a bank or insurance company), in each of those folders are subfolders named by tags or tag lists, and in those folders are the matching files. The file name would contain the issued date of the document and a title. This would not be perfect for sure, but from there on I could manually sort the files and have most of the information about the files usable. Whether this really is a good method when you have many files - I don't know. I might experience trouble then. I see where you're coming from with individual json files, as this is machine readable if you have some script. I guess with a script it would be possible to get the result I've just suggested, roughly as sketched below. Personally, I'd have to evaluate how difficult this would be for me.

I hope you don't mind me referring to paperless-ng. I like both docspell and paperless-ng. Docspell has really great features and usability that appeal to me, and paperless-ng might be abandoned as far as further maintenance goes. Having a possibility for an exit strategy is really important, I think. I can't tell if one method is better than the other, after you explained your approach. I see that there is no ready-to-use and worry-free solution to migrate from a DMS, and I'm willing to put some work into it once I have to. But still, I fear that this could be some kind of trap for me if I failed at that. Maybe that's that. Do you have any other thoughts about this?
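(Purely as a sketch of what I mean - the field names correspondent and tags are only guesses on my side and would have to be checked against a real metadata.json:)
# guess-work sketch: rebuild a [correspondent]/[tag]/files tree from the export
out=/path/to/plain-tree
find /path/to/export-directory/items -name metadata.json | while read -r meta; do
  dir=$(dirname "$meta")
  corr=$(jq -r '.correspondent // "unknown"' "$meta")   # assumed field name
  tag=$(jq -r '.tags[0] // "untagged"' "$meta")         # assumed field name/shape
  mkdir -p "$out/$corr/$tag"
  # copy the real files lying next to metadata.json (-L dereferences any links)
  find "$dir" -maxdepth 1 -type f ! -name metadata.json -exec cp -L {} "$out/$corr/$tag/" \;
done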
No worries at all! I totally understand the importance of an exit strategy! I also did this evaluation for myself - I might want to migrate, too, some day. I think it's a great idea to look at the output of the export and think about whether that fits you or not. I would probably do the same. For me, having the data in some machine readable form would allow me to shape it for the next tool. I also sometimes look at the DB schema to see if I can make sense of it; it would be another way to get the data out without relying on the application. But I see that this approach is not suitable for everyone. I don't have good ideas for a general approach - the json files are some sort of middle way to have a common format that can be parsed easily. I'm always open to ideas.

Regarding the folder structure: I think if #114 is done (might not be very soon), you should be able to create at least something close. It would still be a symlink tree, though. Creating directories for tags with real files in them could be problematic, since you can assign multiple tags; files would need to be duplicated. That's why symlinks are used - to me this feels pretty much the same as "real" files. For example, if you share it via samba, the user won't notice a difference. On Linux at least it's not hard to replace the links with a copy of their target file, for example as sketched below. (The folder structure is real, only the files at the end are symlinks; just so we are on the same page - I'm not sure how closely you looked at it.) The idea is that you can use the "items" folder as input to scripts, because it is always the same structure, and the symlink trees are for humans to look at.

But yes, you need to decide for yourself in the end - as you just found, a silly bug can also be a problem. With some work, you can create some safety nets if needed. In general (with any self hosted and more important application) I would try to not only create data backups, but also do this for the environment and software binaries/packages. With docker, vms and such things, this has become more convenient now (only testing the backups is, as always, a pain :)). The idea is that even if the software is abandoned, you can always use the last version, which may have served you reliably for some time (of course there are caveats - I would not expose it anymore, but only use it internally). This could give you some breathing room for migrating, or you can keep using it just as well.
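(For the Linux case mentioned above, replacing a symlink tree with a real-file copy can be as simple as this - the paths are placeholders:)
# -L dereferences symlinks while copying, so the destination contains real files
cp -rL /path/to/export-directory/by_tags /path/to/real-file-copy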
I'd absolutely use the software for some time, even if further development were stopped. About the symlinks: I'll look into whether I can replace them on Windows with the files they point to. I see a clear advantage of the individual json files, as they sit right next to the files themselves, so you don't need to rely on one big Excel file. But let's assume you had come across such a dsc export file structure before you had ever seen docspell, and now you want to put all items and tags into docspell. Would you have an idea on how to start? In docspell there is an import functionality for the original paperless project, but I think coming from a plain export you would need to do everything manually. I don't expect a full manual, but if I had to do the task now, I wouldn't know how to start.
This is hard to answer for the general case. I think I would look at what data is supported in the new tool and what ways exist to bring it into that tool. Then I would start looking at one item in the items folder of the export. For docspell specifically, you can first go through all items and upload the files. In a second pass, each id can be obtained by sending the sha256 of a file. With this id it is then possible to associate tags, correspondents etc. I would maybe start with 10 or so to see how it goes and then apply it to all.
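A rough sketch of that two-pass idea (upload and file-exists are existing dsc sub-commands; the exact flags, and how to then attach tags to the returned id - via dsc or curl against the api - should be checked in dsc --help and the docspell api docs):
# Pass 1: upload every exported file (skipping the metadata files)
find /path/to/export-directory/items -type f ! -name metadata.json | while read -r f; do
  dsc -d http://myNasIP:7880 upload "$f"
done
# Pass 2: ask docspell for each file's item id (the lookup uses the file's sha256),
# then use that id to attach tags, correspondents etc.
# (the path below is a placeholder for one of the exported files)
dsc -d http://myNasIP:7880 file-exists /path/to/export-directory/items/.../some-file.pdf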
If you are using a NAS, you could do the export there and share the directory via Samba. From Windows you should be able to access and/or copy the files (i.e. the symlinks are not visible as such; they are represented as normal files, iirc).
Hi, sorry I was off for a few days, and I did not have proper input to answer.
You mean that you would request the feature to mass import tags, or you would request to have attributes like correspondents in a tool?
I didn't look this up, but is this a functionality of dsc to mass extract IDs?
This means there are commands in dsc that allow me to mass upload e.g. tags or correspondents? Edit: Corrected quotes
No worries for being off - it is good to be off sometimes :) I think I don't understand the questions, sorry.
dsc can set tags for an item, but otherwise there is not much functionality yet for changing/adding metadata. But since it is only a client to the quite comprehensive api, you can always use curl against the api directly.
I'm not sure what you mean by "extracting IDs". You were asking how I would go about this for docspell - in this case you can upload files and later ask for the docspell ID of each file to associate tags and other metadata.
Sorry, I was off again :) Now I need to find time to finish my DMS project. Basically I wanted to ask if there is a way to:
- upload all documents in bulk
- get the id of each uploaded document
- attach tags and correspondents to them in bulk
If there is some way, then migration to docspell would be doable without too much manual effort. I'll have a look at the api and curl, but this would be a quite unknown approach for me. I'll see if this is too intimidating or something suitable for me to learn.
Hi and no worries :) - yes, this is exactly how it would work. You can first upload all documents, then ask for the id given the sha256 of a file, and then you can attach tags and other metadata. But you need to code something together, that is true. Uploading and getting the id for a file is supported in dsc directly.
Ok, thanks a lot for your help. I guess I'd have to dig deep into it and see how it would work for me. But I see that I can work with Docspell, and if I had to migrate to another tool sometime, there should likely be a tool that supports mass input of data - but I'd need some understanding of coding to get it done. Do you think this is doable for a non-programmer? I know a bit of coding, but I have no feeling for what skills and experience are needed for such a task.
For a complete non-programmer it is probably a tough journey. But with only a little coding experience, I would say it is doable. I think a bit of experience in shell scripting is a good thing, and knowing tools like jq and curl helps a lot. The problem is that preparing something for a "future unknown tool" to process is difficult. The role of the export is simply to get all the data out in a machine readable form; the rest depends on the target tool.
That sounds like a good conclusion. Thanks a lot for your help and input! And thanks for the great work you do! I'll see how I get on from here. I might have a closer look at jq, curl and python, now that I have a real purpose to do so.
Hi everyone
I'm trying to use the dsc export feature to export all data out of my docspell instance on my Synology NAS. I looked everywhere I could think of, but I didn't even manage to connect dsc to my NAS. I know there should be a config file somewhere, or it should be created somehow, but even with dsc --help I have no clue how to connect to my NAS using dsc on Windows.
Do you have any starting point on how to proceed?