PICCL pipelines need to do better input validation and provide better error/warning messages to the user + general lack of documentation needs to improve #37
PICCL documentation is unfortunately indeed in a rather minimal state at the moment; all there is currently is the README in this repository, which should give some examples. The
Yes, it needs to be put in a spot that is shared between container and host system, you can't reference paths on the host system from within the container. Consult the docker documentation regarding volumes, at https://docs.docker.com/storage/volumes/ to learn how to deal with sharing data.
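To illustrate the point about shared locations: a bind mount makes one host directory visible at a different path inside the container, and only that container-side path can be passed to the pipeline. The sketch below is not part of PICCL or Docker. `to_container_path` and the `/data` mount point are hypothetical names chosen for illustration, and it assumes the Desktop folder from the question was shared with the container through Docker's `-v` option.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_container_path(host_path: str, host_mount: str, container_mount: str) -> PurePosixPath:
    """Translate a Windows host path into the path seen inside the container.

    host_mount is the host directory that was shared with the container,
    container_mount is where it appears inside the container (hypothetical: /data).
    Raises ValueError if host_path is not inside the shared directory.
    """
    relative = PureWindowsPath(host_path).relative_to(PureWindowsPath(host_mount))
    return PurePosixPath(container_mount).joinpath(*relative.parts)


# The PDF on the Windows desktop, as seen from inside the container.
print(to_container_path(
    r"C:\Users\willstout\Desktop\test.pdf",
    r"C:\Users\willstout\Desktop",
    "/data",
))
```

Under these assumptions the pipeline would be given the container-side directory (here `/data`) as `--inputdir`, never the `C:\...` path, which does not exist inside the container.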
Good point, there should be more checks and more helpful error messages implemented.
It probably didn't find the input directory, or didn't find any PDF files in it. A message would have been nice, yes; these are definite points for improvement. I'll make that the topic for this issue.
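As a rough idea of the kind of check discussed here, the following sketch verifies an input directory before any processing starts. It is not PICCL's actual implementation (the pipelines are Nextflow scripts); `find_input_files` is a hypothetical helper, and the error wording is only a suggestion.

```python
from pathlib import Path
from typing import List


def find_input_files(inputdir: str, extension: str = "pdf") -> List[Path]:
    """Return the input files in inputdir, or raise an error that says what is wrong."""
    directory = Path(inputdir)
    if not directory.exists():
        raise FileNotFoundError(f"Input directory does not exist: {directory}")
    if not directory.is_dir():
        # Catches the case where the path to a single file is passed as --inputdir.
        raise NotADirectoryError(f"--inputdir must be a directory, not a file: {directory}")
    wanted = "." + extension.lower()
    files = sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == wanted
    )
    if not files:
        raise ValueError(f"No {wanted} files found in {directory}")
    return files
```

Each of the three failure cases gets its own message, so a missing directory, a file passed in place of a directory, and an empty directory can be told apart.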
Also it's not really specified where the ocr_output directory is. Where is that?
Since it's a relative path, it will be created in your current working directory. You can set any other path, absolute or relative, with the
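The difference between relative and absolute paths can be shown with a small sketch. `resolve_output_dir` is a hypothetical helper, not something PICCL provides, and the working directory `/home/lamachine` is assumed for the example. Note that `home/lamachine` without a leading slash is a relative path too.

```python
from pathlib import PurePosixPath


def resolve_output_dir(outputdir: str, cwd: str) -> PurePosixPath:
    """Show where an output directory ends up, given the current working directory."""
    path = PurePosixPath(outputdir)
    # Absolute paths are used as-is; relative paths are appended to cwd.
    return path if path.is_absolute() else PurePosixPath(cwd) / path


cwd = "/home/lamachine"  # assumed working directory
for candidate in ("ocr_output", "home/lamachine", "/home/lamachine"):
    print(candidate, "->", resolve_output_dir(candidate, cwd))
```

Only the last candidate, with its leading slash, refers to `/home/lamachine` itself; the other two are created underneath the working directory.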
I'm having issues with this. I have a PDF that I first copy into my Docker file system, which works. Then I run "ocr.nf --inputdir home/lamachine/test.pdf --language eng --outputdir home/lamachine". It appears to succeed: there are no errors or warnings. But things don't seem right. Before running ocr.nf I had two things in my lamachine directory, bin and test.pdf. Afterwards I have three: bin, test.pdf, and work. This new work directory seems like a good place for my new test.folia.xml file, but there is nothing in it. After some more looking around, I still can't find the file. I have no idea where it ended up, or whether it was created at all. Do you have any idea what could be up? I'm also not seeing an ocr_output directory in my current working directory. And when I provide two different output directories (I tried home/lamachine and home), only one gets that "work" directory, and whichever output directory I give, the work directory always shows up in my current working directory. I have no idea what's going on. A similar failure happens with tokenize.nf:
I'm just not seeing where any of the files end up.
Still having this issue.
You specified
(Sorry to keep pestering you with this issue)
The work directory has nothing in it, and as far as I can tell it's the only thing that was changed or created by running ocr.nf.
I just released a new PICCL version that contains more input validation (as per this issue), though it is still not ideal. Considering that you keep running into problems related to input parameters and files, can you try it and see whether the new messages make things any clearer for you? (You'll need to update your LaMachine.) As to the above problem, you used a non-existing parameter (
Whoops, --input was an accident, but the issue persists. I've looked over my commands and made sure that I haven't made any of the other mistakes you went over either. The notes were good though: I realized I can't use the path to the exact file as the input directory, and I've probably been doing that wrong the whole time.
Okay so good news and bad news. I got it working, but it only seems to work with pdf files, not tif files (I only tested those two file types). During the first run of
And for the sake of a little more testing, and to see whether something is up with OCR'ing a tiff file, I tried this:
Even with the correct tiff file notation (I'm pretty sure that's how it's meant to be),
(Sorry for the delay, my holiday period is starting so I'll be less available in the coming weeks.) For tiff the filename indeed needs to correspond to a particular pattern (
(Closing this after long inactivity. The situation in the latest release today should at least be better, although it's still not ideal.)
I'm trying to run ocr.nf with docker and I'm not sure how the parameters are meant to be used.
So for the --inputdir parameter, we're only supposed to give the folder that contains the images? Does this mean that whatever image files are in that folder are going to be run through the pipeline? And is this file system our normal file system or our Docker file system?
So if I want to run a PDF that's sitting in my desktop folder through the pipeline, would I run
"ocr.nf --inputdir C:\Users\willstout\Desktop --language eng"? Or would I first need to add it to a Docker container?
ocr.nf is quite confusing to work with because there isn't a lot of documentation on the whole program. In fact, running "ocr.nf --help" does exactly the same thing as "ocr.nf". Additionally, if I purposely run something wrong just to see what error I get, the program runs the same as if nothing were wrong. For instance, running "ocr.nf --inputdir" without giving it a directory sends me back to the starting point of the OCR pipeline. Running with a specified directory just tells me
N E X T F L O W ~ version 0.30.2
Launching `/usr/local/bin/ocr.nf` [desperate_newton] - revision: 76d7839f83
OCR Pipeline
[warm up] executor > local
lamachine@eab8a83a33ea:~$
And running this with a directory that doesn't exist gives that same output. So there's no way to tell if what I am doing is correct.