"TypeError: cannot pickle '_io.TextIOWrapper' object" issue blocking final step; python version issue? #570

Closed
philoel opened this issue May 26, 2021 · 3 comments

Comments

@philoel

philoel commented May 26, 2021

Hi there - I'm running my first analysis in a few months, this time on a new machine with fresh installs of everything, including Python. After getting almost to the end of the whole run, I noticed that the species tree wasn't inferred correctly, so I passed a manually corrected tree using orthofinder -ft path_to_dir -s manual_tree.txt.

It fails almost immediately with this message (copied to a .txt file and attached).

I googled the final TypeError and found GoogleCloudPlatform/gsutil#961, which suggests that it might be caused by a recent version of Python. I'm using Python 3.9.4 for this.

I didn't see "TypeError: cannot pickle '_io.TextIOWrapper' object" in the issues for OrthoFinder, so I'm not sure if this is new or not.

Anyway, I downgraded my Python to 3.6.0 to see if the error might really be solved just by changing the Python version, and it worked nicely.

So, heads up that there is some issue arising in Python 3.9 that interferes with ... pickling things? I'm not sure if there is a better venue to let you know this than by opening an issue.

orthofinder_TypeError_output.txt

Best,
Phil

@davidemms
Owner

Hi Phil

Thanks for raising this and for the extra info supplied. I will take a look and see if there's a fix I can make in the OrthoFinder code to resolve this.

Best wishes
David

@davidemms
Owner

Hi Phil

Thanks again for reporting this. The notes below are mainly a record of the changes I've made to fix this and why, so feel free to ignore them if they're not of interest.

On macOS, as of Python 3.8, the default multiprocessing start method is spawn rather than fork: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
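For anyone wanting to check this on their own machine, here is a minimal sketch using only the standard library. The function names are made up for illustration, and this is not necessarily how OrthoFinder selects its start method.

```python
import multiprocessing as mp


def describe_start_methods():
    """Report the default start method and those available on this platform."""
    return {
        "default": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


def fork_context_or_default():
    """Return a context that uses fork where the platform offers it.

    Hypothetical helper: it falls back to the platform default otherwise
    (for example on Windows, where fork does not exist).
    """
    if "fork" in mp.get_all_start_methods():
        return mp.get_context("fork")
    return mp.get_context()


if __name__ == "__main__":
    print(describe_start_methods())
    print(fork_context_or_default().get_start_method())
```

Using a context object rather than a global setting keeps the choice local to the code that needs it.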

With spawn, everything passed to a child process has to be pickled, and TextIOWrapper objects (open text file handles) cannot be pickled. These are essential for this particular bit of multiprocessing code in OrthoFinder. OrthoFinder parallelises over the gene trees, and each of these parallel processes identifies all the orthologs in each gene tree it processes and writes them to the ortholog results files. I can see only one way to get around this, detailed below. It would be a large amount of work to implement and has some significant downsides, which might not be resolvable. For that reason I've switched back to forking on macOS.
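The underlying failure can be reproduced without multiprocessing at all, since it comes from pickling an open text file handle. A small sketch (the function and file names are invented for the demonstration; the exact wording of the message varies between Python versions):

```python
import os
import pickle
import tempfile


def try_pickle_open_file():
    """Try to pickle an open text file handle; return the error message, if any."""
    with tempfile.TemporaryDirectory() as tmp:
        # open(..., "w") returns an io.TextIOWrapper, the type named in the error
        with open(os.path.join(tmp, "orthologs.tsv"), "w") as handle:
            try:
                pickle.dumps(handle)
            except TypeError as err:
                return str(err)
    return None


if __name__ == "__main__":
    print(try_pickle_open_file())
```

Under fork the child inherits the parent's open files directly, so nothing needs to be pickled and the error does not arise.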

Alternative: there would be a single file-writer process responsible for writing all the orthologs, duplications, etc. to file. All the other processes would pass their orthologs etc. to this process, and it would write them to file. As it would wholly own the TextIOWrapper objects, there would be no need to pass them between processes, which would eliminate the issue. However, there are significant load-distribution issues. If the parallel tree-processing processes produced orthologs, gene duplicates, etc. faster than the serial writer process could write them, an increasingly large backlog could build up and exceed the amount of RAM on the machine, causing a crash. It might be possible to parallelise the writing of different ortholog results files over several writer processes, but this would be complex and would still require very delicate balancing to prevent such problems. Currently, each tree-processing process is also responsible for writing its own orthologs, so no such backlog can build up and the method balances itself automatically.
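To make the alternative concrete, here is a rough sketch of the single-writer pattern. It is not OrthoFinder code: every name in it is hypothetical, the "orthologs" are placeholder strings, and the bounded queue (maxsize) is an added assumption that is not part of the notes above.

```python
import multiprocessing as mp


def writer(queue, path, n_workers):
    """Sole owner of the output file: write queued lines until all workers finish."""
    finished = 0
    with open(path, "w") as out:
        while finished < n_workers:
            item = queue.get()
            if item is None:  # None marks the end of one worker's output
                finished += 1
            else:
                out.write(item + "\n")


def worker(queue, tree_ids):
    """Stand-in for tree processing: send one placeholder line per gene tree."""
    for tree_id in tree_ids:
        queue.put(f"ortholog_line_for_tree_{tree_id}")
    queue.put(None)


def run_single_writer(path, batches, ctx=None, maxsize=100):
    """Start one writer plus one worker per batch and wait for all of them."""
    ctx = ctx or mp.get_context()
    # Assumption: a bounded queue, so put() blocks once maxsize items are waiting
    queue = ctx.Queue(maxsize=maxsize)
    writer_proc = ctx.Process(target=writer, args=(queue, path, len(batches)))
    writer_proc.start()
    workers = [ctx.Process(target=worker, args=(queue, batch)) for batch in batches]
    for proc in workers:
        proc.start()
    for proc in workers:
        proc.join()
    writer_proc.join()
```

Only strings and the queue cross the process boundary, so no file handle has to be pickled. With a bounded queue the backlog cannot grow without limit, but workers stall whenever the writer falls behind, which is the same load-balancing problem in another form. Under the spawn start method, a call such as run_single_writer("orthologs.txt", [[1, 2], [3]]) needs to sit under an `if __name__ == "__main__":` guard.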

@davidemms
Owner

Hi @philoel, I've submitted a fix, but I have limited ability to test things on Mac. If you'd be able to try it and let me know whether it fixes the issue for you, that'd be really helpful, thanks!
