Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ClustalO wrapper example from the docs is not working correctly #116

Open
kMutagene opened this issue May 7, 2021 · 8 comments
Open
Assignees
Labels
bug project-IO Issues regarding the BioFSharp.IO project

Comments

@kMutagene
Copy link
Member

Describe the bug
The example throws an error when it wants to read some temporary files

System.IO.IOException: The process cannot access the file 'C:\Users\schne\AppData\Local\Temp\ac7bce1d-efe2-45ba-8ffa-90d0a60e0f49.fasta' because it is being used by another process.   at System.IO.FileSystem.DeleteFile(String fullPath)
   at System.IO.File.Delete(String path)
   at BioFSharp.IO.ClustalOWrapper.ClustalOWrapper.AlignSequences(IEnumerable`1 input, IEnumerable`1 parameters, FSharpOption`1 name) in C:\Users\schne\source\repos\CSBiology\BioFSharp\src\BioFSharp.IO\ClustalOWrapper.fs:line 347
   at <StartupCode$FSI_0010>.$FSI_0010.main@()
Stopped due to error

To Reproduce

#r "nuget: BioFSharp, 2.0.0-beta5"
#r "nuget: BioFSharp.IO, 2.0.0-beta5"

open BioFSharp
open BioFSharp.IO

open ClustalOWrapper
let cw = ClustalOWrapper()

let sequences = 
    [
    TaggedSequence.createTaggedSequence "pep1" ("AAGECGEK")
    TaggedSequence.createTaggedSequence "pep2" ("AAGEGEK")
    TaggedSequence.createTaggedSequence "pep3" ("AAAGECGEK")
    TaggedSequence.createTaggedSequence "pep4" ("AAGECGEL")
    ]

let alignedSequences = 
    cw.AlignSequences(sequences,Seq.empty)

Additional context

Also, conversion functions or overloads for AlignSequences that work on all BioSequences (BioArray, BioList, BioSeq), so something like TaggedSequence<string,seq<IBioItem>> would be awesome. Tagging @HLWeil as you're the creator ;)

@kMutagene kMutagene added bug project-IO Issues regarding the BioFSharp.IO project labels May 7, 2021
@kMutagene
Copy link
Member Author

Maybe related: #61

@HLWeil
Copy link
Member

HLWeil commented May 7, 2021

If I remember correctly, the ClustalO tool is executed and the output is directly read back into the F# runtime afterwards. It seems like the reader is not waiting until the ClustalO is finished. I'll look into this..

The issue #61 is unrelated but definitely needs some attention 😅 (Edit: Whoops definitely related to your additional context)

@HLWeil
Copy link
Member

HLWeil commented May 7, 2021

I could reproduce the problem but: The code you wrote above shouldn't even get that far because using the ClustalOWrapper() constructor with no parameters will be looking for a default clustalo.exe in a nonsense path (mea culpa).

Downloading the tool and giving it's exe path to the constructor fixes this first hurdle:

let clustalToolPath = System.IO.Path.Combine(__SOURCE_DIRECTORY__,"clustal-omega-1.2.2-win64\clustalo.exe")
let cw = ClustalOWrapper(clustalToolPath)

But calling the AlignSequences methods fails with the error you described. The reason for this seems to be the clustalo tool failing:

> Starting -i C:\Users\HLWei\AppData\Local\Temp\5a1b3b45-1685-4741-a450-28f071ce30cf.fasta -o C:\Users\HLWei\AppData\Local\Temp\0508777c-193d-4bb7-bd27-d903cffb787a.aln --infmt=fa --outfmt=clu ...

FATAL: File C:\Users\HLWei\AppData\Local\Temp\5a1b3b45-1685-4741-a450-28f071ce30cf.fasta does not appear to be in FASTA format at line 1. You may want to specify the file format on the command line. Usually this is done with an option --infmt <fmt>.

I can reproduce this error when running the tool by hand in the powershell, which is weird as the input file is definitely in fasta format:

>pep1
AAGECGEK
>pep2
AAGEGEK
>pep3
AAAGECGEK
>pep4
AAGECGEL

Maybe some line ending bonanza or the fasta format was changed recently 🤷

Edit: Whoops misclicked on close issue. Will investigate further

@HLWeil HLWeil closed this as completed May 7, 2021
@HLWeil HLWeil reopened this May 7, 2021
@HLWeil
Copy link
Member

HLWeil commented May 7, 2021

The line FastA.write id inPath (sequences |> Seq.map tsToFasta) does write in UTF8-BOM encoding which does not seem to be supported by ClustalO. Copying the same fastA file into a UTF8 encoded file does work properly.

Can you check whether your temporary files are encoded in UTF8-BOM or UTF8 not?

Also the FastA.write does not seem to close the writer when finished, blocking the file.
If this is also the case for you I think we should open an additional issue.

AdditionalInfo:

let tsToFasta (ts:TaggedSequence.TaggedSequence<string,char>) =
    {FastA.Header = ts.Tag;
    FastA.Sequence = ts.Sequence}

@kMutagene
Copy link
Member Author

Good catch, UTF8BOM was already an issue when is tried to use fasta files for TargetP2, could have thought about that one. Ill fix that one

@kMutagene
Copy link
Member Author

Should be fixed with beb4158

Adaptions to work on biosequences would still be highly appreciated

@kMutagene
Copy link
Member Author

Also, the rootpath works just fine for me as long is i have the BioFSharp repo on my machine, was that not the intention of the design?

@HLWeil
Copy link
Member

HLWeil commented May 8, 2021

well yes, but actually no

This is the path it tries for me: C:\Users\schne\source\repos\CSBiology\BioFSharp\lib\clustal-omega\clustalo.exe 💯

I think either we change it to look for a global clustal installation or we make the parameter mandatory?

(Edit: Yes that was the intention. Not sure if building the repo yourself should be regarded as a a case in the code though)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug project-IO Issues regarding the BioFSharp.IO project
Projects
None yet
Development

No branches or pull requests

2 participants