FastaWriter errors after about 5.7 million files #13

Open
CristinaMoraru opened this issue Jan 24, 2024 · 0 comments

Hi there!
Thank you for coding the FastaIO library for Julia; I use it quite often. Recently I ran into a problem when I had to split a very large multi-FASTA file (~15.9 million records) into a folder of individual files.

This is my function:

```julia
function mfasta2fasta(mfasta_p::FnaP, write_p::String; addprefix::Bool=false, prefix::String="", extension::String="fasta",
                      removeprev::Bool=false)
    if removeprev
        rm_mkpaths([write_p])
    else
        mkpath(write_p)
    end
    i = 0
    FastaReader(mfasta_p.p) do FASTA
        for fr in FASTA
            if addprefix == false
                outp = "$(write_p)/$(fr[1]).$(extension)"
            else
                outp = "$(write_p)/$(prefix)_$(fr[1]).$(extension)"
            end

            i = i+1

            try
                FastaWriter(outp) do fw
                    @suppress writeentry(fw, "$(prefix)_$(fr[1])", "$(fr[2])")
                end
            catch e
                if e isa SystemError
                    println(i)
                    println(e)
                    println("$(fr[1])")
                    println("$(fr[2])")
                    println(length(fr[2]))
                    break
                end
            end
        end
    end

    return nothing
end
```

The "i" counter and the "try-catch" were introduced later, after the function kept failing with an "out of disk space" error, always at the same FASTA record, after ~5.7 million records had been written as individual files in the folder. I've tested it, and that record is not broken (I can see its name and bases, and its length is 5.7 kb). If I write just that record to a separate folder, it works. And there is plenty of space on the disk; I've checked that as well.
In the end, I had to run this function several times, saving only ~5 million records per folder. I created a total of 4 folders.

The code used was:
```julia
function mfasta2fastaIMG(mfasta_p::FnaP, write_p::String; addprefix::Bool=false, prefix::String="", extension::String="fasta",
                         removeprev::Bool=false)
    if removeprev
        rm_mkpaths([write_p])
    else
        mkpath(write_p)
    end

    i = 0
    FastaReader(mfasta_p.p) do FASTA
        for fr in FASTA
            i = i+1

            # Lower bounds used in successive runs: 1, 5000000, 10000000, 15000000
            # (each paired with the corresponding upper bound in the elseif below).
            if i < 15000000
                continue
            # Upper bounds used in successive runs: 5000000, 10000000, 15000000, 16000000
            elseif i < 16000000
                outp = "$(write_p)/$(fr[1]).$(extension)"

                if length(fr[2]) >= 2000
                    FastaWriter(outp) do fw
                        @suppress writeentry(fw, "$(prefix)_$(fr[1])", "$(fr[2])")
                    end
                end
            else
                break
            end
        end
    end

    return nothing
end
```
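
For reference, the manual split into several folders could probably be automated instead of rerunning the function with different bounds. This is only a minimal sketch: `shard_dir` and the `part_NNN` folder names are hypothetical (not part of FastaIO), the demo writes records with plain `write` rather than `FastaWriter` so that it runs on its own, and it assumes the failure really is tied to the number of files in a single folder, which I have not confirmed.

```julia
# Hypothetical helper (not part of FastaIO): map the running record index `i`
# to a numbered subfolder so that no folder receives more than `per_dir` files.
function shard_dir(write_p::AbstractString, i::Integer; per_dir::Integer=5_000_000)
    shard = (i - 1) ÷ per_dir + 1                       # records 1..per_dir go to shard 1, and so on
    dir = joinpath(write_p, "part_" * lpad(string(shard), 3, '0'))
    mkpath(dir)                                         # does nothing if the folder already exists
    return dir
end

# Small self-contained demo with made-up records and a tiny per-folder limit.
mktempdir() do tmp
    for (i, name) in enumerate(["a", "b", "c"])
        outp = joinpath(shard_dir(tmp, i; per_dir=2), "$(name).fasta")
        write(outp, ">$(name)\nACGT\n")                 # stand-in for the FastaWriter call
    end
    println(sort(readdir(tmp)))                         # lists the subfolders that were created
end
```

Inside the reading loop, the output path would then be built as `joinpath(shard_dir(write_p, i), "$(fr[1]).$(extension)")`. Calling `mkpath` for every record is wasteful but keeps the sketch short.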

I don't know what the problem is. It might be related to a faulty file system on the server, or it might be related to your library or to Julia itself. So I thought I would make you aware of it.

Best
