Error in process busco4_dist #32

Open
ghost opened this issue Dec 9, 2021 · 3 comments

Comments

ghost commented Dec 9, 2021

Hi, I apologize for contacting you so often.

When running SOS_busco.py in process busco4_dist, I got the following error:

Command error:
  Traceback (most recent call last):
    File "/mnt/data/software/TransPi/bin/SOS_busco.py", line 38, in <module>
      busco_df = pd.read_csv(input_busco_file, sep=',',header=0,names=['Busco_id','Status','Sequence','Score','Length'])
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 686, in read_csv
      return _read(filepath_or_buffer, kwds)
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 458, in _read
      data = parser.read(nrows)
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 1186, in read
      ret = self._engine.read(nrows)
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 2145, in read
      data = self._reader.read(nrows)
    File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
    File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
    File "pandas/_libs/parsers.pyx", line 918, in pandas._libs.parsers.TextReader._read_rows
    File "pandas/_libs/parsers.pyx", line 905, in pandas._libs.parsers.TextReader._tokenize_rows
    File "pandas/_libs/parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
  pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 51, saw 8

I think the problem is in the input file for SOS_busco.py (in my case, Read_R_all_busco4.tsv).
Most lines of my Read_R_all_busco4.tsv have 6 commas (7 columns), like this:
0at38820,Duplicated,SOAP.k25.scaffold27258,8202.3,4167,https://www.orthodb.org/v10?query=0at38820,sacsin

However, some lines of my file have 7 or 8 commas (8 or 9 columns), like this:
121at38820,Complete,SOAP.k25.scaffold11722,3027.5,1446,https://www.orthodb.org/v10?query=121at38820,Zinc finger, RING-type
Here the description in the last column contains a comma itself, so it is split into extra fields. I think this difference in the number of commas (columns) is the cause of the pandas error.
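
One quick way to check this diagnosis is to count the comma-separated fields on each line. The sketch below is illustrative only: it uses the two example rows quoted above, and the helper name `field_counts` is hypothetical, not part of TransPi.

```python
from collections import Counter

# The two example rows from this report; the second has a comma
# inside its description ("Zinc finger, RING-type").
rows = [
    "0at38820,Duplicated,SOAP.k25.scaffold27258,8202.3,4167,https://www.orthodb.org/v10?query=0at38820,sacsin",
    "121at38820,Complete,SOAP.k25.scaffold11722,3027.5,1446,https://www.orthodb.org/v10?query=121at38820,Zinc finger, RING-type",
]


def field_counts(lines):
    """Map 'number of comma-separated fields' to 'number of lines with that count'."""
    return Counter(len(line.rstrip("\n").split(",")) for line in lines)


counts = field_counts(rows)
for n_fields, n_lines in sorted(counts.items()):
    print(f"{n_fields} fields: {n_lines} line(s)")
```

For a real file, an open file handle can be passed in place of `rows`; more than one distinct field count in the result suggests the parser will hit the same error.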

SOS_busco.py doesn't seem to use the input file's columns from 6 onwards.
If so, we could drop those columns before calling SOS_busco.py.

TransPi/TransPi.nf

Lines 1591 to 1592 in 899d160

cat $transpi_tsv | grep -v "#" | tr "\\t" "," >>$all_busco
SOS_busco.py -input_file_busco $all_busco -input_file_fasta $assembly -min ${params.minPerc} -kmers ${params.k}

Here is an example of the change I suggest:

cat $transpi_tsv | grep -v "#" | tr "\\t" "," >>$all_busco
awk -F',' -v OFS=',' '{print $1,$2,$3,$4,$5}' $all_busco > some.csv
SOS_busco.py -input_file_busco some.csv -input_file_fasta $assembly -min ${params.minPerc} -kmers ${params.k}
rm -rf some.csv

I hope this helps you.
Thank you.

rivera10 (Member) commented Dec 9, 2021

Hello @HarukiNakamura,

No worries. Thanks for finding issues and providing suggestions to TransPi. We appreciate it.

You are right, the last column will cause issues since the name has a comma and SOS_busco.py will fail. I think the easiest solution is what you suggested. I will do a test and modify the code. Thanks!

Best,
Ramón

rivera10 (Member) commented Dec 9, 2021

Pinging @n-conci

AlexGaithuma commented Oct 30, 2022

This works (lines 1517 and 1591):

cat full_table_*.tsv | grep -v "#" | tr "\t" "," | cut -d ',' -f1-5 >.busco_names.txt
cat $transpi_tsv | grep -v "#" | tr "\t" "," | cut -d ',' -f1-5 >>$all_busco
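
For anyone who wants to see what this pipeline does to each row, here is a rough Python equivalent. It is a sketch only: `trim_busco_rows` is a hypothetical helper, not TransPi code, and the sample input (including the `#` header line) assumes the tab-separated layout implied by the rows quoted earlier in this issue.

```python
def trim_busco_rows(lines, n_keep=5):
    """Roughly mimic: grep -v "#" | tr "\\t" "," | cut -d ',' -f1-5."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # grep -v "#": drop every line that contains '#'
        if "#" in line:
            continue
        # tr "\t" ",": turn tabs into commas, then split on commas
        fields = line.replace("\t", ",").split(",")
        # cut -f1-5: keep at most the first five fields
        out.append(",".join(fields[:n_keep]))
    return out


# Assumed sample input: one comment line and the problematic row, tab-separated.
sample = [
    "# Busco id\tStatus\tSequence\tScore\tLength",
    "121at38820\tComplete\tSOAP.k25.scaffold11722\t3027.5\t1446\t"
    "https://www.orthodb.org/v10?query=121at38820\tZinc finger, RING-type",
]

trimmed = trim_busco_rows(sample)
print(trimmed[0])
```

Because only the first five fields are kept, a comma inside the description column can no longer change the number of columns that SOS_busco.py sees.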
