error in datapre step #184

Open
huawen-poppy opened this issue Sep 19, 2023 · 7 comments

Comments

@huawen-poppy

Hello, thanks for your great tool!

I recently tried to run xpore on my data, but it fails with the following error:
File "pandas/_libs/lib.pyx", line 2411, in pandas._libs.lib.maybe_convert_numeric ValueError: Unable to parse string "114.817,115.635,109.092,117.816,123.814,102.277" at position 0

Could you please help me fix this problem? Thank you very much!
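For readers hitting the same message, here is a minimal sketch of how this kind of `ValueError` arises and how one might locate the offending column. It is not xpore's code: the table and the `extra` column name are made up for illustration, and `non_numeric_columns` is a hypothetical helper.

```python
import io
import pandas as pd

# Made-up two-row table; the last column packs several numbers into one string,
# like the value quoted in the error message above.
tsv = (
    "position\tevent_level_mean\textra\n"
    "10\t114.8\t114.817,115.635,109.092\n"
    "11\t109.1\t117.816,123.814,102.277\n"
)
df = pd.read_csv(io.StringIO(tsv), sep="\t", dtype=str)


def non_numeric_columns(frame):
    """Return the names of columns that pandas cannot convert to numbers."""
    bad = []
    for name in frame.columns:
        try:
            pd.to_numeric(frame[name])
        except (ValueError, TypeError):
            # pandas raises here with "Unable to parse string ... at position 0"
            bad.append(name)
    return bad


print(non_numeric_columns(df))
```

Running a check like this on the first few rows of an input file can show which column is not purely numeric before a long job is started.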

@yuukiiwa
Collaborator

Hi @huawen-poppy,

Do you mind sharing the command you used and the head of each input (eventalign.txt, gtf, and fasta), please?

Thanks!

Best wishes,
Yuk Kei

@huawen-poppy
Author

Thank you for your response!
I was using the command xpore dataprep --eventalign eventalign.txt --gtf_or_gff CC7.gtf --transcript_fasta aip.genome_models.no_isoforms.no_duplication.mRNA.fa --out_dir ./output --n_process 32
I figured out that the error came from the eventalign file, which had an extra column containing strings such as '114.817,115.635,109.092,117.816,123.814,102.277'. I have now deleted the extra column, but I get another error:
`/home/zhonh0b/miniconda3/envs/epigenetic/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/zhonh0b/miniconda3/envs/epigenetic/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/zhonh0b/miniconda3/envs/epigenetic/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/zhonh0b/miniconda3/envs/epigenetic/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/zhonh0b/miniconda3/envs/epigenetic/lib/python3.8/site-packages/xpore/scripts/dataprep.py:72: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chunk_split['line_length'] = np.array(lines)`

The header of the eventalign.txt file is:
image

The header of the gtf file is:
image

The header of the fasta file is:
image

Could you please help me solve this problem? Thanks!
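As an illustration of the "delete the extra column" step described above, the following sketch copies a tab-separated file while omitting one named column. It streams line by line, so it should cope with large eventalign files. The function name and the column name passed to it are hypothetical; the real name of the extra column is not given in this thread.

```python
def drop_column(src_path, dst_path, column):
    """Copy a tab-separated file, omitting one named column."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        header = src.readline().rstrip("\n").split("\t")
        drop = header.index(column)  # raises ValueError if the column is absent
        del header[drop]
        dst.write("\t".join(header) + "\n")
        for line in src:
            fields = line.rstrip("\n").split("\t")
            del fields[drop]
            dst.write("\t".join(fields) + "\n")
```

A call might look like `drop_column("eventalign.txt", "eventalign.fixed.txt", "extra")`, where `"extra"` stands in for whatever the unwanted column is called in the file's header.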

@yuukiiwa
Collaborator

Hi @huawen-poppy,

To convert the transcript position to genome position, you will have to include the --genome flag too:

xpore dataprep --eventalign eventalign.txt --gtf_or_gff CC7.gtf --transcript_fasta aip.genome_models.no_isoforms.no_duplication.mRNA.fa --genome --out_dir ./output --n_process 32

Do you mind sharing the full error message from xpore diffmod, please?

Thanks!

Best wishes,
Yuk Kei

@huawen-poppy
Author

Hi. Thanks for your reply! I am still running the xpore diffmod process. So far there are no error messages. The head of the diffmod.table looks like:
image

Do you think I should cancel the current job and rerun xpore dataprep with the --genome flag added?

@yuukiiwa
Collaborator

yuukiiwa commented Oct 2, 2023

Hi @huawen-poppy,

If you don't need to convert the transcript coordinates to genomic coordinates, then you don't need the --genome, --gtf_or_gff, and --transcript_fasta flags.

I have just noticed that the sequences of your fasta file are not capitalized. If you need transcript-to-genomic coordinate conversion, you can try capitalizing them.
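One possible way to capitalize the sequences is sketched below. It upper-cases every sequence line of a FASTA file and leaves the `>` header lines untouched. The function name is hypothetical, and this is only a sketch of the idea, not a step taken from xpore.

```python
def uppercase_fasta(src_path, dst_path):
    """Copy a FASTA file, upper-casing sequence lines and keeping headers as-is."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            # Header lines start with '>' and may contain case-sensitive IDs.
            dst.write(line if line.startswith(">") else line.upper())
```

Writing to a new file rather than editing in place keeps the original FASTA available in case the transcript IDs or sequences need to be compared afterwards.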

Thanks!

Best wishes,
Yuk Kei

@AndreaYCT

Hi,

I have encountered the "PerformanceWarning" as well. The eventalign.index file is still generated.
Does the warning affect the results?

Thanks!

Andrea

@yuukiiwa
Collaborator

yuukiiwa commented Mar 7, 2024

Hi @AndreaYCT,

This is a warning that doesn't affect the results.
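To illustrate why, here is a small sketch with a made-up pandas `MultiIndex` table. The level names and values are assumptions for the example, not xpore's actual data. Looking up a key in an unsorted index is the situation pandas warns about as potentially slow; the values returned are the same as with a sorted index.

```python
import warnings
import pandas as pd

# Made-up, deliberately unsorted table keyed by two index levels.
idx = pd.MultiIndex.from_tuples(
    [("tx2", 0), ("tx1", 0), ("tx2", 0), ("tx1", 1)],
    names=["contig", "read_index"],
)
df = pd.DataFrame({"line_length": [40, 10, 30, 20]}, index=idx)

with warnings.catch_warnings():
    # Silence the performance notice for this lookup only.
    warnings.simplefilter("ignore", pd.errors.PerformanceWarning)
    unsorted_total = df.loc[("tx2", 0)]["line_length"].sum()

# Sorting the index first is the usual way to avoid the warning.
sorted_total = df.sort_index().loc[("tx2", 0)]["line_length"].sum()

print(unsorted_total, sorted_total)
```

Both lookups select the same two rows, so the sums agree; only the lookup speed differs.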

Thanks!

Best wishes,
Yuk Kei
