Python error in dataProcessing.py #9

Closed
maxdudek opened this issue Dec 28, 2022 · 1 comment

@maxdudek

Hi,

When I run the script like this:

conda activate TRACE_env
python dataProcessing.py $PEAKS $BAM $GENOME --atac-seq pe --prefix ./$PREFIX

I get the following error:

Traceback (most recent call last):
  File "/home/maxdudek/local/src/TRACE/scripts/dataProcessing.py", line 290, in <module>
    main()
  File "/home/maxdudek/local/src/TRACE/scripts/dataProcessing.py", line 273, in main
    loessSignal, deriv2nd, deriv1st, bc_loess = signal.get_signal(args.span, args.is_atac, args.shift)
  File "/home/maxdudek/local/src/TRACE/scripts/dataProcessing.py", line 193, in get_signal
    maximum = int(self.size[1][self.size[0] == peak[0]])
  File "/home/maxdudek/local/miniconda3/envs/TRACE_env/lib/python3.7/site-packages/pandas/core/series.py", line 131, in wrapper
    raise TypeError("cannot convert the series to " "{0}".format(str(converter)))
TypeError: cannot convert the series to <class 'int'>
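
A minimal pandas reproduction of this first error may help readers who hit the same traceback. It is a sketch with a made-up two-row stand-in for `self.size`, not TRACE's own code. The boolean mask used at line 193 selects no rows when the peak's chromosome name is absent from the sizes table, and `int()` on a Series only succeeds when the Series holds exactly one element.

```python
import pandas as pd

# Hypothetical two-row stand-in for self.size (column 0 = name, column 1 = length).
size = pd.DataFrame({0: ["chr1", "chr2"], 1: [248956422, 242193529]})
peak_chrom = "1"  # bare chromosome name, as in the peak row printed further down

# Same selection pattern as line 193: column 1 where column 0 equals the peak's chromosome.
match = size[1][size[0] == peak_chrom]
print(len(match))  # 0: no row of the sizes table is named "1"

try:
    int(match)  # int() on an empty Series raises TypeError
except TypeError as err:
    print("TypeError:", err)
```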

As far as I can tell, my BAM, BED3, and genome files are all in the correct format, and I don't know enough about the code to figure out what's going wrong. Searching for the error suggests using .astype(int) instead of int(), but then I get another error further down:

Traceback (most recent call last):
  File "/home/maxdudek/local/src/TRACE/scripts/dataProcessing.py", line 280, in <module>
    main()
  File "/home/maxdudek/local/src/TRACE/scripts/dataProcessing.py", line 263, in main
    loessSignal, deriv2nd, deriv1st, bc_loess = signal.get_signal(args.span, args.is_atac, args.shift)
  File "/home/maxdudek/local/src/TRACE/scripts/dataProcessing.py", line 186, in get_signal
    ext_r = 5000 if int(peak[2]) + 5000 < maximum else maximum - int(peak[2])
  File "/home/maxdudek/local/miniconda3/envs/TRACE_env/lib/python3.7/site-packages/pandas/core/generic.py", line 1556, in __nonzero__
    self.__class__.__name__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
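
The following sketch (same hypothetical stand-in table, with the peak row assumed to hold strings) shows why `.astype(int)` only moves the failure. Converting an empty Series succeeds, so `maximum` stays an empty Series, and the later comparison produces another Series whose truth value Python's conditional expression then asks for.

```python
import pandas as pd

# Hypothetical two-row stand-in for self.size (column 0 = name, column 1 = length).
size = pd.DataFrame({0: ["chr1", "chr2"], 1: [248956422, 242193529]})
peak = ["1", "9957", "10639"]  # assumed BED3 row with a bare chromosome name

# .astype(int) succeeds on an empty Series, so maximum is still a Series, not an int.
maximum = size[1][size[0] == peak[0]].astype(int)
print(type(maximum).__name__, len(maximum))  # Series 0

# Comparing an int with a Series is elementwise and yields an empty boolean Series.
condition = int(peak[2]) + 5000 < maximum
try:
    # The conditional expression calls bool(condition), which pandas refuses (as at line 186).
    ext_r = 5000 if condition else maximum - int(peak[2])
except ValueError as err:
    print("ValueError:", err)
```

In both cases the underlying problem is the same: the lookup finds no matching chromosome, so the fix belongs in the input data rather than in the type conversion.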

In case it helps, here are the values of the relevant variables at the point where the error occurs. Any insight would be greatly appreciated!

self.size:

                    0          1
0                chr1  248956422
1                chr2  242193529
2                chr3  198295559
3                chr4  190214555
4                chr5  181538259
..                ...        ...
450  chrUn_KI270539v1        993
451  chrUn_KI270385v1        990
452  chrUn_KI270423v1        981
453  chrUn_KI270392v1        971
454  chrUn_KI270394v1        970

[455 rows x 2 columns]

peak:

1	9957	10639

self.size[0] == peak[0]:

0      False
1      False
2      False
3      False
4      False
       ...  
450    False
451    False
452    False
453    False
454    False
Name: 0, Length: 455, dtype: bool

self.size[1][self.size[0] == peak[0]]:

Series([], Name: 1, dtype: int64)
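
An all-False mask like the one above usually means a naming mismatch. A quick check is to list the chromosome names that appear in the peak file but not in the sizes table. This is a hypothetical helper, not part of TRACE, and it assumes both files are tab-separated with no header row and the chromosome name in the first column (as in the `self.size` printout; a FASTA `.fai` index also has this layout).

```python
import io

import pandas as pd


def missing_chromosomes(peaks, sizes) -> set:
    """Return chromosome names found in the peak BED file but not in the sizes table.

    `peaks` and `sizes` may be file paths or file-like objects; both are assumed to be
    tab-separated with no header, with the chromosome name in column 0.
    """
    read = dict(sep="\t", header=None, usecols=[0], dtype=str, keep_default_na=False)
    peak_chroms = pd.read_csv(peaks, **read)[0]
    size_chroms = pd.read_csv(sizes, **read)[0]
    return set(peak_chroms) - set(size_chroms)


# Tiny in-memory stand-ins for the real $PEAKS and genome sizes files.
peaks = io.StringIO("1\t9957\t10639\nchr2\t100\t200\n")
sizes = io.StringIO("chr1\t248956422\nchr2\t242193529\n")
print(missing_chromosomes(peaks, sizes))  # {'1'}
```
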
@OuyangNX
Contributor

I believe the error was caused by the chromosome naming in your peak file: the first column contains 1 instead of chr1, so the script could not look up the right chromosome size.
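
Since the sizes table in the printout uses UCSC-style names (chr1, chrUn_KI270539v1, ...), one option is to rewrite the peak file with a "chr" prefix before running dataProcessing.py. The helper below is a hypothetical sketch, not part of TRACE. It maps Ensembl-style MT to chrM, leaves names that already start with "chr" alone, and does not handle other differences such as unplaced-contig names, which should still be checked.

```python
import io

import pandas as pd


def to_ucsc(name: str) -> str:
    """Convert a bare chromosome name (1, X, MT) to UCSC style (chr1, chrX, chrM)."""
    if name.startswith("chr"):
        return name
    if name == "MT":
        return "chrM"
    return "chr" + name


def add_chr_prefix(bed_in, bed_out) -> None:
    """Rewrite a headerless, tab-separated BED file with UCSC-style chromosome names.

    All columns are read as strings so coordinates are written back unchanged.
    """
    bed = pd.read_csv(bed_in, sep="\t", header=None, dtype=str, keep_default_na=False)
    bed[0] = bed[0].map(to_ucsc)
    bed.to_csv(bed_out, sep="\t", header=False, index=False)


out = io.StringIO()
add_chr_prefix(io.StringIO("1\t9957\t10639\nchrX\t5\t10\nMT\t1\t50\n"), out)
print(out.getvalue())
```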

aboyle closed this as completed Jul 15, 2024