AMULET.py crashes on empty --rfilter BED (pandas EmptyDataError)

## Summary

`AMULET.py` reads the `--rfilter` BED file via `pd.read_csv(args.rfilter, sep="\t", header=None).values[:,0:3]` (line 194). If the BED file is genuinely empty (a valid case for organisms without a published blacklist), pandas raises:

```
pandas.errors.EmptyDataError: No columns to parse from file
```

The crash occurs before any of the AMULET algorithm runs, so the `OverlapSummary.txt` and `Overlaps.txt` files already produced upstream by `FragmentFileOverlapCounter.py` go unused and the run has to be repeated.

## Reproduction

Pass a zero-byte BED file as `--rfilter`:

```bash
touch /tmp/empty_blacklist.bed
bash AMULET.sh fragments.tsv.gz singlecell.csv chrs.txt /tmp/empty_blacklist.bed outdir scriptpath
```

Full traceback:

```
Traceback (most recent call last):
  File "AMULET.py", line 194, in <module>
    simplerepeats = po.getUnionPeaks([pd.read_csv(args.rfilter, sep="\t", header=None).values[:,0:3]])
  ...
  File ".../pandas/_libs/parsers.pyx", line 581, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
```
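The pandas behavior can also be checked in isolation, without AMULET or any input data. The sketch below mirrors the `read_csv` call from the traceback; the helper names `read_bed_triplets` and `raises_empty_data_error` are made up for illustration and are not part of AMULET.

```python
import os
import tempfile

import pandas as pd


def read_bed_triplets(path):
    """Read the first three BED columns the same way the traceback line does."""
    return pd.read_csv(path, sep="\t", header=None).values[:, 0:3]


def raises_empty_data_error(path):
    """Return True if reading `path` raises pandas' EmptyDataError."""
    try:
        read_bed_triplets(path)
    except pd.errors.EmptyDataError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    empty_bed = os.path.join(tmp, "empty_blacklist.bed")
    open(empty_bed, "w").close()  # zero-byte file, equivalent to `touch`
    print(raises_empty_data_error(empty_bed))
```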

## Workaround

A single dummy line on a fake chromosome lets pandas parse the file without affecting the algorithm, since an interval on a nonexistent chromosome cannot intersect any real fragment:

```bash
# printf handles the tab escapes portably; `echo -e` behaves differently across shells
printf '__no_blacklist__\t0\t1\n' > /tmp/repeats.bed
```

This is what I'm doing in production for duck (*Anas platyrhynchos*) snATAC multiome data, since no duck-specific blacklist exists.
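For pipelines that wrap AMULET, the same workaround could be applied automatically before the call. This is only a sketch using the standard library: `ensure_nonempty_bed` and `PLACEHOLDER_LINE` are hypothetical names, and the function assumes that any line containing non-whitespace text counts as a BED record.

```python
import os
import shutil

# Dummy interval on a chromosome name that no real fragment will carry.
PLACEHOLDER_LINE = "__no_blacklist__\t0\t1\n"


def ensure_nonempty_bed(src, dst):
    """Write a BED file to `dst` that is safe to pass as --rfilter.

    If `src` contains at least one non-blank line it is copied unchanged;
    otherwise `dst` receives a single placeholder interval.
    """
    has_records = False
    if os.path.exists(src):
        with open(src) as handle:
            has_records = any(line.strip() for line in handle)

    if has_records:
        shutil.copyfile(src, dst)
    else:
        with open(dst, "w") as handle:
            handle.write(PLACEHOLDER_LINE)
    return dst
```

The returned path can then be passed to `AMULET.sh` in place of the original blacklist.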

## Suggested fix

Guard the `pd.read_csv` call against empty files and skip the filtering step in that case. Roughly:

```python
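import os

import numpy as np  # assumed to be imported already in AMULET.py, like pandas

# Default to an empty (0 x 3) interval array, i.e. nothing to filter.
simplerepeats = np.zeros((0, 3))
if args.rfilter and os.path.getsize(args.rfilter) > 0:
    try:
        simplerepeats = po.getUnionPeaks([
            pd.read_csv(args.rfilter, sep="\t", header=None).values[:, 0:3]
        ])
    except pd.errors.EmptyDataError:
        # Non-zero size but no parseable rows (e.g. only blank lines).
        pass  # treat as no-filter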
```

(Or even simpler: catch the `EmptyDataError` and fall through to the no-filter branch.)
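The catch-only variant could look something like the sketch below. `load_bed_intervals` is a hypothetical helper, not an existing AMULET function, and whether `po.getUnionPeaks` and the downstream filtering accept a zero-row array is an assumption that would need checking against the actual code.

```python
import numpy as np
import pandas as pd


def load_bed_intervals(path):
    """Return the first three BED columns, or a (0, 3) array if the file has no rows."""
    try:
        frame = pd.read_csv(path, sep="\t", header=None)
    except pd.errors.EmptyDataError:
        # Empty file: report "no intervals" instead of crashing.
        return np.empty((0, 3), dtype=object)
    return frame.values[:, 0:3]
```

The caller can then check `intervals.shape[0] == 0` and skip the repeat filter entirely.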

## Why this matters

The README says the repeats filter is recommended but not strictly required. An empty BED file is a legitimate use case for users working with non-mammalian or non-model-organism datasets that lack published ENCODE-style blacklists. Currently those users have to know about the dummy-line workaround, which is not documented anywhere.

Low priority, since a trivial workaround exists, but worth a 5-line fix for usability.

