add new little requested features #875

pavanvidem · 2023-10-24T13:49:18Z

Flake8 passes (flake8 . --exclude=.venv,.build,planemo_test_env,build --ignore=E501,F403,E402,F999,F405,E712)
Local tests pass (py.test hicexplorer --doctest-modules)

pavanvidem · 2023-10-24T13:54:41Z

Added the following features:

Allow chicAggregateStatistic.py to extract the aggregated data from the views.hdf based on differential.hdf or differential_target.bed. The BED file may now carry the target name in the 4th column; in that case, the aggregation is done per target.
Allow hicCorrectMatrix.py to write filtered out regions to a BED file
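
The per-target behaviour described in the first item can be pictured with a small sketch. This is not the code from this PR: `group_regions_by_target` is a hypothetical helper, and both the integer conversion of the coordinates and the fallback key used for 3-column rows are assumptions made for illustration.

```python
from collections import defaultdict


def group_regions_by_target(bed_lines, default_target="all"):
    """Group BED regions by the optional target name in column 4.

    Hypothetical helper: rows without a 4th column are collected under
    `default_target` (an assumption, not behaviour taken from the PR).
    """
    regions = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        target = fields[3] if len(fields) > 3 else default_target
        regions[target].append((chrom, start, end))
    return dict(regions)
```

Each key of the returned dictionary would then be aggregated on its own, which is roughly what "aggregation is done per target" suggests.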

pavanvidem · 2023-10-24T13:55:01Z

Tests need to be added

joachimwolff · 2023-11-17T17:52:01Z

Please provide test cases and make sure it works with current python and numpy. Thanks!

pavanvidem · 2023-11-17T18:08:48Z

I might have some test data but have to find some time.

pavanvidem · 2024-04-23T09:36:46Z

@joachimwolff @bgruening please review

bgruening · 2024-04-24T06:55:41Z

hicexplorer/chicAggregateStatistic.py

+            for line in file.readlines():
+                if line.startswith('#'):
+                    continue
+                _line = line.strip().split('\t')


I find the variable name _line confusing. After the split it holds the elements of the line, or something like that.

To be consistent with the rest of the project I used _line. The same line of code appears in several places.

bgruening · 2024-04-24T06:56:08Z

hicexplorer/hicCorrectMatrix.py

@@ -548,6 +551,7 @@ def filter_by_zscore(hic_ma, lower_threshold, upper_threshold, perchr=False):
    to avoid introducing bias due to different chromosome numbers

    """
+    print("filtering by z-score")


remove? Or use logging.
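
The commit list shows that the print statements were eventually removed. Had the message been kept, the "use logging" option could look roughly like the sketch below. `keep_within_thresholds` is a hypothetical stand-in for the real filter function; only the standard-library `logging` calls are the point here.

```python
import logging

# Module-level logger; the message stays silent unless the application
# configures logging at DEBUG level.
log = logging.getLogger(__name__)


def keep_within_thresholds(values, lower_threshold, upper_threshold):
    """Hypothetical stand-in that keeps values inside the given bounds."""
    log.debug("filtering by z-score")
    return [v for v in values if lower_threshold <= v <= upper_threshold]
```

Unlike `print`, the message does not end up on stdout by default, so tools that parse the program output are not affected.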

bgruening · 2024-04-24T06:56:18Z

hicexplorer/hicCorrectMatrix.py

@@ -658,9 +662,22 @@ def main(args=None):
                            restore_masked_bins=False)

        assert matrix_shape == ma.matrix.shape
+        for idx in outlier_regions:


bgruening · 2024-04-24T06:56:46Z

hicexplorer/hicCorrectMatrix.py

+            with open(args.filteredBed, 'w') as f:
+                for outlier_region in set(outlier_regions):
+                    interval = ma.cut_intervals[outlier_region]
+                    f.write('{}\t{}\t{}\t.\t{}\t.\n'.format(interval[0],


Using an f-string here would make it much easier to read, I suspect.

I used a plain join because the interval should contain 4 elements anyway.
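
For comparison, the two styles discussed here could be sketched as follows for a 4-element interval. The function names are hypothetical and the meaning of the 4th element (called `value` below) is an assumption; the snippet only illustrates that both forms produce the same line.

```python
def bed_line_join(interval):
    """Join every element of the interval with tabs (the 'plain join' style)."""
    return "\t".join(map(str, interval)) + "\n"


def bed_line_fstring(interval):
    """Same output with an f-string; requires exactly 4 elements."""
    chrom, start, end, value = interval
    return f"{chrom}\t{start}\t{end}\t{value}\n"
```

The join version silently accepts intervals of any length, while the f-string version fails with a `ValueError` if the interval does not have exactly 4 elements, which may or may not be desirable.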

bgruening · 2024-04-24T06:56:53Z

hicexplorer/hicCorrectMatrix.py

        # mask filtered regions
        ma.maskBins(outlier_regions)
        total_filtered_out = set(outlier_regions)
+        print(outlier_regions, "Bins that are MAD outliers ({:.2f}%) "


bgruening · 2024-04-24T06:58:21Z

hicexplorer/test/general/test_hicCorrectMatrix.py

+            if x != y:
+                count = sum(1 for a, b in zip(x, y) if a != b)
+                if count > pDifference:
+                    equal = False


You break out of the entire loop here, correct? Then you can also return directly and remove the equal variable altogether.

I want to return true or false. Again, I copied this part from somewhere else in the project.

return True, return False ;)
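
The early-return suggestion could be sketched as below. `lines_almost_equal` and `max_diff_chars` are hypothetical names (the latter plays the role of `pDifference` in the quoted snippet), and the surrounding loop over line pairs is an assumption about the context of the diff.

```python
def lines_almost_equal(lines_a, lines_b, max_diff_chars=0):
    """Return False as soon as one line pair differs in too many positions."""
    # zip stops at the shorter input, mirroring the quoted snippet.
    for x, y in zip(lines_a, lines_b):
        if x != y:
            count = sum(1 for a, b in zip(x, y) if a != b)
            if count > max_diff_chars:
                return False  # no flag variable and no break needed
    return True
```

With this shape the `equal` flag disappears, and the function still returns a plain boolean.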

bgruening · 2024-04-24T07:04:41Z

hicexplorer/utilities.py

+        for line in file.readlines():
+            if line.startswith('#'):
+                continue
+            _line = line.strip().split('\t')


same as my comment above

bgruening · 2024-04-24T07:06:24Z

hicexplorer/utilities.py

+            try:
+                chrom, start, end = _line[:4]
+            except ValueError:
+                _line = line.strip().split()


When a line has fewer than 3 columns, this exception is raised, and then you try the same thing again?

No, the first split was on tabs. This one splits on whitespace, in case the input BED file is not tab-separated.

bgruening · 2024-04-24T07:06:36Z

hicexplorer/utilities.py

+                chrom, start, end = _line[:4]
+            except ValueError:
+                _line = line.strip().split()
+                chrom, start, end, gene = _line[:4]


isn't the 4 wrong here?

This file should contain 4 columns.

Maybe add a comment, it's really not clear that you are parsing two different file types here.
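
One way to make the two cases explicit is to branch on the number of columns instead of relying on a `ValueError` from tuple unpacking. The sketch below is not the code that was merged; `parse_bed_line` is a hypothetical helper, and the integer conversion and the `None` placeholder for 3-column rows are assumptions.

```python
def parse_bed_line(line):
    """Return (chrom, start, end, name); name is None for 3-column rows."""
    fields = line.strip().split("\t")
    if len(fields) < 3:
        # Fall back to any whitespace for files that are not tab-separated.
        fields = line.split()
    if len(fields) < 3:
        raise ValueError(
            f"expected at least 3 columns, got {len(fields)}: {line!r}"
        )
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # The optional 4th column carries the target (gene) name.
    name = fields[3] if len(fields) > 3 else None
    return chrom, start, end, name
```

Checking `len(fields)` keeps the 3-column and 4-column paths visible in the code, which addresses the confusion raised in this thread.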

bgruening · 2024-04-24T07:07:18Z

requirements.txt

@@ -23,6 +23,6 @@ future >= 0.18
 tqdm >= 4.66
 hyperopt >= 0.2.7
 python-graphviz >= 0.20
-scikit-learn >= 1.3.1
+scikit-learn == 1.3.2


This pin looks too strict to me; is >=1.3.2,<1.4 better?

Currently, there is no other version between 1.3.2 and 1.4. I switched it to 1.3,1.4

bgruening · 2024-04-24T12:53:22Z

setup.py

@@ -116,7 +116,7 @@ def checkProgramIsInstalled(self, program, args, where_to_download,
                       "tqdm >= 4.66",
                       "hyperopt >= 0.2.7",
                       "graphviz >= 0.20",
-                       "scikit-learn >= 1.3.1",
+                       "scikit-learn == 1.3.2",


pavanvidem added 5 commits April 23, 2024 03:06

add new little requested features

9bc0611

Revert the old 3-column BED input in addition to the new 4-column BED

2795d39

add tests

0c0fd76

fix linting

118949f

update requirements scikit-learn 1.3

74f990f

pavanvidem force-pushed the hic-new-features branch from 799ca0d to 74f990f Compare April 23, 2024 07:49

also update in setup.py

ac380c6

bgruening reviewed Apr 24, 2024

remove print statements

55a5dc3

bgruening reviewed Apr 24, 2024

pavanvidem added 2 commits April 24, 2024 17:58

change requirement in setup.py and comment in BED reading

8fd8066

write a 6-column BED instead of a 4-column one

c416e9a

bgruening approved these changes Apr 24, 2024

bgruening merged commit 67a4f37 into deeptools:master Apr 24, 2024
7 checks passed
