Difference between ICE and KR biases. Setting up the bias limits. #55

Open
pna059 opened this issue Jul 15, 2022 · 5 comments

pna059 commented Jul 15, 2022

Hi,
I am processing HiC-Pro results from a small genome (assembled in scaffolds) at 5000 bp resolution. I have used the HiCPro2FitHiC utility to convert the data and also generated the KR biases file using HiCKRy.py.

For example, I see two extremely high values in the ICE biases:
HiC_scaffold_83 6297500 88.3104735696205
HiC_scaffold_20 672500 43.444427428612
that are far from the corresponding KR values:
HiC_scaffold_83 6297500 1.71088667499396
HiC_scaffold_20 672500 0.824151827905017
Otherwise, the distributions are similar, including a number of -1 values.
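A quick way to see how far the ICE and KR biases diverge is to join the two files bin by bin. The sketch below is hypothetical (`read_biases` and `compare_biases` are illustrative names, not part of FitHiC) and assumes both files use the three whitespace-separated columns shown above (scaffold, bin midpoint, bias), optionally gzipped, with -1 marking bins that have no usable bias.

```python
import gzip
import math


def read_biases(path):
    """Read a bias file with columns (scaffold, bin midpoint, bias) into a dict.

    Assumes whitespace-separated columns; paths ending in .gz are decompressed.
    """
    opener = gzip.open if path.endswith(".gz") else open
    biases = {}
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            biases[(fields[0], int(fields[1]))] = float(fields[2])
    return biases


def compare_biases(ice, kr, top=5):
    """Count -1 entries and rank bins by how strongly ICE and KR disagree.

    Disagreement is |log2(ICE / KR)|, computed only for bins positive in both.
    """
    shared = ice.keys() & kr.keys()
    minus_one_ice = sum(1 for k in shared if ice[k] == -1)
    minus_one_kr = sum(1 for k in shared if kr[k] == -1)
    both_valid = [k for k in shared if ice[k] > 0 and kr[k] > 0]
    ranked = sorted(both_valid,
                    key=lambda k: abs(math.log2(ice[k] / kr[k])),
                    reverse=True)
    return {
        "shared_bins": len(shared),
        "minus_one_ice": minus_one_ice,
        "minus_one_kr": minus_one_kr,
        "largest_disagreements": [(k, ice[k], kr[k]) for k in ranked[:top]],
    }
```

For example, `compare_biases(read_biases("ICE.biases.gz"), read_biases("FitHiC2/HiC_TB5/KR.biases.gz"))`, where the ICE path is a placeholder. The log ratio is used so that an ICE value 50 times higher and one 50 times lower than KR rank equally.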

Which bias version is preferable?
My other question is: in which cases do -bL and -bU need to be modified, and is it appropriate to adjust them to the bias method or to other genome/data-specific factors?

Thank you!
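One way to judge whether -bL/-bU suit a given bias file is to check what fraction of bins each candidate range would keep. This is a sketch under assumptions: the 0.5 and 2 defaults are taken from the FitHiC log later in this thread, the bounds are treated as inclusive, and bins outside them (including -1) are assumed to be excluded; the helper names are hypothetical.

```python
def fraction_in_bounds(bias_values, lower=0.5, upper=2.0):
    """Fraction of bins whose bias lies in [lower, upper] (bounds assumed inclusive).

    -1 entries fall below any positive lower bound, so they count as out of range.
    """
    values = list(bias_values)
    if not values:
        return 0.0
    inside = sum(1 for b in values if lower <= b <= upper)
    return inside / len(values)


def scan_bounds(bias_values, candidates):
    """Map each (lower, upper) pair to the fraction of bins it would keep."""
    values = list(bias_values)  # materialize once so generators can be reused
    return {(lo, hi): fraction_in_bounds(values, lo, hi) for lo, hi in candidates}
```

For instance, `scan_bounds(bias_values, [(0.5, 2.0), (0.33, 3.0), (0.25, 4.0)])` on the bias values parsed from each file would show whether the ICE and KR distributions lose a similar share of bins at the default range.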

pna059 (Author) commented Jul 15, 2022

Related to that, I always get an error when using a bias file with -1 values, and I have to remove them manually before running fithic.py. Is it possible to fix this in the code?

$FITHICDIR/fithic.py -i FitHiC2/HiC_TB5/fithic.interactionCounts.gz -f FitHiC2/HiC_TB5/fithic.fragmentMappability.gz -o FitHiC2/HiC_TB5/ -r 5000 -t FitHiC2/HiC_TB5/KR.biases.gz -p 1 -l TB5_5kb_KR -U 500000 -L 10000 -v


GIVEN FIT-HI-C ARGUMENTS
=========================
Reading fragments file from: FitHiC2/HiC_TB5/fithic.fragmentMappability.gz
Reading interactions file from: FitHiC2/HiC_TB5/fithic.interactionCounts.gz
Output path being used from FitHiC2/HiC_TB5/
Fixed size option detected... Fast version of FitHiC will be used
Resolution is 5.0 kb
Reading bias file from: FitHiC2/HiC_TB5/KR.biases.gz
The number of spline passes is 1
The number of bins is 100
The number of reads required to consider an interaction is 1
The name of the library for outputted files will be TB5_5kb_KR
Upper Distance threshold is 500000
Lower Distance threshold is 10000
Graphs will be outputted
Only intra-chromosomal regions will be analyzed
Lower bound of bias values is 0.5
Upper bound of bias values is 2
All arguments processed. Running FitHiC now...
=========================


Reading the contact counts file to generate bins...
Interactions file read. Time took 11.859328031539917
Traceback (most recent call last):
  File "/home/pavlan/tools/fithic/fithic/fithic.py", line 1324, in <module>
    main()
  File "/home/pavlan/tools/fithic/fithic/fithic.py", line 323, in main
    (binStats,noOfFrags, maxPossibleGenomicDist, possibleIntraInRangeCount, possibleInterAllCount, interChrProb, baselineIntraChrProb) = generate_FragPairs(observedInterAllCount, observedInterAllSum, binStats, fragsFile, resolution)
  File "/home/pavlan/tools/fithic/fithic/fithic.py", line 600, in generate_FragPairs
    print("ERROR - the chromosome " + ch + " has " + len(allFragsDic[ch]) + " valid fragments/bins and should be removed from the input fragment information !!! ")                
TypeError: can only concatenate str (not "int") to str
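The TypeError is raised by the error-reporting line itself: `len(allFragsDic[ch])` is an int and cannot be concatenated to a str. A minimal sketch of the fix, wrapped in a hypothetical helper, is to format the count inside an f-string. This only lets the intended ERROR message print; it does not resolve the condition that triggered it.

```python
def error_message(ch, frags):
    """Build the message from fithic.py's generate_FragPairs without the TypeError.

    The original concatenates len(allFragsDic[ch]), an int, directly to a str;
    formatting the count inside an f-string converts it to text first.
    """
    return (f"ERROR - the chromosome {ch} has {len(frags)} valid fragments/bins "
            "and should be removed from the input fragment information !!! ")
```

In fithic.py itself, the equivalent one-line change would be to replace `len(allFragsDic[ch])` with `str(len(allFragsDic[ch]))` in that print call.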

ay-lab (Owner) commented Aug 1, 2022

You should remove the chromosomes/scaffolds whose bins ALL have -1 bias values (or, more generally, bias values outside the 0.5-2 range), not the specific bins with -1 values. Hope this clarifies.
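The per-scaffold rule described in this reply can be sketched in Python as follows, assuming the bias rows have already been parsed into (scaffold, midpoint, bias) tuples. `filter_scaffolds` is a hypothetical helper, and treating the 0.5-2 range as inclusive is an assumption. Scaffolds keep all of their bins, including -1 entries, as long as at least one bin is in range.

```python
from collections import defaultdict


def filter_scaffolds(rows, lower=0.5, upper=2.0):
    """Drop scaffolds whose bins ALL have biases outside [lower, upper].

    rows: iterable of (scaffold, midpoint, bias); -1 counts as out of range.
    Returns (kept_rows, dropped_scaffold_names).
    """
    by_scaffold = defaultdict(list)
    for chrom, mid, bias in rows:
        by_scaffold[chrom].append((chrom, mid, bias))
    kept, dropped = [], []
    for chrom, chrom_rows in by_scaffold.items():
        if any(lower <= bias <= upper for _, _, bias in chrom_rows):
            kept.extend(chrom_rows)  # keep every bin, -1 entries included
        else:
            dropped.append(chrom)
    return kept, dropped
```

Grouping first and deciding per scaffold (rather than per row) is what distinguishes this from simply deleting the -1 bins.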

pna059 (Author) commented Aug 1, 2022 via email

pna059 (Author) commented Aug 15, 2022

I removed the scaffolds whose bins all have -1 biases or biases outside of the range, using this R script:

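#! /usr/bin/env Rscript
# run: Rscript filter_biases.R <file> <min bias val> <max bias val>
# Keeps every scaffold with at least one bin whose bias is inside [min, max];
# scaffolds whose bins are ALL outside the range (e.g. all -1) are dropped.

# parse command args
args = commandArgs(trailingOnly = TRUE)
datafile = args[1]
minval = as.numeric(args[2])
maxval = as.numeric(args[3])

# process: read the 3-column bias file (scaffold, bin midpoint, bias)
alldata <- read.table(datafile, sep = "\t", header = FALSE)
chroms = unique(alldata[, 1])
good <- c()
bad <- c()
for (chr in chroms) {
  tmp <- alldata[alldata[, 1] == chr, ]
  # a scaffold is "bad" only if none of its bins falls within the range
  if (all(as.numeric(tmp[, 3]) < minval |
          as.numeric(tmp[, 3]) > maxval)) {
    bad <- rbind(bad, tmp)
  } else {
    good <- rbind(good, tmp)
  }
}
# write results (only the retained scaffolds are written)
write.table(
  good,
  file = paste0(datafile, ".good.txt"),
  sep = "\t",
  quote = FALSE,
  row.names = FALSE,
  col.names = FALSE
)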

However, I am still getting the same error when using the good biases file.

Reading the contact counts file to generate bins...
Interactions file read. Time took 28.445305585861206
Traceback (most recent call last):
  File "/home/pavlan/tools/fithic/fithic/fithic.py", line 1324, in <module>
    main()
  File "/home/pavlan/tools/fithic/fithic/fithic.py", line 323, in main
    (binStats,noOfFrags, maxPossibleGenomicDist, possibleIntraInRangeCount, possibleInterAllCount, interChrProb, baselineIntraChrProb) = generate_FragPairs(observedInterAllCount, observedInterAllSum, binStats, fragsFile, resolution)
  File "/home/pavlan/tools/fithic/fithic/fithic.py", line 600, in generate_FragPairs
    print("ERROR - the chromosome " + ch + " has " + len(allFragsDic[ch]) + " valid fragments/bins and should be removed from the input fragment information !!! ")     
TypeError: can only concatenate str (not "int") to str

Thank you for your help, Pavla

ay-lab (Owner) commented Aug 29, 2022

The program is trying to report an ERROR, but there is a small bug in that print statement: it concatenates an int to a str (we can fix that). The important part is the error itself: it tells you that you haven't removed all problematic chromosomes or contigs. I suggest you look at the number of bins for each unfiltered chromosome or contig; there is likely at least one with zero or one valid bins.
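Following that suggestion, a sketch that counts the in-range bins left on each scaffold and flags those with zero or one could look like this. Assumptions: the cut-off of two bins is inferred from the reply above rather than from FitHiC's source, the 0.5-2 range matches the defaults in the log, and both helpers are hypothetical. `drop_scaffolds` relies on the scaffold name being the first column, which holds for FitHiC's bias and fragments files. Since the ERROR refers to the "input fragment information", it seems sensible to drop flagged scaffolds from the fragments file as well as from the bias file, though that is a judgment rather than documented FitHiC behavior.

```python
from collections import Counter


def scaffolds_with_few_bins(rows, lower=0.5, upper=2.0, min_bins=2):
    """Return {scaffold: in-range bin count} for scaffolds below min_bins.

    rows: iterable of (scaffold, midpoint, bias); -1 never counts as in range.
    """
    counts = Counter()
    seen = set()
    for chrom, _mid, bias in rows:
        seen.add(chrom)
        if lower <= bias <= upper:
            counts[chrom] += 1
    # Counter returns 0 for scaffolds with no in-range bins at all.
    return {chrom: counts[chrom] for chrom in sorted(seen)
            if counts[chrom] < min_bins}


def drop_scaffolds(lines, bad):
    """Yield non-blank lines whose first field is not a flagged scaffold."""
    for line in lines:
        fields = line.split()
        if fields and fields[0] not in bad:
            yield line
```

Running `scaffolds_with_few_bins` on the filtered ("good") bias rows should reveal the scaffold(s) still triggering the ERROR; the surviving lines from `drop_scaffolds` can then be written back out for both input files.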
