UnicodeDecodeError When running hicConvertFormat to convert HiC to Cool format #821

ashishjain1988 · 2022-09-26T19:46:53Z

Welcome to the HiCExplorer GitHub repository! Before opening the issue please check
that the following requirements are met :

Search whether this issue (or a similar issue) has been solved before using the search tab above. Link the previous issue if appropriate below.
Paste your HiCExplorer version (hicInfo --version) and your python version (python --version) below.
HiC Version: 3.7.2 and Python version 3.9
Have you checked our documentation on hicexplorer.readthedocs.io? Yes
Do you use conda to install HiCExplorer? Yes
Do you use the latest HiCExplorer release? If not, please install it via a conda environment:
conda create --name hicexplorer hicexplorer=3.6 python=3.8 -c bioconda -c conda-forge
and activate the environment: conda activate hicexplorer. Retry your command. You can exit a conda environment via conda deactivate. To learn more about conda and environments, please consider the following documentation.

Retry your command, is it solved now? If not please continue with the following:

Paste the full HiCExplorer command that produces the issue below
(ignore if you simply spotted the issue in the code/documentation).
hicConvertFormat -m K0_S2_mapped_contact_map.hic --inputFormat hic --outputFormat cool -o K0_S2_mapped_contact_map.cool
Paste the output printed on screen from the command that produces the issue
below (ignore if you simply spotted the issue in the code/documentation).
INFO:hicexplorer.hicConvertFormat:Converting with hic2cool.
Traceback (most recent call last):
  File "/programs/x86_64-linux/hicexplorer/3.7.2/bin/hicConvertFormat", line 7, in <module>
    main()
  File "/programs/x86_64-linux/hicexplorer/3.7.2/lib/python3.9/site-packages/hicexplorer/hicConvertFormat.py", line 131, in main
    hic2cool_convert(matrix, args.outFileName[i], 0)
  File "/programs/x86_64-linux/hicexplorer/3.7.2/lib/python3.9/site-packages/hic2cool/hic2cool_utils.py", line 860, in hic2cool_convert
    pair_footer_info, expected, factors, norm_info = read_footer(req, mmap_buf, masteridx)
  File "/programs/x86_64-linux/hicexplorer/3.7.2/lib/python3.9/site-packages/hic2cool/hic2cool_utils.py", line 131, in read_footer
    unit = readcstr(f)
  File "/programs/x86_64-linux/hicexplorer/3.7.2/lib/python3.9/site-packages/hic2cool/hic2cool_utils.py", line 59, in readcstr
    return buf.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 1: invalid start byte
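
For context, the last frame of the traceback can be reproduced in isolation. The sketch below is only an illustration: `read_cstr` and the sample bytes are hypothetical stand-ins, not hic2cool's actual code. It shows that a reader expecting a NUL-terminated text field fails in this way when the bytes at the current offset are binary data rather than text.

```python
import io


def read_cstr(stream):
    """Read bytes up to the next NUL byte and decode them as UTF-8.

    Hypothetical helper for illustration; not hic2cool's implementation.
    """
    buf = bytearray()
    while True:
        byte = stream.read(1)
        if byte in (b"", b"\x00"):  # end of stream or NUL terminator
            return buf.decode("utf-8")
        buf += byte


# A well-formed field: the text "BP" followed by a NUL terminator.
print(read_cstr(io.BytesIO(b"BP\x00")))

# Binary data read where text was expected: 0xa5 cannot start a UTF-8 sequence.
try:
    read_cstr(io.BytesIO(b"\x10\xa5\x03\x00"))
except UnicodeDecodeError as err:
    print(err)
```

In other words, the error probably says more about the reader looking at the wrong place in the file than about the file's text encoding.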


lldelisle · 2022-10-21T12:30:44Z

Hi,
Would you mind downloading the file we use in our tests here and checking whether the following command works:

hicConvertFormat -m SRR1791297_30.hic --inputFormat hic --outputFormat cool -o test.cool

xiaohuli-45 · 2022-10-30T13:17:26Z

Hello, I have encountered the same problem, but SRR1791297_30.hic was converted successfully by hic2cool. My .hic file was downloaded from ENCODE. Could you help me? Thank you very much!

lldelisle · 2022-10-30T16:26:22Z

Would you mind sharing the URL?
Thanks

xiaohuli-45 · 2022-10-31T01:19:28Z

The URL is https://www.encodeproject.org/files/ENCFF080DPJ/@@download/ENCFF080DPJ.hic.
Thanks

lldelisle · 2022-10-31T12:54:54Z

Hi,
In fact, this issue is the same as #798: the format behind .hic changed, and it seems that hic2cool has not been updated (see 4dn-dcic/hic2cool#60).
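
One way to check whether a given file is affected is to look at the format version stored in its header. The following is a minimal sketch, assuming the .hic header begins with the NUL-terminated magic string "HIC" followed by a little-endian 32-bit integer version. The function name `read_hic_version` is made up for this example.

```python
import struct


def read_hic_version(path):
    """Return the format version stored in the header of a .hic file.

    Assumes the header starts with the NUL-terminated magic string "HIC"
    followed by a little-endian 32-bit integer holding the version.
    """
    with open(path, "rb") as handle:
        header = handle.read(8)
    if len(header) < 8 or header[:4] != b"HIC\x00":
        raise ValueError(f"{path} does not look like a .hic file")
    (version,) = struct.unpack("<i", header[4:8])
    return version
```

Comparing the value returned for a file that converts (such as the test file) with the value for a file that fails should show whether the two were written in different versions of the format.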

lldelisle · 2022-10-31T21:46:10Z

While waiting for a better solution, this Python script works, using hicstraw (available on pip: https://pypi.org/project/hic-straw/) and cooler (https://cooler.readthedocs.io/en/latest/):


import numpy as np
import hicstraw
import os

hic_file = 'ENCFF080DPJ.hic'
cool_file = 'ENCFF080DPJ_250kb.cool'

data_type = 'observed' # (previous default / "main" data) or 'oe' (observed/expected)
normalization = "NONE"  # , VC, VC_SQRT, KR, SCALE, etc.
resolution = 250000

hic = hicstraw.HiCFile(hic_file)

assert resolution in hic.getResolutions(), \
    f"{resolution} is not part of the possible resolutions {','.join(hic.getResolutions())}"

# First write the chromosome sizes:
with open(hic.getGenomeID() + '.size', 'w') as fsize:
    for chrom in hic.getChromosomes():
        if chrom.name != "All":
            fsize.write(f"{chrom.name}\t{chrom.length}\n")
# Then write the counts in text file:
with open(cool_file.replace('.cool', ".txt"), 'w') as fo:
    for i in range(len(chrom_sizes)):
        for j in range(i, len(chrom_sizes)):
            chrom1 = chrom_sizes.index[i]
            chrom2 = chrom_sizes.index[j]
            result = hicstraw.straw(data_type, normalization, hic_file, chrom1, chrom2, 'BP', resolution)
            for k in range(len(result)):
                start1 = result[k].binX
                start2 = result[k].binY
                value = result[k].counts
                fo.write(f"{chrom1}\t{start1}\t{start1}\t{chrom2}\t{start2}\t{start2}\t{value}\n")

os.system(f"cooler load -f bg2 {hic.getGenomeID()}.size:{resolution} {cool_file.replace('.cool', '.txt')} {cool_file}")

Edit: the code above has a mistake (chrom_sizes is never defined); please use the corrected version posted further down.

xiaohuli-45 · 2022-11-01T11:10:21Z

Hi,
I successfully converted the file format with your code. Thank you very much!

lldelisle · 2022-11-01T12:54:06Z

Glad it has been useful for someone. 😉

LinearParadox · 2022-11-16T11:19:14Z

Hi,

I tried your code; however, I'm getting a "chrom_sizes is not defined" error, as the variable does not seem to be declared anywhere.

lldelisle · 2022-11-16T11:25:49Z

Oops, indeed...
I tried to simplify but made a mistake. Here is the corrected version:

import os

import hicstraw
import pandas as pd

hic_file = 'ENCFF080DPJ.hic'
cool_file = 'ENCFF080DPJ_250kb.cool'

data_type = 'observed'  # (previous default / "main" data) or 'oe' (observed/expected)
normalization = "NONE"  # or VC, VC_SQRT, KR, SCALE, etc.
resolution = 250000

hic = hicstraw.HiCFile(hic_file)

# getResolutions() returns integers, so convert them to str before joining.
assert resolution in hic.getResolutions(), \
    f"{resolution} is not part of the possible resolutions {','.join(map(str, hic.getResolutions()))}"

# Chromosome name -> length, skipping the genome-wide "All" entry.
chrom_sizes = pd.Series({chrom.name: chrom.length for chrom in hic.getChromosomes() if chrom.name != "All"})

# First write the chromosome sizes:
with open(hic.getGenomeID() + '.size', 'w') as fsize:
    for chrom in hic.getChromosomes():
        if chrom.name != "All":
            fsize.write(f"{chrom.name}\t{chrom.length}\n")
# Then write the counts in a text file, one line per non-empty pair of bins.
# Each chromosome pair is visited once (j >= i).
with open(cool_file.replace('.cool', ".txt"), 'w') as fo:
    for i in range(len(chrom_sizes)):
        for j in range(i, len(chrom_sizes)):
            chrom1 = chrom_sizes.index[i]
            chrom2 = chrom_sizes.index[j]
            result = hicstraw.straw(data_type, normalization, hic_file, chrom1, chrom2, 'BP', resolution)
            for k in range(len(result)):
                start1 = result[k].binX
                start2 = result[k].binY
                value = result[k].counts
                # 7 columns for bg2; the end columns simply repeat the start coordinate.
                fo.write(f"{chrom1}\t{start1}\t{start1}\t{chrom2}\t{start2}\t{start2}\t{value}\n")

# Finally let cooler bin the text file into a .cool file at the chosen resolution.
os.system(f"cooler load -f bg2 {hic.getGenomeID()}.size:{resolution} {cool_file.replace('.cool', '.txt')} {cool_file}")
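
Note that os.system does not raise if cooler fails, so a failed conversion can go unnoticed. As a rough sketch of an alternative, the last step could go through subprocess instead. The helper names below are hypothetical, and the cooler arguments are the ones used in the script above.

```python
import shlex
import subprocess


def cooler_load_command(chrom_sizes_path, resolution, pixels_path, cool_path):
    """Build the argument list for `cooler load` in bg2 mode."""
    return [
        "cooler", "load", "-f", "bg2",
        f"{chrom_sizes_path}:{resolution}",  # chromosome sizes file and bin size
        str(pixels_path),                    # 7-column text file with the counts
        str(cool_path),                      # output .cool file
    ]


def run_cooler_load(chrom_sizes_path, resolution, pixels_path, cool_path):
    """Run `cooler load` and raise CalledProcessError if it exits with an error."""
    cmd = cooler_load_command(chrom_sizes_path, resolution, pixels_path, cool_path)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

With the variables from the script, the call would be run_cooler_load(hic.getGenomeID() + '.size', resolution, cool_file.replace('.cool', '.txt'), cool_file). Passing the arguments as a list also avoids quoting problems when file names contain spaces.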

caragraduate · 2024-01-03T17:48:17Z

Hi there, thank you for providing the code above, which ran successfully in my case. However, when I inspect the converted .cool file with the 'hicInfo' command, only the 'chrom', 'start', and 'end' columns are available. I do not see a 'weight' column or, as I would expect in my case, a 'SCALE' column.

Is this expected, or do you have any advice on how to deal with it?

Many thanks!

ashishjain1988 closed this as completed Sep 26, 2022

ashishjain1988 reopened this Sep 26, 2022

lldelisle mentioned this issue Oct 31, 2022

Having Issues using HiCExplorer with .hic files generated from Juicer v2.0 #798

Open
