Pickle to NPZ? #33

Closed
biocyberman opened this issue Aug 7, 2017 · 2 comments

@biocyberman

I want to try the newer version of wisecondor, which uses NPZ files. One problem is that we no longer have the original BAM files. Is there a way to convert a .pickle file to .npz?

@rstraver
Collaborator

rstraver commented Aug 8, 2017

I've had this question before, and it makes sense in some situations. I wrote a small script that should do the conversion: copy the script at the end of this comment into a .py file and run it on any pickle you provide. I have not verified its correctness, as I currently have little data to test it on.

A word of warning: because the two implementations differ, a consistent error can appear in either conversion step, so a file converted from bam to pickle to npz may behave differently from an npz created directly from a bam. Neither implementation suffers from this difference on its own; it only becomes a problem with a conversion like this. If you find such a systematic error, please let me know. To test, convert a bam file you have with both the new and the old implementation, convert the resulting pickle to npz as well, and see whether the results differ when testing for CNVs (or send the npzs to me and I can check them internally).
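Before running a full CNV test, the comparison described above can be approximated by checking the bin counts directly. The following is only a minimal sketch: it assumes both npz files store a dict of per-chromosome counts under the `sample` key, as the conversion script in this comment does, and the function names `load_bins` and `compare_bins` are hypothetical helpers, not part of wisecondor.

```python
import numpy as np


def load_bins(path):
    """Return the {chromosome: counts} dict stored under the 'sample' key.

    Assumes the dict was saved as a pickled 0-d object array, which is what
    np.savez_compressed does when it is handed a plain dict.
    """
    with np.load(path, allow_pickle=True) as data:
        return data['sample'].item()


def compare_bins(direct, converted):
    """Summarise per-chromosome differences between two bin-count dicts."""
    report = {}
    for chrom in sorted(set(direct) | set(converted), key=str):
        a = np.asarray(direct.get(chrom, []), dtype=np.int64)
        b = np.asarray(converted.get(chrom, []), dtype=np.int64)
        # Only the overlapping bins can be compared; lengths are reported too.
        n = min(len(a), len(b))
        diff = a[:n] - b[:n]
        report[chrom] = {
            'length_direct': len(a),
            'length_converted': len(b),
            'bins_differing': int(np.count_nonzero(diff)),
            'mean_difference': float(diff.mean()) if n else 0.0,
        }
    return report
```

Calling `compare_bins(load_bins('direct.npz'), load_bins('converted.npz'))` (both file names are placeholders) gives one entry per chromosome. A mean difference that is consistently non-zero, or a length mismatch on every chromosome, would hint at the kind of systematic error mentioned above, although identical counts do not by themselves guarantee identical CNV results.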

Also, this conversion fills the stats and runtime information it cannot obtain with -1 (or None). I therefore strongly advise against using it unless it is for reference creation and you have absolutely no way of retrieving the original bam files.

import pickle
import argparse
import numpy as np

def getRuntime():
    # Runtime details are unknown for a converted file, so store placeholders.
    runtime = dict()
    runtime['version']='None'
    runtime['datetime']='None'
    runtime['hostname']='None'
    runtime['username']='None'
    return runtime


parser = argparse.ArgumentParser(description='Convert a legacy pickle file to the newer npz format',
		formatter_class=argparse.ArgumentDefaultsHelpFormatter)

parser.add_argument('infile', type=str,
					help='old format pickle file (input)')

parser.add_argument('outfile', type=str,
					help='new format npz file (output)')

parser.add_argument('-binsize', type=int, default=1000000,
					help='binsize used for pickle creation')

args = parser.parse_args()

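# Load the legacy pickle; the with-block closes the file handle afterwards.
# On Python 3, a pickle written by Python 2 may need
# pickle.load(handle, encoding='latin1') instead.
with open(args.infile, 'rb') as handle:
    sample = pickle.load(handle)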


# Convert each chromosome's bin counts to an int32 array.
chromosomes = dict()
for chrom in sample:
    chromosomes[chrom] = np.array(sample[chrom], dtype=np.int32)

# Read-filtering statistics cannot be recovered from a pickle, so mark them all as -1.
qual_info = {'mapped':-1,
	'unmapped':-1,
	'no_coordinate':-1,
	'filter_rmdup':-1,
	'filter_mapq':-1,
	'pre_retro':-1,
	'post_retro':-1,
	'pair_fail':-1}

np.savez_compressed(args.outfile,
	arguments=vars(args),
	runtime=getRuntime(),
	sample=chromosomes,
	quality=qual_info)
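
To check what the script wrote, the output can be read back with NumPy. This is a hedged sketch rather than anything wisecondor provides: `summarise_npz` is a hypothetical helper, and it assumes the file has the `arguments`, `sample` and `quality` keys written by the script above.

```python
import numpy as np


def summarise_npz(path):
    """Load a converted npz file and report what it contains."""
    with np.load(path, allow_pickle=True) as data:
        # Each dict was stored as a 0-d object array, so .item() recovers it.
        sample = data['sample'].item()
        quality = data['quality'].item()
        arguments = data['arguments'].item()
    return {
        'binsize': arguments.get('binsize'),
        'chromosomes': sorted(sample, key=str),
        'total_count': int(sum(np.sum(bins) for bins in sample.values())),
        # True when every quality statistic is the -1 placeholder,
        # i.e. the file most likely came from this conversion.
        'placeholder_quality': all(v == -1 for v in quality.values()),
    }
```

Note that `np.savez_compressed` appends `.npz` to the output name if it is missing, so pass the full file name when reading the result back.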

@biocyberman
Author

I will try this and report back.
