
JMPflug/gsec


genome_size_estimation_from_coverage GSEC.py

GSEC.py is a script that trims coverage pileups and estimates genome sequencing coverage from a base coverage file. The base coverage file should be a tab-delimited file with three columns: sequence name, position, and coverage. Such files can be generated by many genomics tools, including samtools (the depth command), BBMap's pileup.sh (the "basecov=" option), and bedtools (genomecov with the "-d" option).
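As a rough illustration of the input format, the sketch below groups a base coverage file by sequence name. It assumes three tab-separated columns listed in position order and skips lines starting with "#" (for example a header row). read_basecov is a hypothetical helper, not a function from GSEC.py.

```python
import csv
import io
from collections import defaultdict


def read_basecov(handle):
    """Group per-base coverage values by sequence name (hypothetical helper).

    Assumes each data line has three tab-separated fields:
    sequence name, position, coverage. Lines starting with '#' are skipped.
    """
    coverage = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        name, _position, depth = row[0], row[1], row[2]
        # Positions are assumed to arrive in order, so list order = position order.
        coverage[name].append(float(depth))
    return dict(coverage)


example = io.StringIO(
    "#name\tpos\tcov\n"
    "contig1\t1\t3\n"
    "contig1\t2\t10\n"
    "contig2\t1\t7\n"
)
cov = read_basecov(example)
print(cov)  # {'contig1': [3.0, 10.0], 'contig2': [7.0]}
```

Note that samtools depth omits zero-coverage positions unless it is run with -a, so a reader like this one would silently close those gaps.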

Trimming the ends of coverage pileups is often necessary because read mapping programs struggle to place reads that extend beyond the ends of the reference sequences, so coverage drops off toward the contig ends, as seen below. This drop makes the overall coverage of the contig appear artificially low. GSEC.py removes these low-coverage positions by trimming bases from both ends and/or retaining only a set percentage of positions from the middle of the contig, then calculates a corrected coverage for each contig.

[Figure: Per-base read mapping coverage for several loci]
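A minimal sketch of the two trimming modes described above, assuming a contig's coverage is a plain list of per-base depths and that "corrected coverage" means the mean of the retained positions. The README does not say whether GSEC.py uses the mean or another summary, or in which order the two modes are applied; here end trimming runs first. trim_ends, keep_middle, and corrected_coverage are hypothetical names.

```python
from statistics import mean


def trim_ends(depths, n):
    """Drop n positions from each end; return [] if nothing would remain."""
    if n <= 0:
        return list(depths)
    return list(depths[n:-n]) if len(depths) > 2 * n else []


def keep_middle(depths, fraction):
    """Keep only the central `fraction` (0 to 1) of positions."""
    keep = int(round(len(depths) * fraction))
    start = (len(depths) - keep) // 2
    return list(depths[start:start + keep])


def corrected_coverage(depths, trim=0, fraction=1.0):
    """Mean coverage after end trimming and/or keeping the middle (assumed order)."""
    core = keep_middle(trim_ends(depths, trim), fraction)
    return float(mean(core)) if core else None


depths = [1, 2, 10, 10, 10, 10, 2, 1]  # coverage sags at both ends
print(corrected_coverage(depths))                # 5.75 (untrimmed mean)
print(corrected_coverage(depths, trim=2))        # 10.0
print(corrected_coverage(depths, fraction=0.5))  # 10.0
```

Either mode removes the sagging ends in this toy example; on real data, end trimming removes a fixed number of bases regardless of contig length, while keeping the middle fraction scales with contig length.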

Arguments

  -basecov BASECOV, -i BASECOV
                        Base coverage file.

  -exclude EXCLUDE, -e EXCLUDE
                        Optional list of gene names to ignore.

  -iqr IQR              IQR coefficient for excluding outliers.

  -trim TRIM            Number of bases to remove from each end of a locus.

  -min MIN              Minimum number of bases a locus must have after
                        trimming to be retained.

  -out OUT, -o OUT      Output name.
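The README does not document defaults or the order in which these options are applied. The sketch below shows one plausible way they could combine, assuming that the exclude list is a set of locus names, that trimming happens before the -min check, and that -iqr sets Tukey-style fences (Q1 - k*IQR, Q3 + k*IQR) on per-locus mean coverage. filter_loci and its parameter defaults are hypothetical, not taken from GSEC.py.

```python
from statistics import mean, quantiles


def filter_loci(coverage, exclude=(), trim=0, min_len=1, iqr_k=1.5):
    """Return {locus: corrected mean coverage} for loci passing all filters.

    Assumed order: drop excluded names (-exclude), trim `trim` bases from
    each end (-trim), drop loci shorter than `min_len` (-min), then drop
    loci outside the IQR fences computed across all loci (-iqr).
    """
    kept = {}
    for name, depths in coverage.items():
        if name in exclude:
            continue  # user-supplied locus to ignore
        core = depths[trim:len(depths) - trim]
        if len(core) < min_len:
            continue  # too few bases left after trimming
        kept[name] = mean(core)
    if len(kept) < 2:
        return kept  # too few loci to compute quartiles
    q1, _median, q3 = quantiles(list(kept.values()), n=4)
    spread = q3 - q1
    low, high = q1 - iqr_k * spread, q3 + iqr_k * spread
    return {name: c for name, c in kept.items() if low <= c <= high}


values = {"g1": 9, "g2": 10, "g3": 10, "g4": 10,
          "g5": 10, "g6": 11, "g7": 11, "g8": 40}
coverage = {name: [1, v, v, 1] for name, v in values.items()}
coverage["skip"] = [10, 10, 10, 10]  # named in the exclude list
coverage["short"] = [5, 5]           # empty after trimming 1 base per end

result = filter_loci(coverage, exclude={"skip"}, trim=1, min_len=2, iqr_k=1.5)
print(sorted(result))  # ['g1', 'g2', 'g3', 'g4', 'g5', 'g6', 'g7']
```

Locus g8, with roughly four times the typical coverage (as a duplicated gene might show), falls above the upper fence and is dropped.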

About

GSEC - Genome Size Estimation using Coverage
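The README does not spell out the final genome size calculation. A common coverage-based approach divides the total number of sequenced bases B by the per-base depth C observed at single-copy loci (G ≈ B / C). The sketch below assumes that relation and uses the median of the per-locus corrected coverages as C. estimate_genome_size is a hypothetical helper.

```python
from statistics import median


def estimate_genome_size(total_read_bases, locus_coverages):
    """Genome size ~ total sequenced bases / typical single-copy coverage.

    Assumes the loci are single-copy, so their corrected coverage
    approximates the genome-wide sequencing depth.
    """
    depth = median(locus_coverages)
    if depth <= 0:
        raise ValueError("coverage must be positive")
    return total_read_bases / depth


# e.g. 3 Gb of reads with ~10x coverage at single-copy loci
print(estimate_genome_size(3_000_000_000, [9.5, 10.0, 10.5]))  # 300000000.0
```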
