Merged
82 commits
a8c1b80
Add new synthetic epitopes
owlang Jun 4, 2022
dccf535
include R100,R50,R20 to generate synth genomes
owlang Jun 4, 2022
8a15989
update to skip bp-count of genomic FASTA
owlang Jul 7, 2022
a17d125
add new synthetic epitopes to tagDB setup
owlang Jul 26, 2022
fcce324
update build synthetic genome scripts
owlang Aug 2, 2022
2525aa6
autobuild depth simulation submission scripts
owlang Aug 3, 2022
4023ae1
add parallelization to simulation script
owlang Sep 22, 2022
1499004
expand simulations to vary different parameters
owlang Sep 22, 2022
a1e5877
add `.md` ext to README files
owlang Sep 22, 2022
ad0312c
add `.md` ext to remaining README files
owlang Sep 22, 2022
10e134f
reformat README files
owlang Sep 22, 2022
6683f88
add checklist to SyntheticEpitope README
owlang Sep 25, 2022
279b8ba
update SyntheticEpitope progress
owlang Sep 25, 2022
11548a2
update simulation progress
owlang Sep 27, 2022
ab76915
add script to check progress on simulations
owlang Sep 27, 2022
1634980
add scripts for running EID on simulated data
owlang Sep 27, 2022
9d64824
fix db naming to account for ref genome
owlang Sep 27, 2022
42bf110
update simulation progress
owlang Sep 28, 2022
dfd8460
mark in-progress sets
owlang Sep 28, 2022
734a9cc
update simulation progress
owlang Sep 29, 2022
478b41c
update simulation progress
owlang Oct 2, 2022
ed5a7b8
update simulation progress
owlang Oct 2, 2022
f5cd513
update simulation progress
owlang Oct 4, 2022
309d247
update simulation progress
owlang Oct 5, 2022
61a502c
update simulation progress
owlang Oct 6, 2022
2e28833
update simulation progress
owlang Oct 10, 2022
2710e4d
update simulation progress
owlang Oct 11, 2022
3ce3af2
update simulation progress
owlang Oct 12, 2022
cc47bcc
update simulation progress
owlang Oct 14, 2022
5a6af0f
update simulation progress
owlang Oct 15, 2022
dbbc49a
formatting - entab script files
owlang Oct 15, 2022
dbb7c4a
update to use bowtie2 and support custom p-val
owlang Oct 16, 2022
d5a1730
remove unused files
owlang Oct 16, 2022
ae6c378
update epitopeid-template
owlang Oct 16, 2022
f122292
update simulation progress
owlang Oct 16, 2022
ae86f38
update simulation progress
owlang Oct 18, 2022
b4386b7
update simulation progress
owlang Oct 22, 2022
ed2daae
add script to collect SynthEpitope yeast reports
owlang Oct 31, 2022
678e376
add reports for completed simulations
owlang Oct 31, 2022
9ea3bc5
switch to Bowtie2 for simulations
owlang Nov 10, 2022
21a3c9f
add scripts for compiling & plotting results
owlang Nov 10, 2022
62f0673
add error-checking to the script templates
owlang Nov 10, 2022
86660bd
update simulation progress
owlang Dec 6, 2022
6b171ce
report results of ENCODE-eGFP rerun on new alg
owlang Dec 21, 2022
a61cdca
Merge branch 'revisions' of https://github.com/CEGRcode/GenoPipe into…
owlang Dec 21, 2022
8653a7a
add completed sacCer3 simulation results
owlang Jan 4, 2023
5122797
switch back to Bedtools-based simulation
owlang Jan 18, 2023
6bc8313
bugfix simulations for new Bedtools version
owlang Jan 18, 2023
ef02f64
update software import statements
owlang Jan 18, 2023
e025da8
fix tabs and add comments to simulation script
owlang Jan 18, 2023
3cbe2a7
add checks and thread option to epitopeid template
owlang Jan 18, 2023
7f2dc0f
update simulations
owlang Jan 18, 2023
0eea093
fix delimiter between simulated paired reads
owlang Jan 26, 2023
e3ec6ef
remove unused bwa indexing from setup script
owlang Jan 31, 2023
9486c8d
rename human EpitopeID with HIV database
owlang Jan 31, 2023
6e557a0
adjust SRA download scripts to reformat FASTQ
owlang Jan 31, 2023
0eeb840
update simulation progress with results
owlang Jan 31, 2023
f692cca
remove "partial" data files
owlang Jan 31, 2023
2a29356
correct overwrites from previous commit
owlang Jan 31, 2023
3d337cd
flatten directory structure of yeast results
owlang Jan 31, 2023
3348499
update simulation progress with results
owlang Feb 7, 2023
f36a5b6
build figures from raw results
owlang Feb 7, 2023
b131490
update with latest simulation results
owlang Feb 8, 2023
2439bb1
update with latest simulation results
owlang Feb 8, 2023
0fabe3a
update with HIV rerun results
owlang Feb 9, 2023
c8fafca
update scripts that build SyntheticEpitope figs
owlang Feb 9, 2023
da945fe
update yeast R500 simulations with fixed results
owlang Feb 16, 2023
31f2b42
hardcode target order in plotting scripts
owlang Feb 16, 2023
6c7c898
add yeast summary reports and figs
owlang Feb 16, 2023
7843ba1
rename READMEs README.md
owlang Feb 17, 2023
b0cb6d1
update mixture contamination scripts
owlang Feb 17, 2023
7746a54
update with raw results of yeast mixture (1M) eid
owlang Feb 17, 2023
a79306e
add yeast mixture eid summary results and fig
owlang Feb 17, 2023
b0d75cb
add and updated eid hg19 results
owlang Feb 22, 2023
fab5c02
update hg19 depth simulation summaries and figs
owlang Feb 22, 2023
7f85bf1
add mix-human eid raw results
owlang Mar 1, 2023
8ed21d9
add human mixture eid summary results and fig
owlang Mar 1, 2023
7f13ae6
update ENCODE-eGFP with browser scripts
owlang Mar 6, 2023
44ae489
minor script updates
owlang Mar 6, 2023
236cd49
update READMEs
owlang Mar 6, 2023
eba2c2b
refactor general scripts for Bowtie2 and update
owlang Mar 22, 2023
9a2de36
Merge branch 'master' into revisions
owlang Mar 22, 2023
12 changes: 12 additions & 0 deletions .gitignore
@@ -5,6 +5,12 @@ EpitopeID/sacCer3_EpiID/FASTA_genome/genome.fa.ann
EpitopeID/sacCer3_EpiID/FASTA_genome/genome.fa.bwt
EpitopeID/sacCer3_EpiID/FASTA_genome/genome.fa.pac
EpitopeID/sacCer3_EpiID/FASTA_genome/genome.fa.sa
EpitopeID/sacCer3_EpiID/FASTA_genome/genome.fa.1.bt2
EpitopeID/sacCer3_EpiID/FASTA_genome/genome.fa.2.bt2
EpitopeID/sacCer3_EpiID/FASTA_genome/genome.fa.3.bt2
EpitopeID/sacCer3_EpiID/FASTA_genome/genome.fa.4.bt2
EpitopeID/sacCer3_EpiID/FASTA_genome/genome.fa.rev.1.bt2
EpitopeID/sacCer3_EpiID/FASTA_genome/genome.fa.rev.2.bt2
EpitopeID/ecoli_EpiID/FASTA_genome/genome.fa.amb
EpitopeID/ecoli_EpiID/FASTA_genome/genome.fa.ann
EpitopeID/ecoli_EpiID/FASTA_genome/genome.fa.bwt
@@ -16,3 +22,9 @@ EpitopeID/hg19_EpiID/FASTA_genome/genome.fa.ann
EpitopeID/hg19_EpiID/FASTA_genome/genome.fa.bwt
EpitopeID/hg19_EpiID/FASTA_genome/genome.fa.pac
EpitopeID/hg19_EpiID/FASTA_genome/genome.fa.sa
EpitopeID/hg19_EpiID/FASTA_genome/genome.fa.1.bt2
EpitopeID/hg19_EpiID/FASTA_genome/genome.fa.2.bt2
EpitopeID/hg19_EpiID/FASTA_genome/genome.fa.3.bt2
EpitopeID/hg19_EpiID/FASTA_genome/genome.fa.4.bt2
EpitopeID/hg19_EpiID/FASTA_genome/genome.fa.rev.1.bt2
EpitopeID/hg19_EpiID/FASTA_genome/genome.fa.rev.2.bt2
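The six new entries per genome correspond to the files of a small Bowtie2 index built next to `genome.fa`. As a minimal sketch (the helper name `missing_bt2_files` is hypothetical and not part of this PR), a setup script could check whether that index is already present before rebuilding it:

```python
from pathlib import Path

# Suffixes of a small Bowtie2 index, matching the .gitignore entries above.
BT2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def missing_bt2_files(fasta_path):
    """Return the Bowtie2 index files that are absent for the given FASTA prefix."""
    # Hypothetical helper: assumes the index prefix equals the FASTA path itself.
    expected = [Path(str(fasta_path) + suffix) for suffix in BT2_SUFFIXES]
    return [p for p in expected if not p.is_file()]
```

An empty return value would mean the index is complete; any other result lists the files that still need to be generated.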
60 changes: 30 additions & 30 deletions DeletionID/delScripts/detect_deletion_BAM.py
@@ -28,15 +28,15 @@ def calculateDeletion(PASS):
if float(PASS[key]) == 0:
SCORES.append((key, 'No Data Detected'))
elif np.isnan(float(PASS[key])):
SCORES.append((key, 'Region does not meet mappability threshold'))
SCORES.append((key, 'Region does not meet mappability threshold'))
else:
SCORES.append((key, np.log(float(PASS[key]) / MEDIAN) / np.log(2)))
SCORES.append((key, np.log(float(PASS[key]) / MEDIAN) / np.log(2)))
return SCORES

def iterateBAM(bam, bed, READLENGTH, MAP, MAPTHRESH):
# open BAM file
samfile = pysam.AlignmentFile(bam, "rb")
# open BED file
# open BED file
file = open(bed, "r")

# Counter of total tags mapping across all intervals
@@ -65,16 +65,16 @@ def iterateBAM(bam, bed, READLENGTH, MAP, MAPTHRESH):
intervalCount[index] = intervalCount[index] + 1
totalSize = totalSize + intervalSize

# Calculate avg tags per bp across the entire region
# Calculate avg tags per bp across the entire region
intervalAvg = list(map(lambda x : float(x) / float(intervalSize), intervalCount))
# Normalize avg reads per interval by mappability

# Normalize avg reads per interval by mappability
for index in range(0, len(MAP[intervalID])):
if float(MAP[intervalID][index]) >= MAPTHRESH:
mapAvg[index] = float(intervalAvg[index] * intervalCount[index]) / float(MAP[intervalID][index])
else:
mapAvg[index] = float('NaN')
if float(MAP[intervalID][index]) >= MAPTHRESH:
mapAvg[index] = float(intervalAvg[index] * intervalCount[index]) / float(MAP[intervalID][index])
else:
mapAvg[index] = float('NaN')

PASS[intervalID] = (mapAvg, intervalCount)

except (IndexError, ValueError):
@@ -90,7 +90,7 @@ def iterateBAM(bam, bed, READLENGTH, MAP, MAPTHRESH):
if all(float(x) < MAPTHRESH for x in MAP[key]):
normalizedScore = float('NaN')
elif sum(PASS[key][1]) != 0:
normalizedScore = np.nansum(PASS[key][0]) / sum(PASS[key][1])
normalizedScore = np.nansum(PASS[key][0]) / sum(PASS[key][1])
else:
normalizedScore = 0
SCORE[key] = normalizedScore
@@ -99,11 +99,11 @@ def iterateBAM(bam, bed, READLENGTH, MAP, MAPTHRESH):
if float(totalSize) <= 0:
print("ERROR!!!\tTotal size of all intervals surveyed is less than 1")
sys.exit(-1)

# Close files
file.close()
samfile.close()

return SCORE,FAIL

def closestLength(READLENGTH, read):
@@ -120,7 +120,7 @@ def loadMap(MAP):
file = open(MAP, "r")
header = 0;
MAP = {}
# Iterate BED coord file, getting tag counts across interval
# Iterate BED coord file, getting tag counts across interval
for line in file:
mapline = line.rstrip().split("\t")
if header == 0:
@@ -140,8 +140,8 @@ def validateBAM(bam):
print("BAM index not detected.\nAttempting to index now...\n")
pysam.index(str(bam))
if not os.path.isfile(bam + ".bai"):
raise RuntimeError("BAM indexing failed, please check if BAM file is sorted")
return False
raise RuntimeError("BAM indexing failed, please check if BAM file is sorted")
return False
print("BAM index successfully generated.\n")
return True

@@ -150,9 +150,9 @@ def validateBAM(bam):
if len(sys.argv) < 2 or not sys.argv[1].startswith("-"): sys.exit(usage)
BAM = BED = MAP = OUT = ""

# Variable to set the mappability threshold so that we do not consider regions with mappability
# below this number 0-1, Default to 0.25 meaning at least 25% of the region must be uniquely mappable
# by at least one actively used readlength
# Variable to set the mappability threshold so that we do not consider regions with mappability
# below this number 0-1, Default to 0.25 meaning at least 25% of the region must be uniquely mappable
# by at least one actively used readlength
MAPTHRESH = 0.25

OUTPUTTHRESH = -3
@@ -173,11 +173,11 @@ def validateBAM(bam):
print("No BAM file detected!!!")
sys.exit(usage)
elif BED == "":
print("No BED Coordinate file detected!!!")
sys.exit(usage)
print("No BED Coordinate file detected!!!")
sys.exit(usage)
elif MAP == "":
print("No Mappability file detected!!!")
sys.exit(usage)
print("No Mappability file detected!!!")
sys.exit(usage)
if OUT == "":
OUT = os.path.splitext(os.path.basename(BAM))[0] + "_" + os.path.splitext(os.path.basename(BED))[0] + ".tab"

@@ -189,11 +189,11 @@ def validateBAM(bam):
print("Output file: ",OUT)
print("Log2 output threshold: ",OUTPUTTHRESH)

# Validate BAM file
# Validate BAM file
if(not validateBAM(BAM)):
print("ERROR!!!\tNo BAM index detected.\n")
sys.exit(-1)
print("ERROR!!!\tNo BAM index detected.\n")
sys.exit(-1)

# Load mappability file
READLENGTH, REGIONMAP = loadMap(MAP)
print("Mappability file loaded")
@@ -203,7 +203,7 @@ def validateBAM(bam):
print("Genomic coordinate coverage calculated")

# Calculate log2 tag enrichment over median of mappability-normalized tag avg per region
SCORE = calculateDeletion(PASS)
SCORE = calculateDeletion(PASS)
print("Depletion calculated")

# Output final data
@@ -215,7 +215,7 @@ def validateBAM(bam):
for id,score in reversed(FINAL):
try:
if float(score) < OUTPUTTHRESH:
output.write(id + "\t" + str(score) + "\n")
output.write(id + "\t" + str(score) + "\n")
else:
break
except(ValueError):
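The scoring logic touched by these hunks can be summarized in a short standalone sketch. It is an illustration rather than the script itself: the function names `region_score` and `log2_depletion` are hypothetical, and the way the median is computed (over nonzero, non-NaN region scores) is an assumption, because that part of `calculateDeletion` is not shown in the diff.

```python
import math
import statistics

MAPTHRESH = 0.25  # same default as the script: minimum uniquely mappable fraction


def region_score(counts, interval_size, mappability, thresh=MAPTHRESH):
    """Mappability-normalized score for one region (one entry per read length)."""
    if all(m < thresh for m in mappability):
        return float("nan")  # region does not meet the mappability threshold
    total = sum(counts)
    if total == 0:
        return 0.0  # no tags observed in this region
    weighted = 0.0
    for count, m in zip(counts, mappability):
        if m >= thresh:  # read lengths below the threshold are skipped (NaN in the script)
            avg = count / interval_size  # average tags per bp
            weighted += avg * count / m  # weight by count, normalize by mappability
    return weighted / total


def log2_depletion(scores):
    """Map region id -> log2(score / median), or a status string."""
    usable = [s for s in scores.values() if not math.isnan(s) and s > 0]
    if not usable:
        raise ValueError("no region has a usable score")
    median = statistics.median(usable)  # assumption: median over usable scores only
    result = {}
    for key, s in scores.items():
        if math.isnan(s):
            result[key] = "Region does not meet mappability threshold"
        elif s == 0:
            result[key] = "No Data Detected"
        else:
            result[key] = math.log2(s / median)
    return result
```

Under these assumptions, a region whose normalized score is half the median gets a log2 value of -1, and only regions scoring below `OUTPUTTHRESH` (default -3) would be written to the output file.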
40 changes: 20 additions & 20 deletions DeletionID/identify-Deletion.sh
@@ -1,39 +1,39 @@
#!/bin/bash

# Required software:
# python v2.15 with scipy
# python3 with scipy

usage()
{
echo 'identify-Deletion.sh -i /path/to/BAM -o /path/to/output -d /path/to/genome/database'
echo 'eg: bash identify-Deletion.sh -i /input -o /output -d /sacCer3_Del'
exit
echo 'identify-Deletion.sh -i /path/to/BAM -o /path/to/output -d /path/to/genome/database'
echo 'eg: bash identify-Deletion.sh -i /input -o /output -d /sacCer3_Del'
exit
}

if [ "$#" -ne 6 ]; then
usage
usage
fi

while getopts ":i:o:d:" IN; do
case "${IN}" in
i)
INPUT=${OPTARG}
;;
o)
OUTPUT=${OPTARG}
;;
d)
DATABASE=${OPTARG}
;;
*)
usage
;;
esac
case "${IN}" in
i)
INPUT=${OPTARG}
;;
o)
OUTPUT=${OPTARG}
;;
d)
DATABASE=${OPTARG}
;;
*)
usage
;;
esac
done
shift $((OPTIND-1))

if [ -z "${INPUT}" ] || [ -z "${OUTPUT}" ] || [ -z "$DATABASE" ]; then
usage
usage
fi

echo "Input folder = ${INPUT}"
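For readers more familiar with Python than with `getopts`, the option handling above (three required flags, usage message otherwise) could be mirrored roughly as follows. This is a hypothetical equivalent for illustration only; the PR keeps the wrapper in Bash, and the `dest` names are assumptions.

```python
import argparse


def parse_args(argv):
    """Parse the three required options accepted by identify-Deletion.sh."""
    parser = argparse.ArgumentParser(
        prog="identify-Deletion",
        description="Hypothetical Python mirror of the shell wrapper's options.",
    )
    # All three options are mandatory, as in the shell script's argument check.
    parser.add_argument("-i", dest="input", required=True, help="/path/to/BAM")
    parser.add_argument("-o", dest="output", required=True, help="/path/to/output")
    parser.add_argument("-d", dest="database", required=True,
                        help="/path/to/genome/database")
    return parser.parse_args(argv)
```

Calling `parse_args(["-i", "/input", "-o", "/output", "-d", "/sacCer3_Del"])` corresponds to the example invocation printed by `usage()`; omitting any option makes `argparse` print a usage message and exit.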
12 changes: 6 additions & 6 deletions EpitopeID/epiScripts/calculate_EpitopeSignificance.py
@@ -32,20 +32,20 @@
print("No Pvalue input!!!")
sys.exit(usage)
elif COUNT == "":
print("No Single-end epitope counts input!!!")
sys.exit(usage)
print("No Single-end epitope counts input!!!")
sys.exit(usage)
elif SIZE == "":
print("No Genome-size input!!!")
sys.exit(usage)
print("No Genome-size input!!!")
sys.exit(usage)
if OUT == "":
OUT = os.path.splitext(os.path.basename(BAM))[0] + ".tab"
OUT = os.path.splitext(os.path.basename(BAM))[0] + ".tab"

# Minimum fold enrichment over background
MINFOLD = 2;

# Open output file for writing
output = open(OUT, "w")
# open PE_table
# open PE_table
file = open(TABLE, "r")

TABLE = []
2 changes: 1 addition & 1 deletion EpitopeID/epiScripts/count_raw_epitope.pl
@@ -27,7 +27,7 @@

open(OUT, ">$output") or die "Can't open $output for writing!\n";
if($#SORT == -1) {
print OUT "EpitopeID\tEpitopeCount\nNo Tag ID'd\n";
print OUT "EpitopeID\tEpitopeCount\nNo Tag ID'd\n";
} else {
print OUT "EpitopeID\tEpitopeCount\n";
for($x = 0; $x <= $#SORT; $x++) {
6 changes: 3 additions & 3 deletions EpitopeID/epiScripts/sum_PE_epitope-alignment.pl
@@ -36,7 +36,7 @@
chomp($line);
next if((substr $line, 0, 1) eq "@");
@array = split(/\t/, $line);

# Set predicted terminus of epitope
$LOC = "C-term";
if($array[5] eq "+" && $array[18] eq "-") { $LOC = "N-term"; }
@@ -56,12 +56,12 @@

open(OUT, ">$output") or die "Can't open $output for writing!\n";
if($#SORT == -1) {
print OUT "Epitope could not be detected genomically\n";
print OUT "Epitope could not be detected genomically\n";
} else {
for($x = 0; $x <= $#SORT; $x++) {
@temparray = split(/\~/, $SORT[$x]{'id'});
for($y = 0; $y <= $#temparray; $y++) { print OUT "$temparray[$y]\t" }
print OUT "$SORT[$x]{'count'}\n";;
print OUT "$SORT[$x]{'count'}\n";;
}
}
close OUT;
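The output step of this Perl script splits each sorted ID on `~` and writes the fields tab-separated, followed by the count. A rough Python rendering of the same formatting is sketched below; the function name `format_pe_rows` and the example ID layout are hypothetical, chosen only to illustrate the row format.

```python
def format_pe_rows(sorted_hits):
    """Format hits as tab-delimited rows.

    sorted_hits: list of dicts with 'id' (fields joined by '~') and 'count'.
    """
    if not sorted_hits:
        # Same fallback message the Perl script writes when nothing was found.
        return "Epitope could not be detected genomically\n"
    lines = []
    for hit in sorted_hits:
        fields = hit["id"].split("~")  # one output column per '~'-separated field
        lines.append("\t".join(fields + [str(hit["count"])]) + "\n")
    return "".join(lines)
```

For instance, a hit with the made-up ID `GENE1~FLAG~N-term` and count 12 would produce the columns `GENE1`, `FLAG`, `N-term`, and `12` on a single row.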