Merged
80 commits
4928fdd
WIP: adding some comments for myself for future reference.
Mar 5, 2025
4b04b0c
Merge branch 'main' into FoldClinicalIn
Mar 5, 2025
76d2a99
A few changes, and just starting to add some test cases.
Mar 6, 2025
fb7eb1e
WIP: some more tweaks/visual fixes.
Mar 7, 2025
7278841
WIP: more (slow) minor changes sifting through the code.
Mar 8, 2025
79ec805
WIP: continuing to refactor code. THIS IS NOT FUNCTIONAL CODE
Mar 12, 2025
8f2691d
WIP: more restructuring and reworking.
Mar 13, 2025
d8b4c6b
WIP: more restructuring/refactoring.
Mar 14, 2025
9938899
More progress on refactor/rewriting.
Mar 15, 2025
a3c673c
Some more refactoring.
Mar 17, 2025
c723c2e
All tests are currently passing.
Mar 18, 2025
aa46ec9
Remove deprecated stuff from the typing module.
Mar 18, 2025
3a68ac5
WIP: saving some progress before switching branches to see how much I…
Mar 18, 2025
5ccd964
WIP: some progress in refactoring; added back some optimizations to c…
Mar 20, 2025
437d67d
WIP: some more restructuring/refactoring.
Mar 21, 2025
8fa2b3e
WIP: some more refactoring of code, specifically the main "run" method.
Mar 22, 2025
f6a6f59
Refactored some code into a "presentation layer".
Mar 25, 2025
1e4fd39
Fixing/updating tests; made some methods static where possible.
Mar 26, 2025
987eb7b
Changed the way protein pairs are represented internally; updated tes…
Mar 27, 2025
d9d274b
Minor refactoring continues.
Mar 27, 2025
dd0c5aa
Completed testing for models.py (for now).
Mar 28, 2025
f130636
Some refactoring of tests. Stuck on reworking the tests for pad_short.
Mar 29, 2025
a7c67f1
Reworked the testing of pad_short (and calc_padding); fixed an error …
Apr 2, 2025
da5ef44
WIP: minor change here, refactoring some tests.
Apr 2, 2025
68a4f24
Slugged through tests for pair_exons_helper.
Apr 3, 2025
4285ccb
WIP: breaking ground on pair_exons tests.
Apr 4, 2025
24b5125
Finished testing pair_exons; some more small refactoring.
Apr 5, 2025
e7a77a2
Reworked get_mismatches and added testing for it.
Apr 8, 2025
a8b7f77
Some more refactoring, some more testing.
Apr 9, 2025
1cfa4b0
Added some more test coverage to reach some very hard-to-reach areas.
Apr 10, 2025
ba250e4
Finished the "load HLA standards" tests.
Apr 10, 2025
0f5b332
WIP: started adding tests for read_hla_frequencies.
Apr 10, 2025
eb26961
Added some fixmes and did a bit of code cleanup.
Apr 10, 2025
3b64ea1
WIP: started adding check_length tests for HLA B and C.
Apr 11, 2025
3e2db4d
Finished testing the "library", then added some more methods that nee…
Apr 12, 2025
24b950c
Added "distance to B*57:01" functionality and tests.
Apr 15, 2025
8f159eb
Cleaned up some imports/commented-out imports.
Apr 15, 2025
fa141e3
Incorporating most of David's commit suggestions.
Apr 15, 2025
97be2a7
WIP: started translating the original Ruby script for updating the HL…
Apr 17, 2025
af7463b
Started work on the "update alleles from IMGTHLA" script.
Apr 18, 2025
0e8fcac
Some more refactoring; moved code from a "driver" to the "library".
Apr 24, 2025
dd8bbba
Added tests for the functionality moved from update_alleles into utils.
Apr 24, 2025
ae2514e
Added the SQL(-ish) files used by hla.rb.
Apr 26, 2025
4a7198d
Some light refactoring of hla.rb for readability as I learn what it's…
Apr 26, 2025
c7216c8
Some refactoring and a start on translating the clinical HLA driver.
Apr 30, 2025
06b6f0b
WIP: started adding some stuff with an eye towards database access fo…
May 2, 2025
1bd2b7e
WIP: some progress on the clinical HLA driver.
May 3, 2025
78f1fc3
First pass at the clinical HLA script is complete.
May 7, 2025
a0b10a0
WIP: some refactoring and minor restructuring.
May 8, 2025
664de9e
WIP: fixed tests broken from previous commits, and *almost* started w…
May 9, 2025
83982fa
WIP: first pass of overhauling the "best common allele pair" method o…
May 9, 2025
8dbfd4a
WIP: reworked the code that chooses which combined standard to draw m…
May 10, 2025
59d42f7
Finished testing clinical_hla_lib.
May 14, 2025
eac8e18
Reworked the way mismatches are reported in bblab_lib. Added some te…
May 15, 2025
8148120
Finished writing tests for bblab_lib.
May 15, 2025
ea7bd4e
Added an old Ruby file for reference.
May 16, 2025
52eaa01
WIP: first pass at a Ruby "adaptor".
May 17, 2025
4603d80
hla_algorithm_adaptor.py is working on two test datasets.
May 21, 2025
d2f9ed0
WIP: refactored the Ruby adaptor and wrote tests.
May 22, 2025
e0a5ed9
Fixed tests for interpret_from_json_lib (renamed from ruby_adaptor) a…
May 23, 2025
d28ce8a
WIP: some tweaks to devcontainer.json and pyproject.toml.
May 24, 2025
60a8366
Some changes to the devcontainer configuration; the Ruby adaptor pass…
May 26, 2025
c533935
WIP: refactoring the stored HLA standard data to be in a YAML file ra…
May 28, 2025
a139acc
WIP: a first pass of updating for the changes in the EasyHLA object.
May 29, 2025
8ed2114
WIP: some polish on how the standards are stored and loaded.
May 29, 2025
8cd9d74
Fixed all existing tests and removed some seemingly unnecessary "conf…
May 31, 2025
a029519
Finished testing, and started work on a script to update hla_frequenc…
Jun 3, 2025
8254fc4
WIP: examining the conversion from old frequency file format to new.
Jun 5, 2025
b9a30ff
Finished testing the library-side changes relating to the new frequen…
Jun 6, 2025
9bb6ee2
WIP: cleaned up the update_frequency_file script and started adding t…
Jun 7, 2025
68ac01c
WIP: A little more test coverage.
Jun 24, 2025
4ab6857
All tests passing.
Jun 24, 2025
7e68912
Updated the hla_frequencies.csv file to the new format (changing "unk…
Jun 24, 2025
1dd6d93
Cleaned up some unused imports and cleaned up the clinical HLA script…
Jun 25, 2025
f4ccc0b
Some changes to allow a sensible sorting of allele pairs.
Jun 26, 2025
d377fc8
WIP: some starting rumblings of a Django app to replace the current b…
Jul 19, 2025
b4f98c9
A number of minor fixes.
Jul 24, 2025
eea16fa
WIP: some fleshing out of the Django models.
Jul 25, 2025
204d027
Changed combined_standards_helper into a generator; added tests.
Jul 30, 2025
3c3b30f
Removed the Django app framework (it's been moved to a different proj…
Jul 31, 2025
23 changes: 23 additions & 0 deletions .devcontainer/Dockerfile
@@ -0,0 +1,23 @@
# Base image for PyEasyHLA development.

# This image is the basis of the project's devcontainer as well.

ARG PYTHON_VERSION="3.13.3-bookworm"

FROM python:${PYTHON_VERSION} AS base

RUN apt update -y && apt upgrade -y

# Install the vendored software.
ARG INSTANTCLIENT_BASIC="instantclient-basic-linux.x64-23.7.0.25.01.zip"
# The value of ORACLE_HOME depends on the instant client used as it will
# install to a path like ".../instantclient_23_7" where the version numbers
# will vary.
ARG ORACLE_HOME="/opt/oracle/instantclient_23_7"

ENV ORACLE_HOME=${ORACLE_HOME} \
LD_LIBRARY_PATH=${ORACLE_HOME}:$LD_LIBRARY_PATH

COPY vendor/${INSTANTCLIENT_BASIC} /tmp/vendor/
RUN unzip /tmp/vendor/${INSTANTCLIENT_BASIC} -d /opt/oracle &&\
rm -rf /tmp/vendor
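The comment on `ORACLE_HOME` above notes that the unpack directory tracks the client version. A minimal sketch of deriving that path from the archive name follows. It assumes the archive name embeds the version as `-<major>.<minor>.` and that the archive unpacks to `instantclient_<major>_<minor>`; `oracle_home_for` is a hypothetical helper and not part of this PR.

```python
import re


def oracle_home_for(zip_name: str, prefix: str = "/opt/oracle") -> str:
    """Guess the directory an Instant Client archive unpacks to.

    Assumption: the name contains "-<major>.<minor>." and the archive
    unpacks to "instantclient_<major>_<minor>" under `prefix`.
    """
    match = re.search(r"-(\d+)\.(\d+)\.", zip_name)
    if match is None:
        raise ValueError(f"no version found in {zip_name!r}")
    major, minor = match.groups()
    return f"{prefix}/instantclient_{major}_{minor}"


print(oracle_home_for("instantclient-basic-linux.x64-23.7.0.25.01.zip"))
# prints /opt/oracle/instantclient_23_7
```

A helper like this could be used to cross-check the two `ARG` values whenever the vendored archive is bumped, since the Dockerfile keeps them as separate hand-edited strings.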
21 changes: 18 additions & 3 deletions .devcontainer/devcontainer.json
@@ -1,7 +1,7 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/ubuntu
{
-"image": "python:3.10-slim",
+"image": "python:3.13.3-bookworm",
"features": {
"ghcr.io/devcontainers/features/git:1": {
"ppa": true,
@@ -19,7 +19,9 @@
"updateRemoteUserUID": true,
"containerEnv": {
"WORKON_HOME": "${containerWorkspaceFolder}/.venv",
-"PYTHONPATH": "${containerWorkspaceFolder}"
+"PYTHONPATH": "${containerWorkspaceFolder}",
+"ORACLE_HOME": "/opt/oracle/instantclient",
+"LD_LIBRARY_PATH": "/opt/oracle/instantclient"
},
"onCreateCommand": {
"update apt": [
@@ -38,7 +40,20 @@
"install",
"--no-install-recommends",
"-y",
-"openssh-client"
+"openssh-client",
+"libaio-dev",
+"unzip"
+],
+"install Instant Client": [
+"/usr/bin/env",
+"bash",
+"${containerWorkspaceFolder}/.devcontainer/install_instant_client.bash"
+],
+"install Ruby": [
+"/usr/bin/apt",
+"install",
+"-y",
+"ruby"
]
},
"postAttachCommand": {
7 changes: 7 additions & 0 deletions .devcontainer/install_instant_client.bash
@@ -0,0 +1,7 @@
#! /usr/bin/env bash

mkdir -p /tmp/vendor
cp vendor/instantclient*.zip /tmp/vendor/
unzip /tmp/vendor/instantclient*.zip -d /opt/oracle
ln -sfn /opt/oracle/instantclient_* /opt/oracle/instantclient
rm -rf /tmp/vendor
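The script above leaves a version-independent symlink at `/opt/oracle/instantclient`, which is the path `devcontainer.json` uses for `ORACLE_HOME`. A small sanity check that could be run after the container is created might look like the sketch below. `find_client_library` is a hypothetical helper, and the assumption that the client ships a shared library whose name starts with `libclntsh.so` should be verified against the vendored archive.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_client_library(oracle_home: str | None = None) -> Path | None:
    """Return the first libclntsh shared library under ORACLE_HOME, if any."""
    home = oracle_home or os.environ.get("ORACLE_HOME")
    if not home:
        return None
    # resolve() follows the symlink created by the install script.
    root = Path(home).resolve()
    if not root.is_dir():
        return None
    # Assumption: the library file name starts with "libclntsh.so".
    matches = sorted(root.glob("libclntsh.so*"))
    return matches[0] if matches else None


if __name__ == "__main__":
    library = find_client_library()
    print(library if library else "Instant Client library not found")
```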
62 changes: 62 additions & 0 deletions original_ruby_scripts/01.5_hla_reduce_by_g.rb
@@ -0,0 +1,62 @@
Dir.chdir "config"
['a','b','c'].each do |letter|
data = []
source_filename = "hla_#{letter}_std.csv"
puts "reading #{source_filename}..."
File.open(source_filename) do |file|
file.each_line do |line|
data << line.strip.split(',')
end
end

data.sort! do |a,b|
tmpa = a[0].split(':')
tmpb = b[0].split(':')
(tmpa[0] != tmpb[0] ? tmpa[0].split('*')[1].to_i <=> tmpb[0].split('*')[1].to_i :
(tmpa[1] != tmpb[1] ? tmpa[1].to_i <=> tmpb[1].to_i :
(tmpa[2] != tmpb[2] ? tmpa[2].to_i <=> tmpb[2].to_i :
tmpa[3].to_i <=> tmpb[3].to_i )))
end

#File.open("hla_std_#{letter}_test.csv", 'w') do |file|
#data.each do |d|
#file.puts d.join(',')
#end
#end

data.each_with_index do |orig, i|
next if(orig == nil)
orig_allele = orig[0]
orig_seq = orig[1 .. 2]
match_count = 0
data.each_with_index do |row, j|
next if(j <= i or row == nil)
if(orig_seq == row[1 .. 2])
match_count += 1
puts "Reducing #{row[0]} into #{orig_allele}"
data[j] = nil
end
end
if(match_count > 0)
gallele = orig_allele.split(':')
gallele = gallele[0 .. 2] if(gallele.size > 3)
gallele = gallele.join(':') + 'G'
puts "Turning #{orig_allele} into #{gallele}"
orig[0] = gallele
end

end

data.delete(nil)

reduced_filename = "hla_#{letter}_std_reduced.csv"
puts "writing #{reduced_filename}"
File.open(reduced_filename, 'w') do |file|
data.each do |d|
file.puts d.join(',')
end
end

File.delete(source_filename)
system('zip', "hla_#{letter}_std_reduced.zip", reduced_filename)
end
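For readers following the Python port, the reduction performed by the script above can be sketched as follows: sort alleles numerically by field, collapse every set of alleles that share identical exon 2 and exon 3 sequences into the first allele of the set, and rename that survivor to its first three fields plus `G`. This sketch groups rows with a dict instead of the script's nested loops, so it is a different technique with the same intended result. The names `allele_sort_key` and `reduce_by_g` are hypothetical and the input rows are toy data.

```python
import re


def _to_int(field: str) -> int:
    """Leading digits as an int, 0 if none (mirrors Ruby's String#to_i)."""
    digits = re.match(r"\d*", field).group()
    return int(digits) if digits else 0


def allele_sort_key(name: str) -> tuple[int, ...]:
    """Numeric key for names like "A*01:01:01:01"; missing fields count as 0."""
    fields = name.split("*", 1)[1].split(":")
    padded = (fields + ["", "", "", ""])[:4]
    return tuple(_to_int(field) for field in padded)


def reduce_by_g(rows):
    """Collapse rows of (allele, exon2, exon3) that share both exon sequences."""
    ordered = sorted(rows, key=lambda row: allele_sort_key(row[0]))
    groups = {}  # (exon2, exon3) -> allele names, in sorted order
    for allele, exon2, exon3 in ordered:
        groups.setdefault((exon2, exon3), []).append(allele)
    reduced = []
    for (exon2, exon3), alleles in groups.items():
        name = alleles[0]
        if len(alleles) > 1:
            # Keep at most three fields and mark the survivor as a G group.
            name = ":".join(name.split(":")[:3]) + "G"
        reduced.append((name, exon2, exon3))
    return reduced


rows = [
    ("A*01:01:01:02", "ACGT", "TTAA"),
    ("A*01:01:01:01", "ACGT", "TTAA"),
    ("A*02:01", "ACGA", "TTAA"),
]
print(reduce_by_g(rows))
# prints [('A*01:01:01G', 'ACGT', 'TTAA'), ('A*02:01', 'ACGA', 'TTAA')]
```

Because Python dicts preserve insertion order, the output keeps the sorted order of each group's first member, which matches the order the Ruby script writes.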
176 changes: 176 additions & 0 deletions original_ruby_scripts/01_hla_fasta_parse.rb
@@ -0,0 +1,176 @@
#!/usr/bin/ruby
#
# This program downloads the version of hla_nuc.fasta named in
# hla_nuc.fasta.version.txt and then processes it, creating three standard
# files for a, b, and c. Any sequence that doesn't seem to align properly
# will be rejected. The alignment algorithm I'm using is the dirtiest possible.
# This program no longer requires the bioruby library.


#require 'bio'
require 'fileutils'
require 'net/http'
require 'uri'
require 'date'
require 'digest'
require 'json'

#Scores the sequence by counting how many characters don't match. A
#score of 0 means a perfect match. As an optimization, we stop at the first
#window scoring under 20 and assume it is the match.
def score(seq, align)
maxscore = align.size
maxseq = -1;

0.upto(seq.size - align.size) do |i|
score = align.size
0.upto(align.size - 1) do |j|
if(align[j] == seq[i + j])
score -= 1
end
end

if(score < maxscore)
maxscore = score
maxseq = i
if(maxscore < 20)
return [maxscore, seq[maxseq, align.size]]
end
end
end
if(maxscore > 30)
# puts maxscore
# puts align
# puts seq[maxseq, align.size]
end
return [maxscore, seq[maxseq, align.size]]
end

#Exon sequences to compare against(used for scoring)
a_exon2_align='GCTCCCACTCCATGAGGTATTTCTTCACATCCGTGTCCCGGCCCGGCCGCGGGGAGCCCCGCTTCATCGCCGTGGGCTACGTGGACGACACGCAGTTCGTGCGGTTCGACAGCGACGCCGCGAGCCAGAAGATGGAGCCGCGGGCGCCGTGGATAGAGCAGGAGGGGCCGGAGTATTGGGACCAGGAGACACGGAATATGAAGGCCCACTCACAGACTGACCGAGCGAACCTGGGGACCCTGCGCGGCTACTACAACCAGAGCGAGGACG'
a_exon3_align='GTTCTCACACCATCCAGATAATGTATGGCTGCGACGTGGGGCCGGACGGGCGCTTCCTCCGCGGGTACCGGCAGGACGCCTACGACGGCAAGGATTACATCGCCCTGAACGAGGACCTGCGCTCTTGGACCGCGGCGGACATGGCAGCTCAGATCACCAAGCGCAAGTGGGAGGCGGTCCATGCGGCGGAGCAGCGGAGAGTCTACCTGGAGGGCCGGTGCGTGGACGGGCTCCGCAGATACCTGGAGAACGGGAAGGAGACGCTGCAGCGCACGG'
b_exon2_align='GCTCCCACTCCATGAGGTATTTCTACACCTCCGTGTCCCGGCCCGGCCGCGGGGAGCCCCGCTTCATCTCAGTGGGCTACGTGGACGACACCCAGTTCGTGAGGTTCGACAGCGACGCCGCGAGTCCGAGAGAGGAGCCGCGGGCGCCGTGGATAGAGCAGGAGGGGCCGGAGTATTGGGACCGGAACACACAGATCTACAAGGCCCAGGCACAGACTGACCGAGAGAGCCTGCGGAACCTGCGCGGCTACTACAACCAGAGCGAGGCCG'
b_exon3_align='GGTCTCACACCCTCCAGAGCATGTACGGCTGCGACGTGGGGCCGGACGGGCGCCTCCTCCGCGGGCATGACCAGTACGCCTACGACGGCAAGGATTACATCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCCGCGGACACGGCGGCTCAGATCACCCAGCGCAAGTGGGAGGCGGCCCGTGAGGCGGAGCAGCGGAGAGCCTACCTGGAGGGCGAGTGCGTGGAGTGGCTCCGCAGATACCTGGAGAACGGGAAGGACAAGCTGGAGCGCGCTG'
c_exon2_align='GCTCCCACTCCATGAAGTATTTCTTCACATCCGTGTCCCGGCCTGGCCGCGGAGAGCCCCGCTTCATCTCAGTGGGCTACGTGGACGACACGCAGTTCGTGCGGTTCGACAGCGACGCCGCGAGTCCGAGAGGGGAGCCGCGGGCGCCGTGGGTGGAGCAGGAGGGGCCGGAGTATTGGGACCGGGAGACACAGAAGTACAAGCGCCAGGCACAGACTGACCGAGTGAGCCTGCGGAACCTGCGCGGCTACTACAACCAGAGCGAGGCCG'
c_exon3_align='GGTCTCACACCCTCCAGTGGATGTGTGGCTGCGACCTGGGGCCCGACGGGCGCCTCCTCCGCGGGTATGACCAGTACGCCTACGACGGCAAGGATTACATCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCCGCGGACACCGCGGCTCAGATCACCCAGCGCAAGTGGGAGGCGGCCCGTGAGGCGGAGCAGCGGAGAGCCTACCTGGAGGGCACGTGCGTGGAGTGGCTCCGCAGATACCTGGAGAACGGGAAGGAGACGCTGCAGCGCGCGG'

Dir.chdir "config"
filename = 'hla_nuc.fasta'
timestamp_path = filename + '.mtime'

repo_path = "https://raw.githubusercontent.com/ANHIG/IMGTHLA"

# Find latest release at https://github.com/ANHIG/IMGTHLA/releases
hla_nuc_version = File.read('hla_nuc.fasta.version.txt').strip
puts "attempting to download version #{hla_nuc_version} from #{repo_path}"

uri = URI.parse("#{repo_path}/#{hla_nuc_version}/hla_nuc.fasta")
md5 = Digest::MD5.new
# Stream the body to disk in a single request, hashing each segment as it arrives.
Net::HTTP.get_response(uri) do |response|
response.value # Raise error if not 200 response code.
File.open(filename, 'w') do |file|
response.read_body do |segment|
file.write(segment)
md5 << segment
end
end
end
checksum_report = md5.hexdigest + ' ' + filename + "\n"
File.write('hla_nuc.fasta.checksum.txt', checksum_report)
puts "Parsing " + filename;

hla_a = []
hla_b = []
hla_c = []

diff_reject = 32


#file = Bio::FastaFormat.open(filename)
fasta = []
enu=[]
File.open(filename) do |file|
file.each_line do |line|
if(line =~ /^>/)
fasta.push(enu)
enu = [line.strip, '']
else
enu[1] += line.strip
end
end
fasta.push(enu)
end

fasta.delete_if{|e| e== []}

bar_width = 50
fasta.each_with_index do |entry, index| #for each fasta sequence
#title = entry.definition[entry.definition.index(' ') + 1 .. entry.definition.size]
title = entry[0].split(' ')[1]
type = title[0, 1]
data = entry[1]
progress_bar = ('#' * (index.to_f/fasta.length*bar_width) + '.' * bar_width)
progress_bar = progress_bar[0...bar_width]

data.tr!("\n\t\r ", '') #get rid of whitespace
message = ''

if(type == 'A')
exon2 = score(data, a_exon2_align)
exon3 = score(data, a_exon3_align)
if(exon2[0] <= diff_reject and exon3[0] <= diff_reject)
message = "Approving " + title + ": " + exon2[0].to_s + " " + exon3[0].to_s
hla_a.push( [title, exon2[1], exon3[1]] )
else
message = "***Rejecting " + title + ": " + exon2[0].to_s + " " + exon3[0].to_s
end
elsif(type == 'B')
exon2 = score(data, b_exon2_align)
exon3 = score(data, b_exon3_align)
if(exon2[0] <= diff_reject and exon3[0] <= diff_reject)
message = "Approving " + title + ": " + exon2[0].to_s + " " + exon3[0].to_s
hla_b.push( [title, exon2[1], exon3[1]] )
else
message = "***Rejecting " + title + ": " + exon2[0].to_s + " " + exon3[0].to_s
end
elsif(type == 'C')
exon2 = score(data, c_exon2_align)
exon3 = score(data, c_exon3_align)
if(exon2[0] <= diff_reject and exon3[0] <= diff_reject)
message = "Approving " + title + ": " + exon2[0].to_s + " " + exon3[0].to_s
hla_c.push( [title, exon2[1], exon3[1]] )
else
message = "***Rejecting " + title + ": " + exon2[0].to_s + " " + exon3[0].to_s
end
end
print "\r" + progress_bar + ' ' + message + ". "
end
puts "\r" + '#' * bar_width + ' Completed.' + ' ' * 18

#file.close

#Let's sort, just to make things easier for our eyes
hla_a.sort!
hla_b.sort!
hla_c.sort!

File.open('hla_a_std.csv', 'w') do |file|
hla_a.each do |entry|
file.puts entry[0] + "," + entry[1] + "," + entry[2]
end
end

File.open('hla_b_std.csv', 'w') do |file|
hla_b.each do |entry|
file.puts entry[0] + "," + entry[1] + "," + entry[2]
end
end

File.open('hla_c_std.csv', 'w') do |file|
hla_c.each do |entry|
file.puts entry[0] + "," + entry[1] + "," + entry[2]
end
end

File.delete(filename)
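The `score` function and the approval check in the script above amount to an ungapped sliding-window comparison: place the reference exon at every offset in the sequence, count mismatches, keep the best placement, and accept the allele only if both exons score at or below `diff_reject`. A hedged Python sketch of that idea follows. `best_window` and `exon_windows` are hypothetical names, and returning an empty window when no placement matches at all is a choice made here, not the Ruby behavior.

```python
DIFF_REJECT = 32  # mirrors diff_reject in the Ruby script


def best_window(seq: str, ref: str, early_exit: int = 20) -> tuple[int, str]:
    """Return (mismatch count, window) for the best ungapped placement of ref in seq."""
    best_score, best_start = len(ref), None
    for start in range(len(seq) - len(ref) + 1):
        window = seq[start:start + len(ref)]
        mismatches = sum(a != b for a, b in zip(ref, window))
        if mismatches < best_score:
            best_score, best_start = mismatches, start
            if best_score < early_exit:
                break  # "good enough" shortcut, as in the Ruby version
    if best_start is None:
        # Assumption: report an empty window when no placement matched at all.
        return best_score, ""
    return best_score, seq[best_start:best_start + len(ref)]


def exon_windows(seq: str, exon2_ref: str, exon3_ref: str):
    """Return the (exon2, exon3) windows, or None if either exceeds DIFF_REJECT."""
    score2, window2 = best_window(seq, exon2_ref)
    score3, window3 = best_window(seq, exon3_ref)
    if score2 <= DIFF_REJECT and score3 <= DIFF_REJECT:
        return window2, window3
    return None


# Toy data; early_exit=1 means the scan only stops early on a perfect match.
print(best_window("TTTACGTACGTGGG", "ACGTACGA", early_exit=1))
# prints (1, 'ACGTACGT')
```

The default `early_exit` of 20 only makes sense for references of roughly the length used in the script (about 270 bases). With a toy reference shorter than 20 characters, the first window that matches even one character ends the scan, which is why the example overrides it.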