Vaccitech/HLAfreq

 
 


HLAfreq

HLAfreq allows you to download and combine HLA allele frequencies from multiple datasets, e.g. combine data from several studies within a country or combine countries. Useful for studying regional diversity in immune genes and, when paired with epitope prediction, estimating a population's ability to mount an immune response to specific epitopes.

Automated download of allele frequency data from allelefrequencies.net.

Details

Estimates are combined by modelling allele frequency as a Dirichlet distribution, which defines the probability of drawing each allele. When combining studies, each estimate is weighted by 2x its sample size by default, because each person in the study contributes two alleles. Alternative weightings can be used, for example population size when averaging across countries.
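The weighting described above can be illustrated with a minimal sketch in plain Python. This is not HLAfreq's implementation: the function name `combine_weighted`, the prior pseudo-count of 1 per allele, and the toy study values are all assumptions made for illustration. Each study's frequencies are multiplied by its weight (2x sample size by default) to give pseudo-counts, which are added to the prior to form Dirichlet concentration parameters; the combined frequency is the mean of that Dirichlet.

```python
def combine_weighted(studies, weights=None, prior=1.0):
    """Combine per-study allele frequencies into Dirichlet mean estimates.

    studies: list of (frequencies, sample_size) pairs, where frequencies
             maps allele name -> frequency within that study.
    weights: optional list with one weight per study. Defaults to
             2 * sample_size (two alleles per person).
    prior:   assumed pseudo-count added to every allele (illustrative only).
    """
    if weights is None:
        weights = [2 * n for _, n in studies]
    alleles = sorted({a for freqs, _ in studies for a in freqs})
    # Dirichlet concentration = prior + weighted allele pseudo-counts
    concentration = {a: prior for a in alleles}
    for (freqs, _), w in zip(studies, weights):
        for allele, f in freqs.items():
            concentration[allele] += w * f
    total = sum(concentration.values())
    # Mean of a Dirichlet is each concentration divided by their sum
    return {a: c / total for a, c in concentration.items()}


# Two hypothetical studies: the larger one pulls the estimate towards itself
studies = [
    ({"A*01:01": 0.6, "A*02:01": 0.4}, 50),
    ({"A*01:01": 0.2, "A*02:01": 0.8}, 150),
]
combined = combine_weighted(studies)
```

Passing `weights` explicitly, for example population sizes, swaps in an alternative weighting without changing the rest of the calculation.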

When selecting a panel of HLA alleles to represent a population, allele frequency is not the only thing to consider. Depending on the purpose of the panel, you should include a range of loci and supertypes (groups of alleles sharing binding specificities).

Install

HLAfreq is a Python package available on Windows, macOS, and Linux. We recommend installing with conda.

conda create -n hlafreq -c bioconda -c conda-forge hlafreq
conda activate hlafreq

If you're new to conda, see the miniconda installation guide and documentation to get started with conda. Enter the above commands into your conda prompt to create and activate a conda environment with HLAfreq installed. Typing python into this activated environment will start a Python session where you can enter Python code such as the HLAfreq minimal example below.

If you prefer to write your Python code as scripts using an IDE such as PyCharm or VS Code, you'll need to look up how to configure a conda virtual environment with those tools.

Troubleshooting

HLAfreq uses pymc to estimate credible intervals, which is the source of most installation difficulty; see the pymc installation guide.

At the time of writing, pymc doesn't work well with Python 3.11, so you can try installing a specific Python version and then adding HLAfreq with pip or conda. For example:

conda create -n hlafreq -c conda-forge -c bioconda python=3.10 numpy=1.25.2 pymc=5.6.1 hlafreq

HLAfreq requires python>=3.8, matplotlib>=3.5, and pymc>=3. Conda should handle this automatically, but if you get errors check the package versions with conda list.
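As an alternative to reading through conda list, the installed versions can also be checked from inside Python. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and it assumes the packages are registered under the distribution names shown (older 3.x releases of pymc were distributed as pymc3, so that name may need checking instead).

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the Python version and the packages named in the requirements above
print("python:", ".".join(str(part) for part in sys.version_info[:3]))
report = installed_versions(["HLAfreq", "matplotlib", "pymc"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Run this inside the activated hlafreq environment so that it reports the packages that environment actually uses.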

If you do run into trouble please open an issue.

If you don't intend to use credible intervals you can install with pip: pip install HLAfreq. However, if you do import HLAfreq_pymc you may get warnings about degraded performance.

See the pip documentation to get started with pip. If you do have issues with pip, try installing with conda as described above.

Minimal example

Download HLA data using makeURL() and getAFdata(). All arguments that can be specified in the webpage form are available, see help(HLAfreq.makeURL) for details (press q to exit).

import HLAfreq
base_url = HLAfreq.makeURL("Uganda", locus="A")
aftab = HLAfreq.getAFdata(base_url)

After downloading the data, it must be filtered so that all studies sum to allele frequency 1 (within tolerance). Then we must ensure that all studies report alleles at the same resolution. Finally, we can combine frequency estimates.

aftab = HLAfreq.only_complete(aftab)
aftab = HLAfreq.decrease_resolution(aftab, 2)
caf = HLAfreq.combineAF(aftab)
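To make the two preparation steps concrete, here is a conceptual sketch in plain Python that runs without downloading anything. It does not reproduce HLAfreq's functions: the helper names `is_complete` and `reduce_resolution`, the tolerance of 0.01, the reading of resolution 2 as two colon-separated fields, and the toy study are all assumptions for illustration.

```python
def is_complete(freqs, tol=0.01):
    """True if a study's allele frequencies sum to 1 within tol (assumed value)."""
    return abs(sum(freqs.values()) - 1.0) <= tol


def reduce_resolution(freqs, fields=2):
    """Truncate allele names to `fields` colon-separated fields.

    Alleles that share a name after truncation have their frequencies summed.
    """
    reduced = {}
    for allele, f in freqs.items():
        short = ":".join(allele.split(":")[:fields])
        reduced[short] = reduced.get(short, 0.0) + f
    return reduced


# Hypothetical study reporting a mix of three-field and two-field alleles
study = {"A*01:01:01": 0.30, "A*01:01:02": 0.10, "A*02:01": 0.60}
complete = is_complete(study)
two_field = reduce_resolution(study, fields=2)
```

Here the two A*01:01 subtypes collapse into a single A*01:01 entry, so the study becomes directly comparable with studies that only report two-field alleles.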

Detailed examples

For more detailed walkthroughs see HLAfreq/examples.

Docs

For help on specific functions, view the docstring with help(function_name). Full API documentation is at HLAfreq/docs, created with pdoc3 in pdf mode.

Citation

Wells, D. A., & McAuley, M. (2023). HLAfreq: Download and combine HLA allele frequency data. bioRxiv, 2023-09. https://doi.org/10.1101/2023.09.15.557761

About

Aggregate HLA allele frequencies data from allelefrequencies.net at large multi population scale
