cfe-lab/codeclub1

CodeClub1 Program1

This program extracts the vif gene sequences from a set of HIV sequences by aligning them to the HXB2 reference genome.

What it does

  • Reads the HXB2 reference sequence from data/hxb2.fasta
  • Reads multiple query sequences from data/sequences.fasta
  • For each query sequence:
    • Performs a global alignment against HXB2
    • Maps the vif gene coordinates (positions 5243-5619 in HXB2) to the query sequence
    • Extracts and prints the corresponding vif sequence in FASTA format
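The coordinate-mapping step can be illustrated with a minimal sketch. It assumes the alignment is available as two gapped strings of equal length and that coordinates are 1-based and inclusive. The function `map_region` is a hypothetical helper written for illustration; it is not the aligntools API, and the real program may treat insertions and edge cases differently.

```python
from typing import Optional


def map_region(aligned_ref: str, aligned_query: str,
               ref_start: int, ref_end: int) -> Optional[str]:
    """Return the query bases aligned to reference positions ref_start..ref_end.

    Positions are 1-based and inclusive (an assumption, not confirmed by the
    README). Returns None when no query base aligns to the region.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")

    ref_pos = 0  # number of reference bases consumed so far
    extracted = []
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_pos += 1
            inside = ref_start <= ref_pos <= ref_end
        else:
            # Insertion relative to the reference: keep it only when it
            # falls strictly between two positions of the region.
            inside = ref_start <= ref_pos < ref_end
        if inside and query_char != "-":
            extracted.append(query_char)

    return "".join(extracted) or None


# Toy example (not real HXB2 coordinates): reference positions 3..6.
print(map_region("ACGT-ACGT", "AC-TTAC-T", 3, 6))
```

Returning `None` when nothing maps gives the caller a simple way to skip a sequence whose region is entirely deleted in the query.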

How to run

  1. Ensure you have uv installed.

  2. Navigate to the project root directory.

  3. Run the program:

    uv run program1

The program prints each extracted vif sequence to stdout in FASTA format.

Output format

For each input sequence whose vif region is mapped successfully, the program outputs:

  • A FASTA header: >{sequence_id}_vif
  • The extracted vif sequence
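As a small sketch of this format, the record could be assembled as follows. The function name `format_vif_record` is hypothetical, and the sketch assumes the sequence is written on a single line; whether the actual program wraps long sequences is not stated here.

```python
def format_vif_record(sequence_id: str, vif_sequence: str) -> str:
    """Build one FASTA record: a '>{sequence_id}_vif' header, then the sequence."""
    # Assumption: the sequence is emitted unwrapped on one line.
    return f">{sequence_id}_vif\n{vif_sequence}"


# Example with a made-up ID and a made-up short sequence.
print(format_vif_record("sample1", "ATGGAAAACAGA"))
```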

Notes

  • The program performs global pairwise alignment with Biopython's PairwiseAligner.
  • Coordinate mapping is handled by the aligntools library.
  • The program skips sequences where the vif region cannot be successfully mapped.
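To show what global alignment means in practice, here is a minimal pure-Python Needleman-Wunsch sketch. It is a stand-in for illustration only: it is not Biopython's PairwiseAligner, and the scoring values (match 1, mismatch -1, linear gap -2) are assumptions, not the program's settings. It aligns both sequences end to end and returns them as gapped strings of equal length.

```python
from typing import Tuple


def global_align(ref: str, query: str, match: int = 1,
                 mismatch: int = -1, gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    rows, cols = len(ref) + 1, len(query) + 1

    # Fill the score matrix; the first row and column are all-gap prefixes.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if ref[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal path.
    aligned_ref, aligned_query = [], []
    i, j = len(ref), len(query)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if ref[i - 1] == query[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                aligned_ref.append(ref[i - 1])
                aligned_query.append(query[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append(ref[i - 1])  # query lacks this reference base
            aligned_query.append("-")
            i -= 1
        else:
            aligned_ref.append("-")  # query has an extra base here
            aligned_query.append(query[j - 1])
            j -= 1

    return "".join(reversed(aligned_ref)), "".join(reversed(aligned_query))


gapped_ref, gapped_query = global_align("ACGTACGT", "ACGACGT")
print(gapped_ref)
print(gapped_query)
```

The matrix grows with the product of the two sequence lengths, so for genome-length sequences an optimized library implementation is the practical choice.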

About

Repository for CFE's first code club session
