MOL3022 - SVE Protein Alignment Tool

The SVE (Solveig, Vebjørn and Emma) Protein Alignment Tool finds the best protein alignments for a query sequence supplied by the user. The project was made for the course MOL3022 - Bioinformatics - Method Oriented Project and consists of a Python script with a Streamlit-based front-end.

How to run the project

Download protein sequences in FASTA (canonical) format. We recommend downloading them from the UniProt Knowledgebase (UniProtKB) by clicking the "Download" button. You can either download all sequences or select a subset yourself. If you download a compressed file, make sure to decompress (unzip) it afterwards.

Create a folder named uniprot_fasta in this repository and put your FASTA file in it. Make sure the file is named uniprot_sprot.fasta, which is the default name when downloading from UniProtKB.
Open a terminal in the MOL3022 folder and run pip install -r requirements.txt.
Run the project with streamlit run main.py in the same terminal.
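The tool has to read the FASTA file placed in uniprot_fasta/ before it can align anything. The sketch below shows one way to do that with only the standard library. It is an illustration, not the code in main.py: `parse_fasta`, `load_uniprot` and `FASTA_PATH` are hypothetical names, and only the folder and file name come from the steps above.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# Location described in the setup steps; the constant name is hypothetical.
FASTA_PATH = Path("uniprot_fasta") / "uniprot_sprot.fasta"


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    A sequence may be wrapped over several lines, so residues are
    accumulated until the next '>' header line starts a new record.
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


def load_uniprot(path: Path = FASTA_PATH) -> List[Tuple[str, str]]:
    """Read every record from the FASTA file in uniprot_fasta/."""
    if not path.exists():
        raise FileNotFoundError(f"Expected a FASTA file at {path}")
    with path.open(encoding="utf-8") as handle:
        return list(parse_fasta(handle))
```

Because `parse_fasta` is a generator, records can also be streamed one at a time instead of loading the whole Swiss-Prot file into memory.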

Expected results

When you run the project with Streamlit, your browser should open and connect to localhost:8501 (or a different port if 8501 is already in use). After loading, you will be met by the following page:

Here you can choose how many sequences to align your query against. This is useful when you have many sequences (for example, if you downloaded all sequences from UniProtKB), since aligning against all of them can take a long time and may not be necessary when just testing the software. The software also returns only one best alignment per sequence. A single pair of sequences can have a very large number of equally optimal alignments, and in the worst case that number grows exponentially with sequence length, so listing all of them quickly becomes impractical. By showing one alignment per sequence, the tool lets users pick out which sequences they would like to study further.
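The two limits described above (a cap on the number of sequences and one result per sequence) can be sketched as follows. This is not taken from main.py: the function name `best_per_sequence` and the `best_score` callback are hypothetical, and `best_score` stands in for whatever pairwise alignment routine the tool actually uses.

```python
from itertools import islice
from typing import Callable, Iterable, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def best_per_sequence(
    query: str,
    records: Iterable[Record],
    max_sequences: int,
    best_score: Callable[[str, str], int],
) -> List[Tuple[str, int]]:
    """Align the query against at most `max_sequences` records.

    `best_score` is expected to return the score of the single best
    alignment for a pair, so each database sequence contributes exactly
    one result instead of every co-optimal alignment.
    """
    # islice stops after max_sequences records without reading the rest,
    # which matters when `records` lazily streams a large FASTA file.
    return [
        (header, best_score(query, target))
        for header, target in islice(records, max_sequences)
    ]
```

Any scoring function with the `(query, target) -> int` shape can be plugged in, which keeps the sequence limit independent of the alignment algorithm.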

After selecting the number of sequences to align against, you can either enter your own query sequence or keep the default one that is already filled in, and then click the Find closest alignment button.

The software then aligns your query sequence against each selected sequence, and the progress bar shows how many sequences have been aligned out of the total.
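A progress bar like this only needs the fraction of sequences completed after each alignment. The sketch below reports that fraction through a callback so the loop can be tested without Streamlit; `align_with_progress`, `score_fn` and `on_progress` are hypothetical names, not functions from main.py.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def align_with_progress(
    query: str,
    records: List[Record],
    score_fn: Callable[[str, str], int],
    on_progress: Callable[[float, str], None],
) -> List[Tuple[str, int]]:
    """Align the query against each record, reporting progress after each one."""
    total = len(records)
    results = []
    for done, (header, target) in enumerate(records, start=1):
        results.append((header, score_fn(query, target)))
        # Fraction completed, in the 0.0 to 1.0 range that st.progress accepts.
        on_progress(done / total, f"Aligned {done} of {total} sequences")
    return results
```

In a Streamlit page, `bar = st.progress(0.0)` creates the bar and `on_progress=lambda fraction, text: bar.progress(fraction, text=text)` updates it. The `text` argument is only available in newer Streamlit releases; older ones accept just the value.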

Once the best alignments (one per unique sequence) have been found, they are presented as a list.
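This README does not say how the list is ordered. Assuming results are shown with the highest-scoring sequence first, a minimal sketch of ranking and formatting them could look like this; `rank_results` and `format_results` are hypothetical helpers.

```python
from typing import List, Tuple

Result = Tuple[str, int]  # (sequence header, alignment score)


def rank_results(results: List[Result]) -> List[Result]:
    """Return the results ordered from highest to lowest score."""
    # Ties are broken alphabetically by header so the order is deterministic.
    return sorted(results, key=lambda item: (-item[1], item[0]))


def format_results(results: List[Result]) -> List[str]:
    """Turn ranked results into numbered lines for display."""
    return [
        f"{rank}. {header} (score {score})"
        for rank, (header, score) in enumerate(rank_results(results), start=1)
    ]
```

In Streamlit, each formatted line could then be displayed with `st.write`.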

Scoring

Our tool uses BLOSUM62 as its scoring matrix. BLOSUM62 is widely regarded as a good matrix for detecting most weak protein similarities.
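To show how a substitution matrix drives the alignment score, here is a sketch of local (Smith-Waterman) alignment scoring with BLOSUM62. The README does not state whether main.py uses local or global alignment, which gap penalties it applies, or how it loads the matrix, so the linear gap penalty of -4 and all function names here are hypothetical. The matrix below is the standard BLOSUM62 table for the 20 standard amino acids, transcribed by hand; for real use, loading it from a library (for example Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`) is safer.

```python
from typing import Dict, Tuple

# Standard BLOSUM62 for the 20 standard amino acids (no B, Z, X, U or '*').
_BLOSUM62_TEXT = """
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""


def load_matrix(text: str) -> Dict[Tuple[str, str], int]:
    """Parse a whitespace-separated matrix into a {(row, col): score} dict."""
    rows = [line.split() for line in text.strip().splitlines()]
    columns = rows[0]
    matrix = {}
    for row in rows[1:]:
        residue, values = row[0], row[1:]
        for col, value in zip(columns, values):
            matrix[(residue, col)] = int(value)
    return matrix


BLOSUM62 = load_matrix(_BLOSUM62_TEXT)
GAP_PENALTY = -4  # hypothetical linear gap penalty, not taken from main.py


def smith_waterman_score(a: str, b: str, matrix=BLOSUM62, gap=GAP_PENALTY) -> int:
    """Return the best local alignment score of sequences a and b.

    Residues outside the 20 standard amino acids raise KeyError here.
    """
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            curr[j] = max(
                0,  # local alignment: a new alignment may start here
                prev[j - 1] + matrix[(a[i - 1], b[j - 1])],  # substitution
                prev[j] + gap,  # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, aligning "WCW" with "WW" scores 18: two W/W matches (11 each) plus one gap (-4) beats any ungapped local alignment.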
