Project in the course MOL3022 - Bioinformatics - Method Oriented Project

EmmaPeders1/MOL3022

MOL3022 - SVE Protein Alignment Tool

The SVE (Solveig, Vebjørn and Emma) Protein Alignment Tool is a tool that finds the best protein alignments for a user-given query sequence. The project is made for the course MOL3022 - Bioinformatics - Method Oriented Project and consists of a Python script with a Streamlit-based front-end.

How to run the project

  1. Download protein sequences in FASTA (canonical) format. We recommend downloading them from UniProtKB (the UniProt Knowledgebase) by clicking the "Download" button. You can download all entries or select a subset yourself. If you download the file compressed, make sure you decompress it afterwards.

  2. Create a folder named uniprot_fasta in this repository and put your FASTA file in it. Make sure the file is named uniprot_sprot.fasta, which is the default name when downloading from UniProtKB.

  3. Open a terminal in the MOL3022 folder and run pip install -r requirements.txt.

  4. Start the project by running streamlit run main.py in the terminal.
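To check that the FASTA file from the steps above is in place and readable before starting the app, a small reader like the one below can help. This is a minimal sketch using only the standard library; the function name read_fasta is hypothetical and is not taken from main.py, which may load the file differently.

```python
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Path taken from the setup steps above.
FASTA_PATH = Path("uniprot_fasta") / "uniprot_sprot.fasta"


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for illustration, not the project's own loader.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if FASTA_PATH.exists():
    with FASTA_PATH.open() as handle:
        count = sum(1 for _ in read_fasta(handle))
    print(f"Found {count} sequences in {FASTA_PATH}")
else:
    print(f"{FASTA_PATH} not found; check the folder and file name")
```

The reader works on any iterable of lines, so it streams the file record by record instead of loading it all at once.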

Expected results

When you run the project with Streamlit, your browser should open and connect to localhost:8501 (or a different port if 8501 is in use). After loading, you will see the tool's start page.

Here you can choose how many sequences to align your query against. This is useful when you have many sequences (for example, if you downloaded all sequences from UniProtKB), since aligning against all of them can take a long time and may be unnecessary when you are just testing the software. The software also reports only one best alignment per sequence. A single sequence can produce a very large number of equally good alignments, and the total grows further with every sequence you compare against. With one alignment per sequence, you can instead use the results to decide which sequences to study further.
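The idea of keeping a single best alignment per sequence can be sketched as follows. The README does not state which alignment algorithm or gap penalties the tool uses, so this example assumes Smith-Waterman local alignment with simple match/mismatch/gap scores (all three values are placeholders, not the project's settings). The function name best_local_alignment is hypothetical.

```python
from typing import Tuple


def best_local_alignment(query: str, target: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> Tuple[int, str, str]:
    """Return (score, aligned_query, aligned_target) for ONE best local alignment.

    Assumed scoring scheme for illustration only.
    """
    rows, cols = len(query) + 1, len(target) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            # Strict '>' keeps only the first maximal cell, so co-optimal
            # alignments ending elsewhere are deliberately ignored.
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back a single path from the best cell until the score hits zero.
    i, j = best_pos
    a, b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if query[i - 1] == target[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            a.append(query[i - 1]); b.append(target[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            a.append(query[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(target[j - 1]); j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))


print(best_local_alignment("ACDEF", "XXCDEYY"))  # (6, 'CDE', 'CDE')
```

Following a single traceback path avoids enumerating every co-optimal alignment, which is where the large number of candidates comes from.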

After selecting the number of sequences to compare against, you can enter a query sequence or keep the default one that is already filled in, and then click the "Find closest alignment" button.

The software then aligns your query sequence against each of the selected sequences. The progress bar shows how many sequences have been processed out of the total.

After the best alignments (one per unique sequence) have been found, they are presented as a list.
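The overall flow described above (limit the number of sequences, score each one once, then list the results from best to worst) might look roughly like this. It is a sketch under stated assumptions: rank_sequences and identity_score are hypothetical names, and identity_score is a toy stand-in for the real alignment score so that the example stays short.

```python
from itertools import islice
from typing import Callable, Iterable, List, Tuple


def identity_score(query: str, target: str) -> int:
    """Toy stand-in score: count of identical residues at the same position."""
    return sum(1 for q, t in zip(query, target) if q == t)


def rank_sequences(query: str,
                   records: Iterable[Tuple[str, str]],
                   max_sequences: int,
                   score_fn: Callable[[str, str], int] = identity_score
                   ) -> List[Tuple[int, str]]:
    """Score at most max_sequences records and return (score, name) pairs,
    highest score first, with exactly one entry per sequence."""
    results = []
    for name, seq in islice(records, max_sequences):
        results.append((score_fn(query, seq), name))
    # Python's sort is stable, so ties keep their original order.
    results.sort(key=lambda r: r[0], reverse=True)
    return results


records = [("p1", "MKTA"), ("p2", "MKXX"), ("p3", "AAAA"), ("p4", "MKTA")]
print(rank_sequences("MKTA", records, max_sequences=3))
# [(4, 'p1'), (2, 'p2'), (1, 'p3')]
```

Because the limit is applied with islice before any scoring happens, sequences beyond the chosen number (p4 here) are never aligned at all.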

Scoring

Our tool uses BLOSUM62 as its scoring matrix. BLOSUM62 is widely regarded as a good scoring matrix for detecting most weak protein similarities.
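To show how a substitution matrix turns residue pairs into a score, the sketch below scores two equal-length peptides without gaps. It hard-codes only a handful of BLOSUM62 entries to stay self-contained; the full matrix covers all amino acid pairs and would normally be loaded from a library (Biopython, for instance, ships it). The names BLOSUM62_SUBSET, pair_score and ungapped_score are hypothetical and not taken from the project.

```python
# A few BLOSUM62 entries only (illustration); the real matrix is much larger.
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("I", "I"): 4, ("L", "L"): 4,
    ("K", "K"): 5, ("R", "R"): 5, ("W", "W"): 11,
    ("I", "L"): 2, ("I", "V"): 3, ("K", "R"): 2,
    ("D", "E"): 2, ("A", "W"): -3,
}


def pair_score(a: str, b: str) -> int:
    """Look up a residue pair; the matrix is symmetric, so try both orders.

    Raises KeyError for pairs outside this small subset.
    """
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def ungapped_score(s1: str, s2: str) -> int:
    """Sum the substitution scores position by position (no gaps)."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(s1, s2))


print(ungapped_score("AIK", "ALR"))  # 4 + 2 + 2 = 8
print(ungapped_score("AIK", "WIK"))  # -3 + 4 + 5 = 6
```

Chemically similar residues such as I/L or K/R receive positive scores, while dissimilar pairs such as A/W are penalized, which is what lets the matrix pick up distant similarities.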
