drugilsberg/uniprot_fasta_parser

License: MIT

uniprot_fasta_parser

UniProt FASTA parser written in pure Python.

Development setup

Create a venv:

python -m venv venv

Activate it:

source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Install the package in editable mode:

pip install -e .

Install a Jupyter playground:

pip install jupyter
ipython kernel install --user --name=uniprot_fasta_parser

Tutorial on converting FASTA sequences into CSV format

Get the latest FASTA from UniProt SwissProt:

wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
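
For orientation, each record in that file is a header line starting with `>` followed by one or more sequence lines, and UniProt headers begin with `db|accession|entry_name` followed by a free-text description. The sketch below shows one way to read such records in pure Python. It is an illustration only and is not taken from upfp's code: the function names `iter_fasta` and `make_record`, the dictionary keys, and the demo accession `X00000` are all made up for this example.

```python
import io
from typing import Dict, Iterator, List, TextIO


def make_record(header: str, chunks: List[str]) -> Dict[str, str]:
    """Split a header into its parts and join the sequence lines."""
    # UniProt headers look like: db|accession|entry_name description
    identifier, _, description = header.partition(" ")
    parts = identifier.split("|")
    if len(parts) == 3:
        database, accession, entry_name = parts
    else:
        # Not a UniProt-style identifier: keep it whole as the accession.
        database, accession, entry_name = "", identifier, ""
    return {
        "database": database,
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
        "sequence": "".join(chunks),
    }


def iter_fasta(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per FASTA record read from a text handle."""
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield make_record(header, chunks)


# Made-up record, used only to exercise the functions above.
demo = io.StringIO(
    ">sp|X00000|DEMO_HUMAN Demo protein OS=Homo sapiens OX=9606 PE=1 SV=1\n"
    "MKTAYIAK\n"
    "QRQISFVK\n"
)
records = list(iter_fasta(demo))
print(records[0]["accession"], records[0]["sequence"])
```

To read the downloaded file directly, a handle opened with `gzip.open("uniprot_sprot.fasta.gz", "rt")` from the standard library can be passed to `iter_fasta` in place of the `io.StringIO` object.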

The upfp-fasta-to-csv script (installed with upfp) performs the conversion:

upfp-fasta-to-csv -h
usage: upfp-fasta-to-csv [-h] [-g] [-c CHUNK_SIZE] fasta_filepath csv_filepath

positional arguments:
  fasta_filepath        path to the FASTA file.
  csv_filepath          path where to store the CSV file.

optional arguments:
  -h, --help            show this help message and exit
  -g, --gzipped         flag to indicate whether the FASTA is gzipped.
                        Defaults to False.
  -c CHUNK_SIZE, --chunk_size CHUNK_SIZE
                        size of the chunks used when writing the CSV file.
                        Defaults to 10000.

Pass the downloaded gzipped FASTA file as input, with the -g flag, to convert it to CSV:

upfp-fasta-to-csv uniprot_sprot.fasta.gz /path/to/file.csv -g
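
The `--chunk_size` option suggests that records are written to the CSV in batches rather than all at once, which keeps memory use bounded for a file the size of SwissProt. The following is a minimal sketch of that general pattern using only the standard library. It is an assumption about the approach, not upfp's actual implementation, and the names `chunked` and `write_csv_in_chunks` as well as the column names are hypothetical.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TextIO


def chunked(
    records: Iterable[Dict[str, str]], chunk_size: int
) -> Iterator[List[Dict[str, str]]]:
    """Group an iterable of records into lists of at most chunk_size items."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def write_csv_in_chunks(
    records: Iterable[Dict[str, str]],
    handle: TextIO,
    fieldnames: List[str],
    chunk_size: int = 10000,
) -> int:
    """Write records to an open text handle, one chunk at a time."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    written = 0
    for chunk in chunked(records, chunk_size):
        # Only one chunk of records is held in memory at any time.
        writer.writerows(chunk)
        written += len(chunk)
    return written


# Made-up records and hypothetical column names, for illustration only.
demo_records = [
    {"accession": "X00000", "sequence": "MKT"},
    {"accession": "X00001", "sequence": "AYI"},
    {"accession": "X00002", "sequence": "QRQ"},
]
buffer = io.StringIO()
count = write_csv_in_chunks(
    demo_records, buffer, ["accession", "sequence"], chunk_size=2
)
print(count)
```

Because `write_csv_in_chunks` takes an open handle, the same function works with a regular file opened via `open(path, "w", newline="")`.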

Convert CSV back to FASTA

To recreate a FASTA file from a CSV produced by upfp, use the upfp-csv-to-fasta script:

upfp-csv-to-fasta -h
usage: upfp-csv-to-fasta [-h] [-g] [-c CHUNK_SIZE] csv_filepath fasta_filepath

positional arguments:
  csv_filepath          path to the CSV file or SMI file.
  fasta_filepath        path where to store the FASTA file

optional arguments:
  -h, --help            show this help message and exit
  -g, --gzipped         flag to indicate whether the FASTA should be gzipped.
                        Defaults to False.
  -c CHUNK_SIZE, --chunk_size CHUNK_SIZE
                        size of the chunks used when writing the FASTA file.
                        Defaults to 10000.
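
The reverse direction can be pictured as reading each CSV row and emitting a `>` header line followed by the sequence wrapped to a fixed width (60 characters per line here, matching the layout of UniProt's FASTA files). This is a rough sketch under stated assumptions and not upfp's own code: the column names `header` and `sequence`, the function names, and the demo row are hypothetical, since the columns of the CSV written by upfp are not listed above.

```python
import csv
import io
from typing import TextIO


def format_fasta_record(header: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record with the sequence wrapped to `width` columns."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def csv_to_fasta(
    csv_handle: TextIO,
    fasta_handle: TextIO,
    header_column: str = "header",      # hypothetical column name
    sequence_column: str = "sequence",  # hypothetical column name
) -> int:
    """Write one FASTA record per CSV row and return the number written."""
    count = 0
    for row in csv.DictReader(csv_handle):
        fasta_handle.write(
            format_fasta_record(row[header_column], row[sequence_column])
        )
        count += 1
    return count


# Made-up CSV content, for illustration only.
csv_text = "header,sequence\nsp|X00000|DEMO_HUMAN Demo protein,MKTAYIAK\n"
fasta_out = io.StringIO()
n_records = csv_to_fasta(io.StringIO(csv_text), fasta_out)
print(fasta_out.getvalue())
```

For gzipped output, a text handle from `gzip.open(path, "wt")` can be passed as `fasta_handle`.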