
BLAST SEG masking

Benjamin J. Buchfink edited this page Jun 26, 2026 · 8 revisions

It is a little-known fact that NCBI BLAST in blastp mode applies SEG low-complexity masking to the target sequences by default. I asked GPT 5.5, Opus 4.8, Gemini and Groq whether this was the case, and they all told me no!

Let us look at an example:

>NP_000016.1 beta-3 adrenergic receptor [Homo sapiens]
MAPWPHENSSLAPWPDLPTLAPNTANTSGLPGVPWEAALAGALLALAVLATVGGNLLVIVAIAWTPRLQTMTNVFVTSLA
AADLVMGLLVVPPAATLALTGHWPLGATGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLRYGALVTKRCARTAVVL
VWVVSAAVSFAPIMSQWWRVGADAEAQRCHSNPRCCAFASNMPYVLLSSSVSFYLPLLVMLFVYARVFVVATRQLRLLRG
ELGRFPPEESPPAPSRSLAPAPVGTCAPPEGVPACGRRPARLLPLREHRALCTLGLIMGTFTLCWLPFFLANVLRALGGP
SLVPGPAFLALNWLGYANSAFNPLIYCRSPDFRSAFRRLLCRCGRRLPPEPCAAARPALFPSGVPAARSSPAQPRLCQRL
DGASWGVS

Now let us run the SEG masker (segmasker) with these settings, which are more conservative than its defaults (-window 12 -locut 2.2 -hicut 2.5):

$ segmasker -in NP_000016.1.faa -window 10 -locut 1.8 -hicut 2.1 -outfmt fasta
>NP_000016.1 beta-3 adrenergic receptor [Homo sapiens]
MAPWPHENSSLAPWPDLPTLAPNTANTSGLPGVPWEaalagallalavlaTVGGNLLVIV
AIAWTPRLQTMTNVFVTSLAAADLVMGLLVVPPAATLALTGHWPLGATGCELWTSVDVLC
VTASIETLCALAVDRYLAVTNPLRYGALVTKRCARTAVVLVWVVSAAVSFAPIMSQWWRV
GADAEAQRCHSNPRCCAFASNMPYVLLSSSVSFYLPLLVMLFVYARVFVVATRQLRLLRG
ELGRFppeesppapsRSLAPAPVGTCAPPEGVPACGRRPARLLPLREHRALCTLGLIMGT
FTLCWLPFFLANVLRALGGPSLVPGPAFLALNWLGYANSAFNPLIYCRSPDFRSAFrrll
crcgrrlPPEPCAAARPALFPSGVPAARSSPAQPRLCQRLDGASWGVS

Masked residues are shown in lower case; three stretches of the sequence are masked.
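To make the lower-case convention and the -window/-locut settings more concrete, here is a minimal Python sketch. It is an illustration only, not code from segmasker or BLAST, and all function names are hypothetical. `window_entropy` computes the Shannon entropy (bits per residue) of one window, the complexity measure the SEG cutoffs are expressed in. `low_complexity_windows` flags windows at or below `locut`, which approximates only SEG's first (trigger) stage. `masked_intervals` lists the lower-case runs in segmasker's FASTA output as 1-based coordinates.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy in bits per residue of one window (SEG-style complexity)."""
    n = len(window)
    counts = Counter(window.upper())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 10, locut: float = 1.8) -> list:
    """1-based start positions of windows whose entropy is at or below locut.

    Sketch of SEG's trigger stage only: the real algorithm then extends
    each trigger region using hicut and refines its boundaries, so these
    windows are NOT the final masked regions.
    """
    seq = seq.upper()
    return [
        i + 1
        for i in range(len(seq) - window + 1)
        if window_entropy(seq[i:i + window]) <= locut
    ]


def masked_intervals(masked_seq: str) -> list:
    """1-based inclusive (start, end) intervals of lower-case (masked) runs."""
    intervals = []
    start = None
    for pos, ch in enumerate(masked_seq, start=1):
        if ch.islower():
            if start is None:
                start = pos  # a masked run begins here
        elif start is not None:
            intervals.append((start, pos - 1))  # run ended on previous residue
            start = None
    if start is not None:
        intervals.append((start, len(masked_seq)))  # run reaches sequence end
    return intervals
```

For example, `window_entropy("AALAGALLAL")` (the first ten residues of the first masked stretch above) is about 1.36 bits, below the -locut 1.8 used here. To list the masked stretches, join the sequence lines of the segmasker output (everything after the `>` header line) into one string and pass it to `masked_intervals`.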

Lessons learned: Things can happen inside bioinformatics tools that are not documented in any paper, manual, or anywhere else. Sometimes the developer is the only person in the world who knows about them. Respect the people who understand the tools that billions of dollars' worth of science is being built on.
