BLAST SEG masking
Benjamin J. Buchfink edited this page Jun 26, 2026 · 8 revisions
It is a little-known fact that NCBI BLAST in blastp mode applies SEG low-complexity masking to
the target sequences by default. I asked GPT 5.5, Opus 4.8, Gemini and Groq whether that was the case,
and they all told me no!
Let us look at one example:
>NP_000016.1 beta-3 adrenergic receptor [Homo sapiens]
MAPWPHENSSLAPWPDLPTLAPNTANTSGLPGVPWEAALAGALLALAVLATVGGNLLVIVAIAWTPRLQTMTNVFVTSLA
AADLVMGLLVVPPAATLALTGHWPLGATGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLRYGALVTKRCARTAVVL
VWVVSAAVSFAPIMSQWWRVGADAEAQRCHSNPRCCAFASNMPYVLLSSSVSFYLPLLVMLFVYARVFVVATRQLRLLRG
ELGRFPPEESPPAPSRSLAPAPVGTCAPPEGVPACGRRPARLLPLREHRALCTLGLIMGTFTLCWLPFFLANVLRALGGP
SLVPGPAFLALNWLGYANSAFNPLIYCRSPDFRSAFRRLLCRCGRRLPPEPCAAARPALFPSGVPAARSSPAQPRLCQRL
DGASWGVS
Now let us run the SEG masker with these (more conservative than default) settings:
$ segmasker -in NP_000016.1.faa -window 10 -locut 1.8 -hicut 2.1 -outfmt fasta
>NP_000016.1 beta-3 adrenergic receptor [Homo sapiens]
MAPWPHENSSLAPWPDLPTLAPNTANTSGLPGVPWEaalagallalavlaTVGGNLLVIV
AIAWTPRLQTMTNVFVTSLAAADLVMGLLVVPPAATLALTGHWPLGATGCELWTSVDVLC
VTASIETLCALAVDRYLAVTNPLRYGALVTKRCARTAVVLVWVVSAAVSFAPIMSQWWRV
GADAEAQRCHSNPRCCAFASNMPYVLLSSSVSFYLPLLVMLFVYARVFVVATRQLRLLRG
ELGRFppeesppapsRSLAPAPVGTCAPPEGVPACGRRPARLLPLREHRALCTLGLIMGT
FTLCWLPFFLANVLRALGGPSLVPGPAFLALNWLGYANSAFNPLIYCRSPDFRSAFrrll
crcgrrlPPEPCAAARPALFPSGVPAARSSPAQPRLCQRLDGASWGVS
The masked regions are shown in lowercase letters.
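To turn the lowercase regions into coordinates, the output can be scanned for runs of lowercase residues. The following is a minimal sketch, not part of BLAST or segmasker: the function name `masked_intervals` is hypothetical, and it assumes a single FASTA record in which masking is shown only by letter case.

```python
from typing import List, Tuple


def masked_intervals(fasta_text: str) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) ranges of lowercase residues.

    Assumption: fasta_text holds exactly one FASTA record and masked
    residues are written in lowercase, as in the output shown above.
    """
    # Join the sequence lines (skipping the header) so that a masked run
    # that continues across a line break is reported as one interval.
    lines = (line.strip() for line in fasta_text.splitlines())
    seq = "".join(line for line in lines if line and not line.startswith(">"))

    intervals: List[Tuple[int, int]] = []
    start = None  # start position of the masked run currently being read
    for pos, residue in enumerate(seq, start=1):
        if residue.islower():
            if start is None:
                start = pos
        elif start is not None:
            intervals.append((start, pos - 1))
            start = None
    if start is not None:
        # The sequence ended inside a masked run.
        intervals.append((start, len(seq)))
    return intervals
```

Applied to the masked record above, this should report three regions: residues 37-50, 246-255 and 357-367. The last one spans a line break in the FASTA output, which is why the lines are joined before scanning.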
Lessons learned: Things can happen inside bioinformatics tools that are not written down in any paper, in any documentation, or anywhere else. Sometimes the developer is the only person in the world who knows about them. Respect the people who understand the tools that billions of dollars' worth of science is being built on.