Chain IDs reduced to first 3 characters #215

Closed
LeeryScientist opened this issue Jun 11, 2020 · 3 comments · Fixed by #216

Comments

@LeeryScientist

Version: 0.22.0
The mmCIF and MMTF parsers only keep the first 3 characters of chain IDs.
This makes handling structures like 6n1d impractical, since information is lost
and duplicate chain IDs are produced.
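To make the duplicate-ID problem concrete, here is a minimal sketch. It casts chain IDs to a 3-character NumPy string dtype and reports which distinct IDs collapse into the same value. The helper `truncation_collisions` is hypothetical and not part of Biotite, and the sample IDs are taken from the 6n1d output later in this thread.

```python
from collections import defaultdict

import numpy as np


def truncation_collisions(chain_ids, width=3):
    """Group chain IDs by the value they collapse to in a 'U<width>' array.

    Hypothetical helper for illustration only; not a Biotite function.
    """
    # Casting to a shorter fixed-width string dtype silently truncates
    truncated = np.asarray(chain_ids).astype(f"U{width}")
    groups = defaultdict(list)
    for original, short in zip(chain_ids, truncated):
        groups[str(short)].append(original)
    # Keep only truncated IDs that now stand for more than one original chain
    return {short: originals for short, originals in groups.items()
            if len(originals) > 1}


# A few 4-character chain IDs from 6n1d
ids = ["AL01", "AL02", "AL13", "AL14", "AS02", "ATHX"]
print(truncation_collisions(ids))
# {'AL0': ['AL01', 'AL02'], 'AL1': ['AL13', 'AL14']}
```

With `width=4`, the same IDs produce no collisions. This matches the fix proposed below.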

@padix-key
Member

padix-key commented Jun 12, 2020

The reason is that the chain_id annotation array uses the NumPy dtype 'U3', which allows only 3 characters. I was not aware that 4-character chain IDs were allowed. However, four characters seem to be the maximum, as the MMTF specification defines the chainNameList and chainIdList fields as binary data that decodes into arrays of 4-character strings. Therefore, I will create a PR that sets the chain_id dtype to 'U4'.
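The dtype behaviour described above can be reproduced with plain NumPy. The following minimal sketch uses a few chain IDs from 6n1d. It also shows why the dtype must be widened before the IDs are stored: casting an already truncated array to 'U4' cannot bring the dropped characters back.

```python
import numpy as np

# Chain IDs as they appear in 6n1d (4 characters each)
full_ids = np.array(["AL01", "AL02", "BS10"])

# A 'U3' array, like the chain_id annotation before the fix, silently truncates
chain_id_u3 = full_ids.astype("U3")
print(chain_id_u3)  # ['AL0' 'AL0' 'BS1']

# Widening the dtype afterwards cannot restore the dropped characters
print(chain_id_u3.astype("U4"))  # ['AL0' 'AL0' 'BS1']

# Storing the IDs as 'U4' from the start keeps them intact
chain_id_u4 = full_ids.astype("U4")
print(chain_id_u4)  # ['AL01' 'AL02' 'BS10']
```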

@padix-key padix-key mentioned this issue Jun 12, 2020
@padix-key
Member

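```python
import numpy as np
import biotite.database.rcsb as rcsb
import biotite.structure.io.pdbx as pdbx
pdbx_file = pdbx.PDBxFile.read(rcsb.fetch("6n1d", "pdbx"))
pdbx_structure = pdbx.get_structure(pdbx_file, model=1)
print(np.unique(pdbx_structure.chain_id))
```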

Before:

['A16' 'A23' 'A5S' 'AL0' 'AL1' 'AL2' 'AL3' 'AMR' 'APT' 'AS0' 'AS1' 'AS2'
 'ATH' 'B16' 'B23' 'B5S' 'BAT' 'BL0' 'BL1' 'BL2' 'BL3' 'BMR' 'BPT' 'BS0'
 'BS1' 'BS2' 'BTH']

After fix:

['A16S' 'A23S' 'A5S' 'AL01' 'AL02' 'AL03' 'AL04' 'AL05' 'AL06' 'AL09'
 'AL13' 'AL14' 'AL15' 'AL16' 'AL17' 'AL18' 'AL19' 'AL20' 'AL21' 'AL22'
 'AL23' 'AL24' 'AL25' 'AL27' 'AL28' 'AL29' 'AL30' 'AL31' 'AL32' 'AL33'
 'AL34' 'AL35' 'AL36' 'AMRN' 'APTN' 'AS02' 'AS03' 'AS04' 'AS05' 'AS06'
 'AS07' 'AS08' 'AS09' 'AS10' 'AS11' 'AS12' 'AS13' 'AS14' 'AS15' 'AS16'
 'AS17' 'AS18' 'AS19' 'AS20' 'ATHX' 'B16S' 'B23S' 'B5S' 'BATN' 'BL01'
 'BL02' 'BL03' 'BL04' 'BL05' 'BL06' 'BL09' 'BL13' 'BL14' 'BL15' 'BL16'
 'BL17' 'BL18' 'BL19' 'BL20' 'BL21' 'BL22' 'BL23' 'BL24' 'BL25' 'BL27'
 'BL28' 'BL29' 'BL30' 'BL31' 'BL32' 'BL33' 'BL34' 'BL35' 'BL36' 'BMRN'
 'BPTN' 'BS02' 'BS03' 'BS04' 'BS05' 'BS06' 'BS07' 'BS08' 'BS09' 'BS10'
 'BS11' 'BS12' 'BS13' 'BS14' 'BS15' 'BS16' 'BS17' 'BS18' 'BS19' 'BS20'
 'BTHX']

padix-key added a commit that referenced this issue Jun 13, 2020
@padix-key
Member

The PR is merged now, so the issue should be fixed. Thanks for bringing this up.
