### Scientific question: What is the molecular basis of HIV resistance conferred by naturally occurring mutations in human CCR5?


### Hypothesis: If a mutation in CCR5 disrupts the conformational integrity or structure of the protein, impairing either its expression at the cell surface or its binding to the HIV envelope glycoprotein gp120, then cells carrying this mutation will be resistant to HIV infection.

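### Packages
- nglview: interactive 3D visualization of protein structures
- Biopython (`Bio.AlignIO`): reading the multiple sequence alignment
- pandas: tabulating residue counts and frequencies
- seqlogo: drawing sequence logos (requires Ghostscript)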

### Structural bioinformatics: 3D protein structure visualization

In [1]:
import nglview as nv

# view the structure of the CD4-gp120-CCR5 complex by reading in a PDB file
# https://www.rcsb.org/structure/6MET
view_6met=nv.show_structure_file('6met_CD4_gp120_CCR5.pdb')
view_6met



NGLWidget()

In [2]:
# clear imported representations
view_6met.clear()
# display the structure as a ribbon, using chainindex as the color scheme
# https://projects.volkamerlab.org/teachopencadd/talktorials/T017_advanced_nglview_usage.html
view_6met.add_ribbon("protein", color_scheme="chainindex")

# color_scheme="chainindex" gives each chain of the complex its own color

### Homology Modeling
Homology modeling was performed with SWISS-MODEL to obtain structures of the CCR5 mutant alleles. https://swissmodel.expasy.org/
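SWISS-MODEL builds a model from a target sequence, so each point-mutant allele needs its own sequence. A minimal sketch, assuming mutations are written as `<wild-type residue><position><mutant residue>` with 1-based UniProt numbering; `apply_point_mutation` is a hypothetical helper, not part of SWISS-MODEL:

```python
import re

# Hypothetical helper (not part of SWISS-MODEL): parse "C20S"-style notation
# into wild-type residue, 1-based position, and mutant residue.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Return `sequence` with a single amino-acid substitution applied."""
    match = _MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside a sequence of length {len(sequence)}")
    # Guard against numbering mistakes: the stated wild-type residue must be present.
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + mutant + sequence[position:]
```

For example, `apply_point_mutation(ccr5_wt, "C20S")`, where `ccr5_wt` would hold the wild-type human CCR5 sequence (UniProt P51681). The wild-type check catches off-by-one numbering before a model is submitted. The delta-32 allele is a frameshifting deletion rather than a substitution, so its sequence cannot be built this way.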

In [3]:
# structure of wild-type human CCR5 obtained from the complex shown above
CCR5=nv.show_structure_file('CCR5.pdb')
CCR5.clear()
CCR5.add_cartoon('protein',color='blue')
CCR5

NGLWidget()
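The comment in the cell above says the wild-type CCR5 structure was taken from the 6MET complex. One way to do that, sketched under the assumption that the file is in fixed-column PDB format, is to keep only the ATOM/HETATM records whose chain identifier (column 22) matches the CCR5 chain. `extract_chain` is a hypothetical helper, and the CCR5 chain ID in 6MET is not stated in this notebook, so it should be checked on the RCSB entry page:

```python
def extract_chain(pdb_lines, chain_id):
    """Keep the ATOM/HETATM records of one chain from PDB-format lines.

    Hypothetical helper: assumes fixed-column PDB format, where the chain
    identifier sits in column 22 (index 21 in Python).
    """
    kept = []
    for line in pdb_lines:
        is_coordinate = line.startswith("ATOM") or line.startswith("HETATM")
        if is_coordinate and len(line) > 21 and line[21] == chain_id:
            # normalize line endings so the output can be written directly
            kept.append(line if line.endswith("\n") else line + "\n")
    # terminate the single-chain file with an END record
    kept.append("END\n")
    return kept
```

Usage would look like `with open("6met_CD4_gp120_CCR5.pdb") as f: lines = extract_chain(f, chain_id)`, followed by writing `lines` to `CCR5.pdb`.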

In [4]:
# structure of human CCR5 with delta-32 mutation
delta_32=nv.show_structure_file('CCR5_delta_32.pdb')
delta_32.clear()
delta_32.add_cartoon('protein',color='red')
delta_32

NGLWidget()

In [5]:
# structure of human CCR5 with C20S mutation
C20S=nv.show_structure_file('C20S.pdb')
C20S.clear()
C20S.add_cartoon('protein',color='orange')
C20S

NGLWidget()

In [6]:
# structure of human CCR5 with C101A mutation
C101A=nv.show_structure_file('C101A.pdb')
C101A.clear()
C101A.add_cartoon('protein',color='green')
C101A

NGLWidget()
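The visual comparison of these models can be backed up with a number: the C-alpha RMSD after optimal superposition. A minimal sketch using the Kabsch algorithm in NumPy, assuming the wild-type and mutant PDB files use the same residue numbering and contain a single chain. Residues are paired by number; for delta-32, residues downstream of the frameshift are not homologous to the wild type, so pairing by number is only meaningful before it. `read_ca_atoms`, `kabsch_rmsd` and `ca_rmsd` are hypothetical helpers:

```python
import numpy as np


def read_ca_atoms(pdb_lines, chain_id=None):
    """Map residue number -> C-alpha (x, y, z) from PDB-format ATOM records.

    If a residue has alternate locations, the last CA record read wins.
    """
    coords = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            if chain_id is not None and line[21] != chain_id:
                continue
            resnum = int(line[22:26])
            coords[resnum] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) arrays after optimally superimposing P onto Q."""
    P = np.asarray(P, dtype=float) - np.mean(P, axis=0)
    Q = np.asarray(Q, dtype=float) - np.mean(Q, axis=0)
    # SVD of the covariance matrix gives the optimal rotation (Kabsch algorithm)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def ca_rmsd(wt_lines, mutant_lines):
    """C-alpha RMSD over residue numbers present in both structures."""
    wt, mut = read_ca_atoms(wt_lines), read_ca_atoms(mutant_lines)
    shared = sorted(set(wt) & set(mut))
    P = np.array([mut[r] for r in shared])
    Q = np.array([wt[r] for r in shared])
    return kabsch_rmsd(P, Q), len(shared)
```

For example, `with open("CCR5.pdb") as wt, open("C20S.pdb") as mut: rmsd, n = ca_rmsd(wt, mut)` would give the RMSD over `n` paired residues. Superposing first matters because a raw coordinate comparison would also count any rigid-body offset between the two files.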

### Multiple sequence alignment
Multiple sequence alignment was performed with Clustal Omega using the amino acid sequences of beta-chemokine receptors. https://www.ebi.ac.uk/Tools/msa/clustalo/

In [7]:
from Bio import AlignIO

alignments = AlignIO.read("CCR_alignment.aln", "clustal")
print(alignments)

Alignment with 8 rows and 396 columns
M-----------------------DYQVSSPIYDINYY------...--- lcl|Query_10001
M----------------ETPNTTEDYDTTTEFD----------Y...--- lcl|Query_10002
ML------STSRSRF---IRNTNESGEEVTTFFDYDY-------...--- lcl|Query_10003
MP------FGIRMLLRAHKPGSSRRSEMTTSLDTVETFGTTSYY...--- lcl|Query_10004
M-------------------NPTDIADTTLDESIYSNY---YLY...--- lcl|Query_10005
MS-GESMNFSDVF------------DSSEDYFVSVNTSYYSVDS...FTM lcl|Query_10006
MDLGKPMKSVLVVALLVIFQVCLCQDEVTDDYIGDNTT---VDY...FSP lcl|Query_10007
MD------YTLDLSV-----------TTVTDYYYPDIF------...--- lcl|Query_10008
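The Result section's claim that C20 and C101 lie in highly conserved regions can be checked column by column. A minimal sketch: `residue_to_column` converts a residue number in one aligned sequence into its alignment column by counting non-gap characters, and `column_identity` reports the fraction of all sequences carrying the most common residue in that column, with gaps counted as mismatches. Both are hypothetical helpers. Which row is CCR5 is not recorded here (rows are labelled `lcl|Query_10001` to `lcl|Query_10008`), and the numbering assumes that row starts at residue 1:

```python
from collections import Counter


def residue_to_column(aligned_seq, residue_number):
    """0-based alignment column holding the residue_number-th (1-based) residue."""
    count = 0
    for column, char in enumerate(aligned_seq):
        if char != "-":
            count += 1
            if count == residue_number:
                return column
    raise ValueError(f"sequence has fewer than {residue_number} residues")


def column_identity(rows, column):
    """Fraction of all rows carrying the most common residue in `column`.

    Gaps count as mismatches, so gappy columns are not reported as conserved.
    """
    residues = [row[column] for row in rows if row[column] != "-"]
    if not residues:
        return 0.0
    top_count = Counter(residues).most_common(1)[0][1]
    return top_count / len(rows)
```

With Biopython, `rows = [str(rec.seq) for rec in alignments]` and then `column_identity(rows, residue_to_column(rows[i], 20))`, where `i` is the (hypothetical) index of the CCR5 row.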


### Sequence logos
Adapted from: https://www.tije.co/post/seqlogo_from_multiple_sequence_alignment/

In [8]:
%pip install seqlogo

Note: you may need to restart the kernel to use updated packages.


In [9]:
%conda install -c conda-forge ghostscript


Note: you may need to restart the kernel to use updated packages.


In [10]:
%conda install -c conda-forge pdf2svg


Note: you may need to restart the kernel to use updated packages.


In [11]:
conda env create -f environment.yml


Note: you may need to restart the kernel to use updated packages.


In [12]:
from Bio import AlignIO
import pandas as pd
import seqlogo

In [13]:
CCR_alignmentfile = "CCR_alignment.aln"
CCR_alignment = AlignIO.read(CCR_alignmentfile, "clustal")
CCR_alignment

<<class 'Bio.Align.MultipleSeqAlignment'> instance (8 records of length 396) at 15f56f5b970>

In [14]:
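def alnDF(aln, characters="ACDEFGHIKLMNPQRSTVWY"):
  """Count how often each amino acid occurs in every column of an alignment.

  Gaps ('-') and characters outside `characters` are ignored.
  Returns one row per alignment column and one column per amino acid.
  """
  alnRows = aln.get_alignment_length()
  compDict = {char: [0] * alnRows for char in characters}
  for record in aln:
    # walk along the aligned sequence, tallying residues per column
    for aaPos, aa in enumerate(record.seq):
      if aa in characters:
        compDict[aa][aaPos] += 1
  return pd.DataFrame.from_dict(compDict)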

In [15]:
CCR_align = alnDF(CCR_alignment)
CCR_align

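| | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 391 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 392 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 393 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 394 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |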


In [16]:
CCR_alignFreq = CCR_align.div(CCR_align.sum(axis=1), axis=0)
CCR_alignFreq

Unnamed: 0,A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
391,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
392,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
393,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
394,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0
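The logo is drawn with `ic_scale=False`, so letter heights are not scaled by conservation. For reference, the standard per-column information content is IC = log2(20) - H, where H = -sum(p * log2 p) over the 20 amino acids. A minimal sketch computing it from a frequency table such as `CCR_alignFreq`; it applies no small-sample correction and is not necessarily what seqlogo computes internally. Because gaps are excluded before normalizing, a column with a single residue among eight sequences (row 2 above) still scores as fully conserved, so gap counts are worth checking alongside this score:

```python
import numpy as np


def information_content(freqs):
    """Per-column information content in bits.

    `freqs` has one row per alignment column and one column per residue type;
    each row should sum to 1. IC = log2(K) - H with H = -sum(p * log2 p),
    where K is the number of residue types (20 for amino acids).
    """
    p = np.asarray(freqs, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        # treat 0 * log2(0) as 0
        plogp = np.where(p > 0, p * np.log2(p), 0.0)
    entropy = -plogp.sum(axis=1)
    return np.log2(p.shape[1]) - entropy
```

`information_content(CCR_alignFreq)` returns one value per alignment column; a fully conserved column such as row 0 (all Met) scores log2(20), about 4.32 bits.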


In [17]:
CCR_seqLogo = seqlogo.Ppm(CCR_alignFreq, alphabet_type="AA")
seqlogo.seqlogo(CCR_seqLogo, ic_scale=False, format="jpeg", size="large")

OSError: Could not find Ghostscript on path. There should be either a gs executable or a gswin32c.exe on your system's path
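This error means the running kernel could not find a Ghostscript executable on its `PATH`, despite the conda install earlier in the notebook; the install notes already warn that a kernel restart may be needed. A small check, sketched with the standard library's `shutil.which` and the two executable names quoted in the error message, can confirm whether the kernel sees Ghostscript before the logo is drawn. `find_ghostscript` is a hypothetical helper:

```python
import shutil

# Executable names quoted in the seqlogo error message above.
GHOSTSCRIPT_NAMES = ("gs", "gswin32c")


def find_ghostscript(path=None):
    """Return the first Ghostscript executable found on `path` (default: PATH), else None."""
    for name in GHOSTSCRIPT_NAMES:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None
```

If `find_ghostscript()` returns `None`, restarting the kernel, or adding the directory that contains the Ghostscript executable to `PATH`, are the first things to try.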

### Result
The structures of CCR5 carrying the C20S or C101A mutation do not differ substantially from the wild type. In contrast, the model of CCR5 with the delta-32 mutation suggests that this mutation causes the protein to lose its conformational integrity. Because CCR5 is required for HIV entry through formation of the CD4-gp120-CCR5 complex, the delta-32 mutation is expected to impair the function of CCR5 as a coreceptor for HIV entry. This could confer HIV resistance, which is consistent with research showing that people homozygous for the CCR5-delta-32 allele are resistant to HIV infection.

Although the C20S and C101A mutations do not seem to alter the overall secondary structure of CCR5 according to the homology models, a point mutation at a cysteine could abolish a disulfide bond, which could in turn affect the conformation of the protein. These two mutations could therefore also impair the ability of CCR5 to serve as a coreceptor for HIV entry into host cells. This is supported by research showing that cells expressing CCR5 with the C20S or C101A mutation have reduced CCR5 levels at the cell surface, which in turn reduces HIV infection.

Therefore, a mutation in CCR5 that alters the structure of the protein and reduces its expression at the cell surface is likely to contribute to HIV resistance. Interestingly, both point mutations analyzed, C20S and C101A, occur in regions that are highly conserved across beta-chemokine receptors.
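The disulfide argument can be checked directly on the structures. A minimal sketch, assuming the canonical CCR5 disulfide pairs Cys20-Cys269 and Cys101-Cys178, UniProt numbering and a single chain in each PDB file, and a 2.5 Å cutoff (a typical S-S bond is about 2.05 Å). It reports the SG-SG distance for each pair, or `None` when either cysteine is absent, as it would be for Ser20 in the C20S model. `read_cys_sg` and `disulfide_status` are hypothetical helpers, and a stretched distance in a homology model is only suggestive:

```python
import math

# Assumed canonical CCR5 disulfide partners (UniProt numbering).
CCR5_DISULFIDES = ((20, 269), (101, 178))


def read_cys_sg(pdb_lines):
    """Map residue number -> SG coordinates for cysteines (single-chain file assumed)."""
    sg = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[17:20] == "CYS" and line[12:16].strip() == "SG":
            sg[int(line[22:26])] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return sg


def disulfide_status(pdb_lines, pairs=CCR5_DISULFIDES, cutoff=2.5):
    """For each cysteine pair, return (SG-SG distance or None, bonded?)."""
    sg = read_cys_sg(pdb_lines)
    status = {}
    for a, b in pairs:
        if a in sg and b in sg:
            distance = math.dist(sg[a], sg[b])
            status[(a, b)] = (distance, distance <= cutoff)
        else:
            # one of the cysteines is missing, e.g. Ser20 in the C20S model
            status[(a, b)] = (None, False)
    return status
```

For example, `with open("C20S.pdb") as f: print(disulfide_status(f))` would show whether the Cys20-Cys269 pair is lost in that model while Cys101-Cys178 remains.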