# M12: Challenge Project Computational Tasks for Structure and Function
## Daiyun Dong

## Introduction
The FTO protein belongs to the AlkB family of non-heme Fe(II)- and 2-oxoglutarate (2OG)-dependent dioxygenases, which are primarily involved in the oxidative demethylation of DNA and RNA (Fedeles et al., 2015). Within this family, FTO shows high specificity for the RNA modification N6-methyladenosine (m6A), affecting mRNA stability and translation efficiency and thereby regulating gene expression. FTO consists of two domains: an N-terminal domain (NTD) and a C-terminal domain (CTD). The NTD carries out the demethylation, while the CTD may help stabilize the conformation of the protein (Han et al., 2010).

The m6A modification replaces one hydrogen of the amino group attached to carbon 6 of adenine with a methyl group. The methyl group is added by methyltransferases, recognized by m6A-binding proteins known as readers, and removed by RNA demethylases. FTO (fat mass and obesity-associated protein) is one such demethylase (Guo et al., 2024).

FTO's catalytic activity has profound implications for health and disease. FTO was first reported as a gene closely associated with human obesity and energy balance (Fawcett & Barroso, 2010) and was only later found to have demethylase activity. m6A methylation transfers a methyl group from S-adenosylmethionine (SAM) to the N6 position of adenine; this widespread mark helps regulate gene expression, so abnormal m6A modification can cause aberrant expression of the affected genes and thereby promote disease. Dysregulation of FTO, through the resulting aberrant m6A, has been linked to hypertension, diabetes, obesity, non-alcoholic fatty liver disease, vascular disease, heart failure and other cardiovascular diseases (Zhang et al., 2020). FTO can also shape the biology of tumors by regulating their m6A levels, and abnormal FTO levels can promote tumor progression (Guo et al., 2024). For example, widespread downregulation of FTO in epithelial cancers is associated with increased invasion, metastasis and poor clinical outcomes (Jeschke et al., 2021).

FTO has therefore attracted wide interest as a potential drug target. Inhibiting or modulating FTO activity could alter the pattern of m6A modifications and thereby affect the development of related diseases. Studies have shown that specific small-molecule inhibitors can effectively block FTO activity. For example, the FTO inhibitor 18097 selectively inhibits the demethylase activity of FTO and significantly suppresses the growth of breast cancer cells in vivo and their colonization of the lungs (Xie et al., 2022). Small-molecule FTO inhibitors are therefore a potential new strategy for treating obesity, certain cancers and cardiovascular diseases.

However, research on small-molecule FTO inhibitors is still limited, and clinical therapies targeting FTO have yet to be developed. This project therefore focuses on the structure and function of FTO: how it accurately recognizes and binds RNA, and which amino acid residues are critical for its activity. Answering these questions has practical value for developing targeted therapeutic strategies.




**References:**



Fedeles, B. I., Singh, V., Delaney, J. C., Li, D., & Essigmann, J. M. (2015). The AlkB Family of Fe(II)/α-Ketoglutarate-dependent Dioxygenases: Repairing Nucleic Acid Alkylation Damage and Beyond. Journal of Biological Chemistry, 290(34), 20734–20742. https://doi.org/10.1074/jbc.r115.656462

Han, Z., Niu, T., Chang, J., Lei, X., Zhao, M., Wang, Q., Cheng, W., Wang, J., Feng, Y., & Chai, J. (2010). Crystal structure of the FTO protein reveals basis for its substrate specificity. Nature, 464(7292), 1205–1209. https://doi.org/10.1038/nature08921

Guo, J., Zhao, L., Duan, M., Yang, Z., Zhao, H., Liu, B., Wang, Y., Deng, L., Wang, C., Jiang, X., & Jiang, X. (2024). Demethylases in tumors and the tumor microenvironment: Key modifiers of N6-methyladenosine methylation. Biomedicine & pharmacotherapy = Biomedecine & pharmacotherapie, 174, 116479. Advance online publication. https://doi.org/10.1016/j.biopha.2024.116479

Fawcett, K. A., & Barroso, I. (2010). The genetics of obesity: FTO leads the way. Trends in genetics : TIG, 26(6), 266–274. https://doi.org/10.1016/j.tig.2010.02.006

Zhang, B., Jiang, H., Dong, Z., Sun, A., & Ge, J. (2020). The Critical Roles of m6A Modification in Metabolic Abnormality and Cardiovascular Diseases. Genes & Diseases. https://doi.org/10.1016/j.gendis.2020.07.011

Jeschke, J., Collignon, E., Al Wardi, C., Krayem, M., Bizet, M., Jia, Y., Garaud, S., Wimana, Z., Calonne, E., Hassabi, B., Morandini, R., Deplus, R., Putmans, P., Dube, G., Singh, N. K., Koch, A., Shostak, K., Rizzotto, L., Ross, R. L., & Desmedt, C. (2021). Downregulation of the FTO m6A RNA demethylase promotes EMT-mediated progression of epithelial tumors and sensitivity to Wnt inhibitors. Nature Cancer, 2(6), 611–628. https://doi.org/10.1038/s43018-021-00223-7


Xie, G., Wu, X. N., Ling, Y., Rui, Y., Wu, D., Zhou, J., Li, J., Lin, S., Peng, Q., Li, Z., Wang, H., & Luo, H. B. (2022). A novel inhibitor of N6-methyladenosine demethylase FTO induces mRNA methylation and shows anti-cancer activities. Acta pharmaceutica Sinica. B, 12(2), 853–866. https://doi.org/10.1016/j.apsb.2021.08.028


## Structural Question A: How do dynamic changes in the protein affect the interaction and stability between the two domains?

### 1. What bioinformatics tools can simulate the dynamic changes of protein domains?
GROMACS is a package for molecular dynamics (MD) simulations of biomolecules in solution (Berendsen et al., 1995). We can use it to simulate FTO and then characterize the physicochemical properties of the contact interface between the two domains. First, a three-dimensional structure of the target protein is required, and both domains must be present in the model. Next, GROMACS tools are used to generate the topology, which requires choosing an appropriate force field (and supplying matching parameters for any non-standard groups). The protein is then placed in a simulation box, solvated with water, and counter-ions are added to neutralize the system. Before the dynamics run, energy minimization removes unrealistic atomic distances and steric clashes, and the system is then equilibrated to the desired temperature and pressure. Finally, a production MD run is performed, and GROMACS analysis tools are applied to the trajectory to identify the contact interface between the two domains. Based on these results, we can further analyze the amino acid composition, hydrophobicity and charge distribution at the interface.
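The workflow above maps onto a fairly standard sequence of `gmx` commands. The sketch below only assembles and prints them (it runs nothing). It assumes a cleaned input file `fto.pdb` containing only the protein (the Mn ion, N-oxalylglycine and nucleic acid would otherwise need their own parameters), the AMBER99SB-ILDN force field with TIP3P water, hypothetical `.mdp` parameter files (`ions.mdp`, `minim.mdp`, `nvt.mdp`, `npt.mdp`, `md.mdp`) and a hypothetical index file `domains.ndx` holding NTD and CTD groups. These names and choices are illustrative assumptions, not settings used in this project.

```python
"""Dry-run sketch: build the GROMACS command sequence for an FTO MD simulation.

Nothing is executed; the commands are returned as strings for inspection.
All file names (fto.pdb, *.mdp, domains.ndx) are hypothetical placeholders.
"""


def gromacs_pipeline(pdb="fto.pdb", forcefield="amber99sb-ildn",
                     water="tip3p", edge_nm=1.0):
    """Return the gmx commands for setup, equilibration, production and analysis."""
    return [
        # 1. Topology: assign force-field atom types and charges to the protein
        f"gmx pdb2gmx -f {pdb} -o processed.gro -p topol.top -ff {forcefield} -water {water}",
        # 2. Box: centre the protein with at least edge_nm nm to each box edge
        f"gmx editconf -f processed.gro -o boxed.gro -c -d {edge_nm} -bt dodecahedron",
        # 3. Solvent: fill the box with 3-point water (spc216.gro also serves TIP3P)
        "gmx solvate -cp boxed.gro -cs spc216.gro -o solvated.gro -p topol.top",
        # 4. Ions: replace water molecules with Na+/Cl- until the system is neutral
        "gmx grompp -f ions.mdp -c solvated.gro -p topol.top -o ions.tpr",
        "gmx genion -s ions.tpr -o ionized.gro -p topol.top -pname NA -nname CL -neutral",
        # 5. Energy minimisation to remove steric clashes
        "gmx grompp -f minim.mdp -c ionized.gro -p topol.top -o em.tpr",
        "gmx mdrun -deffnm em",
        # 6. Equilibration: temperature (NVT), then pressure (NPT), with position restraints
        "gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr",
        "gmx mdrun -deffnm nvt",
        "gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr",
        "gmx mdrun -deffnm npt",
        # 7. Production run
        "gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md.tpr",
        "gmx mdrun -deffnm md",
        # 8. Analysis: per-residue flexibility and NTD-CTD contacts over time
        "gmx rmsf -s md.tpr -f md.xtc -o rmsf.xvg -res",
        "gmx mindist -s md.tpr -f md.xtc -n domains.ndx -od mindist.xvg -on numcont.xvg -d 0.4",
    ]


if __name__ == "__main__":
    for step in gromacs_pipeline():
        print(step)
```

`gmx genion` and `gmx mindist` ask interactively for index groups (for example SOL for the ions, and the NTD and CTD groups, which could be created with `gmx make_ndx`), and `md.xtc` is only written if compressed trajectory output is enabled in `md.mdp`.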

**References:**

Berendsen, H. J. C., van der Spoel, D., & van Drunen, R. (1995). GROMACS: A message-passing parallel molecular dynamics implementation. Computer Physics Communications, 91(1-3), 43–56. https://doi.org/10.1016/0010-4655(95)00042-e


### 2. What are the mechanisms of conformational changes?


Protein conformational changes are central to many cellular functions and involve complex, diverse structural adjustments. Two main models describe how ligand binding is coupled to conformational change: conformational selection and induced fit. Conformational selection proposes that a protein already samples a variety of conformations before the ligand binds, and that the ligand selectively binds and stabilizes one of these pre-existing conformations. Induced fit, in contrast, proposes that the ligand binds first and then drives the protein from one conformation into another (Grant et al., 2010).
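Conformational selection can be made concrete with a minimal two-state sketch (an illustration with hypothetical numbers, not a model of FTO). Assume the free protein interconverts between conformations A and B with equilibrium constant K = [B]/[A], and the ligand L binds only B with dissociation constant K_d. The fraction of protein in the B-type state (free B plus BL) is then K(1 + [L]/K_d) / (1 + K(1 + [L]/K_d)), so a ligand can turn a rarely populated conformation into the majority state without creating any new conformation.

```python
def fraction_b_state(k_conf, ligand, kd):
    """Fraction of protein in conformation B (free B plus ligand-bound BL).

    Two-state conformational selection: A <-> B with k_conf = [B]/[A];
    the ligand binds only B with dissociation constant kd (same units as ligand).
    """
    weight_b = k_conf * (1.0 + ligand / kd)  # ([B] + [BL]) relative to [A]
    return weight_b / (1.0 + weight_b)


if __name__ == "__main__":
    # Hypothetical: B is ~1% populated without ligand (k_conf = 0.01), kd = 1 uM
    for ligand_um in (0, 1, 10, 99, 1000):
        frac = fraction_b_state(0.01, ligand_um, 1.0)
        print(f"[L] = {ligand_um:5d} uM -> fraction in B = {frac:.3f}")
```

Equilibrium populations alone often cannot distinguish this scheme from induced fit; kinetic measurements are usually needed to tell the two mechanisms apart.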

Rigid-body domain movement involves the relative motion of two or more protein domains, typically through hinge- or shear-type motions, while the secondary structure within each domain is largely preserved. Periplasmic binding proteins (PBPs) are a classic example: upon ligand binding, their N-terminal and C-terminal domains close around the ligand, trapping it and almost completely shielding it from solvent (Ha & Loh, 2012).

Limited structural rearrangement involves the movement or ordering/disordering transition of small parts of the protein structure, such as a loop or a small segment. These changes usually occur near the protein's active site and can directly affect its catalytic activity or interactions with other molecules. For instance, some enzymes may undergo minor structural adjustments near the active site upon substrate binding, thus activating or inhibiting enzyme activity (Ha & Loh, 2012).

Global folding changes are another important mechanism, in which a protein switches from one folded state to a completely different folded state. In influenza hemagglutinin (HA), for example, protonation at low pH triggers a large conformational change that enables fusion of the viral membrane with the host cell membrane (Ha & Loh, 2012).

Folding-unfolding changes involve the complete unfolding or refolding of a protein structure and are often driven by changes in intramolecular or intermolecular interactions. This mechanism is found in many regulatory proteins that expose or hide functional domains by unfolding or folding. For example, a signaling protein might be unfolded in its inactive state and fold into a functional structure upon receiving a signal, initiating downstream signaling (Ha & Loh, 2012).

In summary, conformational plasticity allows proteins to switch between different states in response to ligand binding, chemical modifications, environmental changes or other molecular recognition events. This property enables them to act as molecular switches in cellular signaling, pathogen infection and other biological processes.



**References:**

Grant, B. J., Gorfe, A. A., & McCammon, J. A. (2010). Large conformational changes in proteins: signaling and other functions. Current Opinion in Structural Biology, 20(2), 142–147. https://doi.org/10.1016/j.sbi.2009.12.004

Ha, J.-H., & Loh, S. N. (2012). Protein Conformational Switches: From Nature to Design. Chemistry (Weinheim an Der Bergstrasse, Germany), 18(26), 7984–7999. https://doi.org/10.1002/chem.201200348


### 3. What role do flexible regions play in protein structure and function?


Flexible regions are crucial for protein structure and function. According to Liu and Karimi (2007), these regions typically comprise loops or other irregular segments that show high mobility and plasticity, which allows them to take part in many biological processes. For instance, flexible regions around the active sites of certain enzymes can directly regulate substrate binding and catalytic activity through their motions (Liu & Karimi, 2007). Flexible regions also let proteins respond to external signals, such as environmental changes or binding of other molecules or proteins, by changing conformation, which is essential for functional adaptability (Chen et al., 2007). Grant et al. (2010) noted that flexible regions may also take part in larger-scale structural rearrangements that underlie major changes in protein function, for example conformational switches triggered by changes in pH or by the presence of other molecules.

Understanding flexible regions also helps in studying protein folding and in predicting 3D structure. Through a quantitative analysis of these regions, Chen et al. (2007) found that flexible regions in protein sequences show several kinds of dynamic behavior, such as rotation, disorder (residues missing from solved structures) or irregular conformation. Flexible regions are therefore key both to understanding how proteins respond to biochemical signals and to predicting protein structure.

**References:**

Liu, X., & Karimi, H. A. (2007). High-throughput modeling and analysis of protein structural dynamics. Briefings in Bioinformatics, 8(6), 432–445. https://doi.org/10.1093/bib/bbm014

Chen, K., Kurgan, L. A., & Ruan, J. (2007). Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs. BMC Structural Biology, 7(1), 25. https://doi.org/10.1186/1472-6807-7-25

Grant, B. J., Gorfe, A. A., & McCammon, J. A. (2010). Large conformational changes in proteins: signaling and other functions. Current Opinion in Structural Biology, 20(2), 142–147. https://doi.org/10.1016/j.sbi.2009.12.004


### 4. How do the shape, charge distribution, and hydrophilic/hydrophobic characteristics of the interaction interface between NTD and CTD change during FTO protein dynamics?

In X-ray crystallography, the B-factor (temperature factor) describes how much an atom or group of atoms is displaced around its average position in the crystal structure. These displacements mainly reflect thermal motion within the crystal (together with static disorder), so B-factors indirectly reveal protein flexibility and dynamics, including the behavior of different protein domains. A higher B-factor means the atom or group moves more, suggesting that this part of the structure is more dynamic and flexible; a lower B-factor means smaller displacements around the average position and a more stable, ordered structure.
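For an isotropic B-factor, B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement of the atom, so a B-factor can be converted into an approximate root-mean-square displacement. The sketch below reads Cα B-factors from the fixed-column ATOM records of a PDB file and averages them per domain, using the domain boundaries of Han et al. (2010) (NTD 32-326, CTD 327-498). The chain ID "A", the restriction to Cα atoms and the function names are illustrative assumptions; B-factors are only directly comparable within one structure.

```python
import math
import sys
from collections import defaultdict

# Domain boundaries from Han et al. (2010); residue numbers as in the PDB file
DOMAINS = {"NTD": range(32, 327), "CTD": range(327, 499)}


def rms_displacement(b_factor):
    """Root-mean-square displacement (Å) implied by an isotropic B-factor (Å^2)."""
    return math.sqrt(b_factor / (8 * math.pi ** 2))


def domain_mean_b(pdb_text, chain="A"):
    """Average C-alpha B-factor per domain, read from PDB ATOM records."""
    values = defaultdict(list)
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name 13-16, chain 22, residue number 23-26, B 61-66
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[21] != chain:
            continue
        resnum, b_factor = int(line[22:26]), float(line[60:66])
        for name, residues in DOMAINS.items():
            if resnum in residues:
                values[name].append(b_factor)
    return {name: sum(bs) / len(bs) for name, bs in values.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:  # path to a locally downloaded FTO PDB file
        for domain, mean_b in domain_mean_b(handle.read()).items():
            print(f"{domain}: mean B = {mean_b:.1f} Å^2, "
                  f"RMS displacement ~ {rms_displacement(mean_b):.2f} Å")
```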

Because the NTD and CTD are both resolved within the same FTO crystal structure, we can compare their B-factors to study how the two domains interact.

<img src="https://p.ipic.vip/6116sc.png" width="500" height="auto" align="middle" /> 

In the image, B-factor values from 0 to 70 Å² are mapped to a color gradient from blue (low) to red (high); the CTD is on the left and the NTD on the right. The regions that interact with the nucleic acid are the most stable, the C-terminal end of the CTD is the most dynamic, and the interface between the CTD and NTD is generally stable. Khatiwada et al. (2022) studied FTO using solution nuclear magnetic resonance (NMR) and molecular dynamics simulations and concluded that the NTD carries out catalysis, while one end of the CTD helix bundle stabilizes the conformation of the NTD without directly contacting the substrate. They observed no significant conformational change between the NTD and CTD, and this stable arrangement appears to be important for holding the catalytic domain in its active conformation.

Taken together, these results suggest that the arrangement of the NTD and CTD does not change appreciably under normal conditions, and neither do the shape, charge distribution or hydrophilic/hydrophobic character of their interface. Although there is no direct evidence for the function of the CTD, the stability of its interaction with the NTD suggests that it may help maintain the conformation of FTO during catalysis.


**References:**

Khatiwada, B., Nguyen, T. T., Purslow, J. A., & Venditti, V. (2022). Solution structure ensemble of human obesity-associated protein FTO reveals druggable surface pockets at the interface between the N- and C-terminal domain. The Journal of biological chemistry, 298(5), 101907. https://doi.org/10.1016/j.jbc.2022.101907

## Structural Question B: How does charge help the RNA strand bind to the active site of the protein?



### 1. How can the charge of the RNA strand and the protein active site be visualized?
Several bioinformatics tools can calculate electrostatic properties, such as APBS (Adaptive Poisson-Boltzmann Solver). APBS solves the Poisson-Boltzmann equation to compute the electrostatic potential of an entire protein-RNA complex or of selected parts of it. The structure file (PDB format) is first converted to PQR format, which assigns each atom a partial charge and radius (for example with PDB2PQR); APBS then outputs the electrostatic potential on a three-dimensional grid.

The resulting potential maps can be viewed in PyMOL or VMD. Coloring the molecular surface by electrostatic potential in PyMOL lets us observe and analyze the charge distribution on the RNA strand and at the active site of the protein.
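APBS maps the potential in three dimensions, but a quick sequence-level sanity check is the approximate net charge at a given pH from the Henderson-Hasselbalch equation. The pKa values below are approximate textbook values for free amino acids (an assumption; other pKa scales give somewhat different numbers), and the calculation ignores the 3D environment, which is exactly what APBS adds. The function name `net_charge` is hypothetical.

```python
# Approximate pKa values (textbook-style; other scales differ slightly)
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 9.0, 2.0


def net_charge(sequence, ph=7.0):
    """Approximate net charge of a peptide chain at the given pH."""
    # Fraction protonated (+1) of the N-terminal amine
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    # Fraction deprotonated (-1) of the C-terminal carboxyl
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa in sequence.upper():
        if aa in PKA_POSITIVE:    # basic side chains: positive when protonated
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:  # acidic side chains: negative when deprotonated
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


if __name__ == "__main__":
    # Short fragment of the 5ZMD chain A sequence around His231/Asp233/Glu234
    print(f"{net_charge('AVSWHHDENL'):+.2f}")
```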

The distribution of charge of the FTO protein is shown below.

<img src="https://p.ipic.vip/xon9on.png" width="250" height="auto" align="middle" />


### 2. How can we determine which amino acid residues belong to the active site within the protein?

X-ray crystallography is one of the most direct methods for identifying active sites in proteins. In the crystal structure of a protein bound to its substrate (or a substrate analog), the amino acid residues that contact the substrate can be observed directly.
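Reading the residues that contact a bound ligand out of a crystal structure amounts to a distance search. The sketch below lists residues with any atom within a cutoff (4 Å here, a common but arbitrary choice) of any atom of the named hetero groups. The residue names `MN` (manganese) and `OGA` (the chemical component code commonly used for N-oxalylglycine) are assumptions that should be checked against the HETATM records of the actual file; the function names are hypothetical.

```python
import math
import sys


def parse_atoms(pdb_text):
    """Yield (record, chain, resname, resnum, xyz) for ATOM/HETATM lines (fixed PDB columns)."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield line[:6].strip(), line[21], line[17:20].strip(), int(line[22:26]), xyz


def residues_near(pdb_text, ligand_names, cutoff=4.0):
    """Polymer (ATOM) residues with at least one atom within `cutoff` Å of any ligand atom."""
    atoms = list(parse_atoms(pdb_text))
    ligand_xyz = [a[4] for a in atoms if a[0] == "HETATM" and a[2] in ligand_names]
    hits = set()
    for record, chain, resname, resnum, xyz in atoms:
        if record == "ATOM" and any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz):
            hits.add((chain, resname, resnum))
    return sorted(hits, key=lambda h: (h[0], h[2]))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:  # path to a locally downloaded FTO PDB file
        for chain, resname, resnum in residues_near(handle.read(), {"MN", "OGA"}):
            print(f"{chain} {resname}{resnum}")
```

Nucleotides are also stored as ATOM records, so nucleic-acid residues near the ligand will appear in the list alongside amino acids.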

NMR can also be used to locate protein active sites. By isotopically labeling the protein and monitoring the chemical shift changes caused by its interaction with the substrate, the residues whose environment changes on binding can be identified, indicating the likely location of the active site.

Specialized bioinformatics tools such as COFACTOR (Zhang et al., 2017) (https://seq2fun.dcmb.med.umich.edu//COFACTOR/) can also predict active sites. COFACTOR first analyzes the three-dimensional structure of the protein: structural alignment against proteins of known function identifies functional domains and structurally similar sites. This is combined with sequence homology and protein-protein interaction information to indicate conserved functional regions and likely active sites.

The COFACTOR result is shown below. The residues predicted to interact with RNA lie mainly in the center of the NTD and are very close to the RNA, which makes their interaction plausible.

<img src="https://p.ipic.vip/3jg7zw.png" width="1200" height="auto" align="middle" />


The X-ray crystal structure of FTO shows that the protein is divided into two domains. According to Han et al. (2010), residues 32-326 form the N-terminal domain (NTD) and residues 327-498 form the C-terminal domain (CTD). In the PyMOL image, the NTD is shown on the right in deep blue and the CTD on the left in light blue. A manganese ion and N-oxalylglycine (NOG) are located at the center of the NTD, while the DNA strand lies between the NTD and CTD, closer to the center of the NTD.

We can therefore infer that the active site of FTO lies mainly in the NTD, especially in the central region where the manganese ion and NOG are bound. The metal ion and the 2OG cosubstrate (which NOG mimics) are essential for catalysis, so their location marks the center of the NTD as the functionally critical region.

<img src="https://p.ipic.vip/bkva8m.png" width="500" height="auto" align="middle" />


 
**References:**

Zhang, C., Freddolino, P. L., & Zhang, Y. (2017). COFACTOR: improved protein function prediction by combining structure, sequence and protein-protein interaction information. Nucleic acids research, 45(W1), W291–W299. https://doi.org/10.1093/nar/gkx366

Han, Z., Niu, T., Chang, J., Lei, X., Zhao, M., Wang, Q., Cheng, W., Wang, J., Feng, Y., & Chai, J. (2010). Crystal structure of the FTO protein reveals basis for its substrate specificity. Nature, 464(7292), 1205–1209. https://doi.org/10.1038/nature08921

### 3. How is the distribution of amino acid charges conducive to the binding of RNA strands?

Protein-RNA interactions involve several molecular mechanisms, including hydrogen bonds, van der Waals forces, hydrophobic interactions, π-π stacking and electrostatic interactions (Corley et al., 2020). Hydrogen bonding is the most prevalent interaction in protein-RNA complexes, typically between polar or positively charged residues of the protein and the negatively charged phosphate groups or oxygen atoms of the RNA bases. For example, the side chains of positively charged lysine and arginine residues often form strong, charge-complementary hydrogen bonds (salt bridges) with RNA. Because electrostatic interactions act over long distances, charged residues can significantly promote RNA binding even at distances of up to 11 Å (GuhaThakurta & Draper, 2000). Negatively charged residues repel RNA, which can help position the RNA correctly and contribute to recognition of specific bases. When charged residues in a protein cannot pair with complementary charges on the RNA, electrostatic repulsion can hinder binding.
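A simple way to see how salt limits the reach of electrostatics is the Debye-Hückel screened Coulomb energy between two point charges, E(r) = 332.06 · q₁q₂ / (ε r) · exp(−r/λ_D) kcal/mol with r in Å, where the Debye length λ_D ≈ 3.04/√I Å in water at 25 °C (I = ionic strength in mol/L). This is a crude point-charge approximation, far simpler than the Poisson-Boltzmann treatment in APBS; the dielectric constant of 80 (bulk water) and the 150 mM ionic strength are assumptions, and the effective dielectric near a protein surface is lower.

```python
import math

COULOMB_KCAL = 332.06  # e^2 / (4*pi*eps0 * 1 Å) expressed in kcal/mol


def debye_length(ionic_strength_m):
    """Approximate Debye length (Å) in water at 25 °C for ionic strength in mol/L."""
    return 3.04 / math.sqrt(ionic_strength_m)


def screened_energy(q1, q2, r, dielectric=80.0, ionic_strength_m=0.15):
    """Debye-Hückel screened Coulomb energy (kcal/mol) between point charges r Å apart."""
    coulomb = COULOMB_KCAL * q1 * q2 / (dielectric * r)
    return coulomb * math.exp(-r / debye_length(ionic_strength_m))


if __name__ == "__main__":
    # Lys/Arg (+1) against one RNA phosphate (-1) at increasing separation
    for r in (4.0, 7.0, 11.0, 15.0):
        print(f"r = {r:4.1f} Å: E = {screened_energy(+1, -1, r):6.3f} kcal/mol")
```

Varying `dielectric` and `ionic_strength_m` shows how strongly the result depends on these assumptions.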

In summary, positively charged residues on the protein surface can form stable charge-complementary contacts with negatively charged RNA, so a dense cluster of positively charged residues at the RNA-binding center favors RNA binding. A well-placed set of negatively charged residues, in turn, can enhance binding specificity by matching the charge pattern of particular RNA sequences.


<img src="https://p.ipic.vip/xon9on.png" width="500" height="auto" align="middle" />

In the FTO structure above, the center carries a negative charge and is surrounded by a positively charged region. Because the charge was calculated on the structure of the protein together with RNA, the negative potential at the center is probably contributed by the RNA itself. The surrounding positive charge may help attract the negatively charged RNA, facilitating its demethylation by FTO.

**References:**

Corley, M., Burns, M. C., & Yeo, G. W. (2020). How RNA-Binding Proteins Interact with RNA: Molecules and Mechanisms. Molecular cell, 78(1), 9–29. https://doi.org/10.1016/j.molcel.2020.03.011

GuhaThakurta, D., & Draper, D. E. (2000). Contributions of basic residues to ribosomal protein L11 recognition of RNA. Journal of molecular biology, 295(3), 569–580. https://doi.org/10.1006/jmbi.1999.3372




### 4. Does the length of the RNA strand affect its binding to the protein's active site?
Many RNA-binding domains (RBDs) bind short RNA motifs of 3 to 5 nucleotides (Stitzinger et al., 2022). Because each RBD contacts only a short stretch of RNA, once the RNA is long enough to interact with the relevant residues in the RBD, further extension is not expected to change its affinity significantly. By this reasoning, FTO may be able to act on RNAs of a wide range of lengths without a strong length preference, although this would need experimental confirmation.

In contrast, the number of RNA-binding domains (or binding sites) strongly affects binding, with affinity increasing exponentially with their number. Stitzinger et al. (2022) showed that adding or removing a single domain (or RNA-binding site) can make the difference between binding and no binding.
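The exponential dependence follows from the additivity of binding free energies. In a textbook avidity approximation (not the detailed model of Stitzinger et al.), a protein whose domains bind adjacent sites with dissociation constants K_d,1 to K_d,n has an apparent K_d ≈ K_d,1 · (K_d,2/C_eff) · … · (K_d,n/C_eff), where C_eff is the effective concentration of a tethered domain near its site. Each extra domain multiplies the affinity by a roughly constant factor. The K_d and C_eff values below are hypothetical.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol*K)


def apparent_kd(domain_kds, c_eff):
    """Idealized apparent Kd (M) for domains binding adjacent sites on one RNA.

    domain_kds: single-domain dissociation constants (M)
    c_eff: effective concentration (M) of a tethered domain near its site
    """
    kd = domain_kds[0]
    for k in domain_kds[1:]:
        kd *= k / c_eff  # each additional domain contributes a factor Kd_i / C_eff
    return kd


def binding_free_energy(kd, temperature=298.15):
    """Standard binding free energy (kcal/mol) from a Kd in M (1 M standard state)."""
    return GAS_CONSTANT * temperature * math.log(kd)


if __name__ == "__main__":
    # Hypothetical: each domain binds with Kd = 10 uM, C_eff = 1 mM
    for n in range(1, 5):
        kd = apparent_kd([1e-5] * n, 1e-3)
        print(f"{n} domain(s): Kd ~ {kd:.0e} M, dG ~ {binding_free_energy(kd):.1f} kcal/mol")
```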

**References:**

Stitzinger, S. H., Sohrabi-Jahromi, S., & Söding, J. (2022). Cooperativity boosts affinity and specificity of proteins with multiple RNA-binding domains. NAR Genomics and Bioinformatics, 5(2), lqad057. https://doi.org/10.1093/nargab/lqad057


### 5. How do the charges of the RNA strand and the active site of the protein change during their interactions?

According to Jones et al. (2001), lysine, tyrosine, phenylalanine, isoleucine and arginine show a higher propensity to occur at protein-RNA interfaces, with aromatic and positively charged residues playing particularly important roles. Aromatic residues tend to cluster near unpaired RNA bases, where they can interact with the bases and help stabilize the RNA, while charged and polar residues complement the negative charges on the RNA and so promote protein-RNA interaction. The charge distribution at the active site is therefore crucial for stable protein-RNA interactions.

During binding, both RNA and protein may undergo conformational changes (Williamson, 2000). In this induced-fit view, RNA-protein recognition is not a simple lock-and-key process but a mutually adjusting one: either partner may be partly flexible or partially unfolded before contact, and when their binding surfaces meet, both can change conformation to fit each other better. This mutual adaptation increases the stability and specificity of the complex and supports its biological function. Because residues and nucleotides are rearranged in the process, the local charge distribution at the interface also changes as binding proceeds.


**References:**

Jones, S., Daley, D. T., Luscombe, N. M., Berman, H. M., & Thornton, J. M. (2001). Protein-RNA interactions: a structural analysis. Nucleic acids research, 29(4), 943–954. https://doi.org/10.1093/nar/29.4.943

Williamson, J. R. (2000). Induced fit in RNA-protein recognition. Nature Structural Biology, 7(10), 834–837. https://doi.org/10.1038/79575

## Functional Question C: If a key residue determining substrate specificity in the active site of FTO is mutated, how would it affect substrate binding?

### 1. What are the key residues within the FTO active site that determine substrate specificity?
FTO belongs to the AlkB subfamily. Among the AlkB enzymes studied, only FTO, AlkB, ALKBH1-3 and ALKBH5 are capable of processing N-methylated DNA/RNA substrates (Toh et al., 2015). Crystal structures of FTO in complex with N3-methylthymidine (m3T) and N-oxalylglycine (NOG, a catalytically inert analog of 2OG) have been reported. Toh et al. (2015) suggest that the 2OG-binding site lies very close to the nucleotide-binding site, and that recognition of m3T by FTO could involve six residues: Arg96, Tyr108, Leu109, Val228, His231 and Glu234. Of these, Glu234 may be the key residue determining the affinity and specificity of FTO for its substrates.

<img src="https://pubs.rsc.org/image/article/2015/sc/c4sc02554g/c4sc02554g-f2_hi-res.gif" width="700" height="auto" align="middle" />

We therefore focus on Glu234 in FTO in the following analyses.


**References:**

Toh, J. D. W., Sun, L., Lau, L. Z. M., Tan, J., Low, J. J. A., Tang, C. W. Q., Cheong, E. J. Y., Tan, M. J. H., Chen, Y., Hong, W., Gao, Y. G., & Woon, E. C. Y. (2015). A strategy based on nucleotide specificity leads to a subfamily-selective and cell-active inhibitor of N6-methyladenosine demethylase FTO. Chemical science, 6(1), 112–122. https://doi.org/10.1039/c4sc02554g


### 2. What tools can predict the impact of protein mutations on their stability and dynamic characteristics?
mCSM (https://biosig.lab.uq.edu.au/mcsm/) uses graph-based signatures built from cutoff scanning matrices (CSM) to predict the effects of point mutations on protein stability and on protein-protein, protein-nucleic acid and protein-ligand affinity (Pires et al., 2014).

Using the mCSM server, we can assess the impact of mutations at the Glu234 site.

The amino acid most similar to Glu in physicochemical properties is Asp. Both have negatively charged side chains, so they often play similar roles in proteins, such as forming charge interactions (salt bridges) within the protein structure and acting as proton donors or acceptors. Glu is also closely related to Gln: the two side chains have the same carbon skeleton and length, differing only in that Glu ends in a carboxylate group and Gln in an amide group. Although Glu is negatively charged and Gln is neutral, both are polar and can form hydrogen bonds in aqueous solution. On this basis, we examined the effects of the E234D and E234Q mutations.

The predicted impact of the E234D mutation is shown below: when Glu is mutated to Asp, both protein stability and RNA-binding affinity decrease significantly.

| <img src="https://p.ipic.vip/ntvr5a.png" width="250" height="auto" align="middle" /> |<img src="https://p.ipic.vip/h2ajxz.png" width="250" height="auto" align="middle" /> |<img src="https://p.ipic.vip/foanaf.png" width="400" height="auto" align="middle" /> |
| :--: | :---: | :---: |

The predicted impact of the E234Q mutation is shown below: when glutamate is mutated to glutamine, both protein stability and RNA-binding affinity also decrease significantly. However, the effect of E234Q is smaller than that of E234D, suggesting that glutamine substitutes for glutamate better in this environment. Since Gln keeps the side-chain length of Glu but loses its charge, whereas Asp keeps the charge but has a shorter side chain, the function of Glu234 may depend strongly on the length and geometry of its side chain.

| <img src="https://p.ipic.vip/o2bqdc.png" width="250" height="auto" align="middle" /> |<img src="https://p.ipic.vip/txwj26.png" width="250" height="auto" align="middle" /> |<img src="https://p.ipic.vip/zbxd2u.png" width="400" height="auto" align="middle" /> |
| :--: | :---: | :---: |

For comparison, when a glutamate at another position is mutated to glutamine, the result is as follows: protein stability decreases only slightly, and RNA-binding affinity may even increase. This suggests that the residue at that position is not a key residue. Mutating a key residue typically has a much larger negative effect on protein stability and RNA-binding affinity, as seen for the two Glu234 mutations above.

| <img src="https://p.ipic.vip/tqtee9.png" width="250" height="auto" align="middle" /> |<img src="https://p.ipic.vip/vhc9o5.png" width="250" height="auto" align="middle" /> |<img src="https://p.ipic.vip/774l16.png" width="400" height="auto" align="middle" /> |
| :--: | :---: | :---: |


**References:**

Pires, D. E., Ascher, D. B., & Blundell, T. L. (2014). mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics (Oxford, England), 30(3), 335–342. https://doi.org/10.1093/bioinformatics/btt691

### 3. Use BLAST to find proteins homologous to the FTO active site. What are the key residues in these proteins?

<img src="https://p.ipic.vip/cdzsys.png" width="700" height="auto" align="middle" />

We used the FTO sequence of PDB entry 5ZMD (chain A) as the BLAST query. Among the top 30 homologous proteins, Glu234 is conserved in every sequence. Given its proposed role in FTO, Glu234 may participate directly in the demethylation reaction: the negative charge of its side chain could be important for positioning the RNA correctly or for electron transfer during catalysis. Over the course of evolution, substitutions at Glu234 would likely reduce or abolish protein function and thus harm the survival and reproduction of the organism, so this position is probably under strong negative (purifying) selection, which would explain its conservation among homologs. This result further supports Glu234 being a critical residue in FTO.
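Conservation at a single position can be quantified once the BLAST hits have been aligned with a multiple sequence alignment tool (BLAST itself only reports pairwise alignments to the query). The sketch below maps a residue number of the reference sequence to its alignment column, skipping gaps, and reports the most common residue, its frequency and the Shannon entropy of the column (0 bits means fully conserved). The toy alignment and function names are illustrative assumptions, not the actual BLAST results.

```python
import math
from collections import Counter


def column_for_residue(ref_aligned, residue_number, first_residue=1):
    """Alignment column index holding the given residue number of the reference."""
    number = first_residue - 1
    for col, aa in enumerate(ref_aligned):
        if aa != "-":  # gaps do not consume residue numbers
            number += 1
            if number == residue_number:
                return col
    raise ValueError(f"residue {residue_number} not found in reference")


def column_conservation(alignment, col):
    """(most common residue, its frequency, Shannon entropy in bits), ignoring gaps."""
    residues = [seq[col] for seq in alignment if seq[col] != "-"]
    counts = Counter(residues)
    top, top_count = counts.most_common(1)[0]
    total = len(residues)
    probs = [n / total for n in counts.values()]
    entropy = abs(sum(p * math.log2(p) for p in probs))  # abs() avoids printing -0.0
    return top, top_count / total, entropy


if __name__ == "__main__":
    # Toy alignment; the first sequence is an FTO segment starting at residue 227
    toy = ["AVSWHHDE-NL", "AVSWHHDEKNL", "AISWHHDE-NL"]
    col = column_for_residue(toy[0], 234, first_residue=227)
    print(col, column_conservation(toy, col))
```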

### 4. What effect does the replacement of key residues have on secondary structure?

PSIPRED (PSI-BLAST-based secondary structure prediction) (http://bioinf.cs.ucl.ac.uk/psipred/) builds a position-specific scoring matrix (PSSM) for the query with PSI-BLAST and feeds it to neural networks that predict, for each residue, whether it lies in an α-helix, a β-strand or a coil. We can therefore run PSIPRED on both the original and the mutated sequences and compare the predictions.
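The mutated sequence submitted to PSIPRED can be generated programmatically, which guards against editing the wrong position by hand. The sketch below checks that the expected wild-type residue is present before substituting it and writes 70-column FASTA. The numbering offset is our own inference, not stated in the source: in the 5ZMD chain A sequence shown below, Arg96, Tyr108, Leu109, Val228, His231 and Glu234 all line up if its first residue (E) is numbered 37, so the full sequence would use `first_residue=37`; this should be checked against the residue numbering in the PDB file. The function names are hypothetical.

```python
def apply_point_mutation(sequence, mutation, first_residue=1):
    """Apply a mutation written like 'E234Q' (wild type, residue number, mutant).

    first_residue is the residue number of sequence[0].
    """
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - first_residue
    if not 0 <= index < len(sequence):
        raise ValueError(f"{mutation}: position outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(f"{mutation}: expected {wild_type}, found {sequence[index]}")
    return sequence[:index] + mutant + sequence[index + 1:]


def to_fasta(header, sequence, width=70):
    """Format a sequence as FASTA text with fixed-width lines."""
    lines = [f">{header}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Fragment of 5ZMD chain A covering residues 227-236 (numbering inferred as above)
    fragment = "AVSWHHDENL"
    mutant = apply_point_mutation(fragment, "E234Q", first_residue=227)
    print(to_fasta("mutant_fragment E234Q", mutant))
```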

Original sequence:
```
>pdb|5ZMD|A Chain A, Alpha-ketoglutarate-dependent dioxygenase FTO
EFYQQWQLKYPKLILREASSVSEELHKEVQEAFLTLHKHGCLFRDLVRIKGKDLLTPVSRILIGNPGCTY
KYLNTRLFTVPWPVKGSNIKHTEAEIAAACETFLKLNDYLQIETIQALEELAAKEKANEDAVPLCMSADF
PRVGMGSSYNGQDEVDIKSRAAYNVTLLNFMDPQKMPYLKEEPYFGMGKMAVSWHHDENLVDRSAVAVYS
YSCEGPEEESEDDSHLEGRDPDIWHVGFKISWDIETPGLAIPLHQGDCYFMLDDLNATHKHCVLAGSQPR
FSSTHRVAECSTGTLDYILQRCQLALQNVCDDVDNDDVSLKSFEPAVLKQGEEIHNEVEFEWLRQFWFQG
NRYRKCTDWWCQPMAQLEALWKKMEGVTNAVLHEVKREGLPVEQRNEILTAILASLTARQNLRREWHARC
QSRIARTLPADQKPECRPYWEKDDASMPLPFDLTDIVSELRGQ
```
<img src="https://p.ipic.vip/6magtd.png" width="900" height="auto" align="middle" />

E234Q mutated sequence:
```
>pdb|mutant_5ZMD|A Chain A, Alpha-ketoglutarate-dependent dioxygenase FTO
EFYQQWQLKYPKLILREASSVSEELHKEVQEAFLTLHKHGCLFRDLVRIKGKDLLTPVSRILIGNPGCTY
KYLNTRLFTVPWPVKGSNIKHTEAEIAAACETFLKLNDYLQIETIQALEELAAKEKANEDAVPLCMSADF
PRVGMGSSYNGQDEVDIKSRAAYNVTLLNFMDPQKMPYLKEEPYFGMGKMAVSWHHDQNLVDRSAVAVYS
YSCEGPEEESEDDSHLEGRDPDIWHVGFKISWDIETPGLAIPLHQGDCYFMLDDLNATHKHCVLAGSQPR
FSSTHRVAECSTGTLDYILQRCQLALQNVCDDVDNDDVSLKSFEPAVLKQGEEIHNEVEFEWLRQFWFQG
NRYRKCTDWWCQPMAQLEALWKKMEGVTNAVLHEVKREGLPVEQRNEILTAILASLTARQNLRREWHARC
QSRIARTLPADQKPECRPYWEKDDASMPLPFDLTDIVSELRGQ
```
<img src="https://p.ipic.vip/cpcxox.png" width="900" height="auto" align="middle" />

Surprisingly, the predictions for the two sequences were identical. One likely reason is that PSIPRED relies on evolutionary information for the whole sequence, captured in the PSSM: a single substitution is unlikely to change the set of homologs retrieved by PSI-BLAST very much, so the PSSM, and hence the prediction, changes very little. In addition, Glu and Gln are chemically similar; their side chains have the same size and shape and differ only in a carboxylate (Glu) versus a carboxamide (Gln) group. The neural networks used by PSIPRED may therefore not be sensitive to such a subtle change, giving identical predictions.

It is also possible that the E234Q mutation genuinely leaves the overall fold unchanged, and that the loss of the side-chain negative charge affects only substrate recognition and RNA binding.

Nevertheless, these are predictions based on computational models. Biological experiments would be necessary to explore the real impact of the E234Q mutation.
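Rather than comparing the two PSIPRED result images by eye, the per-residue predictions can be compared directly. This minimal sketch assumes each prediction has already been extracted as a string of three-state labels (H = helix, E = strand, C = coil), one character per residue; the toy strings are hypothetical.

```python
def compare_ss(pred_a: str, pred_b: str, first_resnum: int = 1):
    """Compare two three-state secondary-structure strings residue by residue.

    Returns (fraction of identical states, list of (resnum, state_a, state_b)).
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must cover the same residues")
    if not pred_a:
        raise ValueError("empty prediction")
    diffs = [
        (first_resnum + i, a, b)
        for i, (a, b) in enumerate(zip(pred_a, pred_b))
        if a != b
    ]
    return 1 - len(diffs) / len(pred_a), diffs


# Hypothetical predictions for an 8-residue window starting at residue 230
print(compare_ss("CCHHHHCC", "CCHHHCCC", first_resnum=230))  # (0.875, [(235, 'H', 'C')])
```

An identity of 1.0 with an empty difference list would confirm, residue by residue, that the wild-type and E234Q predictions match.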

### 5. What are the interactions between key residues and substrates?

Interactions between key residues and substrates often involve several molecular mechanisms:

1. Hydrogen bonds, one of the most common types of protein-substrate interaction. Amino, hydroxyl, or carboxyl groups in the side chains of key residues form hydrogen bonds with complementary groups on the substrate, stabilizing binding.
2. Electrostatic interactions. Charged residues interact with oppositely charged parts of the substrate, stabilizing the complex.
3. Hydrophobic interactions. Hydrophobic residues pack closely against hydrophobic parts of the substrate, increasing affinity.
4. Van der Waals forces. Although individually weak, they provide additional stabilization across the contact surface between protein and substrate.
5. Transient covalent bonds. Some key residues form temporary covalent bonds with the substrate, which can be crucial for the progress of the catalytic reaction.
6. Metal-ion positioning. Key residues may hold metal ions in the correct position to interact with the substrate and promote the reaction.

In FTO, the RNA is observed to form numerous hydrogen bonds with surrounding residues, which helps the protein bind the RNA. From the analysis above, the residues around the RNA are also positively charged, favoring interaction with the negatively charged RNA. When hydrophobic amino acids (Gly, Ala, Val, Leu, Ile, Pro, Phe, Met, Trp) are colored yellow and all others blue, the two groups appear fairly evenly distributed, suggesting that FTO may lack large, strongly hydrophobic regions. Because the DNA strand at the catalytic center of FTO is hydrophilic, the catalytic center itself is probably also relatively hydrophilic. Finally, a manganese ion sits at the center of the NTD, and some key residues may help position it, promoting RNA demethylation.

| <img src="https://p.ipic.vip/t4iov2.png" width="300" height="auto" align="middle" />|<img src="https://p.ipic.vip/dv7yn4.png" width="300" height="auto" align="middle" /> |
| :---: | :---: |
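The interaction types listed above can be approximated from atomic coordinates with simple geometric rules. The sketch below is a simplified classifier, not a validated tool: the distance cutoffs (3.5 Å for a candidate hydrogen bond between N/O atoms, 4.0 Å for salt bridges and for carbon-carbon hydrophobic contacts) are common rules of thumb assumed here, hydrogen positions and angles are ignored, and the charge assignments ignore protonation-state subtleties. The atoms and coordinates in the example are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    residue: str                     # e.g. "ARG1" (protein) or "U2" (nucleotide)
    name: str                        # PDB atom name, e.g. "NH1", "OP1"
    element: str                     # "C", "N", "O", ...
    xyz: tuple[float, float, float]  # coordinates in Å


def formal_charge(atom: Atom) -> int:
    """Crude charge assignment for common charged groups (assumed protonation states)."""
    if atom.name in {"OP1", "OP2"}:  # nucleic-acid phosphate oxygens
        return -1
    key = (atom.residue[:3].upper(), atom.name)
    if key in {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}:
        return 1
    if key in {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}:
        return -1
    return 0


def classify_contact(a: Atom, b: Atom):
    """Return a rough interaction label for an atom pair, or None if no contact.

    Salt bridges are checked first, so oppositely charged N/O pairs are
    reported as salt bridges rather than plain hydrogen bonds.
    """
    d = math.dist(a.xyz, b.xyz)
    if formal_charge(a) * formal_charge(b) < 0 and d <= 4.0:
        return "salt bridge"
    if a.element in {"N", "O"} and b.element in {"N", "O"} and d <= 3.5:
        return "hydrogen bond (candidate)"
    if a.element == "C" and b.element == "C" and d <= 4.0:
        return "hydrophobic contact"
    return None


# Hypothetical atoms (coordinates are made up for illustration)
arg = Atom("ARG1", "NH1", "N", (0.0, 0.0, 0.0))
phos = Atom("U2", "OP1", "O", (3.0, 0.0, 0.0))
print(classify_contact(arg, phos))  # salt bridge
```

Applied to every protein-RNA atom pair in a structure, a tally of these labels would give a rough quantitative counterpart to the visual inspection described above.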




## Functional Question D  FTO is responsible for the m6A demethylation of RNA, and m6A mainly occurs within the RRACH motif (R = G or A; H = A, C, or U). How does the FTO active site specifically recognize this sequence through hydrogen bonds?



### 1. What modifications are present on the RNA substrates?
FTO can demethylate N6-methyladenosine (m6A) and N6,2′-O-dimethyladenosine (m6Am) in mRNA, m6A and m6Am in snRNA, and N1-methyladenosine (m1A) in tRNA (Zhang X. et al., 2019).

m6A is the most common methylation modification in mRNA, typically occurring at specific sites within the RNA sequence, particularly in the RRACH motif. m6A affects RNA splicing, export, localization, translation, and degradation. The addition and removal of m6A are carried out by "writer" and "eraser" enzymes, while its recognition and functional implementation depend on "reader" proteins. FTO belongs to the "eraser" category of enzymes.
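The RRACH consensus can be turned into a regular expression by expanding each IUPAC code into the bases it allows (R = A or G, H = A, C, or U). This sketch scans a sequence for all, possibly overlapping, RRACH sites and reports the index of the central A, the candidate m6A position. The example sequence is hypothetical, and a motif match only marks a potential site, not a confirmed modification.

```python
import re

# IUPAC nucleotide codes needed for RRACH (RNA alphabet)
IUPAC_RNA = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG", "H": "ACU"}


def motif_to_regex(motif: str) -> str:
    """Expand an IUPAC motif such as 'RRACH' into '[AG][AG][A][C][ACU]'."""
    return "".join(f"[{IUPAC_RNA[ch]}]" for ch in motif.upper())


def find_motif_sites(seq: str, motif: str = "RRACH", center_offset: int = 2):
    """Return (start, matched text, index of the central A) for every occurrence.

    A zero-width lookahead is used so that overlapping matches are all reported.
    DNA input is accepted by converting T to U.
    """
    rna = seq.upper().replace("T", "U")
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [(m.start(), m.group(1), m.start() + center_offset)
            for m in pattern.finditer(rna)]


print(find_motif_sites("GGACUAGACA"))  # [(0, 'GGACU', 2), (5, 'AGACA', 7)]
```

A sequence such as `GGACG` gives no match because G is excluded at the H position.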

m6Am is a rarer modification found at the first transcribed nucleotide adjacent to the 5′ cap of some RNAs. It resembles m6A but carries an additional methyl group on the 2′-oxygen of the ribose. This dual methylation helps regulate RNA stability and translation efficiency.

The m1A modification occurs at the N1 position of adenosine and is more commonly found in tRNA and rRNA, though it is also observed in mRNA. m1A modifies the secondary structure of RNA and its interaction with proteins, thereby regulating RNA function and stability. Adding m1A can significantly alter the biochemical properties of RNA, such as its ability to bind to the translation machinery.

**References:**

Zhang, X., Wei, L. H., Wang, Y., Xiao, Y., Liu, J., Zhang, W., Yan, N., Amu, G., Tang, X., Zhang, L., & Jia, G. (2019). Structural insights into FTO's catalytic mechanism for the demethylation of multiple RNA substrates. Proceedings of the National Academy of Sciences of the United States of America, 116(8), 2919–2924. https://doi.org/10.1073/pnas.1820574116

### 2. What common features do the bases in the R and H base groups have?

<img src="https://p.ipic.vip/quv6u0.png" width="300" height="auto" align="middle" /> 

"R" comprises A and G, the purines. Both are nitrogen-containing heterocycles with a fused double-ring structure (a five-membered ring fused to a six-membered ring). Because C and U have only a single ring, amino acid residues can readily distinguish A/G from C/U through steric effects.

"H" comprises A, C, and U (any base except G). C and U share a single-ring (pyrimidine) structure, but A is a purine. Distinguishing purines from pyrimidines is relatively straightforward; however, A and G share the same double-ring scaffold, so accepting A while excluding G is more challenging.

Nobeli I. et al. (2001) studied how proteins recognize and discriminate between adenine and guanine moieties in protein-ligand complexes and found significant differences in the protein environments surrounding the two bases. A is typically more exposed to solvent and often relies on water molecules to satisfy its hydrogen-bonding potential, whereas G is almost always deeply buried in the binding site and forms hydrogen bonds mainly with protein residues. The authors also charted the propensity of each of the 20 standard amino acids to be located near A or G, as shown in the diagram below. Charged and polar residues are generally favored, but the preferences differ markedly between the A and G environments. Although A and G share the same purine scaffold, their hydrogen-bonding groups differ (A presents an N6 amino donor and an N1 acceptor, whereas G presents an O6 carbonyl acceptor and N1-H and N2 amino donors), and proteins have evolved to exploit these differences to discriminate accurately between the two bases.

<img src="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC60199/bin/gke58908.jpg" width="700" height="auto" align="middle" /> 
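Returning to the R and H positions discussed above, the point about ring size can be checked mechanically: if every allowed base differs in ring count from every excluded base, ring size alone (a steric feature) is enough to discriminate at that position. This toy check encodes only the number of rings (two for the purines A and G, one for the pyrimidines C and U) and deliberately ignores all other chemistry.

```python
# Number of rings in each RNA base: purines are fused 5+6 bicycles, pyrimidines single 6-rings
RING_COUNT = {"A": 2, "G": 2, "C": 1, "U": 1}


def separable_by_ring_count(allowed: str, excluded: str) -> bool:
    """True if no allowed base has the same ring count as any excluded base."""
    allowed_sizes = {RING_COUNT[b] for b in allowed}
    excluded_sizes = {RING_COUNT[b] for b in excluded}
    return allowed_sizes.isdisjoint(excluded_sizes)


print(separable_by_ring_count("AG", "CU"))  # R position: True
print(separable_by_ring_count("ACU", "G"))  # H position: False (A and G are both purines)
```

At the R position a size-based check suffices, whereas at the H position A must be told apart from G by other features, such as the hydrogen-bonding groups described by Nobeli et al.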

**Reference:**

Nobeli, I., Laskowski, R. A., Valdar, W. S., & Thornton, J. M. (2001). On the molecular discrimination between adenine and guanine by proteins. Nucleic acids research, 29(21), 4294–4309. https://doi.org/10.1093/nar/29.21.4294

### 3. Which specific amino acid residues frequently participate in interactions with RNA bases?
Jeong, E. et al. (2003) studied direct and water-mediated hydrogen bonds between amino acids and RNA bases and found that polar and charged amino acids have a strong tendency to interact with nucleotides. The propensity of an amino acid to form hydrogen bonds depends on the number of electronegative atoms in its side chain and their accessibility. Therefore, when discussing the tendency of amino acids to bind RNA, both the charge of each residue and the structure of its side chain need to be considered.

As shown in the figure, the residues with the greatest tendency to bind RNA include Arg, Lys, Asn, Ser, and Thr. Arg has a positively charged guanidinium group, Lys a positively charged amino group, and Asn, Ser, and Thr polar side chains. These groups can form multiple hydrogen bonds with RNA bases, and the positive charges of Arg and Lys are particularly well suited to interacting with the negatively charged RNA backbone.

<img src="https://ars.els-cdn.com/content/image/1-s2.0-S1016847823137836-gr4_lrg.jpg" width="700" height="auto" align="middle" /> 
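Propensities like those in the figure are typically ratios of an observed frequency to an expected one. The sketch below uses one common generic definition, (share of a residue type among RNA-contacting residues) / (share of that type among all residues in the dataset). Jeong et al. (2003) may normalize differently, so this is not a reimplementation of their method, and the counts are hypothetical. Values above 1 indicate enrichment at the interface.

```python
def interaction_propensity(interacting: dict[str, int],
                           background: dict[str, int]) -> dict[str, float]:
    """Propensity = (fraction among interacting residues) / (fraction in the background set)."""
    n_int = sum(interacting.values())
    n_bg = sum(background.values())
    return {
        aa: (interacting.get(aa, 0) / n_int) / (count / n_bg)
        for aa, count in background.items()
        if count > 0
    }


# Hypothetical counts, for illustration only
interacting = {"ARG": 30, "LEU": 5}
background = {"ARG": 50, "LEU": 100}
for aa, p in interaction_propensity(interacting, background).items():
    print(aa, round(p, 2))
```

With these toy counts Arg comes out enriched (about 2.57) and Leu depleted (about 0.21), the same qualitative pattern as in the figure.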

**References:**

Jeong, E., Kim, H., Lee, S.-W., & Han, K. (2003). Discovering the Interaction Propensities of Amino Acids and Nucleotides from Protein-RNA Complexes. Molecules and Cells, 16(2), 161–167. https://doi.org/10.1016/s1016-8478(23)13783-6

### 4. Which specific amino acid residues frequently participate in interactions with methylated RNA bases?
Because we found few reports describing exactly how FTO recognizes the methyl group on methylated RNA, we can infer how FTO might recognize and bind methylated RNA from the ways other proteins recognize it.

Nicastro, G. et al. (2023) reported that the KH4 domain of the IMP1 protein directly recognizes m6A. Although m6A does not significantly affect the kinetics of IMP1-RNA association, it increases the lifetime of the protein-RNA complex about eightfold. Within the KH4 domain, the RNA backbone interacts with the conserved GxxG loop, and the nucleobase contacts a hydrophobic groove of the protein. The adenine sits on a hydrophobic platform that forms a shallow "hydrophobic cradle" accommodating the methyl group, so recognition of methylated RNA is likely driven by local interactions with the m6A methyl group.
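The eightfold longer lifetime can be translated into an approximate binding free-energy difference. This worked calculation assumes simple one-step binding, a complex lifetime of τ = 1/k_off, and an unchanged association rate constant k_on (in line with the statement that m6A does not significantly affect association); these are simplifying assumptions, not an analysis taken from the paper.

```math
\begin{aligned}
\tau &= \frac{1}{k_{\mathrm{off}}}, \qquad K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}
\;\Rightarrow\; \frac{K_d^{\mathrm{m6A}}}{K_d^{\mathrm{A}}} = \frac{\tau^{\mathrm{A}}}{\tau^{\mathrm{m6A}}} \approx \frac{1}{8} \\
\Delta\Delta G &= RT \ln\frac{K_d^{\mathrm{m6A}}}{K_d^{\mathrm{A}}} \approx (0.593\ \mathrm{kcal\,mol^{-1}}) \times \ln\tfrac{1}{8} \approx -1.23\ \mathrm{kcal\,mol^{-1}} \quad (T = 298.15\ \mathrm{K})
\end{aligned}
```

Under these assumptions, a single methyl group contributing roughly 1.2 kcal/mol of extra binding energy is enough to account for an eightfold longer-lived complex.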

Like IMP1, FTO may rely on hydrophobic interactions and steric complementarity to recognize the m6A methyl group. The hydrophobic methyl group could interact with hydrophobic residues in FTO, stabilizing the complex. FTO might also recognize m6A through the three-dimensional structure of the RNA, for example specific folds or flexible regions in the surrounding sequence that accommodate the spatial requirements of the methylated adenine.

Therefore, hydrophobic residues in the FTO active site may be particularly important for recognizing m6A, with specific recognition and binding of methylated RNA achieved through a combination of hydrophobic interactions and recognition of the RNA's three-dimensional structure.

<img src="https://p.ipic.vip/udh8ml.png" width="500" height="auto" align="middle" /> 

**References:**

Nicastro, G., Abis, G., Klein, P., Esteban-Serna, S., Gallagher, C., Chaves-Arquero, B., Cai, Y., Figueiredo, A. M., Martin, S. R., Patani, R., Taylor, I. A., & Ramos, A. (2023). Direct m6A recognition by IMP1 underlays an alternative model of target selection for non-canonical methyl-readers. Nucleic acids research, 51(16), 8774–8786. https://doi.org/10.1093/nar/gkad534

### 5. What amino acid residues in the FTO active site may form hydrogen bonds with the RNA sequence?

Polar, positively charged, and other nitrogen-containing amino acids are typically involved in hydrogen bonding with RNA. Polar residues have side-chain functional groups that can act as hydrogen-bond donors or acceptors, so they can hydrogen-bond with both RNA bases and the phosphate backbone; examples include Ser, Thr, Tyr, Asn, and Gln, as well as the acidic residues Asp and Glu. Positively charged residues such as Lys and Arg interact electrostatically with the negatively charged parts of RNA and can also form hydrogen bonds. His has an imidazole ring in its side chain whose nitrogens can form hydrogen bonds with nitrogen or oxygen atoms in RNA bases.

According to the study by Toh, J. D. W. et al. (2015), Tyr108, His231, Leu109, Val228, Arg96, and Glu234 are potential key residues of FTO.

<img src="https://p.ipic.vip/srpem6.png" width="500" height="auto" align="middle" /> 

The figure shows that these key residues lie very close to the RNA. Tyr108's side-chain hydroxyl group can act as either a hydrogen-bond donor or acceptor. His231 could hydrogen-bond with oxygen or nitrogen atoms in RNA bases through the nitrogen atoms of its imidazole ring. Arg96 carries a positively charged guanidinium group that could donate hydrogen bonds to the negatively charged phosphate backbone or to acceptor atoms on the bases. The carboxylate group at the end of Glu234's side chain can act as a hydrogen-bond acceptor. All of these residues could therefore hydrogen-bond with the RNA, aiding its recognition and stabilization.
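The side-chain hydrogen-bonding roles described above can be summarized in a small lookup table. This is a simplified sketch: only side-chain heavy atoms (standard PDB atom names) are listed, His is treated as able to donate or accept at either ring nitrogen because its actual role depends on tautomer and protonation state, and Glu is treated as an ionized carboxylate (acceptor only).

```python
# Side-chain heavy atoms that can donate or accept hydrogen bonds (simplified)
SIDE_CHAIN_HBOND = {
    "TYR": {"donors": ["OH"], "acceptors": ["OH"]},
    "HIS": {"donors": ["ND1", "NE2"], "acceptors": ["ND1", "NE2"]},  # tautomer-dependent
    "ARG": {"donors": ["NE", "NH1", "NH2"], "acceptors": []},        # guanidinium
    "GLU": {"donors": [], "acceptors": ["OE1", "OE2"]},              # ionized carboxylate
    "LEU": {"donors": [], "acceptors": []},                          # hydrophobic
    "VAL": {"donors": [], "acceptors": []},                          # hydrophobic
}


def hbond_capacity(residue: str) -> tuple[list[str], list[str]]:
    """Return (donor atoms, acceptor atoms) for a label such as 'Tyr108'."""
    entry = SIDE_CHAIN_HBOND[residue[:3].upper()]
    return entry["donors"], entry["acceptors"]


# Potential key residues listed by Toh et al. (2015)
for res in ["Tyr108", "His231", "Leu109", "Val228", "Arg96", "Glu234"]:
    donors, acceptors = hbond_capacity(res)
    print(f"{res}: donors={donors or '-'} acceptors={acceptors or '-'}")
```

Leu109 and Val228 come out with no side-chain donors or acceptors, which is why their role is more plausibly hydrophobic than hydrogen-bonding.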

In contrast, the side chains of Leu109 and Val228 are hydrophobic and usually do not form hydrogen bonds directly. Given the methyl-recognition mechanism discussed above (the "hydrophobic cradle" of IMP1), these two residues might form a hydrophobic pocket that helps recognize the m6A methyl group.

<img src="https://p.ipic.vip/fvye13.png" width="500" height="auto" align="middle" /> 
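Whether Leu109 and Val228 actually line a pocket around the methyl group could be checked in a structure by listing the residues near the N6-methyl carbon. The sketch below assumes the coordinates have already been read from the structure file; the 4.5 Å cutoff is an assumed value, the hydrophobic set matches the one used for coloring earlier in this report, and the atoms in the example are hypothetical.

```python
import math

# Hydrophobic residue set used for coloring earlier in this report
HYDROPHOBIC = {"GLY", "ALA", "VAL", "LEU", "ILE", "PRO", "PHE", "MET", "TRP"}


def pocket_composition(methyl_xyz, atoms, cutoff=4.5):
    """List residues with any atom within `cutoff` Å of the methyl carbon.

    `atoms` is an iterable of (residue label, residue name, (x, y, z)).
    Returns ({label: is_hydrophobic}, fraction of nearby residues that are hydrophobic).
    """
    near = {}
    for label, resname, xyz in atoms:
        if math.dist(methyl_xyz, xyz) <= cutoff:
            near[label] = resname.upper() in HYDROPHOBIC
    fraction = sum(near.values()) / len(near) if near else 0.0
    return near, fraction


# Hypothetical atoms around a methyl carbon placed at the origin
atoms = [
    ("LEU1", "LEU", (3.9, 0.0, 0.0)),
    ("VAL2", "VAL", (0.0, 4.2, 0.0)),
    ("ARG3", "ARG", (0.0, 0.0, 3.5)),
    ("SER4", "SER", (8.0, 0.0, 0.0)),
]
print(pocket_composition((0.0, 0.0, 0.0), atoms))
```

A high hydrophobic fraction among the residues surrounding the methyl carbon would support the hydrophobic-pocket hypothesis; a pocket dominated by polar residues would argue against it.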

**References:**

Toh, J. D. W., Sun, L., Lau, L. Z. M., Tan, J., Low, J. J. A., Tang, C. W. Q., Cheong, E. J. Y., Tan, M. J. H., Chen, Y., Hong, W., Gao, Y. G., & Woon, E. C. Y. (2015). A strategy based on nucleotide specificity leads to a subfamily-selective and cell-active inhibitor of N6-methyladenosine demethylase FTO. Chemical science, 6(1), 112–122. https://doi.org/10.1039/c4sc02554g