# Week 5

## Objectives so far
* Code a graph generator for Unity
* Code force fields in Unity
* Code Lennard-Jones forces for polymer chain entanglement
* Code Brownian motion of polymer chains
* Research the steps of crystallisation of cellulose acetate or thermoplastic starch
* Read and update myself on the literature

## July 11

Today I return from New York, so I will summarize the papers that I read on the bus and on campus.

Among the things we were told we could check last time was [this](https://www.youtube.com/watch?v=yofjFQddwHE) video on Transfer Learning. I thought it would be interesting, so I watched it and took notes.

>**Video:** Transfer Learning (C3W2L07)
>
>**Link:** [Video](https://www.youtube.com/watch?v=yofjFQddwHE)
>
>**Important Points:**
>* Suppose we have a Deep Neural Network architecture that takes some input $\textbf{x}$ and produces some output for a task A (let's call the output $\textbf{y}_1$). We can transfer part of the trained DNN to another task B by replacing the last layers of the network with ones that produce the output $\textbf{y}_2$.
>* For meaningful results, Task A must have much more data than Task B.
>* The features learned for Task A must also be useful for Task B.
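
A minimal sketch of the idea in these notes, written in plain Python rather than with a deep learning library. It assumes a toy setting that is not from the video: the hidden layer `FROZEN_LAYER` stands in for features already learned on Task A, and only a new linear output layer is trained on a small Task B dataset. All names, weights and data are illustrative.

```python
import math

# Hypothetical hidden layer "pre-trained" on Task A: (weight, bias) per unit.
FROZEN_LAYER = [(1.0, 0.0), (0.5, 1.0), (-2.0, 0.3)]

def hidden_features(x):
    """Features reused from Task A; these weights are never updated."""
    return [math.tanh(w * x + b) for w, b in FROZEN_LAYER]

def predict(x, head, bias):
    """New output layer for Task B on top of the frozen features."""
    return sum(a * h for a, h in zip(head, hidden_features(x))) + bias

def mean_squared_error(data, head, bias):
    return sum((predict(x, head, bias) - y) ** 2 for x, y in data) / len(data)

def train_head(data, lr=0.1, epochs=2000):
    """Fit only the replaced last layer with stochastic gradient descent."""
    head = [0.0] * len(FROZEN_LAYER)
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            h = hidden_features(x)
            error = predict(x, head, bias) - y
            head = [a - lr * error * hi for a, hi in zip(head, h)]
            bias -= lr * error
    return head, bias

# Small Task B dataset (toy target chosen for illustration).
task_b = [(x, 2.0 * math.tanh(x) - 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
loss_before = mean_squared_error(task_b, [0.0] * len(FROZEN_LAYER), 0.0)
head, bias = train_head(task_b)
loss_after = mean_squared_error(task_b, head, bias)
```

Only `head` and `bias` change during training, which is why a small Task B dataset can be enough when the frozen features are already useful.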

In addition, Paloma sent a paper for ML simulation on coarse grain polymers.

>**Title:** Integration of Machine Learning and Coarse-Grained Molecular Simulations for Polymer Materials: Physical Understandings and Molecular Design
>
>**Author(s):** Danh Nguyen, Lei Tao, Ying Li
>
>**Link:** [PDF Article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8819075/pdf/fchem-09-820417.pdf)
>
>**Important Points:**
>* *Context:* Recent years have seen the development of techniques to synthesize polymers. However, limitations in the design approaches and cost and time issues related to monomer sequence-structure-property relationships have made it difficult to make much progress.
>* However, coarse-grained molecular dynamics (CGMD) simulations and machine learning (ML) algorithms have allowed establishing the structure-function correlation of polymer chains.
>* The chemical structure of monomers, as well as their arrangement, governs the properties of polymers from microstructure to physical and mechanical behavior. For instance, conductivity, elasticity, rigidity or biodegradability can be tuned in sequence-defined polymers.
>* Since polymers are long-chain molecules, their size effect typically originates from their molecular weight rather than from the monomer size. Furthermore, the inter- and intra-molecular interactions between polymer chains can lead to very different microstructures, such as phase separation and crystallization, which influence thermal and mechanical properties dramatically.
>* Among the many molecular dynamics techniques, coarse-grained molecular dynamics (CGMD), rather than all-atom modeling, can tremendously reduce the computational cost and the complexity of the chemical space while maintaining accuracy.
>* Common theories in polymer simulations include field-theoretic computer simulation (FTCS), self-consistent field theory (SCFT)/density functional theory (DFT), dynamic mean-field theory (DMFT), and the integral equation polymer reference interaction site model (PRISM).
>* Of these theories, PRISM is particularly interesting. It describes the liquid-like structural correlations in single- and multi-component polymer melts, solutions, nanocomposites, and complex fluid systems. The theory uses "closure relations" to reflect pairwise interaction potentials acting between components. However, the method has some limitations: it cannot be directly applied to materials that are not liquid-like, and it can be extremely slow to converge.
>* ML tools in the field of polymers face some limitations: **1.** Dataset availability, **2.** Ease of interpretation and **3.** Transferability.
>* *Coarse-grained technique:*
>    * **1.** Map the small groups of atoms from the all-atom (AA) simulation to the beads. The number of heavy atoms per bead represents the level of coarse-graining.
>    * **2.** The interactions between beads are defined by two approaches: bottom-up, where we adopt the AA simulation as reference to derive the force fields or interactions between CG beads; and top-down, where the force field of CG beads is tuned from macroscopic experimental observation. Most famous one: the MARTINI model, with 4-to-1 mapping. Another model is the mesoscopic particle-based model or dissipative particle dynamics (DPD).
>    * **3.** Choose a thermodynamic ensemble; the choice will depend on the purpose and the experimental conditions. The microcanonical statistical-mechanical ensemble (NVE) is a common one.
>* The paper goes over some ML methods: Feed-Forward Neural Networks (FNNs), Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Decision Tree (DT), Gaussian Process Regression (GPR), Generative Models (GANs and VAEs), Bayesian Optimization (BO) and Pareto Active Learning.
>* When coupling CGMD with ML models, which ML algorithm handles sequential data efficiently? → LSTM-RNN (a method from NLP) could be useful.
>* The paper goes over multiple examples of ML coupled with CGMD. It finishes by raising some questions: how many CGMD data-points will be sufficient for the ML training process? How do we featurize molecules effectively for CGMD/ML coupling? Which topology to use?
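
Step 1 of the coarse-graining technique above can be sketched as follows. This is a minimal illustration, assuming atoms are grouped in consecutive order along the chain and each bead is placed at the center of mass of its group. Real mappings (MARTINI's, for example) are chosen by chemical reasoning, and the function names here are hypothetical.

```python
def center_of_mass(atoms):
    """atoms: list of (mass, (x, y, z)). Returns (position, total mass)."""
    total = sum(mass for mass, _ in atoms)
    position = tuple(sum(mass * pos[k] for mass, pos in atoms) / total
                     for k in range(3))
    return position, total

def coarse_grain(atoms, atoms_per_bead=4):
    """Group consecutive heavy atoms into beads (4-to-1 mapping by default)."""
    beads = []
    for start in range(0, len(atoms), atoms_per_bead):
        group = atoms[start:start + atoms_per_bead]
        position, mass = center_of_mass(group)
        beads.append({"mass": mass, "position": position})
    return beads

# Toy chain: eight carbon-like atoms spaced one unit apart along x.
chain = [(12.0, (float(i), 0.0, 0.0)) for i in range(8)]
beads = coarse_grain(chain)
```

The `atoms_per_bead` argument plays the role of the level of coarse-graining mentioned in step 1.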

Elena also provided some useful notes on Force Fields and Thermoplastic Starch, both of which are linked below:

>**Title:** Force Fields Techniques
>
>**Author(s):** Elena Garza
>
>**Link:** [PDF](https://nbviewer.org/github/LouisTheLuis/MCSC-Summer-2022/blob/master/Summaries/Force%20Field%20Techniques.pdf)

>**Title:** Thermoplastic Starch
>
>**Author(s):** Elena Garza
>
>**Link:** [PDF](https://nbviewer.org/github/LouisTheLuis/MCSC-Summer-2022/blob/master/Summaries/Thermoplastic%20Starch.pdf)

Neil also shared some Stanford lectures regarding Neural Network potentials. They are offered [here](https://cs371.stanford.edu/2018_slides/learning-energy-functions.pdf).

## July 12

### Meeting for Unity (11:20am - 7/12/2022)

For starters, we can go to the [Unity website](https://unity.com/) and download the most recent version of Unity Hub. Then we start a 3D project.

Check out ML-Agents and Unity Barracuda. There is a tutorial online for using Unity ML-Agents [here](https://www.gocoder.one/blog/3d-volleyball-environment-with-unity-ml-agents). We can also use [this](https://learnxinyminutes.com/) for a quick C# refresher. Unity itself has tutorials online and on YouTube.

## July 13
In the end, we did not have a meeting. I spent essentially the entire day practicing with Unity.

## July 14

>**Title:** Modeling and Simulations of Polymers: A Roadmap
>
>**Author(s):** Thomas E. Gartner, III and Arthi Jayaraman
>
>**Link:** [PDF Article](https://pubs.acs.org/doi/pdf/10.1021/acs.macromol.8b01836)
>
>**Important Points:**
>* *Which model to use?* → If you want to understand the local, monomer-level arrangements, fluctuations and interactions within a polymer system, use atomistic models. If you want to predict structure/morphology for polymeric systems over a broad range of conditions, use coarse-grained (CG) models.
>* One may choose to alter the strength of the pairwise Lennard-Jones interactions and/or create new dihedral interactions between the standard MARTINI beads as one pleases (to agree with experimental observations).
>* *How can the data be simulated?* → The two main simulation methods for polymers are Monte Carlo (MC) and molecular dynamics (MD). The paper cites references for both.
>* When setting up a simulation, you:
>    * **1.** Choose a thermodynamic ensemble (e.g. NVE ensemble).
>    * **2.** Choose the size of the system (i.e. number of atoms or CG beads and/or initial simulation box size).
>    * **3.** Equilibrate the system.
>    * **4.** Sample.
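
The four steps above can be sketched as a tiny molecular dynamics run. This is a minimal illustration, not the roadmap paper's protocol: it assumes reduced Lennard-Jones units, unit masses, no periodic boundaries, and plain velocity Verlet integration without a thermostat (which is what makes it an NVE run). All function names and parameter values are hypothetical.

```python
import itertools

def lj_forces(positions):
    """Forces and potential energy for Lennard-Jones beads (epsilon = sigma = 1)."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    energy = 0.0
    for i, j in itertools.combinations(range(len(positions)), 2):
        d = [positions[i][k] - positions[j][k] for k in range(3)]
        r2 = sum(c * c for c in d)
        inv6 = 1.0 / r2 ** 3
        energy += 4.0 * (inv6 * inv6 - inv6)
        # Magnitude of -dU/dr divided by r, applied along the separation vector.
        f_over_r = 24.0 * (2.0 * inv6 * inv6 - inv6) / r2
        for k in range(3):
            forces[i][k] += f_over_r * d[k]
            forces[j][k] -= f_over_r * d[k]
    return forces, energy

def kinetic_energy(velocities):
    return 0.5 * sum(c * c for v in velocities for c in v)

def velocity_verlet(positions, velocities, forces, dt):
    """Advance one time step in place; returns the new forces and potential."""
    for i in range(len(positions)):
        for k in range(3):
            velocities[i][k] += 0.5 * dt * forces[i][k]
            positions[i][k] += dt * velocities[i][k]
    new_forces, potential = lj_forces(positions)
    for i in range(len(positions)):
        for k in range(3):
            velocities[i][k] += 0.5 * dt * new_forces[i][k]
    return new_forces, potential

def run_nve(side=2, spacing=1.5, dt=0.002, equil_steps=200, sample_steps=300):
    # Step 1: ensemble. No thermostat or barostat, so N, V and E are fixed.
    # Step 2: system size. Here side**3 beads start on a small cubic lattice.
    positions = [[spacing * x, spacing * y, spacing * z]
                 for x in range(side) for y in range(side) for z in range(side)]
    velocities = [[0.0, 0.0, 0.0] for _ in positions]
    forces, potential = lj_forces(positions)
    initial_energy = potential + kinetic_energy(velocities)
    # Step 3: equilibration. These steps are run but not recorded.
    for _ in range(equil_steps):
        forces, potential = velocity_verlet(positions, velocities, forces, dt)
    # Step 4: sampling. Record the total energy at every step.
    samples = []
    for _ in range(sample_steps):
        forces, potential = velocity_verlet(positions, velocities, forces, dt)
        samples.append(potential + kinetic_energy(velocities))
    return initial_energy, samples

initial_energy, samples = run_nve()
```

In an NVE run the sampled total energy should stay close to `initial_energy`, which is a quick sanity check on the time step.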

### Meeting (2:00pm - 7/14/2022)

*Neil* → Suppose we have a graph. To turn it into embeddings, we have to run a GNN on the graph and then feed the embeddings into the RL algorithms. What would be the objective function of the GNN? How would this architecture even implement backpropagation across the RL and GNN parts?
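
A minimal sketch of how a graph can be turned into an embedding, assuming the simplest possible message-passing scheme (mean aggregation with no trainable weights). A real GNN would apply learned weight matrices and a nonlinearity at each step, and the question of its objective function stays open. The function names and the three-node chain are hypothetical.

```python
def message_passing(adjacency, features):
    """One round: each node's new feature is the mean over itself and its neighbours."""
    updated = {}
    for node, neighbours in adjacency.items():
        group = [features[node]] + [features[n] for n in neighbours]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated

def graph_embedding(features):
    """Mean-pool node features into one fixed-size vector for the whole graph."""
    rows = list(features.values())
    return [sum(column) / len(rows) for column in zip(*rows)]

# Toy linear chain of three monomers with a single scalar feature each.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0], 1: [0.0], 2: [1.0]}
updated = message_passing(adjacency, features)
embedding = graph_embedding(updated)
```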

*Paloma* → Separate into two different trainings: one that trains on the graph, and another that trains on the coarse-grained model. According to Greg, CG simulations are almost lines; they are **very** simplified. We could train a polymer chain and make simulations that we could use later (*pre-training*).

*Andy* → In the MARTINI model, they group all monomers into beads of certain types (polar, nonpolar, etc.). The manual grouping of monomers into beads will be very limiting, and the beads might not contain the information we want.

We could set up an Actor vs. Discriminator ML system: Graph → *GNN* → CG → *RL* → Results.

What information is on each bead? Maybe some force fields, weight and kinematics of the joints. What will be the objective function of the RL part? Maybe bond angles, Lennard-Jones forces, Brownian motion.

*Jehan* → They will compile some data in an Excel sheet to validate our model. She will do some research on the radius of gyration for cellulose acetate (preferably by next Thursday).
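
For reference, the radius of gyration of a single conformer can be computed from bead positions as the mass-weighted root-mean-square distance from the center of mass. The sketch below assumes beads are given as plain coordinate tuples; the function name and inputs are hypothetical, not part of our model yet.

```python
import math

def radius_of_gyration(positions, masses=None):
    """Mass-weighted radius of gyration of a set of beads.

    positions: list of (x, y, z) tuples; masses: optional list, 1.0 each by default.
    """
    if masses is None:
        masses = [1.0] * len(positions)
    total = sum(masses)
    center = [sum(m * p[k] for m, p in zip(masses, positions)) / total
              for k in range(3)]
    squared = sum(m * sum((p[k] - center[k]) ** 2 for k in range(3))
                  for m, p in zip(masses, positions)) / total
    return math.sqrt(squared)

# Two equal beads two units apart: each sits one unit from the center of mass.
rg = radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```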

There will be a meeting with Greg tomorrow (Friday). We will present to him.

The point is to create data synthetically. The GNN is data-driven, however, so we will need to use PoLyInfo.

## July 15

>**Title:** Coarse grained force field for the molecular simulation of natural gases and condensates
>
>**Author(s):** Carmelo Herdes, Tim S. Totton, Erich A. Müller
>
>**Link:** [PDF Article](https://reader.elsevier.com/reader/sd/pii/S0378381215300297?token=144878D7CAAB8AB8B4BDEAB7A4B898F8363525D03264022E867F9C2A891C1E0C1219E10A566F0F335AAC2B0AA5C91E8A&originRegion=us-east-1&originCreation=20220715145735)
>
>**Important Points:**
>* Typical crude oil will be made of thousands of different chemical species, of similar chemical nature but varying in molecular size, morphology, and thermophysical behaviour. 
>* It just isn't possible to model these complex systems by taking all distinct molecules in the system into account. Thus, two schemes have become mainstream tools to model these mixtures: either the description as a continuum distribution, or the description as a discrete but finite set of pseudo-components.
>* However, classical molecular simulations have risen as a modern approach to study thermophysical properties of fluid mixtures. 
>* The approach taken by these researchers consists of employing a molecular-based Equation of State (EoS) to parametrize a force field that can be employed in molecular simulations.
>* The Statistical Associating Fluid Theory (SAFT) is a perturbation theory used to describe quantitatively the volumetric properties of fluids. There are many versions, all of which differ in the underlying intermolecular potential employed to describe the unbonded particles. The current work focuses on SAFT-VR Mie → SAFT-γ, which describes the macroscopic properties of a fluid interacting through a Mie potential of the form $\phi(r) = C\epsilon\left[\left(\frac{\sigma}{r}\right)^{\lambda_r}-\left(\frac{\sigma}{r}\right)^{\lambda_a}\right]$
>* In the CG application of the SAFT models one considers spherical elements that correspond to a chemical moiety comprised of several atoms. These coarse-grained models do not provide information on the intramolecular interactions, as these are all averaged out during the fitting procedure.
>* The SAFT CG simulations are able to predict the phase behaviour of light crude oil mixtures and allow the simulation of reasonably large systems. This is enough to observe complex dynamics like cluster formation and phase segregation.
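
The Mie potential above can be written as a short function. The notes do not spell out the prefactor $C$; the sketch assumes the standard normalization $C = \frac{\lambda_r}{\lambda_r-\lambda_a}\left(\frac{\lambda_r}{\lambda_a}\right)^{\lambda_a/(\lambda_r-\lambda_a)}$, which makes the well depth exactly $\epsilon$ and reduces to the Lennard-Jones factor of 4 for exponents 12 and 6. Function names and the example exponents are illustrative.

```python
def mie_prefactor(lam_r, lam_a):
    """Normalization C that sets the minimum of the Mie potential to -epsilon."""
    return (lam_r / (lam_r - lam_a)) * (lam_r / lam_a) ** (lam_a / (lam_r - lam_a))

def mie_potential(r, epsilon, sigma, lam_r, lam_a):
    """phi(r) = C * epsilon * [(sigma / r)**lam_r - (sigma / r)**lam_a]."""
    c = mie_prefactor(lam_r, lam_a)
    return c * epsilon * ((sigma / r) ** lam_r - (sigma / r) ** lam_a)

def mie_minimum(sigma, lam_r, lam_a):
    """Separation at which the Mie potential reaches its minimum."""
    return sigma * (lam_r / lam_a) ** (1.0 / (lam_r - lam_a))

lj_factor = mie_prefactor(12.0, 6.0)          # Lennard-Jones special case
well = mie_potential(mie_minimum(1.0, 15.0, 6.0), 2.5, 1.0, 15.0, 6.0)
```

Changing `lam_r` makes the repulsive wall softer or harder while the well depth stays at `epsilon`.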


### Meeting with Greg

[Link to the Presentation](https://docs.google.com/presentation/d/107dPCTQEh7cfxZG0St0OPPqGGva6eeTglwci0ZXWHXk/edit#slide=id.p)

*Question raised by Greg:* Does this ML model output the average radius of gyration? **Yes, we can train it so that this is the case.**

*Question raised by Greg:* How do we account for all variations of conformers (that vary by thermal properties and torsion angles)?

*Conclusion:* Energy is more important than the radius of gyration with regard to the objective.

*Goal:* Get that conformer in Unity. Get a complete list of the molecular dynamics forces at play for a molecular conformer. Take a look at coarse-grained simulations (enough detail to get information and enough to work with).

Start with Unity on Monday.

>**Title:** The MARTINI Force Field: Coarse Grained Model for Biomolecular Simulations
>
>**Author(s):** Siewert J. Marrink, H. Jelger Risselada, Serge Yefimov, D. Peter Tieleman, and Alex H. de Vries
>
>**Link:** [PDF Article](https://pubs.acs.org/doi/pdf/10.1021/jp071097f)
>
>**Important Points:**
>* The idea behind MARTINI is to aim for a broader range of applications without the need to reparametrize the model each time, instead of focusing on an accurate reproduction in one particular context.
>* The MARTINI force field model addressed problems of the previous model:
>    * The spontaneous curvature of coarse-grained phospholipids being too negative.
>    * The CG water model having a tendency to freeze too easily.
>    * An overly coarse definition of interaction energy levels, which made mapping CG beads to real compounds very difficult.
>
>  It did so by:
>    * providing more interaction energy levels and particle types
>    * performing a thorough analysis of partition free energies linked to chemical functional groups
>* Coined *MARTINI*, after the nickname for the city of Groningen where the force field was developed.
>* **Model**:
>    * **1.** Interaction Sites: 4-to-1 mapping (i.e. on average four heavy atoms are mapped to a single center; different for ring structures). Only four main types of interaction sites are considered: polar (P), nonpolar (N), apolar (C), and charged (Q). Each type has several subtypes, distinguished by hydrogen-bonding capabilities or degree of polarity.
>    * **2.** Nonbonded Interactions: A shifted Lennard-Jones (LJ) 12-6 potential energy function is used to describe the nonbonded interactions $$U_{LJ}(r) = 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r}\right)^{12}-\left(\frac{\sigma_{ij}}{r}\right)^6\right]$$ where $\sigma_{ij}$ represents the closest distance of approach between two particles and $\epsilon_{ij}$ the strength of their interaction. In addition, charged groups (type Q) interact through the Coulombic potential energy function $$U_{el}(r) = \frac{q_iq_j}{4\pi\epsilon_0\epsilon_rr}$$
>    * **3.** Bonded Interactions: Described by a weak harmonic potential $$V_{bond}(R) = \frac{1}{2}K_{bond}(R-R_{bond})^2$$
>    * **4.** Ring Particles: Includes as many CG sites as necessary in order to keep the ring geometry, typically resulting in a 2 or 3 to 1 mapping of ring atoms onto CG beads.
>    * **5.** Antifreeze Particles: An antifreeze agent is used to prevent the unwanted freezing of the CG water.  They interact as a special particle type denoted $BP_4$.
>    * **6.** Simulation Parameters: The simulations were performed with the GROMACS simulation package version 3.0. 
>    * **7.** Interpretation of Time Scale: In comparison to atomistic models, the dynamics observed with CG models is faster as the underlying energy landscape is much smoother as a result of larger particle sizes.
>    * **8.** Topologies: Solvents, Ions, Phospholipids, and Cholesterol are modelled in particular configurations.
>* *Results:* improved behavior of lipid bilayers in terms of the stress profile across the bilayer and its tendency to form pores. Accurate agreement with all-atom simulations for the free energy of lipid desorption and (to a lesser extent) flip-flopping across the bilayer.
>* *Limitations:* The model has been parametrized for the fluid phase, and thus properties of solids such as crystal packing are not expected to be accurate. The thermodynamic behavior of solid/fluid and gas/fluid interfaces is problematic too. The inherent entropy loss on coarse-graining is compensated by a reduced enthalpy term.
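
The three potential energy functions in the model section above can be sketched as plain functions. This is a minimal illustration and not the GROMACS implementation: it leaves out the shift applied to the Lennard-Jones and Coulomb terms, and it assumes GROMACS-style units (nm, kJ/mol, elementary charges). The constants `COULOMB_FACTOR` and `EPSILON_R`, and the example values in the usage lines, are assumptions that are not in the notes above and should be checked against the paper.

```python
# Assumed value of 1 / (4 * pi * epsilon_0) in kJ mol^-1 nm e^-2 (GROMACS units).
COULOMB_FACTOR = 138.935458
# Assumed relative dielectric constant used for screening.
EPSILON_R = 15.0

def lennard_jones(r, epsilon, sigma):
    """Unshifted 12-6 Lennard-Jones energy between two CG beads."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)

def coulomb(q_i, q_j, r, epsilon_r=EPSILON_R):
    """Screened Coulomb energy between two charged (type Q) beads."""
    return COULOMB_FACTOR * q_i * q_j / (epsilon_r * r)

def harmonic_bond(r, k_bond, r_bond):
    """Weak harmonic bond energy between two bonded beads."""
    return 0.5 * k_bond * (r - r_bond) ** 2

# Example values chosen for illustration only.
sigma, epsilon = 0.47, 5.0
u_min = lennard_jones(2.0 ** (1.0 / 6.0) * sigma, epsilon, sigma)
u_bond = harmonic_bond(0.57, 1250.0, 0.47)
```

The Lennard-Jones minimum sits at $r = 2^{1/6}\sigma_{ij}$ with depth $\epsilon_{ij}$, which is a convenient check when porting these functions to Unity.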