From 30ca3fc3f1eae822cca6abaa821cc37216d0ab8e Mon Sep 17 00:00:00 2001 From: Chenqqian Zhang <100290172+Chengqian-Zhang@users.noreply.github.com> Date: Fri, 15 Mar 2024 09:23:26 +0800 Subject: [PATCH 1/7] =?UTF-8?q?Create=20OpenLAM=EF=BD=9CVisualization=20an?= =?UTF-8?q?d=20Analysis=20of=20Learned=20Representations=20in=20DPA-2:=20E?= =?UTF-8?q?ncoding=20Chemical=20and=20Configurational=20Information.md?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ... DPA-2: Encoding Chemical and Configurational Information.md" | 1 + 1 file changed, 1 insertion(+) create mode 100644 "source/_posts/OpenLAM\357\275\234Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md" diff --git "a/source/_posts/OpenLAM\357\275\234Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md" "b/source/_posts/OpenLAM\357\275\234Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md" new file mode 100644 index 00000000..8b137891 --- /dev/null +++ "b/source/_posts/OpenLAM\357\275\234Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md" @@ -0,0 +1 @@ + From 309bb810de57a13d34cc4ac7ed1143e32eb44719 Mon Sep 17 00:00:00 2001 From: Chenqqian Zhang <100290172+Chengqian-Zhang@users.noreply.github.com> Date: Fri, 15 Mar 2024 09:23:52 +0800 Subject: [PATCH 2/7] =?UTF-8?q?Rename=20OpenLAM=EF=BD=9CVisualization=20an?= =?UTF-8?q?d=20Analysis=20of=20Learned=20Representations=20in=20DPA-2:=20E?= =?UTF-8?q?ncoding=20Chemical=20and=20Configurational=20Information.md=20t?= =?UTF-8?q?o=20OpenLAM-Visualization=20and=20Analysis=20of=20Learned=20Rep?= =?UTF-8?q?resentations=20in=20DPA-2:=20Encoding=20Chemical=20and=20Config?= =?UTF-8?q?urational=20Information.md?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: 8bit --- ...in DPA-2: Encoding Chemical and Configurational Information.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename "source/_posts/OpenLAM\357\275\234Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md" => source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md (100%) diff --git "a/source/_posts/OpenLAM\357\275\234Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md" b/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md similarity index 100% rename from "source/_posts/OpenLAM\357\275\234Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md" rename to source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md From 9843d6b916729b7a740d6b39b87015f88e7203c1 Mon Sep 17 00:00:00 2001 From: Chenqqian Zhang <100290172+Chengqian-Zhang@users.noreply.github.com> Date: Fri, 15 Mar 2024 09:27:08 +0800 Subject: [PATCH 3/7] Update OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md --- ...ng Chemical and Configurational Information.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md b/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md index 8b137891..214df693 100644 --- a/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and 
Configurational Information.md +++ b/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md @@ -1 +1,16 @@ +--- +title: "OpenLAM | Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Informationt" +date: 2024-03-14 +categories: +- OpenLAM +--- +The slogan for OpenLAM is "Conquer the Periodic Table!" We hope to provide a new infrastructure for microscale scientific research and drive the transformation of microscale industrial design in fields such as materials, energy, and biopharmaceuticals by establishing an open-source ecosystem around large microscale models. Relevant models, data, and workflows will be consolidated around the AIS Square; related software development will take place in the DeepModeling open-source community. At the same time, we welcome open interaction from different communities in model development, data sharing, evaluation, and testing. + +See [AIS Square](https://www.aissquare.com/openlam) for more details. + + +## Model Structure + +- The DPA-2 model structure (PyTorch based) has been released, showing a significant increase in fitting and transferability compared to the DPA-1 ([arxiv:2312.15492](https://arxiv.org/abs/2312.15492)). +- A new capability for unsupervised denoise pretraining has been added ([DOI:10.5281/zenodo.10483908](https://doi.org/10.5281/zenodo.10483908)). 
From 3eb03a81ac5dd6bbaf3f484feeeb8cba700caefe Mon Sep 17 00:00:00 2001 From: Chenqqian Zhang <100290172+Chengqian-Zhang@users.noreply.github.com> Date: Fri, 15 Mar 2024 09:28:58 +0800 Subject: [PATCH 4/7] Update OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md --- ...n DPA-2: Encoding Chemical and Configurational Information.md | 1 - 1 file changed, 1 deletion(-) diff --git a/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md b/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md index 214df693..ac8c6826 100644 --- a/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md +++ b/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md @@ -1,4 +1,3 @@ - --- title: "OpenLAM | Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Informationt" date: 2024-03-14 From 260d01d9fdd2fb4c5bd4e9be6f1d40c235a20849 Mon Sep 17 00:00:00 2001 From: Chenqqian Zhang <100290172+Chengqian-Zhang@users.noreply.github.com> Date: Fri, 15 Mar 2024 09:32:35 +0800 Subject: [PATCH 5/7] Update OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md --- ... 
Chemical and Configurational Information.md | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md b/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md index ac8c6826..0f2a9e4f 100644 --- a/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md +++ b/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md @@ -9,7 +9,18 @@ The slogan for OpenLAM is "Conquer the Periodic Table!" We hope to provide a new See [AIS Square](https://www.aissquare.com/openlam) for more details. -## Model Structure +Recently, we reveal a remarkable correspondence between the learned representations by DPA-2 and existing chemical knowledge and the periodic table. And the DPA-2 representation effectively distinguishes between various chemical and configurational environments, atoms sharing similar chemical and configurational environments are closer in the representation space learned by the DPA-2 model. It underscores the potential of the proposed model architecture and the multi-task training scheme. + +![image](https://github.com/Chengqian-Zhang/blog/assets/100290172/c6feeb3b-1c91-4986-88f4-a7340e09e162) + +We present a visualization of the update of single-atom representations by the final repformer layer using a 2-dimensional t-SNE plot, as depicted in Fig.4. In Fig.4(a), colors denote distinct groups in the periodic table, as annotated in Fig.4(b). Notably, Fig.4(a) reveals that representations of identical chemical species tend to form cohesive clusters in the t-SNE latent space. 
The distribution of these representations distinctly aligns with known chemistry: The elements in groups IA and IIA are clustered at the top right of the t-SNE plot; The non-metals cluster predominantly at the top left and bottom; The transition metals, typically positioned at the middle of the periodic table, are accordingly situated in the central region of the t-SNE figure. However, hydrogen (H) presents an exception, exhibiting two clusters: one aligned with metals, primarily in water datasets, and another near non-metals, particularly in molecular datasets such as Drug, ANI-1x, and Transition-1x. + +Elements such as Copper (Cu), Silver (Ag), and Gold (Au) in group IB exhibit a tendency to cluster closer to Lithium (Li) than other transition metals due to their shared possession of one s-electron in the outermost electron shell. Similarly, representations of group IIA elements like Calcium (Ca) and Strontium (Sr) closely associate with those of group IIB elements such as Zinc (Zn) and Cadmium (Cd) owing to their shared possession of two s-electrons in the outermost electron shell. Additionally, there's a discernible trend for elements from the same group in the periodic table to cluster together, as evident with Phosphorus (P), Arsenic (As), and Antimony (Sb) from group VA, and Selenium (Se) and Tellurium (Te) from group VIA. + +The DPA-2 representation effectively distinguishes between various chemical and configurational environments, as showcased in Fig.4(c-e). In Fig.4(c), representations of Aluminum (Al) atoms from the Alloy and OC2M datasets are depicted. The color gradient from purple to yellow indicates the distance of the Al atom from the closest adsorbate in the OC2M dataset, while Al atoms from the Alloy dataset (all-metal environment) are colored red.
Notably, Al atoms distanced from adsorbates closely resemble those in the Alloy dataset, indicative of similar chemical and configurational environments, whereas those in proximity to adsorbates exhibit discernible differences (see the red-circled blue cluster). Similarly, Fig.4(d) illustrates representations of Carbon (C) atoms in the Drug and OC2M datasets. Carbon atoms in adsorbates closer to catalyst materials are positioned farther away in latent space from representations in the Drug dataset due to more pronounced differences in their chemical and configurational environments. + +Moreover, the DPA-2 representation shows insensitivity to DFT labeling accuracy. As demonstrated in Fig.4(e), representations of sulfur (S) in SSE-PBE (labeled with PBE exchange correlation functional) and SSE-PBESol (labeled with PBE-Sol exchange correlation functional) datasets exhibit mutual overlap. The S atoms form two clusters, with one cluster indicating a phosphorus neighboring atom and the other representing a neighboring Si/Ge/Sn atom. + +In summary, our analysis reveals that atoms sharing similar chemical and configurational environments are closer in the representation space learned by the DPA-2 model. Thus, the DPA-2 representation emerges as a promising candidate for encoding chemical and configurational information in molecular and condensed-phase applications. + -- The DPA-2 model structure (PyTorch based) has been released, showing a significant increase in fitting and transferability compared to the DPA-1 ([arxiv:2312.15492](https://arxiv.org/abs/2312.15492)). -- A new capability for unsupervised denoise pretraining has been added ([DOI:10.5281/zenodo.10483908](https://doi.org/10.5281/zenodo.10483908)). 
From 716b2ac8702219fbed801d1bcd46cc33563983be Mon Sep 17 00:00:00 2001 From: Chenqqian Zhang <100290172+Chengqian-Zhang@users.noreply.github.com> Date: Fri, 15 Mar 2024 14:56:45 +0800 Subject: [PATCH 6/7] Update OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md --- ... DPA-2: Encoding Chemical and Configurational Information.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md b/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md index 0f2a9e4f..f597271d 100644 --- a/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md +++ b/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md @@ -1,5 +1,5 @@ --- -title: "OpenLAM | Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Informationt" +title: "OpenLAM | Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information" date: 2024-03-14 categories: - OpenLAM From fe141bff8616475a4c888e772aba449073ce9ea9 Mon Sep 17 00:00:00 2001 From: Chenqqian Zhang <100290172+Chengqian-Zhang@users.noreply.github.com> Date: Fri, 15 Mar 2024 14:58:54 +0800 Subject: [PATCH 7/7] Update OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md --- ...PA-2: Encoding Chemical and Configurational Information.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational 
Information.md b/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md index f597271d..5de7fa63 100644 --- a/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md +++ b/source/_posts/OpenLAM-Visualization and Analysis of Learned Representations in DPA-2: Encoding Chemical and Configurational Information.md @@ -9,11 +9,11 @@ The slogan for OpenLAM is "Conquer the Periodic Table!" We hope to provide a new See [AIS Square](https://www.aissquare.com/openlam) for more details. -Recently, we reveal a remarkable correspondence between the learned representations by DPA-2 and existing chemical knowledge and the periodic table. And the DPA-2 representation effectively distinguishes between various chemical and configurational environments, atoms sharing similar chemical and configurational environments are closer in the representation space learned by the DPA-2 model. It underscores the potential of the proposed model architecture and the multi-task training scheme. +Recently, we revealed a remarkable correspondence between the representations learned by DPA-2 and existing chemical knowledge and the periodic table. The DPA-2 representation effectively distinguishes between various chemical and configurational environments: atoms sharing similar chemical and configurational environments are closer in the representation space learned by the DPA-2 model. This underscores the potential of the proposed model architecture and the multi-task training scheme. ![image](https://github.com/Chengqian-Zhang/blog/assets/100290172/c6feeb3b-1c91-4986-88f4-a7340e09e162) -We present a visualization of the update of single-atom representations by the final repformer layer using a 2-dimensional t-SNE plot, as depicted in Fig.4. In Fig.4(a), colors denote distinct groups in the periodic table, as annotated in Fig.4(b).
Notably, Fig.4(a) reveals that representations of identical chemical species tend to form cohesive clusters in the t-SNE latent space. The distribution of these representations distinctly aligns with known chemistry: The elements in groups IA and IIA are clustered at the top right of the t-SNE plot; The non-metals cluster predominantly at the top left and bottom; The transition metals, typically positioned at the middle of the periodic table, are accordingly situated in the central region of the t-SNE figure. However, hydrogen (H) presents an exception, exhibiting two clusters: one aligned with metals, primarily in water datasets, and another near non-metals, particularly in molecular datasets such as Drug, ANI-1x, and Transition-1x. +We present a visualization of single-atom representations using a 2-dimensional t-SNE plot, as depicted in Fig.4. In Fig.4(a), colors denote distinct groups in the periodic table, as annotated in Fig.4(b). Notably, Fig.4(a) reveals that representations of identical chemical species tend to form cohesive clusters in the t-SNE latent space. The distribution of these representations distinctly aligns with known chemistry: The elements in groups IA and IIA are clustered at the top right of the t-SNE plot; The non-metals cluster predominantly at the top left and bottom; The transition metals, typically positioned at the middle of the periodic table, are accordingly situated in the central region of the t-SNE figure. However, hydrogen (H) presents an exception, exhibiting two clusters: one aligned with metals, primarily in water datasets, and another near non-metals, particularly in molecular datasets such as Drug, ANI-1x, and Transition-1x. Elements such as Copper (Cu), Silver (Ag), and Gold (Au) in group IB exhibit a tendency to cluster closer to Lithium (Li) than other transition metals due to their shared possession of one s-electron in the outermost electron shell. 
Similarly, representations of group IIA elements like Calcium (Ca) and Strontium (Sr) closely associate with those of group IIB elements such as Zinc (Zn) and Cadmium (Cd) owing to their shared possession of two s-electrons in the outermost electron shell. Additionally, there's a discernible trend for elements from the same group in the periodic table to cluster together, as evident with Phosphorus (P), Arsenic (As), and Antimony (Sb) from group VA, and Selenium (Se) and Tellurium (Te) from group VIA.