In [None]:
!pip install numpy pandas scikit-learn plotly ipywidgets ipython openpyxl nbformat

In this notebook, you will explore how **Principal Component Analysis (PCA)** can be applied to chemical data, specifically to element properties and binary compounds. The goal is to understand how elemental features influence chemical similarity and structure-type clustering.

First, set the paths to the data files used throughout the notebook:

In [None]:
filepath = "data/elemental-property-list.xlsx"  # path to the elemental property table

binary_data = "data/pauling.xlsx"  # path to the binary compound data used for visualization
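
Before running the analysis cells, it can help to confirm that both files are where the notebook expects them. The snippet below is a minimal sketch, not part of the `pca` package; `missing_files` is a hypothetical helper introduced here for illustration only.

```python
from pathlib import Path


def missing_files(*paths):
    """Return the subset of the given paths that do not point to an existing file.

    Hypothetical helper for illustration; not part of the `pca` package.
    """
    return [str(p) for p in paths if not Path(p).is_file()]


# Same paths as in the cell above.
missing = missing_files("data/elemental-property-list.xlsx", "data/pauling.xlsx")
if missing:
    print("Missing data files:", ", ".join(missing))
else:
    print("All data files found.")
```

If a file is reported missing, check that the notebook is being run from the directory that contains the `data/` folder.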

**PCA Periodic Table of the Elements**
- You’ll start by visualizing 80 elements using 74 numerical features (atomic, electronic, thermal, and DFT-derived).
- These features are reduced to a 2D space using PCA, and the elements are projected into this new “map” of chemical space.
- By selecting or deselecting features or feature groups, you can explore how the structure of the PCA space changes.
- This section helps you answer: *How do element properties influence their relative positions in feature space?*
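
The reduction described above can be sketched with scikit-learn. This is only an illustration under stated assumptions: the data is a random stand-in with the same shape as the element table (80 rows, 74 columns), the features are standardized before PCA (a common choice, but not confirmed to be what `run_pca_analysis` does), and `project_2d` is a hypothetical helper name.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the element table: 80 elements x 74 numerical features.
# Real values would come from the elemental property spreadsheet.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 74))


def project_2d(features):
    """Standardize each feature column, then project onto the first two principal components.

    Hypothetical helper; standardization is an assumption, not the package's documented behavior.
    """
    scaled = StandardScaler().fit_transform(features)
    pca = PCA(n_components=2)
    coords = pca.fit_transform(scaled)
    return coords, pca.explained_variance_ratio_


# All 74 features.
coords_all, var_all = project_2d(X)

# "Deselecting" features amounts to dropping columns and refitting the PCA.
coords_subset, var_subset = project_2d(X[:, :20])

print(coords_all.shape, coords_subset.shape)
print("Variance explained by PC1 and PC2 (all features):", var_all)
```

Because the PCA is refit on each feature subset, the axes of the two maps are not directly comparable; what changes meaningfully is the relative arrangement of the elements.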

In [None]:
from pca.pca_table import run_pca_analysis

run_pca_analysis(filepath)

**Visualization of Binary Compounds**
- You’ll use historical compound data (from Pauling’s 1929 work) and plot equiatomic binary compounds in the same PCA space.
- Each compound is shown as a line connecting its two constituent elements, with a midpoint marker.
- Compounds are colored by their known crystal structure types (CsCl, NaCl, ZnS).
- You can test how different subsets of features affect the clustering or separation of these structures.
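
The line-and-midpoint representation can be sketched as follows. The 2D coordinates are made-up placeholders rather than real PCA output, and `compound_segment` is a hypothetical helper; only the geometry (a segment between two element positions and its midpoint) reflects the description above.

```python
import numpy as np

# Placeholder 2D coordinates for a few elements (illustrative values, not real PCA output).
element_coords = {
    "Na": np.array([-3.0, 1.0]),
    "Cs": np.array([-5.0, -1.0]),
    "Zn": np.array([1.0, -2.0]),
    "Cl": np.array([4.0, 2.0]),
    "S": np.array([3.0, 0.0]),
}

# Equiatomic binary compounds, each labeled with its structure type.
compounds = [
    ("Na", "Cl", "NaCl"),
    ("Cs", "Cl", "CsCl"),
    ("Zn", "S", "ZnS"),
]


def compound_segment(element_a, element_b):
    """Return the two endpoints and the midpoint of the line joining two elements."""
    start = element_coords[element_a]
    end = element_coords[element_b]
    midpoint = (start + end) / 2.0
    return start, end, midpoint


for a, b, structure in compounds:
    start, end, midpoint = compound_segment(a, b)
    print(f"{a}{b} ({structure} type): midpoint = {midpoint}")
```

In a plot, each segment would be drawn as a line between `start` and `end`, with a marker at `midpoint` colored by structure type.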

In [None]:
from pca.pca_binary import run_pca_analysis_structures

run_pca_analysis_structures(filepath, binary_data)