## 2. Unsupervised Learning in the Ising Model
In this problem you will use principal component analysis (PCA) to identify phases in the Ising
model without any explicit labels. You will reproduce Figures 1 and 2 of L. Wang, Phys. Rev. B 94,
195105 (2016), https://arxiv.org/abs/1606.00318.

As in problem 1, the data is stored in an $M\times N$ matrix $X$, where each of the $M$ rows stores a spin configuration (a set of integers $\pm1$ corresponding to up/down) for a system with $N = L^2$ spins. The data files corresponding to $L = 20, 40, 80$ are included in the data directory of the course repository as compressed files: Ising2D_config_LZ.dat.gz where $Z=20,40,80$.
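Each row of $X$ is a flattened $L\times L$ lattice. Below is a minimal sketch of recovering the 2D lattice from one row. It assumes row-major (C-order) flattening, which the data files do not state, and the helper name `row_to_lattice` is hypothetical.

```python
import numpy as np

def row_to_lattice(row: np.ndarray) -> np.ndarray:
    """Reshape one flattened configuration of N = L^2 spins into an L x L array.

    Assumes the lattice was flattened in row-major (C) order.
    """
    N = row.size
    L = int(round(np.sqrt(N)))
    if L * L != N:
        raise ValueError(f"row length {N} is not a perfect square")
    if not np.all(np.abs(row) == 1):
        raise ValueError("spins must be +1 or -1")
    return row.reshape(L, L)

# Synthetic example: a random 4 x 4 configuration stored as a length-16 row
rng = np.random.default_rng(0)
row = rng.choice([-1, 1], size=16)
lattice = row_to_lattice(row)
```

Applied to a row of `x_20`, this would give a $20\times 20$ array that can be displayed with `plt.imshow` to check what a low- or high-temperature configuration looks like.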

Each file contains 100 spin configurations at each of the 20 temperatures $T/J = 1.0, 1.1, 1.2, \ldots, 2.9$, so that $M = 2000$ for each lattice size.

For each $L$, a corresponding file named Ising2D_temps_LZ.dat (with $Z = 20,40,80$) stores the temperature at which each configuration was generated.
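Before using the temperature file for colouring, it is worth checking that it really holds 20 temperatures with 100 configurations each. A minimal sketch follows. The helper name is hypothetical, and rounding to one decimal place is an assumption based on the grid $1.0, 1.1, \ldots, 2.9$.

```python
import numpy as np

def count_per_temperature(T: np.ndarray, decimals: int = 1):
    """Return the distinct temperatures in T and the number of configurations at each.

    Temperatures are rounded to `decimals` places first so that tiny
    floating-point differences do not split one temperature into two.
    """
    temps, counts = np.unique(np.round(T, decimals), return_counts=True)
    return temps, counts

# Synthetic stand-in for the temperature file: T/J = 1.0, 1.1, ..., 2.9, 100 configurations each
T_grid = 1.0 + 0.1 * np.arange(20)
Tval = np.repeat(T_grid, 100)
temps, counts = count_per_temperature(Tval)
```

For the course data, `count_per_temperature(Tval_20)` should return 20 temperatures with a count of 100 each if the files match the description above.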

(a) Read in the spin configurations of the Ising model for each lattice size and determine the principal components $\mathbf{v}_j$. Make a scatter plot of the first two projected principal components, $\mathbf{x}'_1=\mathbf{X}\mathbf{v}_1$ vs. $\mathbf{x}'_2=\mathbf{X}\mathbf{v}_2$, for each lattice size. Do you observe any trends as $L$ increases?
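The steps in (a) can be collected into one function: centre the data, form the sample covariance matrix, diagonalize it, sort the eigenpairs in descending order and project. This is a minimal sketch with a hypothetical name. It uses `np.linalg.eigh`, which, like `scipy.linalg.eigh`, returns eigenvalues in ascending order. It projects the centred data; projecting the uncentred $\mathbf{X}$ instead only shifts each $\mathbf{x}'_j$ by the constant $\bar{\mathbf{x}}\cdot\mathbf{v}_j$, where $\bar{\mathbf{x}}$ is the column mean.

```python
import numpy as np

def pca(X: np.ndarray):
    """Principal component analysis of an M x N data matrix X (rows = samples).

    Returns the eigenvalues of the sample covariance matrix in descending order,
    the matching eigenvectors as the columns of V, and the projections of the
    centred data onto them (column j holds x'_{j+1}).
    """
    M = X.shape[0]
    Xc = X - X.mean(axis=0)           # centred copy; X itself is left unchanged
    cov = Xc.T @ Xc / (M - 1)         # N x N sample covariance matrix
    lam, V = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    order = np.argsort(lam)[::-1]     # indices for descending order
    lam, V = lam[order], V[:, order]
    return lam, V, Xc @ V

# Synthetic example: 200 random configurations of N = 16 spins
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(200, 16))
lam, V, proj = pca(X)
x1, x2 = proj[:, 0], proj[:, 1]       # first two projected components for the scatter plot
```

Applying `pca` to `x_20`, `x_40` and `x_80` in turn gives the spectra and projections needed for parts (a) to (c). The variance of each projected column equals the corresponding eigenvalue, which is a convenient sanity check.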

(b) Label the points in your plot such that they are coloured according to their temperature and
compare with Figure 2 of the Wang reference. Can you distinguish between the phases of the
2D Ising model?
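One way to make the phase distinction in (b) quantitative is to note the following. If $\mathbf{v}_1$ is close to the uniform vector $(1,\ldots,1)/\sqrt{N}$ (worth checking by plotting $\mathbf{v}_1$), then $x'_1 \approx \sum_i s_i/\sqrt{N} = \sqrt{N}\,m$, up to the overall sign of $\mathbf{v}_1$ and the constant shift from centring. Here $m$ is the magnetization per spin. Averaging $|x'_1|$ at each temperature should then be large in the ordered phase and small in the disordered phase, with the change near the exact critical temperature $T_c/J = 2/\ln(1+\sqrt{2}) \approx 2.269$. The sketch below uses a hypothetical helper name and made-up toy numbers.

```python
import numpy as np

def mean_abs_by_temperature(x1: np.ndarray, T: np.ndarray):
    """Average |x'_1| over all configurations that share the same temperature."""
    temps = np.unique(T)
    means = np.array([np.abs(x1[T == t]).mean() for t in temps])
    return temps, means

# Toy data (made up): large |x'_1| at low T, small |x'_1| at high T
x1 = np.array([19.0, -21.0, 1.0, -3.0])
T = np.array([1.0, 1.0, 2.9, 2.9])
temps, means = mean_abs_by_temperature(x1, T)  # means: 20.0 at T=1.0, 2.0 at T=2.9
```

Dividing `means` by $\sqrt{N} = L$ puts the three lattice sizes on a common scale, which makes the sharpening of the transition with increasing $L$ easier to see.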

(c) Consider now the explained variance ratios

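$r_l=\frac{\lambda_l}{\sum_{i=1}^N \lambda_i}$,

where $\lambda_l$ is the eigenvalue associated with the principal component $\mathbf{v}_l$, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$.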
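A small worked example of the definition follows; the function name is hypothetical. Sorting first makes $r_1$ correspond to the largest eigenvalue, and the ratios sum to one by construction.

```python
import numpy as np

def explained_variance_ratio(lam: np.ndarray) -> np.ndarray:
    """r_l = lambda_l / sum_i lambda_i, with the eigenvalues sorted in descending order."""
    lam = np.sort(lam)[::-1]
    return lam / lam.sum()

# Worked example: eigenvalues 3, 1, 0.5, 0.5 sum to 5, so r = 0.6, 0.2, 0.1, 0.1
r = explained_variance_ratio(np.array([0.5, 3.0, 1.0, 0.5]))
```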

Plot the largest 10 values of $r_l$ for each lattice size and compare with Figure 1 of the reference.
How many principal components are needed to explain how the Ising spin configurations vary
as a function of temperature?
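The centred data matrix has rank at most $M-1$. For $L=80$ ($N = 6400 > M = 2000$), most covariance eigenvalues are therefore zero and do not change the denominator of $r_l$. The spectrum can instead be obtained from the singular values $s_l$ of the centred $M\times N$ matrix via $\lambda_l = s_l^2/(M-1)$, which avoids diagonalizing the $N\times N$ covariance matrix. The sketch below uses this SVD route, an alternative to the notebook's `eigh` approach, under the same centring convention. The helper name is hypothetical.

```python
import numpy as np

def top_variance_ratios(X: np.ndarray, k: int = 10) -> np.ndarray:
    """Largest k explained-variance ratios of X from the SVD of the centred data.

    The singular values s of the centred M x N matrix satisfy lambda = s**2 / (M - 1),
    so the N x N covariance matrix is never formed.
    """
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in descending order
    lam = s**2 / (X.shape[0] - 1)
    return (lam / lam.sum())[:k]

# Synthetic data to cross-check against the covariance-eigenvalue route
rng = np.random.default_rng(1)
X = rng.choice([-1.0, 1.0], size=(50, 16))
r_svd = top_variance_ratios(X, k=10)

Xc = X - X.mean(axis=0)
lam_eig = np.linalg.eigvalsh(Xc.T @ Xc / (X.shape[0] - 1))[::-1]  # descending
r_eig = lam_eig / lam_eig.sum()
```

For the course data, `top_variance_ratios(x_80)` would give the ten largest $r_l$ for $L = 80$. Plotting them against $l = 1, \ldots, 10$ on a logarithmic axis keeps the small ratios visible next to $r_1$.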


In [15]:
# %load ./include/header.py
import numpy as np
import matplotlib.pyplot as plt
import sys
from tqdm import trange,tqdm
sys.path.append('./include')
import ml4s
import scipy.linalg
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
plt.style.use('./include/notebook.mplstyle')
np.set_printoptions(linewidth=120)
ml4s._set_css_style('./include/bootstrap.css')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

In [13]:
x_20 = np.loadtxt('data/Ising2D_config_L20.dat.gz')
Tval_20 = np.loadtxt('data/Ising2D_temps_L20.dat')
x_40 = np.loadtxt('data/Ising2D_config_L40.dat.gz')
Tval_40 = np.loadtxt('data/Ising2D_temps_L40.dat')
x_80 = np.loadtxt('data/Ising2D_config_L80.dat.gz')
Tval_80 = np.loadtxt('data/Ising2D_temps_L80.dat')
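The six `np.loadtxt` calls above can be folded into one loader that also checks the shapes promised in the problem statement. This is a sketch; `load_ising` is a hypothetical name, and the self-check writes small synthetic files with the same naming scheme. `np.savetxt` gzip-compresses a file whose name ends in `.gz`, and `np.loadtxt` reads such files transparently.

```python
import os
import tempfile
import numpy as np

def load_ising(L: int, data_dir: str = "data"):
    """Load the configurations and temperatures for lattice size L and check their shapes."""
    X = np.loadtxt(os.path.join(data_dir, f"Ising2D_config_L{L}.dat.gz"), ndmin=2)
    T = np.loadtxt(os.path.join(data_dir, f"Ising2D_temps_L{L}.dat"), ndmin=1)
    if X.shape[1] != L * L:
        raise ValueError(f"expected N = L^2 = {L * L} spins per row, got {X.shape[1]}")
    if X.shape[0] != T.shape[0]:
        raise ValueError(f"{X.shape[0]} configurations but {T.shape[0]} temperatures")
    return X, T

# Self-check on synthetic files (L = 4, 6 configurations) in a temporary directory
with tempfile.TemporaryDirectory() as d:
    rng = np.random.default_rng(0)
    np.savetxt(os.path.join(d, "Ising2D_config_L4.dat.gz"),
               rng.choice([-1, 1], size=(6, 16)), fmt="%d")  # .gz suffix -> gzip-compressed
    np.savetxt(os.path.join(d, "Ising2D_temps_L4.dat"), np.repeat([1.0, 2.0], 3))
    X, T = load_ising(4, data_dir=d)
```

With the course files, `data = {L: load_ising(L) for L in (20, 40, 80)}` would load all three lattice sizes, and each configuration array should have shape `(2000, L**2)`.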

In [16]:
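N=x_20.shape[0]                 # number of configurations (M in the text), not the number of spins
x=x_20-np.average(x_20,axis=0)  # centred copy; x=x_20 followed by x-=... would also overwrite x_20
Σ=x.T@x/(N-1)                   # sample covariance matrix (spins x spins)
λ,V=scipy.linalg.eigh(Σ)        # eigenvalues returned in ascending order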
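Eigenvectors returned by `eigh` are defined only up to an overall sign. The scatter plot of $\mathbf{x}'_1$ vs. $\mathbf{x}'_2$ may therefore appear mirrored relative to Figure 2 of the Wang reference, or between lattice sizes. The sketch below fixes one arbitrary but deterministic convention: the largest-magnitude entry of each column is made positive. This convention is an assumption and is not taken from the reference; the helper name is hypothetical.

```python
import numpy as np

def fix_signs(V: np.ndarray) -> np.ndarray:
    """Flip each eigenvector (column of V) so that its largest-magnitude entry is positive."""
    idx = np.argmax(np.abs(V), axis=0)               # row of the largest |entry| in each column
    signs = np.sign(V[idx, np.arange(V.shape[1])])
    signs[signs == 0] = 1                            # guard against an all-zero column
    return V * signs                                 # broadcasting scales column j by signs[j]

# A random symmetric matrix: V and -V must map to the same sign-fixed eigenvectors
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
_, V = np.linalg.eigh(A + A.T)
V1 = fix_signs(V)
V2 = fix_signs(-V)
```

In the notebook, `V = fix_signs(V)` could be applied after the reordering step so that the projections keep the same orientation from run to run and across lattice sizes.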

In [17]:
λ=λ[::-1]
V=np.flip(V,axis=1)

print(f'λ = {λ}')
print(f'V = {V}')

λ = [2.58274436e+02 4.69206606e+00 4.51021730e+00 4.18847372e+00 4.13060521e+00 3.05778616e+00 2.85374394e+00
 2.67508582e+00 2.50100969e+00 1.99978242e+00 1.84387502e+00 1.80076694e+00 1.66712603e+00 1.59763353e+00
 1.54666029e+00 1.52853851e+00 1.47690542e+00 1.44244203e+00 1.33720897e+00 1.29948334e+00 1.26464313e+00
 1.17034262e+00 1.11898009e+00 1.08893998e+00 1.06201925e+00 1.04056254e+00 1.00177710e+00 9.77500811e-01
 9.68666168e-01 9.27666790e-01 9.22959907e-01 8.86954986e-01 8.63661347e-01 8.34854492e-01 8.13157040e-01
 7.87637059e-01 7.78521371e-01 7.54220058e-01 7.41690141e-01 7.22474195e-01 7.18026134e-01 7.01393279e-01
 6.94927915e-01 6.82943083e-01 6.79263053e-01 6.57316512e-01 6.43756343e-01 6.34939640e-01 6.25050780e-01
 6.24163541e-01 6.16451499e-01 6.09751637e-01 5.97839126e-01 5.82955741e-01 5.81948201e-01 5.76171462e-01
 5.74155686e-01 5.66129033e-01 5.56956326e-01 5.44847763e-01 5.42206989e-01 5.31874370e-01 5.25061130e-01
 5.20790277e-01 5.11968413e-01 5.10632407e