## Notebook to cluster EELS spectra from multiple locations
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pycroscopy/DTMicroscope/blob/utk/notebooks/1_stem_eels_clustering_COLAB-Hackathon.ipynb)


## Server setup

In [None]:
!pip install -q pyro5
!pip install -q scifireaders
!pip install -q sidpy
!pip install -q pynsid
!pip install -q git+https://github.com/pycroscopy/DTMicroscope.git@utk

In [None]:
!run_server

## Client side starts

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import Pyro5.api
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

### 1. Connect to the server

In [None]:
# Connect to the microscope server
uri = "PYRO:microscope.server@localhost:9091"
mic_server = Pyro5.api.Proxy(uri)




### 2. Download and register the dataset

#### 2a. Download the dataset

In [None]:
# download dataset
!wget https://github.com/pycroscopy/DTMicroscope/raw/utk/data/STEM/SI/test_stem.h5

#### 2b. Register the dataset in the digital twin

In [None]:
# Initialize microscope and register data
mic_server.initialize_microscope("STEM")
mic_server.register_data("test_stem.h5")

# Get overview image
array_list, shape, dtype = mic_server.get_overview_image()
im_array = np.array(array_list, dtype=dtype).reshape(shape)

# Display the overview image
plt.imshow(im_array)
plt.axis("off")
plt.title("Overview Image")
plt.show()
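
The server hands each array back as a triple of (nested list, shape, dtype) rather than as a NumPy array, presumably so that Pyro5's default serializer can transmit it, and the client rebuilds the array with `np.array(...).reshape(shape)`. The round trip can be sketched locally as below. The helper names `to_wire` and `from_wire` are hypothetical, and the server-side encoding is an assumption inferred from how the client decodes it.

```python
import numpy as np

def to_wire(arr):
    """Hypothetical server-side encoding: plain Python list + shape + dtype name."""
    return arr.tolist(), arr.shape, str(arr.dtype)

def from_wire(array_list, shape, dtype):
    """Client-side decoding, mirroring the notebook: rebuild the array, then reshape."""
    return np.array(array_list, dtype=dtype).reshape(shape)

# Round trip on a small float32 "image"
original = np.arange(12, dtype=np.float32).reshape(3, 4)
restored = from_wire(*to_wire(original))
print(restored.dtype, restored.shape)
```

Passing the dtype name along matters: without it, `np.array` would infer float64 or int64 from the Python list and silently change the data type.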


### 3. Get spectra from 100 locations (a 10 × 10 grid)

In [None]:

# Query spectra from 100 locations
spectra = []
locations = []
for x in range(10):
    for y in range(10):
        array_list, shape, dtype = mic_server.get_point_data("Channel_001", x, y)
        spectrum = np.array(array_list, dtype=dtype).reshape(shape)
        spectra.append(spectrum.flatten())  # Flatten each spectrum to make it 1D
        locations.append((x, y))

spectra = np.array(spectra)  # Convert list of spectra to a NumPy array




### 4. Reduce the dimensionality with PCA

In [None]:
# Perform PCA to reduce to 2 dimensions
pca = PCA(n_components=2)
data_pca = pca.fit_transform(spectra)  # Now data_pca has shape (100, 2)
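
Two components are convenient for plotting, but it is worth checking how much of the spectral variance they retain. The sketch below uses synthetic spectra (three Gaussian peaks with random weights, an assumption standing in for `spectra`) and reads `explained_variance_ratio_` to find the smallest number of components that reaches a chosen threshold; the 95% threshold is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
energy = np.linspace(0, 1, 200)  # synthetic energy axis (arbitrary units)
# Three well-separated Gaussian peaks standing in for spectral features
peaks = np.stack([np.exp(-((energy - c) ** 2) / 0.002) for c in (0.2, 0.5, 0.8)])
weights = rng.random((100, 3))  # random mixing weights for 100 "locations"
synthetic = weights @ peaks + 0.01 * rng.standard_normal((100, 200))

pca_full = PCA().fit(synthetic)  # keep all components to inspect the spectrum
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
n_needed = int(np.searchsorted(cumulative, 0.95) + 1)  # first count reaching 95%
print(cumulative[:4], n_needed)
```

On real data, the same two lines (`np.cumsum` of `explained_variance_ratio_` and `np.searchsorted`) can be run on a `PCA()` fitted to `spectra` before settling on `n_components=2`.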


### 5. K-means clustering with K = 3

In [None]:

# Perform K-means clustering with 3 clusters
# n_init is set explicitly because its default changed across scikit-learn versions
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = kmeans.fit_predict(data_pca)

# Plotting the PCA results with clusters
plt.figure(figsize=(8, 6))
plt.scatter(data_pca[:, 0], data_pca[:, 1], c=clusters, cmap='viridis', marker='o', s=50)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of Data with K-means Clustering')
plt.colorbar(label='Cluster')
plt.show()
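
Because every spectrum came from a known grid position, the cluster labels can be placed back on the 10 × 10 scan grid to show where each cluster sits in real space, and the member spectra of each cluster can be averaged into one representative spectrum. The sketch below uses synthetic `demo_*` stand-ins for `locations`, `clusters`, and `spectra` so that it runs on its own; the helper names are hypothetical, and treating `x` as the row index is an assumption about the server's coordinate convention.

```python
import numpy as np

def cluster_map(locations, labels, grid_shape=(10, 10)):
    """Place each cluster label at its (x, y) grid position; unvisited cells stay -1."""
    label_map = np.full(grid_shape, -1, dtype=int)
    for (x, y), label in zip(locations, labels):
        label_map[x, y] = label  # assumes x indexes rows and y indexes columns
    return label_map

def cluster_mean_spectra(spectra, labels):
    """Average the member spectra of each cluster -> (n_clusters, n_channels)."""
    labels = np.asarray(labels)
    return np.stack([spectra[labels == k].mean(axis=0) for k in np.unique(labels)])

# Synthetic stand-ins (named demo_* so the notebook's own variables are not overwritten)
demo_locations = [(x, y) for x in range(10) for y in range(10)]
demo_labels = np.array([0 if x < 5 else 1 for x, _ in demo_locations])
demo_spectra = np.vstack([np.full((50, 8), 1.0), np.full((50, 8), 3.0)])

label_map = cluster_map(demo_locations, demo_labels)
means = cluster_mean_spectra(demo_spectra, demo_labels)
```

With the notebook's variables, `plt.imshow(cluster_map(locations, clusters))` would display the spatial cluster map, and `cluster_mean_spectra(spectra, clusters)` would give one averaged spectrum per cluster for plotting.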

# Sample STEM problems

There are many ML problems in STEM. Here we present just a few of them:

- active learning for spectral optimization
- hyperspectral image reconstruction (e.g., pan-sharpening)
- modeling of structure-property relationships


## Hyperspectral image reconstruction

You have the overview scan, <i>im_array</i>, and the spectra measured at 100 locations, <i>spectra</i>.

Can you predict the full hyperspectral dataset from these data?

There are many possible approaches, for example:

- use PCA and predict the PCA components at each pixel instead of the full spectrum, which gives a simpler model;
- choose another dimensionality reduction method;
- learn the mapping directly with a deep neural network.
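
The PCA-based approach can be sketched as follows: fit PCA on the measured spectra, regress the PCA scores from simple image features at the measured locations, predict scores at every pixel, and map them back to spectra with `inverse_transform`. Everything here is an illustrative assumption: the features (each pixel plus its four neighbours), ridge regression as the model, the helper names `neighbour_features` and `reconstruct_cube`, and the premise that the overview image and the spectral grid share pixel coordinates (in practice they may need registering or resampling).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

def neighbour_features(image):
    """Per-pixel features: the pixel value and its 4 neighbours (edge-padded) -> (H, W, 5)."""
    p = np.pad(image, 1, mode="edge")
    return np.stack(
        [p[1:-1, 1:-1], p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]], axis=-1
    )

def reconstruct_cube(image, locations, spectra, n_components=3, alpha=1.0):
    """Hypothetical helper: predict a full (H, W, n_channels) cube from sparse spectra."""
    feats = neighbour_features(image)
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(spectra)  # (n_measured, n_components)
    X_train = np.array([feats[x, y] for x, y in locations])  # assumes image[x, y] indexing
    model = Ridge(alpha=alpha).fit(X_train, scores)
    all_scores = model.predict(feats.reshape(-1, feats.shape[-1]))  # scores at every pixel
    return pca.inverse_transform(all_scores).reshape(image.shape[0], image.shape[1], -1)

# Synthetic demo: spectra whose peak height tracks the local image intensity
rng = np.random.default_rng(1)
demo_image = rng.random((20, 20))
energy = np.linspace(0, 1, 50)
peak = np.exp(-((energy - 0.5) ** 2) / 0.01)
demo_locations = [(x, y) for x in range(0, 20, 2) for y in range(0, 20, 2)]
demo_spectra = np.array([demo_image[x, y] * peak for x, y in demo_locations])
demo_spectra += 0.01 * rng.standard_normal(demo_spectra.shape)

cube = reconstruct_cube(demo_image, demo_locations, demo_spectra)
```

Predicting a handful of PCA scores instead of every energy channel keeps the regression small, which is the "simpler model" the first approach refers to; any regressor with multi-output `fit`/`predict` could replace `Ridge`.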



## Modeling of structure-property relationships

This is a related problem: can you predict the spectrum at a location from the image patch around it, or, conversely, predict the image patch from the spectrum?
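
Both directions start from the same training pairs: an image patch centred on each measured location, matched with the spectrum recorded there. A minimal sketch of building those pairs and fitting a linear baseline each way is below. The helper name `patch_spectrum_pairs`, the patch half-width, reflect padding at the borders, and ridge regression as the baseline are all illustrative assumptions, as is the premise that `(x, y)` indexes the overview image directly.

```python
import numpy as np
from sklearn.linear_model import Ridge

def patch_spectrum_pairs(image, locations, spectra, half_width=2):
    """Hypothetical helper: patches (n, w, w) and spectra (n, n_channels), w = 2*half_width + 1."""
    padded = np.pad(image, half_width, mode="reflect")
    w = 2 * half_width + 1
    # After padding by half_width, the patch centred on (x, y) starts at padded[x, y]
    patches = np.stack([padded[x:x + w, y:y + w] for x, y in locations])
    return patches, np.asarray(spectra)

# Synthetic stand-ins for the overview image and measured spectra
rng = np.random.default_rng(2)
demo_image = rng.random((16, 16))
demo_locations = [(x, y) for x in range(16) for y in range(16)]
demo_spectra = rng.random((len(demo_locations), 30))

patches, specs = patch_spectrum_pairs(demo_image, demo_locations, demo_spectra)
X_patch = patches.reshape(len(patches), -1)  # flatten each patch into a feature vector

image_to_spectrum = Ridge(alpha=1.0).fit(X_patch, specs)  # patch -> spectrum
spectrum_to_image = Ridge(alpha=1.0).fit(specs, X_patch)  # spectrum -> patch
```

A linear baseline like this gives a reference score that a convolutional network or other nonlinear model would need to beat.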



## Active learning in STEM

One significant challenge in STEM is that it is not always possible to acquire spectra across a dense grid of points, because the electron beam damages the sample. Even when damage is minimal, dense sampling is highly wasteful. It is more useful to sample adaptively, choosing each new measurement location to maximize some property of interest. This example shows how deep kernel learning can be used for this adaptive sampling and optimization.
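
Deep kernel learning combines a neural-network feature extractor with a Gaussian process. As a simpler stand-in that shows the same active-learning loop, the sketch below uses a plain Gaussian process (scikit-learn's `GaussianProcessRegressor`, not deep kernel learning) with an upper-confidence-bound acquisition: fit the GP to the points measured so far, score every unmeasured grid position by predicted mean plus `beta` times predicted standard deviation, and measure the best one next. The `measure` function is a hypothetical synthetic stand-in for acquiring a spectrum and reducing it to a scalar property of interest; the kernel, `beta`, and step counts are arbitrary choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def measure(x, y):
    """Hypothetical stand-in for acquiring a spectrum at (x, y) and extracting a scalar property."""
    return np.exp(-((x - 7) ** 2 + (y - 2) ** 2) / 8.0)

def ucb_active_learning(grid_size=10, n_initial=5, n_steps=15, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.array(
        [(x, y) for x in range(grid_size) for y in range(grid_size)], dtype=float
    )
    # Start from a few random locations
    measured = [int(i) for i in rng.choice(len(candidates), size=n_initial, replace=False)]
    values = [measure(*candidates[i]) for i in measured]
    kernel = RBF(length_scale=2.0) + WhiteKernel(noise_level=1e-3)
    for _ in range(n_steps):
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gp.fit(candidates[measured], values)
        mean, std = gp.predict(candidates, return_std=True)
        ucb = mean + beta * std  # upper confidence bound: exploit mean, explore uncertainty
        ucb[measured] = -np.inf  # never re-measure a location
        nxt = int(np.argmax(ucb))
        measured.append(nxt)
        values.append(measure(*candidates[nxt]))
    return candidates[measured], np.array(values)

points, values = ucb_active_learning()
```

In the notebook setting, `measure` would wrap a call such as `mic_server.get_point_data(...)` followed by some reduction of the spectrum to one number. Swapping in a deep kernel learning model would change the fitting and prediction lines while leaving the loop itself unchanged.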