## Using UMAP with Python
In Python, UMAP analysis and visualization can be performed with the ``UMAP`` class from the ``umap-learn`` package (imported as ``umap``).

In [None]:
#!pip install umap-learn

Here, we use a single-cell RNA sequencing (scRNA-seq) dataset to visualize hidden biological clusters with UMAP. The dataset consists of _Arabidopsis thaliana_ root cells processed with the 10x Genomics platform.

This scRNA-seq dataset contains **4406 cells** with **~75K sequence reads** per cell.

The dataset has already been pre-processed, and we use only the 2000 highly variable genes (variables or features) for the UMAP cluster visualization.
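The 2000 genes in this file were selected during pre-processing, and the exact method is not stated here. As a rough illustration of the idea, the sketch below keeps the genes with the highest variance across cells. It assumes a cells x genes DataFrame of already-normalized values; ``top_variable_genes`` is a hypothetical helper, and dedicated single-cell tools typically use more elaborate mean-variance (dispersion) based methods.

```python
import pandas as pd


def top_variable_genes(expr: pd.DataFrame, n_genes: int = 2000) -> pd.DataFrame:
    """Keep the n_genes columns (genes) with the highest variance across cells.

    Hypothetical helper: assumes `expr` is a cells x genes matrix of
    already-normalized expression values.
    """
    variances = expr.var(axis=0)               # per-gene variance across cells
    top = variances.nlargest(n_genes).index    # labels of the most variable genes
    return expr.loc[:, top]


# Toy example with made-up values: 5 cells x 4 genes
toy = pd.DataFrame(
    {"g1": [0, 0, 0, 0, 0],
     "g2": [1, 5, 1, 5, 1],
     "g3": [2, 2, 3, 2, 2],
     "g4": [0, 9, 0, 9, 0]},
    index=[f"cell{i}" for i in range(5)],
)
hvg = top_variable_genes(toy, n_genes=2)
print(list(hvg.columns))  # ['g4', 'g2']
```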

Now, import the pre-processed scRNA-seq data as a pandas DataFrame:

In [None]:
import pandas as pd

df = pd.read_csv("ath_sc_expression.csv")
df.head()

In [None]:

# set first column as index
df = df.set_index('cells')

# check the dimension (rows, columns)
df.shape
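The index can also be set while reading, using the ``index_col`` argument of ``pd.read_csv``. The sketch below runs on a made-up miniature CSV with the same assumed layout (a ``cells`` column followed by one column per gene); the barcodes and gene names are invented. It also adds two quick sanity checks, since UMAP will not accept missing values.

```python
import io

import pandas as pd

# Hypothetical miniature file mimicking the assumed layout of ath_sc_expression.csv
csv_text = """cells,geneA,geneB,geneC
AAACCTG-1,0.0,1.2,3.4
AAACGGG-1,0.5,0.0,2.1
AAAGATG-1,1.1,0.3,0.0
"""

# index_col="cells" is equivalent to read_csv(...) followed by set_index('cells')
toy_df = pd.read_csv(io.StringIO(csv_text), index_col="cells")

print(toy_df.shape)                          # (3, 3): (cells, genes)
assert toy_df.index.is_unique                # each cell barcode should appear once
assert toy_df.isna().sum().sum() == 0        # no missing values
assert all(pd.api.types.is_numeric_dtype(t) for t in toy_df.dtypes)  # numeric genes only
```

With the real file this would be ``pd.read_csv("ath_sc_expression.csv", index_col="cells")``.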

This scRNA-seq dataset consists of 4406 cells (rows) and 2000 genes (columns). Data with 2000 features cannot be visualized directly in a scatter plot, which can show only two (or at most three) dimensions.

Hence, we use ``UMAP`` to reduce the high-dimensional data (2000 features) to two dimensions. By default (``n_components=2``), UMAP produces a 2-dimensional embedding.

**Note**: UMAP is a stochastic algorithm, so it may produce slightly different results each time it is run. To make the results reproducible, set the ``random_state`` parameter of ``UMAP``.

In [None]:
import umap

embedding = umap.UMAP(random_state=42).fit_transform(df.values)
embedding.shape

The resulting embedding has 2 columns (instead of 2000 genes) and 4406 rows (cells).

Each row of the embedding is the low-dimensional representation of the corresponding cell (row) in the original high-dimensional data.
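Because ``fit_transform`` returns rows in the same order as its input, the cell barcodes can be attached to the embedding by reusing the input's index. The sketch below uses small hypothetical stand-ins (``toy_df``, ``toy_embedding``) for the notebook's ``df`` and ``embedding``.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the notebook's `df` (cells x genes) and `embedding`
toy_df = pd.DataFrame(
    np.arange(12, dtype=float).reshape(4, 3),
    index=pd.Index(["c1", "c2", "c3", "c4"], name="cells"),
    columns=["geneA", "geneB", "geneC"],
)
toy_embedding = np.array([[0.1, 1.0], [0.2, 1.1], [5.0, -2.0], [5.1, -2.2]])

# Row i of the embedding corresponds to row i of the input, so reuse the input's index
umap_df = pd.DataFrame(toy_embedding, index=toy_df.index, columns=["UMAP1", "UMAP2"])
print(umap_df.loc["c3"])
```

With the real data this would be ``pd.DataFrame(embedding, index=df.index, columns=["UMAP1", "UMAP2"])``.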

Now, visualize the UMAP clusters as a scatter plot:

In [None]:
import matplotlib.pyplot as plt

plt.scatter(embedding[:, 0], embedding[:, 1])
plt.title('UMAP Dimensionality Reduction', fontsize=12)
plt.xlabel('UMAP1')
plt.ylabel('UMAP2')
plt.show()
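With several thousand cells, the default marker size tends to cause heavy overplotting. The variant below is a sketch that uses small, semi-transparent markers and colors points by a group label. The embedding and the three group labels are synthetic placeholders; this dataset does not come with labels, so in practice they would come from a separate clustering step or from known cell-type annotations.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: a 2-D embedding with three blobs and one made-up label per point
rng = np.random.default_rng(1)
toy_embedding = np.vstack(
    [rng.normal(loc=c, scale=0.3, size=(100, 2)) for c in (0, 3, 6)]
)
labels = np.repeat([0, 1, 2], 100)

fig, ax = plt.subplots(figsize=(5, 5))
# Small, semi-transparent markers reduce overplotting when points overlap
sc = ax.scatter(toy_embedding[:, 0], toy_embedding[:, 1],
                c=labels, cmap="tab10", s=5, alpha=0.7)
ax.set_title("UMAP Dimensionality Reduction", fontsize=12)
ax.set_xlabel("UMAP1")
ax.set_ylabel("UMAP2")
ax.set_aspect("equal", adjustable="datalim")
fig.colorbar(sc, ax=ax, label="group (hypothetical)")
plt.show()
```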

The scatter plot suggests that UMAP captures the structure of the high-dimensional data in the low-dimensional space: cells with similar expression profiles are placed close together, so groups of related cells appear as distinct clusters.
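A visual impression can be complemented by a simple numeric check. The sketch below computes a basic neighborhood-preservation score: for each point, the fraction of its k nearest neighbors in the original space that are also among its k nearest neighbors in the embedding. This is a simple diagnostic of my own framing, not a metric built into ``umap-learn``; scikit-learn's ``sklearn.manifold.trustworthiness`` is an established, related measure. The helper names are hypothetical, and the dense pairwise-distance computation is only practical for a few thousand points.

```python
import numpy as np


def knn_indices(X: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors of each row (Euclidean, self excluded)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def neighborhood_preservation(X_high: np.ndarray, X_low: np.ndarray, k: int = 15) -> float:
    """Mean fraction of each point's k high-D neighbors that are also k low-D neighbors."""
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_low, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_high, nn_low)]
    return float(np.mean(overlaps))


# Sanity check on synthetic data: an identical "embedding" scores 1.0,
# while an unrelated random 2-D layout scores far lower
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
print(neighborhood_preservation(X, X, k=15))
print(neighborhood_preservation(X, rng.normal(size=(200, 2)), k=15))
```

On the notebook data this could be called as ``neighborhood_preservation(df.values, embedding, k=15)``; ``k=15`` matches UMAP's default ``n_neighbors``.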

### Exercise: Can you now apply UMAP to the Churn dataset from the previous example? :)