# Dimensionality Reduction and Clustering
## Nomad Cities around the World

In this tutorial we are going to play with dimensionality reduction and clustering using a nomadlist dataset. The data describes 780 cities around the world and includes variables interesting for nomads traveling to these destinations.
Features include: internet speed, cost variables and socio-political indicators.

### Reading data and libraries

In [None]:
!pip install umap-learn -q

In [None]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt
import umap
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA, NMF
from sklearn.cluster import KMeans

sns.set()

In [None]:
# reading in the data
data = pd.read_csv('https://sds-aau.github.io/SDS-master/M1/data/cities.csv')

For convenience, the dataset already includes some easy-to-use geo-variables (e.g. country code, region and sub-region).

In [None]:
data.head()

In [None]:
data.info()

### Preprocessing for UML

Typical preprocessing steps for unsupervised machine learning (UML) include different forms of scaling, just as in supervised approaches.
- Standard scaling: each feature is rescaled to a mean of 0 and a standard deviation (σ) of 1
- Min-max scaling: each feature is rescaled to a fixed range, typically [0, 1]
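
To make the two kinds of scaling concrete, here is a minimal sketch that applies both by hand with NumPy. The array `toy` is a made-up feature column used only for illustration; it is not taken from the cities data.

```python
import numpy as np

# Hypothetical toy feature column (not from the cities dataset)
toy = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Standard scaling: subtract the mean, divide by the standard deviation
standard_scaled = (toy - toy.mean()) / toy.std()

# Min-max scaling: map the minimum to 0 and the maximum to 1
minmax_scaled = (toy - toy.min()) / (toy.max() - toy.min())

print(standard_scaled)
print(minmax_scaled)
```

`np.std` uses the population standard deviation by default, which is also what `StandardScaler` uses, so the hand-computed values should agree with the scikit-learn result for the same column.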

In [None]:
# We select only numerical features from the dataframe
# naming is in anticipation of future clustering
data_to_cluster = data.iloc[:,4:]

In [None]:
# import and instantiate scaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

In [None]:
# learn the mean and standard deviation of each feature, then transform
data_to_cluster_scaled = scaler.fit_transform(data_to_cluster)

In [None]:
# very similar syntax for min-max scaling
from sklearn.preprocessing import MinMaxScaler
scaler_min_max = MinMaxScaler()

In [None]:
data_to_cluster_minmax = scaler_min_max.fit_transform(data_to_cluster)

In [None]:
data_to_cluster

#### Let's check how our data look pre/post scaling

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))

# nomad-cost (pre-scaling)
sns.kdeplot(data=data_to_cluster, x="cost_nomad", ax=axes[0, 0])
axes[0, 0].set_title("Nomad Cost (pre-scaling)")

# coffee (pre-scaling)
sns.kdeplot(data=data_to_cluster, x="coffee_in_cafe", ax=axes[0, 1])
axes[0, 1].set_title("Coffee in Cafe (pre-scaling)")

# convert scaled data to dataframe
scaled_df = pd.DataFrame(data_to_cluster_scaled, columns=data_to_cluster.columns)

# nomad-cost (post-scaling)
sns.kdeplot(data=scaled_df, x="cost_nomad", ax=axes[1, 0])
axes[1, 0].set_title("Nomad Cost (post-scaling)")

# coffee (post-scaling)
sns.kdeplot(data=scaled_df, x="coffee_in_cafe", ax=axes[1, 1])
axes[1, 1].set_title("Coffee in Cafe (post-scaling)")

plt.tight_layout()
plt.show()


### Dimensionality reduction with PCA

PCA was invented in 1901 by Karl Pearson, as an analogue of the principal axis theorem in mechanics; it was later independently developed and named by Harold Hotelling in the 1930s. (Source: Wikipedia)

In [None]:
%%html
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Stitch Fix is using something called eigenvector decomposition, a concept from quantum mechanics, to tease apart the overlapping “notes” in an individual’s style. Using physics, the team can better understand the complexities of the clients’ style minds. <a href="https://t.co/iULGyYsd5c">https://t.co/iULGyYsd5c</a></p>&mdash; WIRED (@WIRED) <a href="https://twitter.com/WIRED/status/1181437300414275584?ref_src=twsrc%5Etfw">October 8, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

For a deep dive into PCA, please consider [this chapter](https://jakevdp.github.io/PythonDataScienceHandbook/05.09-principal-component-analysis.html)

Principal Component Analysis (PCA) primarily focuses on identifying and quantifying the underlying patterns in the data. It seeks to explain variance-covariance (or correlation) structures from linear combinations of the initial variables.

Mathematically, for a data matrix $X$ (where rows are observations and columns are features), the main steps in PCA are:
1. Standardize the data (subtract the mean, divide by the standard deviation).
2. Calculate the covariance matrix.
3. Calculate the eigenvectors and eigenvalues of the covariance matrix.
4. Sort the eigenvectors by decreasing eigenvalues and choose the first $k$ eigenvectors to form a matrix $W$ of dimensions $(\text{features} \times k)$.
5. Project the data onto $W$ to get the principal components.

Given that, let's walk through these steps on a small simulated dataset: we compute the covariance matrix of the standardized data and find its eigenvalues and eigenvectors:


In [None]:
# Sample 3D data
sim_data = np.array([[2.5, 2.4, 2.1], [0.5, 0.7, 0.2], [2.2, 2.9, 2.3], [1.9, 2.2, 1.8], [3.1, 3.0, 2.8]])
sim_data.shape

In [None]:
# 1. Standardize the dataset (mean = 0, variance = 1)
X_mean = np.mean(sim_data, axis=0)
X_std = np.std(sim_data, axis=0)
X_normalized = (sim_data - X_mean) / X_std

In [None]:
# 2. Compute covariance matrix of the standardized dataset
covariance_matrix = np.cov(X_normalized.T)

### Interlude: Covariance matrix from scratch

Suppose we have two variables, $X$ and $Y$, with the following data points:

$$ X = [2, 4, 6] $$
$$ Y = [3, 6, 9] $$

### Step-by-step Calculation:

1. **Compute the mean of each variable:**
   $$ \mu_X = \frac{\sum{X}}{n} $$
   $$ \mu_Y = \frac{\sum{Y}}{n} $$
  
2. **Compute the products of the differences from the mean for each data point:**
   $$ (x_i - \mu_X)(y_i - \mu_Y) $$

3. **Compute the covariance for each pair of variables:**
   $$ \text{Cov}(X, Y) = \frac{1}{n-1} \sum{(x_i - \mu_X)(y_i - \mu_Y)} $$

4. **Populate the covariance matrix:**
   The covariance matrix for variables $X$ and $Y$ is:
   
   $$
   \begin{bmatrix}
   \text{Var}(X) & \text{Cov}(X,Y) \\
   \text{Cov}(Y,X) & \text{Var}(Y) \\
   \end{bmatrix}
   $$

   Where:
   - $\text{Var}(X)$ is the variance of $X$ = $\text{Cov}(X, X)$
   - $\text{Var}(Y)$ is the variance of $Y$ = $\text{Cov}(Y, Y)$
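
As a worked check of the steps above, here is the arithmetic for the given $X = [2, 4, 6]$ and $Y = [3, 6, 9]$ with $n = 3$:

$$ \mu_X = \frac{2 + 4 + 6}{3} = 4, \qquad \mu_Y = \frac{3 + 6 + 9}{3} = 6 $$

$$ \text{Cov}(X, Y) = \frac{(-2)(-3) + (0)(0) + (2)(3)}{3 - 1} = \frac{12}{2} = 6 $$

$$ \text{Var}(X) = \frac{(-2)^2 + 0^2 + 2^2}{3 - 1} = 4, \qquad \text{Var}(Y) = \frac{(-3)^2 + 0^2 + 3^2}{3 - 1} = 9 $$

$$
\begin{bmatrix}
4 & 6 \\
6 & 9 \\
\end{bmatrix}
$$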


In [None]:
## INTERLUDE - covmatrix by hand

# Given data points
X = np.array([2, 4, 6])
Y = np.array([3, 6, 9])

# Means of X and Y
mu_X = np.mean(X)
mu_Y = np.mean(Y)

# Compute Covariance
cov_XY = np.sum((X - mu_X) * (Y - mu_Y)) / (len(X) - 1)
var_X = np.sum((X - mu_X)**2) / (len(X) - 1)
var_Y = np.sum((Y - mu_Y)**2) / (len(Y) - 1)

# Covariance Matrix
cov_matrix = np.array([[var_X, cov_XY], [cov_XY, var_Y]])
print(cov_matrix)


In [None]:
# check...if we did OK ## INTERLUDE OVER
np.cov(np.array([[2,4,6],[3,6,9]]))

In [None]:
# 3. Obtain the eigenvectors and eigenvalues
eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)

In [None]:
# Sort by eigenvalue in descending order
sorted_idx = np.argsort(eigenvalues)[::-1]
sorted_eigenvectors = eigenvectors[:, sorted_idx]

In [None]:
# 4. Reduce dimensions by selecting top 'num_components' eigenvectors
num_components = 2
reduced_eigenvectors = sorted_eigenvectors[:, :num_components]

### Dot Product and Matrix Multiplication

Matrix multiplication generalizes the dot product: it takes two matrices, A and B, and produces another matrix, C, in which each element is the dot product of a row of A and a column of B. The number of columns in A must equal the number of rows in B for the product to be defined. The resulting matrix C has as many rows as A and as many columns as B.

For an interactive visualization of how each element of the resulting matrix is computed, visit [this link](http://matrixmultiplication.xyz/).
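
The shape rule and the row-times-column computation can be sketched with NumPy. The matrices `A` and `B` below are made-up examples, not part of the tutorial data.

```python
import numpy as np

# Hypothetical example matrices: A is 2x3, B is 3x2
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

# Columns of A (3) match rows of B (3), so the product is defined
C = A @ B  # equivalent to A.dot(B)

# Each entry C[i, j] is the dot product of row i of A and column j of B
c_00 = np.dot(A[0, :], B[:, 0])  # 1*7 + 2*9 + 3*11

print(C.shape)  # rows of A by columns of B
print(C)
print(c_00)
```

This is the same operation used in the projection step, where the standardized data matrix is multiplied by the matrix of selected eigenvectors.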


In [None]:
# 5. Transform the dataset to the new subspace
reduced_data_scratch = X_normalized.dot(reduced_eigenvectors)

In [None]:
reduced_data_scratch

In [None]:
# OR if using Sklearn

# 1. Standardize the dataset
scaler = StandardScaler()
sim_data_standardized = scaler.fit_transform(sim_data)

# 2. Create a PCA instance and fit
num_components = 2
pca = PCA(n_components=num_components)
pca.fit(sim_data_standardized)

# 3. Transform the standardized data to the new subspace
reduced_data_sklearn = pca.transform(sim_data_standardized)

In [None]:
reduced_data_sklearn

### Let's apply on our example data

In [None]:
# load up and instantiate PCA
pca = PCA(n_components=2)

In [None]:
# fit-transform the data
data_reduced_pca = pca.fit_transform(data_to_cluster_scaled)
pca.components_.shape

In [None]:
print(pca.explained_variance_ratio_)

In [None]:
# we can now plot the reduced data
sns.scatterplot(x=data_reduced_pca[:,0], y=data_reduced_pca[:,1])

Let's make a more informative plot with Altair, bringing the place and country labels back into the picture.

In [None]:
# Create a new DataFrame based on the reduced data from PCA
vis_data = pd.DataFrame(data_reduced_pca)

# Add 'place' column from the original 'data' DataFrame to 'vis_data'
vis_data['place'] = data['place']

# Add 'country' column, represented by its alpha-2 code, from the original 'data' DataFrame to 'vis_data'
vis_data['country'] = data['alpha-2']

# Rename the columns of 'vis_data' for better clarity:
# The first two columns represent the two principal components from PCA
# The third and fourth columns are 'place' and 'country' respectively
vis_data.columns = ['x', 'y', 'place', 'country']

# Using the Altair library to create an interactive scatter plot:
# - The x and y axes represent the two principal components.
# - Each data point (or circle) in the scatter plot corresponds to a 'place' in a 'country'.
# - Hovering over a data point reveals a tooltip with the 'place' and 'country' information.
alt.Chart(vis_data).mark_circle(size=60).encode(
    x='x',          # Set the x-axis to represent the first principal component
    y='y',          # Set the y-axis to represent the second principal component
    tooltip=['place', 'country']  # Display 'place' and 'country' information as a tooltip on hover
).interactive()   # Enable interactive features such as panning and zooming


In [None]:
plt.figure(figsize=(18,2))
sns.heatmap(pd.DataFrame(pca.components_, columns=data_to_cluster.columns), annot=True)

Looking at the components, we can "see" that the first captures political features (i.e. freedom and fragility), while the second brings together the cost variables, which are correlated with each other.

In [None]:
# quick correlation check

# Compute the correlation matrix
corr = data_to_cluster.corr()

# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})

I also strongly recommend following [this tutorial](https://youtu.be/52d7ha-GdV8), where you will learn to implement PCA starting from the math and building your own module.

In [None]:
%%html
<iframe width="560" height="315" src="https://www.youtube.com/embed/52d7ha-GdV8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

### Dimensionality Reduction with NMF

NMF (non-negative matrix factorization) is another popular dimensionality reduction technique based on matrix decomposition. It requires non-negative input, which is why we use the min-max scaled data below. One advantage is that components are often "more equal" in their importance. It is a more recent technique than PCA and is often very good at capturing latent patterns in data.

How many components to use is debated and, as with many choices in UML, is ultimately up to the analyst.
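
To illustrate the decomposition itself, here is a minimal sketch on made-up non-negative data. The array `X_toy`, the choice of 2 components, and the `init`, `random_state` and `max_iter` settings are assumptions for illustration only, not the settings used in this tutorial.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative data: 6 observations, 4 features
rng = np.random.default_rng(0)
X_toy = rng.random((6, 4))

# Factorize X_toy into two non-negative matrices, X_toy ≈ W @ H
nmf_toy = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
W = nmf_toy.fit_transform(X_toy)  # (6, 2): component weights per observation
H = nmf_toy.components_           # (2, 4): feature loadings per component

print(W.shape, H.shape)
print(np.abs(X_toy - W @ H).mean())  # average reconstruction error
```

Increasing `n_components` generally lowers the reconstruction error but makes the components harder to interpret, which is part of why the choice is left to the analyst.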

In [None]:
# import nmf
from sklearn.decomposition import NMF

In [None]:
# instantiate with 4 components
nmf = NMF(n_components=4)

In [None]:
# notice, we are using the min-max scaled data
data_reduced_nmf = nmf.fit_transform(data_to_cluster_minmax)

In [None]:
data_reduced_nmf.shape

In [None]:
nmf.components_.shape

In [None]:
plt.figure(figsize=(20,3))
sns.heatmap(pd.DataFrame(nmf.components_, columns=data_to_cluster.columns), annot=True)

### Moving into more modern algorithms

In recent years, more advanced algorithms have emerged for dimensionality reduction and visualization. t-SNE was popular around 2016 but has since been largely "replaced" by UMAP.

In [None]:
import umap

In [None]:
# we totally could specify more than 2 dimensions (as well as some other parameters)
umap_scaler = umap.UMAP()

In [None]:
# umap accepts standard-scaled data
embeddings = umap_scaler.fit_transform(data_to_cluster_scaled)

UMAP-reduced data are often called "embeddings", which brings the terminology closer to deep learning approaches. This is probably because UMAP is sometimes used in combination with modern NLP techniques like SBERT.

In [None]:
# just as with PCA, UMAP-reduced data can be plotted
sns.scatterplot(x=embeddings[:,0], y=embeddings[:,1])

UMAP combines global and local structure for dimensionality reduction, with axes representing combinations of features that often align well with "human intuition" about the data.

In [None]:
# Construct a new DataFrame from the embeddings and merge with 'place' and 'country' columns from the original data
vis_data = pd.DataFrame({
    'x': embeddings[:, 0],         # Assuming embeddings is a 2D array or similar structure
    'y': embeddings[:, 1],
    'place': data['place'],
    'country': data['alpha-2']
})

# Create an interactive scatter plot using Altair
chart = alt.Chart(vis_data).mark_circle(size=60).encode(
    x='x',
    y='y',
    tooltip=['place', 'country']
).interactive()

chart


## Clustering

Similar to dimensionality reduction, clustering aims at identifying latent patterns in the data. In addition, clustering algorithms sort the data into a (sometimes predefined) number of clusters.

There exist many different approaches to clustering. One of the most used ones is K-means.

For a deep-dive, consider [this chapter](https://jakevdp.github.io/PythonDataScienceHandbook/05.11-k-means.html)

Consider also [this tutorial](https://youtu.be/vtuH4VRq1AU) where you learn how to implement the algorithm from scratch (starting with the math) in Python.

### Deep Dive into K-means Clustering

K-means clustering aims to partition $n$ observations into $k$ clusters in which each observation belongs to the cluster with the nearest mean. The algorithm works iteratively to assign each data point to one of the $k$ groups based on the features provided.

Mathematically, the primary objective is to minimize:

$$ J = \sum_{i=1}^{k} \sum_{x \in S_i} \| x - \mu_i \|^2 $$

Where:

- $J$ is the objective function,
- $x$ is a data point in cluster $S_i$,
- $\mu_i$ is the centroid of $S_i$.
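
The objective can be computed directly from its definition. The sketch below uses a hypothetical helper `kmeans_objective` and made-up toy points, labels and centroids; none of these names come from the tutorial or from scikit-learn.

```python
import numpy as np

def kmeans_objective(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    # centroids[labels] picks, for every point, the centroid of its cluster
    diffs = points - centroids[labels]
    return float(np.sum(diffs ** 2))

# Hypothetical toy data: two tight groups of two points each
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])

J = kmeans_objective(points, labels, centroids)
print(J)
```

Every point here lies at distance 1 from its centroid, so the four squared distances sum to 4. For a fitted scikit-learn `KMeans` model, the same quantity is available as the `inertia_` attribute.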

#### Steps:

1. Initialize the $k$ cluster centroids (randomly pick samples from the data as initial centroids).
2. Assign each data point to the closest centroid.
3. Recompute the centroids based on the current cluster assignments.
4. Repeat steps 2 and 3 until the assignments do not change or a maximum number of iterations is reached.

Let's write a simple implementation of the k-means clustering for a clearer understanding:


In [None]:
max_iters=100
k = 3

In [None]:
# 1. Initialize the k cluster centroids
centroids = data_reduced_pca[np.random.choice(data_reduced_pca.shape[0], k, replace=False)]

In [None]:
# Plot observations
sns.scatterplot(x=data_reduced_pca[:, 0], y=data_reduced_pca[:, 1], alpha=0.6, color='blue')

# Plot centroids
sns.scatterplot(x=centroids[:, 0], y=centroids[:, 1], color='red', s=100)

plt.title('PCA Reduced Data and Initial Centroids')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.show()

In [None]:
# 2. Assign each data point to the closest centroid
distances = np.linalg.norm(data_reduced_pca - centroids[:, np.newaxis], axis=2)
labels = np.argmin(distances, axis=0)

In [None]:
# 3. Recompute the centroids
new_centroids = np.array([data_reduced_pca[labels == i].mean(axis=0) for i in range(k)])

In [None]:
new_centroids

In [None]:
# Plot observations after the 1st iteration
sns.scatterplot(x=data_reduced_pca[:, 0], y=data_reduced_pca[:, 1], alpha=0.6, color='blue')

# Plot centroids
sns.scatterplot(x=new_centroids[:, 0], y=new_centroids[:, 1], color='red', s=100)

plt.title('PCA Reduced Data and 1st Iteration Centroids')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.show()

In [None]:
# A simple implementation of K-means

def k_means_simple(data, k, max_iters=100):
    # 1. Initialize the k cluster centroids
    centroids = data[np.random.choice(data.shape[0], k, replace=False)]

    for _ in range(max_iters):
        # 2. Assign each data point to the closest centroid
        distances = np.linalg.norm(data - centroids[:, np.newaxis], axis=2)
        labels = np.argmin(distances, axis=0)

        # 3. Recompute the centroids
        new_centroids = np.array([data[labels == i].mean(axis=0) for i in range(k)])

        # Check for convergence
        if np.all(centroids == new_centroids):
            break

        centroids = new_centroids

    return labels, centroids


In [None]:
# Let's test our simple k-means
labels, final_centroids = k_means_simple(data_reduced_pca, 3)
print("Cluster centroids:\n", final_centroids)

In [None]:
# Plot observations after the final iteration (at most 100)
sns.scatterplot(x=data_reduced_pca[:, 0], y=data_reduced_pca[:, 1], alpha=0.6, color='blue')

# Plot centroids
sns.scatterplot(x=final_centroids[:, 0], y=final_centroids[:, 1], color='red', s=100)

plt.title('PCA Reduced Data and Last Iteration Centroids')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.show()

In [None]:
# n_init="auto" matches the other KMeans calls in this notebook
clusterer = KMeans(n_clusters=3, n_init="auto")

The number of clusters is a bit of a hot topic.
The "elbow method" is still widely used for "estimating" it: we fit K-means for a range of cluster counts and plot the _inertia_ (the within-cluster sum of squared distances) against `n_clusters`.

In [None]:
# Initializing an empty list to store the sum of squared distances for each 'k'
Sum_of_squared_distances = []

# Define a range for possible cluster values (1 to 9)
K = range(1, 10)

# For each possible 'k', fit a KMeans model and compute the sum of squared distances
for k in K:
    km = KMeans(n_clusters=k, n_init="auto")      # Initialize the KMeans model with 'k' clusters
    km.fit(data_to_cluster_scaled)                # Fit the model on the scaled data
    Sum_of_squared_distances.append(km.inertia_)  # Append the model's inertia (sum of squared distances)


In [None]:
# Plot the sum of squared distances for each 'k' to determine the 'elbow'
plt.plot(K, Sum_of_squared_distances, 'bx-')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Sum of Squared Distances')
plt.title('Elbow Method For Optimal k')
plt.grid(True)  # Add a grid for better visualization
plt.show()

Choosing the optimal number of clusters is often a subject of debate. The "elbow method" remains a popular technique to estimate this number. It uses the _inertia_ (sum of squared distances) as a metric for clustering quality. Plotting inertia against the number of clusters, we look for an "elbow": the point beyond which adding more clusters no longer gives a significantly better fit to the data.



In [None]:
# reduce to 6 UMAP dimensions and use these embeddings for clustering
umap_scaler_km = umap.UMAP(n_components=6)
embeddings_km = umap_scaler_km.fit_transform(data_to_cluster_scaled)


Sum_of_squared_distances = []
K = range(1,10)
for k in K:
    km = KMeans(n_clusters=k, n_init = "auto")
    km = km.fit(embeddings_km)
    Sum_of_squared_distances.append(km.inertia_)


plt.plot(K, Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()


In [None]:
# back to our k-means instance. We take 3 clusters on non-reduced data
clusterer.fit(data_to_cluster_scaled)

In [None]:
# we can then copy the cluster-numbers into the original file and start exploring
data['cluster'] = clusterer.labels_

In [None]:
# e.g. which cluster seems most lgbt-friendly 🌈
data.groupby('cluster').lgbt_friendly.mean()

In [None]:
# e.g. which cluster seems most party-places 🥳
data.groupby('cluster').nightlife.mean()

Let's combine clustering with our UMAP embeddings in the visualization.

In [None]:
vis_data = pd.DataFrame(embeddings)
vis_data['place'] = data['place']
vis_data['cluster'] = data['cluster']
vis_data['country'] = data['alpha-2']
vis_data.columns = ['x', 'y', 'place', 'cluster','country']

In [None]:
alt.Chart(vis_data).mark_circle(size=60).encode(
    x='x',
    y='y',
    tooltip=['place', 'country'],
    color=alt.Color('cluster:N', scale=alt.Scale(scheme='dark2')) #use N after the var to tell altair that it's categorical
).interactive()

## Similarity and Distance in Recommendations

Recommendation systems often deploy two key methodologies:

1. **Content-Based Recommendation:** Based on underlying characteristics or properties of items. For instance, recommending similar products or content. This method often employs principles from unsupervised machine learning (UML).
   
2. **Collaborative Filtering:** Leverages behavioral patterns of users to recommend items. This is based on the idea of "users similar to you also liked..."

The foundation of many recommendation approaches is determining how "similar" or "distant" items or users are from one another.

### Euclidean Distance

Euclidean distance is a widely used metric for measuring how far apart two vectors are; the smaller the distance, the more similar the vectors.

![Euclidean Distance Visualization](https://upload.wikimedia.org/wikipedia/commons/5/55/Euclidean_distance_2d.svg)

While the image above illustrates the concept in 2D, it generalizes to n-dimensional vectors. The formula for Euclidean distance in $n$ dimensions is:

$$ d(\vec{u}, \vec{v}) = \| \vec{u} - \vec{v} \| = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2} $$

**Example:**

Given the vectors:
$\vec{u} = (2, 3, 4, 2)$
and
$\vec{v} = (1, -2, 1, 3)$

The Euclidean Distance between them is:

$$
\begin{align*}
d(\vec{u}, \vec{v}) &= \sqrt{(2-1)^2 + (3+2)^2 + (4-1)^2 + (2-3)^2} \\
&= \sqrt{1 + 25 + 9 + 1} \\
&= \sqrt{36} \\
&= 6
\end{align*}
$$
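
The worked example can be checked numerically. This is a small sketch using the same two vectors, once with the formula written out and once with NumPy's vector norm.

```python
import numpy as np

# The two vectors from the worked example
u = np.array([2, 3, 4, 2])
v = np.array([1, -2, 1, 3])

# Formula written out: square root of the summed squared differences
d_manual = np.sqrt(np.sum((u - v) ** 2))

# Same distance via the norm of the difference vector
d_norm = np.linalg.norm(u - v)

print(d_manual, d_norm)
```

Both expressions should give 6, matching the hand calculation.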

In [None]:
# Let's use the NMF reduction


print(data_reduced_nmf[0,:])
print(data_reduced_nmf[1,:])

In [None]:
# with numpy
np.linalg.norm(data_reduced_nmf[0,:] - data_reduced_nmf[1,:])

In [None]:
import math

In [None]:
math.sqrt((0.07242715-0.1211694)**2+(0.15113185-0.0946327)**2+(0-0.01483596)**2+(0.28106393-0.40376001)**2)

In [None]:
np.linalg.norm(data_reduced_nmf[0,:] - data_reduced_nmf[2,:])

In [None]:
np.linalg.norm(data_reduced_nmf[1,:] - data_reduced_nmf[2,:])

In [None]:
data['place'][:3]

In [None]:
# or easier
from sklearn.metrics.pairwise import euclidean_distances

In [None]:
euclidean_matrix = euclidean_distances(data_reduced_nmf)
euclidean_matrix.shape

In [None]:
np.argsort(euclidean_matrix[0,:])[:3]

In [None]:
data[data['place']=='Aalborg']

In [None]:
ixs = np.argsort(euclidean_matrix[588,:])[:10]
print(data['place'][ixs])

In [None]:
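def recommender_city(place, n_recs):
    # Return the n_recs cities closest to `place` in the NMF space
    if place in data['place'].values:
        ix = data[data['place'] == place].index[0]
        # Position 0 of the sorted distances is normally the city itself, so skip it
        ixs = np.argsort(euclidean_matrix[ix, :])[1:int(n_recs) + 1]
        return data['place'][ixs]
    else:
        return 'Place not in the dataset'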

In [None]:
recommender_city('Beijing', 10)

In [None]:
!pip install gradio -q

In [None]:
import gradio as gr

In [None]:
demo = gr.Interface(
    fn=recommender_city,
    inputs=[
        gr.Dropdown(data['place'].tolist(), label="City I like!", info="Pick one!"),
        gr.Slider(1, 15, 5, step=1, label="Number of recommendations"),
    ],
    outputs="text",
)

In [None]:
demo.launch()