# How an old Nintendo baddie boosts portfolio analysis

K-medoids is similar to k-means. The difference is in the cluster centers: k-means assigns each point to the nearest cluster mean, while k-medoids assigns each point to the nearest representative data point. That representative point is called the "medoid" of its cluster.

When clustering stocks by return and volatility, k-medoids is more robust to outliers because its cluster centers are actual observations. A k-means center is an average, so a few extreme returns or volatilities can drag it away from the bulk of the cluster. For a portfolio of high-volatility tech stocks, where such extremes can occur, that can distort the clusters.
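To make the outlier point concrete, here is a minimal sketch in plain Python. The volatility numbers are made up for illustration, and the `medoid` helper is a hypothetical name, not part of any library. It compares an average-based center with a medoid on a one-dimensional example.

```python
from statistics import mean

# Made-up annualized volatilities for five stocks; the last one is an outlier.
vols = [0.22, 0.25, 0.27, 0.30, 1.40]

# k-means style center: the arithmetic mean, which need not be an observed value.
center_mean = mean(vols)


def medoid(values):
    """Return the observed value with the smallest total distance to all values."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))


# k-medoids style center: always one of the observations.
center_medoid = medoid(vols)

print(f"mean:   {center_mean:.3f}")    # mean:   0.488
print(f"medoid: {center_medoid:.3f}")  # medoid: 0.270
```

The single outlier pulls the mean to 0.488, a volatility none of the stocks has, while the medoid stays at 0.27 among the four typical stocks.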

By understanding k-medoids and how it's used in practice, you can make more informed decisions about your investments.

## Imports and setup

We’ll use the scikit-learn-extra module to run the k-medoids analysis. scikit-learn-extra is a module for machine learning that extends scikit-learn. It includes algorithms that are useful but do not satisfy the scikit-learn inclusion criteria.

In [4]:
import numpy as np
import pandas as pd
from sklearn_extra.cluster import KMedoids
import matplotlib.pyplot as plt
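# Assumes the OpenBB SDK (v3), whose `openbb` object exposes stocks.ca.hist
from openbb_terminal.sdk import openbb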

In [None]:
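# Read the Nasdaq-100 constituents table from Wikipedia.
# The table index (4) and the "Ticker" column depend on the page's current layout.
nq = pd.read_html("https://en.wikipedia.org/wiki/Nasdaq-100")[4]
symbols = nq.Ticker.tolist()

# Download historical prices for every symbol.
data = openbb.stocks.ca.hist(
    symbols,
    start_date="2020-01-01",
    end_date="2022-12-31"
)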

In [None]:
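# Daily percentage returns -> per-stock mean and standard deviation,
# annualized assuming 252 trading days (mean * 252, std * sqrt(252)).
moments = (
    data
    .pct_change()
    .describe()
    .T[["mean", "std"]]
    .rename(columns={"mean": "returns", "std": "vol"})
) * [252, np.sqrt(252)]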
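The multiplication by `[252, np.sqrt(252)]` annualizes the daily figures. As a rough sketch of the arithmetic for a single stock, assuming 252 trading days per year and daily returns that are roughly independent, the plain-Python example below uses made-up returns. `TRADING_DAYS` and the variable names are illustrative, not from the notebook.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed number of trading days per year

# Made-up daily returns for one stock, for illustration only.
daily_returns = [0.010, -0.020, 0.015, 0.005, -0.005]

daily_mean = mean(daily_returns)
daily_std = stdev(daily_returns)  # sample standard deviation (n - 1 in the denominator)

# The mean return scales linearly with time, while volatility
# scales with the square root of time.
annual_return = daily_mean * TRADING_DAYS
annual_vol = daily_std * math.sqrt(TRADING_DAYS)

print(f"annualized return: {annual_return:.3f}")
print(f"annualized vol:    {annual_vol:.3f}")
```

The square-root rule follows from variances adding up over independent days, so it is an approximation when returns are autocorrelated.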

## Running the k-medoids analysis

Getting the medoids is only one line of code. The remaining code creates colors for the points in each cluster.

In [None]:
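# Fit k-medoids with five clusters on the (returns, vol) pairs.
km = KMedoids(n_clusters=5).fit(moments)
labels = km.labels_

# One color per cluster, evenly spaced on the Spectral colormap.
unique_labels = set(labels)
colors = [
    plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))
]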
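For intuition about what a k-medoids fit does, here is a toy version in plain Python. It is a simplified sketch of the alternating "assign, then update" idea, not scikit-learn-extra's implementation, which has its own initialization and options. The function name `k_medoids`, the choice to start from the first `k` points, and the sample points are all assumptions made for this example.

```python
import math


def k_medoids(points, k, max_iter=100):
    """Toy alternating k-medoids on a list of (x, y) tuples.

    Returns (medoid_indices, labels). Starting from the first k points
    is a simplification chosen for this sketch.
    """
    # Pairwise Euclidean distances between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]
    medoids = list(range(k))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = [
            min(range(k), key=lambda c: dist[i][medoids[c]])
            for i in range(len(points))
        ]
        # Update step: within each cluster, pick the member with the
        # smallest total distance to the other members.
        new_medoids = []
        for c in range(k):
            # Keep the old medoid if a cluster ends up empty.
            members = [i for i, lab in enumerate(labels) if lab == c] or [medoids[c]]
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break  # medoids stopped changing
        medoids = new_medoids
    return medoids, labels


# Made-up (annual return, annual vol) pairs forming two obvious groups.
points = [
    (0.10, 0.20), (0.40, 0.60), (0.12, 0.23),
    (0.11, 0.21), (0.43, 0.62), (0.41, 0.61),
]
medoids, labels = k_medoids(points, k=2)
print([points[i] for i in medoids])
print(labels)
```

On this sample, the medoids settle on `(0.11, 0.21)` and `(0.41, 0.61)`, the most central observation in each group, and every center is a real data point.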

In [None]:
# Plot each cluster's stocks in its own color.
for k, col in zip(unique_labels, colors):
    class_member_mask = labels == k

    xy = moments[class_member_mask]
    plt.plot(
        xy.iloc[:, 0],
        xy.iloc[:, 1],
        "o",
        markerfacecolor=tuple(col),
        markeredgecolor="k",
    )

# Overlay the medoids (the cluster centers) in cyan.
plt.plot(
    km.cluster_centers_[:, 0],
    km.cluster_centers_[:, 1],
    "o",
    markerfacecolor="cyan",
    markeredgecolor="k",
)
plt.xlabel("Ann. Return")
plt.ylabel("Ann. Vol.")

Each cluster represents a set of stocks with similar risk-return characteristics. By examining these clusters, we can identify stocks that are statistically similar in terms of their performance metrics.

This information is useful for asset allocation when you are targeting specific risk and return objectives. For instance, a cluster with high returns and low volatility would be particularly appealing for risk-averse investors seeking stable growth.

Conversely, a cluster with high returns and high volatility might be more suitable for investors with a higher risk tolerance.
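One hypothetical way to compare clusters is to average the return and volatility of each cluster's members and then look at return per unit of volatility. The sketch below does this in plain Python. The tickers, numbers, and cluster labels are made up for illustration, and the ratio is only one possible ranking rule, not something the analysis above prescribes.

```python
from collections import defaultdict

# Made-up (ticker, annual return, annual vol, cluster label) rows.
rows = [
    ("AAA", 0.10, 0.20, 0),
    ("BBB", 0.14, 0.20, 0),
    ("CCC", 0.30, 0.60, 1),
    ("DDD", 0.34, 0.60, 1),
]

# Group the (return, vol) pairs by cluster label.
grouped = defaultdict(list)
for ticker, ret, vol, label in rows:
    grouped[label].append((ret, vol))

# Average return, average vol, and return per unit of vol for each cluster.
summary = {}
for label, members in grouped.items():
    avg_ret = sum(r for r, _ in members) / len(members)
    avg_vol = sum(v for _, v in members) / len(members)
    summary[label] = {"returns": avg_ret, "vol": avg_vol, "ratio": avg_ret / avg_vol}

# The cluster with the most return per unit of volatility.
best = max(summary, key=lambda label: summary[label]["ratio"])

for label, stats in sorted(summary.items()):
    print(label, {name: round(value, 3) for name, value in stats.items()})
print("best cluster:", best)
```

With the notebook's own objects, something like `moments.assign(cluster=labels).groupby("cluster").mean()` should give the same per-cluster averages in pandas.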