**Introduction**

I have long been interested in investing in digital commodities such as cryptocurrencies, which US and other regulatory commissions effectively classify as commodities. Selecting a well-diversified portfolio requires careful analysis of each asset and of how the assets tend to move together. In this introductory section, I explain why I chose a diverse set of assets: Apple, Nike, Levi Strauss & Co., Southwest Airlines, Tesla, Sony, Spotify, Bitcoin, and Amazon. I then explore what they have in common and why I group them together to assess whether their prices move in a covariant manner.

Apple, Nike, Levi Strauss & Co., and Sony are all established brands with strong market positions in their respective sectors. These companies have built reputations for high-quality products and experiences, which drive customer loyalty and generate consistent revenue streams. The recognition and trust attached to these brands contribute to their potential for long-term growth and stability. Taken together, the nine assets span several industries, including technology, consumer goods, transportation, entertainment, and cryptocurrency. This deliberate diversification across sectors aims to reduce concentration risk and provide exposure to various segments of the global economy.

**Why Cluster Commodities to Study Bitcoin and These Other Brands?**

In summary, I selected Apple, Nike, Levi Strauss & Co., Southwest Airlines, Tesla, Sony, Spotify, Bitcoin, and Amazon for this portfolio because of their association with renowned global brands, their spread across industries, and their innovative and disruptive potential. By grouping these assets together, I aim to explore how far their prices move together, to reduce risk through diversification, and to capture growth opportunities. I hold some of these assets myself, but not all: I own Bitcoin, Apple, Spotify, and Tesla, and I plan to expand into more of these brands based on this assignment. Including assets I already own seemed a good idea, since it lets me learn more about them beyond what I already know.

**Using Cluster Matrices to Study Covariant, Affine Price Behaviors between Bitcoin and Other Commodity Flows**

This study samples the recent price behavior of the nine assets listed above, then traces their covariant, linear behavior in matrix form. Affine groups, that is, groups of common movers, are established and presented interactively in a visual form.

This section discusses the data pipeline, the data transformations needed to create this affine matrix, and the technical tools that facilitate them.

In [4]:
!pip install yfinance
!pip install vega_datasets

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [5]:
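import yfinance as yf

def ifs(symbol):
    """Download 14 days of 30-minute bars for `symbol` and save them to a file named after it."""
    # Two futures symbols are mapped explicitly because they contain more than one "F".
    if symbol == 'gff':
        symbol = 'GFF'
        ticker = "GF=F"
    elif symbol == 'zff':
        symbol = 'ZFF'
        ticker = "ZF=F"
    else:
        symbol = symbol.upper()
        # Insert "=" before the first "F" to form a Yahoo futures ticker (e.g. "GCF" -> "GC=F").
        # Symbols without an "F", such as the nine used in this notebook, pass through unchanged.
        ticker = symbol.replace("F", "=F", 1)
    print(ticker)
    data = yf.download(
        tickers = ticker,
        period = "14d",
        interval = "30m",
        group_by = 'ticker',
        auto_adjust = True,
        prepost = True,
        threads = True
    )
    # The file name is the bare symbol (no extension); the next cell reads it back by that name.
    data.to_csv(symbol)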
#!ls #only in jupy
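
The symbol-to-ticker rule inside `ifs` can be checked without touching the network. The sketch below isolates it in a hypothetical helper, `to_yahoo_ticker`, which is not part of the notebook. It assumes the intent of the rule is to insert `=` before the first `F`, so that a futures symbol such as `GCF` becomes `GC=F`.

```python
def to_yahoo_ticker(symbol: str) -> str:
    """Hypothetical helper mirroring the symbol rule used in ifs()."""
    # Explicit mappings for the two lowercase special cases handled in ifs().
    special = {"gff": "GF=F", "zff": "ZF=F"}
    if symbol in special:
        return special[symbol]
    # Upper-case the symbol, then put "=" in front of the first "F" only.
    return symbol.upper().replace("F", "=F", 1)


if __name__ == "__main__":
    for s in ["AAPL", "BTC-USD", "gcf", "gff"]:
        print(s, "->", to_yahoo_ticker(s))
```

One consequence of this rule is that any equity symbol containing an `F` would also be rewritten (for example, `F` would become `=F`). None of the nine symbols used here contains an `F`, so they are passed to the download unchanged.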

**Trigger Data Downloads**

The following code customizes the commodities under investigation. To compare every commodity's price history against the rest in the matrix, each data capture is truncated to the length of the shortest one, so longer series are only used up to that length.

The volatility of every price bar is calculated as the close price minus the open price.
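
As a small illustration of these two steps, the sketch below truncates series of unequal length to the shortest one and then takes close minus open. The prices are made-up numbers, and `truncate_to_shortest` is a hypothetical helper rather than a function from the notebook.

```python
import numpy as np


def truncate_to_shortest(series):
    """Cut every 1-D sequence to the length of the shortest one and stack them row-wise."""
    n = min(len(s) for s in series)
    return np.vstack([np.asarray(s, dtype=float)[:n] for s in series])


# Made-up open and close prices for two assets with different numbers of bars.
opens = [[10.0, 10.5, 11.0, 10.8], [200.0, 198.0, 199.0]]
closes = [[10.4, 10.9, 10.7, 11.1], [199.0, 199.5, 201.0]]

open_prices = truncate_to_shortest(opens)
close_prices = truncate_to_shortest(closes)

# Per-bar price change: one row per asset, one column per bar.
change = close_prices - open_prices
print(change.shape)
print(change)
```

Truncating by row count makes the arrays the same length, but it does not by itself guarantee that a given row refers to the same timestamp in every series.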

In [6]:
#read in csv data from each commodity capture, then
#stack the 'Open' and 'Close' columns into arrays
import numpy as np 
import pandas as pd
symbol_dict = {"AAPL":"Apple Inc.", "NKE":"NIKE, Inc.", "LUV":"Southwest Airlines Co.",
               "TSLA":"Tesla, Inc.", "LEVI":"Levi Strauss & Co.", "SONY":"Sony Group Corporation", "SPOT":"Spotify Technology S.A.", 
               "BTC-USD":"Bitcoin USD", "AMZN":"Amazon.com, Inc."} #QQ, SPY , TNX, VIX

sym, names = np.array(sorted(symbol_dict.items())).T

for i in sym:    #download each symbol and write it to a file in the working directory
    ifs(i)

#read each file once and find the shortest series
frames = [pd.read_csv(symbol.upper()) for symbol in sym]
mm = min(len(t) for t in frames) - 1    #last row index of the shortest series
print("min length of data: ",mm)

#truncate every series to the shortest one so all arrays have equal length
quotes = [t.truncate(after=mm) for t in frames]
close_prices = np.vstack([q["Close"] for q in quotes])
open_prices = np.vstack([q["Open"] for q in quotes])

#per-bar price change (close minus open), one row per symbol
volatility = close_prices - open_prices
      

AAPL
[*********************100%***********************]  1 of 1 completed
AMZN
[*********************100%***********************]  1 of 1 completed
BTC-USD
[*********************100%***********************]  1 of 1 completed
LEVI
[*********************100%***********************]  1 of 1 completed
LUV
[*********************100%***********************]  1 of 1 completed
NKE
[*********************100%***********************]  1 of 1 completed
SONY
[*********************100%***********************]  1 of 1 completed
SPOT
[*********************100%***********************]  1 of 1 completed
TSLA
[*********************100%***********************]  1 of 1 completed
min length of data:  359


In [7]:
from sklearn import covariance
import altair as alt
alphas = np.logspace(-1.5, 1, num=15)
edge_model = covariance.GraphicalLassoCV(alphas=alphas)
X = volatility.copy().T
X /= X.std(axis=0)
edge_model.fit(X)
print(type(edge_model.alphas))
rows = []                                  #one row per candidate alpha in the grid
for i, alpha in enumerate(edge_model.alphas):
    print(alpha)
    rows.append({"idx":i , "alpha":alpha})
    
dd = pd.DataFrame(rows)
alt.Chart(dd).mark_point(filled=True, size=100).encode(
    y=alt.Y('idx'),
    x=alt.X('alpha'),tooltip=['alpha'],).properties(
        width=800,
        height=400,
        title="Candidate Alpha Values for the Graphical Lasso Model"
    ).interactive()

<class 'numpy.ndarray'>
0.03162277660168379
0.047705826961439296
0.07196856730011521
0.10857111194022041
0.16378937069540642
0.2470911227985605
0.372759372031494
0.5623413251903491
0.8483428982440722
1.279802213997954
1.9306977288832505
2.9126326549087382
4.39397056076079
6.628703161826448
10.0
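
The values printed above are the candidate grid passed to `GraphicalLassoCV` through its `alphas` argument, not a result of the fit. The value chosen by cross-validation is stored separately in the fitted attribute `alpha_`. The short sketch below rebuilds the same grid with `np.logspace` to show why the points look evenly spaced at first and then spread out: consecutive values differ by a constant ratio, so the absolute gaps grow.

```python
import numpy as np

# Same grid as in the notebook: 15 points from 10**-1.5 to 10**1.
alphas = np.logspace(-1.5, 1, num=15)

gaps = np.diff(alphas)               # absolute difference between neighbours
ratios = alphas[1:] / alphas[:-1]    # multiplicative step between neighbours

print("first gap:", gaps[0], "last gap:", gaps[-1])
print("ratios:", ratios.round(4))
```

Plotting `alpha` on a logarithmic axis would make the points appear evenly spaced.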


**Defining Cluster Membership, by Covariant Affinity**

Clusters of covariant, affine-moving commodities are established. The clusters are then collected into a dataframe so that the buckets of symbols become visible.
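
The grouping step relies on boolean masks: every symbol carries an integer cluster label, and `sym[labels == i]` picks out the members of cluster `i`. The sketch below shows the idiom on its own. The `labels` array here is an assumption, reconstructed by hand from the cluster listing this notebook prints, not taken from a fresh run of affinity propagation.

```python
import numpy as np

# Symbols in the sorted order used by the notebook.
sym = np.array(["AAPL", "AMZN", "BTC-USD", "LEVI", "LUV",
                "NKE", "SONY", "SPOT", "TSLA"])

# Assumed labels, reconstructed from the printed cluster listing (0-based ids).
labels = np.array([3, 3, 1, 0, 2, 1, 1, 2, 3])

# Map each 1-based cluster number to the symbols whose label matches.
clusters = {int(i) + 1: sym[labels == i].tolist()
            for i in range(labels.max() + 1)}

for cluster_id, members in clusters.items():
    print(f"Cluster {cluster_id}: {', '.join(members)}")
```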

In [8]:
from sklearn import cluster
                                                    #each symbol, at index, is labeled with a cluster id:
_, labels = cluster.affinity_propagation(edge_model.covariance_, random_state=0)
n_labels = labels.max()                             #highest cluster id (ids start at 0)
# print("names: ",names,"  symbols: ",sym)
rows = []
for i in range(n_labels + 1):
    members = np.array(sym)[labels == i]            #symbols assigned to cluster i
    member_names = np.array(names)[labels == i]     #matching company names
    print(f"Cluster {i + 1}: {', '.join(members)}")
    rows.append({"cluster":(i+1), "names":member_names, "size":len(members), "symbols":members})
#build the frame in one step; DataFrame.append was removed in pandas 2.0
gdf = pd.DataFrame(rows)
    
gdf.head(15)


Cluster 1: LEVI
Cluster 2: BTC-USD, NKE, SONY
Cluster 3: LUV, SPOT
Cluster 4: AAPL, AMZN, TSLA



|   | cluster | names | size | symbols |
|---|---|---|---|---|
| 0 | 1 | [Levi Strauss & Co.] | 1 | [LEVI] |
| 1 | 2 | [Bitcoin USD, NIKE, Inc., Sony Group Corporation] | 3 | [BTC-USD, NKE, SONY] |
| 2 | 3 | [Southwest Airlines Co., Spotify Technology S.A.] | 2 | [LUV, SPOT] |
| 3 | 4 | [Apple Inc., Amazon.com, Inc., Tesla, Inc.] | 3 | [AAPL, AMZN, TSLA] |


In [None]:
for i in gdf['cluster']:
    print("cluster ",i)
    d = gdf[gdf['cluster'].eq(i)]
    for j in d.names:
        print(j, ", ")

In [9]:
import altair as alt
def runCluster():
    c = alt.Chart(gdf).mark_circle(size=60).encode(
        x= alt.X('cluster:N'),
        y= alt.Y('size:Q'),
        color='size:Q',
        tooltip=['names'],
        size=alt.Size('size:Q')
    ).properties(
        width=800,
        height=400,
        title="Nine Global Assets, Clustered by Affine Covariance"
    ).interactive()
    return c
runCluster()


**Conclusions**

For this experiment I grouped together nine assets spanning technology, clothing, entertainment, air travel, and cryptocurrency. In the cluster chart, Bitcoin, Nike, and Sony move together in one cluster, Southwest and Spotify in another, and Apple, Amazon, and Tesla in a third, while Levi Strauss forms a cluster of its own. The largest names, Apple, Amazon, and Tesla, tended to move with each other. I also noticed a pattern in the alpha values printed for the Lasso model: they increase steadily, and the gaps between consecutive values widen toward the end, which is what a logarithmically spaced grid produces.

**References**

1. Gael Varoquaux. Visualizing the Stock Market Structure. Scikit-Learn documentation pages, https://scikit-learn.org/stable/auto_examples/applications/plot_stock_market.html
2. Ran Aroussi. YFinance API documents. https://github.com/ranaroussi/yfinance
3. The Altair Charting Toolkit. https://altair-viz.github.io/index.html