# **Finding subgroups in the stock market using the Affinity Propagation model**
**Affinity Propagation** is a clustering algorithm that doesn't require us to specify the number of clusters beforehand. Because it is generic and simple to implement, it has found applications in many fields. It identifies representative members of each cluster, called exemplars, using a technique called message passing. We start by specifying the measure of similarity that we want it to use. It initially treats all training data points as potential exemplars, then passes messages between the data points until a stable set of exemplars emerges.

The message passing happens in two alternating steps, called responsibility and availability. Responsibility is the message sent from a data point to a candidate exemplar, indicating how well suited that candidate is to serve as the point's exemplar compared with the other candidates. Availability is the message sent from a candidate exemplar to a data point, indicating how appropriate it would be for the point to choose that candidate, given the support the candidate receives from other points. The two steps repeat until the choice of exemplars stops changing or a maximum number of iterations is reached.
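
Written out, with $s(i,k)$ the similarity between point $i$ and candidate exemplar $k$, $r(i,k)$ the responsibility, and $a(i,k)$ the availability, the updates in the standard formulation of affinity propagation are as follows. The notation is introduced here for illustration only and is not used elsewhere in this notebook:

$$
r(i,k) \leftarrow s(i,k) - \max_{k' \neq k} \left[ a(i,k') + s(i,k') \right]
$$

$$
a(i,k) \leftarrow \min\left(0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\bigl(0,\, r(i',k)\bigr)\right), \quad i \neq k
$$

$$
a(k,k) \leftarrow \sum_{i' \neq k} \max\bigl(0,\, r(i',k)\bigr)
$$

Once the messages settle, point $k$ is taken as an exemplar when $r(k,k) + a(k,k) > 0$. Implementations usually damp each update by mixing it with the previous value to avoid oscillations.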

There is also a parameter called preference, which is the self-similarity assigned to each point and controls how many exemplars are found. A high value makes many points attractive as exemplars and produces many clusters; a low value produces few clusters. A common starting point is the median similarity between the points.
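
The effect of the preference can be seen on a toy problem. The sketch below is a minimal NumPy version of the message-passing updates, written for illustration and not taken from scikit-learn; the function name `affinity_propagation_sketch`, the six sample points, and the fixed iteration count are all assumptions made for this example.

```python
import numpy as np

def affinity_propagation_sketch(S, preference, damping=0.5, n_iter=200):
    """Toy affinity propagation; returns the indices of the exemplars."""
    S = np.array(S, dtype=float)           # work on a copy
    n = S.shape[0]
    np.fill_diagonal(S, preference)        # preference = self-similarity
    R = np.zeros((n, n))                   # responsibilities r(i, k)
    A = np.zeros((n, n))                   # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(n_iter):
        # Responsibility: s(i, k) minus the best competing a(i, k') + s(i, k')
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # Availability: r(k, k) plus the positive responsibilities sent to k
        # by the other points, clipped at zero off the diagonal
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new
    return np.flatnonzero((R + A).diagonal() > 0)

# Two tight groups on a line; similarity is the negative squared distance
points = np.array([0.0, 0.3, 0.5, 10.0, 10.2, 10.6])
S = -(points[:, None] - points[None, :]) ** 2
median_pref = np.median(S[~np.eye(len(points), dtype=bool)])

low = affinity_propagation_sketch(S, preference=median_pref)
high = affinity_propagation_sketch(S, preference=0.0)
print("median preference:", len(low), "exemplars")
print("preference 0:     ", len(high), "exemplars")
```

With a preference of 0, which is higher than every pairwise similarity here, each point prefers itself and all six become exemplars. The median preference should give fewer.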

Let's use the Affinity Propagation model to find subgroups in the stock market. We will be using the variation in each stock's quote between opening and closing as the governing feature.

In [4]:
import datetime
import json

In [24]:
import os

In [5]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import covariance, cluster
# from matplotlib.finance import quotes_historical_yahoo_ochl as quotes_yahoo  # no longer available in matplotlib

In [16]:
import yfinance as yf
import mplfinance as mpf

In [6]:
from mplfinance.original_flavor import candlestick_ohlc

# Input file containing company symbols 
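The original version of this example loaded stock quotes through `matplotlib.finance`, which is no longer available, so this notebook downloads them with `yfinance` instead. The company symbols are mapped to their full names in the file `company_symbol_mapping.json`: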

In [7]:
input_file = 'company_symbol_mapping.json'

# Load the company symbol map

In [8]:
with open(input_file, 'r') as f:
    company_symbols_map = json.load(f)

In [9]:
symbols, names = np.array(list(company_symbols_map.items())).T
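
To see what the unpacking above does, here is a small sketch on a hypothetical two-entry excerpt in the same symbol-to-name layout. The `sample_*` names are made up for this example so that they do not overwrite the notebook's variables.

```python
import json
import numpy as np

# Hypothetical two-entry excerpt in the same symbol -> name layout
sample_json = '{"XOM": "Exxon", "MSFT": "Microsoft"}'
sample_map = json.loads(sample_json)

# items() yields (symbol, name) pairs, so the array has one row per company;
# transposing it puts all symbols in the first row and all names in the second
sample_symbols, sample_names = np.array(list(sample_map.items())).T
print(sample_symbols)  # ['XOM' 'MSFT']
print(sample_names)    # ['Exxon' 'Microsoft']
```

Because both results are NumPy arrays, they can later be indexed with a boolean mask such as `names[labels == i]`.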

In [10]:
symbols

array(['TOT', 'XOM', 'CVX', 'COP', 'VLO', 'MSFT', 'IBM', 'TWX', 'CMCSA',
       'CVC', 'YHOO', 'DELL', 'HPQ', 'AMZN', 'TM', 'CAJ', 'MTU', 'SNE',
       'F', 'HMC', 'NAV', 'NOC', 'BA', 'KO', 'MMM', 'MCD', 'PEP', 'MDLZ',
       'K', 'UN', 'MAR', 'PG', 'CL', 'GE', 'WFC', 'JPM', 'AIG', 'AXP',
       'BAC', 'GS', 'AAPL', 'SAP', 'CSCO', 'TXN', 'XRX', 'LMT', 'WMT',
       'WBA', 'HD', 'GSK', 'PFE', 'SNY', 'NVS', 'KMB', 'R', 'GD', 'RTN',
       'CVS', 'CAT', 'DD'], dtype='<U17')

In [11]:
names

array(['Total', 'Exxon', 'Chevron', 'ConocoPhillips', 'Valero Energy',
       'Microsoft', 'IBM', 'Time Warner', 'Comcast', 'Cablevision',
       'Yahoo', 'Dell', 'HP', 'Amazon', 'Toyota', 'Canon', 'Mitsubishi',
       'Sony', 'Ford', 'Honda', 'Navistar', 'Northrop Grumman', 'Boeing',
       'Coca Cola', '3M', 'Mc Donalds', 'Pepsi', 'Kraft Foods', 'Kellogg',
       'Unilever', 'Marriott', 'Procter Gamble', 'Colgate-Palmolive',
       'General Electrics', 'Wells Fargo', 'JPMorgan Chase', 'AIG',
       'American express', 'Bank of America', 'Goldman Sachs', 'Apple',
       'SAP', 'Cisco', 'Texas instruments', 'Xerox', 'Lookheed Martin',
       'Wal-Mart', 'Walgreen', 'Home Depot', 'GlaxoSmithKline', 'Pfizer',
       'Sanofi-Aventis', 'Novartis', 'Kimberly-Clark', 'Ryder',
       'General Dynamics', 'Raytheon', 'CVS', 'Caterpillar',
       'DuPont de Nemours'], dtype='<U17')

# Load the historical stock quotes 
Download the stock quotes for the chosen date range with `yfinance`:

In [12]:
start_date = datetime.datetime(2003, 7, 3)
end_date = datetime.datetime(2007, 5, 4)

In [None]:
# Old approach (relied on matplotlib.finance, which is no longer available):
# quotes = [
#     quotes_yahoo(symbol, start_date, end_date, asobject=True) 
#                 for symbol in symbols
# ]

NameError: name 'quotes_yahoo' is not defined

In [18]:
# Download stock data
df = yf.download("AAPL", start=start_date, end=end_date)

[*********************100%***********************]  1 of 1 completed


In [22]:
# Set download folder path
download_path = "C:\\Users\\aashi\\GitHub\\My Repositories\\Machine-Learning-Implementation\\Detecting Patterns with Unsupervised Learning\\StockData"

In [25]:
# Create folder if it doesn't exist
os.makedirs(download_path, exist_ok=True)

In [27]:
quotes = [
    yf.download(symbol, start=start_date, end=end_date) for symbol in symbols
]

[*********************100%***********************]  1 of 1 completed

1 Failed download:
['TOT']: YFTzMissingError('possibly delisted; no timezone found')

In [None]:
# Download and save each symbol's data
for symbol in symbols:
    df = yf.download(symbol, start=start_date, end=end_date)
    file_path = os.path.join(download_path, f"{symbol}.csv")
    df.to_csv(file_path)
    print(f"Saved {symbol} data to {file_path}")

1 Failed download:
['TOT']: YFTzMissingError('possibly delisted; no timezone found')
Saved TOT data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\TOT.csv
Saved XOM data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\XOM.csv
Saved CVX data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\CVX.csv
Saved COP data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\COP.csv
Saved VLO data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\VLO.csv
Saved MSFT data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\MSFT.csv
Saved IBM data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\IBM.csv
Saved TWX data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\TWX.csv
Saved CMCSA data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\CMCSA.csv
1 Failed download:
['CVC']: YFPricesMissingError('possibly delisted; no price data found  (1d 2003-07-03 00:00:00 -> 2007-05-04 00:00:00)')
Saved CVC data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\CVC.csv
1 Failed download:
['YHOO']: YFTzMissingError('possibly delisted; no timezone found')
Saved YHOO data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\YHOO.csv
1 Failed download:
['DELL']: YFPricesMissingError('possibly delisted; no price data found  (1d 2003-07-03 00:00:00 -> 2007-05-04 00:00:00) (Yahoo error = "Data doesn\'t exist for startDate = 1057204800, endDate = 1178251200")')
Saved DELL data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\DELL.csv
Saved HPQ data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\HPQ.csv
Saved AMZN data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\AMZN.csv
Saved TM data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\TM.csv
1 Failed download:
['CAJ']: YFPricesMissingError('possibly delisted; no price data found  (1d 2003-07-03 00:00:00 -> 2007-05-04 00:00:00)')
Saved CAJ data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\CAJ.csv
1 Failed download:
['MTU']: YFPricesMissingError('possibly delisted; no price data found  (1d 2003-07-03 00:00:00 -> 2007-05-04 00:00:00)')
Saved MTU data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\MTU.csv
1 Failed download:
['SNE']: YFTzMissingError('possibly delisted; no timezone found')
Saved SNE data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\SNE.csv
Saved F data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\F.csv
Saved HMC data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\HMC.csv
1 Failed download:
['NAV']: YFTzMissingError('possibly delisted; no timezone found')
Saved NAV data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\NAV.csv
Saved NOC data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\NOC.csv
Saved BA data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\BA.csv
Saved KO data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\KO.csv
Saved MMM data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\MMM.csv
Saved MCD data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\MCD.csv
Saved PEP data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\PEP.csv
Saved MDLZ data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\MDLZ.csv
Saved K data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\K.csv
1 Failed download:
['UN']: YFTzMissingError('possibly delisted; no timezone found')
Saved UN data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\UN.csv
Saved MAR data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\MAR.csv
Saved PG data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\PG.csv
Saved CL data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\CL.csv
Saved GE data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\GE.csv
Saved WFC data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\WFC.csv
Saved JPM data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\JPM.csv
Saved AIG data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\AIG.csv
Saved AXP data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\AXP.csv
Saved BAC data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\BAC.csv
Saved GS data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\GS.csv
Saved AAPL data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\AAPL.csv
Saved SAP data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\SAP.csv
Saved CSCO data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\CSCO.csv
Saved TXN data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\TXN.csv
Saved XRX data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\XRX.csv
Saved LMT data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\LMT.csv
Saved WMT data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\WMT.csv
Saved WBA data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\WBA.csv
Saved HD data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\HD.csv
Saved GSK data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\GSK.csv
Saved PFE data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\PFE.csv
Saved SNY data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\SNY.csv
Saved NVS data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\NVS.csv
Saved KMB data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\KMB.csv
Saved R data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\R.csv
Saved GD data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\GD.csv
1 Failed download:
['RTN']: YFTzMissingError('possibly delisted; no timezone found')
Saved RTN data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\RTN.csv
Saved CVS data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\CVS.csv
Saved CAT data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\CAT.csv
Saved DD data to C:\Users\aashi\GitHub\My Repositories\Machine-Learning-Implementation\Detecting Patterns with Unsupervised Learning\StockData\DD.csv


In [19]:
df

Ticker: AAPL

| Date | Close | High | Low | Open | Volume |
| --- | --- | --- | --- | --- | --- |
| 2003-07-03 | 0.287468 | 0.293779 | 0.285214 | 0.285515 | 137771200 |
| 2003-07-07 | 0.298588 | 0.303247 | 0.287468 | 0.289572 | 286272000 |
| 2003-07-08 | 0.306553 | 0.308055 | 0.292878 | 0.293328 | 256737600 |
| 2003-07-09 | 0.298889 | 0.307304 | 0.298889 | 0.303698 | 213645600 |
| 2003-07-10 | 0.294231 | 0.299640 | 0.291075 | 0.298739 | 170934400 |
| ... | ... | ... | ... | ... | ... |
| 2007-04-27 | 3.003014 | 3.003916 | 2.935994 | 2.950721 | 699403600 |
| 2007-04-30 | 2.999409 | 3.035474 | 2.995502 | 3.008124 | 616509600 |
| 2007-05-01 | 2.989489 | 3.015937 | 2.961840 | 2.993096 | 532523600 |
| 2007-05-02 | 3.017140 | 3.021648 | 2.989491 | 2.994901 | 505145200 |


# Extract opening and closing quotes
Extract the daily opening and closing quotes for each symbol:

In [30]:
# opening_quotes = np.array([quote.open for quote in quotes]).astype(np.float)
# closing_quotes = np.array([quote.close for quote in quotes]).astype(np.float)
opening_quotes = np.array([quote['Open'].iloc[0] for quote in quotes], dtype=float)
closing_quotes = np.array([quote['Close'].iloc[-1] for quote in quotes], dtype=float)

IndexError: single positional indexer is out-of-bounds

The attempt above fails because tickers whose download failed come back as empty DataFrames. Checking for scalars does not help either: `yf.download` returns MultiIndex columns (price field, ticker), so `quote['Open'].iloc[0]` is a row rather than a number, and the clustering needs the full daily series for each stock, not a single value. The cell below keeps only the symbols that have data and builds one row of daily quotes per symbol:

In [None]:
import pandas as pd

# Drop symbols whose download failed; yfinance returns an empty DataFrame for them
valid = np.array([not quote.empty for quote in quotes])
quotes = [quote for quote in quotes if not quote.empty]
symbols, names = symbols[valid], names[valid]

def daily_series(quote, field):
    # With MultiIndex columns (Price, Ticker), quote[field] is a one-column
    # DataFrame, so take its single column as a Series indexed by date
    return pd.DataFrame(quote[field]).iloc[:, 0]

# One column per symbol, aligned on trading days
opens = pd.concat([daily_series(q, 'Open') for q in quotes], axis=1, keys=symbols)
closes = pd.concat([daily_series(q, 'Close') for q in quotes], axis=1, keys=symbols)

# Keep only the days on which every remaining symbol has both quotes
common_days = opens.dropna().index.intersection(closes.dropna().index)

# Arrays of shape (n_symbols, n_days)
opening_quotes = opens.loc[common_days].to_numpy(dtype=float).T
closing_quotes = closes.loc[common_days].to_numpy(dtype=float).T

In [None]:
opening_quotes.shape

In [None]:
closing_quotes.shape

# Compute differences between opening and closing quotes 

In [None]:
quotes_diff = closing_quotes - opening_quotes

# Normalize the data 

In [None]:
X = quotes_diff.copy().T
X /= X.std(axis=0)
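
The normalization matters because the raw daily variation of a stock scales with its price level. The sketch below uses made-up quotes for two hypothetical stocks (the `toy_*` names and all numbers are assumptions for illustration) to show the same two steps on a small array.

```python
import numpy as np

# Made-up quotes: rows are stocks, columns are trading days
toy_open = np.array([[10.0, 10.5, 10.2, 10.8],
                     [200.0, 204.0, 198.0, 210.0]])
toy_close = np.array([[10.4, 10.3, 10.9, 10.6],
                      [206.0, 199.0, 209.0, 207.0]])

toy_diff = toy_close - toy_open   # daily variation per stock
toy_X = toy_diff.T.copy()         # rows: days, columns: stocks

print(toy_X.std(axis=0))          # the raw spreads differ by a large factor
toy_X /= toy_X.std(axis=0)        # scale every stock to unit standard deviation
print(toy_X.std(axis=0))
```

After scaling, both columns have a standard deviation of 1, so the higher-priced stock no longer dominates the covariance estimate.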

# Create a graph model 

In [None]:
# GraphLassoCV was renamed to GraphicalLassoCV in newer scikit-learn releases
edge_model = covariance.GraphicalLassoCV()

# Train the model

In [None]:
with np.errstate(invalid='ignore'):
    edge_model.fit(X)

# Build clustering model using Affinity Propagation model
Cluster the stocks with affinity propagation, using the covariance matrix estimated by the edge model as the similarity matrix:

In [None]:
_, labels = cluster.affinity_propagation(edge_model.covariance_)
num_labels = labels.max()
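
Since several of the downloads above fail, it can help to try the same two steps on synthetic data first. The sketch below is only an illustration under stated assumptions: the six "stocks" are random series driven by two hypothetical factors, and every `toy_*` name is made up for this example. It assumes a scikit-learn version that provides `GraphicalLassoCV` and accepts `random_state` in `affinity_propagation`.

```python
import numpy as np
from sklearn import cluster, covariance

rng = np.random.default_rng(0)
n_days = 300

# Two hypothetical market factors, each driving three synthetic stocks
factor_a = rng.normal(size=n_days)
factor_b = rng.normal(size=n_days)
toy_names = np.array(['A1', 'A2', 'A3', 'B1', 'B2', 'B3'])
toy_X = np.column_stack(
    [factor_a + 0.3 * rng.normal(size=n_days) for _ in range(3)]
    + [factor_b + 0.3 * rng.normal(size=n_days) for _ in range(3)]
)
toy_X /= toy_X.std(axis=0)

# Estimate a sparse covariance structure, then cluster on the covariance matrix
toy_model = covariance.GraphicalLassoCV()
toy_model.fit(toy_X)
_, toy_labels = cluster.affinity_propagation(toy_model.covariance_, random_state=0)

for label in np.unique(toy_labels):
    print("Cluster", label, "==>", ', '.join(toy_names[toy_labels == label]))
```

If the pipeline behaves as intended, series driven by the same factor should tend to end up with the same label, though the exact grouping depends on the random draw and on the library version.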

# Print the results of clustering

In [None]:
print('\nClustering of stocks based on difference in opening and closing quotes:\n')
for i in range(num_labels + 1):
    print("Cluster", i+1, "==>", ', '.join(names[labels == i]))