Find all Pairs #1

Open
Bryson14 opened this issue Mar 2, 2019 · 1 comment

Collaborator

Bryson14 commented Mar 2, 2019

`

# @author Bryson Meiling
#
# Input is the price data for two tickers. The output is a correlation factor between -1 and 1.
# The two lists must be the same length for the correlation calculation;
# if one is longer, the oldest data of the longer set is cut off.

import math

from statsmodels.tsa.stattools import coint

def pearson_coor(data1: list, data2: list) -> float:
    # Find the Pearson correlation of two lists of equal length.
    x, y = data1, data2
    assert len(x) == len(y)
    n = len(x)
    assert n > 0
    avg_x = average(x)
    avg_y = average(y)
    diffprod = 0
    xdiff2 = 0
    ydiff2 = 0
    for idx in range(n):
        xdiff = x[idx] - avg_x
        ydiff = y[idx] - avg_y
        diffprod += xdiff * ydiff
        xdiff2 += xdiff * xdiff
        ydiff2 += ydiff * ydiff
    # Note: raises ZeroDivisionError if either list is constant (zero variance).
    return diffprod / math.sqrt(xdiff2 * ydiff2)

def average(x: list) -> float:
    assert len(x) > 0
    return float(sum(x)) / len(x)

def coint_test(ticker1: list, ticker2: list) -> tuple:
    # coint returns a tuple (t-statistic, p-value, critical values), not a single float.
    return coint(ticker1, ticker2)

def ticker_list(file) -> list:
    # Read one ticker per line, skipping blank lines so no empty symbol is returned.
    with open(file, 'r') as fil:
        data = fil.readlines()
    return [line.strip() for line in data if line.strip()]

# Warning: this function takes around 40-60 minutes to complete when comparing ~500 securities.
def find_all_pairs(days: int, data_dir: str):
    # Assumption: data_dir is the path to the ticker file. The original call was
    # ticker_list() with no argument, which raises a TypeError.
    symbols = ticker_list(data_dir)
    # symbols = symbols[:10]  # for testing, instead of all ~124,000 possible pairs
    correlated = []
    cointegrated = []
    # Stock is the author's own class (not shown here); ranged_data_list returns
    # the last `days` prices as a list.
    for i in range(len(symbols)):
        for j in range(i + 1, len(symbols)):
            if .95 < pearson_coor(Stock(symbols[i]).ranged_data_list('close', days),
                                  Stock(symbols[j]).ranged_data_list('close', days)):
                correlated.append([symbols[i], symbols[j]])

    for i in range(len(correlated)):
        pvalue = coint(Stock(correlated[i][0]).ranged_data_list('close', days),
                       Stock(correlated[i][1]).ranged_data_list('close', days))[1]

        if pvalue < .005:
            cointegrated.append([pvalue, correlated[i][0], correlated[i][1]])

    return cointegrated

`
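
The header comment mentions cutting the oldest data off the longer list, while `pearson_coor` itself asserts equal lengths, so the trimming presumably happens before the call. A minimal sketch of that step, assuming both lists are ordered oldest to newest (the name `align_recent` is hypothetical and not part of the code above):

```python
def align_recent(data1: list, data2: list) -> tuple:
    """Trim the longer list from its oldest end so both have equal length.

    Assumes both lists are ordered oldest -> newest.
    """
    n = min(len(data1), len(data2))
    if n == 0:
        # data[-0:] would return the whole list, so handle the empty case explicitly.
        return [], []
    # Keep only the n most recent observations of each list.
    return data1[-n:], data2[-n:]


a, b = align_recent([1, 2, 3, 4, 5], [10, 20, 30])
print(a, b)  # [3, 4, 5] [10, 20, 30]
```

The two returned lists can then be passed straight to `pearson_coor`.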

In the function `find_all_pairs`, the only thing that needs to change is the way it accesses the data. I was using a class named `Stock` that could return the last n days of close or open prices as a list.

`
MMM
ABT
ABBV
ABMD
ACN
ATVI
ADBE
AMD
AAP
AES
AET
AMG
AFL
A
APD
AKAM
ALK
ALB
ARE
ALXN
ALGN
ALLE
AGN
ADS
LNT
ALL
GOOGL
GOOG
MO
AMZN
AEE
AAL
AEP
AXP
AIG
AMT
AWK
AMP
ABC
AME
AMGN
APH
APC
ADI
ANSS
ANTM
AON
AOS
APA
AIV
AAPL
AMAT
APTV
ADM
ARNC
ANET
AJG
AIZ
T
ADSK
ADP
AZO
AVB
AVY
BHGE
BLL
BAC
BK
BAX
BBT
BDX
BRK.B
BBY
BIIB
BLK
HRB
BA
BKNG
BWA
BXP
BSX
BHF
BMY
AVGO
BR
BF.B
CHRW
COG
CDNS
CPB
COF
CAH
KMX
CCL
CAT
CBOE
CBRE
CBS
CELG
CNC
CNP
CTL
CERN
CF
SCHW
CHTR
CVX
CMG
CB
CHD
CI
XEC
CINF
CTAS
CSCO
C
CFG
CTXS
CLX
CME
CMS
KO
CTSH
CL
CMCSA
CMA
CAG
CXO
COP
ED
STZ
COO
CPRT
GLW
COST
COTY
CCI
CSX
CMI
CVS
DHI
DHR
DRI
DVA
DE
DAL
XRAY
DVN
DLR
DFS
DISCA
DISCK
DISH
DG
DLTR
D
DOV
DWDP
DTE
DRE
DUK
DXC
ETFC
EMN
ETN
EBAY
ECL
EIX
EW
EA
EMR
ETR
EOG
EFX
EQIX
EQR
ESS
EL
EVRG
ES
RE
EXC
EXPE
EXPD
ESRX
EXR
XOM
FFIV
FB
FAST
FRT
FDX
FIS
FITB
FE
FISV
FLT
FLIR
FLS
FLR
FMC
FL
F
FTNT
FTV
FBHS
BEN
FCX
GPS
GRMN
IT
GD
GE
GIS
GM
GPC
GILD
GPN
GS
GT
GWW
HAL
HBI
HOG
HRS
HIG
HAS
HCA
HCP
HP
HSIC
HSY
HES
HPE
HLT
HFC
HOLX
HD
HON
HRL
HST
HPQ
HUM
HBAN
HII
IDXX
INFO
ITW
ILMN
IR
INTC
ICE
IBM
INCY
IP
IPG
IFF
INTU

ISRG
IVZ
IPGP
IQV
IRM
JKHY
JEC
JBHT
JEF
SJM
JNJ
JCI
JPM
JNPR
KSU
K
KEY
KEYS
KMB
KIM
KMI
KLAC
KSS
KHC
KR
LB
LLL
LH
LRCX
LEG
LEN
LLY
LNC
LIN
LKQ
LMT
L
LOW
LYB
MTB
MAC
M
MRO
MPC
MAR
MMC
MLM
MAS
MA
MAT
MKC
MCD
MCK
MDT
MRK
MET
MTD
MGM
KORS
MCHP
MU
MSFT
MAA
MHK
TAP
MDLZ
MNST
MCO
MS
MOS
MSI
MSCI
MYL
NDAQ
NOV
NKTR
NTAP
NFLX
NWL
NFX
NEM
NWSA
NWS
NEE
NLSN
NKE
NI
NBL
JWN
NSC
NTRS
NOC
NCLH
NRG
NUE
NVDA
ORLY
OXY
OMC
OKE
ORCL
PCAR
PKG
PH
PAYX
PYPL
PNR
PBCT
PEP
PKI
PRGO
PFE
PCG
PM
PSX
PNW
PXD
PNC
RL
PPG
PPL
PFG
PG
PGR
PLD
PRU
PEG
PSA
PHM
PVH
QRVO
PWR
QCOM
DGX
RJF
RTN
O
RHT
REG
REGN
RF
RSG
RMD
RHI
ROK
COL
ROL
ROP
ROST
RCL
CRM
SBAC
SCG
SLB
STX
SEE
SRE
SHW
SPG
SWKS
SLG
SNA
SO
LUV
SPGI
SWK
SBUX
STT
SRCL
SYK
STI
SIVB
SYMC
SYF
SNPS
SYY
TROW
TTWO
TPR
TGT
TEL
FTI
TXN
TXT
TMO
TIF
TWTR
TJX
TMK
TSS
TSCO
TDG
TRV
TRIP
FOXA
FOX
TSN
UDR
ULTA
USB
UAA
UA
UNP
UAL
UNH
UPS
URI
UTX
UHS
UNM
VFC
VLO
VAR
VTR
VRSN
VRSK
VZ
VRTX
VIAB
V
VNO
VMC
WMT
WBA
DIS
WM
WAT
WEC
WCG
WFC
WELL
WDC
WU
WRK
WY
WHR
WMB
WLTW
WYNN
XEL
XRX
XLNX
XYL
YUM
ZBH
ZION
ZTS
`

The list above is the file it pulls the stock names from.
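
For a sense of why the full run is slow: with n symbols, the nested loop compares n(n-1)/2 pairs. Here is a small sketch that reads a ticker file, skips blank lines, and counts the pairs. `load_symbols` and `pair_count` are hypothetical helper names, and the file is a temporary one created just for the example.

```python
import math
import os
import tempfile


def load_symbols(path: str) -> list:
    """Read one ticker per line, ignoring blank lines."""
    with open(path, "r") as fil:
        return [line.strip() for line in fil if line.strip()]


def pair_count(n: int) -> int:
    """Number of unordered pairs among n symbols: n * (n - 1) / 2."""
    return math.comb(n, 2)


# Write a tiny ticker file (with a stray blank line) to try it out.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("MMM\nABT\n\nABBV\n")
    path = tmp.name

symbols = load_symbols(path)
os.remove(path)
print(symbols, pair_count(len(symbols)))  # ['MMM', 'ABT', 'ABBV'] 3
print(pair_count(500))                    # 124750
```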
Thanks! Lmk how it works.


JAMMFam commented Oct 1, 2019

Test
