# Building Algorithms for Bond Price Movement

In [1]:
import io, os, sys, types
import numpy as np
import pandas as pd
import sklearn as sk
sys.path.append('../Python Scripts/APIs/')
import api_algorithm as algo
MP4 = True


## Gathering and Organising Data

In [2]:
data_us = algo.importData("US",MP4,max_corr = 1,topvars=True)
data_uk = algo.importData("UK",MP4,max_corr = 1,topvars=True)
data_jpn = algo.importData("JPN",MP4,max_corr = 1,topvars=True)
data_aus = algo.importData("AUS",MP4,max_corr = 1,topvars=True)
data_cnd = algo.importData("CND",MP4,max_corr = 1,topvars=True)
data_grm = algo.importData("GRM",MP4,max_corr = 1,topvars=True)


Testing Correlation between corresponding L and C AVG statistics for each underlying datatype
               I2        I1       GM2       GM1       FF1       MP1       MP4  \
Pearson  0.455115  0.509620  0.462543  0.469224  0.404973  0.422853  0.420722   
MIC      0.757963  0.344041  0.333861  0.605408  0.804772  0.815669  0.703927   

              MP2  
Pearson  0.390567  
MIC      0.706715  

 MP4 kept for US analysis
Testing Correlation between corresponding L and C AVG statistics for each underlying datatype
               I2        I1       GM2       GM1       FF1       MP1       MP4  \
Pearson  0.266038  0.569166  0.598136  0.455878  0.254232  0.245972  0.217841   
MIC      0.341591  0.584643  0.518848  0.380368  0.356305  0.287536  0.325596   

              MP2  
Pearson  0.307818  
MIC      0.295008  

 MP4 kept for UK analysis
Testing Correlation between corresponding L and C AVG statistics for each underlying datatype
               I2        I1       GM2       GM1       FF
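
The tables above report a Pearson coefficient and a MIC score for each data type. As a minimal sketch of the Pearson half only, the snippet uses two synthetic series standing in for the "L" and "C" average statistics; the column names and the data are assumptions, and nothing here reflects how `algo.importData` computes its figures. MIC is not part of NumPy, pandas, or scikit-learn, so it is left out.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the "L" and "C" average statistics (hypothetical data).
rng = np.random.default_rng(0)
l_avg = pd.Series(rng.normal(size=200), name="L_AVG")
c_avg = (0.5 * l_avg + rng.normal(size=200)).rename("C_AVG")

# Pearson correlation via pandas (the default method is "pearson").
pearson = l_avg.corr(c_avg)

# Cross-check with NumPy's correlation matrix.
pearson_np = np.corrcoef(l_avg, c_avg)[0, 1]

print(f"Pearson: {pearson:.3f}")
```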

In [3]:
## Enforcing necessary lag on certain data points
data_us = algo.featureLag(data_us)
data_uk = algo.featureLag(data_uk)
data_jpn = algo.featureLag(data_jpn)
data_aus = algo.featureLag(data_aus)
data_cnd = algo.featureLag(data_cnd)
data_grm = algo.featureLag(data_grm)
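
The internals of `algo.featureLag` are not shown. One common way to lag features is to shift the affected columns so that each row only sees values that were available earlier. The sketch below assumes a hypothetical helper `lag_features` and made-up column names; the columns and lag lengths actually used by the notebook are unknown.

```python
import pandas as pd

def lag_features(df, lags):
    """Return a copy of df with each column in `lags` shifted down by its lag (in rows).

    Rows that lose a value because of the shift are dropped.
    """
    out = df.copy()
    for col, n in lags.items():
        out[col] = out[col].shift(n)
    return out.dropna()

# Hypothetical example: "gdp" is only known one period late.
df = pd.DataFrame({"gdp": [1.0, 2.0, 3.0, 4.0], "price_move": [0, 1, 1, 0]})
lagged = lag_features(df, {"gdp": 1})
print(lagged)
```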

In [4]:
data_us = algo.businessCycleSplitter([(data_us,"US")])[0]
list_all_country_data = algo.businessCycleSplitter([(data_uk,"UK"), (data_jpn,"JPN"), (data_aus,"AUS"), (data_cnd,"CND"), (data_grm,"GRM")])


 Splitting Each countries data into 3 sets defined by Mentality Cycle
US split completed

 Splitting Each countries data into 3 sets defined by Mentality Cycle
UK split completed
JPN split completed
AUS split completed
CND split completed
GRM split completed
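
How `algo.businessCycleSplitter` assigns rows to a cycle is not shown. As a rough sketch, if each row already carried a cycle label, the three sub-datasets could be produced with a `groupby`. The `cycle` column and the helper `split_by_cycle` are hypothetical.

```python
import pandas as pd

def split_by_cycle(df, cycle_col="cycle"):
    """Split df into one sub-DataFrame per cycle label, dropping the label column."""
    return {int(label): group.drop(columns=cycle_col)
            for label, group in df.groupby(cycle_col)}

# Hypothetical data with a pre-computed cycle label (1, 2 or 3).
df = pd.DataFrame({
    "feature": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "cycle":   [1, 1, 2, 3, 3, 3],
})
subsets = split_by_cycle(df)
for label, subset in subsets.items():
    print(label, len(subset))
```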


In [5]:
test_country_data = algo.classAndFeature([data_us])
#all_country_data =  algo.classAndFeature([data_uk, data_jpn, data_aus, data_cnd, data_grm])
all_country_data =  algo.classAndFeature(list_all_country_data)


 Splitting Each countries data into feature and target variable
US split completed

 Splitting Each countries data into feature and target variable
UK split completed
JPN split completed
AUS split completed
CND split completed
GRM split completed
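
The feature/target split reported above can be sketched as follows. The target column name `price_move` and the helper `split_features_target` are assumptions; the notebook does not show which column `algo.classAndFeature` treats as the target.

```python
import pandas as pd

def split_features_target(df, target_col="price_move"):
    """Return (X, y): every column except the target, and the target column itself."""
    X = df.drop(columns=target_col)
    y = df[target_col]
    return X, y

# Hypothetical sub-dataset for one country and one cycle.
df = pd.DataFrame({"f1": [0.1, 0.2, 0.3], "f2": [1.0, 0.5, 0.2], "price_move": [1, 0, 1]})
X, y = split_features_target(df)
print(X.shape, y.shape)
```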


## Training Algos

In [6]:
all_country_data_with_algos  = algo.testingAlgoTypes(all_country_data,verbose=1,MP4=MP4)


 
 
 Testing various untrained classification algorithms on each country's seperate sub datasets 
[(69, 2)]
[(68, 2)]
[(119, 2)]
[(118, 2)]
[(34, 2)]
[(34, 2)]
        LDA     RC   LogR    KNN    SVM     RF    GBC     NN    PAC    SGD
UK-1  0.569  0.613  0.562  0.591  0.555  0.847  0.898  0.562  0.563  0.643
UK-2  0.586  0.599  0.591  0.654  0.599  0.802  0.878  0.536  0.552  0.586
UK-3  0.515  0.588    0.5  0.632  0.529  0.824  0.868  0.588  0.515  0.412


[(73, 2)]
[(72, 2)]
[(145, 2)]
[(145, 2)]
[(129, 2)]
[(128, 2)]
         LDA     RC   LogR    KNN    SVM     RF    GBC     NN    PAC    SGD
JPN-1  0.642  0.642  0.635  0.648  0.641  0.655  0.751  0.572  0.538  0.558
JPN-2  0.545   0.59  0.566  0.614  0.569  0.717  0.728  0.555  0.528  0.524
JPN-3  0.634  0.662  0.603  0.716  0.603    0.7  0.778  0.611  0.568  0.557


[(82, 2)]
[(81, 2)]
[(141, 2)]
[(141, 2)]
[(23, 2)]
[(22, 2)]
         LDA     RC   LogR    KNN    SVM     RF    GBC     NN    PAC    SGD
AUS-1  0.546  0.632  0.571  0
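
The tables above compare untuned classifiers on each sub-dataset. A comparable comparison can be sketched with scikit-learn and cross-validation. The mapping of the abbreviations to estimators (LDA, RC, LogR, KNN, RF) is an assumption, the data is synthetic, and the notebook's own scoring procedure inside `algo.testingAlgoTypes` may differ.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data standing in for one sub-dataset.
X, y = make_classification(n_samples=150, n_features=8, random_state=0)

# Assumed mapping from the table's abbreviations to scikit-learn estimators.
models = {
    "LDA": LinearDiscriminantAnalysis(),
    "RC": RidgeClassifier(),
    "LogR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy per model, with default hyperparameters.
scores = pd.Series({
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}).round(3)

# Keep the three best-scoring models for later tuning.
top3 = scores.sort_values(ascending=False).head(3)
print(scores)
```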

In [7]:
all_country_data_with_trained_algos = algo.fineTuneModel(all_country_data_with_algos)


 
 Fine Tuning Parameters for the top 3 predictive algorithms for each country for each sub dataset split by Mentality/Business Cycle 
[(16, 2)]
[(29, 2)]
[(15, 2)]
[(30, 2)]
[(14, 2)]
[(31, 2)]
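
Parameter tuning of this kind is often done with a grid search over cross-validated accuracy. The following is a minimal sketch under that assumption, using synthetic data and an arbitrary parameter grid; the search strategy and grids used by `algo.fineTuneModel` are not shown in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for one country's sub-dataset.
X, y = make_classification(n_samples=150, n_features=8, random_state=0)

# Illustrative grid only; the values are not taken from the notebook.
param_grid = {"n_estimators": [25, 50], "max_depth": [2, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

# The estimator refitted on all of X, y with the best parameter combination.
best_model = search.best_estimator_
print(search.best_params_)
```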


Voting Classifier using algorithms within each country within each business cycle

In [8]:
## Forms a voting ensemble out of the top 3 algorithms for each country sub-dataset
a_c_d_w_t_a_and_acc_scores = algo.votingEnsembleTest(all_country_data_with_trained_algos,test_country_data.get('US'))

 
 For each training set country for each sub dataset (split by Mentality Cycle): the top n trained algorithms form a Voting Classifiers. This Voting Classifiers is then tested on its corresponding US sub data set. An aggregate scocre for each trainging set country is calculated through an Aggregation of its 3 Voting Classifiers' performances
Voting Classifier trained on UK Mentality Cycle 1 has accuracy: 0.8304093567251462
Voting Classifier trained on UK Mentality Cycle 2 has accuracy: 0.8982456140350877
Voting Classifier trained on UK Mentality Cycle 3 has accuracy: 0.9217391304347826
Aggregated Classifier trained on UK has accuracy: 0.882661996497373 

Voting Classifier trained on JPN Mentality Cycle 1 has accuracy: 0.7076023391812866
Voting Classifier trained on JPN Mentality Cycle 2 has accuracy: 0.6771929824561403
Voting Classifier trained on JPN Mentality Cycle 3 has accuracy: 0.5043478260869565
Aggregated Classifier trained on JPN has accuracy: 0.6514886164623468 

Voting Class
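
The train-on-one-country, test-on-US pattern described above can be sketched with scikit-learn's `VotingClassifier`. The three member models, the hard-voting rule, and the synthetic data are assumptions; `algo.votingEnsembleTest` may combine its models differently.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, y_train = X[:200], y[:200]  # stands in for one training country's sub-dataset
X_test, y_test = X[200:], y[200:]    # stands in for the matching US sub-dataset

# Majority vote over three member models (assumed to be the "top 3").
voter = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gbc", GradientBoostingClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)

acc = accuracy_score(y_test, voter.predict(X_test))
print(f"Voting classifier accuracy: {acc:.3f}")
```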

Voting classifiers using voting classifiers across countries for a given business cycle

In [9]:
algo.votingEnsembleTest2ndLayer(a_c_d_w_t_a_and_acc_scores,test_country_data.get('US'),2)

For each mentality cycle, the top 3 or 2 Voting Classifiers across countries are combined to form a 2nd Level Voting Classifier
Mentality Cycle 1 2nd Layer Voting Classifier Ensemble has accuracy: 0.8888888888888888
Mentality Cycle 2 2nd Layer Voting Classifier Ensemble has accuracy: 0.9368421052631579
Mentality Cycle 3 2nd Layer Voting Classifier Ensemble has accuracy: 0.9043478260869565
Aggregated accuracy of 2nd Layer Voting Classifiers is: 0.9159369527145359
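
The aggregated figure is not a plain average of the three per-cycle accuracies. It is consistent with pooling the correct predictions over all test rows, that is, a mean weighted by sub-dataset size. The check below assumes US sub-dataset sizes of 171, 285 and 115 rows. These sizes are inferred from the reported fractions and are not stated in the notebook.

```python
# Per-cycle accuracies reported above for the 2nd-layer ensembles.
reported = [0.8888888888888888, 0.9368421052631579, 0.9043478260869565]
reported_aggregate = 0.9159369527145359

# Assumed US test-set sizes per cycle (inferred, not stated in the source).
sizes = [171, 285, 115]

# Number of correct predictions implied by each accuracy and size.
correct = [round(acc * n) for acc, n in zip(reported, sizes)]

# Pooled (size-weighted) accuracy versus the unweighted mean.
aggregate = sum(correct) / sum(sizes)
unweighted = sum(reported) / len(reported)

print(correct, sum(sizes))
print(f"pooled: {aggregate:.6f}  unweighted: {unweighted:.6f}")
```

Under the same assumed sizes, the first-layer aggregates reported for UK and JPN also match pooled counts (504/571 and 372/571 respectively).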
