# Introduction
The bike has 20 gears, but the power has a linear relationship to the speed in every gear; the cadence doesn't matter for that relationship. So the observed speed could be used as a feature for a linear model fitted on any tacx data set (a data set recorded in any gear).
Within the same gear we can match the cadence values of the app's data set against those of the tacx data set and find the tacx observations with the same cadence values. The power value of the tacx data set can then be predicted from the app's cadence value and mapped to the speed.


# Problem
Question: For each cadence value in the app data, predict the power value from the tacx data set using a nearest-neighbors regressor with the Euclidean distance metric.

# Solution
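The idea is to

1. Use a radius nearest-neighbors regressor (RadiusNeighborsRegressor) with a radius of 0.5, so that each prediction is based on the tacx observations whose cadence lies within ±0.5 rpm of the app's cadence value.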
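
To make the step above concrete, here is a plain-Python sketch of what a radius-based regressor with inverse-distance weighting computes for a single cadence value. It is not scikit-learn's implementation, `predict_watts` is a hypothetical name, and the sample numbers are invented; the handling of exact matches and of the radius boundary is an assumption made for the sketch.

```python
import math


def predict_watts(train_cadence, train_watts, query_cadence, radius=0.5):
    """Predict watts as the inverse-distance weighted mean of the neighbors in the radius."""
    # Keep only observations whose cadence is within `radius` rpm of the query.
    neighbors = [
        (abs(c - query_cadence), w)
        for c, w in zip(train_cadence, train_watts)
        if abs(c - query_cadence) <= radius
    ]
    if not neighbors:
        # No observation is close enough: there is nothing to average.
        return math.nan
    # Exact cadence matches would get infinite weight, so average those alone.
    exact = [w for d, w in neighbors if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weight_sum = sum(1.0 / d for d, _ in neighbors)
    return sum(w / d for d, w in neighbors) / weight_sum


# Invented sample data: cadence in rpm and power in watts.
cadence = [60.0, 60.2, 60.4, 61.0]
watts = [78.0, 80.0, 82.0, 90.0]

print(predict_watts(cadence, watts, 60.3))  # uses the three points within 0.5 rpm
print(predict_watts(cadence, watts, 70.0))  # no neighbor in range
```

A query with no neighbor inside the radius has no defined prediction, which is why a result can contain nan entries.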

In [62]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import RadiusNeighborsRegressor

from src.tcx import Tcx, COLUMN_NAME_WATTS, COLUMN_NAME_CADENCE
from src.test_data import TrainDataSet

tcx_app_gear7: Tcx = Tcx.read_tcx(file_path='test/tcx/cadence_1612535177298-gear7.tcx')
tcx_tacx_gear7: Tcx = Tcx.read_tcx(file_path='test/tcx/tacx-activity_6225123072-gear7-resistance3.tcx')
# NOTE: this path is identical to the gear-7 recording above; tcx_tacx_gear20 is not used below.
tcx_tacx_gear20: Tcx = Tcx.read_tcx(file_path='test/tcx/tacx-activity_6225123072-gear7-resistance3.tcx')

# generate test data
dts_app_gear7: TrainDataSet = TrainDataSet(tcx_app_gear7)
dts_tacx_gear7: TrainDataSet = TrainDataSet(tcx_tacx_gear7)
df_tacx_gear7 = dts_tacx_gear7.get_dataframe()
df_app_gear7 = dts_app_gear7.get_dataframe()


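def create_regressor(X, y):
    """Fit a radius-neighbors regressor on cadence (X) and watts (y)."""
    # radius=0.5 covers cadence ±0.5 rpm; closer neighbors get a larger weight.
    regressor = RadiusNeighborsRegressor(radius=0.5, weights='distance')
    regressor.fit(X, y)
    return regressor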


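# predict: fit on the tacx gear-7 data (cadence -> watts), then predict watts for the app's cadence values.
# Cadence values with no tacx observation within the radius are predicted as nan.
model = create_regressor(df_tacx_gear7[[COLUMN_NAME_CADENCE]], df_tacx_gear7[[COLUMN_NAME_WATTS]])
y_predicted = model.predict(df_app_gear7[[COLUMN_NAME_CADENCE]])
print(y_predicted)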

# verification: hold out 8 % of the tacx data and score the regressor on it
X_train, X_test, y_train, y_test = train_test_split(
    df_tacx_gear7[[COLUMN_NAME_CADENCE]], df_tacx_gear7[[COLUMN_NAME_WATTS]], train_size=0.92
)

regressor = create_regressor(X_train, y_train)

print("R² score={:.2f}".format(regressor.score(X_test, y_test)))

[[         nan]
 [ 95.5       ]
 [ 95.5       ]
 [ 96.8       ]
 [100.19230769]
 [ 98.90909091]
 [ 96.8       ]
 [ 95.5       ]
 [ 93.5       ]
 [ 87.5       ]
 [ 77.        ]
 [ 74.5       ]
 [ 77.        ]
 [ 77.        ]
 [ 78.        ]
 [ 77.        ]
 [ 82.        ]
 [ 88.91666667]
 [ 88.91666667]
 [ 94.        ]
 [ 93.5       ]
 [ 88.91666667]
 [ 88.91666667]
 [ 85.61290323]
 [ 85.61290323]
 [ 84.53846154]
 [ 82.        ]
 [ 80.4       ]
 [ 79.66666667]
 [ 78.        ]
 [ 78.        ]
 [ 78.        ]
 [ 80.4       ]
 [ 85.61290323]
 [ 88.91666667]
 [ 94.        ]
 [ 95.5       ]
 [ 98.90909091]
 [ 85.61290323]
 [ 88.91666667]
 [ 88.91666667]
 [101.8       ]
 [101.8       ]
 [102.94117647]
 [102.94117647]
 [102.94117647]
 [102.94117647]
 [105.        ]
 [106.28571429]
 [106.28571429]
 [106.28571429]
 [108.2       ]
 [112.05882353]
 [113.48      ]
 [117.13333333]
 [117.13333333]
 [113.48      ]
 [112.05882353]
 [108.2       ]
 [108.2       ]
 [106.28571429]
 [108.2       ]
 [108.2 
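
The first prediction in the output above is nan, meaning no tacx observation fell within the radius for that cadence value. When such rows occur, a score computed over all rows may be undefined. As a hedged sketch, an R² that skips nan predictions could look like this; `r2_ignoring_nan` is a hypothetical helper, not part of scikit-learn, and the numbers are invented.

```python
import math


def r2_ignoring_nan(y_true, y_pred):
    """Coefficient of determination over the rows that have a prediction."""
    # Drop rows where the regressor found no neighbor and predicted nan.
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if not math.isnan(p)]
    if len(pairs) < 2:
        return math.nan
    mean_true = sum(t for t, _ in pairs) / len(pairs)
    ss_res = sum((t - p) ** 2 for t, p in pairs)
    ss_tot = sum((t - mean_true) ** 2 for t, _ in pairs)
    if ss_tot == 0.0:
        # All true values are equal; R² is not defined.
        return math.nan
    return 1.0 - ss_res / ss_tot


# Invented values: the first row has no prediction and is skipped.
print(r2_ignoring_nan([90.0, 100.0, 110.0, 120.0], [math.nan, 100.0, 110.0, 120.0]))
```

Skipping rows makes the score optimistic about coverage, so the number of skipped rows is worth reporting alongside it.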

