Feature Extraction Bug: FFT Data Leakage causing Fake Result #5

Closed
nova-land opened this issue May 20, 2023 · 1 comment

nova-land commented May 20, 2023

Main Problem

The following code produces an FFT feature that leaks future data into earlier rows.

close_fft = np.fft.fft(np.asarray(data_combine['Close'].tolist()))
fft_df = pd.DataFrame({'fft':close_fft})
fft_df['absolute'] = fft_df['fft'].apply(lambda x: np.abs(x))
fft_df['angle'] = fft_df['fft'].apply(lambda x: np.angle(x))

plt.figure(figsize=(14, 7), dpi=100)
fft_list = np.asarray(fft_df['fft'].tolist())
for num_ in [3, 6, 9, 27, 81, 100]:
    fft_list_m10= np.copy(fft_list); fft_list_m10[num_:-num_] = 0
    data_combine[f'FT_{num_}components'] = np.fft.ifft(fft_list_m10)
    plt.plot(np.fft.ifft(fft_list_m10), label='Fourier transform with {} components'.format(num_))
plt.plot(data_combine['Close'].values,  label='Real')

What goes wrong

  • The FFT is computed once over the whole time series.
  • As a result, the reconstructed value at every earlier bar is built using future data.
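
The leak can be checked directly. The sketch below is an illustration, not code from the repository: it uses a synthetic random walk instead of the real `Close` column, and `fft_lowpass` is a hypothetical helper that mirrors the `fft_list_m10[num_:-num_] = 0` step above. It changes only the second half of the series and measures how much the feature moves in the untouched first half.

```python
import numpy as np

def fft_lowpass(series, num_components):
    """Full-series FFT, keep only the lowest frequencies, then invert (hypothetical helper)."""
    spectrum = np.fft.fft(np.asarray(series, dtype=float))
    spectrum[num_components:-num_components] = 0
    return np.fft.ifft(spectrum).real

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(size=200))  # synthetic random walk, not real market data

# Same first 100 bars, but a different "future" in the last 100 bars.
altered = prices.copy()
altered[100:] += 10.0

full_a = fft_lowpass(prices, 9)
full_b = fft_lowpass(altered, 9)

# Largest change of the feature inside the part of the series that was not modified.
leak = np.max(np.abs(full_a[:100] - full_b[:100]))
print(f"max change in bars 0..99: {leak:.4f}")
```

A non-zero `leak` means the feature at a past bar depends on prices that were not yet known at that bar.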

What can happen

Because of the leak, even a simple MLP model can appear to predict the next day's up/down trend well.

Solution

To produce the FFT feature without data leakage, you need to generate it bar by bar, using only the data available up to that bar, for example:

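index_data = []
for i in range(1, len(df)):
    # Only bars strictly before bar i are visible, so nothing from the future leaks in.
    window = df['close'].iloc[:i]
    index_data.append(df.index[i])
    fft_close = np.fft.fft(window.values)
    absolute = np.abs(fft_close)
    # The phase comes from the complex spectrum, not from its magnitude.
    angle = np.angle(fft_close)
    ...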

After this change, the model performs very badly, which suggests the earlier good result came from the leak.
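
A self-contained version of the per-bar idea might look like the sketch below. It is only an illustration under assumptions: `causal_fft_features` and `min_bars` are hypothetical names, the data is a synthetic random walk, and because the snippet above elides which values are kept, the sketch arbitrarily keeps the magnitude and phase of the lowest non-DC frequency.

```python
import numpy as np

def causal_fft_features(close, min_bars=2):
    """For each bar i, compute FFT features from close[:i] only (bars strictly before i)."""
    values = np.asarray(close, dtype=float)
    bars, absolute, angle = [], [], []
    for i in range(min_bars, len(values)):
        spectrum = np.fft.fft(values[:i])
        bars.append(i)
        # Assumed feature choice: lowest non-DC frequency of the expanding window.
        absolute.append(np.abs(spectrum[1]))
        angle.append(np.angle(spectrum[1]))
    return np.array(bars), np.array(absolute), np.array(angle)

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(size=120))  # synthetic random walk, not real market data

# Same first 60 bars, different "future" afterwards.
altered = prices.copy()
altered[60:] += 10.0

bars_a, abs_a, ang_a = causal_fft_features(prices)
bars_b, abs_b, ang_b = causal_fft_features(altered)

# Features for bars up to 60 only see close[:60], which is identical in both series.
past = bars_a <= 60
unchanged = np.allclose(abs_a[past], abs_b[past]) and np.allclose(ang_a[past], ang_b[past])
print("past features unaffected by future data:", unchanged)
```

Recomputing the FFT on every expanding window costs more than a single full-series transform, but it is what keeps each row limited to information that existed at that bar.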

@ChickenBenny
Owner

thanks a lot!!
