
# Task
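For the file m1fa-5rows.csv, treat [high, low] as a single price unit and [hf, lf] as a single indicator unit. Analyze the relationship between the two in depth, as well as the relationship between combined price turning points and indicator turning points, and identify notable features and feature combinations.

Note: hf and lf are both Delta RLE-Increments encoded. Statement: the encoding follows "Delta RLE (differential run-length encoding) + end + length + sum", a compact and efficient representation for sparse integer arrays that are mostly contiguous with occasional jumps.

RLE-Increments encoding and decoding procedure:

*   Encoding: if the array is empty, all fields are 0 or empty. Otherwise take the first element as start, then compute the difference sequence in order, merging consecutive +1 steps into +1xN and writing jumps out directly (e.g. +10). end is the last element, length is the number of elements, and sum is their total.
*   Decoding: take start, expand deltas by cumulative addition in order, and reconstruct all elements. Check that end, length, and sum all match; discard the record if any check fails.
*   Edge cases: an empty array is encoded as an empty string; a single-element array has empty deltas.

For the 'hf' and 'lf' columns, use only the decoded 'end' field for further data analysis and feature engineering.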
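
The encoding and decoding procedure described above can be sketched as a small codec. This is a minimal illustration under stated assumptions, not the original encoder: the field order `start|deltas|end|length|sum` is inferred from sample rows such as `2|+1x1|3|2|5`, the comma used to separate multiple delta tokens is a guess (the sample rows only ever show a single token), and the empty array is written as `0||0|0|0` because that is how the all-zero rows appear in the data. The function names are hypothetical.

```python
import re

def encode_rle_increments(values):
    """Encode a list of ints as 'start|deltas|end|length|sum' (assumed layout)."""
    if not values:
        return "0||0|0|0"  # assumed empty form, mirroring the all-zero sample rows
    tokens = []
    run = 0  # length of the current run of consecutive +1 steps
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if delta == 1:
            run += 1
            continue
        if run:
            tokens.append(f"+1x{run}")
            run = 0
        tokens.append(f"{delta:+d}")  # a jump is written out directly, e.g. +10
    if run:
        tokens.append(f"+1x{run}")
    # Assumption: multiple delta tokens are joined with commas.
    fields = [values[0], ",".join(tokens), values[-1], len(values), sum(values)]
    return "|".join(str(f) for f in fields)

def decode_rle_increments(encoded):
    """Return the decoded list, or None when any consistency check fails."""
    if encoded == "":
        return []
    parts = encoded.split("|")
    if len(parts) != 5:
        return None
    try:
        start, end, length, total = (int(parts[i]) for i in (0, 2, 3, 4))
    except ValueError:
        return None
    if length == 0:
        ok = (start, end, total) == (0, 0, 0) and parts[1] == ""
        return [] if ok else None
    values = [start]
    for token in filter(None, parts[1].split(",")):
        match = re.fullmatch(r"([+-]\d+)(?:x(\d+))?", token)
        if match is None:
            return None
        step, count = int(match.group(1)), int(match.group(2) or 1)
        for _ in range(count):
            values.append(values[-1] + step)
    # Validate end, length and sum; discard the record on any mismatch.
    if values[-1] != end or len(values) != length or sum(values) != total:
        return None
    return values
```

Under these assumptions, `encode_rle_increments([2, 3])` reproduces the sample string `2|+1x1|3|2|5`, and decoding that string gives back `[2, 3]`, whose last element is the `end` field.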

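Typically, during an uptrend, [high, low] rises with high as the reference until a peak turning point in high appears, after which a downtrend follows; during a downtrend, [high, low] falls with low as the reference until a trough turning point in low appears, after which an uptrend follows. This describes the alternating rise - fall - rise pattern of price.
In addition, the data is at the finest 1-minute granularity and is discrete; some rows are missing for non-trading periods or for minutes with no trades.
Typically, during an uptrend the end of hf increases in one (or several) linear or near-linear forms while lf is generally 0; during a downtrend the end of lf increases in one (or several) linear or near-linear forms while hf is generally 0.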


Here is all the data you need:
"m1fa-5rows.csv"

## Data loading

### Subtask:
Load the data from "m1fa-5rows.csv" into a dataframe.


**Reasoning**:
Load the data into a pandas DataFrame and display the first few rows.



In [None]:
import pandas as pd

df = pd.read_csv('m1fa-5rows.csv')
display(df.head())

Unnamed: 0,strDt,high,low,hf,lf
0,2024-10-29T02:10:00,5975.5,5975.5,0||0|0|0,0||0|0|0
1,2024-10-29T08:35:00,5964.25,5964.25,0||0|0|0,0||0|0|0
2,2024-10-29T09:29:00,5968.0,5968.0,0||0|0|0,0||0|0|0
3,2024-10-29T10:12:00,5970.75,5970.5,2||2|1|2,0||0|0|0
4,2024-10-29T10:22:00,5974.75,5974.75,2|+1x1|3|2|5,0||0|0|0
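
The task statement notes that the 1-minute series has gaps, and the five rows shown here are already far from consecutive minutes. Since later steps compare adjacent rows with `diff()` and `shift()`, it may help to flag rows that follow a gap. The sketch below uses a small inline frame that copies the `strDt` values from the output above; the column names `gap_minutes` and `after_gap` are hypothetical.

```python
import pandas as pd

# Timestamps copied from the five sample rows shown above.
sample = pd.DataFrame({
    "strDt": [
        "2024-10-29T02:10:00",
        "2024-10-29T08:35:00",
        "2024-10-29T09:29:00",
        "2024-10-29T10:12:00",
        "2024-10-29T10:22:00",
    ]
})

ts = pd.to_datetime(sample["strDt"])
# Minutes elapsed since the previous row (NaN for the first row).
sample["gap_minutes"] = ts.diff().dt.total_seconds() / 60
# True when more than one minute separates a row from its predecessor.
sample["after_gap"] = sample["gap_minutes"] > 1

print(sample)
```

The same two lines could be applied to `df['strDt']`, after which neighbor-based features can be masked or treated separately on rows where `after_gap` is true.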


## Data wrangling

### Subtask:
Decode the 'hf' and 'lf' columns using the 'end' value from the RLE-Increments encoding description provided. This will involve extracting the 'end' value from the string representation in each cell for 'hf' and 'lf' columns and creating new numerical columns for these decoded values.


**Reasoning**:
Define a function to extract the 'end' value from the RLE-Increments encoded string.



In [None]:
def extract_end_value(encoded_string):
    """
    Extracts the 'end' value from an RLE-Increments encoded string.

    The sample rows are laid out as start|deltas|end|length|sum
    (e.g. '2|+1x1|3|2|5' encodes [2, 3]), so 'end' is the third field
    (index 2); index 3 holds 'length'.

    Args:
      encoded_string: The RLE-Increments encoded string.

    Returns:
      The integer value of 'end' if found, otherwise None.
    """
    if not isinstance(encoded_string, str):
        return None
    parts = encoded_string.split('|')
    if len(parts) >= 3:
        try:
            return int(parts[2])
        except ValueError:
            return None
    return None

df['hf_end'] = df['hf'].apply(extract_end_value)
df['lf_end'] = df['lf'].apply(extract_end_value)

display(df.head())

Unnamed: 0,strDt,high,low,hf,lf,hf_end,lf_end
0,2024-10-29T02:10:00,5975.5,5975.5,0||0|0|0,0||0|0|0,0,0
1,2024-10-29T08:35:00,5964.25,5964.25,0||0|0|0,0||0|0|0,0,0
2,2024-10-29T09:29:00,5968.0,5968.0,0||0|0|0,0||0|0|0,0,0
3,2024-10-29T10:12:00,5970.75,5970.5,2||2|1|2,0||0|0|0,2,0
4,2024-10-29T10:22:00,5974.75,5974.75,2|+1x1|3|2|5,0||0|0|0,3,0


## Data preparation

### Subtask:
Based on the decoded 'hf_end' and 'lf_end' columns, identify potential price turning points. Since 'hf_end' increasing suggests an uptrend and 'lf_end' increasing suggests a downtrend, look for transitions between these states. Also, identify price turning points based on 'high' and 'low' values (peaks and troughs).


**Reasoning**:
Calculate the differences for 'hf_end' and 'lf_end', identify potential indicator turning points, and identify potential price peaks and troughs based on neighboring values. Create boolean columns to flag these points.



In [None]:
df['hf_end_diff'] = df['hf_end'].diff()
df['lf_end_diff'] = df['lf_end'].diff()

# Identify potential indicator turning points
# hf_end turning point: when hf_end_diff changes from non-positive to positive
# lf_end turning point: when lf_end_diff changes from non-positive to positive
df['hf_indicator_turning_point'] = ((df['hf_end_diff'].shift(1) <= 0) & (df['hf_end_diff'] > 0))
df['lf_indicator_turning_point'] = ((df['lf_end_diff'].shift(1) <= 0) & (df['lf_end_diff'] > 0))


# Identify potential price peaks (high is greater than immediate neighbors)
df['price_peak'] = (df['high'] > df['high'].shift(1)) & (df['high'] > df['high'].shift(-1))

# Identify potential price troughs (low is less than immediate neighbors)
df['price_trough'] = (df['low'] < df['low'].shift(1)) & (df['low'] < df['low'].shift(-1))

display(df[['hf_end_diff', 'lf_end_diff', 'hf_indicator_turning_point', 'lf_indicator_turning_point', 'price_peak', 'price_trough']].head())

Unnamed: 0,hf_end_diff,lf_end_diff,hf_indicator_turning_point,lf_indicator_turning_point,price_peak,price_trough
0,,,False,False,False,False
1,0.0,0.0,False,False,False,True
2,0.0,0.0,False,False,False,False
3,1.0,0.0,True,False,False,False
4,1.0,0.0,False,False,False,False
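
The subtask also asks for transitions between the uptrend state (hf active) and the downtrend state (lf active), which the diff-based flags above do not capture directly. One possible sketch, on made-up values rather than the real data, labels each row with a regime and flags the rows where the regime flips. The column names `regime`, `up_to_down`, and `down_to_up` are hypothetical, and carrying the last regime through rows where both indicators are zero is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up indicator values for illustration only.
demo = pd.DataFrame({
    "hf_end": [0, 2, 3, 4, 0, 0, 0, 1],
    "lf_end": [0, 0, 0, 0, 5, 6, 0, 0],
})

# +1 while only hf_end is active, -1 while only lf_end is active, 0 otherwise.
raw = np.sign((demo["hf_end"] > 0).astype(int) - (demo["lf_end"] > 0).astype(int))

# Assumption: neutral rows inherit the most recent non-neutral regime.
demo["regime"] = raw.where(raw != 0).ffill().fillna(0).astype(int)

prev = demo["regime"].shift(1)
demo["up_to_down"] = (prev == 1) & (demo["regime"] == -1)  # candidate peak
demo["down_to_up"] = (prev == -1) & (demo["regime"] == 1)  # candidate trough

print(demo)
```

Flags built this way mark the first row of the new regime, so a price peak or trough, if related, would be expected at or shortly before the flagged row.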


## Data analysis

### Subtask:
Analyze the relationship between the decoded 'hf_end', 'lf_end' and the price movements ([high, low]). Specifically, investigate the correlation between increases in 'hf_end' and rising prices, and increases in 'lf_end' and falling prices. Analyze the relationship between the identified price turning points and the turning points indicated by 'hf_end' and 'lf_end'.


**Reasoning**:
Calculate the correlation coefficients between 'hf_end' and price ('high', 'low'), and between 'lf_end' and price ('high', 'low').



In [None]:
correlation_hf_high = df['hf_end'].corr(df['high'])
correlation_hf_low = df['hf_end'].corr(df['low'])
correlation_lf_high = df['lf_end'].corr(df['high'])
correlation_lf_low = df['lf_end'].corr(df['low'])

print(f"Correlation between hf_end and high: {correlation_hf_high}")
print(f"Correlation between hf_end and low: {correlation_hf_low}")
print(f"Correlation between lf_end and high: {correlation_lf_high}")
print(f"Correlation between lf_end and low: {correlation_lf_low}")

Correlation between hf_end and high: 0.020713712562954567
Correlation between hf_end and low: 0.02051047978156996
Correlation between lf_end and high: -0.04798378477228087
Correlation between lf_end and low: -0.04860713610444682
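
The coefficients above compare indicator levels with price levels, whereas the subtask is phrased in terms of increases in the indicators and rising or falling prices. A complementary check, sketched here on made-up numbers, correlates the price change with flags for an increasing indicator. Spearman correlation is obtained by ranking first and then applying the default Pearson `corr()`; the frame and column names are hypothetical.

```python
import pandas as pd

# Made-up values for illustration only.
demo = pd.DataFrame({
    "high":   [100.0, 100.5, 101.25, 101.0, 100.25, 100.0, 100.75],
    "hf_end": [0, 1, 2, 0, 0, 0, 1],
    "lf_end": [0, 0, 0, 1, 2, 3, 0],
})

changes = pd.DataFrame({
    "d_high": demo["high"].diff(),
    "hf_up": (demo["hf_end"].diff() > 0).astype(int),
    "lf_up": (demo["lf_end"].diff() > 0).astype(int),
}).dropna()  # drops the first row, whose price change is undefined

pearson = changes.corr()          # linear correlation of changes
spearman = changes.rank().corr()  # rank correlation, robust to outliers

print(pearson.round(3))
print(spearman.round(3))
```

On the real data the same pattern would use `df['high'].diff()` (or `df['low'].diff()`) together with the decoded indicator columns.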


**Reasoning**:
Analyze the co-occurrence of indicator turning points and price turning points by counting the rows where 'hf_indicator_turning_point' coincides with 'price_peak' and where 'lf_indicator_turning_point' coincides with 'price_trough'.



In [None]:
# Count occurrences where hf indicator turning point and price peak happen at the same time
hf_peak_coincidence_count = df[df['hf_indicator_turning_point'] & df['price_peak']].shape[0]

# Count occurrences where lf indicator turning point and price trough happen at the same time
lf_trough_coincidence_count = df[df['lf_indicator_turning_point'] & df['price_trough']].shape[0]

print(f"Number of times hf indicator turning point coincides with a price peak: {hf_peak_coincidence_count}")
print(f"Number of times lf indicator turning point coincides with a price trough: {lf_trough_coincidence_count}")

# Optional: Display rows where these coincidences occur
# display(df[df['hf_indicator_turning_point'] & df['price_peak']].head())
# display(df[df['lf_indicator_turning_point'] & df['price_trough']].head())

Number of times hf indicator turning point coincides with a price peak: 4716
Number of times lf indicator turning point coincides with a price trough: 4847
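
Raw coincidence counts are hard to interpret without a baseline. One way to put them in context, sketched below with a hypothetical helper `coincidence_lift` and made-up flags, is to compare the observed count with the count expected if the two flags were independent.

```python
import pandas as pd

def coincidence_lift(a, b):
    """Compare observed co-occurrences of two boolean flags with the
    count expected under independence (lift = observed / expected)."""
    a = a.astype(bool)
    b = b.astype(bool)
    n = len(a)
    observed = int((a & b).sum())
    expected = float(a.sum() * b.sum() / n)
    lift = observed / expected if expected > 0 else float("nan")
    return {"n": n, "observed": observed, "expected": expected, "lift": lift}

# Made-up flags for illustration only.
flags = pd.DataFrame({
    "turn": [True, False, True, False, False, True, False, False],
    "peak": [True, False, True, False, True, False, False, False],
})

result = coincidence_lift(flags["turn"], flags["peak"])
print(result)
```

A lift close to 1 would mean the flags coincide about as often as chance alone predicts; applied to the notebook's data, the call would be `coincidence_lift(df['hf_indicator_turning_point'], df['price_peak'])`.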


## Feature engineering

### Subtask:
Create new features that capture the relationship between price and the decoded 'hf_end' and 'lf_end'. This could include features like the difference between consecutive 'hf_end' values, the difference between consecutive 'lf_end' values, indicators of when 'hf_end' or 'lf_end' are increasing, and combinations of these with price changes. Also, create features representing the identified price and indicator turning points.


**Reasoning**:
Calculate the differences between consecutive 'hf_end' and 'lf_end' values, create boolean flags for increasing values, calculate the price change, then create interaction features and confirm that the turning point features are present.



In [None]:
df['hf_end_diff'] = df['hf_end'].diff()
df['lf_end_diff'] = df['lf_end'].diff()
df['hf_end_increasing'] = df['hf_end_diff'] > 0
df['lf_end_increasing'] = df['lf_end_diff'] > 0
df['price_change'] = df['high'].diff()
df['hf_increase_price_change'] = df['hf_end_increasing'] * df['price_change']
df['lf_increase_price_change'] = df['lf_end_increasing'] * df['price_change']

# Ensure turning point features are present (they were created in the previous step)
# The columns 'hf_indicator_turning_point', 'lf_indicator_turning_point',
# 'price_peak', 'price_trough' should already exist in the dataframe.

display(df[['hf_end_diff', 'lf_end_diff', 'hf_end_increasing', 'lf_end_increasing', 'price_change', 'hf_increase_price_change', 'lf_increase_price_change', 'hf_indicator_turning_point', 'lf_indicator_turning_point', 'price_peak', 'price_trough']].head())

Unnamed: 0,hf_end_diff,lf_end_diff,hf_end_increasing,lf_end_increasing,price_change,hf_increase_price_change,lf_increase_price_change,hf_indicator_turning_point,lf_indicator_turning_point,price_peak,price_trough
0,,,False,False,,,,False,False,False,False
1,0.0,0.0,False,False,-11.25,-0.0,-0.0,False,False,False,True
2,0.0,0.0,False,False,3.75,0.0,0.0,False,False,False,False
3,1.0,0.0,True,False,2.75,2.75,0.0,True,False,False,False
4,1.0,0.0,True,False,4.0,4.0,0.0,False,False,False,False
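
Because the task statement describes the indicators as rising in a roughly linear way within a trend segment, the length of the current run of increases may be a useful additional feature. The sketch below shows one way to compute it; `increase_streak` and `hf_streak` are hypothetical names and the input values are made up.

```python
import pandas as pd

def increase_streak(series):
    """For each row, count the consecutive rows ending there on which
    the series rose compared with the previous row."""
    rising = series.diff() > 0
    # Every non-rising row starts a new block; within a block the
    # cumulative sum of the rising flag is the current run length.
    block = (~rising).cumsum()
    return rising.astype(int).groupby(block).cumsum()

# Made-up indicator values for illustration only.
demo = pd.DataFrame({"hf_end": [0, 1, 2, 3, 0, 0, 1, 2]})
demo["hf_streak"] = increase_streak(demo["hf_end"])

print(demo)
```

The same helper could be applied to `df['lf_end']`, and the streak resets to 0 on any row where the indicator does not rise.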


## Summary:

### Data Analysis Key Findings

*   The correlation between the decoded 'hf\_end' and price ('high' and 'low') is very low, about 0.02.
*   Similarly, the correlation between the decoded 'lf\_end' and price ('high' and 'low') is very low and slightly negative, about -0.05.
*   The 'hf\_indicator\_turning\_point' flag coincides with a 'price\_peak' in 4716 rows.
*   The 'lf\_indicator\_turning\_point' flag coincides with a 'price\_trough' in 4847 rows.
*   These counts are raw totals; they have not been compared with the number of rows or with how often each flag fires on its own.
*   New features were created: the differences between consecutive 'hf\_end' and 'lf\_end' values, boolean flags for when 'hf\_end' or 'lf\_end' is increasing, and interaction terms combining these increases with price changes.

### Insights or Next Steps

*   While linear correlation is weak, the coincidence counts between indicator turning points and price turning points hint that 'hf\_end' and 'lf\_end' may carry information about price reversals, possibly in a non-linear or temporal manner. The counts need to be checked against a chance baseline before drawing that conclusion, and further investigation into the timing and sequence of these turning points is warranted.
*   The newly engineered features, including differences, increasing flags, and interaction terms, can be used as inputs to predictive models that forecast price movements or flag potential reversals based on the combined behavior of price and these indicators.
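
As a starting point for the timing investigation, the sketch below counts how often a price event occurs a given number of rows after (or before) an indicator event. `lagged_coincidences` is a hypothetical helper and the flags are made up. Lags are measured in rows, which is an assumption worth revisiting since the data has time gaps.

```python
import pandas as pd

def lagged_coincidences(indicator, price_event, max_lag=3):
    """Count rows where price_event is true `lag` rows after indicator
    is true (negative lag: price_event comes first)."""
    indicator = indicator.astype(bool)
    price_event = price_event.astype(bool)
    counts = {}
    for lag in range(-max_lag, max_lag + 1):
        # shift(-lag) aligns price_event[i + lag] with indicator[i].
        shifted = price_event.shift(-lag, fill_value=False)
        counts[lag] = int((indicator & shifted).sum())
    return pd.Series(counts, name="coincidences")

# Made-up flags: indicator events at rows 1 and 5, price events at rows 3 and 7.
indicator = pd.Series([False, True, False, False, False, True, False, False, False, False])
price_event = pd.Series([False, False, False, True, False, False, False, True, False, False])

counts = lagged_coincidences(indicator, price_event)
print(counts)
```

A pronounced maximum at a non-zero lag would suggest that one kind of turning point tends to lead the other by that many rows.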
