# Import asqlcell

In [1]:
import asqlcell

# Data Preview
The data comes from A-shares.  
code: Stock code  
date: Date  
O: Opening price  
H: Highest price  
C: Closing price  
L: Lowest price  

In [2]:
%%sql
SELECT *
FROM stock.parquet
LIMIT 100

SqlcellWidget(data_range=(0, 10, ''), index_sort=('', 0))

# Feature Engineering

The following SQL generates the indicators for each stock, together with the return from buying at the opening price of the next trading day (the buy day, T) and selling at the closing price of T+1 and T+2 respectively.
  
Indicator descriptions:  
LOW5: 5-day low  
HIGH5: 5-day high  
FA: 5-day average price  
FB: LOW5 / HIGH5  
FC: The closing price of the day divided by the 5-day low  
FD: The closing price of the day divided by the 5-day high  
FE: 5-day maximum of the daily high divided by the daily low  
FF: 5-day average price divided by 10-day average price  
FG: The closing price of the day divided by the 5th closing price in the 10-day window  
FH: Median value of the 10-day closing price  
FI: Median absolute deviation of the 10-day closing price, MEDIAN(ABS(x - MEDIAN(x)))  
FJ: Correlation between 10-day highs and lows  
FK: Slope of the linear regression of H on L over the 10-day window  
canBuy: Whether the stock can be bought on the next trading day, taken here as that day's low being below its high  
gain: The return from buying at the opening price of the next trading day (the buy day, T) and selling at the closing price of T+1  
gain3: The return from buying at the opening price of the next trading day (the buy day, T) and selling at the closing price of T+2  
  
WINDOW  
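five: 5-day window (the current row and the 4 preceding rows), partitioned by code and ordered by date  
ten: 10-day window (the current row and the 9 preceding rows), partitioned by code and ordered by date  
norm: general window with no row limit, partitioned by code and ordered by date  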
  
WHERE  
Filter out rows where gain fluctuates too much (abs(gain) >= 0.30), since they may be dirty data  
Filter out trading days on which the stock cannot be bought
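
To make the window definitions concrete, here is a minimal plain-Python sketch of a few of the indicators (LOW5, HIGH5, FA, FB, FC, FI) for a single stock. The price lists and the helper names `trailing` and `mad` are made up for illustration; the real computation is done by the SQL cell.

```python
from statistics import mean, median

def trailing(values, i, size):
    """Rows i-size+1 .. i, clipped at the start of the series
    (the same frame as ROWS BETWEEN size-1 PRECEDING AND 0 FOLLOWING)."""
    return values[max(0, i - size + 1): i + 1]

def mad(xs):
    """Median absolute deviation: MEDIAN(ABS(x - MEDIAN(x)))."""
    m = median(xs)
    return median(abs(x - m) for x in xs)

# Hypothetical prices for one stock, already ordered by date.
low   = [9.8, 9.9, 10.1, 10.0, 10.3, 10.2]
high  = [10.2, 10.4, 10.6, 10.5, 10.9, 10.8]
close = [10.0, 10.2, 10.4, 10.3, 10.8, 10.5]

rows = []
for i in range(len(close)):
    low5 = min(trailing(low, i, 5))
    high5 = max(trailing(high, i, 5))
    rows.append({
        "LOW5": low5,
        "HIGH5": high5,
        "FA": mean(trailing(close, i, 5)),   # 5-day average close
        "FB": low5 / high5,
        "FC": close[i] / low5,
        "FI": mad(trailing(close, i, 10)),   # 10-day MAD of the close
    })
```

Note that the first rows of each stock use fewer than 5 (or 10) days, because the frame is clipped at the start of the partition.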

In [3]:
%%sql mytable
SELECT *
FROM
(
	select code, date,
			min(L) OVER five as LOW5, 
			max(H) OVER five as HIGH5,
			--Feature
			avg(C) OVER five as FA,
			LOW5 / HIGH5 as FB,
			C / LOW5 as FC,
			C / HIGH5 as FD,
			max(H / L) OVER five as FE,
			avg(C) OVER five / avg(C) over ten as FF,
			C / nth_value(C, 5) over ten as FG,
			quantile_cont(C, 0.5) OVER ten FH, --Median value
			mad(C) OVER ten FI,
			corr(H, L) OVER ten FJ,
			regr_slope(H, L) OVER ten FK,
			--2 Days return ratio
			lead(L, 1, null) OVER norm < lead(H, 1, null) OVER norm as canBuy,
			lead(C, 2, null) OVER norm / lead(O, 1, null) OVER norm - 1 as gain,
			lead(C, 3, null) OVER norm / lead(O, 1, null) OVER norm - 1 as gain3
	FROM stock.parquet
	WINDOW
		five AS (PARTITION BY code ORDER BY date ASC ROWS BETWEEN 4 PRECEDING AND 0 FOLLOWING),
		ten AS (PARTITION BY code ORDER BY date ASC ROWS BETWEEN 9 PRECEDING AND 0 FOLLOWING),
		norm AS (PARTITION BY code ORDER BY date ASC)
) a
WHERE gain is not null AND abs(gain) < 0.30 AND gain3 is not null AND
		canBuy is not null AND canBuy

SqlcellWidget(data_range=(0, 10, ''), index_sort=('', 0))
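
The labels canBuy, gain and gain3 rely on `lead`, which looks ahead within the same stock. The sketch below reproduces that logic and the WHERE filter in plain Python. The prices are hypothetical and the helper `lead` is only an illustration of the SQL function, not part of asqlcell.

```python
def lead(values, i, offset):
    """Value `offset` rows after row i, or None past the end
    (like SQL lead(x, offset, null))."""
    j = i + offset
    return values[j] if j < len(values) else None

# Hypothetical prices for one stock, ordered by date.
# On the third day high == low, so the stock cannot be bought that day.
open_ = [10.0, 10.1, 10.5, 10.4, 10.6]
high  = [10.3, 10.6, 10.5, 10.8, 10.9]
low   = [9.9, 10.0, 10.5, 10.3, 10.5]
close = [10.1, 10.5, 10.5, 10.7, 10.8]

labels = []
for i in range(len(close)):
    next_low, next_high = lead(low, i, 1), lead(high, i, 1)
    buy = lead(open_, i, 1)        # open of the next trading day
    sell2 = lead(close, i, 2)      # close one day after the buy day
    sell3 = lead(close, i, 3)      # close two days after the buy day
    labels.append({
        "canBuy": None if next_low is None else next_low < next_high,
        "gain": None if buy is None or sell2 is None else sell2 / buy - 1,
        "gain3": None if buy is None or sell3 is None else sell3 / buy - 1,
    })

# Same conditions as the WHERE clause of the query.
kept = [
    r for r in labels
    if r["gain"] is not None and abs(r["gain"]) < 0.30
    and r["gain3"] is not None and r["canBuy"]
]
```

With this data only the first row survives: the second row fails canBuy, and the last three rows have no gain or gain3 because there are not enough future days.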

The type of each field in mytable

In [4]:
[{column:mytable.dtypes[column].name} for column in mytable]

[{'code': 'object'},
 {'date': 'datetime64[ns]'},
 {'LOW5': 'float64'},
 {'HIGH5': 'float64'},
 {'FA': 'float64'},
 {'FB': 'float64'},
 {'FC': 'float64'},
 {'FD': 'float64'},
 {'FE': 'float64'},
 {'FF': 'float64'},
 {'FG': 'float64'},
 {'FH': 'float64'},
 {'FI': 'float64'},
 {'FJ': 'float64'},
 {'FK': 'float64'},
 {'canBuy': 'bool'},
 {'gain': 'float64'},
 {'gain3': 'float64'}]

# Percentage Return Ratio
Each day, stocks are ranked by an indicator (FA here) and split into 6 groups (RA = 0 to 5); the average return is then calculated for each group.    
 

In [5]:
%%sql
SELECT RA, avg(gain) * 100 GA, avg(gain3) * 100 GA3, count(1) c
FROM
(
	SELECT cast(percent_rank() OVER wa * 5 as int) RA, gain, gain3
	FROM mytable
	WINDOW wa as (PARTITION BY date ORDER BY FA)
)
GROUP BY RA

SqlcellWidget(data_range=(0, 10, ''), index_sort=('', 0))
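
The bucketing expression `cast(percent_rank() OVER wa * 5 as int)` can be sketched in plain Python as follows. This assumes the cast rounds to the nearest integer, which gives the values 0 to 5 with the two end buckets half as wide as the others; how exact .5 ties are rounded may differ between engines. The FA and gain values are hypothetical and cover a single date.

```python
from collections import defaultdict

def percent_rank(values):
    """SQL-style percent_rank: (rank - 1) / (n - 1); ties share the lowest rank."""
    n = len(values)
    if n == 1:
        return [0.0]
    ordered = sorted(values)
    # ordered.index(v) is the number of values strictly below v, i.e. rank - 1.
    return [ordered.index(v) / (n - 1) for v in values]

def bucket(values, k=5):
    """Map each value to a bucket 0..k (assumption: the cast rounds to nearest)."""
    return [round(p * k) for p in percent_rank(values)]

# Hypothetical FA values and returns for the stocks traded on one date.
fa   = [10.3, 9.1, 12.7, 11.0, 8.4, 13.5, 9.9, 10.8]
gain = [0.01, -0.02, 0.03, 0.00, 0.02, -0.01, 0.01, 0.00]

buckets = bucket(fa)

# Average return per bucket, like GROUP BY RA.
by_bucket = defaultdict(list)
for b, g in zip(buckets, gain):
    by_bucket[b].append(g)
avg_gain = {b: sum(gs) / len(gs) for b, gs in sorted(by_bucket.items())}
```

In the query, the window is partitioned by date, so this ranking is repeated independently for every trading day before the averages are taken.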

Here GROUP BY CUBE is used to generate every subset of the columns in parentheses as a grouping set, and the average return is calculated for each group in each subset.  
This makes it convenient to analyze several indicators jointly.  

In [6]:
%%sql
SELECT RA, RB, RC, avg(gain) * 100 GA, avg(gain3) * 100 GA3, count(1) c
FROM
(
	SELECT cast(percent_rank() OVER wa * 5 as int) RA,
			cast(percent_rank() OVER wb * 5 as int) RB,
			cast(percent_rank() OVER wc * 5 as int) RC,
			gain, gain3
	FROM mytable
	WINDOW
		wa as (PARTITION BY date ORDER BY FA),
		wb as (PARTITION BY date ORDER BY FB),
		wc as (PARTITION BY date ORDER BY FC)
)
GROUP BY CUBE (RA, RB, RC)

SqlcellWidget(data_range=(0, 10, ''), index_sort=('', 0))
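
As a rough illustration of what GROUP BY CUBE produces, the sketch below enumerates every subset of the grouping columns and averages a value per group. Columns that are rolled up are reported as None, mirroring the NULLs in the SQL result. The function `cube` and the sample rows are hypothetical, and only two grouping columns are used to keep the output small.

```python
from collections import defaultdict
from itertools import combinations

def cube(rows, keys, value):
    """Average `value` for every subset of `keys`, like GROUP BY CUBE (keys)."""
    out = {}
    for size in range(len(keys), -1, -1):
        for subset in combinations(keys, size):
            groups = defaultdict(list)
            for row in rows:
                # Columns outside the subset are rolled up and shown as None.
                label = tuple(row[k] if k in subset else None for k in keys)
                groups[label].append(row[value])
            for label, vals in groups.items():
                out[label] = sum(vals) / len(vals)
    return out

# Hypothetical bucketed rows.
rows = [
    {"RA": 0, "RB": 0, "gain": 0.01},
    {"RA": 0, "RB": 1, "gain": 0.03},
    {"RA": 1, "RB": 0, "gain": -0.01},
    {"RA": 1, "RB": 1, "gain": 0.01},
]

result = cube(rows, ["RA", "RB"], "gain")
```

Two columns give 4 grouping sets: (RA, RB), (RA), (RB) and the grand total. The three columns in the query give 8.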

# Correlation Analysis
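The correlation of every indicator with gain is calculated in a single query using COLUMNS, which selects the columns whose names match the regular expression 'F.+'. The result shows that FF is negatively correlated with gain; in other words, the larger the FF indicator, the more likely the stock is to fall in the short term.  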

In [7]:
%%sql
SELECT corr(columns('F.+'), gain)
FROM mytable

SqlcellWidget(data_range=(0, 10, ''), index_sort=('', 0))
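
A minimal sketch of the same idea in plain Python: pick the columns whose names match the pattern, then compute the Pearson correlation of each with gain. The table contents are made up, and the use of `re.search` is an assumption about how the pattern is matched; for the column names in mytable, full and partial matching select the same columns.

```python
import re
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical column-oriented table; the values are chosen only for illustration.
table = {
    "code": ["a", "b", "c", "d"],
    "FA":   [1.0, 2.0, 3.0, 4.0],
    "FF":   [4.0, 3.0, 2.0, 1.0],
    "gain": [0.01, 0.02, 0.03, 0.04],
}

# Columns selected by the pattern, in table order.
selected = [name for name in table if re.search("F.+", name)]

# One correlation per selected column, like corr(columns('F.+'), gain).
corrs = {name: pearson(table[name], table["gain"]) for name in selected}
```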

# Performance Optimization
Use the following statement to save data in Parquet format. Parquet is a compressed, columnar format, so it reduces storage size and speeds up file loading, especially for queries that read only some of the columns.

In [8]:
%%sql
COPY (SELECT * FROM stock.parquet) to 'stock.parquet' (FORMAT PARQUET)

SqlcellWidget(data_range=(0, 10, ''), index_sort=('', 0))

Export to a Parquet dataset partitioned by date

In [9]:
%%sql
COPY (SELECT * FROM stock.parquet) to 'stock' (FORMAT PARQUET, PARTITION_BY date, ALLOW_OVERWRITE TRUE)

SqlcellWidget(data_range=(0, 10, ''), index_sort=('', 0))

Write only a single partition by selecting it with WHERE

In [10]:
%%sql
COPY (SELECT * FROM stock.parquet WHERE date='2020-09-02') to 'stock' (FORMAT PARQUET, PARTITION_BY date, ALLOW_OVERWRITE TRUE)

SqlcellWidget(data_range=(0, 10, ''), index_sort=('', 0))

Load the partitioned dataset

In [11]:
%%sql data
SELECT * FROM read_parquet('stock/*/*.parquet', HIVE_PARTITIONING=1)

SqlcellWidget(data_range=(0, 10, ''), index_sort=('', 0))
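
Hive-style partitioning stores each partition in a directory named key=value, and the reader recovers the partition column from that directory name. The sketch below shows this with plain path handling. The file name `data_0.parquet` is a hypothetical example; only the `date=...` directory level matters here.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

def partition_values(path):
    """Extract key=value directory components from a Hive-style path."""
    directories = PurePosixPath(path).parts[:-1]  # drop the file name
    return dict(part.split("=", 1) for part in directories if "=" in part)

# Hypothetical file written by the partitioned export.
path = "stock/date=2020-09-02/data_0.parquet"

# The glob used in the query matches one directory level below 'stock'.
matches_glob = fnmatch(path, "stock/*/*.parquet")

# The partition column and its value come from the directory name.
values = partition_values(path)
```

Here `values` is `{'date': '2020-09-02'}`, which is why the date column is available again after loading even though it is encoded in the directory name.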

# Reference Python variables directly in %sql statements

In [12]:
column1 = 'FA'
column2 = 'FB'
%sql SELECT corr({column1}, {column2}) from mytable

   corr("FA", "FB")
0         -0.118705
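
Conceptually, the placeholders in braces are replaced by the values of the Python variables before the statement is executed. The sketch below illustrates that substitution step only; `render_sql` is a hypothetical helper, not asqlcell's actual implementation, and the variable names are illustrative.

```python
import re

def render_sql(template, namespace):
    """Replace {name} placeholders with values looked up in `namespace`
    (hypothetical helper, shown only to illustrate the substitution)."""
    def substitute(match):
        return str(namespace[match.group(1)])
    return re.sub(r"\{(\w+)\}", substitute, template)

column1 = "FA"
column2 = "FB"

sql = render_sql(
    "SELECT corr({column1}, {column2}) from mytable",
    {"column1": column1, "column2": column2},
)
```

Here `sql` becomes `SELECT corr(FA, FB) from mytable`. Because the values are inserted as plain text, they should come from trusted code rather than from user input.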
