# 🔵 Corn Data

In [1]:
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import plotly.graph_objects as go
from VersaQT.data_manip import *

In [2]:
plt.rcParams["figure.figsize"] = (10, 6)  # Width=10, Height=6 (in inches)

In [3]:
soy = yf.Ticker("ZS=F")
soy_history = soy.history(period="1y")
soy_history = soy_history.asfreq("B")

print(soy.info["shortName"])

# Interpolate the business days that have no data
soy_history = soy_history.interpolate()

soy_history.reset_index(inplace=True)

Soybean Futures,Mar-2025
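
The soybean cell above reindexes the history to business days with `asfreq("B")`. This inserts `NaN` rows for business days that have no quote (e.g. exchange holidays). `interpolate()` then fills those rows, using linear interpolation by default. A minimal sketch on made-up prices:

```python
import pandas as pd

# Made-up closing prices; Wednesday 2024-01-03 is missing
prices = pd.DataFrame(
    {"Close": [100.0, 104.0, 106.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
)

# Reindex to business-day frequency: the missing day becomes a NaN row
daily = prices.asfreq("B")

# Linear interpolation fills the gap halfway between its neighbours (105.0)
filled = daily.interpolate()
print(filled)
```

Weekends are not inserted because `"B"` only generates business days.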


In [4]:
corn = yf.download("ZC=F", start="2020-01-01", end="2023-12-31", interval="1d")
corn

[*********************100%***********************]  1 of 1 completed


Price,Close,High,Low,Open,Volume
Ticker,ZC=F,ZC=F,ZC=F,ZC=F,ZC=F
Date,,,,,
2020-01-02,391.50,392.00,387.25,387.75,103987
2020-01-03,386.50,392.00,385.50,391.50,125931
2020-01-06,384.75,387.75,382.75,386.25,112130
2020-01-07,384.50,385.75,383.50,385.00,93541
2020-01-08,384.25,385.50,382.25,384.00,135523
...,...,...,...,...,...
2023-12-22,473.00,474.00,471.50,472.50,80484
2023-12-26,480.25,481.00,471.50,473.00,114198
2023-12-27,476.50,480.25,474.50,479.75,107950
2023-12-28,474.25,478.75,474.00,476.00,99398


In [None]:
corn.columns = corn.columns.get_level_values(0)
corn

Price,Close,High,Low,Open,Volume
Date,,,,,
2020-01-02,391.50,392.00,387.25,387.75,103987
2020-01-03,386.50,392.00,385.50,391.50,125931
2020-01-06,384.75,387.75,382.75,386.25,112130
2020-01-07,384.50,385.75,383.50,385.00,93541
2020-01-08,384.25,385.50,382.25,384.00,135523
...,...,...,...,...,...
2023-12-22,473.00,474.00,471.50,472.50,80484
2023-12-26,480.25,481.00,471.50,473.00,114198
2023-12-27,476.50,480.25,474.50,479.75,107950
2023-12-28,474.25,478.75,474.00,476.00,99398


In [6]:
fig = go.Figure()

fig.add_trace(go.Scatter(y=corn.Close, x=corn.index))

fig.show()

# 🔵 Bars

In finance, *bars* are the rows of a table of market data, each one summarizing trading activity over some span. There are different strategies for constructing bars, that is, for deciding how the raw data is sampled into rows.


## 🔵 Time Bars (Time Series)
The problem with time bars is that they are sampled at fixed time intervals, while the market doesn't process information at a constant pace: more activity happens right after the open than in the hours around noon.
- Time bars oversample information during low-activity periods
- And undersample information during high-activity ones

They also exhibit poor statistical properties, such as serial correlation, heteroscedasticity, and non-normality of returns.
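
For comparison, time bars come from resampling at a fixed clock interval, no matter how much trading happened inside it. The minimal sketch below turns made-up daily OHLCV rows into weekly time bars with pandas `resample`. The column names mirror the yfinance output used in this notebook:

```python
import pandas as pd

# Made-up daily OHLCV data over two weeks of business days
idx = pd.bdate_range("2024-01-01", periods=10)
daily = pd.DataFrame(
    {
        "Open":   [10, 11, 12, 11, 13, 14, 13, 15, 16, 15],
        "High":   [11, 12, 13, 12, 14, 15, 14, 16, 17, 16],
        "Low":    [9, 10, 11, 10, 12, 13, 12, 14, 15, 14],
        "Close":  [11, 12, 11, 13, 14, 13, 15, 16, 15, 17],
        "Volume": [100, 300, 200, 100, 400, 2000, 1500, 100, 2500, 900],
    },
    index=idx,
)

# One bar per calendar week (weeks ending on Friday), whatever the activity
weekly = daily.resample("W-FRI").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
)
print(weekly)
```

Both weekly bars cover the same five sessions, even though the second carries over six times the volume of the first. That fixed clock is what makes time bars oversample quiet periods and undersample busy ones.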

## 🔵 Dollar Bars

Sample the original data every time a predefined market value (an amount of dollars) has been exchanged. Compared with time bars, dollar bars:
- Have better statistical properties
- Better reflect market activity

**Intuition**: Consider a stock of which you bought USD 1,000 worth, and which then appreciated 100% over some period. To sell USD 1,000 of that stock at the end of the period, you only need to sell half of the shares you originally bought.
- The number of shares traded depends on the price of each share, whereas the dollar amount traded does not
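
The arithmetic behind the intuition can be checked in a few lines. The share price is made up:

```python
# Made-up prices: the stock doubles (100% appreciation) over the period
start_price = 50.0
end_price = start_price * 2
position_value = 1_000.0  # dollars invested at the start

shares_bought = position_value / start_price   # shares bought for 1,000 dollars
shares_to_sell = position_value / end_price    # shares needed to raise 1,000 dollars later
print(shares_bought, shares_to_sell, shares_to_sell / shares_bought)
# → 20.0 10.0 0.5
```

The dollar amount is the same at both ends, but the share count halves. A bar defined by shares traded would therefore fill up at a different pace than one defined by dollars traded.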


Dollar bars tend to remove patterns that appear only because of how the market is organized in time and how humans behave (such as the burst of activity after the open).
- They try to capture only the significant information in the data
- **Data quality**
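
One way to check the claim that dollar bars have better statistical properties is to compare how close their returns are to a normal distribution. The Jarque-Bera statistic does this (lower means closer to normal). The sketch below uses only NumPy and the textbook formula $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, where $S$ is the sample skewness and $K$ the sample kurtosis. `jarque_bera` and `log_returns` are hypothetical helpers. In practice you would feed them the `Close` of the time series and of the dollar bars; here they run on simulated data only:

```python
import numpy as np

def jarque_bera(returns) -> float:
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    centered = r - r.mean()
    m2 = np.mean(centered**2)              # biased (population) variance
    skew = np.mean(centered**3) / m2**1.5
    kurt = np.mean(centered**4) / m2**2    # raw kurtosis, not excess
    return n / 6 * (skew**2 + (kurt - 3) ** 2 / 4)

def log_returns(close) -> np.ndarray:
    """Log returns between consecutive bar closes."""
    close = np.asarray(close, dtype=float)
    return np.diff(np.log(close))

# Illustration on simulated returns, not on the corn series
rng = np.random.default_rng(0)
normal_like = rng.normal(0, 0.01, 5_000)
fat_tailed = rng.standard_t(df=3, size=5_000) * 0.01
print(jarque_bera(normal_like) < jarque_bera(fat_tailed))  # → True
```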

### 🔵 Implementation

$$
    \text{Dollar Volume}_t = \text{Price}_t \times \text{Volume}_t
$$
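
As a worked example, the first row of the corn data (2020-01-02) has `Close` = 391.50 and `Volume` = 103,987, so its dollar volume is 391.50 × 103,987 ≈ 4.07 × 10⁷.

One caveat, stated as an assumption to verify against the exchange's contract specifications: for `ZC=F` the price is quoted in US cents per bushel, and `Volume` counts contracts of 5,000 bushels each. Under that assumption, `Close * Volume` is proportional to the notional dollar value, not equal to it. A constant factor only rescales which threshold produces a given bar size. The hypothetical helpers below compute both quantities:

```python
# Assumptions (not from the notebook): ZC=F prices are in US cents per bushel,
# Volume counts contracts, and one contract is 5,000 bushels.
BUSHELS_PER_CONTRACT = 5_000

def raw_dollar_volume(close: float, volume: int) -> float:
    """Dollar volume as computed in this notebook: Close * Volume."""
    return close * volume

def notional_usd(close_cents: float, volume: int) -> float:
    """Notional traded value in US dollars under the assumptions above."""
    return close_cents / 100 * BUSHELS_PER_CONTRACT * volume

# First row of the corn data (2020-01-02)
close, volume = 391.50, 103_987
print(raw_dollar_volume(close, volume))  # → 40710910.5
print(notional_usd(close, volume))
```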

In [7]:
corn['DollarVolume'] = corn['Close'] * corn['Volume']  # Using 'Close'  price for the calculation
corn["CummDollarVolume"] = corn["DollarVolume"].cumsum()

- Take the cumulative sum of dollar volume and compare it against a dollar threshold
- Generate a `BarId` for each row

Rows with the same `BarId` are condensed into a single bar.
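
Here is a minimal sketch of the `BarId` rule on made-up dollar volumes with a threshold of 100. Integer division of the cumulative sum by the threshold increases by one each time another 100 dollars have traded. `shift(1)` assigns the row that crosses a threshold to the bar it completes rather than to the next one. When a single row crosses several thresholds, the IDs skip values. This is why the IDs in the 1,000,000 threshold output jump (0, 40, 89, ...).

```python
import pandas as pd

# Made-up dollar volumes for five consecutive rows
df = pd.DataFrame({"DollarVolume": [40.0, 50.0, 40.0, 40.0, 50.0]})
threshold = 100.0

df["CummDollarVolume"] = df["DollarVolume"].cumsum()  # 40, 90, 130, 170, 220
raw_id = df["CummDollarVolume"] // threshold           # 0, 0, 1, 1, 2
# Shift so the row that crosses a threshold closes the current bar
df["BarId"] = raw_id.shift(1, fill_value=0)            # 0, 0, 0, 1, 1
print(df)
```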

In [8]:
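# T1: one bar per ~1e9 of accumulated Close * Volume
dollar_threshold1 = 1_000_000_000
corn1 = corn.copy()  # deep copy, so corn keeps its own columns
# Integer division counts how many thresholds the cumulative sum has crossed
corn1["BarId"] = corn1["CummDollarVolume"] // dollar_threshold1
# Shift so the row that crosses a threshold closes the current bar
corn1["BarId"] = corn1["BarId"].shift(1, fill_value=0)
corn1 = corn1.reset_index()


# T2: far below one day's dollar volume, so every day becomes its own bar
dollar_threshold = 1_000_000
corn["BarId"] = corn["CummDollarVolume"] // dollar_threshold
corn["BarId"] = corn["BarId"].shift(1, fill_value=0)
corn = corn.reset_index()
corn.head(20)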

Price,Date,Close,High,Low,Open,Volume,DollarVolume,CummDollarVolume,BarId
0,2020-01-02,391.5,392.0,387.25,387.75,103987,40710910.0,40710910.0,0.0
1,2020-01-03,386.5,392.0,385.5,391.5,125931,48672330.0,89383240.0,40.0
2,2020-01-06,384.75,387.75,382.75,386.25,112130,43142020.0,132525300.0,89.0
3,2020-01-07,384.5,385.75,383.5,385.0,93541,35966510.0,168491800.0,132.0
4,2020-01-08,384.25,385.5,382.25,384.0,135523,52074710.0,220566500.0,168.0
5,2020-01-09,383.25,387.0,382.25,383.75,130937,50181610.0,270748100.0,220.0
6,2020-01-10,385.75,386.75,376.5,383.25,226659,87433710.0,358181800.0,270.0
7,2020-01-13,389.5,389.5,385.5,386.5,139994,54527660.0,412709500.0,358.0
8,2020-01-14,389.0,390.5,388.25,389.25,115957,45107270.0,457816700.0,412.0
9,2020-01-15,387.5,390.25,386.5,388.75,141478,54822720.0,512639500.0,457.0


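Aggregate all variables by bar:
- Some are summed (`Volume`, `DollarVolume`)
- For others we take the first, max, min, or last value (`Open`, `High`, `Low`, `Close`), and the first and last `Date` become the bar's `StartTime` and `EndTime`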
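
The two aggregation cells below repeat the same `groupby` logic for each threshold. The sketch below combines the `BarId` step and the aggregation into one reusable function. `make_dollar_bars` is a hypothetical name, and this is not necessarily how `VersaQT`'s `get_dollar_bar` is implemented. It assumes a DataFrame with a `Date` column plus `Open`/`High`/`Low`/`Close`/`Volume`, as produced after `reset_index()`:

```python
import pandas as pd

def make_dollar_bars(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Group rows into dollar bars of roughly `threshold` dollar volume each."""
    out = df.copy()
    out["DollarVolume"] = out["Close"] * out["Volume"]
    cumulative = out["DollarVolume"].cumsum()
    # The row that crosses a multiple of `threshold` closes the current bar
    out["BarId"] = (cumulative // threshold).shift(1, fill_value=0)
    # Named aggregation yields flat column names directly
    return out.groupby("BarId").agg(
        Open=("Open", "first"),
        High=("High", "max"),
        Low=("Low", "min"),
        Close=("Close", "last"),
        Volume=("Volume", "sum"),
        DollarVolume=("DollarVolume", "sum"),
        StartTime=("Date", "first"),
        EndTime=("Date", "last"),
    )

# Usage with made-up data
toy = pd.DataFrame({
    "Date": pd.bdate_range("2024-01-01", periods=5),
    "Open": [10.0, 11.0, 12.0, 11.0, 13.0],
    "High": [11.0, 12.0, 13.0, 12.0, 14.0],
    "Low": [9.0, 10.0, 11.0, 10.0, 12.0],
    "Close": [11.0, 12.0, 11.0, 13.0, 14.0],
    "Volume": [10, 10, 10, 10, 10],
})
bars = make_dollar_bars(toy, threshold=250.0)
print(bars)
```

Named aggregation avoids manually renaming the columns after the `"Date": ["first", "last"]` pattern used below.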

In [9]:
corn1_dollar_bars = corn1.groupby("BarId").agg({
    "Open": "first",
    "High": "max",
    "Low": "min",
    "Close": "last",
    "Volume": "sum",
    "DollarVolume": "sum",
    "Date": ["first", "last"]
})

corn1_dollar_bars.columns = ['Open', 'High', 'Low', 'Close', 'Volume', 'DollarVolume', 'StartTime', 'EndTime']

In [10]:
corn_dollar_bars = corn.groupby("BarId").agg({
    "Open": "first",
    "High": "max",
    "Low": "min",
    "Close": "last",
    "Volume": "sum",
    "DollarVolume": "sum",
    "Date": ["first", "last"]
})

corn_dollar_bars.columns = ['Open', 'High', 'Low', 'Close', 'Volume', 'DollarVolume', 'StartTime', 'EndTime']
corn_dollar_bars.head(20)

,Open,High,Low,Close,Volume,DollarVolume,StartTime,EndTime
BarId,,,,,,,,
0.0,387.75,392.0,387.25,391.5,103987,40710910.0,2020-01-02,2020-01-02
40.0,391.5,392.0,385.5,386.5,125931,48672330.0,2020-01-03,2020-01-03
89.0,386.25,387.75,382.75,384.75,112130,43142020.0,2020-01-06,2020-01-06
132.0,385.0,385.75,383.5,384.5,93541,35966510.0,2020-01-07,2020-01-07
168.0,384.0,385.5,382.25,384.25,135523,52074710.0,2020-01-08,2020-01-08
220.0,383.75,387.0,382.25,383.25,130937,50181610.0,2020-01-09,2020-01-09
270.0,383.25,386.75,376.5,385.75,226659,87433710.0,2020-01-10,2020-01-10
358.0,386.5,389.5,385.5,389.5,139994,54527660.0,2020-01-13,2020-01-13
412.0,389.25,390.5,388.25,389.0,115957,45107270.0,2020-01-14,2020-01-14
457.0,388.75,390.25,386.5,387.5,141478,54822720.0,2020-01-15,2020-01-15


# 🔵 Graphs

In [11]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=corn.Date, y=corn.Close, name="Time Series"))
fig.add_trace(go.Scatter(x=corn1_dollar_bars["StartTime"], y=corn1_dollar_bars["Close"], name="Dollar Bars - T1"))
fig.add_trace(go.Scatter(x=corn_dollar_bars["StartTime"], y=corn_dollar_bars["Close"], name="Dollar Bars - T2"))

fig.show()

In [12]:
fig = go.Figure(
    go.Candlestick(x=corn_dollar_bars["StartTime"],
                   open=corn_dollar_bars["Open"],
                   high=corn_dollar_bars["High"],
                   low=corn_dollar_bars["Low"],
                   close=corn_dollar_bars["Close"])
)

fig.update_layout(
    title="Dollar Bars Candlestick Chart",
    width=1200,
    height=600
)

fig.show()

In [13]:
corn = yf.download("ZC=F", start="2020-01-01", end="2023-12-31", interval="1d")
corn.columns = corn.columns.get_level_values(0)
corn.reset_index(inplace=True)
corn

[*********************100%***********************]  1 of 1 completed


Price,Date,Close,High,Low,Open,Volume
0,2020-01-02,391.50,392.00,387.25,387.75,103987
1,2020-01-03,386.50,392.00,385.50,391.50,125931
2,2020-01-06,384.75,387.75,382.75,386.25,112130
3,2020-01-07,384.50,385.75,383.50,385.00,93541
4,2020-01-08,384.25,385.50,382.25,384.00,135523
...,...,...,...,...,...,...
1001,2023-12-22,473.00,474.00,471.50,472.50,80484
1002,2023-12-26,480.25,481.00,471.50,473.00,114198
1003,2023-12-27,476.50,480.25,474.50,479.75,107950
1004,2023-12-28,474.25,478.75,474.00,476.00,99398


In [16]:
corn_dollar = get_dollar_bar(corn, 1_000_000_000)
corn_dollar

,Open,High,Low,Close,Volume,DollarVolume,StartTime,EndTime
BarId,,,,,,,,
0.0,387.75,394.00,375.25,387.25,2640795,1.021563e+09,2020-01-02,2020-01-24
1.0,384.25,388.25,376.00,383.00,2580604,9.850633e+08,2020-01-27,2020-02-12
2.0,382.25,388.50,332.00,335.25,2755668,1.013231e+09,2020-02-13,2020-03-18
3.0,336.50,356.75,317.50,319.75,3005440,1.009171e+09,2020-03-19,2020-04-16
4.0,319.75,330.75,300.25,324.00,3109274,9.892978e+08,2020-04-17,2020-06-03
...,...,...,...,...,...,...,...,...
61.0,476.25,490.00,455.75,488.75,2150872,1.025750e+09,2023-08-24,2023-10-02
62.0,488.00,509.50,482.25,495.50,2133119,1.052591e+09,2023-10-03,2023-10-20
63.0,494.25,497.00,468.00,468.50,1984429,9.476919e+08,2023-10-23,2023-11-07
64.0,469.00,480.50,461.00,470.00,2116480,9.986897e+08,2023-11-08,2023-11-21


In [17]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=corn.Date, y=corn.Close, name="Time Series"))
fig.add_trace(go.Scatter(x=corn_dollar["StartTime"], y=corn_dollar["Close"], name="Dollar Bars - T1"))

fig.show()