# Get Historical Data

In [1]:
import pfeed as pe

pe.__version__

'0.0.2'

In [2]:
bybit_feed = pe.BybitFeed(data_tool='polars', use_ray=True)
yfinance_feed = pe.YahooFinanceFeed(data_tool='pandas', use_ray=False)

## Get Historical Data from Bybit

In [3]:
bybit_df = bybit_feed.get_historical_data(
    'BTC_USDT_PERP',
    rollback_period='2d',  # rollback 2 days
    resolution='1m',  # 1-minute data  
)

2025-02-03 14:25:23,106	INFO worker.py:1841 -- Started a local Ray instance.
Running BYBIT dataflows: 100%|██████████| 1/1 [00:06<00:00,  6.20s/it]

2025-02-03 14:25:32,523	INFO worker.py:1841 -- Started a local Ray instance.
Running BYBIT dataflows:   0%|          | 0/1 [00:00<?, ?it/s]
2025-02-03T14:25:36+0800.692 | INFO | bybit_data | loaded BYBIT:2025-02-01:CRYPTO:BYBIT:BTC_USDT_PERP:1_MINUTE data to CACHE | dataflow.py fn:_load ln:150
2025-02-03T14:25:39+0800.115 | INFO | bybit_data | loaded BYBIT:2025-02-02:CRYPTO:BYBIT:BTC_USDT_PERP:1_MINUTE data to CACHE | dataflow.py fn:_load ln:150
Running BYBIT dataflows: 100%|██████████| 1/1 [00:06<00:00,  6.01s/it]


With just **one line of code**, you now have clean data to work with.

In [4]:
bybit_df.collect().tail(1)

| ts | resolution | product | symbol | open | high | low | close | volume |
|---|---|---|---|---|---|---|---|---|
| datetime[ns] | str | str | str | f64 | f64 | f64 | f64 | f64 |
| 2025-02-02 23:59:00 | "1m" | "BTC_USDT_PERP" | "BTCUSDT" | 97670.1 | 97692.8 | 97631.4 | 97645.7 | 48.57 |
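
Judging by the log lines above, the `rollback_period='2d'` call made on 2025-02-03 loaded data for 2025-02-01 and 2025-02-02, that is, the two full days before the run date. The sketch below reproduces that date arithmetic with the standard library only. `rollback_dates` is a hypothetical helper written for illustration; it is not part of pfeed, and pfeed's actual logic may differ.

```python
from datetime import date, timedelta


def rollback_dates(today: date, days: int) -> list[date]:
    """Return the `days` full calendar days before `today`, oldest first.

    Hypothetical helper: it mirrors the dates seen in the log output,
    not pfeed's internal implementation.
    """
    return [today - timedelta(days=offset) for offset in range(days, 0, -1)]


# Run date taken from the log timestamps above.
dates = rollback_dates(date(2025, 2, 3), 2)
print([d.isoformat() for d in dates])  # ['2025-02-01', '2025-02-02']
```

If you need a fixed window rather than one relative to the run date, the Yahoo Finance example in the next section passes `start_date` and `end_date` instead.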


## Get Historical Data from Yahoo Finance

In [5]:
# yfinance_feed fetches data via `yfinance`, so any kwargs that yfinance
# supports can be passed through. See the yfinance docs:
# https://ranaroussi.github.io/yfinance/index.html
yfinance_kwargs = {}  # no extra options passed here
yfinance_df = yfinance_feed.get_historical_data(
    'TSLA_USD_STK',  # STK = stock
    resolution='1d',  # 1-day data
    start_date='2025-01-01',
    end_date='2025-01-31',
    **yfinance_kwargs
)

Running YAHOO_FINANCE dataflows: 100%|██████████| 31/31 [00:05<00:00,  5.38it/s]

Running YAHOO_FINANCE dataflows:   0%|          | 0/1 [00:00<?, ?it/s]
2025-02-03T14:25:58+0800.458 | INFO | yahoo_finance_data | loaded YAHOO_FINANCE:(from)2025-01-01:(to)2025-01-31:YAHOO_FINANCE:TSLA_USD_STK:1_DAY data to CACHE | dataflow.py fn:_load ln:150
Running YAHOO_FINANCE dataflows: 100%|██████████| 1/1 [00:01<00:00,  1.29s/it]


In [6]:
yfinance_df.head(1)

|  | ts | resolution | product | symbol | open | high | low | close | volume | dividends | splits |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-01-02 05:00:00 | 1d | TSLA_USD_STK | TSLA | 390.100006 | 392.730011 | 373.040009 | 379.279999 | 109710700.0 | 0.0 | 0.0 |


```{caution} Auto-Resampling
:class: dropdown
Data is resampled automatically when the downloaded data's original resolution differs from the target resolution.
For example, if '1second' data is downloaded and the target resolution is '1minute', the data is resampled to 1-minute data.

**Non-standard columns will be DROPPED** because no resampling logic is defined for them.
For example, resampling Bybit's tick data to '1minute' data drops the 'tickDirection' column.

To **avoid this data loss**, request the **same resolution as the downloaded data** and resample it manually.
```
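
The manual route mentioned in the caution above could look roughly like the pandas sketch below. It uses a small synthetic tick frame rather than real pfeed output, so the column names (`price`, `volume`, `tickDirection`) and the rule chosen for the non-standard column (keep the last value in each bar) are assumptions for illustration only.

```python
import pandas as pd

# Synthetic tick data standing in for a tick-resolution download.
# Column names and values are illustrative assumptions, not pfeed's schema.
ticks = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            [
                "2025-02-02 23:58:05",
                "2025-02-02 23:58:40",
                "2025-02-02 23:59:10",
                "2025-02-02 23:59:55",
            ]
        ),
        "price": [97650.0, 97660.5, 97640.0, 97635.0],
        "volume": [0.5, 1.2, 0.3, 2.0],
        "tickDirection": ["PlusTick", "PlusTick", "MinusTick", "MinusTick"],
    }
).set_index("ts")

# Group ticks into 1-minute buckets.
resampler = ticks.resample("1min")

# Standard columns: open/high/low/close from price, summed volume.
bars = resampler["price"].ohlc()
bars["volume"] = resampler["volume"].sum()

# Non-standard column: choose an explicit rule instead of dropping it.
# Here the last tick direction seen in each bar is kept.
bars["tickDirection"] = resampler["tickDirection"].last()

print(bars)
```

Which rule suits a non-standard column depends on what it means. Keeping the last value is only one option; counting occurrences per bar would be another.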
