# Getting Started with Data Sources

Welcome to **PyBroker**! The best place to start is with [DataSources](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.DataSource). A ```DataSource``` is a class that fetches data from an external source, which you can then use to backtest your trading strategies.

## Yahoo Finance

One of the built-in ```DataSources``` in **PyBroker** is [Yahoo Finance](https://finance.yahoo.com). To use it, import [YFinance](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.YFinance):

In [1]:
from pybroker import YFinance

yfinance = YFinance()
df = yfinance.query(['AAPL', 'MSFT'], start_date='3/1/2021', end_date='3/1/2022')
df

Loading bar data...
[*********************100%***********************]  2 of 2 completed
Loaded bar data: 0:00:00 



Unnamed: 0,date,symbol,open,high,low,close,volume,adj_close
0,2021-03-01,AAPL,123.750000,127.930000,122.790001,127.790001,116307900,126.270058
1,2021-03-01,MSFT,235.899994,237.470001,233.149994,236.940002,25324000,232.742355
2,2021-03-02,AAPL,128.410004,128.720001,125.010002,125.120003,102260900,123.631813
3,2021-03-02,MSFT,237.009995,237.300003,233.449997,233.869995,22812500,229.726715
4,2021-03-03,AAPL,124.809998,125.709999,121.839996,122.059998,112966300,120.608208
...,...,...,...,...,...,...,...,...
501,2022-02-24,MSFT,272.510010,295.160004,271.519989,294.589996,56989700,291.728027
502,2022-02-25,AAPL,163.839996,165.119995,160.869995,164.850006,91974200,163.857407
503,2022-02-25,MSFT,295.140015,297.630005,291.649994,297.309998,32546700,294.421600
504,2022-02-28,AAPL,163.059998,165.419998,162.429993,165.119995,95056600,164.125763


The code above queries data for the AAPL and MSFT stocks and returns the results as a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).
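
The result holds one row per date and symbol, so the usual pandas operations apply to it. The sketch below re-enters the first four rows of the output above by hand (so it runs without network access) and shows two common next steps: selecting a single symbol and pivoting the closing prices into one column per symbol. The variable names are illustrative only.

```python
import pandas as pd

# First four rows of the query output above, re-entered by hand so the
# example runs offline. The real DataFrame has more rows and columns.
df = pd.DataFrame({
    'date': pd.to_datetime(
        ['2021-03-01', '2021-03-01', '2021-03-02', '2021-03-02']
    ),
    'symbol': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
    'close': [127.790001, 236.940002, 125.120003, 233.869995],
})

# Keep only the rows for a single symbol.
aapl = df[df['symbol'] == 'AAPL']

# Reshape to one close-price column per symbol, indexed by date.
closes = df.pivot(index='date', columns='symbol', values='close')
print(closes)
```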

## Caching Data

To speed up data retrieval, you can cache your queries using **PyBroker**'s caching system. Enable caching by calling [pybroker.enable_data_source_cache('name')](https://www.pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.enable_data_source_cache), where ```name``` is the name of the cache you want to use:

In [2]:
import pybroker

pybroker.enable_data_source_cache('yfinance')

<diskcache.core.Cache at 0x7fa1c83d1ac0>

The next call to [query](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.DataSource.query) will cache the returned data to disk. Each unique combination of ticker symbol and date range will be cached separately:

In [3]:
yfinance.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')

Loading bar data...
[*********************100%***********************]  2 of 2 completed
Loaded bar data: 0:00:00 



Unnamed: 0,date,symbol,open,high,low,close,volume,adj_close
0,2021-03-01,IBM,115.057358,116.940727,114.588913,115.430206,5977367,104.819473
1,2021-03-01,TSLA,230.036667,239.666672,228.350006,239.476669,81408600,239.476669
2,2021-03-02,IBM,115.430206,116.539200,114.971321,115.038239,4732418,104.463531
3,2021-03-02,TSLA,239.426666,240.369995,228.333328,228.813339,71196600,228.813339
4,2021-03-03,IBM,115.200768,117.237091,114.703636,116.978966,7744898,106.225853
...,...,...,...,...,...,...,...,...
501,2022-02-24,TSLA,233.463333,267.493347,233.333328,266.923340,135322200,266.923340
502,2022-02-25,IBM,122.050003,124.260002,121.449997,124.180000,4460900,118.284569
503,2022-02-25,TSLA,269.743347,273.166656,260.799988,269.956665,76067700,269.956665
504,2022-02-28,IBM,122.209999,123.389999,121.040001,122.510002,6757300,116.693848


Calling ```query``` again with the same ticker symbols and date range returns the cached data:

In [4]:
df = yfinance.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')
df

Loaded cached bar data.



Unnamed: 0,date,symbol,open,high,low,close,volume,adj_close
0,2021-03-01,TSLA,230.036667,239.666672,228.350006,239.476669,81408600,239.476669
1,2021-03-02,TSLA,239.426666,240.369995,228.333328,228.813339,71196600,228.813339
2,2021-03-03,TSLA,229.330002,233.566666,217.236664,217.733337,90624000,217.733337
3,2021-03-04,TSLA,218.600006,222.816666,200.000000,207.146667,197758500,207.146667
4,2021-03-05,TSLA,208.686661,209.279999,179.830002,199.316666,268189500,199.316666
...,...,...,...,...,...,...,...,...
248,2022-02-22,IBM,124.199997,125.000000,122.680000,123.919998,5349700,118.036911
249,2022-02-23,IBM,124.379997,124.699997,121.870003,122.070000,4086400,116.274742
250,2022-02-24,IBM,120.000000,122.099998,118.809998,121.970001,6563200,116.179489
251,2022-02-25,IBM,122.050003,124.260002,121.449997,124.180000,4460900,118.284569


You can clear your cache using [pybroker.clear_data_source_cache](https://www.pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.clear_data_source_cache):

In [5]:
pybroker.clear_data_source_cache()

Or disable caching altogether using [pybroker.disable_data_source_cache](https://www.pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.disable_data_source_cache):

In [6]:
pybroker.disable_data_source_cache()

Note that both functions should be called only after caching has been enabled with [pybroker.enable_data_source_cache](https://www.pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.enable_data_source_cache).

## AKShare

**PyBroker** also includes an [AKShare](https://github.com/akfamily/akshare) ```DataSource``` for fetching **Chinese** stock data. AKShare is a widely used, free, open-source package for obtaining financial data, with a focus on the Chinese market, where it provides higher-quality data than yfinance. The AKShare ```DataSource``` will be updated to fetch data for more markets in the future. To use it, import [AKShare](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.AKShare):

In [7]:
from pybroker import AKShare

akshare = AKShare()
# You can substitute 000001.SZ with 000001, and it will still work!
df = akshare.query(['000001.SZ', '000002.SZ'], start_date='3/1/2021', end_date='3/1/2022')
df

Loading bar data...
Loaded bar data: 0:00:00 



Unnamed: 0,date,symbol,open,high,low,close,volume,adj_close
0,2021-03-01,000001.SZ,21.54,21.68,21.18,21.45,1125387,21.04
1,2021-03-01,000002.SZ,33.10,33.71,32.29,33.35,1280834,31.12
2,2021-03-02,000001.SZ,21.62,22.15,21.26,21.65,1473425,21.24
3,2021-03-02,000002.SZ,33.00,34.60,32.90,33.29,1220150,31.06
4,2021-03-03,000001.SZ,21.58,23.08,21.46,23.01,1919635,22.60
...,...,...,...,...,...,...,...,...
483,2022-02-25,000002.SZ,19.95,20.03,19.45,19.53,893970,18.55
484,2022-02-28,000001.SZ,15.90,15.92,15.62,15.75,723990,15.52
485,2022-02-28,000002.SZ,19.50,19.50,18.99,19.20,861954,18.22
486,2022-03-01,000001.SZ,15.79,15.95,15.62,15.92,935040,15.69


Here, **adj_close** represents the post-adjusted closing price.
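
The comment in the cell above notes that `000001.SZ` and `000001` can be used interchangeably in a query. If your own code holds symbols in both forms, a small helper such as the hypothetical `strip_exchange_suffix` below can normalize them before comparison. It is plain string handling and says nothing about how PyBroker or AKShare resolve symbols internally.

```python
def strip_exchange_suffix(symbol: str) -> str:
    """Drop a trailing exchange suffix such as '.SZ', if one is present."""
    code, _, _suffix = symbol.partition('.')
    return code


# Both spellings reduce to the same bare code.
symbols = ['000001.SZ', '000001', '000002.SZ']
unique_codes = sorted({strip_exchange_suffix(s) for s in symbols})
print(unique_codes)
```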

## Alpaca

**PyBroker** also includes an [Alpaca](https://alpaca.markets/) ```DataSource``` for fetching stock data. To use it, import [Alpaca](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.Alpaca) and provide your API key and secret:

In [7]:
import os

from pybroker import Alpaca

alpaca = Alpaca(os.environ['ALPACA_API_KEY'], os.environ['ALPACA_API_SECRET'])
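
Indexing `os.environ` directly raises a bare `KeyError` when a variable is missing. If you prefer a clearer message, a small helper such as the hypothetical `require_env` below can wrap the lookup. It is not part of PyBroker, and the placeholder values are set here only so that the sketch runs on its own.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f'Set the {name} environment variable before running.'
        )
    return value


# Placeholder values for demonstration only; use your real credentials.
os.environ.setdefault('ALPACA_API_KEY', 'demo-key')
os.environ.setdefault('ALPACA_API_SECRET', 'demo-secret')

api_key = require_env('ALPACA_API_KEY')
api_secret = require_env('ALPACA_API_SECRET')
```

The two values can then be passed to `Alpaca(api_key, api_secret)` in place of the direct `os.environ` lookups.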

You can query ```Alpaca``` for stock data using the same syntax as with Yahoo Finance, but Alpaca also supports querying data at different timeframes. For example, to query 1-minute data:

In [8]:
df = alpaca.query(
    ['AAPL', 'MSFT'], 
    start_date='3/1/2021', 
    end_date='4/1/2021', 
    timeframe='1m'
)
df

Loading bar data...
Loaded bar data: 0:00:06 



Unnamed: 0,date,symbol,open,high,low,close,volume,vwap
0,2021-03-01 04:00:00-05:00,AAPL,124.30,124.56,124.30,124.50,12267.0,124.433365
1,2021-03-01 04:00:00-05:00,MSFT,235.87,236.00,235.87,236.00,1429.0,235.938887
2,2021-03-01 04:01:00-05:00,AAPL,124.56,124.60,124.30,124.30,9439.0,124.481323
3,2021-03-01 04:01:00-05:00,MSFT,236.17,236.17,236.17,236.17,104.0,236.161538
4,2021-03-01 04:02:00-05:00,AAPL,124.00,124.05,123.78,123.78,4834.0,123.935583
...,...,...,...,...,...,...,...,...
33859,2021-03-31 19:57:00-04:00,MSFT,237.28,237.28,237.28,237.28,507.0,237.367870
33860,2021-03-31 19:58:00-04:00,AAPL,122.36,122.39,122.33,122.39,3403.0,122.360544
33861,2021-03-31 19:58:00-04:00,MSFT,237.40,237.40,237.35,237.35,636.0,237.378066
33862,2021-03-31 19:59:00-04:00,AAPL,122.39,122.45,122.38,122.45,5560.0,122.402606
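
The `timeframe` argument is a compact string: `'1m'` in the query above means one minute (not one month), which matches the one-minute spacing of the timestamps in the output. To make the bar duration explicit in your own code, a helper like the hypothetical `timeframe_to_timedelta` below can convert such a string into a `timedelta`. It is a simplified sketch that handles only the minute and hour forms used in this notebook, and it is not PyBroker's own parser.

```python
import re
from datetime import timedelta

# Only the two unit letters that appear in this notebook are handled.
_UNITS = {'m': 'minutes', 'h': 'hours'}


def timeframe_to_timedelta(timeframe: str) -> timedelta:
    """Convert a string such as '1m' or '1h' into a timedelta."""
    match = re.fullmatch(r'(\d+)([mh])', timeframe)
    if match is None:
        raise ValueError(f'Unsupported timeframe: {timeframe!r}')
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(timeframe_to_timedelta('1m'))
```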


## Alpaca Crypto

If you are interested in fetching cryptocurrency data, you can use [AlpacaCrypto](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.AlpacaCrypto). Here's an example of how to use it:

In [9]:
from pybroker import AlpacaCrypto

crypto = AlpacaCrypto(
    os.environ['ALPACA_API_KEY'], 
    os.environ['ALPACA_API_SECRET']
)
df = crypto.query('BTC/USD', start_date='1/1/2021', end_date='2/1/2021', timeframe='1h')
df

Loading bar data...
Loaded bar data: 0:00:01 



Unnamed: 0,symbol,date,open,high,low,close,volume,vwap,trade_count
0,BTC/USD,2020-12-31 19:00:00-05:00,28973.0,29073.5,28775.0,29065.0,3.4437,28968.839097,72.0
1,BTC/USD,2020-12-31 20:00:00-05:00,29070.0,29481.0,29038.5,29404.5,4.6183,29359.399487,65.0
2,BTC/USD,2020-12-31 21:00:00-05:00,29528.0,29528.0,29218.0,29245.0,4.3423,29361.540923,42.0
3,BTC/USD,2020-12-31 22:00:00-05:00,29400.5,29400.5,29337.0,29367.5,0.3089,29400.447394,3.0
4,BTC/USD,2020-12-31 23:00:00-05:00,29449.0,29449.0,29136.5,29189.5,2.0245,29302.743369,34.0
...,...,...,...,...,...,...,...,...,...
736,BTC/USD,2021-01-31 15:00:00-05:00,32754.0,32939.0,32499.0,32893.0,153.3498,32622.897675,98.0
737,BTC/USD,2021-01-31 16:00:00-05:00,32887.0,32887.0,32570.0,32600.0,4.4939,32740.567859,106.0
738,BTC/USD,2021-01-31 17:00:00-05:00,32642.0,33100.0,32642.0,32993.0,52.4213,32717.239656,81.0
739,BTC/USD,2021-01-31 18:00:00-05:00,33059.0,33177.0,33030.0,33089.0,1.4816,33124.196207,59.0


The example above queries hourly data for the BTC/USD currency pair.
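
The first row of the output is stamped `2020-12-31 19:00:00-05:00` even though the query starts on 1/1/2021. The timestamp carries a UTC offset of -05:00, and converting it shows that it corresponds to midnight UTC on 1 January 2021. The standard-library sketch below performs that conversion for the first and last rows; it makes no assumption about how PyBroker handles time zones internally.

```python
from datetime import datetime, timezone

# Timestamps copied from the first and last rows of the output above.
first_bar = datetime.fromisoformat('2020-12-31 19:00:00-05:00')
last_bar = datetime.fromisoformat('2021-01-31 18:00:00-05:00')

# Express the same instants in UTC.
first_bar_utc = first_bar.astimezone(timezone.utc)
last_bar_utc = last_bar.astimezone(timezone.utc)

print(first_bar_utc.isoformat())  # 2021-01-01T00:00:00+00:00
print(last_bar_utc.isoformat())   # 2021-01-31T23:00:00+00:00
```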

[In the next notebook, we'll take a look at how to use DataSources to backtest a simple trading strategy](https://www.pybroker.com/en/latest/notebooks/2.%20Backtesting%20a%20Strategy.html).