In [1]:
import sys
sys.path.append('../')

from tickdata import dataset

**dataset()** provides a simplified interface for loading tick data.

To load tick data, specify _*stock*_ (stock code), _*dbdir*_ (database path), and the corresponding _*user*_ (username) and _*psw*_ (password). The loaded data is returned as a **TickData** object.

Some basic interface settings in **config/data.json** can be used to further customize data loading.

In [2]:
stock = '000001'
dbdir = '~/OneDrive/python-programs/reinforcement-learning/data/20140704'
user = 'cra001'
password = 'cra001'
td = dataset(stock, dbdir, user, password)

**TickData** provides a range of attributes and functions for describing and further processing the data.

Use **TickData.quote_timeseries** or **TickData.trade_timeseries** to get the timestamp series of quotes or trades. Each returns a **list** of timestamps, given as milliseconds of the day.

In [12]:
"%s" % td.quote_timeseries[:10]

'[34200000, 34202000, 34204000, 34209000, 34212000, 34214000, 34218000, 34220000, 34223000, 34227000]'
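
The timestamps are easier to read as clock times. The helper below is a minimal sketch, assuming each value counts milliseconds since midnight; `ms_to_clock` is a hypothetical name and is not part of **tickdata**.

```python
def ms_to_clock(ms_of_day: int) -> str:
    """Format milliseconds since midnight as HH:MM:SS.mmm."""
    hours, rem = divmod(ms_of_day, 3_600_000)   # 3,600,000 ms per hour
    minutes, rem = divmod(rem, 60_000)          # 60,000 ms per minute
    seconds, millis = divmod(rem, 1_000)        # 1,000 ms per second
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# First three quote timestamps from the output above.
stamps = [34200000, 34202000, 34204000]
print([ms_to_clock(t) for t in stamps])
# ['09:30:00.000', '09:30:02.000', '09:30:04.000']
```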

Use **TickData.get_quote(t)** to get the quote data at _*t*_, returned as a **pandas.DataFrame**. The argument _*t*_ can be an **int** timestamp or a **list** of timestamps.

In [4]:
td.get_quote(td.quote_timeseries[0])

Unnamed: 0,time,bid1,bsize1,ask1,asize1,bid2,bsize2,ask2,asize2,bid3,...,ask8,asize8,bid9,bsize9,ask9,asize9,bid10,bsize10,ask10,asize10
0,34200000,9.96,4000.0,9.97,73699.0,9.95,19500.0,9.98,45600.0,9.94,...,10.04,85160.0,9.88,30600.0,10.05,342440.0,9.87,45200.0,10.06,51556.0


In [15]:
td.get_quote(td.quote_timeseries[:5])

Unnamed: 0,time,bid1,bsize1,ask1,asize1,bid2,bsize2,ask2,asize2,bid3,...,ask8,asize8,bid9,bsize9,ask9,asize9,bid10,bsize10,ask10,asize10
0,34200000,9.96,4000.0,9.97,73699.0,9.95,19500.0,9.98,45600.0,9.94,...,10.04,85160.0,9.88,30600.0,10.05,342440.0,9.87,45200.0,10.06,51556.0
1,34202000,9.95,110800.0,9.97,44209.0,9.94,7000.0,9.98,96420.0,9.93,...,10.04,93460.0,9.87,49600.0,10.05,347890.0,9.86,38200.0,10.06,54256.0
2,34204000,9.95,110800.0,9.97,35209.0,9.94,7000.0,9.98,96420.0,9.93,...,10.04,93460.0,9.87,49600.0,10.05,347890.0,9.86,38200.0,10.06,54256.0
3,34209000,9.95,110800.0,9.97,35509.0,9.94,7000.0,9.98,96420.0,9.93,...,10.04,93460.0,9.87,49700.0,10.05,347890.0,9.86,38200.0,10.06,54256.0
4,34212000,9.95,110800.0,9.96,1500.0,9.94,7000.0,9.97,35409.0,9.93,...,10.03,96800.0,9.87,49700.0,10.04,93460.0,9.86,38200.0,10.05,347890.0


If _*t*_ is not given, all of the quote data is loaded. **TickData.get_trade** retrieves the specified trade data in the same way as **TickData.get_quote**.

Use **TickData.pre_quote(t)** to get the **one** quote immediately before _*t*_, which can be a timestamp _*str*_ or a quote _*pandas.DataFrame*_. If there is no previous quote, it returns _*None*_.

In [14]:
td.get_quote()

Unnamed: 0,time,bid1,bsize1,ask1,asize1,bid2,bsize2,ask2,asize2,bid3,...,ask8,asize8,bid9,bsize9,ask9,asize9,bid10,bsize10,ask10,asize10
0,34200000,9.960,4000.0,9.970,73699.0,9.950,19500.0,9.980,45600.0,9.940,...,10.040,85160.0,9.880,30600.0,10.050,342440.0,9.870,45200.0,10.060,51556.0
1,34202000,9.950,110800.0,9.970,44209.0,9.940,7000.0,9.980,96420.0,9.930,...,10.040,93460.0,9.870,49600.0,10.050,347890.0,9.860,38200.0,10.060,54256.0
2,34204000,9.950,110800.0,9.970,35209.0,9.940,7000.0,9.980,96420.0,9.930,...,10.040,93460.0,9.870,49600.0,10.050,347890.0,9.860,38200.0,10.060,54256.0
3,34209000,9.950,110800.0,9.970,35509.0,9.940,7000.0,9.980,96420.0,9.930,...,10.040,93460.0,9.870,49700.0,10.050,347890.0,9.860,38200.0,10.060,54256.0
4,34212000,9.950,110800.0,9.960,1500.0,9.940,7000.0,9.970,35409.0,9.930,...,10.030,96800.0,9.870,49700.0,10.040,93460.0,9.860,38200.0,10.050,347890.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4736,53987000,9.910,215345.0,9.910,215345.0,0.000,173555.0,0.000,0.0,0.000,...,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0
4737,53990000,9.910,218545.0,9.910,218545.0,0.000,171555.0,0.000,0.0,0.000,...,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0
4738,53993000,9.910,218545.0,9.910,218545.0,0.000,171555.0,0.000,0.0,0.000,...,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0
4739,53997000,9.910,218545.0,9.910,218545.0,0.000,171555.0,0.000,0.0,0.000,...,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0,0.000,0.0


Use **TickData.next_quote(t)** to get the **one** quote immediately after _*t*_. Its usage is similar to that of **TickData.pre_quote(t)**.

In [None]:
td.next_quote(td.quote_timeseries[0])
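
The previous/next lookup can be pictured as a binary search over the sorted timestamp list. The sketch below only illustrates that idea. It is not how **TickData** implements **pre_quote** or **next_quote**, and `previous_timestamp` and `next_timestamp` are hypothetical helpers that work on timestamps alone.

```python
from bisect import bisect_left, bisect_right


def previous_timestamp(timeseries, t):
    """Return the last timestamp strictly before t, or None if there is none."""
    i = bisect_left(timeseries, t)      # index of the first element >= t
    return timeseries[i - 1] if i > 0 else None


def next_timestamp(timeseries, t):
    """Return the first timestamp strictly after t, or None if there is none."""
    i = bisect_right(timeseries, t)     # index of the first element > t
    return timeseries[i] if i < len(timeseries) else None


# First five quote timestamps shown earlier; the list must be sorted.
quote_times = [34200000, 34202000, 34204000, 34209000, 34212000]

print(previous_timestamp(quote_times, 34204000))  # 34202000
print(previous_timestamp(quote_times, 34200000))  # None (nothing earlier)
print(next_timestamp(quote_times, 34200000))      # 34202000
```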

Use **TickData.get_trade_between(pre_quote, post_quote)** to get a **pandas.DataFrame** of the trades between _*pre_quote*_ and _*post_quote*_, each of which can be a timestamp _*str*_ or a quote _*pandas.DataFrame*_. If _*post_quote*_ is not specified, the next quote after _*pre_quote*_ is selected automatically.

In [5]:
td.pre_quote(td.quote_timeseries[0])

Use **TickData.trade_sum(trade)** to group trade data by price, _*i.e.,*_ to sum the sizes of all trades at the same price.

In [7]:
td.pre_quote(td.quote_timeseries[2])

Unnamed: 0,time,bid1,bsize1,ask1,asize1,bid2,bsize2,ask2,asize2,bid3,...,ask8,asize8,bid9,bsize9,ask9,asize9,bid10,bsize10,ask10,asize10
1,34202000,9.95,110800.0,9.97,44209.0,9.94,7000.0,9.98,96420.0,9.93,...,10.04,93460.0,9.87,49600.0,10.05,347890.0,9.86,38200.0,10.06,54256.0


In [10]:
trade = td.get_trade_between(td.quote_timeseries[0])
trade

Unnamed: 0,time,price,size
0,34200150,9.960,200.0
1,34200150,9.960,1000.0
2,34200150,9.960,1100.0
3,34200160,9.960,5000.0
4,34200180,9.970,100.0
...,...,...,...
56,34201640,9.970,15991.0
57,34201640,9.970,23099.0
58,34201840,9.950,400.0
59,34201840,9.950,5900.0
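
Selecting the trades that fall between two quotes amounts to a filter on the trade timestamps. The sketch below is a plain-Python illustration under one stated assumption: the window includes the start time and excludes the end time. The boundary behavior of **get_trade_between** itself is not documented here. `trades_between` is a hypothetical helper, and the last sample row is made up to show a trade outside the window.

```python
def trades_between(trades, start, end):
    """Keep (time, price, size) rows with start <= time < end (assumed bounds)."""
    return [row for row in trades if start <= row[0] < end]


# Visible rows of the trade output above, as (time, price, size).
trades = [
    (34200150, 9.96, 200.0),
    (34200150, 9.96, 1000.0),
    (34200150, 9.96, 1100.0),
    (34200160, 9.96, 5000.0),
    (34200180, 9.97, 100.0),
    (34201640, 9.97, 15991.0),
    (34201640, 9.97, 23099.0),
    (34201840, 9.95, 400.0),
    (34201840, 9.95, 5900.0),
    (34202500, 9.95, 300.0),   # hypothetical trade after the next quote
]

# Window between the first quote (34200000) and the next one (34202000).
selected = trades_between(trades, 34200000, 34202000)
print(len(selected))  # 9
```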


In [11]:
td.trade_sum(trade)

Unnamed: 0,price,size
0,9.95,11400.0
1,9.96,133010.0
2,9.97,83090.0
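
The same kind of per-price aggregation can be written directly in pandas. This is a sketch of the grouping idea, not the implementation of **trade_sum**. It uses only the trade rows visible in the truncated output above, so its totals are smaller than the full result of In [11].

```python
import pandas as pd

# Visible rows of the trade DataFrame shown earlier (the full frame has 60 rows).
trade_sample = pd.DataFrame(
    {
        "time": [34200150, 34200150, 34200150, 34200160, 34200180,
                 34201640, 34201640, 34201840, 34201840],
        "price": [9.96, 9.96, 9.96, 9.96, 9.97, 9.97, 9.97, 9.95, 9.95],
        "size": [200.0, 1000.0, 1100.0, 5000.0, 100.0,
                 15991.0, 23099.0, 400.0, 5900.0],
    }
)

# Group by price and add up the sizes; groupby sorts the prices ascending.
# Use ["size"] rather than .size, which is a DataFrame attribute.
summed = trade_sample.groupby("price", as_index=False)["size"].sum()
print(summed)
# price 9.95 -> 6300.0, 9.96 -> 7300.0, 9.97 -> 39190.0
```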
