The sole purpose of this script is to convert the date strings in the original data file into integer date numbers, so that the resulting file loads faster.

In [1]:
using Dates
using DataFrames

Load the tab-separated data from disk, reading the string `NaN` as a missing value:

In [2]:
@time dat = readtable("dat.txt", separator = '\t', nastrings = ["NaN"])

elapsed time: 18.136022591 seconds (3961930376 bytes allocated, 31.29% gc time)


| | Date | Option_Price | Bid | Ask | Volume | Open_Interest | Strike | Expiry | DAX | EONIA_matched | Time_to_Maturity | IsCall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2006-07-03 | 3931.1 | | | 1 | 104 | 1800 | 2006-12-15 | 5712.69 | 0.031667592146348 | 0.466666666666667 | 1 |
| 2 | 2006-07-03 | 0.1 | | | 0 | 5515 | 1800 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | 0 |
| 3 | 2006-07-03 | 3734.0 | | | 0 | 2152 | 2000 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | 1 |
| 4 | 2006-07-03 | 0.1 | | | 0 | 20941 | 2000 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | 0 |
| 5 | 2006-07-03 | 3536.9 | | | 0 | 2 | 2200 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | 1 |
| 6 | 2006-07-03 | 0.1 | | | 0 | 4626 | 2200 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | 0 |
| 7 | 2006-07-03 | 3339.8 | | | 0 | 2009 | 2400 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | 1 |
| 8 | 2006-07-03 | 0.1 | | | 0 | 13367 | 2400 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | 0 |
| 9 | 2006-07-03 | 0.2 | | | 0 | 2297 | 2600 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | 0 |
| 10 | 2006-07-03 | 2945.9 | | | 0 | 624 | 2800 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | 1 |
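The `nastrings = ["NaN"]` argument tells `readtable` to read the literal string `NaN` as a missing value. As a rough, hypothetical sketch of that rule only (assuming a current Julia 1.x, where `missing` plays the role of the old `NA`, and a file without quoted fields), one tab-separated line can be split by hand. `parse_field` and `parse_line` are illustrative helpers, not part of DataFrames, and this is not how `readtable` is implemented:

```julia
# Hypothetical helper: map the literal "NaN" to `missing`, keep anything else as text
parse_field(s::AbstractString) = s == "NaN" ? missing : s

# Hypothetical helper: split one tab-separated line (no quoting or escaping handled)
parse_line(line::AbstractString) = [parse_field(f) for f in split(line, '\t')]

# First five fields of a row shaped like the data above, with Bid and Ask written as NaN
fields = parse_line("2006-07-03\t3931.1\tNaN\tNaN\t1")

# Numeric fields still have to be parsed explicitly
price = parse(Float64, fields[2])
```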


In [3]:
eltypes(dat)

12-element Array{Type{T<:Top},1}:
 UTF8String
 Float64   
 Float64   
 Float64   
 Int64     
 Int64     
 Int64     
 UTF8String
 Float64   
 Float64   
 Float64   
 Int64     

First, the `UTF8String` columns `Date` and `Expiry` are converted to `Date` values, and the integer column `IsCall` is converted to `Bool`.

In [4]:
@time begin
    # parse the yyyy-mm-dd strings into Date values
    dat[:Date] = Date(array(dat[:, :Date]));
    dat[:Expiry] = Date(array(dat[:, :Expiry]));
    # store the 0/1 call indicator as Bool
    dat[:IsCall] = bool(array(dat[:, :IsCall]));
end
dat

elapsed time: 105.086115973 seconds (10924043520 bytes allocated, 62.57% gc time)


| | Date | Option_Price | Bid | Ask | Volume | Open_Interest | Strike | Expiry | DAX | EONIA_matched | Time_to_Maturity | IsCall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2006-07-03 | 3931.1 | | | 1 | 104 | 1800 | 2006-12-15 | 5712.69 | 0.031667592146348 | 0.466666666666667 | true |
| 2 | 2006-07-03 | 0.1 | | | 0 | 5515 | 1800 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | false |
| 3 | 2006-07-03 | 3734.0 | | | 0 | 2152 | 2000 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | true |
| 4 | 2006-07-03 | 0.1 | | | 0 | 20941 | 2000 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | false |
| 5 | 2006-07-03 | 3536.9 | | | 0 | 2 | 2200 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | true |
| 6 | 2006-07-03 | 0.1 | | | 0 | 4626 | 2200 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | false |
| 7 | 2006-07-03 | 3339.8 | | | 0 | 2009 | 2400 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | true |
| 8 | 2006-07-03 | 0.1 | | | 0 | 13367 | 2400 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | false |
| 9 | 2006-07-03 | 0.2 | | | 0 | 2297 | 2600 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | false |
| 10 | 2006-07-03 | 2945.9 | | | 0 | 624 | 2800 | 2006-12-15 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | true |
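The cell above relies on Julia 0.3 vectorized calls (`array`, `bool`, and `Date` applied to a whole array), which later Julia versions deprecated in favor of dot broadcasting. Assuming a current Julia 1.x and plain vectors instead of DataFrame columns, the same two conversions can be sketched as:

```julia
using Dates

# Raw column values as read from the text file (values taken from the rows above)
dateStrs = ["2006-07-03", "2006-07-03", "2006-07-03"]
isCallInts = [1, 0, 1]

# Date(::AbstractString) parses the default ISO format yyyy-mm-dd;
# the dot broadcasts the constructor over the whole vector
dates = Date.(dateStrs)

# Bool(1) == true and Bool(0) == false; any other integer raises an InexactError
isCall = Bool.(isCallInts)
```

With a recent DataFrames.jl, the column-level equivalent should be `dat.Date = Date.(dat.Date)`.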


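Next, the `Date` columns are transformed into date numbers. `Dates.value` returns the `Int64` day count that a `Date` stores internally, with 0001-01-01 counted as day 1: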

In [5]:
typeof(Dates.value(dat[:Date][1]))

Int64

In [6]:
nObs = size(dat, 1)

2025129

In [7]:
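# Dates.value returns the underlying Int64 day count of each Date
dateNumbs = Int64[Dates.value(dat[ii, :Date]) for ii=1:nObs]
expiryNumbs = Int64[Dates.value(dat[ii, :Expiry]) for ii=1:nObs]

# replace the Date columns by their day counts
dat[:Date] = dateNumbs
dat[:Expiry] = expiryNumbs

dat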

| | Date | Option_Price | Bid | Ask | Volume | Open_Interest | Strike | Expiry | DAX | EONIA_matched | Time_to_Maturity | IsCall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 732495 | 3931.1 | | | 1 | 104 | 1800 | 732660 | 5712.69 | 0.031667592146348 | 0.466666666666667 | true |
| 2 | 732495 | 0.1 | | | 0 | 5515 | 1800 | 732660 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | false |
| 3 | 732495 | 3734.0 | | | 0 | 2152 | 2000 | 732660 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | true |
| 4 | 732495 | 0.1 | | | 0 | 20941 | 2000 | 732660 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | false |
| 5 | 732495 | 3536.9 | | | 0 | 2 | 2200 | 732660 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | true |
| 6 | 732495 | 0.1 | | | 0 | 4626 | 2200 | 732660 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | false |
| 7 | 732495 | 3339.8 | | | 0 | 2009 | 2400 | 732660 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | true |
| 8 | 732495 | 0.1 | | | 0 | 13367 | 2400 | 732660 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | false |
| 9 | 732495 | 0.2 | | | 0 | 2297 | 2600 | 732660 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | false |
| 10 | 732495 | 2945.9 | | | 0 | 624 | 2800 | 732660 | 5712.69 | 0.0316675921463482 | 0.466666666666667 | true |
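Because these day numbers count days with 0001-01-01 as day 1, differences between them are plain calendar days. The sketch below (assuming Julia 1.x; `fromDayNumber` is a hypothetical helper, not part of `Dates`) checks the values in the table and shows one way to map a stored number back to a `Date`:

```julia
using Dates

# Day numbers as stored in the converted columns
dateNum = Dates.value(Date(2006, 7, 3))      # 732495
expiryNum = Dates.value(Date(2006, 12, 15))  # 732660

# Differences of day numbers are calendar days
daysToExpiry = expiryNum - dateNum           # 165

# Hypothetical inverse: 0001-01-01 is day 1, so shift it by n - 1 days
fromDayNumber(n::Integer) = Date(1, 1, 1) + Day(n - 1)

# Broadcasting converts a whole vector without an explicit comprehension
dayNums = Dates.value.([Date(2006, 7, 3), Date(2006, 12, 15)])
```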


Now that the data is in the desired format, it can be saved to disk:

In [8]:
writetable("optData.csv", dat, separator = ',')
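Since the point of the conversion is faster loading, it may help to see what reading the numeric columns back involves. The sketch below assumes Julia 1.x and uses a simplified, hypothetical three-column excerpt of `optData.csv` held in a string (the real file has all 12 columns, and the exact quoting produced by `writetable` may differ). Loading then needs only integer parsing, and dates are rebuilt only where calendar arithmetic is actually needed; `toDate` is a hypothetical helper:

```julia
using Dates

# Hypothetical, simplified excerpt of optData.csv (dates stored as day numbers)
csv = """
Date,Expiry,IsCall
732495,732660,true
732495,732660,false
"""

lines = split(strip(csv), '\n')
header = split(lines[1], ',')
rows = [split(l, ',') for l in lines[2:end]]

# Only integer and Bool parsing is needed on load; no date strings are parsed
dateNums = [parse(Int, r[1]) for r in rows]
expiryNums = [parse(Int, r[2]) for r in rows]
isCall = [parse(Bool, r[3]) for r in rows]

# Hypothetical helper: rebuild Date values on demand (0001-01-01 is day 1)
toDate(n::Integer) = Date(1, 1, 1) + Day(n - 1)
dates = toDate.(dateNums)
```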

## Session info

In [9]:
versioninfo()

Julia Version 0.3.5
Commit a05f87b* (2015-01-08 22:33 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libblas.so.3
  LAPACK: liblapack.so.3
  LIBM: libopenlibm
  LLVM: libLLVM-3.3


In [10]:
Pkg.status()

18 required packages:
 - DataArrays                    0.2.9
 - DataFrames                    0.6.0
 - Dates                         0.3.2
 - Debug                         0.0.4
 - Distributions                 0.6.3
 - EconDatasets                  0.0.2
 - GLM                           0.4.2
 - Gadfly                        0.3.10
 - IJulia                        0.1.16
 - JuMP                          0.7.3
 - MAT                           0.2.9
 - NLopt                         0.2.0
 - Quandl                        0.4.0
 - RDatasets                     0.1.1
 - Taro                          0.1.2
 - TimeData                      0.5.1
 - TimeSeries                    0.4.6
 - Winston                       0.11.7
56 additional packages:
 - ArrayViews                    0.4.8
 - BinDeps                       0.3.7
 - Blosc                         0.1.1
 - Cairo                         0.2.22
 - Calculus                      0.1.5
 - Codecs                        0.1.3
 - Color      