A Julia equivalent to Pandas date_range function #264

Closed
femtotrader opened this issue Jul 19, 2016 · 28 comments

@femtotrader
Contributor

femtotrader commented Jul 19, 2016

Hello,

I wonder whether a Julia equivalent of the Pandas date_range function exists.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.date_range.html

If not, adding such a function would be a nice feature.

With such a function you can, for example, get a date range given a start date, a number of periods and a frequency.

start = Date(2000,1,30)
periods = 9
freq = Dates.Day(1)

function date_range(start; periods=9, freq=Dates.Day(1))
    start:freq:start + (periods - 1) * freq
end

dates  = date_range(start, periods=periods, freq=freq)

But Pandas' date_range has many other features.

Kind regards
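
For readers on Julia 0.7 or later, where Dates is a standard library loaded with `using Dates`, the helper above can be sketched as follows. The keyword defaults are carried over from the snippet; the checks on the result are illustrative only.

```julia
using Dates

# Same helper as in the snippet above: `periods` dates spaced `freq` apart.
function date_range(start; periods=9, freq=Day(1))
    start:freq:(start + (periods - 1) * freq)
end

dates = date_range(Date(2000, 1, 30), periods=9, freq=Day(1))

# The result is a lazy StepRange, not an Array; no dates are stored.
length(dates)   # 9
first(dates)    # 2000-01-30
last(dates)     # 2000-02-07
```

Keeping the result as a range means it can still be passed to `collect` when an actual vector is needed.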

@milktrader
Contributor

This is how bracketing currently works

julia> using Base.Dates, TimeSeries, MarketData

julia> start = Date(2000,1,30)
2000-01-30

julia> finish = start + Day(9)
2000-02-08

julia> from(cl, start) |> x -> to(x, finish)
7x1 TimeSeries.TimeArray{Float64,1,Date,Array{Float64,1}} 2000-01-31 to 2000-02-08

             Close     
2000-01-31 | 103.75    
2000-02-01 | 100.25    
2000-02-02 | 98.81     
2000-02-03 | 103.31    
2000-02-04 | 108.0     
2000-02-07 | 114.06    
2000-02-08 | 114.88    

What do you think?

There are other ways to chain the methods too.

julia> to(from(cl, start),finish)
7x1 TimeSeries.TimeArray{Float64,1,Date,Array{Float64,1}} 2000-01-31 to 2000-02-08

             Close     
2000-01-31 | 103.75    
2000-02-01 | 100.25    
2000-02-02 | 98.81     
2000-02-03 | 103.31    
2000-02-04 | 108.0     
2000-02-07 | 114.06    
2000-02-08 | 114.88    

@femtotrader
Contributor Author

Thanks @milktrader, I had never heard of bracketing methods before.
Do you know of any online documentation about them?
It looks like Pandas pipe or the %>% operator used in the dplyr R package.
I think this kind of method is useful for filtering data, not for creating it.

@femtotrader
Contributor Author

In fact, I'd like to create this time series in Julia:

In [15]: pd.Series(
   ....:         np.array([1., 1., 1., 1., 1., 1., 1., 1., 1.]) / 100,
   ....:         index=pd.date_range('2000-1-30', periods=9, freq='D'))
Out[15]:
2000-01-30    0.01
2000-01-31    0.01
2000-02-01    0.01
2000-02-02    0.01
2000-02-03    0.01
2000-02-04    0.01
2000-02-05    0.01
2000-02-06    0.01
2000-02-07    0.01
Freq: D, dtype: float64

@milktrader
Contributor

By bracketing I meant the to and from methods.

documentation

@femtotrader
Contributor Author

I didn't know the |> operator.
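
For reference, `|>` is Julia's function application (pipe) operator: `x |> f` is the same as `f(x)`, so a chain reads left to right. A small sketch using only Base functions, with illustrative values:

```julia
# x |> f is equivalent to f(x)
total = [1, 2, 3] |> sum                        # same as sum([1, 2, 3])

# Chains apply left to right; an anonymous function supplies extra arguments
result = [3, 1, 2] |> sort |> (v -> v[1:2])     # same as sort([3, 1, 2])[1:2]
```

This is the same pattern as `from(cl, start) |> x -> to(x, finish)` earlier in the thread, where the anonymous function passes `finish` as the second argument.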

@milktrader
Contributor

milktrader commented Jul 19, 2016

Something like this?

You could probably do better on constructing the dates variable though.

julia> dates  = [Date(1999,1,1):Date(1999,1,1)+(Day(8))]
9-element Array{Date,1}:
 1999-01-01
 1999-01-02
 1999-01-03
 1999-01-04
 1999-01-05
 1999-01-06
 1999-01-07
 1999-01-08
 1999-01-09

julia> TimeArray(dates, ones(9)/100, [""])
9x1 TimeSeries.TimeArray{Float64,1,Date,Array{Float64,1}} 1999-01-01 to 1999-01-09


1999-01-01 | 0.01    
1999-01-02 | 0.01    
1999-01-03 | 0.01    
1999-01-04 | 0.01    
1999-01-05 | 0.01    
1999-01-06 | 0.01    
1999-01-07 | 0.01    
1999-01-08 | 0.01    
1999-01-09 | 0.01    

@milktrader
Contributor

Yeah, |> used to be simply |, but I think that was colliding with or; I can't remember offhand.

@milktrader
Contributor

I think though it would be nice to still construct a TimeArray without providing a column names vector. Sounds like an issue to me.

@femtotrader
Contributor Author

On my side I get a warning when creating dates

julia> dates  = [Date(1999,1,1):Date(1999,1,1)+(Dates.Day(8))]
WARNING: [a] concatenation is deprecated; use collect(a) instead
 in depwarn at deprecated.jl:73
 in oldstyle_vcat_warning at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in vect at abstractarray.jl:32
while loading no file, in expression starting on line 0
9-element Array{Date,1}:
 1999-01-01
 1999-01-02
 1999-01-03
 1999-01-04
 1999-01-05
 1999-01-06
 1999-01-07
 1999-01-08
 1999-01-09

Is it really necessary to urge users to use collect?

@milktrader
Contributor

Yeah, that was old boilerplate code so it would be best to comply with that deprecation warning.

@milktrader
Contributor

julia> collect(Date(2000,1,1):Date(2000,1,1)+Day(8))
9-element Array{Date,1}:
 2000-01-01
 2000-01-02
 2000-01-03
 2000-01-04
 2000-01-05
 2000-01-06
 2000-01-07
 2000-01-08
 2000-01-09
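
The distinction matters more on later Julia versions: from Julia 0.5 on, `[a:b]` no longer concatenates and instead builds a one-element array holding the range itself. A short sketch, assuming Julia 1.x with the Dates standard library (the variable names are illustrative):

```julia
using Dates

r = Date(2000, 1, 1):Day(1):Date(2000, 1, 9)

wrapped = [r]          # one-element Vector whose only entry is the range
dates   = collect(r)   # nine-element Vector{Date}

length(wrapped)   # 1
length(dates)     # 9
```

So on current Julia, `collect` is not only the recommended spelling but the one that produces the vector of dates.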

@milktrader
Contributor

Just changed the documentation to reflect this. Thanks @femtotrader

@milktrader
Contributor

Closing this thinking there is nothing left to be done. Ping me if there is something left to do.

@femtotrader
Contributor Author

femtotrader commented Aug 11, 2016

Here is Pandas date_range docstring

In [4]: ?pd.date_range
Signature: pd.date_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, closed=None, **kwargs)
Docstring:
Return a fixed frequency datetime index, with day (calendar) as the default
frequency

Parameters
----------
start : string or datetime-like, default None
    Left bound for generating dates
end : string or datetime-like, default None
    Right bound for generating dates
periods : integer or None, default None
    If None, must specify start and end
freq : string or DateOffset, default 'D' (calendar daily)
    Frequency strings can have multiples, e.g. '5H'
tz : string or None
    Time zone name for returning localized DatetimeIndex, for example
    Asia/Hong_Kong
normalize : bool, default False
    Normalize start/end dates to midnight before generating date range
name : str, default None
    Name of the resulting index
closed : string or None, default None
    Make the interval closed with respect to the given frequency to
    the 'left', 'right', or both sides (None)

Maybe a convenience function like this would help many users who want to deal with time series.

milktrader reopened this Aug 11, 2016
@milktrader
Contributor

@femtotrader do you want to code this?

@femtotrader
Contributor Author

femtotrader commented Aug 12, 2016

I think a first step is to see how Pandas does this.

Two of start, end, or periods are required, but all three can be given.

This is quite a "complex" function
see https://github.com/pydata/pandas/blob/master/pandas/tseries/index.py#L390
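
The "two of three" rule can be sketched in Julia with keyword arguments. This is a minimal illustration, not the Pandas algorithm and not part of TimeSeries: the name `date_range`, its keywords, and the choice to require exactly two of `start`, `stop`, and `periods` (rather than also accepting all three) are assumptions of this example. It assumes Julia 1.x with the Dates standard library.

```julia
using Dates

# Hypothetical helper: build a lazy date range from any two of
# start, stop, and periods. `freq` defaults to one calendar day.
function date_range(; start=nothing, stop=nothing, periods=nothing,
                    freq::Period=Day(1))
    if start !== nothing && stop !== nothing && periods === nothing
        return start:freq:stop
    elseif start !== nothing && periods !== nothing && stop === nothing
        return start:freq:(start + (periods - 1) * freq)
    elseif stop !== nothing && periods !== nothing && start === nothing
        # Count backwards from the right bound.
        return (stop - (periods - 1) * freq):freq:stop
    else
        throw(ArgumentError("specify exactly two of start, stop, and periods"))
    end
end

forward  = date_range(start=Date(2000, 1, 30), periods=9)
backward = date_range(stop=Date(2000, 2, 7), periods=9)
bounded  = date_range(start=Date(2000, 1, 30), stop=Date(2000, 2, 7))
```

With the default daily frequency all three calls describe the same nine dates. For calendar periods such as `Month`, counting backwards from `stop` may not land exactly on `stop` because of end-of-month clamping, so that branch would need more care in a real implementation.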

@milktrader
Contributor

milktrader commented Aug 12, 2016

Not really, just a bunch of control flow statements. And this isn't pandas anyway. I don't mind taking some good ideas but there is no obligation to mimic pandas

@milktrader
Contributor

milktrader commented Aug 14, 2016

Possible names ...

range
segment
slice
timestamp

@milktrader
Contributor

Another thing to do is to create a new version of the timestamp method so you could duplicate your original example

@milktrader
Contributor

Without opening up Python and pandas, it looks like pandas.date_range is just a constructor for a timestamp, no?

I'm not sure something like that belongs in TimeSeries. Maybe base would consider it?

It might be useful in another package too. Maybe MarketData.jl would support it since it's part of generating a TimeArray for testing/research?

@milktrader
Contributor

milktrader commented Aug 14, 2016

Here you go

julia> using Base.Dates, TimeSeries

julia> function timestamps(start::TimeType, freq::DataType, obs::Int, step::Int=1)
         steps   = freq(step)
         periods = obs * step
         finish  = start + freq(periods - 1)
         collect(start:steps:finish)
       end
timestamps (generic function with 2 methods)

julia> TimeArray(timestamps(Date(2000,1,30), Day, 9), ones(9) * .01)
9x1 TimeSeries.TimeArray{Float64,1,Date,Array{Float64,1}} 2000-01-30 to 2000-02-07


2000-01-30 | 0.01    
2000-01-31 | 0.01    
2000-02-01 | 0.01    
2000-02-02 | 0.01    
2000-02-03 | 0.01    
2000-02-04 | 0.01    
2000-02-05 | 0.01    
2000-02-06 | 0.01    
2000-02-07 | 0.01    
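
The optional `step` argument of `timestamps` spaces the observations further apart while `obs` stays the number of returned dates. A sketch on Julia 1.x, where the `freq` argument is narrowed to `Type{<:Period}` (an assumption of this example; the thread uses `DataType`):

```julia
using Dates

function timestamps(start::TimeType, freq::Type{<:Period}, obs::Int, step::Int=1)
    steps   = freq(step)                    # distance between observations
    periods = obs * step                    # total span covered, in units of freq
    finish  = start + freq(periods - 1)     # upper bound for the range
    collect(start:steps:finish)
end

# Four observations, two days apart:
# 2000-01-30, 2000-02-01, 2000-02-03, 2000-02-05
every_other_day = timestamps(Date(2000, 1, 30), Day, 4, 2)
```

Restricting `freq` to period types lets dispatch reject calls such as `timestamps(d, Int, 9)` instead of failing inside the function body.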

@femtotrader
Contributor Author

femtotrader commented Aug 14, 2016

Why are you using collect?

I think it's more efficient to keep the range...

TimeArray should accept a range and call collect internally.

See JuliaLang/julia#18024

@milktrader
Contributor

milktrader commented Aug 14, 2016

Why are you using collect ?

I'm confused, do you mean why use collect in the timestamps function above?

@femtotrader
Contributor Author

femtotrader commented Aug 14, 2016

Yes, why use collect in the timestamps function?

Why not do:

julia> function timestamps(start::TimeType, freq::DataType, obs::Int, step::Int=1)
         steps   = freq(step)
         periods = obs * step
         finish  = start + freq(periods - 1)
         start:steps:finish
       end

The TimeArray constructor might accept a StepRange as its first parameter and call collect itself.
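
One way to picture this suggestion without touching TimeSeries itself is a small normalising helper that a constructor could call on its first argument. `as_timestamps` is a hypothetical name used only for this sketch, which assumes Julia 1.x:

```julia
using Dates

# Hypothetical helper: materialise ranges, pass vectors through unchanged.
as_timestamps(r::AbstractRange{<:TimeType}) = collect(r)
as_timestamps(v::Vector{<:TimeType}) = v

lazy  = Date(2000, 1, 30):Day(1):Date(2000, 2, 7)
dense = as_timestamps(lazy)     # Vector{Date} with 9 elements
same  = as_timestamps(dense)    # the same vector, returned without copying
```

With this split, callers can keep working with the lazy range and the dates are only allocated at the point where an array is required.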

@milktrader
Contributor

milktrader commented Aug 14, 2016

Ah, ok I see. Yep that should work

You need to collect.

milktrader reopened this Aug 14, 2016
@milktrader
Contributor

milktrader commented Aug 14, 2016

Would a simple TimeArray constructor taking the arguments of the example timestamps function accomplish what you're looking for, @femtotrader?

@milktrader
Contributor

julia> function TimeArray(start::TimeType, freq::DataType, obs::Int, step::Int=1)
         steps   = freq(step)
         periods = obs * step
         finish  = start + freq(periods - 1)
         TimeArray(collect(start:steps:finish), zeros(obs))
       end
TimeSeries.TimeArray{T,N,D<:Base.Dates.TimeType,A<:AbstractArray{T,N}}

julia> TimeArray(today(), Day, 6)
6x1 TimeSeries.TimeArray{Float64,1,Date,Array{Float64,1}} 2016-08-14 to 2016-08-19


2016-08-14 | 0       
2016-08-15 | 0       
2016-08-16 | 0       
2016-08-17 | 0       
2016-08-18 | 0       
2016-08-19 | 0       

@milktrader
Contributor

Please submit a PR if you'd like to see this in TimeSeries and we could discuss it there.
