scale_x_bd seems to break geom_rect #10

Open
the-tourist- opened this issue Apr 19, 2016 · 9 comments

Comments

@the-tourist-

the-tourist- commented Apr 19, 2016

I am considering writing a package of geoms and stats that would reproduce the quantmod charts in ggplot. One issue is that ggplot treats all dates as equal, whereas price charts normally ignore weekends and holidays. bdscale offers an elegant solution to this, but it seems to break geom_rect, which I need in order to create a candlestick geom. I'm not sure why this happens; maybe because geom_rect uses xmin and xmax instead of x? In the following example, p1 prints as expected, but with weekends shown as gaps. p2, which uses bdscale, doesn't print and gives the warning "Removed 73 rows containing missing values (geom_rect)".

library(quantmod)
library(ggplot2)
library(bdscale)

getSymbols("AMZN")
AMZN <- adjustOHLC(AMZN)
colnames(AMZN) <- c("Open", "High", "Low", "Close", "Volume", "Adjusted")

Data <- AMZN["2016/"]

DateRange <- index(Data)
BDDates <- bd2t(index(Data), DateRange)

p1 <- 
  ggplot(Data) + 
  geom_rect(aes(xmin = DateRange - 0.5, xmax = DateRange + 0.5, ymin = pmin(Open, Close), ymax = pmax(Open, Close), fill = as.factor(sign(Close - Open)))) +
  scale_fill_manual(values = c("DarkRed", "DarkGreen", "DarkGreen"))
print(p1)

p2 <- 
  ggplot(Data) + 
  geom_rect(aes(x = BDDates, y = Close, xmin = BDDates, xmax = BDDates + 1, ymin = pmin(Open, Close), ymax = pmax(Open, Close), fill = as.factor(sign(Close - Open)))) +
  scale_fill_manual(values = c("DarkRed", "DarkGreen", "DarkGreen")) +
  scale_x_bd(business.dates = index(Data))
print(p2)
@dvmlls
Owner

dvmlls commented Apr 19, 2016

Hey I can look a bit more when I'm at a computer later, but have you tried
this?

http://stackoverflow.com/a/32964192/908042

The code for bdscale doesn't handle fractional days, which makes it tricky
to do things like OHLC elegantly. Boxplot works fine, but segment and
apparently rect do not. The last time I looked into this, I tried to reverse
engineer boxplot but didn't get anywhere in the hour or two I was willing
to sink into it.

Anyway, thanks for using the package!
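
To illustrate the fractional-day limitation mentioned above, here is a minimal sketch in base R. It assumes, purely for illustration, a transform that maps each date to its position in the vector of business dates by exact lookup; `to_business_index` is a hypothetical helper and is not taken from bdscale's source. Under that assumption, an offset such as `date - 0.5` no longer matches any business date and comes back as NA, which would be consistent with rows being dropped as missing values.

```r
# Hypothetical helper, NOT bdscale's code: map each date to its zero-based
# position in the vector of business dates by exact numeric lookup.
to_business_index <- function(dates, business_dates) {
  match(as.numeric(dates), as.numeric(business_dates)) - 1
}

# Thursday, Friday, then Monday and Tuesday (the weekend is skipped).
business_dates <- as.Date(c("2016-04-14", "2016-04-15", "2016-04-18", "2016-04-19"))

# Whole business days resolve to consecutive integers.
whole_days <- to_business_index(business_dates, business_dates)
print(whole_days)

# Half-day offsets, as in xmin = date - 0.5, match nothing and become NA.
half_days <- to_business_index(business_dates - 0.5, business_dates)
print(half_days)
```

If a transform works this way, rectangle edges would need to land on whole business days (or be offset after the transform rather than before it) to survive the lookup.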

@the-tourist-
Author

Hi,

Thanks for the quick reply.

I'll have a look at boxplot later; your example seemed to work quite well. I hadn't actually thought of using it. My first instinct was to use the rect and linerange geoms to create the candlesticks, which worked well, but not with bdscale.

I'd say not being able to work with bdscale would pretty much make any quant geom fairly useless, or at least unused. So I really appreciated your quick response.

As an aside, can I ask you a lazy question? It's one of those questions that I should probably fact-check before sounding stupid, but I'm far too lazy to do that. One minor thing I noticed in bdscale is that, although it intelligently chose the major date breaks, it didn't format them as ggplot does by default. But when I looked at the code on GitHub, the date formatting was coded. Did this not get implemented, or am I asking a stupid lazy question?

Cheers,

Graeme

@dvmlls
Owner

dvmlls commented Apr 19, 2016

Not quite sure what you mean - are you talking about the axis label formatting?

If I do this (no bdscale):

library(dplyr)
library(bdscale)
library(ggplot2)
library(quantmod)
library(magrittr)
library(scales)

getSymbols("SPY", from = Sys.Date() - 1460, to = Sys.Date(), adjust = TRUE, auto.assign = TRUE)

input <- data.frame(SPY["2015/"]) %>% 
  set_names(c("open", "high", "low", "close", "volume", "adjusted")) %>%
  mutate(date=as.Date(rownames(.)))

input %>% ggplot(aes(x=date, ymin=low, ymax=high, lower=pmin(open,close), upper=pmax(open,close), 
                     fill=open<close, group=date, middle=pmin(open,close))) + 
  geom_boxplot(stat='identity') +
  ggtitle("SPY: 2015") +
  xlab('') + ylab('') + theme(legend.position='none')

It displays this:
image

If I then add the bdscale:

# all that previous stuff +
  scale_x_bd(business.dates=input$date, max.major.breaks=10)

Then it gives me this:
image

You can format the labels by passing a labels= parameter, but I agree it should be intelligently tied to the breaks you choose (see #2):

# all that previous stuff +
  scale_x_bd(business.dates=input$date, max.major.breaks=10, labels=date_format("%b %Y"))

image

Not sure why it still has the axis label date there - maybe something changed in ggplot2 2.0.

@the-tourist-
Author

I'm sitting outside enjoying the sun, nowhere near a computer, so I can't actually confirm this. But when I looked at the code of scale_x_bd last night, I think it set a label format variable that doesn't get used later in the code. I could be wrong on this.

@the-tourist-
Author

month_format <- function(date) format(date, "%b '%y")
quarter <- function(date) ceiling(as.integer(format(date, '%m')) / 3)
quarter_format <- function(date) sprintf("Q%s '%s", quarter(date), format(date, '%y'))
year_format <- function(date) format(date, '%Y')

@dvmlls
Owner

dvmlls commented Apr 19, 2016

I don't use those functions to format the axis labels; I use them to figure out where to put the breaks:

  • format all the dates using the given format, e.g.
    • Jan 1 2012 --> Q1 2012
    • Jan 2 2012 --> Q1 2012
    • ...
  • group by the formatted value and find the first item, e.g.
    • Q1 2012 --> Jan 1 2012
  • use that as the date for the break
bd_breaks <- function(business.dates, n.max=5) {

  breaks.weeks    <- firstInGroup(business.dates, last_monday)
  breaks.months   <- firstInGroup(business.dates, month_format)
  breaks.quarters <- firstInGroup(business.dates, quarter_format)
  breaks.years    <- firstInGroup(business.dates, year_format)
  breaks.years.5  <- firstInGroup(business.dates, function(ds) floor(as.integer(format(ds, '%Y'))/5))
  breaks.decades  <- firstInGroup(business.dates, function(ds) floor(as.integer(format(ds, '%Y'))/10))
...
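
The grouping steps above can be sketched in base R as follows. The real `firstInGroup` is not shown in this thread, so `first_in_group` and `quarter_key` below are hypothetical stand-ins, and the sketch assumes the business dates are sorted in ascending order.

```r
# Hypothetical stand-in for firstInGroup (the real helper is not shown here).
# Assumes `dates` is sorted ascending, so the first date seen for each key
# is the earliest business day in that group.
first_in_group <- function(dates, key) {
  keys <- key(dates)
  dates[!duplicated(keys)]
}

# Hypothetical key function: label each date with its quarter, e.g. "Q1 '12".
quarter_key <- function(d) {
  q <- ceiling(as.integer(format(d, "%m")) / 3)
  sprintf("Q%s '%s", q, format(d, "%y"))
}

# Weekdays of 2012 as a stand-in for a business calendar (holidays ignored).
days <- seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by = "day")
business_dates <- days[as.POSIXlt(days)$wday %in% 1:5]

# One break per quarter: the first business day in each group.
quarter_breaks <- first_in_group(business_dates, quarter_key)
print(quarter_breaks)
```

Swapping `quarter_key` for a month or year key gives the other granularities in the same way.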

@the-tourist-
Author

As a user of your package, the first thing I would do is format the dates. Rather than every user having to format the dates themselves, it would make sense to have them intelligently formatted like ggplot does. But reality suggests you have a real job, unlike myself, and could do it if you had the time, but probably don't have that time.

@dvmlls
Owner

dvmlls commented Apr 19, 2016

It's open source, pull requests welcome!

I think fixing this would mean having a single "object" that controls both
breaks and formatting. When it chooses breaks, store some state that later
gets pulled out to format the labels. Maybe the state is just the format
string itself.

Take a stab at it!
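
The idea of a single object that carries both the breaks and the format that produced them might look like the sketch below. Everything here is hypothetical and is not bdscale's API: `choose_breaks`, `label_breaks`, and the two candidate granularities are made up for illustration, and the sketch assumes the dates are sorted in ascending order.

```r
# Hypothetical sketch: try granularities from fine to coarse and keep the
# first whose break count fits within n_max. The chosen format string is
# returned with the breaks so the labels can reuse it later.
choose_breaks <- function(dates, n_max = 5) {
  candidates <- c("%b '%y", "%Y")  # monthly, then yearly
  for (fmt in candidates) {
    breaks <- dates[!duplicated(format(dates, fmt))]
    if (length(breaks) <= n_max) break
  }
  # If nothing fits, this falls through with the coarsest candidate.
  list(breaks = breaks, format = fmt)
}

# Format the labels with the same format string that selected the breaks.
label_breaks <- function(choice) format(choice$breaks, choice$format)

dates <- seq(as.Date("2014-01-01"), as.Date("2015-12-31"), by = "day")
choice <- choose_breaks(dates, n_max = 5)
print(choice$format)
print(label_breaks(choice))
```

Here the stored state is just the format string, as suggested above; two years of dates give 24 monthly breaks, which exceeds `n_max`, so the yearly format is chosen and reused for the labels.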

@the-tourist-
Author

If I can, I'll try to look at it. I have quite a lot of things I'm working
on simultaneously, but at some point I'll try to understand how it
works, which I admit I didn't after quickly looking through it.
