convert static plotting from base r to ggplot2 #133

Closed
ErinBecker opened this issue Apr 5, 2018 · 53 comments · Fixed by #228
Labels
help wanted Looking for Contributors

Comments

@ErinBecker
Contributor

ErinBecker commented Apr 5, 2018

According to the CAC conversation:

For this we will be reading the data in as an sf object, recasting it to a dataframe, and using the stable ggplot on CRAN.

See the up-to-date list of episodes that still need to be converted at the end of this thread.

@lwasser
Member

lwasser commented Apr 5, 2018

Here is a ggplot lesson, BUT it uses the tidy() function to go from sp -> a df. I'm not actually sure how to cast from sf. @jsta or @chris-prener, do either of you know how far what I wrote is from what we will want to teach (sf -> ggplot-friendly df)?

https://earthdatascience.org/courses/earth-analytics/spatial-data-r/make-maps-with-ggplot-in-R/

@janu123

janu123 commented Apr 5, 2018 via email

@lwasser
Member

lwasser commented Apr 5, 2018

Here is my 2 cents. ggmap is super cool, but sometimes it doesn't load properly on people's machines, and sometimes you need the dev version. SO... I think it's fine to show people ggmap, but I wouldn't count on it working 100% of the time in workshops, so I'd keep it optional / as a demo. I'd love to hear other people's experiences, though. I normally show students ggmap and maps, just in case.

@janu123

janu123 commented Apr 5, 2018 via email

@lwasser
Member

lwasser commented Apr 5, 2018

Yes... it's really nice when it works; there are just install issues on some machines. I suggest making it an add-on at the end of the plotting lesson that the instructor can demo IF they choose to, with the risk that it won't work on some computers. :)

@janu123

janu123 commented Apr 5, 2018 via email

@lwasser
Member

lwasser commented Apr 5, 2018

We will be using leaflet for interactive maps. This particular task is specific to static plots! @tyson-swetnam is working on a leaflet lesson for interactive maps. Plotly is great, but I think it's better for interactive plots than for actual maps. Please focus on the basics of static plotting for this lesson, but keep all ideas on the table (we just won't address them all now, so that we can get the basics covered first!)

@janu123

janu123 commented Apr 5, 2018 via email

@lwasser
Member

lwasser commented Apr 5, 2018

So appreciative of your work and thoughts on this, @janu123. Thank you! :)

@janu123

janu123 commented Apr 5, 2018 via email

@ErinBecker
Contributor Author

Thanks @janu123 and @lwasser for the conversation above. I'm elaborating a bit on the CAC discussion to see if I can help clarify the goals here.

The CAC had an in-depth discussion about the various options for both static and interactive plotting of geospatial data. The options considered for static plotting were base R, ggplot, and tmap. For ggplot, there was a detailed conversation about using the stable version of ggplot (which doesn't currently support geospatial data objects) or the development version (which does, but is often difficult to install and has some serious bugs with geom_sf).

There is more detail about the CAC's discussions of these issues here, here, and here.

The decision at this point was to use the stable version of ggplot (on CRAN) for static plotting, and to convert from an sf object to a df.

I don't know the specifics of sf object formatting, so I don't know how this works, but I believe @jsta and @chris-prener have both taught this way. Could one of you please clarify which function does this conversion?

@juanfung
Contributor

juanfung commented Apr 5, 2018

@ErinBecker I think the easiest way is some_sf_object %>% sf::st_set_geometry(NULL)

@jsta
Member

jsta commented Apr 5, 2018

I am very familiar with using sf with the development version of ggplot (geom_sf). I have not taught with, and am not very familiar with, sf in combination with the stable ggplot functions. I have only taught sf (and sp) with base plotting.

My guess is we need a three-step process:

  1. sf -> sp
  2. sp -> "tidy df"
  3. "tidy df" -> stable ggplot

@lwasser
Member

lwasser commented Apr 5, 2018

@juanfung that approach does make a lot of sense, given that an sf obj is essentially already a df.
I'm in Python now and heading to a meeting soon, but when I get back to R I'll try it out; I bet it works, or something like it does. tidy() just converts to a DF, so it takes the @data slot and adds the geometry stuff needed for plotting.

Or can someone else try it?

@juanfung
Contributor

juanfung commented Apr 5, 2018

@lwasser The result is an object of class data.frame, though it won't have coordinates. If you want the coordinates for plotting: sf::st_coordinates(some_sf_object) returns a matrix with the coordinates.
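A minimal sketch of the two calls above, using the North Carolina shapefile bundled with sf (the same file used in the reprex further down). It assumes a reasonably recent sf; the point is only to show that dropping the geometry leaves a plain attribute table, while `st_coordinates()` on polygons returns extra index columns that would have to be untangled by hand.

```r
library(sf)

# Read the sample shapefile that ships with sf
nc_sf <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Drop the geometry column: what remains is an ordinary data.frame of attributes
nc_attr <- st_set_geometry(nc_sf, NULL)
class(nc_attr)

# Coordinates come back as a matrix; for MULTIPOLYGONs it also carries
# ring/hole (L1), polygon part (L2) and feature (L3) index columns
nc_xy <- st_coordinates(nc_sf)
colnames(nc_xy)
```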

@jsta
Member

jsta commented Apr 5, 2018

I think that will get very messy with polygon and line objects right? My feeling is we want to avoid manual tidying of complex coordinates in sf list columns.

@lwasser
Member

lwasser commented Apr 5, 2018

Totally, I agree we want to avoid manual stuff. I was thinking it would create a nice clean geojson column like geopandas does :) but it sounds like that may not be the case! Gosh, I wish geom_sf was just in the CRAN version of ggplot. Maybe the workflow above is best then, @jsta? @juanfung, we do want to plot and keep the attributes for colors.

@juanfung
Contributor

juanfung commented Apr 6, 2018

@jsta Yes, you are absolutely correct! That approach won't work very well for non-point objects. What you suggested earlier (sf -> sp -> tidy -> ggplot) is the way to go, though it's not pretty.

Is it absolutely necessary that learners use ggplot2 to plot sf objects? The tmap package does all of the conversions in the background: https://cran.r-project.org/web/packages/sf/vignettes/sf5.html#tmap

Note that it fails for some sf objects, but for teaching purposes it should suffice.
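For comparison, a minimal tmap sketch on the same bundled nc.shp file. It assumes tmap is installed; `tm_polygons("AREA")` colours each county by its AREA attribute, and printing `nc_map` draws it. Styling arguments differ between tmap versions, so only the basic call is shown.

```r
library(sf)
library(tmap)

nc_sf <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# tmap takes the sf object directly; no manual conversion to a data.frame
nc_map <- tm_shape(nc_sf) +
  tm_polygons("AREA")  # fill each polygon by its AREA attribute

# print(nc_map) would render the map on the current device
```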

@chris-prener

Hey @lwasser, just getting caught up on this. I've tested this a little with the polygon data built into sf. Here is a reprex of the process:

library(broom)
library(ggplot2)

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(sp)

nc_sf <- st_read(system.file("shape/nc.shp", package="sf"), stringsAsFactors = FALSE)
#> Reading layer `nc' from data source `/Users/chris/Library/R/3.4/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs

nc_sp <- as(nc_sf, "Spatial")

nc_df <- tidy(nc_sp)
#> Regions defined for each Polygons

ggplot() + 
  geom_polygon(data = nc_df, aes(x = long, y = lat, group = group)) +
  coord_equal()

I agree this is super clunky, and I wish geom_sf() would get released as you do, but it does seem to work...

@obrl-soil
Contributor

geom_sf is certainly cleaner, but I don't think the above example is particularly clunky. Clunky is when you start trying to get properly cartographical in software that, let's be real, isn't designed for it :P

Wrapper packages like tmap, ggmap, and my fave, ggspatial, can make things a little more concise, e.g.

library(sf)
library(sp)
library(ggplot2)
library(ggspatial)

nc_sf <- st_read(system.file("shape/nc.shp", package="sf"), stringsAsFactors = FALSE)

ggplot() +
  geom_spatial(as(nc_sf, 'Spatial'), aes(fill = AREA)) +
  coord_equal()

but at the expense of making it clear how spatial data is translated into a plottable form. I think that's an important concept to grasp.

@chris-prener

Hey @obrl-soil, I was thinking of clunky more from a teaching perspective - my workflow above uses three different data frame types that would all have to be introduced. ggspatial does have the advantage of "hiding" that mess. Not sure what you think, @lwasser?

@lwasser
Member

lwasser commented Apr 9, 2018

This is a great discussion, @chris-prener @obrl-soil!! Here are my 2 cents :)
If we think about fundamentals, I think showing the tidy() step is good because it allows for a teaching moment where you explain that ggplot() fundamentally expects some sort of data.frame object. In the lesson you could show them the before object (sf) vs the after tidy object (df), and the intermediate sp object (eeks) if you'd like. Then we could have a nice breakout that talks about geom_sf() being available but in dev, and we could even show them in that breakout how to install and use it without explicitly teaching it in the class...

...in the hope that in 6 months it's no longer in dev! I think that might be a good approach for the time being, if you all agree too!! I think it's valuable for students to understand that ggplot plays nicely with dfs in general! :)

@lachlandeer
Contributor

Hi @lwasser @chris-prener @obrl-soil

It would be great to come up with one example that many of us agree with, so that we can post it here as a reference and then create a to-do list of the other graphs to update. We can then hopefully get some contributors from the bug BBQ to help us go through and make the updates.

I'm not a ggplot / mapping guru, otherwise I would propose one myself ;) But if I get a sense of what we want, I'd be happy to use my European time zone difference (advantage?) to start making some changes tomorrow morning, or to pick up from wherever we get to over the next 12 hours.

@lwasser
Member

lwasser commented Apr 12, 2018

@lachlandeer THANK YOU for helping us all get organized on this!! And for helping improve the lessons!!

I think the example that @chris-prener provided earlier could be good:
@jsta do you agree as well? and @lachlandeer does that example make sense to you?

# convert sf obj to sp
nc_sp <- as(nc_sf, "Spatial")
# tidy the data (ie convert to a df)
nc_df <- tidy(nc_sp)

# plot using ggplot
ggplot() + 
  geom_polygon(data = nc_df, aes(x = long, y = lat, group = group)) +
  coord_equal()

In this example you are plotting polygons.
Points will be a bit different, if I remember correctly.
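A hedged sketch of how points can differ: because each point has exactly one coordinate pair, the sp/tidy round trip can be skipped by binding the attribute table to the `st_coordinates()` matrix. The three sites and their coordinates below are made up purely for illustration.

```r
library(sf)
library(ggplot2)

# Three hypothetical sites with made-up lon/lat coordinates
pts_sf <- st_as_sf(
  data.frame(site = c("a", "b", "c"),
             lon  = c(-80.0, -79.0, -78.0),
             lat  = c(35.0, 35.5, 36.0)),
  coords = c("lon", "lat"), crs = 4326
)

# Attributes (geometry dropped) side by side with the X/Y coordinate columns
pts_df <- cbind(st_set_geometry(pts_sf, NULL), st_coordinates(pts_sf))

pts_plot <- ggplot(pts_df, aes(x = X, y = Y, colour = site)) +
  geom_point() +
  coord_equal()
```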

@lachlandeer
Contributor

@lwasser - yes that looks like something pretty understandable!

Do you know if we can do all of this in a pipe?

i.e., would the following (plus modifications, most likely) work?

# assumes magrittr (or dplyr), sf, sp, broom and ggplot2 are loaded
nc_sf %>%
  as("Spatial") %>%
  tidy() %>%
  ggplot() +
  geom_polygon(aes(x = long, y = lat, group = group)) +
  coord_equal()

@lwasser
Member

lwasser commented Apr 12, 2018

Try it! It could work just fine, @lachlandeer. I haven't tested it, but I know people pipe sf objects a lot!
The as("Spatial") line is the one I'm not sure how the pipe will handle, but totally try it. Pipes are great for students to see.

I think it would be good to show the non-pipe syntax in a breakout, just in case absorbing pipes is cognitive overload for students... @chris-prener, what has been your experience with pipes AND spatial data? I've not used them with spatial stuff, but my students LOVE pipes once they understand them.

@MicheleTobias

I agree about the non-pipe syntax option.

@lachlandeer
Contributor

Yes - both are defs useful

In the R-intro lesson as it stands, I have left the dplyr stuff in, and there are some examples at the end (which I want to simplify by removing the facet wraps) that use the "data manipulation piped into ggplot" style, so that learners have a reference to look at with a "simpler" rectangular dataset.

@ErinBecker
Contributor Author

Hi all, just wanted to add my two cents.

It seems to me (not a geospatial person), that the "basic" workflow we're looking at here is

  1. read in data as an sf object
  2. convert to a dataframe
  3. use standard ggplot syntax

Since the second step is turning out to be more complex than we perhaps at first expected, my proposal is to make the dataframe outside of the lesson and provide it to the learners (i.e. not require the learners to understand the code that generated the dataframe). This would enable the learners to focus on learning plotting rather than data restructuring. Is this something that would be possible?

We've done a similar thing in the Genomics lesson where we created a data structure "behind the scenes" and presented it to the learners as a done deal.

@lwasser
Member

lwasser commented Apr 12, 2018

Hey @ErinBecker!! This is a good thought... but I don't think it's an ideal approach. sf is essentially a df-like object. Step 2 is actually a conversion from

  1. sf to sp
  2. sp to df

so that you can use the tidy function, which allows for ggplot plotting. This is the crux of spatial data in R, and it's important for learners to understand that ggplot doesn't natively plot spatial data; you need to convert it first.

I think it's a good idea to explain what's happening here. I don't suggest altering the format to a pre-made df, which is not a standard thing a user will encounter. The real goal of these lessons is to support users working with real data!

@pmarchand1
Contributor

It seems most of the discussion of moving from base to ggplot was based on sf objects. For rasters, another advantage of using base plot (or specifically, the 'plot' method overloaded by the raster package) is that the whole data doesn't need to fit in R memory.

@chris-prener

OK, got ya, @lwasser - I've never used base for plotting spatial data, so that is where the real limit is! It seems like, based on what you and @pmarchand1 have said, one option is to teach ggplot2 for vector data but keep rasters in base. Or we could use rasterVis, which I just googled and which looks super cool...

@lachlandeer
Contributor

So, after beating my head against the wall: if we do the hillshade with geom_tile and then the geom_raster for elevation, we can get something semi-good...

hill_shade <- ggplot() +
  geom_tile(data = DSM_hill_HARV_df,
            aes(x = x, y = y, alpha = fct_elevation)) +
  ggtitle("Hillshade - DSM - NEON Harvard Forest Field Site")

# can't use multiple scales, so this doesn't work right now
hill_shade +
  geom_raster(data = DSM_HARV_df,
              aes(x = x, y = y, fill = HARV_dsmCrop)) +
  scale_fill_gradientn(colors = rainbow(100))

yields: [image]

which is slowly moving towards how the base-plot version looked, and will probably be better with a finer grid on the grey hillside.

Should I pursue this further?
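One possible workaround, sketched under assumptions: map elevation to `fill` and the hillshade to `alpha`, so the two layers never compete for one fill scale. The built-in `volcano` matrix stands in for the lesson's DSM_HARV and DSM_hill_HARV layers, the hillshade is computed with the raster package's `terrain()` and `hillShade()`, and the 10 m cell size and the alpha range are arbitrary choices.

```r
library(raster)
library(ggplot2)

# Stand-in elevation raster from the built-in volcano matrix (10 m cells assumed)
elev <- raster(volcano, xmn = 0, xmx = 10 * ncol(volcano),
               ymn = 0, ymx = 10 * nrow(volcano))
names(elev) <- "elevation"

# Hillshade from slope and aspect (terrain() returns radians by default)
slope  <- terrain(elev, opt = "slope")
aspect <- terrain(elev, opt = "aspect")
shade  <- hillShade(slope, aspect, angle = 40, direction = 270)
names(shade) <- "hillshade"

# ggplot wants data.frames with x, y and a value column;
# edge cells of the hillshade are NA, so drop them
elev_df  <- as.data.frame(elev, xy = TRUE)
shade_df <- na.omit(as.data.frame(shade, xy = TRUE))

# Elevation uses the fill scale, the hillshade uses the alpha scale
relief_plot <- ggplot() +
  geom_raster(data = elev_df, aes(x = x, y = y, fill = elevation)) +
  geom_raster(data = shade_df, aes(x = x, y = y, alpha = hillshade)) +
  scale_fill_viridis_c() +
  scale_alpha(range = c(0.15, 0.65), guide = "none") +
  coord_equal()
```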

@eliocamp

eliocamp commented Apr 13, 2018

Hey, I don't know if this solves the issue, but the metR package (disclaimer: I wrote it 😁) has a geom_relief() that draws a relief map from height data without touching any scale:

library(ggplot2)
library(reshape2)  # melt() for matrices
library(metR)

ggplot(melt(volcano), aes(Var1, Var2)) +
    geom_relief(aes(z = value)) +
    geom_tile(aes(fill = value), alpha = 0.5) +
    scale_fill_viridis_c()

http://eliocamp.github.io/metR/

@eliocamp

A more general answer is to use ggplot2's annotation_custom().

library(reshape2)
library(ggplot2)

data(volcano)
# long format: one row per cell, with columns x, y and h
volcano <- melt(volcano, varnames = c("x", "y"), value.name = "h")

First, you can make one plot with its own gradient.

(shade <- ggplot(volcano, aes(x, y)) +
    geom_raster(aes(fill = h), alpha = 0.5, interpolate = TRUE) +
    scale_fill_gradient(low = "black", high = "white"))

Then, you take the important part of the plot and add it to a second plot.

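grob.shade <- ggplotGrob(shade)
# pull out the panel's raster grob; these list positions depend on the
# ggplot2 version, so inspect grob.shade to find them
grob.shade <- grob.shade$grobs[[6]]$children[[3]]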

ggplot(volcano, aes(x, y)) +
    geom_raster(aes(fill = h), alpha = 0.5, interpolate = TRUE) +
    annotation_custom(grob = grob.shade) +
    scale_fill_gradient(low = "blue", high = "red") +
    coord_fixed() +
    theme_void() 

Of course, this example is meaningless because it uses the same data both times. The tricky part is the grob.shade <- grob.shade$grobs[[6]]$children[[3]] line: you need to inspect grob.shade to find the panel gTree (the 6th member of the list, in this case) and then the right child within it (the 3rd, here).

@statnmap

statnmap commented Apr 14, 2018

The problem with using geom_raster or geom_tile directly on your entire raster is that you will quickly be limited with big rasters: your computer and ggplot will not be able to handle rasters with 1M pixels. rasterVis::gplot takes a regular sample of your raster so that it plots only the pixels needed for a quick plot.
However, if you want more flexibility in what you plot from your raster (for instance, plotting something below your raster with the raster in semi-transparency), gplot does not allow for it. Personally, I use a modified version of gplot that returns the data.frame of sampled raster data, to be plotted using geom_tile.
Have a look at gplot_data in my SDMSelect package and how I use it in the vignette: https://github.com/statnmap/SDMSelect/blob/master/vignettes/SDM_Selection.Rmd#species-distribution-mapping
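The regular-sampling idea can be sketched with `raster::sampleRegular()`, which thins a raster to roughly a requested number of cells before it is converted to a data.frame for ggplot. This is not rasterVis::gplot or SDMSelect's gplot_data, just the same idea; the random 1000 x 1000 raster is a hypothetical stand-in for a large layer.

```r
library(raster)
library(ggplot2)

# Hypothetical large raster: 1000 x 1000 = 1M cells of random values
set.seed(1)
big <- raster(nrows = 1000, ncols = 1000, xmn = 0, xmx = 1000, ymn = 0, ymx = 1000)
values(big) <- runif(ncell(big))

# Regular sample of about 10,000 cells, returned as a coarser raster
small <- sampleRegular(big, size = 1e4, asRaster = TRUE)
names(small) <- "value"

# Only the sampled cells are turned into a data.frame and plotted
small_df <- as.data.frame(small, xy = TRUE)

sample_plot <- ggplot(small_df, aes(x = x, y = y, fill = value)) +
  geom_raster() +
  coord_equal()
```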

@paleolimbot

There is an annotation_raster() if memory serves, or geom_spraster() in ggspatial that takes care of converting a raster layer into a suitable form, which can get complicated.

@lwasser
Member

lwasser commented Apr 16, 2018

annotation_raster and RStoolbox are the only ways I've gotten this to work as well, @paleolimbot. They avoid the issue of needing 2 different colormaps, which ggmap doesn't natively support from what I can tell.

@paleolimbot

I should mention that both need a 3- or 4-band (RGB or RGBA) raster.
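A minimal sketch of `annotation_raster()` with a 3-band image, assuming reshape2 for reshaping the built-in `volcano` matrix. The greyscale image is just rescaled elevation copied into three bands (in a real map it would be a hillshade or RGB imagery); because it is drawn as an annotation, the elevation layer keeps the only fill scale. `annotation_raster()` draws the first matrix row at the top, hence the transpose and vertical flip.

```r
library(ggplot2)
library(reshape2)

# Long format: x = row index of volcano, y = column index, h = height
volcano_df <- melt(volcano, varnames = c("x", "y"), value.name = "h")

# Rows of the image must run from largest y (top) to smallest y (bottom)
h <- t(volcano)[ncol(volcano):1, ]
g <- (h - min(h)) / (max(h) - min(h))             # rescale to [0, 1]
rgb_img <- array(c(g, g, g), dim = c(dim(g), 3))  # 3-band greyscale image

layered_plot <- ggplot(volcano_df, aes(x = x, y = y)) +
  # image edges sit half a cell outside the outermost cell centres
  annotation_raster(rgb_img, xmin = 0.5, xmax = nrow(volcano) + 0.5,
                    ymin = 0.5, ymax = ncol(volcano) + 0.5) +
  geom_raster(aes(fill = h), alpha = 0.5) +
  scale_fill_gradient(low = "blue", high = "red") +
  coord_fixed()
```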

@ErinBecker
Contributor Author

ErinBecker commented Jun 26, 2018

@datacarpentry/geospatial-maintainers - Hi all, as we get closer to our target publication date (mid-July), I'm wondering how we can best move forward on the rest of the base plot --> ggplot conversions. @lachlandeer, @jsta, and @justinmillar did a great job on converting the first four episodes, but we have episodes 5 - 14 that still need converting. I'm unable to tackle this myself, as I'm not familiar with plotting geospatial data, but I'm happy to help coordinate efforts, resolve merge conflicts, and offer any pedagogical guidance that folks request. If you're able to help by converting one or two of the remaining episodes, could you please edit the list below to add your name to the episodes you would like to tackle? Many thanks!

@lwasser @jsta @tyson-swetnam @obrl-soil @janu123 @lachlandeer @chris-prener @juanfung @r4space

@juanfung
Contributor

juanfung commented Jul 2, 2018

@ErinBecker I can work on episodes 9 and 10.

Just to clarify, we are using ggplot with sf?

@ErinBecker
Contributor Author

@juanfung thanks! I'll put your name down on the list for those episodes. Yes, we are using ggplot with sf. The new release of ggplot2 on CRAN includes sf support, so learners will no longer need to install the development version of ggplot2!!
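With geom_sf() on CRAN (it shipped in ggplot2 3.0.0), the conversion steps discussed earlier in the thread are no longer needed. A minimal sketch with the nc.shp file bundled with sf, mapping the AREA attribute to fill:

```r
library(sf)
library(ggplot2)  # geom_sf() is included from ggplot2 3.0.0 onwards

nc_sf <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# geom_sf() reads the geometry column itself, so there is no
# sf -> sp -> data.frame step and attributes can be mapped directly
nc_plot <- ggplot(nc_sf) +
  geom_sf(aes(fill = AREA)) +
  ggtitle("North Carolina counties, coloured by AREA")
```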

@ErinBecker
Contributor Author

Closes with #228

@ErinBecker
Contributor Author

Addressed with #228
