This repository was archived by the owner on Aug 4, 2020. It is now read-only.
Merged
26 changes: 25 additions & 1 deletion R_visualization.md
@@ -41,7 +41,31 @@ Challenge Questions:
* Fill in arguments for ggplot() in order to produce a new plot.
* Correct a mistake in a call to ggplot().

# First we are going to input data
*The text below is all rough, and much of it is just a start.*

First we are going to input data

`dataset <- read.table("file/name/here", sep=..., header=TRUE)`
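
As a minimal sketch of the input step, assuming a tab-separated file with a header row: the temporary file and the column names `genome` and `gc_content` are made up for illustration and do not come from the lesson's data.

```r
# Hypothetical example data, written to a temporary file so the sketch is self-contained.
tmp <- tempfile(fileext = ".txt")
writeLines(c("genome\tgc_content",
             "A\t0.41",
             "A\t0.43",
             "B\t0.52"), tmp)

# sep = "\t" declares tab-separated columns;
# header = TRUE takes the column names from the first line of the file.
dataset <- read.table(tmp, sep = "\t", header = TRUE)

str(dataset)   # check that each column was read with the expected type
head(dataset)  # look at the first few rows
```

Checking the result with `str()` right after reading is a cheap way to catch a wrong `sep` value, which usually shows up as a single column holding whole lines.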

There are several different summary statistics that we can run:

`mean(dataset$variable)
sd(dataset$variable)
quantile(dataset$variable, c(0.025,0.975))`
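
The calls above can be tried on a small vector first. This is only a sketch: the numbers below are invented and stand in for `dataset$variable`.

```r
# Hypothetical measurements standing in for dataset$variable.
variable <- c(2, 4, 4, 4, 5, 5, 7, 9)

m <- mean(variable)                        # arithmetic mean
s <- sd(variable)                          # sample standard deviation (n - 1 denominator)
q <- quantile(variable, c(0.025, 0.975))   # empirical 2.5% and 97.5% quantiles

print(m)
print(s)
print(q)
```

Note that the two quantiles bound the middle 95% of the observed values; they are not a confidence interval for the mean.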

These statistics describe how a particular variable is distributed, but we may have this variable from several genomes and want to know how its distribution differs among them. To do this we can use the `ddply()` function from the `plyr` package.
`library(plyr)`
`ddply(dataset, .(categorical_variable), summarise,
mean=mean(variable),
sd=sd(variable),
hi_95=quantile(variable, 0.975),
lo_95=quantile(variable, 0.025))
`
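
To see what the grouped summary looks like, here is a small sketch on invented data. The data frame and the column names `genome` and `variable` are assumptions for illustration only; `genome` plays the role of `categorical_variable`.

```r
library(plyr)

# Hypothetical data: one measurement per row, labelled by genome.
dataset <- data.frame(
  genome   = rep(c("A", "B"), each = 4),
  variable = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# ddply() splits the data frame by genome, applies summarise() to each piece,
# and stacks the results: one row per genome, one column per statistic.
by_genome <- ddply(dataset, .(genome), summarise,
                   mean  = mean(variable),
                   sd    = sd(variable),
                   hi_95 = quantile(variable, 0.975),
                   lo_95 = quantile(variable, 0.025))

print(by_genome)
```

The result is an ordinary data frame, so it can be passed on to plotting functions like any other.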

To start plotting this we will use the `ggplot2` package. We will start with a blank plot and add layers to it, each with its own aesthetic mappings.

`library(ggplot2)
ggplot(dataset) # no layers yet: old ggplot2 versions report an error here, recent ones draw an empty plot
ggplot(dataset)+geom_boxplot(aes(x=categorical_variable, y= variable))
`
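
A runnable version of the boxplot call, as a sketch on invented data: the data frame and the column names `genome` and `variable` are hypothetical, with `genome` standing in for `categorical_variable`.

```r
library(ggplot2)

# Hypothetical data: one measurement per row, labelled by genome.
dataset <- data.frame(
  genome   = rep(c("A", "B"), each = 4),
  variable = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# ggplot() sets up the plot for the data frame;
# geom_boxplot() adds one layer, and aes() maps columns to the x and y axes.
p <- ggplot(dataset) +
  geom_boxplot(aes(x = genome, y = variable))

print(p)  # draws one box per genome
```

Storing the plot in `p` makes it easy to add further layers later with `+`.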