<hr>

# <font color='Purple'>GENETIC ANALYSIS: FIGURE CREATION IN *R*</font>

<hr>
    
Dr Graham S Sellers *g.sellers@hull.ac.uk*

![image](./images/R.png)

# <font color='Purple'>1. Data presentation</font>

<hr>

Plotting the data as a meaningful figure is an important part of reporting on any analysis work.  
This is another form of scientific communication.  

# <font color='Purple'>"*The art of science*"</font>

In this session we will concentrate on making suitable figures to be included in your assignment for the Genetic Analysis practical.  

You will modify the existing code used to make the plots from the last session, adding to it and customising it as you see fit (within reason, of course).  

## <font color='Purple'>1.1. Theme</font>
When creating plots for a report or manuscript it is a good idea to have a theme in mind.  

By "theme", at a minimum, I mean a colour scheme repeated across all relevant plots.  
Font size, axis labelling, line width and figure resolution all contribute to the theme too.

Having a theme for your figures acts as a unifying factor. It gives a continuity that makes the manuscript feel like a purposeful whole rather than a disparate smattering of figures. A "*random-acts-of-plotting*" piece of work is certainly not nice, nor is it easy to look at.  
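Here is a minimal sketch of what a theme can look like in code. You define your colours and sizes once, at the top of the notebook, and then reuse the same variables in every plot. The names `theme_cols`, `theme_lwd` and `theme_cex_lab` are hypothetical choices, not anything *R* requires:

```r
# hypothetical theme settings, defined once and reused in every plot:
theme_cols = c(left = 'steelblue', right = 'darkorange')  # one colour per body side
theme_lwd = 2          # line width for boxes, lines and ellipses
theme_cex_lab = 1.5    # size of axis titles

# example use: a simple plot drawn with the theme settings
plot(1:4, c(3, 5, 4, 6),
     pch = 21,
     bg = rep(theme_cols, each = 2),
     lwd = theme_lwd,
     cex.lab = theme_cex_lab,
     xlab = 'Sample',
     ylab = 'Value',
     las = 1)
```

Changing `theme_cols` in one place then changes every plot that uses it.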

## <font color='Purple'>1.2. Base *R*</font>
All the figures will be generated using base *R*.  
There are packages available (e.g. *ggplot2*) that allow the creation of immensely customisable plots.  

However, for this practical we are keeping it *brutalistic* and *base*.  

The limited functionality of base *R* plotting will introduce you to many concepts of figure production.  
These can then be built on as you progress in your degree or career as a genetic analyst.

# <font color='Purple'>2. Recreate the data</font>

<hr>
We are in a new Jupyter Notebook, so the variables, dataframes and objects created in the last session do not exist.  

We will need to recreate them. That means rerunning the entire analysis to get the data back into the correct format, ready to plot.  
This is as simple as putting all the relevant code in a single cell and running it.


**In the cell below we will rerun the final analysis from the last session**  

The data will all be formatted and ready for plotting.  
**Note:** Comments have been added to the code so we can see which bits are doing what.  

**When you are ready, run the cell:**

In [None]:
# ===================================================
# FORMATTING DATA SECTION
# ===================================================

# MAKE PROPORTION READS AND PRESENCE/ABSENCE DATAFRAMES

# read in the new cleaned dataset:
my_data_cleaned = read.csv('data/genetic_analysis_cleaned_data.tsv', sep = '\t', header = T, row.names = 1)

# make a proportion reads dataframe:
my_data_prop = my_data_cleaned
my_data_prop = my_data_prop/rowSums(my_data_prop)

# make a presence/absence dataframe:
my_data_pa = my_data_cleaned
my_data_pa[my_data_pa > 0] = 1


# ===================================================

# SUBSET THE DATA BY LIMB

# chopping up the data by limb:
l_leg = my_data_pa[grep("LL", rownames(my_data_pa)),]
r_leg = my_data_pa[grep("LR", rownames(my_data_pa)),]
l_arm = my_data_pa[grep("AL", rownames(my_data_pa)),]
r_arm = my_data_pa[grep("AR", rownames(my_data_pa)),]

body_parts = list(l_leg, r_leg, l_arm, r_arm)

names(body_parts) = c("left_leg", "right_leg", "left_arm", "right_arm")


# ===================================================

# CREATE OTU RICHNESS DATAFRAME

# create an empty dataframe based on my_data_pa:
my_data_rich = my_data_pa[FALSE]

# Add columns for species richness - sum of rows from my_data_pa:
my_data_rich$richness = rowSums(my_data_pa)

# Add limbs in as numbers rather than name:
my_data_rich$limb = rep(c(1, 2, 3, 4), each = 3)


# ===================================================

# CREATE TOTAL OTU RICHNESS LIST

# total  richness per limb:
l_leg_rich = sum(colSums(body_parts$left_leg) > 0)
r_leg_rich = sum(colSums(body_parts$right_leg) > 0)
l_arm_rich = sum(colSums(body_parts$left_arm) > 0)
r_arm_rich = sum(colSums(body_parts$right_arm) > 0)

# list of limb OTU richness:
total_richness = c(l_leg_rich, r_leg_rich, l_arm_rich, r_arm_rich)

# add names for list:
names(total_richness) = c("left leg", "right leg", "left arm", "right arm")


# ===================================================
# VEGAN SECTION
# ===================================================

# import Vegan library:
library(vegan)


# ===================================================

# VEGAN ENVIRONMENT DATAFRAME

# create an env dataframe for use in vegan:
veg_env = my_data_prop[FALSE]

# set sample name from rownames of data:
veg_env$site = rownames(veg_env)

# make the limb column: 
veg_env$limb = rep(c('leg', 'arm'), each = 6)

# make the side column:
veg_env$side = rep(rep(c('left', 'right'), each = 3), 2)

# make the side/limb column:
veg_env$side_limb = paste(veg_env$side, veg_env$limb, sep = '_')


# ===================================================

# LIST OF SPECACCUM OBJECTS PER LIMB

# make an empty list:
species_accum = list()

# specify the "limbs":
limbs = c("left_leg", "right_leg", "left_arm", "right_arm")

# loop through the limbs:
for(limb in limbs){
        # generate "species" accumulation
        # (note: permutations is only used by method = "random", so it is ignored here):
        vegan_accum = specaccum(body_parts[[limb]],
              method = "exact",
              permutations = 100)
        # add to "species_accum" list:
        species_accum[[limb]] = vegan_accum
}


# ===================================================

# VEGAN ORDINATION CREATION

# vegan ord:
ord = metaMDS(my_data_prop, k = 3, try = 100, trymax = 10000, distance = 'bray', na.rm = T)


That was easy, wasn't it? The entirety of the last session in one cell. Nice. 

**Questions:**
1. Can you tell exactly what was done in the cell?  
2. Do the comments help?
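To help with question 1, here is a small sketch of the first two transformations in the cell. It runs on a made-up count table (`toy_counts` is hypothetical data, not from the practical) with 2 samples as rows and 3 OTUs as columns:

```r
# a toy count table: 2 samples (rows) x 3 OTUs (columns)
toy_counts = data.frame(otu1 = c(10, 0), otu2 = c(30, 5), otu3 = c(60, 15),
                        row.names = c('sample_A', 'sample_B'))

# proportion of reads: each row divided by its own total (100 and 20 here)
toy_prop = toy_counts / rowSums(toy_counts)

# presence/absence: any count above zero becomes 1
toy_pa = toy_counts
toy_pa[toy_pa > 0] = 1

toy_prop
toy_pa

# OTU richness per sample = number of OTUs present (as in my_data_rich)
rowSums(toy_pa)
```

Each row of `toy_prop` sums to 1. The row sums of `toy_pa` (3 and 2) are the per-sample richness values, the same calculation used to build `my_data_rich`.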

## <font color='Purple'>Clear *R*'s memory</font>


If you ever feel like having a complete reset of *R*'s memory, run the cell below:

In [None]:
rm(list = ls())
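`rm(list = ls())` only removes the objects in your workspace. Packages loaded with `library()` stay attached, and objects whose names start with a dot are kept too, because `ls()` hides them by default. If you only want to remove one object rather than everything, pass its name to `rm()`. Here is a sketch using a throwaway object called `toy_obj` (a hypothetical name):

```r
toy_obj = 1:10        # a throwaway object
exists('toy_obj')     # TRUE: the object is in the workspace

rm(toy_obj)           # remove just this one object; everything else stays

exists('toy_obj')     # FALSE: the object has gone
```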

# <font color='Purple'>3. Plot to .png</font>

<hr>

The simplest way to get a figure from *R* to an image file (in this case a .png) via the code is to use the `png()` function.  

For example: 

In [None]:
png('plots/test.png', width = 1080, height = 1080, units = 'px', pointsize = 30)

# everything for the plot goes here:
plot(0, 0)

dev.off()

A short description of the function:  

`'plots/test.png'` is the path to the location where the image is to be saved.  

`width = 1080` the width of the image in units.  

`height = 1080` the height of the image in units.  

`units = 'px'` the units the image is measured in, in this case pixels.  

`pointsize = 30` the default size (in points) of text in the image.  

**Important:** The graphics device opened by `png()` needs to be closed after the plot commands.  

`dev.off()` does this. The image file is not finished until it runs.  

Have a look at the plot created from the cell above (it is in the "**plots**" directory as "**test.png**" in the file browser to the left of the notebook).
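Two things often trip people up. First, `png()` fails if the target folder does not exist. Second, pixel dimensions alone say nothing about the printed size. The sketch below creates the folder if needed, then sets the size in inches plus a resolution (`res`, in pixels per inch) instead of in pixels. The file name `plots/test_300dpi.png` is only an example:

```r
# create the plots folder if it is not already there:
dir.create('plots', showWarnings = FALSE)

# 6 x 6 inches at 300 pixels per inch gives a 1800 x 1800 pixel image:
png('plots/test_300dpi.png', width = 6, height = 6, units = 'in', res = 300, pointsize = 12)

plot(0, 0)

dev.off()

# check that the image was written:
file.exists('plots/test_300dpi.png')
```

When `res` is given, `pointsize` is interpreted at that resolution. A 12 point font therefore stays a sensible size however many pixels the image has.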

### <font color='Purple'>Simple .png save</font>

To save any plot as an image, copy the `png()` command from the cell above.  
Paste it as the first line of the cell containing the plot code.  
Add `dev.off()` as the very last line.  
Give each plot its own file name, otherwise every new image will overwrite `test.png`.  
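If you find yourself pasting the same `png()`/`dev.off()` pair into every cell, one option is to wrap it in a small function. `save_png()` below is a hypothetical helper, not part of base *R*. It uses `on.exit()` so that the device is closed even if the plotting code throws an error:

```r
# make sure the output folder exists:
dir.create('plots', showWarnings = FALSE)

# hypothetical helper: open a png device, run the plotting code, always close the device
save_png = function(path, plot_code, width = 1080, height = 1080, pointsize = 30) {
    png(path, width = width, height = height, units = 'px', pointsize = pointsize)
    # close the device when the function exits, even if the plotting code fails:
    on.exit(dev.off())
    # plot_code is only evaluated here, after png() has opened the device:
    force(plot_code)
    invisible(path)
}

# usage: everything inside the curly brackets is drawn into the png file
save_png('plots/helper_test.png', {
    plot(0, 0)
    abline(h = 0, v = 0)
})
```

This relies on *R* evaluating function arguments lazily: the code in the curly brackets only runs when `force()` is reached, by which point the png device is open.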

# <font color='Purple'>4. Customising your plots</font>

<hr>

The cells below have the relevant code to produce the plots from the last session's analysis.  

The code has been expanded to include all the functionality needed for modification of colour, line width and many other aspects.  
Comments have been added to show what each part relates to on the plot.

## <font color='Purple'>4.1. Safety first!</font>
Use the `+` button in the top bar of the notebook to create a new cell.  
Copy the code of the cell you want to modify into it.  
This will stop you from irreversibly ruining the code - a safety feature ;)  
### <font color='Purple'>4.1.1. Have fun, go mad</font>

Tweak all the variables! Choose different colours!  
Run the cell to look at any changes you have made to the plot. Continue tweaking.  

**Remember your "*theme*" and make your plots look good**.  

Any issues or requests, ask a demonstrator.

## <font color='Purple'>4.2. Customised, at last...</font>
When you are happy with your output, check with a demonstrator.  

Next, add in the `png()` and `dev.off()` functions in the correct places in the cell and make sure the plot looks good as the final .png image.  
Check them via the file browser on the left of the notebook. They should be in the "**plots**" directory if you have included the path correctly.  

# <font color='Purple'>5. The plots</font>

<hr>


### <font color='Purple'>OTU richness boxplot</font>

In [None]:
boxplot(my_data_rich$richness ~ my_data_rich$limb,
        names = c("left leg", "right leg", "left arm", "right arm"),
        # y axis title:
        ylab = 'OTU richness',
        # x axis title:
        xlab = 'Sample site',
        # size of axis labels:
        cex.lab = 1.5,
        # size of axis tick labels: 
        cex.axis = 1,
        # colours of boxes:
        col = 'grey80',
        las = 1)
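To carry your theme into the boxplot, `col` can take one colour per box rather than a single colour. Here is a self-contained sketch on made-up richness values. `toy_rich` is hypothetical data shaped like `my_data_rich`, with three samples per limb, and the boxes are coloured by body side:

```r
# made-up richness values: 3 samples for each of the 4 limbs
toy_rich = data.frame(richness = c(120, 135, 128, 140, 150, 145, 90, 100, 95, 110, 105, 98),
                      limb = rep(c(1, 2, 3, 4), each = 3))

# one colour per box, in limb order (left leg, right leg, left arm, right arm):
side_cols = c('steelblue', 'darkorange', 'steelblue', 'darkorange')

boxplot(richness ~ limb, data = toy_rich,
        names = c('left leg', 'right leg', 'left arm', 'right arm'),
        ylab = 'OTU richness',
        xlab = 'Sample site',
        cex.lab = 1.5,
        # one colour per box:
        col = side_cols,
        las = 1)
```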

### <font color='Purple'>Total OTU richness barplot</font>

In [None]:
barplot(total_richness,
        beside = T,
        # size of axis tick labels:
        cex.axis = 1,
        # size of x axis labels:
        cex.names = 1,
        #y axis label:
        ylab = 'OTU richness',
        # size of y axis label:
        cex.lab = 1.5,
        # bar colours:
        col = 'grey80',
        # colour of bar borders:
        border = 'black',
        las = 1)

# add line across the bottom of the plot:
abline(h = 0)
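`barplot()` invisibly returns the x positions of the bar centres. These are handy for writing the value on top of each bar. Here is a sketch with made-up totals (`toy_total` is hypothetical data):

```r
toy_total = c('left leg' = 210, 'right leg' = 225, 'left arm' = 160, 'right arm' = 170)

# keep the bar centres returned by barplot():
bar_mids = barplot(toy_total,
                   ylab = 'OTU richness',
                   cex.lab = 1.5,
                   col = 'grey80',
                   border = 'black',
                   # leave headroom above the tallest bar for the labels:
                   ylim = c(0, max(toy_total) * 1.15),
                   las = 1)

# write each value just above its bar (pos = 3 means "above"):
text(bar_mids, toy_total, labels = toy_total, pos = 3)

# add line across the bottom of the plot:
abline(h = 0)
```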

### <font color='Purple'>OTU accumulation plot</font>

In [None]:
# make the plot have a 2x2 grid of plots:
par(mfrow = c(2, 2))

#list of "limbs":
limbs = c("left_leg", "right_leg", "left_arm", "right_arm")

# loop through limbs and plot:
for(i in 1 : length(limbs)){
    plot(species_accum[[limbs[i]]],
        # y axis lower and upper values:
        ylim = c(50, 250),
        # x axis lower and upper values:
        xlim = c(0, 4),
        main = sub('_', ' ', limbs[i]),
        # x axis title:
        xlab = 'Samples',
        # y axis label:
        ylab = 'OTUs detected',
        # size of axis labels:
        cex.lab = 1,
        # colour of lines:
        col = 'black',
        las = 1)
    
    # add thicker lines to the main line of each plot:
    lines(species_accum[[limbs[i]]]$richness,
          # line width of main line:
          lwd = 3,
          # colour of main line:
          col = 'black')
}
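The `ylim = c(50, 250)` above is hard-coded for this dataset. If you would rather have a shared y axis that always fits, one option is to compute it from the accumulation objects themselves. The sketch below splits vegan's built-in `dune` dataset into two made-up groups, standing in for your `species_accum` list. It assumes the default confidence band drawn by `plot()` for `specaccum` objects, which is ±2 standard deviations:

```r
library(vegan)
data(dune)

# two made-up groups of dune sites, standing in for the four limbs:
toy_accum = list(first_half = specaccum(dune[1:10, ], method = 'exact'),
                 second_half = specaccum(dune[11:20, ], method = 'exact'))

# lowest and highest value any panel will need (richness +/- 2 sd):
shared_ylim = range(unlist(lapply(toy_accum, function(x) {
    c(x$richness - 2 * x$sd, x$richness + 2 * x$sd)
})))

par(mfrow = c(1, 2))
for(name in names(toy_accum)){
    plot(toy_accum[[name]],
         ylim = shared_ylim,
         main = sub('_', ' ', name),
         xlab = 'Samples',
         ylab = 'OTUs detected',
         las = 1)
}
# back to one plot per figure:
par(mfrow = c(1, 1))
```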

### <font color='Purple'>Vegan NMDS ordination plot</font>

In [None]:
png('plots/nmds_ordination.png', width = 1080, height = 1080, units = 'px', pointsize = 30)


# empty plot of ord, no axes:
plot(ord,
     disp = "sites",
     type = 'n',
     axes = F,
     # x axis lower and upper values:
     xlim = c(-1, 1),
     # y axis lower and upper values:
     ylim = c(-1, 1),
     # x axis label:
     xlab = 'NMDS1',
     # y axis label:
     ylab = 'NMDS2',
     bty = 'n',
     # size of axis labels:
     cex.lab = 1.5)

# add correct x and y axes:
axis(1, at = seq(-1,1,0.5), cex.axis = 1, padj = -0.5, tck = -0.01, lwd = 0, lwd.ticks = 2)
axis(2, at = seq(-1,1,0.5), cex.axis = 1, las = 1, tck = -0.02, lwd = 0, lwd.ticks = 2)

# add vertical and horizontal lines to show "0, 0":
abline(h = 0, v = 0, lwd = 2, col = 'grey')

# add ellipse hulls around each side/limb group with the correct colours:
ordiellipse(ord,
            veg_env$side_limb,
            kind = "ehull",
            # line width of ellipses:
            lwd = 2,
            # colours of ellipses:
            col = rep(c('green', 'red'), each = 2))

# add points to plot with colours specific to sites:
points(ord,
       disp = "sites",
       # shape of points:
       pch = ifelse(veg_env$limb == 'arm', 21, 24),
       # colour of points:
       bg = ifelse(veg_env$side == 'left', 'green', 'red'),
       # size of points:
       cex = 2,
       # outline colour of points:
       col = 'black',
       # line width of points outline:
       lwd = 0.5)

# add sample names as text to points:
text(ord$points[,1:2],
     # size of text:
     cex = 1,
     # label names to use:
     labels = veg_env$site,
     # position of the text around point:
     pos = 4)

# add box around the plot:
box('plot',
    # width of box line:
    lwd = 2)

dev.off()
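The NMDS plot encodes two things at once: colour for side and shape for limb. A legend helps the reader decode it. Below is a sketch of `legend()` using the same fill colours and `pch` values as the cell above. It is drawn on an empty placeholder plot so that it runs on its own; in your cell it would go just before `dev.off()`:

```r
# empty placeholder plot standing in for the ordination:
plot(0, 0, type = 'n', xlim = c(-1, 1), ylim = c(-1, 1), xlab = 'NMDS1', ylab = 'NMDS2')

leg_info = legend('topright',
                  legend = c('left arm', 'right arm', 'left leg', 'right leg'),
                  # same shapes as the points: 21 (circle) for arms, 24 (triangle) for legs
                  pch = c(21, 21, 24, 24),
                  # same fill colours as the points: green for left, red for right
                  pt.bg = c('green', 'red', 'green', 'red'),
                  # outline colour of the symbols:
                  col = 'black',
                  # size of the symbols:
                  pt.cex = 1.5,
                  # no box around the legend:
                  bty = 'n')
```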

# <font color='Purple'>6. Session complete</font>

<hr>

You have just made some manuscript-quality figures for your Genetic Analysis assignment.  

## <font color='Purple'>Well done!</font>

### <font color='Purple'>Before you leave...</font>

Make sure you have downloaded all your figures.  
You don't want to have lost them to the ether!  

**Ask a demonstrator if there are any issues.**  

### <font color='Purple'>Interesting point...</font>
All your figures have been made in an entirely reproducible manner.

### <font color='Purple'>Welcome to the land of proper science!</font>  

You can, at a `ctrl` + `return` press, vomit the plots out whenever needed! Good, right? I'd say so!
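One caveat: `metaMDS()` uses random starting configurations, so two runs can give slightly different ordinations. Fixing *R*'s random number generator with `set.seed()` first (for example, at the top of the recreate-the-data cell) makes the ordination repeat exactly. Here is a sketch using vegan's built-in `dune` data; the seed value 42 is arbitrary:

```r
library(vegan)
data(dune)

# same seed before each run -> same random starts -> same ordination
set.seed(42)
ord_a = metaMDS(dune, distance = 'bray', k = 2, trace = FALSE)

set.seed(42)
ord_b = metaMDS(dune, distance = 'bray', k = 2, trace = FALSE)

# compare the site coordinates of the two runs:
all.equal(ord_a$points, ord_b$points)
```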

Anyway, until tomorrow...  
# <font color='Purple'>dev.off()</font>

