# BEES1041 Exploring the Natural World #
# Week 4 Computer Exercise - Tree locations #
***
Last week, you looked at the diameter at breast height (DBH) and height data from the trees measured in Centennial Park. This week you will look at the locations of the trees that you recorded using your phones. First, you need to remove the repeated measurements from the data and summarise the tree locations. You can do this by calculating the median latitude and longitude for each tree name. The median is less influenced by outliers than the mean, so it won't be as affected by incorrect locations.
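
As a quick illustration of why the median is preferred here, the sketch below compares the mean and median of five made-up latitude readings for a single tree, one of which is a large GPS error. The values are invented for the example and are not taken from the class data.

```r
# Five made-up latitude readings for one tree; the last is a GPS error
lat <- c(-33.8925, -33.8926, -33.8924, -33.8925, -33.8000)

mean_lat   <- mean(lat)    # pulled towards the bad reading
median_lat <- median(lat)  # stays with the bulk of the readings

print(c(mean = mean_lat, median = median_lat))
```

The single bad reading moves the mean by almost 0.02 degrees, while the median stays at the value shared by the good readings.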

***

The first section of code repeats what you did last week to remove outliers and create a dataframe with a subset of the measurements.
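
The outlier filter keeps only the measurements that lie within a set tolerance of their tree's median. A minimal sketch of that idea on made-up data is shown here. It uses base R's `ave()` to attach the per-tree median to each row, which is an alternative to the `aggregate()` and `merge()` steps used in this exercise, and the data frame `toy` is invented for the example.

```r
# Made-up measurements: tree "A" has one implausible DBH value
toy <- data.frame(
  Name   = c("A", "A", "A", "B", "B"),
  dbh_cm = c(30, 31, 80, 55, 56)
)

# ave() returns the per-tree median, repeated for every row of that tree
toy$median_dbh_cm <- ave(toy$dbh_cm, toy$Name, FUN = median)
toy$dbh_dif <- abs(toy$dbh_cm - toy$median_dbh_cm)

# Keep rows that are within 5 cm of their tree's median
toy_subset <- subset(toy, dbh_dif <= 5)
print(toy_subset)
```

Only the 80 cm reading for tree "A" is dropped, because it is 49 cm from that tree's median of 31 cm.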

In [None]:
library(dplyr)

# Read in the data 
tree_data <- read.csv('bees1041_tree_data_2024.csv')

# Remove the DBH outliers
trees_median_dbh <- aggregate(tree_data$dbh_cm, by = list(tree_data$Name), FUN = median)
colnames(trees_median_dbh) <- c("Name", "median_dbh_cm")
tree_data_merged <- merge(tree_data, trees_median_dbh, by = 'Name')
tree_data_merged$dbh_dif <- abs(tree_data_merged$dbh_cm - tree_data_merged$median_dbh_cm) # calculate the difference between the values and the medians
tree_data_subset <- subset(tree_data_merged, dbh_dif <= 5) # only keep data where the difference is no more than 5 cm

# Remove the height outliers
trees_median_height <- aggregate(tree_data_subset$Tree_height_m, by = list(tree_data_subset$Name), FUN = median)
colnames(trees_median_height) <- c("Name", "median_tree_height_m")
tree_data_subset <- merge(tree_data_subset, trees_median_height, by = 'Name')
tree_data_subset$height_dif <- abs(tree_data_subset$Tree_height_m - tree_data_subset$median_tree_height_m) # calculate the difference between the values and the medians
tree_data_subset <- subset(tree_data_subset, height_dif <= 10) # only keep data where the difference is no more than 10 m

# Remove trees that were only measured once
tree_count <- aggregate(tree_data_subset$dbh_cm, by = list(tree_data_subset$Name), FUN = NROW)
colnames(tree_count) <- c("Name", "times_measured")
tree_data_subset <- merge(tree_data_subset, tree_count, by = 'Name')
tree_data_subset <- subset(tree_data_subset, times_measured > 1)

Now we should check the latitude and longitude values. These should all be within the bounding box of the site, which is:
* Latitude should be between -33.893244 and -33.891966 
* Longitude should be between 151.235847 and 151.237977

We can remove values outside the bounding box by setting them to `NA`.
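
To see how this works, the sketch below applies the latitude limits to four made-up readings. It combines the two comparisons into one logical vector (`outside`, a name chosen for this example) and then counts how many readings were removed.

```r
# Made-up latitudes: the second is too far south, the fourth too far north
lat <- c(-33.8925, -33.8950, -33.8921, -33.8900)

lat_min <- -33.893244
lat_max <- -33.891966

# Flag values outside the range in one step, then set them to NA
outside <- lat < lat_min | lat > lat_max
lat[outside] <- NA

print(lat)
sum(is.na(lat))  # number of readings removed
```

Setting values to `NA` rather than deleting rows keeps the DBH and height measurements from the same row available for later steps.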

In [None]:
tree_data_subset$Latitude[tree_data_subset$Latitude <  -33.893244] <- NA
tree_data_subset$Latitude[tree_data_subset$Latitude >  -33.891966] <- NA
tree_data_subset$Longitude[tree_data_subset$Longitude < 151.235847] <- NA
tree_data_subset$Longitude[tree_data_subset$Longitude > 151.237977] <- NA

Now we can create a dataframe containing the median DBH, height, latitude and longitude for each tree, and save it as a CSV file.
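
Because some locations are now `NA`, the medians need `na.rm = TRUE`. The sketch below, on made-up data, shows what to expect: missing values are ignored when a tree has other valid readings, and a tree with no valid readings gets a median of `NA`.

```r
# Made-up data: tree "A" has one missing latitude, tree "B" has none recorded
toy <- data.frame(
  Name     = c("A", "A", "A", "B", "B"),
  Latitude = c(-33.8925, NA, -33.8927, NA, NA)
)

# na.rm = TRUE is passed through aggregate() to median()
toy_median <- aggregate(toy$Latitude, by = list(toy$Name),
                        FUN = median, na.rm = TRUE)
colnames(toy_median) <- c("Name", "median_latitude")
print(toy_median)
```

Tree "B" stays in the result with a missing median, so it is worth checking the output for `NA` locations before mapping the trees.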

In [None]:
# Median of each variable per tree; na.rm = TRUE ignores the locations that were set to NA
tree_data_median <- aggregate(cbind(tree_data_subset$dbh_cm, tree_data_subset$Tree_height_m, tree_data_subset$Latitude, tree_data_subset$Longitude),
                              by = list(tree_data_subset$Name), FUN = median, na.rm = TRUE)
colnames(tree_data_median) <- c("Name", "median_dbh_cm", "median_height_m", "median_latitude", "median_longitude")
write.csv(tree_data_median, "bees1041_tree_medians_2024.csv", row.names=FALSE)

Lastly, we can calculate the median distance between each recorded location and the median location of its tree.
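
Latitude and longitude are in degrees, so a distance calculated directly from them is also in degrees, and a degree of longitude covers less ground than a degree of latitude away from the equator. The sketch below shows one way to convert small differences to metres using separate scale factors. It assumes a spherical Earth with a radius of 6,371 km and a site latitude near the middle of the bounding box. The helper `loc_diff_m` is hypothetical and not part of the course code.

```r
# Approximate metres per degree on a spherical Earth (assumed radius 6,371 km)
earth_radius_m <- 6371000
m_per_deg_lat  <- pi * earth_radius_m / 180                  # about 111 km
site_lat       <- -33.8926                                   # roughly the middle of the site
m_per_deg_lon  <- m_per_deg_lat * cos(site_lat * pi / 180)   # about 92 km

# Hypothetical helper: distance in metres between two nearby points
loc_diff_m <- function(lat1, lon1, lat2, lon2) {
  dy <- (lat1 - lat2) * m_per_deg_lat
  dx <- (lon1 - lon2) * m_per_deg_lon
  sqrt(dx^2 + dy^2)
}

# Two made-up points 0.0001 degrees apart in each direction
loc_diff_m(-33.8925, 151.2365, -33.8926, 151.2366)
```

This approximation is only suitable for short distances such as those within the site.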

In [None]:
# Merge the median values into the original data
tree_data_merged <- merge(tree_data_subset, tree_data_median, by = 'Name')     

# Calculate the difference in location (in degrees)
tree_data_merged$loc_diff <- sqrt((tree_data_merged$median_latitude - tree_data_merged$Latitude)^2 +
                                  (tree_data_merged$median_longitude - tree_data_merged$Longitude)^2)

# Calculate the median difference in location (degrees)
median_degrees_loc_dif <- median(tree_data_merged$loc_diff, na.rm=TRUE)

# Approximate conversion to metres: 100,000 m per degree is a rough figure that only suits latitudes near Sydney's
median_degrees_loc_dif * 100000