# Integrate biogeographic traits with a geographic map

To analyze microbiome data together with metadata derived from a geographic map, we also implemented several functions that merge microbial traits into a `SpatialPolygonsDataFrame` or extract a new metadata table from a `SpatialPolygonsDataFrame`. With these functions, we can run statistical comparisons across administrative areas or grid cells.

This section of the [microgeo](https://github.com/ChaonanLi/microgeo) R package tutorial requires three R packages. Run the following code to load them into the R session.

```r
# magrittr is loaded first without a pipe, because it is the package that provides `%>%`
suppressMessages(require("magrittr")) 
require("ggplot2")  %>% suppressMessages()
require("microgeo") %>% suppressMessages()
```

If Chinese characters are not displayed correctly, run the following code to set the locale to `UTF-8`:

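```r
# Sys.setlocale() returns the new locale string ("" if the request cannot be honoured).
# `C.UTF-8` exists on most Linux systems; other platforms may need e.g. "en_US.UTF-8".
cur_locale <- Sys.setlocale("LC_CTYPE", "C.UTF-8") 
```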

We need a standard microgeo dataset for the demonstrations in this section of the tutorial.

```r
# Use the map downloaded from DataV.GeoAtlas
# (adcodes 540000, 630000 and 510000 are Tibet, Qinghai and Sichuan)
data(qtp)
map <- read_aliyun_map(adcode = c(540000, 630000, 510000)) %>% suppressMessages() 
dataset.dts.aliyun <- create_dataset(mat = qtp$asv, ant = qtp$tax, met = qtp$met, map = map,
                                     phy = qtp$tre, env = qtp$env, lon = "longitude", lat = "latitude")
dataset.dts.aliyun %<>% rarefy_count_table()
dataset.dts.aliyun %<>% tidy_dataset()
dataset.dts.aliyun %<>% calc_alpha_div(measures = c("observed", "shannon")) 
dataset.dts.aliyun %<>% calc_beta_div(measures = c("bray", "jaccard")) 
dataset.dts.aliyun %>% show_dataset()
```

[36mℹ[39m [2024-01-09 22:08:12] [34m[3m[34mINFO[34m[23m[39m ==> all samples fall within the map area!

[36mℹ[39m [2024-01-09 22:08:12] [34m[3m[34mINFO[34m[23m[39m ==> dataset has been created successfully!

[36mℹ[39m [2024-01-09 22:08:12] [34m[3m[34mINFO[34m[23m[39m ==> use `object %>% show_dataset()` to check the summary of dataset.

[36mℹ[39m [2024-01-09 22:08:16] [34m[3m[34mINFO[34m[23m[39m ==> the ASV/gene abundance table has been rarefied with a sub-sample depth of 5310

[32m✔[39m [2024-01-09 22:08:20] [32m[3m[32mSAVE[32m[23m[39m ==> results have been saved to: object$div$alpha

[32m✔[39m [2024-01-09 22:09:04] [32m[3m[32mSAVE[32m[23m[39m ==> results have been saved to: object$div$beta



[34m──[39m [34mThe Summary of Microgeo Dataset[39m [34m─────────────────────────────────────────────[39m


[36mℹ[39m object$mat: 6808 ASVs/genes and 1244 samples [32m[3m[32m[subsample depth: 5310][32m[23m[39m

[36mℹ[39m object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

[36mℹ[39m object$met: 1244 samples and 2 variables (longitude, latitude)

[36mℹ[39m object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

[36mℹ[39m object$phy: a phylogenetic tree with 6808 tip labels

[36mℹ[39m object$env: 1244 samples and 10 variables




[30m──[39m [30mThe Summary of Biogeographic Traits[39m [30m─────────────────────────────────────────[39m


[32m✔[39m object$div$alpha: 2 alpha diversity index/indices (observed, shannon)

[32m✔[39m object$div$beta: 2 beta diversity distance matrix/matrices (bray, jaccard)




[44m• To check the summary of dataset, Replace `object` with the variable name of your dataset[49m
[44m• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`[49m


Now, let's go through each of these functions and see how they are used.

## 1. Merge a `data.frame` with a map

First, we inspect the `data.frame` of alpha diversity indices and the attribute table of the `SpatialPolygonsDataFrame`.

```r
# Check the data.frame of alpha diversity indices 
head(dataset.dts.aliyun$div$alpha)
```

| | observed | shannon |
|---|---|---|
| | `<dbl>` | `<dbl>` |
| s1 | 996 | 6.358363 |
| s2 | 944 | 6.247542 |
| s3 | 825 | 6.115851 |
| s4 | 1063 | 6.303401 |
| s5 | 889 | 6.189555 |
| s6 | 976 | 6.220709 |


```r
# Check the `SpatialPolygonsDataFrame`
head(dataset.dts.aliyun$map@data)
```

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | DataV.GeoAtlas | microgeo | 西藏自治区 | 88.38828 | 31.56375 |
| 2 | DataV.GeoAtlas | microgeo | 青海省 | 96.04353 | 35.7264 |
| 3 | DataV.GeoAtlas | microgeo | 四川省 | 102.69345 | 30.67454 |


```r
# Replace the Chinese polygon names with English ones (matched by row order)
dataset.dts.aliyun$map@data$NAME <- c("Tibet", "Qinghai", "Sichuan") 
head(dataset.dts.aliyun$map@data)
```

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | DataV.GeoAtlas | microgeo | Tibet | 88.38828 | 31.56375 |
| 2 | DataV.GeoAtlas | microgeo | Qinghai | 96.04353 | 35.7264 |
| 3 | DataV.GeoAtlas | microgeo | Sichuan | 102.69345 | 30.67454 |
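Because the renaming above relies on the row order of `map@data`, a reordered map would silently receive the wrong labels. A minimal base-R sketch of a name-based alternative, assuming the `NAME` column still holds the original Chinese province names; `name_lookup` and `rename_by_lookup()` are hypothetical helpers, not part of microgeo:

```r
# Hypothetical lookup table: original DataV.GeoAtlas names -> English names
name_lookup <- c("西藏自治区" = "Tibet", "青海省" = "Qinghai", "四川省" = "Sichuan")

# Hypothetical helper: rename by matching names instead of relying on row order
rename_by_lookup <- function(x, lookup) {
  unknown <- setdiff(x, names(lookup))
  if (length(unknown) > 0) stop("No English name for: ", paste(unknown, collapse = ", "))
  unname(lookup[x])
}

# A plain data.frame standing in for map@data, deliberately in a different row order
attr_table <- data.frame(NAME = c("青海省", "西藏自治区", "四川省"))
attr_table$NAME <- rename_by_lookup(attr_table$NAME, name_lookup)
print(attr_table$NAME)
```

Applied to the real object, the same call would take the place of the positional assignment: `dataset.dts.aliyun$map@data$NAME <- rename_by_lookup(dataset.dts.aliyun$map@data$NAME, name_lookup)`. An unmatched name stops with an error instead of being mislabeled.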


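Next, we merge the `data.frame` of alpha diversity indices into the `SpatialPolygonsDataFrame`. Samples are located on the map by their coordinates in `met`, and each index is summarized per polygon (here with `med = 'mean'`), together with its standard deviation, standard error and the number of samples.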

```r
# Merge data to a `SpatialPolygonsDataFrame`
common.map.mean4df <- merge_dfs_to_map(map = dataset.dts.aliyun$map, dat = dataset.dts.aliyun$div$alpha, 
                                       met = dataset.dts.aliyun$met, med = 'mean')
head(common.map.mean4df@data[,1:12])
# Now, you can visualize the microbial traits (alpha diversity indices) onto a map
```

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER | observed_mean | shannon_mean | observed_sd | shannon_sd | observed_se | shannon_se | sample.num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 1 | DataV.GeoAtlas | microgeo | Tibet | 88.38828 | 31.56375 | 663.5305 | 5.847504 | 239.0472 | 0.4765782 | 10.44282 | 0.02081942 | 524 |
| 2 | DataV.GeoAtlas | microgeo | Qinghai | 96.04353 | 35.7264 | 648.2873 | 5.838381 | 246.935 | 0.5385664 | 11.7455 | 0.02561699 | 442 |
| 3 | DataV.GeoAtlas | microgeo | Sichuan | 102.69345 | 30.67454 | 706.2662 | 5.975532 | 198.4432 | 0.3661972 | 11.90183 | 0.02196305 | 278 |

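The `_sd` and `_se` columns above are consistent with the usual standard error of the mean, `se = sd / sqrt(n)` with `n = sample.num` (for Tibet, 239.0472 / √524 ≈ 10.443). A minimal base-R sketch of such a per-region summary, assuming every sample has already been assigned to exactly one region; `summarize_by_region()` is a hypothetical helper, not microgeo's implementation, and the toy input values are purely illustrative:

```r
# Hypothetical helper: summarize numeric trait columns per region, producing
# <trait>_mean, <trait>_sd, <trait>_se (= sd / sqrt(n)) and sample.num
summarize_by_region <- function(traits, region) {
  stopifnot(nrow(traits) == length(region))
  groups <- split(traits, region)                # one data.frame per region
  rows <- lapply(groups, function(g) {
    n <- nrow(g)
    m <- vapply(g, mean, numeric(1))             # column means
    s <- vapply(g, sd, numeric(1))               # column standard deviations
    stats <- c(setNames(m, paste0(names(g), "_mean")),
               setNames(s, paste0(names(g), "_sd")),
               setNames(s / sqrt(n), paste0(names(g), "_se")))
    data.frame(as.list(stats), sample.num = n, check.names = FALSE)
  })
  res <- do.call(rbind, rows)
  data.frame(NAME = names(groups), res, row.names = NULL, check.names = FALSE)
}

# Toy input (illustrative values only): two traits, six samples, two regions
traits <- data.frame(observed = c(900, 1000, 1100, 600, 700, 800),
                     shannon  = c(6.0, 6.2, 6.4, 5.5, 5.7, 5.9),
                     row.names = paste0("s", 1:6))
region <- c("Tibet", "Tibet", "Tibet", "Qinghai", "Qinghai", "Qinghai")
summarize_by_region(traits, region)
```

The standard error shrinks with `sqrt(n)`, which is why the three provinces, each with hundreds of samples, have small `_se` values despite large standard deviations.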

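We can also merge the `data.frame` of alpha diversity indices with a gridded `SpatialPolygonsDataFrame`. Here `grid_map()` splits the map into cells with `res = 1.5`, expressed in map units, i.e. degrees for this longitude/latitude CRS.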

```r
# Grid the map [`SpatialPolygonsDataFrame`]
gridded.map <- grid_map(map = dataset.dts.aliyun$map, res = 1.5) %>% suppressMessages()
head(gridded.map@data)
```

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | Gridded.Map | microgeo | 1 | 83.74702 | 29.73742 |
| 2 | Gridded.Map | microgeo | 2 | 85.46302 | 28.50944 |
| 3 | Gridded.Map | microgeo | 3 | 86.67299 | 28.33105 |
| 4 | Gridded.Map | microgeo | 4 | 89.49169 | 28.25211 |
| 5 | Gridded.Map | microgeo | 5 | 88.12468 | 28.29693 |
| 6 | Gridded.Map | microgeo | 6 | 85.14919 | 29.46224 |

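Conceptually, gridding at `res = 1.5` places each sample in a 1.5° × 1.5° cell. A minimal base-R sketch of that binning step for plain coordinates, assuming a regular grid anchored by default at the lowest longitude and latitude of the input points. microgeo's `grid_map()` clips cells to the map boundary and numbers them in its own order (the IDs and centers above do not follow a simple row-major layout), so `assign_grid_cell()` is a hypothetical helper that only illustrates the idea:

```r
# Hypothetical helper: assign points to cells of a regular lon/lat grid
assign_grid_cell <- function(lon, lat, res, xmin = min(lon), ymin = min(lat)) {
  col <- floor((lon - xmin) / res)     # 0-based column index
  row <- floor((lat - ymin) / res)     # 0-based row index
  data.frame(cell = paste(col, row, sep = "_"),
             center.lon = xmin + (col + 0.5) * res,   # center of the unclipped cell
             center.lat = ymin + (row + 0.5) * res)
}

# A few coordinates taken from the tables in this section (s1, s6 and two cell centers)
lon <- c(98.20894, 98.20639, 83.74702, 85.46302)
lat <- c(33.10321, 33.10280, 29.73742, 28.50944)
assign_grid_cell(lon, lat, res = 1.5)
```

Samples s1 and s6 are only a few hundred meters apart and therefore fall into the same cell, so they are summarized together when traits are merged into the gridded map.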

```r
# Merge data to a gridded map
gridded.map.mean4df <- merge_dfs_to_map(map = gridded.map, dat = dataset.dts.aliyun$div$alpha, 
                                        met = dataset.dts.aliyun$met, med = 'mean')
head(gridded.map.mean4df@data[,1:12])
# Now, you can visualize the microbial traits (alpha diversity indices) onto a map
```

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER | observed_mean | shannon_mean | observed_sd | shannon_sd | observed_se | shannon_se | sample.num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 1 | Gridded.Map | microgeo | 1 | 83.74702 | 29.73742 | 590.7692 | 5.808356 | 149.3855 | 0.2288053 | 41.43208 | 0.06345917 | 13 |
| 2 | Gridded.Map | microgeo | 2 | 85.46302 | 28.50944 | 691.0 | 5.942714 | 192.7978 | 0.2783227 | 111.31187 | 0.16068969 | 3 |
| 3 | Gridded.Map | microgeo | 3 | 86.67299 | 28.33105 | 551.1538 | 5.772307 | 146.3824 | 0.2845499 | 40.59917 | 0.07891995 | 13 |
| 4 | Gridded.Map | microgeo | 4 | 89.49169 | 28.25211 | 664.75 | 5.780116 | 304.9572 | 0.5361339 | 152.47862 | 0.26806695 | 4 |
| 5 | Gridded.Map | microgeo | 5 | 88.12468 | 28.29693 | 470.6 | 5.425728 | 152.2009 | 0.4079441 | 39.2981 | 0.10533071 | 15 |
| 6 | Gridded.Map | microgeo | 6 | 85.14919 | 29.46224 | 556.2083 | 5.7544 | 148.4866 | 0.2343524 | 30.30969 | 0.04783698 | 24 |

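Grid cells hold very different numbers of samples (3 to 24 in the rows above), and sparsely sampled cells get large standard errors (an `observed_se` of 111.3 for cell 2, which has 3 samples). A small sketch, assuming you want to drop sparse cells before mapping or testing; the threshold `min_n = 5` is an arbitrary illustration, and the data frame simply re-types a few columns from the table above:

```r
# A few columns copied from the gridded output above
cells <- data.frame(NAME = c("1", "2", "3", "4", "5", "6"),
                    observed_mean = c(590.7692, 691.0, 551.1538, 664.75, 470.6, 556.2083),
                    observed_se = c(41.43208, 111.31187, 40.59917, 152.47862, 39.2981, 30.30969),
                    sample.num = c(13L, 3L, 13L, 4L, 15L, 24L))

# Keep only cells with at least `min_n` samples (threshold chosen for illustration only)
min_n <- 5
well_sampled <- subset(cells, sample.num >= min_n)
print(well_sampled$NAME)
```

On the real object, assuming `sample.num` has no missing values, `gridded.map.mean4df[gridded.map.mean4df$sample.num >= min_n, ]` should subset the `SpatialPolygonsDataFrame` itself, since sp supports row indexing and `$` access on `Spatial*DataFrame` objects, keeping geometry and attributes aligned.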

## 2. Merge a distance `matrix` with a map

First, we inspect the distance `matrix` and the attribute table of the `SpatialPolygonsDataFrame`.

```r
# Check the distance matrix 
dataset.dts.aliyun$div$beta$bray[1:5, 1:5]
```

| | s1 | s2 | s3 | s4 | s5 |
|---|---|---|---|---|---|
| s1 | 0.0 | 0.4721281 | 0.5806026 | 0.4804143 | 0.4103578 |
| s2 | 0.4721281 | 0.0 | 0.4896422 | 0.4080979 | 0.4020716 |
| s3 | 0.5806026 | 0.4896422 | 0.0 | 0.4062147 | 0.4887006 |
| s4 | 0.4804143 | 0.4080979 | 0.4062147 | 0.0 | 0.4148776 |
| s5 | 0.4103578 | 0.4020716 | 0.4887006 | 0.4148776 | 0.0 |


```r
# Check the `SpatialPolygonsDataFrame`
head(dataset.dts.aliyun$map@data)
```

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | DataV.GeoAtlas | microgeo | Tibet | 88.38828 | 31.56375 |
| 2 | DataV.GeoAtlas | microgeo | Qinghai | 96.04353 | 35.7264 |
| 3 | DataV.GeoAtlas | microgeo | Sichuan | 102.69345 | 30.67454 |


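Next, we merge the distance `matrix` into the `SpatialPolygonsDataFrame`. The `var` argument sets the prefix of the new columns (`var = 'bray'` produces `bray_mean`, `bray_sd` and `bray_se`).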

```r
# Merge distance matrix to a common map
common.map.mean4mx <- merge_mtx_to_map(map = dataset.dts.aliyun$map, dat = dataset.dts.aliyun$div$beta$bray, 
                                       met = dataset.dts.aliyun$met, var = 'bray', med = 'mean')
head(common.map.mean4mx@data[,1:9])
# Now, you can visualize the microbial traits (beta diversity distance matrix) onto a map
```

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER | bray_mean | bray_sd | bray_se | sample.num |
|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 1 | DataV.GeoAtlas | microgeo | Tibet | 88.38828 | 31.56375 | 0.8130689 | 0.1043734 | 0.0002819604 | 524 |
| 2 | DataV.GeoAtlas | microgeo | Qinghai | 96.04353 | 35.7264 | 0.7974938 | 0.1222063 | 0.0003914517 | 442 |
| 3 | DataV.GeoAtlas | microgeo | Sichuan | 102.69345 | 30.67454 | 0.7217441 | 0.1389409 | 0.0007080806 | 278 |
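For a distance matrix, each region is summarized from the pairwise distances among the samples it contains. The `bray_se` values above match `sd / sqrt(n * (n - 1) / 2)`, where `n * (n - 1) / 2` is the number of within-region sample pairs (for Tibet, 0.1043734 / √(524 × 523 / 2) ≈ 0.000282), which suggests that each pair is counted once. A minimal base-R sketch under that assumption, using the 5 × 5 Bray-Curtis block printed earlier in this section; `summarize_within()` is a hypothetical helper, not microgeo's code:

```r
# Bray-Curtis distances among s1-s5, copied from the output above
bray <- matrix(c(0.0,       0.4721281, 0.5806026, 0.4804143, 0.4103578,
                 0.4721281, 0.0,       0.4896422, 0.4080979, 0.4020716,
                 0.5806026, 0.4896422, 0.0,       0.4062147, 0.4887006,
                 0.4804143, 0.4080979, 0.4062147, 0.0,       0.4148776,
                 0.4103578, 0.4020716, 0.4887006, 0.4148776, 0.0),
               nrow = 5, byrow = TRUE,
               dimnames = list(paste0("s", 1:5), paste0("s", 1:5)))

# Hypothetical helper: summarize the distances among the samples of one region
summarize_within <- function(dist_mat, samples) {
  sub <- dist_mat[samples, samples, drop = FALSE]
  d <- sub[lower.tri(sub)]          # each unordered pair once, diagonal excluded
  n_pairs <- length(d)              # n * (n - 1) / 2
  c(mean = mean(d), sd = sd(d), se = sd(d) / sqrt(n_pairs),
    sample.num = length(samples))
}

# Treat s1-s5 as one region purely for illustration
summarize_within(bray, paste0("s", 1:5))
```

Using only the lower triangle avoids counting each pair twice and keeps the zero self-distances on the diagonal out of the mean.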


We can also merge a distance `matrix` with a gridded `SpatialPolygonsDataFrame`.

```r
# Grid the map (same call as in section 1, repeated so this part runs on its own)
gridded.map <- grid_map(map = dataset.dts.aliyun$map, res = 1.5) %>% suppressMessages()
head(gridded.map@data)
```

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | Gridded.Map | microgeo | 1 | 83.74702 | 29.73742 |
| 2 | Gridded.Map | microgeo | 2 | 85.46302 | 28.50944 |
| 3 | Gridded.Map | microgeo | 3 | 86.67299 | 28.33105 |
| 4 | Gridded.Map | microgeo | 4 | 89.49169 | 28.25211 |
| 5 | Gridded.Map | microgeo | 5 | 88.12468 | 28.29693 |
| 6 | Gridded.Map | microgeo | 6 | 85.14919 | 29.46224 |


```r
# Merge distance matrix to a gridded map
gridded.map.mean4mx <- merge_mtx_to_map(map = gridded.map, dat = dataset.dts.aliyun$div$beta$bray, 
                                        met = dataset.dts.aliyun$met, var = 'bray', med = 'mean')
head(gridded.map.mean4mx@data[,1:9])
# Now, you can visualize the microbial traits (beta diversity distance matrix) onto a map
```

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER | bray_mean | bray_sd | bray_se | sample.num |
|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 1 | Gridded.Map | microgeo | 1 | 83.74702 | 29.73742 | 0.6597953 | 0.09339442 | 0.010574836 | 13 |
| 2 | Gridded.Map | microgeo | 2 | 85.46302 | 28.50944 | 0.6454488 | 0.09660601 | 0.055775505 | 3 |
| 3 | Gridded.Map | microgeo | 3 | 86.67299 | 28.33105 | 0.7303081 | 0.13149567 | 0.014888953 | 13 |
| 4 | Gridded.Map | microgeo | 4 | 89.49169 | 28.25211 | 0.7624294 | 0.23476199 | 0.095841182 | 4 |
| 5 | Gridded.Map | microgeo | 5 | 88.12468 | 28.29693 | 0.7764918 | 0.13659809 | 0.013330609 | 15 |
| 6 | Gridded.Map | microgeo | 6 | 85.14919 | 29.46224 | 0.7703294 | 0.10821025 | 0.006513491 | 24 |


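## 3. Extract metadata table from a map

`extract_metadata_from_map()` works in the opposite direction: it matches each sample in `met` to the polygon (administrative area or grid cell) that contains it and returns that polygon's attributes, including any merged summary columns, as a per-sample metadata table.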

```r
# Extract metadata from a common map
# Rownames are sample IDs
# This new metadata table can be used for subsequent statistical analysis
metadata <- dataset.dts.aliyun$map %>% extract_metadata_from_map(met = dataset.dts.aliyun$met)
head(metadata)
```

| | longitude | latitude | NAME | TYPE | FMTS | X.CENTER | Y.CENTER |
|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| s1 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 |
| s2 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 |
| s3 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 |
| s4 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 |
| s5 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 |
| s6 | 98.20639 | 33.1028 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 |

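Each row of the extracted table is a sample from `met`, extended with the attributes of the polygon that contains it. A minimal base-R sketch of that lookup-and-join step, assuming simple single-ring polygons without holes and using the even-odd ray-casting test; `point_in_polygon()` and `attach_polygon_attributes()` are hypothetical helpers, and the two rectangles are toy shapes, not real province boundaries. For real `SpatialPolygonsDataFrame` objects, `sp::over()` performs this kind of point-in-polygon overlay.

```r
# Hypothetical helper: even-odd ray casting for one closed ring (vertices px, py)
point_in_polygon <- function(x, y, px, py) {
  inside <- FALSE
  j <- length(px)
  for (i in seq_along(px)) {
    # Does a horizontal ray from (x, y) to the right cross edge (j -> i)?
    crosses <- ((py[i] > y) != (py[j] > y)) &&
      (x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i])
    if (crosses) inside <- !inside
    j <- i
  }
  inside
}

# Hypothetical helper: attach the attributes of the containing polygon to each sample
attach_polygon_attributes <- function(met, polygons, attrs) {
  hit <- vapply(seq_len(nrow(met)), function(k) {
    idx <- which(vapply(polygons, function(p)
      point_in_polygon(met$longitude[k], met$latitude[k], p$x, p$y), logical(1)))
    if (length(idx) == 0) NA_integer_ else idx[1]   # NA if outside every polygon
  }, integer(1))
  out <- cbind(met, attrs[hit, , drop = FALSE])
  rownames(out) <- rownames(met)                    # keep sample IDs as row names
  out
}

# Toy polygons and samples (illustrative only)
polygons <- list(list(x = c(80, 90, 90, 80),  y = c(28, 28, 36, 36)),
                 list(x = c(90, 100, 100, 90), y = c(28, 28, 36, 36)))
attrs <- data.frame(NAME = c("West", "East"), X.CENTER = c(85, 95), Y.CENTER = c(32, 32))
met <- data.frame(longitude = c(98.20894, 88.5, 120), latitude = c(33.10321, 31.0, 30),
                  row.names = c("s1", "s2", "s3"))
attach_polygon_attributes(met, polygons, attrs)
```

Samples that fall outside every polygon get `NA` attributes here; in this tutorial's dataset that case does not arise, since `create_dataset()` reported that all samples fall within the map area.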

```r
# Extract metadata from a common map with additional data
# Rownames are sample IDs
# This new metadata table can be used for subsequent statistical analysis 
metadata.from.c.df <- common.map.mean4df %>% extract_metadata_from_map(met = dataset.dts.aliyun$met)
head(metadata.from.c.df)
```

| | longitude | latitude | NAME | TYPE | FMTS | X.CENTER | Y.CENTER | observed_mean | shannon_mean | observed_sd | shannon_sd | observed_se | shannon_se | sample.num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| s1 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 | 706.2662 | 5.975532 | 198.4432 | 0.3661972 | 11.90183 | 0.02196305 | 278 |
| s2 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 | 706.2662 | 5.975532 | 198.4432 | 0.3661972 | 11.90183 | 0.02196305 | 278 |
| s3 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 | 706.2662 | 5.975532 | 198.4432 | 0.3661972 | 11.90183 | 0.02196305 | 278 |
| s4 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 | 706.2662 | 5.975532 | 198.4432 | 0.3661972 | 11.90183 | 0.02196305 | 278 |
| s5 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 | 706.2662 | 5.975532 | 198.4432 | 0.3661972 | 11.90183 | 0.02196305 | 278 |
| s6 | 98.20639 | 33.1028 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 | 706.2662 | 5.975532 | 198.4432 | 0.3661972 | 11.90183 | 0.02196305 | 278 |


```r
# Extract metadata from a gridded map
# Rownames are sample IDs
# This new metadata table can be used for subsequent statistical analysis
metadata.from.g.mx <- gridded.map.mean4mx %>% extract_metadata_from_map(met = dataset.dts.aliyun$met)
head(metadata.from.g.mx)
```

| | longitude | latitude | NAME | TYPE | FMTS | X.CENTER | Y.CENTER | bray_mean | bray_sd | bray_se | sample.num |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| s1 | 98.20894 | 33.10321 | 101 | Gridded.Map | microgeo | 98.64554 | 32.45835 | 0.657489 | 0.1571897 | 0.004995818 | 45 |
| s2 | 98.20894 | 33.10321 | 101 | Gridded.Map | microgeo | 98.64554 | 32.45835 | 0.657489 | 0.1571897 | 0.004995818 | 45 |
| s3 | 98.20894 | 33.10321 | 101 | Gridded.Map | microgeo | 98.64554 | 32.45835 | 0.657489 | 0.1571897 | 0.004995818 | 45 |
| s4 | 98.20894 | 33.10321 | 101 | Gridded.Map | microgeo | 98.64554 | 32.45835 | 0.657489 | 0.1571897 | 0.004995818 | 45 |
| s5 | 98.20894 | 33.10321 | 101 | Gridded.Map | microgeo | 98.64554 | 32.45835 | 0.657489 | 0.1571897 | 0.004995818 | 45 |
| s6 | 98.20639 | 33.1028 | 101 | Gridded.Map | microgeo | 98.64554 | 32.45835 | 0.657489 | 0.1571897 | 0.004995818 | 45 |
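Because the extracted tables are keyed by sample ID, they can be aligned with per-sample traits such as `dataset.dts.aliyun$div$alpha` to compare traits across administrative areas or grid cells. A minimal sketch, assuming a Kruskal-Wallis test (base R `kruskal.test()`) suits your data; `compare_by_region()` is a hypothetical helper, and the input below is synthetic, generated only to exercise the code:

```r
# Hypothetical helper: align traits and metadata by sample ID, then test one trait across groups
compare_by_region <- function(traits, metadata, trait, group = "NAME") {
  ids <- intersect(rownames(traits), rownames(metadata))   # align by sample ID, not row order
  if (length(ids) == 0) stop("No shared sample IDs between traits and metadata")
  df <- data.frame(value = traits[ids, trait],
                   group = factor(metadata[ids, group]))
  kruskal.test(value ~ group, data = df)
}

# Synthetic input for illustration only (not the tutorial's data)
set.seed(1)
ids <- paste0("s", 1:30)
traits <- data.frame(shannon = c(rnorm(15, mean = 5.8, sd = 0.3),
                                 rnorm(15, mean = 6.3, sd = 0.3)),
                     row.names = ids)
metadata <- data.frame(NAME = rep(c("Tibet", "Sichuan"), each = 15), row.names = ids)
metadata <- metadata[sample(nrow(metadata)), , drop = FALSE]   # shuffle the row order
compare_by_region(traits, metadata, "shannon")
```

With the tutorial objects, the equivalent call would be `compare_by_region(dataset.dts.aliyun$div$alpha, metadata, "shannon")`, comparing Shannon diversity among Tibet, Qinghai and Sichuan; passing `metadata.from.g.mx` instead would compare grid cells.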
