# Integrate biogeographic traits with a geographic map

To support microbiome data analysis that draws on metadata from a geographic map, we also implemented several functions that merge microbial traits with a `SpatialPolygonsDataFrame` or extract a new metadata table from a `SpatialPolygonsDataFrame`. With these functions, we can make statistical comparisons across administrative areas or grid cells.

This section of the [microgeo](https://github.com/ChaonanLi/microgeo) R package tutorial needs three R packages. Run the following code to load them into the R environment.

In [1]:
suppressMessages(require("magrittr")) 
require("ggplot2")  %>% suppressMessages()
require("microgeo") %>% suppressMessages()

If Chinese characters are not displayed correctly, run the following code to set the locale to `UTF-8`:

In [2]:
# Sys.setlocale() returns the locale now in use, or "" if the request could not be honoured
new_locale <- Sys.setlocale("LC_CTYPE", "C.UTF-8")

We need a standard microgeo dataset for the demonstrations in this section of the tutorial.

In [3]:
# Use the map downloaded from DataV.GeoAtlas
data(qtp)
map <- read_aliyun_map(adcode = c(540000, 630000, 510000)) %>% suppressMessages() 
dataset.dts.aliyun <- create_dataset(mat = qtp$asv, ant = qtp$tax, met = qtp$met, map = map,
                                     phy = qtp$tre, env = qtp$env, lon = "longitude", lat = "latitude")
dataset.dts.aliyun %<>% rarefy_count_table()
dataset.dts.aliyun %<>% tidy_dataset()
dataset.dts.aliyun %<>% calc_alpha_div(measures = c("observed", "shannon")) 
dataset.dts.aliyun %<>% calc_beta_div(measures = c("bray", "jaccard")) 
dataset.dts.aliyun %>% show_dataset()

ℹ [2024-01-12 21:56:07] INFO ==> all samples fall within the map area!

ℹ [2024-01-12 21:56:07] INFO ==> dataset has been created successfully!

ℹ [2024-01-12 21:56:07] INFO ==> use `object %>% show_dataset()` to check the summary of dataset.

ℹ [2024-01-12 21:56:11] INFO ==> the ASV/gene abundance table has been rarefied with a sub-sample depth of 5310

✔ [2024-01-12 21:56:16] SAVE ==> results have been saved to: object$div$alpha

✔ [2024-01-12 21:57:01] SAVE ==> results have been saved to: object$div$beta

── The Summary of Microgeo Dataset ─────────────────────────────────────────────

ℹ object$mat: 6808 ASVs/genes and 1244 samples [subsample depth: 5310]

ℹ object$ant: 6808 ASVs/genes and 7 annotation levels (Kingdom, Phylum, Class, Order, Family, Genus, Species)

ℹ object$met: 1244 samples and 2 variables (longitude, latitude)

ℹ object$map: a SpatialPolygonsDataFrame with the CRS of '+proj=longlat +datum=WGS84 +no_defs'

ℹ object$phy: a phylogenetic tree with 6808 tip labels

ℹ object$env: 1244 samples and 10 variables

── The Summary of Biogeographic Traits ─────────────────────────────────────────

✔ object$div$alpha: 2 alpha diversity index/indices (observed, shannon)

✔ object$div$beta: 2 beta diversity distance matrix/matrices (bray, jaccard)

• To check the summary of dataset, Replace `object` with the variable name of your dataset
• For example, if the variable name is `dataset.dts`you can run `head(dataset.dts$met)` to check the content of `met`


Now, let's go through each of these functions and see how they are used.

## 1. Merge a `data.frame` with a map

First, we check the `data.frame` of alpha diversity indices and the `SpatialPolygonsDataFrame`.

In [4]:
# Check the data.frame of alpha diversity indices 
head(dataset.dts.aliyun$div$alpha)

| | observed | shannon |
|---|---|---|
| | `<dbl>` | `<dbl>` |
| s1 | 1002 | 6.330193 |
| s2 | 928 | 6.248348 |
| s3 | 851 | 6.133129 |
| s4 | 1033 | 6.264501 |
| s5 | 893 | 6.184729 |
| s6 | 977 | 6.192993 |


In [5]:
# Check the `SpatialPolygonsDataFrame`
head(dataset.dts.aliyun$map@data)

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | DataV.GeoAtlas | microgeo | 西藏自治区 | 88.38828 | 31.56375 |
| 2 | DataV.GeoAtlas | microgeo | 青海省 | 96.04353 | 35.7264 |
| 3 | DataV.GeoAtlas | microgeo | 四川省 | 102.69345 | 30.67454 |


In [6]:
# Change the names of Polygons
dataset.dts.aliyun$map@data$NAME <- c("Tibet", "Qinghai", "Sichuan") 
head(dataset.dts.aliyun$map@data)

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | DataV.GeoAtlas | microgeo | Tibet | 88.38828 | 31.56375 |
| 2 | DataV.GeoAtlas | microgeo | Qinghai | 96.04353 | 35.7264 |
| 3 | DataV.GeoAtlas | microgeo | Sichuan | 102.69345 | 30.67454 |


Then, we merge the `data.frame` of alpha diversity indices with a `SpatialPolygonsDataFrame`.

In [7]:
# Merge data to a `SpatialPolygonsDataFrame`
common.map.mean4df <- merge_dfs_to_map(map = dataset.dts.aliyun$map, dat = dataset.dts.aliyun$div$alpha, 
                                       met = dataset.dts.aliyun$met, med = 'mean')
head(common.map.mean4df@data[,1:12])
# Now you can visualize the microbial traits (alpha diversity indices) on a map

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER | observed_mean | shannon_mean | observed_sd | shannon_sd | observed_se | shannon_se | sample.num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 1 | DataV.GeoAtlas | microgeo | Tibet | 88.38828 | 31.56375 | 663.8435 | 5.847498 | 239.1056 | 0.4768206 | 10.44538 | 0.02083001 | 524 |
| 2 | DataV.GeoAtlas | microgeo | Qinghai | 96.04353 | 35.7264 | 648.0747 | 5.838778 | 246.3986 | 0.5393605 | 11.71999 | 0.02565476 | 442 |
| 3 | DataV.GeoAtlas | microgeo | Sichuan | 102.69345 | 30.67454 | 706.1151 | 5.974249 | 197.868 | 0.3664406 | 11.86733 | 0.02197765 | 278 |
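
The summary columns above are consistent with a per-polygon aggregation in which the standard error is the standard deviation divided by the square root of the sample count (for Tibet, 239.1056 / sqrt(524) ≈ 10.445, which matches `observed_se`). The following is a minimal base R sketch of that kind of aggregation. It is not microgeo's implementation: the helper `summarise_by_region()`, the `region` assignment vector, and the toy values are all assumptions made for illustration.

```r
# Toy alpha diversity table; row names are sample IDs (hypothetical values)
alpha <- data.frame(
  observed = c(1002, 928, 851, 1033, 893, 977),
  shannon  = c(6.33, 6.25, 6.13, 6.26, 6.18, 6.19),
  row.names = paste0("s", 1:6)
)

# Hypothetical assignment of each sample to a polygon
region <- c(s1 = "Tibet", s2 = "Tibet", s3 = "Tibet",
            s4 = "Qinghai", s5 = "Qinghai", s6 = "Qinghai")

# Hypothetical helper: mean, sd, se and sample count of every column per region
summarise_by_region <- function(dat, region) {
  groups <- split(dat, region[rownames(dat)])
  rows <- lapply(groups, function(g) {
    n <- nrow(g)
    m <- sapply(g, mean)
    s <- sapply(g, sd)
    c(setNames(m, paste0(names(g), "_mean")),
      setNames(s, paste0(names(g), "_sd")),
      setNames(s / sqrt(n), paste0(names(g), "_se")),  # se = sd / sqrt(n)
      sample.num = n)
  })
  as.data.frame(do.call(rbind, rows))
}

region.summary <- summarise_by_region(alpha, region)
print(region.summary)
```

Each row of `region.summary` corresponds to one region, which is the shape needed to attach the values to the matching polygons of a map.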


We can also merge the `data.frame` of alpha diversity indices with a gridded `SpatialPolygonsDataFrame`.

In [8]:
# Grid the map [`SpatialPolygonsDataFrame`]
gridded.map <- grid_map(map = dataset.dts.aliyun$map, res = 1.5) %>% suppressMessages
head(gridded.map@data)

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | Gridded.Map | microgeo | 1 | 83.74702 | 29.73742 |
| 2 | Gridded.Map | microgeo | 2 | 85.46302 | 28.50944 |
| 3 | Gridded.Map | microgeo | 3 | 86.67299 | 28.33105 |
| 4 | Gridded.Map | microgeo | 4 | 89.49169 | 28.25211 |
| 5 | Gridded.Map | microgeo | 5 | 88.12468 | 28.29693 |
| 6 | Gridded.Map | microgeo | 6 | 85.14919 | 29.46224 |


In [9]:
# Merge data to a gridded map
gridded.map.mean4df <- merge_dfs_to_map(map = gridded.map, dat = dataset.dts.aliyun$div$alpha, 
                                        met = dataset.dts.aliyun$met, med = 'mean')
head(gridded.map.mean4df@data[,1:12])
# Now you can visualize the microbial traits (alpha diversity indices) on a map

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER | observed_mean | shannon_mean | observed_sd | shannon_sd | observed_se | shannon_se | sample.num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 1 | Gridded.Map | microgeo | 1 | 83.74702 | 29.73742 | 592.3077 | 5.810641 | 146.6625 | 0.2239471 | 40.67687 | 0.06211176 | 13 |
| 2 | Gridded.Map | microgeo | 2 | 85.46302 | 28.50944 | 690.6667 | 5.949724 | 195.8988 | 0.2909757 | 113.10222 | 0.16799491 | 3 |
| 3 | Gridded.Map | microgeo | 3 | 86.67299 | 28.33105 | 552.5385 | 5.773249 | 147.3197 | 0.2781269 | 40.85914 | 0.07713852 | 13 |
| 4 | Gridded.Map | microgeo | 4 | 89.49169 | 28.25211 | 664.75 | 5.776899 | 311.4979 | 0.5693539 | 155.74893 | 0.28467697 | 4 |
| 5 | Gridded.Map | microgeo | 5 | 88.12468 | 28.29693 | 473.2667 | 5.426322 | 152.8858 | 0.4156586 | 39.47494 | 0.10732258 | 15 |
| 6 | Gridded.Map | microgeo | 6 | 85.14919 | 29.46224 | 555.7917 | 5.754356 | 148.5275 | 0.2327175 | 30.31806 | 0.04750326 | 24 |


## 2. Merge a distance `matrix` with a map

First, we check the distance `matrix` and the `SpatialPolygonsDataFrame`.

In [10]:
# Check the distance matrix 
dataset.dts.aliyun$div$beta$bray[1:5, 1:5]

| | s1 | s2 | s3 | s4 | s5 |
|---|---|---|---|---|---|
| s1 | 0.0 | 0.4868173 | 0.5881356 | 0.4919021 | 0.4020716 |
| s2 | 0.4868173 | 0.0 | 0.4783427 | 0.4199623 | 0.3971751 |
| s3 | 0.5881356 | 0.4783427 | 0.0 | 0.3973635 | 0.4883239 |
| s4 | 0.4919021 | 0.4199623 | 0.3973635 | 0.0 | 0.419774 |
| s5 | 0.4020716 | 0.3971751 | 0.4883239 | 0.419774 | 0.0 |


In [11]:
# Check the `SpatialPolygonsDataFrame`
head(dataset.dts.aliyun$map@data)

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | DataV.GeoAtlas | microgeo | Tibet | 88.38828 | 31.56375 |
| 2 | DataV.GeoAtlas | microgeo | Qinghai | 96.04353 | 35.7264 |
| 3 | DataV.GeoAtlas | microgeo | Sichuan | 102.69345 | 30.67454 |


Then, we merge the distance `matrix` with a `SpatialPolygonsDataFrame`. 

In [12]:
# Merge distance matrix to a common map
common.map.mean4mx <- merge_mtx_to_map(map = dataset.dts.aliyun$map, dat = dataset.dts.aliyun$div$beta$bray, 
                                        met = dataset.dts.aliyun$met, var = 'bray', med = 'mean')
head(common.map.mean4mx@data[,1:9])
# Now you can visualize the microbial traits (beta diversity distance matrix) on a map

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER | bray_mean | bray_sd | bray_se | sample.num |
|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 1 | DataV.GeoAtlas | microgeo | Tibet | 88.38828 | 31.56375 | 0.8127623 | 0.1044117 | 0.0002820641 | 524 |
| 2 | DataV.GeoAtlas | microgeo | Qinghai | 96.04353 | 35.7264 | 0.7973775 | 0.1222975 | 0.0003917437 | 442 |
| 3 | DataV.GeoAtlas | microgeo | Sichuan | 102.69345 | 30.67454 | 0.7220015 | 0.1392198 | 0.0007095021 | 278 |
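
The `bray_se` values above are consistent with summarising the pairwise distances among samples that fall in the same polygon: Tibet has 524 samples, hence 524 × 523 / 2 = 137,026 within-region pairs, and 0.1044117 / sqrt(137026) ≈ 0.000282. The base R sketch below illustrates that reading on a toy matrix. It is an illustration under that assumption, not microgeo's code; `summarise_dist_by_region()`, the `region` vector, and the distance values are hypothetical.

```r
# Toy symmetric distance matrix for five samples (hypothetical values)
ids <- paste0("s", 1:5)
d <- matrix(c(0.00, 0.49, 0.59, 0.49, 0.40,
              0.49, 0.00, 0.48, 0.42, 0.40,
              0.59, 0.48, 0.00, 0.40, 0.49,
              0.49, 0.42, 0.40, 0.00, 0.42,
              0.40, 0.40, 0.49, 0.42, 0.00),
            nrow = 5, dimnames = list(ids, ids))

# Hypothetical assignment of each sample to a polygon
region <- c(s1 = "A", s2 = "A", s3 = "A", s4 = "B", s5 = "B")

# Hypothetical helper: summarise within-region pairwise distances
summarise_dist_by_region <- function(d, region, var = "dist") {
  rows <- lapply(split(names(region), region), function(members) {
    sub <- d[members, members, drop = FALSE]
    pairs <- sub[lower.tri(sub)]  # each within-region pair counted once
    c(mean = mean(pairs), sd = sd(pairs),
      se = sd(pairs) / sqrt(length(pairs)),
      sample.num = length(members))
  })
  out <- as.data.frame(do.call(rbind, rows))
  names(out)[1:3] <- paste(var, names(out)[1:3], sep = "_")
  out
}

dist.summary <- summarise_dist_by_region(d, region, var = "bray")
print(dist.summary)
```

In this sketch a region with only two samples has a single pair, so its standard deviation and standard error are `NA`.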


We can also merge a distance `matrix` with a gridded `SpatialPolygonsDataFrame`.

In [13]:
# Grid the map 
gridded.map <- grid_map(map = dataset.dts.aliyun$map, res = 1.5) %>% suppressMessages
head(gridded.map@data)

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | Gridded.Map | microgeo | 1 | 83.74702 | 29.73742 |
| 2 | Gridded.Map | microgeo | 2 | 85.46302 | 28.50944 |
| 3 | Gridded.Map | microgeo | 3 | 86.67299 | 28.33105 |
| 4 | Gridded.Map | microgeo | 4 | 89.49169 | 28.25211 |
| 5 | Gridded.Map | microgeo | 5 | 88.12468 | 28.29693 |
| 6 | Gridded.Map | microgeo | 6 | 85.14919 | 29.46224 |


In [14]:
# Merge distance matrix to a gridded map
gridded.map.mean4mx <- merge_mtx_to_map(map = gridded.map, dat = dataset.dts.aliyun$div$beta$bray, 
                                        met = dataset.dts.aliyun$met, var = 'bray', med = 'mean')
head(gridded.map.mean4mx@data[,1:9])
# Now you can visualize the microbial traits (beta diversity distance matrix) on a map

| | TYPE | FMTS | NAME | X.CENTER | Y.CENTER | bray_mean | bray_sd | bray_se | sample.num |
|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 1 | Gridded.Map | microgeo | 1 | 83.74702 | 29.73742 | 0.6626201 | 0.09315733 | 0.01054799 | 13 |
| 2 | Gridded.Map | microgeo | 2 | 85.46302 | 28.50944 | 0.6412429 | 0.10472639 | 0.060463808 | 3 |
| 3 | Gridded.Map | microgeo | 3 | 86.67299 | 28.33105 | 0.7313149 | 0.12932598 | 0.014643283 | 13 |
| 4 | Gridded.Map | microgeo | 4 | 89.49169 | 28.25211 | 0.7666039 | 0.23269274 | 0.094996411 | 4 |
| 5 | Gridded.Map | microgeo | 5 | 88.12468 | 28.29693 | 0.773328 | 0.13723699 | 0.013392959 | 15 |
| 6 | Gridded.Map | microgeo | 6 | 85.14919 | 29.46224 | 0.7700101 | 0.10863048 | 0.006538787 | 24 |


## 3. Extract metadata table from a map

In [15]:
# Extract metadata from a common map
# Rownames are sample IDs
# This new metadata table can be used for subsequent statistical analysis
metadata <- dataset.dts.aliyun$map %>% extract_metadata_from_map(met = dataset.dts.aliyun$met)
head(metadata)

| | longitude | latitude | NAME | TYPE | FMTS | X.CENTER | Y.CENTER |
|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| s1 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 |
| s2 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 |
| s3 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 |
| s4 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 |
| s5 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 |
| s6 | 98.20639 | 33.1028 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 |
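
Because the row names of the extracted table are sample IDs, it can be aligned with any per-sample trait table for a region-level comparison. The sketch below shows one way this could look in base R, using simulated stand-ins rather than the `qtp` data: `meta`, `alpha`, and their values are hypothetical, and the Kruskal-Wallis test is only one possible choice of method, not something microgeo prescribes.

```r
# Simulated stand-ins for the extracted metadata and an alpha diversity table
set.seed(42)
ids <- paste0("s", 1:30)
meta <- data.frame(
  NAME = rep(c("Tibet", "Qinghai", "Sichuan"), each = 10),
  row.names = ids
)
alpha <- data.frame(
  shannon = rnorm(30, mean = rep(c(5.8, 5.8, 6.4), each = 10), sd = 0.3),
  row.names = ids
)

# Align both tables by sample ID (row names) before combining them
shared <- intersect(rownames(meta), rownames(alpha))
combined <- cbind(meta[shared, , drop = FALSE],
                  alpha[shared, , drop = FALSE])

# Compare Shannon diversity among regions with a Kruskal-Wallis rank sum test
kw <- kruskal.test(shannon ~ NAME, data = combined)
print(kw)
```

With the real objects from this tutorial, the extracted metadata table and `dataset.dts.aliyun$div$alpha` would take the place of `meta` and `alpha`.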


In [16]:
# Extract metadata from a common map with additional data
# Rownames are sample IDs
# This new metadata table can be used for subsequent statistical analysis
metadata.from.c.df <- common.map.mean4df %>% extract_metadata_from_map(met = dataset.dts.aliyun$met)
head(metadata.from.c.df)

| | longitude | latitude | NAME | TYPE | FMTS | X.CENTER | Y.CENTER | observed_mean | shannon_mean | observed_sd | shannon_sd | observed_se | shannon_se | sample.num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| s1 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 | 706.1151 | 5.974249 | 197.868 | 0.3664406 | 11.86733 | 0.02197765 | 278 |
| s2 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 | 706.1151 | 5.974249 | 197.868 | 0.3664406 | 11.86733 | 0.02197765 | 278 |
| s3 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 | 706.1151 | 5.974249 | 197.868 | 0.3664406 | 11.86733 | 0.02197765 | 278 |
| s4 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 | 706.1151 | 5.974249 | 197.868 | 0.3664406 | 11.86733 | 0.02197765 | 278 |
| s5 | 98.20894 | 33.10321 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 | 706.1151 | 5.974249 | 197.868 | 0.3664406 | 11.86733 | 0.02197765 | 278 |
| s6 | 98.20639 | 33.1028 | Sichuan | DataV.GeoAtlas | microgeo | 102.6935 | 30.67454 | 706.1151 | 5.974249 | 197.868 | 0.3664406 | 11.86733 | 0.02197765 | 278 |


In [17]:
# Extract metadata from a gridded map
# Rownames are sample IDs
# This new metadata table can be used for subsequent statistical analysis
metadata.from.g.mx <- gridded.map.mean4mx %>% extract_metadata_from_map(met = dataset.dts.aliyun$met)
head(metadata.from.g.mx)

| | longitude | latitude | NAME | TYPE | FMTS | X.CENTER | Y.CENTER | bray_mean | bray_sd | bray_se | sample.num |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| s1 | 98.20894 | 33.10321 | 138 | Gridded.Map | microgeo | 98.70691 | 32.50087 | 0.6559857 | 0.1575225 | 0.005006393 | 45 |
| s2 | 98.20894 | 33.10321 | 138 | Gridded.Map | microgeo | 98.70691 | 32.50087 | 0.6559857 | 0.1575225 | 0.005006393 | 45 |
| s3 | 98.20894 | 33.10321 | 138 | Gridded.Map | microgeo | 98.70691 | 32.50087 | 0.6559857 | 0.1575225 | 0.005006393 | 45 |
| s4 | 98.20894 | 33.10321 | 138 | Gridded.Map | microgeo | 98.70691 | 32.50087 | 0.6559857 | 0.1575225 | 0.005006393 | 45 |
| s5 | 98.20894 | 33.10321 | 138 | Gridded.Map | microgeo | 98.70691 | 32.50087 | 0.6559857 | 0.1575225 | 0.005006393 | 45 |
| s6 | 98.20639 | 33.1028 | 138 | Gridded.Map | microgeo | 98.70691 | 32.50087 | 0.6559857 | 0.1575225 | 0.005006393 | 45 |
