Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why did the function scan_pb_poisson give different results than SatScan? #11

Open
Weiren-Wang opened this issue Apr 25, 2021 · 1 comment

Comments

@Weiren-Wang
Copy link

Weiren-Wang commented Apr 25, 2021

Hi Benjamin,
I used the scan_pb_poisson to conduct space-time analysis with my dataset and I found that the results given by scan_pb_poisson and the result given by the software SatScan were quite different.
My dataset is a day-frequency disease counts data, range from 2020/12/31 to 2021/4/14.It contains 10 locations with latitude and longitude.
For SatScan,here is the settings:
[Input]
Time precision : Day
Coordinates : Lat/Long
[Analysis]
Type of Analysis : Space-Time
Probability Model : Poissson
Scan For Area With : High rates
And for scan_pb_possion,here is my code :
`counts = SZ_counts %>%
df_to_matrix(time_col = "time", location_col = "region", value_col = "count")
population = SZ_counts %>%
df_to_matrix(time_col = "time", location_col = "region", value_col = "population")
zones = SZ_geo %>%
select(long, lat) %>%
as.matrix %>%
spDists(x = ., y = ., longlat = TRUE) %>%
dist_to_knn(k = 4) %>%
knn_zones
regions = as.character(SZ_geo$region)
result = data.frame()
newcounts = counts
newpopulation = population
poisson_result = scan_pb_poisson(counts = newcounts,
zones = zones,
population = newpopulation,
n_mcsim = 999)
topclusters = top_clusters(poisson_result, zones, k = 10, overlapping = FALSE)

                      top_regions = topclusters$zone %>%
                        purrr::map(get_zone, zones = zones) %>%
                        purrr::map(function(x) regions[x])
                      
                      new_top_regions = c()
                      for (j in 1:length(top_regions)) {
                        new_top_regions[j] = paste(top_regions[[j]], collapse = ',')
                      }
                      
                      topclusters$zonename = new_top_regions
                      topclusters$endtime = rownames(population)[53]
                      result = rbind(result, topclusters)`

    For the same dataset, the SaTScan gave following results:
              1.Location IDs included.: 5
                Coordinates / radius..: (22.726017 N, 114.254455 E) / 0 km
                Time frame............: 2021/2/21 to 2021/4/13
                Population............: 2508600
                Number of cases.......: 198
                Expected cases........: 43.90
                Annual cases / 100000.: 55.4
                Observed / expected...: 4.51
                Relative risk.........: 6.85
                Log likelihood ratio..: 174.120739
                P-value...............: < 0.00000000000000001
              
              2.Location IDs included.: 4, 6, 1
                Coordinates / radius..: (22.754466 N, 113.942560 E) / 22.26 km
                Time frame............: 2021/2/10 to 2021/4/2
                Population............: 6369300
                Number of cases.......: 22
                Expected cases........: 111.46
                Annual cases / 100000.: 2.4
                Observed / expected...: 0.20
                Relative risk.........: 0.16
                Log likelihood ratio..: 63.471627
                P-value...............: < 0.00000000000000001
              
              3.Location IDs included.: 3, 7, 8
                Coordinates / radius..: (22.528466 N, 114.061547 E) / 12.88 km
                Time frame............: 2021/1/1 to 2021/2/20
                Population............: 4265300
                Number of cases.......: 14
                Expected cases........: 73.21
                Annual cases / 100000.: 2.4
                Observed / expected...: 0.19
                Relative risk.........: 0.17
                Log likelihood ratio..: 40.022434
                P-value...............: 0.000000000000092

       While scan_pb_possion gave following results:
     zone duration           score        relrisk_in  relrisk_out          Gumbel_pvalue      zonename          endtime
       15     104         392.4982441   4.248194    0.2996086         0.0000000             5              2021/2/28
        13     104        329.5112571     3.428604   0.2993484         0.0000000             4,5       2021/2/28
     
      The 2 zones given by scan_pb_possion were totally different from the 3 clusters given by SaTScan.Why is that?

      In addition, the SatScan only gave one relative risk but scan_pb_possion give two risk:relrisk_in,relrisk_out.How could I match these results?
@mairamorenoc
Copy link

Hi,

In SatScan, are you using the prospective space-time model? Because as far as I understand, the scan_pb_possion function implemented in this package is only for prospective analysis, as he referenced Kulldorff 2001.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants