# Space Time Scan

## Using SaTScan

- Start SaTScan and make a new session
- Under the "Input" tab:
    - Set the "Case File" to "chicago.cas"
    - Set the "Coordinates File" to "chicago.geo"
    - Set "Coordinates" to "Cartesian"
    - Set "Time Precision" to "Day"
    - Set the "Study Period" from 2011-03-01 to 2011-09-27 (or whichever range your data covers)
- Under the "Analysis" tab:
    - Select "Prospective Analysis" -> "Space-Time"
    - Select "Probability Model" -> "Space-Time Permutation"
    - Select "Time Aggregation" -> "1 Day"
- Under the "Output" tab:
    - Set the "Main Results File" to any output path you like

*Optionally* change the spatial and temporal window:

- Under "Analysis", click "Advanced":
    - Under "Spatial Window", select "is a circle with a ..."
    - Under "Temporal Window", select "Maximum Temporal Cluster Size" is ... days
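
The steps above need a case file and a coordinates file. As a rough sketch, assuming each event is treated as its own location with a single case, the two files could be produced from arrays of coordinates and timestamps as below. The function names are hypothetical (they are not part of SaTScan or our library), and the column layouts are my understanding of the whitespace-delimited SaTScan formats, so check them against the user guide for your version.

```python
import numpy as np

def format_satscan_inputs(xcoords, ycoords, timestamps):
    """Return (case_text, geo_text), treating every event as its own location.

    Assumed layouts (check the SaTScan user guide for your version):
      case file:        <location id> <number of cases> <date>
      coordinates file: <location id> <x> <y>
    """
    case_lines, geo_lines = [], []
    for i, (x, y, t) in enumerate(zip(xcoords, ycoords, timestamps)):
        # Truncate to whole days to match a "Time Precision" of "Day"
        day = np.datetime64(t).astype("datetime64[D]")
        case_lines.append("{}\t1\t{}".format(i, str(day)))
        geo_lines.append("{}\t{}\t{}".format(i, float(x), float(y)))
    return "\n".join(case_lines) + "\n", "\n".join(geo_lines) + "\n"

def write_satscan_inputs(xcoords, ycoords, timestamps, case_path, geo_path):
    """Write the two files, e.g. to "chicago.cas" and "chicago.geo"."""
    case_text, geo_text = format_satscan_inputs(xcoords, ycoords, timestamps)
    with open(case_path, "w") as f:
        f.write(case_text)
    with open(geo_path, "w") as f:
        f.write(geo_text)
```

Dates are written as `YYYY-MM-DD` here; if SaTScan rejects them, changing the separator in the case file is the first thing to try.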

## Using our library code

In [1]:
%matplotlib inline
from common import *
datadir = os.path.join("//media", "disk", "Data")
#datadir = os.path.join("..", "..", "..", "..", "..", "Data")
south_side, points = load_data(datadir)

GDAL_DATA not set and failed to find suitable location...  This is probably not a problem on linux.


In [2]:
import open_cp.stscan as stscan

In [3]:
trainer = stscan.STSTrainer()
trainer.data = points

In [4]:
scanner, _ = trainer.to_scanner()
scanner.coords.shape, scanner.timestamps.shape

((2, 3395), (3395,))

In [5]:
masks, counts, distsq = scanner.find_discs(scanner.coords.T[0])
masks.shape

(3395, 476)
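
To make the shape of `masks` concrete, here is a minimal sketch of how one boolean column per disc could be built around a centre. This is not the library's `find_discs`; it is a hypothetical `disc_masks` that makes one disc for every distinct distance and applies no size limits. The real method evidently keeps fewer discs (476 for 3395 points), presumably because of limits of that kind.

```python
import numpy as np

def disc_masks(coords, centre):
    """coords has shape (2, N); centre has shape (2,).

    Returns (masks, counts, radii_sq) where masks[:, j] marks the points
    lying within the j-th disc, and radii_sq[j] is its squared radius.
    """
    coords = np.asarray(coords, dtype=float)
    centre = np.asarray(centre, dtype=float)
    # Squared distance from the centre to every point
    distsq = np.sum((coords - centre[:, None]) ** 2, axis=0)
    # One candidate disc per distinct squared distance, smallest first
    radii_sq = np.unique(distsq)
    # Broadcasting gives a boolean array of shape (N, number of discs)
    masks = distsq[:, None] <= radii_sq[None, :]
    counts = masks.sum(axis=0)
    return masks, counts, radii_sq
```

Working with squared distances avoids a square root per point; the root is only needed when reporting a radius.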

In [30]:
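# Candidate time windows and the number of events in each
time_masks, time_counts, times = scanner.make_time_ranges()
N = scanner.timestamps.shape[0]
# Candidate discs around the first event (the third value was named distsq above)
centre = scanner.coords.T[0]
space_masks, space_counts, dists = scanner.find_discs(centre)
# Observed and expected event counts for each (disc, time window) pair
actual = scanner._calc_actual(space_masks, time_masks, time_counts)
expected = space_counts[:,None] * time_counts[None,:] / N
# Keep only pairs with more than one event and more events than expected
_mask = (actual > 1) & (actual > expected)
stats = scanner._ma_statistic(np.ma.array(actual, mask=~_mask),
                              np.ma.array(expected, mask=~_mask), N)
# Discs with at least one admissible time window
_mask1 = np.any(_mask, axis=1)
if not np.any(_mask1):
    raise Exception("No disc has an admissible time window")
# For each remaining disc, pick the time window with the largest statistic
m = np.ma.argmax(stats, axis=1)[_mask1]
stats1 = stats[_mask1,:]
stats2 = stats1[range(stats1.shape[0]),m].data
used_dists = dists[_mask1]
used_times = times[m]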
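
The `expected` line in the cell above is the space-time permutation estimate: the number of events in a disc times the number in a time window, divided by the total. The sketch below reproduces that on plain arrays and pairs it with the usual Poisson generalised log likelihood ratio. I have not checked that `_ma_statistic` uses exactly this form, so treat `log_likelihood_ratio` as an assumption; both function names are hypothetical.

```python
import numpy as np

def expected_counts(space_counts, time_counts, N):
    """Expected events per (disc, time window) pair under independence
    of space and time: space_count * time_count / N."""
    space_counts = np.asarray(space_counts, dtype=float)
    time_counts = np.asarray(time_counts, dtype=float)
    return space_counts[:, None] * time_counts[None, :] / N

def log_likelihood_ratio(actual, expected, N):
    """Assumed statistic: c*log(c/mu) + (N-c)*log((N-c)/(N-mu)).

    Only meaningful where 0 < expected < N and 0 < actual < N; the cell
    above additionally restricts attention to actual > expected.
    """
    actual = np.asarray(actual, dtype=float)
    expected = np.asarray(expected, dtype=float)
    inside = actual * np.log(actual / expected)
    outside = (N - actual) * np.log((N - actual) / (N - expected))
    return inside + outside
```

For instance, with 10 events in total, a disc holding 4 of them and a time window holding 5 gives an expected count of 2 for that pair.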


In [22]:
%timeit( scanner.find_discs(centre) )

100 loops, best of 3: 6.05 ms per loop


In [23]:
%timeit(scanner._calc_actual(space_masks, time_masks, time_counts))

1 loop, best of 3: 196 ms per loop


In [24]:
%timeit(space_counts[:,None] * time_counts[None,:] / N)

100 loops, best of 3: 3.63 ms per loop


In [25]:
%timeit((actual > 1) & (actual > expected))

1000 loops, best of 3: 513 µs per loop


In [27]:
%timeit(scanner._ma_statistic(np.ma.array(actual, mask=~_mask), np.ma.array(expected, mask=~_mask), N))

10 loops, best of 3: 80.1 ms per loop


In [28]:
%timeit(np.any(_mask, axis=1))

10000 loops, best of 3: 35.2 µs per loop


In [31]:
%timeit(np.ma.argmax(stats, axis=1)[_mask1])

1000 loops, best of 3: 1.29 ms per loop


In [32]:
%timeit(stats[_mask1,:])

1000 loops, best of 3: 327 µs per loop


In [33]:
%timeit(stats1[range(stats1.shape[0]),m].data)

10000 loops, best of 3: 162 µs per loop


In [34]:
%timeit(dists[_mask1])

The slowest run took 10.51 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.36 µs per loop


In [35]:
%timeit(times[m])

The slowest run took 12.20 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 848 ns per loop


In [38]:
def f():
    x = scanner.faster_score_all()
    return next(x)

%timeit(f())

1 loop, best of 3: 305 ms per loop


In [45]:
import itertools

centre, dists, times, stats = f()

def g():
    scores = []
    scores.extend(zip(itertools.repeat(centre[0]),itertools.repeat(centre[1]), np.sqrt(dists), times, stats))
    return scores

%timeit(g())

10000 loops, best of 3: 79.9 µs per loop


In [47]:
import datetime
x = scanner.faster_score_all()

for _ in range(20):
    now = datetime.datetime.now()
    next(x)
    print(datetime.datetime.now() - now)

0:00:00.312451
0:00:00.726980
0:00:00.655324
0:00:00.711594
0:00:00.737515
0:00:00.639638
0:00:00.827011
0:00:00.234978
0:00:00.802511
0:00:00.696016
0:00:00.623991
0:00:00.841354
0:00:00.719550
0:00:00.705551
0:00:00.871969
0:00:00.173941
0:00:00.860185
0:00:00.849753
0:00:00.844854
0:00:00.510443
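
The loop above times each `next` call with `datetime.datetime.now()`, which follows the wall clock. A small alternative sketch uses `time.perf_counter`, which is monotonic and intended for measuring short durations, and collects the timings in a list so they can be summarised. The helper name `time_generator` is hypothetical.

```python
import time

def time_generator(gen, count):
    """Return the seconds taken to produce each of the first `count`
    items of `gen`, stopping early if the generator is exhausted."""
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            next(gen)
        except StopIteration:
            break
        durations.append(time.perf_counter() - start)
    return durations
```

With the scanner from this notebook, `time_generator(scanner.faster_score_all(), 20)` would give the same 20 measurements as a list of floats.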


In [46]:
next(scanner.find_all_clusters())

1 382
2 1570
3 2584
4 3751
5 4959
6 5807


KeyboardInterrupt: 