# Cerebral Cortex Data Analysis Algorithms
Cerebral Cortex contains a library of algorithms that are useful for processing data and converting it into features or biomarkers.  This page demonstrates a simple GPS clustering algorithm.  For more details about the algorithms that are available, please see our [documentation](https://cerebralcortex-kernel.readthedocs.io/en/latest/).  These algorithms are constantly being developed and improved through our own work and the work of other researchers.

## Initialize the system

In [1]:
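# Load IPython's autoreload extension
%reload_ext autoreload
# Wildcard import of the demo's shared dependencies (Kernel is assumed to come from here)
from util.dependencies import *
from settings import USER_ID

# Create the Cerebral Cortex kernel from the given configuration path
CC = Kernel("/home/md2k/cc_conf/")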

## Generate some sample location data

This example uses a data generator instead of real participant data. This protects the privacy of real participants and lets anyone explore the system without institutional review board approval. The generator call below is commented out for this demonstration to avoid creating too much data at once.

In [2]:
# gen_location_datastream(CC, user_id=USER_ID, stream_name="GPS--org.md2k.phonesensor--PHONE")

## Get stream data
Read the demo GPS stream and show some example values.  A typical GPS sample contains values for _latitude, longitude, altitude, speed, bearing, and accuracy_.

In [3]:
gps_stream = CC.get_stream("GPS--org.md2k.phonesensor--PHONE")
gps_stream.show(3)
gps_stream.summary()

+-------------------+-------------------+------------------+------------------+--------+--------+----------+---------+-------+--------------------+
|          timestamp|          localtime|          latitude|         longitude|altitude|   speed|   bearing| accuracy|version|                user|
+-------------------+-------------------+------------------+------------------+--------+--------+----------+---------+-------+--------------------+
|2019-09-01 17:20:59|2019-09-01 22:20:59|35.121555023690924|-89.87015080364482|      96|4.478289|339.106126|20.914927|      1|00000000-afb8-476...|
|2019-09-01 17:21:59|2019-09-01 22:21:59|35.122068188916195|-89.86843860790495|      99|3.708595|299.378306|20.123134|      1|00000000-afb8-476...|
|2019-09-01 17:22:59|2019-09-01 22:22:59|35.120601654816824|-89.87075582986049|      94|3.474958|333.673786| 18.31153|      1|00000000-afb8-476...|
+-------------------+-------------------+------------------+------------------+--------+--------+----------+---------+-------+--------------------+
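
To get a feel for the scale of movement between consecutive samples, the great-circle (haversine) distance between two latitude/longitude pairs can be computed directly. The following is a minimal sketch in plain Python, not part of Cerebral Cortex: `GpsSample` and `haversine_m` are hypothetical names introduced for illustration, and the coordinates are taken from the first two rows of the sample output above, assuming they are decimal degrees.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class GpsSample:
    """Hypothetical container for the two location columns used here."""
    latitude: float   # assumed to be decimal degrees
    longitude: float  # assumed to be decimal degrees


def haversine_m(a: GpsSample, b: GpsSample) -> float:
    """Great-circle distance between two samples, in meters."""
    phi1 = math.radians(a.latitude)
    phi2 = math.radians(b.latitude)
    dphi = phi2 - phi1
    dlam = math.radians(b.longitude - a.longitude)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


# First two rows of the sample output above
first = GpsSample(35.121555023690924, -89.87015080364482)
second = GpsSample(35.122068188916195, -89.86843860790495)
print(f"{haversine_m(first, second):.1f} m between consecutive samples")
```

A distance function like this is a common building block for location clustering, since it lets nearby samples be grouped by physical distance rather than by raw coordinate differences.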

## Cluster the location data
Cerebral Cortex makes it easy to apply built-in algorithms to data streams.  In this case, `gps_clusters` is imported from the algorithm library, and `compute` runs it on `gps_stream` to generate a set of centroids. This is the general pattern for applying an algorithm to a datastream. It lets researchers apply validated and tested algorithms to their own data without becoming experts in the particular set of transformations involved.

_Note:_ the `compute` method engages the parallel computation capabilities of Cerebral Cortex: all the data is read from the data storage layer and processed on every computational core available to the system.  This lets the computation run as quickly as possible and take advantage of powerful clusters through a relatively simple interface.  This capability is critical when working with mobile sensor big data, where a single datastream can exceed hundreds of gigabytes in larger studies.
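
The general split-apply-combine idea behind a parallel `compute` call can be sketched in plain Python. This is only an illustration under stated assumptions: it assumes the work is partitioned by `user`, which this page does not state, and it uses a local thread pool as a stand-in for a compute cluster. `compute_per_user` and `mean_position` are hypothetical names, not Cerebral Cortex APIs, and the rows are made-up values.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def mean_position(rows):
    """Example per-user function: average latitude and longitude."""
    n = len(rows)
    return (sum(r["latitude"] for r in rows) / n,
            sum(r["longitude"] for r in rows) / n)


def compute_per_user(rows, func, max_workers=4):
    """Group rows by user, run func on each group concurrently, collect results."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["user"]].append(row)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {user: pool.submit(func, user_rows)
                   for user, user_rows in groups.items()}
        return {user: future.result() for user, future in futures.items()}


# Made-up rows for two hypothetical users
rows = [
    {"user": "user-a", "latitude": 34.0, "longitude": -89.0},
    {"user": "user-a", "latitude": 36.0, "longitude": -91.0},
    {"user": "user-b", "latitude": 35.0, "longitude": -90.0},
]
per_user = compute_per_user(rows, mean_position)
print(per_user)
```

The caller supplies only the per-group function; grouping, scheduling, and collecting results stay inside the helper, which is the kind of simple interface the note above describes.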

In [4]:
from cerebralcortex.algorithms import gps_clusters
centroids = gps_stream.compute(gps_clusters)
centroids.show(truncate=False)

+------------------------------------+---------+----------+
|user                                |latitude |longitude |
+------------------------------------+---------+----------+
|00000000-afb8-476e-9872-6472b4e66b68|35.18813 |-90.03865 |
|00000000-afb8-476e-9872-6472b4e66b68|35.120262|-89.8703  |
|00000000-afb8-476e-9872-6472b4e66b68|35.184944|-89.95288 |
|00000000-afb8-476e-9872-6472b4e66b68|35.13735 |-89.884705|
|00000000-afb8-476e-9872-6472b4e66b68|35.106743|-89.97342 |
|00000000-afb8-476e-9872-6472b4e66b68|35.150833|-89.88389 |
|00000000-afb8-476e-9872-6472b4e66b68|35.10934 |-89.98437 |
|00000000-afb8-476e-9872-6472b4e66b68|35.125534|-89.88513 |
|00000000-afb8-476e-9872-6472b4e66b68|35.190693|-89.91586 |
|00000000-afb8-476e-9872-6472b4e66b68|35.184727|-90.03771 |
|00000000-afb8-476e-9872-6472b4e66b68|35.097404|-89.87453 |
+------------------------------------+---------+----------+
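
For intuition about how a list of raw points can collapse into a handful of centroids, here is a minimal sketch of a greedy, radius-based clustering pass. It is not the `gps_clusters` implementation, whose internals this page does not describe; `greedy_centroids`, the 100 m radius, and the sample points are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def greedy_centroids(points, radius_m=100.0):
    """Assign each point to the first cluster whose centroid lies within
    radius_m, otherwise start a new cluster. Returns the cluster centroids.

    Averaging raw latitude/longitude is only reasonable for small areas
    that do not cross the antimeridian or approach the poles.
    """
    clusters = []  # each entry: [sum_lat, sum_lon, count]
    for lat, lon in points:
        for cluster in clusters:
            c_lat = cluster[0] / cluster[2]
            c_lon = cluster[1] / cluster[2]
            if haversine_m(lat, lon, c_lat, c_lon) <= radius_m:
                cluster[0] += lat
                cluster[1] += lon
                cluster[2] += 1
                break
        else:
            clusters.append([lat, lon, 1])
    return [(s_lat / n, s_lon / n) for s_lat, s_lon, n in clusters]


# Made-up points around two places
points = [
    (35.1200, -89.8700), (35.1201, -89.8701), (35.1199, -89.8699),
    (35.1880, -90.0386), (35.1881, -90.0387),
]
demo_centroids = greedy_centroids(points)
print(demo_centroids)
```

A greedy pass like this depends on the order of the input points and has no notion of noise, so it is useful mainly as a mental model for what a centroid table such as the one above represents.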



## Visualize GPS Data

### GPS Stream Plot
GPS visualization requires dedicated plotting capabilities. Cerebral Cortex includes a library for interactive exploration.  In this plot, drag the map with your mouse and zoom in to explore specific data points.

In [5]:
gps_stream.plot_gps_cords(zoom=8)

Map(basemap={'url': 'https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', 'max_zoom': 19, 'attribution': 'Map …

### Centroids Stream Plot
This plot shows only the centroid locations from the clustering algorithm.

In [6]:
centroids.plot_gps_cords(zoom=12)

Map(basemap={'url': 'https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', 'max_zoom': 19, 'attribution': 'Map …