## Precomputed_annotation_properties
The purpose of this notebook is to figure out how to include properties such as cell type and cell size in a precomputed annotation layer, starting from a list of 3D points and their properties in Python.
Refer to this document: https://github.com/google/neuroglancer/blob/master/src/neuroglancer/datasource/precomputed/annotations.md for general structure.

In [1]:
import numpy as np
import os
import csv
import struct
import json
from cloudvolume import CloudVolume
import matplotlib.pyplot as plt
import neuroglancer
%matplotlib inline

In [2]:
# get the raw-space cells file and load it in
animal_id = 4
pth=os.path.join('/jukebox/wang/Jess/lightsheet_output',
        '201904_ymaze_cfos','processed',f'an{animal_id}','clearmap_cluster_output',
        'cells.npy')
converted_points = np.load(pth)

In [3]:
converted_points

array([[ 459, 1398,   50],
       [ 459, 1443,   50],
       [ 462, 1412,   49],
       ...,
       [1546, 1242,  569],
       [1547, 1316,  570],
       [1646, 1328,  574]])

In [4]:
len(converted_points)

524170

In [None]:
np.random.shuffle(converted_points) # does it in place

## Just coordinates - something we already know how to do

In [12]:
# We already know how to encode just the coordinates. Do it like so for the first 1000 points
filename = '/home/ahoag/ngdemo/demo_bucket/test_annotations/test_coords/spatial0/0_0_0'
coordinates = converted_points[0:1000]
total_count = len(coordinates)
with open(filename,'wb') as outfile:
    buf = struct.pack('<Q',total_count)
    pt_buf = b''.join(struct.pack('<3f',x,y,z) for (x,y,z) in coordinates)
    buf += pt_buf
    id_buf = struct.pack('<%sQ' % len(coordinates), *range(len(coordinates)))
    buf += id_buf
    outfile.write(buf)
print(f"wrote {filename}")

wrote /home/ahoag/ngdemo/demo_bucket/test_annotations/test_coords/spatial0/0_0_0
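
As a sanity check, a chunk written in this layout can be decoded again. The sketch below is not part of the original workflow: `encode_points` and `decode_points` are hypothetical helper names, and the round trip goes through an in-memory buffer rather than the file path above.

```python
import struct

def encode_points(coordinates):
    """Encode (x, y, z) points in the same layout as the cell above."""
    n = len(coordinates)
    buf = struct.pack('<Q', n)                  # annotation count, uint64
    buf += b''.join(struct.pack('<3f', x, y, z) for (x, y, z) in coordinates)
    buf += struct.pack(f'<{n}Q', *range(n))     # annotation ids, one uint64 each
    return buf

def decode_points(buf):
    """Inverse of encode_points: return (points, ids)."""
    (n,) = struct.unpack_from('<Q', buf, 0)
    # each point record is 12 bytes (3 float32), starting after the 8-byte count
    points = [struct.unpack_from('<3f', buf, 8 + 12 * i) for i in range(n)]
    ids = list(struct.unpack_from(f'<{n}Q', buf, 8 + 12 * n))
    return points, ids

demo = [(459, 1398, 50), (459, 1443, 50), (462, 1412, 49)]
encoded = encode_points(demo)
points, ids = decode_points(encoded)
print(len(encoded), points[0], ids)
```

If the decoded points and ids match what went in, the byte layout is at least self-consistent before loading it in Neuroglancer.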


In [13]:
# and the info file needs to look like this:
info = {
  "@type": "neuroglancer_annotations_v1",
  "annotation_type": "POINT",
  "by_id": {
    "key": "by_id"
  },
  "dimensions": {
    "x": [
      "5e-06",
      "m"
    ],
    "y": [
      "5e-06",
      "m"
    ],
    "z": [
      "1e-05",
      "m"
    ]
  },
  "lower_bound": [
    0,
    0,
    0
  ],
  "properties": [],
  "relationships": [],
  "spatial": [
    {
      "chunk_size": [
        2160,
        2560,
        687
      ],
      "grid_shape": [
        1,
        1,
        1
      ],
      "key": "spatial0",
      "limit": 1
    }
  ],
  "upper_bound": [
    2160,
    2560,
    687
  ]
}

In [40]:
info_filename = '/home/ahoag/ngdemo/demo_bucket/test_annotations/test_coords/info'
with open(info_filename,'w') as outfile:
    json.dump(info,outfile,indent=2)

Got this to work. Next let's try to add a single property, cell type.
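
The info files in this notebook differ only in their `properties` list, so a small helper can avoid repeating the whole dict. This is a sketch under that assumption; `make_info` is a hypothetical name, and the bounds, dimensions, and chunk size are simply the values used above.

```python
import json

def make_info(properties, upper_bound=(2160, 2560, 687)):
    """Build the annotation info dict used in this notebook (hypothetical helper)."""
    return {
        "@type": "neuroglancer_annotations_v1",
        "annotation_type": "POINT",
        "by_id": {"key": "by_id"},
        "dimensions": {"x": ["5e-06", "m"], "y": ["5e-06", "m"], "z": ["1e-05", "m"]},
        "lower_bound": [0, 0, 0],
        "upper_bound": list(upper_bound),
        "properties": list(properties),
        "relationships": [],
        # a single spatial level whose one chunk covers the whole volume
        "spatial": [{
            "chunk_size": list(upper_bound),
            "grid_shape": [1, 1, 1],
            "key": "spatial0",
            "limit": 1,
        }],
    }

info_no_props = make_info([])
print(json.dumps(info_no_props["spatial"], indent=2))
```

Calling `make_info([{"id": "celltype", "type": "uint16"}])` would then give an info dict with a single property.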

## Single property -- cell type

Let's encode the cell type as an unsigned integer with values ranging from 0 to 31 (the code below uses `uint16`). We will randomly assign a cell type to each of the 1000 cells. This comment explains how to order the encoding when you have multiple properties: https://github.com/google/neuroglancer/issues/227#issuecomment-913895464:
```
In order to minimize the padding bytes required, properties that require 4 byte alignment (uint32, int32, float32) are encoded first, followed by properties that require 2 byte alignment (uint16, int16), followed by properties that require 1 byte alignment (uint8, int8, rgb, rgba). For a given alignment, the properties are encoded in which the properties are specified in the info file.
```

Since we only have a single property, the ordering doesn't matter yet. We do, however, need to know where in the byte string to put the properties. According to https://github.com/google/neuroglancer/blob/master/src/neuroglancer/datasource/precomputed/annotations.md#multiple-annotation-encoding, each annotation's properties go in the same per-annotation entry as its coordinates, immediately after them.
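
As a sketch of that layout, assuming the record format `<3fH2B` used in the next cell (three float32 coordinates, one uint16 property, two padding bytes) and 1000 annotations, the byte offsets within the file work out as follows. The variable names are illustrative only.

```python
import struct

record_fmt = '<3fH2B'                       # one annotation: xyz, celltype, padding
record_size = struct.calcsize(record_fmt)   # bytes per annotation record
n = 1000                                    # number of annotations
header_size = struct.calcsize('<Q')         # the leading uint64 count

first_record = header_size                      # where annotation 0 starts
second_record = header_size + record_size       # where annotation 1 starts
ids_offset = header_size + n * record_size      # where the id array starts
file_size = ids_offset + n * struct.calcsize('<Q')

print(record_size, first_record, second_record, ids_offset, file_size)
```

Comparing `file_size` with the size of the file on disk is a quick way to catch a wrong format string.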

In [49]:
filename = '/home/ahoag/ngdemo/demo_bucket/test_annotations/test_singleprop/spatial0/0_0_0'
coordinates = converted_points[0:1000]
total_count = len(coordinates)
cell_types = np.random.randint(0,32,(1000,1))
# combine the coordinates and cell types into a single array
cell_array = np.hstack((coordinates,cell_types))
with open(filename,'wb') as outfile:
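    buf = struct.pack('<Q',total_count) # number of annotations as a little-endian uint64
    # per point: 3 float32 coordinates, the uint16 cell type, then 2 zero bytes of padding
    pt_buf = b''.join(struct.pack('<3fH2B',x,y,z,c,0,0) for (x,y,z,c) in cell_array)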
    buf += pt_buf
    id_buf = struct.pack('<%sQ' % len(coordinates), *range(len(coordinates)))
    buf += id_buf
    outfile.write(buf)
print(f"wrote {filename}")

wrote /home/ahoag/ngdemo/demo_bucket/test_annotations/test_singleprop/spatial0/0_0_0


Now write the info file. According to this: https://github.com/google/neuroglancer/blob/master/src/neuroglancer/datasource/precomputed/annotations.md#info-json-file-format, the properties key must have this structure:
```
"properties": Array of JSON objects, each with the following members:
"id": String value specifying unique identifier for the property. Must match the regular expression /^[a-z][a-zA-Z0-9_]*$/.
"type": String value specifying the property type. Must be one of: rgb (represented as 3 uint8 values), rgba (represented as 4 uint8 values), uint8, int8, uint16, int16, uint32, int32, or float32.
"description": Optional. String value specifying textual description of property shown in UI.
"enum_values": Optional. If "type" is a numeric type (not "rgb" or "rgba"), this property may specify an array of values (compatible with the specified data type). These values correspond to the labels specified by "enum_labels", which are shown in the UI.
"enum_labels": Must be specified if, and only if, "enum_values" is specified. Must be an array of strings of the same length as "enum_values" specifying the corresponding labels for each value.

```
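
These rules are easy to check before writing the info file. A minimal sketch follows; the function name `check_property` is hypothetical, and it only covers the rules quoted above.

```python
import re

ID_PATTERN = re.compile(r"^[a-z][a-zA-Z0-9_]*$")
ALLOWED_TYPES = {"rgb", "rgba", "uint8", "int8", "uint16", "int16",
                 "uint32", "int32", "float32"}

def check_property(prop):
    """Return a list of problems found in one property spec (empty if none)."""
    problems = []
    if not ID_PATTERN.match(prop.get("id", "")):
        problems.append("bad id")
    if prop.get("type") not in ALLOWED_TYPES:
        problems.append("bad type")
    # enum_values and enum_labels must appear together and have equal length
    if ("enum_values" in prop) != ("enum_labels" in prop):
        problems.append("enum_values and enum_labels must be given together")
    elif "enum_values" in prop and len(prop["enum_values"]) != len(prop["enum_labels"]):
        problems.append("enum length mismatch")
    return problems

print(check_property({"id": "celltype", "type": "uint16"}))
print(check_property({"id": "CellType", "type": "float64"}))
```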

In [50]:
info = {
  "@type": "neuroglancer_annotations_v1",
  "annotation_type": "POINT",
  "by_id": {
    "key": "by_id"
  },
  "dimensions": {
    "x": [
      "5e-06",
      "m"
    ],
    "y": [
      "5e-06",
      "m"
    ],
    "z": [
      "1e-05",
      "m"
    ]
  },
  "lower_bound": [
    0,
    0,
    0
  ],
  "properties": [
      {"id":"celltype",
      "type":"uint16"
      }
  ],
  "relationships": [],
  "spatial": [
    {
      "chunk_size": [
        2160,
        2560,
        687
      ],
      "grid_shape": [
        1,
        1,
        1
      ],
      "key": "spatial0",
      "limit": 1
    }
  ],
  "upper_bound": [
    2160,
    2560,
    687
  ]
}

In [51]:
info_filename = '/home/ahoag/ngdemo/demo_bucket/test_annotations/test_singleprop/info'
with open(info_filename,'w') as outfile:
    json.dump(info,outfile,indent=2)

I can see the celltype property! At first the cell type showed up as a float, which isn't ideal since I want to display it as an integer. How can I structure the packing so that I get three floats followed by a uint16 integer? I tried `<3fH` in the byte encoding with `uint16` in the properties list, but that didn't show any of the points. Maybe each record needs padding so that its length is a multiple of 4 bytes. So two zero bytes (`<3fH2B`)?

Yes, that did it! OK, let's try multiple properties.
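
The padding rule can be written as a small helper. This is a sketch, not part of the original workflow: `padded_format` is a hypothetical name, and it uses `struct`'s `x` pad code instead of packing explicit zeros with `2B`. Both produce the same bytes.

```python
import struct

def padded_format(fmt):
    """Append pad bytes so that each packed record is a multiple of 4 bytes long."""
    pad = (-struct.calcsize(fmt)) % 4
    return fmt + (f"{pad}x" if pad else "")

fmt = padded_format("<3fH")                  # 14 bytes of data, so 2 pad bytes
record = struct.pack(fmt, 459, 1398, 50, 7)  # pad bytes take no arguments
print(fmt, len(record))
```

With the `x` code there is no need to pass the trailing zeros to `struct.pack`, which removes one way of getting the argument count wrong.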

## Multiple properties -- cell type, cell size
For multiple properties, the properties are ordered by alignment (byte size) first, and then by their order in the info file within a given alignment, as per this comment:
https://github.com/google/neuroglancer/issues/227#issuecomment-913895464: 
```
In order to minimize the padding bytes required, properties that require 4 byte alignment (uint32, int32, float32) are encoded first, followed by properties that require 2 byte alignment (uint16, int16), followed by properties that require 1 byte alignment (uint8, int8, rgb, rgba). For a given alignment, the properties are encoded in which the properties are specified in the info file.
```
So let's say we want to use a float32 for the cell size and a uint16 for the cell type. We would do:
```
struct.pack('<4fH2B', x, y, z, cell_size, cell_type, 0, 0)
```
where the final `2B` are padding bytes to make the total number of bytes divisible by 4.
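
The ordering and padding rules can be combined to derive the format string from the `properties` list itself. The sketch below makes several assumptions: `TYPE_INFO` and `record_format` are hypothetical names, only the scalar types are handled (`rgb` and `rgba` are left out), and padding uses `struct`'s `x` code rather than explicit zero bytes.

```python
import struct

# struct code and alignment in bytes for each scalar property type
TYPE_INFO = {
    "float32": ("f", 4), "uint32": ("I", 4), "int32": ("i", 4),
    "uint16": ("H", 2), "int16": ("h", 2),
    "uint8": ("B", 1), "int8": ("b", 1),
}

def record_format(properties):
    """Return (struct format, property ids in encoded order) for POINT annotations."""
    # sorted() is stable, so properties with equal alignment keep info-file order
    ordered = sorted(properties, key=lambda p: -TYPE_INFO[p["type"]][1])
    fmt = "<3f" + "".join(TYPE_INFO[p["type"]][0] for p in ordered)
    pad = (-struct.calcsize(fmt)) % 4
    if pad:
        fmt += f"{pad}x"
    return fmt, [p["id"] for p in ordered]

props = [{"id": "celltype", "type": "uint16"}, {"id": "size", "type": "float32"}]
fmt, order = record_format(props)
print(fmt, order)
```

Here `fmt` describes the same bytes as `<4fH2B` with two trailing zeros, and `order` gives the sequence in which the property values must be passed to `struct.pack`.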

Let's randomly generate some cell sizes and try to write this out

In [79]:
coordinates

array([[ 870, 1762,  407],
       [ 270,  803,  226],
       [ 296, 1538,  469],
       ...,
       [ 679, 1678,  152],
       [ 883, 1923,  307],
       [ 516, 1180,  202]])

In [81]:
coordinates.astype('f')

array([[ 870., 1762.,  407.],
       [ 270.,  803.,  226.],
       [ 296., 1538.,  469.],
       ...,
       [ 679., 1678.,  152.],
       [ 883., 1923.,  307.],
       [ 516., 1180.,  202.]], dtype=float32)

In [85]:
np.hstack((coordinates.astype('f'),cell_sizes.astype('f'),cell_types.astype('uint16')))

array([[ 870.       , 1762.       ,  407.       ,   44.201786 ,
          22.       ],
       [ 270.       ,  803.       ,  226.       ,   37.620766 ,
          19.       ],
       [ 296.       , 1538.       ,  469.       ,   97.47168  ,
          11.       ],
       ...,
       [ 679.       , 1678.       ,  152.       ,   36.99874  ,
          14.       ],
       [ 883.       , 1923.       ,  307.       ,    3.7889192,
          30.       ],
       [ 516.       , 1180.       ,  202.       ,   86.47215  ,
           9.       ]], dtype=float32)

In [89]:
filename = '/home/ahoag/ngdemo/demo_bucket/test_annotations/test_multiprops/spatial0/0_0_0'
coordinates = converted_points[0:1000]
total_count = len(coordinates)
cell_sizes = np.random.uniform(0,100,(1000,1))
cell_types = np.random.randint(0,32,(1000,1))
# combine the coordinates, cell sizes, and cell types into a single array
# (hstack promotes everything to float, hence the int(c) cast when packing)
cell_array = np.hstack((coordinates,cell_sizes,cell_types))
with open(filename,'wb') as outfile:
    buf = struct.pack('<Q',total_count) # 64-bit little endian
    pt_buf = b''.join(struct.pack('<4fH2B',x,y,z,s,int(c),0,0) for (x,y,z,s,c) in cell_array) 
    buf += pt_buf
    id_buf = struct.pack('<%sQ' % len(coordinates), *range(len(coordinates)))
    buf += id_buf
    outfile.write(buf)
print(f"wrote {filename}")

wrote /home/ahoag/ngdemo/demo_bucket/test_annotations/test_multiprops/spatial0/0_0_0


In [88]:
# Now write the info file. Not clear what the order of the properties should be. 
# I think since we have two properties at different byte sizes then the order will 
# be figured out by order of byte size so the order in the info file doesn't actually 
# matter. Can try both orders and see what happens. 
info = {
  "@type": "neuroglancer_annotations_v1",
  "annotation_type": "POINT",
  "by_id": {
    "key": "by_id"
  },
  "dimensions": {
    "x": [
      "5e-06",
      "m"
    ],
    "y": [
      "5e-06",
      "m"
    ],
    "z": [
      "1e-05",
      "m"
    ]
  },
  "lower_bound": [
    0,
    0,
    0
  ],
  "properties": [
      {"id":"celltype",
      "type":"uint16"
      },
      {"id":"size",
      "type":"float32"
      }
  ],
  "relationships": [],
  "spatial": [
    {
      "chunk_size": [
        2160,
        2560,
        687
      ],
      "grid_shape": [
        1,
        1,
        1
      ],
      "key": "spatial0",
      "limit": 1
    }
  ],
  "upper_bound": [
    2160,
    2560,
    687
  ]
}

In [90]:
info_filename = '/home/ahoag/ngdemo/demo_bucket/test_annotations/test_multiprops/info'
with open(info_filename,'w') as outfile:
    json.dump(info,outfile,indent=2)

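This order worked!! With `celltype` listed before `size` in the info file, the points and both properties displayed, even though `size` (float32) comes first in each packed record.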

If we had two float32 properties, they would go in the struct format string in the same order in which they appear in the info file. Let's try that.

In [91]:
filename = '/home/ahoag/ngdemo/demo_bucket/test_annotations/test_multiprops/spatial0/0_0_0'
coordinates = converted_points[0:1000]
total_count = len(coordinates)
cell_sizes = np.random.uniform(0,100,(1000,1))
cell_stds = np.random.uniform(0,1,(1000,1))
cell_types = np.random.randint(0,32,(1000,1))
# combine the coordinates, cell sizes, cell stds, and cell types into a single array
# (hstack promotes everything to float, hence the int(c) cast when packing)
cell_array = np.hstack((coordinates,cell_sizes,cell_stds,cell_types))
with open(filename,'wb') as outfile:
    buf = struct.pack('<Q',total_count) # 64-bit little endian
    pt_buf = b''.join(struct.pack('<5fH2B',x,y,z,s,std,int(c),0,0) for (x,y,z,s,std,c) in cell_array) 
    buf += pt_buf
    id_buf = struct.pack('<%sQ' % len(coordinates), *range(len(coordinates)))
    buf += id_buf
    outfile.write(buf)
print(f"wrote {filename}")

wrote /home/ahoag/ngdemo/demo_bucket/test_annotations/test_multiprops/spatial0/0_0_0
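
An alternative to packing row by row with `struct` is a NumPy structured array, which writes all records in one call. This is a swapped-in technique rather than the approach used above; `record_dtype` and its field names are assumptions chosen for illustration, and the data are random demo values.

```python
import numpy as np

# One packed record: 3 float32 coordinates, two float32 properties,
# one uint16 property, and 2 padding bytes (24 bytes total).
record_dtype = np.dtype([
    ("xyz", "<f4", (3,)),
    ("size", "<f4"),
    ("std", "<f4"),
    ("celltype", "<u2"),
    ("pad", "u1", (2,)),
])

rng = np.random.default_rng(0)
n = 5  # small demo; the notebook uses 1000 points
records = np.zeros(n, dtype=record_dtype)  # zeros() also zeroes the padding
records["xyz"] = rng.integers(0, 2000, (n, 3))
records["size"] = rng.uniform(0, 100, n)
records["std"] = rng.uniform(0, 1, n)
records["celltype"] = rng.integers(0, 32, n)

buf = (
    np.array(n, dtype="<u8").tobytes()        # annotation count
    + records.tobytes()                       # all records at once
    + np.arange(n, dtype="<u8").tobytes()     # annotation ids
)
print(record_dtype.itemsize, len(buf))
```

Because the dtype is declared without alignment, its fields are laid out back to back, which is what makes it byte-compatible with the `<5fH2B` format string.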


In [92]:
# Now write the info file. `size` and `std` are both float32, so they must be listed
# in the same order as they were packed (size, then std). `celltype` is a uint16, so
# it is encoded after the float32 properties regardless of where it appears here.
info = {
  "@type": "neuroglancer_annotations_v1",
  "annotation_type": "POINT",
  "by_id": {
    "key": "by_id"
  },
  "dimensions": {
    "x": [
      "5e-06",
      "m"
    ],
    "y": [
      "5e-06",
      "m"
    ],
    "z": [
      "1e-05",
      "m"
    ]
  },
  "lower_bound": [
    0,
    0,
    0
  ],
  "properties": [
      {"id":"celltype",
      "type":"uint16"
      },
      {"id":"size",
      "type":"float32"
      },
      {"id":"std",
      "type":"float32"
      }
  ],
  "relationships": [],
  "spatial": [
    {
      "chunk_size": [
        2160,
        2560,
        687
      ],
      "grid_shape": [
        1,
        1,
        1
      ],
      "key": "spatial0",
      "limit": 1
    }
  ],
  "upper_bound": [
    2160,
    2560,
    687
  ]
}

In [93]:
info_filename = '/home/ahoag/ngdemo/demo_bucket/test_annotations/test_multiprops/info'
with open(info_filename,'w') as outfile:
    json.dump(info,outfile,indent=2)

That worked. It appears that properties are limited to numeric types (plus `rgb` and `rgba` colors). However, the `enum_values` key might offer a way to display strings:
```
"enum_values": Optional. If "type" is a numeric type (not "rgb" or "rgba"), this property may specify an array of values (compatible with the specified data type). These values correspond to the labels specified by "enum_labels", which are shown in the UI.
"enum_labels": Must be specified if, and only if, "enum_values" is specified. Must be an array of strings of the same length as "enum_values" specifying the corresponding labels for each value.
```
So I think this might offer a way to render the cell types as strings without encoding anything else. Let's try this with a new info file:

In [121]:
# Same info file as before, but with enum_values and enum_labels added to celltype
# so that each integer cell type is displayed with a string label.
info = {
  "@type": "neuroglancer_annotations_v1",
  "annotation_type": "POINT",
  "by_id": {
    "key": "by_id"
  },
  "dimensions": {
    "x": [
      "5e-06",
      "m"
    ],
    "y": [
      "5e-06",
      "m"
    ],
    "z": [
      "1e-05",
      "m"
    ]
  },
  "lower_bound": [
    0,
    0,
    0
  ],
  "properties": [
      {"id":"celltype",
      "type":"uint16",
       "enum_values":[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
       "enum_labels":['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F']
      },
      {"id":"size",
      "type":"float32"
      },
      {"id":"std",
      "type":"float32"
      }
  ],
  "relationships": [],
  "spatial": [
    {
      "chunk_size": [
        2160,
        2560,
        687
      ],
      "grid_shape": [
        1,
        1,
        1
      ],
      "key": "spatial0",
      "limit": 1
    }
  ],
  "upper_bound": [
    2160,
    2560,
    687
  ]
}

In [122]:
info_filename = '/home/ahoag/ngdemo/demo_bucket/test_annotations/test_multiprops/info'
with open(info_filename,'w') as outfile:
    json.dump(info,outfile,indent=2)

This worked! It shows the label and the enum integer value in the properties panel.
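
Writing out 32 values and labels by hand is error-prone. As a sketch, the `celltype` entry could be generated from a list of names instead; `cell_type_names` is a hypothetical variable, filled here with the same placeholder labels as above.

```python
import json
import string

# 26 lowercase letters plus 'A'..'F' gives 32 placeholder labels
cell_type_names = list(string.ascii_lowercase) + list("ABCDEF")

celltype_property = {
    "id": "celltype",
    "type": "uint16",
    # value i is shown in the UI with label cell_type_names[i]
    "enum_values": list(range(len(cell_type_names))),
    "enum_labels": cell_type_names,
}

print(json.dumps(celltype_property, indent=2))
```

With real data, `cell_type_names` would hold the actual cell type names, and the integers stored in each record would index into it.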

This gives us a lot of flexibility when it comes to displaying annotation properties.