## Compute average values for statistics

The dataset was [rescaled](04-rescale-size-features.ipynb), and the [inf values](05-a-remove-inf-values.ipynb) as well as the [5th and 95th percentiles](05-b-filter-5-and-95-percentiles.ipynb) were filtered out.

For the subsequent statistics, the average values of the features will be plotted. This notebook computes a dataframe that contains the average values from the filtered dataframe.

For the postnatal development dataset, 3 technical replicates (images) were acquired per biological replicate. The following strategy is applied: first calculate the mean of every image (technical replicate), then calculate the mean of these image means for each biological replicate.
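The effect of this two-stage strategy can be illustrated with a minimal sketch. The toy values below are hypothetical and not taken from the dataset; only the column names `biol_repl`, `image_id` and `volume_rescaled` come from the measurements table. The sketch compares the mean of the image means with a pooled mean over all objects, which differ as soon as images contain different numbers of objects.

```python
import pandas as pd

# Toy data (hypothetical values): one biological replicate, two images
# with unequal numbers of segmented objects.
toy = pd.DataFrame({
    "biol_repl": [1, 1, 1, 1],
    "image_id": [0, 0, 0, 1],
    "volume_rescaled": [1.0, 2.0, 3.0, 10.0],
})

# Stage 1: mean per image (technical replicate).
per_image = toy.groupby(["biol_repl", "image_id"], as_index=False).mean()

# Stage 2: mean of the image means per biological replicate.
per_replicate = per_image.groupby("biol_repl", as_index=False)["volume_rescaled"].mean()

# Pooled mean over all objects, for comparison only.
pooled = toy.groupby("biol_repl", as_index=False)["volume_rescaled"].mean()

two_stage_value = per_replicate["volume_rescaled"].iloc[0]  # (2.0 + 10.0) / 2
pooled_value = pooled["volume_rescaled"].iloc[0]            # (1 + 2 + 3 + 10) / 4
print(two_stage_value, pooled_value)
```

With the two-stage mean every image contributes equally, whereas the pooled mean gives more weight to images with many objects.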

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Define the path where the measurements are stored
path = "../../measurements/wt-postnatal-development/"

In [3]:
# Load the measurements
measurements = pd.read_csv(path + "05-b-measurements-filtered.csv")
measurements

Unnamed: 0,label,age,biol_repl,image_id,maximum,mean,median,minimum,sigma,sum,...,equivalent_spherical_perimeter_rescaled,equivalent_spherical_radius_rescaled,feret_diameter_rescaled,perimeter_2d_rescaled,major_axis_length_2d_rescaled,minor_axis_length_2d_rescaled,surface_area_rescaled,bbox_volume_rescaled,convex_volume_rescaled,volume_rescaled
0,1,8,1,0,803.0,314.092437,282.884766,186.0,99.098496,261639.0,...,138.287624,1.885333,8.107236,18.460443,8.872033,2.037236,85.604886,116.461211,51.389857,28.070656
1,2,8,1,0,1845.0,620.114058,466.576172,174.0,407.194822,467566.0,...,129.399980,1.823742,6.111447,8.976538,3.444867,2.044837,43.541779,42.257627,29.149001,25.408493
2,3,8,1,0,564.0,274.831858,253.494141,154.0,91.856545,62112.0,...,57.955182,1.220513,3.995289,6.273068,2.549953,1.312964,21.829812,14.018479,9.233325,7.615808
3,4,8,1,0,540.0,285.008439,268.189453,153.0,89.769054,67547.0,...,59.820799,1.240002,4.669557,6.273068,2.766553,1.117553,23.752687,16.175168,9.570308,7.986489
4,5,8,1,0,264.0,234.200000,238.798828,202.0,20.192821,5855.0,...,13.355085,0.585895,2.261000,2.777771,1.010003,0.745937,8.389508,2.426275,1.112043,0.842457
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14896,173,24,28,83,312.0,243.290323,245.492188,179.0,26.596939,60336.0,...,61.657763,1.258897,5.297594,9.737588,4.460800,1.451807,40.056207,46.705798,12.805341,8.357170
14897,174,24,28,83,685.0,346.677288,326.101562,187.0,92.460071,424333.0,...,178.734172,2.143385,8.551879,30.066147,8.226779,5.176775,131.982553,226.856733,104.262438,41.246679
14898,175,24,28,83,609.0,337.340530,326.101562,188.0,81.507414,343750.0,...,158.174366,2.016343,8.478365,21.955740,9.303697,3.873424,102.052789,176.915902,74.136187,34.338534
14899,176,24,28,83,347.0,255.431818,252.820312,193.0,37.513655,11239.0,...,19.468019,0.707387,1.910894,4.002664,1.622372,0.810159,7.356350,3.032844,1.819706,1.482724


### Average value per image

The average value of every feature is calculated for each image.

In [4]:
# Calculate the means per image id
measurements_mean_image_id = measurements.groupby("image_id", as_index=False).mean()
measurements_mean_image_id

Unnamed: 0,image_id,label,age,biol_repl,maximum,mean,median,minimum,sigma,sum,...,equivalent_spherical_perimeter_rescaled,equivalent_spherical_radius_rescaled,feret_diameter_rescaled,perimeter_2d_rescaled,major_axis_length_2d_rescaled,minor_axis_length_2d_rescaled,surface_area_rescaled,bbox_volume_rescaled,convex_volume_rescaled,volume_rescaled
0,0,61.238095,8.0,1.0,701.495238,323.190983,291.492020,172.514286,112.388376,124832.466667,...,64.854577,1.212414,4.376663,7.274597,2.861976,1.467482,33.719630,32.989962,15.991914,10.687807
1,1,47.180556,8.0,1.0,563.236111,286.562035,266.415582,164.944444,84.951192,80982.319444,...,52.464083,1.081587,3.668398,6.456889,2.584198,1.300150,25.963713,24.344190,10.917302,7.874162
2,2,54.347222,8.0,1.0,451.486111,221.246799,203.799154,123.083333,70.283828,62899.625000,...,54.470576,1.111400,3.756347,7.154501,2.749499,1.444704,28.909043,33.165647,12.746369,8.252331
3,3,64.277778,8.0,2.0,787.185185,343.726559,306.521412,170.425926,129.625000,154824.250000,...,72.646383,1.288045,4.843015,7.531762,2.903500,1.561911,38.286478,35.257436,17.480976,12.336374
4,4,50.387097,8.0,2.0,611.774194,296.302608,270.055234,161.505376,97.886646,88663.946237,...,55.626533,1.136935,4.105999,6.274302,2.417556,1.382756,27.115502,25.051944,11.488210,8.248828
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
79,79,117.869565,24.0,27.0,611.717391,316.128147,292.964292,182.244565,92.345492,126006.521739,...,64.012789,1.189646,4.187781,8.855588,3.711237,1.434627,34.193113,44.043635,17.200768,10.770809
80,80,114.365714,24.0,27.0,631.914286,324.652658,300.013438,182.262857,97.438955,127163.771429,...,64.374016,1.202627,4.061732,8.476968,3.428620,1.507192,33.344003,35.997837,16.051736,10.615532
81,81,152.672566,24.0,28.0,634.973451,329.119944,306.257329,187.991150,95.008471,151482.650442,...,72.898751,1.280820,4.725444,9.222091,3.978519,1.447342,39.924579,45.754940,19.787145,12.743164
82,82,89.896000,24.0,28.0,664.736000,327.222305,304.426609,179.464000,97.605960,192412.352000,...,86.736299,1.386817,5.359738,11.984810,4.798374,1.797350,54.979065,84.957644,31.257973,16.650988


### Average per biological replicate

The per-image dataframe is now used to calculate the mean per biological replicate.

In [5]:
# Calculate mean in biological replicate
measurements_mean_biol_repl = measurements_mean_image_id.groupby("biol_repl", as_index=False).mean()
measurements_mean_biol_repl

Unnamed: 0,biol_repl,image_id,label,age,maximum,mean,median,minimum,sigma,sum,...,equivalent_spherical_perimeter_rescaled,equivalent_spherical_radius_rescaled,feret_diameter_rescaled,perimeter_2d_rescaled,major_axis_length_2d_rescaled,minor_axis_length_2d_rescaled,surface_area_rescaled,bbox_volume_rescaled,convex_volume_rescaled,volume_rescaled
0,1.0,1.0,54.255291,8.0,572.072487,276.999939,253.902252,153.514021,89.207799,89571.47037,...,57.263079,1.135134,3.933802,6.961996,2.731891,1.404112,29.530795,30.166599,13.218529,8.9381
1,2.0,4.0,43.630716,8.0,642.001611,293.921538,264.767499,153.492252,105.143326,100047.019958,...,59.336854,1.169092,4.117962,6.532768,2.525268,1.426223,28.903283,25.224753,12.442629,9.116965
2,3.0,7.0,50.067179,8.0,623.870848,291.964848,263.895176,157.205173,100.655201,82353.442963,...,55.545579,1.138754,4.040329,6.29017,2.441629,1.356562,26.698815,21.851104,10.956077,8.114211
3,4.0,10.0,50.281831,8.0,510.612076,233.010028,210.645494,123.954499,80.458537,79526.650194,...,60.330089,1.174022,4.137301,6.858062,2.631444,1.437465,29.956208,28.430755,13.69187,9.468924
4,5.0,13.0,86.511799,10.0,426.188669,223.960324,207.407505,130.204908,66.522749,43070.950822,...,43.972786,1.013614,3.286424,5.561892,2.154343,1.245811,19.189221,14.266504,7.558345,5.741588
5,6.0,16.0,101.040043,10.0,444.09734,234.782142,217.308213,138.078292,69.76719,42249.313234,...,41.656761,0.987452,3.27971,5.149347,2.033763,1.145403,17.11263,11.7798,6.650701,5.277719
6,7.0,19.0,122.760605,10.0,296.683127,156.582861,145.243371,90.576198,46.134377,29081.355276,...,43.664507,1.01285,3.374919,5.403251,2.149089,1.17614,18.805765,12.986054,7.247627,5.646304
7,8.0,22.0,121.242649,12.0,708.948804,327.162766,294.613773,173.749242,115.357722,97428.128465,...,54.737015,1.116399,3.803847,6.773311,2.672986,1.40439,27.737735,27.493027,12.148407,8.218802
8,9.0,25.0,122.937528,12.0,802.868097,351.689919,312.281348,175.014586,134.624877,125284.872246,...,59.799843,1.158854,3.807037,7.248914,2.823805,1.49845,30.254949,31.855411,13.854024,9.532479
9,10.0,28.0,99.605241,12.0,367.077671,200.070006,186.465406,119.939012,56.161829,39004.644993,...,43.384532,1.000518,3.372486,5.349196,2.10905,1.168569,19.182869,14.767607,7.450016,5.722417
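Because the mean per biological replicate assumes 3 images per replicate, it can be useful to check how many images each replicate actually contributes. The following is a minimal sketch on a hypothetical stand-in for the per-image dataframe; the variable names `per_image`, `expected` and `incomplete` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the per-image dataframe (one row per image).
per_image = pd.DataFrame({
    "image_id": [0, 1, 2, 3, 4],
    "biol_repl": [1, 1, 1, 2, 2],
})

# Number of distinct images per biological replicate.
images_per_repl = per_image.groupby("biol_repl")["image_id"].nunique()

# Replicates that deviate from the expected 3 technical replicates.
expected = 3
incomplete = images_per_repl[images_per_repl != expected]
print(incomplete)
```

In this toy example, biological replicate 2 is reported because it has only 2 images.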


### Filter columns: image_id and label

The two columns `image_id` and `label` now contain misleading values. The average `image_id` per biological replicate carries no information. For `label`, the maximum number of labels per image should be determined instead of the mean label value. This will be done in the next notebook.
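As a rough illustration of the label count mentioned above, the maximum label per image could be derived from the unaveraged table with a grouped `max`. This is only a sketch on hypothetical toy data, not the implementation of the next notebook. It assumes that labels are numbered consecutively from 1 within each image; after filtering, the largest label value can exceed the number of remaining objects.

```python
import pandas as pd

# Hypothetical stand-in for the unaveraged measurements table.
toy = pd.DataFrame({
    "biol_repl": [1, 1, 1, 1, 1],
    "image_id": [0, 0, 0, 1, 1],
    "label": [1, 2, 3, 1, 2],
})

# Largest label value per image (assumption: labels start at 1 and are consecutive).
max_label = toy.groupby(["biol_repl", "image_id"], as_index=False)["label"].max()
print(max_label)
```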

In [6]:
# Drop the image_id and label columns
measurements_filtered = measurements_mean_biol_repl.drop(columns=["image_id", "label"])
measurements_filtered

Unnamed: 0,biol_repl,age,maximum,mean,median,minimum,sigma,sum,variance,flatness,...,equivalent_spherical_perimeter_rescaled,equivalent_spherical_radius_rescaled,feret_diameter_rescaled,perimeter_2d_rescaled,major_axis_length_2d_rescaled,minor_axis_length_2d_rescaled,surface_area_rescaled,bbox_volume_rescaled,convex_volume_rescaled,volume_rescaled
0,1.0,8.0,572.072487,276.999939,253.902252,153.514021,89.207799,89571.47037,12483.492986,1.729285,...,57.263079,1.135134,3.933802,6.961996,2.731891,1.404112,29.530795,30.166599,13.218529,8.9381
1,2.0,8.0,642.001611,293.921538,264.767499,153.492252,105.143326,100047.019958,16892.070744,1.665311,...,59.336854,1.169092,4.117962,6.532768,2.525268,1.426223,28.903283,25.224753,12.442629,9.116965
2,3.0,8.0,623.870848,291.964848,263.895176,157.205173,100.655201,82353.442963,14736.391053,1.705281,...,55.545579,1.138754,4.040329,6.29017,2.441629,1.356562,26.698815,21.851104,10.956077,8.114211
3,4.0,8.0,510.612076,233.010028,210.645494,123.954499,80.458537,79526.650194,10597.408317,1.698195,...,60.330089,1.174022,4.137301,6.858062,2.631444,1.437465,29.956208,28.430755,13.69187,9.468924
4,5.0,10.0,426.188669,223.960324,207.407505,130.204908,66.522749,43070.950822,6669.229487,1.623627,...,43.972786,1.013614,3.286424,5.561892,2.154343,1.245811,19.189221,14.266504,7.558345,5.741588
5,6.0,10.0,444.09734,234.782142,217.308213,138.078292,69.76719,42249.313234,7139.497653,1.673945,...,41.656761,0.987452,3.27971,5.149347,2.033763,1.145403,17.11263,11.7798,6.650701,5.277719
6,7.0,10.0,296.683127,156.582861,145.243371,90.576198,46.134377,29081.355276,3095.41132,1.695566,...,43.664507,1.01285,3.374919,5.403251,2.149089,1.17614,18.805765,12.986054,7.247627,5.646304
7,8.0,12.0,708.948804,327.162766,294.613773,173.749242,115.357722,97428.128465,20513.213634,1.677384,...,54.737015,1.116399,3.803847,6.773311,2.672986,1.40439,27.737735,27.493027,12.148407,8.218802
8,9.0,12.0,802.868097,351.689919,312.281348,175.014586,134.624877,125284.872246,27550.673411,1.639469,...,59.799843,1.158854,3.807037,7.248914,2.823805,1.49845,30.254949,31.855411,13.854024,9.532479
9,10.0,12.0,367.077671,200.070006,186.465406,119.939012,56.161829,39004.644993,4832.335376,1.689114,...,43.384532,1.000518,3.372486,5.349196,2.10905,1.168569,19.182869,14.767607,7.450016,5.722417


In [7]:
# Save the averaged measurements without the dataframe index
measurements_filtered.to_csv(path + "06-measurements-average.csv", index=False)