## Data filtering - inf values

In certain cases, feature extraction with porespy returned inf values for some 3D measurements (e.g. sphericity or convex volume). The classifier is expected to segment some of the smaller particles in 2D only. When porespy tries to extract a 3D feature from such a 2D label, it returns inf.

Because these inf values could distort the downstream analysis, all rows that contain them are removed here.
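
As a minimal illustration of the problem, the sketch below uses a made-up column (not taken from the real measurements) with a single inf entry. One such value is enough to make the mean infinite and the standard deviation non-finite.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column, not part of the real measurements
sphericity = pd.Series([0.8, 0.9, np.inf])

col_mean = sphericity.mean()  # a single inf makes the mean infinite
col_std = sphericity.std()    # inf - inf is undefined, so the spread is not finite

print(col_mean, col_std)
```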

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Define path where measurements are stored
path = "../../measurements/wt-postnatal-development/"

In [4]:
# Load the measurements
measurements = pd.read_csv(path + "04-measurements-rescaled.csv")
measurements

Unnamed: 0,label,age,biol_repl,image_id,maximum,mean,median,minimum,sigma,sum,...,equivalent_spherical_perimeter_rescaled,equivalent_spherical_radius_rescaled,feret_diameter_rescaled,perimeter_2d_rescaled,major_axis_length_2d_rescaled,minor_axis_length_2d_rescaled,surface_area_rescaled,bbox_volume_rescaled,convex_volume_rescaled,volume_rescaled
0,1,8,1,0,803.0,314.092437,282.884766,186.0,99.098496,261639.0,...,138.287624,1.885333,8.107236,18.460443,8.872033,2.037236,85.604886,116.461211,51.389857,28.070656
1,2,8,1,0,1845.0,620.114058,466.576172,174.0,407.194822,467566.0,...,129.399980,1.823742,6.111447,8.976538,3.444867,2.044837,43.541779,42.257627,29.149001,25.408493
2,3,8,1,0,564.0,274.831858,253.494141,154.0,91.856545,62112.0,...,57.955182,1.220513,3.995289,6.273068,2.549953,1.312964,21.829812,14.018479,9.233325,7.615808
3,4,8,1,0,540.0,285.008439,268.189453,153.0,89.769054,67547.0,...,59.820799,1.240002,4.669557,6.273068,2.766553,1.117553,23.752687,16.175168,9.570308,7.986489
4,5,8,1,0,264.0,234.200000,238.798828,202.0,20.192821,5855.0,...,13.355085,0.585895,2.261000,2.777771,1.010003,0.745937,8.389508,2.426275,1.112043,0.842457
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18703,174,24,28,83,685.0,346.677288,326.101562,187.0,92.460071,424333.0,...,178.734172,2.143385,8.551879,30.066147,8.226779,5.176775,131.982553,226.856733,104.262438,41.246679
18704,175,24,28,83,609.0,337.340530,326.101562,188.0,81.507414,343750.0,...,158.174366,2.016343,8.478365,21.955740,9.303697,3.873424,102.052789,176.915902,74.136187,34.338534
18705,176,24,28,83,347.0,255.431818,252.820312,193.0,37.513655,11239.0,...,19.468019,0.707387,1.910894,4.002664,1.622372,0.810159,7.356350,3.032844,1.819706,1.482724
18706,177,24,28,83,345.0,247.479915,245.492188,176.0,32.455513,117058.0,...,94.825995,1.561205,7.951313,19.177968,9.110146,2.632142,66.644046,116.764495,44.919790,15.939280


### Identify inf values

Check whether the dataframe contains inf values and, if so, how many rows are affected.

In [5]:
# Check for inf values
np.isinf(measurements.values).any()

True

In [6]:
# Check how many rows contain inf values
(measurements.isin([np.inf, -np.inf])).any(axis=1).sum()

2008
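
To see which features are affected, the inf values can also be counted per column. The sketch below uses a small hypothetical dataframe in place of `measurements`; the column names are borrowed from the table above, but the values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the measurements dataframe
toy = pd.DataFrame({
    "label": [1, 2, 3, 4],
    "convex_volume_rescaled": [51.4, np.inf, 9.2, np.inf],
    "volume_rescaled": [28.1, 25.4, np.inf, 8.0],
})

# Count inf values per column and keep only the affected columns
inf_per_column = toy.isin([np.inf, -np.inf]).sum()
affected = inf_per_column[inf_per_column > 0]
print(affected)
```

Applied to the real dataframe, the same two lines would list the 3D features that contain inf values.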

### Remove inf values from the dataframe

In [7]:
# Remove inf values
measurements_without_inf = measurements[~measurements.isin([np.inf, -np.inf]).any(axis=1)]
measurements_without_inf

Unnamed: 0,label,age,biol_repl,image_id,maximum,mean,median,minimum,sigma,sum,...,equivalent_spherical_perimeter_rescaled,equivalent_spherical_radius_rescaled,feret_diameter_rescaled,perimeter_2d_rescaled,major_axis_length_2d_rescaled,minor_axis_length_2d_rescaled,surface_area_rescaled,bbox_volume_rescaled,convex_volume_rescaled,volume_rescaled
0,1,8,1,0,803.0,314.092437,282.884766,186.0,99.098496,261639.0,...,138.287624,1.885333,8.107236,18.460443,8.872033,2.037236,85.604886,116.461211,51.389857,28.070656
1,2,8,1,0,1845.0,620.114058,466.576172,174.0,407.194822,467566.0,...,129.399980,1.823742,6.111447,8.976538,3.444867,2.044837,43.541779,42.257627,29.149001,25.408493
2,3,8,1,0,564.0,274.831858,253.494141,154.0,91.856545,62112.0,...,57.955182,1.220513,3.995289,6.273068,2.549953,1.312964,21.829812,14.018479,9.233325,7.615808
3,4,8,1,0,540.0,285.008439,268.189453,153.0,89.769054,67547.0,...,59.820799,1.240002,4.669557,6.273068,2.766553,1.117553,23.752687,16.175168,9.570308,7.986489
4,5,8,1,0,264.0,234.200000,238.798828,202.0,20.192821,5855.0,...,13.355085,0.585895,2.261000,2.777771,1.010003,0.745937,8.389508,2.426275,1.112043,0.842457
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18702,173,24,28,83,312.0,243.290323,245.492188,179.0,26.596939,60336.0,...,61.657763,1.258897,5.297594,9.737588,4.460800,1.451807,40.056207,46.705798,12.805341,8.357170
18703,174,24,28,83,685.0,346.677288,326.101562,187.0,92.460071,424333.0,...,178.734172,2.143385,8.551879,30.066147,8.226779,5.176775,131.982553,226.856733,104.262438,41.246679
18704,175,24,28,83,609.0,337.340530,326.101562,188.0,81.507414,343750.0,...,158.174366,2.016343,8.478365,21.955740,9.303697,3.873424,102.052789,176.915902,74.136187,34.338534
18705,176,24,28,83,347.0,255.431818,252.820312,193.0,37.513655,11239.0,...,19.468019,0.707387,1.910894,4.002664,1.622372,0.810159,7.356350,3.032844,1.819706,1.482724
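
An alternative way to drop these rows, shown here only as a sketch on a hypothetical toy dataframe, is to convert inf to NaN and then call `dropna`. Unlike the boolean mask used in the cell above, this would also remove rows that already contain NaN, so the two approaches only agree when the data has no missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the measurements dataframe
toy = pd.DataFrame({
    "label": [1, 2, 3],
    "convex_volume_rescaled": [51.4, np.inf, 9.2],
    "volume_rescaled": [28.1, 25.4, -np.inf],
})

# Approach from the notebook: boolean mask on rows containing inf
masked = toy[~toy.isin([np.inf, -np.inf]).any(axis=1)]

# Alternative: turn inf into NaN, then drop every row with a NaN
dropped = toy.replace([np.inf, -np.inf], np.nan).dropna()

print(masked["label"].tolist(), dropped["label"].tolist())
```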


In [8]:
# Any infs left?
np.isinf(measurements_without_inf.values).any()

False

In [9]:
# Save the filtered measurements without writing the row index
measurements_without_inf.to_csv(path + "05-a-measurements-inf-reduced.csv", index=False)
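
The `index=False` argument keeps the row index out of the saved file. The sketch below uses a hypothetical two-row dataframe and an in-memory buffer instead of a real file. It shows that writing the index produces an extra `Unnamed: 0` column when the file is read back, which may be how the `Unnamed: 0` column in the tables above originated.

```python
import io

import pandas as pd

# Hypothetical two-row dataframe standing in for the measurements
toy = pd.DataFrame({"label": [1, 2], "volume_rescaled": [28.1, 25.4]})

# Round trip through CSV text, once with and once without the index
with_index = pd.read_csv(io.StringIO(toy.to_csv()))
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```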