08.02.2023 Area A Data Modeling #30
-
I'm working on the implementation of the array data we discussed last week, and I am wondering how we intended to handle the percentiles (median, first quartile, third quartile) for arrays with dim > 1. We could compute them along each axis, but that defeats the purpose of ending up with scalar descriptors. I'm guessing there is a reason that NumPy arrays have methods for mean, min, max and std (plus a shape attribute), whilst the percentiles need to be calculated on a specified array with the separate function np.percentile(). My suggestion would be to not include these properties in the precomputed values.
-
Since every quantile only needs an ordered list of the observed values, the shape of the array does not matter.
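A minimal sketch of this point, assuming NumPy's default behaviour in which np.percentile() without an axis argument operates on the flattened array. The array and its values are made up for illustration:

```python
import numpy as np

# Hypothetical 3-D array; the values are made up for illustration.
data = np.arange(24, dtype=float).reshape(2, 3, 4)

# With no `axis` argument, np.percentile works on the flattened array,
# so each requested percentile comes back as a single scalar.
q1, median, q3 = np.percentile(data, [25, 50, 75])

# Flattening explicitly gives the same values: the shape plays no role.
q1_flat, median_flat, q3_flat = np.percentile(data.ravel(), [25, 50, 75])

print(q1, median, q3)
```

Passing axis=... would instead return one value per slice, which is the per-axis case the original question wanted to avoid.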
-
You may want to swap / add median for the 0.5 quantile. I like the result 🙂 👍
-
Meeting Notes
Tamas asked whether the statistical parameters should be stored or extracted on the fly from the data. Sandor pointed out that the data is not always known, in which case the statistical parameters have to be stored.
Sandor mentioned that an array can be a subset of a larger array.
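The two points above (stored versus computed descriptors, and data that may be unavailable) could be sketched roughly as follows. The names ArrayStats and describe are hypothetical, the choice of descriptors is only an example, and preferring stored values over recomputation is an assumption, not something the meeting decided:

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional, Sequence


@dataclass
class ArrayStats:
    """Hypothetical container for scalar descriptors of an array."""
    minimum: float
    maximum: float
    mean: float
    median: float


def describe(values: Optional[Sequence[float]] = None,
             stored: Optional[ArrayStats] = None) -> ArrayStats:
    """Return stored descriptors if given, otherwise compute them from the data."""
    if stored is not None:
        # Assumption: stored descriptors take precedence, since the data
        # itself may be missing or only a subset of a larger array.
        return stored
    if values is None:
        raise ValueError("need either the data or stored descriptors")
    # `values` is a flat sequence here; an n-dimensional array would be
    # flattened first.
    return ArrayStats(min(values), max(values), mean(values), median(values))
```

For example, describe([1.0, 2.0, 3.0, 4.0]) computes the descriptors from the data, while describe(stored=stats) simply hands back descriptors that were recorded earlier.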
Goals:
Tamas drew an analogy to the "summary" of a data frame in the R programming language. There the following descriptors are used:
Numerical Array
Additionally we might need:
Context Array
Examples of implementations: xarray.DataArray, pandas.Series, NXData
Non-statistical labels we need for an array in a context:
Table of Context Array
Examples of implementations: xarray.Dataset, pandas.DataFrame
Do we need this? → Leave it for the Application definition/schema
Future Discussion Points
ContextArray should be used
Tasks
NumericalArray without reference