Table of contents:

[Data preparation](#data-preparation)

[Checking the standard deviation boundary](#checking-the-standard-deviation-boundary)

[Checking the mean value](#checking-the-mean-value)

[Checking the individual simulations](#checking-the-individual-simulations)

[Norm limit test](#norm-limit-test)
  - [Norm Linf](#norm-linf)
  - [Norm L2](#norm-l2)

[Set multiple tests](#set-multiple-tests)

In [36]:
from citros_data_analysis import data_access as da
from citros_data_analysis import validation as va

## Data preparation

Globular clusters belong to the halo of the Galaxy. We can check whether the 'z' coordinate, that is, the vertical distance of the cluster from the galactic plane, exceeds the thickness of the Galactic thick disc, D = 1.2 kpc, in all simulations.

Let's download the data: columns 'data.data[0]' and 'data.data[5]', which are the time and the vertical coordinate, respectively:

In [37]:
#create `CitrosDB` object to download data:
citros = da.CitrosDB(database = 'lulav')

#download data
df = citros.batch('galactic orbits_1').topic('/gal_orbits').data(['data.data[0]', 'data.data[5]'])
df.rename({'data.data[0]': 't', 'data.data[5]': 'z'}, axis = 1, inplace = True)
df.head(5)

Construct a `Validation` object. Its parameters determine how the data will be preprocessed:

- `data_label` specifies the data columns,
- `param_label` specifies the independent variable used to match points between different sids,
- `method` determines how indices are assigned:
  - 'scale': by scaling `param_label` to the unit interval and interpolating the data on this interval,
  - 'bin': by dividing `param_label` into bins and averaging the data values of the points that fall into each bin,
- `num` sets the number of points if `method` is 'scale', or the number of bins if `method` is 'bin':

In [38]:
V = va.Validation(df, data_label = 'z', param_label = 't', method = 'scale', num = 50, units = 'kpc')
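To make the two index-assignment methods more concrete, here is a minimal NumPy sketch of the ideas described above. It is not the `citros_data_analysis` implementation; the helper names `scale_resample` and `bin_resample` are hypothetical, and the sketch assumes `t` is sorted in ascending order and that bins have equal width.

```python
import numpy as np

def scale_resample(t, z, num):
    """Rescale t to [0, 1] and linearly interpolate z onto num evenly spaced points."""
    t = np.asarray(t, dtype=float)
    z = np.asarray(z, dtype=float)
    t_unit = (t - t.min()) / (t.max() - t.min())
    grid = np.linspace(0.0, 1.0, num)
    return grid, np.interp(grid, t_unit, z)

def bin_resample(t, z, num):
    """Split the range of t into num equal-width bins and average z in each bin."""
    t = np.asarray(t, dtype=float)
    z = np.asarray(z, dtype=float)
    edges = np.linspace(t.min(), t.max(), num + 1)
    # digitize gives 1..num for interior points; clip puts t.max() into the last bin
    idx = np.clip(np.digitize(t, edges) - 1, 0, num - 1)
    sums = np.bincount(idx, weights=z, minlength=num)
    counts = np.bincount(idx, minlength=num)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts  # empty bins become NaN
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, means

# toy simulation (illustrative numbers): z = 2 * t sampled at uneven times
t = np.array([0.0, 1.0, 2.0, 4.0])
z = 2.0 * t
grid, z_scaled = scale_resample(t, z, num=5)
centers, z_binned = bin_resample(t, z, num=2)
```

Either way, every simulation ends up on the same set of indices, which is what allows statistics such as the mean and standard deviation to be computed across sids point by point.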

## Checking the standard deviation boundary

Test whether the standard deviation boundary is within the limits:

- `limits`: 
  - a single value to set the same +-limits for all elements of the vector, for example limits = 0.25
  - a list of values to set +-limits for each vector element, for example limits = [0.25, 0.5, 100]
  - a list of lists to set the lower and upper limits separately, for example limits = [0.25, [-0.3, 0.8], [-150, 100]]
- `n_std`: number of standard deviations in standard deviation boundary
- `nan_passed`: whether nan values are treated as passed test or not
- to style the plot:
  - `std_area` - set True to fill with color standard deviation boundary
  - `std_lines` - set False to remove standard deviation boundary lines
  - `std_color` - set standard deviation boundary color, default 'b'

Let's test whether the 3-standard-deviation boundary exceeds the thickness of the disc, D = 1.2 kpc:

In [40]:
log, table, fig = V.std_bound_test(limits = 1.2, n_std = 3, nan_passed = True, std_area = True, std_lines = False, std_color = 'b')

Print the report of the test:

In [41]:
log.print()

As predicted, the test is not passed because the z coordinate exceeds the limits. Let's print the indices of the failed points and the corresponding t values:

In [42]:
print('\nvalues of z that did not pass the test:')

log['z']['failed'].print()

The DataFrame `table` indicates, for each point, whether it passes the test:

In [43]:
print(table.head(5)) #method head(n) shows first n rows of the DataFrame table
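The check performed in this section can be sketched in plain NumPy. This is an illustration of the idea, not the library's code: the helpers `expand_limits` and `std_bound_passes` are hypothetical, and the sketch assumes the sample standard deviation (`ddof=1`) and non-strict comparisons against the limits.

```python
import numpy as np

def expand_limits(limits, n_elements):
    """Turn the three accepted forms of `limits` into (lower, upper) pairs."""
    if np.isscalar(limits):
        limits = [limits] * n_elements
    pairs = []
    for lim in limits:
        if np.isscalar(lim):
            pairs.append((-abs(lim), abs(lim)))   # symmetric +-limit
        else:
            lower, upper = lim                    # explicit interval
            pairs.append((lower, upper))
    return pairs

def std_bound_passes(values, lower, upper, n_std=3, nan_passed=True):
    """values has shape (n_simulations, n_points); returns per-point pass flags."""
    mean = values.mean(axis=0)
    std = values.std(axis=0, ddof=1)
    low_edge = mean - n_std * std
    high_edge = mean + n_std * std
    ok = (low_edge >= lower) & (high_edge <= upper)
    undefined = np.isnan(low_edge) | np.isnan(high_edge)
    return np.where(undefined, nan_passed, ok)

# toy data (illustrative numbers): 3 simulations, 2 resampled points
values = np.array([[0.0, 1.0],
                   [0.2, 1.4],
                   [-0.2, 0.6]])
(lower, upper), = expand_limits(1.2, n_elements=1)
passes = std_bound_passes(values, lower, upper, n_std=3)
mixed = expand_limits([0.25, [-0.3, 0.8], [-150, 100]], n_elements=3)
```

In the toy data the second point fails: its mean of 1.0 is inside the limits, but the upper edge of the 3-standard-deviation boundary is not.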

## Checking the mean value

Define the limits within which the mean value should lie, for example |z| < 1.2:

In [45]:
log, table, fig = V.mean_test(limits = 1.2)

Print the report of the test:

In [44]:
log.print()

The DataFrame `table` indicates, for each point, whether it passes the test:

In [46]:
print(table.head(5)) #method head(n) shows first n rows of the DataFrame table
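For intuition, the mean test reduces to averaging over simulations at each index and comparing the result with the limits. The snippet below is a sketch with made-up numbers; the strict inequality is an assumption taken from the |z| < 1.2 wording, not from the library source.

```python
import numpy as np

# rows are simulations, columns are resampled points (toy numbers, not CITROS data)
z = np.array([[0.5, 1.0, 1.6],
              [0.7, 1.2, 1.8]])
limit = 1.2

mean_z = z.mean(axis=0)               # mean over simulations at each point
passed = np.abs(mean_z) < limit       # assumption: strict inequality
failed_idx = np.flatnonzero(~passed)  # indices of the points that fail
```

Note that the second point passes even though one simulation sits exactly on the limit there: only the mean is tested, not the individual values.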

## Checking the individual simulations

In [47]:
log, table, fig = V.sid_test(limits = 1.2)

Print the report of the test:

In [48]:
log.print()

Many points of the simulations do not pass the test. Print their indices and the corresponding values of the independent variable, for example, for the simulation with sid = 3:

In [50]:
print('\nvalues of z, simulation sid = 3, that did not pass the test:')

log['z']['failed'][3].print()

The DataFrame `table` indicates, for each point, whether it passes the test:

In [51]:
print(table.head(5)) #method head(n) shows first n rows of the DataFrame table
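Unlike the mean test, this test looks at every simulation separately. A rough pandas sketch of the same idea follows; the table layout, the column names `point` and `passed`, and the strict inequality are assumptions made for illustration, not the structure of the library's `log` or `table` objects.

```python
import pandas as pd

# toy long-format table: one row per (sid, resampled point); values are illustrative
df_toy = pd.DataFrame({
    "sid":   [1, 1, 1, 3, 3, 3],
    "point": [0, 1, 2, 0, 1, 2],
    "t":     [0.0, 0.5, 1.0, 0.0, 0.5, 1.0],
    "z":     [0.3, 0.9, 1.1, 0.4, 1.5, -1.3],
})
limit = 1.2

# a point passes if |z| is below the limit
df_toy["passed"] = df_toy["z"].abs() < limit

# failing (point, t) pairs, grouped by simulation id
failed = {
    sid: list(zip(group["point"], group["t"]))
    for sid, group in df_toy[~df_toy["passed"]].groupby("sid")
}

# True for a simulation only if all of its points pass
sid_passed = df_toy.groupby("sid")["passed"].all()
```

Here `failed` has a single key, 3, because every point of simulation 1 stays within the limits.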

## Norm limit test

Test whether the norm of each simulation does not exceed the given limit.

### Norm Linf

`norm_type` = 'Linf': test whether the absolute maximum of each simulation is less than the limit:

In [None]:
log, table, fig = V.norm_test(norm_type = 'Linf', limits = 2)

Print the report of the test:

In [None]:
log.print()

The DataFrame `table` indicates, for each point, whether it passes the test:

In [None]:
print(table.head(5))


### Norm L2

`norm_type` = 'L2' - test whether for each simulation the Euclidean norm (square root of the sum of the squares) does not exceed the given limit:

In [None]:
log, table, fig = V.norm_test(norm_type = 'L2', limits = 0.2)

Print the report of the test:

In [None]:
log.print()

Some simulations do not pass the test. Let's check their sids for 'z':

In [None]:
print('\nSimulations of z, that do not pass L2 norm test:')

print(log['z']['failed'])

print()

The DataFrame `table` indicates, for each point, whether it passes the test:

In [None]:
print(table.head(5)) #method head(n) shows first n rows of the DataFrame table
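Both norms used in this section are easy to compute by hand, which can help when interpreting the reports. The sketch below uses toy numbers and reuses the limits 2 and 0.2 from the calls above; comparing with a strict inequality is an assumption.

```python
import numpy as np

# rows are simulations, columns are resampled points (toy numbers)
z = np.array([[0.1, -0.1, 0.0],
              [3.0, -4.0, 0.0]])

linf = np.max(np.abs(z), axis=1)      # largest absolute value per simulation
l2 = np.sqrt(np.sum(z ** 2, axis=1))  # Euclidean norm per simulation

linf_passed = linf < 2                # limit from the Linf example
l2_passed = l2 < 0.2                  # limit from the L2 example

# the same values are available from NumPy's built-in vector norms
linf_np = np.linalg.norm(z, ord=np.inf, axis=1)
l2_np = np.linalg.norm(z, ord=2, axis=1)
```

Because the L2 norm sums over all points, its value grows with the number of points `num`, so a limit chosen for one resampling may not suit another.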

## Set multiple tests

Run several of the tests listed above at once with the `set_tests()` method.

Pass the parameters of each test as a dictionary, keyed by the test name ('std_bound', 'mean', 'sid', 'norm_L2', 'norm_Linf'):

In [None]:
logs, tables, figs = V.set_tests(test_method = 
                                 {'std_bound' : {'limits' : 0.2, 'n_std': 3},
                                  'norm_Linf' : {'limits' : 2}})

`logs`, `tables` and `figs` are dictionaries that hold the log, table and figure of each test; the key of each dictionary is the name of the test:

In [None]:
logs['std_bound'].print()
print(tables['norm_Linf'])