Table of contents:

[Data preparation](#data-preparation)

[Checking the standard deviation boundary](#checking-the-standard-deviation-boundary)

[Checking the mean value](#checking-the-mean-value)

[Checking the individual simulations](#checking-the-individual-simulations)

[Checking the standard deviation](#checking-the-standard-deviation)

[Norm limit test](#norm-limit-test)
  - [Norm Linf](#norm-linf)
  - [Norm L2](#norm-l2)

[Set multiple tests](#set-multiple-tests)

In [None]:
from citros import CitrosDB, Validation

## Data preparation

Globular clusters belong to the halo of the Galaxy. We can check whether the 'z' coordinate, that is, the vertical distance of the cluster from the galactic plane, exceeds the thickness of the Galactic thick disc, D = 1.2 kpc, in all simulations.

Let's download the columns 'data.data[0]' and 'data.data[5]', which hold the time and the vertical coordinate, respectively:

In [None]:
#create `CitrosDB` object to download data:
citros = CitrosDB(simulation = 'simulation_gal_orbits', batch = 'galactic_orbits')

#download data
df = citros.topic('/gal_orbits').data(['data.data[0]', 'data.data[5]'])
df.rename({'data.data[0]': 't', 'data.data[5]': 'z'}, axis = 1, inplace = True)
df.head(5)

Construct a `Validation` object. Its arguments determine how the data will be preprocessed:

- `data_label` specifies the data columns,
- `param_label` specifies the independent variable used to align the data of different sids,
- `method` determines the method of index assignment:
  - 'scale': scale `param_label` to the unit interval and interpolate the data on this interval,
  - 'bin': divide `param_label` into bins and calculate the mean of the data points that fall in each bin,
- `num` sets the number of points if `method` is 'scale', or the number of bins if `method` is 'bin':

In [None]:
V = Validation(df, data_label = 'z', param_label = 't', method = 'scale', num = 50, units = 'kpc')
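
As a rough illustration of what the two `method` options do, the sketch below reimplements them in plain Python for a single simulation. It is not the CITROS implementation: the helper names `scale_resample` and `bin_means` and the sample values are assumptions made for this example only.

```python
def scale_resample(t, y, num):
    """Scale t to [0, 1] and linearly interpolate y onto num evenly spaced points."""
    t0, t1 = t[0], t[-1]
    u = [(ti - t0) / (t1 - t0) for ti in t]        # t mapped to the unit interval
    grid = [i / (num - 1) for i in range(num)]     # common index grid for all sids
    out = []
    j = 0
    for g in grid:
        # advance to the segment [u[j], u[j + 1]] that contains g
        while j < len(u) - 2 and u[j + 1] < g:
            j += 1
        w = (g - u[j]) / (u[j + 1] - u[j])
        out.append(y[j] + w * (y[j + 1] - y[j]))
    return grid, out


def bin_means(t, y, num):
    """Divide the range of t into num equal bins and average y inside each bin."""
    t0, t1 = min(t), max(t)
    width = (t1 - t0) / num
    sums, counts = [0.0] * num, [0] * num
    for ti, yi in zip(t, y):
        k = min(int((ti - t0) / width), num - 1)   # the last edge belongs to the last bin
        sums[k] += yi
        counts[k] += 1
    # a bin without points has no mean; mark it with None
    return [s / c if c else None for s, c in zip(sums, counts)]


# hypothetical samples of one simulation (t must be sorted and strictly increasing)
t = [0.0, 1.0, 2.0, 4.0]
z = [0.0, 1.0, 2.0, 4.0]

grid, z_scaled = scale_resample(t, z, num=5)
z_binned = bin_means(t, z, num=2)
print(grid, z_scaled)
print(z_binned)
```

Either way, every simulation ends up with the same number of indexed points, which is what allows values from different sids to be compared index by index.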

## Checking the standard deviation boundary

Test whether the standard deviation boundary is within the limits:

- `limits`: 
  - a single value to set the same +-limits for all elements of the vector, for example limits = 0.25
  - a list of values to set +-limits for each vector element, for example limits = [0.25, 0.5, 100]
  - a list of lists to set the lower and upper limits separately, for example limits = [0.25, [-0.3, 0.8], [-150, 100]]
- `n_std`: number of standard deviations in standard deviation boundary
- `nan_passed`: whether nan values are treated as passed test or not
- to style the plot:
  - `std_area` - set True to fill with color standard deviation boundary
  - `std_lines` - set False to remove standard deviation boundary lines
  - `std_color` - set standard deviation boundary color, default 'b'
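
The three `limits` formats listed above can all be reduced to one (lower, upper) pair per vector element. The sketch below shows one way to do that. The helper `expand_limits` is hypothetical and only illustrates the convention; it is not part of the CITROS API, and the library may resolve ambiguous inputs differently.

```python
def expand_limits(limits, n):
    """Return n (lower, upper) pairs from any of the three limit formats."""
    if not isinstance(limits, (list, tuple)):
        limits = [limits] * n                    # one value shared by all elements
    if len(limits) != n:
        raise ValueError("expected one limit per vector element")
    pairs = []
    for lim in limits:
        if isinstance(lim, (list, tuple)):
            lower, upper = lim                   # explicit [lower, upper] interval
        else:
            lower, upper = -abs(lim), abs(lim)   # symmetric +-limit
        pairs.append((lower, upper))
    return pairs


print(expand_limits(0.25, 3))
print(expand_limits([0.25, [-0.3, 0.8], [-150, 100]], 3))
```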

Let's test whether the 3 standard deviation boundary stays within the thickness of the disc, D = 1.2 kpc:

In [None]:
log, table, fig = V.std_bound_test(limits = 1.2, n_std = 3, nan_passed = True, std_area = True, std_lines = False, std_color = 'b')

Print the report of the test:

In [None]:
log.print()

As predicted, the test is not passed because the z coordinate exceeds the limits. Let's print the indices of the failed points and the corresponding t values:

In [None]:
print('\nvalues of z that did not pass the test:')

log['z']['failed'].print()

The DataFrame table indicates whether each point passes the test or not:

In [None]:
print(table.head(5)) #method head(n) shows first n rows of the DataFrame table
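
The idea behind this test can be sketched in a few lines of plain Python: at each index, take the mean and standard deviation over all simulations and check that mean +- `n_std` standard deviations stays inside the limits. This is an illustration only. The function name `std_bound_check` and the sample values are hypothetical, and the use of the sample standard deviation is an assumption; CITROS may compute it differently.

```python
from statistics import mean, stdev


def std_bound_check(sims, lower, upper, n_std=3):
    """Per index: does mean +- n_std * std over the simulations stay in [lower, upper]?"""
    passed = []
    for values in zip(*sims):                  # values of all simulations at one index
        m, s = mean(values), stdev(values)     # sample standard deviation (assumption)
        passed.append(lower <= m - n_std * s and m + n_std * s <= upper)
    return passed


# hypothetical z values (kpc): three simulations, three indexed points each
sims = [[0.0, 0.5, 1.0],
        [0.2, 0.5, 1.2],
        [0.1, 0.5, 1.1]]

result = std_bound_check(sims, lower=-1.2, upper=1.2, n_std=3)
print(result)  # [True, True, False]
```

At the last index the mean is 1.1 kpc and the standard deviation is 0.1 kpc, so the upper 3 sigma boundary is about 1.4 kpc, which is outside the 1.2 kpc limit.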

## Checking the mean value

Define the limits within which the mean value should lie, for example |z| < 1.2 kpc:

In [None]:
log, table, fig = V.mean_test(limits = 1.2)

Print the report of the test:

In [None]:
log.print()

The DataFrame table indicates, for each point, whether it passes the test or not:

In [None]:
print(table.head(5)) #method head(n) shows first n rows of the DataFrame table
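
Conceptually, the mean test averages the simulations at each index and compares the result with the limits. A minimal sketch, with a hypothetical helper `mean_check` and made-up values rather than the CITROS internals:

```python
from statistics import mean


def mean_check(sims, lower, upper):
    """Per index: is the mean over all simulations inside [lower, upper]?"""
    return [lower <= mean(values) <= upper for values in zip(*sims)]


# hypothetical z values (kpc): three simulations, three indexed points each
sims = [[0.0, 1.0, 1.5],
        [0.2, 1.1, 1.3],
        [0.1, 0.9, 1.1]]

result = mean_check(sims, lower=-1.2, upper=1.2)
print(result)  # [True, True, False]
```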

## Checking the individual simulations

In [None]:
log, table, fig = V.sid_test(limits = 1.2)

Print the report of the test:

In [None]:
log.print()

Many points of the simulations do not pass the test.
Print their indices and the corresponding values of the independent variable, for example, for the simulation with sid = 3:

In [None]:
print('\nvalues of z, simulation sid = 3, that did not pass the test:')

log['z']['failed'][3].print()

The DataFrame table indicates, for each point, whether it passes the test or not:

In [None]:
print(table.head(5)) #method head(n) shows first n rows of the DataFrame table
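
Unlike the mean test, this check looks at every simulation separately. The sketch below collects, per sid, the indices whose value falls outside the limits. The helper `sid_check` and the data are hypothetical and only mirror the structure of the report, not the CITROS implementation.

```python
def sid_check(sims, lower, upper):
    """Return {sid: [indices outside [lower, upper]]} for simulations with failures."""
    failed = {}
    for sid, values in sims.items():
        bad = [i for i, v in enumerate(values) if not (lower <= v <= upper)]
        if bad:
            failed[sid] = bad
    return failed


# hypothetical z values (kpc) keyed by simulation id
sims = {0: [0.1, 0.4, 0.9],
        3: [0.5, 1.5, -1.3]}

failed = sid_check(sims, lower=-1.2, upper=1.2)
print(failed)  # {3: [1, 2]}
```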

## Checking the standard deviation

To check that the results obtained in different simulation runs do not differ too much, we can test whether the 
standard deviation exceeds the limits. For example, let's check that the 1 sigma standard deviations of the 
x and y parameters ('data.data[9]', 'data.data[10]') are less than 1.3 kpc:

In [None]:
df_xy = citros.topic('/gal_orbits').data(['data.data[0]', 'data.data[9]', 'data.data[10]'])
df_xy.rename({'data.data[0]': 't', 'data.data[9]': 'xg', 'data.data[10]': 'yg'}, axis = 1, inplace = True)
V_xy = Validation(df_xy, data_label = ['xg', 'yg'], param_label = 't', method = 'scale', num = 50, units = 'kpc')

log, table, fig = V_xy.std_test(limits = 1.3, n_std = 1, nan_passed = True, std_area = True, std_lines = False, std_color = 'b')
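
In outline, this test measures the spread between simulations at each index and compares `n_std` standard deviations with the limit. The sketch below is illustrative: `std_check` and the values are hypothetical, and the sample standard deviation is an assumption about how the spread is computed.

```python
from statistics import stdev


def std_check(sims, limit, n_std=1):
    """Per index: is n_std * std over the simulations below the limit?"""
    return [n_std * stdev(values) < limit for values in zip(*sims)]


# hypothetical x values (kpc): three simulations, three indexed points each
xg = [[1.0, 2.0, 3.0],
      [1.0, 2.5, 6.0],
      [1.0, 3.0, 0.0]]

result = std_check(xg, limit=1.3, n_std=1)
print(result)  # [True, True, False]
```

The simulations agree at the first index (zero spread), differ by a standard deviation of 0.5 kpc at the second, and by 3 kpc at the third, so only the last point fails.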

## Norm limit test

Test whether the norm of the parameter in each simulation does not exceed the given limit.

### Norm Linf

`norm_type` = 'Linf' - test whether the absolute maximum of the parameter in each simulation is less than the given limit.

Let's check that the cluster does not go too far from the Galactic center. We can query R and z ('data.data[1]' and 'data.data[5]'), the distance from the galactic axis and the vertical distance from the galactic plane, compute d, the distance from the Galactic center, and check that its maximum value in each simulation does not exceed, for example, 4 kpc:

In [None]:
import numpy as np
df_dist = citros.topic('/gal_orbits').data(['data.data[0]', 'data.data[1]', 'data.data[5]'])
df_dist.rename({'data.data[0]': 't', 'data.data[1]': 'R', 'data.data[5]': 'z'}, axis = 1, inplace = True)
df_dist['d'] = np.sqrt(df_dist['R']**2 + df_dist['z']**2)
V_dist = Validation(df_dist, data_label = 'd', param_label = 't', method = 'scale', num = 50, units = 'kpc')

In [None]:
log, table, fig = V_dist.norm_test(norm_type = 'Linf', limits = 4)

Print the report of the test:

In [None]:
log.print()

Display the DataFrame table that indicates, for each simulation, whether it passes the test or not:

In [None]:
print(table.head(5))
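
The Linf check reduces each simulation to a single number, the largest absolute value it reaches, and compares that number with the limit. A self-contained sketch using made-up (R, z) samples; `linf_norm` is a hypothetical helper, not a CITROS function:

```python
import math


def linf_norm(values):
    """Largest absolute value in the sequence."""
    return max(abs(v) for v in values)


# hypothetical (R, z) samples in kpc, keyed by simulation id
tracks = {
    0: [(3.0, 0.0), (0.0, 2.0), (1.2, 1.6)],
    1: [(3.0, 4.0), (1.0, 0.0), (0.0, 0.5)],
}
limit = 4.0

norms = {}
for sid, points in tracks.items():
    d = [math.hypot(R, z) for R, z in points]   # distance from the Galactic center
    norms[sid] = linf_norm(d)

passed = {sid: n <= limit for sid, n in norms.items()}
print(norms)
print(passed)  # {0: True, 1: False}
```

Simulation 1 reaches d = 5 kpc at its first point, so it fails the 4 kpc limit even though its other points are close to the center.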

### Norm L2

`norm_type` = 'L2' - test whether for each simulation the Euclidean norm (square root of the sum of the squares) does not exceed the given limit.

Using this type of norm, we can determine which simulation deviates the most from the galactic plane. We can do this by examining the 'z' parameter ('data.data[5]') and checking its deviation from z = 0:

In [None]:
df_z = citros.topic('/gal_orbits').data(['data.data[0]', 'data.data[5]'])
df_z.rename({'data.data[0]': 't', 'data.data[5]': 'z'}, axis = 1, inplace = True)
V_z = Validation(df_z, data_label = 'z', param_label = 't', method = 'scale', num = 50, units = 'kpc')
log, table, fig = V_z.norm_test(norm_type = 'L2', limits = 7)

In the report we can see that the maximum deviation is in simulation sid = 4:

In [None]:
log.print()
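
For reference, the L2 norm accumulates the deviation over the whole simulation instead of looking only at the worst point. The sketch below computes it for two made-up simulations and picks the one with the largest norm. `l2_norm` and the data are hypothetical and not taken from the CITROS batch.

```python
import math


def l2_norm(values):
    """Euclidean norm: square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in values))


# hypothetical z values (kpc) keyed by simulation id
sims = {0: [3.0, 4.0],
        1: [6.0, 8.0]}
limit = 7.0

norms = {sid: l2_norm(v) for sid, v in sims.items()}
passed = {sid: n <= limit for sid, n in norms.items()}
worst = max(norms, key=norms.get)               # sid with the largest deviation from z = 0

print(norms)   # {0: 5.0, 1: 10.0}
print(passed)  # {0: True, 1: False}
print(worst)   # 1
```

Because the sum runs over all points, the L2 norm grows with the number of points `num`, so a limit chosen for one resampling is not directly comparable with another.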

## Set multiple tests

Run several of the tests listed above at once with the `set_tests()` method.

Pass the tests as a dictionary whose keys are the test names ('std_bound', 'mean', 'sid', 'norm_L2', 'norm_Linf') and whose values are dictionaries with the parameters of the corresponding test.

For example, to set tests for the z parameter:

In [None]:
logs, tables, figs = V.set_tests(test_method = 
                                 {'std_bound' : {'limits' : 2, 'n_std': 3},
                                  'norm_Linf' : {'limits' : 2}})

`logs`, `tables` and `figs` are dictionaries that hold the log, table and figure of each test, keyed by the name of the test:

In [None]:
logs['std_bound'].print()
print(tables['norm_Linf'])