# AllInOne Files

### Faking nc=0 for some spaces:

Faking is done at the **ens** and **ensAvg** levels (and directories), not at the **segment** and **whole** levels, so the **segment stamps** are not modified, but the **whole** and **ens** ones are.

### To-do list
- [x] **space_tseries** function for all the timeseries in one **space**.
- [ ] **space** function for all the distributions in one **space**.
- [ ] **all-in-one** function for all the timeseries in all **space**s in a **project**.
- [ ] **all-in-one** function for all the distributions in all **space**s in a **project**.

### Naming convention:

This is the pattern of file or directory names:

1. **whole** files: whole-group-property_[-measure][-stage][.ext]
2. **ensemble** files: ensemble-group-property_[-measure][-stage][.ext]
3. **ensemble_long** files: ensemble_long-group-property_[-measure][-stage][.ext]
4. **space** files: space-group-property_[-measure][-stage][.ext]
5. **all in one** files: space-group-**species**-**allInOne**-property_[-measure][-stage][.ext]

[keyword] means that the keyword in the file name is optional. [-measure] is a physical measurement, such as the autocorrelation function (ACF), done on the physical 'property_'.
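The pattern above can be sketched as a small name builder. This is only an illustration: `build_name` and its parameter names are hypothetical and not part of `polyphys`, and the example values are assumptions chosen to resemble the names used in this notebook.

```python
def build_name(level_name, group, property_, measure=None, stage=None, ext=None):
    """Build '<level_name>-<group>-<property_>[-measure][-stage][.ext]'.

    Hypothetical helper: `level_name` stands for the name of a whole,
    ensemble, ensemble_long, or space.
    """
    name = "-".join([level_name, group, property_])
    if measure is not None:
        name += "-" + measure  # optional [-measure] keyword
    if stage is not None:
        name += "-" + stage  # optional [-stage] keyword
    if ext is not None:
        name += "." + ext  # optional [.ext] keyword
    return name


bare = build_name("ns400nl5al5.0ac1.0", "bug", "gyrTMon")
full = build_name(
    "ns400nl5al5.0ac1.0", "bug", "gyrTMon",
    measure="acf", stage="ensAvg", ext="csv"
)
print(bare)
print(full)
```

Every optional keyword is appended only when it is given, so the same function covers a bare property name and a fully qualified file name.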

## CAUTION: How to proceed:

1. Choose the project setting
2. Choose the space
3. Check the unique properties, and the ACF measures per unique property.
4. Check the name, format, and extension of the Pandas writer.

#### Imports

In [8]:
from glob import glob
from itertools import product
import pandas as pd
import numpy as np
from polyphys.manage import organizer
from polyphys.manage.parser import SumRuleCyl, TransFociCyl, TransFociCubic
from polyphys.analyze import measurer
import polyphys.api as api
import warnings
warnings.filterwarnings('ignore')

# Set the project; its hierarchy and physical properties are defined below.
# Uncomment exactly one of the following lines:
# project = 'SumRuleCyl'
# project = 'TransFociCyl'
project = 'TransFociCubic'
project_details ={
    'SumRuleCyl':{
        'parser': SumRuleCyl,
        'space_pat': 'N*D*ac*',
        'hierarchy': 'N*',
        'space_hierarchy': 'N*',
        'attributes': ['space', 'ensemble_long', 'ensemble', 'nmon', 'dcyl',
                       'dcrowd','phi_c_bulk'
                      ],
        'time_varying_props': [ 'asphericityTMon', 'fsdTMon', 'gyrTMon',
                               'rfloryTMon','shapeTMon','transSizeTMon'],
        'equil_measures': [np.mean, np.var, measurer.sem],
        'equil_attributes': ['space', 'ensemble_long', 'ensemble', 'nmon',
                             'dcyl','dcrowd', 'phi_c_bulk', 
                             'phi_c_bulk_round'
                            ],
        'equil_properties': ['asphericityMon-mean', 'asphericityMon-var',
                             'asphericityMon-sem', 'fsdMon-mean',
                             'fsdMon-var', 'fsdMon-sem', 'gyrMon-mean',
                             'gyrMon-var', 'gyrMon-sem', 'rfloryMon-mean',
                             'rfloryMon-var', 'rfloryMon-sem',
                             'shapeMon-mean', 'shapeMon-var', 'shapeMon-sem',
                             'transSizeMon-mean', 'transSizeMon-var',
                             'transSizeMon-sem'],
        'rhosPhisNormalizedScaled': [('Mon', 'dmon'), ('Crd', 'dcrowd')]
    },
    'TransFociCyl':{
        'parser': TransFociCyl,
        'space_pat': 'ns*nl*al*D*ac*',
        'hierarchy': 'eps*',
        'space_hierarchy': 'ns*',
        'attributes': ['space', 'ensemble_long', 'ensemble', 'nmon_small',
                       'nmon_large','dmon_large', 'dcyl', 'dcrowd',
                       'phi_c_bulk'
                      ],
        'time_varying_props': ['asphericityTMon', 'fsdTMon', 'gyrTMon',
                               'shapeTMon'],
        'equil_measures': [np.mean, np.var, measurer.sem],
        'equil_attributes': ['ensemble_long', 'ensemble', 'space', 'dcyl',
                             'dmon_large', 'nmon_large', 'nmon_small',
                             'dcrowd', 'phi_c_bulk', 'phi_c_bulk_round'],
        'equil_properties': ['asphericityMon-mean', 'asphericityMon-var',
                             'asphericityMon-sem', 'fsdMon-mean',
                             'fsdMon-var', 'fsdMon-sem', 'gyrMon-mean',
                             'gyrMon-var', 'gyrMon-sem', 'shapeMon-mean',
                             'shapeMon-var', 'shapeMon-sem'],
        'rhosPhisNormalizedScaled': [('Mon', 'dmon_small'), ('Crd', 'dcrowd'), ('Foci', 'dmon_large')]
    },
    'TransFociCubic':{
        'parser': TransFociCubic,
        'space_pat': 'ns*nl*al*ac*',
        'hierarchy': 'ns*',
        'space_hierarchy': 'ns*',
        'attributes': ['space', 'ensemble_long', 'ensemble', 'nmon_small',
                       'nmon_large','dmon_large', 'dcyl', 'dcrowd',
                       'phi_c_bulk'
                      ],
        'time_varying_props': ['asphericityTMon', 'fsdTMon', 'gyrTMon',
                               'shapeTMon'],
        'equil_measures': [np.mean, np.var, measurer.sem],
        'equil_attributes': ['ensemble_long', 'ensemble', 'space', 'dcyl',
                             'dmon_large', 'nmon_large', 'nmon_small',
                             'dcrowd', 'phi_c_bulk', 'phi_c_bulk_round'],
        'equil_properties': ['asphericityMon-mean', 'asphericityMon-var',
                             'asphericityMon-sem', 'fsdMon-mean',
                             'fsdMon-var', 'fsdMon-sem', 'gyrMon-mean',
                             'gyrMon-var', 'gyrMon-sem', 'shapeMon-mean',
                             'shapeMon-var', 'shapeMon-sem'],
        'rhosPhisNormalizedScaled': [('Mon', 'dmon_small'), ('Crd', 'dcrowd'), ('Foci', 'dmon_large')]
    }
}

## allInOne *whole* and *ensAvg* stamps per project

### **ensemble-averaged** stamps per project:

In [4]:
analysis_db = '/Users/amirhsi_mini/research_data/analysis/'
group = 'bug'
phase="ensAvg"
space_dbs = glob(analysis_db + project_details[project]['space_pat'])
ens_avg_space_dbs = [
    space_db + "/" for space_db in space_dbs if space_db.endswith(
        group + '-' + phase
    )
]
allInOne_stamps = []
for space_db in ens_avg_space_dbs:
    stamp_path = project_details[project]['space_hierarchy'] + 'stamps*' 
    # space_db already ends with "/", so no extra separator is needed
    stamp_path = glob(space_db + stamp_path + '.csv')[0]
    space_stamps = pd.read_csv(stamp_path)
    allInOne_stamps.append(space_stamps)
allInOne_stamps = pd.concat(allInOne_stamps, axis=0)
allInOne_stamps.reset_index(inplace=True, drop=True)
output = analysis_db + "allInOne-" + project + "-stamps-" + phase + ".csv"
allInOne_stamps.to_csv(output, index=False)

### **whole** stamps per project

In [5]:
analysis_db = '/Users/amirhsi_mini/research_data/analysis/'
group = 'bug'
phase='ens'
space_dbs = glob(analysis_db + project_details[project]['space_pat'])
phase_space_dbs = [
    space_db for space_db in space_dbs if space_db.endswith(
        group + '-' + phase
    )
]
allInOne_stamps = []
for space_db in phase_space_dbs:
    stamp_path = project_details[project]['space_hierarchy'] + 'stamps*'
    stamp_path = glob(space_db + "/" + stamp_path + '.csv')[0]
    space_stamps = pd.read_csv(stamp_path)
    allInOne_stamps.append(space_stamps)
allInOne_stamps = pd.concat(allInOne_stamps, axis=0)
allInOne_stamps.reset_index(inplace=True, drop=True)
output = analysis_db + "allInOne-" + project + "-stamps-" + phase +".csv"
allInOne_stamps.to_csv(output, index=False)
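The two stamp cells above differ only in `phase`, so they could share one helper. The following is a minimal sketch under stated assumptions: `collect_stamps` and its default `stamp_pat` are hypothetical names, and the directory tree is a toy one created in a temporary folder rather than the real analysis database.

```python
from glob import glob
import os
import tempfile

import pandas as pd


def collect_stamps(analysis_db, space_pat, group, phase, stamp_pat="*stamps*.csv"):
    """Concatenate the stamps file of every '<space>-<group>-<phase>' directory.

    Hypothetical helper; the names and the default `stamp_pat` are assumptions.
    """
    suffix = group + "-" + phase
    space_dbs = [
        path for path in glob(os.path.join(analysis_db, space_pat))
        if path.endswith(suffix)
    ]
    frames = []
    for space_db in sorted(space_dbs):
        matches = glob(os.path.join(space_db, stamp_pat))
        if not matches:
            # Skip the directory instead of raising IndexError on `[0]`
            continue
        frames.append(pd.read_csv(matches[0]))
    if not frames:
        # Nothing matched: return an empty frame rather than failing in concat
        return pd.DataFrame()
    return pd.concat(frames, axis=0, ignore_index=True)


# Toy demonstration on a temporary directory tree
with tempfile.TemporaryDirectory() as tmp:
    for space in ["ns400nl5al5.0ac1.0", "ns200nl5al3.0ac1.0"]:
        space_db = os.path.join(tmp, space + "-bug-ens")
        os.makedirs(space_db)
        pd.DataFrame({"space": [space], "nmon_large": [5]}).to_csv(
            os.path.join(space_db, space + "-stamps.csv"), index=False
        )
    stamps = collect_stamps(tmp, "ns*nl*al*ac*", "bug", "ens")
    empty = collect_stamps(tmp, "ns*nl*al*ac*", "bug", "ensAvg")
print(stamps)
```

Returning an empty DataFrame when no directory matches makes a wrong `group` or `phase` visible as an empty result instead of an exception.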

## **ensAvg** timeseries and their associated measures 

### **Measures** of chainsize timeseries properties per space

#### **allInOne** ACFs of the chain-size properties per **space**

In [6]:
%%time
# Wall time: 60 s for TransFoci
# Wall time: 4 min for SumRule
analysis_db = '/Users/amirhsi_mini/research_data/analysis/'
group = 'bug'
geometry = 'biaxial'
phase = 'ensAvg'
space_dbs = glob(analysis_db + project_details[project]['space_pat'])
ens_avg_space_dbs = [
    space_db + "/" for space_db in space_dbs if space_db.endswith(
        group + '-' + phase
    )
]
print(ens_avg_space_dbs)
# list of unique property_measures:
filepath = ens_avg_space_dbs[0] + '*' + project_details[project]['hierarchy'] + '.csv'  # physical properties in all the 
_, uniq_props_measures = organizer.unique_property(
    filepath, 2, ["-" + phase], drop_properties=['stamps'])
print(uniq_props_measures)
for ens_avg_space_db in ens_avg_space_dbs:
    ens_avgs = list()
    space = ens_avg_space_db.split('/')[-2].split('-')[0]
    for property_ in uniq_props_measures:
        ens_avg = organizer.space_tseries(
            ens_avg_space_db,
            property_,
            project_details[project]['parser'],
            project_details[project]['hierarchy'],
            project_details[project]['attributes'],
            group,
            geometry,
            is_save = False  # if True, save per property per space
        )
        ens_avgs.append(ens_avg)
    ens_avgs = pd.concat(ens_avgs,axis=1)
    # drop duplicated columns:
    ens_avgs = ens_avgs.loc[:,~ens_avgs.columns.duplicated()]
    output_name = analysis_db +  "-".join(
        [space,  group, "chainSize-acf.parquet.brotli"]
    )
    ens_avgs.to_parquet(output_name, index=False, compression='brotli')

['/Users/amirhsi_mini/research_data/analysis/ns400nl5al5.0ac1.0-bug-ensAvg/']
['asphericityTMon-acf', 'asphericityTMon-acfLowerCi', 'asphericityTMon-acfUpperCi', 'gyrTMon-acf', 'gyrTMon-acfLowerCi', 'gyrTMon-acfUpperCi', 'shapeTMon-acf', 'shapeTMon-acfLowerCi', 'shapeTMon-acfUpperCi']


ValueError: No objects to concatenate
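The `ValueError` above is what `pd.concat` raises when it receives an empty list, which suggests that no files were found for the requested property somewhere in the call chain. A minimal sketch of a guard follows; `concat_or_empty` is a hypothetical helper, not part of `polyphys` or pandas.

```python
import pandas as pd


def concat_or_empty(frames, **kwargs):
    """Concatenate `frames`, returning an empty DataFrame if there are none.

    Hypothetical helper; `kwargs` are passed through to `pd.concat`.
    """
    frames = [frame for frame in frames if frame is not None]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, **kwargs)


# Reproduce the error message recorded in the cell output
try:
    pd.concat([])
except ValueError as err:
    message = str(err)
    print(message)

# The guarded version returns an empty frame instead of raising
result = concat_or_empty([], axis=1)
print(result.empty)
```

Whether to skip silently or to stop with a clearer message is a design choice; for a long batch run, logging the offending space and property before skipping is usually more helpful than an empty result alone.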

#### **allInOne** chain-size properties per **space**

In [None]:
%%time
# Wall time: 2 min for TransFoci
# Wall time: 30 min for SumRule
analysis_db = '/Users/amirhsi_mini/research_data/analysis/'
group = 'bug'
geometry = 'biaxial'
phase = 'ensAvg'
space_dbs = glob(analysis_db + project_details[project]['space_pat'])
ens_avg_space_dbs = [
    space_db + "/" for space_db in space_dbs if space_db.endswith(
        group + '-' + phase
    )
]
print(ens_avg_space_dbs)
# list of unique property_measures:
filepath = ens_avg_space_dbs[0] + '*' + project_details[project]['hierarchy'] + '.csv'  # physical properties in all the 
_, uniq_props_measures = organizer.unique_property(
    filepath, 2, ["-" + phase], drop_properties=['stamps'])
props_tseries = list(
    set(
        [prop.split("-acf")[0] for prop in uniq_props_measures]
    )
)
print(props_tseries)
for ens_avg_space_db in ens_avg_space_dbs:
    ens_avgs = list()
    space = ens_avg_space_db.split('/')[-2].split('-')[0]
    for property_ in props_tseries:
        ens_avg = organizer.space_tseries(
            ens_avg_space_db,
            property_,
            project_details[project]['parser'],
            project_details[project]['hierarchy'],
            project_details[project]['attributes'],
            group,
            geometry,
            is_save = False  # if True, save per property per space
        )
        ens_avgs.append(ens_avg)
    ens_avgs = pd.concat(ens_avgs,axis=1)
    # drop duplicated columns:
    ens_avgs = ens_avgs.loc[:,~ens_avgs.columns.duplicated()]
    output_name = analysis_db +  "-".join(
        [space,  group, "chainSize.parquet.brotli"]
    )
    ens_avgs.to_parquet(output_name, index=False, compression='brotli')
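Two string manipulations carry the cell above: extracting the space name from a directory path, and reducing the list of property-measure names to the underlying timeseries properties. They can be sketched as below; `space_from_db` and `tseries_props` are hypothetical helper names, and the measure list is a shortened copy of the output printed earlier in this notebook.

```python
from pathlib import PurePosixPath


def space_from_db(space_db):
    """Return the space name from a '<space>-<group>-<phase>/' directory path."""
    # `.name` ignores a trailing slash, unlike split('/')[-1]
    return PurePosixPath(space_db).name.split("-")[0]


def tseries_props(props_measures, measure="acf"):
    """Strip the '-<measure>...' suffix and return the sorted unique properties."""
    return sorted({name.split("-" + measure)[0] for name in props_measures})


db = "/Users/amirhsi_mini/research_data/analysis/ns400nl5al5.0ac1.0-bug-ensAvg/"
measures = [
    "asphericityTMon-acf", "asphericityTMon-acfLowerCi",
    "gyrTMon-acf", "gyrTMon-acfUpperCi", "shapeTMon-acf",
]
space = space_from_db(db)
props = tseries_props(measures)
print(space)
print(props)
```

Sorting the set gives a deterministic property order, so the columns of the concatenated dataset come out in the same order on every run, which a plain `list(set(...))` does not guarantee.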

### Pair distance time-series per project: For TransFoci project

In [None]:
%%time
analysis_db = '/Users/amirhsi_mini/research_data/analysis/'
group = 'bug'
geometry = 'biaxial'
phase = 'ensAvg'
space_dbs = glob(analysis_db + project_details[project]['space_pat'])
ens_avg_space_dbs = [
    space_db + "/" for space_db in space_dbs if space_db.endswith(
        group + '-' + phase
    )
]
print(ens_avg_space_dbs)
tseries_foci_props = ['pairDistTFoci']
project_ens_avgs = []
for prop in tseries_foci_props:
    prop_ens_avgs = list()
    for ens_avg_space_db in ens_avg_space_dbs:
        space = ens_avg_space_db.split('/')[-2].split('-')[0]
        ens_avg = organizer.space_tseries(
            ens_avg_space_db,
            prop,
            project_details[project]['parser'],
            project_details[project]['hierarchy'],
            project_details[project]['attributes'],
            group,
            geometry,
            is_save = False  # if True, save per property per space
        )
        prop_ens_avgs.append(ens_avg)
    prop_ens_avgs = pd.concat(prop_ens_avgs,axis=0)
    # drop duplicated columns:
    prop_ens_avgs = prop_ens_avgs.loc[:, ~prop_ens_avgs.columns.duplicated()]
    prop_ens_avgs.reset_index(inplace=True, drop=True)
    project_ens_avgs.append(prop_ens_avgs)
project_ens_avgs = pd.concat(project_ens_avgs,axis=1)
project_ens_avgs = \
    project_ens_avgs.loc[:, ~project_ens_avgs.columns.duplicated()]
project_ens_avgs.reset_index(inplace=True, drop=True)
output ='-'.join(['allInOne', project, group, 'pairDistT.parquet.brotli'])
output = analysis_db + output
project_ens_avgs.to_parquet(output, index=False, compression='brotli')

## Equilibrium timeseries properties per space AND per project

### Whole equilibrium properties allInOne

In [9]:
%%time
# Wall time: 23 s for TransFoci
# Wall time: 10 min for SumRule
analysis_db = '/Users/amirhsi_mini/research_data/analysis/'
group = 'bug'
spaces = glob(analysis_db + project_details[project]['space_pat'])
spaces = list(set([space.split('/')[-1].split('-')[0] for space in spaces]))
save_space = True
equili_props_wholes = api.allInOne_equil_tseries(
    project,
    analysis_db,
    group,
    spaces,
    project_details[project]['time_varying_props'],
    project_details[project]['equil_measures'],
    save_space=save_space,
    divisor = 0.025,
    round_to = 3,
    save_to=analysis_db
)

ens_avg = api.allInOne_equil_tseries_ensAvg(
    project,
    equili_props_wholes,
    group,
    project_details[project]['equil_properties'],
    project_details[project]['equil_attributes'],
    save_to=analysis_db
)

ValueError: No objects to concatenate

## Distributions

### Clusters and bonds per project

- The histograms of **Clusters** and **bonds** can **not** be combined in **one** dataset.
- Since **per project** datasets are small, we create **one** per project dataset for each property.
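One likely reason the two histograms do not fit in one dataset, judging from the bin centers used in this notebook, is that they are binned differently. The sketch below only illustrates that difference; the interpretation of the bins (number of bonds versus cluster size) is an assumption.

```python
import numpy as np

nmon_large = 5  # number of large monomers, as set in this notebook

# Assumed meaning: a large monomer can have 0 to nmon_large - 1 bonds
bonds_bin_centers = np.arange(nmon_large)
# Assumed meaning: a cluster can contain 1 to nmon_large large monomers
clusters_bin_centers = np.arange(1, nmon_large + 1)

print(bonds_bin_centers)
print(clusters_bin_centers)
same_bins = np.array_equal(bonds_bin_centers, clusters_bin_centers)
print(same_bins)
```

Because the two sets of bin centers are offset by one, sharing a single `bin_center` column across both properties would misalign their values.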

In [None]:
analysis_db = '/Users/amirhsi_mini/research_data/analysis/'
group = 'bug'
geometry = 'biaxial'
phase = 'ensAvg'
space_dbs = glob(analysis_db + project_details[project]['space_pat'])
ens_avg_space_dbs = [
    space_db + "/" for space_db in space_dbs if space_db.endswith(
        group + '-' + phase
    )
]
print(ens_avg_space_dbs)

nmon_large = 5
hist_t_foci_bin_centers = {
   'bondsHistFoci': np.arange(nmon_large),
   'clustersHistFoci': np.arange(1, nmon_large + 1)
}
# Separate per-project dataset for bonds and for clusters
for prop, bin_center in hist_t_foci_bin_centers.items():
    ens_avgs = list()
    for ens_avg_space_db in ens_avg_space_dbs:
        space = ens_avg_space_db.split('/')[-2].split('-')[0]
        ens_avg = organizer.space_hists(
            ens_avg_space_db,
            prop,
            project_details[project]['parser'],
            project_details[project]['hierarchy'],
            project_details[project]['attributes'],
            group,
            geometry,
            bin_center=bin_center,
            normalize=True,
            is_save = False
        )
        ens_avgs.append(ens_avg)
    ens_avgs = pd.concat(ens_avgs,axis=0)
    # drop duplicated columns:
    ens_avgs = ens_avgs.loc[:, ~ens_avgs.columns.duplicated()]
    ens_avgs.reset_index(inplace=True, drop=True)
    output =  "-".join(['allInOne', project, group, prop + ".parquet.brotli"])
    output = analysis_db + output
    ens_avgs.to_parquet(output, index=False, compression='brotli')

### Pair Distance Statistics per project

- These **properties** can be **combined** in one file per project.
- Since **per project** datasets are small, we create **one** per project dataset for **all** properties.
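Combining related properties in one file amounts to a column-wise concatenation followed by dropping the repeated attribute columns. A minimal sketch with toy data follows; the column names and values are made up for illustration and are not the real output of `organizer.space_hists`.

```python
import pandas as pd

# Toy per-property frames sharing the same attribute columns (assumed layout)
hist = pd.DataFrame({
    "space": ["s1", "s2"],
    "bin_center": [0.5, 0.5],
    "pairDistHistFoci-mean": [0.1, 0.2],
})
rdf = pd.DataFrame({
    "space": ["s1", "s2"],
    "bin_center": [0.5, 0.5],
    "pairDistRdfFoci-mean": [1.1, 1.2],
})

combined = pd.concat([hist, rdf], axis=1)
# `duplicated()` marks repeated column names; the first occurrence is kept
combined = combined.loc[:, ~combined.columns.duplicated()]
print(combined)
```

This only works when the rows of the per-property frames are aligned, that is, built from the same spaces and bins in the same order; otherwise a key-based merge on the attribute columns would be the safer choice.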

In [None]:
analysis_db = '/Users/amirhsi_mini/research_data/analysis/'
group = 'bug'
geometry = 'biaxial'
phase = 'ensAvg'
space_dbs = glob(analysis_db + project_details[project]['space_pat'])
ens_avg_space_dbs = [
    space_db + "/" for space_db in space_dbs if space_db.endswith(
        group + '-' + phase
    )
]
print(ens_avg_space_dbs)

hist_foci_props = ['pairDistHistFoci', 'pairDistRdfFoci']
# One per-project dataset for both properties since they are small and related
project_ens_avgs = []
for prop in hist_foci_props:
    prop_ens_avgs = list()
    for ens_avg_space_db in ens_avg_space_dbs:
        space = ens_avg_space_db.split('/')[-2].split('-')[0]
        ens_avg = organizer.space_hists(
            ens_avg_space_db,
            prop,
            project_details[project]['parser'],
            project_details[project]['hierarchy'],
            project_details[project]['attributes'],
            group,
            geometry,
            bin_center=None,
            normalize=False,
            is_save=False
        )
        prop_ens_avgs.append(ens_avg)
    prop_ens_avgs = pd.concat(prop_ens_avgs,axis=0)
    # drop duplicated columns:
    prop_ens_avgs = prop_ens_avgs.loc[:, ~prop_ens_avgs.columns.duplicated()]
    prop_ens_avgs.reset_index(inplace=True, drop=True)
    project_ens_avgs.append(prop_ens_avgs)
project_ens_avgs = pd.concat(project_ens_avgs,axis=1)
# drop duplicated columns:
project_ens_avgs = project_ens_avgs.loc[:, ~project_ens_avgs.columns.duplicated()]
project_ens_avgs.reset_index(inplace=True, drop=True)
output = '-'.join(
    ['allInOne', project, group, 'pairDistStats.parquet.brotli']
)
output = analysis_db + output
project_ens_avgs.to_parquet(output, index=False, compression='brotli')


### Local Spatial distributions

#### allInOne Local Distributions: ensAvg of Hists, Rhos, Phis with var and sem per project

There is no need to run this, as the information already exists in the "allInOne Sum-Rule" section.

In [None]:
analysis_db = '/Users/amirhsi_mini/research_data/analysis/'
group = 'all'
geometry = 'biaxial'
phase = 'ensAvg'
space_dbs = glob(analysis_db + project_details[project]['space_pat'])
ens_avg_space_dbs = [
    space_db + "/" for space_db in space_dbs if space_db.endswith(
        group + '-' + phase
    )
]
print(ens_avg_space_dbs)
# list of unique properties and property_measures:
# Local distributions do not have any property_measures:
uniq_props, _ = organizer.unique_property(
    ens_avg_space_dbs[0] + '*' + \
        project_details[project]['hierarchy'] + '.csv',
    2,
    ["-" + phase],
    drop_properties=["stamps"])
print(uniq_props)

In [None]:
directions = ['theta', 'z', 'r']
for direction in directions:
    props_by_dir = [prop for prop in uniq_props if prop.startswith(direction)]
    dir_ens_avgs = list()
    for prop in props_by_dir:
        prop_ens_avgs = list()
        for ens_avg_space_db in ens_avg_space_dbs:
            space = ens_avg_space_db.split('/')[-2].split('-')[0]
            ens_avg = organizer.space_hists(
                ens_avg_space_db,
                prop,
                project_details[project]['parser'],
                project_details[project]['hierarchy'],
                project_details[project]['attributes'],
                group,
                geometry,
                normalize=True,
                is_save=False
            )
            prop_ens_avgs.append(ens_avg)
        prop_ens_avgs = pd.concat(prop_ens_avgs,axis=0)
        # drop duplicated columns:
        prop_ens_avgs = \
            prop_ens_avgs.loc[:, ~prop_ens_avgs.columns.duplicated()]
        prop_ens_avgs.reset_index(inplace=True, drop=True)
        dir_ens_avgs.append(prop_ens_avgs)
    dir_ens_avgs = pd.concat(dir_ens_avgs,axis=1)
    # drop duplicated columns:
    dir_ens_avgs = dir_ens_avgs.loc[:, ~dir_ens_avgs.columns.duplicated()]
    dir_ens_avgs.reset_index(inplace=True, drop=True)
    output = analysis_db +  "-".join([
        'allInOne', project,  group,  direction + "LocalDist.parquet.brotli"
    ])
    dir_ens_avgs.to_parquet(output, index=False, compression='brotli')

#### allInOne Sum-Rule:

In [None]:
analysis_db = '/Users/amirhsi_mini/research_data/analysis/'
group = 'all'
geometry = 'biaxial'
phase = 'ensAvg'
space_dbs = glob(analysis_db + project_details[project]['space_pat'])
ens_avg_space_dbs = [
    space_db + "/" for space_db in space_dbs if space_db.endswith(
        group + '-' + phase
    )
]
print(ens_avg_space_dbs)
species_dict = project_details[project]['rhosPhisNormalizedScaled']
print('species_dict: ', species_dict)
directions = ['r', 'z']
props= ['Rho', 'Phi']
dir_prop_pairs = list(product(props, directions))
print('dir_prop_pairs: ', dir_prop_pairs)
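Note that `product(props, directions)` yields `(prop, direction)` tuples, in that order, despite the variable being named `dir_prop_pairs`. The sketch below lists the pairs and the output file names that the naming scheme of this notebook would produce for them; the `project` and `group` values are example assumptions.

```python
from itertools import product

project = "TransFociCubic"  # example value
group = "all"  # example value
props = ["Rho", "Phi"]
directions = ["r", "z"]

# Each tuple is (prop, direction): the first argument of product varies slowest
pairs = list(product(props, directions))
print(pairs)

names = [
    "-".join(["allInOne", project, group, direction + prop])
    + "-NormalizedScaled.parquet.brotli"
    for prop, direction in pairs
]
for name in names:
    print(name)
```

Unpacking the tuples as `prop, direction` keeps the loop consistent with the order in which `product` was called.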

In [None]:
for (prop, direction) in dir_prop_pairs:
    all_in_one = list()
    for (species, size_attr) in species_dict:
        per_species = list()
        for ens_avg_space_db in ens_avg_space_dbs:
            space = ens_avg_space_db.split('/')[-2].split('-')[0]
            per_space = organizer.space_sum_rule(
                ens_avg_space_db,
                prop,
                project_details[project]['parser'],
                project_details[project]['hierarchy'],
                project_details[project]['attributes'],
                species,
                size_attr,
                group,
                geometry,
                direction,
                is_save=False
            )
            per_species.append(per_space)
        per_species = pd.concat(per_species,axis=0)
        per_species = per_species.loc[:, ~per_species.columns.duplicated()]
        per_species.reset_index(inplace=True, drop=True)
        all_in_one.append(per_species)
    all_in_one = pd.concat(all_in_one,axis=1)
    all_in_one = all_in_one.loc[:, ~all_in_one.columns.duplicated()]
    all_in_one.reset_index(inplace=True, drop=True)
    output = '-'.join(['allInOne', project, group, direction + prop])
    output += '-NormalizedScaled.parquet.brotli'
    output = analysis_db + output
    all_in_one.to_parquet(output, index=False, compression='brotli')