# AllInOne Files

### Faking nc=0 for some spaces:

Faking is done at the **ens** and **ensAvg** levels (and directories), not at the **segment** and **whole** levels, so the **segment stamps** are not modified but the **whole** and **ens** ones are modified.

### To-do list
- [x] **space_tseries** function for all the timeseries in one **space**.
- [ ] **space** function for all the distributions in one **space**.
- [ ] **all-in-one** function for all the timeseries in all **space**s in a **project**.
- [ ] **all-in-one** function for all the distributions in all **space**s in a **project**.

### Naming convention:

This is the pattern of file or directory names:

1. **whole** files: whole-group-property_[-measure][-stage][.ext]
2. **ensemble** files: ensemble-group-property_[-measure][-stage][.ext]
3. **ensemble_long** files: ensemble_long-group-property_[-measure][-stage][.ext]
4. **space** files: space-group-property_[-measure][-stage][.ext]
5. **all in one** files: space-group-**species**-**allInOne**-property_[-measure][-stage][.ext]

[keyword] means that the keyword is optional in the file name. [-measure] is a physical measurement, such as the autocorrelation function (ACF), performed on the physical property 'property_'.
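
The pattern above can be sketched as a small helper. This is only an illustration of the convention, not part of polyphys: the function `build_name` and its parameter names are hypothetical, and the example property `gyrTMon` is borrowed from later cells in this notebook.

```python
from typing import Optional


def build_name(
    lineage_name: str,
    group: str,
    property_: str,
    measure: Optional[str] = None,
    stage: Optional[str] = None,
    ext: Optional[str] = None,
) -> str:
    """Join mandatory and optional keywords with hyphens (hypothetical helper)."""
    parts = [lineage_name, group, property_]
    # Optional keywords are appended only when they are given.
    parts += [keyword for keyword in (measure, stage) if keyword]
    name = "-".join(parts)
    return f"{name}.{ext}" if ext else name


space_file = build_name(
    "N2000D30.0ac6.0", "bug", "gyrTMon", measure="acf", stage="ensAvg", ext="csv"
)
print(space_file)
```

Leaving out `measure`, `stage`, and `ext` gives the bare `name-group-property_` form.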

## CAUTION: How to do:

1. Choose the project setting
2. Choose the space
3. Check the unique properties and the ACF measures per unique property.
4. Check the name, format, and extension of the Pandas writer.

#### Imports

In [None]:
# settings for testing and running on a PC.
from glob import glob
import pathlib
import pandas as pd
import numpy as np
import re
from polyphys.manage import organizer
from polyphys.manage.parser import SumRule, TransFoci
from polyphys.analyze import analyzer
from polyphys.analyze import measurer
import warnings
warnings.filterwarnings("ignore")

### List of physical properties: Set the project hierarchy

In [None]:
# list of unique property_measures:
#project_name = "TransFoci" 
#project_parser=TransFoci
#project_hierarchy = "ns*"
#project_attrs = ['space', 'ensemble_long', 'ensemble', 'nmon_small', 'nmon_large','dmon_large', 'dcyl', 'dcrowd', 'phi_c_bulk']

project_name = "SumRule"
project_parser = SumRule
project_hierarchy = "N*"
project_attrs = ['space', 'ensemble_long', 'ensemble', 'nmon', 'dcyl', 'dcrowd','phi_c_bulk']

species = 'Mon'
group = 'bug'
geometry = 'biaxial'
phase = 'ensAvg'

#space = "ns400nl5al5D20ac1"
#space = 'N2000D30.0ac4.0'
space = 'N2000D30.0ac6.0'
analysis_database = '/Users/amirhsi_mini/research_data/analysis/'  # path to the "analysis" phase directory
space_ensAvg_path = analysis_database + space + "-" + group + "-" + phase
filepathes = analysis_database + "/" + project_hierarchy + "-ensAvg" + "/" + project_hierarchy + ".csv"  # physical properties in all the 
uniq_props, uniq_props_stats = organizer.unique_property(filepathes, 2, "-" + phase, drop_properties=["stamps"])
print(uniq_props, uniq_props_stats)

## allInOne ensAvg stamps

In [None]:
spaces_stamps = glob(analysis_database + "/" + project_hierarchy + "-ensAvg/" + project_hierarchy + "-stamps-ensAvg.csv")
allInOne_stamps = []
for space_path in spaces_stamps:
    space_stamps = pd.read_csv(space_path)
    allInOne_stamps.append(space_stamps)
allInOne_stamps = pd.concat(allInOne_stamps, axis=0)
allInOne_stamps.to_csv(analysis_database + "allInOne-" + project_name + "-stamps-ensAvg.csv", index=False)
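
The row-wise merge in the cell above can be tried on toy data. The two small tables below are made-up stand-ins for per-space stamps files, and `ignore_index=True` is an optional addition, not something the cell above uses: it renumbers the rows, which only matters if the merged table is used in memory rather than written with `index=False`.

```python
import pandas as pd

# Hypothetical stamps tables for two spaces; the real ones are read from CSV files.
stamps_a = pd.DataFrame({"space": ["N2000D30.0ac4.0"] * 2, "phi_c_bulk": [0.1, 0.2]})
stamps_b = pd.DataFrame({"space": ["N2000D30.0ac6.0"] * 2, "phi_c_bulk": [0.1, 0.2]})

# axis=0 stacks the rows; ignore_index=True renumbers them 0..n-1.
all_in_one = pd.concat([stamps_a, stamps_b], axis=0, ignore_index=True)
print(all_in_one)
```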

## timeseries and their associated measures: **space** files and **allInOne** files

### **Measures** of properties

#### settings per project:

In [None]:
# allInOne timeseries for chain-size statistics
#project_name = "TransFoci" 
#project_parser=TransFoci
#project_hierarchy = "/ns*"
#project_attrs_all_in_one = ['space', 'ensemble_long', 'ensemble', 'nmon_small', 'nmon_large','dmon_large', 'dcyl', 'dcrowd', 'phi_c_bulk'] # for merging several spaces into one
#project_attrs_per_space = ['ensemble_long', 'phi_c_bulk'] # for merging several ensAvgs of one space


project_name = "SumRule"
project_parser = SumRule
project_hierarchy = "/N*"
project_attrs_all_in_one = ['space', 'ensemble_long', 'ensemble', 'nmon', 'dcyl', 'dcrowd','phi_c_bulk'] # for merging several spaces into one
#project_attrs_per_space = ['ensemble_long', 'phi_c_bulk'] # for merging several ensAvgs of one space

species = 'Mon'
group = 'bug'
geometry = 'biaxial'
phase = 'ensAvg'

#space = "ns400nl5al5D20ac1"
#space = 'N2000D30.0ac4.0'
space = 'N2000D30.0ac6.0'

analysis_database = '/Users/amirhsi_mini/research_data/analysis/'  # path to 
space_ensAvg_path = analysis_database + space + "-" + group + "-" + phase

#### ACF of chain-size timeseries per project

In [None]:
# separating property_measures of these two kinds: timeseries and timeseries ACFs:
bug_properties_acfs = list()
for property_measure in uniq_props_stats:
    if "-acf" in property_measure:
        bug_properties_acfs.append(property_measure)
bug_properties_acfs.sort()
print(bug_properties_acfs)
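
A note on the substring test used above, as a minimal sketch. The names in `property_measures` are hypothetical examples that follow the naming convention; they are not read from any real directory. The point is that `"-acf" in name` also matches any measure whose name merely starts with `acf`, whereas `str.endswith` keeps only the exact `-acf` measure.

```python
# Hypothetical property-measure names following the naming convention.
property_measures = [
    "gyrTMon",
    "gyrTMon-acf",
    "fsdTMon-acf",
    "fsdTMon",
    "gyrTMon-acfLowerCi",
]

# Substring match: keeps every name containing "-acf".
acfs = sorted(p for p in property_measures if "-acf" in p)
# Suffix match: keeps only names that end with "-acf".
exact_acfs = sorted(p for p in property_measures if p.endswith("-acf"))

print(acfs)
print(exact_acfs)
```

Which of the two is wanted depends on whether related measures should be merged into the same allInOne file.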

##### Generating the allInOne ACF file

In [None]:
%%time
# 10 sec for N2000.. with nlags=20000
ensAvgs = list()
for property_ in bug_properties_acfs:
    ensAvg = organizer.space_tseries(
        space_ensAvg_path,
        property_,
        project_parser,
        project_hierarchy,
        project_attrs_all_in_one,
        species,
        group,
        geometry,
        is_save = False
    )
    ensAvgs.append(ensAvg)
ensAvgs = pd.concat(ensAvgs,axis=1)
# drop duplicated columns:
ensAvgs = ensAvgs.loc[:,~ensAvgs.columns.duplicated()]
output_name = analysis_database +  "-".join([space,  group,  species, "allInOne", "chainSize-acf.parquet.brotli"])
#ensAvgs.to_csv(output_name, index=False)
ensAvgs.to_parquet(output_name, index=False, compression='brotli')
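
The column-wise merge and the duplicate-column filter used above can be checked on toy data. The two tables below are made up; they only mimic the situation where every per-property table repeats the same attribute columns.

```python
import pandas as pd

# Hypothetical per-property tables that share the attribute columns.
gyr = pd.DataFrame(
    {"nmon": [2000, 2000], "phi_c_bulk": [0.1, 0.2], "gyrTMon-acf": [1.0, 0.5]}
)
fsd = pd.DataFrame(
    {"nmon": [2000, 2000], "phi_c_bulk": [0.1, 0.2], "fsdTMon-acf": [1.0, 0.4]}
)

# Side-by-side merge: six columns, with the attribute columns repeated.
merged = pd.concat([gyr, fsd], axis=1)
# Keep only the first occurrence of each column name.
merged = merged.loc[:, ~merged.columns.duplicated()]
print(list(merged.columns))
```

`columns.duplicated()` compares column names only, not values, so this step assumes that the rows of all tables are in the same order.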

#### Properties time series per project

In [None]:
%%time
# 40 sec N2000
ensAvgs = list()
for property_ in uniq_props:
    ensAvg = organizer.space_tseries(
        space_ensAvg_path,
        property_,
        project_parser,
        project_hierarchy,
        project_attrs_all_in_one,
        species,
        group,
        geometry,
        is_save = False
    )
    ensAvgs.append(ensAvg)
ensAvgs = pd.concat(ensAvgs,axis=1)
# drop duplicated columns:
ensAvgs = ensAvgs.loc[:,~ensAvgs.columns.duplicated()]
output_name = analysis_database +  "-".join([space,  group,  species, "allInOne", "chainSize.parquet.brotli"])
#ensAvgs.to_csv(output_name, index=False)
ensAvgs.to_parquet(output_name, index=False, compression='brotli')

###### dask version: memory leak issue

In [None]:
%%time
# The parallel (dask) version has a memory-leak issue.
# Note: `bug_property_measures` and `database` must be defined before running this cell.
from dask import delayed, compute

group = 'bug'
geometry = 'biaxial'
ensAvg_path = "/Users/amirhsi_mini/analysis/N2000D30.0ac4.0-bug-ensAvg"
all_in_one_computed = []
for property_measure in bug_property_measures:
    all_in_one_delayed = delayed(organizer.all_in_one_tseries)(
        ensAvg_path,
        property_measure,
        group = group,
        geometry = geometry,
        save_to = database
    )
    all_in_one_computed.append(all_in_one_delayed)
_ = compute(all_in_one_computed)

## Equilibrium timeseries properties per space AND per project

#### Imports:

In [None]:
# settings for testing and running on a PC.
from glob import glob
import pandas as pd
import numpy as np
import polyphys.api as api
from polyphys.analyze import analyzer
from polyphys.analyze import measurer
import warnings
warnings.filterwarnings("ignore")

#### Whole equilibrium properties allInOne

In [None]:
#%%time
# loading databases:
database = '/Users/amirhsi_mini/research_data/analysis/'

project = "SumRule"
#project = "TransFoci"

project_dict = {
    "SumRule": {
        'spaces': ["N2000D30.0ac4.0","N2000D30.0ac6.0"],
        'time_varying_props': [ 'asphericityTMon', 'fsdTMon', 'gyrTMon',
                               'rfloryTMon','shapeTMon'],
        'measures': [np.mean, np.var, measurer.sem],
        'attributes': ['space', 'ensemble_long', 'ensemble', 'nmon', 'dcyl',
                       'dcrowd', 'phi_c_bulk', 'phi_c_bulk_round'],
        'properties': ['asphericityMon-mean', 'asphericityMon-var',
                        'asphericityMon-sem', 'fsdMon-mean', 'fsdMon-var',
                        'fsdMon-sem', 'gyrMon-mean', 'gyrMon-var',
                        'gyrMon-sem', 'rfloryMon-mean', 'rfloryMon-var',
                        'rfloryMon-sem', 'shapeMon-mean', 'shapeMon-var',
                        'shapeMon-sem']
    },
    "TransFoci": {
        'spaces': ["ns400nl5al5D20ac1"],
        'time_varying_props': ['asphericityTMon', 'fsdTMon', 'gyrTMon',
                               'shapeTMon'],
        'measures': [np.mean, np.var, measurer.sem],
        'attributes': ['ensemble_long', 'ensemble', 'space', 'dcyl',
                       'dmon_large', 'nmon_large', 'nmon_small', 'dcrowd',
                       'phi_c_bulk', 'phi_c_bulk_round'],
        'properties': ['asphericityMon-mean', 'asphericityMon-var',
                       'asphericityMon-sem', 'fsdMon-mean', 'fsdMon-var',
                       'fsdMon-sem', 'gyrMon-mean', 'gyrMon-var',
                       'gyrMon-sem', 'shapeMon-mean', 'shapeMon-var',
                       'shapeMon-sem']
    }
}

group = "bug"
species = "Mon"
save_to = "./"
save_space = True

equili_props_wholes = api.allInOne_equil_tseries(
    project,
    database,
    species,
    group,
    project_dict[project]['spaces'],
    project_dict[project]['time_varying_props'],
    project_dict[project]['measures'],
    save_space=save_space,
    round_to = 0.025,
    save_to=save_to
)

ensAvg = api.allInOne_equil_tseries_ensAvg(
    project,
    equili_props_wholes,
    species,
    group,
    project_dict[project]['properties'],
    project_dict[project]['attributes'],
    save_to =save_to
)
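
The cell above passes three measures (`np.mean`, `np.var`, `measurer.sem`) and a `round_to` step of 0.025. What these could mean is sketched below with the standard library only. Both helpers are hypothetical: `round_to_step` assumes that `round_to` snaps `phi_c_bulk` to the nearest multiple of the step (giving `phi_c_bulk_round`), and `sem` assumes the usual sample-standard-deviation definition. Neither has been checked against the polyphys implementation. Note that `np.var` uses the population variance (`ddof=0`) by default.

```python
import math
import statistics


def round_to_step(value: float, step: float) -> float:
    """Round value to the nearest multiple of step (hypothetical helper)."""
    # The outer round removes floating-point noise such as 0.30000000000000004.
    return round(round(value / step) * step, 10)


def sem(samples):
    """Standard error of the mean, using the sample standard deviation."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Made-up bulk volume fractions and a made-up short timeseries.
phi_c_bulk = [0.0, 0.0987, 0.2013, 0.3049]
phi_c_bulk_round = [round_to_step(phi, 0.025) for phi in phi_c_bulk]
print(phi_c_bulk_round)

gyr_t = [10.0, 12.0, 11.0, 13.0]
print(statistics.mean(gyr_t), statistics.pvariance(gyr_t), sem(gyr_t))
```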

## Distributions: Not done yet

In [None]:
from polyphys.analyze import distributions  # assumed module location; adjust if it differs
#hist_paths = glob('/Users/amirhsi_mini/probe/N500D10.0ac0.8-segment/N500epsilon5.0r5.5lz205.5sig0.8nc12012dt0.002bdump1000adump5000ens1/N500epsilon5.0r5.5lz205.5sig0.8nc12012dt0.002bdump1000adump5000ens1*')
hist_paths = glob('/Users/amirhsi_mini/probe/N500D10.0ac0.8-segment/N500epsilon5.0r5.5lz205.5sig0.8nc48047dt0.002bdump1000adump5000ens1/N500epsilon5.0r5.5lz205.5sig0.8nc48047dt0.002bdump1000adump5000ens1*')
species = 'Crd'
direction = 'z'
geometry='biaxial'
group='all'
segments = organizer.sort_filenames(
                hist_paths,
                fmts=['-' + direction + 'Hist' + species + '.npy']
            )
edge_segments = organizer.sort_filenames(
                hist_paths,
                fmts=['-' + direction + 'Edge' + species + '.npy']
            )
wholes = organizer.whole(
                direction + 'Hist' + species,
                segments,
                geometry=geometry,
                group=group,
                relation='histogram',
                save_to=None
            )
edge_wholes = organizer.whole(
                direction + 'Edge' + species,
                edge_segments,
                geometry=geometry,
                group=group,
                relation='bin_edge',
                save_to=None
            )
# 'whole' dataframes, each with a 'whole' column.
rho_wholes, phi_wholes = distributions.distributions_generator(
                wholes,
                edge_wholes,
                group,
                species,
                geometry,
                direction,
                save_to=None,
                normalized=True
            )

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.hist(edge_wholes['N500epsilon5.0r5.5lz205.5sig0.8nc48047dt0.002bdump1000adump5000ens1'][:-1],edge_wholes['N500epsilon5.0r5.5lz205.5sig0.8nc48047dt0.002bdump1000adump5000ens1'],weights=wholes['N500epsilon5.0r5.5lz205.5sig0.8nc48047dt0.002bdump1000adump5000ens1'],histtype='step',density=True)
plt.show()

In [None]:
sns.histplot(edge_wholes['N500epsilon5.0r5.5lz205.5sig0.8nc48047dt0.002bdump1000adump5000ens1'][:-1],bins=edge_wholes['N500epsilon5.0r5.5lz205.5sig0.8nc48047dt0.002bdump1000adump5000ens1'],weights=wholes['N500epsilon5.0r5.5lz205.5sig0.8nc48047dt0.002bdump1000adump5000ens1'])
plt.show()

In [None]:
fig, axes = plt.subplots(nrows=1,ncols=1,sharex=True,figsize=(8,6))
centers = 0.5*(edge_wholes['N500epsilon5.0r5.5lz205.5sig0.8nc48047dt0.002bdump1000adump5000ens1'][:-1]+edge_wholes['N500epsilon5.0r5.5lz205.5sig0.8nc48047dt0.002bdump1000adump5000ens1'][1:])
hist_df = pd.DataFrame(wholes)
rho_df = pd.DataFrame(rho_wholes)
phi_df = pd.DataFrame(phi_wholes)
df = pd.concat([hist_df,rho_df,phi_df],axis=1)
df.columns = ['histogram','number_density','volume_fraction']
df['center'] = centers
#df['histogram'] = df['histogram'] / df['histogram'].sum()
df['fake']= 1
#df.set_index('center',inplace=True)
#sns.histplot(x='center',bins=edge_wholes['N500epsilon5.0r5.5lz205.5sig0.8nc48047dt0.002bdump1000adump5000ens1'] ,weights='volume_fraction',data=df,element='poly',fill=False, kde=True)
#plt.show()
#df['histogram'].plot(ax=axes,ylabel='histogram')
#sns.set_theme(style="whitegrid")
#sns.set(font_scale=1.2)
sns.axes_style("darkgrid")
sns.lineplot(x='center',y='histogram', data=df,ax=axes)
#df.loc[-200:200,'number_density'].plot(ax=axes[1],ylabel='number_density')
#df.loc[-200:200,'volume_fraction'].plot(ax=axes[2],ylabel='volume_fraction',xlabel='center')
#axes.grid()
#axes.set_xlim(df.index[0]-5, df.index[-1]+5)
#axes.axvline(df.loc[df.index[0],'center'],lw=0.5,c='red',label='left end')
#axes.axvline(df.loc[df.index[-1],'center'],lw=0.5,c='green',label='right end')
#axes.axvline(df['center'],lw=0.5,c='red')
axes.set_xlabel('z (a.u.)')
axes.set_ylabel('Frequency of type-1 particles')
#ax.set_xlim[]
plt.savefig('histogram.pdf',dpi=200)
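
The bin-center formula used in the cell above, together with the normalization that `density=True` applies in `plt.hist`, can be checked on a toy histogram. The edges and counts below are made up; only the two formulas are the point.

```python
import numpy as np

# Made-up bin edges (n + 1 values) and counts (n values).
edges = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
counts = np.array([1.0, 3.0, 3.0, 1.0])

# Center of each bin: midpoint between consecutive edges.
centers = 0.5 * (edges[:-1] + edges[1:])
# Density normalization: the area under the histogram is 1.
widths = np.diff(edges)
density = counts / (counts.sum() * widths)

print(centers)
print((density * widths).sum())
```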

In [None]:

name = 'N500epsilon5.0r5.5lz205.5sig0.8nc36036dt0.002bdump1000adump5000ens1'
hist_info = SumRule(name, geometry='biaxial', group='all', lineage='whole')
dist_new = distributions.Distribution(
    wholes[name],
    edge_wholes[name],
    hist_info,
    'dcrowd',
    geometry='biaxial',
    direction='z',
    normalized=False)