<h1 class='tocIgnore'>ACDC 2019 Naturalist: Distance Sampling analyses with pyaudisam</h1>

(on a reduced data sample)

Please first read [how-it-works-fr.md](../how-it-works/how-it-works-fr.md)

<!-- Auto table of contents -->
<div style="overflow-y: auto">
  <h2 class='tocIgnore'>Table of contents</h2>
  <div id="toc"></div>
</div>

In [None]:
%%javascript
// Generate TOC in above cell
var maxlevel = 3;
$.getScript('ipython_notebook_toc.js', function() {createTOC(maxlevel);})

In [None]:
%%html
<!-- Left align markdown tables in cells -->
<style> table { display: inline-block }</style>

Note: This notebook was developed and tested on the following platform:

|            |                 |
|:-----------|:-----------------|
| os         | Windows Enterprise 10.0.19044 (64bit) |
| processor  | Intel64 Family 6 Model 165 Stepping 2, GenuineIntel, 12 CPUs |
| python     | cpython (win32) R3.8.15, packaged by conda-forge, (default, Nov 22 2022, 08:43:00), MSC v.1929 64 bits (AMD64) |
| numpy      | 1.23.2 |
| pandas     | 1.2.5 |
| zoopt      | 0.4.0 |
| matplotlib | 3.4.2 |
| jinja2     | 3.0.1 |
| pyaudisam  | 1.0.0 |
| DS engine  | C:/PortableApps/Distance 7.3/MCDS.exe |

# Imports

In [None]:
import sys
import os
import shutil
import pathlib as pl
import re
import datetime as dt

import pandas as pd
import pandas.api.types as pdt
import numpy as np

from IPython.display import HTML

In [None]:
# Comment out the next line to use the pyaudisam version installed in site-packages;
# leave it as is to use the local source / development package.
sys.path.insert(0, '../..')

In [None]:
import pyaudisam as ads

print('pyaudisam', ads.__version__, 'from', pl.Path(ads.__path__[0]).resolve().as_posix())

ads.runtime

# Commons

In [None]:
# Create temporary directory if not yet done.
tmpDir = pl.Path('../../tmp')
tmpDir.mkdir(exist_ok=True)

In [None]:
# Logging configuration.
ads.log.configure(handlers=[sys.stdout, tmpDir / 'acdc2019.log'], reset=True,
                  loggers=[dict(name='matplotlib', level=ads.WARNING),
                           dict(name='ads', level=ads.INFO2),
                           #dict(name='ads.eng', level=ads.INFO),
                           #dict(name='ads.exr', level=ads.INFO),
                           #dict(name='ads.anr', level=ads.DEBUG1),
                           #dict(name='ads.onr', level=ads.DEBUG),
                           dict(name='acdc2019', level=ads.DEBUG)])

logger = ads.logger('acdc2019')

In [None]:
def backup(fpn, to='.', tsFmt='.%y%m%d-%H%M%S'):
    """Backup given file to target folder with custom-formatted timestamp in name"""
    fpn = pl.Path(fpn)
    tn = fpn.stem + pd.Timestamp.now().strftime(tsFmt) + fpn.suffix
    tp = pl.Path(to) if to != '.' else fpn.parent
    logger.info('Backing up to ' + (tp / tn).as_posix())
    shutil.copy(fpn, tp / tn)
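
To see what kind of file name `backup` produces, the naming step can be isolated as below. This is a minimal sketch using only the standard library (`datetime` instead of `pd.Timestamp`); `backup_name` and the example path are hypothetical, not part of pyaudisam.

```python
import datetime as dt
import pathlib as pl


def backup_name(fpn, now, tsFmt='.%y%m%d-%H%M%S'):
    """Return the backup file name: stem + formatted timestamp + suffix (hypothetical helper)."""
    fpn = pl.Path(fpn)
    return fpn.stem + now.strftime(tsFmt) + fpn.suffix


# A fixed date is used here so that the result is reproducible.
example = backup_name('tmp/ACDC2019-results.xlsx', dt.datetime(2023, 1, 15, 9, 30, 5))
print(example)
```

The timestamp goes between the stem and the extension, so backups of the same file sort chronologically and keep their original file type.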

# I. Define study parameters

Note: You can re-run the cell below whenever you have changed something in `acdc-2019-nat-ds-params.py`; there is no need to restart the kernel.

In [None]:
# Load parameter file (the same that is usable through pyaudisam command line)
parFile, pars = ads.loadPythonData('./acdc-2019-nat-ds-params.py')
assert pars, f'Failed to load parameter file {parFile.as_posix()}'

pars  # A types.SimpleNamespace instance (use dot '.' to access parameters by name)
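
For readers unfamiliar with `types.SimpleNamespace`, here is a minimal sketch of how such an object behaves. The parameter names and values below are made up for the illustration; the real ones come from the parameter file.

```python
import types

# Hypothetical parameters, only to illustrate attribute access.
pars_demo = types.SimpleNamespace(studyName='ACDC2019', subStudyName='-Nat', clustering=False)

# Parameters are read with dot notation ...
prefix = pars_demo.studyName + pars_demo.subStudyName

# ... and hasattr() tells whether a given parameter is defined at all.
has_timeout = hasattr(pars_demo, 'runTimeOut')

print(prefix, has_timeout, vars(pars_demo))
```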

# II. Load individualised observations

In [None]:
with pd.ExcelFile(pars.surveyDataFile) as xlsFile:
    dfIndDistObs = pd.read_excel(xlsFile, sheet_name=pars.indivDistDataSheet)
    dfTransects = pd.read_excel(xlsFile, sheet_name=pars.transectsDataSheet)

print(dict(study=pars.studyName+pars.subStudyName, observations=len(dfIndDistObs), transects=len(dfTransects)))

In [None]:
dfTransects

In [None]:
dfIndDistObs

Now, you can go straight to:
* [V. Automated pre-analyses / 1b. Or: Load pre-analysis results from a previous session](#1b.-Or%3A-Load-pre-analysis-results-from-a-previous-session),
* [VI. Automated (opt-)analyses](#VI.-Automated-(opt-)analyses),
* [VI. Automated (opt-)analyses / 2b. Or: Load (opt-)analyses results from a previous session](#2b.-Or%3A-Load--(opt-)analyses-results-from-a-previous-session),
* [Appendix. Sumup and stats for individualised sightings](#Appendix.-Sumup-and-stats-for-individualised-sightings),

or simply to next cell.

# III. Selection of samples for Distance export and pre-analyses

## 1. Field data examination

In [None]:
dfIndDistObs.head()

In [None]:
# Number of individuals per species, in order to decide which we'll DS-analyse
if pars.clustering: # Clustering of individuals (1 observation may be about multiple individuals)
    
    # Warning: Not tested yet.
    dfNObsCatIndiv = dfIndDistObs[pars.sampleSelCols + ['Nombre']].groupby(pars.sampleSelCols).sum()
    dfNObsCatIndiv.rename(columns=dict(Nombre='Individus'), inplace=True)
    
else: # Individuals taken 1 by 1.
    
    dfNObsCatIndiv = dfIndDistObs[pars.sampleSelCols + [pars.distanceCol]].groupby(pars.sampleSelCols).count()
    dfNObsCatIndiv.rename(columns={pars.distanceCol: 'Individus'}, inplace=True)

dfNObsCatIndiv.reset_index(inplace=True)

dfNObsCatIndiv.sort_values(by='Individus', ascending=False).head(15)
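
The difference between the two branches above (summing a count column when clustering, counting rows otherwise) can be checked on a toy table. This is only a sketch: the observations below are invented, and the column names are assumed to match those of the survey data.

```python
import pandas as pd

# Invented observations: 'Nombre' is the number of individuals in each sighting.
dfObs = pd.DataFrame({
    'Espèce': ['Sylvia atricapilla'] * 3 + ['Oriolus oriolus'] * 2,
    'Durée': ['5mn', '10mn', '10mn', '10mn', '10mn'],
    'Distance': [35.0, 80.0, 120.0, 60.0, 200.0],
    'Nombre': [1, 2, 1, 1, 3],
})
selCols = ['Espèce', 'Durée']

# No clustering: 1 row = 1 individual, so rows are counted.
dfCount = dfObs[selCols + ['Distance']].groupby(selCols).count() \
               .rename(columns={'Distance': 'Individus'})

# Clustering: 1 row may stand for several individuals, so 'Nombre' is summed.
dfSum = dfObs[selCols + ['Nombre']].groupby(selCols).sum() \
             .rename(columns={'Nombre': 'Individus'})

print(dfCount)
print(dfSum)
```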

## 2. Specify samples to analyse: select variants for all category columns

* "Espèce" = Species,
* "Passage" = Visits (for inventory) on transect points (a = early spring, b = late spring, a+b = both),
* "Durée" = Inventory duration (5mn or 10mn),
* "Adulte" = Population type (m = adult males only, a = other adults, unsexed males or females)

**Warning**: the "Passage" column must be there for pre-analyses and analyses, even if there's only one pass.

Some other English translations ;-)
* "Individu" = individual = 1 bird

### a. First possible method: Specify variants through a dict

#### Initialisation

In [None]:
sampleSpecs = dict()  # Key = Category names

#### Some stats to help decide: Number of individuals per inventory duration for each species

In [None]:
dfNObsIndiv = dfNObsCatIndiv[['Espèce', 'Durée', 'Individus']].groupby(['Espèce', 'Durée']).sum().unstack() \
                .sort_values(by=('Individus', '10mn'), ascending=False)
dfNObsIndiv

#### Or: Species with at least N individuals observed over 10mn field inventories

In [None]:
nMinTotIndivs = 40

In [None]:
sampleSpecs[pars.speciesCol] = list(dfNObsIndiv[dfNObsIndiv[('Individus', '10mn')] >= nMinTotIndivs].index)
print(len(sampleSpecs[pars.speciesCol]), ', '.join(sampleSpecs[pars.speciesCol]))

sampleSpecs[pars.passIdCol] = [''] # All passes together
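
The selection above relies on a table with one row per species and one column per inventory duration, obtained through `unstack()`. The sketch below reproduces that step on invented counts, to show how the `('Individus', '10mn')` column key and the threshold filter work together.

```python
import pandas as pd

# Invented counts per species and inventory duration.
dfCat = pd.DataFrame({
    'Espèce': ['Sylvia atricapilla', 'Sylvia atricapilla',
               'Prunella modularis', 'Prunella modularis', 'Oriolus oriolus'],
    'Durée': ['5mn', '10mn', '5mn', '10mn', '10mn'],
    'Individus': [30, 55, 10, 20, 45],
})

# One row per species, one ('Individus', <duration>) column per duration,
# sorted by decreasing 10mn count.
dfWide = dfCat.groupby(['Espèce', 'Durée']).sum().unstack() \
              .sort_values(by=('Individus', '10mn'), ascending=False)

# Keep species with at least nMin individuals over the 10mn inventories.
nMin = 40
selected = list(dfWide[dfWide[('Individus', '10mn')] >= nMin].index)
print(selected)
```

Because the table is sorted beforehand, the selected species come out by decreasing number of individuals.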

#### Or: The N most numerous species

In [None]:
nMostNumerous = 3

sampleSpecs[pars.speciesCol] = list(dfNObsIndiv[('Individus', '10mn')].index[:nMostNumerous])
print(len(sampleSpecs[pars.speciesCol]), ', '.join(sampleSpecs[pars.speciesCol]))

#### Some other stats to help decide: Number of males per inventory duration for each species

In [None]:
dfNObsMale = dfNObsCatIndiv.loc[dfNObsCatIndiv['Adulte'] == 'm', ['Espèce', 'Durée', 'Individus']] \
                .groupby(['Espèce', 'Durée']).sum().unstack() \
                .sort_values(by=('Individus', '10mn'), ascending=False)
dfNObsMale

#### Or: The species with at least N males observed during the 10mn field inventories

In [None]:
nMinMal10 = 35

sampleSpecs[pars.speciesCol] = list(dfNObsMale[dfNObsMale[('Individus', '10mn')] >= nMinMal10].index)
print(', '.join(sampleSpecs[pars.speciesCol]), '=>', len(sampleSpecs[pars.speciesCol]), 'species')

#### Or: Explicit list of species

In [None]:
sampleSpecs[pars.speciesCol] = ['Sylvia atricapilla', 'Prunella modularis', 'Phylloscopus bonelli', 'Oriolus oriolus']
print(len(sampleSpecs[pars.speciesCol]), 'species =>', ', '.join(sampleSpecs[pars.speciesCol]))

#### Mandatory: Add variant specs for pass, duration and population type categories

(examples)

In [None]:
sampleSpecs[pars.passIdCol] = ['b', 'a+b'] # Passes b or a+b => 2 variants

In [None]:
sampleSpecs['Durée'] = ['5mn', '10mn'] # 5 first mn, or all 10 mn => 2 variants

In [None]:
sampleSpecs['Adulte'] = ['m', 'm+a'] # Males, and then males + other adults => 2 variants

#### Finally: All of the above variant specs are implicit ones

... so, let's assert it !

In [None]:
sampleSpecs = dict(_impl=sampleSpecs)
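
Conceptually, "implicit" specs list the possible values of each category, and the samples to analyse are obtained by combining them. The sketch below assumes a plain Cartesian product, computed with `itertools.product`; it only illustrates the idea and is not the implementation used by pyaudisam.

```python
import itertools

# Implicit specs: the values to combine for each category column (example values).
implSpecs = {
    'Espèce': ['Sylvia atricapilla', 'Prunella modularis'],
    'Passage': ['b', 'a+b'],
    'Durée': ['5mn', '10mn'],
    'Adulte': ['m', 'm+a'],
}

# Explicit specs: one dict per combination of values (assumed Cartesian product).
explSpecs = [dict(zip(implSpecs, combo)) for combo in itertools.product(*implSpecs.values())]

print(len(explSpecs))
print(explSpecs[0])
```

With 2 variants in each of the 4 categories, this gives 2 x 2 x 2 x 2 = 16 samples, which shows how quickly the number of analyses grows when variants are added.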

### b. Second possible method: Specify variants through a workbook

(with possibly multiple sheets, each possibly being of 'implicit' kind, or of 'explicit' kind)

TODO: Document how the parser for these specs works ...

In [None]:
sampleSpecs = pars.sampleSpecFile

Now, you can go straight to:
* [IV. Export data for manual analyses through Distance software GUI](#IV.-Export-data-for-manual-analyses-through-Distance-software-GUI),
* [V. Automated pre-analyses](#V.-Automated-pre-analyses),
* [VI. Automated (opt-)analyses](#VI.-Automated-(opt-)analyses),

or simply to next cell.

## 3. Optional: For info, some stats about the samples thus specified

In [None]:
# Once made explicit, here are all the selected variants ...
dfExplSampleSpecs = ads.DSAnalyser.explicitVariantSpecs(sampleSpecs)
dfExplSampleSpecs

In [None]:
# Max distance and nb of individuals for each sample, in the analysis order
# a. Real sample selection columns (some might be empty)
dfSampleOrd = dfExplSampleSpecs.dropna(axis='columns', how='all')
indexCols = [col for col in pars.sampleSelCols if col in dfSampleOrd.columns]

# b. Order of species in sample list
dfSampleOrd = dfSampleOrd.drop_duplicates(subset=[pars.speciesCol])[[pars.speciesCol]].reset_index(drop=True)
dfSampleOrd = dfSampleOrd.reset_index(drop=False).rename(columns=dict(index='order'))
dfSampleOrd = dfSampleOrd.set_index(pars.speciesCol)

# c. Max distances for each unique value of each category (ex: Adulte => m, a)
dfSampleStats = dfIndDistObs[indexCols + [pars.distanceCol]].groupby(indexCols).agg(['min', 'max', 'count'])
dfSampleStats.columns = ['Min Distance', 'Max Distance', 'NTot Obs']
dfSampleStats = dfSampleStats.reset_index()

# d. Categories for combination + simple sorting
speciesOrder = list(dfSampleOrd.index) + [e for e in dfSampleStats[pars.speciesCol].unique() if e not in dfSampleOrd.index]

dCategories = {pars.speciesCol: speciesOrder, pars.passIdCol: ['a', 'b', 'a+b'],
               'Adulte': ['m', 'a', 'm+a'], 'Durée': ['5mn', '10mn']}
dCategoryTypes = { cat: pdt.CategoricalDtype(categories=values, ordered=True) for cat, values in dCategories.items()}

for col in indexCols:
    dfSampleStats[col] = dfSampleStats[col].astype(dCategoryTypes[col])

# e. Max distances for combined values of non-nested categories (ex: Adulte => m+a)
allNestedCategories = {'Durée': '10mn'}
nonSpeciesSampleSelCols = [col for col in pars.sampleSelCols if col != pars.speciesCol]
cols2OrCombine = [col for col in nonSpeciesSampleSelCols if col in indexCols and col not in allNestedCategories]
for col2OrComb in cols2OrCombine:
    indexNoCol2CombCols = [col for col in indexCols if col != col2OrComb]
    dfSampleStatsOrComb = \
        dfSampleStats[indexNoCol2CombCols + ['Min Distance', 'Max Distance', 'NTot Obs']].groupby(indexNoCol2CombCols) \
            .agg({'Min Distance': 'min', 'Max Distance': 'max', 'NTot Obs': 'sum'})
    dfSampleStatsOrComb.columns = ['Min Distance', 'Max Distance', 'NTot Obs']
    dfSampleStatsOrComb = dfSampleStatsOrComb.dropna().reset_index()  # Why Nans appear in index ? A mystery !
    dfSampleStatsOrComb[col2OrComb] = '+'.join(dfSampleStats[col2OrComb].sort_values().unique())
    dfSampleStatsOrComb[col2OrComb] = dfSampleStatsOrComb[col2OrComb].astype(dCategoryTypes[col2OrComb])
    dfSampleStats = pd.concat([dfSampleStats, dfSampleStatsOrComb], ignore_index=True)
    
# f. Sort by species order, then by the other sample selection columns
dfSampleStats.sort_values(by=indexCols, inplace=True)  # Magic ! (thanks to CategoricalDtype)

dfSampleStats.reset_index(inplace=True, drop=True)

dfSampleStats
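
The "magic" sort above comes from ordered categorical types: values are sorted in the order of the declared categories, not alphabetically. Here is a minimal sketch on invented rows, with the same category orders as in the cell above.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({'Passage': ['a+b', 'b', 'a', 'b'],
                   'Durée': ['10mn', '5mn', '10mn', '10mn']})

# Ordered categories: the list order defines the sort order.
passType = CategoricalDtype(categories=['a', 'b', 'a+b'], ordered=True)
durType = CategoricalDtype(categories=['5mn', '10mn'], ordered=True)

df['Passage'] = df['Passage'].astype(passType)
df['Durée'] = df['Durée'].astype(durType)

dfSorted = df.sort_values(by=['Passage', 'Durée']).reset_index(drop=True)
print(dfSorted)
```

A plain string sort would put 'a+b' before 'b' and '10mn' before '5mn'; the ordered categories give the intended order instead.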

In [None]:
# Save these stats (not needed for computations below, though).
fpn = tmpDir / f'{pars.studyName}{pars.subStudyName}-SampleStats.xlsx'

dfSampleStats.to_excel(fpn, index=False)

logger.info(fpn.as_posix())

Now, you can go straight to:
* [IV. Export data for manual analyses through Distance software GUI](#IV.-Export-data-for-manual-analyses-through-Distance-software-GUI),
* [V. Automated pre-analyses](#V.-Automated-pre-analyses),
* [VI. Automated (opt-)analyses](#VI.-Automated-(opt-)analyses),

or simply to next cell.

# IV. Export data for manual analyses through Distance software GUI

In [None]:
# Output folder for exported files
workDir = tmpDir / dt.datetime.now().strftime('%y%m%d-%H%M%S')
workDir.as_posix()

In [None]:
# Create a PreAnalyser instance (it knows how to export :-).
pranlysr = ads.MCDSPreAnalyser(dfIndDistObs, dfTransects=dfTransects, effortConstVal=pars.passEffort,
                               dSurveyArea=pars.studyAreaSpecs, transectPlaceCols=pars.transectPlaceCols,
                               passIdCol=pars.passIdCol, effortCol=pars.effortCol,
                               sampleSelCols=pars.sampleSelCols, sampleDecCols=[pars.effortCol, pars.distanceCol],
                               sampleIndCol=pars.sampleIndCol,
                               abbrevCol=pars.sampleAbbrevCol, abbrevBuilder=pars.sampleAbbrev,
                               distanceUnit=pars.distanceUnit, areaUnit=pars.areaUnit,
                               surveyType=pars.surveyType, distanceType=pars.distanceType,
                               clustering=pars.clustering,
                               workDir=workDir)

In [None]:
# Export data for the selected samples.
pranlysr.exportDSInputData(implSampleSpecs=sampleSpecs)

# V. Automated pre-analyses

## 1a. Or: Really run the pre-analyses

Prerequisites: run notebook at least up to end of [III. Selection of samples for Distance export and pre-analyses](#III.-Selection-of-samples-for-Distance-export-and-pre-analyses).

In [None]:
# Output folder for results, reports, ...
workDir = tmpDir / dt.datetime.now().strftime('%y%m%d-%H%M%S')

# Output result workbook file.
preResFileNameSufx = 'PreAnalyses-results'
presFileName = workDir / f'{pars.studyName}{pars.subStudyName}-{preResFileNameSufx}.xlsx'

presFileName.as_posix()

In [None]:
# Create a pre-analyser object.
pranlysr = ads.MCDSPreAnalyser(dfIndDistObs, dfTransects=dfTransects, effortConstVal=pars.passEffort,
                               dSurveyArea=pars.studyAreaSpecs, transectPlaceCols=pars.transectPlaceCols,
                               passIdCol=pars.passIdCol, effortCol=pars.effortCol,
                               sampleSelCols=pars.sampleSelCols, sampleDecCols=[pars.effortCol, pars.distanceCol],
                               sampleIndCol=pars.sampleIndCol,
                               abbrevCol=pars.sampleAbbrevCol, abbrevBuilder=pars.sampleAbbrev,
                               distanceUnit=pars.distanceUnit, areaUnit=pars.areaUnit,
                               surveyType=pars.surveyType, distanceType=pars.distanceType,
                               clustering=pars.clustering,
                               resultsHeadCols=pars.preResultsHeadCols,
                               workDir=workDir,
                               runMethod=pars.runPreAnalysisMethod, runTimeOut=pars.runPreAnalysisTimeOut,
                               logData=pars.logPreAnalysisData, logProgressEvery=pars.logPreAnalysisProgressEvery)

In [None]:
# Check pre-analysis parameters = sample specs.
dfExplSampleSpecs, userParamSpecCols, intParamSpecCols, unmUserParamSpecCols, verdict, reasons = \
    pranlysr.explicitParamSpecs(implParamSpecs=sampleSpecs, dropDupes=True, check=True)  

logger.info(dict(nSamples=len(dfExplSampleSpecs)))

assert userParamSpecCols == [] # No analysis params here (auto. generated by PreAnalyser)
assert intParamSpecCols == [] # Idem
assert verdict
assert not reasons

In [None]:
%%time

# Run pre-analysis, on selected samples, with given model fall-back strategy
# Note: Here, we use at most 12-worker parallelism, but you might have to lower this if your PC has fewer hyper-threaded cores ...
preResults = pranlysr.run(implSampleSpecs=sampleSpecs, dModelStrategy=pars.modelPreStrategy, threads=12)

# Done.
preAnalysed = True

pranlysr.shutdown()

# Add some more stats to the results object
if 'dfSampleStats' in vars():
    preResults.updateSpecs(sampleStats=dfSampleStats)

# Save results (might be useful for building a pre-analysis report in a later notebook session)
preResults.toExcel(presFileName)

backup(presFileName)  # Just in case you wanna keep results from multiple tries with different sample specs or strategies ...

Note: Performance figures on a 6-core HT i7-10850H Windows 10 (2023) laptop with PCI-e SSD, "optimal performance power scheme", Python 3.8.15: **between 0.5 and 1.5s per sample with 12 threads** (depending on the analysis failure rate and model strategy).

Now, you can go straight to:
* [2. Build pre-analysis Excel and/or HTML report(s)](#2.-Build-pre-analysis-Excel-and%2For-HTML-report(s)).

## 1b. Or: Load pre-analysis results from a previous session

Prerequisites: run notebook at least up to end of [II. Load individualised observations](#II.-Load-individualised-observations).

In [None]:
if 'preAnalysed' not in vars():
    preAnalysed = False  # No pre-analyses were run in this session: results have to be loaded from a previous one.

In [None]:
if not preAnalysed:
    
    if 'preResFileNameSufx' not in vars():
        preResFileNameSufx = 'PreAnalyses-results'
    
    # List tmpDir sub-folders that contain pre-analysis results
    resFolders = [fn.name for fn in tmpDir.glob('[0-9]'*6+'-'+'[0-9]'*6)
                  if (fn / f'{pars.studyName}{pars.subStudyName}-{preResFileNameSufx}.xlsx').is_file()]
    
    logger.info('Available pre-analysis results:')
    for index, folder in enumerate(resFolders):
        logger.info(f'[{index}] {folder}')
    
    # Choose manually the desired folder
    preResDirIndex = int(input(f'Enter the integer index of the chosen folder, in [0, {len(resFolders) - 1}]: '))
    workDir = tmpDir / resFolders[preResDirIndex]
    
    presFileName = workDir / f'{pars.studyName}{pars.subStudyName}-{preResFileNameSufx}.xlsx'
    
    logger.info(f'Selected result file: {presFileName}')
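
The folder listing above relies on a glob pattern made of 6 digits, a dash, and 6 digits, which matches the `%y%m%d-%H%M%S` time-stamped work folders. The sketch below checks that behaviour in a throw-away directory; the folder and file names are invented for the illustration.

```python
import pathlib as pl
import tempfile

# Same pattern as above: 6 digits, a dash, 6 digits.
pattern = '[0-9]' * 6 + '-' + '[0-9]' * 6
resFileName = 'demo-PreAnalyses-results.xlsx'  # Hypothetical results file name

with tempfile.TemporaryDirectory() as tmp:
    root = pl.Path(tmp)
    for name in ['230115-093005', '230116-101500', 'notes', '2301-0930']:
        (root / name).mkdir()

    # Only the first folder holds a results workbook (an empty placeholder here).
    (root / '230115-093005' / resFileName).touch()

    # Folders matching the time-stamp pattern ...
    stampFolders = sorted(p.name for p in root.glob(pattern))

    # ... and among them, those that actually contain a results file.
    resFolders = sorted(p.name for p in root.glob(pattern) if (p / resFileName).is_file())

print(stampFolders, resFolders)
```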

In [None]:
if not preAnalysed:
    
    # An analyser object knows how to build an empty results object ...
    # But beware: We have to use the same constructor parameters as for the instance used to generate these results !
    pranlysr = ads.MCDSPreAnalyser(dfIndDistObs, dfTransects=dfTransects, effortConstVal=pars.passEffort,
                                   dSurveyArea=pars.studyAreaSpecs, transectPlaceCols=pars.transectPlaceCols,
                                   passIdCol=pars.passIdCol, effortCol=pars.effortCol,
                                   sampleSelCols=pars.sampleSelCols, sampleDecCols=[pars.effortCol, pars.distanceCol],
                                   sampleIndCol=pars.sampleIndCol,
                                   abbrevCol=pars.sampleAbbrevCol, abbrevBuilder=pars.sampleAbbrev,
                                   distanceUnit=pars.distanceUnit, areaUnit=pars.areaUnit,
                                   surveyType=pars.surveyType, distanceType=pars.distanceType,
                                   clustering=pars.clustering,
                                   resultsHeadCols=pars.preResultsHeadCols)
    
    preResults = pranlysr.setupResults()
    
    # Load results from file
    preResults.fromFile(presFileName)
    
else:
    
    logger.info('Pre-analyses just run, results are still in kernel memory: no need to reload.')
    
logger.info('... {} pre-analyses ready for reporting'.format(len(preResults)))

## 2. Build pre-analysis Excel and/or HTML report(s)

In [None]:
preResults.dfData

In [None]:
# Create a pre-analysis report object.
preReport = ads.MCDSResultsPreReport(resultsSet=preResults,
                                     title=pars.preReportStudyTitle, subTitle=pars.preReportStudySubTitle,
                                     anlysSubTitle=pars.preReportAnlysSubTitle, description=pars.preReportStudyDescr,
                                     keywords=pars.reportStudyKeywords, lang=pars.studyLang, 
                                     pySources=['acdc-2019-nat-ds-run.ipynb', 'acdc-2019-nat-ds-params.py'],
                                     sampleCols=pars.preReportSampleCols, paramCols=pars.preReportParamCols,
                                     resultCols=pars.preReportResultCols, synthCols=pars.preReportSynthCols,
                                     sortCols=pars.preReportSortCols, sortAscend=pars.preReportSortAscend,
                                     tgtPrefix=f'{pars.studyName}{pars.subStudyName}-preanalyses-report',
                                     tgtFolder=workDir, **pars.preReportPlotParams)

In [None]:
# Generate the Excel workbook report (.xlsx)
xlsxPreRep = preReport.toExcel()

backup(xlsxPreRep)  # Just in case you wanna keep reports from multiple tries with different sample specs or strategies ...

HTML(f'Excel report: <a href="{xlsxPreRep}" target="blank">{pl.Path(xlsxPreRep).as_posix()}</a>')

In [None]:
# Generate the OpenDoc report (.ods)
# Note: New feature of Pandas 1.1, but no cell-coloring support through styles yet (as of 1.1.3) :-(

# odsPreRep = preReport.toOpenDoc()

# backup(odsPreRep)  # Just in case you wanna keep reports from multiple tries with different analysis specs or strategies ...

# HTML(f'OpenDoc spreadsheet report: <a href="{odsPreRep}" target="blank">{odsPreRep}</a>')

In [None]:
%%time

# Generate the HTML report (using 6 parallel generators: consider lowering this figure on less powerful computers).
htmlPreRep = preReport.toHtml(generators=6)

backup(htmlPreRep)  # Just in case you wanna keep reports from multiple tries with different sample specs or strategies ...
                    # But warning here: it's HTML, and linked files in analysis-specific subfolders are not backed up !

HTML(f'HTML pre-report: <a href="{htmlPreRep}" target="blank">{pl.Path(htmlPreRep).as_posix()}</a>')

Note: Performance figures on a 6-core HT i7-10850H Windows 10 (2023) laptop with PCI-e SSD, "optimal performance power scheme", Python 3.8.15: around **1.5s per sample with 6 parallel generators**.

Now, you can go straight to:
* [VI. Automated (opt-)analyses / 2b. Or: Load (opt-)analyses results from a previous session](#2b.-Or%3A-Load--(opt-)analyses-results-from-a-previous-session),

or simply to next cell.

# VI. Automated (opt-)analyses

(with possible auto-determination of truncation distance parameters, through an optimisation technique)

Prerequisites: run notebook at least up to end of [III. Selection of samples for Distance export and pre-analyses](#III.-Selection-of-samples-for-Distance-export-and-pre-analyses).

## 1. Define analysis specs

### Or: All analysis variants specified in this workbook file

In [None]:
studyVariant = ''
anlysSpecFileName = f'{pars.studyName}-OptAnalysesToDo'
ignoreSpecs = []

### Or: Idem, but without analysis variants having auto-optimised distance truncation parameters

Warning: For this special case, you'll need to remove `rs.CLParModFitDistCuts` from `fullReportParamCols` (respectively `filsorReportParamCols`) in `acdc-2019-nat-ds-params.py` if you want to generate a full report (respectively an auto-filtered and sorted report). After running the analyses, the 'model fitting distance cut points' column is missing from the results, because no variant uses it any more once the 'AutoTruncations' variant sheet has been removed from the spec. workbook.

In [None]:
studyVariant = '-nooptim'
anlysSpecFileName = f'{pars.studyName}-OptAnalysesToDo'
ignoreSpecs = ['AutoTruncations_impl']

### Or: All analysis variants specified in this workbook file, but not re-running the optimisations

* first, you need in-memory results, through an opt-analysis run, or through reloading them via [2b. Or: Reload  (opt-)analyses results from a previous session](#2b.-Or%3A-Reload--(opt-)analyses-results-from-a-previous-session) below,
* then, you just need to extract the sample id. and analysis parameters, with ready-to-go truncation params (auto-optimised or user-set).

In [None]:
studyVariant = '-noreoptim'
anlysSpecFileName = None

In [None]:
dfExplOptAnlysSpecs = results.dfTransData(pars.studyLang)[['Espèce', 'Passage', 'Adulte', 'Durée', 'FonctionClé', 'SérieAjust',
                                                           'TrGche', 'TrDrte', 'NbTrchMod', 'OptimTrunc']]
dfExplOptAnlysSpecs

### Mandatory: Final checks and preparation, after specifying variants through a spec file or a DataFrame

In [None]:
dict(studyVar=studyVariant, specs=anlysSpecFileName if anlysSpecFileName else 'dfExplOptAnlysSpecs', ignore=ignoreSpecs)

In [None]:
# Analysis specs :
# * or: all-implicit ones, from optAnlysSpecs (with possibly some to remove),
# * or: all-explicit, from given dfExplOptAnlysSpecs DataFrame.
if anlysSpecFileName:
    
    # No explicit specs from dfExplOptAnlysSpecs.
    dfExplOptAnlysSpecs = None
    
    # All-implicit specs, from the file.
    optAnlysSpecFileExts = ['.ods', '.xlsx']
    for ext in optAnlysSpecFileExts:
        optAnlysSpecs = pars.dataDir / f'{anlysSpecFileName}{ext}'
        if optAnlysSpecs.is_file():
            break

    assert optAnlysSpecs.is_file(), \
           '{} not found, nor those with other extensions [{}] !' \
           .format(optAnlysSpecs.as_posix(), ', '.join(optAnlysSpecFileExts[:-1]))

    logger.info('All-implicit specs, through ' + optAnlysSpecs.as_posix())

    # Finally, remove some partial specs if specified
    if ignoreSpecs:
        optAnlysSpecs = pd.read_excel(optAnlysSpecs, sheet_name=None)
        for spec in ignoreSpecs:
            del optAnlysSpecs[spec]
        logger.info('... after removing ' + str(ignoreSpecs))
        logger.info('... leaving at the end ' + str(list(optAnlysSpecs.keys())))
        
else:
    
    # No implicit specs from a file.
    optAnlysSpecs = None
    
    # All-explicit specs, from given dfExplOptAnlysSpecs.
    assert not dfExplOptAnlysSpecs.empty
    
    logger.info('All-explicit specs, through dfExplOptAnlysSpecs')
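
The spec file lookup above tries several extensions in turn and keeps the first file that exists. The same pattern can be written as a small function; this is only a sketch, `findSpecFile` and the file names being hypothetical.

```python
import pathlib as pl
import tempfile


def findSpecFile(folder, baseName, exts=('.ods', '.xlsx')):
    """Return the first existing folder/baseName+ext, trying extensions in order, or None."""
    for ext in exts:
        candidate = pl.Path(folder) / f'{baseName}{ext}'
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Only the .xlsx variant exists here (an empty placeholder file).
    (pl.Path(tmp) / 'demo-OptAnalysesToDo.xlsx').touch()

    found = findSpecFile(tmp, 'demo-OptAnalysesToDo')
    foundName = found.name if found else None
    missing = findSpecFile(tmp, 'other-study')

print(foundName, missing)
```

Returning `None` when nothing is found leaves it to the caller to decide whether this is an error, as the `assert` does in the cell above.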

## 2a. Or: Really run the (opt-)analyses

In [None]:
# Set to True for recovering from where interrupted during optimisations (in order not to redo all from start).
recoverOptims = False

In [None]:
if recoverOptims:
    
    # List possible folders for recovery.
    logger.info('Available result folders for recovery:')
    
    bkupFileNamePat = 'optr-resbak-[01].pickle.xz'
    resFolders = list()
    folderInd = 0
    for fpn in sorted(tmpDir.glob('[0-9]'*6+'-'+'[0-9]'*6)):
        
        bkupFilePathNames = list(fpn.glob(bkupFileNamePat))
        if bkupFilePathNames:
            
            logger.info(f'  [{folderInd}] {fpn.name}')
            for bfpn in bkupFilePathNames:
                logger.info('    {} {}'.format(bfpn.name, pd.Timestamp.fromtimestamp(bfpn.stat().st_mtime)))
            
            resFolders.append(fpn.name)
            folderInd += 1
            
    # If any, let the user select the right one.
    if folderInd:
        
        # Choose manually the folder to recover from / go on working inside.
        resDirIndex = int(input(f'Enter the integer index of the chosen folder, in [0, {len(resFolders) - 1}]: '))
        resFolder = resFolders[resDirIndex]
        
    # Otherwise, nothing more to achieve.
    else:
        logger.info('None found, no possible recovery: seems all needed analyses were finally run !')
        
else:
    
    # A brand new one.
    resFolder = dt.datetime.now().strftime('%y%m%d-%H%M%S')
    
# Output folder for results, reports ... etc.
workDir = tmpDir / resFolder

logger.info(f'Selected run folder: {workDir.as_posix()}')

In [None]:
# Result file
resFileNameSufx = f'OptAnalyses{studyVariant}-results'
resFileName = workDir / f'{pars.studyName}{pars.subStudyName}-{resFileNameSufx}.xlsx'

resFileName.as_posix()

In [None]:
# Create the opt-analyser object.
optanlr = \
    ads.MCDSTruncationOptanalyser(dfIndDistObs, dfTransects=dfTransects, effortConstVal=pars.passEffort,
                                  dSurveyArea=pars.studyAreaSpecs, transectPlaceCols=pars.transectPlaceCols,
                                  passIdCol=pars.passIdCol, effortCol=pars.effortCol,
                                  sampleSelCols=pars.sampleSelCols, sampleDecCols=[pars.effortCol, pars.distanceCol],
                                  sampleDistCol=pars.distanceCol,
                                  abbrevCol=pars.analysisAbbrevCol, abbrevBuilder=pars.analysisAbbrev,
                                  anlysIndCol=pars.analysisIndCol, sampleIndCol=pars.sampleIndCol,
                                  distanceUnit=pars.distanceUnit, areaUnit=pars.areaUnit,
                                  surveyType=pars.surveyType, distanceType=pars.distanceType,
                                  clustering=pars.clustering,
                                  resultsHeadCols=dict(before=[pars.analysisIndCol, pars.sampleIndCol],
                                                       sample=pars.sampleSelCols,
                                                       after=pars.analysisParamCols + [pars.analysisAbbrevCol]),
                                  ldTruncIntrvSpecs=pars.ldTruncIntrvSpecs, truncIntrvEpsilon=pars.truncIntrvEpsilon,
                                  workDir=workDir, logData=pars.logOptAnalysisData,
                                  runMethod=pars.runOptAnalysisMethod, runTimeOut=pars.runOptAnalysisTimeOut,
                                  logAnlysProgressEvery=pars.logOptAnalysisProgressEvery,
                                  logOptimProgressEvery=pars.logOptimisationProgressEvery,
                                  backupOptimEvery=pars.backupOptimisationsEvery,
                                  defEstimKeyFn=pars.defEstimKeyFn, defEstimAdjustFn=pars.defEstimAdjustFn,
                                  defEstimCriterion=pars.defEstimCriterion, defCVInterval=pars.defCVInterval,
                                  defExpr2Optimise=pars.defExpr2Optimise, defMinimiseExpr=pars.defMinimiseExpr,
                                  defOutliersMethod=pars.defOutliersMethod, defOutliersQuantCutPct=pars.defOutliersQuantCutPct,
                                  defFitDistCutsFctr=pars.defFitDistCutsFctr, defDiscrDistCutsFctr=pars.defDiscrDistCutsFctr,
                                  defSubmitTimes=pars.defSubmitTimes, defSubmitOnlyBest=pars.defSubmitOnlyBest,
                                  dDefSubmitOtherParams=pars.dDefSubmitOtherParams,
                                  dDefOptimCoreParams=dict(core=pars.defCoreEngine, maxIters=pars.defCoreMaxIters,
                                                           termExprValue=pars.defCoreTermExprValue,
                                                           algorithm=pars.defCoreAlgorithm, maxRetries=pars.defCoreMaxRetries))

In [None]:
# Auto-check opt-analysis specs.
dfExplOptAnlysSpecs, userParamSpecCols, intParamSpecCols, unmUserParamSpecCols, verdict, reasons = \
    optanlr.explicitParamSpecs(implParamSpecs=optAnlysSpecs, dfExplParamSpecs=dfExplOptAnlysSpecs, dropDupes=True, check=True)  

if not verdict:
    logger.info('Opt-analysis specs errors:')
    logger.info('\n'.join(reasons))
else:
    logger.info('Opt-analysis specs OK')

logger.info(dict(specs=', '.join(optAnlysSpecs.keys()) if isinstance(optAnlysSpecs, dict)
                       else optAnlysSpecs.as_posix() if optAnlysSpecs else 'Explicit',
                 nOptAnalyses=len(dfExplOptAnlysSpecs), userParamSpecCols=', '.join(userParamSpecCols),
                 intParamSpecCols=', '.join(intParamSpecCols), unmUserParamSpecCols=', '.join(unmUserParamSpecCols)))

assert verdict
assert not reasons

# Once made explicit, here are all the selected (opt-)analysis variants ...
#dfExplOptAnlysSpecs.to_excel(tmpDir / f'{pars.studyName}{pars.subStudyName}-ExplOptAnlysSpecs.xlsx')

dfExplOptAnlysSpecs

In [None]:
# If some pre-analyses were run in this session, check that all samples targeted by opt-analyses have been pre-analysed.
if 'dfExplSampleSpecs' in vars():
    
    dfOptAnlysSpecsCheck = dfExplOptAnlysSpecs[pars.sampleSelCols].drop_duplicates()
    dfOptAnlysSpecsCheck['OptAnalyses'] = True
    dfOptAnlysSpecsCheck.set_index(pars.sampleSelCols, inplace=True)

    dfPreAnlysSpecsCheck = dfExplSampleSpecs.copy()
    dfPreAnlysSpecsCheck['PreAnalyses'] = True
    dfPreAnlysSpecsCheck.set_index(pars.sampleSelCols, inplace=True)
    
    dfCheck = dfOptAnlysSpecsCheck.join(dfPreAnlysSpecsCheck, how='outer').reset_index()
    
    logger.info(f'{len(dfCheck)} total samples already pre-analysed or to be analysed ...')
    if dfCheck.PreAnalyses.isnull().sum():
        logger.info('... and these are the ones that have not been pre-analysed :')
        display(dfCheck[dfCheck.PreAnalyses.isnull()])
    else:
        logger.info('... and all of them have been pre-analysed (good) !')
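
The check in the cell above is an outer join between the samples targeted by opt-analyses and the samples already pre-analysed. As a minimal sketch of the same idea on toy data, here is a variant that uses `merge(..., indicator=True)` instead of the flag columns and `join` used above; the column names and values are made up for illustration and do not come from the study.

```python
import pandas as pd

# Hypothetical sample-selection columns and toy data (not from the study).
sampleSelCols = ['Species', 'Pass']

dfOptSpecs = pd.DataFrame({'Species': ['Sylvia atricapilla', 'Turdus merula', 'Turdus merula'],
                           'Pass': ['a+b', 'a+b', 'a+b']})
dfPreSpecs = pd.DataFrame({'Species': ['Sylvia atricapilla'], 'Pass': ['a+b']})

# Outer merge on the selection columns: the '_merge' column tells which side each row came from.
dfCheck = dfOptSpecs[sampleSelCols].drop_duplicates() \
                    .merge(dfPreSpecs[sampleSelCols].drop_duplicates(),
                           on=sampleSelCols, how='outer', indicator=True)

# Samples targeted by opt-analyses, but never pre-analysed.
dfMissing = dfCheck[dfCheck['_merge'] == 'left_only']
print(dfMissing[sampleSelCols])
```

An empty `dfMissing` means that every targeted sample has been pre-analysed.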

In [None]:
# Let's use implicit specs if provided (default) ... or force using explicit ones (if we like).

# Uncomment to force using explicit specs.
#optAnlysSpecs = None

# Use implicit specs if given (not deduced explicit ones).
if optAnlysSpecs:
    dfExplOptAnlysSpecs = None  

In [None]:
# Last quick checks about what's gonna be run and how.
dict(recoverOptims=recoverOptims, dfExplOptAnlysSpecs=len([] if dfExplOptAnlysSpecs is None else dfExplOptAnlysSpecs),
     optAnlysSpecs=optAnlysSpecs if optAnlysSpecs else None, workDir=workDir.as_posix())

In [None]:
%%time

# Run all specified (opt-)analyses.
results = optanlr.run(dfExplParamSpecs=dfExplOptAnlysSpecs, implParamSpecs=optAnlysSpecs,
                      recoverOptims=recoverOptims, threads=12)

# Tip: For running only a subset of the specified variants ...
# results = optanlr.run(dfExplOptAnlysSpecs.iloc[0:2], threads=2)

# Done.
optAnalysed = True

optanlr.shutdown()

# Add some more stats to the result object
if 'dfSampleStats' in vars():
    results.updateSpecs(sampleStats=dfSampleStats)

# Save results to disk
results.toExcel(resFileName)

backup(resFileName)  # Just in case you wanna keep reports from multiple tries with different analysis specs or samples ...

Note: Performance figures on a 6-core HT i7-10850H Windows 10 (2023) laptop with PCI-e SSD, "optimal performance power scheme", Python 3.8.15: **between 15 and 20 analyses per second** with **12 threads** (1 analysis here means 1 MCDS.exe run: beware, when using optimised truncation parameters, this means a lot of analyses, many more than len(dfExplOptAnlysSpecs)).
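
To get a rough idea of what this throughput means in wall-clock time, here is a back-of-the-envelope sketch. It assumes that each opt-analysis triggers at most `submitTimes` × `maxIters` MCDS runs, which is only an assumption about how the optimiser behaves (it may well stop earlier); the helper functions and the figures below are hypothetical and not part of pyaudisam.

```python
def estimateRunCount(nOptAnalyses, submitTimes, maxIters):
    """Upper bound on the number of MCDS runs (hypothetical helper, assumed formula)."""
    return nOptAnalyses * submitTimes * maxIters


def estimateDuration(nRuns, runsPerSecond):
    """Wall-clock seconds needed for nRuns at a given throughput."""
    return nRuns / runsPerSecond


# Made-up figures, for illustration only.
nRuns = estimateRunCount(nOptAnalyses=10, submitTimes=2, maxIters=100)

# Throughput range quoted in the note above: 15 to 20 analyses per second.
slowest = estimateDuration(nRuns, runsPerSecond=15)
fastest = estimateDuration(nRuns, runsPerSecond=20)

print(f'{nRuns} runs at most: between {fastest:.0f} s and {slowest:.0f} s')
```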

In [None]:
results.dfData.head()

Now, you can go straight to:
* [3. Auto-filtered Excel & HTML reports](#3.-Auto-filtered-Excel-%26-HTML-reports),
* [4. Full (unfiltered) Excel & HTML reports](#4.-Full-(unfiltered)-Excel-%26-HTML-reports).

## 2b. Or: Load  (opt-)analyses results from a previous session

Prerequisites: run notebook at least up to end of [II. Load individualised observations](#II.-Load-individualised-observations).

In [None]:
if 'optAnalysed' not in vars():
    optAnalysed = False
    
if not optAnalysed:
    
    # List tmpDir sub-folders that contain (opt-)analysis results, and retrieve the associated study variant
    # (assuming only one result file per folder: the first one found is used)
    resFileNameSufx = 'OptAnalyses*-results'
    resFolders = {}
    for folder in tmpDir.glob('[0-9]'*6+'-'+'[0-9]'*6):
        files = list(folder.glob(f'{pars.studyName}{pars.subStudyName}-{resFileNameSufx}.xlsx'))
        print(folder, len(files))
        for file in files:
            mo = re.match(f'{pars.studyName}{pars.subStudyName}-{resFileNameSufx}.xlsx'.replace('*', '(.*)'), file.name)
            studyVariant = mo.group(1)
            resFolders[folder.name] = studyVariant
            print(file, ':', studyVariant)
            break
            
    logger.info('Available (opt-)analyses result folders (with study variant):')
    for index, (folder, variant) in enumerate(resFolders.items()):
        logger.info(f'[{index}] {folder} : variant="{variant}"')

    # Choose manually the desired folder
    resDirIndex = int(input(f'Enter the integer index of the chosen folder, in [0, {len(resFolders) - 1}]: '))
    resFolder, studyVariant = list(resFolders.items())[resDirIndex]
    workDir = tmpDir / resFolder
    
    resFileNameSufx = f'OptAnalyses{studyVariant}-results'
    resFileName = workDir / f'{pars.studyName}{pars.subStudyName}-{resFileNameSufx}.xlsx'
    assert resFileName.is_file()
    
    logger.info(f'Selected result file: {resFileName.as_posix()}')
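
The folder scan above relies on two things: a glob pattern matching `NNNNNN-NNNNNN` folder names, and a regular expression extracting the study variant from the result file name. Here is a self-contained sketch of that technique in a throw-away directory; the folder names, the file name prefix and the variant are made up, and `re.escape` plus `fullmatch` are used here as a slightly stricter alternative to the plain `re.match` of the cell above.

```python
import re
import tempfile
import pathlib as pl

# Hypothetical study prefix (stands for pars.studyName + pars.subStudyName).
prefix = 'ACDC2019-Nat'

with tempfile.TemporaryDirectory() as tmp:

    # Toy layout: one time-stamped session folder with a result file, and one unrelated folder.
    tmpDir = pl.Path(tmp)
    (tmpDir / '230115-093000').mkdir()
    (tmpDir / '230115-093000' / f'{prefix}-OptAnalysesTest-results.xlsx').touch()
    (tmpDir / 'not-a-session').mkdir()

    # The variant is whatever sits between 'OptAnalyses' and '-results.xlsx'.
    pattern = re.compile(re.escape(prefix) + r'-OptAnalyses(.*)-results\.xlsx')

    resFolders = {}
    for folder in sorted(tmpDir.glob('[0-9]' * 6 + '-' + '[0-9]' * 6)):
        for file in folder.glob(f'{prefix}-OptAnalyses*-results.xlsx'):
            mo = pattern.fullmatch(file.name)
            if mo:
                resFolders[folder.name] = mo.group(1)
                break  # Only keep the first result file of each folder.

print(resFolders)
```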

In [None]:
if not optAnalysed:
    
    # An opt-analyser object knows how to build an empty results object ...
    # But beware: We have to use the same constructor parameters as for the instance used to generate these results !
    optanlr = \
        ads.MCDSTruncationOptanalyser(dfIndDistObs, dfTransects=dfTransects,
                                      effortConstVal=pars.passEffort, dSurveyArea=pars.studyAreaSpecs, 
                                      transectPlaceCols=pars.transectPlaceCols, passIdCol=pars.passIdCol,
                                      effortCol=pars.effortCol, sampleSelCols=pars.sampleSelCols,
                                      sampleDecCols=[pars.effortCol, pars.distanceCol], sampleDistCol=pars.distanceCol,
                                      abbrevCol=pars.analysisAbbrevCol, abbrevBuilder=pars.analysisAbbrev,
                                      anlysIndCol=pars.analysisIndCol, sampleIndCol=pars.sampleIndCol,
                                      distanceUnit=pars.distanceUnit, areaUnit=pars.areaUnit,
                                      surveyType=pars.surveyType, distanceType=pars.distanceType,
                                      clustering=pars.clustering,
                                      ldTruncIntrvSpecs=pars.ldTruncIntrvSpecs, truncIntrvEpsilon=pars.truncIntrvEpsilon,
                                      resultsHeadCols=dict(before=[pars.analysisIndCol, pars.sampleIndCol],
                                                           sample=pars.sampleSelCols,
                                                           after=pars.analysisParamCols + [pars.analysisAbbrevCol]))
    
    results = optanlr.setupResults()
    
    # Load results from file
    results.fromFile(resFileName)
    
else:

    logger.info('(Opt-)Analyses just run, results are still in kernel memory: no need to reload.')
    
logger.info('... {} opt-analyses ready for reporting'.format(len(results)))

Now, you can go straight to:
* [4. Full (unfiltered) Excel & HTML reports](#4.-Full-(unfiltered)-Excel-%26-HTML-reports),

or simply to the next cell.

## 3. Auto-filtered Excel & HTML reports

In [None]:
# Create an "auto-filter-and-sort report" object
filSorReport = ads.MCDSResultsFilterSortReport(resultsSet=results,
                                               title=pars.optAnlysFilsorReportStudyTitle,
                                               subTitle=pars.optAnlysFilsorReportStudySubTitle,
                                               anlysSubTitle=pars.optAnlysFilsorReportAnlysSubTitle,
                                               description=pars.optAnlysFilsorReportStudyDescr,                                               
                                               keywords=pars.optAnlysFilsorReportStudyKeywords, lang=pars.studyLang,
                                               pySources=['acdc-2019-nat-ds-run.ipynb', 'acdc-2019-nat-ds-params.py'],
                                               sampleCols=pars.filsorReportSampleCols, paramCols=pars.filsorReportParamCols,
                                               resultCols=pars.filsorReportResultCols, synthCols=pars.filsorReportSynthCols,
                                               sortCols=pars.filsorReportSortCols, sortAscend=pars.filsorReportSortAscend,
                                               filSorSchemes=pars.filsorReportSchemes, 
                                               tgtFolder=workDir,
                                               tgtPrefix=f'{pars.studyName}{pars.subStudyName}'
                                                         f'-optanalyses{studyVariant}-report',
                                               **pars.filsorReportPlotParams)

In [None]:
%%time

# Generate the workbook report (for all the filter & sort schemes available: see pars.filsorReportSchemes)
xlsxFilSorRep = filSorReport.toExcel(rebuild=pars.filsorReportRebuild)

backup(xlsxFilSorRep)  # Just in case you wanna keep reports from multiple tries with different analysis specs or samples ...

logger.info('Excel auto-filtered report: ' + pl.Path(xlsxFilSorRep).as_posix())

In [None]:
# Tip: Open your report workbook in your .xlsx app (os.startfile is only available on Windows).
os.startfile(xlsxFilSorRep)

In [None]:
#%%time

# Generate the OpenDoc report (.ods)
# Note: New feature of Pandas 1.1, but no cell-coloring support through styles yet (as of 1.1.3) :-(

# odsFilSorRep = filSorReport.toOpenDoc(rebuild=pars.filsorReportRebuild)

# backup(odsFilSorRep)  # Just in case you wanna keep reports from multiple tries with different analysis specs or samples ...

# logger.info('OpenDoc/Workbook auto-filtered report: ' + pl.Path(odsFilSorRep).resolve().as_uri())

In [None]:
%%time

# Select the filter & sort scheme from the available ones (see pars.filsorReportSchemes).
htmlFilSorScheme = next(schm for schm in pars.filsorReportSchemes
                        if schm['method'] is ads.MCDSTruncOptanalysisResultsSet.filterSortOnExCAicMulQua
                           and schm['filterSort']['sightRate'] == 92.5)

# Generate the HTML report for the selected filter scheme
# Note: No parallelism used here (something to be improved in pyaudisam ;-)
htmlFilSorRep = filSorReport.toHtml(htmlFilSorScheme, rebuild=pars.filsorReportRebuild)

backup(htmlFilSorRep)  # Just in case you wanna keep reports from multiple tries with different analysis specs or samples ...
                       # But warning here: it's HTML, and linked files in analysis-specific subfolders are not backed up !

afsId = results.filSorSchemeId(htmlFilSorScheme)
logger.info(f'HTML auto-filtered report ({afsId} scheme):\n=> ' + pl.Path(htmlFilSorRep).resolve().as_uri())
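
The scheme selection in the cell above uses `next()` on a generator expression, which raises a bare `StopIteration` when no scheme matches. Here is a minimal sketch of the same selection with an explicit default and error message; the scheme list and the two filter functions are stand-ins made up for illustration, not the real `pars.filsorReportSchemes` content.

```python
# Stand-ins for filter & sort methods (hypothetical: the real ones belong to the results set class).
def filterSortA():
    pass


def filterSortB():
    pass


# Toy scheme list, with the same 'method' and 'filterSort' keys as used in the cell above.
schemes = [dict(method=filterSortA, filterSort=dict(sightRate=92.5)),
           dict(method=filterSortB, filterSort=dict(sightRate=90.0)),
           dict(method=filterSortB, filterSort=dict(sightRate=92.5))]

# First scheme matching both the method and the sighting rate, or None if there is none.
selected = next((schm for schm in schemes
                 if schm['method'] is filterSortB and schm['filterSort']['sightRate'] == 92.5), None)
if selected is None:
    raise ValueError('No filter & sort scheme matches the requested method and sightRate')

# Same query with a sighting rate that no scheme uses: the default is returned instead of an exception.
missing = next((schm for schm in schemes
                if schm['method'] is filterSortB and schm['filterSort']['sightRate'] == 99.0), None)
```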

Note: Performance figures on a 6-core HT i7-10850H Windows 10 (2023) laptop with PCI-e SSD, "optimal performance power scheme", Python 3.8.15: **around 2 retained analyses per second** (**no parallel generators** here: parallelism is disabled for this kind of report, because of some unresolved bug).

## 4. Full (unfiltered) Excel & HTML reports

In [None]:
# Create a "full report" object
report = ads.MCDSResultsFullReport(resultsSet=results, 
                                   title=pars.optAnlysFullReportStudyTitle, subTitle=pars.optAnlysFullReportStudySubTitle,
                                   anlysSubTitle=pars.optAnlysFullReportAnlysSubTitle,
                                   description=pars.optAnlysFullReportStudyDescr,
                                   keywords=pars.optAnlysFullReportStudyKeywords, lang=pars.studyLang,
                                   pySources=['acdc-2019-nat-ds-run.ipynb', 'acdc-2019-nat-ds-params.py'],
                                   sampleCols=pars.fullReportSampleCols, paramCols=pars.fullReportParamCols,
                                   resultCols=pars.fullReportResultCols, synthCols=pars.fullReportSynthCols,
                                   sortCols=pars.fullReportSortCols, sortAscend=pars.fullReportSortAscend,
                                   tgtFolder=workDir,
                                   tgtPrefix=f'{pars.studyName}{pars.subStudyName}'
                                             f'-optanalyses{studyVariant}-report',
                                   **pars.fullReportPlotParams)

In [None]:
%%time

# Generate the workbook report
xlsxRep = report.toExcel(rebuild=pars.fullReportRebuild)

backup(xlsxRep)  # Just in case you wanna keep reports from multiple tries with different analysis specs or samples ...

logger.info('Full Excel workbook report: ' + pl.Path(xlsxRep).as_posix())

In [None]:
%%time

# Generate the full HTML report
# Note: Parallelism works well here !
# (the default value for the "generators" parameter is something like the actual number of (hyper-threaded) cores of your CPU)
htmlRep = report.toHtml(rebuild=pars.fullReportRebuild)

backup(htmlRep)  # Just in case you wanna keep reports from multiple tries with different analysis specs or samples ...
                 # But warning here: it's HTML, and linked files in analysis-specific subfolders are not backed up !

logger.info('Full HTML report: ' + pl.Path(htmlRep).as_posix())

Note: Performance figures on a 6-core HT i7-10850H Windows 10 (2023) laptop with PCI-e SSD, "optimal performance power scheme", Python 3.8.15: **between 2 and 5 analyses per second with 12 parallel generators** (default number of generators = as many as hyper-threaded cores in the CPU).

# Appendix. Sumup and stats for individualised sightings

In [None]:
print(f'{pars.studyName}{pars.subStudyName}')

## 1. Transects

In [None]:
dfTransects

In [None]:
# Number of observers
dfTransects.Observateur.nunique()

In [None]:
# Number of field inventories = (point) transects per observer and seasonal passing
dfTransects[['Observateur', 'Passage', 'Point']].groupby(['Observateur', 'Passage']).count()

In [None]:
# Total number of inventories per seasonal passing
dfTransects.Passage.value_counts()

## 2. Individualised sightings

In [None]:
dfIndDistObs2 = dfIndDistObs.copy()

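# Assign the result back, rather than replacing in place on a column selection
# (which newer pandas versions may not propagate to the data frame).
dfIndDistObs2['Durée'] = dfIndDistObs2['Durée'].replace('5mn', '05mn')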

dfIndDistObs2
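
The `'5mn'` to `'05mn'` replacement above is presumably there to make the duration labels sort in chronological order (this motivation is an assumption: it is not stated in the notebook). A minimal sketch on a toy series shows the effect of zero-padding on lexicographic sorting:

```python
import pandas as pd

# Toy duration labels (not the real observation data).
sDurations = pd.Series(['10mn', '5mn', '10mn', '5mn'])

# Strings are compared character by character: '1' < '5', so '10mn' comes before '5mn'.
rawOrder = sorted(sDurations.unique())

# After zero-padding, the lexicographic order matches the chronological one.
paddedOrder = sorted(sDurations.replace('5mn', '05mn').unique())

print(rawOrder, paddedOrder)
```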

In [None]:
# Total number of species
display(dfIndDistObs2['Espèce'].nunique())

# Number of species per passing and transect duration
df = dfIndDistObs2[['Passage', 'Durée', 'Espèce']].groupby(['Passage', 'Durée']).nunique().unstack(-2).unstack(-1).to_frame().T
df.columns = df.columns.droplevel(0)
for p in ['a', 'b']:
    df[(p, '10mn/05mn')] = (df[(p, '10mn')] - df[(p, '05mn')]) / df[(p, '05mn')]
df.sort_index(axis='columns', inplace=True)
df

In [None]:
# Nb of individuals observed per species, passing and transect duration
df = dfIndDistObs2[['Passage', 'Durée', 'Espèce', 'Distance']].copy()
df['Adult'] = adulte = 'All adults'
df['Method'] = pars.subStudyName[1:]

df = df.groupby(['Espèce', 'Method', 'Adult', 'Passage', 'Durée']).count().unstack(-4).unstack(-3).unstack(-2).unstack(-1)
df.columns = df.columns.droplevel(0)
df.fillna(0, inplace=True)
for duree in ['05mn', '10mn']:
    df[(pars.subStudyName[1:], adulte, 'b+a', duree)] = \
        df[(pars.subStudyName[1:], adulte, 'a', duree)] + df[(pars.subStudyName[1:], adulte, 'b', duree)]

df.sort_values(by=[(pars.subStudyName[1:], adulte, 'b+a', '10mn')], ascending=False, inplace=True)
df.sort_index(axis='columns', inplace=True)
df1 = df
df

In [None]:
# Nb of males observed per species, passing and transect duration
df = dfIndDistObs2.loc[dfIndDistObs2.Adulte == 'm', ['Passage', 'Durée', 'Espèce', 'Distance']].copy()
df['Adult'] = adulte = 'Only males'
df['Method'] = pars.subStudyName[1:]

df = df.groupby(['Espèce', 'Method', 'Adult', 'Passage', 'Durée']).count().unstack(-4).unstack(-3).unstack(-2).unstack(-1)
df.columns = df.columns.droplevel(0)

df.fillna(0, inplace=True)
for duree in ['05mn', '10mn']:
    df[(pars.subStudyName[1:], adulte, 'b+a', duree)] = \
        df[(pars.subStudyName[1:], adulte, 'a', duree)] + df[(pars.subStudyName[1:], adulte, 'b', duree)]

df.sort_values(by=[(pars.subStudyName[1:], adulte, 'b+a', '10mn')], ascending=False, inplace=True)
df.sort_index(axis='columns', inplace=True)
df

In [None]:
# Nb of males and all adults observed per species, passing and transect duration
df1.join(df, how='outer').sort_values(by=[(pars.subStudyName[1:], 'All adults', 'b+a', '10mn')],
                                      ascending=False) # .to_excel(f'donnees/acdc/bilan-especes{pars.subStudyName}.xlsx')

In [None]:
# Nb of individuals (all adults) observed per passing and transect duration
df = dfIndDistObs2

df = df[['Passage', 'Durée', 'Adulte']].groupby(['Passage', 'Durée']).count().unstack(-2)
df.columns = df.columns.droplevel(0)
df.sort_index(axis='columns', inplace=True)

display(df)

# How many more individuals (all adults) are observed in 10mn transects, when compared to 5mn ones (mean on all passes)
print('05mn => 10mn : +', 100 * (df.loc['10mn'].sum() - df.loc['05mn'].sum()) / df.loc['05mn'].sum(), '%')

# How many more individuals (all adults) are observed in 10mn transects, when compared to 5mn ones, per seasonal passing
df = df.unstack(-1).to_frame().T
for p in ['a', 'b']:
    df[(p, '10mn/05mn')] = (df[(p, '10mn')] - df[(p, '05mn')]) / df[(p, '05mn')]
df.sort_index(axis='columns', inplace=True)

display(df)
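
The counting cells above all follow the same pattern: count sightings per passing and duration, pivot the passing to columns, then compute the relative increase from 5mn to 10mn transects. Here is a self-contained sketch of that pattern on toy data; the counts are made up, and the `Duree` column name is an ASCII stand-in for the real `Durée` column.

```python
import pandas as pd

# Toy sightings: 2 and 3 for passing 'a' (05mn, 10mn), 4 and 5 for passing 'b'.
rows = [('a', '05mn')] * 2 + [('a', '10mn')] * 3 + [('b', '05mn')] * 4 + [('b', '10mn')] * 5
dfObs = pd.DataFrame(rows, columns=['Passage', 'Duree'])
dfObs['Adulte'] = 'm'

# Count per (passing, duration), then move the passing level to columns: rows = durations.
dfCounts = dfObs.groupby(['Passage', 'Duree']).count().unstack(-2)
dfCounts.columns = dfCounts.columns.droplevel(0)

# Overall relative increase (in %) from 05mn to 10mn, all passes together.
pctMore = 100 * (dfCounts.loc['10mn'].sum() - dfCounts.loc['05mn'].sum()) / dfCounts.loc['05mn'].sum()

# Same ratio per passing, on a single-row frame with (passing, duration) columns.
dfWide = dfCounts.unstack(-1).to_frame().T
for p in ['a', 'b']:
    dfWide[(p, '10mn/05mn')] = (dfWide[(p, '10mn')] - dfWide[(p, '05mn')]) / dfWide[(p, '05mn')]
dfWide.sort_index(axis='columns', inplace=True)

print(pctMore)
print(dfWide)
```

Note that the overall figure is a percentage, whereas the per-passing columns hold plain ratios, as in the cells above.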

In [None]:
# Nb of males observed per passing and transect duration
df = dfIndDistObs2[dfIndDistObs2.Adulte == 'm']

df = df[['Passage', 'Durée', 'Adulte']].groupby(['Passage', 'Durée']).count().unstack(-2)
df.columns = df.columns.droplevel(0)

display(df)

# How many more males are observed in 10mn transects, when compared to 5mn ones (mean on all passes)
print('05mn => 10mn : +', 100 * (df.loc['10mn'].sum() - df.loc['05mn'].sum()) / df.loc['05mn'].sum(), '%')

# How many more males are observed in 10mn transects, when compared to 5mn ones, per seasonal passing
df = df.unstack(-1).to_frame().T
for p in ['a', 'b']:
    df[(p, '10mn/05mn')] = (df[(p, '10mn')] - df[(p, '05mn')]) / df[(p, '05mn')]
df.sort_index(axis='columns', inplace=True)

display(df)