This tutorial is designed for users who have little to no coding experience or who are not familiar with Jupyter Notebooks. More experienced users may prefer the [API tutorial](https://dsegebarth.github.io/dcl_stats_n_plots/api_tutorial.html), which explains how to use the API.

## Overview

If you are new to Python and/or the Jupyter interface and want a more detailed guide on how to use the dcl_stats_n_plots GUI, this tutorial is for you.

First, please have a look at our [installation guide](https://dsegebarth.github.io/dcl_stats_n_plots/#Installation). It will take you through the entire process of setting up your computer so that you can use our interactive GUI - no worries, there is absolutely no programming expertise needed.

Once you have completed the [installation guide](https://dsegebarth.github.io/dcl_stats_n_plots/#Installation), please make sure that the data you would like to analyze and plot follow the correct format. For this, you will find an "Expected input data format" section for each type of analysis that can be done with dcl_stats_n_plots. For each type of analysis, we also describe a typical experiment for which it is commonly used and provide GIFs that showcase the usage of the dcl_stats_n_plots GUI. Currently, the implemented analyses are:

1. Comparison of two or more independent samples
2. Comparison of a single group to a fixed value
3. Mixed-Model ANOVA (comparison of two or more groups with repeated measures)

## Comparison of two or more independent samples

### Representative experiment

The comparison of two or more independent samples is probably one of the most frequently performed analyses in the lab. You can use this type of analysis if you:

- have multiple experimental subjects (>3) 
- that belong to at least two different groups
- and the given experiment is performed only once with each experimental subject (no repeated measurements)

For instance, imagine you recorded a single open field session for each mouse in an experimental cohort of 36 mice that represent three different genotypes (12 homozygous transgenic mice \[tg/tg\], 12 heterozygous transgenic mice \[tg/+\], and 12 wildtype mice \[+/+\]). After analyzing the videos, you now want to test whether the time the mice spent in the center of the open field differs significantly between genotypes.

:::{.callout-note}

Equal sample sizes per group are ideal, but not a prerequisite for this analysis.

:::

### Expected input data format

For this analysis, you only have to provide two columns. The first column contains the actual data values, and the second column specifies to which group the respective data value belongs. In Microsoft Excel, this might look like the following:

![independent_samples_input_schema.png](https://raw.githubusercontent.com/DSegebarth/dcl_stats_n_plots/master/media/independent_samples_input_schema.png)

Here, the first two data values belong to the group called "group_a", the third and fourth to a second group called "group_b", and the fifth and sixth to a third group called "group_c". You probably got the idea ;-)

You are entirely free to choose any column names (for instance "time spent in center \[s\]" instead of "data", or "genotype" instead of "group_id") and also to provide your favorite group names (e.g. "+/+" instead of "group_a" and "tg/+" for "group_b"). 

:::{.callout-note}

The column name of your data column will be used as the default y-axis label, but you will still be able to modify that on demand.

:::

:::{.callout-note}

The order of appearance of the groups in the group column (top to bottom) will be used as the default x-axis order, but you will still be able to modify that on demand.

:::
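If you prefer to create such a file from code rather than in Excel, the following is a minimal sketch using only Python's standard library. It assumes that a plain comma-separated `.csv` file with a header row is accepted by the upload button; the example values, the file name, and the helper names `independent_samples_csv` and `group_order` are made up for illustration and are not part of dcl_stats_n_plots.

```python
import csv
import io
from pathlib import Path

# Hypothetical example values: time spent in the center of the open field (s),
# each paired with the genotype of the respective mouse.
rows = [
    (41.2, "+/+"),
    (38.7, "+/+"),
    (29.5, "tg/+"),
    (31.1, "tg/+"),
    (18.4, "tg/tg"),
    (22.9, "tg/tg"),
]


def independent_samples_csv(rows, data_label="data", group_label="group_id"):
    """Build CSV text with the data values in column 1 and the groups in column 2."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([data_label, group_label])  # header row = column names
    writer.writerows(rows)
    return buffer.getvalue()


def group_order(rows):
    """Return the group names in order of first appearance (top to bottom)."""
    return list(dict.fromkeys(group for _, group in rows))


csv_text = independent_samples_csv(
    rows, data_label="time spent in center [s]", group_label="genotype"
)
print(csv_text)
print(group_order(rows))

# Save the table next to your notebook so that it can be uploaded afterwards.
Path("independent_samples_example.csv").write_text(csv_text, encoding="utf-8")
```

The `group_order` helper only illustrates what "order of appearance" means for the group column; how the GUI determines the order internally is not shown here.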

### Perform the analysis of independent samples using our GUI:

All you need to do is launch our GUI and load the input data you prepared. To launch the GUI, simply copy and paste the following two lines of code into a code cell of a Jupyter Notebook and run it:

```python
from dcl_stats_n_plots import gui

gui.launch()
```


The animated GIFs below showcase some basic features of the GUI. Feel free to check out the GIFs for the other types of analyses to see even more examples.

:::{.callout-note}

If you don't have your own input data ready yet, feel free to use the dummy data we have created for this tutorial. [Click here](https://github.com/DSegebarth/dcl_stats_n_plots/raw/master/test_data/independent_samples.xlsx) to download a representative "independent samples" dataset.

:::

**Launch the GUI**

<img src="https://raw.githubusercontent.com/DSegebarth/dcl_stats_n_plots/master/media/Launch_GUI_independent_samples.gif" alt="Launch_GUI" width="1024"/>

**Annotate the statistical results**

<img src="https://raw.githubusercontent.com/DSegebarth/dcl_stats_n_plots/master/media/Independent_samples_annotate_stats.gif" alt="Annotate_stats_ind" width="1024"/>

**Customize your plot**

<img src="https://raw.githubusercontent.com/DSegebarth/dcl_stats_n_plots/master/media/Independent_samples_customize_plot.gif" alt="Customize_plot_ind" width="1024"/>

**Download the statistical results and the generated plot**

<img src="https://raw.githubusercontent.com/DSegebarth/dcl_stats_n_plots/master/media/Independent_samples_download.gif" alt="Download" width="1024"/>

## Comparison of a single group to a fixed value

### Representative experiment

:::{.callout-note}

Will be added soon.

:::

### Expected input data format

For this analysis, you have to provide three columns. The first column must contain the data, the second column provides the corresponding group_id (see note below the image), and the third column provides the fixed value against which the group shall be compared.

In Microsoft Excel, this might look like the following:

![one_sample_input_schema.png](https://raw.githubusercontent.com/DSegebarth/dcl_stats_n_plots/master/media/one_sample_input_schema.png)

:::{.callout-note}

Of course, the second column ('group column') will only contain one group_id, since this is a _one sample_ analysis. However, keeping the expected input data structure consistent across functions (data, group, ...) is intended to make dcl_stats_n_plots a little easier to use. In addition, it allows you to provide the x-axis label (= group column header) and the corresponding x-axis tick label (= the group_id that is provided in the group column).

:::
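As a rough sketch, a table with these three columns could also be generated with Python's standard library. Note the assumptions: the fixed value is repeated in every row, the third column header `fixed_value` is a made-up name, and the helper `one_sample_csv` is hypothetical and not part of dcl_stats_n_plots. Please compare the result with the image above before relying on it.

```python
import csv
import io


def one_sample_csv(values, group_name, fixed_value,
                   headers=("data", "group_id", "fixed_value")):
    """Build CSV text with three columns: data, group, and the fixed reference value.

    Assumption: the same fixed value is written into every row.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(headers)
    for value in values:
        # Every row carries the same group name, since there is only one group.
        writer.writerow([value, group_name, fixed_value])
    return buffer.getvalue()


# Hypothetical measurements of a single group, compared against the value 1.0.
csv_text = one_sample_csv([0.9, 1.2, 1.1, 0.8], group_name="tg/tg", fixed_value=1.0)
print(csv_text)
```

The returned text can be saved to a `.csv` file, for instance with `pathlib.Path("one_sample_example.csv").write_text(csv_text, encoding="utf-8")`.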

### Perform the analysis of one sample using our GUI:

For this showcase, we have used a dummy dataset, which is freely available via our GitHub repository ([click here](https://github.com/DSegebarth/dcl_stats_n_plots/raw/master/test_data/one_sample_not_significant.xlsx) to download the file) and can be used until you have your own input data to analyze. To launch the GUI, just run the following two lines of code in a Jupyter Notebook:

```python
from dcl_stats_n_plots import gui

gui.launch()
```


**Run a one-sample test**

<img src="https://raw.githubusercontent.com/DSegebarth/dcl_stats_n_plots/master/media/One_sample_adjust_figsize.gif" alt="One_sample" width="1024"/>

## Mixed-Model ANOVA

:::{.callout-warning}

This function is currently only implemented in a parametric variant (there is no non-parametric equivalent available via pingouin). This affects only the group level statistics (main effects), while pairwise comparisons are again performed either using parametric or non-parametric tests, depending on the respective data.

:::

### Representative experiment

:::{.callout-note}

Will be added soon.

:::

### Expected input data format

For this analysis, things get a little more complicated. In total, your input data have to be structured in four columns (see also the image below). As for the other analyses, the first column provides the data values and the second column the group assignment. The third column has to provide a unique subject identifier for each experimental subject. The fourth column specifies the corresponding recording session (i.e. an identifier for each part of your experiment in which one of the repeated measurements took place).

In Microsoft Excel, this might look like the following:

![mma_input_schema.png](https://raw.githubusercontent.com/DSegebarth/dcl_stats_n_plots/master/media/mma_input_schema.png)
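Because every subject contributes several rows to this four-column layout, it can help to check the table for inconsistencies before uploading it. The following sketch uses only Python's standard library. The example rows, the identifiers such as `mouse_01` and `session_1`, and the helper `find_format_problems` are hypothetical, and the checks reflect what is commonly sensible for repeated-measures data rather than documented requirements of dcl_stats_n_plots.

```python
from collections import defaultdict

# Hypothetical rows in the four-column layout:
# (data value, group, subject identifier, session)
rows = [
    (12.1, "group_a", "mouse_01", "session_1"),
    (14.3, "group_a", "mouse_01", "session_2"),
    (11.8, "group_a", "mouse_02", "session_1"),
    (13.9, "group_a", "mouse_02", "session_2"),
    (15.2, "group_b", "mouse_03", "session_1"),
    (19.7, "group_b", "mouse_03", "session_2"),
    (16.0, "group_b", "mouse_04", "session_1"),
    (20.4, "group_b", "mouse_04", "session_2"),
]


def find_format_problems(rows):
    """Return a list of readable problem descriptions (empty if none were found)."""
    groups_by_subject = defaultdict(set)
    sessions_by_subject = defaultdict(list)
    for _, group, subject, session in rows:
        groups_by_subject[subject].add(group)
        sessions_by_subject[subject].append(session)
    all_sessions = {session for _, _, _, session in rows}

    problems = []
    for subject, groups in groups_by_subject.items():
        # A subject identifier should be unique, so it can only belong to one group.
        if len(groups) > 1:
            problems.append(f"{subject} is assigned to several groups: {sorted(groups)}")
    for subject, sessions in sessions_by_subject.items():
        # Each subject should have exactly one value per session.
        if len(sessions) != len(set(sessions)):
            problems.append(f"{subject} has several values for the same session")
        missing = all_sessions - set(sessions)
        if missing:
            problems.append(f"{subject} has no value for: {sorted(missing)}")
    return problems


print(find_format_problems(rows))
```

If the function reports a problem, it is usually easiest to correct the affected rows directly in your spreadsheet and to run the check again.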

### Perform a Mixed-Model ANOVA using our GUI:

For this showcase, we have used a dummy dataset, which is freely available via our GitHub repository ([click here](https://github.com/DSegebarth/dcl_stats_n_plots/raw/master/test_data/mixed_model_anova_3groups.xlsx) to download the file) and can be used until you have your own input data to analyze. To launch the GUI, just run the following two lines of code in a Jupyter Notebook:

```python
from dcl_stats_n_plots import gui

gui.launch()
```


**Run a Mixed-Model-ANOVA**

<img src="https://raw.githubusercontent.com/DSegebarth/dcl_stats_n_plots/master/media/Load_MMA.gif" alt="Load_MMA" width="1024"/>

**Customize annotations**

<img src="https://raw.githubusercontent.com/DSegebarth/dcl_stats_n_plots/master/media/Customize_annotations_MMA.gif" alt="Customize_annotations_MMA" width="1024"/>

**Customize group order**

<img src="https://raw.githubusercontent.com/DSegebarth/dcl_stats_n_plots/master/media/Customize_group_order_MMA.gif" alt="Customize_groups_MMA" width="1024"/>