# Bayesian Data Analysis for Neuroscience Data

## Introduction

This notebook is part of a 20-week internship project carried out at Ulster University.
The main goal of the project is to make advanced Bayesian statistical models more accessible
to experimental neuroscientists through user-friendly code, tutorials, and examples.

Specifically, this notebook applies existing Bayesian models to neuroscience datasets
using Python libraries.

## Objectives

1. Apply the existing Bayesian models to a neuroscience dataset from scratch,
   documenting each step so that a beginner can follow it.

2. Design a simple and reproducible analysis pipeline using PyMC.

3. Produce clear, well-documented code that can later be integrated into
   an interactive tutorial or a web application.

## Author

- Mathis DA SILVA
- Ulster University Internship (July–December 2025)
- Supervisors: Dr. Cian O'Donnell & Dr. Conor Houghton

## References

- [Dataset from "Classification of psychedelics and psychoactive drugs based on brain-wide imaging of cellular c-Fos expression"](https://www.nature.com/articles/s41467-025-56850-6#Sec25)
- [Hierarchical Bayesian modeling of multi-region brain cell count data](https://elifesciences.org/reviewed-preprints/102391v1)
- [Statistical Rethinking 2023 PDF](https://civil.colorado.edu/~balajir/CVEN6833/bayes-resources/RM-StatRethink-Bayes.pdf)
- [Statistical Rethinking 2023 Videos](https://www.youtube.com/watch?v=FdnMWdICdRs&list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus)

----

#### Import the libraries used in this notebook.

In [2]:
import numpy as np
import pymc as pm
import pandas as pd
import matplotlib.pyplot as plt
import arviz as az
import seaborn as sns

In [10]:
# Load every dataset from the "c-Fos-counts" sheet of its Excel file.
datasets = {
    # Dataset 1: 20 regions and 2 groups (16 mice)
    'dataset1': pd.read_excel('data/dataset_neuroscience_1.xlsx', sheet_name="c-Fos-counts"),

    # Dataset 2: 40 regions and 4 groups (32 mice)
    'dataset2': pd.read_excel('data/dataset_neuroscience_2.xlsx', sheet_name="c-Fos-counts"),

    # Dataset 3: 80 regions and 4 groups (32 mice)
    'dataset3': pd.read_excel('data/dataset_neuroscience_3.xlsx', sheet_name="c-Fos-counts"),

    # Dataset 4: 160 regions and 6 groups (48 mice)
    'dataset4': pd.read_excel('data/dataset_neuroscience_4.xlsx', sheet_name="c-Fos-counts"),

    # Dataset 5: 315 regions and 8 groups (64 mice)
    'dataset5': pd.read_excel('data/dataset_neuroscience_vo.xlsx', sheet_name="c-Fos-counts")
}

#### Notes on the data:

The cell above loads the datasets. In each one, the first three columns describe the brain region, including its name and abbreviation. The remaining columns contain the mice, grouped by drug: **MDMA**, **Ketamine**, **Fluoxetine**, ...

The full dataset (`dataset5`) contains **64 mice** and **315 brain regions**, and each mouse has one value per brain region. Each value is the number of cells expressing c-Fos, a marker of **neuronal activity**.
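The wide layout described above (one row per region, one column per mouse) is usually easier to model after reshaping it into a long table with one row per region and mouse. The sketch below shows that step on a small toy table. The column names, the `<drug>_<index>` naming of the mouse columns, and the counts themselves are all hypothetical and should be replaced with the headers actually found in the Excel sheets.

```python
import pandas as pd

# Toy stand-in for one sheet. Every column name and value here is hypothetical.
wide = pd.DataFrame({
    "region_id": [1, 2, 3],
    "region_name": ["Region A", "Region B", "Region C"],
    "abbreviation": ["RA", "RB", "RC"],
    "Ketamine_1": [12, 0, 7],
    "Ketamine_2": [15, 1, 9],
    "MDMA_1": [30, 2, 8],
    "MDMA_2": [27, 0, 11],
})

# The first three columns describe the region; the rest are mice.
id_cols = list(wide.columns[:3])

# One row per (region, mouse) pair, with the cell count in "count".
long_df = wide.melt(id_vars=id_cols, var_name="mouse", value_name="count")

# Recover the drug group from the mouse label (assumes "<drug>_<index>").
long_df["group"] = long_df["mouse"].str.rsplit("_", n=1).str[0]

print(long_df.head())
```

In this form, the region and group of each count can be turned into integer indices, which is the shape most hierarchical models expect as input.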

### Statistical Models
---

We will use a **Hierarchical Bayesian Model** to analyze the dataset. This accounts for the hierarchical structure of the data, in which measurements are nested within brain regions and mice. Three models are considered: **Poisson**, **Horseshoe**, and **Zero-Inflated Poisson (ZIP)**.
