# Bayesian Data Analysis for Neuroscience Data

## Introduction

This notebook is part of a 20-week internship project carried out at Ulster University.
The main goal of the project is to make advanced Bayesian statistical models more accessible
to experimental neuroscientists through user-friendly code, tutorials, and examples.

Specifically, this notebook focuses on applying existing Bayesian models to neuroscience datasets
using Python libraries.

## Objectives

1. Apply the existing Bayesian models to a neuroscience dataset from scratch,
   documenting each step for a beginner.

2. Design a simple and reproducible analysis pipeline using PyMC.

3. Produce clear, well-documented code that can later be integrated into
   an interactive tutorial or a web application.

## Author

- Mathis DA SILVA
- Ulster University Internship (July–December 2025)
- Supervisors: Dr. Cian O'Donnell & Dr. Conor Houghton

## References

- [Dataset from "Classification of psychedelics and psychoactive drugs based on brain-wide imaging of cellular c-Fos expression"](https://www.nature.com/articles/s41467-025-56850-6#Sec25)
- [Hierarchical Bayesian modeling of multi-region brain cell count data](https://elifesciences.org/reviewed-preprints/102391v1)
- [Statistical Rethinking 2023 PDF](https://civil.colorado.edu/~balajir/CVEN6833/bayes-resources/RM-StatRethink-Bayes.pdf)
- [Statistical Rethinking 2023 Videos](https://www.youtube.com/watch?v=FdnMWdICdRs&list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus)

First, we import the libraries used in this notebook so far.

In [1]:
import math
import numpy as np
import pymc as pm
import pandas as pd



We will use the dataset from the paper "Classification of psychedelics and psychoactive drugs based on brain-wide imaging of cellular c-Fos expression".



In [2]:
dataset = pd.read_excel('data/dataset_neuroscience_vo.xlsx')

dataset

Unnamed: 0,abbreviation,region name,brain area,5MEO1 count,5MEO2 count,5MEO3 count,5MEO4 count,5MEO5 count,5MEO6 count,5MEO7 count,...,PSI7 count,PSI8 count,SAL1 count,SAL2 count,SAL3 count,SAL4 count,SAL5 count,SAL6 count,SAL7 count,SAL8 count
0,FRP,Frontal pole cerebral cortex,Cortex,9574,7781,17598,4425,7428,8302,4288,...,3367,4342,7404,4925,12521,10363,4562,14383,789,6067
1,ILA,Infralimbic area,Cortex,12138,6742,28070,1685,15612,17191,6061,...,7591,5778,9665,8049,10853,2844,15747,15412,11667,21630
2,ORBl,Orbital area lateral part,Cortex,48129,45849,120147,28655,40438,54206,39938,...,8291,14603,56825,30618,58755,14705,26686,59049,5192,36019
3,ORBm,Orbital area medial part,Cortex,17225,8551,34163,6330,14908,23250,7993,...,4878,7177,13035,16101,14017,7855,14478,30999,10051,28545
4,ORBvl,Orbital area ventrolateral part,Cortex,32690,24460,58132,16015,24182,31926,15148,...,5081,9116,37775,26349,33593,10743,21789,37644,9320,23276
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
311,LING,Lingula (I),Cerebellum,52,65,317,221,63,131,2,...,150,112,51,7,119,70,22,233,114,255
312,DN,Dentate nucleus,Cerebellum,51,8,926,144,28,37,185,...,24,8,85,21,19,15,6,6,0,12
313,NOD,Nodulus (X),Cerebellum,1528,1153,3218,2399,501,562,111,...,2318,1105,369,185,610,34,7016,1493,1293,6273
314,VeCB,Vestibulocerebellar nucleus,Cerebellum,76,67,163,111,66,284,0,...,16,37,43,2,34,27,3,3,0,31


#### Notes:

The dataset is now loaded. Its first three named columns identify each brain region (abbreviation, full name, and brain area). The remaining columns each correspond to one mouse, grouped by drug, such as **MDMA**, **Ketamine**, and **Fluoxetine**.

There are **64 mice** in total and **315 brain regions**, and each mouse has one value per region. Each value is the number of cells expressing c-Fos, a marker of **neuronal activity**.
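
As a quick check on this layout, the mouse columns can be counted per drug group directly from the column names. The sketch below assumes every mouse column is named `<group><number> count`, as in the table above; the helper name `mice_per_group` is hypothetical.

```python
import re
from collections import Counter


def mice_per_group(columns):
    """Count mouse columns per drug group from names like '5MEO1 count'."""
    counts = Counter()
    for name in columns:
        # Assumed pattern: group label, mouse number, then the word "count".
        match = re.fullmatch(r"(.+?)(\d+) count", str(name))
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


example_columns = ["abbreviation", "region name", "5MEO1 count", "5MEO2 count", "SAL1 count"]
print(mice_per_group(example_columns))  # {'5MEO': 2, 'SAL': 1}
```

On the full dataset, `sum(mice_per_group(dataset.columns).values())` should give 64 if the naming assumption holds for every mouse column.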

### Statistical Models
---

We will use **hierarchical Bayesian models** to analyze the dataset. These models account for the hierarchical structure of the data, in which measurements are nested within brain regions and mice. We consider three models: **Poisson**, **Horseshoe**, and **Zero-Inflated Poisson (ZIP)**.

For now, we use a subset of the dataset: the first 20 brain regions and two groups of mice.

In [3]:
dataset1 = pd.read_excel('data/dataset_neuroscience_1.xlsx')

dataset1

Unnamed: 0,abbreviation,region name,brain area,A-SSRI1 count,A-SSRI2 count,A-SSRI3 count,A-SSRI4 count,A-SSRI5 count,A-SSRI6 count,A-SSRI7 count,A-SSRI8 count,C-SSRI1 count,C-SSRI2 count,C-SSRI3 count,C-SSRI4 count,C-SSRI5 count,C-SSRI6 count,C-SSRI7 count,C-SSRI8 count
0,FRP,Frontal pole cerebral cortex,Cortex,4297,2320,1873,3262,2331,1737,6635,1847,775,1302,833,1647,1570,1614,896,1091
1,ILA,Infralimbic area,Cortex,5028,11432,14109,13773,13181,20988,7812,9091,3352,6365,7511,5790,6080,6828,8769,6725
2,ORBl,Orbital area lateral part,Cortex,16644,37202,28689,25857,23043,15582,30958,17778,9138,9322,20219,13593,11475,24451,19115,10781
3,ORBm,Orbital area medial part,Cortex,6143,12661,17336,13613,16076,16278,14271,9314,4652,8659,7595,8839,8097,11962,15415,6527
4,ORBvl,Orbital area ventrolateral part,Cortex,11682,24502,36990,24050,20598,16528,28842,10982,13188,13533,23090,17166,14770,22438,24899,15109
5,AId,Agranular insular area dorsal part,Cortex,14605,15198,11975,12488,28418,14536,13091,20905,3899,5779,6045,10318,6164,11008,3554,3629
6,AIp,Agranular insular area posterior part,Cortex,16409,9414,7116,13154,24693,17012,7253,18837,3229,6060,3509,6762,6123,5268,3188,3601
7,AIv,Agranular insular area ventral part,Cortex,9821,15879,11689,15838,19235,12306,12327,14827,7269,7130,8873,9072,7413,9282,5129,3866
8,RSPagl,Retrosplenial area lateral agranular part,Cortex,15133,24749,25622,30641,13331,25114,13218,16085,11922,19108,12079,12608,13028,13841,15241,13118
9,RSPd,Retrosplenial area dorsal part,Cortex,18329,25369,23375,35179,16741,25149,13698,18772,12902,21235,10266,13591,16424,11048,23015,14551


We will prepare the data for the model.
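
One possible way to prepare the data is to reshape the wide table (one column per mouse) into a long table (one row per observation) and to derive integer indices for region and group. This is a minimal sketch rather than the notebook's final pipeline: the function name `to_long_format` and the columns `mouse`, `group`, `region_idx`, and `group_idx` are hypothetical, and the group label is assumed to be the mouse name without its trailing number. The toy frame reuses a few values from the table above.

```python
import pandas as pd

ID_COLS = ["abbreviation", "region name", "brain area"]


def to_long_format(wide):
    """Reshape a wide counts table into one row per (region, mouse)."""
    count_cols = [c for c in wide.columns if str(c).endswith(" count")]
    long = wide.melt(id_vars=ID_COLS, value_vars=count_cols,
                     var_name="mouse", value_name="count")
    # "A-SSRI1 count" -> mouse "A-SSRI1" -> group "A-SSRI" (assumed naming scheme).
    long["mouse"] = long["mouse"].str.replace(" count", "", regex=False)
    long["group"] = long["mouse"].str.replace(r"\d+$", "", regex=True)
    # Integer codes that a model can use to index per-region/group parameters.
    long["region_idx"] = pd.factorize(long["abbreviation"])[0]
    long["group_idx"] = pd.factorize(long["group"])[0]
    return long


toy = pd.DataFrame({
    "abbreviation": ["FRP", "ILA"],
    "region name": ["Frontal pole cerebral cortex", "Infralimbic area"],
    "brain area": ["Cortex", "Cortex"],
    "A-SSRI1 count": [4297, 5028],
    "A-SSRI2 count": [2320, 11432],
    "C-SSRI1 count": [775, 3352],
    "C-SSRI2 count": [1302, 6365],
})
long = to_long_format(toy)
print(long[["abbreviation", "mouse", "group", "count", "region_idx", "group_idx"]])
```

The same call on `dataset1` would give one row per region and mouse, with `count`, `region_idx`, and `group_idx` ready to pass to a model.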

Next, we build the Poisson model using PyMC.

The model is specified as follows, where $y_i$ is the c-Fos cell count for observation $i$, and $r[i]$ and $g[i]$ index its brain region and group:

\begin{gather*}
y_i \sim \mathrm{Poisson}(\lambda_i)\\
\log(\lambda_i) = E_i + \gamma_i\\
\theta_{rg} \sim \mathrm{Normal}(5, 2)\\
\tau_{rg} \sim \mathrm{HalfNormal}(\log(1.05))\\
\gamma_i \sim \mathrm{Normal}(\theta_{r[i]g[i]}, \tau_{r[i]g[i]})
\end{gather*}

In [None]:
with pm.Model() as poisson_model:
    pass  # placeholder: the model body has not been written yet



Next, we build the Horseshoe model using PyMC.

The model is specified as follows:

\begin{gather*}
y_i \sim \mathrm{Poisson}(\lambda_i)\\
\log(\lambda_i) = E_i + \gamma_i\\
\theta_{rg} \sim \mathrm{Normal}(5, 2)\\
\tau_{rg} \sim \mathrm{HalfNormal}(\log(1.05))\\
\kappa_i \sim \mathrm{HalfNormal}(1)\\
\gamma_i \sim \mathrm{Normal}(\theta_{r[i]g[i]}, \kappa_i\times\tau_{r[i]g[i]})
\end{gather*}

In [None]:
with pm.Model() as horseshoe_model:
    pass  # placeholder: the model body has not been written yet


Next, we build the Zero-Inflated Poisson model using PyMC.

The model is specified as follows:

\begin{gather*}
y_i \sim \mathrm{ZIPoisson}(\lambda_i,\pi)\\
\log(\lambda_i) = E_i + \gamma_i\\
\pi \sim \mathrm{Beta}(1,5)\\
\theta_{rg} \sim \mathrm{Normal}(5, 2)\\
\tau_{rg} \sim \mathrm{HalfNormal}(\log(1.05))\\
\gamma_i \sim \mathrm{Normal}(\theta_{r[i]g[i]}, \tau_{r[i]g[i]})
\end{gather*}

In [None]:
with pm.Model() as zeroinflatedpoisson_model:
    pass  # placeholder: the model body has not been written yet
