# Bayesian Data Analysis for Neuroscience Data

## Introduction

This notebook is part of a 20-week internship project carried out at Ulster University.
The main goal of the project is to make advanced Bayesian statistical models more accessible
to experimental neuroscientists through user-friendly code, tutorials, and examples.

Specifically, this notebook applies existing Bayesian models to neuroscience datasets
using Python libraries.

## Objectives

1. Apply existing Bayesian models to a neuroscience dataset from scratch,
   documenting each step for a beginner.

2. Design a simple and reproducible analysis pipeline using PyMC.

3. Produce clear, well-documented code that can later be integrated into
   an interactive tutorial or a web application.

## Tools and technologies

- Python (main programming language)
- PyMC (Bayesian modeling)
- NumPy, pandas, matplotlib (data manipulation and visualization)
- Jupyter Notebook (interactive documentation and prototyping)

## Author

- Mathis DA SILVA
- Ulster University Internship (July–December 2025)
- Supervisors: Dr. Cian O'Donnell & Dr. Conor Houghton

## References

- [Dataset from "Classification of psychedelics and psychoactive drugs based on brain-wide imaging of cellular c-Fos expression"](https://www.nature.com/articles/s41467-025-56850-6#Sec25)
- [Hierarchical Bayesian modeling of multi-region brain cell count data](https://elifesciences.org/reviewed-preprints/102391v1)
- [Statistical Rethinking 2023 PDF](https://civil.colorado.edu/~balajir/CVEN6833/bayes-resources/RM-StatRethink-Bayes.pdf)
- [Statistical Rethinking 2023 Videos](https://www.youtube.com/watch?v=FdnMWdICdRs&list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus)


### Import the libraries used in this notebook

In [1]:
import pandas as pd
import pymc as pm
import matplotlib.pyplot as plt
import math
import numpy as np
import seaborn as sns
import statistics as stats



In [25]:
dataset1 = pd.read_excel('data/dataset_neuroscience_1.xlsx')

In [26]:
print("Dataset overview:\n")

# Preview the first 10 rows of dataset 1
dataset1.head(10)

Dataset overview:



Unnamed: 0,abbreviation,region name,brain area,5MEO1 count,5MEO2 count,5MEO3 count,5MEO4 count,5MEO5 count,5MEO6 count,5MEO7 count,...,PSI7 count,PSI8 count,SAL1 count,SAL2 count,SAL3 count,SAL4 count,SAL5 count,SAL6 count,SAL7 count,SAL8 count
0,,,,Female,Female,Female,Male,Male,Male,Unknown,...,Unknown,Unknown,Female,Female,Female,Male,Male,Male,Unknown,Unknown
1,FRP,Frontal pole cerebral cortex,Cortex,9574,7781,17598,4425,7428,8302,4288,...,3367,4342,7404,4925,12521,10363,4562,14383,789,6067
2,ILA,Infralimbic area,Cortex,12138,6742,28070,1685,15612,17191,6061,...,7591,5778,9665,8049,10853,2844,15747,15412,11667,21630
3,ORBl,Orbital area lateral part,Cortex,48129,45849,120147,28655,40438,54206,39938,...,8291,14603,56825,30618,58755,14705,26686,59049,5192,36019
4,ORBm,Orbital area medial part,Cortex,17225,8551,34163,6330,14908,23250,7993,...,4878,7177,13035,16101,14017,7855,14478,30999,10051,28545
5,ORBvl,Orbital area ventrolateral part,Cortex,32690,24460,58132,16015,24182,31926,15148,...,5081,9116,37775,26349,33593,10743,21789,37644,9320,23276
6,AId,Agranular insular area dorsal part,Cortex,27675,33674,142433,31908,29145,51636,39032,...,39032,35849,45924,16549,34318,11374,23513,34317,13888,33825
7,AIp,Agranular insular area posterior part,Cortex,14988,10315,35142,14553,11422,23320,23231,...,27630,18816,26104,10171,13467,8642,24600,20895,13778,32166
8,AIv,Agranular insular area ventral part,Cortex,11743,15781,62167,14611,13968,27256,20453,...,11620,9970,26017,13822,23559,4142,17378,19937,9188,38037
9,RSPagl,Retrosplenial area lateral agranular part,Cortex,26242,9762,36066,15133,11147,17832,7208,...,20570,24263,34630,36011,37675,10113,25258,13317,14085,22788


#### Indications:

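The cells above load the first dataset. Its first three columns identify each brain region: the abbreviation, the full region name, and the brain area it belongs to. Every remaining column holds the counts for one mouse, and its name gives the drug group and the animal's index within that group (for example `5MEO1 count`). The drug groups include **MDMA**, **Ketamine**, **Fluoxetine**, ... Note that the first row is not a brain region: it records each mouse's sex (Female, Male, or Unknown).

There are **64 mice** and **50 brain regions** in total, with one value per mouse per region. Each value is the number of cells expressing c-Fos, a marker of **neuronal activity**.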
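
The column naming scheme can be used to recover the drug group and animal index for each mouse. The sketch below is a minimal illustration, assuming every mouse column follows the pattern `<group><index> count` seen in the preview. The helper `parse_mouse_column` and the short column list are hypothetical stand-ins for `dataset1.columns`.

```python
import re

# Hypothetical subset of the column names shown in the dataset preview
columns = ["abbreviation", "region name", "brain area",
           "5MEO1 count", "5MEO2 count", "6-F-DET1 count",
           "PSI7 count", "SAL1 count", "SAL8 count"]

# Assumed naming pattern: "<group><animal index> count"
MOUSE_COLUMN = re.compile(r"^(?P<group>.+?)(?P<index>\d+) count$")


def parse_mouse_column(name):
    """Return (group, animal_index) for a mouse column, or None otherwise."""
    match = MOUSE_COLUMN.match(name)
    if match is None:
        return None
    return match.group("group"), int(match.group("index"))


# Keep only the columns that look like mouse columns
mice = {c: parse_mouse_column(c) for c in columns if parse_mouse_column(c)}
groups = sorted({group for group, _ in mice.values()})
print(groups)
```

On the real data, replacing `columns` with `list(dataset1.columns)` should yield 64 parsed entries if the naming assumption holds for every mouse.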

In [27]:
print("\nInformation about the dataset:\n")

# Show column names, non-null counts, and dtypes
dataset1.info()


Information about the dataset:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51 entries, 0 to 50
Data columns (total 67 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   abbreviation    50 non-null     object
 1   region name     50 non-null     object
 2   brain area      50 non-null     object
 3   5MEO1 count     51 non-null     object
 4   5MEO2 count     51 non-null     object
 5   5MEO3 count     51 non-null     object
 6   5MEO4 count     51 non-null     object
 7   5MEO5 count     51 non-null     object
 8   5MEO6 count     51 non-null     object
 9   5MEO7 count     51 non-null     object
 10  5MEO8 count     51 non-null     object
 11  6-F-DET1 count  51 non-null     object
 12  6-F-DET2 count  51 non-null     object
 13  6-F-DET3 count  51 non-null     object
 14  6-F-DET4 count  51 non-null     object
 15  6-F-DET5 count  51 non-null     object
 16  6-F-DET6 count  51 non-null     object
 17  6-F-DET7 count  51 non-

#### Indications:

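The output confirms the structure described above: 51 rows (one row of sex labels plus 50 brain regions) and 67 columns (3 descriptive columns plus 64 mice). Every count column is stored as `object` rather than as an integer type, because its first row holds a sex label instead of a number.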
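
Since the first row mixes sex labels into the count columns, it can help to move that row into its own per-mouse table. The sketch below shows one way to do this on a hypothetical two-mouse miniature of `dataset1`, built from values visible in the preview. The names `toy`, `meta_cols`, and `mouse_info` are illustrative, not part of the notebook.

```python
import pandas as pd

# Hypothetical miniature of dataset1: row 0 holds sex labels, later rows hold counts
toy = pd.DataFrame({
    "abbreviation": [None, "FRP", "ILA"],
    "region name": [None, "Frontal pole cerebral cortex", "Infralimbic area"],
    "brain area": [None, "Cortex", "Cortex"],
    "5MEO1 count": ["Female", 9574, 12138],
    "SAL4 count": ["Male", 10363, 2844],
})

meta_cols = ["abbreviation", "region name", "brain area"]
count_cols = [c for c in toy.columns if c not in meta_cols]

# Mixed strings and integers force the count columns to dtype object
print(toy[count_cols].dtypes.to_dict())

# Row 0 holds one sex label per mouse; keep it as its own table
mouse_info = (
    toy.loc[0, count_cols]
    .rename("sex")
    .rename_axis("mouse")
    .reset_index()
)
print(mouse_info)
```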

In [28]:
print("\nDescriptive statistics of the dataset:\n")

# Summarize each column of the dataset
dataset1.describe()


Descriptive statistics of the dataset:



Unnamed: 0,abbreviation,region name,brain area,5MEO1 count,5MEO2 count,5MEO3 count,5MEO4 count,5MEO5 count,5MEO6 count,5MEO7 count,...,PSI7 count,PSI8 count,SAL1 count,SAL2 count,SAL3 count,SAL4 count,SAL5 count,SAL6 count,SAL7 count,SAL8 count
count,50,50,50,51,51,51,51,51,51,51,...,51,51,51,51,51,51,51,51,51,51
unique,50,50,1,51,51,51,51,51,51,51,...,50,51,51,51,51,51,51,51,51,51
top,FRP,Frontal pole cerebral cortex,Cortex,Female,Female,Female,Male,Male,Male,Unknown,...,7447,Unknown,Female,Female,Female,Male,Male,Male,Unknown,Unknown
freq,1,1,50,1,1,1,1,1,1,1,...,2,1,1,1,1,1,1,1,1,1


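#### Indications:

Because every count column has dtype `object`, `describe()` reports only `count`, `unique`, `top`, and `freq` instead of means and quantiles. The `brain area` column has a single unique value, `Cortex`, so all 50 regions in this dataset are cortical.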
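
To get numeric summaries, the sex row has to be dropped and the counts cast to numbers first. The sketch below does this on a hypothetical miniature of `dataset1` that reuses three regions and two mice from the preview. The name `toy` is illustrative, and the same steps are assumed to carry over to the full table.

```python
import pandas as pd

# Hypothetical miniature of dataset1: row 0 holds sex labels, later rows hold counts
toy = pd.DataFrame({
    "abbreviation": [None, "FRP", "ILA", "ORBl"],
    "5MEO1 count": ["Female", 9574, 12138, 48129],
    "SAL4 count": ["Male", 10363, 2844, 14705],
})

# Drop the sex row, index by region, and convert every count column to a number
counts = toy.iloc[1:].set_index("abbreviation").apply(pd.to_numeric)

# describe() now returns mean, std, and quantiles per mouse
summary = counts.describe()
print(summary)
```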

### Poisson model
---



In [None]:
with pm.Model() as poisson_model:

    # Priors for the Poisson model; each variable is named after its Python identifier
    theta_rg = pm.Normal("theta_rg", mu=5, sigma=2)  # location parameter
    tau_rg = pm.HalfNormal("tau_rg", sigma=math.log(1.05))  # scale parameter (positive)





### Horseshoe model
---

In [None]:
with pm.Model() as horseshoe_model:

    # Priors for the Horseshoe model; names must be unique within a PyMC model
    theta_rg = pm.Normal("theta_rg", mu=5, sigma=2)  # location parameter
    tau_rg = pm.HalfNormal("tau_rg", sigma=math.log(1.05))  # global scale
    k_i = pm.HalfNormal("k_i", sigma=1)  # local scale

### Zero-inflated model
---

In [None]:
with pm.Model() as zero_inflated_model:

    # Priors for the Zero-inflated model; each variable is named after its Python identifier
    theta_rg = pm.Normal("theta_rg", mu=5, sigma=2)  # location parameter
    tau_rg = pm.HalfNormal("tau_rg", sigma=math.log(1.05))  # scale parameter (positive)

