# Bayesian Data Analysis for Neuroscience Data

## Introduction

This notebook is part of a 20-week internship project carried out at Ulster University.
The main goal of the project is to make advanced Bayesian statistical models more accessible
to experimental neuroscientists through user-friendly code, tutorials, and examples.

Specifically, this notebook applies existing Bayesian models to neuroscience datasets
using Python libraries.

## Objectives

1. Apply the existing Bayesian models to a neuroscience dataset from scratch,
   documenting each step so that a beginner can follow it.

2. Design a simple and reproducible analysis pipeline using PyMC.

3. Produce clear, well-documented code that can later be integrated into
   an interactive tutorial or a web application.

## Tools and technologies

- Python (main programming language)
- PyMC (Bayesian modeling)
- NumPy, pandas, matplotlib (data manipulation and visualization)
- Jupyter Notebook (interactive documentation and prototyping)

## Author

- Mathis DA SILVA
- Ulster University Internship (July–December 2025)
- Supervisors: Dr. Cian O'Donnell & Dr. Conor Houghton

## References

- [Dataset from "Classification of psychedelics and psychoactive drugs based on brain-wide imaging of cellular c-Fos expression"](https://www.nature.com/articles/s41467-025-56850-6#Sec25)
- [Hierarchical Bayesian modeling of multi-region brain cell count data](https://elifesciences.org/reviewed-preprints/102391v1)
- [Statistical Rethinking 2023 PDF](https://civil.colorado.edu/~balajir/CVEN6833/bayes-resources/RM-StatRethink-Bayes.pdf)
- [Statistical Rethinking 2023 Videos](https://www.youtube.com/watch?v=FdnMWdICdRs&list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus)


### Import the libraries used in this notebook

In [2]:
import pandas as pd
import pymc as pm
import matplotlib.pyplot as plt
import math
import numpy as np
import seaborn as sns
import statistics as stats

In [3]:
brain_regions = {
    "Cortex": ["FRP", "ILA", "ORBl", "ORBm", "ORBvl", "AId", "AIp", "AIv", "RSPagl", "RSPd", "RSPv", "PL", "VISa", "TEa", "PERI", "ECT", "PA", "BMA", "BLA", "LA", "EPv", "EPd", "CLA", "VISrl", "ACAv", "ACAd", "VISpor", "MOp", "MOs", "SSp-n", "SSp-bfd", "SSp-ll", "SSp-m", "SSp-ul", "SSp-tr", "SSp-un", "SSs", "GU", "VISC", "AUDd", "AUDp", "AUDpo", "AUDv", "VISal", "VISam", "VISl", "VISp", "VISpl", "VISpm", "VISli"],

    "Olfactory": ["TR", "PAA", "MOB", "COAa", "AOB", "AON", "TT", "DP", "PIR", "COAp", "NLOT"],

    "Hippo": ["ENTm", "CA1", "ENTl", "CA3", "DG", "FC", "IG", "CA2", "HATA", "ProS", "SUB", "PRE", "POST", "PAR", "APr"],

    "StriatumPallidum": ["GPe", "GPi", "SI", "MA", "BST", "NDB", "TRS", "BAC", "MEA", "MS", "IA", "LSv", "BA", "CEA", "CP", "FS", "OT", "LSc", "ACB", "SF", "SH", "AAA", "LSr"],

    "Thalamus": ["MD", "CM", "SMT", "PR", "PT", "RE", "Xi", "RH", "PCN", "IGL", "PF", "PIL", "RT", "IMD", "IntG", "LGv", "SubG", "MH", "LH", "CL", "LD", "PVT", "IAM", "IAD", "VAL", "VM", "VPL", "VPLpc", "VPM", "PoT", "SPFm", "SPFp", "SPA", "VPMpc", "MG", "PP", "AM", "AV", "SGN", "AD", "PO", "LP", "LGd", "POL"],

    "Hypothalamus": ["VMH", "MM", "SUM", "TMd", "TMv", "MPN", "PMd", "PMv", "PVHd", "PH", "LM", "LPO", "PST", "PSTN", "PeF", "RCH", "STN", "TU", "ZI", "ME", "LHA", "AHN", "PS", "VMPO", "VLPO", "SO", "ASO", "PVH", "PVa", "PVi", "ADP", "AVP", "AVPV", "DMH", "ARH", "MPO", "OV", "PD", "PVp", "PVpo", "SBPV", "SCH", "MEPO", "SFO"],

    "MidHindMedulla": ["SFO", "NTB", "NTS", "SPVC", "SPVI", "VII", "Pa5", "VI", "ACVII", "ECU", "SPVO", "GR", "LDT", "VCO", "DCO", "AP", "SLD", "SLC", "RPO", "PRNr", "NI", "AMB", "LC", "CS", "I5", "CU", "DMX", "MDRNv", "ICB", "RO", "RPA", "RM", "y", "XII", "x", "SUV", "SPIV", "MV", "LAV", "PPY", "PRP", "NR", "PGRNl", "PGRNd", "PAS", "PARN", "MDRNd", "MDRN", "MARN", "LRN", "LIN", "ISN", "IRN", "IO", "GRN", "PC5", "P5", "V", "MA3", "III", "RN", "CUN", "PPT", "OP", "Acs5", "NOT", "MPT", "APN", "PAG", "EW", "SCm", "RR", "PN", "VTA", "SNr", "SCO", "MEV", "PBG", "SAG", "NB", "IC", "SCs", "MRN", "IV", "NPC", "VTN", "TRN", "SUT", "SG", "Pa4", "PG", "PCG", "PDTg", "DTN", "B", "SOC", "PB", "PSV", "PRNc", "DR", "NLL", "AT", "DT", "MT", "SNc", "LT", "IF", "IPN", "RL", "CLI", "PPN"],

    "Cerebellum": ["AN", "IP", "FN", "FL", "PFL", "COPY", "PRM", "SIM", "CUL", "UVU", "PYR", "FOTU", "DEC", "CENT", "LING", "DN", "NOD", "VeCB"]
}

In [4]:
dataset1 = pd.read_excel('data/dataset_neuroscience_1.xlsx')

In [5]:
print("Dataset overview:\n")

# Preview dataset 1
dataset1

Dataset overview:



Unnamed: 0,abbreviation,5MEO1 count,5MEO2 count,5MEO3 count,5MEO4 count,5MEO5 count,5MEO6 count,5MEO7 count,5MEO8 count,6-F-DET1 count,...,PSI7 count,PSI8 count,SAL1 count,SAL2 count,SAL3 count,SAL4 count,SAL5 count,SAL6 count,SAL7 count,SAL8 count
0,FRP,9574,7781,17598,4425,7428,8302,4288,5278,2527,...,3367,4342,7404,4925,12521,10363,4562,14383,789,6067
1,ILA,12138,6742,28070,1685,15612,17191,6061,7449,5439,...,7591,5778,9665,8049,10853,2844,15747,15412,11667,21630
2,ORBl,48129,45849,120147,28655,40438,54206,39938,24600,17575,...,8291,14603,56825,30618,58755,14705,26686,59049,5192,36019
3,ORBm,17225,8551,34163,6330,14908,23250,7993,10001,13641,...,4878,7177,13035,16101,14017,7855,14478,30999,10051,28545
4,ORBvl,32690,24460,58132,16015,24182,31926,15148,14591,15846,...,5081,9116,37775,26349,33593,10743,21789,37644,9320,23276
5,AId,27675,33674,142433,31908,29145,51636,39032,31133,20036,...,39032,35849,45924,16549,34318,11374,23513,34317,13888,33825
6,AIp,14988,10315,35142,14553,11422,23320,23231,21736,12012,...,27630,18816,26104,10171,13467,8642,24600,20895,13778,32166
7,AIv,11743,15781,62167,14611,13968,27256,20453,13995,10240,...,11620,9970,26017,13822,23559,4142,17378,19937,9188,38037
8,RSPagl,26242,9762,36066,15133,11147,17832,7208,21735,18139,...,20570,24263,34630,36011,37675,10113,25258,13317,14085,22788
9,RSPd,22295,9280,33625,11297,8479,18555,11241,17299,29553,...,14432,24744,35197,49432,47584,12537,24791,9895,11695,19132


#### Indications:

The cell above loads the first dataset. Its `abbreviation` column identifies the brain region, and every other column holds the cell counts for one mouse. Each of these column names combines a drug-group code and a mouse number, for example `5MEO1 count` or `SAL8 count`. The groups correspond to drugs such as **MDMA**, **Ketamine**, **Fluoxetine**, ...

There are **64 mice** in total, and each mouse has one value per brain region. The values are the number of cells expressing c-Fos, a marker of **neuronal activity**. The dataset covers **50 brain regions**.
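
As a minimal sketch of how the per-mouse columns could be grouped by drug, the snippet below splits a column name such as `5MEO1 count` into a group code and a mouse number. The helper name `parse_mouse_column` is hypothetical, and the `<group><number> count` pattern is an assumption inferred only from the column names visible in the preview.

```python
import re

# Assumed pattern, inferred from names like "5MEO1 count" and "6-F-DET2 count":
# a group code, a trailing mouse number, then the literal suffix " count".
_COLUMN_PATTERN = re.compile(r"^(?P<group>.+?)(?P<mouse>\d+) count$")


def parse_mouse_column(column_name):
    """Split a count column name into (group code, mouse number)."""
    match = _COLUMN_PATTERN.match(column_name)
    if match is None:
        raise ValueError(f"Unexpected column name: {column_name!r}")
    return match.group("group"), int(match.group("mouse"))


columns = ["5MEO1 count", "6-F-DET2 count", "A-SSRI1 count", "SAL8 count"]
parsed = [parse_mouse_column(name) for name in columns]
print(parsed)
```

A mapping like this is one way to build the group index that a hierarchical model needs. Columns that do not follow the assumed pattern raise an error instead of being silently skipped.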

In [6]:
print("\nInformation about the dataset:\n")

# Overview the dataset information
dataset1.info()


Information about the dataset:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51 entries, 0 to 50
Data columns (total 65 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   abbreviation    51 non-null     object
 1   5MEO1 count     51 non-null     object
 2   5MEO2 count     51 non-null     object
 3   5MEO3 count     51 non-null     object
 4   5MEO4 count     51 non-null     object
 5   5MEO5 count     51 non-null     object
 6   5MEO6 count     51 non-null     object
 7   5MEO7 count     51 non-null     object
 8   5MEO8 count     51 non-null     object
 9   6-F-DET1 count  51 non-null     object
 10  6-F-DET2 count  51 non-null     object
 11  6-F-DET3 count  51 non-null     object
 12  6-F-DET4 count  51 non-null     object
 13  6-F-DET5 count  51 non-null     object
 14  6-F-DET6 count  51 non-null     object
 15  6-F-DET7 count  51 non-null     object
 16  6-F-DET8 count  51 non-null     object
 17  A-SSRI1 count   51 non-

#### Indications:

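`dataset1.info()` reports 51 rows and 65 columns: the `abbreviation` column plus 64 count columns, one per mouse. The columns shown have no missing values, but every count column is stored with dtype `object` rather than an integer type, which suggests that at least one cell holds non-numeric content. The counts therefore need to be converted to numbers before modelling. The 51 rows are also one more than the 50 brain regions mentioned above, which is worth checking at the same time.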
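
One possible way to inspect and convert `object` columns is sketched below on a small toy table. The values in `toy` are hypothetical and are not taken from the real file. The sketch also assumes that all count columns end in ` count`, as in the preview.

```python
import pandas as pd

# Toy stand-in for dataset1: the values are hypothetical.
toy = pd.DataFrame(
    {
        "abbreviation": ["FRP", "ILA", "ORBl"],
        "5MEO1 count": ["9574", "12138", "n/a"],
        "SAL1 count": [7404, 9665, 56825],
    }
).astype({"SAL1 count": object})

count_columns = [name for name in toy.columns if name.endswith(" count")]

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising.
numeric = toy[count_columns].apply(pd.to_numeric, errors="coerce")

# Rows containing NaN reveal where the non-numeric entries were.
bad_rows = toy.loc[numeric.isna().any(axis=1), "abbreviation"].tolist()
print(numeric.dtypes)
print(bad_rows)
```

Listing the affected rows before dropping or fixing them keeps the cleaning step visible, which matters for a reproducible pipeline.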

In [7]:
print("\nDescriptive statistics of the dataset:\n")

# Overview the descriptive statistics of the dataset
dataset1.describe()


Descriptive statistics of the dataset:



Unnamed: 0,abbreviation,5MEO1 count,5MEO2 count,5MEO3 count,5MEO4 count,5MEO5 count,5MEO6 count,5MEO7 count,5MEO8 count,6-F-DET1 count,...,PSI7 count,PSI8 count,SAL1 count,SAL2 count,SAL3 count,SAL4 count,SAL5 count,SAL6 count,SAL7 count,SAL8 count
count,51,51,51,51,51,51,51,51,51,51,...,51,51,51,51,51,51,51,51,51,51
unique,51,51,51,51,51,51,51,51,51,51,...,50,51,51,51,51,51,51,51,51,51
top,FRP,9574,7781,17598,4425,7428,8302,4288,5278,2527,...,7447,4342,7404,4925,12521,10363,4562,14383,789,6067
freq,1,1,1,1,1,1,1,1,1,1,...,2,1,1,1,1,1,1,1,1,1


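#### Indications:

Because the count columns are stored as `object`, `describe()` returns categorical summaries (`count`, `unique`, `top`, `freq`) instead of numeric ones such as the mean and standard deviation.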

### Poisson model
---
**Description:** The Poisson model is a statistical model for count data. It assumes that the number of events occurring in a fixed interval of time or space follows a Poisson distribution, whose single rate parameter λ is both its mean and its variance. Here, the events are the c-Fos-expressing cells counted in a brain region.
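
To make the "mean equals variance" property concrete, here is a small sketch that evaluates the Poisson probability mass function with the standard library only. The function name `poisson_pmf` and the rate of 4 are illustrative choices, not values from the dataset.

```python
import math


def poisson_pmf(k, rate):
    """Probability of observing exactly k events when the expected count is `rate`."""
    # Computed in log space so that large k does not overflow the factorial.
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


rate = 4.0  # illustrative value only
support = range(0, 60)  # the tail beyond 60 is negligible for rate = 4
probabilities = [poisson_pmf(k, rate) for k in support]

mean = sum(k * p for k, p in zip(support, probabilities))
variance = sum((k - mean) ** 2 * p for k, p in zip(support, probabilities))
print(round(mean, 6), round(variance, 6))
```

Both printed numbers should agree with the rate. Real count data often has a variance larger than its mean, which is one reason to add hierarchical terms on top of the plain Poisson likelihood.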


In [None]:
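with pm.Model() as poisson_model:

    # Parameters for the Poisson model (each variable needs a unique name)
    poisson_theta_rg = pm.Normal("theta_rg", mu=5, sigma=2)

    poisson_tau_rg = pm.HalfNormal("tau_rg", sigma=math.log(1.05))

    # ASSUMPTION: E_i is not defined yet; 0.0 is a placeholder so that the cell runs
    poisson_E_i = 0.0

    # ASSUMPTION: gamma_i is not defined yet; here it is centred on theta_rg with spread tau_rg
    poisson_gamma_i = pm.Normal("gamma_i", mu=poisson_theta_rg, sigma=poisson_tau_rg)

    # pm.math.exp works on PyMC variables; math.exp only accepts plain numbers
    poisson_lambda_i = pm.Deterministic("lambda_i", pm.math.exp(poisson_E_i + poisson_gamma_i))

    # Without observed=..., this defines only the prior over counts
    poisson_y_i = pm.Poisson("y_i", mu=poisson_lambda_i)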

### Horseshoe model
---
**Description:** The Horseshoe model is a Bayesian hierarchical model that is particularly useful for high-dimensional data with many predictors. It is designed to handle situations where most predictors have little effect, while a few have large effects. The Horseshoe prior allows for sparsity in the model, making it effective for variable selection.
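
The shrinkage behaviour described above can be illustrated by drawing from a textbook horseshoe prior with NumPy. This is a sketch under stated assumptions: it uses half-Cauchy local scales (the usual horseshoe definition, which may differ from the variant coded in this notebook), and the global scale of 0.1 is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 10_000
global_scale = 0.1  # illustrative value only

# Local scales: the absolute value of a standard Cauchy draw is a half-Cauchy draw.
local_scales = np.abs(rng.standard_cauchy(n_draws))

# Horseshoe draws: standard normal noise scaled by global_scale * local_scales.
horseshoe_draws = rng.normal(0.0, 1.0, n_draws) * global_scale * local_scales

# Reference: a plain normal prior with the same global scale.
normal_draws = rng.normal(0.0, global_scale, n_draws)

for label, draws in [("horseshoe", horseshoe_draws), ("normal", normal_draws)]:
    near_zero = np.mean(np.abs(draws) < 0.01)  # share of draws very close to zero
    largest = np.max(np.abs(draws))  # size of the most extreme draw
    print(label, "share near zero:", near_zero, "largest |draw|:", largest)
```

Compared with the normal prior, the horseshoe puts more draws very close to zero while still allowing occasional very large values. This is the combination that lets most effects shrink while a few remain large.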

In [None]:
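with pm.Model() as horseshoe_model:

    # Parameters for the Horseshoe model (each variable needs a unique name)

    horseshoe_theta_rg = pm.Normal("theta_rg", mu=5, sigma=2)

    horseshoe_tau_rg = pm.HalfNormal("tau_rg", sigma=math.log(1.05))

    # Local scale that lets individual observations escape the global shrinkage
    kappa_i = pm.HalfNormal("kappa_i", sigma=1)

    # ASSUMPTION: E_i is not defined yet; 0.0 is a placeholder so that the cell runs
    horseshoe_E_i = 0.0

    # ASSUMPTION: gamma_i is not defined yet; here its spread is tau_rg scaled by kappa_i
    horseshoe_gamma_i = pm.Normal("gamma_i", mu=horseshoe_theta_rg, sigma=horseshoe_tau_rg * kappa_i)

    # pm.math.exp works on PyMC variables; math.exp only accepts plain numbers
    horseshoe_lambda_i = pm.Deterministic("lambda_i", pm.math.exp(horseshoe_E_i + horseshoe_gamma_i))

    # Without observed=..., this defines only the prior over counts
    horseshoe_y_i = pm.Poisson("y_i", mu=horseshoe_lambda_i)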

### Zero-inflated model
---
**Description:** The Zero-inflated model is a statistical model used to handle count data that has an excess of zero counts. It combines a standard count model (like Poisson or Negative Binomial) with a separate process that generates excess zeros. This model is useful when the data has more zeros than what would be expected from the count distribution alone.
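
The mixture described above can be written out directly. The sketch below computes the zero-inflated Poisson probabilities with the standard library only. The function name `zero_inflated_poisson_pmf` and the parameter values are illustrative assumptions.

```python
import math


def zero_inflated_poisson_pmf(k, rate, excess_zero_prob):
    """P(count = k) when a structural zero occurs with probability `excess_zero_prob`."""
    poisson_part = math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))
    if k == 0:
        # A zero can come from the extra-zero process or from the Poisson itself.
        return excess_zero_prob + (1.0 - excess_zero_prob) * poisson_part
    return (1.0 - excess_zero_prob) * poisson_part


rate, excess_zero_prob = 3.0, 0.2  # illustrative values only
plain_zero = math.exp(-rate)
inflated_zero = zero_inflated_poisson_pmf(0, rate, excess_zero_prob)
total = sum(zero_inflated_poisson_pmf(k, rate, excess_zero_prob) for k in range(60))
print(plain_zero, inflated_zero, total)
```

The probability of a zero is higher than under the plain Poisson, and the probabilities still sum to one. Note that PyMC's `ZeroInflatedPoisson` is parameterised by `psi`, the expected proportion of Poisson draws, which corresponds to `1 - excess_zero_prob` in this sketch.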

In [None]:
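with pm.Model() as zero_inflated_model:

    # Parameters for the Zero-inflated model (each variable needs a unique name)
    zero_inflated_theta_rg = pm.Normal("theta_rg", mu=5, sigma=2)

    zero_inflated_tau_rg = pm.HalfNormal("tau_rg", sigma=math.log(1.05))

    # Beta(1, 5) puts most of its mass near 0
    pi = pm.Beta("pi", alpha=1, beta=5)

    # ASSUMPTION: E_i is not defined yet; 0.0 is a placeholder so that the cell runs
    zero_inflated_E_i = 0.0

    # ASSUMPTION: gamma_i is not defined yet; here it is centred on theta_rg with spread tau_rg
    zero_inflated_gamma_i = pm.Normal("gamma_i", mu=zero_inflated_theta_rg, sigma=zero_inflated_tau_rg)

    # pm.math.exp works on PyMC variables; math.exp only accepts plain numbers
    zero_inflated_lambda_i = pm.Deterministic("lambda_i", pm.math.exp(zero_inflated_E_i + zero_inflated_gamma_i))

    # ASSUMPTION: pi is the probability of an excess zero. PyMC's psi is the
    # proportion of Poisson draws, so 1 - pi is passed; use psi=pi if pi already means that.
    zero_inflated_y_i = pm.ZeroInflatedPoisson("y_i", mu=zero_inflated_lambda_i, psi=1 - pi)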

