# Bayesian Data Analysis for Neuroscience Data

## Introduction

This notebook is part of a 20-week internship project carried out at Ulster University.
The main goal of the project is to make advanced Bayesian statistical models more accessible
to experimental neuroscientists through user-friendly code, tutorials, and examples.

Specifically, this notebook applies existing Bayesian models to neuroscience datasets
using Python libraries.

## Objectives

1. Apply existing Bayesian models to a neuroscience dataset from scratch,
   documenting each step for a beginner.

2. Design a simple and reproducible analysis pipeline using PyMC.

3. Produce clear, well-documented code that can later be integrated into
   an interactive tutorial or a web application.

## Tools and technologies

- Python (main programming language)
- PyMC (Bayesian modeling)
- NumPy, pandas, matplotlib, seaborn (data manipulation and visualization)
- Jupyter Notebook (interactive documentation and prototyping)

## Author

- Mathis DA SILVA
- Ulster University Internship (July–December 2025)
- Supervisors: Dr. Cian O'Donnell & Dr. Conor Houghton

## References

- [Dataset from "Classification of psychedelics and psychoactive drugs based on brain-wide imaging of cellular c-Fos expression"](https://www.nature.com/articles/s41467-025-56850-6#Sec25)
- [Hierarchical Bayesian modeling of multi-region brain cell count data](https://elifesciences.org/reviewed-preprints/102391v1)
- [Statistical Rethinking 2023 PDF](https://civil.colorado.edu/~balajir/CVEN6833/bayes-resources/RM-StatRethink-Bayes.pdf)
- [Statistical Rethinking 2023 Videos](https://www.youtube.com/watch?v=FdnMWdICdRs&list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus)


### Import the libraries used in this notebook

In [2]:
import pandas as pd
import pymc as pm
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

In [3]:
dataset1 = pd.read_excel('data/dataset_neuroscience1.xlsx')

In [7]:
print("Dataset overview:\n")

# Preview the first 10 rows of dataset 1
dataset1.head(10)

Dataset overview:



| | abbreviation | region name | brain area | 5MEO1 count | 5MEO2 count | 5MEO3 count | 5MEO4 count | 6-F-DET1 count | 6-F-DET2 count | 6-F-DET3 count | ... | MDMA3 count | MDMA4 count | PSI1 count | PSI2 count | PSI3 count | PSI4 count | SAL1 count | SAL2 count | SAL3 count | SAL4 count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FRP | Frontal pole cerebral cortex | Cortex | 9574 | 7781 | 17598 | 4425 | 2527 | 3366 | 2431 | ... | 13021 | 14608 | 2980 | 9268 | 1655 | 6935 | 7404 | 4925 | 12521 | 10363 |
| 1 | ILA | Infralimbic area | Cortex | 12138 | 6742 | 28070 | 1685 | 5439 | 8905 | 3175 | ... | 14024 | 15188 | 7100 | 4751 | 8543 | 10770 | 9665 | 8049 | 10853 | 2844 |
| 2 | ORBl | Orbital area lateral part | Cortex | 48129 | 45849 | 120147 | 28655 | 17575 | 23939 | 13940 | ... | 67971 | 77775 | 17561 | 35397 | 18642 | 43822 | 56825 | 30618 | 58755 | 14705 |
| 3 | ORBm | Orbital area medial part | Cortex | 17225 | 8551 | 34163 | 6330 | 13641 | 10399 | 6992 | ... | 24049 | 28737 | 7340 | 13098 | 7450 | 13737 | 13035 | 16101 | 14017 | 7855 |
| 4 | ORBvl | Orbital area ventrolateral part | Cortex | 32690 | 24460 | 58132 | 16015 | 15846 | 13711 | 9594 | ... | 49838 | 52966 | 14416 | 19928 | 11629 | 24961 | 37775 | 26349 | 33593 | 10743 |
| 5 | AId | Agranular insular area dorsal part | Cortex | 27675 | 33674 | 142433 | 31908 | 20036 | 29356 | 20673 | ... | 54151 | 56931 | 13135 | 32009 | 20910 | 40175 | 45924 | 16549 | 34318 | 11374 |
| 6 | AIp | Agranular insular area posterior part | Cortex | 14988 | 10315 | 35142 | 14553 | 12012 | 9296 | 18478 | ... | 22751 | 23930 | 6332 | 13668 | 13472 | 23588 | 26104 | 10171 | 13467 | 8642 |
| 7 | AIv | Agranular insular area ventral part | Cortex | 11743 | 15781 | 62167 | 14611 | 10240 | 13773 | 9428 | ... | 22962 | 24483 | 5457 | 13699 | 11405 | 19692 | 26017 | 13822 | 23559 | 4142 |
| 8 | RSPagl | Retrosplenial area lateral agranular part | Cortex | 26242 | 9762 | 36066 | 15133 | 18139 | 4558 | 14428 | ... | 41924 | 56186 | 12104 | 31328 | 8918 | 28492 | 34630 | 36011 | 37675 | 10113 |
| 9 | RSPd | Retrosplenial area dorsal part | Cortex | 22295 | 9280 | 33625 | 11297 | 29553 | 6165 | 15245 | ... | 59648 | 74917 | 12719 | 32300 | 8791 | 32183 | 35197 | 49432 | 47584 | 12537 |


#### Indications:

The cells above load and preview the first dataset. Its first three columns identify each brain region (abbreviation, full name, and brain area). The remaining columns hold the counts for individual mice, grouped by drug, such as **MDMA**, **Ketamine**, and **Fluoxetine**.

This table contains **32 mice** (35 columns minus the three descriptive ones), and each mouse has one value per brain region. Each value is the number of cells expressing c-Fos, a marker of **neuronal activity**. The dataset covers **50 brain regions**.
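
The column layout described above (three descriptive columns, then one count column per mouse, named like `5MEO1 count`) can be reshaped into a long table with one row per region and mouse, which is often a more convenient shape for modeling. The following is a minimal sketch, assuming the column names follow the `<drug><replicate> count` pattern seen in the preview. The `toy` frame and the names `long_df`, `mouse`, `drug`, and `replicate` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy frame with the same layout as dataset1 (values copied from the preview above).
toy = pd.DataFrame({
    "abbreviation": ["FRP", "ILA"],
    "region name": ["Frontal pole cerebral cortex", "Infralimbic area"],
    "brain area": ["Cortex", "Cortex"],
    "5MEO1 count": [9574, 12138],
    "5MEO2 count": [7781, 6742],
    "6-F-DET1 count": [2527, 5439],
    "SAL1 count": [7404, 9665],
})

id_cols = ["abbreviation", "region name", "brain area"]

# Wide -> long: one row per (region, mouse) pair.
long_df = toy.melt(id_vars=id_cols, var_name="mouse", value_name="count")

# Drop the " count" suffix, then split e.g. "6-F-DET1" into drug "6-F-DET" and replicate 1.
long_df["mouse"] = long_df["mouse"].str.replace(" count", "", regex=False)
parts = long_df["mouse"].str.extract(r"^(?P<drug>.+?)(?P<replicate>\d+)$")
long_df["drug"] = parts["drug"]
long_df["replicate"] = parts["replicate"].astype(int)

# Number of distinct mice per drug group (assumed: trailing digits identify the mouse).
mice_per_drug = long_df.groupby("drug")["mouse"].nunique()
print(mice_per_drug)
```

Applied to the full `dataset1`, the same steps would give one row per region and mouse, and `mice_per_drug` offers a quick way to confirm how many mice each drug group contains.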

In [8]:
print("\nInformation about the dataset:\n")

# Show column names, non-null counts, and dtypes
dataset1.info()


Information about the dataset:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 35 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   abbreviation    50 non-null     object
 1   region name     50 non-null     object
 2   brain area      50 non-null     object
 3   5MEO1 count     50 non-null     int64 
 4   5MEO2 count     50 non-null     int64 
 5   5MEO3 count     50 non-null     int64 
 6   5MEO4 count     50 non-null     int64 
 7   6-F-DET1 count  50 non-null     int64 
 8   6-F-DET2 count  50 non-null     int64 
 9   6-F-DET3 count  50 non-null     int64 
 10  6-F-DET4 count  50 non-null     int64 
 11  A-SSRI1 count   50 non-null     int64 
 12  A-SSRI2 count   50 non-null     int64 
 13  A-SSRI3 count   50 non-null     int64 
 14  A-SSRI4 count   50 non-null     int64 
 15  C-SSRI1 count   50 non-null     int64 
 16  C-SSRI2 count   50 non-null     int64 
 17  C-SSRI3 count   50 non-

#### Indications:
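
The `info()` summary above reports 50 non-null entries and an `int64` dtype for every count column it lists. In a reproducible pipeline, the same checks can be written as assertions so that a changed input file fails early. The following is a minimal sketch on a toy frame; the helper name `check_counts` is hypothetical, and the column layout is assumed to match `dataset1`.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def check_counts(df, id_cols):
    """Raise AssertionError unless all non-id columns hold clean integer counts."""
    count_cols = [c for c in df.columns if c not in id_cols]
    assert count_cols, "no count columns found"
    assert not df[count_cols].isna().any().any(), "missing values in counts"
    assert all(is_integer_dtype(df[c]) for c in count_cols), "non-integer counts"
    assert (df[count_cols] >= 0).all().all(), "negative counts"
    return count_cols


# Toy frame with the same layout as dataset1 (values copied from the preview).
toy = pd.DataFrame({
    "abbreviation": ["FRP", "ILA"],
    "region name": ["Frontal pole cerebral cortex", "Infralimbic area"],
    "brain area": ["Cortex", "Cortex"],
    "5MEO1 count": [9574, 12138],
    "SAL1 count": [7404, 9665],
})
id_cols = ["abbreviation", "region name", "brain area"]

count_cols = check_counts(toy, id_cols)
print(count_cols)
```

Returning the list of count columns keeps the later steps independent of the exact number of mice in the file.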

In [9]:
print("\nDescriptive statistics of the dataset:\n")

# Summarize each count column (count, mean, std, min, quartiles, max)
dataset1.describe()


Descriptive statistics of the dataset:



| | 5MEO1 count | 5MEO2 count | 5MEO3 count | 5MEO4 count | 6-F-DET1 count | 6-F-DET2 count | 6-F-DET3 count | 6-F-DET4 count | A-SSRI1 count | A-SSRI2 count | ... | MDMA3 count | MDMA4 count | PSI1 count | PSI2 count | PSI3 count | PSI4 count | SAL1 count | SAL2 count | SAL3 count | SAL4 count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | ... | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 |
| mean | 32260.58 | 27245.78 | 64637.18 | 26835.12 | 16542.32 | 10077.08 | 20575.66 | 31563.96 | 14331.62 | 18656.5 | ... | 41281.8 | 54866.98 | 13252.94 | 28909.04 | 16101.54 | 37063.56 | 36387.28 | 29368.14 | 28735.36 | 9291.46 |
| std | 32854.375923 | 30143.129293 | 67481.077154 | 30384.234143 | 13950.05419 | 10488.117099 | 22014.720376 | 27589.03704 | 16071.453512 | 14135.442373 | ... | 38672.784241 | 54588.476813 | 11396.828836 | 33232.622056 | 16442.003936 | 37167.635632 | 34715.000889 | 32155.479261 | 27972.883889 | 9041.723956 |
| min | 3984.0 | 4272.0 | 9322.0 | 1685.0 | 1594.0 | 1057.0 | 2431.0 | 6703.0 | 2480.0 | 2320.0 | ... | 5521.0 | 8522.0 | 2205.0 | 3492.0 | 1168.0 | 5530.0 | 4495.0 | 1907.0 | 3526.0 | 488.0 |
| 25% | 12285.0 | 7973.5 | 27207.5 | 8896.0 | 6408.25 | 3099.75 | 7410.0 | 11581.5 | 6400.25 | 9626.25 | ... | 17461.25 | 23140.75 | 6289.25 | 10230.25 | 5457.5 | 13330.5 | 14242.0 | 10325.0 | 12386.75 | 3638.5 |
| 50% | 21076.5 | 15332.0 | 34652.5 | 15574.0 | 12756.0 | 7272.0 | 14223.0 | 24508.0 | 9939.5 | 13570.0 | ... | 25282.0 | 36165.0 | 8898.5 | 19335.5 | 11517.0 | 24256.0 | 26060.5 | 17203.5 | 21787.5 | 7089.5 |
| 75% | 35592.5 | 32881.5 | 62554.75 | 28441.75 | 21023.0 | 12266.0 | 23630.25 | 42983.5 | 17394.0 | 25214.0 | ... | 50489.75 | 63797.5 | 17024.75 | 31774.25 | 18202.75 | 41914.75 | 41507.0 | 33413.5 | 32549.5 | 11380.0 |
| max | 141503.0 | 135730.0 | 271662.0 | 141871.0 | 61364.0 | 50688.0 | 111361.0 | 137109.0 | 109963.0 | 76965.0 | ... | 175480.0 | 230262.0 | 50761.0 | 194270.0 | 70699.0 | 167455.0 | 203464.0 | 190149.0 | 158173.0 | 46415.0 |


#### Indications:
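
In the summary above, the mean exceeds the median (the 50% row) in every column shown, and each maximum lies far above the 75% quantile. Both patterns suggest right-skewed counts. The following is a minimal sketch of how to quantify this and how the picture changes on a log scale. The `toy` frame reuses five values per mouse from the preview, and the `summary` column names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy counts for two mice across five regions (values copied from the preview).
toy = pd.DataFrame({
    "5MEO1 count": [9574, 12138, 48129, 17225, 32690],
    "SAL1 count": [7404, 9665, 56825, 13035, 37775],
})

summary = pd.DataFrame({
    "mean": toy.mean(),
    "median": toy.median(),
})

# A ratio above 1 means a few large regions pull the mean above the median.
summary["mean_over_median"] = summary["mean"] / summary["median"]

# These counts are strictly positive, so a plain log is defined.
# np.log1p would be a safer choice if zero counts can occur.
log_counts = np.log(toy)
summary["log_mean"] = log_counts.mean()
summary["log_median"] = log_counts.median()

print(summary.round(3))
```

On the log scale, the mean and median sit much closer together, which is one reason count data of this kind are often modeled on a log scale.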