# Tutorial 04: Indirect Connectivity and Influence

**Author:** Alexander Bates  
**Date:** 2025-12-15

## Introduction

This tutorial introduces the **influence metric**, a measure of indirect connectivity developed and used in the [BANC paper](https://www.biorxiv.org/content/10.1101/2024.12.28.630584v1) (Bates et al., 2025).

### What is Influence?

While direct synaptic connections tell us which neurons are connected, they don't capture the full picture of how signals propagate through neural circuits. The influence metric quantifies how strongly a neuron or group of neurons can affect downstream targets through both direct and indirect pathways.

### How It Works

The InfluenceCalculator package implements a linear dynamical model of neural signal propagation:

**Model equation:** τ dr(t)/dt = (W - I)r(t) + s(t)

Where:
- **r(t)** = neural activity vector
- **W** = connectivity matrix (scaled by synapse counts)
- **s(t)** = stimulation input to seed neurons
- **τ** = time constant

At steady state, the influence score equals:

**r∞ = -(W̃ - I)⁻¹s**

Where W̃ is W rescaled to ensure network stability. Raw scores are then natural-log-transformed and offset by a constant (+24) to give "adjusted influence" scores; in this tutorial, any values that remain negative are floored at zero.
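
To make the steady-state formula concrete, here is a minimal NumPy sketch on a three-neuron toy chain. It is not the InfluenceCalculator implementation: the `W[post, pre]` orientation, the spectral-radius rescaling used for W̃, and the function name `steady_state_influence` are assumptions made for illustration only.

```python
import numpy as np

def steady_state_influence(W, seed_idx, max_radius=0.99):
    """Solve r_inf = -(W_tilde - I)^(-1) s for a small dense toy network."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    # Assumed stabilisation: shrink W only if its spectral radius is too large
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    W_tilde = W * (max_radius / radius) if radius > max_radius else W
    # Unit stimulation of the seed neurons
    s = np.zeros(n)
    s[list(seed_idx)] = 1.0
    # -(W_tilde - I)^(-1) s is the same as (I - W_tilde)^(-1) s
    return np.linalg.solve(np.eye(n) - W_tilde, s)

# Toy chain 0 -> 1 -> 2, with W[post, pre] = normalised weight (assumed layout)
W_toy = np.array([
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0],
])

r_inf = steady_state_influence(W_toy, seed_idx=[0])
# Adjusted influence as used later in this tutorial: ln(score) + 24, floored at 0
adjusted = np.clip(np.log(r_inf) + 24, 0, None)
print(r_inf)
print(adjusted)
```

In this toy chain the two-hop target (neuron 2) receives 0.5 × 0.5 = 0.25 of the seed's activity even though it has no direct connection from the seed, which is the kind of indirect effect the metric is meant to capture.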

### Key Advantages

1. **Captures indirect effects**: Quantifies multi-synaptic pathways
2. **Accounts for network structure**: Considers convergent and divergent connections
3. **Computationally efficient**: Uses sparse matrix decomposition with caching for repeated calculations
4. **Biologically validated**: Correlates with optogenetic activation experiments

**Currently working with dataset:** banc_746

# Core Tutorial

## Setup and Load Data

In [1]:
# Import required packages
import pandas as pd
import numpy as np
import pyarrow.feather as feather
import gcsfs
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import umap
from scipy.cluster.hierarchy import linkage, dendrogram
from InfluenceCalculator import InfluenceCalculator
from joblib import Parallel, delayed
import multiprocessing
import warnings
warnings.filterwarnings('ignore')

# Set up parallelization
# Use single core to avoid pickling issues in notebook execution
n_cores = 1  # Set to max(1, multiprocessing.cpu_count() - 1) for parallel
print("✓ All packages imported successfully")
print(f"Using {n_cores} cores for parallel processing")

✓ All packages imported successfully
Using 1 cores for parallel processing


In [2]:
# Import common packages and helper functions
import sys
sys.path.insert(0, '.')
from utils import *

✓ Packages loaded successfully


In [3]:
# Environment detection and Colab setup (auto-configured)
try:
    import google.colab
    IN_COLAB = True
    
    # Colab setup
    
    # Authenticate
    from google.colab import auth
    auth.authenticate_user()
    print("✓ Authenticated with Google Cloud")
    
    # Download utils.py
    import urllib.request, os
    HELPER_URL = "https://raw.githubusercontent.com/sjcabs/fly_connectome_data_tutorial/main/python/utils.py"
    if not os.path.exists("utils.py"):
        # Save under the file name that `from utils import *` expects
        urllib.request.urlretrieve(HELPER_URL, "utils.py")
    
    print("✓ Colab environment ready\n")
except ImportError:
    IN_COLAB = False
    # Local environment - no output needed
    pass

In [4]:
# Configuration
DATASET = "banc_746"
DATASET_ID = "banc_746_id"

# Data location (a gs:// path is read through gcsfs; anything else is treated as local)
DATA_PATH = "gs://sjcabs_2025_data"
USE_GCS = DATA_PATH.startswith("gs://")

# Setup image output directory
import os
IMG_DIR = "images/tutorial_04"
os.makedirs(IMG_DIR, exist_ok=True)

print(f"Working with dataset: {DATASET}")
print(f"Data location: {DATA_PATH}")


Working with dataset: banc_746
Data location: gs://sjcabs_2025_data


In [5]:
# Setup GCS access if needed
if USE_GCS:
    gcs = gcsfs.GCSFileSystem(token='google_default')
    print("✓ GCS filesystem initialized")
else:
    gcs = None
    print("Using local filesystem")

✓ GCS filesystem initialized


In [6]:
# Load metadata
meta_path = construct_path(DATA_PATH, DATASET, "meta")
print(f"Loading metadata from: {meta_path}")
meta = read_feather_gcs(meta_path, gcs_fs=gcs)
print(f"✓ Loaded {len(meta):,} neurons")
print(f"\nMetadata columns: {list(meta.columns)}")
meta.head(3)

Loading metadata from: gs://sjcabs_2025_data/banc/banc_746_meta.feather


✓ Loaded 168,791 rows
✓ Loaded 168,791 neurons

Metadata columns: ['banc_746_id', 'supervoxel_id', 'region', 'side', 'hemilineage', 'nerve', 'flow', 'super_class', 'cell_class', 'cell_sub_class', 'cell_type', 'neurotransmitter_predicted', 'neurotransmitter_score', 'cell_function', 'cell_function_detailed', 'body_part_sensory', 'body_part_effector', 'status']


Unnamed: 0,banc_746_id,supervoxel_id,region,side,hemilineage,nerve,flow,super_class,cell_class,cell_sub_class,cell_type,neurotransmitter_predicted,neurotransmitter_score,cell_function,cell_function_detailed,body_part_sensory,body_part_effector,status
0,720575941569192238,74803281603754231,central_brain,right,VPNp1_medial,,intrinsic,central_brain_intrinsic,,,"(PLP191,PLP192)a",acetylcholine,0.7534,,,,,
1,720575941574697871,74873512908765054,central_brain,right,VPNp1_medial,,intrinsic,central_brain_intrinsic,,,"(PLP191,PLP192)a",acetylcholine,0.7976,,,,,
2,720575941652939029,77477362601861709,central_brain,left,VPNp1_medial,,intrinsic,central_brain_intrinsic,,,"(PLP191,PLP192)a",dopamine,0.5825,,,,,TRACING_ISSUE_2


In [7]:
# Load edgelist
edgelist_path = construct_path(DATA_PATH, DATASET, "edgelist_simple")
print(f"Loading edgelist from: {edgelist_path}")
edgelist_simple = read_feather_gcs(edgelist_path, gcs_fs=gcs)
print(f"✓ Loaded {len(edgelist_simple):,} connections")
print(f"\nEdgelist columns: {list(edgelist_simple.columns)}")
edgelist_simple.head(3)

Loading edgelist from: gs://sjcabs_2025_data/banc/banc_746_simple_edgelist.feather


✓ Loaded 113,981,973 rows
✓ Loaded 113,981,973 connections

Edgelist columns: ['pre', 'post', 'count', 'norm', 'total_input']


Unnamed: 0,pre,post,count,norm,total_input
0,720575941509220642,720575941277394247,1,1.0,1
1,720575941526837604,720575940420901192,1,1.0,1
2,720575941508750721,720575941576493706,1,0.5,2


## Filter Strong Connections

To speed up influence calculations, we filter out weak connections (fewer than 5 synapses):

In [8]:
# Filter for connections with at least 5 synapses
edgelist_filtered = edgelist_simple[edgelist_simple['count'] >= 5].copy()

print(f"Original connections: {len(edgelist_simple):,}")
print(f"After filtering (≥5 synapses): {len(edgelist_filtered):,}")
print(f"Retained: {100 * len(edgelist_filtered) / len(edgelist_simple):.1f}%")

Original connections: 113,981,973
After filtering (≥5 synapses): 1,953,550
Retained: 1.7%
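
Because the threshold removes most connections, it can be worth checking how much of the total synapse count survives at different cut-offs before committing to one. The sketch below does this on an invented toy edgelist; the helper name `retention_by_threshold` and all values are hypothetical, and only the `count` column name is taken from the edgelist above.

```python
import pandas as pd

# Toy edgelist with the same 'count' column as edgelist_simple (values invented)
toy_edges = pd.DataFrame({
    "pre":   [1, 1, 2, 3, 4, 5],
    "post":  [2, 3, 3, 4, 5, 1],
    "count": [1, 2, 5, 8, 20, 1],
})

def retention_by_threshold(edges, thresholds):
    """Fraction of connections and of synapses kept at each count threshold."""
    total_synapses = edges["count"].sum()
    rows = []
    for t in thresholds:
        kept = edges[edges["count"] >= t]
        rows.append({
            "threshold": t,
            "connections_kept": len(kept) / len(edges),
            "synapses_kept": kept["count"].sum() / total_synapses,
        })
    return pd.DataFrame(rows)

summary = retention_by_threshold(toy_edges, thresholds=[1, 5, 10])
print(summary)
```

To apply the same check to the real data, pass `edgelist_simple` in place of `toy_edges`.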


## Example: Sensory Influence on Dopaminergic Neurons

Let's examine how sensory neurons influence mushroom body dopaminergic neurons. This is biologically relevant because:

- Dopaminergic neurons provide **teaching signals** for associative memory
- They are hypothesised to receive unconditioned sensory information
- **PAM** dopamine neurons are involved in appetitive (reward) learning
- **PPL1** dopamine neurons are involved in aversive (punishment) learning

### Define Source and Target Neurons

In [9]:
# Source: All sensory neurons (afferent flow)
sensory_neurons = meta[meta['flow'] == 'afferent'][['banc_746_id', 'cell_sub_class', 'cell_type']].drop_duplicates()

print(f"Found {len(sensory_neurons):,} sensory neurons")

# Get unique sensory sub-classes
sensory_sub_classes = sorted(sensory_neurons['cell_sub_class'].dropna().unique())

print(f"\nSensory sub-classes (n={len(sensory_sub_classes)}):")
print(", ".join(sensory_sub_classes[:10]))

# Target: All mushroom body dopaminergic neurons
mb_dopamine_neurons = meta[
    meta['cell_class'] == 'mushroom_body_dopaminergic_neuron'
][['banc_746_id', 'cell_sub_class', 'cell_type']].drop_duplicates()

print(f"\nFound {len(mb_dopamine_neurons):,} mushroom body dopamine neurons")

# Get unique MB dopamine types
mb_da_types = sorted(mb_dopamine_neurons['cell_type'].dropna().unique())

print(f"MB dopamine types (n={len(mb_da_types)}):")
print(", ".join(mb_da_types[:10]))

Found 15,462 sensory neurons

Sensory sub-classes (n=109):
abdomen_multidendritic_neuron, abdomen_orphan_neuron, abdomen_oxygenation_neuron, abdomen_strand_neuron, abdominal_ppk_neuron, abdominal_terminalia_bristle, abdominal_wall_multidendritic_neuron, antenna_bristle_neuron, antenna_campaniform_sensillum_neuron, antenna_hygrosensory_receptor_neuron

Found 255 mushroom body dopamine neurons
MB dopamine types (n=35):
PAM01, PAM01_b, PAM02, PAM03, PAM04, PAM04_a, PAM05, PAM06, PAM06_b, PAM07


### Set Up Influence Calculator

In [10]:
print("Initializing influence calculator...")
print("This may take a few minutes for large networks...\n")

# Prepare data for InfluenceCalculator
# The package expects a SQLite database with:
# - 'edgelist_simple' table with columns: pre, post, count, norm, post_count
# - 'meta' table with column: root_id (plus any other metadata)

# Check edgelist column names and rename if needed
edgelist_cols = list(edgelist_filtered.columns)
print(f"Edgelist columns: {', '.join(edgelist_cols)}")

if 'pre' in edgelist_cols and 'post' in edgelist_cols:
    edgelist_for_ic = edgelist_filtered.copy()
else:
    # Need to rename columns
    pre_col = f"pre_{DATASET_ID}"
    post_col = f"post_{DATASET_ID}"
    
    edgelist_for_ic = edgelist_filtered.rename(columns={
        pre_col: 'pre',
        post_col: 'post'
    })

# Add post_count column if not present (required by InfluenceCalculator)
if 'post_count' not in edgelist_for_ic.columns:
    edgelist_for_ic['post_count'] = edgelist_for_ic['count'] / edgelist_for_ic['norm']

# Prepare metadata with root_id column
meta_for_ic = meta.rename(columns={DATASET_ID: 'root_id'})

# Convert ID columns to string for SQLite compatibility
meta_for_ic['root_id'] = meta_for_ic['root_id'].astype(str)
edgelist_for_ic['pre'] = edgelist_for_ic['pre'].astype(str)
edgelist_for_ic['post'] = edgelist_for_ic['post'].astype(str)

print(f"\n✓ Data prepared for influence calculator")
print(f"  Edgelist: {len(edgelist_for_ic):,} connections")
print(f"  Metadata: {len(meta_for_ic):,} neurons\n")

# Create temporary SQLite database
import sqlite3
import tempfile

temp_db = tempfile.NamedTemporaryFile(suffix='.sqlite', delete=False)
temp_db_path = temp_db.name
temp_db.close()

print(f"Creating temporary SQLite database: {temp_db_path}")

conn = sqlite3.connect(temp_db_path)

# Write tables to database
print("Writing edgelist to database...")
edgelist_for_ic.to_sql('edgelist_simple', conn, if_exists='replace', index=False)

print("Writing metadata to database...")
meta_for_ic.to_sql('meta', conn, if_exists='replace', index=False)

conn.close()

print("✓ Database created successfully\n")

# Initialize the influence calculator
# This uses the InfluenceCalculator Python package
print("Initializing calculator (this may take several minutes)...")

ic_dataset = InfluenceCalculator(
    filename=temp_db_path,
    signed=False,
    count_thresh=5
)

print("\n✓ Influence calculator initialized")
print("Network ready for influence calculations")

Initializing influence calculator...
This may take a few minutes for large networks...

Edgelist columns: pre, post, count, norm, total_input

✓ Data prepared for influence calculator
  Edgelist: 1,953,550 connections
  Metadata: 168,791 neurons

Creating temporary SQLite database: /var/folders/dy/z4y74tc548b8w1526qf2m0p00000gn/T/tmpwmyy4rcq.sqlite
Writing edgelist to database...


Writing metadata to database...


✓ Database created successfully

Initializing calculator (this may take several minutes)...



✓ Influence calculator initialized
Network ready for influence calculations
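
Before handing a database to the calculator, a quick round-trip check can catch missing tables or columns early. The sketch below writes a toy edgelist and metadata table to an in-memory SQLite database and reads the schema back. The table and column names follow the comments in the cell above; the toy values and the "orphan" check are illustrative assumptions, not part of the InfluenceCalculator package.

```python
import sqlite3
import pandas as pd

# Toy tables mirroring the schema described above (values invented)
toy_edgelist = pd.DataFrame({
    "pre": ["101", "102"],
    "post": ["201", "201"],
    "count": [6, 18],
    "norm": [0.25, 0.75],
})
# post_count = total input synapses of the postsynaptic neuron
toy_edgelist["post_count"] = toy_edgelist["count"] / toy_edgelist["norm"]
toy_meta = pd.DataFrame({"root_id": ["101", "102", "201"]})

conn = sqlite3.connect(":memory:")
toy_edgelist.to_sql("edgelist_simple", conn, index=False)
toy_meta.to_sql("meta", conn, index=False)

# Read back table names and edgelist columns
tables = pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type='table'", conn
)["name"].tolist()
edge_cols = pd.read_sql("SELECT * FROM edgelist_simple LIMIT 1", conn).columns.tolist()

# Count edges whose endpoints are missing from the metadata table
orphans = pd.read_sql(
    """
    SELECT COUNT(*) AS n FROM edgelist_simple
    WHERE pre NOT IN (SELECT root_id FROM meta)
       OR post NOT IN (SELECT root_id FROM meta)
    """,
    conn,
)["n"].iloc[0]
conn.close()

required = {"pre", "post", "count", "norm", "post_count"}
print("Tables:", tables)
print("Missing columns:", required - set(edge_cols))
print("Edges with ids absent from meta:", orphans)
```

The same three queries can be pointed at `temp_db_path` to inspect the real database.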


### Calculate Influence Scores

Now we calculate influence scores from each sensory sub-class to all MB dopaminergic neurons:

In [11]:
print(f"Calculating influence scores for {len(sensory_sub_classes)} sensory sub-classes...")
print(f"Using {n_cores} cores for parallel processing\n")
print("Note: This will take time - influence calculations involve matrix operations on the full network\n")

# Get MB dopamine neuron IDs for filtering
mb_dopamine_ids = set(mb_dopamine_neurons['banc_746_id'].astype(str).values)

# Define function to calculate influence for one sensory sub-class
def calculate_influence_for_subclass(i, sensory_sub_class):
    """Calculate influence from one sensory sub-class to MB dopamine neurons."""
    # Get IDs for this sensory sub-class (as strings to match database)
    sensory_ids = sensory_neurons[
        sensory_neurons['cell_sub_class'] == sensory_sub_class
    ]['banc_746_id'].astype(str).tolist()
    
    # Skip if no neurons found
    if len(sensory_ids) == 0:
        return None
    
    # Calculate influence from this sensory sub-class
    influence_df = ic_dataset.calculate_influence(
        seed_ids=sensory_ids,
        silenced_neurons=[]
    )
    
    # Ensure id is string type
    influence_df['id'] = influence_df['id'].astype(str)
    
    # Find the influence score column (may have different names)
    influence_col = [col for col in influence_df.columns if 'Influence_score' in col][0]
    
    # Add adjusted influence (log-transform with offset, floor at 0)
    adjusted_inf = np.log(influence_df[influence_col]) + 24
    adjusted_inf[adjusted_inf < 0] = 0
    influence_df['adjusted_influence'] = adjusted_inf
    
    # Filter to MB dopamine neurons and join with metadata
    influence_scores = influence_df[
        influence_df['id'].isin(mb_dopamine_ids)
    ].merge(
        meta[['banc_746_id', 'cell_sub_class', 'cell_type']].drop_duplicates().assign(
            banc_746_id=lambda x: x['banc_746_id'].astype(str)
        ),
        left_on='id',
        right_on='banc_746_id',
        how='left'
    ).rename(columns={
        'cell_sub_class': 'target_class',
        'cell_type': 'target_type'
    })
    
    influence_scores['source'] = sensory_sub_class
    
    return influence_scores

# Run calculations with joblib (n_jobs=1 runs serially; verbose=10 prints progress)

all_influence_scores_list = Parallel(n_jobs=n_cores, verbose=10)(
    delayed(calculate_influence_for_subclass)(i, sc) 
    for i, sc in enumerate(sensory_sub_classes)
)

# Remove None results (from empty sensory sub-classes)
all_influence_scores_list = [df for df in all_influence_scores_list if df is not None]


# Combine all results
all_influence_scores = pd.concat(all_influence_scores_list, ignore_index=True)

print(f"Total influence scores calculated: {len(all_influence_scores):,}")

# Show sample of results
print("\nSample of influence scores:")
display_cols = ['source', 'id', 'adjusted_influence', 'target_type']
# Find the influence column name
influence_col = [col for col in all_influence_scores.columns if 'Influence_score' in col]
if influence_col:
    display_cols.insert(2, influence_col[0])

print(all_influence_scores[display_cols].head(10).to_string())

Calculating influence scores for 109 sensory sub-classes...
Using 1 cores for parallel processing

Note: This will take time - influence calculations involve matrix operations on the full network



[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.9s
[Parallel(n_jobs=1)]: Done   4 tasks      | elapsed:    3.6s
[Parallel(n_jobs=1)]: Done   7 tasks      | elapsed:    6.5s
[Parallel(n_jobs=1)]: Done  12 tasks      | elapsed:   11.1s
[Parallel(n_jobs=1)]: Done  17 tasks      | elapsed:   15.7s
[Parallel(n_jobs=1)]: Done  24 tasks      | elapsed:   22.3s
[Parallel(n_jobs=1)]: Done  31 tasks      | elapsed:   28.8s
[Parallel(n_jobs=1)]: Done  40 tasks      | elapsed:   37.3s
[Parallel(n_jobs=1)]: Done  49 tasks      | elapsed:   45.7s
[Parallel(n_jobs=1)]: Done  60 tasks      | elapsed:   55.8s
[Parallel(n_jobs=1)]: Done  71 tasks      | elapsed:  1.1min
[Parallel(n_jobs=1)]: Done  84 tasks      | elapsed:  1.3min
[Parallel(n_jobs=1)]: Done  97 tasks      | elapsed:  1.5min

Total influence scores calculated: 15,478

Sample of influence scores:
                          source                  id  Influence_score_(unsigned)  adjusted_influence target_type
0  abdomen_multidendritic_neuron  720575941477857076                4.367597e-05           13.961287      PPL102
1  abdomen_multidendritic_neuron  720575941536157930                2.102865e-05           13.230375      PPL108
2  abdomen_multidendritic_neuron  720575941527558500                8.428382e-06           12.316094      PPL103
3  abdomen_multidendritic_neuron  720575941445894826                3.925197e-06           11.551906       PAM02
4  abdomen_multidendritic_neuron  720575941689127692                3.478212e-05           13.733593      PPL101
5  abdomen_multidendritic_neuron  720575941552246783                1.998037e-05           13.179240      PPL101
6  abdomen_multidendritic_neuron  720575941685802735                5.199819e-08            7.227943       PAM06
7  abdomen_multidendritic

[Parallel(n_jobs=1)]: Done 109 out of 109 | elapsed:  1.7min finished


### Aggregate by Cell Type

In [None]:
# Find the influence column name dynamically
influence_col = [col for col in all_influence_scores.columns if 'Influence_score' in col]
if not influence_col:
    raise ValueError("No Influence_score column found in results")
influence_col = influence_col[0]

# Aggregate influence scores by source and target cell type
# First join with meta to get source_class (cell_class) from source (cell_sub_class)
meta_source = meta[['cell_sub_class', 'cell_class']].drop_duplicates('cell_sub_class')
all_influence_scores_ct = all_influence_scores.merge(
    meta_source.rename(columns={'cell_sub_class': 'source', 'cell_class': 'source_class'}),
    on='source',
    how='left'
)

# Group by source_class and target_type
all_influence_scores_ct = all_influence_scores_ct.groupby(
    ['source_class', 'target_type']
).agg({
    influence_col: 'sum',
    'id': 'count'
}).reset_index().rename(columns={
    'id': 'n_targets',
    influence_col: 'influence'
})

# Recalculate adjusted_influence from summed influence (NOT sum of adjusted_influence!)
# Floor at 0 as in the per-neuron calculation; clip also maps log(0) = -inf to 0
all_influence_scores_ct['adjusted_influence'] = (
    np.log(all_influence_scores_ct['influence']) + 24
).clip(lower=0)

# Filter out missing values
all_influence_scores_ct = all_influence_scores_ct[
    all_influence_scores_ct['target_type'].notna() & 
    all_influence_scores_ct['source_class'].notna()
]

print(f"Aggregated to {len(all_influence_scores_ct):,} source-target type pairs")

# Show top influences
print("\nTop 10 sensory → dopamine influences:")
print(all_influence_scores_ct.nlargest(10, 'adjusted_influence')[[
    'source_class', 'target_type', 'adjusted_influence', 'n_targets'
]].to_string())
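
The reason for summing raw influence and only then taking the log can be seen in a small example. In the sketch below (toy values, invented for illustration; the cell type names are borrowed from the output above), two target types receive the same total raw influence, but summing the log-scaled scores would make the type with more neurons look almost twice as strongly influenced.

```python
import numpy as np
import pandas as pd

# Two PAM01 neurons and one PPL101 neuron; both types receive 4e-6 in total
toy = pd.DataFrame({
    "target_type": ["PAM01", "PAM01", "PPL101"],
    "influence": [1e-6, 3e-6, 4e-6],
})
toy["adjusted_influence"] = np.log(toy["influence"]) + 24

grouped = toy.groupby("target_type").agg(
    influence=("influence", "sum"),
    sum_of_adjusted=("adjusted_influence", "sum"),  # shown only for contrast
).reset_index()

# Log-transform the summed raw score, then floor at 0
grouped["adjusted_influence"] = (np.log(grouped["influence"]) + 24).clip(lower=0)
print(grouped)
```

The `adjusted_influence` column is identical for both types, whereas `sum_of_adjusted` scales with the number of neurons in the group rather than with the influence they receive.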

## Visualisation: Influence Heatmap

Let's create an interactive heatmap showing influence from sensory cell classes to dopamine neuron types:

In [None]:
# Create a matrix for heatmap
influence_matrix = all_influence_scores_ct.pivot(
    index='source_class',
    columns='target_type',
    values='adjusted_influence'
).fillna(0)

print(f"Heatmap matrix dimensions: {influence_matrix.shape[0]} x {influence_matrix.shape[1]}\n")

# Perform hierarchical clustering
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

# Cluster rows (sources)
row_linkage = linkage(pdist(influence_matrix.values, metric='euclidean'), method='ward')
row_order = leaves_list(row_linkage)

# Cluster columns (targets)
col_linkage = linkage(pdist(influence_matrix.T.values, metric='euclidean'), method='ward')
col_order = leaves_list(col_linkage)

# Reorder matrix
influence_matrix_ordered = influence_matrix.iloc[row_order, col_order]

# Create interactive heatmap with plotly
fig = go.Figure(data=go.Heatmap(
    z=influence_matrix_ordered.values,
    x=influence_matrix_ordered.columns,
    y=influence_matrix_ordered.index,
    colorscale='YlOrRd',
    hovertemplate='Source: %{y}<br>Target: %{x}<br>Adjusted Influence: %{z:.2f}<extra></extra>'
))

fig.update_layout(
    title='Sensory Influence on MB Dopaminergic Neurons',
    xaxis_title='Target: MB Dopamine Neuron Type',
    yaxis_title='Source: Sensory Cell Class',
    width=1200,
    height=1000,
    xaxis={'tickangle': -45, 'tickfont': {'size': 8}},
    yaxis={'tickfont': {'size': 8}}
)

# Save as HTML
fig.write_html(f"{IMG_DIR}/{DATASET}_influence_heatmap.html")

print("✓ Interactive heatmap saved")
fig.show()
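
The row and column reordering used for the heatmap can be hard to picture on a large matrix. The following sketch applies the same `linkage` and `leaves_list` calls to a four-row toy matrix (values invented) so that the effect of the ordering is easy to inspect.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

# Rows 0 and 2 have similar profiles, as do rows 1 and 3 (toy values)
toy_matrix = np.array([
    [10.0, 10.0, 0.0],
    [0.0, 1.0, 12.0],
    [9.0, 11.0, 1.0],
    [1.0, 0.0, 11.0],
])

# Ward linkage on Euclidean distances, as in the heatmap cell above
row_linkage = linkage(pdist(toy_matrix, metric="euclidean"), method="ward")
row_order = leaves_list(row_linkage)

# Reordering places rows with similar profiles next to each other
reordered = toy_matrix[row_order]
print(row_order)
print(reordered)
```

After reordering, each row sits next to the row it most resembles, which is what produces the block structure in a clustered heatmap.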

## Visualisation: UMAP of Influence Patterns

We can also visualise the influence patterns using UMAP, where each point is a dopaminergic neuron:

In [14]:
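# Aggregate influence by individual neuron
# Each (source, id) pair should occur once, so the sum is a safeguard against duplicates
all_influence_scores_n = all_influence_scores.groupby(
    ['source', 'id']
).agg({
    'Influence_score_(unsigned)': 'sum',
    'target_type': 'first',
    'target_class': 'first'
}).reset_index().rename(columns={
    'Influence_score_(unsigned)': 'influence'
})

# Recompute adjusted influence from the summed raw score (never sum log-scaled values)
all_influence_scores_n['adjusted_influence'] = (
    np.log(all_influence_scores_n['influence']) + 24
).clip(lower=0)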

all_influence_scores_n = all_influence_scores_n[
    all_influence_scores_n['id'].notna() & 
    all_influence_scores_n['source'].notna()
]

# Create matrix: rows = neurons, columns = sensory sub-classes
influence_matrix_umap = all_influence_scores_n.pivot(
    index='id',
    columns='source',
    values='adjusted_influence'
).fillna(0)

print(f"UMAP input matrix: {influence_matrix_umap.shape[0]} neurons x {influence_matrix_umap.shape[1]} sensory sub-classes\n")

# Run UMAP
import umap.umap_ as umap_lib

reducer = umap_lib.UMAP(n_neighbors=15, min_dist=0.1, random_state=42)
umap_result = reducer.fit_transform(influence_matrix_umap.values)

# Create data frame for plotting
umap_df = pd.DataFrame({
    'id': influence_matrix_umap.index,
    'UMAP1': umap_result[:, 0],
    'UMAP2': umap_result[:, 1]
}).merge(
    meta[['banc_746_id', 'cell_type', 'cell_sub_class', 'cell_class']].drop_duplicates().assign(
        banc_746_id=lambda x: x['banc_746_id'].astype(str)  # match the string ids in 'id'
    ),
    left_on='id',
    right_on='banc_746_id'
)

# Plot by cell sub-class
fig = px.scatter(
    umap_df,
    x='UMAP1',
    y='UMAP2',
    color='cell_sub_class',
    title=f'UMAP of MB Dopamine Neurons by Sensory Influence Patterns (n = {len(umap_df)})',
    labels={'cell_sub_class': 'Cell Sub-Class'},
    width=1200,
    height=600,
    template='plotly_white'
)

fig.update_traces(marker=dict(size=8, opacity=0.7))
fig.write_html(f"{IMG_DIR}/{DATASET}_influence_umap_subclass.html")
fig.show()

# Plot by cell type
fig = px.scatter(
    umap_df,
    x='UMAP1',
    y='UMAP2',
    color='cell_type',
    title=f'UMAP of MB Dopamine Neurons by Sensory Influence Patterns (n = {len(umap_df)})',
    labels={'cell_type': 'Cell Type'},
    width=1200,
    height=600,
    template='plotly_white'
)

fig.update_traces(marker=dict(size=8, opacity=0.7))
fig.write_html(f"{IMG_DIR}/{DATASET}_influence_umap_type.html")
fig.show()

UMAP input matrix: 142 neurons x 109 sensory sub-classes





## Key Insights

From this analysis, we can see:

1. **Diverse sensory influences**: MB dopaminergic neurons receive influence from many sensory modalities through both direct and indirect pathways
2. **Cell type specificity**: Different dopamine neuron types show distinct sensory influence patterns
3. **Indirect pathways**: Influence scores capture multi-synaptic signal propagation beyond direct connections

## Your Turn: New Challenge

Re-run the core example above, switching from BANC to maleCNS data. What do you notice? Then try a different target population: rather than mushroom body dopamine neurons, look at sensory influence onto mushroom body output neurons (MBONs; filter for cell types containing "MBON").

```python
# To work with a different dataset, change the DATASET variable at the top:
# DATASET = "malecns_09"
# DATASET_ID = "malecns_09_id"

# Then re-run the entire notebook to see how the results differ!
# Differences likely reflect differences in annotation between projects
```
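
As a starting point for the MBON variant, the target selection could look like the sketch below. It runs on a toy metadata table; the `malecns_09_id` column name is assumed to follow the same pattern as `banc_746_id`, and the rows are invented.

```python
import pandas as pd

# Toy metadata (invented rows); the real table comes from read_feather_gcs
toy_meta = pd.DataFrame({
    "malecns_09_id": ["1", "2", "3", "4"],
    "cell_type": ["MBON01", "PAM01", None, "MBON14"],
})

# na=False treats neurons without a cell type as non-matches
mbon_neurons = toy_meta[toy_meta["cell_type"].str.contains("MBON", na=False)]

# String ids, matching how ids are stored in the SQLite database
mbon_ids = set(mbon_neurons["malecns_09_id"].astype(str))
print(mbon_ids)
```

In the influence loop, `mbon_ids` would then take the place of `mb_dopamine_ids` when filtering the results.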

# Extensions

Below are more involved analyses, with longer compute times. Working through these will show you how to think about sensory and effector influence together, plot a UMAP based on influence scores, and interpret the biology of our results.

## Extension 1: Specific olfactory channel influence onto pC1 neurons in BANC and maleCNS

pC1 neurons are a small cluster of sexually dimorphic, doublesex/fruitless-positive neurons in the Drosophila central brain that act as a hub for integrating social cues and controlling sex-specific internal state and behaviour. In the male literature they are often referred to as the P1 cluster; in both sexes they sit at the top of a hierarchy that gates courtship, aggression, and related states.

Since BANC is a female nervous system and maleCNS is a male one, we can directly compare information flow onto this sexually dimorphic cell type across the two datasets.

We want to see which antennal lobe glomeruli (olfactory and thermosensory) and which gustatory neuron cell sub-classes influence pC1 neurons in both datasets.

**Note:** This extension requires loading additional datasets and will significantly increase computation time. Implementing this extension in Python would require substantial additional code similar to the core tutorial but with multi-dataset handling.

## Extension 2: Abdominal neurons by effector control

Let's look at another example. Rather than calculating influence between sensors and a target population, let's define a source population and calculate influence to effector neurons, i.e. motor and endocrine neurons.

The abdominal neuromere is a little-studied region of the fly central nervous system. Let's see if we can break its neurons down into "functional modules" based on their possible divisions by motor control.

This would involve:
1. Loading abdominal subset data
2. Filtering to intrinsic neurons
3. Calculating influence to effector neurons
4. Creating heatmaps and UMAP visualisations
5. Performing hierarchical clustering to identify functional modules

**Note:** This extension requires substantial additional computation and subset data access. The implementation would mirror the approaches shown in the core tutorial.
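
The final step, cutting a hierarchical clustering into discrete modules, can be sketched with SciPy's `fcluster`. The example below uses a toy neuron × effector matrix of adjusted influence scores; the neuron and effector names, the values, and the choice of two modules are all invented for illustration.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Toy matrix: rows = source neurons, columns = effector groups (values invented)
toy_influence = pd.DataFrame(
    [
        [12.0, 11.0, 0.0, 1.0],
        [11.0, 12.0, 1.0, 0.0],
        [0.0, 1.0, 13.0, 12.0],
        [1.0, 0.0, 12.0, 13.0],
        [12.0, 12.0, 0.0, 0.0],
    ],
    index=["n1", "n2", "n3", "n4", "n5"],
    columns=["effector_a", "effector_b", "effector_c", "effector_d"],
)

# Ward linkage on the rows (Euclidean distance is the default metric)
row_linkage = linkage(toy_influence.values, method="ward")

# Cut the tree into at most two clusters and label each neuron
modules = pd.Series(
    fcluster(row_linkage, t=2, criterion="maxclust"),
    index=toy_influence.index,
    name="module",
)
print(modules)
```

On real data the number of modules is a judgement call; inspecting the dendrogram or trying several values of `t` is a reasonable way to choose it.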

# Summary

In this tutorial, you learned how to:

1. ✓ Calculate indirect connectivity using the influence metric
2. ✓ Set up and use the InfluenceCalculator package
3. ✓ Analyse influence from sensory neurons to dopaminergic neurons
4. ✓ Visualise influence patterns with heatmaps and UMAP
5. ✓ Interpret biological significance of influence scores

## Key Takeaways

- **Indirect connectivity matters**: The influence metric reveals how signals propagate through multi-synaptic pathways, capturing functional relationships that direct connectivity alone misses

- **Linear dynamical model**: The influence calculator uses a biologically-inspired model of neural signal propagation based on connectivity matrices and steady-state solutions

- **Strong connections dominate**: Filtering for strong synaptic connections (e.g., ≥5 synapses) dramatically reduces computational load whilst preserving the most functionally relevant pathways

- **Cell type aggregation reveals patterns**: Summing influence scores by cell type transforms neuron-level complexity into interpretable functional relationships between neural populations

- **Visualisation is essential**: Heatmaps reveal specific source-target relationships, whilst UMAP embeddings expose global patterns and functional groupings in high-dimensional influence data

- **Cross-dataset comparisons are powerful**: Analysing influence patterns across datasets (e.g., BANC vs maleCNS) reveals conserved circuit motifs and sex-specific differences in connectivity

- **Biological validation**: Influence patterns align with known biology (e.g., sensory neurons influence dopaminergic learning centres) whilst revealing novel pathways for experimental investigation

The influence metric provides a computationally efficient and biologically meaningful way to understand how information flows through neural circuits beyond direct synaptic connections.