
# RNA Seasonal Analysis Project  
## Cyclic Spline GAM Framework (IR vs IS)

This notebook describes the **conceptual workflow** for the RNA seasonal analysis.

The goal is to investigate:

1. Whether RNA expression exhibits annual cyclic structure  
2. Whether seasonal modulation differs between IR and IS immune states  

We focus on **gene-level modeling** using cyclic cubic spline GAMs.




# 1. Preprocessing Before Modeling




## 1.1 Construct Day-of-Year

Seasonality is modeled over the annual cycle.

For each sample:

$
doy = \text{day of year from Date}
$

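This variable ranges from 1 to 365 (366 in leap years).

Reason:
We model annual periodicity. The spline must be cyclic across the calendar year, so the fitted curve at the end of December joins smoothly with the curve at the start of January.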
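
As a minimal sketch of this step, assuming the sample metadata is a pandas DataFrame with a `Date` column (the toy rows and the `samples` name are illustrative, not project data), the day of year can be derived like this:

```python
import pandas as pd

# Toy sample table; the rows are made up for illustration.
samples = pd.DataFrame({"Date": ["2021-01-01", "2021-03-01", "2020-12-31"]})

# Parse the dates, then extract the 1-based day of year.
samples["Date"] = pd.to_datetime(samples["Date"])
samples["doy"] = samples["Date"].dt.dayofyear

print(samples)
```

Note that 31 December falls on day 366 in a leap year such as 2020, so the cycle length should be chosen with that in mind.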



## 1.2 Convert SubjectID to Categorical

SubjectID must be treated as a categorical variable.

Reason:

Sampling is uneven.  
Different subjects appear at different times of year.

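Without adjusting for subject baseline differences,
seasonality can be falsely detected due to composition imbalance.
For example, if subjects with high baseline expression happen to be sampled mostly in winter, an unadjusted model reports a winter peak that reflects who was sampled rather than when.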
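
One possible way to prepare this, sketched under the assumption that the metadata has a `SubjectID` column stored as integers (the toy IDs and the `subj` prefix are illustrative), is to cast the column to a categorical type and expand it into indicator columns:

```python
import pandas as pd

# Toy metadata; the subject IDs are made up for illustration.
samples = pd.DataFrame({"SubjectID": [101, 102, 101, 103]})

# Treat the ID as a label, not a number.
samples["SubjectID"] = samples["SubjectID"].astype("category")

# One indicator column per subject. drop_first=True leaves the first
# subject as the reference level, which avoids collinearity when the
# model also contains an intercept.
subject_dummies = pd.get_dummies(
    samples["SubjectID"], prefix="subj", drop_first=True, dtype=float
)

print(subject_dummies)
```

Leaving the ID numeric would instead fit a single slope across ID values, which has no meaning for subject labels.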



## 1.3 Gene Filtering (Required)

Before genome-wide modeling:

1. Remove genes with excessive missing values  
2. Remove near-zero variance genes  
3. Remove extremely zero-inflated genes  

Reason:

Low-information genes produce unstable spline fits and inflate noise.
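
The three filters above could be sketched as follows. The function name `filter_genes`, the samples-by-genes layout, and all three thresholds are assumptions chosen for illustration; suitable cut-offs depend on the data and normalization.

```python
import numpy as np
import pandas as pd

def filter_genes(expr, max_missing=0.2, min_var=1e-3, max_zero=0.5):
    """Keep informative genes from a samples x genes DataFrame.

    All thresholds are illustrative assumptions, not recommended values.
    """
    missing_frac = expr.isna().mean(axis=0)    # share of missing values per gene
    variance = expr.var(axis=0, skipna=True)   # per-gene variance
    zero_frac = (expr == 0).mean(axis=0)       # share of exact zeros per gene
    keep = (
        (missing_frac <= max_missing)
        & (variance >= min_var)
        & (zero_frac <= max_zero)
    )
    return expr.loc[:, keep]

# Toy expression matrix: one usable gene and one failing each filter.
expr = pd.DataFrame({
    "gene_ok": [1.0, 2.5, 3.1, 0.7, 2.2],
    "gene_missing": [1.0, np.nan, np.nan, np.nan, 2.0],
    "gene_flat": [5.0, 5.0, 5.0, 5.0, 5.0],
    "gene_zero": [0.0, 0.0, 0.0, 0.0, 3.0],
})

filtered = filter_genes(expr)
print(list(filtered.columns))
```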



# 3. Mathematical Model

Let:

$
y_{it}^{(g)}
$

denote expression of gene $g$ for subject $i$ at time $t$.

We define three models.



## Model M0 (No Seasonality, Subject Baseline Only)

$
y_{it}^{(g)} = \alpha_i + \varepsilon_{it}
$

- $\alpha_i$ = subject-specific baseline  
- $\varepsilon_{it}$ = Gaussian error  

Serves as the null model: the gene varies only by subject baseline, with no seasonal term.



## Model M1 (Common Seasonality)

$
y_{it}^{(g)} = \alpha_i + f(doy_t) + \varepsilon_{it}
$

- $f(doy)$ = cyclic cubic spline  

Compared with M0, tests whether the gene exhibits annual cyclic modulation.



## Model M2 (IRIS-Specific Seasonality)

$
y_{it}^{(g)}= \alpha_i + f(doy_t) + g(doy_t) \cdot IRIS_i + \varepsilon_{it}
$

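- $IRIS_i$ = immune-state indicator for subject $i$ (assumed here to be 1 for IS and 0 for IR)  
- $g(doy)$ = deviation spline for IS, relative to the common curve $f(doy)$  

Compared with M1, tests whether the seasonal pattern differs between IR and IS.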



# 4. Model Comparison

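Use AIC to compare models. Lower AIC indicates a better balance of fit and complexity, so each difference is defined as the simpler model minus the richer model: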

$
\Delta AIC_{season} = AIC(M0) - AIC(M1)
$

$
\Delta AIC_{interaction} = AIC(M1) - AIC(M2)
$

Interpretation:

- Positive $\Delta AIC_{season}$: M1 is preferred over M0, i.e. evidence for seasonality  
- Positive $\Delta AIC_{interaction}$: M2 is preferred over M1, i.e. evidence that IR and IS differ in seasonal shape
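
To make the two differences concrete, here is a small NumPy-only sketch on synthetic data. It is not the GAM fit itself: a low-order sine/cosine (harmonic) basis stands in for the cyclic cubic spline so that the example runs without statsmodels, and the models are fit by ordinary least squares without a smoothing penalty. The helper names `gaussian_aic` and `harmonic_basis`, the period of 365.25, and the simulated data are all assumptions for illustration. In a penalized GAM the AIC would use effective degrees of freedom rather than a plain coefficient count.

```python
import numpy as np

def gaussian_aic(y, X):
    """AIC of an ordinary least-squares fit with Gaussian errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n = len(y)
    k = np.linalg.matrix_rank(X) + 1  # mean parameters plus error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return 2 * k - 2 * loglik

def harmonic_basis(doy, period=365.25, order=2):
    """Periodic sine/cosine columns; a stand-in for a cyclic spline basis."""
    angle = 2 * np.pi * np.asarray(doy, dtype=float) / period
    cols = []
    for h in range(1, order + 1):
        cols += [np.sin(h * angle), np.cos(h * angle)]
    return np.column_stack(cols)

# Synthetic gene: 6 subjects x 20 samples, first 3 subjects IR (0), last 3 IS (1).
rng = np.random.default_rng(0)
n_subj, n_per = 6, 20
subject = np.repeat(np.arange(n_subj), n_per)
iris = (subject >= 3).astype(float)
doy = rng.integers(1, 366, size=subject.size)
baseline = rng.normal(0, 1, n_subj)[subject]
y = baseline + 1.5 * np.sin(2 * np.pi * doy / 365.25) + rng.normal(0, 0.3, subject.size)

subj_dummies = (subject[:, None] == np.arange(n_subj)).astype(float)  # alpha_i
season = harmonic_basis(doy)                                          # f(doy)

X0 = subj_dummies                                    # M0: baseline only
X1 = np.column_stack([X0, season])                   # M1: common seasonality
X2 = np.column_stack([X1, season * iris[:, None]])   # M2: adds g(doy) * IRIS

delta_season = gaussian_aic(y, X0) - gaussian_aic(y, X1)
delta_interaction = gaussian_aic(y, X1) - gaussian_aic(y, X2)
print(delta_season, delta_interaction)
```

Because the simulated gene has a shared seasonal signal and no group-specific term, `delta_season` should come out clearly positive, while `delta_interaction` is expected to hover around or below zero.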



# 5. Required Packages

You may use:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.gam.api import GLMGam
from statsmodels.gam.smooth_basis import CyclicCubicSplines
import matplotlib.pyplot as plt
```



# 6. Tasks

### Task 1 – Preprocessing
- Construct `doy` (day of year)
- Perform gene filtering
- Prepare SubjectID fixed effects

### Task 2 – Genome-wide Scan
- Fit M0 and M1 for all genes
- Compute $\Delta AIC_{season}$
- Rank genes

### Task 3 – IR vs IS Analysis
Either:
- Fit interaction model M2, or
- Fit group-specific models separately

### Task 4 – Visualization
- Plot distribution of seasonal strength
- Plot seasonal curves for top genes
- Compare IR vs IS curves

### Task 5 – Interpretation
Explain:
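- Is RNA expression globally seasonal, with many genes showing positive $\Delta AIC_{season}$?
- Is seasonality sparse, confined to a small subset of genes?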
