# Input File Requirements 

The input file should be an Excel file. The table consists of three parts: the Symbol column, the data columns, and the label columns.

## 1. Column Structure
 
### **Symbol Column**
- **Column Name**: `Symbol`  
- **Content**: Gene identifiers.  
- **Position**: First column.  
 
### **Data Columns (Dual Time Series)**
- **Column Naming Rule**: `{time}_{dup}_{series}`  
  - `time`: Time point (e.g., 0, 4, 8).  
  - `dup`: Replication number (e.g., 0, 1, 2).  
  - `series`: Time series identifier (0 or 1).  
- **Sorting Rules** (applied from outermost to innermost):  
  1. **Split by `series`**: List **all `series=0` columns** first, then **all `series=1` columns**.  
  2. **Group by `dup` within each series**: Columns with the same `dup` are kept together, and the groups are ordered by `dup` (e.g., `dup=0` first, then `dup=1`).  
  3. **Sort by `time` within each `dup` group**: Columns are sorted by `time` in ascending order.  
- **Example Column Names**:  
  - `series=0`: `0_0_0`, `4_0_0`, `8_0_0`, `0_1_0`, `4_1_0`, `8_1_0`  
  - `series=1`: `0_0_1`, `4_0_1`, `8_0_1`, `0_1_1`, `4_1_1`, `8_1_1`  
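
The column order described by the sorting rules can be generated programmatically, which is handy for checking a header before conversion. The sketch below is illustrative only; the helper name `expected_data_columns` is hypothetical and not part of this repository.

```python
def expected_data_columns(times, dups, series=(0, 1)):
    """Return data column names in the order required by the sorting rules."""
    return [
        f"{t}_{d}_{s}"
        for s in series          # all series=0 columns first, then series=1
        for d in sorted(dups)    # replicate groups in ascending order
        for t in sorted(times)   # time points ascending within a replicate
    ]


columns = expected_data_columns(times=[0, 4, 8], dups=[0, 1])
print(columns)
```

For the time points and replicates used in the example names, this reproduces the twelve column names listed above in the same order.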
 
### **Label Columns (4 columns)**
- **Column Names**:  
  - `period_diff`: Indicates **significant difference in period** (1 = significant, 0 = not significant).  
  - `amp_diff`: Indicates **significant difference in amplitude** (1 = significant, 0 = not significant).  
  - `phase_diff`: Indicates **significant difference in phase** (1 = significant, 0 = not significant).  
  - `baseline_diff`: Indicates **significant difference in baseline** (1 = significant, 0 = not significant).  
- **Content**: Binary labels (0 or 1).  
- **Position**: Last four columns.  
- **Note**: If uncertain, set all labels to 1.  
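
A quick check of the label layout might look like the following. This is a minimal sketch that assumes the table has already been read into a header list and a list of rows, and that the four label columns appear in the order listed above; `check_label_layout` is a hypothetical helper, not a function from this repository.

```python
LABEL_COLUMNS = ["period_diff", "amp_diff", "phase_diff", "baseline_diff"]


def check_label_layout(header, rows):
    """Return a list of problems found in the label part of a table.

    header: list of column names; rows: list of rows (lists of cell values).
    """
    problems = []
    # The label columns must be the last four columns of the table.
    if header[-4:] != LABEL_COLUMNS:
        problems.append(f"last four columns are {header[-4:]}, expected {LABEL_COLUMNS}")
    # Every label cell must be a binary value.
    for i, row in enumerate(rows):
        for name, value in zip(LABEL_COLUMNS, row[-4:]):
            if value not in (0, 1):
                problems.append(f"row {i}: {name}={value!r} is not 0 or 1")
    return problems


header = ["Symbol", "0_0_0", "0_0_1"] + LABEL_COLUMNS
rows = [
    ["GENE1", 7.342, 7.401, 1, 0, 1, 0],
    ["GENE2", 8.029, 8.102, 0, 2, 0, 1],  # amp_diff=2 is invalid
]
problems = check_label_layout(header, rows)
print(problems)
```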
 
---
 
## 2. Example Table Structure
| Symbol | 0_0_0 | 4_0_0 | 8_0_0 | 0_1_0 | 4_1_0 | 8_1_0 | 0_0_1 | 4_0_1 | 8_0_1 | 0_1_1 | 4_1_1 | 8_1_1 | period_diff | amp_diff | phase_diff | baseline_diff |
|--------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------------|----------|------------|---------------|
| GENE1  | 7.342 | 4.928 | 3.213 | 8.029 | 5.750 | 4.837 | 7.401 | 4.950 | 3.250 | 8.102 | 5.780 | 4.850 | 1           | 0        | 1          | 0             |
| GENE2  | 8.029 | 5.750 | 3.421 | 8.258 | 5.849 | 3.729 | 8.102 | 5.780 | 3.450 | 8.301 | 5.880 | 3.750 | 0           | 1        | 0          | 1             |
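
To see how one row of this table maps onto the two time series, the values can be regrouped by `series` and `dup`. The sketch below assumes integer time points and the `{time}_{dup}_{series}` naming rule; `split_row` is a hypothetical helper for illustration and is not how the repository's converter is necessarily implemented.

```python
from collections import defaultdict


def split_row(header, row):
    """Group one gene's values as {series: {dup: [(time, value), ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for name, value in zip(header, row):
        parts = name.split("_")
        # Data columns have exactly three integer fields; skip Symbol and labels.
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            continue
        time, dup, series = map(int, parts)
        grouped[series][dup].append((time, float(value)))
    # Sort each replicate by time and convert to plain dicts.
    return {s: {d: sorted(v) for d, v in dups.items()} for s, dups in grouped.items()}


header = ["Symbol",
          "0_0_0", "4_0_0", "8_0_0", "0_1_0", "4_1_0", "8_1_0",
          "0_0_1", "4_0_1", "8_0_1", "0_1_1", "4_1_1", "8_1_1",
          "period_diff", "amp_diff", "phase_diff", "baseline_diff"]
gene1_row = ["GENE1",
             7.342, 4.928, 3.213, 8.029, 5.750, 4.837,
             7.401, 4.950, 3.250, 8.102, 5.780, 4.850,
             1, 0, 1, 0]

gene1 = split_row(header, gene1_row)
print(gene1[0][1])  # series 0, replicate 1
```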
 
---
 
## 3. Data Filling Instructions
1. **Symbol Column**: Fill in unique gene identifiers.  
2. **Data Columns**:  
   - Populate all `{time}_{dup}_{series}` columns in the correct order.  
3. **Label Columns**:  
   - For each gene, assign 0 or 1 for each difference category.  
   - If uncertain, set all labels to 1. 
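
Because the Symbol column is expected to hold unique identifiers, it can be worth scanning for duplicates before conversion. A minimal sketch using only the standard library (`duplicated_symbols` is a hypothetical helper):

```python
from collections import Counter


def duplicated_symbols(symbols):
    """Return identifiers that occur more than once, in first-seen order."""
    counts = Counter(symbols)
    return [s for s in counts if counts[s] > 1]


dupes = duplicated_symbols(["GENE1", "GENE2", "GENE1", "GENE3"])
print(dupes)
```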

In [None]:
from utils.ts import multi_convert_to_ts

# Usage example ---------------------------------------------
# Assumed input file format:
# Symbol,0_1,0_2,0_3,3_1,3_2,3_3,...,21_3,label
# gene1,26.39,23.50,22.03,28.25,19.15,22.67,...,23.29,1
# gene2,66.86,63.80,53.70,60.53,60.44,50.98,...,66.91,0

multi_convert_to_ts(
    input_csv='../example_data/example_data_t2.csv',
    output_ts='../example_data/t2/test.ts',
    problem_name='example',
    label_cols=['amp', 'phase', 'baseline', 'period']
)

In [None]:
# Load T5-small into CIRCALLM; this CIRCALLM instance is then used for pretraining
from argparse import Namespace
from models.circaLLM import CIRCALLM
config_dict = {
    "task_name": "diffrhythm", 
    "model_name": "CIRCALLM", 
    "transformer_type": "encoder_only", 
    "freeze_embedder":False,
    "freeze_encoder":False,
    "freeze_head":False,
    "learning_rate":1e-6,
    "num_epochs":30,
    "n_channels": 2,
    "num_class": 4,
    'reduction': 'mean',
    "type":'real',
    "d_model": None, 
    "seq_len": 72,
    'enable_gradient_checkpointing': False,
    "enable_FAN":True,
    "enable_FAN_gate":True,
    "patch_len": 6, 
    "patch_stride_len": 6, 
    "device": "cpu", 
    "transformer_backbone": "google/flan-t5-small", 
    "model_kwargs": {},
    "t5_config": {
        "architectures": ["T5ForConditionalGeneration"],
        "d_ff": 1024,
        "d_kv": 64,
        "d_model": 512,
        "decoder_start_token_id": 0,
        "dropout_rate": 0.1,
        "eos_token_id": 1,
        "feed_forward_proj": "gelu",
        "initializer_factor": 1.0,
        "is_encoder_decoder": True,
        "layer_norm_epsilon": 1e-06,
        "model_type": "t5",
        "n_positions": 72,
        "num_decoder_layers": 6,
        "num_heads": 8,
        "num_layers": 6,
        "output_past": True,
        "pad_token_id": 0,
        "relative_attention_max_distance": 128,
        "relative_attention_num_buckets": 32,
        "tie_word_embeddings": False,
        "use_cache": True,
        "vocab_size": 32128
    }
}

# Convert the dictionary to a Namespace object
config = Namespace(**config_dict)

model = CIRCALLM(config)
print(model)
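
The relationship between `seq_len`, `patch_len`, and `patch_stride_len` in the configuration can be checked with a little arithmetic. The sketch below assumes plain sliding-window patching without padding, which is a common convention for patch-based time-series models; the internals of `CIRCALLM` are not shown here and may differ. `num_patches` is a hypothetical helper.

```python
def num_patches(seq_len, patch_len, stride):
    """Number of full windows of length patch_len taken every `stride` steps."""
    if seq_len < patch_len:
        return 0
    return (seq_len - patch_len) // stride + 1


n = num_patches(seq_len=72, patch_len=6, stride=6)
print(n)  # 12 non-overlapping patches under the assumption above
```

With the stride equal to the patch length the windows do not overlap, so the count is simply `seq_len / patch_len`.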

In [None]:
from tqdm import tqdm
import numpy as np
import torch  # used by train_epoch, evaluate_epoch and save_checkpoint below
def train_epoch(model,device,train_dataloader,optimizer,criterion,scheduler):
    '''
    Train encoder and classification head (with accelerate enabled)
    '''
    model.to(device)
    model.train()

    all_targets,all_preds,all_scores = [],[],[]
    running_loss, t_running_loss, amp_running_loss, phase_running_loss, mesor_running_loss=0.0, 0.0, 0.0, 0.0, 0.0
    correct, correct_T, correct_Amp, correct_Phase, correct_Mesor = 0,0,0,0,0
    all_sample,t_sample,amp_sample,phase_sample,mesor_sample=0,0,0,0,0
    # i=0
    for batch_data_1, input_mask_1, x_marks_1, batch_data_2, input_mask_2, x_marks_2, targets in tqdm(train_dataloader, total=len(train_dataloader)):
        # if i==2:
        #     break
        # i=i+1
        optimizer.zero_grad()
        batch_data_1, batch_data_2 = batch_data_1.to(device).float(), batch_data_2.to(device).float()
        input_mask_1, input_mask_2 = input_mask_1.long().to(device), input_mask_2.long().to(device)
        x_marks_1, x_marks_2 = x_marks_1.to(device), x_marks_2.to(device)

        mask = (targets != -1).to(device)
        all_sample += mask.sum().item()
        t_sample += mask[:,0].sum().item()
        amp_sample += mask[:,1].sum().item()
        phase_sample += mask[:,2].sum().item()
        mesor_sample += mask[:,3].sum().item()

        labels_clean = torch.where(mask.detach().cpu(), targets, torch.zeros_like(targets))
        target=labels_clean.float().to(device)

        all_targets.extend(targets.detach().int().cpu().numpy())

        with torch.autocast(device_type='cuda', dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float32):
            output = model(x_enc=batch_data_1, input_mask=input_mask_1, x_mark=x_marks_1, 
                           x_enc2=batch_data_2, input_mask2=input_mask_2, x_mark2=x_marks_2, reduction=config.reduction)
            logits=output.logits
            raw_loss = criterion(logits, target) * mask.float()
            loss=raw_loss.sum() / mask.sum().item()

        loss.backward()
        optimizer.step()
        scheduler.step()

        # Accumulate the overall and per-label losses
        running_loss += loss.item()
        t_running_loss += raw_loss.sum(axis=0)[0].item() / (mask.sum(axis=0)[0].item() + 1e-6)
        amp_running_loss += raw_loss.sum(axis=0)[1].item() / (mask.sum(axis=0)[1].item() + 1e-6)
        phase_running_loss += raw_loss.sum(axis=0)[2].item() / (mask.sum(axis=0)[2].item() + 1e-6)
        mesor_running_loss += raw_loss.sum(axis=0)[3].item() / (mask.sum(axis=0)[3].item() + 1e-6)
        # Get the predicted labels
        scores=torch.sigmoid(logits)
        predicted = (scores > 0.5).int()
        predicted = torch.where(mask, predicted, torch.tensor(-1))
        all_preds.extend(predicted.detach().cpu().numpy())

        correct += ((predicted==target.int())*mask).sum().item()
        correct_T += ((predicted[:,0]==target[:,0].int())*mask[:,0]).sum().item()
        correct_Amp += ((predicted[:,1]==target[:,1].int())*mask[:,1]).sum().item()
        correct_Phase += ((predicted[:,2]==target[:,2].int())*mask[:,2]).sum().item()
        correct_Mesor += ((predicted[:,3]==target[:,3].int())*mask[:,3]).sum().item()

        # Store the predicted probability of the positive class (-1 where the label is missing)
        all_scores.extend(torch.where(mask, scores, torch.tensor(-1)).detach().to(torch.float).cpu().numpy())

    all_targets = np.array(all_targets)
    all_preds = np.array(all_preds)
    all_scores = np.array(all_scores)
    
    avg_loss = running_loss / len(train_dataloader)
    avg_t_loss, avg_amp_loss = t_running_loss / len(train_dataloader), amp_running_loss / len(train_dataloader)
    avg_phase_loss, avg_mesor_loss = phase_running_loss / len(train_dataloader), mesor_running_loss / len(train_dataloader)


    avg_accuracy = correct / all_sample

    t_accuracy, amp_accuracy = correct_T / t_sample, correct_Amp / amp_sample
    phase_accuracy, mesor_accuracy = correct_Phase / phase_sample, correct_Mesor / mesor_sample 

    loss={"avg_loss":avg_loss,
        "avg_t_loss":avg_t_loss,
        "avg_amp_loss":avg_amp_loss,
        "avg_phase_loss":avg_phase_loss,
        "avg_mesor_loss":avg_mesor_loss}
    
    accuracy={"avg_accuracy":avg_accuracy,
              "t_accuracy":t_accuracy,
              "amp_accuracy":amp_accuracy,
              "phase_accuracy":phase_accuracy,
              "mesor_accuracy":mesor_accuracy}

    result={
        "loss":loss,
        "accuracy":accuracy,
        "targets":all_targets.tolist(),
        "preds":all_preds.tolist(),
        "scores":all_scores.tolist(),
    }
    return result

def test_epoch(model,dataloader,device,criterion):
    return evaluate_epoch(model,dataloader,device,criterion)
    
def evaluate_epoch(model,dataloader,device,criterion):
    model.eval()
    model.to(device)

    all_targets, all_preds, all_scores = [],[],[]
    running_loss, t_running_loss, amp_running_loss, phase_running_loss, mesor_running_loss=0.0, 0.0, 0.0, 0.0, 0.0
    correct, correct_T, correct_Amp, correct_Phase, correct_Mesor = 0,0,0,0,0
    all_sample,t_sample,amp_sample,phase_sample,mesor_sample=0,0,0,0,0
    # i=0
    with torch.no_grad():
        for batch_data_1, input_mask_1, x_marks_1, batch_data_2, input_mask_2, x_marks_2, targets in tqdm(dataloader, total=len(dataloader)):
            # if i==2:
            #     break
            # i=i+1
            batch_data_1, batch_data_2 = batch_data_1.to(device).float(), batch_data_2.to(device).float()
            input_mask_1, input_mask_2 = input_mask_1.long().to(device), input_mask_2.long().to(device)
            x_marks_1, x_marks_2 = x_marks_1.to(device), x_marks_2.to(device)
            
            mask = (targets != -1).to(device)
            all_sample += mask.sum().item()
            t_sample += mask[:,0].sum().item()
            amp_sample += mask[:,1].sum().item()
            phase_sample += mask[:,2].sum().item()
            mesor_sample += mask[:,3].sum().item()
            labels_clean = torch.where(mask.detach().cpu(), targets, torch.zeros_like(targets))
            target=labels_clean.float().to(device)
            
            all_targets.extend(targets.detach().int().cpu().numpy())
            
            with torch.autocast(device_type='cuda', dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float32):
                output = model(x_enc=batch_data_1, input_mask=input_mask_1, x_mark=x_marks_1,
                               x_enc2=batch_data_2, input_mask2=input_mask_2, x_mark2=x_marks_2, reduction=config.reduction)
                logits=output.logits
                raw_loss = criterion(logits, target) * mask.float()
                loss=raw_loss.sum() / mask.sum().item()

            # Accumulate the overall and per-label losses
            running_loss += loss.item()
            t_running_loss += raw_loss.sum(axis=0)[0].item() / (mask.sum(axis=0)[0].item() + 1e-6)
            amp_running_loss += raw_loss.sum(axis=0)[1].item() / (mask.sum(axis=0)[1].item() + 1e-6)
            phase_running_loss += raw_loss.sum(axis=0)[2].item() / (mask.sum(axis=0)[2].item() + 1e-6)
            mesor_running_loss += raw_loss.sum(axis=0)[3].item() / (mask.sum(axis=0)[3].item() + 1e-6)

            # Get the predicted labels
            scores=torch.sigmoid(logits)*mask
            predicted = (scores > 0.5).int()

            predicted = torch.where(mask, predicted, torch.tensor(-1))
            all_preds.extend(predicted.detach().cpu().numpy())

            # Count the correctly predicted labels
            correct += ((predicted==target.int())*mask).sum().item()
            correct_T += ((predicted[:,0]==target[:,0].int())*mask[:,0]).sum().item()
            correct_Amp += ((predicted[:,1]==target[:,1].int())*mask[:,1]).sum().item()
            correct_Phase += ((predicted[:,2]==target[:,2].int())*mask[:,2]).sum().item()
            correct_Mesor += ((predicted[:,3]==target[:,3].int())*mask[:,3]).sum().item()

            # Store the predicted probability of the positive class (-1 where the label is missing)
            all_scores.extend(torch.where(mask, scores, torch.tensor(-1)).detach().to(torch.float).cpu().numpy())
    
    # Collect arrays for computing the key metrics (precision, recall, F1-score, AUC, AP, mAP)
    all_targets = np.array(all_targets)
    all_preds = np.array(all_preds)
    all_scores = np.array(all_scores)

    # Compute the average loss and accuracy for the epoch
    avg_loss = running_loss / len(dataloader)
    avg_t_loss, avg_amp_loss = t_running_loss / len(dataloader), amp_running_loss / len(dataloader)
    avg_phase_loss, avg_mesor_loss = phase_running_loss / len(dataloader), mesor_running_loss / len(dataloader)

    avg_accuracy = correct / all_sample

    t_accuracy, amp_accuracy = correct_T / t_sample, correct_Amp / amp_sample
    phase_accuracy, mesor_accuracy = correct_Phase / phase_sample, correct_Mesor / mesor_sample 

    loss={"avg_loss":avg_loss,
        "avg_t_loss":avg_t_loss,
        "avg_amp_loss":avg_amp_loss,
        "avg_phase_loss":avg_phase_loss,
        "avg_mesor_loss":avg_mesor_loss}
    
    accuracy={"avg_accuracy":avg_accuracy,
              "t_accuracy":t_accuracy,
              "amp_accuracy":amp_accuracy,
              "phase_accuracy":phase_accuracy,
              "mesor_accuracy":mesor_accuracy}
    
    result={
        "loss":loss,
        "accuracy":accuracy,
        "targets":all_targets.tolist(),
        "preds":all_preds.tolist(),
        "scores":all_scores.tolist(),
    }
    return result
import os
def save_checkpoint(model,savePath="best_model.pth"):#"best_valModel.pth","current_model.pth"
    path = os.path.join("saved_nnets/RealDST-T1")
    #mkdir if not exist
    if not os.path.exists(path):
        os.makedirs(path)
    checkpoint = {
        'model_state_dict': model.state_dict()
    }
    torch.save(checkpoint, os.path.join(path, savePath))

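
Both `train_epoch` and `evaluate_epoch` treat a target of `-1` as a missing label and exclude it from the loss and accuracy counts. The pure-Python sketch below illustrates that convention on plain lists; it is not the repository's implementation, and `masked_accuracy` is a hypothetical helper. Unlike the per-label divisions in the functions above, it returns `None` for a label with no annotated samples instead of dividing by zero.

```python
def masked_accuracy(targets, preds, ignore=-1):
    """Per-column accuracy over rows whose target is not `ignore`."""
    n_cols = len(targets[0])
    correct = [0] * n_cols
    counted = [0] * n_cols
    for t_row, p_row in zip(targets, preds):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t == ignore:
                continue  # missing label: excluded from numerator and denominator
            counted[j] += 1
            correct[j] += int(t == p)
    # None marks a column with no labeled samples.
    return [c / n if n else None for c, n in zip(correct, counted)]


# Columns: period, amplitude, phase, mesor (baseline)
targets = [[1, 0, -1, 1], [0, -1, -1, 1], [1, 1, -1, 0]]
preds = [[1, 0, 1, 0], [1, 0, 0, 1], [1, 1, 0, 0]]
acc = masked_accuracy(targets, preds)
print(acc)
```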

In [None]:
import torch

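checkpoint = torch.load("pretrained/Task2/best_model.pth", map_location="cpu")
# Checkpoints written by save_checkpoint() wrap the weights in 'model_state_dict';
# fall back to the loaded object itself if it is already a bare state dict.
state_dict = checkpoint.get("model_state_dict", checkpoint)
model.load_state_dict(state_dict)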
optimizer = torch.optim.Adam(model.parameters(), lr=config.learning_rate)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
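# reduction='none' keeps one loss value per label so that missing (-1) labels can be masked out
criterion = torch.nn.BCEWithLogitsLoss(reduction='none')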

In [None]:
import os
print(os.getcwd())
file_paths="../example_data/t2/"
result_fold = '../example_data/t2/result/'

In [None]:
from data_provider.classfication_datasets import MulGroup_MultipleDataset
from torch.utils.data import DataLoader
from utils.logging import CustomLogger
from utils.metrics import Metric
from datetime import datetime
import torch
current_time = datetime.now()
c_time = current_time.strftime("%y_%m_%d_%H_%M")

seed=77
for filepath in [file_paths]:  # file_paths is a single directory string, so wrap it in a list
    # Create a MulGroup_MultipleDataset instance and load the data
    test_dataset = MulGroup_MultipleDataset(data_split="test", file_paths=[filepath],seq_len=72,seed=seed)
    torch.manual_seed(seed)
    id=test_dataset.labels[:,0]
    test_dataset.labels=test_dataset.labels[:,1:5].astype(int)
    train_dataloader = DataLoader(test_dataset, batch_size=16, shuffle=False)
    
    # test_result=train_epoch(model,device,train_dataloader,optimizer,criterion,scheduler)
    test_result=test_epoch(model, train_dataloader, device, criterion)
    # print(f"{currentFile}: Test Accuracy: {test_result['accuracy'][0]:.4f}")

    labels=np.array(test_result['targets'])
    predict=np.array(test_result['preds'])
    target1,target2,target3,target4=labels[:,0],labels[:,1],labels[:,2],labels[:,3]
    predict1,predict2,predict3,predict4=predict[:,0],predict[:,1],predict[:,2],predict[:,3]
    target1,target2,target3,target4=target1[target1 !=-1],target2[target2 !=-1],target3[target3 !=-1],target4[target4 !=-1]
    predict1,predict2,predict3,predict4=predict1[predict1 !=-1],predict2[predict2 !=-1],predict3[predict3 !=-1],predict4[predict4 !=-1]
    
    # scores=test_result['scores']
    report1=Metric.get_classification_report(target1,predict1)
    print("Period"+report1)
    report2=Metric.get_classification_report(target2,predict2)
    print("Amplitude"+report2)
    report3=Metric.get_classification_report(target3,predict3)
    print("Phase"+report3)
    report4=Metric.get_classification_report(target4,predict4)
    print("Mesor"+report4)

    sss=np.array(test_result['scores'])
    score1,score2,score3,score4=sss[:,0],sss[:,1],sss[:,2],sss[:,3]
    score1,score2,score3,score4=score1[score1 !=-1],score2[score2 !=-1],score3[score3 !=-1],score4[score4 !=-1]

    resSave={
    'accuracy':test_result['accuracy'],
    'target':{'period':target1.tolist(),'amp':target2.tolist(),'phase':target3.tolist(),'mesor':target4.tolist()},
    'preds':{'period':predict1.tolist(),'amp':predict2.tolist(),'phase':predict3.tolist(),'mesor':predict4.tolist()},
    'scores':{'period':score1.tolist(),'amp':score2.tolist(),'phase':score3.tolist(),'mesor':score4.tolist()}}
    resSave['ID']=id.tolist()

    Metric.save_metrics(resSave, result_fold, "current_res.json", 0, "example", mode='w')

    break
    

In [None]:
import utils.pro_json as pro_json

metrics_result = pro_json.calculate_multiclass_metrics(json_path='../example_data/t2/result/example/current_res.json')

pro_json.process_multiclass_json_to_csv(
    json_path="../example_data/t2/result/example/current_res.json", 
    output_fold="../example_data/t2/result/example/"  
)