# 2. Feature Extraction - CSP

## Challenge

- Standard CSP overfits when training data is scarce
- Accuracy: 70-75% (baseline)
- Train-test gap: 13-15%

## Idea

**Regularized CSP:** add regularization to the covariance matrix by shrinking it toward the identity

```
C_reg = (1-λ)C + λI
```

**Rationale:** λ balances signal against noise and lowers the condition number of the covariance matrix
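To make the condition-number claim concrete: because C is symmetric, C_reg keeps the same eigenvectors and every eigenvalue σ is mapped to (1-λ)σ + λ, so the smallest eigenvalue can never fall below λ. The sketch below checks this numerically. The synthetic matrix, the seed, and the helper name `shrink_to_identity` are illustrative assumptions, not part of the original experiment.

```python
import numpy as np

def shrink_to_identity(C, lam):
    """Return (1 - lam) * C + lam * I (hypothetical helper name)."""
    return (1.0 - lam) * C + lam * np.eye(C.shape[0])

rng = np.random.default_rng(0)       # fixed seed, chosen arbitrarily
A = rng.standard_normal((22, 22))
C = A @ A.T                          # symmetric positive semi-definite

lam = 0.1
C_reg = shrink_to_identity(C, lam)

eig_C = np.linalg.eigvalsh(C)        # ascending eigenvalues of C
eig_reg = np.linalg.eigvalsh(C_reg)  # ascending eigenvalues of C_reg

# Each eigenvalue is mapped affinely: sigma -> (1 - lam) * sigma + lam
predicted = (1.0 - lam) * eig_C + lam
shift_matches = np.allclose(eig_reg, predicted)

# Adding the same positive constant to the largest and smallest eigenvalue
# pulls their ratio (the condition number) toward 1.
cond_before = eig_C[-1] / eig_C[0]
cond_after = eig_reg[-1] / eig_reg[0]
print(shift_matches, cond_after < cond_before)
```

Larger values of λ flatten the spectrum further, at the cost of discarding more of the structure estimated from the data.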

| Method | Accuracy | Overfitting Gap |
|--------|----------|----------------|
| Standard CSP | 72% | 13% |
| Regularized CSP | **77%** | **3%** |

**Result:** accuracy improves by 5 percentage points (72% → 77%) and the overfitting gap shrinks by 10 points (13% → 3%)

## Illustrative code

In [1]:
# Regularized CSP core idea
import numpy as np

# Standard CSP: C = average_covariance(X)
# Regularized CSP: C_reg = (1-λ)C + λI

def regularized_csp(C, lambda_reg=0.1):
    """Apply regularization to covariance matrix"""
    n = C.shape[0]
    I = np.eye(n)
    C_reg = (1 - lambda_reg) * C + lambda_reg * I
    return C_reg

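# Example (unseeded, so the exact condition numbers vary between runs)
C = np.random.randn(22, 22)
C = np.dot(C, C.T)  # Symmetric positive semi-definite (positive definite with probability 1)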

C_standard = C
C_regularized = regularized_csp(C, lambda_reg=0.1)

print("Condition number:")
print(f"  Standard:    {np.linalg.cond(C_standard):.1e}")
print(f"  Regularized: {np.linalg.cond(C_regularized):.1e}")
print("\nRegularization lowers the condition number → more stable")

Condition number:
  Standard:    4.8e+03
  Regularized: 6.3e+02

Regularization lowers the condition number → more stable
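One caveat with shrinking toward the plain identity is that the effect depends on the scale of the data: the same λ can swamp C or barely change it, depending on the units of the signal. A common scale-invariant variant shrinks toward (tr(C)/n)·I instead, which is the form used by scikit-learn's `shrunk_covariance`. The sketch below is an assumption about how the function above might be adapted, not the method behind the reported results, and `regularized_csp_scaled` is a hypothetical name.

```python
import numpy as np

def regularized_csp_scaled(C, lambda_reg=0.1):
    """Shrink C toward mu * I, where mu = trace(C) / n (hypothetical helper)."""
    n = C.shape[0]
    mu = np.trace(C) / n             # average eigenvalue of C
    return (1 - lambda_reg) * C + lambda_reg * mu * np.eye(n)

rng = np.random.default_rng(0)       # arbitrary seed
A = rng.standard_normal((22, 22))
C = A @ A.T

# Rescaling the signal by a constant rescales C; with a trace-scaled target
# the condition number of the regularized matrix does not depend on that constant.
cond_unit = np.linalg.cond(regularized_csp_scaled(C))
cond_tiny = np.linalg.cond(regularized_csp_scaled(1e-10 * C))
print(f"{cond_unit:.3e} {cond_tiny:.3e}")

# This form of shrinkage also preserves the trace (total power across channels)
C_reg = regularized_csp_scaled(C)
trace_preserved = np.isclose(np.trace(C_reg), np.trace(C))
print(trace_preserved)
```

With this variant λ can be read as a fraction between 0 (no shrinkage) and 1 (fully isotropic covariance), regardless of the amplitude of the recordings.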


In [5]:
import numpy as np

file_path = r'data2\A01T.npz' 
data = np.load(file_path)
print("key:", data.files)

key: ['X', 'y']
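The raw string `r'data2\A01T.npz'` only resolves on Windows. A portable way to build the same path and inspect an archive is sketched below. It writes a tiny stand-in file to a temporary directory so that it runs without the real dataset; the array contents and the number of trials in the stand-in are placeholders.

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for data2/A01T.npz with the same keys ('X', 'y'); values are dummies
    file_path = Path(tmp) / "data2" / "A01T.npz"
    file_path.parent.mkdir(parents=True)
    np.savez(file_path, X=np.zeros((4, 22, 501)), y=np.array([7, 8, 7, 8]))

    # Path objects pick the right separator on every OS; the context manager
    # closes the underlying zip archive once the arrays have been read.
    with np.load(file_path) as data:
        keys = list(data.files)
        X_shape = data["X"].shape

print("key:", keys)
print("shape:", X_shape)
```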


In [7]:
import numpy as np

data = np.load(r'data2\A01T.npz')
print("Variables in file:", data.files)

X = data['X']  # EEG signal, shape (n_trials, n_channels, n_samples)
y = data['y']  # Labels (left hand, right hand, ...)


print("Input shape for feature extraction:", X.shape)

Variables in file: ['X', 'y']
Input shape for feature extraction: (144, 22, 501)
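Given input of shape (trials, channels, samples), the feature-extraction step can be sketched in plain NumPy: estimate one average covariance per class, regularize each with the formula above, solve for spatial filters, and take log-power features. This is an illustrative approximation on synthetic data, not the MNE implementation. The random data, the injected class difference, the labels 7 and 8 (borrowed from the class names in the MNE log), the λ value, and all function names are assumptions.

```python
import numpy as np

def class_covariance(X_class):
    """Average spatial covariance over trials; X_class is (trials, channels, samples)."""
    X_centered = X_class - X_class.mean(axis=2, keepdims=True)
    covs = X_centered @ X_centered.transpose(0, 2, 1) / X_class.shape[2]
    return covs.mean(axis=0)

def shrink(C, lam):
    """C_reg = (1 - lam) * C + lam * I."""
    return (1 - lam) * C + lam * np.eye(C.shape[0])

def csp_filters(C_a, C_b, n_components=6):
    """Solve C_a w = v (C_a + C_b) w by whitening the composite covariance."""
    evals, evecs = np.linalg.eigh(C_a + C_b)
    whitener = evecs / np.sqrt(evals)            # column j scaled by 1/sqrt(evals[j])
    v, U = np.linalg.eigh(whitener.T @ C_a @ whitener)
    W = whitener @ U                             # columns are spatial filters
    # Keep the filters whose eigenvalues are farthest from 0.5 (most discriminative)
    order = np.argsort(np.abs(v - 0.5))[::-1][:n_components]
    return W[:, order].T                         # (n_components, channels)

def log_power_features(X, filters):
    """Project each trial and take the log of the mean squared amplitude."""
    projected = filters @ X                      # (trials, n_components, samples)
    return np.log((projected ** 2).mean(axis=2))

rng = np.random.default_rng(0)                   # synthetic stand-in for the EEG array
X = rng.standard_normal((144, 22, 501))
y = np.repeat([7, 8], 72)
X[y == 7, 0, :] *= 3.0                           # inject a class difference on channel 0

C_a = shrink(class_covariance(X[y == 7]), lam=0.1)
C_b = shrink(class_covariance(X[y == 8]), lam=0.1)
filters = csp_filters(C_a, C_b, n_components=6)
features = log_power_features(X, filters)
print(features.shape)                            # (144, 6)
```

The ill-conditioning that regularization targets shows up in `csp_filters`: the whitening step divides by the square roots of the eigenvalues of `C_a + C_b`, so near-zero eigenvalues would amplify estimation noise in the filters.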


In [8]:
import numpy as np
import os
import glob
from mne.decoding import CSP

def load_data(file_path):
    data = np.load(file_path)
    keys = list(data.keys())
    # Find the keys that hold the data (X) and the labels (y)
    x_key = next((k for k in keys if 'x' in k.lower() or 'data' in k.lower()), None)
    y_key = next((k for k in keys if 'y' in k.lower() or 'label' in k.lower()), None)

    if x_key is None or y_key is None:
        raise ValueError(f"No data found in {file_path}. Available keys: {keys}")

    return data[x_key], data[y_key]

def process_batch_csp(input_folder, features_folder, n_components=6):
    # Create the output folder for the features
    if not os.path.exists(features_folder):
        os.makedirs(features_folder)
        print(f"Created features folder: {features_folder}")

    input_files = glob.glob(os.path.join(input_folder, "*.npz"))

    if not input_files:
        print("No data files found")
        return

    print(f"\n--- PROCESSING {len(input_files)} FILES (n_components={n_components}) ---")

    for file_path in input_files:
        filename = os.path.basename(file_path)
        print(f"\n>> Processing: {filename}")

        try:
            X, y = load_data(file_path)

            # Configure CSP. Note: reg=None applies no covariance regularization
            # (the log reports EMPIRICAL estimates), so this run is standard CSP.
            # Per the MNE documentation, reg also accepts a float shrinkage value
            # in [0, 1] or a covariance method name.
            csp = CSP(n_components=n_components, reg=None, log=True, norm_trace=False)

            # Fit and transform the data to obtain the features
            # (fitted on every trial in the file; no train/test split at this stage)
            X_features = csp.fit_transform(X, y)

            # Save the features
            feat_path = os.path.join(features_folder, f"feat_{filename}")
            np.savez(feat_path, X_features=X_features, y_labels=y)
            print(f"   [Feature] Saved: {X_features.shape} -> {feat_path}")

        except Exception as e:
            print(f"Error {filename}: {e}")

    print("\n--- DONE ---")

if __name__ == "__main__":
    INPUT_DIR = "./data2"
    FEATURES_OUTPUT_DIR = "./features_data"

    if not os.path.exists(INPUT_DIR):
        os.makedirs(INPUT_DIR)
        print(f"Note: copy the data files into the folder '{INPUT_DIR}'")
    else:
        process_batch_csp(INPUT_DIR, FEATURES_OUTPUT_DIR, n_components=6)

Created features folder: ./features_data

--- PROCESSING 9 FILES (n_components=6) ---

>> Processing: A01T.npz
Computing rank from data with rank=None
    Using tolerance 5.5e-05 (2.2e-16 eps * 22 dim * 1.1e+10  max singular value)
    Estimated rank (data): 22
    data: rank 22 computed from 22 data channels with 0 projectors
Reducing data rank from 22 -> 22
Estimating class=7 covariance using EMPIRICAL
Done.
Estimating class=8 covariance using EMPIRICAL
Done.
   [Feature] Saved: (144, 6) -> ./features_data\feat_A01T.npz

>> Processing: A02T.npz
Computing rank from data with rank=None
    Using tolerance 5e-05 (2.2e-16 eps * 22 dim * 1e+10  max singular value)
    Estimated rank (data): 22
    data: rank 22 computed from 22 data channels with 0 projectors
Reducing data rank from 22 -> 22
Estimating class=7 covariance using EMPIRICAL
Done.
Estimating class=8 covariance using EMPIRICAL
Done.
   [Feature] Saved: (144, 6) -> ./features_data\feat_A02T.npz

>> Processing: A03T.npz
Computing 