# Predicting Cancer Outcomes with Radiomics and AI

## A Comprehensive Analysis Pipeline

This notebook provides a complete workflow for analyzing radiomic features extracted from medical images to predict cancer outcomes using machine learning and statistical methods.

### Table of Contents
1. [Introduction](#introduction)
2. [Data Loading and Preprocessing](#data-loading)
3. [Feature Exploration](#feature-exploration)
4. [Statistical Analysis](#statistical-analysis)
5. [Machine Learning Models](#machine-learning)
6. [Results and Visualization](#results)
7. [Conclusion](#conclusion)

---

## Introduction

Radiomics is the extraction of quantitative features from medical images to characterize tumor phenotypes. Combined with artificial intelligence, it offers powerful tools for predicting cancer outcomes, treatment responses, and patient survival.

**Key Objectives:**
- Extract and analyze radiomic features from medical images
- Identify features predictive of cancer outcomes
- Build and evaluate machine learning models for outcome prediction
- Visualize results and interpret findings


In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Scientific computing
from scipy import stats
from scipy.stats import ttest_ind, mannwhitneyu
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, roc_curve, confusion_matrix, classification_report
)

# Visualization settings
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

print("Libraries imported successfully!")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")


## Data Loading and Preprocessing

In this section, we'll load radiomic features extracted from medical images. In practice, these features would be extracted using tools like PyRadiomics from DICOM or NIfTI images.
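
To make the feature names used below more concrete, here is a minimal sketch of how a few first-order features could be computed from the voxels inside a region of interest. The function `first_order_features`, the bin count, and the synthetic volume are assumptions for illustration only; PyRadiomics uses its own discretization settings and definitions, so its values will differ.

```python
import numpy as np

def first_order_features(image, mask, n_bins=32):
    """Compute a few first-order statistics from the voxels inside a mask.

    Simplified illustration, not a reimplementation of PyRadiomics.
    """
    voxels = image[mask.astype(bool)].astype(float)
    mean = voxels.mean()
    std = voxels.std()

    # Shannon entropy (base 2) of the discretized intensity histogram
    counts, _ = np.histogram(voxels, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    entropy = float(-np.sum(p * np.log2(p)))

    # Skewness: third standardized moment of the intensities
    skewness = float(np.mean((voxels - mean) ** 3) / std ** 3) if std > 0 else 0.0

    return {
        'firstorder_Mean': float(mean),
        'firstorder_StdDev': float(std),
        'firstorder_Entropy': entropy,
        'firstorder_Skewness': skewness,
    }

# Hypothetical 3D volume and cubic region of interest (synthetic, for illustration)
rng = np.random.default_rng(0)
image = rng.normal(100, 20, size=(32, 32, 32))
mask = np.zeros(image.shape, dtype=bool)
mask[8:24, 8:24, 8:24] = True

features = first_order_features(image, mask)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Each region of interest yields one row of such values, which is the tabular form the rest of this notebook works with.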

For demonstration purposes, we'll generate synthetic radiomic data that mimics real-world patterns.


In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Generate synthetic radiomic features
# In practice, these would be extracted from medical images using PyRadiomics
n_samples = 200

# Simulate various types of radiomic features
data = {
    # First-order statistics (intensity distribution)
    'firstorder_Mean': np.random.normal(100, 20, n_samples),
    'firstorder_StdDev': np.random.normal(15, 5, n_samples),
    'firstorder_Median': np.random.normal(98, 18, n_samples),
    'firstorder_Entropy': np.random.normal(3.5, 0.5, n_samples),
    'firstorder_Energy': np.random.normal(0.05, 0.01, n_samples),
    'firstorder_Skewness': np.random.normal(0.2, 0.5, n_samples),
    'firstorder_Kurtosis': np.random.normal(2.5, 0.8, n_samples),
    
    # GLCM (Gray Level Co-occurrence Matrix) features
    'glcm_Correlation': np.random.normal(0.7, 0.1, n_samples),
    'glcm_Contrast': np.random.normal(5, 2, n_samples),
    'glcm_Homogeneity': np.random.normal(0.8, 0.05, n_samples),
    'glcm_Energy': np.random.normal(0.3, 0.05, n_samples),
    'glcm_Dissimilarity': np.random.normal(2, 0.5, n_samples),
    
    # Shape features
    'shape_Volume': np.random.normal(50, 15, n_samples),
    'shape_Sphericity': np.random.normal(0.7, 0.1, n_samples),
    'shape_SurfaceArea': np.random.normal(100, 30, n_samples),
    'shape_Compactness': np.random.normal(0.6, 0.15, n_samples),
    
    # GLRLM (Gray Level Run Length Matrix) features
    'glrlm_ShortRunEmphasis': np.random.normal(0.85, 0.05, n_samples),
    'glrlm_LongRunEmphasis': np.random.normal(1.2, 0.3, n_samples),
    'glrlm_RunPercentage': np.random.normal(0.7, 0.1, n_samples),
}

df = pd.DataFrame(data)

# Create an outcome variable correlated with some features
# In practice, this would be clinical outcome data (e.g., survival, treatment response)
# Synthetic rule: entropy above the median and GLCM correlation below the median
# each add 0.4 to the score; a random term adds 1.0 with probability 0.2
outcome_probability = (
    (df['firstorder_Entropy'] > df['firstorder_Entropy'].median()).astype(float) * 0.4 +
    (df['glcm_Correlation'] < df['glcm_Correlation'].median()).astype(float) * 0.4 +
    np.random.binomial(1, 0.2, n_samples).astype(float)
).clip(0, 1)

df['outcome'] = (outcome_probability > 0.5).astype(int)  # Binary outcome: 0 = poor, 1 = good
df['outcome_continuous'] = outcome_probability  # Continuous score in [0, 1] for regression

# Create survival data (for survival analysis)
# Higher scores (outcome = 1, "good") get longer expected survival, consistent with the labels above
df['survival_time'] = np.random.exponential(scale=365, size=n_samples) * (1 + outcome_probability)
df['survival_status'] = np.random.binomial(1, 0.7, n_samples)

print(f"Dataset created: {df.shape[0]} samples x {df.shape[1]} columns (features plus outcome columns)")
print(f"\nOutcome distribution:")
print(df['outcome'].value_counts())
print(f"\nFirst few rows:")
df.head()


In [None]:
# Data quality checks
print("Data Quality Assessment:")
print("=" * 50)
print(f"Missing values: {df.isnull().sum().sum()}")
print(f"Duplicate rows: {df.duplicated().sum()}")
print(f"\nFeature statistics:")
feature_cols = [col for col in df.columns if col not in ['outcome', 'outcome_continuous', 'survival_time', 'survival_status']]
print(df[feature_cols].describe())


## Feature Exploration

Let's explore the distribution of radiomic features and their relationships with outcomes.


In [None]:
# Visualize feature distributions by outcome
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
axes = axes.flatten()

# Select key features to visualize
key_features = ['firstorder_Entropy', 'glcm_Correlation', 'glcm_Contrast', 
                'shape_Volume', 'shape_Sphericity', 'glrlm_ShortRunEmphasis']

for i, feature in enumerate(key_features):
    ax = axes[i]
    
    # Box plot comparing outcomes
    outcome_0 = df[df['outcome'] == 0][feature]
    outcome_1 = df[df['outcome'] == 1][feature]
    
    box_data = [outcome_0, outcome_1]
    bp = ax.boxplot(box_data, patch_artist=True)
    # Set tick labels separately so the call does not depend on the boxplot label keyword
    ax.set_xticklabels(['Poor Outcome', 'Good Outcome'])
    
    # Color the boxes
    colors = ['#ff6b6b', '#51cf66']
    for patch, color in zip(bp['boxes'], colors):
        patch.set_facecolor(color)
        patch.set_alpha(0.7)
    
    ax.set_title(f'{feature}\nDistribution by Outcome', fontsize=12, fontweight='bold')
    ax.set_ylabel('Feature Value')
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.suptitle('Radiomic Feature Distributions by Outcome', fontsize=16, y=1.02)
plt.show()


In [None]:
# Correlation heatmap
corr_matrix = df[feature_cols].corr()

plt.figure(figsize=(14, 12))
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))  # Show only lower triangle
sns.heatmap(corr_matrix, mask=mask, annot=True, fmt='.2f', cmap='coolwarm', 
            center=0, square=True, linewidths=0.5, cbar_kws={"shrink": 0.8})
plt.title('Feature Correlation Matrix', fontsize=16, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()


## Statistical Analysis

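We'll perform statistical tests to identify features that differ significantly between outcome groups. A two-sample t-test compares the group means of each feature and is a reasonable choice here because the synthetic features are drawn from normal distributions. Real radiomic features are often skewed, in which case a rank-based test such as Mann-Whitney U is safer. Because every feature is tested separately, some p-values below 0.05 are expected by chance alone, so uncorrected results should be treated as exploratory.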
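
When many features are tested at once, it is common to adjust the p-values for multiple comparisons. The following is a minimal sketch, not part of the pipeline below, of Welch's t-test and the Mann-Whitney U test followed by a Benjamini-Hochberg adjustment. The helper `benjamini_hochberg` and the synthetic groups are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working back from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

# Synthetic example: 5 features, only feature 0 truly differs between groups
rng = np.random.default_rng(0)
group0 = rng.normal(0.0, 1.0, size=(100, 5))
group1 = rng.normal(0.0, 1.0, size=(100, 5))
group1[:, 0] += 1.5

p_welch = np.array([
    ttest_ind(group0[:, j], group1[:, j], equal_var=False).pvalue for j in range(5)
])
p_mwu = np.array([
    mannwhitneyu(group0[:, j], group1[:, j], alternative='two-sided').pvalue for j in range(5)
])

q_welch = benjamini_hochberg(p_welch)
q_mwu = benjamini_hochberg(p_mwu)

for j in range(5):
    print(f"feature {j}: Welch p={p_welch[j]:.4g} (adj {q_welch[j]:.4g}), "
          f"MWU p={p_mwu[j]:.4g} (adj {q_mwu[j]:.4g})")
```

The adjusted values can be thresholded at 0.05 in the same way as the raw p-values, with fewer chance findings expected among the features that pass.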


In [None]:
# Statistical comparison between outcome groups
def compare_groups_ttest(data, feature_cols, outcome_col='outcome'):
    """Compare features between two outcome groups using t-test."""
    group0 = data[data[outcome_col] == 0]
    group1 = data[data[outcome_col] == 1]
    
    results = []
    for feature in feature_cols:
        values0 = group0[feature].dropna()
        values1 = group1[feature].dropna()
        
        if len(values0) >= 3 and len(values1) >= 3:
            # Perform Welch's t-test (does not assume equal variances in the two groups)
            statistic, p_value = ttest_ind(values0, values1, equal_var=False)
            
            # Calculate descriptive statistics
            mean0, std0 = values0.mean(), values0.std()
            mean1, std1 = values1.mean(), values1.std()
            
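            # Effect size (Cohen's d) using the sample-size-weighted pooled standard deviation,
            # since the two outcome groups generally differ in size
            n0, n1 = len(values0), len(values1)
            pooled_std = np.sqrt(((n0 - 1) * std0**2 + (n1 - 1) * std1**2) / (n0 + n1 - 2))
            cohens_d = (mean0 - mean1) / pooled_std if pooled_std > 0 else 0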
            
            results.append({
                'Feature': feature,
                'Group0_Mean': mean0,
                'Group0_Std': std0,
                'Group1_Mean': mean1,
                'Group1_Std': std1,
                'Mean_Difference': mean0 - mean1,
                'T_Statistic': statistic,
                'P_Value': p_value,
                'Cohens_D': cohens_d,
                'Significant': p_value < 0.05
            })
    
    return pd.DataFrame(results).sort_values('P_Value')

# Perform comparisons
stat_results = compare_groups_ttest(df, feature_cols)

print("Statistical Comparison Results (T-Test):")
print("=" * 80)
print(stat_results.to_string(index=False))

# Highlight significant features
significant_features = stat_results[stat_results['Significant']]
print(f"\n\nSignificant Features (p < 0.05): {len(significant_features)}")
print("Top 5 most significant:")
print(significant_features[['Feature', 'P_Value', 'Mean_Difference', 'Cohens_D']].head())


In [None]:
# Visualize significant features
top_significant = significant_features.head(6)

fig, axes = plt.subplots(2, 3, figsize=(18, 12))
axes = axes.flatten()

for i, (idx, row) in enumerate(top_significant.iterrows()):
    ax = axes[i]
    feature = row['Feature']
    
    # Violin plot
    outcome_0_data = df[df['outcome'] == 0][feature]
    outcome_1_data = df[df['outcome'] == 1][feature]
    
    parts = ax.violinplot([outcome_0_data, outcome_1_data], 
                          positions=[0, 1], showmeans=True, showmedians=True)
    
    # Color the violins by outcome group (red = poor, green = good)
    for pc, color in zip(parts['bodies'], ['#ff6b6b', '#51cf66']):
        pc.set_facecolor(color)
        pc.set_alpha(0.7)
    
    ax.set_xticks([0, 1])
    ax.set_xticklabels(['Poor Outcome', 'Good Outcome'])
    ax.set_ylabel('Feature Value')
    ax.set_title(f'{feature}\np = {row["P_Value"]:.4f}', fontweight='bold')
    ax.grid(True, alpha=0.3)

plt.suptitle('Significant Features: Distribution Comparison', 
             fontsize=16, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()


### Principal Component Analysis (PCA)

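PCA reduces dimensionality by projecting the standardized features onto orthogonal directions of maximal variance. It is a common choice in radiomics because many features are strongly correlated with one another, so a small number of components can capture most of the variance. The synthetic features in this notebook are generated independently, so expect PCA to compress them far less than it would real radiomic data.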
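
How much PCA compresses a dataset depends on how correlated its features are. The sketch below compares the number of components needed to reach 95% of the variance for correlated and for independent features. The data, the three-factor structure, and the helper `components_for_variance` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_features = 300, 12

# Correlated features: 12 noisy mixtures of 3 hidden factors (assumed structure)
latent = rng.normal(size=(n_samples, 3))
loadings = rng.normal(size=(3, n_features))
X_correlated = latent @ loadings + 0.1 * rng.normal(size=(n_samples, n_features))

# Independent features: no shared structure between columns
X_independent = rng.normal(size=(n_samples, n_features))

def components_for_variance(X, threshold=0.95):
    """Number of principal components needed to explain `threshold` of the variance."""
    X_scaled = StandardScaler().fit_transform(X)
    # A float n_components asks PCA to keep just enough components for that variance share
    pca = PCA(n_components=threshold, svd_solver='full').fit(X_scaled)
    return int(pca.n_components_)

n_corr = components_for_variance(X_correlated)
n_indep = components_for_variance(X_independent)

print(f"Correlated features:  {n_corr} of {n_features} components for 95% variance")
print(f"Independent features: {n_indep} of {n_features} components for 95% variance")
```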


In [None]:
# Prepare data for PCA
X_features = df[feature_cols].fillna(df[feature_cols].median())

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_features)

# Perform PCA
pca = PCA()
X_pca = pca.fit_transform(X_scaled)

# Determine number of components explaining 95% variance
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)
n_components_95 = np.where(cumulative_variance >= 0.95)[0][0] + 1

print(f"PCA Results:")
print(f"Number of features: {len(feature_cols)}")
print(f"Components explaining 95% variance: {n_components_95}")
print(f"\nVariance explained by first 5 components:")
for i in range(min(5, len(pca.explained_variance_ratio_))):
    print(f"  PC{i+1}: {pca.explained_variance_ratio_[i]:.4f} ({pca.explained_variance_ratio_[i]*100:.2f}%)")

# Visualize explained variance
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Scree plot
axes[0].plot(range(1, len(pca.explained_variance_ratio_)+1), 
             pca.explained_variance_ratio_, 'bo-', linewidth=2, markersize=8)
axes[0].axvline(x=n_components_95, color='r', linestyle='--', 
                label=f'{n_components_95} components (95% variance)')
axes[0].set_xlabel('Principal Component')
axes[0].set_ylabel('Explained Variance Ratio')
axes[0].set_title('Scree Plot')
axes[0].grid(True, alpha=0.3)
axes[0].legend()

# Cumulative variance plot
axes[1].plot(range(1, len(cumulative_variance)+1), 
             cumulative_variance, 'go-', linewidth=2, markersize=8)
axes[1].axhline(y=0.95, color='r', linestyle='--', label='95% variance threshold')
axes[1].axvline(x=n_components_95, color='r', linestyle='--')
axes[1].set_xlabel('Number of Components')
axes[1].set_ylabel('Cumulative Explained Variance')
axes[1].set_title('Cumulative Explained Variance')
axes[1].grid(True, alpha=0.3)
axes[1].legend()

plt.tight_layout()
plt.show()


In [None]:
# Visualize PCA in 2D with outcome coloring
pca_2d = PCA(n_components=2)
X_pca_2d = pca_2d.fit_transform(X_scaled)

plt.figure(figsize=(10, 8))
scatter = plt.scatter(X_pca_2d[:, 0], X_pca_2d[:, 1], 
                     c=df['outcome'], cmap='RdYlGn', 
                     alpha=0.6, s=50, edgecolors='black', linewidth=0.5)
plt.colorbar(scatter, label='Outcome (0=Poor, 1=Good)')
plt.xlabel(f'PC1 ({pca_2d.explained_variance_ratio_[0]*100:.2f}% variance)')
plt.ylabel(f'PC2 ({pca_2d.explained_variance_ratio_[1]*100:.2f}% variance)')
plt.title('PCA: First Two Principal Components Colored by Outcome', 
          fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.show()


## Machine Learning Models

Now we'll build and evaluate machine learning models to predict cancer outcomes from radiomic features.


In [None]:
# Prepare data for machine learning
# Split into training and testing sets
X = df[feature_cols].fillna(df[feature_cols].median())
y = df['outcome']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")
print(f"\nClass distribution in training set:")
print(y_train.value_counts())
print(f"\nClass distribution in test set:")
print(y_test.value_counts())

# Scale features
scaler_ml = StandardScaler()
X_train_scaled = scaler_ml.fit_transform(X_train)
X_test_scaled = scaler_ml.transform(X_test)

X_train_scaled = pd.DataFrame(X_train_scaled, columns=feature_cols)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=feature_cols)


### Feature Selection

Select the features with the highest univariate ANOVA F-scores before training models. The selector is fitted on the training set only and then applied unchanged to the test set.
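
For a binary outcome, the ANOVA F-score used by `f_classif` is the square of the pooled-variance two-sample t-statistic, so this selection step ranks features in the same way as an equal-variance t-test. A minimal check on synthetic data (the arrays here are illustrative assumptions, not the notebook's dataset):

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = rng.integers(0, 2, size=120)
X[y == 1, 0] += 1.0  # make feature 0 informative

# ANOVA F-test per feature, as used by SelectKBest(score_func=f_classif)
f_scores, f_pvalues = f_classif(X, y)

# Pooled-variance (equal_var=True) t-test per feature, computed column-wise
t_stats, t_pvalues = ttest_ind(X[y == 0], X[y == 1], axis=0)

for j in range(X.shape[1]):
    print(f"feature {j}: F={f_scores[j]:.3f}, t^2={t_stats[j]**2:.3f}, p={f_pvalues[j]:.4g}")
```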


In [None]:
# Feature selection using SelectKBest with f_classif
# This selects features with the highest F-scores
n_features_to_select = 10
selector = SelectKBest(score_func=f_classif, k=n_features_to_select)
X_train_selected = selector.fit_transform(X_train_scaled, y_train)
X_test_selected = selector.transform(X_test_scaled)

# Get selected feature names
selected_feature_indices = selector.get_support(indices=True)
selected_features = [feature_cols[i] for i in selected_feature_indices]

print(f"Selected {len(selected_features)} features:")
for i, feat in enumerate(selected_features, 1):
    score = selector.scores_[feature_cols.index(feat)]
    print(f"{i}. {feat} (F-score: {score:.2f})")

# Convert back to DataFrames
X_train_selected = pd.DataFrame(X_train_selected, columns=selected_features)
X_test_selected = pd.DataFrame(X_test_selected, columns=selected_features)


### Random Forest Classifier

Random Forest is well-suited for radiomics data as it can handle non-linear relationships and feature interactions.


In [None]:
# Train Random Forest classifier
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=2,
    random_state=42,
    n_jobs=-1
)

rf_model.fit(X_train_selected, y_train)

# Make predictions
y_pred_rf = rf_model.predict(X_test_selected)
y_pred_proba_rf = rf_model.predict_proba(X_test_selected)[:, 1]

# Evaluate model
print("Random Forest Classifier Results:")
print("=" * 60)
print(f"Accuracy: {accuracy_score(y_test, y_pred_rf):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_rf):.4f}")
print(f"Recall: {recall_score(y_test, y_pred_rf):.4f}")
print(f"F1-Score: {f1_score(y_test, y_pred_rf):.4f}")
print(f"ROC-AUC: {roc_auc_score(y_test, y_pred_proba_rf):.4f}")

print("\nClassification Report:")
print(classification_report(y_test, y_pred_rf))

# Feature importance
feature_importance = pd.DataFrame({
    'feature': selected_features,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

print("\nTop 5 Most Important Features:")
print(feature_importance.head())


### Support Vector Machine (SVM)

SVM is effective for high-dimensional data and can capture complex decision boundaries.
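
As a small illustration of what "complex decision boundaries" means in practice, the sketch below compares a linear and an RBF kernel on a toy dataset of two concentric rings, which no straight line can separate. The dataset and parameter values are assumptions for illustration and are unrelated to the radiomic features.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: the classes are not linearly separable
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Same data, two kernels; score() returns accuracy on the held-out split
linear_acc = SVC(kernel='linear').fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel='rbf', gamma='scale').fit(X_train, y_train).score(X_test, y_test)

print(f"Linear kernel accuracy: {linear_acc:.3f}")
print(f"RBF kernel accuracy:    {rbf_acc:.3f}")
```

Because the RBF kernel depends on distances between samples, features should be on comparable scales before fitting, which is why the pipeline standardizes them first.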


In [None]:
# Train SVM classifier
svm_model = SVC(
    kernel='rbf',
    C=1.0,
    gamma='scale',
    probability=True,
    random_state=42
)

svm_model.fit(X_train_selected, y_train)

# Make predictions
y_pred_svm = svm_model.predict(X_test_selected)
y_pred_proba_svm = svm_model.predict_proba(X_test_selected)[:, 1]

# Evaluate model
print("Support Vector Machine Results:")
print("=" * 60)
print(f"Accuracy: {accuracy_score(y_test, y_pred_svm):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_svm):.4f}")
print(f"Recall: {recall_score(y_test, y_pred_svm):.4f}")
print(f"F1-Score: {f1_score(y_test, y_pred_svm):.4f}")
print(f"ROC-AUC: {roc_auc_score(y_test, y_pred_proba_svm):.4f}")

print("\nClassification Report:")
print(classification_report(y_test, y_pred_svm))


### Cross-Validation

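Cross-validation estimates how well a model generalizes by repeatedly training on one part of the training data and scoring on the held-out part. It does not reduce overfitting by itself, but it makes overfitting easier to detect. Note that the features used below were selected on the full training set, so the fold scores are somewhat optimistic.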
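
A minimal sketch of why preprocessing belongs inside the cross-validation loop: on pure-noise data, selecting features before cross-validation produces inflated scores, while wrapping scaling and selection in a scikit-learn `Pipeline` refits them in every fold. The data, the sample and feature counts, and the choice of logistic regression are assumptions made to keep the example small.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Pure-noise data: no feature carries real signal, so an honest AUC is near 0.5
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1000))
y = np.array([0, 1] * 30)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Leaky: features are selected using all samples before cross-validation
X_leaky = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)
leaky_auc = cross_val_score(
    LogisticRegression(max_iter=1000), X_leaky, y, cv=cv, scoring='roc_auc'
).mean()

# Honest: scaling and selection are refit on the training part of every fold
pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('select', SelectKBest(score_func=f_classif, k=10)),
    ('clf', LogisticRegression(max_iter=1000)),
])
honest_auc = cross_val_score(pipeline, X, y, cv=cv, scoring='roc_auc').mean()

print(f"Selection outside CV (leaky): mean ROC-AUC = {leaky_auc:.3f}")
print(f"Selection inside CV (honest): mean ROC-AUC = {honest_auc:.3f}")
```

The same pattern applies to the models in this notebook: passing a pipeline that contains the scaler, the selector, and the classifier to `cross_val_score` keeps held-out folds out of the selection step.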


In [None]:
# Perform 5-fold cross-validation
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Cross-validation for Random Forest
rf_cv_scores = cross_val_score(
    rf_model, X_train_selected, y_train, 
    cv=cv, scoring='roc_auc', n_jobs=-1
)

print("Cross-Validation Results (5-fold):")
print("=" * 60)
print(f"Random Forest:")
print(f"  Mean ROC-AUC: {rf_cv_scores.mean():.4f} (+/- {rf_cv_scores.std() * 2:.4f})")
print(f"  Individual fold scores: {rf_cv_scores}")

# Cross-validation for SVM
svm_cv_scores = cross_val_score(
    svm_model, X_train_selected, y_train, 
    cv=cv, scoring='roc_auc', n_jobs=-1
)

print(f"\nSupport Vector Machine:")
print(f"  Mean ROC-AUC: {svm_cv_scores.mean():.4f} (+/- {svm_cv_scores.std() * 2:.4f})")
print(f"  Individual fold scores: {svm_cv_scores}")


## Results and Visualization

Visualize model performance and key findings.


In [None]:
# ROC Curves
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# ROC Curve for Random Forest
fpr_rf, tpr_rf, _ = roc_curve(y_test, y_pred_proba_rf)
roc_auc_rf = roc_auc_score(y_test, y_pred_proba_rf)

axes[0].plot(fpr_rf, tpr_rf, color='darkorange', lw=2, 
             label=f'Random Forest (AUC = {roc_auc_rf:.3f})')
axes[0].plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random')
axes[0].set_xlabel('False Positive Rate')
axes[0].set_ylabel('True Positive Rate')
axes[0].set_title('ROC Curve - Random Forest', fontweight='bold')
axes[0].legend(loc="lower right")
axes[0].grid(True, alpha=0.3)

# ROC Curve for SVM
fpr_svm, tpr_svm, _ = roc_curve(y_test, y_pred_proba_svm)
roc_auc_svm = roc_auc_score(y_test, y_pred_proba_svm)

axes[1].plot(fpr_svm, tpr_svm, color='green', lw=2, 
             label=f'SVM (AUC = {roc_auc_svm:.3f})')
axes[1].plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random')
axes[1].set_xlabel('False Positive Rate')
axes[1].set_ylabel('True Positive Rate')
axes[1].set_title('ROC Curve - Support Vector Machine', fontweight='bold')
axes[1].legend(loc="lower right")
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()


In [None]:
# Confusion Matrices
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Random Forest Confusion Matrix
cm_rf = confusion_matrix(y_test, y_pred_rf)
sns.heatmap(cm_rf, annot=True, fmt='d', cmap='Blues', ax=axes[0],
            xticklabels=['Poor Outcome', 'Good Outcome'],
            yticklabels=['Poor Outcome', 'Good Outcome'])
axes[0].set_title('Confusion Matrix - Random Forest', fontweight='bold')
axes[0].set_ylabel('True Label')
axes[0].set_xlabel('Predicted Label')

# SVM Confusion Matrix
cm_svm = confusion_matrix(y_test, y_pred_svm)
sns.heatmap(cm_svm, annot=True, fmt='d', cmap='Greens', ax=axes[1],
            xticklabels=['Poor Outcome', 'Good Outcome'],
            yticklabels=['Poor Outcome', 'Good Outcome'])
axes[1].set_title('Confusion Matrix - Support Vector Machine', fontweight='bold')
axes[1].set_ylabel('True Label')
axes[1].set_xlabel('Predicted Label')

plt.tight_layout()
plt.show()


In [None]:
# Feature Importance Visualization
plt.figure(figsize=(10, 6))
top_n = min(10, len(feature_importance))
top_features = feature_importance.head(top_n)

plt.barh(range(len(top_features)), top_features['importance'], color='steelblue')
plt.yticks(range(len(top_features)), top_features['feature'])
plt.xlabel('Importance Score', fontsize=12)
plt.title(f'Top {top_n} Most Important Features (Random Forest)', 
          fontsize=14, fontweight='bold')
plt.gca().invert_yaxis()
plt.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()


In [None]:
# Model Comparison
models_comparison = pd.DataFrame({
    'Model': ['Random Forest', 'Support Vector Machine'],
    'Accuracy': [
        accuracy_score(y_test, y_pred_rf),
        accuracy_score(y_test, y_pred_svm)
    ],
    'Precision': [
        precision_score(y_test, y_pred_rf),
        precision_score(y_test, y_pred_svm)
    ],
    'Recall': [
        recall_score(y_test, y_pred_rf),
        recall_score(y_test, y_pred_svm)
    ],
    'F1-Score': [
        f1_score(y_test, y_pred_rf),
        f1_score(y_test, y_pred_svm)
    ],
    'ROC-AUC': [
        roc_auc_score(y_test, y_pred_proba_rf),
        roc_auc_score(y_test, y_pred_proba_svm)
    ]
})

print("Model Performance Comparison:")
print("=" * 80)
print(models_comparison.to_string(index=False))


## Conclusion

This analysis walks through a radiomics and machine learning workflow for predicting cancer outcomes. Because the data are synthetic, the points below describe what the pipeline is expected to show rather than clinical findings:

1. **Feature Significance**: The synthetic outcome was generated from first-order entropy and GLCM correlation, so the statistical analysis should flag these two features as the most strongly associated with outcome. Other features that reach p < 0.05 are likely chance findings.

2. **Model Performance**: Random Forest and SVM are compared on a 40-sample test set and by 5-fold cross-validation. With a test set this small, modest differences between the two models should not be over-interpreted.

3. **Feature Importance**: Random Forest importance is expected to concentrate on the same two features (entropy and GLCM correlation), which would confirm that the pipeline recovers the signal built into the data.

4. **Clinical Implications**: On real cohorts, quantitative features extracted from medical images may provide prognostic information beyond conventional qualitative assessment, but this requires validation on clinical data.

### Limitations and Future Directions

- **Data**: This analysis used synthetic data. Real-world validation with clinical cohorts is essential.
- **Reproducibility**: Feature extraction parameters must be standardized across studies.
- **Model Interpretability**: While Random Forest provides feature importance, deeper interpretability methods could enhance clinical adoption.
- **Multi-modal Integration**: Combining radiomics with genomics, proteomics, and clinical data could improve predictions.
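
As one example of the interpretability methods mentioned above, permutation importance measures how much a held-out score drops when a single feature is shuffled. The sketch below uses scikit-learn's `permutation_importance` on synthetic data in which only one feature matters; the data and model settings are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: only feature 0 determines the label (plus a little noise)
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=400) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each feature on the held-out set and record the drop in ROC-AUC
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0, scoring='roc_auc'
)
ranking = np.argsort(result.importances_mean)[::-1]

for j in ranking:
    print(f"feature {j}: {result.importances_mean[j]:.3f} +/- {result.importances_std[j]:.3f}")
```

Unlike impurity-based importance, this is computed on held-out data, so it reflects what the model relies on when predicting unseen samples.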

### References

For more information on radiomics and AI in cancer prediction, refer to the scientific literature and resources cited in the project documentation (README.md).
