# Comparison of Real and Complex Neural Networks Using Cross-Validation and Paired T-Test
Description:
This section compares real-valued and complex-valued artificial neural networks on a protein classification task. The comparison is based on the means and standard deviations of the metrics obtained from 10-fold cross-validation. A paired t-test then compares the per-fold accuracies of the two methods. The comparison is carried out separately for DNA, codon, and amino acid sequences.

Cross-Validation (10-Fold) and T-Test Analysis Steps:

1. DNA Sequence Classification: Complex-ANN vs. Real-ANN

- a) Complex-ANN: Cross-validation and metric averages of classification with complex-valued artificial neural networks.
- b) Real-ANN: Cross-validation and metric averages of classification with real-valued artificial neural networks.
- c) Paired t-test Analysis.


2. Codon Sequence Classification: Complex-ANN vs. Real-ANN

- a) Complex-ANN: Cross-validation and metric averages of classification with complex-valued artificial neural networks.
- b) Real-ANN: Cross-validation and metric averages of classification with real-valued artificial neural networks.
- c) Paired t-test Analysis.

3. Amino Acid Sequence Classification: Complex-ANN vs. Real-ANN

- a) Complex-ANN: Cross-validation and metric averages of classification with complex-valued artificial neural networks.
- b) Real-ANN: Cross-validation and metric averages of classification with real-valued artificial neural networks.
- c) Paired t-test Analysis.
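
The fold structure used in every step above can be illustrated without any library. The sketch below is a simplified stand-in for scikit-learn's `StratifiedKFold`, not that class's exact algorithm: it shuffles the indices of each class with a fixed seed and deals them round-robin into folds, so every fold keeps the class balance. The helper name `stratified_folds` and the balanced toy label vector (chosen to mirror the 10-versus-10 test folds that appear in the printed confusion matrices) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=10, seed=100):
    """Return a list of test-index lists, one per fold, preserving class balance.

    Hypothetical helper for illustration; scikit-learn's StratifiedKFold
    uses its own assignment scheme and will produce different index sets.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


# Toy labels: 100 samples of class 0 followed by 100 samples of class 1.
labels = [0] * 100 + [1] * 100
folds = stratified_folds(labels)

for fold_number, test_idx in enumerate(folds, start=1):
    test_set = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test_set]
    class_counts = [sum(1 for i in test_idx if labels[i] == c) for c in (0, 1)]
    print(f"Fold {fold_number}: train={len(train_idx)}, test={len(test_idx)}, "
          f"test class counts={class_counts}")
```

Each sample lands in exactly one test fold, which is what allows the per-fold accuracies of two models to be paired later on.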

# 1. DNA Sequence Classification: Complex-ANN vs. Real-ANN 

- a) Cross-Validation and Metric Averages of Classification with Complex-Valued Artificial Neural Networks (10 Fold)

In [20]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Layer, Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from tensorflow.keras.utils import to_categorical

class ComplexDense(Layer):
    def __init__(self, units, activation=None, **kwargs):
        super(ComplexDense, self).__init__(**kwargs)
        self.units = units
        self.activation = activation

    def build(self, input_shape):
        self.kernel = self.add_weight(shape=(input_shape[-1], self.units, 2),
                                      initializer='random_normal',
                                      trainable=True)
        self.bias = self.add_weight(shape=(self.units, 2),
                                    initializer='zeros',
                                    trainable=True)

    def call(self, inputs):
        inputs_real, inputs_imag = tf.math.real(inputs), tf.math.imag(inputs)
        kernel_real, kernel_imag = self.kernel[..., 0], self.kernel[..., 1]
        bias_real, bias_imag = self.bias[..., 0], self.bias[..., 1]

        output_real = tf.matmul(inputs_real, kernel_real) - tf.matmul(inputs_imag, kernel_imag) + bias_real
        output_imag = tf.matmul(inputs_real, kernel_imag) + tf.matmul(inputs_imag, kernel_real) + bias_imag

        output = tf.complex(output_real, output_imag)
        if self.activation:
            output = self.activation(output)
        return output

def complex_tanh(z):
    return tf.complex(tf.math.tanh(tf.math.real(z)), tf.math.tanh(tf.math.imag(z)))

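def wirtinger_loss(y_true, y_pred):
    # Squared-error loss on the complex residual. The residual and its conjugate
    # have the same magnitude, so the result equals 2 * |y_pred - y_true|**2.
    y_true = tf.cast(y_true, tf.complex64)
    y_pred = tf.cast(y_pred, tf.complex64)
    dF_dz = tf.math.conj(y_pred - y_true)
    dF_dz_star = (y_pred - y_true)
    return tf.math.abs(dF_dz)**2 + tf.math.abs(dF_dz_star)**2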

# Load the data
data = pd.read_excel("D://datasetTEZ//KİNASE_GPCR_DNA_Complex_Encoded.xlsx")
X = np.array([np.array(list(map(float, x_real.strip("[]").split(',')))) + 1j * np.array(list(map(float, x_imag.strip("[]").split(',')))) for x_real, x_imag in zip(data['Real'], data['Imag'])])
y = data['label'].values
y = to_categorical(y, num_classes=2)

# Define 10-fold stratified cross-validation
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)

# Initialize lists to store results
accuracies, precisions, recalls, f1_scores, conf_matrices = [], [], [], [], []

# Start cross-validation
for train_index, test_index in kf.split(X, np.argmax(y, axis=1)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Model definition
    input_layer = Input(shape=(X_train.shape[1],), dtype=tf.complex64)
    complex_dense1 = ComplexDense(12, activation=complex_tanh)(input_layer)
    complex_dense2 = ComplexDense(8, activation=complex_tanh)(complex_dense1)
    complex_dense3 = ComplexDense(6, activation=complex_tanh)(complex_dense2)
    output_layer = ComplexDense(2, activation=complex_tanh)(complex_dense3)
    model = Model(inputs=input_layer, outputs=output_layer)

    # Compile the model
    model.compile(optimizer=Adam(learning_rate=0.005), loss=wirtinger_loss, metrics=['accuracy'])
    
    # Fit the model
    model.fit(X_train, y_train, batch_size=10, epochs=80, verbose=0)

    # Predict on the test set
    y_pred_proba = model.predict(X_test)
    y_pred_classes = np.argmax(y_pred_proba, axis=1)
    y_true_classes = np.argmax(y_test, axis=1)

    # Calculate metrics
    cm = confusion_matrix(y_true_classes, y_pred_classes)
    accuracy = accuracy_score(y_true_classes, y_pred_classes)
    precision = precision_score(y_true_classes, y_pred_classes, average='macro')
    recall = recall_score(y_true_classes, y_pred_classes, average='macro')
    f1 = f1_score(y_true_classes, y_pred_classes, average='macro')

    # Store the results
    conf_matrices.append(cm)
    accuracies.append(accuracy)
    precisions.append(precision)
    recalls.append(recall)
    f1_scores.append(f1)

# Display the results from all folds
for i, cm in enumerate(conf_matrices, 1):
    print(f"Fold {i} Confusion Matrix:\n{cm}")
    print(f"Accuracy: {accuracies[i-1]:.4f}, Precision: {precisions[i-1]:.4f}, Recall: {recalls[i-1]:.4f}, F1 Score: {f1_scores[i-1]:.4f}\n")

# Calculate and display the mean and standard deviation of the metrics
mean_accuracy = np.mean(accuracies)
std_accuracy = np.std(accuracies)
mean_precision = np.mean(precisions)
std_precision = np.std(precisions)
mean_recall = np.mean(recalls)
std_recall = np.std(recalls)
mean_f1 = np.mean(f1_scores)
std_f1 = np.std(f1_scores)

print(f"Average Accuracy: {mean_accuracy:.4f} ± {std_accuracy:.4f}")
print(f"Average Precision: {mean_precision:.4f} ± {std_precision:.4f}")
print(f"Average Recall: {mean_recall:.4f} ± {std_recall:.4f}")
print(f"Average F1 Score: {mean_f1:.4f} ± {std_f1:.4f}")


Fold 1 Confusion Matrix:
[[8 2]
 [3 7]]
Accuracy: 0.7500, Precision: 0.7525, Recall: 0.7500, F1 Score: 0.7494

Fold 2 Confusion Matrix:
[[10  0]
 [ 3  7]]
Accuracy: 0.8500, Precision: 0.8846, Recall: 0.8500, F1 Score: 0.8465

Fold 3 Confusion Matrix:
[[9 1]
 [1 9]]
Accuracy: 0.9000, Precision: 0.9000, Recall: 0.9000, F1 Score: 0.9000

Fold 4 Confusion Matrix:
[[9 1]
 [2 8]]
Accuracy: 0.8500, Precision: 0.8535, Recall: 0.8500, F1 Score: 0.8496

Fold 5 Confusion Matrix:
[[10  0]
 [ 2  8]]
Accuracy: 0.9000, Precision: 0.9167, Recall: 0.9000, F1 Score: 0.8990

Fold 6 Confusion Matrix:
[[10  0]
 [ 1  9]]
Accuracy: 0.9500, Precision: 0.9545, Recall: 0.9500, F1 Score: 0.9499

Fold 7 Confusion Matrix:
[[9 1]
 [1 9]]
Accuracy: 0.9000, Precision: 0.9000, Recall: 0.9000, F1 Score: 0.9000

Fold 8 Confusion Matrix:
[[10  0]
 [ 2  8]]
Accuracy: 0.9000, Precision: 0.9167, Recall: 0.9000, F1 Score: 0.8990

Fold 9 Confusion Matrix:
[[8 2]
 [1 9]]
Accuracy: 0.8500, Precision: 0.8535, Recall: 0.8500, F1 
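
The `ComplexDense` layer in the cell above stores each complex weight as a (real, imaginary) pair and expands the product by hand: `(x_r + i*x_i)(w_r + i*w_i) = (x_r*w_r - x_i*w_i) + i*(x_r*w_i + x_i*w_r)`. As a sanity check on that arithmetic, the following pure-Python sketch computes one layer output both ways and compares them. The function names and all numeric values are made up for illustration; this is not the notebook's TensorFlow code.

```python
def dense_split(inputs, kernel, bias):
    """Complex dense layer using separate real and imaginary parts.

    inputs: list of complex numbers (one sample)
    kernel: kernel[j][k] is the complex weight from input j to unit k
    bias:   list of complex biases, one per unit
    """
    n_units = len(bias)
    outputs = []
    for k in range(n_units):
        real = sum(x.real * kernel[j][k].real - x.imag * kernel[j][k].imag
                   for j, x in enumerate(inputs)) + bias[k].real
        imag = sum(x.real * kernel[j][k].imag + x.imag * kernel[j][k].real
                   for j, x in enumerate(inputs)) + bias[k].imag
        outputs.append(complex(real, imag))
    return outputs


def dense_direct(inputs, kernel, bias):
    """Same layer written with Python's built-in complex arithmetic."""
    return [sum(x * kernel[j][k] for j, x in enumerate(inputs)) + bias[k]
            for k in range(len(bias))]


# Arbitrary illustrative values: 3 complex inputs, 2 output units.
x = [1 + 2j, -0.5 + 0.25j, 3 - 1j]
w = [[0.1 + 0.2j, -0.3 + 0.4j],
     [0.5 - 0.6j, 0.7 + 0.8j],
     [-0.9 + 1.0j, 1.1 - 1.2j]]
b = [0.05 - 0.05j, -0.1 + 0.2j]

split_out = dense_split(x, w, b)
direct_out = dense_direct(x, w, b)
max_gap = max(abs(s - d) for s, d in zip(split_out, direct_out))
print(split_out)
print(direct_out)
print(f"largest difference: {max_gap:.2e}")
```

The two results agree up to floating-point rounding, which is the property the layer's `call` method relies on.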

- b) Cross-Validation and Metric Averages of Classification with Real-Valued Artificial Neural Networks (10 Fold)

In [23]:
import pandas as pd
import numpy as np
import ast
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
from keras.utils import to_categorical

# Load dataset
file_path = r"D:\datasetTEZ\KİNASE_GPCR_DNA_ReelEncoded.xlsx"
data = pd.read_excel(file_path)
data['Encoded'] = data['Encoded'].apply(ast.literal_eval)

# Separate features and labels
X = np.array(data['Encoded'].tolist())
y = data['label'].values
y = to_categorical(y, num_classes=2)  # Convert labels to one-hot encoded format if needed

# Standardize data
sc = StandardScaler()
X = sc.fit_transform(X)

# Define 10-fold stratified cross-validation
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)

# Initialize lists to store results
accuracies, precisions, recalls, f1_scores, conf_matrices = [], [], [], [], []

# Start cross-validation
for train_index, test_index in kf.split(X, np.argmax(y, axis=1)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Model 
    model = Sequential()
    model.add(Dense(units=12, activation='tanh', input_dim=X_train.shape[1]))
    model.add(Dropout(0.5))
    model.add(Dense(units=8, activation='tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(units=6, activation='tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(units=2, activation='sigmoid'))  # Two units to match the one-hot labels and the argmax decoding below

    # Compile the model
    model.compile(optimizer=Adam(learning_rate=0.005), loss='binary_crossentropy', metrics=['accuracy'])

    # Fit the model
    model.fit(X_train, y_train, batch_size=10, epochs=80, verbose=0)

    # Predict on the test set
    y_pred_proba = model.predict(X_test)
    y_pred_classes = np.argmax(y_pred_proba, axis=1)
    y_true_classes = np.argmax(y_test, axis=1)

    # Calculate metrics
    cm = confusion_matrix(y_true_classes, y_pred_classes)
    accuracy = accuracy_score(y_true_classes, y_pred_classes)
    precision = precision_score(y_true_classes, y_pred_classes, average='macro')
    recall = recall_score(y_true_classes, y_pred_classes, average='macro')
    f1 = f1_score(y_true_classes, y_pred_classes, average='macro')

    # Store the results
    conf_matrices.append(cm)
    accuracies.append(accuracy)
    precisions.append(precision)
    recalls.append(recall)
    f1_scores.append(f1)

# Display the results from all folds
for i, cm in enumerate(conf_matrices, 1):
    print(f"Fold {i} Confusion Matrix:\n{cm}")
    print(f"Accuracy: {accuracies[i-1]:.4f}, Precision: {precisions[i-1]:.4f}, Recall: {recalls[i-1]:.4f}, F1 Score: {f1_scores[i-1]:.4f}\n")

# Calculate and display the mean and standard deviation of the metrics
mean_accuracy = np.mean(accuracies)
std_accuracy = np.std(accuracies)
mean_precision = np.mean(precisions)
std_precision = np.std(precisions)
mean_recall = np.mean(recalls)
std_recall = np.std(recalls)
mean_f1 = np.mean(f1_scores)
std_f1 = np.std(f1_scores)

print(f"Average Accuracy: {mean_accuracy:.4f} ± {std_accuracy:.4f}")
print(f"Average Precision: {mean_precision:.4f} ± {std_precision:.4f}")
print(f"Average Recall: {mean_recall:.4f} ± {std_recall:.4f}")
print(f"Average F1 Score: {mean_f1:.4f} ± {std_f1:.4f}")


Fold 1 Confusion Matrix:
[[ 8  2]
 [ 0 10]]
Accuracy: 0.9000, Precision: 0.9167, Recall: 0.9000, F1 Score: 0.8990

Fold 2 Confusion Matrix:
[[ 8  2]
 [ 0 10]]
Accuracy: 0.9000, Precision: 0.9167, Recall: 0.9000, F1 Score: 0.8990

Fold 3 Confusion Matrix:
[[ 7  3]
 [ 0 10]]
Accuracy: 0.8500, Precision: 0.8846, Recall: 0.8500, F1 Score: 0.8465

Fold 4 Confusion Matrix:
[[10  0]
 [ 0 10]]
Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1 Score: 1.0000

Fold 5 Confusion Matrix:
[[10  0]
 [ 0 10]]
Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1 Score: 1.0000

Fold 6 Confusion Matrix:
[[10  0]
 [ 5  5]]
Accuracy: 0.7500, Precision: 0.8333, Recall: 0.7500, F1 Score: 0.7333

Fold 7 Confusion Matrix:
[[10  0]
 [ 1  9]]
Accuracy: 0.9500, Precision: 0.9545, Recall: 0.9500, F1 Score: 0.9499

Fold 8 Confusion Matrix:
[[9 1]
 [1 9]]
Accuracy: 0.9000, Precision: 0.9000, Recall: 0.9000, F1 Score: 0.9000

Fold 9 Confusion Matrix:
[[ 7  3]
 [ 0 10]]
Accuracy: 0.8500, Precision: 0.8846, Rec
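
One detail of the real-valued pipeline above is that `X` is standardized once, before the folds are formed, so each test fold contributes to the scaling statistics. Whether this affects these particular results cannot be read from the outputs, but a common alternative is to fit the scaler on the training fold only. A minimal standard-library sketch of that idea follows; `fit_standardizer` and `apply_standardizer` are hypothetical helpers and the data are toy values. In the notebook itself, the equivalent would be calling `fit_transform` on `X_train` and `transform` on `X_test` inside the loop.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Fall back to a scale of 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_standardizer(rows, means, stds):
    """Scale rows with statistics that were computed elsewhere."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy feature matrix: 6 samples, 2 features (illustrative values only).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
     [4.0, 40.0], [5.0, 50.0], [100.0, 60.0]]
train_idx, test_idx = [0, 1, 2, 3, 4], [5]

X_train = [X[i] for i in train_idx]
X_test = [X[i] for i in test_idx]

# Fit on the training fold only, then reuse those statistics for the test fold.
means, stds = fit_standardizer(X_train)
X_train_scaled = apply_standardizer(X_train, means, stds)
X_test_scaled = apply_standardizer(X_test, means, stds)

print("training means used for scaling:", means)
print("test row after scaling:", X_test_scaled[0])
```

With this arrangement the outlying test row does not influence the mean or standard deviation used to scale the training data.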

- c) Paired t-test: Because both methods are evaluated with 10-fold cross-validation, the accuracies they obtain in each fold can be compared directly with a paired t-test.

The test gives a t-statistic of -0.9186 and a p-value of 0.3823. Since the p-value is above 0.05, there is no statistically significant difference in accuracy between the two methods.

In [99]:
from scipy.stats import ttest_rel

# Per-fold accuracies (folds 1-10) of the complex-valued and real-valued models
complex_method1 = [0.7500, 0.8500, 0.9000, 0.8500, 0.9000, 0.9500, 0.9000, 0.9000, 0.8500, 0.9500]
real_method2 = [0.9000, 0.9000, 0.8500, 1.0000, 1.0000, 0.7500, 0.9500, 0.9000, 0.8500, 1.0000]

# Paired t-test
t_stat, p_value = ttest_rel(complex_method1, real_method2)

print(f"T-Statistic: {t_stat}, P-value: {p_value}")

if p_value < 0.05:
    print("There is a statistically significant difference between the two models.")
else:
    print("There is no statistically significant difference between the two models.")

T-Statistic: -0.9185586535436918, P-value: 0.3822841681565753
There is no statistically significant difference between the two models.
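
For reference, the paired t-statistic can be recomputed by hand from the per-fold differences `d_i = complex_i - real_i` as `t = mean(d) / (s_d / sqrt(n))`, where `s_d` is the sample standard deviation of the differences and `n = 10`. The sketch below uses only the standard library and the accuracy lists from the cell above; the function name `paired_t_statistic` is an assumption for illustration. It reproduces the statistic only, because the p-value requires the t-distribution with `n - 1 = 9` degrees of freedom, which `scipy.stats.ttest_rel` supplies.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t_statistic(a, b):
    """Paired t-statistic for two equally long lists of per-fold scores."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return fmean(diffs) / (stdev(diffs) / sqrt(n))


complex_acc = [0.75, 0.85, 0.90, 0.85, 0.90, 0.95, 0.90, 0.90, 0.85, 0.95]
real_acc = [0.90, 0.90, 0.85, 1.00, 1.00, 0.75, 0.95, 0.90, 0.85, 1.00]

t_stat = paired_t_statistic(complex_acc, real_acc)
print(f"t = {t_stat:.4f}")  # agrees with the ttest_rel output (-0.9186)
```

The pairing is meaningful only if both models were evaluated on the same folds. Both cells use the same `n_splits`, `shuffle`, and `random_state`, so the folds coincide provided the two spreadsheets list the samples in the same order, which is assumed here rather than verified.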


# 2. Codon Sequence Classification: Complex-ANN vs. Real-ANN
- a) Complex-ANN: Cross-validation and metric averages of classification with complex-valued artificial neural networks.


In [88]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Layer, Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from tensorflow.keras.utils import to_categorical

class ComplexDense(Layer):
    def __init__(self, units, activation=None, **kwargs):
        super(ComplexDense, self).__init__(**kwargs)
        self.units = units
        self.activation = activation

    def build(self, input_shape):
        self.kernel = self.add_weight(shape=(input_shape[-1], self.units, 2),
                                      initializer='random_normal',
                                      trainable=True)
        self.bias = self.add_weight(shape=(self.units, 2),
                                    initializer='zeros',
                                    trainable=True)

    def call(self, inputs):
        inputs_real, inputs_imag = tf.math.real(inputs), tf.math.imag(inputs)
        kernel_real, kernel_imag = self.kernel[..., 0], self.kernel[..., 1]
        bias_real, bias_imag = self.bias[..., 0], self.bias[..., 1]

        output_real = tf.matmul(inputs_real, kernel_real) - tf.matmul(inputs_imag, kernel_imag) + bias_real
        output_imag = tf.matmul(inputs_real, kernel_imag) + tf.matmul(inputs_imag, kernel_real) + bias_imag

        output = tf.complex(output_real, output_imag)
        if self.activation:
            output = self.activation(output)
        return output

def complex_tanh(z):
    return tf.complex(tf.math.tanh(tf.math.real(z)), tf.math.tanh(tf.math.imag(z)))

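def wirtinger_loss(y_true, y_pred):
    # Squared-error loss on the complex residual. The residual and its conjugate
    # have the same magnitude, so the result equals 2 * |y_pred - y_true|**2.
    y_true = tf.cast(y_true, tf.complex64)
    y_pred = tf.cast(y_pred, tf.complex64)
    dF_dz = tf.math.conj(y_pred - y_true)
    dF_dz_star = (y_pred - y_true)
    return tf.math.abs(dF_dz)**2 + tf.math.abs(dF_dz_star)**2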

# Load the data
data = pd.read_excel(r"D:\datasetTEZ\KİNASE_GPCR_Codon_Complex_Encoding.xlsx")
X = np.array([np.array(list(map(float, x_real.strip("[]").split(',')))) + 1j * np.array(list(map(float, x_imag.strip("[]").split(',')))) for x_real, x_imag in zip(data['Real'], data['Imag'])])
y = data['label'].values
y = to_categorical(y, num_classes=2)

# Define 10-fold stratified cross-validation
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)

# Initialize lists to store results
accuracies, precisions, recalls, f1_scores, conf_matrices = [], [], [], [], []

# Start cross-validation
for train_index, test_index in kf.split(X, np.argmax(y, axis=1)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Model definition
    input_layer = Input(shape=(X_train.shape[1],), dtype=tf.complex64)
    complex_dense1 = ComplexDense(12, activation=complex_tanh)(input_layer)
    complex_dense2 = ComplexDense(8, activation=complex_tanh)(complex_dense1)
    complex_dense3 = ComplexDense(6, activation=complex_tanh)(complex_dense2)
    output_layer = ComplexDense(2, activation=complex_tanh)(complex_dense3)
    model = Model(inputs=input_layer, outputs=output_layer)

    # Compile the model
    model.compile(optimizer=Adam(learning_rate=0.005), loss=wirtinger_loss, metrics=['accuracy'])
    
    # Fit the model
    model.fit(X_train, y_train, batch_size=16, epochs=80, verbose=0)

    # Predict on the test set
    y_pred_proba = model.predict(X_test)
    y_pred_classes = np.argmax(y_pred_proba, axis=1)
    y_true_classes = np.argmax(y_test, axis=1)

    # Calculate metrics
    cm = confusion_matrix(y_true_classes, y_pred_classes)
    accuracy = accuracy_score(y_true_classes, y_pred_classes)
    precision = precision_score(y_true_classes, y_pred_classes, average='macro')
    recall = recall_score(y_true_classes, y_pred_classes, average='macro')
    f1 = f1_score(y_true_classes, y_pred_classes, average='macro')

    # Store the results
    conf_matrices.append(cm)
    accuracies.append(accuracy)
    precisions.append(precision)
    recalls.append(recall)
    f1_scores.append(f1)

# Display the results from all folds
for i, cm in enumerate(conf_matrices, 1):
    print(f"Fold {i} Confusion Matrix:\n{cm}")
    print(f"Accuracy: {accuracies[i-1]:.4f}, Precision: {precisions[i-1]:.4f}, Recall: {recalls[i-1]:.4f}, F1 Score: {f1_scores[i-1]:.4f}\n")

# Calculate and display the mean and standard deviation of the metrics
mean_accuracy = np.mean(accuracies)
std_accuracy = np.std(accuracies)
mean_precision = np.mean(precisions)
std_precision = np.std(precisions)
mean_recall = np.mean(recalls)
std_recall = np.std(recalls)
mean_f1 = np.mean(f1_scores)
std_f1 = np.std(f1_scores)

print(f"Average Accuracy: {mean_accuracy:.4f} ± {std_accuracy:.4f}")
print(f"Average Precision: {mean_precision:.4f} ± {std_precision:.4f}")
print(f"Average Recall: {mean_recall:.4f} ± {std_recall:.4f}")
print(f"Average F1 Score: {mean_f1:.4f} ± {std_f1:.4f}")


Fold 1 Confusion Matrix:
[[10  0]
 [ 4  6]]
Accuracy: 0.8000, Precision: 0.8571, Recall: 0.8000, F1 Score: 0.7917

Fold 2 Confusion Matrix:
[[9 1]
 [3 7]]
Accuracy: 0.8000, Precision: 0.8125, Recall: 0.8000, F1 Score: 0.7980

Fold 3 Confusion Matrix:
[[10  0]
 [ 1  9]]
Accuracy: 0.9500, Precision: 0.9545, Recall: 0.9500, F1 Score: 0.9499

Fold 4 Confusion Matrix:
[[10  0]
 [ 1  9]]
Accuracy: 0.9500, Precision: 0.9545, Recall: 0.9500, F1 Score: 0.9499

Fold 5 Confusion Matrix:
[[9 1]
 [2 8]]
Accuracy: 0.8500, Precision: 0.8535, Recall: 0.8500, F1 Score: 0.8496

Fold 6 Confusion Matrix:
[[7 3]
 [1 9]]
Accuracy: 0.8000, Precision: 0.8125, Recall: 0.8000, F1 Score: 0.7980

Fold 7 Confusion Matrix:
[[9 1]
 [3 7]]
Accuracy: 0.8000, Precision: 0.8125, Recall: 0.8000, F1 Score: 0.7980

Fold 8 Confusion Matrix:
[[7 3]
 [3 7]]
Accuracy: 0.7000, Precision: 0.7000, Recall: 0.7000, F1 Score: 0.7000

Fold 9 Confusion Matrix:
[[10  0]
 [ 1  9]]
Accuracy: 0.9500, Precision: 0.9545, Recall: 0.9500, F1 
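
A note on `wirtinger_loss` as defined in the cell above: a complex number and its conjugate have the same modulus, so the two terms it adds are equal and the value is `2 * |y_pred - y_true|**2` for each output unit, that is, a scaled squared error on the complex residual. The short check below illustrates this with plain Python complex numbers; the function name `wirtinger_loss_value` and the sample values are made up for illustration.

```python
def wirtinger_loss_value(y_true, y_pred):
    """Per-unit loss, written the same way as in the notebook cell."""
    error = y_pred - y_true
    conj_term = abs(error.conjugate()) ** 2
    plain_term = abs(error) ** 2
    return conj_term + plain_term


# One-hot target (class 0) and an illustrative complex network output.
targets = [1 + 0j, 0 + 0j]
outputs = [0.8 - 0.1j, 0.3 + 0.2j]

for t, o in zip(targets, outputs):
    loss = wirtinger_loss_value(t, o)
    squared_error = abs(o - t) ** 2
    print(f"loss={loss:.4f}  2*|error|^2={2 * squared_error:.4f}")
```

Both columns print the same number for every unit, so the loss penalizes the real and imaginary parts of the residual equally.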


- b) Real-ANN: Cross-validation and metric averages of classification with real-valued artificial neural networks.


In [94]:
import pandas as pd
import numpy as np
import ast
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
from keras.utils import to_categorical

# Load dataset
file_path = r"D:\datasetTEZ\KİNASE_GPCR_Kodon_ReelEncoded.xlsx"
data = pd.read_excel(file_path)
data['Encoded'] = data['Encoded'].apply(ast.literal_eval)

# Separate features and labels
X = np.array(data['Encoded'].tolist())
y = data['label'].values
y = to_categorical(y, num_classes=2)  # Convert labels to one-hot encoded format if needed

# Standardize data
sc = StandardScaler()
X = sc.fit_transform(X)

# Define 10-fold stratified cross-validation
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)

# Initialize lists to store results
accuracies, precisions, recalls, f1_scores, conf_matrices = [], [], [], [], []

# Start cross-validation
for train_index, test_index in kf.split(X, np.argmax(y, axis=1)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Model definition
    model = Sequential()
    model.add(Dense(units=12, activation='relu', input_dim=X_train.shape[1]))
    model.add(Dropout(0.5))
    model.add(Dense(units=8, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(units=6, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(units=2, activation='sigmoid'))  # Adjust for binary classification

    # Compile the model
    model.compile(optimizer=Adam(learning_rate=0.005), loss='binary_crossentropy', metrics=['accuracy'])

    # Fit the model
    model.fit(X_train, y_train, batch_size=10, epochs=80, verbose=0)

    # Predict on the test set
    y_pred_proba = model.predict(X_test)
    y_pred_classes = np.argmax(y_pred_proba, axis=1)
    y_true_classes = np.argmax(y_test, axis=1)

    # Calculate metrics
    cm = confusion_matrix(y_true_classes, y_pred_classes)
    accuracy = accuracy_score(y_true_classes, y_pred_classes)
    precision = precision_score(y_true_classes, y_pred_classes, average='macro')
    recall = recall_score(y_true_classes, y_pred_classes, average='macro')
    f1 = f1_score(y_true_classes, y_pred_classes, average='macro')

    # Store the results
    conf_matrices.append(cm)
    accuracies.append(accuracy)
    precisions.append(precision)
    recalls.append(recall)
    f1_scores.append(f1)

# Display the results from all folds
for i, cm in enumerate(conf_matrices, 1):
    print(f"Fold {i} Confusion Matrix:\n{cm}")
    print(f"Accuracy: {accuracies[i-1]:.4f}, Precision: {precisions[i-1]:.4f}, Recall: {recalls[i-1]:.4f}, F1 Score: {f1_scores[i-1]:.4f}\n")

# Calculate and display the mean and standard deviation of the metrics
mean_accuracy = np.mean(accuracies)
std_accuracy = np.std(accuracies)
mean_precision = np.mean(precisions)
std_precision = np.std(precisions)
mean_recall = np.mean(recalls)
std_recall = np.std(recalls)
mean_f1 = np.mean(f1_scores)
std_f1 = np.std(f1_scores)

print(f"Average Accuracy: {mean_accuracy:.4f} ± {std_accuracy:.4f}")
print(f"Average Precision: {mean_precision:.4f} ± {std_precision:.4f}")
print(f"Average Recall: {mean_recall:.4f} ± {std_recall:.4f}")
print(f"Average F1 Score: {mean_f1:.4f} ± {std_f1:.4f}")


Fold 1 Confusion Matrix:
[[9 1]
 [2 8]]
Accuracy: 0.8500, Precision: 0.8535, Recall: 0.8500, F1 Score: 0.8496

Fold 2 Confusion Matrix:
[[10  0]
 [ 2  8]]
Accuracy: 0.9000, Precision: 0.9167, Recall: 0.9000, F1 Score: 0.8990

Fold 3 Confusion Matrix:
[[9 1]
 [2 8]]
Accuracy: 0.8500, Precision: 0.8535, Recall: 0.8500, F1 Score: 0.8496

Fold 4 Confusion Matrix:
[[ 9  1]
 [ 0 10]]
Accuracy: 0.9500, Precision: 0.9545, Recall: 0.9500, F1 Score: 0.9499

Fold 5 Confusion Matrix:
[[10  0]
 [ 2  8]]
Accuracy: 0.9000, Precision: 0.9167, Recall: 0.9000, F1 Score: 0.8990

Fold 6 Confusion Matrix:
[[10  0]
 [ 2  8]]
Accuracy: 0.9000, Precision: 0.9167, Recall: 0.9000, F1 Score: 0.8990

Fold 7 Confusion Matrix:
[[10  0]
 [ 1  9]]
Accuracy: 0.9500, Precision: 0.9545, Recall: 0.9500, F1 Score: 0.9499

Fold 8 Confusion Matrix:
[[ 9  1]
 [ 0 10]]
Accuracy: 0.9500, Precision: 0.9545, Recall: 0.9500, F1 Score: 0.9499

Fold 9 Confusion Matrix:
[[10  0]
 [ 5  5]]
Accuracy: 0.7500, Precision: 0.8333, Recall:
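
The macro-averaged precision, recall, and F1 printed for each fold can be recovered from that fold's confusion matrix, where rows are true classes and columns are predicted classes. As a worked check, the sketch below recomputes the Fold 1 line of the real-valued codon run above from `[[9, 1], [2, 8]]` using only the standard library. `macro_metrics` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged precision, recall, F1 from a square confusion matrix."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n_classes)) / total

    precisions, recalls, f1s = [], [], []
    for k in range(n_classes):
        true_pos = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n_classes))  # column sum
        actual_k = sum(cm[k])                                   # row sum
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    return (accuracy,
            sum(precisions) / n_classes,
            sum(recalls) / n_classes,
            sum(f1s) / n_classes)


accuracy, precision, recall, f1 = macro_metrics([[9, 1], [2, 8]])
print(f"Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, "
      f"Recall: {recall:.4f}, F1 Score: {f1:.4f}")
```

Macro averaging gives each class equal weight, which here coincides with the overall picture because every test fold is balanced.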

- c) Paired t-test Analysis:

Because both methods are evaluated with 10-fold cross-validation, the accuracies obtained in each fold can be compared directly with a paired t-test. The test gives a t-statistic of -1.1319 and a p-value of 0.2869, so the performance difference between the two methods is not statistically significant at the 0.05 level.

In [98]:
from scipy.stats import ttest_rel

# Per-fold accuracies (folds 1-10) of the complex-valued and real-valued models
complex_method1 = [0.8000, 0.8000, 0.9500, 0.9500, 0.8500, 0.8000, 0.8000, 0.7000, 0.9500, 0.8000]
real_method2 = [0.8500, 0.9000, 0.8500, 0.9500, 0.9000, 0.9000, 0.9500, 0.9500, 0.7500, 0.8500]

# Paired t-test
t_stat, p_value = ttest_rel(complex_method1, real_method2)

print(f"T-Statistic: {t_stat}, P-value: {p_value}")

if p_value < 0.05:
    print("There is a statistically significant difference between the two models.")
else:
    print("There is no statistically significant difference between the two models.")

T-Statistic: -1.13189888200586, P-value: 0.28693291899495443
There is no statistically significant difference between the two models.
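
Since the comparison also rests on the mean and standard deviation of the fold accuracies, the two lists in the cell above can be summarized directly. This sketch uses `statistics.pstdev`, the population standard deviation, which matches the default behaviour of `np.std` used in the cross-validation cells; the sample standard deviation (`statistics.stdev`) would be slightly larger. The variable names are illustrative.

```python
from statistics import fmean, pstdev

complex_acc = [0.80, 0.80, 0.95, 0.95, 0.85, 0.80, 0.80, 0.70, 0.95, 0.80]
real_acc = [0.85, 0.90, 0.85, 0.95, 0.90, 0.90, 0.95, 0.95, 0.75, 0.85]

summary = {}
for name, scores in (("Complex-ANN", complex_acc), ("Real-ANN", real_acc)):
    summary[name] = (fmean(scores), pstdev(scores))
    mean, std = summary[name]
    print(f"{name}: {mean:.4f} ± {std:.4f}")

# Mean of the per-fold differences (complex minus real): the quantity the
# paired t-test examines.
mean_diff = fmean([c - r for c, r in zip(complex_acc, real_acc)])
print(f"Mean difference: {mean_diff:.4f}")
```

The mean difference is small relative to the fold-to-fold spread, which is consistent with the non-significant test result reported above.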


# 3. Amino Acid Sequence Classification: Complex-ANN vs. Real-ANN
- a) Complex-ANN: Cross-validation and metric averages of classification with complex-valued artificial neural networks.


In [74]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Layer, Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from tensorflow.keras.utils import to_categorical

class ComplexDense(Layer):
    def __init__(self, units, activation=None, **kwargs):
        super(ComplexDense, self).__init__(**kwargs)
        self.units = units
        self.activation = activation

    def build(self, input_shape):
        self.kernel = self.add_weight(shape=(input_shape[-1], self.units, 2),
                                      initializer='random_normal',
                                      trainable=True)
        self.bias = self.add_weight(shape=(self.units, 2),
                                    initializer='zeros',
                                    trainable=True)

    def call(self, inputs):
        inputs_real, inputs_imag = tf.math.real(inputs), tf.math.imag(inputs)
        kernel_real, kernel_imag = self.kernel[..., 0], self.kernel[..., 1]
        bias_real, bias_imag = self.bias[..., 0], self.bias[..., 1]

        output_real = tf.matmul(inputs_real, kernel_real) - tf.matmul(inputs_imag, kernel_imag) + bias_real
        output_imag = tf.matmul(inputs_real, kernel_imag) + tf.matmul(inputs_imag, kernel_real) + bias_imag

        output = tf.complex(output_real, output_imag)
        if self.activation:
            output = self.activation(output)
        return output

def complex_tanh(z):
    return tf.complex(tf.math.tanh(tf.math.real(z)), tf.math.tanh(tf.math.imag(z)))

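def wirtinger_loss(y_true, y_pred):
    # Squared-error loss on the complex residual. The residual and its conjugate
    # have the same magnitude, so the result equals 2 * |y_pred - y_true|**2.
    y_true = tf.cast(y_true, tf.complex64)
    y_pred = tf.cast(y_pred, tf.complex64)
    dF_dz = tf.math.conj(y_pred - y_true)
    dF_dz_star = (y_pred - y_true)
    return tf.math.abs(dF_dz)**2 + tf.math.abs(dF_dz_star)**2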

# Load the data
data = pd.read_excel(r"D:\datasetTEZ\Kinase_GPCR_AminoAcid_Fasta_Complex_Encoded.xlsx")
X = np.array([np.array(list(map(float, x_real.strip("[]").split(',')))) + 1j * np.array(list(map(float, x_imag.strip("[]").split(',')))) for x_real, x_imag in zip(data['Real'], data['Imag'])])
y = data['label'].values
y = to_categorical(y, num_classes=2)

# Define 10-fold stratified cross-validation
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)

# Initialize lists to store results
accuracies, precisions, recalls, f1_scores, conf_matrices = [], [], [], [], []

# Start cross-validation
for train_index, test_index in kf.split(X, np.argmax(y, axis=1)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Model definition
    input_layer = Input(shape=(X_train.shape[1],), dtype=tf.complex64)
    complex_dense1 = ComplexDense(12, activation=complex_tanh)(input_layer)
    complex_dense2 = ComplexDense(8, activation=complex_tanh)(complex_dense1)
    complex_dense3 = ComplexDense(6, activation=complex_tanh)(complex_dense2)
    output_layer = ComplexDense(2, activation=complex_tanh)(complex_dense3)
    model = Model(inputs=input_layer, outputs=output_layer)

    # Compile the model
    model.compile(optimizer=Adam(learning_rate=0.005), loss=wirtinger_loss, metrics=['accuracy'])
    
    # Fit the model
    model.fit(X_train, y_train, batch_size=10, epochs=80, verbose=0)

    # Predict on the test set
    y_pred_proba = model.predict(X_test)
    y_pred_classes = np.argmax(y_pred_proba, axis=1)
    y_true_classes = np.argmax(y_test, axis=1)

    # Calculate metrics
    cm = confusion_matrix(y_true_classes, y_pred_classes)
    accuracy = accuracy_score(y_true_classes, y_pred_classes)
    precision = precision_score(y_true_classes, y_pred_classes, average='macro')
    recall = recall_score(y_true_classes, y_pred_classes, average='macro')
    f1 = f1_score(y_true_classes, y_pred_classes, average='macro')

    # Store the results
    conf_matrices.append(cm)
    accuracies.append(accuracy)
    precisions.append(precision)
    recalls.append(recall)
    f1_scores.append(f1)

# Display the results from all folds
for i, cm in enumerate(conf_matrices, 1):
    print(f"Fold {i} Confusion Matrix:\n{cm}")
    print(f"Accuracy: {accuracies[i-1]:.4f}, Precision: {precisions[i-1]:.4f}, Recall: {recalls[i-1]:.4f}, F1 Score: {f1_scores[i-1]:.4f}\n")

# Calculate and display the mean and standard deviation of the metrics
mean_accuracy = np.mean(accuracies)
std_accuracy = np.std(accuracies)
mean_precision = np.mean(precisions)
std_precision = np.std(precisions)
mean_recall = np.mean(recalls)
std_recall = np.std(recalls)
mean_f1 = np.mean(f1_scores)
std_f1 = np.std(f1_scores)

print(f"Average Accuracy: {mean_accuracy:.4f} ± {std_accuracy:.4f}")
print(f"Average Precision: {mean_precision:.4f} ± {std_precision:.4f}")
print(f"Average Recall: {mean_recall:.4f} ± {std_recall:.4f}")
print(f"Average F1 Score: {mean_f1:.4f} ± {std_f1:.4f}")


Fold 1 Confusion Matrix:
[[9 1]
 [1 9]]
Accuracy: 0.9000, Precision: 0.9000, Recall: 0.9000, F1 Score: 0.9000

Fold 2 Confusion Matrix:
[[9 1]
 [1 9]]
Accuracy: 0.9000, Precision: 0.9000, Recall: 0.9000, F1 Score: 0.9000

Fold 3 Confusion Matrix:
[[ 6  4]
 [ 0 10]]
Accuracy: 0.8000, Precision: 0.8571, Recall: 0.8000, F1 Score: 0.7917

Fold 4 Confusion Matrix:
[[10  0]
 [ 1  9]]
Accuracy: 0.9500, Precision: 0.9545, Recall: 0.9500, F1 Score: 0.9499

Fold 5 Confusion Matrix:
[[10  0]
 [ 2  8]]
Accuracy: 0.9000, Precision: 0.9167, Recall: 0.9000, F1 Score: 0.8990

Fold 6 Confusion Matrix:
[[10  0]
 [ 5  5]]
Accuracy: 0.7500, Precision: 0.8333, Recall: 0.7500, F1 Score: 0.7333

Fold 7 Confusion Matrix:
[[10  0]
 [ 0 10]]
Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1 Score: 1.0000

Fold 8 Confusion Matrix:
[[7 3]
 [1 9]]
Accuracy: 0.8000, Precision: 0.8125, Recall: 0.8000, F1 Score: 0.7980

Fold 9 Confusion Matrix:
[[9 1]
 [1 9]]
Accuracy: 0.9000, Precision: 0.9000, Recall: 0.9000,
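
The data-loading line in the cell above packs several steps into one list comprehension: each spreadsheet row holds two bracketed, comma-separated strings (columns `Real` and `Imag`), which are parsed and combined into one complex feature vector. A step-by-step version, as a hedged sketch with made-up cell contents and a hypothetical helper `parse_complex_row`, could look like this:

```python
def parse_complex_row(real_text, imag_text):
    """Turn two strings like "[0.5, -1.0]" into a list of complex numbers."""
    real_parts = [float(v) for v in real_text.strip("[]").split(",")]
    imag_parts = [float(v) for v in imag_text.strip("[]").split(",")]
    if len(real_parts) != len(imag_parts):
        raise ValueError("Real and Imag columns must have the same length")
    return [complex(re, im) for re, im in zip(real_parts, imag_parts)]


# Made-up cell contents in the format the notebook expects.
row = parse_complex_row("[1.0, 0.0, -1.0]", "[0.0, 1.0, 0.5]")
print(row)
```

Applying such a helper to every row and wrapping the result in `np.array` would give the same kind of complex array that the notebook assigns to `X`; the explicit length check is an addition that the original one-liner does not perform.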


- b) Real-ANN: Cross-validation and metric averages of classification with real-valued artificial neural networks.


In [81]:
import pandas as pd
import numpy as np
import ast
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
from keras.utils import to_categorical

# Load dataset
file_path = r"D:\datasetTEZ\KİNASE_GPCR_AminoAcid_EncodedReel.xlsx"
data = pd.read_excel(file_path)
data['Encoded'] = data['Encoded'].apply(ast.literal_eval)

# Separate features and labels
X = np.array(data['Encoded'].tolist())
y = data['label'].values
y = to_categorical(y, num_classes=2)  # One-hot encode the two class labels

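# Standardize data
# Note: the scaler is fitted on the full dataset before splitting, so test-fold
# statistics influence the scaling; fitting it inside each fold would avoid this.
sc = StandardScaler()
X = sc.fit_transform(X)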

# Define 10-fold stratified cross-validation
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)

# Initialize lists to store results
accuracies, precisions, recalls, f1_scores, conf_matrices = [], [], [], [], []

# Start cross-validation
for train_index, test_index in kf.split(X, np.argmax(y, axis=1)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Model definition
    model = Sequential()
    model.add(Dense(units=12, activation='relu', input_dim=X_train.shape[1]))
    model.add(Dropout(0.5))
    model.add(Dense(units=8, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(units=6, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(units=2, activation='sigmoid'))  # Two output units, one per one-hot encoded class

    # Compile the model
    model.compile(optimizer=Adam(learning_rate=0.005), loss='binary_crossentropy', metrics=['accuracy'])

    # Fit the model
    model.fit(X_train, y_train, batch_size=10, epochs=80, verbose=0)

    # Predict on the test set
    y_pred_proba = model.predict(X_test)
    y_pred_classes = np.argmax(y_pred_proba, axis=1)
    y_true_classes = np.argmax(y_test, axis=1)

    # Calculate metrics
    cm = confusion_matrix(y_true_classes, y_pred_classes)
    accuracy = accuracy_score(y_true_classes, y_pred_classes)
    precision = precision_score(y_true_classes, y_pred_classes, average='macro')
    recall = recall_score(y_true_classes, y_pred_classes, average='macro')
    f1 = f1_score(y_true_classes, y_pred_classes, average='macro')

    # Store the results
    conf_matrices.append(cm)
    accuracies.append(accuracy)
    precisions.append(precision)
    recalls.append(recall)
    f1_scores.append(f1)

# Display the results from all folds
for i, cm in enumerate(conf_matrices, 1):
    print(f"Fold {i} Confusion Matrix:\n{cm}")
    print(f"Accuracy: {accuracies[i-1]:.4f}, Precision: {precisions[i-1]:.4f}, Recall: {recalls[i-1]:.4f}, F1 Score: {f1_scores[i-1]:.4f}\n")

# Calculate and display the mean and standard deviation of the metrics
mean_accuracy = np.mean(accuracies)
std_accuracy = np.std(accuracies)
mean_precision = np.mean(precisions)
std_precision = np.std(precisions)
mean_recall = np.mean(recalls)
std_recall = np.std(recalls)
mean_f1 = np.mean(f1_scores)
std_f1 = np.std(f1_scores)

print(f"Average Accuracy: {mean_accuracy:.4f} ± {std_accuracy:.4f}")
print(f"Average Precision: {mean_precision:.4f} ± {std_precision:.4f}")
print(f"Average Recall: {mean_recall:.4f} ± {std_recall:.4f}")
print(f"Average F1 Score: {mean_f1:.4f} ± {std_f1:.4f}")


Fold 1 Confusion Matrix:
[[9 1]
 [1 9]]
Accuracy: 0.9000, Precision: 0.9000, Recall: 0.9000, F1 Score: 0.9000

Fold 2 Confusion Matrix:
[[ 7  3]
 [ 0 10]]
Accuracy: 0.8500, Precision: 0.8846, Recall: 0.8500, F1 Score: 0.8465

Fold 3 Confusion Matrix:
[[9 1]
 [1 9]]
Accuracy: 0.9000, Precision: 0.9000, Recall: 0.9000, F1 Score: 0.9000

Fold 4 Confusion Matrix:
[[ 5  5]
 [ 0 10]]
Accuracy: 0.7500, Precision: 0.8333, Recall: 0.7500, F1 Score: 0.7333

Fold 5 Confusion Matrix:
[[ 8  2]
 [ 0 10]]
Accuracy: 0.9000, Precision: 0.9167, Recall: 0.9000, F1 Score: 0.8990

Fold 6 Confusion Matrix:
[[10  0]
 [ 1  9]]
Accuracy: 0.9500, Precision: 0.9545, Recall: 0.9500, F1 Score: 0.9499

Fold 7 Confusion Matrix:
[[10  0]
 [ 1  9]]
Accuracy: 0.9500, Precision: 0.9545, Recall: 0.9500, F1 Score: 0.9499

Fold 8 Confusion Matrix:
[[9 1]
 [2 8]]
Accuracy: 0.8500, Precision: 0.8535, Recall: 0.8500, F1 Score: 0.8496

Fold 9 Confusion Matrix:
[[10  0]
 [ 4  6]]
Accuracy: 0.8000, Precision: 0.8571, Recall: 0.8000, F1 Score: 0.7917

- c) Paired t-test Analysis:

In cross-validation, the accuracies that the two methods obtain on each fold can be compared directly with a paired t-test. Across the 10 folds, the mean accuracy was 0.8650 for Complex-ANN and 0.8850 for Real-ANN. The paired t-test gave a t-statistic of -0.4657 (9 degrees of freedom) and a p-value of 0.6525. Because the p-value is well above the 0.05 threshold, the performance difference between the two methods is not statistically significant.
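
As a cross-check, the t-statistic can be reproduced by hand from the per-fold differences: it is the mean difference divided by its standard error, where the standard error uses the sample standard deviation (n - 1 in the denominator). The sketch below uses only the standard library; the variable names `complex_acc` and `real_acc` are illustrative, and the values are the per-fold accuracies reported in this section.

```python
import math
import statistics

# Per-fold accuracies reported in this section (10 folds each)
complex_acc = [0.9000, 0.9000, 0.8000, 0.9500, 0.9000, 0.7500, 1.0000, 0.8000, 0.9000, 0.7500]
real_acc = [0.9000, 0.8500, 0.9000, 0.7500, 0.9000, 0.9500, 0.9500, 0.8500, 0.8000, 1.0000]

# Paired design: work on the fold-by-fold differences
diffs = [c - r for c, r in zip(complex_acc, real_acc)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)          # sample standard deviation (n - 1)
t_stat = mean_diff / (sd_diff / math.sqrt(n))

print(f"mean difference = {mean_diff:.4f}, t = {t_stat:.4f}, df = {n - 1}")
# mean difference = -0.0200, t = -0.4657, df = 9
```

The negative sign only reflects the order of subtraction (complex minus real); the p-value itself requires the Student's t distribution, which `scipy.stats.ttest_rel` evaluates.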



In [97]:
from scipy.stats import ttest_rel

# Per-fold accuracy of the complex-valued and real-valued models (10 folds)
complex_method1 = [0.9000, 0.9000, 0.8000, 0.9500, 0.9000, 0.7500, 1.0000, 0.8000, 0.9000, 0.7500]
real_method2 = [0.9000, 0.8500, 0.9000, 0.7500, 0.9000, 0.9500, 0.9500, 0.8500, 0.8000, 1.0000]

# Paired t-test
t_stat, p_value = ttest_rel(complex_method1, real_method2)

print(f"T-Statistic: {t_stat}, P-value: {p_value}")

if p_value < 0.05:
    print("There is a statistically significant difference between the two models.")
else:
    print("There is no statistically significant difference between the two models.")

T-Statistic: -0.46569031542379935, P-value: 0.652501015204662
There is no statistically significant difference between the two models.
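
A confidence interval for the mean accuracy difference gives a complementary view of the same result. The sketch below is a rough illustration rather than part of the original analysis: it assumes the per-fold differences are approximately normal and hard-codes the two-sided 95% critical value of Student's t with 9 degrees of freedom (about 2.262, taken from a standard t table). The variable names are illustrative.

```python
import math
import statistics

# Per-fold accuracies reported in this section (10 folds each)
complex_acc = [0.9000, 0.9000, 0.8000, 0.9500, 0.9000, 0.7500, 1.0000, 0.8000, 0.9000, 0.7500]
real_acc = [0.9000, 0.8500, 0.9000, 0.7500, 0.9000, 0.9500, 0.9500, 0.8500, 0.8000, 1.0000]

diffs = [c - r for c, r in zip(complex_acc, real_acc)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference

# Assumption: two-sided 95% critical value for t with n - 1 = 9 degrees of freedom
T_CRIT_DF9 = 2.262

ci_low = mean_diff - T_CRIT_DF9 * std_err
ci_high = mean_diff + T_CRIT_DF9 * std_err

print(f"Mean difference (complex - real): {mean_diff:.3f}")
print(f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
# Mean difference (complex - real): -0.020
# 95% CI: [-0.117, 0.077]
```

The interval contains zero, which agrees with the non-significant p-value. Its width also shows that, with only 10 folds of 20 samples each, differences of several accuracy points in either direction cannot be ruled out.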
