# Deep Learning

# Utilizing Apache Spark and PySpark for Deep Learning in Neuroimaging Studies

## Why Spark?

- **Efficient Large Data Handling:**
   - Spark is built for fast, in-memory computation on datasets too large for a single machine, which matters for large neuroimaging datasets.

- **PySpark - Python Integration:**
   - PySpark provides a Python interface to Spark, so Spark data processing can run alongside Python deep learning libraries such as TensorFlow/Keras.

- **Distributed Computing:**
   - Spark distributes work across CPU cores or cluster nodes, which can speed up the parallelizable steps of a deep learning workflow, such as data preparation.

- **MLlib for Machine Learning:**
   - Spark's MLlib offers distributed machine learning algorithms and feature transformers, which can complement deep learning models built with Python libraries.

## Advantages:

- Spark handles our large, high-dimensional participant data efficiently.
- PySpark lets Spark data processing run in the same Python environment as our deep learning libraries.
- Distributed processing can speed up the data-heavy steps of our deep learning workflow.
- Scalability lets the system grow with our data needs.

In summary, Apache Spark and PySpark provide a robust, scalable, and efficient platform for deep learning in our neuroimaging study, enabling us to process large datasets with speed and ease.


## 1 - Setting up a local Spark session

In [6]:
from pyspark.sql import SparkSession

# "local[*]" runs Spark on this machine, using one worker thread per available CPU core
spark = SparkSession.builder.master("local[*]").appName("PySpark_VSCode").getOrCreate()


In [7]:
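import pandas as pd

# Import everything from tensorflow.keras; mixing it with the standalone
# `keras` package can produce incompatible layer and model objects
from tensorflow import keras
from tensorflow.keras import models
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Dense, Dropout, Input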


In [8]:
combined_df = pd.read_csv('/home/skander/combined.csv')

In [9]:

# Set 'participant_id' as the index so it is not treated as a feature
combined_df_with_index = combined_df.set_index('participant_id')

# Extract features for pattern discovery:
# 'participant_id' is now the index, so every remaining column is a feature
features = combined_df_with_index

# The features DataFrame is now ready for unsupervised learning techniques
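
The autoencoder in the next section ends in a sigmoid layer and is trained with binary cross-entropy, both of which assume feature values in the range [0, 1]. Whether `combined.csv` is already scaled is not stated, so the following is only a minimal sketch of min-max scaling under the assumption that it is not. The helper `min_max_scale` and the toy columns `roi_a` and `roi_b` are hypothetical names used for illustration; they do not come from the dataset.

```python
import pandas as pd


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale every numeric column to [0, 1]; constant columns become 0."""
    # Keep numeric columns only and use float32, the dtype Keras trains in by default
    numeric = df.select_dtypes(include="number").astype("float32")
    col_min = numeric.min()
    col_range = numeric.max() - col_min
    # Avoid division by zero for columns that hold a single repeated value
    col_range = col_range.replace(0, 1)
    return (numeric - col_min) / col_range


# Hypothetical stand-in for the `features` DataFrame
toy_features = pd.DataFrame(
    {"roi_a": [2.0, 4.0, 6.0], "roi_b": [10.0, 10.0, 10.0]},
    index=pd.Index(["sub-01", "sub-02", "sub-03"], name="participant_id"),
)

scaled = min_max_scale(toy_features)
print(scaled)
```

In the notebook this would be applied as `features = min_max_scale(features)` before training. Missing values are left as they are by this sketch and would need separate handling, since Keras cannot train on NaN inputs.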



## 2 - Neural network layers (autoencoder)

In [11]:
# Adjust the input dimension to match the number of features in your dataset
input_dim = features.shape[1]  # Number of features

# Define the size of the encoded representations
encoding_dim = 32  # experiment with this value

# Input layer
input_layer = Input(shape=(input_dim,))

# Encoded layer - where the data is compressed
encoded = Dense(encoding_dim, activation='relu')(input_layer)

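# Decoded layer - reconstruction of the input
# (sigmoid outputs lie in [0, 1], so the features are assumed to be scaled to that range)
decoded = Dense(input_dim, activation='sigmoid')(encoded)

# Autoencoder model
autoencoder = Model(input_layer, decoded)

# Compile the model
# (binary cross-entropy also assumes targets in [0, 1]; for unscaled features,
# a linear output layer with loss='mse' is the usual alternative)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')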

# Train the autoencoder
autoencoder.fit(features, features,  # Input and output are the same for an autoencoder
                epochs=50,          # Number of epochs
                batch_size=256,     # Batch size
                shuffle=True)       # Shuffle the data

Epoch 1/50


2024-02-01 09:00:40.486215: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-02-01 09:00:40.585502: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2256] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...


Epoch 2/50
...
Epoch 50/50


<keras.src.callbacks.History at 0x7fed042e5d20>
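
To make the structure of the model above concrete, the following NumPy sketch mirrors its forward pass: a ReLU encoding layer, a sigmoid decoding layer, and a binary cross-entropy averaged over features for each participant. It is an illustration only. The weights are random rather than trained, the toy sizes are hypothetical, and the function names (`autoencoder_forward`, `binary_crossentropy`) are not part of Keras.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def autoencoder_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Mirror of Dense(relu) -> Dense(sigmoid); returns (code, reconstruction)."""
    code = relu(x @ w_enc + b_enc)          # compressed representation
    recon = sigmoid(code @ w_dec + b_dec)   # reconstruction in (0, 1)
    return code, recon


def binary_crossentropy(x, recon, eps=1e-7):
    """Binary cross-entropy per row, averaged over the feature axis."""
    recon = np.clip(recon, eps, 1.0 - eps)  # guard against log(0)
    return -np.mean(x * np.log(recon) + (1.0 - x) * np.log(1.0 - recon), axis=1)


rng = np.random.default_rng(0)
n_participants, input_dim, encoding_dim = 5, 8, 3    # toy sizes (hypothetical)
x = rng.random((n_participants, input_dim))           # stand-in features in [0, 1)

# Random, untrained weights: this shows shapes and data flow, not learned behavior
w_enc = rng.normal(scale=0.1, size=(input_dim, encoding_dim))
b_enc = np.zeros(encoding_dim)
w_dec = rng.normal(scale=0.1, size=(encoding_dim, input_dim))
b_dec = np.zeros(input_dim)

code, recon = autoencoder_forward(x, w_enc, b_enc, w_dec, b_dec)
per_participant_loss = binary_crossentropy(x, recon)
print(code.shape, recon.shape, per_participant_loss.shape)  # (5, 3) (5, 8) (5,)
```

With the trained Keras model, the compressed representation could be obtained by building a second model over the same layers, for example `Model(input_layer, encoded).predict(features)`, and reconstructions with `autoencoder.predict(features)`.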