## Research: Skip Connections

Skip connections are a standard building block in many convolutional architectures. A skip connection gives the gradient an alternative path during backpropagation, and these additional paths tend to help the model converge.

A skip connection bypasses one or more layers: the output of a layer is fed not only to the next layer but also to layers further along the network.

Backpropagation applies the chain rule, so the error gradient is multiplied by one more term for every layer we pass on the way back. In a long chain of multiplications, if many of those terms have magnitude less than 1, their product becomes very small. The gradient therefore shrinks as we approach the earlier layers of a deep architecture (the vanishing gradient problem). In some cases it becomes effectively zero, and the early layers are not updated at all.
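
To make the shrinking product concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, that every layer contributes the same local derivative of 0.05; real networks have a different factor at every layer. With an additive skip connection `y = x + f(x)` the local derivative becomes `1 + f'(x)`, so the identity path keeps the product from collapsing toward zero.

```python
def chain_gradient(local_derivatives):
    """Gradient through a plain stack: the product of the local derivatives."""
    grad = 1.0
    for d in local_derivatives:
        grad *= d
    return grad


def residual_chain_gradient(local_derivatives):
    """Gradient through residual blocks y = x + f(x): each factor is 1 + f'(x)."""
    grad = 1.0
    for d in local_derivatives:
        grad *= 1.0 + d
    return grad


# Illustrative assumption: 20 layers, each with local derivative 0.05.
derivatives = [0.05] * 20
plain = chain_gradient(derivatives)
residual = residual_chain_gradient(derivatives)
print(f"plain stack:     {plain:.3e}")
print(f"residual blocks: {residual:.3e}")
```

The plain product is vanishingly small, while the residual product stays above 1 because every factor contains the identity term.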

There are two fundamental ways to combine a skipped signal with the output of later, non-adjacent layers:

- Addition, as in residual architectures (ResNet).
- Concatenation, as in densely connected architectures (DenseNet).

### What are skip connections?

Skip connections are shortcuts that connect the output of one layer to the input of another layer that is not adjacent to it. For example, in a CNN with four layers A, B, C, and D, a skip connection could connect layer A to layer C, layer B to layer D, or both. Skip connections can be implemented in different ways, such as adding, concatenating, or multiplying the skipped signal with the output of the layers it bypasses.
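
The three ways of combining signals differ mainly in what they do to the tensor shape. The NumPy sketch below uses random stand-in arrays (the names `skipped` and `processed` are hypothetical, not part of any library) to show that addition and multiplication need matching shapes and keep them, while concatenation stacks the features.

```python
import numpy as np

rng = np.random.default_rng(0)
skipped = rng.normal(size=(4, 8))    # hypothetical output of layer A (batch of 4, 8 features)
processed = rng.normal(size=(4, 8))  # hypothetical output of layer B, about to enter layer C

added = skipped + processed                                   # residual style: shape unchanged
concatenated = np.concatenate([skipped, processed], axis=1)   # dense style: features stack up
multiplied = skipped * processed                              # multiplicative (gating) style

print(added.shape, concatenated.shape, multiplied.shape)  # (4, 8) (4, 16) (4, 8)
```

Because concatenation grows the feature dimension at every connection, densely connected designs have to budget for wider inputs in later layers; additive designs do not.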


Creating a test similar to the one described involves several steps, encompassing the design of the neural network architecture, implementing skip connections, using a specific activation function, applying L2 regularization, and introducing rotation transformations during training. Below is a high-level guide on how to set up and conduct such an experiment using Python and a popular deep learning library like TensorFlow or PyTorch.

### Step 1: Environment Setup

Ensure you have Python installed on your machine along with TensorFlow or PyTorch. This guide will use TensorFlow for illustration purposes, but the concepts are transferable to PyTorch with similar functionalities.

```bash
pip install tensorflow numpy
```

### Step 2: Create the Neural Network Model

Define a neural network model with three layers, using skip connections and the `tanh` activation function. TensorFlow's functional API can handle skip connections easily.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

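def create_model(input_shape):
    # Accept either an int (e.g. 20) or a shape tuple (e.g. (20,))
    if isinstance(input_shape, int):
        input_shape = (input_shape,)
    inputs = layers.Input(shape=input_shape)

    # Add() raises a shape error when the widths differ (e.g. 64 vs. 20),
    # so every Dense layer keeps the input width instead of a fixed 64 units.
    units = input_shape[-1]
    l2 = regularizers.l2(0.01)

    # Three tanh layers, each wrapped in an additive skip connection
    x = layers.Dense(units, activation='tanh', kernel_regularizer=l2)(inputs)
    skip1 = layers.Add()([x, inputs])

    x = layers.Dense(units, activation='tanh', kernel_regularizer=l2)(skip1)
    skip2 = layers.Add()([x, skip1])

    x = layers.Dense(units, activation='tanh', kernel_regularizer=l2)(skip2)
    output = layers.Add()([x, skip2])

    return models.Model(inputs=inputs, outputs=output)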
```

### Step 3: Training Setup

Prepare your dataset. Since this experiment involves learning from random data, you can generate synthetic data pairs `(x, y)` where both `x` and `y` are random vectors of the same dimension.

```python
import numpy as np

# Generate random training data
num_samples = 1000
input_shape = 20  # Example input shape
x_train = np.random.rand(num_samples, input_shape)
y_train = np.random.rand(num_samples, input_shape)
```

### Step 4: Initial Training

Train the model using gradient descent, typically by compiling and fitting the model in TensorFlow.

```python
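# Keras expects a shape tuple, so wrap the integer feature count
model = create_model((input_shape,))

# Regression on random targets: accuracy is not meaningful here, so track the loss only
model.compile(optimizer='adam',
              loss='mean_squared_error')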

model.fit(x_train, y_train, epochs=10, batch_size=32)
```

### Step 5: Apply Rotation Transformations

Implement a custom training loop where you apply the rotation transformations to the weight matrices after each training step. The rotation transformation can be applied as described, using small random anti-symmetric matrices `H_i`.

```python
def apply_rotation(W, H):
    # First-order (small-angle) update W + [H, W], with [H, W] = HW - WH.
    # This approximates the conjugation exp(H) W exp(-H) and needs a square W.
    return W + H @ W - W @ H

# Custom training loop to apply rotation
for epoch in range(10):  # Additional training epochs
    # Standard training step
    model.fit(x_train, y_train, epochs=1, batch_size=32, verbose=0)

    # Apply rotation transformations to each Dense layer's weights
    for layer in model.layers:
        if not isinstance(layer, tf.keras.layers.Dense):
            continue
        W, b = layer.get_weights()
        if W.shape[0] != W.shape[1]:
            continue  # H - H.T and the commutator are only defined for square matrices
        A = np.random.rand(*W.shape)
        H = 0.01 * (A - A.T)  # Small anti-symmetric matrix: H.T == -H
        layer.set_weights([apply_rotation(W, H), b])
```
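
The update `W + HW - WH` is only a first-order approximation of a rotation. As a hedged sketch of one way to get an exact rotation from an anti-symmetric matrix, the NumPy example below uses the Cayley transform; this is an illustrative substitute, not necessarily the method the original experiment used, and the helper names are hypothetical.

```python
import numpy as np

def random_antisymmetric(n, scale, rng):
    """Return a small random matrix H with H.T == -H."""
    A = rng.normal(size=(n, n))
    return scale * (A - A.T)

def cayley_rotation(H):
    """Map an anti-symmetric H to an orthogonal matrix R = (I - H/2)^-1 (I + H/2)."""
    I = np.eye(H.shape[0])
    return np.linalg.solve(I - H / 2, I + H / 2)

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 5))            # stand-in for a square weight matrix
H = random_antisymmetric(5, 0.01, rng)
R = cayley_rotation(H)

W_exact = R @ W @ R.T                  # exact similarity transform by a rotation
W_first_order = W + H @ W - W @ H      # the first-order update used in Step 5

print(np.allclose(R @ R.T, np.eye(5)))          # R is orthogonal
print(np.abs(W_exact - W_first_order).max())    # difference is second order in H
```

For small `H` the two results agree closely, but the exact version keeps `R` orthogonal no matter how many times it is applied, whereas repeated first-order updates can slowly drift.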

### Step 6: Evaluation and Interpretation

After training, evaluate the model's performance on test data, or analyze how the weights change under the rotation transformations. In practice this means tracking the loss (accuracy is not a meaningful metric for this regression setup) or using visualization tools to follow the weight matrices over the training epochs.

This experiment is a simplified illustration. The actual implementation, especially the rotation transformation part, would need more nuanced handling to closely mimic the quantum mechanical rotation analogy and to maintain the network's stability and performance as described.



## Research project two: breaking it down step by step

### Experiment: Infinitely Woven Skip Connections

**Goal:** Develop a novel skip connection technique that allows for infinitely weaving connections between a base grid and the input/output of a neural network.

**Experiment Design:**

1. **Base Grid**
   - Create a constant base grid of integers with dimension 100 x 100.
   - Each element in the grid represents a unique identifier for that specific location.
2. **Neural Network**
   - Define a neural network architecture suitable for your task, taking the original 100 x 100 integer matrix as input.
   - The network's output can be either:
     - A single vector of size 100 x 100 (corresponding to each element in the input matrix).
     - A matrix of the same dimension (100 x 100) containing different values for each element.
3. **Infinitely Woven Skip Connections**
   - Weaving Unit: Design a weaving unit that takes three inputs:
     - Base Grid Element: A single element from the base grid (integer).
     - Network Input: The corresponding element from the original input matrix (integer).
     - Network Output: The corresponding element from the network's output (either a scalar or a vector element, depending on the chosen output format).
   - The weaving unit can perform various operations on the input elements depending on your specific needs. Examples include:
     - Concatenation: Combine the base grid element, network input, and network output into a single vector.
     - Gated fusion: Use a gating mechanism to selectively incorporate information from the network input and output while respecting the base grid element.
     - Learned transformation: Apply a learned transformation on the concatenated information (or any other combination) to generate the final output.
   - The weaving unit's output becomes the final "woven" representation for that element.
4. **Output Layer**
   - Create an output layer with the same dimension as the input/output (100 x 100).
   - Each element in the output layer is obtained by applying the weaving unit to the corresponding element in the base grid, its corresponding element in the network input, and its corresponding element in the network output.
5. **Training**
   - Define a cost function based on your specific task and optimize the entire network, including the weaving unit, using gradient-based optimization algorithms (e.g., Adam, SGD).

**Evaluation:**

- Compare the performance of the proposed "infinitely woven skip connection" approach against a baseline model without skip connections and a model with standard skip connections (e.g., residual connections) on the same task and data.
- Analyze the impact of different weaving unit designs on performance and efficiency.
- Visualize or interpret the learned weights/parameters within the weaving unit to understand how the connections are being formed and utilized.

**Benefits:**

- This approach allows for potentially richer information flow within the network by weaving connections between the base grid, initial input, and learned features throughout the network.
- The concept of "infinitely woven connections" offers flexibility in how information from different parts of the network is incorporated.

**Challenges:**

- Designing an efficient and effective weaving unit architecture is crucial.
- Training the network with an additional layer of complexity might require careful hyperparameter tuning and potentially more data.
- The interpretability of the learned "woven" representations might require further investigation.

**Future Work:**

- Explore different weaving unit architectures and their impact on performance and interpretability.
- Investigate the application of this technique to different tasks and network architectures.
- Analyze the computational efficiency of this approach compared to standard skip connections.

This outline provides a starting point for your experiment. You can adapt the specific details of the neural network architecture, weaving unit design, and evaluation metrics to fit your specific needs and research question.
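
As one possible reading of the "gated fusion" weaving unit, the NumPy sketch below blends the network input and output element by element, with a gate computed from all three inputs. Everything here is an assumption made for illustration: the function name `weave_gated`, the sigmoid gate, the hand-picked parameters `w` and `b` (which would be learned in a real experiment), and the random stand-in arrays.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weave_gated(base_grid, net_input, net_output, w, b):
    """Element-wise gated fusion of network input and output, conditioned on the base grid.

    w (3 weights) and b (bias) are the unit's parameters; they are fixed by hand here.
    """
    features = np.stack([base_grid, net_input, net_output], axis=-1)  # (100, 100, 3)
    gate = sigmoid(features @ w + b)                                  # (100, 100), values in (0, 1)
    return gate * net_input + (1.0 - gate) * net_output

rng = np.random.default_rng(0)
base_grid = np.arange(100 * 100).reshape(100, 100) / 10_000.0  # unique id per cell, scaled to [0, 1)
net_input = rng.integers(1, 101, size=(100, 100)) / 100.0      # stand-in input matrix, scaled
net_output = rng.normal(size=(100, 100))                       # stand-in for the network's output

woven = weave_gated(base_grid, net_input, net_output, w=np.array([0.5, -0.3, 0.8]), b=0.1)
print(woven.shape)  # (100, 100)
```

Since the gate lies strictly between 0 and 1, each woven value is a convex combination of the corresponding input and output values, which keeps the unit's output in a predictable range.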


In [None]:
### Setup for a Similar Project on Skip Connections and ResNet

**Prerequisites:**

- Python 3.6 or later
- TensorFlow 2.0 or later
- PyTorch (optional)

**Materials:**

- ResNet code from Medium article
- Keras implementation of ResNet

**Steps:**

1. Data Preparation
   - Obtain the CIFAR-10 dataset or another image classification dataset.
   - Preprocess the data by resizing, normalization, and splitting into training and validation sets.
2. Model Architecture
   - Implement the ResNet architecture using either TensorFlow or PyTorch.
   - Define the skip connection paths as described in the Medium article.
3. Training and Evaluation
   - Train the model on the training set using an optimizer and loss function.
   - Evaluate the model's performance on the validation set.
   - Experiment with different hyperparameters, such as learning rate and batch size, to optimize performance.
4. Visualization and Analysis
   - Plot the training and validation loss curves to monitor the model's progress.
   - Use TensorBoard to visualize the model's layers and activations.
   - Conduct ablation studies to analyze the impact of skip connections on the model's performance.

**Additional Features:**

- Pre-trained models: Utilize pre-trained ResNet models for feature extraction or fine-tuning.
- Data augmentation: Enhance the data variability by applying transformations such as cropping, flipping, and rotation.
- Batch normalization: Improve model stability and convergence.

**Tips:**

- Start with a simple ResNet architecture with a few layers to ensure understanding before scaling up.
- Use a GPU for faster training and inference.
- Explore other variants of ResNet, such as ResNeXt or Wide ResNet.

**Extension:**

- Extend the project by exploring the use of skip connections in other deep neural network architectures, such as U-Nets or transformers.
- Investigate the theoretical aspects of skip connections, such as their role in preventing vanishing gradients.
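
When defining the skip connection paths in step 2, the shortcut and the transformed branch must have the same shape before they are added. The NumPy sketch below shows a single fully connected residual block with an optional linear projection on the shortcut for the case where the width changes. It is a simplified stand-in for the convolutional blocks in ResNet, and the function and parameter names are hypothetical.

```python
import numpy as np

def residual_block(x, W, b, W_proj=None):
    """y = shortcut(x) + tanh(x @ W + b); the shortcut is the identity unless W_proj is given."""
    fx = np.tanh(x @ W + b)
    shortcut = x if W_proj is None else x @ W_proj
    if shortcut.shape != fx.shape:
        raise ValueError(f"shortcut {shortcut.shape} and branch {fx.shape} must match")
    return shortcut + fx

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 20))  # batch of 32 samples, 20 features

# Same width in and out: the identity shortcut works as-is.
y_same = residual_block(x, rng.normal(size=(20, 20)) * 0.1, np.zeros(20))

# Width changes from 20 to 64: a linear projection aligns the shortcut.
y_wide = residual_block(x, rng.normal(size=(20, 64)) * 0.1, np.zeros(64),
                        W_proj=rng.normal(size=(20, 64)) * 0.1)

print(y_same.shape, y_wide.shape)  # (32, 20) (32, 64)
```

Calling the block with a width change but no projection raises the `ValueError`, which mirrors the shape error a framework reports when a 64-unit layer is added directly to a narrower input.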

In [None]:
pip install tensorflow keras numpy matplotlib scikit-learn pandas

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import pandas as pd

# Generate synthetic data
np.random.seed(0)
X = np.random.rand(100, 1)
y = 2 * X + 1 + np.random.randn(100, 1) * 0.1

# Split data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Define the model architecture
def create_model(input_shape, units=64):
    inputs = layers.Input(shape=input_shape)
    l2 = regularizers.l2(0.01)

    # Project the input to `units` features so the skip additions have matching shapes
    h = layers.Dense(units)(inputs)

    # Three tanh layers, each wrapped in an additive skip connection
    for _ in range(3):
        x = layers.Dense(units, activation='tanh', kernel_regularizer=l2)(h)
        h = layers.Add()([x, h])

    # Map back to one value per sample so the output matches y
    outputs = layers.Dense(1)(h)
    return models.Model(inputs=inputs, outputs=outputs)

# Create the model
model = create_model(input_shape=(1,))
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
history = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100, verbose=0)

# Plot the training and validation loss curves
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

# Visualize training in TensorBoard. These are notebook magics, so they need the
# leading '%'. TensorBoard only shows data if logs were written, for example by
# passing callbacks=[tf.keras.callbacks.TensorBoard(log_dir='logs')] to model.fit.
%load_ext tensorboard
%tensorboard --logdir logs

# Conduct ablation studies to analyze the impact of skip connections on the model's performance
# Define a function to create the model with or without skip connections

def create_model(input_shape, use_skip_connections=True, units=64):
    inputs = layers.Input(shape=input_shape)
    l2 = regularizers.l2(0.01)

    # Both variants share the same layers; only the additions differ,
    # so the comparison isolates the effect of the skip connections.
    h = layers.Dense(units)(inputs)
    for _ in range(3):
        x = layers.Dense(units, activation='tanh', kernel_regularizer=l2)(h)
        h = layers.Add()([x, h]) if use_skip_connections else x

    # Same single-value output head for both variants, matching y
    outputs = layers.Dense(1)(h)
    return models.Model(inputs=inputs, outputs=outputs)

# Train the model with skip connections
model_with_skip_connections = create_model(input_shape=(1,))
model_with_skip_connections.compile(optimizer='adam', loss='mean_squared_error')
history_with_skip_connections = model_with_skip_connections.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100, verbose=0)

# Train the model without skip connections
model_without_skip_connections = create_model(input_shape=(1,), use_skip_connections=False)
model_without_skip_connections.compile(optimizer='adam', loss='mean_squared_error')
history_without_skip_connections = model_without_skip_connections.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100, verbose=0)

# Plot the training and validation loss curves for both models
plt.plot(history_with_skip_connections.history['loss'], label='Training Loss (with skip connections)')
plt.plot(history_with_skip_connections.history['val_loss'], label='Validation Loss (with skip connections)')
plt.plot(history_without_skip_connections.history['loss'], label='Training Loss (without skip connections)')
plt.plot(history_without_skip_connections.history['val_loss'], label='Validation Loss (without skip connections)')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

# Save the model's training history to a CSV file
history_df = pd.DataFrame({'Training Loss (with skip connections)': history_with_skip_connections.history['loss'],
                           'Validation Loss (with skip connections)': history_with_skip_connections.history['val_loss'],
                           'Training Loss (without skip connections)': history_without_skip_connections.history['loss'],
                           'Validation Loss (without skip connections)': history_without_skip_connections.history['val_loss']})
history_df.to_csv('model_training_history.csv', index=False)

# Load the model's training history from the CSV file
loaded_history_df = pd.read_csv('model_training_history.csv')
print(loaded_history_df)

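# Use a pre-trained ResNet50 for feature extraction or fine-tuning.
# ResNet50 expects image batches, so this part uses a small CIFAR-10 subset
# (an assumption, to keep runtime modest) instead of the 1-D synthetic data above.
(cifar_x, cifar_y), _ = tf.keras.datasets.cifar10.load_data()
cifar_x, cifar_y = cifar_x[:2000].astype('float32'), cifar_y[:2000]
img_train, img_val, lbl_train, lbl_val = train_test_split(cifar_x, cifar_y, test_size=0.2, random_state=0)

# Load the pre-trained ResNet50 model without its classification head
resnet_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
resnet_model.trainable = False  # Freeze the backbone

# Resize 32x32 CIFAR images to 224x224 and apply ResNet50's own preprocessing
inputs = layers.Input(shape=(32, 32, 3))
x = layers.Resizing(224, 224)(inputs)
x = tf.keras.applications.resnet50.preprocess_input(x)
features = resnet_model(x, training=False)

# Extract features from the frozen backbone
feature_extractor = models.Model(inputs, features)
print(feature_extractor.predict(img_train[:8]).shape)

# Train a new classification head on top of the frozen features
x = layers.GlobalAveragePooling2D()(features)
outputs = layers.Dense(10, activation='softmax')(x)
model = models.Model(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(img_train, lbl_train, validation_data=(img_val, lbl_val), epochs=10)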

# Enhance the data variability by applying transformations such as flipping, rotation, and zoom
# Define data augmentation layers (layers.experimental.preprocessing is deprecated;
# the same layers live directly under tf.keras.layers in TF 2.6+)
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

# Visualize nine augmented versions of one CIFAR-10 image
# (CIFAR-10 is used here as a stand-in for the undefined train_ds dataset)
(sample_images, _), _ = tf.keras.datasets.cifar10.load_data()
image = tf.expand_dims(tf.cast(sample_images[0], tf.float32), 0)
plt.figure(figsize=(10, 10))
for i in range(9):
    augmented = data_augmentation(image, training=True)  # Augmentation is only active in training mode
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(augmented[0].numpy().astype("uint8"))
    plt.axis("off")
plt.show()
        
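# Improve model stability and convergence using batch normalization
# Fashion-MNIST is loaded here because it matches the (28, 28, 1) input shape used below
(train_images, train_labels), (val_images, val_labels) = tf.keras.datasets.fashion_mnist.load_data()
train_images = train_images[..., np.newaxis] / 255.0  # Add a channel axis and scale to [0, 1]
val_images = val_images[..., np.newaxis] / 255.0
# Define the model architecture with batch normalization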
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.BatchNormalization(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, validation_data=(val_images, val_labels), epochs=10)

# Explore other variants of ResNet, such as ResNeXt or Wide ResNet.
# tf.keras.applications does not ship ResNeXt or Wide ResNet; it offers deeper and
# pre-activation ResNets instead. torchvision provides resnext50_32x4d and wide_resnet50_2.
resnet101_model = tf.keras.applications.ResNet101(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
resnet50v2_model = tf.keras.applications.ResNet50V2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Explore the use of skip connections in other deep neural network architectures, such as U-Nets or transformers

# Investigate the theoretical aspects of skip connections, such as their role in preventing vanishing gradients


In [None]:
% MATLAB helper functions (not Python)
% This function simply resizes the images to fit in AlexNet
% Copyright 2017 The MathWorks, Inc.
function I = readFunctionTrain(filename)
% Resize the images to the size required by the network.
I = imread(filename);
I = imresize(I, [227 227]);
end

function saveCIFAR10AsFolderOfImages(inputPath, outputPath, varargin)
% saveCIFAR10AsFolderOfImages   Save the CIFAR-10 dataset as a folder of images
%   saveCIFAR10AsFolderOfImages(inputPath, outputPath) takes the CIFAR-10
%   dataset located at inputPath and saves it as a folder of images to the
%   directory outputPath. If inputPath or outputPath is an empty string, it
%   is assumed that the current folder should be used.
%
%   saveCIFAR10AsFolderOfImages(..., labelDirectories) will save the
%   CIFAR-10 data so that instances with the same label will be saved to
%   sub-directories with the name of that label.

% Check input directories are valid
if(~isempty(inputPath))
    assert(exist(inputPath,'dir') == 7);
end
if(~isempty(outputPath))
    assert(exist(outputPath,'dir') == 7);
end
% Check if we want to save each set with the same labels to its own
% directory.
if(isempty(varargin))
    labelDirectories = false;
else
    assert(nargin == 3);
    labelDirectories = varargin{1};
end
% Set names for directories
trainDirectoryName = 'cifar10Train';
testDirectoryName = 'cifar10Test';
% Create directories for the output
mkdir(fullfile(outputPath, trainDirectoryName));
mkdir(fullfile(outputPath, testDirectoryName));
if(labelDirectories)
    labelNames = {'airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck'};
    iMakeTheseDirectories(fullfile(outputPath, trainDirectoryName), labelNames);
    iMakeTheseDirectories(fullfile(outputPath, testDirectoryName), labelNames);
    for i = 1:5
        iLoadBatchAndWriteAsImagesToLabelFolders(fullfile(inputPath,['data_batch_' num2str(i) '.mat']), fullfile(outputPath, trainDirectoryName), labelNames, (i-1)*10000);
    end
    iLoadBatchAndWriteAsImagesToLabelFolders(fullfile(inputPath,'test_batch.mat'), fullfile(outputPath, testDirectoryName), labelNames, 0);
else
    for i = 1:5
        iLoadBatchAndWriteAsImages(fullfile(inputPath,['data_batch_' num2str(i) '.mat']), fullfile(outputPath, trainDirectoryName), (i-1)*10000);
    end
    iLoadBatchAndWriteAsImages(fullfile(inputPath,'test_batch.mat'), fullfile(outputPath, testDirectoryName), 0);
end
end
function iLoadBatchAndWriteAsImagesToLabelFolders(fullInputBatchPath, fullOutputDirectoryPath, labelNames, nameIndexOffset)
load(fullInputBatchPath);
data = data'; %#ok<NODEF>
data = reshape(data, 32,32,3,[]);
data = permute(data, [2 1 3 4]);
for i = 1:size(data,4)
    imwrite(data(:,:,:,i), fullfile(fullOutputDirectoryPath, labelNames{labels(i)+1}, ['image' num2str(i + nameIndexOffset) '.png']));
end
end
function iLoadBatchAndWriteAsImages(fullInputBatchPath, fullOutputDirectoryPath, nameIndexOffset)
load(fullInputBatchPath);
data = data'; %#ok<NODEF>
data = reshape(data, 32,32,3,[]);
data = permute(data, [2 1 3 4]);
for i = 1:size(data,4)
    imwrite(data(:,:,:,i), fullfile(fullOutputDirectoryPath, ['image' num2str(i + nameIndexOffset) '.png']));
end
end
function iMakeTheseDirectories(outputPath, directoryNames)
for i = 1:numel(directoryNames)
    mkdir(fullfile(outputPath, directoryNames{i}));
end
end

In [None]:
import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images/255.0
test_images=test_images/255.0
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print ('Test loss: {}, Test accuracy: {}'.format(test_loss, test_accuracy*100))

In [None]:
import numpy as np
import pandas as pd

# Generate a 100x100 grid of random integers (for demonstration purposes)
# Let's assume values range from 1 to 100 for simplicity
grid = np.random.randint(1, 101, size=(100, 100))

# Perform a simplified "transform" by summing values along one dimension
# This is akin to reducing the dimensionality of the data for further analysis
transformed_data = np.sum(grid, axis=0)  # Sum along columns

# For demonstration, let's show the shape of the original and transformed data
original_shape = grid.shape
transformed_shape = transformed_data.shape

original_shape, transformed_shape, transformed_data[:10]  # Display shapes and first 10 values of transformed data


In [None]:
import numpy as np
import pandas as pd

# Load the base grid from CSV (assumed here to hold a 100x100 grid of integers)
base_grid = pd.read_csv('../storage/a.csv', header=None)
grid = base_grid

# Perform a simplified "transform" by summing values along one dimension
# This is akin to reducing the dimensionality of the data for further analysis
transformed_data = np.sum(grid, axis=0)  # Sum along columns

# For demonstration, let's show the shape of the original and transformed data
original_shape = grid.shape
transformed_shape = transformed_data.shape

original_shape, transformed_shape, transformed_data[:10]  # Display shapes and first 10 values of transformed data


Periodic summation is a concept from mathematics that allows for the extension of an integrable function \(s(t)\) into a periodic function \(s_P(t)\) with a period \(P\). This is achieved by summing shifted copies of the original function \(s(t)\) at intervals that are integer multiples of \(P\). Mathematically, this is represented as:

\[s_P(t) = \sum_{n=-\infty}^{\infty} s(t + nP)\]

This technique is pivotal in signal processing and analysis, particularly when dealing with Fourier transforms, as it allows for the transformation of non-periodic signals into periodic ones. The Fourier coefficients of the periodically summed function \(s_P(t)\) correspond to the values of the continuous Fourier transform of \(s(t)\) at intervals of \(\frac{1}{P}\). This relationship is an instance of the Poisson summation formula.
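
Written out, and assuming the Fourier transform convention \(S(f) = \int_{-\infty}^{\infty} s(t)\, e^{-i 2\pi f t}\, dt\) (other conventions move factors of \(2\pi\) around), the Fourier series of \(s_P(t)\) is:

\[s_P(t) = \sum_{k=-\infty}^{\infty} c_k \, e^{i 2\pi k t / P}, \qquad c_k = \frac{1}{P}\, S\!\left(\frac{k}{P}\right)\]

So the \(k\)-th coefficient samples the transform of \(s(t)\) at frequency \(k/P\), scaled by \(1/P\).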

To illustrate periodic summation and its effects, let's simulate a simple integrable function and apply periodic summation to it. We'll use a discretized approach suitable for computational demonstration. Specifically, we will:

1. Define a simple integrable function \(s(t)\) over a discrete set of points.
2. Apply periodic summation to create a periodic function \(s_P(t)\) with a chosen period \(P\).
3. Visualize the original and periodically summed functions for comparison.

Let's start by defining a simple function \(s(t)\) and then perform periodic summation on it.

The demonstration (the code cell that follows this text) illustrates periodic summation with a simple function \(s(t) = e^{-t}\) defined over the interval \([0, 5)\), extending it beyond its original domain so that it becomes periodic with period \(P = 5\). Because \(s(t)\) is zero outside \([0, 5)\), the shifted copies never overlap, and the sum reduces to placing copies side by side.

- **Original Function \(s(t)\):** The left plot shows the original function \(s(t) = e^{-t}\), defined over the interval \([0, 5)\). This function decays exponentially and is not inherently periodic.
- **Periodically Summed Function \(s_P(t)\):** The right plot displays the result of applying periodic summation to \(s(t)\), effectively replicating and shifting the function to create a periodic extension. For visualization purposes, the function is replicated three times, but theoretically the summation extends infinitely in both directions.

This process demonstrates how periodic summation transforms a non-periodic function into a periodic one, a principle that underpins various applications in signal processing and Fourier analysis. The resultant periodic function can then be analyzed using tools like Fourier series, revealing insights about the original signal's frequency components.
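
When \(s(t)\) is not cut off at \(t = P\), the shifted copies overlap and simple replication is no longer enough; the sum itself has to be evaluated. The NumPy sketch below is a minimal illustration under stated assumptions: it truncates the infinite sum to a finite number of terms (the `n_terms` parameter is a hypothetical name) and uses the one-sided exponential \(e^{-t}\) for \(t \ge 0\), for which the geometric series gives the closed form \(e^{-t} / (1 - e^{-P})\) on \([0, P)\).

```python
import numpy as np

def periodic_summation(s, t, P, n_terms=50):
    """Approximate s_P(t) = sum_n s(t + n P) by truncating to |n| <= n_terms."""
    n = np.arange(-n_terms, n_terms + 1)
    return s(t[:, None] + n[None, :] * P).sum(axis=1)

def s(t):
    """One-sided decaying exponential: e^{-t} for t >= 0, zero otherwise."""
    return np.where(t >= 0, np.exp(-np.abs(t)), 0.0)

P = 5.0
t = np.linspace(0, P, 200, endpoint=False)
s_P = periodic_summation(s, t, P)

# Geometric series: on [0, P) the exact sum is e^{-t} / (1 - e^{-P})
closed_form = np.exp(-t) / (1.0 - np.exp(-P))
print(np.max(np.abs(s_P - closed_form)))
```

The printed maximum error is at the level of floating-point rounding, and evaluating the same sum at `t + P` reproduces `s_P`, which is the periodicity the definition promises.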


In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Define a simple function s(t) = exp(-t) for t in [0, 5)
t = np.linspace(0, 5, 1000, endpoint=False)
s_t = np.exp(-t)

# Choose a period P for the summation
P = 5
# Extended t for periodic summation
t_extended = np.linspace(-P, 2*P, 3000, endpoint=False)

# Perform periodic summation. s(t) is only defined on [0, P), so the shifted copies
# do not overlap and the sum reduces to placing copies side by side.
s_P_t = np.tile(s_t, 3)  # Three periods, covering [-P, 2P)

# Plot the original and periodically summed function
plt.figure(figsize=(14, 6))

# Original function plot
plt.subplot(1, 2, 1)
plt.plot(t, s_t, label='$s(t) = e^{-t}$')
plt.title('Original Function $s(t)$')
plt.xlabel('t')
plt.ylabel('$s(t)$')
plt.legend()

# Periodically summed function plot
plt.subplot(1, 2, 2)
plt.plot(t_extended, s_P_t, label='$s_P(t)$ with $P=5$')
plt.title('Periodically Summed Function $s_P(t)$')
plt.xlabel('t')
plt.ylabel('$s_P(t)$')
plt.legend()

plt.tight_layout()
plt.show()


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the base grid from CSV (assumed here to hold a 100x100 grid of integers)
base_grid = pd.read_csv('../storage/a.csv', header=None)
grid = base_grid

# Perform a simplified "transform" by summing values along one dimension (akin to reducing dimensionality)
transformed_data = np.sum(grid, axis=0)  # Sum along columns

# Tiling repeats the transformed data end to end, which makes it periodic.
# The copies do not overlap, so this is a periodic extension rather than a true summation.
periodic_summation_example = np.tile(transformed_data, 55)  # Replicate 55 times

# Show the shape of the original, transformed, and "periodically summed" data
original_shape = grid.shape
transformed_shape = transformed_data.shape
periodic_summed_shape = periodic_summation_example.shape

original_shape, transformed_shape, periodic_summed_shape, periodic_summation_example[:10]




In [None]:
# Step 1: Generate a 100x100 grid of random integers
np.random.seed(0)  # For reproducibility
grid = np.random.randint(1, 101, size=(100, 100))

# Step 2: Transform the data by summing along one dimension
transformed_data = np.sum(grid, axis=0)  # Sum along columns

# Step 3: Apply "periodic summation" by replicating the transformed data
periodic_summation_example = np.tile(transformed_data, 3)  # Replicate 3 times

# Visualization
fig, axs = plt.subplots(3, 1, figsize=(10, 15))

# Original Grid Visualization
axs[0].imshow(grid, cmap='viridis')
axs[0].set_title('Original 100x100 Grid')
axs[0].axis('off')

# Transformed Data Visualization
axs[1].plot(transformed_data)
axs[1].set_title('Transformed Data (Summed along one dimension)')
axs[1].set_xlabel('Index')
axs[1].set_ylabel('Summed Value')

# Periodically Summed Data Visualization
axs[2].plot(periodic_summation_example)
axs[2].set_title('Periodically Summed Data')
axs[2].set_xlabel('Index')
axs[2].set_ylabel('Value')

plt.tight_layout()
plt.show()

# Output the shapes for clarity (computed from this cell's own variables)
grid.shape, transformed_data.shape, periodic_summation_example.shape
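
The replication used in the cells above can also be expressed with modular indexing, which makes the period explicit. A small sketch on a made-up array:

```python
import numpy as np

x = np.array([3, 1, 4, 1, 5])
tiled = np.tile(x, 3)

# A periodic extension repeats with period len(x): index i maps back to i mod len(x)
indices = np.arange(tiled.size)
print(np.array_equal(tiled, x[indices % x.size]))  # True
```

The modulo form is convenient when only a few values of the extended sequence are needed, since it avoids materializing all the copies.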


In [None]:
pip install torch torchvision

## Densely connected networks (DenseNet)

https://medium.com/@karuneshu21/implement-densenet-in-pytorch-46374ef91900


## Experiment with tanh and L2 regularization

Conducting a test like the one described takes a structured approach: design a neural network with skip connections, use the `tanh` activation function, apply L2 regularization, and incorporate rotation transformations during training. The experiment explores whether a concept borrowed from quantum mechanics (rotation transformations) can help the learning process or model performance in a deep learning context. The steps below set up and run the experiment using Python and TensorFlow, a popular deep learning library.

### Step 1: Environment Setup

Before starting, ensure Python is installed on your machine along with TensorFlow, which will be used to build and train the neural network model. The installation can be done using pip, Python's package installer, by running the command below in your terminal or command prompt:

```bash
pip install tensorflow numpy
```

This command installs TensorFlow and NumPy, a library for numerical computations that's often used alongside TensorFlow for handling arrays and matrices.

### Step 2: Create the Neural Network Model

The neural network model is designed with three layers, incorporating skip connections between layers to potentially enhance information flow and mitigate issues like vanishing gradients. The `tanh` activation function is used for its non-linear properties, allowing the model to learn complex patterns. L2 regularization is applied to each layer to prevent overfitting by penalizing large weights.

Here's how you can define such a model in TensorFlow:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def create_model(input_shape):
    # input_shape is the number of input features (an int), so wrap it in a tuple
    inputs = layers.Input(shape=(input_shape,))

    # Add() requires both operands to have the same shape, so every Dense layer
    # uses as many units as there are input features.
    x = layers.Dense(input_shape, activation='tanh', kernel_regularizer=regularizers.l2(0.01))(inputs)
    skip1 = layers.Add()([x, inputs])

    x = layers.Dense(input_shape, activation='tanh', kernel_regularizer=regularizers.l2(0.01))(skip1)
    skip2 = layers.Add()([x, skip1])

    x = layers.Dense(input_shape, activation='tanh', kernel_regularizer=regularizers.l2(0.01))(skip2)
    output = layers.Add()([x, skip2])

    model = models.Model(inputs=inputs, outputs=output)
    return model
```
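
To make the shape requirement of the additive skip connections concrete, here is a minimal NumPy sketch of a single residual block, `y = x + tanh(xW + b)`, together with the penalty that an L2 regularizer with factor 0.01 would add to the loss for that layer. The helper name `residual_block`, the layer width of 8, and the random weights are illustrative assumptions and not part of the Keras model.

```python
import numpy as np

def residual_block(x, W, b):
    """One dense layer with tanh activation plus an identity skip connection."""
    # The addition only works if the layer output has the same width as x,
    # so W must be square: (features, features).
    return x + np.tanh(x @ W + b)

rng = np.random.default_rng(0)
features = 8                                 # hypothetical layer width
x = rng.standard_normal((4, features))       # batch of 4 samples
W = 0.1 * rng.standard_normal((features, features))
b = np.zeros(features)

y = residual_block(x, W, b)

# L2 regularization with factor 0.01 adds 0.01 * sum(W**2) to the loss
l2_penalty = 0.01 * np.sum(W ** 2)

print(y.shape)  # (4, 8)
print(l2_penalty)
```

Because the block returns `x` plus a correction, its derivative with respect to `x` contains an identity term, which is the alternative gradient path that skip connections provide.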

### Step 3: Training Setup

To train the model, you'll need a dataset. In this experiment, synthetic data is used, where both input `x` and target `y` are random vectors of the same dimension. Because the targets are independent of the inputs, there is no real mapping to learn; the setup only exercises the training loop without the intricacies of real-world data.

```python
import numpy as np

num_samples = 1000
input_shape = 20  # Example input shape
x_train = np.random.rand(num_samples, input_shape)
y_train = np.random.rand(num_samples, input_shape)
```

### Step 4: Initial Training

Train the model with the synthetic dataset using gradient descent. The `adam` optimizer is chosen for its efficiency and adaptability in various scenarios, and mean squared error is used as the loss function, suitable for regression problems like predicting continuous values.

```python
model = create_model(input_shape)

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['mae'])  # accuracy is not meaningful for regression targets

model.fit(x_train, y_train, epochs=10, batch_size=32)
```

### Step 5: Apply Rotation Transformations

To mimic quantum mechanical rotations, apply a small rotation-like update to each weight matrix after every training epoch. For an anti-symmetric matrix `H`, the update `W + HW - WH` is the first-order approximation of the rotation `exp(H) W exp(-H)`, so it requires `W` to be square. The aim is to explore nearby configurations of the model that might lead to better performance or faster convergence.

```python
def apply_rotation(W, H):
    return W + np.dot(H, W) - np.dot(W, H)

for epoch in range(10):  # Additional training epochs
    model.fit(x_train, y_train, epochs=1, batch_size=32, verbose=0)

    for layer in model.layers:
        if 'dense' in layer.name:
            W = layer.get_weights()[0]
            H = np.random.rand(*W.shape)
            H = H - H.T  # Make H anti-symmetric
            W_rotated = apply_rotation(W, H * 0.01)
            layer.set_weights([W_rotated, layer.get_weights()[1]])
```
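
The relationship between the commutator update and an exact rotation can be checked numerically. The sketch below is a NumPy-only illustration under stated assumptions: `expm_taylor` is a hypothetical helper that approximates the matrix exponential with a truncated Taylor series (adequate only for small `H`), and the 4x4 matrices are random stand-ins for a weight matrix.

```python
import numpy as np

def expm_taylor(M, terms=20):
    """Truncated Taylor series for the matrix exponential (fine for small ||M||)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k          # M^k / k!
        result = result + term
    return result

rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))      # stand-in for a square weight matrix
A = rng.standard_normal((n, n))
H = 0.01 * (A - A.T)                 # small anti-symmetric generator

R = expm_taylor(H)                   # exp(H) is orthogonal when H is anti-symmetric
exact = R @ W @ R.T                  # exp(H) W exp(-H)
first_order = W + H @ W - W @ H      # the update used in apply_rotation

orth_err = np.linalg.norm(R @ R.T - np.eye(n))
approx_err = np.linalg.norm(exact - first_order)

print(orth_err)    # close to machine precision
print(approx_err)  # second-order small in the size of H
```

The first-order update is cheap but does not preserve the norm of `W` exactly; the exact form does, at the cost of computing a matrix exponential.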

### Step 6: Evaluation and Interpretation

After the training process, including the application of rotation transformations, evaluate the model's performance on unseen data (test data) to assess its generalization ability. Additionally, analyzing the model's behavior, such as how the weights change over training epochs due to the rotation transformations, can offer insights into the learning dynamics and the effects of the applied quantum mechanics-inspired techniques.

This experiment serves as a simplified illustration of how concepts from quantum mechanics can be applied in deep learning. The actual implementation and effectiveness of such techniques can vary depending on the complexity of the data and the specific problem being addressed.


In [None]:
pip install tensorflow numpy


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers


# Create model
def create_model(input_shape):
    inputs = layers.Input(shape=(input_shape,))  # Note the comma to make it a tuple

    # Each Dense layer has input_shape units so that Add() receives matching shapes
    x = layers.Dense(input_shape, activation='tanh', kernel_regularizer=regularizers.l2(0.01))(inputs)
    skip1 = layers.Add()([x, inputs])

    x = layers.Dense(input_shape, activation='tanh', kernel_regularizer=regularizers.l2(0.01))(skip1)
    skip2 = layers.Add()([x, skip1])

    x = layers.Dense(input_shape, activation='tanh', kernel_regularizer=regularizers.l2(0.01))(skip2)
    output = layers.Add()([x, skip2])

    model = models.Model(inputs=inputs, outputs=output)
    return model


# Load the training inputs from the CSV file (one sample per row)
base_grid = pd.read_csv('../storage/a.csv', header=None)
x_train = base_grid.to_numpy(dtype='float32')

# Derive the sizes from the data so that inputs and targets always match
num_samples, input_shape = x_train.shape
# Random targets with the same shape as the inputs (placeholder data)
y_train = np.random.rand(num_samples, input_shape)

model = create_model(input_shape)

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['mae'])  # accuracy is not meaningful for regression targets

model.fit(x_train, y_train, epochs=10, batch_size=32)

def apply_rotation(W, H):
    return W + np.dot(H, W) - np.dot(W, H)

for epoch in range(10):  # Additional training epochs
    model.fit(x_train, y_train, epochs=1, batch_size=32, verbose=0)

    for layer in model.layers:
        if 'dense' in layer.name:
            W = layer.get_weights()[0]
            H = np.random.rand(*W.shape)
            H = H - H.T  # Make H anti-symmetric
            W_rotated = apply_rotation(W, H * 0.01)
            layer.set_weights([W_rotated, layer.get_weights()[1]])




In [None]:
import numpy as np

# Create a list of numbers
numbers = [1, 2, 3, 4, 5]

# Convert the list to a NumPy array
array = np.array(numbers)

# Calculate the sum of squares
sum_of_squares = np.sum(array ** 2)

print(sum_of_squares)  # Output: 55

In [None]:
numbers = [1, 2, 3, 4, 5]

sum_of_squares = sum(x ** 2 for x in numbers)

print(sum_of_squares)  # Output: 55

In [None]:
import numpy as np

# Define two matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[1.1, 2.1], [3.2, 4.2]])

# Calculate the Frobenius norm of the difference
frobenius_norm = np.linalg.norm(A - B, 'fro')

print(frobenius_norm)  # Output: approximately 0.3162 (the square root of 0.1)

Mathematically, if A and B are two matrices of the same size, the Frobenius norm of their difference is calculated as:
$$\|A - B\|_F = \sqrt{\sum_{i,j} (a_{ij} - b_{ij})^2}$$
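
As a quick check of the formula, the sketch below evaluates the sum of squared differences by hand and compares it with `np.linalg.norm`. It reuses the 2x2 example matrices from the cell above; the variable names `manual` and `builtin` are illustrative.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.1, 2.1], [3.2, 4.2]])

diff = A - B                              # entries are -0.1, -0.1, -0.2, -0.2
manual = np.sqrt(np.sum(diff ** 2))       # sqrt(0.01 + 0.01 + 0.04 + 0.04)
builtin = np.linalg.norm(diff, 'fro')

print(manual, builtin)  # both approximately 0.3162 (the square root of 0.1)
```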


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

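# Load the base grid and convert it to a NumPy array
base_grid = pd.read_csv('../storage/a.csv', header=None)
grid = base_grid.to_numpy(dtype=float)
# Define two matrices of the same shape; the top-left 2x2 corner of the grid
# is used so that it matches B (assumes the grid is at least 2x2)
A = grid[:2, :2]
B = np.array([[1.1, 2.1], [3.2, 4.2]])

# Calculate the Frobenius norm of the difference
frobenius_norm = np.linalg.norm(A - B, 'fro')

print(frobenius_norm)  # The value depends on the contents of a.csv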

In [None]:
def substitution(input_block, sbox, num_nibbles=4):
    """Apply the 4-bit S-box to each nibble of the integer input block."""
    output = 0
    for i in range(num_nibbles):
        # Extract the i-th 4-bit group and replace it with its S-box value
        nibble = (input_block >> (4 * i)) & 0xF
        output |= sbox[nibble] << (4 * i)
    return output

def permutation(input_block, pbox):
    """Apply P-box permutation on the input block."""
    output = 0
    for i, p in enumerate(pbox):
        bit = (input_block >> i) & 1
        output |= bit << p
    return output

def key_mixing(input_block, subkey):
    """Mix the input block with the subkey using XOR."""
    return input_block ^ subkey

def spn_encrypt(plaintext, keys, sbox, pbox):
    """Encrypt using a simple SPN."""
    state = plaintext
    for key in keys[:-1]:  # Last key is used after the final round
        state = substitution(state, sbox)
        state = permutation(state, pbox)
        state = key_mixing(state, key)
    # Apply the final round (without permutation)
    state = substitution(state, sbox)
    state = key_mixing(state, keys[-1])
    return state

# Example S-box and P-box definitions
sbox = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8, 0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
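# ASSUMPTION: the original P-box values are missing from this cell. The 4x4 bit
# transposition below is a placeholder so that the example runs.
pbox = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]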

# Example keys for each round (assuming a 4-round SPN)
keys = [0x1234, 0x5678, 0x9abc, 0xdef0, 0x1111]

# Encrypting a 16-bit plaintext
plaintext = 0x1234
ciphertext = spn_encrypt(plaintext, keys, sbox, pbox)

print(f'Plaintext: 0x{plaintext:04x}')
print(f'Ciphertext: 0x{ciphertext:04x}')


To explore Lehmer codes, inversion tables, Rothe diagrams, and the generation of permutations, the explanation below is broken down into code examples and visual representations. The running example is the permutation σ = (6, 3, 8, 1, 4, 9, 7, 2, 5), and the code illustrates how to work with Lehmer codes, inversion tables, and Rothe diagrams in Python.

### 1. Representing Permutations with Lehmer Codes

First, let's discuss how to convert a permutation into its Lehmer code and back. The Lehmer code of a permutation gives a way to encode permutations compactly, and it's useful for generating permutations efficiently.

```python
def permutation_to_lehmer(permutation):
    lehmer_code = []
    for i in range(len(permutation)):
        lehmer_code.append(sum(1 for j in permutation[i+1:] if j < permutation[i]))
    return lehmer_code

def lehmer_to_permutation(lehmer_code):
    # Each digit selects the digit-th smallest value that has not been used yet
    remaining = list(range(1, len(lehmer_code) + 1))
    return [remaining.pop(digit) for digit in lehmer_code]
```
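
Applied to the example permutation σ = (6, 3, 8, 1, 4, 9, 7, 2, 5), the encoding can be traced with the self-contained sketch below. The function names `lehmer_code` and `from_lehmer_code` are chosen for this illustration only; the decoder picks, for each digit, the corresponding smallest unused value.

```python
def lehmer_code(perm):
    """Count, for each position, the later entries that are smaller."""
    return [sum(1 for later in perm[i + 1:] if later < value)
            for i, value in enumerate(perm)]

def from_lehmer_code(code):
    """Rebuild the permutation by picking the code[i]-th smallest unused value."""
    remaining = list(range(1, len(code) + 1))
    return [remaining.pop(digit) for digit in code]

sigma = [6, 3, 8, 1, 4, 9, 7, 2, 5]
code = lehmer_code(sigma)

print(code)                             # [5, 2, 5, 0, 1, 3, 2, 0, 0]
print(sum(code))                        # 18, the total number of inversions
print(from_lehmer_code(code) == sigma)  # True
```

The sum of the Lehmer code equals the number of inversions of the permutation, which is also the number of crosses in its Rothe diagram.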

### 2. Visualizing Rothe Diagrams

A Rothe diagram is a grid representation of a permutation, showing a dot at (i, σi) for each row i and a cross at (i, σj) for each inversion, that is, each pair i < j with σi > σj. Here's a basic way to visualize a Rothe diagram using text:

```python
def print_rothe_diagram(permutation):
    n = len(permutation)
    diagram = [[' ' for _ in range(n)] for _ in range(n)]
    for i, val in enumerate(permutation):
        diagram[i][val-1] = '•'  # Mark the position of the permutation

    # Mark one cross per inversion (i, j): row i, column of the later, smaller value
    for i in range(n):
        for j in range(i+1, n):
            if permutation[i] > permutation[j]:
                diagram[i][permutation[j]-1] = 'x'

    for row in diagram:
        print(' '.join(row))
```

### 3. Generating Permutations

#### Generating All Permutations in Lexicographic Order

Python's `itertools.permutations` function generates all permutations of a given sequence in lexicographic order if the sequence is sorted, so the simplest approach is to call it directly:

```python
from itertools import permutations

# Generate all permutations of a sequence
sequence = [1, 2, 3]
all_perms = list(permutations(sequence))
print(all_perms)
```
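
For learning purposes, lexicographic generation can also be written by hand. The sketch below uses the classic "next permutation" step (find the rightmost ascent, swap, reverse the suffix); the function names `next_permutation` and `lexicographic_permutations` are illustrative and not part of any library.

```python
def next_permutation(seq):
    """Return the next permutation in lexicographic order, or None if seq is the last."""
    a = list(seq)
    # 1. Find the rightmost position i with a[i] < a[i + 1]
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:
        return None  # a is in descending order, so it is the last permutation
    # 2. Find the rightmost element larger than a[i] and swap the two
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    a[i], a[j] = a[j], a[i]
    # 3. Reverse the suffix so that it becomes the smallest possible tail
    a[i + 1:] = reversed(a[i + 1:])
    return a

def lexicographic_permutations(sequence):
    """Yield all permutations of sequence in lexicographic order."""
    current = sorted(sequence)
    while current is not None:
        yield tuple(current)
        current = next_permutation(current)

print(list(lexicographic_permutations([1, 2, 3])))
# [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
```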

#### Converting Natural Numbers to Permutations

To convert natural numbers to permutations, we can use the factorial number system: the digits of the number in factorial base select, one by one, which of the remaining items comes next. For example, the numbers 0 to 5 enumerate the permutations of `[1, 2, 3]`:

```python
import math

def number_to_permutation(number, n):
    """Return the permutation of 1..n with lexicographic index `number` (0 <= number < n!)."""
    items = list(range(1, n+1))
    permutation = []
    # Read off the factorial-base digits, starting with the most significant one
    for i in range(n - 1, -1, -1):
        digit, number = divmod(number, math.factorial(i))
        permutation.append(items.pop(digit))
    return permutation
```

### Visualizing and Understanding

By translating these concepts into code, you can experiment with and visualize permutations, their Lehmer codes, and Rothe diagrams directly. This hands-on approach aids in understanding the compact representation of permutations, their systematic generation, and the interesting properties of their encodings.


In [None]:
import pandas as pd
import numpy as np

# Load the base grid from a CSV file
base_grid = pd.read_csv('../storage/a.csv', header=None)
grid = base_grid

def permutation_to_lehmer(permutation):
    lehmer_code = []
    for i in range(len(permutation)):
        lehmer_code.append(sum(1 for j in permutation[i+1:] if j < permutation[i]))
    return lehmer_code

def lehmer_to_permutation(lehmer_code):
    # Each digit selects the digit-th smallest value that has not been used yet
    remaining = list(range(1, len(lehmer_code) + 1))
    return [remaining.pop(digit) for digit in lehmer_code]

# Example permutation
permutation = [3, 1, 4, 5, 2]

# Convert the permutation to its Lehmer code
lehmer_code = permutation_to_lehmer(permutation)

# Convert the Lehmer code back to the original permutation
reconstructed_permutation = lehmer_to_permutation(lehmer_code)

# Output the original permutation and the reconstructed permutation
print(f'Original permutation: {permutation}')
print(f'Reconstructed permutation: {reconstructed_permutation}')
print(f'Lehmer code: {lehmer_code}')




In [None]:
import pandas as pd
import numpy as np

# Load the base grid from a CSV file
base_grid = pd.read_csv('../storage/a.csv', header=None)
grid = base_grid


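def print_rothe_diagram(permutation):
    n = len(permutation)
    diagram = [[' ' for _ in range(n)] for _ in range(n)]
    for i, val in enumerate(permutation):
        diagram[i][val-1] = '•'  # Mark the position of the permutation

    # Mark one cross per inversion (i, j): row i, column of the later, smaller value
    for i in range(n):
        for j in range(i+1, n):
            if permutation[i] > permutation[j]:
                diagram[i][permutation[j]-1] = 'x'

    for row in diagram:
        print(' '.join(row))

# Example permutation: the first column of the grid, which must contain
# each of the integers 1..n exactly once
permutation = base_grid[0].astype(int).tolist()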

# Print the Rothe diagram for the permutation
print_rothe_diagram(permutation)




In [None]:
import pandas as pd
import numpy as np
from itertools import permutations
# Load the base grid from a CSV file
base_grid = pd.read_csv('../storage/a.csv', header=None)
grid = base_grid



# Generate all permutations of a sequence
sequence = [1, 2, 3]
all_perms = list(permutations(sequence))
print(all_perms)


In [None]:
import pandas as pd
import numpy as np
# Load the base grid from a CSV file
base_grid = pd.read_csv('../storage/a.csv', header=None)
grid = base_grid

def number_to_permutation(number, n):
    # Assuming n = 3 for simplicity
    factorial_bases = [2, 1]  # For n=3, skipping the last base as it's always 0
    digits = []

    for base in factorial_bases:
        digits.append(number // base)
        number = number % base

    # Convert digits to permutation
    items = list(range(1, n+1))
    permutation = []
    for digit in digits:
        permutation.append(items.pop(digit))
    permutation.append(items[0])  # Add the last item
    return permutation

# Example number and permutation length
number = 5
n = 3

# Convert the number to a permutation
permutation = number_to_permutation(number, n)

# Output the original number and the reconstructed permutation
print(f'Original number: {number}')
print(f'Reconstructed permutation: {permutation}')



In [None]:
import pandas as pd
import numpy as np
# Load the base grid from a CSV file
base_grid = pd.read_csv('../storage/a.csv', header=None)
grid = base_grid


def sjt_permutations(n):
    """Generate permutations of 1..n with the Steinhaus-Johnson-Trotter algorithm.

    Successive permutations differ by one swap of adjacent elements (Gray code order).
    """
    # Initial permutation
    p = list(range(1, n + 1))
    # Direction of each element (left: -1, right: +1)
    direction = [-1] * n

    yield p[:]

    while True:
        # Find the largest mobile element: one that is larger than the
        # neighbour it points at
        mobile = None
        for i in range(n):
            target = i + direction[i]
            if 0 <= target < n and p[i] > p[target]:
                if mobile is None or p[i] > p[mobile]:
                    mobile = i

        if mobile is None:  # No mobile element means we're done
            return

        # Remember the value before swapping; after the swap, p[mobile]
        # holds the neighbour rather than the mobile element
        mobile_value = p[mobile]
        swap_with = mobile + direction[mobile]
        p[mobile], p[swap_with] = p[swap_with], p[mobile]
        direction[mobile], direction[swap_with] = direction[swap_with], direction[mobile]

        # Reverse the direction of all elements larger than the mobile element
        for i in range(n):
            if p[i] > mobile_value:
                direction[i] *= -1

        yield p[:]

# Example usage
n = 4
for perm in sjt_permutations(n):
    print(perm)


In [63]:
import math
import pandas as pd
import numpy as np
# Load the base grid from a CSV file
base_grid = pd.read_csv('../storage/a.csv', header=None)
# A DataFrame has no flatten() method; convert it to a NumPy array first
grid = base_grid.to_numpy().flatten()

def permutation_to_factorial_number(permutation):
    n = len(permutation)
    number = 0
    for i in range(n):
        inversions = sum(1 for j in range(i+1, n) if permutation[j] < permutation[i])
        number += inversions * math.factorial(n - i - 1)
    return number

# Example
permutation = grid
number = permutation_to_factorial_number(permutation)
print(f"The permutation {permutation} corresponds to the number {number} in the factorial number system.")


In [64]:
import math
import pandas as pd
import numpy as np
# Load the base grid from a CSV file
base_grid = pd.read_csv('../storage/a.csv', header=None)
grid = base_grid

def permutation_to_factorial_number(permutation):
    n = len(permutation)
    number = 0
    for i in range(n):
        inversions = sum(1 for j in range(i+1, n) if permutation[j] < permutation[i])
        number += inversions * math.factorial(n - i - 1)
    return number

# Example
permutation = [3, 1, 4, 5, 2]
number = permutation_to_factorial_number(permutation)
print(f"The permutation {permutation} corresponds to the number {number} in the factorial number system.")

# define a function to convert a number to a permutation
# display the generated permutation
# convert the permutation back to a number
# display the original number and the reconstructed permutation


The permutation [3, 1, 4, 5, 2] corresponds to the number 51 in the factorial number system.
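
The remaining steps listed in the comments above (converting a number back to a permutation and checking the round trip) could be sketched as follows. This is one possible version under the assumption that permutations are over 1..n and numbers run from 0 to n! - 1; the function names `permutation_to_number` and `number_to_permutation` are illustrative.

```python
import math

def permutation_to_number(perm):
    """Rank of a permutation of 1..n in the factorial number system."""
    n = len(perm)
    return sum(
        sum(1 for later in perm[i + 1:] if later < perm[i]) * math.factorial(n - i - 1)
        for i in range(n)
    )

def number_to_permutation(number, n):
    """Inverse of permutation_to_number for 0 <= number < n!."""
    items = list(range(1, n + 1))
    perm = []
    # Read off the factorial-base digits, most significant first
    for i in range(n - 1, -1, -1):
        digit, number = divmod(number, math.factorial(i))
        perm.append(items.pop(digit))
    return perm

perm = number_to_permutation(51, 5)
print(perm)                         # [3, 1, 4, 5, 2]
print(permutation_to_number(perm))  # 51
```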
