### Description of the Data Generation Method

This data generation method creates a synthetic dataset of variable-length numerical sequences along with corresponding binary labels based on a predefined rule.

* **Sequence Generation:**
  Each sequence consists of random integers from 0 to 9 (inclusive). The length of each sequence is chosen uniformly at random within a specified range (default between 5 and 10 elements).

* **Labeling Rule:**
  The label for each sequence is binary and depends on the order of its elements. If a `3` appears and a `7` appears at any later position (not necessarily adjacent), the label is `1`; otherwise, the label is `0`.

* **Data Formatting:**
  To handle sequences of varying lengths, each sequence is padded to a fixed maximum length (default 10) with a special padding value (`-1`). This ensures uniform input size for machine learning models.

* **Output Format:**
  The dataset is saved as a CSV file where each column represents an element of the padded sequence (`step_1`, `step_2`, ..., `step_10`), followed by the `label` column.

This approach facilitates training sequence models such as Recurrent Neural Networks (RNNs) to learn and replicate the underlying order-dependent rule from variable-length sequences.
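
The padding and column layout described above can be sketched as follows. This is a minimal illustration, not the notebook's own code: `pad_sequence` and `make_header` are hypothetical helper names, and the sentinel `-1` and width `10` are taken from the description.

```python
PAD_VALUE = -1  # padding sentinel from the description; lies outside the 0-9 data range
MAX_LEN = 10    # fixed row width from the description


def pad_sequence(sequence, max_len=MAX_LEN, pad_value=PAD_VALUE):
    """Right-pad a list of ints to max_len (hypothetical helper, for illustration)."""
    if len(sequence) > max_len:
        raise ValueError("sequence is longer than max_len")
    return sequence + [pad_value] * (max_len - len(sequence))


def make_header(max_len=MAX_LEN):
    """Build the CSV header: step_1 ... step_<max_len>, then label."""
    return [f"step_{i + 1}" for i in range(max_len)] + ["label"]


row = pad_sequence([3, 1, 7])
print(row)  # [3, 1, 7, -1, -1, -1, -1, -1, -1, -1]
```

Because the sentinel never occurs as a data value, the original sequence length can be recovered from a padded row by dropping every `-1`.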

In [13]:
import csv
import random

In [14]:
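def generate_order_dependent_sequence(min_len=5, max_len=10):
    """Return (sequence, label): random digits 0-9, labeled 1 if a 7 follows a 3, else 0."""
    seq_len = random.randint(min_len, max_len)
    sequence = [random.randint(0, 9) for _ in range(seq_len)]

    # Label is 1 if a 7 occurs anywhere after a 3 (not necessarily adjacent)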
    found_3 = False
    label = 0
    for num in sequence:
        if num == 3:
            found_3 = True
        elif num == 7 and found_3:
            label = 1
            break
    
    return sequence, label
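
To see how the rule behaves on concrete inputs, the labeling loop can be pulled out into a pure function and run on hand-picked sequences. This is a sketch under one assumption: `label_sequence` is a hypothetical helper that mirrors the loop in the function above and is not part of the notebook.

```python
def label_sequence(sequence):
    """Return 1 if some 7 occurs after some 3, else 0 (hypothetical helper)."""
    found_3 = False
    for num in sequence:
        if num == 3:
            found_3 = True
        elif num == 7 and found_3:
            return 1
    return 0


examples = [
    [3, 7],        # 7 directly after 3
    [3, 0, 5, 7],  # a gap between 3 and 7 is allowed
    [7, 3],        # wrong order
    [7, 7, 1],     # no 3 at all
    [7, 3, 7],     # the first 7 is ignored, the second one counts
    [3, 3, 2],     # no 7 at all
]

for seq in examples:
    print(seq, "->", label_sequence(seq))
```

The same multiset of digits can receive either label depending on order (`[3, 7]` versus `[7, 3]`), which is why a model has to track state across time steps rather than count values.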

In [15]:
dataset = [generate_order_dependent_sequence() for _ in range(10000)]

max_len = 10
PAD_VALUE = -1  # sentinel outside the 0-9 value range, so padding is distinguishable from data

# Save to CSV
with open('order_dependent_sequence_dataset.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    header = [f'step_{i+1}' for i in range(max_len)] + ['label']
    writer.writerow(header)

    for sequence, label in dataset:
        padded_sequence = sequence + [PAD_VALUE] * (max_len - len(sequence))
        writer.writerow(padded_sequence + [label])
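
A CSV in this layout can be read back into variable-length sequences by dropping the padding. The sketch below assumes the padding sentinel never occurs as data (`-1` here, per the description). `load_dataset` is a hypothetical helper, and the two sample rows are made up for illustration rather than taken from a generated file.

```python
import csv
import io
from collections import Counter


def load_dataset(file_obj, pad_value=-1):
    """Read rows of step_* columns plus a label; return a list of (sequence, label)."""
    reader = csv.reader(file_obj)
    next(reader)  # skip the header row
    data = []
    for row in reader:
        *steps, label = (int(cell) for cell in row)
        sequence = [value for value in steps if value != pad_value]
        data.append((sequence, label))
    return data


# Small in-memory stand-in for the generated file
sample = io.StringIO(
    "step_1,step_2,step_3,step_4,step_5,step_6,step_7,step_8,step_9,step_10,label\n"
    "3,1,7,2,2,-1,-1,-1,-1,-1,1\n"
    "7,3,0,0,9,4,-1,-1,-1,-1,0\n"
)

data = load_dataset(sample)
balance = Counter(label for _, label in data)
print(data)
print(balance)
```

For the real file, the same function would be called as `load_dataset(open('order_dependent_sequence_dataset.csv', newline=''))`, ideally inside a `with` block. Checking `balance` is a quick way to see how skewed the two classes are before training.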
