# Statistical Learning Sequencing

This tutorial walks through the module `sl_sequencing.py`. The code and algorithm in the module were written by [Harrison Gietz](https://github.com/hubarruby). The purpose of the module is to generate a sequence of numbers in which
1. no number immediately repeats,
2. each number occurs equally often, and
3. every possible transition from one number to another occurs equally often.

This sequence can then be used for other purposes, like generating stimuli for statistical learning paradigms.
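
The three constraints can be made concrete with a small checker. This is a minimal sketch, not the module's own validation code: the function name `is_balanced` is hypothetical, and it assumes transitions are counted cyclically, so the last item is paired with the first.

```python
from collections import Counter
from itertools import permutations

def is_balanced(seq):
    """Check the three constraints on a list of numbers (hypothetical helper)."""
    values = sorted(set(seq))
    # Pair each item with its successor, wrapping the last item to the first.
    pairs = list(zip(seq, seq[1:] + seq[:1]))
    # 1. No number is immediately followed by itself.
    no_repeats = all(a != b for a, b in pairs)
    # 2. Every number occurs equally often.
    equal_counts = len(set(Counter(seq).values())) == 1
    # 3. Every transition between two different numbers occurs equally often.
    transition_counts = Counter(pairs)
    equal_transitions = (
        len({transition_counts[p] for p in permutations(values, 2)}) == 1
    )
    return no_repeats and equal_counts and equal_transitions

print(is_balanced([1, 2, 3, 1, 3, 2]))  # True
print(is_balanced([1, 2, 1, 2, 3, 3]))  # False
```

The second list fails because 3 is followed by 3 and because not every transition occurs.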

## Getting started

The module is written in an object-oriented fashion. Let's start by

1. Importing the module.
2. Instantiating the `Sequence` class. The class requires two parameters: the total sequence length (`total`) and the vector size (`vector_size`). The vector is one-indexed, so `vector_size=4` generates the vector $v = \{1, 2, 3, 4\}$.

In [2]:
# import 
from sl_sequencing import Sequence

# assigning class
sequence = Sequence(total=132, vector_size=4)

This sets up a sequence that will be 132 values long and composed of the values in vector $v$.
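
It helps to see what these two parameters imply. With 4 values there are 4 × 3 = 12 possible transitions between different values, so a balanced sequence of 132 values holds 132 / 12 = 11 of each transition and 132 / 4 = 33 of each value. The sketch below does this arithmetic. `expected_counts` is a hypothetical helper, it assumes the wrap-around from the last value to the first is counted as a transition, and whether the module itself rejects totals that do not divide evenly is not established here.

```python
def expected_counts(total, vector_size):
    """Return (occurrences per value, occurrences per transition).

    Hypothetical helper; assumes transitions are counted cyclically, so a
    sequence of `total` values contains `total` transitions.
    """
    # Ordered pairs of two different values.
    n_transitions = vector_size * (vector_size - 1)
    if total % n_transitions != 0:
        raise ValueError(
            f"total must be a multiple of {n_transitions} for perfect balance"
        )
    return total // vector_size, total // n_transitions

print(expected_counts(132, 4))  # (33, 11)
```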

The module contains only this one class, and its methods are all you need to generate a valid sequence.

Now that you have your `Sequence` class initialized, you need to make the sequence.

In [3]:
# make sequence
sequence.sequence()

number of times tried: 1
Sequence achieved


<sl_sequencing.Sequence at 0x1065842b0>

The output tells you that the algorithm needed one try to create a valid sequence given your parameters. Depending on your `vector_size` and `total`, the number of tries may vary.
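
The try count suggests a generate-and-retry strategy. The sketch below is one way such a generator could work, and is not the algorithm used in `sl_sequencing.py`: it takes a random walk over the values, spends a fixed budget of each transition, and starts over if it reaches a dead end. The names `try_build` and `build_sequence` are hypothetical.

```python
import random
from itertools import permutations

def try_build(vector_size, reps, rng):
    """One attempt: a random walk that uses each transition exactly `reps` times.

    Returns None if the walk hits a dead end before all transitions are used.
    """
    values = range(1, vector_size + 1)
    # Budget for every ordered pair of two different values.
    remaining = {pair: reps for pair in permutations(values, 2)}
    seq = [rng.choice(list(values))]
    for _ in range(reps * len(remaining)):
        current = seq[-1]
        options = [b for (a, b), left in remaining.items()
                   if a == current and left > 0]
        if not options:
            return None  # dead end: this attempt failed
        nxt = rng.choice(options)
        remaining[(current, nxt)] -= 1
        seq.append(nxt)
    # A walk that uses every transition ends where it started; drop that
    # duplicated final item so the last-to-first step is the wrap-around.
    return seq[:-1]

def build_sequence(vector_size, reps, seed=None, max_tries=1000):
    """Retry `try_build` until it succeeds; return (sequence, tries)."""
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        seq = try_build(vector_size, reps, rng)
        if seq is not None:
            return seq, attempt
    raise RuntimeError("no valid sequence found within max_tries")

seq, tries = build_sequence(vector_size=4, reps=11, seed=0)
print(len(seq))  # 132
```

Because the final duplicated item is dropped, the result is balanced when counted cyclically and falls one transition short when counted end to end.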

To see your sequence, access the `valid_sequence` attribute created by the `Sequence.sequence()` method.

In [5]:
# viewing valid numeric sequence
print(sequence.valid_sequence)

[3, 4, 1, 4, 3, 2, 4, 2, 3, 4, 2, 3, 1, 3, 2, 1, 3, 2, 1, 2, 3, 1, 2, 1, 4, 3, 4, 1, 3, 2, 3, 1, 2, 3, 2, 1, 4, 3, 1, 3, 1, 3, 2, 1, 2, 4, 3, 1, 2, 1, 2, 3, 1, 4, 2, 1, 2, 4, 2, 1, 2, 1, 2, 4, 1, 2, 4, 2, 4, 2, 1, 4, 2, 3, 1, 2, 3, 2, 4, 1, 3, 4, 1, 3, 1, 3, 1, 4, 1, 3, 2, 3, 1, 3, 4, 3, 2, 1, 3, 2, 3, 4, 3, 2, 3, 4, 3, 4, 2, 4, 1, 4, 3, 4, 2, 4, 1, 4, 3, 4, 3, 4, 2, 4, 1, 4, 1, 4, 1, 4, 2, 4]


You can also check the validity of the sequence, which will provide a cursory overview of the sequence's adherence to the parameters. 

In [7]:
# check sequence 
sequence.validate()

total length of list: 132

Checking transition counts: 
Transition counts from 1: {2: 11, 3: 11, 4: 11}
Transition counts from 2: {1: 11, 3: 11, 4: 11}
Transition counts from 3: {1: 11, 2: 11, 4: 11}
Transition counts from 4: {1: 11, 2: 11, 3: 11}

Double Checking transition counts (with different method): 
Transition counts for 1: 
     There were 11 transitions to 2: 
     There were 11 transitions to 3: 
     There were 11 transitions to 4: 
Transition counts for 2: 
     There were 11 transitions to 1: 
     There were 11 transitions to 3: 
     There were 11 transitions to 4: 
Transition counts for 3: 
     There were 11 transitions to 1: 
     There were 11 transitions to 2: 
     There were 11 transitions to 4: 
Transition counts for 4: 
     There were 11 transitions to 1: 
     There were 11 transitions to 2: 
     There were 10 transitions to 3: 

Checking number counts: 
1: 33
2: 33
3: 33
4: 33


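Above, you can see that the sequence meets the requirements imposed upon it: each number occurs 33 times and each transition occurs 11 times. The one exception, 10 transitions from 4 to 3 in the second check, follows from the arithmetic rather than from a fault: a 132-item sequence has only 131 adjacent pairs, so the 12 possible transitions can all reach 11 only if the last item (4) is counted as transitioning back to the first (3), which is presumably what the first check does. If you wish, you can save this sequence as a .csv file.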
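
A list of N items has N - 1 adjacent pairs, or N if the last item is paired with the first. The sketch below tallies transitions both ways on a short hypothetical list. It is not the module's `validate()` code, and `transition_counts` is an illustrative name.

```python
from collections import Counter

def transition_counts(seq, wrap=False):
    """Count how often each value is followed by each other value.

    With wrap=True the last item is also paired with the first.
    """
    successors = seq[1:] + (seq[:1] if wrap else [])
    return Counter(zip(seq, successors))

toy = [1, 2, 3, 1, 3, 2]  # hypothetical six-item sequence, not module output

linear = transition_counts(toy)
cyclic = transition_counts(toy, wrap=True)

print(sum(linear.values()))            # 5
print(sum(cyclic.values()))            # 6
print(linear[(2, 1)], cyclic[(2, 1)])  # 0 1
```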

In [None]:
# save to .csv
sequence.save_csv(filename='sequence.csv')
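
If you want control over the file layout, the standard library's `csv` module can write a list directly. This sketch is independent of `save_csv`, whose output format is not shown in this tutorial. The one-value-per-row layout, the `value` header, the file name, and the function name `write_sequence_csv` are all assumptions.

```python
import csv

def write_sequence_csv(values, filename):
    """Write one sequence item per row under a single header (assumed layout)."""
    with open(filename, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["value"])
        writer.writerows([v] for v in values)

# First six values of the sequence printed earlier.
write_sequence_csv([3, 4, 1, 4, 3, 2], "sequence_sketch.csv")
```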

You can also take the sequence generation a step further. Say you are using the sequence to generate a random order of stimuli. You can replace the numeric values generated by `Sequence.sequence()` with strings, so that you have a valid sequence of whatever items you want rather than arbitrary numbers.

To do this, define a dictionary in which the keys are the numeric values in your original sequence (that is, the values of the vector $v$ set by `vector_size`) and the values are the replacement items.

Let's check this out below.

In [8]:
## Replacing numeric values with strings of filenames

# define dictionary
inpt = {
    1: 'blue.png', # the number '1' is replaced by 'blue.png'...
    2: 'red.png',
    3: 'pink.png',
    4: 'purple.png'
}

# replace
sequence.match(inpt=inpt)

# see new sequence
print(sequence.match_sequence)

['pink.png', 'purple.png', 'blue.png', 'purple.png', 'pink.png', 'red.png', 'purple.png', 'red.png', 'pink.png', 'purple.png', 'red.png', 'pink.png', 'blue.png', 'pink.png', 'red.png', 'blue.png', 'pink.png', 'red.png', 'blue.png', 'red.png', 'pink.png', 'blue.png', 'red.png', 'blue.png', 'purple.png', 'pink.png', 'purple.png', 'blue.png', 'pink.png', 'red.png', 'pink.png', 'blue.png', 'red.png', 'pink.png', 'red.png', 'blue.png', 'purple.png', 'pink.png', 'blue.png', 'pink.png', 'blue.png', 'pink.png', 'red.png', 'blue.png', 'red.png', 'purple.png', 'pink.png', 'blue.png', 'red.png', 'blue.png', 'red.png', 'pink.png', 'blue.png', 'purple.png', 'red.png', 'blue.png', 'red.png', 'purple.png', 'red.png', 'blue.png', 'red.png', 'blue.png', 'red.png', 'purple.png', 'blue.png', 'red.png', 'purple.png', 'red.png', 'purple.png', 'red.png', 'blue.png', 'purple.png', 'red.png', 'pink.png', 'blue.png', 'red.png', 'pink.png', 'red.png', 'purple.png', 'blue.png', 'pink.png', 'purple.png', 'blue.pn
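
Conceptually, the replacement is a dictionary lookup for each item. The following is a minimal sketch of that idea, not the implementation of `Sequence.match()`. The helper name `match_values` is hypothetical, and the input is the first six values of the numeric sequence printed earlier.

```python
def match_values(seq, mapping):
    """Replace each number in `seq` with its entry in `mapping` (hypothetical helper)."""
    return [mapping[value] for value in seq]

inpt = {1: 'blue.png', 2: 'red.png', 3: 'pink.png', 4: 'purple.png'}
numeric = [3, 4, 1, 4, 3, 2]

print(match_values(numeric, inpt))
# ['pink.png', 'purple.png', 'blue.png', 'purple.png', 'pink.png', 'red.png']
```

A number missing from the dictionary raises a `KeyError` in this sketch, which surfaces an incomplete mapping early.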

In statistical learning paradigms, for which this algorithm was originally developed, one typically wants to create both a random and a structured sequence. To create a structured sequence, you can assign a whole chunk of items to a single value. Each chunk is then written out in full wherever its value appears in the sequence, which increases the transitional probability between items within a chunk. Let's take a look at how to implement this.

In [9]:
## Creating structure

# define nested dictionary
inpt = {
    1: ['red.png', 'blue.png', 'white.png'],
    2: ['black.png', 'green.png', 'yellow.png'],
    3: ['purple.png', 'pink.png', 'brown.png'],
    4: ['orange.png', 'gray.png', 'turqoise.png']
}

# replace
sequence.match(inpt=inpt, unlist=True)

# see new sequence
print(sequence.match_sequence)

['purple.png', 'pink.png', 'brown.png', 'orange.png', 'gray.png', 'turqoise.png', 'red.png', 'blue.png', 'white.png', 'orange.png', 'gray.png', 'turqoise.png', 'purple.png', 'pink.png', 'brown.png', 'black.png', 'green.png', 'yellow.png', 'orange.png', 'gray.png', 'turqoise.png', 'black.png', 'green.png', 'yellow.png', 'purple.png', 'pink.png', 'brown.png', 'orange.png', 'gray.png', 'turqoise.png', 'black.png', 'green.png', 'yellow.png', 'purple.png', 'pink.png', 'brown.png', 'red.png', 'blue.png', 'white.png', 'purple.png', 'pink.png', 'brown.png', 'black.png', 'green.png', 'yellow.png', 'red.png', 'blue.png', 'white.png', 'purple.png', 'pink.png', 'brown.png', 'black.png', 'green.png', 'yellow.png', 'red.png', 'blue.png', 'white.png', 'black.png', 'green.png', 'yellow.png', 'purple.png', 'pink.png', 'brown.png', 'red.png', 'blue.png', 'white.png', 'black.png', 'green.png', 'yellow.png', 'red.png', 'blue.png', 'white.png', 'orange.png', 'gray.png', 'turqoise.png', 'purple.png', 'pink.

You can use a dictionary of lists like this to create a more structured sequence that manipulates the transitional probabilities of the items in the sequence. If the dictionary values are lists, you must pass `unlist=True`.
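
The effect on transitional probabilities can be checked directly. The sketch below flattens chunks into one list and estimates the probability of each item given the item before it. It illustrates the idea and is not the code behind `unlist=True`. The helper names `expand_chunks` and `transitional_probabilities` are hypothetical, and the input is the first six values of the numeric sequence printed earlier.

```python
from collections import Counter
from itertools import chain

def expand_chunks(seq, chunks):
    """Replace each number with its chunk and flatten the result into one list."""
    return list(chain.from_iterable(chunks[value] for value in seq))

def transitional_probabilities(items):
    """Estimate P(next item | current item) from adjacent pairs."""
    pair_counts = Counter(zip(items, items[1:]))
    first_counts = Counter(items[:-1])  # how often each item has a successor
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

inpt = {
    1: ['red.png', 'blue.png', 'white.png'],
    2: ['black.png', 'green.png', 'yellow.png'],
    3: ['purple.png', 'pink.png', 'brown.png'],
    4: ['orange.png', 'gray.png', 'turqoise.png'],
}
numeric = [3, 4, 1, 4, 3, 2]

items = expand_chunks(numeric, inpt)
tp = transitional_probabilities(items)

print(tp[('purple.png', 'pink.png')])   # 1.0  (within a chunk)
print(tp[('brown.png', 'orange.png')])  # 0.5  (across a chunk boundary)
```

Within a chunk the next item is fully predictable, while at a chunk boundary the probability is split among the chunks that can follow.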

You can save this new sequence to a .csv file.

In [None]:
sequence.save_match_csv(filename='match.csv')