In [1]:
import numpy as np

In [2]:
# Set the value of MISSING_INT here. In respy it is imported from the shared_constants module.
MISSING_INT = -99

As a first step, we create the variables that will later be supplied as inputs to the function.

In [3]:
num_periods = 20
num_types = 2 # There are 2 types, labelled 0 and 1.
educ_max = 25 # Maximum number of years of education in the data, i.e., upper bound of the education initial condition
educ_min = 10 # Minimum number of years of education in the data, i.e., lower bound of the education initial condition

In [4]:
# Auxiliary calculation of the education dimension
educ_range = educ_max - educ_min + 1

In [5]:
educ_range

16

Next, we create arrays which we want to populate with the state space variables.
The components of our state space are:
- the time variable: period
- the initial conditions: type and education
- the history of choices: choice_lagged

In [6]:
# Array to collect all state space points (states) that can be reached each period
states_all = np.tile(MISSING_INT, (num_periods, 100000, 3))

Let us briefly discuss the dimensions of the states_all array:
- the 1st dimension is determined by the number of periods in our model
- the 2nd dimension is related to the maximum number of state space points that can be reached in any one period. Question: Is this true? Is 100 000 simply an arbitrary large number that is certain to exceed the highest number of state space points reachable in a period? Can it be replaced by an educated guess of the maximum number of state point combinations given no restrictions?
- the 3rd dimension is equal to the number of remaining state space components (apart from the period number) that we want to record: type, education, choice_lagged
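
One possible answer to the question above, offered as a sketch and not as the way respy sizes this array: absent any restrictions, a period can hold at most one state per combination of type, education level, and lagged choice. The names num_choices and max_states_bound are assumptions introduced here; they do not appear in the original code.

```python
import numpy as np

MISSING_INT = -99
num_periods = 20
num_types = 2
educ_min, educ_max = 10, 25
educ_range = educ_max - educ_min + 1

# Hypothetical name (not in the original code): lagged choices F, P, N
num_choices = 3

# Upper bound on the number of states in any single period
max_states_bound = num_types * educ_range * num_choices

# Allocate the container with this bound instead of 100000 rows
states_all = np.full((num_periods, max_states_bound, 3), MISSING_INT)
print(max_states_bound, states_all.shape)  # 96 (20, 96, 3)
```

With the values used in this notebook the bound is 96 rows per period, so the 100 000 rows are far more than this specification needs.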

In [7]:
# Array for mapping the state space points to indices
shape = (num_periods, num_types, educ_range, 3)
mapping_state_index = np.tile(MISSING_INT, shape)
mapping_state_index.shape

(20, 2, 16, 3)

Let us briefly discuss the dimensions of the mapping_state_index array. Each dimension corresponds to a state space component that we want to keep track of:
- dimension num_periods corresponds to the period number
- num_types to the type indicator
- educ_range to the years of education
- the number 3 to the number of choices available to the agents each period. Question: would it be better to introduce a variable instead of hard-coding the number 3?
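
As a sketch of what a named constant could look like, the snippet below builds the same array with a variable in place of the literal 3. The name num_choices is hypothetical and not part of the original code, and np.full is used here as an equivalent alternative to np.tile for filling with a scalar.

```python
import numpy as np

MISSING_INT = -99
num_periods, num_types = 20, 2
educ_min, educ_max = 10, 25
educ_range = educ_max - educ_min + 1

# Hypothetical name (not in the original code) replacing the literal 3
num_choices = 3

shape = (num_periods, num_types, educ_range, num_choices)
mapping_state_index = np.full(shape, MISSING_INT)

# Entries that are never assigned an index keep the sentinel value,
# which marks the corresponding state as not (yet) admissible
entry = mapping_state_index[0, 1, 5, 2]
print(mapping_state_index.shape, entry)  # (20, 2, 16, 3) -99
```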

In [8]:
# Array for the number of admissible state space points / states per period
states_number_period = np.tile(MISSING_INT, num_periods)

In a final step, we loop through all admissible state space points and fill the constructed arrays with the necessary information. Note two important details:
- Since the individuals make their first choice in the first period, it is only possible to record their lagged choice from the second period onwards. Therefore, the loop directly skips the first period.
- If we want to record only admissible state space points, we have to account for the fact that individuals in the model start making labor supply choices only after they have completed education.

Let us look into the latter in greater detail. As a reminder, we model individuals from age 16 (the legally binding minimum level of education) until age 60 (the typical retirement entry age), i.e., for 45 periods, where the 1st period corresponds to age 16, the 2nd period to age 17, etc. The current specification of starting values reflects the observation / simulated reality that individuals in the sample have completed between 10 and 25 years of education. In our loop we want to account for the fact that, in the first period at age 16, only individuals who have completed 10 years of education (assuming education starts at age 6 for everyone) make a labor market choice between full-time (F), part-time (P), and non-employment (N). The remaining individuals are still in education, so a state space point where years of education equal, e.g., 11 and a labor market choice of, e.g., part-time is observed is not admissible and should not be recorded. This is ensured by the condition "educ_years > period", which skips such points. Note that educ_years in the loop counts years of education above educ_min.
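
The admissibility rule implies a simple closed form for the number of states per period, sketched below under the assumption that educ_years is an index starting at 0 (years above educ_min), as in the loop. The function count_admissible and the name num_choices are hypothetical helpers introduced for illustration only.

```python
num_periods = 20
num_types = 2
educ_range = 16   # educ_max - educ_min + 1
num_choices = 3   # hypothetical name for the three lagged choices


def count_admissible(period):
    """Count states in a period that satisfy educ_years <= period."""
    # Education indices 0..period are admissible, capped at educ_range levels
    admissible_educ_levels = min(period + 1, educ_range)
    return num_types * admissible_educ_levels * num_choices


# The first period (index 0) is skipped, as in the loop
counts = [count_admissible(period) for period in range(1, num_periods)]
print(counts)
```

The count grows by 6 per period (one more education level times 2 types times 3 choices) until all 16 education levels are admissible, and then stays at 96. These values can be compared with the states_number_period output at the end of the notebook.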

We also note that, in the current setting, there is no possibility of duplicate states arising.

In [19]:
# Loop over all periods / all ages, skipping the first period
for period in range(1, num_periods):
    
    # Start count for admissible state space points
    k = 0
    
    # Loop over all types
    for type_ in range(num_types):
        
        # Loop over all possible initial conditions for education
        for educ_years in range(educ_range):
            
            # Check if individual has already completed education
            # and will make a labor supply choice in the period
            if educ_years > period:
                continue
            
            # Loop over the three labor market choices, F, P, N
            for choice_lagged in [1, 2, 3]:
                
                # Assign the integer count k as an indicator for the
                # currently reached admissible state space point
                mapping_state_index[
                    period,
                    type_,
                    educ_years,  # index of the education level (years above educ_min)
                    choice_lagged - 1,
                ] = k
                
                # Record the values of the state space components
                # for the currently reached admissible state space point
                states_all[period, k, :] = [
                    type_,
                    educ_years + educ_min,
                    choice_lagged - 1,
                ]
                
                # Update count
                k += 1
    
    # Record the number of admissible state space points for the current period
    states_number_period[period] = k

In [None]:
# Auxiliary objects
max_states_period = max(states_number_period)

In [None]:
# Collect arguments
args = (states_all, states_number_period, mapping_state_index, max_states_period)

In [18]:
states_all[1,5,:]

array([ 0, 11,  2])

In [20]:
states_number_period

array([-99,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72,  78,
        84,  90,  96,  96,  96,  96,  96])
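
As a consistency check, the sketch below rebuilds the state space in a self-contained function and verifies two properties: every recorded state maps back to its own row index via the mapping array, and no period contains duplicate states. The function build_state_space is a hypothetical wrapper around the loop above, and it allocates 96 rows per period (types x education levels x choices) instead of 100000; both are assumptions made for this illustration.

```python
import numpy as np

MISSING_INT = -99


def build_state_space(num_periods=20, num_types=2, educ_min=10, educ_max=25):
    """Hypothetical wrapper that mirrors the loop in this notebook."""
    educ_range = educ_max - educ_min + 1
    max_states = num_types * educ_range * 3
    states_all = np.full((num_periods, max_states, 3), MISSING_INT)
    mapping = np.full((num_periods, num_types, educ_range, 3), MISSING_INT)
    states_number_period = np.full(num_periods, MISSING_INT)

    for period in range(1, num_periods):
        k = 0
        for type_ in range(num_types):
            for educ_idx in range(educ_range):
                # Skip individuals who are still in education
                if educ_idx > period:
                    continue
                for choice_lagged in range(3):
                    mapping[period, type_, educ_idx, choice_lagged] = k
                    states_all[period, k, :] = [
                        type_,
                        educ_idx + educ_min,
                        choice_lagged,
                    ]
                    k += 1
        states_number_period[period] = k

    return states_all, states_number_period, mapping


educ_min = 10
states_all, states_number_period, mapping = build_state_space(educ_min=educ_min)

round_trip_ok = True
no_duplicates = True
for period in range(1, len(states_number_period)):
    n = int(states_number_period[period])
    states = states_all[period, :n]
    # Look up the index of every recorded state via the mapping array
    idx = mapping[period, states[:, 0], states[:, 1] - educ_min, states[:, 2]]
    round_trip_ok = round_trip_ok and np.array_equal(idx, np.arange(n))
    # Count distinct rows to detect duplicate states
    no_duplicates = no_duplicates and len(np.unique(states, axis=0)) == n

print(round_trip_ok, no_duplicates)
```

If both flags are true, states_all and the mapping array are mutually consistent for this specification, which supports the earlier remark that duplicate states cannot arise in the current setting.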