# Complete guide

## Introduction

This notebook gives an overview of the simulator's basic functionality. It starts with the simplest ways to get started, then moves on to more advanced concepts that give a sense of the flexibility of the system. By the end of this guide, you should be able to configure the pre-loaded simulations with custom parameters and metrics.

## Main components
A simulation needs the following components:

- **Users**: agents who interact with each other and with items.
- **Model**: agent that defines the behavior of the sociotechnical system. The model mediates the interactions among users and between users and the system.
- **Items**: passive components that are served to the users by the model.
- **Measurements**: modules built into the models which automatically compute information about the system.

## Dynamics
The following steps are at the heart of the simulations:
1. The **model** presents the **users** with some recommended **items**. In general, the items are chosen to maximize the probability of user engagement. This probability is based on the model's _prediction_ of user preferences.
2. The **users** view the items presented by the **model**, and interact with some **items** according to some _actual_ preferences.
3. The **model** updates its system state (such as the prediction of user preferences) based on the interactions of **users** with **items**, and it takes some **measurements**.

We will see that this framework is very flexible and can generalize many classic and new models.
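
The three steps above can be sketched with plain NumPy. This is only an illustration under simplifying assumptions, not the framework's implementation: the toy data, the scoring rule (a dot product between profiles and item attributes), the update rule, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_attributes, num_items, items_shown = 4, 3, 8, 2

# Toy data (hypothetical): binary item attributes, hidden actual preferences,
# and predicted preferences that start at zero.
items = rng.integers(0, 2, size=(num_attributes, num_items))
actual_users = rng.random((num_users, num_attributes))
predicted_users = np.zeros((num_users, num_attributes))
distinct_items_chosen = []  # one simple measurement per time step

for timestep in range(5):
    # 1. The model recommends the items with the highest predicted scores.
    predicted_scores = predicted_users @ items  # (num_users, num_items)
    recommended = np.argsort(-predicted_scores, axis=1)[:, :items_shown]

    # 2. Each user picks the recommended item they actually like best.
    actual_scores = actual_users @ items
    shown_scores = np.take_along_axis(actual_scores, recommended, axis=1)
    chosen = recommended[np.arange(num_users), shown_scores.argmax(axis=1)]

    # 3. The model updates its predictions and takes a measurement.
    predicted_users += items[:, chosen].T
    distinct_items_chosen.append(len(set(chosen.tolist())))
```

Note that the model only ever sees `predicted_users` and the users' choices; `actual_users` stays hidden from it, which is the distinction between _predicted_ and _actual_ preferences made in the steps above.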

## Quick start: instantiate a model and run
The fastest way to get started is to choose a model, instantiate it with no parameters, and run it for a few time steps. Here we run a simple [content filtering recommendation system](https://elucherini.github.io/algo-segregation/reference/models.html#module-models.content). For a complete list of the class attributes and methods shared by all models, please refer to the [BaseRecommender documentation](https://elucherini.github.io/algo-segregation/reference/models.html#models.recommender.BaseRecommender), as that information is currently incomplete in the docs of the other pre-loaded models.

Content filters infer information about the _attributes_ of users based on their past interactions and recommend items with similar attributes to those of users.

In [1]:
import pandas as pd
import numpy as np

import trecs
from trecs.models import ContentFiltering
from trecs.random import Generator
from trecs.metrics import HomogeneityMeasurement, RecSimilarity, AverageFeatureScoreRange

In [2]:
# Create ContentFiltering instance without arguments
default_filtering = ContentFiltering()
# Run for 5 time steps
default_filtering.run(timesteps=5)

100%|██████████| 5/5 [00:00<00:00, 68.70it/s]


In [3]:
# Collect measurements about the simulation
results = default_filtering.get_measurements()

print("Results of the simulation:")
pd.DataFrame(results)

Results of the simulation:


|   | mse | timesteps |
|---|-----|-----------|
| 0 | NaN | 0 |
| 1 | 124.60279725267516 | 1 |
| 2 | 150.5610615878324 | 2 |
| 3 | 153.07753159337992 | 3 |
| 4 | 152.3401336914828 | 4 |
| 5 | 150.30186728696842 | 5 |


In what follows, we expand on this minimal example to gain a deeper understanding of what happens under the hood.

## Models

As in the ``Quick Start``, the minimum you need to run a simulation is the model itself. A number of pre-loaded models work out of the box. We continue to use a generic content filtering recommendation system; please see the docs for a [list of pre-loaded models](https://elucherini.github.io/algo-segregation/reference/models.html).

Recall that content filters infer information about the _attributes_ of users based on their past interactions and recommend items with similar attributes to those of users.

In [4]:
# Again, we instantiate the model with no arguments
default_filtering = ContentFiltering()

In the cell above, we instantiated a content filtering recommender system with default parameters. We print below the default number of users and items in the system.

In [5]:
print("Number of users in system: %d" % default_filtering.num_users)
print("Number of items in system: %d" % default_filtering.num_items)

Number of users in system: 100
Number of items in system: 1250


The model also created a representation for both users and items.

In [6]:
print("In content filtering, the default parameters are given by:")
print("- An all-zeros matrix of users of size %s." % str(default_filtering.users_hat.shape))
print("- A randomly generated matrix of items of size %s." % str(default_filtering.items_hat.shape))

In content filtering, the default parameters are given by:
- An all-zeros matrix of users of size (100, 1000).
- A randomly generated matrix of items of size (1000, 1250).


Formally, content filtering supports user profiles of size `|num_users x num_attributes|` and item attributes of size `|num_attributes x num_items|`.

### Set number of users or items
We can customize the number of users in the system:

In [7]:
# instantiate content filter with a different number of users
number_of_users = 500
filtering = ContentFiltering(num_users=number_of_users)
print("The number of users in the system is now %d." % filtering.num_users)
print("The number of items in the system is still %d." % filtering.num_items)

The number of users in the system is now 500.
The number of items in the system is still 1250.


Or the number of items:

In [8]:
# instantiate with a different number of items
number_of_items = 5000
filtering = ContentFiltering(num_items = number_of_items)
print("The number of items in the system is now %d." % filtering.num_items)
print("The number of users in the system is back to %d." % filtering.num_users)

The number of items in the system is now 5000.
The number of users in the system is back to 100.


Or both:

In [9]:
# instantiate with a different number of items and users
number_of_items = 5000
number_of_users = 500
filtering = ContentFiltering(num_items = number_of_items, num_users=number_of_users)
print("The number of items in the system is now %d." % filtering.num_items)
print("The number of users in the system is now %d." % filtering.num_users)

The number of items in the system is now 5000.
The number of users in the system is now 500.


Note that the representations of items and users are set accordingly:

In [10]:
print("The size of item_attributes is %s." % str(filtering.items_hat.shape))
print("The size of user_profiles is %s." % str(filtering.users_hat.shape))

The size of item_attributes is (1000, 5000).
The size of user_profiles is (500, 1000).


## User predictions and items
We might also want to define our own representation of users and items. We can do so by defining matrices that satisfy the constraints of the model. The constraints for ContentFiltering (some of which have been mentioned above) are:

- User profiles must be of size `|num_users x num_attributes|`.
- User profiles are matrices of integers representing the number of interactions of each user with items that have a given attribute. `user_profiles[i, j]` represents the number of interactions user `i` had with items with attribute `j`.
- Item attributes must be of size `|num_attributes x num_items|`.
- The model doesn't define any constraint on the values of item attributes. If `item_attributes` is binary, its `[i, j]`th element is 1 if item `j` has attribute `i` and 0 otherwise. Item attributes can also be real-valued, representing the probability that each attribute describes an item.
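
The constraints above can be checked before a model is built. The helper below is a hypothetical sketch (`check_representations` is not part of the framework); it assumes the two matrices are NumPy arrays or array-like, and it returns the dimensions they imply.

```python
import numpy as np

def check_representations(user_profiles, item_attributes):
    """Hypothetical helper: validate the constraints listed above."""
    user_profiles = np.asarray(user_profiles)
    item_attributes = np.asarray(item_attributes)

    if user_profiles.ndim != 2 or item_attributes.ndim != 2:
        raise ValueError("both representations must be 2-D matrices")
    # The number of attributes must agree between the two matrices.
    if user_profiles.shape[1] != item_attributes.shape[0]:
        raise ValueError("num_attributes differs between users and items")
    # User profiles hold interaction counts: non-negative integers.
    if not np.issubdtype(user_profiles.dtype, np.integer):
        raise ValueError("user profiles must contain integers")
    if (user_profiles < 0).any():
        raise ValueError("interaction counts cannot be negative")

    num_users, num_attributes = user_profiles.shape
    num_items = item_attributes.shape[1]
    return num_users, num_attributes, num_items

dimensions = check_representations(np.zeros((5, 10), dtype=int), np.ones((10, 15)))
```

Here `dimensions` holds `(num_users, num_attributes, num_items)`. No check is applied to the values of the item attributes, in line with the last constraint.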

If you're already familiar with NumPy: the model is compatible with `ndarray` and _array_like_ data structures.

If you're not familiar with NumPy: the framework provides a random number generator that lets you draw from several distributions (in practice, it is a thin wrapper around `numpy.random.Generator`). Please refer to the NumPy documentation for a [list of distributions](https://numpy.org/doc/stable/reference/random/generator.html?highlight=generator#distributions).
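
Because the framework's generator is described as a wrapper around `numpy.random.Generator`, the distribution methods can be tried out on NumPy's own generator first. The sketch below uses `np.random.default_rng` with a fixed seed so that the draws are reproducible; whether the framework's `Generator` accepts the same seed argument is not shown in this guide and is left as an assumption.

```python
import numpy as np

# Two NumPy generators created with the same seed produce the same draws.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

binary_a = rng_a.binomial(n=1, p=0.3, size=(10, 15))
binary_b = rng_b.binomial(n=1, p=0.3, size=(10, 15))
same_draws = np.array_equal(binary_a, binary_b)

# Other distributions available on numpy.random.Generator
real_valued = rng_a.normal(loc=0.0, scale=1.0, size=(10, 15))
counts = rng_a.integers(low=0, high=4, size=(5, 10))  # integers in [0, 4)
```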

In [11]:
# Keep the dimensions small for easy visualization
number_of_users = 5
number_of_attributes = 10
number_of_items = 15
# We define user_representation using the standard integer generator in Numpy.
# We assume a number of interactions with each attribute in the interval [0,4).
user_representation = np.random.randint(4, size=(number_of_users, number_of_attributes))

# We define item_representation using the Generator that comes with the framework
# We assume a binary matrix with a binomial distribution

item_representation = Generator().binomial(n=1, p=.3,
                                           size=(number_of_attributes, number_of_items))
# Note that this is equivalent to:
# item_representation = np.random.Generator(np.random.MT19937()).binomial(n=1, p=.3, size=(...))

print("User representation (num_users x num_attributes):\n%s\n" % (str(user_representation)))
print("Item representation (num_attributes x num_items):\n%s" % (str(item_representation)))

User representation (num_users x num_attributes):
[[0 1 0 3 2 2 3 2 2 3]
 [1 0 2 2 0 3 3 0 3 0]
 [1 1 2 1 2 0 0 2 2 0]
 [3 1 3 2 0 2 1 1 2 0]
 [3 0 1 1 2 3 1 2 2 3]]

Item representation (num_attributes x num_items):
[[0 1 0 0 1 0 1 0 1 0 1 0 0 0 1]
 [1 0 0 0 0 0 1 1 0 0 1 0 0 0 1]
 [0 0 0 0 0 0 1 0 1 1 1 0 1 0 1]
 [0 1 1 0 0 0 0 0 0 0 0 1 0 0 0]
 [1 0 0 0 0 0 1 1 0 0 0 0 1 1 0]
 [0 1 0 0 0 0 1 1 0 0 1 0 0 1 0]
 [1 1 0 0 1 0 0 0 0 0 1 1 1 0 0]
 [1 0 0 0 1 0 0 0 0 0 1 0 1 1 0]
 [0 0 1 1 1 0 0 1 0 0 1 0 0 0 1]
 [0 0 0 1 0 1 1 0 1 0 1 0 1 0 0]]


In [12]:
# Initialize with custom representations
filtering = ContentFiltering(user_representation=user_representation,
                            item_representation=item_representation)

# Check if they're equivalent
is_user_equivalent = "yes" if np.array_equal(user_representation, filtering.users_hat) else "no"
is_item_equivalent = "yes" if np.array_equal(item_representation, filtering.items_hat) else "no"
print("Is user_profiles equivalent to user_representation? %s." % is_user_equivalent)
print("Is item_attributes equivalent to item_representation? %s." % is_item_equivalent)

Is user_profiles equivalent to user_representation? yes.
Is item_attributes equivalent to item_representation? yes.


You can also initialize models with only one of `user_representation` and `item_representation`. In this case, the representation that was not supplied adapts to the dimensions of the one you defined.

In [13]:
# Let's only initialize user_profiles
filtering = ContentFiltering(user_representation=user_representation)
print("After initializing user_profiles, the size of item_attributes (and so the number of attributes in the system) adapts automatically to it.")
print("Size of user_profiles, as defined above: %s." % str(filtering.users_hat.shape))
print("Size of item_attributes: %s.\n" % str(filtering.items_hat.shape))

# The same happens by only initializing item_attributes
filtering = ContentFiltering(item_representation=item_representation)
print("After initializing item_attributes, the size of user_profiles (and so the number of attributes in the system) adapts automatically to it.")
print("Size of item_attributes, as defined above: %s." % str(filtering.items_hat.shape))
print("Size of user_profiles: %s." % str(filtering.users_hat.shape))

After initializing user_profiles, the size of item_attributes (and so the number of attributes in the system) adapts automatically to it.
Size of user_profiles, as defined above: (5, 10).
Size of item_attributes: (10, 1250).

After initializing item_attributes, the size of user_profiles (and so the number of attributes in the system) adapts automatically to it.
Size of item_attributes, as defined above: (10, 15).
Size of user_profiles: (100, 10).


## Run a simulation
We can run a simulation for the predefined number of time steps (50), or define our own duration.

In [14]:
# let's initialize a model with both user_representation and item_representation defined above
filtering = ContentFiltering(user_representation=user_representation,
                            item_representation=item_representation)
# Run the model for the predefined number of timesteps:
filtering.run()

100%|██████████| 50/50 [00:00<00:00, 4787.80it/s]


At the end of the simulation, we can examine the results of the measurements. For example:

In [15]:
# To get the measurements of all timesteps<=50
measurements = filtering.get_measurements()

# Measurements can be easily converted to pandas DataFrame objects

pd.DataFrame(measurements).head()

|   | mse | timesteps |
|---|-----|-----------|
| 0 | NaN | 0 |
| 1 | 1.1520858423856952 | 1 |
| 2 | 1.1470475696258833 | 2 |
| 3 | 1.117040877176485 | 3 |
| 4 | 1.0829636182544558 | 4 |


## Measurements
At each time step of the simulation, measurement modules calculate a quantity based on the system state. One example is the mean squared error between the predicted user profiles and the actual user profiles, which answers the question: how close is the model to predicting the users' real preferences?
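
As a reminder of what a mean squared error computes, here is a minimal, framework-independent sketch. The function name and the toy matrices are hypothetical, and the exact quantities compared by the built-in metric may differ.

```python
def mean_squared_error(predicted, actual):
    """MSE between two equally shaped matrices given as lists of rows."""
    squared_errors = [
        (p - a) ** 2
        for predicted_row, actual_row in zip(predicted, actual)
        for p, a in zip(predicted_row, actual_row)
    ]
    return sum(squared_errors) / len(squared_errors)

# Toy 2 x 2 example: the squared errors are 1, 0, 4 and 0.
predicted = [[0.0, 1.0], [2.0, 2.0]]
actual = [[1.0, 1.0], [0.0, 2.0]]
mse = mean_squared_error(predicted, actual)  # (1 + 0 + 4 + 0) / 4 = 1.25
```

A value of 0 would mean that the predictions match the actual values exactly; larger values indicate predictions that are further off.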

It's easy to define new metrics, but in this guide we will use some of the pre-loaded metrics to get a better sense of how they work. For a list of pre-loaded metrics and their descriptions, see the [docs](https://elucherini.github.io/algo-segregation/reference/metrics.html).

### View metrics

First, we note that the content filtering recommender system, with its default settings, only tracks one metric: the [mean squared error for user profiles](https://elucherini.github.io/algo-segregation/reference/metrics.html#mse-measurement). The metrics monitored are stored in the [metrics](https://elucherini.github.io/algo-segregation/reference/models.html#models.recommender.MeasurementModule) attribute of the model.

In [16]:
# The metrics tracked by each model can be examined by printing the `metrics` list.
print("The system is currently monitoring these metrics:")
print(filtering.metrics)

The system is currently monitoring these metrics:
[<trecs.metrics.measurement.MSEMeasurement object at 0x7fbd9c89e690>]


### Add metrics
To **maintain compatibility with pandas**, we suggest **adding metrics only to model instances that have not been run yet**. This avoids measurements that start at different time steps and therefore produce arrays of different lengths. Feel free to disregard this advice if pandas compatibility is not important to your application.
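
If a metric does end up being added after some time steps, one possible workaround is to pad the shorter series before building a DataFrame. The sketch below assumes that measurements are available as a dictionary mapping each metric name to a list of values (which is what the `pd.DataFrame(measurements)` calls in this guide suggest); `pad_measurements` and the metric names are hypothetical.

```python
def pad_measurements(measurements):
    """Left-pad shorter series with None so that all have equal length."""
    longest = max(len(values) for values in measurements.values())
    return {
        name: [None] * (longest - len(values)) + list(values)
        for name, values in measurements.items()
    }

# Hypothetical measurements: "late_metric" was added two time steps late.
raw = {"mse": [None, 1.5, 1.2, 1.1], "late_metric": [0.4, 0.3]}
padded = pad_measurements(raw)
```

After padding, every list has the same length, so the dictionary can be passed to `pd.DataFrame` without a length mismatch. Padding on the left reflects the assumption that the late metric is missing its earliest time steps.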

We can instantiate a model, add a new metric, and then run the model.

We will add [HomogeneityMeasurement](https://elucherini.github.io/algo-segregation/reference/metrics.html#homogeneity-measurement), which provides a measure of the homogeneity of user interactions in the system as a whole.



In [17]:
# Change the number of items and users to make metric values more reasonable and dimensions distinguishable
number_of_items = 100
number_of_users = 50
number_of_attributes = 20

# Change the distribution of item attributes so that average feature score range can be added later
item_representation = Generator().normal(size=(number_of_attributes, number_of_items))
filtering = ContentFiltering(num_users=number_of_users, num_items=number_of_items, 
                             num_attributes=number_of_attributes,
                             item_representation=item_representation)

# This method accepts a variable number of metrics
filtering.add_metrics(HomogeneityMeasurement())

print("These are the current metrics:")
print(filtering.metrics)

These are the current metrics:
[<trecs.metrics.measurement.MSEMeasurement object at 0x7fbd6123f7d0>, <trecs.metrics.measurement.HomogeneityMeasurement object at 0x7fbd6123f850>]


We will also add [RecSimilarity](https://elucherini.github.io/algo-segregation/reference/metrics.html#jaccard-similarity-TODO-add-this), which measures the similarity between interaction patterns for pairs of users. To measure this Jaccard similarity, we must specify which pairs of users we want to compare.

In [18]:
js_pairs = [
    (u1_idx, u2_idx)
    for u1_idx in range(filtering.num_users)
    for u2_idx in range(filtering.num_users)
    if u1_idx != u2_idx
]
filtering.add_metrics(RecSimilarity(pairs=js_pairs))

print("These are the current metrics:")
print(filtering.metrics)

These are the current metrics:
[<trecs.metrics.measurement.MSEMeasurement object at 0x7fbd6123f7d0>, <trecs.metrics.measurement.HomogeneityMeasurement object at 0x7fbd6123f850>, <trecs.metrics.measurement.JaccardSimilarity object at 0x7fbd61238510>]
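
For reference, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union. The sketch below is independent of the framework: the item ids, the convention for two empty sets, and the averaging over pairs are assumptions made for illustration.

```python
from itertools import combinations

def jaccard_similarity(items_a, items_b):
    """Size of the intersection divided by size of the union."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(union)

# Hypothetical sets of item ids associated with three users
items_per_user = {0: [1, 2, 3, 4], 1: [3, 4, 5, 6], 2: [7, 8]}

# Each unordered pair of users appears once: (0, 1), (0, 2), (1, 2)
pairs = list(combinations(items_per_user, 2))
similarities = {
    (u1, u2): jaccard_similarity(items_per_user[u1], items_per_user[u2])
    for u1, u2 in pairs
}
average_similarity = sum(similarities.values()) / len(similarities)
```

Since Jaccard similarity is symmetric, `itertools.combinations` yields half as many pairs as a list that contains both `(a, b)` and `(b, a)`, without changing the average.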


Now we will add average feature score range, a metric for evaluating within-list diversity based on the range of item attribute values.

In [19]:
filtering.add_metrics(AverageFeatureScoreRange())
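
To build intuition for a range-based diversity measure, the sketch below computes, for each user's list and each attribute, the difference between the largest and smallest attribute value among the listed items, and then averages these ranges. This is one possible reading of "range of item attribute values" and not necessarily the formula used by `AverageFeatureScoreRange`; the function name and the toy data are hypothetical.

```python
import numpy as np

def average_feature_range(item_attributes, recommended):
    """item_attributes: (num_attributes, num_items);
    recommended: (num_users, list_length) array of item indices."""
    # values[a, u, k] is attribute a of the k-th item in user u's list.
    values = item_attributes[:, recommended]
    ranges = values.max(axis=2) - values.min(axis=2)  # (num_attributes, num_users)
    return ranges.mean()

item_attributes = np.array([[0.0, 1.0, 4.0],
                            [2.0, 2.0, 5.0]])
recommended = np.array([[0, 1],   # user 0 sees items 0 and 1
                        [1, 2]])  # user 1 sees items 1 and 2
# Ranges: attribute 0 -> 1 and 3, attribute 1 -> 0 and 3; mean = 1.75
afsr = average_feature_range(item_attributes, recommended)
```

A list whose items all share the same attribute values has a range of 0, so higher values correspond to more varied lists under this definition.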

In [20]:
# now we run the model
filtering.run(timesteps=5)
measurements = filtering.get_measurements()
pd.DataFrame(measurements)

100%|██████████| 5/5 [00:00<00:00, 64.77it/s]


|   | mse | homogeneity | jaccard_similarity | afsr | timesteps |
|---|-----|-------------|--------------------|------|-----------|
| 0 | NaN | NaN | NaN | NaN | 0 |
| 1 | 1.5139009543444946 | -45.5 | 1.000000000000011 | 9.475716982097303 | 1 |
| 2 | 1.2590469823779429 | -1.0 | 0.1465022901391119 | 10.69450056800893 | 2 |
| 3 | 1.1866208953876878 | -0.5 | 0.110686039247358 | 10.842398294223246 | 3 |
| 4 | 1.1742435439357988 | 0.0 | 0.09990384740748 | 10.842398294223246 | 4 |
| 5 | 1.1609903534788495 | -0.5 | 0.0956691358037155 | 10.822819559705623 | 5 |


Measurements at time step 0 can be undefined (`None`, `NaN`, etc.) because they describe the system before the start of the simulation. MSE is undefined at that point because the system has not yet made predictions about the user profiles; similarly, homogeneity is meaningless before the simulation begins because there are no user interactions to consider.

## System state


Some applications might require keeping a history of the system's internal state for future processing. This is useful, for example, to study the evolution of predicted user profiles. The framework provides an interface to [store and access all the states of each component over time](https://elucherini.github.io/algo-segregation/reference/models.html#models.recommender.SystemStateModule). Some components are tracked by default; others are added by the individual models.

In [21]:
system_state = filtering.get_system_state()
print("These are the system state components being monitored:")
print(system_state.keys())

ValueError: No measurement module defined

In [None]:
print("There are as many states as the timesteps for which we ran the system + the initial state.")
print("For example, the history of predicted_user_profiles has length:", len(system_state['predicted_user_profiles']))
# The last state in the history corresponds to the current state of the component
print("Furthermore, the last state in the history of a component corresponds to its current state.")
print("Is this true for predicted_user_profiles?", np.array_equal(system_state['predicted_user_profiles'][-1], filtering.users_hat))
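
The two properties printed above (one state per time step plus the initial state, and a last state equal to the current one) can be illustrated without the framework. The `StateHistory` class below is a hypothetical stand-in for a tracked component, not the framework's state module.

```python
import copy

class StateHistory:
    """Hypothetical helper that stores a snapshot of a variable at each step."""

    def __init__(self, initial_state):
        # Deep copies keep earlier snapshots from changing when the
        # tracked object is modified in place later on.
        self.history = [copy.deepcopy(initial_state)]

    def record(self, state):
        self.history.append(copy.deepcopy(state))

profile = [0, 0, 0]
tracker = StateHistory(profile)
for timestep in range(5):
    profile[timestep % 3] += 1  # stand-in for a model update
    tracker.record(profile)

num_states = len(tracker.history)                  # 5 time steps + initial state
last_is_current = tracker.history[-1] == profile
```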

To start tracking a new component, you can use [add_state_variable()](https://elucherini.github.io/algo-segregation/reference/models.html#models.recommender.SystemStateModule.add_state_variable). Note that state variables can only be monitored if they inherit from the `BaseComponent` class. Creating new state variables is outside the scope of this guide, so please refer to the [advanced-models](advanced-models.ipynb) and [advanced-metrics](advanced-metrics.ipynb) notebooks.

## "Real" users
Most of what we've seen so far about users refers to the predictions that the system makes about users' preferences. This framework allows for modeling system predictions as well as "real" users. The [Users](https://elucherini.github.io/algo-segregation/reference/components.html#components.users.Users) component allows for personalization of actual user profiles, which the system is not aware of, and of new kinds of interactions with items.

**Note:** The interface for adding `actual_user_representation` is currently broken and is due to be restored.

### User interactions with items
The Users class also determines how users interact with items. The default behavior is defined in [get_user_feedback()](https://elucherini.github.io/algo-segregation/reference/components.html#components.users.Users.get_user_feedback). In short, when the system presents items to users, users internally evaluate the items and choose the one item that is most similar to their actual preferences. Please note that by default, there is no limit to the number of times users can interact with a given item.

#### Overriding user behavior

The default behavior can be overridden through the API. **A new advanced guide about users is in the works.**

## Model parameters about user interactions
Models also provide a few initialization parameters that can be used to tweak the behavior of the model with regard to user interactions. Specifically, the `num_items_per_iter` parameter sets the number of items presented to each user at each iteration. The default is 10 items per user per iteration.

### Other model parameters
Please refer to [the docs](https://elucherini.github.io/algo-segregation/reference/models.html).