# Item Response Models
- categories: [Julia, Turing, ItemResponse]

```julia
#collapse
using Turing
using Bijectors
using Gadfly
using DataFrames, DataFramesMeta
Gadfly.set_default_plot_size(900px, 300px)
```

Item response models are used to make inferences about two interacting populations simultaneously. The common example is a population of test questions and a population of test takers (students), where we observe the result (success/failure) of each student on each question they've seen. This is an interesting problem because, even in the basic case:

- students have different levels of aptitude
- questions have different levels of difficulty
- not every student sees every question
- not every question needs to be seen by the same number of students
- we should be able to make relative inferences between students who share no questions, and between questions that share no students
- the data is nonetheless fairly simple: `[correct (Boolean), student_id (categorical), question_id (categorical)]`
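As a minimal sketch of that layout, the toy records below put one row per student-question attempt into a `DataFrame`. The IDs and outcomes are invented purely for illustration, and the names `responses`, `attempts_per_student`, and `attempts_per_question` are hypothetical.

```julia
using DataFrames

# Hypothetical toy data: one row per (student, question) attempt.
responses = DataFrame(
    correct     = [true, false, true, true, false],
    student_id  = [1, 1, 2, 3, 3],
    question_id = [1, 2, 2, 3, 1],
)

# Students and questions need not be seen equally often.
attempts_per_student  = combine(groupby(responses, :student_id), nrow => :n_attempts)
attempts_per_question = combine(groupby(responses, :question_id), nrow => :n_attempts)
```

In this toy set, students 2 and 3 share no question, yet both overlap with student 1, which is the kind of indirect link that lets the model compare them.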

Item Response may seem very similar to collaborative filtering, so it's worth disambiguating.

Collaborative filtering aims to complete a sparsely populated preferences/ratings matrix $M$ of consumer-item scores (e.g. "3.5 :star:"). A common approach is alternating least squares (ALS), which iteratively factors the matrix into a product-feature matrix $P$ and a customer-preference matrix $C$. The goal is to choose these so that their product "accurately completes" $M$, i.e. if $CP = \overline{M}$, then the difference $M - \overline{M}$ is small wherever we have entries for $M$ (remember, $M$ is incomplete).
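Written out, one common form of this objective looks like the following. This is a sketch: the set of observed entries $\Omega$, the row $c_i$ of $C$, the column $p_j$ of $P$, and the ridge penalty $\lambda$ are notation assumed here for illustration, and the regularization term is a typical addition rather than something required by the description above.

$$
\min_{C, P} \sum_{(i,j) \in \Omega} \left( M_{ij} - c_i^\top p_j \right)^2 + \lambda \left( \lVert C \rVert_F^2 + \lVert P \rVert_F^2 \right)
$$

With $P$ held fixed this is an ordinary least squares problem in $C$, and vice versa, which is what the alternating updates exploit.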

A key fact is that the matrix $\overline{M}$ (the list of recommendations) is the important output here, and the factors $C$ and $P$ are intriguing but not critical. This is different from Item Response, where the latent variables describing the difficulty of each question and the aptitude of each student are the primary desired outputs (although we could still infer $\Pr(\mathrm{correct} \mid \mathrm{student\_id}, \mathrm{question\_id})$ for unseen pairings!).
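To make the unseen-pairing point concrete, here is a minimal sketch assuming the simplest item response form (often called 1PL or Rasch), where the success probability is the logistic of aptitude minus difficulty. The parameter values and the helper name `p_correct` are hypothetical; in a Bayesian fit these would be posterior draws rather than fixed numbers.

```julia
# Standard logistic function, mapping a real number into (0, 1).
logistic(x) = 1 / (1 + exp(-x))

# Hypothetical point estimates, invented for illustration.
aptitude   = Dict(1 => 1.2, 2 => -0.3)   # keyed by student_id
difficulty = Dict(1 => 0.5, 2 => -1.0)   # keyed by question_id

# Assumed 1PL form: success becomes more likely as aptitude exceeds difficulty.
p_correct(student_id, question_id) =
    logistic(aptitude[student_id] - difficulty[question_id])

# Works for any (student, question) pair, whether or not it appears in the data.
p_correct(2, 1)
```

Because each student and each question carries its own parameter, a prediction only needs both to have appeared somewhere in the data, not together.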

The other distinction worth mentioning is that IR models have enormous flexibility in how they inform the probability of success, as we'll see. Collaborative filtering, at least with ALS, is just optimizing a matrix factorization task. Since $\overline{M} = CP$, the user-product score can only be the dot product of the customer-preference vector and the product-feature vector. It attempts to encode "how well do the consumer preferences overlap with the product features?"

[Stan users guide - IRT](https://mc-stan.org/docs/2_18/stan-users-guide/item-response-models-section.html)