# Simple frequentist model
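This model is a simple frequentist estimator of answer probabilities given partial data. It works well when only a few answers have been observed. However, as more questions are answered, the number of rows in the conditional probability table that match all observed answers decreases exponentially: if each answer splits the data roughly evenly across the five options, every observation keeps only about a fifth of the remaining rows. On average, a dataset of about \\(10^6\\) responses therefore supports only about 8 observations, as \\(\log_5(10^6) \approx 8.58\\).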
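
The estimate above can be reproduced with a short back-of-the-envelope calculation. This sketch assumes, as the estimate does, about \\(10^6\\) rows and that each observed answer keeps roughly one fifth of the remaining rows; the helper names `expected_rows` and `max_observations` are illustrative only.

```python
import math


def expected_rows(n_rows: float, n_observations: int, n_levels: int = 5) -> float:
    """Rows expected to match all observed answers if each answer keeps 1/n_levels of the data."""
    return n_rows / n_levels ** n_observations


def max_observations(n_rows: float, n_levels: int = 5) -> float:
    """Number of observations after which about one matching row is left on average."""
    return math.log(n_rows, n_levels)


n_rows = 10**6  # dataset size assumed in the estimate above
for k in range(10):
    print(f"{k} observations -> {expected_rows(n_rows, k):,.1f} rows")
print(f"log_5(10^6) = {max_observations(n_rows):.2f}")
```

Under these assumptions about 2.6 matching rows are left after 8 observations and about 0.5 after 9, which is where the conditional table effectively runs empty.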


```python
from common import load_joint
```

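```python
class FrequentionistModel:
    """Estimates answer probabilities by counting rows of the joint table that match the observed answers."""

    def __init__(self):
        # Joint table of answers, loaded with the project's `load_joint` helper
        self._joint = load_joint(10_000)
        self._observed = dict()

    def observe(self, x: dict) -> None:
        # Merge new answers into the evidence; a repeated question keeps the latest answer
        self._observed = {**self._observed, **x}

    def clear_observations(self) -> None:
        self._observed = {}

    def predict_proba(self, column_name: str) -> dict:
        # Condition on the evidence by keeping only rows that match every observed answer
        posterior = self._joint
        for key, value in self._observed.items():
            posterior = posterior[posterior[key] == value]

        # Ignore answers coded as 0 (outside the 1-5 scale)
        answers = posterior[column_name]
        answers = answers[answers != 0]

        # Return a uniform distribution if no matching data is available
        if len(answers) == 0:
            return {float(i): 0.2 for i in range(1, 6)}

        posterior_probs = answers.value_counts(normalize=True).to_dict()

        # Fill answers that never occur with zero probability
        for i in range(1, 6):
            posterior_probs.setdefault(float(i), 0.0)

        return posterior_probs
```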
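
`load_joint` comes from a local `common` module that is not shown, so the notebook cannot be run on its own. The following minimal sketch reproduces the same counting idea with only the standard library: conditioning is done by filtering rows, and probabilities are relative frequencies of the target answer. The rows and the helper `conditional_distribution` are hypothetical, for illustration only.

```python
from collections import Counter

# Hypothetical rows: each dict maps a question code to an answer on the 1-5 scale
# (0 is treated as "no answer" here, mirroring the model's filter)
ROWS = [
    {"EXT1": 1, "EXT2": 5, "EXT3": 1},
    {"EXT1": 1, "EXT2": 5, "EXT3": 1},
    {"EXT1": 2, "EXT2": 5, "EXT3": 1},
    {"EXT1": 0, "EXT2": 5, "EXT3": 1},
    {"EXT1": 5, "EXT2": 1, "EXT3": 5},
]


def conditional_distribution(rows, target, evidence, levels=range(1, 6)):
    """Relative frequency of each answer to `target` among rows matching all of `evidence`."""
    # Condition by filtering: keep rows that agree with every observed answer
    matching = [row for row in rows if all(row[k] == v for k, v in evidence.items())]
    # Skip answers coded 0
    answers = [row[target] for row in matching if row[target] != 0]
    if not answers:
        # No supporting data: fall back to a uniform distribution
        return {level: 1 / len(levels) for level in levels}
    counts = Counter(answers)
    # Counter returns 0 for unseen answers, so every level gets an entry
    return {level: counts[level] / len(answers) for level in levels}


print(conditional_distribution(ROWS, "EXT1", {"EXT2": 5, "EXT3": 1}))
print(conditional_distribution(ROWS, "EXT1", {"EXT2": 3}))
```

The second call matches no rows, so it shows the uniform fallback that the model also uses when the evidence leaves no data.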

```python
model = FrequentionistModel()
```

## Example prior
To start, we can ask the model for the probability distribution over the statement "I am the life of the party." (`EXT1`), where 1 means "I totally disagree" and 5 means "I totally agree". More people in the dataset disagree with the statement (about 45% answer 1 or 2) than agree with it (about 27% answer 4 or 5), although the single most common answer is neutral (3), chosen by 28.2%.

```python
model.predict_proba("EXT1")
```

```
{3.0: 0.28201511335012597,
 1.0: 0.2478589420654912,
 2.0: 0.20161209068010075,
 4.0: 0.1892191435768262,
 5.0: 0.07929471032745591}
```
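
The comparison between disagreeing and agreeing answers is just a sum over answer groups. A minimal sketch using the probabilities printed above; `mass` is a hypothetical helper, not part of the model:

```python
# EXT1 prior, copied from the output above
prior = {
    3.0: 0.28201511335012597,
    1.0: 0.2478589420654912,
    2.0: 0.20161209068010075,
    4.0: 0.1892191435768262,
    5.0: 0.07929471032745591,
}


def mass(dist: dict, answers) -> float:
    """Total probability of a set of answers; answers absent from `dist` count as 0."""
    return sum(dist.get(float(a), 0.0) for a in answers)


disagree = mass(prior, [1, 2])
neutral = mass(prior, [3])
agree = mass(prior, [4, 5])
print(f"disagree: {disagree:.1%}, neutral: {neutral:.1%}, agree: {agree:.1%}")
# → disagree: 44.9%, neutral: 28.2%, agree: 26.9%
```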

## Conditioning on introverted answers
When we answer 5 on "I don't talk a lot." (`EXT2`) and 1 on "I feel comfortable around people." (`EXT3`), the model assigns just over 80% probability to answering 1 on "I am the life of the party."

```python
model.observe({"EXT2": 5.0, "EXT3": 1.0})
model.predict_proba("EXT1")
```

```
{1.0: 0.8012820512820513,
 2.0: 0.08974358974358974,
 3.0: 0.07051282051282051,
 5.0: 0.019230769230769232,
 4.0: 0.019230769230769232}
```

## Conditioning on extraverted answers
When we do the opposite, answering 1 on "I don't talk a lot." and 5 on "I feel comfortable around people.", the model predicts agreement with "I am the life of the party." (4 or 5) with about 69% probability.

```python
model.clear_observations()
model.observe({"EXT2": 1.0, "EXT3": 5.0})
model.predict_proba("EXT1")
```

```
{4.0: 0.36610878661087864,
 5.0: 0.32426778242677823,
 3.0: 0.2196652719665272,
 2.0: 0.04497907949790795,
 1.0: 0.04497907949790795}
```
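
One way to summarize how strongly the evidence moves the prediction is the ratio of the conditional probability to the prior probability of the same answers (often called lift). The sketch below computes it from the three outputs in this notebook; the variable names are illustrative and this ratio is not something the model itself reports.

```python
# Probabilities copied from the outputs in this notebook
prior_p1 = 0.2478589420654912                           # P(EXT1 = 1)
prior_agree = 0.1892191435768262 + 0.07929471032745591  # P(EXT1 in {4, 5})

introvert_p1 = 0.8012820512820513                            # P(EXT1 = 1 | EXT2 = 5, EXT3 = 1)
extravert_agree = 0.36610878661087864 + 0.32426778242677823  # P(EXT1 in {4, 5} | EXT2 = 1, EXT3 = 5)

# Lift: how many times more likely the answer becomes once the evidence is observed
lift_introvert = introvert_p1 / prior_p1
lift_extravert = extravert_agree / prior_agree
print(f"introverted evidence: {lift_introvert:.2f}x, extraverted evidence: {lift_extravert:.2f}x")
# → introverted evidence: 3.23x, extraverted evidence: 2.57x
```

Both ratios are well above 1, so the answers to `EXT2` and `EXT3` are strongly informative about `EXT1`. Keep in mind that these conditional estimates rest on far fewer matching rows than the prior.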