In [9]:
# Logic for allowing the import of notebooks.
import io, os, sys, types
from IPython import get_ipython
from nbformat import read
from IPython.core.interactiveshell import InteractiveShell

def find_notebook(fullname, path=None):
    """find a notebook, given its fully qualified name and an optional path

    This turns "foo.bar" into "foo/bar.ipynb"
    and tries turning "Foo_Bar" into "Foo Bar" if Foo_Bar
    does not exist.
    """
    name = fullname.rsplit('.', 1)[-1]
    if not path:
        path = ['']
    for d in path:
        nb_path = os.path.join(d, name + ".ipynb")
        if os.path.isfile(nb_path):
            return nb_path
        # let import Notebook_Name find "Notebook Name.ipynb"
        nb_path = nb_path.replace("_", " ")
        if os.path.isfile(nb_path):
            return nb_path

class NotebookLoader(object):
    """Module Loader for Jupyter Notebooks"""
    def __init__(self, path=None):
        self.shell = InteractiveShell.instance()
        self.path = path

    def load_module(self, fullname):
        """import a notebook as a module"""
        path = find_notebook(fullname, self.path)

        print("importing Jupyter notebook from %s" % path)

        # load the notebook object
        with io.open(path, 'r', encoding='utf-8') as f:
            nb = read(f, as_version=4)

        # create the module and add it to sys.modules
        # if fullname in sys.modules:
        #    return sys.modules[fullname]
        mod = types.ModuleType(fullname)
        mod.__file__ = path
        mod.__loader__ = self
        mod.__dict__['get_ipython'] = get_ipython
        sys.modules[fullname] = mod

        # extra work to ensure that magics that would affect the user_ns
        # actually affect the notebook module's ns
        save_user_ns = self.shell.user_ns
        self.shell.user_ns = mod.__dict__

        try:
            for cell in nb.cells:
                if cell.cell_type == 'code':
                    # transform the input to executable Python
                    code = self.shell.input_transformer_manager.transform_cell(cell.source)
                    # run the code in the module
                    exec(code, mod.__dict__)
        finally:
            self.shell.user_ns = save_user_ns
        return mod
    
class NotebookFinder(object):
    """Module finder that locates Jupyter Notebooks"""
    def __init__(self):
        self.loaders = {}

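    def find_module(self, fullname, path=None):
        nb_path = find_notebook(fullname, path)
        if not nb_path:
            return None

        key = path
        if path:
            # lists aren't hashable
            key = os.path.sep.join(path)

        if key not in self.loaders:
            self.loaders[key] = NotebookLoader(path)
        return self.loaders[key]

    def find_spec(self, fullname, path=None, target=None):
        # Python 3.12+ no longer calls find_module() on meta path finders,
        # so expose the same loader through the find_spec() protocol.
        from importlib.util import spec_from_loader
        loader = self.find_module(fullname, path)
        if loader is None:
            return None
        return spec_from_loader(fullname, loader)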

sys.meta_path.append(NotebookFinder())

# Theoretical Part
---

### 1. Hypothesis Testing – The problem of multiple comparisons [5 points]

The problem of multiple comparisons can be viewed in terms of repeated Bernoulli trials over multiple hypothesis tests, in which each test makes a Type I Error with probability $\alpha$, independently of the previous tests. We treat a Type I Error as a success in the Bernoulli trial. Thus, for any single experiment, the outcome follows a Bernoulli distribution with the following parameters:

* chance of making a Type I Error: $\alpha$
* chance of not making a Type I Error: $1 - \alpha$

Given this setup, the number $k$ of Type I Errors observed in a collection of $m$ independent hypothesis tests follows a binomial distribution with the following parameters:

* number of trials: $n = m$
* number of successes (Type I Errors): $k$
* probability of success (Type I Error): $p = \alpha$

The probability mass function of the binomial distribution is given by:

$$Pr(k\ |\ n, p) = C_n^k p^k (1-p)^{n-k} = \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}$$
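As a quick numerical check of the formula above, here is a minimal sketch of the binomial probability mass function in Python. The function name `binomial_pmf` and the example values $m = 10$, $\alpha = 0.05$ are illustrative choices, not part of the original solution; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Pr(k | n, p) = C(n, k) * p**k * (1 - p)**(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # impossible number of successes
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Illustrative values: probability of exactly k Type I Errors in m tests at level alpha
m, alpha = 10, 0.05
for k in range(4):
    print(k, round(binomial_pmf(k, m, alpha), 4))
```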

Given this reasoning, we can answer the questions as follows:

$$P(m^{th}\ experiment\ gives\ significant\ result\ |\ m\ experiments\ lacking\ power\ to\ reject\ H_0) = Pr(Type\ I\ Error) = \alpha$$

$$P(at\ least\ one\ significant\ result\ |\ m\ experiments\ lacking\ power\ to\ reject\ H_0) = Pr(k \geq 1\ |\ n=m, p=\alpha) = 1 - Pr(k=0\ |\ n=m, p=\alpha) = 1 - (1-\alpha)^m$$
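The probability of at least one Type I Error among $m$ independent tests, $1 - (1-\alpha)^m$, is the complement of observing zero errors. The sketch below evaluates it and cross-checks it by simulation, assuming that under $H_0$ each test rejects independently with probability $\alpha$ (equivalently, uniformly distributed p-values). The helper names and the values $m = 20$, $\alpha = 0.05$ are hypothetical.

```python
import random


def prob_at_least_one_type1(m: int, alpha: float) -> float:
    """Probability of one or more Type I Errors in m independent tests (all H0 true)."""
    return 1 - (1 - alpha) ** m


def simulate_at_least_one(m: int, alpha: float, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate: each test is 'significant' with probability alpha,
    which is what happens when its p-value is Uniform(0, 1) under H0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(m)):
            hits += 1
    return hits / trials


m, alpha = 20, 0.05
print(prob_at_least_one_type1(m, alpha))  # closed form
print(simulate_at_least_one(m, alpha))    # simulation, should be close to the closed form
```

With $\alpha = 0.05$ and only 20 tests, the chance of at least one spurious significant result is already about 64%.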



### 2. Bias and unfairness in Interleaving experiments [10 points]
A scenario in which team-draft interleaving is insensitive to the difference between two ranked lists of length 4 is detailed in Hofmann et al. (2011). A similar argument is made here for length 3. The scenario is shown in the figure below.

![Team-draft interleaving insensitivity for ranked lists of length 3](files/team-draft-interleaving-insensitivity.png)
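To make this kind of insensitivity concrete, the sketch below implements team-draft interleaving and runs it on a hypothetical pair of length-3 lists (not necessarily the lists in the figure): only document `c` is relevant and it is always clicked, and ranker B places it higher than ranker A. The function names, the lists, and the click behaviour are assumptions made for illustration.

```python
import random


def team_draft_interleave(list_a, list_b, rng):
    """Team-draft interleaving: returns the interleaved list and the set of
    documents contributed by each ranker."""
    interleaved, team_a, team_b = [], set(), set()
    while True:
        remaining_a = [d for d in list_a if d not in interleaved]
        remaining_b = [d for d in list_b if d not in interleaved]
        if not remaining_a or not remaining_b:
            break
        # The team with fewer picks goes next; ties are broken by a coin flip.
        a_picks = len(team_a) < len(team_b) or (
            len(team_a) == len(team_b) and rng.random() < 0.5
        )
        if a_picks:
            doc = remaining_a[0]
            team_a.add(doc)
        else:
            doc = remaining_b[0]
            team_b.add(doc)
        interleaved.append(doc)
    return interleaved, team_a, team_b


def outcome(clicked, team_a, team_b):
    """+1 if A wins, -1 if B wins, 0 for a tie (more clicked documents wins)."""
    a, b = len(clicked & team_a), len(clicked & team_b)
    return (a > b) - (a < b)


# Hypothetical example: B ranks the only relevant document "c" higher than A does.
A, B = ["a", "b", "c"], ["b", "c", "a"]
rng = random.Random(42)
results = [outcome({"c"}, *team_draft_interleave(A, B, rng)[1:]) for _ in range(10_000)]
print("A wins:", results.count(1), "B wins:", results.count(-1), "ties:", results.count(0))
```

Because `c` always ends up in third position and is credited to A or B depending on a coin flip, each ranker wins about half of the impressions, so in expectation the comparison is a tie even though B is the better ranking.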



### References
HOFMANN, K., WHITESON, S., AND DE RIJKE, M. 2011. A probabilistic method for inferring preferences from
clicks. In Proceedings of the ACM Conference on Information and Knowledge Management (CIKM).

# Experimental Part
---

### Step 1: Simulate Rankings of Relevance for $E$ and $P$ (5 points)

In [10]:
# import step_1

### Step 2: Implement Evaluation Measures (10 points)

In [11]:
# import step_2

### Step 3: Calculate the $\Delta$measure (0 points)

In [12]:
# import step_3

### Step 4: Implement Interleaving (15 points)

In [13]:
# import step_4

### Step 5: Implement User Clicks Simulation (15 points)

In [14]:
# import step_5
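One possible way to simulate clicks is a position-based click model, in which a document is clicked only if it is examined (with a probability that depends on its rank) and found attractive (with a probability that depends on its relevance), the two events being independent. The sketch below is a hypothetical illustration: the function `simulate_pbm_clicks`, the relevance labels `N`/`R`/`HR`, and all probabilities are assumed values, not parameters estimated from data and not necessarily what the `step_5` notebook implements.

```python
import random


def simulate_pbm_clicks(relevances, gamma, attractiveness, rng):
    """Position-based click model sketch: the document at rank r is clicked with
    probability gamma[r] * attractiveness[relevance label]."""
    clicks = []
    for rank, rel in enumerate(relevances):
        p_click = gamma[rank] * attractiveness[rel]  # P(examined) * P(attractive)
        clicks.append(rng.random() < p_click)
    return clicks


# Hypothetical parameters, chosen only for illustration.
gamma = [0.9, 0.6, 0.4]                            # examination probability per rank
attractiveness = {"N": 0.05, "R": 0.5, "HR": 0.9}  # per relevance label
rng = random.Random(0)
print(simulate_pbm_clicks(["HR", "N", "R"], gamma, attractiveness, rng))
```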

### Step 6: Simulate Interleaving Experiment (10 points)

In [15]:
# import step_6

### Step 7: Results and Analysis (30 points)

In [16]:
# import step_7