# a)
One simple way to protect the private information of the individuals is to hide direct identifiers. However, this is generally insufficient, because attackers may hold other identifying information that, combined with the information in the database, can reveal identities.
<br>
Another method is k-anonymization, where every individual is indistinguishable from at least $k-1$ other people in the database (with respect to quasi-identifiers). Columns with directly identifying information, like name and date of birth, are removed, and the rest of the information is generalized. For instance, a variable like age can be made categorical by replacing it with age groups. Even though k-anonymization is an improvement over simply removing direct identifiers, an attacker with enough information can still infer something about the individuals.
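
As a rough illustration of the generalization step, the sketch below bins age into groups and measures the resulting $k$ as the size of the smallest group of rows that share the same quasi-identifiers. The toy records, the column names and the helper `smallest_group_size` are assumptions made for this example and are not part of the assignment data.

```python
import pandas as pd

def smallest_group_size(df, quasi_identifiers):
    """Return k: the size of the smallest group of rows that share
    the same values on all quasi-identifier columns."""
    return int(df.groupby(quasi_identifiers).size().min())

# Hypothetical toy records; direct identifiers are assumed to be removed already.
records = pd.DataFrame({
    "age": [23, 27, 31, 38, 45, 49, 52, 58],
    "gender": ["F", "F", "M", "M", "F", "F", "M", "M"],
    "diagnosis": ["flu", "covid", "flu", "covid", "flu", "flu", "covid", "flu"],
})

# Generalize exact age into 20-year groups: [20, 40) and [40, 60).
records["age_group"] = pd.cut(
    records["age"], bins=[20, 40, 60], labels=["20-39", "40-59"], right=False
).astype(str)

k_raw = smallest_group_size(records, ["age", "gender"])
k_generalized = smallest_group_size(records, ["age_group", "gender"])
print("k with exact age:", k_raw)
print("k with age groups:", k_generalized)
```

With exact ages every row is unique on (age, gender), while after generalization each (age group, gender) combination is shared by two rows.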
<br>
If we assume that an attacker can have a lot of side-information, it is better to use differential privacy.
<br>
In this task the policy is released and can be used by the public. The data therefore have to be anonymized before $\pi(a|x)$ is obtained. This can be done using a local privacy model, where independent Laplace noise $\omega_{i}$ is added to the data $x_{i}$ of each individual. (See local privacy page 78 in notes)
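
A minimal sketch of the local model for one bounded numeric feature could look as follows. It assumes that the feature is clipped to a known range `[lower, upper]`, so that the noise scale can be set to `(upper - lower) / epsilon`; the function name `local_laplace`, the range and the simulated ages are assumptions for illustration only.

```python
import numpy as np

def local_laplace(x, epsilon, lower, upper, rng):
    """Add independent Laplace noise omega_i to every individual's value.

    Assumes each value lies in [lower, upper] (values are clipped), so one
    individual's value can change by at most upper - lower.
    """
    x = np.clip(np.asarray(x, dtype=float), lower, upper)
    scale = (upper - lower) / epsilon
    return x + rng.laplace(loc=0.0, scale=scale, size=x.shape)

rng = np.random.default_rng(0)
ages = rng.uniform(18, 90, size=10_000)  # hypothetical ages
private_ages = local_laplace(ages, epsilon=1.0, lower=18, upper=90, rng=rng)

# Individual values are very noisy, but averages remain informative.
print("true mean:   ", ages.mean())
print("private mean:", private_ages.mean())
```

Because the noise is added before the data leave the individual, the analyst never sees the true values, at the price of much more noise than in a centralized model.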

# b) 
Here we assume that the analysts can be trusted with private information, so only the result made available to the public has to be privatized. We can then use a centralized privacy model, where noise is added once to the aggregate rather than to each individual. We obtain $\pi(a|x)$ with $a=n^{-1}\sum_{i=1}^{n}x_{i}+\omega$
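
The formula above can be sketched with the Laplace mechanism. The example assumes that every $x_{i}$ is clipped to a known range `[lower, upper]`, so that replacing one individual changes the mean by at most `(upper - lower) / n`; the function name `private_mean` and the simulated data are assumptions for illustration.

```python
import numpy as np

def private_mean(x, epsilon, lower, upper, rng):
    """Release a = mean(x) + omega with a single Laplace noise term omega.

    Assumes values lie in [lower, upper] (values are clipped), so the
    sensitivity of the mean is (upper - lower) / n.
    """
    x = np.clip(np.asarray(x, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(x)
    omega = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return x.mean() + omega

rng = np.random.default_rng(0)
ages = rng.uniform(18, 90, size=10_000)  # hypothetical ages
a = private_mean(ages, epsilon=1.0, lower=18, upper=90, rng=rng)
print("true mean:   ", ages.mean())
print("private mean:", a)
```

Since the noise scale shrinks with $n$, the centralized release is far more accurate than averaging locally privatized values at the same $\epsilon$.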

# c) 

Let us now try to implement a policy, and see how the utility is affected by the privacy.
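
The `randomize` function used in this notebook keeps each binary entry with probability `theta` and otherwise replaces it with a fair coin toss. For a single binary value this is randomized response, and under that reading the reported value equals the true one with probability $(1+\theta)/2$, which gives $\epsilon=\ln\frac{1+\theta}{1-\theta}$. The sketch below computes this $\epsilon$ and shows how a population proportion can still be estimated from the noisy answers; the helper names and the simulated bits are assumptions for illustration.

```python
import numpy as np

def randomized_response_epsilon(theta):
    """Epsilon of 'keep with probability theta, else fair coin' on one bit."""
    return float(np.log((1 + theta) / (1 - theta)))

def randomize_bits(bits, theta, rng):
    """Keep each bit with probability theta, otherwise replace it by a fair coin."""
    bits = np.asarray(bits)
    keep = rng.random(bits.shape) < theta
    noise = rng.integers(0, 2, size=bits.shape)
    return np.where(keep, bits, noise)

def debias_mean(reported, theta):
    """Invert E[reported] = theta * p + (1 - theta) / 2 to estimate p."""
    return (np.mean(reported) - (1 - theta) / 2) / theta

rng = np.random.default_rng(0)
true_bits = (rng.random(100_000) < 0.3).astype(int)  # hypothetical symptom column
reported = randomize_bits(true_bits, theta=0.9, rng=rng)

eps = randomized_response_epsilon(0.9)
estimate = debias_mean(reported, theta=0.9)
print("epsilon for theta=0.9:", eps)
print("true proportion:", true_bits.mean(), "estimate:", estimate)
```

A larger `theta` keeps more of the data intact, which helps utility but increases $\epsilon$, so the privacy guarantee is weaker.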

In [1]:
import numpy as np
import pandas as pd
from aux_file import symptom_names
import simulator
from IPython import embed
from sklearn.linear_model import LinearRegression

In [None]:
class Policy:
    """ A policy for treatment/vaccination. """
    def __init__(self, n_actions, action_set):
        """ Initialise.
        Args:
        n_actions (int): the number of actions
        action_set (list): the set of actions
        """
        self.n_actions = n_actions
        self.action_set = action_set
        print("Initialising policy with ", n_actions, "actions")
        print("A = {", action_set, "}")
    ## Observe the features, treatments and outcomes of one or more individuals
    def observe(self, features, action, outcomes):
        pass 
          
    def get_utility(self, features, action, outcome):
        """ Obtain the empirical utility of the policy on a set of one or more people. 
        If there are t individuals with x features, and the action
        
        Args:
        features (t*|X| array)
        actions (t*|A| array)
        outcomes (t*|Y| array)
        Returns:
        Empirical utility of the policy on this data.
      
        Here the utility is defined in terms of the outcomes obtained only, ignoring both the treatment and the previous condition.
        """

        utility = 0
        utility -= 0.2 * sum(outcome[:,symptom_names['Covid-Positive']])
        utility -= 0.1 * sum(outcome[:,symptom_names['Taste']])
        utility -= 0.1 * sum(outcome[:,symptom_names['Fever']])
        utility -= 0.1 * sum(outcome[:,symptom_names['Headache']])
        utility -= 0.5 * sum(outcome[:,symptom_names['Pneumonia']])
        utility -= 0.2 * sum(outcome[:,symptom_names['Stomach']])
        utility -= 0.5 * sum(outcome[:,symptom_names['Myocarditis']])
        utility -= 1.0 * sum(outcome[:,symptom_names['Blood-Clots']])
        utility -= 100.0 * sum(outcome[:,symptom_names['Death']])
        return utility
        
    def get_reward(self, features, actions, outcome):
        
        rewards = np.zeros(len(outcome))
        for t in range(len(features)):
            utility = 0
            utility -= 0.2 * outcome[t,symptom_names['Covid-Positive']]
            utility -= 0.1 * outcome[t,symptom_names['Taste']]
            utility -= 0.1 * outcome[t,symptom_names['Fever']]
            utility -= 0.1 * outcome[t,symptom_names['Headache']]
            utility -= 0.5 * outcome[t,symptom_names['Pneumonia']]
            utility -= 0.2 * outcome[t,symptom_names['Stomach']]
            utility -= 0.5 * outcome[t,symptom_names['Myocarditis']]
            utility -= 1.0 * outcome[t,symptom_names['Blood-Clots']]
            utility -= 100.0 * outcome[t,symptom_names['Death']]
            rewards[t] = utility
        return rewards

    def get_action(self, features):
        """Get actions for one or more people. 
        A random policy with 3 treatments is simulated, and one linear
        model is fitted on each of the 3 treatment subgroups.
        The action for each individual is the treatment whose model
        predicts the highest utility.
        """
        n_population = features.shape[0]
        model1, model2, model3 = self.linear_model(n_population)
    
        actions = np.zeros([n_population, self.n_actions])
        pred1 = model1.predict(self.feature_select(features))
        pred2 = model2.predict(self.feature_select(features))
        pred3 = model3.predict(self.feature_select(features))
        for t in range(n_population):
    
            if pred1[t] >= pred2[t] and pred1[t] >= pred3[t]:
                actions[t, 0] = 1
            elif pred2[t] >= pred1[t] and pred2[t] >= pred3[t]:
                actions[t, 1] = 1
            elif pred3[t] >= pred1[t] and pred3[t] >= pred2[t]:
                actions[t, 2] = 1
    
        return actions
    
    def linear_model(self, n_population):
        """
        Fit linear models on random data. The data is first randomly generated
        and a random policy is applied. We then split the data by the treatment
        given (which was random), and fit one linear model on each subset.
        """
        population = simulator.Population(128, 3, 3)
        treatment_policy = RandomPolicy(3, list(range(3))) # make sure to add -1 for 'no vaccine'
        X = population.generate(n_population)
        A = treatment_policy.get_action(X)
        U = population.treat(list(range(n_population)), A)
        x_data = self.feature_select(X)
        x_data1 = x_data[A[:, 0] == 1] # Action 1
        x_data2 = x_data[A[:, 1] == 1] # Action 2
        x_data3 = x_data[A[:, 2] == 1] # Action 3
        y_data1 = treatment_policy.get_reward(x_data1, 0, U[A[:, 0] == 1])
        y_data2 = treatment_policy.get_reward(x_data2, 0, U[A[:, 1] == 1])
        y_data3 = treatment_policy.get_reward(x_data3, 0, U[A[:, 2] == 1])
                
        linear_model_test1 = LinearRegression()
        linear_model_test2 = LinearRegression()
        linear_model_test3 = LinearRegression()

        model1 = linear_model_test1.fit(x_data1, y_data1)
        model2 = linear_model_test2.fit(x_data2, y_data2)
        model3 = linear_model_test3.fit(x_data3, y_data3)

        return model1, model2, model3
        
    def feature_select(self, X):
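        """
        Chooses some columns in X. For now, we keep the symptoms, age, gender,
        income and comorbidities, and omit the genes and the vaccination status.
        """
        df = add_feature_names(X)
        temp1 = df.iloc[:, :13]
        temp2 = df.iloc[:, -9:-3]
        # Return a plain ndarray; np.matrix is no longer recommended by NumPy.
        return np.asarray(temp1.join(temp2))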



In [None]:
class RandomPolicy(Policy):
    """ This is a purely random policy!"""

    def get_utility(self, features, action, outcome):
        """Here the utility is defined in terms of the outcomes obtained only, ignoring both the treatment and the previous condition.
        """
        actions = self.get_action(features)
        utility = 0
        utility -= 0.2 * sum(outcome[:,symptom_names['Covid-Positive']])
        utility -= 0.1 * sum(outcome[:,symptom_names['Taste']])
        utility -= 0.1 * sum(outcome[:,symptom_names['Fever']])
        utility -= 0.1 * sum(outcome[:,symptom_names['Headache']])
        utility -= 0.5 * sum(outcome[:,symptom_names['Pneumonia']])
        utility -= 0.2 * sum(outcome[:,symptom_names['Stomach']])
        utility -= 0.5 * sum(outcome[:,symptom_names['Myocarditis']])
        utility -= 1.0 * sum(outcome[:,symptom_names['Blood-Clots']])
        utility -= 100.0 * sum(outcome[:,symptom_names['Death']])
        return utility
    
    def get_action(self, features):
        """Get a completely random set of actions, but only one for each individual.
        If there is more than one individual, features is a t*x matrix; otherwise it is an x-size array.
        
        It assumes a finite set of actions.
        Returns:
        A t*|A| array of actions
        """

        n_people = features.shape[0]
        ##print("Acting for ", n_people, "people");
        actions = np.zeros([n_people, self.n_actions])
        for t in range(features.shape[0]):
            action = np.random.choice(self.action_set)
            if (action >= 0):
                actions[t,action] = 1
            # embed()
            
        return actions

In [None]:
def add_feature_names(X):
    """
    This function simply converts X to a DataFrame and adds the column names, 
    so it is easier to work with.
    """
    features_data = pd.DataFrame(X)
    # features =  ["Covid-Recovered", "Age", "Gender", "Income", "Genome", "Comorbidities", "Vaccination status"]
    features = []
    # features += ["Symptoms" + str(i) for i in range(1, 11)]
    features += ["Covid-Recovered", "Covid-Positive", "No-Taste/Smell", "Fever", 
                 "Headache", "Pneumonia", "Stomach", "Myocarditis", 
                 "Blood-Clots", "Death"]
    features += ["Age", "Gender", "Income"]
    features += ["Genome" + str(i) for i in range(1, 129)]
    # features += ["Comorbidities" + str(i) for i in range(1, 7)]
    features += ["Asthma", "Obesity", "Smoking", "Diabetes", 
                 "Heart disease", "Hypertension"]
    features += ["Vaccination status" + str(i) for i in range(1, 4)]
    features_data.columns = features
    return features_data
    
def add_action_names(actions):
    """
    Add names for actions. Converts array to pandas DataFrame.
    """
    df = pd.DataFrame(actions)
    # One name per action column; actions has shape (t, |A|).
    names = ["Action" + str(i) for i in range(1, actions.shape[1] + 1)]
    df.columns = names
    return df

def add_outcome_names(outcomes):
    """
    Add names for the outcomes. Converts array to pandas DataFrame.
    """
    df = pd.DataFrame(outcomes)
    df.columns = ["Covid-Recovered", "Covid-Positive", "No-Taste/Smell", "Fever", 
                  "Headache", "Pneumonia", "Stomach", "Myocarditis", 
                  "Blood-Clots", "Death"]
    return df
    
def privatize(X, theta):
    """
    Adds noise to the data, column by column. The continuous and discrete 
    columns are treated differently. 
    """
    df = add_feature_names(X).copy()
    df["Age"] = randomize_age(df["Age"], theta)
    df["Income"] = randomize_income(df["Income"], theta)
    for column in df.columns:
        # Age and Income are already randomized above; the remaining
        # columns are treated as binary and get a coin-toss randomization.
        if column != "Age" and column != "Income":
            df[column] = randomize(df[column], theta)
    return np.asarray(df)
    
def randomize(a, theta):
    """
    Randomize a single binary column. Each entry is kept with probability
    theta and otherwise replaced by a fair coin toss.
    """
    coins = np.random.choice([True, False], p=(theta, (1-theta)), size=a.shape)
    noise = np.random.choice([0, 1], size=a.shape)
    response = np.array(a)
    response[~coins] = noise[~coins]
    return response 
    
def randomize_income(a, theta):
    """
    Randomize income by drawing from the same population again: each entry is
    kept with probability theta and otherwise replaced by a new gamma draw.
    """
    coins = np.random.choice([True, False], p=(theta, (1-theta)), size=a.shape)
    noise = np.random.gamma(1,10000, size=a.shape)
    response = np.array(a)
    response[~coins] = noise[~coins]
    return response 
    
def randomize_age(a, theta):
    """
    Randomize age by drawing from the same population again: each entry is
    kept with probability theta and otherwise replaced by a new gamma draw.
    """
    coins = np.random.choice([True, False], p=(theta, (1-theta)), size=a.shape)
    noise = np.random.gamma(3,11, size=a.shape)
    response = np.array(a)
    response[~coins] = noise[~coins]
    return response

In [2]:
np.random.seed(57)
n_genes = 128
n_vaccines = 3
n_treatments = 3
n_population = 1000
population = simulator.Population(n_genes, n_vaccines, n_treatments)
treatment_policy = Policy(n_treatments, list(range(n_treatments)))
X = population.generate(n_population)
np.random.seed(57)
A = treatment_policy.get_action(X)
np.random.seed(57)
U = population.treat(list(range(n_population)), A)
X_priv = privatize(X, 0.9)
np.random.seed(57)
A_priv = treatment_policy.get_action(X_priv)
np.random.seed(57)
U_priv = population.treat(list(range(n_population)), A_priv)
utility = treatment_policy.get_utility(X, A, U)
utility_priv = treatment_policy.get_utility(X_priv, A_priv, U_priv)
embed()

NameError: name 'Policy' is not defined