# Machine Problem 0.0

The goal of this machine problem is to help students prepare for the upcoming live computational challenge.
In particular, students should be comfortable generating data for a Pandas dataframe, reading data from a CSV file, and writing a dataframe to a CSV file.

## Statistical Structure

The statistical task at hand consists of creating a decision rule for a binary detection problem in the Bayesian setting.
The prior probabilities are $\Pr( H_0 ) = \Pr (H_1) = \frac{1}{2}$.
The probability density function under Hypothesis 0 is
$$f(y;\theta_0) = \frac{1}{\sqrt{2 \pi}} \exp \left( - \frac{y^2}{2} \right) .$$
The probability density function under Hypothesis 1 is
$$f(y;\theta_1) = \frac{1}{\sqrt{2 \pi}} \exp \left( - \frac{(y-1)^2}{2} \right) .$$
The generation code for the data file appears below.

In [7]:
import numpy as np
import pandas as pd

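# Means of the observation under Hypothesis 0 and Hypothesis 1
mean0 = 0.0
mean1 = 1.0

# Z is the true hypothesis (0 or 1, equally likely) for each of the 10 samples
Z = np.random.randint(0,2,10)
Y0 = np.random.randn(10) + mean0
Y1 = np.random.randn(10) + mean1

# Observed sample: Y0 when Z = 0, Y1 when Z = 1
Y = [h0*(1-h) + h1*h for h,h0,h1 in zip(Z,Y0,Y1)]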

source_df = pd.DataFrame({'Y0':Y0, 'Y1':Y1, 'Y':Y, 'Z':Z})
sample_df = pd.DataFrame({'Y':Y})

print("Generated Data")
print(source_df, "\n")

print("Available Sample")
print(sample_df, "\n")

sample_df.to_csv("DataSet0.csv")

Generated Data
(          Y        Y0        Y1  Z
0 -1.157911 -1.157911  1.605662  0
1  1.553757  1.553757  1.276565  0
2  1.168600  0.684641  1.168600  1
3  0.767144  1.460516  0.767144  1
4  1.715646  1.715646  0.728306  0
5  1.957654  1.957654  0.475231  0
6 -0.103198 -0.416062 -0.103198  1
7 -0.322989 -0.322989 -1.025568  0
8  1.609299  0.914469  1.609299  1
9  1.298129  1.214280  1.298129  1, '\n')
Available Sample
(          Y
0 -1.157911
1  1.553757
2  1.168600
3  0.767144
4  1.715646
5  1.957654
6 -0.103198
7 -0.322989
8  1.609299
9  1.298129, '\n')
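
The selection of `Y` from `Y0` and `Y1` can also be written without a Python loop. The sketch below is one possible variant, not the generator used for the actual data sets: the fixed seed, the `np.random.default_rng` generator, and the name `Y_loop` are assumptions introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed (not used in the original) so the run is repeatable.
rng = np.random.default_rng(0)
n = 10

Z = rng.integers(0, 2, size=n)       # true hypothesis per sample (0 or 1)
Y0 = rng.standard_normal(n) + 0.0    # observation if Hypothesis 0 holds
Y1 = rng.standard_normal(n) + 1.0    # observation if Hypothesis 1 holds

# Vectorized selection: take Y1 where Z == 1, Y0 otherwise.
Y = np.where(Z == 1, Y1, Y0)

# The same selection written as in the list comprehension above.
Y_loop = [h0*(1-h) + h1*h for h, h0, h1 in zip(Z, Y0, Y1)]

sample_df = pd.DataFrame({'Y': Y})
```

Both forms pick the same values; `np.where` simply states the intent (choose by hypothesis) more directly.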


## Data Set Provided to Students

Actual data sets will be given in the form of CSV files.
Your program should be able to load the appropriate data set into a Pandas dataframe and subsequently process it.

In [8]:
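# DataFrame.from_csv has been removed from recent pandas releases; read_csv replaces it.
# index_col=0 uses the first CSV column (written by to_csv) as the row index.
sample_df = pd.read_csv("DataSet0.csv", index_col=0)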
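
A quick way to check the write/read round trip is to run it through an in-memory buffer instead of a file. This is a minimal sketch, assuming a recent pandas release where `pd.read_csv` is the loading function; the names `original`, `restored`, and `raw` are hypothetical, and the three values are taken from the sample shown earlier.

```python
import io

import numpy as np
import pandas as pd

original = pd.DataFrame({'Y': [-1.157911, 1.553757, 1.168600]})

# Write to an in-memory text buffer rather than a file on disk.
csv_text = io.StringIO()
original.to_csv(csv_text)

# index_col=0 treats the first CSV column as the row index, not as data.
restored = pd.read_csv(io.StringIO(csv_text.getvalue()), index_col=0)

# Without index_col, the saved index comes back as an extra data column.
raw = pd.read_csv(io.StringIO(csv_text.getvalue()))

print(list(restored.columns))
print(list(raw.columns))
```

The second call shows why the index column matters: if it is read as data, a comparison such as `sample_df > 0.5` would also be applied to the row numbers.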

## Decision Rule

This part of the code translates the mathematical decision rule, which decides Hypothesis 1 when $y > 0.5$ and Hypothesis 0 otherwise, into Python code.
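
The threshold of $0.5$ used in the code can be recovered from the two densities given earlier. The derivation below is a sketch that assumes the goal is to minimize the probability of error (a criterion not stated explicitly in this problem), in which case equal priors reduce the Bayes rule to a comparison of the likelihood ratio with 1:
$$\frac{f(y;\theta_1)}{f(y;\theta_0)} = \exp \left( - \frac{(y-1)^2}{2} + \frac{y^2}{2} \right) = \exp \left( y - \frac{1}{2} \right) .$$
The ratio exceeds 1 exactly when $y - \frac{1}{2} > 0$, so the rule decides Hypothesis 1 when
$$y > \frac{1}{2} .$$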

In [12]:
# Decide Hypothesis 1 (label 1) when Y > 0.5, otherwise Hypothesis 0 (label 0)
Y_hat = (sample_df > 0.5)
Y_hat['Y'] = Y_hat['Y'].map({False: 0, True: 1})
print(Y_hat)

   Y
0  0
1  1
2  1
3  1
4  1
5  1
6  0
7  0
8  1
9  1
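
With only 10 samples it is hard to judge how well the rule performs. The sketch below estimates the error rate of the threshold rule on a much larger simulated sample and compares it with the value $Q(1/2) \approx 0.3085$, where $Q$ is the standard normal tail probability. The sample size, the seed, and the names `Z_hat`, `empirical_error`, and `theoretical_error` are assumptions made for illustration; the decisions are called `Z_hat` here because they are guesses of the hypothesis label `Z`.

```python
import math

import numpy as np

rng = np.random.default_rng(0)   # assumption: fixed seed for repeatability
n = 200_000                      # assumption: far larger than the 10-sample file

Z = rng.integers(0, 2, size=n)   # true hypothesis, equally likely 0 or 1
Y = rng.standard_normal(n) + Z   # mean 0 under Hypothesis 0, mean 1 under Hypothesis 1

# Threshold rule: decide 1 when Y > 0.5, otherwise decide 0.
Z_hat = (Y > 0.5).astype(int)
empirical_error = float(np.mean(Z_hat != Z))

# Q(x) = 0.5 * erfc(x / sqrt(2)); both error types equal Q(1/2) for this rule.
theoretical_error = 0.5 * math.erfc(0.5 / math.sqrt(2.0))

print(empirical_error, theoretical_error)
```

An error rate near 31% is expected here because the two densities overlap heavily: their means differ by only one standard deviation.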


We now write the decisions to a CSV file.

In [None]:
Y_hat.to_csv("Answer0.csv")
