This is a binary classification problem, where the goal is to classify each passenger of the Titanic ship into survived (1) or not survived (0) groups. 

The training and test datasets are available. Your task is to develop OneR models (from the training set) to predict which passengers (in the test set) survived the tragedy.  

OneR, short for "One Rule", is a simple, yet accurate, classification algorithm that generates one rule for each feature in the data. There are 7 features in the Titanic dataset (item-1 to item-7 in the following list) and one target class (item-8) as shown below:
1.	pclass: Ticket class (1 = 1st class, 2 = 2nd class, 3 = 3rd class)
2.	gender: Gender of the passenger (Male, Female)	
3.	age: Age in years	
4.	sibsp:	# of siblings or spouses aboard the Titanic	
5.	parch: # of parents or children aboard the Titanic 
   a.	Parent = mother, father
   b.	Child = daughter, son, stepdaughter, stepson
   c.	Some children travelled only with a nanny, therefore parch=0 for them.	
6.	fare: Passenger fare	
7.	embarked: Port of Embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
8.	survived: Survival (0 = Not survived, 1 = survived). This is a binary class. 

OneR is essentially majority voting. For instance, if the majority of the female passengers (gender = female) in the training set survived the tragedy, OneR classifies all the females in the test set as survived (1). If the majority of the male passengers (gender = male) in the training set did not survive, OneR classifies all the males in the test set as not survived (0). Therefore, OneR for this feature could be the following: 

OneR(gender) = if the passenger was female, she survived.  
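
As a rough illustration of the majority vote described above, the sketch below builds a rule table from a handful of made-up (gender, survived) pairs. The helper name `one_rule` and the toy rows are assumptions for illustration only; they are not part of the assignment files.

```python
from collections import Counter, defaultdict

def one_rule(rows):
    """Map each feature value to the majority class observed for it."""
    votes = defaultdict(Counter)
    for value, label in rows:
        votes[value][label] += 1
    # most_common(1) returns [(label, count)] for the most frequent label
    return {value: counts.most_common(1)[0][0] for value, counts in votes.items()}

# Hypothetical (gender, survived) pairs, not taken from the real dataset
toy_training = [
    ("female", 1), ("female", 1), ("female", 0),
    ("male", 0), ("male", 0), ("male", 1),
]
gender_rule = one_rule(toy_training)
print(gender_rule)  # {'female': 1, 'male': 0}
```

With these toy rows, two of three females survived and two of three males did not, so the table reproduces the rule stated above.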

Similarly, if the majority of the first-class passengers (pclass = 1) in the training set survived the tragedy, OneR classifies all the first-class passengers in the test set as survived. Next, it checks the second-class passengers (pclass = 2). If the majority of them in the training set survived, OneR classifies all the second-class passengers in the test set as survived. 

Finally, it repeats this step for the third-class passengers (pclass = 3). Let's assume that the majority of the third-class passengers in the training set did not survive. In this case, OneR for this feature could be the following: 

OneR(pclass) = if the passenger travelled in the first class or the second class, s/he survived.
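
For a feature with more than two values, the learned table can be turned back into a one-line rule like the one above. The following is only a sketch: `describe_rule` is a hypothetical helper, and the table simply encodes the assumption made in the text (classes 1 and 2 survive, class 3 does not).

```python
def describe_rule(feature, rule):
    """Turn a value -> class table into a one-line rule statement."""
    survivors = sorted(value for value, label in rule.items() if label == 1)
    if not survivors:
        return f"OneR({feature}) = no passenger is predicted to survive."
    values = " or ".join(str(v) for v in survivors)
    return f"OneR({feature}) = if {feature} is {values}, the passenger survived."

# Table matching the assumption in the text, not a result from the real data
pclass_rule = {1: 1, 2: 1, 3: 0}
print(describe_rule("pclass", pclass_rule))
# OneR(pclass) = if pclass is 1 or 2, the passenger survived.
```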

Develop a Python script to
1.	Read-in the training set and test set.
2.	Construct a machine learning model using OneR for each of the following features: gender, pclass, sibsp, parch, embarked.
3.	For each feature, predict class label (survived or not survived) for every passenger in the test set.
4.	For each feature, store the passenger IDs and class labels in the appropriate worksheet (e.g., Gender_Based_Prediction for gender based OneR) of the attached output file (titanic_test_predictions.xlsx).
5.	Compute the success rate for the test set. The success rate is the total number of correct predictions divided by the total number of instances.
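
The success rate in step 5 can be sketched as follows. The function name `success_rate` and the two short lists are illustrative assumptions, not values from the Titanic files.

```python
def success_rate(predicted, actual):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute a success rate for zero instances")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical labels: 4 of the 5 predictions are correct
predicted = [1, 0, 0, 1, 0]
actual = [1, 0, 1, 1, 0]
print(f"{success_rate(predicted, actual):.2%}")  # 80.00%
```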

In [2]:
import pandas as pd

In [3]:
df_training = pd.read_excel("titanic_traning.xlsx")
df_test = pd.read_excel("titanic_test.xlsx")

def OneR(df, feature):
    # For each unique value of the feature, count how often passengers with that value
    # survived or did not survive, then assign the majority class to that value.
    result = pd.Series(dtype = int)
    # Skip missing values: NaN never equals itself, so its subset would be empty.
    for i in df[feature].dropna().unique():
        survive_stat = df[df[feature] == i]["survived"].value_counts()
        survive = survive_stat.idxmax()
        result.loc[i] = survive
    return result
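
The same rule table can probably be built without an explicit loop. The sketch below is an alternative, not the method used in this notebook: `one_r_groupby` is a hypothetical name, and the toy DataFrame is invented for illustration.

```python
import pandas as pd

def one_r_groupby(df, feature, target="survived"):
    """Return a Series mapping each feature value to its majority target class."""
    # groupby skips rows whose feature value is missing (NaN) by default
    return df.groupby(feature)[target].agg(lambda s: s.value_counts().idxmax())

# Hypothetical toy data, not the real Titanic training set
toy = pd.DataFrame({
    "pclass":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "survived": [1, 1, 0, 1, 1, 0, 0, 0, 1],
})
rule = one_r_groupby(toy, "pclass")
rule_table = {int(k): int(v) for k, v in rule.items()}
print(rule_table)  # {1: 1, 2: 1, 3: 0}
```

Ties between classes are avoided in the toy data on purpose, since how a tie is broken depends on the ordering returned by `value_counts`.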

In [5]:
features = ["gender", "pclass", "sibsp", "parch", "embarked"]

# Build the OneR models for each feature.
OneR_gender = OneR(df_training, "gender")
OneR_pclass = OneR(df_training, "pclass")
OneR_sibsp = OneR(df_training, "sibsp")
OneR_parch = OneR(df_training, "parch")
OneR_embarked = OneR(df_training, "embarked")

# Predict the test set based on the OneR models built before.
pred_gender = df_test["gender"].map(lambda x: OneR_gender.loc[x])
pred_pclass = df_test["pclass"].map(lambda x: OneR_pclass.loc[x])
pred_sibsp = df_test["sibsp"].map(lambda x: OneR_sibsp.loc[x])
pred_parch = df_test["parch"].map(lambda x: OneR_parch.loc[x])
# Since there are missing values in embarked, we use forward fill to fill them.
df_test["embarked"] = df_test["embarked"].ffill()
pred_embarked = df_test["embarked"].map(lambda x: OneR_embarked.loc[x])

sheet_list = ["Gender_Based_Prediction", "pclass_Based_Prediction", 
              "sibsp_Based_Prediction", "parch_Based_Prediction",
              "embarked_Based_Prediction", "Prediction_Success_Rate"]

success_rate_df = pd.DataFrame({"Feature": sheet_list[:5]})

# Write the predictions into their respective sheets and calculate the accuracy of the prediction based on each feature.
# Read the existing sheets before creating the writer, because opening the file in "w" mode can overwrite it.
templates = pd.read_excel("titanic_test_predictions.xlsx", sheet_name = sheet_list[:5])
predictions = [pred_gender, pred_pclass, pred_sibsp, pred_parch, pred_embarked]
with pd.ExcelWriter("titanic_test_predictions.xlsx", mode = "w") as writer:
    for i in range(5):
        temp_df = templates[sheet_list[i]]
        temp_df["Prediction"] = predictions[i]
        temp_df.to_excel(writer, sheet_name = sheet_list[i], index = False)
        success_rate = (temp_df["Prediction"] == temp_df["Ground truth"]).mean()
        success_rate_df.loc[i, "Success Rate"] = "{:.2%}".format(success_rate)
    success_rate_df.to_excel(writer, sheet_name = sheet_list[-1], index = False)
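
One caveat about the prediction step in this cell: a `.loc[x]` lookup raises a `KeyError` when the test set contains a value that never appeared in training (for example, an unusually large sibsp). A possible workaround, sketched under the assumption that unseen values should fall back to a default class, is shown below. The name `predict_with_fallback`, the rule table, and the choice of default are all illustrative.

```python
import pandas as pd

def predict_with_fallback(values, rule, default):
    """Look up each value in the rule table; use `default` when it is unseen."""
    # Series.map with a Series argument yields NaN for keys missing from its index
    return values.map(rule).fillna(default).astype(int)

# Hypothetical rule table and test column, not taken from the real files
rule = pd.Series({0: 0, 1: 1, 2: 0})
test_sibsp = pd.Series([0, 1, 5, 2])  # 5 never appeared in training
default = 0  # assumed fallback, e.g. the overall majority class in training
predicted = predict_with_fallback(test_sibsp, rule, default).tolist()
print(predicted)  # [0, 1, 0, 0]
```

Missing test values (NaN) would also receive the default class under this approach, which is a different choice from the forward fill used for embarked.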