### READ ME
This is a final project for the course 'Business Data Analytics, Quantitative Methods and Visualization' (2021, Copenhagen Business School).
The goal of this project is to predict the outcome of a given UFC fight: more specifically, whether the Red or the Blue fighter is more likely to win.

We are going to do this through the following steps:
1. __Importing necessary libraries and tools.__
2. __Loading, Cleaning & Exploring the dataset.__
3. __Data pre-processing.__
4. __Machine Learning Models.__
5. __Making a prediction.__
6. __Showcasing results.__

### Legend

- ufc = the original dataset
- column_names = list of column names
- ufc_df = the dataset without NaNs
- datatypes = dictionary with column names + their datatypes
- ufc_ohe = one hot encoded dataset

# 1. Importing necessary libraries and tools.

In [None]:
# libraries

import pandas as pd #used
import matplotlib.pyplot as plt #used
import numpy as np #used
import seaborn as sns #used

# models

from sklearn.neighbors import KNeighborsClassifier #used
from sklearn.neural_network import MLPClassifier #used
from sklearn.tree import DecisionTreeClassifier #used
from sklearn.ensemble import RandomForestClassifier #used
from sklearn.preprocessing import StandardScaler 



from sklearn.preprocessing import MinMaxScaler #used
from sklearn.feature_selection import SelectKBest #used
from sklearn.linear_model import Lasso
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# use cross validation with grid search as well. 
from sklearn.model_selection import GridSearchCV

# other tools

from sklearn.model_selection import train_test_split #used
from sklearn.metrics import accuracy_score #used

import warnings #used

# 2. Loading, Cleaning & Exploring the dataset.

### 2.1 Load the Excel file (newestdataset.xlsx) and print the first few rows.

In [None]:
ufc = pd.read_excel('newestdataset.xlsx')

In [None]:
ufc.head()

### 2.2 Get number of rows and columns.

In [None]:
print('Number of rows:', ufc.shape[0])
print('Number of columns:', ufc.shape[1])

### 2.3 Get the list of column names.

In [None]:
column_names = ufc.columns

#### *We can call this list whenever we want to check the name of a column.*

### 2.4 Check for missing data.

In [None]:
# info() lists the non-null count per column, which reveals the columns containing NaNs
ufc.info()

#### As we can see, there are some NaNs in these columns that should later be taken care of:
- B_avg_SIG_STR_landed
- B_avg_SIG_STR_pct
- B_avg_SUB_ATT     
- B_avg_TD_landed      
- B_avg_TD_pct
- R_avg_SIG_STR_landed
- R_avg_SIG_STR_pct
- R_avg_SUB_ATT     
- R_avg_TD_landed      
- R_avg_TD_pct  
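
A more compact way to find the columns listed above is to count the missing values per column directly. The following is a minimal sketch on a toy dataframe: the values are hypothetical and only the column names mirror the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values; only the column names mirror the dataset.
toy = pd.DataFrame({
    "B_avg_SUB_ATT": [0.5, np.nan, 1.2, np.nan],
    "R_avg_SUB_ATT": [0.0, 0.3, np.nan, 0.8],
    "Winner": ["Red", "Blue", "Red", "Red"],
})

# Count missing values per column and keep only the columns that have any.
missing_counts = toy.isna().sum()
missing_counts = missing_counts[missing_counts > 0].sort_values(ascending=False)
print(missing_counts)
```

Applied to the real data, the same two lines on `ufc` would list only the columns that still need attention.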

### 2.5 Check for duplicates.

In [None]:
duplicates = ufc.duplicated()
ufc[duplicates]

#### *It seems that there are no duplicates, therefore, we only need to focus on NaN values.*

### 2.6 Fill NaNs with 0.

In [None]:
ufc_df = ufc.fillna(0)

print('Number of rows:', ufc_df.shape[0])
print('Number of columns:', ufc_df.shape[1])

### 2.7 Check if columns have the right datatype.

In [None]:
datatypes = ufc_df.dtypes.to_dict()
datatypes

- O = object
- int64 = integer
- float64 = float
- <M8[ns] = date ("On a machine whose byte order is little endian, there is no difference between *np.dtype('datetime64[ns]')* and *np.dtype('<M8[ns]')*")
- bool = True/False

### 2.8 Fix datatypes if necessary.

### 2.9 Use info() again to check missing values and datatypes.

### 2.10 Create a dataframe for visualizations.

In [None]:
# General features
ufc_gen = ufc_df[['R_odds_dec', 'B_odds_dec', 'date', 'location', 'Winner', 'weight_class', 'Reach_diff_ins', 'Age_diff']]

### 2.11 Use head() for a better overview.

In [None]:
ufc_gen.head()

### 2.12 General features.

#### 2.12.1 Odds.

In [None]:
plt.hist(ufc_gen[['R_odds_dec', 'B_odds_dec']], color = ['Red', 'Blue'], bins = 6, orientation = 'horizontal', label = ['Red fighter', 'Blue fighter'])
plt.xlabel('Frequency')
plt.ylabel('Odds')
plt.title('Frequency of different odds, divided between Red and Blue fighters.')
plt.legend()
plt.show()

*The graph shows that there were approximately 2400 fights in which the Red fighter's odds were around 1-2, but only around 1900 in which the Blue fighter had such low odds (low odds mean a higher chance of winning). At higher odds (a lower chance of winning), Blue fighters dominate.*

*Red fighters therefore have lower odds in general, which suggests that they are expected to win more often. This seems logical, as the Red fighter is usually the favorite and the Blue fighter the underdog.*
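
To make the link between odds and winning chances concrete, here is a minimal sketch that converts decimal odds into implied probabilities. It assumes that `R_odds_dec` and `B_odds_dec` hold decimal odds (suggested by the `_dec` suffix); the function name and the example odds are hypothetical.

```python
def implied_probability(decimal_odds: float) -> float:
    """Win probability implied by decimal odds, ignoring the bookmaker's margin."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds

# Hypothetical odds for one fight (not taken from the dataset).
red_odds, blue_odds = 1.5, 2.75
red_p = implied_probability(red_odds)
blue_p = implied_probability(blue_odds)

# The raw probabilities sum to more than 1 because of the bookmaker's margin;
# dividing by the sum rescales them so that they add up to 1.
total = red_p + blue_p
red_fair, blue_fair = red_p / total, blue_p / total
print(f"Red: {red_fair:.3f}, Blue: {blue_fair:.3f}")
```

The lower the decimal odds, the higher the implied probability, which is why the fighter with the lower odds is the favorite.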

#### 2.12.2 Location.

In [None]:
ufc_gen['location'].value_counts().head(10).sort_values(ascending = True).plot(kind = 'barh', color = 'Lightgrey')
plt.xlabel('Number of fights')
plt.title('10 locations with the most fights.')
plt.savefig('location.png')
plt.show()

*The distribution seems skewed, so it is worth considering leaving location out.*

#### 2.12.3 Winner.

In [None]:
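# pass the column as x explicitly; recent seaborn versions no longer accept it positionally
ax = sns.countplot(x = ufc_gen['Winner'], palette = ['Red', 'Blue', 'Lightgrey'])

# annotate each bar with its count
for rect in ax.patches:
    ax.text(rect.get_x() + rect.get_width() / 2, rect.get_height() + 0.75, int(rect.get_height()),
            ha = 'center', va = 'baseline', color = 'black', size = 12)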

In [None]:
ufc_gen['Winner'].value_counts().plot.pie(colors = ['Red', 'Blue', 'Lightgrey'], 
                labels=['Red fighter won', 'Blue fighter won', 'Draw'], autopct='%.1f%%', figsize = (7,7))
plt.ylabel('')
plt.title('Distribution (in %) of Outcomes of the Fights.')
plt.savefig('piechart.png')
plt.show()

*Based on the two graphs above, Red fighters win more often, probably because the Red fighter is usually the favorite.*

*If 56.2% of UFC fights end with a Red win, one could simply always bet on the Red fighter.*

*Our goal is to exceed that 56.2% baseline with our machine learning models, and so to make more accurate and appropriate decisions.*
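
The "always bet on Red" strategy is a majority-class baseline. As a minimal sketch, its accuracy can be computed as the share of the most frequent label. The helper name and the toy outcome list are hypothetical and do not come from the dataset.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    most_common_label, most_common_count = counts.most_common(1)[0]
    return most_common_label, most_common_count / len(labels)

# Toy outcome list (hypothetical, not the real dataset).
toy_winners = ["Red"] * 6 + ["Blue"] * 4
label, acc = majority_baseline_accuracy(toy_winners)
print(label, acc)
```

Any model whose test accuracy does not beat this number adds nothing over the naive strategy.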

#### 2.12.4 Weight class.

In [None]:
# would be nice to divide it to two graphs by gender

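# numeric_only=True skips the non-numeric columns, which recent pandas versions would otherwise reject
ufc_gen.groupby('weight_class').mean(numeric_only = True).plot.bar(figsize=(20,5), color = ['Red', 'Blue', 'Lightgrey', 'Grey', 'Black'])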

#### *We can see that the features in this visualization depend heavily on the weight class, so we should definitely include 'weight_class' among our most important features.*

#### 2.12.5 Reach & Age Difference.

In [None]:
plt.figure(figsize=(20,6))
ufc_gen.Age_diff.hist(bins = 40, color = 'grey')
plt.xlabel('Age difference')
plt.ylabel('Number of Fights')
plt.title('Number of Fights with various Age differences')
plt.show()

In [None]:
plt.figure(figsize=(20,6))
ufc_gen.Reach_diff_ins.hist(bins = 40, color = 'grey')
plt.xlabel('Reach difference (inch)')
plt.ylabel('Number of Fights')
plt.title('Number of Fights with various Reach differences')
plt.show()

*Similar curve, similar values - wouldn't it be enough to include only one of them (either age or reach difference)?*

### 2.13 Fighter Features.

In [None]:
corr_features = ['B_current_lose_streak', 'B_current_win_streak', 'B_total_SIG_STR_landed',
                'B_avg_SIG_STR_landed_per_fight', 'B_losses', 'B_total_title_bouts',
                 'B_wins', 'B_Height_cms', 'B_Reach_ins', 'B_UFC_fights', 'B_age', 'B_Reach_cms', 'R_current_lose_streak',
                 'R_current_win_streak', 'R_total_SIG_STR_landed', 'R_avg_SIG_STR_landed_per_fight', 'R_losses',
                 'R_total_title_bouts', 'R_wins', 'R_Height_cms', 'R_Reach_ins', 'R_UFC_fights', 'R_age', 'R_Reach_cms',
                 'Reach_diff_ins', 'Age_diff']
corr = ufc_df[corr_features].corr(method='pearson')

f, ax = plt.subplots(figsize=(20, 20))
cmap = sns.color_palette("crest", as_cmap=True)
sns.heatmap(corr, square= True, annot = True, cmap = cmap)
plt.savefig('heatmap.png')

- Height and reach are highly correlated, so we could keep only one of them. As we already included the reach difference, we should keep the reach difference.
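
For reference, the Pearson coefficient shown in each heatmap cell can be computed by hand. The sketch below uses hypothetical height and reach values, not rows from the dataset; `DataFrame.corr(method='pearson')` computes the same quantity for every pair of columns.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical measurements: height in cm, reach in inches.
heights = [170, 175, 180, 185, 190]
reaches = [67, 70, 71, 74, 76]
print(round(pearson(heights, reaches), 3))
```

A value close to 1 means that the two features carry almost the same information, which is the argument for keeping only one of them.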

# 3. Data Pre-processing.

### 3.1 Drop unnecessary columns & drop fights that ended in a draw.

In [None]:
ufc_df = ufc_df.drop(labels = ['R_fighter', 'B_fighter', 'location',
       'Referee', 'date', 'R_odds', 'B_odds', 'R_Weight_lbs', 'B_Weight_lbs', 'no_of_rounds'], axis = 1)

ufc_df = ufc_df.drop(ufc_df[ufc_df.Winner == 'Draw'].index)

### 3.2 Deal with nominal features.

In [None]:
# one hot encoding
onehot_columns = ['weight_class', 'R_Stance', 'B_Stance']

ufc_ohe = pd.get_dummies(ufc_df, columns = onehot_columns, drop_first=True)
ufc_ohe.head()

In [None]:
ufc_ohe.columns
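
The effect of `pd.get_dummies` with `drop_first=True` is easiest to see on a small example. This is a minimal sketch with a hypothetical toy dataframe: each category becomes its own indicator column, except the first one (in sorted order), which is represented by all indicators being zero.

```python
import pandas as pd

# Toy frame with hypothetical values.
toy = pd.DataFrame({
    "reach": [72, 70, 75, 68],
    "stance": ["Orthodox", "Southpaw", "Switch", "Orthodox"],
})

# 'Orthodox' comes first alphabetically, so its indicator column is dropped.
encoded = pd.get_dummies(toy, columns=["stance"], drop_first=True)
print(encoded.columns.tolist())
```

Dropping one column per feature avoids perfectly redundant indicators, at the price of making the dropped category the implicit reference level.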

In [None]:
X = ufc_ohe[['title_bout',
       'B_current_lose_streak', 'B_current_win_streak', 'B_avg_KD', 'B_SLpM',
       'B_SApM', 'B_Sd', 'B_total_SIG_STR_landed',
       'B_avg_SIG_STR_landed_per_fight', 'B_losses',
       'B_avg_SIG_STR_absorberd_per_fight', 'B_total_SIG_STR_absorbed',
       'B_avg_opp_TOTAL_STR_landed', 'B_total_rounds_fought',
       'B_total_time_fought(minutes)', 'B_total_time_fought(seconds)',
       'B_avg_time_fought_per_fight(seconds)', 'B_total_title_bouts',
       'B_win_by_Decision_Majority', 'B_win_by_Decision_Split',
       'B_win_by_Decision_Unanimous', 'B_win_by_KO/TKO', 'B_win_by_Submission',
       'B_win_by_TKO_Doctor_Stoppage', 'B_wins', 'B_Height_cms', 'B_Reach_ins',
       'B_UFC_fights', 'B_age', 'B_Reach_cms', 'R_current_lose_streak',
       'R_current_win_streak', 'R_avg_KD', 'R_SLpM', 'R_SApM', 'R_Sd',
       'R_total_SIG_STR_absorbed', 'R_total_SIG_STR_landed',
       'R_avg_SIG_STR_landed_per_fight', 'R_losses',
       'R_avg_SIG_STR_absorbed_per_fight', 'R_avg_opp_TOTAL_STR_landed',
       'R_total_rounds_fought', 'R_total_time_fought(minutes)',
       'R_total_time_fought(seconds)', 'R_avg_time_fought_per_fight(seconds)',
       'R_total_title_bouts', 'R_win_by_Decision_Majority',
       'R_win_by_Decision_Split', 'R_win_by_Decision_Unanimous',
       'R_win_by_KO/TKO', 'R_win_by_Submission',
       'R_win_by_TKO_Doctor_Stoppage', 'R_wins', 'R_Height_cms', 'R_Reach_ins',
       'R_Reach_cms', 'R_UFC_fights', 'R_age', 'Reach_diff_ins', 'Age_diff',
       'weight_class_Catch Weight', 'weight_class_Featherweight',
       'weight_class_Flyweight', 'weight_class_Heavyweight',
       'weight_class_Light Heavyweight', 'weight_class_Lightweight',
       'weight_class_Middleweight', 'weight_class_Open Weight',
       'weight_class_Welterweight', 'weight_class_Women\'s Bantamweight',
       'weight_class_Women\'s Featherweight', 'weight_class_Women\'s Flyweight',
       'weight_class_Women\'s Strawweight', 'R_Stance_Open Stance',
       'R_Stance_Orthodox', 'R_Stance_Sideways', 'R_Stance_Southpaw',
       'R_Stance_Switch', 'B_Stance_Open Stance', 'B_Stance_Orthodox',
       'B_Stance_Sideways', 'B_Stance_Southpaw', 'B_Stance_Switch']]
y = ufc_ohe['Winner']

print('X shape:', X.shape)
print('y shape:', y.shape)

### 3.3 Select the 20 most influential features.

In [None]:
# selecting best features
#print(X.shape)

k_best = SelectKBest(k = 20)
k_best.fit(X, y)
X_train_k_best = k_best.transform(X)
# X_test_k_best = k_best.transform(X)

#print(X_train_k_best.shape)
#print(X.columns[k_best.get_support()])

best_features = X.columns[k_best.get_support()]

#warnings.filterwarnings("ignore")

X = ufc_ohe[best_features]
y = ufc_ohe['Winner']

print('X shape:', X.shape)
print('y shape:', y.shape)
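
Note that the selector above is fitted on all rows before the train/test split, so the rows that later form the test set also influence which features are chosen. A minimal sketch of the alternative order (split first, then fit the selector on the training rows only) could look as follows. It runs on synthetic data from `make_classification`, and all variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): 30 features, 5 of them informative.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=30, n_informative=5, random_state=0
)

# Split first, so the held-out rows play no part in choosing features.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# The default score function of SelectKBest is the ANOVA F-test (f_classif).
selector = SelectKBest(k=10)
X_tr_sel = selector.fit_transform(X_tr, y_tr)  # scores computed from training rows only
X_te_sel = selector.transform(X_te)            # the same columns are kept for the test rows

print(X_tr_sel.shape, X_te_sel.shape)
```

With this order, the test accuracy stays an estimate of performance on fights the whole pipeline has never seen.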

In [None]:
# define the train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

### 3.4 Deal with numerical features.

In [None]:
# scale_columns = ['B_current_lose_streak', 'B_current_win_streak', 'B_total_SIG_STR_landed',
#                'B_avg_SIG_STR_landed_per_fight', 'B_losses', 'B_total_title_bouts',
#                 'B_wins', 'B_Height_cms', 'B_Reach_ins', 'B_UFC_fights', 'B_age', 'B_Reach_cms', 'R_current_lose_streak',
#                 'R_current_win_streak', 'R_total_SIG_STR_landed', 'R_avg_SIG_STR_landed_per_fight', 'R_losses',
#                 'R_total_title_bouts', 'R_wins', 'R_Height_cms', 'R_Reach_ins', 'R_UFC_fights', 'R_age', 'R_Reach_cms',
#                 'Reach_diff_ins', 'Age_diff']

scaler = MinMaxScaler()

X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)

#warnings.filterwarnings("ignore")
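
Min-max scaling maps each value to `(x - min) / (max - min)`, where the minimum and maximum are learned from the training data only. The following pure-Python sketch illustrates this with hypothetical ages; the helper names are made up for the example.

```python
def fit_min_max(train_values):
    """Learn the minimum and maximum from the training values only."""
    return min(train_values), max(train_values)

def apply_min_max(values, lo, hi):
    """Scale values with a previously learned minimum and maximum."""
    return [(v - lo) / (hi - lo) for v in values]

train_ages = [20, 30, 40]   # hypothetical training column
test_ages = [25, 45]        # hypothetical test column

lo, hi = fit_min_max(train_ages)
scaled_train = apply_min_max(train_ages, lo, hi)
scaled_test = apply_min_max(test_ages, lo, hi)

print(scaled_train)  # [0.0, 0.5, 1.0]
print(scaled_test)   # [0.25, 1.25]
```

The second test value ends up above 1 because it is larger than anything seen during training. This is expected when the scaler is fitted on `X_train` and only applied to `X_test`.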

# 5. Machine Learning Models.

### 5.1 Build models.

In [None]:
# knn
knn = KNeighborsClassifier()

# neural network
mpl = MLPClassifier()

# decision tree
tree = DecisionTreeClassifier()

# random forest
forest = RandomForestClassifier()

classifiers = (knn, mpl, tree, forest)

### 5.2 Build the 'applyModel' function.

In [None]:
def applyModel(model,name,X_train, y_train, X_test, y_test):
    m = model.fit(X_train,y_train)
    print(name, '- Training accuracy:', m.score(X_train, y_train))
    print(name, '- Testing accuracy:', m.score(X_test, y_test))

### 5.3 Train and test with the different models.

In [None]:
# silence warnings (e.g. convergence warnings) before the models are fitted
warnings.filterwarnings("ignore")

for c in classifiers:
    n = str(c)
    applyModel(c, n, X_train, y_train, X_test, y_test)
    print('')

### Principal component analysis (PCA)

In [None]:
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(X)
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['principal component 1', 'principal component 2'])

In [None]:
principalDf

In [None]:
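# y keeps the original row labels (the draws were dropped), so reset its index to align it with principalDf
finalDf = pd.concat([principalDf, y.reset_index(drop = True)], axis = 1)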

In [None]:
finalDf

In [None]:
# share of the variance explained by each component. We can see that the first component explains most of the variance
pca.explained_variance_ratio_
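
One thing to keep in mind when reading these ratios: the PCA above is fitted on the unscaled `X`, and PCA is sensitive to the scale of the features. The sketch below uses synthetic data (not the UFC dataset) to show how a single large-scale feature can dominate the first component until the features are standardized.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two independent synthetic features on very different scales (hypothetical data).
small = rng.normal(0, 1, size=500)
large = rng.normal(0, 1000, size=500)
data = np.column_stack([small, large])

# Explained variance ratios before and after standardizing the features.
ratio_raw = PCA(n_components=2).fit(data).explained_variance_ratio_
ratio_scaled = PCA(n_components=2).fit(
    StandardScaler().fit_transform(data)
).explained_variance_ratio_

print(ratio_raw, ratio_scaled)
```

On the raw data, the first component explains almost all of the variance simply because one feature has much larger values; after scaling, the two components share the variance roughly equally.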

In [None]:
fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1) 
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 component PCA', fontsize = 20)
targets = ['Red', 'Blue']
colors = ['r', 'b']
for target, color in zip(targets,colors):
    indicesToKeep = finalDf['Winner'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1']
               , finalDf.loc[indicesToKeep, 'principal component 2']
               , c = color
               , s = 50)
ax.legend(targets)
ax.grid()

#### Heatmap of components 

In [None]:
pca.components_

In [None]:
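# avoid shadowing the built-in map(), and pass the feature names directly as column labels
components_df = pd.DataFrame(pca.components_, columns = best_features)
plt.figure(figsize=(12,6))
sns.heatmap(components_df, cmap='viridis')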

### Fit the PCA model & transform X_train and X_test

In [None]:
pca.fit(X_train)

In [None]:
pca_train = pca.transform(X_train)
pca_test = pca.transform(X_test)

### Testing neural network & Random Forest on PCA transformed data

In [None]:
mlp = MLPClassifier(alpha=4, learning_rate="invscaling")

# train and evaluate on the PCA-transformed data, as the heading states
mlp.fit(pca_train, y_train)

print('MLPClassifier on PCA-transformed data, alpha = 4, learning_rate = "invscaling"')
print('Accuracy on the training set: {:.3f}'.format(mlp.score(pca_train, y_train)))
print('Accuracy on test set: {:.3f}'.format(mlp.score(pca_test, y_test)))

In [None]:
# use a dedicated name so that the random forest does not overwrite the knn model
rf_pca = RandomForestClassifier(max_depth=6, criterion='entropy')

rf_pca.fit(pca_train, y_train)

print('RandomForestClassifier on PCA-transformed data, max_depth = 6, criterion = "entropy"')
print('Accuracy on the training set: {:.3f}'.format(rf_pca.score(pca_train, y_train)))
print('Accuracy on test set: {:.3f}'.format(rf_pca.score(pca_test, y_test)))

### 5.4 Play around with KNN on normal X_train,X_test, y_train, & y_test

In [None]:
training_accuracy = []
testing_accuracy = []
number_of_neighbors =[]
weighting_choice = []

weight_values = ['distance', 'uniform']

for n_neighbors in range(1,100):
    for weights in weight_values:
        clf = KNeighborsClassifier(n_neighbors = n_neighbors, weights = weights)
        clf.fit(X_train, y_train)
        training_accuracy.append(clf.score(X_train, y_train))
        testing_accuracy.append(clf.score(X_test, y_test))
        number_of_neighbors.append(n_neighbors)
        weighting_choice.append(weights)
      
     
combinations_sorted_knn = sorted(list(zip(number_of_neighbors, weighting_choice, training_accuracy, testing_accuracy)), key = lambda e:e[3], reverse = True)

print('Top 5 results, sorted by test accuracy:\n')
print(*combinations_sorted_knn[0:5], sep = "\n")

# save the best variables
knn_best_n_neighbors = combinations_sorted_knn[0][0]
knn_best_weights = combinations_sorted_knn[0][1]

#### 5.4.1 KNN - Cross validation

In [None]:
#We define the parameters we want to test for the KNN model
param_grid = {
    "n_neighbors": [32,54,74,100], "weights": ["uniform", "distance"], "metric": ["euclidean", "manhattan"], "leaf_size": [10,30,60]
}

#We make the model with cross validation and grid search
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10)

#We fit the grid search, which tries every parameter combination: 
gs_result = grid_search.fit(X_train, y_train)

#print the results: 

print(
    "\n Best cross-validation score: ", gs_result.best_score_, 
    "\n Best estimator: ", gs_result.best_estimator_
    # ,"\n best parameters: ", gs_result.best_params_
)

print("Test score: ", grid_search.score(X_test,y_test))
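
When reading the output of a grid search, it helps to keep two numbers apart: `best_score_` is the mean accuracy over the validation folds for the best parameter combination, while `score(X_test, y_test)` is the accuracy on the held-out test set. The following minimal sketch demonstrates this on synthetic data; the small parameter grid and the variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (hypothetical).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

search = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": [3, 7, 11], "weights": ["uniform", "distance"]},
    cv=5,
)
search.fit(X_tr, y_tr)

# best_score_ equals the mean validation-fold accuracy of the best combination.
cv_mean = search.cv_results_["mean_test_score"][search.best_index_]
test_accuracy = search.score(X_te, y_te)

print("Mean CV accuracy:", search.best_score_)
print("Held-out test accuracy:", test_accuracy)
```

Because the test set is not used while choosing the parameters, the second number is the less optimistic estimate of the two.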


### 5.5 Play around with Neural Network.

In [None]:
train_acc = []
test_acc = []
alpha_value = []
learning_rate_value = []


alphas = (0.0001, 0.001, 0.1, 0, 1, 5, 100)
learning_rates = ('constant', 'invscaling', 'adaptive')

for a in alphas:
    for l in learning_rates:
        mpl = MLPClassifier(alpha=a, learning_rate = l)
        mpl.fit(X_train, y_train)
        train_acc.append(accuracy_score(mpl.predict(X_train), y_train))
        test_acc.append(accuracy_score(mpl.predict(X_test), y_test))
        alpha_value.append(a)
        learning_rate_value.append(l)
        

combinations_sorted_mpl = sorted(list(zip(alpha_value, learning_rate_value, train_acc, test_acc)), key = lambda e:e[3], reverse = True)

print('Top 5 results, sorted by test accuracy:\n')
print(*combinations_sorted_mpl[0:5], sep = "\n")

# save the best variables
mpl_best_alpha = combinations_sorted_mpl[0][0]
mpl_best_learning_rate = combinations_sorted_mpl[0][1]

#### Neural Network - Cross validation

In [None]:
#We define the parameters we want to test for the neural network model
param_grid = {"learning_rate":["constant", "invscaling", "adaptive"], "alpha":[0.01,1,5,10,15]}

#We make the model with cross validation and grid search
grid_search = GridSearchCV(MLPClassifier(), param_grid, cv=10)

#We fit the grid search, which tries every parameter combination: 
gs_result = grid_search.fit(X_train, y_train)

#print the results: 

print(
    "\n Best cross-validation score: ", gs_result.best_score_, 
    "\n Best estimator: ", gs_result.best_estimator_,
    "\n Best parameters: ", gs_result.best_params_,
    "\n Test score: ", grid_search.score(X_test,y_test))

### 5.6 Play around with Decision Tree.

In [None]:
train_acc = []
test_acc = []
max_depth_value = []

for i in range(1,30):
    dt = DecisionTreeClassifier(max_depth = i, random_state=0)
    dt.fit(X_train, y_train)
    train_acc.append(accuracy_score(dt.predict(X_train), y_train))
    test_acc.append(accuracy_score(dt.predict(X_test), y_test))
    max_depth_value.append(i)

combinations_sorted_tree = sorted(list(zip(max_depth_value, train_acc, test_acc)), key = lambda e:e[2], reverse = True)

print('Top 5 results, sorted by test accuracy:\n   (Values: depth, training accuracy, test accuracy)\n')
print(*combinations_sorted_tree[0:5], sep = "\n")

# save the best variable
tree_best_max_depth = combinations_sorted_tree[0][0]

##### 5.6.1 Decision Tree - Cross validation

In [None]:
#We define the parameters we want to test for the decision tree model
param_grid = {'criterion':['gini','entropy'],'max_depth':[1,2,3,4,5,6,7,8,9,10,12,15,20]}

#We make the model with cross validation and grid search
grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=10)

#We fit the grid search, which tries every parameter combination: 
gs_result = grid_search.fit(X_train, y_train)

#print the results: 

print(
    "\n Best cross-validation score: ", gs_result.best_score_, 
    "\n Best estimator: ", gs_result.best_estimator_,
    "\n Best parameters: ", gs_result.best_params_,
    "\n Test score: ", grid_search.score(X_test,y_test)
)

### 5.7 Play around with Random Forest.

In [None]:
train_acc = []
test_acc = []
max_depth_value = []
criterion_value = []
max_features_value = []

criterions = ('gini', 'entropy')
number_of_features = range(1,6)

for i in range(1,9):
    for c in criterions:
        for f in number_of_features:
                rf = RandomForestClassifier(criterion = c, max_depth = i, max_features = f, random_state=0)
                rf.fit(X_train, y_train)
                train_acc.append(accuracy_score(rf.predict(X_train), y_train))
                test_acc.append(accuracy_score(rf.predict(X_test), y_test))
                max_depth_value.append(i)
                criterion_value.append(c)
                max_features_value.append(f)

# index 4 is the test accuracy (index 3 is the training accuracy)
combinations_sorted_forest = sorted(list(zip(max_features_value, criterion_value, max_depth_value, train_acc, test_acc)), key = lambda e:e[4], reverse = True)

print('Top 5 results, sorted by test accuracy:\n   (Values: max features, criterion, depth, training accuracy, test accuracy)\n')
print(*combinations_sorted_forest[0:5], sep = "\n")

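# save the best variables
forest_best_max_features = combinations_sorted_forest[0][0]
forest_best_criterion = combinations_sorted_forest[0][1]
forest_best_max_depth = combinations_sorted_forest[0][2]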

#### Random forest - Cross validation 

In [None]:
#We define the parameters we want to test for the random forest model
param_grid = {
    "max_features":[2,4,6,8,10,12,20,40], 'max_depth':[1,2,3,4,5,6,7,8,9,10,12,15,20], "criterion":['gini', 'entropy']
             }

#We make the model with cross validation and grid search
grid_search = GridSearchCV(RandomForestClassifier(), param_grid, cv=10)

#We fit the grid search, which tries every parameter combination: 
gs_result = grid_search.fit(X_train, y_train)

#print the results: 

print(
    "\n Best cross-validation score: ", gs_result.best_score_, 
    "\n Best estimator: ", gs_result.best_estimator_,
    "\n Best parameters: ", gs_result.best_params_,
    "\n Test score: ", grid_search.score(X_test,y_test)
)

### 5.8 Look at the best combinations with each model.

In [None]:
print('KNN:', combinations_sorted_knn[0])
print('Neural Network:', combinations_sorted_mpl[0])
print('Decision Tree:', combinations_sorted_tree[0])
print('Random Forest:', combinations_sorted_forest[0])

best_combos = {'kNN': combinations_sorted_knn[0][-1], 'Neural Network': combinations_sorted_mpl[0][-1], 'Decision Tree': combinations_sorted_tree[0][-1], 'Random Forest': combinations_sorted_forest[0][-1]}
# for now, the KNN model has the highest test accuracy

In [None]:
keys = list(best_combos.keys())
# get values in the same order as keys, and parse percentage values
vals = [float(best_combos[k]) for k in keys]

splot = sns.barplot(x = keys, y = vals, palette = ['Grey', 'Red', 'Grey', 'Grey'])

for p in splot.patches:
    splot.annotate(format(p.get_height(), '.4f'), 
                   (p.get_x() + p.get_width() / 2., p.get_height()), 
                   ha = 'center', va = 'top', 
                   xytext = (0, 9), 
                   textcoords = 'offset points')