# Assignment 1 
## Group name: ID2214 - 5
### Project members: 
[Francesco Luce, email]

[Leandro Duarte, leandrod@kth.se]

[Stefano Bosoppi, bosoppi@kth.se]

### Declaration:
By submitting this assignment, it is hereby declared that all group members listed above have contributed to the solution, either with code that appears in the final solution below, or with code that has been evaluated and compared to the final solution, but for some reason has been excluded. It is also declared that all project members fully understand all parts of the final solution and can explain it upon request.

It is furthermore declared that the code below is a contribution by the project members only, and specifically that no part of the solution has been copied from any other source (except for lecture slides at the course ID2214/FID3214), no part of the solution has been provided by someone not listed as a project member above, and no part of the solution has been generated by a system.

It is furthermore declared that the submitted assignment will not be shared during the course, with any individual other than the group members listed above and teachers of the course ID2214/FID3214. In particular, the assignment will not be uploaded to any public repository. The submitted assignment can be shared after the course only if written consent has been provided by the course responsible of ID2214/FID3214.

It is furthermore declared that it has been understood that no other library/package than the Python 3 standard library, NumPy and pandas may be used in the solution for this assignment.

### Instructions
All parts of the assignment starting with number 1 below are mandatory. Satisfactory solutions
will give 1 point (in total). If they in addition are good (all parts work more or less 
as they should), completed on time (submitted before the deadline in Canvas) and according
to the instructions, together with satisfactory solutions of all parts of the assignment starting 
with number 2 below, then the assignment will receive 2 points (in total).

Note that you do not have to develop the code directly within the notebook
but may instead copy the comments and test cases to a more convenient development environment
and when everything works as expected, you may paste your functions into this
notebook, do a final testing (all cells should succeed), save the notebook including the output
and submit the whole notebook (a single file) in Canvas (do not forget to fill in your group number
and names above).

## Load NumPy and pandas

In [None]:
import numpy as np
import pandas as pd

In [None]:
from platform import python_version

print(f"Python version: {python_version()}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

## 1a. Create and apply column filter

In [None]:
# Insert the functions create_column_filter and apply_column_filter below (after the comments)
#
# Input to create_column_filter:
# df - a dataframe (where the column names "CLASS" and "ID" have special meaning)
#
# Output from create_column_filter:
# df            - a new dataframe, where columns, except "CLASS" and "ID", containing only missing values 
#                 or only one unique value (apart from the missing values) have been dropped
# column_filter - a list of the names of the remaining columns, including "CLASS" and "ID"
#
# Hint 1: First copy the input dataframe and modify the copy (the input dataframe should be kept unchanged)
#
# Hint 2: Iterate through all columns and consider to drop a column only if it is not labeled "CLASS" or "ID"
#
# Hint 3: You may check the number of unique (non-missing) values in a column by applying the pandas functions
#         dropna and unique to drop missing values and get the unique (remaining) values
#
# Input to apply_column_filter:
# df            - a dataframe
# column_filter - a list of the names of the columns to keep (see above)
#
# Output from apply_column_filter:
# df - a new dataframe, where each column that is not included in column_filter has been dropped
#
# Hint 1: First copy the input dataframe and modify the copy (the input dataframe should be kept unchanged)



In [None]:
def create_column_filter(df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """
    Creates a filtered dataframe by removing columns with only missing values or one unique value.
    Keeps CLASS and ID columns.
    """
    # Copy df
    df_copy = df.copy()
    
    # filter with CLASS and ID if they exist
    column_filter = [col for col in df.columns if col in ['CLASS', 'ID']]
    
    # Check each column
    for col in df.columns:
        if col not in ['CLASS', 'ID']:
            unique_vals = df[col].dropna().unique()
            if len(unique_vals) > 1:  # Keep if more than 1 unique non-null value
                column_filter.append(col)
    
    return df_copy[column_filter], column_filter

def apply_column_filter(df: pd.DataFrame, column_filter: list[str]) -> pd.DataFrame:
    """
    Applies a column filter to keep only specified columns.
    """
    return df.copy()[column_filter]

In [None]:
# Test your code (leave this part unchanged)

df = pd.DataFrame({"CLASS":[1,0,1,0,1],"A":[1,2,np.nan,4,5],"B":[1,1,1,1,np.nan],"C":["h","h",np.nan,"i","h"],"D":[np.nan,np.nan,np.nan,np.nan,np.nan]})

filtered_df, column_filter = create_column_filter(df)

new_df = pd.DataFrame({"CLASS":[1,0,0],"A":[4,5,6],"B":[1,2,1],"C":[np.nan,np.nan,np.nan],"D":[np.nan,4,5]})

filtered_new_df = apply_column_filter(new_df,column_filter)

display("df",df)
display("filtered_df",filtered_df)
display("new_df",new_df)
display("filtered_new_df",filtered_new_df)

## 1b. Create and apply normalization

In [None]:
# Insert the functions create_normalization and apply_normalization below (after the comments)
#
# Input to create_normalization:
# df: a dataframe (where the column names "CLASS" and "ID" have special meaning)
# normalizationtype: "minmax" (default) or "zscore"
#
# Output from create_normalization:
# df            - a new dataframe, where each numeric value in a column has been replaced by a normalized value
# normalization - a mapping (dictionary) from each column name to a triple, consisting of
#                ("minmax",min_value,max_value) or ("zscore",mean,std)
#
# Hint 1: First copy the input dataframe and modify the copy (the input dataframe should be kept unchanged)
#
# Hint 2: Consider columns of type "float" or "int" only (and which are not labeled "CLASS" or "ID"),
#         the other columns should remain unchanged
#
# Hint 3: Take a close look at the lecture slides on data preparation
#
# Input to apply_normalization:
# df            - a dataframe
# normalization - a mapping (dictionary) from column names to triples (see above)
#
# Output from apply_normalization:
# df - a new dataframe, where each numerical value has been normalized according to the mapping
#
# Hint 1: First copy the input dataframe and modify the copy (the input dataframe should be kept unchanged)
#
# Hint 2: For minmax-normalization, you may consider to limit the output range to [0,1]



In [None]:
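def minmax_normalize(
    col: pd.Series, min_val: float | None = None, max_val: float | None = None
) -> tuple[pd.Series, tuple[float, float]]:
    """
    Returns min-max-normalized `col` and the min and max values, respectively, used for normalization.
    If values for min and/or max are provided, they are used; otherwise they are derived from `col`.
    The output is clipped to [0, 1]; a constant column (max == min) is not divided by zero.
    """
    col_min = col.min() if min_val is None else min_val
    col_max = col.max() if max_val is None else max_val

    # Guard against a zero range (constant column) to avoid division by zero
    col_range = col_max - col_min
    norm_col = (col - col_min) / (col_range if col_range != 0 else 1)

    # Values outside the training range (e.g. in a test set) are limited to [0, 1]
    norm_col = norm_col.clip(lower=0, upper=1)

    return norm_col, (col_min, col_max)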


def zscore_normalize(
    col: pd.Series, mean_val: float | None = None, std_val: float | None = None
) -> tuple[pd.Series, tuple[float, float]]:
    """
    Returns z-score-normalized `col` and the mean and standard deviation, respectively, used for normalization.
    If values for mean and/or standard deviation are provided, they are used; otherwise they are derived from `col`.
    A zero standard deviation (constant column) is replaced by 1 to avoid division by zero.
    """
    col_mean = col.mean() if mean_val is None else mean_val
    col_std = col.std() if std_val is None else std_val

    # Guard against a zero standard deviation (constant column)
    norm_col = (col - col_mean) / (col_std if col_std != 0 else 1)

    return norm_col, (col_mean, col_std)


def get_normalizer(normalizationtype: str):
    """
    Returns the normalizer function corresponding to the provided type.
    Accepted types are "minmax" and "zscore".
    """
    match normalizationtype:
        case "minmax":
            return minmax_normalize
        case "zscore":
            return zscore_normalize
        case _:
            raise ValueError(f'Normalization type "{normalizationtype}" not supported.')


def create_normalization(
    df: pd.DataFrame, normalizationtype: str = "minmax"
) -> tuple[pd.DataFrame, dict[str, tuple[str, float, float]]]:
    """
    Normalizes the numeric columns of `df` (excluding "CLASS" and "ID") with the normalization type provided.
    Non-numeric columns are left unchanged.
    Returns the normalized dataframe and a dictionary mapping each normalized column to the normalization type and the parameters used by the corresponding normalizer.
    """
    new_df = df.copy()
    normalization = {}

    normalizer = get_normalizer(normalizationtype)

    # Keep the original column order and skip "CLASS", "ID" and non-numeric columns
    columns = [
        col
        for col in new_df.columns
        if col not in ("CLASS", "ID")
        and pd.api.types.is_numeric_dtype(new_df[col])
        and not pd.api.types.is_bool_dtype(new_df[col])
    ]
    for col in columns:
        norm_col, params = normalizer(new_df[col])
        new_df[col] = norm_col

        normalization[col] = (normalizationtype, *params)

    return new_df, normalization


def apply_normalization(
    df: pd.DataFrame, normalization: dict[str, tuple[str, float, float]]
) -> pd.DataFrame:
    """
    Normalizes the columns of `df` listed in `normalization`, using the stored normalization type and parameters.
    Columns without an entry in `normalization` (e.g. "CLASS" and "ID") are left unchanged.
    """
    new_df = df.copy()

    # Iterate over the mapping rather than over df's columns, so that a column
    # missing from the mapping no longer raises a KeyError
    for col, (normalizationtype, *params) in normalization.items():
        if col not in new_df.columns:
            continue

        normalizer = get_normalizer(normalizationtype)

        norm_col, _ = normalizer(new_df[col], *params)

        new_df[col] = norm_col

    return new_df

In [None]:
# Test your code (leave this part unchanged)

glass_train_df = pd.read_csv("glass_train.csv")

glass_test_df = pd.read_csv("glass_test.csv")

glass_train_norm, normalization = create_normalization(glass_train_df,normalizationtype="minmax")
print("normalization:\n")
for f in normalization:
    print("{}:{}".format(f,normalization[f]))

print()
    
glass_test_norm = apply_normalization(glass_test_df,normalization)
display("glass_test_norm",glass_test_norm)

### Comment on assumptions, things that do not work properly, etc.


## 1c. Create and apply imputation

In [None]:
# Insert the functions create_imputation and apply_imputation below (after the comments)
#
# Input to create_imputation:
# df: a dataframe (where the column names "CLASS" and "ID" have special meaning)
#
# Output from create_imputation:
# df         - a new dataframe, where each missing numeric value in a column has been replaced by the mean of that column 
#              and each missing categoric value in a column has been replaced by the mode of that column
# imputation - a mapping (dictionary) from column name to value that has replaced missing values
#
# Hint 1: First copy the input dataframe and modify the copy (the input dataframe should be kept unchanged)
#
# Hint 2: Handle columns of type "float" or "int" only (and which are not labeled "CLASS" or "ID") in one way
#         and columns of type "object" and "category" in other ways
#
# Hint 3: Consider using the pandas functions mean and mode respectively, as well as fillna
#
# Hint 4: In the rare case of all values in a column being missing*, replace numeric values with 0,
#         object values with "" and category values with the first category (cat.categories[0])  
#
#         *Note that this will not occur if the previous column filter function has been applied
#
# Input to apply_imputation:
# df         - a dataframe
# imputation - a mapping (dictionary) from column name to value that should replace missing values
#
# Output from apply_imputation:
# df - a new dataframe, where each missing value has been replaced according to the mapping
#
# Hint 1: First copy the input dataframe and modify the copy (the input dataframe should be kept unchanged)
#
# Hint 2: Consider using fillna

In [None]:
def create_imputation(df: pd.DataFrame) -> tuple[pd.DataFrame, dict]:
    """
    Create imputation values and apply them to missing values in dataframe.
    """

    df_copy = df.copy()
    imputation = {}

    for col in df.columns:
        if col in ["CLASS", "ID"]:
            continue

        # Handle numeric columns
        if pd.api.types.is_numeric_dtype(df[col]):
            fill_value = df[col].mean()
            if pd.isna(fill_value):  # All values missing
                fill_value = 0

        # Handle categorical/object columns
        elif df[col].dtype == 'category':
            fill_value = (
                df[col].mode().iloc[0]
                if not df[col].mode().empty
                else df[col].cat.categories[0]
            )
        else:  # object type
            fill_value = df[col].mode().iloc[0] if not df[col].mode().empty else ""

        df_copy[col] = df_copy[col].fillna(fill_value)
        imputation[col] = fill_value

    return df_copy, imputation


def apply_imputation(df: pd.DataFrame, imputation: dict) -> pd.DataFrame:
    """
    Apply existing imputation values to missing values in dataframe.
    """
    df_copy = df.copy()

    for col, value in imputation.items():
        if col in df_copy.columns:
            df_copy[col] = df_copy[col].fillna(value)

    return df_copy

In [None]:
# Test your code (leave this part unchanged)

anneal_train_df = pd.read_csv("anneal_train.csv")
anneal_test_df = pd.read_csv("anneal_test.csv")

anneal_train_imp, imputation = create_imputation(anneal_train_df)
anneal_test_imp = apply_imputation(anneal_test_df,imputation)

print("Imputation:\n")
for f in imputation:
    print("{}:{}".format(f,imputation[f]))

print("\nNo. of replaced missing values in training data:\n{}".format(anneal_train_imp.count()-anneal_train_df.count()))
print("\nNo. of replaced missing values in test data:\n{}".format(anneal_test_imp.count()-anneal_test_df.count()))

### Comment on assumptions, things that do not work properly, etc.

## 1d. Create and apply discretization

In [None]:
# Insert the functions create_bins and apply_bins below
#
# Input to create_bins:
# df      - a dataframe
# nobins  - no. of bins (default = 10)
# bintype - either "equal-width" (default) or "equal-size" 
#
# Output from create_bins:
# df      - a new dataframe, where each numeric feature value has been replaced by a categoric (corresponding to some bin)
# binning - a mapping (dictionary) from column name to bins (threshold values for the bin)
#
# Hint 1: First copy the input dataframe and modify the copy (the input dataframe should be kept unchanged)
#
# Hint 2: Discretize columns of type "float" or "int" only (and which are not labeled "CLASS" or "ID")
#
# Hint 3: Consider using pd.cut and pd.qcut respectively, with labels=False and retbins=True
#
# Hint 4: Set all columns in the new dataframe to be of type "category"
#
# Hint 5: Set the categories of the discretized features to be [0,...,nobins-1]
#
# Hint 6: Change the first and the last element of each binning to -np.inf and np.inf respectively 
#
# Input to apply_bins:
# df      - a dataframe
# binning - a mapping (dictionary) from column name to bins (threshold values for the bin)
#
# Output from apply_bins:
# df - a new dataframe, where each numeric feature value has been replaced by a categoric (corresponding to some bin)
#
# Hint 1: First copy the input dataframe and modify the copy (the input dataframe should be kept unchanged)
#
# Hint 2: Consider using pd.cut 
#
# Hint 3: Set all columns in the new dataframe to be of type "category"
#
# Hint 4: Set the categories of the discretized features to be [0,...,nobins-1]



In [None]:
def create_bins(
    df: pd.DataFrame, nobins: int = 10, bintype: str = "equal-width"
) -> tuple[pd.DataFrame, dict]:
    """
    Create bins for numeric features and apply discretization.
    """

    df_copy = df.copy()
    binning = {}

    for col in df.columns:
        if col in ["CLASS", "ID"]:
            continue

        # Only process numeric columns (also safe for pandas extension dtypes)
        if not pd.api.types.is_numeric_dtype(df[col]):
            continue

        # Create bins based on bintype
        if bintype == "equal-width":
            discretized, bins = pd.cut(df[col], bins=nobins, labels=False, retbins=True)
        elif bintype == "equal-size":
            discretized, bins = pd.qcut(
                df[col], q=nobins, labels=False, retbins=True, duplicates="drop"
            )
        else:
            raise ValueError(f'Bin type "{bintype}" not supported.')

        # Adjust bin edges
        bins[0] = -np.inf
        bins[-1] = np.inf

        # Store bins and update column
        binning[col] = bins
        df_copy[col] = pd.Categorical(discretized, categories=range(nobins))

    return df_copy, binning


def apply_bins(df: pd.DataFrame, binning: dict) -> pd.DataFrame:
    """
    Apply existing bins to numeric features.
    """

    df_copy = df.copy()

    for col, bins in binning.items():
        if col not in df_copy.columns:
            continue

        nobins = len(bins) - 1  # number of bins is one less than number of thresholds
        discretized = pd.cut(df_copy[col], bins=bins, labels=False)
        df_copy[col] = pd.Categorical(discretized, categories=range(nobins))

    return df_copy

In [None]:
# Test your code  (leave this part unchanged)

glass_train_df = pd.read_csv("glass_train.csv")

glass_test_df = pd.read_csv("glass_test.csv")

glass_train_disc, binning = create_bins(glass_train_df,nobins=10,bintype="equal-size")
print("binning:")
for f in binning:
    print("{}:{}".format(f,binning[f]))

print()    
glass_test_disc = apply_bins(glass_test_df,binning)
display("glass_test_disc",glass_test_disc)

### Comment on assumptions, things that do not work properly, etc.

## 1e. Create and apply one-hot encoding

In [None]:
# Insert the functions create_one_hot and apply_one_hot below
#
# Input to create_one_hot:
# df: a dataframe
#
# Output from create_one_hot:
# df      - a new dataframe, where each categoric feature has been replaced by a set of binary features 
#           (as many new features as there are possible values)
# one_hot - a mapping (dictionary) from column name to a set of categories (possible values for the feature)
#
# Hint 1: First copy the input dataframe and modify the copy (the input dataframe should be kept unchanged)
#
# Hint 2: Consider columns of type "object" or "category" only (and which are not labeled "CLASS" or "ID")
#
# Hint 3: Consider creating new column names by merging the original column name and the categorical value
#
# Hint 4: Set all new columns to be of type "float"
#
# Hint 5: Do not forget to remove the original categoric feature
#
# Input to apply_one_hot:
# df      - a dataframe
# one_hot - a mapping (dictionary) from column name to categories
#
# Output from apply_one_hot:
# df - a new dataframe, where each categoric feature has been replaced by a set of binary features
#
# Hint: See the above Hints



In [None]:
def create_one_hot(df: pd.DataFrame) -> tuple[pd.DataFrame, dict]:
    """
    Create one-hot encoding for categorical features.
    """

    df_copy = df.copy()
    one_hot = {}

    for col in df.columns:
        if col in ["CLASS", "ID"]:
            continue

        # Only process object or category columns
        if df[col].dtype not in ["object", "category"]:
            continue

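        # Get unique categories, ignoring missing values (NaN never compares equal,
        # so it would otherwise produce an all-zero "<col>_nan" column)
        categories = df[col].dropna().unique()
        one_hot[col] = categories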

        # Create one-hot encoded columns
        for category in categories:
            new_col_name = f"{col}_{category}"
            df_copy[new_col_name] = (df[col] == category).astype(float)

        # Drop original column
        df_copy.drop(columns=[col], inplace=True)

    return df_copy, one_hot


def apply_one_hot(df: pd.DataFrame, one_hot: dict) -> pd.DataFrame:
    """
    Apply one-hot encoding using existing categories.
    """

    df_copy = df.copy()

    for col, categories in one_hot.items():
        if col not in df_copy.columns:
            continue

        # Create one-hot encoded columns
        for category in categories:
            new_col_name = f"{col}_{category}"
            df_copy[new_col_name] = (df[col] == category).astype(float)

        # Drop original column
        df_copy.drop(columns=[col], inplace=True)

    return df_copy

In [None]:
# Test your code  (leave this part unchanged)

train_df = pd.read_csv("tic-tac-toe_train.csv")

new_train, one_hot = create_one_hot(train_df)

test_df = pd.read_csv("tic-tac-toe_test.csv")

new_test_df = apply_one_hot(test_df,one_hot)
display("new_test_df",new_test_df)

### Comment on assumptions, things that do not work properly, etc.

## 1f. Divide a dataset into a training and a test set

In [None]:
# Insert the function split below
#
# Input to split:
# df           - a dataframe
# testfraction - a float in the range (0,1) (default = 0.5)
#
# Output from split:
# trainingdf - a dataframe consisting of a random sample of (1-testfraction) of the rows in df
# testdf     - a dataframe consisting of the rows in df that are not included in trainingdf
#
# Hint: You may use np.random.permutation(df.index) to get a permuted list of indexes where a 
#       prefix corresponds to the test instances, and the suffix to the training instances 
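
The cell for `split` is empty in this notebook, so the test that follows has nothing to call. A minimal sketch of the approach described in the hint, assuming `df` has unique index labels. Rounding the test-set size to the nearest integer is a choice made here, not something the assignment specifies, and `toy_df` is hypothetical data used only for illustration:

```python
import numpy as np
import pandas as pd


def split(df: pd.DataFrame, testfraction: float = 0.5) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Randomly split df into (trainingdf, testdf); assumes unique index labels."""
    permuted = np.random.permutation(df.index)
    # Prefix of the permutation -> test rows, suffix -> training rows
    n_test = int(round(len(df) * testfraction))
    testdf = df.loc[permuted[:n_test]]
    trainingdf = df.loc[permuted[n_test:]]
    return trainingdf, testdf


# Hypothetical toy data, only for illustration
toy_df = pd.DataFrame({"ID": range(8), "A": np.arange(8) * 0.5})
toy_train, toy_test = split(toy_df, testfraction=0.25)
```

Because both parts are cut from the same permutation, the training and test sets cannot overlap, which is what the "Overlap" line of the test checks.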



In [None]:
# Test your code  (leave this part unchanged)

glass_df = pd.read_csv("glass.csv")

glass_train, glass_test = split(glass_df,testfraction=0.25)

print("Training IDs:\n{}".format(glass_train["ID"].values))

print("\nTest IDs:\n{}".format(glass_test["ID"].values))

print("\nOverlap: {}".format(set(glass_train["ID"]).intersection(set(glass_test["ID"]))))

### Comment on assumptions, things that do not work properly, etc.

## 1g. Calculate accuracy of a set of predictions

In [None]:
# Insert the function accuracy below
#
# Input to accuracy:
# df            - a dataframe with class labels as column names and each row corresponding to
#                 a prediction with estimated probabilities for each class
# correctlabels - an array (or list) of the correct class label for each prediction
#                 (the number of correct labels must equal the number of rows in df)
#
# Output from accuracy:
# accuracy - the fraction of cases for which the predicted class label coincides with the correct label
#
# Hint: In case the label receiving the highest probability is not unique, you may
#       resolve that by picking the first (as ordered by the column names) or 
#       by randomly selecting one of the labels with highest probability.
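
No `accuracy` function is defined in this notebook. One possible sketch, assuming ties are resolved by picking the first label in column order (the first of the two options in the hint); `demo_predictions` and `demo_labels` simply repeat the test data used in this section:

```python
import pandas as pd


def accuracy(df: pd.DataFrame, correctlabels) -> float:
    """Fraction of rows whose highest-probability label equals the correct label."""
    if len(df) != len(correctlabels):
        raise ValueError("correctlabels must have one entry per row of df")
    # idxmax returns the first column holding the row maximum, so ties go to
    # the earliest column
    predicted = df.idxmax(axis=1)
    hits = sum(p == c for p, c in zip(predicted, correctlabels))
    return hits / len(correctlabels)


demo_predictions = pd.DataFrame(
    {"A": [0.5, 0.5, 0.5, 0.25, 0.25], "B": [0.5, 0.25, 0.25, 0.5, 0.25], "C": [0.0, 0.25, 0.25, 0.25, 0.5]}
)
demo_labels = ["B", "A", "B", "B", "C"]
demo_accuracy = accuracy(demo_predictions, demo_labels)
```

With first-label tie-breaking the first row is predicted as "A" although the correct label is "B", which gives the lower of the two accuracies mentioned in the test comment.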



In [None]:
# Test your code  (leave this part unchanged)

predictions = pd.DataFrame({"A":[0.5,0.5,0.5,0.25,0.25],"B":[0.5,0.25,0.25,0.5,0.25],"C":[0.0,0.25,0.25,0.25,0.5]})
display("predictions",predictions)

In [None]:
correctlabels = ["B","A","B","B","C"]

print("Accuracy: {}".format(accuracy(predictions,correctlabels))) # Note that depending on how ties are resolved the accuracy may be 0.6 or 0.8

### Comment on assumptions, things that do not work properly, etc.

## 2a. Divide a dataset into a number of folds

In [None]:
# Insert the function folds below
#
# Input to folds:
# df      - a dataframe
# nofolds - an integer greater than 1 (default = 10)
#
# Output from folds:
# folds - a list (of length = nofolds) of dataframes consisting of random non-overlapping, 
#         approximately equal-sized subsets of the rows in df
#
# Hint: You may use np.random.permutation(df.index) to get a permuted list of indexes from which a 
#       prefix corresponds to the test instances, and the suffix to the training instances 
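
The `folds` function is likewise missing from this notebook. A minimal sketch, assuming unique index labels and using `np.array_split` to cut a random permutation of the index into nearly equal chunks; `toy_df` is hypothetical data used only for illustration:

```python
import numpy as np
import pandas as pd


def folds(df: pd.DataFrame, nofolds: int = 10) -> list[pd.DataFrame]:
    """Split df into nofolds random, non-overlapping, roughly equal-sized parts."""
    permuted = np.random.permutation(df.index)
    # array_split allows chunk sizes to differ by at most one row
    return [df.loc[chunk] for chunk in np.array_split(permuted, nofolds)]


# Hypothetical toy data, only for illustration
toy_df = pd.DataFrame({"ID": range(11)})
toy_folds = folds(toy_df, nofolds=3)
toy_fold_sizes = [len(f) for f in toy_folds]
```

Since every chunk comes from one permutation, each row lands in exactly one fold and the fold sizes sum to the number of rows in `df`.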



In [None]:
# Test your code  (leave this part unchanged)

glass_df = pd.read_csv("glass.csv")

glass_folds = folds(glass_df,nofolds=5)

fold_sizes = [len(f) for f in glass_folds]

print("Fold sizes:{}\nTotal no. instances: {}".format(fold_sizes,sum(fold_sizes)))

### Comment on assumptions, things that do not work properly, etc.

## 2b. Calculate Brier score of a set of predictions

In [None]:
# Insert the function brier_score below
#
# Input to brier_score:
# df            - a dataframe with class labels as column names and each row corresponding to
#                 a prediction with estimated probabilities for each class
# correctlabels - an array (or list) of the correct class label for each prediction
#                 (the number of correct labels must equal the number of rows in df)
#
# Output from brier_score:
# brier_score - the average square error of the predicted probabilities 
#
# Hint: Compare each predicted vector to a vector for each correct label, which is all zeros except 
#       for at the index of the correct class. The index can be found using np.where(df.columns==l)[0] 
#       where l is the correct label.
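
A `brier_score` function is not defined in this notebook. The hint can be sketched as follows, assuming the column labels of `df` are unique and that the score is the squared error summed over classes and averaged over rows; `demo_predictions` and `demo_labels` repeat the test data used in this section:

```python
import numpy as np
import pandas as pd


def brier_score(df: pd.DataFrame, correctlabels) -> float:
    """Mean over rows of the squared distance to the one-hot correct-label vector."""
    probs = df.to_numpy(dtype=float)
    # One-hot target: all zeros except a one at the column of the correct label
    target = np.zeros_like(probs)
    for row, label in enumerate(correctlabels):
        target[row, df.columns.get_loc(label)] = 1.0
    return float(np.mean(np.sum((probs - target) ** 2, axis=1)))


demo_predictions = pd.DataFrame(
    {"A": [0.5, 0.5, 0.5, 0.25, 0.25], "B": [0.5, 0.25, 0.25, 0.5, 0.25], "C": [0.0, 0.25, 0.25, 0.25, 0.5]}
)
demo_labels = ["B", "A", "B", "B", "C"]
demo_brier = brier_score(demo_predictions, demo_labels)
```

For this data the per-row squared errors are 0.5, 0.375, 0.875, 0.375 and 0.375, so the average is 0.5.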



In [None]:
# Test your code  (leave this part unchanged)

predictions = pd.DataFrame({"A":[0.5,0.5,0.5,0.25,0.25],"B":[0.5,0.25,0.25,0.5,0.25],"C":[0.0,0.25,0.25,0.25,0.5]})

correctlabels = ["B","A","B","B","C"]

print("Brier score: {}".format(brier_score(predictions,correctlabels)))

### Comment on assumptions, things that do not work properly, etc.

## 2c. Calculate AUC of a set of predictions

In [None]:
# Insert the function auc below
#
# Input to auc:
# df            - a dataframe with class labels as column names and each row corresponding to
#                 a prediction with estimated probabilities for each class
# correctlabels - an array (or list) of the correct class label for each prediction
#                 (the number of correct labels must equal the number of rows in df)
#
# Output from auc:
# auc - the weighted area under ROC curve
#
# Hint 1: Calculate the binary AUC first for each class label c, i.e., treating the
#         predicted probability of this class for each instance as a score; the true positives
#         are the ones belonging to class c and the false positives the rest
#
# Hint 2: When calculating the binary AUC, first find the scores of the true positives and then
#         the scores of the true negatives
#
# Hint 3: You may use a dictionary with a mapping from each score to an array of two numbers; 
#         the number of true positives with this score and the number of true negatives with this score
#
# Hint 4: Create a (reversely) sorted (on the scores) list of pairs from the dictionary and
#         iterate over this to additively calculate the AUC
#
# Hint 5: For each pair in the above list, there are three cases to consider; the no. of false positives
#         is zero, the no. of true positives is zero, and both are non-zero
#
# Hint 6: Calculate the weighted AUC by summing the individual AUCs weighted by the relative
#         frequency of each class (as estimated from the correct labels)
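
The notebook does not define `auc` either. The sketch below follows the hints under a few assumptions: `binary_auc` is a hypothetical helper name, and a class with no positives or no negatives is given an AUC of 0.5 because the value is undefined in that case. The three cases of Hint 5 collapse into a single expression, since the tie term vanishes whenever the number of true positives or false positives at a score is zero:

```python
import pandas as pd


def binary_auc(scores, is_positive) -> float:
    """AUC for one class; higher scores should indicate the positive class."""
    # Map each distinct score to [no. of positives, no. of negatives] with that score
    counts: dict[float, list[int]] = {}
    for score, positive in zip(scores, is_positive):
        counts.setdefault(score, [0, 0])[0 if positive else 1] += 1

    tot_tp = sum(tp for tp, _ in counts.values())
    tot_fp = sum(fp for _, fp in counts.values())
    if tot_tp == 0 or tot_fp == 0:
        return 0.5  # Assumption: AUC is undefined here, fall back to chance level

    auc_value, cov_tp = 0.0, 0
    for _, (tp, fp) in sorted(counts.items(), key=lambda kv: kv[0], reverse=True):
        # Rectangle for positives already ranked higher, plus half credit for ties
        auc_value += (cov_tp / tot_tp) * (fp / tot_fp) + (tp / tot_tp) * (fp / tot_fp) / 2
        cov_tp += tp
    return auc_value


def auc(df: pd.DataFrame, correctlabels) -> float:
    """Per-class binary AUCs weighted by the relative frequency of each class."""
    labels = list(correctlabels)
    total = 0.0
    for c in df.columns:
        is_positive = [label == c for label in labels]
        weight = sum(is_positive) / len(labels)
        if weight > 0:
            total += weight * binary_auc(df[c], is_positive)
    return total


demo_auc = auc(
    pd.DataFrame({"A": [0.9, 0.9, 0.6, 0.55], "B": [0.1, 0.1, 0.4, 0.45]}),
    ["A", "B", "B", "A"],
)
```

In the two-class example both per-class AUCs are 0.375, so the weighted AUC is 0.375 as well.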



In [None]:
# Test your code  (leave this part unchanged)

predictions = pd.DataFrame({"A":[0.9,0.9,0.6,0.55],"B":[0.1,0.1,0.4,0.45]})

correctlabels = ["A","B","B","A"]

print("AUC: {}".format(auc(predictions,correctlabels)))

In [None]:
predictions = pd.DataFrame({"A":[0.5,0.5,0.5,0.25,0.25],"B":[0.5,0.25,0.25,0.5,0.25],"C":[0.0,0.25,0.25,0.25,0.5]})

correctlabels = ["B","A","B","B","C"]

print("AUC: {}".format(auc(predictions,correctlabels)))

### Comment on assumptions, things that do not work properly, etc.