# Kings of War Custom Unit Cost Estimator

## Project Goal
This project aims to estimate the cost of a custom unit for the Kings of War miniature game.

## Methodology
The goal will be achieved by the following steps:
1. Create a dataframe with all Kings of War units in a given army and their associated point cost. Include the following information:
    - all basic stats (melee, ranged, etc.)
    - non-numerical data:
        - unit size
        - special abilities
        - army name
        - army alignment (evil, neutral, good)
2. Convert the data into a format suitable for linear regression
    - Each numerical field keeps its value
    - Each non-numerical field (except the unit name) will be separated into groups, and each group gets a binary value indicating whether a unit belongs to it
3. Split the dataset into two groups:
    - training data
    - testing data
4. Develop, or implement using scipy, a multiple linear regression model on the training data
5. Test the model on the testing data to estimate its accuracy
6. Evaluate performance and decide whether further development is needed
7. Create a function that returns the cost for a given custom unit input
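
As a rough illustration of steps 3 to 5 and 7, the sketch below fits a multiple linear regression by ordinary least squares on made-up numbers. It is only a sketch under stated assumptions: the toy data, the fixed (unshuffled) split, and the names `fit_mlr` and `predict_cost` are hypothetical, and `numpy.linalg.lstsq` stands in for whichever scipy-based model is eventually chosen.

```python
import numpy as np

def fit_mlr(x, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares (hypothetical helper)."""
    design = np.column_stack([np.ones(len(x)), x])   # prepend an intercept column
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients

def predict_cost(coefficients, x):
    """Return predicted costs for one or more feature rows (hypothetical helper)."""
    x = np.atleast_2d(x)
    return coefficients[0] + x @ coefficients[1:]

# Made-up toy data: two features per unit, cost = 10 + 5*a + 20*b exactly.
features = np.array([[1, 0], [2, 1], [3, 0], [4, 1],
                     [5, 0], [6, 1], [2, 0], [3, 1]], dtype=float)
cost = 10.0 + 5.0 * features[:, 0] + 20.0 * features[:, 1]

# Step 3: first six rows for training, last two for testing (a real split should shuffle).
x_train, x_test = features[:6], features[6:]
y_train, y_test = cost[:6], cost[6:]

coefficients = fit_mlr(x_train, y_train)          # step 4
predicted = predict_cost(coefficients, x_test)    # steps 5 and 7
print(coefficients, predicted, y_test)
```

Because the toy costs are an exact linear function of the features, the fit recovers them without error; real unit data will leave residuals, which is what step 6 is meant to evaluate.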

## Step 1: Create main DataFrame
Ideally, the information needed would be available online; however, a search turned up no publicly available data, so the data will be created and stored in .csv files. To limit the scope, the Undead Armies will be the initial focus, with other armies to follow once it is complete. The list of armies is shown below and will be used to import the .csv files into the main DataFrame.

In [None]:
armies = ['Undead Armies']

In [None]:
# create a list of df from the army list above
# use pd.concat method to create final df per pandas documentation
import pandas as pd

units_list = []
for army in armies:
    units_list.append(pd.read_csv(army + '.csv', encoding="utf-8-sig"))

# ignore_index gives every unit a unique row label once more armies are added
units = pd.concat(units_list, ignore_index=True)
units.head()

## Step 2: Equation Setup
This step requires several functions that separate the qualitative data into groups for analysis. As an example, the 'Army Allegiance' field will be divided into 3 groups: Evil, Neutral, and Good. Each row of the DataFrame will receive a value of 1 in the column that matches its allegiance and a value of 0 in the other two columns. This approach should work for each field; exceptions will be noted in the associated method.
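
For comparison, the same indicator-column idea can be sketched with pandas' built-in `pd.get_dummies`. This is not the approach taken in the class below, only a minimal illustration; the three-row sample and its column name (written here without the leading space found in the .csv headers) are hypothetical.

```python
import pandas as pd

# Hypothetical three-row sample; the real values come from the army .csv files.
sample = pd.DataFrame({'Army Allegiance': ['Evil', 'Neutral', 'Good']})

# One indicator column per allegiance, stored as floats (1.0 or 0.0).
encoded = pd.get_dummies(sample['Army Allegiance'], dtype=float)
print(encoded)
```

Each row ends up with exactly one 1.0, in the column named after its allegiance. Note that `get_dummies` orders the new columns alphabetically rather than by first appearance.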

In [None]:
import re

class MLRTransform:
    """Transforms raw data into Multiple Linear Regression ready dataframe"""
    #ToDo: turn this class into a child of pd.DataFrame instead of standalone
    def __init__(self, df):
        self.raw_data = df
          
    def transform(self):
        method_man = [self.__army_name(), self.__army_allegiance(), self.__unit_name(),
                     self.__unit_type(), self.__unit_size(), self.__sp(),
                     self.__me(), self.__ra(), self.__de(),
                     self.__att(), self.__ne(), self.__special()]
        
        x = pd.concat(method_man, axis=1)
        y = self.__points()
        return x, y
    
    def __army_name(self):
        return self.__transform_column('Army Name')
    
    def __army_allegiance(self):
        return self.__transform_column(' Army Allegiance')
    
    def __unit_name(self):
        # Flag unique units (name ends with '[1]') and irregular units (name ends with '*').
        # Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
        rows = []
        for name in self.raw_data[' Unit Name']:
            rows.append({'Unique': 1.0 if name.endswith('[1]') else 0.0,
                         'Irregular': 1.0 if name.endswith('*') else 0.0})
        return pd.DataFrame(rows, columns=['Unique', 'Irregular'])
    
    def __unit_type(self):
        return self.__transform_column(' Unit Type')
    
    def __unit_size(self):
        return self.__transform_column(' Unit Size')
    
    def __sp(self):
        return pd.to_numeric(self.raw_data[' Sp'], downcast='float')
    
    def __me(self):
        return pd.to_numeric(self.raw_data[' Me'], downcast='float')
    
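    def __ra(self):
        ra = pd.to_numeric(self.raw_data[' Ra'], downcast='float')
        # Replace each non-zero value v with 7 - v (the mapping the original
        # lookup list encoded); a value of 0 stays 0. Series.where keeps the
        # entries where the condition holds and avoids writing into raw_data.
        return ra.where(ra == 0, 7 - ra)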
    
    def __de(self):
        return pd.to_numeric(self.raw_data[' De'], downcast='float')
        
    def __att(self):
        # the converted column must be returned, otherwise pd.concat silently drops it
        return pd.to_numeric(self.raw_data[' Att'], downcast='float')
    
    def __ne(self):
        # Split each nerve value at '/' into waver (NeW) and rout (NeR) columns.
        # A waver value of 0 marks a fearless unit: Fearless is 1, else 0.
        # Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
        rows = []
        for row in self.raw_data[' Ne']:
            new, ner = row.split('/')
            # drop the first character (presumably the space that follows the comma in the .csv)
            new = float(new[1:])
            ner = float(ner)
            rows.append({'Fearless': 1.0 if new == 0.0 else 0.0,
                         'NeW': new,
                         'NeR': ner})
        return pd.DataFrame(rows, columns=['Fearless', 'NeW', 'NeR'])
    
    def __points(self):
        return pd.to_numeric(self.raw_data[' Pts'], downcast='float')
    
    def __special(self):
        # Collect every special rule name, with any parenthesised value removed,
        # in order of first appearance; each name becomes a column.
        column_names = []
        for row in self.raw_data[' Special']:
            for value in row.split(';'):
                name = re.sub(r"\(.*\)", "", value)
                if name not in column_names:
                    column_names.append(name)

        # Lookup used for Regeneration: index n gives 7 - n.
        reference_index = list(range(7, 1, -1))
        digit_pattern = re.compile(r"\d")

        # Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
        rows = []
        for row in self.raw_data[' Special']:
            i = {}
            for value in row.split(';'):
                if 'Base Size' in value:
                    pass  # no-op: 'Base Size' entries are still processed below
                digits = digit_pattern.findall(value)
                name = re.sub(r"\(.*\)", "", value)
                i[name] = 1.0
                if digits:
                    if ' Regeneration' in value:
                        # Regeneration: use the last digit, mapped through the lookup
                        i[name] = reference_index[int(digits[-1])]
                    else:
                        # other rules: use the number formed from all digits
                        i[name] = float("".join(digits))
            rows.append(i)

        # rules a unit does not have are left missing and filled with 0
        return pd.DataFrame(rows, columns=column_names).fillna(0)
            
    def __transform_column(self, column_name):
        # One-hot encode a column: one new column per unique value, holding 1.0
        # in the column that matches the row's value and 0.0 everywhere else.
        # Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
        unique_values = self.raw_data[column_name].unique()
        rows = [{value: 1.0} for value in self.raw_data[column_name]]
        return pd.DataFrame(rows, columns=unique_values).fillna(0)
    
equation_df = MLRTransform(units)
x, y = equation_df.transform()
x.describe()
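
The special-rules column is the main exception to the plain 1-or-0 encoding, so a standalone sketch of that parsing on a single string may help. It is an illustration only: `parse_special` is a hypothetical helper, the example string is made up, and unlike the class method it strips surrounding whitespace from rule names and computes `7 - n` directly instead of using a lookup list.

```python
import re

def parse_special(text):
    """Map each rule name in a ';'-separated string to a numeric weight."""
    weights = {}
    for entry in text.split(';'):
        name = re.sub(r"\(.*\)", "", entry).strip()   # drop '(...)' and whitespace
        digits = re.findall(r"\d", entry)
        if not digits:
            weights[name] = 1.0                        # plain rule: present or not
        elif name == 'Regeneration':
            weights[name] = float(7 - int(digits[-1]))  # 'n+' value, inverted
        else:
            weights[name] = float("".join(digits))     # rule with a numeric level
    return weights

example = parse_special("Crushing Strength (2); Regeneration (5+); Shambling")
print(example)
```

Rules that are absent from a unit simply do not appear in the dictionary; when the dictionaries for all units are combined into a DataFrame, those gaps become the 0 entries.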