### Data Normalizer: 
Class that takes a pandas DataFrame and normalizes its data in one of several possible ways. Generally the goal is to get all data in a column between 0 and 1, but there are several ways to do it.
Examples:
 - normalize data with min/max for each column
 - use min max between several columns
 - use a nonlinear function to weight different parts of the value range differently
   for example: changes in values less than X are more significant than changes in values greater than X
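
As a rough sketch of the first and third ideas, the snippet below rescales each column with its own min and max, then applies a logarithmic curve so that changes among small values move the output more than changes among large values. The function names `min_max_per_column` and `log_weight`, and the choice of `log1p` with a `scale` parameter, are illustrative assumptions and not part of the class defined below.

```python
import numpy as np
import pandas as pd

def min_max_per_column(df):
    """Rescale each column to [0, 1] using that column's own min and max."""
    col_min = df.min()
    col_range = df.max() - col_min
    # Assumes no column is constant; a zero range would divide by zero.
    return (df - col_min) / col_range

def log_weight(df, scale=10.0):
    """Map values in [0, 1] onto [0, 1] along a log curve.

    A larger `scale` (hypothetical parameter) gives more resolution
    to small values.
    """
    return np.log1p(scale * df) / np.log1p(scale)

example = pd.DataFrame({"A": [0.0, 5.0, 10.0], "B": [1.0, 2.0, 3.0]})
scaled = min_max_per_column(example)
weighted = log_weight(scaled)
```

With this curve the midpoint of each column lands above 0.5 after weighting, which is one way to make differences at the low end count for more.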

In [3]:
import pandas as pd
import numpy as np

In [74]:
class dataNormalizer():
    
    def columnMax(self, df, maxVals = None):
        """Normalizes all columns to be a percentage of the max value in the column"""
        
        if maxVals is None:
            maxVals = map(lambda x: max(df[x]), df)
        
        # Loop through all the data
        newData = []
        for column, maxVal in zip(df.columns, maxVals):
            # Gets values for column
            values = list(df[column])
            new = list(map(lambda x: x/maxVal, values))
            newData.append(new)
        return pd.DataFrame(np.transpose(newData), columns=df.columns)
    
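    def columnGlobalMax(self, df):
        """Normalizes all columns to be a percentage of the largest value in the whole dataframe"""
        globalMax = max(map(lambda x: max(df[x]), df))
        # One max value per column; len(df) would count rows instead
        return self.columnMax(df, maxVals = [globalMax]*len(df.columns))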
    
    def invert_columnMax(self, df, maxVals):
        """Returns a dataframe to its original values, using the given max
        values as the target max values. In this function maxVals is a
        required parameter because there needs to be a target value.
        
        If given the output of columnMax() with the same maxVals, returns the
        dataframe originally given to columnMax() (with some minor differences
        due to floating point rounding)."""
        
        # Loop through all the data
        newData = []
        for column, maxVal in zip(df.columns, maxVals):
            # Gets values for column
            values = list(df[column])
            new = list(map(lambda x: x*maxVal, values))
            newData.append(new)
        return pd.DataFrame(np.transpose(newData), columns=df.columns)
    
    def invert_globalMax(self, df, globalMax):
        """Inverse of columnGlobalMax, using globalMax as the target max for every column"""
        return self.invert_columnMax(df, maxVals = [globalMax]*len(df.columns))
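
For comparison, the same scaling can probably be written without the explicit loops by relying on pandas aligning a Series with the column labels. This is only a sketch: the names `column_max_vectorized` and `invert_column_max_vectorized` are hypothetical, and unlike the loop version above it keeps the original row index instead of resetting it.

```python
import pandas as pd

def column_max_vectorized(df, max_vals=None):
    """Divide each column by its max, or by the supplied per-column values."""
    if max_vals is None:
        max_vals = df.max()
    else:
        # Label the supplied values so they line up with the columns.
        max_vals = pd.Series(list(max_vals), index=df.columns)
    return df / max_vals

def invert_column_max_vectorized(df, max_vals):
    """Multiply each column by its target max value."""
    max_vals = pd.Series(list(max_vals), index=df.columns)
    return df * max_vals

sample = pd.DataFrame({"A": [1.0, 2.0], "B": [10.0, 40.0]})
sample_scaled = column_max_vectorized(sample)
sample_restored = invert_column_max_vectorized(sample_scaled, [2.0, 40.0])
```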
        

### Test Data 
We will need some pandas DataFrames for testing. To make this easy, I created a function randomDataframe(specs, n, [columns]) that creates a DataFrame filled with random values drawn uniformly from a given range for each column.

In [70]:
def randomValues(specs, n):
    """specs is a list of tuples where each tuple corresponds to the min and max of a column of data
    n is the number of rows to create"""
    colData = []
    for col in specs:
        colData.append(np.random.uniform(col[0],col[1], n))
        
    return np.transpose(colData)
randomValues([(0,1), (0,5)], 3)

def randomDataframe(specs, n, columns=None):
    """specs is a list of tuples where each tuple corresponds to the min and max of a column of data
    n is the number of rows to create. Set columns=[...] to a list of column names to use custom headers.
    Otherwise columns will be assigned using the letters of the alphabet"""
    if columns is None:
        # Default columns to use is just the alphabet. Need to make
        # this go on indefinitely like in Excel
        columns = ['A','B','C','D', 'E', 'F', 'G', 'H', 'I', 'J']
    return pd.DataFrame(randomValues(specs, n), columns=columns[0:len(specs)])
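
The comment above mentions wanting column names that continue indefinitely like in Excel. One possible way to do that is sketched here; `excel_column_name` and `excel_column_names` are hypothetical helpers, not something randomDataframe currently uses.

```python
def excel_column_name(index):
    """Return the spreadsheet-style label for a zero-based column index.

    0 -> 'A', 25 -> 'Z', 26 -> 'AA', 27 -> 'AB', ...
    """
    name = ""
    index += 1  # work in 1-based "bijective base 26"
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        name = chr(ord("A") + remainder) + name
    return name

def excel_column_names(count):
    """Return the first `count` spreadsheet-style labels."""
    return [excel_column_name(i) for i in range(count)]
```

The default in randomDataframe could then become something like `columns = excel_column_names(len(specs))`, which removes the ten-column limit of the hard-coded list.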

In [71]:
dn = dataNormalizer()
df = randomDataframe([(0,1), (0,100), (0,0.1)], 5)
df_converted = dn.columnMax(df, maxVals=[1,100,0.1])
df_reverted = dn.invert_columnMax(df_converted, maxVals=[1,100,0.1])

In [73]:
print(df)
print(df_converted)
print(df_reverted)

          A          B         C
0  0.090619  73.663785  0.014367
1  0.152615  68.408236  0.089407
2  0.615779  13.191674  0.005290
3  0.609411  64.907519  0.023705
4  0.476859   6.549739  0.065780
          A         B         C
0  0.090619  0.736638  0.143670
1  0.152615  0.684082  0.894073
2  0.615779  0.131917  0.052903
3  0.609411  0.649075  0.237051
4  0.476859  0.065497  0.657802
          A          B         C
0  0.090619  73.663785  0.014367
1  0.152615  68.408236  0.089407
2  0.615779  13.191674  0.005290
3  0.609411  64.907519  0.023705
4  0.476859   6.549739  0.065780
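
Comparing the printed tables by eye only goes so far, so a numeric round-trip check may be useful. The sketch below is self-contained and uses plain division and multiplication as stand-ins for `columnMax` and `invert_columnMax`; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

max_vals = np.array([1, 100, 0.1])
original = pd.DataFrame(
    np.random.uniform(0, 1, size=(5, 3)) * max_vals, columns=["A", "B", "C"]
)

converted = original / max_vals   # stand-in for dn.columnMax(df, maxVals=...)
reverted = converted * max_vals   # stand-in for dn.invert_columnMax(...)

# allclose tolerates the small floating point differences mentioned earlier
round_trip_ok = np.allclose(original.values, reverted.values)
```

With the class above, the equivalent check would presumably be `np.allclose(df.values, df_reverted.values)`.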


### DataConverterWrapper (TODO):
Should basically be a class/function that wraps around a DataFrame and a dataNormalizer, making it easy to convert a DataFrame and later revert it.
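
Since this part is still a TODO, the following is only one guess at what such a wrapper might look like. It assumes per-column max scaling and stores the max values at construction time so the caller does not have to pass them again when reverting; the class layout and attribute names are hypothetical.

```python
import pandas as pd

class DataConverterWrapper:
    """Hypothetical wrapper that remembers the scaling it applied."""

    def __init__(self, df):
        # Keep the per-column maxima so the conversion can be undone later.
        self.max_vals = df.max()
        self.converted = df / self.max_vals

    def revert(self, df=None):
        """Undo the scaling on `df` (defaults to the stored converted frame)."""
        if df is None:
            df = self.converted
        return df * self.max_vals

wrapper_input = pd.DataFrame({"A": [1.0, 4.0], "B": [5.0, 10.0]})
wrapper = DataConverterWrapper(wrapper_input)
wrapper_output = wrapper.revert()
```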