## Calculations

This notebook computes the mean, variance, and standard deviation of the measured values.   
It also determines the change in yaw angle (azimuth) in rad. The results can be written back to the feature CSV file.

In [1]:
import pandas
import numpy as np
import numpy.testing as npt
import math
import os

In [2]:
featuresDf = pandas.read_csv("merkmaleRoh.csv")
featuresDf.describe()

Unnamed: 0,Zeitstempel,Breitengrad,Laengengrad,Geschwindigkeit,Messwerte,StartBewegungsD,StartBelichtung,Belichtungszeit
count,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0
mean,1520521000000.0,53.621404,10.175575,15.419228,20.0,3972071000000.0,3972161000000.0,19314400.0
std,1226303.0,0.006939,0.018062,3.447197,0.0,1226303000000.0,1226303000000.0,8842717.0
min,1520519000000.0,53.610477,10.137807,5.004,20.0,1618455000000.0,1618559000000.0,4127597.0
25%,1520520000000.0,53.61411,10.161995,13.608,20.0,2973751000000.0,2973839000000.0,10001840.0
50%,1520521000000.0,53.62176,10.177186,15.804,20.0,3963729000000.0,3963809000000.0,20003680.0
75%,1520523000000.0,53.62762,10.190325,17.712,20.0,4998830000000.0,4998939000000.0,29996320.0
max,1520523000000.0,53.635113,10.203202,24.984,20.0,5977850000000.0,5977940000000.0,29996320.0


In [3]:
featuresDf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12149 entries, 0 to 12148
Data columns (total 15 columns):
Zeitstempel          12149 non-null int64
Breitengrad          12149 non-null float64
Laengengrad          12149 non-null float64
Geschwindigkeit      12149 non-null float64
AccelerometerX       12149 non-null object
AccelerometerY       12149 non-null object
AccelerometerZ       12149 non-null object
Azimuth              12149 non-null object
Nick                 12149 non-null object
Roll                 12149 non-null object
SensorZeitstempel    12149 non-null object
Messwerte            12149 non-null int64
StartBewegungsD      12149 non-null int64
StartBelichtung      12149 non-null int64
Belichtungszeit      12149 non-null int64
dtypes: float64(3), int64(5), object(7)
memory usage: 1.4+ MB


In [4]:
# Show the first 5 rows of accelerometer data for the X axis
featuresDf.AccelerometerX.head()

0    1.21672 0.42177 -0.3521 -0.86394 -1.08952 0.06...
1    0.54781 0.31568 0.06868 -0.03698 -0.382 -0.933...
2    -0.19796 -0.23496 -0.24925 -0.29133 -0.27901 -...
3    0.14269 0.03755 0.13732 0.33972 0.40969 0.0672...
4    0.30531 0.52007 -0.02829 -0.0073 0.33128 0.081...
Name: AccelerometerX, dtype: object

The following columns of the DataFrame have the dtype `object`:   
AccelerometerX       non-null object   
AccelerometerY       non-null object   
AccelerometerZ       non-null object   
Azimuth              non-null object   
Nick                 non-null object   
Roll                 non-null object   
These columns actually contain strings, but a DataFrame reports strings as    
Python objects. To cast the data to float, each column is first split into a list of lists,    
which is then converted into a NumPy array of type float.   
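
The conversion described above can also be sketched with pandas alone. The helper name `parse_sensor_column` and the sample values are assumptions made for illustration; the sketch relies on `Series.str.split(..., expand=True)`, which returns one column per token.

```python
import pandas as pd


def parse_sensor_column(column: pd.Series) -> pd.DataFrame:
    """Split space-separated readings and cast every token to float.

    Hypothetical helper, not part of the original notebook.
    """
    # expand=True yields one DataFrame column per reading
    return column.str.split(" ", expand=True).astype(float)


# Two made-up rows in the same format as the AccelerometerX column
sample = pd.Series(["1.21672 0.42177 -0.3521", "0.54781 0.31568 0.06868"])
parsed = parse_sensor_column(sample)
print(parsed.dtypes.unique())
print(parsed.shape)
```

This variant skips the intermediate Python list, but it assumes every row contains only numeric tokens separated by single spaces.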

In [5]:
accXList = featuresDf.AccelerometerX.str.split(" ").tolist()

In [6]:
# The conversion cannot be done directly on this NumPy array because it has dtype object (each element is a Python list).
pandas.DataFrame(featuresDf.AccelerometerX.str.split(" ")).values[1]

array([ ['0.54781', '0.31568', '0.06868', '-0.03698', '-0.382', '-0.93382', '-0.62446', '0.02143', '0.81395', '1.06489', '0.57611', '0.20042', '-0.13079', '-0.42639', '-0.69353', '-0.03384', '-0.3948', '-0.11662', '0.05994', '-0.04397']], dtype=object)

In [7]:
accXList[1]  # show the list

['0.54781',
 '0.31568',
 '0.06868',
 '-0.03698',
 '-0.382',
 '-0.93382',
 '-0.62446',
 '0.02143',
 '0.81395',
 '1.06489',
 '0.57611',
 '0.20042',
 '-0.13079',
 '-0.42639',
 '-0.69353',
 '-0.03384',
 '-0.3948',
 '-0.11662',
 '0.05994',
 '-0.04397']

In [8]:
np.array(accXList).dtype

dtype('<U9')

In [9]:
np.array(accXList).astype(float).dtype 

dtype('float64')

In [10]:
accXNp = np.array(accXList).astype(float) 
accXDf = pandas.DataFrame(accXNp)
accXDf.describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
count,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0,12149.0
mean,0.002799,-0.001824,0.001869,0.016475,-0.020737,-0.02052,-0.006363,0.00048,0.002782,-0.00477,0.004328,-0.00261,-0.019852,0.004108,0.013693,0.008763,0.016994,0.024725,0.007168,0.002134
std,1.485179,1.451201,1.517089,1.508852,1.478076,1.451643,1.503341,1.528072,1.476431,1.491179,1.459933,1.476301,1.472261,1.446254,1.467466,1.521096,1.478443,1.479666,1.473852,1.490268
min,-27.81447,-17.42361,-22.45623,-16.56143,-13.17869,-14.74343,-23.78581,-26.31241,-17.66358,-15.94054,-18.89619,-15.75701,-15.33183,-11.3011,-18.73568,-18.14484,-23.82762,-12.6449,-19.59751,-20.89236
25%,-0.69352,-0.68723,-0.69577,-0.68404,-0.70575,-0.70006,-0.71028,-0.70436,-0.67863,-0.69166,-0.68625,-0.68487,-0.69241,-0.67921,-0.67304,-0.68956,-0.67816,-0.6615,-0.68674,-0.68631
50%,0.00486,-0.00418,-0.01494,0.01055,-0.01725,-0.02005,-0.01397,-0.01401,0.01262,-0.00069,-0.01158,0.00136,-0.016,0.00409,0.01544,0.01337,0.00788,0.02195,0.00188,-0.00737
75%,0.69577,0.6935,0.68879,0.68572,0.65967,0.6755,0.64948,0.68254,0.69108,0.69379,0.69251,0.68105,0.66991,0.6825,0.69667,0.69959,0.70627,0.69667,0.68808,0.6899
max,12.88807,10.65815,20.14684,24.01247,20.18391,14.12983,16.71888,19.33571,17.76701,18.46634,17.11487,15.27432,16.43951,14.61263,13.33155,16.5806,16.48229,18.1353,22.35358,13.90986


In [11]:
accXDf.shape

(12149, 20)

In [12]:
# The number of measured values must be the same in every row; otherwise missing values
# are filled with NaN.
featuresDf.Messwerte[featuresDf.Messwerte != 20]

Series([], Name: Messwerte, dtype: int64)
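
Why the count matters can be seen in a small sketch with made-up rows of unequal length: pandas pads the shorter row with a missing value, which becomes NaN once the columns are converted to numbers. The names `ragged` and `raggedDf` are hypothetical, and `pandas.to_numeric` is used here instead of the notebook's NumPy cast.

```python
import pandas as pd

# Made-up example: the second row has only two readings instead of three
ragged = ["0.1 0.2 0.3".split(" "), "0.4 0.5".split(" ")]

# pandas pads the short row with a missing value; to_numeric turns it into NaN
raggedDf = pd.DataFrame(ragged).apply(pd.to_numeric)

# Total number of NaN cells in the frame
print(raggedDf.isna().sum().sum())
# count() skips NaN, so it reports the readings that are actually present
print(raggedDf.count(axis=1).tolist())
```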

In [13]:
# Conversion from object to float values in a single line
accYDf = pandas.DataFrame(np.array(featuresDf.AccelerometerY.str.split(" ").tolist()).astype(float))

In [14]:
accYDf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12149 entries, 0 to 12148
Data columns (total 20 columns):
0     12149 non-null float64
1     12149 non-null float64
2     12149 non-null float64
3     12149 non-null float64
4     12149 non-null float64
5     12149 non-null float64
6     12149 non-null float64
7     12149 non-null float64
8     12149 non-null float64
9     12149 non-null float64
10    12149 non-null float64
11    12149 non-null float64
12    12149 non-null float64
13    12149 non-null float64
14    12149 non-null float64
15    12149 non-null float64
16    12149 non-null float64
17    12149 non-null float64
18    12149 non-null float64
19    12149 non-null float64
dtypes: float64(20)
memory usage: 1.9 MB


In [15]:
accYDf.head(2)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,1.06042,0.64921,0.53476,0.61175,0.19835,-0.36223,-0.56553,-0.74348,-0.42617,0.39463,0.49966,0.62964,0.88686,0.7402,0.37772,0.02643,-0.10137,-0.34151,-0.5796,0.10334
1,-0.37129,-0.86392,-0.73702,-0.77342,0.22409,4.22459,3.76281,0.43607,-0.90755,-2.02841,-1.95977,-0.55643,1.50094,2.01293,1.64107,-0.43389,-2.50756,-2.20519,-1.62616,-1.92911


In [16]:
accZDf = pandas.DataFrame(np.array(featuresDf.AccelerometerZ.str.split(" ").tolist()).astype(float))

In [17]:
nickDf = pandas.DataFrame(np.array(featuresDf.Nick.str.split(" ").tolist()).astype(float))

In [18]:
rollDf = pandas.DataFrame(np.array(featuresDf.Roll.str.split(" ").tolist()).astype(float))

In [19]:
azimuthDf = pandas.DataFrame(np.array(featuresDf.Azimuth.str.split(" ").tolist()).astype(float))

In [20]:
accXMean = accXDf.T.mean()
accXMean.head(2)

0    0.150673
1   -0.007415
dtype: float64
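
As a side note, the transpose is not strictly needed: `DataFrame.mean(axis=1)` computes the same row-wise means. A minimal sketch with made-up values (the name `demoDf` is hypothetical):

```python
import numpy as np
import pandas as pd

demoDf = pd.DataFrame([[1.0, 2.0, 3.0], [-1.0, 0.0, 4.0]])

viaTranspose = demoDf.T.mean()   # approach used in the notebook
viaAxis = demoDf.mean(axis=1)    # same result without transposing

print(np.allclose(viaTranspose, viaAxis))
```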

In [21]:
# Create new columns and assign the mean values
featuresDf['MittelX'] = accXMean
featuresDf['MittelY'] = accYDf.T.mean()
featuresDf['MittelZ'] = accZDf.T.mean()
featuresDf['MittelNick'] = nickDf.T.mean()
featuresDf['MittelRoll'] = rollDf.T.mean()

In [22]:
featuresDf.MittelX.head(2)

0    0.150673
1   -0.007415
Name: MittelX, dtype: float64

In [23]:
featuresDf.columns

Index(['Zeitstempel', 'Breitengrad', 'Laengengrad', 'Geschwindigkeit',
       'AccelerometerX', 'AccelerometerY', 'AccelerometerZ', 'Azimuth', 'Nick',
       'Roll', 'SensorZeitstempel', 'Messwerte', 'StartBewegungsD',
       'StartBelichtung', 'Belichtungszeit', 'MittelX', 'MittelY', 'MittelZ',
       'MittelNick', 'MittelRoll'],
      dtype='object')

Computes the variance. The function takes the row means (`meansDf`) and the DataFrame holding the   
float values (`dfValues`) as parameters. The variance is the mean of the squared  
differences from the mean.   
Prec.:    
Postc.: Returns the computed variance as a float, or 0 if the number of values is <= 0 (note that the code does not check this case explicitly)  

In [24]:
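def calcVariance(meansDf, dfValues):
    # One variance per row: mean of the squared differences from the row mean
    variance = []
    for i, row in dfValues.iterrows():
        total = 0  # named "total" to avoid shadowing the built-in sum()
        for value in row:
            tempDifference = value - meansDf[i]
            total += tempDifference * tempDifference
        # Divide by the number of non-NaN values and round to 5 decimals
        variance.append("{0:.5f}".format(round(total / row.count(), 5)))
    return np.array(variance).astype(float)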
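
For comparison, the same population variance, $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, can be sketched in vectorized form. This is an alternative under the assumption that each row's own mean is used. It is not the notebook's implementation, and `rowVariance` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def rowVariance(valuesDf: pd.DataFrame) -> np.ndarray:
    """Population variance of every row, rounded to five decimals."""
    # ddof=0 divides by n, matching "mean of the squared differences"
    return valuesDf.var(axis=1, ddof=0).round(5).to_numpy()


# Values taken from the notebook's testCalcVariance5 (mean 12.508)
demo = pd.DataFrame([[1.24, 2.52, 10.43, 42.45, 5.9]])
print(rowVariance(demo))
```

Unlike `calcVariance`, this sketch does not accept externally supplied means, so it cannot reproduce results for a mean that differs from the row's actual mean.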

Computes the standard deviation from the variance, i.e. the square root of the variance.   
The absolute value of the variance is used for the calculation.    
Prec.:
Postc.:  The standard deviation is returned.

In [25]:
def calculateStandardDeviation(varianceDf):
    deviation = []
    for v in varianceDf:
        temp = math.sqrt(np.abs(v))
        deviation.append("{0:.5f}".format(round(temp,5)))
    return np.array(deviation).astype(float)
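
The loop above can also be written without explicit iteration. The following is a hedged sketch, not the notebook's code; `rowStandardDeviation` is a hypothetical name, and the input value 451.5 is borrowed from the notebook's unit test.

```python
import numpy as np
import pandas as pd


def rowStandardDeviation(variances: pd.Series) -> np.ndarray:
    """Square root of the absolute variance, rounded to five decimals."""
    return np.round(np.sqrt(np.abs(variances.to_numpy(dtype=float))), 5)


# 451.5 is the variance used in the notebook's unit test
print(rowStandardDeviation(pd.Series([451.5, 0.0])))
```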

The function computes the angular difference in rad between the first and the last measured yaw angle (azimuth)   
of each row of the DataFrame passed as argument. The returned value    
is never negative and gives the relative change in rad.      
Prec.:   
Postc.: Angle change in rad is returned   

In [26]:
def calculateAngelChangeAzimuth(azimuthDf):
    result = []
    for i, values in azimuthDf.iterrows():
        # Convert the first and last azimuth of the row from rad to degrees
        first = values.iloc[0] * (180 / math.pi)
        last = values.iloc[-1] * (180 / math.pi)
        resultTemp = np.abs(first - last)
        # Take the shorter way around the circle
        if resultTemp > 180:
            resultTemp = 360 - resultTemp
        # Convert back to rad and round to 5 decimals
        result.append("{0:.5f}".format(round(resultTemp * (math.pi / 180), 5)))
    return np.array(result).astype(float)
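
The detour through degrees is not required for the wrap-around: the same shortest angular difference can be computed directly in radians by comparing against π. A hedged, vectorized sketch follows; the function name `angleChange` is an assumption, and the sketch presumes azimuth values within [-π, π].

```python
import numpy as np
import pandas as pd


def angleChange(azimuthDf: pd.DataFrame) -> np.ndarray:
    """Absolute change between first and last azimuth of each row, in rad."""
    first = azimuthDf.iloc[:, 0].to_numpy(dtype=float)
    last = azimuthDf.iloc[:, -1].to_numpy(dtype=float)
    diff = np.abs(first - last)
    # Differences above pi wrap around the circle: take the shorter way
    diff = np.where(diff > np.pi, 2 * np.pi - diff, diff)
    return np.round(diff, 5)


# Made-up rows: one without wrap-around, one crossing the +/- pi boundary
demo = pd.DataFrame([[-1.57, 1.0, 1.57], [3.0, 0.0, -3.0]])
print(angleChange(demo))
```

In the second row the raw difference is 6.0 rad, which exceeds π, so the shorter arc 2π - 6.0 is returned instead.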

In [27]:
featuresDf['AzimuthAenderung'] = calculateAngelChangeAzimuth(azimuthDf)

In [28]:
# Compute the variance
featuresDf['VarianzX'] = calcVariance(featuresDf.MittelX,accXDf)
featuresDf['VarianzY'] = calcVariance(featuresDf.MittelY,accYDf)
featuresDf['VarianzZ'] = calcVariance(featuresDf.MittelZ,accZDf)
featuresDf['VarianzNick'] = calcVariance(featuresDf.MittelNick,nickDf)
featuresDf['VarianzRoll'] = calcVariance(featuresDf.MittelRoll,rollDf)

In [29]:
featuresDf.VarianzX.head()

0    0.40178
1    0.24189
2    0.04570
3    0.08168
4    0.12548
Name: VarianzX, dtype: float64

In [30]:
# Compute the standard deviation
featuresDf['AbweichungX'] = calculateStandardDeviation(featuresDf.VarianzX)
featuresDf['AbweichungY'] = calculateStandardDeviation(featuresDf.VarianzY)
featuresDf['AbweichungZ'] = calculateStandardDeviation(featuresDf.VarianzZ)
featuresDf['AbweichungNick'] = calculateStandardDeviation(featuresDf.VarianzNick)
featuresDf['AbweichungRoll'] = calculateStandardDeviation(featuresDf.VarianzRoll)

In [31]:
featuresDf.AbweichungX.head()

0    0.63386
1    0.49182
2    0.21378
3    0.28580
4    0.35423
Name: AbweichungX, dtype: float64

In [32]:
accXDf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12149 entries, 0 to 12148
Data columns (total 20 columns):
0     12149 non-null float64
1     12149 non-null float64
2     12149 non-null float64
3     12149 non-null float64
4     12149 non-null float64
5     12149 non-null float64
6     12149 non-null float64
7     12149 non-null float64
8     12149 non-null float64
9     12149 non-null float64
10    12149 non-null float64
11    12149 non-null float64
12    12149 non-null float64
13    12149 non-null float64
14    12149 non-null float64
15    12149 non-null float64
16    12149 non-null float64
17    12149 non-null float64
18    12149 non-null float64
19    12149 non-null float64
dtypes: float64(20)
memory usage: 1.9 MB


In [33]:
accXDf.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,1.21672,0.42177,-0.3521,-0.86394,-1.08952,0.06309,0.84727,0.923,0.95294,0.90027,0.2299,-0.33705,-0.33092,-0.46391,-0.55499,-0.09155,0.2179,0.2816,0.5624,0.48058
1,0.54781,0.31568,0.06868,-0.03698,-0.382,-0.93382,-0.62446,0.02143,0.81395,1.06489,0.57611,0.20042,-0.13079,-0.42639,-0.69353,-0.03384,-0.3948,-0.11662,0.05994,-0.04397
2,-0.19796,-0.23496,-0.24925,-0.29133,-0.27901,-0.14658,-0.05596,-0.01411,-0.01127,0.05229,0.37895,0.24189,0.1782,0.1579,0.01907,-0.26054,-0.03987,0.07538,-0.3534,0.36086
3,0.14269,0.03755,0.13732,0.33972,0.40969,0.06728,-0.22198,-0.26951,-0.3535,0.25352,0.11089,-0.40161,-0.09143,-0.33361,-0.03703,0.33814,-0.58755,0.41869,0.25836,-0.00783
4,0.30531,0.52007,-0.02829,-0.0073,0.33128,0.08117,0.17221,0.13778,-0.65591,-0.26423,-0.15008,-0.48779,-0.09908,-0.6462,-0.2258,-0.48709,-0.46628,-0.7714,-0.4792,-0.26076


In [34]:
featuresDf.MittelX.head()

0    0.150673
1   -0.007415
2   -0.033485
3    0.010490
4   -0.174079
Name: MittelX, dtype: float64

In [35]:
# Write the changes to the CSV file
featuresDf.set_index('Zeitstempel', inplace=True)
featuresDf.to_csv('../merkmale.csv')
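
A small round trip illustrates how the index ends up in the file. This is a sketch with a made-up frame and a temporary, hypothetical file name (`merkmale_demo.csv`), not the notebook's data.

```python
import os
import tempfile

import pandas as pd

# Made-up frame standing in for featuresDf
demoDf = pd.DataFrame({"Zeitstempel": [1, 2], "MittelX": [0.15, -0.01]})

with tempfile.TemporaryDirectory() as tmpDir:
    path = os.path.join(tmpDir, "merkmale_demo.csv")  # hypothetical file name
    # The index is written as the first CSV column
    demoDf.set_index("Zeitstempel").to_csv(path)
    # index_col restores the timestamp as index when reading back
    restored = pd.read_csv(path, index_col="Zeitstempel")

print(restored.columns.tolist())
```

Using the non-inplace form of `set_index` means the cell can be re-run without a `KeyError`, because the original frame keeps its `Zeitstempel` column.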

In [39]:
pandas.read_csv("../merkmale.csv").columns

Index(['Zeitstempel', 'Breitengrad', 'Laengengrad', 'Geschwindigkeit',
       'AccelerometerX', 'AccelerometerY', 'AccelerometerZ', 'Azimuth', 'Nick',
       'Roll', 'SensorZeitstempel', 'Messwerte', 'StartBewegungsD',
       'StartBelichtung', 'Belichtungszeit', 'MittelX', 'MittelY', 'MittelZ',
       'MittelNick', 'MittelRoll', 'AzimuthAenderung', 'VarianzX', 'VarianzY',
       'VarianzZ', 'VarianzNick', 'VarianzRoll', 'AbweichungX', 'AbweichungY',
       'AbweichungZ', 'AbweichungNick', 'AbweichungRoll'],
      dtype='object')

In [38]:
# Unit tests
import unittest

class CalcTest(unittest.TestCase):
    
    
    # Tests calcVariance with a DataFrame containing the values [2.0, 2.0]
    # The mean should be 2.0
    # The expected result is 0.0
    def testCalcVariance2(self):
        meanDf = pandas.DataFrame([2.0])
        valuesDf = pandas.DataFrame([[2.0, 2.0]])
        npt.assert_almost_equal(calcVariance(meanDf[0],valuesDf), 0.0,2)
        
    # Tests calcVariance with a DataFrame containing the following values:
    # [1.24, 2.52, 10.43, 42.45, 5.9]
    # The mean is 12.508
    # The expected variance is 234.247016
    def testCalcVariance5(self):
        meanDf = pandas.DataFrame([12.508])
        valuesDf = pandas.DataFrame([[1.24, 2.52, 10.43, 42.45, 5.9]])
        npt.assert_almost_equal(calcVariance(meanDf[0],valuesDf), 234.24702,5)   
    
    # Tests calcVariance with a single data value of 0.0
    def testCalcVariance0(self):
        meanDf = pandas.DataFrame([0.0])
        valuesDf = pandas.DataFrame([[0.0]])
        npt.assert_almost_equal(calcVariance(meanDf[0],valuesDf),0.0,2)  
        
    # Tests calcVariance with negative data values
    def testCalcVarianceNegative(self):
        meanDf = pandas.DataFrame([-24.0])
        valuesDf = pandas.DataFrame([[-2.0,-24.0,-5.0,7.0]])
        npt.assert_almost_equal(calcVariance(meanDf[0],valuesDf),451.5,1) 
        
    # Tests calcVariance with a multi-row DataFrame
    def testCalcVarianceMultidim(self):
        meanDf = pandas.DataFrame([-24.0,0.0])
        valuesDf = pandas.DataFrame([[-2.0,-24.0,-5.0,7.0],[0.0,0.0,0.0,0.0]])
        # NumPy's testing helper is used here to compare the result (two arrays)
        npt.assert_array_equal(calcVariance(meanDf[0],valuesDf),np.array([ 451.5,0.0])) 
        
    # Tests the function calculateStandardDeviation.
    def testCalculateStandardDeviation(self):
        varianceDf = pandas.DataFrame([451.5])
        npt.assert_almost_equal(calculateStandardDeviation(varianceDf[0]),21.24853,5) 
        
    # Tests the function calculateStandardDeviation
    # with an argument value of 0.0.
    def testCalculateStandardDeviationZero(self):
        varianceDf = pandas.DataFrame([0.0])
        npt.assert_almost_equal(calculateStandardDeviation(varianceDf[0]),0.0,2)
    
    # Tests the function calculateAngelChangeAzimuth with two equal rad values
    def testCalculateAngelChangeAzimuthEqual(self):
        radiants = pandas.DataFrame([[1.0,1.0]])
        npt.assert_almost_equal(calculateAngelChangeAzimuth(radiants),0.0,2)
    
    # Tests whether the angular difference is computed correctly for
    # the radian values -1.0 and 1.0
    def testCalculateAngelChangeAzimuthOne(self):
        radiants = pandas.DataFrame([[-1.0,1.0]])
        npt.assert_almost_equal(calculateAngelChangeAzimuth(radiants),2.0,2)  
        
    # Tests whether the angular difference is computed correctly for
    # the radian values 1.0 and -1.0
    def testCalculateAngelChangeAzimuthOne2(self):
        radiants = pandas.DataFrame([[1.0,-1.0]])
        npt.assert_almost_equal(calculateAngelChangeAzimuth(radiants),2.0,2)  

    # Tests the function calculateAngelChangeAzimuth with several values
    def testCalculateAngelChangeAzimuthValues(self):
        radiants = pandas.DataFrame([[-1.57,1.0,2.0,1.3,1.57]])
        npt.assert_almost_equal(calculateAngelChangeAzimuth(radiants),3.14,2)
        
    # Tests whether the angular difference is computed correctly for
    # the radian values 0.01 and 0.02
    def testCalculateAngelChangeAzimuthSmall(self):
        radiants = pandas.DataFrame([[0.01,0.02]])
        npt.assert_almost_equal(calculateAngelChangeAzimuth(radiants), 0.01,2)  
        
    # Tests whether the angular difference is computed correctly for
    # negative radian values
    def testCalculateAngelChangeAzimuthNegativ(self):
        radiants = pandas.DataFrame([[-2.9,-2.0,-1.28]])
        npt.assert_almost_equal(calculateAngelChangeAzimuth(radiants), 1.62,2) 

unittest.main(argv=[''], verbosity=2, exit=False)

testCalcVariance0 (__main__.CalcTest) ... ok
testCalcVariance2 (__main__.CalcTest) ... ok
testCalcVariance5 (__main__.CalcTest) ... ok
testCalcVarianceMultidim (__main__.CalcTest) ... ok
testCalcVarianceNegative (__main__.CalcTest) ... ok
testCalculateAngelChangeAzimuthEqual (__main__.CalcTest) ... ok
testCalculateAngelChangeAzimuthNegativ (__main__.CalcTest) ... ok
testCalculateAngelChangeAzimuthOne (__main__.CalcTest) ... ok
testCalculateAngelChangeAzimuthOne2 (__main__.CalcTest) ... ok
testCalculateAngelChangeAzimuthSmall (__main__.CalcTest) ... ok
testCalculateAngelChangeAzimuthValues (__main__.CalcTest) ... ok
testCalculateStandardDeviation (__main__.CalcTest) ... ok
testCalculateStandardDeviationZero (__main__.CalcTest) ... ok

----------------------------------------------------------------------
Ran 13 tests in 0.090s

OK


<unittest.main.TestProgram at 0x7fb0e8b0e9e8>