# Converting Timestamps to Minutes Per Hour

This notebook attempts to take non-uniform timestamps and:
* resample those timestamps into minutes
* forward-fill the states
* resample the minutes into hours, summing the minutes

The behavior of `.resample()` changed in pandas 0.18.0, so this was a learning process.
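As a rough illustration of the newer API (a sketch on a made-up three-row series, not the library data): `.resample()` on its own only describes the time bins, and a follow-up method such as `.ffill()` or `.sum()` produces the actual result. The names `demo`, `resampler`, and `filled` are assumptions for this example, and the `'min'` alias is used for the one-minute frequency.

```python
import pandas as pd

# Hypothetical three-point series, used only to show the shape of the API.
idx = pd.to_datetime(['2017-08-30 09:00', '2017-08-30 09:02', '2017-08-30 09:05'])
demo = pd.Series([1, 0, 1], index=idx)

# Older releases took the aggregation as a keyword, e.g. resample('T', how='sum').
# In the newer API, resample() returns a deferred Resampler object ...
resampler = demo.resample('min')

# ... and a method call on it yields the data: here one row per minute,
# with the gaps forward-filled from the previous observation.
filled = resampler.ffill()
```

Here `filled` holds six rows (09:00 through 09:05), with the missing minutes carrying the last seen value.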

In [19]:
import numpy as np
import pandas as pd
print("This should be '0.20.1':")
print("pandas:         " + str(pd.__version__))
print("This should be '1.12.1':")
print("numpy:          " + str(np.__version__))

This should be '0.20.1':
pandas:         0.20.1
This should be '1.12.1':
numpy:          1.12.1


Importing the data from the Library data sample.

In [22]:
useData = pd.read_csv(r'../data/170830_StateData.csv')

In [23]:
useData.head()

Unnamed: 0,computerName,datestamp,state
0,DMC0021,2017-08-31 00:00:46.393,offline
1,DMC0014,2017-08-31 00:00:46.450,offline
2,DMC0009,2017-08-31 00:00:46.480,offline
3,CRR028,2017-08-31 00:00:46.507,offline
4,CRR017,2017-08-31 00:00:46.510,offline


In [28]:
useData[pd.to_datetime(useData.datestamp).isnull()]

Unnamed: 0,computerName,datestamp,state
106,TC701\t2017-08-31 00:21:17.583\tin-use,,
107,TC5001\t2017-08-31 00:21:32.160\tin-use,,
119,TC701\t2017-08-31 02:02:57.047\tin-use,,
121,INC013\t2017-08-31 02:19:04.093\tin-use,,
124,JLPL0059\t2017-08-31 03:45:57.133\tin-use,,
125,INC004\t2017-08-31 03:57:31.240\tin-use,,
129,TC20001\t2017-08-31 05:04:03.520\tin-use,,
130,INC005\t2017-08-31 05:10:20.537\tin-use,,
132,TC211\t2017-08-31 05:57:32.640\tin-use,,
244,RRK001\t2017-08-31 06:30:33.860\tin-use,,
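The rows above look like whole records that were tab-delimited and ended up packed into `computerName`. One possible repair, sketched on a hypothetical two-row frame (`raw`, `bad`, `parts`, and `repaired` are illustrative names, and the assumption is that every malformed row has exactly three tab-separated fields):

```python
import pandas as pd

# Hypothetical frame mimicking the problem: the second record sits entirely
# in computerName, separated by tabs, leaving the other columns empty.
raw = pd.DataFrame({
    'computerName': ['DMC0021', 'TC701\t2017-08-31 00:21:17.583\tin-use'],
    'datestamp': ['2017-08-31 00:00:46.393', None],
    'state': ['offline', None],
})

cols = ['computerName', 'datestamp', 'state']
bad = raw['datestamp'].isnull()

# Split the packed field on tabs into three separate columns.
parts = raw.loc[bad, 'computerName'].str.split('\t', expand=True)

# Write the split values back into a copy, then parse the timestamps.
repaired = raw.copy()
repaired.loc[bad, cols] = parts.values
repaired['datestamp'] = pd.to_datetime(repaired['datestamp'])
```

Whether the real file needs this, or simply a different `sep` when reading it, would have to be checked against the source CSV.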


Verifying the column dtypes. 'datestamp' should be a datetime64 field, but `info()` below reports it as `object`.

In [17]:
useData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3873 entries, 0 to 3872
Data columns (total 4 columns):
machineName    3873 non-null object
location       3873 non-null object
state          3873 non-null object
datestamp      3873 non-null object
dtypes: object(4)
memory usage: 121.1+ KB
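Since `datestamp` shows up as `object` rather than datetime64, one way to convert it is `pd.to_datetime` with `errors='coerce'`, which turns unparseable strings into `NaT` so they can be inspected afterwards. A minimal sketch on a hypothetical two-row stand-in (`sample` and `unparsed` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Hypothetical stand-in for useData; the second timestamp is deliberately bad.
sample = pd.DataFrame({
    'machineName': ['CRR005', 'CRR005'],
    'state': ['in-use', 'available'],
    'datestamp': ['2017-08-31 09:09:00', 'not a timestamp'],
})

# errors='coerce' replaces unparseable strings with NaT instead of raising.
sample['datestamp'] = pd.to_datetime(sample['datestamp'], errors='coerce')

# Rows that failed to parse can then be listed for inspection.
unparsed = sample[sample['datestamp'].isnull()]
```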


Testing the approach on a single computer first. Iterating across the array of machine names comes later.

In [4]:
computerDataName = 'CRR005'

In [5]:
computerTimeArray = useData[useData.machineName == computerDataName]

In [6]:
computerTimeArray

Unnamed: 0,machineName,location,state,datestamp
0,CRR005,Current Periodicals,in-use,2017-08-31 09:09:00
4,CRR005,Current Periodicals,restarted,2017-08-31 09:07:00
6,CRR005,Current Periodicals,offline,2017-08-31 09:07:00
7,CRR005,Current Periodicals,available,2017-08-31 09:07:00
64,CRR005,Current Periodicals,in-use,2017-08-31 08:45:00
248,CRR005,Current Periodicals,restarted,2017-08-31 06:30:00
461,CRR005,Current Periodicals,available,2017-08-30 22:39:00
655,CRR005,Current Periodicals,in-use,2017-08-30 20:18:00
687,CRR005,Current Periodicals,available,2017-08-30 19:57:00
784,CRR005,Current Periodicals,in-use,2017-08-30 19:17:00


This turns the 'in-use' value into True and all other values into False. Since this analysis is about when a machine is being used (as opposed to when it is offline, available, or restarted), the other states do not need to be told apart.

The cell below raises a `SettingWithCopyWarning`; the copy-versus-view behavior still needs investigating. This may need to be done on a separate DataFrame built with `pd.concat()`.

In [7]:
computerTimeArray.loc[:, 'state'] = (computerTimeArray.state == 'in-use')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
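The warning appears because the filtered frame may be a view of the original rather than an independent object. A common way around it, sketched here on a hypothetical stand-in frame (`frame` and `subset` are illustrative names), is to take an explicit `.copy()` of the filtered rows before assigning:

```python
import pandas as pd

# Hypothetical stand-in for useData.
frame = pd.DataFrame({
    'machineName': ['CRR005', 'INC001', 'CRR005'],
    'state': ['in-use', 'in-use', 'offline'],
})

# .copy() makes the subset an independent DataFrame, so the assignment
# below is unambiguous and leaves the original frame untouched.
subset = frame[frame.machineName == 'CRR005'].copy()
subset['state'] = subset['state'] == 'in-use'
```

This is only one option; it trades a little memory for not having to reason about views.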


Location data is irrelevant at this point: location can be derived from the machine name, and the recorded machine location is (for now) not that precise.

Pandas was giving a duplicate error because the three non-'in-use' states occur at the same timestamp. Dropping the unused columns (machineName and location) and then dropping the resulting duplicate rows avoids this.

In [8]:
computerTimeArray = computerTimeArray.loc[:,'state':'datestamp'].drop_duplicates()

In [9]:
computerTimeArray = computerTimeArray.set_index('datestamp').sort_index()

In [10]:
computerTimeArray

datestamp,state
2017-08-30 06:30:00,False
2017-08-30 09:23:00,True
2017-08-30 09:43:00,False
2017-08-30 10:03:00,True
2017-08-30 11:18:00,False
2017-08-30 11:23:00,False
2017-08-30 11:25:00,True
2017-08-30 11:44:00,False
2017-08-30 11:47:00,True
2017-08-30 13:35:00,False
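To see why dropping duplicates helps: once the three simultaneous non-'in-use' states are all False, they become identical (state, datestamp) rows, and `drop_duplicates()` collapses them to one. A small sketch on made-up rows (`events` and `deduped` are illustrative names):

```python
import pandas as pd

# Hypothetical rows: three simultaneous non-'in-use' events (already False)
# followed by one 'in-use' event two minutes later.
events = pd.DataFrame({
    'state': [False, False, False, True],
    'datestamp': pd.to_datetime([
        '2017-08-31 09:07', '2017-08-31 09:07',
        '2017-08-31 09:07', '2017-08-31 09:09',
    ]),
})

# Identical (state, datestamp) rows collapse into one, leaving a unique index.
deduped = events.drop_duplicates().set_index('datestamp').sort_index()
is_unique = deduped.index.is_unique
```

Note that rows sharing a timestamp but differing in state (one True, one False) would survive `drop_duplicates()`, so the index is not guaranteed to be unique in general.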


This resamples the data above into one-minute increments. A minute that has a recorded value keeps it, while a minute with no record (NaN) takes the most recent earlier value. The minutes are then resampled into hours and summed, giving the minutes in use per hour.

In [11]:
computerTimeArrayMin = computerTimeArray.resample('T').ffill()

In [12]:
computerTimeArrayPerHour = computerTimeArrayMin.resample('H').sum()

In [13]:
computerTimeArrayPerHour

datestamp,state
2017-08-30 06:00:00,0.0
2017-08-30 07:00:00,0.0
2017-08-30 08:00:00,0.0
2017-08-30 09:00:00,20.0
2017-08-30 10:00:00,57.0
2017-08-30 11:00:00,50.0
2017-08-30 12:00:00,60.0
2017-08-30 13:00:00,46.0
2017-08-30 14:00:00,60.0
2017-08-30 15:00:00,47.0
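The arithmetic can be checked by hand on the ten rows shown in the earlier single-machine table. The sketch below rebuilds just those rows as a boolean Series (`in_use`, `per_minute`, and `per_hour` are illustrative names) and repeats the two resampling steps. It uses `'min'` and `'60min'` instead of `'T'` and `'H'` only because alias spellings differ between pandas versions.

```python
import pandas as pd

# The ten (datestamp, state) rows from the table above; True means in-use.
stamps = pd.to_datetime([
    '2017-08-30 06:30', '2017-08-30 09:23', '2017-08-30 09:43',
    '2017-08-30 10:03', '2017-08-30 11:18', '2017-08-30 11:23',
    '2017-08-30 11:25', '2017-08-30 11:44', '2017-08-30 11:47',
    '2017-08-30 13:35',
])
in_use = pd.Series(
    [False, True, False, True, False, False, True, False, True, False],
    index=stamps,
)

# Step 1: one row per minute; minutes without an event inherit the last state.
per_minute = in_use.resample('min').ffill()

# Step 2: True counts as 1, so summing each hour gives minutes in use.
per_hour = per_minute.resample('60min').sum()
```

For example, the machine is in use from 09:23 up to (not including) 09:43, which is 20 minutes in the 09:00 hour. The 10:00, 11:00, and 12:00 hours come out as 57, 50, and 60, agreeing with the output above. The 13:00 hour differs because this sketch stops at the last row shown.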


In [14]:
computerNames = useData.iloc[:,0].unique()
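
The list of names could eventually drive a per-machine version of the steps tested above. A possible shape for that, sketched on a hypothetical two-machine sample: the helper `minutes_in_use_per_hour` and the frame `sample` are assumptions for illustration, and `groupby` stands in for looping over the names array by hand.

```python
import pandas as pd

def minutes_in_use_per_hour(events):
    """Hypothetical helper: expects a boolean 'state' column and a
    datetime 'datestamp' column, returns minutes in use per hour."""
    series = (events[['state', 'datestamp']]
              .drop_duplicates()
              .set_index('datestamp')
              .sort_index()['state'])
    # Minute-level forward fill, then sum the True minutes in each hour.
    return series.resample('min').ffill().resample('60min').sum()

# Hypothetical two-machine sample with states already converted to booleans.
sample = pd.DataFrame({
    'machineName': ['A', 'A', 'B', 'B'],
    'state': [True, False, True, False],
    'datestamp': pd.to_datetime(['2017-08-30 09:23', '2017-08-30 09:43',
                                 '2017-08-30 10:00', '2017-08-30 10:05']),
})

# One hourly Series per machine, keyed by machine name.
per_machine = {name: minutes_in_use_per_hour(group)
               for name, group in sample.groupby('machineName')}
```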

In [15]:
for i in computerNames.tolist():
    print(i)

CRR005
INC001
CRR006
RIS01
TC209
JLPL0028
INC004
INC015
RIS20
MLC0006
CRR003
CRR004
INC007
MLC0005
INC003
CRR031
RIS003
CRR024
INC006
MLC0002
INC013
DMC0022
CRR009
RIS010
RIS24
RIS016
CRR012
INC009
TC206
BL001
DMC0006
DMC0010
INC010
MLC0015
MLC0013
MLC0014
TC701
MLC0001
RIS027
CRR017
RIS23
MLC0003
INC005
RIS004
CRR019
CRR015
RIS18
TC30001
RIS09
MLC0004
INC008
CRR002
CRR011
CRR029
JLPL0008
RIS006
DMC0002
DMC026
JLPL041
TC212
MLC0007
TC211
CRR039
INC014
DMC0014
TC20001
INC012
DMC0032
TL30002
RRK001
DMC0009
DMC027
RIS035
CRR013
DMC025
DMC029
TL702
TL901
CRR028
RIS029
RIS030
MLC0009
CRR014
RIS02
CRR016
DMC0011
TC8001
CRR010
CRR037
TL6002
DMC030
CRR018
DMC0034
TL5002
CRR020
DMC0001
DMC0018
TC901
CRR021
DMC0017
CITI018
INC011
RIS14
DMC0013
MLC0011
DMC0020
DMC0012
RIS26
RIS21
DMC0033
DMC0015
DMC0019
DMC0016
MLC0008
TC205
TC208
CRR001
CRR027
CRR034
MLC0016
RIS13
RIS12
DMC0023
CRR032
RIS15
CRR023
MLC0010
MLC0012
DMC0021
MLC0017
DMC0031
DMC0007
DMC0024
DMC0008
DMC0005
DDL0001
TC204
MLC0018
CRR03