<a href="https://colab.research.google.com/github/EvanWAppel/work-examples/blob/main/LogParser.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

To get the slices out of CloudWatch, run the following CloudWatch Logs Insights query against the OnTimeProd-Events logs:

```
fields @timestamp, `detail-type`, detail.associate, detail.duration, detail.project, `detail.slice-type`, detail.end
| filter `detail-type` = 'slice' and detail.project != 'general'
| sort @timestamp desc
| limit 10000
```
Remember to set `limit` higher than the number of records you expect; otherwise CloudWatch will truncate the results at 1,000 rows.
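If you'd rather not export from the console, the same query can in principle be run from Python. The sketch below assumes a boto3 CloudWatch Logs client (`boto3.client("logs")`), whose `start_query` and `get_query_results` methods start a Logs Insights query and fetch its results. The log group name `OnTimeProd-Events` and the helper names `run_insights_query` / `results_to_frame` are assumptions for illustration. It polls until the query leaves the `Scheduled`/`Running` states, flattens each result row's `field`/`value` pairs into a DataFrame, and warns when the row count reaches the limit, which usually means the results were truncated.

```python
import time
import warnings

import pandas as pd

ROW_LIMIT = 10000

# Same Logs Insights query as above, with the limit pulled out into a constant
SLICE_QUERY = f"""fields @timestamp, `detail-type`, detail.associate, detail.duration, detail.project, `detail.slice-type`, detail.end
| filter `detail-type` = 'slice' and detail.project != 'general'
| sort @timestamp desc
| limit {ROW_LIMIT}"""


def results_to_frame(results):
    """Flatten get_query_results rows ([{'field': ..., 'value': ...}, ...]) into a DataFrame."""
    rows = [
        # @ptr is an internal pointer field that Logs Insights adds to every row
        {cell["field"]: cell["value"] for cell in row if cell["field"] != "@ptr"}
        for row in results
    ]
    return pd.DataFrame(rows)


def run_insights_query(client, log_group, start_time, end_time,
                       query=SLICE_QUERY, poll_seconds=1.0):
    """Hypothetical helper: run a Logs Insights query and return a DataFrame.

    `client` is assumed to be boto3.client("logs"); start_time and end_time are
    epoch seconds. The log group name (e.g. "OnTimeProd-Events") is an assumption.
    """
    query_id = client.start_query(
        logGroupName=log_group,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )["queryId"]
    # Poll until the query is no longer queued or running
    while True:
        response = client.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)
    if response["status"] != "Complete":
        raise RuntimeError(f"Query ended with status {response['status']}")
    frame = results_to_frame(response["results"])
    # Hitting the limit exactly is a strong hint that rows were cut off
    if len(frame) >= ROW_LIMIT:
        warnings.warn("Row count reached the query limit; results are probably truncated.")
    return frame
```

In a notebook this would be called as `run_insights_query(boto3.client("logs"), "OnTimeProd-Events", start, end)` with `start`/`end` in epoch seconds, which requires AWS credentials to be available to the Colab runtime.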


For this to work, the uploaded Excel file needs a column called "trunc_date" containing the timestamp reduced to a plain date. The formula =LEFT(A2,10) produces this (assuming the timestamp is in column A).
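`=LEFT(A2,10)` keeps the first ten characters of the timestamp, i.e. the `YYYY-MM-DD` part. If you'd rather not edit the spreadsheet, here is a minimal sketch of the same step in pandas; the sample timestamps are made up for illustration. Slicing the string mirrors `LEFT`, while `pd.to_datetime(...).dt.date` (the approach the processing cell further down uses on `@timestamp`) parses the value and yields `datetime.date` objects rather than strings.

```python
import pandas as pd

# Made-up sample of the exported @timestamp column
logs = pd.DataFrame({"@timestamp": ["2021-08-21 06:56:39.000", "2021-08-14 23:59:59.000"]})

# Equivalent of Excel's =LEFT(A2,10): keep the first 10 characters as a string.
# astype(str) also covers the case where read_excel already parsed the column as datetimes.
logs["trunc_date"] = logs["@timestamp"].astype(str).str[:10]

# Parsed alternative: one datetime.date per row
logs["parsed_date"] = pd.to_datetime(logs["@timestamp"]).dt.date

print(logs)
```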

In [None]:
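from google.colab import files

# Opens a file picker in the browser; returns a dict of {filename: file contents as bytes}
uploaded = files.upload()

for fn in uploaded.keys():
    print(f'User uploaded file "{fn}" with length {len(uploaded[fn])} bytes')
# `fn` now holds the name of the (last) uploaded file; later cells use it
print("This file's name is " + fn)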

Saving logs-insights-results (3).xlsx to logs-insights-results (3) (1).xlsx
User uploaded file "logs-insights-results (3).xlsx" with length 500533 bytes
This file's name is logs-insights-results (3).xlsx


In [None]:
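import io
import pandas as pd

# Read the uploaded bytes; './' + fn may be a stale copy (see "Saving ... (1).xlsx" above)
test = pd.read_excel(io.BytesIO(uploaded[fn]))
print(test.head())
test['new_date'] = pd.to_datetime(test['trunc_date']).dt.date
print(test.new_date)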

           @timestamp  ...          trunc_date
0 2021-08-21 06:56:39  ... 2021-08-21 06:56:39
1 2021-08-21 05:41:25  ... 2021-08-21 05:41:25
2 2021-08-21 04:14:16  ... 2021-08-21 04:14:16
3 2021-08-21 04:01:59  ... 2021-08-21 04:01:59
4 2021-08-21 04:01:25  ... 2021-08-21 04:01:25

[5 rows x 8 columns]
0       2021-08-21
1       2021-08-21
2       2021-08-21
3       2021-08-21
4       2021-08-21
           ...    
7205    2021-08-14
7206    2021-08-14
7207    2021-08-14
7208    2021-08-14
7209    2021-08-14
Name: new_date, Length: 7210, dtype: object


The pandas documentation has a guide to translating SQL concepts into pandas:
https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html#select

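I ended up using the pandas equivalent of the SQL ROW_NUMBER() window function: sort by timestamp (newest first), then number the rows within each associate/day group with groupby().cumcount().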
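To see the pattern in isolation, here it is on a tiny made-up DataFrame (the associates and projects are invented). In SQL the step reads roughly `ROW_NUMBER() OVER (PARTITION BY associate, trunc_date ORDER BY timestamp DESC)`. In pandas, sorting newest-first and calling `groupby(...).cumcount()` numbers rows within each group starting at 0, so adding 1 gives the same 1-based row number; assigning the result back to the unsorted frame works because pandas aligns on the index. When only `rn = 1` is needed, `drop_duplicates(keep="first")` on the sorted frame returns the same rows.

```python
import pandas as pd

# Made-up slices: alice has two slices on 2021-08-21 and one on 2021-08-20
logs = pd.DataFrame({
    "@timestamp": pd.to_datetime([
        "2021-08-21 04:01:25", "2021-08-21 06:56:39",
        "2021-08-21 05:41:25", "2021-08-20 09:00:00",
    ]),
    "detail.associate": ["alice", "alice", "bob", "alice"],
    "detail.project": ["p1", "p2", "p3", "p4"],
})
logs["trunc_date"] = logs["@timestamp"].dt.date

keys = ["detail.associate", "trunc_date"]
ordered = logs.sort_values("@timestamp", ascending=False)

# ROW_NUMBER() OVER (PARTITION BY associate, trunc_date ORDER BY @timestamp DESC)
logs["rn"] = ordered.groupby(keys).cumcount() + 1  # realigned to logs by index

latest = logs[logs["rn"] == 1].sort_values(keys)

# Same rows without an explicit row number: keep the first (newest) row per group
alt = ordered.drop_duplicates(subset=keys, keep="first").sort_values(keys)

print(latest[["detail.associate", "trunc_date", "detail.project", "rn"]])
```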

In [None]:
import io
import pandas as pd
from google.colab import files

# Import the Excel file from the uploaded bytes (the saved copy may have been renamed)
logs = pd.read_excel(io.BytesIO(uploaded[fn]))
# Truncate the timestamp to a date and store it in trunc_date
logs['trunc_date'] = pd.to_datetime(logs['@timestamp']).dt.date
print("Original Excel File")
print(logs.head())
# This is effectively a ROW_NUMBER() window function: sort newest first, number the
# rows within each associate/day group (cumcount starts at 0, hence + 1), keep rn == 1.
# The result is the most recent row per associate per day.
new_logs = (
    logs.assign(
        rn=logs.sort_values(["@timestamp"], ascending=False)
        .groupby(["detail.associate", "trunc_date"])
        .cumcount()
        + 1
    )
    .query("rn == 1")
    .sort_values(["detail.associate", "trunc_date"])
)
print("Processed table")
print(new_logs.head())
# What's left is the last non-general slice per person per day
final = new_logs[["detail.associate", "trunc_date", "detail.project"]]
print("Print Preview")
print(final.head())
# Write without the pandas index column, then download
final.to_csv('result.csv', index=False)
files.download('result.csv')

Original Excel File
           @timestamp detail-type  ...                   detail.end  trunc_date
0 2021-08-21 06:56:39       slice  ...  2021-08-21T06:47:27.143252Z  2021-08-21
1 2021-08-21 05:41:25       slice  ...  2021-08-21T05:30:11.966613Z  2021-08-21
2 2021-08-21 04:14:16       slice  ...  2021-08-21T04:14:04.539354Z  2021-08-21
3 2021-08-21 04:01:59       slice  ...  2021-08-21T04:01:58.481551Z  2021-08-21
4 2021-08-21 04:01:25       slice  ...  2021-08-21T04:01:24.439082Z  2021-08-21

[5 rows x 8 columns]
Processed table
           @timestamp detail-type  ...                   detail.end  trunc_date
0 2021-08-21 06:56:39       slice  ...  2021-08-21T06:47:27.143252Z  2021-08-21
1 2021-08-21 05:41:25       slice  ...  2021-08-21T05:30:11.966613Z  2021-08-21
2 2021-08-21 04:14:16       slice  ...  2021-08-21T04:14:04.539354Z  2021-08-21
3 2021-08-21 04:01:59       slice  ...  2021-08-21T04:01:58.481551Z  2021-08-21
4 2021-08-21 04:01:25       slice  ...  2021-08-21T04:01:24.43

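Since the goal is exactly one slice per associate per day, a quick check before sharing `result.csv` can catch duplicates early. This is a minimal sketch; the helper name `check_one_row_per_day` is hypothetical, and it assumes the column names used in the cell above. It raises if any associate/day pair appears more than once and otherwise returns the number of associates per day.

```python
import pandas as pd


def check_one_row_per_day(final):
    """Hypothetical sanity check: one row per (detail.associate, trunc_date)."""
    keys = ["detail.associate", "trunc_date"]
    # keep=False marks every member of a duplicated group, not just the repeats
    dupes = final[final.duplicated(subset=keys, keep=False)]
    if not dupes.empty:
        raise ValueError(f"{len(dupes)} rows share an associate/day:\n{dupes}")
    # Associates with a slice on each day, handy for eyeballing coverage
    return final.groupby("trunc_date")["detail.associate"].nunique()
```

Running `check_one_row_per_day(final)` right before `final.to_csv(...)` would stop the download if the ROW_NUMBER() step ever let a duplicate through.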