# Measuring the run time of GRAPES operational models

## GRAPES MESO 3km

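**Preparing the data**

Measuring run time requires reading entries from the ecFlow log. To speed this up, we recommend using a command such as `grep` to extract the entries for the nodes of interest into a separate file.

For example, the command below extracts the log entries of the model integration task for every cycle of the GRAPES MESO 3km model. The negative lookahead `(?!_)` excludes nodes whose names continue with an underscore after `fcst`.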

```bash
grep -P '/grapes_meso_3km_v4_4/(00|06|12|18)/model/fcst(?!_)' \
    /g1/u/nwp_qu/ecfworks/ecflow/login_b01.31067.ecf.log \
    > fcst.txt
```
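
Where `grep -P` is not available, the same filter can be sketched in Python. This is only an illustration: the `filter_lines` helper and the `fcst_monitor` node in the sample are hypothetical and are not part of any NWPC tool.

```python
import re

# Same pattern as the grep -P command: match the fcst node of any cycle,
# but not nodes whose names continue with an underscore (negative lookahead).
PATTERN = re.compile(r"/grapes_meso_3km_v4_4/(00|06|12|18)/model/fcst(?!_)")


def filter_lines(lines):
    """Return only the lines that mention the fcst node (hypothetical helper)."""
    return [line for line in lines if PATTERN.search(line)]


sample = [
    "LOG:[16:37:29 13.2.2020]  active: /grapes_meso_3km_v4_4/12/model/fcst",
    # Hypothetical sibling node, used only to show what the lookahead rejects.
    "LOG:[16:37:29 13.2.2020]  active: /grapes_meso_3km_v4_4/12/model/fcst_monitor",
    "LOG:[16:37:29 13.2.2020]  active: /grapes_meso_3km_v4_4/12/post/upload",
]
kept = filter_lines(sample)
print(kept)
```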

The extracted file looks like this:

```
LOG:[16:37:29 13.2.2020]  active: /grapes_meso_3km_v4_4/12/model/fcst
MSG:[17:44:33 13.2.2020] chd:complete /grapes_meso_3km_v4_4/12/model/fcst
LOG:[17:44:33 13.2.2020]  complete: /grapes_meso_3km_v4_4/12/model/fcst
LOG:[22:36:46 13.2.2020]  submitted: /grapes_meso_3km_v4_4/18/model/fcst job_size:5267
MSG:[22:36:47 13.2.2020] chd:init /grapes_meso_3km_v4_4/18/model/fcst
LOG:[22:36:47 13.2.2020]  active: /grapes_meso_3km_v4_4/18/model/fcst
MSG:[23:34:17 13.2.2020] chd:complete /grapes_meso_3km_v4_4/18/model/fcst
LOG:[23:34:17 13.2.2020]  complete: /grapes_meso_3km_v4_4/18/model/fcst
LOG:[23:36:15 13.2.2020]  queued: /grapes_meso_3km_v4_4/00/model/fcst
LOG:[23:36:15 13.2.2020]  queued: /grapes_meso_3km_v4_4/06/model/fcst
LOG:[23:36:15 13.2.2020]  queued: /grapes_meso_3km_v4_4/12/model/fcst
```
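
To make the record layout concrete, here is a minimal sketch that splits one such line into its parts using only the standard library. It is not the parser used by `nwpc_workflow_log_collector`; the `parse_line` function and the dictionary keys are assumptions made for illustration, based solely on the sample lines above.

```python
import re
from datetime import datetime

# Record kind, bracketed timestamp, event token, node path.
LINE_RE = re.compile(r"^(\w+):\[(.+?)\]\s+(\S+)\s+(/\S+)")


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    kind, stamp, event, path = match.groups()
    return {
        "kind": kind,
        # The sample timestamps are written as H:M:S followed by D.M.YYYY.
        "time": datetime.strptime(stamp, "%H:%M:%S %d.%m.%Y"),
        "event": event.rstrip(":"),
        "path": path,
    }


record = parse_line(
    "LOG:[16:37:29 13.2.2020]  active: /grapes_meso_3km_v4_4/12/model/fcst"
)
print(record)
```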

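Import all the modules needed in this example, including

- the data analysis library `pandas`, and `scipy`, which is used to compute the trimmed mean
- the abstract log analysis model `nwpc_workflow_log_model`
- the ecFlow log reading and parsing tool `nwpc_workflow_log_collector`
- the node situation calculator `SituationCalculator`
- the node table data processor `NodeTableProcessor`
- the time period presenter `TimePeriodPresenter`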

In [1]:
import datetime

import pandas as pd
from scipy import stats

from nwpc_workflow_log_model.analytics import (
    TaskStatusChangeDFA,
    TaskSituationType,
)

from nwpc_workflow_log_collector.ecflow.log_file_util import get_record_list

from nwpc_workflow_log_tool.situation import SituationCalculator
from nwpc_workflow_log_tool.presenter import TimePeriodPresenter
from nwpc_workflow_log_tool.processor import NodeTableProcessor

Set the log file path, the node path, and the start and stop dates.

In [2]:
file_path = "../dist/log/grapes_meso_3km/fcst.txt"
node_path = "/grapes_meso_3km_v4_4/00/model/fcst"
stop_date = datetime.datetime(2020, 3, 24)
start_date = stop_date - datetime.timedelta(days=30)

Use the `get_record_list` function to load log entries from the log file.

In [3]:
records = get_record_list(file_path, node_path, start_date, stop_date)

2020-03-25 07:54:07.441 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_record_list:161 - Finding line range in date range: 2020-02-23 00:00:00, 2020-03-24 00:00:00
2020-03-25 07:54:07.557 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:54 - counting total line number...
2020-03-25 07:54:07.558 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:56 - got total line number: 2126
2020-03-25 07:54:07.559 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:63 - finding begin line number for start_date 2020-02-23
0% [############################  ] 100% | ETA: 00:00:00
2020-03-25 07:54:07.564 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:93 - found begin line number for start_date 2020-02-23: 1368
2020-03-25 07:54:07.564 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:95 - finding end line number for stop_date 2020-03-24
0

Use `SituationCalculator` to compute the node's run situation for each day.

In [4]:
dfa_engine = TaskStatusChangeDFA
stop_states = (
    TaskSituationType.Complete,
)
dfa_kwargs = None
target_state = TaskSituationType.Complete

calculator = SituationCalculator(
    dfa_engine=dfa_engine,
    stop_states=stop_states,
    dfa_kwargs=dfa_kwargs,
)

situations = calculator.get_situations(
    records=records,
    node_path=node_path,
    start_date=start_date,
    end_date=stop_date,
)

2020-03-25 07:54:07.590 | INFO     | nwpc_workflow_log_tool.situation.situation_calculator:get_situations:46 - Finding StatusLogRecord for /grapes_meso_3km_v4_4/00/model/fcst
2020-03-25 07:54:07.590 | INFO     | nwpc_workflow_log_tool.situation.situation_calculator:get_situations:52 - Calculating node status change using DFA...
2020-03-25 07:54:07.613 | INFO     | nwpc_workflow_log_tool.situation.situation_calculator:get_situations:80 - Calculating node status change using DFA...Done
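
The internals of `TaskStatusChangeDFA` are not shown in this notebook. As a rough illustration of the general idea only, a deterministic finite automaton over status-change events might look like the sketch below. The `State` enum, the `TRANSITIONS` table, and `run_dfa` are all hypothetical and do not reflect the library's actual states or API.

```python
from enum import Enum


class State(Enum):
    # Hypothetical states, chosen to mirror the statuses seen in the log.
    INITIAL = "initial"
    SUBMITTED = "submitted"
    ACTIVE = "active"
    COMPLETE = "complete"
    ERROR = "error"


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (State.INITIAL, "submitted"): State.SUBMITTED,
    (State.SUBMITTED, "active"): State.ACTIVE,
    (State.ACTIVE, "complete"): State.COMPLETE,
}


def run_dfa(events):
    """Feed events to the automaton and return the state where it stops."""
    state = State.INITIAL
    for event in events:
        # Any event without a defined transition sends the automaton to ERROR.
        state = TRANSITIONS.get((state, event), State.ERROR)
        if state in (State.COMPLETE, State.ERROR):
            break
    return state


print(run_dfa(["submitted", "active", "complete"]))
```

In this toy version, stopping at `COMPLETE` plays the role that `stop_states` plays in the cell above.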


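Use `NodeTableProcessor` to build a table describing each day's run, where

- `situation` is the node's run situation (`TaskSituationType`); this example only considers tasks that completed normally.
- `submitted` is the submitted state, `active` is the running state, and `complete` is the completed state.
- `in_all` is the total elapsed time from submission to completion, `in_submitted` is the time the node spent queued, and `in_active` is the time the node spent running.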
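
The relationship between the time points and the three periods can be checked by hand. The sketch below rebuilds the periods for the first two days from the time points shown in the table output; the `points` and `periods` frames are illustrative and are not produced by `NodeTableProcessor`.

```python
import pandas as pd

# Time points copied from the first two rows of the table output.
points = pd.DataFrame(
    {
        "submitted": pd.to_datetime(["2020-02-23 04:38:51", "2020-02-24 04:39:36"]),
        "active": pd.to_datetime(["2020-02-23 04:38:52", "2020-02-24 04:39:39"]),
        "complete": pd.to_datetime(["2020-02-23 05:42:41", "2020-02-24 05:44:26"]),
    }
)
points.index = ["2020022300", "2020022400"]

periods = pd.DataFrame(
    {
        # Queue time: from submission until the job starts running.
        "in_submitted": points["active"] - points["submitted"],
        # Run time: from start of execution until completion.
        "in_active": points["complete"] - points["active"],
        # Total time: from submission until completion.
        "in_all": points["complete"] - points["submitted"],
    }
)
print(periods)
```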

In [5]:
processor = NodeTableProcessor(
    node_path=node_path,
    target_state=target_state,
)
table_data = processor.process(situations)
table_data

Unnamed: 0,start_time,situation,time_point_submitted,time_point_active,time_point_complete,time_point_aborted,time_period_in_all,time_period_in_all_start,time_period_in_all_end,time_period_in_submitted,time_period_in_submitted_start,time_period_in_submitted_end,time_period_in_active,time_period_in_active_start,time_period_in_active_end
2020022300,2020-02-23,Complete,2020-02-23 04:38:51,2020-02-23 04:38:52,2020-02-23 05:42:41,,01:03:50,2020-02-23 04:38:51,2020-02-23 05:42:41,00:00:01,2020-02-23 04:38:51,2020-02-23 04:38:52,01:03:49,2020-02-23 04:38:52,2020-02-23 05:42:41
2020022400,2020-02-24,Complete,2020-02-24 04:39:36,2020-02-24 04:39:39,2020-02-24 05:44:26,,01:04:50,2020-02-24 04:39:36,2020-02-24 05:44:26,00:00:03,2020-02-24 04:39:36,2020-02-24 04:39:39,01:04:47,2020-02-24 04:39:39,2020-02-24 05:44:26
2020022500,2020-02-25,Complete,2020-02-25 04:39:08,2020-02-25 04:39:11,2020-02-25 05:46:19,,01:07:11,2020-02-25 04:39:08,2020-02-25 05:46:19,00:00:03,2020-02-25 04:39:08,2020-02-25 04:39:11,01:07:08,2020-02-25 04:39:11,2020-02-25 05:46:19
2020022600,2020-02-26,Complete,2020-02-26 04:36:31,2020-02-26 04:36:32,2020-02-26 05:43:19,,01:06:48,2020-02-26 04:36:31,2020-02-26 05:43:19,00:00:01,2020-02-26 04:36:31,2020-02-26 04:36:32,01:06:47,2020-02-26 04:36:32,2020-02-26 05:43:19
2020022700,2020-02-27,Complete,2020-02-27 04:36:43,2020-02-27 04:36:46,2020-02-27 05:44:18,,01:07:35,2020-02-27 04:36:43,2020-02-27 05:44:18,00:00:03,2020-02-27 04:36:43,2020-02-27 04:36:46,01:07:32,2020-02-27 04:36:46,2020-02-27 05:44:18
2020022800,2020-02-28,Complete,2020-02-28 04:37:35,2020-02-28 04:37:35,2020-02-28 05:46:51,,01:09:16,2020-02-28 04:37:35,2020-02-28 05:46:51,00:00:00,2020-02-28 04:37:35,2020-02-28 04:37:35,01:09:16,2020-02-28 04:37:35,2020-02-28 05:46:51
2020022900,2020-02-29,Complete,2020-02-29 04:36:52,2020-02-29 04:36:52,2020-02-29 05:39:03,,01:02:11,2020-02-29 04:36:52,2020-02-29 05:39:03,00:00:00,2020-02-29 04:36:52,2020-02-29 04:36:52,01:02:11,2020-02-29 04:36:52,2020-02-29 05:39:03
2020030100,2020-03-01,Complete,2020-03-01 04:37:08,2020-03-01 04:37:09,2020-03-01 06:25:50,,01:48:42,2020-03-01 04:37:08,2020-03-01 06:25:50,00:00:01,2020-03-01 04:37:08,2020-03-01 04:37:09,01:48:41,2020-03-01 04:37:09,2020-03-01 06:25:50
2020030200,2020-03-02,Complete,2020-03-02 04:36:34,2020-03-02 04:36:34,2020-03-02 05:46:23,,01:09:49,2020-03-02 04:36:34,2020-03-02 05:46:23,00:00:00,2020-03-02 04:36:34,2020-03-02 04:36:34,01:09:49,2020-03-02 04:36:34,2020-03-02 05:46:23
2020030300,2020-03-03,Complete,2020-03-03 04:36:48,2020-03-03 04:36:49,2020-03-03 05:41:54,,01:05:06,2020-03-03 04:36:48,2020-03-03 05:41:54,00:00:01,2020-03-03 04:36:48,2020-03-03 04:36:49,01:05:05,2020-03-03 04:36:49,2020-03-03 05:41:54


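Use `TimePeriodPresenter` to display the results, including:

- `start_clock`: the clock time at which the task started
- `end_clock`: the clock time at which the task ended
- `duration`: the run duration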

In [6]:
presenter = TimePeriodPresenter(
    target_state=target_state,
)
presenter.present(table_data)

           start_time start_clock end_clock duration
2020022300 2020-02-23    04:38:51  05:42:41 01:03:50
2020022400 2020-02-24    04:39:36  05:44:26 01:04:50
2020022500 2020-02-25    04:39:08  05:46:19 01:07:11
2020022600 2020-02-26    04:36:31  05:43:19 01:06:48
2020022700 2020-02-27    04:36:43  05:44:18 01:07:35
2020022800 2020-02-28    04:37:35  05:46:51 01:09:16
2020022900 2020-02-29    04:36:52  05:39:03 01:02:11
2020030100 2020-03-01    04:37:08  06:25:50 01:48:42
2020030200 2020-03-02    04:36:34  05:46:23 01:09:49
2020030300 2020-03-03    04:36:48  05:41:54 01:05:06
2020030400 2020-03-04    04:36:59  05:46:48 01:09:49
2020030500 2020-03-05    04:36:51  05:45:18 01:08:27
2020030600 2020-03-06    04:37:39  05:51:35 01:13:56
2020030700 2020-03-07    04:36:56  05:46:16 01:09:20
2020030800 2020-03-08    04:36:50  05:49:17 01:12:27
2020030900 2020-03-09    04:37:09  06:34:52 01:57:43
2020031000 2020-03-10    04:36:55  05:50:38 01:13:43
2020031100 2020-03-11    04:36:42  06:00:40 01

Next, we adapt the code of `TimePeriodPresenter` so that the result is easier to use in further calculations.

In [7]:
df = table_data.copy()

df["start_clock"] = table_data.time_period_in_all_start - table_data.start_time
df["end_clock"] = table_data.time_period_in_all_end - table_data.start_time

df.rename(columns={
    "time_period_in_all": "duration"
}, inplace=True)

# Copy the selection so that adding columns later does not modify a view.
df = df[["start_time", "duration", "start_clock", "end_clock"]].copy()
df

Unnamed: 0,start_time,duration,start_clock,end_clock
2020022300,2020-02-23,01:03:50,04:38:51,05:42:41
2020022400,2020-02-24,01:04:50,04:39:36,05:44:26
2020022500,2020-02-25,01:07:11,04:39:08,05:46:19
2020022600,2020-02-26,01:06:48,04:36:31,05:43:19
2020022700,2020-02-27,01:07:35,04:36:43,05:44:18
2020022800,2020-02-28,01:09:16,04:37:35,05:46:51
2020022900,2020-02-29,01:02:11,04:36:52,05:39:03
2020030100,2020-03-01,01:48:42,04:37:08,06:25:50
2020030200,2020-03-02,01:09:49,04:36:34,05:46:23
2020030300,2020-03-03,01:05:06,04:36:48,05:41:54


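Compute the trimmed mean, cutting a proportion of 0.25 from each end of the sorted values. The resulting end time `end_clock_mean` is used as the standard end time.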

In [8]:
ratio = 0.25

start_clock_mean = pd.to_timedelta(stats.trim_mean(df["start_clock"].values, ratio))
print()
print(f"Trimmed Mean for start time ({ratio}):")
print(start_clock_mean)

end_clock_mean = pd.to_timedelta(stats.trim_mean(df["end_clock"].values, ratio))
print()
print(f"Trimmed Mean for end time ({ratio}):")
print(end_clock_mean)


Trimmed Mean for start time (0.25):
0 days 04:36:55.312500

Trimmed Mean for end time (0.25):
0 days 05:46:19.375000
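
To see why a trimmed mean is preferred over a plain mean here, consider a small worked example on five end clocks taken from the table above, one of which is a late run. The values are converted to seconds first, which is a choice made for this sketch and not the approach used in the cell above.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Five end clocks copied from the table above; the last one is a late run.
end_clocks = pd.to_timedelta(
    ["05:42:41", "05:44:26", "05:46:19", "05:43:19", "06:25:50"]
)
seconds = np.asarray(end_clocks.total_seconds(), dtype=float)

ratio = 0.25
# The plain mean is pulled upward by the single late run.
plain_mean = pd.to_timedelta(seconds.mean(), unit="s")
# trim_mean sorts the values and drops int(ratio * n) of them from each end.
trimmed_mean = pd.to_timedelta(stats.trim_mean(seconds, ratio), unit="s")

print("plain:  ", plain_mean)
print("trimmed:", trimmed_mean)
```

With five values and a ratio of 0.25, one value is dropped from each end, so the late run no longer influences the result.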


Check whether each end time is more than thirty minutes later than the standard end time.

In [9]:
time_threshold = pd.Timedelta(minutes=30)
df["end_clock_late"] = df["end_clock"] - end_clock_mean > time_threshold
df

Unnamed: 0,start_time,duration,start_clock,end_clock,end_clock_late
2020022300,2020-02-23,01:03:50,04:38:51,05:42:41,False
2020022400,2020-02-24,01:04:50,04:39:36,05:44:26,False
2020022500,2020-02-25,01:07:11,04:39:08,05:46:19,False
2020022600,2020-02-26,01:06:48,04:36:31,05:43:19,False
2020022700,2020-02-27,01:07:35,04:36:43,05:44:18,False
2020022800,2020-02-28,01:09:16,04:37:35,05:46:51,False
2020022900,2020-02-29,01:02:11,04:36:52,05:39:03,False
2020030100,2020-03-01,01:48:42,04:37:08,06:25:50,True
2020030200,2020-03-02,01:09:49,04:36:34,05:46:23,False
2020030300,2020-03-03,01:05:06,04:36:48,05:41:54,False


List the entries that exceed the threshold.

In [10]:
df[df["end_clock_late"]]

Unnamed: 0,start_time,duration,start_clock,end_clock,end_clock_late
2020030100,2020-03-01,01:48:42,04:37:08,06:25:50,True
2020030900,2020-03-09,01:57:43,04:37:09,06:34:52,True
2020031300,2020-03-13,01:46:16,04:36:38,06:22:54,True
2020031600,2020-03-16,01:41:41,04:36:48,06:18:29,True
2020031700,2020-03-17,01:40:04,04:39:01,06:19:05,True


Save the result to a CSV file.

In [11]:
df.to_csv("grapes_meso_3km_fcst.csv")
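
When such a CSV file is read back, the timedelta columns arrive as plain strings and have to be converted again. The sketch below shows a round trip through an in-memory buffer instead of a file on disk; the two sample rows are copied from the tables above, and the variable names are illustrative.

```python
import io

import pandas as pd

original = pd.DataFrame(
    {
        "start_time": pd.to_datetime(["2020-02-23", "2020-02-24"]),
        "duration": pd.to_timedelta(["01:03:50", "01:04:50"]),
        "end_clock": pd.to_timedelta(["05:42:41", "05:44:26"]),
    }
)
original.index = ["2020022300", "2020022400"]

# Write to an in-memory buffer in place of a file on disk.
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)

# Dates can be parsed while reading; timedeltas need an explicit conversion.
loaded = pd.read_csv(buffer, index_col=0, parse_dates=["start_time"])
for column in ["duration", "end_clock"]:
    loaded[column] = pd.to_timedelta(loaded[column])

print(loaded.dtypes)
```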

## Putting it together

In [12]:
def run(file_path, node_path, stop_date=datetime.datetime(2020, 3, 24), ratio=0.25):
    # ratio is the proportion cut from each end when computing the trimmed mean.
    start_date = stop_date - datetime.timedelta(days=30)
    
    records = get_record_list(file_path, node_path, start_date, stop_date)
    
    dfa_engine = TaskStatusChangeDFA
    stop_states = (
        TaskSituationType.Complete,
    )
    dfa_kwargs = None
    target_state = TaskSituationType.Complete
    
    calculator = SituationCalculator(
        dfa_engine=dfa_engine,
        stop_states=stop_states,
        dfa_kwargs=dfa_kwargs,
    )
    
    situations = calculator.get_situations(
        records=records,
        node_path=node_path,
        start_date=start_date,
        end_date=stop_date,
    )
    
    processor = NodeTableProcessor(
        node_path=node_path,
        target_state=target_state,
    )
    table_data = processor.process(situations)
    
    df = table_data.copy()

    df["start_clock"] = table_data.time_period_in_all_start - table_data.start_time
    df["end_clock"] = table_data.time_period_in_all_end - table_data.start_time
    
    df.rename(columns={
        "time_period_in_all": "duration"
    }, inplace=True)

    # Copy the selection so that adding a column below does not modify a view.
    df = df[["start_time", "duration", "start_clock", "end_clock"]].copy()
    start_clock_mean = pd.to_timedelta(stats.trim_mean(df["start_clock"].values, ratio))
    print("start:", start_clock_mean)
    end_clock_mean = pd.to_timedelta(stats.trim_mean(df["end_clock"].values, ratio))
    print("end:", end_clock_mean)
    time_threshold = pd.Timedelta(minutes=30)
    df["end_clock_late"] = df["end_clock"] - end_clock_mean > time_threshold
    return df

Rerun the statistics for GRAPES MESO 3km.

In [13]:
file_path = "../dist/log/grapes_meso_3km/fcst.txt"
node_path = "/grapes_meso_3km_v4_4/00/model/fcst"
stop_date = datetime.datetime(2020, 3, 24)
df = run(file_path, node_path, stop_date)
df

2020-03-25 07:54:08.024 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_record_list:161 - Finding line range in date range: 2020-02-23 00:00:00, 2020-03-24 00:00:00
2020-03-25 07:54:08.024 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:54 - counting total line number...
2020-03-25 07:54:08.025 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:56 - got total line number: 2126
2020-03-25 07:54:08.026 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:63 - finding begin line number for start_date 2020-02-23
0% [############################  ] 100% | ETA: 00:00:00
2020-03-25 07:54:08.030 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:93 - found begin line number for start_date 2020-02-23: 1368
2020-03-25 07:54:08.031 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:95 - finding end line number for stop_date 2020-03-24
0

start: 0 days 04:36:55.312500
end: 0 days 05:46:19.375000


Unnamed: 0,start_time,duration,start_clock,end_clock,end_clock_late
2020022300,2020-02-23,01:03:50,04:38:51,05:42:41,False
2020022400,2020-02-24,01:04:50,04:39:36,05:44:26,False
2020022500,2020-02-25,01:07:11,04:39:08,05:46:19,False
2020022600,2020-02-26,01:06:48,04:36:31,05:43:19,False
2020022700,2020-02-27,01:07:35,04:36:43,05:44:18,False
2020022800,2020-02-28,01:09:16,04:37:35,05:46:51,False
2020022900,2020-02-29,01:02:11,04:36:52,05:39:03,False
2020030100,2020-03-01,01:48:42,04:37:08,06:25:50,True
2020030200,2020-03-02,01:09:49,04:36:34,05:46:23,False
2020030300,2020-03-03,01:05:06,04:36:48,05:41:54,False


## GRAPES GFS

In [14]:
file_path = "../dist/log/grapes_gfs/fcst.txt"
node_path = "/gmf_grapes_gfs_v2.4/00/model/fcst_long"
stop_date = datetime.datetime(2020, 3, 24)
df = run(file_path, node_path, stop_date)
df

2020-03-25 07:54:08.293 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_record_list:161 - Finding line range in date range: 2020-02-23 00:00:00, 2020-03-24 00:00:00
2020-03-25 07:54:08.293 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:54 - counting total line number...
2020-03-25 07:54:08.294 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:56 - got total line number: 2019
2020-03-25 07:54:08.295 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:63 - finding begin line number for start_date 2020-02-23
0% [############################# ] 100% | ETA: 00:00:00
2020-03-25 07:54:08.298 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:93 - found begin line number for start_date 2020-02-23: 1268
2020-03-25 07:54:08.299 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:95 - finding end line number for stop_date 2020-03-24
0

start: 0 days 04:15:35.812500
end: 0 days 04:43:55.062500


Unnamed: 0,start_time,duration,start_clock,end_clock,end_clock_late
2020022300,2020-02-23,00:25:59,04:14:58,04:40:57,False
2020022400,2020-02-24,00:26:28,04:15:12,04:41:40,False
2020022500,2020-02-25,00:26:15,04:14:25,04:40:40,False
2020022600,2020-02-26,00:30:58,04:40:54,05:11:52,False
2020022700,2020-02-27,00:25:10,05:29:57,05:55:07,True
2020022800,2020-02-28,00:25:09,04:20:02,04:45:11,False
2020022900,2020-02-29,01:03:13,04:21:04,05:24:17,True
2020030100,2020-03-01,01:09:25,04:07:57,05:17:22,True
2020030200,2020-03-02,00:53:32,04:11:59,05:05:31,False
2020030300,2020-03-03,00:24:41,04:12:14,04:36:55,False


Some dates could not be recognized, so we use the `dropna()` method to remove the invalid rows.

In [15]:
df = df.dropna()
df

Unnamed: 0,start_time,duration,start_clock,end_clock,end_clock_late
2020022300,2020-02-23,00:25:59,04:14:58,04:40:57,False
2020022400,2020-02-24,00:26:28,04:15:12,04:41:40,False
2020022500,2020-02-25,00:26:15,04:14:25,04:40:40,False
2020022600,2020-02-26,00:30:58,04:40:54,05:11:52,False
2020022700,2020-02-27,00:25:10,05:29:57,05:55:07,True
2020022800,2020-02-28,00:25:09,04:20:02,04:45:11,False
2020022900,2020-02-29,01:03:13,04:21:04,05:24:17,True
2020030100,2020-03-01,01:09:25,04:07:57,05:17:22,True
2020030200,2020-03-02,00:53:32,04:11:59,05:05:31,False
2020030300,2020-03-03,00:24:41,04:12:14,04:36:55,False
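
The rows that were dropped are not visible in the truncated output above. As a minimal sketch of what `dropna()` does in this situation, the made-up frame below contains one row whose times are missing (`NaT`); the data are invented for illustration only.

```python
import pandas as pd

# Invented sample: the second row stands for a day whose times were not found.
sample = pd.DataFrame(
    {
        "start_time": pd.to_datetime(["2020-02-23", "2020-02-24"]),
        "duration": pd.to_timedelta(["00:25:59", pd.NaT]),
        "end_clock": pd.to_timedelta(["04:40:57", pd.NaT]),
    }
)
sample.index = ["2020022300", "2020022400"]

# dropna() removes every row that contains at least one missing value.
clean = sample.dropna()
print(clean)
```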


In [16]:
df.to_csv("grapes_gfs.csv")

## GRAPES MESO 10km

In [17]:
file_path = "../dist/log/grapes_meso_10km/fcst.txt"
node_path = "/grapes_meso_10km_v4_3/00/model/fcst"
stop_date = datetime.datetime(2020, 3, 24)
df = run(file_path, node_path, stop_date)
df

2020-03-25 07:54:09.478 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_record_list:161 - Finding line range in date range: 2020-02-23 00:00:00, 2020-03-24 00:00:00
2020-03-25 07:54:09.478 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:54 - counting total line number...
2020-03-25 07:54:09.479 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:56 - got total line number: 2096
2020-03-25 07:54:09.480 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:63 - finding begin line number for start_date 2020-02-23
0% [############################  ] 100% | ETA: 00:00:00
2020-03-25 07:54:09.484 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:93 - found begin line number for start_date 2020-02-23: 1352
2020-03-25 07:54:09.484 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:95 - finding end line number for stop_date 2020-03-24
0

start: 0 days 03:24:53.250000
end: 0 days 04:00:03.812500


Unnamed: 0,start_time,duration,start_clock,end_clock,end_clock_late
2020022300,2020-02-23,00:35:06,03:24:22,03:59:28,False
2020022400,2020-02-24,00:33:21,03:33:04,04:06:25,False
2020022500,2020-02-25,00:34:44,03:25:38,04:00:22,False
2020022600,2020-02-26,00:36:24,03:24:53,04:01:17,False
2020022700,2020-02-27,00:35:27,03:24:27,03:59:54,False
2020022800,2020-02-28,00:35:19,03:25:53,04:01:12,False
2020022900,2020-02-29,00:36:11,03:25:19,04:01:30,False
2020030100,2020-03-01,00:34:38,03:25:20,03:59:58,False
2020030200,2020-03-02,00:35:13,03:26:32,04:01:45,False
2020030300,2020-03-03,00:34:44,03:24:20,03:59:04,False


In [18]:
df.to_csv("grapes_meso_10km.csv")

## GRAPES TYM

In [19]:
file_path = "../dist/log/grapes_tym/fcst.txt"
node_path = "/grapes_tym/grapes_d01/00/model/grapes"
stop_date = datetime.datetime(2020, 3, 24)
df = run(file_path, node_path, stop_date)
df

2020-03-25 07:54:09.731 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_record_list:161 - Finding line range in date range: 2020-02-23 00:00:00, 2020-03-24 00:00:00
2020-03-25 07:54:09.732 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:54 - counting total line number...
2020-03-25 07:54:09.733 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:56 - got total line number: 1157
2020-03-25 07:54:09.733 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:63 - finding begin line number for start_date 2020-02-23
0% [#########################     ] 100% | ETA: 00:00:01
2020-03-25 07:54:09.740 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:93 - found begin line number for start_date 2020-02-23: 723
2020-03-25 07:54:09.740 | INFO     | nwpc_workflow_log_collector.ecflow.log_file_util:get_line_no_range:95 - finding end line number for stop_date 2020-03-24
0%

start: 0 days 04:25:28.812500
end: 0 days 05:48:20.875000


Unnamed: 0,start_time,duration,start_clock,end_clock,end_clock_late
2020022300,2020-02-23,01:21:20,04:26:26,05:47:46,False
2020022400,2020-02-24,01:23:54,04:25:40,05:49:34,False
2020022500,2020-02-25,01:22:46,04:25:57,05:48:43,False
2020022600,2020-02-26,01:22:32,04:24:55,05:47:27,False
2020022700,2020-02-27,01:19:25,04:24:36,05:44:01,False
2020022800,2020-02-28,01:21:10,04:24:46,05:45:56,False
2020022900,2020-02-29,01:20:32,04:26:32,05:47:04,False
2020030100,2020-03-01,02:11:55,04:30:30,06:42:25,True
2020030200,2020-03-02,01:22:40,04:24:30,05:47:10,False
2020030300,2020-03-03,01:21:53,04:24:55,05:46:48,False


In [20]:
df.to_csv("grapes_tym.csv")
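
Once each model has its own result frame, the late cycles can be compared side by side. The sketch below is one possible way to do that; `late_summary` is a hypothetical helper, and the flags are copied only from the ten rows displayed above for GRAPES MESO 3km and GRAPES GFS, so the counts do not cover the full thirty-day window.

```python
import pandas as pd


def late_summary(frames):
    """Count late cycles per model from frames with an end_clock_late column."""
    rows = []
    for name, frame in frames.items():
        late = frame["end_clock_late"]
        rows.append(
            {
                "model": name,
                "cycles": len(frame),
                "late": int(late.sum()),
                "late_ratio": float(late.mean()),
            }
        )
    return pd.DataFrame(rows).set_index("model")


# Flags copied from the first ten displayed rows of two of the tables above.
frames = {
    "grapes_meso_3km": pd.DataFrame(
        {"end_clock_late": [False] * 7 + [True] + [False] * 2}
    ),
    "grapes_gfs": pd.DataFrame(
        {
            "end_clock_late": [
                False, False, False, False, True,
                False, True, True, False, False,
            ]
        }
    ),
}
summary = late_summary(frames)
print(summary)
```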