Creating big query tables for perf tests - 3 (#1450)
* Fix DiffFiles utility function

1. Chunkifies the DiffFiles utility function.
This rationalizes the memory requirement
of this utility's file ops. The utility
earlier read the complete files into
memory; now only a chunk is read and
compared at a time.

2. Rename DiffFiles to AreFilesIdentical

Renames the file-op utility DiffFiles to AreFilesIdentical
and changes it to return bool instead of int.
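
A minimal sketch of the chunked comparison (Python for illustration only; the actual gcsfuse utility is written in Go, and the chunk size here is an arbitrary choice):

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; illustrative value, not gcsfuse's


def are_files_identical(path1, path2):
  """Compare two files chunk by chunk so memory usage stays bounded."""
  with open(path1, 'rb') as f1, open(path2, 'rb') as f2:
    while True:
      chunk1 = f1.read(CHUNK_SIZE)
      chunk2 = f2.read(CHUNK_SIZE)
      if chunk1 != chunk2:
        # Differing bytes, or one file ended before the other.
        return False
      if not chunk1:
        # Both files exhausted at the same point: identical.
        return True
```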

* Copying logs to kokoro artifacts in case of integration test failure. (#1380)

* adding print log file statement

* failure test

* testing error

* undo testing changes

* formatting

* fix comment

* adding test error

* small fix

* small fix

* testing

* adding action artifacts

* small fix

* small fix

* removing error statement

* testing

* testing

* testing

* undo changes

* undo testing changes

* fixing comments

* Upgrading go version from 1.21.0 to 1.21.1 (#1401)

* Upgrading go version to 1.21.1

* Temp changes to run perf/integration test

* Revert "Temp changes to run perf/integration test"

This reverts commit 18dd533.

* Updating semantics docs (#1407)

* Print stack-trace on crash

For now, it just prints a stack trace
whenever anyone calls logger.Fatalf(...).
This needs to be enhanced further to
include more scenarios (i.e. more
sources of crashes).
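
The idea, as a Python sketch (gcsfuse's logger is Go; fatalf below is a hypothetical stand-in for logger.Fatalf, not the repository's code):

```python
import sys
import traceback


def fatalf(fmt, *args):
  """Log a fatal message, dump the current stack trace, then exit."""
  sys.stderr.write(fmt % args + '\n')
  # Emit the caller's stack so the crash site is visible in the logs.
  traceback.print_stack(file=sys.stderr)
  sys.exit(1)
```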

* Throw error in case of unexpected fields (#1416)

Also, improved the doc for config-file.
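
The validation described here, rejecting unknown keys in the config-file, can be sketched as follows (Python with PyYAML for illustration; gcsfuse's actual config parsing is Go, and the allowed key set below is hypothetical):

```python
import yaml  # PyYAML

# Hypothetical set of recognized top-level config-file keys.
ALLOWED_KEYS = {'logging', 'write'}


def parse_config(path):
  """Parse a YAML config-file, raising on any unexpected top-level field."""
  with open(path) as f:
    config = yaml.safe_load(f) or {}
  unexpected = set(config) - ALLOWED_KEYS
  if unexpected:
    raise ValueError(f'unexpected fields in config-file: {sorted(unexpected)}')
  return config
```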

* Upgraded the fuse library (#1419)

* add "create-empty-file: true" tests to run_tests_mounted_directory.sh (#1413)

* Upgrade golang from 1.21.1 to 1.21.2 (#1431)

Upgrade from golang 1.21.1 to 1.21.2

* Update yaml package version from v2 to v3 in integration tests (#1434)

* update go yaml package version from v2 to v3 in integration tests

* Empty-Commit

* Passing gcsfuse flags from build.sh to mount the test bucket for perf… (#1430)

* Passing gcsfuse flags from build.sh to mount the test bucket for performing list benchmarking

* removing __pycache

* small fix

* small fix

* unmount after fio tests

* small fix

* empty commit

* testing kokoro perf tests

* undo testing changes

* adding upload flag

* small fix

* small fix

* upload to upload_gs

* unnecessary change

* adding big query table setup

* changes for small pr

* adding requirements and setup scripts

* indentation fix

* removing unnecessary changes

* adding start build time

* adding requirements and setup scripts

* python file changes

* small fix

* testing

* testing gsheet upload

* perfmetrics/scripts/ls_metrics/listing_benchmark.py

* testing

* removing unnecessary changes

* Creating big query tables for perf tests - 1 (#1444)

* adding big query table setup

* changes for small pr

* indentation fix

* removing unnecessary changes

* adding start build time

* changes for small pr

* indentation fix

* removing unnecessary changes

* adding requirements and setup scripts

* merge with parent

* testing

* testing gsheet upload

* merge with parent

* removing unnecessary changes

* fixing requirements.in

* Creating big query tables for perf tests - 2 (#1445)

* adding big query table setup

* changes for small pr

* adding requirements and setup scripts

* fixing requirements.in

* merge

* merge

* small fixes

* small fixes

* small fix

* undo testing changes

* formatting

* testing changes

* undo testing changes

* remove unnecessary functions

* fixing indentation

* small fix

* adding testing changes

* small fix

* undo testing changes

* formatting

* fixing unit tests

* formatting

* fixing comments and adding test changes

* undo testing changes

---------

Co-authored-by: Nitin Garg <gargnitin@google.com>
Co-authored-by: Prince Kumar <princer@google.com>
Co-authored-by: Ayush Sethi <ayushsethi@google.com>
Co-authored-by: Ashmeen Kaur <57195160+ashmeenkaur@users.noreply.github.com>
Co-authored-by: Nitin Garg <113666283+gargnitingoogle@users.noreply.github.com>
6 people committed Nov 1, 2023
1 parent 26adae8 commit 3fef64b
Showing 13 changed files with 265 additions and 258 deletions.
2 changes: 1 addition & 1 deletion perfmetrics/scripts/README.md
@@ -54,7 +54,7 @@ gsutil cp gs://your-bucket-name/creds.json ./gsheet
11. Change the Google sheet id in this [line](https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/perfmetrics/scripts/gsheet/gsheet.py#L5) to `your-gsheet-id`.
12. Finally, execute fetch_metrics.py to extract FIO and VM metrics and write to your Google Sheet by running
```bash
-python3 fetch_metrics.py output.json
+python3 fetch_and_upload_metrics.py output.json
```
The FIO output JSON file is passed as an argument to the fetch_metrics module.

1 change: 1 addition & 0 deletions perfmetrics/scripts/bigquery/experiments_gcsfuse_bq.py
@@ -212,6 +212,7 @@ def setup_dataset_and_tables(self):
      CREATE TABLE IF NOT EXISTS {}.{}.{}(
        configuration_id STRING,
        start_time_build INT64,
+       mount_type STRING,
        test_description string,
        command STRING,
        num_files INT64,
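
For context, the DDL above lives in setup_dataset_and_tables; a minimal sketch of how such DDL is typically executed with the google-cloud-bigquery client (assumed wiring with placeholder names, not the repository's exact code):

```python
from google.cloud import bigquery

client = bigquery.Client(project='my-project')  # placeholder project id

# Placeholder table; the real schema is the full column list in the diff above.
ddl = '''
CREATE TABLE IF NOT EXISTS `my-project.my_dataset.fio_metrics`(
  configuration_id STRING,
  start_time_build INT64,
  mount_type STRING
)
'''

client.query(ddl).result()  # run the DDL and block until it completes
```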
3 changes: 3 additions & 0 deletions perfmetrics/scripts/continuous_test/gcp_ubuntu/build.sh
@@ -30,6 +30,9 @@ chmod +x perfmetrics/scripts/build_and_install_gcsfuse.sh
# Mounting gcs bucket
cd "./perfmetrics/scripts/"

+echo Installing Bigquery module requirements...
+pip install --require-hashes -r bigquery/requirements.txt --user

# Upload data to the gsheet only when it runs through kokoro.
UPLOAD_FLAGS=""
if [ "${KOKORO_JOB_TYPE}" == "RELEASE" ] || [ "${KOKORO_JOB_TYPE}" == "CONTINUOUS_INTEGRATION" ] || [ "${KOKORO_JOB_TYPE}" == "PRESUBMIT_GITHUB" ];
perfmetrics/scripts/fetch_and_upload_metrics.py
@@ -1,3 +1,17 @@
+# Copyright 2023 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
"""Executes fio_metrics.py and vm_metrics.py by passing appropriate arguments.
"""
import socket
@@ -7,6 +21,8 @@
from fio import fio_metrics
from vm_metrics import vm_metrics
from gsheet import gsheet
+from bigquery import constants
+from bigquery import experiments_gcsfuse_bq

INSTANCE = socket.gethostname()
PERIOD_SEC = 120
@@ -40,6 +56,27 @@ def _parse_arguments(argv):
      default=False,
      required=False,
  )
+  parser.add_argument(
+      '--upload_bq',
+      help='Upload the results to the BigQuery.',
+      action='store_true',
+      default=False,
+      required=False,
+  )
+  parser.add_argument(
+      '--config_id',
+      help='Configuration ID of the experiment.',
+      action='store',
+      nargs=1,
+      required=False,
+  )
+  parser.add_argument(
+      '--start_time_build',
+      help='Start time of the build.',
+      action='store',
+      nargs=1,
+      required=False,
+  )
  return parser.parse_args(argv[1:])


@@ -51,15 +88,22 @@ def _parse_arguments(argv):

  args = _parse_arguments(argv)

+  temp = fio_metrics_obj.get_metrics(args.fio_json_output_path)
+  metrics_data = fio_metrics_obj.get_values_to_upload(temp)
+
  if args.upload_gs:
-    temp = fio_metrics_obj.get_metrics(args.fio_json_output_path, FIO_WORKSHEET_NAME)
-  else:
-    temp = fio_metrics_obj.get_metrics(args.fio_json_output_path)
+    gsheet.write_to_google_sheet(FIO_WORKSHEET_NAME,metrics_data)
+
+  if args.upload_bq:
+    if not args.config_id or not args.start_time_build:
+      raise Exception("Pass required arguments experiments configuration ID and start time of build for uploading to BigQuery")
+    bigquery_obj = experiments_gcsfuse_bq.ExperimentsGCSFuseBQ(constants.PROJECT_ID, constants.DATASET_ID)
+    bigquery_obj.upload_metrics_to_table(constants.FIO_TABLE_ID, args.config_id[0], args.start_time_build[0], metrics_data)

  print('Waiting for 360 seconds for metrics to be updated on VM...')
  # It takes up to 240 seconds for sampled data to be visible on the VM metrics graph
  # So, waiting for 360 seconds to ensure the returned metrics are not empty.
-  # Intermittenly custom metrics are not available after 240 seconds, hence
+  # Intermittently custom metrics are not available after 240 seconds, hence
  # waiting for 360 secs instead of 240 secs
  time.sleep(360)

@@ -96,3 +140,9 @@ def _parse_arguments(argv):

  if args.upload_gs:
    gsheet.write_to_google_sheet(VM_WORKSHEET_NAME, vm_metrics_data)
+
+  if args.upload_bq:
+    if not args.config_id or not args.start_time_build:
+      raise Exception("Pass required arguments experiments configuration ID and start time of build for uploading to BigQuery")
+    bigquery_obj = experiments_gcsfuse_bq.ExperimentsGCSFuseBQ(constants.PROJECT_ID, constants.DATASET_ID)
+    bigquery_obj.upload_metrics_to_table(constants.VM_TABLE_ID, args.config_id[0], args.start_time_build[0], vm_metrics_data)
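
The body of upload_metrics_to_table is not shown in this commit view; a hypothetical sketch of what such a method could look like with the google-cloud-bigquery client, with the class shape assumed from the call sites above (not the repository's actual implementation):

```python
from google.cloud import bigquery


class ExperimentsGCSFuseBQ:
  """Hypothetical sketch; the real class is in bigquery/experiments_gcsfuse_bq.py."""

  def __init__(self, project_id, dataset_id):
    self.project_id = project_id
    self.dataset_id = dataset_id
    self.client = bigquery.Client(project=project_id)

  def upload_metrics_to_table(self, table_id, config_id, start_time_build, metrics_data):
    # Prefix every metrics row with the experiment configuration id and the
    # build start time, then stream the rows into the target table.
    table = self.client.get_table(f'{self.project_id}.{self.dataset_id}.{table_id}')
    rows = [[config_id, int(start_time_build)] + list(row) for row in metrics_data]
    errors = self.client.insert_rows(table, rows)
    if errors:
      raise Exception(f'BigQuery insert failed: {errors}')
```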
51 changes: 29 additions & 22 deletions perfmetrics/scripts/fio/fio_metrics.py
@@ -1,3 +1,17 @@
+# Copyright 2023 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
"""Extracts required metrics from fio output file and writes to google sheet.
Takes fio output json filepath as command-line input
@@ -18,14 +32,17 @@
from fio import constants as consts
from gsheet import gsheet

+from bigquery import constants
+from bigquery import experiments_gcsfuse_bq


@dataclass(frozen=True)
class JobParam:
  """Dataclass for a FIO job parameter.
  name: Can be any suitable value, it refers to the output dictionary key for
    the parameter. To be used when creating parameter dict for each job.
  json_name: Must match the FIO job specification key. Key for parameter inside
    'global options'/'job options' dictionary
    Ex: For output json = {"global options": {"filesize":"50M"}, "jobs": [
    "job options": {"rw": "read"}]}
@@ -48,7 +65,7 @@ class JobParam:
class JobMetric:
  """Dataclass for a FIO job metric.
  name: Can be any suitable value, it is used as key for the metric
    when creating metric dict for each job
  levels: Keys for the metric inside 'read'/'write' dictionary in each job.
    Each value in the list must match the key in the FIO output JSON
@@ -403,12 +420,13 @@ def _extract_metrics(self, fio_out) -> List[Dict[str, Any]]:

    return all_jobs

-  def _add_to_gsheet(self, jobs, worksheet_name):
-    """Add the metric values to respective columns in a google sheet.
+  def get_values_to_upload(self, jobs):
+    """Get the metrics values in a list to export to Google Spreadsheet and BigQuery.
    Args:
-      jobs: list of dicts, contains required metrics for each job
-      worksheet_name: str, worksheet where job metrics should be written.
+      jobs: List of dicts, contains required metrics for each job
+    Returns:
+      list: A 2-d list consisting of metrics values for each job
    """

    values = []
@@ -422,29 +440,19 @@ def _add_to_gsheet(self, jobs, worksheet_name):
      for metric_val in job[consts.METRICS].values():
        row.append(metric_val)
      values.append(row)
+    return values

-    gsheet.write_to_google_sheet(worksheet_name, values)
-
-  def get_metrics(self,
-                  filepath,
-                  worksheet_name=None) -> List[Dict[str, Any]]:
-    """Returns job metrics obtained from given filepath and writes to gsheets.
+  def get_metrics(self, filepath) -> List[Dict[str, Any]]:
+    """Returns job metrics obtained from given filepath.
    Args:
-      filepath : str
-        Path of the json file to be parsed
-      worksheet_name: str, optional, default:None
-        Worksheet where job metrics should be written.
-        Pass '' or None to skip writing to Google sheets
+      filepath (str): Path of the json file to be parsed
    Returns:
      List of dicts, contains list of jobs and required metrics for each job
    """
    fio_out = self._load_file_dict(filepath)
    job_metrics = self._extract_metrics(fio_out)
-    if worksheet_name:
-      self._add_to_gsheet(job_metrics, worksheet_name)
-
    return job_metrics

if __name__ == '__main__':
@@ -455,6 +463,5 @@ def get_metrics(self,
        'python3 -m fio.fio_metrics <fio output json filepath>')

  fio_metrics_obj = FioMetrics()
-  temp = fio_metrics_obj.get_metrics(argv[1], 'fio_metrics_expt')
+  temp = fio_metrics_obj.get_metrics(argv[1])
  print(temp)
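
With this refactor, parsing and exporting are decoupled; a usage sketch using the names from the diff above (the worksheet name is illustrative):

```python
from fio import fio_metrics
from gsheet import gsheet

fio_obj = fio_metrics.FioMetrics()
jobs = fio_obj.get_metrics('output.json')      # parse the fio JSON only; no uploads
rows = fio_obj.get_values_to_upload(jobs)      # 2-d list shared by the gsheet and BigQuery paths
gsheet.write_to_google_sheet('fio_metrics', rows)  # uploading is now an explicit, separate step
```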
