Creating big query tables for perf tests - 3 (#1450)
* Fix DiffFiles utility function

1. Chunkifies the DiffFiles utility function.
This rationalizes the memory requirement
of this utility's file ops. The utility
earlier read the complete files into
memory; now only a chunk is read and
compared at a time.

2. Rename DiffFiles to AreFilesIdentical

Renames the file-op utility DiffFiles to AreFilesIdentical
and changes it to return bool instead of int.
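
A minimal sketch of the chunked comparison (Python for illustration only; the actual gcsfuse utility is written in Go, and the chunk size here is an arbitrary choice):

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; illustrative value, not gcsfuse's


def are_files_identical(path1, path2):
  """Compare two files chunk by chunk so memory usage stays bounded."""
  with open(path1, 'rb') as f1, open(path2, 'rb') as f2:
    while True:
      chunk1 = f1.read(CHUNK_SIZE)
      chunk2 = f2.read(CHUNK_SIZE)
      if chunk1 != chunk2:
        # Differing bytes, or one file ended before the other.
        return False
      if not chunk1:
        # Both files exhausted at the same point: identical.
        return True
```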

* Copying logs to kokoro artifacts in case of integration test failure. (#1380)

* adding print log file statement

* failure test

* testing error

* undo testing changes

* formatting

* fix comment

* adding test error

* small fix

* small fix

* testing

* adding action artifacts

* small fix

* small fix

* removing error statement

* testing

* testing

* testing

* undo changes

* undo testing changes

* fixing comments

* Upgrading go version from 1.21.0 to 1.21.1 (#1401)

* Upgrading go version to 1.21.1

* Temp changes to run perf/integration test

* Revert "Temp changes to run perf/integration test"

This reverts commit 18dd533.

* Updating semantics docs (#1407)

* Print stack-trace on crash

For now, it just prints a stack trace
whenever anyone calls logger.Fatalf(...).
This needs to be enhanced further to
include more scenarios (i.e. more
sources of crashes).
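
The idea, as a Python sketch (gcsfuse's logger is Go; fatalf below is a hypothetical stand-in for logger.Fatalf, not the repository's code):

```python
import sys
import traceback


def fatalf(fmt, *args):
  """Log a fatal message, dump the current stack trace, then exit."""
  sys.stderr.write(fmt % args + '\n')
  # Emit the caller's stack so the crash site is visible in the logs.
  traceback.print_stack(file=sys.stderr)
  sys.exit(1)
```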

* Throw error in case of unexpected fields (#1416)

Also, improved the doc for config-file.
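
The validation described here, rejecting unknown keys in the config-file, can be sketched as follows (Python with PyYAML for illustration; gcsfuse's actual config parsing is Go, and the allowed key set below is hypothetical):

```python
import yaml  # PyYAML

# Hypothetical set of recognized top-level config-file keys.
ALLOWED_KEYS = {'logging', 'write'}


def parse_config(path):
  """Parse a YAML config-file, raising on any unexpected top-level field."""
  with open(path) as f:
    config = yaml.safe_load(f) or {}
  unexpected = set(config) - ALLOWED_KEYS
  if unexpected:
    raise ValueError(f'unexpected fields in config-file: {sorted(unexpected)}')
  return config
```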

* Upgraded the fuse library (#1419)

* add "create-empty-file: true" tests to run_tests_mounted_directory.sh (#1413)

* Upgrade golang from 1.21.1 to 1.21.2 (#1431)

Upgrade from golang 1.21.1 to 1.21.2

* Update yaml package version from v2 to v3 in integration tests (#1434)

* update go yaml package version from v2 to v3 in integration tests

* Empty-Commit

* Passing gcsfuse flags from build.sh to mount the test bucket for perf… (#1430)

* Passing gcsfuse flags from build.sh to mount the test bucket for performing list benchmarking

* removing __pycache

* small fix

* small fix

* unmount after fio tests

* small fix

* empty commit

* testing kokoro perf tests

* undo testing changes

* adding upload flag

* small fix

* small fix

* upload to upload_gs

* unnecessary change

* adding big query table setup

* changes for small pr

* adding requirements and setup scripts

* indentation fix

* removing unnecessary changes

* adding start build time

* adding requirements and setup scripts

* python file changes

* small fix

* testing

* testing gsheet upload

* perfmetrics/scripts/ls_metrics/listing_benchmark.py

* testing

* removing unnecessary changes

* Creating big query tables for perf tests - 1 (#1444)

* adding big query table setup

* changes for small pr

* indentation fix

* removing unnecessary changes

* adding start build time

* changes for small pr

* indentation fix

* removing unnecessary changes

* adding requirements and setup scripts

* merge with parent

* testing

* testing gsheet upload

* merge with parent

* removing unnecessary changes

* fixing requirements.in

* Creating big query tables for perf tests - 2 (#1445)

* adding big query table setup

* changes for small pr

* adding requirements and setup scripts

* fixing requirements.in

* merge

* merge

* small fixes

* small fixes

* small fix

* undo testing changes

* formatting

* testing changes

* undo testing changes

* remove unnecessary functions

* fixing indentation

* small fix

* adding testing changes

* small fix

* undo testing changes

* formatting

* fixing unit tests

* formatting

* fixing comments and adding test changes

* undo testing changes

---------

Co-authored-by: Nitin Garg <gargnitin@google.com>
Co-authored-by: Prince Kumar <princer@google.com>
Co-authored-by: Ayush Sethi <ayushsethi@google.com>
Co-authored-by: Ashmeen Kaur <57195160+ashmeenkaur@users.noreply.github.com>
Co-authored-by: Nitin Garg <113666283+gargnitingoogle@users.noreply.github.com>
6 people committed Nov 1, 2023
1 parent 26adae8 commit 3fef64b
Showing 13 changed files with 265 additions and 258 deletions.
2 changes: 1 addition & 1 deletion perfmetrics/scripts/README.md
@@ -54,7 +54,7 @@ gsutil cp gs://your-bucket-name/creds.json ./gsheet
11. Change the Google sheet id in this [line](https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/perfmetrics/scripts/gsheet/gsheet.py#L5) to `your-gsheet-id`.
12. Finally, execute fetch_metrics.py to extract FIO and VM metrics and write to your Google Sheet by running
```bash
-python3 fetch_metrics.py output.json
+python3 fetch_and_upload_metrics.py output.json
```
The FIO output JSON file is passed as an argument to the fetch_metrics module.

1 change: 1 addition & 0 deletions perfmetrics/scripts/bigquery/experiments_gcsfuse_bq.py
@@ -212,6 +212,7 @@ def setup_dataset_and_tables(self):
      CREATE TABLE IF NOT EXISTS {}.{}.{}(
        configuration_id STRING,
        start_time_build INT64,
+       mount_type STRING,
        test_description string,
        command STRING,
        num_files INT64,
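
For context, the DDL above lives in setup_dataset_and_tables; a minimal sketch of how such DDL is typically executed with the google-cloud-bigquery client (assumed wiring with placeholder names, not the repository's exact code):

```python
from google.cloud import bigquery

client = bigquery.Client(project='my-project')  # placeholder project id

# Placeholder table; the real schema is the full column list in the diff above.
ddl = '''
CREATE TABLE IF NOT EXISTS `my-project.my_dataset.fio_metrics`(
  configuration_id STRING,
  start_time_build INT64,
  mount_type STRING
)
'''

client.query(ddl).result()  # run the DDL and block until it completes
```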
3 changes: 3 additions & 0 deletions perfmetrics/scripts/continuous_test/gcp_ubuntu/build.sh
@@ -30,6 +30,9 @@ chmod +x perfmetrics/scripts/build_and_install_gcsfuse.sh
# Mounting gcs bucket
cd "./perfmetrics/scripts/"

+echo Installing Bigquery module requirements...
+pip install --require-hashes -r bigquery/requirements.txt --user

# Upload data to the gsheet only when it runs through kokoro.
UPLOAD_FLAGS=""
if [ "${KOKORO_JOB_TYPE}" == "RELEASE" ] || [ "${KOKORO_JOB_TYPE}" == "CONTINUOUS_INTEGRATION" ] || [ "${KOKORO_JOB_TYPE}" == "PRESUBMIT_GITHUB" ];
perfmetrics/scripts/fetch_and_upload_metrics.py
@@ -1,3 +1,17 @@
+# Copyright 2023 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
"""Executes fio_metrics.py and vm_metrics.py by passing appropriate arguments.
"""
import socket
@@ -7,6 +21,8 @@
from fio import fio_metrics
from vm_metrics import vm_metrics
from gsheet import gsheet
+from bigquery import constants
+from bigquery import experiments_gcsfuse_bq

INSTANCE = socket.gethostname()
PERIOD_SEC = 120
@@ -40,6 +56,27 @@ def _parse_arguments(argv):
      default=False,
      required=False,
  )
+  parser.add_argument(
+      '--upload_bq',
+      help='Upload the results to the BigQuery.',
+      action='store_true',
+      default=False,
+      required=False,
+  )
+  parser.add_argument(
+      '--config_id',
+      help='Configuration ID of the experiment.',
+      action='store',
+      nargs=1,
+      required=False,
+  )
+  parser.add_argument(
+      '--start_time_build',
+      help='Start time of the build.',
+      action='store',
+      nargs=1,
+      required=False,
+  )
  return parser.parse_args(argv[1:])


@@ -51,15 +88,22 @@ def _parse_arguments(argv):

  args = _parse_arguments(argv)

+  temp = fio_metrics_obj.get_metrics(args.fio_json_output_path)
+  metrics_data = fio_metrics_obj.get_values_to_upload(temp)
+
  if args.upload_gs:
-    temp = fio_metrics_obj.get_metrics(args.fio_json_output_path, FIO_WORKSHEET_NAME)
-  else:
-    temp = fio_metrics_obj.get_metrics(args.fio_json_output_path)
+    gsheet.write_to_google_sheet(FIO_WORKSHEET_NAME,metrics_data)
+
+  if args.upload_bq:
+    if not args.config_id or not args.start_time_build:
+      raise Exception("Pass required arguments experiments configuration ID and start time of build for uploading to BigQuery")
+    bigquery_obj = experiments_gcsfuse_bq.ExperimentsGCSFuseBQ(constants.PROJECT_ID, constants.DATASET_ID)
+    bigquery_obj.upload_metrics_to_table(constants.FIO_TABLE_ID, args.config_id[0], args.start_time_build[0], metrics_data)

  print('Waiting for 360 seconds for metrics to be updated on VM...')
  # It takes up to 240 seconds for sampled data to be visible on the VM metrics graph
  # So, waiting for 360 seconds to ensure the returned metrics are not empty.
-  # Intermittenly custom metrics are not available after 240 seconds, hence
+  # Intermittently custom metrics are not available after 240 seconds, hence
  # waiting for 360 secs instead of 240 secs
  time.sleep(360)

@@ -96,3 +140,9 @@ def _parse_arguments(argv):

  if args.upload_gs:
    gsheet.write_to_google_sheet(VM_WORKSHEET_NAME, vm_metrics_data)
+
+  if args.upload_bq:
+    if not args.config_id or not args.start_time_build:
+      raise Exception("Pass required arguments experiments configuration ID and start time of build for uploading to BigQuery")
+    bigquery_obj = experiments_gcsfuse_bq.ExperimentsGCSFuseBQ(constants.PROJECT_ID, constants.DATASET_ID)
+    bigquery_obj.upload_metrics_to_table(constants.VM_TABLE_ID, args.config_id[0], args.start_time_build[0], vm_metrics_data)
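
The body of upload_metrics_to_table is not shown in this commit view; a hypothetical sketch of what such a method could look like with the google-cloud-bigquery client, with the class shape assumed from the call sites above (not the repository's actual implementation):

```python
from google.cloud import bigquery


class ExperimentsGCSFuseBQ:
  """Hypothetical sketch; the real class is in bigquery/experiments_gcsfuse_bq.py."""

  def __init__(self, project_id, dataset_id):
    self.project_id = project_id
    self.dataset_id = dataset_id
    self.client = bigquery.Client(project=project_id)

  def upload_metrics_to_table(self, table_id, config_id, start_time_build, metrics_data):
    # Prefix every metrics row with the experiment configuration id and the
    # build start time, then stream the rows into the target table.
    table = self.client.get_table(f'{self.project_id}.{self.dataset_id}.{table_id}')
    rows = [[config_id, int(start_time_build)] + list(row) for row in metrics_data]
    errors = self.client.insert_rows(table, rows)
    if errors:
      raise Exception(f'BigQuery insert failed: {errors}')
```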
51 changes: 29 additions & 22 deletions perfmetrics/scripts/fio/fio_metrics.py
@@ -1,3 +1,17 @@
+# Copyright 2023 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
"""Extracts required metrics from fio output file and writes to google sheet.
Takes fio output json filepath as command-line input
@@ -18,14 +32,17 @@
from fio import constants as consts
from gsheet import gsheet

+from bigquery import constants
+from bigquery import experiments_gcsfuse_bq


@dataclass(frozen=True)
class JobParam:
  """Dataclass for a FIO job parameter.
  name: Can be any suitable value, it refers to the output dictionary key for
    the parameter. To be used when creating parameter dict for each job.
  json_name: Must match the FIO job specification key. Key for parameter inside
    'global options'/'job options' dictionary
    Ex: For output json = {"global options": {"filesize":"50M"}, "jobs": [
    "job options": {"rw": "read"}]}
@@ -48,7 +65,7 @@ class JobParam:
class JobMetric:
  """Dataclass for a FIO job metric.
  name: Can be any suitable value, it is used as key for the metric
    when creating metric dict for each job
  levels: Keys for the metric inside 'read'/'write' dictionary in each job.
    Each value in the list must match the key in the FIO output JSON
@@ -403,12 +420,13 @@ def _extract_metrics(self, fio_out) -> List[Dict[str, Any]]:

    return all_jobs

-  def _add_to_gsheet(self, jobs, worksheet_name):
-    """Add the metric values to respective columns in a google sheet.
+  def get_values_to_upload(self, jobs):
+    """Get the metrics values in a list to export to Google Spreadsheet and BigQuery.
    Args:
-      jobs: list of dicts, contains required metrics for each job
-      worksheet_name: str, worksheet where job metrics should be written.
+      jobs: List of dicts, contains required metrics for each job
+    Returns:
+      list: A 2-d list consisting of metrics values for each job
    """

    values = []
@@ -422,29 +440,19 @@ def _add_to_gsheet(self, jobs, worksheet_name):
      for metric_val in job[consts.METRICS].values():
        row.append(metric_val)
      values.append(row)
+    return values

-    gsheet.write_to_google_sheet(worksheet_name, values)
-
-  def get_metrics(self,
-                  filepath,
-                  worksheet_name=None) -> List[Dict[str, Any]]:
-    """Returns job metrics obtained from given filepath and writes to gsheets.
+  def get_metrics(self, filepath) -> List[Dict[str, Any]]:
+    """Returns job metrics obtained from given filepath.
    Args:
-      filepath : str
-        Path of the json file to be parsed
-      worksheet_name: str, optional, default:None
-        Worksheet where job metrics should be written.
-        Pass '' or None to skip writing to Google sheets
+      filepath (str): Path of the json file to be parsed
    Returns:
      List of dicts, contains list of jobs and required metrics for each job
    """
    fio_out = self._load_file_dict(filepath)
    job_metrics = self._extract_metrics(fio_out)
-    if worksheet_name:
-      self._add_to_gsheet(job_metrics, worksheet_name)
-
    return job_metrics

if __name__ == '__main__':
@@ -455,6 +463,5 @@ def get_metrics(self,
        'python3 -m fio.fio_metrics <fio output json filepath>')

  fio_metrics_obj = FioMetrics()
-  temp = fio_metrics_obj.get_metrics(argv[1], 'fio_metrics_expt')
+  temp = fio_metrics_obj.get_metrics(argv[1])
  print(temp)
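
With this refactor, parsing and exporting are decoupled; a usage sketch using the names from the diff above (the worksheet name is illustrative):

```python
from fio import fio_metrics
from gsheet import gsheet

fio_obj = fio_metrics.FioMetrics()
jobs = fio_obj.get_metrics('output.json')      # parse the fio JSON only; no uploads
rows = fio_obj.get_values_to_upload(jobs)      # 2-d list shared by the gsheet and BigQuery paths
gsheet.write_to_google_sheet('fio_metrics', rows)  # uploading is now an explicit, separate step
```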
