# Success Report

Calculate the following metrics to compile into the HDP QA/QC success report:

- % of all observations flagged

- % of observations flagged per variable

- % of observations flagged per network

- % of flags per QA/QC test
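As a rough sketch of the first two metrics, the cell below computes the overall and per-variable flag percentages on a toy DataFrame. It assumes each variable has a companion `<var>_eraqc` column that is NaN when no flag was set (the log mentions "nans created for tas_eraqc", but the exact convention is an assumption here). The helper name `percent_flagged` and the flag values 10 and 13 are hypothetical.

```python
import numpy as np
import pandas as pd


def percent_flagged(df, variables, suffix="_eraqc"):
    """Return (overall %, per-variable % Series), treating any non-NaN flag as 'flagged'."""
    flag_cols = [v + suffix for v in variables]
    flagged = df[flag_cols].notna()       # True wherever a flag value is present
    per_var = 100 * flagged.mean()        # flagged rows / total rows, per flag column
    per_var.index = variables             # label results by variable name
    overall = 100 * flagged.to_numpy().mean()  # flagged cells / all cells
    return overall, per_var


# Toy data: flag values are made up for illustration only
df = pd.DataFrame(
    {
        "tas": [280.0, 281.0, 999.0, 282.0],
        "tas_eraqc": [np.nan, np.nan, 13.0, np.nan],
        "pr": [0.0, -1.0, 0.0, -2.0],
        "pr_eraqc": [np.nan, 10.0, np.nan, 10.0],
    }
)
overall, per_var = percent_flagged(df, ["tas", "pr"])
print("{:.2f}% of all obs flagged".format(overall))
print(per_var)
```

The denominator here is the number of rows, not the number of non-missing observations; which one the report should use is a choice still to be made.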


## Environment set-up

In [1]:
import datetime
import boto3
import geopandas as gpd
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
from io import BytesIO, StringIO

# Import qaqc stage calc functions
from QAQC_pipeline import qaqc_ds_to_df

# Silence warnings
import warnings
from shapely.errors import ShapelyDeprecationWarning

warnings.filterwarnings("ignore", category=RuntimeWarning)
warnings.filterwarnings(
    "ignore", category=ShapelyDeprecationWarning
)  # Warning is raised when creating Point object from coords. Can't figure out why.

plt.rcParams["figure.dpi"] = 300

In [2]:
# AWS credentials
s3 = boto3.resource("s3")
s3_cl = boto3.client("s3")

## AWS buckets
bucket = "wecc-historical-wx"
qaqcdir = "3_qaqc_wx/"
mergedir = "4_merge_wx/"

## Step 1: Read in all necessary information from log files

Read in a test log file.

https://pypi.org/project/aws-log-parser/



In [3]:
network = "ASOSAWOS"
log_id = "69007093217.05-01-2025"

key = "{}{}/qaqc_logs/qaqc_{}_{}.log".format(qaqcdir, network, network, log_id)

list_import = s3_cl.get_object(
    Bucket=bucket,
    Key=key,
)

In [None]:
log_file = list_import["Body"].read().decode()  # whole log as one string; Body is a stream, so it can only be read once
# log_file = BytesIO(list_import["Body"].read())
# log_file
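One possible way to get a list of lines rather than a single string is sketched below. The helper `read_log_lines` is hypothetical, and a `BytesIO` object stands in for the S3 `StreamingBody` so the example runs without AWS access; it only relies on the body having a `.read()` method that returns bytes. The two sample lines are copied from the log output shown in this notebook.

```python
from io import BytesIO


def read_log_lines(body):
    """Read a bytes stream once and return its non-empty lines as strings."""
    text = body.read().decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]


# Stand-in for list_import["Body"]
fake_body = BytesIO(
    b"2025-05-01 19:32:56,861 - INFO - Starting QAQC for station: ASOSAWOS_69007093217\n"
    b"\n"
    b"2025-05-01 19:32:56,862 - INFO - Reading file from AWS S3...\n"
)
log_lines = read_log_lines(fake_body)
print(len(log_lines), "lines read")
```

With the real response, the call would be `read_log_lines(list_import["Body"])`, made before anything else has consumed the stream.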

In [5]:
log_file

"2025-05-01 19:32:56,861 - INFO - Starting QAQC for station: ASOSAWOS_69007093217\n\n2025-05-01 19:32:56,862 - INFO - Reading file from AWS S3...\n2025-05-01 19:33:07,175 - INFO - Done reading. Ellapsed time: 10.32561731338501 s.\n\n2025-05-01 19:33:07,182 - INFO - Running QA/QC on: ASOSAWOS_69007093217\n\n2025-05-01 19:33:07,183 - INFO - 8 data variables will proceed through QA/QC.\n2025-05-01 19:33:07,267 - INFO - Existing observation and QC variables: ['tas', 'tdps', 'pr', 'sfcWind', 'sfcWind_dir', 'elevation', 'qaqc_process', 'ps_qc', 'ps_altimeter', 'ps_altimeter_qc', 'psl', 'psl_qc', 'tas_qc', 'tdps_qc', 'pr_qc', 'pr_duration', 'pr_depth_qc', 'sfcWind_qc', 'sfcWind_method', 'sfcWind_dir_qc']\n2025-05-01 19:33:07,270 - INFO - nans created for tas_eraqc\n2025-05-01 19:33:07,275 - INFO - nans created for tdps_eraqc\n2025-05-01 19:33:07,281 - INFO - nans created for pr_eraqc\n2025-05-01 19:33:07,286 - INFO - nans created for sfcWind_eraqc\n2025-05-01 19:33:07,291 - INFO - nans create

Print out lines containing the words "flag", "obs", and "total".

In [6]:
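# Iterating over the string itself yields single characters, so split it into lines first
keywords = ("flag", "obs", "total")
for line in log_file.splitlines():
    if any(word in line for word in keywords):
        print(line)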
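Since every log line shown above follows a `timestamp - LEVEL - message` layout, a small parser can make keyword filtering more targeted. The following is a minimal sketch, assuming that layout holds for the whole file; `parse_log_line` and `filter_records` are hypothetical helpers, and the sample lines are copied from the log output above. The keyword `"eraqc"` is used only because it occurs in the visible excerpt; `("flag", "obs", "total")` would be passed the same way.

```python
import re
from datetime import datetime

# timestamp with milliseconds, log level, then the free-text message
LOG_PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - (\w+) - (.*)$"
)


def parse_log_line(line):
    """Split one log line into time, level, and message; return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    stamp, level, message = match.groups()
    return {
        "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"),
        "level": level,
        "message": message,
    }


def filter_records(lines, keywords):
    """Keep parsed records whose message contains at least one keyword."""
    records = (parse_log_line(line) for line in lines)
    return [
        r for r in records
        if r is not None and any(k in r["message"] for k in keywords)
    ]


sample = [
    "2025-05-01 19:32:56,861 - INFO - Starting QAQC for station: ASOSAWOS_69007093217",
    "",
    "2025-05-01 19:33:07,183 - INFO - 8 data variables will proceed through QA/QC.",
    "2025-05-01 19:33:07,270 - INFO - nans created for tas_eraqc",
]
hits = filter_records(sample, ("eraqc",))
for record in hits:
    print(record["time"], record["message"])
```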

In [11]:
list_import

{'ResponseMetadata': {'RequestId': 'NKXRE5FQCQ1GA1FG',
  'HostId': 'Fk4lNhSHlEd3SoxsRkcg1x/DnVIb0bSBsjZ1tFt72AcA0rsmNhtfqTVJc9KzYfbvdR3rgb6I5xs=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'Fk4lNhSHlEd3SoxsRkcg1x/DnVIb0bSBsjZ1tFt72AcA0rsmNhtfqTVJc9KzYfbvdR3rgb6I5xs=',
   'x-amz-request-id': 'NKXRE5FQCQ1GA1FG',
   'date': 'Thu, 08 May 2025 18:01:55 GMT',
   'last-modified': 'Thu, 01 May 2025 19:34:21 GMT',
   'etag': '"b09765a20e3073b2138cff65d53f4d84"',
   'x-amz-server-side-encryption': 'AES256',
   'accept-ranges': 'bytes',
   'content-type': 'binary/octet-stream',
   'content-length': '12571',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'AcceptRanges': 'bytes',
 'LastModified': datetime.datetime(2025, 5, 1, 19, 34, 21, tzinfo=tzutc()),
 'ContentLength': 12571,
 'ETag': '"b09765a20e3073b2138cff65d53f4d84"',
 'ContentType': 'binary/octet-stream',
 'ServerSideEncryption': 'AES256',
 'Metadata': {},
 'Body': <botocore.response.StreamingBody at 0x7ff7436cc370>}

In [None]:
lines = log_file.splitlines()  # iterating over the string directly would yield characters, not lines

What information can we get from the figures?

In [None]:
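# NOTE: df, var, and flagval are not defined earlier in this notebook
# Percentage of rows in df whose <var>_eraqc flag equals flagval
flag_label = "{:.4f}% of data flagged".format(
    100 * (df[var + "_eraqc"] == flagval).sum() / len(df)
)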
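The label above covers a single flag value. To approach the "% of flags per QA/QC test" metric, the same calculation could be repeated for every flag value present in a variable's `_eraqc` column. The sketch below does this on toy data; the helper `flag_labels` and the flag values 10 and 13 are hypothetical, and it again assumes NaN means "not flagged".

```python
import numpy as np
import pandas as pd


def flag_labels(df, var, suffix="_eraqc"):
    """Map each flag value found for `var` to a '% of data flagged' label."""
    counts = df[var + suffix].value_counts()  # NaN (unflagged) rows are dropped by default
    return {
        flagval: "{:.4f}% of data flagged".format(100 * n / len(df))
        for flagval, n in counts.items()
    }


# Toy data: flag values are made up for illustration only
toy = pd.DataFrame(
    {
        "pr": [0.0, -1.0, 0.0, -2.0, 500.0],
        "pr_eraqc": [np.nan, 10.0, np.nan, 10.0, 13.0],
    }
)
labels = flag_labels(toy, "pr")
for flagval, label in labels.items():
    print(flagval, label)
```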