# MRP Outputs Review

## What this notebook does

This notebook reads in csv, jpeg, and html files that have been sent to the review bucket so that they can be checked for quality and disclosure. Every item requested for export must first be disclosure controlled by a separate DisCO (Disclosure Control Officer); this cannot be the same person who ran the data and requested the export. Once an item has been disclosure controlled, a Data Journey manager can be notified to move the data ready for export.

## Setup

In [None]:
## Authentication for GCS
options("googleAuthR.httr_oauth_cache" = "gce.oauth")
googleAuthR::gar_gce_auth()

In [None]:
# Details of data storage
REVIEW_BUCKET = "ons-psplus-analysis-prod-cis-review" # this is where the data to be reviewed is located

In [None]:
devtools::install("../gcptools", upgrade = FALSE)
library(googleCloudStorageR)
library(readr)
# This ensures that people cannot commit notebooks containing evaluated outputs. [IMPORTANT SECURITY FEATURE DO NOT REMOVE]
gcptools::commit_hooks_setup("/home/jupyter/CIS_MRP")

In [None]:
## default gives a warning about missing column name.
## custom parse function to suppress warning
f <- function(object){
  suppressWarnings(httr::content(object, encoding = "UTF-8"))
}

## Get csv files from the review bucket

<FONT COLOR="RED"> **INSTRUCTION:**</FONT> Select the csv file you want to review by replacing the string below with a filepath to the csv

In [None]:
object_name <- "20221121_mrp/probs_over_time_mrp_20221115_DTS221122_1411UTC.csv" #or "<yyyymmdd>_mrp_rerun/<filename>.csv"
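
Before fetching the object, it can help to confirm that the path follows the folder pattern noted in the comment above. The sketch below is illustrative only: `is_valid_csv_object` is a hypothetical helper, not part of `gcptools`, and the regular expression assumes that review paths always take the form `<yyyymmdd>_mrp/<filename>.csv` or `<yyyymmdd>_mrp_rerun/<filename>.csv`.

```r
# Hypothetical check (not part of gcptools): does the path look like
# "<yyyymmdd>_mrp/<filename>.csv" or "<yyyymmdd>_mrp_rerun/<filename>.csv"?
is_valid_csv_object <- function(object_name) {
  grepl("^[0-9]{8}_mrp(_rerun)?/[^/]+\\.csv$", object_name)
}

# A path in the expected form, and one with no dated folder
is_valid_csv_object("20221121_mrp/probs_over_time_mrp_20221115_DTS221122_1411UTC.csv")
is_valid_csv_object("probs_over_time.csv")
```

A failed check does not necessarily mean the file is missing from the bucket; it only flags a path that does not match the assumed naming convention.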

Get the data with custom parse function

In [None]:
data = gcs_get_object(object_name, bucket=REVIEW_BUCKET, parseFunction = f)

<FONT COLOR="RED"> **INSTRUCTION:**</FONT> Check the contents of your csv by using the head, tail, or summary functions

In [None]:
head(data) #or tail(data), summary(data) etc
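
Beyond `head`, a few structural checks can make the review more systematic. The following is a minimal sketch, not part of `gcptools`: `review_summary` is a hypothetical helper, and `example_data` is made-up data standing in for the csv read from the review bucket.

```r
# Hypothetical helper: collect basic facts about a data frame for review.
review_summary <- function(df) {
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    # First class of each column, e.g. "character" or "numeric"
    column_types = vapply(df, function(col) class(col)[1], character(1)),
    # Number of missing values in each column
    missing_per_column = colSums(is.na(df))
  )
}

# Made-up data standing in for the csv from the review bucket
example_data <- data.frame(
  region = c("A", "B", "C"),
  estimate = c(0.12, NA, 0.08),
  stringsAsFactors = FALSE
)

checks <- review_summary(example_data)
print(checks)
```

In the notebook itself, `review_summary(data)` could be called on the object returned by `gcs_get_object`, assuming it was parsed into a data frame.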

## Download QA report

If the 'QA_reports' folder does not exist in the main directory, create it. Either way, set the working directory to the QA_reports folder so that outputs are saved there.

In [1]:
main_directory <- "/home/jupyter"
sub_directory <- "QA_reports"

report_path <- file.path(main_directory, sub_directory)

# Check the full path, not a path relative to the current working directory
if (!dir.exists(report_path)) {
    dir.create(report_path)
}
setwd(report_path)


<FONT COLOR="RED"> **INSTRUCTION:**</FONT> Replace the string in the 'file' argument to select the correct file from the review bucket

In [5]:
gcptools::download_qa_report_to_notebook(file = "20230306_mrp/MRP_QA_Northern_Ireland_Datarun20230306_Co20230228_PrevCo20230221.html", 
                                        bucket = "ons-psplus-analysis-prod-cis-review" ) #or "<yyyymmdd>_mrp_rerun/<filename>.html"

ℹ Downloading 20230306_mrp/MRP_QA_Northern_Ireland_Datarun20230306_Co20230228_P…

✔ Downloaded and parsed 20230306_mrp/MRP_QA_Northern_Ireland_Datarun20230306_Co…

<FONT COLOR="RED"> **INSTRUCTION:**</FONT> Locate the QA_reports folder at the repo level of folders (i.e. where you would go to navigate to other repos) to review the report
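
To confirm from within the notebook that the report has landed in the folder, the downloaded files can be listed. This is a sketch under assumptions: `list_qa_reports` is a hypothetical helper, not a `gcptools` function, and it assumes the reports were saved as `.html` files in `/home/jupyter/QA_reports` as set up earlier.

```r
# Hypothetical helper: list downloaded HTML reports, most recently modified first.
list_qa_reports <- function(report_dir) {
  files <- list.files(report_dir, pattern = "\\.html$", full.names = TRUE)
  info <- file.info(files)
  files[order(info$mtime, decreasing = TRUE)]
}

# Folder used earlier in this notebook
report_dir <- file.path("/home/jupyter", "QA_reports")
if (dir.exists(report_dir)) {
  print(basename(list_qa_reports(report_dir)))
}
```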