cQube data replay process

The data replay process depends on the data source. It involves the following steps:

● The admin is given a screen for selecting which data to clear for each data source. The screen offers the following selection options:

  1. For student attendance and teacher attendance, the admin selects the ‘year and month’ from the year and month drop-downs.
  2. For CRC and Diksha summary rollup, the admin selects dates from a calendar. The data for the selected dates will be deleted.
  3. For the semester reports, the admin selects the required semester from the available semesters. The data for the selected semester will be deleted.
  4. For Periodic Assessment Test, the admin selects exam codes from a multi-select box that lists all available exam codes. All data related to the selected exam codes will be deleted.
  5. For Diksha TPD, the admin selects Batch IDs from a select box that lists all available Batch IDs. All data related to the selected Batch IDs will be deleted.
  6. For the UDISE and Infrastructure data sources, the admin chooses ‘Yes’ or ‘No’ from a select box to delete all data. A full refresh then happens with the new data.
  7. For the static data sources, the admin chooses ‘Yes’ or ‘No’ from a select box to delete all data. A full refresh then happens with the new data.

● The admin screen provides Submit and Reset All buttons to submit the request and to reset the options.
● When the admin clicks the Submit button, the selections for all data sources are written to a JSON file, as shown below:

{
  "student_attendance": {
    "year": "2020",
    "months": ["01", "03"]
  },
  "teacher_attendance": {
    "year": "2020",
    "months": ["01", "03"]
  },
  "crc": {
    "year": "2020",
    "months": ["01", "03"]
  },
  "diksha_summary_rollup": {
    "from_date": "",
    "to_date": ""
  },
  "semester": {
    "semester": [1, 2]
  },
  "periodic_assessment_test": {
    "exam_code": ["PAT010101012021", "PAT010201012021"]
  },
  "diksha_tpd": {
    "batch_id": ["03052315462389", "046789546783"]
  },
  "udise": {
    "selection": "yes/no"
  },
  "Infrastructure": {
    "selection": "yes/no"
  },
  "static": {
    "selection": "yes/no"
  }
}

● The JSON file containing the values selected by the admin will be placed in the S3 emission bucket.
● A scheduler will be created for the data replay process for all reports. It runs on the schedule defined by the admin.
● The scheduler triggers NIFI to fetch the file from the S3 input bucket. NIFI then deletes the data for each data source according to the inputs given by the admin.
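
The request file described above could be assembled and sanity-checked before it is placed in the bucket. The following is a minimal Python sketch, not part of cQube itself: the names `EXPECTED_KEYS`, `validate_request` and `build_request_file` are hypothetical, the key names are copied from the sample JSON, and it assumes a request may contain only a subset of the data sources.

```python
import json

# Data sources and the selection keys each one carries,
# copied from the sample request file (assumed to be complete).
EXPECTED_KEYS = {
    "student_attendance": {"year", "months"},
    "teacher_attendance": {"year", "months"},
    "crc": {"year", "months"},
    "diksha_summary_rollup": {"from_date", "to_date"},
    "semester": {"semester"},
    "periodic_assessment_test": {"exam_code"},
    "diksha_tpd": {"batch_id"},
    "udise": {"selection"},
    "Infrastructure": {"selection"},
    "static": {"selection"},
}


def validate_request(request):
    """Return a list of problems found in a replay request (empty if none)."""
    problems = []
    for source, selection in request.items():
        expected = EXPECTED_KEYS.get(source)
        if expected is None:
            problems.append(f"unknown data source: {source}")
        elif set(selection) != expected:
            problems.append(f"{source}: expected keys {sorted(expected)}")
    return problems


def build_request_file(request):
    """Serialise a valid request to the JSON text that would be emitted."""
    problems = validate_request(request)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(request, indent=2)


# Example: clear two months of student attendance and all UDISE data.
request = {
    "student_attendance": {"year": "2020", "months": ["01", "03"]},
    "udise": {"selection": "yes"},
}
print(build_request_file(request))
```

Checking the keys before emission means a malformed request is rejected on the admin side rather than at the scheduled deletion run.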

Data deletion process

Once the file has been emitted to the S3 bucket, the NIFI function is invoked at the scheduled time and reads the input parameters from the JSON file. It then executes the deletion queries, which remove the data from the transaction tables. After the workflow has run, the output files are updated to reflect the deleted data.

Data reprocessing (for previously deleted data) flow

Data reprocessing follows the normal cQube emission process.

● The latest data file is emitted to the S3 emission bucket.

● NIFI processes the file as part of the regular data process. NIFI performs all the validations and inserts the validated data into the transaction tables.

● All the metrics are recalculated and the output files are updated.

● The new metrics are reflected in the reports.

The complete workflow is shown below.

Workflow Process

List of tables cleared for each data source

| datasource | parameter | list of tables | function call |
| --- | --- | --- | --- |
| student_attendance | month, year | student_attendance_meta, student_attendance_staging_1, student_attendance_staging_2, student_attendance_trans, school_student_total_attedance | select del_data(p_data_source=>'student_attendance',p_year=>2022,VARIADIC p_month=>array[1,2]); |
| teacher_attendance | month, year | teacher_attendance_meta, teacher_attendance_staging_1, teacher_attendance_staging_1, teacher_attendance_temp, teacher_attendance_trans, school_teacher_total_attendance | select del_data(p_data_source=>'teacher_attendance',p_year=>2022,VARIADIC p_month=>array[1,2]); |
| crc | month, year | crc_location_trans, crc_inspection_trans, crc_visits_frequency | select del_data(p_data_source=>'crc',p_year=>2022,VARIADIC p_month=>array[1,2]); |
| semester_assessment_test | exam_code/semester | semester_exam_mst, semester_exam_result_staging_2, semester_exam_school_qst_result, semester_exam_result_temp, semester_exam_school_result, semester_exam_qst_mst, semester_exam_result_staging_1, semester_exam_result_trans | select pat_del_data(p_data_source=>'periodic_assessment_test',VARIADIC p_exam_code=>array['PAT0302290720201','PAT0302290720202']); |
| periodic_assessment_test | exam_code | periodic_exam_mst, periodic_exam_result_staging_2, periodic_exam_school_qst_result, periodic_exam_result_temp, periodic_exam_school_result, periodic_exam_qst_mst, periodic_exam_result_staging_1, periodic_exam_result_trans | select pat_del_data(p_data_source=>'periodic_assessment_test',VARIADIC p_exam_code=>array['PAT0302290720201','PAT0302290720202']); |
| diksha_tpd | batch_id | diksha_tpd_agg, diksha_tpd_trans, diksha_tpd_content_temp, diksha_tpd_staging | select diksha_tpd_del_data(p_data_source=>'diksha_tpd',VARIADIC p_batch_id=>array['0302290720201','0302290720202']); |
| diksha_summary_rollup | from_date, to_date | diksha_content_staging, diksha_content_temp, diksha_content_trans, diksha_total_content | select diksha_summary_rollup_del_data('diksha_summary_rollup','2022-12-27','2022-12-31'); |
| infrastructure | all | infrastructure_temp, infrastructure_trans | select all_del_data('infrastructure'); |
| static | all | block_tmp, block_mst, district_tmp, district_mst, cluster_tmp, cluster_mst, school_master, school_tmp, school_hierarchy_details, school_geo_master | select all_del_data('static'); |
| udise | all | udise_sch_incen_cwsn, udise_nsqf_plcmnt_c12, udise_sch_enr_reptr, udise_nsqf_basic_info, udise_sch_incentives, udise_nsqf_trng_prov, udise_sch_exmmarks_c10, udise_nsqf_class_cond, udise_school_metrics_trans, udise_sch_exmmarks_c12, udise_sch_pgi_details, udise_nsqf_enr_caste, udise_sch_enr_age, udise_sch_exmres_c10, udise_sch_profile, udise_nsqf_enr_sub_sec, udise_sch_enr_by_stream, udise_sch_exmres_c12, udise_sch_recp_exp, udise_nsqf_exmres_c10, udise_sch_enr_cwsn, udise_sch_exmres_c5, udise_sch_safety, udise_nsqf_exmres_c12, udise_sch_enr_fresh, udise_sch_exmres_c8, udise_sch_staff_posn, udise_nsqf_faculty, udise_sch_enr_medinstr, udise_sch_facility, udise_tch_profile, udise_nsqf_plcmnt_c10, udise_sch_enr_newadm | select all_del_data('udise'); |
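
To illustrate how the admin's JSON selections might map onto the function calls listed above, here is a minimal Python sketch. It is an assumption about the translation step, not the actual NIFI flow. The helpers `sql_literal` and `build_delete_calls` are hypothetical, the data source name is lower-cased so that `"Infrastructure"` matches `all_del_data('infrastructure')`, and the semester source is skipped because its function call is not unambiguous in this document. A real implementation would pass values as bound query parameters rather than building SQL strings.

```python
def sql_literal(value):
    """Quote a value as a SQL string literal, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_delete_calls(request):
    """Translate a replay request (dict) into the SQL calls to execute."""
    calls = []
    for source, sel in request.items():
        name = source.lower()  # assumption: keys are matched case-insensitively
        if name in ("student_attendance", "teacher_attendance", "crc"):
            # Months arrive as strings such as "01"; the functions take integers.
            months = ",".join(str(int(m)) for m in sel["months"])
            calls.append(
                f"select del_data(p_data_source=>{sql_literal(name)},"
                f"p_year=>{int(sel['year'])},VARIADIC p_month=>array[{months}]);"
            )
        elif name == "periodic_assessment_test":
            codes = ",".join(sql_literal(c) for c in sel["exam_code"])
            calls.append(
                f"select pat_del_data(p_data_source=>{sql_literal(name)},"
                f"VARIADIC p_exam_code=>array[{codes}]);"
            )
        elif name == "diksha_tpd":
            batches = ",".join(sql_literal(b) for b in sel["batch_id"])
            calls.append(
                f"select diksha_tpd_del_data(p_data_source=>{sql_literal(name)},"
                f"VARIADIC p_batch_id=>array[{batches}]);"
            )
        elif name == "diksha_summary_rollup":
            # Skip the source when no date range was selected.
            if sel["from_date"] and sel["to_date"]:
                calls.append(
                    f"select diksha_summary_rollup_del_data({sql_literal(name)},"
                    f"{sql_literal(sel['from_date'])},{sql_literal(sel['to_date'])});"
                )
        elif name in ("udise", "infrastructure", "static"):
            # Full deletion only when the admin chose "yes".
            if sel["selection"] == "yes":
                calls.append(f"select all_del_data({sql_literal(name)});")
    return calls


request = {
    "student_attendance": {"year": "2020", "months": ["01", "03"]},
    "Infrastructure": {"selection": "yes"},
    "static": {"selection": "no"},
}
for call in build_delete_calls(request):
    print(call)
```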