[Data Product] Content Rating Updater

Sowmya N Dixit edited this page Jun 29, 2022 · 1 revision

Summary

  • Type - Content rating updater: updates the content model with the average rating
  • Computation Level - Level 1
  • Frequency - Runs Daily

Purpose

  • Generate a Graph Update Event with the average rating of each content, computed from consumption data, and push it to ES, where it is used to create dashboards.

Inputs

  • Raw Telemetry - FEEDBACK events
  • Previous content rating summary from DB

Outputs

  1. Update content_rating_summary table in content_db
#Schema of data model
{
    "period": String, // Data sync date in YYYY-MM-DD format. For ex: 2019-05-04
    "content_id": String, // content id
    "content_type": String, // content type
    "total_rating": Double, // Sum of ratings for a content for the period
    "total_count": Long, // Number of times the content has been rated.
    "avg_rating": Double // Average rating on content
}
#Schema of table

TABLE content_rating_summary (
    period text,
    content_id text,
    content_type text,
    total_rating double,
    total_count bigint,
    avg_rating double,
    PRIMARY KEY (content_id, period)
);

  2. Generate a Graph Update Event and push it to the Kafka topic learning.graph.events.

#Schema of Graph Update Event
{
    "ets" : Long, // Event generation time in epoch
    "nodeUniqueId" : String, // content id
    "operationType": String, // default to UPDATE
    "nodeType": String, // default to DATA_NODE
    "graphId": String, // default to domain
    "objectType": String, // object type - Resolve object type from `content_type` field
    "nodeGraphId": Int, // default to 0
    "transactionData" : {
        "properties" : {
            "me_averageRating" : {
                "ov" : Double, // old value (previous average rating)
                "nv" : Double // new value (recomputed average rating)
            }
        }
    }
}
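
To make the schema above concrete, here is a minimal sketch that assembles such an event as a Python dict. The function name `build_graph_update_event` and its parameters are hypothetical and not part of the data product. The sketch also assumes that `ets` is in epoch milliseconds, that `ov`/`nv` hold the previous and the new average, and that the caller has already resolved the object type from `content_type`, since the resolution rule is not described here.

```python
import time
from typing import Any, Dict, Optional


def build_graph_update_event(
    content_id: str,
    object_type: str,
    new_avg: float,
    old_avg: Optional[float] = None,
    ets: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a Graph Update Event dict that follows the schema above."""
    return {
        # Assumption: epoch milliseconds; the schema only says "epoch".
        "ets": ets if ets is not None else int(time.time() * 1000),
        "nodeUniqueId": content_id,
        "operationType": "UPDATE",   # default from the schema
        "nodeType": "DATA_NODE",     # default from the schema
        "graphId": "domain",         # default from the schema
        # Assumption: already resolved from content_type by the caller.
        "objectType": object_type,
        "nodeGraphId": 0,            # default from the schema
        "transactionData": {
            "properties": {
                # Assumption: ov = previous average, nv = new average.
                "me_averageRating": {"ov": old_avg, "nv": new_avg}
            }
        },
    }
```

For example, `build_graph_update_event("do_1", "Content", 4.0, old_avg=3.5)` yields an event whose `me_averageRating` property is `{"ov": 3.5, "nv": 4.0}`; the content id `do_1` is made up for illustration.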

Algorithm Design

1. Update content_rating_summary table in content_db

  • Filter FEEDBACK events and group by content_id

Computation Table:

| Field | Computation | Remark |
|---|---|---|
| content_id | object.id value | |
| content_type | object.type value | |
| period | Sync date in YYYY-MM-DD format | The period is included to keep replays simple. If the job is replayed for the last 2 days, only those 2 records of each content are updated, and the final average for the graph event is recomputed from all records. |
| total_rating | Sum of edata.rating | |
| total_count | Count of FEEDBACK events | |
| avg_rating | total_rating/total_count | |
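
The per-period computation above can be sketched in plain Python as follows. This is an illustration only, not the job's actual implementation. The function name `summarise_feedback` is hypothetical, the event type is assumed to be carried in an `eid` field, and events that lack a content id or a rating are skipped, which is an assumption the table does not state.

```python
from typing import Dict, Iterable, List


def summarise_feedback(events: Iterable[dict], period: str) -> List[dict]:
    """Build one content_rating_summary row per content_id for a sync date."""
    rows: Dict[str, dict] = {}
    for event in events:
        # Assumption: the event type is carried in an "eid" field.
        if event.get("eid") != "FEEDBACK":
            continue
        obj = event.get("object") or {}
        rating = (event.get("edata") or {}).get("rating")
        # Assumption: events without a content id or a rating are skipped.
        if "id" not in obj or rating is None:
            continue
        row = rows.setdefault(obj["id"], {
            "period": period,
            "content_id": obj["id"],
            "content_type": obj.get("type"),
            "total_rating": 0.0,
            "total_count": 0,
        })
        row["total_rating"] += float(rating)   # Sum of edata.rating
        row["total_count"] += 1                # Count of FEEDBACK events
    for row in rows.values():
        row["avg_rating"] = row["total_rating"] / row["total_count"]
    # Sort by content_id so the output order is deterministic.
    return sorted(rows.values(), key=lambda r: r["content_id"])
```

Because each row is keyed by (content_id, period), re-running the function for the same sync date produces rows that overwrite the earlier ones for that date and leave other periods untouched.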

2. Generate Graph Update Event

  • Get the list of unique content_ids from FEEDBACK events.
  • Get all the entries in the content_rating_summary Cassandra table for each of those content_ids.
  • Compute Sum(total_rating), Sum(total_count) from Cassandra data.
  • Compute average_rating as Sum(total_rating)/Sum(total_count) and generate a Graph Update Event for each content.
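
The steps above can be sketched as below, assuming the summary rows have already been read from Cassandra into dicts with the table's column names. The function name `overall_average_ratings` and the sample content id are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def overall_average_ratings(
    rows: Iterable[dict], content_ids: Iterable[str]
) -> Dict[str, float]:
    """Average rating per content across all periods, for the given ids."""
    wanted = set(content_ids)
    # content_id -> [Sum(total_rating), Sum(total_count)]
    sums = defaultdict(lambda: [0.0, 0])
    for row in rows:
        cid = row["content_id"]
        if cid in wanted:
            sums[cid][0] += row["total_rating"]
            sums[cid][1] += row["total_count"]
    # Guard against division by zero for contents with no ratings.
    return {
        cid: total / count
        for cid, (total, count) in sums.items()
        if count > 0
    }


rows = [
    {"content_id": "do_1", "period": "2019-05-03", "total_rating": 9.0, "total_count": 2},
    {"content_id": "do_1", "period": "2019-05-04", "total_rating": 3.0, "total_count": 1},
]
print(overall_average_ratings(rows, ["do_1"]))  # {'do_1': 4.0}
```

Summing total_rating and total_count across periods weights every individual rating equally. Averaging the two daily avg_rating values instead (4.5 and 3.0) would give 3.75 rather than 4.0.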