Update main_cluster.py
Remove imp=0 uckeys before calculating global std and mean
radibnia77 committed Jan 4, 2022
1 parent 6fa27db commit 3f574a4d33d95d685ea696c7b28bbb4ff082c0fb
Showing 1 changed file with 3 additions and 0 deletions.
@@ -181,6 +181,9 @@ def run(hive_context, cluster_size_cfg, input_table_name,
 df = df.withColumn('imp', udf(lambda ts: sum(
 [_ for _ in ts if _]), IntegerType())(df.ts))
 
+# remove uckeys with 0 imp
+df = df.filter('imp>0')
+
 # add popularity = mean
 df = df.withColumn('p', udf(lambda ts: sum(
 [_ for _ in ts if _])/(1.0 * len(ts)), FloatType())(df.ts))
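The following is a minimal pure-Python sketch of why the new filter matters for the global statistics named in the commit message. It mirrors the two UDFs above. `imp` is the sum of the truthy entries of `ts`, and `p` is that sum divided by the full series length. It then compares the global mean and std of `p` with and without dropping zero-impression uckeys.

The function names (`impressions`, `popularity`, `global_stats`) and the sample data are hypothetical. The choice of population standard deviation (`statistics.pstdev`) is also an assumption, because the diff does not show how the global std is actually computed.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

# Hypothetical input: uckey -> impression time series (None = missing value).
Series = Dict[str, List[Optional[int]]]


def impressions(ts: List[Optional[int]]) -> int:
    """Total impressions; like the 'imp' UDF, skips None and 0 entries."""
    return sum(v for v in ts if v)


def popularity(ts: List[Optional[int]]) -> float:
    """Like the 'p' UDF: sum of truthy entries over the full series length."""
    return sum(v for v in ts if v) / (1.0 * len(ts))


def global_stats(series: Series, drop_zero_imp: bool = True) -> Tuple[float, float]:
    """Global mean and (assumed population) std of popularity across uckeys."""
    rows = list(series.items())
    if drop_zero_imp:
        # Equivalent of df.filter('imp>0'): drop uckeys with no impressions.
        rows = [(k, ts) for k, ts in rows if impressions(ts) > 0]
    ps = [popularity(ts) for _, ts in rows]
    return mean(ps), pstdev(ps)


data: Series = {
    "a": [1, 3, None, 4],  # imp = 8, p = 2.0
    "b": [0, 0, None, 0],  # imp = 0, p = 0.0 (removed by the filter)
    "c": [2, 2, 2, 2],     # imp = 8, p = 2.0
}

print(global_stats(data))                       # (2.0, 0.0)
print(global_stats(data, drop_zero_imp=False))  # mean ~1.333, std ~0.943
```

Every zero-impression uckey has `p = 0`. Keeping these uckeys therefore pulls the global mean toward 0 and inflates the std, which is the distortion the commit avoids by filtering them out first.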
