## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**578. Get Highest Answer Rate Question (Medium)**

**Table: SurveyLog**

| Column Name | Type |
|-------------|------|
| id          | int  |
| action      | ENUM |
| question_id | int  |
| answer_id   | int  |
| q_num       | int  |
| timestamp   | int  |

- This table may contain duplicate rows.
- action is an ENUM (category) of the type: "show", "answer", or "skip".
- Each row of this table indicates that the user with ID = id has taken an action on the question question_id at time timestamp.
- If the action taken by the user is "answer", answer_id will contain the id of that answer; otherwise, it will be null.
- q_num is the numeral order of the question in the current session.
 
The answer rate for a question is the number of times the question was answered divided by the number of times it was shown.

**Write a solution to report the question that has the highest answer rate. If multiple questions have the same maximum answer rate, report the question with the smallest question_id.**

The result format is in the following example.

Example 1:

**Input:**
**SurveyLog table:**

| id | action | question_id | answer_id | q_num | timestamp |
|----|--------|-------------|-----------|-------|-----------|
| 5  | show   | 285         | null      | 1     | 123       |
| 5  | answer | 285         | 124124    | 1     | 124       |
| 5  | show   | 369         | null      | 2     | 125       |
| 5  | skip   | 369         | null      | 2     | 126       |

**Output:**

| survey_log |
|------------|
| 285        |

**Explanation:**
- Question 285 was shown 1 time and answered 1 time. The answer rate of question 285 is 1.0.
- Question 369 was shown 1 time and was not answered. The answer rate of question 369 is 0.0.
- Question 285 has the highest answer rate.
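
The arithmetic in the explanation can be cross-checked without Spark. The following is a minimal plain-Python sketch, not part of the original solution; the helper name `highest_answer_rate` is hypothetical, and it assumes that only questions with at least one "show" row have a defined answer rate.

```python
from collections import Counter

# Rows mirror the SurveyLog example:
# (id, action, question_id, answer_id, q_num, timestamp)
rows = [
    (5, "show", 285, None, 1, 123),
    (5, "answer", 285, 124124, 1, 124),
    (5, "show", 369, None, 2, 125),
    (5, "skip", 369, None, 2, 126),
]


def highest_answer_rate(rows):
    """Return (best_question_id, rates) for SurveyLog-style tuples.

    Hypothetical helper for illustration only. Assumption: a question
    that was never shown has no defined rate and is ignored.
    """
    shows = Counter(r[2] for r in rows if r[1] == "show")
    answers = Counter(r[2] for r in rows if r[1] == "answer")

    # Counter returns 0 for questions that were shown but never answered.
    rates = {q: answers[q] / n for q, n in shows.items()}
    if not rates:
        return None, rates

    # Highest rate first; ties go to the smallest question_id.
    best = min(rates, key=lambda q: (-rates[q], q))
    return best, rates


best, rates = highest_answer_rate(rows)
print(best, rates)
```

Sorting on the pair `(-rate, question_id)` applies the tie-break rule from the problem statement in a single comparison, the same ordering the PySpark window uses in the solution cell.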

In [0]:
survey_log_data_578 = [
    (5, "show", 285, None, 1, 123),
    (5, "answer", 285, 124124, 1, 124),
    (5, "show", 369, None, 2, 125),
    (5, "skip", 369, None, 2, 126)
]

survey_log_columns_578 = ["id", "action", "question_id", "answer_id", "q_num", "timestamp"]
survey_log_df_578 = spark.createDataFrame(survey_log_data_578, survey_log_columns_578)
survey_log_df_578.show()


In [0]:
# Count "answer" actions per question.
answer_counts_df_578 = (
    survey_log_df_578
    .filter(col("action") == "answer")
    .groupBy("question_id")
    .agg(count("action").alias("answer_count"))
)

# Count "show" actions per question.
show_counts_df_578 = (
    survey_log_df_578
    .filter(col("action") == "show")
    .groupBy("question_id")
    .agg(count("action").alias("show_count"))
)

# The left join keeps questions that were shown but never answered; their rate is 0.0.
answer_rates_df_578 = (
    show_counts_df_578
    .join(answer_counts_df_578, on="question_id", how="left")
    .withColumn(
        "answer_rate",
        when(col("answer_count").isNull(), 0.0)
        .otherwise(col("answer_count") / col("show_count")),
    )
    .select("question_id", "answer_rate")
)

# Rank by highest answer rate, breaking ties with the smallest question_id.
window_spec = Window.orderBy(col("answer_rate").desc(), col("question_id").asc())
ranked_df_578 = answer_rates_df_578.withColumn("rank", rank().over(window_spec))

# Report the top question under the column name the expected output uses.
(
    ranked_df_578
    .filter(col("rank") == 1)
    .select(col("question_id").alias("survey_log"))
    .show()
)