## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**1126. Active Businesses (Medium)**

**Table: Events**

| Column Name   | Type    |
|---------------|---------|
| business_id   | int     |
| event_type    | varchar |
| occurrences   | int     | 

(business_id, event_type) is the primary key (combination of columns with unique values) of this table.
Each row in the table records that an event of some type occurred at some business a given number of times.
The average activity for a particular event_type is the average occurrences across all businesses that have this event.

An active business is a business that has more than one event_type whose occurrences are strictly greater than the average activity for that event type.
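The definition can be pinned down with a plain-Python sketch before moving to Spark. It assumes rows are `(business_id, event_type, occurrences)` tuples, and the function name `active_businesses` is hypothetical, not part of the problem. Because `(business_id, event_type)` is the primary key, counting qualifying rows per business is the same as counting distinct qualifying event types.

```python
from collections import defaultdict


def active_businesses(rows):
    """Return sorted business_ids that beat the per-event average on more than one event type.

    rows: iterable of (business_id, event_type, occurrences) tuples (assumed shape).
    """
    rows = list(rows)

    # Average activity per event_type, over every business that logged that event.
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _, event_type, occurrences in rows:
        totals[event_type] += occurrences
        counts[event_type] += 1
    averages = {e: totals[e] / counts[e] for e in totals}

    # Per business, count the event types where it is strictly above the average.
    above = defaultdict(int)
    for business_id, event_type, occurrences in rows:
        if occurrences > averages[event_type]:
            above[business_id] += 1

    # "More than one" means at least two qualifying event types.
    return sorted(b for b, n in above.items() if n > 1)


events = [
    (1, "reviews", 7), (3, "reviews", 3), (1, "ads", 11), (2, "ads", 7),
    (3, "ads", 6), (1, "page views", 3), (2, "page views", 12),
]
print(active_businesses(events))  # [1]
```

Note the strict comparison: a business whose occurrences exactly equal the average does not count toward being active.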

**Write a solution to find all active businesses.**

Return the result table in any order.

The result format is in the following example.

**Example 1:**

**Input:** 

**Events table:**

| business_id | event_type | occurrences |
|-------------|------------|-------------|
| 1           | reviews    | 7           |
| 3           | reviews    | 3           |
| 1           | ads        | 11          |
| 2           | ads        | 7           |
| 3           | ads        | 6           |
| 1           | page views | 3           |
| 2           | page views | 12          |

**Output:**

| business_id |
|-------------|
| 1           |

**Explanation:**  
The average activity for each event can be calculated as follows:
- 'reviews': (7+3)/2 = 5
- 'ads': (11+7+6)/3 = 8
- 'page views': (3+12)/2 = 7.5

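The business with id=1 has 7 'reviews' events (more than 5) and 11 'ads' events (more than 8), so it is an active business. The business with id=2 exceeds the average only for 'page views' (12 > 7.5), and the business with id=3 exceeds it for no event type, so neither is active.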

In [0]:
events_data_1126 = [
    (1, "reviews", 7),
    (3, "reviews", 3),
    (1, "ads", 11),
    (2, "ads", 7),
    (3, "ads", 6),
    (1, "page views", 3),
    (2, "page views", 12),
]

events_columns_1126 = ["business_id", "event_type", "occurrences"]
events_df_1126 = spark.createDataFrame(events_data_1126, events_columns_1126)
events_df_1126.show()

+-----------+----------+-----------+
|business_id|event_type|occurrences|
+-----------+----------+-----------+
|          1|   reviews|          7|
|          3|   reviews|          3|
|          1|       ads|         11|
|          2|       ads|          7|
|          3|       ads|          6|
|          1|page views|          3|
|          2|page views|         12|
+-----------+----------+-----------+



In [0]:
# Average occurrences per event_type across every business that logged it
event_avg_df_1126 = events_df_1126.groupBy("event_type").agg(avg("occurrences").alias("avg_occurrences"))

In [0]:
# Keep only the rows where a business beats the average for that event_type
above_avg_df_1126 = (
    events_df_1126
    .join(event_avg_df_1126, on="event_type", how="inner")
    .filter(col("occurrences") > col("avg_occurrences"))
)

In [0]:
# Active businesses beat the average on more than one event_type
active_df_1126 = (
    above_avg_df_1126.groupBy("business_id")
    .agg(countDistinct("event_type").alias("above_avg_event_count"))
    .filter(col("above_avg_event_count") > 1)
    .select("business_id")
)
active_df_1126.show()

+-----------+
|business_id|
+-----------+
|          1|
+-----------+

