## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**1951. All the Pairs With the Maximum Number of Common Followers (Medium)**

**Table: Relations**

| Column Name | Type |
|-------------|------|
| user_id     | int  |
| follower_id | int  |

(user_id, follower_id) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates that the user with ID follower_id is following the user with ID user_id.
 
**Write a solution to find all the pairs of users with the maximum number of common followers. In other words, if the maximum number of common followers between any two users is maxCommon, then you have to return all pairs of users that have maxCommon common followers.**

The result table should contain the pairs user1_id and user2_id where user1_id < user2_id.

Return the result table in any order.

The result format is in the following example.

**Example 1:**

**Input:** 

**Relations table:**

| user_id | follower_id |
|---------|-------------|
| 1       | 3           |
| 2       | 3           |
| 7       | 3           |
| 1       | 4           |
| 2       | 4           |
| 7       | 4           |
| 1       | 5           |
| 2       | 6           |
| 7       | 5           |

**Output:**

| user1_id | user2_id |
|----------|----------|
| 1        | 7        |

**Explanation:** 
- Users 1 and 2 have two common followers (3 and 4).
- Users 1 and 7 have three common followers (3, 4, and 5).
- Users 2 and 7 have two common followers (3 and 4).

Since the maximum number of common followers between any two users is 3, we return all pairs of users with three common followers, which is only the pair (1, 7). We return the pair as (1, 7), not as (7, 1).

**Note** that we do not have any information about the users that follow users 3, 4, and 5, so we consider them to have 0 followers.
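
The explanation above can be checked by hand with plain Python sets before involving Spark. This is only a minimal sketch for the example input: the names `followers`, `common_counts`, and `best_pairs` are illustrative and not part of the original notebook. It calls `builtins.max` explicitly because the notebook's wildcard import from `pyspark.sql.functions` rebinds the name `max`.

```python
import builtins
from itertools import combinations

# Example rows from the Relations table: (user_id, follower_id).
relations = [(1, 3), (2, 3), (7, 3), (1, 4), (2, 4), (7, 4), (1, 5), (2, 6), (7, 5)]

# Map each user to the set of users who follow them.
followers = {}
for user_id, follower_id in relations:
    followers.setdefault(user_id, set()).add(follower_id)

# Count shared followers for every pair; sorting guarantees user1_id < user2_id.
common_counts = {
    (u1, u2): len(followers[u1] & followers[u2])
    for u1, u2 in combinations(sorted(followers), 2)
}

# builtins.max avoids the Spark function of the same name.
max_common = builtins.max(common_counts.values())
best_pairs = [pair for pair, n in common_counts.items() if n == max_common]

print(common_counts)
print(best_pairs)
```

For the example data this yields counts of 2, 3, and 2 for the pairs (1, 2), (1, 7), and (2, 7), so only (1, 7) remains.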

In [0]:
relations_data_1951 = [
    (1, 3), (2, 3), (7, 3), (1, 4), (2, 4),
    (7, 4), (1, 5), (2, 6), (7, 5)
]

relations_columns_1951 = ["user_id", "follower_id"]
relations_df_1951 = spark.createDataFrame(relations_data_1951, relations_columns_1951)
relations_df_1951.show()

+-------+-----------+
|user_id|follower_id|
+-------+-----------+
|      1|          3|
|      2|          3|
|      7|          3|
|      1|          4|
|      2|          4|
|      7|          4|
|      1|          5|
|      2|          6|
|      7|          5|
+-------+-----------+



In [0]:
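# Self-join on follower_id: each matched row is one follower shared by two users.
# The a.user_id < b.user_id filter keeps each unordered pair once and drops self-pairs.
pairs_df_1951 = (
    relations_df_1951.alias("a")
    .join(relations_df_1951.alias("b"), col("a.follower_id") == col("b.follower_id"))
    .filter(col("a.user_id") < col("b.user_id"))
    .select(
        col("a.user_id").alias("user1_id"),
        col("b.user_id").alias("user2_id"),
        col("a.follower_id").alias("common_follower"),
    )
)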

In [0]:
# Count the shared followers of each (user1_id, user2_id) pair.
common_counts_df_1951 = (
    pairs_df_1951
    .groupBy("user1_id", "user2_id")
    .agg(countDistinct("common_follower").alias("common_followers"))
)

In [0]:
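# Global maximum as a plain int (not a DataFrame); `max` is pyspark.sql.functions.max via the wildcard import.
max_common_1951 = (
    common_counts_df_1951.agg(max("common_followers").alias("max_common")).collect()[0]["max_common"]
)

In [0]:
# Keep only the pairs whose count equals the maximum.
(common_counts_df_1951
    .filter(col("common_followers") == max_common_1951)
    .select("user1_id", "user2_id")
    .show())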

+--------+--------+
|user1_id|user2_id|
+--------+--------+
|       1|       7|
+--------+--------+

