<pre>
Table: Views

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| article_id    | int     |
| author_id     | int     |
| viewer_id     | int     |
| view_date     | date    |
+---------------+---------+
There is no primary key (column with unique values) for this table; the table may have duplicate rows.
Each row of this table indicates that some viewer viewed an article (written by some author) on some date. 
Note that equal author_id and viewer_id indicate the same person.
 

Write a solution to find all the authors that viewed at least one of their own articles.

Return the result table sorted by id in ascending order.

The result format is in the following example.

 

Example 1:

Input: 
Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date  |
+------------+-----------+-----------+------------+
| 1          | 3         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |
+------------+-----------+-----------+------------+
Output: 
+------+
| id   |
+------+
| 4    |
| 7    |
+------+
</pre>
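
Before turning to Spark, the required logic can be sketched in plain Python: keep the rows where author_id equals viewer_id, collapse duplicates, and sort ascending. This is only an illustrative sketch; the function name `authors_who_viewed_own_articles` and the tuple layout are assumptions made for the example and are not part of the problem statement or of the notebook below.

```python
# Plain-Python sketch of the expected logic (illustrative helper, not part of the notebook).
def authors_who_viewed_own_articles(rows):
    """rows: iterable of (article_id, author_id, viewer_id, view_date) tuples."""
    # A set collapses duplicates caused by repeated rows or several self-views.
    ids = {author_id for _, author_id, viewer_id, _ in rows if author_id == viewer_id}
    # The problem asks for ascending order by id.
    return sorted(ids)


# Rows copied from Example 1 above.
example_rows = [
    (1, 3, 5, "2019-08-01"),
    (1, 3, 6, "2019-08-02"),
    (2, 7, 7, "2019-08-01"),
    (2, 7, 6, "2019-08-02"),
    (4, 7, 1, "2019-07-22"),
    (3, 4, 4, "2019-07-21"),
    (3, 4, 4, "2019-07-21"),
]

print(authors_who_viewed_own_articles(example_rows))  # [4, 7]
```

Author 4 appears only once in the result even though the self-view row is duplicated, which is why a distinct step is needed in the SQL and DataFrame versions as well.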

In [0]:
spark

In [0]:
# importing only the pyspark sql functions used below
from pyspark.sql.functions import col, to_date

# importing sql types from pyspark
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# importing SparkSession
from pyspark.sql import SparkSession


In [0]:
# creating spark session and providing app name
spark = SparkSession.builder.appName("leetcode-top-50-sql-solution-with-pyspark").getOrCreate()

In [0]:
# creating Schema
schema = StructType([
    StructField("article_id", IntegerType(), True),
    StructField("author_id", IntegerType(), True),
    StructField("viewer_id", IntegerType(), True),
    StructField("view_date", StringType(), True)
])

In [0]:
# Creating DataFrame for the views data
views_df = spark.createDataFrame([
    (1, 3, 5, "2019-08-01"),
    (1, 3, 6, "2019-08-02"),
    (2, 7, 7, "2019-08-01"),
    (2, 7, 6, "2019-08-02"),
    (4, 7, 1, "2019-07-22"),
    (3, 4, 4, "2019-07-21"),
    (3, 4, 4, "2019-07-21")
], schema=schema)

# DataFrames are immutable, so the converted result must be assigned back
views_df = views_df.withColumn("view_date", to_date("view_date", "yyyy-MM-dd"))
views_df


Out[35]: DataFrame[article_id: int, author_id: int, viewer_id: int, view_date: date]

In [0]:
views_df.display()

article_id,author_id,viewer_id,view_date
1,3,5,2019-08-01
1,3,6,2019-08-02
2,7,7,2019-08-01
2,7,6,2019-08-02
4,7,1,2019-07-22
3,4,4,2019-07-21
3,4,4,2019-07-21


In [0]:
# Leetcode Solution in Spark SQL

# Creating a temporary view of the views dataframe for sql queries
views_df.createOrReplaceTempView('views')
sql_result = spark.sql(
    '''
    SELECT DISTINCT
    author_id AS id
    FROM views
    WHERE author_id = viewer_id
    ORDER BY id
    '''
)

# Displaying Result
sql_result.display()

id
4
7


In [0]:
# LeetCode Solution with the DataFrame API (renamed to id and sorted ascending, as the problem requires)
filter_result = views_df.filter(col('author_id') == col('viewer_id')) \
    .select(col('author_id').alias('id')).distinct().orderBy('id')
# Displaying the filtered Result
filter_result.display()

id
4
7
