## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**3172. Second Day Verification (Easy)**

**Table: emails**

| Column Name | Type     | 
|-------------|----------|
| email_id    | int      |
| user_id     | int      |
| signup_date | datetime |

(email_id, user_id) is the primary key (combination of columns with unique values) for this table.
Each row of this table contains the email ID, user ID, and signup date.

**Table: texts**

| Column Name   | Type     | 
|---------------|----------|
| text_id       | int      |
| email_id      | int      |
| signup_action | enum     |
| action_date   | datetime |

(text_id, email_id) is the primary key (combination of columns with unique values) for this table. 
signup_action is an enum type of ('Verified', 'Not Verified'). 
Each row of this table contains the text ID, email ID, signup action, and action date.

**Write a solution to find the user IDs of those who verified their sign-up on the second day.**

Return the result table ordered by user_id in ascending order.

The result format is in the following example.

**Example:**

**Input:**

**emails table:**

| email_id | user_id | signup_date         |
|----------|---------|---------------------|
| 125      | 7771    | 2022-06-14 09:30:00 |
| 433      | 1052    | 2022-07-09 08:15:00 |
| 234      | 7005    | 2022-08-20 10:00:00 |

**texts table:**

| text_id | email_id | signup_action | action_date         |
|---------|----------|---------------|---------------------|
| 1       | 125      | Verified      | 2022-06-15 08:30:00 |
| 2       | 433      | Not Verified  | 2022-07-10 10:45:00 |
| 4       | 234      | Verified      | 2022-08-21 09:30:00 |
    
**Output:**

| user_id |
|---------|
| 7005    |
| 7771    |

**Explanation:**
- The user with user_id 7005 and email_id 234 signed up on 2022-08-20 10:00:00 and verified on the second day, 2022-08-21.
- The user with user_id 7771 and email_id 125 signed up on 2022-06-14 09:30:00 and verified on the second day, 2022-06-15.
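
In both examples the verification arrives less than 24 hours after sign-up, so "second day" appears to mean the next calendar day rather than a full day of elapsed time. The sketch below illustrates that reading in plain Python; the helper name `calendar_day_diff` is hypothetical and not part of the original notebook.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def calendar_day_diff(later: str, earlier: str) -> int:
    """Return the difference in calendar days, ignoring the time of day."""
    later_date = datetime.strptime(later, FMT).date()
    earlier_date = datetime.strptime(earlier, FMT).date()
    return (later_date - earlier_date).days


# User 7771: verified 23 hours after signing up, on the next calendar day.
print(calendar_day_diff("2022-06-15 08:30:00", "2022-06-14 09:30:00"))
# User 7005: verified 23.5 hours after signing up, on the next calendar day.
print(calendar_day_diff("2022-08-21 09:30:00", "2022-08-20 10:00:00"))
```

Under this assumption a difference of exactly 1 identifies a second-day verification, regardless of how many hours actually passed.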

In [0]:
emails_data_3172 = [
    (125, 7771, "2022-06-14 09:30:00"),
    (433, 1052, "2022-07-09 08:15:00"),
    (234, 7005, "2022-08-20 10:00:00")
]

emails_columns_3172 = ["email_id", "user_id", "signup_date"]
emails_df_3172 = spark.createDataFrame(emails_data_3172, emails_columns_3172)
emails_df_3172.show()

texts_data_3172 = [
    (1, 125, "Verified", "2022-06-15 08:30:00"),
    (2, 433, "Not Verified", "2022-07-10 10:45:00"),
    (4, 234, "Verified", "2022-08-21 09:30:00")
]

texts_columns_3172 = ["text_id", "email_id", "signup_action", "action_date"]
texts_df_3172 = spark.createDataFrame(texts_data_3172, texts_columns_3172)
texts_df_3172.show()

+--------+-------+-------------------+
|email_id|user_id|        signup_date|
+--------+-------+-------------------+
|     125|   7771|2022-06-14 09:30:00|
|     433|   1052|2022-07-09 08:15:00|
|     234|   7005|2022-08-20 10:00:00|
+--------+-------+-------------------+

+-------+--------+-------------+-------------------+
|text_id|email_id|signup_action|        action_date|
+-------+--------+-------------+-------------------+
|      1|     125|     Verified|2022-06-15 08:30:00|
|      2|     433| Not Verified|2022-07-10 10:45:00|
|      4|     234|     Verified|2022-08-21 09:30:00|
+-------+--------+-------------+-------------------+



In [0]:
# Cast the string column to a timestamp so date functions see real datetimes.
emails_df_3172 = emails_df_3172.withColumn(
    "signup_date", col("signup_date").cast("timestamp")
)

In [0]:
# Cast the string column to a timestamp so date functions see real datetimes.
texts_df_3172 = texts_df_3172.withColumn(
    "action_date", col("action_date").cast("timestamp")
)

In [0]:
# Pair each email with its texts and keep only successful verifications.
df_3172 = (
    emails_df_3172
    .join(texts_df_3172, on="email_id", how="inner")
    .filter(col("signup_action") == "Verified")
)

In [0]:
# datediff counts calendar days, so 1 means "verified on the day after sign-up".
(
    df_3172
    .withColumn("days_diff", datediff(col("action_date"), col("signup_date")))
    .filter(col("days_diff") == 1)
    .select("user_id").distinct().orderBy("user_id")
    .display()
)

user_id
7005
7771
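
As a cross-check that does not depend on Spark, the same join, filter, and sort can be sketched in plain Python over the sample rows. This is only an illustration under the calendar-day assumption; the function name `second_day_verified` and the list-of-tuples layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

emails = [
    (125, 7771, "2022-06-14 09:30:00"),
    (433, 1052, "2022-07-09 08:15:00"),
    (234, 7005, "2022-08-20 10:00:00"),
]
texts = [
    (1, 125, "Verified", "2022-06-15 08:30:00"),
    (2, 433, "Not Verified", "2022-07-10 10:45:00"),
    (4, 234, "Verified", "2022-08-21 09:30:00"),
]


def second_day_verified(emails, texts):
    """Return sorted user IDs verified exactly one calendar day after sign-up."""
    # Index sign-ups by email_id; a list keeps every (user_id, date) pair,
    # mirroring an inner join on email_id.
    signups = defaultdict(list)
    for email_id, user_id, signup_ts in emails:
        signups[email_id].append((user_id, datetime.strptime(signup_ts, FMT).date()))

    user_ids = set()
    for _text_id, email_id, action, action_ts in texts:
        if action != "Verified":
            continue
        action_date = datetime.strptime(action_ts, FMT).date()
        for user_id, signup_date in signups.get(email_id, []):
            if (action_date - signup_date).days == 1:
                user_ids.add(user_id)
    return sorted(user_ids)


print(second_day_verified(emails, texts))  # [7005, 7771]
```

The set plays the role of `distinct()`, and `sorted` plays the role of `orderBy("user_id")`.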
