## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**3374. First Letter Capitalization II (Hard)**

**Table: user_content**

| Column Name | Type    |
|-------------|---------|
| content_id  | int     |
| content_text| varchar |

content_id is the unique key for this table.
Each row contains a unique ID and the corresponding text content.

**Write a solution to transform the text in the content_text column by applying the following rules:**
- Convert the first letter of each word to uppercase and the remaining letters to lowercase
- Special handling for words containing special characters:
  - For words connected with a hyphen -, both parts should be capitalized (e.g., top-rated → Top-Rated)
- All other formatting and spacing should remain unchanged

Return the result table that includes both the original content_text and the modified text following the above rules.

The result format is in the following example.

**Example:**

**Input:**

**user_content table:**

| content_id | content_text                    |
|------------|---------------------------------|
| 1          | hello world of SQL              |
| 2          | the QUICK-brown fox             |
| 3          | modern-day DATA science         |
| 4          | web-based FRONT-end development |

**Output:**

| content_id | original_text                   | converted_text                  |
|------------|---------------------------------|---------------------------------|
| 1          | hello world of SQL              | Hello World Of Sql              |
| 2          | the QUICK-brown fox             | The Quick-Brown Fox             |
| 3          | modern-day DATA science         | Modern-Day Data Science         |
| 4          | web-based FRONT-end development | Web-Based Front-End Development |

**Explanation:**
- For content_id = 1:
  - Each word's first letter is capitalized: "Hello World Of Sql"
- For content_id = 2:
  - Contains the hyphenated word "QUICK-brown" which becomes "Quick-Brown"
  - Other words follow normal capitalization rules
- For content_id = 3:
  - Hyphenated word "modern-day" becomes "Modern-Day"
  - "DATA" is converted to "Data"
- For content_id = 4:
  - Contains two hyphenated words: "web-based" → "Web-Based"
  - And "FRONT-end" → "Front-End"
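
Before involving Spark, the rules can be checked in plain Python against the four example rows. This is a minimal sketch, assuming that whitespace and hyphens are the only word separators; `capitalize_words` is a hypothetical helper name used for illustration and is not part of the problem statement.

```python
import re


def capitalize_words(text):
    """Capitalize every run of characters that contains no whitespace or hyphen.

    Assumption: whitespace and '-' are the only separators. Separators are
    never matched, so the original spacing and hyphens are left untouched.
    """
    # str.capitalize() uppercases the first character and lowercases the rest.
    return re.sub(r"[^\s-]+", lambda m: m.group(0).capitalize(), text)


examples = [
    "hello world of SQL",
    "the QUICK-brown fox",
    "modern-day DATA science",
    "web-based FRONT-end development",
]

for original in examples:
    print(original, "->", capitalize_words(original))
```

Because the pattern only rewrites the word parts, runs of spaces between words pass through unchanged, which matches the rule that all other formatting and spacing should remain as-is.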

In [0]:
user_content_data_3374 = [
    (1, "hello world of SQL"),
    (2, "the QUICK-brown fox"),
    (3, "modern-day DATA science"),
    (4, "web-based FRONT-end development")
]

user_content_columns_3374 = ["content_id", "content_text"]
user_content_df_3374 = spark.createDataFrame(user_content_data_3374, user_content_columns_3374)
user_content_df_3374.show()

+----------+--------------------+
|content_id|        content_text|
+----------+--------------------+
|         1|  hello world of SQL|
|         2| the QUICK-brown fox|
|         3|modern-day DATA s...|
|         4|web-based FRONT-e...|
+----------+--------------------+



In [0]:
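def capitalize_text(text):
    # Pass NULL content through unchanged instead of raising inside the UDF
    if text is None:
        return None

    def cap_word(word):
        # Capitalize each hyphen-separated part: "QUICK-brown" -> "Quick-Brown"
        return '-'.join(w.capitalize() for w in word.split('-'))

    # Splitting on a single space keeps runs of spaces intact when rejoined
    return ' '.join(cap_word(w) for w in text.split(' '))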

In [0]:
capitalize_udf = udf(capitalize_text, StringType())

In [0]:
user_content_df_3374\
    .withColumn("original_text", col("content_text"))\
    .withColumn("converted_text", capitalize_udf(col("content_text")))\
    .select("content_id", "original_text", "converted_text")\
    .display()

content_id,original_text,converted_text
1,hello world of SQL,Hello World Of Sql
2,the QUICK-brown fox,The Quick-Brown Fox
3,modern-day DATA science,Modern-Day Data Science
4,web-based FRONT-end development,Web-Based Front-End Development
