# First Letter Capitalization II using Pandas

When working with text data, ensuring consistent formatting is crucial for readability and professionalism. A common requirement in databases and content management systems is to capitalize each word correctly while maintaining special formatting rules for hyphenated words.

In this post, we'll tackle a practical Pandas text transformation problem, where we need to:
- Convert the first letter of each word to uppercase while keeping the rest in lowercase.
- Properly handle hyphenated words by capitalizing both parts (e.g., "modern-day" → "Modern-Day").
- Preserve the original structure and spacing of the content.

To achieve this, we'll use Pandas together with Python's built-in string methods to process text stored in a table. By the end of this post, you'll have a short, structured approach to transforming text with consistent capitalization.

Let’s dive in!

**Table: user_content**

| Column Name  | Type    |
|--------------|---------|
| content_id   | int     |
| content_text | varchar |

content_id is the unique key for this table. Each row contains a unique ID and the corresponding text content.

Write a solution to transform the text in the content_text column by applying the following rules:

- Convert the first letter of each word to uppercase and the remaining letters to lowercase.
- Special handling for words containing special characters:
  - For words connected with a hyphen (-), both parts should be capitalized (e.g., top-rated → Top-Rated).
- All other formatting and spacing should remain unchanged.

Return the result table that includes both the original content_text and the modified text following the above rules.

The result format is in the following example.

Input:
**user_content table:**
| content_id | content_text                          |
|------------|---------------------------------------|
| 1          | hello world of SQL                    |
| 2          | the QUICK-brown fox                   |
| 3          | modern-day DATA science               |
| 4          | web-based FRONT-end development       |

**Transformed User Content**

The table below presents the original text content along with the transformed version, where:
- Each word's first letter is capitalized while the rest are in lowercase.
- Words connected by a hyphen are capitalized on both parts.

| content_id | original_text                         | converted_text                        |
|------------|---------------------------------------|---------------------------------------|
| 1          | hello world of SQL                    | Hello World Of Sql                    |
| 2          | the QUICK-brown fox                   | The Quick-Brown Fox                   |
| 3          | modern-day DATA science               | Modern-Day Data Science               |
| 4          | web-based FRONT-end development       | Web-Based Front-End Development       |

### Explanation:

- **Content ID 1**:
  - "hello world of SQL" → "Hello World Of Sql"
- **Content ID 2**:
  - "the QUICK-brown fox" → "The Quick-Brown Fox"
  - The hyphenated word "QUICK-brown" is converted to "Quick-Brown".
- **Content ID 3**:
  - "modern-day DATA science" → "Modern-Day Data Science"
  - The hyphenated word "modern-day" is converted to "Modern-Day".
  - "DATA" is converted to "Data".
- **Content ID 4**:
  - "web-based FRONT-end development" → "Web-Based Front-End Development"
  - Both hyphenated words are capitalized on each part: "web-based" becomes "Web-Based" and "FRONT-end" becomes "Front-End".

The transformation ensures consistent formatting while preserving hyphenated word structures.
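Before turning to Pandas, the rules above can be sketched as a plain Python function. This is only an illustration of the rules as stated: the name `capitalize_words` is hypothetical, and the sketch assumes that words are separated by spaces and that the hyphen is the only special character that needs handling.

```python
def capitalize_words(text: str) -> str:
    """Capitalize every space-separated word and every hyphen-separated part."""
    converted = []
    # Splitting on a single space (rather than split()) keeps repeated spaces intact.
    for word in text.split(" "):
        # str.capitalize() uppercases the first character and lowercases the rest.
        parts = [part.capitalize() for part in word.split("-")]
        converted.append("-".join(parts))
    return " ".join(converted)


samples = [
    "hello world of SQL",
    "the QUICK-brown fox",
    "modern-day DATA science",
    "web-based FRONT-end development",
]
results = [capitalize_words(s) for s in samples]
for original, result in zip(samples, results):
    print(f"{original!r} -> {result!r}")
```

On the four sample rows this produces the same converted text as the expected output table.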



```python
import pandas as pd

# Sample rows from the problem statement
data = [[1, 'hello world of SQL'],
        [2, 'the QUICK-brown fox'],
        [3, 'modern-day DATA science'],
        [4, 'web-based FRONT-end development']]

user_content = pd.DataFrame(
    data, columns=['content_id', 'content_text']
).astype({'content_id': 'Int64', 'content_text': str})

# display() is provided by Jupyter/IPython; use print() in a plain script
display(user_content)
```

|   | content_id | content_text                    |
|---|------------|---------------------------------|
| 0 | 1          | hello world of SQL              |
| 1 | 2          | the QUICK-brown fox             |
| 2 | 3          | modern-day DATA science         |
| 3 | 4          | web-based FRONT-end development |


**Step 1: Applying Transformation to content_text**
- The cell below creates a new column called converted_text in the user_content DataFrame.
- It applies a function to every value in the content_text column using .apply().
- The function lambda s: s.title() takes each string s and calls Python's built-in .title() method on it.
- .title() converts each word's first letter to uppercase and the rest to lowercase, leaving spaces and punctuation untouched.
- Example: "hello world" → "Hello World"
- .title() starts a new word after any non-letter character, so both parts of a hyphenated word are capitalized: "front-end" becomes "Front-End", which is exactly what this problem requires.
- The same rule applies to apostrophes and digits, so "don't" becomes "Don'T". The sample data contains no such words, but text that does would need a custom function instead of .title().
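The word-boundary behaviour of `.title()` can be checked on a few plain strings before applying it to the DataFrame. The strings below are illustrative and are not part of the problem's data.

```python
# str.title() treats every run of letters as a separate word,
# so any non-letter character (hyphen, apostrophe, digit) starts a new one.
examples = ["front-end", "don't stop", "3d model"]
titled = [s.title() for s in examples]

for original, result in zip(examples, titled):
    print(f"{original!r} -> {result!r}")
# 'front-end' -> 'Front-End'
# "don't stop" -> "Don'T Stop"
# '3d model' -> '3D Model'
```

The hyphen case is the one this problem needs; the other two show why `.title()` is not a general-purpose title-casing tool.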

```python
# Title-case every value of content_text into a new column
user_content["converted_text"] = user_content["content_text"].apply(lambda s: s.title())
display(user_content)
```

|   | content_id | content_text                    | converted_text                  |
|---|------------|---------------------------------|---------------------------------|
| 0 | 1          | hello world of SQL              | Hello World Of Sql              |
| 1 | 2          | the QUICK-brown fox             | The Quick-Brown Fox             |
| 2 | 3          | modern-day DATA science         | Modern-Day Data Science         |
| 3 | 4          | web-based FRONT-end development | Web-Based Front-End Development |
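As an alternative to `.apply()` with a lambda, Pandas also offers the string accessor method `Series.str.title()`. The sketch below rebuilds the sample data in a separate DataFrame (the name `df` is chosen here for illustration) and compares the two approaches on it.

```python
import pandas as pd

df = pd.DataFrame({
    "content_id": [1, 2, 3, 4],
    "content_text": [
        "hello world of SQL",
        "the QUICK-brown fox",
        "modern-day DATA science",
        "web-based FRONT-end development",
    ],
})

# Element-wise Python call, as in Step 1
via_apply = df["content_text"].apply(lambda s: s.title())
# Pandas string accessor equivalent
via_str = df["content_text"].str.title()

print(via_str.tolist())
print(via_apply.tolist() == via_str.tolist())
```

One practical difference: `.str.title()` passes missing values through unchanged, whereas the lambda would raise an error on them because a missing value has no `.title()` method.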


**Step 2: Renaming content_text to original_text**
- This line renames the column content_text to original_text for better readability.
- .rename(columns={...}) is a method used to change column names in a DataFrame.
- inplace=True means that the change is applied directly to user_content without needing to reassign it.

```python
# Rename the source column in place
user_content.rename(columns={"content_text": "original_text"}, inplace=True)
display(user_content)
```

|   | content_id | original_text                   | converted_text                  |
|---|------------|---------------------------------|---------------------------------|
| 0 | 1          | hello world of SQL              | Hello World Of Sql              |
| 1 | 2          | the QUICK-brown fox             | The Quick-Brown Fox             |
| 2 | 3          | modern-day DATA science         | Modern-Day Data Science         |
| 3 | 4          | web-based FRONT-end development | Web-Based Front-End Development |
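The two steps can also be combined into a single function that leaves its input untouched. This is a sketch rather than the only way to structure the solution: the function name `process_text` is an assumption, and it uses `.assign()` and a non-in-place `.rename()` so that the caller's DataFrame is not modified.

```python
import pandas as pd


def process_text(user_content: pd.DataFrame) -> pd.DataFrame:
    """Return content_id, original_text and converted_text as a new DataFrame."""
    return (
        user_content
        # Step 1: add the title-cased column
        .assign(converted_text=user_content["content_text"].str.title())
        # Step 2: rename the source column (returns a new DataFrame)
        .rename(columns={"content_text": "original_text"})
        .loc[:, ["content_id", "original_text", "converted_text"]]
    )


sample = pd.DataFrame({
    "content_id": [1, 2],
    "content_text": ["hello world of SQL", "the QUICK-brown fox"],
})
result = process_text(sample)
print(result)
print(list(sample.columns))  # the input keeps its original column names
```

Returning a new DataFrame makes the function safe to call repeatedly on the same input, which the in-place version is not, since the second call would no longer find a content_text column.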


Reference: [1] https://leetcode.com/problems/first-letter-capitalization-ii/description/