<div id="header">
    <p style="color:#6a66bd; text-align:center; font-weight:bold; font-family:verdana; font-size:20px;">Handling Mixed Variables
    </p>
</div>

---

<div style="background-color:gainsboro; padding:8px; border:2px dotted black; border-radius:8px; font-family:verdana; line-height: 1.7em">
• A mixed variable is a column that holds more than one type of data. 
<br>
• For example, a single column might combine numerical values, categorical labels and missing values. 
<br>
• Most analysis and machine learning tools expect one type per column, so mixed variables need to be split or converted before use.
</div>
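
As a rough illustration of why this matters, the sketch below reads a tiny made-up CSV (the text in `csv_text` is hypothetical, not taken from `titanic.csv`). A single non-numeric entry is enough for pandas to load the whole column as text rather than as numbers.

```python
import io
import pandas as pd

# Hypothetical three-row CSV; "A" is the only non-numeric entry
csv_text = "No_of_Passengers,Survived\n5,0\n3,1\nA,0\n"
toy = pd.read_csv(io.StringIO(csv_text))

# Survived is parsed as integers, No_of_Passengers falls back to a text dtype
print(toy.dtypes)

# Every value in the mixed column is a string, including "5" and "3"
print(toy["No_of_Passengers"].tolist())
```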

In [234]:
# Importing Libraries
import numpy as np
import pandas as pd

In [235]:
# Reading CSV File
df = pd.read_csv("titanic.csv")
df.head(5)

Unnamed: 0,Cabin,Ticket,No_of_Passengers,Survived
0,,A/5 21171,5,0
1,C85,PC 17599,3,1
2,,STON/O2. 3101282,6,1
3,C123,113803,3,1
4,,373450,A,0


<div style="background-color:gainsboro; padding:8px; border:2px dotted black; border-radius:8px; font-family:verdana; line-height: 1.7em">
• pd.to_numeric() is a useful function in Pandas for converting a Series to a numeric datatype. 
<br>
• This function is particularly helpful when you have mixed data types or when you need to ensure that a column is in a numeric format for calculations or analysis.
<br>
<br>
<strong>Sample Code</strong>

```python
pd.to_numeric(arg, errors='raise', downcast=None)
```
where:
<br>
• <strong>arg</strong> 
<br>
→ The data to convert: a scalar, list, tuple, 1-D array or (most often) a Series.
<br>
• <strong>errors</strong>
<br>
→ 'raise': (default) raises an error if any value cannot be converted.
<br>
→ 'coerce': forces any non-convertible values to NaN.
<br>
→ 'ignore': returns the input unchanged if any value cannot be converted (deprecated since pandas 2.2; catch the exception instead).
<br>
• <strong>downcast</strong>
<br>
→ Specify 'integer', 'signed', 'unsigned' or 'float' to downcast the result to the smallest numeric type that can hold the values. With the default None, no downcasting is done.
</div>
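
The options above can be tried on a small made-up Series. This is only a sketch: the values in `s` are hypothetical and the variable names are chosen for illustration.

```python
import pandas as pd

s = pd.Series(["5", "3", "A"])  # toy data, not from titanic.csv

# errors="coerce": the unparseable entry "A" becomes NaN
coerced = pd.to_numeric(s, errors="coerce")

# errors="raise" (the default): an unparseable entry raises ValueError
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True

# downcast="integer" shrinks the dtype when every value is a valid integer
small = pd.to_numeric(pd.Series(["5", "3", "6"]), downcast="integer")

print(coerced.tolist(), raised, small.dtype)
```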

In [236]:
# Extracting the numerical part of the No_of_Passengers column
# with pd.to_numeric(); errors="coerce" turns non-numeric entries such as "A" into NaN
df["Passengers_No"] = pd.to_numeric(df["No_of_Passengers"], errors="coerce", downcast="integer")
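
Note that `downcast="integer"` cannot produce an integer column once a value has been coerced to NaN, because NaN needs a floating-point dtype. If whole numbers are preferred, one possible option is the pandas nullable `Int64` dtype. The sketch below uses a made-up Series `raw` as a stand-in for `df["No_of_Passengers"]`.

```python
import pandas as pd

raw = pd.Series(["5", "3", "6", "3", "A"])  # toy stand-in for the real column

# "A" is coerced to NaN, so the result stays floating point despite downcast
as_float = pd.to_numeric(raw, errors="coerce", downcast="integer")
print(as_float.dtype)

# Nullable integer dtype: the missing entry is stored as <NA> instead of NaN
as_int = as_float.astype("Int64")
print(as_int.tolist())
```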

In [237]:
# Sample of Data
df.head(5)

Unnamed: 0,Cabin,Ticket,No_of_Passengers,Survived,Passengers_No
0,,A/5 21171,5,0,5.0
1,C85,PC 17599,3,1,3.0
2,,STON/O2. 3101282,6,1,6.0
3,C123,113803,3,1,3.0
4,,373450,A,0,


<div style="background-color:gainsboro; padding:8px; border:2px dotted black; border-radius:8px; font-family:verdana; line-height: 1.7em">
• np.where() is a NumPy function that builds a new array by choosing, element by element, between two sources according to a condition. 
<br>
• It is a vectorised alternative to an if/else written inside a loop over the rows.
<br>
<br>
<strong>Sample Code</strong>

```python
np.where(condition, x, y)
```
where:
<br>
• condition: A boolean array (or array-like), checked element by element.
<br>
• x: An array or a single value; its values are used where the condition is True.
<br>
• y: An array or a single value; its values are used where the condition is False.
</div>
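
A minimal sketch of the call on a made-up array (the `scores` values and the 50-point threshold are hypothetical):

```python
import numpy as np

scores = np.array([35, 80, 52, 91])  # hypothetical values

# Scalars for x and y: "pass" where the condition holds, "fail" elsewhere
labels = np.where(scores >= 50, "pass", "fail")

# An array for x and a scalar for y: keep the score where True, use 0 where False
clipped = np.where(scores >= 50, scores, 0)

print(labels.tolist())
print(clipped.tolist())
```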

In [238]:
# Extracting Categorical Part from No_of_Passengers Column
# By using np.where() function
df["Passengers_Cat"] = np.where(df["Passengers_No"].isnull(), df["No_of_Passengers"], np.nan)
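
`np.where()` returns a plain NumPy array, which pandas then aligns by position when it is assigned back. An alternative that stays inside pandas is `Series.where()`, which keeps the values where the condition is True and fills NaN elsewhere. The sketch below assumes a made-up Series `raw` in place of `df["No_of_Passengers"]`.

```python
import pandas as pd

raw = pd.Series(["5", "3", "A"])  # toy stand-in for the real column
num = pd.to_numeric(raw, errors="coerce")

# Keep the original text only where numeric conversion failed; NaN elsewhere
cat = raw.where(num.isna())
print(cat.tolist())
```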

In [239]:
# Sample of Data
df.head(5)

Unnamed: 0,Cabin,Ticket,No_of_Passengers,Survived,Passengers_No,Passengers_Cat
0,,A/5 21171,5,0,5.0,
1,C85,PC 17599,3,1,3.0,
2,,STON/O2. 3101282,6,1,6.0,
3,C123,113803,3,1,3.0,
4,,373450,A,0,,A


In [240]:
# Extracting Cabin Number from Cabin Column
# By taking every character after the first one (index 0)
df["Cabin_No"] = df["Cabin"].str[1:]

In [241]:
# Sample of Data
df.head(5)

Unnamed: 0,Cabin,Ticket,No_of_Passengers,Survived,Passengers_No,Passengers_Cat,Cabin_No
0,,A/5 21171,5,0,5.0,,
1,C85,PC 17599,3,1,3.0,,85.0
2,,STON/O2. 3101282,6,1,6.0,,
3,C123,113803,3,1,3.0,,123.0
4,,373450,A,0,,A,


In [242]:
# Extracting Cabin Category from Cabin Column
# By taking the first character (index 0)
df["Cabin_Cat"] = df["Cabin"].str[0]
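
The two slicing steps above assume every cabin value is one letter followed by digits, and `.str[1:]` returns text rather than numbers. As a hedged alternative, `Series.str.extract()` with named groups can pull both parts in one pass and leaves NaN where the pattern does not match. The Series `cabin` and the regular expression below are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

cabin = pd.Series([None, "C85", None, "C123"])  # toy stand-in for df["Cabin"]

# One leading letter, then the digits that follow; group names become column names
parts = cabin.str.extract(r"^(?P<Cabin_Cat>[A-Za-z])(?P<Cabin_No>\d+)")

# The extracted digits are still strings, so convert them explicitly
parts["Cabin_No"] = pd.to_numeric(parts["Cabin_No"], errors="coerce")
print(parts)
```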

In [243]:
# Sample of Data
df.head(5)

Unnamed: 0,Cabin,Ticket,No_of_Passengers,Survived,Passengers_No,Passengers_Cat,Cabin_No,Cabin_Cat
0,,A/5 21171,5,0,5.0,,,
1,C85,PC 17599,3,1,3.0,,85.0,C
2,,STON/O2. 3101282,6,1,6.0,,,
3,C123,113803,3,1,3.0,,123.0,C
4,,373450,A,0,,A,,


In [250]:
# Splitting Ticket on whitespace and keeping the last token
df["Ticket_No"] = df["Ticket"].str.split().str[-1]

In [251]:
# Converting Ticket_No to a numeric column
# with pd.to_numeric(); tokens that are not numbers become NaN
df["Ticket_No"] = pd.to_numeric(df["Ticket_No"], errors="coerce", downcast="integer")

In [252]:
# Sample of Data
df.head(5)

Unnamed: 0,Cabin,Ticket,No_of_Passengers,Survived,Passengers_No,Passengers_Cat,Cabin_No,Cabin_Cat,Ticket_No
0,,A/5 21171,5,0,5.0,,,,21171.0
1,C85,PC 17599,3,1,3.0,,85.0,C,17599.0
2,,STON/O2. 3101282,6,1,6.0,,,,3101282.0
3,C123,113803,3,1,3.0,,123.0,C,113803.0
4,,373450,A,0,,A,,,373450.0


In [254]:
# Splitting Ticket on whitespace and keeping the first token
df["Ticket_Cat"] = df["Ticket"].str.split().str[0]

In [255]:
# Keeping only the categorical part of the Ticket_Cat column
# np.where() replaces tokens made up entirely of digits with NaN
df["Ticket_Cat"] = np.where(df["Ticket_Cat"].str.isdigit(), np.nan, df["Ticket_Cat"])
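
Taking the first and last whitespace-separated tokens works when a ticket has at most one prefix token. If a prefix itself contains a space, the middle part is dropped. A possible single-pass alternative is sketched below with `Series.str.extract()`. The sample tickets (including the multi-token prefix) and the regular expression are assumptions made for illustration. Tickets with no trailing number would give NaN in both columns under this pattern.

```python
import pandas as pd

# Toy tickets; the third has a hypothetical prefix containing a space
ticket = pd.Series(["A/5 21171", "113803", "STON/O 2. 3101282"])

# Optional prefix (everything before the final whitespace), then the trailing digits
pattern = r"^(?:(?P<Ticket_Cat>.*\S)\s+)?(?P<Ticket_No>\d+)$"
parts = ticket.str.extract(pattern)

# Convert the digit strings to numbers
parts["Ticket_No"] = pd.to_numeric(parts["Ticket_No"], errors="coerce")
print(parts)
```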

In [257]:
# Sample of Data
df.head(5)

Unnamed: 0,Cabin,Ticket,No_of_Passengers,Survived,Passengers_No,Passengers_Cat,Cabin_No,Cabin_Cat,Ticket_No,Ticket_Cat
0,,A/5 21171,5,0,5.0,,,,21171.0,A/5
1,C85,PC 17599,3,1,3.0,,85.0,C,17599.0,PC
2,,STON/O2. 3101282,6,1,6.0,,,,3101282.0,STON/O2.
3,C123,113803,3,1,3.0,,123.0,C,113803.0,
4,,373450,A,0,,A,,,373450.0,
