

# 📌 Feature Construction

---

## 🔹 Definition
**Feature Construction** is a feature engineering technique where **new features are created from existing raw features** to better represent underlying patterns in the data and improve model performance.

> In simple words:  
> **Feature construction means creating new, meaningful columns from existing columns.**

---

## 🔹 Why Feature Construction is Needed
- Raw features may not capture important relationships
- Helps models learn complex patterns
- Can improve accuracy and, when the new features are meaningful, interpretability

---

## 🔹 How Feature Construction Works
New features are created using:
- Mathematical operations
- Domain knowledge
- Logical combinations
- Aggregations and interactions

---

## 🔹 Common Feature Construction Techniques

### 1️⃣ Mathematical Features
Create new features using arithmetic operations.

$$
\text{Area} = \text{Length} \times \text{Width}
$$
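
A minimal pandas sketch of the area formula above. The `rooms` table and its values are made up for illustration:

```python
import pandas as pd

# Hypothetical room dimensions; the data and column names are illustrative only.
rooms = pd.DataFrame({"Length": [4.0, 5.5, 3.0], "Width": [3.0, 4.0, 2.5]})

# Construct a new column as an arithmetic combination of two existing columns.
rooms["Area"] = rooms["Length"] * rooms["Width"]

print(rooms)
```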

---

### 2️⃣ Ratio-Based Features
Useful when relative values matter.

$$
\text{Price per unit} = \frac{\text{Total Price}}{\text{Quantity}}
$$
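
The ratio above could be computed as in the sketch below. The `orders` data is hypothetical. Treating a zero quantity as a missing ratio is one possible choice, not the only one:

```python
import pandas as pd

# Hypothetical order data; the second row has a zero quantity on purpose.
orders = pd.DataFrame({"Total Price": [100.0, 45.0, 30.0], "Quantity": [4, 0, 3]})

# Mask zero quantities as NaN so the ratio becomes missing instead of infinite.
safe_quantity = orders["Quantity"].where(orders["Quantity"] != 0)
orders["Price per unit"] = orders["Total Price"] / safe_quantity

print(orders)
```

Rows with a missing ratio can then be imputed or dropped like any other missing value.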

---

### 3️⃣ Date & Time Features
Extract useful information from date columns.

**Examples:**
- Day
- Month
- Year
- Weekday
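
As a rough sketch, the parts listed above can be pulled out with the pandas `.dt` accessor. The `events` frame and its dates are invented for the example:

```python
import pandas as pd

# Hypothetical date strings; parse them into datetimes first.
events = pd.DataFrame({"date": ["2024-01-15", "2024-02-29", "2024-12-31"]})
events["date"] = pd.to_datetime(events["date"])

# Each component becomes its own numeric feature.
events["day"] = events["date"].dt.day
events["month"] = events["date"].dt.month
events["year"] = events["date"].dt.year
events["weekday"] = events["date"].dt.dayofweek  # Monday = 0, Sunday = 6

print(events)
```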

---

### 4️⃣ Interaction Features
Combine multiple features to capture interactions.

$$
\text{Total Spend} = \text{Quantity} \times \text{Unit Price}
$$
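
When it is not obvious which pairs interact, one option is to generate every pairwise product and let the model or a later selection step decide. The sketch below assumes a made-up `sales` table; the `Discount` column is hypothetical and only there to give more than one pair:

```python
from itertools import combinations

import pandas as pd

# Hypothetical sales data; "Discount" is an invented extra column.
sales = pd.DataFrame({
    "Quantity": [2, 5, 1],
    "Unit Price": [10.0, 3.0, 99.0],
    "Discount": [0.0, 0.1, 0.5],
})

# Build one product column per pair of base columns.
base_cols = list(sales.columns)
for a, b in combinations(base_cols, 2):
    sales[f"{a} x {b}"] = sales[a] * sales[b]

# "Quantity x Unit Price" is the Total Spend feature from the formula above.
print(sales)
```

The number of pairs grows quadratically with the number of base columns, so this is usually paired with some form of feature selection.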

---

### 5️⃣ Aggregation Features
Summarize data over groups.

**Examples:**
- Average purchase per customer
- Total transactions per user
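
A small sketch of both examples, assuming a hypothetical transaction table `tx` with invented `customer_id` and `amount` columns. `groupby(...).transform(...)` broadcasts each group summary back onto every row of that group:

```python
import pandas as pd

# Hypothetical transactions: customer 1 has two rows, customer 2 has three.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [10.0, 30.0, 5.0, 5.0, 20.0],
})

grouped = tx.groupby("customer_id")["amount"]

# Per-customer summaries, repeated on every row of that customer.
tx["avg_purchase"] = grouped.transform("mean")
tx["n_transactions"] = grouped.transform("count")

print(tx)
```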

---

### 6️⃣ Encoding-Based Features
Create numerical features from categorical data.

**Examples:**
- One-Hot Encoding
- Frequency Encoding
- Target Encoding
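
The three encodings can be sketched with plain pandas as below. The `data` frame, its `city` column and its `target` values are invented for the example. The target encoding shown is the simplest per-category mean; in practice it should be computed on training folds only, otherwise the target leaks into the feature:

```python
import pandas as pd

# Hypothetical categorical column with a binary target.
data = pd.DataFrame({
    "city": ["A", "B", "A", "C", "A", "B"],
    "target": [1, 0, 1, 0, 0, 1],
})

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(data["city"], prefix="city")

# Frequency encoding: share of rows that belong to each category.
freq = data["city"].map(data["city"].value_counts(normalize=True))

# Naive target encoding: mean of the target within each category.
target_mean = data["city"].map(data.groupby("city")["target"].mean())

print(one_hot.join(freq.rename("city_freq")).join(target_mean.rename("city_target")))
```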

---

## 🔹 Example

### Original Features

| Height (cm) | Weight (kg) |
|-------------|-------------|
| 170 | 65 |
| 160 | 70 |

### Constructed Feature

$$
\text{BMI} = \frac{\text{Weight}}{(\text{Height}/100)^2}
$$

| Height (cm) | Weight (kg) | BMI |
|-------------|-------------|-----|
| 170 | 65 | 22.5 |
| 160 | 70 | 27.3 |
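
The table can be reproduced with a few lines of pandas. This sketch assumes height is in centimetres and weight in kilograms, which is what the division by 100 in the formula implies:

```python
import pandas as pd

# The two rows from the example table.
people = pd.DataFrame({"Height": [170, 160], "Weight": [65, 70]})

# BMI = weight / (height in metres)^2, rounded to one decimal place.
people["BMI"] = (people["Weight"] / (people["Height"] / 100) ** 2).round(1)

print(people)
```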

---

## 🔹 Feature Construction vs Feature Transformation

| Feature Construction | Feature Transformation |
|---------------------|-----------------------|
| Creates new features | Modifies existing features |
| Adds new columns | Changes feature values |
| Usually guided by domain knowledge | Usually applies a mathematical function (e.g. scaling, log) |
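
To make the contrast concrete, the sketch below does one of each on a made-up `data` frame. The choice of `log1p` as the transformation is only an example:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a skewed Quantity column.
data = pd.DataFrame({"Quantity": [1, 10, 100], "Unit Price": [5.0, 2.0, 1.0]})

# Construction: a new column is added, so the column count grows.
data["Total Spend"] = data["Quantity"] * data["Unit Price"]
n_cols_after_construction = data.shape[1]

# Transformation: an existing column is overwritten, so the column count stays the same.
data["Quantity"] = np.log1p(data["Quantity"])

print(data)
```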

---

## 🔹 Advantages
- Can improve model performance
- Captures hidden relationships
- Makes data more informative

---

## 🔹 Disadvantages
- Can increase dimensionality
- Requires domain expertise
- Risk of overfitting

---

## 🔹 One-Line Exam Answer
**Feature construction is the process of creating new features from existing data to enhance the predictive power of machine learning models.**



In [27]:
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


In [28]:
df = pd.read_csv(r'C:\Users\Lenovo\Krishnaraj singh\Code\newml\Documents!.0\train.csv')[['Age','Pclass','SibSp','Parch','Survived']]

In [29]:
df.head()

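| | Age | Pclass | SibSp | Parch | Survived |
|---|-----|--------|-------|-------|----------|
| 0 | 22.0 | 3 | 1 | 0 | 0 |
| 1 | 38.0 | 1 | 1 | 0 | 1 |
| 2 | 26.0 | 3 | 0 | 0 | 1 |
| 3 | 35.0 | 1 | 1 | 0 | 1 |
| 4 | 35.0 | 3 | 0 | 0 | 0 |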


In [30]:
df.dropna(inplace=True)

In [31]:
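# Features: Age, Pclass, SibSp, Parch. Copy so that new columns can be added to x safely.
x = df.iloc[:, 0:4].copy()
# Target: Survived
y = df.iloc[:, -1]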

In [32]:
# Mean 20-fold cross-validation accuracy (%) on the raw features

np.mean(cross_val_score(LogisticRegression(),x,y,scoring='accuracy',cv=20))*100

np.float64(69.33333333333333)

- __Apply feature construction__

In [33]:
x['Family_size'] = x['Parch'] + x['SibSp'] + 1

In [34]:
x.head()

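| | Age | Pclass | SibSp | Parch | Family_size |
|---|-----|--------|-------|-------|-------------|
| 0 | 22.0 | 3 | 1 | 0 | 2 |
| 1 | 38.0 | 1 | 1 | 0 | 2 |
| 2 | 26.0 | 3 | 0 | 0 | 1 |
| 3 | 35.0 | 1 | 1 | 0 | 2 |
| 4 | 35.0 | 3 | 0 | 0 | 1 |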


In [35]:
def fun(n):
    """Map family size to a category: 0 = alone, 1 = small (2-3), 2 = large (4+)."""
    if n == 1:
        return 0
    elif 1 < n < 4:
        return 1
    else:
        return 2

In [36]:
fun(5)

2

In [37]:
x['family_type'] = x['Family_size'].apply(fun)
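
The same binning can be written without a Python-level function by using `pd.cut`. The sketch below is an alternative under stated assumptions, not the method used in this notebook; it runs on a small made-up `family_size` series rather than the Titanic data:

```python
import numpy as np
import pandas as pd

# Hypothetical family sizes covering all three categories.
family_size = pd.Series([1, 2, 3, 4, 7], name="Family_size")

# Right-closed bins: (0, 1] -> 0 (alone), (1, 3] -> 1 (small), (3, inf) -> 2 (large).
family_type = pd.cut(family_size, bins=[0, 1, 3, np.inf], labels=[0, 1, 2]).astype(int)

print(family_type.tolist())
```

This matches `fun` for every size of 1 or more, which always holds for `Family_size` because it is built as `Parch + SibSp + 1`.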

In [38]:
x.head()

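| | Age | Pclass | SibSp | Parch | Family_size | family_type |
|---|-----|--------|-------|-------|-------------|-------------|
| 0 | 22.0 | 3 | 1 | 0 | 2 | 1 |
| 1 | 38.0 | 1 | 1 | 0 | 2 | 1 |
| 2 | 26.0 | 3 | 0 | 0 | 1 | 0 |
| 3 | 35.0 | 1 | 1 | 0 | 2 | 1 |
| 4 | 35.0 | 3 | 0 | 0 | 1 | 0 |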


In [39]:
x.drop(columns=['SibSp','Parch','Family_size'],inplace=True)

In [41]:
x.head()

| | Age | Pclass | family_type |
|---|-----|--------|-------------|
| 0 | 22.0 | 3 | 1 |
| 1 | 38.0 | 1 | 1 |
| 2 | 26.0 | 3 | 0 |
| 3 | 35.0 | 1 | 1 |
| 4 | 35.0 | 3 | 0 |


In [42]:
# Mean 20-fold cross-validation accuracy (%) after replacing SibSp and Parch with family_type

np.mean(cross_val_score(LogisticRegression(),x,y,scoring='accuracy',cv=20))*100

np.float64(70.7420634920635)

- __Feature Splitting__

In [44]:
df = pd.read_csv(r'C:\Users\Lenovo\Krishnaraj singh\Code\newml\Documents!.0\train.csv')

In [45]:
df.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


In [47]:
df['Name']
# Each name has the surname first, then the salutation (title), then the person's name, so we will separate the salutation from the name

0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
                             ...                        
886                                Montvila, Rev. Juozas
887                         Graham, Miss. Margaret Edith
888             Johnston, Miss. Catherine Helen "Carrie"
889                                Behr, Mr. Karl Howell
890                                  Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object

In [48]:
df['Name'].describe()

count                     891
unique                    891
top       Dooley, Mr. Patrick
freq                        1
Name: Name, dtype: object

In [49]:
df['Name'].str.split(', ')

0                              [Braund, Mr. Owen Harris]
1      [Cumings, Mrs. John Bradley (Florence Briggs T...
2                               [Heikkinen, Miss. Laina]
3         [Futrelle, Mrs. Jacques Heath (Lily May Peel)]
4                             [Allen, Mr. William Henry]
                             ...                        
886                              [Montvila, Rev. Juozas]
887                       [Graham, Miss. Margaret Edith]
888           [Johnston, Miss. Catherine Helen "Carrie"]
889                              [Behr, Mr. Karl Howell]
890                                [Dooley, Mr. Patrick]
Name: Name, Length: 891, dtype: object
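
One possible next step, sketched on a handful of names copied from the output above: take the part after the comma, then keep everything before the first period. The variable names `after_surname` and `title` are illustrative, and the sketch assumes every name follows the "Surname, Title. Given names" pattern:

```python
import pandas as pd

# A few names taken from the displayed output; not the full dataset.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
    "Montvila, Rev. Juozas",
], name="Name")

# Split once on ", " and keep the part after the surname.
after_surname = names.str.split(", ", n=1).str[1]

# Split once on "." and keep the part before it: the salutation.
title = after_surname.str.split(".", n=1).str[0].str.strip()

print(title.tolist())
```

The resulting column is categorical, so it would still need one of the encodings described earlier before being passed to a model such as `LogisticRegression`.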