## Data Preprocessing: Encoding and Feature Extraction

### Encoding the Categorical and Ordered Features
With the categorical and ordered features encoded using `OneHotEncoder` and `LabelEncoder`, we can move on to the remaining preprocessing steps.

### Next Steps:
1. **Visualizing Data**: 
   We can begin visualizing the data to understand the relationships between various features and the target variable (house price). Visualization will help in detecting any trends, correlations, or anomalies in the dataset.

2. **Dropping Irrelevant Columns**:
   Some columns may not contribute meaningfully to the model's predictions. Dropping them reduces dimensionality and can improve model performance.

3. **Extracting New Features**:
   We can create new columns from existing ones to enhance the model's predictive power. For example:
   - **Extracting Geographical Information from `PID`**: We will split the `PID` column into four new columns representing Township, Section, Block, and Parcel.
   - **Creating Interaction Features**: Based on domain knowledge, we may create interaction features like the ratio of square footage to the number of bathrooms or the combination of certain features.
    
4. **Handling Missing Data**:
   If any columns have missing values after the encoding and transformation steps, we can handle them by imputing or dropping rows/columns.
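
Step 1 might start with something like the sketch below: rank numeric features by their correlation with the target, then plot one of them against it. The target column name `SalePrice` is an assumption (it is not visible in the truncated output), the helper names are hypothetical, and the toy values are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

def top_correlations(frame, target, n=10):
    """Return the n numeric features most correlated (by absolute value) with target."""
    corr = frame.select_dtypes(include="number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order].head(n)

def scatter_vs_target(frame, feature, target):
    """Scatter one feature against the target and return the Axes."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(frame[feature], frame[target], alpha=0.5)
    ax.set_xlabel(feature)
    ax.set_ylabel(target)
    ax.set_title(f"{feature} vs {target}")
    return ax

# Toy data; "SalePrice" is an assumed target name and the numbers are invented.
demo = pd.DataFrame({
    "Lot Area": [8000, 9000, 10000, 11000],
    "Overall Qual": [5, 7, 6, 8],
    "SalePrice": [100, 150, 200, 250],
})

ranked = top_correlations(demo, "SalePrice")
print(ranked)
ax = scatter_vs_target(demo, ranked.index[0], "SalePrice")
```

On the real data, the same two helpers could be called with `df` in place of `demo` once the target column name is confirmed.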

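For step 2, a minimal sketch of dropping columns that look like leftover row counters rather than features. The drop list is an assumption: `Unnamed: 0` style columns typically appear when a CSV is written with its index, and `Order` appears to be an observation number. `PID` is kept here because step 3 still needs it.

```python
import pandas as pd

# Assumed drop list: index-like columns that carry no predictive signal.
INDEX_LIKE = ["Unnamed: 0.1", "Unnamed: 0", "Order"]

def drop_irrelevant(frame, columns=INDEX_LIKE):
    """Return a copy without the listed columns; names not present are ignored."""
    return frame.drop(columns=list(columns), errors="ignore")

# Toy frame with the same kind of leftover columns.
toy = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [0, 1],
    "Order": [1, 2],
    "PID": [526301100, 526350040],
    "Lot Area": [31770, 11622],
})

trimmed = drop_irrelevant(toy)
print(trimmed.columns.tolist())
```

Using `errors="ignore"` makes the helper safe to re-run in a notebook after the columns have already been removed.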

In [5]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [6]:
df = pd.read_csv("data/Ames_Housing_Clean.csv")
df.head()

| Unnamed: 0.1 | Unnamed: 0 | Order | PID | MS SubClass | Lot Frontage | Lot Area | Lot Shape | Overall Qual | Overall Cond | Year Built | ... | cat__Sale Type_Oth | cat__Sale Type_VWD | cat__Sale Type_WD | cat__Sale Condition_AdjLand | cat__Sale Condition_Alloca | cat__Sale Condition_Family | cat__Sale Condition_Normal | cat__Sale Condition_Partial | cat__Heating_GasW | cat__Heating_Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 526301100 | 20 | 141.0 | 31770 | 2 | 6 | 5 | 1960 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1 | 2 | 526350040 | 20 | 80.0 | 11622 | 1 | 5 | 6 | 1961 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2 | 3 | 526351010 | 20 | 81.0 | 14267 | 2 | 6 | 6 | 1958 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 3 | 3 | 4 | 526353030 | 20 | 93.0 | 11160 | 1 | 7 | 5 | 1968 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 4 | 4 | 5 | 527105010 | 60 | 74.0 | 13830 | 2 | 5 | 5 | 1997 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
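
The two feature ideas from step 3 can be sketched as below. The `PID` layout is an assumption: the values are treated as 10-digit identifiers (zero-padded, since integers lose a leading zero) and cut into widths of 2, 2, 3, and 3 for Township, Section, Block, and Parcel. The column names `Gr Liv Area` and `Full Bath` are also assumptions, because they are not visible in the truncated output, and the helper names are hypothetical.

```python
import pandas as pd

# Assumed layout TT-SS-BBB-PPP; these widths are not confirmed by the notebook.
PID_PARTS = {"Township": 2, "Section": 2, "Block": 3, "Parcel": 3}

def split_pid(frame, pid_col="PID", parts=PID_PARTS):
    """Add one integer column per PID component."""
    out = frame.copy()
    width = sum(parts.values())
    # Pad back to the full width so the slices line up.
    pid = out[pid_col].astype(str).str.zfill(width)
    start = 0
    for name, n in parts.items():
        out[f"PID_{name}"] = pid.str[start:start + n].astype(int)
        start += n
    return out

def add_area_per_bath(frame, area_col="Gr Liv Area", bath_col="Full Bath"):
    """Add living area divided by bathroom count (column names are assumed)."""
    out = frame.copy()
    # Zero bathrooms become NaN so the ratio is missing rather than infinite.
    baths = out[bath_col].where(out[bath_col] > 0)
    out["Area_per_Bath"] = out[area_col] / baths
    return out

# PIDs taken from the output above; the area and bath values are made up.
toy = pd.DataFrame({
    "PID": [526301100, 527105010],
    "Gr Liv Area": [1600, 900],
    "Full Bath": [2, 0],
})

features = add_area_per_bath(split_pid(toy))
print(features)
```

Keeping the widths in a single dictionary makes it easy to adjust the split if the actual parcel numbering scheme turns out to differ.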

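For step 4, one possible approach is sketched below: drop columns that are mostly empty, fill the remaining numeric gaps with the column median, and drop any rows that still contain missing values. The 0.5 threshold, the median strategy, and the helper name are illustrative assumptions, not choices made in this notebook.

```python
import numpy as np
import pandas as pd

def handle_missing(frame, max_missing_frac=0.5):
    """Drop sparse columns, median-fill numeric gaps, then drop leftover rows."""
    out = frame.copy()
    # Fraction of missing values per column.
    frac = out.isna().mean()
    out = out.drop(columns=frac[frac > max_missing_frac].index)
    # Fill numeric columns with their median.
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # Any remaining gaps are in non-numeric columns; drop those rows.
    return out.dropna()

# Toy frame; "Mostly Empty" is a hypothetical column added for the demo.
toy = pd.DataFrame({
    "Lot Frontage": [141.0, np.nan, 81.0, 93.0],
    "Mostly Empty": [np.nan, np.nan, np.nan, 1.0],
    "Lot Area": [31770, 11622, 14267, 11160],
})

print(toy.isna().sum())
cleaned = handle_missing(toy)
print(cleaned)
```

Printing `isna().sum()` first gives a quick per-column count, which helps decide whether imputing or dropping is the better fit for each column.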