In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_excel('Cleaned_Bangalore_Rental_House_data.xlsx')
df_copy = df.copy()

In [3]:
df.loc[df['Balcony'] == 0 , 'Balcony'] = '0'

In [4]:
df.loc[df['Age'] == '10+' , 'Age'] = 11
df['Age'] = df['Age'].astype(float)

In [5]:
df['Age'].value_counts()

Age
5.0     2529
1.0     2233
10.0    1426
11.0     448
0.0      104
Name: count, dtype: int64

In [6]:
df = df.dropna(subset=['Age']).reset_index(drop=True)

In [7]:
def categorize_age(value):
    if pd.isna(value):
        return "Undefined"
    if value == 0:
        return "Under Construction"
    if 1 <= value <= 5:
        return "New Property"
    if 6 <= value <= 10:
        return "Moderately Old"
    if value > 10:
        return "Old Property"
    else:
        return "Undefined"

df['Age'] = df['Age'].apply(categorize_age)

In [8]:
df['Age'].value_counts()

Age
New Property          4762
Moderately Old        1426
Old Property           448
Under Construction     104
Name: count, dtype: int64

In [9]:
train_df = df.drop(columns=['Property_ID' , 'Address' , 'Rating' , 'Locality' , 'Floor_For_Rent' , 'Total_Parking'])

1. **Copying the Dataset**:  
   `data_label_encoded = train_df.copy()` creates a copy of the original DataFrame to avoid modifying it directly.

2. **Identifying Categorical Columns**:  
   `categorical_cols = train_df.select_dtypes(include=['object']).columns` selects all columns with categorical (object) data types.

3. **Applying Ordinal Encoding**:  
   The loop iterates over each categorical column and applies **`OrdinalEncoder`** to convert categories into integer codes, which are assigned back to the column. With default settings the codes follow the alphabetical order of the category names, not any meaningful ranking.  
   `print(oe.categories_)` displays the categories of each column in code order (the first listed category is encoded as 0).

4. **Splitting Features and Target**:  
   - `X_label` contains all columns except the target variable `'Rent'`.  
   - `y_label` contains the target variable `'Rent'`.
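
The effect of the default category order can be seen on a small, made-up sample. The sketch below is illustrative only: the `ages` frame and the `age_order` list are assumptions, not part of the notebook. It contrasts the default alphabetical codes with an explicit order passed through the `categories` parameter of `OrdinalEncoder`.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample using the Age labels produced earlier in the notebook
ages = pd.DataFrame(
    {"Age": ["New Property", "Old Property", "Under Construction", "Moderately Old"]}
)

# Default behaviour: categories are sorted alphabetically before coding
default_enc = OrdinalEncoder()
default_codes = default_enc.fit_transform(ages[["Age"]]).ravel()

# Assumed youngest-to-oldest order, supplied explicitly
age_order = ["Under Construction", "New Property", "Moderately Old", "Old Property"]
ordered_enc = OrdinalEncoder(categories=[age_order])
ordered_codes = ordered_enc.fit_transform(ages[["Age"]]).ravel()

print(pd.DataFrame({"Age": ages["Age"], "default": default_codes, "ordered": ordered_codes}))
```

With an explicit order, a larger code consistently means an older property, which makes correlations and linear coefficients on the encoded column easier to read.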

In [10]:
from sklearn.preprocessing import OrdinalEncoder

data_label_encoded = train_df.copy()

categorical_cols = train_df.select_dtypes(include=['object']).columns

for col in categorical_cols:
    oe = OrdinalEncoder()
    data_label_encoded[col] = oe.fit_transform(data_label_encoded[[col]])
    print(oe.categories_)
    
X_label = data_label_encoded.drop('Rent', axis=1)
y_label = data_label_encoded['Rent']

[array(['Bangalore Central', 'Bangalore East', 'Bangalore North',
       'Bangalore South', 'Bangalore West'], dtype=object)]
[array(['0', '1', '2', '3', '3+'], dtype=object)]
[array(['East', 'North', 'North-East', 'North-West', 'Not Specified',
       'South', 'South-East', 'South-West', 'West'], dtype=object)]
[array(['Furnished', 'Semifurnished', 'Unfurnished'], dtype=object)]
[array(['Moderately Old', 'New Property', 'Old Property',
       'Under Construction'], dtype=object)]
[array(['Full', 'No Backup', 'Partial'], dtype=object)]
[array(['No', 'Yes'], dtype=object)]
[array(['Charges included', 'Charges not included'], dtype=object)]
[array(['Apartment', 'Builder Floor', 'House/Villa', 'Serviced Apartment',
       'Studio'], dtype=object)]
[array(['Dealer', 'Owner'], dtype=object)]


### **Insights from Rent Correlation Data**

#### **1. Strong Positive Correlations**  
These features are strongly correlated with rent: higher values in these attributes are typically linked with higher rents.  
- **Area (sq.ft)** *(0.86)*: Larger properties tend to command higher rent.  
- **Deposit** *(0.81)*: Higher security deposits often align with higher rental values.  
- **Brokerage** *(0.73)*: Higher brokerage fees are commonly associated with higher rents.  
- **Bathroom** *(0.69)*: More bathrooms could indicate a more spacious or luxurious property, increasing rent.  
- **Bedroom** *(0.63)*: More bedrooms usually mean larger properties with higher rents.

---

#### **2. Moderate Positive Correlations**  
These features moderately impact rental prices:  
- **Covered Parking** *(0.45)*: Availability of covered parking contributes to higher rent.  
- **Balcony** *(0.43)*: Properties with balconies tend to have moderately higher rents.  
- **Maintenance** *(0.41)*: Higher maintenance charges might reflect better amenities, correlating with higher rent.  
- **Additional Rooms** *(0.31)*: Extra rooms (like study or storage) add to rental value.  
- **Total Floors** *(0.27)*: Buildings with more floors might offer better views or be more modern construction, which goes with higher rent.  
- **Pet-Friendly** *(0.21)*: Pet-friendly homes may attract higher rents due to niche demand.  
- **Charges** *(0.19)*: Additional charges can be a contributing factor to rent.  
- **Transportation Depots Nearby** *(0.14)*: Accessibility to transportation can slightly increase rental value.

---

#### **3. Weak/Negligible Correlations**  
These features have minimal influence on rent pricing:  
- **Available for Family** *(0.09)*  
- **Available for Women Bachelors** *(0.02)*  
- **Available for Men Bachelors** *(0.02)*  
- **Education Centre Nearby** *(-0.001)*  
- **Public Places Nearby** *(-0.02)*  
- **Facing** *(-0.02)*  
- **Age** *(-0.02)*  
- **Open Parking** *(-0.03)*

*Note*: These factors might not be primary considerations for setting rental prices.

---

#### **4. Negative Correlations**  
These features are negatively correlated with rent. Several are ordinal-encoded categories, so the sign depends on the alphabetical code order printed by the encoder:  
- **Posted By** *(-0.37)*: `Owner` is encoded as 1 and `Dealer` as 0, so owner-posted listings tend to have lower rents than dealer-listed ones.  
- **Power Backup** *(-0.25)*: `Full` is encoded as 0, so listings with no or only partial backup tend to have lower rents.  
- **Furnishing** *(-0.17)*: Codes run from Furnished (0) to Unfurnished (2), so less furnished properties tend to command lower rents.  
- **Type** *(-0.14)*: The codes are an alphabetical ordering of property types, so the sign has no direct interpretation; it only indicates that rent differs across types.  
- **Hospitals and Clinics Nearby** *(-0.12)*: Proximity to these may reduce rent slightly, possibly due to noise, traffic, or crowding concerns.  
- **Bank/ATMs Nearby** *(-0.07)*: Close proximity to these services may slightly lower rent.

---

#### **Key Takeaways**  
1. **Main Rent Drivers**: Property size (area, bedrooms, bathrooms) and financial factors (deposit, brokerage) are the strongest influencers of rent.  
2. **Amenities Influence**: Facilities like covered parking, balconies, and maintenance charges moderately impact rent.  
3. **Minimal Social Impact**: Factors related to tenant availability (family, bachelors) and nearby social facilities show negligible influence.  
4. **Negative Factors**: Being unfurnished, lacking power backup, or belonging to certain property types could lower rental value.


In [11]:
fi_df1 = data_label_encoded.corr()['Rent'].iloc[1:].to_frame().reset_index().rename(columns={'index':'feature','Rent':'corr_coeff'})
fi_df1

Unnamed: 0,feature,corr_coeff
0,Bedroom,0.629412
1,Bathroom,0.693264
2,Balcony,0.432992
3,Additional_rooms,0.309628
4,Area (sq.ft),0.863009
5,Facing,-0.022119
6,Furnishing,-0.174287
7,Age,-0.021206
8,Covered_Parking,0.45148
9,Open_Parking,-0.033634
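
A positional slice such as `.iloc[1:]` removes whichever column happens to come first, which is only the target if `Rent` is the first column. As a hedged alternative, the sketch below drops the target's self-correlation by label instead. The `toy` frame and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical frame in which the target is NOT the first column
toy = pd.DataFrame({
    "Region": [0, 1, 2, 3, 4, 0],
    "Area": [600, 900, 1200, 1500, 1800, 700],
    "Rent": [12000, 18000, 25000, 30000, 36000, 14000],
    "Bedroom": [1, 2, 2, 3, 3, 1],
})

# Correlation of every column with Rent, minus Rent's correlation with itself
corr_with_rent = toy.corr()["Rent"].drop("Rent")

# Same two-column layout as the notebook's table: feature, corr_coeff
fi_toy = corr_with_rent.rename("corr_coeff").rename_axis("feature").reset_index()
print(fi_toy)
```

Dropping by label keeps every feature in the table regardless of column order.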


### Insights from the **Random Forest feature importance analysis** related to predicting **'Rent'**:

---

#### **Top Influential Features**  
These features have the highest impact on rent prediction, as indicated by their importance scores.

1. **Area (sq.ft)** *(0.687)*  
   - Dominates as the most influential factor, indicating that property size is the primary driver for rent pricing.  

2. **Brokerage** *(0.139)*  
   - A significant contributor, suggesting that higher brokerage fees are often associated with higher rents.  

3. **Deposit** *(0.116)*  
   - Security deposits are also a major influencer, aligning with the trend that higher deposits are linked to higher rents.

---

#### **Moderately Important Features**  
These features contribute modestly to the rent prediction: each accounts for less than 1% of the total importance, far below the top three.  

4. **Total Floors** *(0.008)*  
   - Properties on higher floors or in taller buildings might command higher rents.  

5. **Covered Parking** *(0.007)*  
   - Parking availability moderately influences rent, possibly due to convenience and security factors.  

6. **Region** *(0.005)*  
   - The geographical region adds some influence, hinting at the effect of location on rent.  

7. **Type** *(0.004)*  
   - The property type (Villa, Apartment, etc.) still plays a role but is less significant than expected.

---

#### **Low Importance Features**  
These features contribute marginally to the rent prediction model:

- **Bathroom** *(0.004)* and **Bedroom** *(0.003)*: Surprisingly low influence, possibly because *'Area'* already captures much of the size information.  
- **Transportation Depots Nearby** *(0.003)* and **Hospitals and Clinics Nearby** *(0.003)*: Proximity to public amenities adds slight predictive value.  
- **Balcony** *(0.002)* and **Maintenance** *(0.002)*: These features have some influence but are not strong predictors.  
- **Facing** *(0.002)*, **Bank/ATMs Nearby** *(0.002)*, **Public Places Nearby** *(0.002)*, and **Education Centre Nearby** *(0.002)*: Minor contributors to rent prediction.  
- **Furnishing** *(0.002)* and **Additional Rooms** *(0.001)*: Some impact but minimal.  
- **Age** *(0.001)*: Slight influence, potentially reflecting the condition of the property.

---

#### **Negligible Importance Features**  
These features have minimal impact on the model and might be less relevant for rent prediction:

- **Social Availability**:  
   - *Available for Family* *(0.0002)*  
   - *Available for Women Bachelors* *(0.0006)*  
   - *Available for Men Bachelors* *(0.0007)*  

- **Amenities**:  
   - *Pet Friendly* *(0.0005)*  
   - *Power Backup* *(0.001)*  
   - *Open Parking* *(0.001)*  
   - *Charges* *(0.001)*  

- **Others**:  
   - *Posted By* *(0.0006)*

---

#### **Key Takeaways**  
1. **Size and Financial Factors Lead**: *Area, Brokerage, and Deposit* dominate as the key predictors of rent.  
2. **Non-Linear Influence**: *Type* and *Total Floors* rank higher in the Random Forest than in the correlation analysis, which may hint at non-linear relationships that a correlation coefficient cannot capture.  
3. **Redundant Features**: Social availability and nearby public facilities have minimal influence and could be candidates for feature reduction.  
4. **Surprising Drops**: Features like *Bathroom* and *Bedroom*—though strongly correlated—are less important in RF, likely due to multicollinearity with *Area*.
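
The drop in importance for features that overlap with *Area* can be reproduced on synthetic data. This is a minimal sketch under stated assumptions: the data are randomly generated, the column names only mimic the notebook's, and rent is constructed to depend on area alone.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic data: bedroom count is a noisy, rounded function of area
area = rng.uniform(400, 3000, n)
bedroom = np.clip(np.round(area / 700 + rng.normal(0, 0.5, n)), 1, None)
rent = 20 * area + rng.normal(0, 2000, n)

X_toy = pd.DataFrame({"Area": area, "Bedroom": bedroom})
rf_toy = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_toy, rent)

# Both features correlate with rent, but the forest credits almost everything to Area
corr_toy = X_toy.corrwith(pd.Series(rent))
importance_toy = pd.Series(rf_toy.feature_importances_, index=X_toy.columns)
print(pd.DataFrame({"corr_with_rent": corr_toy, "rf_importance": importance_toy}))
```

In this setup *Bedroom* is clearly correlated with rent yet receives little importance, because the trees can get the same information, at finer resolution, from *Area*.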

In [12]:
from sklearn.ensemble import RandomForestRegressor

# Train a Random Forest regressor on label encoded data
rf_label = RandomForestRegressor(n_estimators=100, random_state=42)
rf_label.fit(X_label, y_label)

# Extract feature importance scores for label encoded data
fi_df2 = pd.DataFrame({
    'feature': X_label.columns,
    'rf_importance': rf_label.feature_importances_
}).sort_values(by='rf_importance', ascending=False)

fi_df2

Unnamed: 0,feature,rf_importance
5,Area (sq.ft),0.686365
13,Brokerage,0.138903
14,Deposit,0.115528
22,Total_Floors,0.008138
9,Covered_Parking,0.006896
0,Region,0.004753
17,Type,0.004287
2,Bathroom,0.003836
27,Transportation_Depots_Nearby,0.00303
1,Bedroom,0.002942


### Insights from the **Gradient Boosting Regressor (GBR) feature importance analysis** for predicting **'Rent'**:

---

#### **Top Influential Features**  
These features have the highest impact on rent prediction based on their importance scores.

1. **Area (sq.ft)** *(0.607)*  
   - Continues to be the most dominant feature, reinforcing that property size is the primary factor influencing rent.  

2. **Deposit** *(0.193)*  
   - A strong predictor, suggesting that higher deposits are consistently associated with higher rents.  

3. **Brokerage** *(0.140)*  
   - Significant influence, indicating a strong correlation between higher brokerage fees and rent amounts.

---

#### **Moderately Important Features**  
These features contribute to the rent prediction but are less dominant compared to the top three.

4. **Bathroom** *(0.021)*  
   - Reflects the importance of the number of bathrooms, likely indicating property comfort and size.  

5. **Bedroom** *(0.012)*  
   - Slightly less influential but still relevant, as the number of bedrooms can affect rental pricing.  

6. **Total Floors** *(0.006)*  
   - Slight influence, possibly related to better views or newer construction.  

7. **Type** *(0.006)*  
   - Property type (Villa, Apartment, etc.) adds moderate predictive value.  

8. **Covered Parking** *(0.005)*  
   - Availability of covered parking offers a minor but notable influence on rent.  

9. **Region** *(0.004)*  
   - Location contributes slightly, hinting at geographical differences in rent levels.

---

#### **Low Importance Features**  
These features have minimal influence on rent prediction.

- **Furnishing** *(0.002)*  
- **Hospitals and Clinics Nearby** *(0.001)*  
- **Power Backup** *(0.0005)*  
- **Posted By** *(0.0004)*  
- **Maintenance** *(0.0004)*  
- **Open Parking** *(0.0004)*  
- **Public Places Nearby** *(0.0004)*  
- **Additional Rooms** *(0.0002)*  
- **Charges** *(0.0002)*  
- **Balcony** *(0.0001)*  
- **Bank ATMs Nearby** *(0.0001)*  
- **Age** *(0.0001)*  
- **Education Centre Nearby** *(0.0001)*

---

#### **Negligible Importance Features**  
These features have little to no impact on the model.

- **Available for Men Bachelors** *(0.0001)*  
- **Available for Women Bachelors** *(0.0000)*  
- **Transportation Depots Nearby** *(0.0000)*  
- **Pet Friendly** *(0.0000)*  
- **Facing** *(0.0000)*  
- **Available for Family** *(0.0000)*

---

#### **Key Takeaways**  

1. **Size and Financial Factors**: *Area, Deposit, and Brokerage* consistently emerge as the top predictors in all models.  
2. **Model Sensitivity**: Features like *Bathroom* and *Bedroom* show higher importance in GBR, possibly due to its sensitivity to subtle feature interactions.  
3. **Low Relevance Features**: Social factors (like availability for families or bachelors) and nearby public amenities have negligible influence across models.


In [13]:
from sklearn.ensemble import GradientBoostingRegressor

# Train a Gradient Boosting regressor on label encoded data
gb_label = GradientBoostingRegressor()
gb_label.fit(X_label, y_label)

# Extract feature importance scores for label encoded data
fi_df3 = pd.DataFrame({
    'feature': X_label.columns,
    'gb_importance': gb_label.feature_importances_
}).sort_values(by='gb_importance', ascending=False)

fi_df3

Unnamed: 0,feature,gb_importance
5,Area (sq.ft),0.607365
14,Deposit,0.19333
13,Brokerage,0.140127
2,Bathroom,0.021017
1,Bedroom,0.012271
22,Total_Floors,0.005942
17,Type,0.005371
9,Covered_Parking,0.005225
0,Region,0.004223
7,Furnishing,0.001627


### Insights from the **Permutation Importance** analysis for the **Random Forest Regressor** predicting **'Rent'**:

---

#### **Top Influential Features**  
These features have the highest impact on rent prediction.

1. **Area (sq.ft)** *(0.760)*  
   - Consistently the most important factor, confirming that larger properties significantly influence higher rent values.

2. **Brokerage** *(0.152)*  
   - Brokerage fees remain a strong predictor, likely indicating that higher rent properties involve higher brokerage.

3. **Deposit** *(0.071)*  
   - Security deposits continue to be a relevant factor, though their importance is lower than in previous models.

---

#### **Moderately Important Features**  
These features have some influence but are less dominant.

4. **Total Floors** *(0.009)*  
   - Slight correlation, possibly due to premium rents for higher floors or newer constructions.

5. **Bathroom** *(0.006)*  
   - Suggests that more bathrooms may slightly contribute to rent variations.  

6. **Bedroom** *(0.004)*  
   - Number of bedrooms contributes modestly, but less than expected compared to *Area* or *Bathroom*.

7. **Region** *(0.004)*  
   - Geographical location slightly influences rent but is not a major determinant.

8. **Type** *(0.004)*  
   - The type of house (Villa, Apartment, etc.) adds moderate predictive value.

---

#### **Low Importance Features**  
These features contribute minimally to the rent prediction.

- **Covered Parking** *(0.002)*  
- **Maintenance** *(0.002)*  
- **Furnishing** *(0.001)*  
- **Transportation Depots Nearby** *(0.001)*  
- **Balcony** *(0.001)*  
- **Education Centre Nearby** *(0.001)*  
- **Power Backup** *(0.0005)*  
- **Available for Men Bachelors** *(0.0003)*  
- **Facing** *(0.0001)*  
- **Pet Friendly** *(0.0001)*  
- **Hospitals and Clinics Nearby** *(0.0000)*  
- **Posted By** *(0.0000)*  

---

#### **Negligible or Negative Importance Features**  
These features have importances indistinguishable from zero. A slightly negative value means that shuffling the feature happened to improve the test score by chance, so the model gains nothing from it.

- **Age** *(-0.0000)*  
- **Additional Rooms** *(-0.0000)*  
- **Available for Family** *(-0.0001)*  
- **Charges** *(-0.0001)*  
- **Bank ATMs Nearby** *(-0.0001)*  
- **Available for Women Bachelors** *(-0.0002)*  
- **Public Places Nearby** *(-0.0002)*  
- **Open Parking** *(-0.0003)*  

---

#### **Key Takeaways**  

1. **Area, Brokerage, and Deposit** are the top three predictors across all models, confirming their critical role in determining rent.  
2. **Permutation Analysis Highlights Subtleties**: *Total Floors* and *Bathroom* rank directly behind the top three here, slightly higher than in the impurity-based Random Forest ranking, though both remain below 0.01.  
3. **Social and Nearby Features**: Factors such as *Availability for Families/Bachelors*, *Nearby Public Places*, and *Transportation Depots* show negligible or even negative importance. These could be candidates for exclusion in feature selection to simplify the model.
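
Whether a tiny or negative permutation importance differs from zero can be judged from the spread across repeats, which `permutation_importance` returns as `importances_std`. The sketch below uses synthetic data with one informative and one pure-noise feature; the two-standard-deviation threshold is an assumed rule of thumb, not something the notebook uses.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: "signal" drives the target, "noise" is unrelated to it
X_toy = pd.DataFrame({"signal": rng.normal(size=600), "noise": rng.normal(size=600)})
y_toy = 3 * X_toy["signal"] + rng.normal(scale=0.3, size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# The default score for a regressor is R^2, so each value is a drop in test R^2
result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)

summary = pd.DataFrame({
    "feature": X_toy.columns,
    "mean": result.importances_mean,
    "std": result.importances_std,
}).set_index("feature")

# Assumed heuristic: keep a feature only if mean - 2 * std stays above zero
summary["clearly_nonzero"] = summary["mean"] - 2 * summary["std"] > 0
print(summary)
```

Reporting the standard deviation next to the mean makes it easier to separate genuinely weak features from ones whose importance is just sampling noise.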


In [14]:
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X_train_label, X_test_label, y_train_label, y_test_label = train_test_split(X_label, y_label, test_size=0.2, random_state=42)

# Train a Random Forest regressor on label encoded data
rf_label = RandomForestRegressor(n_estimators=100, random_state=42)
rf_label.fit(X_train_label, y_train_label)

# Calculate Permutation Importance
perm_importance = permutation_importance(rf_label, X_test_label, y_test_label, n_repeats=30, random_state=42)

# Organize results into a DataFrame
fi_df4 = pd.DataFrame({
    'feature': X_label.columns,
    'permutation_importance': perm_importance.importances_mean
}).sort_values(by='permutation_importance', ascending=False)

fi_df4

Unnamed: 0,feature,permutation_importance
5,Area (sq.ft),0.760329
13,Brokerage,0.151898
14,Deposit,0.071026
22,Total_Floors,0.009201
2,Bathroom,0.005504
1,Bedroom,0.004111
0,Region,0.00391
17,Type,0.003674
9,Covered_Parking,0.001921
15,Maintenance,0.001711


### Insights from the **LASSO Regression** analysis for predicting **'Rent'** using standardized, label-encoded data.  

---

#### **Top Positive Influencers**  
These features have the highest positive coefficients, meaning they strongly contribute to increasing rent.  

1. **Area (sq.ft)** *(24699.98)*  
   - As with previous models, area remains the most significant positive contributor to rent values.  

2. **Deposit** *(15046.64)*  
   - Higher deposits continue to strongly influence higher rental prices.  

3. **Brokerage** *(10116.96)*  
   - Brokerage fees maintain their significance, aligning with previous insights.  

4. **Total Floors** *(2443.31)*  
   - More floors, potentially indicating better views or newer constructions, positively impact rent.  

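5. **Posted By** *(2417.57)*  
   - `Owner` is encoded as 1 and `Dealer` as 0, so owner-posted listings have higher predicted rent once the other features (notably *Brokerage*) are held fixed. This reverses the sign of the raw correlation (-0.37).  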

6. **Available for Men Bachelors** *(1170.92)*  
   - Availability for men bachelors appears to slightly increase rent, possibly due to niche demand.  

7. **Bathroom** *(872.47)*  
   - Additional bathrooms positively influence rent but less than expected compared to other models.  

---

#### **Moderate Positive Influencers**  
These features have a mild positive impact on rent.  

- **Public Places Nearby** *(725.40)*: Proximity to public areas might enhance convenience, slightly increasing rent.  
- **Type** *(625.65)*: The property type (e.g., Villa, Apartment) continues to have a moderate positive influence.  
- **Bank/ATMs Nearby** *(483.78)*: Accessibility to banking services contributes modestly to rent values.  
- **Facing** *(469.02)*: The facing direction of a property adds minor value.  
- **Available for Women Bachelors** *(330.91)* and **Transportation Depots Nearby** *(276.66)*: These features contribute marginally.

---

#### **Weak Positive Influencers**  
- **Maintenance** *(150.17)*: Slight correlation, suggesting better-maintained properties may attract higher rent.

---

#### **Negative Influencers**  
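These features have negative coefficients: with the other features held fixed, a higher encoded value is associated with lower rent.  

1. **Region** *(-2110.15)*: Region is a nominal category with alphabetical codes (Bangalore Central = 0 through Bangalore West = 4), so the sign reflects that ordering rather than a general location effect.  
2. **Charges** *(-1693.85)*: `Charges not included` is encoded as 1, so listings that exclude charges from the rent are priced lower.  
3. **Furnishing** *(-1562.65)*: Codes run from Furnished (0) to Unfurnished (2), so less furnished homes are priced lower.  
4. **Education Centre Nearby** *(-991.63)* and **Additional Rooms** *(-925.01)*: Proximity to education centers and extra rooms are associated with lower rent once the other features are accounted for; the cause is unclear from this model alone.  
5. **Hospitals and Clinics Nearby** *(-845.56)*: Noise or traffic around hospitals might lower rental desirability.  
6. **Pet-Friendly** *(-674.20)*: Surprisingly, pet-friendly properties slightly decrease rental values, possibly due to maintenance concerns.  
7. **Open Parking** *(-654.62)* and **Covered Parking** *(-284.45)*: More parking is associated with slightly lower rent once the other features are held fixed, which likely reflects overlap with *Area* rather than a real penalty.  
8. **Bedroom** *(-261.57)*: More bedrooms show a minor negative influence in this model.  
9. **Power Backup** *(-203.20)* and **Age** *(-152.81)*: `Full` backup is encoded as 0, so listings without full backup are priced lower. The *Age* codes are alphabetical (Moderately Old = 0, New Property = 1, Old Property = 2, Under Construction = 3) rather than ordered by age, so this coefficient should not be read as an effect of older properties.  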
10. **Balcony** *(-118.31)* and **Available for Family** *(-87.77)*: Minor negative impacts, which may depend on individual preferences.

---

#### **Key Takeaways**  

1. **Consistent Influencers**: *Area, Deposit, and Brokerage* remain the top predictors, confirming their central role in rent determination.  
2. **Model-Specific Variations**: Features like *Posted By* and *Total Floors* gained more importance in LASSO, indicating unique model sensitivities.  
3. **Negative Influences Identified**: Features such as *Region, Furnishing, and Additional Charges* have strong negative coefficients, suggesting these factors can reduce rental desirability.  
4. **Minimal Influence Features**: Lifestyle and amenity factors (*Availability for Family, Power Backup, Pet-Friendly*) consistently show limited or negative influence. This could indicate that market pricing is less affected by lifestyle preferences.  
5. **Potential for Feature Reduction**: Features like *Public Places Nearby* and *Education Centres* have small coefficients relative to the top three and negligible importance in the tree-based models, so they could be excluded in future models for simplicity.
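
How much LASSO shrinks coefficients depends on `alpha` relative to the scale of the target. The sketch below is a hedged illustration on synthetic data: the feature matrix, the true coefficients, and the larger `alpha=1000` are assumptions chosen to show the contrast with the notebook's `alpha=0.01`.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic target on a rent-like scale; the third feature is irrelevant
X_raw = rng.normal(size=(400, 3))
y_toy = 25000 * X_raw[:, 0] + 15000 * X_raw[:, 1] + rng.normal(scale=1000, size=400)
X_std = StandardScaler().fit_transform(X_raw)

ols = LinearRegression().fit(X_std, y_toy)
lasso_small = Lasso(alpha=0.01).fit(X_std, y_toy)   # same alpha as the notebook
lasso_large = Lasso(alpha=1000).fit(X_std, y_toy)   # assumed stronger penalty

print("OLS        :", np.round(ols.coef_, 1))
print("alpha=0.01 :", np.round(lasso_small.coef_, 1))
print("alpha=1000 :", np.round(lasso_large.coef_, 1))
```

With a tiny `alpha` the LASSO fit is practically ordinary least squares; only a penalty on the order of the coefficients themselves pushes the irrelevant feature to exactly zero. In practice `alpha` is usually tuned, for example with `LassoCV`.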

In [15]:
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_label)

# Train a LASSO regression model
# alpha (the regularization strength) is tiny relative to the scale of Rent,
# so the coefficients stay very close to an unpenalized least-squares fit
lasso = Lasso(alpha=0.01, random_state=42)
lasso.fit(X_scaled, y_label)

# Extract coefficients
fi_df5 = pd.DataFrame({
    'feature': X_label.columns,
    'lasso_coeff': lasso.coef_
}).sort_values(by='lasso_coeff', ascending=False)

fi_df5

Unnamed: 0,feature,lasso_coeff
5,Area (sq.ft),24699.984229
14,Deposit,15046.635572
13,Brokerage,10116.964882
22,Total_Floors,2443.308727
21,Posted_By,2417.567857
20,Available_for_Men_Bachelors,1170.91564
2,Bathroom,872.472312
25,Public_Places_Nearby,725.404929
17,Type,625.650052
24,Bank_ATMs_Nearby,483.780139


### Insights from the **Recursive Feature Elimination (RFE)** using a **Random Forest Regressor** on the label-encoded data for predicting rent.  

---

#### **Top Influential Features**  
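These features have the highest RFE scores, indicating strong importance in predicting rent. Because `n_features_to_select` equals the total number of features, RFE eliminates nothing here, so the scores are simply the feature importances of the Random Forest fitted inside RFE. The estimator is not seeded, so the values vary slightly between runs.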

1. **Area (sq.ft)** *(0.6707)*  
   - Continues to dominate as the most influential feature, consistent with all previous models.  

2. **Brokerage** *(0.1435)*  
   - Remains a significant factor influencing rent, confirming earlier model results.  

3. **Deposit** *(0.1250)*  
   - Consistent influence across models, reinforcing its role as a key rent determinant.  

---

#### **Moderate Influential Features**  
These features have a modest impact on rent prediction.  

- **Total Floors** *(0.0083)*: Indicates a moderate association with rent, possibly due to preferences for newer or higher-floor units.  
- **Covered Parking** *(0.0058)*: Highlights the importance of secure parking availability.  
- **Type** *(0.0052)*: Property type retains moderate significance.  
- **Region** *(0.0048)*: Shows a slight influence, likely due to varying market rates across regions.  
- **Bathroom** *(0.0039)*: A consistent, though modest, contributor to rent prediction.  
- **Transportation Depots Nearby** *(0.0026)*: Suggests slight relevance, particularly for locations with better connectivity.  

---

#### **Low Influence Features**  
These features have minimal but notable importance.  

- **Bedroom** *(0.0032)*: While intuitive, its low score suggests bedroom count might already be captured indirectly through area or type.  
- **Balcony** *(0.0027)* and **Public Places Nearby** *(0.0027)*: Minor influence, aligning with previous permutation and LASSO results.  
- **Hospitals and Clinics Nearby** *(0.0027)*: Indicates slight value in terms of location desirability.  
- **Maintenance** *(0.0025)*: Consistent with previous insights, this factor has minimal but stable influence.  
- **Bank ATMs Nearby** *(0.0023)*: Minor influence, aligning with previous results.  
- **Education Centre Nearby** *(0.0022)*: Low but steady influence, possibly affecting family-oriented renters.  
- **Facing** *(0.0022)*: Directional facing contributes slightly to desirability.  
- **Furnishing** *(0.0020)*: Suggests minimal impact.  

---

#### **Negligible Influence Features**  
These features contribute very little to rent prediction and could be candidates for removal in future models.  

- **Additional Rooms** *(0.0016)* and **Age** *(0.0011)*: Minimal impact, could be considered for removal.  
- **Power Backup** *(0.0010)* and **Charges** *(0.0008)*: Weak influence.  
- **Open Parking** *(0.0007)*: Low contribution to rent prediction.  
- **Available for Men Bachelors** *(0.0007)* and **Available for Women Bachelors** *(0.0006)*: Very low influence, suggesting uniform rent rates regardless of availability criteria.  
- **Pet-Friendly** *(0.0006)*: Consistently low across models.  
- **Posted By** *(0.0005)*: Surprisingly low importance, though significant in the LASSO model.  
- **Available for Family** *(0.0001)*: Marginal relevance, consistent with previous findings.

---
                                                    
#### **Key Takeaways**  

1. **Strong Core Features**: *Area (sq.ft), Brokerage, and Deposit* remain the strongest predictors, as consistently shown across all models.  
2. **Variability in Importance**: Features like *Region* and *Posted By* display variable significance, suggesting sensitivity to modeling techniques.  
3. **Minimal Impact Features**: Lifestyle features like *Pet-Friendly, Open Parking,* and *Available for Family* show consistently low importance, suggesting they could be excluded from future models to simplify analysis.
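
For RFE to actually eliminate features, `n_features_to_select` has to be smaller than the number of columns. The following sketch shows that variant on synthetic data; the feature names `f0` to `f4`, the target, and the choice of keeping two features are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)

# Synthetic data: only f0 and f1 drive the target
X_toy = pd.DataFrame(rng.normal(size=(300, 5)), columns=["f0", "f1", "f2", "f3", "f4"])
y_toy = 5 * X_toy["f0"] + 3 * X_toy["f1"] + rng.normal(scale=0.2, size=300)

# Drop the least important feature one at a time until two remain
selector = RFE(
    RandomForestRegressor(n_estimators=30, random_state=0),
    n_features_to_select=2,
    step=1,
)
selector.fit(X_toy, y_toy)

# ranking_ is 1 for kept features; larger numbers were eliminated earlier
rfe_table = pd.DataFrame({
    "feature": X_toy.columns,
    "selected": selector.support_,
    "ranking": selector.ranking_,
}).sort_values("ranking")
print(rfe_table)
```

Here `ranking_` gives an elimination order that is distinct from the importances of a single fit, which is the extra information RFE provides over a plain Random Forest.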

In [16]:
from sklearn.feature_selection import RFE

# Initialize the base estimator
estimator = RandomForestRegressor()

# Apply RFE on the label-encoded (unscaled) data
selector_label = RFE(estimator, n_features_to_select=X_label.shape[1], step=1)
selector_label = selector_label.fit(X_label, y_label)

# Get the selected features based on RFE
selected_features = X_label.columns[selector_label.support_]

# Extract the feature importances of the selected features from the fitted Random Forest
selected_coefficients = selector_label.estimator_.feature_importances_

# Organize the results into a DataFrame
fi_df6 = pd.DataFrame({
    'feature': selected_features,
    'rfe_score': selected_coefficients
}).sort_values(by='rfe_score', ascending=False)

fi_df6

Unnamed: 0,feature,rfe_score
5,Area (sq.ft),0.696358
13,Brokerage,0.132945
14,Deposit,0.109085
22,Total_Floors,0.008943
9,Covered_Parking,0.006208
17,Type,0.004726
0,Region,0.004715
2,Bathroom,0.003587
27,Transportation_Depots_Nearby,0.003273
1,Bedroom,0.003225


### Insights from the **Linear Regression** model trained on the standardized, label-encoded data for rent prediction.  

---

#### **Top Positive Influential Features**  
These features have the largest positive coefficients, indicating that an increase in these values is associated with higher rent.  

1. **Area (sq.ft)** *(+24,700.00)*  
   - Reaffirms its position as the most influential predictor. Larger area directly correlates with higher rent.  

2. **Deposit** *(+15,046.64)*  
   - A higher deposit requirement generally indicates a higher property value, aligning with rent expectations.  

3. **Brokerage** *(+10,116.97)*  
   - Higher brokerage fees are associated with more expensive properties.  

4. **Total Floors** *(+2,443.33)*  
   - Buildings with more floors may be newer or have better amenities, thus commanding higher rents.  

5. **Posted By** *(+2,417.60)*  
   - `Owner` is encoded as 1 and `Dealer` as 0, so owner-posted listings have higher predicted rent once *Brokerage* and the other features are held fixed, the opposite of the raw correlation.  

6. **Available for Men Bachelors** *(+1,170.92)*  
   - Properties open to male bachelors show a slight premium, possibly due to demand dynamics.  

7. **Bathroom** *(+872.54)*  
   - More bathrooms positively influence rent, as expected.  

8. **Public Places Nearby** *(+725.42)*  
   - Proximity to public places boosts rent, indicating the value of convenience and accessibility.  

9. **Type** *(+625.66)*  
   - The property type (e.g., apartment, independent house) moderately affects rent values.  

---

#### **Low Positive Influential Features**  
These features positively affect rent but to a lesser degree.  

- **Bank ATMs Nearby** *(+483.80)*: Slight positive impact, perhaps due to increased neighborhood convenience.  
- **Facing** *(+469.02)*: Facing is a nominal category with alphabetical codes, so this small positive coefficient carries no directional meaning.  
- **Available for Women Bachelors** *(+330.92)*: Slightly increases rent, though less than the men bachelors category.  
- **Transportation Depots Nearby** *(+276.67)*: Good transport connectivity adds value.  
- **Maintenance** *(+150.18)*: Well-maintained properties tend to attract higher rents.

---

#### **Negative Influential Features**  
These features have negative coefficients, meaning their presence or higher value is associated with lower rent.

- **Available for Family** *(-87.77)*: Slightly negative, possibly due to families seeking budget-friendly options.  
- **Balcony** *(-118.33)*: Surprisingly negative, but might be due to trade-offs like reduced indoor area.  
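- **Age** *(-152.82)*: The codes are alphabetical (Moderately Old = 0, New Property = 1, Old Property = 2, Under Construction = 3), so the sign cannot be read as an effect of older properties.  
- **Power Backup** *(-203.22)*: `Full` backup is encoded as 0, so listings without full backup are associated with slightly lower rent.  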
- **Bedroom** *(-261.63)*: Slightly counterintuitive, but could be due to collinearity with area or type.  
- **Covered Parking** *(-284.47)* and **Open Parking** *(-654.63)*: Surprisingly negative, suggesting parking might not be a primary rent driver in the dataset.  
- **Pet-Friendly** *(-674.21)*: Properties allowing pets might attract discounts or less demand.  
- **Hospitals and Clinics Nearby** *(-845.58)*: Possibly correlating with older areas or less desirable neighborhoods.  
- **Additional Rooms** *(-925.02)*: Might reflect properties that are older or less efficiently designed.  
- **Education Centres Nearby** *(-991.65)*: Unexpected, but could be due to high supply reducing rental premiums.  
- **Furnishing** *(-1,562.67)*: Could indicate older or mismatched furnishings reducing desirability.  
- **Charges** *(-1,693.85)*: Higher maintenance charges may deter renters, lowering overall rent values.  
- **Region** *(-2,110.15)*: Negative coefficient suggests that some regions could be less desirable, impacting rent negatively.
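
The sign of a coefficient on an ordinal-encoded nominal feature such as *Region* depends on the order in which the categories were numbered, not only on the data. The sketch below illustrates this on hypothetical toy data (the region names, rents, and the helper `slope_for` are illustrative assumptions, not values from the dataset): the same rents yield a negative slope under sorted category order, which is what `OrdinalEncoder` uses by default, and a positive slope when the order is reversed.

```python
import numpy as np

# Hypothetical toy data: three regions with different typical rents.
regions = np.array(["Central", "East", "South"] * 4)
rent = np.array([30000, 20000, 10000] * 4, dtype=float)

def slope_for(order):
    """Fit rent ~ a + b * code, where code is the region's position in `order`."""
    codes = np.array([order.index(r) for r in regions], dtype=float)
    design = np.column_stack([np.ones_like(codes), codes])
    coef, *_ = np.linalg.lstsq(design, rent, rcond=None)
    return coef[1]

# Sorted (alphabetical) order versus the reverse order.
slope_sorted = slope_for(["Central", "East", "South"])
slope_reversed = slope_for(["South", "East", "Central"])

print(slope_sorted, slope_reversed)
```

Under this assumption, a negative coefficient for an encoded category is better read as "rent falls as the code increases" than as evidence that the feature itself lowers rent.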

---

#### **Key Insights**  

1. **Consistency in Core Features**:  
   - *Area, Deposit,* and *Brokerage* remain the dominant predictors across all models, reinforcing their critical role.  

2. **Variable Influence for Certain Features**:  
   - *Region* and *Posted By* behave very differently across model families. In the linear model they carry large coefficients of opposite sign (*Region* -2,110.15, *Posted By* +2,417.60), yet their influence is minimal in tree-based methods.  
   - *Furnishing* and *Parking* show mixed results, suggesting further analysis is needed to understand their real impact.  

3. **Unexpected Negative Features**:  
   - Features like *Hospitals and Clinics Nearby, Education Centres,* and *Balcony* show negative coefficients, hinting at potential multicollinearity or dataset-specific biases.
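
One way to probe the suspected multicollinearity is the variance inflation factor (VIF): regress each feature on all the others and compute `1 / (1 - R²)`. The following is a minimal sketch on synthetic data, not the notebook's features; the helper `vif` and the columns `area`, `bedroom`, and `floors` are assumptions made for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a 2-D array X."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    out = np.empty(n_cols)
    for j in range(n_cols):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - resid.var() / target.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic stand-ins: bedroom count is tied to area, floors is independent.
rng = np.random.default_rng(0)
area = rng.normal(1200, 400, size=500)
bedroom = area / 500 + rng.normal(0, 0.3, size=500)
floors = rng.normal(5, 2, size=500)

vif_scores = vif(np.column_stack([area, bedroom, floors]))
print(vif_scores)
```

Features with a high VIF share most of their information with other columns, which is one situation in which a linear model can assign a counterintuitive sign to one of them.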

In [17]:
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X_scaled, y_label)

fi_df7 = pd.DataFrame({
    'feature': X_label.columns,
    'reg_coeffs': lin_reg.coef_
}).sort_values(by='reg_coeffs', ascending=False)

fi_df7

Unnamed: 0,feature,reg_coeffs
5,Area (sq.ft),24700.004018
14,Deposit,15046.642333
13,Brokerage,10116.970711
22,Total_Floors,2443.327814
21,Posted_By,2417.596134
20,Available_for_Men_Bachelors,1170.918899
2,Bathroom,872.536161
25,Public_Places_Nearby,725.421557
17,Type,625.662217
24,Bank_ATMs_Nearby,483.80387


### Insights derived from the **SHAP (SHapley Additive Explanations)** analysis, which offers a detailed, model-agnostic interpretation of feature importance for the **Random Forest** model.

---

#### **Key Insights from SHAP Scores**  
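The SHAP score reported here is the mean absolute SHAP value of each feature across all rows, expressed in the same units as rent. It measures the magnitude of a feature's contribution to the prediction, not its direction.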
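
As a small illustration of what averaging absolute values does, the sketch below uses a hypothetical 4×2 SHAP matrix (the numbers are made up for the example). The first feature pushes predictions up and down by similar amounts, so its signed mean cancels to zero while its mean absolute value stays large.

```python
import numpy as np

# Hypothetical SHAP matrix: rows are samples, columns are two features.
toy_shap = np.array([
    [ 500.0, -200.0],
    [-500.0, -100.0],
    [ 400.0, -300.0],
    [-400.0, -200.0],
])

mean_abs = np.abs(toy_shap).mean(axis=0)   # magnitude only, as used for the SHAP score
mean_signed = toy_shap.mean(axis=0)        # keeps direction, but opposite signs cancel

print(mean_abs, mean_signed)
```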

---

#### **Top Influential Features**  
These features contribute the most to rent prediction:

1. **Area (sq.ft)** *(18,561.65)*  
   - Continues to be the **most influential** feature, confirming that larger property size directly correlates with higher rent.  
   
2. **Brokerage** *(6,103.71)*  
   - Significantly impacts rent, implying that properties with higher brokerage fees tend to be more valuable or in higher demand.

3. **Deposit** *(4,163.43)*  
   - Strongly correlates with rent, emphasizing that a higher upfront deposit often indicates a higher-priced rental.

4. **Total Floors** *(1,434.41)*  
   - Buildings with more floors may offer better amenities or newer infrastructure, thus attracting higher rents.

5. **Type** *(1,118.69)*  
   - Property type (e.g., apartment, independent house) moderately influences rent values.

---

#### **Mid-Level Influential Features**  
Features with moderate importance in rent prediction:

- **Region** *(1,082.64)*: Indicates that the geographical location of the property still plays a significant role in pricing.  
- **Bedroom** *(807.90)*: Number of bedrooms is an expected factor, although it's secondary to area and deposits.  
- **Bathroom** *(745.01)*: More bathrooms marginally influence rent.  
- **Furnishing** *(436.12)*: Suggests that furnished properties are valued higher.  
- **Maintenance** *(244.38)*: Well-maintained properties fetch better rent.

---

#### **Low but Notable Influential Features**  
These have minor contributions but could be relevant in specific contexts:

- **Public Places Nearby** *(231.18)*: Enhances the desirability of the property.  
- **Hospitals and Clinics Nearby** *(217.03)*: Small contribution. The linear model assigns this feature a negative coefficient (-845.58), so the direction of the effect is unclear.  
- **Power Backup** *(212.83)*: Important in regions with frequent power outages.  
- **Bank ATMs Nearby** *(145.97)*: Indicates convenience but with minimal direct influence.  
- **Transportation Depots Nearby** *(144.39)*: Proximity to transport hubs marginally impacts rent.  
- **Balcony** *(140.70)* and **Facing** *(140.50)*: Orientation and outdoor space contribute modestly.  
- **Covered Parking** *(126.45)*: Adds minor value, highlighting the desirability of secure parking.  
- **Age** *(115.77)*: Slightly affects rent, indicating newer properties could be priced higher.  
- **Education Centre Nearby** *(104.84)*: Minor influence, possibly due to family-centric renters.  
- **Charges** *(90.71)*: Minor influence. The mean absolute SHAP value carries no sign, so the direction of the effect cannot be read from this score.

---

#### **Minimal Influential Features**  
These features had minimal influence on predictions:

- **Available for Men Bachelors** *(80.20)* and **Women Bachelors** *(53.49)*: Minor influence, indicating some demand impact; the score does not show direction.  
- **Posted By** *(71.47)*: Low contribution, suggesting the poster's identity has minimal impact.  
- **Additional Rooms** *(60.59)*: Minimal impact, possibly due to collinearity with area or type.  
- **Pet-Friendly** *(46.80)*: Slight influence, indicating marginal market demand.  
- **Open Parking** *(35.73)*: Surprisingly low, possibly due to less demand for open parking spaces.  
- **Available for Family** *(18.67)*: Very minimal impact, indicating that rent isn’t heavily swayed by this parameter.

---

#### **Key Observations**  

1. **Consistency of Core Features**  
   - *Area, Brokerage,* and *Deposit* are consistently top predictors, across SHAP, permutation importance, and regression coefficients.

2. **Regional Influence is Mixed**  
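   - *Region* has a sizeable SHAP score but a negative regression coefficient. Because *Region* is an ordinal-encoded category, a single linear slope cannot capture region-by-region differences, which suggests a **non-linear relationship**: rent may be high in some areas, but not uniformly.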

3. **Low Impact of Social Factors**  
   - Features like *Pet-Friendly, Available for Family,* and *Posted By* contribute minimally, suggesting that rental prices are more influenced by structural and locational factors than social constraints.

In [18]:
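import shap
from sklearn.ensemble import RandomForestRegressor

# Fit the forest on the unscaled, ordinal-encoded features
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_label, y_label)
print("Fitted")

# TreeExplainer returns one SHAP value per (row, feature), in units of Rent
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_label)

# Mean absolute SHAP value per feature (global importance)
shap_sum = np.abs(shap_values).mean(axis=0)

shap_values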

Fitted


array([[  596.14652579,  1076.43471279,  2296.79395684, ...,
          -36.95824142,    92.82575016,  -116.43385976],
       [ 1237.84924829,   305.34090486,   159.93598934, ...,
         -493.30833651,  -140.42382905,  -250.24713583],
       [ 1241.19173971, -1287.52101556,   -17.2666988 , ...,
          282.89691003,   -67.60697257,  -101.90589845],
       ...,
       [ 2524.36384348, -1265.49622825, -1208.9152146 , ...,
          456.31473112,   228.29265947,    20.3696183 ],
       [ 2570.43315534,   402.67327346,  -926.34946056, ...,
          168.08615084,    -5.99023017,    81.663331  ],
       [ 1922.3316723 , -1396.25638787, -1273.03911905, ...,
           -9.79832308,   -72.22622471,   -72.5606459 ]])

In [19]:
fi_df8 = pd.DataFrame({
    'feature': X_label.columns,
    'SHAP_score': np.abs(shap_values).mean(axis=0)
}).sort_values(by='SHAP_score', ascending=False)

fi_df8

Unnamed: 0,feature,SHAP_score
5,Area (sq.ft),18561.651116
13,Brokerage,6103.708217
14,Deposit,4163.429648
22,Total_Floors,1434.408189
17,Type,1118.693817
0,Region,1082.637638
1,Bedroom,807.900559
2,Bathroom,745.007692
7,Furnishing,436.122118
15,Maintenance,244.375258


### **Key Insights from the Aggregated Feature Importance**

#### **Top 3 Dominant Features**  
These features overwhelmingly influence rent predictions, consistently ranking high across all models.

1. **Area (sq.ft)** *(65.41% importance)*  
   - This is by far the **most critical determinant** of rent, reaffirming that larger properties command higher rental prices.  
   - The feature dominates all models, highlighting its universal significance.

2. **Brokerage** *(14.70% importance)*  
   - Brokerage fees directly reflect the rental market's demand-supply dynamics.  
   - This consistent importance suggests that higher brokerage is often associated with premium properties.

3. **Deposit** *(12.13% importance)*  
   - The deposit amount closely correlates with rental value. A larger deposit often indicates higher rent, possibly due to property quality or exclusivity.

---

#### **Mid-Level Influential Features**  
These features have a **moderate but notable impact** on rent prediction.

4. **Total Floors** *(1.45%)*  
   - Indicates that multi-floor buildings may have better facilities or location advantages, impacting rent.  

5. **Bathroom** *(1.10%)*  
   - More bathrooms generally imply larger, better-equipped homes, though the effect is smaller than that of overall area.

6. **Type** *(0.99%)*  
   - The type of accommodation (e.g., apartment, independent house) moderately influences rent.

7. **Bedroom** *(0.90%)*  
   - Aligns with expectations that more bedrooms indicate higher rents but is less influential than overall area.

---

#### **Low-Influence Features**  
These features have **minor contributions** to rent prediction.

- **Covered Parking** *(0.47%)* and **Furnishing** *(0.38%)*: Slightly raise rental value but are secondary considerations.  
- **Maintenance** *(0.28%)*: Better-maintained properties can command higher rents but with marginal influence.  
- **Hospitals and Clinics Nearby** *(0.24%)* and **Public Places Nearby** *(0.24%)*: Proximity to amenities has some influence but is not a primary driver.  
- **Transportation Depots Nearby** *(0.21%)* and **Bank ATMs Nearby** *(0.17%)*: Convenience factors but not decisive.  
- **Balcony** *(0.20%)* and **Power Backup** *(0.18%)*: Slightly enhance property desirability but are lower priorities.

---

#### **Minimal-Influence Features**  
These features contribute **negligibly** and could be candidates for feature reduction.

- **Available for Men Bachelors** *(0.08%)* and **Available for Women Bachelors** *(0.05%)*: These features might be more relevant for filtering preferences but less for rent prediction.  
- **Open Parking** *(0.05%)* and **Pet Friendly** *(0.05%)*: Minimal effect, likely due to less variation or demand in the dataset.  
- **Available for Family** *(0.02%)*: Surprisingly low impact, suggesting that rent pricing is more structurally driven than by the target demographic.  
- **Posted By** *(0.07%)*: The person posting the listing doesn't significantly influence rent price.

---

#### **Key Observations and Trends**

1. **Consistency Across Models**  
   - *Area, Brokerage, and Deposit* dominate across all techniques, confirming their strong, consistent relationship with rent prices.

2. **Minimal Influence of Social & Convenience Features**  
   - Features like *Pet-Friendly, Open Parking, Available for Family*, and *Posted By* contribute marginally, suggesting that the **physical characteristics of the property** overshadow these aspects.

3. **Potential Redundancy in Features**  
   - Some convenience features (e.g., *Hospitals, Public Places, ATMs*) show low influence and could be redundant if they don’t add value to the model's prediction accuracy.
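
The aggregated percentages come from putting scores on a common scale before averaging: each importance column is divided by its own sum, and the normalized columns are then averaged per feature. A minimal sketch with a hypothetical three-feature table (the values are invented for illustration; only the column names mirror the notebook):

```python
import pandas as pd

# Hypothetical importance table: two methods on very different scales.
toy = pd.DataFrame(
    {
        "rf_importance": [0.70, 0.20, 0.10],
        "SHAP_score": [18000.0, 6000.0, 1000.0],
    },
    index=["Area", "Brokerage", "Type"],
)

# Divide each column by its total so every method sums to 1.
normalized = toy.divide(toy.sum(axis=0), axis=1)

# Average the normalized shares across methods, per feature.
aggregated = normalized.mean(axis=1).sort_values(ascending=False)
print(aggregated)
```

Normalizing by the column sum only gives meaningful shares when a column's values are non-negative, which is one reason to leave signed scores (correlations, regression coefficients) out of the average.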

In [20]:
from functools import reduce

# Inner-join all eight feature-importance tables on 'feature'
fi_frames = [fi_df1, fi_df2, fi_df3, fi_df4, fi_df5, fi_df6, fi_df7, fi_df8]
final_fi_df = reduce(lambda left, right: left.merge(right, on='feature'), fi_frames).set_index('feature')

In [21]:
final_fi_df

feature,corr_coeff,rf_importance,gb_importance,permutation_importance,lasso_coeff,rfe_score,reg_coeffs,SHAP_score
Bedroom,0.629412,0.002942,0.012271,0.004111,-261.567202,0.003225,-261.63245,807.900559
Bathroom,0.693264,0.003836,0.021017,0.005504,872.472312,0.003587,872.536161,745.007692
Balcony,0.432992,0.00225,3.2e-05,0.000833,-118.307252,0.002215,-118.325608,140.702796
Additional_rooms,0.309628,0.001149,0.000169,-3.6e-05,-925.011992,0.001389,-925.021852,60.587249
Area (sq.ft),0.863009,0.686365,0.607365,0.760329,24699.984229,0.696358,24700.004018,18561.651116
Facing,-0.022119,0.00235,5e-06,8.1e-05,469.017411,0.002343,469.024475,140.497393
Furnishing,-0.174287,0.001807,0.001627,0.001408,-1562.653861,0.002045,-1562.665116,436.122118
Age,-0.021206,0.001091,0.0001,-2.9e-05,-152.813919,0.001107,-152.821152,115.77362
Covered_Parking,0.45148,0.006896,0.005225,0.001921,-284.452097,0.006208,-284.467131,126.446576
Open_Parking,-0.033634,0.000732,0.000407,-0.000276,-654.622778,0.000917,-654.626326,35.728821


In [22]:
final_fi_df = final_fi_df.divide(final_fi_df.sum(axis=0), axis=1)

In [23]:
final_fi_df[['rf_importance','gb_importance','permutation_importance','rfe_score','SHAP_score']].mean(axis=1).sort_values(ascending=False)

feature
Area (sq.ft)                     0.654091
Brokerage                        0.147003
Deposit                          0.121345
Total_Floors                     0.014489
Bathroom                         0.010979
Type                             0.009891
Bedroom                          0.009048
Covered_Parking                  0.004771
Furnishing                       0.003825
Maintenance                      0.002736
Hospitals_and_Clinics_Nearby     0.002500
Public_Places_Nearby             0.002336
Transportation_Depots_Nearby     0.002250
Balcony                          0.001857
Power_backup                     0.001795
Bank_ATMs_Nearby                 0.001749
Facing                           0.001748
Education_Centre_Nearby          0.001576
Age                              0.001106
Additional_rooms                 0.000877
Charges                          0.000798
Available_for_Men_Bachelors      0.000796
Posted_By                        0.000719
Open_Parking              

In [24]:
X_label

Unnamed: 0,Region,Bedroom,Bathroom,Balcony,Additional_rooms,Area (sq.ft),Facing,Furnishing,Age,Covered_Parking,...,Available_for_Family,Available_for_Women_Bachelors,Available_for_Men_Bachelors,Posted_By,Total_Floors,Hospitals_and_Clinics_Nearby,Bank_ATMs_Nearby,Public_Places_Nearby,Education_Centre_Nearby,Transportation_Depots_Nearby
0,1.0,4,5,3.0,2,2100.00,0.0,1.0,0.0,1,...,1,1,1,0.0,3,1,1,3,2,2
1,1.0,3,3,2.0,0,1777.26,1.0,1.0,1.0,1,...,1,0,0,0.0,19,0,0,1,0,0
2,1.0,1,2,1.0,0,600.00,0.0,1.0,1.0,1,...,1,1,1,1.0,2,25,3,5,3,0
3,1.0,2,2,1.0,0,1160.00,0.0,0.0,1.0,1,...,1,1,1,0.0,5,14,6,1,2,0
4,1.0,3,5,4.0,1,3300.00,0.0,1.0,1.0,2,...,1,0,0,0.0,5,3,0,3,4,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6735,0.0,1,1,0.0,0,450.00,0.0,0.0,1.0,0,...,1,1,1,1.0,4,1,11,0,1,1
6736,0.0,1,1,1.0,2,540.00,4.0,1.0,1.0,1,...,1,1,1,1.0,3,7,8,1,7,1
6737,0.0,1,1,1.0,0,400.00,4.0,1.0,0.0,0,...,1,0,0,1.0,2,9,6,8,0,1
6738,0.0,2,1,1.0,0,600.00,0.0,1.0,0.0,0,...,1,1,1,1.0,2,11,14,4,2,2


### The analysis based on the **cross-validation results** after performing feature reduction:

---

#### **Model Performance Comparison**

| **Model**                | **Features Used** | **Mean R² Score** |
|--------------------------|------------------|--------------------|
| **Original Model**       | All Features     | **0.8788**         |
| **Reduced Feature Model**| Top Features Only (excluding 14 low-impact features) | **0.8844**         |

---

#### **Key Insights**  

1. **Slight Improvement in Accuracy**  
   - After dropping **14 low-importance features**, the R² score improved **from 0.8788 to 0.8844**.  
   - The gain is marginal (about 0.0056 in R², i.e. 0.56 percentage points), and without per-fold variance it cannot be called a real improvement; what it does show is that simplifying the model **did not harm performance**.

2. **Enhanced Model Simplicity**  
   - The reduced model is **simpler, faster to train, and easier to interpret**.  
   - It retains the most **predictive and influential features**, eliminating redundant or noise-contributing variables.

3. **Confirmed Irrelevance of Certain Features**  
   - Features like *Charges, Available for Men/Women Bachelors, Pet Friendly, Facing,* and *Power Backup* had **minimal impact** on model performance.  
   - Their removal not only simplifies the model but may also reduce overfitting risks.
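
To put the two mean scores in perspective, the sketch below computes the absolute and relative gain from the means reported in this notebook, and then shows how a paired per-fold comparison could be summarized. The per-fold arrays and the helper `paired_fold_summary` are hypothetical placeholders; the real values would come from the two `cross_val_score` calls.

```python
import numpy as np

# Mean R² values reported in the notebook outputs.
full_mean = 0.8788218393888038
reduced_mean = 0.8843855122988215

absolute_gain = reduced_mean - full_mean      # difference in R²
relative_gain = absolute_gain / full_mean     # gain relative to the full model

def paired_fold_summary(scores_a, scores_b):
    """Mean and sample standard deviation of per-fold differences (b - a)."""
    diffs = np.asarray(scores_b, dtype=float) - np.asarray(scores_a, dtype=float)
    return diffs.mean(), diffs.std(ddof=1)

# HYPOTHETICAL per-fold scores, for illustration only (not the notebook's folds).
folds_full = [0.86, 0.88, 0.87, 0.89, 0.90]
folds_reduced = [0.87, 0.88, 0.88, 0.89, 0.91]
mean_diff, std_diff = paired_fold_summary(folds_full, folds_reduced)

print(absolute_gain, relative_gain, mean_diff, std_diff)
```

If the standard deviation of the per-fold differences is comparable to their mean, the two models are best treated as equivalent in accuracy, and the case for the reduced model rests on simplicity.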


In [25]:
from sklearn.model_selection import cross_val_score

rf = RandomForestRegressor(n_estimators=100, random_state=42)

scores = cross_val_score(rf, X_label, y_label, cv=5, scoring='r2')

In [26]:
scores.mean()

0.8788218393888038

In [27]:
rf = RandomForestRegressor(n_estimators=100, random_state=42)

# Features with the lowest aggregated importance, dropped for the reduced model
low_impact_features = [
    'Charges', 'Available_for_Men_Bachelors', 'Posted_By',
    'Available_for_Women_Bachelors', 'Open_Parking', 'Pet_friendly',
    'Available_for_Family', 'Hospitals_and_Clinics_Nearby',
    'Public_Places_Nearby', 'Bank_ATMs_Nearby',
    'Transportation_Depots_Nearby', 'Facing', 'Education_Centre_Nearby',
    'Power_backup',
]

scores = cross_val_score(rf, X_label.drop(columns=low_impact_features), y_label, cv=5, scoring='r2')

In [28]:
scores.mean()

0.8843855122988215

In [29]:
X_label.shape[1] - 14

14

In [30]:
Drop_features = ['Charges', 'Available_for_Men_Bachelors', 'Posted_By', 'Available_for_Women_Bachelors', 'Open_Parking', 'Pet_friendly', 'Available_for_Family', 'Hospitals_and_Clinics_Nearby', 'Public_Places_Nearby', 'Bank_ATMs_Nearby', 'Transportation_Depots_Nearby', 'Facing', 'Education_Centre_Nearby', 'Power_backup', 'Property_ID', 'Address', 'Rating', 'Locality', 'Floor_For_Rent', 'Total_Parking']

In [31]:
df_copy.drop(columns =  Drop_features)

Unnamed: 0,Region,Bedroom,Bathroom,Balcony,Additional_rooms,Area (sq.ft),Furnishing,Age,Covered_Parking,Brokerage,Deposit,Maintenance,Type,Total_Floors,Rent
0,Bangalore East,4,5,3,2,2100.00,Semifurnished,10,1,120000,840000,0,House/Villa,3,120000
1,Bangalore East,3,3,2,0,1777.26,Semifurnished,5,1,43000,300000,5000,Apartment,19,43000
2,Bangalore East,1,2,1,0,600.00,Semifurnished,5,1,0,70000,0,Builder Floor,2,12000
3,Bangalore East,2,2,1,0,1160.00,Furnished,1,1,40000,200000,0,Apartment,5,40000
4,Bangalore East,3,5,3+,1,3300.00,Semifurnished,1,2,140000,840000,15000,Apartment,5,140000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6774,Bangalore Central,1,1,0,0,450.00,Furnished,5,0,0,80000,0,Builder Floor,4,16000
6775,Bangalore Central,1,1,1,2,540.00,Semifurnished,5,1,0,75000,0,Builder Floor,3,20000
6776,Bangalore Central,1,1,1,0,400.00,Semifurnished,10,0,0,0,0,House/Villa,2,12000
6777,Bangalore Central,2,1,1,0,600.00,Semifurnished,10,0,0,50000,0,Apartment,2,16000


In [32]:
df_copy.drop(columns =  Drop_features).to_excel('Cleaned_and_Feature_Section_Bangalore_Rental_House_data.xlsx' , index=False)