# <div style="text-align: center; background-color: #595964; font-family:Times New Roman; color: white; padding: 14px; line-height: 1;border-radius:20px">📊EDA | House Data | Visualization </div>

<h3 style="text-align: left;background-color: #00BFFF; font-family:Times New Roman; color: white; padding: 14px; line-height: 1; border-radius:10px"> About Dataset📁</h3>

<h4>House Data: <mark>21 columns</mark></h4>
The House Price dataset provides valuable information about residential properties, encompassing a range of features such as carpet area, property status, floor, transaction type, furnishing, facing, overlooking, society name, bathroom count, balcony count, car parking availability, ownership type, super area, dimensions, plot area, property title, total amount, price per square foot, and a description of the property's location. This dataset offers insights into the dynamics of the housing market and can aid buyers, sellers, and real estate professionals in making informed decisions.

<h4>The House Prices Dataset contains <mark>21 columns</mark>, each with the following descriptions:</h4>

* <b> <mark>1. Index</mark></b>: Unique identifier for each property entry.
* <b> <mark>2. Title</mark></b>: Title or name associated with the property.
* <b> <mark>3. Description</mark></b>: Detailed description of the property.
* <b> <mark>4. Amount(in rupees)</mark></b>: Total amount for the property, stored as text in Lac or Cr (1 Lac = 100,000 rupees; 1 Cr = 10,000,000 rupees).
* <b> <mark>5. Price (in rupees)</mark></b>: Price per square foot in rupees.
* <b> <mark>6. Location</mark></b>: Geographic location of the property.
* <b> <mark>7. Carpet Area</mark></b>: The net usable floor area inside the property, recorded as text with a unit such as sqft or sqm.
* <b> <mark>8. Status</mark></b>: Status of the property (e.g., occupied, vacant).
* <b> <mark>9. Floor</mark></b>: The floor on which the property is situated.
* <b> <mark>10. Transaction</mark></b>: Type of transaction associated with the property (e.g., sale, rent).
* <b> <mark>11. Furnishing</mark></b>: Level of furnishing in the property.
* <b> <mark>12. Facing</mark></b>: The direction the property is facing.
* <b> <mark>13. Overlooking</mark></b>: The view or aspect the property overlooks.
* <b> <mark>14. Society</mark></b>: The society or community the property belongs to.
* <b> <mark>15. Bathroom</mark></b>: Number of bathrooms in the property.
* <b> <mark>16. Balcony</mark></b>: Number of balconies in the property.
* <b> <mark>17. Car Parking</mark></b>: Availability of car parking space.
* <b> <mark>18. Ownership</mark></b>: Type of ownership for the property.
* <b> <mark>19. Super Area</mark></b>: Total area including all spaces within the property.
* <b> <mark>20. Dimensions</mark></b>: Specific dimensions of the property (e.g., length x width).
* <b> <mark>21. Plot Area</mark></b>: The total area of the plot where the property is situated.


<a id="1"></a>
# <div style="text-align: center; background-color: #00BFFF; font-family:Times New Roman; color: white; padding: 14px; line-height: 1;border-radius:20px">1. Import Necessary Libraries</div>

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import shap
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
import missingno as mno
import plotly.offline as pyo 
import plotly.figure_factory as ff
import plotly.io as pio
from wordcloud import WordCloud
color_pal = sns.color_palette()
# 'seaborn-dark-palette' was renamed in matplotlib 3.6 and later removed, so only the built-in dark style is set here
plt.style.use('dark_background')

import nltk
import re

import warnings
warnings.filterwarnings('ignore')
sns.set_theme(style='darkgrid', palette='colorblind')
from sklearn.preprocessing import LabelEncoder 
le = LabelEncoder()

#Model
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from xgboost import XGBRegressor
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

In [None]:
df = pd.read_csv("/kaggle/input/house-price/house_prices.csv" , index_col="Index")

<a id="1"></a>
# <div style="text-align: center; background-color: #6495ED; font-family:Times New Roman; color: white; padding: 14px; line-height: 1;border-radius:20px">2. 📊EDA </div>

In [None]:
df.head()

In [None]:
cols = df.columns
cols

In [None]:
df.shape

In [None]:
df.describe().T

In [None]:
df.describe(include = 'object').T


In [None]:
df.dtypes

# <div style="text-align: center; background-color: #6495ED; font-family:Times New Roman; color: white; padding: 14px; line-height: 1;border-radius:20px">3. Null values</div>

In [None]:
df

In [None]:
df.isnull().sum()

In [None]:
# Calculating the count of missing values in each column
missing_values = df.isna().sum()

# Creating a bar plot using Plotly Express
fig = px.bar(x=missing_values.index, y=missing_values.values, labels={'x': 'Columns', 'y': 'Missing Values Count'},
             title='Count of Missing Values in Each Column')
fig.show()
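
The counts plotted above are easier to act on when expressed as a share of rows. The following is a minimal sketch on a made-up three-column frame; the `toy` data and the 0.5 cut-off are assumptions for illustration, not values taken from the house dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the house data
toy = pd.DataFrame({
    "Society": [np.nan, np.nan, "A", np.nan],
    "Bathroom": [2, np.nan, 3, 1],
    "Price": [5000, 6000, 7000, 8000],
})

# Share of missing values per column (mean of a boolean mask)
missing_share = toy.isna().mean()

# Hypothetical cut-off: columns missing in more than half of the rows
threshold = 0.5
sparse_columns = missing_share[missing_share > threshold].index.tolist()

print(missing_share)
print("Candidates to drop:", sparse_columns)
```

Columns above the cut-off are candidates for dropping, while the rest can be imputed or filtered row by row.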

In [None]:
# Drop columns with a high number of missing values
df.drop(columns=['Society', 'Car Parking', 'Super Area', 'Dimensions', 'Plot Area'], inplace=True)

# Fill missing values in categorical columns with 'Unknown'
categorical_columns = ['Description', 'Status', 'Furnishing', 'Transaction']
df[categorical_columns] = df[categorical_columns].fillna('Unknown')

# Calculate the mode of 'Bathroom'; mode() returns a Series, so take its first entry
Mode_Bathroom = df['Bathroom'].mode()[0]

# Fill missing values in 'Bathroom' with the mode
df['Bathroom'] = df['Bathroom'].fillna(Mode_Bathroom)

# Drop rows with any remaining missing values
df = df.dropna()

# Display the shape of the DataFrame after cleaning
df.shape
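
Mode imputation can be checked in isolation on a small column. This is a minimal sketch; the `bathrooms` Series is made-up data standing in for `df['Bathroom']`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for df['Bathroom']
bathrooms = pd.Series([2, 2, 3, np.nan, 1, np.nan, 2], name="Bathroom")

# Series.mode() returns a Series because ties are possible; take the first entry
mode_value = bathrooms.mode()[0]

# Fill the gaps with that scalar and keep the result in a new Series
filled = bathrooms.fillna(mode_value)

print("Mode:", mode_value)
print("Missing after fill:", filled.isna().sum())
```

Passing the whole Series returned by `mode()` to `fillna` would align on the index and leave most gaps unfilled, which is why the scalar is extracted first.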


In [None]:
df.isnull().sum()

# <div style="text-align: center; background-color: #6495ED; font-family:Times New Roman; color: white; padding: 14px; line-height: 1;border-radius:20px">4. Duplicate rows</div>


In [None]:
# Finding duplicate rows
duplicate_rows = df[df.duplicated(keep='first')]

# Number of duplicate rows
num_duplicates = duplicate_rows.shape[0]

# Displaying the duplicate rows
print(f"Number of duplicate rows: {num_duplicates}")
duplicate_rows

In [None]:
df.drop_duplicates(keep='first', inplace=True)


In [None]:
df.shape

# <div style="text-align: center; background-color: #6495ED; font-family:Times New Roman; color: white; padding: 14px; line-height: 1;border-radius:20px">5. Feature engineering</div>

# Carpet_Area

In [None]:
#Carpet_Area
df.rename(columns={"Carpet Area":"Carpet_Area"} , inplace=True)

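# Flag square-metre entries before the unit text is stripped
is_sqm = df['Carpet_Area'].str.contains('sqm', na=False)
df['Carpet_Area'] = df['Carpet_Area'].str.replace('sqft', '', regex=False)
df['Carpet_Area'] = df['Carpet_Area'].str.replace('sqm', '', regex=False)
df['Carpet_Area'] = pd.to_numeric(df['Carpet_Area'], errors='coerce')
# Convert only the square-metre entries to square feet (1 sqm = 10.7639 sqft)
df.loc[is_sqm, 'Carpet_Area'] = df.loc[is_sqm, 'Carpet_Area'] * 10.7639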

df["Carpet_Area"]
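
The unit handling can also be written as a single lookup of a conversion factor per unit. The sketch below is one possible approach, not the notebook's method; the helper name `carpet_area_to_sqft`, the sample strings, and the `sqyrd` entry are assumptions for illustration.

```python
import pandas as pd

SQM_TO_SQFT = 10.7639  # 1 square metre is about 10.7639 square feet

def carpet_area_to_sqft(raw: pd.Series) -> pd.Series:
    """Convert strings such as '500 sqft' or '50 sqm' to square feet (hypothetical helper)."""
    # Capture the number and the unit; rows that do not match yield NaN in both parts
    parts = raw.str.extract(r"^\s*([\d.]+)\s*(sqft|sqm)\s*$")
    value = pd.to_numeric(parts[0], errors="coerce")
    # Look up the multiplier for each unit
    factor = parts[1].map({"sqft": 1.0, "sqm": SQM_TO_SQFT})
    return value * factor

# Made-up sample values; 'sqyrd' stands for any unit the helper does not handle
sample = pd.Series(["500 sqft", "50 sqm", "120 sqyrd"])
converted = carpet_area_to_sqft(sample)
print(converted)
```

Unhandled units come back as NaN, so they can be inspected or dropped explicitly instead of being scaled by the wrong factor.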

# Amount(in rupees)

In [None]:
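def convert_amount(amount):
    """Convert an amount such as '42 Lac' or '1.4 Cr' to rupees; return None if it cannot be parsed."""
    try:
        if 'Lac' in amount:
            amount = amount.replace('Lac', '').strip()
            return float(amount) * 100000  # Convert Lac to rupees (1 Lac = 100000 rupees)
        elif 'Cr' in amount:
            amount = amount.replace('Cr', '').strip()
            return float(amount) * 10000000  # Convert Cr to rupees (1 Cr = 10000000 rupees)
        else:
            return float(amount)
    except (ValueError, TypeError):
        # ValueError: text that is not a number; TypeError: non-string input such as NaN
        return None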

df['Amount(in rupees)'] = df['Amount(in rupees)'].apply(convert_amount)

df['Amount(in rupees)']
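
The same Lac/Cr conversion can be expressed without a row-wise `apply`. This is a sketch of a vectorised variant under stated assumptions: the helper name `parse_amount` and the sample strings (including the unparseable `Call for Price`) are made up for illustration.

```python
import pandas as pd

# Rupee value of each unit
UNIT_TO_RUPEES = {"Lac": 100_000, "Cr": 10_000_000}

def parse_amount(raw: pd.Series) -> pd.Series:
    """Vectorised conversion of '42 Lac' / '1.4 Cr' style strings to rupees (hypothetical helper)."""
    # Capture the number and an optional unit; unparseable text gives NaN
    parts = raw.str.extract(r"^\s*([\d.]+)\s*(Lac|Cr)?\s*$")
    value = pd.to_numeric(parts[0], errors="coerce")
    # A missing unit means the value is already in rupees
    multiplier = parts[1].map(UNIT_TO_RUPEES).fillna(1)
    return value * multiplier

# Made-up sample values
sample = pd.Series(["42 Lac", "1.4 Cr", "95000", "Call for Price"])
amounts = parse_amount(sample)
print(amounts)
```

Text that does not match the pattern stays NaN, mirroring the `None` returned by the row-wise function.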

In [None]:
df.rename(columns={"Amount(in rupees)":"Amount"} , inplace=True)


# Price

In [None]:
df.rename(columns={"Price (in rupees)":"Price"} , inplace=True)

df['Price'] = pd.to_numeric(df['Price'], errors='coerce')

# Bathroom & Balcony

In [None]:
#Bathroom
df['Bathroom'] = pd.to_numeric(df['Bathroom'], errors='coerce')
#Balcony
df['Balcony'] = pd.to_numeric(df['Balcony'], errors='coerce')

# <div style="text-align: center; background-color: #6495ED; font-family:Times New Roman; color: white; padding: 14px; line-height: 1;border-radius:20px">6. Data visualisation</div>

In [None]:
plt.figure(figsize=(20,20))
# Restrict the correlation to numeric columns; text columns cannot be correlated
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f")
plt.show()

In [None]:
# Selecting the top 10 prices from the 'Price' column in the DataFrame 'df'
top_prices = df['Price'].nlargest(10)
locations = df.loc[top_prices.index]['location']

# Plotting the top 10 prices using Matplotlib
plt.figure(figsize=(10, 6))  
plt.bar(range(len(top_prices)), top_prices, color='#7B66FF')  
plt.xlabel('Location')  
plt.ylabel('Price') 
plt.legend(['Prices'])
plt.title('Top 10 House Prices') 
plt.xticks(range(len(top_prices)), locations, rotation=45, ha='right')  
plt.tight_layout()  
plt.show()


In [None]:
# Calculate the median 'Price' for each 'Bathroom'
Price_By_Bathroom = df.groupby('Bathroom')['Price'].median().reset_index()

# labels for the x-axis, title, and customized height
fig_Price_By_Bathroom = px.line(
    Price_By_Bathroom,  # DataFrame containing the data
    x='Bathroom',   # x-values: Bathroom categories
    y='Price',  # y-values: median Price
    labels={'Bathroom': 'Bathrooms'},  # Customize label for the x-axis
    title='Price by Number of Bathrooms',  # Set the title of the plot
    height=650  # Set the height of the plot
)

# Display the plot
fig_Price_By_Bathroom.show()
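
A median per group says little when a group holds only a handful of listings, so it can help to compute the group size alongside it. The sketch below uses a made-up `toy` frame; the column names mirror the notebook but the values are invented.

```python
import pandas as pd

# Hypothetical toy data: bathroom counts and prices per square foot
toy = pd.DataFrame({
    "Bathroom": [1, 1, 2, 2, 2, 3],
    "Price": [4000, 5000, 6000, 7000, 20000, 9000],
})

# Median price and number of listings for each bathroom count
summary = (
    toy.groupby("Bathroom")["Price"]
    .agg(median_price="median", listings="count")
    .reset_index()
)
print(summary)
```

The median for two bathrooms stays at 7000 despite the 20000 outlier, which is the reason for preferring it to the mean in these plots.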


In [None]:
# Calculate the median 'Price' for each 'Balcony'
Price_By_Balcony = df.groupby('Balcony')['Price'].median().reset_index()

# labels for the x-axis, title, and customized height
fig_Price_By_Balcony = px.line(
    Price_By_Balcony,  # DataFrame containing the data
    x='Balcony',   # x-values: Balcony counts
    y='Price',  # y-values: median Price
    labels={'Balcony': 'Balcony Count'},  # Customize label for the x-axis
    title='Price by Number of Balconies',  # Set the title of the plot
    height=650  # Set the height of the plot
)

# Display the plot
fig_Price_By_Balcony.show()


In [None]:
# Calculate the maximum 'Price' for each 'overlooking' category and sort in descending order
max_price_by_view = df.groupby('overlooking')['Price'].max().reset_index()
max_price_by_view = max_price_by_view.sort_values(by='Price', ascending=False)

# Select the top 10 'overlooking' categories with the highest maximum 'Price'
top_10_expensive_price = max_price_by_view.head(10)

# Create a bar plot using Plotly Express
fig = px.bar(
    top_10_expensive_price,  # DataFrame containing the data
    x='overlooking',  # x-values: 'overlooking' categories
    y='Price',  # y-values: maximum 'Price'
    title='Top 10 Overlooking Categories by Max Price',  # Set the title of the plot
    labels={'overlooking': 'Overlooking Category', 'Price': 'Price'},  # Customize labels
    template='plotly_white'  # Use a white template for the plot
)

# Set the height of the plot
fig.update_layout(height=650)

# Display the plot
fig.show()


In [None]:
# Calculate the Max 'Price' for each 'location' and sort in descending order
Max_price = df.groupby('location')['Price'].max().reset_index()
Max_price = Max_price.sort_values(by='Price', ascending=False)

# Select the top 10 locations with the highest maximum price
top_10_expensive_price = Max_price.head(10)

# Create a bar plot using Plotly Express
fig = px.bar(
    top_10_expensive_price,  # DataFrame containing the data
    x='location',  # x-values: locations
    y='Price',  # y-values: Max prices
    color='Price',  # Color the bars by their price value
    title='Top 10 Locations by Max Price',  # Set the title of the plot
    labels={'location': 'Location', 'Price': 'Max Price'},  # Set labels for axes
    template='plotly_white'  # Use a white template for the plot
)

# Set font color to black
fig.update_traces(textfont_color='black')

# Set the height of the plot
fig.update_layout(height=650)

# Display the plot
fig.show()


In [None]:
# Calculate the value counts for each unique value in the 'location' column
top10_location = df['location'].value_counts()[:10]

# Create a bar plot using Plotly Express
fig = px.bar(
    y=top10_location.values,  # Use the counts as the y-values
    x=top10_location.index,   # Use the unique values as the x-values
    color=top10_location.index,  # Color the bars based on the unique values
    color_discrete_sequence=px.colors.sequential.PuBuGn,  # Set color palette
    text=top10_location.values,  # Display the count values on top of the bars
    title='Top 10 Locations',  # Set the title of the plot
    template='plotly_white'  # Use a white template for the plot
)

# Update the layout of the plot
fig.update_layout(
    xaxis_title="Location",  # Label for the x-axis
    yaxis_title="Count",  # Label for the y-axis
    font=dict(size=17, family="Franklin Gothic")  # Set the font size and family for the text
)

# Display the plot
fig.show()


In [None]:
df

In [None]:
#Pie Plot
# Sunburst chart for the distribution of facing
fig2 = px.sunburst(df, path=['facing'], color_discrete_sequence=px.colors.qualitative.Set3)
fig2.update_layout(title_text='Distribution of facing ', height=500)
fig2.show()




#Bar Plot
# Calculate the value counts for each unique value in the 'facing' column
facing_counts = df['facing'].value_counts()

# Create a bar plot using Plotly Express
fig = px.bar(
    y=facing_counts.values,  # Use the counts as the y-values
    x=facing_counts.index,   # Use the unique values as the x-values
    color=facing_counts.index,  # Color the bars based on the unique values
    color_discrete_sequence=px.colors.sequential.PuBuGn,  # Set color palette
    text=facing_counts.values,  # Display the count values on top of the bars
    title='Facing Direction Counts',  # Set the title of the plot
    template='plotly_dark'  # Use a dark template for the plot
)

# Update the layout of the plot
fig.update_layout(
    xaxis_title="Facing",  # Label for the x-axis
    yaxis_title="Count",   # Label for the y-axis
    font=dict(size=17, family="Franklin Gothic")  # Set the font size and family for the text
)

# Display the plot
fig.show()


In [None]:
# Sunburst chart for the distribution of Furnishing
fig2 = px.sunburst(df, path=['Furnishing'], color_discrete_sequence=px.colors.qualitative.Set3)
fig2.update_layout(title_text='Distribution of Furnishing ', height=500)
fig2.show() 



# Calculate the value counts for each unique value in the 'Furnishing' column
furnishing_counts = df['Furnishing'].value_counts()

# Create a bar plot using Plotly Express
fig = px.bar(
    y=furnishing_counts.values,  # Use the counts as the y-values
    x=furnishing_counts.index,   # Use the unique values as the x-values
    color=furnishing_counts.index,  # Color the bars based on the unique values
    color_discrete_sequence=px.colors.sequential.PuBuGn,  # Set color palette
    text=furnishing_counts.values,  # Display the count values on top of the bars
    title='Furnishing Types',  # Set the title of the plot
    template='plotly_dark'  # Use a dark template for the plot
)

# Update the layout of the plot
fig.update_layout(
    xaxis_title="Furnishing Type",  # Label for the x-axis
    yaxis_title="Count",  # Label for the y-axis
    font=dict(size=17, family="Franklin Gothic")  # Set the font size and family for the text
)

# Display the plot
fig.show()


In [None]:
# Sunburst chart for the distribution of Transaction
fig2 = px.sunburst(df, path=['Transaction'], color_discrete_sequence=px.colors.qualitative.Set3)
fig2.update_layout(title_text='Distribution of Transaction ', height=500)
fig2.show() 



# Calculate the value counts for each unique value in the 'Transaction' column
transaction_counts = df['Transaction'].value_counts()

# Create a bar plot using Plotly Express
fig = px.bar(
    y=transaction_counts.values,  # Use the counts as the y-values
    x=transaction_counts.index,   # Use the unique values as the x-values
    color=transaction_counts.index,  # Color the bars based on the unique values
    color_discrete_sequence=px.colors.sequential.PuBuGn,  # Set color palette
    text=transaction_counts.values,  # Display the count values on top of the bars
    title='Transaction Counts',  # Set the title of the plot
    template='plotly_dark'  # Use a dark template for the plot
)

# Update the layout of the plot
fig.update_layout(
    xaxis_title="Transaction Type",  # Label for the x-axis
    yaxis_title="Count",  # Label for the y-axis
    font=dict(size=17, family="Franklin Gothic")  # Set the font size and family for the text
)

# Display the plot
fig.show()



In [None]:
#Pie Plot
# Sunburst chart for the distribution of Ownership
fig2 = px.sunburst(df, path=['Ownership'], color_discrete_sequence=px.colors.qualitative.Set3)
fig2.update_layout(title_text='Distribution of Ownership ', height=500)
fig2.show()

# Calculate the value counts for each unique value in the 'Ownership' column
ownership_counts = df['Ownership'].value_counts()

# Create a bar plot using Plotly Express
fig = px.bar(
    y=ownership_counts.values,  # Use the counts as the y-values
    x=ownership_counts.index,   # Use the unique values as the x-values
    color=ownership_counts.index,  # Color the bars based on the unique values
    color_discrete_sequence=px.colors.sequential.PuBuGn,  # Set color palette
    text=ownership_counts.values,  # Display the count values on top of the bars
    title='Ownership Distribution',  # Set the title of the plot
    template='plotly_white'  # Use a white template for the plot
)

# Update the layout of the plot
fig.update_layout(
    xaxis_title="Ownership",  # Label for the x-axis
    yaxis_title="Count",  # Label for the y-axis
    font=dict(size=17, family="Franklin Gothic")  # Set the font size and family for the text
)

# Display the plot
fig.show()


In [None]:
def barPlot(variable):
    var = df[variable]
    var_value = var.value_counts().reset_index()
    var_value.columns = [variable, 'Frequency']

    fig = px.bar(
        var_value,
        x=variable,
        y='Frequency',
        color=variable,
        color_discrete_sequence=px.colors.qualitative.Plotly,
        text='Frequency',
        title=variable,
    )

    fig.update_layout(
        xaxis_title=variable,
        yaxis_title="Frequency",
        font=dict(size=12, family="Arial"),
    )

    fig.show()


In [None]:
variable_list =['Bathroom','Balcony']

for x in variable_list:
    barPlot(x)

In [None]:
df

# <div style="text-align: center; background-color: #6495ED; font-family:Times New Roman; color: white; padding: 14px; line-height: 1;border-radius:20px">7. Categorical</div>

In [None]:
# Select columns with object (categorical) data types
categorical_cols = df.select_dtypes(include='object').columns.tolist()

categorical_cols

In [None]:
Cols_to_transform = ['location','Status','Floor','Transaction','Furnishing','facing','overlooking','Ownership']
# Initialize the LabelEncoder
le = LabelEncoder()

# Apply Label Encoding to the selected categorical columns
for x in Cols_to_transform:  
    df[x] = le.fit_transform(df[x])

# The selected categorical columns now hold integer codes; 'Title' and 'Description' remain text
df.head()
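
Refitting one `LabelEncoder` for every column keeps only the mapping of the last column fitted. If the codes need to be translated back to labels later, one option is to keep a separate encoder per column. The sketch below assumes made-up category values in a `toy` frame; the `encoders` dictionary is a hypothetical name.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame with invented category values
toy = pd.DataFrame({
    "Furnishing": ["Furnished", "Unfurnished", "Semi-Furnished", "Furnished"],
    "Transaction": ["Resale", "New Property", "Resale", "Resale"],
})

# One fitted encoder per column, so each mapping can be reversed later
encoders = {}
for col in ["Furnishing", "Transaction"]:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

# Recover the original labels from the integer codes
decoded = encoders["Furnishing"].inverse_transform(toy["Furnishing"])

print(toy)
print(list(decoded))
```

`LabelEncoder` assigns codes in sorted label order, so the integers carry no meaning beyond identity; tree-based models tolerate this better than linear ones.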

In [None]:
# Replace the 'Index' index with a fresh 0..n-1 range and discard the old values
df = df.reset_index(drop=True)

In [None]:
# Check Null values 
df.isnull().sum()

In [None]:
df = df.dropna()

df.shape



# <div style="text-align: center; background-color: #6495ED; font-family:Times New Roman; color: white; padding: 14px; line-height: 1;border-radius:20px">8. Correlation Matrix
</div>

In [None]:
# Calculate the correlation matrix
correlation_matrix = df.corr(numeric_only=True)  # Skip remaining text columns such as 'Title' and 'Description'

# Create a correlation heatmap using Plotly Express
fig = px.imshow(
    correlation_matrix,  # Matrix containing the data
    labels=dict(x="Features", y="Features", color="Correlation"),  # Customize labels
    x=correlation_matrix.columns,  # x-values: Features
    y=correlation_matrix.columns,  # y-values: Features
    color_continuous_scale='blues',  # Set the color scale
    title='Correlation Heatmap',  # Set the title of the plot
    height=800  # Set the height of the plot
)

# Display the plot
fig.show()

In [None]:
print('Top 5 Most Positively Correlated to the Target Variable')
# Exclude 'Price' itself, which always has a correlation of 1 with itself
correlation_matrix['Price'].drop('Price').sort_values(ascending=False).head(5)

In [None]:
print('Top 5 Most Negatively Correlated to the Target Variable')
correlation_matrix['Price'].sort_values(ascending=True).head(5)

In [None]:
columns_to_drop = [col for col in correlation_matrix.columns if abs(correlation_matrix.loc['Price', col]) < 0.3]
columns_to_drop
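
The threshold filter can be tried on a small frame where the answer is known in advance. This is a minimal sketch with invented numbers; it assumes pandas 1.5 or later for the `numeric_only` argument of `DataFrame.corr`.

```python
import pandas as pd

# Hypothetical toy frame: one strong predictor, one unrelated column, one text column
toy = pd.DataFrame({
    "Price": [10, 20, 30, 40, 50],
    "Carpet_Area": [100, 210, 290, 405, 500],  # rises with Price
    "Balcony": [2, 1, 2, 1, 2],                # uncorrelated with Price
    "Title": ["a", "b", "c", "d", "e"],        # text, ignored by numeric_only
})

# Correlation of every numeric column with the target, excluding the target itself
corr_with_price = toy.corr(numeric_only=True)["Price"].drop("Price")

# Columns whose absolute correlation falls below the 0.3 cut-off
weak_columns = corr_with_price[corr_with_price.abs() < 0.3].index.tolist()

print(corr_with_price)
print("Weakly correlated:", weak_columns)
```

Pearson correlation only captures linear association, so a column dropped by this rule can still be useful to a tree-based model.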

In [None]:
df1=df.copy()

df1 = df1.drop(columns_to_drop, axis=1)
df1.shape

# <div style="text-align: center; background-color: #6495ED; font-family:Times New Roman; color: white; padding: 14px; line-height: 1;border-radius:20px">9. Splitting the dataset

</div>


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

X = df.drop(columns=['Price','Title','Description'])
y = df['Price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the resulting datasets
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)

# <div style="text-align: center; background-color: #6495ED; font-family:Times New Roman; color: white; padding: 14px; line-height: 1;border-radius:20px">10. Model Building and Analysis

</div>

In [None]:
models = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(random_state=42),
}
best_model = None
best_r2 = float('-inf')  # R2 can be negative, so 0 is not a safe starting value

for model_name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Evaluate the model on the held-out test set
    r2 = r2_score(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))

    # Actual and predicted prices side by side
    submit = pd.DataFrame()
    submit['Price'] = y_test
    submit['Predict_Price'] = y_pred
    submit = submit.reset_index()

    # Keep track of the model with the highest R2
    if r2 > best_r2:
        best_r2 = r2
        best_model = model.__class__.__name__

    print(f'{model_name}:')
    print(f'R2 Score: {r2:.2f}')
    print(f'Mean Absolute Error (MAE): {mae:.2f}')
    print(f'Root Mean Squared Error (RMSE): {rmse:.2f}')
    print(submit.head(5))

    print('----------------------------------------')
print(f"The best performing model is: {best_model} with R2 score: {best_r2:.2f}")
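
The three metrics printed above can be reproduced by hand, which makes it clearer what each one measures. The sketch below uses four invented values; `y_true` and `y_pred` here are toy arrays, not the notebook's test set.

```python
import numpy as np

# Hypothetical actual and predicted prices
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 320.0, 380.0])

residuals = y_true - y_pred

# MAE: average size of the errors, in the units of the target
mae = np.mean(np.abs(residuals))

# RMSE: like MAE, but large errors are penalised more heavily
rmse = np.sqrt(np.mean(residuals ** 2))

# R2: share of the target's variance explained by the predictions
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}, R2: {r2:.2f}")
```

R2 is a goodness-of-fit score for regression rather than an accuracy, and it becomes negative when a model does worse than predicting the mean.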


In [None]:
import statsmodels.api as sm

def forward_selection(df, target, significance_level=0.05):
    initial_features = df.columns.tolist()
    best_features = []
    while len(initial_features) > 0:
        remaining_features = list(set(initial_features) - set(best_features))
        new_pval = pd.Series(index=remaining_features, dtype=float)  # Explicit dtype so min() and idxmin() work on floats
        for new_column in remaining_features:
            model = sm.OLS(target, sm.add_constant(df[best_features + [new_column]])).fit()
            new_pval[new_column] = model.pvalues[new_column]
        min_p_value = new_pval.min()
        if min_p_value < significance_level:
            best_features.append(new_pval.idxmin())
        else:
            break
    return best_features

# Assuming you have already defined X and y as the features and target variable respectively
selected_features = forward_selection(X, y)
print("Selected features:", selected_features)


In [None]:
X = df[['Amount', 'Bathroom', 'Carpet_Area', 'location', 'Transaction', 'Ownership', 'Balcony']].values
y = df['Price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
models = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(random_state=42),
}
best_model = None
best_r2 = float('-inf')  # R2 can be negative, so 0 is not a safe starting value

for model_name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Evaluate the model on the held-out test set
    r2 = r2_score(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))

    # Actual and predicted prices side by side
    submit = pd.DataFrame()
    submit['Price'] = y_test
    submit['Predict_Price'] = y_pred
    submit = submit.reset_index()

    # Keep track of the model with the highest R2
    if r2 > best_r2:
        best_r2 = r2
        best_model = model.__class__.__name__

    print(f'{model_name}:')
    print(f'R2 Score: {r2:.2f}')
    print(f'Mean Absolute Error (MAE): {mae:.2f}')
    print(f'Root Mean Squared Error (RMSE): {rmse:.2f}')
    print(submit.head(5))

    print('----------------------------------------')
print(f"The best performing model is: {best_model} with R2 score: {best_r2:.2f}")


# <div style="text-align: center; background-color: #6495ED; font-family:Times New Roman; color: white; padding: 14px; line-height: 1;border-radius:20px">11. Feature importances

</div>


In [None]:
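# `model` is the last estimator fitted in the loop above (Gradient Boosting)
importances = model.feature_importances_
# Feature names in the same order as the columns used to build X
feature_names = ['Amount', 'Bathroom', 'Carpet_Area', 'location', 'Transaction', 'Ownership', 'Balcony']
feature_importance_dict = dict(zip(feature_names, importances))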
sorted_feature_importance = sorted(feature_importance_dict.items(), key=lambda x: x[1], reverse=True)

top_n = 5  # Set the number of top features to display
top_feature_names, top_importances = zip(*sorted_feature_importance[:top_n])

fig = px.bar(
    x=top_importances,
    y=top_feature_names,
    orientation='h',
    title='Top 5 Feature Importance',
    labels={'x': 'Importance', 'y': 'Feature'},
    color=top_importances,  # Color bars by importance values
    color_continuous_scale='reds',  # Choose a color scale
    text=top_importances,  # Values rendered by the text template below
)

fig.update_traces(texttemplate='%{text:.2f}', textposition='outside')

fig.show()
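
Importances only make sense when each value is paired with the column it was computed for. A sketch of one way to keep that pairing explicit is shown below; the synthetic data, the coefficient of 5, and the model settings are assumptions chosen so the expected ranking is obvious.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the target depends on Carpet_Area only, plus small noise
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "Carpet_Area": rng.uniform(300, 3000, 200),
    "Bathroom": rng.integers(1, 5, 200),
    "Balcony": rng.integers(0, 4, 200),
})
y = 5 * X["Carpet_Area"] + rng.normal(0, 50, 200)

model = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)

# Index the importances by the training columns so names and values cannot drift apart
importance = pd.Series(model.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)

print(importance)
```

Fitting on a DataFrame rather than a bare array keeps the column names next to the model, and the importances of a tree ensemble sum to 1.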