House Pricing Dataset Analysis

A comprehensive analysis of the House Pricing dataset with various regression models and feature selection techniques. Score on Kaggle: 0.15

Data Science Workflow

Data Cleaning: Handling missing values, outliers, and inconsistencies.
Exploratory Data Analysis: Understanding the data through summary statistics and visualizations.
- Bar plots, count plots, scatter plots, hexbin plots, probability plots, histograms
Feature Engineering: Creating new features or modifying existing ones to improve model performance, such as the "extraordinary" feature, which stores information about extra add-ons in a house.
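As an illustration of this kind of engineered feature, here is a minimal sketch. The README does not say how "extraordinary" is computed, so the definition below (a count of add-ons) is an assumption, as is the choice of the `PoolArea`, `Fence`, and `MiscFeature` columns from the Kaggle House Prices data.

```python
import pandas as pd

# Toy rows; the column names follow the Kaggle House Prices data.
df = pd.DataFrame({
    "PoolArea": [0, 512, 0],
    "Fence": [None, "GdPrv", "MnPrv"],
    "MiscFeature": [None, "Shed", None],
})

# ASSUMPTION: "extraordinary" counts how many add-ons a house has.
# The notebook's actual definition may differ.
addons = pd.DataFrame({
    "pool": df["PoolArea"] > 0,
    "fence": df["Fence"].notna(),
    "misc": df["MiscFeature"].notna(),
})
df["extraordinary"] = addons.sum(axis=1)
```

A single count keeps several sparse columns in one numeric feature, which linear models can use directly.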
Feature Selection: Identifying and selecting the most important features for the model. Techniques used:
- VarianceThreshold: Removes features with low variance.
- Eliminating highly correlated features.
- Correlation Matrix: Identifies the most correlated features.
- RFE (Recursive Feature Elimination): Selects features by recursively considering smaller sets of features.
- AIC Minimization Method: Stepwise regression using the dmba library to select features based on the Akaike Information Criterion.
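The first three techniques can be sketched with scikit-learn and pandas. This is a minimal example on synthetic data, not the notebook's code: the column names, the 0.95 correlation cutoff, and the target of two features are all assumptions chosen for illustration. The AIC-based stepwise step is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE, VarianceThreshold
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the housing features (hypothetical column names).
rng = np.random.default_rng(0)
n = 200
area = rng.normal(1500, 300, n)
X = pd.DataFrame({
    "area": area,
    "area_copy": area * 1.01 + rng.normal(0, 1, n),  # nearly duplicates "area"
    "rooms": rng.integers(2, 8, n).astype(float),
    "constant": np.ones(n),                          # zero variance
    "noise": rng.normal(0, 1, n),
})
y = 100 * X["area"] + 5000 * X["rooms"] + rng.normal(0, 1000, n)

# 1) VarianceThreshold: keep only features whose variance exceeds the threshold.
vt = VarianceThreshold(threshold=0.0)
vt.fit(X)
X_var = X.loc[:, vt.get_support()]

# 2) Correlation filter: for each pair with |r| > 0.95, drop the later column.
corr = X_var.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
X_uncorr = X_var.drop(columns=to_drop)

# 3) RFE: standardize first so coefficient magnitudes are comparable,
#    then recursively drop the weakest feature until two remain.
X_scaled = (X_uncorr - X_uncorr.mean()) / X_uncorr.std()
rfe = RFE(estimator=LinearRegression(), n_features_to_select=2)
rfe.fit(X_scaled, y)
selected = list(X_scaled.columns[rfe.support_])
```

Scaling before RFE matters with linear estimators because RFE ranks features by coefficient size, which depends on each feature's units.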
Regression Diagnostics: Performing comprehensive diagnostics including:
- Coefficient Analysis: Evaluating the significance and impact of each feature's coefficient.
- Multicollinearity Rejection Using VIF (Variance Inflation Factor): Identifying and removing highly collinear features.
- Residual Analysis: Checking the residuals to ensure they meet assumptions (e.g., normality, homoscedasticity).
- Partial Residual Analysis: Understanding the relationship between each predictor and the response, adjusting for other predictors.
- Leverage and Influence: Using the OLS method with the statsmodels library to identify influential data points.
- Outlier Handling: Detecting and managing outliers.
- Cook's Distance: Measuring the influence of each observation on the fitted model.
- Hat Values: Identifying leverage points in the data.
- Studentized Residuals: Identifying outliers based on residuals scaled by their estimated standard deviation.
Model Building: Selecting and training regression models.
- Standardization: StandardScaler
- Testing the influence of various feature sets on model performance
- Linear models: Linear Regression, Ridge, Lasso, ElasticNet
- Tree-based models: Random Forest
- Cross-validation and hyperparameter tuning: K-Fold Cross-Validation and Grid Search
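One way to combine standardization, a linear model, K-fold cross-validation, and grid search is a scikit-learn pipeline. This is a sketch on synthetic data: the choice of Ridge, the `alpha` grid, and five folds are assumptions, not the notebook's settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the prepared housing features.
X, y = make_regression(
    n_samples=300, n_features=10, n_informative=5, noise=10.0, random_state=0
)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipeline = Pipeline([("scaler", StandardScaler()), ("model", Ridge())])

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]        # hypothetical grid
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    pipeline,
    param_grid={"model__alpha": alphas},
    cv=cv,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)

best_alpha = search.best_params_["model__alpha"]
best_rmse = -search.best_score_                # scores are negated RMSE
```

Swapping `Ridge()` for `Lasso()`, `ElasticNet()`, or a random forest only requires changing the `"model"` step and the parameter grid.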
Model Evaluation: Assessing the performance of models using appropriate metrics.
- RMSE, MAE, MAD, R^2
- Feature importance and permutation importance to evaluate feature relevance.
Deployment: Submitting the model's predictions to the Kaggle competition.
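The metrics and importances listed above could be computed as follows. This is a sketch on synthetic data with a random forest. "MAD" is interpreted here as the median absolute error, which is an assumption, since the README does not define it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    median_absolute_error,
    r2_score,
)
from sklearn.model_selection import train_test_split

# Synthetic data: feature 0 dominates, feature 1 is weaker, 2 and 3 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = 10 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 1, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, pred))
mae = mean_absolute_error(y_test, pred)
mad = median_absolute_error(y_test, pred)  # ASSUMPTION: MAD = median abs. error
r2 = r2_score(y_test, pred)

# Impurity-based importance from the forest itself.
impurity_importance = model.feature_importances_

# Permutation importance: drop in score when one feature is shuffled.
perm = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
top_feature = int(np.argmax(perm.importances_mean))
```

Computing permutation importance on held-out data gives a view of feature relevance that does not rely on the training fit alone.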

RomaBugiel/ML----House_Pricing_Regression

Folders and files

Latest commit

History

Repository files navigation

House Pricing Dataset Analysis

Data Science Workflow

Technical - General

Programming Languages and libiraries:

Environments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages