This repository explores various machine learning strategies for the Kaggle competition - "House Prices: Advanced Regression Techniques".

PierreExeter/kaggle-house-prices

Introduction

The 7 steps of machine learning, as described here, are:

  1. Data collection

When participating in a Kaggle competition, this step is already completed for you.

  2. Data preparation

Deal with missing values, outliers, duplicates and categorical data. Features need to be normalized, selected and/or engineered, and errors and bias need to be removed. Exploratory data analysis and visualisation can help to identify patterns.

  3. Select a model

The model chosen depends on the data. A more complex model is not always a better model.

  4. Train the model

Fit the model to detect patterns in the training data.

  5. Evaluate the model

Use a validation dataset to assess how well a trained model performs on unseen data.

  6. Hyperparameter tuning

Tune the hyperparameters to improve performance on the validation dataset.

  7. Generate predictions
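
To make the later steps above (select, train, evaluate, tune, predict) more concrete, here is a minimal sketch using only the Python standard library. It fits a one-feature ridge regression on the logarithm of the price, picks the penalty on a held-out validation split, and then predicts for new houses. The synthetic data, the helper names (fit_ridge, predict, rmse) and the candidate penalties are all assumptions made for illustration; this is not the approach used in this repository.

```python
import math
import random


def fit_ridge(x, y, alpha):
    """Closed-form ridge fit of y ~ intercept + slope * x (intercept unpenalised)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / (sxx + alpha)
    return mean_y - slope * mean_x, slope


def predict(model, x):
    intercept, slope = model
    return [intercept + slope * xi for xi in x]


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Synthetic stand-in for the competition data (assumption): living area in
# square feet and a noisy log-price that grows with it.
rng = random.Random(0)
areas = [rng.uniform(600, 3500) for _ in range(200)]
log_prices = [11.0 + 0.0004 * a + rng.gauss(0, 0.15) for a in areas]

# Hold out the last 20% as a validation set (the rows are already in random order).
cut = int(0.8 * len(areas))
x_train, x_val = areas[:cut], areas[cut:]
y_train, y_val = log_prices[:cut], log_prices[cut:]

# Train, evaluate and tune: one fit per candidate penalty, scored on validation data.
scores = {}
for alpha in (0.0, 1.0, 100.0, 1e6, 1e9):
    model = fit_ridge(x_train, y_train, alpha)
    scores[alpha] = rmse(y_val, predict(model, x_val))
best_alpha = min(scores, key=scores.get)

# Generate predictions: refit on all rows, then convert log-prices back to dollars.
final_model = fit_ridge(areas, log_prices, best_alpha)
new_areas = [1000.0, 2000.0]
predicted_prices = [math.exp(p) for p in predict(final_model, new_areas)]
```

Because the model is fitted on log-prices, the validation RMSE computed here is on the same scale as the competition metric. In practice a library such as scikit-learn would likely replace the hand-written helpers.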

The competition

Goal

It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable.
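
A submission therefore pairs each test-set Id with one predicted SalePrice. The sketch below writes such a file with the standard csv module. The Id values and prices are made up, write_submission is a hypothetical helper, and the two-column Id,SalePrice layout is assumed from the sentence above, so it should be checked against the competition's sample submission.

```python
import csv
import io


def write_submission(ids, predictions, out):
    """Write one 'Id,SalePrice' row per test-set house to a text stream."""
    if len(ids) != len(predictions):
        raise ValueError("need exactly one prediction per Id")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "SalePrice"])
    for house_id, price in zip(ids, predictions):
        writer.writerow([house_id, f"{price:.2f}"])


# Made-up Ids and predictions, written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_submission([1461, 1462], [169000.0, 187500.5], buffer)
submission_text = buffer.getvalue()
```

To write a real file, pass an open file object (opened with newline="") instead of the in-memory buffer.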

Metric

Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)
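
The metric can be sketched in a few lines of standard-library Python. The function name rmsle and the example prices are assumptions for illustration; the computation follows the definition above, the RMSE between the logarithm of the prediction and the logarithm of the observed price.

```python
import math


def rmsle(actual, predicted):
    """RMSE between log(predicted) and log(actual); all prices must be > 0."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(math.log(p) - math.log(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Overshooting by 10% costs the same for a cheap house as for an expensive one.
cheap = rmsle([100_000], [110_000])
expensive = rmsle([1_000_000], [1_100_000])
```

Both values equal log(1.1), which is the point of the parenthetical remark: the score depends on the ratio between predicted and observed price, not on the absolute dollar difference.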

Features and target variable

  • SalePrice: The property's sale price in dollars. This is the target variable that you're trying to predict.

  • MSSubClass: The building class

  • MSZoning: The general zoning classification

  • LotFrontage: Linear feet of street connected to property

  • LotArea: Lot size in square feet

  • Street: Type of road access

  • Alley: Type of alley access

  • LotShape: General shape of property

  • LandContour: Flatness of the property

  • Utilities: Type of utilities available

  • LotConfig: Lot configuration

  • LandSlope: Slope of property

  • Neighborhood: Physical locations within Ames city limits

  • Condition1: Proximity to main road or railroad

  • Condition2: Proximity to main road or railroad (if a second is present)

  • BldgType: Type of dwelling

  • HouseStyle: Style of dwelling

  • OverallQual: Overall material and finish quality

  • OverallCond: Overall condition rating

  • YearBuilt: Original construction date

  • YearRemodAdd: Remodel date

  • RoofStyle: Type of roof

  • RoofMatl: Roof material

  • Exterior1st: Exterior covering on house

  • Exterior2nd: Exterior covering on house (if more than one material)

  • MasVnrType: Masonry veneer type

  • MasVnrArea: Masonry veneer area in square feet

  • ExterQual: Exterior material quality

  • ExterCond: Present condition of the material on the exterior

  • Foundation: Type of foundation

  • BsmtQual: Height of the basement

  • BsmtCond: General condition of the basement

  • BsmtExposure: Walkout or garden level basement walls

  • BsmtFinType1: Quality of basement finished area

  • BsmtFinSF1: Type 1 finished square feet

  • BsmtFinType2: Quality of second finished area (if present)

  • BsmtFinSF2: Type 2 finished square feet

  • BsmtUnfSF: Unfinished square feet of basement area

  • TotalBsmtSF: Total square feet of basement area

  • Heating: Type of heating

  • HeatingQC: Heating quality and condition

  • CentralAir: Central air conditioning

  • Electrical: Electrical system

  • 1stFlrSF: First floor square feet

  • 2ndFlrSF: Second floor square feet

  • LowQualFinSF: Low quality finished square feet (all floors)

  • GrLivArea: Above grade (ground) living area square feet

  • BsmtFullBath: Basement full bathrooms

  • BsmtHalfBath: Basement half bathrooms

  • FullBath: Full bathrooms above grade

  • HalfBath: Half baths above grade

  • BedroomAbvGr: Number of bedrooms above basement level

  • KitchenAbvGr: Number of kitchens

  • KitchenQual: Kitchen quality

  • TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)

  • Functional: Home functionality rating

  • Fireplaces: Number of fireplaces

  • FireplaceQu: Fireplace quality

  • GarageType: Garage location

  • GarageYrBlt: Year garage was built

  • GarageFinish: Interior finish of the garage

  • GarageCars: Size of garage in car capacity

  • GarageArea: Size of garage in square feet

  • GarageQual: Garage quality

  • GarageCond: Garage condition

  • PavedDrive: Paved driveway

  • WoodDeckSF: Wood deck area in square feet

  • OpenPorchSF: Open porch area in square feet

  • EnclosedPorch: Enclosed porch area in square feet

  • 3SsnPorch: Three season porch area in square feet

  • ScreenPorch: Screen porch area in square feet

  • PoolArea: Pool area in square feet

  • PoolQC: Pool quality

  • Fence: Fence quality

  • MiscFeature: Miscellaneous feature not covered in other categories

  • MiscVal: Dollar value of miscellaneous feature

  • MoSold: Month Sold

  • YrSold: Year Sold

  • SaleType: Type of sale

  • SaleCondition: Condition of sale
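
The list above mixes numeric features (for example LotFrontage) with categorical ones (for example Street and Alley), so data preparation usually treats the two groups differently. A minimal sketch, using only the standard library and three made-up rows: missing numeric values are filled with the column median, and each categorical value becomes a 0/1 indicator. The helper name prepare, the sample values, and the choice to encode a missing category as its own "Missing" level are assumptions for illustration.

```python
from statistics import median

# Three made-up rows; None marks a missing value.
rows = [
    {"LotFrontage": 65.0, "Street": "Pave", "Alley": None},
    {"LotFrontage": None, "Street": "Pave", "Alley": "Grvl"},
    {"LotFrontage": 80.0, "Street": "Grvl", "Alley": None},
]
numeric_cols = ["LotFrontage"]
categorical_cols = ["Street", "Alley"]


def prepare(rows, numeric_cols, categorical_cols):
    """Impute numeric columns with the median and one-hot encode categoricals."""
    medians = {
        col: median(r[col] for r in rows if r[col] is not None)
        for col in numeric_cols
    }
    levels = {
        col: sorted({r[col] or "Missing" for r in rows})
        for col in categorical_cols
    }
    prepared = []
    for r in rows:
        out = {}
        for col in numeric_cols:
            out[col] = medians[col] if r[col] is None else r[col]
        for col in categorical_cols:
            value = r[col] or "Missing"
            for level in levels[col]:
                out[f"{col}_{level}"] = 1 if value == level else 0
        prepared.append(out)
    return prepared


prepared = prepare(rows, numeric_cols, categorical_cols)
```

On real data the medians and category levels would normally be learned from the training set only and then reused for the test set, so that both end up with the same columns.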
