In [2]:
!pip install imblearn



<span style="color:#008abc"><b>Problem Statement</b></span>

A US-based housing company named Surprise Housing has decided to enter the Australian market. The company uses data analytics to purchase houses at a price below their actual values and flip them at a higher price. For the same purpose, the company has collected a data set from the sale of houses in Australia. The data is provided in the CSV file below.

The company is looking at prospective properties to buy to enter the market. Build a regression model using regularisation to predict the actual value of the prospective properties and decide whether to invest in them.

The company wants to know:

- Which variables are significant in predicting the price of a house, and

- How well those variables describe the price of a house.

Also, determine the optimal value of lambda for ridge and lasso regression.
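
For reference, here is a sketch of the two objectives being tuned, written in the form that scikit-learn documents for `Ridge` and `Lasso`, where the regularisation strength lambda is exposed as the `alpha` parameter. The symbols ($X$ for the feature matrix, $y$ for the target, $w$ for the coefficients, $n$ for the number of samples) are notation introduced here, not taken from the problem statement, and the intercept is omitted for brevity.

$$
\text{Ridge:}\quad \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

$$
\text{Lasso:}\quad \min_{w}\ \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

The L1 penalty in lasso can set coefficients exactly to zero, whereas the L2 penalty in ridge only shrinks them toward zero.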

<span style="color:#008abc"><b>Business Goal</b></span>

- Model the price of houses with the available independent variables. This model will then be used by the management to understand how exactly the prices vary with the variables.
- They can accordingly manipulate the strategy of the firm and concentrate on areas that will yield high returns. 
- Further, the model will be a good way for the management to understand the pricing dynamics of a new market.

## <span style="color:#008abc"><b> Data Preparation</b></span>

In [1]:
# Import the required libraries

# Analysis and computation
import numpy as np
import pandas as pd

# Plotting
import matplotlib.pyplot as plt
import seaborn as sns
params = {'legend.fontsize': 'x-large',
          'figure.figsize': (10,8),
         'axes.labelsize': 'x-large',
         'axes.labelcolor': '#008abc',
         'axes.titlesize':'15',
         'text.color':'green',
         'axes.titlepad': 35,
         'xtick.labelsize':'small',
         'ytick.labelsize':'small'}
plt.rcParams.update(params)

# Model building & evaluation
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
from sklearn.feature_selection import RFE
import statsmodels.api as sm 

# Ignore the warnings
import warnings
warnings.filterwarnings('ignore')

# Autocomplete in cell
%config IPCompleter.greedy=True


> ### <span style="color:#008abc">Read the data</span>

In [None]:
housing_df = pd.read_csv('train.csv')

In [None]:
housing_df.head()


> ### <span style="color:#008abc">Inspect the data</span>

In [None]:
housing_df.shape

In [None]:
housing_df['Id'].nunique()

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> No duplicate values found</span>
</div>

In [None]:
housing_df.info()

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> A few columns have missing values. Let's check the percentage of values missing in each.</span>
</div>

___

## <font color='#008abc'>Data Cleaning</font> 

In [None]:
## Find Percentage of NULL values
mis_val_percent = round((100 * housing_df.isnull().sum() / len(housing_df)),2)

## Fetch columns where percentage of missing records is greater than 0
mis_cols=mis_val_percent.loc[(mis_val_percent>0)].sort_values(ascending=False)
mis_cols

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> 
        A few columns are missing close to half of their values or more. We will remove those columns, as imputing so many values might introduce bias in the dataset.
    </span>
    </div>
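
The missing-value check and threshold-based drop can be tried on a small frame first. This is a minimal sketch: the `toy` frame and its values are made up for illustration, and `isna().mean()` is used as an equivalent shortcut for the sum-divided-by-length calculation in the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values, standing in for housing_df
toy = pd.DataFrame({
    "Alley": [np.nan, np.nan, "Grvl", np.nan],
    "LotFrontage": [65.0, np.nan, 80.0, 70.0],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# isna().mean() is the fraction of missing values per column
missing_pct = (toy.isna().mean() * 100).round(2)
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)

# Drop the columns above the chosen threshold (45% in this notebook)
to_drop = missing_pct[missing_pct > 45].index
cleaned = toy.drop(columns=to_drop)

print(missing_pct)
print(list(cleaned.columns))
```

Using `drop(columns=...)` without `inplace=True` returns a new frame, which makes it easier to compare the data before and after the drop.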

In [None]:
## Drop columns where more than 45% of the data is missing
housing_df.drop(mis_cols[mis_cols>45].index,inplace=True,axis=1)
housing_df.info()

In [None]:
## Columns dropped
mis_cols[mis_cols>45].index

In [None]:
housing_df.drop('MiscVal',axis=1,inplace=True)

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> 
        As we have dropped <b>MiscFeature</b>, we can also drop <b>MiscVal</b>, as it is nothing but the value of the corresponding feature. Let's now handle the columns where less than 45% of the values are missing.
    </span>
    </div>

> ### <span style="color:#008abc">Handle missing values </span>

In [None]:
## Fetch column containing less than 45% of missing values
mis_cols[mis_cols<45]

> Let's analyze the above columns based on their datatypes and relations, to check whether each column or its rows should be retained and to decide which imputation to perform.

> ### <font color='#008abc'>Categorical variables</font>

In [None]:
## Fetch string datatypes
categorical_col = list(housing_df[mis_cols[mis_cols<45].index].select_dtypes(include='object').columns)

## Identify the top value and its frequency for each column
housing_df[categorical_col].describe()

In [None]:
## Function to plot the data and print the top value's % contribution
def check_col(col):
    from IPython.display import display, HTML
    # Pass the column by keyword; recent seaborn versions treat the first positional argument as `data`
    sns.countplot(x=housing_df[col])
    plt.title(col + ": Percentage of missing values: " + str(mis_cols[col]) + "%")
    plt.xticks(rotation='vertical')
    plt.show()
    counts = housing_df[col].value_counts()
    top_share = round(counts.max() / housing_df[col].count() * 100, 2)
    text = 'The top value is <b>%s</b> and it makes up <b>%s%%</b> of the data' % (counts.idxmax(), top_share)
    data = HTML('<div class="alert alert-block alert-info"><span style="color:black">' + text + '</span></div>')
    display(data)

In [None]:
## Columns related to Garage
Garage_missing = mis_cols[categorical_col[0:4]].index
Garage_missing

In [None]:
housing_df[Garage_missing]=housing_df[Garage_missing].replace(np.nan,'No Garage')


> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> 
        <b>NA</b> values in the 'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond' columns mean <b>No Garage</b> as per the metadata description. So we have replaced the same for the missing values.
    </span>
    </div>

---

In [None]:
## Columns related to Basement
Bsmt_missing = mis_cols[categorical_col[4:9]].index
Bsmt_missing

In [None]:
housing_df[Bsmt_missing]=housing_df[Bsmt_missing].replace(np.nan,'No Basement')

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> 
        <b>NA</b> values in the 'BsmtFinType2', 'BsmtExposure', 'BsmtFinType1', 'BsmtCond', 'BsmtQual' columns mean <b>No Basement</b> as per the metadata description. So we have replaced the same for the missing values.
    </span>
    </div>

---

In [None]:
## Masonry veneer type column - categorical_col[9]
check_col(categorical_col[9])

In [None]:
housing_df[housing_df[categorical_col[9]].isnull()]['MasVnrArea'].unique()

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> 
      As the MasVnrArea is also null, we can assume that there was no veneer type, hence we will replace the missing values with None  
    </span>
    </div>

In [None]:
housing_df[categorical_col[9]]=housing_df[categorical_col[9]].replace(np.nan,'None')

---

In [None]:
## Electrical - categorical_col[10]
check_col(categorical_col[10])

In [None]:
## SBrkr is clearly the most frequent value, so we replace the missing values with it
housing_df[categorical_col[10]]=housing_df[categorical_col[10]].replace(np.nan,'SBrkr')

---

> ### <font color='#008abc'>Numerical variables</font>

In [None]:
## Fetch numerical datatypes
int_mis_cols=housing_df[mis_cols[mis_cols<45].index].select_dtypes(include='float').columns
int_mis_cols

In [None]:
## LotFrontage
print("LotFrontage is missing %s%% of its values"%(mis_cols.LotFrontage))

In [None]:
housing_df['LotFrontage']=housing_df.groupby('Neighborhood')['LotFrontage'].transform(lambda x: x.fillna(x.median()))

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> 
     As most lots in the same neighbourhood have a similar length of street frontage, we have imputed the missing values with the median value for the neighbourhood.
    </span>
    </div>
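
The group-wise median imputation can be checked on a toy frame. This is a minimal sketch with made-up values. The final overall-median fallback is an addition that is not in the notebook, included only as a guard for a neighbourhood with no observed frontage at all.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two neighbourhoods, each with one missing frontage
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "LotFrontage": [60.0, np.nan, 80.0, 40.0, np.nan],
})

# Fill each gap with the median of its own neighbourhood
toy["LotFrontage"] = (
    toy.groupby("Neighborhood")["LotFrontage"]
       .transform(lambda s: s.fillna(s.median()))
)

# Assumed fallback: if a whole group were missing, use the overall median
toy["LotFrontage"] = toy["LotFrontage"].fillna(toy["LotFrontage"].median())

print(toy)
```

`transform` returns a result aligned to the original index, which is why it can be assigned straight back to the column.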

In [None]:
## The numeric variables are handled in a similar fashion to their categorical counterparts
housing_df[int_mis_cols[1]]=housing_df[int_mis_cols[1]].replace(np.nan,0)
housing_df[int_mis_cols[2]]=housing_df[int_mis_cols[2]].replace(np.nan,0)

In [None]:
## Find Percentage of NULL values
mis_val_percent = round((100 * housing_df.isnull().sum() / len(housing_df)),2)
mis_val_percent[mis_val_percent>0]

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> We find that all the missing values are handled. Now let's perform EDA on the dataset.</span>
</div>

## <span style="color:#008abc">Analyze the dataset: EDA</span>

In [None]:
housing_df.columns.nunique()

In [None]:
## Area columns with respect to SalePrice
plt.figure(figsize=(16,13))
plt.subplot(2,3,1)
plt.scatter(housing_df['MasVnrArea'],housing_df.SalePrice)
plt.title('MasVnrArea vs SalePrice')
plt.subplot(2,3,2)
plt.scatter(housing_df['TotalBsmtSF'],housing_df.SalePrice)
plt.title('TotalBsmtSF vs SalePrice')
plt.subplot(2,3,3)
plt.scatter(housing_df['1stFlrSF'],housing_df.SalePrice)
plt.title('1stFlrSF vs SalePrice')
plt.subplot(2,3,4)
plt.scatter(housing_df['GarageArea'],housing_df.SalePrice)
plt.title('GarageArea vs SalePrice')
plt.subplot(2,3,5)
plt.scatter(housing_df['GrLivArea'],housing_df.SalePrice)
plt.title('GrLivArea vs SalePrice')
plt.subplot(2,3,6)
plt.scatter(housing_df['LotArea'],housing_df.SalePrice)
plt.title('LotArea vs SalePrice')
plt.tight_layout()

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> 
        <ul>
            <li> We find that outliers are affecting the sale price, but most of the area variables have a linear relationship with price.</li>
        </ul>
    </span>
</div>

>### <span style="color:#008abc">Analyze the target variable</span>

In [None]:
sns.distplot(housing_df.SalePrice)

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> From the plot we can see that <b>Sale Price</b> is right-skewed: most sales cluster at lower prices, with a long tail toward expensive houses. Let's handle the skewness using a log transformation.</span>
</div>
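
The effect of the log transformation on skewness can be quantified rather than judged from the plot alone. This is a sketch on synthetic data: the log-normal sample below is an assumption chosen to mimic a price-like distribution and is not drawn from the housing data.

```python
import numpy as np
import pandas as pd

# Synthetic, price-like sample (assumed log-normal; not the real SalePrice)
rng = np.random.default_rng(0)
prices = pd.Series(rng.lognormal(mean=12.0, sigma=0.4, size=2000))

# Series.skew() is positive when the long tail is on the right
skew_raw = prices.skew()
skew_log = np.log(prices).skew()

print("skew before log:", round(skew_raw, 2))
print("skew after log: ", round(skew_log, 2))
```

On the notebook's data the same two calls on `housing_df.SalePrice`, before and after the transformation, would give the corresponding figures.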


In [None]:
housing_df.SalePrice=np.log(housing_df.SalePrice)

In [None]:
sns.distplot(housing_df.SalePrice)

## <span style="color:#008abc">Data Preparation</span>

In [None]:
## Year as number of years from current year
import datetime
curr_year=datetime.datetime.now().year
housing_df['YearBuilt'] = curr_year - housing_df['YearBuilt']
housing_df['YearRemodAdd'] = curr_year - housing_df['YearRemodAdd']
housing_df['GarageYrBlt'] = curr_year - housing_df['GarageYrBlt']
housing_df['YrSold'] = curr_year - housing_df['YrSold']

In [None]:
## Determine the integer variables which are categorical in nature.
Numerics=['int64','float64']
integer_cols=housing_df.select_dtypes(include=Numerics)
integer_cols.drop('Id',axis=1,inplace=True)
int_cols = integer_cols.nunique()
int_cols[int_cols<50]

In [None]:
## convert integer levels to categorical type
housing_df['MSSubClass'] = housing_df['MSSubClass'].astype('object')
housing_df['OverallQual'] = housing_df['OverallQual'].astype('object')
housing_df['OverallCond'] = housing_df['OverallCond'].astype('object')
housing_df['BsmtFullBath'] = housing_df['BsmtFullBath'].astype('object')
housing_df['BsmtHalfBath'] = housing_df['BsmtHalfBath'].astype('object')
housing_df['FullBath'] = housing_df['FullBath'].astype('object')
housing_df['HalfBath'] = housing_df['HalfBath'].astype('object')
housing_df['BedroomAbvGr'] = housing_df['BedroomAbvGr'].astype('object')
housing_df['KitchenAbvGr'] = housing_df['KitchenAbvGr'].astype('object')
housing_df['TotRmsAbvGrd'] = housing_df['TotRmsAbvGrd'].astype('object')
housing_df['Fireplaces'] = housing_df['Fireplaces'].astype('object')
housing_df['GarageCars'] = housing_df['GarageCars'].astype('object')

In [None]:
## Check the correlation among the numeric features
Numerics=['int64','float64']
integer_cols=housing_df.select_dtypes(include=Numerics)
int_corr=integer_cols.corr()
int_corr=int_corr.transform(lambda x : round(x,2))
plt.figure(figsize=(20,20))
sns.heatmap(int_corr,cmap = plt.cm.RdYlBu_r, annot=True,vmin = -0.00,vmax = 1)

> ### <font color='#008abc'> Outlier Handling</font>

In [None]:
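## Drop outliers for numerical columns using a widened percentile range
## (5th to 95th percentile, extended by 1.5x that range on each side)
num_col = housing_df.select_dtypes(include=Numerics).columns
num_col = num_col.drop('Id')  # Index.drop returns a new Index, so reassign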
# num_col = ['LotArea','MasVnrArea','BsmtFinSF1','BsmtFinSF2','TotalBsmtSF','1stFlrSF','GrLivArea','OpenPorchSF',
#            'EnclosedPorch','3SsnPorch',
#            'ScreenPorch' ,'PoolArea','MiscVal','SalePrice']
def drop_outliers(x):
    
    for col in num_col:
        Q1 = x[col].quantile(.05)
        Q3 = x[col].quantile(.95)
        IQR = Q3-Q1
        x =  x[(x[col] >= (Q1-(1.5*IQR))) & (x[col] <= (Q3+(1.5*IQR)))] 
    return x   

housing_df = drop_outliers(housing_df)

In [None]:
housing_df[num_col].head()

In [None]:
## Area columns with respect to SalePrice
plt.figure(figsize=(16,13))
plt.subplot(2,3,1)
plt.scatter(housing_df['MasVnrArea'],housing_df.SalePrice)
plt.title('MasVnrArea vs SalePrice')
plt.subplot(2,3,2)
plt.scatter(housing_df['TotalBsmtSF'],housing_df.SalePrice)
plt.title('TotalBsmtSF vs SalePrice')
plt.subplot(2,3,3)
plt.scatter(housing_df['1stFlrSF'],housing_df.SalePrice)
plt.title('1stFlrSF vs SalePrice')
plt.subplot(2,3,4)
plt.scatter(housing_df['GarageArea'],housing_df.SalePrice)
plt.title('GarageArea vs SalePrice')
plt.subplot(2,3,5)
plt.scatter(housing_df['GrLivArea'],housing_df.SalePrice)
plt.title('GrLivArea vs SalePrice')
plt.subplot(2,3,6)
plt.scatter(housing_df['LotArea'],housing_df.SalePrice)
plt.title('LotArea vs SalePrice')
plt.tight_layout()

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> 
        <ul>
            <li> We find that after removing the outliers, the linear relationship with price is clearer.</li>
        </ul>
    </span>
</div>
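
The percentile-based filter used in the outlier step can be sketched on a single toy column. The helper name `drop_outliers_percentile` and its parameters are hypothetical. Unlike the notebook's loop, which recomputes the percentiles after each column has been filtered, this variant computes all bounds on the original frame and applies them in one pass.

```python
import pandas as pd

def drop_outliers_percentile(df, cols, lower=0.05, upper=0.95, k=1.5):
    """Keep rows inside [p_lower - k*spread, p_upper + k*spread] for every column."""
    mask = pd.Series(True, index=df.index)
    for col in cols:
        lo, hi = df[col].quantile(lower), df[col].quantile(upper)
        spread = hi - lo
        mask &= df[col].between(lo - k * spread, hi + k * spread)
    return df[mask]

# Hypothetical column: values 1..100 plus one extreme point
toy = pd.DataFrame({"LotArea": list(range(1, 101)) + [10_000]})
filtered = drop_outliers_percentile(toy, ["LotArea"])

print(len(toy), "->", len(filtered))
```

Because the bounds start from the 5th and 95th percentiles rather than the quartiles, this filter is far more permissive than the classic 1.5 x IQR rule and removes only extreme points.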

> #### <font color='#008abc'>Create Dummy variables</font>

In [None]:
col_cat=housing_df.nunique()
binary_cols=col_cat[col_cat<3]
binary_cols.index

In [None]:
housing_df[binary_cols.index].apply(lambda x :print(x.name,x.unique()))

In [None]:
housing_df.drop(['LowQualFinSF','3SsnPorch','PoolArea'],axis=1,inplace=True)

In [None]:
housing_df['Street']=housing_df['Street'].map({'Pave': 1, 'Grvl': 0})
housing_df['Utilities']=housing_df['Utilities'].map({'AllPub': 1, 'NoSeWa': 0})
housing_df['CentralAir']=housing_df['CentralAir'].map({'Y': 1, "N": 0})

In [None]:
## One hot encoding
categorical_fields=housing_df.select_dtypes(include='object')
categorical_columns=categorical_fields.nunique().sort_values(ascending=False).index

In [None]:
dummy_vars = pd.get_dummies(housing_df[categorical_columns], drop_first=True)
dummy_vars.head()

In [None]:
housing_df = pd.concat([housing_df, dummy_vars], axis=1)
housing_df = housing_df.drop(categorical_columns, axis = 1)
housing_df.shape

In [None]:
house_price_df = housing_df.copy()

In [None]:
house_price_df.drop('Id',axis=1,inplace=True)

In [None]:
house_price_df.select_dtypes(include=Numerics).columns
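
The one-hot encoding step above can be illustrated on a toy frame. This is a minimal sketch with made-up rows. With `drop_first=True`, the first level of each category (in sorted order) becomes the implicit baseline and gets no column of its own.

```python
import pandas as pd

# Hypothetical rows standing in for housing_df
toy = pd.DataFrame({
    "GarageType": ["Attchd", "Detchd", "No Garage", "Attchd"],
    "GrLivArea": [1710, 1262, 900, 1786],
})

# Encode the categorical column, dropping the first level ("Attchd")
dummies = pd.get_dummies(toy[["GarageType"]], drop_first=True)

# Replace the original column with its dummy columns
encoded = pd.concat([toy.drop(columns=["GarageType"]), dummies], axis=1)

print(list(encoded.columns))
```

Dropping one level per feature avoids perfectly collinear dummy columns, which matters for interpreting linear-model coefficients.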

---

## <font color='#008abc'>Splitting Data into Training and Testing Sets</font>

In [None]:
from sklearn.model_selection import train_test_split

# We specify this so that the train and test data set always have the same rows, respectively
np.random.seed(0)
df_train, df_test = train_test_split(house_price_df, train_size = 0.7, test_size = 0.3, random_state = 100)

In [None]:
#### Rescaling the features

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df_train_scaled = scaler.fit_transform(df_train.values)
df_train = pd.DataFrame(df_train_scaled, index=df_train.index, columns=df_train.columns)

In [None]:
df_train.head()
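
The scaler is fitted on the training rows only and later reused for the test rows. The following sketch, on made-up numbers, shows what that implies: test values outside the training range map outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature train and test sets
train = np.array([[100.0], [200.0], [300.0]])
test = np.array([[150.0], [400.0]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # learns min=100, max=300
test_scaled = scaler.transform(test)        # reuses the training min and max

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Fitting on the training data alone keeps information about the test set out of the preprocessing step.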

## <font color='#008abc'>Model Building</font>

In [None]:
## Divide into X and Y set for model building

y_train = df_train.pop('SalePrice')
X_train = df_train

### <font color='#008abc'>Ridge Regression</font>

In [None]:
# list of alphas to tune
params = {'alpha': [0.0001, 0.001, 0.01, 0.05, 0.1, 
 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0, 3.0, 
 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 20, 50, 100, 500, 1000 ]}


ridge = Ridge()

# cross validation
folds = 5
model_cv = GridSearchCV(estimator = ridge, 
                        param_grid = params, 
                        scoring= 'neg_mean_absolute_error', 
                        cv = folds, 
                        return_train_score=True,
                        verbose = 1)            
model_cv.fit(X_train, y_train) 

In [None]:
## Print the best parameter lambda (alpha) and the best negative MAE score
print(model_cv.best_params_)
print(model_cv.best_score_)

In [None]:
cv_results = pd.DataFrame(model_cv.cv_results_)
cv_results = cv_results[cv_results['param_alpha']<=1000]
cv_results.head()

In [None]:
# plotting mean test and train scores with alpha
cv_results['param_alpha'] = cv_results['param_alpha'].astype('float32')  # an int cast would collapse every alpha below 1 to 0
plt.figure(figsize=(16,5))

# plotting
plt.plot(cv_results['param_alpha'], cv_results['mean_train_score'])
plt.plot(cv_results['param_alpha'], cv_results['mean_test_score'])
plt.xlabel('alpha')
plt.ylabel('Negative Mean Absolute Error')
plt.title("Negative Mean Absolute Error and alpha")
plt.legend(['train score', 'test score'], loc='upper right')
plt.show()

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> 
        <ul>
            <li>Based on the graph above and the grid-search result, the best alpha is <b>4</b></li>
        </ul>
    </span>
</div>
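
The alpha search can be reproduced end to end on synthetic data. This is a sketch under assumptions: `make_regression` stands in for the housing features, and the log-spaced grid is an illustrative alternative to the hand-written list used in the notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem (assumed stand-in for X_train, y_train)
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Log-spaced candidates from 1e-4 to 1e3
alphas = np.logspace(-4, 3, 8)

search = GridSearchCV(Ridge(), {"alpha": alphas},
                      scoring="neg_mean_absolute_error",
                      cv=5, return_train_score=True)
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
print("best alpha:", best_alpha)
print("best negative MAE:", search.best_score_)
```

The scores are negated errors, so values closer to zero are better and `best_score_` is never positive. When alphas span several orders of magnitude, a log-scaled x-axis keeps the small values distinguishable in the plot.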

In [None]:
alpha = 4
ridge = Ridge(alpha=alpha)
ridge.fit(X_train, y_train)
r_coeff=ridge.coef_
r_coeff[r_coeff!=0].shape

In [None]:
# Compute the R-squared value on the training data
y_train_pred = ridge.predict(X_train)
RR2=metrics.r2_score(y_train, y_train_pred)
print("Ridge R squared (train):",RR2)

In [None]:
# Plot the histogram of the error terms
fig = plt.figure()
sns.distplot((y_train - y_train_pred), bins = 20)
fig.suptitle('Error Terms', fontsize = 20)                  # Plot heading 
plt.xlabel('Errors', fontsize = 18)                         # X-label

### <font color='#008abc'>Lasso Regression</font>

In [None]:
lasso = Lasso()

# cross validation
model_cv = GridSearchCV(estimator = lasso, 
                        param_grid = params, 
                        scoring= 'neg_mean_absolute_error', 
                        cv = folds, 
                        return_train_score=True,
                        verbose = 1)            

model_cv.fit(X_train, y_train) 

In [None]:
print(model_cv.best_params_)
print(model_cv.best_score_)

In [None]:
cv_results = pd.DataFrame(model_cv.cv_results_)
cv_results.head()

In [None]:
# plotting mean test and train scores with alpha
cv_results['param_alpha'] = cv_results['param_alpha'].astype('float32')

# plotting
plt.plot(cv_results['param_alpha'], cv_results['mean_train_score'])
plt.plot(cv_results['param_alpha'], cv_results['mean_test_score'])
plt.xlabel('alpha')
plt.ylabel('Negative Mean Absolute Error')
plt.xscale('log')

plt.title("Negative Mean Absolute Error and alpha")
plt.legend(['train score', 'test score'], loc='upper left')
plt.show()

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b> 
        <ul>
            <li>Based on the graph above and the grid-search result, the best alpha is <b>0.0001</b></li>
        </ul>
    </span>
</div>
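
Counting non-zero coefficients shows how lasso differs from ridge in practice. This sketch uses synthetic data where only 5 of 30 features carry signal. The data and the `alpha=1.0` setting are assumptions for illustration and are unrelated to the alpha selected above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic problem: 30 features, only 5 of them informative
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Ridge shrinks coefficients; lasso can set them exactly to zero
n_ridge = int(np.sum(ridge.coef_ != 0))
n_lasso = int(np.sum(lasso.coef_ != 0))

print("non-zero ridge coefficients:", n_ridge)
print("non-zero lasso coefficients:", n_lasso)
```

The same `coef_ != 0` count is what the notebook computes for its fitted models.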

In [None]:
alpha = 0.0001
lasso = Lasso(alpha=alpha)      
lasso.fit(X_train, y_train) 

In [None]:
lasso_c=lasso.coef_
lasso_c[lasso_c!=0].shape

In [None]:
# Compute the R-squared value on the training data
y_train_pred = lasso.predict(X_train)
LR2=metrics.r2_score(y_true=y_train, y_pred=y_train_pred)
print("Lasso R squared(Train)",LR2)

In [None]:
fig = plt.figure()
sns.distplot((y_train - y_train_pred), bins = 20)
fig.suptitle('Error Terms', fontsize = 20)                  # Plot heading 
plt.xlabel('Errors', fontsize = 18)                         # X-label

## <font color='#008abc'>Model Evaluation</font>

In [None]:
df_test.head()

In [None]:
df_test_scaled = scaler.transform(df_test.values)
df_test = pd.DataFrame(df_test_scaled, index=df_test.index, columns=df_test.columns)

In [None]:
## Divide into X and Y sets for model evaluation

y_test = df_test.pop('SalePrice')
X_test = df_test

In [None]:
# Compute R-squared and RMSE on the test data (Lasso)
y_test_pred = lasso.predict(X_test)
LR2TS=metrics.r2_score(y_true=y_test, y_pred=y_test_pred)
from sklearn.metrics import mean_squared_error
LRMSE= np.sqrt(mean_squared_error(y_test, y_test_pred))  # square root turns MSE into RMSE

In [None]:
# Compute R-squared and RMSE on the test data (Ridge)
y_test_pred = ridge.predict(X_test)
RR2TS=metrics.r2_score(y_true=y_test, y_pred=y_test_pred)
from sklearn.metrics import mean_squared_error
RRMSE=np.sqrt(mean_squared_error(y_test, y_test_pred))  # square root turns MSE into RMSE

### <font color='#008abc'>Metrics</font>

In [None]:
print("Ridge")
print("Train R square",RR2)
print("Test R square",RR2TS)
print("RMSE",RRMSE)

print("Lasso")
print("Train R square",LR2)
print("Test R square",LR2TS)
print("RMSE",LRMSE)
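
Because `SalePrice` was log-transformed and then min-max scaled together with the features, the errors above are in scaled log units rather than currency. The sketch below shows how a scaled prediction could be mapped back to a price. The helper `to_price` and the numeric bounds are hypothetical. In the notebook, the bounds would presumably come from the fitted scaler's `data_min_` and `data_max_` entries for the `SalePrice` column.

```python
import numpy as np

def to_price(y_scaled, log_min, log_max):
    """Invert min-max scaling, then invert the natural log."""
    log_price = np.asarray(y_scaled) * (log_max - log_min) + log_min
    return np.exp(log_price)

# Assumed training bounds on log(SalePrice); illustrative values only
log_min, log_max = np.log(50_000.0), np.log(500_000.0)

y_scaled = np.array([0.0, 0.5, 1.0])
prices = to_price(y_scaled, log_min, log_max)

print(prices)
```

A scaled value of 0.5 maps to the geometric mean of the two bounds, not their arithmetic mean, because the scaling was applied on the log scale.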

In [None]:
model_param = list(ridge.coef_)
cols = list(df_train.columns)
ridge_coef = pd.DataFrame(list(zip(cols,model_param)))
ridge_coef.columns = ['Feature','Coef']
ridge_coef.sort_values(by='Coef',ascending=False).head(10)

In [None]:
model_param = list(lasso.coef_)
cols = list(df_train.columns)
lasso_coef = pd.DataFrame(list(zip(cols,model_param)))
lasso_coef.columns = ['Feature','Coef']
lasso_coef.sort_values(by='Coef',ascending=False).head(10)

In [None]:
lasso_coef.sort_values(by='Coef',ascending=True).head(10)

> <div class="alert alert-block alert-info">
    <span style="color:black"><b>Inference:</b>
        The R2 scores and RMSE are quite similar for both models. We can go with Lasso regression,
        as it performs feature elimination: even with a small value of alpha, many of the coefficients are reduced to exactly zero, leaving a smaller set of features to interpret.
        </span>
</div>


<div class="alert alert-block alert-info">
    <span style="color:black"><b>Suggestions:</b> Factors that positively affect the price of the house are
         <ul>
            <li><b>GrLivArea:</b> Above grade (ground) living area square feet</li>
            <li><b>TotalBsmtSF:</b> Total square feet of basement area</li>
            <li><b>OverallQual:</b> Rates the overall material and finish of the house. The higher the quality, the higher is the impact on the price.
                <b>10</b> - Very Excellent
                <b> 9</b> - Excellent
                <b> 8</b> - Very Good
            </li>
            <li><b>BsmtFinSF1:</b> Type 1 finished square feet</li>
            <li><b>GarageArea:</b> Size of garage in square feet</li>
            <li><b>Neighborhood:</b> Physical locations within Ames city limits. The prices are high in
                <b>Stone Brook</b> and
                <b>Crawford</b>
            </li>
            <li><b>LotArea:</b> Lot size in square feet</li>
            </ul>
        </span>
</div>


<div class="alert alert-block alert-info">
    <span style="color:black"><b>Suggestions:</b> Factors that negatively affect the price of the house (these values lead to a lower price) are:
         <ul>
            <li><b>YearBuilt:</b> Age of the house (converted above to years since construction). The older the house, the lower the price.</li>
            <li><b>Gravity furnace</b></li>
            <li><b>OverallQual:</b> Rates the overall material and finish of the house. The lower the quality, the lower the price.
            </li>
            <li><b>Average/Typical or Fair Kitchen Quality</b></li>
            <li><b>Fair Exterior Quality</b></li>
            </ul>
        </span>
</div>
