The following code performs feature engineering on the data after cleaning. The feature engineering includes:
- Naive feature engineering: sums, averages and counts of selected features
- The get_stats function from Little Boat: https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/discussion/32123

### Import Data

In [31]:
%matplotlib inline
import random
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [32]:
train = pd.read_json('Datacleaned_train.json')
test = pd.read_json('test.json')

### 1. Naive Feature Engineering

In [33]:
def naiveFE(df):
    ''' Apply naive feature engineering to the train or test data frame.
    The frame is modified in place and also returned.
    '''
    # total number of rooms and bedroom/bathroom difference
    df["sum_room"] = df["bedrooms"] + df["bathrooms"]
    df["room_diff"] = df["bedrooms"] - df["bathrooms"]
    
    # average price per room (NaNs are kept; a zero denominator with a nonzero price yields inf, not NaN)
    df["price_s"] = df["price"]/df["sum_room"]
    df["price_bed"] = df["price"]/df["bedrooms"]
    df["price_bath"] = df["price"]/df["bathrooms"] 
    
    # number of photos
    df["num_photos"] = df["photos"].apply(len)
    
    # number of features
    df["num_features"] = df["features"].apply(len)
    
    # count of words present in description column
    df["num_description_words"] = df["description"].apply(lambda x: len(x.split(" ")))
    
    # creation time; the year is always 2016, so only month and day are extracted
    df["created"] = pd.to_datetime(df["created"])
    df["created_month"] = df["created"].dt.month
    df["created_day"] = df["created"].dt.day
    
    return df
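
Two edge cases in the function above are worth checking on real data: listings with zero bedrooms or bathrooms, and empty descriptions. The sketch below uses a made-up two-row frame (the values and the names `toy`, `price_bed_nan` and `words_any` are illustrative, not from the dataset) to show what the plain division and `split(" ")` produce, next to one possible alternative for each.

```python
import numpy as np
import pandas as pd

# Toy rows, invented for illustration: a studio (0 bedrooms) with an
# empty description, and a listing with a double space in its text.
toy = pd.DataFrame({
    "price": [3000, 2400],
    "bedrooms": [0, 2],
    "description": ["", "sunny  two bedroom"],
})

# Plain division, as in naiveFE: a zero denominator gives inf, not NaN.
price_bed_raw = toy["price"] / toy["bedrooms"]

# Possible alternative: map zeros to NaN first, so the ratio is NaN.
price_bed_nan = toy["price"] / toy["bedrooms"].replace(0, np.nan)

# split(" ") keeps empty strings, so "" counts as one word and a
# double space adds an extra "word"; split() ignores both.
words_space = toy["description"].apply(lambda x: len(x.split(" ")))
words_any = toy["description"].apply(lambda x: len(x.split()))

print(price_bed_raw.tolist(), price_bed_nan.tolist())
print(words_space.tolist(), words_any.tolist())
```

Whether inf or NaN is preferable depends on the downstream model; the alternatives are shown only to make the difference visible.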

In [34]:
train_df = naiveFE(train)
test_df = naiveFE(test)
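
Because naiveFE assigns columns directly on the frame it receives, `train_df` and `train` refer to the same object after this cell. A minimal sketch of that behaviour, using a trimmed-down stand-in function (`add_sum_room` is hypothetical, not part of the notebook) and invented values:

```python
import pandas as pd

def add_sum_room(df):
    # Stand-in for naiveFE: adds a column to the frame passed in.
    df["sum_room"] = df["bedrooms"] + df["bathrooms"]
    return df

# Case 1: pass the frame itself; the original gains the new column.
raw = pd.DataFrame({"bedrooms": [1, 2], "bathrooms": [1.0, 1.5]})
result = add_sum_room(raw)
same_object = result is raw
raw_has_column = "sum_room" in raw.columns

# Case 2: pass a copy; the original stays untouched.
raw2 = pd.DataFrame({"bedrooms": [1, 2], "bathrooms": [1.0, 1.5]})
result2 = add_sum_room(raw2.copy())
raw2_has_column = "sum_room" in raw2.columns

print(same_object, raw_has_column, raw2_has_column)
```

If the unmodified `train` and `test` frames are needed later, passing `train.copy()` and `test.copy()` is one way to keep them.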

In [35]:
# columns that are no longer needed after feature extraction
drop_cols = ['building_id', 'created', 'listing_id', 'description', 'display_address',
             'features', 'manager_id', 'photos', 'street_address']
train_df = train_df.drop(drop_cols, axis=1)
test_df = test_df.drop(drop_cols, axis=1)

In [36]:
train_df.to_json('Datacleaned_FE0_train_withnan.json')
test_df.to_json('FE0_test_withnan.json')
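
The files are saved with their NaN values left in. As a quick sanity check, the sketch below writes a made-up frame to a temporary file and reads it back, to confirm that a NaN survives the JSON round trip (it is stored as `null`). The frame, the file name `toy_withnan.json` and the variable names are illustrative assumptions.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Invented two-row frame with one missing ratio.
toy = pd.DataFrame({"price": [3000, 2400], "price_bed": [1500.0, np.nan]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_withnan.json")
    toy.to_json(path)               # NaN is written as null
    restored = pd.read_json(path)   # null is read back as NaN

print(restored["price_bed"].isna().tolist())
```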