# Polynomial Regression: Office Prices

https://www.hackerrank.com/challenges/predicting-office-space-price/problem

## Input Data

The first line contains F and N, separated by a space.

F = number of features.

N = number of training samples.

Each of the next N lines is a training row: F feature values separated by single spaces, followed by the office price label.

The training set is followed by a line containing T.

T = number of test samples.

Each of the next T lines contains F feature values but no office price label.
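A minimal sketch of a parser for this format follows. It assumes whitespace-separated fields, and the name `parse_office_prices` is hypothetical. It reads T from its own line instead of inferring it, so it also handles F = 1, where a test row and the T line both hold a single value. The three training rows come from the sample input shown below; the two test rows are made up.

```python
def parse_office_prices(lines):
    """Split the problem input into training features, labels and test features.

    `lines` is any iterable of text lines (e.g. fh.readlines() or sys.stdin).
    """
    rows = [line.split() for line in lines if line.strip()]  # skip blank lines
    f, n = int(rows[0][0]), int(rows[0][1])

    train_x, train_y = [], []
    for row in rows[1:1 + n]:
        values = [float(v) for v in row]
        train_x.append(values[:f])   # the F feature values
        train_y.append(values[f])    # the trailing office price label

    t = int(rows[1 + n][0])          # number of test rows, on its own line
    test_x = [[float(v) for v in row[:f]] for row in rows[2 + n:2 + n + t]]
    return train_x, train_y, test_x


# Tiny input: F=2, N=3 (rows from the sample file), T=2 (made-up test rows)
sample = """2 3
0.44 0.68 511.14
0.99 0.23 717.1
0.84 0.29 607.91
2
0.05 0.54
0.91 0.91
""".splitlines()

train_x, train_y, test_x = parse_office_prices(sample)
print(train_x)  # [[0.44, 0.68], [0.99, 0.23], [0.84, 0.29]]
print(train_y)  # [511.14, 717.1, 607.91]
print(test_x)   # [[0.05, 0.54], [0.91, 0.91]]
```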

```python
# sample input
with open('Office_Prices_input03.txt', 'r') as fh:
    all_lines = fh.readlines()
all_lines[0:20]
```

```
['2 100\n',
 '0.44 0.68 511.14\n',
 '0.99 0.23 717.1\n',
 '0.84 0.29 607.91\n',
 '0.28 0.45 270.4\n',
 '0.07 0.83 289.88\n',
 '0.66 0.8 830.85\n',
 '0.73 0.92 1038.09\n',
 '0.57 0.43 455.19\n',
 '0.43 0.89 640.17\n',
 '0.27 0.95 511.06\n',
 '0.43 0.06 177.03\n',
 '0.87 0.91 1242.52\n',
 '0.78 0.69 891.37\n',
 '0.9 0.94 1339.72\n',
 '0.41 0.06 169.88\n',
 '0.52 0.17 276.05\n',
 '0.47 0.66 517.43\n',
 '0.65 0.43 522.25\n',
 '0.85 0.64 932.21\n']
```

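For polynomial regression, each feature vector is first expanded into polynomial terms: a constant plus every product of features up to a chosen degree. A polynomial relationship in the original features then becomes linear in these new features. The polynomial degree is always < 4, so a degree-3 expansion (which also contains all lower-degree terms) covers every case.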
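To see what the expansion produces, here is a minimal pure-Python sketch (the helper `poly_expand` is hypothetical, not part of scikit-learn). For a row `[a, b]` and degree 2 it returns `[1, a, b, a^2, ab, b^2]`, the column order the scikit-learn documentation gives for `PolynomialFeatures`. For higher degrees the ordering should likewise match its default, but treat that as an assumption.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def poly_expand(row, degree):
    """Return every monomial of the features in `row` up to `degree`.

    Terms are grouped by total degree (constant first); within a degree they
    follow itertools' combinations-with-replacement order of feature indices.
    """
    terms = []
    for d in range(degree + 1):
        for idx in combinations_with_replacement(range(len(row)), d):
            terms.append(prod(row[i] for i in idx))  # empty product is 1
    return terms


print(poly_expand([2, 3], 2))       # [1, 2, 3, 4, 6, 9]
print(len(poly_expand([2, 3], 3)))  # 10
print(comb(2 + 3, 3))               # 10, i.e. C(F + d, d) for F=2, d=3
```

The number of columns grows as C(F + d, d), so even with five features a degree-3 expansion stays small (56 columns).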

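A linear regression model is then fitted to the expanded features. The model is linear in its weights, even though it is a polynomial in the original features.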
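For the sample input (F = 2) and degree 3, writing $x_1, x_2$ for the two features, the expanded model is a weighted sum of ten terms. It is cubic in the features but linear in the weights $w_0, \dots, w_9$, which is why ordinary least squares can fit it:

$$
\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_1 x_2 + w_5 x_2^2 + w_6 x_1^3 + w_7 x_1^2 x_2 + w_8 x_1 x_2^2 + w_9 x_2^3
$$

$$
\min_{w_0, \dots, w_9} \; \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
$$

In general, a degree-$d$ expansion of $F$ features has $\binom{F+d}{d}$ columns including the constant term; for $F = 2$, $d = 3$ that is $\binom{5}{3} = 10$.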

The test features must be expanded with the same transformation before predicting, so that each test row has the same columns, in the same order, that the model was trained on.
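The whole pipeline (expand, fit, expand the test rows the same way, predict) can be sketched without scikit-learn. This is a hypothetical pure-Python illustration, not the notebook's method: it swaps `LinearRegression` for an explicit solve of the normal equations $Z^\top Z w = Z^\top y$. That is fine for tiny examples but less numerically robust than a dedicated least-squares solver. The data are synthetic, generated from $y = 1 + 2a - 3b + 0.5ab$, so the fitted weights should recover those coefficients.

```python
from itertools import combinations_with_replacement
from math import prod


def poly_expand(row, degree):
    # Constant term, then all monomials of total degree 1..degree
    return [prod(row[i] for i in idx)
            for d in range(degree + 1)
            for idx in combinations_with_replacement(range(len(row)), d)]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(train_x, train_y, degree):
    # Least squares on the expanded features via the normal equations
    z = [poly_expand(row, degree) for row in train_x]
    k = len(z[0])
    ztz = [[sum(r[i] * r[j] for r in z) for j in range(k)] for i in range(k)]
    zty = [sum(r[i] * y for r, y in zip(z, train_y)) for i in range(k)]
    return solve(ztz, zty)


def predict(weights, test_x, degree):
    # Test rows go through the *same* expansion as the training rows
    return [sum(w * t for w, t in zip(weights, poly_expand(row, degree)))
            for row in test_x]


# Synthetic data (not the HackerRank set): y = 1 + 2a - 3b + 0.5ab on a 4x4 grid
train_x = [[a, b] for a in range(4) for b in range(4)]
train_y = [1 + 2 * a - 3 * b + 0.5 * a * b for a, b in train_x]

w = fit(train_x, train_y, degree=2)
print([round(v, 6) for v in w])            # close to [1, 2, -3, 0, 0.5, 0]
print(predict(w, [[1.5, 0.5]], degree=2))  # close to [2.875]
```

The point made above lives in `predict`: each test row passes through the same `poly_expand` call as the training rows, so its columns line up with the fitted weights.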

```python
# HackerRank supplies the input on STDIN (all_lines = sys.stdin.readlines());
# in this notebook, all_lines holds the sample file loaded above.

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Split every non-empty line into whitespace-separated fields
rows = [line.split() for line in all_lines if line.strip()]

# First line: F (number of features) and N (number of training samples)
F_features, N_samples = int(rows[0][0]), int(rows[0][1])

# Next N lines: F feature values followed by the office price label
training_feature = []
training_label = []
for row in rows[1:1 + N_samples]:
    values = [float(x) for x in row]
    training_feature.append(values[:F_features])
    training_label.append(values[F_features])

# Then one line with T, followed by T test rows of F features each (no label).
# Reading T explicitly (instead of skipping short lines) also works when F == 1.
T_tests = int(rows[1 + N_samples][0])
testing_feature = [
    [float(x) for x in row[:F_features]]
    for row in rows[2 + N_samples:2 + N_samples + T_tests]
]

# Expand the training features into all polynomial terms up to degree 3
poly = PolynomialFeatures(degree=3)
processed_training_feature = poly.fit_transform(training_feature)

# Fit an ordinary least-squares linear model on the expanded features
linear = LinearRegression()
linear.fit(processed_training_feature, training_label)

# Expand the test features with the same, already fitted transformer
testing_processed = poly.transform(testing_feature)

# Predict and print one office price per test row
prediction = linear.predict(testing_processed)
for pred in prediction:
    print(pred)
```

```
180.3768244254237
1312.0650596555415
440.12925329791335
343.7153800758425
```
