# **Introduction**
In this notebook, we will execute phase2 of the ICS474 project. 

### **Table of Contents**
- Finding a "Big Dataset" - 1.14GB - ADULTS DATASET.
- Detailed description of the dataset.
- Importing libraries and setting up PySpark.
- Preprocessing the dataset.
   - 4.1 Exploring the data.
   - 4.2 Preprocessing for classification task.
   - 4.3 Preprocessing for regression task. 
- Splitting the data into training and testing.
- Building a classification & a regression model and testing/reporting the results.
  - Classification task.
  - Regression task.
- Using the models.
  - Classification prediction.
  - Regression prediction.

<br></br>
Team members (Group3):
 - Amaan Izhar (201781130/Section02)
 - Farhan M. Abdul Qadir (201771950/Section02)
 - AbdulJawad Mohammad (201744310/Section03) 

### **Finding a "Big Dataset" - 1.14GB - ADULTS DATASET.**
We found the dataset on Kaggle. Its size is 1.14GB. 

Link: https://www.kaggle.com/brijeshbmehta/adult-datasets?select=adult10m


### **Detailed description of the dataset.**
   - Dataset Task: Predict whether the income of each adult exceeds $50K per year based on census data.
   - List of Attributes:
       1. Continuous Features:
           - <b>age</b>
           - <b>fnlwgt(Final Weight)</b>
           - <b>education-num</b>
           - <b>capital-gain</b>
           - <b>capital-loss</b>
           - <b>hourse-per-week</b>
       2. Categorical Features:
           - <b>workclass</b>: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
fnlwgt: continuous
           - <b>education</b>: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool
           - <b>marital-status</b>: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
           - <b>occupation</b>: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
           - <b>relationship</b>: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried
           - <b>race</b>: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black
           - <b>sex</b>: Female, Male
           - <b>native-country</b>: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands
           - <b>SalaryClass</b>: <=50k OR >50k

### **Importing libraries and setting up PySpark.**

In [1]:
import findspark
findspark.init()
print(f'Spark dependency found -> {findspark.find()}')

Spark dependency found -> C:\Spark\spark-3.0.3-bin-hadoop2.7


In [32]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import isnan, when, count, col, countDistinct
from pyspark.ml.feature import VectorAssembler, StringIndexer, OneHotEncoder, StandardScaler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.regression import LinearRegression

In [3]:
# Setting up spark.
spark = SparkSession.builder.appName('ICS474Project').getOrCreate()

### **Preprocessing the dataset.**

In [73]:
df = spark.read.csv(path='adult10m', sep=',', inferSchema=False)

In [74]:
columns = ['a', 'wc', 'fw', 'ed', 'edN', 
           'ms', 'oc', 'rel', 'race', 'sex', 
           'cg', 'cl', 'hpw', 'nc', 'salaryClass']

df = df.toDF(*columns)
df.show(5, truncate=False)

+---+----------------+------+---------+---+------------------+-----------------+-------------+-----+------+----+---+---+-------------+-----------+
|a  |wc              |fw    |ed       |edN|ms                |oc               |rel          |race |sex   |cg  |cl |hpw|nc           |salaryClass|
+---+----------------+------+---------+---+------------------+-----------------+-------------+-----+------+----+---+---+-------------+-----------+
|39 |State-gov       |77516 |Bachelors|13 |Never-married     |Adm-clerical     |Not-in-family|White|Male  |2174|0  |40 |United-States|<=50K      |
|50 |Self-emp-not-inc|83311 |Bachelors|13 |Married-civ-spouse|Exec-managerial  |Husband      |White|Male  |0   |0  |13 |United-States|<=50K      |
|38 |Private         |215646|HS-grad  |9  |Divorced          |Handlers-cleaners|Not-in-family|White|Male  |0   |0  |40 |United-States|<=50K      |
|53 |Private         |234721|11th     |7  |Married-civ-spouse|Handlers-cleaners|Husband      |Black|Male  |0   |0  |40

#### Exploring the data.

In [35]:
print(f'df shape = {df.count()}, {len(df.columns)}')

df shape = 10000000, 15


In [37]:
df.select([count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in df.columns]).show()

+---+---+---+---+---+---+---+---+----+---+---+---+---+---+-----------+
|  a| wc| fw| ed|edN| ms| oc|rel|race|sex| cg| cl|hpw| nc|salaryClass|
+---+---+---+---+---+---+---+---+----+---+---+---+---+---+-----------+
|  0|  0|  0|  0|  0|  0|  0|  0|   0|  0|  0|  0|  0|  0|          0|
+---+---+---+---+---+---+---+---+----+---+---+---+---+---+-----------+



In [17]:
df.select(countDistinct("a").alias("as"),
          countDistinct("wc").alias("wcs"), 
          countDistinct('edN').alias("edNs"), 
          countDistinct('ms').alias("mss"),
          countDistinct('oc').alias("ocs"),
          countDistinct('rel').alias("rels"), 
          countDistinct('race').alias("races"), 
          countDistinct('sex').alias("sexes"), 
          countDistinct('nc').alias("ncs"), 
          countDistinct('salaryClass').alias("salaryClasses")
          ).show()

+---+---+----+---+---+----+-----+-----+---+-------------+
| as|wcs|edNs|mss|ocs|rels|races|sexes|ncs|salaryClasses|
+---+---+----+---+---+----+-----+-----+---+-------------+
| 73|  9|  16|  7| 15|   6|    5|    2| 42|            2|
+---+---+----+---+---+----+-----+-----+---+-------------+



In [10]:
df.groupBy(df['wc']).count().show()
df.groupBy(df['edN']).count().show()
df.groupBy(df['ms']).count().show()
df.groupBy(df['oc']).count().show()
df.groupBy(df['rel']).count().show()
df.groupBy(df['race']).count().show()
df.groupBy(df['sex']).count().show()
df.groupBy(df['nc']).count().show(30)
df.groupBy(df['salaryClass']).count().show()

+----------------+-------+
|              wc|  count|
+----------------+-------+
|Self-emp-not-inc|1110296|
|       Local-gov|1110045|
|       State-gov|1108295|
|         Private|1129254|
|     Without-pay|1106759|
|     Federal-gov|1108964|
|    Never-worked|1106779|
|               ?|1110548|
|    Self-emp-inc|1109060|
+----------------+-------+

+---+------+
|edN| count|
+---+------+
|  7|623919|
| 15|624516|
| 11|624818|
|  3|621756|
|  8|623454|
| 16|623397|
|  5|623844|
|  6|623061|
|  9|633268|
|  1|622270|
| 10|630730|
|  4|623699|
| 12|623620|
| 13|629065|
| 14|624905|
|  2|623678|
+---+------+

+--------------------+-------+
|                  ms|  count|
+--------------------+-------+
|           Separated|1423556|
|       Never-married|1433420|
|Married-spouse-ab...|1426068|
|            Divorced|1428751|
|             Widowed|1426311|
|   Married-AF-spouse|1423834|
|  Married-civ-spouse|1438060|
+--------------------+-------+

+-----------------+------+
|               oc

In [36]:
df_new = df.select(col('a').cast('double'), col('fw').cast('double'), 
                      col('cg').cast('double'), col('cl').cast('double'),
                      col('hpw').cast('double'), 'wc', 'edN', 'oc', 'salaryClass')
df_new.show(5)

+----+--------+------+---+----+----------------+---+-----------------+-----------+
|   a|      fw|    cg| cl| hpw|              wc|edN|               oc|salaryClass|
+----+--------+------+---+----+----------------+---+-----------------+-----------+
|39.0| 77516.0|2174.0|0.0|40.0|       State-gov| 13|     Adm-clerical|      <=50K|
|50.0| 83311.0|   0.0|0.0|13.0|Self-emp-not-inc| 13|  Exec-managerial|      <=50K|
|38.0|215646.0|   0.0|0.0|40.0|         Private|  9|Handlers-cleaners|      <=50K|
|53.0|234721.0|   0.0|0.0|40.0|         Private|  7|Handlers-cleaners|      <=50K|
|28.0|338409.0|   0.0|0.0|40.0|         Private| 13|   Prof-specialty|      <=50K|
+----+--------+------+---+----+----------------+---+-----------------+-----------+
only showing top 5 rows



#### Preprocessing for classification task.

In [75]:
# Indexing categorical features.
indexer = StringIndexer(inputCols=['wc', 'edN', 'oc', 'salaryClass'], outputCols=['wc_si', 'edN_si', 'oc_si', 'salaryClass_si']) 
df_indexed = indexer.fit(df_new).transform(df_new) 
df_indexed = df_indexed.drop(*['wc', 'edN', 'oc', 'salaryClass'])
df_indexed.show(5)

+----+--------+------+---+----+-----+------+-----+--------------+
|   a|      fw|    cg| cl| hpw|wc_si|edN_si|oc_si|salaryClass_si|
+----+--------+------+---+----+-----+------+-----+--------------+
|39.0| 77516.0|2174.0|0.0|40.0|  6.0|   2.0|  3.0|           0.0|
|50.0| 83311.0|   0.0|0.0|13.0|  2.0|   2.0|  2.0|           0.0|
|38.0|215646.0|   0.0|0.0|40.0|  0.0|   0.0|  8.0|           0.0|
|53.0|234721.0|   0.0|0.0|40.0|  0.0|   6.0|  8.0|           0.0|
|28.0|338409.0|   0.0|0.0|40.0|  0.0|   2.0|  4.0|           0.0|
+----+--------+------+---+----+-----+------+-----+--------------+
only showing top 5 rows



In [76]:
# One hot encoding the categorical features.
onehot = OneHotEncoder(inputCols=['wc_si', 'edN_si', 'oc_si'], outputCols=['wc_ohe', 'edN_ohe', 'oc_ohe'])
df_ohe = onehot.fit(df_indexed).transform(df_indexed)
df_ohe.show(5, truncate=False)

+----+--------+------+---+----+-----+------+-----+--------------+-------------+--------------+--------------+
|a   |fw      |cg    |cl |hpw |wc_si|edN_si|oc_si|salaryClass_si|wc_ohe       |edN_ohe       |oc_ohe        |
+----+--------+------+---+----+-----+------+-----+--------------+-------------+--------------+--------------+
|39.0|77516.0 |2174.0|0.0|40.0|6.0  |2.0   |3.0  |0.0           |(8,[6],[1.0])|(15,[2],[1.0])|(14,[3],[1.0])|
|50.0|83311.0 |0.0   |0.0|13.0|2.0  |2.0   |2.0  |0.0           |(8,[2],[1.0])|(15,[2],[1.0])|(14,[2],[1.0])|
|38.0|215646.0|0.0   |0.0|40.0|0.0  |0.0   |8.0  |0.0           |(8,[0],[1.0])|(15,[0],[1.0])|(14,[8],[1.0])|
|53.0|234721.0|0.0   |0.0|40.0|0.0  |6.0   |8.0  |0.0           |(8,[0],[1.0])|(15,[6],[1.0])|(14,[8],[1.0])|
|28.0|338409.0|0.0   |0.0|40.0|0.0  |2.0   |4.0  |0.0           |(8,[0],[1.0])|(15,[2],[1.0])|(14,[4],[1.0])|
+----+--------+------+---+----+-----+------+-----+--------------+-------------+--------------+--------------+
only showi

In [77]:
for col in ['a', 'fw', 'cg', 'cl', 'hpw']:
    assembler = VectorAssembler(inputCols=[col], outputCol=f'{col}_vec')
    df_ohe = assembler.transform(df_ohe)

    scaler = StandardScaler(inputCol=f'{col}_vec', outputCol=f'{col}_sc')
    df_ohe = scaler.fit(df_ohe).transform(df_ohe)

In [78]:
df_clean = df_ohe.withColumnRenamed('salaryClass_si', 'label')
df_clean = df_clean.select(['a_sc', 'fw_sc', 'cg_sc', 'cl_sc', 'hpw_sc', 'wc_ohe', 'edN_ohe', 'oc_ohe', 'label'])
df_clean.show(5, False)

+--------------------+--------------------+---------------------+-----+--------------------+-------------+--------------+--------------+-----+
|a_sc                |fw_sc               |cg_sc                |cl_sc|hpw_sc              |wc_ohe       |edN_ohe       |oc_ohe        |label|
+--------------------+--------------------+---------------------+-----+--------------------+-------------+--------------+--------------+-----+
|[1.8487061379573442]|[0.6877735810696858]|[0.19880196875001743]|[0.0]|[1.407781753443814] |(8,[6],[1.0])|(15,[2],[1.0])|(14,[3],[1.0])|0.0  |
|[2.3701360743042876]|[0.7391906807948887]|[0.0]                |[0.0]|[0.4575290698692395]|(8,[2],[1.0])|(15,[2],[1.0])|(14,[2],[1.0])|0.0  |
|[1.8013034164712585]|[1.9133549417327194]|[0.0]                |[0.0]|[1.407781753443814] |(8,[0],[1.0])|(15,[0],[1.0])|(14,[8],[1.0])|0.0  |
|[2.5123442387625445]|[2.082601046522753] |[0.0]                |[0.0]|[1.407781753443814] |(8,[0],[1.0])|(15,[6],[1.0])|(14,[8],[1.0])|0.0  |

In [79]:
# We will now assemble our features according to the classification task.

final_assembler_classif = VectorAssembler(inputCols=['a_sc', 'fw_sc', 'cg_sc', 'cl_sc', 'hpw_sc', 'wc_ohe', 'edN_ohe', 'oc_ohe'], outputCol='features')
final_df_classif = final_assembler_classif.transform(df_clean)
final_df_classif = final_df_classif.select(['features', 'label'])
final_df_classif.show(5, False)

+-----------------------------------------------------------------------------------------------------------------+-----+
|features                                                                                                         |label|
+-----------------------------------------------------------------------------------------------------------------+-----+
|(42,[0,1,2,4,11,15,31],[1.8487061379573442,0.6877735810696858,0.19880196875001743,1.407781753443814,1.0,1.0,1.0])|0.0  |
|(42,[0,1,4,7,15,30],[2.3701360743042876,0.7391906807948887,0.4575290698692395,1.0,1.0,1.0])                      |0.0  |
|(42,[0,1,4,5,13,36],[1.8013034164712585,1.9133549417327194,1.407781753443814,1.0,1.0,1.0])                       |0.0  |
|(42,[0,1,4,5,19,36],[2.5123442387625445,2.082601046522753,1.407781753443814,1.0,1.0,1.0])                        |0.0  |
|(42,[0,1,4,5,15,32],[1.327276201610401,3.0025900432970136,1.407781753443814,1.0,1.0,1.0])                        |0.0  |
+-----------------------

#### Preprocessing for regression task.

In [80]:
df_reg = df_new

In [81]:
for col in ['a', 'fw', 'cg', 'cl']:
    assembler = VectorAssembler(inputCols=[col], outputCol=f'{col}_vec')
    df_reg = assembler.transform(df_reg)

    scaler = StandardScaler(inputCol=f'{col}_vec', outputCol=f'{col}_sc')
    df_reg = scaler.fit(df_reg).transform(df_reg)

In [82]:
# We will now assemble our features according to the regression task.

df_reg = df_reg.withColumnRenamed('hpw', 'label')
final_assembler_reg = VectorAssembler(inputCols=['a_sc', 'fw_sc', 'cg_sc', 'cl_sc'], outputCol='features')
final_df_reg = final_assembler_reg.transform(df_reg)
final_df_reg = final_df_reg.select(['features', 'label'])
final_df_reg.show(5, False)

+---------------------------------------------------------------+-----+
|features                                                       |label|
+---------------------------------------------------------------+-----+
|[1.8487061379573442,0.6877735810696858,0.19880196875001743,0.0]|40.0 |
|[2.3701360743042876,0.7391906807948887,0.0,0.0]                |13.0 |
|[1.8013034164712585,1.9133549417327194,0.0,0.0]                |40.0 |
|[2.5123442387625445,2.082601046522753,0.0,0.0]                 |40.0 |
|[1.327276201610401,3.0025900432970136,0.0,0.0]                 |40.0 |
+---------------------------------------------------------------+-----+
only showing top 5 rows



### **Splitting the data into training and testing.**

As the dataset is huge (10M rows), we will take a sample of it i.e. 30-70% of the data will be sampled first. Then we will split it into train-test.

In [83]:
def split(df):
    train, test = df.sample(fraction=0.3, seed=3).randomSplit([0.75, 0.25])
    print(f"Size of train dataset = {train.count()}")
    print(f"Size of test dataset = {test.count()}")
    print()
    return train, test

In [84]:
print('For classification task:')
train_c, test_c = split(df=final_df_classif)
train_c.show(5, truncate=False)
test_c.show(5, truncate=False)

For classification task:
Size of train dataset = 2249488
Size of test dataset = 750256

+------------------------------------------------------------------------------------------------------------------+-----+
|features                                                                                                          |label|
+------------------------------------------------------------------------------------------------------------------+-----+
|(42,[0,1,2,3,4],[0.8532489867495435,2.564608629365143,0.3359698404726606,2.7479168024697795,3.4842598397734394])  |0.0  |
|(42,[0,1,2,3,4],[0.9006517082356292,2.150654395080528,0.24095822799277644,2.3159475107984773,0.49272361370533485])|1.0  |
|(42,[0,1,2,3,4],[0.948054429721715,1.9916828375694908,0.31191974029729047,2.6644834813370046,0.7390854205580023]) |0.0  |
|(42,[0,1,2,3,4],[1.0428598726938865,2.5677939164749524,0.21297598216135724,2.4559001785050674,0.809474508230193]) |0.0  |
|(42,[0,1,2,3,4],[1.1376653156660579,1.049272613701

In [85]:
print('For regression task:')
train_r, test_r = split(df=final_df_reg)
train_r.show(5, truncate=False)
test_r.show(5, truncate=False)

For regression task:
Size of train dataset = 2249435
Size of test dataset = 750309

+------------------------------------------------------------------------------+-----+
|features                                                                      |label|
+------------------------------------------------------------------------------+-----+
|[0.8058462652634577,0.1816146012554234,1.4476697181607756,2.2177115036582746] |74.0 |
|[0.8058462652634577,0.18450709038577992,0.5956743442675315,4.9562084150324175]|47.0 |
|[0.8058462652634577,0.185908971620738,0.26583133539848236,2.2217485998421185] |27.0 |
|[0.8058462652634577,0.1918802758683759,0.34575448198887576,3.050699016258075] |41.0 |
|[0.8058462652634577,0.19930669734723608,0.3359698404726606,2.8919065663602135]|19.0 |
+------------------------------------------------------------------------------+-----+
only showing top 5 rows

+-------------------------------------------------------------------------------+-----+
|features           

### **Building a classification & a regression model and testing/reporting the results.**

#### Classification Task.

In [88]:
lr = LogisticRegression()

print('Training started.')
lr_model = lr.fit(train_c)
print('Training finished.\n')

classif_pred_df = lr_model.transform(test_c)

TP = classif_pred_df.filter(classif_pred_df['label'] == 1).filter(classif_pred_df['prediction'] == 1).count()
TN = classif_pred_df.filter(classif_pred_df['label'] == 0).filter(classif_pred_df['prediction'] == 0).count()
FP = classif_pred_df.filter(classif_pred_df['label'] == 0).filter(classif_pred_df['prediction'] == 1).count()
FN = classif_pred_df.filter(classif_pred_df['label'] == 1).filter(classif_pred_df['prediction'] == 0).count()

acc = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + TN)
f1_Score = (2 * precision * recall)/(precision + recall)

print(f'Test metrics/results:') 
print(f'accuracy = {acc*100:.5f}%')
print(f'precision = {precision*100:.5f}%')
print(f'recall = {recall*100:.5f}%')
print(f'f1-score = {f1_Score*100:.5f}%')

print('\nDataframe with predictions:')
classif_pred_df.show(5)

Training started.
Training finished.

Test metrics/results:
accuracy = 50.01706%
precision = 50.04921%
recall = 46.89119%
f1-score = 48.41876%

Dataframe with predictions:
+--------------------+-----+--------------------+--------------------+----------+
|            features|label|       rawPrediction|         probability|prediction|
+--------------------+-----+--------------------+--------------------+----------+
|(42,[0,1,2,3,4],[...|  0.0|[-0.0234825026104...|[0.49412964410142...|       1.0|
|(42,[0,1,2,3,4],[...|  0.0|[-0.0162071483742...|[0.49594830159490...|       1.0|
|(42,[0,1,2,3,4],[...|  0.0|[-0.0075972826771...|[0.49810068846618...|       1.0|
|(42,[0,1,2,3,4],[...|  1.0|[-0.0109005243306...|[0.49727489590068...|       1.0|
|(42,[0,1,2,3,4],[...|  1.0|[-0.0099282020832...|[0.49751796986679...|       1.0|
+--------------------+-----+--------------------+--------------------+----------+
only showing top 5 rows



#### Regression Task.

In [89]:
lin_reg = LinearRegression()

print('Training started.')
lin_reg_model = lin_reg.fit(train_r)
print('Training finished.\n')

reg_preds_df = lin_reg_model.transform(test_r)

reg_metrics = lin_reg_model.evaluate(test_r)

print(f'Test metrics/results:') 
print(f'MSE = {reg_metrics.meanSquaredError}')

print('\nDataframe with predictions:')
reg_preds_df.show(5)

Training started.
Training finished.

Test metrics/results:
MSE = 806.9079288324978

Dataframe with predictions:
+--------------------+-----+------------------+
|            features|label|        prediction|
+--------------------+-----+------------------+
|[0.80584626526345...| 22.0|48.494436943028155|
|[0.80584626526345...| 64.0| 48.50154758794418|
|[0.80584626526345...| 46.0|  48.3794670792577|
|[0.80584626526345...|  5.0| 48.43548907988203|
|[0.80584626526345...|  7.0| 48.51998045463998|
+--------------------+-----+------------------+
only showing top 5 rows



### **Using the models.**

For using the models, we will take one random sample from the preprocessed test dataset and get predictions.

#### Classification prediction.

In [66]:
random_sample = test_c.sample(0.01).limit(1)
random_sample.show()

+--------------------+-----+
|            features|label|
+--------------------+-----+
|(42,[0,1,2,3,4,5,...|  0.0|
+--------------------+-----+



In [68]:
prediction = lr_model.transform(random_sample)
prediction.show()

+--------------------+-----+--------------------+--------------------+----------+
|            features|label|       rawPrediction|         probability|prediction|
+--------------------+-----+--------------------+--------------------+----------+
|(42,[0,1,2,3,4,5,...|  0.0|[0.04130605650358...|[0.51032504612650...|       0.0|
+--------------------+-----+--------------------+--------------------+----------+



As we can see from the resulted df, the actual label and predicted label are the same. 

#### Regression prediction.

In [71]:
random_sample = test_r.sample(0.01).limit(1)
random_sample.show()

+--------------------+-----+
|            features|label|
+--------------------+-----+
|[0.80584626526345...| 68.0|
+--------------------+-----+



In [72]:
prediction = lin_reg_model.transform(random_sample)
prediction.show()

+--------------------+-----+-----------------+
|            features|label|       prediction|
+--------------------+-----+-----------------+
|[0.80584626526345...| 68.0|48.47679680848794|
+--------------------+-----+-----------------+



As we can see from the resulted df, the actual label is 68.0 and the predicted label is 48.48. The difference is acceptable. 

<center>
  <p><b>THE END</b></p>
</center>