# Part 0: Load Data and Define Loss Function
This notebook contains the shared code for the following steps:
- Import modules
- Read in data and prep dataframe
- Define the loss function

To use it, run the following code at the beginning of a Jupyter notebook: `%run Part0.ipynb`

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

## Read in data and prep dataframe

In [2]:
# read from a local CSV with row numbers as index
df = pd.read_csv("bikeDetails.csv")
# convert seller_type to categorical variable
df["seller_type"] = df["seller_type"].astype("category")

# convert owner to numeric variable
df["owner"] = df["owner"].str[:1].astype(int)

# data summary
print(df.info())
# take a look at the first rows of the data
print(df.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1061 entries, 0 to 1060
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   name               1061 non-null   object  
 1   selling_price      1061 non-null   int64   
 2   year               1061 non-null   int64   
 3   seller_type        1061 non-null   category
 4   owner              1061 non-null   int32   
 5   km_driven          1061 non-null   int64   
 6   ex_showroom_price  626 non-null    float64 
dtypes: category(1), float64(1), int32(1), int64(3), object(1)
memory usage: 46.9+ KB
None
                                  name  selling_price  year seller_type  \
0            Royal Enfield Classic 350         175000  2019  Individual   
1                            Honda Dio          45000  2017  Individual   
2  Royal Enfield Classic Gunmetal Grey         150000  2018  Individual   
3    Yamaha Fazer FI V 2.0 [2016-2018]          65000  2015  I
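
The `owner` conversion above keeps only the first character of each string and casts it to an integer. The sketch below shows the idea on a toy series. The sample values (such as "1st owner") are an assumption, since the raw contents of the `owner` column are not shown in the output above, and the name `owner_num` is illustrative only.

```python
import pandas as pd

# Hypothetical sample values; the real contents of the owner column are assumed here.
owner = pd.Series(["1st owner", "2nd owner", "1st owner", "3rd owner"])

# Same transformation as in the cell above: take the first character, then cast to int.
owner_num = owner.str[:1].astype(int)

print(owner_num.tolist())  # [1, 2, 1, 3]
```

Under this assumption the approach only works while the leading number has a single digit; a value such as "10th owner" would be read as 1.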

## Define the loss function - RMSE

In [12]:
# loss function: RMSE (root mean squared error)
## obs: observations (pandas Series)
## pred: predictions (pandas Series), or a single number
def loss(obs, pred):
    # if pred is a single number instead of a list, repeat it to match the length of obs
    if isinstance(pred, (int, float)):
        pred = [pred] * len(obs)

    # calculate the RMSE
    return np.mean(np.square(np.subtract(obs, pred))) ** 0.5
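
As a quick check of the definition, RMSE is the square root of the mean squared difference between observations and predictions: $\sqrt{\frac{1}{n}\sum_{i}(obs_i - pred_i)^2}$. The following is a minimal standalone sketch, not the notebook's own `loss` function. The name `rmse_sketch` and the toy values are hypothetical, and it relies on NumPy broadcasting to handle a single-number prediction.

```python
import numpy as np
import pandas as pd

def rmse_sketch(obs, pred):
    """Root mean squared error; pred may be array-like or a single number."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)  # a scalar becomes a 0-d array and broadcasts
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

# Toy data, made up for illustration only.
obs = pd.Series([1.0, 2.0, 3.0])
pred = pd.Series([1.0, 2.0, 5.0])

# Errors are 0, 0, -2, so the mean squared error is 4/3.
print(rmse_sketch(obs, pred))

# A constant prediction of 2.0 gives errors -1, 0, 1, so the mean squared error is 2/3.
print(rmse_sketch(obs, 2.0))
```

Passing a single number is convenient for scoring a constant baseline, such as predicting the same value for every row.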