# Practical 3: Feature Scaling and Dummification

## Objective:
- Apply feature-scaling techniques such as standardization and normalization to numerical features.
- Perform feature dummification to convert categorical variables into numerical representations.

### 1. Loading the Data
We'll start by loading the `cars.csv` dataset. For this practical, we'll focus on two columns: 'Engine Information.Engine Statistics.Horsepower' (a numerical feature) and 'Engine Information.Fuel Type' (a categorical feature). To keep things simple, we'll drop any rows with missing values in these two columns.

In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

print("--- Loading Data ---")
df = pd.read_csv('cars.csv')
columns_to_use = ['Engine Information.Engine Statistics.Horsepower', 'Engine Information.Fuel Type']
# Keep only the two columns of interest and drop rows with missing values in either
df_subset = df[columns_to_use].dropna().copy()
print("Original Data (subset):")
print(df_subset.head())
print("\n")

### 2. Feature Scaling
Feature scaling brings independent variables (features) onto a comparable range. We will demonstrate two common techniques:
- **Standardization:** Rescales a feature to have a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation. This changes the location and scale of the distribution but not its shape, so the result is not necessarily normally distributed.
- **Normalization:** Rescales a feature to a fixed range, typically 0 to 1, by subtracting the minimum and dividing by the range (min-max scaling).
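To make the two definitions concrete, here is a minimal sketch that applies both formulas by hand with NumPy. The `horsepower` values are made up for illustration and are not taken from `cars.csv`.

```python
import numpy as np

# Hypothetical horsepower values, used only for illustration
horsepower = np.array([100.0, 150.0, 200.0, 250.0, 300.0])

# Standardization (z-score): subtract the mean, divide by the standard deviation.
# np.std defaults to the population standard deviation (ddof=0),
# which is also what scikit-learn's StandardScaler uses.
standardized = (horsepower - horsepower.mean()) / horsepower.std()

# Normalization (min-max): the minimum maps to 0 and the maximum maps to 1.
normalized = (horsepower - horsepower.min()) / (horsepower.max() - horsepower.min())

print("Standardized:", standardized)
print("Normalized:  ", normalized)
```

Both transformations are linear, so the ordering and relative spacing of the values are preserved; only the location and scale change.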

In [None]:
hp_col = 'Engine Information.Engine Statistics.Horsepower'

# -- Standardization --
print("--- Standardization ---")
scaler_std = StandardScaler()
# fit_transform returns an (n, 1) array; ravel() flattens it for column assignment
df_subset['Horsepower_Standardized'] = scaler_std.fit_transform(df_subset[[hp_col]]).ravel()
print("Data after Standardization:")
print(df_subset[['Horsepower_Standardized', hp_col]].head())
print(f"Mean of standardized horsepower: {df_subset['Horsepower_Standardized'].mean():.2f}")
# StandardScaler divides by the population standard deviation, so check with ddof=0
# (pandas defaults to the sample standard deviation, ddof=1)
print(f"Standard Deviation of standardized horsepower: {df_subset['Horsepower_Standardized'].std(ddof=0):.2f}")
print("\n")

# -- Normalization --
print("--- Normalization ---")
scaler_norm = MinMaxScaler()
df_subset['Horsepower_Normalized'] = scaler_norm.fit_transform(df_subset[[hp_col]]).ravel()
print("Data after Normalization:")
print(df_subset[['Horsepower_Normalized', hp_col]].head())
print(f"Min of normalized horsepower: {df_subset['Horsepower_Normalized'].min():.2f}")
print(f"Max of normalized horsepower: {df_subset['Horsepower_Normalized'].max():.2f}")
print("\n")

### 3. Dummification
Many machine learning models require numerical input. Dummification (also called one-hot encoding) converts a categorical variable into a set of indicator columns, one per category, where each row has a 1 (or `True`) in the column matching its category and 0 (or `False`) elsewhere. We'll use the `pd.get_dummies()` function from pandas to convert the 'Engine Information.Fuel Type' column into dummy variables.
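As a small illustration of what `pd.get_dummies()` produces, the sketch below encodes a toy Series. The fuel-type values are hypothetical and are not read from `cars.csv`. The `dtype=int` argument is optional and is used here only to get 0/1 integers, since recent pandas versions return boolean columns by default.

```python
import pandas as pd

# Hypothetical fuel types, used only for illustration
fuel = pd.Series(["Gasoline", "Diesel", "Gasoline", "E85"], name="FuelType")

# One indicator column per category; each row has exactly one 1
dummies = pd.get_dummies(fuel, prefix="FuelType", dtype=int)
print(dummies)

# drop_first=True omits the first category in sorted order ("Diesel" here).
# That category is then implied by all remaining columns being 0.
reduced = pd.get_dummies(fuel, prefix="FuelType", drop_first=True, dtype=int)
print(list(reduced.columns))
```

Dropping one category is a common choice for linear models, where the full set of dummy columns is redundant alongside an intercept. For many other models, keeping every column is fine.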

In [None]:
print("--- Dummification ---")
fuel_type_col = 'Engine Information.Fuel Type'
print(f"Original unique values in '{fuel_type_col}':")
print(df_subset[fuel_type_col].unique())
print("\n")

# Use pd.get_dummies to create one indicator column per fuel type
dummies = pd.get_dummies(df_subset[fuel_type_col], prefix='FuelType')

# Concatenate the new dummy variables with the original dataframe
df_dummified = pd.concat([df_subset, dummies], axis=1)

print("Data after Dummification:")
print(df_dummified.head())
print("\n")

print("--- Practical 3 execution finished ---")