- ### **Data Preprocessing - Data Transformation**
    ![Data_Transformation](https://github.com/user-attachments/assets/09e4c136-0411-458b-8bd2-cab7ca67ccb5)
    - #### **Normalization** Scaling data to a smaller range, usually [0, 1]. It's commonly used when features have different scales or units.
    ![Normalization](https://github.com/user-attachments/assets/879df362-165e-4e9d-b2b9-db36dbcc0a11)
    - ##### **Methods used to normalize data**
    ![Methods_used_to_normalize_data](https://github.com/user-attachments/assets/eaea4c0f-690a-405f-8708-5f69e3a1e7e2)
        - **Standardization** Adjusting data to have a mean of 0 and a standard deviation of 1, often used when data follows a Gaussian distribution.
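        - **Normalization vs. Standardization** Normalization (min-max scaling) maps each feature to a fixed, bounded range and is sensitive to outliers, since a single extreme value sets the range. Standardization centers each feature at 0 with unit variance and leaves values unbounded, which suits roughly Gaussian features or models that expect zero-centered inputs.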
        ![Normalization_vs._Standardization](https://github.com/user-attachments/assets/b610d933-959d-4460-b81d-5ebe649a4320)
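
As a compact reference for the two rescaling methods described above, here is a sketch of the usual formulas. The symbols $x$, $x_{\min}$, $x_{\max}$, $\mu$, and $\sigma$ are notation introduced here for illustration and are not taken from the figures.

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad \text{(min-max normalization to } [0, 1]\text{)}
$$

$$
z = \frac{x - \mu}{\sigma} \qquad \text{(standardization)}
$$

Here $x_{\min}$ and $x_{\max}$ are the smallest and largest observed values of a feature, and $\mu$ and $\sigma$ are its mean and standard deviation.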

### **Normalization**

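    from sklearn import preprocessing
    import numpy as np

    numpy_array = np.array([2, 3, 5, 7, 17, 20, 80, 150])
    # Note: preprocessing.normalize rescales each row to unit L2 norm by default.
    # It is not min-max scaling, so the output does not span the full [0, 1] range.
    normalized_array = preprocessing.normalize([numpy_array])
    normalized_array

Output:

    array([[0.01160987, 0.0174148 , 0.02902467, 0.04063454, 0.09868389,
            0.11609869, 0.46439475, 0.87074017]])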
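
The `preprocessing.normalize` call above divides the row by its L2 norm, which is a different operation from the [0, 1] range scaling described earlier. As a minimal sketch of that range scaling, assuming scikit-learn's `MinMaxScaler` is an acceptable stand-in for what the figures describe, the same values can be rescaled as follows. The variable names `values`, `scaled`, and `manual` are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same values as the example above, reshaped to a single feature column
# because scikit-learn scalers expect shape (n_samples, n_features).
values = np.array([2, 3, 5, 7, 17, 20, 80, 150], dtype=float).reshape(-1, 1)

scaler = MinMaxScaler()  # default feature_range is (0, 1)
scaled = scaler.fit_transform(values).ravel()

# Equivalent manual computation: (x - min) / (max - min)
manual = ((values - values.min()) / (values.max() - values.min())).ravel()

print(scaled)
print(np.allclose(scaled, manual))
```

With min-max scaling the smallest value (2) maps to 0 and the largest (150) maps to 1, up to floating-point rounding, and every other value falls in between.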

### **Standardization**

    from sklearn.preprocessing import StandardScaler

    data = [[0, 0], [0, 0], [1, 1], [-2, -2]]
    scaler = StandardScaler()
    scaler.fit(data)        # learn each column's mean and standard deviation
    scaler.transform(data)  # apply (x - mean) / std column by column

Output:

    array([[ 0.22941573,  0.22941573],
           [ 0.22941573,  0.22941573],
           [ 1.14707867,  1.14707867],
           [-1.60591014, -1.60591014]])
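
To make the `StandardScaler` result above less of a black box, the following sketch recomputes it by hand with NumPy. It assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses. The names `manual` and `sklearn_result` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[0, 0], [0, 0], [1, 1], [-2, -2]], dtype=float)

# Column-wise mean and population standard deviation (ddof=0).
mean = data.mean(axis=0)
std = data.std(axis=0)

# z-score: subtract the mean, then divide by the standard deviation.
manual = (data - mean) / std
sklearn_result = StandardScaler().fit_transform(data)

print(mean)                                 # [-0.25 -0.25]
print(np.allclose(manual, sklearn_result))  # True
```

After the transform, each column has mean 0 and standard deviation 1, which is the defining property of standardized data.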

- ### **Data Preprocessing - Data Transformation**
    - #### **Attribute Selection**
        - **Need for Attribute Subset Selection** Selecting a subset of attributes reduces the dimensionality of the data by removing irrelevant or redundant features, which can improve model performance.
        ![Need_for_Attribute_Subset_Selection](https://github.com/user-attachments/assets/8ae9c73e-fd11-4eb6-8a99-26f758c98800)
        ![Attribute Subset Selection](https://github.com/user-attachments/assets/800eebcb-e57b-4688-bce6-d848fe27d19d)
    - #### **Discretization** Converting continuous data into discrete bins or categories, useful for algorithms that require categorical input.
        ![Discretization](https://github.com/user-attachments/assets/2c02f4b7-ed1b-49e2-9b2a-d7ba7e2c716f)
        ![K-means](https://github.com/user-attachments/assets/d97c6d88-e6f4-4b88-8646-4cfce2774295)
        ![K-means](https://github.com/user-attachments/assets/f87b5d9b-130b-454e-afdc-424b56230589)
    - #### **Concept Hierarchy Generation** Creating a hierarchy of concepts by organizing attributes into different levels of abstraction.
        ![Concept Hierarchy Generation](https://github.com/user-attachments/assets/5b3c57a4-a3a6-4570-a4aa-1da25e999d87)
        ![City](https://github.com/user-attachments/assets/44ba5451-b68e-4262-bdb7-e3ca49ad54aa)
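
Attribute subset selection, as described in the list above, can be sketched with a simple filter approach: drop constant (irrelevant) columns, then drop columns that are almost perfectly correlated with one already kept (redundant). This is one possible technique, not necessarily the one shown in the figures. The synthetic data and the 0.95 correlation cutoff are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
n = 100
informative = rng.normal(size=n)
X = np.column_stack([
    informative,           # column 0: useful feature
    np.ones(n),            # column 1: constant, carries no information
    informative * 2.0,     # column 2: redundant copy of column 0 (rescaled)
    rng.normal(size=n),    # column 3: independent feature
])

# Step 1: remove zero-variance (constant) columns.
selector = VarianceThreshold(threshold=0.0)
selector.fit(X)
kept = np.flatnonzero(selector.get_support())

# Step 2: among the remaining columns, skip any column that is highly
# correlated with a column that has already been selected.
CORR_CUTOFF = 0.95  # assumed threshold, tune for real data
corr = np.abs(np.corrcoef(X[:, kept], rowvar=False))
selected_idx = []  # positions within `kept`
for i in range(len(kept)):
    if all(corr[i, j] < CORR_CUTOFF for j in selected_idx):
        selected_idx.append(i)
selected = kept[selected_idx].tolist()

print(selected)  # [0, 3]
```

Only the original feature and the independent feature survive. The constant column and the rescaled duplicate add no information, so removing them lowers dimensionality without losing signal.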

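Discretization and concept hierarchy generation, as described in the list above, can be illustrated together in a short sketch. The first part bins continuous values into three equal-width intervals with scikit-learn's `KBinsDiscretizer`. The second part rolls cities up to higher levels of abstraction with plain dictionaries. The sample ages, the bin count, the `CITY_TO_REGION` and `REGION_TO_COUNTRY` mappings, and the `roll_up` helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter

import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# --- Discretization: continuous values -> 3 equal-width bins ---
ages = np.array([3, 15, 22, 37, 45, 58, 63, 90], dtype=float).reshape(-1, 1)
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
age_bins = binner.fit_transform(ages).ravel().astype(int)
print(age_bins)  # [0 0 0 1 1 1 2 2]

# --- Concept hierarchy: city -> region -> country ---
CITY_TO_REGION = {
    "Chicago": "Illinois",
    "Los Angeles": "California",
    "San Diego": "California",
    "Toronto": "Ontario",
}
REGION_TO_COUNTRY = {
    "Illinois": "USA",
    "California": "USA",
    "Ontario": "Canada",
}


def roll_up(city: str, level: str) -> str:
    """Return the ancestor of `city` at the requested hierarchy level."""
    region = CITY_TO_REGION[city]
    if level == "region":
        return region
    if level == "country":
        return REGION_TO_COUNTRY[region]
    raise ValueError(f"unknown level: {level}")


cities = ["Chicago", "San Diego", "Toronto", "Los Angeles"]
by_country = Counter(roll_up(c, "country") for c in cities)
print(dict(by_country))  # {'USA': 3, 'Canada': 1}
```

With `strategy="uniform"` the bin edges are evenly spaced between the minimum and maximum (here 3, 32, 61, 90). `KBinsDiscretizer` also accepts `strategy="kmeans"` for cluster-based bins. Rolling up replaces a detailed value (a city) with a more general one (a region or country), which is the core operation behind a concept hierarchy.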