<div style="background:#67FFF0; color:#000; display: flex; justify-content:space-between; padding-left:15px;">
    <h1>Programming For Data Science Course
        <br>
        Week 7: Data Wrangling and Feature Engineering
    </h1>
    <img src="../assets/dataidea-logo.png" alt="DATAIDEA" style="width:100px; flex:end">
</div>

## Data Normalization and Standardization

#### Introduction:
In data analysis and machine learning, preprocessing steps such as data normalization and standardization often improve both the performance and the interpretability of models.
This notebook explains why normalization and standardization matter when preparing data for analysis and modeling, and implements each one in plain Python.

#### Importance:

1. Data Normalization:
   - Uniform Scaling: Rescales every feature to a common range, typically [0, 1], so that features with larger magnitudes do not dominate the others.
   - Improved Convergence: Helps gradient-based optimization algorithms converge faster by making the loss surface more symmetric.
   - Interpretability: Values on a consistent scale are easier to compare, which helps when judging feature importance.


2. Data Standardization:
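   - Mean Centering and Unit Variance: Transforms each feature to have a mean of 0 and a standard deviation of 1, simplifying the interpretation of coefficients in linear models.
   - Handling Different Scales: Useful when features have different scales or units, because z-scores express every feature in standard deviations from its mean, making them directly comparable.
   - Reducing Sensitivity to Outliers: The mean and standard deviation draw on every data point, whereas min-max bounds are set by the two most extreme values alone, so a single outlier often distorts standardization less than normalization. Outliers still shift both statistics, so the difference is one of degree.
   - Maintaining Information: As a linear transformation, it preserves the ordering and relative spacing of data points and does not alter the shape of the distribution.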
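
To make the comparison between the two methods concrete, here is a minimal sketch that applies both to a small data set containing one outlier. The helper names `min_max` and `z_score` and the sample values are assumptions chosen for illustration; they are not part of the course material.

```python
# Min-max scaling: x' = (x - min) / (max - min)
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Z-score standardization: z = (x - mean) / std (population standard deviation)
def z_score(values):
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]


# Hypothetical data: four ordinary values and one outlier (1000)
values = [10, 20, 30, 40, 1000]

min_max_scaled = min_max(values)
z_scaled = z_score(values)

print("Min-max:", [round(v, 3) for v in min_max_scaled])
print("Z-score:", [round(v, 3) for v in z_scaled])
```

With this data the four ordinary values land within about 0.03 of zero under min-max scaling and cluster near -0.5 under z-scores, so the outlier influences both results. The practical difference is that min-max output is pinned to [0, 1] by the extremes, while z-scores are not confined to a fixed interval.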

In [2]:
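# Data normalization (min-max scaling) without libraries:
def minMaxScaling(data):
    min_val = min(data)
    max_val = max(data)
    value_range = max_val - min_val
    # Guard against division by zero when every value is identical;
    # returning zeros here is one possible convention
    if value_range == 0:
        return [0.0 for _ in data]
    scaled_data = []
    for value in data:
        # Map each value linearly onto the interval [0, 1]
        scaled = (value - min_val) / value_range
        scaled_data.append(scaled)
    return scaled_data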

In [3]:
# Example data
data = [10, 20, 30, 40, 50]
normalized_data = minMaxScaling(data)
print("Normalized data (Min-Max Scaling):", normalized_data)

Normalized data (Min-Max Scaling): [0.0, 0.25, 0.5, 0.75, 1.0]
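
Some models expect inputs in a range other than [0, 1]. As a sketch of how the same idea generalizes, the function below rescales to an arbitrary target interval. The name `min_max_to_range` and its `new_min`/`new_max` parameters are hypothetical and introduced only for this example.

```python
def min_max_to_range(data, new_min=0.0, new_max=1.0):
    """Rescale data linearly so its minimum maps to new_min and its maximum to new_max."""
    old_min, old_max = min(data), max(data)
    if old_max == old_min:
        # A constant feature has no spread to rescale
        raise ValueError("all values are identical; the range is zero")
    span = new_max - new_min
    return [new_min + (v - old_min) / (old_max - old_min) * span for v in data]


data = [10, 20, 30, 40, 50]
scaled = min_max_to_range(data, -1.0, 1.0)
print("Scaled to [-1, 1]:", scaled)
```

Called with the default arguments, this reduces to ordinary min-max scaling onto [0, 1].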


In [None]:
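# Data standardization (z-score) without libraries:
def zScoreNormalization(data):
    mean = sum(data) / len(data)
    # Population variance: divide by N rather than N - 1
    variance = sum((x - mean) ** 2 for x in data) / len(data)
    std_dev = variance ** 0.5
    # Guard against division by zero when every value is identical;
    # returning zeros here is one possible convention
    if std_dev == 0:
        return [0.0 for _ in data]
    standardized_data = [(x - mean) / std_dev for x in data]
    return standardized_data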

In [None]:
# Example data
data = [10, 20, 30, 40, 50]
standardized_data = zScoreNormalization(data)
print("Standardized data (Z-Score Normalization):", standardized_data)
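
One way to check a standardization routine is to confirm that its output has a mean of 0 and a standard deviation of 1. The sketch below does this with Python's built-in `statistics` module. The helper `z_score` is a stand-in written for this example, and it assumes the population standard deviation (dividing by N).

```python
import statistics


def z_score(data):
    mean = sum(data) / len(data)
    # Population standard deviation: divide by N
    std_dev = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5
    return [(x - mean) / std_dev for x in data]


data = [10, 20, 30, 40, 50]
standardized = z_score(data)

result_mean = statistics.mean(standardized)
result_pstdev = statistics.pstdev(standardized)  # population std (divides by N)
result_stdev = statistics.stdev(standardized)    # sample std (divides by N - 1)

print("Mean:", round(result_mean, 6))
print("Population std:", round(result_pstdev, 6))
print("Sample std:", round(result_stdev, 6))
```

The population standard deviation comes out at 1 up to floating-point error, while the sample standard deviation is slightly larger. Which convention to use is a design choice; what matters is applying the same one consistently.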

#### Conclusion:

Data normalization and standardization are both important preprocessing steps in data analysis and machine learning.
They can improve model performance, interpretability, and robustness, and because both are linear rescalings they preserve the underlying relationships in the data.
The choice between them depends on the data and the model: normalization suits cases that need a bounded range, while standardization suits features that should be centered with unit variance.

<div style="font-style: futura; background:#67FFF0; color:#000;
    padding:15px">
    <h1>Do you seriously want to learn Programming and Data Analysis with Python?</h1>
    <p>
If you’re serious about learning Programming, Data Analysis with Python and getting prepared for Data Science roles, I highly encourage you to enroll in my Programming for Data Science Course, which I've taught to hundreds of students. Don’t waste your time following disconnected, outdated tutorials.
    </p>
    <p>
    My Complete Programming for Data Science Course has everything you need in one place. 
    </p>
    <ul>
        The course offers:
        <li>Duration: Usually 3-4 months</li>
        <li>Sessions: Four times a week (one on one)</li>
        <li>Location: Online and/or at UMF House, Sir Apollo Kagwa Road</li>
    </ul>
    <ul>
        What you'll learn:
        <li>Fundamentals of programming</li>
        <li>Data manipulation and analysis </li>
        <li>Visualization techniques</li>
        <li>Statistical Analysis</li>
        <li>Machine Learning</li>
        <li>Database Management with SQL (optional)</li>
        <li>Web Development with Django (optional)</li>
    </ul>
    <ul style="list-style: none">
    <li>Best</li>
    <li>Juma Shafara</li>
    <li>ML Software Developer, Coding Instructor</li>
    <li>
    <a href="mailto:jumashafara0@gmail.com">jumashafara0@gmail.com</a>
    </li>
        <li>
    <a href="tel:+256701520768">+256701520768</a> or <a href="tel:+256771754118">+256771754118</a> 
        </li>
    </ul>
    <div>
        <img src='../assets/profile.jpg' style="width:100px" alt="Juma Shafara">
        <img src='../assets/dataidea-logo.png' style="width:100px" alt="DATAIDEA">
    </div>
</div>
