# Basic Data Cleaning for Machine Learning

In this exercise, you'll work with a dataset containing four features (`date`, `number of customers`, `condition`, and `temperature`) and one label (`number of ice cream sales`). You'll practice importing the dataset and cleaning it with Pandas: handling missing values, encoding categorical variables, and standardizing the numerical features with a scaler from scikit-learn.

## Step 1: Download and Load the Dataset

1. **Download the Dataset**: Ensure that the Excel file you are working with includes the following columns: `date`, `number of customers`, `temperature`, `condition`, and `number of ice cream sales`.

   **Hint**: Have the Excel file prepared and saved (for example, as an `.xlsx` workbook) before starting the hands-on exercise.

2. **Load the Dataset**: Use Pandas to import the data into a DataFrame so you can work with it.

   **Hint**: Pandas has a function to read Excel files directly into DataFrames. 
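A minimal sketch of this step, assuming the file is an `.xlsx` workbook and an Excel engine such as `openpyxl` is installed. `pd.read_excel` is the Pandas function the hint refers to; the helper names `load_dataset` and `check_columns` are hypothetical, and the column check only confirms that the five columns listed above are present.

```python
import pandas as pd

# Column names listed in Step 1 of this exercise.
EXPECTED_COLUMNS = [
    "date",
    "number of customers",
    "temperature",
    "condition",
    "number of ice cream sales",
]


def check_columns(df: pd.DataFrame) -> list:
    """Return the expected columns that are missing from df (empty if none)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def load_dataset(path: str) -> pd.DataFrame:
    """Read the Excel file at `path` into a DataFrame and verify its columns."""
    # Reading .xlsx files requires an Excel engine such as openpyxl.
    df = pd.read_excel(path)
    missing = check_columns(df)
    if missing:
        raise ValueError(f"Dataset is missing columns: {missing}")
    return df
```

For example, `df = load_dataset("path/to/your-file.xlsx")`, where the path is a placeholder for wherever you saved your copy of the dataset.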

## Step 2: Cleaning the Data

1. Check for missing values in the dataset. Before moving forward, identify which values are missing and in which columns.

   **Hint**: Pandas has a function that helps you detect missing values in your dataset.

2. Handle the missing values as follows:

    - For numerical data, fill the missing values with the mean of the column.
    - For categorical data, fill the missing values with the mode of the column (its most frequently occurring value).

   **Hint**: Pandas provides a function that fills missing values with a given value, which can be a calculated statistic such as the column's mean or mode.
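Both sub-steps can be sketched as below: `df.isna().sum()` counts missing values per column, and `fillna` fills them with the column mean (numeric columns) or mode (other columns). The helper names are hypothetical, and the sketch assumes `date` was read as a datetime column, which it deliberately leaves untouched because the exercise only specifies fills for numerical and categorical data.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def report_missing(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values in each column that has any."""
    counts = df.isna().sum()
    return counts[counts > 0]


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with gaps filled: mean for numbers, mode otherwise."""
    df = df.copy()
    for col in df.columns:
        if is_datetime64_any_dtype(df[col]):
            continue  # assumption: leave the date column as it is
        if is_numeric_dtype(df[col]):
            # e.g. number of customers, temperature: use the column mean
            df[col] = df[col].fillna(df[col].mean())
        else:
            # e.g. condition: use the most frequent value (the mode)
            modes = df[col].mode()
            if not modes.empty:  # mode() is empty if the column is all missing
                df[col] = df[col].fillna(modes.iloc[0])
    return df
```

Assigning the result back with `df[col] = ...` avoids relying on in-place `fillna`, which recent pandas versions discourage on column selections.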

## Step 3: Encoding Categorical Variables

1. There is one categorical variable in the dataset, `condition`. Most machine learning algorithms require numerical input, so you'll need to convert this variable into a numerical format before using it in your analysis.

   You can use a Pandas function that transforms a categorical column into multiple binary (0 or 1) columns.

   **Hint**: Look for a Pandas method that "expands" categories into separate columns, one per category.
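`pd.get_dummies` is the Pandas function that performs this one-hot encoding. A minimal sketch, assuming the categorical column is named `condition`; the helper name `encode_condition` is hypothetical:

```python
import pandas as pd


def encode_condition(df: pd.DataFrame, column: str = "condition") -> pd.DataFrame:
    """Replace `column` with one 0/1 indicator column per category."""
    # dtype=int yields 0/1 values; recent pandas versions default to True/False.
    # New columns are named "<column>_<category>", e.g. "condition_sunny".
    return pd.get_dummies(df, columns=[column], dtype=int)
```

Passing `drop_first=True` as well would drop one redundant indicator column, which some linear models prefer.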

## Step 4: Standardizing the Data

1. Scale the numerical features so that they are on a comparable scale. This matters because many machine learning algorithms, especially those based on distances or gradient descent, perform better when features have similar scales; otherwise a feature measured in large units can dominate one measured in small units.

   You can use a scaler from the `sklearn.preprocessing` module to standardize numerical features. Look for one that transforms each feature to have a mean of 0 and a standard deviation of 1.

   **Hint**: Fit the scaler on the numerical columns and then transform them (or do both in one call) before continuing to the next step.
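The scaler described here is `StandardScaler`, which learns each column's mean and (population) standard deviation and maps each value `x` to `(x - mean) / std`. A minimal sketch, assuming only `number of customers` and `temperature` are scaled, leaving the label and any 0/1 indicator columns unchanged; the helper name `standardize` is hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumption: only the two numeric input features are standardized.
NUMERIC_FEATURES = ["number of customers", "temperature"]


def standardize(df: pd.DataFrame, columns=None):
    """Return (scaled copy of df, fitted scaler) with columns at mean 0, std 1."""
    columns = list(columns) if columns is not None else NUMERIC_FEATURES
    df = df.copy()
    scaler = StandardScaler()
    # fit learns each column's mean and std; transform applies (x - mean) / std.
    scaled = scaler.fit_transform(df[columns])
    for i, col in enumerate(columns):
        df[col] = scaled[:, i]  # replace each column with its scaled values
    return df, scaler
```

Returning the fitted scaler lets you apply the same transformation to new data later with `scaler.transform`. In a full modelling workflow you would fit the scaler on the training split only, so that statistics from the test data do not leak into training.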

## Step 5: Save the Cleaned Data

1. **Create an Output Folder**: It’s always a good idea to keep your files organized. If the folder for saving outputs doesn’t exist, make sure to create it first.

   **Hint**: Python's standard library has a function for creating directories; check whether the folder already exists before creating it.

2. **Save the Cleaned Dataset**: After cleaning and processing the data, save it as an Excel file with the name `small-store-sales-clean.xlsx` inside the output folder.

   **Hint**: Pandas can write DataFrames to Excel files with a specific function.

3. **Verify the Output**: Once you’ve run the code, check the folder to confirm that the file was saved correctly.

   **Hint**: It’s good to manually check the folder in your project directory to see if the file appears as expected.
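The three steps above can be sketched as follows, using `os.makedirs` to create the folder and `DataFrame.to_excel` to write the file. The folder name `output` and the helper names are hypothetical, and writing `.xlsx` files requires an Excel engine such as `openpyxl`. `output_exists` is an optional programmatic complement to checking the folder by hand.

```python
import os

import pandas as pd

OUTPUT_DIR = "output"  # hypothetical; the exercise does not name the folder
OUTPUT_FILE = "small-store-sales-clean.xlsx"


def ensure_output_dir(output_dir: str = OUTPUT_DIR) -> None:
    """Create output_dir (and any missing parents) if it does not exist yet."""
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)


def save_clean_data(df: pd.DataFrame, output_dir: str = OUTPUT_DIR) -> str:
    """Write df to output_dir/small-store-sales-clean.xlsx and return the path."""
    ensure_output_dir(output_dir)
    path = os.path.join(output_dir, OUTPUT_FILE)
    # index=False keeps the DataFrame's row index out of the spreadsheet.
    df.to_excel(path, index=False)
    return path


def output_exists(path: str) -> bool:
    """Return True if the saved file is present on disk."""
    return os.path.isfile(path)
```

Calling `os.makedirs(output_dir, exist_ok=True)` is an equivalent one-line alternative to the explicit existence check.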