# Data Preprocessing Report

## 1. Data Loading and Exploration

### Dataset: Gym Dataset

- **Source:** Provided dataset
- **Rows:** 19
- **Columns:** 3
- **Features:** `Date`, `workout`, `time`

### Data Types & Missing Values

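```python
import pandas as pd

# Load the dataset (adjust the path to wherever the CSV is stored)
gym_df = pd.read_csv(r"C:\Users\lenovo\Downloads\archivegym.csv")

# Column dtypes and non-null counts
gym_df.info()

# Missing values per column
print(gym_df.isnull().sum())
```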

#### Observations:

- All **three columns** are loaded as strings; only `workout` is genuinely categorical.
- No missing values were detected.
- The `time` column is stored as a string and needs conversion.
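The checks behind these observations can be reproduced on a small in-memory sample. This is a minimal sketch: the rows below are made up, and the `Date` format and workout names are assumptions, not values taken from the gym dataset.

```python
import io
import pandas as pd

# Made-up rows in the same three-column layout; not the real gym data.
sample_csv = io.StringIO(
    "Date,workout,time\n"
    "2023-01-01,Legs,01:10:00\n"
    "2023-01-02,Rest,0\n"
)
sample_df = pd.read_csv(sample_csv)

# Basic sanity checks before any conversion.
print(sample_df.shape)                 # (rows, columns)
print(sample_df.dtypes)                # how each column was loaded
print(sample_df.isnull().sum().sum())  # total number of missing cells
```

Running the same three calls on `gym_df` should confirm the row and column counts and the absence of missing values reported above.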

---

## 2. Handling Data Types

Since `time` is stored as a string, we will convert it into **total minutes** for easier numerical analysis.

```python
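def convert_time_to_minutes(time_str):
    """Convert an 'HH:MM:SS' string to whole minutes; seconds are dropped."""
    time_str = str(time_str).strip()
    if ':' in time_str:
        h, m, _s = map(int, time_str.split(':'))
        return h * 60 + m
    # Values without a colon (e.g. '0') are treated as a minute count
    return int(time_str)

gym_df['time'] = gym_df['time'].apply(convert_time_to_minutes)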
```

#### Explanation:
- This function converts `HH:MM:SS` strings into **total minutes**; the seconds component is discarded.
- Values without a colon (e.g., `0`) are parsed as an integer number of minutes.
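To make the conversion rule concrete, here is a small standalone sketch. The helper `to_minutes` is a hypothetical stand-in that mirrors the logic described above, and the input values are made up for illustration.

```python
def to_minutes(value):
    """Hypothetical helper: 'HH:MM:SS' -> whole minutes, seconds dropped."""
    text = str(value).strip()
    if ":" in text:
        hours, minutes, _seconds = map(int, text.split(":"))
        return hours * 60 + minutes
    # No colon: treat the value as a plain minute count.
    return int(text)

# Made-up sample values, not rows from the gym dataset.
samples = ["01:30:00", "00:45:30", "0"]
converted = [to_minutes(s) for s in samples]
print(converted)  # [90, 45, 0]
```

Note that `00:45:30` becomes `45`, not `45.5`: if sub-minute precision matters, adding `seconds / 60` to the result would be one way to keep it.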

---

## 3. Encoding Categorical Variables

### Identified Categorical Columns:
- `workout` (Nominal)

```python
from sklearn.preprocessing import LabelEncoder

# Encode 'workout' column
label_encoder = LabelEncoder()
gym_df['workout'] = label_encoder.fit_transform(gym_df['workout'])
```

#### Explanation:
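- **Label Encoding** maps each `workout` category to an integer. Because `workout` is nominal, these integers carry no meaningful order; for models that treat inputs as numeric magnitudes, one-hot encoding is the safer choice.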
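For a nominal column, one-hot encoding is a common alternative that avoids implying an order between categories. The sketch below uses `pandas.get_dummies` on made-up workout names (the category values are assumptions, not taken from the dataset).

```python
import pandas as pd

# Made-up workout names for illustration only.
sample_df = pd.DataFrame({"workout": ["Legs", "Chest", "Legs", "Back"]})

# One indicator column per category; dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(sample_df, columns=["workout"], dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Each row has exactly one `1` across the indicator columns, so no category is treated as "larger" than another.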

---

## 4. Feature Scaling

- **Standardization** is applied to the `time` column.

```python
from sklearn.preprocessing import StandardScaler

# Standardize 'time'
scaler = StandardScaler()
gym_df[['time']] = scaler.fit_transform(gym_df[['time']])
```

#### Explanation:
- **Standardization** rescales `time` to zero mean and unit variance, which helps models that are sensitive to feature scale (e.g., distance-based or gradient-based methods).
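The arithmetic behind standardization can be sketched in plain Python: subtract the mean, then divide by the population standard deviation (the divide-by-n form, which is what `StandardScaler` uses). The durations below are made up for illustration.

```python
import statistics

# Made-up durations in minutes, for illustration only.
minutes = [30, 60, 90]

mean = statistics.fmean(minutes)
# Population standard deviation (divides by n, not n - 1).
std = statistics.pstdev(minutes)

# z-score: (value - mean) / standard deviation
scaled = [(m - mean) / std for m in minutes]
print(scaled)
```

After scaling, the values have mean 0 and population standard deviation 1, regardless of the original units.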

---

## 5. Train-Test Split

Since there is no explicit target variable, this step is skipped.

---


