# **Machine Learning Workflow - Brief Revision Notes**

### 1. **Load Dataset**

* Import the data into a pandas DataFrame (e.g., with `pd.read_csv`).
* Purpose: Get data ready for analysis and modeling.
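
A minimal sketch of the loading step. The housing-style columns and values here are made up for illustration, and the CSV is read from an in-memory string so the snippet runs without a file on disk; in practice you would pass a file path to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
csv_text = """Id,Area,Bedrooms,Location,Price
1,1200,3,Suburb,250000
2,850,2,City,310000
3,1500,4,Rural,180000
4,950,2,City,295000
"""

# With a real file this would be pd.read_csv("path/to/file.csv").
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.head())  # first rows as a quick sanity check
```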

### 2. **Check for Missing Values**

* Use `df.isnull().sum()` to identify missing data.
* Handle missing values (drop the affected rows or columns, or impute them, e.g., with the median or mode), or confirm none exist.
* Purpose: Ensure data quality.
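
The check and one possible way of handling gaps can be sketched as follows. The DataFrame is made up, and median/mode imputation is only one common choice, not the only valid one; dropping rows with `df.dropna()` is another.

```python
import numpy as np
import pandas as pd

# Made-up data with one gap in a numeric column and one in a categorical column.
df = pd.DataFrame({
    "Area": [1200.0, np.nan, 1500.0, 950.0],
    "Location": ["Suburb", "City", None, "City"],
    "Price": [250000, 310000, 180000, 295000],
})

missing_counts = df.isnull().sum()  # number of missing values per column
print(missing_counts)

# Impute: median for the numeric column, most frequent value for the categorical one.
df["Area"] = df["Area"].fillna(df["Area"].median())
df["Location"] = df["Location"].fillna(df["Location"].mode()[0])

remaining = int(df.isnull().sum().sum())  # total missing values left
print(remaining)
```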

### 3. **Understand Data Types & Target**

* Use `df.info()` for column dtypes and non-null counts, and `df.describe()` for summary statistics of the numeric columns.
* Identify numeric and categorical columns.
* Identify target variable (e.g., `Price` for regression).
* Purpose: Know what kind of data you have and what to predict.
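
A short sketch of this inspection step, again on made-up data. Separating columns with `select_dtypes` is one convenient approach; the variable names `target`, `numeric_cols`, and `categorical_cols` are illustrative.

```python
import pandas as pd

# Hypothetical dataset with numeric and categorical columns.
df = pd.DataFrame({
    "Area": [1200, 850, 1500, 950],
    "Bedrooms": [3, 2, 4, 2],
    "Location": ["Suburb", "City", "Rural", "City"],
    "Price": [250000, 310000, 180000, 295000],
})

df.info()             # prints dtypes and non-null counts
print(df.describe())  # count, mean, std, min, quartiles, max for numeric columns

target = "Price"  # what we want to predict (regression)

# Numeric feature columns (target excluded) and non-numeric columns.
numeric_cols = df.select_dtypes(include="number").columns.drop(target).tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols, categorical_cols)
```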

### 4. **Exploratory Data Analysis (EDA) with Visualization**

* Visualize distributions (`histplot`), relationships (`scatterplot`), and differences across categories (`boxplot`), e.g., with seaborn.
* Use a correlation heatmap to see linear relationships between numeric features.
* Purpose: Find patterns, detect outliers, and see which features look related to the target.

### 5. **Data Preprocessing**

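* **Drop irrelevant columns** (e.g., IDs).
* **Encode categorical variables** with One-Hot Encoding (`pd.get_dummies`).

  * Why? Most ML models need numeric inputs, not text.
* **Feature Scaling** (Standardization with `StandardScaler`).

  * Why? Puts features on comparable scales. This matters most for regularized linear models (Ridge, Lasso), gradient-descent-based models, and distance-based models (KNN, SVM); plain Linear Regression predictions are unaffected, and tree-based models do not need it.
  * Fit the scaler on the training set only (after the split in step 6), then apply it to the test set, to avoid data leakage.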
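
A minimal sketch of the three preprocessing steps on made-up data. The split is included here only because the scaler should learn its mean and standard deviation from the training rows; `drop_first=True` and `dtype=float` are optional choices (the first avoids perfectly collinear dummy columns in linear models), not requirements.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical housing data with an identifier column.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 7, 8],
    "Area": [800, 950, 1100, 1200, 1350, 1500, 1650, 1800],
    "Location": ["City", "City", "Suburb", "Suburb",
                 "Rural", "Rural", "City", "Suburb"],
    "Price": [210000, 240000, 230000, 255000,
              190000, 215000, 340000, 330000],
})

# 1. Drop the identifier: it carries no predictive signal.
df = df.drop(columns=["Id"])

# 2. One-hot encode the categorical feature.
X = pd.get_dummies(df.drop(columns=["Price"]), columns=["Location"],
                   drop_first=True, dtype=float)
y = df["Price"]

# 3. Scale, fitting on the training rows only to avoid leakage.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std, then transform
X_test_scaled = scaler.transform(X_test)        # reuse the training mean/std

print(list(X.columns))
```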

### 6. **Train-Test Split**

* Split dataset into training and testing sets (e.g., 80-20).
* Purpose: Evaluate model performance on unseen data.
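
An 80-20 split can be sketched with scikit-learn's `train_test_split`. The data is made up, and `random_state=42` is an arbitrary seed chosen only to make the split reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical features and target (10 rows).
X = pd.DataFrame({
    "Area": [800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700],
    "Bedrooms": [2, 2, 2, 3, 3, 3, 4, 4, 4, 5],
})
y = pd.Series(
    [200000, 215000, 228000, 250000, 262000,
     275000, 301000, 312000, 330000, 355000],
    name="Price",
)

# test_size=0.2 keeps 20% of the rows aside for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```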

### 7. **Train Model**

* Start with a simple model such as Linear Regression.
* Fit the model on the training data only.
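
A sketch of fitting a baseline with scikit-learn's `LinearRegression`. The training data is synthetic: the target is built from a known formula (`100 * Area + 15000 * Bedrooms + 20000`, an arbitrary choice) so the fitted coefficients can be checked against it. Real data will not be this clean.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the training portion of a dataset.
X_train = pd.DataFrame({
    "Area": [800, 950, 1100, 1200, 1350, 1500, 1650, 1800],
    "Bedrooms": [2, 2, 3, 3, 3, 4, 4, 5],
})
# Noise-free target from a known linear rule (assumption for illustration).
y_train = 100 * X_train["Area"] + 15000 * X_train["Bedrooms"] + 20000

model = LinearRegression()
model.fit(X_train, y_train)  # estimates one coefficient per feature plus an intercept

# Inspect what the model learned; should be close to the rule above.
print(dict(zip(X_train.columns, model.coef_)))
print(model.intercept_)
```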

### 8. **Make Predictions**

* Use the trained model to predict on the test data, which it has not seen during training.
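
A sketch of the prediction step on made-up data. The split and fit are repeated here only so the snippet runs on its own; the `comparison` table is an illustrative way to eyeball predictions next to the true values.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical housing data.
df = pd.DataFrame({
    "Area": [800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700],
    "Bedrooms": [2, 2, 2, 3, 3, 3, 4, 4, 4, 5],
    "Price": [200000, 215000, 228000, 250000, 262000,
              275000, 301000, 312000, 330000, 355000],
})
X = df[["Area", "Bedrooms"]]
y = df["Price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Predict only on rows the model has not seen.
y_pred = model.predict(X_test)

# Side-by-side view of actual and predicted prices.
comparison = y_test.to_frame("actual").assign(predicted=y_pred)
print(comparison)
```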

### 9. **Evaluate Model**

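* Use metrics like Mean Absolute Error (MAE) and R² score.
* Interpret the results: MAE is the average absolute error in the target's units; R² is the share of the target's variance the model explains (1 is a perfect fit, 0 is no better than always predicting the mean).
* Visualize actual vs predicted values with a scatter plot; points close to the diagonal indicate good predictions.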
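
A small worked sketch of both metrics and the actual-vs-predicted plot. The values in `y_test` and `y_pred` are made up and small enough to verify by hand: the absolute errors are 10, 10, 10, and 20, so MAE is 12.5, and R² is 1 - 700/50000 = 0.986. The `Agg` backend line is only for running without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up actual and predicted values.
y_test = [200, 300, 400, 500]
y_pred = [210, 290, 410, 480]

mae = mean_absolute_error(y_test, y_pred)  # mean of |actual - predicted|
r2 = r2_score(y_test, y_pred)              # 1 - SS_res / SS_tot

print(f"MAE: {mae:.1f}")  # MAE: 12.5
print(f"R^2: {r2:.3f}")   # R^2: 0.986

# Actual vs predicted: perfect predictions would lie on the dashed diagonal.
fig, ax = plt.subplots()
ax.scatter(y_test, y_pred)
lims = [min(y_test), max(y_test)]
ax.plot(lims, lims, linestyle="--")
ax.set_xlabel("Actual")
ax.set_ylabel("Predicted")
```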

### 10. **Interpretation & Next Steps**

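* A low R² or high error may mean the model is too simple (underfitting), but it can also point to uninformative features or noisy data.
* Explore more flexible models (Random Forest, Gradient Boosting).
* Consider feature engineering or data transformations (e.g., a log transform of a skewed target).
* Understand the model's limitations and improve iteratively.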
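
As a rough illustration of these next steps, the sketch below compares a plain linear model, the same model with one engineered feature, and a Random Forest using 5-fold cross-validated R². The data is synthetic with a deliberately curved relationship (`y = x**2` plus noise), so the plain linear model is expected to underfit; the seed, sample size, and number of trees are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data with a non-linear relationship (assumption for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = x**2 + rng.normal(scale=0.3, size=200)

X_raw = x.reshape(-1, 1)                   # original feature only
X_engineered = np.column_stack([x, x**2])  # feature engineering: add x squared

candidates = {
    "linear (raw)": (LinearRegression(), X_raw),
    "linear (+ x^2)": (LinearRegression(), X_engineered),
    "random forest (raw)": (
        RandomForestRegressor(n_estimators=50, random_state=0),
        X_raw,
    ),
}

# Mean R^2 across 5 cross-validation folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, (model, X) in candidates.items()
}

for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

On data like this, either adding the right feature or switching to a more flexible model should close most of the gap; which route works better on real data has to be checked empirically.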