# House Price Prediction App

This project is a simple **machine learning web application**, built with **Streamlit**, that predicts house prices from features of the California Housing dataset.


## Project Structure

```
model/
├── app.py                  # Streamlit app
├── housing new (2).csv     # Dataset
├── model.pkl               # Trained Random Forest model
├── scaler.pkl              # StandardScaler object
└── README.md               # Instructions
```


## How to Run the App

### Requirements

Ensure you have Python 3.7+ installed, then install the required packages:

```bash
pip install -r requirements.txt
```


Or install only what's needed:

```bash
pip install streamlit pandas numpy scikit-learn matplotlib seaborn
```


### Running the App

From the project directory (where `app.py` is located), run:

```bash
streamlit run app.py
```

This will launch the web app in your default browser.
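
Since the project ships both `model.pkl` and `scaler.pkl`, the app presumably scales the user's inputs before asking the model for a prediction. The sketch below illustrates that step only; it is not taken from `app.py`. The helper name `predict_price`, the two-feature stand-in data, and the stand-in model are all assumptions made so the example runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler


def predict_price(model, scaler, feature_values):
    """Scale one row of raw feature values and return the model's prediction."""
    row = np.asarray(feature_values, dtype=float).reshape(1, -1)
    return float(model.predict(scaler.transform(row))[0])


# Stand-in objects so the sketch is self-contained (assumption); the real app
# would instead load the fitted objects stored in model.pkl and scaler.pkl.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3.0 * X[:, 0] + X[:, 1]
scaler = StandardScaler().fit(X)
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(scaler.transform(X), y)

estimate = predict_price(model, scaler, [5.0, 2.0])
print(f"Estimated value: {estimate:.2f}")
```

In a Streamlit script, the feature values would typically come from input widgets such as `st.number_input`, in the same column order that was used when the scaler and model were fitted.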


## Data Cleaning Process

1. **Load Dataset:**
    ```python
    import pandas as pd

    df = pd.read_csv("housing new (2).csv")
    ```

2. **Handle Missing Values:**
    - Filled missing values in `total_bedrooms` using the median.
    ```python
    df['total_bedrooms'] = df['total_bedrooms'].fillna(df['total_bedrooms'].median())
    ```

3. **Encode Categorical Variable:**
    - Used `pd.get_dummies()` to convert `ocean_proximity` into numerical columns.

    ```python
    df = pd.get_dummies(df, columns=['ocean_proximity'])
    ```

4. **Remove Outliers:**
    - Applied the IQR method on the following columns:
        - `total_rooms`, `total_bedrooms`, `population`, `households`, `median_income`, `median_house_value`

    ```python
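    def remove_outliers_iqr(df, columns):
        """Drop rows outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for each column."""
        for col in columns:
            Q1 = df[col].quantile(0.25)
            Q3 = df[col].quantile(0.75)
            IQR = Q3 - Q1
            lower = Q1 - 1.5 * IQR
            upper = Q3 + 1.5 * IQR
            # The frame is filtered column by column, so later columns use
            # quartiles computed on the already-filtered rows.
            df = df[(df[col] >= lower) & (df[col] <= upper)]
        return df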

    df = remove_outliers_iqr(df, ['total_rooms', 'total_bedrooms', 'population',
                                  'households', 'median_income', 'median_house_value'])
    ```
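
As a quick sanity check of steps 2 to 4, the following sketch runs the same operations on a small made-up DataFrame. The toy values are illustrative only and are not taken from the dataset, and the filter is written with `Series.between`, which is equivalent to the two comparisons used in step 4.

```python
import pandas as pd


def remove_outliers_iqr(df, columns):
    """Keep rows within [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for each column in turn."""
    for col in columns:
        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        df = df[df[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
    return df


# Hypothetical toy data: one missing bedroom count and one extreme income.
toy = pd.DataFrame({
    "total_bedrooms": [100.0, None, 120.0, 110.0, 130.0, 105.0, 115.0],
    "median_income": [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 40.0],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "NEAR OCEAN",
                        "INLAND", "NEAR BAY", "INLAND"],
})

# Step 2: the missing value becomes the median of the six known values (112.5).
toy["total_bedrooms"] = toy["total_bedrooms"].fillna(toy["total_bedrooms"].median())

# Step 3: one indicator column per category, e.g. ocean_proximity_INLAND.
toy = pd.get_dummies(toy, columns=["ocean_proximity"])

# Step 4: the income bounds are [1.0, 7.0], so only the 40.0 row is dropped.
cleaned = remove_outliers_iqr(toy, ["median_income"])
print(cleaned.shape[0])  # 6
```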



## Machine Learning Model

- **Algorithms Used:**
  - Linear Regression
  - Random Forest (selected as the final model and saved as `model.pkl`)

- **Scaling:**
  - Applied `StandardScaler` to the input features; the fitted scaler is saved as `scaler.pkl`.
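
The training script is not part of this README, so the following is only a sketch of how the two algorithms could be compared and the artifacts produced. The 80/20 split, the hyperparameters, fitting the scaler on the training split only, the use of `pickle`, the helper name `train_and_save`, and the synthetic stand-in data are all assumptions.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def train_and_save(X, y, out_dir):
    """Fit a scaler and both candidate models, then pickle the scaler and the forest."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Fit the scaler on the training split only, then reuse it for the test split.
    scaler = StandardScaler().fit(X_train)
    X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

    models = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train_s, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test_s))

    # Save the artifacts under the file names listed in the project structure.
    out_dir = Path(out_dir)
    with open(out_dir / "model.pkl", "wb") as f:
        pickle.dump(models["random_forest"], f)
    with open(out_dir / "scaler.pkl", "wb") as f:
        pickle.dump(scaler, f)
    return scores


# Synthetic stand-in data (assumption); the project would pass the cleaned
# housing features and the median_house_value target instead.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] + np.sin(X_demo[:, 1]) + rng.normal(scale=0.1, size=200)

with tempfile.TemporaryDirectory() as tmp:
    scores = train_and_save(X_demo, y_demo, tmp)
    saved_files = sorted(p.name for p in Path(tmp).iterdir())

print(scores)
print(saved_files)
```

Saving the scaler alongside the model matters because the app has to apply the same transformation to user inputs that the model saw during training.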



## Deployment

Run the app locally with Streamlit as described above. You can also deploy it on platforms such as:
- Streamlit Cloud
- Hugging Face Spaces
- Render or Heroku



## Authors

Built using Python, pandas, and Streamlit by Tshiamo Chibua, Tibawo Timuhwe, Phenyo Sithelo, Wame Oduetse, and Oabile Moroka.
