# **Exploratory Data Analysis (EDA) with the cleaned Ames Housing dataset**

## Objectives

* Read `cleaned_data.csv` and confirm the dataset shape, column structure, and absence of missing values.
* Use appropriate libraries for analysis and visualisation:
   - `pandas` → data manipulation  
   - `numpy` → numerical operations  
   - `matplotlib.pyplot` → plotting distributions and relationships  
   - `seaborn` → advanced visualisations (correlation heatmap, boxplots, scatterplots)
* Explore the distribution of important numerical and categorical features.
* Investigate how different house attributes relate to `SalePrice`.
* Visualise correlations to identify the strongest predictor variables.
* Compare the client's inherited houses to the overall Ames market.
* Use the insights gained to guide feature selection for the modelling notebook.


## Inputs

* `data/processed/cleaned_data.csv` - The fully cleaned main Ames Housing dataset produced in `02_data_cleaning.ipynb`.
* `data/raw/inherited_houses.csv` - The four inherited properties that must be analysed and later predicted.
* The following Python libraries: `pandas`, `numpy`, `matplotlib.pyplot`, `seaborn`


## Outputs

* Distribution plots (histograms, KDEs, and boxplots) for key numerical features.
* Countplots/bar charts for categorical variables.
* Scatterplots showing relationships between numerical predictors and `SalePrice`.
* Boxplots showing how categorical variables influence sale price.
* A correlation matrix and correlation heatmap for numerical features.
* Visual comparisons between inherited houses and the wider Ames dataset.
* A summary of analysis findings to support feature selection for modelling.



## Additional Comments

* This notebook does not perform further data cleaning; all cleaning operations were completed in `02_data_cleaning.ipynb`.
* Plots generated in this notebook will be saved for later use in the Streamlit dashboard.
* All plots and insights should contribute to answering the client’s business questions:
  1. Which features affect house price the most?
  2. How do the client’s inherited houses compare to the market?
* Insights from this analysis will directly influence the feature engineering and model training steps in the next notebook.


---

### Step 1: Load the cleaned dataset

In this step, I load `cleaned_data.csv`, the dataset produced in the `02_data_cleaning.ipynb` notebook. It already incorporates all of the earlier data preparation steps and serves as the basis for all exploratory visualisations in this notebook.

The client's inherited houses dataset will be loaded later, only when needed for direct comparison during the EDA. This helps keep the notebook focused and avoids introducing unused variables at the start.
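
A minimal sketch of this loading step, assuming the notebook's working directory is the project root so that the relative path from the Inputs section resolves. The helper name `load_and_check` is hypothetical and only illustrates the shape and missing-value checks listed in the Objectives.

```python
from pathlib import Path

import pandas as pd

# Path taken from the Inputs section; assumes the working directory is the project root.
CLEANED_PATH = Path("data/processed/cleaned_data.csv")


def load_and_check(source):
    """Read a CSV and report its shape and any columns with missing values.

    Hypothetical helper for illustration; returns the DataFrame and a Series
    of per-column missing-value counts (only columns with at least one gap).
    """
    df = pd.read_csv(source)
    missing = df.isna().sum()
    missing = missing[missing > 0]
    print(f"Shape: {df.shape[0]} rows x {df.shape[1]} columns")
    print(f"Columns with missing values: {len(missing)}")
    return df, missing


# Only attempt the load when the file is actually present.
if CLEANED_PATH.exists():
    df, missing = load_and_check(CLEANED_PATH)
    df.info()  # column names, dtypes, and non-null counts
```

If `missing` comes back non-empty, that would point to a problem in the cleaning notebook rather than something to patch here, since this notebook performs no further cleaning.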


---
