{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Cleaning and Visualization\n",
    "This notebook demonstrates practical **data preprocessing and visualization** in Python using pandas, Matplotlib, and seaborn. It is part of my Data Science coursework and works with a dataset of Ethiopian employee demographic information (Name, Gender, Age, City, Salary)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "# Set the visualization style (set_theme is the current name for sns.set)\n",
    "sns.set_theme(style='whitegrid')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Cleaned Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.read_csv('../data/cleaned_data.csv')\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Exploration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Basic Information\n",
    "data.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Summary Statistics\n",
    "data.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualizations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1️⃣ Age Distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(6,4))\n",
    "sns.histplot(data['Age'], bins=6, kde=True, color='steelblue')\n",
    "plt.title('Age Distribution of Employees')\n",
    "plt.xlabel('Age')\n",
    "plt.ylabel('Count')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2️⃣ Salary Distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(6,4))\n",
    "sns.histplot(data['Salary'], bins=6, kde=True, color='darkorange')\n",
    "plt.title('Salary Distribution (ETB)')\n",
    "plt.xlabel('Salary')\n",
    "plt.ylabel('Count')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3️⃣ Average Salary by City"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(7,4))\n",
    "# barplot aggregates with the mean by default, so no estimator argument is needed\n",
    "sns.barplot(x='City', y='Salary', data=data, palette='viridis')\n",
    "plt.title('Average Salary by City')\n",
    "plt.xlabel('City')\n",
    "plt.ylabel('Average Salary (ETB)')\n",
    "plt.xticks(rotation=25)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4️⃣ Salary by Gender"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(6,4))\n",
    "sns.boxplot(x='Gender', y='Salary', data=data, palette='Set2')\n",
    "plt.title('Salary Distribution by Gender')\n",
    "plt.xlabel('Gender')\n",
    "plt.ylabel('Salary (ETB)')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5️⃣ Correlation Heatmap"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Keep only numeric columns, since correlation is not defined for text columns\n",
    "numeric_data = data.select_dtypes(include='number')\n",
    "plt.figure(figsize=(4,3))\n",
    "sns.heatmap(numeric_data.corr(), annot=True, cmap='coolwarm', fmt='.2f')\n",
    "plt.title('Correlation Matrix')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6️⃣ Relationship Between Age and Salary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(6,4))\n",
    "sns.scatterplot(x='Age', y='Salary', hue='Gender', data=data, palette='cool', s=100)\n",
    "plt.title('Age vs Salary Relationship')\n",
    "plt.xlabel('Age')\n",
    "plt.ylabel('Salary (ETB)')\n",
    "plt.legend(title='Gender')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "This notebook gives an overview of the employee demographic and salary data. It explores **age and salary distributions**, compares salaries **by gender and city**, and examines the **relationship between age and salary**, producing visual summaries that can support reporting and decision-making."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
