{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A/B Hypothesis Testing Notebook\n",
    "\n",
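    "This notebook runs A/B hypothesis tests on the insurance dataset (`MachineLearningRating_v3.txt`) to check whether risk (`TotalClaims`) and profit margin differ across provinces, postal codes, and gender."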
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
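    "## 1. Introduction & Hypotheses\n",
    "\n",
    "Each test below checks one null hypothesis (H₀). Risk is measured by `TotalClaims` and profitability by `Margin` (`TotalPremium - TotalClaims`).\n",
    "\n",
    "- H₀: There are no risk differences across provinces.\n",
    "- H₀: There are no risk differences between zip codes.\n",
    "- H₀: There is no significant margin (profit) difference between zip codes.\n",
    "- H₀: There is no significant risk difference between women and men."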
   ]
  },
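  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before running formal tests, a descriptive summary can show what *risk* looks like in each group. The cell below is a hedged illustration, not part of `src/ab_testing.py`: the helper name `risk_summary` is hypothetical, and claim frequency (the share of policies with `TotalClaims > 0`) is an assumed extra metric shown alongside mean `TotalClaims` and mean margin (`TotalPremium - TotalClaims`)."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import pandas as pd\n",
    "\n",
    "\n",
    "def risk_summary(df: pd.DataFrame, group_col: str) -> pd.DataFrame:\n",
    "    # Hypothetical helper: per-group policy count, claim frequency, mean claims and mean margin\n",
    "    claims = df['TotalClaims']\n",
    "    work = pd.DataFrame({\n",
    "        group_col: df[group_col],\n",
    "        'has_claim': (claims > 0).astype(float),  # assumed definition of a claim event\n",
    "        'TotalClaims': claims,\n",
    "        'Margin': df['TotalPremium'] - claims,\n",
    "    })\n",
    "    summary = work.groupby(group_col).agg(\n",
    "        policies=('TotalClaims', 'size'),\n",
    "        claim_frequency=('has_claim', 'mean'),\n",
    "        mean_claims=('TotalClaims', 'mean'),\n",
    "        mean_margin=('Margin', 'mean'),\n",
    "    )\n",
    "    # Highest mean claims first\n",
    "    return summary.sort_values('mean_claims', ascending=False)\n",
    "\n",
    "\n",
    "# Usage, once df has been loaded in Section 2: risk_summary(df, 'Province')"
   ],
   "execution_count": null,
   "outputs": []
  },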
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
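    "import sys\n",
    "\n",
    "# Put the project root (one level up from this notebook) on the path so that\n",
    "# the `src` package and its modules can be imported below\n",
    "sys.path.append('..')\n",
    "\n",
    "import pandas as pd\n",
    "import src.data_loader as data_loader\n",
    "import src.ab_testing as ab_testing"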
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Data Loading"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Load the raw data file, then apply the project's cleaning step\n",
    "df = data_loader.load_data('../data/MachineLearningRating_v3.txt')\n",
    "df = data_loader.clean_data(df)\n",
    "df.head()"
   ],
   "execution_count": null,
   "outputs": []
  },
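  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tests below assume that `clean_data` returns the columns they use, with numeric premiums and claims. A quick check can catch problems before any test runs. This is a sketch: `check_test_columns` and the column lists are assumptions based on the columns this notebook uses, not part of `src/data_loader.py`."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Assumed: the columns referenced by the tests in this notebook\n",
    "REQUIRED_COLUMNS = ['Province', 'PostalCode', 'Gender', 'TotalPremium', 'TotalClaims']\n",
    "NUMERIC_COLUMNS = ['TotalPremium', 'TotalClaims']\n",
    "\n",
    "\n",
    "def check_test_columns(df: pd.DataFrame) -> pd.DataFrame:\n",
    "    # Hypothetical sanity check: fail early if a column used by the tests is missing\n",
    "    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]\n",
    "    if missing:\n",
    "        raise KeyError(f'Missing columns: {missing}')\n",
    "    out = df.copy()\n",
    "    for col in NUMERIC_COLUMNS:\n",
    "        # Coerce to numbers; unparseable values become NaN instead of raising\n",
    "        out[col] = pd.to_numeric(out[col], errors='coerce')\n",
    "    # Report how many rows have a missing value in either numeric column\n",
    "    n_bad = int(out[NUMERIC_COLUMNS].isna().any(axis=1).sum())\n",
    "    print(f'{n_bad} rows have missing or non-numeric premium/claims values')\n",
    "    return out\n",
    "\n",
    "\n",
    "# Usage: df = check_test_columns(df)"
   ],
   "execution_count": null,
   "outputs": []
  },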
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Hypothesis Testing: Province Risk Differences"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "province_results = ab_testing.t_test_by_group(df, 'Province', 'TotalClaims')\n",
    "province_results"
   ],
   "execution_count": null,
   "outputs": []
  },
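  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A t-test compares two groups, while `Province` has more than two levels. One alternative, which is not necessarily what `t_test_by_group` does, is a single omnibus test across all provinces: a one-way ANOVA (`scipy.stats.f_oneway`) compares group means, and the Kruskal-Wallis test (`scipy.stats.kruskal`) is a rank-based option that does not assume normality, which may suit `TotalClaims` if, as is common in insurance data, most policies have no claims. The helper name `multi_group_tests` and the `min_size` cutoff are assumptions."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import pandas as pd\n",
    "from scipy import stats\n",
    "\n",
    "\n",
    "def multi_group_tests(df: pd.DataFrame, group_col: str, metric_col: str,\n",
    "                      min_size: int = 30) -> dict:\n",
    "    # Hypothetical helper: one omnibus test across all groups instead of pairwise t-tests\n",
    "    data = df[[group_col, metric_col]].dropna()\n",
    "    # Skip very small groups; the cutoff of 30 is an assumption, not a rule\n",
    "    groups = [g[metric_col].to_numpy() for _, g in data.groupby(group_col)\n",
    "              if len(g) >= min_size]\n",
    "    if len(groups) < 2:\n",
    "        raise ValueError('Need at least two groups with min_size observations')\n",
    "    f_stat, f_p = stats.f_oneway(*groups)  # compares means; assumes roughly normal groups\n",
    "    h_stat, h_p = stats.kruskal(*groups)   # rank-based; no normality assumption\n",
    "    return {'n_groups': len(groups),\n",
    "            'anova_F': f_stat, 'anova_p': f_p,\n",
    "            'kruskal_H': h_stat, 'kruskal_p': h_p}\n",
    "\n",
    "\n",
    "# Usage: multi_group_tests(df, 'Province', 'TotalClaims')"
   ],
   "execution_count": null,
   "outputs": []
  },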
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Hypothesis Testing: Zipcode Risk Differences"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "zipcode_results = ab_testing.t_test_by_group(df, 'PostalCode', 'TotalClaims')\n",
    "zipcode_results"
   ],
   "execution_count": null,
   "outputs": []
  },
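  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Postal codes have many levels, many with few policies, and testing every pair inflates the chance of false positives. A hedged sketch, assuming we restrict attention to the `top_k` most common postal codes: run pairwise Welch t-tests (`scipy.stats.ttest_ind` with `equal_var=False`) and apply a Bonferroni correction, which multiplies each p-value by the number of comparisons and caps it at 1. `pairwise_welch_bonferroni` and `top_k` are illustrative names, not part of `src/ab_testing.py`."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from itertools import combinations\n",
    "\n",
    "import pandas as pd\n",
    "from scipy import stats\n",
    "\n",
    "\n",
    "def pairwise_welch_bonferroni(df: pd.DataFrame, group_col: str, metric_col: str,\n",
    "                              top_k: int = 5) -> pd.DataFrame:\n",
    "    # Hypothetical helper: keep only the top_k most frequent groups\n",
    "    data = df[[group_col, metric_col]].dropna()\n",
    "    top = data[group_col].value_counts().index[:top_k]\n",
    "    samples = {g: data.loc[data[group_col] == g, metric_col].to_numpy() for g in top}\n",
    "    pairs = list(combinations(top, 2))\n",
    "    rows = []\n",
    "    for a, b in pairs:\n",
    "        # Welch's t-test: does not assume equal variances in the two groups\n",
    "        t_stat, p = stats.ttest_ind(samples[a], samples[b], equal_var=False)\n",
    "        rows.append({'group_a': a, 'group_b': b, 't_stat': t_stat, 'p_value': p,\n",
    "                     # Bonferroni: scale by the number of comparisons, cap at 1\n",
    "                     'p_bonferroni': min(p * len(pairs), 1.0)})\n",
    "    cols = ['group_a', 'group_b', 't_stat', 'p_value', 'p_bonferroni']\n",
    "    return pd.DataFrame(rows, columns=cols).sort_values('p_bonferroni').reset_index(drop=True)\n",
    "\n",
    "\n",
    "# Usage: pairwise_welch_bonferroni(df, 'PostalCode', 'TotalClaims', top_k=5)"
   ],
   "execution_count": null,
   "outputs": []
  },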
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Hypothesis Testing: Margin Differences by Zipcode\n",
    "- `Margin = TotalPremium - TotalClaims`, i.e. profit per policy (negative when claims exceed the premium)."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Profit per policy: premium collected minus claims paid\n",
    "df['Margin'] = df['TotalPremium'] - df['TotalClaims']\n",
    "margin_results = ab_testing.t_test_by_group(df, 'PostalCode', 'Margin')\n",
    "margin_results"
   ],
   "execution_count": null,
   "outputs": []
  },
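  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the question is whether two specific postal codes differ in margin, a two-sample Welch t-test frames it directly. The sketch below assumes the two codes are chosen up front (for example the two most common ones) and that `Margin` has been computed as above; `compare_two_groups` is a hypothetical helper that also reports each group's mean so the direction of any difference is visible."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import pandas as pd\n",
    "from scipy import stats\n",
    "\n",
    "\n",
    "def compare_two_groups(df: pd.DataFrame, group_col: str, metric_col: str,\n",
    "                       group_a, group_b) -> dict:\n",
    "    # Hypothetical helper: Welch's two-sample t-test between two chosen groups\n",
    "    a = df.loc[df[group_col] == group_a, metric_col].dropna()\n",
    "    b = df.loc[df[group_col] == group_b, metric_col].dropna()\n",
    "    t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)\n",
    "    return {'mean_a': a.mean(), 'mean_b': b.mean(),\n",
    "            'n_a': len(a), 'n_b': len(b),\n",
    "            't_stat': t_stat, 'p_value': p_value}\n",
    "\n",
    "\n",
    "# Usage (two most common postal codes):\n",
    "# top_two = df['PostalCode'].value_counts().index[:2]\n",
    "# compare_two_groups(df, 'PostalCode', 'Margin', top_two[0], top_two[1])"
   ],
   "execution_count": null,
   "outputs": []
  },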
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Hypothesis Testing: Gender Risk Differences"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "gender_results = ab_testing.t_test_by_group(df, 'Gender', 'TotalClaims')\n",
    "gender_results"
   ],
   "execution_count": null,
   "outputs": []
  },
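  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`TotalClaims` mixes two things: whether a policy claims at all, and how large its claims are. A complementary check, sketched here as an assumption rather than the project's method, compares claim *frequency* between women and men with a chi-squared test of independence (`scipy.stats.chi2_contingency`) on a 2x2 table of gender versus has-claim. The labels `'Female'` and `'Male'` are assumed values of the `Gender` column, any other values are excluded, and `claim_frequency_chi2` is a hypothetical name."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import pandas as pd\n",
    "from scipy.stats import chi2_contingency\n",
    "\n",
    "\n",
    "def claim_frequency_chi2(df: pd.DataFrame, group_col: str = 'Gender',\n",
    "                         levels=('Female', 'Male')) -> dict:\n",
    "    # Keep only the two levels being compared (label values are an assumption)\n",
    "    data = df[df[group_col].isin(levels)]\n",
    "    has_claim = data['TotalClaims'] > 0\n",
    "    # 2x2 contingency table: rows = group level, columns = claimed (False/True)\n",
    "    table = pd.crosstab(data[group_col], has_claim)\n",
    "    chi2, p_value, dof, _expected = chi2_contingency(table)\n",
    "    return {'table': table, 'chi2': chi2, 'p_value': p_value, 'dof': dof,\n",
    "            'claim_rate': has_claim.groupby(data[group_col]).mean().to_dict()}\n",
    "\n",
    "\n",
    "# Usage: claim_frequency_chi2(df)"
   ],
   "execution_count": null,
   "outputs": []
  },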
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
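    "## 7. Interpretation & Recommendations\n",
    "- For each test, reject H₀ if the p-value is below the chosen significance level (for example α = 0.05); otherwise, fail to reject it.\n",
    "- Summarize the findings and their business implications here."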
   ]
  }
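  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To keep the summary consistent, the p-values from each test can be collected into one decision table. This is a hedged sketch: `decision_table` is a hypothetical helper, α = 0.05 is only a default, and how to pull a p-value out of each result depends on what `ab_testing.t_test_by_group` returns."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import pandas as pd\n",
    "\n",
    "\n",
    "def decision_table(p_values: dict, alpha: float = 0.05) -> pd.DataFrame:\n",
    "    # Hypothetical helper: map each hypothesis name to a reject / fail-to-reject decision\n",
    "    rows = [{'hypothesis': name, 'p_value': p,\n",
    "             'decision': 'reject H0' if p < alpha else 'fail to reject H0'}\n",
    "            for name, p in p_values.items()]\n",
    "    return pd.DataFrame(rows, columns=['hypothesis', 'p_value', 'decision'])\n",
    "\n",
    "\n",
    "# Usage (p_province and p_gender are placeholder names for p-values taken from the results above):\n",
    "# decision_table({'Province / TotalClaims': p_province, 'Gender / TotalClaims': p_gender})"
   ],
   "execution_count": null,
   "outputs": []
  }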
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
