{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Homework Assignment: t-test, Histograms, and Boxplots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this assignment we will learn how to perform statistical comparisons using the t-test, visualize distributions with histograms, and summarize data with boxplots."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## <font color = \"pink\" > Task 1: Generate Two Random Distributions\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generate two random distributions of size 100 each using `numpy.random.randn`. Add a shift to one of them so that the distributions differ (e.g., add 2 to the second one).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Generate random distributions\n",
    "dist1 = np.random.randn(100) * 1.5\n",
    "dist2 = np.random.randn(100) + 3.5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## <font color = \"pink\" > Task 2: Perform a t-test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## t-test\n",
    "\n",
    "A t-test checks whether the means of two populations are statistically different from each other.\n",
    "\n",
    "- [Python package for statistical functions: `scipy.stats.ttest_ind`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html)\n",
    "\n",
    "It returns a test statistic and a **P-value**. If the P-value is smaller than some threshold (e.g., 0.05, which is commonly used), we reject the null hypothesis that the two populations have the same mean and call the difference statistically significant at the **(1 - threshold) * 100% confidence level**.\n",
    "\n",
    "### Key Points:\n",
    "- **Smaller P-values** (usually below 0.05) indicate that the difference between the two population means is statistically significant.\n",
    "- The t-test assumes that the populations are sampled from **normal distributions**. By default, `ttest_ind` also assumes equal variances; pass `equal_var=False` to run Welch's t-test when the variances differ.\n",
    "- The **P-value magnitude** depends on the number of samples. If the population means truly differ, larger sample sizes lead to smaller P-values.\n"
   ]
  },
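  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate the last key point, the cell below is a minimal sketch that repeats the t-test for growing sample sizes while the true difference in means stays fixed. The seed, the 0.5 shift, the sample sizes, and the helper name `p_value_for_size` are assumptions chosen for illustration and are not part of the assignment. The P-value should tend to shrink as `n` grows, although individual random draws can fluctuate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.stats import ttest_ind\n",
    "\n",
    "# Fixed seed (an assumption, not part of the assignment) so the run is repeatable\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "def p_value_for_size(n, shift=0.5):\n",
    "    # Draw two normal samples of size n whose true means differ by `shift`\n",
    "    a = rng.standard_normal(n)\n",
    "    b = rng.standard_normal(n) + shift\n",
    "    return ttest_ind(a, b).pvalue\n",
    "\n",
    "# Same true difference in means, increasing sample size\n",
    "for n in (10, 100, 1000):\n",
    "    print(f\"n = {n:4d}  p-value = {p_value_for_size(n):.3g}\")"
   ]
  },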
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the `scipy.stats.ttest_ind` function to perform an independent t-test on the two distributions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "t-statistic: -13.722978828283012\n",
      "p-value: 5.295908774788721e-29\n"
     ]
    }
   ],
   "source": [
    "from scipy.stats import ttest_ind\n",
    "\n",
    "# Perform t-test on the 2 populations using the function `ttest_ind()` and print it\n",
    "t_statistic, p_value = ttest_ind(dist1, dist2)\n",
    "print(\"t-statistic:\", t_statistic)\n",
    "print(\"p-value:\", p_value)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## <font color = \"pink\" >Task 3: Visualize with Histograms"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plot Histograms of Two Distributions\n",
    "\n",
    "- Use `matplotlib.pyplot.hist` to plot histograms of the two distributions.  \n",
    "- Use **different colors** for each histogram and add a **legend** to indicate the distributions.  \n",
    "- Refer to the [Matplotlib `hist` documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) for more details.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Plot histograms\n",
    "plt.hist(dist1, color=\"blue\", alpha=0.5, label=\"dist1\", bins=15)\n",
    "plt.hist(dist2, color=\"orange\", alpha=0.5, label=\"dist2\", bins=15)\n",
    "plt.legend()\n",
    "plt.title(\"Distributions of dist1 and dist2\")\n",
    "plt.xlabel(\"Values\")\n",
    "plt.ylabel(\"Frequency\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## <font color = \"pink\" >Task 4: Create Boxplots"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create boxplots\n",
    "# Note: Matplotlib 3.9+ renames the `labels` parameter to `tick_labels`\n",
    "plt.boxplot([dist1, dist2], labels=[\"dist1\", \"dist2\"])\n",
    "plt.title(\"Boxplot of dist1 and dist2\")\n",
    "plt.ylabel(\"Values\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## <font color ='pink'> Task 5: Multiple Questions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's review what we learned in the previous class to refine our understanding 📝:\n",
    "\n",
    "1. Write a loop to ask the user **five subtraction questions**.\n",
    "2. Use the following Python features:\n",
    "   - `random` to generate random numbers for the questions.\n",
    "   - `input()` to allow the user to answer each question.\n",
    "   - `int()` to convert the user input to an integer for comparison.\n",
    "3. Keep track of how many answers are correct.\n",
    "4. At the end, display the user's total score (e.g., \"You scored X/5!\").\n"
   ]
  }
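  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The steps above could be sketched as follows. This is one possible approach, not an official solution: the function name `run_quiz`, the `ask` and `rng` parameters, and the 0 to 20 number range are assumptions made for illustration. Passing `ask` and `rng` as parameters lets the loop be tried without typing answers by hand. Uncomment the last line to play interactively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "def run_quiz(num_questions=5, ask=input, rng=random):\n",
    "    # `ask` and `rng` are hypothetical hooks (not required by the assignment)\n",
    "    score = 0\n",
    "    for i in range(1, num_questions + 1):\n",
    "        a = rng.randint(0, 20)\n",
    "        b = rng.randint(0, a)  # keep b <= a so the answer is never negative\n",
    "        reply = ask(f\"Question {i}: {a} - {b} = \")\n",
    "        try:\n",
    "            if int(reply) == a - b:\n",
    "                score += 1\n",
    "        except ValueError:\n",
    "            pass  # non-numeric input counts as a wrong answer\n",
    "    print(f\"You scored {score}/{num_questions}!\")\n",
    "    return score\n",
    "\n",
    "# run_quiz()  # uncomment to answer the questions interactively"
   ]
  }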
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

{'cells': [{'cell_type': 'markdown',
   'metadata': {},
   'source': ['# Homework Assignment: t-test, Histograms, and Boxplots']},
  {'cell_type': 'markdown',
   'metadata': {},
   'source': ['In this assignment we will learn how to perform statistical comparisons using the t-test, visualize distributions with histograms, and summarize data with boxplots.']},
  {'cell_type': 'markdown',
   'metadata': {},
   'source': ['## <font color = "pink" > Task 1: Generate Two Random Distributions\n']},
  {'cell_type': 'markdown',
   'metadata': {},
   'source': ['Generate two random distributions of size 100 each using `numpy.random.randn`. Add a shift to one of them so that the distributions differ (e.g., add 2 to the second one).\n']},
  {'cell_type': 'code',
   'execution_count': 1,
   'metadata': {},
   'outputs': [],
   'source': ['import numpy as np\n',
    '\n',
    '# Generate random distributions\n',
    'dist1 = np.random.randn(100) * 1.5\n',
    'dist2 = np.random.randn(100) + 3.5']},
  {'cell_type': 'm