Cal-Poly-Data-301 · teststudent-kb · Jan 6, 2022
diff --git a/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb b/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb
@@ -0,0 +1,209 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.6.4"
+    },
+    "colab": {
+      "name": "DATA 301 Lab 1A - YOUR NAMES HERE",
+      "provenance": [],
+      "collapsed_sections": [],
+      "include_colab_link": true
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "view-in-github",
+        "colab_type": "text"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/teststudent-kb/test-assignment-teststudent-kb/blob/main/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "fQLu8g7NkVNI"
+      },
+      "source": [
+        "# The Distribution of First Digits\n",
+        "\n",
+        "In this lab, you will explore the distribution of first digits in real data. The first digit of a number is its leading nonzero digit: for example, the first digits of 52, 30.8, and 0.07 are 5, 3, and 7, respectively. You will investigate the question: how frequently does each digit 1-9 appear as the first digit of a number?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "X4RaiSB8kVNJ"
+      },
+      "source": [
+        "## Question 0\n",
+        "\n",
+        "Make a prediction.\n",
+        "\n",
+        "1. Approximately what percentage of the values do you think will have a _first_ digit of 1? What percentage of the values do you think will have a first digit of 9?\n",
+        "2. Approximately what percentage of the values do you think will have a _last_ digit of 1? What percentage of the values do you think will have a last digit of 9?\n",
+        "\n",
+        "(Don't worry about being wrong. You will earn full credit for any justified answer.)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "WlxSEeCxkVNK"
+      },
+      "source": [
+        "**ENTER YOUR WRITTEN EXPLANATION HERE.**"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "IKpeJAfokVNL"
+      },
+      "source": [
+        "## Question 1\n",
+        "\n",
+        "The [S&P 500](https://en.wikipedia.org/wiki/S%26P_500_Index) is a stock index based on the market capitalizations of large companies that are publicly traded on the NYSE or NASDAQ. The CSV file (https://dlsun.github.io/pods/data/sp500.csv) contains data from February 1, 2018 about the stocks that comprise the S&P 500. We will investigate the first digit distributions of the variables in this data set.\n",
+        "\n",
+        "Read in the S&P 500 data. What is the unit of observation in this data set? Is there a variable that is natural to use as the index? If so, set that variable to be the index. Once you are done, display the `DataFrame`."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "LxNsWuUNkVNM"
+      },
+      "source": [
+        "# ENTER YOUR CODE HERE."
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "7IXwAbCnkVNQ"
+      },
+      "source": [
+        "**ENTER YOUR WRITTEN EXPLANATION HERE.**"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "jXLVHxjIkVNR"
+      },
+      "source": [
+        "## Question 2\n",
+        "\n",
+        "We will start by looking at the `volume` column. This variable tells us how many shares of each stock were traded on the date covered by the data set (February 1, 2018).\n",
+        "\n",
+        "Extract the first digit of every value in this column. (_Hint:_ First, turn the numbers into strings. Then, use the [text processing functionalities](https://pandas.pydata.org/pandas-docs/stable/text.html) of `pandas` to extract the first character of each string.) Make an appropriate visualization to display the distribution of the first digits. (_Hint:_ Think carefully about whether the variable you are plotting is quantitative or categorical.)\n",
+        "\n",
+        "How does this compare with what you predicted in Question 0?"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "gCnuPUejkVNS"
+      },
+      "source": [
+        "# ENTER YOUR CODE HERE."
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "YiTi4orlkVNU"
+      },
+      "source": [
+        "**ENTER YOUR WRITTEN EXPLANATION HERE.**"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "gX4YumLtkVNV"
+      },
+      "source": [
+        "## Question 3\n",
+        "\n",
+        "Now, repeat Question 2, but for the distribution of _last_ digits. Again, make an appropriate visualization and compare with your prediction in Question 0."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "PdKf6S7DkVNX"
+      },
+      "source": [
+        "# ENTER YOUR CODE HERE."
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "JPsZCTnAkVNZ"
+      },
+      "source": [
+        "**ENTER YOUR WRITTEN EXPLANATION HERE.**"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "v3GOfL93kVNa"
+      },
+      "source": [
+        "## Question 4\n",
+        "\n",
+        "Maybe the `volume` column was just a fluke. Let's see if the first digit distribution holds up when we look at a very different variable: the closing price of the stock. Make a visualization of the first digit distribution of the closing price (the `close` column of the `DataFrame`). Comment on what you see.\n",
+        "\n",
+        "(_Hint:_ What type did `pandas` infer for this variable, and why? You will first have to clean the values using the [text processing functionalities](https://pandas.pydata.org/pandas-docs/stable/text.html) of `pandas`, and then convert the variable to a quantitative one.)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "0EAC_EY3kVNb"
+      },
+      "source": [
+        "# ENTER YOUR CODE HERE."
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "YI6oR6sjkVNe"
+      },
+      "source": [
+        "**ENTER YOUR WRITTEN EXPLANATION HERE.**"
+      ]
+    }
+  ]
+}