From 1755a6e57319c07094eb64dee00e81f20a007e62 Mon Sep 17 00:00:00 2001 From: "github-classroom[bot]" <66690702+github-classroom[bot]@users.noreply.github.com> Date: Tue, 4 Jan 2022 22:57:58 +0000 Subject: [PATCH 1/6] Setting up GitHub Classroom Feedback From 05518e6e60ffea4fb7555493f14b921d82e87825 Mon Sep 17 00:00:00 2001 From: Regina George <43589175+teststudent-kb@users.noreply.github.com> Date: Thu, 6 Jan 2022 11:59:11 -0800 Subject: [PATCH 2/6] Created using Colaboratory --- DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb | 209 ++++++++++++++++++++++++++ 1 file changed, 209 insertions(+) create mode 100644 DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb diff --git a/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb b/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb new file mode 100644 index 0000000..d764107 --- /dev/null +++ b/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb @@ -0,0 +1,209 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.4" + }, + "colab": { + "name": "DATA 301 Lab 1A - YOUR NAMES HERE", + "provenance": [], + "collapsed_sections": [], + "include_colab_link": true + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fQLu8g7NkVNI" + }, + "source": [ + "# The Distribution of First Digits\n", + "\n", + "In this lab, you will explore the distribution of first digits in real data. For example, the first digits of the numbers 52, 30.8, and 0.07 are 5, 3, and 7 respectively. In this lab, you will investigate the question: how frequently does each digit 1-9 appear as the first digit of the number?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "X4RaiSB8kVNJ" + }, + "source": [ + "## Question 0\n", + "\n", + "Make a prediction. \n", + "\n", + "1. Approximately what percentage of the values do you think will have a _first_ digit of 1? What percentage of the values do you think will have a first digit of 9?\n", + "2. Approximately what percentage of the values do you think will have a _last_ digit of 1? What percentage of the values do you think will have a last digit of 9?\n", + "\n", + "(Don't worry about being wrong. You will earn full credit for any justified answer.)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WlxSEeCxkVNK" + }, + "source": [ + "**ENTER YOUR WRITTEN EXPLANATION HERE.**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IKpeJAfokVNL" + }, + "source": [ + "## Question 1\n", + "\n", + "The [S&P 500](https://en.wikipedia.org/wiki/S%26P_500_Index) is a stock index based on the market capitalizations of large companies that are publicly traded on the NYSE or NASDAQ. The CSV file (https://dlsun.github.io/pods/data/sp500.csv) contains data from February 1, 2018 about the stocks that comprise the S&P 500. We will investigate the first digit distributions of the variables in this data set.\n", + "\n", + "Read in the S&P 500 data. What is the unit of observation in this data set? Is there a variable that is natural to use as the index? If so, set that variable to be the index. Once you are done, display the `DataFrame`." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "LxNsWuUNkVNM" + }, + "source": [ + "# ENTER YOUR CODE HERE." + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7IXwAbCnkVNQ" + }, + "source": [ + "**ENTER YOUR WRITTEN EXPLANATION HERE.**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jXLVHxjIkVNR" + }, + "source": [ + "## Question 2\n", + "\n", + "We will start by looking at the `volume` column. This variable tells us how many shares were traded on that date.\n", + "\n", + "Extract the first digit of every value in this column. (_Hint:_ First, turn the numbers into strings. Then, use the [text processing functionalities](https://pandas.pydata.org/pandas-docs/stable/text.html) of `pandas` to extract the first character of each string.) Make an appropriate visualization to display the distribution of the first digits. (_Hint:_ Think carefully about whether the variable you are plotting is quantitative or categorical.)\n", + "\n", + "How does this compare with what you predicted in Question 0?" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "gCnuPUejkVNS" + }, + "source": [ + "# ENTER YOUR CODE HERE." + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YiTi4orlkVNU" + }, + "source": [ + "**ENTER YOUR WRITTEN EXPLANATION HERE.**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gX4YumLtkVNV" + }, + "source": [ + "## Question 3\n", + "\n", + "Now, repeat Question 2, but for the distribution of _last_ digits. Again, make an appropriate visualization and compare with your prediction in Question 0." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "PdKf6S7DkVNX" + }, + "source": [ + "# ENTER YOUR CODE HERE." + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JPsZCTnAkVNZ" + }, + "source": [ + "**ENTER YOUR WRITTEN EXPLANATION HERE.**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v3GOfL93kVNa" + }, + "source": [ + "## Question 4\n", + "\n", + "Maybe the `volume` column was just a fluke. Let's see if the first digit distribution holds up when we look at a very different variable: the closing price of the stock. Make a visualization of the first digit distribution of the closing price (the `close` column of the `DataFrame`). Comment on what you see.\n", + "\n", + "(_Hint:_ What type did `pandas` infer this variable as and why? You will have to first clean the values using the [text processing functionalities](https://pandas.pydata.org/pandas-docs/stable/text.html) of `pandas` and then convert this variable to a quantitative variable.)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "0EAC_EY3kVNb" + }, + "source": [ + "# ENTER YOUR CODE HERE." + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YI6oR6sjkVNe" + }, + "source": [ + "**ENTER YOUR WRITTEN EXPLANATION HERE.**" + ] + } + ] +} \ No newline at end of file From 5c0582eaa245b9c474e93b9fe578ee0712e56976 Mon Sep 17 00:00:00 2001 From: Regina George <43589175+teststudent-kb@users.noreply.github.com> Date: Thu, 6 Jan 2022 12:21:44 -0800 Subject: [PATCH 3/6] Created using Colaboratory From 37a96720f1734c020d36bd3d72d4d29d95c28df3 Mon Sep 17 00:00:00 2001 From: Regina George <43589175+teststudent-kb@users.noreply.github.com> Date: Thu, 6 Jan 2022 12:31:40 -0800 Subject: [PATCH 4/6] Created using Colaboratory --- DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb b/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb index d764107..ae72f1d 100644 --- a/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb +++ b/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb @@ -54,7 +54,7 @@ "id": "X4RaiSB8kVNJ" }, "source": [ - "## Question 0\n", + "ahpwergpio;akjhbnfdglak;jbsrdg## Question 0\n", "\n", "Make a prediction. \n", "\n", From 548d89800d046afbe045e446b0583547440b1fa2 Mon Sep 17 00:00:00 2001 From: Regina George <43589175+teststudent-kb@users.noreply.github.com> Date: Thu, 6 Jan 2022 15:20:17 -0800 Subject: [PATCH 5/6] Created using Colaboratory From 4ad35c486f41b1938167ff45b942fd50f21f35e0 Mon Sep 17 00:00:00 2001 From: Regina George <43589175+teststudent-kb@users.noreply.github.com> Date: Thu, 6 Jan 2022 15:23:08 -0800 Subject: [PATCH 6/6] Created using Colaboratory --- DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb b/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb index ae72f1d..96d11f5 100644 --- a/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb +++ b/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb @@ -45,7 +45,7 @@ "source": [ "# The Distribution of First Digits\n", "\n", - "In this lab, you will explore the distribution of first digits in real data. For example, the first digits of the numbers 52, 30.8, and 0.07 are 5, 3, and 7 respectively. In this lab, you will investigate the question: how frequently does each digit 1-9 appear as the first digit of the number?" + "In this lab, you will explore the distribution of first digits in real data. For example, the first digits aiuwehtgaio;wughb;aoujbweghr;awiklj the numbers 52, 30.8, and 0.07 are 5, 3, and 7 respectively. In this lab, you will investigate the question: how frequently does each digit 1-9 appear as the first digit of the number?" ] }, {