Merged
209 changes: 209 additions & 0 deletions DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb
@@ -0,0 +1,209 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
},
"colab": {
"name": "DATA 301 Lab 1A - YOUR NAMES HERE",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/teststudent-kb/test-assignment-teststudent-kb/blob/main/DATA_301_Lab_1A_YOUR_NAMES_HERE.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fQLu8g7NkVNI"
},
"source": [
"# The Distribution of First Digits\n",
"\n",
"In this lab, you will explore the distribution of first digits in real data. The first digit of a number is its first nonzero digit: for example, the first digits of 52, 30.8, and 0.07 are 5, 3, and 7, respectively. You will investigate the question: how frequently does each digit 1-9 appear as the first digit of a number?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "X4RaiSB8kVNJ"
},
"source": [
"## Question 0\n",
"\n",
"Make a prediction. \n",
"\n",
"1. Approximately what percentage of the values do you think will have a _first_ digit of 1? What percentage of the values do you think will have a first digit of 9?\n",
"2. Approximately what percentage of the values do you think will have a _last_ digit of 1? What percentage of the values do you think will have a last digit of 9?\n",
"\n",
"(Don't worry about being wrong. You will earn full credit for any justified answer.)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WlxSEeCxkVNK"
},
"source": [
"**ENTER YOUR WRITTEN EXPLANATION HERE.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IKpeJAfokVNL"
},
"source": [
"## Question 1\n",
"\n",
"The [S&P 500](https://en.wikipedia.org/wiki/S%26P_500_Index) is a stock index based on the market capitalizations of large companies that are publicly traded on the NYSE or NASDAQ. The CSV file (https://dlsun.github.io/pods/data/sp500.csv) contains data from February 1, 2018 about the stocks that make up the S&P 500. We will investigate the first digit distributions of the variables in this data set.\n",
"\n",
"Read in the S&P 500 data. What is the unit of observation in this data set? Is there a variable that is natural to use as the index? If so, set that variable to be the index. Once you are done, display the `DataFrame`."
]
},
{
"cell_type": "code",
"metadata": {
"id": "LxNsWuUNkVNM"
},
"source": [
"# ENTER YOUR CODE HERE."
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "7IXwAbCnkVNQ"
},
"source": [
"**ENTER YOUR WRITTEN EXPLANATION HERE.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jXLVHxjIkVNR"
},
"source": [
"## Question 2\n",
"\n",
"We will start by looking at the `volume` column. This variable tells us how many shares were traded on that date.\n",
"\n",
"Extract the first digit of every value in this column. (_Hint:_ First, turn the numbers into strings. Then, use the [text processing functionalities](https://pandas.pydata.org/pandas-docs/stable/text.html) of `pandas` to extract the first character of each string.) Make an appropriate visualization to display the distribution of the first digits. (_Hint:_ Think carefully about whether the variable you are plotting is quantitative or categorical.)\n",
"\n",
"How does this compare with what you predicted in Question 0?"
]
},
{
"cell_type": "code",
"metadata": {
"id": "gCnuPUejkVNS"
},
"source": [
"# ENTER YOUR CODE HERE."
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "YiTi4orlkVNU"
},
"source": [
"**ENTER YOUR WRITTEN EXPLANATION HERE.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gX4YumLtkVNV"
},
"source": [
"## Question 3\n",
"\n",
"Now, repeat Question 2, but for the distribution of _last_ digits. Again, make an appropriate visualization and compare with your prediction in Question 0."
]
},
{
"cell_type": "code",
"metadata": {
"id": "PdKf6S7DkVNX"
},
"source": [
"# ENTER YOUR CODE HERE."
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "JPsZCTnAkVNZ"
},
"source": [
"**ENTER YOUR WRITTEN EXPLANATION HERE.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v3GOfL93kVNa"
},
"source": [
"## Question 4\n",
"\n",
"Maybe the `volume` column was just a fluke. Let's see if the first digit distribution holds up when we look at a very different variable: the closing price of the stock. Make a visualization of the first digit distribution of the closing price (the `close` column of the `DataFrame`). Comment on what you see.\n",
"\n",
"(_Hint:_ What type did `pandas` infer for this variable, and why? You will have to first clean the values using the [text processing functionalities](https://pandas.pydata.org/pandas-docs/stable/text.html) of `pandas` and then convert this variable to a quantitative variable.)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "0EAC_EY3kVNb"
},
"source": [
"# ENTER YOUR CODE HERE."
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "YI6oR6sjkVNe"
},
"source": [
"**ENTER YOUR WRITTEN EXPLANATION HERE.**"
]
}
]
}
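
The hint in Question 2 (turn the numbers into strings, then take the first character with the `pandas` string methods) can be sketched on a small made-up `Series`. This is only an illustration, not the lab's solution: the values below are hypothetical (they reuse the 52, 30.8, 0.07 example from the introduction), and stripping leading zeros and decimal points is an assumption added here so that 0.07 yields 7. Negative numbers are not handled.

```python
import pandas as pd

# Hypothetical values standing in for a numeric column; not the S&P 500 data.
values = pd.Series([52, 30.8, 0.07, 1200, 175, 19])

# Convert each number to a string, strip leading "0" and "." characters
# (so "0.07" becomes "7"), then take the first remaining character.
first_digits = values.astype(str).str.lstrip("0.").str[0]

# First digits are labels, not quantities, so tabulate them as categories.
counts = first_digits.value_counts().sort_index()
print(counts)
```

Because the digits are kept as strings, `value_counts()` treats them as categories, which suits a bar chart of counts better than a histogram of numeric values.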