- A template file for this problem is provided: stats2.ipynb
This problem is a continuation of problem 4.1. Recall that you wrote a function named get_stats()
that takes a list and returns a tuple of minimum, maximum, mean, and median. To use this function, you have to convert your IPython notebook to a regular .py
file. One way to do this is to use the IPython %%script%%
magic function; in an IPython notebook cell, type (assuming the filename of your IPython notebook from Problem 4.1 is stats.ipynb
):
%%bash
ipython3 nbconvert --to python stats.ipynb
and press shift + enter. This will create a Python script
named stats.py
.
(Note: If ipython3 nbconvert
produces an error and says that there is no
module named pygments
, you have to run docker pull lcdm/info490
to download
the latest image. You also have to stop and remove the running notebook server,
and create a new notebook server with the latest image. Remember to back up
your notebooks before running a new notebook server, because removing a
container will erase all data in the container.)
We will import this as a module in stats2.ipynb
:
from stats import get_stats
We will use the function get_stats()
to compute basic statistics of a number of columns from the arline performance dataset we downloaded in week 2. Namely, we will use the following columns:
- Column 15, "ArrDelay": arrival delay, in minutes,
- Column 16, "DepDelay": departure delay, in minutes, and
- Column 19, "Distance": distance, in miles.
To extract these columns from the CSV file,
-
Write a function named
get_column(filename, n, header = True)
that reads then
-th column from a file and returns a list of integers. -
You may assume that the column is made of integers.
-
We will also use the optional argument
header
because the first line of our file lists the names of the columns, but we might want to turn this off to handle a file that doesn't have a header. -
Use a combination of
with
statement andopen()
function to openfilename
in theget_column()
function.Tip: When I tried to use
open()
to read2001.csv
, I had the following error:'utf-8' codec can't decode byte 0xe4 in position 343: invalid continuation byte
You can avoid this error by using
encoding='latin-1'
option inopen()
. -
Skip the first line if the
header
parameter isTrue
; do not skip if it'sFalse
. -
Some columns have missing values
'NA'
, and you need a way to handle these missing values. If then
-th column is missing, you should not include that column inresult
; that is, skip all rows with'NA'
. As a result, lists returned from different columns may have different lengths.
We also want to print out the results in a nicely formatted manner.
- The
print_stats(input_list, title = None)
function is already for you. You don't have to write this function.
It takes a list of integers and prints out the basic statistics, e.g.
>>> print_stats([1, 2, 3, 4, 5], title = 'Title')
Title
Minimum: 1
Maximum: 5
Mean: 3.0
Median: 3.0