Problem 4.2. Simple Statistics II.

A template file for this problem is provided: stats2.ipynb

This problem is a continuation of problem 4.1. Recall that you wrote a function named get_stats() that takes a list and returns a tuple of minimum, maximum, mean, and median. To use this function, you have to convert your IPython notebook to a regular .py file. One way to do this is to use the IPython %%script%% magic function; in an IPython notebook cell, type (assuming the filename of your IPython notebook from Problem 4.1 is stats.ipynb):

%%bash
ipython3 nbconvert --to python stats.ipynb

and press shift + enter. This will create a Python script named stats.py.

(Note: If ipython3 nbconvert produces an error and says that there is no module named pygments, you have to run docker pull lcdm/info490 to download the latest image. You also have to stop and remove the running notebook server, and create a new notebook server with the latest image. Remember to back up your notebooks before running a new notebook server, because removing a container will erase all data in the container.)

We will import this as a module in stats2.ipynb:

from stats import get_stats

Function: get_stats()

We will use the function get_stats() to compute basic statistics of a number of columns from the arline performance dataset we downloaded in week 2. Namely, we will use the following columns:

Column 15, "ArrDelay": arrival delay, in minutes,
Column 16, "DepDelay": departure delay, in minutes, and
Column 19, "Distance": distance, in miles.

To extract these columns from the CSV file,

Write a function named get_column(filename, n, header = True) that reads the n-th column from a file and returns a list of integers.
You may assume that the column is made of integers.
We will also use the optional argument header because the first line of our file lists the names of the columns, but we might want to turn this off to handle a file that doesn't have a header.
Use a combination of with statement and open() function to open filename in the get_column() function.

Tip: When I tried to use open() to read 2001.csv, I had the following error:
```
  'utf-8' codec can't decode byte 0xe4 in position 343: invalid
  continuation byte
```
You can avoid this error by using encoding='latin-1' option in open().
Skip the first line if the header parameter is True; do not skip if it's False.
Some columns have missing values 'NA', and you need a way to handle these missing values. If the n-th column is missing, you should not include that column in result; that is, skip all rows with 'NA'. As a result, lists returned from different columns may have different lengths.

Function: print_stats()

We also want to print out the results in a nicely formatted manner.

The print_stats(input_list, title = None) function is already for you. You don't have to write this function.

It takes a list of integers and prints out the basic statistics, e.g.

>>> print_stats([1, 2, 3, 4, 5], title = 'Title')
Title
Minimum: 1
Maximum: 5
Mean: 3.0
Median: 3.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

p2.md

p2.md

Problem 4.2. Simple Statistics II.

Function: get_stats()

Function: print_stats()

Files

p2.md

Latest commit

History

p2.md

File metadata and controls

Problem 4.2. Simple Statistics II.

Function: get_stats()

Function: print_stats()