# Homework 6


**Concepts covered:**

Reading and writing files, CSV files, and parsing file contents into Python data structures.


**Instructions:**

Be sure to run each code block after you edit it to make sure it runs as expected. When you are done, we strongly recommend you run all the code from scratch (Runtime menu -> Restart and Run all) to make sure your current code works for all problems.

If your code raises an exception when run from scratch, it will interfere with the auto-grader process causing you to lose some or all points for this homework. Please ask for help on Piazza, attend office hours or schedule an appointment with your learning facilitator if you get stuck.


**Warning about AI:**

While we encourage you to use AI to generate practice problems, **we recommend in the strongest terms against using generative AI** to solve any part of this assignment. The goal of this assignment is for you to learn these concepts, and while using AI may help you obtain solutions to these problems, you will cheat yourself out of the learning that comes from working through the problems yourself.

**Sample Output:**

For each problem description, a sample output has been included to show what the expected output should be.
Some functions have test cases provided for you to test with, so no sample output has been included for those.


**Docstrings and Comments:**

Include a *documentation string* (docstring) for each function definition, as well as comments in the body of your code to explain each control structure (e.g., decision, loop), function call, or formula.

**Review Problem 1**

**Concept:** *Dictionary data structure.*

**Task:**

Write a function `retrieve_even_values(d)` that takes a dictionary d whose values are integers and returns a new dictionary containing only the key-value pairs where the value is an even integer.

You are also given a helper `print_dict(d)` that prints each key-value pair on its own line.

**Sample Output:**

For the input dictionary `d = {"x": 10, "y": 7, "z": 14, "w": 3}`, the output should print:

```
original:
x: 10
y: 7
z: 14
w: 3
result:
x: 10
z: 14
```
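
The filtering described in the task can also be written as a dictionary comprehension. The sketch below is one possible approach, not the required solution; `even_values_sketch` is a hypothetical name chosen so it does not clash with `retrieve_even_values`.

```python
def even_values_sketch(d):
    """Return a new dict holding only the pairs whose value is even."""
    # Keep a pair only when its value leaves no remainder on division by 2.
    return {key: value for key, value in d.items() if value % 2 == 0}


d = {"x": 10, "y": 7, "z": 14, "w": 3}
evens = even_values_sketch(d)
print(evens)  # the input dictionary d is left unchanged
```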

In [None]:
# HELPER FUNCTION PROVIDED
def print_dict(d):
    '''
    Iterate over a dictionary and print out each key and value.
    '''

    for k in d.keys():
        print(f'{k}: {d[k]}')



# TO DO: WRITE YOUR SOLUTION HERE
def retrieve_even_values(d):
    '''
    Return a new dictionary containing only the key-value pairs of d
    whose value is an even integer.
    '''
    result = {}

    # Loop over every key-value pair in the input dictionary.
    for key, value in d.items():
        # A value is even when dividing it by 2 leaves no remainder.
        if value % 2 == 0:
            result[key] = value  # copy the pair into the result dictionary
    return result



Test case for problem 1

In [55]:
# TEST CASES HERE
d = {"a": 1, "b": 4, "c": 2, "d": 5}
print("original:")
print_dict(d)

print("result:")
d2 = retrieve_even_values(d)
print_dict(d2)


original:
a: 1
b: 4
c: 2
d: 5
result:
b: 4
c: 2


**Review Problem 2**

**Concept:** *List and dictionary data structure.*

**Task:**

Write a function `add_items(data)` that takes a list of dictionaries. Each dictionary maps strings to integers. Return a single dictionary where each key’s value is the sum of that key across all dictionaries.

If a key appears in some dictionaries but not others, treat missing values as 0.

**Sample Output:**

For:
```
data = [
    {"math": 90, "science": 85},
    {"math": 75, "english": 88},
    {"science": 95, "english": 92}
]
```

The output should be:
```
{'math': 165, 'science': 180, 'english': 180}
```
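
Treating a missing key as 0 is also the default behaviour of `collections.Counter` from the standard library. The sketch below is one alternative way to accumulate the totals, shown on the sample data above; it is an illustration, not the required solution.

```python
from collections import Counter

data = [
    {"math": 90, "science": 85},
    {"math": 75, "english": 88},
    {"science": 95, "english": 92},
]

totals = Counter()
for d in data:
    # Counter.update() adds each value to the running total for its key;
    # a key that has not been seen yet starts from 0.
    totals.update(d)

result = dict(totals)  # convert back to a plain dictionary
print(result)
```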



In [None]:
# TO DO: WRITE YOUR SOLUTION HERE

def add_items(data):

    '''Sum the values for each key across a list of dictionaries. A key that is missing from a dictionary contributes 0 to its total.'''


    totals = {}

    for d in data: # iterate over each dictionary in the list

        for key, value in d.items(): # then iterate over that dictionary's key-value pairs

            # totals.get(key, 0) returns the running total, or 0 if the key has not been seen yet
            totals[key] = totals.get(key, 0) + value

    return totals


Test case for problem 2

In [83]:
# STARTER CODE PROVIDED:
data = [{"a": 1, "b": 5},
      {"a": 2, "c": 123},
      {"a": 3, "b": 37, "c": 999},
        ]

# TEST CASES HERE
result = add_items(data)
print(result)


{'a': 6, 'b': 42, 'c': 1122}


**Problem 3**

**Concept:** *Reading a text file.*

**Task:**

Write a function `read_file_as_words(filename)` that:

1. Opens a text file for reading

2. Reads the entire file as a single string

3. Splits the text into words using `.split()`

4. Returns a list of words in lowercase

**Sample Output:**

If the file contains:

```
Apple banana
CHERRY apple
```
Output:
```
['apple', 'banana', 'cherry', 'apple']
```
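
Calling `.split()` with no argument splits on any run of whitespace, including newlines, which is why a two-line file yields one flat list. The sketch below contrasts it with `.split(" ")` on an in-memory string; the extra spaces in the text are an assumption added for illustration.

```python
text = "Apple banana\nCHERRY   apple\n"

# No argument: split on any run of whitespace and drop empty strings.
words = [w.lower() for w in text.split()]

# Explicit single space: newlines stay attached and empty strings appear
# wherever two spaces are adjacent.
naive = text.split(" ")

print(words)
print(naive)
```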


In [None]:
# TO DO: WRITE YOUR SOLUTION HERE


def read_file_as_words(filename):

    '''Read a text file and return its words as a list of lowercase strings.'''


    with open(filename, 'r') as f: # open the file for reading; the with block closes it automatically
        text = f.read()

    words = text.split()

    return [i.lower() for i in words]

Test case for problem 3

In [31]:
# TEST CASES HERE
filename = 'words.txt'
words = read_file_as_words(filename)
print(f'words = {words}')


words = ['nitwit', 'blubber', 'oddment', 'tweak']


**Problem 4**

**Concept:** *Writing a text file.*

**Task:**

Write a function `write_words_to_file(words, filename)` that writes a list of words to a text file, one word per line.

**Sample Output:**

Input:
```
words = ['python', 'data', 'files']
```
Contents of output file:
```
python
data
files
```




In [None]:
# TO DO: WRITE YOUR SOLUTION HERE



def write_words_to_file(words, filename): 
    '''Write each word in the words list to the file, one word per line.'''


    with open(filename, 'w') as f: # 'w' opens the file for writing and replaces any existing contents

        for word in words:
            f.write(f'{word}\n')

    # No explicit f.close() is needed: the with block closes (and saves) the file on exit.


Test case for problem 4

In [79]:
# TEST CASES HERE
filename = 'output.txt' 
words = ['nitwit', 'blubber', 'oddment', 'tweak']
write_words_to_file(words, filename)


**Problem 5**

**Concept:** *Reading a TSV file header row.*

**Task:**

Write a function `get_column_names(filename)` that reads a TSV file (tab-separated) with a header row and returns a list of column names in the same order as they appear in the header.

**Sample Output:**

If the header row is:

```
name\tage\tcity
```
Output:
```
['name', 'age', 'city']
```
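
The header row can also be read with the standard `csv` module by passing `delimiter="\t"`. The sketch below uses `io.StringIO` as a stand-in for an open file; the data row is made up for illustration.

```python
import csv
import io

# io.StringIO behaves like an open text file (hypothetical sample contents).
sample = io.StringIO("name\tage\tcity\nAda\t36\tLondon\n")

reader = csv.reader(sample, delimiter="\t")
columns = next(reader)  # the first row produced by the reader is the header

print(columns)
```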


In [None]:
# TO DO: WRITE YOUR SOLUTION HERE

def get_column_names(filename): 

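    '''Read the header row of a TSV file and return the column names as a list.'''

    with open(filename, 'r') as f:

        # Remove only the line ending so that tabs (and any empty columns) are preserved.
        header = f.readline().rstrip('\r\n')

    return header.split('\t')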


Test cases

In [77]:
# TEST CASES HERE
print(get_column_names("data1.tsv"))
print(get_column_names("data2.tsv"))

['foo', 'bar']
['movie', 'release_year']


**Problem 6**

**Concept:** *Counting non-empty data rows in a TSV.*

**Task:**

Write a function `get_data_row_count(filename)` that reads a TSV file with a header row and returns the number of non-blank data rows (do not count the header, and do not count blank lines).

**Sample Output:**

If TSV file contains:
```
id\tvalue
1\t100
2\t200

3\t300
```
Note the blank line between the data rows; it is not counted.

Output:
```
3
```
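
The count above can be reproduced by filtering out blank lines first and then subtracting the header. This is a minimal sketch on the same sample contents, with `io.StringIO` standing in for an open file.

```python
import io

# io.StringIO stands in for an open TSV file (contents mirror the sample above).
sample = io.StringIO("id\tvalue\n1\t100\n2\t200\n\n3\t300\n")

# line.strip() is an empty string (falsy) for blank or whitespace-only lines.
non_blank = [line for line in sample if line.strip()]

# The first non-blank line is the header, so it is subtracted from the count.
row_count = max(len(non_blank) - 1, 0)
print(row_count)
```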

In [None]:
# TO DO: WRITE YOUR SOLUTION HERE


def get_data_row_count(filename): 

    '''skips the header and counts non blank rows in a file'''

    count = 0 

    with open(filename, 'r') as f: 

        header_skipped = False


        for line in f: 

            if line.strip() == '': 
                continue 

            if not header_skipped:
                header_skipped = True
                continue


            count += 1 


    return count 



        
        





Test cases

In [90]:
# TEST CASES HERE
print(get_data_row_count("data1.tsv"))
print(get_data_row_count("data2.tsv"))


3
8


**Problem 7**

**Concept:** *Reading a CSV file, print out data records.*

**Task:**

Write a function `print_csv_contents(filename)` that reads a CSV file and prints:

1. The header columns separated by tabs

2. Each data row with values separated by tabs

**Sample Output:**

For CSV file:
```
product,price,quantity
Pen,1.50,10
Notebook,3.00,5
```
Output:
```
product	price	quantity
Pen	1.50	10
Notebook	3.00	5
```


In [None]:
import csv

def print_csv_contents(filename):

    '''Print the header and every data row of a CSV file, with values separated by tabs.'''

    with open(filename, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)

        # csv.reader yields each row (header first) as a list of strings.
        for row in reader:
            # Join the values with tabs; print() ends each row with a newline.
            print("\t".join(row))

Test cases

In [132]:
# TEST CASES HERE

# add test cases/file to read from
filename = 'data3.csv'
print_csv_contents(filename)

# filename = 'data4.csv'
# print_csv_contents(filename)



age	height_cm	weight_kg
22	183	51
62	168	57
42	150	98
49	163	90
49	129	98
60	171	90
59	181	65
62	136	83
18	144	67
64	189	73


**Problem 8**

**Concept:** *Extracting one column from a CSV file.*

**Task:**

Write a function `get_column_data(filename, column_name)` that reads a CSV file with a header row and returns a list containing the values from the specified column.

If the column name is missing, the function should raise a `KeyError`.

**Sample Output:**

Input:
```
get_column_data("sales.csv", "price")
```
If column contains:
```
10.99
5.50
20.00
```
Output:
```
['10.99', '5.50', '20.00']
```
If column missing:
```
KeyError: Column 'discount' does not exist in file 'sales.csv'.
```
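
The missing-column behaviour can be tried out without a file on disk. The sketch below is an illustration only: `column_from_text` is a hypothetical helper that reads from a string, and the sample data is made up.

```python
import csv
import io


def column_from_text(text, column_name):
    """Return the values of one column from CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    # fieldnames holds the header row; it is None when the input is empty.
    if column_name not in (reader.fieldnames or []):
        raise KeyError(f"Column '{column_name}' does not exist.")
    return [row[column_name] for row in reader]


sample = "item,price\npen,10.99\npad,5.50\nbag,20.00\n"
prices = column_from_text(sample, "price")

try:
    column_from_text(sample, "discount")
except KeyError as err:
    # str(err) wraps a KeyError message in extra quotes; args[0] is the raw text.
    message = err.args[0]

print(prices)
print(message)
```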


In [100]:
# TO DO: WRITE YOUR SOLUTION HERE

import csv

def get_column_data(filename, column_name):
    """
    Read a CSV with a header row and return a list of values from column_name.
    Raises KeyError if the column_name is not in the header.
    """
    with open(filename, "r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)

        # DictReader uses the header row as fieldnames
        if reader.fieldnames is None or column_name not in reader.fieldnames:
            raise KeyError(f"Column '{column_name}' does not exist in file '{filename}'.")

        return [row[column_name] for row in reader]


Test cases for problem 8

In [102]:
# TEST CASES HERE
#data = get_column_data("data3.csv", "foo")
#print(data)

data = get_column_data("data4.csv", "volume")
print(data)


**Problem 9**

**Concept:** *Reading a CSV file, lists, dictionaries.*

**Task:**

Write a generator function `read_csv_to_dictionaries(filename)` that reads a CSV file with a header row and yields one dictionary per row:

* Keys come from the header row

* Each value should be parsed as an `int`

* If a value cannot be parsed as an integer, store `None` instead

**Sample Output:**

If CSV file:
```
a,b,c
1,2,3
4,x,6
```
Output:
```
[
 {'a': 1, 'b': 2, 'c': 3},
 {'a': 4, 'b': None, 'c': 6}
]
```
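
The sample output is shown as a list, but a generator function returns a generator object that produces rows only when it is iterated. The sketch below illustrates this on in-memory text; `rows_from_text` and `to_int_or_none` are hypothetical names, and using `csv.DictReader` here is one option among several.

```python
import csv
import io


def to_int_or_none(text):
    """Parse text as an int, or return None when that is not possible."""
    try:
        return int(text)
    except (ValueError, TypeError):
        return None


def rows_from_text(text):
    """Yield one dictionary of parsed values per CSV data row."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {key: to_int_or_none(value) for key, value in row.items()}


gen = rows_from_text("a,b,c\n1,2,3\n4,x,6\n")
rows = list(gen)      # list() pulls every row out of the generator
leftover = list(gen)  # a generator can be consumed only once

print(rows)
print(leftover)
```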


In [111]:
# TO DO: WRITE YOUR SOLUTION HERE

def read_csv_to_dictionaries(filename):
    """
    Read a CSV file with a header row and yield one dict per data row.
    - Keys come from the header.
    - Values are parsed as int.
    - If parsing fails, store None.
    """
    with open(filename, "r", encoding="utf-8") as f:
        header_line = f.readline()
        if not header_line:
            return  # empty file => yields nothing

        headers = header_line.rstrip("\n").split(",")

        for line in f:
            line = line.rstrip("\n")
            if line == "":
                continue  # skip blank lines

            parts = line.split(",")

            row = {}
            for i, key in enumerate(headers):
                raw = parts[i] if i < len(parts) else ""  # handle short rows safely
                try:
                    row[key] = int(raw)
                except (ValueError, TypeError):
                    row[key] = None

            yield row


Test case for problem 9

In [112]:
# TEST CASES HERE
filename = 'data3.csv'
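data = list(read_csv_to_dictionaries(filename))  # list() consumes the generator so the rows can be printed
print(data)


[{'age': 22, 'height_cm': 183, 'weight_kg': 51}, {'age': 62, 'height_cm': 168, 'weight_kg': 57}, {'age': 42, 'height_cm': 150, 'weight_kg': 98}, {'age': 49, 'height_cm': 163, 'weight_kg': 90}, {'age': 49, 'height_cm': 129, 'weight_kg': 98}, {'age': 60, 'height_cm': 171, 'weight_kg': 90}, {'age': 59, 'height_cm': 181, 'weight_kg': 65}, {'age': 62, 'height_cm': 136, 'weight_kg': 83}, {'age': 18, 'height_cm': 144, 'weight_kg': 67}, {'age': 64, 'height_cm': 189, 'weight_kg': 73}]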


**Problem 10**

**Concept:** *Data cleaning + flagging “bad” rows (finagled).*

**Task:**

Write a generator function `parse_data_file(filename)` that reads a TSV file with a header row and yields dictionaries like Problem 9, except:

* Parse values as integers

* If parsing fails, store the value 3

* Add a key `"finagled"` to each dictionary:

    * `False` if all values parsed successfully

    * `True` if at least one value failed to parse


**Sample Output:**

If TSV file:
```
a\tb
5\t7
8\tabc
```
Output:
```
[
 {'a': 5, 'b': 7, 'finagled': False},
 {'a': 8, 'b': 3, 'finagled': True}
]
```
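
The per-row rule can be isolated in a small helper that works on an already-split row. This is a sketch under that assumption; `clean_row` and its `fallback` parameter are hypothetical names, not part of the assignment.

```python
def clean_row(raw_row, fallback=3):
    """Parse every value as int and record whether any value needed the fallback."""
    cleaned = {}
    finagled = False
    for key, text in raw_row.items():
        try:
            cleaned[key] = int(text)
        except (ValueError, TypeError):
            cleaned[key] = fallback  # placeholder stored when parsing fails
            finagled = True
    # The flag is added to every row, whether or not a value failed.
    cleaned["finagled"] = finagled
    return cleaned


good = clean_row({"a": "5", "b": "7"})
bad = clean_row({"a": "8", "b": "abc"})
print(good)
print(bad)
```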


In [None]:
import csv

def parse_data_file(filename):

    '''Read a delimited file with a header row and yield one dictionary per data row.
    Values are parsed as int; a value that fails to parse is stored as 3,
    and the "finagled" key records whether any value in the row failed.'''

    with open(filename, newline="", encoding="utf-8") as f:
        # The task describes a TSV, but the test files used below are comma-delimited;
        # pass delimiter="\t" to DictReader for tab-separated input.
        reader = csv.DictReader(f)

        for row in reader:
            out = {}
            finagled = False

            for k, v in row.items():
                try:
                    out[k] = int(v)
                except (ValueError, TypeError):
                    out[k] = 3  # placeholder required by the task when parsing fails
                    finagled = True

            # Every row gets the flag: False when all values parsed, True otherwise.
            out["finagled"] = finagled

            yield out

Test cases here

In [146]:
# TEST CASES HERE
data4 = list(parse_data_file("data4.csv"))
print(data4)
data5 = list(parse_data_file("data5.csv"))
print(data5)


[{'formatted_date': 3, 'high': 3, 'low': 3, 'open': 3, 'close': 3, 'volume': 8640900, 'finagled': True}, {'formatted_date': 3, 'high': 3, 'low': 3, 'open': 3, 'close': 3, 'volume': 4964900, 'finagled': True}, {'formatted_date': 3, 'high': 3, 'low': 3, 'open': 3, 'close': 3, 'volume': 8768100, 'finagled': True}, {'formatted_date': 3, 'high': 3, 'low': 3, 'open': 3, 'close': 3, 'volume': 6638300, 'finagled': True}, {'formatted_date': 3, 'high': 3, 'low': 3, 'open': 3, 'close': 3, 'volume': 6243700, 'finagled': True}]
[{'age': 22, 'height_cm': 183, 'weight_kg': 51, 'finagled': False}, {'age': 62, 'height_cm': 168, 'weight_kg': 57, 'finagled': False}, {'age': 42, 'height_cm': 150, 'weight_kg': 3, 'finagled': True}, {'age': 49, 'height_cm': 163, 'weight_kg': 90, 'finagled': False}, {'age': 49, 'height_cm': 129, 'weight_kg': 98, 'finagled': False}, {'age': 60, 'height_cm': 3, 'weight_kg': 90, 'finagled': True}, {'age': 59, 'height_cm': 181, 'weight_kg': 65, 'finagled': False}, {'age': 62, 'height_cm': 136, 'weight_kg': 83, 'finagled': False}, {'age': 18, 'height_cm': 144, 'wei

**Problem 11**

**Concept:** *Selecting columns and writing a new CSV.*

**Task:**

Write a function `process_columns(input_filename, output_filename, column_names)` that:

1. Reads `input_filename` as a CSV with a header row

2. Writes `output_filename` as a CSV containing only the columns listed in `column_names`

3. The output columns must appear in the same order as `column_names`

Assume all requested `column_names` exist in the input.

**Sample Output:**

Input file:
```
name,age,grade
Alice,20,A
Bob,22,B
```
Function call:
```
process_columns("students.csv", "output.csv", ["name", "grade"])
```
Output file:
```
name,grade
Alice,A
Bob,B
```
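
Selecting and reordering columns can also be delegated to `csv.DictWriter`, which writes fields in the order given by `fieldnames`. The sketch below runs on in-memory text instead of real files; passing `extrasaction="ignore"` (to skip unlisted columns) and `lineterminator="\n"` are choices made for this illustration.

```python
import csv
import io

# io.StringIO objects stand in for the input and output files.
source = io.StringIO("name,age,grade\nAlice,20,A\nBob,22,B\n")
target = io.StringIO()
column_names = ["name", "grade"]

reader = csv.DictReader(source)
writer = csv.DictWriter(
    target,
    fieldnames=column_names,   # output columns, in this order
    extrasaction="ignore",     # silently skip columns that are not listed
    lineterminator="\n",
)
writer.writeheader()
for row in reader:
    writer.writerow(row)

output_text = target.getvalue()
print(output_text, end="")
```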


In [None]:
import csv

def process_columns(input_filename, output_filename, column_names):

    '''Read input_filename as a CSV with a header row and write only the columns in column_names, in that order, to output_filename.'''

    with open(input_filename, newline="", encoding="utf-8") as infile:
        reader = csv.DictReader(infile)

        with open(output_filename, "w", newline="", encoding="utf-8") as outfile:
            # csv.writer handles separators and quoting, so no trailing comma is written.
            writer = csv.writer(outfile, lineterminator="\n")
            writer.writerow(column_names)

            # Write the selected values for each row, in the requested column order.
            for row in reader:
                writer.writerow([row[col] for col in column_names])

Test cases

In [148]:
# TEST CASES HERE
process_columns("data3.csv", "output.csv", ["height_cm", "weight_kg"])



**Problem 12**

**Concept:** *Computing column averages from a TSV file.*

**Task:**

Write a function `get_average_values(filename)` that reads a TSV file with a header row and numeric values, and returns a dictionary mapping each column name to the average (mean) of that column.

Assume all data values are numeric and the file has at least one row of data.

**Sample Output:**

If TSV file:

```
score1\tscore2
80\t90
70\t100
```
Output:
```
{'score1': 75.0, 'score2': 95.0}
```
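
One way to compute the averages is to collect each column's values in a list and then apply `statistics.mean`. The sketch below works on the sample contents above through `io.StringIO`; it assumes tab-separated input and that every value parses as a float.

```python
import csv
import io
from statistics import mean

# io.StringIO stands in for an open TSV file (contents mirror the sample above).
sample = io.StringIO("score1\tscore2\n80\t90\n70\t100\n")
reader = csv.DictReader(sample, delimiter="\t")

# One empty list per column name taken from the header row.
columns = {name: [] for name in reader.fieldnames}
for row in reader:
    for name, text in row.items():
        columns[name].append(float(text))

averages = {name: mean(values) for name, values in columns.items()}
print(averages)
```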

In [None]:
import csv

def get_average_values(filename):

    '''Return a dictionary mapping each column name to the mean of its values.'''
    with open(filename, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # comma-delimited, matching the test file; pass delimiter="\t" for a true TSV

        sums = {}
        counts = {}

        for row in reader:
            for col, val in row.items():
                # remove commas used as thousands separators
                val = val.replace(",", "").strip()
                v = float(val)

                sums[col] = sums.get(col, 0.0) + v
                counts[col] = counts.get(col, 0) + 1

        return {col: sums[col] / counts[col] for col in sums}

Test cases

In [154]:
# TEST CASES HERE
result = get_average_values("data3.csv")
print(result)


{'age': 48.7, 'height_cm': 161.4, 'weight_kg': 77.2}


**Problem 13**

**Concept:** *Reading JSON.*

**Task:**

Write a function `read_json_data(filename, key)` that:

* Parses the file as JSON

* If the JSON object is a dictionary and contains `key`, return its value

* Otherwise return `None`

**Sample Output:**

If JSON file:

```
{
  "title": "Data Science",
  "year": 2025
}
```
Results:
```
read_json_data(file, "title") → "Data Science"
read_json_data(file, "year") → 2025
read_json_data(file, "author") → None
```
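
The three cases can be checked on JSON text directly with `json.loads`. In this sketch `lookup` is a hypothetical helper; it relies on `dict.get`, which returns `None` for a missing key, so a key stored with a JSON `null` value also comes back as `None`.

```python
import json


def lookup(text, key):
    """Return the value stored under key when the JSON text is an object, else None."""
    data = json.loads(text)
    # Only a JSON object (a Python dict) can be indexed by key.
    if isinstance(data, dict):
        return data.get(key)
    return None


doc = '{"title": "Data Science", "year": 2025}'
title = lookup(doc, "title")
author = lookup(doc, "author")
from_list = lookup("[1, 2, 3]", "title")  # top-level value is a list, not a dict
print(title, author, from_list)
```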


In [None]:
# TO DO: WRITE YOUR SOLUTION HERE

import json

def read_json_data(filename, key):
    '''reads a json file and searches values by key'''
    with open(filename, "r", encoding="utf-8") as f:
        data = json.load(f)

    if isinstance(data, dict) and key in data:
        return data[key]
    return None




Test cases

In [157]:
# TEST CASES HERE
read_json_data("data6.json", "date")
read_json_data("data6.json", "name")

'Berlin Marathon'

**Problem 14**

**Concept:** *Reading a CSV file; computing derived values.*

**Task:**

Write a function `extract_trading_range(filename)` that reads a CSV file of stock prices.

For each row:

1. Extract `high` and `low`

2. Compute `trading_range = high - low`

3. Return a list of `trading_range` values (floats)

Assume the CSV has columns named exactly `high` and `low`.

**Sample Output:**

If CSV file:

```
date,high,low
2024-01-01,150,140
2024-01-02,200,180
```
Output:
```
[10.0, 20.0]
```
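
The per-row computation can be sketched on the sample contents above, with `io.StringIO` standing in for the CSV file. Values read by the `csv` module are strings, so each one is converted with `float()` before subtracting.

```python
import csv
import io

# io.StringIO stands in for an open CSV file (contents mirror the sample above).
sample = io.StringIO("date,high,low\n2024-01-01,150,140\n2024-01-02,200,180\n")

# Convert both fields to float, then take the difference for every row.
ranges = [float(row["high"]) - float(row["low"]) for row in csv.DictReader(sample)]
print(ranges)
```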


In [158]:
# TO DO: WRITE YOUR SOLUTION HERE


import csv

def extract_trading_range(filename):

    '''Read a CSV of stock prices and return a list of trading ranges (high - low), one float per row.'''

    ranges = []


    with open(filename, "r", encoding="utf-8", newline="") as f:

        reader = csv.DictReader(f) 

        for row in reader:
            high = float(row['high'])
            low = float(row['low'])
            ranges.append(high - low)

    return ranges
        



Test cases for Problem 14

In [159]:
# TEST CASES HERE
filename = 'SBUX.csv'
result = extract_trading_range(filename)
print(result)


[1.25, 1.3199996948242188, 2.55999755859375, 1.7900009155273438, 1.0, 0.8600006103515625, 2.2099990844726562, 2.3700027465820312, 1.5200042724609375, 1.899993896484375, 0.7600021362304688, 3.1699981689453125, 1.0800018310546875, 1.0100021362304688, 1.3899993896484375, 2.2300033569335938, 2.0900039672851562, 7.400001525878906, 2.30999755859375, 1.8600006103515625, 2.9200057983398438, 2.94000244140625, 2.55999755859375, 1.2599945068359375, 2.5600051879882812, 1.5800018310546875, 1.470001220703125, 2.7900009155273438, 1.8700027465820312, 1.44000244140625, 1.7399978637695312, 1.0, 1.5400009155273438, 3.0500030517578125, 2.0200042724609375, 2.5699996948242188, 2.529998779296875, 2.899993896484375, 1.8199996948242188, 3.3600006103515625, 3.2699966430664062, 2.30999755859375, 5.519996643066406, 3.4199981689453125, 6.1999969482421875, 4.0, 3.3300018310546875, 2.94000244140625, 2.089996337890625, 2.4499969482421875, 1.69000244140625, 1.5800018310546875, 1.0200042724609375, 1.720001220703125, 2.