Python has a number of ways to read in an input file -- in this first exercise, practice one of the more fundamental ways to read in a file: line by line!

**NOTE:** The only trick which is making this not a one-line solution [though I'm sure it can still be with a little creativity] is that you *must* strip whitespace from either side of the lines themselves. Do you remember the Python string command to do this? How could you look it up if you don't?

In [1]:
#grade

# INPUT:
# file is the relative path of the file being processed (string)
# OUTPUT:
# A list containing the complete collection of substrings formed by splitting on line breaks.
# NOTE:
# The output list should contain each line (even empty lines) in the order they are read (top to bottom)
# To ensure full credit, you should strip whitespace from both sides of each line.
def stringParseLine(file):
    listofstrings = []
    with open(file) as myFile:
        for line in myFile:
            listofstrings.append(line.strip())
    return listofstrings

For a slightly harder (but also more generally useful) parsing function, lets read in an input file not as a single list of lines but as a list of lists of individual cells separated by some break character. This is a very common style of format (by tab, by comma, by space, etc...) and a great way of storing a two-dimensional matrix. 

Accordingly your output here should be a matrix where line i is stored at index i and each line is first stripped of whitespace (like the above function) but then also split according to the break character.

In [2]:
#grade

# stringParseLineBreaks (5)
# INPUT:
# file is the relative path of the string file being processed (string)
# bchar is the break character (string)
# OUTPUT:
# A list of lists where each line in the file is parsed as a
# separate list by "splitting" each line at the break characters.
# NOTE:
# The output list should contain each line (even empty lines) in the order they are read (top to bottom)
# To ensure full credit, you should strip whitespace from both sides of each line *before* splitting.
def stringParseLineBreaks(file, bchar):
    listedofstrings = []
    with open(file) as myFile: 
        for line in myFile:
            listedofstrings.append(line.strip().split(bchar))
    return listedofstrings

Having completed the two functions above, now lets use them to do some data processing. For the first of these exercises given a line number and an input file, count the number of lines that exactly match the input file (excluding whitespace). 

In [3]:
#grade

# matchingLines (7.5)
# INPUT:
# an integer i corresponding to the line number we are trying to count matches
# an input file consisting of strings separated by lines
# OUTPUT:
# An integer containing the count of matching lines to the line found at index i
# NOTE: There is always at least one matching line (line i always matches itself)
def matchingLines(i, file):

    with open(file) as myFile: 
        count = 0
        text = myFile.readlines()
        #text = text[:-1]
        print(text)
        correctline = text[i]
        print(correctline)
        for line in text:
            if line == correctline:
                count +=1
        return count
        
            
            

For the final exercise lets practice reading comma-separated values files! Given a list of indices corresponding to columns (you may assume the list contains only valid indices), sum only those indices (you may assume all values you are asked to sum are integers but *not* that all columns are integers) to produce a single sum for each row. You should then return a single list containing the sum for row i at index i.

In [1]:








#grade

# sumColumns (7.5)
# INPUT:
# a list of integers corresponding to the columns to be summed on each row
# an input comma-separated values file to be parsed
# OUTPUT:
# A list of integers corresponding to the sum of the input columns at each row.
# NOTE: The output list should be the same size as the number of rows
# NOTE: You may assume all the indices you are asked to sum are integers but 
# should NOT assume that all csv columns are integers.
def sumColumns(sumIndices, file):
    # YOUR CODE HERE
    correctedlist = []
    correct = stringParseLineBreaks(file, ",")
    for line in correct:
        storelist= []
        for i in range(len(line)):
            if i in sumIndices:
                storelist.append(int(line[i]))
        correctedlist.append(sum(storelist))
    return correctedlist

In [5]:
print(stringParseLine("data/parse1.txt"))
print(stringParseLineBreaks("data/parse2.txt", "CA"))
print(matchingLines(1, "data/parse3.txt"))


['$ABC$CDE$ GGG$AA', '$1213']
[['TCGATA', 'GGATGA', 'GTACGAT'], ['T', 'GATACGACGA', '', 'GCTACT'], ['CGCGCGCGCGCGCGCGCGCGCGC'], ['ATATATATATATATATATATATA']]
['1010101110101010101010101\n', '1010101110101010101010101\n', '1010101110101010101010101\n', '1010101110101010101010101\n', '1010101110101010101010101']
1010101110101010101010101

4
