# W3D4: More files

## Ex 3: File based version of Subs

Create a file based version of 'SUBS'.

Again, the 'Lego' approach works. However, we first have to make a new 'brick': one that opens the file holding the sequence and the target, reads them one line at a time, stores each in a variable, and then closes the file again.

In [None]:
in_filename = 'rosalind_subs.txt'
out_filename = 'answer_subs.txt'

in_file = open(in_filename, 'r')
seq = in_file.readline().strip('\n')
target = in_file.readline().strip('\n')
in_file.close()

Before we proceed, let's first see if reading went ok:

In [None]:
print 'sequence: ',seq
print 'target: ',target
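
The read step above can also be written with a `with` block, which closes the file for you. The cell below is only a sketch: it first creates a small stand-in file (`demo_subs.txt`, a made-up name with sample data) so that it runs on its own, and it uses `print(...)` with parentheses so that it works in both Python 2 and Python 3.

```python
import os
import tempfile

# Create a small stand-in for 'rosalind_subs.txt' (sample data, not the real input).
tmp_dir = tempfile.mkdtemp()
demo_filename = os.path.join(tmp_dir, 'demo_subs.txt')
with open(demo_filename, 'w') as demo_file:
    demo_file.write('GATATATGCATATACTT\nATAT\n')

# 'with' closes the file automatically, even if reading raises an error.
with open(demo_filename, 'r') as in_file:
    seq = in_file.readline().strip()     # first line: the sequence
    target = in_file.readline().strip()  # second line: the target

print('sequence: ' + seq)
print('target: ' + target)
```

Note that `strip()` without arguments also removes spaces and Windows-style `\r` characters at the line ends, which `strip('\n')` would leave in place.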

Then we need some code to search for the target in the sequence.
Note that the code in the next cell is ***exactly*** the same as in the original SUBS solution! That's a 'Lego' brick right there....

In [None]:
n = len(target)
locs = []
for i in range(len(seq)-n+1):  # +1 so that a match at the very end of seq is not missed
    if seq[i:i+n] == target:
        locs += [i+1]
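
To make this search 'brick' easier to reuse, it can be wrapped in a function. The following is a minimal sketch; the function name `find_all` is made up for illustration. It checks every possible start position, including the last one at index `len(seq) - n`.

```python
def find_all(seq, target):
    """Return the 1-based start positions of target in seq (overlaps included)."""
    n = len(target)
    locs = []
    # The last possible start index is len(seq) - n, so the range ends one past it.
    for i in range(len(seq) - n + 1):
        if seq[i:i+n] == target:
            locs.append(i + 1)   # convert 0-based index to 1-based position
    return locs

print(find_all('GATATATGCATATACTT', 'ATAT'))
```

For this sample sequence the target is found at positions 2, 4 and 10; the first two matches overlap.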

So that was super easy, since we recycled that code. What's left is to write the results to an output file. We loop through the list that holds the solutions and collect all values in one string, separated by spaces. Then we open the output file for writing and write that string to it.

In [None]:
line = ''
for loc in locs:
    line += '%d ' % loc
line = line[:-1] # strip last space (probably not needed)

out_file = open(out_filename, 'w')
out_file.write(line+'\n')
out_file.close()
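
An alternative way to build the output line is `str.join`, which only puts a space *between* the values, so there is no trailing space to strip. This is a sketch with example positions and a made-up output file name (`demo_answer.txt`) in a temporary directory.

```python
import os
import tempfile

locs = [2, 4, 10]   # example positions, standing in for the search result

# join needs strings, so each number is converted with str() first.
line = ' '.join(str(loc) for loc in locs)

demo_out = os.path.join(tempfile.mkdtemp(), 'demo_answer.txt')
with open(demo_out, 'w') as out_file:   # 'with' closes the file when the block ends
    out_file.write(line + '\n')
```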

In [None]:
%%bash
cat answer_subs.txt

## Processing the Wheel-of-Fortune data file
Data for the Wheel of Fortune program is stored in a comma-delimited file of two
columns: name and chance. See `WoFinput.csv` for an example file.
Inside the program, we use a dictionary with names as keys and chances as
associated values. 

Create a program that reads the data from a file like WoFinput.csv and stores it
in a dictionary. Then let the program print the names and chances.

**Tip:** If `my_dict` is a dictionary, a for loop over `my_dict` gives all keys in turn.
For example, the following piece of code prints all keys from `my_dict` line by line:
```Python
for key in my_dict:
 print key
```

This is similar to using the more explicit `my_dict.keys()` method, which (in Python 2) returns a list with the keys:
```Python
for key in my_dict.keys():
 print key
```
[You can actually try this in the Python shell after reading the file into a
dictionary.]
The order of keys will be arbitrary. To get them into alphabetical order use:
```Python
for key in sorted(my_dict.keys()):
 print key
```
Now extend the program, after the part that reads the data, to write a data file
`WoFbyName.csv` with records in alphabetical order.

Before you start reading from a file, it is a good idea to explore what is in it. Shell tools are very useful for that. For instance, to quickly see how many lines the file has and what it contains:

In [None]:
%%bash
wc -l WoFinput.csv
head WoFinput.csv

The input consists of two columns, separated by a ',' (comma). So, this is the drill: open file for reading, loop through, split elements based on ',' and then store in a dictionary, where the 'keys' are from the first column, and the 'values' are from the second column. 

In [None]:
in_filename = 'WoFinput.csv'
out_filename = 'WoFbyName.csv'

src = open(in_filename)      # open file; note: default is 'r'.
header = src.readline()      # keep header for writing
data = {}                    # initialize dictionary, with name 'data'. Alt: data = dict()
for line in src:             # loop through file
    parts = line.strip().split(',')  # split in elements based on ',', returns list.
    name = parts[0]          # 'name' is from first column
    chance = float(parts[1]) # 'chance' is from second column; convert to float!
    data[name] = chance      # assign value 'chance' to dictionary 'data' with key 'name'

src.close()                  # don't forget to close the file!

print data
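
The same reading step can also be done with the standard-library `csv` module, which takes care of the splitting. The cell below is a sketch under assumptions: the header `name,chance` and the three names are made-up sample data, not the contents of the real `WoFinput.csv`, and the file name `demo_wof.csv` is hypothetical.

```python
import csv
import os
import tempfile

# Made-up sample data; header and names are assumptions, not the real WoFinput.csv.
demo_in = os.path.join(tempfile.mkdtemp(), 'demo_wof.csv')
with open(demo_in, 'w') as f:
    f.write('name,chance\nCarol,0.5\nAlice,0.2\nBob,0.3\n')

data = {}
with open(demo_in) as src:
    reader = csv.reader(src)      # yields each line as a list of column values
    header = next(reader)         # first row is the header
    for row in reader:
        if not row:               # skip empty lines
            continue
        data[row[0]] = float(row[1])   # name -> chance, converted to float
```

Unlike `readline()`, this gives the header as a list of column names rather than as one string, so it would have to be joined with ',' again before writing it back out.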

In [None]:
print '----'
for name in data.keys():  # Loop through dictionary with name 'data'
    print '%-20s : %6.2f' % (name, data[name]) # print key-value pairs, nice layout.
print '----'


Dictionaries are unordered. If you want the output ordered alphabetically, you need to take the keys, sort them, and then loop through that sorted list of keys.

Let's first have a look at the keys:

In [None]:
my_keys = data.keys()
print type(my_keys) # Note: in Python 3, keys() returns a view object (dict_keys) instead of a list.
my_keys

To make a sorted list, you can simply use the Python built-in function `sorted`, which works on lists (and other iterables). Remember that if you ever want to know what a function does, you can ask IPython:
```Python
sorted?
```


In [None]:
sorted?
sorted(my_keys)
# or, sorted in reverse:
sorted(my_keys, reverse = True)

So, now we print the sorted dictionary - sorted on **keys** that is (sorting on values is a bit more complicated)...

In [None]:

print '----'
for name in sorted(data.keys()):
    print '%-20s : %6.2f' % (name, data[name])
print '----'
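
Sorting on **values**, mentioned above as a bit more complicated, can be sketched with the `key` argument of `sorted`. The dictionary below holds made-up sample data, and the name `by_chance` is only for illustration; `print(...)` is written with parentheses so the cell runs in both Python 2 and Python 3.

```python
data = {'Carol': 0.5, 'Alice': 0.2, 'Bob': 0.3}   # made-up sample data

# key=data.get makes sorted() compare the chances instead of the names;
# reverse=True puts the highest chance first.
by_chance = sorted(data, key=data.get, reverse=True)

print('----')
for name in by_chance:
    print('%-20s : %6.2f' % (name, data[name]))
print('----')
```

Names with equal chances keep the order in which the dictionary happens to yield them, so add a second sort criterion if that order matters.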


... and we also write the sorted dictionary to a file.

In [None]:
dst = open(out_filename, 'w')
dst.write(header)
for name in sorted(data.keys()):
    dst.write('%s,%.2f\n' % (name, data[name]))
dst.close()

Finally, let's see if the file contains what we expect:

In [None]:
%%bash
cat WoFbyName.csv