# 1.3.1 File Reading Example - Foreign Exchange Rate Dataset

Datafile: "14_Foreign_Exchange_Rates_PureNumeric.csv"

Originally developed by Jingwei Liu (2020-08-29)


### First, let's define some helper functions that display the data we read and report basic information about the dataset.
In this example, the dataset is read as<font color = "red"> **a list of lists** </font>, so we start with a small function that prints the values in a list of lists.

In [None]:
# Define a "ShowData" function - note the default value for the (now) optional parameter.
#  dataset is a list of lists; every element must be a string for join() to work
def ShowData(dataset = [["No dataset sent"]]):
    for r in dataset:
        # print elements in a tab-separated format
        print("\t".join(r))

# sample call
ShowData([["one", "two", "three"], ["four", "five", "six"], ["seven", "eight", "nine"]])
#show()

### Also, it is always good to check the shape of the dataset you read
*The function below shows the number of rows and columns in the list of lists.*
<br>
**Here, the row count is the number of elements in the outer list. The column count is the number of elements in each inner list.**

In [None]:
# Define a "ShowRowsAndCols" function that shows the number of rows and columns in the dataset.
# dataset is a list of lists. The row count is the number of elements in the outer list.
# The column count is taken from the first inner list (index 0), so it also works for a one-row dataset.
def ShowRowsAndCols(dataset = [["No dataset sent"]]):
    print("There are {} rows in the dataset".format(len(dataset)))
    print("There are {} columns in the dataset".format(len(dataset[0])))

# sample call
ShowRowsAndCols([["one", "two", "three"], ["four", "five", "six"]])
#show()
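
The column count above is taken from a single row, so it only describes the whole dataset if every row has the same length. As a minimal sketch (the helper name `row_lengths` and the sample data are illustrative, not part of the original notebook), one way to check this is to collect the distinct row lengths:

```python
def row_lengths(dataset):
    """Return the set of distinct row lengths found in a list of lists."""
    return {len(row) for row in dataset}

# Made-up sample in which the second row is one element short.
sample = [["one", "two", "three"], ["four", "five"]]
lengths = row_lengths(sample)
print("Distinct row lengths:", sorted(lengths))
```

If the set contains more than one value, the rows are ragged and a single column count is misleading.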

### Now, let's read the data set into a list of lists using different Python methods

One thing you should know is that each element in the list of lists is stored <font color = "red">**as a string**</font>, even if it looks like a number.

In [None]:
# Initial version - "standard programming"
#
# Define a list for the data.  Will be a list of lists.
data = []
# open the file
fname = "../data/14_Foreign_Exchange_Rates_PureNumeric.csv"
f = open(fname, "r")
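# skip the first 5 lines (the heading rows): the loop reads 6 lines,
# so after it finishes, `line` holds the 6th line, which is the first data row
for i in range(6):
    line = f.readline()
# loop until we run out of lines
while line: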
    # strip the newline and tokenize (split on commas, in this case)
    tokens = line.rstrip().split(',')
    # append this record to the dataset
    data.append(tokens)
    # read the next line
    line = f.readline()
# close the file
f.close()
# show the data
ShowData(data)

After running the cell above, you should see that the data has been read as a list of lists: each row of the file is a list, and each of those lists is an element of one larger list. **That is why we call this a list of lists.**

Now let's check the value and data type of the first element of the first row *(keep in mind that indexing in Python starts from 0)*.

In [None]:
data[0][0]

In [None]:
type(data[0][0])
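
Because every element is a string, values must be converted before any arithmetic. The sketch below is one possible way to do that for a single row. The helper `to_floats`, its `start` parameter, and the sample row are assumptions made for illustration, and the sample values are made up rather than taken from the data file.

```python
def to_floats(row, start=0):
    """Convert the values of one row to float, beginning at column `start`."""
    return [float(value) for value in row[start:]]

# Made-up row: two label-like columns followed by two numeric strings.
sample_row = ["0", "2000-01-03", "1.5", "2.25"]
converted = to_floats(sample_row, start=2)
print(converted)
print(type(converted[0]))
```

`float()` raises a `ValueError` for text that is not a number, so columns such as dates should be skipped with `start` or handled separately.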

#### A Python-esque version of the code.
You can see in this cell, it uses fewer lines to do the same work.  For your assignment, you are free to use any of the code versions as a starting point.

In [None]:
#
# Python-esque version 1
#
# Grab all the lines from the file starting with line 6, strip
# the newline and tokenize
with open("../data/14_Foreign_Exchange_Rates_PureNumeric.csv") as f:
    vdataset = [line.rstrip().split(',') for line in f.readlines()[5:]]
# show the data
ShowData(vdataset)


#### Another Python-esque version of the code
This time we use the `csv` module to read the dataset, and we read all rows.  Note that this version retains the column heading rows.

In [None]:
#
# Python-esque version 2 
#
# use the csv module
import csv
ds = []
with open("../data/14_Foreign_Exchange_Rates_PureNumeric.csv") as f:
    reader = csv.reader(f)
    for row in reader:
        ds.append(row)
# show the data
ShowData(ds)
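
If you want the `csv` module but not the heading rows, one option is to skip them with `itertools.islice`. This is a minimal sketch under stated assumptions: it reads from an in-memory string instead of the data file, and the sample text has only 2 heading lines, so the skip count would need to be adjusted for the real file.

```python
import csv
import io
from itertools import islice

# Made-up stand-in for a file: 2 heading lines followed by 2 data lines.
text = "h1,h2\nu1,u2\n1,2\n3,4\n"

with io.StringIO(text) as f:
    reader = csv.reader(f)
    # islice(reader, 2, None) drops the first 2 rows and yields the rest
    rows = list(islice(reader, 2, None))

print(rows)
```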

### After reading the file, check the number of rows and columns in the list of lists (all three versions)

In [None]:
ShowRowsAndCols(data)
ShowRowsAndCols(vdataset)
ShowRowsAndCols(ds)

### We can do some simple calculations with the dataset we read
Here, we calculate the mean value of the Australia column as an example.

In [None]:
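# use "total" rather than "sum" so the built-in sum() function is not shadowed
total = 0
# iterate from the first row to the last row
for row in data:
    # add the Australia value (column index 2) of each row to the running total
    total = total + float(row[2])
mean = total / len(data)
mean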
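
The same calculation can be wrapped in a reusable function. The sketch below assumes Python 3.8 or later for `statistics.fmean`. The helper name `column_mean` and the sample rows are illustrative, not part of the original notebook.

```python
from statistics import fmean

def column_mean(dataset, col):
    """Return the mean of column `col`, converting each string value to float."""
    return fmean(float(row[col]) for row in dataset)

# Made-up rows with numeric strings in column index 2.
sample = [["0", "d1", "1.0"], ["1", "d2", "2.0"], ["2", "d3", "3.0"]]
print(column_mean(sample, 2))
```

With the dataset read earlier, the equivalent call would be `column_mean(data, 2)`.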

### Look at the column headers

In [None]:
# Use the dataset that includes the headers (ds)
ds[0]
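
Once a header row is available, a column can be located by name instead of by a hard-coded index. The following is a minimal sketch with a made-up header row. The helper `column_index` is hypothetical, and the real header text in the data file may differ from these placeholder names.

```python
def column_index(header_row, name):
    """Return the position of `name` in a header row, ignoring surrounding spaces."""
    cleaned = [cell.strip() for cell in header_row]
    return cleaned.index(name)

# Made-up header row; one cell has stray spaces around it.
header = ["id", "date", " rate_a ", "rate_b"]
print(column_index(header, "rate_a"))
```

`list.index` raises a `ValueError` when the name is not present, which makes a misspelled column name easy to notice.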