# Lesson 5

This lesson will focus on how we can read in results from a file.

I'll show you three ways to read in results from a CSV file. These can easily be extended to other file formats (with a little googling!).

First, let's create the file we'll use. Run the code below:

In [None]:
import numpy as np
import csv

with open("test_file.csv", 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Feature 1', 'Feature 2', 'Feature 3'])
    writer.writerows(np.random.random((100,3)))

This code creates a file containing 100 rows of data in 3 columns. The labels at the top of the columns are "Feature 1", "Feature 2" and "Feature 3", representing 3 different aspects of our data.

Find the file and open it to see what we're dealing with.

## No modules

First, let's open the file using no additional modules:

In [None]:
fin=open("test_file.csv", 'r').readlines()

results = []
for line in fin:
    results.append(line.strip().split(","))

titles = results[0]
results = np.array(results[1:], dtype=float)

Now, there's a LOT of code in a small space in the box above, so let's break it down.

First line:

fin=open("test_file.csv", 'r').readlines()

open is a built-in Python function that takes the name of the file you want to open and a mode: read ('r') or write ('w'). It returns a file object (Python calls it a 'text wrapper'), but it's more convenient for us to have the contents as a list of lines. We can get that list by adding ".readlines()" to the end of the line.

This code is the same as writing:

fin=open("test_file.csv", 'r')
fin=fin.readlines()

We've just put it all on one line to save space.
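
A common alternative, shown here as a minimal sketch rather than as this lesson's method, is to open the file inside a "with" block, which closes the file automatically when the block ends. The file name "with_demo.csv" and its contents are made up for this illustration.

```python
# Write a tiny made-up file so this example runs on its own.
with open("with_demo.csv", "w") as f:
    f.write("a,b,c\n1,2,3\n")

# The with block closes the file for us once the indented code finishes.
with open("with_demo.csv", "r") as f:
    demo_lines = f.readlines()

print(demo_lines)  # ['a,b,c\n', '1,2,3\n']
print(f.closed)    # True
```

Notice that each line still ends with "\n", which is why the lesson's code calls strip() on every line.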

The next line that requires some explanation is the following:

results.append(line.strip().split(","))

You'll be familiar with "results.append" by now; we're just adding our results to a list we've created called results. The rest of the line,

line.strip().split(",")

can be broken down further. First, line.strip() removes any whitespace from both ends of the line, including the newline character ("\n") at the end. Then ".split(",")" splits the line into a list, separating elements wherever there is a comma, e.g. 'a,b,c' would become ['a', 'b', 'c'].
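
To see the two steps separately, here is a small sketch using a made-up line of text (the values are not taken from test_file.csv):

```python
line = "0.12,0.56,0.98\n"    # a made-up line, as readlines() would return it

stripped = line.strip()      # whitespace (including "\n") removed from both ends
parts = stripped.split(",")  # cut the string at every comma

print(repr(stripped))  # '0.12,0.56,0.98'
print(parts)           # ['0.12', '0.56', '0.98']
```

The elements of parts are still strings at this point, not numbers.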

The final 2 lines break the data into the column titles:

titles = results[0]

and the data:

results = np.array(results[1:], dtype=float)

results[1:] says that we only want the second element onwards (so we skip the column titles). By passing our results to a numpy array, we convert them into a much more useful format, and we can convert ALL our data to numbers in one go by telling numpy that the type is float.
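
Here is the same idea on a tiny made-up list, as a sketch. The names rows, header and values are only for this illustration:

```python
import numpy as np

# Made-up rows in the same shape as the lesson's list: titles first, then data.
rows = [["Feature 1", "Feature 2"], ["0.1", "0.2"], ["0.3", "0.4"]]

header = rows[0]                          # first element: the column titles
values = np.array(rows[1:], dtype=float)  # everything else, converted to floats

print(header)        # ['Feature 1', 'Feature 2']
print(values.shape)  # (2, 2)
print(values[0, 1])  # 0.2
```

If any entry could not be read as a number (for example, if we forgot to skip the titles), numpy would raise a ValueError instead.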

That's a lot of text! So below, can you plot results? HINT: make sure you import the correct module!

In [None]:
#Your code here

## csv reader

Let's do the same as above, using the module "csv":

In [None]:
import csv

fin = open("test_file.csv", 'r')
reader = csv.reader(fin, delimiter=',')

results_csv = list(reader)
titles_csv = results_csv[0]

results_csv = np.array(results_csv[1:], dtype=float)

This is slightly more friendly! Here we use our csv reader:

reader = csv.reader(fin, delimiter=',')

to load in our csv file and do all the strip() and split(',') work behind the scenes. We can then convert the reader to a list in the same way we convert anything to a list, e.g. list(reader).

The rest of the code is the same as the previous example!
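
One variation, offered as a sketch under my own assumptions: because the reader hands back one row at a time, next(reader) can take the title row off first, leaving only data rows for list(reader). The file name "reader_demo.csv" and its contents are made up.

```python
import csv

# Write a tiny made-up file so this example runs on its own.
with open("reader_demo.csv", "w", newline="") as f:
    csv.writer(f).writerows([["x", "y"], [1, 2], [3, 4]])

with open("reader_demo.csv", "r", newline="") as f:
    reader = csv.reader(f, delimiter=",")
    demo_titles = next(reader)  # the first row: column titles
    demo_rows = list(reader)    # every remaining row

print(demo_titles)  # ['x', 'y']
print(demo_rows)    # [['1', '2'], ['3', '4']]
```

As before, the values come back as strings, so they still need converting to floats.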

So, can you plot results below? It should exactly match the graph above.

In [None]:
#Your code here

## Pandas

Pandas is my favourite module for this kind of work, let's see why:

In [None]:
import pandas as pd

results_pd = pd.read_csv("test_file.csv", header=0)

That's it!

Well... not quite, there are a few caveats, but for now, print out results_pd in the box below. Do this once with a print statement and once without, and notice the difference.

In [None]:
#Your code here

Pandas will try its best to sort out your data for you and prints it out in a friendly format. Using pandas will be the topic of our next lesson, but for now, can you plot your data:

In [None]:
#Your code here

This should be exactly the same as the other graphs.
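
As a small aside on the header=0 argument used above: it tells pandas that the first row (row 0) holds the column titles rather than data. The sketch below shows this on a made-up CSV held in memory, so the names text and demo_pd are only for illustration.

```python
import io
import pandas as pd

# A made-up two-row CSV held in memory instead of in a file.
text = "Feature 1,Feature 2\n0.1,0.2\n0.3,0.4\n"

demo_pd = pd.read_csv(io.StringIO(text), header=0)

print(list(demo_pd.columns))  # ['Feature 1', 'Feature 2']
print(demo_pd.shape)          # (2, 2)
```

The shape counts only the data rows, because the title row has been used for the column names.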

## Task

Run the code below:

In [None]:
data = np.cos(np.arange(0, 100, 0.1))
with open("task_file.csv", 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["data"])
    writer.writerows(data.reshape(-1, 1))

Find the file and open it up to see what we're working with.

Your task is to read in the data and plot a line graph with a legend (HINT: you'll need to look up how to do this!)

BONUS: can you change the colour of the line?