# In-class exercises Class 09

---

Often we will want to read data from a file with our Python program.  Obviously we don't want to type all of the data in by hand every time that we run our program!  Data can come in many types of formats, but today we will focus on text files.  These are simple ASCII files that can contain any characters (letters and numbers) and are human readable.  That is, if you look at the file, you will understand what is stored there.  




## Running Linux command-line programs from within Python 

Before we get started, let me show you a trick to run Linux commands from inside of Python.  We can use the "os" module, which stands for Operating System.  We can run Linux command-line programs with the "os.popen" command, and use the ".read()" method to store the returned output in a variable.  I'll call it "reply".  

This can be useful if you want to have access to information inside your program about the current directory, or the Python environment. 

In [53]:
import os
reply=os.popen("ls -al").read()
print(reply)


total 676
drwxr-sr-x 4 rcg6p users   6144 Sep 26 10:56 .
drwxr-sr-x 3 rcg6p users    512 Sep 26 10:27 ..
-rw-r----- 1 rcg6p users  17546 Sep 26 10:56 class09.ipynb
-rw-r--r-- 1 rcg6p users  47099 Sep 26 10:09 co2_mm_mlo0.txt
-rw-r----- 1 rcg6p users    104 Sep 26 10:09 data.dat
drwxr-sr-x 8 rcg6p users   5632 Sep 26 10:13 .git
-rw-r--r-- 1 rcg6p users  69632 Sep 26 10:34 image2.png
-rw-r--r-- 1 rcg6p users 479505 Sep 26 10:36 image3.png
-rw-r--r-- 1 rcg6p users   9155 Sep 26 10:21 index.jpg
drwxr-sr-x 2 rcg6p users    512 Sep 26 10:19 .ipynb_checkpoints
-rw-r--r-- 1 rcg6p users     68 Sep 26 10:53 output2.txt
-rw-r--r-- 1 rcg6p users    150 Sep 26 10:41 output3.txt
-rw-r--r-- 1 rcg6p users     68 Sep 26 10:52 output.txt
-rw-r--r-- 1 rcg6p users    251 Sep 26 10:13 README.md



Do you understand what this did?  

Can you see "data.dat"?  If so, great.  We can get to work!
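
The same check can also be made without running a Linux command, using functions from the "os" module itself. This is only a minimal sketch and not part of the original exercise; it assumes you are running from the directory that should hold data.dat:

```python
import os

cwd = os.getcwd()                       # the directory Python is running in
entries = sorted(os.listdir("."))       # names of everything in that directory
has_data = os.path.exists("data.dat")   # True only if data.dat is present

print(f"Working directory: {cwd}")
print(f"{len(entries)} entries found")
print(f"data.dat present: {has_data}")
```

Because the answer comes back as a Python list and a True/False value rather than one long string, it is easier to use later in a program.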

Let's look at it...  One way to do it (definitely not the best way!) is with the os.popen command again...

In [54]:

reply=os.popen("more data.dat").read()
print(reply)


::::::::::::::
data.dat
::::::::::::::
Iris-setosa	5.1  3.5  1.4  0.2
Iris-versicolor 7.0  3.2  4.7  1.4  
Iris-virginica	6.3  3.3  6.0  2.5  



This is a very simple data file.  Have you heard of the Iris?  It is a flower:
<br>
![Image](index.jpg) 
<br>
In our dataset there are 5 columns:
1. class
2. sepal length in cm
3. sepal width in cm
4. petal length in cm
5. petal width in cm


## Reading files with Python

We will start by learning how to read a very simple text file.  I named it data.dat and you just verified that it is part of your repository.  You could read it using "os.popen" like we did above, but Python has much better file input/output tools than that!  We are going to play with this simple data file.  Let's first see *the right way* to open the file for reading in Python:

In [55]:
f=open('data.dat','r')  #Open data.dat for reading ('r'); the "file handle" is returned and we call it "f"
f.name  #We can get the name of the file from the file handle
data=f.read()  #read all of the data from the file
print(data)   

Iris-setosa	5.1  3.5  1.4  0.2
Iris-versicolor 7.0  3.2  4.7  1.4  
Iris-virginica	6.3  3.3  6.0  2.5  



Great!  That was easy (like so many things in Python!).  We now know how to read in the contents of a file!  For now, we just have one string that holds the full contents of the file, and we can do anything to it that we could do to a string!  
<br> But our file includes some strings and some numbers, and clearly each line corresponds to the attributes of one Iris class.  So, we can read this in a smarter way!  

BTW:  When we are done reading a file, *we should always close it*:

In [56]:
f.close()
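
One way to make sure the close always happens, even if something goes wrong while reading, is a try/finally block. The sketch below is an illustration rather than part of the exercise; it first writes a small stand-in file (sample_iris.dat, a made-up name) so that it runs anywhere:

```python
# Create a small stand-in file so the example is self-contained.
# "sample_iris.dat" is a made-up name, not a file from the class repository.
f = open("sample_iris.dat", "w")
f.write("Iris-setosa 5.1 3.5 1.4 0.2\n")
f.close()

f = open("sample_iris.dat", "r")
try:
    data = f.read()   # if this line fails, "finally" still runs
finally:
    f.close()         # always runs, so the file is never left open

print(data)
print(f.closed)       # the handle remembers that it has been closed
```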

Our file is clearly organized by lines.  There is a nice method, readlines(), that reads in the file and puts each line into a list as a string:

In [57]:
f=open('data.dat','r') 
data_list=f.readlines()
print(f"Our file, {f.name}, has {len(data_list)} lines of text.\n")
f.close()

Our file, data.dat, has 3 lines of text.



See how I checked how many lines of data were in the file?  Now, if we wanted, we could do something with each line in the file.

In [58]:
mydat=[]
for line in data_list:  #Loop over data list made from lines in file
    row=[]
    for i in line.strip().split(): #loop over the independent entries in the row after splitting it up
        row.append(i)  #append the entries to the row list
    for j in (1,2,3,4): #Loop over the 4 floats
        row[j]=float(row[j])  #Convert the strings to floats
    mydat.append(row)  #Append the row to mydat
print(mydat)

[['Iris-setosa', 5.1, 3.5, 1.4, 0.2], ['Iris-versicolor', 7.0, 3.2, 4.7, 1.4], ['Iris-virginica', 6.3, 3.3, 6.0, 2.5]]
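
Once the rows are parsed like this, individual values can be looked up by index. A small sketch, with the parsed list copied in by hand from the output above so that it runs on its own:

```python
# The parsed rows, copied from the printed output so this runs on its own.
mydat = [['Iris-setosa', 5.1, 3.5, 1.4, 0.2],
         ['Iris-versicolor', 7.0, 3.2, 4.7, 1.4],
         ['Iris-virginica', 6.3, 3.3, 6.0, 2.5]]

first_row = mydat[0]                        # one whole row (a list)
names = [row[0] for row in mydat]           # column 0: the class names
petal_widths = [row[4] for row in mydat]    # column 4: petal width in cm

print(first_row)
print(names)
print(f"Widest petal: {max(petal_widths)} cm")
```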


Now we have access to the data in the file data.dat.  The basic process would work the same for any file where the columns are separated by whitespace (spaces or tabs), because split() with no arguments splits on any run of whitespace.  What would you change if they were separated by commas?  

## **EXERCISE 1**:  
<span style="color:red"> Which line would you change if they were separated by commas?   </span>

<span style="color:red"> 1) Answer here.     </span>

It turns out that there is a much better way to read in a file line-by-line.  With f.readlines(), you read the entire file into memory.  This is fine for our simple file, but what if the file had 100,000 lines of data?  Huge memory footprint!  Since file handles are iterable objects, a for loop over the handle reads and converts the file one line at a time, and the "with" statement takes care of closing the file for us.  Observe:

In [59]:
mydat=[]
with open("data.dat") as f:
    
    for line in f:
        row=line.strip().split()
        #print(row)
        for j in (1,2,3,4): #Loop over the 4 floats
            row[j]=float(row[j])  #Convert the strings to floats
        mydat.append(row)
print(mydat)

[['Iris-setosa', 5.1, 3.5, 1.4, 0.2], ['Iris-versicolor', 7.0, 3.2, 4.7, 1.4], ['Iris-virginica', 6.3, 3.3, 6.0, 2.5]]


The "with" statement automatically closes the file when its block ends, and reading one line at a time means that we never need to hold the entire file in memory as strings at once.  
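
You can check the automatic close for yourself by looking at the handle's "closed" attribute. This is a minimal sketch that writes its own one-line stand-in file (sample_with.dat, a made-up name) so it does not depend on the class repository:

```python
# Write a one-line stand-in file; "sample_with.dat" is a made-up name.
with open("sample_with.dat", "w") as f_out:
    f_out.write("Iris-virginica 6.3 3.3 6.0 2.5\n")

n_lines = 0
with open("sample_with.dat") as f:
    for line in f:            # the file handle hands back one line per pass
        n_lines += 1
    print(f.closed)           # still inside the block, so the file is open

print(f.closed)               # the block has ended, so the file is closed
print(n_lines)
```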

Now we have Python lists, but let's put them into a numpy array.

In [60]:
import numpy as np
a=np.array(mydat)
print(a)


[['Iris-setosa' '5.1' '3.5' '1.4' '0.2']
 ['Iris-versicolor' '7.0' '3.2' '4.7' '1.4']
 ['Iris-virginica' '6.3' '3.3' '6.0' '2.5']]


## **EXERCISE 2**:  
<span style="color:red"> Why are they strings?   </span>

2) Answer here.  

## **EXERCISE 3**:  
<span style="color:red"> Extract a numpy array of the sepal length as a float, in cm, and print the array, the length of the array, the average value, and the standard deviation to the screen. Hint: numpy has built-in tools for calculating the average and standard deviation! Use the documentation!  </span>

In [72]:
#3) write the code here

There are many built-in tools for reading in data with numpy, to make it easier.  The most common data format is several columns of numeric data.  Numpy has a convenient "loadtxt" function for reading in this data.  The file "co2_mm_mlo0.txt" contains the monthly mean CO2 mole fraction since 1958.  The mole fraction of CO2, expressed in parts per million (ppm), is the number of molecules of CO2 in every one million molecules of dried air (water vapor removed). Let's see what a few lines of data look like.

![Image](image2.png) 

We can read in the data from this file using the np.loadtxt() function:

In [73]:
co2_data=np.loadtxt("co2_mm_mlo0.txt")
shape=np.shape(co2_data)
print(shape)

(675, 7)


So, our file has 675 rows of data, each with 7 columns.
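
Once the data is in a two-dimensional array, rows and columns can be picked out with slices. The sketch below uses a few made-up numbers in place of a real file (np.loadtxt also accepts a file-like object such as io.StringIO), so the values mean nothing in themselves:

```python
import io
import numpy as np

# Made-up numbers standing in for a real data file; lines starting
# with "#" are treated as comments by np.loadtxt and skipped.
text = """# x  y  z
1.0 10.0 100.0
2.0 20.0 200.0
3.0 30.0 300.0
"""

table = np.loadtxt(io.StringIO(text))
print(table.shape)      # (rows, columns)
print(table[:, 1])      # every row of the second column
print(table[0])         # the first row
```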

## Writing Files in Python

So, now we know two ways to read data in.  Let's learn how to write to a file.  Writing to a file is very similar to printing to the screen, but we are printing to the file.  First we need to open the file for writing.  Then we can write our strings to the file before closing it.  We can make our strings the normal ways in Python (f-strings, or other string manipulation).  The old way is using the "+" operator or format codes:

![Image](image3.png) 

For example:

In [63]:
myint=20
mystr="test"
myfloat=37.23454
mystring="You could make a string like this: int:%d string:%s float %4.2f" %(myint,mystr,myfloat)
print(mystring)

You could make a string like this: int:20 string:test float 37.23


The new way is to use f-strings:

In [64]:
mystring=f"You could make a string like this: int:{myint} string:{mystr} float {myfloat}"
print(mystring)

You could make a string like this: int:20 string:test float 37.23454


f-strings are easier, in that Python figures out the types for you.  
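
If you do want control over how a value is printed, a format code can be placed after a colon inside the braces. A short sketch using the same three variables, showing that the two styles can give identical strings:

```python
myint = 20
mystr = "test"
myfloat = 37.23454

# A format spec after the colon works like the old % codes.
old_style = "int:%d string:%s float %4.2f" % (myint, mystr, myfloat)
new_style = f"int:{myint:d} string:{mystr:s} float {myfloat:4.2f}"

print(new_style)
print(old_style == new_style)
```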

Now, let's write our string to a file.  When we open the file, we specify 'w' for write (which replaces any existing contents) or 'a' for append (which adds to the end):

In [65]:
f_out=open('output.txt','w')
f_out.write(mystring)
f_out.close()

In [66]:
f_in=open('output.txt','r')
reply=f_in.read()
f_in.close()
print(reply)


You could make a string like this: int:20 string:test float 37.23454


As noted above, it is preferable to use the "with" statement:

In [67]:
with open('output2.txt','w') as f_out:
    f_out.write(mystring)
    

In [68]:
f_in=open('output2.txt','r')
reply=f_in.read()
f_in.close()
print(reply)

You could make a string like this: int:20 string:test float 37.23454


That works too!  
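
The difference between the 'w' and 'a' modes mentioned earlier can be seen in a short sketch. The file name sample_modes.txt is made up for this illustration:

```python
# "sample_modes.txt" is a made-up file name used only for this sketch.
with open("sample_modes.txt", "w") as f_out:
    f_out.write("first line\n")

with open("sample_modes.txt", "a") as f_out:   # 'a' adds to the end
    f_out.write("second line\n")

with open("sample_modes.txt") as f_in:
    after_append = f_in.read()

with open("sample_modes.txt", "w") as f_out:   # 'w' starts over from empty
    f_out.write("only line\n")

with open("sample_modes.txt") as f_in:
    after_write = f_in.read()

print(after_append)
print(after_write)
```

Note that write() does not add a newline for you the way print() does, which is why each string ends in "\n".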

For writing data to a file, you can format a string and write it to a file one line at a time using loops, or you can use a numpy function called "savetxt":

In [69]:
x=np.linspace(1,3,3)
y=x**3
np.savetxt("output3.txt",(x,y))
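
For comparison, the loop-based approach mentioned above might look like the following. This is a minimal sketch with a made-up output file name (sample_rows.txt), reusing a few values from data.dat:

```python
# Rows to save; the values come from the first two lines of data.dat.
rows = [("Iris-setosa", 5.1, 3.5),
        ("Iris-versicolor", 7.0, 3.2)]

# "sample_rows.txt" is a made-up file name used only for this sketch.
with open("sample_rows.txt", "w") as f_out:
    for name, length, width in rows:
        # One formatted line per row; "\n" ends the line.
        f_out.write(f"{name:<16s}{length:6.1f}{width:6.1f}\n")

with open("sample_rows.txt") as f_in:
    lines = f_in.readlines()

print(lines)
```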


## **EXERCISE 4**:  
<span style="color:red"> Explain what the "linspace" code above did: </span>

4)  Put your explanation here...  

In [70]:
f_in=open('output3.txt','r')
reply=f_in.read()
f_in.close()
print(reply)

1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00
1.000000000000000000e+00 8.000000000000000000e+00 2.700000000000000000e+01



## **EXERCISE 5**:  
<span style="color:red"> That output is ugly!  Yuck!  Use Google or the documentation in your book to figure out how to write this to a file in a more reasonable format.  Also include a "comment" line at the top of the file. Write to your file, then read it back in and print it to the screen.  </span>

In [71]:
# 5) write the code here