## Reading in Data in Python

This is a tutorial to get you started with reading in data in Python. Reading in data seems straightforward, but it can get pretty complicated depending on how your data is formatted. In this class each project group will be dealing with different data, so the way that you end up reading in data might be different. Hopefully, this tutorial gives you a starting point for how simple data is read, so that you understand what you're doing when you move on to more complicated formats.

Some other online tutorials can be found below. These tutorials tend to use slightly different methods but all are valid ways to read in data. Some of these tutorials focus on reading in 'strings' (words) to Python which is also useful to know about:
https://newcircle.com/s/post/1572/python_for_beginners_reading_and_manipulating_csv_files#opening-a-csv-file

http://www.pythonforbeginners.com/files/reading-and-writing-files-in-python

http://www.pythonforbeginners.com/systems-programming/using-the-csv-module-in-python/

http://opentechschool.github.io/python-data-intro/core/text-files.html

A higher-level, more astronomy-specific tutorial: https://python4astronomers.github.io/files/asciifiles.html

For this tutorial I will be reading data in from the "example.txt" file. The data in this file looks like:
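
The file's contents are not reproduced here. As a minimal sketch, assuming the file holds two whitespace-separated columns that match the output printed later in this tutorial, the following would create a compatible "example.txt" in the current directory. The exact layout of the original file (spacing, comments) is an assumption.

```python
import numpy as np

# Assumed contents: two whitespace-separated columns, reconstructed from
# the printed output shown later in this tutorial.
rows = [(0, 15), (1, 30), (2, 60), (3, 120), (4, 240), (5, 480)]

with open('example.txt', 'w') as f:
    for x, y in rows:
        f.write(f"{x} {y}\n")

# Quick check that the file reads back as a 6-row, 2-column array
check = np.loadtxt('example.txt')
print(check.shape)
```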

When reading in data I find the numpy library to be particularly useful. So let's first load the numpy library.

In [1]:
import numpy as np

I use numpy to read in data with two different functions: loadtxt and genfromtxt. Both are great at reading in data when it's not too complicated.

The documentation for these two commands can be found here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html

Let's first use loadtxt to read in our data.

In [2]:
data = np.loadtxt('example.txt')

print(data)

[[   0.   15.]
 [   1.   30.]
 [   2.   60.]
 [   3.  120.]
 [   4.  240.]
 [   5.  480.]]


Yay! We have successfully read our data into Python! However, our data is not in a very usable form right now. For example, if I wanted to plot the second column versus the first column, I wouldn't be able to do that right away. So now let's change the keywords that we use to read in data:

In [3]:
data_x, data_y = np.loadtxt('example.txt', unpack=True)

print(data_x, data_y)

[ 0.  1.  2.  3.  4.  5.] [  15.   30.   60.  120.  240.  480.]


Here, I've used the keyword "unpack", which puts each of the columns in its own array. Using this keyword means that I no longer store all of my data in one variable; instead I need one variable for each column in the file. Take a look at some of the other loadtxt keywords that you can use in the documentation above.
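
As a rough illustration of a few of those other keywords, here is a sketch that reads comma-separated text with a header line. The text itself (and its "time,count,flag" header) is made up for this example and is not the tutorial's file; `delimiter`, `skiprows`, and `usecols` are documented loadtxt keywords.

```python
import io
import numpy as np

# Hypothetical comma-separated text with a one-line header
# (a stand-in for a file on disk; not the tutorial's example.txt)
text = io.StringIO("time,count,flag\n0,15,1\n1,30,0\n2,60,1\n")

t, count = np.loadtxt(
    text,
    delimiter=',',    # columns are separated by commas
    skiprows=1,       # skip the header line
    usecols=(0, 1),   # keep only the first two columns
    unpack=True,      # return one array per column
)
print(t, count)
```

With a real file you would pass the filename instead of the `io.StringIO` object.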

Let's now use genfromtxt to read in the same data. genfromtxt is a function very similar to loadtxt; however, you can use it to read in more complicated data because it has more keywords, for example for handling missing values.

In [4]:
data_x, data_y = np.genfromtxt('example.txt', unpack=True)

print(data_x, data_y)

[ 0.  1.  2.  3.  4.  5.] [  15.   30.   60.  120.  240.  480.]


So far, reading in the data works the same way. However, genfromtxt has options that loadtxt lacks, such as handling missing values. It also lets you tell Python how many rows you want to read in from your data with the max_rows keyword (loadtxt accepts max_rows too as of NumPy 1.16). Here, I tell Python to read just the first two rows:

In [5]:
data_x, data_y = np.genfromtxt('example.txt', unpack=True, max_rows=2)

print(data_x, data_y)

[ 0.  1.] [ 15.  30.]
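
One place where genfromtxt is more forgiving is data with gaps. The sketch below uses made-up comma-separated text in which one value is missing (it is not the tutorial's file). By default genfromtxt fills the gap with nan, and the `filling_values` keyword lets you choose a different placeholder; the choice of -1 here is arbitrary.

```python
import io
import numpy as np

# Hypothetical data: the second row has no value in the second column
raw = "0,15\n1,\n2,60\n"

# Default behaviour: the missing entry becomes nan
with_nan = np.genfromtxt(io.StringIO(raw), delimiter=',')

# Replace missing entries with a placeholder of your choosing
filled = np.genfromtxt(io.StringIO(raw), delimiter=',', filling_values=-1)

print(with_nan)
print(filled)
```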


## Try it yourself!

Try using keywords in loadtxt or genfromtxt to change how your data is read. Or, look at some of the tutorials about reading in data in Python and try one of those methods out!
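
If you want a starting point that does not use numpy, here is a minimal sketch of reading two columns line by line with plain Python, similar in spirit to the linked tutorials. It assumes two whitespace-separated numeric columns, and uses an in-memory stand-in so that it runs on its own; with a real file you would use `open('example.txt')` instead.

```python
import io

# Stand-in for open('example.txt'); assumed two-column layout
f = io.StringIO("0 15\n1 30\n2 60\n")

xs, ys = [], []
for line in f:
    parts = line.split()       # split on any whitespace
    if not parts:              # skip blank lines
        continue
    xs.append(float(parts[0]))
    ys.append(float(parts[1]))

print(xs, ys)
```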