# Tutorial 07: Basic Data Files in Python

Sections of this notebook come from several other tutorials, including but not limited to: the official Python tutorial (https://docs.python.org/3/tutorial/inputoutput.html), the W3Schools tutorials 
(https://www.w3schools.com/python/python_ref_file.asp), the GeeksforGeeks website (https://www.geeksforgeeks.org/python/file-handling-python/), the NumPy user guide 
(https://numpy.org/devdocs/user/how-to-io.html), the Krittika Tutorials, and additional sources.

This tutorial was compiled for the PAARE project at South Carolina State University in partnership with Clemson University and the University of the Virgin Islands and funded by NSF. (NSF grant AST 2319415)

* Originally posted
    * JCash June 25, 2025
* Last modified:
    * JCash July 8, 2025

## Overview 
- File input and output
  - Simple text formats
  - csv, xls
- Accessing data
  - Numpy array slices (data-sci/ 1-numpy)
  - Pandas dataframe (data-sci/ 4-pandas)

 
In data science, it is very common for your Python code to access a file containing the data you want to analyze. Like most programming languages, there are a variety of techniques available in Python to work with file input and output (I/O for short). 

Luckily, most astronomical data is saved in formats that are easier to work with once you understand the functions available. In addition to built-in functions for I/O, there are useful functions in both the NumPy and Pandas packages that we will explore in this tutorial. 


A later tutorial will work with several other common astronomy file formats, including FITS files. 




### Imports needed

* `os` is a package that allows access to the operating system to check paths and files
* `numpy` is the standard Numerical Python Package
* `pandas` is a standard package that works with data tables.

In [2]:
import os
import numpy as np
import pandas as pd

### Data files needed

The data files used in this tutorial are located on GitHub with the tutorials. 

If you downloaded the entire tutorial repository, the files should be stored in a folder called `data` in the same directory as the tutorials. 

If you download the tutorials individually, you will need to create a new folder called `data` alongside the tutorials and download/copy the individual files into that data folder. 



<blockquote> 
    
    **Caution**
    
    Depending on how you are opening this tutorial or running this code in Python, the kernel will have different rules for how you must specify the pathname for the datafile you are accessing. You may need to change the path definition in the various cells which refer to the data files you should use.
    
</blockquote>

The cell below can check to see if you have the first file we will be using in the expected location.

In [None]:
testfilename = './data/the-zen-of-python.txt'

if os.path.isfile(testfilename):
    print('It finds the file in the expected location')
else:
    print('The file was not found in the expected location.')
    print("Your notebook's working directory is: \n")
    print(os.getcwd())


## 1.0) Data file formats

Before you can choose the best way to import your data file, you need to know more about the format of that data file. 


### ASCII files

ASCII files are generic text files that can be read or produced by most applications, including word processors, spreadsheets, and text editors. Three common ASCII data file extensions are .DAT, .CSV, and .TXT.

ASCII files can be viewed easily in text editors and web browsers. You will want to look at the file contents (at least the first few lines) to understand the data better. 

Things to look for:
* Can I view the text?
* Is there a common format on each line of the file?
* What separates one piece of information from the next (space, comma, tab)?

### Spreadsheet files

Add more info here...

.xls and .ods

### Other data formats

Many other file formats can be used for storing data, and complete coverage of them is beyond the scope of this tutorial. A later tutorial in this series covers a common astronomical data file type called a FITS (`.fits`) file. 

### Data Examples

Throughout this tutorial, we will show specific examples of opening data files with the various techniques. For these examples, all data files are contained in a directory `data/` stored alongside the tutorial files. You will need to ensure these data files are downloaded/uploaded with the Jupyter notebooks. 

If you are working on a jupyter-notebook server such as Anaconda on the Cloud or the Rubin Science Platform, you should be able to view the data files in the jupyter-server. 

When we say `filename`, that is a string which contains both the path to the file and the name of the file. 

If you download the entire tutorial directory with the `data/` directory, you can reference individual files with the syntax `./data/filename`.
For example, the first file we will look at is named "the-zen-of-python.txt". In that case, the full filename with path would be written as 
`"./data/the-zen-of-python.txt"`
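
As a small sketch of the path convention described above, the standard-library function `os.path.join` can assemble the same filename string from its pieces. The variable names `data_dir` and `zen_path` are only illustrative, and the example assumes the `data/` folder sits alongside the notebook.

```python
import os

# Folder that holds the data files, relative to the notebook (assumed layout).
data_dir = os.path.join(".", "data")

# Join the folder and the file name into one filename string.
zen_path = os.path.join(data_dir, "the-zen-of-python.txt")

print(zen_path)                    # the relative path
print(os.path.abspath(zen_path))   # the same path written out in full
print(os.path.isfile(zen_path))    # True only if the file is really there
```

Printing the absolute path is a quick way to see where Python is actually looking when a file is not found.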



## 2.0)  Unformatted text files

If a file contains ASCII text but there is no standard format line by line, then you will probably need to read each line of the file into string variables. 

Depending on what you need from the file, you may use a variety of string functions and conditional statements to extract that information. 

### 2.1) Built-in open function

Within Python, one of the built-in functions is `open()`.


- A typical call will be of the form
    - `file = open(filename)`
- The parameter you pass in this function is `filename` which should be a string variable
- Optionally you can specify the mode
    - The default if you do not specify the mode is 'r' for read access
    - Other common options are: 'w' to write to a file and 'a' to append to an existing file
- The output is a file object (not the contents of the file)
    - You still need to use other functions to read or write to that file
 
The **file object** is a Python class with a variety of methods available. 
We will look at several of these to help you understand the options.

As we move forward, we will use shorter versions of these calls. 
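
To make the three modes concrete, here is a minimal sketch that writes, appends to, and then reads a small scratch file. The file name `open_modes_demo.txt` and the variable names are made up for this illustration, and the file is placed in the system temporary folder so that nothing in `data/` is touched.

```python
import os
import tempfile

# A scratch file in the system temporary folder (hypothetical name).
demo_path = os.path.join(tempfile.gettempdir(), "open_modes_demo.txt")

# 'w' creates the file, or overwrites it if it already exists.
demo_file = open(demo_path, "w")
demo_file.write("first line\n")
demo_file.close()

# 'a' keeps the existing contents and adds to the end.
demo_file = open(demo_path, "a")
demo_file.write("second line\n")
demo_file.close()

# 'r' (the default) opens the file for reading only.
demo_file = open(demo_path, "r")
contents = demo_file.read()
demo_file.close()

print(contents)
```

Running the cell a second time gives the same result, because the `'w'` call starts the file over each time.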

In [None]:
#Uncomment the line below by removing the # symbol to see the full help information on this function.
#help(open)

In [None]:
#Specifying the filename. 
filename = "./data/the-zen-of-python.txt"

print(type(filename))

In [None]:
#Opening a file to create a file object.
fileobj = open("./data/the-zen-of-python.txt",'r')
print(type(fileobj))

#### Checking a file

When you first work with a data file, you may need to check to see if it is readable before moving forward. 

In general, you will already know what type of file you have and can skip this step. 

In [None]:
#Testing if a file is readable.
print(fileobj.readable())

#### Reading the full file

You can read the full file into one big string using the `.read()` method.


In [None]:
fileobj = open(filename)
result = fileobj.read()
print(type(result))
print(len(result))
result

#### Reading in the file line by line

The `readlines()` method will give you a list where each item in the list is a string containing one line of the file.

You can then do things with each line of the file by indexing the list and using string operations.
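
One detail worth seeing on a tiny example is that each string returned by `readlines()` still ends with its newline character. The sketch below uses a made-up three-line scratch file (hypothetical name `readlines_demo.txt`) and the string method `rstrip` to remove it.

```python
import os
import tempfile

# Create a small scratch file to read back (hypothetical name and contents).
demo_path = os.path.join(tempfile.gettempdir(), "readlines_demo.txt")
demo_file = open(demo_path, "w")
demo_file.write("alpha beta\ngamma\ndelta epsilon\n")
demo_file.close()

# Read every line into a list of strings.
demo_file = open(demo_path)
demo_lines = demo_file.readlines()
demo_file.close()

print(demo_lines)        # each string still ends with '\n'

# Build a second list with the newline removed from each line.
clean_lines = []
for line in demo_lines:
    clean_lines.append(line.rstrip("\n"))

print(clean_lines)
print(clean_lines[0].upper())   # string operations work on each item
```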

In [None]:
fileobj = open(filename)
lines = fileobj.readlines()

print(type(lines))

In [None]:
#Shows the number of lines in the file.
len(lines)

In [None]:
#Printing out a single line by indexing.
print(lines[4])

In [None]:
#Print a subsection of the lines.
print(lines[4:6])

In [None]:
#Iterating over the list of lines to test for a substring.
for line in lines:
    if "by" in line:
        print(line)

#### Splitting the lines into words

If you needed to separate each line of the file into individual words, we can then use string splitting on the list of lines. 
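
How the split is called matters. The short sketch below compares `split(' ')` with `split()` on one sample string (typed in by hand for this illustration, with a double space added on purpose).

```python
# A sample line with a double space and a trailing newline.
sample = "Beautiful is  better than ugly.\n"

by_space = sample.split(' ')   # split at every single space character
by_default = sample.split()    # split on any run of whitespace

print(by_space)     # has an empty string and keeps the '\n' on the last word
print(by_default)   # just the words
```

With `split(' ')`, two spaces in a row produce an empty string and the newline stays attached to the last word. With no argument, `split()` treats any run of spaces, tabs, or newlines as one separator.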


In [None]:
fileobj = open(filename)          #open the fileobject
lines = fileobj.readlines()       #read the lines into a list of lines
words = []                        #create an empty list to hold the words
for line in lines:                #go line by line
    words.append(line.split(' ')) # split each line at the spaces

print(words[0:4])

#### Closing a file

Notice that in the above examples, we had to open the file each time. Technically, we should be closing the file in between these open calls. This releases the system resources that were being held by the file object.

Two tools exist in Python:
* `.closed` is an attribute (written without parentheses) that is `True` if the fileobj is closed and `False` if it is still open
* `.close()` is a method that closes an open fileobj

Execute the three cells below to 
1) check the status (should get that it is still open)
2) close the file
3) check the status (should now get that it is closed)


In [None]:
#check to see if a fileobject is closed
if fileobj.closed:
    print('it is closed')
else:
    print('it is still open')

In [None]:
#the syntax to close a file
fileobj = open(filename)
lines = fileobj.readlines()
fileobj.close()

In [None]:
#check to see if a fileobject is closed now
if fileobj.closed:
    print('it is closed')
else:
    print('it is still open')

**with statements to open a file**

Using a `with` statement allows python to open the file, execute a section of code and then properly close the `fileobject` without having to do an explicit `close` command. For this reason, it is often the preferred method. 

The syntax is a little different for the order of the command but it contains the same information in a more compact format.

Since the variable for the fileobject is only used inside the with statement, it is often shortened to just `f` (just make sure you have not already used that variable name for something else).
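
The automatic closing can be checked directly. This is a minimal sketch using a made-up scratch file (`with_demo.txt` in the system temporary folder); the `.closed` attribute is printed inside and after the `with` block.

```python
import os
import tempfile

# Scratch file for the demonstration (hypothetical name).
with_demo_path = os.path.join(tempfile.gettempdir(), "with_demo.txt")

# Write two lines; the file is closed when the indented block ends.
with open(with_demo_path, 'w') as f:
    f.write("line one\nline two\n")
    print('Inside the block, closed is:', f.closed)

print('After the block, closed is:', f.closed)

# Read the lines back the same way.
with open(with_demo_path) as f:
    demo_lines = f.readlines()

print(demo_lines)
```

The variable `f` still exists after the block, but the file it refers to has been closed, so any further reading or writing needs a new `open` call.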



In [None]:
filename = "./data/the-zen-of-python.txt"
with open(filename, 'r') as f:
    lines = f.readlines()
    words =[]
    for line in lines:
        words.append(line.split(' '))

print(words)

### 2.2) Advantages of the open function

The advantage of using the built-in `open()` function in Python is that it will work for any ASCII text file. 

- The file can contain any number of lines.
- The length of each line doesn't matter.
- The lines can contain any type of information.
- You can treat each line any way you need to in order to extract the information you need.

### 2.3) Disadvantages of the open function

Although the `open()` function is very powerful, there are often more efficient ways to access the data in the file **IF** it has a well ordered structure for the data. 

You do need to know how your data are laid out before using these other methods, but once you do, you can choose the best of the methods covered in the rest of this tutorial. 

## 3.0) Column formatted text files 

If the data file contains ascii text with a standard format on each line, we have some more efficient ways to read in and work with the data.

In particular, in Data Science we often have columns of data with each row containing the same number of columns. 

Below are examples of a few files that we will now be using. We summarize the first few lines of each file as raw text here just to give you a view of each file. 

We could use the open file method described in the previous section. 

It will read in the lines and put the strings into a list. 

To use the values as numbers, we would still need to: 
- Strip off the newline character,
- Split the lines into strings,
- Convert each string into a number.

This works as shown below (without a detailed explanation of each step), but takes a lot of code to do everything we need.

> **Note** This is not the recommended method to read in a data file like this. 

In [None]:
filename = "./data/syn.txt" 
with open(filename) as f:
    lines = f.readlines()

data =[]
for line in lines:
    temp = line.rstrip('\n')
    vals = temp.split(' ')
    values = []
    for val in vals:
        values.append(float(val))
    data.append(values)
    
data[0:2]

### 3.1) Numpy loadtxt

**If** the data file contains ASCII text with numbers organized into columns, a more efficient option is the NumPy function `np.loadtxt`.

The general syntax is `data = np.loadtxt(filename, delimiter = None, skiprows = 0)`. 

If you do not specify a delimiter, it will assume whitespace.

If you do not specify a number of rows to skip at the start of the file, it will start with the first line.

Full documentation is given at:
https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html

#### Files with just numbers and spaces

In [12]:
#Here is the code to read in the syn.txt file. 
filename = "./data/syn.txt"     #Set the path to the file.
data = np.loadtxt(filename)     #Read in the file contents to a numpy array, no special options needed.

# Here we can view the first three rows to make sure that it is reading in the data as expected
data[0:3]

array([[ 0.04896234, 12.0700342 ],
       [ 0.47187692, 12.64081926],
       [ 0.66405504, 12.85580919]])

In [13]:
#Here we can examine information about the data.
print('Type:',type(data))
print('Length:',len(data))
print('Dimensions:',data.ndim)
print('Shape:',np.shape(data))
print('Type of individual value:',type(data[0][0]))

Type: <class 'numpy.ndarray'>
Length: 500
Dimensions: 2
Shape: (500, 2)
Type of individual value: <class 'numpy.float64'>


As you can see above, the data is immediately accessible as a NumPy 2D array with just a single line of code to read in the data from the file and format it as numbers. 

#### Files with a header

A header is a line or lines at the top of the data file containing information about the data. This information is very useful in understanding the data, but we need to be careful in how we read in the file. 

For `np.loadtxt` you can skip reading in these rows using the `skiprows=` parameter. 
- The default value is 0 (no lines are skipped)
- Otherwise, it should be an integer equal to the number of lines to skip before reading the data.
- Lines which start with the `#` symbol are considered comments and are skipped automatically, so a header that starts with `#` does not need `skiprows`; if you do use `skiprows`, those comment lines are included in the count

For our example files, 
- syn.txt had no header
- GCN25560.txt has a single line header
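
The two header cases can be compared on a tiny example. In this sketch the file contents are made up and typed into strings, and `io.StringIO` is used so that `np.loadtxt` can read a string as if it were a file on disk.

```python
import io
import numpy as np

# Hypothetical file contents, one with a '#' header and one with a plain header.
hash_header = "# time flux\n1.0 10.5\n2.0 11.5\n"
plain_header = "time flux\n1.0 10.5\n2.0 11.5\n"

# A header that starts with '#' is treated as a comment and skipped for us.
a = np.loadtxt(io.StringIO(hash_header))

# A plain-text header has to be skipped explicitly.
b = np.loadtxt(io.StringIO(plain_header), skiprows=1)

print(a)
print('Same result both ways:', np.array_equal(a, b))
```

Leaving off `skiprows=1` in the second call would raise an error, because the words in the header cannot be converted to numbers.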


In [None]:
#Example that skips the header line.
filename = './data/GCN25560.txt'
data = np.loadtxt(filename,skiprows=1)
data[0:2]

#### Files that are comma separated

By default, `np.loadtxt` assumes that the separator between the data columns is whitespace. 
If a data file has commas separating the values in the columns, we can still use the same method, but we have to specify the delimiter.

These comma-separated-values files are often given the extension of `.csv` but can also have `.txt` or `.dat` extensions.


In [15]:
#Here is a comma delimited example with one header row.
filename = './data/galaxies.txt'
data = np.loadtxt(filename, delimiter =',', skiprows =1)

data[0:2]

array([[1.0000000e+00, 1.3337110e+02, 5.7598427e+01, 3.9515216e-02],
       [2.0000000e+00, 1.3368567e+02, 5.7480250e+01, 4.1055806e-02]])

#### Getting Numpy column data to work with

If you read your data into a numpy 2D array, then you can access a data column using index slicing.

In general, `data[:,i]` will grab all rows for the i index column and return a 1D array.
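
As an alternative sketch, `np.loadtxt` also accepts `unpack=True`, which returns the columns directly so that each one can be assigned to its own variable. The three rows of numbers here are made up, and the names `demo`, `first_col`, `x`, and `y` are only illustrative.

```python
import io
import numpy as np

# Made-up comma-separated contents with two columns.
text = "1.0,10.0\n2.0,20.0\n3.0,30.0\n"

# Option 1: read the 2D array, then slice out the columns.
demo = np.loadtxt(io.StringIO(text), delimiter=',')
first_col = demo[:, 0]
second_col = demo[:, 1]

# Option 2: ask loadtxt to hand back one array per column.
x, y = np.loadtxt(io.StringIO(text), delimiter=',', unpack=True)

print(first_col, second_col)
print(x, y)
```

Both options give the same 1D arrays. Slicing is convenient when you also want to keep the full 2D array.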

In [16]:
# This grabs all rows from the column with index 0 and assigns it to the variable col0
col0 = data[:,0]

# We can examine that variable to better understand the shape and format
print('Dimensions:',col0.ndim)
print('Shape of the column array:',col0.shape)
print('Type of column array:',type(col0))
print('First few values of col0', col0[0:2])

Dimensions: 1
Shape of the column array: (4656,)
Type of column array: <class 'numpy.ndarray'>
First few values of col0 [1. 2.]


In [20]:
# This syntax grabs the index 3 column and creates a 1D array
arr3 = data[:,3]

# We can examine that variable to better understand the shape and format
print('Dimensions:',arr3.ndim)
print('Shape of the column array:',arr3.shape)
print('Type of column array:',type(arr3))
print('First few values of arr3', arr3[0:2])


Dimensions: 1
Shape of the column array: (4656,)
Type of column array: <class 'numpy.ndarray'>
First few values of arr3 [0.03951522 0.04105581]


### 3.2) Using Numpy if data are not all numbers

NumPy is most efficient when working with numbers. Furthermore, a NumPy array holds only one data type. By default, `np.loadtxt` assumes that the data can all be converted to float values. 

If even one value in the data file is a non numeric string, `np.loadtxt` will give an error when it tries to convert that string into a float value. 

We can work around this by specifically telling numpy to use a string data type when working with the file. 

Below are examples of using `np.loadtxt` with the `Moons_and_planets.csv` file.
- The first cell shows the correct syntax to use to get strings
- The second cell shows the error you will get without the data type
    - uncomment the command and execute to see the error

---
**Correct method**

In [None]:
#This is the correct call
file = "./data/Moons_and_planets.csv"
data2 = np.loadtxt(file,dtype="str",delimiter=',',skiprows=1)

print(type(data2))
data2[0:2]

---
**Incorrect method**

Which will give an error.

In [None]:
file = "./data/Moons_and_planets.csv"
#This will give a ValueError

#Uncomment the line below to see what the error looks like
#data2 = np.loadtxt(file,delimiter=',',skiprows=1)

With the correct call, the NumPy array is an array of string values: both the words and the numbers are left as strings.
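
If numbers are needed from such a string array, one possible approach is to slice out the numeric column and convert it with `.astype(float)`. The sketch below uses two rows taken from the moons table shown later in this tutorial; the header names `name,planet,diameter_km` are made up for the illustration.

```python
import io
import numpy as np

# Two rows in the style of the moons file (header names are hypothetical).
moon_text = "name,planet,diameter_km\nMoon,Earth,1737.1\nPhobos,Mars,11.1\n"

# Read everything as strings, skipping the header line.
moons = np.loadtxt(io.StringIO(moon_text), dtype=str, delimiter=',', skiprows=1)

names = moons[:, 0]                     # stays as an array of strings
diameters = moons[:, 2].astype(float)   # converted to an array of floats

print(names)
print(diameters)
print('Largest diameter:', diameters.max())
```

This works, but each numeric column has to be converted by hand, which is part of the motivation for the Pandas approach in the next section.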


Additional information on using Numpy including a few additional functions and formats can be found at:

https://numpy.org/doc/stable/user/how-to-io.html

### 3.3) Reading in files with Pandas

Since data files may have mixed data types, NumPy is not the right choice for all data files. Several other packages focus on different ways to work with data files. **Pandas** is a very commonly used one. 

The pandas documentation describes it this way: 

> pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Full documentation and user guides can be found at https://pandas.pydata.org/docs/

Advantages of pandas
- It deals with multiple data types easily
- The resulting data structure for numeric columns can be easily converted to numpy arrays
- There are functions and methods to work with the data in the table
- Any header information can be used to define column names (instead of just skipping the lines)

#### Pandas read csv

When working with data files with comma separated values, you can use the `pd.read_csv()` function. 

The required parameter is the string filename, and the output is a pandas dataframe object.

It is common to use df in the output variable name to indicate that it is a DataFrame, but this is not required.

In [24]:
#Reading in the datafile
filename = './data/galaxies.csv'
gal_df = pd.read_csv(filename)

type(gal_df)

pandas.core.frame.DataFrame

#### Viewing the Dataframe
We can use the same format to examine the first few lines of the data as we did with the NumPy arrays, but we immediately see that the output is easier to read. 

The column names were taken from the header automatically, and the values are shown in the normal decimal place format instead of the scientific notation format we saw in the numpy arrays. 

When you view an entire dataframe that is large, you will get a truncated view with the first five rows, a line of `...`, and then the last five rows. At the end it shows the shape of the dataframe.
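
Besides displaying the whole dataframe, a few DataFrame tools give a quick look at part of it. This is a minimal sketch on a made-up table of 12 rows, built in a string and read with `io.StringIO` so that no data file is needed; the name `demo_df` is only illustrative.

```python
import io
import pandas as pd

# Build 12 made-up rows of comma-separated text with a header line.
csv_text = "id,value\n" + "".join(f"{i},{i * 0.5}\n" for i in range(12))
demo_df = pd.read_csv(io.StringIO(csv_text))

print(demo_df.shape)      # (number of rows, number of columns)
print(demo_df.head())     # first five rows by default
print(demo_df.tail(3))    # last three rows
```

`head()` and `tail()` take an optional number of rows, and `shape` is an attribute, so it is written without parentheses.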


In [26]:
# Show the dataframe as a whole
gal_df

| | # mangaid | objra | objdec | redshift |
|---|---|---|---|---|
| 0 | 1 | 133.37110 | 57.598427 | 0.039515 |
| 1 | 2 | 133.68567 | 57.480250 | 0.041056 |
| 2 | 3 | 136.01717 | 57.092330 | 0.046571 |
| 3 | 4 | 133.98996 | 57.677967 | 0.014351 |
| 4 | 5 | 136.75137 | 57.451435 | 0.046406 |
| ... | ... | ... | ... | ... |
| 4651 | 4652 | 228.41486 | 28.244461 | 0.046080 |
| 4652 | 4653 | 226.99060 | 28.881860 | 0.058178 |
| 4653 | 4654 | 228.07332 | 29.657210 | 0.069022 |
| 4654 | 4655 | 227.04141 | 29.222193 | 0.111297 |


In [27]:
# You can show specific lines with their index range

#index 0 up to 3
gal_df[0:3]

| | # mangaid | objra | objdec | redshift |
|---|---|---|---|---|
| 0 | 1 | 133.3711 | 57.598427 | 0.039515 |
| 1 | 2 | 133.68567 | 57.48025 | 0.041056 |
| 2 | 3 | 136.01717 | 57.09233 | 0.046571 |


In [29]:
# You can show specific lines with their index range

#index 150 up to 156
gal_df[150:156]

| | # mangaid | objra | objdec | redshift |
|---|---|---|---|---|
| 150 | 151 | 317.04016 | -0.250806 | 0.095097 |
| 151 | 152 | 316.97476 | 0.875871 | 0.05805 |
| 152 | 153 | 317.3149 | 0.523469 | 0.050923 |
| 153 | 154 | 322.94946 | -1.03917 | 0.051821 |
| 154 | 155 | 322.9288 | 0.357835 | 0.030129 |
| 155 | 156 | 324.092 | 0.947727 | 0.103859 |


#### Getting Pandas column data to work with

If you read your data file into a Pandas dataframe, you can extract a single column using the column names taken from the header. 

If you aren't sure what the names are, you can use the `df.columns` attribute as shown below. You will get an Index object containing the column names as strings.


In [30]:
# List the columns in the gal_df dataframe
gal_df.columns

Index(['# mangaid', 'objra', 'objdec', 'redshift'], dtype='object')

To get just one column you will use the format `df['name']` which will return an object which is a Pandas Series. 

In most situations, this Series can be used as if it was a NumPy array. 

The example below shows the extraction of the redshift column and then uses a NumPy function to return the maximum value.
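
When a true NumPy array is wanted, a Series can be converted with its `.to_numpy()` method. The sketch below is self-contained: it uses three rows copied from the galaxy table shown earlier, typed into a string, and the names `small_df`, `z`, and `z_array` are only illustrative.

```python
import io
import numpy as np
import pandas as pd

# Three rows copied from the galaxy table above, as comma-separated text.
csv_text = "objra,redshift\n133.37110,0.039515\n133.68567,0.041056\n136.01717,0.046571\n"
small_df = pd.read_csv(io.StringIO(csv_text))

z = small_df['redshift']    # a pandas Series
z_array = z.to_numpy()      # the same values as a NumPy array

print(type(z))
print(type(z_array))
print('Maximum from the Series method:', z.max())
print('Maximum from the NumPy function:', np.max(z_array))
```

A Series also has its own methods such as `.max()`, `.min()`, and `.mean()`, so the conversion is often optional.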

In [34]:
# This extracts the redshift column and assigns it to the variable name reds
reds = gal_df['redshift']

# Here we get some info on that extracted column
print('The column is the type:',type(reds))
print('The column has a length of:', len(reds))

# We can do math on the column
print('The maximum value is:',np.max(reds))

The column is the type: <class 'pandas.core.series.Series'>
The column has a length of: 4656
The maximum value is: 0.27818364


#### Pandas with other column data files

We can use the `pd.read_csv()` function even if the data has a different separator.

You will need to specify the `delimiter` keyword or the equivalent `sep` keyword (short for separator).

- A single space delimiter will use `sep=' '`
- A variable number of white spaces will use `sep=r'\s+'` (the `r` prefix makes it a raw string, so Python leaves the backslash alone)
- A comma would use `sep=','`; if you leave off `sep`, the comma is assumed

If you don't have a header line on the file, you can use the `header=None` keyword. In that case, the column names will be the index numbers.
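
If index numbers are not convenient as column names, `pd.read_csv` also accepts a `names=` list. This is a sketch on made-up contents with uneven spacing and no header line; the column names `time` and `flux` are chosen just for the illustration.

```python
import io
import pandas as pd

# Made-up contents: no header line, uneven runs of spaces between columns.
text = "0.05   12.07\n0.47 12.64\n0.66    12.86\n"

# Without names, the columns are labeled 0 and 1.
no_names = pd.read_csv(io.StringIO(text), sep=r'\s+', header=None)

# With names, we supply the labels ourselves.
with_names = pd.read_csv(io.StringIO(text), sep=r'\s+', header=None,
                         names=['time', 'flux'])

print(list(no_names.columns))
print(list(with_names.columns))
print(with_names)
```

The raw string `r'\s+'` matches one or more whitespace characters, so the uneven spacing in the sample text is not a problem.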

In [22]:
#Using the GCN25560.txt file
filename = './data/GCN25560.txt'
df1 = pd.read_csv(filename,sep=r'\s+')

df1[0:2]

| | JD | dt_minutes | ap_mag | Mag_err |
|---|---|---|---|---|
| 0 | 2458725.366 | 51.16 | 16.93 | 0.01 |
| 1 | 2458725.372 | 59.8 | 17.21 | 0.02 |


In [43]:
#Using the syn.txt file
#Here we set the delimiter and also say no header
filename = './data/syn.txt'
df2 = pd.read_csv(filename, sep=' ', header=None)

print('Column names:\n',df2.columns)

df2[0:2]

Column names:
 Index([0, 1], dtype='int64')


| | 0 | 1 |
|---|---|---|
| 0 | 0.048962 | 12.070034 |
| 1 | 0.471877 | 12.640819 |


#### Pandas with mixed data types

While the Moons_and_planets data file was very hard to deal with using numpy, pandas has no difficulty with it at all. 

We can examine the dataframe to see the datatypes for each column.

In [41]:
#Here we see that it easily handles text and numbers
filename = "./data/Moons_and_planets.csv"
df3 = pd.read_csv(filename)

df3[0:2]

| | # Name of Moon | Name of Planet | Diameter (km) |
|---|---|---|---|
| 0 | Moon | Earth | 1737.1 |
| 1 | Phobos | Mars | 11.1 |


In [42]:
print('Column names:',df3.columns)
print('DataTypes for each column:\n',df3.dtypes)

Column names: Index(['# Name of Moon', ' Name of Planet', ' Diameter (km)'], dtype='object')
DataTypes for each column:
 # Name of Moon      object
 Name of Planet     object
 Diameter (km)     float64
dtype: object
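
The column names printed above keep the space that follows each comma in the header, so `df3[' Name of Planet']` needs that leading space. One possible cleanup, sketched here on a small stand-in table rather than the real file, is to strip the whitespace from the names with `.str.strip()`. The sample rows are typed in by hand and `moons_df` is an illustrative name.

```python
import io
import pandas as pd

# Stand-in contents: the header has a space after each comma.
text = "# Name of Moon, Name of Planet, Diameter (km)\nMoon,Earth,1737.1\nPhobos,Mars,11.1\n"
moons_df = pd.read_csv(io.StringIO(text))

print(list(moons_df.columns))     # names keep the leading space

# Remove leading and trailing whitespace from every column name.
moons_df.columns = moons_df.columns.str.strip()

print(list(moons_df.columns))     # names without the extra space
print(moons_df['Diameter (km)'].max())
```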


## 4.0) Working with spreadsheet files

While text and csv files are ASCII files where the data is stored in a way that you can view directly, spreadsheet programs such as Microsoft Excel or LibreOffice Calc create files that are not ASCII. 

To read in the data files, we need to use different approaches.

### 4.1) Convert to csv

If you only need to work with one spreadsheet file, it may be easier to open that spreadsheet with spreadsheet software and use the `Save As` options to save it out as a .csv file. Then you can use the methods described above to bring the new .csv file into Python.

### 4.2) Pandas and Excel

Excel is used commonly enough that Pandas has a function to work with Excel files: `pd.read_excel()` instead of `pd.read_csv()`. Reading `.xlsx` files requires the optional `openpyxl` package to be installed.

The function call is very similar to the `read_csv` if you only have a single sheet in the spreadsheet file. 

If you have multiple sheets or only need to pull in a specific range of cells from the spreadsheet, there are keyword parameters to do this. 

We show only the simple example here.

In [None]:
filename = './data/galaxies.xlsx'

df = pd.read_excel(filename)

df[0:2]

## 5.0) Writing Output files

This section covers:
* writing out a NumPy 1D array as a column
* writing multiple 1D arrays as columns
* writing out a 2D array as rows and columns
* adding a header
* choosing delimiters 

### Simple ASCII text files of numpy arrays.

Similar in nature to the `np.loadtxt()` function, we have a `np.savetxt()` function.

The basic syntax is `np.savetxt(fname,X)`
* fname is a string for the filename with path
* X is a 1D or 2D numpy array

Additional optional keyword arguments are:
* `delimiter=' '` is the string or character separating columns
* `header=''` is a string that will be written at the beginning of the file (by default it is prefixed with `# `, so `np.loadtxt` will treat it as a comment)
* `footer=''` is a string that will be written at the end of the file

There are several additional options  that are not covered in this tutorial, but full documentation can be found at:
https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html#numpy.savetxt
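
A quick way to check what `np.savetxt` wrote is to read the file back. This is a minimal round-trip sketch using made-up arrays and a scratch file in the system temporary folder (the file name `savetxt_demo.txt` and the variable names are hypothetical).

```python
import os
import tempfile
import numpy as np

# Two small made-up arrays combined into one table with two columns.
t = np.arange(5)
t_squared = t**2
table = np.stack([t, t_squared], axis=1)     # shape (5, 2)

# Write the table as comma-separated integers with a header line.
out_path = os.path.join(tempfile.gettempdir(), 'savetxt_demo.txt')
np.savetxt(out_path, table, delimiter=',', header='t,t_squared', fmt='%d')

# Look at the first line of the file: the header gets a '# ' prefix.
with open(out_path) as f:
    first_line = f.readline()
print(repr(first_line))

# Read the numbers back; the '#' header line is skipped automatically.
check = np.loadtxt(out_path, delimiter=',')
print('Values match:', np.array_equal(check, table))
```

The values come back as floats, since that is the default for `np.loadtxt`, but they are equal to the integers that were written.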

--- 
**Examples**

Each of these examples will use the same data and create a new output file. Once you run the code you can view the output files to see the effects of each choice. If you rerun the code, it will overwrite any previous version of that output file. 

Notice that several of the examples also use the `np.stack` function to combine multiple arrays into one 2D array.
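
The effect of the `axis` argument is easiest to see from the shapes. This short sketch uses two made-up arrays of four values each; the names `a`, `b`, `as_rows`, and `as_cols` are only illustrative.

```python
import numpy as np

a = np.arange(4)
b = a**3

# axis=0 puts each input array on its own row.
as_rows = np.stack([a, b], axis=0)

# axis=1 puts each input array in its own column.
as_cols = np.stack([a, b], axis=1)

print(as_rows.shape)
print(as_cols.shape)
print(as_cols)

# The two results are transposes of each other.
print(np.array_equal(as_cols, as_rows.T))
```

Since `np.savetxt` writes one line per row of the array, `axis=1` is the choice that gives one column per variable in the output file.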

In [None]:
#Setting up some arrays to test writing out data
x = np.arange(50)
y = x**3


In [None]:
#This will create a file with the x values listed as a column
np.savetxt('out1.txt',x)

In [None]:
#This will create a file with the x values listed as a column with some formatting
np.savetxt('out2.txt',x,fmt='%.1f')

In [None]:
#This will create a file with the x and y values listed as individual rows with space delimiters.
np.savetxt('out3.txt',np.stack([x,y],axis=0),fmt='%.1f')

In [None]:
#This will create a file with the x and y values listed as individual columns with space delimiters.
np.savetxt('out4.txt',np.stack([x,y],axis=1),fmt='%.1f')

In [None]:
#This will create a file with the x and y values listed as individual columns with comma delimiters.
np.savetxt('out5.txt',np.stack([x,y],axis=1),delimiter=',',fmt='%.1f')

In [None]:
#This will create a file with the x and y values listed as individual columns with comma delimiters.
#A header line is added as well
np.savetxt('out6.txt',np.stack([x,y],axis=1),delimiter=',',header = 'x,y',fmt='%.1f')

In [None]:
#This will create a file with the x and y values listed as individual columns with tab delimiters.
#A header line is added as well
np.savetxt('out7.txt',np.stack([x,y],axis=1),delimiter='\t',header = '   x \t y',fmt='%5.1f')

### Simple csv file from pandas

The previous section looked at writing out data from numpy arrays. If your data is in a pandas dataframe, then you can write out the data to a csv file using methods from the pandas library. 

To continue with one of our previous examples, we used pandas to read in the `syn.txt` data file. We can now write that file out as a `.csv` file.

Since it is a method it is called on the dataframe object with the general format of:
`mydf.to_csv(filename)`

You can use additional optional keyword arguments to adjust the formatting of the data in the file. 
Full documentation can be found at: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
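
As a small self-contained sketch of the round trip, the code below builds a made-up dataframe, writes it with `to_csv`, prints the raw file, and reads it back. The scratch file name `to_csv_demo.csv` and the column values are hypothetical.

```python
import os
import tempfile
import pandas as pd

# A small made-up table with two named columns.
demo_df = pd.DataFrame({'time': [0.0, 1.0, 2.0], 'flux': [12.0, 12.5, 13.0]})

# Write it out without the column of row index numbers.
csv_path = os.path.join(tempfile.gettempdir(), 'to_csv_demo.csv')
demo_df.to_csv(csv_path, index=False)

# Show the raw text that was written.
with open(csv_path) as f:
    print(f.read())

# Read it back in and compare the columns.
round_trip = pd.read_csv(csv_path)
print(list(round_trip.columns))
print(round_trip.shape)
```

Leaving off `index=False` would add an extra first column holding the row index, which then shows up as an unnamed column when the file is read back.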

In the code cell below, we first read in the `.txt` data file and then write it out to a new filename as a `.csv` file

In [None]:
#Using the syn.txt file
#Here we set the delimiter and also say no header
filename = './data/syn.txt'
syndf = pd.read_csv(filename, sep=' ', header=None)

#Here we write out the file with a header and not including the column of row index
syndf.to_csv('./data/syn.csv',header = ['time','flux'], index = False)

# Assignments


## Exercise 1

In this exercise, you will be working with the file 'NGC5272.txt'

---
**Instructions**

1) Use the built-in `open` function to read in the lines of the file
    - Don't forget to close the file when done
2) Print the first three lines of the file
   - Use this to determine if the file has a header line
   - Use this to determine what delimiter is used
3) Using numpy, read in the data
    - Print out the number of data rows
    - Print out the first five rows of data

In [None]:
# Step 1: use open and readlines to get the data


# Step 2: print out the first three lines


# Step 3: use numpy to read the data


# Step 3b: print out the number of rows of data


# Step 3c: print out the first five rows of data




## Exercise 2  

This exercise continues with the same data file `NGC5272.txt` you worked with in the last exercise. 

---
**Instructions**

1) Use Pandas to read the data into a dataframe
2) Show the top 5 rows of the dataframe
3) Write out the dataframe to a `.csv` file


In [None]:
#Put your code (feel free to separate your code into multiple cells)


## Exercise 3

To practice reading and writing data files, complete this exercise using the syn.txt datafile. You will practice both the numpy and pandas ways to read in the data and get specific columns to work with. 

---
**Instructions**

1) Use np.loadtxt to read in the `syn.txt` file
2) Print out the dimensions of the data
3) Assign the first column to the variable name x and the second to the variable name y


4) Now use Pandas to read in the `syn.txt` file
5) Print out the dimensions of the data
6) Assign the first column to the variable name x and the second to the variable name y


In [None]:
# Code here for 1-3

In [44]:
# Code here for 4-6