# Accessing Files for Data

We have seen how we can create data and access it in the form of `lists`, `tuples` and `dictionaries`. In real life, we might gather pieces of information from a variety of places. Data curation and structuring is arguably a tedious job which unfortunately still requires a lot of human effort.

Data commonly resides in files. Think of images captured from a camera, or sound recorded from a musical instrument. Most of the time, data can be read from a file just like we can read text out of a word processing document. Here we are going to learn ways in which we can **read** and **write** text files, which comes in very handy when analysing real-life data. Python also allows us to **modify** file contents.


## Access Files

The first step is to open the file in Python using the `open()` function. The three most common modes are read-only (`r`), write (`w`), which creates the file or overwrites its existing contents, and append (`a`), which adds new contents to the end of the file.

`open(filename, mode)` opens the file with name `filename`, where `mode` is one of `"r"`, `"w"` or `"a"`.
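The three modes can be compared side by side. The following is a minimal sketch, assuming a temporary folder and a hypothetical file name `demo.txt` (neither is part of this lesson's data), so that no real file is touched.

```python
import os
import tempfile

# Work in a temporary folder; "demo.txt" is a made-up name for this sketch.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.txt")

f = open(path, "w")   # "w": create the file, or empty it if it already exists
f.write("first line\n")
f.close()

f = open(path, "a")   # "a": keep the existing contents and add to the end
f.write("second line\n")
f.close()

f = open(path, "r")   # "r": read-only; the file must already exist
contents = f.read()
f.close()

print(contents)
```

Opening the same file again with `"w"` would discard both lines, which is why `"a"` is the safer choice when existing data must be kept.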



In [None]:
# Let's open a text file named olympics.txt as read-only.

fileopen = open("olympics.txt", "r")  # note the quotes around the r flag



The above code holds a reference to the file in the identifier `fileopen`. To close the file, we can use the method `fileopen.close()`.
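Whether a file is currently open can be checked through the `closed` attribute of the file object. The sketch below assumes a temporary, hypothetical file `demo.txt`. It also shows Python's `with` statement, which is not used elsewhere in this section but closes the file automatically when the indented block ends.

```python
import os
import tempfile

# Temporary, made-up file used only for this illustration.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.txt")

handle = open(path, "w")
before = handle.closed    # the file is still open here
handle.close()
after = handle.closed     # the file has now been closed

# Alternative: "with" closes the file for us at the end of the block.
with open(path, "r") as handle2:
    inside = handle2.closed
outside = handle2.closed

print(before, after, inside, outside)
```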

In [None]:
# Write your Python code here that closes the above opened file.



## Modifying Files

Once we have a file object from the `open()` function, we can access the contents of the file using the `read()` method, which returns the entire contents as a single string. The `write()` method writes a string to the file; when the file has been opened in append mode (`a`), the string is added at the end of the file.

In [6]:

fileopen = open("olympics.txt", "a")

# Add a line towards the end of the file

fileopen.write("\nMusa Tahir,M,1,Team GB,Toddler MMA, Gold")

fileopen.close()
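To try `read()` and `write()` without changing `olympics.txt`, here is a minimal sketch on a temporary, hypothetical file `demo.txt` with shortened, made-up column names. It illustrates that `write()` does not add a newline by itself, which is why the appended string above starts with `\n`.

```python
import os
import tempfile

# Temporary, made-up file so the lesson's data stays untouched.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.txt")

f = open(path, "w")
f.write("Name,Team")              # no newline is added automatically
f.close()

f = open(path, "a")
f.write("\nMusa Tahir,Team GB")   # leading \n starts a new line first
f.close()

f = open(path, "r")
text = f.read()                   # the whole file as one string
f.close()

print(type(text))
print(repr(text))                 # repr() makes the \n visible
```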


## Iterating over File Contents

Scientific computation relies on data processing, which often means examining the contents of a file line by line. Since this is a repetitive process (the steps being the same for each line), we can write a loop for this job.

Whilst handling files, the method `readlines()` returns the contents of a file as a list of strings, one string per line. A computer considers a line to end at a special newline character, `\n`, which is normally hidden from our view.

```python
for line in myFile.readlines():
    statement1
    statement2
    ...
```
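The loop pattern above can be tried on a small example. This is a minimal sketch, assuming a temporary, hypothetical file containing three made-up lines; it shows that each string returned by `readlines()` still carries its trailing `\n`.

```python
import os
import tempfile

# Temporary, made-up file with three short lines.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.txt")

f = open(path, "w")
f.write("alpha\nbeta\ngamma\n")
f.close()

myFile = open(path, "r")
lines = myFile.readlines()        # a list with one string per line
myFile.close()

stripped = []
for line in lines:
    # rstrip("\n") removes the newline at the end of the line
    stripped.append(line.rstrip("\n"))

print(lines)
print(stripped)
```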

Using the `split()` method of a string, we can break a line into a list of values, cutting it wherever a comma `,` occurs.
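As a small illustration, the sketch below splits the line that was appended to the file earlier. The `strip()` step is an optional extra, not part of the original code, that removes stray spaces and the trailing newline from each value.

```python
# The line appended earlier, with the newline it would have inside the file.
line = "Musa Tahir,M,1,Team GB,Toddler MMA, Gold\n"

values = line.split(",")                # cut the string at every comma
print(values[0], "is from", values[3])

# Optional clean-up: remove surrounding whitespace from every value.
cleaned = [value.strip() for value in values]
print(cleaned)
```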

In [5]:
olympicsfile = open("olympics.txt", "r")

# Split each line at the commas and print selected columns.
for aline in olympicsfile.readlines():
    values = aline.split(",")
    print(values[0], "is from", values[3], "and is on the roster for", values[4])

olympicsfile.close()

Name is from Team and is on the roster for Event
A Dijiang is from China and is on the roster for Basketball
A Lamusi is from China and is on the roster for Judo
Gunnar Nielsen Aaby is from Denmark and is on the roster for Football
Edgar Lindenau Aabye is from Denmark/Sweden and is on the roster for Tug-Of-War
Christine Jacoba Aaftink is from Netherlands and is on the roster for Speed Skating
Christine Jacoba Aaftink is from Netherlands and is on the roster for Speed Skating
Christine Jacoba Aaftink is from Netherlands and is on the roster for Speed Skating
Christine Jacoba Aaftink is from Netherlands and is on the roster for Speed Skating
Christine Jacoba Aaftink is from Netherlands and is on the roster for Speed Skating
Christine Jacoba Aaftink is from Netherlands and is on the roster for Speed Skating
Per Knut Aaland is from United States and is on the roster for Cross Country Skiing
Per Knut Aaland is from United States and is on the roster for Cross Country Skiing
Per Knut Aaland 

## CSV Formatted Files

Comma-separated values (CSV) files are a very common way of making data accessible to others. Typically one would use spreadsheet software like Excel or Google Sheets to read and access the data. In Python, we can deal with them in the same way as normal text files.

In [1]:
n = [0] * 12
for i in range(1,13):
    n[i-1] = i *12
outfile = open("Multiples of 12", "w")
for j in range(0, len(n)):
    # j + 1 because list indices start at 0 and we start counting at 1
    outfile.write(str(j+1) + ',' + str(n[j]))
    outfile.write('\n')
outfile.close()
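Reading such a file back follows the same pattern as before. The following is a minimal sketch, assuming a temporary, hypothetical file `multiples.csv` that holds only the first three rows; because everything read from a text file is a string, the values are converted back to numbers with `int()`.

```python
import os
import tempfile

# Temporary, made-up CSV file with the first three multiples of 12.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "multiples.csv")

outfile = open(path, "w")
for i in range(1, 4):
    outfile.write(str(i) + ',' + str(i * 12) + '\n')
outfile.close()

infile = open(path, "r")
pairs = []
for aline in infile.readlines():
    values = aline.strip().split(",")   # strip() drops the trailing \n
    pairs.append((int(values[0]), int(values[1])))
infile.close()

print(pairs)
```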
