# Reading and Writing to Files

1. Open and read text file
2. Read, readlines, readline
3. Parsing lines and saving information
4. Files with headers
5. Converting from string to other data types
6. Performing operations
7. Writing to a file

## Open and read a text file line by line

In [1]:
f = open("files/csvdemo.txt", "r")
for line in f:
    print(line)
f.close()

Steven,apples,3.0

Adam,cherries,4.3

Karen,apples,5.6

April,grapes,0.4


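Note that each line is read as a string. A line keeps its trailing newline, and `print` adds another, which is why the output appears double-spaced. The `"r"` argument opens the file for reading. Read mode is the default, so you only need to pass a mode when you want a different one, such as `"w"` for writing.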
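
The blank lines in the output above come from the newline that each line carries plus the one `print` adds. A minimal sketch of stripping it follows. It assumes a temporary file with two of the records rather than `files/csvdemo.txt`, so that it runs anywhere; the names `path` and `cleaned` are only for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for files/csvdemo.txt so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("Steven,apples,3.0\nAdam,cherries,4.3\n")

cleaned = []
# The with block closes the file automatically when it ends.
with open(path) as f:  # no mode given, so the file is opened for reading
    for line in f:
        cleaned.append(line.rstrip("\n"))  # drop the trailing newline
        print(cleaned[-1])
```

Using `with` here also removes the need for an explicit `f.close()`.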

In [2]:
f = open("files/csvdemo.txt")
print(type(f))
for line in f:
    print(type(line))
f.close()

<class '_io.TextIOWrapper'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>


If you want the name of the file, use the `.name` attribute of the file object.

In [3]:
fo = open("files/csvdemo.txt")
print("Name of the file: {}".format(fo.name))
fo.close()

Name of the file: files/csvdemo.txt
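
Besides `name`, a file object exposes a few other attributes that are handy when checking its state. The sketch below assumes a temporary file in place of `files/csvdemo.txt`.

```python
import os
import tempfile

# Hypothetical stand-in file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("Steven,apples,3.0\n")

fo = open(path)
print(fo.name)    # the path that was passed to open()
print(fo.mode)    # "r", the default mode
print(fo.closed)  # False while the file is open
fo.close()
print(fo.closed)  # True after close()
```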


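The `.readlines` method does what its name says: it reads all the remaining lines and returns them as a list of strings. Once the end of the file has been reached, further calls return an empty list. An optional numeric argument acts as a size hint: no more lines are read once the total size of the lines read so far exceeds the hint.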

In [4]:
fo = open("files/csvdemo.txt")
line = fo.readlines()
print("Read Line: {}".format(line))

line = fo.readlines(2)
print("Read Line: {}".format(line))

# Close opened file
fo.close()

Read Line: ['Steven,apples,3.0\n', 'Adam,cherries,4.3\n', 'Karen,apples,5.6\n', 'April,grapes,0.4']
Read Line: []


In [5]:
fo = open("files/csvdemo.txt")
line = fo.readlines(20)
print("Read Line: {}".format(line))

line = fo.readlines()
print("Read Line: {}".format(line))

# Close opened file
fo.close()

Read Line: ['Steven,apples,3.0\n', 'Adam,cherries,4.3\n']
Read Line: ['Karen,apples,5.6\n', 'April,grapes,0.4']
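
The two cells above can be reproduced in one self-contained sketch. It assumes the same four records written to a temporary file; the variable names are only for illustration. Each of the first two lines is 18 characters long, so a hint of 20 is exceeded only after the second line has been read.

```python
import os
import tempfile

# Same four records as in the cells above, written to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "csvdemo.txt")
with open(path, "w") as f:
    f.write("Steven,apples,3.0\nAdam,cherries,4.3\nKaren,apples,5.6\nApril,grapes,0.4")

with open(path) as f:
    first = f.readlines(20)  # stops after line 2: 18 + 18 = 36 characters > 20
    rest = f.readlines()     # continues from the current position
    empty = f.readlines()    # nothing left to read, so an empty list

print(first)
print(rest)
print(empty)
```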


* `read(size)` -> `size` is an optional numeric argument; the method returns at most `size` characters. If `size` is omitted, it reads the rest of the file and returns it as a single string
* `readline()` -> reads a single line from the file, including the newline at the end
* `readlines()` -> returns a list containing all the remaining lines in the file
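
The three methods share one position in the file, so each call continues where the previous one stopped. A small sketch, assuming a temporary file with three of the records from the earlier cells:

```python
import os
import tempfile

# Hypothetical stand-in file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "csvdemo.txt")
with open(path, "w") as f:
    f.write("Steven,apples,3.0\nAdam,cherries,4.3\nKaren,apples,5.6\n")

with open(path) as f:
    chunk = f.read(6)          # the first six characters
    line = f.readline()        # the rest of the current line, newline included
    remaining = f.readlines()  # list of the lines not yet consumed
    at_end = f.read()          # empty string, because the file is exhausted

print(repr(chunk))
print(repr(line))
print(remaining)
print(repr(at_end))
```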

In [6]:
f = open("files/fasta_sample.fna")
data = f.read()
print(data)
f.close()

>NC_000913 Escherichia coli K12, complete genome.
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC
TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG
TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC
ACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT
AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGG
CTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGT
ACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTG
GCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAA


In [7]:
f = open("files/fasta_sample.fna")
data = f.read(100)
print(data)
f.close()

>NC_000913 Escherichia coli K12, complete genome.
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAA


In [8]:
f = open("files/fasta_sample.fna")
data = f.readlines()
print(data)
f.close()

['>NC_000913 Escherichia coli K12, complete genome.\n', 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC\n', 'TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG\n', 'TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC\n', 'ACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT\n', 'AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGG\n', 'CTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGT\n', 'ACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC\n', 'AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTG\n', 'GCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAA']


In [9]:
f = open("files/fasta_sample.fna")
seq_info = f.readline()
data = f.readlines()
print(seq_info)
print(data)
f.close()

>NC_000913 Escherichia coli K12, complete genome.

['AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC\n', 'TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG\n', 'TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC\n', 'ACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT\n', 'AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGG\n', 'CTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGT\n', 'ACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC\n', 'AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTG\n', 'GCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAA']


In [10]:
with open("files/fasta_sample.fna", 'r') as f:
    for line in f.readlines():
        print(line)

>NC_000913 Escherichia coli K12, complete genome.

AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC

TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG

TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC

ACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT

AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGG

CTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGT

ACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC

AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTG

GCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAA


In [11]:
with open("files/fasta_sample.fna", 'r') as f:
    seq_info = f.readline()
    data = f.read()
    print(data)
    print(type(data), len(data))

AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC
TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG
TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC
ACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT
AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGG
CTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGT
ACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTG
GCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAA
<class 'str'> 548
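
The length of 548 printed above includes the newline characters between the sequence lines: nine lines of 60 bases plus eight newlines. If the goal is the sequence itself, the newlines can be removed after reading. The sketch below assumes a made-up two-line record in a temporary file, not the contents of `files/fasta_sample.fna`.

```python
import os
import tempfile

# Hypothetical miniature FASTA record, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), "tiny.fna")
with open(path, "w") as f:
    f.write(">seq1 made-up example record\nACGTAC\nGGTTAA\n")

with open(path) as f:
    header = f.readline().rstrip("\n")  # first line holds the description
    raw = f.read()                      # remaining text, newlines included
    sequence = raw.replace("\n", "")    # one continuous string of bases

print(header)
print(len(raw), len(sequence))
```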


In [12]:
f = open("files/fasta_sample.fna", 'r')
data = f.read()
print(data)
f.close()

>NC_000913 Escherichia coli K12, complete genome.
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC
TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG
TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC
ACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT
AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGG
CTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGT
ACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTG
GCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAA


In [None]:
#with open('data') as input_file, open('result', 'w') as output_file:
#   for line in input_file:
#     output_file.write(parse(line))
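
The commented-out sketch above relies on a `parse` function that is not defined in this notebook. One way to make it runnable is shown below. The `parse` here is a hypothetical placeholder that swaps commas for tabs, and the input file is created in a temporary directory so the example is self-contained.

```python
import os
import tempfile

def parse(line):
    """Hypothetical parser: replace the comma separators with tabs."""
    return line.replace(",", "\t")

workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "data")
out_path = os.path.join(workdir, "result")

# Create a small input file to read from.
with open(in_path, "w") as f:
    f.write("Steven,apples,3.0\nAdam,cherries,4.3\n")

# Both files are closed when the with block ends.
with open(in_path) as input_file, open(out_path, "w") as output_file:
    for line in input_file:
        output_file.write(parse(line))

with open(out_path) as f:
    result = f.read()
print(result)
```

Opening both files in a single `with` statement keeps the read and the write in step and avoids holding the whole input in memory.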