# Introduction
In this module, we explore new ways of acquiring data in our applications. Up to this point, we have been using data values that are either entered by the user or hard-coded in our programs. This limits the types of tasks our applications can handle. In this module we will learn how to use Python to interface with external files. We will learn how to open files, read files line-by-line, and write data to new or existing files. 

# Files 
## Opening Files
Python does not manipulate files on disk directly: the operating system owns the files stored on the computer, and the Python interpreter talks to the operating system on our behalf. Therefore, when a file is ‘opened’, Python gives us a file handler (the Python documentation calls it a file object) that acts as a middleman between our code and the file. If you think of it from an ownership perspective, Python owns the commands in our program and the operating system owns the files. Neither understands how to interact with the other directly, so a middleman is needed to translate Python’s requests into directions the operating system understands. 

In [1]:
with open('support/random_numbers.txt') as fileHandler: 
    print(fileHandler)
    print(type(fileHandler))

<_io.TextIOWrapper name='support/random_numbers.txt' mode='r' encoding='UTF-8'>
<class '_io.TextIOWrapper'>


The **with** keyword above provides a safe way of working with files. A file that is opened must also be closed to ensure data integrity: if your code crashes after opening a file, the file may be left open, and data waiting to be written may be lost. The with statement ensures that the file is closed when the with block completes, even if an error occurs inside the block. Code that needs access to the file handler should be indented one level beyond the with statement (just like conditional and iteration blocks).  
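
The closing behaviour described above can be checked with the handler's `closed` attribute. The following is a minimal sketch, assuming a throwaway file created just for the demonstration (the name `demo.txt` and its contents are made up, not part of this module's support files):

```python
import os
import tempfile

# Create a small throwaway file so the example runs on its own.
# The file name and contents are hypothetical.
demoPath = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demoPath, "w") as fileHandler:
    fileHandler.write("1,2,3\n")

# Normal case: the handler is open inside the block and closed after it.
with open(demoPath) as fileHandler:
    insideBlock = fileHandler.closed   # False while inside the block
afterBlock = fileHandler.closed        # True once the block has ended

# Error case: the file is still closed even though the block raised.
try:
    with open(demoPath) as fileHandler:
        raise ValueError("simulated crash")
except ValueError:
    pass
afterError = fileHandler.closed        # True

print(insideBlock, afterBlock, afterError)
```

Running this should print `False True True`: the file is only open while the with block is executing.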

## Reading files
### Line-by-Line
There are several mechanisms for reading from files. The first, and probably easiest, is to treat the file handler as an iterable: each pass through a for loop produces the next line of text from the source file. The code below reads the text one line at a time and exits the for loop when there are no lines left to read. 

Using fileHandler as an iterable...

In [3]:
with open('support/random_numbers.txt') as fileHandler: 
    for index, line in enumerate(fileHandler):
        print(f"{index}: {line}")

0: 926,927,928,929,930

1: 646,647,648,649,650,651,652,653,654,655,656,657,658,659,660

2: 488,489,490,491,492,493,494,495

3: 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45

4: 169,170,171,172,173,174,175,176,177,178,179,180

5: 419,420

6: 455,456,457,458,459,460,461,462,463,464,465

7: 883,884,885

8: 241,242,243,244,245,246,247,248,249,250,251,252,253,254,255

9: 786,787,788,789,790,791,792,793,794,795

10: 137,138,139,140,141,142,143,144,145,146,147,148,149,150

11: 912,913,914,915

12: 81,82,83,84,85,86,87,88,89,90

13: 256,257,258,259,260,261,262,263,264,265,266,267,268,269,270

14: 450

15: 649,650,651,652,653,654,655,656,657,658,659,660

16: 194,195

17: 825

18: 739,740,741,742,743,744,745,746,747,748,749,750

19: 805,806,807,808,809,810

20: 395,396,397,398,399,400,401,402,403,404,405

21: 358,359,360

22: 351,352,353,354,355,356,357,358,359,360

23: 575,576,577,578,579,580,581,582,583,584,585

24: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15



In [6]:
with open('support/random_numbers.txt') as fileHandler: 
    # start=1 makes enumerate count lines from 1 instead of 0
    for lineNumber, line in enumerate(fileHandler, start=1):
        print(f"Line {lineNumber}: {line}") 

Line 1: 926,927,928,929,930

Line 2: 646,647,648,649,650,651,652,653,654,655,656,657,658,659,660

Line 3: 488,489,490,491,492,493,494,495

Line 4: 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45

Line 5: 169,170,171,172,173,174,175,176,177,178,179,180

Line 6: 419,420

Line 7: 455,456,457,458,459,460,461,462,463,464,465

Line 8: 883,884,885

Line 9: 241,242,243,244,245,246,247,248,249,250,251,252,253,254,255

Line 10: 786,787,788,789,790,791,792,793,794,795

Line 11: 137,138,139,140,141,142,143,144,145,146,147,148,149,150

Line 12: 912,913,914,915

Line 13: 81,82,83,84,85,86,87,88,89,90

Line 14: 256,257,258,259,260,261,262,263,264,265,266,267,268,269,270

Line 15: 450

Line 16: 649,650,651,652,653,654,655,656,657,658,659,660

Line 17: 194,195

Line 18: 825

Line 19: 739,740,741,742,743,744,745,746,747,748,749,750

Line 20: 805,806,807,808,809,810

Line 21: 395,396,397,398,399,400,401,402,403,404,405

Line 22: 358,359,360

Line 23: 351,352,353,354,355,356,357,358,359,360

Line 24: 575,576,577,578,579,580,581,582,583,584,585

Line 25: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

Using .readline()...

In [7]:
with open('support/random_numbers.txt') as fileHandler: 
    line = fileHandler.readline()
    print(line)

926,927,928,929,930



In [9]:
with open('support/random_numbers.txt') as fileHandler: 
    lineNumber = 1
    while True:
        line = fileHandler.readline()
        if not line:
            break
        print(f"Line {lineNumber}: {line}") 
        lineNumber += 1

Line 1: 926,927,928,929,930

Line 2: 646,647,648,649,650,651,652,653,654,655,656,657,658,659,660

Line 3: 488,489,490,491,492,493,494,495

Line 4: 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45

Line 5: 169,170,171,172,173,174,175,176,177,178,179,180

Line 6: 419,420

Line 7: 455,456,457,458,459,460,461,462,463,464,465

Line 8: 883,884,885

Line 9: 241,242,243,244,245,246,247,248,249,250,251,252,253,254,255

Line 10: 786,787,788,789,790,791,792,793,794,795

Line 11: 137,138,139,140,141,142,143,144,145,146,147,148,149,150

Line 12: 912,913,914,915

Line 13: 81,82,83,84,85,86,87,88,89,90

Line 14: 256,257,258,259,260,261,262,263,264,265,266,267,268,269,270

Line 15: 450

Line 16: 649,650,651,652,653,654,655,656,657,658,659,660

Line 17: 194,195

Line 18: 825

Line 19: 739,740,741,742,743,744,745,746,747,748,749,750

Line 20: 805,806,807,808,809,810

Line 21: 395,396,397,398,399,400,401,402,403,404,405

Line 22: 358,359,360

Line 23: 351,352,353,354,355,356,357,358,359,360

Line 24: 575,576,577,578,579,580,581,582,583,584,585

Line 25: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

Using .readlines()...

In [10]:
with open('support/random_numbers.txt') as fileHandler: 
    lines = fileHandler.readlines()

In [11]:
lines

['926,927,928,929,930\n',
 '646,647,648,649,650,651,652,653,654,655,656,657,658,659,660\n',
 '488,489,490,491,492,493,494,495\n',
 '31,32,33,34,35,36,37,38,39,40,41,42,43,44,45\n',
 '169,170,171,172,173,174,175,176,177,178,179,180\n',
 '419,420\n',
 '455,456,457,458,459,460,461,462,463,464,465\n',
 '883,884,885\n',
 '241,242,243,244,245,246,247,248,249,250,251,252,253,254,255\n',
 '786,787,788,789,790,791,792,793,794,795\n',
 '137,138,139,140,141,142,143,144,145,146,147,148,149,150\n',
 '912,913,914,915\n',
 '81,82,83,84,85,86,87,88,89,90\n',
 '256,257,258,259,260,261,262,263,264,265,266,267,268,269,270\n',
 '450\n',
 '649,650,651,652,653,654,655,656,657,658,659,660\n',
 '194,195\n',
 '825\n',
 '739,740,741,742,743,744,745,746,747,748,749,750\n',
 '805,806,807,808,809,810\n',
 '395,396,397,398,399,400,401,402,403,404,405\n',
 '358,359,360\n',
 '351,352,353,354,355,356,357,358,359,360\n',
 '575,576,577,578,579,580,581,582,583,584,585\n',
 '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15\n']

In [18]:
lines[0]

'926,927,928,929,930\n'

In [19]:
len(lines)

25

In [16]:
for index, line in enumerate(lines):
    print(f"Line {index}: {line}") 

Line 0: 926,927,928,929,930

Line 1: 646,647,648,649,650,651,652,653,654,655,656,657,658,659,660

Line 2: 488,489,490,491,492,493,494,495

Line 3: 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45

Line 4: 169,170,171,172,173,174,175,176,177,178,179,180

Line 5: 419,420

Line 6: 455,456,457,458,459,460,461,462,463,464,465

Line 7: 883,884,885

Line 8: 241,242,243,244,245,246,247,248,249,250,251,252,253,254,255

Line 9: 786,787,788,789,790,791,792,793,794,795

Line 10: 137,138,139,140,141,142,143,144,145,146,147,148,149,150

Line 11: 912,913,914,915

Line 12: 81,82,83,84,85,86,87,88,89,90

Line 13: 256,257,258,259,260,261,262,263,264,265,266,267,268,269,270

Line 14: 450

Line 15: 649,650,651,652,653,654,655,656,657,658,659,660

Line 16: 194,195

Line 17: 825

Line 18: 739,740,741,742,743,744,745,746,747,748,749,750

Line 19: 805,806,807,808,809,810

Line 20: 395,396,397,398,399,400,401,402,403,404,405

Line 21: 358,359,360

Line 22: 351,352,353,354,355,356,357,358,359,360

Line 23: 575,576,577,578,579,580,581,582,583,584,585

Line 24: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

Notice that the loops above print a blank line after every line. This is because each line read from the source file ends with a newline character (\n), and the print function appends its own newline to whatever it prints. It is therefore good practice to use the .strip() method to remove the trailing newline (along with any other leading or trailing whitespace) from each line we read from the source file. 

In [22]:
for index, line in enumerate(lines):
    cleanLine = line.strip()
    print(f"Line {index}: {repr(cleanLine)}") 

Line 0: '926,927,928,929,930'
Line 1: '646,647,648,649,650,651,652,653,654,655,656,657,658,659,660'
Line 2: '488,489,490,491,492,493,494,495'
Line 3: '31,32,33,34,35,36,37,38,39,40,41,42,43,44,45'
Line 4: '169,170,171,172,173,174,175,176,177,178,179,180'
Line 5: '419,420'
Line 6: '455,456,457,458,459,460,461,462,463,464,465'
Line 7: '883,884,885'
Line 8: '241,242,243,244,245,246,247,248,249,250,251,252,253,254,255'
Line 9: '786,787,788,789,790,791,792,793,794,795'
Line 10: '137,138,139,140,141,142,143,144,145,146,147,148,149,150'
Line 11: '912,913,914,915'
Line 12: '81,82,83,84,85,86,87,88,89,90'
Line 13: '256,257,258,259,260,261,262,263,264,265,266,267,268,269,270'
Line 14: '450'
Line 15: '649,650,651,652,653,654,655,656,657,658,659,660'
Line 16: '194,195'
Line 17: '825'
Line 18: '739,740,741,742,743,744,745,746,747,748,749,750'
Line 19: '805,806,807,808,809,810'
Line 20: '395,396,397,398,399,400,401,402,403,404,405'
Line 21: '358,359,360'
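Line 22: '351,352,353,354,355,356,357,358,359,360'
Line 23: '575,576,577,578,579,580,581,582,583,584,585'
Line 24: '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15'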
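
One detail worth knowing is that .strip() removes whitespace from both ends of the string, not just the newline. If leading spaces matter, .rstrip("\n") removes only trailing newline characters. A small sketch, using a made-up line with leading spaces for illustration:

```python
# A hypothetical line with two leading spaces and a trailing newline.
line = "  926,927,928\n"

stripped = line.strip()         # removes whitespace from both ends
rstripped = line.rstrip("\n")   # removes only trailing newline characters

print(repr(stripped))    # '926,927,928'
print(repr(rstripped))   # '  926,927,928'
```

For the comma-separated numbers in this module either method gives the same result, so .strip() is a reasonable default.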

### All at once
Alternatively, you can read the entire file contents into a single string variable with the .read() method. 

In [23]:
with open('support/random_numbers.txt') as fileHandler: 
    fileContent = fileHandler.read() 

In [24]:
fileContent

'926,927,928,929,930\n646,647,648,649,650,651,652,653,654,655,656,657,658,659,660\n488,489,490,491,492,493,494,495\n31,32,33,34,35,36,37,38,39,40,41,42,43,44,45\n169,170,171,172,173,174,175,176,177,178,179,180\n419,420\n455,456,457,458,459,460,461,462,463,464,465\n883,884,885\n241,242,243,244,245,246,247,248,249,250,251,252,253,254,255\n786,787,788,789,790,791,792,793,794,795\n137,138,139,140,141,142,143,144,145,146,147,148,149,150\n912,913,914,915\n81,82,83,84,85,86,87,88,89,90\n256,257,258,259,260,261,262,263,264,265,266,267,268,269,270\n450\n649,650,651,652,653,654,655,656,657,658,659,660\n194,195\n825\n739,740,741,742,743,744,745,746,747,748,749,750\n805,806,807,808,809,810\n395,396,397,398,399,400,401,402,403,404,405\n358,359,360\n351,352,353,354,355,356,357,358,359,360\n575,576,577,578,579,580,581,582,583,584,585\n1,2,3,4,5,6,7,8,9,10,11,12,13,14,15\n'

In [25]:
print(fileContent)

926,927,928,929,930
646,647,648,649,650,651,652,653,654,655,656,657,658,659,660
488,489,490,491,492,493,494,495
31,32,33,34,35,36,37,38,39,40,41,42,43,44,45
169,170,171,172,173,174,175,176,177,178,179,180
419,420
455,456,457,458,459,460,461,462,463,464,465
883,884,885
241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
786,787,788,789,790,791,792,793,794,795
137,138,139,140,141,142,143,144,145,146,147,148,149,150
912,913,914,915
81,82,83,84,85,86,87,88,89,90
256,257,258,259,260,261,262,263,264,265,266,267,268,269,270
450
649,650,651,652,653,654,655,656,657,658,659,660
194,195
825
739,740,741,742,743,744,745,746,747,748,749,750
805,806,807,808,809,810
395,396,397,398,399,400,401,402,403,404,405
358,359,360
351,352,353,354,355,356,357,358,359,360
575,576,577,578,579,580,581,582,583,584,585
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15



In [27]:
print("Length of document:" + str(len(fileContent)))

Length of document:843


## Searching a file
When searching a file, it is often best to scan the file line-by-line. This avoids memory problems when processing a very large data file (>1GB), because only one line needs to be held in memory at a time. For small files, it may be easiest to simply read in the entire file, as the cells below do, but I recommend adopting a line-by-line approach because it is more general and works for files of any size. 

In [28]:
with open('support/random_numbers.txt') as fileHandler: 
    fileContent = fileHandler.read()

In [29]:
fileLines = fileContent.split("\n")

In [30]:
fileLines

['926,927,928,929,930',
 '646,647,648,649,650,651,652,653,654,655,656,657,658,659,660',
 '488,489,490,491,492,493,494,495',
 '31,32,33,34,35,36,37,38,39,40,41,42,43,44,45',
 '169,170,171,172,173,174,175,176,177,178,179,180',
 '419,420',
 '455,456,457,458,459,460,461,462,463,464,465',
 '883,884,885',
 '241,242,243,244,245,246,247,248,249,250,251,252,253,254,255',
 '786,787,788,789,790,791,792,793,794,795',
 '137,138,139,140,141,142,143,144,145,146,147,148,149,150',
 '912,913,914,915',
 '81,82,83,84,85,86,87,88,89,90',
 '256,257,258,259,260,261,262,263,264,265,266,267,268,269,270',
 '450',
 '649,650,651,652,653,654,655,656,657,658,659,660',
 '194,195',
 '825',
 '739,740,741,742,743,744,745,746,747,748,749,750',
 '805,806,807,808,809,810',
 '395,396,397,398,399,400,401,402,403,404,405',
 '358,359,360',
 '351,352,353,354,355,356,357,358,359,360',
 '575,576,577,578,579,580,581,582,583,584,585',
 '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15',
 '']
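
Notice the empty string at the end of the list above. It appears because the file content ends with a newline, so .split("\n") finds one more (empty) piece after the final separator. The string method .splitlines() is a possible alternative that does not produce this trailing entry. A short sketch with made-up two-line content:

```python
# Hypothetical file content that ends with a newline, like our data file.
sampleContent = "926,927\n646,647\n"

bySplit = sampleContent.split("\n")    # keeps an empty string after the last "\n"
byLines = sampleContent.splitlines()   # no trailing empty string

print(bySplit)   # ['926,927', '646,647', '']
print(byLines)   # ['926,927', '646,647']
```

Either way, code that processes the lines should be prepared to skip empty ones.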

In [31]:
for index, line in enumerate(fileLines):
    cleanLine = line.strip() 
    if len(cleanLine) > 10:
        print(f"this line is longer than 10 characters: {cleanLine}") 

this line is longer than 10 characters: 926,927,928,929,930
this line is longer than 10 characters: 646,647,648,649,650,651,652,653,654,655,656,657,658,659,660
this line is longer than 10 characters: 488,489,490,491,492,493,494,495
this line is longer than 10 characters: 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45
this line is longer than 10 characters: 169,170,171,172,173,174,175,176,177,178,179,180
this line is longer than 10 characters: 455,456,457,458,459,460,461,462,463,464,465
this line is longer than 10 characters: 883,884,885
this line is longer than 10 characters: 241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
this line is longer than 10 characters: 786,787,788,789,790,791,792,793,794,795
this line is longer than 10 characters: 137,138,139,140,141,142,143,144,145,146,147,148,149,150
this line is longer than 10 characters: 912,913,914,915
this line is longer than 10 characters: 81,82,83,84,85,86,87,88,89,90
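this line is longer than 10 characters: 256,257,258,259,260,261,262,263,264,265,266,267,268,269,270
this line is longer than 10 characters: 649,650,651,652,653,654,655,656,657,658,659,660
this line is longer than 10 characters: 739,740,741,742,743,744,745,746,747,748,749,750
this line is longer than 10 characters: 805,806,807,808,809,810
this line is longer than 10 characters: 395,396,397,398,399,400,401,402,403,404,405
this line is longer than 10 characters: 358,359,360
this line is longer than 10 characters: 351,352,353,354,355,356,357,358,359,360
this line is longer than 10 characters: 575,576,577,578,579,580,581,582,583,584,585
this line is longer than 10 characters: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15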
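
The same search can be written so that the file is never loaded into memory all at once, which is the line-by-line approach recommended at the start of this section. This is only a sketch: the helper name `findLongLines` and the small sample file are assumptions made for illustration, not part of the module's support files.

```python
import os
import tempfile

def findLongLines(path, minLength=10):
    """Return the stripped lines longer than minLength, reading one line at a time."""
    matches = []
    with open(path) as fileHandler:
        for line in fileHandler:          # only one line is in memory at a time
            cleanLine = line.strip()
            if len(cleanLine) > minLength:
                matches.append(cleanLine)
    return matches

# Build a small, made-up sample file so the sketch runs on its own.
samplePath = os.path.join(tempfile.mkdtemp(), "sample_numbers.txt")
with open(samplePath, "w") as fileHandler:
    fileHandler.write("926,927,928,929,930\n419,420\n883,884,885\n450\n")

longLines = findLongLines(samplePath)
print(longLines)   # ['926,927,928,929,930', '883,884,885']
```

Pointing `findLongLines` at 'support/random_numbers.txt' should report the same lines as the cell above.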

Your search needs to account for the fact that lines read from a text file are always strings. Therefore, if you expect numbers, you must convert the content into a numeric form yourself. The code above checks whether each line is longer than 10 characters. If we instead wanted to split each line into a list and check for lines with more than 10 items, we would do this: 

In [32]:
with open('support/random_numbers.txt') as fileHandler: 
    fileContent = fileHandler.read()

In [33]:
fileLines = fileContent.split("\n")

In [34]:
fileLines

['926,927,928,929,930',
 '646,647,648,649,650,651,652,653,654,655,656,657,658,659,660',
 '488,489,490,491,492,493,494,495',
 '31,32,33,34,35,36,37,38,39,40,41,42,43,44,45',
 '169,170,171,172,173,174,175,176,177,178,179,180',
 '419,420',
 '455,456,457,458,459,460,461,462,463,464,465',
 '883,884,885',
 '241,242,243,244,245,246,247,248,249,250,251,252,253,254,255',
 '786,787,788,789,790,791,792,793,794,795',
 '137,138,139,140,141,142,143,144,145,146,147,148,149,150',
 '912,913,914,915',
 '81,82,83,84,85,86,87,88,89,90',
 '256,257,258,259,260,261,262,263,264,265,266,267,268,269,270',
 '450',
 '649,650,651,652,653,654,655,656,657,658,659,660',
 '194,195',
 '825',
 '739,740,741,742,743,744,745,746,747,748,749,750',
 '805,806,807,808,809,810',
 '395,396,397,398,399,400,401,402,403,404,405',
 '358,359,360',
 '351,352,353,354,355,356,357,358,359,360',
 '575,576,577,578,579,580,581,582,583,584,585',
 '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15',
 '']

In [35]:
for index, line in enumerate(fileLines):
    cleanLine = line.strip() 
    numList = cleanLine.split(",") 
    if len(numList) > 10: 
        print(f"this line has more than 10 elements: {numList}") 

this line has more than 10 elements: ['646', '647', '648', '649', '650', '651', '652', '653', '654', '655', '656', '657', '658', '659', '660']
this line has more than 10 elements: ['31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45']
this line has more than 10 elements: ['169', '170', '171', '172', '173', '174', '175', '176', '177', '178', '179', '180']
this line has more than 10 elements: ['455', '456', '457', '458', '459', '460', '461', '462', '463', '464', '465']
this line has more than 10 elements: ['241', '242', '243', '244', '245', '246', '247', '248', '249', '250', '251', '252', '253', '254', '255']
this line has more than 10 elements: ['137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', '148', '149', '150']
this line has more than 10 elements: ['256', '257', '258', '259', '260', '261', '262', '263', '264', '265', '266', '267', '268', '269', '270']
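this line has more than 10 elements: ['649', '650', '651', '652', '653', '654', '655', '656', '657', '658', '659', '660']
this line has more than 10 elements: ['739', '740', '741', '742', '743', '744', '745', '746', '747', '748', '749', '750']
this line has more than 10 elements: ['395', '396', '397', '398', '399', '400', '401', '402', '403', '404', '405']
this line has more than 10 elements: ['575', '576', '577', '578', '579', '580', '581', '582', '583', '584', '585']
this line has more than 10 elements: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15']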
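
The lists printed above still contain strings such as '646', not numbers. One way to convert a line into integers is sketched below. The helper name `parseNumbers` is hypothetical, and it assumes every item on a non-empty line is a valid integer.

```python
def parseNumbers(line):
    """Convert a comma-separated line of text into a list of integers."""
    cleanLine = line.strip()
    if not cleanLine:
        # An empty line would split into [''], and int('') raises a ValueError,
        # so return an empty list instead.
        return []
    return [int(item) for item in cleanLine.split(",")]

numbers = parseNumbers("419,420\n")
print(numbers)              # [419, 420]
print(sum(numbers))         # 839
print(parseNumbers("\n"))   # []
```

Once the values are integers, numeric operations such as sum() or comparisons like `num > 500` behave as expected.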

In the previous examples, we parse the file line-by-line because the task (e.g., checking for lines with more than 10 numbers) implies a line-by-line search. If we had a different requirement, we might parse the file differently. For example, if we wanted to collect all the numbers greater than 500 into a list, regardless of the line on which they occur, it would make sense to process the file element-by-element. We might do the following:  

In [38]:
with open('support/random_numbers.txt') as fileHandler: 
    fileContent = fileHandler.read()

In [45]:
fileElements = fileContent.strip().replace("\n", ",").split(",")

In [46]:
fileElements

['926',
 '927',
 '928',
 '929',
 '930',
 '646',
 '647',
 '648',
 '649',
 '650',
 '651',
 '652',
 '653',
 '654',
 '655',
 '656',
 '657',
 '658',
 '659',
 '660',
 '488',
 '489',
 '490',
 '491',
 '492',
 '493',
 '494',
 '495',
 '31',
 '32',
 '33',
 '34',
 '35',
 '36',
 '37',
 '38',
 '39',
 '40',
 '41',
 '42',
 '43',
 '44',
 '45',
 '169',
 '170',
 '171',
 '172',
 '173',
 '174',
 '175',
 '176',
 '177',
 '178',
 '179',
 '180',
 '419',
 '420',
 '455',
 '456',
 '457',
 '458',
 '459',
 '460',
 '461',
 '462',
 '463',
 '464',
 '465',
 '883',
 '884',
 '885',
 '241',
 '242',
 '243',
 '244',
 '245',
 '246',
 '247',
 '248',
 '249',
 '250',
 '251',
 '252',
 '253',
 '254',
 '255',
 '786',
 '787',
 '788',
 '789',
 '790',
 '791',
 '792',
 '793',
 '794',
 '795',
 '137',
 '138',
 '139',
 '140',
 '141',
 '142',
 '143',
 '144',
 '145',
 '146',
 '147',
 '148',
 '149',
 '150',
 '912',
 '913',
 '914',
 '915',
 '81',
 '82',
 '83',
 '84',
 '85',
 '86',
 '87',
 '88',
 '89',
 '90',
 '256',
 '257',
 '258',
 '259',
 

In [47]:
bigNumList = [] 
for index, element in enumerate(fileElements):
    num = int(element)
    if num > 500: 
        bigNumList.append(num)

In [48]:
bigNumList

[926,
 927,
 928,
 929,
 930,
 646,
 647,
 648,
 649,
 650,
 651,
 652,
 653,
 654,
 655,
 656,
 657,
 658,
 659,
 660,
 883,
 884,
 885,
 786,
 787,
 788,
 789,
 790,
 791,
 792,
 793,
 794,
 795,
 912,
 913,
 914,
 915,
 649,
 650,
 651,
 652,
 653,
 654,
 655,
 656,
 657,
 658,
 659,
 660,
 825,
 739,
 740,
 741,
 742,
 743,
 744,
 745,
 746,
 747,
 748,
 749,
 750,
 805,
 806,
 807,
 808,
 809,
 810,
 575,
 576,
 577,
 578,
 579,
 580,
 581,
 582,
 583,
 584,
 585]
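
For a very large file, the same list could be built without reading the whole file into one string, by looping over lines and then over the elements of each line. In the sketch below, `io.StringIO` stands in for an open file handler so that the example runs on its own; the contents are made up.

```python
import io

# io.StringIO behaves like an open text file; the data here is hypothetical.
fakeFile = io.StringIO("926,927\n31,32\n499,500,501\n")

bigNums = []
for line in fakeFile:
    cleanLine = line.strip()
    if not cleanLine:
        continue                     # skip blank lines
    for element in cleanLine.split(","):
        num = int(element)           # convert the string to an integer
        if num > 500:
            bigNums.append(num)

print(bigNums)   # [926, 927, 501]
```

With a real file, `fakeFile` would be replaced by the handler from a with open(...) block.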

## Writing a file
Any time our application processes data, we will probably want to save the results in some format. To save them as a text file, we use the write() method of our file handler to send data to the file, much as we would use the print() function to send data to the console. 

In [49]:
with open('support/big_numbers.txt', 'w') as fileHandler: 
    for index, bigNum in enumerate(bigNumList): 
        fileHandler.write(str(bigNum) + "\n") 

In the code above, we create a file handler using the ‘w’ option, which tells the operating system to open the file for the purpose of writing data to it. Be careful with the ‘w’ option because it erases the content of the file if the file already exists; if the file does not exist, it is created. Also note the use of the str() function when writing to our file. In text mode, write() accepts only strings, so any attempt to write an integer raises a TypeError. Finally, I append the newline character to the end of each value. This places each number on its own line in our file. Without this character, all of the numbers would appear on the same line. 
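
The sketch below illustrates two of the points above: the TypeError raised when writing an integer, and the ‘a’ (append) option, which adds to the end of an existing file instead of erasing it. The output file name `big_numbers_demo.txt` is made up for the demonstration.

```python
import os
import tempfile

# Write to a throwaway location; the file name is hypothetical.
outPath = os.path.join(tempfile.mkdtemp(), "big_numbers_demo.txt")

# 'w' creates the file (or erases it if it already exists).
with open(outPath, "w") as fileHandler:
    fileHandler.write(str(926) + "\n")

# 'a' keeps the existing content and adds new data at the end.
with open(outPath, "a") as fileHandler:
    fileHandler.write(str(927) + "\n")

# Writing an integer without str() raises a TypeError.
errorName = None
try:
    with open(outPath, "a") as fileHandler:
        fileHandler.write(928)
except TypeError as error:
    errorName = type(error).__name__

with open(outPath) as fileHandler:
    savedContent = fileHandler.read()

print(repr(savedContent))   # '926\n927\n'
print(errorName)            # TypeError
```

The failed write leaves the file unchanged, so only the two numbers written as strings are saved.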
# Exercise
Write code to scan random_numbers.txt for numbers divisible by 7 (number % 7 == 0 will be true if there is no remainder when dividing by 7) and add them to a list. 

In [None]:
# Step 1...

# Step 2...