## 10.1. Introduction: Working with Data Files

So far, the data we have used in this book have either been coded right into the program or been entered by the user. In real life, data reside in files. For example, the images we worked with in the image processing unit ultimately live in files on your hard drive. Web pages, word processing documents, and music are other examples of data that live in files. In this short chapter we will introduce the Python concepts necessary to use data from files in our programs.

For our purposes, we will assume that our data files are text files, that is, files filled with characters. The Python programs that you write are stored as text files. We can create these files in any of a number of ways. For example, we could use a text editor to type in and save the data. We could also download the data from a website and then save it in a file. Regardless of how the file is created, Python will allow us to manipulate the contents.

In Python, we must open files before we can use them and close them when we are done with them. As you might expect, once a file is opened it becomes a Python object just like all other data. Table 1 shows the functions and methods that can be used to open and close files.

![image.png](attachment:image.png)
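
As a minimal sketch of the open, use, close cycle described above, the snippet below creates a small throwaway file and then opens and closes it. The file name `example.txt` and its contents are assumptions made for illustration only. The `closed` attribute of a file object reports whether the file is still open.

```python
import os
import tempfile

# Create a small sample file in a temporary directory (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "example.txt")

outfile = open(path, "w")       # "w" opens the file for writing
outfile.write("first line\nsecond line\n")
outfile.close()                 # closing makes sure the data is written out

fileref = open(path, "r")       # "r" opens the file for reading
was_open = not fileref.closed   # True while the file is open
fileref.close()
is_closed = fileref.closed      # True once close() has been called

print(was_open, is_closed)
```

Once a file is closed, any further attempt to read from it raises an error, so the reading has to happen between `open` and `close`.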



## 10.2. Reading a File

In [None]:
fileref = open("requirements.txt","r")
content = fileref.read()
print(content[:200])
fileref.close()

In [12]:
fileref = open("requirements.txt","r")
lines = fileref.readlines()
print(lines[:10])
fileref.close()

['# SPDX-FileCopyrightText: Copyright (C) Siemens AG 2023\n', '#\n', '# SPDX-License-Identifier: MIT\n', '\n', '# direct dependencies\n', 'simaticai==2.2.0\n', '\n', 'matplotlib==3.7.5\n', 'numpy==1.24.2\n', 'opencv-python-headless==4.9.0.80\n']


In [None]:
fileref = open("requirements.txt","r")
lines = fileref.readlines()
for lin in lines[:10]:
    print(lin.strip())  # strip removes leading and trailing whitespace
fileref.close()

In [None]:
# In this version we can iterate over the whole text as above, but we cannot select a specific range of lines
# (a file object cannot be sliced). This is the most common usage.

fileref = open("requirements.txt","r")
#lines = fileref.readlines()
for lin in fileref:
    print(lin.strip())  # strip removes leading and trailing whitespace
fileref.close()

In [None]:
# count the number of lines in the file

fileref = open("requirements.txt","r")
lines = fileref.readlines()
print(len(lines))
#for lin in lines[:10]:
#    print(lin.strip())  # strip removes leading and trailing whitespace
fileref.close()

81


In [None]:
# count the number of characters in the file

fileref = open("requirements.txt","r")
contents = fileref.read()
print(len(contents))
#for lin in contents.splitlines()[:10]:
#    print(lin.strip())  # strip removes leading and trailing whitespace
fileref.close()

1497


## 10.3. Alternative File Reading Methods
Once you have a file “object”, the thing returned by the open function, Python provides three methods to read data from that object. The read() method returns the entire contents of the file as a single string (or just some characters if you provide a number as an input parameter). The readlines() method returns the contents of the entire file as a list of strings, where each item in the list is one line of the file. The readline() method reads one line from the file and returns it as a string. The strings returned by readlines() or readline() contain the newline character at the end (except possibly the last line, if the file does not end with a newline). Table 2 summarizes these methods and the following session shows them in action.

![image.png](attachment:image.png)
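
The three reading methods can be compared in a short sketch. The file `sample.txt` and its three lines are hypothetical, created here only so the example is self-contained. Note that each call continues from where the previous one stopped, so the order of the calls matters.

```python
import os
import tempfile

# Build a three-line sample file (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
fileref = open(path, "w")
fileref.write("alpha\nbeta\ngamma\n")
fileref.close()

fileref = open(path, "r")
first_three = fileref.read(3)       # the first 3 characters: 'alp'
rest_of_line = fileref.readline()   # the rest of the current line: 'ha\n'
remaining = fileref.readlines()     # the lines that are left: ['beta\n', 'gamma\n']
fileref.close()

# Reopen the file to read it again from the beginning.
fileref = open(path, "r")
everything = fileref.read()         # the whole file as one string
fileref.close()

print(repr(first_three), repr(rest_of_line), remaining)
print(len(everything))
```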

In this course, we will generally either iterate through the lines returned by readlines() with a for loop, or use read() to get all of the contents as a single string.

Other programming languages, which lack this convenient for loop over the lines of a file, use a different pattern built on a different kind of loop, the while loop. Fortunately, you don’t need to learn this other pattern, and we will put off consideration of while loops until later in this course. We don’t need them for handling data from files.

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

## 10.4. Iterating over lines in a file

We will now use this file as input in a program that will do some data processing. In the program, we will examine each line of the file and print it with some additional text. Because readlines() returns a list of lines of text, we can use the for loop to iterate through each line of the file.

A line of a file is defined to be a sequence of characters up to and including a special character called the newline character. If you evaluate a string that contains a newline character, you will see the character represented as \n. If you print a string that contains a newline, you will not see the \n; you will just see its effect (the text that follows starts on a new line).
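
The difference between evaluating and printing a string with a newline can be seen in a small sketch. The string below is an arbitrary example, and `repr` is used here to stand in for what the interpreter shows when it evaluates the string.

```python
line = "first line\nsecond line"

shown = repr(line)   # the form you see when the string is evaluated
print(shown)         # 'first line\nsecond line'
print(line)          # prints two separate lines, with no visible \n

# The newline counts as a single character.
length = len("a\nb")
print(length)        # 3
```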

As the for loop iterates through each line of the file the loop variable will contain the current line of the file as a string of characters. The general pattern for processing each line of a text file is as follows:

![image.png](attachment:image.png)

To process all of our Olympics data, we will use a for loop to iterate over the lines of the file. Using the split method, we can break each line into a list containing all the fields of interest about the athlete. We can then take the values corresponding to name, team, and event to construct a simple sentence.
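
A rough sketch of this split-based processing follows. The layout of the real Olympics file is not reproduced here, so the example assumes a hypothetical comma-separated file `athletes.txt` with exactly three fields per line (name, team, event) and placeholder values. The actual file may use different fields, order, or separators.

```python
import os
import tempfile

# Hypothetical data file: one athlete per line as name,team,event.
path = os.path.join(tempfile.mkdtemp(), "athletes.txt")
fileref = open(path, "w")
fileref.write("Athlete A,Team X,Event 1\nAthlete B,Team Y,Event 2\n")
fileref.close()

sentences = []
fileref = open(path, "r")
for line in fileref.readlines():
    # Remove the trailing newline, then break the line into its fields.
    fields = line.strip().split(",")
    name, team, event = fields[0], fields[1], fields[2]
    sentences.append(name + " is from " + team + " and competes in " + event)
fileref.close()

for sentence in sentences:
    print(sentence)
```

Calling `strip()` before `split(",")` keeps the newline character from ending up inside the last field.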

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

