Input & Output
In our common model of programming, we think of every task as consisting of three stages: input, analysis, and output. Input can range from simple command line queries to connections to real-time sensor data or Internet archives. This page covers the tools involved in acquiring data and formatting it so that it is useful.
Streams (Basic I/O)
One of the most fundamental operations a programmer needs is the ability to ask the user for information. Python provides the input function for just this purpose.
name = input( 'What is your name?' )
print( 'Your name is ' + name )
There really aren't any options to input; just use it as necessary.
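Output is just as simple: the print function displays its arguments on the screen (converting each to str implicitly before output).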
# the following are equivalent print statements
# (joining with + requires an explicit str, since a string and a number cannot be added)
print( 'there are ' + str( 5.6 ) + ' apples' )
print( 'there are', 5.6, 'apples' )
Incidentally, programmers often refer to the command-line input as standard input (stdin) and the command-line output as standard output (stdout).
Although several technical options are available, only occasionally do we need to make use of them. The following may be useful:
# change the line ending from line break to nothing
print( 'this text is on one line', end='' )
print( ' and this text is on the same line' )

# change the text divider from space to hyphen
print( 'normally', 'text', 'is', 'separated', 'by', 'spaces' )
print( 'but', 'you', 'can', 'change', 'that', sep='-' )
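Because print sends its text to standard output unless told otherwise, the same function can write elsewhere through its file argument. The following is only a small sketch of that idea; the names buffer and captured are ours, chosen for illustration.

```python
import io
import sys

# print sends text to standard output (sys.stdout) by default
print( 'this goes to standard output' )

# naming sys.stdout explicitly does the same thing
print( 'so does this', file=sys.stdout )

# any other file-like object works too, such as an in-memory text buffer
buffer = io.StringIO()
print( 'captured', 'text', sep='-', file=buffer )
captured = buffer.getvalue()   # the text print wrote, including the line break
```

Here sep and end behave exactly as they do when printing to the screen.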
Files are stored on disk as a collection of bytes in order. Most of the time, the programs we use (like Paint or OS X Preview) take care of interpreting the data in a file in a way that makes sense for us to use. However, as coders, we often need to access data sources directly. The open function returns a file object to us, which we can use to extract the data we need.
# assuming the file `words.txt` exists
myfile = open( 'words.txt' )
mydata = myfile.read()
myfile.close()
print( mydata )
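Two convenient methods are available to us when we open a file:
read obtains the entire contents of the file as a single string, including newlines ('\n').
readlines obtains the entire contents of the file as a list of strings, one per line. readlines is roughly equivalent to executing read().split('\n'), except that readlines keeps the newline at the end of each line.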
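The contents of a file are always read as a string (or a list of strings), never as numbers. We will have to convert our strings to numbers ourselves (for instance, with int or float) before calculating with them.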
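As a minimal sketch of that conversion, suppose a file held one number per line. The contents below are made up for illustration and are written directly as the string that read would return.

```python
# hypothetical file contents, exactly as read() would return them
mydata = '3.5\n4.25\n10\n'

values = []
for line in mydata.split( '\n' ):
    if line != '':                       # skip the empty piece after the final newline
        values.append( float( line ) )   # convert each string to a number

total = sum( values )
print( values, total )
```

Until float is applied, '3.5' is just text: adding two such strings would glue them together rather than add them.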
There are several technical notes which we must make as well:
Just as a file may be opened, it must be closed. Otherwise you risk corrupting the data file.
Several things can go wrong with open: the file may not exist (check your spelling and the folder); or you may not have permission to see the file.
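Python reports these two problems as FileNotFoundError and PermissionError. One possible way to respond to them is sketched below; the helper name read_text and the file name are hypothetical, and returning None on failure is just one design choice among several.

```python
def read_text( filename ):
    '''Return the contents of filename, or None if it cannot be opened.'''
    try:
        myfile = open( filename )
    except FileNotFoundError:
        print( 'could not find', filename, '(check the spelling and the folder)' )
        return None
    except PermissionError:
        print( 'not allowed to read', filename )
        return None
    mydata = myfile.read()
    myfile.close()
    return mydata

# assuming no file with this name exists, the result is None
mydata = read_text( 'no-such-file-for-this-example.txt' )
```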
Once you have read data from a file using either read or readlines, you have to open the file again to read the data again:
# the wrong way
myfile = open( 'words.txt' )
mydataasstring = myfile.read()
mydataaslist = myfile.readlines()
myfile.close()
After this code executes, mydataaslist is an empty list (not None), because read has already consumed the whole file. There are two correct ways to do what the above code was (presumably) trying to accomplish:
# one right way
myfile = open( 'words.txt' )
mydataasstring = myfile.read()
myfile.close()
myfile = open( 'words.txt' )
mydataaslist = myfile.readlines()
myfile.close()
# another right way
myfile = open( 'words.txt' )
mydataasstring = myfile.read()
myfile.close()
mydataaslist = mydataasstring.split('\n')
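The two approaches produce slightly different lists, which is worth seeing once. In this sketch an io.StringIO object stands in for an opened text file, and its contents are made up for illustration.

```python
import io

# io.StringIO behaves like an opened text file with the given contents
myfile = io.StringIO( 'apple\nbanana\n' )
mydataaslist = myfile.readlines()
myfile.close()

mydataasstring = 'apple\nbanana\n'
mysplitlist = mydataasstring.split( '\n' )

print( mydataaslist )   # ['apple\n', 'banana\n']
print( mysplitlist )    # ['apple', 'banana', '']
```

readlines keeps each line break, while split removes them and leaves an empty string after the final one. Either way, a little cleanup may be needed before using the values.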
We will only use text-based files in CS 101. (The alternative is binary encoding, which makes data values look as they would in memory rather than as a string representation.)
Just as we can read from a file, we can also write to disk. This requires that we specify the 'w'rite mode when we invoke open:
# to open a file for writing data:
myoutputfile = open( 'newfile.txt', 'w' )
myoutputfile.write( 'a message' )
myoutputfile.close() # doubly important here!
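write functions much like print, except that it sends data in string format to disk. Unlike print, it accepts only a single string and does not add a line break at the end.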
close becomes more important here, as a file opened for writing without being closed can lose all of its data when your program ends.
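Python also offers the with statement, which closes the file for us when the indented block ends, even if an error occurs inside it. The sketch below writes into a temporary folder so that no real file is overwritten; the folder and file names are assumptions made for this example.

```python
import os
import tempfile

# work in a temporary folder so that nothing important is overwritten
folder = tempfile.mkdtemp()
filename = os.path.join( folder, 'newfile.txt' )

with open( filename, 'w' ) as myoutputfile:
    myoutputfile.write( 'a message' )
# the file has been closed automatically at this point

with open( filename ) as myfile:
    mydata = myfile.read()
print( mydata )
```

This does the same work as calling close by hand, with less to forget.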
Sometimes it becomes necessary for us to specify the mode of access to a file, that is, whether we are reading from or writing to the file. In these cases, we have to add a new argument to open:
# to open a file for writing data:
myoutputfile = open( 'newfile.txt', 'w' )
myoutputfile.write( 'a message' )
myoutputfile.close()
Although many file modes are available, we will only introduce the three most useful:
|'r'|reading from file|
|'w'|writing to file|
|'a'|appending to file|
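A short sketch of the three modes working together follows. It uses a temporary folder and a made-up file name, modes.txt, so that it can be run safely anywhere.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
filename = os.path.join( folder, 'modes.txt' )

myfile = open( filename, 'w' )   # 'w' creates the file, or erases it if it already exists
myfile.write( 'first line\n' )
myfile.close()

myfile = open( filename, 'a' )   # 'a' keeps the existing contents and adds to the end
myfile.write( 'second line\n' )
myfile.close()

myfile = open( filename, 'r' )   # 'r' is the default mode when none is given
mydata = myfile.read()
myfile.close()
print( mydata )
```

Note that opening an existing file with 'w' discards its previous contents immediately, so 'a' is the safer choice when adding to a file.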
(We won't use shutil in CS 101.)
Certain types of data tend to be stored in files of standard formats. One such format is the comma-separated value (CSV) file, which we may think of as like a spreadsheet with values in rows and columns. Rows are divided by line breaks, while columns are separated by commas. Values are stored as text. For instance, a CSV file may look like this:
Year,Make,Model,Price
2007,Chevrolet,Camaro,5000.00
2010,Ford,F150,8000.00
2011,Dodge,Grand Caravan,7500.00
We can read such a file line by line, split each line at the commas, and process data directly from the string. However, a very convenient library called csv provides a simpler interface for accessing data stored in CSV files.
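For comparison, the manual approach could look something like the sketch below. The sample rows are written as the single string that read would return, and the variable names are ours.

```python
# the sample rows from above, as read() would return them
mydata = ( 'Year,Make,Model,Price\n'
           '2007,Chevrolet,Camaro,5000.00\n'
           '2010,Ford,F150,8000.00\n'
           '2011,Dodge,Grand Caravan,7500.00\n' )

lines = mydata.split( '\n' )
header = lines[ 0 ].split( ',' )            # the column names from the first row

prices = []
for line in lines[ 1: ]:
    if line == '':                          # skip the empty piece after the final newline
        continue
    fields = line.split( ',' )
    prices.append( float( fields[ 3 ] ) )   # Price is column 3, counting from 0

print( header, prices )
```

This works for simple data, but we have to remember that Price lives in column 3, and a value that itself contains a comma would break the split.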
The main tool csv provides to us is DictReader, which gives us a way to access data values by treating column headers as dictionary keys:
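# assuming that we have a file autos.csv
from csv import DictReader
autofile = open( 'autos.csv', newline='' )  # newline='' is recommended by the csv documentation
reader = DictReader( autofile )
for row in reader:
    print( row[ 'Make' ], row[ 'Price' ] )
autofile.close()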
This saves us from having to remember column offsets, particularly in large spreadsheets with many columns, and makes our code easier to read (and therefore to debug).
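The values DictReader hands back are still strings, so numeric columns need converting before use. Here is a small sketch using the sample rows from above; an io.StringIO object stands in for the opened file so the example runs without autos.csv on disk.

```python
import io
from csv import DictReader

# io.StringIO stands in for open( 'autos.csv' ) in this sketch
autofile = io.StringIO( 'Year,Make,Model,Price\n'
                        '2007,Chevrolet,Camaro,5000.00\n'
                        '2010,Ford,F150,8000.00\n'
                        '2011,Dodge,Grand Caravan,7500.00\n' )
reader = DictReader( autofile )

makes = []
total = 0.0
for row in reader:
    makes.append( row[ 'Make' ] )
    total = total + float( row[ 'Price' ] )   # convert the text '5000.00' to a number
autofile.close()

print( makes, total )
```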
Although many data sets may be available directly on our hard drive as files, in other cases we will need to access online sources over the Internet. Python provides several standard libraries to facilitate networking, but none of these are as easy to use as requests. With requests, you can normally sweep uncomfortable facts about network behavior, server responses, and other technical details under the rug. For instance, to access a web page such as github.com, you simply request the page contents directly:
import requests
website = requests.get( 'https://www.github.com/' )
print( website.text )
As you can see from the code, the text attribute of the data structure returned by requests.get contains the contents of the website as plain text. If you run the code snippet, you'll see a huge string of HTML printed to the screen. Of course, websites are written in HTML, so we shouldn't expect anything different when we request a website via Python.
The formatting of web pages can make it difficult to extract data cleanly, so we will stick to plain text data files that have been posted on the web. Try accessing a URL containing data from Lab #6:
import requests
website = requests.get( 'https://raw.githubusercontent.com/UI-CS101/cs101-fa16/master/lab07/latin' )
print( website.text )
The string resulting from accessing that online resource is much cleaner, and you should see ways of parsing it into float pairs immediately.
requests is not a standard Python module, so you'll need to install it first using the Python package manager, pip (or conda depending on your installation). At the command line:
pip install requests
This package is already installed on any CS 101 system, however.
This code grabs the predicted high temperature for Champaign Willard Airport at 01h00 tomorrow from the U.S. National Oceanic and Atmospheric Administration forecast page. The code extracts the relevant value from a string of text by searching for the KCMI station identifier and using a precalculated offset.
import requests
website = requests.get( 'http://www.nws.noaa.gov/mdl/gfslamp/lavlamp.shtml' )
offset = website.text.find( 'KCMI' )
temperature_string = website.text[ offset+169:offset+172 ]
temperature = float( temperature_string )
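The find-and-slice technique can be tried without a network connection. The text below is made up purely for illustration and is NOT the real layout of the forecast page, so the offsets 10 and 13 apply only to this sample. The check for -1 is our addition: find returns -1 when the identifier is missing, and slicing from there would silently grab the wrong characters.

```python
# made-up text for illustration; NOT the real format of the forecast page
text = 'STATION KAAA TEMP  58\nSTATION KCMI TEMP  71\n'

offset = text.find( 'KCMI' )   # position where 'KCMI' starts, or -1 if it is absent
if offset == -1:
    temperature = None         # the station was not found
else:
    # in this sample, the value sits 10 to 12 characters past the start of 'KCMI'
    temperature_string = text[ offset+10:offset+13 ]
    temperature = float( temperature_string )   # float ignores the leading space

print( temperature )
```

A fixed offset like this is fragile: if the page layout ever changes, the slice will point at the wrong characters, so it is worth checking the extracted string before trusting it.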