Input & Output

Neal Davis edited this page Oct 3, 2016 · 5 revisions

In our common model of programming, we think of every task as consisting of three stages: input, analysis, and output. Input can range from simple command-line queries to connections to real-time sensor data or Internet archives. This page covers the tools involved in acquiring data and formatting it so that it is useful.

Streams (Basic I/O)

input

One of the most fundamental operations a programmer needs is the ability to ask the user for information. Python provides the input function for just this purpose.

name = input( 'What is your name?' )
print( 'Your name is ' + name )

The only argument input takes is an optional prompt string, and it always returns the user's response as a str; just use it as necessary.
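Because input hands back text, a numeric answer has to be converted before it can be used in arithmetic. The following is a minimal sketch of that step; the helper name parse_age and the stand-in response are made up for illustration, and a real program would obtain the text from input instead.

```python
def parse_age( text ):
    # input always returns a str, so convert before doing arithmetic
    return int( text.strip() )

# In an interactive program the text would come from input(), e.g.:
#   response = input( 'How old are you? ' )
response = ' 19 '   # stand-in for what a user might type
age = parse_age( response )
print( 'Next year you will be', age + 1 )
```

If the text is not a valid whole number (for example 'nineteen'), int raises a ValueError.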


print

Similarly, the print function (which we've already used a lot) formats strings for output. When given several arguments separated by commas, it converts other data types to str implicitly before output.

# the following calls to print produce the same output
# (when joining with +, numbers must be converted with str explicitly)
print( 'there are ' + str( 5.6 ) + ' apples' )
print( 'there are', 5.6, 'apples' )

Incidentally, programmers often refer to the command-line input as standard input (stdin) and the printed output as standard output (stdout).

Although several technical options are available, only occasionally do we need to make use of them. The following may be useful:

# change the line ending from line break to nothing
print( 'this text is on one line', end='' )
print( ' and this text is on the same line')
# change the text divider from space to hyphen
print( 'normally', 'text', 'is', 'separated', 'by', 'spaces' )
print( 'but', 'you', 'can', 'change', 'that', sep='-' )


File I/O

file

Files are stored on disk as a collection of bytes in order. Most of the time, the programs we use (like Paint or OS X Preview) take care of interpreting the data in a file in a way that makes sense for us to use. However, as coders, we often need to access data sources directly. The built-in open function returns a file object to us, which we can use to extract the data we need.

# assuming the file `words.txt` exists
myfile = open( 'words.txt' )
mydata = myfile.read()
myfile.close()
print( mydata )

File input

Two convenient methods are available to us when we open a file:

  • read obtains the entire contents of the file as a single string, including newlines ('\n').
  • readlines obtains the entire contents of the file as a list of strings, one per line. Each string keeps its trailing newline ('\n'), so the result is close to, but not identical to, executing read().split('\n').

The contents of a text file always arrive as a string (or a list of strings), never as numbers. We have to convert the strings into int or float explicitly.
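As a small sketch of that conversion, the code below writes a sample file, reads it back with readlines, and converts each line to a float. The filename numbers.txt and its contents are made up for illustration.

```python
# create a small sample file so the example is self-contained
outfile = open( 'numbers.txt', 'w' )
outfile.write( '1.5\n2.25\n4\n' )
outfile.close()

# read the lines back; each one is a str such as '1.5\n'
infile = open( 'numbers.txt' )
lines = infile.readlines()
infile.close()

values = []
for line in lines:
    # float ignores surrounding whitespace, including the newline
    values.append( float( line ) )

total = sum( values )
print( values, total )
```

A blank line or a line containing non-numeric text would make float raise a ValueError, so real data may need to be checked first.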

There are several technical notes which we must make as well:

  • Just as a file may be opened, it must be closed. Otherwise you risk corrupting the data file.

  • Several things can go wrong with open: the file may not exist (check your spelling and the folder); or you may not have permission to see the file.

  • A file object remembers how far it has been read. Once you have read a file to the end using either read or readlines, a further read returns nothing, so you have to open the file again to read the data again:

    # the wrong way
    myfile = open( 'words.txt' )
    mydataasstring = myfile.read()
    mydataaslist = myfile.readlines()
    myfile.close()
    

    After this code executes, mydataaslist is an empty list (not None). There are two correct ways to do what the above code was (presumably) trying to accomplish:

    # one right way
    myfile = open( 'words.txt' )
    mydataasstring = myfile.read()
    myfile.close()
    myfile = open( 'words.txt' )
    mydataaslist = myfile.readlines()
    myfile.close()
    
    # another right way
    myfile = open( 'words.txt' )
    mydataasstring = myfile.read()
    myfile.close()
    mydataaslist = mydataasstring.split('\n')
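A third option, not covered above, is to move the read position back to the start of the file with the file object's seek method instead of reopening it. This is only a sketch; the filename seekdemo.txt and its contents are made up for illustration.

```python
# create a small sample file so the example is self-contained
myfile = open( 'seekdemo.txt', 'w' )
myfile.write( 'alpha\nbeta\n' )
myfile.close()

myfile = open( 'seekdemo.txt' )
mydataasstring = myfile.read()      # the position is now at the end of the file
myfile.seek( 0 )                    # move the position back to the beginning
mydataaslist = myfile.readlines()   # so this reads everything again
myfile.close()

print( mydataaslist )
```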
    

We will only use text-based files in CS 101. (The alternative is binary encoding, which makes data values look as they would in memory rather than as a string representation.)
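Returning to the earlier note that open can fail: one way to handle a missing or unreadable file is to catch the exception Python raises. The sketch below assumes we are happy to fall back to a default value; the helper name read_or_default and the filename are hypothetical.

```python
def read_or_default( filename, default='' ):
    # try to open the file, reporting the two problems mentioned above
    try:
        myfile = open( filename )
    except FileNotFoundError:
        print( 'could not find', filename )
        return default
    except PermissionError:
        print( 'not allowed to read', filename )
        return default
    data = myfile.read()
    myfile.close()
    return data

# this file is assumed not to exist, so the default is returned
contents = read_or_default( 'no_such_file_here.txt' )
```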

File output

Just as we can read from a file, we can also write to disk. This requires that we specify the 'w'rite mode when we invoke open:

# to open a file for writing data:
myoutputfile = open( 'newfile.txt', 'w' )
myoutputfile.write( 'a message' )
myoutputfile.close()  # doubly important here!

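write works much like print in that it saves our data to disk in string format, with two differences: it accepts only a single str (convert numbers with str first), and it does not add a line break at the end, so include '\n' yourself where you want one.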

close becomes more important here, as a file opened for writing without being closed can lose all of its data when your program ends.
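One way to make sure close is never forgotten is Python's with statement, which closes the file automatically when the indented block ends. This is a sketch of an alternative to the explicit close calls used on this page; the filename withdemo.txt is made up for illustration.

```python
lines = [ 'first line', 'second line' ]

# the file is closed automatically when the indented block ends
with open( 'withdemo.txt', 'w' ) as myoutputfile:
    for line in lines:
        # write does not add a line break, so we add one ourselves
        myoutputfile.write( line + '\n' )

# read the file back to confirm what was saved
with open( 'withdemo.txt' ) as myinputfile:
    written = myinputfile.read()

print( written )
```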

File modes

Sometimes it becomes necessary for us to specify the mode of access to a file, that is, whether we are reading from or writing to the file. In these cases, we have to add a second argument to open:

# to open a file for writing data:
myoutputfile = open( 'newfile.txt', 'w' )
myoutputfile.write( 'a message' )
myoutputfile.close()

Although many file modes are available, we will only introduce the three most useful:

Mode   Application
'r'    reading from file (the default)
'w'    writing to file (replaces any existing contents)
'a'    appending to the end of file
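The difference between 'w' and 'a' is easiest to see by trying both on the same file. The filename modes_demo.txt below is made up for illustration.

```python
# 'w' starts from an empty file
demofile = open( 'modes_demo.txt', 'w' )
demofile.write( 'first\n' )
demofile.close()

# 'a' keeps what is there and adds to the end
demofile = open( 'modes_demo.txt', 'a' )
demofile.write( 'second\n' )
demofile.close()

demofile = open( 'modes_demo.txt', 'r' )
after_append = demofile.read()
demofile.close()

# opening with 'w' again discards the earlier contents
demofile = open( 'modes_demo.txt', 'w' )
demofile.write( 'third\n' )
demofile.close()

demofile = open( 'modes_demo.txt', 'r' )
after_rewrite = demofile.read()
demofile.close()

print( after_append )
print( after_rewrite )
```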


See Also

(We won't use os or shutil in CS 101.)

csv

Certain types of data tend to be stored in files of standard formats. One such format is the comma-separated value (CSV) file, which we may think of as like a spreadsheet with values in rows and columns. Rows are divided by line breaks, while columns are separated by commas. Values are stored as text. For instance, a CSV file may look like this:

Year,Make,Model,Price
2007,Chevrolet,Camaro,5000.00
2010,Ford,F150,8000.00
2011,Dodge,Grand Caravan,7500.00

We can read such a file in line-by-line, split each line at the commas, and process data directly from the string. However, a very convenient library called csv provides a simpler interface to accessing data stored in CSV files.
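The manual approach might look like the following sketch, which uses the sample data above held in a string rather than a file so that it is self-contained. Note that we have to remember that Price is column 3.

```python
csvtext = ( 'Year,Make,Model,Price\n'
            '2007,Chevrolet,Camaro,5000.00\n'
            '2010,Ford,F150,8000.00\n'
            '2011,Dodge,Grand Caravan,7500.00\n' )

# divide the text into rows, then the first row into column names
lines = csvtext.strip().split( '\n' )
header = lines[ 0 ].split( ',' )

prices = []
for line in lines[ 1: ]:
    fields = line.split( ',' )
    # Price is column 3, counting from 0
    prices.append( float( fields[ 3 ] ) )

print( header )
print( prices )
```

This simple splitting breaks down if a value itself contains a comma, which is one reason to prefer a library.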

The major tool csv provides to us is the DictReader class, which gives us a way to access data values by treating column headers as dict keys:

# assuming that we have a file autos.csv
from csv import DictReader
autosfile = open( 'autos.csv' )
reader = DictReader( autosfile )
# each row behaves like a dict keyed by the column headers
for row in reader:
    print( row[ 'Make' ], row[ 'Price' ] )
autosfile.close()

This saves us from having to remember column offsets, particularly in large spreadsheets with many columns, and it makes our code easier to read (and therefore to debug).
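The values DictReader returns are still strings, so numeric columns need converting before use. The sketch below averages the Price column of the sample data above; it uses StringIO as a stand-in for an open file so that the example is self-contained.

```python
from csv import DictReader
from io import StringIO

# StringIO lets a string play the role of an open file
autos = StringIO( 'Year,Make,Model,Price\n'
                  '2007,Chevrolet,Camaro,5000.00\n'
                  '2010,Ford,F150,8000.00\n'
                  '2011,Dodge,Grand Caravan,7500.00\n' )

total = 0.0
count = 0
for row in DictReader( autos ):
    # row[ 'Price' ] is a str such as '5000.00'
    total += float( row[ 'Price' ] )
    count += 1

average = total / count
print( 'average price:', average )
```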


Web data

requests

Although many data sets may be available directly on our hard drive as files, in other cases we will need to access online sources over the Internet. Python provides several standard libraries to facilitate networking, but none of these are as easy to use as requests. With requests, you can normally sweep uncomfortable facts about network behavior, server responses, and other technical details under the rug. For instance, to access a web page such as github.com, you simply request the page contents directly:

import requests
website = requests.get( 'https://www.github.com/' )
print( website.text )

As you can see from the code, the text attribute of the data structure returned by requests.get contains the contents of the website as plain text. If you run the code snippet, you'll see a huge string of HTML printed to the screen. Of course, websites are written in HTML so we shouldn't expect anything different when we request a website via Python.

The formatting of web pages can make it difficult to extract data cleanly, so we will stick to plain text data files that have been posted on the web. Try accessing a URL containing data from Lab #6:

import requests
website = requests.get( 'https://raw.githubusercontent.com/UI-CS101/cs101-fa16/master/lab07/latin' )
print( website.text )

The string resulting from accessing that online resource is much cleaner, and you should see ways of parsing it into str/float pairs immediately.
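As a sketch of such parsing, suppose each line of the text holds a name and a number separated by whitespace. That layout is an assumption made for illustration and may not match the actual file, and the helper name parse_pairs and the sample text are likewise made up. In practice the text would come from website.text.

```python
def parse_pairs( text ):
    # turn lines like 'name 1.5' into ( str, float ) pairs
    pairs = []
    for line in text.split( '\n' ):
        fields = line.split()
        if len( fields ) != 2:
            continue    # skip blank or unexpected lines
        pairs.append( ( fields[ 0 ], float( fields[ 1 ] ) ) )
    return pairs

# made-up sample text in the assumed layout
sample = 'alpha 1.5\nbeta 2\n\ngamma 0.25\n'
pairs = parse_pairs( sample )
print( pairs )
```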

Installation

requests is not a standard Python module, so you'll need to install it first using the Python package manager, pip (possibly pip3 or conda depending on your installation). At the command line:

pip install requests

This package is already installed on any CS 101 system, however.

Examples

This code grabs the predicted high temperature for Champaign Willard Airport at 01h00 tomorrow from the U.S. National Oceanic and Atmospheric Administration forecast page. The code extracts the relevant value from a string of text by searching for the KCMI station identifier and using a precalculated offset.

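import requests
website = requests.get( 'http://www.nws.noaa.gov/mdl/gfslamp/lavlamp.shtml' )
# locate the station identifier within the page text
offset = website.text.find( 'KCMI' )
# the temperature sits a fixed number of characters after the identifier
temperature_string = website.text[ offset+169:offset+172 ]
temperature = float( temperature_string )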
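The find-and-slice technique is easier to follow on a short string. The sketch below uses made-up sample text and offsets, not the real forecast page, and the helper name value_after is hypothetical. It also checks for the case where the marker is missing, since find returns -1 then.

```python
def value_after( text, marker, start, stop ):
    # find the marker, then slice a fixed distance after it
    offset = text.find( marker )
    if offset == -1:
        return None     # the marker does not appear in the text
    return float( text[ offset+start:offset+stop ] )

# made-up text: each station name is followed by a two-digit value
sample = 'KABC 61 KCMI 72 KXYZ 58'
temperature = value_after( sample, 'KCMI', 5, 7 )
missing = value_after( sample, 'KZZZ', 5, 7 )
print( temperature, missing )
```

A fixed offset only works while the page layout stays the same, so code like this needs rechecking whenever the source changes.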
