** Parsing .txt Files **

This example introduces the Python concepts necessary to use data from .txt files. We can create these files in a number of ways. For example, we could use a text editor to type in and save the data. We could also download the data and then save it in a file. Regardless of how the file is created, Python will allow us to manipulate the contents.

** Finding a File On Your Disk **

Opening a file requires that you and Python agree about the location of the file on your disk. Files are located on disk by their path. You can think of the filename as the short name for a file, and the path as the full name. For example, on a Mac, if you save the file filename.txt in your home directory, then the path to that file is /Users/yourname/filename.txt. On a Windows machine, the path looks a bit different, but the same principles apply. For example, on Windows the path might be C:\Users\yourname\filename.txt.

You can access files in folders, also called directories, under your home directory by adding a slash and the name of the folder. For example, if you had a file called program.py in a folder called Spring2018 that was inside a folder called PythonProjects under your home directory, then the full path to program.py would be /Users/yourname/PythonProjects/Spring2018/program.py.

Here’s the important rule to remember: If your file and your Python program are in the same directory, you can simply use the filename to refer to your file. If your file and your Python program are in different directories then you should use the path to the file. For the sake of this example, we will assume that the .txt file is in the same directory as our Python program.
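
The rule above can be checked from Python itself. The following is a minimal sketch using the standard-library pathlib module; the folder and file names are the illustrative ones from the text and are assumed not to exist on your machine.

```python
from pathlib import Path

# A bare filename is resolved relative to the current working directory,
# which is normally the folder the program or notebook was started from.
same_folder = Path("filename.txt")

# A file elsewhere needs its path. Joining the parts with "/" lets pathlib
# pick the right separator for the operating system.
# "PythonProjects" and "Spring2018" are the example folders from the text.
elsewhere = Path.home() / "PythonProjects" / "Spring2018" / "program.py"

print(same_folder.resolve())   # the full path Python would actually open
print(elsewhere)
print(elsewhere.exists())      # False unless you really have this file
```

Printing the resolved path is a quick way to confirm that you and Python agree on where the file is before trying to open it.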

** Video Game Sales and Ratings **

As an example, suppose we have a file called Video_Game_Sales.txt that contains the following data for video game titles: Rank, Name, Platform, Year of Release, Genre, Publisher, North America Sales, Europe Sales, Japan Sales, Other Sales, and Global Sales. We will take a look at using this file to develop some insights and statistics.


In [6]:
# In Python, we must open files before we can use them and close them when we are finished with them. 
# Once a file is opened, it becomes a Python object just like all other data. 

# To open this file, we call the open function. The variable video_game_file now holds a reference to the file 
# object returned by open. When we are finished with the file, we close it by using the close method. 
# After the file is closed, any further attempt to use video_game_file will result in an error.

video_game_file = open("Video_Game_Sales.txt", "r")

In [7]:
# We will now use this file as input in a program that will do some data processing. In the program, we will 
# read each line of the file and print it with some additional text. Because text files are sequences of lines 
# of text, we can use the for loop to iterate through each line of the file.

# A line of a file is defined to be a sequence of characters up to and including a special character called the 
# newline character. If you evaluate a string that contains a newline character you will see the character 
# represented as \n. If you print a string that contains a newline, you will not see the \n; you will just see 
# its effects. When you are typing a Python program and you press the enter or return key on your keyboard, 
# the editor inserts a newline character into your text at that point.

# To process our video game data, we use a for loop that calls readline to read one line of the file at a time. 
# Because items in the file are separated by tabs, we call the split method with the "\t" delimiter to break 
# each line into a list containing all the fields of interest about the game. 
# We can then take the values corresponding to the Name, Publisher, Year, and a sales field to print 
# information about the game. Since this is a huge file, we will only print information for the first 10 games.

for i in range(10):
    line = video_game_file.readline()
    values = line.split('\t')
    # By the field list above, index 9 is Other Sales (Global Sales would be index 10).
    print(values[1] + ", published by " + values[5] + " in " + values[3] + ", had $" + values[9] + " billion in other sales.")

Wii Sports, published by Nintendo in 2006, had $8.46 billion in other sales.
Super Mario Bros., published by Nintendo in 1985, had $0.77 billion in other sales.
Mario Kart Wii, published by Nintendo in 2008, had $3.31 billion in other sales.
Wii Sports Resort, published by Nintendo in 2009, had $2.96 billion in other sales.
Pokemon Red/Pokemon Blue, published by Nintendo in 1996, had $1 billion in other sales.
Tetris, published by Nintendo in 1989, had $0.58 billion in other sales.
New Super Mario Bros., published by Nintendo in 2006, had $2.9 billion in other sales.
Wii Play, published by Nintendo in 2006, had $2.85 billion in other sales.
New Super Mario Bros. Wii, published by Nintendo in 2009, had $2.26 billion in other sales.
Duck Hunt, published by Nintendo in 1984, had $0.47 billion in other sales.
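
To make the newline and tab handling described in the comments concrete, here is a small sketch that works on a single made-up line rather than the real file. The line contents are invented for illustration; only the 11-column tab-separated layout follows the field list given earlier.

```python
# A made-up line in the same tab-separated layout as the file (not real data).
line = "1\tExample Game\tWii\t2006\tSports\tExample Publisher\t1.0\t2.0\t3.0\t4.0\t10.0\n"

# repr shows the trailing newline as \n; a plain print would only show its effect.
print(repr(line[-5:]))

# Removing the newline before splitting keeps it out of the last field.
values = line.rstrip("\n").split("\t")
print(len(values))                        # 11 fields, indexes 0 through 10
print(values[1], values[5], values[10])
```

Without the rstrip call, the last field would be "10.0\n" rather than "10.0", which matters as soon as you print or compare that field.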


In [8]:
# Once finished with a file, we must close it.
video_game_file.close()
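
Forgetting to call close is an easy mistake, so Python also offers the with statement, which closes the file automatically. The sketch below writes a tiny stand-in file first so that it runs anywhere; the file name sample_games.txt and its two rows are made up for illustration.

```python
import os
import tempfile

# Write a tiny stand-in file so the sketch runs anywhere (made-up rows).
sample_path = os.path.join(tempfile.mkdtemp(), "sample_games.txt")
with open(sample_path, "w") as sample_file:
    sample_file.write("1\tExample Game A\n2\tExample Game B\n")

# The with statement closes the file when the block ends, even if an error occurs.
with open(sample_path, "r") as game_file:
    first_line = game_file.readline()

print(first_line.rstrip("\n"))
print(game_file.closed)   # True: no explicit close() call was needed
```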

In [9]:
# Now let's use the data for something a bit more useful. Say we want to know the total Other Sales that a particular
# publisher, Nintendo, has had across all of its games. We can figure this out by looping through the file and selecting
# only the values that correspond to that publisher. 

# Notice that the loop looks a bit different from above since we are simply going through every line in the file.
# We again split on the '\t' delimiter, check whether the publisher field matches the string 'Nintendo', 
# and grab the sales field from our values list (index 9, which is Other Sales in the field list above; Global Sales
# would be index 10). Note that we must cast, or convert, the string representing sales
# to a float so that we can do arithmetic with it (which in this case is adding it to our total).
video_game_file = open("Video_Game_Sales.txt", "r")

total_nintendo_sales = 0
for line in video_game_file:
    values = line.split('\t')
    if values[5] == 'Nintendo':
        game_sales = float(values[9])
        total_nintendo_sales = total_nintendo_sales + game_sales
        
video_game_file.close()
    
print("Nintendo has had $" + str(total_nintendo_sales) + " billion in other sales.")

Nintendo has had $92.5800000000004 billion in other sales.
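
The long tail of digits in the total above is a normal side effect of adding floats, not a problem with the data. Here is a minimal sketch of two common ways to tidy it up, using three made-up figures chosen only because they show the effect.

```python
import math

# Made-up per-game figures, chosen only to show the rounding effect.
sales = [0.1, 0.2, 0.3]

running_total = 0
for amount in sales:
    running_total = running_total + amount

print(running_total)             # 0.6000000000000001
print(f"{running_total:.2f}")    # 0.60, formatted to two decimal places
print(math.fsum(sales))          # 0.6, summed with extra internal precision
```

Formatting changes only what is displayed, while math.fsum reduces the error in the sum itself; for a printed report, formatting is usually enough.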


In [10]:
# Finally, let's do something even more involved with the data. Say we want to find the best-selling game in Europe
# among those that sold better in Europe than in North America. Note that when we compare data in our values list,
# we must cast the values so that we are comparing floats, not strings.
video_game_file = open("Video_Game_Sales.txt", "r")

europe_sales = 0
europe_title = ''
for line in video_game_file:
    values = line.split('\t')
    north_america = float(values[6])
    europe = float(values[7])
    # Keep the game only if Europe beat North America and beat the best seen so far.
    if europe > north_america and europe > europe_sales:
        europe_sales = europe
        europe_title = values[1]
            
video_game_file.close()
            
print(europe_title)

Grand Theft Auto V
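
As an alternative to splitting each line by hand, the standard-library csv module can read tab-separated data, and the built-in max function can pick out the winning row. This is a sketch under stated assumptions rather than the method used above: the three rows are made up, and an in-memory io.StringIO stands in for the real file.

```python
import csv
import io

# Made-up rows in the same 11-column order as the field list; not real data.
sample = io.StringIO(
    "1\tGame A\tWii\t2006\tSports\tPub X\t5.0\t3.0\t1.0\t0.5\t9.5\n"
    "2\tGame B\tPS3\t2010\tAction\tPub Y\t2.0\t4.0\t0.5\t0.5\t7.0\n"
    "3\tGame C\tPC\t2012\tRacing\tPub Z\t1.0\t6.0\t0.2\t0.3\t7.5\n"
)

# csv.reader splits on the delimiter and removes the line ending for us.
rows = list(csv.reader(sample, delimiter="\t"))

# Keep rows where Europe (index 7) outsold North America (index 6).
stronger_in_europe = [row for row in rows if float(row[7]) > float(row[6])]

# max with a key function picks the row with the largest Europe figure.
best = max(stronger_in_europe, key=lambda row: float(row[7]))
print(best[1])   # Game C
```

With a real file, the io.StringIO object would be replaced by the file object returned by open.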
