# MSDS 430 Module 4 Python Assignment 

<div class="alert alert-block alert-warning"><b>In this assignment you will read through the notebook and complete the exercises. Once you are satisfied with the results, submit your notebook, html file, and output.txt file to Canvas. Your files should include all output, i.e. run each cell and save your file before submitting.</b></div>

<div class="alert alert-block alert-danger"><b>Note:</b> You also must submit your <b>output.txt</b> file to Canvas for grading in addition to the usual notebook and html files.</div>

<div class="alert alert-block alert-info">One aspect of data science is working with data from files. In this assignment we will learn to read in a text file and extract some useful information. In the process we will be creating and manipulating Python lists. We will also see how data can be written to a new text file. Later in the course we'll learn more about how to display this information neatly and manipulate the data more efficiently, but for now we start by learning the basics of reading and writing text files.</div>

### Reading Files

You are given a file `pitching_stats.txt` that contains a `YTD` snapshot of `MLB` pitching stats for the top ten pitchers based on ERA around mid-season in 2018. Each row in the text file is a list of seven values (`name`, `team`, `games won`, `games lost`, `ERA`, `games pitched`, `innings pitched`) separated by spaces:

`Degrom NYM 7 7 1.81 24 159.0
Sale BOS 12 4 1.97 23 146.0
Snell TB 13 5 2.18 22 128.0
Scherzer WSH 15 5 2.19 25 168.2
Bauer CLE 12 6 2.22 25 166.0
Nola PHI 13 3 2.28 24 154.0
Verlander HOU 11 8 2.52 26 164.1
Kluber CLE 15 6 2.68 25 168.0
Cole HOU 10 5 2.75 24 153.2
Mikolas STL 12 3 2.85 24 151.1`


In Python, the built-in `open` function takes the name of a text file in the current directory (or more generally a path to a text file in any directory on your computer) and returns what is known as a `file object`. This file object can be used to read from an existing text file, create and write to a new file, or append text to a pre-existing file. See

__[Opening Files in Python](https://docs.python.org/3/library/functions.html#open)__

For example, 
```python
fileName = open('my_file.txt', 'r')
```

opens a file named `my_file.txt` for reading (i.e. `mode='r'`) and returns a corresponding file object, which is assigned to the variable `fileName`. 
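
As a minimal sketch of the three uses of a file object mentioned above (writing, appending, reading), the cell below works on a throwaway file in a temporary directory. The file name `demo.txt` and its contents are made up for illustration. It also uses the `with` statement, which closes the file automatically when the indented block ends.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched.
# 'demo.txt' is a hypothetical file name used only for this sketch.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')

    # mode 'w': create the file (or overwrite it) and write to it
    with open(path, 'w') as f:
        f.write('first line\n')

    # mode 'a': add text to the end of the existing file
    with open(path, 'a') as f:
        f.write('second line\n')

    # mode 'r': read the whole file back as one string
    with open(path, 'r') as f:
        contents = f.read()

print(contents)
```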

If the file cannot be opened for some reason (e.g. if the file doesn't exist in the current directory), then an error is generated. More specifically, an `Exception` object is created and said to be "raised" (or "thrown"). 

In [22]:
fileName = open('pitching_stats.txt','r')
#%pwd

Trying to open a file that does not exist, such as `my_file.txt`, generates (throws) a `FileNotFoundError` exception because there is no such file in the current directory. If you are unsure what the current working directory is on your computer, you can type the command `%pwd` in a code cell and then run it.
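
As a brief preview, here is a sketch of what that error looks like when it is caught instead of crashing the program. The file name `no_such_file_12345.txt` is made up and assumed not to exist; `os.getcwd()` is the plain-Python way to see the working directory that `%pwd` reports.

```python
import os

# os.getcwd() reports the current working directory
# (the same information that %pwd shows in Jupyter).
print('Working directory:', os.getcwd())

# 'no_such_file_12345.txt' is a made-up name that is assumed not to exist.
try:
    f = open('no_such_file_12345.txt', 'r')
except FileNotFoundError as err:
    error_name = type(err).__name__
    print('Could not open the file:', err)
else:
    # Only runs if the file unexpectedly exists
    error_name = None
    f.close()
```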

If the file exists in another directory we could specify a path to the file instead, e.g. `'/Users/jsmith/data/my_file.txt'`.

Below we provide error-handling code around the call to `open` that prevents the program from crashing if something goes wrong. Exceptions and error handling will be discussed in more detail in a future module.

First make sure that you save the provided file `pitching_stats.txt` in the same directory as this Python notebook.

<div class="alert alert-block alert-success"><b>Problem 1</b>: Complete the program in the cell below so that a user is prompted with <b>Enter the name of the file for reading:</b> when the cell is executed and will open the file if it is in the current directory. The file we're interested in reading is <b><i>pitching_stats.txt</i></b>:</div>


<div class="alert alert-block alert-info">If you enter <b><i>pitching_stats.txt</i></b> (which you saved in the current directory), you should see the following:

`Enter the name of the file for reading: pitching_stats.txt
The file was opened successfully!`

If you enter a different text file name, you will be told the file does not exist and you will be prompted again for a file name to enter:

`Enter the name of the file for reading: wrong_file.txt
Can not open the file [Errno 2] No such file or directory: 'wrong_file.txt'
Enter the name of the file for reading:`


The process is repeated until you enter the right filename.

After you finish the problem, continue to read the markdown cells below for further instructions.</div>

In [23]:
while True:
    
    # TO DO: 
    # Prompt the user to enter the name of a file and save the user's input to a variable
    nameofFile = input("Enter the name of the file for reading: ")
    
    try:
        # TO DO:
        # Open the file with name specified by the user using the open() method 
        # Make sure to save the file object to a variable.
        fileName = open(nameofFile,'r')
        
        print("The file was opened successfully!")
        
        break
    # code to "catch" the exception
    except IOError as err:
        print('Can not open the file', err)

Enter the name of the file for reading: t
Can not open the file [Errno 2] No such file or directory: 't'
Enter the name of the file for reading: pitching_stats.txt
The file was opened successfully!


### Displaying File Contents

Now that you have completed the first problem, you are going to learn how to use the file object to get the contents of the file. In `Problem 2`, specific values from each row will be obtained in order to display a human-readable summary of each pitcher's data. In `Problem 3`, on the other hand, we will focus on specific fields in order to obtain "columnized" output instead.

If `filename` is a file object corresponding to a text file, you can iterate over the lines of text in the file as follows:
```python
for line in filename:
    # Do something with each line... for example, we can print it
    print(line)
```
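
One detail worth seeing in action: each line read from a file keeps its trailing newline character, so `print(line)` leaves a blank line after each row. The sketch below shows this on a small throwaway file. The name `sample.txt` is made up, and its two rows are copied from `pitching_stats.txt`.

```python
import os
import tempfile

# Create a small throwaway file; 'sample.txt' is a hypothetical name.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')
    with open(path, 'w') as f:
        f.write('Degrom NYM 7 7 1.81 24 159.0\n')
        f.write('Sale BOS 12 4 1.97 23 146.0\n')

    with open(path, 'r') as f:
        for line in f:
            print(repr(line))     # repr() makes the trailing '\n' visible
            print(line.rstrip())  # rstrip() removes it before printing
            last_line = line      # remember the most recent line read
```

The `split()` method introduced next discards this newline along with the spaces, which is why it does not show up in the lists we build later.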
We will use the `split()` method (defined in the String class) to break up each line of the file (which is a String object) into a list of its seven string values (`name, team, games won, games lost, ERA, games pitched, innings pitched`). We will study the String class and more of its methods in detail in a later module. 

To use the `split()` method we first need a String object. Below we create a String object called `line`. Then we call the `split()` method on this object in this way:
```python
line.split()
```

Run the cell below to see what you get...

In [24]:
line = 'Degrom NYM 7 7 1.81 24 159.0'
lst = line.split()
print(lst)

['Degrom', 'NYM', '7', '7', '1.81', '24', '159.0']


Run the following three cells for some examples showing how to access elements of the list.

In [25]:
print('The first element of the list is',lst[0])

The first element of the list is Degrom


In [26]:
print('The fifth element of the list is',lst[4])

The fifth element of the list is 1.81


In [27]:
print('The last element of the list is',lst[-1])

The last element of the list is 159.0
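
Note that every element produced by `split()` is a string, even the ones that look like numbers. A short sketch of converting them with the built-in `int` and `float` functions follows; the variable names `wins`, `losses` and `era` are chosen just for this example.

```python
line = 'Degrom NYM 7 7 1.81 24 159.0'
lst = line.split()

# Every element is a string, even the ones that look like numbers
print(type(lst[2]))

wins = int(lst[2])      # '7'    becomes the integer 7
losses = int(lst[3])    # '7'    becomes the integer 7
era = float(lst[4])     # '1.81' becomes the float 1.81

# Arithmetic works on the converted values
print('Decisions:', wins + losses)
```

Without the conversion, `lst[2] + lst[3]` would join the two strings into `'77'` rather than add the numbers.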


<div class="alert alert-block alert-success"><b>Problem 2</b>: Iterate over lines in the file as demonstrated above and print the following for each line:</div>

`" < team > 'pitcher' < name > 'has an ERA of' < ERA >"`<br>

<div class="alert alert-block alert-info">For example, the first line printed should look like this: </div>

`NYM pitcher Degrom has an ERA of 1.81`

In [28]:
bbfile = open("pitching_stats.txt", "r")

for line in bbfile:
    
    # TO DO
    # Split the line into a list of strings..
    lst = line.split()
    
    # TO DO
    # Print the sentence...
    print(lst[1], "pitcher", lst[0], "has an ERA of", lst[4])

# We close the file
bbfile.close()

NYM pitcher Degrom has an ERA of 1.81
BOS pitcher Sale has an ERA of 1.97
TB pitcher Snell has an ERA of 2.18
WSH pitcher Scherzer has an ERA of 2.19
CLE pitcher Bauer has an ERA of 2.22
PHI pitcher Nola has an ERA of 2.28
HOU pitcher Verlander has an ERA of 2.52
CLE pitcher Kluber has an ERA of 2.68
HOU pitcher Cole has an ERA of 2.75
STL pitcher Mikolas has an ERA of 2.85


### Creating Lists

Our next objective is to create two lists from the data: (1) a list of all of the pitchers' names and (2) the corresponding list of the number of games each pitcher won. 

But first we open the `pitching_stats.txt` file for reading again. This time we read all the lines at once using the file object's `readlines` method. What do you get when you run the following cell?

In [29]:
bbfile = open("pitching_stats.txt", "r")
lines = bbfile.readlines()
# Close the file now that its contents are stored in the list
bbfile.close()
print(lines)

['Degrom NYM 7 7 1.81 24 159.0\n', 'Sale BOS 12 4 1.97 23 146.0\n', 'Snell TB 13 5 2.18 22 128.0\n', 'Scherzer WSH 15 5 2.19 25 168.2\n', 'Bauer CLE 12 6 2.22 25 166.0\n', 'Nola PHI 13 3 2.28 24 154.0\n', 'Verlander HOU 11 8 2.52 26 164.1\n', 'Kluber CLE 15 6 2.68 25 168.0\n', 'Cole HOU 10 5 2.75 24 153.2\n', 'Mikolas STL 12 3 2.85 24 151.1\n']


We iterate over `lines` in much the same way we iterated over (the file object) `bbfile`. But first let us give some examples of how the `append` list method can be used to "grow" a list from scratch. As usual, you want to make sure you are running each of the cells in the notebook one at a time...

In [30]:
# start with an empty list
my_list = []
# say we have a value we would like to append to the list
name = "John Doe"
# add it to the list
my_list.append(name)
# here is value of a different type...
age = 25
# append that to my_list as well...
my_list.append(age)
# print the list
print(my_list)

['John Doe', 25]


<div class="alert alert-block alert-success"><b>Problem 3</b>: Complete the code in the cell below. The program starts with two empty lists: <b><i>names</i></b> and <b><i>games_won</i></b>. It should then iterate over the <b><i>lines</i></b> list, splitting each line in turn, and then obtaining both the name of the pitcher and the corresponding games he won and adding the values to the corresponding list.</div>

In [31]:
names = []
games_won = []

for line in lines:
    # TO DO
    # Append the name of each pitcher and the corresponding games won to the appropriate list.
    lst = line.split()
    names.append(lst[0])
    games_won.append(int(lst[2]))

Run the following two cells to check that the `names` and `games_won` lists were constructed properly.

In [32]:
print(names)

['Degrom', 'Sale', 'Snell', 'Scherzer', 'Bauer', 'Nola', 'Verlander', 'Kluber', 'Cole', 'Mikolas']


In [33]:
print(games_won)

[7, 12, 13, 15, 12, 13, 11, 15, 10, 12]


### Working with Methods

Next we will introduce two tools for working with lists and ask you to use them together in a program. First we have the built-in `max` function, which returns the maximum value in a list.

In [34]:
my_list = [1,2,3,10,4,5,6]
max(my_list)

10

Second, we can get the "position" of any value in the list using the `index` method. Note that the first position has `index` **zero** and not **one**. So it would be more accurate to think of the `index` as the `offset` as opposed to the `position`. 

In [35]:
my_list.index(10)

3

Run the following cell to double check that the value in position (offset) 3 really is 10...

In [36]:
my_list[3]

10
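
One caveat matters when a value appears more than once: `index` returns only the position of the first match. The sketch below, using made-up values with a tie for the maximum, shows this and one possible way to collect every matching position with a loop.

```python
wins = [7, 12, 15, 9, 15]   # made-up values with a tie for the maximum

top = max(wins)
print(wins.index(top))      # position of the first 15 only

# Collect every position whose value equals the maximum
positions = []
for i in range(len(wins)):
    if wins[i] == top:
        positions.append(i)
print(positions)
```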

<div class="alert alert-block alert-success"><b>Problem 4</b>: Complete the program in the cell below. We are defining a function <b><i>print_top_pitcher</i></b> that takes two list arguments: <b><i>names_list</i></b> and <b><i>games_won_list</i></b>. Assume both lists have the same length <b><i>n</i></b>.

Assume as well that <b><i>names_list</i></b>  is a (string) list of players' names and <b><i>games_won_list</i></b> the corresponding (integer) list of the number of games each player won. In other words, the player whose name is <b><i>names_list[i]</i></b> won <b><i>games_won_list[i]</i></b> games, where  <b><i>0 ≤ i < n</i></b>. 

The function should find the player(s) who won the most games and print the name(s) of the player(s) together with the number of games won. </div>

<div class="alert alert-block alert-info">For example, <br>

`print_top_pitcher(['John','Max','Jill'], [10,12,9])` should print (something like)

**Max won the most games: 12**<br>

while `print_top_pitcher(names,games_won)` should print <br>

**Scherzer won the most games: 15**<br>
**Kluber won the most games: 15**</div>

In [37]:
def print_top_pitcher(names_list, games_won_list):
    # Find the largest number of games won, then print every pitcher
    # who reached that total (a tie produces one line per pitcher), e.g.
    
    # Scherzer won the most games: 15
    # Kluber won the most games: 15
    max_games = max(games_won_list)
    for top_pitcher, wins in zip(names_list, games_won_list):
        if wins == max_games:
            print(top_pitcher, "won the most games:", max_games)

In [38]:
# Run this cell to test the print_top_pitcher function
print_top_pitcher(names, games_won)

Scherzer won the most games: 15
Kluber won the most games: 15


### Writing to a File

Finally, we will redo `Problem 2` so that the output is written to a file instead of printed in a Jupyter notebook cell. The following line creates an empty file named `output.txt` in the current directory (erasing its contents if it already exists) and opens it for writing (`mode='w'`):
```python
outfile = open("output.txt", "w")
```
Here `outfile` is a file object that can be used to write to the file:
```python
outfile.write("This is the first line written to the output file")
```

If we wish to append lines of text to an existing text file, then we should open it for appending instead (`mode='a'`):
```python
outfile = open("output.txt", "a")
```
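
To make the difference between the two modes concrete, here is a small sketch on a throwaway file. The name `notes.txt` and the text written to it are made up for illustration. Note that `write` does not add a newline on its own, so each string ends with `'\n'`.

```python
import os
import tempfile

# 'notes.txt' is a hypothetical file in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'notes.txt')

    outfile = open(path, 'w')
    outfile.write('line one\n')      # write() adds no newline by itself
    outfile.close()

    outfile = open(path, 'w')        # 'w' again: previous contents are erased
    outfile.write('fresh start\n')
    outfile.close()

    outfile = open(path, 'a')        # 'a': keep contents and add to the end
    outfile.write('added later\n')
    outfile.close()

    infile = open(path, 'r')
    result = infile.read()
    infile.close()

print(result)
```

The text `line one` is gone from the result because the second `open` in write mode started the file over.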

<div class="alert alert-block alert-success"><b>Problem 5</b>: Repeat `Problem 2` but instead of printing the output to a Jupyter notebook cell write the output, line by line, to a file with filename `output.txt`. Also add code to count the number of lines in the file. Finally, open the file once again, this time to append a string specifying the number of pitchers in the file. A possible string that you can append:</div>

`There are 10 pitchers in the file.`

<div class="alert alert-block alert-warning">Again, don't write this string as given. You need to first find (using Python) the number of rows in the file and then use that information to construct the string that you will be appending to the end of the file.</div>

In [39]:
# TO DO
# Open the file for writing.
outfile = open("output.txt", "w")
#print(len(lines))
count = 0
for line in lines:
    
    # TO DO
    # Split the line into a list of strings..
    lst = line.split()
    stats = lst[1] + ' ' + "pitcher" + ' ' + lst[0] + ' ' + "has an ERA of" + ' '+ lst[4]
    
    # TO DO
    # Write the line to a file...
    outfile.write(stats + '\n')
    count = 1 + count
total_lines = str(count)
outfile.write("There are" + ' ' + total_lines + ' ' + "lines." + '\n')
#TO DO:
# Close the file.
outfile.close()

#TO DO:
# Open the file for appending, construct the string you are going to write to the file
# and then write the line to the end of the file.
# Don't forget to close the file.

# Re-open the file for reading and count the pitcher lines written above
appendfile = open("output.txt", "r")
count2 = 0
for new_line in appendfile:
    # Every pitcher line contains the word 'pitcher';
    # stop counting at the first line that does not
    if 'pitcher' in new_line.split():
        count2 = count2 + 1
    else:
        break
appendfile.close()

# Open the file for appending and add the summary line at the end
appendfile = open("output.txt", "a")
appendfile.write("There are " + str(count2) + " pitchers in the file." + '\n')
appendfile.close()


In [41]:
#Bonus
statslookup = open("pitching_stats.txt", "r")
names_lower = []
team = []
games_lost = []
ERA = []
games_pitched = []
innings_pitched = []
for line in statslookup:  # read directly from the file opened above
    lst = line.split()
    names_lower.append(lst[0].lower())
    team.append(lst[1])
    games_lost.append(int(lst[3]))
    ERA.append(float(lst[4]))
    games_pitched.append(int(lst[5]))
    innings_pitched.append(float(lst[6]))

pitcher_lookup = input("Enter the pitcher you want to lookup: ")
pitcher_lookup = pitcher_lookup.lower()
what_lookup =input("What do you want to lookup (team, games_won, games_lost, ERA, games_pitched, innings_pitched): ")

line_index = names_lower.index(pitcher_lookup)
if what_lookup == "team":
    print(team[line_index])
elif what_lookup == "games_won":
    print(games_won[line_index])
elif what_lookup == "games_lost":
    print(games_lost[line_index])
elif what_lookup == "ERA":
    print(ERA[line_index])
elif what_lookup == "games_pitched":
    print(games_pitched[line_index])
elif what_lookup == "innings_pitched":
    print(innings_pitched[line_index])
else:
    print("That is not available, please try again.")

statslookup.close()

Enter the pitcher you want to lookup: Bauer
What do you want to lookup (team, games_won, games_lost, ERA, games_pitched, innings_pitched): innings_pitched
166.0


### Viewing File Contents

Though you can easily view the contents of the output text file you created using your favorite text editor, you can also display them from within the notebook. Here we are going to use the shell command appropriate to your operating system. Note that to access shell commands from within a Jupyter notebook we need to prefix them with the `!` character.

In [42]:
import platform

if (platform.system() == 'Windows'):
    !type output.txt
else:        
    !cat output.txt

NYM pitcher Degrom has an ERA of 1.81
BOS pitcher Sale has an ERA of 1.97
TB pitcher Snell has an ERA of 2.18
WSH pitcher Scherzer has an ERA of 2.19
CLE pitcher Bauer has an ERA of 2.22
PHI pitcher Nola has an ERA of 2.28
HOU pitcher Verlander has an ERA of 2.52
CLE pitcher Kluber has an ERA of 2.68
HOU pitcher Cole has an ERA of 2.75
STL pitcher Mikolas has an ERA of 2.85
There are 10 lines.
There are 10 pitchers in the file.
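
The same listing can also be produced in plain Python, without shell commands. The sketch below defines a small helper, `show_file`, which is a name invented for this example and not part of the assignment, and demonstrates it on a throwaway file.

```python
import os
import tempfile

def show_file(path):
    """Print the contents of a text file and return them as a string."""
    with open(path, 'r') as f:
        text = f.read()
    print(text, end='')   # end='' avoids adding an extra newline
    return text

# Demonstration on a throwaway file; 'demo_output.txt' is a made-up name.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo_output.txt')
    with open(demo_path, 'w') as f:
        f.write('NYM pitcher Degrom has an ERA of 1.81\n')
    shown = show_file(demo_path)
```

In the notebook, calling `show_file('output.txt')` after completing `Problem 5` should display the same lines as the shell command.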
