# Functions, flow control, working with data files

In this part, we're going to introduce you to reusable functions, basic file operations and conditional statements (sort of like =IF() statements in Excel), plus a few other flow-control tools.

### `if` statements

Just like in Excel, you can use "if [condition], do [thing]" logic:

```python
if condition:
    # do stuff here if logical test passes
elif second_condition:
    # if the first test fails, test a second condition -- if true, do stuff here
    # (you can have multiple elif statements)
else:
    # the fallback if none of the conditions evaluate to True
```

The `condition` is any logical statement that returns a boolean value (True or False). For example:

```python
my_name = 'Cody'

if my_name == 'Cody':
    print('Hello, Cody!')
else:
    print('Hello, stranger!')
```

In this example, `my_name == 'Cody'` would return `True`, so the first branch would be executed.

Notice the double equals sign. One equals sign assigns a value to a variable; two equals signs compare two values. The double equals is [just one of a handful of comparison operators you can use to compare two things and get back a boolean value](https://docs.python.org/3/library/stdtypes.html#comparisons).
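Here's a quick sketch of a few more comparison operators. `intern_count` is just a stand-in value for this example:

```python
intern_count = 18  # a stand-in value for illustration

print(intern_count == 18)  # equal to: True
print(intern_count != 18)  # not equal to: False
print(intern_count > 10)   # greater than: True
print(intern_count <= 10)  # less than or equal to: False
```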

You could, for instance, use the `in` operator to test if an item exists in a list:

```python
interns = ['Cody', 'Iuliia', 'Jonathan']  # a made-up list, just for this example

if 'Cody' in interns:
    print('Hooray, I\'m an intern!')
else:
    pass
```

This checks to see if any of the items in the `interns` list matches the string `'Cody'`. If there's a match, it prints something. If there isn't, it will [`pass`](https://docs.python.org/3/tutorial/controlflow.html#pass-statements) -- in other words, "do nothing."

One other thing to notice about this example: I'm using a backslash (`\`) to _escape_ the apostrophe in the word "I'm." Otherwise Python would think the apostrophe was closing out the string, and it would throw a `SyntaxError`.
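Wrapping the string in double quotes is another way around the apostrophe problem. Both of these produce exactly the same string:

```python
escaped = 'Hooray, I\'m an intern!'       # backslash escapes the apostrophe
double_quoted = "Hooray, I'm an intern!"  # double quotes, so the apostrophe needs no escape

print(escaped == double_quoted)  # True
```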

Let's import `my_cool_vars` again **as** `mcv`. Now:

- **Assign our list of interns, `mcv.djnf_interns`, to the variable `interns`.**
- **Assign the _number_ of interns in the list to a variable called `intern_count` (Hint: use the `len` function).**
- **Now write an if/elif/else test to print out a statement telling us whether the number of interns is greater or less than 10, or 10 exactly, or something else.**

```python
import my_cool_vars as mcv

interns = mcv.djnf_interns
intern_count = len(interns)

if intern_count > 10:
    print('There are more than 10 interns')
elif intern_count == 10:
    print('There are exactly 10 interns')
elif intern_count < 10:
    print('There are fewer than 10 interns')
else:
    print('Something went wrong')
```

```
There are more than 10 interns
```


### `for` loops

Let's print out the names of all the interns in the list. Based on what we know so far, we'd have to do something like this:

```python
print(interns[0])
print(interns[1])
print(interns[2])
print(interns[3])

# etc.
```

Gross. Let's use a `for` loop instead.

A `for` loop goes through every item in a list, whether it's got 10 items or 10,000, and does whatever you tell it to do to each item. The basic syntax is:

```python
for variable_that_you_define in your_list:
    # do something to variable_that_you_define
```

`for` and `in` are special Python words. The `variable_that_you_define` is a variable that serves as your handle to each item in the list as the loop passes over it. The variable name can be anything -- `banana`, `freedom`, `crapulence`, whatever -- but it's good practice to use a variable name that describes what one individual list item is.

**Write a `for` loop to print out each intern in the list we have saved as the variable `interns`.**

```python
for intern in interns:
    print(intern)
```

```
Iuliia
Jonathan
Sahil
Camille
Aidan
Alexander
Rilyn
Hayley
Kenneth
Olivia
Annie
William
Olivia
Shannon
David
Isha
Justina
Harry
```


### Iterating over dictionaries

You can also loop over dictionaries. Since Python 3.7, dictionaries remember the order in which you added their items, and a `for` loop visits them in that order. (Older versions of Python didn't guarantee this, which is why you'll still run across [OrderedDict](https://docs.python.org/3/library/collections.html#collections.OrderedDict) in code that needed a predictable order.)

When you loop over a dictionary, the variable you define in the first line of your `for` loop is a reference to the item's _key_. To get the item's _value_, use bracket notation to look up that key in the dictionary. (To make this clear to myself, I usually call the variable "key.") For example:

```python
my_cool_dict = {'a': 1, 'b': 2, 'c': 3}

for key in my_cool_dict:
    print(key, '=>', my_cool_dict[key])  # prints 'a => 1' on first loop iteration
```
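Python dictionaries also have an `.items()` method that hands you each key and its value at the same time, so you don't need the bracket lookup. The same loop, rewritten that way:

```python
my_cool_dict = {'a': 1, 'b': 2, 'c': 3}

# .items() yields (key, value) pairs; unpack each pair into two loop variables
for key, value in my_cool_dict.items():
    print(key, '=>', value)
```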

**Write a `for` loop that prints out the keys and values of `mcv.djnf_dict`**

```python
for key in mcv.djnf_dict:
    print(key, mcv.djnf_dict[key])
```

```
place Columbia, MO
event DJNF data training
students ['Iuliia', 'Jonathan', 'Sahil', 'Camille', 'Aidan', 'Alexander', 'Rilyn', 'Hayley', 'Kenneth', 'Olivia', 'Annie', 'William', 'Olivia', 'Shannon', 'David', 'Isha', 'Justina', 'Harry']
```


### Hollering 'Stop!' The `break` statement

To break out of a `for` loop, use a [`break` statement](https://docs.python.org/3/reference/simple_stmts.html#the-break-statement).

A common use case: You're looping over a big set of data, and once some condition is met -- a condition tested for with an `if` statement -- you want to stop the `for` loop instead of going all the way to the end.
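A minimal sketch of that pattern, using a short sample list of names: the loop prints each name and stops as soon as it reaches `'Sahil'`, so `'Camille'` never gets printed.

```python
names = ['Iuliia', 'Jonathan', 'Sahil', 'Camille']  # sample data for illustration

for name in names:
    print(name)
    # once we've found the name we're looking for, stop looping
    if name == 'Sahil':
        break
```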

### Enumeration

Sometimes you need to know _where_ you are in the list you're looping over -- the index of the item as well as its value. To do this, use the [`enumerate()`](https://docs.python.org/3/library/functions.html#enumerate) function. Example:

```python
for idx, name in enumerate(interns):
    print(idx, name)
    # first iteration prints '0 Iuliia'
```
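If you'd rather count from 1 (say, for a numbered list meant for humans), `enumerate()` takes an optional `start` argument. A quick sketch with a short sample list:

```python
names = ['Iuliia', 'Jonathan', 'Sahil']  # sample data for illustration

# start=1 makes the count begin at 1 instead of 0
for number, name in enumerate(names, start=1):
    print(number, name)
```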

**Using a combination of `break` and `enumerate`, print the first five names in the `interns` list. (Don't forget that Python lists are zero-indexed.)**

```python
for idx, name in enumerate(interns):
    if idx > 4:
        break
    print(name)
```

```
Iuliia
Jonathan
Sahil
Camille
Aidan
```


### Significant whitespace is significant

One of the fun (?!) things about Python is: whitespace matters, and your scripts will break if you don't follow the rules. You'll notice in our if/else and for loops that some lines are indented. Knowing when to indent -- and how much -- takes some practice.

Without getting too deep into the weeds here, you need to make sure that your indentations are consistent -- [you can use tabs or spaces, just be consistent](https://www.youtube.com/watch?v=SsoOG6ZeyUI&feature=youtu.be&t=50s). The [~official~ Python style guide](https://www.python.org/dev/peps/pep-0008/) recommends using four spaces, so that's what I go with, but you do whatever makes you happy.
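Indentation is also how Python knows which lines belong to a loop. In this sketch (the list is just sample data), the indented line runs once per name, while the unindented `print()` runs once, after the loop is done:

```python
names = ['Iuliia', 'Jonathan', 'Sahil']
count = 0

for name in names:
    count = count + 1  # indented: runs once for every name in the list

print('Counted', count, 'names')  # not indented: runs once, after the loop ends
```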

### Functions

If you find yourself repeating the same set of operations at different points in your code, it's probably time to write a function. We've already used some built-in functions like `print()`. But you can also write your own, and doing so can help you keep your code tidy.

You could write a function that takes a number and returns that number minus 10:

```python
def minusTen(num):
    return num - 10
```

Notice a couple things here:

* Functions start with `def`.
* Functions have a name (ours is called `minusTen`).
* If you need to pass one or more arguments to the function, you define them inside parentheses (`num` is just a variable name I made up; it could be anything).
* The special Python word `return` tells the function what to hand back to the code that called it.

You would call the function like this:

```python
my_number = minusTen(100)
print(my_number)
# prints 90
```

In this case we're passing the number 100 to the function. The function's job is to subtract 10 from whatever number you hand it and return the result to the code that called it.

You're probably wondering: what happens if you hand the function something that's not a number? Great question! You'd get an error -- an "exception," as they're called in Python.

What happens if you call the function without providing an argument: `minusTen()`? Yup, Error City, population you. If this were a real function, you'd want to [build in some error handling](https://docs.python.org/3/tutorial/errors.html) to make sure that the function has in fact received an argument and that it's a number.
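Here's one minimal way to handle the not-a-number case, assuming you'd rather get `None` back than have the script crash. `safeMinusTen` is a made-up name for this sketch, and returning `None` is just one choice; you might prefer to raise the error again with a friendlier message.

```python
def safeMinusTen(num):
    try:
        return num - 10
    except TypeError:
        # subtracting 10 from something like a string raises a TypeError
        print('Expected a number, got:', repr(num))
        return None


print(safeMinusTen(100))     # 90
print(safeMinusTen('Cody'))  # prints the warning, then None
```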

You could also set your function up to have a default value for the variable:

```python
def minusTen(num=100):
    return num - 10
```

Now, if you call the function without specifying an argument, it would return `90`.

**Write a function called `formatInternName` that takes a string, strips whitespace, makes it uppercase and adds ' IS A DOW JONES INTERN' before returning it. (Remember, you can chain string functions together.)**

```python
def formatInternName(name):
    # strip surrounding whitespace, uppercase, then tack on the suffix
    return name.strip().upper() + ' IS A DOW JONES INTERN'
```

Now let's apply our new `formatInternName` function inside a `for` loop.

**Loop over the names in our `djnf_interns` list, applying the `formatInternName` function to each item and printing the result.**

```python
for intern in interns:
    print(formatInternName(intern))
```

```
IULIIA IS A DOW JONES INTERN
JONATHAN IS A DOW JONES INTERN
SAHIL IS A DOW JONES INTERN
CAMILLE IS A DOW JONES INTERN
AIDAN IS A DOW JONES INTERN
ALEXANDER IS A DOW JONES INTERN
RILYN IS A DOW JONES INTERN
HAYLEY IS A DOW JONES INTERN
KENNETH IS A DOW JONES INTERN
OLIVIA IS A DOW JONES INTERN
ANNIE IS A DOW JONES INTERN
WILLIAM IS A DOW JONES INTERN
OLIVIA IS A DOW JONES INTERN
SHANNON IS A DOW JONES INTERN
DAVID IS A DOW JONES INTERN
ISHA IS A DOW JONES INTERN
JUSTINA IS A DOW JONES INTERN
HARRY IS A DOW JONES INTERN
```


### Reading data files

Let's apply these concepts to a real-world use case: Working with data files. We'll start with the Major League Baseball opening day roster you've already seen in the Excel and SQL lessons.

You use the `open()` function to open a file. It's _usually_ a good idea to open a data file using a [`with`](https://docs.python.org/3/reference/compound_stmts.html#with) block, because then you don't have to worry about closing it. Here's an example:

```python
import csv

with open('path/to/your/file.csv', 'r') as data_file:
    # do something with your file, which you have access to via the `data_file` variable you just defined
```

The `open()` function takes the file you want to open and, optionally, the mode you want to open it in. The default mode is `'r'`, which means "read," but [there are other modes you can open a file in](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files). Later on we're going to use the `'w'` mode to _write_ to a data file.
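A small sketch of the three modes you'll use most, run against a throwaway file in your system's temp directory (the file name `modes_demo.txt` is made up):

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'modes_demo.txt')

# 'w' creates the file, or wipes it clean if it already exists
with open(path, 'w') as f:
    f.write('first line\n')

# 'a' appends to the end of an existing file
with open(path, 'a') as f:
    f.write('second line\n')

# 'r' reads it back
with open(path, 'r') as f:
    contents = f.read()

print(contents)
```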

This is a comma-delimited file, so we'll need to import Python's built-in [`csv`](https://docs.python.org/3/library/csv.html) module. Don't let the name fool you -- Python can handle pretty much any kind of delimited text.
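For example, the `delimiter` argument tells `csv.reader()` what separates the fields. Here's a sketch reading a made-up, tab-delimited snippet; `io.StringIO` wraps a string so the `csv` module can treat it like an open file:

```python
import csv
import io

# made-up, tab-delimited sample data
tsv_text = 'NAME\tTEAM\nZack Greinke\tArizona Diamondbacks\n'

reader = csv.reader(io.StringIO(tsv_text), delimiter='\t')
rows = list(reader)
print(rows)
# [['NAME', 'TEAM'], ['Zack Greinke', 'Arizona Diamondbacks']]
```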

The MLB data file lives in the `data` directory, which is on the same level as the directory containing this notebook. Two dots mean we're going one level up, then into the data directory: `../data/mlb.csv`.
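If a relative path like this gives you a `FileNotFoundError`, it can help to ask Python where it thinks the file is. A sketch using the standard `os.path` functions (relative paths are resolved from your current working directory, which in Jupyter is usually the notebook's folder):

```python
import os

# os.path.join puts the right separator between the pieces for your operating system
mlb_path = os.path.join('..', 'data', 'mlb.csv')

print(mlb_path)                   # ../data/mlb.csv on macOS and Linux
print(os.path.abspath(mlb_path))  # the full path, resolved from your current working directory
print(os.path.exists(mlb_path))   # True only if a file actually lives there
```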

We're going to look at two different ways of using the `csv` module to read a data file:

* Treating each row of data as a list, using [`csv.reader()`](https://docs.python.org/3/library/csv.html#csv.reader), and
* Treating each row of data as a dictionary, using [`csv.DictReader()`](https://docs.python.org/3/library/csv.html#csv.DictReader)

Let's start with rows as lists. Before we start the loop, we're going to take one extra step here: calling Python's built-in `next()` function on the reader to skip the header row:

```python
import csv

with open('path/to/your/file.csv', 'r') as data_file:
    reader = csv.reader(data_file)
    next(reader)
    
    for row in reader:
        # do stuff here
```
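`next()` also returns the row it skips, so if you want to keep the column names around, assign the result to a variable. A sketch using a couple of made-up rows (the `POS` column is an assumption for this example, not necessarily a column in `mlb.csv`):

```python
import csv
import io

# made-up sample data standing in for a real file
sample = io.StringIO('NAME,POS\nZack Greinke,SP\nPaul Goldschmidt,1B\n')

reader = csv.reader(sample)
header = next(reader)  # the first row: ['NAME', 'POS']

for row in reader:
    # each remaining row is a list of strings, in the same order as the header
    print(row[0], '-', row[1])
```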

- **Import the csv module.**
- **Open the data file that lives at `../data/mlb.csv` in a `with` block `as` a variable that you define.**
- **In the indented block, create a new variable, `reader`, and set its value to `csv.reader(your_file_variable)`. If you get stuck, [check out the example here](https://docs.python.org/3/library/csv.html#csv.reader).**
- **Write a `for` loop to iterate over the rows in your `reader` variable. (Each row will be a list.)**
- **Print only the player's name, which will be the first item in the list (because it's the first column of data).**

```python
import csv

with open('../data/mlb.csv', 'r') as data_file:
    reader = csv.reader(data_file)
    
    # skip header row
    next(reader)
    
    for row in reader:
        print(row[0])
```

```
Zack Greinke
Yasmany Tomas
Tyler Clippard
Paul Goldschmidt
Brad Ziegler
Shelby Miller
Welington Castillo
A.J. Pollock
Daniel Hudson
Jean Segura
Patrick Corbin
Rubby De La Rosa
Josh Collmenter
Rickie Weeks
Randall Delgado
Chris Owings
David Peralta
Nick Ahmed
Robbie Ray
Jake Lamb
Andrew Chafin
Chris Herrmann
Philip Gosselin
Silvino Bracho
Brandon Drury
Socrates Brito
Jake Barrett
Freddie Freeman
Nick Markakis
Erick Aybar
Hector Olivera
Jason Grilli
Julio Teheran
A.J. Pierzynski
Bud Norris
Jim Johnson
Tyler Flowers
Alexi Ogando
Kelly Johnson
Eric O'Flaherty
Gordon Beckham
Jeff Francoeur
Arodys Vizcaino
Ender Inciarte
Jace Peterson
Paco Rodriguez
Andrew McKirahan
Williams Perez
Adonis Garcia
Dan Winkler
Shae Simmons
Drew Stubbs
Jesse Biddle
John Gant
Jose A. Ramirez
Manny Banuelos
Matt Wisler
Chris Davis
Adam Jones
Matt Wieters
Ubaldo Jimenez
J.J. Hardy
Mark Trumbo
Yovani Gallardo
Zach Britton
Chris Tillman
Darren O'Day
Pedro Alvarez
Manny Machado
Brian Matusz
Hyun-soo Kim
Vance Worley
Dy
```

Now let's do the same thing but use a `DictReader` object instead. The process is the same, but each row will be a dictionary, not a list. So instead of accessing values by position, you access them by key; in this case, the keys are the field names from the header row. The field name we're targeting is `'NAME'`.

(You don't need to import the `csv` module again.)
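Under the hood, `DictReader` uses the first row as the keys for every row after it, and it stores those keys in its `fieldnames` attribute. A sketch with a couple of made-up rows (`POS` is an invented column for this example):

```python
import csv
import io

# made-up sample data standing in for a real file
sample = io.StringIO('NAME,POS\nZack Greinke,SP\nPaul Goldschmidt,1B\n')

reader = csv.DictReader(sample)
print(reader.fieldnames)  # ['NAME', 'POS'], taken from the header row

for row in reader:
    # each row maps a column name to that row's value
    print(row['NAME'], '-', row['POS'])
```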

```python
with open('../data/mlb.csv', 'r') as data_file:
    reader = csv.DictReader(data_file)
    
    for row in reader:
        print(row['NAME'])
```

```
Zack Greinke
Yasmany Tomas
Tyler Clippard
Paul Goldschmidt
Brad Ziegler
Shelby Miller
Welington Castillo
A.J. Pollock
Daniel Hudson
Jean Segura
Patrick Corbin
Rubby De La Rosa
Josh Collmenter
Rickie Weeks
Randall Delgado
Chris Owings
David Peralta
Nick Ahmed
Robbie Ray
Jake Lamb
Andrew Chafin
Chris Herrmann
Philip Gosselin
Silvino Bracho
Brandon Drury
Socrates Brito
Jake Barrett
Freddie Freeman
Nick Markakis
Erick Aybar
Hector Olivera
Jason Grilli
Julio Teheran
A.J. Pierzynski
Bud Norris
Jim Johnson
Tyler Flowers
Alexi Ogando
Kelly Johnson
Eric O'Flaherty
Gordon Beckham
Jeff Francoeur
Arodys Vizcaino
Ender Inciarte
Jace Peterson
Paco Rodriguez
Andrew McKirahan
Williams Perez
Adonis Garcia
Dan Winkler
Shae Simmons
Drew Stubbs
Jesse Biddle
John Gant
Jose A. Ramirez
Manny Banuelos
Matt Wisler
Chris Davis
Adam Jones
Matt Wieters
Ubaldo Jimenez
J.J. Hardy
Mark Trumbo
Yovani Gallardo
Zach Britton
Chris Tillman
Darren O'Day
Pedro Alvarez
Manny Machado
Brian Matusz
Hyun-soo Kim
Vance Worley
Dy
```

### Writing to a CSV file

Now we're going to open a file and write some data into it. Same idea: we can treat each row of data as a list, with [`csv.writer`](https://docs.python.org/3/library/csv.html#csv.writer), or as a dictionary, with [`csv.DictWriter`](https://docs.python.org/3/library/csv.html#csv.DictWriter). Let's start with lists.

To write to the file, we're going to open a file in `'w'` mode. The `csv.writer`'s `writerow()` method expects a list. Here's an example:

```python
import csv

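# the csv docs recommend newline='' so you don't get blank lines between rows on Windows
with open('mlb_writer.csv', 'w', newline='') as outfile: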
    writer = csv.writer(outfile)
    headers = ['Name', 'Position', 'Team', 'Salary']
    writer.writerow(headers)
    writer.writerow(['Bob Johnson', 'P', 'Arizona Diamondbacks', 1900000.0])
```
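The `csv` docs recommend opening files you hand to `csv.writer` with `newline=''`; without it, you can end up with blank lines between rows on Windows. Here's a sketch that writes a couple of rows to a temporary file (the name `writer_demo.csv` is made up) and then reads them back to verify what landed on disk:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'writer_demo.csv')

with open(path, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Name', 'Position', 'Team', 'Salary'])
    writer.writerow(['Bob Johnson', 'P', 'Arizona Diamondbacks', 1900000.0])

# read the file back to check what was actually written
with open(path, 'r', newline='') as infile:
    rows = list(csv.reader(infile))

print(rows)
```

Notice that everything comes back as a string, including the salary: a CSV file doesn't remember data types.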

**Use the `csv.writer()` object to write the following lines to a data file called 'intern.csv':**
- **A header row: Name, Age, News Organization**
- **A row of data with your name, your age, and the organization you're going to be interning at**

(You won't need to import `csv` again.)

**Verify that the file was created and has data in it.**

```python
with open('intern.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    headers = ['Name', 'Age', 'News Organization']
    writer.writerow(headers)
    writer.writerow(['Cody Winchester', 31, 'IRE/NICAR'])
```

Now let's do the same thing but use a `csv.DictWriter()` object instead. The setup is a little different: you define the column names for the header row when you create the `DictWriter()` object, by passing a list to the `fieldnames` argument.

Then you write the headers to the file using the `writeheader()` method.

And instead of passing a list to the `writerow()` method, you hand it a dictionary whose keys come from _the names you defined in `fieldnames`._ That's important: if your dictionary has a key that isn't in `fieldnames`, you'll get a `ValueError`. (A field you leave out doesn't cause an error; it's written as an empty value.)

Here's an example:

```python
import csv

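# the csv docs recommend newline='' so you don't get blank lines between rows on Windows
with open('mlb_dictwriter.csv', 'w', newline='') as outfile: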
    headers = ['Name', 'Position', 'Team', 'Salary']
    writer = csv.DictWriter(outfile, fieldnames=headers)
    writer.writeheader()
    writer.writerow({'Name': 'Bob Johnson',
                     'Position': 'P',
                     'Team': 'Arizona Diamondbacks',
                     'Salary': 1900000.0})
```
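A sketch of how `DictWriter` reacts to mismatched keys, writing to an in-memory `io.StringIO` buffer instead of a real file: a key that isn't in `fieldnames` raises a `ValueError` (the default `extrasaction='raise'` behavior), while a field you leave out gets written as an empty value (the default `restval`).

```python
import csv
import io

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['Name', 'Position'])
writer.writeheader()

# a missing key is filled in with restval, which defaults to ''
writer.writerow({'Name': 'Bob Johnson'})

# an extra key that isn't in fieldnames raises a ValueError, and nothing is written
try:
    writer.writerow({'Name': 'Bob Johnson', 'Salary': 1900000.0})
except ValueError as err:
    print('ValueError:', err)

print(buffer.getvalue())
```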

**Use the `csv.DictWriter()` object to write the following lines to a data file called 'intern2.csv':**
- **A header row: Name, Age, Organization**
- **A row of data with your name, your age, and the news organization you're going to be interning at**

**Verify that the file was created and has data in it.**

```python
with open('intern2.csv', 'w', newline='') as outfile:
    headers = ['Name', 'Age', 'Organization']
    writer = csv.DictWriter(outfile, fieldnames=headers)
    writer.writeheader()
    writer.writerow({'Name': 'Cody Winchester', 'Age': 31, 'Organization': 'IRE/NICAR'})
```