# Python Version Checking

In [None]:
import sys
print("Python Version:", sys.version, '\n')

# Python Data Types Review

Before we dive into "how we want to write Python code," let's review some Python basics. In particular, we'll focus on data types and Python-specific tools in our toolbox.

### "Normal Data Types"

In [None]:
print("Integer (int): ", type(1))
print("Float (float): ", type(1.5))
print("String (str): ", type("test"))
print("Boolean (bool): ", type(True))

These are the basic types of data that we'll mostly work with in Python. 

* Integers are whole numbers. 
* Floats are numbers with a decimal point, like `1.5`. 
* Strings are text data (and can contain numbers, but aren't "actually numbers"). 
* Booleans are a true/false value. 
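To see why a string that looks like a number isn't "actually" a number, compare how `+` behaves on each type. This is a small sketch; the variable names are made up for illustration:

```python
number_of_apples = 5        # int
price_per_apple = 0.25      # float
apples_as_text = "5"        # str: looks like a number, but it's text

# + adds numbers, but glues (concatenates) strings together
print(number_of_apples + number_of_apples)     # 10
print(apples_as_text + apples_as_text)         # 55

# Convert explicitly when you need the numeric value of a string
print(int(apples_as_text) + number_of_apples)  # 10
print(number_of_apples * price_per_apple)      # 1.25
```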

### Tuples and Python Selection/Slicing Rules

A tuple is essentially a collection of other data. Let's see an example (note that tuples are designated with parentheses):

In [None]:
example_tuple = (1, 2, "bob", 19, "eleventeen", 37)
print(example_tuple)

Now that we have this collection, what if we want to grab an individual piece out of it?

In [None]:
print(example_tuple[0]) # Remember, Python starts at 0 index
print(example_tuple[2])

What if we want more than one element? Let's slice this up.

Python Slicing: 

```python
# short version
name_of_object_to_slice[start:stop:step] 

# longer version
name_of_object_to_slice[start_index : up_to_but_not_this_index : step_size_between_returned_items]
```

In [None]:
# Note that we only get 3 elements: indexes 1, 2, and 3 (the stop index, 4, is excluded)
print(example_tuple[1:4]) 

In [None]:
# Let's go from beginning to end, but skip by 2s
# (an empty start means "from the beginning"; an empty stop means "through the end")
print(example_tuple[::2])

In [None]:
# Grab the last 2 elements, then just the last one
print(example_tuple[-2:])
print(example_tuple[-1])
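To pull the slicing rules together, here's a sketch on a small made-up tuple of letters, with the result of each slice in a comment. The last line uses a negative step, which walks through the tuple backwards:

```python
letters = ("a", "b", "c", "d", "e", "f")

print(letters[1:4])   # ('b', 'c', 'd'): the stop index 4 is excluded
print(letters[:2])    # ('a', 'b'): empty start means "from the beginning"
print(letters[3:])    # ('d', 'e', 'f'): empty stop means "through the end"
print(letters[::2])   # ('a', 'c', 'e'): every second element
print(letters[-2:])   # ('e', 'f'): negative indexes count from the end
print(letters[::-1])  # ('f', 'e', 'd', 'c', 'b', 'a'): a negative step goes backwards
```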

What if we want to change one of the elements of a tuple?

In [None]:
example_tuple[1] = 7  # raises TypeError: 'tuple' object does not support item assignment

Nope. Tuples don't do that. This introduces us to the idea of **immutability**: once a tuple is created, its elements can't be changed. That's both a blessing and a curse:

* Blessing: If I have correlated data that I don't want a user to be able to break, I can put it in a tuple and it's safe from them breaking the correlation.
* Curse: I can't change my own data, even when I want to, and I can't add data to the tuple after the fact.
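If you really need a "changed" tuple, the usual approach is to build a new one. A short sketch: catch the `TypeError` from the failed assignment, then concatenate to make a new tuple (the trailing comma in `("new item",)` is what makes it a one-element tuple):

```python
example_tuple = (1, 2, "bob", 19, "eleventeen", 37)

try:
    example_tuple[1] = 7
except TypeError as error:
    print("Can't modify a tuple:", error)

# We can't change the original, but we can build a *new* tuple from it
extended_tuple = example_tuple + ("new item",)
print(extended_tuple)
print(example_tuple)  # the original is unchanged
```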

Tuples are often a great choice for working with data. However, they aren't as flexible as we might need. When we run into this case... _Enter Lists_

### Lists

Lists are also collections of other objects, but now we're allowed to change them on the fly. Lists are super flexible; however, they shouldn't be your only data type. Let's see what they do, then we'll talk about the downsides. Note that lists are designated with square brackets.

In [None]:
example_list = [1,2,3,4,5,6,7,8]
print(example_list)

In [None]:
print(example_list[0:4])

In [None]:
example_list[0] = 'DIFFERENT NOW'
print(example_list)

Lists are a lot like tuples, except we're allowed to change things. Lists are **mutable**.

It's tempting to just use lists for everything. Please don't. Lists are a great catch-all, but whenever you're loading a row of data or just need to store data and use it again later, tuples are a better choice: they're immutable, so you can't "break" the data as easily.

You can convert between lists and tuples easily by calling the constructor, like so:

In [None]:
new_tuple = tuple(example_list)
print(new_tuple)
print(type(new_tuple))

In [None]:
new_list = list(example_tuple)
print(new_list)
print(type(new_list))

We also want to be able to add and remove items from our list.

In [None]:
# Let's add some data to the list
new_list.append("NEW DATA")
print(new_list)

In [None]:
# Remove the last item (.pop() also returns it, so we can print it)
print(new_list.pop())
print(new_list)
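`append` and `pop` aren't the only options. Here's a sketch of a few other built-in list methods, using a made-up shopping list:

```python
shopping_list = ["eggs", "milk"]

shopping_list.append("bread")            # add one item to the end
shopping_list.extend(["jam", "butter"])  # add every item from another list
shopping_list.insert(0, "coffee")        # add an item at a specific index
print(shopping_list)   # ['coffee', 'eggs', 'milk', 'bread', 'jam', 'butter']

shopping_list.remove("milk")             # remove the first matching value
first_item = shopping_list.pop(0)        # remove (and return) the item at index 0
print(first_item)      # coffee
print(shopping_list)   # ['eggs', 'bread', 'jam', 'butter']
```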

### Sets

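Sets are another type of collection. They hold unordered, unique elements, and in exchange for not allowing duplicates they support extra operations like union and intersection. Let's take a look. Note that sets are denoted by curly brackets (but `{}` on its own creates an empty dictionary; use `set()` for an empty set).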

In [None]:
list_to_be_made_a_set = [1,1,1,1,5,2,6,8,0]
set_from_list = set(list_to_be_made_a_set)
print(set_from_list)

In [None]:
set_A = {0,1,2,3,4}
set_B = {2,3,4,5,6,7}

print("Union: ", set_A.union(set_B))
print("Intersect: ", set_A.intersection(set_B))

In [None]:
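# Sets are unordered, so we can't look up (or assign) an element by index;
# this raises a TypeError. Sets are still mutable, though: .add() and .remove() work.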
set_A[3]

In [None]:
set_A[3] = 7  # also a TypeError: sets don't support item assignment
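Here's a short sketch of what sets *can* do. They can't be indexed, but we can still add and remove elements, and there are more operations than union and intersection. The results are wrapped in `sorted()` only so the printed order is predictable:

```python
set_A = {0, 1, 2, 3, 4}
set_B = {2, 3, 4, 5, 6, 7}

# Sets are mutable: we can add and remove elements
set_A.add(10)      # adding an element that's already present changes nothing
set_A.remove(0)    # would raise KeyError if 0 were missing; .discard(0) would not
print(sorted(set_A))           # [1, 2, 3, 4, 10]

# A couple of set operations beyond union and intersection
print(sorted(set_A - set_B))   # difference, in A but not B: [1, 10]
print(sorted(set_A ^ set_B))   # symmetric difference, in exactly one: [1, 5, 6, 7, 10]
print(3 in set_A)              # membership test: True
```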

### Dictionaries

The final major built-in data type to consider is the dictionary. It also lets us group things, but instead of working by index, we use key-value pairs. This has some major benefits, but let's see how it works before discussing them. Dictionaries are _also_ denoted with curly braces, but we provide `key: value` pairs, which is what makes it a dictionary rather than a set.

In [None]:
new_dictionary = {'key_for_this_thing': 3}
print(new_dictionary['key_for_this_thing'])

Dictionaries can store more than just integers, though. Values can be any object, even lists or other dictionaries. Let's look.

In [None]:
new_dictionary = {'key_for_a_list': [0,1,2,3,4], 'key_for_another_dictionary': {'bob': 1}}
print(new_dictionary['key_for_a_list'])

In [None]:
print(new_dictionary['key_for_another_dictionary'])
print(new_dictionary['key_for_another_dictionary']['bob'])

In [None]:
# Example with some baseball stats!
# Let's assume a format like [Team, Games, Plate Appearances, Home Runs]
career_stats = {'babe_ruth': {1914: ["Red Sox", 5, 10, 0], 1915:['Red Sox', 43, 104, 4]},
                'gavvy_cravath': {1914: ['Phillies',149,604,14]}} # Yes, that's a real baseball player's name
print(career_stats['babe_ruth'])
print(career_stats['babe_ruth'][1914])

Note that I was able to use an integer as a key. Keys must be hashable, which in practice means immutable objects like integers, strings, and tuples (of immutable items). So we can't use lists as keys, but we can use tuples. Let's try adding some things to our dictionary.

In [None]:
# New entry using a tuple
career_stats[('a','new','player')] = ['Reds',12, 34, 8]

for key, val in career_stats.items():  # .items() gives us each key-value pair
    print(key, val)

In [None]:
# We can also access just the keys or just the values with the proper methods
print(career_stats.keys())
print(career_stats.values())

In [None]:
# New entry with a list
career_stats[['a','new','player']] = ['Reds',12, 34, 8]  # raises TypeError: unhashable type: 'list'

So dictionaries allow us to assign a key to each value we want to contain. The keys must be immutable to be `hashable`, which is why it's complaining about the list. 

What's a `hash`, though? The reason we like dictionaries is that they're really fast at retrieving data. A hash function turns the key into a number, and the dictionary uses that number to decide where to store the value. It's as if the dictionary says, "I'm going to store this data about `babe_ruth` (the key given) in memory over at address XYZ, so when someone asks for the key `babe_ruth`, I'll know to go straight to XYZ."
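Python's built-in `hash()` lets us peek at this. In the sketch below, hashable objects give back an integer, while a list raises a `TypeError`. (String hashes, and therefore hashes of tuples containing strings, change between Python runs, so only the integer's value below is stable.)

```python
print(hash(1914))                     # integers hash to themselves in CPython: 1914
print(hash(('a', 'new', 'player')))   # tuples of hashable items are hashable

try:
    hash(['a', 'new', 'player'])      # lists are mutable, so they aren't hashable
except TypeError as error:
    print("TypeError:", error)        # TypeError: unhashable type: 'list'
```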

### Arrays

The last datatype that's super common in data science is the array. It isn't built into Python itself; the module `numpy` is what we'll use to create and manage arrays. An array is like a list, but it holds only one datatype, so it will be all integers or all floats, for example. It's the backbone of pandas, scikit-learn, and many other libraries. It's so important that we'll have a standalone discussion of arrays once we finish this notebook.

# Following the Pythonic Coding Style

Python has a style guide that describes "how we should write Python code." It's called [PEP8](https://www.python.org/dev/peps/pep-0008/). In the next section, we'll go through some of the more important parts of PEP 8 that may be counter-intuitive when you're new to programming. We'll also highlight some Python-specific idioms. Note that this isn't a complete tour of PEP 8; we're just focusing on a few specific aspects.

To start, let's look at `import this` which is a small easter egg built into Python. It's meant to be a high-level guide to how we should think about Python coding. It is not binding, nor does everyone agree with all the statements, but it's a cute way to center yourself before programming in Python.

In [None]:
import this

If you remember nothing else, remember these points about coding in Python:

> Code is for people. 95% of the time it's not running. Write your code so that it can be read; executing it is 'secondary'. Your future self will thank you while debugging. A super clever one-line solution that no one can quickly debug is worse than a five-line solution that everyone understands.

> **The correct order of code development (in a nutshell)**
> * Make it work
> * Make it right
> * Make it readable
> * Make it fast (you rarely get to do this part)

> "Premature optimization is the root of all evil."
> - Donald Knuth (Famous Computer Scientist)

## Variable Names

A few quick rules on variable names. In general, I try to follow the rule, "Would an inebriated version of myself be able to figure out what this variable does from its name? If not, change it."

More specifically:
    
* Your variables should be named sensible things. `x` is not a sensible thing.
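* Long-winded variable names are preferable to vague names. Shorter, but still specific, names are even better.
* Use underscores to stand in for spaces in names (`snake_case`, the PEP 8 convention). Hyphens aren't allowed in Python names.
* Use lowercase, typically (PEP 8 reserves `CapWords` for class names and `UPPER_CASE` for constants)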

Here are some example variable names:

`customer_order_value`, `ordered_document_list`, `mlb_batting_statistics_1887_1990`

Even without context, I know what these variables contain. That's the kind of reaction we want.
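A quick way to test the underscore rule: `str.isidentifier()` reports whether a string is a legal Python name. And in actual code, a "hyphenated name" silently becomes subtraction. A small sketch (the variable values are arbitrary):

```python
# str.isidentifier() checks whether a string is a legal Python name
print("customer_order_value".isidentifier())   # True: underscores are fine
print("customer-order-value".isidentifier())   # False: hyphens are not allowed

# In code, Python reads hyphens as minus signs, not as part of a name
customer, order, value = 100, 30, 20
print(customer-order-value)                    # 50, i.e. 100 - 30 - 20
```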

## Functions - How we don't repeat ourselves


If you find yourself copying and pasting code, stop.

Instead, build a function. Let's imagine I'm processing a bunch of rows of data where I want to pull out a user name from each sales record and keep track of how many times I've seen each user in that table. Like so:

In [None]:
table1 = [['Sam', 36, 85.95],
 ['Carol', 75, 53.65],
 ['Sam', 90, 95.37],
 ['Doug', 61, 19.8],
 ['Sam', 41, 45.22],
 ['Doug', 29, 42.98],
 ['Oliver', 61, 95.74],
 ['Carol', 32, 17.12],
 ['Tari', 27, 68.83],
 ['Tari', 81, 62.47]]

name_counter = {}
for row in table1:
    name = row[0]
    if name in name_counter:
        name_counter[name] += 1
    else:
        name_counter[name] = 1
print(name_counter)

But what if I get a new table and want to do the same thing? I have two options: copy the code above, or turn it into a function. Let's see how I would do the latter.

In [None]:
def count_names_in_tables(table_to_count):
    """
    Takes in a table of data, where the first (0th) column
    is the name of the user, and counts how many times each
    user appears.
    ---
    Inputs: table (list of lists or array)
    Outputs: Dictionary (keys are names, values are counts)
    """
    name_counter = {}
    for row in table_to_count:
        name = row[0]
        if name in name_counter:
            name_counter[name] += 1
        else:
            name_counter[name] = 1
    return name_counter

print(count_names_in_tables(table1))

In [None]:
table2 = [['Oliver', 12, 49.95],
 ['Tari', 76, 30.71],
 ['Carol', 98, 25.07],
 ['Carol', 24, 11.85],
 ['Carol', 34, 13.36],
 ['Kelly', 14, 34.31],
 ['Tari', 6, 86.11],
 ['Tari', 90, 29.08],
 ['Carol', 55, 45.61],
 ['Sam', 88, 97.47]]

print(count_names_in_tables(table2))
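For this exact task, the standard library already has a tool. Below is a sketch that swaps in `collections.Counter`, which is a different technique from the hand-written loop above (not what `count_names_in_tables` does internally); `count_names_with_counter` is a name made up for this example:

```python
from collections import Counter

table1 = [['Sam', 36, 85.95], ['Carol', 75, 53.65], ['Sam', 90, 95.37],
          ['Doug', 61, 19.8], ['Sam', 41, 45.22], ['Doug', 29, 42.98],
          ['Oliver', 61, 95.74], ['Carol', 32, 17.12], ['Tari', 27, 68.83],
          ['Tari', 81, 62.47]]

def count_names_with_counter(table_to_count):
    """
    Same idea as count_names_in_tables, but lets collections.Counter
    do the counting. Assumes the name is in column 0 of each row.
    """
    return Counter(row[0] for row in table_to_count)

name_counts = count_names_with_counter(table1)
print(name_counts)
print(name_counts.most_common(1))  # [('Sam', 3)]
```

A `Counter` behaves like a dictionary (names are keys, counts are values), so it can usually drop in wherever the hand-built dictionary was used.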

Let's note a few things about my function:

* This doesn't hard-code any values. We want our functions to be flexible.
* This isn't something I'd call only once; we don't want our functions to be "Python scripts" pretending to be functions.
* This does one thing. At the end of the day, every function should accomplish one goal. That might mean calling one function inside another so that each function achieves only one goal at a time.

So for example, you might have a function structure like this:

```python
def clean_person_name(name):
    """
    Cleans the user name and returns it back
    """
    name = name.lower()
    name = name.replace("'", "")  # remove apostrophes for simplicity!
    return name

def process_row(row):
    """
    Takes a row of data and cleans all of the columns
    before sending it back to the main processing loop
    """
    name = clean_person_name(row[0])
    product_name = lookup_product_name_from_id(row[1])  # helper assumed to be defined elsewhere
    converted_price = convert_pounds_to_dollars(row[2])  # helper assumed to be defined elsewhere
    return (name, product_name, converted_price)
```

In this example, `process_row` only really accomplishes one task: processing each row. However, it needed some help cleaning the person's name, so we outsourced that to `clean_person_name`, which has only one job: cleaning the person's name.
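The example above calls helpers that aren't defined anywhere. Here's a runnable sketch that fills them in under made-up assumptions: the product lookup table, the product IDs, and the fixed `DOLLARS_PER_POUND` rate are all hypothetical, just enough to show how the small single-purpose functions fit together:

```python
# Hypothetical lookup data and exchange rate, made up for illustration only
PRODUCT_NAMES_BY_ID = {101: "widget", 102: "gadget"}
DOLLARS_PER_POUND = 1.25  # assumed fixed rate, not a real exchange rate

def clean_person_name(name):
    """Lowercases the name and removes apostrophes."""
    return name.lower().replace("'", "")

def lookup_product_name_from_id(product_id):
    """Hypothetical helper: maps a product ID to its name."""
    return PRODUCT_NAMES_BY_ID.get(product_id, "unknown product")

def convert_pounds_to_dollars(price_in_pounds):
    """Hypothetical helper: converts a price using the assumed rate."""
    return round(price_in_pounds * DOLLARS_PER_POUND, 2)

def process_row(row):
    """Cleans every column of a (name, product_id, price) row."""
    name = clean_person_name(row[0])
    product_name = lookup_product_name_from_id(row[1])
    converted_price = convert_pounds_to_dollars(row[2])
    return (name, product_name, converted_price)

print(process_row(["O'Brien", 101, 10.00]))  # ('obrien', 'widget', 12.5)
```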

Those functions allow us to perform repeated actions efficiently, with minimal code. They also give us a few other benefits:

* If we need to upgrade our process, we just have to change the function in one place and it will propagate to the rest of the code.
* When we get errors, Python's traceback tells us which function broke and on which line, so we can quickly track down the problem.
* If we're clever with `*args` and `**kwargs`, we can build very flexible solutions to many problems.
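Here's a minimal sketch of what `*args` and `**kwargs` do, using a made-up `describe_order` function. The `*` collects extra positional arguments into a tuple, and the `**` collects extra keyword arguments into a dictionary (the names after `*` and `**` are up to you; `args` and `kwargs` are just the convention):

```python
def describe_order(customer_name, *items, **options):
    """
    Hypothetical example: *items collects any extra positional
    arguments into a tuple; **options collects any extra keyword
    arguments into a dictionary.
    """
    summary = "{} ordered {} item(s)".format(customer_name, len(items))
    if options.get("gift_wrap"):
        summary += " (gift wrapped)"
    return summary, items, options

print(describe_order("Sam", "taco", "burrito"))
print(describe_order("Carol", "salad", gift_wrap=True, note="no onions"))
```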
    

#### An exercise (finally!)

Write a function that takes two strings and returns a boolean telling us whether the two strings are anagrams of one another (all the characters of one, including spaces and capitalization, can be rearranged to make the other). You can then use the exercise tester to check whether your code is working!

When you write the function, make sure it follows all the standards we've talked about above.

In [None]:
# write your function here
def anagram_checker(first_string, second_string):
    pass

In [None]:
# DON'T CHANGE THIS CELL! IT'S JUST HERE TO TEST YOUR CODE

def exercise_tester():
    """
    This function will test your anagram_checker function
    by putting in some values and making sure you get the
    correct answer!
    
    If you've written your code well, you should see 
    "You passed all tests!" as the output of this cell.
    """
    
    assert anagram_checker('cat','tac') == True, "Failed on First Test!"
    assert anagram_checker('taCo','taco') == False, "Failed on Second Test!"
    assert anagram_checker('realm','realm ') == False, "Failed on Third Test!"
    assert anagram_checker('rail safety','fairy tales') == True, "Failed on Fourth Test!"
    assert anagram_checker('steve','bob') == False, "Failed on Fifth Test!"
    return "You passed all tests!"

exercise_tester()

## Document, but document well

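We should always comment our code thoroughly. However, there are bad and good ways to comment code. Let's see some examples (they're illustrative only: `s_list` and `song_list` aren't defined in this notebook, so these cells won't run as-is).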

Bad:

In [None]:
# loop over s_list
for x in s_list:
    x.update() # update each thing

This is just not something I'll be able to understand in 6 months. What is `x`? What is `s_list`? My comments tell me I'm updating something, but not how, what, or why. Let's see a better version.

Better:

In [None]:
# Loop through all the songs in the song list to clean the titles for 
# storage in the database.
for song in song_list:
    song.update_title() 

### Docstrings are even nicer

Docstrings are triple-quoted strings placed right after a `def` or `class` line that describe what that function or class does.

Many tools expect and use docstrings, e.g., Jupyter notebooks and Python's built-in `help()` function.

In [None]:
def multiply_by_two(x):
    """
    This function multiplies the input by 2
    ---
    Input: Numerical
    Output: Numerical
    """
    return 2*x

In [None]:
multiply_by_two.__doc__
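Besides `__doc__`, the standard library's `inspect.getdoc()` returns the docstring with its indentation cleaned up, which is handy when you want to print it. A minimal sketch (the function is repeated so the snippet runs on its own):

```python
import inspect

def multiply_by_two(x):
    """
    This function multiplies the input by 2
    ---
    Input: Numerical
    Output: Numerical
    """
    return 2*x

# __doc__ may include the source indentation; inspect.getdoc() strips it
print(inspect.getdoc(multiply_by_two))
```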

In [None]:
# Put your cursor in between the parens and hit Shift-Tab (up to 4 times
# to expand the tooltip). You should see that Jupyter automatically shows
# the docstring as documentation! Tools such as Sphinx can also build
# documentation websites from docstrings. (Running this cell as-is raises
# a TypeError, because no argument is passed for x.)
multiply_by_two()

## How to check for truth in Python

In many languages, you need to explicitly check for equivalence. So something like:

```python
if (amount_spent != 0):
    do_something_with_amount_spent(amount_spent)
else:
    ...  # yada yada yada
```

In Python, we can often skip the explicit comparison and rely on a value's "truthiness" instead, which many Pythonistas find cleaner and more readable. Let's see how that works.

In [None]:
test_value = 0

if test_value:
    print("In the if block")

In [None]:
test_value = 1

if test_value:
    print("In the if block")

In [None]:
def evaluate_as_bool(value):
    """
    Asks python what the boolean representation
    is for the provided value
    """
    print("{} as bool: ".format(value), bool(value))


evaluate_as_bool(1)
evaluate_as_bool(0)
evaluate_as_bool(0.00001)
evaluate_as_bool(-50000)

So Python treats any non-zero number as `True` and zero as `False`. What about other data structures?

In [None]:
evaluate_as_bool([])
evaluate_as_bool(["test"])
evaluate_as_bool(["test", 2, 5])
evaluate_as_bool(("test", 2, 5))
evaluate_as_bool(())
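evaluate_as_bool({})        # careful: {} is an empty dictionary, not a set
evaluate_as_bool(set())     # this is how you make an empty set
evaluate_as_bool({'test'})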

Python also treats empty lists, tuples, sets, and dictionaries as `False` and non-empty ones as `True`. Using this approach is much preferable to doing something like:

In [None]:
a_list = [1,2,3,4]

if len(a_list) > 0:
    print("This is a terrible way to write Python")
    
if a_list:
    print("Yeah, this is much nicer.")

What about strings?

Given what we've learned so far, what do we think these three lines will evaluate to?

In [None]:
evaluate_as_bool('False')
evaluate_as_bool('')
evaluate_as_bool(' ') # There's a space here!

Same idea as lists: an empty string is `False`; any other string, even `'False'` or a single space, is `True`.

## Loops are vital and should be cleanly written

`for` loops are great for almost any iterable. Let's see how we SHOULD write them. 

Good:

In [None]:
for row in table1:
    print(row)

Bad:

In [None]:
for i in range(len(table1)):
    print(table1[i])

In the bad example above, I can't immediately tell what you're doing. It's opaque and hard to read. That's bad form. If you do need the index, use `enumerate`:

In [None]:
for ix, row in enumerate(table1):
    print(ix, row, table1[ix])

### List Comprehensions

When you can, use list comprehensions. They're a hallmark of idiomatic Python and are great for making your code concise (and often a bit faster than the equivalent loop). Let's check them out.

In [None]:
a_list_of_numbers = [1,2,3,4,5,7]

new_list_of_numbers = [x + 1 for x in a_list_of_numbers if x > 2]
print(new_list_of_numbers)
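To see what the comprehension is doing, here's the same logic written as an ordinary `for` loop. Both versions keep the numbers greater than 2 and add 1 to each:

```python
a_list_of_numbers = [1, 2, 3, 4, 5, 7]

# The comprehension from above...
comprehension_result = [x + 1 for x in a_list_of_numbers if x > 2]

# ...does the same work as this longer for loop
loop_result = []
for x in a_list_of_numbers:
    if x > 2:
        loop_result.append(x + 1)

print(comprehension_result)  # [4, 5, 6, 8]
print(loop_result)           # [4, 5, 6, 8]
```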

Unless you have a specific reason not to use a `for` loop, you shouldn't be using `while` loops. A common exception: when the loop changes the very object it's looping over (for example, removing items from a list until it's empty), a `while` loop is often the clearer choice.
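Here's a minimal sketch of that case, assuming a hypothetical to-do list that we empty one task at a time. Because the list shrinks inside the loop, `while tasks:` (which relies on the truthiness of a non-empty list) is a natural fit:

```python
# A made-up to-do list we want to work through, one task at a time
tasks = ["write tests", "fix bug", "update docs"]
completed = []

# The list shrinks as we go, so we loop "while there's anything left"
while tasks:
    current_task = tasks.pop(0)   # take the first remaining task
    completed.append(current_task)

print(completed)  # ['write tests', 'fix bug', 'update docs']
print(tasks)      # []
```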

## Using `in` for checking existence

In earlier examples, we used `in` to check whether something was in a Python structure. Let's examine that a bit more, starting by checking whether a letter exists in a string.

In [None]:
test_string = "test string of words, yep"

if 'z' in test_string:
    print("Found a 'z'")
if 'e' in test_string:
    print("Found an 'e'")

What about a list?

In [None]:
test_list = [1,2,3,4,5]

if 1 in test_list:
    print("Found a '1'")
if 6 in test_list:
    print("Found a '6'")
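`in` works the same way on other containers. Here's a sketch using a trimmed copy of the `career_stats` dictionary from earlier. On a dictionary, `in` checks the keys (so `.keys()` isn't needed), not the values:

```python
career_stats = {'babe_ruth': {1914: ["Red Sox", 5, 10, 0]},
                'gavvy_cravath': {1914: ['Phillies', 149, 604, 14]}}

# On a dictionary, `in` checks the keys
print('babe_ruth' in career_stats)              # True
print('Red Sox' in career_stats)                # False: values aren't searched
print(1914 in career_stats['babe_ruth'])        # True

# On strings, `in` also finds longer substrings, not just single letters
print("string" in "test string of words, yep")  # True
```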

## Lambda functions and Sorting

There are two "ways" to do sorting in Python. There's a subtle difference. Let's see them both in action and see if you can spot the difference.

In [None]:
# The columns are (name, age, salary)
alumni = [('bob',32,72000),
          ('alice',29,115000),
          ('charlie',25,95000)]

sorted(alumni)  # sorted() is Python's built-in sorting function
print(alumni)

In [None]:
alumni.sort() # This is the method that a list has to sort itself
print(alumni)

What happened? When we ran `sorted` it didn't appear to work. Why? 

Let's try it this way:

In [None]:
alumni = [('bob',32,72000),
          ('alice',29,115000),
          ('charlie',25,95000)]
print(sorted(alumni))

Now it worked! That's because when we run `sorted(input_list)`, it actually makes a new list that is the sorted list, instead of changing the data in place. 
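Here's a side-by-side sketch of the difference: `sorted()` hands back a new list and leaves the original alone, while `.sort()` rearranges the list in place and returns `None`. That last detail is a common trap, since `result = my_list.sort()` sets `result` to `None`:

```python
alumni = [('bob', 32, 72000),
          ('alice', 29, 115000),
          ('charlie', 25, 95000)]

sorted_alumni = sorted(alumni)   # returns a brand-new sorted list
print(sorted_alumni)             # [('alice', 29, 115000), ('bob', 32, 72000), ('charlie', 25, 95000)]
print(alumni)                    # the original order is untouched

result = alumni.sort()           # sorts alumni in place...
print(result)                    # ...and returns None, so don't assign it
print(alumni)                    # now alumni itself is sorted
```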

It's usually safer to use `sorted` than `.sort`, because `.sort` changes your source data in place, so a mistaken sort can't be undone. Let's play around with our sorting options a bit: first sorting backwards, and then by salary instead.

In [None]:
print(sorted(alumni, reverse=True))

To sort by something other than the first item in each record, we need to tell Python how to sort. This is a great use for a `lambda` function: a small, anonymous function written as a single expression. We use one when we need a function for just one line and don't want to define and name it separately. Let's see one in action, then discuss below.

In [None]:
print(sorted(alumni,key=lambda row: row[2]))

The `key` argument for `sorted` takes a function that picks out which piece of each record to sort on. Here we told it to sort on the salary (index 2). The function itself isn't very interesting, and we don't want it to live beyond this line. So we told Python, "Hey, here's a function that takes in a single argument, `row`, and returns this part of the row." `sorted` then calls that function on every row and sorts the rows by the values it returns. Here's the equivalent with a regular named function:

In [None]:
def get_salary_from_row(row):
    """
    Returns the salary (index 2) of the row
    """
    return row[2]

print(sorted(alumni, key=get_salary_from_row))

But realistically, I don't want to save that function in my namespace; I'll never use it again. So instead, I just use a quick lambda function to define the behavior without cluttering my code with one-time-use functions. Lambda functions are a common part of the Python language, especially when using functions like `map` and `filter`. But those are better discussed another time.
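A few more `key` variations, as a sketch: combining `key` with `reverse`, using `operator.itemgetter` from the standard library as an alternative to the lambda, and returning a tuple to sort by more than one column (age first, then name):

```python
from operator import itemgetter

alumni = [('bob', 32, 72000),
          ('alice', 29, 115000),
          ('charlie', 25, 95000)]

# Highest salary first: combine key= with reverse=True
print(sorted(alumni, key=lambda row: row[2], reverse=True))

# operator.itemgetter(2) builds the same "grab index 2" function for us
print(sorted(alumni, key=itemgetter(2)))

# Returning a tuple sorts by age, then by name to break any ties
print(sorted(alumni, key=lambda row: (row[1], row[0])))
```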

## Errors should never pass silently

When something unexpected happens, have your code raise an exception.

In [None]:
class_list = [1,2,3]
if len(class_list) % 2 != 0:
    raise ValueError('list should have an even number of elements')

You can also use `assert` to check if something is behaving correctly.

In [None]:
def sum_up_a_list(list_to_sum):
    """
    Takes a list of numbers and returns the sum
    """
    assert isinstance(list_to_sum, list), "Input must be a list!"
    return sum(list_to_sum)

print(sum_up_a_list(3))

In [None]:
print(sum_up_a_list([1,2,3]))
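One caveat about `assert`: Python skips assert statements entirely when run with the `-O` (optimize) flag, so for checking inputs you don't control, explicitly raising an exception is more dependable. A sketch of the same function written that way (`sum_up_a_list_strict` is a name made up for this example):

```python
def sum_up_a_list_strict(list_to_sum):
    """
    Takes a list of numbers and returns the sum.
    Raises TypeError for anything that isn't a list.
    """
    if not isinstance(list_to_sum, list):
        raise TypeError("Input must be a list, got {}".format(type(list_to_sum).__name__))
    return sum(list_to_sum)

print(sum_up_a_list_strict([1, 2, 3]))  # 6

try:
    sum_up_a_list_strict(3)
except TypeError as error:
    print("Caught:", error)  # Caught: Input must be a list, got int
```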

## Try-Except

The `try-except` paradigm is a great way to build code that can react to errors on the fly. Put the code that might fail inside a `try` block, then put the "how to react" code in the `except` block.

In [None]:
x = "1" # this is a string version of 1, can't do math on it

try:
    print("Trying to divide")
    print(x/2)
    print("This won't happen because of the error")
    
except TypeError:
    print("\nOops, forgot to make it a number")
    print(float(x)/2)

It's bad form to write an `except` statement that doesn't say what type of error it's planning for. These are called "bare excepts," and you should avoid them: they catch every exception (even a `KeyboardInterrupt` or the `NameError` from a typo), so you don't really know what went wrong.

In [None]:
del x # Let's make sure x doesn't exist

try:
    str(x)
except:
    print(r"¯\_(ツ)_/¯")  # raw string, so the backslash isn't treated as an escape

Try again with a specific exception instead. The error here is a `NameError`, not a `TypeError`, so our `except` doesn't catch it, and Python shows us the error we didn't plan for:

In [None]:
try:
    str(x)
except TypeError:
    print("x could not be made a string")
    raise

#### Exercise:

Let's use try-except to check whether something is a number. Write a function that takes in a value and returns the integer version of the value if possible, or the string "NOT A NUMBER" if it's not.

In [None]:
# Your code here. 



# Test cases:
# your_function_name("Steve") -> "NOT A NUMBER"
# your_function_name(5.41) -> 5

## Reference vs Values

Variables are not boxes, they're labels for spots in memory where things live. That means we have to be careful about how we assign things, because we might just be telling two things to point to the same address.

In [None]:
a = [1,2,3]
b = a
b.append(4)

# Even though we only modified b, do we think a will have changed?
print(a)
print(b)

`b = a` does not create a new object called `b`. It attaches `b` as a second label to the same `[1,2,3]` list that `a` points to.

If you want a new object, you have to call some kind of constructor (like telling Python to make a new list out of the data at `a`'s address).

In [None]:
a = [1,2,3]
b = list(a)
b.append(4)

print(a)
print(b)
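To check whether two labels point to the same object, Python has the `is` operator (and `id()`, which returns an object's identity). Here's a short sketch comparing `b = a` with making a new list; `a[:]` and `a.copy()` are two other common ways to get a new list:

```python
a = [1, 2, 3]
b = a           # a second label for the same list
c = list(a)     # a new list with the same contents

print(a is b)   # True: same object
print(a is c)   # False: different objects...
print(a == c)   # True: ...that hold equal values
print(id(a) == id(b), id(a) == id(c))  # True False

# Slicing and .copy() also make new lists
d = a[:]
e = a.copy()
print(d is a, e is a)  # False False
```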

This is a tricky subject, which we'll discuss in more detail in the lecture on `deepcopy` vs `copy` later in the course.

# Summary

We've only touched on some of the most common "gotchas" in Python coding. You can read the rest of the standard here:

[PEP8 Guide](https://www.python.org/dev/peps/pep-0008/)

Remember: make your code easy to read. Your future self and all future team members will thank you.