### The Zen of Python

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### Getting Python

You can download Python from Python.org. But if you don’t already have
Python, I recommend instead installing the Anaconda distribution, which
already includes most of the libraries that you need to do data science.


### Virtual Environments

Every data science project you do will require some combination of external libraries, sometimes with specific versions that differ from the specific versions you used for other projects. If you were to have a single Python installation, these libraries would conflict and cause you all sorts of problems.

The standard solution is to use virtual environments, which are sandboxed Python environments that maintain their own versions of Python libraries (and, depending on how you set up the environment, of Python itself).

I recommended you install the Anaconda Python distribution, so in this section I’m going to explain how Anaconda’s environments work. If you are not using Anaconda, you can either use the built-in venv module or install virtualenv, in which case you should follow their instructions instead.

To create an (Anaconda) virtual environment, you just do the following:
 * create a Python 3.6 environment named "dsfs"
> conda create -n dsfs python=3.6
 
Follow the prompts, and you’ll have a virtual environment called “dsfs,” with the instructions:

 To activate this environment, use:
 > source activate dsfs

 To deactivate an active environment, use:
 > source deactivate

 As indicated, you then activate the environment using:
 > source activate dsfs

at which point your command prompt should change to indicate the active environment.

As long as this environment is active, any libraries you install will be installed only in the dsfs environment.

Now that you have your environment, it’s worth installing IPython, which is a full-featured Python shell:
 > python -m pip install ipython

As a matter of good discipline, you should always work in a virtual environment, and never use the “base” Python installation.



### Whitespace Formatting

Many languages use curly braces to delimit blocks of code. Python uses
indentation:

In [2]:
# The pound sign marks the start of a comment. Python itself
# ignores the comments, but they're helpful for anyone reading the code.
for i in [1, 2, 3, 4, 5]:
    print(i)
    for j in [1, 2, 3, 4, 5]:
        print(j)
        print(i+j)
    print(i)
print('done looping')

1
1
2
2
3
3
4
4
5
5
6
1
2
1
3
2
4
3
5
4
6
5
7
2
3
1
4
2
5
3
6
4
7
5
8
3
4
1
5
2
6
3
7
4
8
5
9
4
5
1
6
2
7
3
8
4
9
5
10
5
done looping


This makes Python code very readable, but it also means that you have to
be very careful with your formatting.

##### WARNING
Programmers will often argue over whether to use tabs or spaces for indentation. For
many languages it doesn’t matter that much; however, Python considers tabs and spaces
different indentation and will not be able to run your code if you mix the two. When
writing Python you should always use spaces, never tabs. (If you write code in an editor
you can configure it so that the Tab key just inserts spaces.)
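
To see the interpreter's complaint without breaking a real file, here is a minimal sketch that compiles a small snippet whose two body lines are indented differently, one with a tab and one with eight spaces. The `mixed_source` string and the `<demo>` filename are made up for illustration.

```python
# Hypothetical snippet: the first body line is indented with a tab,
# the second with eight spaces.
mixed_source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(mixed_source, "<demo>", "exec")
    error_name = None
except TabError as error:
    # TabError is the IndentationError subclass raised for mixed tabs and spaces
    error_name = type(error).__name__

print(error_name)
```

Replacing the tab with spaces (or the spaces with a tab) lets the same snippet compile.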

Whitespace is ignored inside parentheses and brackets, which can be
helpful for long-winded computations and for making code easier to read:

In [4]:
long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 +
                           11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20)
long_winded_computation

210

In [7]:
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

easier_to_read_list_of_lists = [[1, 2, 3],
                                [4, 5, 6],
                                [7, 8, 9]]

You can also use a backslash to indicate that a statement continues onto the
next line, although we’ll rarely do this:

In [10]:
two_plus_three = 2 + \
                 3

One consequence of whitespace formatting is that it can be hard to copy and
paste code into the Python shell. For example, if you tried to paste the code
in the next cell (a for loop with a blank line inside its body) into the
ordinary Python shell, you would receive the complaint:
> IndentationError: expected an indented block

because the interpreter thinks the blank line signals the end of the for
loop’s block.



In [12]:
for i in [1, 2, 3, 4, 5]:    
    
    #notice the blank line
    print(i)

1
2
3
4
5


IPython has a magic function called %paste, which correctly pastes
whatever is on your clipboard, whitespace and all. This alone is a good
reason to use IPython.


### Modules

Certain features of Python are not loaded by default. These include both
features that are part of the language and third-party features that you
download yourself. In order to use these features, you’ll need to import
the modules that contain them.

One approach is to simply import the module itself:

In [14]:
import re
my_regex = re.compile("[0-9]+", re.I)
my_regex

re.compile(r'[0-9]+', re.IGNORECASE|re.UNICODE)

Here, re is the module containing functions and constants for working with
regular expressions. After this type of import you must prefix those
functions with re. in order to access them.
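
As a small illustration of the compiled pattern in use, the sketch below searches a made-up sentence for runs of digits. The sample text is an assumption for this example; `findall` is the standard method on compiled patterns.

```python
import re

# Same pattern as above: one or more digits (re.I has no effect on digits)
my_regex = re.compile("[0-9]+", re.I)

# The example sentence is made up for illustration
numbers_found = my_regex.findall("order 66 shipped 12 items")
print(numbers_found)
```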

If you already had a different re in your code, you could use an alias:

In [15]:
import re as regex
my_regex = regex.compile("[0-9]+", regex.I)
my_regex

re.compile(r'[0-9]+', re.IGNORECASE|re.UNICODE)

You might also do this if your module has an unwieldy name or if you’re
going to be typing it a lot. For example, a standard convention when
visualizing data with matplotlib is:

>import matplotlib.pyplot as plt
>>plt.plot(...)

If you need a few specific values from a module, you can import them
explicitly and use them without qualification:

In [17]:
from collections import defaultdict, Counter
lookup = defaultdict(int)
my_counter = Counter()

# reference : https://www.geeksforgeeks.org/python-collections-module/
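
To give a feel for what these two imports are for, here is a minimal sketch that tallies a made-up list of words both ways. The `words` list is an assumption for this example.

```python
from collections import defaultdict, Counter

words = ["data", "science", "data", "python", "data", "science"]  # made-up sample

# defaultdict(int) supplies 0 for a missing key, so += works without a check
lookup = defaultdict(int)
for word in words:
    lookup[word] += 1

# Counter does the same tallying in one step
my_counter = Counter(words)

print(dict(lookup))
print(my_counter.most_common(1))
```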

If you were a bad person, you could import the entire contents of a module
into your namespace, which might inadvertently overwrite variables you’ve
already defined:

In [19]:
match = 10
from re import *  # uh oh, re has a match function
print(match)      # "<function match at 0x10281e6a8>"

<function match at 0x0000000002A38D08>


However, since you are not a bad person, you won’t ever do this.


### Functions

A function is a rule for taking zero or more inputs and returning a
corresponding output. In Python, we typically define functions using def:


In [1]:
def double(x):
    """
    This is where you put an optional docstring that explains what the function does.
    For example, this function multiplies its input by 2.
    """
    return x * 2

Python functions are first-class, which means that we can assign them to
variables and pass them into functions just like any other arguments:


In [9]:
def apply_to_one(f):
    """Calls the function f with 1 as its argument"""
    return f(1)

# assigning the function (double) to a variable (my_double)
my_double = double  # refers to the previously defined function

# passing a function object (my_double) into another function (apply_to_one) as an argument
x = apply_to_one(my_double)  # equals 2
print(x)

2


It is also easy to create short anonymous functions, or lambdas:

In [10]:
y = apply_to_one(lambda x: x + 4)
y

5

You can assign lambdas to variables, although most people will tell you that
you should just use def instead:


In [11]:
another_double = lambda x: 2 * x    # don't do this

def another_double(x):
    """ Do this instead"""
    return 2 * x

Function parameters can also be given default arguments, which only need
to be specified when you want a value other than the default:


In [12]:
def my_print(message="my default message"):
    print(message)
    
my_print("hello") # prints 'hello'
my_print()        # prints 'my default message'

hello
my default message


It is sometimes useful to specify arguments by name:

In [19]:
def full_name(first="What's-his-name", last="something"):
    return first + " " + last

full_name("Joel", "Grus")

'Joel Grus'

In [18]:
full_name("Joel")

'Joel something'

In [16]:
full_name(last="Grus")

"What's-his-name Grus"

We will be creating many, many functions.

### Strings

Strings can be delimited by single or double quotation marks (but the
quotes have to match):

In [1]:
single_quoted_string = 'data science'
double_quoted_string = "data science"

Python uses backslashes to encode special characters. For example:

In [5]:
tab_string = "\t"   # represents the tab character
len(tab_string)     # is 1

1

If you want backslashes as backslashes (which you might in Windows directory names or in regular expressions),
you can create raw strings using r"":

In [6]:
not_tab_string = r"\t"  # represents the characters '\' and 't'
len(not_tab_string)     # is 2

2

You can create multiline strings using three double quotes: 

In [7]:
multi_line_string = """This is the first line.
and this is the second line
and this is the third line"""

The f-string, introduced in Python 3.6, provides a simple way to substitute values into strings.
For example, if we had the first name and last name given separately:

In [8]:
first_name = "Joel"
last_name = "Grus"

We might want to combine them into a full name. There are multiple ways to construct such a full_name string:

In [10]:
full_name1 = first_name + " " + last_name               # string addition
full_name2 = "{0} {1}".format(first_name, last_name)    # string.format

but the f-string way is much less unwieldy:

In [12]:
full_name3 = f"{first_name} {last_name}"
full_name3

'Joel Grus'

and we'll prefer it throughout the book.
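
As a small sketch of what else fits inside the braces, the example below uses expressions and a format specification. The `score` value and the variable names `initials`, `shout`, and `rounded` are made up for illustration.

```python
first_name = "Joel"
last_name = "Grus"

# Any expression can go inside the braces, not just a variable name
initials = f"{first_name[0]}.{last_name[0]}."
shout = f"{first_name.upper()} {last_name.upper()}"

# A format spec after a colon controls presentation (made-up value)
score = 0.98765
rounded = f"{score:.2f}"

print(initials, shout, rounded)
```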

### Exceptions

When something goes wrong, Python will raise an <i>exception</i>. Unhandled, exceptions will cause your program to crash.
You can handle them using try and except:

In [13]:
try:
    print(0/0)
except ZeroDivisionError:
    print("cannot divide by Zero")

cannot divide by Zero


Although in many languages exceptions are considered bad, in Python there is no shame in using them to make your code cleaner, 
and we will sometimes do so.
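
As one possible illustration of exceptions making code cleaner, the sketch below converts strings to integers and falls back to None when a string is not a number. The helper `parse_int_or_none` and the sample input are hypothetical, not part of any library.

```python
def parse_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None

raw_values = ["10", "twenty", "30"]   # made-up input
parsed = [parse_int_or_none(value) for value in raw_values]
print(parsed)
```

Letting `int` raise and catching the error avoids writing a separate check for what counts as a valid number.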

### Lists

Probably the most fundamental data structure in Python is the <i>list</i>, which is simply an ordered collection (it is similar to
what in other languages might be called an <i>array</i>, but with some added functionality):

In [14]:
integer_list = [1, 2, 3]
heterogeneous_list = ["string", 0.1, True]
list_of_lists = [integer_list, heterogeneous_list, []]

In [15]:
list_of_lists

[[1, 2, 3], ['string', 0.1, True], []]

In [18]:
list_length = len(integer_list)   #equals 3
list_length

3

In [20]:
list_sum = sum(integer_list)    # equals 6
list_sum

6

You can get or set the <i>n</i>th element of a list with square brackets:

In [1]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [2]:
zero = x[0]    # equals 0, lists are 0-indexed
zero

0

In [3]:
one = x[1]    # equals 1
one

1

In [4]:
nine = x[-1]  # equals 9, 'pythonic' for last element
nine

9

In [5]:
eight = x[-2] # equals 8, 'pythonic' for next-to-last element
eight

8

In [6]:
x[0] = -1     # now x is [-1, 1, 2, 3, ..., 9]
x

[-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]

You can also use square brackets to <i>slice</i> lists. The slice <b>i : j</b> means all elements from <b>i</b> (inclusive) to <b>j</b> (not inclusive). If you leave off the start of the slice, you'll slice from the beginning of the list, and if you leave off the end of the slice, you'll slice until the end of the list:

In [8]:
first_three = x[:3]
first_three

[-1, 1, 2]

In [9]:
three_to_end = x[3:]
three_to_end

[3, 4, 5, 6, 7, 8, 9]

In [11]:
one_to_four = x[1:5]
one_to_four

[1, 2, 3, 4]

In [15]:
last_three = x[-3:]
last_three

[7, 8, 9]

In [16]:
without_first_and_last = x[1:-1]
without_first_and_last

[1, 2, 3, 4, 5, 6, 7, 8]

In [17]:
copy_of_x = x[:]
copy_of_x

[-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]

You can similarly slice strings and other "sequential" types.
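
For instance, the same slice notation applies to a string and to a tuple. The values below are made up for this sketch.

```python
phrase = "data science"
first_word = phrase[:4]      # 'data'
last_word = phrase[-7:]      # 'science'

numbers = (0, 1, 2, 3, 4)
middle = numbers[1:4]        # (1, 2, 3); slicing a tuple gives a tuple

print(first_word, last_word, middle)
```
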
<br><br>
A slice can take a third argument to indicate its <i>stride</i>, which can be negative:

In [23]:
every_third = x[::3]
every_third

[-1, 3, 6, 9]

In [40]:
five_to_three = x[5:2:-1]
five_to_three

[5, 4, 3]

Python has an <b>in</b> operator to check for list membership:

In [41]:
1 in [1, 2, 3]

True

In [42]:
0 in [1, 2, 3]

False

This check involves examining the elements of the list one at a time, which means that you probably shouldn't use it unless you know your list is pretty small (or unless you don't care how long the check takes).
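
To make the cost concrete, here is a hand-written sketch of a one-at-a-time membership scan that also counts how many elements it examined. The function `linear_contains` is hypothetical and only mimics the idea; it is not how Python implements `in` internally.

```python
def linear_contains(items, target):
    """Hypothetical helper: return (found, steps), where steps is the
    number of elements examined before stopping."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            return True, steps
    return False, steps

numbers = list(range(1000))
found_first = linear_contains(numbers, 0)      # found after 1 step
found_last = linear_contains(numbers, 999)     # found after 1000 steps
missing = linear_contains(numbers, -5)         # not found after 1000 steps

print(found_first, found_last, missing)
```

The worst case grows with the length of the list, which is why the check gets slow for large lists.
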
<br><br>
It is easy to concatenate lists together. If you want to modify a list in place, you can use <b>extend</b> to add items from 
another collection:

In [43]:
x = [1, 2, 3]
x

[1, 2, 3]

In [44]:
x.extend([4, 5, 6])     # x is now [1, 2, 3, 4, 5, 6]
x

[1, 2, 3, 4, 5, 6]

If you don't want to modify x,  you can use list addition:

In [45]:
x = [1, 2, 3]

In [47]:
y = x + [4, 5, 6]   # y is [1, 2, 3, 4, 5, 6]; x is unchanged
y

[1, 2, 3, 4, 5, 6]

In [49]:
x     #x is unchanged

[1, 2, 3]

Most frequently we will append to lists one item at a time:

In [52]:
x = [1, 2, 3]
x.append(0)
x

[1, 2, 3, 0]

In [54]:
y = x[-1]
y

0

In [55]:
z = len(x)
z

4

It's often convenient to unpack lists when you know how many elements they contain:

In [59]:
x, y = [1, 2]    # now x is 1, y is 2

In [57]:
x

1

In [58]:
y

2

although you will get a <i>ValueError</i> if you don't have the same number of elements on both sides.
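
A minimal sketch of that failure, using a made-up three-element list and two names on the left:

```python
try:
    x, y = [1, 2, 3]    # three values, two names
    outcome = "unpacked"
except ValueError as error:
    outcome = f"ValueError: {error}"

print(outcome)
```
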
<br><br>
A common idiom is to use an underscore for a value you are going to throw away:

In [60]:
_, y = [1, 2]    # now y == 2, didn't care about the first element
y

2

### Tuples

Tuples are lists' immutable cousins. Pretty much anything you can do to a list that doesn't involve modifying it, you can do to a tuple. You specify a tuple by using parentheses (or nothing) instead of square brackets:

In [1]:
my_list = [1, 2]
my_tuple = (1, 2)
other_tuple = 3, 4

In [3]:
my_list[1] = 3  # my_list is now [1, 3]
my_list

[1, 3]

In [4]:
try:
    my_tuple[1] = 3
    
except TypeError:
    print("Cannot modify a tuple")

Cannot modify a tuple


Tuples are a convenient way to return multiple values from functions:

In [5]:
def sum_and_product(x, y):
    return (x + y), (x * y)

In [7]:
sp = sum_and_product(2, 3)
sp

(5, 6)

In [11]:
s, p = sum_and_product(5, 10)
s, p

(15, 50)

Tuples (and lists) can also be used for multiple assignment:

In [22]:
x, y = 1, 2
print("x is {0}, y is {1}".format(x, y))

x is 1, y is 2


In [23]:
x, y = y, x  # Pythonic way to swap variables; now x is 2, y is 1
print("x is {0}, y is {1}".format(x, y))

x is 2, y is 1


### Dictionaries



Another fundamental data structure is a dictionary, which associates <i>values</i> with <i>keys</i> and allows you to quickly retrieve the value corresponding to a given key:

In [2]:
empty_dic = {}                    # Pythonic
empty_dic2 = dict()               # less Pythonic
grades = {'joel': 80, 'Tim': 95}  # dictionary literal

grades

{'joel': 80, 'Tim': 95}

You can look up the value for a key using square brackets:

In [4]:
joels_grade = grades["joel"]

joels_grade

80

But you will get a <b>KeyError</b> if you ask for a key that's not in the dictionary:

In [5]:
try:
    kates_grade = grades["Kate"]
except KeyError:
    print("no grade for Kate!")

no grade for Kate!


You can check for the existence of a key using <b>in</b>:

In [9]:
joel_has_grade = "joel" in grades
kate_has_grade = "Kate" in grades

In [10]:
joel_has_grade

True

In [11]:
kate_has_grade

False