# Chapter 2: A Crash Course in Python

The second chapter of the "Data Science from Scratch" book.

This is only distilled content that I consider worth mentioning and practicing. You can find more in the book.

Here we have:
- Exceptions
- Lists
- Tuples
- Dictionaries
    - defaultdict
    - Counter
- Sets
- Control Flow (only a few interesting aspects)
- Booleans
- Sorting
- List Comprehensions (very useful)
- Generators and Iterators
- (Pseudo)Randomness
- OOP

## Exceptions
Pretty standard, except that Python uses the "except" keyword to catch exceptions instead of "catch", which I'm more familiar with.

In [None]:
try:
    print(0/0)
except ZeroDivisionError:
    print("Cannot divide by zero")
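Beyond a single `except` clause, a `try` statement can handle several exception types and also take `else` and `finally` clauses. A minimal sketch, using a hypothetical `safe_divide` helper (not from the book):

```python
def safe_divide(a, b):
    """Hypothetical helper: divide a by b, returning None on bad input."""
    try:
        result = a / b
    except ZeroDivisionError:
        print("Cannot divide by zero")
        return None
    except TypeError:
        print("Both arguments must be numbers")
        return None
    else:
        # Runs only when the try block raised nothing
        return result
    finally:
        # Runs in every case, even after one of the returns above
        print("Division attempted")

print(safe_divide(10, 2))
print(safe_divide(1, 0))
print(safe_divide("1", 2))
```

`else` keeps the "happy path" separate from the guarded code, while `finally` is the natural place for cleanup that must always happen.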

## Lists
Lists may resemble arrays in other languages; however, they offer considerably more functionality.

In [1]:
integer_list = [1, 2, 3]
print(integer_list)
print(len(integer_list))
print(sum(integer_list))

[1, 2, 3]
3
6


In [2]:
heterogeneous_list = ["string", 0.1, True]
list_of_lists = [integer_list, heterogeneous_list, []]
print(heterogeneous_list)
print(list_of_lists)

['string', 0.1, True]
[[1, 2, 3], ['string', 0.1, True], []]


Indexing, including some less common tricks such as negative indices.

In [4]:
simple_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(simple_list)
print(simple_list[0])
print(simple_list[1])

# Get the last item
print(simple_list[-1])

# Get next-to-last item
print(simple_list[-2])

# Lists are editable
simple_list[0] = -1
print(simple_list)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
0
1
9
8
[-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]
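A short sketch of how negative indices relate to regular ones, and what happens when you go past the end. It uses a fresh list called `numbers` so the notebook's `simple_list` stays untouched:

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# A negative index -k is shorthand for len(numbers) - k
print(numbers[-1] == numbers[len(numbers) - 1])   # True
print(numbers[-2] == numbers[len(numbers) - 2])   # True

# Indexing past the end raises IndexError...
try:
    print(numbers[10])
except IndexError:
    print("Index 10 is out of range")

# ...while slicing past the end is silently clamped to the list bounds
print(numbers[8:100])   # [8, 9]
```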


Now some "slicing".

In [5]:
# First three
print(simple_list[:3])

# From index three to the end
print(simple_list[3:])

# Indices one to four (the ending index is exclusive)
print(simple_list[1:5])

# Last three
print(simple_list[-3:])

# Without first and last
print(simple_list[1:-1])

# Copy of the entire list
print(simple_list[:])

[-1, 1, 2]
[3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4]
[7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8]
[-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]
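Slices also accept an optional third component, the step. The sketch below (again on a fresh `numbers` list) shows that, and confirms that `[:]` really produces a separate (shallow) copy:

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# start:stop:step, so every second element
evens = numbers[::2]
# A negative step walks the list backwards
reversed_numbers = numbers[::-1]

# [:] builds a new list, so changing the copy leaves the original alone
numbers_copy = numbers[:]
numbers_copy[0] = 100

print(evens)                        # [0, 2, 4, 6, 8]
print(reversed_numbers)             # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(numbers[0], numbers_copy[0])  # 0 100
```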


You can use the "in" operator to check for list membership. Be careful, though: it examines the elements one at a time, so on large lists it can become a performance bottleneck.

In [6]:
print(1 in simple_list)
print(20 in simple_list)

True
False
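To make the cost visible, here is a hypothetical `linear_contains` function that roughly mimics what `in` does for a list: a front-to-back scan that stops at the first match. It is an illustration, not how CPython is literally implemented:

```python
def linear_contains(items, target):
    """Hypothetical helper: roughly what `target in items` does for a list."""
    for item in items:        # check elements one at a time, front to back
        if item == target:
            return True       # stop as soon as a match is found
    return False              # every element was examined without a match

numbers = [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(linear_contains(numbers, 1), 1 in numbers)    # True True
print(linear_contains(numbers, 20), 20 in numbers)  # False False
```

A missing value is the worst case: every single element has to be compared before the answer is known.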


Some useful functions.

In [8]:
simple_list = [1, 2, 3]
print(simple_list)

# Warning - modifies original list
simple_list.extend([4, 5, 6])
print(simple_list)

[1, 2, 3]
[1, 2, 3, 4, 5, 6]


In [9]:
simple_list = [1, 2, 3]

# Use list addition when you don't want to modify your original list
another_list = simple_list + [4, 5, 6]
print(another_list)

[1, 2, 3, 4, 5, 6]


In [10]:
simple_list.append(0)
print(simple_list)

[1, 2, 3, 0]
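A common gotcha is mixing up `append` and `extend` when the argument is itself a list. A small side-by-side comparison:

```python
with_append = [1, 2, 3]
with_extend = [1, 2, 3]

# append adds its argument as a single element, even if it is a list
with_append.append([4, 5])
# extend adds each element of the iterable individually
with_extend.extend([4, 5])

print(with_append)                         # [1, 2, 3, [4, 5]]
print(with_extend)                         # [1, 2, 3, 4, 5]
print(len(with_append), len(with_extend))  # 4 5
```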


Unpacking lists:

In [12]:
simple_list = [1, 2]
x, y = simple_list
print(x, y)

1 2


Use an underscore "_" for values you don't care about.

In [13]:
_, z = simple_list
print(z)

2
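Unpacking has two more tricks worth knowing: a starred name collects "the rest" into a list, and a mismatch between the number of names and values raises `ValueError`. A short sketch:

```python
values = [1, 2, 3, 4]

# A starred name collects all the remaining items into a list
first, *rest = values
print(first, rest)    # 1 [2, 3, 4]

*init, last = values
print(init, last)     # [1, 2, 3] 4

# Unpacking into the wrong number of names raises ValueError
unpack_failed = False
try:
    a, b = values
except ValueError:
    unpack_failed = True
print(unpack_failed)  # True
```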


## Tuples
In short, they might be described as read-only lists. They are written with parentheses, or simply with commas and no parentheses at all.

In [14]:
my_list = [1, 2]
my_tuple = (1, 2)
another_tuple = 3, 4

# Can modify a list
my_list[1] = 3
print(my_list)

# But cannot do the same with tuples
try:
    my_tuple[1] = 3
except TypeError:
    print("Cannot modify a tuple!")

[1, 3]
Cannot modify a tuple!
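Since the parentheses are optional, it is really the comma that makes a tuple. This matters most for one-element tuples:

```python
not_a_tuple = (1)      # just the integer 1 inside parentheses
one_tuple = (1,)       # the trailing comma makes a one-element tuple
also_one_tuple = 1,    # parentheses are optional here too
empty_tuple = ()       # the empty tuple is the one case that needs them

print(type(not_a_tuple).__name__)   # int
print(type(one_tuple).__name__)     # tuple
print(one_tuple == also_one_tuple)  # True
print(len(empty_tuple))             # 0
```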


Tuples are a convenient way to return multiple values from functions.

In [15]:
def my_sum(x, y):
    return x+y

def my_mul(x, y):
    return x*y

def sum_and_mul(x, y):
    """Returns tuple of sum and product"""
    return my_sum(x, y), my_mul(x, y)

s, m = sum_and_mul(4, 5)
print("Sum = {}, Product = {}".format(s, m))

Sum = 9, Product = 20


A useful trick: swap two variables in one line.

In [16]:
x, y = 1, 2
print(x, y)

x, y = y, x
print(x, y)

1 2
2 1
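Conceptually, the swap is just tuple packing followed by unpacking: the right-hand side is evaluated first, so no temporary variable is needed. Spelled out step by step (the name `packed` is only for illustration):

```python
x, y = 1, 2

# The right-hand side is evaluated first and packed into a tuple...
packed = (y, x)
print(packed)   # (2, 1)

# ...which is then unpacked into the names on the left
x, y = packed
print(x, y)     # 2 1
```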


## Dictionaries
Standard, JSON-like objects consisting of key-value pairs.

In [None]:
# "{}" is faster than "dict()" and is also more "Pythonic"
#grades = dict()
grades = {"Adam": 85, "Ada": 90}

print(grades["Adam"])

In [None]:
# Catch error when there's no such key defined
try:
    print(grades["Kate"])
except KeyError:
    print("There's no grade for Kate!")

In [None]:
# Default to 0 if there's no key
kates_grade = grades.get("Kate", 0)
print(kates_grade)

In [None]:
print("Number of students: %s" % len(grades))
grades["Kate"] = 99
print("Number of students: %s" % len(grades))

In [None]:
print("Keys:")
for key in grades.keys():
    print("   %s" % key)

print("Values:")
for value in grades.values():
    print("   %s" % value)

print("Is Kate in grades?")
print("Kate" in grades)
print("Is Jack in grades?")
print("Jack" in grades)

print("Now let's print all the key-value pairs:")
for key, value in grades.items():
    print("({}, {})".format(key, value))
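A classic dictionary exercise is counting words. With a plain dict you either check membership first or let `get` supply a default. A small sketch on a made-up sentence:

```python
words = "the cat sat on the mat the end".split()

# Approach 1: check membership before updating
counts_in = {}
for word in words:
    if word in counts_in:
        counts_in[word] += 1
    else:
        counts_in[word] = 1

# Approach 2: let get() supply 0 for keys that are missing
counts_get = {}
for word in words:
    counts_get[word] = counts_get.get(word, 0) + 1

print(counts_in == counts_get)  # True
print(counts_get["the"])        # 3
```

Both work, but the bookkeeping is a bit clunky, which is exactly what `defaultdict` streamlines.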

### Defaultdict
Simply put, a dictionary that takes a factory function and calls it to create a default value whenever you access a key that isn't present yet.

In [17]:
from collections import defaultdict

document = "This is a sample text to perform some basic operations on. Python is an awesome language and super powerful. Did I mention that Python is great? Only Python!".split(" ")
print(document)

# int() creates 0
word_counts = defaultdict(int)

for word in document:
    # Creates a key with default value of 0 if such key has not been created yet
    word_counts[word] += 1

['This', 'is', 'a', 'sample', 'text', 'to', 'perform', 'some', 'basic', 'operations', 'on.', 'Python', 'is', 'an', 'awesome', 'language', 'and', 'super', 'powerful.', 'Did', 'I', 'mention', 'that', 'Python', 'is', 'great?', 'Only', 'Python!']
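The cell above builds `word_counts` but never looks at it. This self-contained version rebuilds the same document and inspects a few counts. Note that splitting on spaces keeps punctuation, so "Python!" is a separate word. It also shows a side effect worth remembering: merely reading a missing key inserts it.

```python
from collections import defaultdict

document = ("This is a sample text to perform some basic operations on. "
            "Python is an awesome language and super powerful. "
            "Did I mention that Python is great? Only Python!").split(" ")

word_counts = defaultdict(int)
for word in document:
    word_counts[word] += 1

print(word_counts["is"])        # 3
print(word_counts["Python"])    # 2 ("Python!" is counted separately)

# Even just reading a missing key inserts it with the default value
print("Java" in word_counts)    # False
print(word_counts["Java"])      # 0
print("Java" in word_counts)    # True
```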


You can provide any other factory, but there's one requirement: it must be callable with no arguments.

In [None]:
# Uses list()
dd_list = defaultdict(list)
dd_list[2].append(1)
print(dd_list)

In [None]:
# Uses dict()
dd_dict = defaultdict(dict)
dd_dict["Damian"]["City"] = "Colchester"
print(dd_dict)

In [None]:
# Can use lambdas as well
dd_pair = defaultdict(lambda: [0, 0])
dd_pair[2][1] = 1
print(dd_pair)
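To see the "no arguments" requirement in action, here are two hypothetical factories: one that works like the lambda above, and one that expects an argument and therefore fails when `defaultdict` calls it:

```python
from collections import defaultdict

def zero_pair():
    """Hypothetical zero-argument factory, equivalent to lambda: [0, 0]."""
    return [0, 0]

def needs_argument(x):
    """Hypothetical factory that (wrongly) expects one argument."""
    return x

ok = defaultdict(zero_pair)
ok["a"][0] += 5
print(dict(ok))        # {'a': [5, 0]}

broken = defaultdict(needs_argument)
factory_failed = False
try:
    broken["a"]        # defaultdict calls needs_argument() with no arguments
except TypeError:
    factory_failed = True
print(factory_failed)  # True
```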

### Counter
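A Counter turns a sequence of values into a dict-like object that maps each value to the number of times it occurs. Tip: useful for histograms (we'll see this later).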

In [None]:
from collections import Counter

c = Counter([0, 1, 2, 0])
print(c)

In [None]:
word_counts = Counter(document)
print(word_counts)

In [None]:
for word, count in word_counts.most_common(3):
    print(word, count)
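A Counter still behaves like a dictionary, with two conveniences: missing keys report 0 instead of raising `KeyError`, and counts can be added to later with `update`. A quick sketch:

```python
from collections import Counter

c = Counter([0, 1, 2, 0])

print(c[0])              # 2
# Missing keys report a count of 0 (no KeyError, and nothing is inserted)
print(c[5])              # 0

# Add more observations to the existing counts
c.update([1, 1, 5])
print(c.most_common(2))  # [(1, 3), (0, 2)]
```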

## Sets
To me, a set looks like a dictionary that has only keys. As a consequence, it stores only unique values, and adding the same value a second time is harmless: the duplicate is simply ignored.

It is also worth mentioning that the "in" operation is much faster on sets than on lists, where every element has to be checked. Keep that in mind!

Two main reasons to use sets:
- "in" is a very fast operation on sets
- find distinct items in a collection
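A rough way to check the speed claim yourself, using `timeit` from the standard library. The exact numbers depend on your machine, so none are given here; the size (100,000 items) and repeat count are arbitrary choices:

```python
import timeit

items_list = list(range(100_000))
items_set = set(items_list)
target = 99_999  # last element: the worst case for a list scan

# Both containers give the same answer...
print(target in items_list, target in items_set)

# ...but the list scans element by element while the set uses hashing
list_time = timeit.timeit(lambda: target in items_list, number=100)
set_time = timeit.timeit(lambda: target in items_set, number=100)
print(f"list: {list_time:.5f}s, set: {set_time:.5f}s")
```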

In [None]:
s = set()
s.add(1)
s.add(2)
s.add(2)
print(s)
print(len(s))
print(2 in s)
print(3 in s)

In [None]:
item_list = [1, 2, 3, 1, 2, 3]
print(item_list)

num_items = len(item_list)
print(num_items)

item_set = set(item_list)
print(item_set)

num_distinct_items = len(item_set)
print(num_distinct_items)

dis_item_list = list(item_set)
print(dis_item_list)

## Control flow

### if-elif-else
Pretty standard, as in any other language. The difference worth mentioning is "elif", which stands for "else if".

In [18]:
if 1 > 2:
    print("Is 1 greater than 2?")
elif 1 > 3:
    print("Or greater than 3?")
else:
    print("Well, all above is false!")

Well, all above is false!
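Since conditions are checked top to bottom and only the first match runs, each `elif` can assume the earlier conditions were false. A hypothetical `letter_grade` helper (the cut-off scores are made up) shows the pattern:

```python
def letter_grade(score):
    """Hypothetical helper: map a numeric score to a letter grade."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"  # only reached when score < 90
    elif score >= 70:
        return "C"  # only reached when score < 80
    else:
        return "F"

for score in (99, 85, 72, 10):
    print(score, letter_grade(score))
```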


A one-liner:

In [19]:
print("even" if 4 % 2 == 0 else "odd")

even


### for loops

In [20]:
for x in range(10):
    if x == 3:
        continue
    if x == 5:
        break
    print(x)

0
1
2
4


## Booleans
Again, pretty standard, except that True and False are capitalized.

In [21]:
print(1 < 2)
print(True == False)

True
False


Python uses "None" to represent a missing value (the equivalent of "null" in other languages).

In [25]:
x = None
print(x == None)

# More Pythonic
print(x is None)

True
True
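Why is `is None` the more Pythonic choice? `==` calls the class's `__eq__` method, which any class can override, whereas `is` compares identity and cannot be fooled. The hypothetical `AlwaysEqual` class is contrived, but it shows the difference:

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with anything."""
    def __eq__(self, other):
        return True

weird = AlwaysEqual()
print(weird == None)  # True: == defers to the overridden __eq__
print(weird is None)  # False: `is` checks identity
```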


### And operator
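Returns its second value when the first is truthy; otherwise returns the first value. (Falsy values include False, None, 0, and empty containers such as [] or ""; everything else is truthy.)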

In [26]:
my_list = []
first_item = my_list and my_list[0]
print(first_item)

my_list.append(1)
first_item = my_list and my_list[0]
print(first_item)

[]
1
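The example above works because `and` short-circuits: when the first operand is falsy, the second is never evaluated, so `my_list[0]` cannot raise `IndexError` on an empty list. A hypothetical `loud` helper makes the evaluation visible:

```python
def loud(value):
    """Hypothetical helper that reports when it gets evaluated."""
    print("evaluating", value)
    return value

print(0 and loud(1))  # prints 0 only; loud() is never called
print(2 and loud(3))  # prints "evaluating 3", then 3

my_list = []
first_item = my_list and my_list[0]  # my_list[0] is skipped, no IndexError
print(first_item)                    # []
```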


### Or operator
Returns its first value when it's truthy; otherwise returns the second value.

In [27]:
x = None

# Similar to "x ?? 0" in C#, but "or" also replaces other falsy values such as 0 or ""
safe_x = x or 0
print(safe_x)

x = 1
safe_x = x or 0
print(safe_x)

0
1
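The comparison with C#'s `??` is only approximate: `or` treats every falsy value as missing, so a legitimate 0 gets replaced too. When only None should trigger the default, a conditional expression is the safer sketch:

```python
count = 0

# `or` treats every falsy value as missing, so a legitimate 0 is replaced
with_or = count or 5
# Closer to C#'s `count ?? 5`: only a None value is replaced
with_none_check = count if count is not None else 5
print(with_or, with_none_check)  # 5 0

count = None
fallback = count if count is not None else 5
print(fallback)                  # 5
```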


Useful boolean functions:
- all
- any

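They take any iterable (such as a list) as a parameter: all returns True when every element is truthy, and any returns True when at least one element is.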

In [29]:
print(all([True, 1, {3}]))
print(all([True, 1, {}]))
print(any([True, 1, {}]))
print(all([]))
print(any([]))

print(any([
    False,
    None,
    [],
    {},
    "",
    set(),
    0,
    0.0
]))

True
False
True
True
False
False
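Because they accept any iterable, `all` and `any` pair nicely with generator expressions (covered later in the chapter) for checking a condition across a collection without building a list first:

```python
numbers = [3, 7, 0, 12]

# Is every number positive? No, because of the 0
print(all(n > 0 for n in numbers))   # False
# Is any number greater than 10? Yes, 12
print(any(n > 10 for n in numbers))  # True

# Strings are iterables too: every character in "abc" is truthy
print(all("abc"), any(""))           # True False
```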
