In my opinion, it is not necessary to become proficient at building good software in
Python to be able to productively do data analysis. I encourage you to use the IPython shell and Jupyter notebooks to experiment with the code examples and to
explore the documentation for the various types, functions, and methods.
# IPython Basics
## Running the IPython Shell
$ ipython

## Running the Jupyter Notebook
One of the major components of the Jupyter project is the notebook, a type of interactive
document for code, text (with or without markup), data visualizations, and other
output.

## Introspection

Using a question mark (?) before or after a variable will display some general information about the object.

In [5]:
b = [1, 2, 3]

Using ?? will also show the function’s source code if possible.

? has a final usage, which is for searching the IPython namespace in a manner similar
to the standard Unix or Windows command line. A number of characters combined
with the wildcard (*) will show all names matching the wildcard expression.

## About Magic Commands
Some magic functions behave like Python functions and their output can be assigned
to a variable:

In [10]:
%pwd

'D:\\DS_Works\\DWWP'

In [11]:
foo = %pwd
foo

'D:\\DS_Works\\DWWP'

In [12]:
# %quickref

## Matplotlib Integration
In Jupyter, the command is

In [13]:
%matplotlib inline

# Python Language Basics
## Language Semantics
The Python language design is distinguished by its emphasis on readability, simplicity, and explicitness. Some people go so far as to liken it to “executable pseudocode.”
### Indentation, not braces
Python uses whitespace (tabs or spaces) to structure code instead of using braces as in
many other languages like R, C++, Java, and Perl.

A colon denotes the start of an indented code block after which all of the code must
be indented by the same amount until the end of the block.

By and large, four spaces is
the standard adopted by the vast majority of Python programmers,
so I recommend doing that in the absence of a compelling reason
otherwise.
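
As a minimal sketch of these rules, the function below (a hypothetical `partition` helper, not something from the text above) uses a colon to open each block and four spaces per indentation level:

```python
# Hypothetical helper: split values into those below a pivot and the rest.
def partition(values, pivot):
    less = []
    greater = []
    for x in values:          # the colon opens the loop body
        if x < pivot:         # a nested block is indented four more spaces
            less.append(x)
        else:
            greater.append(x)
    return less, greater      # back at the function-body level

less, greater = partition([5, 1, 8, 3, 9], 4)
print(less, greater)
```

Dedenting back to an outer level is what closes a block; there is no closing brace or keyword.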

### Everything is an object
An important characteristic of the Python language is the consistency of its object
model. Every number, string, data structure, function, class, module, and so on exists
in the Python interpreter in its own “box,” which is referred to as a Python object.
Each object has an associated type (e.g., string or function) and internal data. In practice
this makes the language very flexible, as even functions can be treated like any
other object.
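
To illustrate what treating functions like any other object can look like, here is a small sketch. The names `double`, `alias`, `operations`, and `apply_to_all` are made up for this example:

```python
def double(x):
    return x * 2

# A function can be bound to another name...
alias = double

# ...stored in a data structure...
operations = {'double': double, 'length': len}

# ...or passed to another function as an argument.
def apply_to_all(func, values):
    return [func(v) for v in values]

doubled = apply_to_all(alias, [1, 2, 3])
lengths = apply_to_all(operations['length'], ['a', 'bb', 'ccc'])
print(type(double), doubled, lengths)
```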

### Comments
Any text preceded by the hash mark (pound sign) # is ignored by the Python interpreter.
This is often used to add comments to code. At times you may also want to
exclude certain blocks of code without deleting them. An easy solution is to comment
out the code.

### Function and object method calls
You call functions using parentheses and passing zero or more arguments, optionally
assigning the returned value to a variable. Almost every object in Python has attached functions, known as methods, that have
access to the object’s internal contents. 
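
A short sketch of both kinds of call, using only built-in functions and methods (the variable names are arbitrary):

```python
values = [3, 1, 2]

# Function call: pass one argument and keep the returned value.
largest = max(values)

# Method call with zero arguments: sort() reorders the list in place.
values.sort()

# Method call with one argument: split() returns a new list of strings.
text = 'a,b,c'
parts = text.split(',')

print(largest, values, parts)
```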

### Variables and argument passing
When assigning a variable (or name) in Python, you are creating a reference to the
object on the righthand side of the equals sign. In practical terms, consider a list of
integers:

In [16]:
a = [1, 2, 3]

Suppose we assign a to a new variable b:

In [17]:
b = a

In some languages, this assignment would cause the data [1, 2, 3] to be copied. In
Python, a and b actually now refer to the same object, the original list [1, 2, 3]. You can prove this to yourself by appending an element to
a and then examining b:

In [18]:
a.append(4)
b

[1, 2, 3, 4]

**Assignment is also referred to as binding, as we are binding a name
to an object.**

**When you pass objects as arguments to a function, new local variables are created referencing the original objects without any copying.**

In [31]:
def append_element(some_list):
    some_list.append(4)
    print(some_list)
    
lst = [1, 2, 3]
append_element(lst)
print(lst)

[1, 2, 3, 4]
[1, 2, 3, 4]


**If you bind a new object to a variable inside a function, that change will not be reflected in the parent scope.**

In [32]:
def append_element(some_list):
    some_list = list('1234567')
    print(some_list)
    
lst = [1, 2, 3]
append_element(lst)
print(lst)

['1', '2', '3', '4', '5', '6', '7']
[1, 2, 3]


### Dynamic references, strong types
In contrast with many compiled languages, such as Java and C++, object references in
Python have no type associated with them. There is no problem with the following:

In [33]:
a = 5
type(a)

int

In [34]:
a = 'foo'
type(a)

str

Variables are names for objects within a particular namespace; the type information is
stored in the object itself.

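Python is considered a strongly typed language, which means that every object
has a specific type (or class), and implicit conversions occur only in certain obvious
circumstances. Adding an integer to a string raises an error rather than converting either operand, whereas dividing a float by an integer converts the integer implicitly, as the following examples show: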

In [35]:
5 + '5'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [36]:
a = 4.5
b = 2
print('a is {}, b is {}'.format(type(a), type(b)))

a is <class 'float'>, b is <class 'int'>


In [37]:
a / b

2.25

You can check that an object is an instance of a particular type using the isinstance function:

In [38]:
a = 5
isinstance(a, int)

True

isinstance can accept a tuple of types if you want to check that an object’s type is
among those present in the tuple:

In [39]:
a = 5; b = 4.5
isinstance(a, (int, float))

True

In [40]:
isinstance(b, (int, float))

True

### Attributes and methods
Objects in Python typically have both attributes (other Python objects stored “inside”
the object) and methods (functions associated with an object that can have access to
the object’s internal data). Both of them are accessed via the syntax
obj.attribute_name.

Attributes and methods can also be accessed by name via the getattr function:

In [45]:
a = 'Stephen'
getattr(a, 'lower')

<function str.lower()>

### Duck typing
Often you may not care about the type of an object but rather only whether it has
certain methods or behavior. This is sometimes called “duck typing,” after the saying
“If it walks like a duck and quacks like a duck, then it’s a duck.” 

In [46]:
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:
        return False

In [47]:
isiterable('a string')

True

In [49]:
isiterable([1, 2, 3])

True

In [50]:
isiterable(5)

False

In [54]:
x = '5789'
if not isinstance(x, list) and isiterable(x):
    x = list(x)

A place where I use this functionality all the time is to write functions that can accept
multiple kinds of input.
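
One way this might look in practice is a helper that normalizes its input to a list. The function name `ensure_list` is an assumption for illustration; `isiterable` is repeated here so the block runs on its own:

```python
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:
        return False

def ensure_list(obj):
    """Return obj as a list, wrapping non-iterable values in a new list."""
    if isinstance(obj, list):
        return obj                 # already a list: hand it back unchanged
    if isiterable(obj):
        return list(obj)           # tuples, strings, ranges, generators, ...
    return [obj]                   # a single scalar value

from_tuple = ensure_list((1, 2))
from_scalar = ensure_list(5)
from_string = ensure_list('ab')
print(from_tuple, from_scalar, from_string)
```

Note that a string counts as iterable here, so it is split into characters; whether that is desirable depends on the caller.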

### Imports
In Python a module is simply a file with the .py extension containing Python code.

By using the as keyword you can give imports different variable names.
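
A brief sketch using the standard library's math module; the aliases `m`, `PI`, and `square_root` are arbitrary names chosen for this example:

```python
import math as m                                   # bind the module to a shorter name
from math import pi as PI, sqrt as square_root     # rename individual objects

circle_area = PI * 2 ** 2
hypotenuse = square_root(3 ** 2 + 4 ** 2)

# Both names refer to the same function object.
same_function = m.sqrt is square_root
print(circle_area, hypotenuse, same_function)
```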

### Binary operators and comparisons
Most of the binary math operations and comparisons are as you might expect:

In [57]:
5 - 7

-2

In [59]:
10 + 50.5

60.5

In [60]:
5 <= 2

False

To check if two **references refer to the same object**, use the is keyword. is not is also
perfectly valid if you want to check that two objects are not the same:

In [61]:
a = [1, 2, 3]
b = a
c = list(a)

In [67]:
a is b

True

In [68]:
a is not c

True

In [69]:
id(a), id(b), id(c)

(3037267060800, 3037267060800, 3037270754368)

Since list always creates a new Python list (i.e., a copy), we can be sure that c is distinct
from a. Comparing with is is not the same as the == operator, because in this
case we have:

In [70]:
a == c

True

A very common use of is and is not **is to check if a variable is None**, since there is
only one instance of None:

In [71]:
a = None
a is None

True

|Operation|Description|
|---|---|
a + b|Add a and b
a - b|Subtract b from a
a * b|Multiply a by b
a / b|Divide a by b
a // b|Floor-divide a by b, dropping any fractional remainder
a ** b|Raise a to the b power
a & b|True if both a and b are True; for integers, take the bitwise AND
a \| b|True if either a or b is True; for integers, take the bitwise OR
**a ^ b**|**For booleans, True if a or b is True, but not both; for integers, take the bitwise EXCLUSIVE-OR**
a == b|True if a equals b
a != b|True if a is not equal to b
a < b, a <= b|True if a is less than (less than or equal to) b
a > b, a >= b|True if a is greater than (greater than or equal to) b
a is b|True if a and b reference the same Python object
a is not b|True if a and b reference different Python objects

In [73]:
0 ^ 0

0

In [83]:
8 ^ 6

14

In [86]:
bin(8), bin(5)

('0b1000', '0b101')

In [87]:
0b1101

13

### Mutable and immutable objects
Most objects in Python, such as lists, dicts, NumPy arrays, and most user-defined
types (classes), are mutable. This means that the object or values that they contain can
be modified:

In [88]:
a_list = ['foo', 2, [4, 5]]
a_list[2] = (3, 4)
a_list

['foo', 2, (3, 4)]

Others, like strings and tuples, are immutable:

In [89]:
a_tuple = (3, 5, (4, 5))
a_tuple[1] = 'four'

TypeError: 'tuple' object does not support item assignment

Remember that just because you can mutate an object does not mean that you always should. Such actions are known as side effects. If possible, I recommend trying to avoid side effects and favor immutability, even though there may be mutable objects involved.
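
As a rough sketch of the difference, the two hypothetical functions below add an item to a list. The first mutates its argument (a side effect); the second returns a new list and leaves the input alone:

```python
def add_item_in_place(items, item):
    items.append(item)        # side effect: the caller's list changes
    return items

def add_item_copy(items, item):
    return items + [item]     # builds a new list; the input is untouched

original = [1, 2, 3]

result = add_item_copy(original, 4)
print(original, result)       # original still has three elements here

mutated = add_item_in_place(original, 4)
print(original, mutated is original)
```

The copying version costs extra memory, so this is a trade-off rather than a rule.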

## Scalar Types
Python along with its standard library has a small set of built-in types for handling
numerical data, strings, boolean (True or False) values, and dates and times. These
“single value” types are sometimes called scalar types and we refer to them in this
book as scalars.

|Type|Description|
|---|---|
None|The Python “null” value (only one instance of the None object exists)
str|String type; holds Unicode strings
bytes|Raw binary data (for example, ASCII bytes or Unicode text encoded as bytes)
float|Double-precision (64-bit) floating-point number (note there is no separate double type)
bool|A True or False value
int|Arbitrary precision signed integer

### Numeric types
The primary Python types for numbers are int and float. An int can store arbitrarily large numbers:

In [95]:
ival = 1237513627
ival ** 7

4444776514818367223522115848560207047841971416909503010826526003

Floating-point numbers are represented with the Python float type. Under the hood each one is a double-precision (64-bit) value. They can also be expressed with scientific notation:

In [98]:
fval = 7.243
fval

7.243

In [97]:
fval2 = 6.78e-5
fval2

6.78e-05

Integer division not resulting in a whole number will always yield a floating-point
number:

In [1]:
3 / 2

1.5

To get C-style integer division (which drops the fractional part if the result is not a
whole number), use the floor division operator //:

In [2]:
3 // 2

1
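
One caveat worth a quick check: // floors toward negative infinity, so it matches C-style truncation only when the result is non-negative. The sketch below compares the two using math.trunc from the standard library:

```python
import math

positive = 7 // 2                  # floor and truncation agree here
negative = -7 // 2                 # floored toward negative infinity
truncated = math.trunc(-7 / 2)     # truncated toward zero, as C does
also_truncated = int(-7 / 2)       # int() on a float truncates as well

print(positive, negative, truncated, also_truncated)
```

Going through a float with / can lose precision for very large integers, so the truncating forms are shown only for comparison.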

### Strings
You can write string literals using either single quotes ' or double quotes ":

In [3]:
a = 'one way of writing a string'
b = "another way"

For multiline strings with line breaks, you can use triple quotes, either ''' or """:

In [4]:
c = """
This is a longer string that 
spans multiple lines
"""

The line breaks after the opening """ and at the end of each line are included in the string. We can count the newline characters with the count method on c:

In [5]:
c.count('\n')

3

Python strings are immutable; you cannot modify a string:

In [6]:
a = 'this is a string'
a[10] = 'f'

TypeError: 'str' object does not support item assignment

In [8]:
b = a.replace('string', 'longer string')
b, a

('this is a longer string', 'this is a string')

Many Python objects can be converted to a string using the str function:

In [9]:
a = 5.6
s = str(a)
print(s)

5.6


Strings are a sequence of Unicode characters and therefore can be treated like other
sequences, such as lists and tuples:

In [10]:
s = 'Python'
list(s)

['P', 'y', 't', 'h', 'o', 'n']

In [11]:
s[:3]

'Pyt'

The backslash character \ is an escape character, meaning that it is used to specify
special characters like newline \n or Unicode characters. To write a string literal with
backslashes, you need to escape them:

In [12]:
s = '12\\34'
print(s)

12\34


You can preface the leading quote of the string with r,
which means that the characters should be interpreted as is. The r stands for raw.

In [16]:
s = r'this\has\no\special\characters'
s

'this\\has\\no\\special\\characters'

In [17]:
print(s)

this\has\no\special\characters


Adding two strings together concatenates them and produces a new string:

In [19]:
a = 'This is the first half '
b = 'and this is the second half'
a + b

'This is the first half and this is the second half'

String templating or formatting is another important topic. Here I will briefly describe the
mechanics of one of the main interfaces. String objects have a format method that
can be used to substitute formatted arguments into the string, producing a new
string:

In [20]:
template = '{0:.2f} {1:s} are worth US${2:d}'

In this string,
- {0:.2f} means to format the first argument as a floating-point number with two decimal places.
- {1:s} means to format the second argument as a string.
- {2:d} means to format the third argument as an exact integer.

To substitute arguments for these format parameters, we pass a sequence of arguments
to the format method:

In [24]:
template.format(4.5560, 'Argentine Pesos', 2)

'4.56 Argentine Pesos are worth US$2'

String formatting is a deep topic; there are multiple methods and numerous options
and tweaks available to control how values are formatted in the resulting string. To
learn more, I recommend consulting the official Python documentation.
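
One of those other methods is the formatted string literal (f-string), available in Python 3.6 and later. As a minimal sketch, the same format specifications can be written inline; the variable names here are chosen only for illustration:

```python
amount = 4.5560
currency = 'Argentine Pesos'
dollars = 2

# Same format specifications as the format-method template, written inline.
message = f'{amount:.2f} {currency:s} are worth US${dollars:d}'
print(message)
```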

### Bytes and Unicode
In modern Python (i.e., Python 3.0 and up), Unicode has become the first-class string
type to enable more consistent handling of ASCII and non-ASCII text.

In [26]:
val = 'español'
val

'español'

We can convert this Unicode string to its UTF-8 bytes representation using the
encode method:

In [28]:
val_utf8 = val.encode('utf-8')
val_utf8

b'espa\xc3\xb1ol'

Assuming you know the Unicode encoding of a bytes object, you can go back using
the decode method:

In [29]:
val_utf8.decode('utf-8')

'español'

While it’s become preferred to use UTF-8 for any encoding, for historical reasons you
may encounter data in any number of different encodings:

In [30]:
val.encode('latin1')

b'espa\xf1ol'

In [31]:
val.encode('utf-16')

b'\xff\xfee\x00s\x00p\x00a\x00\xf1\x00o\x00l\x00'

In [33]:
val.encode('utf-16le')

b'e\x00s\x00p\x00a\x00\xf1\x00o\x00l\x00'

It is most common to encounter bytes objects in the context of working with files,
where implicitly decoding all data to Unicode strings may not be desired.

Though you may seldom need to do so, you can define your own byte literals by prefixing
a string with b:

In [34]:
bytes_val = b'this is bytes'
bytes_val

b'this is bytes'

In [36]:
decoded = bytes_val.decode('utf8')
decoded

'this is bytes'

### Booleans
The two boolean values in Python are written as True and False. Comparisons and
other conditional expressions evaluate to either True or False. Boolean values are
combined with the and and or keywords:

In [37]:
True and True

True

In [38]:
False or True

True

### Type casting
The str, bool, int, and float types are also functions that can be used to cast values
to those types:

In [41]:
s = '3.14159'
fval = float(s)
type(fval)

float

In [42]:
int(fval)

3

In [43]:
bool(fval)

True

In [44]:
bool(0)

False

### None
None is the Python null value type. If a function does not explicitly return a value, it
implicitly returns None:

In [45]:
a = None
a is None

True

In [47]:
b = 5
b is not None

True
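
To see the implicit return in action, here is a small sketch with two hypothetical functions, neither of which always reaches a return statement:

```python
def log_message(msg):
    print(msg)                    # no return statement at all

def find_first_negative(values):
    for v in values:
        if v < 0:
            return v
    # Falling off the end of the function returns None implicitly.

printed = log_message('hello')
missing = find_first_negative([1, 2, 3])
found = find_first_negative([1, -2, 3])
print(printed, missing, found)
```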

None is also a common default value for function arguments:

In [48]:
def add_and_maybe_multiply(a, b, c=None):
    result = a + b
    
    if c is not None:
        result = result * c
    
    return result

While a technical point, it’s worth bearing in mind that None is not only a reserved
keyword but also a unique instance of NoneType:

In [49]:
type(None)

NoneType

### Dates and times
The built-in Python datetime module provides datetime, date, and time types.

In [51]:
from datetime import datetime, date, time
dt = datetime(2022, 10, 5, 18, 53, 55)
dt.day, dt.minute

(5, 53)

Given a datetime instance, you can extract the equivalent date and time objects by
calling methods on the datetime of the same name:

In [52]:
dt.date()

datetime.date(2022, 10, 5)

In [53]:
dt.time()

datetime.time(18, 53, 55)

The strftime method formats a datetime as a string:

In [54]:
dt.strftime('%m/%d/%Y %H:%M')

'10/05/2022 18:53'

Strings can be converted (parsed) into datetime objects with the strptime function:

In [56]:
datetime.strptime('20221005', '%Y%m%d')

datetime.datetime(2022, 10, 5, 0, 0)

When you are aggregating or otherwise grouping time series data, it will occasionally
be useful to replace time fields of a series of datetimes, for example replacing the minute and second fields with zero:

In [57]:
dt.replace(minute=0, second=0)

datetime.datetime(2022, 10, 5, 18, 0)

Since datetime.datetime is an immutable type, methods like these always produce
new objects.

The difference of two datetime objects produces a datetime.timedelta type:

In [59]:
dt2 = datetime(2022, 10, 10, 12, 12, 12)
delta = dt2 - dt
delta

datetime.timedelta(days=4, seconds=62297)

In [60]:
type(delta)

datetime.timedelta

Adding a timedelta to a datetime produces a new shifted datetime:

In [62]:
dt, dt + delta

(datetime.datetime(2022, 10, 5, 18, 53, 55),
 datetime.datetime(2022, 10, 10, 12, 12, 12))

Format code|Description
---|---
%Y|Four-digit year
%y|Two-digit year
%m|Two-digit month [01, 12]
%d|Two-digit day [01, 31]
%H|Hour (24-hour clock) [00, 23]
%I|Hour (12-hour clock) [01, 12]
%M|Two-digit minute [00, 59]
%S|Second [00, 61] (seconds 60, 61 account for leap seconds)
%w|Weekday as integer [0 (Sunday), 6]
%U|Week number of the year [00, 53]; Sunday is considered the first day of the week, and days before the first Sunday of the year are “week 0”
%W|Week number of the year [00, 53]; Monday is considered the first day of the week, and days before the first Monday of the year are “week 0”
%z|UTC time zone offset as +HHMM or -HHMM; empty if time zone naive
%F|Shortcut for %Y-%m-%d (e.g., 2012-04-18)
%D|Shortcut for %m/%d/%y (e.g., 04/18/12)

## Control Flow
Python has several built-in keywords for conditional logic, loops, and other standard
control flow concepts found in other programming languages.

### if, elif, and else
The if statement is one of the most well-known control flow statement types. It
checks a condition that, if True, evaluates the code in the block that follows:

In [67]:
x = 3
if x < 0:
    print("It's negative")

An if statement can be optionally followed by one or more elif blocks and a catchall else block if all of the conditions are False:

In [69]:
if x < 0:
    print('It\'s negative')
elif x == 0:
    print('Equal to zero')
elif 0 < x < 5:
    print('Positive but smaller than 5')
else:
    print('Positive and larger than or equal to 5')

Positive but smaller than 5


With a compound condition using and or or, conditions are evaluated left to right and will
short-circuit:

In [70]:
a = 5; b = 7
c = 8; d = 4

In [71]:
if a < b or c > d:
    print('Made it')

Made it
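
A way to observe short-circuiting directly is to record which operands actually run. The `check` helper and the `calls` list below are assumptions made up for this sketch:

```python
calls = []

def check(label, result):
    calls.append(label)           # record that this operand was evaluated
    return result

# or stops at the first truthy operand, so 'right' is never evaluated.
either = check('left', True) or check('right', True)
calls_for_or = list(calls)

calls.clear()

# and stops at the first falsy operand, so 'right' is skipped again.
both = check('left', False) and check('right', True)
calls_for_and = list(calls)

print(either, calls_for_or, both, calls_for_and)
```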


It is also possible to chain comparisons:

In [75]:
4 > 3 > 6 > 1, 4 > 3 > 2 > 1

(False, True)

### for loops
for loops are for iterating over a collection (like a list or tuple) or an iterator.

You can advance a for loop to the next iteration, skipping the remainder of the block,
using the continue keyword.

In [76]:
sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
    if value is None:
        continue
    total += value

A for loop can be exited altogether with the break keyword. 

In [77]:
sequence = [1, 2, 3, 4, 7, 6, 5, 0, 9, 1]
total_until_5 = 0
for value in sequence:
    if value == 5:
        break
    total_until_5 += value

The break keyword only terminates the innermost for loop; any outer for loops will
continue to run:

In [79]:
for i in range(4):
    for j in range(4):
        if j > i:
            break
        print((i, j), end=',')

(0, 0),(1, 0),(1, 1),(2, 0),(2, 1),(2, 2),(3, 0),(3, 1),(3, 2),(3, 3),

As we will see in more detail, if the elements in the collection or iterator are sequences
(tuples or lists, say), they can be conveniently unpacked into variables in the for
loop statement.
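
A minimal sketch of unpacking in the loop header, with made-up data:

```python
pairs = [('a', 1), ('b', 2), ('c', 3)]

totals = {}
for key, value in pairs:              # each tuple is unpacked into two names
    totals[key] = value * 10

# Nested sequences can be unpacked with matching parentheses.
nested = [(1, (2, 3)), (4, (5, 6))]
sums = []
for first, (second, third) in nested:
    sums.append(first + second + third)

print(totals, sums)
```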

### while loops
A while loop specifies a condition and a block of code that is to be executed until the
condition evaluates to False or the loop is explicitly ended with break:

In [82]:
x = 256
total = 0
while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2

### pass
pass is the “no-op” statement in Python. It can be used in blocks where no action is to
be taken (or as a placeholder for code not yet implemented); it is only required
because Python uses whitespace to delimit blocks:

In [83]:
if x < 0:
    print('negative!')
elif x == 0:
    pass
else:
    print('positive!')

positive!


### range
The range function returns a lazy range object that produces a sequence of evenly spaced
integers on demand:

In [84]:
range(10)

range(0, 10)

In [85]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

A start, end, and step (which may be negative) can be given:

In [87]:
list(range(0, 20, 2))

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [86]:
list(range(5, 0, -1))

[5, 4, 3, 2, 1]

As you can see, range produces integers up to but not including the endpoint. A
common use of range is for iterating through sequences by index:

In [90]:
seq = [1, 2, 3, 4]
for i in range(len(seq)):
    val = seq[i]

While you can use functions like list to store all the integers generated by range in
some other data structure, often the default range form will be what you want.

In [91]:
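# Sum all numbers from 0 to 99,999 that are multiples of 3 or 5
total = 0    # named total so the built-in sum function is not shadowed
for i in range(100000):
    if i % 3 == 0 or i % 5 == 0:
        total += i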

While the range generated can be arbitrarily large, the memory use at any given time
may be very small.
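
A quick way to check this claim is sys.getsizeof from the standard library. The exact number of bytes is a CPython implementation detail, so treat the printed size as indicative only:

```python
import sys

big = range(10**9)                 # a billion integers, none stored up front
size_in_bytes = sys.getsizeof(big)

# Length, indexing, and membership are computed from start, stop, and step.
length = len(big)
last = big[-1]
contains = 999_999_999 in big

print(size_in_bytes, length, last, contains)
```

Calling list(big) instead would try to materialize every integer at once.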

### Ternary expressions
A ternary expression in Python allows you to combine an if-else block that produces
a value into a single line or expression. The syntax for this in Python is:

```python
value = true_expr if condition else false_expr
```

Here, true_expr and false_expr can be any Python expressions.

In [92]:
x = 5
'Non-negative' if x >= 0 else 'Negative'

'Non-negative'

As with if-else blocks, only one of the expressions will be executed. Thus, the “if”
and “else” sides of the ternary expression could contain costly computations, but only
the branch selected by the condition is ever evaluated.
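
This can be verified with a small sketch in which a hypothetical `costly` function stands in for an expensive computation and records each call:

```python
evaluated = []

def costly(label):
    evaluated.append(label)       # stand-in for an expensive computation
    return label

x = 5
result = costly('non-negative') if x >= 0 else costly('negative')

# Only the selected side was called; the other never ran.
print(result, evaluated)
```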

While it may be tempting to always use ternary expressions to condense your code,
realize that you may sacrifice readability if the condition as well as the true and false
expressions are very complex.