This notebook contains notes on Python basics for data science, with an emphasis on differences from R. These notes are primarily based upon "A Whirlwind Tour of Python" (VanderPlas 2016) and "Python Data Science Handbook" (VanderPlas 2017).

# Semantics
Python is object-oriented, while R is largely functional (S3 classes being an exception). Every Python object has *attributes* (object metadata) and *methods* (functionality associated with the type of object). Calling methods is the most common way of interacting with data in Python, whereas in R object methods are generally used under the hood.

Use dot syntax to access an object's methods (in this case, `append` for a `list` object):

In [1]:
# Create a list L
L = [1, 2, 3]
L.append(100)
print(L)

[1, 2, 3, 100]
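
As a small illustration of the difference between attributes and methods described above, the sketch below uses a float. `real`, `imag`, and `is_integer()` are built-in members of Python floats; the variable names are only illustrative.

```python
x = 4.5

# Attributes are read with dot syntax and no parentheses
real_part = x.real
imag_part = x.imag

# Methods are called with dot syntax and parentheses
whole = x.is_integer()  # 4.5 has a fractional part

print(real_part, imag_part, whole)  # 4.5 0.0 False
```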


The **assignment operator** is `=` in Python; R's `<-` is not used.

In [2]:
a = 24
print(a)

24


**Arithmetic** (e.g. `+`) and **comparison** (e.g. `>`) **operators** are largely the same as those in R. Check the documentation for the differences, notably floor division (`//`), modulus (`%`), and exponentiation (`**`).
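
A short sketch of the three operators that differ most visibly from R (floor division, modulus, and exponentiation). The variable names are illustrative, and the R equivalents in the comments are given for comparison only.

```python
floor_div = 7 // 2   # floor division (R: 7 %/% 2)
remainder = 7 % 2    # modulus (R: 7 %% 2)
power = 2 ** 3       # exponentiation (R: 2 ^ 3)
true_div = 7 / 2     # ordinary division returns a float in Python 3

print(floor_div, remainder, power, true_div)  # 3 1 8 3.5
```

Note that `^` is a valid Python operator, but it means bitwise XOR rather than exponentiation.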

To combine assignment with an arithmetic operator:

In [3]:
a += 2  # Equivalent to a = a + 2
print(a)

26


**Boolean operators** are `and`, `or`, and `not`.

In [4]:
x = 4
(x < 6) and (x > 2)

True

In [5]:
(x > 10) or (x % 2 == 0)

True

In [6]:
not (x < 6)

False

**Identity operators** are `is` and `is not` (are two objects the same object?). **Membership operators** are `in` and `not in` (is a value contained in a collection?). Identity is different from equality: two variables can refer to distinct objects that hold equal values, so `==` can be `True` while `is` is `False`.

In [8]:
a = [1, 2, 3]
b = [1, 2, 3]
a == b

True

In [9]:
a is b

False

In [10]:
a is not b

True

In [11]:
1 in [1, 2, 3]

True

In [12]:
2 not in [1, 2, 3]

False
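
One practical consequence of the identity/equality distinction is aliasing. The sketch below (with an illustrative third name `c`) shows that assigning an existing list to a new name does not copy it, so a change made through one name is visible through the other.

```python
a = [1, 2, 3]
b = [1, 2, 3]  # a separate object with equal contents
c = a          # a second name for the same object as a

c.append(4)    # modifies the one list that both a and c refer to

print(a)               # [1, 2, 3, 4]
print(b)               # [1, 2, 3]
print(c is a, b is a)  # True False
```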

**Strings** are relatively straightforward in Python, defined by characters enclosed by single or double quotes.

In [13]:
message = "Hello"
response = "Hey there"

To get the number of characters in a string:

In [14]:
len(response)

9

To concatenate strings:

In [15]:
message + response

'HelloHey there'

# Base Data Structures
Data structures are "compound types" that hold values of simpler data types such as integers. They are comparable to vectors, matrices, etc. in R.

## Lists
**Lists** are *ordered* and *mutable* collections of data, defined by brackets (`[]`).

In [16]:
L = [2,3,5,7]

To concatenate a list with another list:

In [17]:
L + [13,17,19]

[2, 3, 5, 7, 13, 17, 19]

To sort a list in place by its values:

In [19]:
L.sort()
L

[2, 3, 5, 7]
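
Because the list above was already in order, the effect of sorting is easier to see on an unsorted list. The sketch below (the names `primes`, `ordered_copy`, and `result` are illustrative) also contrasts the `sort()` method, which changes the list in place, with the built-in `sorted()` function, which returns a new list.

```python
primes = [7, 2, 5, 3]

ordered_copy = sorted(primes)  # returns a new sorted list; primes is unchanged
print(ordered_copy)            # [2, 3, 5, 7]
print(primes)                  # [7, 2, 5, 3]

result = primes.sort()         # sorts in place and returns None
print(primes)                  # [2, 3, 5, 7]
print(result)                  # None
```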

As in R, lists can contain objects of any type, including other lists.

In [20]:
L = [1, "two", 3.14, [0,3,5]]

*List indexing* (accessing single elements) and *slicing* (accessing multiple elements) are done by placing brackets containing the index or indices directly after the list object.

Python indexing starts at 0, while R indexing starts at 1. You can also index from the end in Python: the last item is -1, and the index decreases as you approach the beginning of the list.

To access a single value via an index, place one number in the brackets.

In [23]:
L = [2,3,5,7,9]

In [24]:
L[0]

2

In [25]:
L[-5]

2

To access multiple values via slicing, define a starting point (inclusive) and an ending point (non-inclusive) separated by a colon in the brackets.

In [26]:
L[0:3]

[2, 3, 5]

When omitting the first index, 0 is assumed:

In [27]:
L[:3]

[2, 3, 5]

When omitting the last index, the length of the list is assumed:

In [28]:
L[-3:]

[5, 7, 9]

You can also set a step size by adding a third value after a second colon (`start:stop:step`). Here the start and stop are omitted:

In [29]:
L[::2]

[2, 5, 9]
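
A brief sketch of the full `start:stop:step` form, using the same list values as above. The variable names are illustrative.

```python
L = [2, 3, 5, 7, 9]

every_other = L[::2]     # whole list, every second item
middle = L[1:4:2]        # indices 1 and 3
reversed_copy = L[::-1]  # a negative step walks the list backwards

print(every_other)    # [2, 5, 9]
print(middle)         # [3, 7]
print(reversed_copy)  # [9, 7, 5, 3, 2]
```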

Indexing and slicing can also be used to assign or replace values, just as in R.

In [30]:
L[0] = 100
print(L)

[100, 3, 5, 7, 9]


In [31]:
L[1:3] = [55,56]
print(L)

[100, 55, 56, 7, 9]


## Tuples
**Tuples** are *ordered* and *immutable* collections of data, defined by comma-separated values enclosed in parentheses (`()`); the parentheses can be omitted.

In [33]:
t = (1,2,3)
print(t)

(1, 2, 3)


Tuples can be sliced or indexed just as lists are. The distinguishing feature of tuples is that they **cannot be modified** after creation: item assignment, appending, and other in-place changes are not available.
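
A minimal sketch of both points: reading from a tuple works as it does for a list, while attempting to assign to an element raises a `TypeError`. The names `first`, `head`, and `modified` are illustrative.

```python
t = 1, 2, 3  # parentheses are optional when defining a tuple

first = t[0]  # indexing works as for lists
head = t[:2]  # slicing returns a new tuple

try:
    t[0] = 100  # item assignment is not allowed on tuples
    modified = True
except TypeError as err:
    modified = False
    print("TypeError:", err)

print(first, head, t)  # 1 (1, 2) (1, 2, 3)
```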