Data Types and Data Structures
==============================

What type is it?
----------------

Python knows different data types. To find the type of a variable, use the `type()` function:

In [1]:
a = 45
type(a)

int

In [2]:
b = 'This is a string'
type(b)

str

In [3]:
c = 2 + 1j
type(c)

complex

In [4]:
d = [1, 3, 56]
type(d)

list

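Numbers
-------

##### Further information

- Informal introduction to numbers. [Python tutorial, section 3.1.1](http://docs.python.org/tutorial/introduction.html#using-python-as-a-calculator)

- Python Library Reference: formal overview of numeric types, <http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex>

- Think Python, [Sec 2.1](http://www.greenteapress.com/thinkpython/html/book003.html)

The built-in numerical types are integers and floating point numbers (see [floating point numbers](#Floating-Point-numbers)) and complex floating point numbers ([complex numbers](#Complex-numbers)).

### Integers

We have seen the use of integer numbers already in [Chapter 2](02-powerful-calculator.ipynb). Be aware of integer division problems ([Integer division](02-powerful-calculator.ipynb#Integer-division)).

If we need to convert a string containing an integer number to an integer, we can use the `int()` function: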

In [5]:
a = '34'       # a is a string containing the characters 3 and 4
x = int(a)     # x is an integer number

The function `int()` will also convert floating point numbers to integers:

In [6]:
int(7.0)

7

In [7]:
int(7.9)

7

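Note that `int` will truncate any non-integer part of a floating point number. To *round* a floating point number to an integer, use the `round()` function: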

In [8]:
round(7.9)

8
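
The difference between truncating and rounding becomes clearer with negative numbers and exact halves. The following is a minimal sketch, not part of the original text: the sample values and the comparison with `math.floor()` are additions chosen for illustration. Note that Python 3 rounds exact halves to the nearest even integer.

```python
import math

values = [7.9, -7.9, 2.5, 3.5]

for v in values:
    # int() drops the fractional part (truncates towards zero),
    # math.floor() always rounds down, round() goes to the nearest integer
    print(v, int(v), math.floor(v), round(v))

truncated = [int(v) for v in values]
rounded = [round(v) for v in values]
```

So `int(-7.9)` gives `-7` (not `-8`), while `round(2.5)` gives `2` and `round(3.5)` gives `4`.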

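### Integer limits

Integers in Python 3 are unlimited; Python will automatically assign more memory as needed as the numbers get bigger. This means we can calculate very large numbers with no special steps.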

In [9]:
35**42

70934557307860443711736098025989133248003781773149967193603515625

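In many other programming languages, such as C and FORTRAN, integers have a fixed size, most frequently 4 bytes, which allows $2^{32}$ different values, although types of several different sizes are available. For numbers that fit into these limits, calculations can be faster, but you may need to check that the numbers don't go beyond the limits. Calculating a number beyond the limits is called *integer overflow*, and may produce bizarre results.

Even in Python, we need to be aware of this when we use numpy (see [Chapter 14](14-numpy.ipynb)). Numpy uses integers with a fixed size, because it stores many of them together and needs to calculate with them efficiently. [Numpy data types](http://docs.scipy.org/doc/numpy/user/basics.types.html) include a range of integer types named for their size, so e.g. `int16` is a 16-bit integer, with $2^{16}$ possible values.

Integer types can also be *signed* or *unsigned*. Signed integers allow positive or negative values, unsigned integers only allow non-negative ones. For instance:

* uint16 (unsigned) ranges from 0 to $2^{16}-1$
* int16 (signed) ranges from $-2^{15}$ to $2^{15}-1$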
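
The ranges above, and the wrap-around that typically characterises integer overflow, can be modelled with ordinary Python integers. This is only a sketch: the helper names `int_range` and `wrap_int16` are hypothetical, and the code imitates two's-complement behaviour in pure Python rather than calling numpy, whose exact overflow handling may differ between versions.

```python
def int_range(bits, signed):
    """Return (smallest, largest) value of a fixed-size integer type."""
    if signed:
        return -2 ** (bits - 1), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def wrap_int16(n):
    """Mimic two's-complement wrap-around of a signed 16-bit integer."""
    return (n + 2 ** 15) % 2 ** 16 - 2 ** 15


print(int_range(16, signed=False))   # (0, 65535)
print(int_range(16, signed=True))    # (-32768, 32767)
print(wrap_int16(32767 + 1))         # -32768: one past the maximum wraps around
```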

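### Floating Point numbers

A string containing a floating point number can be converted into a floating point number using the `float()` command: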

In [10]:
a = '35.342'
b = float(a)
b

35.342

In [11]:
type(b)

float

### Complex numbers

Python (like Fortran and Matlab) has built-in complex numbers. Here are some examples of how to use them:

In [12]:
x = 1 + 3j
x

(1+3j)

In [13]:
abs(x)               # computes the absolute value

3.1622776601683795

In [14]:
x.imag

3.0

In [15]:
x.real

1.0

In [16]:
x * x

(-8+6j)

In [17]:
x * x.conjugate()

(10+0j)

In [18]:
3 * x

(3+9j)

Note that if you want to perform more complicated operations (such as taking the square root, etc.) you have to use the `cmath` module (Complex MATHematics):

In [19]:
import cmath
cmath.sqrt(x)

(1.442615274452683+1.0397782600555705j)
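
As a further illustration of the `cmath` module, the sketch below converts the same number to polar form and back. It is an addition to the original text; `cmath.polar()`, `cmath.rect()` and `cmath.isclose()` are standard library functions, and the comparisons use `isclose` because floating point results are rarely exactly equal.

```python
import cmath
import math

x = 1 + 3j

r, phi = cmath.polar(x)          # modulus and phase (angle in radians)
print(r, phi)

# rect() converts back from polar to rectangular form
y = cmath.rect(r, phi)
print(cmath.isclose(x, y))

# the modulus agrees with abs(), and sqrt() squares back to x
print(math.isclose(r, abs(x)), cmath.isclose(cmath.sqrt(x) ** 2, x))
```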

### Functions applicable to all types of numbers

The `abs()` function returns the absolute value of a number (also called modulus):

In [20]:
a = -45.463
abs(a)

45.463

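Note that `abs()` also works for complex numbers (see above).

Sequences
---------

Strings, lists and tuples are *sequences*. They can be *indexed* and *sliced* in the same way.

Tuples and strings are “immutable” (which basically means we cannot change individual elements within the tuple, and we cannot change individual characters within a string) whereas lists are “mutable” (*i.e.* we can change elements in a list).

Sequences share the following operations:

* `a[i]` returns the element of `a` with index `i`
* `a[i:j]` returns elements `i` up to `j-1`
* `len(a)` returns the number of elements in the sequence
* `min(a)` returns the smallest value in the sequence
* `max(a)` returns the largest value in the sequence
* `x in a` returns `True` if `x` is an element of `a`
* `a + b` concatenates `a` and `b`
* `n * a` creates `n` copies of sequence `a`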
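
The operations in the list above behave the same way for all three sequence types. The following sketch applies them to a string, a list and a tuple; the sample data is made up for illustration, and the `for` loop is a concept that is only introduced later in this text.

```python
sequences = ["hello", [3, 1, 2], (3, 1, 2)]   # a string, a list and a tuple

for a in sequences:
    # indexing, slicing, len(), min() and max() work on every sequence type
    print(type(a).__name__, a[0], a[0:2], len(a), min(a), max(a))

print("e" in "hello", 2 in [3, 1, 2], 5 in (3, 1, 2))   # membership test
print("ab" + "cd", [1] + [2], (1,) + (2,))              # concatenation
print(2 * "ab", 2 * [1, 2], 2 * (1, 2))                 # repetition
```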

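### Sequence type 1: String

##### Further information

- Introduction to strings, [Python tutorial 3.1.2](http://docs.python.org/tutorial/introduction.html#strings)

A string is an (immutable) sequence of characters. A string can be defined using single quotes: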

In [21]:
a = 'Hello World'

double quotes:

In [22]:
a = "Hello World"

or triple quotes of either kind:

In [23]:
a = """Hello World"""
a = '''Hello World'''

The type of a string is `str` and the empty string is given by `""`:

In [24]:
a = "Hello World"
type(a)

str

In [25]:
b = ""
type(b)

str

In [26]:
type("Hello World")

str

In [27]:
type("")

str

The number of characters in a string (that is, its *length*) can be obtained using the `len()` function:

In [28]:
a = "Hello Moon"
len(a)

10

In [29]:
a = 'test'
len(a)

4

In [30]:
len('another test')

12

You can combine (“concatenate”) two strings using the `+` operator:

In [31]:
'Hello ' + 'World'

'Hello World'

Strings have a number of useful methods, for example `upper()`, which returns the string in upper case:

In [32]:
a = "This is a test sentence."
a.upper()

'THIS IS A TEST SENTENCE.'

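A list of available string methods can be found in the Python reference documentation. If a Python prompt is available, one should use the `dir` and `help` functions to retrieve this information, *i.e.* `dir()` provides the list of methods, and `help()` can be used to learn about each method.

A particularly useful method is `split()`, which converts a string into a list of strings: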

In [33]:
a = "This is a test sentence."
a.split()

['This', 'is', 'a', 'test', 'sentence.']

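The `split()` method will separate the string where it finds *white space*. White space means any character that is printed as white space, such as one space, several spaces or a tab.

By passing a separator character to the `split()` method, a string can be split into different parts. Suppose, for example, we would like to obtain a list of complete sentences: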

In [34]:
a = "The dog is hungry. The cat is bored. The snake is awake."
a.split(".")

['The dog is hungry', ' The cat is bored', ' The snake is awake', '']

The opposite string method to `split` is `join`, which can be used as follows:

In [35]:
a = "The dog is hungry. The cat is bored. The snake is awake."
s = a.split('.')
s

['The dog is hungry', ' The cat is bored', ' The snake is awake', '']

In [36]:
".".join(s)

'The dog is hungry. The cat is bored. The snake is awake.'

In [37]:
" STOP".join(s)

'The dog is hungry STOP The cat is bored STOP The snake is awake STOP'
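
Splitting at `"."` leaves leading spaces and an empty string at the end, as the output above shows. One possible way to tidy this up is sketched below; it is an addition to the original text and relies on `strip()` and a list comprehension, neither of which has been introduced at this point.

```python
a = "The dog is hungry. The cat is bored. The snake is awake."

# split at full stops, strip surrounding white space, drop empty pieces
sentences = [part.strip() for part in a.split(".") if part.strip()]
print(sentences)

# join the cleaned pieces again, restoring the full stops
rebuilt = ". ".join(sentences) + "."
print(rebuilt == a)
```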

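### Sequence type 2: List

##### Further information

- Introduction to lists, [Python tutorial, section 3.1.4](http://docs.python.org/tutorial/introduction.html#lists)

A list is a sequence of objects. The objects can be of any type, for example integers: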

In [38]:
a = [34, 12, 54]

or strings:

In [39]:
a = ['dog', 'cat', 'mouse']

An empty list is represented by `[]`:

In [40]:
a = []

The type is `list`:

In [41]:
type(a)

list

In [42]:
type([])

list

As with strings, the number of elements in a list can be obtained using the `len()` function:

In [43]:
a = ['dog', 'cat', 'mouse']
len(a)

3

It is also possible to *mix* different types in the same list:

In [44]:
a = [123, 'duck', -42, 17, 0, 'elephant']

In Python a list is an object. It is therefore possible for a list to contain other lists (because a list keeps a sequence of objects):

In [45]:
a = [1, 4, 56, [5, 3, 1], 300, 400]

You can combine (“concatenate”) two lists using the `+` operator:

In [46]:
[3, 4, 5] + [34, 35, 100]

[3, 4, 5, 34, 35, 100]

Or you can add one object to the end of a list using the `append()` method:

In [47]:
a = [34, 56, 23]
a.append(42)
a

[34, 56, 23, 42]

You can delete an object from a list by calling the `remove()` method and passing the object to delete (only its first occurrence is removed). For example:

In [48]:
a = [34, 56, 23, 42]
a.remove(56)
a

[34, 23, 42]

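#### The range() command

A special type of list is frequently required (often together with `for` loops) and therefore a command exists to generate that list: the `range(n)` command generates integers starting from 0 and going up to *but not including* n. Here are a few examples: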

In [49]:
list(range(3))

[0, 1, 2]

In [50]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

This command is often used with for loops. For example, to print the numbers 0<sup>2</sup>,1<sup>2</sup>,2<sup>2</sup>,3<sup>2</sup>,…,10<sup>2</sup>, the following program can be used:

In [51]:
for i in range(11):
    print(i ** 2)

0
1
4
9
16
25
36
49
64
81
100


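The range command takes an optional parameter for the beginning of the integer sequence (start) and another optional parameter for the step size. This is often written as `range([start], stop, [step])` where the arguments in square brackets (*i.e.* start and step) are optional. Here are some examples: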

In [52]:
list(range(3, 10))            # start=3

[3, 4, 5, 6, 7, 8, 9]

In [53]:
list(range(3, 10, 2))         # start=3, step=2

[3, 5, 7, 9]

In [54]:
list(range(10, 0, -1))        # start=10,step=-1

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

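Why are we calling `list(range())`?

In Python 3, `range()` generates the numbers on demand. When you use `range()` in a for loop, this is more efficient, because it doesn't take up memory with a list of numbers. Passing it to `list()` forces it to generate all of its numbers, so we can see what it does.

To get the same efficient behaviour in Python 2, use `xrange()` instead of `range()`.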
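
The on-demand behaviour of `range()` in Python 3 can be observed directly. The sketch below is an addition to the original text; it uses `sys.getsizeof()`, whose exact results depend on the Python implementation, so only a comparison is printed rather than absolute sizes.

```python
import sys

r = range(3, 10, 2)          # lazily represents 3, 5, 7, 9

print(len(r), r[1], 7 in r)  # length, indexing and membership work without a list

big = range(10 ** 6)
small_list = list(range(1000))

# the range object stores only start, stop and step, so its size in memory
# does not grow with the number of values it represents
print(sys.getsizeof(big) < sys.getsizeof(small_list))
```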

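### Sequence type 3: Tuples

A *tuple* is an (immutable) sequence of objects. Tuples are very similar in behaviour to lists with the exception that they cannot be modified (i.e. they are immutable).

For example, the objects in a sequence can be of any type: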

In [55]:
a = (12, 13, 'dog')
a

(12, 13, 'dog')

In [56]:
a[0]

12

The parentheses are not necessary to define a tuple: a sequence of objects separated by commas is sufficient to define a tuple:

In [57]:
a = 100, 200, 'duck'
a

(100, 200, 'duck')

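In practice, however, it is good to include the parentheses, because they make it clear that a tuple is being defined.

Tuples can also be used to make two assignments at the same time: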

In [58]:
x, y = 10, 20
x

10

In [59]:
y

20

This can be used to *swap* two objects within one line. For example

In [60]:
x = 1
y = 2
x, y = y, x
x

2

In [61]:
y

1

The empty tuple is given by `()`

In [62]:
t = ()
len(t)

0

In [63]:
type(t)

tuple

The notation for a tuple containing one value may seem a bit odd at first:

In [64]:
t = (42,)
type(t)

tuple

In [65]:
len(t)

1

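The extra comma is required to distinguish `(42,)` from `(42)`, where in the latter case the parentheses would be read as defining operator precedence: `(42)` simplifies to `42`, which is just a number: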

In [66]:
t = (42)
type(t)

int

This example shows the immutability of a tuple:

In [67]:
a = (12, 13, 'dog')
a[0]

12

In [68]:
# NBVAL_RAISES_EXCEPTION
a[0] = 1

TypeError: 'tuple' object does not support item assignment

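The immutability is the main difference between a tuple and a list (the latter being mutable). We should use tuples when we don't want the content to change.

Note that Python functions that return more than one value return these in tuples (which makes sense because you don't want these values to be changed).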
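
A short sketch of a function that returns two values: the function name `min_and_max` and the sample data are hypothetical and only serve as illustration. The returned tuple can be kept as one object or unpacked into separate names.

```python
def min_and_max(values):
    """Return the smallest and largest element of a non-empty sequence."""
    return min(values), max(values)     # the comma builds a tuple


result = min_and_max([34, 12, 54])
print(type(result).__name__, result)

lowest, highest = min_and_max([34, 12, 54])   # unpack into two names
print(lowest, highest)
```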

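### Indexing sequences

##### Further information

- Introduction to strings and indexing in [Python tutorial, section 3.1.2](http://docs.python.org/tutorial/introduction.html#strings); the relevant part starts after strings have been introduced.

Individual objects in lists can be accessed by using the index of the object and square brackets (`[` and `]`):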

In [69]:
a = ['dog', 'cat', 'mouse']
a[0]

'dog'

In [70]:
a[1]

'cat'

In [71]:
a[2]

'mouse'

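Note that Python (like C but unlike Fortran and Matlab) starts counting indices from zero!

Python provides a handy shortcut to retrieve the last element in a list: for this one uses the index `-1`, where the minus sign indicates that the element is counted *from the back* of the list. Similarly, the index `-2` will return the second-last element: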

In [72]:
a = ['dog', 'cat', 'mouse']
a[-1]

'mouse'

In [73]:
a[-2]

'cat'

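If you like, the index `a[-1]` can be read as a shorthand notation for `a[len(a) - 1]`.

Remember that strings (like lists) are also a sequence type and can be indexed in the same way: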

In [74]:
a = "Hello World!" 
a[0]

'H'

In [75]:
a[1]

'e'

In [76]:
a[10]

'd'

In [77]:
a[-1]

'!'

In [78]:
a[-2]

'd'

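### Slicing sequences

##### Further information

- Introduction to strings, indexing and slicing in [Python tutorial, section 3.1.2](http://docs.python.org/tutorial/introduction.html#strings)

*Slicing* of sequences can be used to retrieve more than one element. For example: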

In [79]:
a = "Hello World!"
a[0:3]

'Hel'

By writing `a[0:3]` we request the first 3 elements starting from element 0. Similarly:

In [80]:
a[1:4]

'ell'

In [81]:
a[0:2]

'He'

In [82]:
a[0:6]

'Hello '

We can use negative indices to refer to the end of the sequence:

In [83]:
a[0:-1]

'Hello World'

It is also possible to leave out the start or the end index and this will return all elements up to the beginning or the end of the sequence. Here are some examples to make this clearer:

In [84]:
a = "Hello World!"
a[:5]

'Hello'

In [85]:
a[5:]

' World!'

In [86]:
a[-2:]

'd!'

In [87]:
a[:]

'Hello World!'

Note that `a[:]` will generate a *copy* of `a`. Some people find the use of indices in slicing counter-intuitive. If you feel uncomfortable with slicing, have a look at this quotation from the [Python tutorial (section 3.1.2)](http://docs.python.org/tutorial/introduction.html#strings):

> The best way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of 5 characters has index 5, for example:
>
>      +---+---+---+---+---+ 
>      | H | e | l | l | o |
>      +---+---+---+---+---+ 
>      0   1   2   3   4   5   <-- use for SLICING
>     -5  -4  -3  -2  -1       <-- use for SLICING 
>                                      from the end
>
> The first row of numbers gives the position of the slicing indices 0...5 in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labelled i and j, respectively.

So the important statement is that for *slicing* we should think of indices pointing between characters.

For *indexing* it is better to think of the indices referring to characters. Here is a little graph summarising these rules:

       0   1   2   3   4    <-- use for INDEXING 
      -5  -4  -3  -2  -1    <-- use for INDEXING 
     +---+---+---+---+---+          from the end
     | H | e | l | l | o |
     +---+---+---+---+---+ 
     0   1   2   3   4   5  <-- use for SLICING
    -5  -4  -3  -2  -1      <-- use for SLICING 
                             from the end

If you are not sure what the right index is, it is always a good technique to play around with a small example at the Python prompt to test things before or while you write your program.
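
In that spirit, here is a small experiment that checks the "indices point between characters" picture. It is a sketch added for illustration; the properties tested follow from the slicing rules described above.

```python
a = "Hello"

# a[i:j] contains j - i characters (for 0 <= i <= j <= len(a))
print(a[1:4], len(a[1:4]))

# cutting at any edge i and gluing the pieces together restores the string
for i in range(len(a) + 1):
    assert a[:i] + a[i:] == a

# a negative slice index -k labels the same edge as len(a) - k
print(a[-3:] == a[len(a) - 3:])
```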

### Dictionaries

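Dictionaries are also called “associative arrays” and “hash tables”. Dictionaries are collections of *key-value pairs*. Since Python 3.7 they remember the order in which the keys were inserted; in older versions they were *unordered*.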

An empty dictionary can be created using curly braces:

In [88]:
d = {}

Keyword-value pairs can be added like this:

In [89]:
d['today'] = '22 deg C'    # 'today' is the keyword

In [90]:
d['yesterday'] = '19 deg C'

`d.keys()` returns a view of all keys (pass it to `list()` if an actual list is needed):

In [91]:
d.keys()

dict_keys(['today', 'yesterday'])

We can retrieve values by using the keyword as the index:

In [92]:
d['today']

'22 deg C'

Other ways of populating a dictionary if the data is known at creation time are:

In [93]:
d2 = {2:4, 3:9, 4:16, 5:25}
d2

{2: 4, 3: 9, 4: 16, 5: 25}

In [94]:
d3 = dict(a=1, b=2, c=3)
d3

{'a': 1, 'b': 2, 'c': 3}

Called without arguments, the function `dict()` creates an empty dictionary.

Other useful dictionary methods include `values()`, `items()` and `get()`. You can use `in` to check for the presence of keys.

In [95]:
d.values()

dict_values(['22 deg C', '19 deg C'])

In [96]:
d.items()

dict_items([('today', '22 deg C'), ('yesterday', '19 deg C')])

In [97]:
d.get('today','unknown')

'22 deg C'

In [98]:
d.get('tomorrow','unknown')

'unknown'

In [99]:
'today' in d

True

In [100]:
'tomorrow' in d

False

The method `get(key,default)` will provide the value for a given `key` if that key exists, otherwise it will return the `default` object.
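
A common use of `get()` with a default is counting: the default supplies the starting value for keys that are not in the dictionary yet. The sentence below is made-up sample data, and the sketch is an illustration rather than part of the original text.

```python
sentence = "the dog saw the cat and the cat saw the dog"

counts = {}
for word in sentence.split():
    # get() returns 0 for words we have not seen yet
    counts[word] = counts.get(word, 0) + 1

print(counts)
```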

Here is a more complex example:

In [101]:
# NBVAL_IGNORE_OUTPUT
order = {}        # create empty dictionary

#add orders as they come in
order['Peter'] = 'Pint of bitter'
order['Paul'] = 'Half pint of Hoegarden'
order['Mary'] = 'Gin Tonic'

#deliver order at bar
for person in order.keys():
    print(person, "requests", order[person])

Peter requests Pint of bitter
Paul requests Half pint of Hoegarden
Mary requests Gin Tonic


Some more technicalities:

-   The keyword can be any (immutable) Python object. This includes:

    -   numbers

    -   strings

    -   tuples.

-   dictionaries are very fast in retrieving values (when given the key)

Another example to demonstrate an advantage of using dictionaries over pairs of lists:

In [102]:
# NBVAL_IGNORE_OUTPUT
dic = {}                        #create empty dictionary

dic["Hans"]   = "room 1033"     #fill dictionary
dic["Andy C"] = "room 1031"     #"Andy C" is key
dic["Ken"]    = "room 1027"     #"room 1027" is value

for key in dic.keys():
    print(key, "works in", dic[key])

Hans works in room 1033
Andy C works in room 1031
Ken works in room 1027


Without dictionary:

In [103]:
people = ["Hans","Andy C","Ken"]
rooms  = ["room 1033","room 1031","room 1027"]

#possible inconsistency here since we have two lists
if not len( people ) == len( rooms ):
    raise RuntimeError("people and rooms differ in length")

for i in range( len( rooms ) ):
    print(people[i],"works in",rooms[i])

Hans works in room 1033
Andy C works in room 1031
Ken works in room 1027
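
If the data already arrives as two parallel lists, one possible way to convert it into a dictionary is sketched below. The name `room_of` is hypothetical. Note that `zip()` silently stops at the end of the shorter list, so a length check like the one above may still be worthwhile.

```python
people = ["Hans", "Andy C", "Ken"]
rooms = ["room 1033", "room 1031", "room 1027"]

# zip() pairs up corresponding elements, dict() turns the pairs into a dictionary
room_of = dict(zip(people, rooms))

# items() yields (key, value) tuples, so no index bookkeeping is needed
for person, room in room_of.items():
    print(person, "works in", room)
```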


Passing arguments to functions
------------------------------

This section contains some more advanced ideas and makes use of concepts that are only later introduced in this text. The section may be more easily accessible at a later stage.

When objects are passed to a function, Python always passes (the value of) the reference to the object to the function. Effectively this is calling a function by reference, although one could refer to it as calling by value (of the reference).

We review argument passing by value and reference before discussing the situation in Python in more detail.

### Call by value

One might expect that if we pass an object by value to a function, modifications of that value inside the function will not affect the object (because we don’t pass the object itself, but only its value, which is a copy). Here is an example of this behaviour (in C):

```c
#include <stdio.h>

void pass_by_value(int m) {
  printf("in pass_by_value: received m=%d\n",m);
  m=42;
  printf("in pass_by_value: changed to m=%d\n",m);
}

int main(void) {
  int global_m = 1;
  printf("global_m=%d\n",global_m);
  pass_by_value(global_m);
  printf("global_m=%d\n",global_m);
  return 0;
}
```

together with the corresponding output:

    global_m=1
    in pass_by_value: received m=1
    in pass_by_value: changed to m=42
    global_m=1


The value `1` of the global variable `global_m` is not modified when the function `pass_by_value` changes its input argument to 42.

### Call by reference

Calling a function by reference, on the other hand, means that the object given to a function is a reference to the object. This means that the function will see the same object as in the calling code (because they are referencing the same object: we can think of the reference as a pointer to the place in memory where the object is located). Any changes acting on the object inside the function, will then be visible in the object at the calling level (because the function does actually operate on the same object, not a copy of it).

Here is one example showing this using pointers in C:

```c
#include <stdio.h>

void pass_by_reference(int *m) {
  printf("in pass_by_reference: received m=%d\n",*m);
  *m=42;
  printf("in pass_by_reference: changed to m=%d\n",*m);
}

int main(void) {
  int global_m = 1;
  printf("global_m=%d\n",global_m);
  pass_by_reference(&global_m);
  printf("global_m=%d\n",global_m);
  return 0;
}
```

together with the corresponding output:

    global_m=1
    in pass_by_reference: received m=1
    in pass_by_reference: changed to m=42
    global_m=42

C++ provides the ability to pass arguments as references by adding an ampersand in front of the argument name in the function definition:

```cpp
#include <stdio.h>

void pass_by_reference(int &m) {
  printf("in pass_by_reference: received m=%d\n",m);
  m=42;
  printf("in pass_by_reference: changed to m=%d\n",m);
}

int main(void) {
  int global_m = 1;
  printf("global_m=%d\n",global_m);
  pass_by_reference(global_m);
  printf("global_m=%d\n",global_m);
  return 0;
}
```

together with the corresponding output:

    global_m=1
    in pass_by_reference: received m=1
    in pass_by_reference: changed to m=42
    global_m=42

### Argument passing in Python

In Python, objects are passed as the value of a reference (think pointer) to the object. Depending on the way the reference is used in the function and depending on the type of object it references, this can result in pass-by-reference behaviour (where any changes to the object received as a function argument, are immediately reflected in the calling level).

Here are three examples to discuss this. We start by passing a list to a function which iterates through all elements in the sequence and doubles the value of each element:

In [104]:
def double_the_values(l):
    print("in double_the_values: l = %s" % l)
    for i in range(len(l)):
        l[i] = l[i] * 2
    print("in double_the_values: changed l to l = %s" % l)

l_global = [0, 1, 2, 3, 10]
print("In main: s=%s" % l_global)
double_the_values(l_global)
print("In main: s=%s" % l_global)

In main: s=[0, 1, 2, 3, 10]
in double_the_values: l = [0, 1, 2, 3, 10]
in double_the_values: changed l to l = [0, 2, 4, 6, 20]
In main: s=[0, 2, 4, 6, 20]


The variable `l` is a reference to the list object. The line `l[i] = l[i] * 2` first evaluates the right-hand side and reads the element with index `i`, then multiplies this by two. A reference to this new object is then stored in the list object `l` at position with index `i`. We have thus modified the list object, that is referenced through `l`.

The reference to the list object never changes: the line `l[i] = l[i] * 2` changes the elements `l[i]` of the list `l` but never changes the reference `l` for the list. Thus both the function and calling level are operating on the same object through the references `l` and `l_global`, respectively.

In contrast, here is an example where we do not modify the elements of the object passed to the function, but instead bind the name `l` to a new object:

In [105]:
def double_the_list(l):
    print("in double_the_list: l = %s" % l)
    l = l + l
    print("in double_the_list: changed l to l = %s" % l)

l_global = "Hello"
print("In main: l=%s" % l_global)
double_the_list(l_global)
print("In main: l=%s" % l_global)

In main: l=Hello
in double_the_list: l = Hello
in double_the_list: changed l to l = HelloHello
In main: l=Hello


What happens here is that the evaluation of `l = l + l` creates a new object that holds `l + l`, and the name `l` is then bound to it. In the process, the function loses its reference to the object that was passed in (here a string, although a list would behave the same way), and thus the object given to the function is not changed.

Finally, let’s look at a third example, in which an integer is passed to the function:

In [106]:
def double_the_value(l):
    print("in double_the_value: l = %s" % l)
    l = 2 * l
    print("in double_the_value: changed l to l = %s" % l)

l_global = 42
print("In main: s=%s" % l_global)
double_the_value(l_global)
print("In main: s=%s" % l_global)

In main: s=42
in double_the_value: l = 42
in double_the_value: changed l to l = 84
In main: s=42


In this example, we also double the value (from 42 to 84) within the function. However, the line `l = 2 * l` creates a new object (84) and binds the Python name `l` to it. In the process, we lose the reference to the object 42 within the function. This does not affect the object 42 itself, nor the reference `l_global` to it.

In summary, Python’s behaviour of passing arguments to a function may appear to vary (if we view it from the pass by value versus pass by reference point of view). However, it is always call by value, where the value is a reference to the object in question, and the behaviour can be explained through the same reasoning in every case.
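
The two cases can be placed side by side using `id()`, which reports the identity of an object. This is a minimal sketch; the function names `mutate` and `rebind` are hypothetical and chosen only to label the two behaviours discussed above.

```python
def mutate(items):
    items.append(99)          # changes the object the caller also refers to


def rebind(items):
    items = items + [99]      # creates a new list; only the local name changes
    return id(items)


data = [1, 2, 3]
original_id = id(data)

new_id = rebind(data)
print(data, new_id == original_id)    # data unchanged, ids differ

mutate(data)
print(data, id(data) == original_id)  # data changed, still the same object
```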

### Performance considerations

Call by value function calls require copying of the value before it is passed to the function. From a performance point of view (both execution time and memory requirements), this can be an expensive process if the value is large. (Imagine the value is a `numpy.array` object which could be several Megabytes or Gigabytes in size.)

One generally prefers call by reference for large data objects as in this case only a pointer to the data objects is passed, independent of the actual size of the object, and thus this is generally faster than call-by-value.

Python’s approach of (effectively) calling by reference is thus efficient. However, we need to be careful that our functions do not modify the data they have been given where this is undesired.

### Inadvertent modification of data

Generally, a function should not modify the data given as input to it.

For example, the following code demonstrates the attempt to determine the maximum value of a list, and – inadvertently – modifies the list in the process:

In [107]:
def mymax(s):  # demonstrating side effect
    if len(s) == 0:
        raise ValueError('mymax() arg is an empty sequence')
    elif len(s) == 1:
        return s[0]
    else:
        for i in range(1, len(s)):
            if s[i] < s[i - 1]:
                s[i] = s[i - 1]
        return s[len(s) - 1]

s = [-45, 3, 6, 2, -1]
print("in main before calling mymax(s): s=%s" % s)
print("mymax(s)=%s" % mymax(s))
print("in main after calling mymax(s): s=%s" % s)

in main before calling mymax(s): s=[-45, 3, 6, 2, -1]
mymax(s)=6
in main after calling mymax(s): s=[-45, 3, 6, 6, 6]


The user of the `mymax()` function would not expect that the input argument is modified when the function executes. We should generally avoid this. There are several ways to find better solutions to the given problem:

-   In this particular case, we could use the Python in-built function `max()` to obtain the maximum value of a sequence.

-   If we felt we need to stick to storing temporary values inside the list \[this is actually not necessary\], we could create a copy of the incoming list `s` first, and then proceed with the algorithm (see [below](#Copying-objects) on Copying objects).

-   Use another algorithm which uses an extra temporary variable rather than abusing the list for this.

-   We could pass a tuple (instead of a list) to the function: a tuple is *immutable* and can thus never be modified (this would result in an exception being raised when the function tries to write to elements in the tuple).
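
The temporary-variable approach from the list above could look like the following sketch. It is one possible rewrite of `mymax()`, not necessarily the version the original author had in mind; the variable name `largest` is an assumption.

```python
def mymax(s):
    """Return the largest element of s without modifying s."""
    if len(s) == 0:
        raise ValueError('mymax() arg is an empty sequence')
    largest = s[0]              # temporary variable holds the best candidate
    for item in s[1:]:
        if item > largest:
            largest = item
    return largest


s = [-45, 3, 6, 2, -1]
print(mymax(s), s)              # the list is left untouched
print(mymax((-45, 3, 6)))       # tuples work too, as nothing is written
```

Because this version only reads from `s`, it also accepts immutable sequences such as tuples.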

### Copying objects

Python provides the `id()` function which returns an integer number that is unique for each object among all objects that exist at the same time. (In the current CPython implementation, this is the memory address.) We can use this to identify whether two names refer to the same object.

To copy a sequence object (including lists), we can slice it, *i.e.* if `a` is a list, then `a[:]` will return a copy of `a`. Here is a demonstration:

In [108]:
a = list(range(10))
a

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [109]:
b = a
b[0] = 42
a              # changing b changes a

[42, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [110]:
# NBVAL_IGNORE_OUTPUT
id(a)

4446565320

In [111]:
# NBVAL_IGNORE_OUTPUT
id(b)

4446565320

In [112]:
# NBVAL_IGNORE_OUTPUT
c = a[:] 
id(c)          # c is a different object

4444418824

In [113]:
c[0] = 100       
a              # changing c does not affect a

[42, 1, 2, 3, 4, 5, 6, 7, 8, 9]

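Python’s standard library provides the `copy` module, which provides copy functions that can be used to create copies of objects. We could have used `import copy; c = copy.deepcopy(a)` instead of `c = a[:]`. For a flat list of numbers such as `a` the result is the same; in general, slicing creates a shallow copy, while `copy.deepcopy()` also copies the objects nested inside the list.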
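
The difference between a slice copy and `copy.deepcopy()` only shows up when the list contains other mutable objects. The nested list below is made-up sample data for a minimal sketch of this:

```python
import copy

a = [1, 2, [10, 20]]          # a list that contains another list

shallow = a[:]                # same as copy.copy(a): new outer list only
deep = copy.deepcopy(a)       # also copies the nested list

a[2][0] = 99                  # modify the nested list through a

print(shallow)                # nested list is shared, so the change shows
print(deep)                   # fully independent copy is unaffected
print(shallow[2] is a[2], deep[2] is a[2])
```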

Equality and Identity/Sameness
------------------------------

A related question concerns the equality of objects.

### Equality

The operators `<`, `>`, `==`, `>=`, `<=`, and `!=` compare the *values* of two objects. The objects need not have the same type. For example:

In [114]:
a = 1.0; b = 1
type(a)

float

In [115]:
type(b)

int

In [116]:
a == b

True

So the `==` operator checks whether the values of two objects are equal.

### Identity / Sameness

To check whether two objects `a` and `b` are the same (i.e. `a` and `b` are references to the same place in memory), we can use the `is` operator (continued from example above):

In [117]:
a is b

False

Of course they are different here, as they are not of the same type.

We can also ask the `id` function which, according to the documentation string in Python 2.7 “*Returns the identity of an object. This is guaranteed to be unique among simultaneously existing objects. (Hint: it’s the object’s memory address.)*”

In [118]:
# NBVAL_IGNORE_OUTPUT
id(a)

4446045984

In [119]:
# NBVAL_IGNORE_OUTPUT
id(b)

4406101840

which shows that `a` and `b` are stored in different places in memory.

### Example: Equality and identity

We close with an example involving lists:

In [120]:
x = [0, 1, 2]
y = x
x == y

True

In [121]:
x is y

True

In [122]:
# NBVAL_IGNORE_OUTPUT
id(x)

4445528520

In [123]:
# NBVAL_IGNORE_OUTPUT
id(y)

4445528520

Here, `x` and `y` are references to the same piece of memory; they are thus identical, and the `is` operator confirms this. The important point to remember is that line 2 (`y=x`) creates a new reference `y` to the same list object that `x` refers to.

Accordingly, we can change elements of `x`, and `y` will change simultaneously as both `x` and `y` refer to the same object:

In [124]:
x

[0, 1, 2]

In [125]:
y

[0, 1, 2]

In [126]:
x is y

True

In [127]:
x[0] = 100
y

[100, 1, 2]

In [128]:
x

[100, 1, 2]

In contrast, if we use `z=x[:]` (instead of `z=x`) to create a new name `z`, then the slicing operation `x[:]` will actually create a copy of the list `x`, and the new reference `z` will point to the copy. The *value* of `x` and `z` is equal, but `x` and `z` are not the same object (they are not identical):

In [129]:
x

[100, 1, 2]

In [130]:
z = x[:]            # create copy of x before assigning to z
z == x              # same value

True

In [131]:
z is x              # are not the same object

False

In [132]:
# NBVAL_IGNORE_OUTPUT
id(z)               # confirm by looking at ids

4446678088

In [133]:
# NBVAL_IGNORE_OUTPUT
id(x)

4445528520

In [134]:
x

[100, 1, 2]

In [135]:
z

[100, 1, 2]

Consequently, we can change `x` without changing `z`, for example (continued)

In [136]:
x[0] = 42
x

[42, 1, 2]

In [137]:
z

[100, 1, 2]