<img src="./img/Dolan.png" width="180px" align="right">

# **Lesson 6: Strings**
_The first sequential data type_

# **第六课：字符串**
_第一个顺序数据类型_

## **Learning Objectives**

### Theory / Be able to explain ...
- The Python Type Hierarchy
- The three kinds of string literals
- The concept of immutability
- Basics of sequential data types
- Indexing, slicing, and traversal
- String objects and methods

### Skills / Know how to  ...
- Create string literals from quoted text or type conversion
- Use indexing and slicing to retrieve substrings
- Use the `in` and `%` operators

**What follows is adapted from Chapter 6 of the _Python For Everybody_ book. If you have not read it, then please do so before continuing on.**

---

## **学习目标**

### 学会解释一下理论 ...
- Python 数据类型层次结构
- 三种字符串字面量
- 不可变性的概念
- 顺序数据类型的基本要素
- 索引、切片和遍历
- 字符串对象和方法

### 掌握一下技能 ...
- 通过引号或类型转换创建字符串字面量
- 使用索引和切片从字符串中检索子字符串
- 使用 `in` 和  `%` 运算符

**本堂课改编自 _Python For Everybody_ 第六章。开始学习前，可以先阅读教材.**

---

## **The Type Hierarchy**
> "There are 10 kinds of people in this world, those who know binary and those who don't." -- anonymous nerd lore

In this world of Big Data where computational power and data storage capacity have become commodities measured out like sugar or produce, it is easy for non-programmers to forget that it hasn't always been this way, that actual people had to design and build the technology up over many decades. The modern world is by design, not accident, or so we like to believe. 

In the very beginning, all data and programs were **encoded** as **binary**: strings of zeroes and ones called **bits** where every few bits represented something else. We still use binary (at least in digital computers), of course, but not many people speak it natively anymore. It's just too computer-specific and hard for us humans to process. Instead, we use an ever evolving repertoire of **languages** and **data structures** to express ourselves and keep our data safe and useful. 

After the **bit** the next standard data type was the **byte**, composed of 8 bits with 256 possible values, which is just enough to encode each key on a computer keyboard. Thus, [**ASCII**](https://en.wikipedia.org/wiki/ASCII) (American Standard Code for Information Interchange) was born. ASCII used seven of the bits (i.e., 128 possible values) for the keyboard codes, with the last bit available for error checking. Each character was assigned a number (e.g., "A" = binary `1000001` = decimal 65). Upper case letters were separate from lower case letters ("a" = `1100001` = 97) and every "control" character (e.g., tab, line ending, end of file, etc.) had an ASCII encoding as well. 
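As a quick check of the codes quoted above, Python's built-in `ord()` and `chr()` functions convert between a character and its numeric code, and `format(n, "b")` writes a number in binary. This is only a small sketch; the variable names are ours.

```python
# Convert characters to their numeric codes and back again.
code_upper = ord("A")                 # numeric code for capital A
code_lower = ord("a")                 # lowercase letters have their own codes
bits_upper = format(code_upper, "b")  # the same number written in binary
bits_lower = format(code_lower, "b")

print(code_upper, bits_upper)   # 65 1000001
print(code_lower, bits_lower)   # 97 1100001
print(chr(65), chr(97))         # A a
```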

When strung together, **bytes** could represent lots of things as **strings**, **integers**, and **floats**, which you will recognize as native Python data types. From there it was natural to think of more elaborate data structures like complex numbers or lists. These in turn led to decimal and fractional numbers, and file, array, list, tuple, and dictionary collections. Today the Python standard library includes dozens of data types, each with its own unique properties and uses. And, if none of these quite do what we want, we can always create our own. 

This lesson starts our explorations into the world of data structures with a review of the **string data type**, which in practice is the most commonly used data structure of all. While we analytics geeks would likely prefer to have quantitative data all the time, if a human is involved in its collection or communication then it almost certainly will start as text strings. 

## **数据类型层次结构**
> "世界上只有“10”种人，懂二进制的和不懂二进制的." -- 佚名

在大数据的世界里，算力和数据存储能力已经成为像糖果或农产品一样被衡量的商品。除了程序员，其他人很容易忘记事实并非如此。其实几十年来，人们必须主动设计、构建技术，才能优化算力和数据存储能力。现代世界是通过主动设计构建的，而非偶然形成。或者说，我们希望世界应当如此. 

最初，所有的数据和程序都以**二进制**的形式**编码**：用字符串“0”和“1”构成**比特（bit）**，用寥寥数个比特代表其他信息。如今我们仍使用二进制（至少传统的数字计算机是如此）。当然，没什么人会直接用二进制生活。这是专属于计算机的语言，对人类来说较难处理。我们会使用不断发展的**语言**和**数据结构**表达自己，确保数据安全可靠、且能发挥价值. 

**比特**之后的标准数据类型叫**字节（byte）**，由8个比特组成，可以有256种可能的值，足够为键盘上的每个按键编码，由此诞生了美国信息交换标准代码ASCII（https://zh.wikipedia.org/wiki/ASCII）。ASCII包含7个比特，共有128种可能的值，最后一个值用于查错。我们用这种方式为键盘按键编码。ASCII下，每个字母都有对应的二进制数（如字母“A”= 二进制的 `1000001` = 十进制的 65），大写字母和小写字母对应的数字不同（如小写字母“a”= 二进制的 `1100001` = 十进制的97），每个控制符（制表符、行结束符、文件结束符等）也有对应的ASCII编码. 

**字节**组合在一起时，可以表示**字符串、整数、浮点数**等诸多信息，上述三种数据类型也是Python原生的数据类型。Python自然也有复数、列表等更复杂的数据结构。之后又发展出了十进制数和分数，以及文件、数组、列表、元组、字典等集合。如今，Python标准数据库包含了数十种数据类型，每种都有其独特的属性和用途。而且，如果这些数据类型都不能完全满足我们的需求，我们也总能创建自己的数据类型. 

从本堂课开始，我们将从 **“字符串”** 这个最常用的数据类型开始，探索数据结构的世界。虽然数据分析师可能更喜欢使用定量数据，但如果涉及人类的话语集和交流内容，数据可能会以文本字符串的形式开始. 

---
## **String: The (Second-Most) Universal Data Type**
Many people are surprised to learn that in its original form, the entirety of the world wide web was text. Even the images were encoded into text! The web browser's job was to take all the strings of text coming over the wire -- there was no wifi back then -- and display web pages that people could view and interact with. Why text? Because just about anything can be encoded as text. It also allowed people to hand-craft web pages with HTML. While the web has progressed a lot since then, most content is still ... text. 

In Python all text has the string data type, which Python itself calls `str`. Most of what follows is a somewhat terse review of things from the Py4E book. Please start there and then read through the notes. 

---
## **字符串：（第二）常用的数据类型**
很多人会惊讶地发现，最初的万维网都是文本。当时没有wifi，连图片都被编码成了文本！浏览器接收所有来自网络的文本字符串，形成可供查看和交互的显示页面。那为什么要把信息编码成文本呢？只是因为一切皆可编码成文本。所以那时的人们可以用超文本标记语言 HTML 自制网页。虽然此后的网页已经有了很大的进步，但大部分内容仍是……文本. 

在Python中，所有文本的数据类型都是字符串（`string`）。接下来的内容大多数是对课本 Python For Everybody 的梳理回顾。希望大家先读书，再读本课内容. 

---
## **String Literals**
A **string literal** is a specific sequence of characters. Strings come about in one of two ways:
- As quoted text (literals) like "Every good boy does fine"
- Conversion from data of another type

### **Quoted Text**
Quoted text in Python comes in three varieties.

**Single quotes** are used for short strings like names: 

---
## **字符串字面量**
**字符串字面量**是一系列特殊的字符序列。字符串可以通过以下两种方式产生:
- 用英文输入法的引号引起文本（字面量），比如 “宫商角徵羽”
- 通过类型转换把另一种数据类型变成字符串

### **用引号创建文本**
Python中的引号有三类.

**单引号**用在名字、物品名等短字符串上: 

In [None]:
'Apple'

'Apple'

**Double quotes** are used when the quoted text might include single quotes/apostrophes:

**双引号**用在内部有单引号或撇号的文本上:

In [None]:
"Apple's market cap is $1.5 Trillion"

"Apple's market cap is $1.5 Trillion"

**Triple quotes** are used when the text spans multiple lines (and we want to keep it that way):

**三引号**用于必须跨行的文本:


In [None]:
'''
Apple's market cap is ...

1.5 TRILLION DOLLARS!
'''

"\nApple's market cap is ...\n\n1.5 TRILLION DOLLARS!\n"

The `\n`s in there represent line endings (a.k.a., "newlines").

此处的`\n`是行结束符，即换行符.

### **Conversions from other data types**
Just about anything can be converted to a string using the `str()` function. 

### **通过类型转换创建文本**
通过 `str()` 函数，任何数据都能转换为文本. 

In [None]:
str(15)

'15'

In [None]:
str(type(15))

"<class 'int'>"

There are other ways, of course, for data types we haven't explored yet.

当然还有其他没讲到的方法，也可以用于创建文本.
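To make the conversion idea concrete, here is a short sketch that converts a few values to strings with `str()` and then goes back the other way with `int()`. The variable names are ours, chosen only for illustration.

```python
# str() turns other values into text; int() converts text back to a number.
as_text = str(15)          # '15'
pi_text = str(3.14)        # '3.14'
flag_text = str(True)      # 'True'
back_to_int = int("15")    # 15

print(as_text + pi_text)   # text is concatenated: 153.14
print(back_to_int + 1)     # numbers are added: 16
```

Note how `+` means something different once the values are strings.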

### **Special Characters**
Some character codes from the old days of teletypes still exist in Python and other languages. Most represented either whitespace or unprintable behaviors (e.g., ringing a bell with `\a` or going backwards one space with `\b`). They were encoded with a backslash `\` character plus a letter or number. Of these so-called "escape codes," only a few are still in common use:
- `\t`: advance to the next tabstop ("tab")
- `\n`: ASCII line feed (line end)
- `\\`: backslash
- `\'`: single quote character
- `\"`: double quote character

There are other codes (also with backslashes) for things like Unicode characters. You will find them explained in the Python docs. 
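The escape codes listed above can be tried out directly. The sketch below uses made-up sample strings; the point to notice is that each escape code is stored as a single character, even though we type two.

```python
tabbed = "name\tvalue"        # \t is a single tab character
two_lines = "first\nsecond"   # \n ends the first line
path = "C:\\temp"             # \\ stores one backslash
quoted = 'It\'s fine'         # \' puts a single quote inside single quotes

print(two_lines)              # prints on two separate lines
print(len(tabbed))            # 10: the tab counts as one character
print(len(path))              # 7: the doubled backslash is stored once
print(quoted == "It's fine")  # True
```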

### **特殊字符**
电报码中的部分字符沿用到了Python及其他语言，大多用于表示空格或其他无法在标准输出中显示的内容，用反斜杠`\`加上一个字母或数字表示。如`\a`表示响铃信号，`\b`表示退格符。其中仅有少数所谓的“转义码”仍有含义:
- `\t`: 制表符，将光标移动到下一个制表位
- `\n`: ascii下的换行符
- `\\`: 表示反斜杠字符本身
- `\'`: 单引号
- `\"`: 双引号

除了上述提到的转义码之外，还有其他用于表示使用反斜杠的代码，用于统一码（Unicode）等编码中。大家可以在Python在线说明文档中了解更多. 

### **Immutability: When everything is taken literally**

**Heads Up**: Strings and numbers are meant to be taken literally. **They cannot be changed after they are created.** We can't change the number 1 into the number 2, no matter how much we try. Similarly we can't change "Apple" into "Google" no matter how much Eric Schmidt may have wanted it to happen. 

So how are we able to do things like this?

### **不可变性：所有值都已确定**

**注意**字符串和数字都是确定的取值，**一旦创建，无法更改。** 不管怎么尝试，数字 1 都不可能变成数字 2。同样的，不管谷歌前首席执行官 Eric Schmidt 再怎么想，“苹果” 也不可能变成 “谷歌”. 

那我们要如何实现字符串或数字的修改呢?

In [None]:
x = "Apple is king of Silicon Valley" 
x = "Google is king of Silicon Valley"   # Reassignment 重新赋值
x += " ... but Apple was there first"    # Update operator 更新运算符
print(x)

Google is king of Silicon Valley ... but Apple was there first


Despite appearances, **each string is immutable**. We can operate on the strings to make new strings, but the **original strings are unchanged**. Again, we can't modify 1 to make it 2, so why would we expect to be able to do that with strings? 
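One way to convince yourself of this is to keep a second name pointing at the original string. In this sketch (the names `x` and `y` are just for illustration), updating `x` builds a new string and leaves the one that `y` refers to untouched.

```python
x = "Apple"
y = x            # y refers to the same string as x
x += " Inc."     # builds a new string and reassigns x to it

print(x)         # Apple Inc.
print(y)         # Apple (the original string never changed)
```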

除了外观，**每个字符串都是不可变的。** 我们可以对字符串进行操作，生成新的字符串，但**原始字符串仍保持不变**。就像数字 1 无法改成数字 2，字符串又怎么可能改变呢？

### **Pulse Check ...**

**1. Why do we need three different quoting mechanisms in Python?**

### **脉冲检查 ...**

**1. 为什么Python中需要三种不同的引号?**

YOUR ANSWER HERE 在此输入你的答案


> - Single quotes are the baseline. Python uses these when it can.
> - Double quotes allow us to use single quotes (apostrophes) inside of quotes.
> - Triple quotes allow strings to span multiple lines.

> - 单引号是最基本的字符串引号。条件允许时，Python都会使用单引号.
> - 双引号允许在引号内部使用单引号.
> - 三引号允许跨行.

**2. Why does the following code fail?**

**2. 为什么下述代码会报错?**
```python
x = "ABD"
x[2]="C"

```

YOUR ANSWER HERE 在此输入你的答案



> Strings are immutable. We can create new strings but can't alter existing ones, even if we use variables. 

> 因为字符串是不可变的。我们可以创建新字符串，但不能更改已经存在的字符串。这一规则也适用于变量. 
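As a follow-up to the question above, this sketch catches the error raised by the item assignment and then shows one possible workaround: building a new string from slices of the old one. The names `message` and `fixed` are ours.

```python
x = "ABD"

try:
    x[2] = "C"                 # strings do not allow item assignment
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Build a new string instead of changing the old one.
fixed = x[:2] + "C" + x[3:]
print(fixed)                   # ABC
print(x)                       # ABD (unchanged)
```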

---
## **Strings as Sequences**
One of the reasons why string immutability is surprising is that **a string represents a sequence of characters**, kind of like a list. And, as anybody who has ever made a shopping list can tell you, lists are anything but immutable. We can add, delete, and reorder list items to our heart's content. But not so with strings. 

Despite this one huge difference, in just about every other way a string works pretty much like a list. Both are sequential data types, with a set of features shared by all sequential data types.  

---
## **序列类型——字符串**
字符串不可变的一大原因是：**字符串表示一系列有序的字符**，和列表有点相似。不过列表就像是购物清单，没什么不可变的。我们可以根据随心所欲地添加、划去、重新排列清单上的名目。但字符串并非如此。

除了在可变和不可变性上有较大的差异，字符串和列表在其他方面几乎完全相似。两者都是顺序数据类型，具有这种数据类型的特性.  

### **Indexing and Slicing**
Sequences inherently have **order**. When referring to an item in the sequence, we refer to its **position** in the sequence. There is always a first item, a second item, etc.

In Python we use the `[]` operator to **index** the string (or list, tuple, etc.):

### **索引和切分**
字符本质上是**有序**的。我们引用序列中的某个元素时，会指向这个元素在序列中的**位置**。序列中始终有第一个元素、第二个元素，以此类推.

Python中，我们使用运算符`[ ]`来**索引**字符串（或列表、元组等等）:

In [None]:
'Google'[3]

'g'

Those of you who are counting will notice that 'g' is the fourth character in 'Google.' Python starts counting from 0, not 1, when indexing a sequence. 

大家如果数了上图代码中“g” 的位置，就会发现它是 “Google” 中的第四个字符。括号内是 3，其实是因为Python在索引时从 0 开始计数，而非从 1 开始：

In [None]:
'Google'[0]

'G'

We can even use negative numbers as indexes:    

我们也可以在索引时填入负数：

In [None]:
'Google'[-2]

'l'

For negative indexes it works backwards from the end, with `[-1]` representing the last character.  

To find out how many characters are in a string, use the `len()` function.

负索引从序列末尾开始计数，`[-1]` 指最后一个字符。要知道字符串中有多少个字符，可以用 `len()` 函数：

In [None]:
len('Google')

6

Sometimes we want more than one character at a time. For that use a **slice** instead of an index:

我们有时想要截取一个以上的字符。此时就需要使用**切分**技巧：

In [None]:
'Google'[2:4]

'og'

A slice `[a:b]` returns the substring starting with position `a` up to position `b-1`. So, `[2:4]` is asking for whatever is in positions 2 and 3 returned as a new string, which in this case is 'og'.

切分 `[a:b]` 指截取出从位置 `a` 开始、到位置 `b-1` 结束的子字符串。因此， `[2:4]` 指获取字符串中计数值为2到计数值为3的内容，并返回一个新的字符串。在 “Google” 中，`[2:4]` 切分出的新字符串为 ‘og’.
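A few more slices may help the `[a:b]` rule sink in. In this sketch, leaving out `a` means "from the start" and leaving out `b` means "to the end"; negative positions count from the end, just as with indexing.

```python
word = "Google"

middle = word[2:4]       # positions 2 and 3
first_two = word[:2]     # start of the string up to position 1
rest = word[2:]          # position 2 through the end
last_three = word[-3:]   # the final three characters

print(middle, first_two, rest, last_three)   # og Go ogle gle
print(len(middle) == 4 - 2)                  # True: a slice [a:b] has b - a characters here
```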

### **Traversal**
**Traversal** is what we call iteration over the items in a sequence. It is exactly what you might imagine: 

### **遍历**
**遍历**指迭代序列中的每个元素。具体例子如下: 

In [None]:
for c in "Google":
    print(c.upper())

G
O
O
G
L
E


We can do whatever we want inside the loop body except modify the string. Since a string is just an immutable sequence of characters, `for c in "Google"` iterates through the sequence "Google", one character (`c`) at a time. 

Traversal is actually a more generally applicable task that we will return to in (optional) Lesson 12 when we consider tree structured data. 

我们可以在不更改字符串本身的前提下，在循环内做任何操作。在上图案例中，`for c in “Google” `会逐个遍历序列 “Google”。(`c`) 是代表字符串中当前字符的变量，每次迭代时，(`c`) 会依次指向字符串中的每个字符.

遍历其实是一个广泛使用的功能。我们在第12课（选学）讲树形结构数据时会再次提到.
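Here is a slightly fuller traversal sketch. It uses the built-in `enumerate()` function to get each character's position along with the character, and then a second loop that keeps a running count. The vowel test is our own example, not something from the Py4E book.

```python
word = "Google"

# enumerate() hands back each position along with its character.
for position, c in enumerate(word):
    print(position, c)

# Traversal with a running count: how many vowels does the word contain?
vowel_count = 0
for c in word:
    if c.lower() in "aeiou":
        vowel_count += 1

print(vowel_count)   # 3
```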

### **Pulse Check ...**
**1. What does the following code do?** You will likely want to consult the [Python docs](https://docs.python.org/3/library/functions.html#slice). Be sure to explain what the third slice argument does. (Yes, it's important and yes, you are expected to learn this on your own, [RTFM](https://en.wikipedia.org/wiki/RTFM)-style.) 

### **脉冲检查 ...**
**1. 下面这行代码的功能是什么？** 你可以查询 Python在线学习文档（https://docs.python.org/zh-cn/3/library/functions.html#slice）。试着解释第三个参数“-1”是什么意思（这里非常鼓励大家试着自己弄明白，就像http://zh.wikipedia.org/wiki/RTFM 中说到的一样）

In [None]:
"Go Stags!"[::-1]

'!sgatS oG'

YOUR ANSWER HERE 在此输入你的答案


> It works backwards from the end of the string. The third slice argument is the step size, which when negative steps backwards.

> 上图的代码指从后向前显示每一个字符。第三处参数“-1”指步长，表示每显示完上一个字符后，都从后向前移动一个字符位。
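The step size is not limited to `-1`. As a small sketch of the general `[start:stop:step]` form, here are a few other steps applied to the same string.

```python
cheer = "Go Stags!"

every_other = cheer[::2]     # positions 0, 2, 4, 6, 8
odd_positions = cheer[1::2]  # positions 1, 3, 5, 7
backwards = cheer[::-1]      # step of -1 walks from the end to the start

print(every_other)     # G tg!
print(odd_positions)   # oSas
print(backwards)       # !sgatS oG
```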

**2. Rewrite the code from question 1 so that it prints the (reversed) characters one per line.**

**2. 重写第一问的代码，让结果从后往前输出，且每行只输出一个字符.**

In [None]:
YOUR CODE HERE 在此输入你的代码

In [None]:

for c in "Go Stags!"[::-1]:
    print(c)

---
## **String Operators**
We have already seen two different **string operations**:
- **concatenating** with the `+` operator
- **appending** with the `+=` operator

The first returns the merger of two strings, one after the other. The second builds the merged string and reassigns the variable to it. 

We can also use comparison operators with strings just like with numbers:

---
## **字符串运算符**
我们已经学习了两个**字符串运算符**:
- `+`用于**串联**字符串或序列
- `+=`用于**追加**元素

`+`按顺序合并两个字符串，生成新的字符串；`+=`则直接在一个变量上追加字符串，改变了变量的值。

我们也可以像比较数字一样，比较字符串的大小:

In [None]:
'Google' < 'Apple'

False

The comparison is based on the numeric codes that correspond to each character. Since `A` comes before `G` in the alphabet, its numeric code is smaller. This also applies to lowercase and capital letters and numbers:

这里的比较是基于每个字符对应的ASCII数字编码进行的。因为 `A` 在字母表中排在 `G` 之前，所以它对应的数字编码更小。含大小写和数字的字符串也可以比较大小:

In [None]:
"A" < "a"

True

In [None]:
"21" > "100"

True

The logic is the same as lexicographic ordering (a.k.a. "alphabetizing"), with the comparison examining successive characters in each string until one character is smaller than the other or one of the character sequences has been exhausted. So, since "2" is greater than "1", "21" is greater than "100". 

> **Heads Up:** This is yet another example where data types matter. Think about how sorting string-encoded numbers (like "900" or "899.2") is different from sorting floats (like 900.0 or 899.2). Further, you can't mix strings and floats in an ordering comparison: `"900" < 899.2` raises a `TypeError`. 

和字典序排序（按字母顺序排列）的原理相同，Python会从左到右逐个比较每个字符串的大小，直到某一轮能分出大小或某一边的字符序列耗尽。因为字符串 “2” 已经大于字符串 “1”，所以 “21” 比 “100” 大.

>**注意**上述例子也可以证明数据类型的重要性。字符串 ”900” 和 ”899.2” 做比较与浮点数 900.0 和 899.2 作比较完全不同。因此，字符串和浮点数绝不能混淆.
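To see the difference the data type makes, this sketch sorts the same three made-up readings twice: once as strings (character by character) and once by their numeric value, using `float` as the sort key.

```python
readings = ["900", "899.2", "1000"]   # numbers stored as text (sample values)

as_strings = sorted(readings)             # compares character codes, left to right
as_numbers = sorted(readings, key=float)  # compares the numeric values instead

print(as_strings)   # ['1000', '899.2', '900']
print(as_numbers)   # ['899.2', '900', '1000']
```

Converting with `float()` before comparing is one way to get the ordering most people expect.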

### **The `in` Operator**

### **`in` 运算符**

The `in` operator is used to ask the question: is x in the sequence? We can use it to determine if a string is a substring of another string:

`in` 运算符用于询问：某个字符（串）x 是否在序列中？我们可以用 in 运算符检查某个字符串是否是另一个字符串的子字符串：

In [None]:
x = "Google is king of Silicon Valley ... but Apple was there first"
print("Microsoft" in x)
print("Apple" in x)

False
True


Note that the `in` keyword is not new to us. It's in every `for` loop. However, in a `for` loop it is _assigning_ the loop variable (`c`) to each item in the sequence (`x`), one item at a time. It's not asking. It's doing.    

注意 `in` 这个运算符对我们来说并不陌生，它也用于 for 循环中。然而，在 `for` 循环中，in 是将循环变量 (`c`) _分配_ 给序列 (`x`) 中的每个项，每次指向一项。`for` 循环中的 `in` 不是在询问，而是在执行操作.
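Because `in` compares characters exactly, a search can miss text that differs only in capitalization. One common approach, sketched here with names of our choosing, is to lowercase the string before asking. The `not in` form asks the opposite question.

```python
x = "Google is king of Silicon Valley ... but Apple was there first"

exact_match = "apple" in x               # False: the string has a capital A
relaxed_match = "apple" in x.lower()     # True: compare in lowercase
missing = "Microsoft" not in x           # True: the substring is absent

print(exact_match, relaxed_match, missing)   # False True True
```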

### **The `%` Operator**
One of the more powerful string operators is `%`, which allows us to insert values into the middle of a string. (Technically, it generates a new string but conceptually it inserts into it.)

### **`%` 运算符**
`%` 是一个功能更强大的运算符。它允许我们将某个值插入到字符串中。严格来讲，这个过程生成了一个新的字符串，但说它把值插到字符串中，也确实没错

In [None]:
"Google is king of Silicon Valley ... but %s was there first" % 'Microsoft' 

'Google is king of Silicon Valley ... but Microsoft was there first'

The `%s` is a placeholder for where to insert the string 'Microsoft'. Other placeholders exist for other types of data, like `%d` or `%i` (for integers, written in decimal) and `%g` (for floating point numbers). It is possible to use multiple placeholders in the same string if we supply a **tuple** on the right hand side. We will cover tuples in Lesson 10, but you can get the general idea from this example.    

上图中，`%s` 是一个占位符，用于之后插入字符串 ‘Microsoft’。Python也有用于其他数据类型的占位符，如用于插入十进制数的 `%d`、用于插入整数的 `%i`、用于插入浮点数的 `%g`。一个字符串中可以用多个占位符，只需在右侧加一个**元组**。之后的第10课会细讲元组的用法，但大家可以先在下图的例子中感受一下.


In [None]:
"%s is king of Silicon Valley ... but %s was there first" % ('Google', 'Microsoft')

'Google is king of Silicon Valley ... but Microsoft was there first'
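Placeholders of different kinds can be mixed in one template. The sketch below uses made-up values ("Team A", 9, and 0.5 are purely illustrative) to show `%s`, `%d`, a float placeholder with a fixed number of decimals (`%.3f`), and `%g`.

```python
# Sample values only; none of these numbers come from real data.
summary = "%s has %d championships and a %.3f winning percentage" % ("Team A", 9, 0.5)
compact = "%g" % 2.50    # %g drops trailing zeros

print(summary)   # Team A has 9 championships and a 0.500 winning percentage
print(compact)   # 2.5
```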

### **f-string Expressions**
f-strings allow us to take text templates even further, replacing `%` codes with named **variable placeholders**. They also eliminate the need to know the data type of each item being inserted in advance. The f-string expression takes care of converting everything to strings first. This is similar to how `print(1,"2")` works just fine but `print(1 + "2")` throws an error; using the comma form of the `print()` call converts each argument to a string before printing.

### **f-string 表达式**
f-string 表达式用**变量占位符**替换 `%` 运算符，帮助我们进一步处理文本模板。用 f-string 表达式插入数据前，无需事先了解插入的数据类型。它会先把所有内容转换为字符串。这和 print() 函数的原理类似。`print(1,”2”)` 可以正常运行，但 `print(1 + ”2”)` 会报错。在`print()` 的例子中，逗号将所有输入的参数变成了字符串，再进行输出；f-string 也是如此.

In [4]:
company1 = "Google"
company2 = "Microsoft"
f"{company1} is king of Silicon Valley ... but {company2} was there first"

'Google is king of Silicon Valley ... but Microsoft was there first'
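The braces in an f-string can hold more than a bare variable name. In this sketch (the value `1.5` is a sample number, not a quoted figure) they hold a method call, a number with a format specification after the colon, and a function call.

```python
company = "Google"
value = 1.5    # sample value in trillions, for illustration only

headline = f"{company.upper()} is worth ${value:.2f} trillion"
detail = f"{company} has {len(company)} letters"

print(headline)   # GOOGLE is worth $1.50 trillion
print(detail)     # Google has 6 letters
```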

### **Pulse Check ...**
**1. Why doesn't this expression return `True`?**
```python
"a" in "Apple"
```

### **脉冲检查 ...**
**1. 为什么下面的表达式返回结果不为`True`?**
```python
"a" in "Apple"
```

YOUR ANSWER HERE 在此输入你的答案


> Because Python strings are case sensitive.

> 因为在Python的字符串中，字母的大小写会影响结果.

**2. Explain the output of the following code. How did Python choose the sequence?**

**2. 解释下面代码的输出。Python是如何选择序列的?**

In [None]:
for c in sorted("Go Stags!"):
    print(c)

 
!
G
S
a
g
o
s
t


YOUR ANSWER HERE 在此输入你的答案


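> The `sorted()` function sorts the characters by their numeric codes before the `for` loop iterates over them, so the space and `!` come first, then the capital letters, then the lowercase letters.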

> 在循环前，`sorted()` 函数会先从小到大给字符排序。

**3. Study the two operations below. The first one fails (with an error) while the second one executes cleanly. Why do you suppose that is true?** Hint: The answer says a lot about the nature of strings versus numbers for encoding data.

**3. 研究以下两个运算符。第一个有一处问题，第二个可以正常运行。为什么第二个运算符无误？** 提示：答案与编码数据的性质（字符串 / 数字）相关.
```python
"The Red Sox have %d World Series Championships" % "9"
"The Red Sox have %s World Series Championships" % 9
```

YOUR ANSWER HERE 在此输入你的答案


> Because just about anything can be converted into a string but not vice-versa. 

> 因为所有数据类型都能转换为字符串，但字符串并不能转换成任意数据类型.
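For the curious, here is one way to see the failure from question 3 without crashing the notebook, along with a possible fix: convert the text to an integer explicitly before handing it to `%d`. The names `template`, `outcome`, and `fixed` are ours.

```python
template = "The Red Sox have %d World Series Championships"

try:
    template % "9"          # %d expects a number, not text
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

fixed = template % int("9")  # convert explicitly, then format

print(outcome)   # TypeError
print(fixed)     # The Red Sox have 9 World Series Championships
```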

---
## **String Methods**

---
## **字符串方法**

Remember the discussion of essentialism at the start of Lesson 2? There we said that every entity has both form and function. In Python every value, function, module, or other fundamental element is an [object](https://docs.python.org/3/reference/datamodel.html#data-model). Objects have two sides: state and behavior. An object's state (data) is implemented with **instance variables** that act like properties or features. An object's behavior (functionality) is implemented through its **methods**, a set of data type-specific functions that always take the object itself as a parameter. 

Let's take, for example, the number 2.5. Clearly the number has state (i.e., 2.5) but it also has behavior. We can ask it, for example to add another number to itself (which is how the `+` operator really works) or we can ask it to provide the simplest possible equivalent fraction (a.k.a., "integer ratio"):

还记得第二课开头提到的实质主义吗？每个实体都有其形式和功能。在Python中，每个值、功能、模块或其他基本元素都可以称作一个对象。一个对象有两个方面：状态和行为。对象的状态即数据，储存在**实例变量**中；每个实例变量表示某种特定属性或特征。对象的行为即功能，通过**方法**实现；方法指针对特定数据类型创建的函数，其参数通常包括对象本身（以便访问和操作对象的状态）。. 

以数字 2.5 为例。显然，这个数字的状态是 2.5。但数字也有行为。我们可以对它提出请求，例如要求它将另一个数字加到自身上（这实际上是 + 运算符的原理），也可以要求它提供可能的最简分数的等价形式（即“整数比”）:

In [None]:
2.5.as_integer_ratio()

(5, 2)

In human terms this is saying that 2.5 = 5/2. How did the `as_integer_ratio()` method know what number to convert? The one it is attached to, which in this case is 2.5. We tell Python that the method is attached via "dot notation":
```python
value.method( arguments )
```

人当然知道 2.5 = 5/2，但“方法” `as_integer_ratio()`如何知道要转换哪个数字呢？上图的例子中， `as_integer_ratio()`转换的是2.5。我们通过“点符号”告诉 Python，“方法”依附于2.5进行:
```python
value.method( arguments )
```

The actual method is defined much like a function, only it always has one extra `self` parameter that never appears in the call:
```python
def method( self, parameters):
    ...
```

So what does this have to do with strings? The string data type has a large number of built-in [string methods](https://docs.python.org/3/library/stdtypes.html#string-methods):
- `upper()`, `lower()`, `capitalize()`
- `strip()`, `lstrip()`, `rstrip()`
- `center()`, `ljust()`, `rjust()`
- `count()`, `find()`, `rfind()`, `index()`, `rindex()`
- `replace()`, `format()`

and many more. Most are similar to their equivalents in MS Excel. Then there are a few "magic" ones like `__add__()`,  which says what the `+` operator is supposed to do. 
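A few of these methods in action, as a rough sketch with a sample phrase of our choosing. The last two lines illustrate the earlier points about `self` and "magic" methods: calling `str.upper("abc")` passes the string in explicitly, and `__add__()` is what `+` calls behind the scenes.

```python
phrase = "  every good boy does fine  "

trimmed = phrase.strip()                 # remove the surrounding spaces
print(trimmed.capitalize())              # Every good boy does fine
print(trimmed.count("o"))                # 4
print(trimmed.replace("boy", "girl"))    # every good girl does fine
print("hi".center(6, "*"))               # **hi**

# Dot notation passes the value itself as the method's first parameter.
print(str.upper("abc") == "abc".upper())         # True
print("Go".__add__(" Stags!") == "Go" + " Stags!")   # True
```

Each call returns a new string; `phrase` itself is never modified.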


实际的“方法”与函数类似，只是比函数多一个参数 `self`。这个参数在调用时不会出现:
```python
def method( self, parameters):
    ...
```

上述这些和字符串又有什么关系呢？Python 针对字符串数据类型，构建了有大量内置的字符串方法:
- `upper()`, `lower()`, `capitalize()`
- `strip()`, `lstrip()`, `rstrip()`
- `center()`, `ljust()`, `rjust()`
- `count()`, `find()`, `rfind()`, `index()`, `rindex()`
- `replace()`, `format()`

Python中还有很多其他“方法”，大多数与 Excel表格中的等效方法类似。还有一些神奇的“方法”，比如 `__add__()` 说明了 `+` 运算符 应该做什么. 


### **The `find` Method**
One of the more commonly-used string methods is `find()`, which returns the position of the first instance of a substring within a longer string:

### **`find` 方法**
`find()` 是最常见的一个字符串方法，可以在长字符串中查找子字符串第一个字符的位置:

In [None]:
"Every Good Boy Does Fine".find("Fine")

20

> **Note:** In English this phrase "**E**very **G**ood **B**oy **D**oes **F**ine" has each of the musical notes E G B D F in order. 

> **题外话：** 上面这句“**E**very **G**ood **B**oy **D**oes **F**ine”字面意思是“每个好男孩都很好”。但有意思的是，每个单词的首字母“E G B D F”正好依次代表了五线谱从低到高的五根线。

If we want `find()` to begin looking somewhere other than the beginning of the string then we can pass the starting position as a second argument. We can then slice a word out of the string, like this ...

如果我们希望 `find()` 从字符串的其他位置开始查找，就可以将这个位置作为起始位置，并作为第二个参数写入 `find()` 。通过这种方法，可以在字符串中切分出一个词

In [None]:
# pick out a word that fits the pattern 'but '+word+" "
x = "Google is king of Silicon Valley ... but Apple was there first"
start_slice = x.find('but ') + len('but ') # finds the next position after the 'but ' 找到 ‘but ‘后第一个字符的位置
end_slice = x.find(" ",start_slice)        # finds the next space after start_slice 从start_slice开始，找到下一个空格
x[start_slice:end_slice]                   # the actual slice 实际切分出的词

'Apple'

While `find()` is certainly useful, it is not the only way to parse strings. You can, of course, traverse the string using a `for` loop (not recommended) or use the **regular expressions** module to search for arbitrarily complex text patterns. Regular expressions are covered in chapter 11 of the Py4E book.   

`find ()` 确实好用，但不是解析字符串的唯一方法。大家可以用 `for` 循环 实现遍历（不推荐），也可以用**正则表达式**搜索任意复杂文本。教材第十一章会介绍正则表达式的用法
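One detail worth knowing when slicing with `find()`: it returns `-1` when the substring is not there, and using that `-1` as a slice position gives confusing results. The sketch below wraps the slicing steps in a hypothetical helper, `word_after()` (our name, not a built-in), that checks for `-1` first.

```python
def word_after(text, marker):
    """Return the word that follows marker in text, or None if marker is absent."""
    start = text.find(marker)
    if start == -1:              # find() signals "not found" with -1
        return None
    start += len(marker)         # move past the marker itself
    end = text.find(" ", start)  # next space after the word
    if end == -1:                # no space left: the word runs to the end
        end = len(text)
    return text[start:end]

x = "Google is king of Silicon Valley ... but Apple was there first"
print(word_after(x, "but "))     # Apple
print(word_after(x, "there "))   # first
print(word_after(x, "and "))     # None
```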

### **The `split` and `join` Methods**
The `split` method is used to break a string apart into a list of substrings. 

### **`split` 和 `join` 方法**
`split` 方法可以将字符串拆分成一个由子字符串构成的列表. 

In [5]:
'Every Good Boy Does Fine'.split(' ')

['Every', 'Good', 'Boy', 'Does', 'Fine']

The method looks for the given separator (here given as the space character `' '`), chopping the string at each occurrence.

The `join` method does the reverse of `split`:

这种“方法”会查找给定的分隔符，并将字符串在分隔符处断开。上图的例子中，子字符串空格 `’ ’` 就是分隔符。

`join` 方法的功能则与 `split` 相反:

In [6]:
' '.join(['Every', 'Good', 'Boy', 'Does', 'Fine'])

'Every Good Boy Does Fine'

It concatenates a sequence of strings (`['Every', 'Good', 'Boy', 'Does', 'Fine']`) into a single string `'Every Good Boy Does Fine'`. The separator string (`' '`) is inserted between each pair of items in the sequence.

> **Heads Up:** Note that the `join` method is called on the *separator* this time. That is exactly backwards from the `split` method. 

`join` 可以把一连串字符串 (`['Every', 'Good', 'Boy', 'Does', 'Fine']`) 合并为一个单独的字符串 `'Every Good Boy Does Fine'`。分隔符(`' '`) 插入到每两个子字符串之间，将它们顺序串联起来.

> **注意** 这次 `join` 方法是在*分隔符*上调用的，这与 `split` 方法的调用方式正好相反. 
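Two behaviors of `split` and `join` are easy to trip over, so here is a small sketch with a sample string of our own. Splitting on `' '` keeps an empty string wherever two spaces sit side by side, while calling `split()` with no argument splits on any run of whitespace. And the separator given to `join` does not have to match the one used to split.

```python
spaced = "Every  Good Boy"            # note the two spaces after "Every"

by_space = spaced.split(" ")          # chops at every single space
by_whitespace = spaced.split()        # treats any run of whitespace as one break

print(by_space)        # ['Every', '', 'Good', 'Boy']
print(by_whitespace)   # ['Every', 'Good', 'Boy']

# join is called on the separator and takes the list as its argument.
print("-".join(by_whitespace))        # Every-Good-Boy
```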

### **Pulse Check ...**
**Write an expression to count the number of sentences in the Gettysburg address.** Assume that each sentence ends with a period. It is possible to use just one expression (without a statement), though it's okay to use an assignment statement and an expression. Any more than that is just verbose!

> Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.
>
>But, in a larger sense, we can not dedicate—we can not consecrate—we can not hallow—this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us—that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion—that we here highly resolve that these dead shall not have died in vain—that this nation, under God, shall have a new birth of freedom—and that government of the people, by the people, for the people, shall not perish from the earth.

### **脉冲检查 ...**
**写一个表达式，来计算下面这篇林肯的葛底斯堡演说共有多少句话**。假设每句话都以句号结尾，我们就可以不写语句、只用一个表达式完成计数。当然用一个赋值语句加一个表达式也可以实现功能，但是多余的语句会让代码变得冗长!

>八十七年前，我们先辈在这个大陆上建立了一个新国家，它孕育于自由之中，奉行一切人生来平等。现在，我们正处于一场伟大的内战之中，这场战争考验着这个国家，或者任何一个如此构想、如此献身的国家，是否能够长久地存在下去。我们在这场战争的战场上相遇。我们是来献祭这片战场的一部分 作为那些为了国家而献身的人们的最终安息之地，我们这样做完全合理。广而言之，我们无法奉献/献祭--无法神圣化这片土地。在这里奋斗过的勇士们，无论是生是死，都已将这片土地奉献给了我们，远非我们微薄之力所能增减。世人不会注意到，也不会长久记住我们在这里所说的话，但却永远不会忘记他们在这里所做的一切。倒是我们这些活着的人，要在这里献身于他们在这里战斗过的人迄今为止如此崇高地推进的未竟事业。我们更应该在此献身于摆在我们面前的伟大任务--从这些光荣的逝者身上，我们更加坚定地献身于他们为之奉献了最后全部心血的事业--我们在此下定决心，这些逝者不会白白牺牲--在上帝的庇佑下，这个国家将迎来自由的新生--民有、民治、民享的政府不会从地球上消失.

In [None]:
# YOUR CODE HERE 在此输入你的代码


In [None]:

'''
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

But, in a larger sense, we can not dedicate—we can not consecrate—we can not hallow—this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us—that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion—that we here highly resolve that these dead shall not have died in vain—that this nation, under God, shall have a new birth of freedom—and that government of the people, by the people, for the people, shall not perish from the earth.
'''.count(".")

In [None]:

'''
八十七年前，我们先辈在这个大陆上建立了一个新国家，它孕育于自由之中，奉行一切人生来平等。现在，我们正处于一场伟大的内战之中，这场战争考验着这个国家，或者任何一个如此构想、如此献身的国家，是否能够长久地存在下去。我们在这场战争的战场上相遇。我们是来献祭这片战场的一部分 作为那些为了国家而献身的人们的最终安息之地，我们这样做完全合理。广而言之，我们无法奉献/献祭--无法神圣化这片土地。在这里奋斗过的勇士们，无论是生是死，都已将这片土地奉献给了我们，远非我们微薄之力所能增减。世人不会注意到，也不会长久记住我们在这里所说的话，但却永远不会忘记他们在这里所做的一切。倒是我们这些活着的人，要在这里献身于他们在这里战斗过的人迄今为止如此崇高地推进的未竟事业。我们更应该在此献身于摆在我们面前的伟大任务--从这些光荣的逝者身上，我们更加坚定地献身于他们为之奉献了最后全部心血的事业--我们在此下定决心，这些逝者不会白白牺牲--在上帝的庇佑下，这个国家将迎来自由的新生--民有、民治、民享的政府不会从地球上消失.
'''.replace("。", ".").count(".")   # treat full-width periods as sentence endings too 全角句号也算作句末

---
## **Before you go ... Save your notebook to be sure it is up to date.**

---
## **离开前，确保你保存了最新的笔记本.**

---
> ## Every Tee Shirt Has a Story
> ABOUT THE 1989 GENETIC ALGORITHMS CONFERENCE   
> This shirt has some significance for me. It marked when I started to think a little differently about the world around me. I had just finished my masters studies in an area now called evolutionary computation (i.e., solving math problems with simulated sexual reproduction) but then simply called Genetic Algorithms. At the conference I got a prime speaking spot, with several kinda famous names in the audience and my overhead slides projected 15 feet tall behind me. I was going all out. I'd even paid to get the slides printed in color! Up on stage I felt like General [Patton](https://www.hollywood.com/general/patton-movie-stills-57256250/#/ms-1915/1) lecturing to the troops. 
>
>I got about halfway through my slides, just about the point where I was about to quote John Holland, the founder of the field, when I spotted him sitting in the first row, watching intently. I didn't expect that. I froze for about 20 seconds, or at least that's what it felt like; I have no idea. I guess he figured out what happened because he motioned for me to continue with my quote. Afterwards he bought me lunch. He turned out to be a very nice and patient man, a true educator as well as a world famous scientist.
>
> The conference tee shirt -- we all wore it in the group photo -- features binary code evolving into the words "Genetic Algorithms." It's held up pretty well for being over 30 years old.        

![L6 Tee Front](./Photos/L06_TeeFront.jpeg)

## Copyright &copy; 2020 Christopher Huntley. All rights reserved. 

---
> ## 每件t恤都有一个故事
> 1989遗传算法会议   
> 这件T恤对我有特殊意义，标志着我对周遭世界的看法发生了变化。当时我刚硕士毕业，学的专业放到现在应该叫“进化计算”，也就是用计算机模拟自然生物繁殖，从而解决数学问题。当时这个专业叫遗传算法。在遗传算法会议上，我有幸获得了一个重要的演讲机会，站在15英尺高的幻灯片前，当着数位业界大牛的面发言。我全力以赴，甚至自费彩打幻灯片！当时我觉得自己就像是好莱坞影片中的巴顿将军，对着下面的士兵发言. 
>
>我讲到一半，刚想引用遗传算法之父约翰·霍兰德（John Holland）的一句话，就瞥到他坐在第一排，意味深长地看着我。我完全没想到他会在现场。我感觉自己在台上愣了快半分钟，大脑完全宕机。他应该知道我在讲什么，因为在我引用他的话时，他朝我动了动。演讲结束后，霍兰德邀我共进午餐。我才发现他人非常好，很有耐心，不仅是个举世闻名的科学家，也是一位名副其实的教育家.
>
> 我们在照片里都穿着大会的文化衫，上面是由二进制数写的“基因算法”。三十多年来，这件T恤仍旧保存完好.        

