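# Lecture 3: Python Functions and File Operations

## 1. Functions

Functions are the most important and most basic way to organize and reuse code in Python.

* If you need the same or very similar code more than once, it is well worth writing a reusable function.

* Grouping a block of code with a specific purpose into a function and giving it a name makes the code easier to read.

Functions are declared with the def keyword and return values with the return keyword.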

In [1]:
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

Calling the function:

In [2]:
my_function(5, 6, z=0.7)

0.06363636363636363

* Positional arguments
* Keyword arguments

In [3]:
my_function(3.14, 7, 3.5)

35.49

In [4]:
my_function(10, 20)

45.0

In [5]:
my_function(z=0.7, x=5, y=6)

0.06363636363636363

In [6]:
my_function(0.7, 5, 6)

34.2

In [7]:
my_function(z=0.7, y=6, x=5)

0.06363636363636363

* Keyword arguments must come after positional arguments.

In [8]:
my_function(z=0.7, w=5, y=6)

TypeError: my_function() got an unexpected keyword argument 'w'

In [None]:
my_function(z=0.7, 5, y=6)

In [None]:
my_function(5, z=0.7, y=6)
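Keyword arguments must follow positional ones, and Python checks this when the code is compiled, before the function is ever called. The following is a minimal sketch that reuses my_function from these notes; the helper name `is_valid_call` is an assumption made for illustration only.

```python
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)


def is_valid_call(source):
    """Hypothetical helper: True if `source` compiles, False on a syntax error."""
    try:
        compile(source, '<example>', 'eval')
    except SyntaxError:
        return False
    return True


# A positional argument placed after a keyword argument is rejected at compile time.
bad_call_ok = is_valid_call('my_function(z=0.7, 5, y=6)')

# Keyword arguments after the positional ones are accepted, in any order.
good_call_ok = is_valid_call('my_function(5, z=0.7, y=6)')
result = my_function(5, z=0.7, y=6)

print(bad_call_ok, good_call_ok, result)
```

Because the first call never compiles, the error is a SyntaxError rather than the TypeError raised for an unknown keyword such as w.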

If the end of a function is reached without encountering a return statement, None is returned automatically.

In [None]:
def func_abs(x, y):
    if(x>y):
        return x-y

In [None]:
func_abs(6,5)

In [None]:
ret = func_abs(6,7)

In [None]:
type(ret)
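The cells above can be condensed into one short sketch that makes the implicit None visible. It repeats func_abs from the notes so that it runs on its own.

```python
def func_abs(x, y):
    if x > y:
        return x - y
    # When x <= y no return statement is reached, so the call evaluates to None.


ret_positive = func_abs(6, 5)
ret_missing = func_abs(6, 7)

print(ret_positive)                 # 1
print(ret_missing is None)          # True
print(type(ret_missing).__name__)   # NoneType
```

Comparing with `is None` is the usual way to detect this case, since None is a single shared object.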

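### 1.1 Namespaces, Scope, and Local Functions

A function can access variables in two scopes:
* Global
* Local

Variable scope  <-->  namespace

Inside a function, variables are assigned to the local namespace by default.
* The local namespace is created when the function is called; it is normally destroyed once the function finishes executing.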

In [10]:
def func():
    a_list_in = []
    
    for i in range(5):
        a_list_in.append(i)
        
    print('printed in function: ', a_list_in)

In [11]:
func()

printed in function:  [0, 1, 2, 3, 4]


In [12]:
a_list_in

NameError: name 'a_list_in' is not defined

In [None]:
b_list_out = [-1]

def func2():
    for i in range(5):
        b_list_out.append(i)
        
    print('printed in function:', b_list_out)

In [None]:
func2()

In [None]:
b_list_out

* Global variables

In [13]:
def func3():
    global c_list_global
    c_list_global = []
    
    for i in range(5):
        c_list_global.append(i)
        
    print('printed in function:',c_list_global )

In [14]:
func3()

printed in function: [0, 1, 2, 3, 4]


In [15]:
c_list_global

[0, 1, 2, 3, 4]

Another example:

In [None]:
var = 5

def func4():
    var = 10
    return var

In [None]:
func4()

In [None]:
var 

In [None]:
print('id outside:', id(var))

def func5():
    var = 10
    print('id inside:', id(var))
    return var

In [None]:
func5()

In [None]:
def func6():
    var2 = var + 100
    print('id inside:', id(var))
    return var2

In [None]:
func6()

In [None]:
def func7():
    var +=1
    return var

In [None]:
func7()

In [None]:
def func8():
    global var
    var += 1
    return var

In [None]:
func8()

In [None]:
id(var)

In [None]:
var

In [None]:
func8()

In [None]:
id(var)

In [None]:
def func9():
    var2 = var + 100
    print('id inside:', id(var))
    return var2

In [None]:
func9()
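The func4 to func9 cells differ in one respect: whether the function only reads var or also assigns to it. The sketch below puts the three cases side by side. The function names are chosen for illustration and are not part of the original notes.

```python
var = 5


def read_only():
    # Only reads `var`, so the name resolves to the global variable.
    return var + 100


def rebind_local():
    # The augmented assignment makes `var` local to the whole function body,
    # so reading it fails before any local value has been bound.
    var += 1
    return var


def rebind_global():
    global var  # assignments now target the module-level name
    var += 1
    return var


first_read = read_only()

try:
    rebind_local()
except UnboundLocalError as exc:
    error_name = type(exc).__name__

after_global = rebind_global()

print(first_read, error_name, after_global, var)
```

Python decides whether a name is local when the function is compiled, which is why rebind_local fails even though a global var exists.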

### 1.2 Returning Multiple Values

In [None]:
def f_multi_ret():
    v1 = 5
    v2 = 6
    v3 = 7
    return v1, v2, v3

a1, a2, a3 = f_multi_ret()

In [None]:
print(a1, a2, a3)

* A function with multiple return values actually returns a single tuple object.
* Do you remember tuple unpacking?
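As a brief refresher on unpacking, the sketch below shows the returned tuple being kept whole, unpacked name by name, and unpacked with a starred target. The function body is shortened from the one in the notes.

```python
def f_multi_ret():
    return 5, 6, 7  # the commas build a tuple; parentheses are optional


packed = f_multi_ret()          # keep the tuple as one object
a1, a2, a3 = packed             # plain unpacking: one name per element
first, *rest = f_multi_ret()    # starred unpacking collects the remainder in a list

print(type(packed).__name__, a1, a2, a3, first, rest)
```

Plain unpacking requires exactly as many names as elements; the starred form is useful when only the leading values matter.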

In [None]:
a_tuple = f_multi_ret()

In [None]:
a_tuple

In [None]:
def f_return_dict():
    a = 5
    b = 6
    c = 7
    return {'a' : a, 'b' : b, 'c' : c}

In [None]:
a_dict = f_return_dict()

In [None]:
a_dict

In [None]:
type(a_dict)

### 1.3 Functions Are Objects

* In Python, functions are objects too.

In [None]:
import re  # regular expressions

def clean_strings(strings):
    result = []
    
    for value in strings:
        value = value.strip()  # remove leading and trailing whitespace
        value = re.sub('[!#?]', '', value)  # remove the characters !, # and ?
        value = value.title()  # title-case: each word starts uppercase, the rest lowercase
        result.append(value)
        
    return result

In [None]:
type(clean_strings)

In [None]:
id(clean_strings)

In [None]:
states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south   carolina##', 'West virginia?']

In [None]:
clean_strings(states)

* A function can be passed as an argument to another function.

In [None]:
def clean_strings_v2(strings, ops):
    result = []
    
    for value in strings:
        for function in ops:
            value = function(value)
            #print(function.__name__)  # print the function's name
            
        result.append(value)
    return result

In [None]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

In [None]:
clean_ops = [str.strip, remove_punctuation, str.title]

In [None]:
clean_strings_v2(states, clean_ops)

In [None]:
def remove_multi_space(value):
    str_ret = ' '.join(value.split())
    return str_ret

In [None]:
clean_ops2 = [str.strip, remove_punctuation, str.title, remove_multi_space]

In [None]:
clean_strings_v2(states, clean_ops2)

In [None]:
for x in map(remove_punctuation, states):
    print(x)

In-class exercise:
  Write code equivalent to the block above.

### 1.4 Anonymous (Lambda) Functions

In [None]:
def short_function(x):  
    return x * 2

equiv_anon = lambda x: x * 2

In [None]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

In [None]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

In [None]:
strings.sort(key=lambda x: len(set(list(x))))
strings

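### 1.5 Closures

* When a function is defined (nested) inside another function and the inner function refers to variables of the outer function, a closure may be created.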

In [None]:
def Maker(step):
    num  = 1
    
    def fun1():
        nonlocal num  # lets the inner function rebind a local variable of the enclosing function
        num = num +step
        print(num)
        
    return fun1

In [None]:
func_maker = Maker(3)

In [None]:
func_maker()

In [None]:
j = 1

while(j<5):
    func_maker()
    j += 1

* The main use of a closure: keeping local state alive after the enclosing function has returned.
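To make the preserved state visible, the sketch below builds two independent closures and inspects what one of them has captured. The names `make_counter` and `bump` are illustrative assumptions; the structure mirrors Maker from the notes but returns the value instead of printing it.

```python
def make_counter(step):
    total = 1

    def bump():
        nonlocal total          # rebind the enclosing function's variable
        total = total + step
        return total

    return bump


counter_a = make_counter(3)
counter_b = make_counter(10)    # a separate closure with its own `total`

values_a = [counter_a() for _ in range(3)]
values_b = [counter_b() for _ in range(2)]

# Each captured variable lives in a cell object attached to the inner function;
# the cells line up with the names listed in __code__.co_freevars.
captured = dict(zip(counter_a.__code__.co_freevars,
                    (cell.cell_contents for cell in counter_a.__closure__)))

print(values_a, values_b, captured)
```

Each call to make_counter creates a fresh local namespace, so the two counters never interfere with each other.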

### 1.6 Currying: Partial Argument Application

Currying here means deriving new functions from existing ones by partial argument application.

In [None]:
def add_numbers(x, y):
    return x + y

In [None]:
add_five = lambda y: add_numbers(5, y)

In [None]:
add_five(6)

The partial function in the functools module can also be used for this:

In [None]:
from functools import partial
add_five = partial(add_numbers, 5)

In [None]:
def add_num(x, y, z):
    return x + 2*y + 3*z

Using keyword arguments:

In [None]:
add_curry = partial(add_num, x=5)

In [None]:
add_curry(y=2, z=5)
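One detail worth knowing when fixing an argument by keyword: the remaining arguments should then also be passed by keyword. The sketch below reuses add_num from the notes; the name `add_from_five` is an illustrative assumption.

```python
from functools import partial


def add_num(x, y, z):
    return x + 2 * y + 3 * z


add_curry = partial(add_num, x=5)

ok = add_curry(y=2, z=5)        # 5 + 2*2 + 3*5

try:
    add_curry(2, 5)             # 2 is bound to x positionally and clashes with x=5
except TypeError:
    clash = True
else:
    clash = False

# Fixing the leading argument positionally leaves y and z free, in order.
add_from_five = partial(add_num, 5)
also_ok = add_from_five(2, 5)

print(ok, clash, also_ok)
```

Positional arguments given to the partial object are appended after the stored positional ones, which is why the second form accepts y and z positionally.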

### 1.7 Errors and Exception Handling

* Exception handling is an important part of building robust programs.

In [None]:
float('1.2345')
float('something')

Handling exceptions with try/except:

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

In [None]:
attempt_float('1.2345')

In [None]:
attempt_float('something')

In [None]:
float((1, 2))

Handling only a specific type of error:

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [None]:
attempt_float((1, 2))

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

In [None]:
attempt_float((1,2))

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except TypeError:
        return 'type error'
    except ValueError:
        return 'value error'

In [None]:
attempt_float('string')

In [None]:
attempt_float((1,2))

Code in a finally block runs whether or not an error is raised:

In [None]:
def open_file(path):
    f = open(path, 'w')

    try:
        write_to_file(f)
    finally:
        f.close()

In [None]:
def open_file(path):
    f = open(path, 'w')

    try:
        write_to_file(f)
    except:
        print('Failed')
    else:
        print('Succeeded')
    finally:
        f.close()
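The two open_file cells above call write_to_file, which these notes do not define. The sketch below supplies a hypothetical stand-in so that the order of the except, else, and finally branches can be observed. The names `write_with_report`, `fail`, and `events` are assumptions made for illustration.

```python
import os
import tempfile


def write_to_file(f, fail=False):
    # Hypothetical stand-in: the notes call write_to_file without defining it.
    if fail:
        raise ValueError('simulated write failure')
    f.write('hello\n')


def write_with_report(path, fail=False):
    events = []
    f = open(path, 'w')
    try:
        write_to_file(f, fail=fail)
    except ValueError:
        events.append('failed')      # runs only if the try block raised
    else:
        events.append('succeeded')   # runs only if it did not raise
    finally:
        f.close()                    # runs in both cases
        events.append('closed')
    return events


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, 'demo.txt')
    ok_events = write_with_report(target)
    bad_events = write_with_report(target, fail=True)

print(ok_events, bad_events)
```

Catching a named exception type, as here, avoids the bare except clause, which would also swallow unrelated errors such as KeyboardInterrupt.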

#### Exception Handling in IPython

Error traceback (call stack):

In [None]:
In [10]: %run examples/ipython_bug.py
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/home/wesm/code/pydata-book/examples/ipython_bug.py in <module>()
     13     throws_an_exception()
     14
---> 15 calling_things()

/home/wesm/code/pydata-book/examples/ipython_bug.py in calling_things()
     11 def calling_things():
     12     works_fine()
---> 13     throws_an_exception()
     14
     15 calling_things()

/home/wesm/code/pydata-book/examples/ipython_bug.py in throws_an_exception()
      7     a = 5
      8     b = 6
----> 9     assert(a + b == 10)
     10
     11 def calling_things():

AssertionError:

In [None]:
def works_fine():
    a = 5
    b = 6
    assert(a + b == 11)

def throws_an_exception():
    a = 5
    b = 6
    assert(a + b == 10)

def calling_things():
    works_fine()
    throws_an_exception()

calling_things()


In [None]:
%xmode Plain
#%xmode Context
#%xmode Verbose

In [None]:
def works_fine():
    a = 5
    b = 6
    assert(a + b == 11)

def throws_an_exception():
    a = 5
    b = 6
    assert(a + b == 10)

def calling_things():
    works_fine()
    throws_an_exception()

calling_things()


## 2. File Operations

### 2.1 Opening, Reading, and Writing Files

To open a file for reading or writing, use the built-in function open.

In [None]:
ab_path = 'E:/1_Teaching/Python_Jupyter/pydata-book-2nd-edition/examples/segismundo.txt'
f = open(ab_path)

In [None]:
f.close()

In [None]:
ab_path = 'E:/1_Teaching/Python_Jupyter/pydata-book-2nd-edition/examples/segismundo2.txt'
f = open(ab_path)
f.close()

In [None]:
re_path = 'examples/segismundo.txt'
f = open(re_path)
f.close()

* Relative and absolute paths

In [None]:
%pushd

In [None]:
%pwd

* Handling errors when opening a file

In [None]:
path = 'examples/segismundo2.txt'

try:
    f = open(path)
except:
    print("Failed to open file {0}!".format(path))
else:
    print("File opened!")

In [None]:
f.close()

In [None]:
f.name

In [None]:
path = 'examples/segismundo.txt'

In [None]:
def open_file(path, verbose=False):
    try:
        f = open(path)
    except OSError:
        print("Failed to open file {0}!".format(path))
        return None  # no file object exists if open() failed
    else:
        if verbose:
            print("File opened!")

    return f

Once opened, the file object f can be treated like a list, iterating over the lines of the file.

In [None]:
for line in open_file(path):
    print(line)

In [None]:
lines = [x.rstrip() for x in open_file(path)]
lines

Closing the file:

In [None]:
f.close()

* With the with ... as statement, the file is closed automatically when the with block ends.

In [None]:
with open(path) as f:
    lines = [x.rstrip() for x in f]
    
lines

In [None]:
f.closed
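The automatic close also happens when the with block is left through an exception. The sketch below checks this on a temporary file so that it does not depend on the examples directory; the file name and its contents are assumptions made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.txt')

    # Create a small file to read back.
    with open(demo_path, 'w', encoding='utf-8') as out:
        out.write('line one  \nline two\n')

    try:
        with open(demo_path, encoding='utf-8') as f:
            lines = [x.rstrip() for x in f]
            raise RuntimeError('simulated failure inside the with block')
    except RuntimeError:
        pass

    # The name f still exists after the block, but the file behind it is closed.
    closed_after_error = f.closed

print(lines, closed_after_error)
```

This is the main reason to prefer with over a manual f.close(): no code path can skip the cleanup.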

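### 2.2 Python File Modes
* r : read-only mode
* w : write-only mode; creates a new file (erasing the data in any existing file at that path)
* x : write-only mode; creates a new file (fails if a file with the same name already exists)
* a : append to an existing file (the file is created if it does not exist)
* r+: read and write mode
* b : binary mode; added to another mode, e.g. rb, wb
* t : text mode (bytes are automatically decoded to Unicode); this is the default when neither b nor t is specified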
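The differences between x, a, and w are easiest to see by applying them to the same file in turn. The sketch below does this in a temporary directory; the file name `modes.txt` is an assumption made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'modes.txt')

    with open(demo_path, 'x') as f:      # creates the file; it must not exist yet
        f.write('first\n')

    try:
        open(demo_path, 'x')             # the file exists now, so x refuses
    except FileExistsError:
        x_refused = True
    else:
        x_refused = False

    with open(demo_path, 'a') as f:      # a keeps existing data and adds to the end
        f.write('second\n')
    with open(demo_path) as f:           # default mode is rt
        after_append = f.read()

    with open(demo_path, 'w') as f:      # w truncates the file before writing
        f.write('third\n')
    with open(demo_path) as f:
        after_write = f.read()

    with open(demo_path, 'rb') as f:     # binary mode returns bytes, not str
        raw = f.read()

print(x_refused, repr(after_append), repr(after_write), type(raw).__name__)
```

Mode x is a safe default for output that must never overwrite earlier results.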

In [None]:
with open(path, 'rb') as f:
    lines = [x for x in f]
    
lines    

Reading a given number of characters:

In [None]:
f = open(path)
f.read(10)

Reading raw bytes in binary mode:

In [None]:
f2 = open(path, 'rb')  # Binary mode
f2.read(10)

In [None]:
f2 = open(path, 'rb')  # Binary mode
f2.read(10).decode('utf-8')

In [None]:
f.tell()

In [None]:
f2.tell()

In [None]:
f.seek(3)
f.read(1)

In [None]:
with open('tmp.txt', 'w') as handle:
    handle.writelines(x for x in open(path) if len(x) > 1)
    
with open('tmp.txt') as f:
    lines = f.readlines()
lines

In [None]:
import os
os.remove('tmp.txt')

### 2.3 Character Encodings

* ASCII
* Extended ASCII
* Unicode
* UTF-8 (8-bit Unicode Transformation Format)
* GBK
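The encodings listed above map the same characters to different byte sequences, and decoding with the wrong one either fails or yields garbled text. The sketch below illustrates this with two short strings chosen for illustration; it does not read any file from the notes.

```python
text = '中文'

utf8_bytes = text.encode('utf-8')   # 3 bytes per character for these two
gbk_bytes = text.encode('gbk')      # 2 bytes per character

# Decoding with the matching encoding restores the original string.
round_trip_utf8 = utf8_bytes.decode('utf-8')
round_trip_gbk = gbk_bytes.decode('gbk')

# ASCII has no code for these characters, so encoding fails.
try:
    text.encode('ascii')
except UnicodeEncodeError:
    ascii_failed = True
else:
    ascii_failed = False

# Decoding UTF-8 bytes as latin1 never fails, but produces the wrong characters.
enye = 'ñ'.encode('utf-8')
garbled = enye.decode('latin1')

print(len(utf8_bytes), len(gbk_bytes), ascii_failed, enye, garbled)
```

When open is called without an encoding argument, Python uses a platform-dependent default, which is why passing encoding explicitly is the more portable choice.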

In [None]:
open??

In [None]:
import sys
sys.getdefaultencoding()

In [None]:
sys.stdin.encoding

In [None]:
sys.stdout.encoding

Back to this example again:

In [None]:
lines = [x for x in open(path)]
lines

In [None]:
open?

In [None]:
print([hex(ord(x)) for x in lines[0]])

In [None]:
with open(path, 'rb') as f:
    lines = [x for x in f]
    
lines
    

In [None]:
lines = [x.encode('gbk').decode('utf-8').rstrip() for x in open_file(path)]
lines

Another solution:

In [None]:
with open(path, 'r', encoding='utf-8') as f:
    lines = [x for x in f]
    
lines

In [None]:
with open(path, 'rb') as f:
    lines = [x.decode('utf-8').rstrip() for x in f]
    
lines

In [None]:
path_ch = 'examples/chinese.txt'  # defined here so this cell does not raise NameError
lines = [x for x in open_file(path_ch)]
lines

In [None]:
path_ch = 'examples/chinese.txt'
lines = [x for x in open(path_ch,'r',encoding='utf-8')]
lines

In [None]:
with open(path,'rb') as f:
    data = f.read(10)
data

In [None]:
data.decode('utf8')


In [None]:
data[:4].decode('utf8')

In [None]:
sink_path = 'sink.txt'

with open(path, encoding='utf-8') as source:
    with open(sink_path, 'wt', encoding='latin1') as sink:
        sink.write(source.read())
        
with open(sink_path, encoding='latin1') as f:
    print(f.read(10))

Be careful when using seek in any mode other than binary.

In [None]:
f = open(path,encoding='utf-8')
f.read(5)
f.seek(4)
f.read(1)
f.close()
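The reason for the caution is that seek takes a byte offset, while read in text mode counts characters. The sketch below reproduces the problem on a temporary UTF-8 file so that it does not depend on the examples directory; the file contents are an assumption made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.txt')
    with open(demo_path, 'wb') as out:
        out.write('Sueña el rico\n'.encode('utf-8'))

    with open(demo_path, 'rb') as fb:
        raw_head = fb.read(6)        # 6 bytes hold the 5 characters 'Sueña'

    with open(demo_path, encoding='utf-8') as ft:
        head = ft.read(5)            # text mode counts characters, not bytes

        ft.seek(3)                   # byte offset 3 is the first byte of 'ñ'
        recovered = ft.read(1)

        ft.seek(4)                   # byte offset 4 is the second byte of 'ñ'
        try:
            ft.read(1)
        except UnicodeDecodeError:
            seek_failed = True
        else:
            seek_failed = False

print(raw_head, head, recovered, seek_failed)
```

In text mode it is safest to pass seek only 0 or a value previously returned by tell on the same file.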