# Chapter 7 Pattern Matching and Regular Expressions

## How to Match

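1. import re
2. Create a regex object with re.compile()
3. Pass the string you want to search to the regex's search() method, which returns a Match object (or None if nothing matches)
4. Call the Match object's group() method to get the text that actually matched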
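
The four steps above can be sketched as one small helper. This is only an illustration: the name find_phone is hypothetical, and the None check is added because search() returns None when nothing matches, in which case calling group() would raise AttributeError.

```python
import re

# Hypothetical helper, not from the notes: wraps the four steps above.
PHONE_REGEX = re.compile(r'\d{3}-\d{3}-\d{4}')   # step 2: compile the pattern

def find_phone(text):
    """Return the first phone number found in text, or None if there is none."""
    mo = PHONE_REGEX.search(text)   # step 3: a Match object, or None
    if mo is None:                  # guard before calling group()
        return None
    return mo.group()               # step 4: the matched text

print(find_phone('My number is 415-555-4242'))
print(find_phone('No number here'))
```

Checking for None first is a common defensive habit whenever the input is not guaranteed to contain a match.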

In [3]:
import re

phone_num_reg = re.compile(r'\d{3}-\d{3}-\d{4}')
mo = phone_num_reg.search('My number is 415-555-4242')
# print(mo)
print(f'phone number found: {mo.group()}')

phone number found: 415-555-4242


### Grouping with Parentheses

In [8]:
phone_num_reg = re.compile(r'(\d{3})-(\d{3}-\d{4})')
mo = phone_num_reg.search('My number is 415-555-4242')
# print(mo)
print(f'phone number found: {mo.group()}')

phone number found: 415-555-4242


In [9]:
mo.group(1)

'415'

In [10]:
mo.group(2)

'555-4242'

In [11]:
mo.groups()

('415', '555-4242')

### Matching Multiple Alternatives with the Pipe

In [12]:
hero_reg = re.compile(r'Batman|Tina Fey')
mo1 = hero_reg.search('Batman and Tina Fey')

In [13]:
mo1.group()

'Batman'

In [19]:
type(mo1)

re.Match

In [14]:
mo2= hero_reg.search('Tina Fey and Batman')

In [15]:
mo2.group()

'Tina Fey'

In [16]:
mo3 = hero_reg.findall('Batman and Tina Fey')

In [17]:
mo3

['Batman', 'Tina Fey']

In [18]:
type(mo3)

list

In [21]:
bat_reg = re.compile(r'Bat(man|mobile|copter|bat)')
mo = bat_reg.search('Batmobile lost a wheel')

In [22]:
mo

<re.Match object; span=(0, 9), match='Batmobile'>

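### ? matches zero or one time

### * matches zero or more times

### + matches one or more times

### {} matches a specific number of times

{min,max} is a closed interval: any repetition count from min through max matches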
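
A short sketch of the four quantifiers, using made-up sample strings that are not part of the notes:

```python
import re

# ? : the group is optional (zero or one time)
optional = re.compile(r'Bat(wo)?man')
print(optional.search('Batman').group())
print(optional.search('Batwoman').group())

# * : zero or more times
star = re.compile(r'Bat(wo)*man')
print(star.search('Batwowowoman').group())

# + : one or more times, so plain 'Batman' no longer matches
plus = re.compile(r'Bat(wo)+man')
print(plus.search('Batman'))

# {n} : exactly n times
exact = re.compile(r'(Ha){3}')
print(exact.search('HaHaHa').group())
print(exact.search('HaHa'))
```

The two searches that print None show the lower bound at work: + and {3} both require a minimum number of repetitions.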

### Greedy and Non-greedy Matching: a Trailing ? Makes a Quantifier Non-greedy

In [24]:
greedy_ha_reg = re.compile(r'(ha){3,5}')
mo = greedy_ha_reg.search('hahahahahaha')

In [26]:
mo.group()

'hahahahaha'

In [35]:
nogreedy_ha_reg = re.compile(r'(ha){3,5}?')
mo1 = nogreedy_ha_reg.search('hahahahahaha')

In [36]:
mo1.group()

'hahaha'

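### Character Classes

1. \d  any digit from 0 to 9
2. \D  any character that is not a digit
3. \w  any letter, digit, or underscore
4. \W  any character that \w does not match
5. \s  a space, tab, or newline
6. \S  any character that \s does not match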
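
A minimal sketch of the shorthand classes in use. The sample strings are invented for illustration:

```python
import re

# \d+ : one or more digits, \s : one whitespace character, \w+ : one or more word characters
count_reg = re.compile(r'\d+\s\w+')
found = count_reg.findall('12 drummers, 11 pipers, 10 lords, 9 ladies')
print(found)

# \D matches everything that is not a digit, so removing it keeps only the digits
digits_only = re.sub(r'\D', '', '415-555-4242')
print(digits_only)
```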

In [37]:
vowel_reg = re.compile(r'[aeiouAEIOU]')
vowel_reg.findall('RoboCop eats baby food, BABY FOOD')

['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o', 'A', 'O', 'O']

In [38]:
# A ^ right after the opening [ negates the class: it matches any character not listed
consonant_reg = re.compile(r'[^aeiouAEIOU]')
consonant_reg.findall('RoboCop eats baby food, BABY FOOD')

['R',
 'b',
 'C',
 'p',
 ' ',
 't',
 's',
 ' ',
 'b',
 'b',
 'y',
 ' ',
 'f',
 'd',
 ',',
 ' ',
 'B',
 'B',
 'Y',
 ' ',
 'F',
 'D']

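### ^ and $

^ means the match must occur at the start of the string; $ means it must occur at the end; using both means the entire string must match the pattern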

In [40]:
begin_with_hello_reg = re.compile(r'^Hello')
mo = begin_with_hello_reg.search('Nick, Hello！')

In [42]:
mo is None

True

In [43]:
begin_with_hello_reg.search('Hello, world.')

<re.Match object; span=(0, 5), match='Hello'>

In [44]:
end_with_number = re.compile(r'\d+$')
end_with_number.search('nfisdaofhofja233')

<re.Match object; span=(13, 16), match='233'>

In [45]:
mo = end_with_number.search(r'ldsfjla32rfd')

In [47]:
mo is None

True

In [48]:
end_with_number = re.compile(r'\d$')
end_with_number.search('nfisdaofhofja233')

<re.Match object; span=(15, 16), match='3'>

In [49]:
whole_string_is_number = re.compile(r'^\d+$')
whole_string_is_number.search('1921346574')

<re.Match object; span=(0, 10), match='1921346574'>

In [51]:
whole_string_is_number.search('1921346574fdsf') is None

True

### . Matches Any Character Except \n

In [52]:
at_reg = re.compile(r'.at')
at_reg.findall('the cat in the hat sat on a flat mat')

['cat', 'hat', 'sat', 'lat', 'mat']

In [55]:
at_reg = re.compile(r'\w*at')
at_reg.findall('the cat in the hat sat on a flat mat')

['cat', 'hat', 'sat', 'flat', 'mat']

In [58]:
at_reg = re.compile(r'[^\s]*at')
at_reg.findall('the cat in the hat sat on a flat mat')

['cat', 'hat', 'sat', 'flat', 'mat']

### .* Matches Any Run of Characters

### Making . Match Newlines Too with re.DOTALL

In [59]:
no_newline_reg = re.compile(r'.*')
no_newline_reg.search('Serve the public trust.\n Protect the innocent.\n Uphold the law')

<re.Match object; span=(0, 23), match='Serve the public trust.'>

In [60]:
newline_reg = re.compile(r'.*', re.DOTALL)
newline_reg.search('Serve the public trust.\n Protect the innocent.\n Uphold the law')

<re.Match object; span=(0, 62), match='Serve the public trust.\n Protect the innocent.\n>

### Case-insensitive Matching

In [63]:
robocop = re.compile(r'robocop', re.I)
robocop.search('ROBOcoP is part man, part machine, all cop')

<re.Match object; span=(0, 7), match='ROBOcoP'>

## Replacing Strings with sub()

In [64]:
names_reg = re.compile(r'Agent \w+')
names_reg.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob')

'CENSORED gave the secret documents to CENSORED'

In the replacement string passed to sub(), \1, \2, \3 stand for the text matched by groups 1, 2, 3

In [68]:
agent_name_reg = re.compile(r'Agent (\w)\w*')
agent_name_reg.sub(r'\1***', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent')

'A*** told C*** that E*** knew B*** was a double agent'

## Managing Complex Regexes

### re.VERBOSE

In [81]:
phone_num_reg = re.compile(r'''(
    (\d{3}|\(\d{3}\))?     # area code
    (\s|-|\.)?             # separator
    \d{3}                  # first 3 digits
    (\s|-|\.)?             # separator
    \d{4}                  # last 4 digits
    (\s*(ext|x|ext\.)\s*\d{2,5})? # extension
    )''', re.VERBOSE)

In [82]:
mo = phone_num_reg.findall('fdlasjflasdjflajfdl(100)-234-2356 x 22')

In [83]:
mo

[('(100)-234-2356 x 22', '(100)', '-', '-', ' x 22', 'x')]

## Combining re.I, re.DOTALL, and re.VERBOSE

In [87]:
# re.compile() takes a single flags value as its second argument, so combine flags with the bitwise OR operator |
some_reg = re.compile('foo', re.IGNORECASE|re.DOTALL|re.VERBOSE)

## Project: Extracting Phone Numbers and Email Addresses

In [80]:
#! python3
# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.

import pyperclip, re

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # area code
    (\s|-|\.)?                        # separator
    (\d{3})                           # first 3 digits
    (\s|-|\.)                         # separator
    (\d{4})                           # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # extension
    )''', re.VERBOSE)

# TODO: Create email regex.
emailRegex = re.compile(r'''(
    [0-9a-zA-Z._%+-]+   # username
    @                   # @ symbol
    [0-9a-zA-Z.]+       # company name
    \.
    (com|cn|org)        # ending
    )''', re.VERBOSE)

# TODO: Find matches in clipboard text.
txt = pyperclip.paste()
phones =[]
emails = []

for phone in phoneRegex.findall(txt):
    norm_phone = f'{phone[1]}-{phone[3]}-{phone[5]}'
    if phone[8] !='':
        norm_phone +=f' x {phone[8]}'
    phones.append(norm_phone)

for email in emailRegex.findall(txt):
    emails.append(email[0])

# TODO: Copy results to the clipboard.
if len(phones) + len(emails) > 0:
    pyperclip.copy('\n'.join(phones)+'\n'+'\n'.join(emails))
    

In [83]:
test =re.compile(r'''(
    [0-9a-zA-Z._%+-]+   # username
    @                   # @ symbol
    [0-9a-zA-Z.]+       # company name
    \.
    (com|cn|org)# ending
    )''', re.VERBOSE)

In [100]:
test.findall('34323r@adsf.eee.ew.org')

[('34323r@adsf.eee.ew.org', 'org')]

In [101]:
test2 = re.compile(r'''(
    [0-9a-zA-Z._%+-]+   # username
    @                   # @ symbol
    [0-9a-zA-Z.]+       # company name
    \.
    com|cn|org# ending
    )''', re.VERBOSE)

In [102]:
test2.findall('34323r@adsf.eee.ew.org')

['org']

In [103]:
test2.findall('34323r@adsf.eee.ew.cn')

['cn']

In [108]:
test2.findall('34323r@adsf.eee.ew.com.org')

['34323r@adsf.eee.ew.com', 'org']

In [104]:
test2.search('34323r@adsf.eee.ew.org')

<re.Match object; span=(19, 22), match='org'>

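What the pattern above actually means: | has the lowest precedence, so without parentheses around com|cn|org the pattern splits into three alternatives:
+ everything from the username through \.com is the first alternative
+ cn is the second alternative
+ org is the third alternative
+ the three are joined by "or", so a bare cn or org matches on its own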
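
One way to confine the alternatives to the ending is to wrap them in a group. The sketch below is an assumption about the intended behavior, not the notes' final pattern; it uses a non-capturing group (?:...) so that findall() returns plain strings instead of tuples.

```python
import re

# (?:com|cn|org) limits the | to the ending only.
# The group is non-capturing, so findall() returns the whole match as a string.
email_reg = re.compile(r'''
    [0-9a-zA-Z._%+-]+     # username
    @                     # @ symbol
    [0-9a-zA-Z.-]+        # domain name
    \.
    (?:com|cn|org)        # ending
    ''', re.VERBOSE)

full = email_reg.findall('34323r@adsf.eee.ew.org')
bare = email_reg.findall('cn only, org only')
print(full)
print(bare)
```

With the group in place, a bare cn or org no longer matches on its own.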

In [105]:
or3 = re.compile(r'com|cn|org')

In [106]:
or3.findall('23434cn3443')

['cn']

In [107]:
or3.findall('dsjaofjacomerrercn343rorg')

['com', 'cn', 'org']

# Chapter 8 Input Validation

## The pyinputplus Module

In [8]:
while True:
    age = input('please input your age')
    try:
        age = int(age)
    except ValueError:
        print('please input a number')
        continue
    if age < 1:
        print('please input positive number')
        continue
    break
print(f'your age is {age}')

please input your age w


please input a number


please input your age -1


please input positive number


please input your age 23


your age is 23


In [3]:
import pyinputplus as pyip
response = pyip.inputNum()

 five


'five' is not a number.


 5


### Module Functions

+ inputStr()
+ inputNum()
+ inputChoice()
+ inputMenu()
+ inputDatetime()
+ inputYesNo()
+ inputBool()
+ inputEmail()
+ inputFilepath()
+ inputPassword()

In [4]:
response

5

In [5]:
type(response)

int

### The prompt Keyword Argument

In [6]:
response = pyip.inputInt(prompt='Enter your age:')

Enter your age:

 twenty three


'twenty three' is not an integer.
Enter your age:

 23


### The min, max, greaterThan, and lessThan Keyword Arguments

In [10]:
response = pyip.inputNum(prompt='Enter a number', min=3, max=7)

Enter a number

 2


Number must be at minimum 3.
Enter a number

 8


Number must be at maximum 7.
Enter a number

 5


### The blank Keyword Argument

In [12]:
response = pyip.inputNum(blank=True)

  


In [13]:
response

''

### The limit, timeout, and default Keyword Arguments

In [14]:
response = pyip.inputNum(limit=2)

 r


'r' is not a number.


 a


'a' is not a number.


RetryLimitException: 

In [15]:
response = pyip.inputNum(timeout=10)

 5


TimeoutException: 

In [16]:
res = pyip.inputNum(limit=2, default='N/A')

 hellp


'hellp' is not a number.


 world


'world' is not a number.


In [17]:
res

'N/A'

### The allowRegexes and blockRegexes Keyword Arguments

In [1]:
import pyinputplus as pyip

In [4]:
res = pyip.inputNum(allowRegexes=[r'zero'])

 zero


In [5]:
res = pyip.inputNum(allowRegexes=[r'zero', r'(I|V|X|L|C|D|M)'])

 VII


In [5]:
# filter out even number
res = pyip.inputNum(blockRegexes=[r'[02468]$'])

 22


This response is invalid.


 3


+ If a response matches both allowRegexes and blockRegexes, the allow list takes priority

In [3]:
res = pyip.inputStr(allowRegexes=[r'catpillar', 'catgegory'], blockRegexes=[r'cat'])

 cat


This response is invalid.


 catastrophy


This response is invalid.


 catpillar
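
The precedence seen in the session above, where catpillar is accepted even though it contains cat, can be modeled without pyinputplus. The sketch below is a simplified stand-in, not pyinputplus's actual implementation: the function name is_accepted is hypothetical, and using re.search for both lists is an assumption.

```python
import re

def is_accepted(response, allow_regexes=(), block_regexes=()):
    """Simplified model: allow patterns are checked first and win over block patterns."""
    if any(re.search(p, response) for p in allow_regexes):
        return True
    if any(re.search(p, response) for p in block_regexes):
        return False
    return True

allow = [r'catpillar', r'catgegory']   # same patterns as the session above
block = [r'cat']
for word in ['cat', 'catastrophy', 'catpillar', 'dog']:
    print(word, is_accepted(word, allow, block))
```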


### Passing a Custom Validation Function to inputCustom()

In [4]:
def add_up_to_ten(numbers):
    numberList = list(numbers)
    for i, digit in enumerate(numberList):
        numberList[i] = int(digit)
    if sum(numberList) !=10:
        raise Exception('The digits must add up to 10')
    return int(numbers)

res = pyip.inputCustom(add_up_to_ten)
            

 233


The digits must add up to 10


 hello


invalid literal for int() with base 10: 'h'


 46
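
Because the validation function is ordinary Python, it can be exercised without any keyboard input. The sketch below repeats add_up_to_ten in a more compact form, rewritten only for this illustration, and calls it directly.

```python
def add_up_to_ten(numbers):
    """Return numbers as an int if its digits sum to 10, otherwise raise."""
    digits = [int(ch) for ch in numbers]   # int() raises ValueError on non-digits
    if sum(digits) != 10:
        raise Exception('The digits must add up to 10')
    return int(numbers)

print(add_up_to_ten('46'))

messages = []
for bad in ['233', 'hello']:
    try:
        add_up_to_ten(bad)
    except Exception as exc:
        messages.append(str(exc))
        print(f'{bad!r} rejected: {exc}')
```

The text of whichever exception the function raises is what inputCustom() showed as the error message in the session above.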


## Project: Multiplication Quiz

In [3]:
import pyinputplus as pyip, time, random

total_questions = 10
correct_questions = 0

for i in range(total_questions):                    
    num1 = random.randint(0,9)
    num2 = random.randint(0,9)
    prompt = f'#{i+1}: {num1} x {num2} = '
    try:
        ans = pyip.inputStr(prompt, allowRegexes=['^%s$' % (num1*num2)],
                            blockRegexes=[(r'.*', 'Incorrect')],
                           timeout=8, limit=3)
    except pyip.TimeoutException:
        print('Out of time')
    except pyip.RetryLimitException:
        print('Out of tries')
    else:
        print('Correct!')
        correct_questions +=1
        time.sleep(1)
    print(f'\033[1;32mScore: {correct_questions} / {total_questions}\n\033[0m')
        
    

#1: 1 x 3 = 

 3


Correct!
Score: 1 / 10

#2: 4 x 9 = 

 36


Correct!
Score: 2 / 10

#3: 7 x 1 = 

 7


Correct!
Score: 3 / 10

#4: 7 x 5 = 

 35


Correct!
Score: 4 / 10

#5: 7 x 4 = 

 28


Correct!
Score: 5 / 10

#6: 8 x 5 = 

 40


Correct!
Score: 6 / 10

#7: 3 x 1 = 

 3


Correct!
Score: 7 / 10

#8: 0 x 8 = 

 0


Correct!
Score: 8 / 10

#9: 1 x 4 = 

 4


Correct!
Score: 9 / 10

#10: 3 x 2 = 

 6


Correct!
Score: 10 / 10



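Custom rejection messages: an item in blockRegexes can be a (regex, message) tuple, and the message is shown when the regex matches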

In [10]:
res = pyip.inputNum(blockRegexes=[(r'[02468]$', 'we do not want even number')])

 2


we do not want even number


 3


# Chapter 9 Reading and Writing Files

## The pathlib Module

+ Lets you write paths in a way that works on every operating system
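
A small sketch of the same path components rendered for two systems. It uses the pure path classes, which only manipulate strings and never touch the filesystem, so both flavors can be built on any machine.

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ('spam', 'foo', 'eggs')

# The same components, joined with each system's own separator
win_path = PureWindowsPath(*parts)
posix_path = PurePosixPath(*parts)

print(win_path)
print(posix_path)
```

Path itself picks the right flavor for the system the code runs on, which is why the notebook cells in this chapter show WindowsPath.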

In [11]:
from pathlib import Path
myFiles = ['accounts.txt', 'details.csv', 'invite.docx']
for file in myFiles:
    print(Path(r'D:\Code\jupyter', file))

D:\Code\jupyter\accounts.txt
D:\Code\jupyter\details.csv
D:\Code\jupyter\invite.docx


In [12]:
Path('spam', 'foo', 'eggs')

WindowsPath('spam/foo/eggs')

In [13]:
str(Path('spam', 'foo', 'eggs'))

'spam\\foo\\eggs'

### Joining Paths with the / Operator

In [14]:
Path(r'D:\Code') / 'eggs.txt'

WindowsPath('D:/Code/eggs.txt')

In [16]:
Path(r'D:\Code').joinpath('eggs.txt')

WindowsPath('D:/Code/eggs.txt')

### The Current Working Directory

In [17]:
Path.cwd()

WindowsPath('D:/Code/jupyter/automate_python')

In [18]:
import os

In [19]:
os.chdir(r'D:\Code\pilis_dl')

In [20]:
Path.cwd()

WindowsPath('D:/Code/pilis_dl')

In [21]:
os.getcwd()

'D:\\Code\\pilis_dl'

### The Home Directory

In [22]:
Path.home()

WindowsPath('C:/Users/DELL')

### Absolute vs. Relative Paths

### Creating New Folders

In [24]:
os.chdir(r'D:\Code\jupyter\automate_python')

In [25]:
Path.cwd()

WindowsPath('D:/Code/jupyter/automate_python')

In [27]:
os.makedirs('../spam')

In [30]:
Path(r'../spam2').mkdir()

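+ Note: by default Path().mkdir() creates only one folder level at a time, unlike os.makedirs(), which also creates any missing intermediate folders. Passing parents=True to mkdir() gives the same behavior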
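
A minimal sketch of the difference, run inside a temporary directory so that nothing is left behind. The folder names a, b, and c are made up for the demo.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / 'a' / 'b' / 'c'

    try:
        nested.mkdir()                 # fails: 'a' and 'a/b' do not exist yet
    except FileNotFoundError:
        print('mkdir() alone cannot create intermediate folders')

    # parents=True creates a, a/b and a/b/c; exist_ok=True tolerates reruns
    nested.mkdir(parents=True, exist_ok=True)
    created = nested.is_dir()
    print(created)
```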

### Handling Absolute and Relative Paths

In [31]:
Path.cwd()

WindowsPath('D:/Code/jupyter/automate_python')

In [33]:
Path.cwd().is_absolute()

True

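Some useful os.path functions:
+ os.path.abspath(some_path) returns the absolute path of some_path
+ os.path.isabs(some_path) returns True if some_path is an absolute path
+ os.path.relpath(some_path, start) returns the relative path that leads from start to some_path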
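
The three functions can be tried together in a way that does not depend on the operating system. In this sketch the path spam/eggs.txt is hypothetical and does not need to exist, since these functions only work on path strings.

```python
import os

cwd = os.getcwd()
target = os.path.join(cwd, 'spam', 'eggs.txt')   # hypothetical path, need not exist

print(os.path.isabs('spam'))                     # a bare name is relative
print(os.path.isabs(os.path.abspath('spam')))    # abspath() resolves it against the cwd

rel = os.path.relpath(target, cwd)               # the route from cwd to target
print(rel)
```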

In [37]:
os.path.relpath(r'D:\Typora', Path.cwd())

'..\\..\\..\\Typora'

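### Getting the Parts of a File Path

On Windows, a file path is made up of:
+ anchor: the root folder of the filesystem; on Windows it includes the drive
+ parent: the folder that contains the file
+ name: stem + suffix
  
Path objects for Windows paths also have a non-empty drive attribute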
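
A compact sketch of these attributes. It uses PureWindowsPath so that the Windows-style results are the same on any operating system; the path is the one used in the cells of this section.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r'D:\Code\jupyter\spam2\foo.txt')

print(p.anchor)   # drive plus root folder
print(p.drive)    # drive letter only
print(p.parent)   # folder containing the file
print(p.name)     # stem + suffix
print(p.stem)
print(p.suffix)
```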
  

In [43]:
p  = Path(r'D:\Code\jupyter\spam2\foo.txt')

In [45]:
p.anchor

'D:\\'

In [47]:
p.parent

WindowsPath('D:/Code/jupyter/spam2')

In [49]:
p.name

'foo.txt'

In [51]:
p.stem

'foo'

In [52]:
p.suffix

'.txt'

In [53]:
p.parents

<WindowsPath.parents>

In [54]:
p.parents[0]

WindowsPath('D:/Code/jupyter/spam2')

In [55]:
p.parents[1]

WindowsPath('D:/Code/jupyter')

In [56]:
p.parents[2]

WindowsPath('D:/Code')

In [57]:
p.parents[3]

WindowsPath('D:/')

In [59]:
os.path.dirname(r'D:\Code\jupyter\spam2\foo.txt')

'D:\\Code\\jupyter\\spam2'

In [60]:
os.path.basename(r'D:\Code\jupyter\spam2\foo.txt')

'foo.txt'

In [61]:
os.path.split(r'D:\Code\jupyter\spam2\foo.txt')

('D:\\Code\\jupyter\\spam2', 'foo.txt')

In [63]:
calcFilePath = r'D:\Code\jupyter\spam2\foo.txt'
calcFilePath.split(os.sep)

['D:', 'Code', 'jupyter', 'spam2', 'foo.txt']

### Finding File Sizes and Folder Contents

In [64]:
os.path.getsize(r'D:\Code\recipie.py')

5101

In [65]:
os.listdir(r'D:\Code\jupyter')

['.ipynb_checkpoints',
 'automate_python',
 'cv_course(opencv)',
 'data_mining(test)',
 'spam',
 'spam2',
 '实验结果画图.ipynb',
 '新版jupyter测试.ipynb']

In [66]:
os.path.getsize(r'D:\Code\jupyter')

4096

In [67]:
os.path.getsize(r'D:\Typora')

4096

+ For a folder, os.path.getsize() returns the size of the directory entry itself, not the combined size of the files inside it
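
To get the total size of a folder's contents, the file sizes have to be added up. A minimal sketch, assuming a hypothetical helper named folder_size and a temporary folder with two small files created just for the demo:

```python
import os
import tempfile
from pathlib import Path

def folder_size(folder):
    """Sum the sizes of all files under folder, including subfolders."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'sub').mkdir()
    (Path(tmp) / 'a.txt').write_bytes(b'x' * 100)          # 100 bytes
    (Path(tmp) / 'sub' / 'b.txt').write_bytes(b'y' * 50)   # 50 bytes
    total_bytes = folder_size(tmp)
    print(total_bytes)
```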

### Listing Files with Glob Patterns: Path().glob()

In [68]:
p = Path(r'D:\Code\jupyter\spam2')
p.glob('*')

<generator object Path.glob at 0x000001452B3F79A0>

In [69]:
list(p.glob('*'))

[WindowsPath('D:/Code/jupyter/spam2/foo.txt'),
 WindowsPath('D:/Code/jupyter/spam2/hah.csv'),
 WindowsPath('D:/Code/jupyter/spam2/others.bat'),
 WindowsPath('D:/Code/jupyter/spam2/test.docx')]

In [71]:
list(p.glob('*.txt'))

[WindowsPath('D:/Code/jupyter/spam2/bar.txt'),
 WindowsPath('D:/Code/jupyter/spam2/foo.txt')]

In [73]:
list(p.glob('project?'))

[WindowsPath('D:/Code/jupyter/spam2/projectA'),
 WindowsPath('D:/Code/jupyter/spam2/projectB'),
 WindowsPath('D:/Code/jupyter/spam2/projectC')]

### Checking Path Validity

+ Path().exists()   <====> os.path.exists()
+ Path().is_dir()   <====>os.path.isdir()
+ Path().is_file()  <====>os.path.isfile()

In [74]:
p = Path(r'D:\Code\jupyter')

In [75]:
p.exists()

True

In [76]:
p.is_file()

False

In [77]:
p.is_dir()

True

In [80]:
p = Path(r'D:\Code\juju')

In [81]:
p.exists()

False

### Aside: os.walk()

In [82]:
def walkFile(file):
    for root, dirs, files in os.walk(file):

        # root: path of the folder currently being visited
        # dirs: list of subfolder names in that folder
        # files: list of file names in that folder

        # print every file
        for f in files:
            print(os.path.join(root, f))

        # print every subfolder
        for d in dirs:
            print(os.path.join(root, d))

## The File Reading and Writing Process

In [83]:
Path.cwd()

WindowsPath('D:/Code/jupyter/automate_python')

In [101]:
p = p.parent

In [102]:
p

WindowsPath('D:/Code/jupyter')

In [103]:
p = p / 'spam.txt'

In [104]:
p

WindowsPath('D:/Code/jupyter/spam.txt')

In [105]:
p.write_text('hello, spam!')

12

In [106]:
p.read_text()

'hello, spam!'

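## Saving Variables with the shelve Module

+ This module saves variables from a Python program to binary shelf files, so the data can be restored from disk the next time the program runs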
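
A minimal sketch of the save-and-restore round trip. It assumes a temporary folder for the shelf files and uses the with statement so that each shelf is closed automatically.

```python
import shelve
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    shelf_path = str(Path(tmp) / 'mydata')

    # Save: the with statement closes the shelf when the block ends
    with shelve.open(shelf_path) as shelf:
        shelf['cats'] = ['Zophie', 'Pooka', 'Simon']

    # Restore: reopen the same shelf and read the value back
    with shelve.open(shelf_path) as shelf:
        restored = shelf['cats']

print(restored)
```

Using with avoids the easy mistake of forgetting close(), which is when the data is guaranteed to be written to disk.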

In [107]:
import shelve

In [109]:
shelf_file = shelve.open('mydata')
cats = ['Zophine', 'Pooka', 'Simon']
shelf_file['cats'] = cats
shelf_file.close()

In [110]:
shelf_file

<shelve.DbfilenameShelf at 0x1452ed4c3d0>

In [111]:
shelf_file = shelve.open('mydata')

In [112]:
shelf_file

<shelve.DbfilenameShelf at 0x1452e469fd0>

In [116]:
list(shelf_file.keys())

['cats']

In [117]:
list(shelf_file.values())

[['Zophine', 'Pooka', 'Simon']]

In [118]:
shelf_file['cats']

['Zophine', 'Pooka', 'Simon']

In [119]:
shelf_file['dogs'] = ['Peter', 'Ralph']

In [120]:
list(shelf_file.values())

[['Zophine', 'Pooka', 'Simon'], ['Peter', 'Ralph']]

In [121]:
shelf_file.clear()

In [122]:
list(shelf_file.keys())

[]

In [123]:
list(shelf_file.values())

[]

In [124]:
shelf_file.close()

### Combined Usage: Saving Variables with pprint.pformat()

In [125]:
import pprint
cats = [{'name': 'Zophie', 'desc':'chubby'}, {'name':'Pooka', 'desc':'fluffy'}]
pprint.pformat(cats)

"[{'desc': 'chubby', 'name': 'Zophie'}, {'desc': 'fluffy', 'name': 'Pooka'}]"

In [126]:
with open('my_cats.py', 'w') as f:
    f.write('cats = '+ pprint.pformat(cats) + '\n')

In [127]:
import my_cats
my_cats.cats

[{'desc': 'chubby', 'name': 'Zophie'}, {'desc': 'fluffy', 'name': 'Pooka'}]

## Project 1: Generating Random Quiz Files (TODO)

## Project 2: Multi-clipboard (TODO)

# Chapter 10 Organizing Files

## The shutil Module

+ Use it to copy, move, rename, and delete files

### shutil.copy(source, dest): Copying Files

In [128]:
import shutil, os
from pathlib import Path

In [149]:
p = Path('D:\\')

In [151]:
p

WindowsPath('D:/')

In [160]:
# Copy, keeping the original file name
shutil.copy(p/'spam.txt', r'D:\Code')

'D:\\Code\\spam.txt'

In [155]:
ss = p / 'spam.txt'

In [156]:
ss 

WindowsPath('D:/spam.txt')

In [158]:
ss.exists()

True

In [162]:
# Copy under a new name
shutil.copy(p/'spam.txt', r'D:\Code\jupyter\haha.txt')

'D:\\Code\\jupyter\\haha.txt'

### shutil.copytree(source, dest): Copying Folders

In [163]:
shutil.copytree(r'D:\Code\jupyter\spam', r'D:\Code\jupyter\spam_back')

'D:\\Code\\jupyter\\spam_back'

### shutil.move(source, dest): Moving and Renaming

+ os.unlink(path) deletes the file at path
+ os.rmdir(path) deletes the folder at path, which must be empty
+ shutil.rmtree(path) deletes the folder at path together with everything in it
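
The sketch below tries shutil.move() and the three delete functions inside a temporary directory, so no real files are at risk. The folder and file names are made up for the demo.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / 'full').mkdir()
    (base / 'full' / 'foo.txt').write_text('spam')
    (base / 'empty').mkdir()

    # shutil.move() renames when the destination is a new name in the same folder
    shutil.move(str(base / 'full' / 'foo.txt'), str(base / 'full' / 'bar.txt'))
    moved = (base / 'full' / 'bar.txt').exists()

    os.rmdir(base / 'empty')              # works: the folder is empty
    try:
        os.rmdir(base / 'full')           # fails: the folder still contains bar.txt
    except OSError:
        print('os.rmdir() refuses to delete a non-empty folder')

    os.unlink(base / 'full' / 'bar.txt')  # delete a single file
    (base / 'full' / 'again.txt').write_text('eggs')
    shutil.rmtree(base / 'full')          # delete the folder and everything in it

    remaining = sorted(p.name for p in base.iterdir())

print(moved, remaining)
```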

In [166]:
os.unlink(r'D:\Code\jupyter\spam_back\foo.txt')

In [167]:
os.rmdir(r'D:/Code/jupyter/spam_back/')

OSError: [WinError 145] The directory is not empty: 'D:/Code/jupyter/spam_back/'

In [168]:
os.rmdir(r'D:/Code/jupyter/empty_dir')

In [169]:
shutil.rmtree(r'D:/Code/jupyter/spam_back/')

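### Safe Deletes with the send2trash Module

+ The functions above delete permanently, so use this module to send files to the recycle bin instead, where they can still be recovered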

In [1]:
import send2trash

with open('bacon.txt', 'a') as f:
    f.write('bacon is not a vegetable.')

In [2]:
send2trash.send2trash('bacon.txt')

### Walking a Directory Tree

In [3]:
import os

In [5]:
for folderName, subfolders, filenames in os.walk(r'D:\delicious'):
    print(f'The current folder is {folderName}')

    for subfolder in subfolders:
        print(f'SUBFOLDER OF {folderName}: {subfolder}')

    for filename in filenames:
        print(f'FILE INSIDE {folderName}: {filename}')

The current folder is D:\delicious
SUBFOLDER OF D:\delicious: cats
SUBFOLDER OF D:\delicious: walnut
FILE INSIDE D:\delicious: spam.txt
The current folder is D:\delicious\cats
The current folder is D:\delicious\walnut
SUBFOLDER OF D:\delicious\walnut: waffles
The current folder is D:\delicious\walnut\waffles
FILE INSIDE D:\delicious\walnut\waffles: butter.txt


## Compressing Files with the zipfile Module

In [7]:
import zipfile, os
from pathlib import Path

In [8]:
p = Path('D:\\')
delicousZip = zipfile.ZipFile(p / 'delicious.zip')
delicousZip.namelist()       

['delicious/',
 'delicious/cats/',
 'delicious/cats/road.jpg',
 'delicious/ChatGpt╡─╗╪┤≡║╧╝».md',
 'delicious/DL.md',
 'delicious/walnut/',
 'delicious/walnut/waffles/',
 'delicious/walnut/waffles/butter.txt']

In [11]:
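# ↑ The Chinese file name is garbled: the archive does not mark its names as UTF-8, so zipfile decodes them as cp437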
dl_note = delicousZip.getinfo('delicious/DL.md')

In [12]:
dl_note.file_size

21684

In [13]:
dl_note.compress_size

8258

In [15]:
f'Compressed file is {round(dl_note.file_size / dl_note.compress_size, 2)}x smaller'

'Compressed file is 2.63x smaller'

In [16]:
delicousZip.close()
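
The garbled name in the namelist() output above can usually be repaired. zipfile decodes entry names as cp437 unless the archive marks them as UTF-8, so re-encoding to cp437 recovers the raw bytes, which can then be decoded with the encoding the archive was really written in. The sketch below assumes that encoding is GBK, as is typical for archives made on a Chinese-locale Windows system; the helper name fix_zip_name is hypothetical.

```python
def fix_zip_name(name, real_encoding='gbk'):
    """Undo zipfile's cp437 decoding and decode the raw bytes with real_encoding."""
    return name.encode('cp437').decode(real_encoding)

# The garbled entry from the namelist() output above
fixed = fix_zip_name('delicious/ChatGpt╡─╗╪┤≡║╧╝».md')
print(fixed)
```

On Python 3.11 and later, passing metadata_encoding='gbk' to zipfile.ZipFile() when opening an archive for reading should make this manual step unnecessary.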

### Extracting from ZIP Files

In [21]:
elicious_zip = zipfile.ZipFile(p / 'elsarticle.zip')
elicious_zip.extractall(r'D:\DESKTOP')  # with no argument, files are extracted into the current working directory
elicious_zip.close()

In [22]:
elicious_zip

<zipfile.ZipFile [closed]>

In [23]:
# extract() extracts a single file
elicious_zip = zipfile.ZipFile(p / 'elsarticle.zip')
elicious_zip.namelist()

['elsarticle/',
 'elsarticle/elsarticle-template-num.tex',
 'elsarticle/elsarticle-num.bst',
 'elsarticle/elsarticle.dtx',
 'elsarticle/elsarticle.ins',
 'elsarticle/elsarticle-harv.bst',
 'elsarticle/elsarticle-template-harv.tex',
 'elsarticle/elsarticle-template-num-names.tex',
 'elsarticle/doc/',
 'elsarticle/doc/elsdoc.pdf',
 'elsarticle/doc/elstest-1pdoubleblind.pdf',
 'elsarticle/doc/1pseperateaug.pdf',
 'elsarticle/doc/elstest-3p.pdf',
 'elsarticle/doc/elsdoc.tex',
 'elsarticle/doc/elstest-1p.pdf',
 'elsarticle/doc/makefile',
 'elsarticle/doc/rvdtx.sty',
 'elsarticle/doc/pdfwidgets.sty',
 'elsarticle/doc/1psingleauthorgroup.pdf',
 'elsarticle/doc/elstest-5p.pdf',
 'elsarticle/doc/jfigs.pdf',
 'elsarticle/doc/elstest-3pd.pdf',
 'elsarticle/manifest.txt',
 'elsarticle/elsarticle-num-names.bst',
 'elsarticle/README']

In [27]:
elicious_zip.extract('elsarticle/doc/elstest-3pd.pdf','D:\\DESKTOP')

'D:\\DESKTOP\\elsarticle\\doc\\elstest-3pd.pdf'

In [28]:
elicious_zip.close()

### Creating and Adding to ZIP Files

In [29]:
new_zip = zipfile.ZipFile('new.zip', 'w')
new_zip.write(r'D:\浏览器下载\gan review.pdf', compress_type=zipfile.ZIP_DEFLATED)

In [30]:
new_zip.close()

In [31]:
new_zip = zipfile.ZipFile('new.zip', 'a')
new_zip.write(r'D:\浏览器下载\gan review_ch.pdf', compress_type=zipfile.ZIP_DEFLATED)

In [32]:
new_zip.close()
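
The same write and append steps can be sketched with the with statement, which closes the archive automatically. Everything here happens in a temporary folder, and the file names notes.txt and extra.txt are made up for the demo.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'notes.txt'          # hypothetical file created for the demo
    src.write_text('hello, zip! ' * 100)
    zip_path = Path(tmp) / 'new.zip'

    # 'w' creates (or overwrites) the archive
    with zipfile.ZipFile(zip_path, 'w') as zf:
        # arcname sets the name stored inside the archive, without the folder path
        zf.write(src, arcname=src.name, compress_type=zipfile.ZIP_DEFLATED)

    # 'a' appends to the existing archive instead of overwriting it
    with zipfile.ZipFile(zip_path, 'a') as zf:
        zf.writestr('extra.txt', 'added later')

    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()

print(names)
```

Opening an existing archive in 'w' mode erases its contents, so 'a' is the mode to use when adding files to an archive that should be kept.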

## Project 1: Converting American-style Dates to European-style Dates (TODO)

## Project 2: Backing Up a Folder into a ZIP File (TODO)