Extract all URLs from an HTML page, where every URL is the value of an a node's href attribute. #67

@Sogrey

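import re

s = '<a href="https://geekori.com">极客起源</a> <a href="https://www.microsoft.com">微软</a>'

# The group captures the href value. [^"]* stops at the closing quote,
# so attributes that follow href are not swallowed into the URL.
result = re.findall(r'<a\b[^>]*\bhref="([^"]*)"', s, re.I)
for url in result:
    print(url)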

https://geekori.com
https://www.microsoft.com

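This problem involves the following two techniques:

  1. Writing a regular expression that matches a nodes
  2. Using a capture group to extract the value of the href attribute (the URL)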
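
The two points above can be sketched as a small reusable function. This is only a minimal sketch: the names `A_HREF` and `extract_hrefs` and the extra attributes in `sample` are assumptions made for illustration, not part of the original answer. It also assumes that href values are always quoted, with either single or double quotes.

```python
import re

# Hypothetical helper for illustration; not from the original answer.
# Group 1 remembers the opening quote, group 2 captures the URL.
# \1 requires the closing quote to be the same character as the opening one.
A_HREF = re.compile(
    r'''<a\s(?:[^>]*?\s)?href\s*=\s*(["'])(.*?)\1''',
    re.IGNORECASE,
)


def extract_hrefs(html):
    """Return the href value of every <a> tag found in html."""
    return [m.group(2) for m in A_HREF.finditer(html)]


sample = (
    '<a class="nav" href="https://geekori.com" target="_blank">极客起源</a> '
    "<A HREF='https://www.microsoft.com'>微软</A>"
)
for url in extract_hrefs(sample):
    print(url)
```

Requiring whitespace directly before `href` keeps attributes such as `data-href` from matching, and `<a\s` keeps tags such as `<abbr>` out. A regular expression is still not a full HTML parser, so unquoted attribute values or a `>` inside an attribute value would need a different approach.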
