#### 用一个Socket表示"打开了一个网络连接"
打开 Socket 需要知道目标计算机的 IP、Port，以及通信协议  

客户端：主动发起连接的计算机  
服务器：被动响应的计算机  

端口 | 服务类型 |
---- | -------- |
80   | Web网页服务[http] |
443   | Web网页服务[https] |
25   | SMTP服务   |
21   | FTP服务    |
<1024 | Internet标准服务 |
>1024 | 自定义用途     |


#### 创建一个基于 TCP 连接的 Socket
HTTP 标准规定：客户端必须先发送请求到服务器、服务器收到后才回复数据

In [49]:
from urllib import parse

url = 'https://www.cnblogs.com/xie-kun/p/7858358.html'
up = parse.urlparse(url)
print(up)

dest = up.netloc.split(':')
print(dest, len(dest))

ParseResult(scheme='https', netloc='www.cnblogs.com', path='/xie-kun/p/7858358.html', params='', query='', fragment='')
['www.cnblogs.com'] 1


In [47]:
from urllib import parse
import socket
import ssl

def https_comm(url):
    proto = 'http'
    host  = ''
    port  = 80
    up = parse.urlparse(url)
    
    # https://i.cnblogs.com/EditPosts.aspx?opt=1
    # (scheme='https', netloc='i.cnblogs.com', path='/EditPosts.aspx', params='', query='opt=1', fragment='')
    # 其中 scheme 是协议  netloc 是域名服务器  path 相对路径  params是参数，query是查询的条件
    print(up)
    
    if(up.scheme != ''):
        proto = up.scheme
        print('proto = %s' % proto)
        
    dest = up.netloc.split(':')
    if(len(dest) == 2):
        port = int(dest[1]) # 自带端口?
    else:
        if proto == 'http':
            port = 80
        elif port == 'https':
            port = 443

    host = dest[0]
    if proto == 'http':
        # 创建一个Socket(AF_INET=IPv4、SOCK_STREAM=面向流的TCP协议)
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    elif proto == 'https':
        s = ssl.wrap_socket(socket.socket())
    
    s.settimeout(5)
    # 建立连接(IP/域名，端口)
    try:
        s.connect((host, port))
    except Exception as e:
        print('error = %s' % e)
        return None
    
    # 发送数据
    s.send(b'GET %s HTTP/1.1\r\nHost: %s\r\n' % (up.path, host))

    # 接收数据
    buffer = []
    while True:
        d = s.recv(1024) # 每次最多接收1024个字节
        if d:
            buffer.append(d)
        else:
            break
    data = b''.join(buffer)
    # print(data)

    # 关闭连接
    s.close()
    
    # 上面收到的数据包括 HTTP 头 和 网页本身
    header, html = data.split(b'\r\n\r\n', 1)
    print('[header]:\r\n', header.decode('utf-8'))
    print('[html]:\r\n', html.decode('utf-8'))

    with open('../files/sina.html', 'wb') as f:
        f.write(html)

In [48]:
https_comm('https://www.cnblogs.com/xie-kun/p/7858358.html')

ParseResult(scheme='https', netloc='www.cnblogs.com', path='/xie-kun/p/7858358.html', params='', query='', fragment='')
proto = https
error = [SSL: UNKNOWN_PROTOCOL] unknown protocol (_ssl.c:833)
