## The Evolution of Communication
---

### Person to Person
![people communication](img/communication.jpg)

### Electrical Signals
![legacy communication](img/morsecodeletters.jpg)

### UART Serial Transmission
![serial port communication](img/uart.png)

### Ethernet
![eth communication](img/ethernet.jpg)

### Application Layer
![network layer](img/application_layer.jpg)

## Socket Communication (Simple Echo Server/Client)
---

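The examples run in Jupyter, but `print` output may not display correctly, probably because of how IPython handles output.

Alternatively, copy them into text files (echoserver.py / echoclient.py) and run them with the commands below. Start the server first, then run the client in a second terminal.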

```bash
python3 echoserver.py

python3 echoclient.py
```

[Source reference](https://realpython.com/python-sockets/#echo-client-and-server), for anyone who wants to read further.



In [None]:
# socket echo server
import socket

HOST = '127.0.0.1'  # Standard loopback interface address (localhost)
PORT = 65432        # Port to listen on (non-privileged ports are > 1023)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, PORT))
    s.listen()
    conn, addr = s.accept()
    with conn:
        print('Connected by', addr)
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data)

In [None]:
# socket echo client
import socket

HOST = '127.0.0.1'  # The server's hostname or IP address
PORT = 65432        # The port used by the server

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((HOST, PORT))
    s.sendall(b'Hello, world')
    data = s.recv(1024)

print('Received', repr(data))
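
The two cells above need two processes, because the server blocks while it waits for a connection. As a minimal sketch (not part of the original example), the server can instead run in a background thread so that a single cell performs the whole round trip. The helper names `serve_once` and `echo_roundtrip` are hypothetical, and port `0` is used so the operating system picks a free port rather than the fixed 65432.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one connection and echo everything back until the peer closes."""
    conn, _addr = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:          # empty bytes: the client finished sending
                break
            conn.sendall(data)


def echo_roundtrip(message, host="127.0.0.1"):
    """Start a one-shot echo server in a thread, send `message`, return the echo."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind((host, 0))            # port 0: let the OS choose a free port
        server_sock.listen()
        port = server_sock.getsockname()[1]

        worker = threading.Thread(target=serve_once, args=(server_sock,), daemon=True)
        worker.start()

        chunks = []
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect((host, port))
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)    # tell the server no more data is coming
            while True:
                chunk = client.recv(1024)
                if not chunk:
                    break
                chunks.append(chunk)

        worker.join(timeout=5)
    return b"".join(chunks)


if __name__ == "__main__":
    print("Received", repr(echo_roundtrip(b"Hello, world")))
```

Reading in a loop until `recv` returns empty bytes avoids assuming that the whole reply arrives in one call, which TCP does not guarantee.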

## Web Scraping and Crawling
---

### Scraper vs. Crawler

* Scraper
    * Usually means extracting content from web pages
* Crawler
    * Usually means following links to build an index
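
The distinction can be sketched with the standard library alone. The snippet below is an illustration under assumptions, not a full scraper or crawler: `SAMPLE_PAGE`, `TextScraper`, `LinkCollector`, `scrape_text`, and `collect_links` are all hypothetical names, and the page is an inline string so nothing is downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><body>
  <h1>News</h1>
  <p>First story.</p>
  <a href="/page2">Next</a>
  <a href="https://example.org/about">About</a>
</body></html>
"""


class TextScraper(HTMLParser):
    """Scraper role: collect the text content of the page."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


class LinkCollector(HTMLParser):
    """Crawler role: collect outgoing links to visit or index later."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # resolve relative links against the page's own URL
                self.links.append(urljoin(self.base_url, href))


def scrape_text(page):
    parser = TextScraper()
    parser.feed(page)
    return parser.texts


def collect_links(page, base_url):
    parser = LinkCollector(base_url)
    parser.feed(page)
    return parser.links


if __name__ == "__main__":
    print(scrape_text(SAMPLE_PAGE))
    print(collect_links(SAMPLE_PAGE, "https://example.com/"))
```

A real crawler would then fetch each collected link and repeat the process, while a scraper stops once it has the content it needs.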

### Background Knowledge

* HTTP(HyperText Transfer Protocol)
* HTML(HyperText Markup Language)、JavaScript、CSS(Cascading Style Sheets)
* DOM(Document Object Model) 操作
* The `requests` library

## HTTP
---

* HyperText Transfer Protocol
* Application Layer (Layer 7)
* Main components
  * Request method
  * Headers
  * Body
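
To make the three components concrete, here is a minimal sketch that assembles the raw bytes of an HTTP/1.1 request from a method, headers, and an optional body. The function name `build_request` is hypothetical, and the sketch ignores details such as chunked encoding that a real client library handles.

```python
def build_request(method, path, headers, body=b""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    headers = dict(headers)                  # copy so the caller's dict is untouched
    if body:
        # the receiver needs the body length to know where the message ends
        headers["Content-Length"] = str(len(body))

    lines = [f"{method} {path} HTTP/1.1"]    # request line carries the method
    lines += [f"{name}: {value}" for name, value in headers.items()]

    # lines end with CRLF; an empty line separates the headers from the body
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    print(build_request("GET", "/", {"Host": "www.google.com"}))
    print(build_request("POST", "/search", {"Host": "www.google.com"}, b"q=python"))
```

The GET call yields exactly the two-line request used as the example in these notes, followed by the blank line that ends the header section.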



### Request/Response Example

---

```
GET / HTTP/1.1
Host: www.google.com
```

---

```
HTTP/1.1 200 OK
Content-Length: 3059
Server: GWS/2.0
Date: Sat, 11 Jan 2003 02:44:04 GMT
Content-Type: text/html
Cache-control: private
Set-Cookie: PREF=ID=73d4aef52e57bae9:TM=1042253044:LM=1042253044:S=SMCc_HRPCQiqyX9j; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Connection: keep-alive

<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <p>Hello world!</p>
  </body>
</html>

```
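
A response like the one above splits into a status line, headers, and a body. The sketch below shows one way to do that by hand, assuming the whole response is already in memory as bytes; `parse_response` and `SAMPLE_RESPONSE` are hypothetical names, and the sample is shortened from the example so that its `Content-Length` matches its body.

```python
# Shortened, hypothetical response used only for illustration.
SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 19\r\n"
    b"\r\n"
    b"<p>Hello world!</p>"
)


def parse_response(raw):
    """Split a raw HTTP/1.x response into (status code, reason, headers, body)."""
    # the first empty line separates the header section from the body
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")

    # status line looks like: HTTP/1.1 200 OK
    _version, code, reason = status_line.split(" ", 2)

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # header names are case-insensitive; a repeated header overwrites
        # the earlier one in this simplified sketch
        headers[name.strip().lower()] = value.strip()

    return int(code), reason, headers, body


if __name__ == "__main__":
    code, reason, headers, body = parse_response(SAMPLE_RESPONSE)
    print(code, reason)
    print(headers)
    print(body)
```

In practice the `requests` library does this parsing and exposes the result as `r.status_code`, `r.headers`, and `r.text`.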

---

### Status Codes

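* 1xx Informational: the request was received and processing continues
* 2xx Success: the request was received, understood, and accepted
* 3xx Redirection: further action is needed to complete the request
* 4xx Client error: the request contains bad syntax or cannot be fulfilled
* 5xx Server error: the server failed while handling an apparently valid request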
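
Because the class is determined by the first digit, a scraper can branch on `code // 100`. The following sketch shows the idea; `STATUS_CLASSES` and `describe_status` are hypothetical names, while `http.HTTPStatus` is the standard-library enum of registered codes.

```python
from http import HTTPStatus

# First digit of the status code -> class name, matching the list above.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return a short description such as '404 Not Found: client error'."""
    category = STATUS_CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase     # standard reason phrase, if registered
    except ValueError:
        phrase = "(non-standard code)"
    return f"{code} {phrase}: {category}"


if __name__ == "__main__":
    for code in (200, 301, 404, 503):
        print(describe_status(code))
```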


## HTML
---

* HyperText Markup Language
* A markup language
* Combined with JavaScript and CSS to achieve complex rendering


### Example
---

```
<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <p>Hello world!</p>
  </body>
</html>
```
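
The nesting of tags in the example is what a browser turns into the DOM tree. As a rough sketch of that structure, the standard-library `html.parser` can print the document as an indented outline. `HTML_DOC`, `OutlineParser`, and `outline` are hypothetical names, and the depth tracking assumes every tag is explicitly closed, which holds for this example but not for void elements such as `<br>`.

```python
from html.parser import HTMLParser

HTML_DOC = """<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <p>Hello world!</p>
  </body>
</html>"""


class OutlineParser(HTMLParser):
    """Record each tag and text node, indented by its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        self.depth += 1                      # children are one level deeper

    def handle_endtag(self, tag):
        self.depth -= 1                      # back to the parent's level

    def handle_data(self, data):
        text = data.strip()
        if text:                             # skip whitespace between tags
            self.lines.append("  " * self.depth + repr(text))


def outline(doc):
    parser = OutlineParser()
    parser.feed(doc)
    parser.close()
    return parser.lines


if __name__ == "__main__":
    print("\n".join(outline(HTML_DOC)))
```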

---

### Further Reading
[W3Schools](https://www.w3schools.com)

In [None]:
###
# Play around with HTML and DOM edit
###
from IPython.display import display, HTML  # IPython.core.display import path is deprecated
display(HTML("""
  <h1><p id="main">Hello world!</p></h1>
  <p style="display:none">this is Dylan speaking</p>
  <script>document.getElementById('main').innerHTML = 'Hello Python!'</script>
"""))

In [None]:
###
# Basic requests & BeautifulSoup demo
###
import requests
from bs4 import BeautifulSoup

r = requests.get('http://example.com/')
print("request url: {} return {}".format(r.url, r.status_code))
soup = BeautifulSoup(r.text, 'html.parser')
print("Title is {}".format(soup.title.string))
print("Body Text is {}".format(soup.body.p.string))

In [None]:
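###
# Advanced scraper for a dynamic page using Selenium + headless Chrome.
# Selenium 4 removed PhantomJS support, so headless Chrome is substituted here
# (assumes Chrome and a matching driver are available locally).
###
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)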

def example_flow():
  # load coppaclub booking page.
  driver.get("https://www.coppaclub.co.uk/towerbridge/book/")
  # switch to widget iframe.
  driver.switch_to.frame(0)
  try:
    # wait the widget render ready.
    element = WebDriverWait(driver, 10).until(
      EC.presence_of_element_located((By.ID, "dfdate_calendar"))
    )

    # set party size & session
    Select(driver.find_element(By.ID, 'dfcovs')).select_by_value("6")
    Select(driver.find_element(By.ID, 'dfsid')).select_by_value('DINNER')

    # change dfdate by javascript
    driver.execute_script("document.getElementById('dfdate').setAttribute('value','2019-01-25')")

    # save the guid for later used.
    element = WebDriverWait(driver, 10).until(
      EC.presence_of_element_located((By.ID, "sessionGuid"))
    )
    guid = element.get_attribute("value")

    # click on submit button
    element = WebDriverWait(driver, 10).until(
      EC.presence_of_element_located((By.ID, "optionsSubmit"))
    )
    element.click()

    # wait session completed
    time.sleep(5)

    # load booktable page
    driver.get("https://bda.bookatable.com/Mobile/AvailableTimes?sessionGuid={}".format(guid))    

    # print available time text
    print(driver.find_element(By.ID, 'AvailableTimesInfoText').text)

    # wait time slots sector render completed
    element = WebDriverWait(driver, 10).until(
      EC.presence_of_element_located((By.ID, "timesSlots"))
    )
    
    # print all available time slots
    for e in element.find_elements(By.CLASS_NAME, "ui-btn-text"):
      print(e.text)

    # save screenshot to file system
    # driver.save_screenshot('/tmp/screenshot.jpg')
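  except Exception:
    # any failure (timeout, missing element) is reported as unavailable
    print('Booking Unavailable.')
  finally:
    # always close the browser, exactly once
    driver.quit()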

example_flow()

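## Quiz or Homework
---
* Modify the HTML below as follows
  * Change the font of Hello World!
  * Make `this is xxx speaking` visible
  * After 3 seconds, replace Hello World! with Hello <your name>!
* Use BeautifulSoup to parse html_doc and print the average age
* Use requests to download any web page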

In [None]:
from IPython.display import display, HTML  # IPython.core.display import path is deprecated
from bs4 import BeautifulSoup
html_doc = """
  <h1><p id="main">Hello world!</p></h1>
  <p style="display:none">this is Dylan speaking</p>
  <table style="width:100%">
  <tr>
    <th>Firstname</th>
    <th>Lastname</th> 
    <th>Age</th>
  </tr>
  <tr>
    <td>Jill</td>
    <td>Smith</td> 
    <td>50</td>
  </tr>
  <tr>
    <td>Eve</td>
    <td>Jackson</td> 
    <td>94</td>
  </tr>
  </table>
"""
display(HTML(html_doc))