# Network Programs

1. `socket`
2. `urllib`
3. `BeautifulSoup`

## Socket

* Retrieving a text document

In [6]:
import socket

mysock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org',80))
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if (len(data) < 1):
        break
    # end='' keeps print from adding a newline at each 512-byte chunk boundary
    print(data.decode(), end='')
mysock.close()

HTTP/1.1 200 OK
Date: Fri, 09 Nov 2018 10:35:39 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "a7-54f6609245537"
Accept-Ranges: bytes
Content-Length: 167
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
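
The response above is a status line, a block of headers, a blank line, and then the document body. As a minimal sketch of how those parts could be separated, the hypothetical helper `split_http_response` below (not part of the `socket` module) works on a hand-written sample response rather than a live connection. It assumes the whole response has already been collected into one `bytes` object.

```python
def split_http_response(raw):
    """Split raw HTTP response bytes into (status_line, headers, body)."""
    # The headers end at the first blank line, i.e. \r\n\r\n
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line separating headers from body")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Hand-written sample in the same shape as the output above (assumed data)
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Length: 9\r\n"
          b"Content-Type: text/plain\r\n"
          b"\r\n"
          b"But soft\n")

status, headers, body = split_http_response(sample)
print(status)
print(headers["content-type"])
print(body.decode(), end="")
```

Storing header names in lower case is a design choice for easier lookup; a fuller parser would also need to handle repeated headers.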



* Retrieving an image

In [7]:
import socket
import time

HOST = 'data.pr4e.org'
PORT = 80
mysock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
mysock.connect((HOST,PORT))

mysock.sendall(
    b'GET http://data.pr4e.org/cover.jpg HTTP/1.0\r\n\r\n')
count = 0
picture = b""

while True:
    data = mysock.recv(5120)
    if (len(data) < 1):
        break
    time.sleep(0.25)
    count = count + len(data)
    print(len(data), count)
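    picture += data
mysock.close()

# The headers end at the first blank line (\r\n\r\n)
pos = picture.find(b"\r\n\r\n")
print("Header length", pos)
print(picture[:pos].decode())

# Skip the 4-byte separator and save the image bytes
# (the data/ directory must already exist)
picture = picture[pos+4:]
with open("data/cover.jpg", "wb") as fh:
    fh.write(picture)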

5120 5120
5120 10240
2240 12480
5120 17600
5120 22720
5120 27840
5120 32960
5120 38080
5120 43200
5120 48320
5120 53440
5120 58560
5120 63680
5120 68800
1654 70454
Header length 393
HTTP/1.1 200 OK
Date: Fri, 09 Nov 2018 10:47:45 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Mon, 15 May 2017 12:17:21 GMT
ETag: "111a9-54f8f097cc937"
Accept-Ranges: bytes
Content-Length: 70057
Vary: Accept-Encoding
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: image/jpeg
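
The numbers in this output can be cross-checked: 70454 bytes were received in total, the headers take 393 bytes, and the `\r\n\r\n` separator takes 4 more, which leaves 70057 bytes of image data, the same value as the `Content-Length` header. A small sketch of that check follows. The helper name `body_sizes` is hypothetical, and the sample response is hand-written rather than fetched.

```python
def body_sizes(raw):
    """Return (declared, actual): the Content-Length value and the real body size."""
    head, _, body = raw.partition(b"\r\n\r\n")
    declared = None
    # Skip the status line, then look for the Content-Length header
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            declared = int(value.strip())
    return declared, len(body)


# Arithmetic using the figures printed above
total_received = 70454
header_length = 393
separator_length = 4
print(total_received - header_length - separator_length)  # 70057

# Hand-written sample response (assumed data, not the real image)
sample = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
print(body_sizes(sample))
```

If the two sizes differ, the download was probably cut short or the server sent extra data.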


## `urllib` 

In [8]:
from urllib import request,parse,error

fh = request.urlopen('http://data.pr4e.org/romeo.txt')
for line in fh:
    print(line.decode().strip())

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
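
The cell above imports `error` but never uses it. One way it might be used, as a sketch only, is to catch failed requests instead of letting the exception stop the notebook. The function name `fetch_lines` and the 10-second default timeout are assumptions made for illustration. The usage line reads from a `data:` URL so that it runs without a network connection.

```python
from urllib import request, error


def fetch_lines(url, timeout=10):
    """Return the decoded, stripped lines at url, or None if the request fails."""
    try:
        with request.urlopen(url, timeout=timeout) as fh:
            return [line.decode().strip() for line in fh]
    except error.HTTPError as exc:
        # The server answered, but with an error status such as 404
        print("HTTP error:", exc.code, exc.reason)
    except error.URLError as exc:
        # The server could not be reached, or the URL could not be opened
        print("Could not open URL:", exc.reason)
    return None


# A data: URL carries its own content, so no server is contacted here
print(fetch_lines("data:text/plain,But%20soft%0AIt%20is"))
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first. Replacing the `data:` URL with `http://data.pr4e.org/romeo.txt` would fetch the same document as the cell above.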


### Count words in a paragraph

In [17]:
from urllib import request,parse,error

counts = dict()
fh = request.urlopen('http://data.pr4e.org/romeo.txt')
for line in fh:
    words = line.decode().split()
    for word in words:
        counts[word] = counts.get(word, 0) + 1  

In [20]:
total = [(count, key) for key,count in counts.items()]
total.sort()
print(total[-5:])

[(1, 'yonder'), (2, 'sun'), (3, 'and'), (3, 'is'), (3, 'the')]
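
The same tally can also be written with `collections.Counter` from the standard library. This is an alternative sketch, not the approach used in the cells above. To keep it runnable offline, the four lines of `romeo.txt` shown earlier are pasted in as a string instead of being fetched.

```python
from collections import Counter

# The text of romeo.txt as printed earlier in this section
text = """But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief"""

# Counter counts each word; most_common returns (word, count) pairs,
# highest count first
counts = Counter(text.split())
top = counts.most_common(4)
print(top)
```

Note that `most_common` returns `(word, count)` pairs from highest to lowest, while the list comprehension above builds `(count, word)` pairs sorted from lowest to highest.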
