## Let's get started using sockets!

Read the Python docs on sockets: https://docs.python.org/3/library/socket.html


Import the socket library. It's built in to Python, so no separate install is required.

In [37]:
import socket

The Python socket library allows us to find out what port a service uses. SSH is a network protocol for operating network services securely over an unsecured network, but don't worry too much about that right now.

In [38]:
socket.getservbyname('ssh')

22

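We can see that SSH is assigned port 22. Note that this is a lookup in a table of well-known services; it does not tell us whether SSH is actually running on this machine.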

You can also do a reverse lookup, finding what service uses a given port.

In [39]:
socket.getservbyport(80)

'http'

HTTP is a network protocol for accessing the web. We can see that it is the service assigned to port 80.
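Both lookups accept an optional protocol name (`'tcp'` or `'udp'`) and raise `OSError` when no matching entry exists. Here is a minimal sketch, assuming your system has a services database such as `/etc/services`; the helper name `lookup_port` is made up for this example.

```python
import socket

def lookup_port(service, protocol="tcp"):
    """Return the port for a service name, or None if it is not listed."""
    try:
        return socket.getservbyname(service, protocol)
    except OSError:
        # Raised when the service/protocol pair is not in the local database
        return None

for service in ("ssh", "http", "https", "not-a-real-service"):
    print(service, lookup_port(service))
```

Returning `None` instead of letting the exception escape keeps the loop going when a name is unknown.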

Hostnames are human-readable labels that correspond to the IP address of a device connected to a network. For example, `www.google.com` is a hostname in the `google.com` domain, and it corresponds to an IP address.

The socket library also provides tools for finding out information about hosts. For example, you can find out the hostname and IP address of the machine you are currently using.

In [40]:
socket.gethostname()

'LAPTOP-BQP6HJU5'

`gethostbyname(hostname)` will return the IP Address of the host.

In [41]:
socket.gethostbyname(socket.gethostname())

'192.168.56.1'

Your hostname and IP address will be different from mine.

You can also find out about machines that are located elsewhere, assuming you know their hostname. For example:

In [42]:
socket.gethostbyname('google.com')

'216.58.193.78'

In [43]:
socket.gethostbyname('uw.edu')

'128.95.155.198'

In [44]:
socket.gethostbyname('cutecatvideos.net')

'50.97.112.130'

The `gethostbyname_ex` function of the socket library provides more information about the machines we are exploring. It returns a tuple `(hostname, aliaslist, ipaddrlist)`.

In [45]:
socket.gethostbyname_ex('cutecatvideos.net')

('cutecatvideos.net', [], ['50.97.112.130'])

In [46]:
socket.gethostbyname_ex('google.com')

('google.com', [], ['216.58.193.78'])
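`gethostbyname` and `gethostbyname_ex` only handle IPv4 addresses. The `socket.getaddrinfo` function from the same library can return IPv6 addresses as well. This is a small sketch that uses `localhost` so it runs without internet access; `resolve_all` is a name invented for this example.

```python
import socket

def resolve_all(hostname, port=80):
    """Return the sorted, unique IP addresses (IPv4 and IPv6) for a hostname."""
    try:
        results = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution failed (unknown host or no network)
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address string is the first item of sockaddr
    return sorted({sockaddr[0] for *_, sockaddr in results})

print(resolve_all("localhost"))
```

Swap in `'google.com'` or `'uw.edu'` to compare with the results above; the addresses you see will likely differ.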

### Make a socket

To create a socket, you use the `socket` class of the socket library. It takes up to four optional arguments: family, type, proto, and fileno.

socket.socket(family=AF_INET, type=SOCK_STREAM, proto=0, fileno=None)

The address family should be AF_INET (the default), AF_INET6, AF_UNIX, AF_CAN or AF_RDS. 

The socket type should be SOCK_STREAM (the default), SOCK_DGRAM, SOCK_RAW or perhaps one of the other SOCK_ constants. 

The protocol number is zero by default. Don't worry about the other options.

Here we pass no arguments to get the default behavior:

In [47]:
my_socket = socket.socket()

In [48]:
my_socket

<socket.socket fd=1296, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0>

A socket has some properties that are immediately important to us. These include the family, type and protocol of the socket:

In [49]:
my_socket.family

<AddressFamily.AF_INET: 2>

In [50]:
my_socket.type

<SocketKind.SOCK_STREAM: 1>

In [51]:
my_socket.proto

0

You might notice that each of these values has an integer behind it. In fact, these are constants defined in the socket library.
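To check this yourself, you can compare a socket's properties with the constants in the module. This is a short sketch that makes a fresh default socket rather than reusing `my_socket`.

```python
import socket

with socket.socket() as sock:
    # The properties compare equal to the module-level constants
    family_is_inet = sock.family == socket.AF_INET
    type_is_stream = sock.type == socket.SOCK_STREAM
    # They are enum members, so they carry a name as well as an integer value
    family_name = sock.family.name
    family_number = int(sock.family)

print(family_is_inet, type_is_stream)
print(family_name, family_number)
```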

## Customizing Sockets

Family, type, and protocol are the properties of a socket and correspond to the first three arguments you may pass to the socket constructor. Changing these allows you to customize your sockets to use a specific communication profile. For our purposes, we will accept the defaults, but you can find more information on the different families, types and protocols in the [Socket documentation](https://docs.python.org/3/library/socket.html). For comparison, the cell below creates a UDP socket.



In [52]:
socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)

<socket.socket fd=1280, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=17>
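To see a customized socket in action, here is a sketch that sends one UDP datagram between two sockets on your own machine. The `127.0.0.1` loopback address, the 2-second timeout and the `b"ping"` payload are choices made for this example (the `b` prefix marks bytes, which the next section covers).

```python
import socket

# Two UDP sockets on the loopback interface; port 0 lets the OS pick a free port
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2)  # avoid waiting forever if the datagram is lost
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# UDP is connectionless, so each datagram names its destination
sender.sendto(b"ping", receiver.getsockname())
data, sender_address = receiver.recvfrom(1024)
print(data, sender_address)

sender.close()
receiver.close()
```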

## Messages are Bytes

We use sockets to send messages between a server and client. These messages must be sent as bytes, never unicode. There is a good explanation of the difference between unicode and bytes [here](https://stackoverflow.com/questions/10060411/byte-string-vs-unicode-string-python). Let's go over how the two differ in Python.

In [53]:
# This is a unicode string, the kind we're used to

mystr = "hello world!"
print(mystr)

hello world!


In [54]:
# This is a byte string, the type of string that sockets can pass back and forth

mystr = b"hello world!"
print(mystr)

b'hello world!'


Notice the `b` before the string, telling you that it is a byte string.

In [55]:
# You can also encode a unicode string into a byte string like this

mystr1 = "hello world!".encode('utf8')
print(mystr1)

b'hello world!'


In [56]:
# And decode it like this

mystr1.decode('utf8')

'hello world!'

Now it's a unicode string again!

UTF-8 is the default encoding here, so the `'utf8'` argument is optional. There are other encodings that you can use to encode and decode, but we won't go over them here.

When using sockets, you must send bytes, so you must encode a message before sending it through a socket.

In [57]:
my_msg = "hello".encode('utf8')

You have now encoded my_msg into bytes. Print it now to see what happens.

In [58]:
print(my_msg)

b'hello'


Again, the `b` lets you know the message is in bytes.

Likewise, decode a message that you receive from a socket.

In [59]:
my_msg = my_msg.decode('utf8')

In [60]:
my_msg

'hello'

Now the message is a unicode string, like you are used to.
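With plain English text the bytes and the string look almost identical, which hides what encoding does. Here is a quick sketch with an accented character; the word `héllo` is just an example.

```python
text = "héllo"
encoded = text.encode("utf-8")

print(encoded)                  # the é becomes the two bytes \xc3\xa9
print(len(text), len(encoded))  # 5 characters, 6 bytes
print(encoded.decode("utf-8") == text)
```

The number of bytes you send over a socket can be larger than the number of characters in your message.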

## Sockets to browse a web page

Let's use a Python socket to read a web page. It's like using Python as a web browser!

First, make an [HTTP request](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) following the proper protocol. Each line of an HTTP/1.1 request ends with `\r\n`, and a blank line marks the end of the request. The `Connection: close` header asks the server to close the connection once it has sent the response. Note that the request is in bytes.

In [61]:
request = b"GET / HTTP/1.1\r\nHost: www.cutecatvideos.net\r\nConnection: close\r\n\r\n"

Create a socket that will connect to the domain.

In [62]:
s = socket.socket()

Connect to the domain on port 80, the web port.

In [63]:
s.connect(("www.cutecatvideos.net", 80))

Send the request. This returns the number of bytes sent.

In [64]:
s.send(request)

66

Create some variables for our buffer size, an empty response (in bytes), and a boolean that we can use to end our loop.

The buffer size should be a relatively small power of 2, such as 4096.

In [None]:
buffsize = 4096
response = b''
done = False
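`recv(buffsize)` returns at most `buffsize` bytes per call, and it returns an empty bytes object once the other side has closed the connection. To watch that happen without a network, this sketch uses `socket.socketpair()`, which gives two sockets that are already connected to each other. The 5000-byte payload and the 1024-byte buffer are arbitrary choices for this example.

```python
import socket

# socketpair() returns two already-connected sockets, handy for local experiments
writer, reader = socket.socketpair()
writer.sendall(b"x" * 5000)  # sendall keeps sending until every byte is queued
writer.close()               # closing signals "no more data" to the reader

chunk_sizes = []
received = b""
while True:
    chunk = reader.recv(1024)
    if not chunk:            # empty bytes means the other side has closed
        break
    chunk_sizes.append(len(chunk))
    received += chunk
reader.close()

print(len(received), chunk_sizes)
```

The chunk sizes you see can vary from run to run and from system to system, but the total is always the full 5000 bytes.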

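Read chunks of up to the buffer size until `recv` returns an empty bytes object, which means the server has closed the connection and there is nothing left to read. (A chunk that is merely smaller than the buffer size does not guarantee that the response is finished.) At that point, close the socket and end the loop. Finally, print the complete response.

In [65]:
while not done:
    msg_part = s.recv(buffsize)
    if not msg_part:
        # An empty chunk means the server has closed the connection
        done = True
        s.close()
    response += msg_part
print(response.decode('utf8'))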

HTTP/1.1 200 OK
Date: Sat, 07 Oct 2017 17:36:35 GMT
Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips mod_bwlimited/1.4 PHP/5.6.29
X-Powered-By: PHP/5.6.29
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

20b3

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<TITLE>Cute Cat Videos</TITLE>
<META name="Description" content="Watch the best and funniest cat videos from YouTube! Includes cute cats, cat tricks, goofy cats, cats playing piano, or cats with other animals.">
<META name="Keywords" content="videos, cat">
<META name="ROBOTS" content="INDEX,FOLLOW,ALL">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<META HTTP-EQUIV="Content-Language" CONTENT="en-us">
<meta http-equiv="Content-Style-Type" content="text/css">
<style>
body {FONT-FAMILY: Arial, sans-serif; font-size:1.2em}
h2 {FONT-FAMILY: Arial

The above is the HTTP response from the website: the headers come first, followed by the HTML of the page. Pretty neat, right?
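The headers and the page content are separated by a blank line, so you can split a response into its parts. The sketch below works on a shortened, made-up response rather than the real one (`ExampleServer` is invented). It does not handle chunked transfer encoding, which the real response above uses.

```python
# A shortened, made-up response in the same shape as the output above
sample_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Server: ExampleServer\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)

# A blank line (CRLF CRLF) separates the headers from the body
head, _, body = sample_response.partition(b"\r\n\r\n")
status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")

# Collect "Name: value" lines into a dict with lowercase names
headers = {}
for line in header_lines:
    name, _, value = line.partition(":")
    headers[name.strip().lower()] = value.strip()

print(status_line)
print(headers["content-type"])
print(body)
```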

All together that's:

In [None]:
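import socket

# HTTP/1.1 lines end in \r\n; "Connection: close" asks the server to hang up when done
request = b"GET / HTTP/1.1\r\nHost: www.cutecatvideos.net\r\nConnection: close\r\n\r\n"
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("www.cutecatvideos.net", 80))
s.sendall(request)  # sendall keeps sending until the whole request has gone out

buffsize = 4096
response = b''
done = False
while not done:
    msg_part = s.recv(buffsize)
    if not msg_part:
        # An empty chunk means the server has closed the connection
        done = True
        s.close()
    response += msg_part

print(response.decode())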

Try it with another website and see what happens!
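To make it easier to try other sites, you could wrap the steps in a function. This is one possible sketch, not the only way to do it: `build_request` and `fetch` are names invented here, `example.com` is a placeholder host, and the 10-second timeout is an arbitrary choice.

```python
import socket

def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close when it is done
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(host, port=80, path="/", buffsize=4096):
    """Send a GET request and return the raw response bytes."""
    # create_connection resolves the host and connects in one step
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(buffsize)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)

print(build_request("example.com"))
# To try it against a real site (needs internet access):
# print(fetch("example.com").decode("utf-8", errors="replace"))
```

Passing `errors="replace"` when decoding avoids a crash if the page is not valid UTF-8.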

### Sneak peek at the Requests library

Psst... you wouldn't actually use sockets to access the HTML and other properties of web pages nowadays. You would use a modern library like [Requests](http://docs.python-requests.org/en/master/) instead. Requests does not ship with Python, so you must run ```pip install requests``` in your terminal first to install it.

In [66]:
import requests
print(requests.get('http://www.cutecatvideos.net').text)


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<TITLE>Cute Cat Videos</TITLE>
<META name="Description" content="Watch the best and funniest cat videos from YouTube! Includes cute cats, cat tricks, goofy cats, cats playing piano, or cats with other animals.">
<META name="Keywords" content="videos, cat">
<META name="ROBOTS" content="INDEX,FOLLOW,ALL">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<META HTTP-EQUIV="Content-Language" CONTENT="en-us">
<meta http-equiv="Content-Style-Type" content="text/css">
<style>
body {FONT-FAMILY: Arial, sans-serif; font-size:1.2em}
h2 {FONT-FAMILY: Arial, sans-serif; color:#757e47; margin-top:40px; font-size:1.5em}
a.topmenu {text-decoration:none; color: rgb(153, 0, 0); font-size:13px; }
a.topmenu:hover{text-decoration:underline}
p {line-height:120%; }
table {line-height:120%; }
td.tmenu {float:left

Now we'll open IPython in two separate terminals to demonstrate communication between client and server sockets. See the README for more info.