# FIT3182 - Big data management and processing

Sockets can be configured to act as a server and listen for incoming messages, or connect to other applications as a client. After both ends of a TCP/IP socket are connected, communication is bi-directional.
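
As a minimal sketch of that two-way exchange (not part of the original program), the block below runs a tiny server and a client in one script. The helper name echo_upper_once, the use of a thread, and port 0 (which asks the operating system for any free port) are assumptions made only for this illustration.

```python
import socket
import threading

def echo_upper_once(server_sock):
    """Accept one client, read one message, and reply in upper case."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

# Server side: bind, listen, and serve one client in a background thread.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))      # port 0: let the OS pick a free port (demo assumption)
server.listen(1)
host, port = server.getsockname()

worker = threading.Thread(target=echo_upper_once, args=(server,))
worker.start()

# Client side: connect, send a request, then read the reply on the same socket.
with socket.create_connection((host, port)) as client:
    client.sendall(b'hello')
    reply = client.recv(1024)

worker.join()
server.close()
print(reply)
```

The same connected socket carries data in both directions: the client writes a request and then reads the reply without opening a second connection.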

This sample program is based on the standard library documentation. It selects a random line of text from an array and sends it to the client. It starts by creating a TCP/IP socket; bind() then associates the socket with the server address. In this case, the address is localhost, referring to the current machine, and the port number is 9999.

```
import socket

# Create a TCP/IP socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Bind the socket to the port
server_address = ('localhost', 9999)
print('Starting up on {} port {}'.format(*server_address))
sock.bind(server_address)
```
Calling listen() puts the socket into server mode, and accept() waits for an incoming connection. The integer argument to listen() is the backlog: the number of pending connections the system should queue up in the background before refusing new clients. This example only expects to work with one connection at a time.

```
# Listen for incoming connections
sock.listen(1)
```
accept() returns an open connection between the server and client, along with the address of the client. The connection is actually a different socket object from the listening one, created by the kernel for this client; it keeps the server's address and port and is distinguished by the client's address and port. Data is read from the connection with recv() and transmitted with sendall().
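
The relationship between the listening socket and the socket returned by accept() can be inspected directly. The sketch below is not part of the original program; it assumes a loopback connection on a port chosen by the operating system (port 0), and the variable names are illustrative.

```python
import socket

# Listening socket on an OS-chosen port (demo assumption).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))
listener.listen(1)

# The client connects first; the kernel queues the connection until accept() is called.
client = socket.create_connection(listener.getsockname())
connection, client_address = listener.accept()

listening_addr = listener.getsockname()        # local address of the listening socket
accepted_local_addr = connection.getsockname() # local address of the accepted socket
accepted_peer_addr = connection.getpeername()  # remote (client) address of the accepted socket

print('listening socket :', listening_addr)
print('accepted socket  :', accepted_local_addr, '<->', accepted_peer_addr)
print('client address   :', client_address)

connection.close()
client.close()
listener.close()
```

The accepted socket is a separate object, yet it reports the same local address and port as the listening socket; the client address returned by accept() matches the accepted socket's peer address.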

When communication with a client is finished, the connection needs to be cleaned up using close(). This example uses a try/finally block to ensure that close() is always called, even in the event of an error.

```
while True:
    # Wait for a connection
    print('Waiting for a connection')
    connection, client_address = sock.accept()
    try:
        print('Connection from', client_address)
        # Pick a random line from the lines array (defined in the full program)
        line = lines[random.randrange(len(lines))]
        connection.sendall(line.encode())
        print(line)
        time.sleep(5)
    finally:
        # Clean up the connection
        connection.close()

```
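
An alternative way to get the same guarantee, offered here only as a sketch, is to use the connection as a context manager: Python socket objects close themselves when a with block exits, including when an exception is raised. The simulated RuntimeError and the loopback setup on port 0 are assumptions for this demonstration.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))    # port 0: OS-chosen free port (demo assumption)
listener.listen(1)

client = socket.create_connection(listener.getsockname())
connection, client_address = listener.accept()

try:
    with connection:
        connection.sendall(b'This is line 0\n')
        # Simulate a failure after sending, to show cleanup still happens.
        raise RuntimeError('simulated failure')
except RuntimeError:
    pass

# A closed socket object reports a file descriptor of -1.
closed_after_with = connection.fileno() == -1
received = client.recv(1024)

client.close()
listener.close()
print(closed_after_with, received)
```

Whether to prefer with or try/finally is largely a matter of style; try/finally leaves room for extra cleanup steps such as logging.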

Let's run this application. First, we need to combine the snippets above into one complete program, which follows the notes on IP addressing below.

**IPv4**
- 32-bit length/size
- Format: dotted-decimal notation
    - 192.168.1.1
- Number of addresses = 2^32

**IPv6**
- 128-bit length/size
- Format: hexadecimal notation
    - Full --> 2001:0000:130F:0000:0000:09C0:867A:130B
    - Leading zeroes removed --> 2001:0:130F:0:0:9C0:867A:130B
    - One run of 2 or more consecutive all-zero groups replaced by :: --> 2001:0:130F::9C0:867A:130B
- Number of addresses = 2^128

**Summary**

IPv4 is a 32-bit address scheme allowing for about 4.3 billion (2^32) unique addresses, represented in dotted-decimal format like 192.168.1.1. In contrast, IPv6 is a 128-bit address scheme, which increases the address space significantly (2^128 addresses), represented in hexadecimal format like 2001:0db8:85a3:0000:0000:8a2e:0370:7334.
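
These notations can be checked with the standard library's ipaddress module. The snippet below is a small illustration using the example addresses from the notes; the variable names are arbitrary.

```python
import ipaddress

v4 = ipaddress.ip_address('192.168.1.1')
v6 = ipaddress.ip_address('2001:0000:130F:0000:0000:09C0:867A:130B')

# max_prefixlen is the address length in bits for each family.
print(v4.version, v4.max_prefixlen)
print(v6.version, v6.max_prefixlen)

# exploded is the full form; compressed drops leading zeroes and
# replaces the longest run of all-zero groups with '::'.
print(v6.exploded)
print(v6.compressed)

# Size of each address space.
total_v4 = 2 ** v4.max_prefixlen
total_v6 = 2 ** v6.max_prefixlen
print(total_v4)
print(total_v6)
```

Note that ipaddress prints hexadecimal digits in lower case, so the compressed form appears as 2001:0:130f::9c0:867a:130b.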

```
# Import statements
import socket
import time
import random
import datetime

# An array holding 6 lines
lines = ["This is line 0",
         "This is line 1",
         "This is line 2",
         "This is line 3",
         "This is line 4",
         "This is line 5"]

# Create a TCP/IP socket
# AF_INET selects the IPv4 address family
# SOCK_STREAM selects a stream (TCP) socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Bind the socket to the port
# localhost normally resolves to 127.0.0.1
# ('', 9999) and ('0.0.0.0', 9999) would listen on all IPv4 interfaces instead
server_address = ('localhost', 9999)
print('[' + str(datetime.datetime.now()) + '] Starting streaming server on {}:{}'.format(*server_address))
sock.bind(server_address)

# IMPORTANT
# Listen for incoming connections
# The argument 1 is the backlog: how many not-yet-accepted connection requests may wait in the queue
# This server handles one client at a time; further requests wait in that queue
sock.listen(1)

connection = None
try:
    # Wait for a connection
    print('[' + str(datetime.datetime.now()) + '] Waiting for a connection...')
    # accept() returns a (socket, address) tuple; address is (IP address, port)
    connection, client_address = sock.accept()
    print('[' + str(datetime.datetime.now()) + '] Connection from', client_address)

    while True:
        # Get a random line from the lines array
        line = lines[random.randrange(len(lines))] + '\n'
        # sendall() keeps sending until all bytes are sent
        # encode() converts the string to UTF-8 bytes
        connection.sendall(line.encode())
        print('Data Sent: ' + line)
        time.sleep(1)
except (ConnectionError, KeyboardInterrupt):
    # The client disconnected or the cell was interrupted: stop streaming
    pass
finally:
    # Clean up the connection (if one was accepted) and the listening socket
    print('Closing server')
    if connection is not None:
        connection.close()
    sock.close()
```

```
[2024-04-30 04:21:30.404152] Starting streaming server on localhost:9999
[2024-04-30 04:21:30.405435] Waiting for a connection...
[2024-04-30 04:21:30.602727] Connection from ('127.0.0.1', 43470)
Data Sent: This is line 0

Data Sent: This is line 2

Data Sent: This is line 0

Data Sent: This is line 2

Data Sent: This is line 1

Data Sent: This is line 2

Data Sent: This is line 2

Data Sent: This is line 3

Data Sent: This is line 2

Data Sent: This is line 5

Data Sent: This is line 3

Data Sent: This is line 5

Data Sent: This is line 0

Data Sent: This is line 1

Data Sent: This is line 4

Data Sent: This is line 1

Data Sent: This is line 2

Data Sent: This is line 1

Data Sent: This is line 5

Data Sent: This is line 1

Data Sent: This is line 3

Data Sent: This is line 3

Data Sent: This is line 2

Data Sent: This is line 1

Data Sent: This is line 5

Data Sent: This is line 5

Data Sent: This is line 1

Data Sent: This is line 3

Data Sent: This is line 0

Data Sent: This is li
```

- connection: This is the socket object associated with the connection.
- sendall(): A method of the socket object that keeps sending until all the data has been sent or an error occurs.
- line.encode(): Encodes the string line into bytes, which is the required format for sending data over a socket.

```
connection.sendall(line.encode()) 
```
This statement sends the entire line string, encoded as bytes, over the established socket connection to the remote computer.
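
On the receiving side, the bytes have to be decoded again, and because TCP is a byte stream, one recv() call may return several lines or only part of one. The sketch below shows one way a client could read the newline-terminated lines. It is an illustration under assumptions: serve_lines is a stand-in for the streaming server that sends three fixed lines and closes, read_lines is a hypothetical helper, and port 0 lets the operating system choose a free port so the example runs on its own.

```python
import socket
import threading

def serve_lines(listener, lines):
    """Stand-in for the streaming server: send each line, then close."""
    connection, _addr = listener.accept()
    with connection:
        for line in lines:
            connection.sendall((line + '\n').encode())

def read_lines(sock):
    """Yield decoded lines from a socket until the peer closes it."""
    buffer = b''
    while True:
        chunk = sock.recv(4096)
        if not chunk:              # empty bytes: the server closed the connection
            break
        buffer += chunk
        # A chunk may hold several lines or a partial one, so split on newlines.
        while b'\n' in buffer:
            raw, buffer = buffer.split(b'\n', 1)
            yield raw.decode()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))    # port 0: OS-chosen free port (demo assumption)
listener.listen(1)

sent = ['This is line 0', 'This is line 2', 'This is line 5']
server = threading.Thread(target=serve_lines, args=(listener, sent))
server.start()

with socket.create_connection(listener.getsockname()) as client:
    received = list(read_lines(client))

server.join()
listener.close()
print(received)
```

To read from the streaming server in these notes instead, a client would connect to ('localhost', 9999) and iterate over read_lines(client); since that server never closes the connection on its own, the loop would run until the client is stopped.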

### Importing is important
```
import sys

from pyspark import SparkContext # main entry point for Spark
from pyspark.streaming import StreamingContext # main entry point for Spark Streaming
```