Part 1: Metrics and Plots (40 pts)

From the chosen X.pcap file, extract and generate the following metrics for the data as captured by your program when you replay the pcap using a tool such as tcpreplay:

1: Find the total amount of data transferred (in bytes), the total number of packets transferred, and the minimum, maximum, and average packet sizes. Also, show the distribution of packet sizes (e.g., by plotting a histogram of packet sizes).

In [5]:
from scapy.all import rdpcap, IP
import matplotlib.pyplot as plt

def analyze_packet_sizes(pcap_file):
    packets = rdpcap(pcap_file)
    
    # Sizes of all packets that carry an IP layer; non-IP frames are skipped
    size_list = [len(packet) for packet in packets if packet.haslayer(IP)]

    # Count only the IP packets so every metric covers the same set of packets
    total_packets = len(size_list)
    total_data = sum(size_list)
    min_size = min(size_list, default=0)
    max_size = max(size_list, default=0)

    # Calculate average packet size (0 when the capture has no IP packets)
    avg_size = total_data / total_packets if total_packets else 0

    # Print the results
    print(f"Total Data Transferred: {total_data} bytes")
    print(f"Total Packets: {total_packets}")
    print(f"Min Packet Size: {min_size} bytes")
    print(f"Max Packet Size: {max_size} bytes")
    print(f"Average Packet Size: {avg_size:.2f} bytes")

    # Plot the distribution of packet sizes
    if size_list:
        plt.hist(size_list, bins=50)
        plt.title("Packet Size Distribution")
        plt.xlabel("Packet Size (Bytes)")
        plt.ylabel("Frequency")
        plt.show()
    else:
        print("No packets with size information to plot.")


2: Find the unique source-destination pairs (source IP:port and destination IP:port) in the captured data.

In [6]:
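from scapy.all import TCP, UDP

def unique_source_dest_pairs(pcap_file):
    packets = rdpcap(pcap_file)
    unique_pairs = set()

    # Loop over all packets in the pcap file
    for packet in packets:
        if not packet.haslayer(IP):  # Only process packets with IP layer
            continue

        # Ports exist only for TCP and UDP; other IP traffic (e.g. ICMP) gets None
        if packet.haslayer(TCP):
            sport, dport = packet[TCP].sport, packet[TCP].dport
        elif packet.haslayer(UDP):
            sport, dport = packet[UDP].sport, packet[UDP].dport
        else:
            sport = dport = None

        # A pair is ((source IP, source port), (destination IP, destination port))
        unique_pairs.add(((packet[IP].src, sport), (packet[IP].dst, dport)))

    # Print the unique source-destination pairs
    print(f"Unique Source-Destination Pairs: {len(unique_pairs)}")
    print(unique_pairs)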


3: Display a dictionary where the key is the IP address and the value is the total number of flows for that IP address as the source. Similarly, display a dictionary where the key is the IP address and the value is the total number of flows for that IP address as the destination. Find out which source-destination pair (source IP:port and destination IP:port) has transferred the most data.

In [7]:
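from scapy.all import TCP, UDP

def flow_counts(pcap_file):
    packets = rdpcap(pcap_file)
    flows = set()

    # Collect each distinct flow (src IP, src port, dst IP, dst port) once,
    # so that the totals below count flows rather than packets
    for packet in packets:
        if not packet.haslayer(IP):  # Only process packets with IP layer
            continue
        if packet.haslayer(TCP):
            sport, dport = packet[TCP].sport, packet[TCP].dport
        elif packet.haslayer(UDP):
            sport, dport = packet[UDP].sport, packet[UDP].dport
        else:
            sport = dport = None  # e.g. ICMP has no ports
        flows.add((packet[IP].src, sport, packet[IP].dst, dport))

    # Count the flows in which each IP appears as source and as destination
    src_flow_dict = {}
    dst_flow_dict = {}
    for src, _, dst, _ in flows:
        src_flow_dict[src] = src_flow_dict.get(src, 0) + 1
        dst_flow_dict[dst] = dst_flow_dict.get(dst, 0) + 1

    # Print the flow counts
    print(f"Source Flow Counts: {src_flow_dict}")
    print(f"Destination Flow Counts: {dst_flow_dict}")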


4: List the top speed, in terms of pps and Mbps, at which your program is able to capture the content without any loss of data (i) when running both tcpreplay and your program on the same machine (VM), and (ii) when running on different machines. Two-student groups should run the programs on two different machines, e.g. tcpreplay on the physical machine of student 1 and the sniffer program on the physical machine of student 2. Single students should run between two VMs.

In [8]:
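from scapy.all import TCP, UDP

def max_data_flow(pcap_file):
    # Covers the last part of question 3: the pair that transferred the most data
    packets = rdpcap(pcap_file)
    flow_data_dict = {}

    # Loop over all packets in the pcap file
    for packet in packets:
        if not packet.haslayer(IP):  # Only process packets with IP layer
            continue
        if packet.haslayer(TCP):
            sport, dport = packet[TCP].sport, packet[TCP].dport
        elif packet.haslayer(UDP):
            sport, dport = packet[UDP].sport, packet[UDP].dport
        else:
            sport = dport = None  # e.g. ICMP has no ports

        # Accumulate bytes per ((src IP, src port), (dst IP, dst port)) pair
        flow_pair = ((packet[IP].src, sport), (packet[IP].dst, dport))
        flow_data_dict[flow_pair] = flow_data_dict.get(flow_pair, 0) + len(packet)

    # max() raises ValueError on an empty dict, so handle that case first
    if not flow_data_dict:
        print("No IP packets found; no flow to report.")
        return

    # Find the flow with the maximum data transferred
    max_flow = max(flow_data_dict, key=flow_data_dict.get)
    print(f"Source-Destination Pair with Most Data: {max_flow}")
    print(f"Total Data Transferred: {flow_data_dict[max_flow]} bytes")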
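
None of the cells above computes the pps and Mbps figures that question 4 asks for. The following is a minimal sketch of one way to derive them, assuming the sniffer records a (timestamp, size) tuple for every captured packet. The names capture_rates and loss_fraction are hypothetical helpers, not part of scapy or tcpreplay, and the rate is taken over the interval between the first and last captured packet, which is only one of several reasonable definitions. The number of packets sent has to come from the replay side (for example from the summary tcpreplay prints when it finishes).

```python
def capture_rates(records):
    """Return (pps, mbps) for a list of captured packets.

    records: iterable of (timestamp_seconds, size_bytes) tuples. With scapy
    these could be built as [(float(p.time), len(p)) for p in packets].
    """
    records = list(records)
    if len(records) < 2:
        return 0.0, 0.0  # a rate needs at least two timestamps

    times = [t for t, _ in records]
    duration = max(times) - min(times)
    if duration <= 0:
        return 0.0, 0.0  # all packets share a single timestamp

    total_bytes = sum(size for _, size in records)
    pps = len(records) / duration
    mbps = total_bytes * 8 / duration / 1_000_000  # megabits per second
    return pps, mbps


def loss_fraction(sent, captured):
    """Fraction of replayed packets that the sniffer did not capture."""
    if sent <= 0:
        return 0.0
    return max(sent - captured, 0) / sent


# Made-up example data: four 1000-byte packets captured over two seconds
example = [(0.0, 1000), (0.5, 1000), (1.0, 1000), (2.0, 1000)]
pps, mbps = capture_rates(example)
print(f"{pps:.1f} pps, {mbps:.3f} Mbps")
print(f"loss: {loss_fraction(sent=100, captured=95):.1%}")
```

To find the top loss-free speed, one approach is to repeat the replay at increasing rates and report the highest rate at which loss_fraction is still zero, once for the same-machine setup and once for the two-machine setup.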
