# Milestone 1 - NFStream Code Walkthrough 
## Name: Alexander James, Joshua Ludolf, & Matthew Trevino
``Date: 02-25-2025``
### Description of this file:
This Jupyter Notebook walks through using the `nfstream` library for network traffic analysis. It contains the following sections:

1. **Installation of Requirements**: Installing the necessary `nfstream` library.
2. **Importing Libraries**: Importing the required libraries for network traffic analysis.
3. **Library Information**: Detailed information about the `nfstream` library and its key features.
4. **Main Function**: Defining and executing the main function to read a pcap file, analyze the network flows, and convert the flows into a pandas DataFrame for further analysis and visualization.


In [1]:
# Installing Requirements
%pip install nfstream

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Importing the required libraries
from nfstream import NFStreamer
import pandas as pd
import os

## More Information about the nfstream Library

The `nfstream` library is a Python package designed for network traffic analysis. It provides a high-level interface to capture, analyze, and process network flows from pcap files or live network interfaces. The library is built on top of the nDPI (ntop Deep Packet Inspection) library, which allows it to perform deep packet inspection and extract detailed information from network traffic.

### Key Features:
- **Flow-based Analysis**: Extracts and processes network flows, which are sequences of packets sharing common attributes such as source and destination IP addresses, ports, and protocols.
- **Deep Packet Inspection**: Uses nDPI to classify traffic and extract metadata from packets, including application protocols, SSL/TLS information, and more.
- **Customizable**: Allows users to define custom BPF (Berkeley Packet Filter) filters, set snapshot lengths, and configure various timeout settings for flow analysis.
- **Integration with Pandas**: Converts network flows into Pandas DataFrames for easy manipulation, analysis, and visualization.
- **Performance Reporting**: Provides performance metrics and reports for the analyzed traffic.
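
To make the idea of a flow concrete, here is a minimal pure-Python sketch of flow-based aggregation: packets that share the same endpoints and protocol are counted as one bidirectional flow. This is not how `nfstream` is implemented internally. The packet tuples, `flow_key`, and `aggregate_flows` are hypothetical names and sample data invented for illustration.

```python
from collections import defaultdict

# Hypothetical packet records (made-up sample data, not taken from demo.pcap):
# (timestamp_s, src_ip, src_port, dst_ip, dst_port, protocol, size_bytes)
packets = [
    (0.0, "10.0.0.1", 51000, "10.0.0.2", 443, 6, 60),
    (0.1, "10.0.0.2", 443, "10.0.0.1", 51000, 6, 1500),
    (0.2, "10.0.0.1", 51000, "10.0.0.2", 443, 6, 52),
    (0.3, "10.0.0.3", 5353, "224.0.0.251", 5353, 17, 120),
]


def flow_key(src_ip, src_port, dst_ip, dst_port, protocol):
    """Build a direction-independent key so both directions map to one flow."""
    endpoint_a, endpoint_b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (endpoint_a, endpoint_b, protocol)


def aggregate_flows(packet_records):
    """Group packets by flow key and accumulate simple per-flow counters."""
    flows = defaultdict(
        lambda: {"packets": 0, "bytes": 0, "first_seen": None, "last_seen": None}
    )
    for ts, src_ip, src_port, dst_ip, dst_port, protocol, size in packet_records:
        flow = flows[flow_key(src_ip, src_port, dst_ip, dst_port, protocol)]
        flow["packets"] += 1
        flow["bytes"] += size
        if flow["first_seen"] is None:
            flow["first_seen"] = ts  # timestamp of the first packet in the flow
        flow["last_seen"] = ts       # updated on every packet
    return dict(flows)


flows = aggregate_flows(packets)
for key, stats in flows.items():
    print(key, stats)
```

In this sketch the three TCP packets between `10.0.0.1` and `10.0.0.2` collapse into a single flow, while the UDP packet forms its own. A real flow exporter would also expire flows using idle and active timeouts, which this sketch omits.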



In [3]:
# Defining the main function
def main():
    pcap_file = os.path.join(os.getcwd(), "demo.pcap")

    # Check if the file exists
    if not os.path.isfile(pcap_file):
        raise FileNotFoundError(f"The file {pcap_file} does not exist.")

    # Create an NFStreamer object that reads the pcap file and extracts flows
    my_streamer = NFStreamer(source=pcap_file,
                             decode_tunnels=True,
                             bpf_filter=None,
                             promiscuous_mode=True,
                             snapshot_length=1093,
                             idle_timeout=120,
                             active_timeout=1800,
                             accounting_mode=0,
                             udps=None,
                             n_dissections=20,
                             statistical_analysis=False,
                             splt_analysis=0,
                             n_meters=0,
                             max_nflows=0,
                             performance_report=0,
                             system_visibility_mode=0,
                             system_visibility_poll_ms=100)
    

    # Convert the flows to a pandas DataFrame and use the 'id' column as the index
    my_dataframe = my_streamer.to_pandas(columns_to_anonymize=[]).set_index('id')
    print(f"\n{my_dataframe}")

    # Export the flows to a CSV file and keep the returned flow count.
    # With path=None, nfstream is expected to choose the output path itself
    # (per its documentation, the source path with ".csv" appended).
    total_flows_count = my_streamer.to_csv(path=None, columns_to_anonymize=[], flows_per_file=0, rotate_files=0)
    
if __name__ == '__main__':
    main()


    expiration_id           src_ip            src_mac   src_oui  src_port  \
id                                                                          
0               0   196.72.132.179  00:0c:41:82:b2:55  00:0c:41         0   
1               0  254.142.107.185  00:0c:41:82:b2:55  00:0c:41         0   
2               0     71.34.18.237  00:0d:93:82:36:3a  00:0d:93         0   
3               0      95.23.88.82  00:0c:41:82:b2:55  00:0c:41         0   
4               0    67.197.56.252  00:0d:93:82:36:3a  00:0d:93         0   
5               0       147.4.3.34  00:0c:41:82:b2:55  00:0c:41         0   
6               0   191.235.101.72  00:0d:93:82:36:3a  00:0d:93         0   
7               0  129.243.140.209  00:0c:41:82:b2:55  00:0c:41         0   

             dst_ip            dst_mac   dst_oui  dst_port  protocol  ...  \
id                                                                    ...   
0     62.141.236.37  00:0c:41:82:b2:53  00:0c:41         0       235  ... 