
---

## **Part 1 — Exploring Different Modalities / Representations of Network Traffic**

Synthetic network traffic generation is useful for many applications such as dataset augmentation, network testing, and resource management. Many existing generation methods treat traffic generation as either a time-series prediction task or an autoregressive modeling task. In these approaches, models are trained directly on structured representations of packets—one common example is **nPrint**, a tabular format that encodes packet header fields from PCAP traces into machine-learning-friendly numerical vectors.

In an **nPrint**, each row represents one packet in a trace. The entire nPrint file (a large CSV) represents all packets of that trace in sequential order. Header fields are encoded bit by bit: each column holds one header bit, stored as `1`, `0`, or `-1` (the bit belongs to a header that is absent from that packet), making the rows easy to feed into ML models.
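
As a rough illustration of this encoding, the sketch below turns two header fields into bit columns and marks a missing header with `-1`. The helper names and the most-significant-bit-first ordering are assumptions made for this example, not the nPrint tool's own code.

```python
def field_to_bits(value, width):
    """Return `width` bits of `value`, most significant bit first (assumed order)."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def absent_field(width):
    """Bits of a header that is not present in the packet are marked with -1."""
    return [-1] * width


# Hypothetical toy packet: IPv4 version 4, header length 5, and a 4-bit field
# from a header that this packet does not carry.
row = field_to_bits(4, 4) + field_to_bits(5, 4) + absent_field(4)
print(row)  # [0, 1, 0, 0, 0, 1, 0, 1, -1, -1, -1, -1]
```

A real nPrint row has the same shape, just with far more columns (one per header bit).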

However, general-purpose ML models—even advanced time-series architectures and transformers—often struggle to capture the *complex dependencies* present in real network traffic:

* **Local dependencies**: relationships among columns within a single row (i.e., dependencies among header fields of a single packet).
* **Global dependencies**: relationships across rows (i.e., the evolution of packets within the same flow).
  For example: if the first packet of a flow uses TCP, the second packet in that same flow should also be TCP; sequence numbers, flags, and flow identifiers evolve in structured ways over time.
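
The two kinds of dependency can be made concrete with a small sketch. The packets below are decoded into integers for readability, and the field names, values, and the one-directional sequence-number rule (which ignores SYN/FIN) are simplifying assumptions for illustration only.

```python
# Toy flow; field names are hypothetical, not real nPrint column names.
flow = [
    {"proto": 6, "ip_len": 40, "hdr_len": 40, "payload_len": 0, "seq": 1000},
    {"proto": 6, "ip_len": 160, "hdr_len": 40, "payload_len": 120, "seq": 1000},
    {"proto": 6, "ip_len": 120, "hdr_len": 40, "payload_len": 80, "seq": 1120},
]


def local_ok(pkt):
    """Local dependency: fields inside one packet must agree with each other."""
    return pkt["ip_len"] == pkt["hdr_len"] + pkt["payload_len"]


def global_ok(packets):
    """Global dependency: fields must evolve consistently from row to row."""
    same_proto = len({p["proto"] for p in packets}) == 1
    seq_advances = all(
        nxt["seq"] == cur["seq"] + cur["payload_len"]
        for cur, nxt in zip(packets, packets[1:])
    )
    return same_proto and seq_advances


print(all(local_ok(p) for p in flow), global_ok(flow))  # True True
```

A generated trace has to pass both kinds of check at once, which is what makes the task hard for models that only look along one axis.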

While sequential ML models struggle with these multi-scale dependencies, **vision models** (e.g., diffusion models) excel at capturing both local and global structure when data is presented spatially—like an image. By converting nPrint traces into 2D PNG images, we can take advantage of the strong representational capabilities of image models and generate synthetic traffic using visual generative approaches.

---

### **Your Task for Part 1**

In this part of the assignment, you will:

1. **Inspect the raw nPrint files** (found in the `real_nprints` directory).
2. **Understand how each CSV row and column corresponds to packet-level metadata.**
3. **Follow the provided conversion pipeline** that transforms these nPrint CSVs into PNG image representations suitable for use with diffusion models and other visual architectures.
4. **Take an already generated set of image representations and convert them back into nPrint representation for use in downstream tasks.**

This will help you understand why converting network traces to images can unlock generative modeling capabilities that traditional ML approaches struggle with.

Q1:
First, download and unzip the data you will need from (https://drive.google.com/file/d/1hY6nNXEYOwl1l-O_nCknO9xezcHr6ZXi/view?usp=sharing)
In your own words, describe how an nPrint CSV encodes a network trace.
Why might a 2D image representation capture structural relationships that a row-by-row CSV cannot?

---
**Answer:** **An nPrint CSV encodes a network trace by turning every packet into one row of numeric features.** Each column corresponds to a specific header field or a one-hot indicator for a possible value of that field. The rows appear in the same order as the packets in the original trace. This means the CSV represents both the packet metadata and the sequence of packets, but only in a flat, table-like form.

A **2D image representation** can capture structure that a row-by-row CSV cannot. In an image, nearby pixels can represent related packet fields or adjacent packets. Vision models are very good at learning spatial patterns, so they can learn both local structure within a packet and global structure across packets. The image format also lets the model see correlations across many rows and columns at once, instead of reading the trace one packet at a time. This makes long-range dependencies easier to learn.

---

Q2. Design a method for converting nPrint representations of traces (in the folder real_nprints) into image representations.
Your image representation should use only the first 1024 packets from each trace to avoid producing images that are too large.
Save the images into a folder called './nd_data/student_converted_images'

In [None]:
# The following is a pre-defined script used in NetDiffusion that will convert all of the provided real nprints into image representations. Run this code and observe the output
# !python ./scripts/nprint_to_png.py -i ./data/nd_data/real_nprints/ -o ./data/nd_data/real_traffic_images 

I ran the script, observed the output, and committed it to GitHub. I then commented the call out because it prints a lot of output that would make this notebook unnecessarily long.

In [None]:
import os
import pandas as pd
import numpy as np
from PIL import Image

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)


INPUT_DIR = "data/nd_data/real_nprints"
OUTPUT_DIR = "data/nd_data/student_converted_images"
MAX_PACKETS = 1024  # first N packets to consider

os.makedirs(OUTPUT_DIR, exist_ok=True)


nprint_files = [
    os.path.join(INPUT_DIR, f)
    for f in os.listdir(INPUT_DIR)
    if f.endswith(".nprint")
]

def nprint_to_image(nprint_path, output_path, max_packets):
    """
    Convert an nPrint CSV into a grayscale PNG image.

    Assumes the nPrint is mostly binary (0/1) and pads missing
    packets with -1, which we map to mid-gray.
    """
    try:
        df = pd.read_csv(nprint_path, header=0)

        # Drop unnamed index column if it exists
        df = df.drop(columns=["Unnamed: 0"], errors="ignore")

        data_matrix = df.astype(np.float32).values

        # Keep only first max_packets packets
        data_matrix = data_matrix[:max_packets, :]

        if data_matrix.shape[0] < max_packets:
            padding_rows = max_packets - data_matrix.shape[0]
            num_cols = data_matrix.shape[1]
            padding = np.full((padding_rows, num_cols), -1.0, dtype=np.float32)
            data_matrix = np.vstack([data_matrix, padding])

        img_array = np.zeros_like(data_matrix, dtype=np.uint8)

        # Here, I prompted Google Gemini for the mapping scheme:
        # Map values:
        #   1  -> 255 (white)
        #   0  ->   0 (black)
        #  -1  -> 127 (mid-gray: padding rows and any -1 already in the nPrint)
        # Any other value keeps the zero initialization above (black).
        img_array[data_matrix == 1.0] = 255
        img_array[data_matrix == 0.0] = 0
        img_array[data_matrix == -1.0] = 127

        img = Image.fromarray(img_array, mode="L")
        img.save(output_path)

        # print(f"Converted: {os.path.basename(nprint_path)} -> {output_path}")

    except Exception as e:
        print(f"Error processing {nprint_path}: {e}")

print(f"Starting conversion of {len(nprint_files)} nPrint files...")
for nprint_path in nprint_files:
    base_name = os.path.basename(nprint_path)
    output_filename = base_name.replace(".nprint", ".png")
    output_path = os.path.join(OUTPUT_DIR, output_filename)

    nprint_to_image(nprint_path, output_path, MAX_PACKETS)

print("Conversion process finished.")


Starting conversion of 200 nPrint files...
Conversion process finished.


Q3: Now that you have seen how NetDiffusion converts nPrints into images, compare their method with the approach you designed in Q2. What are the advantages and disadvantages of each, especially in terms of what might help or hinder a vision model’s ability to learn?

---
**Answer:** In my approach, I read each nPrint CSV, kept only the first 1024 packets, and padded shorter traces with rows of −1. I then mapped values into a grayscale image, using white for 1, black for 0, and a uniform mid-gray for padding, so every trace becomes the same size and visually separates “real” data from padding. This produces a simple, binary-style image that looks like a packet-by-feature grid.

This makes the representation easy to interpret, and a model can clearly tell packet rows from padding rows. It also forces each pixel to carry simple binary information, which might make it easier for a vision model to learn basic packet-feature patterns. The downside is that detail is lost: nPrint already uses -1 for header bits that are absent from a packet, so those bits get the same mid-gray as padding, and any value other than 1, 0, or -1 falls back to black. The model may therefore miss richer structure in the original nPrint.

NetDiffusion’s official converter uses a **multi-channel RGBA image**, which stores more information per pixel. This is more expressive and better aligned with typical vision backbones, so a diffusion model can potentially capture more complex traffic structure. The tradeoff is that the result is less interpretable and may be harder for a model to learn because it must disentangle more channels and intensity values.
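
To make the comparison concrete, here is a small sketch that encodes the same toy rows with both schemes and measures how far apart the code values are. The RGBA palette is an assumption for illustration (red for 1, green for 0, blue for -1); the distances are only a rough indication of how much color drift each scheme could tolerate before two values become confusable.

```python
import math
from itertools import combinations

# Grayscale codes from my Q2 converter; RGBA palette assumed for illustration.
GRAY_CODES = {1: (255,), 0: (0,), -1: (127,)}
RGBA_CODES = {1: (255, 0, 0, 255), 0: (0, 255, 0, 255), -1: (0, 0, 255, 255)}


def encode(rows, codes):
    """Map every nPrint value in `rows` to its pixel code."""
    return [[codes[v] for v in row] for row in rows]


def min_code_distance(codes):
    """Smallest Euclidean distance between any two pixel codes."""
    return min(math.dist(a, b) for a, b in combinations(codes.values(), 2))


rows = [[1, 0, -1], [0, 0, 1]]
gray_img = encode(rows, GRAY_CODES)
rgba_img = encode(rows, RGBA_CODES)

print(min_code_distance(GRAY_CODES))            # 127.0
print(round(min_code_distance(RGBA_CODES), 1))  # 360.6
```

In this toy setup the color codes sit further apart than the gray levels, which is one possible reading of "more expressive" in the paragraph above.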


---

## **Part 2 — Converting Generated Images Back Into Usable Format**

In the first part of the assignment, you explored how network traces in nPrint format can be transformed into image representations suitable for vision-based generative models. In Part 2, we focus on the reverse process: taking synthetic images produced by these models and converting them back into structured network representations.

This step is crucial because real-world applications do not operate on images—they require valid, interpretable packet traces that can be analyzed, replayed, or integrated into downstream tools.

---

### **Your Task for Part 2**

In this part of the assignment, you will:

1. **Convert generated images back into the original nPrint representation.**
   You will follow a scripted pipeline that translates pixel intensities and color channels back into binary header fields, reconstructing the packet-level structure of the trace.

2. **Apply essential post-processing techniques to correct errors introduced by diffusion models.**
   Generated images are rarely perfect—vision models may introduce color drift, pixel misalignment, noise, or structural artifacts.
   You will observe how heuristic correction, formatting enforcement, and reconstruction steps ensure that the converted nPrints become:

   * syntactically valid,
   * structurally consistent,
   * and replayable.

Across this section, your goal is to understand **why the reverse transformation is fragile**, which types of artifacts break reversibility, and how post-processing logic helps repair or compensate for generative errors.

To keep this assignment simple, we have trained the model and generated the images for you. If you have sufficient GPU access and want to try fine-tuning the model and generating the images yourself, feel free to take a look at the public repo (https://github.com/noise-lab/NetDiffusion).


Q4: We have taken the images converted by NetDiffusion and trained a LoRA-fine-tuned Stable Diffusion model (with ControlNet) to generate synthetic traffic images for you. These generated samples are stored in generated_traffic_images/.
Compare these generated images visually with the real images you saw earlier in real_traffic_images/.

Do you notice anything different between the real and generated traffic images? What immediately stands out as potentially problematic if we attempt to convert these generated images back into nPrint format? (Descriptive Only)

---
**Answer:** When I compare the generated images with the real ones, the **real_traffic_images samples feel much cleaner and more regular**. The vertical stripes look crisp and evenly spaced, and the colors fall into very clear bands. The generated samples look a bit softer, almost as if the images had been slightly blurred, and some of the colors blend into nearby areas. They also feel lower in resolution even though the dimensions are the same, as if the details were washed out.

This becomes a concern because each pixel in a real nPrint is tied to a specific field in the packet header. The diffusion model does not reproduce that pixel exactly and seems to create small distortions that look harmless at first. Tiny changes in color or location might break the mapping back into a binary field. When that happens, the conversion step might not know whether a pixel represents a zero or a one.
Because of that, the generated image may look visually fine but might not convert back into a valid packet trace without extra repair steps.

---

Q5: If you were to design a method to convert generated images back into nPrints, how would you do it? Explain your approach and describe how your method addresses the concerns you raised in Q4. (Descriptive only)

---
**Answer:** I would **treat the generated image as a noisy version of a clean nPrint image** and snap it back onto the legal values in four steps:

1. Make sure the image size matches what a real nPrint image should look like, so packets map to rows and features map to columns.
2. Cluster the pixel colors into the few allowed colors, such as the colors that represent 0, 1, or padding in the real data, and replace every pixel with the valid color it is closest to. That removes the color drift and blended gradients from Q4.
3. Read those pixels back into numbers the same way the original script did.
4. If there are obvious glitches (random speckles, rows with illegal values), smooth them by looking at neighbors or by checking basic rules about which values are allowed.
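
A minimal sketch of the snapping idea, assuming a fixed codebook of three colors instead of learned clusters. The colors (red for 1, green for 0, blue for -1) and the function name are assumptions for illustration, not the provided scripts.

```python
import numpy as np

# Assumed codebook: one RGB color per legal nPrint value (illustrative choice).
CODEBOOK = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.float32)
VALUES = np.array([1, 0, -1])


def snap_to_codebook(img):
    """img: (H, W, 3) uint8 array. Returns (snapped image, decoded values)."""
    pixels = img.reshape(-1, 1, 3).astype(np.float32)
    # Squared distance from every pixel to every codebook color: shape (H*W, 3).
    dists = ((pixels - CODEBOOK[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    snapped = CODEBOOK[nearest].astype(np.uint8).reshape(img.shape)
    decoded = VALUES[nearest].reshape(img.shape[:2])
    return snapped, decoded


# Toy 2x2 "generated" image with drifted colors.
noisy = np.array(
    [[[240, 20, 10], [30, 200, 60]],
     [[10, 40, 220], [120, 130, 20]]],
    dtype=np.uint8,
)
snapped, decoded = snap_to_codebook(noisy)
print(decoded.tolist())  # [[1, 0], [-1, 0]]
```

The last pixel is the interesting case: it is a muddy mix of red and green, and the snap still has to commit to one value, which is where neighbor checks or validity rules would come in.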

---

Below is a set of pre-written scripts that perform the necessary post-generation augmentation and processing on the synthetic images you obtained from the diffusion model. These scripts handle tasks such as color normalization/augmentation and conversion from generated images back into nPrint format.
(The PCAP step is optional — you may run it if you are interested in observing or replaying the reconstructed traffic.)

In [17]:
# Step 1: Color Augmentation
!python ./scripts/color_processor.py \
  --input_dir="data/nd_data/generated_traffic_images" \
  --output_dir="data/nd_data/color_corrected_generated_traffic_images"

Processed 100 images.


In [None]:
# Step 2: Image-to-nPrint Conversion
# !python ./scripts/image_to_nprint.py \
#   --org_nprint ./scripts/column_example.nprint \
#   --input_dir ./data/nd_data/color_corrected_generated_traffic_images \
#   --output_dir ./data/nd_data/generated_nprint


# The output is omitted because it takes up too much space in the notebook.
# The results are in ./data/nd_data/generated_nprint.

Q6: You may now read through the provided scripts in color_processor.py (color augmentation / normalization) and image_to_nprint.py (image → nPrint reconstruction).
How do the post-processing methods implemented in these scripts compare to the approach you proposed in Q5?
Describe the pros and cons of both methods and highlight any differences in design philosophy, robustness, or assumptions.

---
**Answer:**

My Q5 idea and the provided scripts are trying to solve the same problem, but they come at it from slightly different angles. 

**In Q5, I imagined a more general “denoising” pipeline**: align the image, then cluster pixel colors into a small set of allowed values, then snap each pixel to the nearest valid color, and finally clean up obvious glitches using neighbor information or simple rules. The mindset there is basically to treat the diffusion output as noisy data and use statistics plus consistency checks to push it back onto a discrete codebook.

**The actual scripts feel more hard-coded and direct.** The color processor does not learn clusters. It uses hand-picked thresholds on R, G, and B to force every pixel to be exactly red, green, or blue, and otherwise picks whichever channel is largest. The `image_to_nprint.py` script then maps those exact RGBA values to integers and rebuilds the nPrint using the column layout from a reference file. There is no explicit neighbor smoothing or semantic validation. The pipeline assumes that after thresholding, the pixels are already “clean enough” to treat as valid bits.

So **my method is more flexible** and tries to be robust when the generator drifts further from the ideal colors, since clustering and neighbor voting can adapt to different noise patterns. The downside is extra complexity and more computation. The provided scripts are fast to run, but they rely on stronger assumptions. They expect the generated images to stay close to the original color scheme. In that sense, my Q5 design is more robust, while the given code chooses a more straightforward mapping that works well as long as the model has not wandered too far from the training distribution.
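
The neighbor-voting part of my proposal, which the provided scripts do not have, could look roughly like the sketch below. The function name and window size are assumptions, and the approach has a clear risk: header bits can legitimately change from one packet to the next, so aggressive smoothing may erase real information.

```python
from collections import Counter


def smooth_column(values, window=1):
    """Replace each value with the strict majority among its vertical neighbors.

    `values` is one image column (the same header bit across consecutive
    packets). Ties keep the original value.
    """
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        counts = Counter(values[lo:hi])
        best, best_count = values[i], counts[values[i]]
        for value, count in counts.items():
            if count > best_count:
                best, best_count = value, count
        out.append(best)
    return out


column = [1, 1, 0, 1, 1, -1, -1, -1]  # the lone 0 looks like a speckle
print(smooth_column(column))  # [1, 1, 1, 1, 1, -1, -1, -1]
```

The isolated 0 is voted away while the boundary between the 1s and the -1 padding stays where it was.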



---

## **Part 3 — Using Real and Synthetic nPrints for Application Classification**

In the previous parts, you learned how network traces can be converted between nPrint and image representations, generated using diffusion models, and reconstructed back into nPrint format.
Now, you will evaluate how useful these generated nPrints are for downstream **machine learning tasks**.

Each nPrint file—whether real or generated—is labeled with the **application** that produced the traffic (e.g., `amazon_1.nprint` means this sample came from Amazon traffic).
In this section, you will treat each **entire nPrint file as a single sample** and build a simple ML pipeline to classify application labels.

To simplify the task, you will restrict your model to use **only the first 3 packets** (3 rows) from each nPrint.
This mimics “early packet classification,” where only the beginning of a flow is available.
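
As a sketch of what "first 3 packets as one sample" means in practice, the helper below keeps the first `k` rows of a trace and flattens them into one feature vector. Padding short traces with `-1` is my own assumption here so that every sample has the same length; the function name is hypothetical.

```python
import numpy as np


def first_k_packets(matrix, k=3, pad_value=-1):
    """Flatten the first k rows of a (packets x features) matrix into one vector."""
    matrix = np.asarray(matrix)
    head = matrix[:k]
    if head.shape[0] < k:
        # Assumed padding rule: fill missing packets with pad_value.
        pad = np.full((k - head.shape[0], matrix.shape[1]), pad_value, dtype=matrix.dtype)
        head = np.vstack([head, pad])
    return head.reshape(-1)


short_trace = [[1, 0, 1, 0], [0, 0, 1, 1]]  # only two packets, four features each
features = first_k_packets(short_trace)
print(features.tolist())  # [1, 0, 1, 0, 0, 0, 1, 1, -1, -1, -1, -1]
```

With 3 packets the vector length is always 3 times the number of nPrint columns, regardless of how long the original trace was.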

---


Q7:

You now have access to both `real_nprints/` and `generated_nprint/`.
Notice that in both directories, files are labeled using the application associated with that nPrint (e.g., `amazon_1.nprint`).
Treat each **nPrint file** as one sample.

**Design an ML pipeline that trains a model using *synthetic nPrints* (from `generated_nprint/`) and evaluates its performance on *real nPrints* (from `real_nprints/`) to predict the correct application label.**

Your pipeline should:

1. Use only the **first 3 packets (first 3 rows)** of each nPrint file as input features.
2. Train a classifier on real data.
3. Test the classifier on generated data.
4. Report how well the classifier performs.

We have already written the script to load the real and generated nPrints into DataFrames for you.

Note: **As noted by Luca on Slack, the wording above is slightly inconsistent: the task statement says to train on synthetic data and evaluate on real data, while steps 2 and 3 say the opposite. In class, the motivation has been to use synthetic data to augment training and then check whether the model generalizes to real traffic, so I use the more sensible interpretation: train on synthetic data and evaluate on real data.**

In [21]:
import os
import glob
import numpy as np
import pandas as pd

# ---------------------------------------------------------------
# Safe conversion for nPrint cell
# ---------------------------------------------------------------
def safe_convert(x):
    if pd.isna(x):
        return 0
    x = str(x).strip()

    if x in ["0", "1", "-1"]:
        return int(x)

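    # Short bit strings (e.g. "0101") are read as binary; the `x` guard
    # keeps an empty string from reaching int("", 2), which would raise.
    if x and all(c in "01" for c in x) and len(x) <= 16:
        return int(x, 2)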

    if x.lstrip("-").isdigit():
        return int(x)

    return 0


# ---------------------------------------------------------------
# Get original column names from a reference nPrint
# ---------------------------------------------------------------
def get_original_columns(example_path="data/nd_data/real_nprints"):
    first_file = glob.glob(os.path.join(example_path, "*.nprint"))[0]

    df = pd.read_csv(first_file)

    # Drop index column if present (like "Unnamed: 0")
    if df.columns[0].lower().startswith("unnamed"):
        df = df.drop(df.columns[0], axis=1)

    return list(df.columns)


# ---------------------------------------------------------------
# Load nPrint → first 3 rows → flatten with prefixed column names
# ---------------------------------------------------------------
def load_nprint_with_colnames(path, base_cols, num_rows=3):
    df = pd.read_csv(path, dtype=str, low_memory=False)

    # Drop "Unnamed: 0" if present
    if df.columns[0].lower().startswith("unnamed"):
        df = df.drop(df.columns[0], axis=1)

    df = df.iloc[:num_rows, :]            # first 3 packets  
    df = df.map(safe_convert)             # clean convert  

    # Build prefixed column names
    pkt_cols = []
    for pkt in range(1, num_rows + 1):
        pkt_cols.extend([f"pkt{pkt}_{c}" for c in base_cols])

    # Flatten 3×columns into 1 vector
    flat = df.values.flatten()

    return flat, pkt_cols


# ---------------------------------------------------------------
# Load entire directory into a DataFrame (with labels)
# ---------------------------------------------------------------
def load_directory_as_df(directory, base_cols):
    rows = []
    labels = []
    colnames_set = None

    for path in glob.glob(os.path.join(directory, "*.nprint")):
        label = os.path.basename(path).split("_")[0]

        flat, cn = load_nprint_with_colnames(path, base_cols)
        rows.append(flat)
        labels.append(label)

        if colnames_set is None:
            colnames_set = cn   # only set once

    df = pd.DataFrame(rows, columns=colnames_set)
    df["label"] = labels
    return df


# ---------------------------------------------------------------
# FINAL: Load real + synthetic DataFrames
# ---------------------------------------------------------------
base_cols = get_original_columns("data/nd_data/real_nprints")

df_synth = load_directory_as_df("data/nd_data/generated_nprint", base_cols)
df_real  = load_directory_as_df("data/nd_data/real_nprints", base_cols)

print("Synthetic DF:", df_synth.shape)
print(df_synth.head())

print("\nReal DF:", df_real.shape)
print(df_real.head())


Synthetic DF: (100, 3265)
   pkt1_ipv4_ver_0  pkt1_ipv4_ver_1  pkt1_ipv4_ver_2  pkt1_ipv4_ver_3  \
0                1                0                0                0   
1                1                0                0                0   
2                0                0                0                0   
3                0                0                0                0   
4                1                1                1                1   

   pkt1_ipv4_hl_0  pkt1_ipv4_hl_1  pkt1_ipv4_hl_2  pkt1_ipv4_hl_3  \
0               0               0               0               1   
1               0               0               0               1   
2               0               0               0               0   
3               0               0               0               0   
4               1               1               1               1   

   pkt1_ipv4_tos_0  pkt1_ipv4_tos_1  ...  pkt3_icmp_roh_23  pkt3_icmp_roh_24  \
0                0                0  ...

In [25]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report


X_train = df_synth.drop(columns=["label"]).values
y_train_str = df_synth["label"].values

X_test = df_real.drop(columns=["label"]).values
y_test_str = df_real["label"].values

# Encode labels as integers
le = LabelEncoder()
y_train = le.fit_transform(y_train_str)
y_test = le.transform(y_test_str)  

clf = make_pipeline(
    StandardScaler(),             # scale all features
    LogisticRegression(max_iter=1000)
)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Accuracy on REAL nprints when training on SYNTHETIC:", acc)

print("\nDetailed report:")
print(classification_report(y_test, y_pred, target_names=le.classes_))


Accuracy on REAL nprints when training on SYNTHETIC: 0.205

Detailed report:
              precision    recall  f1-score   support

      amazon       0.00      0.00      0.00        20
    facebook       0.17      0.15      0.16        20
   instagram       0.17      0.05      0.08        20
        meet       0.55      0.30      0.39        20
     netflix       0.00      0.00      0.00        20
       teams       0.18      0.60      0.28        20
      twitch       0.00      0.00      0.00        20
     twitter       0.20      0.95      0.33        20
     youtube       0.00      0.00      0.00        20
        zoom       0.00      0.00      0.00        20

    accuracy                           0.20       200
   macro avg       0.13      0.20      0.12       200
weighted avg       0.13      0.20      0.12       200



  raw_prediction = X @ weights.T + intercept  # ndarray, likely C-contiguous
  grad[:, :n_features] = grad_pointwise.T @ X + l2_reg_strength * weights
  ret = a @ b
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


**What I did.**
I loaded the synthetic and real `.nprint` files and used the given helper functions to keep only the first three packets from each file. Each 3×columns matrix was flattened into one feature vector and labeled by the application name in the filename. I then used logistic regression to fit a model on the synthetic data and tested it only on the real traces.

**What happened.**
The accuracy on real data ended up around 0.20, only about twice the 0.10 chance level for ten balanced classes, and the model mostly predicted one or two labels. Five classes (amazon, netflix, twitch, youtube, zoom) have zero recall. Logistic regression also printed a number of numerical warnings, which may mean it struggled with the range of values produced by the reconstruction step even after standardization. Overall, the low performance tells me that training on synthetic data does not transfer well to the real distribution, and that logistic regression is not very stable on these features.
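
One way to check whether the real features actually leave the range seen during training is sketched below. The function name and the toy arrays are assumptions for illustration; in the notebook, the real inputs would be the `X_train` and `X_test` matrices from the cell above.

```python
import numpy as np


def out_of_range_fraction(X_train, X_test):
    """Per feature, the share of test values outside the training min/max."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    outside = (X_test < lo) | (X_test > hi)
    return outside.mean(axis=0)


# Toy stand-ins for the synthetic (train) and real (test) feature matrices.
toy_train = np.array([[0, 1], [1, 5]])
toy_test = np.array([[0, 7], [1, 9], [2, 3], [1, 4]])

print(out_of_range_fraction(toy_train, toy_test).tolist())  # [0.25, 0.5]
```

Features with a high fraction are ones where the classifier has to extrapolate, which is a plausible source of both the warnings and the poor transfer.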


In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report

X_train = df_synth.drop(columns=["label"]).values
X_test  = df_real.drop(columns=["label"]).values

le = LabelEncoder()
y_train = le.fit_transform(df_synth["label"])
y_test  = le.transform(df_real["label"])

rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=None,
    random_state=0,
    n_jobs=-1
)

rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
print(f"Accuracy on REAL nprints when training on SYNTHETIC (RandomForest): {acc:.3f}\n")
print("Detailed report:")
print(classification_report(y_test, y_pred, target_names=le.classes_))


Accuracy on REAL nprints when training on SYNTHETIC (RandomForest): 0.230

Detailed report:
              precision    recall  f1-score   support

      amazon       0.00      0.00      0.00        20
    facebook       0.00      0.00      0.00        20
   instagram       0.43      0.30      0.35        20
        meet       0.20      0.05      0.08        20
     netflix       0.67      0.10      0.17        20
       teams       0.23      0.90      0.36        20
      twitch       0.00      0.00      0.00        20
     twitter       0.20      0.95      0.32        20
     youtube       0.00      0.00      0.00        20
        zoom       0.00      0.00      0.00        20

    accuracy                           0.23       200
   macro avg       0.17      0.23      0.13       200
weighted avg       0.17      0.23      0.13       200



  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


**Short interpretation.**
I replaced logistic regression with a RandomForest to avoid the instability I saw earlier. The tree model runs without numerical warnings, but the accuracy on real data is still low (0.23). The classifier again focuses on a few labels, mainly teams and twitter, and rarely predicts the others. That suggests the main problem is not the model choice, but the difference between the synthetic distribution and the real one. In other words, training only on generated nPrints does not transfer well to real traffic, so the performance remains limited even with a more flexible model.
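
To see how concentrated the predictions are, one simple check is to count how often each label is predicted. The sketch below uses a made-up list of predictions; in the notebook, the input would be `le.inverse_transform(y_pred)`.

```python
from collections import Counter


def prediction_share(predicted_labels):
    """Share of predictions per label, most frequent first."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: n / total for label, n in counts.most_common()}


# Hypothetical predictions, not the notebook's actual output.
toy_pred = ["twitter"] * 6 + ["teams"] * 3 + ["meet"]
print(prediction_share(toy_pred))  # {'twitter': 0.6, 'teams': 0.3, 'meet': 0.1}
```

For ten balanced classes a well-behaved classifier would sit near 0.1 per label, so a few labels with large shares point to collapse onto those classes.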

In [27]:
from sklearn.svm import SVC
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report

X_train = df_synth.drop(columns=["label"]).values
X_test  = df_real.drop(columns=["label"]).values

le = LabelEncoder()
y_train = le.fit_transform(df_synth["label"])
y_test  = le.transform(df_real["label"])

svc = SVC(kernel="rbf", C=1.0, gamma="scale")
svc.fit(X_train, y_train)

y_pred = svc.predict(X_test)

acc = accuracy_score(y_test, y_pred)
print(f"Accuracy on REAL nprints when training on SYNTHETIC (SVC): {acc:.3f}\n")
print("Detailed report:")
print(classification_report(y_test, y_pred, target_names=le.classes_))


Accuracy on REAL nprints when training on SYNTHETIC (SVC): 0.280

Detailed report:
              precision    recall  f1-score   support

      amazon       0.00      0.00      0.00        20
    facebook       0.32      0.60      0.41        20
   instagram       0.40      0.30      0.34        20
        meet       0.35      0.80      0.48        20
     netflix       0.18      0.20      0.19        20
       teams       0.00      0.00      0.00        20
      twitch       0.00      0.00      0.00        20
     twitter       0.24      0.90      0.38        20
     youtube       0.00      0.00      0.00        20
        zoom       0.00      0.00      0.00        20

    accuracy                           0.28       200
   macro avg       0.15      0.28      0.18       200
weighted avg       0.15      0.28      0.18       200



  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


**Quick interpretation.**
I tried an SVC with an RBF kernel using the same setup (train on generated, test on real). It gave slightly higher accuracy (0.28) than logistic regression (0.205) or the random forest (0.23), which suggests the nonlinear decision boundary helps a bit with these reconstructed features. The improvement is still limited, though, and the model still has zero recall on half of the classes in real traffic. That pattern reinforces the idea that the main challenge is the domain gap between synthetic and real nPrints, not just the choice of classifier.

**Conclusion**: The low accuracy mainly comes from a mismatch between the synthetic reconstruction and real traffic. Diffusion already smooths and distorts the original structure, and the color rounding and integer reconstruction then change the values again, so the feature distribution of synthetic samples might not fully line up with real traces. On top of that, the first few packets alone provide only weak information for distinguishing applications. The diffusion process adds noise that makes different applications look even more similar, so a model trained on synthetic nPrints does not transfer well to real data.
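
For context, the three accuracies reported above can be put next to the chance level for ten balanced classes. This is only arithmetic on the numbers already printed in this notebook.

```python
n_classes = 10            # ten applications, 20 real samples each
chance = 1 / n_classes    # accuracy of uniform random guessing

# Accuracies copied from the outputs above (train on synthetic, test on real).
accuracies = {"LogisticRegression": 0.205, "RandomForest": 0.230, "SVC": 0.280}

lift = {name: acc / chance for name, acc in accuracies.items()}
for name, value in lift.items():
    print(f"{name}: {value:.2f}x chance")
```

All three models land between roughly two and three times chance, so the synthetic data carries some signal, but far too little for a usable classifier.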