# Overview

Total entries analyzed: 964

Abstraction values frequency:
- Base: 536 (55.6%)
- Variant: 299 (31.0%)
- Class: 112 (11.6%)
- Pillar: 10 (1.0%)
- Compound: 7 (0.7%)

Usage values frequency:
- Allowed: 752 (78.0%)
- Allowed-with-Review: 86 (8.9%)
- Prohibited: 84 (8.7%)
- Discouraged: 42 (4.4%)

A "Compound" weakness is a meaningful aggregation of several weaknesses, currently known as either a Chain or a Composite; see https://cwe.mitre.org/documents/schema/#AbstractionEnumeration


The view used here is CWE-1003 "CWE Simplified Mapping", available from https://cwe.mitre.org/data/downloads.html
- It contains 130 CWEs.
- It is the view used by NVD: https://nvd.nist.gov/vuln/categories
- View membership is not recorded in the CWE JSON file, i.e. the file does not say which views a CWE belongs to, so it is taken from the view's CSV instead.

### Output 
````
[
  {
    "ID": "1004",
    "Abstraction": "Variant",
    "Usage": "Allowed",
    "1003_view": false
  },
  {
    "ID": "1007",
    "Abstraction": "Base",
    "Usage": "Allowed",
    "1003_view": false
  },
  {
    ...
````
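
As a rough illustration of how this output can be consumed, the sketch below counts records per `(Abstraction, 1003_view)` pair. It works on an in-memory copy of the two records shown above rather than on the real file, and `view_breakdown` is a hypothetical helper name, not part of the notebook.

```python
import json
from collections import Counter

# Two records copied from the sample output above; the real file has one per CWE.
sample_json = """
[
  {"ID": "1004", "Abstraction": "Variant", "Usage": "Allowed", "1003_view": false},
  {"ID": "1007", "Abstraction": "Base", "Usage": "Allowed", "1003_view": false}
]
"""

def view_breakdown(records):
    """Count records per (Abstraction, 1003_view) pair. Hypothetical helper."""
    counts = Counter()
    for record in records:
        counts[(record["Abstraction"], record["1003_view"])] += 1
    return counts

records = json.loads(sample_json)
counts = view_breakdown(records)
for (abstraction, in_view), count in sorted(counts.items()):
    print(f"{abstraction:<10} in_1003={in_view!s:<5} {count}")
```

To run this against the real output, replace `json.loads(sample_json)` with `json.load` on the opened `./data_out/cwe_meta_data.json` file.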

In [30]:
from IPython.core.magic import register_cell_magic
from IPython.display import Markdown
import datetime
from datetime import date
import glob
import json
import jsonlines
import logging
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly
import warnings
import csv
import os



In [31]:

cwe_file = './data_out/cwe_trimmed_top25.jsonl'  # contains 964 CWE-IDs
cwe_1003_file = './data_in/1003.csv'  # contains 130 CWE-IDs
cwe_meta_file = './data_out/cwe_meta_data.json'  # output written by extract_and_update_fields

In [32]:
def get_unique_values(cwe_file):
    """
    Extract, count and print unique values for Abstraction and Usage fields from the JSONL file.
    """
    # Using dictionaries to store counts
    abstraction_counts = {}
    usage_counts = {}
    total_entries = 0
    
    with open(cwe_file, 'r') as file:
        for line in file:
            try:
                data = json.loads(line.strip())
                # Count only lines that parsed, so percentages exclude malformed lines
                total_entries += 1
                
                # Count Abstraction values
                if 'Abstraction' in data:
                    abs_value = data['Abstraction']
                    abstraction_counts[abs_value] = abstraction_counts.get(abs_value, 0) + 1
                
                # Count Usage values
                usage = data.get('MappingNotes', {}).get('Usage')
                if usage:
                    usage_counts[usage] = usage_counts.get(usage, 0) + 1
                    
            except json.JSONDecodeError as e:
                print(f"Error parsing JSON line: {e}")
                continue
    
    print(f"\nTotal entries analyzed: {total_entries}")
    
    print("\nAbstraction values frequency:")
    for abs_value, count in sorted(abstraction_counts.items()):
        percentage = (count / total_entries) * 100
        print(f"- {abs_value}: {count} ({percentage:.1f}%)")
        
    print("\nUsage values frequency:")
    for usage_value, count in sorted(usage_counts.items()):
        percentage = (count / total_entries) * 100
        print(f"- {usage_value}: {count} ({percentage:.1f}%)")
    
    return {
        'total_entries': total_entries,
        'abstraction_counts': abstraction_counts,
        'usage_counts': usage_counts
    }
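
The same tally can be written more compactly with `collections.Counter`. The following is a minimal sketch, not the notebook's implementation: `count_field_values` is a hypothetical name, it takes an iterable of lines instead of a file path so it can run on an in-memory sample, and the two valid sample records reuse the IDs and values from the output excerpt in the overview.

```python
import json
from collections import Counter

def count_field_values(lines):
    """Tally Abstraction and MappingNotes.Usage values over JSONL lines."""
    abstraction = Counter()
    usage = Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines silently
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed lines are not counted
        total += 1
        if "Abstraction" in record:
            abstraction[record["Abstraction"]] += 1
        value = (record.get("MappingNotes") or {}).get("Usage")
        if value:
            usage[value] += 1
    return total, abstraction, usage

# Illustrative input: two valid records and one malformed line.
sample_lines = [
    '{"ID": "1004", "Abstraction": "Variant", "MappingNotes": {"Usage": "Allowed"}}',
    '{"ID": "1007", "Abstraction": "Base", "MappingNotes": {"Usage": "Allowed"}}',
    "not json",
]

total, abstraction, usage = count_field_values(sample_lines)
for name, count in abstraction.most_common():
    print(f"- {name}: {count} ({count / total * 100:.1f}%)")
```

`most_common()` orders values by descending count, which is the order used in the overview, whereas `sorted(...items())` in `get_unique_values` orders them alphabetically.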

In [33]:
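def extract_and_update_fields(cwe_file, cwe_1003_file):
    """
    Extract ID, Name, Abstraction and Usage for each CWE in the JSONL file, flag
    membership of view 1003, and write the result to cwe_meta_file as JSON.
    """
    # First, read the CWE-1003 CSV file and create a set of IDs
    cwe_1003_ids = set()
    with open(cwe_1003_file, 'r', newline='') as csvfile: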
        reader = csv.DictReader(csvfile)
        for row in reader:
            # Assuming the column is named 'CWE-ID', adjust if different
            cwe_id = row.get('CWE-ID', '').strip()
            if cwe_id:
                cwe_1003_ids.add(cwe_id)
    
    # List to store the extracted data
    extracted_data = []
    
    # Read the JSONL file line by line
    with open(cwe_file, 'r') as file:
        for line in file:
            try:
                # Parse each line as JSON
                data = json.loads(line.strip())
                
                # Extract the fields we want
                extracted = {
                    'ID': data.get('ID'),
                    'Name': data.get('Name'),
                    'Abstraction': data.get('Abstraction'),
                    'Usage': data.get('MappingNotes', {}).get('Usage')
                }
                
                # Flag whether the ID appears in the CWE-1003 CSV
                extracted['1003_view'] = extracted['ID'] in cwe_1003_ids
                
                extracted_data.append(extracted)
                
            except json.JSONDecodeError as e:
                print(f"Error parsing JSON line: {e}")
                continue
    
    # Write the extracted data to a new JSON file
    # (cwe_meta_file is the module-level path defined in an earlier cell)
    output_file = cwe_meta_file
    with open(output_file, 'w') as f:
        json.dump(extracted_data, f, indent=2)
    
    print(f"Data has been extracted and saved to {output_file}")
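
Because the join is a plain string comparison between the JSONL `ID` and the CSV `CWE-ID`, a quick sanity check is to confirm how many records were flagged and whether any view IDs found no match. The sketch below is one way to do that under stated assumptions: `check_view_coverage` is a hypothetical helper, IDs are assumed to be numeric strings, and the sample mixes two records from the output excerpt with an illustrative flagged record for CWE-20.

```python
def check_view_coverage(records, view_ids):
    """Return (number of flagged records, view IDs absent from the records)."""
    record_ids = {str(r["ID"]) for r in records}
    flagged = {str(r["ID"]) for r in records if r.get("1003_view")}
    # key=int assumes every ID is a numeric string such as "1004"
    missing = sorted(set(view_ids) - record_ids, key=int)
    return len(flagged), missing

# Illustrative data only: the third record and the view IDs are made up for the demo.
sample_records = [
    {"ID": "1004", "1003_view": False},
    {"ID": "1007", "1003_view": False},
    {"ID": "20", "1003_view": True},
]
sample_view_ids = {"20", "22"}

flagged_count, missing_ids = check_view_coverage(sample_records, sample_view_ids)
print(f"flagged: {flagged_count}, view IDs with no record: {missing_ids}")
```

On the real data, a flagged count below 130 or a non-empty missing list would point to an ID format mismatch or to view members absent from the JSONL file.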

In [34]:
# CWE Simplified Mapping (view 1003), from https://cwe.mitre.org/data/downloads.html
df_cwe_1003 = pd.read_csv(cwe_1003_file, usecols=["CWE-ID"])
df_cwe_1003

Unnamed: 0,CWE-ID
20,Improper Input Validation
22,Improper Limitation of a Pathname to a Restric...
59,Improper Link Resolution Before File Access ('...
74,Improper Neutralization of Special Elements in...
77,Improper Neutralization of Special Elements us...
...,...
1188,Initialization of a Resource with an Insecure ...
1236,Improper Neutralization of Formula Elements in...
1284,Improper Validation of Specified Quantity in I...
1321,Improperly Controlled Modification of Object P...
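
In the table above the IDs appear as the index and the `CWE-ID` column holds the weakness names. One plausible cause, which is an assumption and not verified against the file, is that each data row ends with a trailing delimiter, so rows have one more field than the header and pandas uses the first column as the index. pandas documents `index_col=False` on `read_csv` for that situation. The sketch below uses a made-up in-memory sample, with placeholder names and status values, to show that `csv.DictReader` still maps `CWE-ID` to the first field in that case, which is why `extract_and_update_fields` is unaffected.

```python
import csv
import io

# Hypothetical sample: the header has three fields, each data row has a
# trailing comma and therefore four. Names and statuses are placeholders.
sample = (
    "CWE-ID,Name,Status\n"
    "20,Improper Input Validation,Stable,\n"
    "22,Path Traversal,Stable,\n"
)

def read_ids(text, column="CWE-ID"):
    """Read one column with DictReader; surplus fields go under the None key."""
    reader = csv.DictReader(io.StringIO(text))
    return [row[column].strip() for row in reader]

ids = read_ids(sample)
print(ids)  # ['20', '22']
```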


In [35]:
unique_values = get_unique_values(cwe_file)


Total entries analyzed: 964

Abstraction values frequency:
- Base: 536 (55.6%)
- Class: 112 (11.6%)
- Compound: 7 (0.7%)
- Pillar: 10 (1.0%)
- Variant: 299 (31.0%)

Usage values frequency:
- Allowed: 752 (78.0%)
- Allowed-with-Review: 86 (8.9%)
- Discouraged: 42 (4.4%)
- Prohibited: 84 (8.7%)


In [36]:
extract_and_update_fields(cwe_file, cwe_1003_file)

Data has been extracted and saved to ./data_out/cwe_meta_data.json
