<a href="https://colab.research.google.com/github/MrDev333/PhishingDetect/blob/main/tool/awskeybreaker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Build a tool that decodes and removes AWS Secret Access Keys from a file.

## Identify file formats and types

### Subtask:
The tool needs to recognize common file types (for example `.env`, `.yml`, `.json`, `.txt`, and Python/Java/Node.js source code) and the formats in which an AWS Secret Access Key can appear in them (for example environment variables, JSON/YAML fields, and values in key=value configuration files).


**Reasoning**:
Define the list of common file extensions and the potential formats for AWS Secret Access Keys within those file types, as requested by the instructions.



In [None]:
file_formats = {
    ".env": ["KEY=VALUE"],
    ".yml": ["key: value"],
    ".yaml": ["key: value"],
    ".json": ['"key": "value"'],
    ".txt": ["KEY=VALUE", '"key": "value"', "key: value", "plain_text"],
    ".py": ["KEY = 'VALUE'", 'KEY = "VALUE"'],
    ".java": ["String KEY = \"VALUE\";"],
    ".js": ["const KEY = 'VALUE';", "let KEY = 'VALUE';", "var KEY = 'VALUE';"],
    ".sh": ["export KEY=VALUE"]
}

print("Common file extensions and potential AWS Secret Access Key formats:")
for ext, formats in file_formats.items():
    print(f"File extension: {ext}")
    print(f"Potential formats: {', '.join(formats)}")

Common file extensions and potential AWS Secret Access Key formats:
File extension: .env
Potential formats: KEY=VALUE
File extension: .yml
Potential formats: key: value
File extension: .yaml
Potential formats: key: value
File extension: .json
Potential formats: "key": "value"
File extension: .txt
Potential formats: KEY=VALUE, "key": "value", key: value, plain_text
File extension: .py
Potential formats: KEY = 'VALUE', KEY = "VALUE"
File extension: .java
Potential formats: String KEY = "VALUE";
File extension: .js
Potential formats: const KEY = 'VALUE';, let KEY = 'VALUE';, var KEY = 'VALUE';
File extension: .sh
Potential formats: export KEY=VALUE
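
The table above can drive a simple lookup from a file path to its candidate formats. The sketch below is an illustration only: `formats_for_path` and the abbreviated `FILE_FORMATS` table are hypothetical names, not part of the notebook, and unknown extensions are assumed to fall back to plain-text scanning. One detail worth handling is that `pathlib` reports an empty suffix for a dotfile such as `.env`, so the file name itself is used in that case.

```python
from pathlib import Path

# Abbreviated copy of the notebook's table, repeated so the sketch is self-contained.
FILE_FORMATS = {
    ".env": ["KEY=VALUE"],
    ".json": ['"key": "value"'],
    ".yml": ["key: value"],
    ".yaml": ["key: value"],
}

def formats_for_path(path):
    """Return the candidate key formats for a path (hypothetical helper)."""
    p = Path(path)
    suffix = p.suffix.lower()
    # Path(".env").suffix is "", so treat a bare dotfile name as its own extension.
    if not suffix and p.name.startswith("."):
        suffix = p.name.lower()
    # Assumption: unknown file types are scanned as plain text.
    return FILE_FORMATS.get(suffix, ["plain_text"])

print(formats_for_path("config/.env"))
print(formats_for_path("notes.md"))
```

Matching on the lowercased suffix also covers names such as `settings.JSON`.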


## Analyze file content

### Subtask:
Read and parse the file content to find strings that may be AWS Secret Access Keys. Use regular expressions to recognize the patterns of an AWS Access Key ID (which starts with "AKIA") and an AWS Secret Access Key.


**Reasoning**:
Import the `re` module and define the regular expressions for AWS Access Key ID and Secret Access Key.



In [None]:
import re

# Regular expression for AWS Access Key ID (starts with "AKIA" followed by 16 uppercase letters or digits)
aws_access_key_id_pattern = re.compile(r'AKIA[0-9A-Z]{16}')

# Regular expression for AWS Secret Access Key (40 characters consisting of uppercase letters, lowercase letters, digits, '+', '/')
aws_secret_access_key_pattern = re.compile(r'[0-9a-zA-Z+/]{40}')

print("Regular expressions for AWS Access Key ID and Secret Access Key defined.")

Regular expressions for AWS Access Key ID and Secret Access Key defined.
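
The 40-character pattern above matches any 40-character window inside a longer Base64-like run, such as a URL path or an encoded blob. One way to tighten it, offered as a sketch rather than a definitive rule, is to add lookarounds so that a match may not be directly preceded by another character from the same alphabet, or directly followed by one (or by `=` padding). `STRICT_SECRET_PATTERN` is a hypothetical name, and the sample value is the placeholder secret used in AWS documentation.

```python
import re

# Same alphabet as the notebook's pattern, but the match must stand alone.
# '=' is excluded only after the match, so KEY=VALUE assignments still match.
STRICT_SECRET_PATTERN = re.compile(
    r'(?<![0-9a-zA-Z+/])[0-9a-zA-Z+/]{40}(?![0-9a-zA-Z+/=])'
)

assignment = 'AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
long_blob = 'token=' + 'A' * 64  # a 64-character run, not a 40-character key

print(STRICT_SECRET_PATTERN.findall(assignment))  # the 40-character value
print(STRICT_SECRET_PATTERN.findall(long_blob))   # no match inside the longer run
```

This only removes matches that are fragments of longer runs. A standalone 40-character string that is not a key still matches.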


**Reasoning**:
Create a function to read the file content, search for the defined patterns, and return the potential matches with their locations.



In [None]:
def find_aws_keys_in_file(file_path):
    """
    Reads a file and searches for potential AWS Access Key IDs and Secret Access Keys.

    Args:
        file_path: The path to the file to scan.

    Returns:
        A list of dictionaries, where each dictionary represents a potential match
        and contains the line number, the matched string, and the key type
        ('Access Key ID' or 'Secret Access Key').
    """
    potential_matches = []
    try:
        with open(file_path, 'r') as f:
            for line_num, line in enumerate(f, 1):
                # Search for Access Key IDs
                access_key_matches = aws_access_key_id_pattern.findall(line)
                for match in access_key_matches:
                    potential_matches.append({
                        'line_number': line_num,
                        'matched_string': match,
                        'key_type': 'Access Key ID'
                    })

                # Search for Secret Access Keys
                secret_key_matches = aws_secret_access_key_pattern.findall(line)
                for match in secret_key_matches:
                    potential_matches.append({
                        'line_number': line_num,
                        'matched_string': match,
                        'key_type': 'Secret Access Key'
                    })
    except FileNotFoundError:
        print(f"Error: File not found at {file_path}")
    except Exception as e:
        print(f"An error occurred while reading the file: {e}")

    return potential_matches

print("Function find_aws_keys_in_file defined.")

Function find_aws_keys_in_file defined.
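
Because `find_aws_keys_in_file` reports every string that fits the pattern, it can help to rank or filter the candidates afterwards. A common heuristic, shown here as a sketch under assumptions, is Shannon entropy: randomly generated secrets tend to use many different characters, while runs dominated by one repeated character score low. The helper names and the threshold of 4.0 bits per character are assumptions chosen for illustration, not values taken from the notebook or from AWS.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def looks_random(candidate, threshold=4.0):
    """Hypothetical filter: keep candidates at or above an assumed entropy threshold."""
    return shannon_entropy(candidate) >= threshold

# AWS's documented placeholder secret versus a run of one repeated character.
print(looks_random("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"))
print(looks_random("A" * 40))
```

Entropy is only a ranking signal. Some non-secret strings are also high-entropy, so a threshold would need tuning against real data.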


## Handle encoded formats

### Subtask:
If the key is stored as Base64, URL-encoded, or in another encoding, the tool needs to be able to decode it back to its original form before processing. However, as noted earlier, a Secret Access Key is not itself encrypted data that needs to be decoded in the usual sense. This step may instead be intended to handle cases where the key is embedded in a more complex string.


**Reasoning**:
Import necessary libraries for decoding potential key strings.



In [None]:
import base64
import urllib.parse

**Reasoning**:
Define a function to decode potential key strings using base64 and URL decoding, and handle exceptions.



In [None]:
def decode_potential_key(potential_key_string):
    """
    Attempts to decode a potential key string using base64 and URL decoding.

    Args:
        potential_key_string: The string that might be encoded.

    Returns:
        The decoded string if decoding is successful, otherwise the original string.
    """
    decoded_string = potential_key_string

    # Try Base64 decoding; validate=True rejects input containing non-Base64 characters
    try:
        decoded_bytes = base64.b64decode(potential_key_string, validate=True)
        # Check if the decoded bytes can be decoded to a UTF-8 string
        try:
            base64_decoded = decoded_bytes.decode('utf-8')
            # Accept the result only if it looks like a 40-character secret key
            if re.fullmatch(aws_secret_access_key_pattern, base64_decoded):
                decoded_string = base64_decoded
        except UnicodeDecodeError:
            pass  # Not a valid UTF-8 string after Base64 decoding
    except (ValueError, TypeError):
        pass  # Not valid Base64 (binascii.Error is a subclass of ValueError)

    # Try URL decoding (only if not successfully base64 decoded)
    if decoded_string == potential_key_string:
        url_decoded = urllib.parse.unquote(potential_key_string)
        # A basic check if the URL decoded string might be a valid key pattern
        # (optional, adjust pattern as needed)
        if re.fullmatch(aws_secret_access_key_pattern, url_decoded):
            decoded_string = url_decoded

    return decoded_string
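
A round trip makes the decoding step easier to check. The sketch below is a standalone variant, not the notebook's function: `decode_candidates` and `SECRET_RE` are hypothetical names, Base64 is decoded strictly, and the sample value is the placeholder secret from AWS documentation. It returns which encoding, if any, yields a string shaped like a secret key.

```python
import base64
import re
import urllib.parse

SECRET_RE = re.compile(r'[0-9a-zA-Z+/]{40}')

def decode_candidates(text):
    """Return (encoding, decoded) pairs whose decoded form looks like a secret key."""
    results = []

    # URL decoding turns %2B into '+' and %2F into '/'.
    url_decoded = urllib.parse.unquote(text)
    if url_decoded != text and SECRET_RE.fullmatch(url_decoded):
        results.append(("url", url_decoded))

    # Strict Base64 decoding; invalid input raises a ValueError subclass.
    try:
        b64_decoded = base64.b64decode(text, validate=True).decode("ascii")
    except ValueError:
        b64_decoded = None
    if b64_decoded is not None and SECRET_RE.fullmatch(b64_decoded):
        results.append(("base64", b64_decoded))

    return results

sample = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
encoded = base64.b64encode(sample.encode("ascii")).decode("ascii")
print(decode_candidates(encoded))
print(decode_candidates(urllib.parse.quote(sample, safe="")))
```

Note that a Base64-encoded secret is longer than 40 characters, so a scanner that only looks for 40-character runs would need to pass the whole encoded token to a function like this one.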

## Remove or replace the key

### Subtask:
After locating a key, the tool either deletes the entire line or field that contains it, or replaces the key's value with an empty string or a safe placeholder.


**Reasoning**:
Define the functions `redact_key_in_line` and `redact_keys_in_file` to redact the identified keys based on their type and process the file content line by line, applying the redaction.



In [None]:
def redact_key_in_line(line, match):
    """
    Redacts a matched AWS key in a given line of text.

    Args:
        line: The line of text.
        match: A dictionary containing match information ('matched_string', 'key_type').

    Returns:
        The modified line with the key redacted.
    """
    matched_string = match['matched_string']
    key_type = match['key_type']

    if key_type == 'Access Key ID':
        placeholder = '[REDACTED_ACCESS_KEY_ID]'
    elif key_type == 'Secret Access Key':
        placeholder = '[REDACTED_SECRET_ACCESS_KEY]'
    else:
        # Should not happen with current logic, but as a fallback
        placeholder = '[REDACTED_AWS_KEY]'

    # Replace the matched string with the placeholder
    # Use replace to handle multiple occurrences in a single line if needed
    modified_line = line.replace(matched_string, placeholder)

    return modified_line

def redact_keys_in_file(file_path, potential_matches):
    """
    Reads a file, redacts the identified keys, and returns the modified lines.

    Args:
        file_path: The path to the file.
        potential_matches: A list of dictionaries with potential key matches.

    Returns:
        A list of modified lines with keys redacted.
    """
    modified_lines = []
    try:
        with open(file_path, 'r') as f:
            for line_num, line in enumerate(f, 1):
                current_line = line
                # Check if any of the potential matches are on the current line
                for match in potential_matches:
                    if match['line_number'] == line_num and match['matched_string'] in current_line:
                        current_line = redact_key_in_line(current_line, match)
                modified_lines.append(current_line)
    except FileNotFoundError:
        print(f"Error: File not found at {file_path}")
        return None
    except Exception as e:
        print(f"An error occurred while processing the file: {e}")
        return None

    return modified_lines

print("Functions redact_key_in_line and redact_keys_in_file defined.")

Functions redact_key_in_line and redact_keys_in_file defined.
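
The functions above replace every matched string wherever it occurs on the line. An alternative, sketched here under assumptions, is to redact only values that are assigned to a name containing `aws_secret_access_key`, which keeps the key name and the surrounding syntax intact and ignores unrelated 40-character strings. `ASSIGNED_SECRET_RE` and `redact_assigned_secret` are hypothetical names, and the accepted syntax (`=` or `:` with optional quotes) is an assumption based on the formats listed at the start of the notebook.

```python
import re

# Assumption: the secret is assigned to a name containing "aws_secret_access_key"
# (any case), using '=' or ':' and optional quotes around the name and the value.
ASSIGNED_SECRET_RE = re.compile(
    r"""(?P<prefix>aws_secret_access_key["']?\s*[=:]\s*["']?)"""
    r"""(?P<value>[0-9a-zA-Z+/]{40})(?![0-9a-zA-Z+/=])""",
    re.IGNORECASE,
)

def redact_assigned_secret(line, placeholder="[REDACTED_SECRET_ACCESS_KEY]"):
    """Replace only the value part of a secret-key assignment (hypothetical helper)."""
    return ASSIGNED_SECRET_RE.sub(lambda m: m.group("prefix") + placeholder, line)

print(redact_assigned_secret(
    'AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
))
```

The trade-off is coverage: a secret stored under a different name, or pasted without any name, would not be redacted by this variant.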


## Save the modified file

### Subtask:
Save the file content after the keys have been removed or replaced.


**Reasoning**:
Define the `save_modified_file` function to save the modified content back to the file.



In [None]:
def save_modified_file(file_path, modified_lines):
    """
    Saves the modified lines back to the specified file.

    Args:
        file_path: The path to the file to save.
        modified_lines: A list of strings representing the modified lines.
    """
    try:
        with open(file_path, 'w') as f:
            for line in modified_lines:
                f.write(line)
        print(f"Successfully saved modified file: {file_path}")
    except IOError as e:
        print(f"Error saving file {file_path}: {e}")

print("Function save_modified_file defined.")

Function save_modified_file defined.
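
Opening the target with mode `'w'` truncates it before the new content is written, so an error partway through can leave a partial file. A common alternative, shown as a sketch with the hypothetical name `save_atomically`, is to write to a temporary file in the same directory and then swap it into place with `os.replace`.

```python
import os
import tempfile

def save_atomically(file_path, modified_lines):
    """Write modified_lines to file_path via a temporary file in the same directory."""
    directory = os.path.dirname(os.path.abspath(file_path))
    # The temporary file lives next to the target so the swap stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.writelines(modified_lines)
        # os.replace swaps the file in one step, so readers never see a partial file.
        os.replace(tmp_path, file_path)
    except BaseException:
        # Clean up the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

One side effect to be aware of: the replaced file takes on the temporary file's permissions, which `tempfile.mkstemp` restricts to the owner.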


## Add a security warning

### Subtask:
The tool should warn the user about the importance of handling secret keys carefully and recommend better security practices (for example AWS Secrets Manager or securely managed environment variables).


**Reasoning**:
Define the `print_security_warning` function to inform the user about security risks and recommend better practices.



In [None]:
def print_security_warning():
    """Prints a security warning about storing AWS keys in files and suggests better practices."""
    print("\n" + "="*60)
    print("SECURITY WARNING: POTENTIAL AWS KEYS FOUND IN FILE!")
    print("Storing AWS Access Keys and Secret Access Keys directly in files")
    print("is a significant security risk. If this file is compromised,")
    print("your AWS resources could be accessed by unauthorized individuals.")
    print("\nRecommendations for Secure Key Management:")
    print("- AWS Secrets Manager: Securely store, manage, and retrieve secrets.")
    print("- AWS Systems Manager Parameter Store: Store configuration data and secrets securely.")
    print("- Environment Variables: Use environment variables for credentials, but ensure")
    print("  they are managed securely and not exposed (e.g., in logs).")
    print("- IAM Roles: Assign IAM roles to your EC2 instances or other AWS services")
    print("  instead of embedding keys directly in applications or configuration files.")
    print("\nIt is highly recommended to rotate your keys immediately if they have")
    print("been exposed and adopt a more secure method for credential management.")
    print("="*60 + "\n")

print("Function print_security_warning defined.")
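
As a small illustration of the environment-variable recommendation in the warning, the sketch below reads credentials from the process environment instead of from a file. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard variable names read by the AWS CLI and SDKs; the function name `credentials_from_env` and its return shape are hypothetical.

```python
import os

def credentials_from_env(env=None):
    """Read AWS credentials from environment variables instead of a file (sketch)."""
    env = os.environ if env is None else env
    access_key_id = env.get("AWS_ACCESS_KEY_ID")
    secret_access_key = env.get("AWS_SECRET_ACCESS_KEY")
    if not access_key_id or not secret_access_key:
        raise RuntimeError("AWS credentials are not set in the environment")
    # Return the values without printing or logging them.
    return {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
```

Passing a dictionary as `env` makes the function easy to exercise in tests without touching the real environment.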



## Summary:

### Data Analysis Key Findings

*   The process successfully identified common file extensions and potential formats for AWS Secret Access Keys within those files.
*   Regular expressions were successfully defined to identify potential AWS Access Key IDs (starting with "AKIA") and AWS Secret Access Keys (40 characters of alphanumeric characters, '+', or '/').
*   A function was created to search for these key patterns within a file, recording the line number and matched string.
*   A function was developed to attempt decoding of potential key strings using Base64 and URL decoding.
*   Functions were defined to redact identified keys within lines of text and to apply this redaction across a file.
*   A function was created to save the modified file content back to the original file path.
*   A function was defined to print a comprehensive security warning regarding the storage of AWS keys and recommending secure practices like AWS Secrets Manager, Systems Manager Parameter Store, Environment Variables, and IAM Roles.

### Insights or Next Steps

*   The developed components provide a strong foundation for a tool to detect and redact AWS keys. The next step is to integrate these functions into a single script that can take a file path as input, perform the analysis and redaction, and save the modified file, while also presenting the security warning.
*   Consider adding more robust validation to the decoding step to confirm if the decoded string truly matches the pattern of an AWS Secret Access Key, reducing false positives.
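
The integration step described above could look roughly like the following. This is a compact, self-contained sketch rather than the notebook's own code: it repeats the two patterns, streams the input line by line, and wraps everything in a hypothetical `main` built on `argparse`. It reports only a count so that matched strings are not echoed to the console.

```python
import argparse
import re

ACCESS_KEY_ID_RE = re.compile(r"AKIA[0-9A-Z]{16}")
SECRET_KEY_RE = re.compile(r"[0-9a-zA-Z+/]{40}")

def scan_and_redact(input_path, output_path):
    """Copy input_path to output_path with matches replaced; return the match count."""
    total = 0
    with open(input_path, "r", encoding="utf-8", errors="replace") as src, \
         open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            # subn returns the rewritten line and the number of replacements made.
            line, n_ids = ACCESS_KEY_ID_RE.subn("[REDACTED_ACCESS_KEY_ID]", line)
            line, n_secrets = SECRET_KEY_RE.subn("[REDACTED_SECRET_ACCESS_KEY]", line)
            total += n_ids + n_secrets
            dst.write(line)
    return total

def main(argv=None):
    """Hypothetical command-line entry point: main(["in.env", "out.env"])."""
    parser = argparse.ArgumentParser(description="Redact AWS-key-like strings from a file.")
    parser.add_argument("input_path")
    parser.add_argument("output_path")
    args = parser.parse_args(argv)
    count = scan_and_redact(args.input_path, args.output_path)
    print(f"Redacted {count} potential key(s); wrote {args.output_path}")
    return 0
```

Streaming avoids holding the match list and the file in memory at the same time, which matters for inputs with tens of thousands of lines.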


In [None]:
input_file_path = '/content/online-valid_1.csv'
output_file_path = '/content/online-valid_1_redacted.txt'

# Find potential AWS keys in the input file
potential_matches = find_aws_keys_in_file(input_file_path)

if potential_matches:
    print(f"Found potential AWS keys in {input_file_path}:")
    for match in potential_matches:
        print(f"  Line {match['line_number']}: {match['key_type']} - {match['matched_string']}")

    # Print security warning
    print_security_warning()

    # Redact the keys in the file content
    modified_lines = redact_keys_in_file(input_file_path, potential_matches)

    if modified_lines is not None:
        # Save the modified content to a new file
        save_modified_file(output_file_path, modified_lines)
else:
    print(f"No potential AWS keys found in {input_file_path}.")
    print(f"Content of {input_file_path} will be copied to {output_file_path} without changes.")
    try:
        with open(input_file_path, 'r') as infile, open(output_file_path, 'w') as outfile:
            outfile.writelines(infile.readlines())
        print(f"Successfully copied content to {output_file_path}.")
    except FileNotFoundError:
        print(f"Error: Input file not found at {input_file_path}")
    except Exception as e:
        print(f"An error occurred while copying the file: {e}")

Streaming output truncated to the last 5000 lines.
  Line 36026: Secret Access Key - DQSIkWdsW0yxEjajBLZtrQAAAAAAAAAAAAMAAHAH
  Line 36026: Secret Access Key - hktUNEdXTk5DVVhPWlpWRzgxMElEOEVPMllZMC4u
  Line 36027: Secret Access Key - 1vS0MWVMxrnLRpBITbfTvAgxXJIx2iYSqYB9s9NF
  Line 36035: Secret Access Key - com/scl/fi/1efum5wgwauws8hsf22du/BTINTER
  Line 36036: Secret Access Key - //bafkreifqwj3sxcv2abyyfhjdnjvww7p3wymw4
  Line 36037: Secret Access Key - yCHmAOLPUETfD6VBTs1RZdb8rqUtr4VEGQmWt5KW
  Line 36039: Secret Access Key - NdGZKCuenzhYt6QXywTDttV1LnbWkAhelHmtuCLu
  Line 36042: Secret Access Key - QN9F4LmsolHd8v2orgUEYjIFu7dlCR3EEvOIdRjo
  Line 36043: Secret Access Key - 1vShdSNBAPxGqOZ0X0xMxtQNGBQfcMx5mNwCxkOt
  Line 36046: Secret Access Key - Rb5WNvpFd0OvUDwkiqBtEqf601nElp5Sk3w7T/pu
  Line 36047: Secret Access Key - DQSIkWdsW0yxEjajBLZtrQAAAAAAAAAAAAMAAHMv
  Line 36047: Secret Access Key - vJtUODJRNzlXRDc0RTRNTFNUQ0ZFMzI1MkNINi4u
  Line 36048: Secret Access K

In [None]:
from google.colab import files

files.download('/content/online-valid_1_redacted.txt')


In [None]:
import os

input_redacted_txt_path = '/content/online-valid_1_redacted.txt'
output_csv_path = '/content/online-valid_1_redacted.csv'

try:
    # Read the content from the redacted text file
    with open(input_redacted_txt_path, 'r') as infile:
        redacted_content = infile.readlines()

    # Write the content to a new file with a .csv extension
    with open(output_csv_path, 'w') as outfile:
        outfile.writelines(redacted_content)

    print(f"Content from '{input_redacted_txt_path}' saved to '{output_csv_path}' as a CSV file.")

except FileNotFoundError:
    print(f"Error: Input redacted file not found at {input_redacted_txt_path}")
except Exception as e:
    print(f"An error occurred while creating the CSV file: {e}")

Content from '/content/online-valid_1_redacted.txt' saved to '/content/online-valid_1_redacted.csv' as a CSV file.


In [None]:
from google.colab import files

# Download the newly created CSV file
files.download('/content/online-valid_1_redacted.csv')
