<a href="https://colab.research.google.com/github/Agus1112/IoC_Extractor/blob/main/IoC_Extractor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# IoC Extractor

## Description

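This script uses regular expressions to extract common indicators of compromise (IoCs) from a block of text. Supported IoC types:

*   Hashes: MD5, SHA-1 and SHA-256
*   IP addresses: IPv4 and IPv6 (full eight-group IPv6 notation only)
*   MAC addresses (colon- or hyphen-separated)
*   Domain names (file names such as `report.pdf` also match this pattern)
*   URLs (HTTP and HTTPS)

The `extract_data` function takes a text string and returns a dictionary that maps each IoC type (`ipv4`, `ipv6`, `md5`, `sha1`, `sha256`, `domains`, `urls`, `mac_addresses`) to a list of every match found in the text.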
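MD5, SHA-1 and SHA-256 digests share the same hexadecimal alphabet, so the hash patterns tell them apart only by length; the `\b` word boundaries stop a shorter pattern from matching inside a longer digest. The standalone check below reuses copies of the script's three hash patterns and builds its sample text from real digests computed with `hashlib`; the names `HASH_PATTERNS`, `digests` and `found` are illustrative only.

```python
import hashlib
import re

# Copies of the script's hash patterns: same alphabet, different lengths.
HASH_PATTERNS = {
    'md5': re.compile(r'\b[a-fA-F0-9]{32}\b'),
    'sha1': re.compile(r'\b[a-fA-F0-9]{40}\b'),
    'sha256': re.compile(r'\b[a-fA-F0-9]{64}\b'),
}

# Sample text made of real digests of the bytes b"abc", separated by spaces.
digests = {
    'md5': hashlib.md5(b"abc").hexdigest(),
    'sha1': hashlib.sha1(b"abc").hexdigest(),
    'sha256': hashlib.sha256(b"abc").hexdigest(),
}
sample = " ".join(digests.values())

# Each pattern should match exactly one digest: the one of its own length.
found = {name: pattern.findall(sample) for name, pattern in HASH_PATTERNS.items()}
for name, matches in found.items():
    print(name, matches)
```

Any 32-, 40- or 64-character hex string in the input will be reported as a hash, so matches are candidates rather than confirmed digests.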

```python
import re

# Regular expressions for each IoC type, compiled once at import time.
# An IPv4 octet is limited to 0-255 so strings such as 999.1.1.1 are rejected.
_IPV4_OCTET = r'(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

IOC_PATTERNS = {
    # IPv4 addresses in dotted-quad notation
    'ipv4': re.compile(rf'\b(?:{_IPV4_OCTET}\.){{3}}{_IPV4_OCTET}\b'),
    # IPv6 addresses, full eight-group form only (compressed "::" forms are not matched)
    'ipv6': re.compile(r'\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\b'),
    # Hashes: the \b anchors stop a pattern from matching inside a longer hex string
    'md5': re.compile(r'\b[a-fA-F0-9]{32}\b'),
    'sha1': re.compile(r'\b[a-fA-F0-9]{40}\b'),
    'sha256': re.compile(r'\b[a-fA-F0-9]{64}\b'),
    # Domain names: dot-separated labels ending in an alphabetic TLD of 2+ letters
    # (file names such as report.pdf also match this pattern)
    'domains': re.compile(r'\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}\b'),
    # HTTP(S) URLs, up to the next whitespace character
    'urls': re.compile(r'https?://[^\s/$.?#].[^\s]*'),
    # MAC addresses separated by ':' or '-'
    'mac_addresses': re.compile(r'\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b'),
}


def extract_data(text):
    """Return a dict mapping each IoC type to the list of matches found in text."""
    # findall returns whole matches because every group above is non-capturing.
    return {name: pattern.findall(text) for name, pattern in IOC_PATTERNS.items()}
```
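The IPv6 pattern only recognises the full eight-group form, so compressed addresses such as `2001:db8::1` or `::1` are missed. One possible alternative, sketched here under stated assumptions rather than taken from the script, is to match a loose candidate pattern and let the standard-library `ipaddress` module decide what is a valid IPv6 address. `IPV6_CANDIDATE` and `extract_ipv6` are hypothetical names, and the candidate regex is an illustrative assumption.

```python
import ipaddress
import re

# Hypothetical loose candidate: a run of hex digits, colons and dots that
# contains at least one colon. Validation is left to ipaddress.
IPV6_CANDIDATE = re.compile(r'(?<![0-9A-Fa-f:])[0-9A-Fa-f:.]*:[0-9A-Fa-f:.]+')


def extract_ipv6(text):
    """Return candidates that ipaddress accepts as IPv6, in compressed form."""
    found = []
    for candidate in IPV6_CANDIDATE.findall(text):
        # Drop a trailing sentence period; trailing colons are kept for "fe80::"-style forms.
        candidate = candidate.rstrip('.')
        try:
            # str() normalises the address, e.g. leading zeros are removed.
            found.append(str(ipaddress.IPv6Address(candidate)))
        except ValueError:
            # Times like 12:30 or MAC addresses are not valid IPv6 and are skipped.
            pass
    return found


print(extract_ipv6("Beacon to 2001:db8::1 and ::1, not at 12:30."))
```

Because `ipaddress` returns the normalised text form, the same address written with and without leading zeros is reported identically, which also makes later deduplication easier.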

## Example usage

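```python
if __name__ == "__main__":
    # Prompt for the text to analyze (input() reads a single line)
    user_input = input("Enter the text to analyze: ")
    # Extract IoCs from the input text
    result = extract_data(user_input)
    # Print each IoC type and its matches on a separate line
    for ioc_type, matches in result.items():
        print(f"{ioc_type}: {matches}")
```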
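Each list in the returned dictionary keeps every occurrence, so an indicator repeated in a report appears several times. If unique values are preferred, a small post-processing step can drop the repeats. `deduplicate_iocs` is a hypothetical helper, not part of the script, and works on any dictionary of lists shaped like the one `extract_data` returns; the sample values are made up for illustration.

```python
def deduplicate_iocs(extracted):
    """Return a copy of a {ioc_type: [matches]} dict with repeated matches removed.

    Hypothetical helper: keeps the first occurrence of each value, in order.
    """
    # dict.fromkeys drops duplicates while preserving insertion order (Python 3.7+)
    return {ioc_type: list(dict.fromkeys(matches)) for ioc_type, matches in extracted.items()}


# Hand-written result dictionary standing in for extract_data output
sample = {
    'ipv4': ['10.0.0.1', '10.0.0.1', '192.168.1.5'],
    'domains': ['example.com', 'example.com'],
}
print(deduplicate_iocs(sample))
# {'ipv4': ['10.0.0.1', '192.168.1.5'], 'domains': ['example.com']}
```

Matching is case-sensitive, so upper- and lower-case spellings of the same hash are kept as separate entries; lowercasing hash values before deduplicating would merge them.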