Request for improved content organization #1

Open
Tibso opened this issue Oct 20, 2023 · 4 comments

@Tibso

Tibso commented Oct 20, 2023

Greetings,

I would like to propose an enhancement to your repository.
I believe it would significantly improve data usability to establish a clear association between FQDNs, domains, IPs, and the types of content they represent. I have identified two ways to accomplish this: separate directories, or tagging each entry with its type.


Option 1: Sorting using directories

One approach would be to segregate these items into separate directories, making retrieval more logical. For instance, a domain associated with ads could be stored as follows:

public.dir/domain/ads/DIGISQUAD-COM-malicious

Option 2: Prepending the type

Alternatively, each domain could be prefixed with its content type:

ads:adspam.com
malware:getpwnd.net
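A minimal Python sketch of how a consumer might read such prefixed lines and keep only the categories it wants. The helper names `parse_prefixed` and `select` are hypothetical, and the sketch assumes one `category:domain` entry per line, with the category itself containing no `:`.

```python
from collections import defaultdict


def parse_prefixed(lines):
    """Group 'category:domain' lines into {category: set of domains}.

    Hypothetical helper: blank lines and lines without ':' are skipped.
    """
    by_category = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first ':' only, so the value is kept intact.
        category, value = line.split(":", 1)
        by_category[category].add(value)
    return dict(by_category)


def select(by_category, wanted):
    """Union of the domains in the wanted categories (missing ones are ignored)."""
    result = set()
    for category in wanted:
        result |= by_category.get(category, set())
    return result


entries = parse_prefixed(["ads:adspam.com", "malware:getpwnd.net"])
print(sorted(select(entries, {"ads"})))  # ['adspam.com']
```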


This restructuring would enable a finer level of control when it comes to filtering content.
Personally, I am particularly interested in the filtering of ads, malware, and adult content.
However, it would be advantageous to have the flexibility to accommodate additional categories as needed.

Would you consider implementing this modification?

Additionally, I noticed that the domains in the domain directory do not appear to include those listed in the fqdn directory, and vice versa. Is this unintended?

@digisquad-repo
Collaborator

Hi Tibso,

Thank you for your feedback.

I agree with your viewpoint on the need for restructuring. In fact, these adjustments are part of several other changes that will enrich the current data with metadata and additional formats. I haven't had the chance to include them in the current version yet.


Short-term Changes:

In the immediate future, I plan to relocate data from public.dir/* to the root directory, enhancing accessibility. Additionally, for improved clarity, the "DIGISQUAD-COM" string in filenames will likely be removed.

Categories and Data Source:

Categories such as ads, malware, and adult content are ones I've been considering adding to the system. Do you have specific data sources you would recommend for them?


About option 1:

The idea is a directory/file-based organization of the form [dataType (directory)] / [category (file)]:

domain/ads
domain/malicious
domain/adult
...
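With this layout, a consumer only opens the files for the categories it cares about. The sketch below assumes a local checkout at some root path, one entry per line, and `#` marking comment lines; the function names and the comment convention are illustrative assumptions, not part of the repository.

```python
from pathlib import Path


def category_file(root, data_type, category):
    """Path of a category list under the proposed <dataType>/<category> layout."""
    return Path(root) / data_type / category


def load_categories(root, data_type, categories):
    """Read the requested category files into {category: set of entries}.

    Missing files are treated as empty. Blank lines and lines starting
    with '#' are skipped (the '#' convention is an assumption).
    """
    result = {}
    for category in categories:
        path = category_file(root, data_type, category)
        entries = set()
        if path.is_file():
            for line in path.read_text(encoding="utf-8").splitlines():
                line = line.strip()
                if line and not line.startswith("#"):
                    entries.add(line)
        result[category] = entries
    return result
```

For example, `load_categories("/path/to/checkout", "domain", ["ads", "malicious"])` would read `domain/ads` and `domain/malicious` and ignore every other category file.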

I find this structure efficient, but if you have concerns, please do share.

About option 2:

Recognizing the importance of prefixes for Redis users, I could also generate a .redis variant of each list. This could yield:

domain/ads.redis
domain/malicious.redis
domain/adult.redis
...

But for consistency, the data type might be prefixed, leading to:

domain:ads:adspam.com
domain:malware:somethingbad.com
ip:malware:123.123.123.123
...
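A short sketch of how such three-part `dataType:category:value` records could be built and parsed, for example before using them as Redis keys. The helper names are hypothetical. Splitting with `maxsplit=2` matters here: IPv6 addresses contain `:`, so only the first two separators may be treated as field boundaries, which in turn assumes the data type and category never contain `:`.

```python
def build_key(data_type, category, value):
    """Compose a 'dataType:category:value' record, e.g. 'ip:malware:123.123.123.123'."""
    for part in (data_type, category):
        # Only the value may contain ':'; otherwise parsing becomes ambiguous.
        if ":" in part:
            raise ValueError(f"separator ':' not allowed in {part!r}")
    return f"{data_type}:{category}:{value}"


def parse_key(record):
    """Split a record back into (dataType, category, value).

    maxsplit=2 keeps any ':' inside the value, so IPv6 addresses survive intact.
    """
    data_type, category, value = record.split(":", 2)
    return data_type, category, value


print(parse_key("ip:malware:2001:db8::1"))  # ('ip', 'malware', '2001:db8::1')
```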

Option 3:

Based on your feedback and the existing challenges, I'm contemplating a more tailored solution: you could specify your desired format, and I would provide it in a dedicated directory, potentially named dnsblacklist-rs. The naming convention would be something like:

[dataType] / [dataFormat] / [category (file)]

Examples:

domain/dnsblacklist-rs/categories
domain/text/categories
domain/json/categories
domain/fortinet/categories
...

Specifically for your use-case:

domain/dnsblacklist-rs/ads
domain/dnsblacklist-rs/adult
domain/dnsblacklist-rs/malware
...

I could then implement your prefix suggestion like this:

ads:adspam.com
malware:getpwnd.net
...

In that case, you would probably prefer a single consolidated file containing domains from all categories rather than separate files per category, right?
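If a single consolidated file were preferred, the per-category lists could be merged into one prefixed list along these lines. This is a minimal sketch: `consolidate` is a hypothetical name, and the input is assumed to be a mapping from category name to its entries.

```python
def consolidate(lists_by_category):
    """Merge per-category lists into one sorted, deduplicated 'category:value' list."""
    lines = {
        f"{category}:{value.strip()}"
        for category, values in lists_by_category.items()
        for value in values
        if value.strip()  # skip blank entries
    }
    return sorted(lines)


merged = consolidate({"malware": ["getpwnd.net"], "ads": ["adspam.com"]})
print(merged)  # ['ads:adspam.com', 'malware:getpwnd.net']
```

Sorting keeps the output stable between generations, which should keep diffs of the consolidated file small.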

If this latest solution suits your needs, I'll implement it based on my availability.
Please feel free to provide suggestions or feedback.

@Tibso
Author

Tibso commented Oct 23, 2023

Here's my assessment of the available options:

Option 1:

The first option appears to be the most efficient way to organize the data. Separate directories spare end users the inconvenient task of sorting the data themselves.

Furthermore, this option could particularly appeal to users who would rather not deal with a single large file. It is loosely in line with your existing practice of keeping each file's record count under a predetermined threshold, such as 50k or 100k.
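The record-count threshold mentioned above (50k or 100k) amounts to splitting each category list into fixed-size parts. A minimal sketch, assuming a hypothetical `ads.001`, `ads.002`, … naming scheme for the parts; the repository's actual file names may differ.

```python
from itertools import islice


def chunk(records, max_records=50_000):
    """Split records into consecutive lists of at most max_records entries."""
    if max_records <= 0:
        raise ValueError("max_records must be positive")
    it = iter(records)
    batches = []
    # islice pulls the next max_records items; an empty list ends the loop.
    while batch := list(islice(it, max_records)):
        batches.append(batch)
    return batches


def part_names(category, n_parts):
    """Hypothetical naming scheme for split files: ads.001, ads.002, ..."""
    return [f"{category}.{i:03d}" for i in range(1, n_parts + 1)]


parts = chunk([f"host{i}.example" for i in range(7)], max_records=3)
print([len(p) for p in parts])        # [3, 3, 1]
print(part_names("ads", len(parts)))  # ['ads.001', 'ads.002', 'ads.003']
```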

Option 2:

While prefixes are undeniably useful for Redis users, I doubt that a separate directory solely for Redis users justifies the added complexity.

The same outcome can be achieved simply by prefixing each record with its data type.

Option 3:

As with Option 2, I think tailored solutions may not justify the resulting increase in complexity.

Adding directories for specific use cases would increase the repository's size, even though the underlying data would be nearly identical.

For my specific use case, separate files are preferable, as they avoid the need for extensive data sorting.


Given these considerations, I recommend the first option, as I believe it holds the most promise.

@Tibso
Author

Tibso commented Oct 24, 2023

Regarding data sources, firebog.net appears to be the most comprehensive compilation of blockable domains I have come across.

Moreover, it would be highly advantageous if the MISP warning lists could be incorporated to identify potential false positives.

Finally, it would be of great interest to receive real-time updates from MISP concerning IP addresses and domains that should be blocked.

@digisquad-repo
Collaborator

Hi T,

OK, thank you, I will review this data source. Note that I have other sources that may fit your use case(s) more closely, but time constraints have limited what I could do so far.

Keep in mind, though, that accumulating vast amounts of data is not the hard part; what matters is keeping the data up to date and making efforts to minimize false positives.

  • Regarding the MISP warning lists, I've already implemented them to reduce false positives. Additionally, I have integrated my own warning lists on top of them. As always, it's not perfect, but it's a start (and free).

  • As for data sourced from MISP, I'm handling it for my personal use at the moment. My understanding is that there are restrictions on exporting this data out of MISP. Even though many might bypass these rules, I won't be publishing MISP-sourced data here, mainly to ensure that I adhere to the Traffic Light Protocol (TLP) associated with the data. I could consider offering it in a private repo or through some other channel, or directly in a MISP instance.
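To illustrate how a warning list can suppress false positives, here is a minimal sketch assuming the warning list has already been flattened into a set of hostnames. It is not the maintainer's implementation and does not parse the MISP warning-list file format; the subdomain-matching rule is an assumption and can be switched off.

```python
def is_warned(domain, warning_list, match_subdomains=True):
    """True if domain (or, optionally, one of its parent domains) is on the warning list."""
    labels = domain.lower().rstrip(".").split(".")
    if not match_subdomains:
        return ".".join(labels) in warning_list
    # Check 'a.b.example.com', 'b.example.com', 'example.com', 'com'.
    return any(".".join(labels[i:]) in warning_list for i in range(len(labels)))


def filter_false_positives(blocklist, warning_list, match_subdomains=True):
    """Drop blocklist entries that match the warning list (normalized to lowercase)."""
    warned = {d.lower().rstrip(".") for d in warning_list}
    return [d for d in blocklist if not is_warned(d, warned, match_subdomains)]


print(filter_false_positives(["adspam.com", "cdn.example.com"], {"example.com"}))
# ['adspam.com']
```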

That said, as I explained to your boss, should any pressing updates or features be needed, I am open to discussing professional support for this kind of data. After all, one has to earn a living :)
