14 implement yara keyword search #15

Merged


the-siegfried
Contributor

@the-siegfried the-siegfried commented Mar 24, 2022

Description

This branch extends the torcrawl.py application's available arguments to include the '-y/--yara' option and its value, which determines whether keyword searching is enabled and whether the search covers the entire HTTP response or only the content text.
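
As a rough illustration of how such an option could be wired up, here is a minimal sketch using the standard `argparse` module. It is not the code from this branch: `build_parser` and `search_scope` are hypothetical helpers, and the mapping of `0` to the full response and `1` to text only is an assumption made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical subset of the torcrawl.py options."""
    parser = argparse.ArgumentParser(prog="torcrawl.py")
    parser.add_argument("-u", "--url", help="target URL")
    # Assumption: 0 = search the whole HTTP response, 1 = search only the text.
    # Omitting -y leaves the value as None, which disables keyword searching.
    parser.add_argument(
        "-y", "--yara",
        type=int,
        choices=(0, 1),
        default=None,
        help="enable keyword search; 0 = full response, 1 = text only",
    )
    return parser


def search_scope(yara_value):
    """Map the -y value to a scope label; None means searching is disabled."""
    if yara_value is None:
        return "disabled"
    return "full-response" if yara_value == 0 else "text-only"


args = build_parser().parse_args(["-u", "https://github.com", "-y", "0"])
print(search_scope(args.yara))
```

Restricting the value with `choices` means an unexpected argument such as `-y 2` is rejected by the parser before any crawling starts.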

Motivation and Context

To extend the application to support keyword searching across all extraction methods and to allow for further extensibility. This feature lets the application's users apply YARA rule-based text matching for basic keyword searching. By adapting this implementation, we can later add support for site and page scoring/categorisation as well.

How Has This Been Tested?

Following the example in the updated README.md project documentation:
$ python torcrawl.py -v -w -u https://github.com -c -d 1 -p 5 -e -y 0

I was able to successfully run the application and test the newly implemented keyword searching capability. Please find the results attached.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
    github.com.zip

Aaron Bishop added 7 commits March 17, 2022 23:36
- Adds new argument '-y' to accept yara switch.
- Implements new 'check_yara' method in extractor module to check for keyword matches from .yar file.
- Implements new 'text' method to extract only the text elements from the http content response for yara parsing.
- Amends extractor methods to accept new 'yara' argument and utilise new 'check_yara' and 'text' methods.

See Issue: MikeMeliz#14
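
The difference between matching against the whole response and matching against only its text can be sketched as follows. This is an illustration under stated assumptions, not the `check_yara` or `text` implementation from this commit: it uses the standard library's `html.parser` and plain substring matching as stand-ins for the project's HTML parsing and YARA rules, and `TextExtractor`, `extract_text`, and `find_keywords` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the text content of an HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def find_keywords(response: str, keywords, text_only: bool):
    """Stand-in for rule matching: case-insensitive substring search."""
    haystack = extract_text(response) if text_only else response
    haystack = haystack.lower()
    return [kw for kw in keywords if kw.lower() in haystack]


page = (
    "<html><head><script>var market = 1;</script></head>"
    "<body><p>Hidden forum</p></body></html>"
)
print(find_keywords(page, ["market", "forum"], text_only=False))  # ['market', 'forum']
print(find_keywords(page, ["market", "forum"], text_only=True))   # ['forum']
```

Searching only the text avoids hits on keywords that appear solely in markup or scripts, at the cost of an extra parsing pass per page.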
- Refactors check_yara method to remove category checking. Future Feature
- Update main application docstring.

See Issue: MikeMeliz#14
- Removes commented out code.
- Amends absolute module calls.

See Issue: MikeMeliz#14
- Updates requirements.txt to include lxml.
- Amends crawl method in crawler.py to dismiss None values.

See Issue: MikeMeliz#14
- Amends torcrawl.py to support -y argument accepting a value.
- Adds conditional handling for unexpected arguments for option '-y'.
- Refactors module/extractor.py to perform content parsing within check_yara method based on -y argument.
- Updates README.md to provide instructions for '-y/--yara' argument use.
- Updates res/keywords.yar to support README.md examples.

See Issue: MikeMeliz#14
- Updates docstrings in modules/extractor.py

See Issue: MikeMeliz#14
- Resolve grammatical mistake.

See Issue: MikeMeliz#14
Owner

@MikeMeliz MikeMeliz left a comment


Awesome work @the-siegfried throughout the whole implementation!

@MikeMeliz MikeMeliz merged commit 3241834 into MikeMeliz:master Mar 26, 2022
@the-siegfried
Contributor Author

Thanks @MikeMeliz :)

@the-siegfried the-siegfried deleted the 14-implement-yara-keyword-search branch April 1, 2022 10:37
the-siegfried pushed a commit to the-siegfried/TorCrawl.py that referenced this pull request Jul 3, 2022
…keyword-search

14 implement yara keyword search