Added shared drive crawler #141

Merged
merged 13 commits into from
Sep 1, 2019


Pourliver
Contributor

@Pourliver Pourliver commented Aug 1, 2019

I added the automatic shared drive crawler I had in mind, which I described briefly in #137.

This was done after #140, so it was easier to implement since I already knew the download flow and its quirks.

I had to create a crawler_config folder since I didn't like the idea of mixing class definitions and plaintext files.

I tried to put more documentation in this class since the core logic may be hard to follow. I've tested it a lot and it works amazingly well, but please do test it.

@Pourliver Pourliver requested review from xshill and Res260 August 1, 2019 18:52
@Res260
Collaborator

Res260 commented Aug 6, 2019

Idea: log every file the crawler sees to a file, even if it is not downloaded

pyrdp/mitm/FileCrawler.py (7 review threads, outdated, resolved)
pyrdp/mitm/config.py (outdated, resolved)
pyrdp/mitm/crawler_config/match.txt (resolved)
bin/pyrdp-mitm.py (outdated, resolved)
@Pourliver
Contributor Author

Idea: log every file the crawler sees to a file, even if it is not downloaded

To add to this idea, since we talked about it in person, the goal is to keep track of what the crawler is seeing, so we can further improve the default match / ignore regexes later.

@Pourliver
Contributor Author

The PR is ready for another review. I added a new logger that is separate from the root logger, so we can log to another file without cluttering STDOUT. This new file keeps track of every file the crawler has seen.
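
For anyone following along, here is a minimal sketch of that idea using Python's standard `logging` module. It is not the code from this PR: the logger name `crawler.filesystem`, the helper names `make_crawl_logger` and `record_seen`, and the log format are all made up for illustration. The key point is `propagate = False`, which keeps records from reaching handlers attached to parent loggers (such as one printing to STDOUT).

```python
import logging


def make_crawl_logger(path: str) -> logging.Logger:
    """Return a logger that writes only to `path` (hypothetical helper)."""
    # "crawler.filesystem" is an assumed name, not the one used in the PR.
    logger = logging.getLogger("crawler.filesystem")
    logger.setLevel(logging.INFO)
    # Do not hand records to parent loggers, so nothing shows up on STDOUT.
    logger.propagate = False
    # Attach the file handler only once, even if this function is called again.
    if not logger.handlers:
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def record_seen(logger: logging.Logger, remote_path: str, downloaded: bool) -> None:
    """Log every file the crawler saw, whether or not it was downloaded."""
    logger.info("%s downloaded=%s", remote_path, downloaded)
```

Calling `record_seen(log, "/share/report.docx", True)` appends one line to the crawl log and leaves the console untouched, which is what makes the file usable later for tuning the match / ignore lists.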

@Pourliver Pourliver requested a review from Res260 August 14, 2019 14:33
@Res260
Collaborator

Res260 commented Aug 14, 2019

Oops, commented on the commit instead of the PR:

Besides the exception() log TODO, LGTM!

@Pourliver
Contributor Author

Sorry, but I'm not too sure I understand what's wrong @Res260, could you please elaborate a bit?

@Res260
Collaborator

Res260 commented Aug 14, 2019 via email

@Pourliver
Contributor Author

You're right, I totally missed this one, my bad! Fixed in latest commit.

@Res260
Collaborator

Res260 commented Aug 14, 2019

Lgtm

@Pourliver
Contributor Author

Pourliver commented Aug 14, 2019

Fixed a small problem with the parser where the last line of each file got truncated, and simplified the match file: */*.doc* can also match */*.docker, which is unintended. The trailing * was only meant to cover */*.docx, so I wrote both */*.doc and */*.docx to the file instead. We need to avoid trailing * in patterns.
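
To illustrate the over-matching, here is a small sketch using Python's `fnmatch` module. Whether the crawler actually matches with `fnmatch` is an assumption on my part, and `parse_patterns` / `matches_any` are hypothetical helpers, not functions from this PR. The parser uses `splitlines()`, which handles a file without a trailing newline; slicing off the last character of every line is one common way a "truncated last line" bug arises, though that is not necessarily what happened here.

```python
from fnmatch import fnmatchcase
from typing import List


def parse_patterns(text: str) -> List[str]:
    """Split the contents of a pattern file into non-empty lines (hypothetical helper)."""
    # splitlines() does not require a trailing newline, so the final
    # pattern keeps all of its characters.
    return [line.strip() for line in text.splitlines() if line.strip()]


def matches_any(path: str, patterns: List[str]) -> bool:
    """Return True if `path` matches at least one glob pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# A trailing * matches more than intended:
too_broad = ["*/*.doc*"]
# Listing each extension explicitly avoids that:
explicit = parse_patterns("*/*.doc\n*/*.docx")
```

With these definitions, `matches_any("build/app.docker", too_broad)` is true while `matches_any("build/app.docker", explicit)` is false, and both pattern lists still match `docs/report.docx`.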

@Res260
Collaborator

Res260 commented Aug 27, 2019

TODO: Add the PR changes to changelog.md

Collaborator

@Res260 Res260 left a comment


Tested, very dank. LGTM

bin/pyrdp-mitm.py (outdated, resolved)
@Res260 Res260 merged commit 0f96e49 into master Sep 1, 2019
@Res260 Res260 deleted the mitm_crawler branch September 1, 2019 21:45