
# *__Brute-Forcing Directories and File Locations__*

The previous example assumed a lot of knowledge about your target. But when you're attacking a custom web application or large e-commerce system, you often won't be aware of all the files accessible on the web server.

Generally, you'll deploy a spider, such as the one included in _Burp Suite_, to crawl the target website in order to discover as much of the web application as possible. But in a lot of cases, you'll want to get ahold of configuration files, leftover development files, debugging scripts, and other security bread-crumbs that can provide sensitive information or expose functionality that the software developer did not intend. The only way to discover this content is to use a brute-forcing tool to hunt down common filenames and directories.

We'll build a simple tool that will accept word lists from common brute forcers, such as the __gobuster__ project (_https://github.com/OJ/gobuster/_) and __SVNDigger__ (_https://www.netsparker.com/blog/web-security/svn-digger-better-lists-for-forced-browsing/_), and attempt to discover directories and files that are reachable on the target web server. You'll find many word lists available on the internet, and you already have quite a few in your Kali distribution (see _/usr/share/wordlists_). For this example, we'll use a list from SVNDigger. You can retrieve the files for SVNDigger as follows:
```
cd ~/Downloads
wget https://www.netsparker.com/s/research/SVNDigger.zip
unzip SVNDigger.zip
```
When you unzip this file, the file __all.txt__ will be in your __Downloads__ directory.

As before, we'll create a pool of threads to aggressively attempt to discover content. Let's start by creating some functionality to create a __Queue__ out of a word-list file. Open up a new file, name it __bruter.py__, and enter the following code:

```python
import queue
import requests
import threading
import sys

AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0"
EXTENSIONS = [".php", ".bak", ".orig", ".inc"]
TARGET = "http://testphp.vulnweb.com"
THREADS = 50
WORDLIST = "/home/tim/Downloads/all.txt"

def get_words(resume=None): #[1]

    def extend_words(word): #[2]
        if '.' in word:
            words.put(f"/{word}")
        else:
            words.put(f"/{word}/") #[3]

        for extension in EXTENSIONS:
            words.put(f"/{word}{extension}")

    with open(WORDLIST) as f:
        raw_words = f.read() #[4]

    found_resume = False
    words = queue.Queue()
    for word in raw_words.split():
        if resume is not None: #[5]
            if found_resume:
                extend_words(word)
            elif word == resume:
                found_resume = True
                print(f"Resuming wordlist from: {resume}")
        else:
            print(word)
            extend_words(word)
    return words #[6]
```

The __get_words__ helper function __[1]__, which returns the queue of words we'll test against the target, uses a couple of techniques worth noting. We read in a word list file __[4]__ and then iterate over each word in it. If the __resume__ argument is set to the last path that the brute forcer tried __[5]__, we skip every word up to and including that one and queue only the words that follow it. This functionality allows us to resume a brute-forcing session if our network connectivity is interrupted or the target site goes down. When we've parsed the entire file, we return a __Queue__ full of words to use in our actual brute-forcing function __[6]__.
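To make the resume behavior concrete, here is a minimal sketch that isolates it from the file handling and the extension logic. The function name `words_after` and the in-memory word list are illustrative assumptions, not part of __bruter.py__; the skipping logic follows the same idea as the listing above.

```python
import queue

def words_after(raw_words, resume=None):
    """Queue every word, or only the words that follow `resume` if it is given.

    Hypothetical helper for illustration; it is not part of bruter.py.
    """
    words = queue.Queue()
    # With no resume point we start queueing immediately.
    found_resume = resume is None
    for word in raw_words:
        if found_resume:
            words.put(word)
        elif word == resume:
            # The resume word itself was already tried, so it is skipped.
            found_resume = True
    return words

remaining = words_after(["admin", "backup", "cgi-bin", "test.php"], resume="backup")
print(list(remaining.queue))  # ['cgi-bin', 'test.php']
```

Note that if the resume word never appears in the list, the returned queue is empty, so it is worth double-checking the value you pass in.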

Note that this function has an inner function called __extend_words__ __[2]__. An _inner function_ is a function defined inside another function. We could have written it outside of __get_words__, but because __extend_words__ will always run in the context of the __get_words__ function, we place it inside in order to keep the namespaces tidy and make the code easier to understand.

The purpose of this inner function is to apply a list of extensions to test when making requests. In some cases, you want to try not only the __/admin__ path, for example, but also __admin.php__, __admin.inc__, and __admin.html__ __[3]__. It can be useful here to brainstorm common extensions that developers might use and forget to remove later on, like __.orig__ and __.bak__, on top of the regular programming language extensions. The __extend_words__ inner function provides this capability, using these rules: if the word contains a dot (_._), we'll append it to the URL as is (for example, __/test.php__); otherwise, we'll treat it like a directory name (such as __/admin/__).

In either case, we'll add each of the possible extensions to the result. For example, if we have two words, __test.php__ and __admin__, we will put the following additional words into our words queue:
```
/test.php.bak, /test.php.inc, /test.php.orig, /test.php.php
/admin.bak, /admin.inc, /admin.orig, /admin.php
```
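If you want to check exactly which paths a given word produces, the following sketch mirrors the rules that __extend_words__ implements in the listing, but returns a list instead of filling a queue. The function name `candidate_paths` is an assumption made for this illustration.

```python
EXTENSIONS = [".php", ".bak", ".orig", ".inc"]

def candidate_paths(word, extensions=EXTENSIONS):
    """Return the paths generated for one word (hypothetical helper)."""
    # A word with a dot is treated as a file, anything else as a directory.
    paths = [f"/{word}" if "." in word else f"/{word}/"]
    # In either case, every extension is appended to the bare word.
    paths.extend(f"/{word}{ext}" for ext in extensions)
    return paths

for word in ("test.php", "admin"):
    print(candidate_paths(word))
# ['/test.php', '/test.php.php', '/test.php.bak', '/test.php.orig', '/test.php.inc']
# ['/admin/', '/admin.php', '/admin.bak', '/admin.orig', '/admin.inc']
```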
Now let's write the main brute-forcing function:

```python
def dir_bruter(words):
    headers = {"User-Agent": AGENT} #[1]
    while not words.empty():
        url = f"{TARGET}{words.get()}" #[2]
        try:
            r = requests.get(url, headers=headers)
        except requests.exceptions.ConnectionError: #[3]
            sys.stderr.write('x')
            sys.stderr.flush()
            continue

        if r.status_code == 200:
            print(f"\nSuccess ({r.status_code}: {url})") #[4]
        elif r.status_code == 404:
            sys.stderr.write('.') #[5]
            sys.stderr.flush()
        else:
            print(f"{r.status_code} => {url}")

if __name__ == "__main__":
    words = get_words() #[6]
    print("Press return to continue.")
    sys.stdin.readline()
    for _ in range(THREADS):
        t = threading.Thread(target=dir_bruter, args=(words,))
        t.start()
```

The __dir_bruter__ function accepts a __Queue__ object that is populated with words we prepared in the __get_words__ function. We defined a __User-Agent__ string at the beginning of the program to use in the HTTP request so that our requests look like the normal ones coming from nice people. We add that information into the __headers__ variable __[1]__. We then loop through the __words__ queue. On each iteration, we build the URL to request on the target application __[2]__ and send the request to the remote web server.

This function prints some output directly to the console and some output to __stderr__. We will use this technique to present output in a flexible way. It enables us to display different portions of output, depending on what we want to see.

It would be nice to know about any connection errors we get __[3]__, so we print an __x__ to __stderr__ when that happens. Otherwise, if we have a success (indicated by a status of 200), we print the complete URL to the console __[4]__. You could also create a queue and put the results there, as we did last time. If we get a 404 response, we print a dot (.) to __stderr__ and continue __[5]__. If we get any other response code, we print the URL as well, because this could indicate something interesting on the remote web server. (That is, something besides a "file not found" error.) It's useful to pay attention to your output because, depending on the configuration of the remote web server, you may have to filter out additional HTTP error codes in order to clean up your results.
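One way to combine the two suggestions in this paragraph (collecting results in a queue and filtering out extra status codes) is sketched below. The names `IGNORED_STATUSES` and `record_response` are assumptions for illustration, and the URLs are made-up inputs; which codes you ignore depends entirely on how your target responds to misses.

```python
import queue

# Assumption: start with 404 and add whatever codes your target
# returns for uninteresting misses.
IGNORED_STATUSES = {404}

def record_response(status_code, url, results, ignored=IGNORED_STATUSES):
    """Store an interesting response on `results`; return True if stored."""
    if status_code in ignored:
        return False
    results.put((status_code, url))
    return True

results = queue.Queue()
record_response(200, "http://testphp.vulnweb.com/admin/", results)
record_response(404, "http://testphp.vulnweb.com/nope/", results)
record_response(403, "http://testphp.vulnweb.com/secret/", results)

while not results.empty():
    print(results.get())
# (200, 'http://testphp.vulnweb.com/admin/')
# (403, 'http://testphp.vulnweb.com/secret/')
```

Inside __dir_bruter__, a call like this could replace the chain of status checks, with the printing done once all the threads have finished.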

In the __\_\_main\_\___ block, we get the list of words to brute-force __[6]__ and then spin up a bunch of threads to do the brute-forcing.
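One caveat about the worker loop: checking `words.empty()` and then calling `words.get()` are two separate steps, so with several threads one worker can see a non-empty queue, lose the last item to another worker, and then block in `get()` indefinitely. The sketch below shows a variant that avoids this by using `get_nowait()` and also joins the threads so the program knows when the work is done. The names `drain` and `run_workers` are assumptions, and `handle` stands in for the request logic in __dir_bruter__.

```python
import queue
import threading

def drain(words, handle):
    """Worker loop: process paths until the queue has nothing left."""
    while True:
        try:
            path = words.get_nowait()  # never blocks
        except queue.Empty:
            return
        handle(path)

def run_workers(words, handle, thread_count=5):
    """Start the workers and wait for all of them to finish."""
    threads = [
        threading.Thread(target=drain, args=(words, handle))
        for _ in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

words = queue.Queue()
for path in ("/admin/", "/admin.php", "/CVS/"):
    words.put(path)

seen = []  # stand-in handler: just remember which paths were processed
run_workers(words, seen.append, thread_count=2)
print(sorted(seen))
```

Passing the handler as an argument also makes the threading logic easy to exercise without sending any traffic to a real server.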

# *__Kicking the Tires__*

OWASP has a list of vulnerable web applications, both online and offline, such as virtual machines and disk images, that you can test your tooling against. In this case, the URL referenced in the source code points to an intentionally buggy web application hosted by Acunetix. The cool thing about attacking these applications is that it shows you how effective brute forcing can be.

We recommend you set the __THREADS__ variable to something sane, such as 5, and run the script. A value too low will take a long time to run, while a high value can overload the server. In short order, you should start seeing results such as the following ones:
```
(bph) rick@kali:~/bhp/bhp$ python bruter.py
Press return to continue.
--snip--
Success (200: http://testphp.vulnweb.com/CVS/)
...............................................
Success (200: http://testphp.vulnweb.com/admin/).
.......................................................
```
If you want to see only the successes, since you used __sys.stderr__ to write the x and dot (.) characters, invoke the script and redirect __stderr__ to _/dev/null_ so that only the files you found are displayed on the console:
```
python bruter.py 2> /dev/null

Success (200: http://testphp.vulnweb.com/CVS/)
Success (200: http://testphp.vulnweb.com/admin/)
Success (200: http://testphp.vulnweb.com/index.php)
Success (200: http://testphp.vulnweb.com/index.bak)
Success (200: http://testphp.vulnweb.com/search.php)
Success (200: http://testphp.vulnweb.com/login.php)
Success (200: http://testphp.vulnweb.com/images)
Success (200: http://testphp.vulnweb.com/index.php)
Success (200: http://testphp.vulnweb.com/logout.php)
Success (200: http://testphp.vulnweb.com/categories.php)
```
Notice that we're pulling some interesting results from the remote website, some of which may surprise you. For example, you may find backup files or code snippets left behind by an overworked web developer. What could be in that __index.bak__ file? With that information, you can remove files that could provide an easy compromise of your application.