
# *__Web Hackery__*

The ability to analyze web applications is a critical skill for any attacker or penetration tester. In most modern networks, web applications present the largest attack surface and are therefore also the most common avenue for gaining access.

You'll find a number of excellent web application tools written in Python, including __w3af__ and __sqlmap__. Quite frankly, topics such as _SQL injection_ have been beaten to death, and the tooling available is mature enough that we don't need to reinvent the wheel. Instead, we'll explore the basics of interacting with the web by using Python and then build on this knowledge to create reconnaissance and brute-force tooling. By creating a few different tools, you should learn the fundamental skills you need to build any type of web application assessment tool that your particular attack scenario calls for.

In this chapter, we'll look at three scenarios for attacking a web app. In the first scenario, you know the web framework that the target uses, and that framework happens to be open source. A web app framework contains many files and directories within directories within directories. We'll create a map that shows the hierarchy of the web app locally and use that information to locate the real files and directories on the live target.

In the second scenario, you know only the URL for your target, so we'll resort to brute-forcing the same kind of mapping by using a word list to generate a list of filepaths and directory names that may be present on the target. We'll then request each of those candidate paths on the live target to see which ones exist.

In the third scenario, you know the base URL of your target and its login page. We'll examine the login page and use a word list to brute-force a login.
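As a preview of the second scenario, here is a minimal sketch of how a word list might be turned into candidate URLs. The __candidate_urls__ helper, the sample words, and the extension list are assumptions made for illustration and are not this chapter's final tool. Nothing is sent over the network; the code only builds strings.

```python
from urllib.parse import quote, urljoin


def candidate_urls(base_url, words, extensions=(".php", ".bak")):
    """Yield URLs to probe, built from a word list (hypothetical helper)."""
    for word in words:
        word = word.strip()
        if not word or word.startswith("#"):
            continue  # skip blank lines and comments in the word list
        path = quote(word)
        if "." in word:
            # Looks like a filename already, so try it as-is.
            yield urljoin(base_url, path)
        else:
            # Try the word as a directory, then as a file with each extension.
            yield urljoin(base_url, path + "/")
            for ext in extensions:
                yield urljoin(base_url, path + ext)


urls = list(candidate_urls("http://boodelyboo.com/", ["admin", "index.html", ""]))
for candidate in urls:
    print(candidate)
```

Each resulting URL would then be requested in turn, and the response status would indicate whether the path exists on the target.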

### *__Using Web Libraries__*

We'll start by going over the libraries you can use to interact with web services. When performing network-based attacks, you may be using your own machine or a machine inside the network you're attacking. If you are on a compromised machine, you'll have to make do with what you've got, which might be a bare-bones Python 2.x or Python 3.x installation. We'll take a look at what you can do in those situations using the standard library. For the remainder of the chapter, however, we'll assume you're on your attacker machine using the most up-to-date packages.

### *__The urllib2 Library for Python 2.x__*

You'll see the __urllib2__ library used in code written for Python 2.x. It's bundled into the standard library. Much like the __socket__ library for writing network tooling, people use the __urllib2__ library when creating tools to interact with web services. Let's take a look at code that makes a very simple _GET_ request to the No Starch Press website:

In [None]:
import urllib2

url = "https://www.nostarch.com"
# GET
response = urllib2.urlopen(url) #[1]
print(response.read()) #[2]
response.close()

This is the simplest example of how to make a _GET_ request to a website. We pass in a URL to the __urlopen__ function __[1]__, which returns a file-like object that allows us to read back the body of what the remote web server returns __[2]__. As we're just fetching the raw page from the No Starch website, no JavaScript or other client-side languages will execute.

In most cases, however, you'll want more fine-grained control over how you make these requests, including being able to define specific headers, handle cookies, and create _POST_ requests. The __urllib2__ library includes a __Request__ class that gives you this level of control. The following example shows you how to create the same _GET_ request by using the __Request__ class and by defining a custom __User-Agent__ HTTP header:

In [None]:
import urllib2

url = "https://www.nostarch.com"
headers = {"User-Agent": "Googlebot"} #[1]

request = urllib2.Request(url, headers=headers) #[2]
response = urllib2.urlopen(request) #[3]

print(response.read())
response.close()

The construction of a __Request__ object is slightly different from our previous example. To create custom headers, we define a __headers__ dictionary __[1]__, which allows us to then set the header keys and values we want to use. In this case, we'll make our Python script appear to be the Googlebot. We then create our __Request__ object and pass in the __url__ and the __headers__ dictionary __[2]__, and then pass the __Request__ object to the __urlopen__ function call __[3]__. This returns a normal file-like object that we can use to read in the data from the remote website.

### *__The urllib Library for Python 3.x__*

In Python 3.x, the standard library provides the __urllib__ package, which splits the capabilities of the __urllib2__ library into the __urllib.request__ and __urllib.error__ modules. It also provides URL-parsing capability in the __urllib.parse__ module.
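Because the names moved rather than changed, a script that might land on either interpreter can import them conditionally. The following is a minimal sketch of that idea, assuming you only need __Request__, __urlopen__, and __URLError__; it builds a request object but does not send it.

```python
try:
    # Python 2.x: everything lives in urllib2.
    from urllib2 import Request, urlopen, URLError
except ImportError:
    # Python 3.x: the same names are split across two modules.
    from urllib.request import Request, urlopen
    from urllib.error import URLError

# Building a Request does not touch the network, so this is safe to run anywhere.
request = Request("https://www.nostarch.com", headers={"User-Agent": "Googlebot"})
print(request.get_full_url())
# Header names are stored capitalized, so look the value up as "User-agent".
print(request.get_header("User-agent"))
```

The rest of the script can then call __urlopen(request)__ without caring which interpreter it is running on.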

To make an HTTP request with this package, you can code the request as a context manager using the __with__ statement. Reading the resulting response returns a byte string. Here's how to make a _GET_ request:

In [None]:
import urllib.request #[1]

url = "http://boodelyboo.com" #[2]
# GET
with urllib.request.urlopen(url) as response: #[3]
    content = response.read() #[4]

print(content)

Here we import the package we need __[1]__ and define the target URL __[2]__. Then, using the __urlopen__ function as a context manager, we make the request __[3]__ and read the response __[4]__.
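In practice, a request can fail, and the bytes usually need decoding before you can search them. The sketch below shows one way to handle both with __urllib.error__, while also setting a custom __User-Agent__ header as in the Python 2.x example. The __fetch_text__ helper, its defaults, and the UTF-8 fallback are assumptions for illustration. The demonstration call uses a __data:__ URL, which __urlopen__ resolves locally, so no network traffic is generated.

```python
import urllib.error
import urllib.request


def fetch_text(url, user_agent="Googlebot", timeout=10):
    """Return the decoded body of url, or None if the request fails."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            raw = response.read()  # bytes
            # Use the charset the server declared, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        print(f"HTTP error {err.code} for {url}")
        return None
    except urllib.error.URLError as err:
        # No usable answer at all: DNS failure, refused connection, bad scheme.
        print(f"Could not reach {url}: {err.reason}")
        return None
    return raw.decode(charset, errors="replace")


# A data: URL exercises the same code path without any network traffic.
print(fetch_text("data:text/plain;charset=utf-8,hello"))
```

__HTTPError__ is caught first because it is a subclass of __URLError__; reversing the order would hide the status code.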

To create a _POST_ request, pass a data dictionary to the request object, encoded as bytes. This data dictionary should have the key-value pairs that the target web app expects. In this example, the __info__ dictionary contains the credentials (_user_, _passwd_) needed to log in to the target website:

In [None]:
import urllib.parse
import urllib.request

info = {"user": "tim", "passwd": "31337"}
# Data is now of type bytes
data = urllib.parse.urlencode(info).encode() #[1]

req = urllib.request.Request(url, data) #[2]
# POST
with urllib.request.urlopen(req) as response:
    content = response.read() #[3]

print(content)

We encode the data dictionary that contains the login credentials to make it a bytes object __[1]__, put it into the _POST_ request __[2]__ that transmits the credentials, and receive the web app response to our login attempt __[3]__.
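Before sending credentials, it can help to check exactly what will go over the wire. The following sketch builds the same kind of request but only inspects it. The password containing special characters is a made-up value chosen to show the escaping.

```python
import urllib.parse
import urllib.request

url = "http://boodelyboo.com"
info = {"user": "tim", "passwd": "p@ss word&1"}

# urlencode percent-escapes reserved characters so the form fields survive intact.
data = urllib.parse.urlencode(info).encode()
req = urllib.request.Request(url, data)

# A Request without data is a GET; supplying data turns it into a POST.
print(req.get_method())
print(req.data)

# parse_qs reverses the encoding, which is handy for checking what will be sent.
decoded = urllib.parse.parse_qs(req.data.decode())
print(decoded)
```

Note that __parse_qs__ returns a list for each key, because a form field may legitimately appear more than once.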

### *__The requests Library__*

Even the official Python documentation recommends using the __requests__ library for a higher-level HTTP client interface. It's not in the standard library, so you have to install it. Here's how to do so using __pip__:
```
pip install requests
```
The __requests__ library is useful because it can automatically handle cookies for you, as you'll see in each example that follows, but especially in the example where we attack a WordPress site in "Brute-Forcing HTML Form Authentication." To make an HTTP request, do the following:

In [None]:
import requests

url = "http://boodelyboo.com"
# GET
response = requests.get(url)

data = {"user": "tim", "passwd": "31337"}
# POST
response = requests.post(url, data=data) #[1]
# response.text = string; response.content = bytestring
print(response.text) #[2]

We create the __url__, the __request__, and a __data__ dictionary containing the __user__ and __passwd__ keys. Then we post that request __[1]__ and print the __text__ attribute (a string) __[2]__. If you would rather work with a byte string, use the __content__ attribute returned from the post. You'll see an example of that in "Brute-Forcing HTML Form Authentication."
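To see what __requests__ would send without actually sending it, you can build a prepared request. This is a minimal sketch using a __Session__, the object that carries cookies from one call to the next. It assumes the same placeholder URL and credentials as above.

```python
import requests

url = "http://boodelyboo.com"
data = {"user": "tim", "passwd": "31337"}

# A Session keeps cookies between calls; nothing is sent until session.send().
session = requests.Session()
request = requests.Request("POST", url, data=data)
prepared = session.prepare_request(request)

print(prepared.method, prepared.url)
print(prepared.headers["Content-Type"])
print(prepared.body)
```

Calling __session.send(prepared)__ would transmit the request, and any cookies the server sets would be stored on the session for later requests.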

### *__The lxml and BeautifulSoup Packages__*

Once you have an HTTP response, either the __lxml__ or __BeautifulSoup__ package can help you parse the contents. Over the past few years, these two packages have become more similar. You can use the __lxml__ parser with the __BeautifulSoup__ package, and the __BeautifulSoup__ parser with the __lxml__ package.

You'll see code from other hackers that uses one or the other. The __lxml__ package provides a slightly faster parse, while the __BeautifulSoup__ package has logic to automatically detect the target HTML page's encoding. We will use the __lxml__ package here. Install either package with __pip__:
```
pip install lxml
pip install beautifulsoup4
```
Suppose you have the HTML content from a request stored in a variable named __content__. Using __lxml__, you could retrieve the content and parse the links as follows:

In [None]:
from io import BytesIO #[1]
from lxml import etree

import requests

url = "https://nostarch.com"
# GET
r = requests.get(url) #[2]
# content is of type "bytes"
content = r.content

parser = etree.HTMLParser()
# Parse into tree
content = etree.parse(BytesIO(content), parser=parser) #[3]
# Find all "a" anchor elements.
for link in content.findall("//a"): #[4]
    print(f"{link.get('href')} -> {link.text}") #[5]

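We import the __BytesIO__ class from the __io__ module __[1]__ because the parser expects a file-like object or a filename rather than a byte string. We perform the _GET_ request as usual __[2]__, wrap the returned bytes in __BytesIO__, and parse them with the __lxml__ HTML parser __[3]__. A simple query finds every __a__ (anchor) element in the tree __[4]__, and we print each link's __href__ attribute alongside its text __[5]__.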
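Where neither package can be installed, such as on the bare-bones interpreter described earlier, the standard library's __html.parser__ module can do a rougher version of the same link extraction. The sketch below is a stand-in for __lxml__, not an equivalent. The __LinkParser__ class and the sample HTML are hypothetical, and the sample is hardcoded so the code runs without a network connection.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect (href, text) pairs for every anchor tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the anchor currently open, if any
        self._text = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


html = b'<html><body><a href="/about">About us</a> <a href="/blog">Blog</a></body></html>'
parser = LinkParser()
# HTMLParser works on strings, so decode the bytes first.
parser.feed(html.decode("utf-8"))
for href, text in parser.links:
    print(f"{href} -> {text}")
```

Anchors without an __href__ attribute are skipped, and the encoding is assumed to be UTF-8 here, whereas __BeautifulSoup__ would try to detect it.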

## *__Mapping Open Source Web App Installations__*



### *__Mapping the WordPress Framework__*

