# www.mozilla.org

## Details

1. Core Site
1. Mozilla Marketing Website aka Bedrock.
1. In scope
1. Severity - Critical
1. Please use our staging instance, www.allizom.org, for testing to avoid site disruption.

## Investigation

1. Configure Burp Suite
1. Spider the website
    1. To spider the site we will use `scrapy`. Scrapy projects obey `robots.txt` by default, so to bypass it open the project's `settings.py` (created by `scrapy startproject`) and set `ROBOTSTXT_OBEY = False`.
    1. You should use a VPN to avoid getting banned from the website.
    1. Once this is set up, run the following commands.
        1. `scrapy startproject myproject`
        1. `cd myproject`
        1. `scrapy genspider myspider example.com`
        1. `scrapy crawl myspider -o output.json`
        1. Open `output.json` and analyse the results.


When crawling with Scrapy, keep a list of sensitive keywords so that interesting URLs can be flagged.

```python
sensitive_keywords = ['admin', 'dashboard', 'config', 'login', 'settings', 'account', 'user', 'control', 'management']
```
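
The keyword list can also be applied after the crawl, to the items exported with `-o output.json`. The following is a minimal sketch rather than part of the spider: the helper name `find_sensitive_urls` is hypothetical, and it assumes each exported item is a flat dictionary whose URL values are either absolute `http(s)` URLs or root-relative paths.

```python
from urllib.parse import urlparse

SENSITIVE_KEYWORDS = ['admin', 'dashboard', 'config', 'login', 'settings',
                      'account', 'user', 'control', 'management']


def find_sensitive_urls(items, keywords=SENSITIVE_KEYWORDS):
    """Return the sorted, unique URL values whose path contains a keyword."""
    hits = set()
    for item in items:
        for value in item.values():
            if not isinstance(value, str):
                continue
            parsed = urlparse(value)
            # Skip plain text such as titles; keep absolute and root-relative URLs.
            if parsed.scheme not in ("http", "https") and not value.startswith("/"):
                continue
            # Match on the path only, case-insensitively, so that the host name
            # and query string do not cause false positives.
            path = parsed.path.lower()
            if any(keyword in path for keyword in keywords):
                hits.add(value)
    return sorted(hits)
```

To use it, load the export with `json.load` and pass the resulting list to `find_sensitive_urls`.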

Consider changing your `spider.py` to:

```python
import logging
import scrapy

class SpiderSpider(scrapy.Spider):
    name = "spider"
    allowed_domains = ["www.allizom.org"]
    start_urls = ["https://www.allizom.org/"]

    # Use a set to store visited links
    visited_links = set()

    # Limit crawl depth to avoid going too deep
    custom_settings = {
        'DEPTH_LIMIT': 3,
        'DOWNLOAD_DELAY': 2,  # Adjust to avoid hitting rate limits
    }

    def parse(self, response):
        links = response.css('a::attr(href)').getall()
        sensitive_keywords = ['admin', 'dashboard', 'config', 'login', 'settings', 'account', 'user', 'control', 'management']

        for link in links:
            # Resolve relative links against the current page so every URL is absolute.
            link = response.urljoin(link)

            # Check if link has already been visited
            if link not in self.visited_links:
                # Mark this link as visited
                self.visited_links.add(link)

                # Follow sensitive links
                if any(keyword in link for keyword in sensitive_keywords):
                    logging.info(f"Found potentially sensitive page: {link}")
                    yield {'sensitive_page': link}

                    yield response.follow(link, self.parse)

                # Follow only internal links and avoid unnecessary file types
                elif link.startswith('https://www.allizom.org') and not link.endswith(('.jpg', '.png', '.gif', '.pdf')):
                    yield response.follow(link, self.parse)

```

For a more intensive, deeper scan you could use:

```python
import logging
import scrapy

class SpiderSpider(scrapy.Spider):
    name = "spider"
    allowed_domains = ["www.allizom.org"]
    start_urls = ["https://www.allizom.org/en-GB/?v=1"]

    def parse(self, response):
        logging.info(f"Visited: {response.url}")

        # Extract titles
        titles = response.css('h1::text').getall()
        logging.info(f"Found titles: {titles}")
        for title in titles:
            yield {'title': title}

        # Extract additional headings
        h2_headings = response.css('h2::text').getall()
        logging.info(f"Found h2 headings: {h2_headings}")
        for h2 in h2_headings:
            yield {'h2': h2}

        # Extract all paragraph texts
        paragraphs = response.css('p::text').getall()
        logging.info(f"Found paragraphs: {paragraphs}")
        for paragraph in paragraphs:
            yield {'paragraph': paragraph}

        # Extract each link's text together with its own URL, so the two stay paired
        for anchor in response.css('a[href]'):
            link_text = ''.join(anchor.css('::text').getall()).strip()
            link_url = anchor.attrib['href']
            logging.info(f"Found link: {link_text!r} with URL: {link_url}")
            yield {'link_text': link_text, 'link_url': link_url}

        # Extract images and their sources
        image_sources = response.css('img::attr(src)').getall()
        logging.info(f"Found image sources: {image_sources}")
        for src in image_sources:
            yield {'image_src': src}

        # Follow links to other pages if needed
        for next_page in response.css('a::attr(href)').getall():
            yield response.follow(next_page, self.parse)

```

## Extended Spiders

### Target Forms and Input Fields

```python
def parse(self, response):
    logging.info(f"Visited: {response.url}")

    # Extract forms and input fields
    forms = response.css('form')
    for form in forms:
        action = form.css('::attr(action)').get()
        method = form.css('::attr(method)').get(default='GET')
        inputs = form.css('input::attr(name)').getall()
        logging.info(f"Found form: {action}, Method: {method}, Inputs: {inputs}")
        
        if action and ('login' in action or 'search' in action or 'submit' in action):
            yield {
                'form_action': action,
                'form_method': method,
                'input_fields': inputs
            }
```

### Target Potential Injection Points

```python
def parse(self, response):
    links = response.css('a::attr(href)').getall()
    for link in links:
        if '?' in link:  # Check if URL contains query parameters
            yield {'potential_injection_point': link}

        # Follow links to deepen your crawl
        yield response.follow(link, self.parse)
```
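
Flagging every URL that contains `?` produces many near-duplicate entries. As a rough sketch of how those results could be condensed, the helpers below list the parameter names per endpoint using only the standard library. The function names `query_parameters` and `group_by_endpoint` are hypothetical and are not part of Scrapy.

```python
from urllib.parse import parse_qsl, urlsplit


def query_parameters(url):
    """Return the query-parameter names of a URL, in order of first appearance."""
    names = []
    for name, _value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if name not in names:
            names.append(name)
    return names


def group_by_endpoint(urls):
    """Map each URL path to the set of parameter names seen on it."""
    endpoints = {}
    for url in urls:
        params = query_parameters(url)
        if params:
            endpoints.setdefault(urlsplit(url).path, set()).update(params)
    return endpoints
```

Each key of the resulting dictionary is one endpoint to test, and its value is the list of parameters worth fuzzing there.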

### Target XSS and Client-Side Issues

```python
def parse(self, response):
    scripts = response.css('script::attr(src)').getall()
    for script in scripts:
        if script.endswith('.js'):
            logging.info(f"Found JavaScript file: {script}")
            yield {'javascript_file': script}

        # Follow JavaScript file links to scrape them as well
        yield response.follow(script, self.parse)
```

### Target Potential Misconfigurations

```python
def parse(self, response):
    hidden_inputs = response.css('input[type="hidden"]::attr(value)').getall()
    logging.info(f"Found hidden inputs: {hidden_inputs}")
    for hidden in hidden_inputs:
        yield {'hidden_input': hidden}

```

### Target Administrative or Sensitive Pages

```python
def parse(self, response):
    links = response.css('a::attr(href)').getall()
    for link in links:
        if 'admin' in link or 'dashboard' in link or 'config' in link:
            logging.info(f"Found sensitive page: {link}")
            yield {'sensitive_page': link}

        # Follow links to check for deeper vulnerabilities
        yield response.follow(link, self.parse)
```

### Target Session Management Issues

```python
def parse(self, response):
    cookies = response.headers.getlist('Set-Cookie')
    for cookie in cookies:
        cookie_text = cookie.decode()
        # Attribute names are case-insensitive, so compare in lowercase
        lowered = cookie_text.lower()
        if 'httponly' not in lowered or 'secure' not in lowered:
            yield {'insecure_cookie': cookie_text}

```

## URLs

URLs of interest:

### https://www.allizom.org/media/js/data.79f9e875d181.js


This appears to be a JavaScript module that tracks download events on the site, most likely for Mozilla's products. Its main components are:

1. Strict mode: the code runs in strict mode, which enforces stricter parsing and error handling in JavaScript.
1. Download URL validation: `isValidDownloadURL` checks whether a given URL matches any of the predefined patterns for valid download URLs, which cover domains associated with Mozilla, Apple, Google Play, and Microsoft Store.
1. Event object creation: `getEventObject` builds the event object for a download from parameters such as product name, platform, and method, plus optional fields such as release channel and download language.
1. Event extraction from URL: `getEventFromUrl` extracts the relevant information (such as product name and platform) from a URL and builds an event object if the URL is valid.
1. Link handling: `handleLink` runs when a link is clicked. It identifies the clicked link and sends the corresponding tracking event.
1. Event sending: `sendEvent` pushes the event object onto a global `dataLayer` array for analytics and may also trigger events in other systems (like Mozilla's Glean).
1. Event listeners: click listeners are added to elements with the class `ga-product-download`, so clicks on specific download links are tracked.
1. Experiment tracking: the module also tracks experiments (likely A/B tests) via the `experiment_view` event.

Overall, this code tracks downloads, sends the related analytics data, and checks that download links are valid and categorized by product and platform. It integrates with Google Tag Manager or a similar analytics framework to capture user interactions with download links.

### Vulnerabilities

An attacker who can influence how events are tracked or how URLs are processed might be able to exploit this code. Potential avenues:

1. Malicious URLs: if the download URL validation (`isValidDownloadURL`) can be bypassed or is not robust enough, an attacker could inject malicious URLs that trigger the tracking functions. This could lead to phishing attacks or malware downloads.
1. Cross-Site Scripting (XSS): if user input is not properly sanitized before being processed (for instance, if an attacker can inject scripts via URLs or query parameters), it could lead to XSS, allowing arbitrary JavaScript to run in the context of a user's session.
1. Data leakage: the code pushes event data to a global `dataLayer`. If sensitive information is inadvertently included in these events, an attacker could capture it, since the array is accessible in the browser.
1. Manipulating event tracking: an attacker could manipulate the link handling or the event creation process to spoof download events or analytics data. This could distort analytics results and the business decisions based on them.
1. Denial of Service (DoS): if an attacker could trigger a large number of events in a short period (e.g., through automated scripts), it might overload the analytics service or disrupt normal operations.
1. Browser-specific exploits: if the code is only tested on specific browsers or platforms, attackers could exploit known vulnerabilities in the environments it relies on, especially where it assumes certain APIs or behaviors.
1. Social engineering: if the tracking mechanisms create a false sense of security around certain download links, attackers could trick users into clicking those links in the belief that they are safe.

To mitigate these risks, it is important to:

1. Ensure thorough validation and sanitization of any user input.
1. Implement a Content Security Policy (CSP) to restrict the sources of executable scripts.
1. Regularly audit the code and dependencies for vulnerabilities.
1. Use secure coding practices and keep libraries up to date.
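
The URL-validation concern can be illustrated with a short comparison between a loose substring check and a check on the parsed hostname. This is a sketch only: the allowlist below is hypothetical and is not the set of patterns used by the real script, the function names are invented for the example, and the code is written in Python to match the spiders above rather than the JavaScript under review.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; not the real script's patterns.
ALLOWED_HOSTS = {"download.mozilla.org", "apps.apple.com", "play.google.com"}


def loose_is_valid(url):
    """Substring-style check: passes if an allowed host appears anywhere in the URL."""
    return any(host in url for host in ALLOWED_HOSTS)


def strict_is_valid(url):
    """Parse the URL and require HTTPS plus an exact hostname match."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS
```

URLs such as `https://evil.example/?next=download.mozilla.org` or `https://download.mozilla.org.evil.example/` pass the loose check but fail the strict one, so when reviewing the real validator it is worth confirming that its patterns are anchored to the start of the URL and to the full hostname.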

### https://www.allizom.org/en-US/privacy/websites/cookie-settings/

### Vulnerability

Key security headers:

1. `X-Frame-Options: DENY`: prevents the page from being embedded in iframes, mitigating clickjacking attacks.
1. `Content-Security-Policy` (CSP): specifies which resources the browser may load. It includes directives for `script-src`, `style-src`, and others, helping to prevent XSS and data injection attacks. However, the use of `'unsafe-inline'` and `'unsafe-eval'` weakens this policy and may allow some XSS attacks.
1. `Strict-Transport-Security` (HSTS): ensures that the browser only communicates with the server over HTTPS, protecting against protocol-downgrade and man-in-the-middle attacks. The `max-age` of 31536000 seconds (1 year) indicates a strong enforcement period.
1. `X-Content-Type-Options: nosniff`: prevents browsers from MIME-sniffing a response away from the declared content type, reducing the risk of certain types of attacks.
1. `Referrer-Policy: strict-origin-when-cross-origin`: controls how much referrer information is passed when navigating to different origins, enhancing privacy.
1. `Cross-Origin-Opener-Policy: same-origin`: helps mitigate side-channel attacks by controlling how cross-origin documents can interact with each other.

Other useful information:

1. `Cache-Control` and `Expires`: the page sets a cache duration, which matters for ensuring that users receive updated content and for reducing the risk of serving stale or sensitive information.
1. `Content-Language`: indicates the language of the content (English, in this case), which can be useful for localization and accessibility considerations.
1. `X-Clacks-Overhead`: a humorous header that honors Terry Pratchett. It does not impact security.

Potential concerns:

1. CSP weaknesses: the presence of `'unsafe-inline'` and `'unsafe-eval'` suggests a risk of XSS if any user input is not properly sanitized. Testing for XSS vulnerabilities is recommended.

Recommendations for further testing:

1. Test for XSS: attempt to inject scripts to see whether any XSS vulnerabilities exist due to the CSP configuration.
1. Check resource loading: analyze the effectiveness of the CSP by checking whether it properly blocks unintended scripts or resources.
1. Evaluate cookie security: inspect any cookies set by the page to ensure they have appropriate flags (e.g., `HttpOnly`, `Secure`).

Overall, the headers indicate a strong security posture, but there are areas to test further, particularly around CSP configurations and potential XSS vectors.
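
The header review can be repeated across many pages with a small script. The sketch below assumes the response headers are already available as a name-to-value mapping (for example, copied from Burp Suite). The function names are hypothetical, and the expected-header list simply mirrors the headers observed on this page.

```python
EXPECTED_HEADERS = (
    "X-Frame-Options", "Content-Security-Policy", "Strict-Transport-Security",
    "X-Content-Type-Options", "Referrer-Policy", "Cross-Origin-Opener-Policy",
)


def missing_headers(headers):
    """Return the expected header names absent from the mapping (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in EXPECTED_HEADERS if name.lower() not in present]


def parse_csp(policy):
    """Split a Content-Security-Policy value into {directive: [sources]}."""
    directives = {}
    for chunk in policy.split(";"):
        tokens = chunk.split()
        if tokens:
            # If a directive is repeated, keep only its first occurrence.
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def weak_csp_sources(policy, risky=("'unsafe-inline'", "'unsafe-eval'")):
    """Return sorted (directive, source) pairs that use a risky source keyword."""
    return sorted(
        (name, source)
        for name, sources in parse_csp(policy).items()
        for source in sources
        if source.lower() in risky
    )
```

Any pair returned by `weak_csp_sources` points at a directive where inline or `eval`-style script execution is still permitted, which is where the XSS testing recommended above should start.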