<!-- metadata: title -->
# How Online Media Paywalls Should Work

<!-- metadata: subtitle -->
> ### Balancing between Paywalls and Search Engine Optimization

<!-- metadata: keywords, is_array=true -->
**Keywords:**
  - nation-media-group
  - paywalls
  - search-engine-optimization
  - cloudflare

<!-- metadata: categories, is_array=true -->
**Categories:**
  - cyber-security

<!-- metadata: -->
**Disclaimer:**
<!-- metadata: disclaimer, strip_markdown=false -->
We [contacted Nation Media Group](https://www.nationmedia.com/contact/) on Jun 3, 2024, 2:30 PM, but they did not respond. ^[We [contacted Nation Media Group](https://www.nationmedia.com/contact/) through the following emails: <support@nation.africa>, <sales_inquiries@ke.nationmedia.com>, <newsdesk@ke.nationmedia.com>, <publiceditor@ke.nationmedia.com>, <mailbox@ke.nationmedia.com>, <epaper@ke.nationmedia.com>, <Customercare@ke.nationmedia.com>]

## Old Code that worked

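The original paywall was enforced purely on the client side with CSS classes and overlay elements, so removing them in the browser was enough to bypass it. See the code below: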

content.js

```js
setTimeout(() => {
    // https://nation.africa/
    // Remove the paywall element
    document.querySelector('.wall-guard')?.remove();
    // Allow copying the text
    document.querySelectorAll('.blk-txt')?.forEach(i => i.classList.remove('blk-txt'));

    // https://www.businessdailyafrica.com/
    // Remove the paywall spinner
    document.querySelector('.spinner')?.remove();
    // Remove the paywall element
    document.querySelector('.paywall')?.remove();
    // Remove the call for action
    document.querySelector('.grid-container-medium')?.remove();

    // https://www.businessdailyafrica.com/ AND https://nation.africa/
    // Show the hidden content
    document.querySelectorAll('.paragraph-wrapper.nmgp')?.forEach(i => i.classList.remove('nmgp'));
    // Stop all events
    // document.body.outerHTML += ''
}, 1)
```

***

manifest.json
```json
{
  "manifest_version": 3,
  "name": "Free Nation Media Articles - (For Education Purposes)",
  "version": "1.0",
  "description": "Free Nation Media Articles - (For Education Purposes). This is a proof of concept how to read premium articles from Nation Media Group for free.",
  "content_scripts": [
    {
      "matches": ["https://nation.africa/*"],
      "js": ["content.js"]
    },
    {
      "matches": ["https://www.businessdailyafrica.com/*"],
      "js": ["content.js"]
    }
  ]
}
```

## New Code

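After my prompt, they added a JavaScript layer of security on top of the CSS. The full article is still present in the page's HTML source, so re-fetching the page and cleaning it up is enough to bypass the new layer.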

content.js
```js
setTimeout(async () => {
    // remove popup and make page scrollable
    const removePopup = (maxRetries, retries) => {
        setTimeout(() => {
            const popUp = document.querySelector('.fc-ab-root')
            if (popUp) {
                popUp?.remove()
                document.body.style = ""
            } else if (retries < maxRetries) {
                removePopup(maxRetries, retries + 1)
            }
        }, 300);
    };
    // fetch html src
    const htmlString = await fetch(location.href).then(resp => resp.text())
    const newHtmlDocument = new DOMParser().parseFromString(htmlString, 'text/html');
    // https://nation.africa/
    // Remove the paywall element
    newHtmlDocument.querySelector('.wall-guard')?.remove();
    // Allow copying the text
    newHtmlDocument.querySelectorAll('.blk-txt')?.forEach(i => i.classList.remove('blk-txt'));

    // https://www.businessdailyafrica.com/
    // Remove the paywall spinner
    newHtmlDocument.querySelector('.spinner')?.remove();
    // Remove the paywall element
    newHtmlDocument.querySelector('.paywall')?.remove();
    // Remove the call for action
    newHtmlDocument.querySelector('.grid-container-medium')?.remove();

    // https://www.businessdailyafrica.com/ AND https://nation.africa/
    // Show the hidden content
    newHtmlDocument.querySelectorAll('.paragraph-wrapper.nmgp')?.forEach(i => i.classList.remove('nmgp'));
    // Stop all events
    // document.body.outerHTML += ''
    // Enable images
    newHtmlDocument.querySelectorAll('img.lazy-img').forEach(i => i.classList.remove('lazy-img'))
    newHtmlDocument.querySelectorAll('img[data-src]').forEach(img => {
        const { dataset } = img;
        img.src = dataset.src ?? img.src;
        img.srcset = dataset.srcset ?? img.srcset;
    });
    // Remove spinners
    newHtmlDocument.querySelectorAll('.spinner').forEach(i => i.remove());
    // Remove Cloudflare email protection label
    newHtmlDocument.querySelector('.__cf_email__')?.closest('.paragraph-wrapper')?.remove();

    document.body.outerHTML = newHtmlDocument.body.outerHTML;

    removePopup(50, 0)
}, 10)
```

***

manifest.json

```json
{
  "manifest_version": 3,
  "name": "Free Nation Media Articles - (For Education Purposes)",
  "version": "1.2",
  "description": "Free Nation Media Articles - (For Education Purposes). This is a proof of concept how to read premium articles from Nation Media Group for free.",
  "content_scripts": [
    {
      "matches": ["*://nation.africa/*"],
      "js": ["content.js"]
    },
    {
      "matches": ["*://*.businessdailyafrica.com/*"],
      "js": ["content.js"]
    },
    {
      "matches": ["*://businessdailyafrica.com/*"],
      "js": ["content.js"]
    }
  ]
}

```

## Appropriate Fix

My suggested fix involves using Cloudflare, which nation.africa already uses for DNS and CDN management. Create a Cloudflare Worker that checks the IP address of each request. If the IP address belongs to a search engine, return the extra paid content for SEO; otherwise, redact the extra content before it leaves the server. With this in place, it would still be possible to see the content by routing the request through a service such as https://pagespeed.web.dev/, but that is much harder than the simple JavaScript and CSS tricks above!
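The redaction step can be sketched in Python, independent of where it actually runs. This is only an illustration under stated assumptions, not Nation Media Group's implementation: the `Paragraph` type, its `premium` flag, and `render_article` are hypothetical names. The point it shows is that premium paragraphs are left out of the response entirely, so there is nothing for a browser extension to un-hide.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Paragraph:
    """Hypothetical article paragraph; `premium` marks paid content."""
    text: str
    premium: bool = False


def render_article(paragraphs, is_verified_crawler, is_subscriber=False):
    """Render HTML, omitting premium paragraphs unless access is allowed."""
    allowed = is_verified_crawler or is_subscriber
    # Premium paragraphs are dropped here, before the response is built,
    # instead of being sent to the client and hidden with CSS.
    parts = [
        f"<p>{escape(p.text)}</p>"
        for p in paragraphs
        if allowed or not p.premium
    ]
    # Show a call to action only when something was actually withheld.
    if not allowed and any(p.premium for p in paragraphs):
        parts.append('<p class="paywall">Subscribe to continue reading.</p>')
    return "\n".join(parts)
```

With this shape, the only decision left is how `is_verified_crawler` is computed for each request.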

The IP check involves a reverse DNS lookup of the IP address, confirmed by a forward lookup of the returned hostname:

```python
import asyncio
import socket
from ipaddress import ip_address
from urllib.parse import urlparse


async def reverse_dns_lookup(ip_addr, *host_names):
    """
    Perform a forward-confirmed reverse DNS lookup.

    Return True only if ip_addr resolves to a hostname under one of
    host_names and that hostname resolves back to ip_addr.
    """
    # Check if host_names is empty or ip_addr is invalid
    if not host_names or not ip_addr or not is_valid_ip(ip_addr):
        return False

    try:
        # Get hostname and aliases from IP (blocking call, so run it in a thread)
        hostname, aliases, _ = await asyncio.to_thread(socket.gethostbyaddr, ip_addr)
        # Check if IP matches any of the addresses for the hostname
        addr_info = await asyncio.to_thread(socket.getaddrinfo, hostname, None)
        valid_ip = ip_addr in (info[4][0] for info in addr_info)
        # Check if hostname or its aliases match any of the allowed hosts
        valid_host = any(host_intersection(h, *host_names) for h in [hostname, *aliases])
        return valid_ip and valid_host
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError
        return False


def host_intersection(target_uri, *hosts):
    """
    Check if target_uri is one of the hosts or a subdomain of one of them
    """
    if not target_uri or not hosts:
        return False

    try:
        current_host = (urlparse(f"http://{target_uri}").hostname or "").rstrip(".")
    except ValueError:
        return False
    if not current_host:
        return False
    # Match on a label boundary, so "evilgooglebot.com" does not pass as "googlebot.com"
    return any(
        current_host == h.lower() or current_host.endswith("." + h.lower())
        for h in hosts
    )


def is_valid_ip(ip):
    """
    Check if the given string is a valid IP address
    """
    try:
        ip_address(ip)
        return True
    except ValueError:
        return False


# Usage example:
# result = asyncio.run(reverse_dns_lookup("8.8.8.8", "google.com", "googlebot.com"))
# print(result)
```

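```python
# In a notebook cell (top-level await); in a script, wrap the call in asyncio.run(...)
await reverse_dns_lookup("66.249.66.1", "googlebot.com", "google.com")
# Output: True
```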

As one can tell, doing this for every request is resource intensive, so it is best to cache the result for about 7 days: a verified IP address should be allowed to query for a week without further checks.
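The caching idea can be sketched as a small in-memory cache with a time-to-live. The names `VerifiedIPCache` and `is_search_engine` are hypothetical, and the sketch assumes that negative verdicts are cached as well as positive ones, so that ordinary visitors do not trigger a DNS lookup on every request. A real deployment would likely need a store shared across servers or edge locations, which is not shown here.

```python
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # cache lifetime in seconds


class VerifiedIPCache:
    """Remember crawler-verification verdicts for a fixed time-to-live."""

    def __init__(self, ttl_seconds=SEVEN_DAYS, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock    # injectable so expiry can be tested
        self._entries = {}     # ip -> (verdict, expires_at)

    def get(self, ip_addr):
        """Return the cached verdict, or None if missing or expired."""
        entry = self._entries.get(ip_addr)
        if entry is None:
            return None
        verdict, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[ip_addr]
            return None
        return verdict

    def put(self, ip_addr, verdict):
        """Store a verdict that stays valid for ttl_seconds."""
        self._entries[ip_addr] = (verdict, self._clock() + self.ttl_seconds)


async def is_search_engine(ip_addr, cache, verify):
    """Check the cache first; run the slow verifier only on a miss."""
    cached = cache.get(ip_addr)
    if cached is not None:
        return cached
    verdict = await verify(ip_addr)
    cache.put(ip_addr, verdict)
    return verdict
```

Here `verify` is any coroutine function that takes an IP address and returns a boolean, for example `lambda ip: reverse_dns_lookup(ip, "googlebot.com", "google.com")`.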

An alternative to doing this on the origin server is doing it in a CDN like Cloudflare, using Cloudflare Workers in this case. This saves server resources and, for a start, it's free. A Worker intercepts each request to the server and is able to modify both the request and the response.