# Phishers abusing free JavaScript hosting

Last year I [analysed](https://dfworks.xyz/blog/credential_stealer/) a phishing email that rendered a semi-convincing HTML credential stealer locally rather than directing the victim to a site on the internet. This was not a new tactic, but to avoid being flagged as malware, the attachment made obfuscated requests to https://www.yourjavascript.com to retrieve additional data such as logo images and screenshots.

yourjavascript.com is a free hosting site that relies on donations to keep running, so it probably can't be held accountable for what users host there, nor is it likely to have the resources to audit obfuscated code that has been uploaded. That said, I was curious about what other ways this free hosting site was being abused by malicious actors.

This jupyter notebook contains the code required to replicate the following steps (please clone/branch as you wish and amend the steps for your own research):

- Collate all the URLs for uploaded code on yourjavascript.com

- Triage the collected URLs for signs of obfuscation/malicious activity

- Render any potentially malicious html as an image

- Check malicious html for further malicious URLs

- Check for yourjavascript usernames

## Requirements

You can uncomment the cell below to install the required Python modules, or run the Jupyter notebook in a separate environment first.

In [None]:
#!pip install -r requirements.txt

## Modules

Below is a description of some of the more peculiar modules used in this project. For modules that aren't mentioned but you are unfamiliar with, please google the documentation.

- **chromedriver_autoinstaller** automatically downloads the correct chromedriver for your browser and handles all the paths required to use selenium

- **Grabzit** is an API where you can post HTML and retrieve a rendered image. This can be done using other Python modules but this was the quickest and easiest way. To follow these steps you will need to [sign up](https://grabz.it/) for a free account to get an API key.

- **nest_asyncio** patches asyncio to allow nested use of asyncio.run and loop.run_until_complete. By design asyncio does not allow its event loop to be nested. When in a jupyter notebook environment where the event loop is already running it’s impossible to run tasks and wait for the result. Trying to do so will give the error “RuntimeError: This event loop is already running”.

In [2]:
import chromedriver_autoinstaller
from selenium import webdriver
from time import sleep
import base64
import random
import asyncio
import time 
import aiohttp
from aiohttp.client import ClientSession
import re
from urllib.parse import unquote
from GrabzIt import GrabzItClient
import nest_asyncio
from collections import Counter
nest_asyncio.apply()

opt = webdriver.ChromeOptions()
opt.add_argument("--start-maximized")

chromedriver_autoinstaller.install()
driver = webdriver.Chrome(options=opt)
driver.set_page_load_timeout(5)

## Collate all the URLs for uploaded code on yourjavascript.com

Thankfully, collecting all the URLs for the code snippets was relatively easy, as they are indexed at the [uploaded](http://yourjavascript.com/uploaded/) endpoint. You can either run the next cell to get an up-to-date list of URLs or uncomment the cells below to read from a text file (accurate as of 18/01/2022).

In [3]:
url_list = []
for x in range(1,536):
    driver.get('http://yourjavascript.com/uploaded/?p=' + str(x))
    elems = driver.find_elements("xpath", "//a[@href]")  # "xpath" is the locator strategy (By.XPATH)
    for elem in elems:
        if 'file.php' in str(elem.get_attribute("href")):
            url_list.append(elem.get_attribute("href"))
    sleep(0.5)

In [4]:
len(url_list)

In [5]:
'''
with open('url_list.txt', 'w') as f:
    for item in url_list:
        f.write("%s\n" % item)
'''



In [3]:
'''
url_list = open('url_list.txt','r').read().splitlines()
'''



## Triage the collected URLs for signs of obfuscation/malicious activity

As there are more than 50,000 snippets to audit, we need a quick way to detect which URLs contain maliciously obfuscated code. The phishing email I analysed previously was comparable to a jigsaw puzzle: some segments of the HTML file were local and some were hosted elsewhere. The individual segments may appear harmless in isolation, enabling them to slip past conventional security solutions. Those segments were either URL or base64 encoded, so I decided to search for the pattern 'atob(un' (*atob(unescape())* can deobfuscate encoded HTML strings).

I used a fairly simple substring pattern but equally the following logic could be applied to find different examples of malicious obfuscation:

- **Other Strings** - 'eval()', 'exec()' or substrings over a certain length could also help identify malicious snippets but are prone to false positives
- **Machine Learning** - Either training your own [obfuscation classifier](https://www.kaggle.com/fanbyprinciple/javascript-obfuscation-detection) or using an [existing tool](https://github.com/Aurore54F/JaSt) would be a more refined way of finding malicious snippets
- **Whitespace analysis** - Syntactically, obfuscation often ends up creating long strings with no space or lots of space to break up strings. Analysing how much whitespace there is in a snippet would probably be a promising strategy.
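
As a rough illustration of the string and whitespace ideas above, the sketch below scores a snippet on marker substrings, whitespace ratio and longest unbroken token. The marker list, the thresholds and the function names are assumptions chosen for illustration, not values tuned on the yourjavascript.com data, so expect false positives.

```python
# Hypothetical marker substrings; extend or replace for your own research.
SUSPICIOUS_MARKERS = ("atob(", "unescape(", "eval(", "document.write(")


def whitespace_ratio(source: str) -> float:
    """Fraction of characters in the snippet that are whitespace."""
    if not source:
        return 0.0
    return sum(ch.isspace() for ch in source) / len(source)


def longest_token(source: str) -> int:
    """Length of the longest run of non-whitespace characters."""
    return max((len(tok) for tok in source.split()), default=0)


def looks_obfuscated(source: str, max_token: int = 500, min_ws: float = 0.02) -> bool:
    """Flag snippets that contain a marker AND look unusually dense.

    max_token and min_ws are illustrative thresholds, not tuned values.
    """
    has_marker = any(marker in source for marker in SUSPICIOUS_MARKERS)
    dense = longest_token(source) > max_token or whitespace_ratio(source) < min_ws
    return has_marker and dense
```

A check like `looks_obfuscated(result)` could stand in for the `'atob(un' in result` test in the triage cell if you want a broader net.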


In [None]:
triaged = []

async def download_link(url:str,session:ClientSession):
    async with session.get(url) as response:
        result = await response.text()
        if 'atob(un' in result: # change this logic for different obfuscation detection methods
            triaged.append(url)
            print(url)
        #print(f'Read {len(result)} from {url}')

async def download_all(urls:list):
    my_conn = aiohttp.TCPConnector(limit=10)
    async with aiohttp.ClientSession(connector=my_conn) as session:
        tasks = []
        for url in urls:
            task = asyncio.ensure_future(download_link(url=url,session=session))
            tasks.append(task)
        await asyncio.gather(*tasks,return_exceptions=True) # the await must be nested inside the session


start = time.time()
asyncio.run(download_all(url_list))
end = time.time()
print(f'download {len(url_list)} links in {end - start} seconds')

## Render any potentially malicious html as an image

As mentioned above the Grabzit API can render html as an image if you sign up for an account and copy in your key and secret below.

For each of the "triaged" URLs identified in the previous step, this cell does the following:
- Gets the "javascript" snippet from the page
- Determines if the encoding is base64 or URL (%) encoding
- Decodes the obfuscated string
- If the decoded string is HTML then it is passed to the Grabzit API
- An image is saved
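
The decoding steps in that list can be sketched on their own, without Selenium or Grabzit. The helper below is a minimal sketch that mirrors the order of `atob(unescape(...))`: URL-decode first, then try base64. The function name `decode_candidates` and the "contains `<` and `>`" test for HTML are assumptions made for illustration.

```python
import base64
import binascii
import re
from urllib.parse import unquote

# Matches the contents of every double-quoted string literal in a snippet.
QUOTED = re.compile(r'"([^"]*)"')


def decode_candidates(snippet: str) -> list:
    """Return decoded string literals from a snippet that look like HTML."""
    decoded = []
    for literal in QUOTED.findall(snippet):
        text = unquote(literal)  # leaves the text unchanged if nothing is %-encoded
        try:
            raw = base64.b64decode(text, validate=True)
            text = raw.decode("utf-8", errors="replace")
        except (binascii.Error, ValueError):
            pass  # not valid base64; keep the URL-decoded text
        # Crude HTML check (an assumption): keep strings containing angle brackets.
        if "<" in text and ">" in text:
            decoded.append(text)
    return decoded
```

Each string returned by `decode_candidates(text)` could then be passed to the rendering call, which keeps the decoding logic testable without network access.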

In [4]:
grabzIt = GrabzItClient.GrabzItClient("key", "secret")  # replace with your own key and secret

counter = 0
for x in triaged:
    driver.get(x)
    # "xpath" is the locator strategy (By.XPATH)
    parent_elem = driver.find_element("xpath", "//div[@class='js']")
    text = parent_elem.text

    # every double-quoted string literal in the snippet is a candidate payload
    for literal in re.findall(r'"([^"]*)"', text):
        try:
            if '%' in text:
                html = unquote(literal)  # URL (%) encoded
            else:
                # b64decode returns bytes, so decode to str before rendering
                html = base64.b64decode(literal).decode('utf-8', errors='replace')
            grabzIt.HTMLToImage(html)
            grabzIt.SaveTo("result" + str(counter) + ".jpg")  # saves in local folder
            counter += 1
        except Exception:
            continue  # skip strings that fail to decode or render

You can conduct the steps above yourself, but I have included some of the images to demonstrate that a good selection of the snippets were malicious credential stealers.

![credential stealers](stealers.png)

## Check malicious html for further malicious URLs

The below cell retrieves URLs contained within the malicious html snippets and stores them in the mal_urls list.

I conducted some further manual analysis on the list and found that they were broadly categorised as follows:

- Legitimate Microsoft/O365 links, so victims could be redirected after credentials had been captured
- Legitimate links to Microsoft/O365 logos and images, so the rendered HTML was more believable
- Links to presumably compromised wordpress sites which are acting as a server to capture submitted credentials
- Dead links to presumably now inactive malicious servers
- Links to blurry images used to make local credential stealers more believable

![example](blurry2.png)


In [6]:
def find(string):
    
  
    # re.findall() returns one tuple per match; group 0 is the full URL
    regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
    url = re.findall(regex, str(string))      
    return [x[0] for x in url]


mal_urls = []

for x in triaged:
    driver.get(x)
    # "xpath" is the locator strategy (By.XPATH)
    parent_elem = driver.find_element("xpath", "//div[@class='js']")
    text = parent_elem.text

    # every double-quoted string literal in the snippet is a candidate payload
    for literal in re.findall(r'"([^"]*)"', text):
        try:
            if '%' in text:
                html = unquote(literal)  # URL (%) encoded
            else:
                # b64decode returns bytes, so decode to str before searching
                html = base64.b64decode(literal).decode('utf-8', errors='replace')
        except Exception:
            continue  # skip strings that fail to decode
        for t in find(html):
            print(t)
            mal_urls.append(t)
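
Once the list is populated, grouping the URLs by hostname can make the manual categorisation quicker. The following is a minimal sketch using only the standard library; `host_of` and `count_hosts` are hypothetical helper names, and prepending `http://` to scheme-less matches is an assumption made so that `urlsplit` can find the host.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased hostname, or '' if none can be parsed."""
    if "://" not in url:
        url = "http://" + url  # assumption: scheme-less matches such as www.example.org/x
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""


def count_hosts(urls) -> Counter:
    """Count how often each hostname appears in an iterable of URLs."""
    hosts = (host_of(u) for u in urls)
    return Counter(h for h in hosts if h)
```

Calling `count_hosts(mal_urls).most_common()` lists the most frequently referenced hosts first, which tends to separate the legitimate Microsoft assets from the one-off capture servers.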


## Check for yourjavascript usernames

Each yourjavascript snippet is attributed to a username. You can see an example [here](http://yourjavascript.com/uploaded/file.php?i=1602098388&f=343452.js.html) where the creator is the user "2motdepas". Unfortunately, no username was repeated across the snippets that I collated, but you may have more luck (if they aren't randomly generated).

In [None]:
usernames = []
for x in triaged:
    driver.get(x)
    parent_elem = driver.find_element("xpath", '//*[@id="wrap"]/div[2]/div/div/p[1]')  # "xpath" is the locator strategy (By.XPATH)
    text = parent_elem.text
    usernames.append(text.split(' ')[-1])
    print(text.split(' ')[-1])

In [25]:
def Most_Common(lst):
    data = Counter(lst)
    return data.most_common()

Most_Common(usernames)

## Mitigating actions

You may be conducting these steps already but here are some mitigating actions you can take to avoid having these adversarial tactics used against your organisation.


- Turn on Safe Attachments policies to check attachments to inbound email. 
- Enable Safe Links protection for users with zero-hour auto purge to remove emails when a URL gets weaponized post-delivery.
- Avoid password reuse between accounts.
- Use multi-factor authentication (MFA) especially for privileged accounts.
- Educate end users on consent phishing tactics as part of security or phishing awareness training.
- Consider blocking yourjavascript.com altogether if it isn't used by your organisation.
- The following domains were identified in the mal_urls list and can be used as IOCs or added to blacklists:
    - hxxp://www[.]tanikawashuntaro[.]com
    - hxxps://tannamilk[.]or[.]jp
    - hxxp://tokai-lm[.]jp
    - hxxp://coollab[.]jp
    - hxxp://201911040231048719416[.]onamaeweb[.]jp
    - hxxps://liveautho20[.]000webhostapp[.]com
    - hxxp://www[.]cyuouzemi[.]co[.]jp
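
If you collect your own list, a small helper can produce the same defanged `hxxp`/`[.]` form used above before you share IOCs. This is a sketch; the name `defang` is hypothetical, and dropping the path to keep only scheme and host is an assumption that matches the list above.

```python
from urllib.parse import urlsplit


def defang(url: str) -> str:
    """Reduce a URL to scheme and host, then defang it for safe sharing."""
    parts = urlsplit(url)
    scheme = parts.scheme.replace("http", "hxxp")  # http -> hxxp, https -> hxxps
    host = (parts.hostname or "").replace(".", "[.]")
    return f"{scheme}://{host}"
```

For example, `sorted({defang(u) for u in mal_urls})` gives a de-duplicated list ready to paste into a report.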
