# pyWhat Code Demo -- Milestone 1

- Chris Nastasi, Henry Francis, Andrew Gonzalez
- Our Project Repo can be found here: [CyberSecurityPyWhat](https://github.com/Cnasty07/CyberSecurityPyWhat)


## 1. Description & Interactions

pyWhat is a simple resource identifier: you give it some data and it tells you what that data is. It is versatile and can identify the contents of a file, a piece of text, or the hex of a file.

### Abstract

 This project explores the functionality and applications of pyWhat, an open-source Python tool designed for resource identification. pyWhat analyzes text strings, files, and raw hexadecimal data to determine their format, encoding, or purpose. By leveraging a broad set of pattern recognizers, it can quickly classify hashes, IP addresses, cryptographic tokens, encoded data, and file signatures. This makes it highly useful in fields such as cybersecurity, digital forensics, and software development, where rapid identification of unknown data is critical. In this demo, we highlight how pyWhat simplifies the process of recognizing obscure data formats, showcase practical examples of its usage, and evaluate its versatility. Additionally, we discuss its limitations and contexts where alternative tools may be preferable.

### Introduction

In the world of computing and cybersecurity, professionals often encounter unfamiliar strings of text, suspicious files, or raw hexadecimal dumps that require interpretation. Manually identifying whether these artifacts represent a cryptographic hash, an encoded message, or a specific file format can be time-consuming and error-prone. This is where pyWhat becomes a valuable tool.

pyWhat is an open-source Python utility that acts as a universal identifier, capable of analyzing text, files, and hex data to determine what they are. Much like the Unix `file` command but specialized for digital artifacts, pyWhat relies on a curated database of patterns to match inputs against known identifiers such as IP addresses, JWTs, Base64 encodings, ZIP headers, and more.
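
The pattern-matching idea can be sketched in a few lines. This is a toy illustration, not pyWhat's implementation: the `PATTERNS` table and the `identify` helper are hypothetical, and the two regexes are deliberately simplified.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified pattern table (pyWhat's real database is far larger).
PATTERNS = {
    "IPv4 Address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "Ethereum Address": re.compile(r"\b0x[0-9a-fA-F]{40}\b"),
}


def identify(text: str) -> List[Tuple[str, str]]:
    """Return (matched_text, pattern_name) pairs for every pattern hit."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.group(0), name))
    return hits


print(identify("Sent from 192.168.0.1 to 0x52908400098527886E0F7030069857D2E4169EE7"))
```

Input that matches none of the patterns yields an empty list, which is the same trade-off any pattern-driven identifier makes.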

### Why Would You Not Want To Use pyWhat?

- pyWhat depends on a set of predefined patterns. If a data type is rare, custom, or not included in its library, it won’t be recognized.
- While it detects formats like Base64, JWTs, or ZIP files, it doesn’t automatically decode, extract, or analyze their contents.
- It requires Python to run, which may not be available or permitted in restricted production or enterprise environments.
- It cannot detect malware, vulnerabilities, or malicious intent. It only classifies data; it doesn’t analyze threats.
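
The gap between identifying and decoding can be shown with the standard library alone. In this sketch the `looks_like_base64` helper and its regex are hypothetical stand-ins for a pattern match; decoding is a separate, explicit step that is left to the user.

```python
import base64
import re

# Hypothetical stand-in for a pattern match: Base64 alphabet with valid padding.
BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def looks_like_base64(text: str) -> bool:
    """Classification only: says what the text might be, not what it contains."""
    return bool(text) and BASE64_RE.fullmatch(text) is not None


sample = "aGVsbG8gd29ybGQ="
if looks_like_base64(sample):
    # Decoding is a separate step performed by the caller.
    print(base64.b64decode(sample).decode("utf-8"))
```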



### Examples Where pyWhat Would Be Useful

- Cybersecurity Investigations
- CTF (Capture the Flag) Competitions
- Incident Response
- Digital Forensics
- Software Development & Debugging


## **2. Getting Started**

This is a simple walkthrough of what the program does when it is given a command. In this example we give pyWhat a hash-like string and trace how it identifies it.

### **Information On pyWhat Implementation**

Official Documentation can be found here: https://github.com/bee-san/pyWhat


### **Walkthrough Example: Identifying A Random Hash String**

The code in the following sections is shown and explained to simulate what happens when you run the 'what' or 'pywhat' command in the command-line interface. In this example we give pyWhat a hash-like string and ask it to determine what it is.
**Important**: A full walkthrough could get complex, so we kept this one simple. pyWhat can also scan files such as malware samples and PCAP captures for identifiable strings, and it can scan directories and files recursively.

For this example, we chose a simple hash-like string to identify: "0x52908400098527886E0F7030069857D2E4169EE7", which is the example used in the main repository. It is an Ethereum wallet address. We also used the default filters and tags to avoid adding complexity to this walkthrough.

To run this yourself you would need to either clone the repo, or install pyWhat using a package manager. The install command for pip is: 
```bash
pip install pyWhat
```


### ***Step 1.0: Invoking pyWhat from the CLI***


Here we invoke pyWhat from Python by passing the command and its arguments to `subprocess.run`. We want to analyze a hash-like string, so we pass it directly as a quoted string on the command line.

From CLI:
> ```bash
> example@tamusa:~$ what "0x52908400098527886E0F7030069857D2E4169EE7"
> ```

Using Python:

```python
import subprocess  # used to run CLI commands from Python

try:
    # Invoke the pyWhat CLI on the example string, asking for the pretty table format
    result = subprocess.run(
        ["what", "0x52908400098527886E0F7030069857D2E4169EE7", "--format", "pretty"],
        capture_output=True,
        text=True,
    )
    # print(result.stdout)
except FileNotFoundError:
    print("pywhat CLI is not installed or not found in PATH.")
```


#### **Step 1.1: Tracing Invocation From CLI**

When pyWhat is called, the \_\_main\_\_.py script runs and then calls main() in the what.py file. This is the main entry point to the program.

> ```python
> """
> pyWhat: Identify Anything.
> """
>
> import platform
> import sys
>
> if __name__ == "__main__":
>     # Main entry point for the pyWhat application.
>     # Check for Python version
>     if sys.version_info < (3, 6):
>         print(
>             f"What requires Python 3.6+, you are using {platform.python_version()}. Please install a higher Python version."
>         )
>         sys.exit(1)
>     # Importing what here to avoid circular imports
>     from pywhat import what
> 
>     # If no arguments are provided, show help message
>     if len(sys.argv) == 1:
>         what.main(["--help"]) 
> 
>     what.main()  # calls the main function in what.py
> ```
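
The version gate in the entry point relies on tuple comparison, which works element by element. A minimal sketch of the same check as a reusable function follows; `meets_minimum` is a hypothetical helper, not something pyWhat defines.

```python
import sys


def meets_minimum(version_info, minimum=(3, 6)) -> bool:
    """Compare (major, minor) against the minimum, element by element."""
    return tuple(version_info[:2]) >= minimum


# sys.version_info behaves like a tuple, so it can be passed in directly.
print(meets_minimum(sys.version_info))
```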


### ***Step 2.0: what.main()***


The first step we follow in what.main() is the creation of a What_Object instance. Its what_is_this() method is then called to identify the input passed to pyWhat.


*Note: what.main() is also where the filters and tags are checked and loaded. To keep this walkthrough simple we did not add any flags to our command, so we skip that part here.*

> ```python
>     ## Step 2: Identifying the input using the What_Object class
>     what_obj = What_Object(dist) # dist is the variable used for filtering to specific flags.
>     if kwargs["key"] is None: 
>         key = Keys.NONE
>     else:
>         try:
>             key = str_to_key(kwargs["key"])
>         except ValueError:
>             print("Invalid key")
>             sys.exit(1)
>     ## Step 2.1: Getting the identified output
>     identified_output = what_obj.what_is_this(
>         kwargs["text_input"], 
>         kwargs["only_text"],
>         key,
>         kwargs["reverse"],
>         boundaryless,
>         kwargs["include_filenames"],
>     )  # Identifies the user input, applying any filters and tags, and stores the result (a dictionary) in identified_output.
> ```
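
The key handling above follows a common pattern: an optional CLI string is mapped to an enum member, and an unknown value is reported as a `ValueError`. The sketch below illustrates that pattern only. `SortKey` and `parse_sort_key` are hypothetical names, and the member names are assumptions rather than pyWhat's actual `Keys` definition.

```python
from enum import Enum


class SortKey(Enum):
    """Hypothetical stand-in for pyWhat's Keys; member names are assumptions."""
    NONE = "none"
    NAME = "name"
    RARITY = "rarity"


def parse_sort_key(value):
    """Map an optional CLI string to a SortKey, raising ValueError if unknown."""
    if value is None:
        return SortKey.NONE
    try:
        return SortKey[value.upper()]
    except KeyError:
        raise ValueError(f"Invalid key: {value!r}") from None


print(parse_sort_key(None))
print(parse_sort_key("rarity"))
```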

#### **Step 2.1: What_Object class**

This class takes the distribution built from the command-line filters and wraps an Identifier created from it. Its what_is_this() method passes the user's input to Identifier.identify(), which attempts to identify what the user has given it. A single input, file, or directory can produce several matches at once. The result is returned to the main() function as a dictionary.

> ```python
> class What_Object:
>     def __init__(self, distribution):
>         self.id = identifier.Identifier(dist=distribution)
> 
>     ## INFO: Step 2.1: Getting the identified output
>     def what_is_this(
>         self,
>         text: str,
>         only_text: bool,
>         key,
>         reverse: bool,
>         boundaryless: Filter,
>         include_filenames: bool,
>     ) -> dict:
>         """
>         Returns a Python dictionary of everything that has been identified
>         """
>         return self.id.identify(
>             text,
>             only_text=only_text,
>             key=key,
>             reverse=reverse,
>             boundaryless=boundaryless,
>             include_filenames=include_filenames,
>         )
> ```
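
What_Object adds no logic of its own; it holds an identifier and forwards the call. That delegation pattern can be tried in isolation with a stub. In this sketch `StubIdentifier` and `WhatWrapper` are hypothetical, and the dictionary the stub returns is illustrative rather than pyWhat's real result schema.

```python
class StubIdentifier:
    """Hypothetical stand-in for pyWhat's Identifier; the result shape is illustrative."""

    def identify(self, text, **options):
        # Echo the input and options so the forwarding is visible.
        return {"input": text, "options": options}


class WhatWrapper:
    """Mirrors the delegation in What_Object: hold an identifier, forward calls."""

    def __init__(self, identifier):
        self.id = identifier

    def what_is_this(self, text, only_text=True, reverse=False):
        # Arguments are passed straight through to the identifier.
        return self.id.identify(text, only_text=only_text, reverse=reverse)


wrapper = WhatWrapper(StubIdentifier())
print(wrapper.what_is_this("192.168.0.1"))
```

Keeping the wrapper this thin means the identification logic can be swapped or tested separately from the CLI code.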

### ***Step 3.0: Printing Results***

Lastly, what.main() creates a Printing object from the printer module and uses it to display the results. The excerpt below shows four output paths: JSON (the json flag or a format of "json"), the pretty table (a format of "pretty", which is what this example uses), a custom format (any other format value), and raw output when no format is given.

> ```python
>     ## Step 3.1: Printing the output using the printer class
>     p = printer.Printing()
> 
>     ## Step 3.2: Deciding how to print the output based on user arguments
>     if kwargs["json"] or str(kwargs["format"]).strip() == "json":
>         p.print_json(identified_output)
>     elif str(kwargs["format"]).strip() == "pretty":
>         p.pretty_print(identified_output, kwargs["text_input"], kwargs["print_tags"])
>     elif kwargs["format"] is not None:
>         p.format_print(identified_output, kwargs["format"])
>     else:
>         p.print_raw(identified_output, kwargs["text_input"], kwargs["print_tags"])
> ```
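
The branch order in the excerpt above can be checked without running pyWhat. The sketch below copies the same conditions into a hypothetical `choose_printer` function that returns the name of the method that would be called; the `"custom-string"` value is just a placeholder for any other format value.

```python
def choose_printer(json_flag: bool, fmt) -> str:
    """Return which printing method the excerpt's branches would select."""
    if json_flag or str(fmt).strip() == "json":
        return "print_json"
    elif str(fmt).strip() == "pretty":
        return "pretty_print"
    elif fmt is not None:
        return "format_print"
    else:
        return "print_raw"


for args in [(True, None), (False, "pretty"), (False, "custom-string"), (False, None)]:
    print(args, "->", choose_printer(*args))
```

Note that `str(None)` is the string `"None"`, so a missing format never matches `"json"` or `"pretty"` and falls through to the final branch.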

### ***Step 4: End Result***

> Running pyWhat from start to finish on user input produces CLI output in pretty format that looks like this:
>
> ![main_demo.gif](https://github.com/bee-san/pyWhat/raw/main/images/main_demo.gif)

#### **Step 4.1: Alternatively, this is what our result from the python cell in [Step 1](#step-10-invoking-pywhat-from-the-cli) would return:**

```python
print(result.stdout)  # Print the output captured in Step 1.0
```

```text
Possible Identification
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Matched Text            ┃ Identified as           ┃ Description              ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 5290840009852           │ Phone Number            │ None                     │
├─────────────────────────┼─────────────────────────┼──────────────────────────┤
│ 7030069857              │ Phone Number            │ None                     │
├─────────────────────────┼─────────────────────────┼──────────────────────────┤
│ 529084000               │ American Social         │ An American              │
│                         │ Security Number         │ Identif
```
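
Output captured this way can contain terminal escape sequences (color codes and hyperlink markers) because the pretty table is styled for a terminal. If plain text is needed, one option is to strip them with a regex. This is a sketch: `strip_ansi` is a hypothetical helper, and the pattern covers only CSI sequences such as colors and OSC 8 hyperlinks.

```python
import re

# CSI sequences such as ESC[1;37m (colors) and OSC 8 sequences (terminal hyperlinks).
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]|\x1b\]8;[^\x1b\x07]*(?:\x1b\\|\x07)")


def strip_ansi(text: str) -> str:
    """Remove color and hyperlink escape sequences, leaving only visible text."""
    return ANSI_RE.sub("", text)


sample = "\x1b[1;37mPossible Identification\x1b[0m"
print(strip_ansi(sample))
```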
