# Secure Coding in Python

<img src="https://github.com/FSCJ-FacultyDev/SWC-Virtual-2024/blob/main/notebooks.day5/images/SecurePython.png?raw=true" width=200 height=200/>

Secure coding practices are essential to protect applications and systems from a wide array of cyber threats and vulnerabilities. As software increasingly handles sensitive data and performs critical functions, the potential consequences of security breaches grow more severe, ranging from data theft and financial loss to reputation damage and legal liabilities. Secure coding practices help ensure that applications are resilient against common attacks that exploit weaknesses in code. By adopting them, developers can build software that meets functional requirements and also maintains the integrity, confidentiality, and availability of data.

Several organizations provide guidelines and frameworks for secure coding:

[OWASP Secure Coding Practices Quick Reference Guide](https://owasp.org/www-project-secure-coding-practices-quick-reference-guide/)
OWASP (Open Web Application Security Project) provides a comprehensive guide to secure coding practices, focusing on the most critical security risks to web applications.

[NIST SP 800-53: Security and Privacy Controls for Information Systems and Organizations](https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final)
NIST (National Institute of Standards and Technology) publishes various standards and guidelines for cybersecurity, including secure coding practices.

[CERT Secure Coding Standards](https://wiki.sei.cmu.edu/confluence/display/seccode/SEI+CERT+Coding+Standards)
CERT (Computer Emergency Response Team) publishes secure coding standards for different programming languages to improve software security.

[ISO/IEC 27001 Information Security Management](https://www.iso.org/isoiec-27001-information-security.html)
ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission) 27001 is an international standard for managing information security which includes guidelines for secure software development.

[PCI DSS Secure Software Standard](https://www.pcisecuritystandards.org/document_library?category=software_security)
PCI DSS (Payment Card Industry Data Security Standard) provides guidelines for secure coding practices specifically for applications handling payment card data.

Secure coding topics include:

- Input Validation
- Error Handling
- Use of Secure Libraries
- Access Control and Authorization
- Logging and Monitoring
- Common Vulnerabilities
  - SQL Injection
  - Cross-Site Scripting (XSS)
  - Insecure Deserialization
  - Sensitive Data Exposure
  - Improper Error Handling
- Software Supply Chain Security

## SQL Injection

SQL injection is a code injection technique that exploits vulnerabilities in an application's software by inserting malicious SQL statements into an entry field for execution. It can lead to unauthorized access to or manipulation of the database, exposure of sensitive data, and potentially damaging changes to the database structure.

An unsafe query incorporates user input directly into the SQL text. With the input "Alice' OR '1'='1", the statement becomes "SELECT * FROM users WHERE name = 'Alice' OR '1'='1'", which returns every row because '1'='1' is always true.

Parameterized queries avoid this problem. The query uses a placeholder (?) and the user input is passed separately as a parameter, so the database driver binds it as a single string value rather than parsing it as SQL. The whole string "Alice' OR '1'='1" is then compared against the name column, and it is unlikely to match any entry in the users table.

In [None]:
# use an in-memory DB to demonstrate parameterized queries
import sqlite3

# Unsafe input (vulnerable to SQL Injection)
def unsafe_query(user_input):
    conn = sqlite3.connect(':memory:')
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    cursor.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
    cursor.execute(f"SELECT * FROM users WHERE name = '{user_input}'")
    result = cursor.fetchall()
    conn.close()
    return result

print("Unsafe query result:", unsafe_query("Alice' OR '1'='1"))

# Safe input using parameterized query
def safe_query(user_input):
    conn = sqlite3.connect(':memory:')
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    cursor.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
    cursor.execute("SELECT * FROM users WHERE name = ?", (user_input,))
    result = cursor.fetchall()
    conn.close()
    return result

print("Safe query result:", safe_query("Alice' OR '1'='1"))
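
As a variation on the cell above (a sketch, not the notebook's own method), sqlite3 also accepts named placeholders such as `:name` with a dictionary of values. The `make_db` and `find_user` helpers and the extra `Bob` row are assumptions added for illustration; a two-row table makes it easier to see that the injection string matches nothing.

```python
import sqlite3

def make_db():
    # Build a small in-memory table (the 'Bob' row is an illustrative assumption)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bob")])
    return conn

def find_user(conn, name):
    # Named placeholder: the driver binds `name` as data, never as SQL text
    cur = conn.execute("SELECT id, name FROM users WHERE name = :name",
                       {"name": name})
    return cur.fetchall()

conn = make_db()
legit_rows = find_user(conn, "Alice")
injected_rows = find_user(conn, "Alice' OR '1'='1")
conn.close()

print("Legitimate lookup:", legit_rows)      # [(1, 'Alice')]
print("Injection attempt:", injected_rows)   # []
```

The legitimate lookup returns one row, while the injection string returns none because it is compared as a whole against the name column.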

## Cross-Site Scripting (XSS)

Cross-Site Scripting is a security vulnerability that allows attackers to inject malicious scripts into web pages viewed by other users. These scripts can be used to steal sensitive data, hijack user sessions, redirect users to malicious sites, and perform other harmful actions.

The code below demonstrates how XSS vulnerabilities can occur and how to mitigate them.

- **user_input** contains a script tag with JavaScript code that shows an alert box.
- **display(HTML(user_input))** renders this HTML content directly, including the script tag.
- This rendering method is vulnerable to XSS: if an attacker controls the content of user_input, they can inject scripts that execute in the user's browser.
- In this example, an alert box with the message 'XSS' appears, demonstrating the injected script.
- The html.escape function sanitizes the user input by replacing characters such as < and > with HTML entities; the browser then treats the sequence as literal text rather than as HTML tags and does not execute it.

In [None]:
from IPython.display import display, HTML
import html

def sanitize_input(user_input):
    return html.escape(user_input)

user_input = "<script>alert('XSS');</script>"
safe_input = "Hello, user!"

# Unsafe rendering (vulnerable to XSS)
display(HTML(user_input))

# Safe rendering with sanitized input
print("user_input:", user_input)
sanitized_input = sanitize_input(user_input)
print("Sanitized input:", sanitized_input)
display(HTML(sanitized_input))

# Safe rendering with predefined safe input
display(HTML(safe_input))
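
The escaping step can also be inspected without a notebook display. The following is a minimal sketch using only the standard library's `html` module; the variable name `escaped` is an assumption for illustration.

```python
import html

user_input = "<script>alert('XSS');</script>"
escaped = html.escape(user_input)

# Angle brackets and quotes become HTML entities, so a browser renders
# them as text instead of parsing a <script> element.
print(escaped)
# &lt;script&gt;alert(&#x27;XSS&#x27;);&lt;/script&gt;

# Escaping is reversible, so the original text is not lost
print(html.unescape(escaped) == user_input)  # True
```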

## Insecure Deserialization

This vulnerability allows an attacker to execute arbitrary code on the system, which can lead to severe consequences, such as data theft, data corruption, or taking control of the system.

When untrusted data is deserialized, it can lead to arbitrary code execution. This example illustrates how malicious code can be executed through Python's pickle module by serializing and then deserializing an unsafe object.

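When pickle.loads(malicious_data) is executed, the callable returned by `__reduce__` runs. With the os.system("echo 'Exploited!'") variant, the command prints "Exploited!" to the console; the subprocess.check_output variant, used below as a workaround for Google Colab, returns the same text as the command's output.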

### Preventing Insecure Deserialization
- Do not use pickle to serialize and deserialize data from untrusted sources.
  - Instead, use a safer alternative such as JSON, which does not support arbitrary code execution.
- Use libraries designed to handle serialization safely, such as json or simplejson.
- Ensure that any data being deserialized is from a trusted and validated source.

In [None]:
import pickle
import subprocess

class Exploit:
    def __reduce__(self):
        import os
        # use this for true console execution
        #return (os.system, ("echo 'Exploited!'",))
        # use this as a workaround for Google Colab
        return (subprocess.check_output, (("echo", "Exploited!"),))

# Unsafe deserialization
malicious_data = pickle.dumps(Exploit())
pickle.loads(malicious_data)  # This will execute the malicious code
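
The JSON recommendation above can be sketched as follows. This is a minimal illustration, assuming a made-up `record` dictionary: plain data round-trips through `json` without running any code, and bytes produced by pickle are rejected rather than executed.

```python
import json
import pickle

# Plain data round-trips through JSON without running any code
record = {"user": "alice", "roles": ["reader", "editor"], "active": True}
payload = json.dumps(record)
restored = json.loads(payload)
print("Round trip preserved the data:", restored == record)

# JSON can only describe data (objects, arrays, strings, numbers,
# booleans, null), so a pickle payload is rejected, not executed
pickled = pickle.dumps(record)
try:
    json.loads(pickled)
    rejected = False
except ValueError:
    # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses
    rejected = True
print("Pickle payload rejected by json.loads:", rejected)
```

The trade-off is that JSON cannot represent arbitrary Python objects, so custom classes need an explicit conversion to and from dictionaries.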

## Sensitive Data Exposure

Sensitive data exposure is a security vulnerability where sensitive information such as passwords, credit card numbers, personal data, API keys, and other confidential information is unintentionally exposed to unauthorized users. This can occur through various means, such as insecure storage, transmission over unsecured channels, inadequate access controls, or improper handling of data within applications. When sensitive data is exposed, it can lead to severe consequences including identity theft, financial loss, and unauthorized access to systems. Proper data protection practices, such as encryption, secure storage, and access controls, are essential to mitigate the risks associated with sensitive data exposure.

In this example:
- the API key is stored directly in the code as insecure_api_key
- the Exploit class uses this key to demonstrate how an attacker could potentially access it through unsafe deserialization
- the Exploit class uses subprocess.check_output to echo the insecurely stored API key
- the unsafe deserialization of the Exploit object shows how the key can be exposed and accessed by an attacker
- because the f-string is evaluated when the object is pickled, this demo runs inside a single process; it illustrates the risk rather than reproducing a real attack

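To defend against this kind of exposure, keep the API key out of the source code, for example in an environment variable. The code below sets os.environ['API_KEY'] only so that the demonstration is self-contained; in a real deployment the variable is set outside the program, for example by the shell or by the hosting platform's secret settings. The key is then read at run time with os.getenv('API_KEY'), which returns the value of the environment variable.

**NOTE**: There are rare scenarios where this could still be insecure, e.g.,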
- Environment variables set in a shared environment without proper access control.
- Accidentally exposing environment variables in client-side code.

In [None]:
import pickle
import subprocess
import os

# Class to demonstrate the exploit
class Exploit:
    def __reduce__(self):
        return (subprocess.check_output,
            (("echo", f"API key: {insecure_api_key}"),))

# Insecure storage of API key
insecure_api_key = 'my_insecure_api_key'

# Unsafe deserialization demonstrating insecure storage vulnerability
malicious_data = pickle.dumps(Exploit())
output = pickle.loads(malicious_data)
print("Output from insecure storage exploit:")
print(output.decode('utf-8'))

# Storage of the API key in an environment variable
# (set here for demonstration only; normally set outside the source code)
os.environ['API_KEY'] = 'my_secure_api_key'

# Access the API key securely
secure_api_key = os.getenv('API_KEY')
print("\nSecure API key:", secure_api_key)
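
A slightly more defensive way to read the key is sketched below. The helper names `get_api_key` and `mask` and the demo value are assumptions for illustration, not part of the original notebook: the first fails loudly when the variable is missing, and the second keeps the full key out of printed output.

```python
import os

def get_api_key(var_name="API_KEY"):
    """Read a secret from the environment and fail loudly if it is absent."""
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key

def mask(secret, visible=4):
    """Show only the last few characters so output never holds the full key."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]

# Demo only: a real deployment sets this outside the source code
os.environ["API_KEY"] = "demo-key-1234"

api_key = get_api_key()
print("API key loaded:", mask(api_key))  # API key loaded: *********1234
```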

## Improper Error Handling

Improper error handling occurs when errors are not managed in a way that maintains the security and stability of an application. It can lead to information leakage, where sensitive information is exposed to unauthorized users, or it can fail to provide meaningful feedback to the user or system administrators, making it harder to diagnose and fix issues. It can also result in an application that is less robust and more prone to crashes or other unintended behaviors.

When the login function is called with incorrect credentials, the output will be
```
Error: Invalid credentials
```
This happens because the credentials do not match the hardcoded ones, so the ValueError is raised and caught by the first except block; the function then returns None. An error message can give away too much information to a potential attacker: a message such as "Unknown user" or "Wrong password" confirms which value was wrong. The secure approach is a generic message, like the one used here, that does not confirm the validity of either the username or the password.

While this example does not log sensitive information, improper error handling might lead to scenarios where sensitive data (such as passwords) could be logged.

The **except Exception as e** block catches all exceptions, which might mask underlying issues that need specific handling. It’s generally better to handle known exceptions explicitly and let unexpected ones propagate or be logged appropriately.

The error handling in this example only prints error messages to the console. In a real application, it would be better to handle errors in a way that informs the user appropriately, possibly through a user interface, and logs the error for further investigation by developers or administrators.

In [None]:
def login(user, password):
    try:
        # Simulate a login process
        if user == "admin" and password == "password":
            return "Login successful"
        else:
            raise ValueError("Invalid credentials")
    except ValueError as e:
        print(f"Error: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

# Call the login function with incorrect credentials
login("admin", "wrongpassword")
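
One way to apply the points discussed above is sketched here. The names `login_v2`, `AuthenticationError`, and the `auth_demo` logger are assumptions for illustration: the function raises a specific exception with a generic message, logs the failed attempt without the password, and lets unexpected exceptions propagate instead of catching them all.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("auth_demo")

# Demo credentials only; real systems store salted password hashes
_DEMO_USER = "admin"
_DEMO_PASSWORD = "password"

class AuthenticationError(Exception):
    """Raised when a login attempt fails, without saying which value was wrong."""

def login_v2(user, password):
    if user != _DEMO_USER or password != _DEMO_PASSWORD:
        # Log the event for administrators, but never the password itself
        logger.warning("Failed login attempt for user %r", user)
        raise AuthenticationError("Invalid username or password")
    return "Login successful"

try:
    result = login_v2("admin", "wrongpassword")
except AuthenticationError as exc:
    # Only the expected exception is caught; anything else propagates
    result = f"Error: {exc}"
print(result)  # Error: Invalid username or password
```

Unlike the original function, the caller always receives either a success value or an exception, so a failed login cannot be mistaken for a silent None.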

## Software Supply Chain Security

- Vulnerabilities can occur in any programming language
  - Python is a great example because it is so widely used, especially in data science
- People of all skill levels are using it, so risks are more pronounced
- Examples of recently reported vulnerabilities in open source Python tools:
  - https://thehackernews.com/2024/02/new-malicious-pypi-packages-caught.html
  - New Malicious PyPI Packages Caught Using Covert Side-Loading Tactics
  - https://www.sonatype.com/blog/top-8-malicious-attacks-recently-found-on-pypi
    - RAT (Remote Access Trojan) Mutants
    - PyTorch Namespace Confusion Attack
    - GTA 5 Multihack Site

### The Software Bill of Materials (SBOM)
- <a href="https://www.whitehouse.gov/briefing-room/presidential-actions/2021/05/12/executive-order-on-improving-the-nations-cybersecurity/" target="_blank">
  <strong>U.S. Executive Order on Improving the Nation’s Cybersecurity (14028)</strong>
  </a>
  <blockquote>"Understanding the supply chain of software, obtaining an SBOM, and using it to analyze known vulnerabilities are crucial in managing risk."</blockquote>
- A **Software Bill of Materials (SBOM)** is a detailed inventory of all components, libraries, and dependencies used by a software application
- It provides a comprehensive record which lists open-source, proprietary, and third-party components
- It contains component metadata, including version numbers, licenses, and source information
- SBOMs promote visibility into the software supply chain
- It is used in conjunction with scanning tools to identify components with known security issues
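
To make the idea of a component inventory concrete, the sketch below lists the packages installed in the current Python environment using the standard library's `importlib.metadata`. It is an illustration only, not a standards-compliant SBOM; the `build_inventory` helper and the chosen fields are assumptions.

```python
import json
from importlib import metadata

def build_inventory():
    """Collect name, version, and license for each installed distribution."""
    components = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip distributions with incomplete metadata
        license_info = (dist.metadata.get("License-Expression")
                        or dist.metadata.get("License")
                        or "unknown")
        components.append({
            "name": name,
            "version": dist.version,
            "license": license_info,
        })
    return sorted(components, key=lambda c: c["name"].lower())

inventory = build_inventory()
print(f"{len(inventory)} components found")
print(json.dumps(inventory[:3], indent=2))
```

A real SBOM generator records more than this, such as source information and dependency relationships, in a standard format.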

### Generating an SBOM

- Popular SBOM formats and tools include CycloneDX and SPDX (standard formats with their own generator tooling), Syft, Anchore, and FOSSA; OWASP Dependency-Track consumes SBOMs for analysis
- Output formats vary, but JSON (JavaScript Object Notation, pronounced "Jason") is one of the most popular
- JSON is a lightweight, text-based, human-readable format used to represent data as key-value pairs and arrays
  - https://www.json.org/json-en.html
  - A simple JSON example:
```
      {"name": "John Doe",  "age": 30,  "city": "New York"}
```
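
The same example can be read and written from Python with the standard `json` module. This is a minimal sketch; the variable names are illustrative.

```python
import json

text = '{"name": "John Doe",  "age": 30,  "city": "New York"}'

# json.loads turns JSON text into Python objects (here, a dict)
person = json.loads(text)
print(person["name"], person["age"])  # John Doe 30

# json.dumps turns Python objects back into JSON text
print(json.dumps(person, indent=2))
```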



### More JSON Examples

<img src="https://github.com/FSCJ-FacultyDev/SWC-Virtual-2024/blob/main/notebooks.day5/images/JSON-Jasons.png?raw=true" width=400 height=250/>

### Open Source Scanning Tools
- Scanning tools examine software codebases to identify open-source components and their licenses
- These tools find known vulnerabilities in open-source libraries and dependencies
- They also ensure compliance with open-source licenses and legal requirements
- The tools assess and manage potential risks associated with using open-source software
- Popular scanning tools include Sonatype Nexus IQ, Snyk, Black Duck, OWASP Dependency-Check, WhiteSource, Trivy, and Clair
- **U.S. Executive Order 14028** (again) mandates the verification of open source software components using these types of tools


### Scanning Plus SBOMs for Security and Compliance

Steps:
- Create the project SBOM (includes components, dependencies, and metadata)
- Configure the scanner to use the generated SBOM
  - The scanner cross-references SBOM data with vulnerability databases to identify known issues
  - e.g., CVE (Common Vulnerabilities and Exposures), National Vulnerability Database (NVD), Aqua Vulnerability Database, OSS Index, GitHub Advisory Database, Snyk Vulnerability Database
- A report is generated highlighting vulnerabilities and providing actionable insights for remediation and updates
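
The cross-referencing step above can be sketched in a few lines. Everything in this example is hypothetical: the component names, versions, and advisory identifiers are made up for illustration and do not refer to real packages or real advisories. Real scanners also match version ranges rather than exact versions.

```python
# Hypothetical SBOM fragment: names and versions are made up for illustration
sbom_components = [
    {"name": "examplelib", "version": "1.2.0"},
    {"name": "demo-parser", "version": "0.9.1"},
    {"name": "safe-utils", "version": "3.0.0"},
]

# Hypothetical advisory data keyed by (package, version); not real advisories
advisories = {
    ("examplelib", "1.2.0"): ["EXAMPLE-ADVISORY-0001"],
    ("demo-parser", "0.9.1"): ["EXAMPLE-ADVISORY-0002", "EXAMPLE-ADVISORY-0003"],
}

def scan(components, known_issues):
    """Return a report of components that match an advisory entry."""
    report = []
    for comp in components:
        ids = known_issues.get((comp["name"], comp["version"]), [])
        if ids:
            report.append({"component": comp["name"],
                           "version": comp["version"],
                           "advisories": ids})
    return report

findings = scan(sbom_components, advisories)
for item in findings:
    print(f'{item["component"]} {item["version"]}: {", ".join(item["advisories"])}')
```

Components with no matching entry, such as `safe-utils` here, are left out of the report.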


### Walkthrough

- The following example demonstrates how to perform a scan for a Python program which uses TensorFlow, a widely used open source machine learning library.
- The scenario is that we are developing a Python application which uses several popular data science libraries (numpy, pandas, etc.)
- A JSON-based SBOM is created using the cyclonedx generator library
- The trivy scanner is executed against the generated SBOM to identify known vulnerabilities in the installed modules


In [None]:
# Download the script
!wget https://raw.githubusercontent.com/FSCJ-FacultyDev/SWC-Virtual-2024/main/notebooks.day5/scripts/gensbom.py -O /content/gensbom.py

In [None]:
!pip install cyclonedx-python-lib

In [None]:
# Execute the script
!python /content/gensbom.py

In [None]:
# install trivy
!apt-get update
!apt-get install -y wget apt-transport-https gnupg lsb-release
!wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | apt-key add -
!echo deb https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main | tee -a /etc/apt/sources.list.d/trivy.list
!apt-get update
!apt-get install -y trivy

In [None]:
!trivy sbom /content/sbom.json