# &#128214; Lab 10: Type System Analysis

## &#127919; Objective

Develop an understanding of type systems and their importance in static code analysis. Implement an analyzer to examine Python code for type consistency and potential type-related issues.

## &#128214; Background

Type systems play an important role in static analysis, providing information about the data types used in programs. This knowledge can help detect type inconsistencies and potential bugs early in the development process.

### Why Type Systems Help Detect Bugs

1. **Early Error Detection:** Type systems enforce rules about how variables and functions are used according to their data types. By checking these rules during compilation or static analysis, many errors can be caught early before the code is run.

2. **Avoiding Type Mismatches:** Type systems prevent operations that are nonsensical or unsafe due to type incompatibility (like adding a string to an integer). This avoids runtime errors and ensures that operations make sense in terms of the data involved.
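
As a small illustration of the mismatch mentioned above, the following sketch shows the error Python raises at run time when an integer is added to a string. A static analysis aims to report the same problem without running the code. The variable names are illustrative only.

```python
count = 5
label = "hello"

try:
    total = count + label  # int + str is not defined in Python
except TypeError as exc:
    # Python only discovers the mismatch when this line executes.
    error_message = str(exc)

print(error_message)
```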

### Equivalence of Type Systems to Data Flow Analyses

Type systems and data flow analyses, although conceptually different, share a fundamental similarity: they both track information as it moves through a program.

1. **Tracking Data Flow:** Both systems track how data moves and transforms within a program. Data flow analysis focuses on the flow of data facts, while type systems concentrate on the flow of data types.

2. **Ensuring Correctness:** Just as data flow analysis aims to ensure the correct use of data (like avoiding uninitialized variables), type systems ensure the correct use of data types. Both methodologies aim to enforce certain correctness properties in a program.

## &#10145; Tasks

### Task 1: Implementing a Basic Type Analyzer

In this exercise, you will write a Python script that analyzes Python code and reports type-related problems.

**Build a Type Analyzer for Python Code:**
   - Create a Python script that can analyze Python code for type-related information.
   - The analyzer should identify variables, functions, and their associated types.
   - Implement functionality to detect type inconsistencies or mismatches.

### Import the necessary library

&#128161; *In the following cell, we will import the library needed for this exercise:*
- `ast`: a module of the Python standard library that parses Python code into its abstract syntax tree (AST) representation

In [None]:
import ast
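
To get a feel for what the analyzer will work with, here is a minimal sketch of how `ast.parse` turns a one-line assignment into nodes. It assumes Python 3.8 or later, where literals are represented as `ast.Constant`. The `sample` string is only an illustration.

```python
import ast

sample = "x = 5"
tree = ast.parse(sample)

# The module body holds a single statement: an assignment node.
assign = tree.body[0]
print(type(assign).__name__)        # node class of the statement
print(assign.targets[0].id)         # name on the left-hand side
print(type(assign.value).__name__)  # node class of the right-hand side
print(assign.value.value)           # the literal value itself

# ast.dump gives a textual view of the whole tree.
print(ast.dump(tree))
```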

### Python code to analyze

&#128161; The following cell contains a string holding the Python code that will be analyzed throughout this exercise:

In [None]:
code = """
x = 5
y = 'hello'
x = y      # This should raise a flag since x is an "int" and y is a "str"
b = x + y  # This should raise a flag if x is still tracked as an "int", since y is a "str"
"""

&#128161; Now, you will create a class `TypeAnalyzer` that extends `ast.NodeVisitor`. This class will visit each node in the AST and perform type analysis.
Implement a method `infer_type` in your `TypeAnalyzer` class to determine the type of a given node. Start with simple types like integers and strings.
Next, implement the `visit_Assign` method to handle variable assignments. This method should update the variable types and check for type inconsistencies.
Finally, implement the `visit_BinOp` method to check that both operands of a binary operation have compatible types.

In [None]:
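class TypeAnalyzer(ast.NodeVisitor):
    def __init__(self):
        pass  # TODO: initialize the mapping from variable names to types

    def infer_type(self, node):
        pass  # TODO: return the type of the given node

    def visit_BinOp(self, node):
        pass  # TODO: check that both operands have compatible types

    def visit_Assign(self, node):
        pass  # TODO: record the variable's type and flag inconsistencies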
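
As a hint for the inference step, here is one possible sketch written as a standalone function rather than a method. The name `infer_type_sketch` and the `env` dictionary (variable name to type name) are assumptions made for illustration. The rule for binary operations is a deliberate simplification: it gives a type only when both operands agree, so mixed numeric expressions such as `int + float` come out as `"unknown"`.

```python
import ast

def infer_type_sketch(node, env):
    """Return a type name for an expression node, or "unknown"."""
    if isinstance(node, ast.Constant):
        # Literal values carry their Python type directly.
        return type(node.value).__name__
    if isinstance(node, ast.Name):
        # Variables take whatever type was recorded for them earlier.
        return env.get(node.id, "unknown")
    if isinstance(node, ast.BinOp):
        left = infer_type_sketch(node.left, env)
        right = infer_type_sketch(node.right, env)
        # Simplification: typed only when both sides have the same type.
        return left if left == right else "unknown"
    return "unknown"

env = {"x": "int", "y": "str"}
mixed = infer_type_sketch(ast.parse("x + y", mode="eval").body, env)
numeric = infer_type_sketch(ast.parse("1 + 2", mode="eval").body, env)
text = infer_type_sketch(ast.parse("'a'", mode="eval").body, env)
print(mixed, numeric, text)
```

In a `TypeAnalyzer` class, the same logic could live in `infer_type`, with `env` stored on `self`.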

&#128161; Finally, test your analyzer with some sample Python code. Look for variables, their types, and any inconsistencies.

&#10067; Did your analyzer detect the type inconsistencies?

&#128161; Next, you will write another class called `NoneDetector`. This class should detect variables that may hold `None` when they are used. You can represent this with a custom type, such as a string called "None".

&#128161; You will analyze the code in the following cell:

In [None]:
code = """
a = None
do_something(a)
"""

&#128161; In the following cell, implement the `NoneDetector` class.

In [None]:
class NoneDetector(ast.NodeVisitor):
    pass  # TODO: implement the detector

&#128161; Test your code:

Good!

&#128161; But now, analyze this code:

In [None]:
code = """
a = None
a = 1
do_something(a)
"""

In the previous example, `a` is no longer `None` when it is passed to `do_something`. Did your code detect that?

If yes, good!

Otherwise, update the `NoneDetector` class in the following cell.

In [None]:
class NoneDetector(ast.NodeVisitor):
    pass  # TODO: update the type of a variable when it is reassigned

Perfect, now your code can update the type of `a` to `int`.
Now analyze the following piece of code:

In [None]:
code = """
a = None
a = 1
a = None
do_something(a)
"""

If you implemented everything correctly, your code should handle this case with no changes needed.
But what about this code?

In [None]:
code = """
a = None
a = 1
do_something(b)
"""

Does it report an error?
If yes, you correctly handled variables that never received a value.
If not, implement that behavior in the following cell.

In [None]:
class NoneDetector(ast.NodeVisitor):
    pass  # TODO: report variables that are used but never assigned

&#128161; Nice! Now your code is able to detect some cases where variables might be `None`.
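
For comparison with your own solution, here is one possible sketch that covers the three behaviors discussed above: flagging `None` arguments, updating the fact on reassignment, and reporting variables that were never assigned. The names `NoneDetectorSketch`, `find_none_uses`, and the `"not-None"` label are assumptions made for illustration. It only looks at direct `None` literals and at names passed as call arguments, so it is far from complete.

```python
import ast

class NoneDetectorSketch(ast.NodeVisitor):
    """Tracks, per variable, whether the latest assignment stored None."""

    def __init__(self):
        self.types = {}      # variable name -> "None" or "not-None"
        self.warnings = []   # human-readable findings

    def visit_Assign(self, node):
        # Check calls on the right-hand side before updating the target.
        self.generic_visit(node)
        is_none = isinstance(node.value, ast.Constant) and node.value.value is None
        for target in node.targets:
            if isinstance(target, ast.Name):
                # A later assignment overwrites the earlier fact.
                self.types[target.id] = "None" if is_none else "not-None"

    def visit_Call(self, node):
        for arg in node.args:
            if isinstance(arg, ast.Name):
                kind = self.types.get(arg.id)
                if kind is None:
                    self.warnings.append(
                        f"line {node.lineno}: '{arg.id}' is used but never assigned")
                elif kind == "None":
                    self.warnings.append(
                        f"line {node.lineno}: '{arg.id}' may be None")
        self.generic_visit(node)

def find_none_uses(source):
    detector = NoneDetectorSketch()
    detector.visit(ast.parse(source))
    return detector.warnings

print(find_none_uses("a = None\ndo_something(a)"))
print(find_none_uses("a = None\na = 1\ndo_something(a)"))
print(find_none_uses("a = None\na = 1\na = None\ndo_something(a)"))
print(find_none_uses("a = None\na = 1\ndo_something(b)"))
```

Because the sketch processes statements in source order and keeps a single fact per variable, it does not handle branches or loops, where a variable may be `None` on one path only.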

### Broadening the Concept of 'Type'

Traditionally, 'types' refer to categories like integers or strings. However, in advanced type systems:

- A 'type' can represent **varied properties or facts** about data.
- For example, a 'type' might indicate whether a variable contains sanitized input, extending beyond basic data types.

### Dataflow Facts as Types

- Dataflow analysis tracks the movement and transformation of data in a program.
- **Encoding dataflow facts as types** merges dataflow analysis with type systems.
- Example: In taint analysis, 'tainted' or 'untainted' can be considered types, integrating the concept of dataflow within the type system.
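
The taint example can be sketched as a two-element "type" with a rule for combining values. The names `TAINTED`, `UNTAINTED`, `join`, and the `types` dictionary are hypothetical and chosen only for illustration.

```python
TAINTED = "tainted"
UNTAINTED = "untainted"

def join(left, right):
    """Combine two taint 'types': the result is tainted if either input is."""
    return TAINTED if TAINTED in (left, right) else UNTAINTED

# Hypothetical typing environment: a constant prefix and a user-supplied value.
types = {"prefix": UNTAINTED, "user_input": TAINTED}

# Concatenating the two values yields a value whose type is the join of both.
types["query"] = join(types["prefix"], types["user_input"])
print(types["query"])
```

Here the "type" of `query` records a dataflow fact (user data reached it) rather than a data representation such as `str`.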

### Equivalence in Concept

- This extended view shows that **types in a type system can be equivalent to dataflow facts**.
- Both represent data properties and can be used to analyze program behavior.

&#128161; Now, write the `SQLInjectionDetector` class, which tracks tainted data in a program in order to detect SQL injections.
Any data coming from user input, for example from the `input` or `read` functions, should be marked as tainted.
Then, check whether a tainted value flows into a function that performs an SQL query.
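
The approach described above (sources, propagation, sink) could be sketched as follows. This is one possible design, not a reference solution: the class name `TaintSketch`, the `SOURCES` and `SINKS` sets, and the choice of `execute` as the only sink are assumptions. Propagation is limited to assignments, `+` concatenation, and f-strings.

```python
import ast

SOURCES = {"input", "read"}  # calls whose result is treated as tainted
SINKS = {"execute"}          # calls that should not receive tainted data

class TaintSketch(ast.NodeVisitor):
    def __init__(self):
        self.tainted = set()   # names currently typed as "tainted"
        self.findings = []     # line numbers of tainted sink calls

    def call_name(self, call):
        # Handles both plain calls (input) and method calls (cursor.execute).
        func = call.func
        if isinstance(func, ast.Name):
            return func.id
        if isinstance(func, ast.Attribute):
            return func.attr
        return None

    def is_tainted(self, node):
        if isinstance(node, ast.Name):
            return node.id in self.tainted
        if isinstance(node, ast.Call):
            return self.call_name(node) in SOURCES
        if isinstance(node, ast.BinOp):
            return self.is_tainted(node.left) or self.is_tainted(node.right)
        if isinstance(node, ast.JoinedStr):
            return any(self.is_tainted(value) for value in node.values)
        if isinstance(node, ast.FormattedValue):
            return self.is_tainted(node.value)
        return False

    def visit_Assign(self, node):
        self.generic_visit(node)
        value_is_tainted = self.is_tainted(node.value)
        for target in node.targets:
            if isinstance(target, ast.Name):
                if value_is_tainted:
                    self.tainted.add(target.id)
                else:
                    self.tainted.discard(target.id)

    def visit_Call(self, node):
        if self.call_name(node) in SINKS:
            if any(self.is_tainted(arg) for arg in node.args):
                self.findings.append(node.lineno)
        self.generic_visit(node)

# Illustrative input; it is only parsed, never executed.
sample = '''user_input = input("Enter data: ")
query = "SELECT * FROM users WHERE username = " + user_input
cursor.execute(query)
safe = "SELECT 1"
cursor.execute(safe)
'''

detector = TaintSketch()
detector.visit(ast.parse(sample))
print(detector.findings)
print(sorted(detector.tainted))
```

Matching sinks by method name alone is a coarse choice: any method called `execute` is treated as an SQL sink, which keeps the sketch short at the cost of possible false positives.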

&#128161; You will analyze the following code:

In [None]:
code = """
import sqlite3

user_input = input("Enter data: ")
print(f"The user has given the following input: {user_input}")

do_something()

query = "SELECT * FROM users WHERE username = " + user_input

conn = sqlite3.connect("mydb.db")
cursor = conn.cursor()
cursor.execute(query)
"""

In [None]:
class SQLInjectionDetector(ast.NodeVisitor):
    pass  # TODO: mark user input as tainted and flag tainted SQL queries

&#128161; Test your code:

&#128161; Congratulations! You can now detect SQL injections with type systems.

### &#10067; Questions

Can you explain how a type in a type system can represent a dataflow fact? 

In the exercise where you developed a script to detect type mismatches and conversion errors, how did the type system help identify potential issues? Give an example of a code snippet that would raise a type mismatch error and explain why.

Describe how buffer overflow vulnerabilities can occur in programming. How might a type system be used to prevent such vulnerabilities, based on what you've learned in this lab?

Why is it important to detect potential NoneType errors in Python? Using the NoneType detector you built, explain how the script identifies these errors and suggest a way to handle them to prevent runtime exceptions.

What are some of the challenges you might face when implementing type inference in a dynamic language like Python, as seen in our type mismatch detection exercise? How does dynamic typing complicate type inference?

Research Python type annotations, then explain why they are important and how they could help improve type analysis.