# New Generation Data Models and DBMSs Project
2023 / April 2025 edition

Before diving into reading this notebook, make sure you have read the project guidelines provided by the professor. You can find them [here](./Project2023-vers1.pdf).


## 1) Transaction Data Simulator Tool
...

## 2) Conceptual Model

To create the following conceptual model, I analyzed the data generated by the "Transaction Data Simulator" tool, aiming to understand its semantics in order to design a simple structure that clearly illustrates the relationships between the data to be stored in the database. Additionally, I grouped some data into custom types to further enhance the semantics and readability of the model.

<img src="./assets/Conceptual model UML.svg" alt="UML Diagram" style="width:800px;">

### 2.2) Constraints
#### Terminal
- 0 <= `coords.x` <= 100
- 0 <= `coords.y` <= 100

#### Customer
- 0 <= `coords.x` <= 100
- 0 <= `coords.y` <= 100
- `spending_mean` >= 0
- `spending_std` >= 0
- `transactions_per_day_mean` >= 0

#### Transactions
- `amount` > 0
- 0 <= `fraud_scenario` <= 3
- 0 <= `security_feeling` <= 5



## 3) Logical Model

Before proceeding with the logical model, it is important to indicate which database I have chosen to manage the data and the decisions I made regarding the representation of the data to meet the workload requirements.

### 3.1) Database
I chose Neo4j because the data naturally form a graph. All the relationships in the model are many-to-many (N:N), and a graph database stores and traverses such relationships natively.

The workload confirmed this choice, especially query 3c, which repeatedly traverses relationships up to a bound `K` that determines when to stop. This query would be extremely costly if every traversed relationship required a join (or lookup).

Additionally, as we will see later, Neo4j offers a procedure library called APOC, callable from Cypher (Neo4j's query language), that lets us execute query 3c efficiently.
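
To make the idea of a traversal bounded by `K` concrete, here is a minimal Python sketch of a breadth-first search that stops after `k` hops. It is only an illustration of the access pattern, not the actual query 3c: the function name `reachable_within_k` and the toy graph of customers (`c1`, `c2`, `c3`) and terminals (`t1`, `t2`) are hypothetical.

```python
from collections import deque


def reachable_within_k(adjacency, start, k):
    """Return {node: hop distance} for every node within k hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Do not expand nodes that already sit at the depth limit.
        if distances[node] == k:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


# Hypothetical toy graph: customers and terminals linked by transactions.
adjacency = {
    "c1": ["t1"],
    "t1": ["c1", "c2"],
    "c2": ["t1", "t2"],
    "t2": ["c2", "c3"],
    "c3": ["t2"],
}

print(reachable_within_k(adjacency, "c1", 2))
```

Each hop here is a direct adjacency lookup, which is the behaviour a graph database is designed to provide. In a store without adjacency, every hop would instead be a join or lookup against the full data set.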

### 3.2) Data representation (Workload friendly)
Since Neo4j does not allow the definition of custom types or the insertion of objects within node properties, I decided to eliminate all custom types and implement them using primitive types. For the custom types representing objects, I created a property for each attribute with its corresponding primitive type. For enums, I used simple strings.
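
As a rough sketch of this flattening, the Python below turns a coordinate object into two primitive properties and an enum into a plain string. The class names `Coords` and `DayPeriod` and the helper `terminal_properties` are hypothetical, chosen only for illustration. The property names and the allowed strings are the ones used in the logical model.

```python
from dataclasses import dataclass
from enum import Enum


class DayPeriod(Enum):
    """Hypothetical enum; each member is stored as its plain string value."""
    MORNING = "morning"
    AFTERNOON = "afternoon"
    EVENING = "evening"
    NIGHT = "night"


@dataclass
class Coords:
    """Hypothetical custom type grouping the two coordinates."""
    x: float
    y: float


def terminal_properties(coords: Coords) -> dict:
    """Flatten the custom type into one primitive property per attribute."""
    return {"x_terminal_id": coords.x, "y_terminal_id": coords.y}


print(terminal_properties(Coords(12.5, 80.0)))
print(DayPeriod.EVENING.value)
```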

The attribute names in the logical model differ from those in the conceptual model because they match the names used by the "Transaction Data Simulator" tool. The exceptions are the fields I added, explained in the next paragraph, and the fields that the project guidelines required adding. For the meaning of a specific field, refer to the page on the "Transaction Data Simulator" tool linked in the project guidelines, which explains all the fields in detail.

As we will see later, to make the workload more efficient through indexing, I split the `transactions.registration` field into its components: day, month, year, and time, represented as `tx_date_day`, `tx_date_month`, `tx_date_year`, and `tx_date_time`, respectively. I made this split because many queries in the workload filter only on the month and year of `transactions.registration`. An index on the entire field would not have been used, because those filters touch only part of the field. I therefore created a composite index on the year and month fields only.
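
The split can be sketched in Python as follows. This is a minimal illustration, assuming the registration value is available as a `datetime`. The helper `split_registration`, the index name `tx_year_month`, and the `Transaction` node label in the Cypher string are assumptions made for this example, so adapt the pattern if transactions are modelled differently.

```python
from datetime import datetime


def split_registration(registration: datetime) -> dict:
    """Split one timestamp into the four properties of the logical model."""
    return {
        "tx_date_day": registration.day,
        "tx_date_month": registration.month,
        "tx_date_year": registration.year,
        "tx_date_time": registration.time(),
    }


# Assumed Cypher for the composite index on year and month only.
# The label `Transaction` and the index name are hypothetical.
CREATE_YEAR_MONTH_INDEX = (
    "CREATE INDEX tx_year_month IF NOT EXISTS "
    "FOR (t:Transaction) ON (t.tx_date_year, t.tx_date_month)"
)

props = split_registration(datetime(2023, 4, 17, 9, 30, 5))
print(props["tx_date_year"], props["tx_date_month"], props["tx_date_day"])
```

A query that filters on `tx_date_year` and `tx_date_month` can then be matched against the two indexed properties directly, without extracting components from a full timestamp.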

The data types specified are those present in Neo4j.

<img src="./assets/Logical model UML.svg" alt="UML Diagram" style="width:800px;">

### 3.3) Constraints
#### Terminal
- 0 <= `x_terminal_id` <= 100
- 0 <= `y_terminal_id` <= 100

#### Customer
- 0 <= `x_customer_id` <= 100
- 0 <= `y_customer_id` <= 100
- `mean_amount` >= 0
- `std_amount` >= 0
- `mean_nb_tx_per_day` >= 0

#### Transactions
- `tx_amount` > 0
- 0 <= `tx_fraud_scenario` <= 3
- 0 <= `tx_security_feeling` <= 5
- `tx_date_day`, `tx_date_month`, and `tx_date_year` together form a valid date
- `tx_date_time` is a valid `LocalTime` value
- `tx_day_period` is one of the following strings: ["morning", "afternoon", "evening", "night"]
- `tx_products_type` is one of the following strings: ["high-tech", "food", "clothing", "consumable", "other"]
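
The transaction constraints above could be checked on the application side with something like the following Python sketch. It assumes each transaction arrives as a dictionary keyed by the property names of the logical model. The function name `transaction_violations` is hypothetical, and the `tx_date_time` check is left out for brevity.

```python
from datetime import date

DAY_PERIODS = {"morning", "afternoon", "evening", "night"}
PRODUCT_TYPES = {"high-tech", "food", "clothing", "consumable", "other"}


def transaction_violations(tx: dict) -> list:
    """Return the names of the constraints that the transaction violates."""
    violations = []
    if not tx["tx_amount"] > 0:
        violations.append("tx_amount")
    if not 0 <= tx["tx_fraud_scenario"] <= 3:
        violations.append("tx_fraud_scenario")
    if not 0 <= tx["tx_security_feeling"] <= 5:
        violations.append("tx_security_feeling")
    try:
        # date() raises ValueError for impossible dates such as 31 February.
        date(tx["tx_date_year"], tx["tx_date_month"], tx["tx_date_day"])
    except ValueError:
        violations.append("tx_date")
    if tx["tx_day_period"] not in DAY_PERIODS:
        violations.append("tx_day_period")
    if tx["tx_products_type"] not in PRODUCT_TYPES:
        violations.append("tx_products_type")
    return violations


# Made-up record used only to exercise the checks.
sample_tx = {
    "tx_amount": 42.5,
    "tx_fraud_scenario": 0,
    "tx_security_feeling": 3,
    "tx_date_day": 17,
    "tx_date_month": 4,
    "tx_date_year": 2023,
    "tx_day_period": "morning",
    "tx_products_type": "food",
}
print(transaction_violations(sample_tx))
```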

### 3.4) Assumptions
Neo4j constraints cover only structure and data types. They cannot restrict the actual values or the direction of relationships. I therefore assume that whichever software supplies the data to be inserted into the database has correctly enforced all the constraints listed above (except those on the `tx_date_...` properties, which can be validated at the database level). In our case, we assume that the values produced by the "Transaction Data Simulator" tool are correct and comply with the constraints.

Because the direction of relationships cannot be constrained either, it is our responsibility to ensure that the queries that create relationships always use the correct direction.

For more details, see the Neo4j [documentation](https://neo4j.com/docs/cypher-manual/current/constraints/managing-constraints/).