### **Business Rule**

A statement that creates a restriction on specific parts of a database.


Business rules affect:

- What data is collected and stored
- How relationships are defined
- What kind of information the database provides
- The security of the data

Because business rules are so integral to the way databases function, verifying that they work correctly is important: these checks help ensure that a database does its job as intended.

**Business rules**

As you have been learning, a business rule is a statement that creates a restriction on specific parts of a database. These rules are developed according to the way an organization uses data. They create efficiencies, allow for important checks and balances, and sometimes exemplify the values of a business in action. For instance, if a company values cross-functional collaboration, there may be a rule that at least two representatives from two different teams must sign off on a data set as complete. Business rules affect what data is collected and stored, how relationships are defined, what kind of information the database provides, and the security of the data. In this reading, you will learn more about the development of business rules and see an example of business rules being implemented in a database system.

**Imposing business rules**

Business rules are highly dependent on the organization and its data needs. This means business rules are different for every organization. This is one of the reasons why verifying business rules is so important; these checks help ensure that the database is actually doing the job you need it to do. But before you can verify business rules, you have to implement them.

For example, let’s say the company you work for has a database that manages purchase order requests entered by employees. Purchase orders over $1,000 need manager approval. To automate this process, you can impose a ruleset on the database that automatically delivers requests over $1,000 to a reporting table pending manager approval. Other business rules that may apply in this example are: prices must be numeric values (data type should be integer), and for a request to exist, a reason is mandatory (table field may not be null).
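The two field-level rules in this example (an integer price and a mandatory reason) can be enforced by the table definition itself. The following is a minimal sketch using Python's built-in `sqlite3` module. The table name `purchase_requests`, its columns, and the helper `try_insert` are hypothetical and chosen only for illustration; a production database would likely express the same rules in its own dialect.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE purchase_requests (
        request_id INTEGER PRIMARY KEY,
        -- Rule: prices must be stored as integers.
        price      INTEGER NOT NULL CHECK (typeof(price) = 'integer'),
        -- Rule: a request cannot exist without a reason.
        reason     TEXT NOT NULL
    )
    """
)


def try_insert(price, reason):
    """Return True if the row satisfies the rules, False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO purchase_requests (price, reason) VALUES (?, ?)",
                (price, reason),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(250, "New monitor")
rejected_price = try_insert("abc", "New monitor")  # price is not numeric
rejected_reason = try_insert(250, None)            # reason is missing
```

Here the database, not the application, refuses rows that break the rules, so every tool that writes to the table is held to the same restrictions.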


<img src="./images/5.png"></img>

In order to fulfill this business requirement, there are three rules at play in this system:

1. Order requests of $1,000 or less are automatically delivered to the approved product order requests table

2. Requests over $1,000 are automatically delivered to the requests pending approval table

3. Approved requests are automatically delivered to the approved product order requests table

These rules inherently affect the shape of this database system to cater to the needs of this particular organization.
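The three routing rules can be sketched as a single function. This is only an illustration: the function name `route_request` and the table names it returns are hypothetical, and a request of exactly $1,000 is assumed not to need approval because the requirement says orders "over $1,000" do.

```python
APPROVAL_THRESHOLD = 1000  # dollars, taken from the purchase order example


def route_request(amount, manager_approved=False):
    """Return the name of the table a purchase request should be delivered to."""
    if amount > APPROVAL_THRESHOLD and not manager_approved:
        # Rule 2: large requests wait for a manager.
        return "requests_pending_approval"
    # Rule 1 (small requests) and rule 3 (approved requests) share a destination.
    return "approved_product_order_requests"


small = route_request(400)
large = route_request(2500)
large_after_approval = route_request(2500, manager_approved=True)
```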

**Verifying business rules**

Once the business rules have been implemented, it’s important to continue to verify that they are functioning correctly and that data being imported into the target systems follows these rules. These checks are important because they test that the system is doing the job it needs to, which in this case is delivering product order requests that need approval to the right stakeholders. 
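One possible verification check, sketched below, scans the approved table for requests that should have waited for a manager. The row layout (`request_id`, `amount`, `manager_approved`) and the function name are assumptions made for this example, not part of the system described above.

```python
APPROVAL_THRESHOLD = 1000  # dollars, from the purchase order example


def find_rule_violations(approved_rows, threshold=APPROVAL_THRESHOLD):
    """Return rows in the approved table that exceed the threshold without approval."""
    return [
        row
        for row in approved_rows
        if row["amount"] > threshold and not row["manager_approved"]
    ]


# Hypothetical contents of the approved product order requests table.
approved_table = [
    {"request_id": 1, "amount": 250, "manager_approved": False},
    {"request_id": 2, "amount": 4200, "manager_approved": True},
    {"request_id": 3, "amount": 1800, "manager_approved": False},  # breaks the rule
]

violations = find_rule_violations(approved_table)
```

Running a check like this on a schedule gives an early signal when the routing logic or an import into the target system stops following the rules.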

**Key takeaways**

Business rules determine what data is collected and stored, how relationships are defined, what kind of information the database provides, and the security of the data. These rules heavily influence how a database is designed and how it functions after it has been set up. Understanding business rules and why they are important is useful as a BI professional because it can help you understand how existing database systems function, design new systems according to business needs, and maintain them so that they remain useful in the future.

## **Database performance testing in an ETL context**

In previous lessons, you learned about database optimization as part of the database building process. But it’s also an important consideration when it comes to ensuring your ETL and pipeline processes are functioning properly. In this reading, you are going to return to database performance testing in a new context: ETL processes.

**How database performance affects your pipeline**

Database performance is the rate at which a database system is able to provide information to users. Optimizing how quickly the database can perform tasks for users helps your team get what they need from the system and draw insights from the data that much faster.

Your database systems are a key part of your ETL pipeline: they are where the data in your pipeline comes from and where it goes. The ETL process or pipeline is itself a user, making requests that the database has to fulfill while managing the load of other users and transactions. So database performance is key not just to making sure the database itself can manage your organization’s needs; it’s also important for the automated BI tools you set up to interact with the database.

**Key factors in performance testing**

Earlier, you learned about some database performance considerations you can check for when a database starts slowing down. Here is a quick checklist of those considerations:

- Queries need to be optimized

- The database needs to be fully indexed

- Data should be defragmented

- There must be enough CPU and memory for the system to process requests
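One way to look into the first two items of this checklist is to inspect a query's execution plan before and after adding an index. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` statement through Python's `sqlite3` module. The `orders` table, its columns, and the index name are made up for the example, and other database systems expose execution plans through their own commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 50, i) for i in range(1000)],
)


def query_plan(sql, params=()):
    """Return the human-readable plan SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the description of that step.
    return " | ".join(row[-1] for row in rows)


QUERY = "SELECT amount FROM orders WHERE customer_id = ?"

plan_before = query_plan(QUERY, (7,))  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(QUERY, (7,))   # the plan should now mention the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, so the useful signal is whether the index name appears in the plan, not the precise text.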

You also learned about the five factors of database performance: workload, throughput, resources, optimization, and contention. These factors all influence how well a database is performing, and it can be part of a BI professional’s job to monitor these factors and make improvements to the system as needed.

These general performance tests are really important because they show whether your database can handle data requests for your organization without problems. But when it comes to database performance testing in the context of your ETL process, there is another important check you should make: testing the table, column, and row counts, as well as the query execution plan.

Testing the row and table counts allows you to make sure that the data count matches between the target and source databases. If there are any mismatches, that could mean that there is a potential bug within the ETL system. A bug in the system could cause crashes or errors in the data, so checking the number of tables, columns, and rows of the data in the destination database against the source data can be a useful way to prevent that.
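A count comparison of this kind might look like the following sketch, which uses two in-memory SQLite databases as stand-ins for the source and target systems. The helper names `table_shape` and `compare_counts` and the sample `orders` table are hypothetical.

```python
import sqlite3


def table_shape(conn):
    """Map each user table to a (row_count, column_count) pair."""
    shape = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (name,) in tables:
        rows = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        # cursor.description has one entry per column in the result set.
        cols = len(conn.execute(f'SELECT * FROM "{name}" LIMIT 0').description)
        shape[name] = (rows, cols)
    return shape


def compare_counts(source, target):
    """Return tables whose shape differs, as {table: (source_shape, target_shape)}."""
    src, tgt = table_shape(source), table_shape(target)
    return {
        name: (src.get(name), tgt.get(name))
        for name in sorted(set(src) | set(tgt))
        if src.get(name) != tgt.get(name)
    }


# Stand-in source and target databases; the target lost one row during the load.
source_db = sqlite3.connect(":memory:")
target_db = sqlite3.connect(":memory:")
for db, n_rows in ((source_db, 3), (target_db, 2)):
    db.execute("CREATE TABLE orders (order_id INTEGER, amount INTEGER)")
    db.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i * 10) for i in range(n_rows)])

mismatches = compare_counts(source_db, target_db)
```

A table that is missing entirely on one side shows up with `None` for that side, so the same check covers table counts as well as row and column counts.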

**Key takeaways**

As a BI professional, you need to know that your database can meet your organization’s needs. Performance testing is a key part of the process. Not only is performance testing useful during database building itself, but it’s also important for ensuring that your pipelines are working properly. Remembering to include performance testing as a way to check your pipelines will help you maintain the automated processes that make data accessible to users!

**Defend against known issues**

In this reading, you’ll learn about a defensive check applied to a data pipeline. Defensive checks help you prevent problems in your data pipeline. They are similar to performance checks but focus on other kinds of problems. The following scenario will provide an example of how you can implement different kinds of defensive checks on a data pipeline.

**Scenario**

Arsha, a Business Intelligence Analyst at a telecommunications company, built a data pipeline that merges data from six sources into a single database. While building her pipeline, she incorporated several defensive checks that ensured that the data was moved and transformed properly.

Her data pipeline used the following source systems:

1. Customer details

2. Mobile contracts

3. Internet and cable contracts

4. Device tracking and enablement

5. Billing

6. Accounting

All of these datasets had to be harmonized and merged into one target system for business intelligence analytics. This process required several layers of data harmonization, validation, reconciliation, and error handling.

**Pipeline layers**

Pipelines can have many different stages of processing. These stages, or layers, help ensure that the data is collected, aggregated, transformed, and staged in the most effective and efficient way. For example, it’s important to make sure you have all the data you need in one place before you start cleaning it to ensure that you don’t miss anything. There are usually four layers to this process: staging, harmonization, validation, and reconciliation. After these four layers, the data is brought into its target database and an error handling report summarizes each step of the process.


<img src="./images/d1.png"></img>

**Staging layer**

First, the original data is brought from the source systems and stored in the staging layer. In this layer, Arsha ran the following defensive checks:

- Compared the number of records received and stored

- Compared rows to identify if extra records were created or records were lost

- Checked important fields, such as amounts, dates, and IDs

Arsha moved the mismatched records to the error handling report. She included each unconverted source record, the date and time of its first processing, its last retry date and time, the layer where the error happened, and a message describing the error. By collecting these records, Arsha was able to find and fix the origin of the problems. She marked all of the records that moved to the next layer as “processed.”
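The staging checks and the error report entry described above could be sketched as follows. The record layout, the `id` key, and the function names are assumptions for illustration; the error entry fields mirror the ones listed in the paragraph above.

```python
from datetime import datetime, timezone


def staging_check(source_records, staged_records, key="id"):
    """Compare what was received from a source with what was stored in staging."""
    source_by_id = {r[key]: r for r in source_records}
    staged_by_id = {r[key]: r for r in staged_records}
    shared = source_by_id.keys() & staged_by_id.keys()
    return {
        "received": len(source_records),
        "stored": len(staged_records),
        "lost_ids": sorted(source_by_id.keys() - staged_by_id.keys()),
        "extra_ids": sorted(staged_by_id.keys() - source_by_id.keys()),
        # Records present on both sides whose fields (amounts, dates, ...) differ.
        "changed_ids": sorted(k for k in shared if source_by_id[k] != staged_by_id[k]),
    }


def error_entry(record, layer, message, first_processed_at=None):
    """Build one row for the error handling report."""
    now = datetime.now(timezone.utc)
    return {
        "source_record": record,  # the unconverted source record
        "first_processed_at": first_processed_at or now,
        "last_retry_at": now,
        "layer": layer,
        "message": message,
    }


source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
staged = [{"id": 1, "amount": 10}, {"id": 3, "amount": 35}, {"id": 4, "amount": 40}]

result = staging_check(source, staged)
entry = error_entry(source[1], layer="staging", message="record missing from staging")
```

In this sample the record counts match (three received, three stored) even though one record was lost and one was created, which is why the row-level comparison is worth running in addition to the count.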

**Harmonization layer**

The harmonization layer is where data normalization routines and record enrichment are performed. This ensures that data formatting is consistent across all the sources. To harmonize the data, Arsha ran the following defensive checks:

- Standardized the date format

- Standardized the currency

- Standardized uppercase and lowercase stylization

- Formatted IDs with leading zeros

- Split date values to store the year, month, and day in separate columns

- Applied conversion and priority rules from the source systems

When a record couldn’t be harmonized, she moved it to the error handling report. She marked all of the records that moved to the next layer as “processed.”
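Several of these harmonization steps can be sketched in a few lines of Python. The field names, the accepted input date formats, and the eight-digit ID width are assumptions made for this example. Currency standardization and source priority rules are left out because they depend on details the scenario does not give.

```python
from datetime import datetime

# Input date formats assumed to occur across the source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")
ID_WIDTH = 8  # assumed target width for zero-padded IDs


def harmonize(record):
    """Return a harmonized copy of a record, or raise ValueError if it can't be harmonized."""
    parsed = None
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(record["date"], fmt)
            break
        except ValueError:
            continue
    if parsed is None:
        raise ValueError(f"unrecognized date: {record['date']!r}")
    return {
        "customer_id": str(record["customer_id"]).zfill(ID_WIDTH),  # leading zeros
        "name": record["name"].strip().upper(),                     # consistent case
        "date": parsed.strftime("%Y-%m-%d"),                        # one date format
        "year": parsed.year,                                        # split date parts
        "month": parsed.month,
        "day": parsed.day,
    }


clean = harmonize({"customer_id": 42, "name": " ada lovelace ", "date": "31/01/2024"})
```

A caller would catch the `ValueError` and send the original record to the error handling report instead of passing it to the next layer.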

**Validations layer**

The validations layer is where business rules are validated. As a reminder, a business rule is a statement that creates a restriction on specific parts of a database. These rules are developed according to the way an organization uses data. Arsha ran the following defensive checks:

- Ensured that values in the “department” column were not null, since “department” is a crucial dimension

- Ensured that values in the “service type” column were within the authorized values to be processed

- Ensured that each billing record corresponded to a valid processed contract

Again, when a record couldn’t be validated, she moved it to the error handling report. She marked all the records that moved to the next layer as “processed.”
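The three validation checks can be sketched as a function that returns the list of rules a record breaks. The field names, the set of authorized service types, and the way processed contracts are passed in are all hypothetical.

```python
# Assumed list of authorized values; the real list would come from the business.
AUTHORIZED_SERVICE_TYPES = {"mobile", "internet", "cable"}


def validate(record, processed_contract_ids):
    """Return a list of business rule violations; an empty list means the record is valid."""
    errors = []
    if not record.get("department"):
        errors.append("department must not be null")
    if record.get("service_type") not in AUTHORIZED_SERVICE_TYPES:
        errors.append(f"service type {record.get('service_type')!r} is not authorized")
    if record.get("contract_id") not in processed_contract_ids:
        errors.append("billing record has no valid processed contract")
    return errors


contracts = {"C1", "C2"}  # IDs of contracts already marked as processed
good = validate({"department": "Sales", "service_type": "mobile", "contract_id": "C1"}, contracts)
bad = validate({"department": None, "service_type": "fax", "contract_id": "C9"}, contracts)
```

Returning every violation, instead of stopping at the first one, gives the error handling report a complete message for each rejected record.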

**Reconciliation layer**

The reconciliation layer is where duplicate or illegitimate records are found. Here, Arsha ran defensive checks to find the following types of records:

- Slowly changing dimensions

- Historic records

- Aggregations

As with the previous layers, Arsha moved the records that didn't pass the reconciliation rules to the error handling report. After this round of defensive checks, she brought the processed records into the BI and Analytics database (OLAP).
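As one narrow illustration of a reconciliation rule, the sketch below separates duplicate records from the ones that continue to the target database. It covers only duplicate detection, and the choice of `contract_id` plus `billing_period` as the identifying fields is an assumption for this example.

```python
def reconcile(records, key_fields=("contract_id", "billing_period")):
    """Split records into (kept, duplicates) based on the identifying fields."""
    seen = set()
    kept, duplicates = [], []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            duplicates.append(record)  # goes to the error handling report
        else:
            seen.add(key)
            kept.append(record)        # continues to the target database
    return kept, duplicates


billing = [
    {"contract_id": "C1", "billing_period": "2024-01", "amount": 30},
    {"contract_id": "C1", "billing_period": "2024-02", "amount": 30},
    {"contract_id": "C1", "billing_period": "2024-01", "amount": 30},  # duplicate
]

kept, duplicates = reconcile(billing)
```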

**Error handling reporting and analysis**

After completing the pipeline and running the defensive checks, Arsha made an error handling report to summarize the process. The report listed the number of records from the source systems, as well as how many records were marked as errors or ignored in each layer. The end of the report listed the final number of processed records.

<img src="./images/d2.png"></img>
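A summary of this kind can be computed from the per-layer error counts. The sketch below is a simplified assumption of the report's structure: it tracks only records marked as errors (not ignored records), and the function name and sample numbers are invented for illustration.

```python
# Layer names follow the order described in this reading.
LAYERS = ("staging", "harmonization", "validations", "reconciliation")


def build_report(source_count, errors_by_layer):
    """Summarize how many records failed in each layer and how many were processed."""
    remaining = source_count
    rows = []
    for layer in LAYERS:
        errors = errors_by_layer.get(layer, 0)
        remaining -= errors
        rows.append({"layer": layer, "errors": errors, "passed": remaining})
    return {
        "source_records": source_count,
        "layers": rows,
        "processed_records": remaining,
    }


report = build_report(1000, {"staging": 10, "validations": 5})
for row in report["layers"]:
    print(f"{row['layer']:<15} errors={row['errors']:<4} passed={row['passed']}")
print("processed:", report["processed_records"])
```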

**Key takeaways**

Defensive checks ensure that a data pipeline handles its data properly, which makes them an essential part of preserving data integrity. Once the staging, harmonization, validations, and reconciliation layers have been checked, the data brought into the target database is ready to be used in a visualization.