## Course-level competencies

This course develops three major competencies that map onto three industry roles:

- Data Analyst (somebody who extracts insights from data)
- Data Architect (somebody who designs databases)
- Back-end Engineer (somebody who develops non-user-facing code on software stacks)

The pages in this pseudo-lesson describe the competency levels for each of these roles as they relate to the content of this course. The course is designed to progress through these levels.

### Data Analytics

Extracting insights into coherent data visualisations from a relational database using complex, expressive SQL code that completes within a reasonable amount of time

##### **Level 1:** Writes SQL code to generate desired effects on a relational database

- Anticipates the result of SQL queries executed against relation instances
- Expresses declarative query intent in terms of relational operators
- Extracts data from relations with precise selection predicates and attribute projection
- Combines data from tables with appropriately chosen JOIN operations
- Modifies data in relations in bulk with set-theoretic SQL DML queries
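
As a rough illustration of the Level 1 skills above, the following sketch runs selection, projection, an inner join, a left join, and a bulk update against an in-memory SQLite database. The `student` and `enrolment` tables and their rows are hypothetical, invented only for this example; other RDBMSs may differ in syntax details.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
    CREATE TABLE enrolment (student_id INTEGER REFERENCES student(id), course TEXT);
    INSERT INTO student VALUES (1, 'Ada', 2), (2, 'Edgar', 3), (3, 'Grace', 3);
    INSERT INTO enrolment VALUES (1, 'DB101'), (2, 'DB101'), (2, 'OS201');
""")

# Selection (WHERE) and projection (SELECT list) combined with an inner join:
# students without any enrolment row drop out of the result.
inner_rows = conn.execute("""
    SELECT s.name, e.course
    FROM student AS s
    JOIN enrolment AS e ON e.student_id = s.id
    WHERE s.year = 3
    ORDER BY s.name, e.course
""").fetchall()

# A LEFT JOIN keeps unenrolled students, padding the course with NULL (None).
left_rows = conn.execute("""
    SELECT s.name, e.course
    FROM student AS s
    LEFT JOIN enrolment AS e ON e.student_id = s.id
    WHERE s.year = 3
    ORDER BY s.name, e.course
""").fetchall()

# Set-oriented DML: one statement updates every qualifying row, no loop needed.
conn.execute("UPDATE student SET year = year + 1 WHERE year = 3")
conn.commit()
updated = conn.execute("SELECT name, year FROM student ORDER BY id").fetchall()
conn.close()
```

Predicting the contents of `inner_rows` and `left_rows` before running the code is a good check of the first bullet: the only difference should be the extra row for the student with no enrolment.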

##### **Level 2:** Massages data into visualisation-ready layouts by slicing, dicing, pivoting, and rolling it up directly in SQL

- Expresses complex logic as single SQL queries using aggregation and sub-queries
- Understands how functional dependencies and referential integrity affect the semantics of queries
- Describes the logical ordering of operators in complex queries that involve nested logic
- Plans out how to transform data from relations into a desired output layout as in standard OLAP/ETL operators
- Prefers embedding complex logic into RDBMS over handling it in application-layer code
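
One way to picture the pivoting and roll-up described above is conditional aggregation, sketched below with SQLite. The `sale` table and its figures are hypothetical; the point is only that the reshaping happens in a single SQL query rather than in application code.

```python
import sqlite3

# Hypothetical sales data in "long" format: one row per (region, quarter) sale.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (region TEXT, quarter TEXT, amount INTEGER);
    INSERT INTO sale VALUES
        ('North', 'Q1', 10), ('North', 'Q2', 20),
        ('South', 'Q1', 5),  ('South', 'Q1', 7), ('South', 'Q2', 1);
""")

# Pivot quarters into columns with CASE inside SUM, and roll up a row total,
# all within one GROUP BY query.
pivot = conn.execute("""
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2,
           SUM(amount) AS total
    FROM sale
    GROUP BY region
    ORDER BY region
""").fetchall()
conn.close()
```

Each tuple in `pivot` is already in the one-row-per-region layout that a charting library would typically expect.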

##### **Level 3:** Uses a variety of SQL constructs and indexes to produce readable, efficient, idiomatic queries

- Avoids sub-queries when possible
- Identifies which attributes should be indexed in order to accelerate a query
- Embraces the declarative aspect of SQL to write concise code and avoids iteration logic whenever possible
- Styles code in a manner consistent with the broader SQL developer community
- Articulates what makes one query better than another semantically equivalent query
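
To make the indexing bullet above concrete, the sketch below asks SQLite for its query plan before and after creating an index. The `orders` table and the index name `idx_orders_customer` are hypothetical, and the plan text is SQLite-specific; other systems expose plans differently (for example through their own `EXPLAIN` variants).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, invented for illustration.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER)"
)

query = "SELECT id, total FROM orders WHERE customer_id = ?"

def plan(connection, sql, params):
    # The human-readable plan step is the last column of each plan row.
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# Without an index, the equality predicate forces a scan of the whole table.
before_plan = plan(conn, query, (42,))

# Indexing the attribute used in the selection predicate lets the engine
# search the index instead of scanning every row.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after_plan = plan(conn, query, (42,))
conn.close()
```

Comparing `before_plan` with `after_plan` shows whether the engine actually chose the new index, which is the evidence needed to argue that one formulation of a query is better than another.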

##### **Level 4:** Optimises SQL queries to map onto more efficient physical operators

- Creates indexes that prevent the materialisation of temporary tables
- Recognises queries that will have poor asymptotic complexity in the external memory model
- Understands which logical operators can be rearranged in a query execution plan
- Appreciates why SQL is only partly declarative in practice and how this influences query design
- Connects the physical layout of tables and indexes to the implementation of physical query operators

### Data Modelling

Transforming application and business requirements into well-normalised data abstractions, documented with clear conceptual and relational schemata, and populating instances of those schemata from reliable data sources

##### **Level 1:** Stores data in a set of tables that are compatible with data sources

- Selects appropriate data types for tables
- Writes SQL code that implements a relational design
- Loads data from JSON and CSV formats without truncation or other forms of data loss
- Appends newly acquired data to pre-existing tables, modifying their structure as necessary
- Documents the relationships between tables with syntactically and semantically correct entity-relationship diagrams 
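
The data-type and loading bullets above interact: a poorly chosen column type can silently lose information at load time. The following sketch, using a hypothetical `address` table and an inline CSV sample, stores a postcode as `TEXT` so that its leading zero survives, and maps an empty CSV field to `NULL` rather than an empty string (whether that mapping is right depends on the data source).

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; a real load would open a file instead.
raw = io.StringIO('id,postcode,note\n1,02134,"Hello, world"\n2,90210,\n')

conn = sqlite3.connect(":memory:")
# postcode is TEXT, not INTEGER: an integer column would turn '02134' into 2134.
conn.execute(
    "CREATE TABLE address (id INTEGER PRIMARY KEY, postcode TEXT NOT NULL, note TEXT)"
)

# csv.DictReader handles quoted fields that contain commas.
rows = [
    (int(r["id"]), r["postcode"], r["note"] or None)  # empty note becomes NULL
    for r in csv.DictReader(raw)
]
# Parameterised bulk insert: values are bound, never spliced into the SQL text.
conn.executemany("INSERT INTO address VALUES (?, ?, ?)", rows)
conn.commit()

loaded = conn.execute("SELECT id, postcode, note FROM address ORDER BY id").fetchall()
conn.close()
```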

##### **Level 2:** Constructs well-normalised conceptual and relational schemata that capture requirements without redundancy

- Eliminates data anomalies with effective normalisation
- Identifies dependencies among attributes and appropriate identifiers/keys for entity sets and relations
- Justifies the quality of a schema through a theoretical lens
- Maps requirements onto schemata and vice versa to ensure designs are minimal and complete
- Internalises the merits of the relational data model that still hold today

##### **Level 3:** Applies advanced ERD constructs and normalisation methods to produce more natural schemata

- Uses inheritance and weak entity sets when they are more expressive than alternatives
- Assesses incongruity between conceptual and relational schemata
- Applies alternative normal forms when they better suit the application requirements
- Simplifies complicated relationships with powerful ERD constructs like ternary relationships, identifiers on relationships, and composite attributes
- Systematically evaluates strengths and weaknesses of a schema using a consistent framework
- Considers the impact of NULL values and inheritance on functional dependencies

##### **Level 4:** Designs industrial-strength databases that can be deployed in the real world

- Designs for extensibility with an appreciation that data sources may change 
- Considers matters of data governance, ethics, and data privacy in deciding how to meet application requirements
- Anticipates data access patterns and load in database design and carefully considers trade-offs like denormalisation
- Develops conceptual schemata that are compatible with multiple logical database models
- Avoids over-engineering designs in favour of simplicity and satisfying only those requirements already known

### Back-end Engineering

Writing compatible SQL and API code to interface and synchronise data between applications and relational databases, using appropriate access, data consistency, and concurrency controls

##### **Level 1:** Creates conditions to ensure relational databases exhibit ACID behaviour

- Creates transactions to batch queries into atomic units
- Understands the consistency principle to ensure a database never enters an inconsistent state
- Identifies whether a transaction execution schedule is non-serializable and the implications thereof
- Manually restores a database from a log file to ensure durability
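
Atomicity and consistency, as listed above, can be sketched with a money transfer in SQLite. The `account` table, the `CHECK` constraint, and the `transfer` helper are all hypothetical; the sketch relies on the `sqlite3` connection's context manager, which commits when the block succeeds and rolls back when it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint encodes a consistency rule.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(connection, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit; return True on success."""
    try:
        with connection:  # commit on success, rollback on any exception
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False

first_ok = transfer(conn, 1, 2, 30)    # within the available balance
second_ok = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
conn.close()
```

The credit in the second transfer executes before the failing debit, yet it leaves no trace in `balances`: either both updates take effect or neither does.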

##### **Level 2:** Implements effective controls to ensure that a database can be used in a concurrent, multi-user environment

- Identifies the privileges conferred to each user and the effects on other users of revoking those privileges 
- Creates views to regulate access to data and ensure better data privacy and data governance
- Maximises database throughput by selecting justifiably minimal isolation levels for transactions based on transaction semantics
- Understands the relationship between views and base tables and how DML privileges on views can affect underlying data
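
The relationship between views and base tables mentioned above can be sketched as follows. The `employee` table and `employee_public` view are hypothetical. SQLite is used only because it runs in-process: it has no `GRANT`/`REVOKE` and its views are read-only, so privilege management and DML through views would need a server RDBMS to demonstrate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'Ada', 90000), (2, 'Edgar', 70000);
    -- The view exposes only non-sensitive attributes of the base table.
    CREATE VIEW employee_public AS SELECT id, name FROM employee;
""")

# Column names visible through the view: salary is hidden from its users.
cursor = conn.execute("SELECT * FROM employee_public")
view_columns = [column[0] for column in cursor.description]

# A view stores a query, not data, so base-table changes show through it.
conn.execute("UPDATE employee SET name = 'Ada L.' WHERE id = 1")
view_rows = conn.execute("SELECT * FROM employee_public ORDER BY id").fetchall()
conn.close()
```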

##### **Level 3:** Develops robust data access tier APIs to safely expose a relational database to other developers

- Implements stored procedures to improve efficiency of API execution
- Preemptively designs API code to be robust to security threats like SQL injection attacks
- Interfaces between alternative data layouts in application-layer data structures and RDBMS tables, e.g., with ORMs
- Utilises best practices for connection objects, such as RAII, to ensure that connections are closed
- Provides mechanisms to push complex logic into the RDBMS and closer to the data
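
Two of the bullets above, robustness to SQL injection and reliable connection clean-up, can be sketched together. The `app_user` table and both helper functions are hypothetical. Python has no RAII as such, so `contextlib.closing` is used here as the nearest analogue: it closes the connection when the `with` block exits, even on an exception.

```python
import sqlite3
from contextlib import closing

def find_user(conn, name):
    # Parameterised query: the driver binds `name` as a value, never as SQL text.
    return conn.execute(
        "SELECT id, name FROM app_user WHERE name = ?", (name,)
    ).fetchall()

def find_user_unsafe(conn, name):
    # Anti-pattern, shown only for contrast: input is spliced into the SQL string.
    return conn.execute(
        f"SELECT id, name FROM app_user WHERE name = '{name}'"
    ).fetchall()

# closing() guarantees conn.close() runs when the block exits.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript("""
        CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO app_user VALUES (1, 'Ada'), (2, 'Edgar');
    """)
    malicious = "nobody' OR '1'='1"
    normal_result = find_user(conn, "Ada")
    safe_result = find_user(conn, malicious)           # treated as a literal name
    unsafe_result = find_user_unsafe(conn, malicious)  # predicate is rewritten
```

With the parameterised version the malicious string matches no user, whereas the string-built version returns every row in the table.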