# Data Modeling

A database can be thought of as a representation of a collection of entity sets and the relationships between those entities.

Entities do not usually include:
* An output of the system (e.g. a report)
* The system itself
* The company that owns the system

The name of an entity should usually be a singular noun. Try not to abbreviate names.

## Mapping ER Diagram to Database Tables

* An entity set often corresponds to a table in the database
* An entity instance often corresponds to a row in a table
* An attribute often corresponds to a column in a table
* A relationship set (link between entity sets) often corresponds to a Foreign Key in a table
* A relationship instance (link between entity instances) corresponds to a Foreign Key value that equals a Primary Key value
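
The mapping above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `Customer` and `Account` tables, their columns, and the sample row values are assumptions chosen for the example, not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entity set Customer -> table; its attributes -> columns
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
-- Relationship set between Customer and Account -> foreign key column in Account
CREATE TABLE Account (
    AccountID  INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
    Balance    REAL NOT NULL
);
-- Entity instances -> rows (hypothetical sample data)
INSERT INTO Customer VALUES (1, 'Peter');
INSERT INTO Account  VALUES (10, 1, 250.0);
""")

# Relationship instance: the FK value in Account equals the PK value in Customer
linked = conn.execute("""
    SELECT c.Name, a.AccountID
    FROM Account AS a
    INNER JOIN Customer AS c ON a.CustomerID = c.CustomerID
""").fetchall()
print(linked)
```

The single joined row exists only because the value `1` stored in `Account.CustomerID` matches the primary key value `1` in `Customer`.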

## Representing Entities and Attributes

* Key (Identifier): Fully identifies an instance
* Partial Key: Partially identifies an instance (together with other attributes, it forms a Key)
* Attributes: Can be mandatory, optional, derived (for an employee table, the number of years an employee has worked can be derived from the start date), multivalued (e.g. skills), or composite (e.g. a name that consists of a first name and a last name)

When converting a conceptual design to a logical one, composite attributes become individual attributes, multi-valued attributes become a new table, many-to-many relationships are resolved via a new table, and foreign keys are added at the crow's foot end of relationships.

When converting to a physical design, determine the data type of each attribute.

Suppose we have an employee table with role as one of its attributes, and each employee can have several roles. Role is therefore a multi-valued attribute, so we create a new StaffRole table in which the employeeid together with the role forms the primary key.
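
A minimal sketch of the StaffRole idea, using Python's `sqlite3` module. The column names, the sample employee, and the role values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE Employee (
    EmployeeID INTEGER PRIMARY KEY,
    FirstName  TEXT NOT NULL,   -- composite attribute "name" split into
    LastName   TEXT NOT NULL    -- individual attributes
);
-- Multi-valued attribute "role" moved into its own table
CREATE TABLE StaffRole (
    EmployeeID INTEGER NOT NULL REFERENCES Employee(EmployeeID),
    Role       TEXT NOT NULL,
    PRIMARY KEY (EmployeeID, Role)   -- composite primary key
);
""")
conn.execute("INSERT INTO Employee VALUES (1, 'Ada', 'Lovelace')")
conn.executemany("INSERT INTO StaffRole VALUES (?, ?)",
                 [(1, "Analyst"), (1, "Manager")])

roles = [row[0] for row in conn.execute(
    "SELECT Role FROM StaffRole WHERE EmployeeID = 1 ORDER BY Role")]

# The composite key rejects the same (employee, role) pair a second time
try:
    conn.execute("INSERT INTO StaffRole VALUES (1, 'Analyst')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(roles, duplicate_rejected)
```

One employee can hold several roles, but each (employee, role) combination can appear only once.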

## Business Rules and Relationships

Business rules are assertions that constrain entities. They can be assertions about attributes (a score can never exceed 100) or assertions about entities (a customer sets up at least one account).
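
As a rough illustration, an attribute-level rule such as "a score can never exceed 100" can often be expressed as a `CHECK` constraint. The `Result` table and its columns below are hypothetical. Entity-level rules such as "at least one account per customer" usually need more than a column constraint and are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Result (
        ResultID INTEGER PRIMARY KEY,
        Score    INTEGER NOT NULL CHECK (Score <= 100)  -- the business rule
    )
""")
conn.execute("INSERT INTO Result (Score) VALUES (95)")   # satisfies the rule

try:
    conn.execute("INSERT INTO Result (Score) VALUES (120)")  # violates the rule
    rule_enforced = False
except sqlite3.IntegrityError:
    rule_enforced = True

row_count = conn.execute("SELECT COUNT(*) FROM Result").fetchone()[0]
print(rule_enforced, row_count)
```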

## Keys

Keys or Identifiers are used to identify individual entity instances:
* Primary Key: a column (or set of columns) whose values uniquely identify each instance. No column can be removed from the key without losing uniqueness. (If two columns together form the key for a table, then those two columns plus any other column also identify each row uniquely, but such a superset is not a primary key because it is not minimal.)
* Candidate Key: any minimal set of columns that could serve as the primary key. The primary key is selected from the candidate keys
* Composite Key: a key which is made up of more than one attribute, e.g., for the entity "airline flight" we might use the composite key FlightNumber+FlightDate
* Foreign Key: an attribute whose values refer to the primary key of another table. It is used to link a row to a primary key in that other table

Primary keys are unique, are never null, and do not change their value.

The following figure shows two tables linked through a foreign key.

<img src="img/img17.png" width="400">

Add foreign keys at the crow's foot end of relationships. Every CustomerID in Account must be present in Customer; this is called referential integrity.

<img src="img/img18.png" width="400">

"ON DELETE RESTRICT" ensures that we cannot delete a customer from the Customer table while rows in the Account table still reference that customer. "ON UPDATE CASCADE" ensures that any change made to a CustomerID in the Customer table is propagated to the Account table.

If there is no customer with CustomerID 5, we cannot add a new row to the Account table whose foreign key equals 5.
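
The three behaviours described above can be tried out with Python's `sqlite3` module. This is a sketch under assumptions: the table definitions and sample rows are made up, and SQLite in particular enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for FK enforcement in SQLite
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE Account (
    AccountID  INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL,
    FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
        ON DELETE RESTRICT
        ON UPDATE CASCADE
);
INSERT INTO Customer VALUES (1, 'Peter');
INSERT INTO Account  VALUES (10, 1);
""")

def is_rejected(sql):
    """Run a statement and report whether an integrity constraint blocked it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Referential integrity: there is no customer 5, so this row is refused
orphan_rejected = is_rejected("INSERT INTO Account VALUES (11, 5)")

# ON DELETE RESTRICT: customer 1 still has an account, so the delete is refused
delete_rejected = is_rejected("DELETE FROM Customer WHERE CustomerID = 1")

# ON UPDATE CASCADE: changing the key in Customer also changes it in Account
conn.execute("UPDATE Customer SET CustomerID = 2 WHERE CustomerID = 1")
account_owner = conn.execute(
    "SELECT CustomerID FROM Account WHERE AccountID = 10").fetchone()[0]

print(orphan_rejected, delete_rejected, account_owner)
```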

## Relationship Degree

<img src="img/img19.png" width="400">

For the unary case, one example is an employee table with a boss attribute whose value is again an employeeid.

<img src="img/img20.png" width="400">
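
A unary relationship like the employee/boss example can be sketched as a table whose foreign key points back at its own primary key. The `BossID` column name and the sample employees are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (
    EmployeeID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    BossID     INTEGER REFERENCES Employee(EmployeeID)  -- NULL = no boss
);
INSERT INTO Employee VALUES (1, 'Alice', NULL);
INSERT INTO Employee VALUES (2, 'Bob',   1);
INSERT INTO Employee VALUES (3, 'Carol', 1);
""")

# Self-join: the same table plays both the "employee" and the "boss" role
pairs = conn.execute("""
    SELECT e.Name, b.Name
    FROM Employee AS e
    INNER JOIN Employee AS b ON e.BossID = b.EmployeeID
    ORDER BY e.Name
""").fetchall()
print(pairs)
```

Alice has no boss, so she appears only on the boss side of the result.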

## Strong and Weak Entities

Strong Entity: entity 2's identity (primary key) is independent of the identity of other entities (dotted line).

Weak Entity: entity 2's identity depends on (includes) the identity of entity 1 (solid line).

# SQL

<img src="img/img21.png" width="300">

<img src="img/img22.png" width="300">

<img src="img/img23.png" width="300">

<img src="img/img24.png" width="300">

`HAVING` plays the same role as `WHERE`, but it filters groups (after `GROUP BY`) rather than individual rows.

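To make the difference between filtering rows and filtering groups concrete, here is a small sketch with Python's `sqlite3` module. The `Account` table and its balances are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (
    AccountID  INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL,
    Balance    REAL NOT NULL
);
INSERT INTO Account VALUES (1, 1, 100), (2, 1, 300), (3, 2, 50), (4, 3, 500);
""")

# WHERE drops individual rows before grouping (account 3 is removed);
# HAVING drops whole groups after grouping (customer 3 has only one row left)
big_customers = conn.execute("""
    SELECT CustomerID, COUNT(*), SUM(Balance)
    FROM Account
    WHERE Balance >= 100
    GROUP BY CustomerID
    HAVING COUNT(*) >= 2
""").fetchall()
print(big_customers)
```

Only customer 1 remains: two accounts pass the row filter, and the group of two passes the group filter.
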
## Join

If data about an entity is spread across two tables, join them.

Inner join: Join rows where FK value = PK value

<img src="img/img25.png" width="500">

Natural join: Gives the same result as an inner join, but requires the PK and FK columns to have the same name, because it matches on every pair of columns that share a name. If the two tables happen to share more than one column name, all of those columns are compared and rows can be silently dropped. Therefore, prefer an inner join with an explicit condition.

<img src="img/img26.png" width="500">
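
The natural join pitfall can be reproduced with a small sketch. The assumption here is that both tables happen to have a `Name` column (customer name versus account name). The tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE Account (
    AccountID  INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL,
    Name       TEXT NOT NULL   -- account name, unrelated to Customer.Name
);
INSERT INTO Customer VALUES (1, 'Peter'), (2, 'James');
INSERT INTO Account  VALUES (10, 1, 'Savings'), (11, 2, 'Cheque');
""")

# Explicit condition: only the FK/PK pair is compared
inner_count = conn.execute("""
    SELECT COUNT(*)
    FROM Customer
    INNER JOIN Account ON Customer.CustomerID = Account.CustomerID
""").fetchone()[0]

# NATURAL JOIN compares CustomerID AND Name, so no row matches
natural_count = conn.execute(
    "SELECT COUNT(*) FROM Customer NATURAL JOIN Account").fetchone()[0]

print(inner_count, natural_count)
```

The natural join quietly returns nothing because no customer name equals an account name.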

Outer join: Can be left or right

<img src="img/img27.png" width="500">

For Customer RIGHT OUTER JOIN Account, the customer Akin, who has no account, is omitted from the result.

Let's say you have a `Students` table and a `Lockers` table. In SQL, the first table you specify in a join, `Students`, is the **LEFT** table, and the second one, `Lockers`, is the **RIGHT** table. Each student can be assigned to a locker, so there is a `LockerNumber` column in the `Students` table. More than one student could potentially share a single locker, but especially at the beginning of the school year, you may have some incoming students without lockers and some lockers that have no students assigned.

For the sake of this example, let's say you have **100 students**, 70 of whom have lockers. You have a total of **50 lockers**, 40 of which have at least one student and 10 of which have no student.

**INNER JOIN** is equivalent to "show me all students with lockers".
Any students without lockers, or any lockers without students are missing.
**Returns 70 rows**

**LEFT OUTER JOIN** would be "show me all students, with their corresponding locker if they have one". 
This might be a general student list, or could be used to identify students with no locker. 
**Returns 100 rows**

**RIGHT OUTER JOIN** would be "show me all lockers, and the students assigned to them if there are any". 
This could be used to identify lockers that have no students assigned, or lockers that have too many students. 
**Returns 80 rows** (list of 70 students in the 40 lockers, plus the 10 lockers with no student)
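
The row counts above can be checked with a small sketch. The way the 70 students are spread over 40 lockers is an arbitrary assumption, as is the `StudentID` column. Because older SQLite versions lack `RIGHT OUTER JOIN`, the right join is emulated here by swapping the tables in a `LEFT OUTER JOIN`, which yields the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Lockers (LockerNumber INTEGER PRIMARY KEY);
CREATE TABLE Students (
    StudentID    INTEGER PRIMARY KEY,
    LockerNumber INTEGER REFERENCES Lockers(LockerNumber)  -- NULL = no locker
);
""")

# 50 lockers; students 1..70 share lockers 1..40, students 71..100 have none
conn.executemany("INSERT INTO Lockers VALUES (?)", [(n,) for n in range(1, 51)])
students = [(s, (s - 1) % 40 + 1 if s <= 70 else None) for s in range(1, 101)]
conn.executemany("INSERT INTO Students VALUES (?, ?)", students)

def count_rows(from_clause):
    """Count the rows produced by the given FROM clause."""
    return conn.execute("SELECT COUNT(*) FROM " + from_clause).fetchone()[0]

on = " ON s.LockerNumber = l.LockerNumber"
inner_rows = count_rows("Students AS s INNER JOIN Lockers AS l" + on)
left_rows = count_rows("Students AS s LEFT OUTER JOIN Lockers AS l" + on)
# Students RIGHT OUTER JOIN Lockers, written as Lockers LEFT OUTER JOIN Students
right_rows = count_rows("Lockers AS l LEFT OUTER JOIN Students AS s" + on)

print(inner_rows, left_rows, right_rows)
```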

If there is no join condition, the result is the Cartesian product (every row in Customer combined with every row in Account). Never do this!

<img src="img/img28.png" width="500">
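
A quick way to see how fast a Cartesian product grows is to count rows with and without the join condition. The tables and the 3 customers and 4 accounts below are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY);
CREATE TABLE Account (
    AccountID  INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL
);
INSERT INTO Customer VALUES (1), (2), (3);
INSERT INTO Account  VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")

# No join condition: every customer is paired with every account
cartesian_rows = conn.execute(
    "SELECT COUNT(*) FROM Customer CROSS JOIN Account").fetchone()[0]

# With the FK = PK condition only the meaningful pairs remain
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM Customer
    INNER JOIN Account ON Customer.CustomerID = Account.CustomerID
""").fetchone()[0]

print(cartesian_rows, joined_rows)
```

With 3 customers and 4 accounts the product already has 3 × 4 = 12 rows, of which only 4 describe real customer/account pairs.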