# Database Conceptual Design

## database schemas

__External:__ 
- An external schema is the view of the database from the perspective of a particular user or application. It defines how the data is presented to the user and what data is visible to them. External schemas allow users to access and manipulate data without having to be concerned with how the data is stored or organized internally.
- Example: a data mart is a type of external schema where a specific subset of the data (e.g., HR) is extracted from the main warehouse and summarized for a specific group of users.

__Conceptual:__ 
- A conceptual schema is a high-level description of the entire database. It defines the overall structure of the database, including the relationships between different data elements. The conceptual schema is designed to be independent of any specific database management system and is often represented using an entity-relationship (ER) diagram.
- Example: a data warehouse is a type of conceptual schema where all data is centralized in one source and logically organized according to an ER model.

__Physical:__ 
- A physical schema describes how the data is physically stored in the database. It includes details such as the data storage format, the indexing strategy, and the physical location of the data on the storage device. The physical schema is closely tied to the specific database management system being used.

![pic](https://afteracademy.com/images/what-is-a-schema-three-levels-of-schema-84a896db453efdac.jpg)
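
The three levels can be illustrated with a small sketch using Python's built-in `sqlite3` module. The table, index, and view names and the sample rows are assumptions made up for this example: the table stands in for the conceptual level, the index for a physical-level detail, and the view for an external schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the logical structure (a hypothetical employee table).
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " department TEXT NOT NULL,"
    " salary REAL NOT NULL)"
)

# Physical level: an index is a storage detail that changes no query result.
conn.execute("CREATE INDEX idx_employee_department ON employee (department)")

# External level: a view that exposes only what one group of users should see.
conn.execute(
    "CREATE VIEW hr_directory AS"
    " SELECT employee_id, name FROM employee WHERE department = 'HR'"
)

conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Amira", "HR", 5000.0), (2, "Omar", "Accounting", 6000.0)],
)

hr_rows = conn.execute("SELECT * FROM hr_directory").fetchall()
print(hr_rows)  # [(1, 'Amira')]
```

Users of `hr_directory` never see the salary column or the accounting rows, and the index could be dropped without changing what the view returns.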

## data mining vs data analysis:

__data mining examples:__ 
- Clustering
- association rule mining
- anomaly detection

__data analysis examples:__
- Regression analysis
- hypothesis testing
- time series analysis

## centralized vs distributed database

### centralized database environment

- 1 tier (mainframe):
    - the database and the application client are both hosted on the same server
    - cons: single point of failure (application and database), limited scalability, slow under high traffic
- 2 tier (client-server):
    - the application is installed on the user's device and connects to the server hosting the database.
    - pros: faster, since the application's processing is done on the user's device
    - cons: single point of failure (database), the application needs to be updated on each user's device, limited scalability
- 3 tier architecture:
    - a small app or web app connects to an application server, which connects to the database server
    - cons: single points of failure for the application (app server) and for the database (database server)
- n tier architecture:
    - add multiple app servers that connect to the database
    - pros: removes the application single point of failure (redundant app servers), allows separate app servers for different applications (HR, accounting)
    
### distributed database environment:
removes the single point of failure for the database, at the cost of more storage

- replication: copy part or all of the database to another server; the two databases update each other at a specified frequency
    - full replica: copy the complete database to another server
    - partial replica: copy part of the database to another server

- fragmentation: split the database into fragments, each stored on a different server
    - a fragment can be a group of rows, a group of columns, or a hybrid of both
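
The difference between row-wise and column-wise fragments, and a full replica, can be sketched in plain Python. The sample rows and the helper names `horizontal_fragments` and `vertical_fragment` are hypothetical, and each list stands in for data that would live on a separate server.

```python
# Hypothetical employee rows; each dict is one row of the full table.
employees = [
    {"employee_id": 1, "name": "Amira", "country": "EG", "salary": 5000},
    {"employee_id": 2, "name": "Omar", "country": "DE", "salary": 6000},
    {"employee_id": 3, "name": "Lina", "country": "EG", "salary": 5500},
]


def horizontal_fragments(rows, column):
    """Group whole rows by the value of one column (row-wise split)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragment(rows, key, columns):
    """Keep the key plus a subset of columns (column-wise split)."""
    return [{name: row[name] for name in (key, *columns)} for row in rows]


by_country = horizontal_fragments(employees, "country")
payroll = vertical_fragment(employees, "employee_id", ["salary"])
full_replica = [dict(row) for row in employees]  # a full replica is a complete copy

print(sorted(by_country))  # ['DE', 'EG']
print(payroll[0])          # {'employee_id': 1, 'salary': 5000}
```

The vertical fragment keeps the key column so that the fragments can be joined back into the original rows.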

## Database modelling
### Entity Mapping: (mapping table's columns)
for each table check the attributes of each column
- __unique attributes:__ set as primary keys.
    - ex: set employee_id as the primary key
    

- __Multi-value attributes:__ create a new table that holds the main table's primary key as a foreign key; that foreign key and the attribute together form the new table's primary key.
    - ex: for multiple phone numbers, create a new table where the employee phone number and employee_id together are the primary key, and employee_id references the main table as a foreign key.


- __Composite attributes:__ split sub attributes (use each sub attribute as an attribute.)
    - ex: address where it contains country, city, street, apartment.


- __Derived attributes:__ don’t add the derived attributes.
    - ex: the sum of two columns is derived and shouldn't be stored; it can be calculated when needed


- __Weak entities:__ add the owner entity's primary key as a foreign key to the weak entity.
    - ex: an order entity in an e-commerce database can be a weak entity. An order might be identified by the combination of its order number and the customer who placed it. In this case, the order entity depends on the customer entity to provide a unique identifier.
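
The entity mapping rules above can be sketched as table definitions. This is a minimal illustration using Python's built-in `sqlite3` module; the table names `employee_phone` and `customer_order` and all column names other than `employee_id` are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- Unique attribute becomes the primary key; the composite address is split
-- into its sub-attributes; no derived column is stored.
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    country     TEXT,
    city        TEXT,
    street      TEXT,
    apartment   TEXT
);

-- Multi-value attribute: one row per (employee, phone number) pair.
CREATE TABLE employee_phone (
    employee_id  INTEGER NOT NULL REFERENCES employee (employee_id),
    phone_number TEXT NOT NULL,
    PRIMARY KEY (employee_id, phone_number)
);

-- Weak entity: an order is identified by its owner plus its own order number.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE customer_order (
    customer_id  INTEGER NOT NULL REFERENCES customer (customer_id),
    order_number INTEGER NOT NULL,
    PRIMARY KEY (customer_id, order_number)
);
""")

conn.execute("INSERT INTO employee (employee_id, name) VALUES (1, 'Amira')")
conn.executemany(
    "INSERT INTO employee_phone VALUES (?, ?)",
    [(1, "0101115422"), (1, "0101111245")],
)
phones = conn.execute(
    "SELECT phone_number FROM employee_phone"
    " WHERE employee_id = 1 ORDER BY phone_number"
).fetchall()
print(phones)  # [('0101111245',), ('0101115422',)]
```

With foreign keys enabled, inserting a phone number for an `employee_id` that does not exist raises `sqlite3.IntegrityError`.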

### Relationship Mapping:


- __One to many:__ add a foreign key to the many side.
    - ex: a customer can place multiple orders, but each order is made by only one customer (one (customer table) to many (orders table))
    - to know which customer placed a certain order, the orders table has a foreign key referencing the customer table


- __Many to many:__ add both foreign keys to a new table.
    - example: a student can enroll in many courses, and a course can have many students enrolled in it.
    - creating a table with every student course pair (student_id, course_id) can help identify all students enrolled in a certain course and all courses a student is enrolled in


- __One to one:__
    - the foreign key can be placed on either side, but it is generally preferred to put it in the less frequently accessed table, since an extra column can slow down queries. Here are some general rules to apply when mapping a one-to-one relationship:
        - May/ Must: add foreign key to the must side.
        - May /May: add foreign key to either side.
        - Must/Must: merge the two tables.
    - ex: a person may have only one passport, and a passport can be issued to only one person.
    - in the context of an airport database, both person_id and passport_id are mandatory, so the two tables should be merged


- __Ternary relationship:__ create a new table with all 3 primary keys as foreign keys. (comparable to a fact table in a warehouse)
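
The one-to-many and many-to-many rules above can be sketched with Python's built-in `sqlite3` module. The table names, the non-key columns, and the sample rows are assumptions for illustration; only `student_id` and `course_id` come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One to many: the foreign key lives on the "many" side (orders).
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
);

CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Many to many: a junction table holding one row per (student, course) pair.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    course_id  INTEGER NOT NULL REFERENCES course (course_id),
    PRIMARY KEY (student_id, course_id)
);

INSERT INTO student VALUES (1, 'Amira'), (2, 'Omar');
INSERT INTO course  VALUES (10, 'Databases'), (20, 'Statistics');
INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# All students enrolled in one course.
students_in_databases = conn.execute(
    "SELECT s.name FROM enrollment e"
    " JOIN student s ON s.student_id = e.student_id"
    " WHERE e.course_id = 10 ORDER BY s.name"
).fetchall()

# All courses one student is enrolled in.
courses_for_amira = conn.execute(
    "SELECT c.title FROM enrollment e"
    " JOIN course c ON c.course_id = e.course_id"
    " WHERE e.student_id = 1 ORDER BY c.title"
).fetchall()

print(students_in_databases)  # [('Amira',), ('Omar',)]
print(courses_for_amira)      # [('Databases',), ('Statistics',)]
```

A ternary relationship would follow the same pattern as `enrollment`, with three foreign keys in the junction table instead of two.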

### Normalization

#### Zero Normal form
- pick a primary key for the table(s)

#### First Normal form

1. split composite attribute
    - ex: split the address column (composite) into country, city, and street columns


2. remove multi-value attributes and repeating groups
    1. multi-value attributes example: employee with multiple phone numbers comma separated in the phone column.
    2. repeating groups example: employee with multiple phone numbers where each phone number is in a column.
    - solution: create a new phone numbers table where employee_id (a foreign key) and the phone number together form the primary key
    
    multi-value attribute example:

    |employee_id|phone number|
    |-----------|------------|
    |1|0101115422,0101111245|
    |2|0101111245,0103333313|

    repeating groups example:

    |employee_id|phone number1|phone number2|
    |-----------|-------------|-------------|
    |1|0101115422|0101111245|
    |2|0101111245|0103333313|

    solution example:

    |employee_id|phone number|
    |-----------|------------|
    |1|0101115422|
    |1|0101111245|
    |2|0101111245|
    |2|0103333313|
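
The step from the multi-value table to the solution table can be sketched in a few lines of Python. The function name `to_first_normal_form` is hypothetical; the rows are copied from the multi-value example above.

```python
# Rows copied from the multi-value example table: (employee_id, phone numbers).
unnormalized = [
    (1, "0101115422,0101111245"),
    (2, "0101111245,0103333313"),
]


def to_first_normal_form(rows):
    """Return one (employee_id, phone_number) pair per phone number."""
    pairs = []
    for employee_id, phones in rows:
        for phone in phones.split(","):
            pairs.append((employee_id, phone.strip()))
    return pairs


employee_phones = to_first_normal_form(unnormalized)
for pair in employee_phones:
    print(pair)
```

Each resulting pair is unique, so (employee_id, phone number) can serve as the composite primary key of the new table.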

#### Second Normal form
- identify any partial dependencies in the table and separate them into their own tables. A partial dependency occurs when a non-key attribute depends on only part of a composite primary key.

    - example: assume a table keyed on (book_id, author_id) with the columns book_name, author_name, and author_address. author_name and author_address depend only on author_id, not on the whole key, so they should be moved to a new table (author table).
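
A partial dependency can be probed on sample data. The sketch below is only an illustration: the composite key (`book_id`, `author_id`), the sample rows, and the helper `determines` are assumptions, not part of the notes. Sample rows can refute a dependency but cannot prove that it holds in general.

```python
def determines(rows, lhs, rhs):
    """True if, in these rows, equal values of `lhs` always come with equal values of `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows keyed on (book_id, author_id).
book_authors = [
    {"book_id": 1, "author_id": 7, "book_name": "SQL Basics", "author_name": "Lina"},
    {"book_id": 2, "author_id": 7, "book_name": "Indexing", "author_name": "Lina"},
    {"book_id": 2, "author_id": 8, "book_name": "Indexing", "author_name": "Omar"},
]

# author_name is fixed by author_id alone, which is only part of the key.
partial = determines(book_authors, ["author_id"], ["author_name"])
# book_id alone does not fix author_name (book 2 has two authors).
not_by_book = determines(book_authors, ["book_id"], ["author_name"])

# Second normal form: move the partially dependent columns to their own table.
authors = sorted({(r["author_id"], r["author_name"]) for r in book_authors})

print(partial, not_by_book)  # True False
print(authors)               # [(7, 'Lina'), (8, 'Omar')]
```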

#### Third Normal form

- identify any transitive dependencies in the table and separate them into their own tables. A transitive dependency occurs when a non-key attribute is dependent on another non-key attribute, rather than on the primary key.

    |Order ID|Item ID|Item Name|Item Price|Quantity|
    |--|--|--|--|--|
    |1|1001|Widget A|10.00|3|
    |1|1002|Widget B|15.00|2|
    |2|1003|Widget C|20.00|1|
    |2|1004|Widget D|25.00|4|

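    In this table, the primary key is a composite key made up of "Order ID" and "Item Name". "Item Price" depends only on the "Item Name" part of the key, not on the entire key. The supplier attributes ("Supplier Name", "Supplier Address", and "Supplier Phone"), which are not shown in the table above, are the transitive case: the address and phone depend on the supplier name, a non-key attribute, rather than on the primary key.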
    
- solution: separate into the following tables
        
    - Items Table:
        |Item Name|Item Price|
        |-|-|
        |Widget A|10.00|
        |Widget B|15.00|
        |Widget C|20.00|
        |Widget D|25.00|
    

    - Suppliers Table:
        |Supplier Name|Supplier Address|Supplier Phone|
        |-|-|-|
        |Supplier X|123 Main St|555-1234|
        |Supplier Y|456 Elm St|555-5678|
        |Supplier Z|789 Oak St|555-9012|
    

    - Order Items Table:
        |Order ID|Item Name|Quantity|
        |-|-|-|
        |1|Widget A|3|
        |1|Widget B|2|
        |2|Widget C|1|
        |2|Widget D|4|
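
The split into an items table and an order items table can be sketched in Python. The rows are copied from the order table above; the variable names and the helper `line_total` are hypothetical, and the supplier table is left out because the notes do not say which supplier provides which item.

```python
# Rows copied from the order table: (order_id, item_id, item_name, item_price, quantity).
order_rows = [
    (1, 1001, "Widget A", 10.00, 3),
    (1, 1002, "Widget B", 15.00, 2),
    (2, 1003, "Widget C", 20.00, 1),
    (2, 1004, "Widget D", 25.00, 4),
]

# Items table: one row per item, keyed on the item name as in the solution tables.
items = {}
for _, _, item_name, item_price, _ in order_rows:
    items[item_name] = item_price

# Order items table: only the key columns and the quantity remain.
order_items = [
    (order_id, item_name, quantity)
    for order_id, _, item_name, _, quantity in order_rows
]


def line_total(order_id, item_name):
    """Re-join the two tables to recover price * quantity for one order line."""
    for oid, name, quantity in order_items:
        if (oid, name) == (order_id, item_name):
            return items[name] * quantity
    raise KeyError((order_id, item_name))


print(line_total(1, "Widget A"))  # 30.0
```

Because each price is stored once in `items`, changing the price of an item means updating a single row instead of every order line that mentions it.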

