### **Relational databases**
Many tasks in data science involve extracting and manipulating data from a database. Files and spreadsheets are commonly used for small, static data sets, but large data sets are often shared by many users in different geographic regions, which presents many challenges. Complex data sets are usually managed in database systems, for several reasons:

***Performance*** - Databases optimize data organization on storage media for fast query processing.

***Security*** - Databases restrict data access to authorized users and can encrypt data to limit the damage of a security breach.

***Concurrency*** - Databases manage conflicts that occur when concurrent users update the same data.

***Recovery*** - Databases restore the database to a consistent state after system failures.

Most leading database systems are relational. A **relational database** organizes data in rows and columns of tables. Tables are related with special columns, called keys, and queried with Structured Query Language (SQL). Ex: The figure below illustrates two tables in a relational database.

##### **Relational Database Example**

**Country**

| CountryCode | CountryName | SurfaceArea | ContinentName |
|---|---|---|---|
| JPN | Japan | 377829 | Asia |
| ITA | Italy | 301316 | Europe |
| ECU | Ecuador | 283561 | South America |

---

**City**

| CityName | CountryCode | Population |
|---|---|---|
| Osaka | JPN | 2595674 |
| Kyoto | JPN | 1461974 |
| Rome | ITA | 2643581 |
| Verona | ITA | 255268 |
| Naples | ITA | 1002619 |
| Quito | ECU | 1573458 |

---
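The two tables above can be loaded into a small relational database and related through their shared CountryCode column. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database; the choice of SQLite, the column types, and the join query are illustrative assumptions rather than part of the original example.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Column types are assumptions; the source tables only give names and values.
conn.execute(
    "CREATE TABLE Country (CountryCode TEXT, CountryName TEXT, "
    "SurfaceArea INTEGER, ContinentName TEXT)"
)
conn.execute(
    "CREATE TABLE City (CityName TEXT, CountryCode TEXT, Population INTEGER)"
)

conn.executemany("INSERT INTO Country VALUES (?, ?, ?, ?)", [
    ("JPN", "Japan", 377829, "Asia"),
    ("ITA", "Italy", 301316, "Europe"),
    ("ECU", "Ecuador", 283561, "South America"),
])
conn.executemany("INSERT INTO City VALUES (?, ?, ?)", [
    ("Osaka", "JPN", 2595674),
    ("Kyoto", "JPN", 1461974),
    ("Rome", "ITA", 2643581),
    ("Verona", "ITA", 255268),
    ("Naples", "ITA", 1002619),
    ("Quito", "ECU", 1573458),
])

# Relate the two tables through the CountryCode column they share.
rows = conn.execute("""
    SELECT City.CityName, Country.CountryName
    FROM City
    JOIN Country ON City.CountryCode = Country.CountryCode
    WHERE Country.ContinentName = 'Europe'
    ORDER BY City.CityName
""").fetchall()

print(rows)
```

Each city row stores only a country code; the country's name and continent are looked up in the Country table when the query runs.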

A **NoSQL** database stores information in a non-tabular format. The example below uses JSON, a text format that stores objects as attribute-value pairs and arrays. Because each country object nests its own array of cities, the data is not organized as rows and columns and does not fit a relational table without being restructured.


```javascript
// In a Jupyter notebook, run this cell with the %%javascript magic;
// console.log output appears in the browser console.

// Define the JSON data
const jsonData = {
    "Countries": [
        {
            "CountryCode": "JPN",
            "CountryName": "Japan",
            "SurfaceArea": 377829,
            "ContinentName": "Asia",
            "Cities": [
                {"CityName": "Osaka", "Population": 2595674},
                {"CityName": "Kyoto", "Population": 1461974}
            ]
        },
        {
            "CountryCode": "ITA",
            "CountryName": "Italy",
            "SurfaceArea": 301316,
            "ContinentName": "Europe",
            "Cities": [
                {"CityName": "Rome", "Population": 2643581},
                {"CityName": "Verona", "Population": 255268},
                {"CityName": "Naples", "Population": 1002619}
            ]
        },
        {
            "CountryCode": "ECU",
            "CountryName": "Ecuador",
            "SurfaceArea": 283561,
            "ContinentName": "South America",
            "Cities": [
                {"CityName": "Quito", "Population": 1573458}
            ]
        }
    ]
};

// Access and manipulate JSON data
console.log(jsonData.Countries[0].CountryName); // Output: Japan
console.log(jsonData.Countries[1].Cities[0].CityName); // Output: Rome
```

In the example above, jsonData is a JavaScript object written in the same notation that JSON uses. You can access its properties and values using dot notation (jsonData.Countries) or square bracket notation (jsonData["Countries"]). JavaScript provides JSON.parse() to convert a JSON string into a JavaScript object and JSON.stringify() to convert a JavaScript object into a JSON string, which makes JSON data easy to work with in JavaScript applications.
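The same parse and serialize round trip is available outside JavaScript. As a minimal sketch, assuming Python's standard json module, json.loads() plays the role of JSON.parse() and json.dumps() the role of JSON.stringify(); the variable names and the shortened one-country string are illustrative.

```python
import json

# A JSON string holding one country from the example, shortened for brevity.
text = (
    '{"CountryCode": "JPN", "CountryName": "Japan", '
    '"Cities": [{"CityName": "Osaka", "Population": 2595674}]}'
)

# Parse the string into Python dicts and lists.
country = json.loads(text)

# Nested values are reached with square brackets.
first_city = country["Cities"][0]["CityName"]

# Serialize the Python object back into a JSON string.
round_trip = json.dumps(country)

print(first_city)
print(json.loads(round_trip) == country)
```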

#### **Keys**
A **unique column** never contains duplicate values in different rows. Each value appears in one row only, regardless of inserts and updates to the table.

Every table has a primary key. A **primary key** is a column, or group of columns, that identifies individual rows. To ensure that each primary key value corresponds to exactly one row, primary keys must be unique and not NULL.

A **foreign key** is a column, or group of columns, that refers to a primary key. Foreign key values must either be NULL or match some value of the referenced primary key. This rule prevents references to non-existent rows. Unlike primary keys, foreign keys are not necessarily unique.

The data types of the foreign and primary keys must be the same, but the names may be different. A foreign key is usually in a different table than the referenced primary key, but may be in the same table.

In table diagrams, the primary key is marked by a solid circle (●) and appears as the leftmost column of the table. A foreign key is marked by an empty circle (○) and an arrow leading to the referenced primary key.

**Keys Example:**
- Suppose there are two tables, "City" and "Country", whose primary keys are "● CityNumber" and "● CountryCode", respectively. The "City" table also has a column "○ CountryCode", which is a foreign key because it refers to the primary key of the "Country" table.
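The key rules above can be checked against a real database engine. The following is a minimal sketch, assuming Python's built-in sqlite3 module; the helper `rejected` and the inserted rows are hypothetical and exist only for illustration. Two SQLite-specific details are assumptions of this sketch: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, and a TEXT primary key needs an explicit NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE Country (
        CountryCode TEXT NOT NULL PRIMARY KEY,
        CountryName TEXT
    )
""")
conn.execute("""
    CREATE TABLE City (
        CityNumber  INTEGER PRIMARY KEY,
        CityName    TEXT,
        CountryCode TEXT REFERENCES Country (CountryCode)
    )
""")

conn.execute("INSERT INTO Country VALUES ('JPN', 'Japan')")
conn.execute("INSERT INTO City VALUES (1, 'Osaka', 'JPN')")
# Foreign key values may repeat: two cities refer to the same country.
conn.execute("INSERT INTO City VALUES (2, 'Kyoto', 'JPN')")
# A NULL foreign key is allowed (hypothetical row with no known country).
conn.execute("INSERT INTO City VALUES (3, 'Unassigned', NULL)")


def rejected(sql):
    """Return True if the database refuses the statement (hypothetical helper)."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A duplicate primary key value violates uniqueness.
dup_pk = rejected("INSERT INTO Country VALUES ('JPN', 'Nippon')")
# 'PER' matches no Country row, so the foreign key rule is violated.
bad_fk = rejected("INSERT INTO City VALUES (4, 'Lima', 'PER')")

city_count = conn.execute("SELECT COUNT(*) FROM City").fetchone()[0]
print(dup_pk, bad_fk, city_count)
```

Both rejected inserts leave the tables unchanged, so City still holds only the three valid rows.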

#### **Structured Query Language**
**Structured Query Language (SQL)** is a computer language for manipulating and retrieving data. SQL is the standard language for relational databases and is also supported by many non-relational databases. SQL is pronounced either "S-Q-L" or "seekwəl".

SQL statements are built from several language elements, which the example statement below illustrates:

**SELECT CountryName**

**FROM Country**

-- WHERE SurfaceArea > 200000 (this line is a comment and is ignored)

**WHERE SurfaceArea > 300000;**

- A **keyword** is a reserved word with special meaning.
    - SELECT, FROM, and WHERE are keywords.
- A **literal** is a fixed numeric or text value. Text literals are enclosed in single quotes; some systems, such as MySQL, also accept double quotes.
    - 300000 is a numeric literal.
- An **identifier** is a user-defined name for a table, column, database, and so on.
    - CountryName, Country, and SurfaceArea are identifiers.
- A **clause** groups a keyword with identifiers and expressions.
    - In the example, each clause appears on a separate line.
- A **statement** is a complete database action, consisting of one or more clauses and ending with a semicolon.
    - The three clauses, ending with a semicolon, form one statement.


In most database systems, keywords and identifiers are not case-sensitive. Ex: SELECT and select are equivalent.

SQL ignores white space, such as line breaks. Although a statement can be formatted on one line, each clause usually appears on a separate line for clarity.
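The case and white-space rules can be tried out directly. The sketch below assumes Python's built-in sqlite3 module and a cut-down Country table with only the two columns the query needs; it runs the example statement once formatted clause by clause and once on a single lowercase line. The behavior shown is SQLite's, and other database systems may treat identifier case differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (CountryName TEXT, SurfaceArea INTEGER)")
conn.executemany("INSERT INTO Country VALUES (?, ?)", [
    ("Japan", 377829),
    ("Italy", 301316),
    ("Ecuador", 283561),
])

# One clause per line, with the comment line from the example.
formatted = """
    SELECT CountryName
    FROM Country
    -- WHERE SurfaceArea > 200000 (comment, ignored by the database)
    WHERE SurfaceArea > 300000;
"""

# The same statement on one line, with lowercase keywords and identifiers.
one_line = "select countryname from country where surfacearea > 300000;"

a = conn.execute(formatted).fetchall()
b = conn.execute(one_line).fetchall()

print(sorted(a))
print(a == b)
```

Only Japan and Italy exceed the 300000 threshold, and both spellings of the statement return the same rows.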

Some statements create and drop tables and indexes, while others retrieve, insert, update, and delete data. This material emphasizes statements that retrieve data, called **queries**.
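Each kind of statement mentioned above can be seen in a short session. This is a minimal sketch, assuming Python's built-in sqlite3 module; the index name CityByName and the updated population value are made up for illustration and are not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create a table and an index (index name is hypothetical).
conn.execute("CREATE TABLE City (CityName TEXT, Population INTEGER)")
conn.execute("CREATE INDEX CityByName ON City (CityName)")

# Insert rows.
conn.execute("INSERT INTO City VALUES ('Quito', 1573458)")
conn.execute("INSERT INTO City VALUES ('Verona', 255268)")

# Update a row (255300 is an illustrative value, not a real figure).
conn.execute("UPDATE City SET Population = 255300 WHERE CityName = 'Verona'")

# Delete a row.
conn.execute("DELETE FROM City WHERE CityName = 'Quito'")

# Retrieve data: this statement is a query.
remaining = conn.execute("SELECT CityName, Population FROM City").fetchall()
print(remaining)

# Drop the index and the table.
conn.execute("DROP INDEX CityByName")
conn.execute("DROP TABLE City")
```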