<a href="https://colab.research.google.com/github/brendanpshea/database_sql/blob/main/Database_01_Intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Starship SQL: Navigating Data and Databases on the Enterprise

Set your phasers to "SQL" ("sequel") and prepare to beam up to the Starship Enterprise as we embark on an incredible journey through the fascinating realm of data! Just as the Enterprise voyages across vast galaxies, encountering countless civilizations, we'll traverse the data universe, discovering various forms of data and understanding how to work with them. (In other words, we're going to be learning about data with some Star Trek references. I promise you that you don't need to know anything about the show! :))

To truly appreciate this journey, we first need to understand the essence of data itself and the vital role it plays in our decision-making. Data provides us with the information and knowledge that equip us for any challenge that might lie ahead. In our first section, we'll delve into the fundamental definition of data, its relationship to information and knowledge, and why it matters so much in today's world.

As our journey progresses, we'll encounter different types of data, each with its own unique characteristics. You'll learn about the difference between structured and unstructured data, similar to how Captain Kirk must navigate through calm space versus the chaos of a wormhole. We'll see real-world examples of these data types to make these abstract concepts more tangible.

Next, we'll venture into the world of databases, the treasure troves of data that are as crucial to data analysts as the ship's computer is to the Enterprise crew. We'll compare and contrast databases with flat files, helping you understand their different uses and benefits.  From there, we'll dive deeper into the database universe, exploring the differences between relational and non-relational databases.

Our journey doesn't end there. We'll encounter various types of data that we may need to store - Date, Numeric, Alphanumeric, Currency, Text, and more. We'll talk about discrete versus continuous data and discuss categorical or dimension data. We'll also touch upon how to handle more complex data types like images, audio, and video files.

Finally, you'll get a chance to step into the shoes of an Enterprise officer, exploring the Starship Enterprise Crew Data table. By the end of this journey, you'll have a "stellar" start to your study of data.

## What's Up With All the Weird Case Studies?

If you've taken a look ahead in this book, you may have noticed that some of our case studies involve topics that may seem, well, a bit out of this world. Yes, we'll be analyzing data from the Starship Enterprise, cataloguing spells at Hogwarts, managing Tony Stark's Superhero Database, and even organizing the library at Unseen University in Discworld. But before you start wondering if you've picked up a sci-fi or fantasy novel instead of a database textbook, let me explain the method to my madness.

First and foremost, these case studies are intended to be engaging and entertaining. The last thing I want is for you to be bored to tears as you learn SQL or about database design. Learning should be fun, and by embedding real database concepts in fantastical worlds, I hope to make these complex subjects more approachable and enjoyable.

Beyond the fun factor, these case studies offer a broad range of complexities and unique situations that will encourage you to think outside the box and develop problem-solving skills. For example, consider the challenge of categorizing the arcane and chaotic collections of the Unseen University library. Or ponder the logistics of managing a web shop as eccentric as Wednesday's Addams Web Shop. These situations force us to reconsider assumptions, push boundaries, and be creative, all of which are vital skills in the world of databases and beyond.

More concretely, each case study is carefully designed to highlight specific database principles. Tony Stark's Superhero Database, with its extensive data on numbers, strings, and dates, provides a great environment for exploring SQL operations. The vast amount of user-generated content on Goodreads and the myriad relationships among films, actors, and directors on IMDB are perfect for examining SQL aggregate queries and joins. The complexities of the Hogwarts School of Witchcraft and Wizardry and Dunder Mifflin from "The Office" show us why advanced design techniques, like normalization and inheritance models, are necessary in database management.

Moreover, I've also included some real data sets, such as those from Goodreads and IMDB. Working with real-world data is an essential experience for anyone studying databases, and these data sets give you that opportunity while also showcasing how the concepts you're learning apply to real-world problems.

In sum, these case studies are not just flights of fancy. They are carefully chosen and constructed teaching tools that allow us to apply complex concepts in a fun, engaging, and meaningful way. As you travel through each of these fantastical and real worlds, you'll be gaining a strong understanding of databases and their many applications.

I hope you enjoy working through these case studies as much as I've enjoyed preparing them. And please let me know if you have ideas for changes or improvements!

## Background to the Case Study: The Starship Enterprise

The Starship Enterprise is a vessel from the universe of "Star Trek," a popular science fiction franchise that began in the 1960s. It is a futuristic spacecraft from the 23rd century operated by Starfleet, an exploratory, scientific, and military service maintained by the United Federation of Planets.
The Enterprise is a jewel of future human engineering, a starship capable of faster-than-light travel, thanks to its powerful warp drive. It is a vast complex, akin to a floating city in space, complete with a variety of departments, including command, engineering, medical, science, and security.

Hundreds of crew members from various species and backgrounds work together aboard the Enterprise, each contributing their unique skills to the common goal of exploration and discovery. They perform a wide range of tasks, from navigating through uncharted territories and conducting scientific research to negotiating diplomatic relations with alien civilizations and maintaining the ship's systems.

The Enterprise, despite being a fictional construct, serves as a valuable and vivid example of a massive, intricate system highly dependent on data. Just like any complex organization in the real world, the ship and its crew must deal with vast amounts of information to operate efficiently and navigate the challenges they encounter.  Here are a few ways in which data is essential aboard the Enterprise:

1.  Operations and Logistics: Data is crucial for daily operations, from determining the ship's course to managing its resources. Data helps in tracking the inventory of supplies, scheduling maintenance, predicting potential system failures, and optimizing energy usage.

2.  Scientific Research: The Enterprise is equipped with various scientific instruments to collect data, from sensors that scan for life forms to devices that analyze the chemical composition of an alien atmosphere. Scientists on board rely on this data to conduct their research and provide insights that aid in decision-making.

3.  Security and Safety: Data helps keep the ship and its crew safe. Security systems monitor access to sensitive areas, medical data tracks crew health, and sensors around the ship provide alerts about potential threats or hazards.

4.  Diplomatic and Cultural Understanding: As the Enterprise encounters different alien species, data about their languages, cultures, political systems, and histories is crucial for successful communication and diplomacy.

In the case of the Starship Enterprise, as with any large organization, data isn't just a byproduct of activities; it's an essential resource that, when managed and utilized effectively, enables the ship and its crew to achieve their mission of exploration and discovery.

For our case study, we'll delve into how this futuristic starship collects, stores, manages, and uses its data, providing a tangible context to understand the abstract concepts of data analysis. This exploration will help us understand the principles and techniques that are just as applicable in our current age of information, where data drives decisions in fields as diverse as business, healthcare, technology, and beyond.

## Data, Information and Knowledge

Just as the Enterprise relies on its sensors to gather information about its surroundings, we use data to understand our world. So, what exactly is data?

- **Data** can be described as raw, unprocessed, and uninterpreted facts or statistics collected for reference, analysis, or computation. It's like the raw materials you might find on an alien planet - not particularly useful in its original form, but with potential for something more.  

    Consider the Enterprise's sensor readings. The temperature, radiation levels, gravity fluctuations - all these individual readings are data. They are just unprocessed facts, numbers or measurements, lacking context and meaning.

- **Information**, on the other hand, is data that has been processed, organized, structured, or presented in a given context to make it meaningful. If data is the raw material, information is the finished product.

    For instance, interpreting the sensor data and finding out that there's a habitable planet nearby - that's information. It has context, it's meaningful, and it's something that the crew can act upon.

- **Knowledge** is information that has been processed by individuals and includes their personal understanding, experience, skills, or learned facts. It's a more comprehensive understanding of information and its implications.
    
    In our Starship Enterprise analogy, knowledge would be the crew's understanding that the habitable planet could potentially be a new home for a species in need, based on their previous experiences and learned facts.

So, in summary, data is raw and unprocessed, information is processed data that is meaningful, and knowledge is the understanding or insight gained from that information. Each step adds more value and context, transforming meaningless figures into actionable insights. In our data analysis journey, we'll be doing much the same - transforming raw data into valuable knowledge.
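To make the progression concrete, here is a minimal Python sketch. The sensor readings and the "habitable" temperature range are invented for illustration; they are assumptions, not values from the show or from any real dataset.

```python
from statistics import mean

# Data: hypothetical raw sensor readings (surface temperatures in degrees Celsius).
temperature_readings = [14.2, 15.1, 13.8, 16.0, 14.9]

# Information: the readings are processed into a summary that has context.
average_temperature = mean(temperature_readings)

# Knowledge: an assumed rule of thumb stands in for the crew's experience.
HABITABLE_RANGE = (0.0, 40.0)  # illustrative thresholds only
is_habitable = HABITABLE_RANGE[0] <= average_temperature <= HABITABLE_RANGE[1]

print(f"Average surface temperature: {average_temperature:.1f} C")
print(f"Within the assumed habitable range: {is_habitable}")
```

The list of numbers on its own says little. The average is something the crew can read and discuss, and the final check applies what they already know in order to reach a decision.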

## Structured vs Unstructured Data
In the vast expanse of the data universe, we encounter various forms of data. Two of the primary types of data that we come across are structured and unstructured data. Let's delve into what these terms mean, using both Star Trek examples and real-world instances.

**Structured Data** is data that adheres to a specific model – it's organized and formatted in a way that it's easily searchable by simple, straightforward search engine algorithms or other search operations. It typically includes data that resides in relational databases (tables with rows and columns), spreadsheets, and so on.

- Star Trek Example: Consider the ship's crew registry on the Enterprise. It's a structured set of data where each crew member's details (like name, rank, role, species, etc.) are recorded in a specific, predefined format.

- Real-World Example: A customer database of an online store, where details like customer ID, name, email, contact number, and address are stored in an organized manner.

**Unstructured Data** is data that doesn't fit into traditional row-column databases. It's not organized in a pre-defined manner or model, making it more difficult to collect, process, and analyze. Unstructured data includes things like text files, social media posts, emails, blog posts, audio files, video files, etc.

- Star Trek Example: The Captain's log is a perfect example of unstructured data. Captain Kirk's detailed narration of the ship's activities, challenges faced, and decisions made doesn't adhere to a specific, searchable format.

- Real-World Example: The tweets about a particular hashtag on Twitter are unstructured. They consist of a vast variety of data (text, images, links, emojis, etc.), making it challenging to sort and analyze without sophisticated tools.

The importance of differentiating between structured and unstructured data lies in how we analyze them. Structured data can be easily queried using standard SQL tools, while unstructured data requires more advanced techniques such as natural language processing, text analytics, or machine learning algorithms. Understanding these differences is crucial as we venture further into our data exploration journey.
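The contrast can be sketched in a few lines of Python. The registry entries reuse names from this chapter, while the log entry is made up for the example.

```python
# Structured data: every record has the same named fields.
crew_registry = [
    {"name": "James T. Kirk", "rank": "Captain", "species": "Human"},
    {"name": "Spock", "rank": "First Officer", "species": "Vulcan"},
    {"name": "Leonard McCoy", "rank": "Chief Medical Officer", "species": "Human"},
]

# Because the fields are predefined, a precise question is easy to answer.
humans = [member["name"] for member in crew_registry if member["species"] == "Human"]

# Unstructured data: free text with no predefined fields (an invented log entry).
captains_log = (
    "We have entered orbit around an uncharted planet. "
    "Mr. Spock reports unusual readings, and Dr. McCoy is uneasy."
)

# Without fields, a simple program can do little more than search for keywords.
mentions_spock = "spock" in captains_log.lower()

print(humans)
print(mentions_spock)
```

Answering "who is worried in this log entry?" would take far more than a keyword search, which is where the more advanced techniques mentioned above come in.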

## Databases and Flat Files
Imagine the Starship Enterprise's vast computer system. It stores everything from star charts to crew biographies to historical records. This immense storage system, with its well-organized and searchable records, is quite similar to what we call a database in the realm of data.

A **database** is a structured set of data. It organizes data in a way that allows for efficient access, retrieval, and management. Databases can handle large amounts of data and support multiple simultaneous users. They also provide mechanisms to ensure data consistency and security.

Now, imagine if all the Enterprise's data were stored in one massive document, with information just listed one after another. Finding the needed information would be a nightmare! This is akin to a flat file. A **flat file** is a plain text or binary file that contains records without structured relationships. Data in flat files is generally in the form of tables, where each line of the file represents a record and each record contains one or more fields (attributes or properties of the record). A typical example of a flat file is a **spreadsheet**, such as one created in MS Excel or Google Sheets.

The key differences between databases and flat files lie in their structure, complexity, and the operations that can be performed on them. Databases can establish relationships between data elements, enforce data integrity rules, and efficiently handle large volumes of data. On the other hand, flat files are simpler, not designed for performance, and lack the mechanisms to enforce data consistency. However, flat files (like word processing documents or spreadsheets) can be suitable for smaller, less complex datasets.
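As a rough illustration, here is what working with a small flat file can look like in Python. The file is held in memory so the sketch runs on its own, and the helper `find_by_name` is a hypothetical name chosen for this example.

```python
import csv
import io

# A tiny flat file: each line is a record, each comma-separated value is a field.
flat_file = io.StringIO(
    "CrewID,FullName,Position\n"
    "1,James T. Kirk,Captain\n"
    "2,Spock,First Officer\n"
    "3,Leonard McCoy,Chief Medical Officer\n"
)

records = list(csv.DictReader(flat_file))


def find_by_name(rows, full_name):
    """Scan the rows from the top; a flat file offers no index or query engine."""
    for row in rows:
        if row["FullName"] == full_name:
            return row
    return None


spock = find_by_name(records, "Spock")
print(spock["Position"])
```

This works well for three records. With millions of records, several people editing at once, or rules about which values are allowed, everything a database provides would have to be written by hand.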

### Advantages of Databases
Databases provide a number of advantages over flat files:
#### Reduce Redundancy:

**Redundancy** refers to the unnecessary repetition of data. Suppose we're managing crew data on the Starship Enterprise. We could keep all the crew data in one flat file, including their name, rank, department, species, and so forth.

Now, consider the 'department' data. There are several departments on the ship, like Command, Engineering, and Medical, each with multiple crew members. In a flat file, you would have to repeat the department information for every crew member in that department, leading to *data redundancy*.

On the other hand, a database could have separate tables for 'crew members' and 'departments.' Each crew member in the 'crew members' table would then have a department ID that links to the 'departments' table. This setup removes redundancy because each department's information is stored just once in the 'departments' table, not repeated for every crew member.
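The two-table design can be sketched with Python's built-in `sqlite3` module. The table names, column names, and department assignments below are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database
conn.executescript("""
    CREATE TABLE departments (
        department_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL
    );
    CREATE TABLE crew_members (
        crew_id       INTEGER PRIMARY KEY,
        full_name     TEXT NOT NULL,
        department_id INTEGER REFERENCES departments(department_id)
    );
    -- Each department is stored exactly once.
    INSERT INTO departments VALUES (1, 'Command'), (2, 'Engineering'), (3, 'Medical');
    -- Crew members point at a department by its ID (illustrative assignments).
    INSERT INTO crew_members VALUES
        (1, 'James T. Kirk', 1),
        (3, 'Leonard McCoy', 3),
        (5, 'Hikaru Sulu', 1),
        (6, 'Montgomery Scott', 2);
""")

# A join rebuilds the combined view whenever it is needed.
crew_with_departments = conn.execute("""
    SELECT c.full_name, d.name
    FROM crew_members AS c
    JOIN departments AS d ON d.department_id = c.department_id
    ORDER BY c.crew_id
""").fetchall()

for full_name, department in crew_with_departments:
    print(full_name, "-", department)
```

Renaming a department now means changing one row in `departments`, no matter how many crew members belong to it.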

#### Avoid Anomalies:

**Anomalies** are problems that can occur when adding, updating, or deleting data. Staying with the Starship Enterprise example, suppose Chief Medical Officer Dr. McCoy leaves the ship. In a flat file, you would have to find and delete every record containing his data. If you forget one, you end up with an *anomaly* -- Dr. McCoy is both present and not present in your records.  In contrast, a database would store Dr. McCoy's details in one place. Deleting him from the database would involve removing just that single record, thereby avoiding anomalies.
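Here is a small sketch of how such an anomaly can arise, and how a single-record design avoids it. The duty-roster rows and the `crew` table layout are invented for the example.

```python
import sqlite3

# Flat-file style: McCoy's details are repeated on every duty-roster row.
flat_rows = [
    {"name": "Leonard McCoy", "position": "Chief Medical Officer", "shift": "Alpha"},
    {"name": "Leonard McCoy", "position": "Chief Medical Officer", "shift": "Beta"},
    {"name": "Montgomery Scott", "position": "Chief Engineer", "shift": "Alpha"},
]

# A hurried manual edit removes only the first matching row...
del flat_rows[0]

# ...so a stale row survives: McCoy is both gone and still on the roster.
leftover = [row for row in flat_rows if row["name"] == "Leonard McCoy"]

# Database style: each person is stored once, so one DELETE removes him entirely.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crew (crew_id INTEGER PRIMARY KEY, full_name TEXT, position TEXT)"
)
conn.executemany(
    "INSERT INTO crew VALUES (?, ?, ?)",
    [
        (3, "Leonard McCoy", "Chief Medical Officer"),
        (6, "Montgomery Scott", "Chief Engineer"),
    ],
)
conn.execute("DELETE FROM crew WHERE crew_id = ?", (3,))
remaining = conn.execute(
    "SELECT COUNT(*) FROM crew WHERE full_name = 'Leonard McCoy'"
).fetchone()[0]

print(len(leftover), remaining)
```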

#### Maintain Integrity:

**Integrity** refers to the accuracy and consistency of data. In our Starship Enterprise flat file, someone could accidentally enter a typo for a department name (e.g., 'Engeneering' instead of 'Engineering'). This error would lead to inconsistencies in the data.  In a database, however, we could maintain **referential integrity** by linking crew members to departments using department IDs rather than names. Even if someone makes a typo while entering a department's name, every crew member linked to that department would still correctly refer to the department ID.
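The sketch below shows one way a database can enforce this, using SQLite through Python's `sqlite3` module. The table layout and the department ID 99 are assumptions for the example. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute(
    "CREATE TABLE departments (department_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute("""
    CREATE TABLE crew_members (
        crew_id       INTEGER PRIMARY KEY,
        full_name     TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES departments(department_id)
    )
""")
conn.execute("INSERT INTO departments VALUES (2, 'Engineering')")
conn.execute("INSERT INTO crew_members VALUES (6, 'Montgomery Scott', 2)")

# Department 99 does not exist, so the database refuses the new row.
try:
    conn.execute("INSERT INTO crew_members VALUES (7, 'Pavel Chekov', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("Insert rejected:", rejected)
```

A crew member can only be attached to a department that actually exists, so a misspelled or invented department cannot slip in through the crew table.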

#### Independence of Data:

**Independence** is the separation between the data and the applications that use the data. For instance, suppose we decide to add a new piece of data for each Starship Enterprise crew member, like 'date of joining.' In a flat file, this change could affect all applications using this data, because they are not *independent*.  In a database, though, we can add a new column to the crew members table for 'date of joining.' Applications that don't need this data can continue to function without change, demonstrating *data independence*.
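A minimal sketch of this idea, again with `sqlite3`: the function `list_names` plays the part of an existing application, and the column name `date_of_joining` is an assumed spelling of the new field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crew_members (crew_id INTEGER PRIMARY KEY, full_name TEXT, position TEXT)"
)
conn.execute("INSERT INTO crew_members VALUES (1, 'James T. Kirk', 'Captain')")


def list_names(connection):
    """An 'existing application' that asks only for the column it needs."""
    query = "SELECT full_name FROM crew_members ORDER BY crew_id"
    return [row[0] for row in connection.execute(query)]


before = list_names(conn)

# The table gains a new column; the application above is not touched.
conn.execute("ALTER TABLE crew_members ADD COLUMN date_of_joining TEXT")

after = list_names(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(crew_members)")]

print(before == after)
print(columns)
```

The application keeps working because it names the columns it wants. A program that reads a flat file by position, for example "the third value on each line", has no such protection when a field is inserted.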


## Databases vs Flat Files: Use Cases
Both flat files and databases are used in numerous contexts in the real world, each serving their own unique purpose. Let's dive into some examples and see why one might be chosen over the other.

### Flat Files:

1.  Configuration Files: Many software applications use flat files for storing configuration settings that the application will use when it runs. These files don't require complex querying or relationships, so a flat file is suitable.

2.  Data Interchange: Flat files are often used for data interchange between systems (exporting and importing data) because they're simple and virtually all systems can handle these file types.

3.  Simple Data Storage: If you're writing a simple application that just needs to store a list of items (e.g., a to-do list app), a flat file could be a quick and easy solution.

### Databases:

1.  Banking Systems: Banks need to maintain secure, consistent, and fast access to customer data. This includes account details, transaction history, and more. A database system, with its robust data management capabilities, is ideal for this.

2.  Online Retailers: Companies like Amazon use databases to manage vast amounts of customer and product data, and to support complex queries and transactions.

3.  Social Media Platforms: Platforms like Facebook use databases to store user profiles, friendship relationships, posts, comments, and other data. The complex, interrelated nature of this data makes a database system the right tool for the job.

The decision to use a flat file or a database often comes down to the specific requirements of the situation. Here are some key considerations:

1.  Complexity and Volume of Data: As mentioned, databases handle complex, large-scale data better than flat files. If you need to manage significant amounts of interrelated data, a database is generally the way to go.

2.  Performance: Databases are typically faster and more efficient when dealing with large amounts of data. If performance is a concern, a database may be a better choice.

3.  Consistency and Integrity: If you need to ensure that your data remains consistent (no duplication or contradiction), databases offer mechanisms to maintain data integrity.

4.  Security: Databases typically provide more robust security features than flat files.

5.  Cost and Simplicity: On the other hand, if your data needs are simple, a flat file could be easier and more cost-effective to set up and maintain.

By understanding these considerations, you can make informed decisions about whether to use a flat file or a database in a given situation.

## Data Types
Data comes in many forms and types, each with their own characteristics and uses, much like the diverse species, cultures, and technologies the Enterprise encounters in its voyages. Let's explore some of these data types:

1.  Date: This data type is used to represent date and time. It can store year, month, day, hour, minute, and second.

    -   *Star Trek Example:* Starfleet records the stardate for each significant event. The stardate is the date system used in the Star Trek universe.
    -   *Real-World Example:* Airlines use the date data type to store departure and arrival times of flights.
2.  Numeric: Numeric data types are used to store numerical values. They can be further categorized into integers (whole numbers), floating-point numbers (numbers with decimal points), and complex numbers (numbers with real and imaginary parts).

    -   *Star Trek Example:* The warp speed of the Enterprise would be a numeric value, representing the velocity of the spaceship.
    -   *Real-World Example:* Banks use numeric data types to store the balance in each customer's account.
3.  Alphanumeric: Alphanumeric data types consist of both numbers and letters. These are typically used for identifiers, such as product codes or vehicle identification numbers.

    -   *Star Trek Example:* Each Starfleet officer could have an alphanumeric code, consisting of their initials and a numeric identifier (e.g., 'JTK1701' for James T. Kirk).
    -   *Real-World Example:* Vehicle identification numbers (VINs) in a car dealership's database would be alphanumeric.
4.  Currency: The currency data type is used to store monetary values. It helps in accurately storing and calculating financial data.

    -   *Star Trek Example:* Although the Federation doesn't use money, other cultures do. For example, the Ferengi's unit of currency is the Gold-Pressed Latinum, which would be stored as a 'currency' data type in their financial systems.
    -   *Real-World Example:* E-commerce websites use the currency data type to accurately store the price of each item they sell.

5.  Text: Text data types are used for storing letters, words, sentences, or paragraphs. This could include anything from a person's name to the content of a book.

    -   *Star Trek Example:* The Captain's log entries would be stored as 'text' data type. These logs record the events and personal reflections of the ship's captain.
    -   *Real-World Example:* A blog post on a website would be stored as 'text' data, including the title, author, and the content of the post.

6.  Discrete vs. Continuous Data: Discrete data is countable and distinct. Continuous data, on the other hand, can take on any value within a certain range.

    -   *Star Trek Example:* The number of crew members on the Enterprise is a discrete value, while the warp speed of the spaceship is a continuous value - it can be any value within the engine's operational range.
    -   *Real-World Example:* The number of employees in a company is a discrete value, whereas a person's weight is a continuous value that can vary within a range.

7.  Categorical/Dimension: Categorical or dimension data types are used to represent characteristics such as gender, ethnicity, or hair color. They typically have a finite number of categories or groups.

    -   *Star Trek Example:* Species of a Starfleet crew member (Human, Vulcan, Andorian, etc.) would be a categorical data type.
    -   *Real-World Example:* In a survey, the respondent's choice of answers (for example, "very satisfied", "somewhat satisfied", "neutral", "somewhat dissatisfied", and "very dissatisfied") would be stored as categorical data.

8.  Images, Audio, Video: These are complex data types that store non-text information. Images are used to store visual data, audio for sound, and video for moving pictures.

    -   *Star Trek Example:* The visual and auditory records of a First Contact event with a new species would be stored as image and audio data types, respectively.
    -   *Real-World Example:* A music streaming service like Spotify stores songs as audio data, while a platform like YouTube stores content as video data.

Understanding these different types of data is crucial as it impacts how data can be stored, processed, and analyzed to derive meaningful insights.
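One way to see several of these types side by side is to describe a single crew record in Python. The class `CrewRecord`, its field names, and the set `KNOWN_SPECIES` are hypothetical; the values for Kirk follow the examples used in this chapter.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Categorical data: a small, fixed set of allowed values (illustrative subset).
KNOWN_SPECIES = {"Human", "Vulcan", "Andorian"}


@dataclass
class CrewRecord:
    officer_code: str         # alphanumeric identifier
    full_name: str            # text
    date_of_birth: date       # date
    crew_id: int              # numeric, discrete
    service_years: float      # numeric, continuous
    monthly_stipend: Decimal  # currency
    species: str              # categorical


kirk = CrewRecord(
    officer_code="JTK1701",
    full_name="James T. Kirk",
    date_of_birth=date(2233, 3, 22),
    crew_id=1,
    service_years=7.5,
    monthly_stipend=Decimal("6500.00"),
    species="Human",
)

# Binary floating point cannot represent every decimal fraction exactly,
# which is one reason money is usually stored in an exact decimal type.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
decimal_sum_is_exact = (Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))

annual_stipend = kirk.monthly_stipend * 12

print(float_sum_is_exact, decimal_sum_is_exact)
print(annual_stipend, kirk.species in KNOWN_SPECIES)
```

The choice of type determines which operations make sense: dates can be compared and subtracted, currency can be summed exactly, and a categorical value can be checked against its list of allowed values.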

## A Little History of Databases
The history of databases is an exciting journey that mirrors the progress of computer technology itself. (Note: See the glossary at the end of this chapter for more detail on unfamiliar terms).

### 1960s -- Hierarchical and Network Databases:

The database concept emerged in the 1960s with hierarchical and network models. IBM's Information Management System (IMS) is an example of a **hierarchical database**. It was used for NASA's Apollo space program. This database model resembled a tree structure where each record had one parent record and one or more child records. **Network databases**, such as the Integrated Data Store (IDS) developed by Charles Bachman, were similar but allowed each child to have multiple parents, forming a web-like structure. These models allowed for **physical independence** of data (meaning the "content" of the data was conceptually separate from the "physical medium" in which it was stored).

### 1970s -- Relational Databases:

The **relational database model** was proposed by E.F. Codd of IBM in 1970, a revolutionary step that made database manipulation much more accessible. In this model, data is organized in tables with rows and columns, and relationships are established through primary and foreign keys. **SQL (Structured Query Language)** was developed soon afterward and became the standard language for interacting with relational databases. Relational databases have remained the industry standard to this day. The most popular variants include:

1. **Proprietary** enterprise-scale databases like **Oracle Database**, **Microsoft SQL Server**, and **IBM DB2**. These specialize in handling large numbers of concurrent users and vast quantities of data.
2. **Open-source** alternatives like **PostgreSQL** and **MySQL**. These provide much of the same functionality as the previous category, but their source code can be viewed (and altered) by the public.
3. Specialty databases like **SQLite** (used "internally" by many computer applications) and **Microsoft Access** (a "personal database" for individuals or small businesses).

### 1980s -- Object-Oriented Databases:

The 1980s saw the rise of **object-oriented databases**, which stored data as objects, just like in object-oriented programming languages. These databases were designed to handle complex data types, such as multimedia, more effectively.

### 1990s -- Online Processing, Data Warehousing, and OLAP:

The 1990s were marked by the development of **Online Transaction Processing (OLTP)**, which facilitated real-time, reliable transactions. Also, the concepts of **Data Warehousing** and **Online Analytical Processing (OLAP)** arose, enabling the efficient analysis of vast amounts of data for business intelligence.

### Late 1990s to Early 2000s -- NoSQL Databases:

NoSQL databases emerged as an alternative to the relational model, especially for handling large-scale, distributed data. NoSQL databases include various types, such as **document databases**, **key-value stores**, **wide-column stores**, and **graph databases**. They excel in areas where relational databases may fall short, like scalability and handling unstructured data.

### 2010s -- NewSQL and Cloud Databases:

NewSQL databases attempt to combine the ACID (Atomicity, Consistency, Isolation, Durability) properties of SQL databases with the scalability of NoSQL. Additionally, cloud databases became popular with the rise of cloud computing, providing scalable, reliable, and cost-effective database solutions.

### Today:

Currently, we're in an era where databases are more varied and specialized than ever. From in-memory databases to time-series databases, to databases tailored for machine learning -- there's a solution for almost every need. At the same time, databases are becoming more user-friendly, enabling more people to harness the power of data in their work.

## An Overview of Data-Related Roles
Finally, let's explore the various roles people might have when working with data, in both the context of the Starship Enterprise and our real world.

- **Data Engineers** are like the Scottys of the Starship Enterprise. They keep the engines running, but instead of warp drives, they work on data infrastructure. They design, build, and maintain the systems that allow data to be stored, processed, and queried effectively.

    Tasks they might handle include setting up data pipelines to pull in data from various sources (like sensors across the Enterprise), ensuring data is clean and correctly formatted (making sure the sensors are calibrated and working correctly), and optimizing database performance (making sure the computer can quickly pull up whatever data Captain Kirk needs).

    In the real world, data engineers might set up data pipelines from a company's website, app, or other data sources. They might clean and format data, handle database optimization, and manage data storage and retrieval systems.


- **Data Analysts** are the Spocks of the Enterprise. They take the data that the data engineers have prepared and analyze it, looking for patterns, trends, and insights. They might run queries to find out how often the ship has encountered a specific type of anomaly or use statistical analysis to predict when the warp core might need maintenance.

    In the real world, data analysts might analyze sales data to find out which products are most popular, investigate user behavior data to see how people are using an app, or crunch financial data to predict future costs and revenues.

- **Data Stewards** are the keepers of the data. In the context of the Starship Enterprise, this might be a role for someone like Data himself. They are responsible for the data's quality, privacy, and security, making sure the data is accurate, up-to-date, and used responsibly. They might manage data from old systems, ensuring it's properly archived and still accessible if needed.

    In the real world, data stewards might handle tasks like ensuring data complies with privacy regulations, managing access controls to protect data from unauthorized access, and maintaining a data catalog to keep track of what data the company has, where it's stored, and how it can be used.

- **Database Administrators (DBAs)** are the professionals responsible for managing, improving, and ensuring the seamless operation of databases. On the Starship Enterprise, DBAs would be responsible for maintaining the ship's database, ensuring it can handle the queries coming in from crew members, optimizing its performance, and keeping the data safe and secure.

    In real life, DBAs manage databases for businesses, government agencies, and other organizations. They play a critical role in ensuring that databases run efficiently, are secure from unauthorized access, and make data available to users when they need it.

These roles often work together, each contributing their expertise to help the organization effectively use its data. And as data sources and technologies change, all these roles need to adapt, learning how to handle new types of data, using new tools and techniques, and integrating new data sources with the existing data infrastructure.  While the specific tasks each role handles can vary, the overall goal is the same: to turn data into knowledge that can help the organization (whether that's the Starship Enterprise or a real-world company) make better decisions.

## Case Study: Personnel Files

The Enterprise currently stores its personnel data in a "flat" file. Here is a sample:


| CrewID | FullName | Position | Species | DateOfBirth | Is_Active_Duty | LastMedicalCheckup | ServiceYears | MonthlyStipend | HomePlanetLatitude | HomePlanetLongitude |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | James T. Kirk | Captain | Human | 2233-03-22 | True | 2269-11-30 08:00:00 | 7.5 | 6500.00 | 42.907 | -83.697 |
| 2 | Spock | First Officer | Vulcan | 2230-01-06 | True | 2269-12-01 10:30:00 | 8.2 | 6200.00 | -31.952 | 115.861 |
| 3 | Leonard McCoy | Chief Medical Officer | Human | 2227-01-20 | True | 2269-11-30 11:00:00 | 6.9 | 6000.00 | 33.748 | -84.388 |
| 4 | Nyota Uhura | Communications Officer | Homo Sapien | 2239-03-05 | True | 2269-12-01 13:30:00 | 5.3 | 5500.00 | -1.286 | 36.817 |
| 5 | Hikaru Sulu | Helmsman | Human | 2237-06-22 | True | 2269-12-01 15:00:00 | 5.1 | 5400.00 | 34.052 | -118.243 |
| 6 | Montgomery Scott | Chief Engineer | Human | 2222-08-14 | True | 2269-11-30 14:30:00 | 9.7 | 6350.00 | 55.953 | -3.188 |
| 7 | Pavel Chekov | Navigator | Human | 2245-02-19 | True | 2269-12-01 16:00:00 | 3.4 | 5100.00 | 55.755 | 37.617 |
| 8 | Worf | Security Officer | Klingon | 2340-12-09 | False | 2269-12-02 08:00:00 | 2.6 | 5000.00 | 49.282 | -123.121 |
| 9 | Deanna Troi | Counselor | Betazoid | 2336-03-29 | False | 2269-11-30 09:30:00 | 2.1 | 4900.00 | 40.712 | -74.006 |
| 10 | Data | Second Officer | Android | 2336-02-02 | True | 2269-12-02 10:00:00 | 4.2 | 5850.00 | 40.712 | -74.006 |
| 11 | Seven of Nine | Astrometrics Officer | Human-Cyborg | 2348-06-24 | True | 2269-12-03 09:00:00 | 3.7 | 5700.00 | 51.509 | -0.125 |
| 12 | Jadzia Dax | Science Officer | Trill | 2341-10-27 | False | 2269-12-02 13:30:00 | 3.1 | 5500.00 | 47.606 | -122.332 |
| 13 | Kira Nerys | 1st Officer | Bajoran | 2343-02-19 | True | 2269-12-03 15:00:00 | 3.9 | 5600.00 | 35.689 | 139.692 |
| 14 | Tasha Yar | Security Chief | Human Being | 2337-03-03 | False | 2269-12-04 10:00:00 | 2.8 | 5300.00 | 48.856 | 2.352 |
| 15 | Guinan | Bartender | El-Aurian | Unknown | True | 2269-12-04 11:30:00 | Unknown | 5200.00 | -34.928 | 138.601 |
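
To make the column data types concrete, here is a minimal sketch that loads a few of the rows above into an in-memory SQLite table using Python's built-in `sqlite3` module. The table name `crew` and the decision to keep only some of the columns are assumptions made for brevity, not part of the Enterprise's actual system. SQLite has no dedicated Boolean or date storage type, so this sketch stores `Is_Active_Duty` as 1 or 0 and dates as ISO-formatted text, and it records Guinan's "Unknown" values as `NULL`.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table holding a subset of the flat file's columns.
conn.execute("""
    CREATE TABLE crew (
        CrewID         INTEGER PRIMARY KEY,
        FullName       TEXT NOT NULL,
        Species        TEXT,
        DateOfBirth    TEXT,     -- ISO date text, NULL when unknown
        Is_Active_Duty INTEGER,  -- 1 = True, 0 = False
        ServiceYears   REAL      -- NULL when unknown
    )
""")

# A few rows copied from the sample; "Unknown" becomes None (SQL NULL).
rows = [
    (1, "James T. Kirk", "Human", "2233-03-22", 1, 7.5),
    (2, "Spock", "Vulcan", "2230-01-06", 1, 8.2),
    (8, "Worf", "Klingon", "2340-12-09", 0, 2.6),
    (15, "Guinan", "El-Aurian", None, 1, None),
]
conn.executemany("INSERT INTO crew VALUES (?, ?, ?, ?, ?, ?)", rows)
conn.commit()

# Crew members on active duty, in CrewID order.
active = [
    name
    for (name,) in conn.execute(
        "SELECT FullName FROM crew WHERE Is_Active_Duty = 1 ORDER BY CrewID"
    )
]

# Crew members whose date of birth is missing.
unknown_dob = [
    name
    for (name,) in conn.execute(
        "SELECT FullName FROM crew WHERE DateOfBirth IS NULL"
    )
]

print("Active duty:", active)
print("Unknown date of birth:", unknown_dob)
```

Storing a missing value as `NULL` rather than the text "Unknown" keeps each column to a single data type, which is one reason a database can be easier to query than a flat file.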

## Discussion Questions: Personnel File
1.  DataType Discussion: Look at the 'MonthlyStipend' column. Why do you think it is important to store this information as a decimal (or currency) data type rather than an integer or a float?

2.  Table Expansion: If we were to include information about the crew members' educational background, what additional columns might we need to add? What data types would these columns be, and why?

3.  Database vs Flat Files: Given the various roles and responsibilities aboard the Starship Enterprise, why might a database be more useful than a flat file for managing this crew data? What specific advantages does a database offer in this context?

4.  Real-World Application: Can you think of a real-world organization that might benefit from a similar database to manage their personnel? What additional data fields might they require?

5.  Boolean Usage: The 'Is_Active_Duty' column uses a Boolean data type. Why is this an effective choice for representing whether a crew member is on active duty or not?

6.  Datetime Utilization: How might the Starship Enterprise benefit from storing the 'LastMedicalCheckup' as a datetime data type rather than just a date? Provide some potential use cases.

7.  Unstructured Data Discussion: Suppose we wanted to store notes about each crew member's unique abilities or characteristics. How might we incorporate this unstructured data into our table?

8.  Data Normalization: Look at the 'Position' and 'Species' columns. Do you notice any redundancy (repeated values) or anomalies (inconsistencies)? Why might this cause problems?

9.  Data Integrity: Imagine a situation where a crew member's position changes, or they transfer to a different Starship. How can a relational database help maintain the integrity of our data in these situations?

10. Expanding Datatypes: We currently store 'HomePlanetLatitude' and 'HomePlanetLongitude' as separate columns. In more advanced databases, we might use a geographic data type. What advantages could this offer, and can you think of other scenarios where such a datatype would be useful?
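
One possible way to start exploring question 8 is to count how often each 'Species' value appears. The sketch below is only an illustration: it copies the FullName and Species columns from the sample into a hypothetical in-memory SQLite table named `crew` and groups the rows by species. Values that were probably meant to be the same but were typed differently will show up as separate groups.

```python
import sqlite3

# (FullName, Species) pairs copied from the sample flat file.
crew = [
    ("James T. Kirk", "Human"), ("Spock", "Vulcan"),
    ("Leonard McCoy", "Human"), ("Nyota Uhura", "Homo Sapien"),
    ("Hikaru Sulu", "Human"), ("Montgomery Scott", "Human"),
    ("Pavel Chekov", "Human"), ("Worf", "Klingon"),
    ("Deanna Troi", "Betazoid"), ("Data", "Android"),
    ("Seven of Nine", "Human-Cyborg"), ("Jadzia Dax", "Trill"),
    ("Kira Nerys", "Bajoran"), ("Tasha Yar", "Human Being"),
    ("Guinan", "El-Aurian"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crew (FullName TEXT, Species TEXT)")
conn.executemany("INSERT INTO crew VALUES (?, ?)", crew)

# Count the rows for each distinct spelling of Species.
species_counts = dict(
    conn.execute("SELECT Species, COUNT(*) FROM crew GROUP BY Species")
)

# Print the most common values first, then alphabetically.
for species, n in sorted(species_counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{species}: {n}")
```

The same `GROUP BY` approach could be applied to the 'Position' column. Deciding which of the resulting groups really describe the same thing is still a judgment call for the person cleaning the data.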

## Glossary


| Term | Definition |
| --- | --- |
| Data | Raw, unprocessed facts or figures without context, like a list of random numbers. |
| Information | Data that has been processed or organized in a way that allows it to be useful or meaningful, such as a table showing the average temperature of each month. |
| Knowledge | Information that has been processed or interpreted, often involving relationships among multiple pieces of information, like knowing that sales increase when the temperature is warm based on past sales data and weather reports. |
| Structured Data | Data that is organized and formatted in a specific way, often into rows and columns like a table, making it easier to understand and process. |
| Unstructured Data | Data that lacks a specific format or organization, such as a blog post or an unlabeled photograph, which requires effort to process and understand. |
| Database | A structured set of data, stored in a system, that allows for easy access, manipulation, and retrieval, akin to a library with an index system. |
| Flat File | A file containing records that have no structured interrelationship, typically with a single table of data, like a text file containing a list of names. |
| Spreadsheet | A type of "flat file" that stores data in cells, arranged into rows and columns. MS Excel and Google Sheets are two examples. |
| Data Type | A classification identifying the kind of data a particular value is, like categorizing a value as numeric, text, or date/time. |
| Discrete Data | Numeric data that only takes distinct or separate values, typically integers or a finite set of options, like the number of students in a class. |
| Continuous Data | Data that can take any value within a certain range, like the speed of a car that could be any value within the car's capabilities. |
| Categorical Data | Data that can be divided into multiple groups or categories but doesn't have a numerical representation, like different flavors of ice cream. |
| Data Redundancy | The duplication of data in a database or other storage system, like storing the same customer's address in multiple tables. |
| Data Anomaly | An unusual or unexpected piece of data that deviates from the norm or pattern in a dataset, like a customer's name spelled two different ways. |
| Data Integrity | The accuracy, consistency, and reliability of data over its lifecycle, like ensuring a customer's contact information is correct and up-to-date. |
| Data Independence | The ability to modify the schema of a database without affecting the application layer that uses it, like adding a new table or column without breaking the existing application. |
| Relational Database | A type of database that organizes data into tables which can be linked to each other based on relationships, for example, a database with separate tables for 'customers' and 'orders' that are linked by 'customer_id'. |
| Structured Query Language (SQL) | A standardized programming language specifically used for managing and manipulating relational databases, like retrieving data from a table or inserting a new record into a database. |
| Object-Oriented Database | A database that uses the object-oriented model of data representation, storing data in the form of objects as in object-oriented programming languages, such as storing 'customer' objects with their associated properties and methods. |
| Online Transaction Processing (OLTP) | A system that supports the execution of day-to-day transactional tasks, like updating inventory records in a retail database after each sale. |
| Online Analytical Processing (OLAP) | A system designed for complex data analysis, data mining, and business reporting, like analyzing sales data over years to determine trends. |
| NoSQL Database | A non-relational database that allows for storage and retrieval of data not held in tabular relations, suited for large volumes of data or data lacking a clear schema, like storing social media posts or real-time game data. |
| Cloud Database | A database that runs on a cloud computing platform, which allows for scalability and accessibility from anywhere with an internet connection, like a database storing user data for a mobile app. |
| Proprietary Software | Software owned by an individual or a company, often requiring purchase or licensing for use, like Oracle Database, Microsoft SQL Server, or IBM DB2. |
| Open-Source Software | Software that is freely available for use, modification, and distribution, with its source code accessible, like PostgreSQL or SQLite. |
| Data Engineer | A professional who designs, builds, and manages the data infrastructure, preparing data for analytical or operational uses, like creating a pipeline to pull sales data into a warehouse for analysis. |
| Data Analyst | An individual who interprets complex datasets to help make informed decisions, like analyzing user behavior data to guide product development. |
| Data Steward | A role focused on data quality, security, privacy, and lifecycle management, like ensuring compliance with data privacy regulations and ensuring data is used properly in an organization. |
| Database Administrator | A professional responsible for the maintenance and performance of a database, protecting it against threats, and troubleshooting problems, like ensuring a company's databases remain available and performant. |
| Data Warehouse | A large collection of data (from many sources) that is used for analysis and reporting. For example, a retail company might use a data warehouse to track, analyze, and predict customer purchasing behavior over time. |
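
The glossary's relational database example ('customers' and 'orders' linked by 'customer_id') can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3` module; the column names other than `customer_id`, and all of the sample rows, are made up for the example. It also assumes the underlying SQLite build supports foreign keys, which must be switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item        TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES
        (100, 1, 'Tricorder'),
        (101, 1, 'Communicator'),
        (102, 2, 'Phaser');
""")

# Follow the customer_id link to pair each order with its customer's name.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)

# Data integrity: an order pointing at a customer that does not exist is refused.
try:
    conn.execute("INSERT INTO orders VALUES (103, 99, 'Shuttle')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print("Orphan order rejected:", orphan_rejected)
```

Each customer's name is stored once, in `customers`, so renaming a customer means changing a single row rather than every order that mentions them.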