Structured data is data whose elements are addressable for effective analysis. It is organized into a formatted repository, typically a database, and covers any data that can be stored in a SQL database as tables with rows and columns. Such data has relational keys and maps easily onto pre-designed fields. Today it is the most commonly processed kind of data in software development and the simplest to manage. Example: relational data.
Semi-structured data is information that does not reside in a relational database but has some organizational properties that make it easier to analyze. With some processing it can be stored in a relational database (this can be very hard for some kinds of semi-structured data), but semi-structured formats exist to save space. Example: XML data.
Unstructured data is data that is not organized in a predefined manner or does not have a predefined data model, so it is not a good fit for a mainstream relational database. Alternative platforms exist for storing and managing it. It is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications. Examples: Word, PDF, text, media logs.
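The three categories above can be contrasted with a small sketch. It is illustrative only: the `users` table, the XML snippet, and the free-text note are made-up sample data, not taken from any real system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: fixed columns, every value addressable by key and column name.
# The "users" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'London')")
structured_city = conn.execute("SELECT city FROM users WHERE id = 1").fetchone()[0]

# Semi-structured: tags give some organization, but records may omit fields.
xml_doc = (
    "<users>"
    "<user id='1'><name>Ada</name><city>London</city></user>"
    "<user id='2'><name>Bob</name></user>"
    "</users>"
)
root = ET.fromstring(xml_doc)
# findtext returns None when a record has no <city> element.
semi_structured_cities = [user.findtext("city") for user in root.findall("user")]

# Unstructured: free text with no data model; only ad hoc searching is possible.
note = "Ada moved to London last year."
unstructured_has_city = "London" in note
```

The structured query names the exact field it wants, the XML lookup has to tolerate a missing field, and the free text can only be searched, not queried.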
Architectures
Event-driven
Data-driven
Differences
Database (Relational)
Designed to capture and record data (OLTP)
Live, real-time data
Data stored in tables with rows and columns
Data is highly detailed
Flexible schema (how the data is organized)
Data Warehouse (Relational)
Designed for analytical processing (OLAP)
Data is refreshed from source systems and includes both current and historical data
Data is summarized
Rigid schema (how the data is organized)
Key Differences (Relational)
Databases are designed for transactions; data warehouses are designed for analytics and reporting
Database data is fresh and detailed; data warehouse data is refreshed periodically and is summarized
Databases are slow at querying large amounts of data, and such queries can slow down transactional processes; data warehouses don't interfere with transactional processes and are generally faster for analytical queries
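As a rough illustration of the differences listed above, the sketch below keeps detailed rows in a transactional table and periodically rebuilds a summarized table from them. It is a simplification under stated assumptions: the table names `orders` and `daily_sales` and the helper `refresh_daily_sales` are hypothetical, and both tables share one in-memory SQLite database, whereas a real warehouse would normally be a separate system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Transactional (OLTP) side: one detailed row per order, written as it happens.
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, day TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, day, amount) VALUES (?, ?, ?)",
    [
        ("ada", "2024-01-01", 10.0),
        ("bob", "2024-01-01", 5.0),
        ("ada", "2024-01-02", 7.5),
    ],
)

# Analytical (OLAP) side: a summarized table, refreshed from the source.
conn.execute(
    "CREATE TABLE daily_sales (day TEXT PRIMARY KEY, order_count INTEGER, total REAL)"
)


def refresh_daily_sales(conn):
    """Rebuild the summary from the detailed source rows (hypothetical helper)."""
    conn.execute("DELETE FROM daily_sales")
    conn.execute(
        "INSERT INTO daily_sales "
        "SELECT day, COUNT(*), SUM(amount) FROM orders GROUP BY day"
    )
    conn.commit()


refresh_daily_sales(conn)
summary = conn.execute(
    "SELECT day, order_count, total FROM daily_sales ORDER BY day"
).fetchall()
```

Reports read from `daily_sales`, so they never scan the live `orders` table. The cost is that the summary is only as fresh as the last refresh.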
Database vs Data Warehouse
Neither is better than the other
The choice depends on what kind of data you have and what you want to do with it
A company can use just one of these options or all of them
Data Lake
Designed to capture raw data (structured, semi-structured, unstructured)
Made for large amounts of data
Used for ML and AI in its raw state, or for analytics after processing
Its data can be organized and loaded into databases or data warehouses
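A minimal sketch of the data lake idea above, assuming a temporary local directory stands in for the lake (real lakes usually sit on object storage). The file names, the sample records, and the `events` table are all hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# A temporary directory stands in for the lake: raw files land here unchanged.
lake = Path(tempfile.mkdtemp())
(lake / "orders.csv").write_text("id,amount\n1,10.0\n")  # structured
(lake / "events.json").write_text(  # semi-structured
    json.dumps(
        [
            {"username": "ada", "action": "login"},
            {"username": "bob", "action": "logout", "device": "phone"},
        ]
    )
)
(lake / "notes.txt").write_text("Free-form support note, kept as-is.")  # unstructured

raw_files = sorted(p.name for p in lake.iterdir())

# Processing step: pick one raw file, keep the fields of interest,
# and load them into a table for analytics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (username TEXT, action TEXT)")
events = json.loads((lake / "events.json").read_text())
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(e["username"], e["action"]) for e in events],
)
rows = conn.execute(
    "SELECT username, action FROM events ORDER BY username"
).fetchall()
```

The lake accepts all three files without any schema. Structure is imposed only at the point where data is moved into the table.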