# 📊 Understanding Data

## 📌 What is Data?
Data are raw facts that have not been processed to explain their meaning.

---

## 🔹 Types of Data
There are three main types of data:
1. **📑 Structured Data**
2. **📂 Unstructured Data**
3. **📃 Semi-structured Data**

---

## 1️⃣ Structured Data 📑
Structured data is organized and stored in a well-defined format, often using relational databases. It follows a pre-defined data model and is easy to search, retrieve, and analyze.

### ✅ Characteristics of Structured Data:
- Stored in a **tabular format** (rows and columns).
- Clearly defined and organized.
- Typically stored in **SQL databases or spreadsheets such as Excel files**.
- **Rows and columns** follow a shared schema, so each value can be read in the context of its record and its attribute.
- **Relational Database Management Systems (RDBMS)** are used to store and manage structured data in **relational databases**.
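
To make "easy to search, retrieve, and analyze" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `students` table, its columns, and the sample values are assumptions made up for illustration; they do not come from any real dataset.

```python
import sqlite3

# In-memory database; the table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, grade INTEGER)"
)
conn.executemany(
    "INSERT INTO students (id, name, grade) VALUES (?, ?, ?)",
    [(1, "Asha", 88), (2, "Ben", 72), (3, "Chen", 95)],
)

# Because every row follows the same schema, a query is short and exact.
top_students = conn.execute(
    "SELECT name, grade FROM students WHERE grade >= ? ORDER BY grade DESC",
    (85,),
).fetchall()
conn.close()

print(top_students)
```

The pre-defined schema is what lets a single `SELECT` filter and sort the records without any extra parsing.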

### 📌 What is Tabular Format?
Tabular format refers to the organization of data in rows and columns, where:
- **📌 Rows** represent individual records.
- **📌 Columns** define the attributes of the records.
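
The row and column idea can be sketched with Python's standard `csv` module. The small table below is a hypothetical example; the column names and values are invented for illustration.

```python
import csv
import io

# A tiny table written as CSV text; the values are made up for illustration.
raw_table = """id,name,city
1,Asha,Pune
2,Ben,Leeds
"""

reader = csv.DictReader(io.StringIO(raw_table))
columns = reader.fieldnames  # attributes shared by every record
rows = list(reader)          # one dict per record

print(columns)
for row in rows:
    print(row["name"], "lives in", row["city"])
```

Each dictionary in `rows` is one record, and each key in `columns` is one attribute that every record shares.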

---

## 2️⃣ Unstructured Data 📂
Unstructured data does not have a predefined format or model. It is irregular, ambiguous, and harder to analyze compared to structured data.

### ✅ Characteristics of Unstructured Data:
- ❌ No predefined structure.
- ❌ No data model.
- ❌ Data is irregular and ambiguous.
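- ❌ Harder to extract information from than structured data.
- 🔹 An estimated **80% - 90%** of data is unstructured (a commonly cited industry figure).
- 🔹 Includes **text, numbers, audio, video, images, messages, social media posts**.
- 📌 Examples: posts, photos, and videos on platforms such as **Facebook, Instagram, YouTube**.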

### 📊 Example: Surveys
Survey data often mixes structured and unstructured data. For instance:
- Multiple-choice questions produce answers from a fixed set of options. This is structured data.
- Open-ended questions (e.g., *"How does coffee make you feel? Please elaborate."*) produce free-text answers. This is unstructured data.
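
The difference between the two kinds of survey answers can be sketched in Python. The responses and field names (`cups_per_day`, `feeling`) are hypothetical, and the word count is only a crude stand-in for real text analysis.

```python
from collections import Counter

# Hypothetical survey responses: one multiple-choice field, one free-text field.
responses = [
    {"cups_per_day": "1-2", "feeling": "It wakes me up and I feel more focused."},
    {"cups_per_day": "3+", "feeling": "Jittery, honestly, but I love the taste."},
    {"cups_per_day": "1-2", "feeling": "Calm. It is part of my morning routine."},
]

# Multiple-choice answers come from a fixed set, so counting them is trivial.
choice_counts = Counter(r["cups_per_day"] for r in responses)

# Open-ended answers have no fixed shape; even a crude word count needs
# text processing, and it still says little about what people meant.
words = Counter(
    word.strip(".,").lower()
    for r in responses
    for word in r["feeling"].split()
)

print(choice_counts)
print(words.most_common(3))
```

Counting the fixed options gives a direct answer, while the free-text field needs further processing before it supports any conclusion.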

### ⚠️ Challenges of Unstructured Data:
- **🛠 Complex to analyze** compared to structured data.
- Previously, only structured data was extensively used.
- However, with the help of **🤖 Artificial Intelligence (AI)**, unstructured data is now widely utilized.
- 📌 Example: **Face recognition technology used by Google** works on images, which are unstructured data.

### 🗄 Storage of Unstructured Data:
Unstructured data is often stored in a **data lake**.

#### 💾 What is a Data Lake?
A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. Unlike storage systems built for structured data, such as relational databases, a data lake can handle **unstructured, semi-structured, and structured data**.

---

## 3️⃣ Semi-structured Data 📃
Semi-structured data falls between structured and unstructured data. It has elements of both.

### ✅ Characteristics of Semi-structured Data:
- It combines elements of structured and unstructured data.
- Some organization exists (for example, tags or key-value pairs), but it does not fit rigidly into a tabular format.

### 📌 Examples of Semi-structured Data:
- 📧 **Emails** (contain structured metadata like sender and timestamp, but unstructured content in the body).
- 📜 **XML files**.
- 🌐 **Web pages on the World Wide Web (WWW)** (HTML tags give partial structure to free-form content).
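
The email example can be sketched with Python's standard `email` package. The message below is made up for illustration, including the addresses and the text.

```python
from email import message_from_string

# A made-up message; the addresses and text are illustrative only.
raw_email = """From: alice@example.com
To: bob@example.com
Subject: Coffee survey
Date: Mon, 01 Jan 2024 09:30:00 +0000

Hi Bob, the survey results are in. People mostly said coffee helps them focus.
"""

message = message_from_string(raw_email)

# Structured part: headers are key-value pairs with well-known names.
sender = message["From"]
subject = message["Subject"]

# Unstructured part: the body is free text with no fixed fields.
body = message.get_payload()

print(sender, "|", subject)
print(body)
```

The headers can be looked up by name, while the body has to be read or analyzed as plain text.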

### 🔹 What is XML?
XML (**eXtensible Markup Language**) is a markup language that defines rules for encoding documents in a format that is both **human-readable and machine-readable**. It is used to structure data in a way that can be easily shared across different systems.
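
As a small illustration, the sketch below parses an XML snippet with Python's standard `xml.etree.ElementTree` module. The `library` and `book` tags, the `id` attribute, and the values are assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

# A small made-up document; the tag and attribute names are illustrative.
xml_text = """
<library>
  <book id="1">
    <title>Data Basics</title>
    <author>R. Rao</author>
  </book>
  <book id="2">
    <title>Notes on XML</title>
  </book>
</library>
""".strip()

root = ET.fromstring(xml_text)

# Tags give the data structure, yet records need not share the same fields:
# the second book has no <author> element.
books = [
    {
        "id": book.get("id"),
        "title": book.findtext("title"),
        "author": book.findtext("author", default="unknown"),
    }
    for book in root.findall("book")
]

print(books)
```

The missing `<author>` in the second record is what makes this semi-structured: the markup is machine-readable, but it does not force every record into the same fixed set of columns.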

---

## 🎯 Conclusion
Understanding the different types of data is essential for **data management and analysis**. With advancements in **AI and big data technologies**, structured, semi-structured, and unstructured data are all becoming increasingly valuable for decision-making and generating insights. 🚀

