# Big Data Evolution - Introduction

## Introduction

This notebook covers the historical evolution of data processing from the early days of business computing to the emergence of the Big Data problem.

## What You'll Learn

- The beginning of business data processing with COBOL
- The RDBMS revolution and its impact
- The Internet and Mobile Revolution
- The Big Data Problem (3 Vs: Variety, Volume, Velocity)


## Beginning of Business Data Processing

### COBOL (Common Business-Oriented Language)

**COBOL** was designed in **1959** by the Conference on Data Systems Languages (CODASYL) as a common programming language for business applications.

**Key Characteristics:**
- Designed for business and finance applications
- Used by companies and government organizations
- Allowed efficient data storage in files
- Enabled creation of index files for fast data access
- Processed data efficiently on mainframe computers
- Widely used for large-scale batch and transaction processing jobs
- English-like syntax for readability
- Portable across different computer systems

**Impact:**
- Dominated business computing for decades
- Set the foundation for structured business data processing


## Revolution in Business Data Processing

### RDBMS (Relational Database Management Systems)

In the **1970s and 1980s**, Relational Database Management Systems (RDBMS) revolutionized business data processing. Popular relational databases, some of which arrived later, include the following (initial release year in parentheses):

- **MySQL** (1995)
- **Oracle Database** (1979)
- **Microsoft SQL Server** (1989)
- **PostgreSQL** (1996)

### What Do RDBMS Offer?

**1. SQL (Structured Query Language)**
- Easy-to-use data query language
- Standardized language for data manipulation
- Declarative syntax (what you want, not how to get it)
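
To make the "what, not how" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table and its rows are made-up sample data, not something from this notebook. The same per-customer total is computed twice: once with a declarative SQL query and once with a hand-written loop.

```python
import sqlite3

# In-memory database; the "orders" table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
rows = [(1, "alice", 120.0), (2, "bob", 80.0), (3, "alice", 50.0)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Declarative: describe the result wanted; the engine decides how to compute it.
declarative = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# Imperative equivalent: spell out every step of the aggregation by hand.
totals = {}
for _id, customer, amount in rows:
    totals[customer] = totals.get(customer, 0.0) + amount
imperative = sorted(totals.items())

print(declarative)
print(imperative)
conn.close()
```

Both approaches produce the same totals, but the SQL version leaves the choice of access path and aggregation strategy to the database engine.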

**2. Scripting Languages**
- **PL/SQL** (Oracle)
- **T-SQL** (Microsoft SQL Server)
- **PL/pgSQL** (PostgreSQL)
- Allow procedural programming within the database

**3. Programming Language Interfaces**
- **JDBC** (Java Database Connectivity)
- **ODBC** (Open Database Connectivity)
- Let application code written in languages such as Java, Python, and C++ connect to the database and run SQL
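
JDBC and ODBC are not shown here. As a rough analogue, the sketch below uses Python's DB-API 2.0 (PEP 249) through the standard-library `sqlite3` driver, which plays a comparable role for Python programs. The `customers` table and the `customers_over` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def customers_over(conn, min_balance):
    """Return the names of customers whose balance exceeds min_balance."""
    cur = conn.cursor()
    # The ? placeholder lets the driver bind the value instead of
    # building the SQL string by hand.
    cur.execute(
        "SELECT name FROM customers WHERE balance > ? ORDER BY name",
        (min_balance,),
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("alice", 250.0), ("bob", 40.0), ("carol", 900.0)],
)

result = customers_over(conn, 100.0)
print(result)
conn.close()
```

The pattern is the same across such interfaces: open a connection, send SQL with bound parameters, and read rows back into the host language's own data types.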

### Data Structure in RDBMS

**Structured Data:**
- **Table**: Organized in rows and columns
- **File**: Can also be structured as rows and columns (CSV, fixed-width files)
- **Schema-on-write**: The data structure is defined before any data is stored
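
Schema-on-write can be illustrated with a short sketch, again using `sqlite3` from the Python standard library. The `employees` table, its columns, and the sample values are assumptions made up for this example. Rows that do not match the declared structure are rejected at write time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema-on-write: the structure must exist before any row is stored.
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# A row that matches the declared schema is accepted.
conn.execute("INSERT INTO employees (id, name) VALUES (?, ?)", (1, "Ada"))

errors = []

try:
    # 'nickname' was never declared in the schema, so the write is rejected.
    conn.execute(
        "INSERT INTO employees (id, name, nickname) VALUES (?, ?, ?)",
        (2, "Grace", "Amazing"),
    )
except sqlite3.OperationalError as exc:
    errors.append(("unknown column", str(exc)))

try:
    # 'name' is declared NOT NULL, so a missing value is rejected as well.
    conn.execute("INSERT INTO employees (id, name) VALUES (?, ?)", (3, None))
except sqlite3.IntegrityError as exc:
    errors.append(("missing value", str(exc)))

stored = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(stored, [kind for kind, _ in errors])
conn.close()
```

Only the first row is stored; the other two fail because they do not fit the schema declared up front.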


## Internet and Mobile Revolution

The rise of the **Internet** in the 1990s and of **mobile devices** in the 2000s dramatically changed both the types and the volume of data being generated.

### Evolution of Data Types

**1. Structured Data**
- Traditional table format (rows and columns)
- Examples: Database tables, CSV files, Excel spreadsheets

**2. Semi-Structured Data**
- **JSON** (JavaScript Object Notation): Lightweight data interchange format
- **XML** (eXtensible Markup Language): Markup language for data representation
- **YAML**: Human-readable data serialization
- Examples: Web APIs, configuration files, log files
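
As a small illustration of what "semi-structured" means in practice, the sketch below parses a made-up JSON payload with Python's `json` module. The user records and field names are invented for this example. Each record describes its own structure, so fields can differ from one record to the next and values can be nested.

```python
import json

# Hypothetical payload: every record has "user", but the other fields vary.
raw = """
[
  {"user": "alice", "email": "alice@example.com"},
  {"user": "bob", "phones": ["555-0100", "555-0101"]},
  {"user": "carol", "address": {"city": "Pune"}}
]
"""

records = json.loads(raw)

# Fields present in each individual record.
fields_per_record = [sorted(record.keys()) for record in records]

# Union of all fields seen across the records.
all_fields = sorted({key for record in records for key in record})

print(fields_per_record)
print(all_fields)
```

Mapping these three records onto one fixed table would require either many mostly empty columns or a redesign each time a new field appears.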

**3. Unstructured Data**
- **Text**: Emails, chat messages, social media posts
- **Documents**: PDFs, Word documents
- **Images**: Photos, graphics, medical images
- **Videos**: Movies, surveillance footage, streaming content
- **Audio**: Music, podcasts, voice recordings

### Impact

- The amount of data generated grew explosively
- Traditional relational databases struggled to handle the new data types
- A need for new storage and processing paradigms emerged


## The Big Data Problem

Traditional relational databases **failed to handle the Big Data problem**, which is characterized by the **3 Vs**:

### 1. Variety
- **Challenge**: Multiple data types and formats
- **Examples**: Structured (tables), semi-structured (JSON, XML), unstructured (text, images, videos)
- **RDBMS Limitation**: Designed primarily for structured data

### 2. Volume
- **Challenge**: Massive amounts of data
- **Scale**: Terabytes, petabytes, and exabytes of data
- **RDBMS Limitation**: Vertical scaling limitations, storage constraints
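
A back-of-the-envelope calculation gives a feel for this scale. The sketch below assumes a sequential read throughput of 200 MB/s per machine and a perfectly parallel workload; both are illustrative assumptions, not measurements. The `scan_seconds` helper is a hypothetical name.

```python
PETABYTE = 10**15                  # bytes (decimal petabyte)
ASSUMED_THROUGHPUT = 200 * 10**6   # bytes per second per machine (assumed)


def scan_seconds(total_bytes, bytes_per_second, machines=1):
    """Time to read total_bytes once, assuming the work splits evenly."""
    return total_bytes / (bytes_per_second * machines)


# One machine reading 1 PB from start to finish.
single = scan_seconds(PETABYTE, ASSUMED_THROUGHPUT)

# The same scan split evenly across 1,000 machines (idealized).
cluster = scan_seconds(PETABYTE, ASSUMED_THROUGHPUT, machines=1000)

single_days = single / 86_400
cluster_minutes = cluster / 60

print(f"single machine: {single_days:.1f} days")
print(f"1,000 machines: {cluster_minutes:.1f} minutes")
```

Under these assumptions a single machine needs roughly two months just to read the data once, while the idealized cluster finishes in under an hour and a half. Real systems add coordination and network overhead, so the actual speedup would be smaller.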

### 3. Velocity
- **Challenge**: High-speed data generation and processing requirements
- **Examples**: Real-time streaming data, high-frequency transactions, IoT sensors
- **RDBMS Limitation**: Not optimized for high-velocity data ingestion and processing

### Why RDBMS Failed

| Limitation | Description |
|------------|-------------|
| **Schema Rigidity** | Schema-on-write model doesn't accommodate diverse data types |
| **Vertical Scaling** | Limited by single machine resources (CPU, RAM, Disk) |
| **Cost** | Expensive to scale vertically with high-end hardware |
| **Processing Speed** | Not optimized for distributed processing |
| **Data Types** | Primarily handles structured data |
