# Topic 0 – Introduction to Big Data Analytics

This notebook introduces the fundamental ideas behind **Big Data Analytics**.

The goal is not to build models yet, but to understand:
- what Big Data is,
- why analytics is needed,
- how raw data is transformed into business value.

This notebook connects theory with practice and prepares us for working with real datasets.

## Big Data in One Picture

Raw Data → Information → Knowledge → Business Action

Big Data Analytics is the process that moves data along this chain.
Without analytics, raw data has no practical value for decision-making.


## Big Data Lifecycle

Big Data follows a well-defined lifecycle:

1. **Data Generation**
   Data is produced by users, devices, sensors, systems, and applications.

2. **Data Acquisition**
   Relevant data is selected, cleaned, filtered, and preprocessed.

3. **Data Storage**
   Data is stored persistently using appropriate technologies.

4. **Data Analytics**
   Statistical and machine learning methods are applied to extract insights.

5. **Data Visualization & Interpretation**
   Results are evaluated, interpreted, and presented to support decisions.
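
The five stages above can be sketched end to end on a toy example. This is a minimal illustration, not a real Big Data pipeline: the sensor readings, the 0-50 validity range, and the in-memory buffer standing in for storage are all assumptions made up for this sketch.

```python
import io

import numpy as np
import pandas as pd

# 1. Data generation: hypothetical temperature readings from three sensors
rng = np.random.default_rng(seed=0)
raw = pd.DataFrame({
    "sensor_id": rng.integers(1, 4, size=20),
    "temperature": rng.normal(loc=21.0, scale=2.0, size=20),
})
raw.loc[3, "temperature"] = np.nan   # simulate a lost reading
raw.loc[7, "temperature"] = 999.0    # simulate a faulty reading

# 2. Data acquisition: keep only non-missing, plausible readings
#    (the 0-50 range is an assumed validity rule)
clean = raw.dropna(subset=["temperature"])
clean = clean[clean["temperature"].between(0, 50)]

# 3. Data storage: an in-memory CSV buffer stands in for a file or database
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
buffer.seek(0)
stored = pd.read_csv(buffer)

# 4. Data analytics: average temperature per sensor
summary = stored.groupby("sensor_id")["temperature"].mean()

# 5. Visualization & interpretation: present the result
print(summary.round(2))
```

In a real project each stage would use dedicated tools (for example, a database or distributed file system for storage), but the order of the steps stays the same.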


## Why We Need Analytics

Raw data has **no intrinsic value**.

Value appears only when data is:
- analyzed,
- interpreted,
- transformed into actionable insight.

Analytics exists to support:
- business decisions,
- management strategies,
- process optimization.


In [1]:
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

print("Python version:", sys.version)
print("NumPy version:", np.__version__)
print("Pandas version:", pd.__version__)

# Purpose of this cell:
# - verify the environment works correctly

Python version: 3.13.5 | packaged by Anaconda, Inc. | (main, Jun 12 2025, 16:37:03) [MSC v.1929 64 bit (AMD64)]
NumPy version: 2.1.3
Pandas version: 2.2.3


## What Is Structured Data?

Structured data is organized in a **tabular form**:

- rows represent **observations**
- columns represent **features (variables)**

In pandas, a missing value in a numeric column is represented as **NaN** (Not a Number).

Most analytical methods in this course work on structured data,
typically stored in **CSV files**.


In [2]:
data = {
    "Age": [25, 32, 40, None],
    "Salary": [3000, 4200, 5200, 6100]
}

df = pd.DataFrame(data)
df

# What to notice
# - rows = individual observations
# - columns = features
# - the missing value (None) is shown as NaN, and Age becomes a float column as a result

|   | Age  | Salary |
|---|------|--------|
| 0 | 25.0 | 3000   |
| 1 | 32.0 | 4200   |
| 2 | 40.0 | 5200   |
| 3 | NaN  | 6100   |
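
To check for missing values programmatically rather than by eye, pandas offers `isna()`. The sketch below rebuilds the same small table and counts the missing entries per column. Filling the gap with the column median is only one possible strategy, shown here as an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 32, 40, None],
    "Salary": [3000, 4200, 5200, 6100],
})

# Column types: Age becomes float64 because NaN is a floating-point value
print(df.dtypes)

# Number of missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# One possible (assumed) strategy: replace missing ages with the median age
df_filled = df.fillna({"Age": df["Age"].median()})
print(df_filled)
```

Whether to fill, drop, or keep missing values depends on the dataset and the question being asked.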


## Types of Analytics

There are three main types of analytics:

- **Descriptive analytics**
  Answers: *What happened?*

- **Predictive analytics**
  Answers: *What will happen?*

- **Prescriptive analytics**
  Answers: *What should we do?*

Each type builds on the previous one.
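
The three types can be contrasted on one toy series. The sketch below uses made-up monthly sales figures, a straight-line fit as a stand-in for a predictive model, and an invented stock threshold as the decision rule; none of these come from the course material.

```python
import numpy as np

# Hypothetical monthly sales (units) for six months
months = np.arange(1, 7)
sales = np.array([100, 110, 125, 130, 145, 155])

# Descriptive: what happened?
average_sales = sales.mean()
print("Average monthly sales:", average_sales)

# Predictive: what will happen? Fit a straight line and extrapolate to month 7
slope, intercept = np.polyfit(months, sales, deg=1)
forecast = slope * 7 + intercept
print("Forecast for month 7:", round(forecast, 1))

# Prescriptive: what should we do? Apply an assumed business rule
current_stock = 150  # hypothetical inventory level
action = "reorder stock" if forecast > current_stock else "no action needed"
print("Recommended action:", action)
```

Note how the prescriptive step consumes the forecast, and the forecast is fitted on the same history that the descriptive step summarizes.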


## Business Examples of Big Data Analytics

- **Human Resources (HR)**
  Recruitment, employee retention, career management.

- **Telecommunications**
  Churn prediction, upselling, network optimization.

- **Business Process Management (BPM)**
  Process monitoring, optimization, and automation.


## What Comes Next?

The next step is working with **real data**.

We will:
- load CSV files,
- explore datasets,
- understand how data is represented in pandas.

📁 This will be done using the `datasets/` folder.

**From concepts → real data**
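
As a preview of those steps, here is a minimal sketch of loading and exploring a CSV with pandas. The actual file names in `datasets/` are not given here, so the example reads a small made-up CSV from an in-memory string; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file in the datasets/ folder
csv_text = """customer_id,age,monthly_spend
1,34,52.5
2,,41.0
3,45,78.2
"""

# Load: read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Explore: size, first rows, and column types
print(df.shape)
print(df.head())
print(df.dtypes)

# Summary statistics for the numeric columns
print(df.describe())
```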
