# The Field of Data Science: Popular Techniques for Working with Data

<hr>

## Traditional Data 

Traditional data refers to structured data, typically tables of numeric and text values, of the kind organizations used and managed before the rise of modern large-scale, high-velocity digital data sources.

<hr>

### Techniques for Working with **Traditional Data**

<br>

<span style="color: lightgreen;">Raw Data → Processing → Information</span>

<br><hr style="display: flex; width: 50%; margin: auto;"><br>

## **1. Raw Data / Data Collection**

Raw data is also called:  
- **raw facts**  
- **primary data**

Raw data is **untouched** and **unprocessed**, so it is not yet ready for analysis.  
It is data exactly as captured from its source (for example, an operational system) and stored without modification.

Common sources of traditional data include:  
- Transaction logs  
- Spreadsheets  
- Operational databases (SQL)  
- Surveys and forms  
- Enterprise systems (ERP, CRM)  

Raw data **must be prepared** before meaningful analysis can occur.

---

## **2. Data Processing (Pre-processing)**

Data processing converts raw data into a structured, analysable form.

### **2.1 Data Pre-processing Operations**
- **Class labelling**  
  Assigning correct data types (numeric, categorical, text, date).
  
- **Data cleansing / cleaning / scrubbing**  
  Fixing inconsistencies, correcting errors, removing duplicates.

- **Handling missing values**  
  - deletion  
  - imputation (mean, median, mode)  
  - interpolation (time-series)  

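- **Normalization / Standardization**  
  Scaling numeric values to comparable ranges: normalization (min-max scaling) typically maps values to [0, 1], while standardization rescales them to mean 0 and standard deviation 1.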

- **Data transformation**  
  Converting data into new forms (e.g., extracting month from a date).

- **Data integration**  
  Combining data from multiple traditional systems into one dataset.
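Several of the operations above can be sketched in plain Python. This is a minimal illustration, not a production pipeline: the records, the field names (`order_id`, `customer_id`, `order_date`, `price`), the `CUSTOMERS` lookup and all helper functions are hypothetical. Deduplication assumes `order_id` identifies a record, and only median imputation is shown (deletion and interpolation are not). In practice a library such as pandas is usually used for these steps.

```python
from datetime import date
from statistics import mean, median, pstdev

# Hypothetical raw records, as they might arrive from a CSV export:
# every field is a string, one row is duplicated and one price is missing.
RAW_ROWS = [
    {"order_id": "1", "customer_id": "C1", "order_date": "2023-01-15", "price": "20.0"},
    {"order_id": "2", "customer_id": "C2", "order_date": "2023-02-03", "price": ""},
    {"order_id": "2", "customer_id": "C2", "order_date": "2023-02-03", "price": ""},
    {"order_id": "3", "customer_id": "C1", "order_date": "2023-03-20", "price": "40.0"},
]

# Hypothetical second source (e.g. a CRM table) used for integration.
CUSTOMERS = {"C1": "North", "C2": "South"}


def assign_types(row):
    """Class labelling: convert each field to its proper data type."""
    return {
        "order_id": int(row["order_id"]),
        "customer_id": row["customer_id"],
        "order_date": date.fromisoformat(row["order_date"]),
        "price": float(row["price"]) if row["price"] else None,
    }


def remove_duplicates(rows):
    """Cleaning: keep the first occurrence of each order_id (assumed unique key)."""
    seen, unique = set(), []
    for row in rows:
        if row["order_id"] not in seen:
            seen.add(row["order_id"])
            unique.append(row)
    return unique


def impute_median(rows, field):
    """Missing values: replace None with the median of the known values."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def min_max(values):
    """Normalization: rescale to [0, 1]. Assumes at least two distinct values."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Standardization: mean 0, population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def preprocess(raw_rows, customers):
    rows = remove_duplicates([assign_types(r) for r in raw_rows])
    rows = impute_median(rows, "price")
    for row in rows:
        row["month"] = row["order_date"].month             # transformation
        row["region"] = customers.get(row["customer_id"])  # integration
    return rows


rows = preprocess(RAW_ROWS, CUSTOMERS)
prices = [r["price"] for r in rows]
print(prices)           # [20.0, 30.0, 40.0]
print(min_max(prices))  # [0.0, 0.5, 1.0]
```

Keeping each operation in its own function makes it easy to reorder steps or test them individually.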

### **2.2 Balancing (Use Case)**
Example: survey with **80% females / 20% males**  
→ apply balancing to achieve a 50/50 or representative distribution.

Why?  
- Reduces bias towards the over-represented group  
- Improves fairness in modelling  
- Supports more accurate interpretations  
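One common way to balance the survey example is random oversampling: minority-class records are duplicated at random until every class is the same size. The sketch below assumes hypothetical records with a `sex` field and a hypothetical `oversample` helper; undersampling the majority class or weighting responses are alternatives, and duplicated records add no new information.

```python
import random
from collections import Counter


def oversample(records, key, seed=0):
    """Random oversampling: duplicate randomly chosen records of each smaller
    class until every class has as many records as the largest one."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    groups = {}
    for rec in records:
        groups.setdefault(rec[key], []).append(rec)
    target = max(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical survey: 80 female and 20 male respondents.
survey = [{"id": i, "sex": "F"} for i in range(80)] + \
         [{"id": i, "sex": "M"} for i in range(80, 100)]
balanced = oversample(survey, "sex")
print(Counter(r["sex"] for r in balanced))  # Counter({'F': 80, 'M': 80})
```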

### **2.3 Data Shuffling**
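Randomizing the order of data records helps:  
- prevent patterns caused by the original ordering (e.g. records sorted by date) from influencing the analysis  
- improve predictive performance when data is split into training and test sets  
- avoid misleading statistical results  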
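A minimal sketch of shuffling before a train/test split, using Python's `random` module with a fixed seed so the result is reproducible. The records and the 80/20 split are hypothetical.

```python
import random

# Hypothetical records stored in date order, as an export often returns them.
records = [{"day": d, "sales": 100 + d} for d in range(1, 11)]

rng = random.Random(42)  # fixed seed so the shuffle is reproducible
shuffled = records[:]    # shuffle a copy, keeping the original order intact
rng.shuffle(shuffled)

# Split after shuffling: both parts now mix early and late days, instead of
# the test set containing only the most recent records.
split = int(0.8 * len(shuffled))
train, test = shuffled[:split], shuffled[split:]
print(len(train), len(test))  # 8 2
```

Shuffling is not appropriate for every task: for time-series forecasting, chronological order is usually preserved so the model is evaluated on data later than the data it was trained on.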

---

## **3. Additional Techniques for Working with Traditional Data**

### ✅ **Data Reduction**
Reducing the size of the dataset while keeping important information.  
Includes:  
- aggregation (monthly → yearly totals)  
- sampling  
- dimensionality reduction  
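Aggregation and sampling can be sketched with the standard library as follows. The monthly figures are hypothetical, and dimensionality reduction (e.g. PCA) is not shown.

```python
import random
from collections import defaultdict

# Hypothetical monthly totals keyed by (year, month).
monthly = {(2022, m): 100.0 for m in range(1, 13)}
monthly.update({(2023, m): 150.0 for m in range(1, 13)})

# Aggregation: 24 monthly rows collapse into 2 yearly rows.
yearly = defaultdict(float)
for (year, _month), total in monthly.items():
    yearly[year] += total
print(dict(yearly))  # {2022: 1200.0, 2023: 1800.0}

# Sampling: keep a random 25% of the monthly rows (seeded for reproducibility).
rng = random.Random(0)
sample = rng.sample(sorted(monthly), k=len(monthly) // 4)
print(len(sample))  # 6
```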

### ✅ **Data Sorting and Filtering**
Typical SQL/Excel operations:  
- sorting rows  
- filtering records by conditions  
- grouping data  

These are core *traditional* analytics techniques.
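The same three operations expressed in SQL, run here through Python's built-in `sqlite3` module against an in-memory database. The `sales` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "A", 120.0), ("South", "A", 80.0),
     ("North", "B", 50.0), ("South", "B", 200.0)],
)

# Filtering (WHERE) and sorting (ORDER BY)
big_sales = conn.execute(
    "SELECT region, product, amount FROM sales "
    "WHERE amount >= 100 ORDER BY amount DESC"
).fetchall()
print(big_sales)  # [('South', 'B', 200.0), ('North', 'A', 120.0)]

# Grouping (GROUP BY) with an aggregate per group
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('North', 170.0), ('South', 280.0)]
conn.close()
```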

### ✅ **Data Validation**
Ensuring data is correct, consistent, and within expected ranges.  
Examples:  
- checking date formats  
- ensuring no negative values in “age” or “price” fields  
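A minimal validation sketch for the two checks listed above. The `validate` function, the field names and the expected `YYYY-MM-DD` date format are assumptions for illustration; real validation rules depend on the dataset.

```python
from datetime import datetime


def validate(record):
    """Return a list of problems found in one record (empty list = valid)."""
    problems = []
    try:
        # Expected format YYYY-MM-DD; impossible dates such as 2023-02-30 also fail.
        datetime.strptime(record["date"], "%Y-%m-%d")
    except ValueError:
        problems.append("bad date format")
    for field in ("age", "price"):
        if record[field] < 0:
            problems.append(f"negative {field}")
    return problems


print(validate({"date": "2023-05-01", "age": 34, "price": 9.99}))  # []
print(validate({"date": "01/05/2023", "age": -2, "price": 9.99}))  # ['bad date format', 'negative age']
```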

### ✅ **Data Storage & Retrieval**
Traditional data is typically stored and retrieved using:  
- relational databases (SQL)  
- table structures  
- indexing  
- queries for retrieval  
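A sketch of these elements using SQLite through Python's `sqlite3` module: a table with typed columns, an index on a frequently filtered column, and a parameterised retrieval query. The `customers` table, its contents and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for the example

# Table structure: typed columns and a primary key
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT
    )
""")
conn.executemany(
    "INSERT INTO customers (customer_id, name, city) VALUES (?, ?, ?)",
    [(1, "Ana", "Lisbon"), (2, "Ben", "Leeds"), (3, "Cara", "Lisbon")],
)
conn.commit()

# Index on a column that is frequently used in WHERE clauses
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")

# Retrieval with a parameterised query
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Lisbon",)
).fetchall()
print(rows)  # [('Ana',), ('Cara',)]
conn.close()
```

Parameterised queries (`?` placeholders) keep values separate from the SQL text, which avoids SQL injection.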

### ✅ **Data Summarization**
Generating summaries such as:  
- averages  
- totals  
- frequency tables  
- descriptive statistics  

This is essential for turning processed data into **information**.
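The summaries listed above can be produced with Python's `statistics` and `collections` modules. The order amounts and channels are hypothetical.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical processed data: order amounts and the channel of each order.
amounts = [12.0, 15.0, 15.0, 20.0, 38.0]
channels = ["web", "store", "web", "web", "phone"]

# Descriptive statistics and totals
summary = {
    "count": len(amounts),
    "total": sum(amounts),
    "mean": mean(amounts),
    "median": median(amounts),
    "stdev": round(stdev(amounts), 2),  # sample standard deviation
    "min": min(amounts),
    "max": max(amounts),
}
print(summary)

# Frequency table: number of orders per channel, most frequent first
freq = Counter(channels)
for channel, n in freq.most_common():
    print(f"{channel:<6}{n}")
```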

---

## **4. Outcome: Information**

After collection and processing, traditional data becomes **usable information** for:  
- descriptive analytics  
- reporting  
- dashboards  
- basic forecasting  
- decision-making  

---
