 

# 📊 **Analytics & Statistics Cheat Sheet**



## 🎯 **Goal**

Understand key terms, when to use them, and how they connect in analytics.



## **1. Core Metrics**

| Metric             | Use It For           | Notes                               |
| ------------------ | -------------------- | ----------------------------------- |
| **Mean**           | Clean numeric data   | Arithmetic average; sensitive to outliers |
| **Median**         | Data with outliers   | Stable middle value                 |
| **Mode**           | Categorical data     | Most frequent value                 |
| **Variance**       | Measuring spread     | Average of squared deviations       |
| **Std. Deviation** | Spread in same units | √Variance — shows consistency       |
| **Range**          | Simple spread        | Max − Min                           |
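
As a rough illustration, each metric in the table can be computed with Python's standard `statistics` module. The `values` list is a made-up dataset, and the population forms (`pvariance`, `pstdev`) are an assumption here; use `variance` and `stdev` when the data is a sample.

```python
import statistics

# Hypothetical dataset, chosen only for illustration
values = [4, 8, 8, 10, 15]

mean = statistics.mean(values)            # arithmetic average
median = statistics.median(values)        # middle value after sorting
mode = statistics.mode(values)            # most frequent value
variance = statistics.pvariance(values)   # average squared deviation from the mean
std_dev = statistics.pstdev(values)       # square root of the variance
value_range = max(values) - min(values)   # max minus min

print(mean, median, mode, variance, round(std_dev, 2), value_range)
```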





## **2. Data Basics**

| Term           | Meaning                  | Example                   |
| -------------- | ------------------------ | ------------------------- |
| **Population** | Whole group              | All customers             |
| **Sample**     | Small part of population | 300 surveyed customers    |
| **Data**       | Facts collected          | Age, salary, gender, etc. |
| **Variable**   | What you measure         | Age, income, city         |





## **3. Why Sampling?**

* Studying full population = expensive & slow.
* Samples save time and cost.
* A well-chosen (representative) sample still reflects the population.





## **4. Sampling Types (Easy View)**

| Type              | How It Works                  | When to Use                           |
| ----------------- | ----------------------------- | ------------------------------------- |
| **Simple Random** | Equal chance for all          | Small, uniform population             |
| **Systematic**    | Pick every kᵗʰ record         | Data in order (e.g., every 10th)      |
| **Stratified**    | Divide by group → sample each | To keep proportions (gender, region)  |
| **Cluster**       | Pick a few full groups        | When population is large & spread out |
| **Convenience**   | Choose what’s easy            | Quick, but biased                     |
| **Snowball**      | Chain referrals               | Hard-to-reach groups                  |
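
The first three sampling types can be sketched with the standard `random` module. This is a minimal sketch, not a production sampler: the `population` records, the `region` field, the 10% fraction, and the `stratified_sample` helper are all hypothetical.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 customer records, 20 "North" and 80 "South"
population = [
    {"id": i, "region": "North" if i % 5 == 0 else "South"}
    for i in range(100)
]

# Simple random: every record has an equal chance of selection
simple_random = random.sample(population, k=10)

# Systematic: take every k-th record, starting from a random offset
k = 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified: split by group, then sample the same fraction from each group
def stratified_sample(records, key, fraction):
    groups = {}
    for rec in records:
        groups.setdefault(rec[key], []).append(rec)
    sample = []
    for members in groups.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(random.sample(members, n))
    return sample

stratified = stratified_sample(population, "region", 0.10)
```

With these numbers the stratified sample keeps the 20/80 split of the population (2 North, 8 South), which a simple random sample of 10 is not guaranteed to do.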





## **5. Two Branches of Statistics**

| Type            | What It Does               | Example                         |
| --------------- | -------------------------- | ------------------------------- |
| **Descriptive** | Summarizes data            | Mean, charts, counts            |
| **Inferential** | Draws conclusions about a population from a sample | A/B tests, confidence intervals |





## **6. Data Types**

### **A. Quantitative (Numeric)**

* **Discrete:** Countable (no decimals) → *students, cars*
* **Continuous:** Measurable (with decimals) → *height, salary*

### **B. Qualitative (Categorical)**

* **Nominal:** No order → *city, gender*
* **Ordinal:** Ordered → *grades, satisfaction level*





## **7. Central Tendency (Day 3 Notes Recap)**

| Measure    | Use When     | Meaning             |
| ---------- | ------------ | ------------------- |
| **Mean**   | No outliers  | Average value       |
| **Median** | Has outliers | Middle value        |
| **Mode**   | Categorical  | Most repeated value |

🧠 *If data is clean → Mean*
🧠 *If outliers exist → Median*
🧠 *If categorical → Mode*




### **Outlier**

An **outlier** is a value in a dataset that is **much higher or much lower** than most of the other values.
It **stands out** from the rest and can **skew the results** if not handled carefully.

**Example:**

Salaries = [45,000, 48,000, 50,000, 52,000, 5,00,000]

* **5,00,000** is an **outlier** because it is way higher than the other salaries.

**Key Point:**

* Use the **median** instead of the mean when outliers exist; it gives a more representative central value.
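
A quick check of the salary example shows how far a single outlier pulls the mean. The figures are the ones listed above, with 5,00,000 (Indian digit grouping) written as `500000`. The variable names are just for illustration.

```python
import statistics

# Salaries from the example above
salaries = [45000, 48000, 50000, 52000, 500000]

mean_salary = statistics.mean(salaries)      # pulled up by the outlier
median_salary = statistics.median(salaries)  # unaffected by the outlier

# Mean of the same data with the outlier left out, for comparison
mean_without_outlier = statistics.mean(salaries[:-1])

print(mean_salary, median_salary, mean_without_outlier)
```

The mean (139,000) is higher than four of the five salaries, while the median (50,000) stays close to the typical value.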




## **8. Dispersion (Data Spread)**

| Term          | Meaning                     | Tells You                 |
| ------------- | --------------------------- | ------------------------- |
| **Range**     | Max − Min                   | Quick spread check        |
| **Variance**  | Avg. squared diff from mean | How far values deviate    |
| **Std. Dev.** | √Variance                   | How consistent values are |
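
Written as formulas, assuming the population versions (dividing by $N$). The symbols $x_i$ (each value), $\mu$ (mean), $N$ (number of values), and $\sigma$ (standard deviation) are notation introduced here for illustration:

$$
\text{Range} = x_{\max} - x_{\min}, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
$$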





## **9. Frequency Distribution**

* Shows **how often** values occur.
* Helps identify patterns or dominant ranges.
* Commonly visualized with **histograms, bar graphs, and pie charts**.
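
A frequency distribution can be sketched with `collections.Counter`. The `ratings` list is hypothetical survey data, and the text bars are only a stand-in for a real chart.

```python
from collections import Counter

# Hypothetical survey ratings on a 1-5 scale
ratings = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]

# Count how often each rating occurs
freq = Counter(ratings)

# Print a simple text histogram, one row per rating
for value in sorted(freq):
    print(value, "#" * freq[value])

# The dominant value and its count
most_common_value, count = freq.most_common(1)[0]
```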





## **10. Quick Example**

Data = [5, 10, 15, 20, 30]

| Metric          | Formula      | Value |
| --------------- | ------------ | ----- |
| Mean            | 80 ÷ 5       | 16    |
| Median          | Middle value | 15    |
| Range           | 30 − 5       | 25    |
| Variance (Pop.) | 370 ÷ 5      | 74    |
| Std. Dev. (Pop.) | √74         | ≈ 8.6 |
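
The table values can be reproduced step by step. The sample variance on the last line (dividing by n − 1) is not part of the table; it is included only to show how it differs from the population form.

```python
import math

data = [5, 10, 15, 20, 30]
n = len(data)

mean = sum(data) / n                              # 80 / 5
median = sorted(data)[n // 2]                     # middle value (n is odd)
data_range = max(data) - min(data)                # 30 - 5

squared_devs = [(x - mean) ** 2 for x in data]    # 121, 36, 1, 16, 196
pop_variance = sum(squared_devs) / n              # 370 / 5
pop_std = math.sqrt(pop_variance)                 # square root of 74

sample_variance = sum(squared_devs) / (n - 1)     # 370 / 4, for comparison

print(mean, median, data_range, pop_variance, round(pop_std, 1), sample_variance)
```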




## **11. Quick Takeaways**

```
Statistics = understand + summarize + infer
Descriptive = describe what you have
Inferential = predict what you don’t have
Outlier = extreme value → use Median
Sample well = less bias, more accuracy
```
 