# Statistics Basics

#   Theory Questions

**Q 1.  What is statistics, and why is it important?**

**Ans.** Statistics is the branch of mathematics that deals with the collection, organization, analysis, interpretation, and presentation of data. It helps us understand data and draw conclusions from it, whether the data are numerical (quantitative) or categorical (qualitative).

**Why Statistics Is Important:**

1. **Decision-Making:** Statistics provides tools to make informed decisions based on data — whether in business, medicine, government, or daily life.
2. **Understanding Trends:** It helps us identify trends and patterns — for example, how sales are growing or how a disease is spreading.
3. **Scientific Research:** In scientific studies, statistics is essential for designing experiments, analyzing results, and testing hypotheses.
4. **Risk Management:** In finance, insurance, and engineering, statistics helps assess and manage risk.
5. **Policy Making:** Governments rely on statistics to shape public policy based on population surveys, economic data, and health metrics.
6. **Predictive Power:** Using probability and statistical models, we can forecast future outcomes such as the weather, election results, or consumer behavior.

**Q 2. What are the two main types of statistics?**

**Ans.** The two main types of statistics are:

1. **Descriptive Statistics:** Descriptive statistics summarize and organize data so it can be easily understood.

**Key features:**
- Focuses on what the data shows.
- Involves summary measures and visual displays such as:
    - Mean (average)
    - Median (middle value)
    - Mode (most frequent value)
    - Standard deviation (spread of data)
    - Graphs and charts (like bar charts, histograms, pie charts).

Example: A teacher calculates the average test score of her class to summarize performance.

2. **Inferential Statistics:** Inferential statistics use data from a sample to make predictions or generalizations about a larger population.

**Key features:**
- Focuses on drawing conclusions beyond the data.
- Involves:
    - Hypothesis testing
    - Confidence intervals
    - Regression analysis
    - Probability models

Example: A researcher tests a new drug on 100 patients and uses the results to infer its effectiveness on the entire population.

**Q 3.  What are descriptive statistics?**

**Ans.** Descriptive statistics are methods used to summarize, organize, and present data in a meaningful way. They help you understand the main features of a dataset without drawing conclusions beyond the data itself.

**Key Features of Descriptive Statistics:**

1. They describe the data; they do not generalize or predict beyond it.
2. They apply to the actual data collected, not to a larger population.
3. They simplify large amounts of information into understandable forms.

**Types of Descriptive Statistics:**

1. **Measures of Central Tendency:**

These indicate the center or average of a dataset:

- Mean – the average value
- Median – the middle value
- Mode – the most frequent value
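As a quick illustration, the three measures can be computed with Python's standard-library `statistics` module. This is a minimal sketch; the scores below are made-up illustrative values, not data from this document.

```python
import statistics

# Hypothetical test scores for a class of nine students (illustrative only)
scores = [72, 85, 90, 85, 64, 78, 60, 93, 70]

mean_score = statistics.mean(scores)      # sum of values / number of values
median_score = statistics.median(scores)  # middle value after sorting
mode_score = statistics.mode(scores)      # most frequently occurring value

print(f"Mean:   {mean_score:.2f}")
print(f"Median: {median_score}")
print(f"Mode:   {mode_score}")
```

Here the three measures differ (mean about 77.44, median 78, mode 85), which shows that "average" can mean different things depending on the measure chosen.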

2. **Measures of Dispersion (Spread):**

These show how spread out the data is:

- Range – difference between highest and lowest values
- Variance – average squared deviation from the mean
- Standard Deviation – square root of the variance; roughly the typical distance of values from the mean
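A minimal sketch of these spread measures using Python's `statistics` module, on a small hypothetical dataset. It also shows a detail the definitions above leave implicit: variance and standard deviation are computed by dividing by N when the data are the whole population, and by n - 1 when the data are a sample.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; their mean is 5

data_range = max(data) - min(data)           # highest minus lowest: 9 - 2
pop_variance = statistics.pvariance(data)    # sum of squared deviations / N
sample_variance = statistics.variance(data)  # sum of squared deviations / (n - 1)
pop_sd = statistics.pstdev(data)             # square root of the population variance
sample_sd = statistics.stdev(data)           # square root of the sample variance

print(f"Range: {data_range}")
print(f"Population variance: {pop_variance}, population SD: {pop_sd}")
print(f"Sample variance: {sample_variance:.3f}, sample SD: {sample_sd:.3f}")
```

For this data the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the population standard deviation is 2.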

3. **Measures of Position:**

These locate values within a dataset:

- Percentiles – indicate relative standing (e.g., 90th percentile)
- Quartiles – divide data into four equal parts
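Quartiles and percentiles can be sketched with `statistics.quantiles` (Python 3.8+). The scores are hypothetical, and the choice of `method="inclusive"` (linear interpolation that treats the data as the full population) is an assumption; other interpolation methods give slightly different cut points.

```python
import statistics

# Hypothetical exam scores (11 values, sorted for readability)
scores = [55, 61, 64, 68, 70, 72, 75, 79, 83, 88, 95]

# Quartiles: the three cut points that split the data into four equal parts
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# 90th percentile: the 90th of the 99 cut points that split the data into 100 parts
p90 = statistics.quantiles(scores, n=100, method="inclusive")[89]

print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}")
print(f"90th percentile = {p90}")
```

With this method the quartiles come out as 66.0, 72.0, and 81.0, and the 90th percentile as 88.0.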

4. **Data Visualization Tools:**

These help visually interpret data:

- Bar charts
- Histograms
- Pie charts
- Box plots
- Frequency tables
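A frequency table is the starting point for most of these charts. The sketch below builds one with `collections.Counter` and prints a text bar next to each count as a dependency-free stand-in for a bar chart; the survey responses are hypothetical. In practice a plotting library would draw the actual chart.

```python
from collections import Counter

# Hypothetical survey responses: favorite fruit
responses = ["apple", "banana", "apple", "cherry", "banana", "apple", "banana", "apple"]

freq = Counter(responses)  # maps each distinct value to how often it occurs
total = len(responses)

print(f"{'Value':<8}{'Count':>6}{'Percent':>9}  Bar")
for value, count in freq.most_common():  # most frequent value first
    pct = 100 * count / total
    print(f"{value:<8}{count:>6}{pct:>8.1f}%  {'#' * count}")
```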

**Q 4. What is inferential statistics?** 

**Ans.** Inferential statistics is the branch of statistics that allows us to make predictions, generalizations, or decisions about a larger population based on data collected from a sample.

>Key Idea: You study a small part (the sample) and use it to draw conclusions about the whole group (the population).

**Purpose of Inferential Statistics:**

- To make inferences about populations.
- To test hypotheses and determine relationships.
- To estimate population parameters (like the true average or proportion).

**Key Techniques:**

1. **Hypothesis Testing**: Tests assumptions or claims about a population (e.g., "Does this medicine work better than the old one?").
2. **Confidence Intervals**: Gives a range of values within which the true population parameter likely falls (e.g., "We are 95% confident that the average height is between 165–170 cm").
3. **Regression Analysis**: Predicts the value of one variable based on another (e.g., income based on education level).
4. **Chi-square tests and ANOVA**: Determine relationships or differences between groups or categories.
5. **Probability**: Used to measure how likely a result at least as extreme as the one observed would be if chance alone were at work (the idea behind p-values).

Example:

Suppose a researcher surveys 200 randomly selected people in a city about their favorite soda, and 60% say they prefer brand A.

With inferential statistics, they might say:

*"We estimate that around 60% of all people in the city prefer brand A, with a margin of error of about 7 percentage points at 95% confidence."*
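The margin of error for a sample proportion is commonly approximated as z × sqrt(p × (1 − p) / n). A minimal sketch, assuming a simple random sample, the normal approximation, and z = 1.96 for 95% confidence:

```python
import math

n = 200        # sample size from the example
p_hat = 0.60   # sample proportion preferring brand A
z = 1.96       # z-value for about 95% confidence (normal approximation)

standard_error = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of a proportion
margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"Margin of error: ±{margin:.3f}")
print(f"95% confidence interval: {lower:.3f} to {upper:.3f}")
```

For n = 200 and 60%, this gives a margin of about 0.068, so the interval runs from roughly 53% to 67%. Because the margin shrinks with the square root of n, quadrupling the sample size roughly halves it.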

**Q 5. What is sampling in statistics?**

**Ans.** Sampling in statistics is the process of selecting a subset (sample) from a larger group (population) to collect data and make inferences about the whole population.

**Purpose of Sampling:** 

To gather data from a representative group so you can:

1. Estimate population values
2. Test hypotheses
3. Make predictions

**Types of Sampling:**

1. **Probability Sampling (random and unbiased):**
    - Simple Random Sampling: Everyone has an equal chance of being selected.
    - Stratified Sampling: Population is divided into groups (strata), and samples are taken from each.
    - Systematic Sampling: Every kth item is chosen after a random starting point.
    - Cluster Sampling: Population is divided into clusters, and randomly selected clusters are fully surveyed.

2. **Non-Probability Sampling (not random, may be biased):**
    - Convenience Sampling: Easy-to-reach participants are selected.
    - Judgmental Sampling: Based on expert judgment.
    - Quota Sampling: Ensures certain characteristics are represented.
    - Snowball Sampling: Participants refer others (useful for hard-to-reach groups).

**Q 6. What are the different types of sampling methods?**

**Ans.** In statistics, sampling methods are techniques used to select individuals or items from a larger group (the population) to form a sample. These methods are broadly divided into two categories:

1. **Probability Sampling Methods:** Every member of the population has a known, non-zero chance of being selected.

- *Simple Random Sampling*
    - Every individual has an equal chance of being chosen.
    - Usually done using a random number generator or lottery system.
    - 🔹 Example: Drawing 50 names out of a hat from a class of 200.

- *Stratified Sampling*
    - The population is divided into strata (subgroups) based on a characteristic (e.g., gender, age).
    - A random sample is taken from each stratum.
    - 🔹 Example: Surveying 100 people by selecting 50 men and 50 women from different age groups.

- *Systematic Sampling*
    - Select every kth individual from a list after a random starting point.
    - 🔹 Example: Every 10th person on a company employee list.

- *Cluster Sampling*
    - Divide the population into clusters (often geographically), then randomly select entire clusters.
    - 🔹 Example: Randomly selecting 5 schools and surveying all students in those schools.
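The four probability methods above can be sketched with Python's `random` module. The population of 200 students, the `gender` and `school` fields, and the sample sizes are hypothetical and chosen only to mirror the examples; a fixed seed makes the run reproducible.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 200 students, each with a gender and one of 10 schools
population = [
    {"id": i, "gender": "F" if i < 100 else "M", "school": f"School {i % 10}"}
    for i in range(200)
]

# Simple random sampling: every student has the same chance of selection
simple = rng.sample(population, k=50)

# Stratified sampling: split into strata by gender, then sample randomly within each
strata = {}
for student in population:
    strata.setdefault(student["gender"], []).append(student)
stratified = [s for group in strata.values() for s in rng.sample(group, k=25)]

# Systematic sampling: random starting point, then every kth student on the list
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Cluster sampling: randomly choose whole schools, then keep every student in them
schools = sorted({s["school"] for s in population})
chosen_schools = set(rng.sample(schools, k=5))
cluster = [s for s in population if s["school"] in chosen_schools]

print(len(simple), len(stratified), len(systematic), len(cluster))
```

Note the difference between the last two methods: stratified sampling draws some members from every group, while cluster sampling keeps all members of only some groups.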

2. **Non-Probability Sampling Methods:** Members' chances of selection are unknown, and some may have no chance at all; these methods are often quicker and cheaper but more prone to bias.

- *Convenience Sampling*
    - Select individuals who are easy to reach or access.
    - 🔹 Example: Surveying people at a mall.

- *Judgmental (Purposive) Sampling*
    - The researcher selects individuals based on their own judgment or expertise about who best fits the study.
    - 🔹 Example: Interviewing experts in a medical study.

- *Quota Sampling*
    - Similar to stratified sampling but non-random; the researcher fills quotas for different groups.
    - 🔹 Example: Interviewing 30 men and 30 women from different age groups until quotas are met.

- *Snowball Sampling*
    - Existing participants refer others from their network; useful for hard-to-reach populations.
    - 🔹 Example: Studying people with rare diseases or underground communities.

*Summary Table:*

| Method Type     | Sampling Method | Key Feature                                     |
| --------------- | --------------- | ----------------------------------------------- |
| Probability     | Simple Random   | Equal chance for all                            |
| Probability     | Stratified      | Sample from each subgroup                       |
| Probability     | Systematic      | Every kth element selected                      |
| Probability     | Cluster         | Randomly select entire groups                   |
| Non-Probability | Convenience     | Based on accessibility                          |
| Non-Probability | Judgmental      | Based on researcher judgment                    |
| Non-Probability | Quota           | Target specific subgroups without randomization |
| Non-Probability | Snowball        | Participants recruit other participants         |


**Q 7.  What is the difference between random and non-random sampling?**

**Ans.** The main difference between random and non-random sampling lies in how participants are selected from the population — whether by chance or by choice.

*Random Sampling (also called probability sampling):* In random sampling, every individual in the population has a known, non-zero chance of being selected (an equal chance in simple random sampling). Selection relies on a random mechanism such as a lottery or a random number generator.

*Key Features:*
- Low bias: Minimizes selection bias
- Representative: Better reflects the entire population
- Statistical inference: Results can be generalized to the population
- Examples:
    - Simple Random Sampling
    - Stratified Sampling
    - Systematic Sampling
    - Cluster Sampling

*Non-Random Sampling (also called non-probability sampling):* In non-random sampling, each individual's chance of being selected is unknown, and some may have no chance at all. Selection is based on convenience, judgment, or accessibility rather than randomness.

*Key Features:*
- Prone to bias: Higher risk of sampling bias
- Less representative: May not reflect the population accurately
- Limited generalization: Results apply mainly to the sample
- Examples:
    - Convenience Sampling
    - Judgmental (Purposive) Sampling
    - Quota Sampling
    - Snowball Sampling

*Summary Table:*

| Feature             | Random Sampling                    | Non-Random Sampling                      |
| ------------------- | ---------------------------------- | ---------------------------------------- |
| Selection Method    | Based on chance                    | Based on convenience or judgment         |
| Chance of Selection | Known and non-zero                 | Unknown; some may have no chance         |
| Bias                | Low (if done properly)             | Potentially high                         |
| Representation      | Likely representative              | May not represent population well        |
| Generalizability    | Results can be generalized         | Limited generalization                   |
| Examples            | Simple random, stratified, cluster | Convenience, quota, snowball, judgmental |
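The practical effect of the "Bias" and "Representation" rows can be seen in a small simulation. This is a sketch under stated assumptions: the population of commute distances is made up, and the convenience sample is modeled, hypothetically, as surveying only the 200 residents who live closest (e.g. people reached at one nearby location).

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is reproducible

# Hypothetical population: commute distances (km) for 10,000 residents
population = [rng.uniform(0, 50) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Random sampling: every resident has the same chance of selection
random_sample = rng.sample(population, k=200)

# Non-random (convenience) sampling, simulated by surveying only the
# 200 residents who live closest to the survey location
convenience_sample = sorted(population)[:200]

random_mean = statistics.mean(random_sample)
convenience_mean = statistics.mean(convenience_sample)

print(f"Population mean:         {true_mean:.2f} km")
print(f"Random sample mean:      {random_mean:.2f} km")
print(f"Convenience sample mean: {convenience_mean:.2f} km")
```

Both samples contain 200 people, but only the random one lands near the population mean; the convenience sample is systematically off, and collecting more people the same way would not fix that.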


**Q 8. Define and give examples of qualitative and quantitative data**