# **Statistics Basics**

Q1. Explain the different types of data (qualitative and quantitative) and provide examples of each. Discuss
nominal, ordinal, interval, and ratio scales.

✴ Data can be classified into two main types: qualitative and quantitative.
 1. Qualitative Data >
 - Also known as categorical data
 - It represents descriptions or characteristics that cannot be measured numerically.
 - Typically classified into different categories or labels.
 - Further divided into Nominal and Ordinal scales.
 - Examples: Gender (Male, Female, Other)

 2. Quantitative Data >
 - Also known as numerical data
 - It represents numerical values that can be counted or measured.
 - Further divided into Discrete and Continuous types.
 - Uses Interval and Ratio scales.
 - Examples: Age (22 years)

1. Nominal Scale (Categorical, No Order)
 - Data is classified into distinct categories with no specific order.
 - Examples: Blood groups (A, B, AB, O)


2. Ordinal Scale (Categorical, Ordered)
 - Data is categorized with a meaningful order, but the difference between categories is not uniform.
 - Used when ranking or ordering matters.
 - Examples: Education levels (high school < bachelor's < master's), survey responses (disagree < neutral < agree)

3. Interval Scale (Numerical, No True Zero)
 - Numeric scale where differences between values are meaningful.
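 - No true zero point: zero is an arbitrary reference rather than the absence of the quantity, so ratios are not meaningful (20 °C is not "twice as hot" as 10 °C).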
 - Examples: Temperature (°C, °F), IQ scores


4. Ratio Scale (Numerical, True Zero)
 - Numeric scale with a meaningful zero point, allowing for meaningful ratio comparisons.
 - Most precise level of measurement.
 - Examples: Height (0 cm means no height)
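
The interval versus ratio distinction can be checked with a little arithmetic. The sketch below compares two temperatures in Celsius (interval scale) and in Kelvin (ratio scale); the readings 10 °C and 20 °C are illustrative values chosen for this example, not data from these notes.

```python
import math


def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15


# Illustrative readings (assumed values).
t1_c, t2_c = 10.0, 20.0
t1_k, t2_k = celsius_to_kelvin(t1_c), celsius_to_kelvin(t2_c)

# Differences are meaningful on both scales and agree with each other.
diff_c = t2_c - t1_c
diff_k = t2_k - t1_k
differences_agree = math.isclose(diff_c, diff_k)

# Ratios are only meaningful on the ratio scale.
ratio_c = t2_c / t1_c  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
ratio_k = t2_k / t1_k  # roughly 1.035 on the scale with a true zero

print(f"differences agree: {differences_agree}")
print(f"ratio in Celsius: {ratio_c:.3f}, ratio in Kelvin: {ratio_k:.3f}")
```

The difference is the same on both scales, while the ratio changes with the choice of zero, which is why ratio statements are reserved for ratio-scale data.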









Q2. What are the measures of central tendency, and when should you use each? Discuss the mean, median,
and mode with examples and situations where each is appropriate.
- Measures of central tendency are statistical metrics that summarize a set of data by identifying the central point within that dataset.

1. Mean

 - Definition: The average of a dataset, calculated by summing all values and dividing by the number of values.
 - Example: For the dataset 3, 5, 7, 9, the mean is ( 3 + 5 + 7 + 9) / 4 = 6.
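 - Use: Best for roughly symmetric data without extreme outliers, because every value contributes to the mean and a single extreme value can pull it noticeably.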
2. Median

 - Definition: The middle value when data is ordered. If even, it's the average of the two middle values.
 - Example: For 3, 5, 7, the median is 5. For 3, 5, 7, 9, the median is (5 + 7) / 2 = 6.
 - Use: Ideal for skewed data or when outliers are present.
3. Mode

 - Definition: The most frequently occurring value in a dataset.
 - Example: In 2, 3, 3, 5, 7, 10, the mode is 3. In 2, 3, 3, 5, 5, the modes are 3 and 5 (bimodal).
 - Use: Useful for categorical data or to find the most common value.
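
The three measures can be reproduced with Python's standard `statistics` module. This is a minimal sketch using the example datasets above; the list `with_outlier` is a hypothetical addition, included only to show how an extreme value affects the mean and the median differently.

```python
import statistics

values = [3, 5, 7, 9]
mean_value = statistics.mean(values)      # (3 + 5 + 7 + 9) / 4
median_value = statistics.median(values)  # average of the two middle values, 5 and 7

single_mode = statistics.mode([2, 3, 3, 5, 7, 10])  # most frequent value
all_modes = statistics.multimode([2, 3, 3, 5, 5])   # every value tied for most frequent

# Hypothetical dataset: the same values plus one extreme observation.
with_outlier = values + [100]
mean_with_outlier = statistics.mean(with_outlier)      # pulled toward 100
median_with_outlier = statistics.median(with_outlier)  # stays near the bulk of the data

print("mean:", mean_value, "median:", median_value)
print("mode:", single_mode, "modes:", all_modes)
print("with outlier -> mean:", mean_with_outlier, "median:", median_with_outlier)
```

With the outlier added, the mean moves far from most of the data while the median barely changes, which is the reason the median is preferred for skewed data.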

Q3. Explain the concept of dispersion. How do variance and standard deviation measure the spread of data?
- Dispersion refers to how spread out or varied the data points in a dataset are. It helps us understand how much the values differ from each other and from the average (mean).
- Variance
  - It measures the average squared deviation of each data point from the mean and gives an idea of how spread out the data is.
  - A larger variance means more spread, while a smaller variance means the data is closer to the mean.
- Standard deviation
  - It is simply the square root of the variance.
  - It is more intuitive and commonly used in practice since it maintains the same unit as the data.
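
As a rough illustration, the sketch below computes both measures for the dataset 3, 5, 7, 9 from Q2. These notes do not say whether the population or the sample version is intended, so both are shown: the population version divides by n, the sample version by n - 1.

```python
import statistics

values = [3, 5, 7, 9]  # mean is 6, deviations are -3, -1, 1, 3

# Population versions: average squared deviation (divide by n).
population_variance = statistics.pvariance(values)
population_std = statistics.pstdev(values)

# Sample versions: divide by n - 1 to estimate the spread of a larger population.
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

print(f"population variance = {population_variance}, std = {population_std:.3f}")
print(f"sample variance = {sample_variance:.3f}, std = {sample_std:.3f}")
```

The squared deviations sum to 9 + 1 + 1 + 9 = 20, so the population variance is 20 / 4 = 5 and the sample variance is 20 / 3.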



Q4. What is a box plot, and what can it tell you about the distribution of data?
- A box plot is a graphical representation of the distribution of a dataset. It provides a visual summary of key statistical measures, including the median, quartiles, and potential outliers.

- What a box plot can tell you ▶
1. Central Tendency: The position of the median line within the box indicates the central tendency of the data.
2. Outliers: Points that fall beyond the whiskers are plotted individually, so potential outliers are easy to identify.
3. Comparison Between Groups: Box plots are particularly useful for comparing distributions between different groups or categories
4. Skewness – If the median is not centered inside the box, it suggests skewness:
 - Left-skewed (negative skew): Median closer to Q3.
 - Right-skewed (positive skew): Median closer to Q1.

Q5. Discuss the role of random sampling in making inferences about populations.
- Random sampling is essential for making valid inferences about populations. In simple random sampling every member of the population has an equal chance of being selected, so the sample tends to be representative, selection bias is reduced, and findings can be generalized to the population with quantifiable uncertainty.
- Example of Random Sampling in Inference
A company wants to know the average salary of its 5,000 employees. Instead of surveying everyone, they randomly select 300 employees and calculate their average salary, which serves as an estimate of the average salary of all 5,000 employees.
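
The salary example can be simulated as follows. This is only a sketch: the salaries are generated from an assumed normal distribution (mean 60,000, standard deviation 15,000), which is a hypothetical stand-in for the company's real payroll data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 5,000 simulated salaries (not real data).
population = [random.gauss(60_000, 15_000) for _ in range(5_000)]

# Simple random sample of 300 employees, drawn without replacement.
sample = random.sample(population, k=300)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# Standard error of the sample mean, estimated from the sample itself.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"population mean: {population_mean:,.0f}")
print(f"sample mean:     {sample_mean:,.0f} (standard error about {standard_error:,.0f})")
```

In practice the population mean is unknown; the simulation computes it only to show that the sample mean lands close to it.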

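Q6. Explain the concept of skewness and its types. How does skewness affect the interpretation of data?
- Skewness is a statistical measure that describes the asymmetry of the distribution of data points in a dataset. It indicates whether the data is skewed to the left (negatively skewed) or to the right (positively skewed) relative to the mean.
  - Positive (right) skew: the tail extends toward larger values, and the mean is typically greater than the median.
  - Negative (left) skew: the tail extends toward smaller values, and the mean is typically less than the median.
  - A symmetric distribution has zero skewness, and its mean and median coincide.
- Skewness affects interpretation because the mean is pulled toward the longer tail, so the median is often a better summary of skewed data. It also signals possible outliers and guides the choice of statistical methods, since many of them assume approximately symmetric data.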
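
One common way to quantify skewness is the moment coefficient, the third central moment divided by the second central moment raised to the power 1.5. The sketch below implements that definition by hand; the datasets are illustrative values, and other skewness formulas (for example, bias-corrected sample versions) give slightly different numbers.

```python
import statistics


def skewness(data):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (population moments)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


right_skewed = [2, 3, 3, 4, 4, 5, 20]     # illustrative values with a long right tail
left_skewed = [-x for x in right_skewed]  # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(f"{name}: skewness = {skewness(data):.3f}, "
          f"mean = {statistics.fmean(data):.2f}, median = {statistics.median(data)}")
```

For the right-skewed list the mean exceeds the median, and the sign of the coefficient flips when the data is mirrored.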

Q7. What is the interquartile range (IQR), and how is it used to detect outliers?
- The interquartile range (IQR) is a measure of statistical dispersion that represents the range within which the middle 50% of a dataset lies. It is calculated as the difference between the third quartile (Q3) and the first quartile (Q1).
- The following steps outline how to use the IQR for outlier detection:

 - Calculate the IQR as described above.

 - Determine the Lower and Upper Boundaries:
   - Lower Bound: $Q1 - 1.5 \times \text{IQR}$
   - Upper Bound: $Q3 + 1.5 \times \text{IQR}$

 - Identify Outliers: Any data point below the lower bound or above the upper bound is considered an outlier.
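
The steps above can be sketched with the standard `statistics` module. The dataset is made up for illustration, and the quartiles use the `"inclusive"` method of `statistics.quantiles`; other quartile conventions can give slightly different bounds.

```python
import statistics


def iqr_outliers(data):
    """Return (q1, q3, iqr, outliers) using the 1.5 * IQR rule."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_bound or x > upper_bound]
    return q1, q3, iqr, outliers


# Illustrative dataset (assumed values) with one suspiciously large observation.
data = [10, 12, 13, 14, 15, 16, 18, 19, 45]
q1, q3, iqr, outliers = iqr_outliers(data)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print("outliers:", outliers)
```

Here Q1 is 13 and Q3 is 18, so the bounds are 5.5 and 25.5, and only the value 45 falls outside them.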

Q8. Discuss the conditions under which the binomial distribution is used.
- The binomial distribution is used to model the number of successes in a fixed number of independent trials, where each trial has two possible outcomes (success or failure).

Key Conditions:

1. Fixed Number of Trials (n) – The total number of trials is predetermined.

2. Only Two Outcomes – Each trial results in either a success (with probability p) or a failure (with probability 1 - p).

3. Independent Trials – The outcome of one trial does not affect another.

4. Constant Probability (p) – The probability of success remains the same for all trials.
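
When all four conditions hold, the probability of exactly k successes is $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$. The sketch below evaluates this formula; the coin-flip scenario (10 fair flips, exactly 3 heads) is a hypothetical example, not one taken from these notes.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical example: 10 independent flips of a fair coin.
n, p = 10, 0.5
prob_three_heads = binomial_pmf(3, n, p)

# Sanity check: probabilities over all possible counts should sum to 1.
total_probability = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(f"P(exactly 3 heads in 10 flips) = {prob_three_heads:.4f}")
print(f"sum over k = 0..10: {total_probability:.4f}")
```

If the trials were not independent or p changed between trials, this formula would no longer apply.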

Q9. Explain the properties of the normal distribution and the empirical rule (68-95-99.7 rule).

 Properties of the Normal Distribution:

1. Symmetry: The curve is symmetric around the mean (μ), meaning the left and right sides are mirror images.

2. Mean = Median = Mode: In a normal distribution, all three measures of central tendency are equal.

3. Bell-Shaped Curve: The highest point is at the mean, and the probability density decreases as you move away from it.

4. Asymptotic: The curve never touches the x-axis but extends infinitely in both directions.

 The empirical rule applies to normally distributed data and states that:

 - 68% of data falls within one standard deviation (μ ± 1σ) of the mean.

 - 95% of data falls within two standard deviations (μ ± 2σ) of the mean.

 - 99.7% of data falls within three standard deviations (μ ± 3σ) of the mean.
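
The three percentages are rounded values, and they can be checked against the normal cumulative distribution function. A minimal sketch using `statistics.NormalDist` from the standard library (Python 3.8 or later), on the standard normal distribution with μ = 0 and σ = 1:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def prob_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


coverage = {k: prob_within(k) for k in (1, 2, 3)}

for k, probability in coverage.items():
    print(f"within {k} standard deviation(s): {probability:.4%}")
```

Because any normal distribution can be standardized, the same proportions hold for every choice of μ and σ.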


Q10. Provide a real-life example of a Poisson process and calculate the probability for a specific event.
- Suppose buses arrive at a bus stop at an average rate of 2 buses every 30 minutes. We want the probability that exactly 1 bus arrives in a 30-minute window.
- Average rate: λ = 2

- Desired number of arrivals: k = 1

- Use the Poisson formula:

$$P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}$$

$$P(X = 1) = \frac{e^{-2} \cdot 2^{1}}{1!} = 2e^{-2}$$

$$e^{-2} \approx 0.1353$$

$$P(X = 1) \approx 2 \times 0.1353 \approx 0.271$$
- Final Answer:
There is about a 27.1% chance that exactly 1 bus will arrive in 30 minutes.
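
The same calculation can be checked numerically. The function name `poisson_pmf` below is just a label chosen for this sketch; it evaluates the Poisson probability of k events when the average rate is λ.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): e^(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Bus example: average of 2 arrivals per 30 minutes, exactly 1 arrival.
prob_one_bus = poisson_pmf(1, 2)

# For comparison: probability of no bus at all in the same window.
prob_no_bus = poisson_pmf(0, 2)

print(f"P(exactly 1 bus) = {prob_one_bus:.4f}")
print(f"P(no bus)        = {prob_no_bus:.4f}")
```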



Q11. Explain what a random variable is and differentiate between discrete and continuous random variables.
- A random variable is a variable that represents the numerical outcome of a random experiment. It assigns a number to each possible outcome of that experiment.
- Example - You roll a fair six-sided die once.

 Let’s define a random variable X as:

 X= the number that appears on the top face of the die.

  Differences between discrete and continuous random variables ▶

- Discrete Random Variable
1. Values - Countable
2. Examples - Number of students, number of calls
3. Probability Function - Probability Mass Function
4. Probability of a Specific Value - Can be non-zero (given directly by the probability mass function)

- Continuous Random Variable

1. Values - Uncountable
2. Examples - Height, time, temperature
3. Probability Function - Probability Density Function
4. Probability of a Specific Value - Zero (probability is calculated over intervals)
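
The contrast in point 4 can be made concrete. In the sketch below the discrete variable is the die roll X defined earlier; the continuous variable is a hypothetical height, assumed to follow a normal distribution with mean 170 cm and standard deviation 10 cm purely for illustration.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: X = face of a fair six-sided die, described by a probability mass function.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
prob_die_is_3 = die_pmf[3]  # a single value carries non-zero probability

# Continuous: hypothetical heights, assumed Normal(170 cm, 10 cm).
height = NormalDist(mu=170, sigma=10)

# A single exact value corresponds to a zero-width interval, so its probability is zero.
prob_exactly_170 = height.cdf(170) - height.cdf(170)

# Probabilities are obtained over intervals instead.
prob_165_to_175 = height.cdf(175) - height.cdf(165)

print("P(X = 3) for the die:", prob_die_is_3)
print("P(height = 170 exactly):", prob_exactly_170)
print(f"P(165 <= height <= 175): {prob_165_to_175:.4f}")
```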





