# **Practice Questions:**

___
___
  **1. Explain the different types of data (qualitative and quantitative) and provide examples of each. Discuss nominal, ordinal, interval, and ratio scales.**

Ans.

  **Qualitative (Categorical):**
  
  Non-numeric data used to describe characteristics or categories.

        
  Nominal Scale: Categories without any specific order.

  Example: Eye color (blue, green, brown).

  Ordinal Scale: Categories with a specific order but without fixed intervals between categories.

  Example: Education level (high school, bachelor's, master's).

___

  **Quantitative (Numerical):**
  
  Numeric data that can be measured or counted.
  
  Interval Scale: Numeric values with meaningful differences but no true zero point.

  Example: Temperature in Celsius.

  Ratio Scale: Numeric values with meaningful differences and a true zero point.

  Example: Weight (in kg), height (in meters).
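
The difference between interval and ratio scales shows up when you take ratios. The sketch below uses hypothetical temperatures and weights, plus an approximate kg-to-lb factor, to show that a "twice as much" comparison changes under a unit change for Celsius but not for kilograms. The ordinal ranks at the end are likewise illustrative.

```python
# Interval vs. ratio: ratios are meaningful only when zero means "none of the quantity".

# Celsius (interval scale): 0 °C is an arbitrary reference point
c_low, c_high = 10.0, 20.0
ratio_celsius = c_high / c_low                       # naive ratio on the Celsius scale
ratio_kelvin = (c_high + 273.15) / (c_low + 273.15)  # same temperatures on an absolute scale
print(ratio_celsius, round(ratio_kelvin, 3))

# Weight (ratio scale): 0 kg means no weight, so ratios survive a unit change
KG_TO_LB = 2.20462  # approximate kilograms-to-pounds factor (assumption for illustration)
w_light, w_heavy = 40.0, 80.0
ratio_kg = w_heavy / w_light
ratio_lb = (w_heavy * KG_TO_LB) / (w_light * KG_TO_LB)
print(ratio_kg, round(ratio_lb, 6))

# Ordinal scale: ranks support ordering, but rank differences are not equal distances
education_rank = {"high school": 1, "bachelor's": 2, "master's": 3}
print(education_rank["master's"] > education_rank["high school"])
```

On the Celsius scale 20 / 10 gives 2.0, but the same two temperatures in kelvin give a ratio of only about 1.035, so "twice as hot" is not meaningful; 80 kg is twice 40 kg in any unit.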
___
  ___

  **2. What are the measures of central tendency, and when should you use each? Discuss the mean, median, and mode with examples and situations where each is appropriate.**

  Ans.
  
  **Mean:** The arithmetic average (sum of the values divided by their count). Best for numeric data that is roughly symmetric and free of extreme outliers.
  Example: Average exam score of students in a class.

  **Median:** The middle value of a dataset when arranged in order. Used when data has outliers or is skewed.
  Example: Median household income in a city.

  **Mode:** The most frequent value in a dataset. Used for categorical data or data with repeating values.
  Example: Most common shoe size in a store.
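
The effect of an outlier on the mean versus the median is easy to see with Python's built-in `statistics` module. The salary and shoe-size numbers below are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical salaries in thousands; one very large salary creates a long right tail
salaries = [30, 32, 35, 38, 40, 300]
print("mean:  ", statistics.mean(salaries))    # pulled upward by the single large value
print("median:", statistics.median(salaries))  # middle of the sorted values, unaffected by how extreme 300 is

# Hypothetical shoe sizes sold; the mode is the most frequent value
shoe_sizes = [8, 9, 9, 10, 9, 11]
print("mode:  ", statistics.mode(shoe_sizes))
```

The single salary of 300 drags the mean to about 79.2 while the median stays at 36.5, which is why the median is usually preferred for skewed data such as incomes.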
  ___
  ___

**3. Explain the concept of dispersion. How do variance and standard deviation measure the spread of data?**

Ans.

  **Dispersion:** Refers to the spread or variability of a dataset.

**Variance:** Measures the average of the squared differences from the mean.
Formula:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$$

  **Standard Deviation:** The square root of variance, indicating how spread out data points are around the mean.
Formula:

$$\sigma = \sqrt{\sigma^2}$$


```python
import numpy as np

data = [10, 12, 23, 23, 16, 23, 21, 16]
variance = np.var(data)       # population variance (divides by n)
std_deviation = np.std(data)  # square root of the variance
print(variance, std_deviation)
```

```
24.0 4.898979485566356
```
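
The formula can also be applied by hand. This sketch in plain Python reproduces the `np.var` and `np.std` results for the same data and, as an addition beyond the text above, shows the sample variance (dividing by n - 1), which `np.var` computes only when called with `ddof=1`.

```python
import statistics

data = [10, 12, 23, 23, 16, 23, 21, 16]
n = len(data)
mu = sum(data) / n  # mean of the data

# Population variance: average squared deviation from the mean (np.var's default)
pop_variance = sum((x - mu) ** 2 for x in data) / n
pop_std = pop_variance ** 0.5

# Sample variance divides by n - 1 instead (Bessel's correction)
sample_variance = sum((x - mu) ** 2 for x in data) / (n - 1)

print(pop_variance, pop_std)
print(sample_variance, statistics.variance(data))  # stdlib sample variance agrees
```

Use the population form when the data is the entire population and the sample form when estimating a population's variance from a sample.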


___
___

  **4. What is a box plot, and what can it tell you about the distribution of data?**

  Ans.
  
  **Box Plot:**
  
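  A graphical representation of data based on the five-number summary: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. The box spans Q1 to Q3 (the interquartile range) with a line at the median. In the common Tukey style, the whiskers extend only to the most extreme values within 1.5 × IQR of the box, and points beyond that are drawn individually as potential outliers. A box plot therefore shows the center, the spread, possible outliers, and skewness: for example, a median closer to Q1 together with a longer upper whisker suggests right skew.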
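
A box plot is drawn from the five-number summary, so computing those numbers shows what the plot will encode. This sketch uses NumPy's default (linear) percentile interpolation on a hypothetical right-skewed sample; other software may use slightly different quartile conventions. `five_number_summary` is a hypothetical helper, not a library function.

```python
import numpy as np

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum: the raw numbers behind a box plot."""
    arr = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(arr, [25, 50, 75])
    return arr.min(), q1, median, q3, arr.max()

# Hypothetical right-skewed sample (a long tail toward larger values)
data = [1, 2, 2, 3, 3, 4, 5, 7, 9, 14]
low, q1, med, q3, high = five_number_summary(data)
print(low, q1, med, q3, high)

# A wider upper half of the box (Q3 - median > median - Q1) suggests right skew
print("lower half of box:", med - q1, "upper half of box:", q3 - med)
```

With these numbers the IQR is 4.25 and the upper fence is 6.5 + 1.5 × 4.25 = 12.875, so a Tukey-style box plot would stop the upper whisker at 9 and draw the maximum, 14, as a separate point.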
  ___
  ___

  **5. Discuss the role of random sampling in making inferences about populations.**

Ans.


  **Random Sampling:**
  
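  Selecting a subset of individuals from a population such that every individual has an equal chance of being selected (simple random sampling). It does not guarantee that any single sample is representative, but it removes systematic selection bias, and because the selection is random, the sampling error can be quantified (for example, with standard errors and confidence intervals). This is what allows researchers to generalize findings from the sample to the whole population with a stated level of uncertainty.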
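
A small simulation makes this concrete. The sketch below assumes a synthetic population of 10,000 normally distributed "heights" (not real data), draws a simple random sample of 200 without replacement using NumPy's `Generator.choice`, and compares the sample mean with the population mean, using the standard error to describe the expected gap.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded so the run is reproducible

# Hypothetical population: 10,000 simulated heights in cm (assumption, not real data)
population = rng.normal(loc=170, scale=10, size=10_000)

# Simple random sample: every member has the same chance of selection, no repeats
sample = rng.choice(population, size=200, replace=False)

# Standard error: typical distance between a sample mean and the population mean
standard_error = sample.std(ddof=1) / np.sqrt(len(sample))

print(f"population mean: {population.mean():.2f}")
print(f"sample mean:     {sample.mean():.2f}")
print(f"standard error:  {standard_error:.2f}")
```

Rerunning with different seeds gives different sample means, but they tend to fall within a couple of standard errors of the population mean; that predictable behavior is what random selection provides.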

___
___
  **6. Explain the concept of skewness and its types. How does skewness affect the interpretation of data?**

  Ans.
  
  **Skewness:**
  
   Refers to the asymmetry of a distribution.

 **Positive Skew (Right-Skewed):** The tail on the right side is longer. Most values are concentrated on the left.

  **Negative Skew (Left-Skewed):** The tail on the left side is longer. Most values are concentrated on the right.

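  Skewness affects how measures of central tendency should be read. In a positively skewed distribution the mean is typically pulled toward the long right tail and exceeds the median; in a negatively skewed distribution it typically falls below the median. For skewed data, the median is therefore often a better summary of a typical value.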
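
A quick numerical check uses `scipy.stats.skew` on a small hypothetical dataset with one large value: the skewness comes out positive and the mean lands above the median.

```python
import numpy as np
from scipy.stats import skew

# Hypothetical right-skewed data: most values small, one value far out in the right tail
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]

skewness = skew(data)          # > 0 indicates positive (right) skew
mean_value = np.mean(data)     # pulled toward the tail
median_value = np.median(data)

print("skewness:", skewness)
print("mean:", mean_value, "median:", median_value)
```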

___
___

  **7. What is the interquartile range (IQR), and how is it used to detect outliers?**

  Ans.

  **IQR:**
  
  The range between the first quartile (Q1) and the third quartile (Q3).

  Formula:
  $$\text{IQR} = Q_3 - Q_1$$

___

  **Outlier Detection:**
  
  Data points below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ are considered outliers.


```python
import numpy as np

data = [10, 12, 23, 23, 16, 23, 21, 16]
q1 = np.percentile(data, 25)  # first quartile
q3 = np.percentile(data, 75)  # third quartile
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr  # values below this are flagged
upper_bound = q3 + 1.5 * iqr  # values above this are flagged
print(iqr, lower_bound, upper_bound)
```

```
8.0 3.0 35.0
```
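
The bounds only matter once they are applied to the data. In this sketch one hypothetical extreme value (80) is appended to the dataset, `scipy.stats.iqr` computes the IQR, and a boolean mask flags the points outside the bounds.

```python
import numpy as np
from scipy.stats import iqr

# The dataset above with one hypothetical extreme value (80) appended for illustration
data = np.array([10, 12, 23, 23, 16, 23, 21, 16, 80])

q1, q3 = np.percentile(data, [25, 75])
spread = iqr(data)  # equals q3 - q1 with the default linear interpolation
lower, upper = q1 - 1.5 * spread, q3 + 1.5 * spread

is_outlier = (data < lower) | (data > upper)  # True where a value falls outside the bounds
print("bounds:", lower, upper)
print("outliers:", data[is_outlier])
```

Adding the ninth value shifts the quartiles, so the bounds become 5.5 and 33.5, and only 80 is flagged.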


___
___

**8. Discuss the conditions under which the binomial distribution is used.**

Ans.

The binomial distribution is used when:

  - There is a fixed number of trials.

  - Each trial has two possible outcomes (success or failure).

  - The probability of success is the same for each trial.

  - The trials are independent.

Example:

Tossing a coin 10 times to count the number of heads.
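
For the coin example, the probability of exactly $k$ successes in $n$ trials is $\binom{n}{k} p^k (1-p)^{n-k}$. The sketch below assumes a fair coin ($p = 0.5$) and asks for exactly 5 heads in 10 tosses, computing the value both from the formula and with `scipy.stats.binom`.

```python
from math import comb
from scipy.stats import binom

n, p, k = 10, 0.5, 5  # 10 tosses of a fair coin (assumed), exactly 5 heads

# Binomial PMF: C(n, k) * p^k * (1 - p)^(n - k)
manual = comb(n, k) * p**k * (1 - p) ** (n - k)
library = binom.pmf(k, n, p)
print(manual, library)
```

Both give about 0.246 (exactly 252 / 1024), so even the most likely count of heads occurs in only about a quarter of runs.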
___
___

**9. Explain the properties of the normal distribution and the empirical rule (68-95-99.7 rule).**

Ans.


**Normal Distribution:**

A symmetric, bell-shaped distribution in which the mean, median, and mode are equal. It is fully described by its mean ($\mu$) and standard deviation ($\sigma$).

**Empirical Rule (68-95-99.7 Rule):**


- About 68% of the data falls within 1 standard deviation of the mean ($\mu \pm \sigma$).
- About 95% falls within 2 standard deviations ($\mu \pm 2\sigma$).
- About 99.7% falls within 3 standard deviations ($\mu \pm 3\sigma$).
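
The percentages in the empirical rule are rounded standard normal probabilities. A small check with `scipy.stats.norm`, relying on the fact that the probability of landing within $\mu \pm k\sigma$ is the same for every normal distribution:

```python
from scipy.stats import norm

# P(-k < Z < k) for the standard normal equals P(mu - k*sigma < X < mu + k*sigma) for any normal X
coverage = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"within {k} SD: {prob:.4f}")
```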
___
___

**10. Provide a real-life example of a Poisson process and calculate the probability for a specific event.**

Ans.


**Poisson Process:**

Models the number of times an event occurs in a fixed interval of time or space.

Example:

A call center receives 5 calls per hour on average. What is the probability that the center receives exactly 3 calls in a given hour?

Formula:

$$P(x;\lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

For $\lambda = 5$, $x = 3$:

```python
from scipy.stats import poisson

# P(X = 3) for a Poisson distribution with mean 5
probability = poisson.pmf(3, 5)
print(probability)
```

```
0.1403738958142805
```
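
To connect the formula with the library call, the sketch below evaluates $P(3; 5)$ directly with the standard library and, as an extra illustration not asked in the question, computes the probability of at most 3 calls with `poisson.cdf`.

```python
import math
from scipy.stats import poisson

lam, x = 5, 3  # average of 5 calls per hour; exactly 3 calls

# Direct use of the PMF formula: lam^x * e^(-lam) / x!
manual = lam**x * math.exp(-lam) / math.factorial(x)

# "At most 3 calls" sums the PMF over 0..3; poisson.cdf returns that total
at_most_3 = poisson.cdf(x, lam)
print(round(manual, 4), round(at_most_3, 4))
```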

___
___
**11. Explain what a random variable is and differentiate between discrete and continuous random variables.**

Ans.

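**Random Variable:** A variable whose value is a numerical outcome of a random experiment (formally, a function that assigns a number to each possible outcome).

**Discrete Random Variable:** Takes a countable set of values, either finite or countably infinite (such as 0, 1, 2, and so on).
Example: Number of heads when flipping a coin three times.

**Continuous Random Variable:** Can take any value within an interval, so it has uncountably many possible values; probabilities are assigned to ranges of values rather than to individual values.
Example: The height of individuals.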
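
The contrast can be sketched in plain Python: enumerating the eight outcomes of three fair coin flips gives a probability for each countable value of the number of heads, while for a continuous variable probabilities attach to intervals. The height model Normal(170 cm, 10 cm) is an assumption made only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from statistics import NormalDist

# Discrete: X = number of heads in three fair coin flips
outcomes = list(product("HT", repeat=3))  # 8 equally likely outcomes
counts = Counter(flips.count("H") for flips in outcomes)
pmf = {k: Fraction(v, len(outcomes)) for k, v in sorted(counts.items())}
print(pmf)  # a probability for each value 0, 1, 2, 3

# Continuous: Y = height in cm, assumed Normal(170, 10) for illustration only
height = NormalDist(mu=170, sigma=10)
# Probabilities attach to intervals; any single exact height has probability 0
p_between = height.cdf(180) - height.cdf(160)
print(round(p_between, 4))
```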

___
___

**12. Provide an example dataset, calculate both covariance and correlation, and interpret the results.**

Ans.

Example Dataset:

```python
import numpy as np

x = [10, 20, 30, 40, 50]
y = [12, 24, 33, 48, 55]

covariance = np.cov(x, y)[0, 1]        # sample covariance (divides by n - 1)
correlation = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
print(covariance, correlation)
```

```
275.0 0.9954037839433629
```

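**Covariance:** Indicates the direction of the linear relationship between two variables: positive if they tend to increase together, negative if one tends to decrease as the other increases. Its magnitude depends on the variables' units, so it is hard to compare across datasets. Here, 275.0 > 0 means y tends to increase as x increases.

**Correlation:** Covariance rescaled by both standard deviations, so it is unitless and lies between -1 and 1; it measures both the direction and the strength of the linear relationship. Here, r ≈ 0.995 indicates a very strong positive linear relationship.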
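
To make the interpretation concrete, the sketch below computes both quantities from their definitions for the same dataset and then multiplies x by 100 (a hypothetical unit change) to show that covariance scales with the units while correlation does not.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50], dtype=float)
y = np.array([12, 24, 33, 48, 55], dtype=float)

# Sample covariance: divide by n - 1, matching np.cov's default
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
# Pearson correlation: covariance divided by both sample standard deviations
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# Rescaling x (a hypothetical unit change) multiplies covariance by 100 but leaves r unchanged
x_scaled = x * 100
cov_scaled = np.cov(x_scaled, y)[0, 1]
r_scaled = np.corrcoef(x_scaled, y)[0, 1]

print(cov_xy, r)
print(cov_scaled, r_scaled)
```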
___
___