# Question 1

Data can be broadly classified into two types: **qualitative** and **quantitative**.

### Qualitative Data
Qualitative data describes qualities or characteristics. It's non-numerical and often categorical. This type of data is useful for capturing the richness of information that can't be quantified.

**Examples:**
- **Color of cars:** Red, blue, green.
- **Types of cuisine:** Italian, Chinese, Indian.
- **Customer feedback:** Satisfied, neutral, dissatisfied.

### Quantitative Data
Quantitative data is numerical and can be measured. It provides information about quantities and is often used for statistical analysis.

**Examples:**
- **Height of students:** 150 cm, 160 cm, 170 cm.
- **Number of books in a library:** 100, 200, 300.
- **Monthly sales figures:** $5,000, $10,000, $15,000.

### Scales of Measurement
In addition to being qualitative or quantitative, data can also be classified by the level of measurement. There are four main scales: **nominal, ordinal, interval,** and **ratio**.

#### 1. Nominal Scale
- **Definition:** Categorizes data without any order or hierarchy.
- **Example:** Types of animals in a zoo—lion, tiger, bear. There's no inherent order.

#### 2. Ordinal Scale
- **Definition:** Categorizes data with a specific order but without a fixed interval between the categories.
- **Example:** Movie ratings—excellent, good, fair, poor. The order matters, but the difference between ratings is not uniform.

#### 3. Interval Scale
- **Definition:** Numeric scales with equal intervals between values, but no true zero point.
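- **Example:** Temperature in Celsius or Fahrenheit. The difference between 20°C and 30°C is the same as between 30°C and 40°C. However, 0°C is an arbitrary reference point (the freezing point of water), not an absence of heat, so ratios are not meaningful: 20°C is not twice as hot as 10°C.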

#### 4. Ratio Scale
- **Definition:** Numeric scales with equal intervals and a true zero point.
- **Example:** Weight—50 kg, 100 kg. Here, 0 kg means no weight, and ratios are meaningful (100 kg is twice as heavy as 50 kg).
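The sketch below shows which operations make sense on each scale. It is our own illustration, not part of the original answer. The names `MovieRating`, `animals`, and `celsius_to_kelvin` are hypothetical and were chosen only for this example. The Kelvin conversion shows why a ratio of Celsius readings carries no meaning.

```python
from enum import IntEnum

# Ordinal scale: categories have an order, so comparisons are meaningful,
# but the numeric codes say nothing about the size of the gaps between them.
class MovieRating(IntEnum):
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4

# Nominal scale: only equality and counting make sense; there is no order.
animals = ["lion", "tiger", "bear", "lion"]
lion_count = animals.count("lion")

# Interval scale: 0 °C is an arbitrary zero, so a ratio of Celsius values
# changes once the readings are converted to Kelvin, which has a true zero.
def celsius_to_kelvin(c: float) -> float:
    return c + 273.15

ratio_celsius = 20 / 10                                       # looks like "twice", but is not
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)  # only about 1.035

# Ratio scale: weight has a true zero, so 100 kg really is twice 50 kg.
weight_ratio = 100 / 50

print(MovieRating.EXCELLENT > MovieRating.GOOD)  # True
print(lion_count, ratio_celsius, round(ratio_kelvin, 3), weight_ratio)
```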



# Question 2


Measures of central tendency are statistical measures that describe the center point or typical value of a dataset. The three main measures are **mean**, **median**, and **mode**. Each has its own strengths and is appropriate for different situations.

### Mean
- **Definition:** The mean is the average of a set of numbers. It's calculated by summing all the values and then dividing by the number of values.
- **Example:** In the dataset [2, 4, 6, 8, 10], the mean is (2 + 4 + 6 + 8 + 10) / 5 = 6.
- **When to Use:** The mean is useful when the data is symmetrically distributed without outliers. It takes into account all the values, providing a comprehensive measure of the central tendency.
- **Situations:** Calculating the average exam score of a class, the average temperature over a week.

### Median
- **Definition:** The median is the middle value of a dataset when arranged in ascending or descending order. If there's an even number of values, the median is the average of the two middle values.
- **Example:** In the dataset [2, 4, 6, 8, 10], the median is 6. In [2, 4, 6, 8, 10, 12], the median is (6 + 8) / 2 = 7.
- **When to Use:** The median is useful when the data has outliers or is skewed. It provides a better measure of central tendency in such cases as it is not affected by extreme values.
- **Situations:** Determining the median household income in a region, the median age of participants in a survey.

### Mode
- **Definition:** The mode is the value that appears most frequently in a dataset. A dataset can have one mode, more than one mode, or no mode at all.
- **Example:** In the dataset [2, 4, 4, 6, 8, 10], the mode is 4. In [2, 2, 4, 4, 6, 8, 10], the dataset is bimodal with modes 2 and 4.
- **When to Use:** The mode is useful for categorical data or when identifying the most common value. It helps to understand the frequency distribution of the dataset.
- **Situations:** Finding the most common shoe size sold in a store, the most popular product in a survey.
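Python's standard-library `statistics` module can compute all three measures. Using it here is our choice; the original answer names no tool. The sketch reproduces the worked examples above.

```python
import statistics

data = [2, 4, 6, 8, 10]
mean_value = statistics.mean(data)                      # (2 + 4 + 6 + 8 + 10) / 5
median_odd = statistics.median(data)                    # middle value of an odd-length list
median_even = statistics.median([2, 4, 6, 8, 10, 12])   # average of 6 and 8

# statistics.mode returns a single most common value;
# statistics.multimode returns every value tied for most common.
mode_value = statistics.mode([2, 4, 4, 6, 8, 10])
modes = statistics.multimode([2, 2, 4, 4, 6, 8, 10])

print(mean_value, median_odd, median_even, mode_value, modes)
```

For the bimodal dataset, `multimode` returns both modes in the order they first appear.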



# Question 3


Dispersion, also known as variability or spread, measures the extent to which data points in a dataset differ from each other and from the central tendency (mean, median, or mode). In simpler terms, it shows how spread out the data values are. Measures of dispersion help us understand the distribution and reliability of the data.

### Measures of Dispersion
Two key measures of dispersion are **variance** and **standard deviation**.

#### Variance
- **Definition:** Variance quantifies the average squared deviation of each data point from the mean. It provides a measure of how much the values in the dataset differ from the mean.
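- **Formula:** For a dataset with \( n \) values \((x_1, x_2, \ldots, x_n)\) and mean \(\bar{x}\), treated as the entire population, the variance (\(\sigma^2\)) is calculated as:
  $$
  \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2
  $$
  When the data are a sample used to estimate a population's variance, the sum is usually divided by \( n - 1 \) instead, which gives the sample variance \( s^2 \).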
- **Example:** For the dataset [2, 4, 6], the mean is 4. The variance is \(\frac{1}{3}[(2-4)^2 + (4-4)^2 + (6-4)^2] = \frac{1}{3}[4 + 0 + 4] = \frac{8}{3} \approx 2.67\).
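- **Usefulness:** Variance is a building block of many inferential methods (for example, ANOVA and regression), and the variances of independent variables add together. However, its units are the square of the data's units (e.g., cm² for heights measured in cm), which makes it harder to interpret directly.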

#### Standard Deviation
- **Definition:** Standard deviation is the square root of the variance. It describes the typical distance of data points from the mean (formally, the root-mean-square deviation).
- **Formula:** The standard deviation (\(\sigma\)) is calculated as:
  $$
  \sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}
  $$
- **Example:** For the dataset [2, 4, 6], the variance is \(\approx 2.67\). The standard deviation is \(\sqrt{2.67} \approx 1.63\).
- **Usefulness:** Standard deviation is widely used because it has the same units as the original data, making it easier to interpret. It helps to understand the spread of data and identify outliers.

### When to Use Each Measure
- **Variance:** Use variance when you need to quantify the overall variability of the data, especially in statistical modeling and hypothesis testing.
- **Standard Deviation:** Use standard deviation when you want a measure of dispersion that is easier to interpret and understand. It's commonly used in descriptive statistics and when comparing different datasets.
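The sketch below checks the [2, 4, 6] example with Python's `statistics` module (our choice of tool). `pvariance` and `pstdev` divide by \( n \), as in the formulas above. `variance` and `stdev` divide by \( n - 1 \), which gives the sample versions.

```python
import statistics

data = [2, 4, 6]

# Population variance and standard deviation (divide by n), as in the formulas above.
pop_var = statistics.pvariance(data)   # 8/3, about 2.67
pop_sd = statistics.pstdev(data)       # sqrt(8/3), about 1.63

# Sample versions divide by n - 1 instead.
sample_var = statistics.variance(data)  # 8 / 2 = 4
sample_sd = statistics.stdev(data)      # sqrt(4) = 2

# The same population variance computed directly from the definition.
mean = sum(data) / len(data)
manual_var = sum((x - mean) ** 2 for x in data) / len(data)

print(round(pop_var, 2), round(pop_sd, 2), sample_var, sample_sd, round(manual_var, 2))
```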


# Question 4



A **box plot** (or **box-and-whisker plot**) is a graphical representation that summarizes a dataset through its quartiles, highlighting its distribution, central value, and variability. Here's what it can tell you:

### Key Components of a Box Plot
1. **Median (Q2):** The middle value of the dataset, represented by a line inside the box.
2. **Quartiles:**
   - **First Quartile (Q1):** The median of the lower half of the data (25th percentile).
   - **Third Quartile (Q3):** The median of the upper half of the data (75th percentile).
3. **Interquartile Range (IQR):** The range between Q1 and Q3 (IQR = Q3 - Q1), represented by the box.
4. **Whiskers:** Lines extending from the box to the smallest and largest values within 1.5 times the IQR from Q1 and Q3, respectively.
5. **Outliers:** Data points beyond the whiskers, often represented as individual dots.

### What a Box Plot Tells You
- **Central Value:** The median line inside the box shows the dataset's central value.
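- **Spread:** The length of the box (the IQR) shows how spread out the middle 50% of the data is. A longer box means more variability in the middle of the data.
- **Symmetry:** The position of the median within the box and the relative lengths of the whiskers show whether the data is symmetric or skewed.
  - **Symmetric Distribution:** The median line sits near the middle of the box, and the whiskers are roughly equal in length.
  - **Skewed Distribution:** The median line sits closer to one end of the box, and the whiskers are unequal. A median near Q1 with a longer upper whisker suggests right (positive) skew; the reverse suggests left (negative) skew.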
- **Outliers:** Points beyond the whiskers highlight unusual values or potential data entry errors.

### Example
Consider this dataset of test scores: [55, 65, 70, 75, 80, 85, 90, 95, 100]
- **Median (Q2):** 80
- **Q1:** 70
- **Q3:** 90
- **IQR:** 20 (90 - 70)
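- **Whiskers:** The fences are at Q1 - 1.5*IQR = 40 and Q3 + 1.5*IQR = 120. The whiskers therefore extend to the most extreme data points inside the fences: 55 and 100.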
- **Outliers:** None

A box plot for this data would show the median (80), the box spanning from 70 to 90, and whiskers from 55 to 100.
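The sketch below computes the five box-plot quantities for the test scores. We assume Python's `statistics.quantiles` with `method="inclusive"`; for this dataset, that convention reproduces Q1 = 70 and Q3 = 90. Other quartile conventions can give slightly different values.

```python
import statistics

scores = [55, 65, 70, 75, 80, 85, 90, 95, 100]

# Quartiles; the "inclusive" method matches the hand-computed Q1 and Q3 here.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Tukey's 1.5 x IQR fences: values outside them are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that still lie inside the fences.
inside = [x for x in scores if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr, lower_fence, upper_fence, whisker_low, whisker_high, outliers)
```

To draw the plot, `matplotlib.pyplot.boxplot(scores)` applies the same 1.5 × IQR whisker rule by default (its `whis` parameter).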

# Question 5

Random sampling is a fundamental technique in statistics used to make inferences about a population based on a smaller, manageable subset of that population. The goal is to obtain a sample that accurately represents the broader population, enabling researchers to draw conclusions with a known level of confidence.

### Importance of Random Sampling
1. **Representation:** In simple random sampling, every member of the population has an equal chance of being selected; more generally, random sampling gives every member a known, nonzero chance of selection. This makes the sample representative of the population on average, which reduces selection bias and improves the validity of the inferences.
2. **Generalization:** By analyzing a randomly selected sample, researchers can generalize their findings to the larger population. This is crucial in studies where surveying the entire population is impractical due to time, cost, or logistical constraints.
3. **Reliability:** Because selection is random, estimates such as the sample mean are unbiased: they do not systematically overestimate or underestimate the population parameter. In addition, the size of the sampling error can be quantified with probability theory, which leads to more accurate and trustworthy conclusions.

### Types of Random Sampling
1. **Simple Random Sampling:** Each member of the population has an equal chance of being selected. This can be done using random number generators or drawing lots.
   - **Example:** Drawing names from a hat to select participants for a survey.
2. **Stratified Random Sampling:** The population is divided into subgroups (strata) based on certain characteristics (e.g., age, gender). A random sample is then drawn from each stratum.
   - **Example:** Surveying students by randomly selecting equal numbers of boys and girls from each grade level.
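3. **Systematic Random Sampling:** A sampling interval \(k\) is chosen (the population size divided by the sample size). A starting point is selected at random among the first \(k\) members of a list, and then every \(k\)th member after it is chosen.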
   - **Example:** Selecting every 10th person from a list of customers for feedback.
4. **Cluster Sampling:** The population is divided into clusters, and entire clusters are randomly selected. This method is useful when the population is spread over a large area.
   - **Example:** Randomly selecting a few schools from a district and surveying all students in those schools.
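The minimal Python sketch below illustrates the four sampling designs using the standard `random` module. The population of 1,000 IDs, the even/odd strata, and the blocks of 50 used as clusters are all hypothetical stand-ins. They are not data from the original examples.

```python
import random

rng = random.Random(42)  # seeded so the example is reproducible

# Hypothetical population of 1,000 customer IDs (illustrative only).
population = list(range(1, 1001))

# 1. Simple random sampling: every member has the same chance of selection.
simple = rng.sample(population, 50)

# 2. Stratified sampling: draw a random sample separately within each stratum.
#    The strata here are a hypothetical split into even and odd IDs.
strata = {
    "even": [x for x in population if x % 2 == 0],
    "odd": [x for x in population if x % 2 == 1],
}
stratified = {name: rng.sample(members, 25) for name, members in strata.items()}

# 3. Systematic sampling: random start within the first k members, then every k-th.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# 4. Cluster sampling: split into clusters, then keep whole clusters chosen at random.
#    Hypothetical clusters: 20 consecutive blocks of 50 IDs each.
clusters = [population[i:i + 50] for i in range(0, len(population), 50)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [x for cluster in chosen_clusters for x in cluster]

print(len(simple), {name: len(s) for name, s in stratified.items()},
      len(systematic), len(cluster_sample))
```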

### Making Inferences
Random sampling allows researchers to make inferences about population parameters (e.g., mean, proportion) based on sample statistics. Here’s how it works:
1. **Point Estimation:** Using sample data to estimate population parameters. For example, the sample mean can estimate the population mean.
2. **Confidence Intervals:** Providing a range of values within which the population parameter is likely to fall. Confidence intervals account for sampling variability and give an estimate of precision.
3. **Hypothesis Testing:** Assessing whether the sample provides enough evidence to reject a hypothesis about the population (the null hypothesis), or whether the data are consistent with it.

### Example
Imagine a company wants to know the average job satisfaction level of its employees. Surveying all employees is impractical, so they use random sampling:
- They use simple random sampling to select 100 employees from the entire workforce.
- The average job satisfaction score from the sample is calculated.
- Using this sample mean, they construct a confidence interval to estimate the average job satisfaction for all employees.
- They perform hypothesis testing to determine if the satisfaction level meets a certain threshold.
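The sketch below walks through the job-satisfaction workflow. It assumes a synthetic workforce: 2,000 invented scores on a 1 to 10 scale. It also uses a hypothetical threshold of 6.5. The interval and test use the normal approximation; that is our choice, since the original does not specify a method. A t-based interval would be slightly wider.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic stand-in for the company's workforce: 2,000 satisfaction scores
# on a 1-10 scale. These numbers are invented for illustration only.
workforce = [min(10, max(1, round(rng.gauss(7, 1.5)))) for _ in range(2000)]

# Simple random sample of 100 employees.
sample = rng.sample(workforce, 100)

# Point estimate: the sample mean.
x_bar = statistics.mean(sample)

# Approximate 95% confidence interval using the normal critical value.
s = statistics.stdev(sample)
z = statistics.NormalDist().inv_cdf(0.975)   # about 1.96
standard_error = s / len(sample) ** 0.5
ci = (x_bar - z * standard_error, x_bar + z * standard_error)

# One-sided test of "mean satisfaction exceeds a hypothetical threshold of 6.5".
threshold = 6.5
z_stat = (x_bar - threshold) / standard_error
p_value = 1 - statistics.NormalDist().cdf(z_stat)

print(round(x_bar, 2), tuple(round(v, 2) for v in ci), round(p_value, 4))
```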



# Question 6

### Skewness
Skewness is a measure of the asymmetry or lack of symmetry in the distribution of data. It tells us if the data is skewed to the left (negatively skewed), to the right (positively skewed), or if it is symmetric (no skewness). Skewness affects how we interpret the central tendency and the spread of the data.

### Types of Skewness

1. **Positive Skewness (Right Skew)**
   - **Definition:** When the tail on the right side of the distribution is longer or fatter than the left side.
   - **Characteristics:** The mean is usually greater than the median, and the data has a longer right tail.
   - **Example:** Income distribution, where a small number of people earn significantly more than the majority.

2. **Negative Skewness (Left Skew)**
   - **Definition:** When the tail on the left side of the distribution is longer or fatter than the right side.
   - **Characteristics:** The mean is usually less than the median, and the data has a longer left tail.
   - **Example:** Age at retirement, where most people retire at a certain age, but some retire much earlier.

3. **Zero Skewness (Symmetric Distribution)**
   - **Definition:** When the distribution is perfectly symmetrical, having no skewness.
   - **Characteristics:** The mean and median are equal; if the distribution is also unimodal, the mode equals them as well.
   - **Example:** Heights of adults in a large population, assuming a normal distribution.

### Effect of Skewness on Data Interpretation

1. **Mean, Median, and Mode:**
   - In a positively skewed distribution, the mean is typically higher than the median, because the long right tail pulls the mean to the right.
   - In a negatively skewed distribution, the mean is typically lower than the median, because the long left tail pulls the mean to the left.
   - In a symmetric distribution, the mean, median, and mode are equal or very close to each other.

2. **Data Spread:**
   - Skewness indicates that values are not spread evenly around the center; they stretch farther on one side. A single symmetric summary such as mean ± standard deviation can therefore understate the spread in the long tail and overstate it in the short one.

3. **Data Analysis:**
   - Skewness affects the choice of statistical measures and tests. For example, in skewed data, the median is often a better measure of central tendency than the mean.
   - Statistical models that assume normality (symmetry) may not be appropriate for skewed data, and transformations or non-parametric methods may be needed.
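The sketch below measures skewness with the moment coefficient \( g_1 = m_3 / m_2^{3/2} \) and compares the mean with the median. We use \( g_1 \) because it is a common choice; the original names no specific formula. Both datasets are hypothetical.

```python
import statistics

def moment_skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data: one large value stretches the right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]
# Hypothetical symmetric data.
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right-skewed", right_skewed), ("symmetric", symmetric)]:
    print(name, round(moment_skewness(data), 3),
          "mean:", statistics.mean(data), "median:", statistics.median(data))
```

With its default `bias=True`, `scipy.stats.skew` computes the same \( g_1 \) coefficient. For the right-skewed list, \( g_1 \) is positive and the mean (3.5) exceeds the median (3).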

### Examples of Skewness in Real Life
- **Positive Skewness:** Household income, where a few high-income households skew the data.
- **Negative Skewness:** Age of retirement, where most people retire around a certain age, but some retire much earlier.
- **Zero Skewness:** Heights of adults, which tend to follow a normal distribution.



# Question 7


### Interquartile Range (IQR)

The Interquartile Range (IQR) is a measure of statistical dispersion, which is the spread of the data points in a dataset. It represents the range within which the middle 50% of the data lies. The IQR is calculated by subtracting the first quartile (Q1) from the third quartile (Q3):
$$
\text{IQR} = Q3 - Q1
$$

### Quartiles
- **First Quartile (Q1):** The 25th percentile, marking the point below which 25% of the data falls.
- **Third Quartile (Q3):** The 75th percentile, marking the point below which 75% of the data falls.

### Example
For the dataset [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
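- Q1 (25th percentile) is 3, the median of the lower half [1, 2, 3, 4, 5].
- Q3 (75th percentile) is 8, the median of the upper half [6, 7, 8, 9, 10].
- IQR = 8 - 3 = 5.

Other quartile conventions give slightly different values. For example, linear interpolation (the default method of NumPy's `percentile`) gives Q1 = 3.25 and Q3 = 7.75, so IQR = 4.5.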

### Using IQR to Detect Outliers

Outliers are data points that lie significantly outside the range of the rest of the data. The IQR is used to identify these outliers by defining an acceptable range for data points.

1. **Calculate the lower and upper bounds:**
   - **Lower Bound:** \( Q1 - 1.5 \times \text{IQR} \)
   - **Upper Bound:** \( Q3 + 1.5 \times \text{IQR} \)

2. **Identify outliers:**
   - Any data point below the lower bound or above the upper bound is considered an outlier.

### Example
Continuing with our dataset [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
- IQR = 5
- Lower Bound = \( 3 - 1.5 \times 5 = -4.5 \)
- Upper Bound = \( 8 + 1.5 \times 5 = 15.5 \)

Since all data points (1 to 10) fall within the range -4.5 to 15.5, there are no outliers in this dataset.
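Below is a minimal sketch of the 1.5 × IQR rule. The helpers `median` and `iqr_outliers` are hypothetical names. Quartiles are taken as medians of the lower and upper halves; for odd \( n \), the overall median is included in both halves (Tukey's hinges). We chose this convention because it reproduces Q1 = 3 and Q3 = 8 here, and Q1 = 70 and Q3 = 90 for the box-plot example in Question 4.

```python
def median(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

def iqr_outliers(data, k=1.5):
    """Return (q1, q3, lower_bound, upper_bound, outliers) using the k x IQR rule.

    Quartiles are medians of the lower and upper halves; for odd n the
    overall median is included in both halves (Tukey's hinges).
    """
    values = sorted(data)
    n = len(values)
    q1 = median(values[: (n + 1) // 2])
    q3 = median(values[n // 2:])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return q1, q3, lower, upper, outliers

print(iqr_outliers(range(1, 11)))                 # (3, 8, -4.5, 15.5, [])
print(iqr_outliers(list(range(1, 11)) + [30]))    # a hypothetical extra value of 30
```

Appending a hypothetical value of 30 shifts the quartiles to 3.5 and 8.5, which moves the upper bound to 16, so 30 is flagged.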

### Real-Life Application
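Consider a dataset of student exam scores. Suppose most students score between 60 and 90, with the middle half between, say, 70 and 85 (IQR = 15). The fences are then 70 - 22.5 = 47.5 and 85 + 22.5 = 107.5. A few scores below 30 would be flagged as outliers, but a score of 95 would not. Flagged scores may indicate exceptional performance or issues that need investigation.

Quartiles are not pulled by extreme values. This makes the IQR rule more robust than rules based on the mean and standard deviation, which the outliers themselves can inflate. In strongly skewed data, however, the rule may flag many legitimate values in the long tail.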


# Question 8

The **binomial distribution** is a discrete probability distribution that describes the number of successes in a fixed number of independent trials of a binary (yes/no) experiment. For the binomial distribution to be applicable, certain conditions must be met:

### Conditions for Using the Binomial Distribution

1. **Fixed Number of Trials (n):**
   - The experiment consists of a fixed number of trials, denoted by \( n \).
   - Example: Flipping a coin 10 times.

2. **Binary Outcomes:**
   - Each trial has only two possible outcomes: success or failure.
   - Example: Getting a head (success) or a tail (failure) in a coin toss.

3. **Independence:**
   - The outcomes of each trial are independent of each other. The result of one trial does not affect the outcome of another.
   - Example: The result of each coin toss does not influence the next coin toss.

4. **Constant Probability of Success (p):**
   - The probability of success \( p \) remains constant for each trial.
   - Example: The probability of getting a head in a fair coin toss is always 0.5.

### Parameters of the Binomial Distribution

- **n (Number of Trials):** The total number of trials in the experiment.
- **p (Probability of Success):** The probability of success in each trial.
- **q (Probability of Failure):** The probability of failure in each trial, where \( q = 1 - p \).

### Probability Mass Function (PMF)

The probability of observing exactly \( k \) successes in \( n \) trials is given by the binomial probability mass function (PMF):
$$
P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}
$$
where \(\binom{n}{k}\) is the binomial coefficient, representing the number of ways to choose \( k \) successes out of \( n \) trials.

### Example

Let's say we have a fair coin (p = 0.5) and we want to find the probability of getting exactly 3 heads in 5 flips:
- \( n = 5 \)
- \( p = 0.5 \)
- \( k = 3 \)

Using the binomial PMF:
$$
P(X = 3) = \binom{5}{3} (0.5)^3 (0.5)^{5-3} = 10 \times 0.125 \times 0.25 = 0.3125
$$

So, the probability of getting exactly 3 heads in 5 coin flips is 0.3125.
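The binomial PMF translates directly into Python with `math.comb`. The sketch below reproduces the coin example and checks that the probabilities of 0 through \( n \) successes sum to 1. The helper name `binomial_pmf` is our own.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

prob_3_heads = binomial_pmf(3, 5, 0.5)
print(prob_3_heads)  # 0.3125

# The probabilities of 0..n successes sum to 1.
total = sum(binomial_pmf(k, 5, 0.5) for k in range(6))
```

If SciPy is available, `scipy.stats.binom.pmf(3, 5, 0.5)` computes the same probability.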

### Applications

- **Quality Control:** Checking the number of defective items in a batch.
- **Survey Sampling:** Estimating the proportion of people who favor a certain choice.
- **Medical Trials:** Determining the number of patients who respond to a treatment.


# Question 9

### Properties of the Normal Distribution

The **normal distribution** (or Gaussian distribution) is one of the most important probability distributions in statistics. It is a continuous probability distribution whose density curve is symmetric and bell-shaped.

Here are the key properties of the normal distribution:

1. **Symmetry:**
   - The normal distribution is symmetric about its mean. This means the left half is a mirror image of the right half.
   - The mean, median, and mode of the distribution are all equal and located at the center of the distribution.

2. **Bell-Shaped Curve:**
   - The distribution forms a bell-shaped curve.
   - The tails of the distribution approach the horizontal axis asymptotically, meaning they get closer and closer to the axis but never actually touch it.

3. **Mean and Standard Deviation:**
   - The shape of the normal distribution is defined by two parameters: the mean (μ) and the standard deviation (σ).
   - The mean determines the center of the distribution, while the standard deviation controls the spread or width of the distribution.

4. **68-95-99.7 Rule (Empirical Rule):**
   - This rule states how data is distributed around the mean in a normal distribution.
   - About **68%** of the data falls within one standard deviation (σ) of the mean (μ); more precisely, 68.27%.
   - About **95%** of the data falls within two standard deviations (2σ) of the mean; more precisely, 95.45%.
   - About **99.7%** of the data falls within three standard deviations (3σ) of the mean; more precisely, 99.73%.

### The Empirical Rule (68-95-99.7 Rule)

The Empirical Rule is a quick way to understand the distribution of data in a normal distribution. Here's what each part of the rule tells us:

1. **68% within 1σ:**
   - Approximately 68% of the data lies within one standard deviation above and below the mean (μ ± σ).
   - This range captures most of the data points.

2. **95% within 2σ:**
   - Approximately 95% of the data lies within two standard deviations above and below the mean (μ ± 2σ).
   - Only about 5% of values lie more than two standard deviations from the mean, so this range gives a broader view of where nearly all of the data falls.

3. **99.7% within 3σ:**
   - Approximately 99.7% of the data lies within three standard deviations above and below the mean (μ ± 3σ).
   - Only about 0.3% of values fall outside this range. Observations beyond 3σ are therefore rare and are often examined as potential outliers.

### Example

Consider the heights of adult men in a population, which are normally distributed with a mean (μ) of 175 cm and a standard deviation (σ) of 10 cm:
- **68% of the men** will have heights between 165 cm and 185 cm (μ ± σ).
- **95% of the men** will have heights between 155 cm and 195 cm (μ ± 2σ).
- **99.7% of the men** will have heights between 145 cm and 205 cm (μ ± 3σ).
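The sketch below checks the empirical-rule percentages for the height example, using `statistics.NormalDist` from Python's standard library (our choice of tool). Each share is the difference of the cumulative distribution function at the two ends of the interval.

```python
from statistics import NormalDist

heights = NormalDist(mu=175, sigma=10)  # heights of adult men, as in the example above

shares = {}
for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    shares[k] = heights.cdf(high) - heights.cdf(low)  # P(low <= X <= high)
    print(f"within {k} sd: {low:.0f} to {high:.0f} cm, {shares[k]:.2%}")
```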



# Question 10

### Real-Life Example of a Poisson Process

A **Poisson process** is a statistical process that models the occurrence of events over a fixed period of time or space. These events must happen independently of each other and at a constant average rate. One common real-life example is the number of calls received by a customer support center in an hour.

### Example Scenario: Customer Support Calls

Let's say a customer support center receives an average of 5 calls per hour. We can model this as a Poisson process with a rate (λ) of 5 calls per hour.

### Probability Calculation for a Specific Event

Let's calculate the probability of receiving exactly 3 calls in an hour.

The formula for the Poisson probability mass function (PMF) is:
$$
P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}
$$
where:
- \( P(X = k) \) is the probability of receiving exactly \( k \) calls.
- \( \lambda \) is the average rate (5 calls per hour).
- \( k \) is the number of calls (3 calls).
- \( e \) is the base of the natural logarithm (approximately 2.71828).

### Calculation

Substituting the values into the formula:
- \( \lambda = 5 \)
- \( k = 3 \)

$$
P(X = 3) = \frac{e^{-5} \times 5^3}{3!}
$$

Calculating step-by-step:

1. Calculate \( e^{-5} \):
   $$ e^{-5} \approx 0.00674 $$

2. Calculate \( 5^3 \):
   $$ 5^3 = 125 $$

3. Calculate \( 3! \) (factorial of 3):
   $$ 3! = 3 \times 2 \times 1 = 6 $$

4. Substitute these values into the formula:
   $$ P(X = 3) = \frac{0.00674 \times 125}{6} \approx \frac{0.8425}{6} \approx 0.1404 $$

So, the probability of receiving exactly 3 calls in an hour is approximately 0.1404, or 14.04%.
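The Poisson PMF can be evaluated directly with Python's `math` module, as in the sketch below. The helper name `poisson_pmf` is our own. The sketch does not round \( e^{-5} \) first, so it gives about 0.14037. That value still rounds to 0.1404.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

p_three_calls = poisson_pmf(3, 5)
print(round(p_three_calls, 4))
```

If SciPy is available, `scipy.stats.poisson.pmf(3, 5)` should return the same value.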

### Interpretation

This means that there is a 14.04% chance that the customer support center will receive exactly 3 calls in an hour.

The Poisson process and distribution are useful for modeling random events over time or space, such as traffic accidents, arrival of customers, or system failures.

# Question 11

### Random Variable

A **random variable** is a numerical outcome of a random phenomenon or experiment. It assigns a number to each possible outcome in a sample space. Random variables are fundamental in probability and statistics, helping us quantify and analyze uncertainty.

### Types of Random Variables

1. **Discrete Random Variables:**
   - **Definition:** A discrete random variable can take on a countable number of distinct values. These values are often whole numbers.
   - **Examples:**
     - **Number of students** in a classroom: 20, 25, 30.
     - **Number of heads** when flipping a coin 5 times: 0, 1, 2, 3, 4, 5.
   - **Characteristics:**
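     - The values are countable: either finite (0 to 5 heads in 5 flips) or countably infinite (the number of calls a support center receives in an hour: 0, 1, 2, and so on).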
     - Each value has an associated probability.
   - **Probability Distribution:**
     - Described by a probability mass function (PMF), which gives the probability of each possible value.

2. **Continuous Random Variables:**
   - **Definition:** A continuous random variable can take on an infinite number of possible values within a given range. These values are often real numbers.
   - **Examples:**
     - **Height** of students: 150.5 cm, 160.3 cm, 175.2 cm.
     - **Time** it takes to run a marathon: 3.5 hours, 4.2 hours.
   - **Characteristics:**
     - The values are uncountable and can take any value within a range.
     - Probabilities are assigned to intervals, not individual values.
   - **Probability Distribution:**
     - Described by a probability density function (PDF). The PDF's value at a point is a density, not a probability; the probability that the variable falls within a range is the area under the PDF over that range.

### Comparison

| Feature                | Discrete Random Variable                  | Continuous Random Variable         |
|------------------------|-------------------------------------------|------------------------------------|
| **Values**             | Countable (finite or countably infinite)  | Uncountable, infinite              |
| **Examples**           | Number of cars, dice rolls                | Height, temperature                |
| **Probability**        | Assigned to specific values               | Assigned to intervals              |
| **Distribution**       | Probability mass function (PMF)           | Probability density function (PDF) |

### Example

- **Discrete Random Variable:** Suppose you roll a fair six-sided die. The random variable \( X \) represents the outcome. \( X \) can take on values {1, 2, 3, 4, 5, 6}. The probability of each outcome is \( \frac{1}{6} \).

- **Continuous Random Variable:** Consider measuring the weight of apples in a basket. The random variable \( Y \) represents the weight. \( Y \) can take on any value within a range, such as 150 grams to 200 grams. The probability of \( Y \) falling within an interval (e.g., 160 to 170 grams) is the area under its PDF over that interval, while the probability of any single exact weight is zero.
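The sketch below contrasts a PMF with a PDF. For the die, it uses exact fractions. For the apple weights, it assumes a uniform distribution on 150 to 200 grams; the original gives no distribution, so this model is purely illustrative. The helpers `weight_pdf` and `weight_cdf` are hypothetical names.

```python
from fractions import Fraction

# Discrete: a fair six-sided die; the PMF gives each face probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
total_probability = sum(die_pmf.values())                        # exactly 1
p_even = sum(p for face, p in die_pmf.items() if face % 2 == 0)  # P(X is even)

# Continuous: apple weight Y, modelled here (an assumption) as uniform on [150, 200] g.
LOW, HIGH = 150.0, 200.0

def weight_pdf(y):
    """Density of the uniform model: constant inside the range, zero outside."""
    return 1 / (HIGH - LOW) if LOW <= y <= HIGH else 0.0

def weight_cdf(y):
    """P(Y <= y) under the uniform model."""
    if y <= LOW:
        return 0.0
    if y >= HIGH:
        return 1.0
    return (y - LOW) / (HIGH - LOW)

p_160_to_170 = weight_cdf(170) - weight_cdf(160)   # area under the PDF from 160 to 170
p_exactly_165 = weight_cdf(165) - weight_cdf(165)  # a single point has zero probability
density_at_165 = weight_pdf(165)                   # 0.02 per gram: a density, not a probability

print(total_probability, p_even, p_160_to_170, p_exactly_165, density_at_165)
```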


# Question 12


### Example Dataset
Consider the following dataset with two variables: **X** (hours studied) and **Y** (exam scores):

| Student | X (Hours Studied) | Y (Exam Scores) |
|---------|--------------------|-----------------|
| 1       | 2                  | 75              |
| 2       | 3                  | 80              |
| 3       | 5                  | 90              |
| 4       | 7                  | 85              |
| 5       | 8                  | 95              |

### Step 1: Calculate Covariance
Covariance measures the degree to which two variables change together. The formula for covariance is:
$$
\text{Cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
$$

Where:
- \( n \) is the number of data points.
- \( X_i \) and \( Y_i \) are the individual data points for variables X and Y.
- \( \bar{X} \) and \( \bar{Y} \) are the means of X and Y, respectively.

First, calculate the means:
- \( \bar{X} = \frac{2 + 3 + 5 + 7 + 8}{5} = 5 \)
- \( \bar{Y} = \frac{75 + 80 + 90 + 85 + 95}{5} = 85 \)

Now, calculate the covariance:
$$
\text{Cov}(X, Y) = \frac{1}{4} [(2-5)(75-85) + (3-5)(80-85) + (5-5)(90-85) + (7-5)(85-85) + (8-5)(95-85)]
$$
$$
\text{Cov}(X, Y) = \frac{1}{4} [(-3)(-10) + (-2)(-5) + (0)(5) + (2)(0) + (3)(10)]
$$
$$
\text{Cov}(X, Y) = \frac{1}{4} [30 + 10 + 0 + 0 + 30] = \frac{1}{4} [70] = 17.5
$$

### Step 2: Calculate Correlation
Correlation measures the strength and direction of the linear relationship between two variables. The formula for the Pearson correlation coefficient is:
$$
r = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}
$$

Where:
- \( \text{Cov}(X, Y) \) is the covariance.
- \( \sigma_X \) and \( \sigma_Y \) are the sample standard deviations of X and Y, computed with the same \( n-1 \) divisor as the covariance.

First, let's calculate the standard deviations:
- Standard deviation of X (\(\sigma_X\)):
  $$
  \sigma_X = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2} = \sqrt{\frac{1}{4} [(2-5)^2 + (3-5)^2 + (5-5)^2 + (7-5)^2 + (8-5)^2]}
  $$
  $$
  \sigma_X = \sqrt{\frac{1}{4} [9 + 4 + 0 + 4 + 9]} = \sqrt{\frac{26}{4}} = \sqrt{6.5} \approx 2.55
  $$

- Standard deviation of Y (\(\sigma_Y\)):
  $$
  \sigma_Y = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2} = \sqrt{\frac{1}{4} [(75-85)^2 + (80-85)^2 + (90-85)^2 + (85-85)^2 + (95-85)^2]}
  $$
  $$
  \sigma_Y = \sqrt{\frac{1}{4} [100 + 25 + 25 + 0 + 100]} = \sqrt{\frac{250}{4}} = \sqrt{62.5} \approx 7.91
  $$

Now, calculate the correlation coefficient:
$$
r = \frac{17.5}{2.55 \times 7.91} \approx \frac{17.5}{20.17} \approx 0.87
$$
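The sketch below reproduces both hand calculations in plain Python. The helper names `sample_covariance` and `pearson_r` are our own. Without intermediate rounding, \( r \approx 0.868 \), which agrees with the 0.87 above.

```python
import math

hours = [2, 3, 5, 7, 8]
scores = [75, 80, 90, 85, 95]

def sample_covariance(x, y):
    """Sample covariance with the n - 1 divisor, as in the formula above."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    sd_x = math.sqrt(sample_covariance(x, x))  # Cov(X, X) is the variance of X
    sd_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sd_x * sd_y)

cov = sample_covariance(hours, scores)
r = pearson_r(hours, scores)
print(cov, round(r, 3))
```

On Python 3.10 or later, `statistics.covariance(hours, scores)` and `statistics.correlation(hours, scores)` should return the same two values.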

### Interpretation
- **Covariance (17.5):** A positive covariance indicates that as the number of hours studied increases, exam scores also tend to increase. Its magnitude depends on the units of X and Y (hours × points), so covariance alone is hard to use for judging the strength of the relationship.
- **Correlation (0.87):** The correlation coefficient of 0.87 suggests a strong positive linear relationship between hours studied and exam scores. In this sample of five students, those who studied more tended to score higher.

