### **1. What are descriptive and inferential statistics? How do they differ?**

**Descriptive Statistics:**

* Summarizes or describes characteristics of a dataset.
* Examples: mean, median, mode, standard deviation, frequency.

**Inferential Statistics:**

* Makes predictions or inferences about a population based on a sample.
* Examples: hypothesis testing, confidence intervals, regression.

**Difference:**

| Feature    | Descriptive Statistics    | Inferential Statistics                  |
| ---------- | ------------------------- | --------------------------------------- |
| Purpose    | Describe the data at hand | Draw conclusions about a population     |
| Based on   | The observed dataset      | A sample, generalized to the population |
| Techniques | Mean, median, SD          | t-tests, chi-square, regression         |
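
As a rough illustration of the contrast, the sketch below first summarizes a small sample (descriptive) and then builds a 95% confidence interval for the population mean (inferential). The sample values are hypothetical, and the t-based interval assumes the data are roughly normal.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 8 measurements drawn from a larger population
sample = np.array([52, 48, 50, 47, 53, 49, 51, 50])

# Descriptive: summarize the observed values themselves
sample_mean = sample.mean()
sample_sd = sample.std(ddof=1)  # sample standard deviation (n-1)

# Inferential: 95% confidence interval for the unknown population mean
n = len(sample)
sem = sample_sd / np.sqrt(n)           # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical value
ci_low = sample_mean - t_crit * sem
ci_high = sample_mean + t_crit * sem

print(f"Mean: {sample_mean}, SD: {sample_sd:.2f}")
print(f"95% CI for population mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The mean and SD describe only these eight values; the interval is a statement about the population they came from.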

---

### **2. Explain the importance of sampling in statistics.**

**Sampling** allows us to:

* Study large populations efficiently.
* Reduce time and cost.
* Draw valid inferences using representative data.
* Avoid biases if done properly (e.g., via random or stratified sampling).

Example:

* A company surveys 1,000 customers out of 1 million to estimate satisfaction.
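
The survey example can be simulated as a minimal sketch. The population below is synthetic, and the 70% satisfaction rate is an assumption chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Simulated population: 1 million customers, True = satisfied (assumed ~70%)
population = rng.random(1_000_000) < 0.70
true_rate = population.mean()

# Simple random sample of 1,000 customers, without replacement
sample = rng.choice(population, size=1_000, replace=False)
estimate = sample.mean()

print(f"True satisfaction rate: {true_rate:.3f}")
print(f"Estimate from sample:   {estimate:.3f}")
```

A random sample of 0.1% of the population typically lands within a few percentage points of the true rate.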

---

### **3. Why are measures of central tendency important in data analysis?**

They help to:

* Understand the center/typical value of a dataset.
* Compare different datasets.
* Make quick summaries of large data.
* Flag possible skewness or outliers (e.g., when the mean and median differ noticeably).

Used in:

* Business KPIs, test scores, average revenue, etc.

---

### **4. What is the significance of the mean, median, and mode in understanding data?**

| Measure | Usefulness    | Sensitive to Outliers | Best Used When   |
| ------- | ------------- | --------------------- | ---------------- |
| Mean    | Shows average | Yes                   | Symmetric data   |
| Median  | Middle value  | No                    | Skewed data      |
| Mode    | Most frequent | No                    | Categorical data |

---

### **5. What is the difference between descriptive and inferential statistics? Provide examples.**

This restates question 1 with one concrete example of each:

* **Descriptive**: E.g., “The average age of 500 students is 20.”
* **Inferential**: E.g., “Based on a survey of 500 people, we estimate 70% of voters support X.”

---

### **6. How do parametric and non-parametric statistics differ? When would you use each?**

| Criteria    | Parametric                                     | Non-Parametric                       |
| ----------- | ---------------------------------------------- | ------------------------------------ |
| Assumptions | Assumes a specific distribution (e.g., normal) | Few or no distributional assumptions |
| Data type   | Interval/ratio                                 | Ordinal/nominal or non-normal        |
| Examples    | t-test, ANOVA                                  | Mann-Whitney, Kruskal-Wallis         |

**Use non-parametric when:**

* Data isn’t normally distributed
* The sample is too small to check distributional assumptions
* Data is ordinal
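
A minimal sketch comparing one test of each kind on the same data. The two groups are hypothetical, and `group_b` deliberately contains one extreme value to show how each test reacts.

```python
from scipy import stats

# Hypothetical scores for two independent groups; group_b has one extreme value
group_a = [12, 15, 14, 10, 13, 16, 11]
group_b = [18, 21, 17, 22, 19, 20, 95]

# Parametric: Welch's t-test compares means and assumes roughly normal data
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Non-parametric: Mann-Whitney U compares ranks, so the outlier has little pull
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t-test p-value:       {t_p:.4f}")
print(f"Mann-Whitney p-value: {u_p:.4f}")
```

Because every value in `group_b` exceeds every value in `group_a`, the rank-based test reports a clear difference regardless of how large the outlier is.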

---

### **7. Explain qualitative vs. quantitative statistics with examples.**

* **Qualitative (Categorical):** Describes categories

  * Examples: Gender, Brand preference, Color
* **Quantitative (Numerical):** Measures quantities

  * Examples: Age, Salary, Height

---

### **8. When would you use inferential statistics instead of descriptive statistics?**

Use **inferential** when:

* You want to **generalize** from a sample to a population.
* You need to **test hypotheses** or **predict** outcomes.
* Descriptive summaries alone are not enough to support a decision.

Example:

> You measure average app usage among 1,000 users to infer the average across 1 million users.

---

### **9. What are the different types of sampling techniques in statistics?**

1. **Simple Random Sampling**
2. **Systematic Sampling**
3. **Stratified Sampling**
4. **Cluster Sampling**
5. **Convenience Sampling**
6. **Snowball Sampling**

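| Type          | Use Case                                                        |
| ------------- | --------------------------------------------------------------- |
| Simple random | General surveys with a complete list of the population          |
| Systematic    | Ordered lists or queues (every k-th item)                       |
| Stratified    | Population divided into groups (e.g., age) that must each appear |
| Cluster       | Natural groups exist (e.g., cities)                             |
| Convenience   | Quick pilot studies; not random, so prone to bias               |
| Snowball      | Hard-to-reach populations, recruited through referrals          |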

---

### **10. Explain the difference between simple random sampling and stratified sampling.**

| Feature     | Simple Random Sampling                      | Stratified Sampling                               |
| ----------- | ------------------------------------------- | ------------------------------------------------- |
| Definition  | Every item has an equal chance of selection | Population divided into strata, sampled from each |
| When to Use | Homogeneous population                      | Heterogeneous population with distinct subgroups  |
| Example     | Pick 100 names randomly                     | Pick 50 males, 50 females from respective groups  |
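
Both approaches can be sketched with Pandas. The population below is hypothetical (700 males, 300 females), and the fixed `random_state` is only there to make the run repeatable.

```python
import pandas as pd

# Hypothetical population: 700 males and 300 females
population = pd.DataFrame({
    "id": range(1000),
    "gender": ["M"] * 700 + ["F"] * 300,
})

# Simple random sampling: every row has the same chance of selection
simple_sample = population.sample(n=100, random_state=0)

# Stratified sampling: draw a fixed number from each stratum
stratified_sample = population.groupby("gender").sample(n=50, random_state=0)

print(simple_sample["gender"].value_counts())
print(stratified_sample["gender"].value_counts())
```

The simple random sample roughly mirrors the 70/30 split, while the stratified sample guarantees exactly 50 rows per group.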

---


### **11. How does cluster sampling work, and when would you use it?**

**Definition:**

* Cluster sampling involves dividing the population into clusters (often based on geography or groups), randomly selecting **some clusters**, and collecting data **from all members** within those clusters.

**Example:**

* To survey schools in India, randomly select 5 states (clusters) and survey all schools in those states.

**When to Use:**

* When population is too large and scattered.
* When cost/time restrictions exist.
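
The two steps (pick clusters at random, then take everyone inside them) might look like this. The school and state labels are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)

# Hypothetical population: 20 schools spread across 5 states (the clusters)
schools = pd.DataFrame({
    "school_id": range(20),
    "state": ["A", "B", "C", "D", "E"] * 4,
})

# Step 1: randomly select some clusters
chosen_states = rng.choice(schools["state"].unique(), size=2, replace=False)

# Step 2: keep every member of the selected clusters
cluster_sample = schools[schools["state"].isin(chosen_states)]

print("Chosen states:", sorted(chosen_states))
print(cluster_sample)
```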

---

### **12. What are the advantages and disadvantages of systematic sampling?**

**Definition:**

* Select every *k-th* item from a population after a random start.

**Advantages:**

* Simple to implement.
* Spreads the sample more evenly across the population than simple random sampling.

**Disadvantages:**

* Not suitable if the data has periodic patterns that align with *k* (can cause bias).
* Requires an ordered population, such as a list or a queue.

**Example:**

* Starting at person 3, select every 10th person in a queue.
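
The queue example can be written with list slicing. The queue of 100 numbered people is an assumption for illustration, and the start is fixed at 3 to match the example (in practice it would be drawn at random from 1 to *k*).

```python
# People in the queue are numbered 1..100 (hypothetical data)
queue = list(range(1, 101))

k = 10     # sampling interval
start = 3  # starting person; normally chosen at random between 1 and k

# Person 3 sits at index 2, so slice from start - 1 in steps of k
systematic_sample = queue[start - 1::k]

print(systematic_sample)
```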

---

### **13. What is the difference between nominal, ordinal, interval, and ratio data?**

| Type     | Example             | Order | Equal Interval | True Zero |
| -------- | ------------------- | ----- | -------------- | --------- |
| Nominal  | Gender, Color       | ❌     | ❌              | ❌         |
| Ordinal  | Rank, Satisfaction  | ✅     | ❌              | ❌         |
| Interval | Temperature (°C)    | ✅     | ✅              | ❌         |
| Ratio    | Age, Salary, Height | ✅     | ✅              | ✅         |
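
One way to encode these levels in Pandas is sketched below: an ordered categorical for ordinal data and a plain numeric Series for ratio data. The satisfaction levels and ages are made-up values.

```python
import pandas as pd

# Ordinal data: categories with a meaningful order but no fixed spacing
satisfaction = pd.Series(
    pd.Categorical(
        ["low", "high", "medium", "high", "low"],
        categories=["low", "medium", "high"],
        ordered=True,
    )
)

# Order-based operations are valid for ordinal data
print("Highest level:", satisfaction.max())
print("Most frequent:", satisfaction.mode().tolist())

# Ratio data: arithmetic such as the mean is meaningful
ages = pd.Series([25, 32, 47, 51])
print("Mean age:", ages.mean())
```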

---

### **14. Explain the importance of identifying the data type before performing statistical analysis.**

**Why it's important:**

* Determines valid operations (e.g., mean only for numerical).
* Guides choice of plots (bar chart for categorical, histogram for continuous).
* Influences statistical test selection (t-test vs chi-square).

**Example:**

* You can't compute average for "City Names" (nominal).
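
A small sketch of choosing the summary by data type. The DataFrame is hypothetical; `pd.api.types.is_numeric_dtype` is used here to separate numeric columns from nominal ones.

```python
import pandas as pd

# Hypothetical customer records with one nominal and one numeric column
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "age": [25, 31, 40, 28],
})

for column in df.columns:
    if pd.api.types.is_numeric_dtype(df[column]):
        # Numeric data: arithmetic summaries are valid
        print(column, "mean:", df[column].mean())
    else:
        # Nominal data: report the most frequent value instead
        print(column, "mode:", df[column].mode()[0])
```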

---

### **15. How would you differentiate between continuous and discrete data?**

| Type       | Characteristics                               | Examples                    |
| ---------- | --------------------------------------------- | --------------------------- |
| Discrete   | Countable, distinct values (usually integers) | Number of students, cars    |
| Continuous | Measurable, any value within a range          | Height, weight, temperature |

**Tip:** Continuous data can take any value between two points; discrete data jumps between separate values.

---

### **16. Why is it crucial to know whether data is categorical or numerical in statistics?**

**Because:**

* It influences analysis method (e.g., mode for categorical, mean for numerical).
* Affects choice of statistical tests (ANOVA for numerical, chi-square for categorical).
* Drives visualization type (pie chart vs histogram).

---

### **17. Explain the concept of central tendency in statistics and its importance.**

**Definition:**

* Measures that represent the center of a dataset.

**Importance:**

* Gives a quick summary of data.
* Helps compare different groups.
* Supports decision-making and benchmarking.

**Key Measures:** Mean, Median, Mode

---

### **18. What are the three main measures of central tendency, and when would you use each?**

| Measure | Use When Data Is…            | Example                     |
| ------- | ---------------------------- | --------------------------- |
| Mean    | Symmetric, no outliers       | Average salary of engineers |
| Median  | Skewed or has outliers       | Household income            |
| Mode    | Categorical or most frequent | Most common product sold    |

---

### **19. When is the median a better measure than the mean?**

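* When data is **skewed** or contains **outliers**.
* The median is barely affected by extreme values, while the mean is pulled toward them.

**Example:**

* Income distribution: if most employees earn around ₹40K but one CEO earns ₹10 crore, the mean sits far above a typical salary, so report the **median**.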
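
To put hypothetical numbers on this, the sketch below assumes nine salaries of ₹40,000 and a single salary of ₹10 crore.

```python
import statistics

# Nine employees earn ₹40,000; one CEO earns ₹10 crore (₹100,000,000)
incomes = [40_000] * 9 + [100_000_000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"Mean:   ₹{mean_income:,.0f}")    # pulled up by the single outlier
print(f"Median: ₹{median_income:,.0f}")  # still a typical salary
```

Here the mean is roughly ₹1 crore, a figure no employee other than the CEO comes close to, while the median stays at ₹40,000.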

---

### **20. How does mode help in understanding the distribution of data?**

* Shows the **most frequent** value(s).
* Helps understand patterns in **categorical** or **discrete** data.
* Useful in marketing, trend analysis, and product demand.

**Example:**

* In customer data, the mode of age group tells who buys most.
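
A short sketch using the standard library. The age groups and ratings are invented; `statistics.multimode` (Python 3.8+) is shown because a dataset can have more than one mode.

```python
import statistics

# Hypothetical age groups of customers who made a purchase
age_groups = ["18-25", "26-35", "26-35", "36-45", "26-35", "18-25"]

# mode() returns the single most frequent value
print("Mode:", statistics.mode(age_groups))

# multimode() returns every value tied for most frequent
ratings = [4, 5, 4, 5, 3]
print("Modes:", statistics.multimode(ratings))
```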

---


### **21. How do you calculate mean, median, and mode in Python using Pandas?**

```python
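import pandas as pd

data = pd.Series([10, 20, 20, 30, 40, 50])

mean = data.mean()
median = data.median()
mode = data.mode()[0]  # mode() returns a Series (ties are possible); take the first value

print(f"Mean: {mean}, Median: {median}, Mode: {mode}")
# Output: Mean: 28.333333333333332, Median: 25.0, Mode: 20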
```

---

### **22. Write a Python code to compute the mean and median from a list of numbers.**

```python
import statistics

nums = [1, 2, 3, 4, 5, 5]

mean = statistics.mean(nums)
median = statistics.median(nums)

print(f"Mean: {mean}, Median: {median}")
```

---

### **23. How can you handle missing data when calculating central tendency in Python?**

**Using Pandas:**

```python
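import pandas as pd

data = pd.Series([1, 2, None, 4, 5])  # None is stored as NaN

# skipna=True is the default, so missing values are ignored
mean = data.mean(skipna=True)
median = data.median(skipna=True)

print(f"Mean: {mean}, Median: {median}")
# Output: Mean: 3.0, Median: 3.0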
```

You can also fill missing values:

```python
data_filled = data.fillna(data.mean())
```

---

### **24. Explain how to use the `scipy.stats` module to find the mode in Python.**

```python
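from scipy import stats

data = [1, 2, 2, 3, 4, 4, 4, 5]
# keepdims=True (SciPy 1.9+) returns arrays, so index [0] to get scalars
mode_result = stats.mode(data, keepdims=True)

print(f"Mode: {mode_result.mode[0]}, Count: {mode_result.count[0]}")
# Output: Mode: 4, Count: 3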
```

---

### **25. What is the significance of measuring dispersion in data analysis?**

* It shows **how spread out** the data values are.
* Helps assess **data consistency** and **risk**.
* Crucial in comparing different datasets even if they have the same average.

**Example:**
Two classes may have same average marks but very different variance.

---

### **26. Explain the difference between range, variance, and standard deviation.**

| Metric             | Description                                 |
| ------------------ | ------------------------------------------- |
| Range              | Max - Min                                   |
| Variance           | Average of squared deviations from the mean |
| Standard Deviation | Square root of variance                     |

**Note:** SD is in the same unit as the original data; variance is in squared units.
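
The three definitions can be computed by hand, as in this sketch. The data values are illustrative, and both the population (divide by n) and sample (divide by n-1) variances are shown.

```python
import math

data = [10, 20, 30, 40, 50]
n = len(data)
mean = sum(data) / n

# Range: distance between the extremes
data_range = max(data) - min(data)

# Variance: average squared deviation from the mean
squared_devs = [(x - mean) ** 2 for x in data]
population_variance = sum(squared_devs) / n    # divide by n
sample_variance = sum(squared_devs) / (n - 1)  # divide by n-1

# Standard deviation: square root of the variance, back in the original units
sample_sd = math.sqrt(sample_variance)

print(data_range, population_variance, sample_variance, round(sample_sd, 2))
# Output: 40 200.0 250.0 15.81
```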

---

### **27. Why is it important to know the variability in a dataset?**

* To identify **data stability**.
* Understand **risk, consistency**, or **deviation** from the norm.
* High variability may indicate **unpredictability** or **noise**.

---

### **28. How do you interpret a large standard deviation in a dataset?**

* Indicates that data points are **widely spread** from the mean.
* Possible presence of **outliers** or high variability.
* Low SD means data is **clustered near the average**.

---

### **29. How do you compute variance and standard deviation in a dataset?**

**Using Pandas:**

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40, 50])
variance = data.var()
std_dev = data.std()

print(f"Variance: {variance}, Standard Deviation: {std_dev}")
```

**Note:** By default, Pandas calculates the **sample variance (n-1)**, whereas NumPy's `np.var()` defaults to the population variance (n).

---

### **30. What does a higher standard deviation indicate about your data?**

* Greater **spread**, more **variability**.
* Less **predictability**.
* In business: Higher SD in revenue may mean **unstable income**.

---







### **31. How does variance differ from standard deviation?**

| Aspect           | Variance                      | Standard Deviation         |
| ---------------- | ----------------------------- | -------------------------- |
| Definition       | Average of squared deviations | Square root of variance    |
| Unit             | Squared unit of original data | Same unit as original data |
| Interpretability | Harder to interpret           | Easier to understand       |
| Use Case         | Often intermediate step       | Directly used in analysis  |

---

### **32. Why is the standard deviation often preferred over the variance in interpreting data?**

* Because it is in the **same unit** as the data.
* Makes it **intuitive and easier to communicate**.
* For example, “a standard deviation of 15 °C” is easier to understand than “a variance of 225 °C²”.

---

### **33. Why is `n-1` used for sample variance?**

* Called **Bessel’s correction**.
* It provides an **unbiased estimate** of population variance from a sample.
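* Dividing by `n` **underestimates** the true variance on average, because deviations are measured from the sample mean rather than the unknown population mean.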
* Using `n-1` **corrects** this by slightly increasing the value.
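
A quick simulation makes the bias visible. This is a sketch under stated assumptions: the population is normal with a true variance of 4, and each sample has only 5 observations.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Simulated population: normal with standard deviation 2, so true variance is 4
true_variance = 4.0
samples = rng.normal(loc=0.0, scale=2.0, size=(100_000, 5))  # 100,000 samples of n=5

# Average each estimator over all samples
mean_var_n = samples.var(axis=1, ddof=0).mean()          # divide by n
mean_var_n_minus_1 = samples.var(axis=1, ddof=1).mean()  # divide by n-1

print(f"Dividing by n:   {mean_var_n:.2f}")
print(f"Dividing by n-1: {mean_var_n_minus_1:.2f}")
```

With n = 5, dividing by `n` averages out near 3.2 (that is, 4 × 4/5), while dividing by `n-1` averages out near the true value of 4.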

---

### **34. Write Python code to calculate the variance of a given list of numbers.**

```python
import statistics

data = [10, 20, 30, 40, 50]
sample_variance = statistics.variance(data)
print("Sample Variance:", sample_variance)
```

---

### **35. How would you use NumPy to compute the variance and standard deviation of an array in Python?**

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

variance = np.var(data, ddof=1)  # ddof=1 for sample variance
std_dev = np.std(data, ddof=1)

print(f"Variance: {variance}, Standard Deviation: {std_dev}")
```

---

### **36. What is the difference between `np.var()` and `np.std()` in Python?**

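* `np.var()` → returns **variance**
* `np.std()` → returns **standard deviation** (the square root of the variance)
* Both default to `ddof=0` (population); pass `ddof=1` for the sample version

```python
import numpy as np
data = np.array([10, 20, 30, 40, 50])
np.var(data)     # variance
np.std(data)     # standard deviation
```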

---

### **37. How do you handle missing values when calculating variance in Python?**

```python
import pandas as pd
data = pd.Series([10, 20, None, 40, 50])

variance = data.var(skipna=True)
std_dev = data.std(skipna=True)

print(f"Variance: {variance}, Standard Deviation: {std_dev}")
```

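Or impute missing values with the mean first (note that this shrinks the variance, since the filled values add no spread):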

```python
data = data.fillna(data.mean())  # Impute missing values
```

---

## 🧭 **Skewness (Measure of Symmetry)**

---

### **38. Explain the concept of skewness in statistics and its importance in understanding data distribution.**

* Skewness measures the **asymmetry** of a data distribution.
* It signals whether **outliers** or a long tail pull the mean away from the median.
* Skewness ≈ 0 → roughly symmetric data.
* Positive skew → right-tailed (e.g., income).
* Negative skew → left-tailed (e.g., scores on an easy test).

---

### **39. What are the different types of skewness, and how do you identify them?**

| Type              | Description                  | Visual Shape       |
| ----------------- | ---------------------------- | ------------------ |
| Symmetrical       | Mean ≈ Median ≈ Mode         | Bell curve         |
| Positively skewed | Usually Mean > Median > Mode | Long tail on right |
| Negatively skewed | Usually Mean < Median < Mode | Long tail on left  |

---

### **40. How would you interpret a positively skewed distribution?**

* Most values are **concentrated on the left**, few large outliers on the right.
* Example: income distribution where **most earn low, few earn very high**.
* Mean is **greater than** median.
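
These properties can be checked on a small right-skewed dataset. The income figures below are hypothetical.

```python
import statistics
from scipy.stats import skew

# Hypothetical monthly incomes (in thousands): most are low, one is very high
incomes = [20, 22, 25, 25, 28, 30, 35, 200]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
skewness = skew(incomes)

print(f"Mean: {mean_income}, Median: {median_income}, Skewness: {skewness:.2f}")
```

The single large value drags the mean (48.125) well above the median (26.5), and the skewness coefficient comes out positive.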

---

### **41. How do you calculate skewness using Python libraries like Pandas or Scipy?**

```python
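import pandas as pd
from scipy.stats import skew

data = pd.Series([10, 12, 13, 15, 22, 27, 70])

# Pandas applies a bias correction for sample size
print("Skewness (Pandas):", data.skew())  # 2.2279999009531055

# SciPy defaults to the uncorrected (biased) estimator, so the values differ;
# skew(data, bias=False) matches the Pandas result
print("Skewness (SciPy):", skew(data))    # 1.718939242161366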
```

---





## 📘 **Set Theory in Statistics**

---

### **42. Explain the difference between union, intersection, and complement of sets in statistics.**

* **Union (A ∪ B):** Combines elements from both sets A and B.
  → Example: Students who play football **or** cricket.

* **Intersection (A ∩ B):** Only includes common elements in both A and B.
  → Example: Students who play both football **and** cricket.

* **Complement (A′):** Includes all elements of the universal set that are **not** in set A.
  → Example: Students who **do not** play football.

---

### **43. Why are sets important in understanding probability distributions?**

* They define **events** in probability.
* Used in operations like:

  * **P(A ∪ B)**: Probability of either event A or B.
  * **P(A ∩ B)**: Joint probability.
* Foundation of **Venn diagrams**, **sample space**, and **event algebra**.
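
As a sketch, events on a fair die can be modeled as Python sets. The events `A` and `B` and the helper `prob` are illustrative names, and all outcomes are assumed equally likely.

```python
from fractions import Fraction

# Sample space for one roll of a fair die
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # event: roll is even
B = {4, 5, 6}  # event: roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

p_union = prob(A | B)
p_intersection = prob(A & B)
p_complement = prob(sample_space - A)

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert p_union == prob(A) + prob(B) - p_intersection

print(p_union, p_intersection, p_complement)
# Output: 2/3 1/3 1/2
```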

---

### **44. How are sets used in real-life data science problems?**

* **Filtering** data by attributes.
* **Set operations** to find overlaps in user behavior (e.g., users who bought X and Y).
* **Deduplication** and **grouping**.
* **Segmentation**: Marketing lists, fraud detection, etc.

---

### **45. How do you create and manipulate sets in Python using the built-in `set()` function?**

```python
a = set([1, 2, 3, 4])
b = set([3, 4, 5, 6])

print("Union:", a | b)
print("Intersection:", a & b)
print("Difference:", a - b)
print("Symmetric Difference:", a ^ b)
```

---

### **46. Write a Python code to compute the union and intersection of two sets.**

```python
set1 = {10, 20, 30, 40}
set2 = {30, 40, 50, 60}

union_set = set1.union(set2)
intersection_set = set1.intersection(set2)

print("Union:", union_set)
print("Intersection:", intersection_set)
```

---

### **47. Explain how to remove duplicates from a list using Python sets.**

```python
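data = [1, 2, 2, 3, 4, 4, 5]
unique_data = list(set(data))  # sets are unordered, so the original order is not guaranteed
print("Unique items:", unique_data)

# To keep first-seen order, use dict keys instead
print("Order preserved:", list(dict.fromkeys(data)))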
```

---

### **48. How do you check if one set is a subset of another in Python?**

```python
a = {1, 2, 3}
b = {1, 2, 3, 4, 5}

print("Is a subset of b?", a.issubset(b))
```

---





## 📘 **Covariance & Correlation: Theory + Python**

---

### **49. Explain the difference between covariance and correlation.**

| Feature    | Covariance                                  | Correlation                                                   |
| ---------- | ------------------------------------------- | ------------------------------------------------------------- |
| Definition | Measures how two variables change together. | Measures both strength **and direction** of the relationship. |
| Scale      | Not standardized (depends on units).        | Standardized (range: -1 to +1).                               |
| Value      | Can be any number.                          | Always between -1 and 1.                                      |

---

### **50. When would you use covariance instead of correlation?**

* When you only want to know **whether two variables move together** (positive or negative), **without needing the strength** or normalization.
* Used in **portfolio theory** in finance.

---

### **51. Why is correlation preferred over covariance in many cases?**

* It’s **unit-free** and **easier to interpret**.
* Helps in comparing relationships across different pairs of variables, especially in **feature selection**.
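
The unit-free property can be demonstrated with a minimal sketch: rescaling one variable changes the covariance but leaves the correlation untouched. The arrays and the metres-to-centimetres conversion are illustrative assumptions.

```python
import numpy as np

x = np.array([2, 4, 6, 8])
y = np.array([1, 3, 5, 7])

# Correlation = covariance / (SD of x * SD of y), all with ddof=1
cov_xy = np.cov(x, y)[0, 1]
corr_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# Change the units of x (e.g., metres to centimetres)
x_cm = x * 100
cov_scaled = np.cov(x_cm, y)[0, 1]
corr_scaled = np.corrcoef(x_cm, y)[0, 1]

print(f"Covariance:  {cov_xy:.2f} -> {cov_scaled:.2f}")
print(f"Correlation: {corr_xy:.2f} -> {corr_scaled:.2f}")
```

The covariance grows by a factor of 100 while the correlation stays at 1, which is why correlation is the easier quantity to compare across variable pairs.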

---

### **52. How does correlation help in understanding relationships between variables?**

* Indicates:

  * **Strength** of the relationship (closer to ±1 = stronger).
  * **Direction** of the relationship (positive or negative).
* Useful in:

  * **Multicollinearity detection**
  * **Predictive modeling**
  * **Exploratory Data Analysis (EDA)**

---

## 🐍 **Python Code: Covariance & Correlation**

---

### **53. Write Python code to calculate the covariance between two variables using NumPy.**

```python
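import numpy as np

x = np.array([2, 4, 6, 8])
y = np.array([1, 3, 5, 7])

# np.cov returns a 2x2 matrix: variances on the diagonal, covariance off it
# (sample covariance, dividing by n-1, by default)
cov_matrix = np.cov(x, y)
print("Covariance matrix:\n", cov_matrix)
print("Covariance (x & y):", cov_matrix[0, 1])  # 6.666666666666666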
```

---

### **54. How would you calculate Pearson correlation in Python using Pandas?**

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 5, 6, 7]
})

corr = df['A'].corr(df['B'], method='pearson')
print("Pearson Correlation:", corr)
```

---

### **55. Explain how to create a correlation matrix in Python using Pandas.**

```python
import pandas as pd

data = {
    'X': [1, 2, 3, 4],
    'Y': [2, 4, 6, 8],
    'Z': [5, 3, 1, 0]
}

df = pd.DataFrame(data)
print("Correlation Matrix:\n", df.corr())
```

---

### **56. What is the difference between Pearson and Spearman correlation, and how do you implement both in Python?**

* **Pearson**: Measures the strength of a **linear** relationship; sensitive to outliers.
* **Spearman**: Rank-based; captures any **monotonic** relationship (including non-linear ones) and is more robust to outliers.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [15, 25, 35, 45]
})

# Pearson
print("Pearson Correlation:", df['A'].corr(df['B'], method='pearson'))

# Spearman
print("Spearman Correlation:", df['A'].corr(df['B'], method='spearman'))
```
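
Perfectly linear data gives 1.0 for both measures, so it does not show where they differ. The sketch below uses a monotonic but non-linear relationship (y = x³, chosen only for illustration) and calls `scipy.stats` directly.

```python
import numpy as np
from scipy import stats

# Monotonic but non-linear relationship: y = x cubed
x = np.array([1, 2, 3, 4, 5])
y = x ** 3

pearson_r = stats.pearsonr(x, y)[0]
spearman_rho = stats.spearmanr(x, y)[0]

print(f"Pearson:  {pearson_r:.3f}")     # below 1: the points do not lie on a straight line
print(f"Spearman: {spearman_rho:.3f}")  # 1: the ranks agree perfectly
```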

---




