# Practice Theory Questions

### 1. What is statistics, and why is it important?

Statistics is the science of collecting, analyzing, interpreting, and presenting data. It is important because it turns raw data into evidence, letting us quantify uncertainty and make informed decisions.

### 2. What are the two main types of statistics?

1. Descriptive Statistics: Summarize and describe data.
2. Inferential Statistics: Make predictions or inferences about a population based on a sample.

### 3. What are descriptive statistics?

Descriptive statistics summarize features of a dataset using measures such as mean, median, mode, range, and standard deviation.

### 4. What is inferential statistics?

Inferential statistics use a random sample of data to make estimates, decisions, or predictions about a population.

### 5. What is sampling in statistics?

Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole population.

### 6. What are the different types of sampling methods?

1. Simple Random Sampling
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling
5. Convenience Sampling

### 7. What is the difference between random and non-random sampling?

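Random (probability) sampling gives every member of the population a known, non-zero chance of selection; in simple random sampling that chance is equal for all members. Non-random sampling relies on subjective methods such as convenience or judgment, so selection probabilities are unknown and the sample may be biased.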
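
As a rough illustration of the difference, the sketch below draws a simple random sample and a convenience sample from the same made-up population. The names `population`, `random_sample`, and `convenience_sample`, the fixed seed, and the choice of "the first ten members" as the convenience rule are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed so the run is repeatable
population = np.arange(1, 101)    # hypothetical population: IDs 1..100

# Simple random sampling: every member has the same chance of being drawn
random_sample = rng.choice(population, size=10, replace=False)

# Convenience sampling: take whoever is easiest to reach (here, the first ten)
convenience_sample = population[:10]

print("Population mean:        ", population.mean())
print("Random sample mean:     ", random_sample.mean())
print("Convenience sample mean:", convenience_sample.mean())
```

The convenience sample only ever sees the values 1 to 10, so its mean sits far below the population mean whatever the seed. The random sample's mean varies from run to run but is not systematically off in one direction.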

### 8. Define and give examples of qualitative and quantitative data

- Qualitative: Descriptive categories or labels (e.g., colors, names)
- Quantitative: Numerical measurements or counts (e.g., height, weight)

### 9. What are the different types of data in statistics?

1. Qualitative (Categorical)
2. Quantitative (Numerical)
   - Discrete
   - Continuous

### 10. Explain nominal, ordinal, interval, and ratio levels of measurement

1. Nominal: Categories with no order (e.g., gender)
2. Ordinal: Categories with order (e.g., ranks)
3. Interval: Ordered, equal spacing, no true zero (e.g., temperature in Celsius or Fahrenheit)
4. Ratio: Like interval + true zero (e.g., weight)
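
One way to make these levels concrete in code is the pandas categorical dtype, sketched below. The column names and values are invented for illustration, and treating interval and ratio data as plain numeric columns is a simplification: pandas does not itself distinguish between the two.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["Red", "Blue", "Green"],        # nominal: labels only
    "rating": ["Good", "Poor", "Excellent"],  # ordinal: labels with an order
    "temp_c": [20, 25, 30],                   # interval: differences are meaningful
    "weight_kg": [55.5, 60.2, 70.0],          # ratio: true zero, so ratios are meaningful
})

# Nominal data: an unordered categorical
df["color"] = pd.Categorical(df["color"])

# Ordinal data: an ordered categorical with an explicit ranking
rating_levels = ["Poor", "Average", "Good", "Excellent"]
df["rating"] = pd.Categorical(df["rating"], categories=rating_levels, ordered=True)

print(df.dtypes)
print("Lowest rating:", df["rating"].min())  # allowed because an order is defined
print("Temperature gap:", df["temp_c"].max() - df["temp_c"].min())
print("Weight ratio:", df["weight_kg"].max() / df["weight_kg"].min())
```

Calling `min()` on the unordered `color` column would raise an error, which mirrors the rule that nominal categories cannot be ranked.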

### 11. What is the measure of central tendency?

A measure of central tendency is a single value that describes the center of a dataset. The most common measures are the mean, median, and mode.

### 12. Define mean, median, and mode

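- Mean: The sum of all values divided by the number of values
- Median: The middle value of the sorted data (the average of the two middle values when the count is even)
- Mode: The most frequent value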

### 13. What is the significance of the measure of central tendency?

It summarizes an entire dataset with a single representative value, which makes comparison and decision-making easier.

### 14. What is variance, and how is it calculated?

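Variance measures how far data values spread from the mean, as the average squared deviation. Population formula: Var(X) = Σ(xi - mean)² / n. For a sample, divide by n - 1 instead of n to get an unbiased estimate of the population variance.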
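
To make the calculation concrete, here is a minimal sketch that computes variance by hand for a small made-up list. Dividing the summed squared deviations by n gives the population variance, and dividing by n - 1 gives the sample variance. The variable names and the data are assumptions chosen so the arithmetic comes out round.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
n = len(values)
mean = sum(values) / n

# Squared distance of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in values]

population_variance = sum(squared_deviations) / n    # divide by n
sample_variance = sum(squared_deviations) / (n - 1)  # divide by n - 1

print("Population variance:", population_variance)
print("Sample variance:", sample_variance)

# Cross-check against the standard library
print("statistics.pvariance:", statistics.pvariance(values))
print("statistics.variance: ", statistics.variance(values))
```

In NumPy the same choice is controlled by the `ddof` argument: `np.var(values)` divides by n, while `np.var(values, ddof=1)` divides by n - 1.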

### 15. What is standard deviation, and why is it important?

Standard deviation shows how spread out the values are around the mean. Because it is the square root of the variance, it is expressed in the same units as the data, which makes variability easier to interpret.

### 16. Define and explain the term range in statistics

Range is the difference between the highest and lowest values in a dataset.

### 17. What is the difference between variance and standard deviation?

Variance is the average squared deviation, while standard deviation is its square root, giving dispersion in original units.

### 18. What is skewness in a dataset?

Skewness measures the asymmetry of the data distribution. It can be positive, negative, or zero.

### 19. What does it mean if a dataset is positively or negatively skewed?

- Positive skew: Longer tail on the right; typically Mean > Median
- Negative skew: Longer tail on the left; typically Mean < Median
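
The relationship between mean and median can be checked on two tiny made-up datasets, as in the sketch below. The data and variable names are assumptions for illustration; the pattern is a common tendency, not a guarantee for every distribution.

```python
import statistics

right_skewed = [1, 2, 2, 3, 4, 10]  # one large value stretches the right tail
left_skewed = [1, 7, 8, 9, 9, 10]   # one small value stretches the left tail

for name, values in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    # The extreme value pulls the mean toward the tail; the median barely moves
    print(f"{name}: mean={mean:.2f}, median={median:.2f}")
```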

### 20. Define and explain kurtosis

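Kurtosis measures the 'tailedness' of a distribution, usually relative to a normal distribution, whose kurtosis is 3 (excess kurtosis 0). High kurtosis means heavy tails and more extreme values; low kurtosis means light tails.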

### 21. What is the purpose of covariance?

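Covariance indicates the direction of the linear relationship between two variables: it is positive when they tend to increase together and negative when one tends to decrease as the other increases.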

### 22. What does correlation measure in statistics?

Correlation measures the strength and direction of a linear relationship between two variables.

### 23. What is the difference between covariance and correlation?

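Covariance indicates direction, but its magnitude depends on the units of the variables, so it is hard to interpret on its own. Correlation standardizes covariance by dividing by both standard deviations, giving a unitless value between -1 and +1 that shows both strength and direction.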
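
The sketch below illustrates the unit dependence with made-up height and weight figures: converting heights from meters to centimeters multiplies the covariance by 100 but leaves the correlation unchanged. The helper names `covariance` and `correlation` and the data are assumptions for this example, not part of any library.

```python
import numpy as np

height_m = np.array([1.50, 1.60, 1.70, 1.80, 1.90])   # illustrative data
weight_kg = np.array([50.0, 58.0, 66.0, 77.0, 85.0])  # illustrative data
height_cm = height_m * 100                            # same heights, different unit


def covariance(x, y):
    """Sample covariance of two equal-length arrays."""
    return np.cov(x, y)[0, 1]


def correlation(x, y):
    """Pearson correlation: covariance divided by both sample standard deviations."""
    return covariance(x, y) / (np.std(x, ddof=1) * np.std(y, ddof=1))


print("Covariance (m, kg):  ", covariance(height_m, weight_kg))
print("Covariance (cm, kg): ", covariance(height_cm, weight_kg))   # scales with the unit
print("Correlation (m, kg): ", correlation(height_m, weight_kg))
print("Correlation (cm, kg):", correlation(height_cm, weight_kg))  # unit-free
```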

### 24. What are some real-world applications of statistics?

Statistics is used in healthcare, business, sports, politics, economics, machine learning, and many other fields to support data-driven decisions.

# Practical Questions

### 1. How do you calculate the mean, median, and mode of a dataset?

In [None]:
import numpy as np
from scipy import stats

data = [12, 15, 12, 18, 19, 21, 24, 24, 30, 34, 36, 36, 38, 40]
print("Mean:", np.mean(data))
print("Median:", np.median(data))
# 12, 24, and 36 each appear twice; stats.mode returns only the smallest tied value
print("Mode:", stats.mode(data, keepdims=True).mode[0])

### 2. Write a Python program to compute the variance and standard deviation of a dataset.

In [None]:
print("Variance:", np.var(data, ddof=1))
print("Standard Deviation:", np.std(data, ddof=1))

### 3. Create a dataset and classify it into nominal, ordinal, interval, and ratio types.

In [None]:
print("""
Nominal: Colors ['Red', 'Blue', 'Green']
Ordinal: Ratings ['Poor', 'Average', 'Good', 'Excellent']
Interval: Temperature in Celsius [20, 25, 30]
Ratio: Weight in kg [55.5, 60.2, 70.0]
""")

### 4. Implement sampling techniques like random sampling and stratified sampling.

In [None]:
import pandas as pd
population = np.arange(1, 101)
random_sample = np.random.choice(population, size=10, replace=False)
print("Random Sample:", random_sample)
df = pd.DataFrame({'ID': range(1, 101), 'Group': np.random.choice(['A', 'B'], 100)})
stratified_sample = df.groupby('Group').sample(n=5)  # draw 5 rows from each stratum
print("Stratified Sample:\n", stratified_sample)

### 5. Write a Python function to calculate the range of a dataset.

In [None]:
def data_range(values):
    return np.max(values) - np.min(values)  # largest value minus smallest value
print("Range:", data_range(data))

### 6. Create a dataset and plot its histogram to visualize skewness.

In [None]:
import matplotlib.pyplot as plt
plt.hist(data, bins=10)
plt.title("Histogram")
plt.show()

### 7. Calculate skewness and kurtosis of a dataset using Python libraries.

In [None]:
print("Skewness:", stats.skew(data))
print("Excess Kurtosis:", stats.kurtosis(data))  # Fisher definition: a normal distribution scores 0

### 8. Generate a dataset and demonstrate positive and negative skewness.

In [None]:
import seaborn as sns
pos_skew = [1, 2, 2, 3, 4, 10]
neg_skew = [1, 7, 8, 9, 9, 10]  # mirror image of pos_skew, so the long tail is on the left
sns.histplot(pos_skew, kde=True)
plt.title("Positive Skew")
plt.show()
sns.histplot(neg_skew, kde=True)
plt.title("Negative Skew")
plt.show()

### 9. Write a Python script to calculate covariance between two datasets.

In [None]:
data2 = np.random.randint(10, 50, len(data))
cov_matrix = np.cov(data, data2)  # 2x2 matrix: variances on the diagonal, covariance off it
print("Covariance:", cov_matrix[0, 1])

### 10. Write a Python script to calculate the correlation coefficient between two datasets.

In [None]:
print("Correlation Coefficient:", np.corrcoef(data, data2)[0, 1])

### 11. Create a scatter plot to visualize the relationship between two variables.

In [None]:
plt.scatter(data, data2)
plt.title("Scatter Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

### 12. Implement and compare simple random sampling and systematic sampling.

In [None]:
simple_random = np.random.choice(population, 10, replace=False)
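start = np.random.randint(0, 10)  # random starting point within the first interval
systematic = population[start::10]  # then every 10th member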
print("Simple Random:", simple_random)
print("Systematic:", systematic)

### 13. Calculate the mean, median, and mode of grouped data.

In [None]:
grouped = {'Class Interval': ['0-10', '10-20', '20-30'], 'Frequency': [5, 10, 15]}
df = pd.DataFrame(grouped)
midpoints = [5, 15, 25]
freq = df['Frequency']
print("Grouped Mean:", np.average(midpoints, weights=freq))
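
The median and mode of grouped data can be estimated with the usual textbook interpolation formulas, Median = L + ((N/2 - cf) / f) * h and Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h. The sketch below applies them to the same class intervals and frequencies. The variable names are assumptions, and it assumes equal class widths.

```python
import numpy as np

lower_bounds = np.array([0, 10, 20])  # lower limit of each class interval
frequencies = np.array([5, 10, 15])
width = 10                            # every class has the same width

n = frequencies.sum()
cumulative = np.cumsum(frequencies)

# Median class: first class whose cumulative frequency reaches n / 2
m = int(np.searchsorted(cumulative, n / 2))
cf_before = cumulative[m - 1] if m > 0 else 0
grouped_median = lower_bounds[m] + (n / 2 - cf_before) / frequencies[m] * width

# Modal class: the class with the highest frequency
k = int(np.argmax(frequencies))
f1 = frequencies[k]
f0 = frequencies[k - 1] if k > 0 else 0                     # class before the modal class
f2 = frequencies[k + 1] if k < len(frequencies) - 1 else 0  # class after the modal class
grouped_mode = lower_bounds[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width

print("Grouped Median:", grouped_median)
print("Grouped Mode:", grouped_mode)
```

Both results are estimates: they assume values are spread evenly within each class, which the raw data may not satisfy.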

### 14. Simulate data using Python and calculate its central tendency and dispersion.

In [None]:
sim_data = np.random.normal(50, 10, 1000)
print("Mean:", np.mean(sim_data))
print("Median:", np.median(sim_data))
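# Values in a continuous sample almost never repeat, so estimate the mode from the tallest histogram bin
counts, edges = np.histogram(sim_data, bins=30)
peak = np.argmax(counts)
print("Mode (histogram estimate):", (edges[peak] + edges[peak + 1]) / 2)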
print("STD:", np.std(sim_data, ddof=1))

### 15. Use NumPy or pandas to summarize a dataset’s descriptive statistics.

In [None]:
df_summary = pd.DataFrame({'Data': data})
print(df_summary.describe())

### 16. Plot a boxplot to understand the spread and identify outliers.

In [None]:
sns.boxplot(x=data)  # pass the values explicitly as x to get a single horizontal box
plt.title("Boxplot")
plt.show()

### 17. Calculate the interquartile range (IQR) of a dataset.

In [None]:
q75, q25 = np.percentile(data, [75, 25])
print("IQR:", q75 - q25)
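
A common use of the IQR is the 1.5 * IQR rule for flagging outliers, which is also how box plot whiskers are usually drawn. The sketch below applies it to the dataset from question 1 with one extra value, 95, appended as a deliberately extreme point; that value and the variable names are assumptions for illustration.

```python
import numpy as np

# Dataset from question 1 plus one artificially extreme value (95)
values = np.array([12, 15, 12, 18, 19, 21, 24, 24, 30, 34, 36, 36, 38, 40, 95])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Fences: anything beyond 1.5 * IQR from the quartiles is flagged
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```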

### 18. Implement Z-score normalization and explain its significance.

In [None]:
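# z = (x - mean) / std rescales the data to mean 0 and standard deviation 1,
# so values measured in different units or scales become directly comparable.
z_scores = stats.zscore(data)
print("Z-Scores:", z_scores)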
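
For reference, the same normalization can be written by hand with NumPy, as in this minimal sketch. It uses the population standard deviation (`ddof=0`), which is also the default in `scipy.stats.zscore`; the variable names are assumptions.

```python
import numpy as np

values = np.array([12, 15, 12, 18, 19, 21, 24, 24, 30, 34, 36, 36, 38, 40])

# Subtract the mean, then divide by the population standard deviation
z = (values - values.mean()) / values.std()

print("Z-Scores:", np.round(z, 3))
print("Mean of z-scores:", z.mean())  # approximately 0
print("Std of z-scores:", z.std())    # approximately 1
```

After normalization, a z-score of 2 means "two standard deviations above the mean" whatever the original unit was, which is why the technique is common before comparing features or feeding them to scale-sensitive models.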

### 19. Compare two datasets using their standard deviations.

In [None]:
data1 = np.random.normal(50, 5, 100)
data2 = np.random.normal(50, 20, 100)
print("STD Dataset 1:", np.std(data1, ddof=1))
print("STD Dataset 2:", np.std(data2, ddof=1))

### 20. Write a Python program to visualize covariance using a heatmap.

In [None]:
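# data2 now has 100 values (reassigned in question 19), so pair it with data1 rather than the 14-value data
df_cov = pd.DataFrame({'A': data1, 'B': data2})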
sns.heatmap(df_cov.cov(), annot=True)
plt.title("Covariance Heatmap")
plt.show()

### 21. Use seaborn to create a correlation matrix for a dataset.

In [None]:
sns.heatmap(df_cov.corr(), annot=True)
plt.title("Correlation Matrix")
plt.show()

### 22. Visualize skewness and kurtosis using Python libraries like matplotlib or seaborn.

In [None]:
sns.histplot(data, kde=True)
plt.title(f"Skewness = {stats.skew(data):.2f}, Excess Kurtosis = {stats.kurtosis(data):.2f}")
plt.show()

### 23. Implement the Pearson and Spearman correlation coefficients for a dataset.

In [None]:
df_corr = pd.DataFrame({'X': np.random.rand(100), 'Y': np.random.rand(100)})
print("Pearson:\n", df_corr.corr(method='pearson'))
print("Spearman:\n", df_corr.corr(method='spearman'))
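
Because the two random columns above are unrelated, both coefficients come out near zero. The sketch below uses a made-up monotonic but nonlinear relationship to show where the two measures differ, and that Spearman's coefficient equals Pearson's coefficient computed on ranks. The seed, the exponential relationship, and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x = rng.random(50)
y = np.exp(5 * x)               # always increasing in x, but not linear

df = pd.DataFrame({"X": x, "Y": y})

pearson = df.corr(method="pearson").loc["X", "Y"]
spearman = df.corr(method="spearman").loc["X", "Y"]

# Spearman is Pearson applied to the ranks of the data
spearman_via_ranks = df.rank().corr(method="pearson").loc["X", "Y"]

print("Pearson: ", pearson)
print("Spearman:", spearman)
print("Pearson on ranks:", spearman_via_ranks)
```

Pearson falls below 1 because the relationship is curved, while Spearman reaches 1 because the ordering of `X` and `Y` agrees perfectly.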