# Statistics Basics

1. What is statistics, and why is it important?
-  Statistics is the science of collecting, analyzing, interpreting, and presenting data. It is important because it helps in making informed decisions, identifying patterns, and predicting future trends.

2. What are the two main types of statistics?
-  Descriptive statistics and inferential statistics.

3. What are descriptive statistics?
-  Descriptive statistics summarize and describe the main features of a dataset using numerical measures such as the mean, median, mode, and variance, together with visualizations such as histograms and boxplots.

4. What is inferential statistics?
-  Inferential statistics use data from a sample to make generalizations, predictions, or inferences about a population.

5. What is sampling in statistics?
-  Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole population.

6. What are the different types of sampling methods?
-  Simple random sampling, stratified sampling, systematic sampling, cluster sampling, and convenience sampling.

7. What is the difference between random and non-random sampling?
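-  Random (probability) sampling selects members by chance, so every member of the population has a known, non-zero probability of selection (an equal probability in simple random sampling). Non-random sampling relies on convenience or the researcher's judgment, so selection probabilities are unknown.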

8. Define and give examples of qualitative and quantitative data.
-  Qualitative data describes categories or qualities (e.g., colors, names). Quantitative data represents numerical values (e.g., height, weight).

9. What are the different types of data in statistics?
-  Nominal, ordinal, interval, and ratio data.

10. Explain nominal, ordinal, interval, and ratio levels of measurement.
-  Nominal: categories without order (e.g., colors).
-  Ordinal: ordered categories (e.g., rankings).
-  Interval: numeric scales with equal intervals but no true zero (e.g., temperature in °C).
-  Ratio: numeric scales with equal intervals and a true zero (e.g., weight).

11. What is the measure of central tendency?
-  It is a measure that identifies the center or typical value of a dataset.

12. Define mean, median, and mode.
-  Mean: average of values.
-  Median: middle value when data is ordered.
-  Mode: most frequent value.

13. What is the significance of the measure of central tendency?
-  It provides a single value that represents the entire dataset, useful for comparison and summarization.

14. What is variance, and how is it calculated?
-  Variance measures the average squared deviation from the mean. It is calculated as the sum of squared differences from the mean divided by the number of observations n for a population, or by n - 1 for a sample.
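
The two divisors can be compared with a small from-scratch sketch. The helper name `variance_from_scratch` and the eight values are illustrative assumptions, not part of the original notes.

```python
def variance_from_scratch(values, sample=True):
    """Return the variance; divide by n - 1 when `sample` is True, else by n."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    divisor = n - 1 if sample else n
    return sum(squared_deviations) / divisor

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data with mean 5

population_variance = variance_from_scratch(values, sample=False)
sample_variance = variance_from_scratch(values, sample=True)

print(population_variance)  # sum of squared deviations (32) divided by 8
print(sample_variance)      # the same sum divided by 7, so slightly larger
```

The sample version is always the larger of the two, which compensates for the fact that deviations are measured from the sample mean rather than the unknown population mean.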

15. What is standard deviation, and why is it important?
-  Standard deviation is the square root of variance and measures how spread out the values are. It is important for understanding variability in data.

16. Define and explain the term range in statistics.
-  Range is the difference between the maximum and minimum values in a dataset.

17. What is the difference between variance and standard deviation?
-  Variance is in squared units; standard deviation is in the original units of the data.

18. What is skewness in a dataset?
-  Skewness measures the asymmetry of the data distribution.

19. What does it mean if a dataset is positively or negatively skewed?
-  Positive skew: tail on the right side is longer.
-  Negative skew: tail on the left side is longer.

20. Define and explain kurtosis.
-  Kurtosis measures the "tailedness" of the distribution compared to a normal distribution.

21. What is the purpose of covariance?
-  To measure how two variables change together.

22. What does correlation measure in statistics?
-  It measures the strength and direction of the relationship between two variables.

23. What is the difference between covariance and correlation?
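-  Covariance measures joint variability and is expressed in the product of the two variables' units; correlation standardizes it by dividing by both standard deviations, giving a unitless value between -1 and 1.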
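
A minimal sketch of that standardization, using made-up height and weight values (the helper names `sample_covariance` and `pearson_correlation` are hypothetical). Changing the unit of one variable rescales the covariance but leaves the correlation unchanged.

```python
import math

def sample_covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def pearson_correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    std_x = math.sqrt(sample_covariance(x, x))
    std_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (std_x * std_y)

heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]  # illustrative values
weights_kg = [50, 58, 66, 77, 85]           # illustrative values
heights_cm = [h * 100 for h in heights_m]   # same heights, different unit

cov_m = sample_covariance(heights_m, weights_kg)
cov_cm = sample_covariance(heights_cm, weights_kg)
corr_m = pearson_correlation(heights_m, weights_kg)
corr_cm = pearson_correlation(heights_cm, weights_kg)

print(cov_m, cov_cm)    # covariance grows by a factor of 100
print(corr_m, corr_cm)  # correlation is the same in both units
```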

24. What are some real-world applications of statistics?
-  Business forecasting, quality control, medical research, market analysis, sports performance analysis.

# Practical

1. How do you calculate the mean, median, and mode of a dataset?

In [None]:
import numpy as np
from scipy import stats

data = [12, 15, 20, 22, 25, 28, 30]
mean = np.mean(data)
median = np.median(data)
# No value repeats in `data`, so every value ties and stats.mode reports just one of them.
mode = stats.mode(data, keepdims=True).mode[0]
print(mean, median, mode)
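
When several values tie for most frequent, a single mode can be misleading. As a hedged alternative, the standard library's `statistics.multimode` returns every tied value; the list `repeated_values` below is an illustrative assumption.

```python
from statistics import multimode

unique_values = [12, 15, 20, 22, 25, 28, 30]    # every value occurs once
repeated_values = [12, 15, 15, 20, 22, 22, 30]  # 15 and 22 both occur twice

all_tied = multimode(unique_values)     # all seven values tie for most frequent
two_modes = multimode(repeated_values)  # both modes are returned

print(all_tied)
print(two_modes)
```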


2. Write a Python program to compute the variance and standard deviation of a dataset.

In [None]:
variance = np.var(data, ddof=1)  # sample variance
std_dev = np.std(data, ddof=1)   # sample std deviation
print(variance, std_dev)


3. Create a dataset and classify it into nominal, ordinal, interval, and ratio types.

In [None]:
nominal = ["Red", "Blue", "Green"]      # categories
ordinal = ["Small", "Medium", "Large"]  # ranked order
interval = [10, 20, 30]                 # e.g., temperatures in °C: equal intervals, no true zero
ratio = [5, 10, 15]                     # e.g., weights in kg: equal intervals, true zero
print(nominal, ordinal, interval, ratio)
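
The practical difference between interval and ratio data is whether ratios are meaningful. The sketch below, with illustrative temperatures and weights, shows that "twice as much" survives a unit change only on a ratio scale. The pound conversion factor is approximate.

```python
# Interval scale: temperature in °C. Differences are meaningful, ratios are not.
celsius = [10.0, 20.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

celsius_ratio = celsius[1] / celsius[0]
fahrenheit_ratio = fahrenheit[1] / fahrenheit[0]

# Ratio scale: weight. Zero means "no weight", so ratios survive a unit change.
kilograms = [5.0, 10.0]
pounds = [kg * 2.20462 for kg in kilograms]  # approximate conversion factor

kilogram_ratio = kilograms[1] / kilograms[0]
pound_ratio = pounds[1] / pounds[0]

print(celsius_ratio, fahrenheit_ratio)  # the ratio changes with the unit
print(kilogram_ratio, pound_ratio)      # the ratio stays the same
```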


4. Implement sampling techniques like random sampling and stratified sampling.

In [None]:
import pandas as pd

df = pd.DataFrame({'id': range(1, 11), 'group': ['A','B']*5})
random_sample = df.sample(n=4)                       # simple random sample of 4 rows
stratified_sample = df.groupby('group').sample(n=2)  # 2 random rows from each group
print(random_sample)
print(stratified_sample)


5. Write a Python function to calculate the range of a dataset.

In [None]:
def data_range(x):
    return max(x) - min(x)
print(data_range(data))


6. Create a dataset and plot its histogram to visualize skewness.

In [None]:
import matplotlib.pyplot as plt
plt.hist(data, bins=5, edgecolor='black')
plt.show()


7. Calculate skewness and kurtosis of a dataset using Python libraries.

In [None]:
print(stats.skew(data))      # 0 for a perfectly symmetric distribution
print(stats.kurtosis(data))  # excess (Fisher) kurtosis: 0 for a normal distribution


8. Generate a dataset and demonstrate positive and negative skewness.

In [None]:
pos_skew = np.random.exponential(size=1000)
neg_skew = -np.random.exponential(size=1000)
plt.hist(pos_skew, bins=30); plt.show()
plt.hist(neg_skew, bins=30); plt.show()
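
The histograms show the direction of the tail visually; the sign can also be checked numerically. The sketch below uses only the standard library and a hypothetical helper `moment_skewness`, which follows the moment-based definition that `scipy.stats.skew` uses by default. The seed is fixed so the run is repeatable.

```python
import random

def moment_skewness(values):
    """Third central moment divided by the cubed (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)  # fixed seed, chosen arbitrarily
right_tailed = [rng.expovariate(1.0) for _ in range(1000)]
left_tailed = [-x for x in right_tailed]  # mirroring the data flips the tail

right_skew = moment_skewness(right_tailed)
left_skew = moment_skewness(left_tailed)

print(right_skew)  # positive: long tail on the right
print(left_skew)   # negative: long tail on the left
```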


9. Write a Python script to calculate covariance between two datasets.

In [None]:
data2 = [5, 7, 12, 14, 18, 20, 24]
cov = np.cov(data, data2)[0, 1]
print(cov)


10. Write a Python script to calculate the correlation coefficient between two datasets.

In [None]:
corr = np.corrcoef(data, data2)[0, 1]
print(corr)


11. Create a scatter plot to visualize the relationship between two variables.

In [None]:
plt.scatter(data, data2)
plt.xlabel("Data 1")
plt.ylabel("Data 2")
plt.show()


12. Implement and compare simple random sampling and systematic sampling.

In [None]:
random_sample = np.random.choice(data, size=3, replace=False)
systematic_sample = data[::2]  # every 2nd value, always starting from the first
print(random_sample, systematic_sample)
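
Systematic sampling is usually described with a random starting point inside the first interval, so that the first element is not always selected. A minimal sketch, assuming a hypothetical helper `systematic_sampling` and a population of 20 numbered units:

```python
import random

def systematic_sampling(population, k, rng=random):
    """Pick a random start among the first k items, then take every k-th item."""
    start = rng.randrange(k)
    return population[start::k]

population = list(range(1, 21))  # illustrative population of 20 units
rng = random.Random(42)          # fixed seed, chosen arbitrarily

sample = systematic_sampling(population, k=5, rng=rng)
print(sample)  # 4 units, each 5 positions apart
```

Compared with simple random sampling, only the starting point is random; every later selection is determined by the interval `k`.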


13. Calculate the mean, median, and mode of grouped data.

In [None]:

# Note: these are raw (ungrouped) values, so pandas computes each statistic directly.
df_grouped = pd.DataFrame({'values': data})
print(df_grouped['values'].mean(), df_grouped['values'].median(), df_grouped['values'].mode()[0])
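
When the data is available only as class intervals with frequencies, the usual textbook interpolation formulas apply instead. The sketch below assumes a hypothetical frequency table with equal-width classes and a single modal class.

```python
# Hypothetical frequency table: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]

total = sum(f for _, _, f in classes)

# Mean: weight each class midpoint by its frequency.
grouped_mean = sum((lo + hi) / 2 * f for lo, hi, f in classes) / total

# Median: interpolate inside the class where the cumulative frequency reaches total / 2.
cumulative = 0
for lo, hi, f in classes:
    if cumulative + f >= total / 2:
        grouped_median = lo + (total / 2 - cumulative) / f * (hi - lo)
        break
    cumulative += f

# Mode: interpolate inside the class with the highest frequency,
# using the frequencies of the neighbouring classes (0 at either end).
i = max(range(len(classes)), key=lambda j: classes[j][2])
lo, hi, f1 = classes[i]
f0 = classes[i - 1][2] if i > 0 else 0
f2 = classes[i + 1][2] if i < len(classes) - 1 else 0
grouped_mode = lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)

print(grouped_mean, grouped_median, grouped_mode)
```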


14. Simulate data using Python and calculate its central tendency and dispersion.

In [None]:
sim_data = np.random.randint(1, 100, 20)
print(np.mean(sim_data), np.median(sim_data), stats.mode(sim_data, keepdims=True).mode[0])
print(np.var(sim_data, ddof=1), np.std(sim_data, ddof=1))


15. Use NumPy or pandas to summarize a dataset’s descriptive statistics.

In [None]:
df = pd.DataFrame({'Data': data})
print(df.describe())


16. Plot a boxplot to understand the spread and identify outliers.

In [None]:
import seaborn as sns
sns.boxplot(x=data)
plt.show()


17. Calculate the interquartile range (IQR) of a dataset.

In [None]:

Q1, Q3 = np.percentile(data, [25, 75])
IQR = Q3 - Q1
print(IQR)
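
The IQR is commonly used to flag outliers with the 1.5 × IQR rule (Tukey's fences). The sketch below appends one illustrative extreme value, 95, to the notebook's data and uses the standard library's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points.

```python
from statistics import quantiles

values = [12, 15, 20, 22, 25, 28, 30, 95]  # notebook data plus one illustrative extreme value

q1, _, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Values outside these fences are flagged as potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(q1, q3, iqr)
print(outliers)
```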


18. Implement Z-score normalization and explain its significance.

In [None]:
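# Z-scores rescale the data to mean 0 and standard deviation 1,
# so values measured on different scales become directly comparable.
z_scores = stats.zscore(data)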
print(z_scores)
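
To see why the result is comparable across scales, the same normalization can be written out by hand. This sketch assumes the population standard deviation (dividing by n), which matches the default of `stats.zscore`; the variable names are illustrative.

```python
import math

values = [12, 15, 20, 22, 25, 28, 30]

mu = sum(values) / len(values)
sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))  # population std

# Each z-score is the distance from the mean measured in standard deviations.
manual_z = [(x - mu) / sigma for x in values]

z_mean = sum(manual_z) / len(manual_z)
z_std = math.sqrt(sum((z - z_mean) ** 2 for z in manual_z) / len(manual_z))

print(manual_z)
print(z_mean, z_std)  # approximately 0 and 1
```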


19. Compare two datasets using their standard deviations.

In [None]:
# The dataset with the larger standard deviation is more spread out around its mean.
print(np.std(data, ddof=1), np.std(data2, ddof=1))


20. Write a Python program to visualize covariance using a heatmap.

In [None]:

cov_matrix = np.cov(data, data2)
sns.heatmap(cov_matrix, annot=True, cmap="coolwarm")
plt.show()


21. Use seaborn to create a correlation matrix for a dataset.

In [None]:
df_corr = pd.DataFrame({'Data1': data, 'Data2': data2})
sns.heatmap(df_corr.corr(), annot=True, cmap="coolwarm")
plt.show()


22. Generate a dataset and implement both variance and standard deviation computations.

In [None]:
rand_data = np.random.randint(1, 50, 10)
print(np.var(rand_data, ddof=1), np.std(rand_data, ddof=1))


23. Visualize skewness and kurtosis using Python libraries like matplotlib or seaborn.

In [None]:
sns.histplot(data, kde=True)
plt.show()
print(stats.skew(data), stats.kurtosis(data))


24. Implement the Pearson and Spearman correlation coefficients for a dataset.

In [None]:
pearson_corr, _ = stats.pearsonr(data, data2)
spearman_corr, _ = stats.spearmanr(data, data2)
print(pearson_corr, spearman_corr)
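
The two coefficients differ most clearly on a relationship that is monotonic but not linear. The from-scratch sketch below uses hypothetical helpers (`pearson`, `ranks`, `spearman`) and assumes the data contains no tied values, so ranks can be assigned without averaging.

```python
import math

def pearson(x, y):
    """Pearson correlation: measures linear association."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = list(range(1, 11))
y = [v ** 3 for v in x]  # strictly increasing, but curved

pearson_value = pearson(x, y)
spearman_value = spearman(x, y)

print(pearson_value)   # below 1, because the relationship is not a straight line
print(spearman_value)  # 1 up to rounding, because the ordering is perfectly preserved
```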
