1. What is statistics, and why is it important?
   Statistics is the study of collecting, analyzing, interpreting, presenting, and organizing data. It's important for making informed decisions based on data.

2. What are the two main types of statistics?
   Descriptive statistics and inferential statistics.

3. What are descriptive statistics?
   Descriptive statistics summarize and describe the features of a dataset (e.g., mean, median, mode).

4. What is inferential statistics?
   Inferential statistics use a sample to make predictions or generalizations about a population.

5. What is sampling in statistics?
   Sampling is selecting a subset of individuals from a population to estimate characteristics of the whole group.

6. What are the different types of sampling methods?
   Random, stratified, systematic, cluster, and convenience sampling.

7. What is the difference between random and non-random sampling?
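   Random (probability) sampling selects individuals by chance, so each has a known, nonzero probability of selection (an equal one in simple random sampling); non-random sampling, such as convenience sampling, relies on availability or judgment instead.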

8. Define and give examples of qualitative and quantitative data.

   * Qualitative: Categorical data (e.g., colors, names).
   * Quantitative: Numerical data (e.g., height, weight).

9. What are the different types of data in statistics?
   Nominal, ordinal, interval, and ratio.

10. Explain nominal, ordinal, interval, and ratio levels of measurement.

* Nominal: Categories (e.g., gender).
* Ordinal: Order matters (e.g., ranks).
* Interval: Equal spacing, no true zero (e.g., temperature in °C or °F).
* Ratio: Has a true zero (e.g., age, height).
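
The four levels above differ in which operations are meaningful. The following is a minimal sketch using only the standard library; the variable names and values are made up for illustration and are not part of the original exercises.

```python
from collections import Counter

# Nominal: labels only, so counting and equality checks are the meaningful operations.
colors = ["red", "blue", "red"]
color_counts = Counter(colors)

# Ordinal: labels with an order, so comparisons such as max() make sense
# once the order is stated explicitly.
rank = {"low": 0, "medium": 1, "high": 2}
ratings = ["low", "high", "medium"]
highest_rating = max(ratings, key=rank.get)

# Interval: differences are meaningful, ratios are not, because 0 °C is an
# arbitrary zero. Converting to kelvin (a ratio scale) shows the real ratio.
temp_diff_c = 20.0 - 10.0
kelvin_ratio = (20.0 + 273.15) / (10.0 + 273.15)

# Ratio: a true zero makes ratios meaningful (10 kg is twice 5 kg).
weight_ratio = 10.0 / 5.0

print("Count of 'red':", color_counts["red"])
print("Highest rating:", highest_rating)
print("Difference in °C:", temp_diff_c)
print("Ratio of the two temperatures in kelvin:", round(kelvin_ratio, 3))
print("Weight ratio:", weight_ratio)
```

The kelvin ratio comes out close to 1, not 2, which is why "20 °C is twice as hot as 10 °C" is not a valid statement on an interval scale.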

11. What is a measure of central tendency?
    A single value that indicates the center, or typical value, of a dataset.

12. Define mean, median, and mode.

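* Mean: Sum of the values divided by the number of values.
* Median: Middle value of the sorted data (the average of the two middle values when the count is even).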
* Mode: Most frequent value.
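
The three measures react differently to extreme values. As a rough sketch, the example below reuses the small dataset from the practical section and appends one outlier (the value 100 is an assumption chosen for illustration) to show that the mean shifts while the median and mode stay put.

```python
import statistics

data = [5, 8, 6, 8, 10, 2, 8]
with_outlier = data + [100]  # hypothetical extreme value added for illustration

for label, values in [("Original", data), ("With outlier", with_outlier)]:
    print(label)
    print("  Mean:  ", statistics.mean(values))
    print("  Median:", statistics.median(values))
    print("  Mode:  ", statistics.mode(values))
```

This is one reason the median is often preferred as the "typical value" for skewed data.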

13. What is the significance of the measure of central tendency?
    It helps understand the typical value in a dataset.

14. What is variance, and how is it calculated?
    Variance measures how spread out the data is. It is the average of the squared differences from the mean: divide by N for a population, or by n - 1 for a sample.

15. What is standard deviation, and why is it important?
    Standard deviation is the square root of variance; it shows how spread out the data is.

16. Define and explain the term range in statistics.
    Range is the difference between the highest and lowest values.

17. What is the difference between variance and standard deviation?
    Variance is expressed in squared units; standard deviation is in the original units of the data.

18. What is skewness in a dataset?
    Skewness measures the asymmetry of the data distribution.

19. What does it mean if a dataset is positively or negatively skewed?

* Positively skewed: Tail on the right.
* Negatively skewed: Tail on the left.
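
The sign convention above can be checked with the moment-based skewness coefficient (third central moment divided by the cubed standard deviation). The helper below is a hand-written sketch, not a library function; it uses the population (biased) form, which to my understanding is also the default of `scipy.stats.skew`.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 5, 6, 20]  # long tail on the right
left_tailed = [-v for v in right_tailed]        # mirror image: long tail on the left

print("Right-tailed skewness:", skewness(right_tailed))  # positive
print("Left-tailed skewness: ", skewness(left_tailed))   # negative
```

Mirroring the data flips the sign of the coefficient but not its magnitude.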

20. Define and explain kurtosis.
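    Kurtosis measures the "tailedness" of a distribution: how much of its variability comes from extreme values in the tails. It is often described as peak sharpness, but it mainly reflects tail weight.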

21. What is the purpose of covariance?
    Covariance shows how two variables change together.

22. What does correlation measure in statistics?
    Correlation measures the strength and direction of a linear relationship between two variables.

23. What is the difference between covariance and correlation?
    Covariance is unstandardized; correlation is standardized (ranges from -1 to 1).

24. What are some real-world applications of statistics?
    Medicine, economics, education, business, government policies, sports analytics, and more.
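
The distinction in question 23 between covariance and correlation can be verified numerically. The sketch below uses two hand-written helper functions (`sample_cov` and `pearson_r` are illustrative names, not library calls) and rescales one variable by an arbitrary factor of 100 to show that covariance depends on units while correlation does not.

```python
import math

def sample_cov(a, b):
    """Sample covariance (divides by n - 1)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

def pearson_r(a, b):
    """Correlation = covariance divided by the product of the standard deviations."""
    return sample_cov(a, b) / math.sqrt(sample_cov(a, a) * sample_cov(b, b))

x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]
x_scaled = [100 * v for v in x]  # same data in different units, e.g. m -> cm

print("cov(x, y):        ", sample_cov(x, y))
print("cov(x_scaled, y): ", sample_cov(x_scaled, y))
print("corr(x, y):       ", pearson_r(x, y))
print("corr(x_scaled, y):", pearson_r(x_scaled, y))
```

The covariance grows by the same factor as the rescaling, whereas the correlation is unchanged and stays within -1 to 1.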


import statistics
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

np.random.seed(42)  # fixed seed (arbitrary choice) so the random examples below are reproducible

#1. How do you calculate the mean, median, and mode of a dataset
data = [5, 8, 6, 8, 10, 2, 8]
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)

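#2. Compute the variance and standard deviation of a dataset
# statistics.variance/stdev use the sample formulas (divide by n - 1);
# statistics.pvariance/pstdev give the population versions (divide by n).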
variance = statistics.variance(data)
stdev = statistics.stdev(data)
print("Variance:", variance)
print("Standard Deviation:", stdev)

#3. Create a dataset and classify into nominal, ordinal, interval, and ratio
print("\n#3 Data Types:")
print("Nominal: ['Red', 'Blue', 'Green']")
print("Ordinal: ['Low', 'Medium', 'High']")
print("Interval: [10°C, 20°C, 30°C] (No true zero)")
print("Ratio: [5kg, 10kg, 15kg] (Has true zero)")

#4. Implement random and stratified sampling
df = pd.DataFrame({'Gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
                   'Score': [88, 92, 85, 95, 90, 91, 87, 93]})
random_sample = df.sample(n=4, random_state=1)
# Draw 2 rows from each gender stratum; random_state makes the draw reproducible
stratified_sample = df.groupby('Gender').sample(n=2, random_state=1)
print("\nRandom Sample:\n", random_sample)
print("Stratified Sample:\n", stratified_sample)

#5. Function to calculate range
def calculate_range(data):
    return max(data) - min(data)
print("\nRange:", calculate_range(data))

#6. Plot histogram to visualize skewness
skewed_data = [1, 2, 2, 3, 3, 3, 4, 5, 6, 20]
plt.hist(skewed_data, bins=10)
plt.title("Histogram to Visualize Skewness")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

#7. Calculate skewness and kurtosis
print("Skewness:", stats.skew(skewed_data))
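# stats.kurtosis returns excess kurtosis by default (Fisher definition: normal distribution = 0)
print("Excess Kurtosis:", stats.kurtosis(skewed_data))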

#8. Generate and demonstrate positive and negative skewness
pos_skew = np.random.exponential(scale=2, size=1000)
neg_skew = np.random.beta(a=5, b=1, size=1000)

plt.hist(pos_skew, bins=30)
plt.title("Positively Skewed")
plt.show()

plt.hist(neg_skew, bins=30)
plt.title("Negatively Skewed")
plt.show()

#9. Calculate covariance between two datasets
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]
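# np.cov returns a 2x2 matrix: variances on the diagonal, cov(x, y) off the diagonal (sample, n - 1)
cov_matrix = np.cov(x, y)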
print("Covariance:\n", cov_matrix)

#10. Correlation coefficient between two datasets
corr_coeff = np.corrcoef(x, y)
print("Correlation Coefficient:\n", corr_coeff)

#11. Create scatter plot to visualize relationship
plt.scatter(x, y)
plt.title("Scatter Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

#12. Compare simple random sampling and systematic sampling
pop = list(range(1, 21))
simple_random = np.random.choice(pop, 5, replace=False)
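step = len(pop) // 5                 # sampling interval k for a sample of 5
start = np.random.randint(step)      # random start within the first interval
systematic = pop[start::step]        # every k-th element from the random start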
print("Simple Random Sample:", simple_random)
print("Systematic Sample:", systematic)

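#13. Mean, median, and mode of grouped data
lower, width = [0, 10, 20, 30], 10  # class lower limits and common width (0-10, ..., 30-40)
mid_points = [5, 15, 25, 35]
frequencies = [5, 8, 10, 7]
mean_grouped = np.average(mid_points, weights=frequencies)
# Median: interpolate inside the class where the cumulative frequency reaches N/2
cum, half = np.cumsum(frequencies), sum(frequencies) / 2
m = int(np.argmax(cum >= half))
median_grouped = lower[m] + (half - (cum[m] - frequencies[m])) / frequencies[m] * width
# Mode: grouped-data formula on the modal class, padding both ends with zero frequency
f = [0] + frequencies + [0]
k = int(np.argmax(frequencies)) + 1
mode_grouped = lower[k - 1] + (f[k] - f[k - 1]) / (2 * f[k] - f[k - 1] - f[k + 1]) * width
print("Grouped Mean:", mean_grouped)
print("Grouped Median:", median_grouped)
print("Grouped Mode:", mode_grouped)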

#14. Simulate data and calculate central tendency and dispersion
sim_data = np.random.normal(loc=50, scale=10, size=100)
print("Simulated Mean:", np.mean(sim_data))
print("Simulated Median:", np.median(sim_data))
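# ddof=1 gives the sample standard deviation (np.std defaults to ddof=0, the population version)
print("Simulated Std Dev:", np.std(sim_data, ddof=1))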

#15. Use NumPy or pandas to summarize a dataset’s descriptive stats
df2 = pd.DataFrame({'Values': sim_data})
print("\nDescriptive Statistics:\n", df2.describe())

#16. Plot boxplot to understand spread and identify outliers
sns.boxplot(data=sim_data)
plt.title("Boxplot")
plt.show()

#17. Calculate IQR
Q1 = np.percentile(sim_data, 25)
Q3 = np.percentile(sim_data, 75)
IQR = Q3 - Q1
print("Interquartile Range (IQR):", IQR)

#18. Implement Z-score normalization
z_scores = stats.zscore(sim_data)
print("Z-score of first 5 values:\n", z_scores[:5])

#19. Compare two datasets using std deviation
data1 = np.random.normal(50, 5, 100)
data2 = np.random.normal(50, 15, 100)
print("Std Dev Data1:", np.std(data1))
print("Std Dev Data2:", np.std(data2))

#20. Visualize covariance using a heatmap
df_cov = pd.DataFrame({'X': x, 'Y': y})
sns.heatmap(df_cov.cov(), annot=True)
plt.title("Covariance Heatmap")
plt.show()

#21. Create correlation matrix using seaborn
sns.heatmap(df_cov.corr(), annot=True, cmap='coolwarm')
plt.title("Correlation Matrix")
plt.show()

#22. Generate dataset and compute variance and std deviation
sample_data = np.random.randint(10, 100, size=20)
print("Sample Data:", sample_data)
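# ddof=1 gives the sample variance and standard deviation (NumPy's default ddof=0 is the population version)
print("Variance:", np.var(sample_data, ddof=1))
print("Standard Deviation:", np.std(sample_data, ddof=1))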

#23. Visualize skewness and kurtosis with seaborn
sns.histplot(sample_data, kde=True)
plt.title("Skewness and Kurtosis Visualization")
plt.show()

#24. Pearson and Spearman correlation coefficients
pearson_corr, _ = stats.pearsonr(x, y)
spearman_corr, _ = stats.spearmanr(x, y)
print("Pearson Correlation:", pearson_corr)
print("Spearman Correlation:", spearman_corr)
