
# Questions and Answers:

# 1. What is statistics, and why is it important?
   - Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data. It helps in making informed decisions, understanding trends, and predicting future outcomes based on data.

# 2. What are the two main types of statistics?

* Descriptive Statistics
* Inferential Statistics

# 3. What are descriptive statistics?
  - Descriptive statistics summarize and describe the main features of a dataset using measures like mean, median, mode, range, and standard deviation.

# 4. What is inferential statistics?
   - Inferential statistics involves making predictions or inferences about a population based on a sample of data. It uses techniques such as hypothesis testing and confidence intervals.

# 5. What is sampling in statistics?
  - Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole population.

# 6. What are the different types of sampling methods?

* Random Sampling
* Stratified Sampling
* Systematic Sampling
* Cluster Sampling
* Convenience Sampling
* Judgmental or Purposive Sampling

# 7. What is the difference between random and non-random sampling?

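* Random Sampling: Selection is left to chance, so every individual has a known, non-zero probability of being chosen; in simple random sampling that probability is equal for everyone (e.g., lottery draw).
* Non-Random Sampling: Selection is based on the researcher's criteria or convenience, so selection probabilities are unknown (e.g., surveying only nearby people).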

# 8. Define and give examples of qualitative and quantitative data.

* Qualitative Data: Non-numerical information (e.g., color of cars, types of cuisine)
* Quantitative Data: Numerical data (e.g., height in cm, number of students)

# 9. What are the different types of data in statistics?

* Qualitative (Categorical): Nominal and Ordinal
* Quantitative (Numerical): Discrete and Continuous

# 10. Explain nominal, ordinal, interval, and ratio levels of measurement.

* Nominal: Categories without order (e.g., gender, blood group)
* Ordinal: Categories with a specific order (e.g., ranking in a race)
* Interval: Numeric scales with equal intervals, no true zero (e.g., temperature in °C)
* Ratio: Like interval, but with a true zero (e.g., weight, height)

# 11. What is the measure of central tendency?
  - It refers to the central or typical value in a dataset, including mean, median, and mode.

# 12. Define mean, median, and mode.

* Mean: Average of all values
* Median: Middle value when data is ordered (the average of the two middle values when the count is even)
* Mode: Most frequently occurring value

# 13. What is the significance of the measure of central tendency?
-  It helps to summarize a large dataset with a single representative value, making data easier to understand and compare.

# 14. What is variance, and how is it calculated?
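- Variance measures how far data points spread from the mean, as the average squared deviation.
  Formula (population):
    Variance = Σ(x − x̄)² / n
    where x = individual value, x̄ = mean, n = number of observations
  For a sample drawn from a larger population, divide by n − 1 instead of n to get an unbiased estimate.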
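
The formula above divides by n, which gives the population variance; when the data is a sample, the usual variant divides by n − 1. The sketch below works through both by hand on an illustrative dataset and checks them against NumPy's `np.var`, whose `ddof` argument selects the divisor.

```python
import numpy as np

data = [10, 20, 30, 40, 50]  # illustrative values
mean = sum(data) / len(data)

# Sum of squared deviations from the mean: Σ(x − x̄)²
squared_devs = sum((x - mean) ** 2 for x in data)

population_variance = squared_devs / len(data)    # divide by n
sample_variance = squared_devs / (len(data) - 1)  # divide by n − 1

print("Population variance:", population_variance)
print("Sample variance:", sample_variance)

# NumPy equivalents: ddof=0 (the default) divides by n, ddof=1 divides by n − 1
assert np.isclose(population_variance, np.var(data))
assert np.isclose(sample_variance, np.var(data, ddof=1))
```

Note that `np.var` and `np.std` default to the population form, while pandas' `Series.var` and `Series.std` default to the sample form, so results from the two libraries can differ on the same data.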

# 15. What is standard deviation, and why is it important?
-  Standard deviation is the square root of variance. It shows how much the data typically deviates from the mean, in the same units as the data. A low standard deviation indicates values clustered close to the mean; a high one indicates values spread widely.

# 16. Define and explain the term range in statistics.
-  Range is the difference between the highest and lowest values in a dataset.
   Formula:
    Range = Maximum value − Minimum value

# 17. What is the difference between variance and standard deviation?

* Variance: Measures average squared deviation from the mean
* Standard Deviation: Square root of variance; in the same units as the data

# 18. What is skewness in a dataset?
-  Skewness measures the asymmetry of a distribution. A dataset is skewed if it’s not symmetrical.

# 19. What does it mean if a dataset is positively or negatively skewed?

* Positively Skewed: Tail on the right; most data is on the left
* Negatively Skewed: Tail on the left; most data is on the right

# 20. Define and explain kurtosis.
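-  Kurtosis measures the "tailedness" of a distribution, i.e., how much weight sits in its extreme values. A normal distribution has kurtosis 3 (excess kurtosis 0), which serves as the reference point:

* High kurtosis: Heavier tails, so outliers are more likely
* Low kurtosis: Lighter tails, so outliers are less likely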

# 21. What is the purpose of covariance?
-  Covariance shows the direction of the linear relationship between two variables: it is positive when they tend to increase together and negative when one tends to increase as the other decreases.

# 22. What does correlation measure in statistics?
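-  Correlation quantifies the strength and direction of a relationship between two variables. The correlation coefficient always lies between -1 and +1; values near ±1 indicate a strong relationship and values near 0 a weak one.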

# 23. What is the difference between covariance and correlation?

* Covariance: Indicates direction of relationship; its magnitude depends on the units of the variables, so it is hard to interpret on its own
* Correlation: Indicates both direction and strength; standardized (unitless)
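
To illustrate why correlation is described as standardized, the sketch below uses made-up height and weight values (an assumption for illustration only) and changes the unit of one variable. The covariance scales with the unit, while the correlation coefficient stays the same.

```python
import numpy as np

# Hypothetical measurements, not taken from any real dataset
height_m = np.array([1.50, 1.60, 1.70, 1.80, 1.90])   # heights in metres
weight_kg = np.array([50.0, 58.0, 66.0, 75.0, 85.0])  # weights in kilograms

height_cm = height_m * 100  # same heights expressed in centimetres

# np.cov and np.corrcoef return 2x2 matrices; [0, 1] is the cross term
cov_m = np.cov(height_m, weight_kg)[0, 1]
cov_cm = np.cov(height_cm, weight_kg)[0, 1]
corr_m = np.corrcoef(height_m, weight_kg)[0, 1]
corr_cm = np.corrcoef(height_cm, weight_kg)[0, 1]

print("Covariance (m vs kg):", cov_m)
print("Covariance (cm vs kg):", cov_cm)    # 100 times larger
print("Correlation (m vs kg):", corr_m)
print("Correlation (cm vs kg):", corr_cm)  # unchanged
```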

# 24. What are some real-world applications of statistics?

* Business: sales forecasting, market research
* Healthcare: clinical trials, disease tracking
* Government: census, policy planning
* Education: exam analysis, school performance
* Sports: player stats, game strategies
* Environment: climate studies, pollution control
* Manufacturing: quality control, process improvement
* Finance: risk analysis, investment decisions
* Technology: AI, data analysis
* Social research: surveys, public opinion studies



# Practical Questions:

In [None]:
# 1. How do you calculate the mean, median, and mode of a dataset?

# Mean = sum of values ÷ number of values

# Median = middle value when data is sorted (average of the two middle values if the count is even)

# Mode = value that appears most frequently
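
The three definitions above can be sketched directly in Python. The dataset is illustrative, and the hand-written results are compared with the standard library's `statistics` module as a sanity check.

```python
import statistics as stats
from collections import Counter

data = [4, 8, 6, 5, 3, 8, 9]  # illustrative values

# Mean: sum of values divided by number of values
mean = sum(data) / len(data)

# Median: middle value of the sorted data (average of the two middle values if n is even)
sorted_data = sorted(data)
n = len(sorted_data)
mid = n // 2
if n % 2 == 1:
    median = sorted_data[mid]
else:
    median = (sorted_data[mid - 1] + sorted_data[mid]) / 2

# Mode: the value with the highest count
mode = Counter(data).most_common(1)[0][0]

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)

# Cross-check against the statistics module
assert abs(mean - stats.mean(data)) < 1e-12
assert median == stats.median(data)
assert mode == stats.mode(data)
```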



In [1]:
# 2. Write a Python program to compute the variance and standard deviation of a dataset.
import numpy as np

data = [10, 20, 30, 40, 50]
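variance = np.var(data)  # population variance (ddof=0); pass ddof=1 for sample variance
std_dev = np.std(data)   # population standard deviation (ddof=0)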

print("Variance:", variance)
print("Standard Deviation:", std_dev)


Variance: 200.0
Standard Deviation: 14.142135623730951


In [None]:
# 3. Create a dataset and classify it into nominal, ordinal, interval, and ratio types.
data = {
    "Gender": ["Male", "Female"],            # Nominal
    "Education Level": ["High School", "Bachelor", "Master"],  # Ordinal
    "Temperature (°C)": [30, 35, 40],         # Interval
    "Weight (kg)": [60, 72, 85]              # Ratio
}


In [None]:
# 4. Implement sampling techniques like random sampling and stratified sampling.
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'B', 'A', 'B', 'A', 'B'], 'Value': [10, 20, 15, 25, 14, 22]})

# Random sampling
random_sample = df.sample(n=3)

# Stratified sampling: draw one row from each group
stratified_sample = df.groupby('Group').sample(n=1)

print("Random Sample:\n", random_sample)
print("Stratified Sample:\n", stratified_sample)


In [None]:
# 5. Write a Python function to calculate the range of a dataset.
def calculate_range(data):
    return max(data) - min(data)

print("Range:", calculate_range([3, 7, 2, 10, 6]))


In [None]:
# 6. Create a dataset and plot its histogram to visualize skewness.
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 4, 5, 6, 8, 15]
plt.hist(data, bins=5)
plt.title("Histogram to Visualize Skewness")
plt.show()


In [None]:
# 7. Calculate skewness and kurtosis of a dataset using Python libraries.
from scipy.stats import skew, kurtosis

data = [2, 3, 4, 5, 6, 7, 20]
print("Skewness:", skew(data))
print("Kurtosis:", kurtosis(data))  # excess (Fisher) kurtosis by default: a normal distribution scores 0


In [None]:
# 8. Generate a dataset and demonstrate positive and negative skewness.
from scipy.stats import skewnorm
import matplotlib.pyplot as plt

# Positive skew
pos_skew = skewnorm.rvs(a=10, size=1000)
# Negative skew
neg_skew = skewnorm.rvs(a=-10, size=1000)

plt.hist(pos_skew, bins=30, alpha=0.6, label='Positive Skew')
plt.hist(neg_skew, bins=30, alpha=0.6, label='Negative Skew')
plt.legend()
plt.title("Positive vs Negative Skewness")
plt.show()


In [None]:
# 9. Write a Python script to calculate covariance between two datasets.
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

cov_matrix = np.cov(x, y)
print("Covariance:", cov_matrix[0, 1])


In [None]:
# 10. Write a Python script to calculate the correlation coefficient between two datasets.
correlation = np.corrcoef(x, y)
print("Correlation Coefficient:", correlation[0, 1])


In [None]:
# 11. Create a scatter plot to visualize the relationship between two variables.
plt.scatter(x, y)
plt.title("Scatter Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()


In [None]:
# 12. Implement and compare simple random sampling and systematic sampling
import numpy as np

data = np.arange(1, 21)

# Simple random
random_sample = np.random.choice(data, size=5, replace=False)

# Systematic: pick a random start, then take every step-th element
step = len(data) // 5
start = np.random.randint(step)
systematic_sample = data[start::step]

print("Random Sample:", random_sample)
print("Systematic Sample:", systematic_sample)


In [None]:
# 13. Calculate the mean, median, and mode of grouped data.
# Note: this cell treats the values as raw observations with repeats, not as class intervals.
import statistics as stats

data = [10, 20, 20, 30, 40, 40, 40, 50]
print("Mean:", stats.mean(data))
print("Median:", stats.median(data))
print("Mode:", stats.mode(data))
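
When data is grouped into class intervals with frequencies, the raw values are unavailable and the three measures can only be estimated. The sketch below uses the standard textbook interpolation formulas on a hypothetical frequency table. The class boundaries and frequencies are invented for illustration, and the mode formula assumes equal class widths.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 9), (40, 50, 6)]

total = sum(f for _, _, f in classes)

# Mean: weight each class midpoint by its frequency
grouped_mean = sum((lo + hi) / 2 * f for lo, hi, f in classes) / total

# Median: interpolate inside the first class whose cumulative frequency reaches N/2
cumulative = 0
for lo, hi, f in classes:
    if cumulative + f >= total / 2:
        grouped_median = lo + (total / 2 - cumulative) / f * (hi - lo)
        break
    cumulative += f

# Mode: interpolate inside the class with the highest frequency,
# using the frequencies of the neighbouring classes (0 if there is none)
freqs = [f for _, _, f in classes]
i = freqs.index(max(freqs))
f1 = freqs[i]
f0 = freqs[i - 1] if i > 0 else 0
f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
lo, hi, _ = classes[i]
grouped_mode = lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)

print("Grouped mean:", grouped_mean)
print("Grouped median:", grouped_median)
print("Grouped mode:", grouped_mode)
```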


In [None]:
# 14. Simulate data using Python and calculate its central tendency and dispersion.
data = np.random.normal(loc=50, scale=10, size=1000)
print("Mean:", np.mean(data))
print("Std Dev:", np.std(data))
print("Variance:", np.var(data))


In [None]:
# 15. Use NumPy or pandas to summarize a dataset’s descriptive statistics.
import pandas as pd

df = pd.DataFrame({'Scores': [23, 45, 67, 89, 34, 56]})
print(df.describe())


In [None]:
# 16. Plot a boxplot to understand the spread and identify outliers.
plt.boxplot(df['Scores'])
plt.title("Boxplot")
plt.show()


In [None]:
# 17. Calculate the interquartile range (IQR) of a dataset.
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1
print("IQR:", IQR)
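
A common use of the IQR is the 1.5 × IQR rule for flagging potential outliers, which is also the default whisker length in matplotlib's boxplot. The self-contained sketch below applies it to an illustrative dataset; the variable names are chosen for this example only.

```python
import numpy as np

values = np.array([1, 2, 2, 3, 4, 5, 6, 8, 15])  # illustrative values

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Points beyond these fences are flagged as potential outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]

print("IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```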


In [None]:
# 18. Implement Z-score normalization and explain its significance.
z_scores = (data - np.mean(data)) / np.std(data)
print("Z-scores:", z_scores[:5])
# Significance: Standardizes data to mean 0 and std dev 1


In [None]:
# 19. Compare two datasets using their standard deviations.
a = [10, 12, 14, 16]
b = [8, 15, 20, 30]

print("SD of A:", np.std(a))
print("SD of B:", np.std(b))
# Higher SD indicates more spread


In [None]:
# 20. Write a Python program to visualize covariance using a heatmap.
import seaborn as sns

df = pd.DataFrame({'A': a, 'B': b})
sns.heatmap(df.cov(), annot=True, cmap='coolwarm')
plt.title("Covariance Heatmap")
plt.show()


In [None]:
# 21. Use seaborn to create a correlation matrix for a dataset.
sns.heatmap(df.corr(), annot=True, cmap='YlGnBu')
plt.title("Correlation Matrix")
plt.show()


In [None]:
# 22. Generate a dataset and implement both variance and standard deviation computations.
sample = np.random.randint(10, 100, 15)
print("Dataset:", sample)
print("Variance:", np.var(sample))
print("Std Dev:", np.std(sample))


In [None]:
# 23. Visualize skewness and kurtosis using Python libraries like matplotlib or seaborn.
sns.histplot(sample, kde=True)
plt.title("Histogram with KDE Curve")
plt.show()


In [2]:
# 24. Implement the Pearson and Spearman correlation coefficients for a dataset.
from scipy.stats import pearsonr, spearmanr

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 7]

print("Pearson:", pearsonr(x, y))
print("Spearman:", spearmanr(x, y))


Pearson: PearsonRResult(statistic=np.float64(0.8320502943378436), pvalue=np.float64(0.0805095732984986))
Spearman: SignificanceResult(statistic=np.float64(0.8207826816681233), pvalue=np.float64(0.08858700531354381))
