# What is statistics, and why is it important
Statistics is the science of collecting, analyzing, interpreting, and presenting data. It is important because it lets us quantify uncertainty and make informed decisions from data rather than from intuition alone.

# What are the two main types of statistics
Descriptive Statistics and Inferential Statistics.

# What are descriptive statistics
Descriptive statistics summarize and organize data using measures like mean, median, mode, and standard deviation.

# What is inferential statistics
Inferential statistics use a sample of data to make generalizations, predictions, or decisions about a population.

# What is sampling in statistics
Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole population.

# What are the different types of sampling methods
Simple random sampling, stratified sampling, systematic sampling, cluster sampling, and convenience sampling.
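
Cluster sampling differs from the other methods in this list because it selects whole groups rather than individuals. A minimal sketch, assuming a hypothetical population of 100 IDs split into 10 equal clusters (the cluster layout and the fixed seed are illustrative assumptions):

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: IDs 1..100 split into 10 clusters of 10 (e.g. 10 classrooms)
population = list(range(1, 101))
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]

# Cluster sampling: pick whole clusters at random, then keep every member of them
chosen_clusters = random.sample(clusters, 2)
cluster_sample = [member for cluster in chosen_clusters for member in cluster]

print(len(cluster_sample))  # 20 individuals, drawn from exactly 2 clusters
```

This is cheap when clusters are easy to reach as a unit, but the estimate is only as good as the clusters are representative of the population.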

# What is the difference between random and non-random sampling
Random (probability) sampling gives every individual a known, non-zero chance of selection (an equal chance in simple random sampling). Non-random sampling, such as convenience sampling, does not, so the sample may be biased.

# Define and give examples of qualitative and quantitative data
Qualitative data describes qualities or categories (e.g., colors, names), while quantitative data consists of measured or counted numeric values (e.g., height, age).

# What are the different types of data in statistics
Nominal, Ordinal, Interval, and Ratio.

# Explain nominal, ordinal, interval, and ratio levels of measurement
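Nominal: unordered categories (e.g., gender); Ordinal: ordered categories whose spacing is not defined (e.g., rankings); Interval: equal intervals but no true zero (e.g., temperature in °C); Ratio: equal intervals and a true zero, so ratios are meaningful (e.g., weight).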

# What is the measure of central tendency
A measure of central tendency is a single value that represents the center of a dataset. The common ones are the mean, median, and mode.

# Define mean, median, and mode
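Mean: the sum of the values divided by their count; Median: the middle value of the sorted data (the average of the two middle values when the count is even); Mode: the most frequent value.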

# What is the significance of the measure of central tendency
It provides a summary statistic to understand where most values in a dataset lie.

# What is variance, and how is it calculated
Variance measures how spread out the data is. It is calculated as the average of the squared deviations from the mean: divide by n for a population, or by n - 1 for a sample.
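
The divisor matters in practice, because libraries differ in which one they use by default. A small sketch with Python's standard `statistics` module, using illustrative data, shows the two versions side by side:

```python
import statistics as stats

data = [10, 20, 30, 40, 50]

# Population variance: sum of squared deviations divided by n
pop_var = stats.pvariance(data)

# Sample variance: same sum divided by n - 1 (Bessel's correction)
sample_var = stats.variance(data)

print(pop_var, sample_var)
```

Here the squared deviations sum to 1000, so the population variance is 1000 / 5 = 200 and the sample variance is 1000 / 4 = 250.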

# What is standard deviation, and why is it important
Standard deviation is the square root of variance; it shows how much data varies from the mean.

# Define and explain the term range in statistics
Range is the difference between the maximum and minimum values in a dataset.

# What is the difference between variance and standard deviation
Variance is the average squared deviation, while standard deviation is its square root and in the same unit as the data.

# What is skewness in a dataset
Skewness measures the asymmetry of the data distribution.

# What does it mean if a dataset is positively or negatively skewed
Positive skew: tail on the right; Negative skew: tail on the left.

# Define and explain kurtosis
Kurtosis measures the "tailedness" of a distribution: how heavy or light its tails are compared to a normal distribution.
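
Both skewness and kurtosis can be computed from central moments. The sketch below uses the population (biased) moment formulas and reports excess kurtosis, which is also what `scipy.stats.kurtosis` returns by default; the helper name and the two small datasets are illustrative assumptions:

```python
def moment_skew_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # subtract 3 so a normal distribution scores 0
    return skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 20]  # one large value stretches the right tail

sym_skew, sym_kurt = moment_skew_kurtosis(symmetric)
right_skew, right_kurt = moment_skew_kurtosis(right_tailed)
print(sym_skew, sym_kurt)
print(right_skew, right_kurt)
```

The symmetric data has zero skewness and negative excess kurtosis (lighter tails than a normal distribution), while the right-tailed data has positive skewness.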

# What is the purpose of covariance
Covariance shows how two variables change together (direction of relationship).

# What does correlation measure in statistics
Correlation measures the strength and direction of the linear relationship between two variables.

# What is the difference between covariance and correlation
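Covariance indicates the direction of a linear relationship, but its magnitude depends on the units of the variables. Correlation divides the covariance by both standard deviations, giving a unitless value between -1 and 1 that also reflects the strength of the relationship.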
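
A short sketch can make the units issue concrete. It assumes population formulas (dividing by n) and uses hypothetical helper names; rescaling one variable changes the covariance but leaves the correlation untouched:

```python
import math

def population_cov(a, b):
    """Population covariance: mean of the products of paired deviations."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)

def pearson_r(a, b):
    """Pearson correlation: covariance divided by both standard deviations."""
    return population_cov(a, b) / math.sqrt(population_cov(a, a) * population_cov(b, b))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
ys_scaled = [100 * v for v in ys]  # same relationship, different units

print(population_cov(xs, ys), population_cov(xs, ys_scaled))  # covariance grows with the units
print(pearson_r(xs, ys), pearson_r(xs, ys_scaled))            # correlation is unchanged
```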

# What are some real-world applications of statistics?
Statistics is used in medicine (clinical trials), economics (market trends), sports (performance analysis), and business (decision-making).


In [None]:
# How do you calculate the mean, median, and mode of a dataset
import statistics as stats
data = [10, 20, 20, 30, 40]
mean = stats.mean(data)
median = stats.median(data)
mode = stats.mode(data)


In [None]:
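# Write a Python program to compute the variance and standard deviation of a dataset
import statistics as stats
data = [10, 20, 30, 40, 50]
# variance() and stdev() treat the data as a sample (divide by n - 1);
# use pvariance() and pstdev() when the data is the whole population.
variance = stats.variance(data)
std_dev = stats.stdev(data)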


In [None]:
# Create a dataset and classify it into nominal, ordinal, interval, and ratio types
dataset = {
    "Nominal": ["Red", "Blue", "Green"],
    "Ordinal": ["Poor", "Average", "Good", "Excellent"],
    "Interval": [10, 20, 30],  # No true zero
    "Ratio": [0, 5, 10, 15]    # True zero exists
}


In [None]:
# Implement sampling techniques like random sampling and stratified sampling
import random
population = list(range(1, 101))
# Simple random sampling: draw 10 individuals without replacement
random_sample = random.sample(population, 10)
# Stratified sampling: split the population into strata, then sample from each one
group_A = list(range(1, 51))
group_B = list(range(51, 101))
stratified_sample = random.sample(group_A, 5) + random.sample(group_B, 5)


In [None]:
# Write a Python function to calculate the range of a dataset
def calculate_range(data):
    return max(data) - min(data)
calculate_range([5, 10, 15, 20, 25])


In [None]:
# Create a dataset and plot its histogram to visualize skewness
import matplotlib.pyplot as plt
import numpy as np
data = np.random.exponential(scale=2, size=1000)
plt.hist(data, bins=30)
plt.title("Histogram to Visualize Skewness")
plt.show()


In [None]:
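# Calculate skewness and kurtosis of a dataset using Python libraries
from scipy.stats import skew, kurtosis
data = np.random.normal(0, 1, 1000)
data_skewness = skew(data)      # close to 0 for a symmetric distribution
data_kurtosis = kurtosis(data)  # excess kurtosis (normal = 0); pass fisher=False for the raw value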


In [None]:
# Generate a dataset and demonstrate positive and negative skewness
pos_skew = np.random.exponential(scale=2, size=1000)
neg_skew = -np.random.exponential(scale=2, size=1000)
plt.hist(pos_skew, bins=30, alpha=0.5, label="Positive Skew")
plt.hist(neg_skew, bins=30, alpha=0.5, label="Negative Skew")
plt.legend()
plt.title("Positive and Negative Skewness")
plt.show()


In [None]:
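# Write a Python script to calculate covariance between two datasets
def covariance(x, y):
    """Population covariance: the mean product of paired deviations (divides by n)."""
    mean_x = np.mean(x)
    mean_y = np.mean(y)
    return np.mean((x - mean_x) * (y - mean_y))
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
covariance(x, y)  # np.cov(x, y)[0, 1] gives the sample version (divides by n - 1)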


In [None]:
# Write a Python script to calculate the correlation coefficient between two datasets
from scipy.stats import pearsonr
corr_coef, _ = pearsonr(x, y)


In [None]:
# Create a scatter plot to visualize the relationship between two variables
plt.scatter(x, y)
plt.title("Scatter Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()


In [None]:
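# Implement and compare simple random sampling and systematic sampling
# Simple random sampling: every subset of 10 individuals is equally likely
simple_sample = random.sample(population, 10)
# Systematic sampling: pick a random start, then take every k-th individual
k = 10
start = random.randrange(k)
systematic_sample = population[start::k]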


In [None]:
# Calculate the mean, median, and mode of grouped data
intervals = [15, 25, 35, 45, 55]  # class midpoints
frequencies = [4, 6, 10, 5, 3]
# Grouped mean: frequency-weighted average of the class midpoints
grouped_mean = sum(i * f for i, f in zip(intervals, frequencies)) / sum(frequencies)
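
The grouped median and mode need class boundaries, which the cell above does not state. The sketch below assumes the midpoints 15 to 55 stand for classes 10-20, 20-30, and so on (width 10), and applies the usual interpolation formulas for grouped data; the class width and the variable names are assumptions:

```python
# Assumption: the midpoints 15..55 stand for classes 10-20, 20-30, ..., 50-60 (width 10)
midpoints = [15, 25, 35, 45, 55]
frequencies = [4, 6, 10, 5, 3]
width = 10
lower_bounds = [m - width / 2 for m in midpoints]
total = sum(frequencies)

# Median: find the class where the cumulative frequency first reaches total / 2,
# then interpolate linearly inside that class
cumulative = 0
for i, f in enumerate(frequencies):
    if cumulative + f >= total / 2:
        grouped_median = lower_bounds[i] + (total / 2 - cumulative) / f * width
        break
    cumulative += f

# Mode: interpolate inside the class with the highest frequency,
# using the frequencies of its neighbours (0 if there is no neighbour)
modal = frequencies.index(max(frequencies))
f1 = frequencies[modal]
f0 = frequencies[modal - 1] if modal > 0 else 0
f2 = frequencies[modal + 1] if modal < len(frequencies) - 1 else 0
grouped_mode = lower_bounds[modal] + (f1 - f0) / (2 * f1 - f0 - f2) * width

print(grouped_median, grouped_mode)
```

Under these assumptions both values fall in the 30-40 class: the median is 34.0 and the mode is about 34.44.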


In [None]:
# Simulate data using Python and calculate its central tendency and dispersion
sim_data = np.random.randint(10, 100, size=50)  # 50 random integers from 10 to 99
sim_mean = np.mean(sim_data)
sim_median = np.median(sim_data)
# np.std and np.var divide by n by default; pass ddof=1 for the sample versions
sim_std = np.std(sim_data)
sim_var = np.var(sim_data)


In [None]:
# Use NumPy or pandas to summarize a dataset’s descriptive statistics
import pandas as pd
df = pd.DataFrame(sim_data, columns=["Values"])
summary = df.describe()


In [None]:
# Plot a boxplot to understand the spread and identify outliers
plt.boxplot(sim_data)
plt.title("Boxplot")
plt.show()


In [None]:
# Calculate the interquartile range (IQR) of a dataset
Q1 = np.percentile(sim_data, 25)
Q3 = np.percentile(sim_data, 75)
IQR = Q3 - Q1
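
The IQR is also the basis of a common outlier rule, the same 1.5 × IQR rule that matplotlib's boxplot uses for its whiskers by default. A minimal sketch with the standard library, using hypothetical data that contains one extreme value:

```python
import statistics as stats

values = [12, 15, 17, 19, 20, 22, 24, 25, 90]  # hypothetical data with one extreme value

# method="inclusive" interpolates between data points like np.percentile's default
q1, _, q3 = stats.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Tukey's rule: flag points more than 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(q1, q3, iqr, outliers)
```

For this data the quartiles are 17 and 24, so the fences sit at 6.5 and 34.5 and only the value 90 is flagged.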


In [None]:
# Implement Z-score normalization and explain its significance
z_scores = (sim_data - np.mean(sim_data)) / np.std(sim_data)
# Z-scores rescale the data to mean 0 and standard deviation 1, so values measured in
# different units become comparable; many machine learning algorithms benefit from this scaling.


In [None]:
# Compare two datasets using their standard deviations
data1 = np.random.normal(50, 5, 100)
data2 = np.random.normal(50, 15, 100)
std1 = np.std(data1)
std2 = np.std(data2)


In [None]:
# Write a Python program to visualize covariance using a heatmap
import seaborn as sns
df2 = pd.DataFrame({"x": data1, "y": data2})
cov_matrix = df2.cov()
sns.heatmap(cov_matrix, annot=True)
plt.title("Covariance Heatmap")
plt.show()


In [None]:
# Visualize skewness and kurtosis using Python libraries like matplotlib or seaborn
# Assumed example data: 1,000 draws from a standard normal distribution
rand_data = np.random.normal(0, 1, 1000)
sns.histplot(rand_data, kde=True)
plt.title("Distribution with KDE")
plt.show()
print("Skewness:", skew(rand_data))
print("Kurtosis:", kurtosis(rand_data))


In [None]:
# Implement the Pearson and Spearman correlation coefficients for a dataset
from scipy.stats import spearmanr
pearson_corr, _ = pearsonr(data1, data2)
spearman_corr, _ = spearmanr(data1, data2)
