# Cell 1: Descriptive Statistics
In this cell, we will cover basic descriptive statistics including mean, median, mode, standard deviation, and variance.

**Mean:** The average of all the numbers in the dataset. It is calculated by summing all the values and dividing by the count of values.

**Median:** The middle value of the dataset when the numbers are sorted in ascending order. If the count of numbers is even, the median is the average of the two middle numbers.

**Mode:** The most frequently occurring value in the dataset. A dataset can have more than one mode if multiple values have the highest frequency.

**Standard Deviation:** A measure of the amount of variation or dispersion in a set of values. A low standard deviation indicates that the values tend to be close to the mean, while a high standard deviation indicates that the values are spread out over a wider range.

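**Variance:** The average of the squared differences from the mean. It is the square of the standard deviation. For a sample, the sum of squared differences is usually divided by n - 1 rather than n, which is what pandas does by default (`ddof=1`).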
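
Variance can divide the summed squared differences either by n (population form) or by n - 1 (sample form). The following is a minimal sketch of both, computed by hand with only the standard library on the same scores used in this cell; the variable names are illustrative. pandas' `.var()` and `.std()` default to the n - 1 form, while NumPy's `np.var` and `np.std` default to dividing by n.

```python
import math
import statistics

scores = [88, 92, 79, 93, 85, 91, 87, 94, 78, 81]
n = len(scores)
mean = sum(scores) / n

# Sum of squared differences from the mean
sum_sq = sum((x - mean) ** 2 for x in scores)

population_variance = sum_sq / n         # divide by n
sample_variance = sum_sq / (n - 1)       # divide by n - 1
sample_std = math.sqrt(sample_variance)  # standard deviation is the square root

print(f'Population variance: {population_variance:.4f}')
print(f'Sample variance: {sample_variance:.4f}')
print(f'Sample standard deviation: {sample_std:.4f}')

# Cross-check the hand calculation against the standard library
assert math.isclose(population_variance, statistics.pvariance(scores))
assert math.isclose(sample_variance, statistics.variance(scores))
```

The sample figures are the ones to compare with the pandas output in the code cell below.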

In [None]:
import pandas as pd
import numpy as np

# Creating a sample DataFrame
data = {'Scores': [88, 92, 79, 93, 85, 91, 87, 94, 78, 81]}
df = pd.DataFrame(data)

# Calculating Mean
mean = df['Scores'].mean()
print(f'Mean: {mean}')

# Calculating Median
median = df['Scores'].median()
print(f'Median: {median}')

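# Calculating Mode
# Series.mode() returns every value tied for the highest frequency, sorted ascending.
# All scores here are unique, so every value is a mode and [0] simply picks the smallest.
mode = df['Scores'].mode()[0]
print(f'Mode: {mode}')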

# Calculating Standard Deviation (pandas uses the sample formula, ddof=1, by default)
std_dev = df['Scores'].std()
print(f'Standard Deviation: {std_dev}')

# Calculating Variance (also ddof=1 by default)
variance = df['Scores'].var()
print(f'Variance: {variance}')

# Cell 2: Probability Distributions
In this cell, we will cover some common probability distributions such as normal distribution and binomial distribution.

**Normal Distribution:** A continuous probability distribution that is symmetric around its mean and has a bell-shaped density curve. It is fully characterized by its mean and standard deviation.

**Binomial Distribution:** A discrete probability distribution that models the number of successes in a fixed number of independent trials, each with the same probability of success. It is characterized by the number of trials (n) and the probability of success (p).
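
As a rough illustration of where the binomial probabilities come from, the sketch below evaluates the probability mass function P(X = k) = C(n, k) p^k (1 - p)^(n - k) directly with the standard library. The helper name `binomial_pmf` is hypothetical and not part of SciPy; the values should agree with what `binom.pmf` produces for the same n and p.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f'P(X = {k}) = {prob:.4f}')

# The probabilities over all possible outcomes must sum to 1
print(f'Total probability: {sum(pmf):.4f}')
```

With p = 0.5 the distribution is symmetric, so P(X = k) equals P(X = n - k).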

In [None]:
from scipy.stats import norm, binom

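# Normal Distribution (standard normal: mean 0, standard deviation 1)
# mu and sigma are used so the mean and std_dev computed in Cell 1 are not overwritten
mu, sigma = 0, 1
x = np.linspace(-3, 3, 100)
pdf = norm.pdf(x, mu, sigma)
print('Normal Distribution PDF:')
print(pdf)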

# Binomial Distribution
n, p = 10, 0.5
binom_dist = binom(n, p)
x = np.arange(0, 11)
pmf = binom_dist.pmf(x)
print('\nBinomial Distribution PMF:')
print(pmf)

# Cell 3: Hypothesis Testing
In this cell, we will cover the basics of hypothesis testing, including null and alternative hypotheses, p-values, and performing a t-test.

**Null Hypothesis (H0):** The default claim that there is no effect or no difference. The test measures how strongly the data contradict it.

**Alternative Hypothesis (H1):** A statement that indicates the presence of an effect or a difference.

**P-value:** The probability of obtaining test results at least as extreme as the observed results, assuming that the null hypothesis is true. A p-value below a significance level chosen in advance (commonly 0.05) is treated as evidence against the null hypothesis, so we reject it. A larger p-value means we fail to reject the null hypothesis; it does not show that the null hypothesis is true.

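**T-test:** A statistical test for whether a mean differs significantly from a reference. A two-sample t-test compares the means of two groups; the one-sample t-test performed in this cell compares the mean of a single sample with a hypothesized population mean (85 here).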
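
To show what the one-sample t-statistic measures, here is a minimal sketch that computes t = (sample mean - hypothesized mean) / (s / sqrt(n)) by hand, assuming the same data and the hypothesized mean of 85 used in this cell. The variable names are illustrative, and the p-value is left to `ttest_1samp` because the standard library has no t distribution.

```python
import math
import statistics

data = [88, 92, 79, 93, 85, 91, 87, 94, 78, 81]
mu_0 = 85  # hypothesized population mean under H0

n = len(data)
sample_mean = statistics.mean(data)
sample_std = statistics.stdev(data)         # sample standard deviation (n - 1)
standard_error = sample_std / math.sqrt(n)  # standard error of the mean

# Distance between the sample mean and mu_0, in standard-error units
t_stat = (sample_mean - mu_0) / standard_error
degrees_of_freedom = n - 1

print(f'Sample mean: {sample_mean}')
print(f'Standard error: {standard_error:.4f}')
print(f'T-statistic: {t_stat:.4f} (df = {degrees_of_freedom})')
```

With 9 degrees of freedom, the two-sided 5% critical value in a standard t table is about 2.262, so a statistic smaller than that in absolute value does not lead to rejecting H0.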

In [None]:
from scipy.stats import ttest_1samp

# Sample Data
data = [88, 92, 79, 93, 85, 91, 87, 94, 78, 81]
# Hypothesis Testing
# H0: the population mean score is 85; H1: it differs from 85 (two-sided test)
t_stat, p_value = ttest_1samp(data, 85)
print(f'T-statistic: {t_stat}')
print(f'P-value: {p_value}')

# Interpreting the p-value
alpha = 0.05
if p_value < alpha:
    print('Reject the null hypothesis')
else:
    print('Fail to reject the null hypothesis')

# Cell 4: Correlation and Regression
In this cell, we will cover correlation and simple linear regression, including calculating the correlation coefficient and fitting a regression line.

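**Correlation Coefficient:** A measure of the strength and direction of the linear relationship between two variables (the Pearson coefficient, which is the pandas default). It ranges from -1 to 1, where 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship. A value of 0 does not rule out a nonlinear relationship.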

**Linear Regression:** A method to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data. In simple linear regression, we use one independent variable to predict the dependent variable.
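
The correlation coefficient and the least-squares line are built from the same sums of deviations. The sketch below works them out by hand for the hours and scores used in this cell, using only the standard library; the names `s_xy`, `s_xx`, and `s_yy` are illustrative shorthand for those sums.

```python
import math

hours = [1, 2, 3, 4, 5]
scores = [60, 61, 64, 68, 70]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Sums of cross-products and squared deviations from the means
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
s_xx = sum((x - mean_x) ** 2 for x in hours)
s_yy = sum((y - mean_y) ** 2 for y in scores)

r = s_xy / math.sqrt(s_xx * s_yy)    # Pearson correlation coefficient
slope = s_xy / s_xx                  # least-squares slope
intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)

print(f'Correlation Coefficient: {r:.4f}')
print(f'Slope: {slope:.4f}')
print(f'Intercept: {intercept:.4f}')
print(f'R-squared: {r ** 2:.4f}')
```

In simple linear regression, R-squared is the square of the Pearson correlation coefficient, which is why the code cell below can report `r_value**2`.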

In [None]:
import seaborn as sns
from scipy.stats import linregress

# Sample Data
data = {'Hours Studied': [1, 2, 3, 4, 5], 'Scores': [60, 61, 64, 68, 70]}
df = pd.DataFrame(data)

# Calculating Correlation Coefficient
correlation = df.corr().iloc[0, 1]
print(f'Correlation Coefficient: {correlation}')

# Performing Linear Regression
slope, intercept, r_value, p_value, std_err = linregress(df['Hours Studied'], df['Scores'])
print(f'Slope: {slope}')
print(f'Intercept: {intercept}')
print(f'R-squared: {r_value**2}')

# Plotting the Regression Line
sns.regplot(x='Hours Studied', y='Scores', data=df)

# Cell 5: Interquartile Range (IQR)
In this cell, we will cover the concept of Interquartile Range (IQR).

**Interquartile Range (IQR):** A measure of statistical dispersion, or how spread out the middle half of the data is. It is calculated as the difference between the 75th percentile (Q3) and the 25th percentile (Q1). A common rule uses the IQR to flag outliers: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
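
As an illustration of using the IQR for outlier detection, the sketch below applies the common 1.5 × IQR rule to the data from this cell. It uses `statistics.quantiles` with `method='inclusive'`, which applies the same linear interpolation that `np.percentile` uses by default; the fence names and the 1.5 multiplier are conventions assumed here, not something the cell's code computes.

```python
import statistics

data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]

# n=4 returns the three quartile cut points: Q1, the median, and Q3
q1, _median, q3 = statistics.quantiles(data, n=4, method='inclusive')
iqr = q3 - q1

# Values outside the fences are flagged as potential outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f'Q1: {q1}, Q3: {q3}, IQR: {iqr}')
print(f'Fences: [{lower_fence}, {upper_fence}]')
print(f'Outliers: {outliers}')
```

Flagged values are candidates for inspection rather than automatic removal.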

In [None]:
import numpy as np

# Sample Data
data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
# Calculating Q1 (25th percentile) and Q3 (75th percentile)
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
# Calculating IQR
IQR = Q3 - Q1
print(f'Q1: {Q1}')
print(f'Q3: {Q3}')
print(f'IQR: {IQR}')

# Cell 6: Z-Score
In this cell, we will cover the concept of Z-Score.

**Z-Score:** A measure of how many standard deviations a data point is from the mean. It is calculated as the difference between the value and the mean, divided by the standard deviation. Z-Scores are used to identify outliers and understand the position of a value within a distribution.
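
The formula z = (x - mean) / standard deviation can be applied by hand, and the result depends on which standard deviation is used. The sketch below computes both variants with the standard library on the data from this cell; the variable names are illustrative. `stats.zscore` defaults to the population form (`ddof=0`), so its output should match `z_population`.

```python
import statistics

data = [88, 92, 79, 93, 85, 91, 87, 94, 78, 81]

mean = statistics.mean(data)
pop_std = statistics.pstdev(data)    # population standard deviation (divide by n)
sample_std = statistics.stdev(data)  # sample standard deviation (divide by n - 1)

z_population = [(x - mean) / pop_std for x in data]
z_sample = [(x - mean) / sample_std for x in data]

for x, zp, zs in zip(data, z_population, z_sample):
    print(f'{x}: z (population) = {zp:+.3f}, z (sample) = {zs:+.3f}')
```

Z-scores always average to zero, and the sample-based scores are slightly smaller in magnitude because the sample standard deviation is larger.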

In [None]:
import scipy.stats as stats

# Sample Data
data = [88, 92, 79, 93, 85, 91, 87, 94, 78, 81]
# Calculating Z-Scores
# stats.zscore uses the population standard deviation (ddof=0) by default
z_scores = stats.zscore(data)
print('Z-Scores:')
print(z_scores)

# Cell 7: Confidence Intervals
In this cell, we will cover the concept of Confidence Intervals.

**Confidence Interval:** A range of values, calculated from sample data, that is constructed to contain the true population parameter with a stated level of confidence. At the 95% level, about 95% of intervals built this way from repeated samples would contain the true value. The width of the interval reflects the uncertainty around the sample estimate. Common confidence levels are 90%, 95%, and 99%.
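
To see how the confidence level affects the interval, the sketch below builds intervals for the mean at the three common levels as mean ± t × SEM. The critical t values for 9 degrees of freedom are hardcoded from a standard t table as an assumption of this example; in practice they would come from a call such as `stats.t.ppf`.

```python
import math
import statistics

data = [88, 92, 79, 93, 85, 91, 87, 94, 78, 81]
n = len(data)
mean = statistics.mean(data)
sem = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean

# Two-sided critical t values for n - 1 = 9 degrees of freedom (from a t table)
t_critical = {0.90: 1.833, 0.95: 2.262, 0.99: 3.250}

intervals = {}
for level, t_value in t_critical.items():
    margin = t_value * sem  # half-width of the interval
    intervals[level] = (mean - margin, mean + margin)
    low, high = intervals[level]
    print(f'{level:.0%} Confidence Interval: ({low:.2f}, {high:.2f})')
```

Higher confidence requires a wider interval: the 99% interval contains the 95% interval, which in turn contains the 90% interval.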

In [None]:
import scipy.stats as stats

# Sample Data
data = [88, 92, 79, 93, 85, 91, 87, 94, 78, 81]
# Calculating the mean and standard error of the mean (SEM)
mean = np.mean(data)
sem = stats.sem(data)
# Calculating the 95% confidence interval
confidence = 0.95
h = sem * stats.t.ppf((1 + confidence) / 2, len(data) - 1)
confidence_interval = (mean - h, mean + h)
print(f'95% Confidence Interval: {confidence_interval}')