## Descriptive Statistics

Descriptive statistics is a branch of statistics that deals with summarizing and describing the main features of a dataset. In Python, descriptive statistics can be calculated using various libraries such as NumPy, Pandas, and SciPy. These libraries provide functions for calculating common descriptive statistics such as mean, median, mode, standard deviation, variance, minimum, maximum, quartiles, percentiles, skewness, kurtosis, and more.

### Using Pandas

Pandas is a popular Python library for data analysis with built-in methods for descriptive statistics. In the example below, we create a sample data frame named `df` with two columns, `Score` and `Subject`. We then compute the mean, median, mode, range, variance, standard deviation, and quartiles of the `Score` column and print each result to the console. The range has no dedicated method, so it is computed as the maximum minus the minimum.

In [None]:
import pandas as pd

# create a sample data frame
data = {'Score': [90, 95, 85, 92, 88, 80],
        'Subject': ['Math', 'English', 'History', 'Science', 'Geography', 'Art']}
df = pd.DataFrame(data)


In [None]:
# calculate the mean
mean = df['Score'].mean()
print("Mean:", mean)

In [None]:
# calculate the median
median = df['Score'].median()
print("Median:", median)

In [None]:
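# calculate the mode; mode() returns every most-frequent value in sorted order,
# and because all six scores are unique here, the first entry is just the smallest
mode = df['Score'].mode().values[0]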
print("Mode:", mode)

In [None]:
# calculate the range
rng = df['Score'].max() - df['Score'].min()
print("Range:", rng)

In [None]:
# calculate the variance
var = df['Score'].var()
print("Variance:", var)

In [None]:
# calculate the standard deviation
std = df['Score'].std()
print("Standard Deviation:", std)

In [None]:
# calculate the quartiles
quartiles = df['Score'].quantile([0.25, 0.5, 0.75])
print("Quartiles: \n", quartiles)
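The individual calls above can also be collected in one step. The sketch below assumes the same six scores and uses `Series.describe()`, which reports the count, mean, standard deviation, minimum, quartiles, and maximum together. It also shows `skew()` and `kurt()`, since skewness and kurtosis are mentioned in the introduction but not otherwise demonstrated. The variable names `scores` and `summary` are illustrative.

```python
import pandas as pd

# same sample scores as in the cells above
scores = pd.Series([90, 95, 85, 92, 88, 80], name="Score")

# describe() bundles count, mean, std, min, quartiles, and max
summary = scores.describe()
print(summary)

# sample skewness and excess kurtosis
print("Skewness:", scores.skew())
print("Kurtosis:", scores.kurt())
```

With only six observations, the skewness and kurtosis estimates are very noisy and are shown here only to illustrate the calls.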

### Using NumPy and SciPy

In [None]:
import numpy as np
from scipy import stats

In [None]:
# create a sample data array
data = np.array([90, 95, 85, 92, 88, 80])

# calculate the mean
mean = np.mean(data)
print("Mean:", mean)

# calculate the median
median = np.median(data)
print("Median:", median)

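# calculate the mode (np.atleast_1d handles SciPy versions that return
# either a scalar or a length-1 array for 1-D input)
mode = stats.mode(data)
print("Mode:", np.atleast_1d(mode.mode)[0])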

# calculate the range
rng = np.ptp(data)
print("Range:", rng)

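# calculate the sample variance (ddof=1 matches the Pandas default;
# NumPy's own default, ddof=0, gives the population variance)
var = np.var(data, ddof=1)
print("Variance:", var)

# calculate the sample standard deviation (ddof=1, as above)
std = np.std(data, ddof=1)
print("Standard Deviation:", std)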

# calculate the quartiles
quartiles = np.percentile(data, [25, 50, 75])
print("Quartiles:", quartiles)
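One detail worth checking when comparing NumPy and Pandas results is the `ddof` (delta degrees of freedom) argument. `np.var` and `np.std` divide by n by default (`ddof=0`), while `Series.var()` and `Series.std()` divide by n - 1 (`ddof=1`). The following sketch, using the same six scores, shows the difference. The names `pop_var`, `sample_var`, and `pandas_var` are illustrative.

```python
import numpy as np
import pandas as pd

data = np.array([90, 95, 85, 92, 88, 80])

# NumPy default: population variance (divide by n, ddof=0)
pop_var = np.var(data)

# ddof=1 gives the sample variance (divide by n - 1)
sample_var = np.var(data, ddof=1)

# Pandas uses ddof=1 unless told otherwise
pandas_var = pd.Series(data).var()

print("Population variance (ddof=0):", pop_var)
print("Sample variance (ddof=1):", sample_var)
print("Pandas Series.var():", pandas_var)
```

Which divisor is appropriate depends on whether the data are treated as the whole population or as a sample from it.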


## Probability Distributions Using SciPy

SciPy's `scipy.stats` module provides tools for working with probability distributions: drawing random numbers from a distribution, fitting a distribution to data, and evaluating its probability density function (PDF) or cumulative distribution function (CDF).
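As a minimal sketch of these operations, the snippet below "freezes" a standard normal distribution so that its parameters are set once, then draws samples and evaluates the PDF and CDF at a single point. The seed value 42 and the name `dist` are arbitrary choices for illustration.

```python
from scipy.stats import norm

# "freeze" a standard normal distribution (mean 0, standard deviation 1)
dist = norm(loc=0, scale=1)

# draw reproducible random numbers; the seed value is arbitrary
samples = dist.rvs(size=5, random_state=42)
print("Samples:", samples)

# density at the mean and cumulative probability up to the mean
print("PDF at 0:", dist.pdf(0))  # 1 / sqrt(2 * pi), about 0.3989
print("CDF at 0:", dist.cdf(0))  # 0.5 by symmetry
```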

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, expon, uniform

# generate random numbers from a normal distribution
mean = 0
std = 1
r = norm.rvs(mean, std, size=100)

# plot the histogram
plt.hist(r, bins=25, density=True)

# plot the PDF
x = np.linspace(-5, 5, 100)
pdf = norm.pdf(x, mean, std)
plt.plot(x, pdf, 'k-', lw=2, label='Normal PDF')

# add labels and title
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Normal Distribution')
plt.legend()

# display the plot
plt.show()

In [None]:
# generate random numbers from an exponential distribution
scale = 1
r = expon.rvs(scale=scale, size=100)

# plot the histogram
plt.hist(r, bins=25, density=True)

# plot the PDF
x = np.linspace(0, 5, 100)
pdf = expon.pdf(x, scale=scale)
plt.plot(x, pdf, 'k-', lw=2, label='Exponential PDF')

# add labels and title
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Exponential Distribution')
plt.legend()

# display the plot
plt.show()



In [None]:
# generate random numbers from a uniform distribution
loc = 0
scale = 1
r = uniform.rvs(loc, scale, size=100)

# plot the histogram
plt.hist(r, bins=25, density=True)

# plot the PDF
x = np.linspace(-2, 2, 100)
pdf = uniform.pdf(x, loc, scale)
plt.plot(x, pdf, 'k-', lw=2, label='Uniform PDF')

# add labels and title
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Uniform Distribution')
plt.legend()

# display the plot
plt.show()
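Fitting data to a distribution is not shown in the plots above, so here is a small sketch of it. It assumes simulated data with known parameters (mean 10, standard deviation 2, both arbitrary) and uses `norm.fit`, which returns maximum-likelihood estimates of `loc` and `scale`. The names `sample`, `loc_hat`, and `scale_hat` are illustrative.

```python
import numpy as np
from scipy.stats import norm

# simulate data with known parameters; the seed and the true values
# (mean 10, standard deviation 2) are arbitrary choices for illustration
rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=1000)

# norm.fit returns maximum-likelihood estimates of (loc, scale)
loc_hat, scale_hat = norm.fit(sample)
print("Estimated mean:", loc_hat)
print("Estimated standard deviation:", scale_hat)
```

For the normal distribution these estimates coincide with the sample mean and the population-style standard deviation (`ddof=0`) of the data. Other distributions in `scipy.stats` expose the same `fit` method with their own parameters.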


### Cumulative Distribution Function (CDF)

In the example below, we use SciPy's `norm` distribution to generate random numbers from a normal distribution. We then plot a cumulative histogram of the generated data and overlay the theoretical CDF of the normal distribution. The CDF gives the probability of observing a value less than or equal to a given value, so the cumulative histogram should track the curve more closely as the sample size grows.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# generate random numbers from a normal distribution
mean = 0
std = 1
r = norm.rvs(mean, std, size=100)

# plot the histogram
plt.hist(r, bins=25, density=True, cumulative=True, histtype='step', color='red', alpha=0.8, label='Histogram')

# plot the CDF
x = np.linspace(-5, 5, 100)
cdf = norm.cdf(x, mean, std)
plt.plot(x, cdf, 'k-', lw=2, label='CDF')

# add labels and title
plt.xlabel('Value')
plt.ylabel('Cumulative Probability')
plt.title('CDF of Normal Distribution')
plt.legend()

# display the plot
plt.show()
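The CDF can also be used numerically rather than graphically. The sketch below evaluates the standard normal CDF at 1.96, inverts it with `ppf` (the percent point function), and compares an empirical cumulative fraction with the theoretical value. The seed and the names `p`, `x`, and `ecdf_at_zero` are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import norm

# cumulative probability of observing a value <= 1.96 under a standard normal
p = norm.cdf(1.96)
print("P(X <= 1.96):", p)  # about 0.975

# ppf (percent point function) inverts the CDF
x = norm.ppf(p)
print("Value recovered by ppf:", x)

# empirical CDF at 0: fraction of samples <= 0; the seed is arbitrary
samples = norm.rvs(size=1000, random_state=0)
ecdf_at_zero = np.mean(samples <= 0)
print("Empirical CDF at 0:", ecdf_at_zero, "vs theoretical", norm.cdf(0))
```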