# Math in AI

## 1. Import Libraries

In [None]:
# Essential libraries for math in AI
import numpy as np  # Fundamental package for scientific computing
import scipy as sp  # Library for scientific and technical computing
import scipy.stats  # Load the stats submodule explicitly so sp.stats is available
import pandas as pd  # Data analysis and manipulation library

# Visualization libraries
import matplotlib.pyplot as plt  # Plotting library
import seaborn as sns  # Statistical data visualization library
import plotly.express as px  # Interactive visualization library

### 2.1 Make Dataset

In [None]:
np.random.seed(42)  # For reproducibility
data = {
    'Math': np.random.randint(0, 100, 50),
    'Science': np.random.randint(0, 100, 50),
    'English': np.random.randint(0, 100, 50)
}
df = pd.DataFrame(data)
df.head()

### 2.2 Statistical Parameters

#### 2.2.1 Explanation

- **Mean:** $$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$
- **Median:** The middle value when the data is sorted.
- **Mode:** The value that appears most frequently in the dataset.
- **Standard Deviation:** $$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$$
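
To make the formulas concrete, here is a minimal sketch that evaluates each definition step by step. The `scores` array is a small hypothetical sample chosen for illustration (it is not part of the dataset above), and the standard deviation uses the population form with $\frac{1}{n}$, exactly as written in the formula.

```python
import numpy as np
from collections import Counter

# Hypothetical sample, chosen so the results are easy to check by hand
scores = np.array([2, 4, 4, 4, 5, 5, 7, 9])
n = len(scores)

# Mean: sum of all values divided by n
mean = scores.sum() / n

# Median: middle value of the sorted data (average of the two middle values when n is even)
sorted_scores = np.sort(scores)
mid = n // 2
if n % 2 == 1:
    median = sorted_scores[mid]
else:
    median = (sorted_scores[mid - 1] + sorted_scores[mid]) / 2

# Mode: the most frequent value
mode = Counter(scores.tolist()).most_common(1)[0][0]

# Standard deviation: square root of the mean squared deviation (population form, 1/n)
std = np.sqrt(((scores - mean) ** 2).sum() / n)

print(mean, median, mode, std)
```

For this sample the squared deviations sum to 32, so the variance is 4 and the standard deviation is 2.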


#### 2.2.2 Code

# Mean
mean_math = df['Math'].mean()
print(f"Mean of Math scores: {mean_math}")

# Median
median_math = df['Math'].median()
print(f"Median of Math scores: {median_math}")

# Mode
mode_math = df['Math'].mode()[0]
print(f"Mode of Math scores: {mode_math}")

# Standard Deviation
std_math = df['Math'].std()
print(f"Standard Deviation of Math scores: {std_math}")

# Convert DataFrame column to NumPy array
math_scores = df['Math'].to_numpy()

# Mean
mean_math = np.mean(math_scores)
print(f"Mean of Math scores: {mean_math}")

# Median
median_math = np.median(math_scores)
print(f"Median of Math scores: {median_math}")

# Mode using SciPy
mode_math = sp.stats.mode(math_scores)[0]
print(f"Mode of Math scores: {mode_math}")

# Standard Deviation
std_math = np.std(math_scores)
print(f"Standard Deviation of Math scores: {std_math}")

#### 2.2.3 Visualization

# Histogram for Math scores
sns.histplot(df['Math'], bins=10, kde=True)
plt.title('Histogram of Math Scores')
plt.xlabel('Scores')
plt.ylabel('Frequency')
plt.show()

# Boxplot for Math scores
sns.boxplot(data=df, x='Math')
plt.title('Boxplot of Math Scores')
plt.xlabel('Scores')  # The boxplot is horizontal, so scores lie on the x-axis
plt.show()

fig = px.histogram(df, x='Math', nbins=10, title='Interactive Histogram of Math Scores', marginal="box", histnorm='probability')
fig.show()

FM 1: Calculate the statistical parameters for the other features and write an explanation of each feature 🧑

**Answer**

FM 2: Why do NumPy's std and pandas' std give different results?

**Answer**

### 2.3 Normal Distribution

#### 2.3.1 Explanation

**Formulation:**
- **Normal Distribution:** $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
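
As a quick check of the density formula, the sketch below evaluates it directly with NumPy. The helper name `normal_pdf` and the integration grid are assumptions made for this illustration. It confirms two properties of the standard normal case ($\mu = 0$, $\sigma = 1$): the peak value is $\frac{1}{\sqrt{2\pi}}$ and the area under the curve is approximately 1.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the normal density f(x) for mean mu and standard deviation sigma."""
    coeff = 1.0 / np.sqrt(2.0 * np.pi * sigma**2)
    return coeff * np.exp(-((x - mu) ** 2) / (2.0 * sigma**2))

# Peak of the standard normal density, located at x = mu = 0
peak = normal_pdf(0.0)

# Approximate the area under the curve with a simple Riemann sum on [-6, 6]
xs = np.linspace(-6, 6, 12001)
dx = xs[1] - xs[0]
area = np.sum(normal_pdf(xs)) * dx

print(f"Peak value: {peak:.6f}")
print(f"Approximate area: {area:.6f}")
```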

#### 2.3.2 Code

normal_data = np.random.normal(loc=0, scale=1, size=1000)

mean_normal = np.mean(normal_data)
std_normal = np.std(normal_data)
print(f"Mean of normal distribution: {mean_normal}")
print(f"Standard Deviation of normal distribution: {std_normal}")

#### 2.3.3 Visualization

sns.histplot(normal_data, bins=30, kde=True)
plt.title('Histogram of Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

### 2.4 Percentile, Quantiles, Quartiles, Boxplots, and Histograms

#### 2.4.1 Explanation

Steps to Calculate Percentile:

1. **Sort the Data**: Arrange the data in ascending order.
2. **Calculate the Rank**: Use the formula to find the rank of the percentile:
   $$
   R = \frac{P}{100} \times (N + 1)
   $$
   where $P$ is the desired percentile, and $N$ is the number of observations.
3. **Find the Percentile Value**:
   - If $R$ is an integer, the percentile value is the data point at position $R$.
   - If $R$ is not an integer, interpolate between the data points at positions $\lfloor R \rfloor$ and $\lceil R \rceil$.
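
The three steps above can be sketched as a small function. The name `percentile_by_rank`, the sample list, and the clamping of $R$ to the range $[1, N]$ (needed for very low or very high percentiles) are assumptions made for this illustration, not part of the procedure as stated.

```python
import math

def percentile_by_rank(values, p):
    """Percentile via the rank rule R = P/100 * (N + 1), with linear interpolation."""
    data = sorted(values)            # Step 1: sort the data
    n = len(data)
    rank = p * (n + 1) / 100         # Step 2: 1-based rank of the percentile
    rank = min(max(rank, 1), n)      # Assumption: clamp ranks that fall outside 1..N
    lower = math.floor(rank)
    upper = math.ceil(rank)
    low_value = data[lower - 1]      # Convert 1-based positions to 0-based indices
    high_value = data[upper - 1]
    # Step 3: interpolate; the fractional part is 0 when the rank is an integer
    return low_value + (rank - lower) * (high_value - low_value)

sample = [40, 10, 30, 20]            # Hypothetical data
print(percentile_by_rank(sample, 40))  # Rank 2 is an integer: the 2nd sorted value
print(percentile_by_rank(sample, 50))  # Rank 2.5: halfway between 20 and 30
```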

Example:

Let's calculate the 30th percentile for the following dataset:

$[3, 7, 8, 5, 12, 14, 21, 13, 18]$

1. **Sort the Data**:

   $[3, 5, 7, 8, 12, 13, 14, 18, 21]$

2. **Calculate the Rank**:

   $$
   \begin{aligned}
   N &= 9 \\
   R &= \frac{30}{100} \times (9 + 1) = 3
   \end{aligned}
   $$

3. **Find the Percentile Value**:

   Since $R = 3$ is an integer, the 30th percentile is the value at the 3rd position in the sorted list:

   $$
   \text{30th percentile} = 7
   $$

Another Example with Interpolation:

Let's calculate the 25th percentile for the same dataset:

1. **Sorted Data** (already sorted from previous step):

   $[3, 5, 7, 8, 12, 13, 14, 18, 21]$

2. **Calculate the Rank**:

   $$
   R = \frac{25}{100} \times (9 + 1) = 2.5
   $$

3. **Find the Percentile Value**:

   Since $R = 2.5$ is not an integer, interpolate between the data points at positions $\lfloor 2.5 \rfloor = 2$ and $\lceil 2.5 \rceil = 3$:

   $$
   \begin{aligned}
   \text{Value at 2nd position} &= 5 \\
   \text{Value at 3rd position} &= 7
   \end{aligned}
   $$

   Interpolation formula:

   $$
   \begin{aligned}
   \text{Percentile value} &= \text{Value at 2nd position} + (R - \lfloor R \rfloor) \times (\text{Value at 3rd position} - \text{Value at 2nd position}) \\
   &= 5 + (2.5 - 2) \times (7 - 5) \\
   &= 5 + 0.5 \times 2 \\
   &= 5 + 1 \\
   &= 6
   \end{aligned}
   $$

   So, the 25th percentile is $6$.

Difference Between Quantile and Percentile:

- **Quantile**:
  - A quantile is a cut point at or below which a given fraction of the data falls. A set of quantiles divides the sorted data into contiguous intervals that each hold an equal share of the observations.
  - For example, quartiles divide data into four equal parts, deciles into ten equal parts, and percentiles into one hundred equal parts.
  - The idea generalizes to any number of intervals (e.g., quintiles divide the data into five equal parts).

- **Percentile**:
  - A percentile is a specific type of quantile that divides the data into 100 equal parts.
  - It indicates the value below which a given percentage of observations fall.
  - For example, the 30th percentile is the value below which 30% of the data falls.

"Quantile" is the more general term, and percentiles are one specific case of it. Percentiles always refer to a division into 100 parts, whereas quantiles can refer to any number of equal divisions of the dataset.
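
A short sketch of this relationship using NumPy: `np.percentile` takes a percentage between 0 and 100, while `np.quantile` takes the same position as a fraction between 0 and 1, so the two calls below describe the same cut point. The variable names are chosen for illustration only.

```python
import numpy as np

data = np.array([3, 7, 8, 5, 12, 14, 21, 13, 18])

# The 30th percentile and the 0.30 quantile are the same cut point
p30 = np.percentile(data, 30)
q30 = np.quantile(data, 0.30)
print(p30, q30)

# Quartiles: 3 cut points that split the data into 4 parts
quartiles = np.quantile(data, [0.25, 0.5, 0.75])

# Deciles: 9 cut points that split the data into 10 parts
deciles = np.quantile(data, np.arange(1, 10) / 10)

print(quartiles)
print(deciles)
```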

Quantile Example:

Let's calculate the quartiles (25th, 50th, and 75th percentiles) for the dataset:

$[3, 7, 8, 5, 12, 14, 21, 13, 18]$

1. **Sort the Data**:

   $[3, 5, 7, 8, 12, 13, 14, 18, 21]$

2. **Calculate the Ranks for Quartiles**:
   
   - For the 25th percentile (1st quartile):

     $$
     R = \frac{25}{100} \times (9 + 1) = 2.5
     $$
  
     Interpolate between 2nd and 3rd values:
     $$
     \text{Value at 2nd position} = 5
     $$

     $$
     \text{Value at 3rd position} = 7
     $$

     $$
     \text{1st quartile (25th percentile)} = 5 + 0.5 \times (7 - 5) = 6
     $$
    

   - For the 50th percentile (2nd quartile or median):

     $$
     R = \frac{50}{100} \times (9 + 1) = 5
     $$

     The 5th value in sorted data:
     $$
     \text{2nd quartile (50th percentile)} = 12
     $$

   - For the 75th percentile (3rd quartile):
     $$
     R = \frac{75}{100} \times (9 + 1) = 7.5
     $$
     
     Interpolate between 7th and 8th values:
     $$
     \text{Value at 7th position} = 14
     $$

     $$
     \text{Value at 8th position} = 18
     $$

     $$
     \text{3rd quartile (75th percentile)} = 14 + 0.5 \times (18 - 14) = 16
     $$
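
The quartiles above follow the $(N + 1)$ rank rule. Library functions may use a different convention: NumPy's default `linear` method places the rank at $q \times (N - 1)$ (0-based), and pandas' `quantile` also defaults to linear interpolation, so their results can differ from the hand calculation. The sketch below reproduces both. Using `np.interp` to apply the $(N + 1)$ rule is a choice made for this illustration.

```python
import numpy as np

data = np.array([3, 7, 8, 5, 12, 14, 21, 13, 18])
sorted_data = np.sort(data)
n = len(sorted_data)
percentages = np.array([25, 50, 75])

# (N + 1) rule: 1-based rank R = P/100 * (N + 1), interpolated between neighbours
ranks = percentages * (n + 1) / 100
positions = np.arange(1, n + 1)          # 1-based positions of the sorted values
rule_quartiles = np.interp(ranks, positions, sorted_data)

# NumPy default ("linear"): 0-based rank q * (N - 1)
default_quartiles = np.percentile(data, percentages)

print("(N + 1) rule: ", rule_quartiles)
print("NumPy default:", default_quartiles)
```

Both conventions agree on the median (12) but give different first and third quartiles for this dataset: 6 and 16 under the $(N + 1)$ rule, 7 and 14 under NumPy's default.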

![image](https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/Boxplot_vs_PDF.svg/800px-Boxplot_vs_PDF.svg.png)

#### 2.4.2 Code

quantiles = df['Math'].quantile([0.05, 0.1, 0.25, 0.5, 0.75])
print(f"Quantiles of Math scores:\n{quantiles}")

#### 2.4.3 Visualization

sns.boxplot(data=df, x='Math')
plt.title('Boxplot of Math Scores')
plt.xlabel('Scores')  # The boxplot is horizontal, so scores lie on the x-axis
plt.show()

sns.histplot(df['Math'], bins=10, kde=True)
plt.title('Histogram of Math Scores')
plt.xlabel('Scores')
plt.ylabel('Frequency')
plt.show()

fig = px.box(df, title='Boxplot with Plotly')  # Pass the DataFrame; one box per subject column
fig.show()

FM 3: Write your analysis of this class

**Answer**