# Spread Locator: Statistical Distribution Analysis Model
E-commerce Transaction Data Analysis

# Part A – Theoretical Foundations
### 1. Statistical Distributions
A statistical distribution describes which values a variable can take and how frequently each value (or range of values) occurs.

### 2. Q-Q Plot
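A Q-Q plot compares the quantiles of the sample data with the quantiles of a theoretical distribution. Plotted against a normal distribution, it gives a visual check of normality rather than a formal test.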

### 3. Discrete vs Continuous Distribution
Discrete distributions apply to countable outcomes (e.g., number of transactions).
Continuous distributions apply to measurable quantities (e.g., transaction amount).

### 4. Bernoulli Distribution
Models binary outcomes (Success/Failure) with probability p.

### 5. Binomial Distribution
Models the number of successes in a fixed number n of independent trials, each with the same success probability p.
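
As an illustration of the definition above, here is a minimal sketch using `scipy.stats.binom`. The values `n_trials = 10` and `p_success = 0.9` are hypothetical and are not taken from the dataset.

```python
from scipy import stats

# Hypothetical values chosen only for illustration
n_trials = 10     # fixed number of independent transactions
p_success = 0.9   # assumed probability that any one transaction succeeds

# P(exactly 8 successes) and P(at most 8 successes)
prob_exactly_8 = stats.binom.pmf(8, n_trials, p_success)
prob_at_most_8 = stats.binom.cdf(8, n_trials, p_success)

# Expected number of successes is n * p
expected_successes = stats.binom.mean(n_trials, p_success)

print(prob_exactly_8, prob_at_most_8, expected_successes)
```

With n = 1 the binomial reduces to the Bernoulli distribution from the previous section.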

### 6. Log-Normal Distribution
A variable follows log-normal distribution if its logarithm is normally distributed.

### 7. Power Law Distribution
Describes heavy-tailed distributions in which very large values are rare but still far more likely than a normal distribution would predict.

### 8. Box-Cox Transformation
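A power transformation, indexed by a parameter λ, used to stabilize variance and reduce skewness. It requires strictly positive data; λ = 0 corresponds to the log transform.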

### 9. Poisson Distribution
Models the number of events occurring in a fixed interval of time, assuming events occur independently at a constant average rate.

### 10. Z-Score Probability
A z-score measures how many standard deviations a value lies from the mean: z = (x - mean) / standard deviation.

### 11. PDF vs CDF
The PDF gives the probability density at a point; for a continuous variable this is not itself a probability. The CDF gives the cumulative probability up to and including a value.
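
The relationship between the two can be checked numerically: integrating the PDF up to a point should reproduce the CDF at that point. The sketch below assumes a standard normal distribution and an arbitrary upper limit of 1.0, both chosen only for illustration.

```python
import numpy as np
from scipy import integrate, stats

# Illustrative parameters (standard normal), not estimated from the dataset
mu, sigma = 0.0, 1.0
upper = 1.0

# Area under the PDF from -infinity to `upper`
area, _abs_err = integrate.quad(stats.norm.pdf, -np.inf, upper, args=(mu, sigma))

# The CDF evaluated at the same point
cdf_value = stats.norm.cdf(upper, mu, sigma)

print(area, cdf_value)
```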


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.stats import boxcox
df = pd.read_csv('spread_locator_dataset.csv')
df.head()

## 1️⃣ Fit Bernoulli Distribution (Transaction Success)

In [None]:
success = (df['transaction_status']=='Success').astype(int)
p = success.mean()
p

Interpretation: p is the estimated probability that a transaction succeeds, that is, the proportion of transactions in the sample with status 'Success'.

## 2️⃣ Fit Poisson Distribution (Transaction Count)

In [None]:
lambda_est = df['transaction_count'].mean()
lambda_est

Interpretation: Lambda is the estimated average number of weekly transactions. The sample mean is the maximum-likelihood estimate of the Poisson rate.
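
A Poisson distribution has equal mean and variance, so one quick sanity check on the fit is the ratio of sample variance to sample mean. The sketch below uses synthetic counts, not the dataset; the variable names and the rate of 4.0 are assumptions for illustration.

```python
import numpy as np

# Synthetic weekly counts standing in for df['transaction_count']
rng = np.random.default_rng(1)
counts = rng.poisson(lam=4.0, size=2000)

lambda_est = counts.mean()          # Poisson rate estimate
variance = counts.var(ddof=1)       # sample variance

# Ratio near 1 is consistent with Poisson; well above 1 suggests overdispersion
dispersion = variance / lambda_est

print(lambda_est, variance, dispersion)
```

If the same ratio computed on the real column is far from 1, a Poisson model may be a poor description of the counts.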

## 3️⃣ Log-Normal Distribution Fit (Transaction Amount)

In [None]:
# floc=0 fixes the location at zero; this fit assumes all amounts are strictly positive
shape, loc, scale = stats.lognorm.fit(df['transaction_amount'], floc=0)
shape, scale

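Interpretation: With loc fixed at 0, shape estimates the standard deviation of log(amount) and scale estimates exp(mean of log(amount)). Transaction amounts show right-skewed behavior typical of a log-normal distribution.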
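
The link between the fitted parameters and the log of the data can be verified directly. This is a minimal sketch on synthetic log-normal amounts (the parameters 7.0 and 0.8 are arbitrary assumptions), not on the dataset itself.

```python
import numpy as np
from scipy import stats

# Synthetic amounts standing in for df['transaction_amount']
rng = np.random.default_rng(0)
amounts = rng.lognormal(mean=7.0, sigma=0.8, size=5000)

# Same fit as in the notebook cell: location fixed at zero
shape, loc, scale = stats.lognorm.fit(amounts, floc=0)

# Mean and standard deviation of the log-amounts
log_amounts = np.log(amounts)
mu_hat = log_amounts.mean()
sigma_hat = log_amounts.std()

# shape should be close to sigma_hat, and log(scale) close to mu_hat
print(shape, sigma_hat)
print(np.log(scale), mu_hat)
```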

## 4️⃣ Q-Q Plot for Normality

In [None]:
plt.figure()
stats.probplot(df['transaction_amount'], dist='norm', plot=plt)
plt.title('Q-Q Plot')
plt.show()

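Interpretation: If the points deviate systematically from the reference line, for example by curving away from it in the upper tail, the data are not well described by a normal distribution.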
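
The visual impression can be complemented with the correlation coefficient that `stats.probplot` returns for the fitted line. The sketch below compares raw and log-transformed values on synthetic right-skewed data; the data and variable names are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed amounts standing in for df['transaction_amount']
rng = np.random.default_rng(4)
amounts = rng.lognormal(mean=7.0, sigma=0.8, size=2000)

# Without plot=..., probplot returns the quantile pairs and the
# least-squares fit (slope, intercept, r) of the Q-Q line
_, (_, _, r_raw) = stats.probplot(amounts, dist='norm')
_, (_, _, r_log) = stats.probplot(np.log(amounts), dist='norm')

# r closer to 1 means the points lie closer to a straight line
print(r_raw, r_log)
```

A noticeably higher r for the log-transformed values suggests the raw amounts are closer to log-normal than to normal.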

## 5️⃣ Box-Cox Transformation

In [None]:
# Box-Cox requires strictly positive input; lambda_bc is chosen by maximum likelihood
transformed, lambda_bc = boxcox(df['transaction_amount'])
lambda_bc

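Interpretation: Box-Cox helps stabilize variance and reduce skewness. lambda_bc is the estimated power; a value near 0 means the transformation is close to a log transform.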
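
Whether the transformation actually reduced skewness can be checked by comparing sample skewness before and after. The sketch below runs on synthetic positive, right-skewed data rather than the dataset; the names `skew_before` and `skew_after` are illustrative.

```python
import numpy as np
from scipy import stats

# Synthetic positive, right-skewed amounts standing in for df['transaction_amount']
rng = np.random.default_rng(2)
amounts = rng.lognormal(mean=7.0, sigma=0.8, size=5000)

# Box-Cox with lambda estimated by maximum likelihood
transformed, lambda_bc = stats.boxcox(amounts)

skew_before = stats.skew(amounts)
skew_after = stats.skew(transformed)

print(lambda_bc, skew_before, skew_after)
```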

## 6️⃣ Z-Score & Probability of Amount > ₹5000

In [None]:
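mean_amt = df['transaction_amount'].mean()
std_amt = df['transaction_amount'].std()   # sample standard deviation (ddof=1)
z = (5000 - mean_amt)/std_amt              # z-score of the ₹5000 threshold
prob = 1 - stats.norm.cdf(z)               # upper-tail probability P(amount > 5000)
z, prob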

Interpretation: prob is the proportion of transactions expected to exceed ₹5000 if amounts were normally distributed. For right-skewed amounts this is only a rough approximation.
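
To see how much the normal assumption matters, the same tail probability can be computed under a fitted log-normal model and compared with the observed proportion. This is a sketch on synthetic data with assumed parameters, not a result from the dataset.

```python
import numpy as np
from scipy import stats

# Synthetic amounts standing in for df['transaction_amount']
rng = np.random.default_rng(3)
amounts = rng.lognormal(mean=7.0, sigma=0.8, size=5000)
threshold = 5000

# Tail probability under a normal model (sf is 1 - cdf)
prob_normal = stats.norm.sf(threshold, amounts.mean(), amounts.std(ddof=1))

# Tail probability under a log-normal model with location fixed at zero
shape, loc, scale = stats.lognorm.fit(amounts, floc=0)
prob_lognorm = stats.lognorm.sf(threshold, shape, loc=loc, scale=scale)

# Observed proportion of amounts above the threshold
prob_empirical = (amounts > threshold).mean()

print(prob_normal, prob_lognorm, prob_empirical)
```

Whichever model's probability sits closer to the observed proportion gives the better description of the upper tail.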

## 7️⃣ PDF Plot

In [None]:
plt.figure()
x = np.linspace(mean_amt-3*std_amt, mean_amt+3*std_amt, 100)
plt.plot(x, stats.norm.pdf(x, mean_amt, std_amt))
plt.title('Normal PDF')
plt.show()

## 8️⃣ CDF Plot

In [None]:
plt.figure()
plt.plot(x, stats.norm.cdf(x, mean_amt, std_amt))
plt.title('Normal CDF')
plt.show()

## Final Conclusion
Transaction amounts are positively skewed and are better modeled by a log-normal distribution than by a normal one.
The Box-Cox transformation brings the amounts closer to normality for further parametric analysis.

## Statistical Decision Explanation
Based on the distribution fitting and the Q-Q plot analysis, transaction amounts show positive skewness.
A log-normal distribution captures this behavior better than a normal distribution.
The Box-Cox transformation improves symmetry, so parametric methods that assume normality can be applied to the transformed amounts.
Therefore, for decision-making, log-normal modeling is more appropriate for transaction amounts.
