# Predictive & Advanced Analytics: User-Friendly Examples
This notebook demonstrates predictive, advanced, and geospatial analytics concepts using small sample datasets and step-by-step explanations.
Each section includes:
- A concept explanation
- Sample input data (displayed as a DataFrame or printed)
- Step-by-step code with comments
- Output shown with print/display


## 1. Predictive Analytics & Machine Learning (Classification Example)
**Goal:** Train a simple decision tree classifier to predict customer churn.

In [None]:
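import pandas as pd
from sklearn.tree import DecisionTreeClassifier
# Sample input: one row per customer (Churned: 0 = stayed, 1 = left)
data = pd.DataFrame({
    'Age': [25, 45, 35, 33],
    'Annual Spend ($)': [500, 200, 300, 700],
    'Churned': [0, 1, 0, 1]
})
display(data)  # display() is available in Jupyter/IPython
# Train model: features X, target y
X = data[['Age', 'Annual Spend ($)']]
y = data['Churned']
model = DecisionTreeClassifier(random_state=0).fit(X, y)  # fixed seed for reproducible splits
# Predict for a new customer (age 30, $400 annual spend); use a DataFrame with the
# same column names as the training data to avoid a feature-name warning
new_customer = pd.DataFrame([[30, 400]], columns=X.columns)
prediction = model.predict(new_customer)
print('Predicted Churn (0=No, 1=Yes):', prediction[0])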

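To see why the tree makes a given prediction, scikit-learn's `export_text` can print the learned split rules as nested if/else conditions. The sketch below refits the same toy model (the `random_state=0` seed is an assumption added for reproducible tie-breaking) and prints its rules; with only four rows, the exact thresholds are illustrative rather than meaningful.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same toy churn data as above
data = pd.DataFrame({
    'Age': [25, 45, 35, 33],
    'Annual Spend ($)': [500, 200, 300, 700],
    'Churned': [0, 1, 0, 1]
})
X = data[['Age', 'Annual Spend ($)']]
y = data['Churned']

# random_state fixes tie-breaking between equally good candidate splits
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned split rules, one line per node, ending in the predicted class
rules = export_text(model, feature_names=list(X.columns))
print(rules)
```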
---
## 2. Propensity Modelling (Logistic Regression)
**Goal:** Estimate the probability that a customer will make a purchase (their propensity score) using logistic regression.

In [None]:
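import pandas as pd
from sklearn.linear_model import LogisticRegression
# Sample input: whether the user clicked an email, number of visits, and the outcome
data = pd.DataFrame({
    'Feature 1 (Email Clicked)': [1, 0, 1, 0],
    'Feature 2 (Visits)': [10, 20, 15, 25],
    'Purchased': [1, 0, 1, 0]
})
display(data)
X = data[['Feature 1 (Email Clicked)', 'Feature 2 (Visits)']]
y = data['Purchased']
model = LogisticRegression().fit(X, y)
# Propensity score = predicted probability of class 1 (Purchased) for a new user;
# pass a DataFrame with the training column names to avoid a feature-name warning
new_user = pd.DataFrame([[1, 18]], columns=X.columns)
score = model.predict_proba(new_user)[0, 1]
print(f'Propensity to Purchase: {score:.2%}')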

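Logistic regression turns a weighted sum of the features into a probability with the sigmoid function, p = 1 / (1 + exp(-(b0 + b·x))). The sketch below refits the same toy model and recomputes the propensity score by hand from the fitted `intercept_` and `coef_` attributes, to show that it matches `predict_proba`. The variable names `manual_score` and `sklearn_score` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same toy data as above
data = pd.DataFrame({
    'Feature 1 (Email Clicked)': [1, 0, 1, 0],
    'Feature 2 (Visits)': [10, 20, 15, 25],
    'Purchased': [1, 0, 1, 0]
})
X = data[['Feature 1 (Email Clicked)', 'Feature 2 (Visits)']]
y = data['Purchased']
model = LogisticRegression().fit(X, y)

new_user = pd.DataFrame([[1, 18]], columns=X.columns)

# Linear score (log-odds): intercept + weighted sum of the features
logit = model.intercept_[0] + new_user.to_numpy()[0] @ model.coef_[0]
# Sigmoid maps the log-odds to a probability between 0 and 1
manual_score = 1 / (1 + np.exp(-logit))

sklearn_score = model.predict_proba(new_user)[0, 1]
print(f'Manual: {manual_score:.4f}, predict_proba: {sklearn_score:.4f}')
```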
---
## 3. ROC Curve & Lift Test
**Goal:** Plot the ROC curve and calculate the AUC (area under the curve) for a binary classifier's predicted scores.

In [None]:
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
# Sample input: true labels and the classifier's scores for the positive class
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
# ROC curve: false and true positive rates at each score threshold; AUC = area under it
fpr, tpr, _ = roc_curve(y_true, y_scores)
roc_auc = auc(fpr, tpr)
print(f'AUC: {roc_auc:.2f}')
plt.figure(figsize=(5, 4))
plt.plot(fpr, tpr, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], 'k--', label='Random classifier')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc='lower right')
plt.show()

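The heading also mentions lift, which the ROC code does not compute. One common definition, assumed here, is lift at the top k% of scores: the positive rate among the highest-scored k% of cases divided by the overall positive rate, so a lift of 2 means targeting that slice finds positives twice as often as random selection. A minimal NumPy sketch; `lift_at` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

def lift_at(y_true, y_scores, frac):
    """Hypothetical helper: lift among the top `frac` share of cases ranked by score."""
    y_true = np.asarray(y_true)
    y_scores = np.asarray(y_scores)
    # Number of cases in the targeted top fraction (at least one)
    k = max(1, int(np.ceil(frac * len(y_true))))
    # Indices sorted from highest to lowest score
    order = np.argsort(-y_scores, kind='stable')
    top_rate = y_true[order[:k]].mean()  # positive rate among targeted cases
    base_rate = y_true.mean()            # positive rate under random selection
    return top_rate / base_rate

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
for frac in (0.25, 0.5, 1.0):
    print(f'Lift at top {frac:.0%}: {lift_at(y_true, y_scores, frac):.2f}')
```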
---
## 4. Causal Inference (Difference-in-Differences)
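**Goal:** Estimate the treatment effect as the change in the treated group minus the change in the control group (difference-in-differences). This assumes both groups would have followed parallel trends without the treatment.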

In [None]:
import pandas as pd
# Sample experiment data: outcome per unit before (Pre) and after (Post) the treatment
data = pd.DataFrame({
    'Group': ['Control', 'Control', 'Treated', 'Treated'],
    'Pre': [10, 12, 14, 13],
    'Post': [11, 13, 18, 17]
})
display(data)
# Average outcome per group and period
means = data.groupby('Group')[['Pre', 'Post']].mean()
# Change over time within each group
control_diff = means.loc['Control', 'Post'] - means.loc['Control', 'Pre']
treated_diff = means.loc['Treated', 'Post'] - means.loc['Treated', 'Pre']
# DiD: the treated group's change beyond the control group's change
treatment_effect = treated_diff - control_diff
print(f'Treatment Effect: {treatment_effect:.2f}')

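The same estimate can be written as a linear regression of the outcome on a treated-group indicator, a post-period indicator, and their interaction; the interaction coefficient is the difference-in-differences effect. This regression form is a standard alternative that makes it easy to add covariates later. The sketch below reshapes the sample data to one row per unit per period and, as an assumption to keep dependencies minimal, uses plain NumPy least squares instead of a statistics package.

```python
import numpy as np

# Same data as above in long format: one row per unit per period
outcome = np.array([10, 12, 14, 13,   # Pre:  Control, Control, Treated, Treated
                    11, 13, 18, 17])  # Post: Control, Control, Treated, Treated
treated = np.array([0, 0, 1, 1, 0, 0, 1, 1])
post = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Design matrix: intercept, group dummy, period dummy, and their interaction
X = np.column_stack([np.ones(len(outcome)), treated, post, treated * post])
coefs, *_ = np.linalg.lstsq(X, outcome, rcond=None)

# coefs[3] (the treated x post interaction) is the DiD estimate
print(f'Treatment Effect (regression): {coefs[3]:.2f}')
```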
---
## 5. A/B Testing (t-test Example)
**Goal:** Test whether two groups have significantly different means using an independent two-sample t-test.

In [None]:
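from scipy.stats import ttest_ind
# Sample group data (one metric value per observation in variants A and B)
group_a = [10, 12, 11, 14]
group_b = [13, 15, 14, 16]
print('Group A:', group_a)
print('Group B:', group_b)
# Independent two-sample t-test; the default equal_var=True assumes equal variances
t_stat, p_val = ttest_ind(group_a, group_b)
print(f't-statistic: {t_stat:.2f}, p-value: {p_val:.3f}')
# Compare the p-value with a significance level chosen before running the test
alpha = 0.05
print('Significant difference' if p_val < alpha else 'No significant difference', f'at alpha = {alpha}')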

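Under the hood, the default `ttest_ind` computes Student's t statistic: the difference in means divided by its standard error, where the standard error uses the pooled variance of both groups, with n_a + n_b - 2 degrees of freedom. The sketch below reproduces that calculation by hand and checks it against SciPy. If the groups may have different variances, `ttest_ind(..., equal_var=False)` runs Welch's t-test instead.

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_ind

group_a = np.array([10, 12, 11, 14])
group_b = np.array([13, 15, 14, 16])
n_a, n_b = len(group_a), len(group_b)

# Pooled variance from the two sample variances (ddof=1)
pooled_var = ((n_a - 1) * group_a.var(ddof=1)
              + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)
# Standard error of the difference in means
se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_manual = (group_a.mean() - group_b.mean()) / se
dof = n_a + n_b - 2
# Two-sided p-value from the t distribution's survival function
p_manual = 2 * t_dist.sf(abs(t_manual), dof)

t_scipy, p_scipy = ttest_ind(group_a, group_b)
print(f'manual: t = {t_manual:.2f}, p = {p_manual:.3f}')
print(f'scipy:  t = {t_scipy:.2f}, p = {p_scipy:.3f}')
```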
---
# Geospatial & Specialized Analytics


## 6. Geospatial Analysis (Mapping Points)
**Goal:** Plot latitude/longitude points with geopandas (requires geopandas and matplotlib). This draws the points on plain longitude/latitude axes, without a basemap.

In [None]:
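import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
# Sample cities (coordinates roughly match New York and Los Angeles)
cities = pd.DataFrame({
    'City': ['A', 'B'],
    'Latitude': [40.7, 34.1],
    'Longitude': [-74.0, -118.2]
})
# Build point geometries from longitude (x) and latitude (y);
# EPSG:4326 is the WGS 84 latitude/longitude coordinate system
gdf = gpd.GeoDataFrame(
    cities,
    geometry=gpd.points_from_xy(cities['Longitude'], cities['Latitude']),
    crs='EPSG:4326'
)
display(gdf)
gdf.plot(figsize=(5, 5), color='blue', marker='o')
plt.title('City Locations')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()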

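Because latitude and longitude are angles on a sphere, the straight-line spacing of the plotted points does not reflect real distance. A minimal sketch of the haversine (great-circle) formula, assuming a spherical Earth with a mean radius of 6371 km; `haversine_km` is a hypothetical helper, not a geopandas function.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Hypothetical helper: great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Distance between city A (40.7, -74.0) and city B (34.1, -118.2)
distance = haversine_km(40.7, -74.0, 34.1, -118.2)
print(f'Distance A-B: {distance:.0f} km')
```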
---
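## 7. In-Store Optimization (Location Selection)
**Goal:** Suggest a store location at the centroid (mean position) of customer locations. The centroid minimizes the total squared straight-line distance to customers, so it is a simple starting point rather than a full site-selection model.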

In [None]:
import numpy as np
# Sample customer locations as (x, y) coordinates
locations = np.array([[1, 2], [2, 3], [3, 4]])
print('Customer Locations:')
print(locations)
# The centroid (mean x, mean y) minimizes the sum of squared distances to all customers
centroid = locations.mean(axis=0)
print('Suggested Store Location (centroid):', centroid)

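If some customer locations matter more than others (for example, more visits or higher spend), a weighted centroid pulls the suggested site toward them. A minimal sketch using `np.average`; the weights below are hypothetical values for illustration, not data from this notebook.

```python
import numpy as np

locations = np.array([[1, 2], [2, 3], [3, 4]])
# Hypothetical weights, e.g. number of visits from each customer location
weights = np.array([10, 5, 1])

# Weighted mean of x and y coordinates
weighted_centroid = np.average(locations, axis=0, weights=weights)
print('Weighted Store Location:', weighted_centroid)
```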
---
## 8. Fraud Detection (Anomaly Detection)
**Goal:** Flag unusual transaction amounts as potential fraud using z-scores (how many standard deviations each value lies from the mean).

In [None]:
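from scipy.stats import zscore
import numpy as np
# Sample transaction amounts
amounts = np.array([100, 105, 98, 500])
print('Transaction Amounts:', amounts)
# z = (x - mean) / std, using the population std (ddof=0) by default
z_scores = zscore(amounts)
print('Z-scores:', np.round(z_scores, 2))
# With n values, |z| can never exceed sqrt(n - 1) (about 1.73 for n = 4),
# so a cutoff of 2 would flag nothing in this tiny sample
threshold = 1.5
outliers = np.where(np.abs(z_scores) > threshold)[0]
print('Outlier Indices:', outliers)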
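
Because the mean and standard deviation are themselves pulled toward a large outlier, a common more robust alternative is the modified z-score, built from the median and the median absolute deviation (MAD). A minimal sketch; the 0.6745 scaling constant and the 3.5 cutoff follow the Iglewicz and Hoaglin convention, and this assumes the MAD is nonzero.

```python
import numpy as np

amounts = np.array([100, 105, 98, 500])

# Median and MAD are barely affected by the single extreme value
median = np.median(amounts)
mad = np.median(np.abs(amounts - median))
# 0.6745 rescales the MAD so scores are comparable to z-scores for normal data
modified_z = 0.6745 * (amounts - median) / mad
outliers = np.where(np.abs(modified_z) > 3.5)[0]
print('Modified z-scores:', np.round(modified_z, 2))
print('Outlier Indices:', outliers)
```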