# Predictive & Advanced Analytics Examples
This notebook demonstrates advanced analytics and geospatial concepts with explanations, sample data, code, and expected outputs.

## 1. Predictive Analytics & Machine Learning (Classification Example)
Train a simple decision tree classifier to predict customer churn.

In [None]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
# Sample input
data = pd.DataFrame({
    'age': [25, 45, 35, 33],
    'spend': [500, 200, 300, 700],
    'churn': [0, 1, 0, 1]
})
X = data[['age', 'spend']]
y = data['churn']
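# Fix random_state: with four rows several splits tie, so results can vary between runs
model = DecisionTreeClassifier(random_state=0).fit(X, y)
# Pass a DataFrame so the feature names match those seen during fit
new_customer = pd.DataFrame({'age': [30], 'spend': [400]})
model.predict(new_customer)  # Predict for a new customer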


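**Expected Output:**
array([0]) or array([1])  # 0 = not churned, 1 = churned

With only four training rows several candidate splits tie, so the predicted class depends on how scikit-learn breaks those ties. Pass `random_state` to `DecisionTreeClassifier` to make the result repeatable.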
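
To see which splits the fitted tree actually uses, its rules can be printed as text. The following is a minimal sketch that reuses the sample data from this section; `random_state=0` and the names `tree_model`, `rules`, and `train_accuracy` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    'age': [25, 45, 35, 33],
    'spend': [500, 200, 300, 700],
    'churn': [0, 1, 0, 1]
})
X = data[['age', 'spend']]
y = data['churn']

# random_state is fixed only so that repeated runs pick the same tied split
tree_model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Human-readable if/else rules learned by the tree
rules = export_text(tree_model, feature_names=['age', 'spend'])
print(rules)

# An unpruned tree memorises four distinct rows, so training accuracy is 1.0
train_accuracy = tree_model.score(X, y)
print(train_accuracy)
```

A perfect training score here says nothing about how well the model generalises. On real data, hold out a test set before trusting the predictions.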


## 2. Propensity Modelling (Logistic Regression)
Estimate the probability of a customer purchasing using logistic regression.

In [None]:
from sklearn.linear_model import LogisticRegression
# Sample input
data = pd.DataFrame({
    'feature1': [1, 0, 1, 0],
    'feature2': [10, 20, 15, 25],
    'purchase': [1, 0, 1, 0]
})
X = data[['feature1', 'feature2']]
y = data['purchase']
model = LogisticRegression().fit(X, y)
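# Pass a DataFrame so the feature names match those seen during fit
model.predict_proba(pd.DataFrame({'feature1': [1], 'feature2': [18]}))  # Propensity score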


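**Expected Output:**
A 1x2 array of the form [[P(no purchase), P(purchase)]], with columns ordered as in `model.classes_`.

The exact values depend on the scikit-learn version and its default L2 regularization. Because `feature2 = 18` sits just above the midpoint (17.5) between the purchasers (10, 15) and the non-purchasers (20, 25), expect P(purchase) to land somewhat below 0.5 rather than near 1.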
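
The propensity score is the sigmoid of the model's linear score, and the column to read from `predict_proba` is determined by `model.classes_`. The sketch below checks both points on the sample data from this section; the names `new_customer`, `purchase_col`, `propensity`, and `manual` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

data = pd.DataFrame({
    'feature1': [1, 0, 1, 0],
    'feature2': [10, 20, 15, 25],
    'purchase': [1, 0, 1, 0]
})
X = data[['feature1', 'feature2']]
y = data['purchase']
model = LogisticRegression().fit(X, y)

new_customer = pd.DataFrame({'feature1': [1], 'feature2': [18]})
proba = model.predict_proba(new_customer)

# Column order of predict_proba follows model.classes_
purchase_col = list(model.classes_).index(1)
propensity = proba[0, purchase_col]

# Same number by hand: sigmoid of the linear score w.x + b
score = model.decision_function(new_customer)[0]
manual = 1.0 / (1.0 + np.exp(-score))
print(propensity, manual)
```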


## 3. ROC Curve & Lift Test
Plot the ROC curve and calculate the area under it (AUC) for a binary classifier.

In [None]:
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
# Sample input
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr, _ = roc_curve(y_true, y_scores)
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, label=f'ROC curve (area = {roc_auc:.2f})')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc='lower right')
plt.show()


**Expected Output:**
ROC curve plotted, AUC: 0.75
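
The section heading also mentions lift. One common definition is the positive rate among the top-k scored rows divided by the overall positive rate. The sketch below applies that definition to the same `y_true` and `y_scores`; the variable names and the choice of cumulative lift at every depth are assumptions for illustration.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Rank rows from highest to lowest score
order = np.argsort(-y_scores)
ranked_true = y_true[order]

# Positive rate among the top-k scored rows, for k = 1..n
depth = np.arange(1, len(y_true) + 1)
cumulative_rate = np.cumsum(ranked_true) / depth

# Lift = positive rate in the top-k divided by the overall positive rate
base_rate = y_true.mean()
cumulative_lift = cumulative_rate / base_rate
print(cumulative_lift)
```

The first value (2.0) says the top-scored row is twice as likely to be positive as a randomly chosen row. By the last depth the lift is 1.0, because the whole sample is included.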


## 4. Causal Inference (Difference-in-Differences)
Estimate a treatment effect with difference-in-differences: the pre-to-post change in the treated group minus the pre-to-post change in the control group.

In [None]:
data = pd.DataFrame({
    'group': ['control', 'control', 'treated', 'treated'],
    'pre': [10, 12, 14, 13],
    'post': [11, 13, 18, 17]
})
control_diff = data[data['group']=='control']['post'].mean() - data[data['group']=='control']['pre'].mean()
treated_diff = data[data['group']=='treated']['post'].mean() - data[data['group']=='treated']['pre'].mean()
treatment_effect = treated_diff - control_diff
treatment_effect


**Expected Output:**
3.0  # (17.5 - 13.5) - (12.0 - 11.0) = 4.0 - 1.0
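
The same estimate can be read off a regression with a treated-by-period interaction term, which is how difference-in-differences is usually extended to include covariates. This is a minimal sketch using ordinary least squares from NumPy; the long-format column names (`period`, `outcome`) and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'group': ['control', 'control', 'treated', 'treated'],
    'pre': [10, 12, 14, 13],
    'post': [11, 13, 18, 17]
})

# Long format: one row per unit and period
long = data.melt(id_vars='group', value_vars=['pre', 'post'],
                 var_name='period', value_name='outcome')
treated = (long['group'] == 'treated').to_numpy(dtype=float)
post = (long['period'] == 'post').to_numpy(dtype=float)

# Design matrix columns: intercept, treated, post, treated x post
design = np.column_stack([np.ones(len(long)), treated, post, treated * post])
coef, *_ = np.linalg.lstsq(design, long['outcome'].to_numpy(dtype=float), rcond=None)

# The interaction coefficient is the difference-in-differences estimate
did_estimate = coef[3]
print(round(float(did_estimate), 6))
```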


## 5. A/B Testing (t-test Example)
Use an independent two-sample t-test (equal variances assumed, the `ttest_ind` default) to check whether two groups have significantly different means.

In [None]:
from scipy.stats import ttest_ind
group_a = [10, 12, 11, 14]
group_b = [13, 15, 14, 16]
t_stat, p_val = ttest_ind(group_a, group_b)
t_stat, p_val


**Expected Output:**
(-2.57, 0.042)  # rounded; p < 0.05, so the means differ at the 5% level
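
To make the statistic less of a black box, the pooled-variance t value can be recomputed by hand and compared with SciPy's result. This is a sketch under the equal-variance assumption that `ttest_ind` uses by default; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind

group_a = np.array([10, 12, 11, 14])
group_b = np.array([13, 15, 14, 16])
n_a, n_b = len(group_a), len(group_b)

# Pooled sample variance (ddof=1 gives the unbiased per-group variance)
pooled_var = ((n_a - 1) * group_a.var(ddof=1)
              + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)

# Standard error of the difference in means
standard_error = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_manual = (group_a.mean() - group_b.mean()) / standard_error

t_scipy, p_scipy = ttest_ind(group_a, group_b)
print(t_manual, t_scipy, p_scipy)
```

If the two groups may have different variances, passing `equal_var=False` to `ttest_ind` switches to Welch's t-test.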


# Geospatial & Specialized Analytics


## 6. Geospatial Analysis (Mapping Points)
Plot latitude/longitude points with geopandas. Requires geopandas and matplotlib. No basemap is drawn, so the points appear on plain longitude/latitude axes.

In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt
lon = [-74.0, -118.2]
lat = [40.7, 34.1]
gdf = gpd.GeoDataFrame(
    {'city': ['A', 'B'], 'lat': lat, 'lon': lon},
    geometry=gpd.points_from_xy(lon, lat),  # x = longitude, y = latitude
    crs='EPSG:4326',  # WGS 84, the usual CRS for raw lat/lon values
)
gdf.plot()
plt.show()


**Expected Output:**
A scatter plot with the two points on longitude (x) and latitude (y) axes
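
Once points are stored as latitude/longitude, a typical next step is measuring the distance between them. The sketch below computes the great-circle (haversine) distance between the two sample points using NumPy only. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions for illustration, and the result is an approximation on a spherical Earth.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = np.radians(lon2 - lon1)
    a = np.sin(d_phi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# City A and city B from the sample data
distance_km = haversine_km(40.7, -74.0, 34.1, -118.2)
print(round(float(distance_km)))
```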


## 7. In-Store Optimization (Location Selection)
Pick a candidate store location as the centroid (mean coordinate) of the customer locations, a simple proxy for customer density.

In [None]:
import numpy as np
locations = np.array([[1, 2], [2, 3], [3, 4]])
centroid = locations.mean(axis=0)
centroid


**Expected Output:**
array([2., 3.])
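
A plain centroid treats every location equally. If each location stands for a different number of customers, a weighted centroid pulls the result toward the denser areas. The sketch below assumes hypothetical customer counts (`customers`) that are not part of the original data.

```python
import numpy as np

locations = np.array([[1, 2], [2, 3], [3, 4]])

# Hypothetical number of customers at each location (illustrative only)
customers = np.array([1, 1, 2])

# Weighted mean of the coordinates, column by column
weighted_centroid = np.average(locations, axis=0, weights=customers)
print(weighted_centroid)
```

With these weights the result moves from (2, 3) to (2.25, 3.25), toward the location with the most customers.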


## 8. Fraud Detection (Anomaly Detection)
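Detect outliers using z-scores. With the population standard deviation that `zscore` uses by default, no |z| can exceed sqrt(n - 1), so a threshold of 2 needs at least six observations before it can flag anything.

In [None]:
from scipy.stats import zscore
import numpy as np
# Eight transactions; the 500 at index 3 is the suspicious one
amounts = np.array([100, 105, 98, 500, 102, 97, 101, 99])
z_scores = zscore(amounts)  # (x - mean) / population standard deviation
outliers = np.where(np.abs(z_scores) > 2)[0]  # indices where |z| > 2
outliers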


**Expected Output:**
array([3])
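
On very small samples a single extreme value inflates the standard deviation and can hide itself from a z-score test. A median-based variant, often called the modified z-score, is less sensitive to this. The sketch below applies it to a four-value sample; the 0.6745 scaling constant and the 3.5 cutoff are conventional choices assumed here, and the variable names are illustrative.

```python
import numpy as np

amounts = np.array([100, 105, 98, 500])

# Median and median absolute deviation (MAD) are robust to a single extreme value
median = np.median(amounts)
mad = np.median(np.abs(amounts - median))

# Modified z-score; 0.6745 rescales MAD to be comparable with a standard deviation
modified_z = 0.6745 * (amounts - median) / mad

# Assumed conventional cutoff for flagging outliers
robust_outliers = np.where(np.abs(modified_z) > 3.5)[0]
print(robust_outliers)
```

This sketch assumes the MAD is non-zero. If more than half the values are identical, the MAD is 0 and the score is undefined.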
