## Logistic Regression Model based on Spatial Proximity of Violence

_Example_: The **spatial pattern analysis** shows that **violence tends to form clustered regions in the city**, which suggests that **spatial proximity** plays a role. Accordingly, we build our **Logistic Regression Model** on **spatial features** that capture this proximity. We use:

- **Presence of Violent Crime** (true/false; target variable y);
- **Presence of Violent Crime in the 1st NN** (true if at least one first nearest neighbour (1st NN) has violent crime; first predictor variable);
- **Presence of Violent Crime in the 2nd NN** (true if at least one second nearest neighbour (2nd NN), excluding the 1st NNs, has violent crime; second predictor variable).
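
The two predictors above can be derived from a network of urban areas. The sketch below is one possible way to do it, assuming that "1st NN" means directly connected areas and "2nd NN" means areas exactly two links away. The `adjacency` and `crime_here` values are hypothetical and do not reproduce the networks used in this example.

```python
# Hypothetical toy network (made up for illustration only).
# Keys are urban areas, values are the sets of directly connected areas.
adjacency = {
    0: {1, 2},
    1: {0, 3},
    2: {0, 3},
    3: {1, 2, 4},
    4: {3},
}
# Hypothetical crime flags: only urban area 0 has violent crime.
crime_here = {0: True, 1: False, 2: False, 3: False, 4: False}


def first_neighbours(node):
    """Areas directly connected to `node` (assumed meaning of 1st NN)."""
    return set(adjacency[node])


def second_neighbours(node):
    """Areas two links away, excluding `node` itself and its 1st NNs."""
    first = first_neighbours(node)
    two_steps = set()
    for neighbour in first:
        two_steps |= adjacency[neighbour]
    return two_steps - first - {node}


def spatial_features(node):
    """Return (Crime1stNN, Crime2ndNN) as 0/1 flags for `node`."""
    crime_1st = any(crime_here[n] for n in first_neighbours(node))
    crime_2nd = any(crime_here[n] for n in second_neighbours(node))
    return int(crime_1st), int(crime_2nd)


features = {node: spatial_features(node) for node in adjacency}
for node, (c1, c2) in features.items():
    print(f"Urban Area {node}: Crime1stNN={c1}, Crime2ndNN={c2}")
```

In this toy network, areas 1 and 2 touch the crime area directly, area 3 only reaches it through a second link, and area 4 is too far away for either flag to be set.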
  
From these features we obtain the **probability of violent crime occurrence** as:

<img src="https://i.ibb.co/hX2XJ41/Screenshot-2024-06-06-at-11-01-40.png" alt="LogisticRegression" width="200"/>
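
For readers who cannot load the image, the standard logistic regression form with two predictors is written out here, on the assumption that the image shows this usual expression. The symbols are notational assumptions: $x_1$ and $x_2$ stand for `Crime1stNN` and `Crime2ndNN`, $\beta_0$ for the intercept, and $\beta_1$, $\beta_2$ for the two coefficients.

```latex
P(y = 1 \mid x_1, x_2) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}
```

As a worked check with the coefficients reported for Dataset A in this example's output ($\beta_0 = -0.76$, $\beta_1 = -0.61$, $\beta_2 = 0.61$) and the new area's features $x_1 = 1$, $x_2 = 0$: the exponent is $-0.76 - 0.61 = -1.37$, so $P \approx 1 / (1 + e^{1.37}) \approx 0.20$, which agrees with the predicted probability printed for that dataset.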

The datasets in the code below correspond to the following coloured spatial networks:

<img src="https://i.ibb.co/xgRPdf6/Screenshot-2024-06-05-at-18-45-54.png" alt="Datasets"/>

Here, **red** marks the presence of violent crime in an urban area, and grey marks urban area 7, the area whose outcome we want to predict.

**WARNING!** Each dataset contains only six urban areas, so the fitted coefficients and predicted probabilities are not reliable. The results serve only to illustrate the logistic regression model on a toy example with spatial features.

In [162]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from scipy.special import expit  # logistic sigmoid: expit(z) = 1 / (1 + exp(-z))

# Each dataset refers to one of the coloured spatial networks above.
# 'data' holds the six observed urban areas (0-5);
# 'new_data' holds the spatial features of Urban Area 7, the area to predict.

datasets = {
    'A': {
        'data': {
            'Urban Area': [0, 1, 2, 3, 4, 5],
            'CrimeHere': [1, 0, 0, 0, 1, 0],
            'Crime1stNN': [0, 1, 1, 1, 0, 0],
            'Crime2ndNN': [1, 0, 0, 0, 1, 1]
        },
        'new_data': {'Crime1stNN': [1], 'Crime2ndNN': [0]}
    },
    'B': {
        'data': {
            'Urban Area': [0, 1, 2, 3, 4, 5],
            'CrimeHere': [1, 0, 1, 0, 0, 0],
            'Crime1stNN': [1, 1, 1, 1, 1, 0],
            'Crime2ndNN': [0, 0, 0, 0, 1, 1]
        },
        'new_data': {'Crime1stNN': [0], 'Crime2ndNN': [1]}
    },
    'C': {
        'data': {
            'Urban Area': [0, 1, 2, 3, 4, 5],
            'CrimeHere': [0, 0, 1, 0, 1, 0],
            'Crime1stNN': [1, 1, 1, 1, 1, 0],
            'Crime2ndNN': [1, 0, 0, 0, 0, 1]
        },
        'new_data': {'Crime1stNN': [1], 'Crime2ndNN': [1]}
    },
    'D': {
        'data': {
            'Urban Area': [0, 1, 2, 3, 4, 5],
            'CrimeHere': [0, 1, 1, 0, 1, 0],
            'Crime1stNN': [1, 1, 1, 1, 1, 0],
            'Crime2ndNN': [1, 0, 0, 1, 0, 1]
        },
        'new_data': {'Crime1stNN': [1], 'Crime2ndNN': [1]}
    },
    'E': {
        'data': {
            'Urban Area': [0, 1, 2, 3, 4, 5],
            'CrimeHere': [0, 0, 1, 0, 1, 1],
            'Crime1stNN': [1, 1, 1, 1, 1, 0],
            'Crime2ndNN': [1, 0, 1, 0, 1, 1]
        },
        'new_data': {'Crime1stNN': [1], 'Crime2ndNN': [1]}
    }
}

# Initialize a list to store the results
results = []

# Iterate over each dataset
for dataset_name, dataset in datasets.items():
    df = pd.DataFrame(dataset['data'])
    X = df[['Crime1stNN', 'Crime2ndNN']]
    y = df['CrimeHere']
    
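    # Initialize and train the Logistic Regression model.
    # Note: scikit-learn applies L2 regularisation by default (C=1.0),
    # which shrinks the coefficients noticeably on such a small sample.
    model = LogisticRegression()
    model.fit(X, y)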
    
    # Prepare new data as a DataFrame with the correct feature names
    new_data_df = pd.DataFrame(dataset['new_data'])
    
    # Predict the probability of crime occurrence in a new urban area (in our example, Urban Area 7)
    prediction = model.predict_proba(new_data_df)
    
    # Column 1 of predict_proba is the probability of class 1 (violent crime present);
    # round it to 2 decimal places
    rounded_prediction = round(prediction[0][1], 2)
    
    # Obtain the coefficients
    coef = model.coef_[0]
    intercept = model.intercept_[0]
    
    # Calculate baseline probability (with no neighbouring crimes)
    baseline_logit = intercept
    baseline_prob = expit(baseline_logit)
    
    # Recompute the probability by hand from the fitted coefficients;
    # this should match the value returned by predict_proba above
    logit_with_predictors = (
        intercept
        + coef[0] * new_data_df['Crime1stNN'].iloc[0]
        + coef[1] * new_data_df['Crime2ndNN'].iloc[0]
    )
    prob_with_predictors = expit(logit_with_predictors)
    
    # Print the results
    print(f"Dataset {dataset_name}:")
    print(f"  Predicted probability of violent crime occurrence: {rounded_prediction}")
    print(f"  Coefficients: Intercept = {intercept:.2f}, Crime1stNN Coef = {coef[0]:.2f}, Crime2ndNN Coef = {coef[1]:.2f}")
    
    # Show baseline and corrected probabilities
    interpretation = (
        f"  Interpretation:\n"
        f"    - Baseline probability: {baseline_prob:.2f}\n"
        f"    - Corrected probability with predictors: {prob_with_predictors:.2f}\n"
    )
    print(interpretation)
    
    # Append the results to the list
    results.append({
        'Dataset': dataset_name,
        'Predicted Probability': rounded_prediction,
        'Intercept': round(intercept, 2),
        'Crime1stNN Coef': round(coef[0], 2),
        'Crime2ndNN Coef': round(coef[1], 2),
    })

# Convert the results list to a DataFrame
results_df = pd.DataFrame(results)

# Print the results table
print("\nResults Table:")
print(results_df)


Dataset A:
  Predicted probability of violent crime occurrence: 0.2
  Coefficients: Intercept = -0.76, Crime1stNN Coef = -0.61, Crime2ndNN Coef = 0.61
  Interpretation:
    - Baseline probability: 0.32
    - Corrected probability with predictors: 0.20

Dataset B:
  Predicted probability of violent crime occurrence: 0.23
  Coefficients: Intercept = -0.73, Crime1stNN Coef = 0.23, Crime2ndNN Coef = -0.50
  Interpretation:
    - Baseline probability: 0.32
    - Corrected probability with predictors: 0.23

Dataset C:
  Predicted probability of violent crime occurrence: 0.27
  Coefficients: Intercept = -0.73, Crime1stNN Coef = 0.23, Crime2ndNN Coef = -0.50
  Interpretation:
    - Baseline probability: 0.32
    - Corrected probability with predictors: 0.27

Dataset D:
  Predicted probability of violent crime occurrence: 0.38
  Coefficients: Intercept = 0.28, Crime1stNN Coef = 0.31, Crime2ndNN Coef = -1.07
  Interpretation:
    - Baseline probability: 0.57
    - Corrected probability with pred