## Logistic Regression Model based on Spatial Proximity for Violent Crime Probability Prediction

_Example_: The **spatial pattern analysis** shows that **violence tends to form clustered regions in the city**, which suggests that **spatial proximity** plays a role. Accordingly, we build our **Logistic Regression Model** on **spatial variables** that capture this proximity:

- **Presence of Violent Crime** in the urban area itself (true/false); this is the target variable, `CrimeHere` in the code;
- **Presence of Violent Crime in the 1st NN** (true if at least one first nearest neighbour has violent crime; `Crime1stNN` in the code);
- **Presence of Violent Crime in the 2nd NN** (true if at least one second nearest neighbour, excluding the first nearest neighbours, has violent crime; `Crime2ndNN` in the code).

The fitted model then yields the **probability of violent crime occurrence** in an urban area, given its two neighbourhood features.
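
For reference, the probability returned by a binary logistic regression on these two features can be written as follows. The symbols $\beta_0$, $\beta_1$, $\beta_2$ (intercept and coefficients) and $x_1$, $x_2$ (the values of `Crime1stNN` and `Crime2ndNN`) are notation assumed here and do not appear in the original example:

$$
P(\text{CrimeHere} = 1 \mid x_1, x_2) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}
$$

A positive $\beta_1$ or $\beta_2$ means that violent crime in the corresponding neighbourhood ring raises the predicted probability, while a negative value lowers it.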

The datasets in the code below correspond to the following coloured spatial networks:

![Datasets](https://i.ibb.co/xgRPdf6/Screenshot-2024-06-05-at-18-45-54.png)

Here, **red** marks an urban area where violent crime is present, and grey marks urban area 7, the area whose probability we want to predict.
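
The two neighbourhood features can be derived from the adjacency structure of such a network. The following is a minimal sketch, assuming the network is available as an adjacency dictionary and that "2nd NN" means neighbours of neighbours that are not themselves 1st NN. The function `neighbour_features`, the path graph, and the crime values are hypothetical and are not taken from the datasets in the figure.

```python
def neighbour_features(adjacency, crime):
    """Return, per urban area, flags for crime in the 1st and 2nd neighbour rings."""
    features = {}
    for node, neighbours in adjacency.items():
        first_ring = set(neighbours)

        # 2nd NN: neighbours of the 1st NN, excluding the area itself and its 1st NN
        second_ring = set()
        for neighbour in first_ring:
            second_ring.update(adjacency[neighbour])
        second_ring -= first_ring | {node}

        features[node] = {
            'Crime1stNN': int(any(crime[n] for n in first_ring)),
            'Crime2ndNN': int(any(crime[n] for n in second_ring)),
        }
    return features


# Hypothetical path graph 0 - 1 - 2 - 3 with violent crime only in area 0
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
crime = {0: 1, 1: 0, 2: 0, 3: 0}

features = neighbour_features(adjacency, crime)
for area, flags in features.items():
    print(area, flags)
```

In this toy graph, area 1 has crime in its first ring (area 0), and area 2 only in its second ring, which is the distinction the two features are meant to capture.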

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Define datasets
datasets = {
    'A': {
        'data': {
            'Urban Area': [0, 1, 2, 3, 4, 5],
            'CrimeHere': [1, 0, 0, 0, 1, 0],
            'Crime1stNN': [0, 1, 1, 1, 0, 0],
            'Crime2ndNN': [1, 0, 0, 0, 1, 1]
        },
        'new_data': {'Crime1stNN': [1], 'Crime2ndNN': [0]}
    },
    'B': {
        'data': {
            'Urban Area': [0, 1, 2, 3, 4, 5],
            'CrimeHere': [1, 0, 1, 0, 0, 0],
            'Crime1stNN': [1, 1, 1, 1, 1, 0],
            'Crime2ndNN': [0, 0, 0, 0, 1, 1]
        },
        'new_data': {'Crime1stNN': [0], 'Crime2ndNN': [1]}
    },
    'C': {
        'data': {
            'Urban Area': [0, 1, 2, 3, 4, 5],
            'CrimeHere': [0, 0, 1, 0, 1, 0],
            'Crime1stNN': [1, 1, 1, 1, 1, 0],
            'Crime2ndNN': [1, 0, 0, 0, 0, 1]
        },
        'new_data': {'Crime1stNN': [1], 'Crime2ndNN': [1]}
    },
    'D': {
        'data': {
            'Urban Area': [0, 1, 2, 3, 4, 5],
            'CrimeHere': [0, 1, 1, 0, 1, 0],
            'Crime1stNN': [1, 1, 1, 1, 1, 0],
            'Crime2ndNN': [1, 0, 0, 1, 0, 1]
        },
        'new_data': {'Crime1stNN': [1], 'Crime2ndNN': [1]}
    },
    'E': {
        'data': {
            'Urban Area': [0, 1, 2, 3, 4, 5],
            'CrimeHere': [0, 0, 1, 0, 1, 1],
            'Crime1stNN': [1, 1, 1, 1, 1, 0],
            'Crime2ndNN': [1, 0, 1, 0, 1, 1]
        },
        'new_data': {'Crime1stNN': [1], 'Crime2ndNN': [1]}
    }
}

# Initialize a list to store the results
results = []

# Iterate over each dataset
for dataset_name, dataset in datasets.items():
    df = pd.DataFrame(dataset['data'])
    X = df[['Crime1stNN', 'Crime2ndNN']]
    y = df['CrimeHere']

    # Initialize and train the Logistic Regression model
    model = LogisticRegression()
    model.fit(X, y)

    # Prepare new data as a DataFrame with the correct feature names
    new_data_df = pd.DataFrame(dataset['new_data'])

    # Predict the probability of crime occurrence in a new urban area
    prediction = model.predict_proba(new_data_df)

    # Round the probability to 2 decimal places
    rounded_prediction = round(prediction[0][1], 2)

    # Print the result
    print(f"Dataset {dataset_name}: Predicted probability of violent crime occurrence:", rounded_prediction)

    # Append the result to the list
    results.append({
        'Dataset': dataset_name,
        'Predicted Probability': rounded_prediction
    })

# Convert the results list to a DataFrame
results_df = pd.DataFrame(results)
```


```
Dataset A: Predicted probability of violent crime occurrence: 0.2
Dataset B: Predicted probability of violent crime occurrence: 0.23
Dataset C: Predicted probability of violent crime occurrence: 0.27
Dataset D: Predicted probability of violent crime occurrence: 0.38
Dataset E: Predicted probability of violent crime occurrence: 0.55
```
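
To see where such a probability comes from, the fitted intercept and coefficients can be passed through the logistic function by hand and compared with `predict_proba`. The sketch below refits dataset A with the same default `LogisticRegression()` settings; the variable names `new_area`, `z`, `manual_probability`, and `sklearn_probability` are introduced here for illustration only. Note that scikit-learn's default applies L2 regularization (`C=1.0`), which with only six observations can noticeably shrink the coefficients.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Dataset A from the example above
df = pd.DataFrame({
    'CrimeHere': [1, 0, 0, 0, 1, 0],
    'Crime1stNN': [0, 1, 1, 1, 0, 0],
    'Crime2ndNN': [1, 0, 0, 0, 1, 1],
})
X = df[['Crime1stNN', 'Crime2ndNN']]
y = df['CrimeHere']

model = LogisticRegression()
model.fit(X, y)

# Features of the urban area to predict (same values as 'new_data' for dataset A)
new_area = pd.DataFrame({'Crime1stNN': [1], 'Crime2ndNN': [0]})

# Linear score: intercept + coefficients . features
z = model.intercept_[0] + model.coef_[0] @ new_area.iloc[0].to_numpy()

# Logistic (sigmoid) function turns the score into a probability
manual_probability = 1.0 / (1.0 + np.exp(-z))

# Probability of the positive class as reported by scikit-learn
sklearn_probability = model.predict_proba(new_area)[0, 1]

print("Coefficients:", model.coef_[0], "Intercept:", model.intercept_[0])
print("Manual and scikit-learn probabilities agree:",
      np.isclose(manual_probability, sklearn_probability))
```

Inspecting `model.coef_` in this way also shows the sign of each neighbourhood effect, which the predicted probabilities alone do not reveal.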
