
# Crime Rates in Boston: A Multiple Linear Regression Analysis



## Abstract

This study examines factors associated with crime rates in Boston using multiple linear regression. The analysis uses the Boston Housing dataset to explore the relationships between socioeconomic, environmental, and accessibility variables and the per capita crime rate (CRIM). Accessibility to radial highways (RAD) and the percentage of lower-status population (LSTAT) were significant predictors of higher crime rates. Conversely, greater distance to employment centers (DIS) and, notably, higher values of B, a transformed measure of the proportion of Black residents by town, were associated with lower crime rates. The other variables did not show statistically significant relationships. These findings highlight the complex interplay of factors associated with crime patterns in urban environments. They have implications for urban planning, community development, and crime prevention, and they emphasize the need to address socioeconomic disparities and to consider the spatial distribution of resources and infrastructure when working toward safer communities.

## Introduction

Crime is a complex social issue with significant impacts on individuals and communities. Understanding the factors that contribute to crime is crucial for developing effective prevention and intervention strategies. This study focuses on analyzing crime rates in Boston, Massachusetts, using a multiple linear regression approach. The Boston Housing dataset provides a rich source of information on various neighborhood characteristics, including socioeconomic indicators, environmental factors, and accessibility to resources. By exploring the relationships between these variables and crime rates, we aim to identify key predictors and gain insights into the spatial patterns of crime in the city.

The analysis focuses on identifying significant associations between predictor variables and per capita crime rates. We investigate the roles of factors such as highway accessibility, socioeconomic status, proximity to employment centers, and racial demographics in influencing crime patterns. The findings of this study have implications for urban planning and development, community-based interventions, and law enforcement strategies. By understanding the factors that contribute to higher or lower crime rates, we can inform policies and initiatives aimed at creating safer and more equitable communities for all residents.




## Methodology
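
The CRIM models fitted below are ordinary least squares (OLS) regressions of the per capita crime rate on a set of neighborhood predictors. As a sketch in generic notation (the symbols $\beta_j$, $x_{ij}$, and $\varepsilon_i$ are notational assumptions and do not appear in the notebook itself), a model with $p$ predictors and $n$ observations can be written as:

$$
\mathrm{CRIM}_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \dots, n
$$

OLS chooses the coefficients that minimize the sum of squared residuals:

$$
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( \mathrm{CRIM}_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2
$$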



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
!pip install gradio transformers

In [None]:
x = pd.read_csv('data2.csv')
print(x)

        CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  \
0    0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296   
1    0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242   
2    0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242   
3    0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222   
4    0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222   
..       ...   ...    ...   ...    ...    ...   ...     ...  ...  ...   
506  0.98765   0.0  12.50     0  0.561  6.980  89.0  2.0980    3  320   
507  0.23456   0.0  12.50     0  0.561  6.980  76.0  2.6540    3  320   
508  0.44433   0.0  12.50     0  0.561  6.123  98.0  2.9870    3  320   
509  0.77763   0.0  12.70     0  0.561  6.222  34.0  2.5430    3  329   
510  0.65432   0.0  12.80     0  0.561  6.760  67.0  2.9870    3  345   

     PTRATIO       B  LSTAT  MEDV  
0       15.3  396.90   4.98  24.0  
1       17.8  396.90   9.14  21.6  
2       17.8  3

## Method 2

In [None]:
import pandas as pd
import statsmodels.formula.api as smf

In [None]:
data = pd.read_csv('data2.csv')

In [None]:
formula = 'CRIM ~ LSTAT + MEDV + TAX + PTRATIO + INDUS + AGE + DIS + RAD + ZN'

In [None]:
model = smf.ols(formula, data=data).fit()

In [None]:
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                   CRIM   R-squared:                       0.443
Model:                            OLS   Adj. R-squared:                  0.433
Method:                 Least Squares   F-statistic:                     44.27
Date:                Thu, 16 Jan 2025   Prob (F-statistic):           2.88e-58
Time:                        21:47:47   Log-Likelihood:                -1672.5
No. Observations:                 511   AIC:                             3365.
Df Residuals:                     501   BIC:                             3407.
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      8.7700      4.037      2.172      0.0
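
The header statistics in the summary above are linked by standard formulas, which gives a quick way to sanity-check a regression table. The sketch below recomputes the adjusted R-squared and the overall F-statistic from the reported R-squared (0.443), the number of observations (511), and the number of predictors (9). The helper names are hypothetical and are not part of statsmodels.

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared for a model that includes an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def overall_f_statistic(r2: float, n_obs: int, n_predictors: int) -> float:
    """F-statistic for the null hypothesis that all slope coefficients are zero."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    return (r2 / df_model) / ((1.0 - r2) / df_resid)


# Values read from the summary above (hypothetical helper names, real reported numbers).
adj_r2 = adjusted_r_squared(0.443, 511, 9)
f_stat = overall_f_statistic(0.443, 511, 9)

print(f"Adjusted R-squared: {adj_r2:.3f}")
print(f"F-statistic: {f_stat:.2f}")
```

Because the reported R-squared is itself rounded to three decimals, small differences in the last digit are to be expected when the same check is applied to other summaries.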

In [None]:
formula = 'CRIM ~ LSTAT + RAD + ZN'

In [None]:
model = smf.ols(formula, data=data).fit()

In [None]:
print(model.summary())


In [None]:
formula = 'CRIM ~ LSTAT + MEDV + DIS + RAD + ZN'

In [None]:
model = smf.ols(formula, data=data).fit()

In [None]:
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                   CRIM   R-squared:                       0.436
Model:                            OLS   Adj. R-squared:                  0.431
Method:                 Least Squares   F-statistic:                     78.19
Date:                Thu, 16 Jan 2025   Prob (F-statistic):           1.19e-60
Time:                        21:50:20   Log-Likelihood:                -1675.5
No. Observations:                 511   AIC:                             3363.
Df Residuals:                     505   BIC:                             3388.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.3178      1.800      1.288      0.1
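
One way to compare the nine-predictor and five-predictor CRIM models is through the information criteria printed in their summaries. The sketch below recomputes AIC and BIC from the reported log-likelihoods, assuming the usual definitions in which the parameter count is the number of predictors plus one for the intercept. The function and dictionary names are hypothetical.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


n_obs = 511  # observations reported in both CRIM summaries

# Log-likelihoods from the two summaries; n_params counts predictors plus the intercept.
models = {
    "9 predictors": {"log_likelihood": -1672.5, "n_params": 10},
    "5 predictors": {"log_likelihood": -1675.5, "n_params": 6},
}

scores = {
    name: (
        aic(m["log_likelihood"], m["n_params"]),
        bic(m["log_likelihood"], m["n_params"], n_obs),
    )
    for name, m in models.items()
}

for name, (model_aic, model_bic) in scores.items():
    print(f"{name}: AIC = {model_aic:.1f}, BIC = {model_bic:.1f}")
```

Lower values indicate a better trade-off between fit and complexity, so on both criteria the smaller model is preferred here even though its R-squared is slightly lower.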

In [None]:
# Assuming your dataset has the following columns:
# 'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'
formula = 'CHAS ~ ZN + INDUS + DIS + CRIM + RM + AGE + NOX + RAD + TAX + PTRATIO + B + LSTAT'


In [None]:
model = smf.ols(formula, data=data).fit()

In [None]:
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                   CHAS   R-squared:                       0.068
Model:                            OLS   Adj. R-squared:                  0.046
Method:                 Least Squares   F-statistic:                     3.011
Date:                Thu, 16 Jan 2025   Prob (F-statistic):           0.000432
Time:                        21:50:49   Log-Likelihood:                -6.1439
No. Observations:                 506   AIC:                             38.29
Df Residuals:                     493   BIC:                             93.23
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.1059      0.255      0.415      0.6
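
The summary above reports 506 observations, whereas the CRIM models report 511. One plausible explanation, which the notebook does not confirm, is that some of the additional columns used in this model contain missing values: the statsmodels formula interface drops rows with a missing value in any model variable by default. The sketch below shows how such gaps could be counted with pandas. The miniature CSV is a hypothetical stand-in for `data2.csv`, and its values are illustrative only.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for data2.csv; the values are illustrative only.
csv_text = """CRIM,CHAS,RM,LSTAT
0.1,0,6.5,5.0
0.2,,6.4,9.1
0.3,1,,4.0
0.4,0,7.0,5.3
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Count missing values in each column.
missing_per_column = sample.isna().sum()

# Count the rows that remain once every model variable must be present.
model_columns = ["CRIM", "CHAS", "RM", "LSTAT"]
complete_rows = len(sample.dropna(subset=model_columns))

print(missing_per_column)
print(f"Rows usable in a model with all four columns: {complete_rows} of {len(sample)}")
```

Running the same two checks on the real dataset, with `model_columns` set to the variables in the formula, would show whether missing values account for the difference in sample size.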

In [None]:
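from transformers import pipeline
import gradio as gr

# Define the question-answering pipeline (an extractive DistilBERT model fine-tuned on SQuAD)
qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")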

def generate_response(user_input):
  # Phrase the selected finding as a question for the QA model
  question = f"Based on the regression analysis of crime rates in Boston, {user_input}"

  # Extract the answer span from the summary of findings passed as context
  response = qa_pipeline(question=question, context="""
Overall, the analysis indicates that highway accessibility might be a factor to consider when examining crime patterns in urban environments. Overall, the analysis suggests that socioeconomic disadvantage might be a significant factor to consider when examining crime patterns in urban environments. Overall, the analysis suggests that distance to employment centers might be a factor to consider when examining crime patterns in urban environments. Areas farther from employment centers tend to have lower crime rates. In conclusion, while the analysis might reveal associations between racial demographics and crime rates, these associations require careful and nuanced interpretation. It's crucial to acknowledge the historical and systemic factors that drive racial disparities in crime and avoid drawing simplistic conclusions about individual behavior or inherent racial traits. The connection between zoning for large lots and distance to employment centers can contribute to spatial inequality, where wealthier households tend to live in larger homes further from job centers, while lower-income households might have limited access to affordable housing near employment opportunities.

  """)

  return response['answer']

# Define input and output components
# (named finding_input to avoid shadowing Python's built-in input())
finding_input = gr.Radio(
    choices=[
        '1. The impact of highway accessibility on crime rates.',
        '2. The relationship between socioeconomic disadvantage and crime.',
        '3. The effect of distance to employment centers on crime.',
        '4. The association between racial demographics and crime (requires careful interpretation).',
        '5. The connection between zoning for large lots and distance to employment centers.',
    ],
    label='Select a finding to learn more about:',
)

output = gr.Textbox()

# Create the Gradio interface
iface = gr.Interface(
    fn=generate_response,
    inputs=[finding_input],
    outputs=output,
    title="Exploring Crime Rates in Boston",
    description="Learn about the factors associated with crime rates using an interactive question-answering model.",
)

# Launch the app
iface.launch()



