**Major Crimes in Pittsburgh Neighbourhoods**  
Dataset: "Arrests for Major Crimes, 1972" (WPRDC)  
File: 8ce92a4b-fa62-45c3-8cee-cc58fefede75.csv  

In [None]:
# Imports (installed with !pip install <name>; drop the ! if you're running pip in a terminal)
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import seaborn as sns



In [None]:
#Loading the dataset
crime = pd.read_csv("8ce92a4b-fa62-45c3-8cee-cc58fefede75.csv")
crime.head()


In [None]:
# Handle gaps in the dataset

# Fill blank cells with zero (a blank is treated as no recorded arrests)
crime = crime.fillna(0)

# Assign a severity weight to each crime type (a subjective choice)

crime['weighted_crime'] = (
    crime['number_arrests_murder'] * 5 +
    crime['number_arrests_rape'] * 4 +
    crime['number_arrests_robbery'] * 3 +
    crime['number_arrests_assault'] * 2 +
    crime['number_arrests_burglary'] * 1.5 +
    crime['number_arrests_larceny'] * 1
)

# Sorts by the highest to lowest weighted crime score (that I assigned above)

crime_sorted = crime[['neighborhood', 'weighted_crime']].sort_values(
    by='weighted_crime', ascending=False
).reset_index(drop=True)

# Display the neighbourhood count and the ranked table
print(f"Total Neighborhoods: {crime.shape[0]}")
print(crime_sorted.to_string(index=False))

print("\nNote: Higher weighted crime scores indicate that the neighborhood has more frequent severe crimes")

In [None]:
G = nx.Graph()

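# Nodes: one per neighbourhood, carrying its weighted crime score
for _, row in crime.iterrows():
    G.add_node(row['neighborhood'], weight=row['weighted_crime'])

# Edges: link each pair of neighbourhoods whose scores are similar enough
# (similarity > 0.8 means the scores differ by less than 20% of the highest score)
max_score = crime['weighted_crime'].max()  # computed once, not on every iteration

for i, row_i in crime.iterrows():
    for j, row_j in crime.iterrows():
        if i < j:  # visit each pair only once
            similarity = 1 - abs(row_i['weighted_crime'] - row_j['weighted_crime']) / max_score
            if similarity > 0.8:
                G.add_edge(row_i['neighborhood'], row_j['neighborhood'], weight=similarity)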

#Graph visuals
plt.figure(figsize=(16,12))
pos = nx.spring_layout(G, seed=42, k=0.7)

# Node sizes and colours both scale with the weighted crime score
node_weights = [crime.loc[crime['neighborhood'] == node, 'weighted_crime'].values[0] for node in G.nodes]
node_sizes = [w / 2 for w in node_weights]
node_colors = node_weights

#Network itself
nodes = nx.draw_networkx_nodes(
    G, pos, node_size=node_sizes, node_color=node_colors, cmap=plt.cm.Reds, alpha=0.9
)
edges = nx.draw_networkx_edges(G, pos, alpha=0.4)
labels = nx.draw_networkx_labels(G, pos, font_size=7, font_color='blue')

#Titling and Colour Grades

plt.colorbar(nodes, label="Weighted Crime Severity (see previous table)")
plt.title("Pittsburgh Neighborhood Major Crime Similarity Graph", fontsize=14)
plt.axis('off')
plt.show()


**Explanation of Tables and Graphs**

This dataset, titled “Arrests for Major Crimes, 1972,” records the number of arrests for serious crimes across 70 Pittsburgh neighbourhoods. Each of the six crime types (murder, rape, robbery, assault, burglary, and larceny) was assigned a severity weight so that I could calculate a single weighted crime score for each neighbourhood. The higher the score, the worse off the neighbourhood is in terms of crime.
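
As a quick worked example of the weighting, the sketch below scores one made-up neighbourhood. The arrest counts are hypothetical and only there to show the arithmetic; the weights are the ones assigned in the code cell above.

```python
# Severity weights, copied from the weighted_crime cell above
WEIGHTS = {
    "murder": 5,
    "rape": 4,
    "robbery": 3,
    "assault": 2,
    "burglary": 1.5,
    "larceny": 1,
}

def weighted_score(arrests):
    """Sum of arrests per crime type multiplied by that crime's weight."""
    return sum(WEIGHTS[crime_type] * count for crime_type, count in arrests.items())

# Hypothetical arrest counts, not taken from the dataset
example_arrests = {"murder": 1, "rape": 0, "robbery": 4, "assault": 6, "burglary": 10, "larceny": 20}

print(weighted_score(example_arrests))  # 5 + 0 + 12 + 12 + 15 + 20 = 64.0
```

Note how the 20 larceny arrests contribute as much to the score as four murders would, which is why the choice of weights matters.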

For the first table, I simply sorted the neighbourhoods using this weighted system. This makes the data easier to read for someone unfamiliar with the neighbourhoods, since they do not need any prior knowledge of an area’s history or context, only the table itself.

This leads into the graph, which visualises the neighbourhoods as nodes within a crime similarity network, similar to the networking project. Neighbourhoods with similar weighted crime scores are connected and positioned close together, forming clusters that reflect how similar their crime severity levels are. It is important to note that the layout is not geographic and is purely based on similarity, which confused me at first even as the one who made it. Essentially, nodes that are positioned closer together have more similar weighted scores.
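
To make the connection rule concrete, here is a minimal sketch of the similarity calculation on three made-up scores. The neighbourhood names and scores are hypothetical; the formula and the 0.8 threshold are the ones used in the graph cell above.

```python
from itertools import combinations

# Hypothetical weighted crime scores, not taken from the dataset
scores = {"A": 100.0, "B": 90.0, "C": 20.0}
THRESHOLD = 0.8  # same cut-off as the graph cell

def similarity(score_a, score_b, max_score):
    """1 means identical scores; 0 means as far apart as the largest score."""
    return 1 - abs(score_a - score_b) / max_score

max_score = max(scores.values())
edges = []
for a, b in combinations(scores, 2):
    sim = similarity(scores[a], scores[b], max_score)
    if sim > THRESHOLD:
        edges.append((a, b))

print(edges)  # only A and B are similar enough to be linked
```

Because the difference is divided by the largest score in the data, a single very high-scoring neighbourhood shrinks every other difference, so more of the remaining neighbourhoods end up linked.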

I made this graph to address the limitations of looking only at the numbers in the table above. It can be difficult to grasp what those figures mean in reality. When you simply see a large number, you might assume the area is overwhelmed by crime, when in truth its score may not be much higher than those of neighbourhoods ranked below it. Additionally, this method allows other information to be overlaid: you can look up the population of an area, or any other metric, and see whether it lines up with that of similar neighbourhoods.

I also think this type of graph suits the data better than the alternatives. I originally considered something closer to a bar graph, but a bar graph has the same limitation as the table: it shows raw magnitudes without showing which neighbourhoods are alike. That is why I ultimately chose the network.



**Conclusion**  

Purely in terms of lowest crime rate, the best neighbourhood in this dataset is East Carnegie. It had the lowest weighted score and was the most connected to other neighbourhoods with consistently low crime rates. Additionally, the CSV file shows that it also has the fewest arrests for the lower-weighted crimes.
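
One way to check a claim like this is to look up the lowest score and count each neighbourhood's connections. The sketch below does that on made-up data; the names, scores, and edges are hypothetical. In the notebook itself, the equivalent numbers would come from `crime_sorted` and `G.degree`.

```python
from collections import Counter

# Hypothetical scores and similarity edges, not taken from the dataset
scores = {"A": 12.0, "B": 15.0, "C": 18.0, "D": 240.0}
edges = [("A", "B"), ("A", "C")]

# Neighbourhood with the lowest weighted crime score
lowest = min(scores, key=scores.get)

# Degree = number of edges touching each neighbourhood
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Neighbourhood with the most connections
most_connected = degree.most_common(1)[0][0]

print(lowest, most_connected, degree[lowest])
```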