Quinn's notebook

**Introduction**

Our project is to figure out which neighborhood in Pittsburgh is the best... for rats. My approach is to analyze the number of condemned or dead-end properties in each area, and then factor each property's inspection score into our rating. The reasoning is that if a neighborhood has a lot of condemned buildings that received bad inspection scores, then that area has a lot of unkempt buildings that are probably run down and trashy. Perfect for rats. Additionally, if there are lots of condemned buildings in general, there would be plenty of space for the rats to live in peace.

**The Metric**

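My metric is the number of condemned properties per neighborhood, weighted by each property's inspection score. In practice, that means ranking neighborhoods by their average inspection score, considering only neighborhoods with at least 75 condemned properties.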

A link to my dataset can be found here: https://data.wprdc.org/dataset/condemned-properties/resource/0a963f26-eb4b-4325-bbbc-3ddf6a871410

**The Best Neighborhood**

First, let's import pandas and load our data.

In [3]:
import pandas as pd
data = pd.read_csv('Condemned.tsv', delimiter='\t')

Now we will pull the zip code out of every row in the "address" column, so that we can count how many condemned or dead-end properties exist in each neighborhood, using the zip code as a stand-in for the neighborhood. The more of these properties a neighborhood has, the more places there are to live as a rat.
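
As a minimal sketch of what the extraction pattern does, the snippet below runs it on three made-up addresses (placeholders, not rows from the dataset). The pattern only matches five digits at the very end of the string, so an address with trailing whitespace or no zip code comes back as missing and would silently drop out of the per-zip counts. Stripping whitespace first is one possible guard; whether the real data needs it is an assumption worth checking.

```python
import pandas as pd

# Made-up addresses for illustration only; they are not from the dataset.
sample = pd.Series([
    "123 Example St, Pittsburgh, PA 15210",
    "456 Sample Ave, Pittsburgh, PA 15212 ",  # note the trailing space
    "789 No Zip Rd, Pittsburgh, PA",
])

# expand=False returns a Series instead of a one-column DataFrame.
zips = sample.str.extract(r'(\d{5})$', expand=False)

# Same pattern after stripping surrounding whitespace.
zips_stripped = sample.str.strip().str.extract(r'(\d{5})$', expand=False)

# Number of addresses that produced no zip code in each case.
missing_before = int(zips.isna().sum())
missing_after = int(zips_stripped.isna().sum())
print(missing_before, missing_after)
```

Running `data['Zip Code'].isna().sum()` on the real data after the extraction would show how many properties, if any, are left out of the counts.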

In [4]:
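# Capture the five digits at the end of each address as the zip code
data['Zip Code'] = data['address'].str.extract(r'(\d{5})$')
# Count properties per zip code, largest first
property_count = data.groupby('Zip Code').size().sort_values(ascending=False)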
print(property_count)

Zip Code
15210    537
15212    411
15208    291
15219    283
15206    250
15214    204
15207    199
15204    134
15220     83
15224     82
15221     80
15233     70
15203     65
15211     57
15201     56
15216     41
15213     41
15205     37
15226     36
15217     33
15227     17
15235     16
15222      7
15234      4
15232      3
15120      2
15218      1
15106      1
dtype: int64


Now that we've displayed how many condemned or dead-end properties there are in each zip code, here are the names of the top 5:

1. St. Clair - 537 condemned properties
2. Perry South - 411 condemned properties
3. North Point Breeze - 291 condemned properties
4. Central Business District - 283 condemned properties
5. Larimer - 250 condemned properties

We could go ahead and conclude that these are the top 5 neighborhoods for a rat looking to move into Pittsburgh, but just because there are many homes in these areas doesn't mean that they are high-value homes. A rat would want only the finest and most disgusting place to settle down. So, we will now take the inspection score of each property into account.

As listed on the data sheet, "A score of 0 indicates that the property passed inspection. Higher integer values indicate the severity of a failed inspection." Therefore, we will be adding up the inspection scores of all properties in each zip code, and then sorting them from highest total score to lowest. This is a significant statistic because if an area has a lot of closed buildings, but they are all passing inspections, then the buildings may not be as suitable for a rat looking for some disgusting corner to camp in. 

One thing to note is that not every property in the dataset received a score; those properties will simply be counted as 0.
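
To make the handling of missing scores concrete, here is a small sketch on toy values (the values are invented for illustration). It assumes missing scores can appear as the text 'none', as empty cells, or as other non-numeric text, and shows that all of them end up as 0.

```python
import pandas as pd

# Toy scores for illustration only; dtype=object mimics a mixed text column.
raw = pd.Series(["2", "none", None, "0", "abc"], dtype=object)

# errors="coerce" turns anything non-numeric into NaN.
numeric = pd.to_numeric(raw.replace("none", "0"), errors="coerce")

# NaN (empty or unparseable) then becomes 0, the same as a passed inspection.
clean = numeric.fillna(0)

# How many entries were not real scores to begin with.
unscored = int((raw == "none").sum() + numeric.isna().sum())
print(clean.tolist(), unscored)
```

One consequence of this choice is that an unscored property and a property that passed inspection are indistinguishable afterward, which pulls averages down in areas with many unscored properties.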


In [5]:
# Treat 'none', missing, and non-numeric scores as 0
data['latest_inspection_score'] = pd.to_numeric(data['latest_inspection_score'].replace('none', '0'), errors='coerce').fillna(0)

total_scores = data.groupby('Zip Code')['latest_inspection_score'].sum()
total_scores = total_scores.astype(int)

sorted_scores = total_scores.sort_values(ascending=False)

print(sorted_scores)

Zip Code
15210    866
15212    555
15219    437
15208    372
15206    303
15214    291
15207    224
15233    194
15204    131
15224    124
15201    104
15221     90
15220     85
15203     80
15211     48
15216     47
15213     44
15205     40
15235     26
15217     20
15226     17
15227     11
15222     10
15234      4
15120      2
15106      2
15232      1
15218      0
Name: latest_inspection_score, dtype: int64


We can now see that the ranking didn't change much. Let's run some code to check why that may be.

In [6]:
score_counts = data['latest_inspection_score'].value_counts().sort_index()
print(score_counts)

0.0     1567
1.0      991
2.0      937
3.0      194
4.0       42
6.0        2
7.0        4
8.0        3
9.0        3
10.0       4
11.0       4
12.0       9
13.0       2
14.0       3
15.0       6
16.0       1
17.0       1
18.0       3
19.0       1
20.0       1
21.0       2
22.0       1
23.0       2
24.0       1
25.0       2
26.0       1
27.0       2
28.0       2
29.0       4
31.0       2
32.0       3
33.0       2
34.0       2
36.0       5
37.0       2
38.0       2
39.0       1
40.0       3
42.0       1
43.0       1
44.0       3
45.0       1
46.0       1
49.0       2
50.0       1
Name: latest_inspection_score, dtype: int64


Now we can see why the ranking didn't change much. With so few outliers compared to the number of 0s, 1s, 2s, and 3s, it makes sense that the neighborhoods with significantly more properties would still have a much higher total inspection score.
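
One rough way to put a number on this is the correlation between property count and total score. The sketch below copies both values from the printed outputs above for the eleven zip codes with the most properties, so it is a check on those rows only rather than on the full dataset.

```python
import pandas as pd

# Values copied from the two printed outputs above (top eleven zip codes by count).
zip_codes = ["15210", "15212", "15208", "15219", "15206", "15214",
             "15207", "15204", "15220", "15224", "15221"]
counts = pd.Series([537, 411, 291, 283, 250, 204, 199, 134, 83, 82, 80],
                   index=zip_codes)
totals = pd.Series([866, 555, 372, 437, 303, 291, 224, 131, 85, 124, 90],
                   index=zip_codes)

# Pearson correlation; a value near 1 means the total mostly tracks the count.
correlation = counts.corr(totals)
print(round(correlation, 3))
```

A correlation close to 1 would support the point that the total score mostly tracks how many properties a zip code has.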

However, even if this data is not significant enough to make a conclusion yet, this is a vital piece of the puzzle needed to find our final metric.

To finish our data analysis, we need a ratio between the total inspection score and the number of properties that indicates which neighborhood would be best for a rat.

To do that, we will take the average score of the properties in each neighborhood, but there is one thing we need to take into account. If we are going to use the average, we need to make sure that neighborhoods with few properties are removed from our list. This is because we can't decide that a neighborhood is the best just because it has 5 really bad properties in the whole area. The rats need places to live, and for that reason I've decided that any neighborhood with fewer than 75 condemned properties is not eligible to be considered the best.
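
The filter-then-average step described above can be sketched on a few rows copied from the earlier outputs. The helper name `rank_by_average` and the alternative cutoff of 60 are assumptions made for illustration; the point is only that the cutoff is a judgment call and the eligible set depends on it.

```python
import pandas as pd

# Counts and totals copied from the printed outputs above for five zip codes.
summary = pd.DataFrame(
    {
        "Property Count": [537, 283, 82, 70, 65],
        "Total Score": [866, 437, 124, 194, 80],
    },
    index=pd.Index(["15210", "15219", "15224", "15233", "15203"], name="Zip Code"),
)

def rank_by_average(df, min_properties):
    """Hypothetical helper: drop small zip codes, then rank by average score."""
    eligible = df[df["Property Count"] >= min_properties].copy()
    eligible["Average Score"] = eligible["Total Score"] / eligible["Property Count"]
    return eligible.sort_values("Average Score", ascending=False)

ranked_75 = rank_by_average(summary, 75)  # the cutoff used in this notebook
ranked_60 = rank_by_average(summary, 60)  # an alternative cutoff, for comparison
print(ranked_75)
print(ranked_60)
```

With the cutoff at 75, zip codes such as 15233 (70 properties) are excluded; with a lower cutoff they would enter the ranking, so the final order should be read together with the chosen threshold.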

In [10]:
total_scores = data.groupby('Zip Code')['latest_inspection_score'].sum()
property_count = data.groupby('Zip Code').size().sort_values(ascending=False)

zip_analysis = pd.DataFrame({
    'Total Score': total_scores,
    'Property Count': property_count
})

min_properties = 75
zip_analysis = zip_analysis[zip_analysis['Property Count'] >= min_properties].copy()

zip_analysis['Average Score'] = zip_analysis['Total Score'] / zip_analysis['Property Count']

worst_zip = zip_analysis.sort_values(by='Average Score', ascending=False)
print(worst_zip)

          Total Score  Property Count  Average Score
Zip Code                                            
15210           866.0             537       1.612663
15219           437.0             283       1.544170
15224           124.0              82       1.512195
15214           291.0             204       1.426471
15212           555.0             411       1.350365
15208           372.0             291       1.278351
15206           303.0             250       1.212000
15207           224.0             199       1.125628
15221            90.0              80       1.125000
15220            85.0              83       1.024096
15204           131.0             134       0.977612


And here we have it. The 11 eligible neighborhoods, ranked from best to worst for a rat, are:

1. St. Clair 
2. Central Business District
3. Bloomfield
4. Perry North
5. Perry South
6. North Point Breeze
7. Larimer
8. Hazelwood
9. Wilkinsburg
10. Elliott
11. Sheraden
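
The names in this list can be attached to the results table with a simple lookup. The sketch below uses the zip-to-name pairing from this notebook and the average scores printed above; since zip codes and neighborhoods do not line up one to one, each name is best read as a representative label for its zip code rather than an exact boundary.

```python
import pandas as pd

# Zip-to-name pairing as used in this notebook (one label per zip code).
zip_to_name = {
    "15210": "St. Clair", "15219": "Central Business District",
    "15224": "Bloomfield", "15214": "Perry North", "15212": "Perry South",
    "15208": "North Point Breeze", "15206": "Larimer", "15207": "Hazelwood",
    "15221": "Wilkinsburg", "15220": "Elliott", "15204": "Sheraden",
}

# Average scores copied from the printed table above.
average_scores = pd.Series({
    "15210": 1.612663, "15219": 1.544170, "15224": 1.512195,
    "15214": 1.426471, "15212": 1.350365, "15208": 1.278351,
    "15206": 1.212000, "15207": 1.125628, "15221": 1.125000,
    "15220": 1.024096, "15204": 0.977612,
}, name="Average Score")

# Swap zip codes for names and keep the highest average first.
labeled = average_scores.rename(index=zip_to_name).sort_values(ascending=False)
print(labeled)
```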

I picked this as our metric over the total number of properties or the total score because a rat doesn't exactly have realtors telling them which available home is the best for them to live in. Therefore, under the assumption that a rat may choose a random condemned or dead-end property to live in, the most rat-friendly neighborhood would be the one with the highest average inspection score.

To conclude, the best neighborhood to live in, as a rat, is St. Clair (15210). It has the highest average inspection score, about 1.6, and it also has the most condemned or dead-end properties overall, with 537.