# Introducing my Submetric #

In terms of transportation, traffic data can be an important indicator of how easy or difficult it is to get around. The fewer cars and other vehicles there are to block your way, the less likely you are to have an accident, and the easier it becomes to travel in that neighborhood.

As for the dataset itself, Pittsburgh used a total of 420 trackers to count cars and/or bicycles. However, not all of the traffic counters actually recorded car data, so those won't be included in the analysis. To determine which neighborhood is easiest to travel in based on traffic amounts, I'll use the column that holds each tracker's neighborhood name. Keep in mind that multiple trackers may have been placed in the same neighborhood. With that out of the way, let's get into analyzing the data.
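To show what "multiple trackers in the same neighborhood" looks like in practice, here is a minimal sketch using Python's standard `collections.Counter`. The neighborhood names are copied from the first ten counters that have car data (listed in the preview further down). This is a small sample, not the full dataset, and the variable names are my own.

```python
from collections import Counter

# Neighborhood of each of the first ten counters that reported car traffic
# (a small sample copied from the data, not the full dataset)
sample_neighborhoods = [
    "Polish Hill", "Squirrel Hill South", "Central Northside", "Polish Hill",
    "Squirrel Hill South", "Bluff", "Crafton Heights", "Shadyside",
    "Highland Park", "North Shore",
]

# Count how many counters share each neighborhood, keeping only the shared ones
counters_per_neighborhood = Counter(sample_neighborhoods)
shared = {name: n for name, n in counters_per_neighborhood.items() if n > 1}
print(shared)  # {'Polish Hill': 2, 'Squirrel Hill South': 2}
```

Even in this small sample, two neighborhoods already have two counters each, so comparing single counters alone could be misleading.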

### Data Analysis

```python
import pandas as pd
print("Here are the first few rows, to give an example of what the table shows.")

# Load the tab-separated traffic counter data and preview the first five rows
traffic = pd.read_csv("traffic.tsv", sep="\t")
traffic.head()
```

Here are the first few rows, to give an example of what the table shows.


| Unnamed: 0 | _id | id | device_id | record_oid | count_start_date | count_end_date | average_daily_car_traffic | average_daily_bike_traffic | counter_number | counter_type | ... | longitude | latitude | neighborhood | council_district | ward | tract | public_works_division | pli_division | police_zone | fire_zone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1011743669 | 85 | 1445865000.0 | 2019-04-18 | 2019-04-26 | 4949.0 |  | 6.0 | StatTrak | ... | -79.967772 | 40.455733 | Polish Hill | 7.0 | 6.0 | 42003060500 | 6.0 | 6.0 | 2.0 | 2-6 |
| 1 | 2 | 1026101993 | 140 | 1121444000.0 | 2019-01-24 |  |  |  |  | Intersection Study | ... | -79.952249 | 40.466157 | Central Lawrenceville | 7.0 | 9.0 | 42003090200 | 2.0 | 9.0 | 2.0 | 3-6 |
| 2 | 3 | 1032382575 | 11 | 1539893000.0 | 2018-08-28 | 2018-09-04 |  |  |  |  | ... | -80.076469 | 40.460717 | Windgap | 2.0 | 28.0 | 42003563000 | 5.0 | 28.0 | 6.0 | 1-16 |
| 3 | 4 | 103627606 | 9 | 734195100.0 | 2018-07-17 | 2018-08-01 | 2741.0 |  |  | StatTrak | ... | -79.914335 | 40.437379 | Squirrel Hill South | 5.0 | 14.0 | 42003140800 | 3.0 | 14.0 | 4.0 | 2-18 |
| 4 | 5 | 1039546167 | 144 |  |  |  |  |  |  |  | ... | -80.019211 | 40.490794 | Perry North | 1.0 | 26.0 | 42003260200 | 1.0 | 26.0 | 1.0 | 1-15 |


In the 'average_daily_car_traffic' column, only the first and fourth counters have any data. I'll work with the data that is available by keeping only the counters that recorded car traffic.
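Blank cells in `average_daily_car_traffic` are read in as NaN, and any comparison involving NaN evaluates to false. That is why a mask like `>= 0.1` quietly drops the empty rows. Here is a minimal plain-Python sketch of that behavior, using the car-traffic values from the five rows above:

```python
import math

# average_daily_car_traffic for the five preview rows; blank cells become NaN when read in
car_traffic = [4949.0, math.nan, math.nan, 2741.0, math.nan]

# NaN >= 0.1 evaluates to False, so the comparison doubles as a "has data" filter
has_data = [value >= 0.1 for value in car_traffic]
kept = [value for value, keep in zip(car_traffic, has_data) if keep]
print(has_data)  # [True, False, False, True, False]
print(kept)      # [4949.0, 2741.0]
```

In pandas, `traffic['average_daily_car_traffic'].notna()` expresses the same intent more directly. The one difference is that it would keep a reading of exactly 0, which `>= 0.1` drops.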

```python
print("Here are the first ten rows with data in the average_daily_car_traffic column, via a query mask.")
# Keep only counters with a recorded car count; blank (NaN) values fail the comparison
query_traffic = traffic['average_daily_car_traffic'] >= 0.1
traffData = traffic[query_traffic]
traffData.head(10)
```

Here are the first ten rows with data in the average_daily_car_traffic column, via a query mask.


| Unnamed: 0 | _id | id | device_id | record_oid | count_start_date | count_end_date | average_daily_car_traffic | average_daily_bike_traffic | counter_number | counter_type | ... | longitude | latitude | neighborhood | council_district | ward | tract | public_works_division | pli_division | police_zone | fire_zone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1011743669 | 85 | 1445865000.0 | 2019-04-18 | 2019-04-26 | 4949.0 |  | 6.0 | StatTrak | ... | -79.967772 | 40.455733 | Polish Hill | 7.0 | 6.0 | 42003060500 | 6.0 | 6.0 | 2.0 | 2-6 |
| 3 | 4 | 103627606 | 9 | 734195100.0 | 2018-07-17 | 2018-08-01 | 2741.0 |  |  | StatTrak | ... | -79.914335 | 40.437379 | Squirrel Hill South | 5.0 | 14.0 | 42003140800 | 3.0 | 14.0 | 4.0 | 2-18 |
| 5 | 6 | 1041392556 | 76 | 571347200.0 | 2019-03-22 | 2019-03-29 | 1046.0 |  | 2.0 | StatTrak | ... | -80.014234 | 40.458106 | Central Northside | 6.0 | 25.0 | 42003250300 | 1.0 | 25.0 | 1.0 | 1-21 |
| 7 | 8 | 1053645660 | 87 | 804208900.0 | 2019-04-18 | 2019-04-26 | 3015.0 |  | 4.0 | StatTrak | ... | -79.964592 | 40.457119 | Polish Hill | 7.0 | 6.0 | 42003060500 | 6.0 | 6.0 | 2.0 | 2-6 |
| 10 | 11 | 1102260248 | 8 | 1088318000.0 | 2018-07-16 | 2018-08-01 | 5592.0 |  |  | StatTrak | ... | -79.911925 | 40.43562 | Squirrel Hill South | 5.0 | 14.0 | 42003140800 | 3.0 | 14.0 | 4.0 | 2-21 |
| 13 | 14 | 1130122192 | 65 | 1341476000.0 | 2019-02-05 | 2019-02-14 | 1455.0 |  |  | StatTrak | ... | -79.979554 | 40.436482 | Bluff | 6.0 | 1.0 | 42003010300 | 3.0 | 1.0 | 2.0 | 2-1 |
| 14 | 15 | 115502120 | 176 | 1215643000.0 | 2019-08-09 | 2019-08-19 | 11500.0 |  | 5.0 | StatTrak | ... | -80.052403 | 40.451926 | Crafton Heights | 2.0 | 20.0 | 42003281400 | 5.0 | 20.0 | 6.0 | 1-16 |
| 15 | 16 | 1155507145 | 72 | 1207664000.0 | 2019-02-14 | 2019-02-21 | 6793.0 |  | 1.0 | StatTrak | ... | -79.943745 | 40.452709 | Shadyside | 8.0 | 7.0 | 42003070900 | 2.0 | 7.0 | 4.0 | 3-22 |
| 17 | 18 | 1158858353 | 36 | 1131387000.0 | 2018-06-25 | 2018-07-12 | 7688.0 |  |  | StatTrak | ... | -79.923316 | 40.478161 | Highland Park | 7.0 | 11.0 | 42003110200 | 2.0 | 11.0 | 5.0 | 3-9 |
| 18 | 19 | 1159852619 | 118 | 1024536000.0 | 2019-07-09 | 2019-07-12 | 10350.0 |  | 12.0 | StatTrak | ... | -80.001937 | 40.447849 | North Shore | 1.0 | 22.0 | 42003563200 | 6.0 | 22.0 | 1.0 | 1-20 |


```python
print("These are the ten trackers with the lowest average_daily_car_traffic values, in increasing order.")
# Sort the counters from the lowest to the highest average daily car traffic
traffDataSorted = traffData.sort_values(by='average_daily_car_traffic')
traffDataSorted.head(10)
```

These are the ten trackers with the lowest average_daily_car_traffic values, in increasing order.


| Unnamed: 0 | _id | id | device_id | record_oid | count_start_date | count_end_date | average_daily_car_traffic | average_daily_bike_traffic | counter_number | counter_type | ... | longitude | latitude | neighborhood | council_district | ward | tract | public_works_division | pli_division | police_zone | fire_zone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 182 | 183 | 856876185 | 77 | 675961300.0 | 2019-03-22 | 2019-03-31 | 23.0 |  | 3 | StatTrak | ... | -80.012946 | 40.456612 | Central Northside | 1.0 | 22.0 | 42003220600 | 1.0 | 22.0 | 1.0 | 1-21 |
| 251 | 474 | 1271744444 | 269 | 903509200.0 | 2020-09-24 | 2020-10-02 | 58.0 |  | #6 | StatTrak | ... | -79.922844 | 40.468838 | East Liberty | 9.0 | 11.0 | 42003111300 | 2.0 | 11.0 | 5.0 | 3-8 |
| 307 | 727 | 938946316 | 326 | 224999000.0 | 2020-11-19 | 2020-12-02 | 63.0 |  | 4 | StatTrak | ... | -79.997312 | 40.464395 | Spring Hill-City View | 1.0 | 24.0 | 42003262000 | 1.0 | 24.0 | 1.0 | 1-24 |
| 247 | 470 | 950865722 | 265 | 1160608000.0 | 2020-09-16 | 2020-09-24 | 77.0 |  | #2 | StatTrak | ... | -79.948895 | 40.465446 | Bloomfield | 7.0 | 9.0 | 42003090300 | 2.0 | 9.0 | 5.0 | 3-6 |
| 258 | 481 | 550216999 | 276 | 860508900.0 | 2020-07-06 | 2020-07-15 | 79.0 |  |  | StatTrak | ... | -79.921974 | 40.470257 | Highland Park | 7.0 | 11.0 | 42003110200 | 2.0 | 11.0 | 5.0 | 3-9 |
| 244 | 467 | 799096979 | 262 | 1169829000.0 | 2020-09-04 | 2020-09-11 | 79.0 |  | #5 | StatTrak | ... | -79.946953 | 40.465129 | Bloomfield | 7.0 | 8.0 | 42003080900 | 2.0 | 8.0 | 5.0 | 3-6 |
| 321 | 794 | 179318306 | 340 | 1151539000.0 | 2021-01-08 | 2021-01-16 | 109.0 |  | 5 | StatTrak | ... | -80.026536 | 40.474191 | Marshall-Shadeland | 1.0 | 27.0 | 42003271500 | 1.0 | 27.0 | 1.0 | 1-14 |
| 310 | 730 | 614417587 | 329 | 1406278000.0 | 2020-12-03 | 2020-12-11 | 115.0 |  | 1 | StatTrak | ... | -79.981096 | 40.431002 | South Side Flats | 3.0 | 17.0 | 42003170200 | 3.0 | 17.0 | 3.0 | 4-24 |
| 318 | 763 | 1844854594 | 337 | 37034470.0 | 2020-12-15 | 2020-12-23 | 126.0 |  | 3 | StatTrak | ... | -79.915459 | 40.372514 | Lincoln Place | 5.0 | 31.0 | 42003310200 | 3.0 | 31.0 | 4.0 | 4-20 |
| 158 | 159 | 615223154 | 175 | 1247631000.0 | 2019-08-09 | 2019-08-19 | 131.0 |  | 4 | StatTrak | ... | -80.073656 | 40.456401 | Windgap | 2.0 | 28.0 | 42003563000 | 5.0 | 28.0 | 6.0 | 1-16 |
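Sorting the whole table just to look at the lowest few readings works fine at this size. For reference, Python's standard `heapq.nsmallest` is meant for picking the k smallest items directly, and pandas has a similar `DataFrame.nsmallest`. Here is a small sketch using the ten counters from the earlier preview, not the full dataset:

```python
import heapq

# (neighborhood, average_daily_car_traffic) for the first ten counters with car data
readings = [
    ("Polish Hill", 4949.0), ("Squirrel Hill South", 2741.0),
    ("Central Northside", 1046.0), ("Polish Hill", 3015.0),
    ("Squirrel Hill South", 5592.0), ("Bluff", 1455.0),
    ("Crafton Heights", 11500.0), ("Shadyside", 6793.0),
    ("Highland Park", 7688.0), ("North Shore", 10350.0),
]

# Pick the three lowest readings, ordered from smallest to largest
lowest_three = heapq.nsmallest(3, readings, key=lambda reading: reading[1])
print(lowest_three)
# [('Central Northside', 1046.0), ('Bluff', 1455.0), ('Squirrel Hill South', 2741.0)]
```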


At first glance, Central Northside, East Liberty, and Spring Hill-City View look like strong candidates for the easiest neighborhoods to get around in. However, recall that multiple trackers can exist in the same neighborhood. I'll add up the average_daily_car_traffic of every tracker in each neighborhood using a dictionary, then sort the totals.
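Per-neighborhood totals matter because a single low reading can hide a busier tracker elsewhere in the same neighborhood. Here is a minimal sketch with `collections.defaultdict`, using four real readings from the tables above: Central Northside's 23 and 1046, plus South Side Flats and Lincoln Place. Because this is only a subset, the totals are not the full-dataset totals.

```python
from collections import defaultdict

# A subset of (neighborhood, average_daily_car_traffic) readings from the tables above
readings = [
    ("Central Northside", 23.0),
    ("Central Northside", 1046.0),
    ("South Side Flats", 115.0),
    ("Lincoln Place", 126.0),
]

# Lowest single reading: Central Northside looks quietest on its own
quietest_single = min(readings, key=lambda reading: reading[1])[0]

# Total per neighborhood: Central Northside's second tracker moves it to last place
totals = defaultdict(float)
for neighborhood, cars in readings:
    totals[neighborhood] += cars
ranking = sorted(totals, key=totals.get)

print(quietest_single)  # Central Northside
print(ranking)          # ['South Side Flats', 'Lincoln Place', 'Central Northside']
```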

```python
# Add up the average daily car traffic of every tracker in each neighborhood.
# (The earlier "lowest" tracking was never used and compared partial running totals,
# so it has been dropped; the printed result is unchanged.)
car_count = dict()
for index, row in traffData.iterrows():
    neighborhood = row['neighborhood']
    cars = row['average_daily_car_traffic']
    car_count[neighborhood] = car_count.get(neighborhood, 0) + cars

# Sort neighborhoods from the least to the most total car traffic
sorted_cars = dict(sorted(car_count.items(), key=lambda item: item[1]))
print(sorted_cars)
```

{'South Side Flats': 115.0, 'Lincoln Place': 126.0, 'Lincoln-Lemington-Belmar': 195.0, 'Fineview': 419.0, 'Allegheny West': 477.0, 'New Homestead': 666.0, 'Overbrook': 777.0, 'Summer Hill': 937.0, 'Spring Garden': 1226.0, 'Beltzhoover': 1577.0, 'East Allegheny': 1924.0, 'Central Business District': 2305.0, 'Allegheny Center': 2386.0, 'St. Clair': 2436.0, 'Chartiers City': 2571.0, 'Hazelwood': 2579.0, 'West Oakland': 2585.0, 'Spring Hill-City View': 2586.0, 'Upper Hill': 2860.0, 'Windgap': 3062.0, 'Lower Lawrenceville': 3472.0, 'Elliott': 4028.0, nan: 4184.0, 'Beechview': 4255.0, 'Arlington': 4571.0, 'Friendship': 4887.0, 'Bluff': 5365.0, 'Duquesne Heights': 5831.0, 'Central Northside': 5945.0, 'Homewood North': 7027.0, 'Greenfield': 7158.0, 'Brighton Heights': 7547.0, 'Knoxville': 7646.0, 'Sheraden': 8532.0, 'Point Breeze North': 9928.0, 'North Shore': 10350.0, 'Central Oakland': 11306.0, 'Crafton Heights': 11500.0, 'Perry South': 12075.0, 'Central Lawrenceville': 13499.0, 'Carrick': 1

Checking the totals turned out to be worthwhile: the three neighborhoods I mentioned above are completely different from the three neighborhoods that actually have the lowest total average_daily_car_traffic. According to the dictionary, South Side Flats, Lincoln Place, and Lincoln-Lemington-Belmar have the fewest recorded cars per day. The nan entry collects trackers with no neighborhood listed, so it is not a real neighborhood.
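For comparison, pandas can produce the same kind of per-neighborhood totals in one chain with `groupby`. One difference is that `groupby` drops rows whose key is missing by default (`dropna=True` in recent pandas versions), so a nan entry like the one in the dictionary would not appear unless you pass `dropna=False`. The rows below are a small subset copied from the earlier preview, not the full dataset, and `sample` is a hypothetical name.

```python
import pandas as pd

# A small subset of counters copied from the preview table (not the full dataset)
sample = pd.DataFrame({
    "neighborhood": ["Polish Hill", "Squirrel Hill South", "Polish Hill",
                     "Squirrel Hill South", "Bluff"],
    "average_daily_car_traffic": [4949.0, 2741.0, 3015.0, 5592.0, 1455.0],
})

# Sum every counter in a neighborhood, then order from least to most traffic
totals = (
    sample.groupby("neighborhood")["average_daily_car_traffic"]
    .sum()
    .sort_values()
)
print(totals)
```

For this subset, Bluff comes first at 1455, followed by Polish Hill (4949 + 3015 = 7964) and Squirrel Hill South (2741 + 5592 = 8333).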

You might have noticed that there is also a column for bikes. I decided to use it only as a tie-breaker, in case two of the top neighborhoods ended up with the same number. I made this choice because car traffic affects more than just drivers. Pedestrians are also affected by car traffic, since they cross streets and would prefer to do so when there are fewer cars around. Bicycles can affect pedestrians too; however, there is a clear gap between the number of cars and the number of bicycles recorded by the trackers in the table.
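The tie-breaker described above can be expressed as a two-part sort key: order by total car traffic first, and fall back to bike traffic only when the car totals are equal. Here is a minimal sketch. The neighborhood names and numbers are hypothetical, chosen only to produce a tie, and do not come from the dataset.

```python
# Hypothetical per-neighborhood totals: (car traffic, bike traffic)
totals = {
    "Neighborhood A": (500.0, 40.0),
    "Neighborhood B": (500.0, 12.0),
    "Neighborhood C": (300.0, 90.0),
}

# Tuples compare element by element, so bike traffic only matters when car traffic ties
ranking = sorted(totals, key=lambda name: totals[name])
print(ranking)  # ['Neighborhood C', 'Neighborhood B', 'Neighborhood A']
```

If a neighborhood has no bike data, its NaN value would need to be handled before sorting, since every comparison with NaN is false.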

## Conclusion

It was a close call between a few neighborhoods, but South Side Flats came out on top as the neighborhood with the least traffic, at 115 cars per day. By this measure, that makes it the easiest neighborhood to get around in. I must disclose, however, that the dataset I chose is not up to date, because its data feed has been broken since 2021, so the data might be inaccurate. The instructions ask me to compare this result to my favorite neighborhood. The only neighborhood I recognize from the short time I've been at college is Shadyside, and I'm not surprised to see it as the second most traffic-heavy neighborhood. This is my best attempt at finding the best neighborhood for transportation.