# The Best Neighborhood in Pittsburgh: Halloween Trick-or-Treating
## This neighborhood... 
* has the best sense of community
* great for families and children
* will provide the highest volume/best quality candy
* is very walkable

### Submetric 1: Homeownership

Find the neighborhood with higher volume of property owners.  
1. Owning property is characteristic of settling down  
2. Majority of homeowners are 33+ (typical age of parents)  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a. important to eliminate neighborhoods with large amounts of college students
  
GOAL: Find the neighborhood with the best Halloween candy selection.

#### Import libraries and read in data

In [5]:
import pandas as pd
import numpy as np
homeownership = pd.read_csv("Homeownership.csv")
neighborhood = pd.read_csv("neighborhood.csv")

#### Check data types of dataframes

In [6]:
print("homeownership dftypes BEFORE\n",homeownership.dtypes)
print("\nneighborhood dtype BEFORE :", neighborhood['tractce10'].dtype)

homeownership dftypes BEFORE
 Census Tract       float64
TotalPopulation     object
OwnedMortgage       object
OwnedFree           object
RenterOccupied      object
dtype: object

neighborhood dtype BEFORE : object


#### -> Format homeownership data to be numeric

In [7]:
# fix types to be numeric
homeownership[['TotalPopulation','OwnedMortgage','OwnedFree','RenterOccupied']]\
= homeownership.apply(lambda x: x[['TotalPopulation','OwnedMortgage','OwnedFree','RenterOccupied']].str.replace(',','').astype(float), axis=1)

#### -> Format neighborhood data to be numeric

In [8]:
# utilize Pandas Series to convert all elements into one datatype
census_tract = pd.Series(neighborhood.iloc[:,4]) # index 4 is tractce10 column

for i in range(len(census_tract)):
    string = census_tract[i]
    # if not empty string
    if (string != ''):
        firstChar = string[0]
        try:
            float(firstChar)
            # if the firstChar is a number that starts with 0 as a placeholder (e.g., 024500)
            if (firstChar == '0' and len(string) > 1):
                # remove the first character
                string = string[1:]
            # remove extra zeros at the end
            census_tract[i] = string[0:len(string)-2]
            
        # first char cannot be converted to a float
        except ValueError:
            census_tract[i] = 0
    
    # if empty string
    else:
        census_tract[i] = 0
# end of for loop

# create dataframe and cast elements to float
census_tracts_mapping = pd.DataFrame(census_tract).astype(float, errors = 'raise')

#### Check data types (again)

In [9]:
print("homeownership dftypes AFTER\n",homeownership.dtypes)
print("\ndtype neighborhood dtype AFTER:", census_tracts_mapping['tractce10'].dtypes)

homeownership dftypes AFTER
 Census Tract       float64
TotalPopulation    float64
OwnedMortgage      float64
OwnedFree          float64
RenterOccupied     float64
dtype: object

dtype neighborhood dtype AFTER: float64


#### Find total number of homeowners (payed mortgage + making payments at time of census)

In [10]:
# sum OwnedMortgage and OwnedFree columns
homeowners = homeownership['OwnedMortgage'] + homeownership['OwnedFree']

#### Calculate ratio of homeowners to renters in each census tract

In [14]:
# find ratio of homeowners to renters
ratioSettled2Renting = homeowners/homeownership['RenterOccupied']

#### Create a new dataframe

In [16]:
# create new dataframe
ratio = pd.DataFrame(ratioSettled2Renting, columns=["Settled/renting"])
census_tracts = pd.DataFrame(homeownership['Census Tract'])
population = pd.DataFrame(homeownership['TotalPopulation'])
my_df = census_tracts.join(ratio).join(population)

# SORT & FILTER
my_dfsorted = my_df.sort_values(by=['Settled/renting'], ascending=False)
my_dfFiltered = my_dfsorted[my_dfsorted['TotalPopulation'] >= 500]

my_dfFiltered.head(10)

Unnamed: 0,Census Tract,Settled/renting,TotalPopulation
152,4268.0,43.741667,5369.0
244,4753.03,33.151261,4064.0
384,5641.0,32.923077,882.0
122,4100.0,30.648649,1171.0
237,4742.01,28.545455,2600.0
239,4742.03,27.30625,4529.0
125,4120.02,26.331461,4865.0
318,5190.0,23.680328,3011.0
149,4263.0,22.647059,6030.0
130,4134.0,21.97861,4297.0


#### Map census tract to neighborhood via neighborhood.csv
##### Filter out Allegheny census tracts that are not City of Pittsburgh census tracts.

In [20]:
# list of 90 census tracts mapped to city of Pittsburgh neighborhood
city_of_pgh_cetracts = census_tracts_mapping['tractce10']
# list of census tracts included in the homeowners dataset for allegheny county
allegheny_cetracts = my_dfFiltered['Census Tract']
# create a dictionary of city of pgh census tract -> city of pgh neighborhood
dict = {'census tract': city_of_pgh_cetracts, 'pgh neighborhood': neighborhood['hood']}
dict = pd.DataFrame(dict)

best_tracts_pgh = [0]*len(allegheny_cetracts)
best_nbhds= []
best_match = 6000
limit = 5 # ensure we find the closest census tracts to represent the neighborhood 
cnt = 0
# loop through allegheny census tracts
for i in allegheny_cetracts:
    best_match = 6000
    # loop through all 90 pgh cetracts
    for j in city_of_pgh_cetracts:
        best_tracts_pgh[cnt] = 0
        diff = i - j
        # if diff is 0 then we found the neighborhood
        if diff == 0:
            # add i to best_tracts_pgh
            best_tracts_pgh[cnt] = j
            break 
        elif abs(diff) < best_match and abs(diff) < limit:
            best_match = diff
            best_tracts_pgh[cnt] = j
    cnt = cnt + 1

#### Retrieve the top ten neighborhoods

In [23]:
for w in best_tracts_pgh:
    index = 0
    if (w!=0):
        for p in dict['census tract']:
            if (w==p):
                best_nbhds.append(dict['pgh neighborhood'][index])
                break
            index = index + 1
                
best_nbhds_homeownership = pd.DataFrame(best_nbhds[0:10])
best_nbhds_homeownership.columns = ['Neighborhood']
best_nbhds_homeownership     

Unnamed: 0,Neighborhood
0,New Homestead
1,Stanton Heights
2,Lincoln Place
3,Swisshelm Park
4,Overbrook
5,Summer Hill
6,Regent Square
7,Brookline
8,Squirrel Hill North
9,Brighton Heights


### Submetric 2:

description

### Submetric 3: 

description

### Combining submetrics

method

### Conclusion

response 1 (name):

response 2 (name):

response 3 (name):