# Basic Concepts in Probability #

In this lab session we'll implement and compute basic concepts in probability. More specifically, we'll focus on the following three rules from probability theory:  
  
  
* **Sum rule:**  
$\Large p(X) = \sum_Y{p(X,Y)}$
  
  
  
* **Product rule (also known as the chain rule):**  
$\Large p(X,Y) = p(Y|X)p(X)$

The elements in the above equations are referred to as:  
* $\Large p(X,Y)=p(Y,X)$ - Joint probability
* $\Large p(Y|X)$ - Conditional probability  
If two random variables $X$ and $Y$ are independent of each other, then the conditional probability $p(X|Y)$ is the same as the marginal $p(X)$.  
* $\Large p(X)$ - Marginal or Prior probability  
Under the sum rule this probability is known as the marginal probability, since it is obtained by marginalizing (summing out) the other variable $Y$ in the joint probability $p(X,Y)$.    
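
The sum and product rules can be checked numerically on a small joint table. The sketch below is only an illustration: the joint distribution and the helper names `marginal_x` and `conditional_y_given_x` are made up for this example and are not part of the lab data.

```python
from fractions import Fraction as F

# Hypothetical joint distribution p(X, Y) with X in {0, 1} and Y in {"a", "b"}.
joint = {
    (0, "a"): F(1, 10), (0, "b"): F(3, 10),
    (1, "a"): F(2, 10), (1, "b"): F(4, 10),
}

def marginal_x(joint):
    """Sum rule: p(X) = sum over Y of p(X, Y)."""
    p_x = {}
    for (x, _y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
    return p_x

def conditional_y_given_x(joint):
    """Product rule rearranged: p(Y|X) = p(X, Y) / p(X)."""
    p_x = marginal_x(joint)
    return {(y, x): p / p_x[x] for (x, y), p in joint.items()}

p_x = marginal_x(joint)
p_y_given_x = conditional_y_given_x(joint)

# The product rule p(X, Y) = p(Y|X) p(X) recovers every entry of the joint table.
for (x, y), p in joint.items():
    assert p_y_given_x[(y, x)] * p_x[x] == p

print(p_x)
```

Exact fractions are used here so that the equalities hold without floating point tolerance.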

  
  
* **Bayes' Rule:**  
$\Large p(Y|X) = \frac{p(X,Y)}{p(X)} = \frac{p(X|Y)p(Y)}{p(X)}$  
  
  
Using the sum rule we could further expand the denominator in the Bayes' rule:  

$\Large p(Y|X) = \frac{p(X|Y)p(Y)}{p(X)} = \frac{p(X|Y)p(Y)}{\sum_Y{p(X,Y)}}$  

The denominator in Bayes' rule plays an important role: it ensures that the posterior probabilities across all values of $Y$ sum to one, and it is therefore often referred to as the normalizing constant.  
Bayes' rule plays an important role in machine learning. Each probability component in the rule has a specific name: 

* $\Large p(Y|X)$ - Posterior probability  

* $\Large p(X|Y)$ - Likelihood 

* $\Large p(Y)$ - Prior probability  

* $\Large p(X)=\sum_Y{p(X,Y)}$ - Marginal likelihood (also called the evidence)
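
To see the normalizing constant at work, here is a minimal sketch that computes a posterior directly from a joint table. The table and the function name `posterior_y_given` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint distribution p(X, Y), for illustration only.
joint = {
    (0, "a"): F(1, 10), (0, "b"): F(3, 10),
    (1, "a"): F(2, 10), (1, "b"): F(4, 10),
}

def posterior_y_given(x_obs, joint):
    """Return p(Y | X = x_obs) for every value of Y."""
    # Numerators: p(X = x_obs, Y) = p(X = x_obs | Y) p(Y).
    numerators = {y: p for (x, y), p in joint.items() if x == x_obs}
    # Normalizing constant: p(X = x_obs), obtained with the sum rule.
    evidence = sum(numerators.values())
    return {y: n / evidence for y, n in numerators.items()}

post = posterior_y_given(1, joint)
print(post)

# Dividing by the evidence makes the posterior sum to one.
assert sum(post.values()) == 1
```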

One fundamental use of this rule is in Bayesian model estimation, where we have two random variables: the model $M$ and the data sample $D$. We are interested in the most probable model that explains the data:  

$\Large \arg\max_M\ p(M|D) = \arg\max_M\ \frac{p(D|M)p(M)}{p(D)}$  

The model is selected based on how well it explains the data sample, $p(D|M)$, and on how good a model it is in general. The latter information is contained in the prior probability $p(M)$. The normalizing constant $p(D)$ doesn't play a role in finding the most suitable model: it is obtained by marginalizing over all possible models, so it has the same value for every $M$ and does not change which model attains the maximum. 
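
The following sketch illustrates why $p(D)$ can be ignored when searching for the most probable model. Everything in it is an assumption made for the example: two hypothetical coin models, their priors, and a data sample of 8 heads and 2 tails.

```python
# Hypothetical candidate models (coin biases) and prior beliefs p(M).
prior = {"fair": 0.8, "biased": 0.2}
heads_prob = {"fair": 0.5, "biased": 0.8}

# Hypothetical data sample D: 8 heads and 2 tails.
heads, tails = 8, 2

def likelihood(theta, heads, tails):
    """p(D|M) up to the binomial coefficient, which is the same for all models."""
    return theta ** heads * (1 - theta) ** tails

# Numerator of Bayes' rule for each model: p(D|M) p(M).
unnormalized = {m: likelihood(heads_prob[m], heads, tails) * prior[m] for m in prior}

# p(D): the same number for every model.
evidence = sum(unnormalized.values())
posterior = {m: u / evidence for m, u in unnormalized.items()}

map_from_numerator = max(unnormalized, key=unnormalized.get)
map_from_posterior = max(posterior, key=posterior.get)

# Dividing by a positive constant does not change the argmax.
assert map_from_numerator == map_from_posterior
print(map_from_posterior)
```

With these made-up numbers the data outweighs the prior preference for the fair coin.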


## Using the Above Rules ##

Let's start with a simple example. Let's assume that we have two boxes, a red one and a blue one. Let's also assume that both boxes are filled with balls with two colors: black and gold. More specifically, in the red box we have 7 gold and 5 black balls and in the blue box we have 3 gold and 6 black balls. 

We could also use this example to depict a situation where we have two cell phone stores from two different mobile operators (e.g. AT&T and Verizon) and both of these stores carry the same smartphone model (balls) that comes in two colors (gold and black). 
![Two Bins](prob_dist.jpg)

Given that the box is red, the probability of choosing a golden ball is:  
$p(C=g|B=r) = \frac{7}{12}=0.583$  
  
  
The probability of choosing a black ball is:  
$p(C=k|B=r) = \frac{5}{12}=0.417$ or:  

$p(C=k|B=r)=1-p(C=g|B=r)$ 



We could compute the same conditional probabilities for the blue box:  
$p(C=g|B=b) = \frac{3}{9}=0.33$  

$p(C=k|B=b) = \frac{6}{9}=0.67$ or:  

$p(C=k|B=b)=1-p(C=g|B=b)$

Let's assume that the probability of choosing the red box is:  
$p(B=r)=\frac{2}{5}=0.4$  

Given that a person has picked a golden ball, we are tasked with computing the probability that the golden ball was picked from the red box, i.e. $p(B=r|C=g)$. How do we go about computing this probability?



The alternative description of the problem would be:  
We have the red box depicting the Verizon store and the blue box depicting the AT&T store in our neighborhood. We noticed that our neighbor Mark recently obtained a gold iPhone, and our goal is to compute the probability that Mark purchased his new phone from the Verizon store. Since the Verizon store is further away from our apartment building than the AT&T store, we'll assign the probability of Mark visiting the Verizon store to be $p(Verizon)=0.4$.

Let's start by writing down the probability of picking a golden ball $p(C=g)$:  

$p(C=g)=p(C=g|B=r)p(B=r)+p(C=g|B=b)p(B=b)$ 

Since $p(B=b)=1-p(B=r)=0.6$, we get:  

$p(C=g)=0.583*0.4+0.333*0.6 = 0.233+0.200 = 0.433$   

We then derive the required probability:  


$\Large p(B=r|C=g)=\frac{p(C=g|B=r)p(B=r)}{p(C=g)}=\frac{0.583*0.4}{0.433}=0.538$
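
As a check on the rounding, the same derivation can be carried out with exact fractions, using $p(B=b)=1-p(B=r)=\frac{3}{5}$:  

$p(C=g)=\frac{7}{12}\cdot\frac{2}{5}+\frac{3}{9}\cdot\frac{3}{5}=\frac{7}{30}+\frac{6}{30}=\frac{13}{30}\approx 0.433$  

$p(B=r|C=g)=\frac{7/30}{13/30}=\frac{7}{13}\approx 0.538$  

The complementary posterior follows directly: $p(B=b|C=g)=1-\frac{7}{13}=\frac{6}{13}\approx 0.462$.  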

Let's now implement this computation in Python:

In [1]:
red_box = ['gold', 'gold','gold','gold', 'gold', 'gold','gold','black','black','black','black','black']
blue_box = ['gold', 'gold','gold','black','black','black','black','black','black']

c_gr=0
c_kr=0
for balls in red_box:
    if (balls=='gold'):
        c_gr +=1
    else:
        c_kr+=1

c_gb=0
c_kb=0
for balls in blue_box:
    if (balls=='gold'):
        c_gb +=1
    else:
        c_kb+=1
print("Golden Balls in Red box="+str(c_gr))
print("Black Balls in Red box="+str(c_kr))
print("Golden Balls in Blue box="+str(c_gb))
print("Black Balls in Blue box="+str(c_kb))

p_gr = (1.0)*c_gr/(c_gr+c_kr)
p_kr = 1-p_gr

p_gb = (1.0)*c_gb/(c_gb+c_kb)
p_kb = 1-p_gb

p_r = 0.4
p_b = 1-p_r
p_g = p_gr*p_r+p_gb*p_b

Golden Balls in Red box=7
Black Balls in Red box=5
Golden Balls in Blue box=3
Black Balls in Blue box=6


**[Assignment 1]** How do we implement $p(B=r\ |\ C=g)$ using the code above?

**[Solution 1]**

In [2]:
p_rg = p_r*p_gr/p_g
p_rg

0.5384615384615385
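
One way to sanity-check this result is to simulate the experiment many times and count how often a gold ball came from the red box. This is a rough sketch rather than part of the lab solution: the function name, the number of trials and the seed are arbitrary choices.

```python
import random

def simulate_red_given_gold(n_trials=200_000, p_red=0.4, seed=0):
    """Estimate p(B=r | C=g) by repeatedly picking a box and then a ball."""
    rng = random.Random(seed)
    red_box = ["gold"] * 7 + ["black"] * 5
    blue_box = ["gold"] * 3 + ["black"] * 6

    gold_draws = 0      # trials in which a gold ball was drawn
    gold_from_red = 0   # of those, trials in which the box was red
    for _ in range(n_trials):
        from_red = rng.random() < p_red
        ball = rng.choice(red_box if from_red else blue_box)
        if ball == "gold":
            gold_draws += 1
            gold_from_red += from_red
    return gold_from_red / gold_draws

estimate = simulate_red_given_gold()
print(round(estimate, 3))
```

The estimate should land close to the exact value $\frac{7}{13}$, with a small deviation that shrinks as the number of trials grows.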

## Working with Datasets ##

Let's now take a look at the tree census data and the NYPD motor vehicle collision data. 

As in the previous lab session we'll start by loading the two datasets.

In [3]:
import pandas as pd
tree_data_fn2 = './street_tree_census_data/2015_Street_Tree_Census_-_Tree_Data.tsv'
tree_data2 = pd.read_csv(tree_data_fn2, delimiter='\t')
tree_data2.dtypes

tree_id         int64
block_id        int64
created_at     object
tree_dbh        int64
stump_diam      int64
curb_loc       object
status         object
health         object
spc_latin      object
spc_common     object
steward        object
guards         object
sidewalk       object
user_type      object
problems       object
root_stone     object
root_grate     object
root_other     object
trunk_wire     object
trnk_light     object
trnk_other     object
brch_light     object
brch_shoe      object
brch_other     object
address        object
zipcode       float64
zip_city       object
cb_num        float64
borocode        int64
boroname       object
cncldist        int64
st_assem        int64
st_senate     float64
nta            object
nta_name       object
boro_ct         int64
state          object
latitude      float64
longitude     float64
x_sp          float64
y_sp          float64
dtype: object

In [4]:
coll_data_fn = './NYPD_Motor_Vehicle_Collisions.tsv'
coll_data = pd.read_csv(coll_data_fn, delimiter='\t')
coll_data

Unnamed: 0,DATE,TIME,BOROUGH,ZIP CODE,LATITUDE,LONGITUDE,LOCATION,ON STREET NAME,CROSS STREET NAME,OFF STREET NAME,...,CONTRIBUTING FACTOR VEHICLE 2,CONTRIBUTING FACTOR VEHICLE 3,CONTRIBUTING FACTOR VEHICLE 4,CONTRIBUTING FACTOR VEHICLE 5,UNIQUE KEY,VEHICLE TYPE CODE 1,VEHICLE TYPE CODE 2,VEHICLE TYPE CODE 3,VEHICLE TYPE CODE 4,VEHICLE TYPE CODE 5
0,08/25/2015,19:00,,,40.732941,-73.920382,"(40.7329414, -73.9203819)",,,,...,Unspecified,,,,3284922,PASSENGER VEHICLE,TAXI,,,
1,07/27/2012,22:35,,,,,,,,,...,Unspecified,,,,2833714,PASSENGER VEHICLE,PASSENGER VEHICLE,,,
2,05/17/2014,0:45,,,,,,PENNSYLVANIA AVENUE,RIVERDALE AVENUE,,...,,,,,336679,,,,,
3,11/10/2016,15:11,,,,,,MONROE STREET,,,...,Unspecified,,,,3559084,PASSENGER VEHICLE,,,,
4,05/11/2016,14:01,BROOKLYN,11217.0,40.685885,-73.973376,"(40.6858851, -73.9733756)",FULTON STREET,SOUTH OXFORD STREET,,...,Unspecified,,,,3440855,PASSENGER VEHICLE,PASSENGER VEHICLE,,,
5,05/11/2016,14:05,MANHATTAN,10016.0,40.750218,-73.979056,"(40.750218, -73.979056)",EAST 39 STREET,PARK AVENUE,,...,Unspecified,,,,3439718,PASSENGER VEHICLE,PASSENGER VEHICLE,,,
6,05/11/2016,14:05,QUEENS,11435.0,,,,,,138-19 HILLSIDE AVENUE,...,Unspecified,,,,3440233,PASSENGER VEHICLE,PASSENGER VEHICLE,,,
7,05/11/2016,14:07,,,,,,,WHITESTONE EXPRESSWAY,,...,Unspecified,,,,3440245,PASSENGER VEHICLE,PASSENGER VEHICLE,,,
8,05/11/2016,14:15,BRONX,10454.0,40.809060,-73.907538,"(40.8090601, -73.907538)",EAST 144 STREET,SOUTHERN BOULEVARD,,...,Unspecified,,,,3439814,PASSENGER VEHICLE,PASSENGER VEHICLE,,,
9,05/11/2016,14:15,BRONX,10468.0,40.869834,-73.898740,"(40.8698344, -73.8987397)",,,2756 RESERVOIR AVENUE,...,Unspecified,,,,3440604,PASSENGER VEHICLE,,,,


Since the NYPD collision data doesn't come with a description of the variables, we'll look at the header and type of each column.

In [5]:
coll_data.dtypes

DATE                              object
TIME                              object
BOROUGH                           object
ZIP CODE                         float64
LATITUDE                         float64
LONGITUDE                        float64
LOCATION                          object
ON STREET NAME                    object
CROSS STREET NAME                 object
OFF STREET NAME                   object
NUMBER OF PERSONS INJURED          int64
NUMBER OF PERSONS KILLED           int64
NUMBER OF PEDESTRIANS INJURED      int64
NUMBER OF PEDESTRIANS KILLED       int64
NUMBER OF CYCLIST INJURED          int64
NUMBER OF CYCLIST KILLED           int64
NUMBER OF MOTORIST INJURED         int64
NUMBER OF MOTORIST KILLED          int64
CONTRIBUTING FACTOR VEHICLE 1     object
CONTRIBUTING FACTOR VEHICLE 2     object
CONTRIBUTING FACTOR VEHICLE 3     object
CONTRIBUTING FACTOR VEHICLE 4     object
CONTRIBUTING FACTOR VEHICLE 5     object
UNIQUE KEY                         int64
VEHICLE TYPE COD

Spend some time to look into the two data collections and decide on what aspect of the data you would like to model. Either start working on computing the probabilities based on your own task or follow the examples below.

### Example 1 ###

In this example we'll compute the probability of having seen a particular tree species in one of the five NYC boroughs. More specifically, let's assume that we are taking a walk with our friend in Riverside Park. We notice a northern red oak tree. As we admire its color, our friend remembers that she may have seen this tree in one of the other boroughs in the past month. She thinks that she most likely saw this tree species in Brooklyn, where her boyfriend lives, but she is not sure about it. Given the tree census data, and given that in the past month our friend has spent 10 days in Brooklyn, 15 days in Manhattan and the remaining 6 days in the other 3 boroughs, what is the probability that she has seen this tree in each of the five boroughs? 

**[Assignment 2]** Spend some time to formulate the solution of the problem and derive the required probability expressions.

**[Assignment 2 Hint]**
We are tasked to compute the following conditional probabilities:  

$p(Borough\ |\ northern\ red\ oak)$,  

given that the person has visited all five boroughs within the past month with the following probability:  

$p(Borough=Brooklyn)=\frac{10}{31}$  

$p(Borough=Manhattan)=\frac{15}{31}$  

$p(Borough=Staten\ Island)=\frac{2}{31}$  

$p(Borough=Bronx)=\frac{2}{31}$  

$p(Borough=Queens)=\frac{2}{31}$  

**[Solution 2]** We are tasked to compute the following conditional probabilities:  

$p(Borough\ |\ northern\ red\ oak)$,  

given that the person has visited all five boroughs within the past month with the following probability:  

$p(Borough=Brooklyn)=\frac{10}{31}$  

$p(Borough=Manhattan)=\frac{15}{31}$  

$p(Borough=Staten\ Island)=\frac{2}{31}$  

$p(Borough=Bronx)=\frac{2}{31}$  

$p(Borough=Queens)=\frac{2}{31}$  

We use Bayes' Rule to derive the required probability:  

$\Large p(Borough\ |\ northern\ red\ oak)\ =\ \frac{p(northern\ red\ oak\ |\ Borough)\ p(Borough)}{\sum_{Borough}{p(northern\ red\ oak\ |\ Borough)\ p(Borough)}}$  

**[Assignment 3]** Implement the above solution and compute the probability

**[Solution 3]**  

First compute the prior probability that our friend has been in each of the 5 boroughs in the past month:

In [6]:
p_brooklyn = 10.0/31
p_manhattan = 15.0/31
p_queens = 2.0/31
p_bronx = 2.0/31
p_si = 2.0/31

print (p_brooklyn)

0.322580645161


In [7]:
boro_spcs=tree_data2[['boroname','spc_common']]
boro_counts = pd.value_counts(boro_spcs['boroname'])
boro_counts

Queens           184137
Brooklyn         115783
Staten Island     62971
Bronx             46075
Manhattan         37697
Little Neck           1
Name: boroname, dtype: int64

In [8]:
boro_nro = boro_spcs[boro_spcs['spc_common']=='northern red oak']
nro_counts = pd.value_counts(boro_nro['boroname'])
nro_counts

Queens           2049
Brooklyn         1780
Bronx             859
Manhattan         629
Staten Island     369
Name: boroname, dtype: int64

In [9]:
p_nro_brooklyn = (1.0)*nro_counts['Brooklyn']/boro_counts['Brooklyn']
print ("p(nro|brooklyn)="+str(p_nro_brooklyn))
p_nro_queens = (1.0)*nro_counts['Queens']/boro_counts['Queens']
print ("p(nro|queens)="+str(p_nro_queens))
p_nro_bronx = (1.0)*nro_counts['Bronx']/boro_counts['Bronx']
print ("p(nro|bronx)="+str(p_nro_bronx))
p_nro_si = (1.0)*nro_counts['Staten Island']/boro_counts['Staten Island']
print ("p(nro|si)="+str(p_nro_si))
p_nro_manhattan = (1.0)*nro_counts['Manhattan']/boro_counts['Manhattan']
print ("p(nro|manhattan)="+str(p_nro_manhattan))

p(nro|brooklyn)=0.015373586796
p(nro|queens)=0.011127584353
p(nro|bronx)=0.0186435160065
p(nro|si)=0.00585984024392
p(nro|manhattan)=0.0166856779054


In [10]:
p_nro = p_nro_brooklyn*p_brooklyn+p_nro_queens*p_queens+p_nro_bronx*p_bronx+p_nro_si*p_si+p_nro_manhattan*p_manhattan
p_nro

0.01533170702411999

In [11]:
p_brooklyn_nro = (1.0)*(p_nro_brooklyn*p_brooklyn)/p_nro
print ("p(brooklyn|nro)="+str(p_brooklyn_nro))
p_queens_nro = (1.0)*(p_nro_queens*p_queens)/p_nro
print ("p(queens|nro)="+str(p_queens_nro))
p_bronx_nro = (1.0)*(p_nro_bronx*p_bronx)/p_nro
print ("p(bronx|nro)="+str(p_bronx_nro))
p_si_nro = (1.0)*(p_nro_si*p_si)/p_nro
print ("p(si|nro)="+str(p_si_nro))
p_manhattan_nro = (1.0)*(p_nro_manhattan*p_manhattan)/p_nro
print ("p(manhattan|nro)="+str(p_manhattan_nro))

p(brooklyn|nro)=0.323461799739
p(queens|nro)=0.0468250969578
p(bronx|nro)=0.0784522873023
p(si|nro)=0.024658324653
p(manhattan|nro)=0.526602491348
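
The same posterior can be written more compactly with dictionaries and a single loop. This sketch copies the counts printed above into plain Python so that it runs without the dataset; the variable names are chosen for this example only.

```python
# Tree counts per borough, copied from the outputs above.
boro_counts = {"Queens": 184137, "Brooklyn": 115783, "Staten Island": 62971,
               "Bronx": 46075, "Manhattan": 37697}
nro_counts = {"Queens": 2049, "Brooklyn": 1780, "Bronx": 859,
              "Manhattan": 629, "Staten Island": 369}

# Days spent in each borough during the 31-day month.
days = {"Brooklyn": 10, "Manhattan": 15, "Queens": 2, "Bronx": 2, "Staten Island": 2}
total_days = sum(days.values())

prior = {b: d / total_days for b, d in days.items()}                 # p(Borough)
likelihood = {b: nro_counts[b] / boro_counts[b] for b in prior}      # p(nro | Borough)
evidence = sum(likelihood[b] * prior[b] for b in prior)              # p(nro)
posterior = {b: likelihood[b] * prior[b] / evidence for b in prior}  # p(Borough | nro)

# Print boroughs from most to least probable.
for b, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"p({b}|nro) = {p:.4f}")
```

Because every posterior is divided by the same evidence term, the five values sum to one.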


Spend some time to analyze the results and to draw conclusions. 

### Example 2 ###

In this example we are going to answer a simpler question. Given the NYPD collision data and assuming that a person has been involved in a traffic incident, for each of the five boroughs compute the empirical probability of a pedestrian being injured between 2pm and 3pm. Assume that the probability of being injured as a pedestrian and the probability of being injured between 2-3pm are independent of each other given that the person has been involved in an accident. 

**[Assignment 4]**
Spend some time to formulate the solution of the problem and to derive the required probability expressions.

**[Assignment 4 Hint]**
We are tasked to compute the following conditional probability:  
$p(pedestrian,2-3pm\ |\ involved,\ Borough)$ 

**[Solution 4]**  

We are tasked to compute the following conditional probability:  
$p(pedestrian,2-3pm\ |\ involved,\ Borough)$  

Unlike the previous example, here we'll assume that we are equally likely to be in any of the five boroughs. 
We then have:  
$p(pedestrian,\ 2-3pm\ |\ Borough,\ involved)\ = \ p(pedestrian\ |\ Borough,\ involved)*p( 2-3pm\ |\ Borough,\ involved)$  
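
The factorization above holds only under the stated independence assumption. The sketch below shows how such an assumption could be checked when the joint counts are available. The counts are hypothetical and were chosen so that independence holds exactly.

```python
from fractions import Fraction as F

# Hypothetical collision counts for one borough, keyed by
# (pedestrian injured?, between 2pm and 3pm?).
counts = {
    (True, True): 2, (True, False): 18,
    (False, True): 8, (False, False): 72,
}
total = sum(counts.values())

# Marginals obtained with the sum rule.
p_ped = F(sum(c for (ped, _t), c in counts.items() if ped), total)
p_2pm = F(sum(c for (_ped, t), c in counts.items() if t), total)

# Joint probability read directly from the table.
p_joint = F(counts[(True, True)], total)

# Under independence the joint equals the product of the marginals.
independent = p_joint == p_ped * p_2pm
print(p_ped, p_2pm, p_joint, independent)
```

On real data the two sides will rarely match exactly, so the size of the gap indicates how reasonable the assumption is.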



**[Assignment 5]** Implement the above solution and compute the probability

**[Solution 5]**
Start by computing the probability of being injured as a pedestrian in each of the 5 boroughs:

In [12]:
boro_stats = coll_data.groupby('BOROUGH').sum()
boro_stats

BOROUGH,ZIP CODE,LATITUDE,LONGITUDE,NUMBER OF PERSONS INJURED,NUMBER OF PERSONS KILLED,NUMBER OF PEDESTRIANS INJURED,NUMBER OF PEDESTRIANS KILLED,NUMBER OF CYCLIST INJURED,NUMBER OF CYCLIST KILLED,NUMBER OF MOTORIST INJURED,NUMBER OF MOTORIST KILLED,UNIQUE KEY
BRONX,946884900.0,3528566.0,-6381864.0,26277,111,6736,68,1589,7,19051,38,162285438694
BROOKLYN,2396929000.0,8342707.0,-15175110.0,63427,260,14858,153,6963,22,43521,86,382324801866
MANHATTAN,1811705000.0,7115376.0,-12912360.0,30697,147,11182,118,5238,11,15133,20,303438774961
QUEENS,2053302000.0,7035624.0,-12756780.0,46843,232,9931,133,3421,17,35360,85,331491475613
STATEN ISLAND,334125500.0,1268370.0,-2316509.0,7862,46,1234,19,176,2,6659,25,53154704626


In [13]:
boro_victim_stats = boro_stats[[
'NUMBER OF PERSONS INJURED',
'NUMBER OF PERSONS KILLED',
'NUMBER OF PEDESTRIANS INJURED',
'NUMBER OF PEDESTRIANS KILLED',
'NUMBER OF CYCLIST INJURED',
'NUMBER OF CYCLIST KILLED',
'NUMBER OF MOTORIST INJURED',
'NUMBER OF MOTORIST KILLED']]
boro_victim_stats.transpose()

BOROUGH,BRONX,BROOKLYN,MANHATTAN,QUEENS,STATEN ISLAND
NUMBER OF PERSONS INJURED,26277,63427,30697,46843,7862
NUMBER OF PERSONS KILLED,111,260,147,232,46
NUMBER OF PEDESTRIANS INJURED,6736,14858,11182,9931,1234
NUMBER OF PEDESTRIANS KILLED,68,153,118,133,19
NUMBER OF CYCLIST INJURED,1589,6963,5238,3421,176
NUMBER OF CYCLIST KILLED,7,22,11,17,2
NUMBER OF MOTORIST INJURED,19051,43521,15133,35360,6659
NUMBER OF MOTORIST KILLED,38,86,20,85,25


In [14]:
injured_counts = boro_victim_stats.transpose().sum(axis=0)
injured_counts

BOROUGH
BRONX             53877
BROOKLYN         129290
MANHATTAN         62546
QUEENS            96022
STATEN ISLAND     16023
dtype: int64

In [15]:
ped_inj_counts = boro_stats['NUMBER OF PEDESTRIANS INJURED']
ped_inj_counts

BOROUGH
BRONX             6736
BROOKLYN         14858
MANHATTAN        11182
QUEENS            9931
STATEN ISLAND     1234
Name: NUMBER OF PEDESTRIANS INJURED, dtype: int64

In [16]:
p_inj_brooklyn = (1.0)*ped_inj_counts['BROOKLYN']/injured_counts['BROOKLYN']
print ("p(pedestrian|brooklyn, involved)="+str(p_inj_brooklyn))
p_inj_queens = (1.0)*ped_inj_counts['QUEENS']/injured_counts['QUEENS']
print ("p(pedestrian|queens, involved)="+str(p_inj_queens))
p_inj_bronx = (1.0)*ped_inj_counts['BRONX']/injured_counts['BRONX']
print ("p(pedestrian|bronx, involved)="+str(p_inj_bronx))
p_inj_si = (1.0)*ped_inj_counts['STATEN ISLAND']/injured_counts['STATEN ISLAND']
print ("p(pedestrian|staten island, involved)="+str(p_inj_si))
p_inj_manhattan = (1.0)*ped_inj_counts['MANHATTAN']/injured_counts['MANHATTAN']
print ("p(pedestrian|manhattan, involved)="+str(p_inj_manhattan))

p(pedestrian|brooklyn, involved)=0.114919947405
p(pedestrian|queens, involved)=0.103424215284
p(pedestrian|bronx, involved)=0.125025521094
p(pedestrian|staten island, involved)=0.0770142919553
p(pedestrian|manhattan, involved)=0.178780417613
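
The ratios above divide by the sum of all eight victim columns, injured and killed, including the PERSONS totals. As a point of comparison only, and as an assumption that is not taken from the lab, one could instead condition on a person having been injured and divide by the NUMBER OF PERSONS INJURED column alone. The totals below are copied from the table above.

```python
# Totals per borough, copied from the boro_victim_stats table above.
persons_injured = {"BRONX": 26277, "BROOKLYN": 63427, "MANHATTAN": 30697,
                   "QUEENS": 46843, "STATEN ISLAND": 7862}
pedestrians_injured = {"BRONX": 6736, "BROOKLYN": 14858, "MANHATTAN": 11182,
                       "QUEENS": 9931, "STATEN ISLAND": 1234}

# Alternative estimate (an assumption): p(pedestrian | injured, Borough).
p_ped_given_injured = {
    b: pedestrians_injured[b] / persons_injured[b] for b in persons_injured
}

for b, p in p_ped_given_injured.items():
    print(f"p(pedestrian | injured, {b.title()}) = {p:.3f}")
```

The choice of denominator changes the meaning of the probability, so it is worth stating explicitly which event is being conditioned on.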


Next we'll compute the probability that a collision occurred between 2pm and 3pm.

In [17]:
collision_boro = coll_data['BOROUGH']
collision_boro
coll_counts = pd.value_counts(collision_boro)
coll_counts

BROOKLYN         213651
QUEENS           180768
MANHATTAN        180735
BRONX             90506
STATEN ISLAND     32417
Name: BOROUGH, dtype: int64

In [18]:
time_boro = coll_data[['TIME','BOROUGH']]

In [19]:
twopm_injuries = time_boro.loc[coll_data['TIME'].str.contains('14:')]
twopm_injuries_borough =  pd.value_counts(twopm_injuries['BOROUGH'])
twopm_injuries_borough

BROOKLYN         14774
QUEENS           12734
MANHATTAN        12572
BRONX             6030
STATEN ISLAND     2437
Name: BOROUGH, dtype: int64
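
The substring test `'14:'` works because the minutes always follow the colon, so a time such as `0:14` is not matched. A more explicit alternative is to extract the hour as an integer. The sketch below demonstrates this on a few made-up time strings in the same format as the `TIME` column.

```python
import pandas as pd

# Hypothetical TIME values in the dataset's "H:MM" / "HH:MM" format.
times = pd.Series(["14:05", "0:14", "2:14", "14:59", "9:14"])

# Take the text before the colon and convert it to an integer hour.
hours = times.str.split(":").str[0].astype(int)
in_2pm_hour = hours == 14

# Only the two times in the 14:00-14:59 range are selected.
print(times[in_2pm_hour].tolist())

# On these values the substring test selects the same rows.
assert (times.str.contains("14:") == in_2pm_hour).all()
```

Working with an integer hour also makes it easy to select other intervals, for example `hours.between(14, 16)`.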

In [20]:
p_2pm_brooklyn = (1.0)*twopm_injuries_borough['BROOKLYN']/coll_counts['BROOKLYN']
print ("p(2-3pm|brooklyn, involved)="+str(p_2pm_brooklyn))
p_2pm_queens = (1.0)*twopm_injuries_borough['QUEENS']/coll_counts['QUEENS']
print ("p(2-3pm|queens, involved)="+str(p_2pm_queens))
p_2pm_bronx = (1.0)*twopm_injuries_borough['BRONX']/coll_counts['BRONX']
print ("p(2-3pm|bronx, involved)="+str(p_2pm_bronx))
p_2pm_si = (1.0)*twopm_injuries_borough['STATEN ISLAND']/coll_counts['STATEN ISLAND']
print ("p(2-3pm|staten island, involved)="+str(p_2pm_si))
p_2pm_manhattan = (1.0)*twopm_injuries_borough['MANHATTAN']/coll_counts['MANHATTAN']
print ("p(2-3pm|manhattan, involved)="+str(p_2pm_manhattan))

p(2-3pm|brooklyn, involved)=0.0691501560957
p(2-3pm|queens, involved)=0.0704438838733
p(2-3pm|bronx, involved)=0.0666254170994
p(2-3pm|staten island, involved)=0.0751766048678
p(2-3pm|manhattan, involved)=0.0695604061195


In [21]:
p_inj_brooklyn_2pm = p_inj_brooklyn*p_2pm_brooklyn
print ("p(pedestrian,2-3pm|brooklyn,involved)="+str(p_inj_brooklyn_2pm))
p_inj_queens_2pm = p_inj_queens*p_2pm_queens
print ("p(pedestrian,2-3pm|queens,involved)="+str(p_inj_queens_2pm))
p_inj_bronx_2pm = p_inj_bronx*p_2pm_bronx
print ("p(pedestrian,2-3pm|bronx,involved)="+str(p_inj_bronx_2pm))
p_inj_si_2pm = p_inj_si*p_2pm_si
print ("p(pedestrian,2-3pm|staten island,involved)="+str(p_inj_si_2pm))
p_inj_manhattan_2pm = p_inj_manhattan*p_2pm_manhattan
print ("p(pedestrian,2-3pm|manhattan,involved)="+str(p_inj_manhattan_2pm))

p(pedestrian,2-3pm|brooklyn,involved)=0.00794673230157
p(pedestrian,2-3pm|queens,involved)=0.00728560341115
p(pedestrian,2-3pm|bronx,involved)=0.00832987749098
p(pedestrian,2-3pm|staten island,involved)=0.0057896729955
p(pedestrian,2-3pm|manhattan,involved)=0.0124360384553


Spend some time to analyze the results and to draw conclusions. 