# Data Management with Pandas, NumPy, and Descriptive Statistics

In this WA you will:
- Import data using the _Pandas_ library, view your data, and modify data
- Understand what a **Data Frame** is
- Explore the _NumPy_ library
- Use _NumPy_ and _Pandas_ to get descriptive statistics



The main data structure that _Pandas_ uses is called a **Data Frame**.  This is a two-dimensional table of data in which the rows typically represent different entries of collected samples (e.g., participants), and the columns represent variables that characterise the collected samples (e.g., age, education).  
_Pandas_ also has a one-dimensional data structure called a **Series** that we will encounter when accessing a single column of a Data Frame.
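
As a minimal sketch of the two structures, the toy table below (hypothetical values, not taken from the survey data) is built by hand; selecting one of its columns returns a Series.

```python
import pandas as pd

# Hypothetical toy data: three participants described by two variables
toy = pd.DataFrame({"age": [34, 51, 27], "education": [3, 5, 4]})

print(type(toy))    # a two-dimensional Data Frame
print(toy.shape)    # (number of rows, number of columns)

# Selecting a single column returns a one-dimensional Series
ages = toy["age"]
print(type(ages))
print(ages.shape)
```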



In [1]:
### To work with the Pandas library, let's import it
import pandas as pd

## Utilizing Library Functions

After importing a library, its functions can then be called from your code by prepending the library name to the function name.  For example, to use the '`dot`' function from the '`numpy`' library, you would enter '`numpy.dot`'.  To avoid repeatedly having to type the library name in your scripts, it is conventional to define a two or three letter abbreviation for each library, e.g. '`numpy`' is usually abbreviated as '`np`'.  This allows us to use '`np.dot`' instead of '`numpy.dot`'.  Similarly, the Pandas library is typically abbreviated as '`pd`'.
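
A short illustration of the abbreviation in use; the two vectors `u` and `v` are made-up values chosen only for this sketch.

```python
import numpy as np

# Hypothetical example vectors
u = [1, 2, 3]
v = [4, 5, 6]

# np.dot is the same function as numpy.dot, reached through the alias 'np'
result = np.dot(u, v)   # 1*4 + 2*5 + 3*6
print(result)
```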

## Importing Data

_Pandas_ has a variety of functions named '`read_xxx`' for reading data in different formats.  We will focus on reading '`csv`' files; csv stands for comma-separated values. However, it is possible to read other file formats, including excel, json, and sql, just to name a few.

There are many other **options** to '`read_csv`' that are very useful.  

For example, you would use the option `sep='\t'` instead of the default `sep=','` if the fields of your data file are delimited by tabs instead of commas.

See [here](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) for the full documentation for '`read_csv`'.
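
The `sep` option can be tried without any file on disk. In this sketch the tab-delimited text and its column names are hypothetical, and `io.StringIO` stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical tab-delimited content standing in for a file on disk
tab_text = "HOUSEID\tR_AGE\n1001\t34\n1002\t51\n"

# sep='\t' tells read_csv to split fields on tabs instead of commas
tab_df = pd.read_csv(io.StringIO(tab_text), sep="\t")
print(tab_df)
```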

Let's see how we can load data from a .csv file using _Pandas_. In the future, you can always copy this cell to import your data. 
For this course we will be using the [NextGen National Household Travel Survey (NHTS) data](https://nhts.ornl.gov/downloads).

Full reference: 
Federal Highway Administration. (2022). 2022 NextGen National Household Travel Survey Core
Data, U.S. Department of Transportation, Washington, DC. Available online:
http://nhts.ornl.gov.


In [3]:
### Store the path to the perv2pub.csv file.
### Adjust the path to match where you saved the data, for example 'csv/perv2pub.csv'
### if the file is in a folder named 'csv' located next to this Jupyter Notebook file
url = "perv2pub.csv"

### Note that we will be using the file perv2pub.csv for this assignment
### This is the PERSON file; it contains characteristics of individual household members 
### (e.g., age and other demographic attributes).


### Read the .csv file using the `Pandas` library and store it as a `Pandas` Data Frame
df = pd.read_csv(url)

### Let's now check what data type we stored into `df`. This code will output the object type
print(type(df))
pd.set_option('display.max_columns', None) # This option allows ALL columns to be displayed

## Viewing Data

In [None]:
### Now you can view the Data Frame by calling the head() function. By default only 5 rows are displayed. 
### You can change that by typing an integer number in the parentheses
df.head(10)

The *head()* function by default shows the first 5 rows of our Data Frame.  To display the whole Data Frame, we simply type its name:

In [None]:
### Output entire Data Frame
### Note that we can see all columns but only a limited number of rows 
### Because we have not set the option for the maximum number of displayed rows, they are displayed partially

df

As you can see, we have a 2-Dimensional object where each row is an independent observation of our data.

To gather more information regarding the data, we can view the column names with the following functions:

In [None]:
print(df.columns)

In [None]:
### Convert the column index to a list to bypass Pandas truncation
print(df.columns.tolist())

## Selecting Sections of Data Frame

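Let's say we would like to select only specific portions of our data.  _Pandas_ provides two main indexers for doing so:

1. .loc()
2. .iloc()

A third indexer, *.ix()*, was deprecated and has been removed from current versions of _Pandas_, so we will cover only the *.loc()* and *.iloc()* slicing indexers. Note that, despite the parentheses in their names here, both are used with square brackets.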

### .loc()
*.loc()* takes two selectors (each a single label, a list, or a range) separated by ','. The first one indicates the rows and the second one indicates the columns.

If you need to select all columns or all rows, type ":". In other words, the command `df.loc[:, :]` selects the whole data frame.

In [None]:
### Return all observations of EDUC (Respondent education level)
df.loc[:,"EDUC"]

In [None]:
### Select all rows for multiple columns, ["R_SEX", "EDUC", "DRIVER"]
df.loc[:,["R_SEX", "EDUC", "DRIVER"]]

In [None]:
### Select the first 10 rows (labels 0 through 9) for multiple columns, ["R_SEX", "EDUC", "DRIVER"]
df.loc[:9, ["R_SEX", "EDUC", "DRIVER"]]

In [None]:
### Select range of rows for all columns
df.loc[10:15]


### .iloc()
*.iloc()* is integer-position-based slicing, whereas *.loc()* uses labels/column names. Here are some examples:

In [None]:
df.iloc[10:15] # if we slice ONLY rows, command looks exactly the same as .loc, but notice the difference in the output!

In [None]:
df.iloc[1:5, 2:4] # .iloc selects rows and columns by their position in the data set, including the first one and excluding the last
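
The difference in how the two indexers treat the end of a slice is easier to see on a small table. This is a sketch on a hypothetical toy frame with made-up column names, not on the survey data.

```python
import pandas as pd

# Hypothetical toy frame with the default integer row labels 0..5
toy = pd.DataFrame({"A": range(6), "B": range(10, 16), "C": range(20, 26)})

by_label = toy.loc[1:3]       # label-based: rows 1, 2 and 3 (end included)
by_position = toy.iloc[1:3]   # position-based: rows 1 and 2 (end excluded)

print(len(by_label), len(by_position))

# Column positions follow the same rule: positions 0 and 1 only
print(toy.iloc[1:3, 0:2])
```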

In [None]:
df.iloc[1:5, ["R_SEX", "EDUC"]] #This code is wrong, can you spot a mistake before running the code?

We can view the data types of our data frame columns by calling *.dtypes* on our data frame:

In [None]:
df.dtypes

In [None]:
### Convert the dtypes to a list to bypass Pandas truncation
print(df.dtypes.tolist())

The output indicates we have integers and floats in our Data Frame.

We may also want to observe the different unique values within a specific column. Let's do this for respondent sex and education:

In [None]:
### List unique values in the df['R_SEX'] column
df.R_SEX.unique()

In [None]:
### Let's explore df['EDUC'] as well
df.EDUC.unique()

Together, these two fields let us compare education levels between male and female respondents. Let's check this quickly by observing only these two columns at the same time:

In [None]:
### Use .loc() to specify a list of multiple column names
df.loc[20:50,["R_SEX", "EDUC"]]

From eyeballing the output, it is difficult to judge how education is distributed across sex.  We can streamline this by utilizing the _groupby()_ and _size()_ functions.
Read more about _groupby()_ here:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html

and here for the methods that can be applied to groups, such as _size()_:

https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html


In [None]:
df.groupby(['R_SEX','EDUC']).size() # .size() counts the rows in each group; .count() would instead count non-missing values in every other column
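
To see how _size()_ and _count()_ differ, here is a sketch on a hypothetical toy table (made-up column names and values) in which one value is deliberately missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one AGE value is missing on purpose
toy = pd.DataFrame({
    "SEX":  [1, 1, 2, 2, 2],
    "EDUC": [3, 3, 3, 4, 4],
    "AGE":  [30, np.nan, 41, 25, 62],
})

sizes = toy.groupby(["SEX", "EDUC"]).size()     # number of rows per group
counts = toy.groupby(["SEX", "EDUC"]).count()   # non-missing values per remaining column

print(sizes)
print(counts)
```

Here _size()_ returns a Series with one number per group, while _count()_ returns a Data Frame and skips the missing AGE value.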

In [None]:
### First 10 entries for the column with participant ages saved in separate variable
listfromDF = df.loc[:9,["R_AGE"]]
print(listfromDF)

## Descriptive Statistics with Pandas and Numpy

In [None]:
### Import Numpy library
import numpy as np 


### What Is the Python Library NumPy

<span style="color:blue;">_NumPy_</span> is the fundamental package for scientific computing with Python. It contains among other things:

* a powerful N-dimensional array object
* useful linear algebra, Fourier transform
* random number capabilities, etc.

We will start with the <span style="color:blue;">_NumPy_</span> array object.

### Numpy Array

A <span style="color:blue;">_NumPy_</span> array is a grid of values, all of the **same type**, and is indexed by a tuple of **nonnegative integers**. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

In [None]:
### We have already explored array-like data in the form of the list data type. 
### NumPy arrays have more functionality
### Let's first create a one-dimensional NumPy array with 3 entries: 1, 2, and 3
a = np.array([1, 2, 3]) # Here is how you do it

### Print object type
print("NumPy array object type: ", type(a))

### Compare it with list data type
alist = [1, 2, 3]
print("List as a data type: ", type(alist))

In [None]:
### .shape is an attribute that returns the size of the NumPy array along each dimension
print("NumPy array 'a' shape: ", a.shape)
### A list does not have the attribute shape, so the next line raises an AttributeError
print("List 'alist' shape: ", alist.shape)

In [None]:
### You can select and use any entry of NumPy array, let's print them 
print("Some values from the NumPy array 'a': ", a[2], a[0], a[1])

In [None]:
### Create a 2x2 NumPy array
b = np.array([[1,2],[3,4]])
print("NumPy array b: ")
print(b)
print("~~~~~~~~~~~~~~~~~~~~~")
blist = [[1,2],[3,4]]
print("List blist: ")
print(blist)
print("~~~~~~~~~~~~~~~~~~~~~")
### Print the shape of the NumPy array b
print("NumPy array 'b' shape: ", b.shape)

### Print some values from the NumPy array b
print("Some values from the NumPy array 'b': ", b[0,0], b[0,1], b[1,1]) 
### Numbers in brackets identify specific entry in the array b
### This is done using convention [row, column]

In [None]:
### Create a 3x2 NumPy array
c = np.array([[1,2],[3,4],[5,6]])

print(c)
print("~~~~~~~~~~~~~~~~~~~~~")

### Print shape of c
print("NumPy array 'c' shape: ", c.shape)
print("~~~~~~~~~~~~~~~~~~~~~")

### Print some values in c
print("Some values from the NumPy array 'c': ", c[0,1], c[1,0], c[2,0], c[2,1])

In [None]:
### Using NumPy we can easily create arrays of the given size with all zeros
### Here is an example of 2x3 zero array 
d = np.zeros((2,3))

print("All zeros 2x3 array: ")
print("~~~~~~~~~~~~~~~~~~~~~")
print(d) 

In [None]:
### We can also create an array with all ones: a 4 rows by 2 columns array of all ones
e = np.ones((4,2))

print("All ones 4x2 array: ")
print("~~~~~~~~~~~~~~~~~~~~~")
print(e)

In [None]:
### Or array with specified constant number, for example 2x5 array with all '9'
f = np.full((2,5), 9)

print("All 9's 2x5 array: ")
print("~~~~~~~~~~~~~~~~~~~~~")
print(f)

In [None]:
### Finally, we can use NumPy 'random' function to create a 3x3 array with random values
g = np.random.random((3,3))

print("3x3 array with random values: ")
print("~~~~~~~~~~~~~~~~~~~~~")
print(g) # Note, that random values are all between 0 and 1, we will discuss it later

### Array Indexing
Remember how we sliced data frames and lists? We can do the same with NumPy arrays. 

In [None]:
### Create 3x4 array
h = np.array([[1,2,3,4,], [5,6,7,8], [9,10,11,12]])

print("NumPy 3x4 array h:")
print(h)

print("~~~~~~~~~~~~~~~~~~~")
### Slice array to make a 2x2 sub-array
i = h[:2, 1:3]

print("NumPy 2x2 subarray i:")
print(i)

In [None]:
### We can interact with specific element of the NumPy array
combinefg = f[1, 2] + g[1, 0]
print("combinefg = ", combinefg)
print("~~~~~~~~~~~~~~~~~~~")
### Interesting feature: if we modify the slice, we change the parent array as well
### Be careful with that!
print("Initial parent element of array h: ", h[0,1])

i[0,0] = 900
print("~~~~~~~~~~~~~~~~~~~")
### Print to show how modifying the slice also changes the base object
print("Modified parent element of array h: ", h[0,1])
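
If the parent array should stay untouched, one option is to take an explicit copy of the slice. The sketch below uses hypothetical variable names (`parent`, `view`, `independent`) to contrast the two behaviours.

```python
import numpy as np

parent = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = parent[:2, 1:3]                 # a slice: shares memory with parent
independent = parent[:2, 1:3].copy()   # a copy: owns its own memory

independent[0, 0] = -1                 # parent is NOT affected
print("After changing the copy: ", parent[0, 1])

view[0, 0] = 900                       # parent IS affected
print("After changing the slice: ", parent[0, 1])
```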

### Datatypes in Arrays
Here we will see already familiar data types 

In [None]:
### Integer data type
j = np.array([1, 2])
print(j.dtype)  
print("~~~~~~~~~~~~~~~~~~~")

### Float data type
k = np.array([1.0, 2.0])
print(k.dtype)         
print("~~~~~~~~~~~~~~~~~~~")
### We can force Data Type to make sure that Python is storing what we want
l = np.array([1.8, 2.0], dtype = np.int64)
print(l.dtype) # What will be in the first element of the NumPy array 'l'? 1 or 2? 
print("~~~~~~~~~~~~~~~~~~~")

### Let's check
print(l[0]) # Have you guessed correctly?

## Array Math

Basic mathematical functions operate *elementwise* on arrays, which can be very handy! 
These functions are available both as operator overloads and as functions in the numpy module. 

In [None]:
x = np.array([[1,2],[3,4]], dtype = np.float64)
y = np.array([[5,6],[7,8]], dtype = np.float64)

### Elementwise sum can be done in 2 ways; both produce the same array
### [[ 6.0  8.0]
###  [10.0 12.0]]
print("Summation")
print(x + y)
print(np.add(x, y))
print("~~~~~~~~~~~~~~~~~~~")
### Elementwise difference, again two approaches; both produce the same array
### [[-4.0 -4.0]
###  [-4.0 -4.0]]
print("Difference")
print(x - y)
print(np.subtract(x, y))
print("~~~~~~~~~~~~~~~~~~~")
### Same with elementwise product
### [[ 5.0 12.0]
###  [21.0 32.0]]
print("Product")
print(x * y)
print(np.multiply(x, y))
print("~~~~~~~~~~~~~~~~~~~")
### And elementwise division!
### [[ 0.2         0.33333333]
###  [ 0.42857143  0.5       ]]
print("Division")
print(x / y)
print(np.divide(x, y))
print("~~~~~~~~~~~~~~~~~~~")
### What about elementwise square root? There is no dedicated operator for it, so we call np.sqrt; the resulting array should be:
### [[ 1.          1.41421356]
###  [ 1.73205081  2.        ]]
print("Square root")
print(np.sqrt(x))
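
Note that `*` multiplies matching entries and is not the matrix product. As a sketch of the contrast, the same two arrays are multiplied both ways below, using the `np.dot` function mentioned earlier.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

elementwise = x * y      # multiplies matching entries
matrix = np.dot(x, y)    # rows of x times columns of y; same as x @ y

print("Elementwise product")
print(elementwise)
print("Matrix product")
print(matrix)
```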

In [None]:
### We can apply sum function to the array, to its columns and rows

### Compute sum of all elements of the array, answer is "10.0" (the array holds floats)
print(np.sum(x))
print("~~~~~~~~~~~~~~~~~~~")

### Compute sum of each column; prints "[4. 6.]"
print(np.sum(x, axis = 0)) 
print("~~~~~~~~~~~~~~~~~~~")

### Compute sum of each row; prints "[3. 7.]"
print(np.sum(x, axis = 1))

In [None]:
### In the same way we can apply other functions, for example computing average using 'mean'

### Compute mean of all elements; prints "2.5"
print(np.mean(x))
print("~~~~~~~~~~~~~~~~~~~")

### Compute mean of each column; prints "[2. 3.]"
print(np.mean(x, axis = 0)) 
print("~~~~~~~~~~~~~~~~~~~")

### Compute mean of each row; prints "[1.5 3.5]"
print(np.mean(x, axis = 1))

## Lists vs. numpy arrays, and dictionaries
There is a lot to say about these concepts. For more information, follow the links provided or do your own search for the many great resources available on the web. 

### Lists vs numpy arrays
Lists can have multiple datatypes. For example, one element can be a *string*, another can be an *int*, and another a *float*. Lists are defined by using the square brackets: \[ \], with elements separated by commas, ','.
Example:
```
my_list = ['one', 2, 3.14, 'Last']
```
Lists are indexed by position. Remember, in Python, the index starts at *0* and ends at *len(my_list) - 1*. 
To retrieve the first element of the list you call:
```
my_list[0]
```

Numpy arrays (*np.array*) differ from lists in that they contain only _one_ datatype. For example, all the elements might be *ints* or *strings* or *floats* or *object*s. An array is defined by *np.array(object)*, where the input 'object' can be, for example, a *list* or a *tuple*.  
Example:
```
my_array = np.array([1, 10, 100, 1000])
```
or 
```
my_array = np.array((1, 10, 100, 1000))
```

Lists and numpy arrays differ in their speed and memory efficiency. An intuitive reason for this is that python lists have to store the value of each element and also the type of each element (since the types can differ). Whereas numpy arrays only need to store the type once because it is the same for all the elements in the array. 
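
A rough way to see the memory difference is to compare byte counts for the same 1000 integers. This is only a sketch: the exact list figure depends on the Python build, and `sys.getsizeof` is used here as an approximation.

```python
import sys
import numpy as np

n = 1000
numbers_list = list(range(n))
numbers_array = np.arange(n, dtype=np.int64)

# The array stores n values of 8 bytes each in one block
array_bytes = numbers_array.nbytes

# The list stores n references plus a separate Python object for every element
list_bytes = sys.getsizeof(numbers_list) + sum(sys.getsizeof(v) for v in numbers_list)

print("Array bytes: ", array_bytes)
print("List bytes (approximate): ", list_bytes)
```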

You can do calculations with numpy arrays that can't be done on lists.  
Example:
```
my_array/3
```
will return a numpy array, with each of the elements divided by 3. Whereas:
```
my_list/3
```
Will throw an error.

You can append items to the end of lists and numpy arrays, though they have slightly different commands. It is also of note that lists can append an item 'in place', but numpy arrays cannot: *np.append* returns a new array.

```
my_list.append('new item')
my_array = np.append(my_array, 5) # Note: returns a new array; the new element is converted to a common type with the other elements
```
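
The in-place difference can be checked directly. In this sketch `longer_array` is a hypothetical name for the array returned by *np.append*; the original array keeps its length.

```python
import numpy as np

my_list = ['one', 2, 3.14, 'Last']
my_array = np.array([1, 10, 100, 1000])

my_list.append('new item')              # modifies my_list itself
longer_array = np.append(my_array, 5)   # builds and returns a new array

print(len(my_list))
print(my_array.shape, longer_array.shape)
```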

Links to python docs:  
[Lists](https://docs.python.org/3/tutorial/datastructures.html), [arrays](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html), [more on arrays](https://docs.scipy.org/doc/numpy-1.15.0/user/basics.creation.html)

## Dictionary

A Python dictionary is a built-in data type used to store data in key-value pairs. It's a highly versatile and fundamental part of the language, designed for fast lookups.

- **Key-Value Pairs** Each item in a dictionary consists of a _unique key_ and its corresponding _value_.
- **Unordered (Historically)** In Python versions before 3.7, dictionaries were unordered. In modern Python (3.7+), dictionaries maintain the order in which items were inserted.
- **Mutable** You can change, add, and remove items from a dictionary after it has been created.
- **Unique Keys** Keys must be unique within a dictionary. If you assign a value to an existing key, it will overwrite the old value. 

### Syntax of the Dictionaries

Dictionaries are created using curly braces {} with key-value pairs separated by colons :

We will use dictionaries to manipulate pandas Data Frames. Let's practice creating and manipulating them.


In [None]:
# Creating a dictionary for a participant profile with id 101 and store it in dictionary subject
subject = {
    "id": 101,
    "is_a_driver": True,
    "used_modes": ["personal vehicle", "bus"]
}
print(f"\nInitial dictionary: {subject}")
print("~~~~~~~~~~~~~~~~~~~")
# Accessing a value by its key
print(f"\nSubject id: {subject['id']}")

# Adding a new key-value pair
subject["city"] = "New York"
print(f"Subject's city: {subject['city']}")

# Modifying an existing value
subject["is_a_driver"] = False
print(f"Is subject a driver: {subject['is_a_driver']}")

# Removing a key-value pair
del subject["used_modes"]
print("~~~~~~~~~~~~~~~~~~~")
# Printing the final dictionary
print(f"\nFinal dictionary: {subject}")
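
As a sketch of how dictionaries connect to Data Frames, the hypothetical records below are turned into a table, and a second dictionary translates values into readable labels. The names `records`, `people`, and `labels` are made up for this illustration.

```python
import pandas as pd

# Hypothetical records: each dictionary key becomes a column name
records = {
    "id": [101, 102, 103],
    "is_a_driver": [True, False, True],
}
people = pd.DataFrame(records)

# A dictionary can also translate values into readable labels
labels = {True: "driver", False: "non-driver"}
people["status"] = people["is_a_driver"].map(labels)

print(people)
```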

## Importing Another Data Set 
Let's import the data set on households' characteristics to further explore Python functionality

In [None]:
### Store the path to our hhv2pub.csv file (copy the data folder to the same directory as this notebook)
url2 = "csv/hhv2pub.csv"

### Read the .csv file using the `pandas` library and store it as a `pandas` Data Frame
df2 = pd.read_csv(url2)

### Let's now check what data type we stored into `df2`. This code will output the object type
print(type(df2))
pd.set_option('display.max_columns', None)

## Viewing Data
It is always a good idea!

In [None]:
### Now you can view your Data Frame by calling the head() function. By default only 5 rows are displayed. 
### You can change it by typing an integer number in the parentheses
df2.head()

Now, let's answer a few questions about the data utilizing the Python functionality we have discussed.  

- First, how many observations do we have in the household data set?
- Then, what are the data types of the variables HHSIZE and HHVEHCNT?
- Third, identify the percent of households with just one member and the percent of households with one member but more than 1 car.
- Finally, identify the percent of households with more cars than members. 

In [None]:
### Replace all xxxxxx

### How many observations do we have?
print("Shape of the data set is: ", str(df2.shape)) 

### The next line will save the first entry from the shape command, which reflects the number of rows (observations/households)
observ = df2.shape[xxxxxxxxx]

### Now we can print that input stored in the variable 'observ'
print("In this data set we have " + str(xxxxxxxx) + " observations") 

In [None]:
### What are data types of variables HHSIZE and HHVEHCNT?
print("Data type of the variable HHSIZE using dtype function: " + str(df2['HHSIZE'].dtype))

### Repeat for the HHVEHCNT variable below
print("Data type of the variable HHVEHCNT using dtype function: " + xxxxx)


In [None]:
### Now, identify the variable that indicates the size of the household.

### Calculate the average household size of the participating households.
### To do that, you can use sum(xxxxx)/len(xxxxx)
meansize = sum(xxxxxxx)/len(xxxxxxx)

### You can check your answer by calculating the average directly in the data file

### Now we can print the value stored in the variable 'meansize'
print("The average size of participating households is: " + str(xxxxxxx))

In [None]:
### Next question is: What percent of households have just one member and have more than 1 car?
### To answer this question we first need to use the dataframe functions 'groupby' and 'size()' 
### applied to the proper columns of the data set

results1 = df2.groupby([xxxx, xxxx]).size()
print (results1)

In [None]:
### What do you think this command will return?
sum(results1)

In [None]:
### We can store that data in a NumPy array
### This line of code stores the counts for each HHVEHCNT value among households of size 1
onepersonHH = np.array(results1[1])

In [None]:
### Always display results to make sure that you coded what you want
onepersonHH

In [None]:
results1.loc[(1, 0)] # Access the size of the subgroup with one member in the household and 0 cars. 
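
The result of a two-column *groupby* is indexed by pairs of values. The sketch below uses a hypothetical toy table with made-up column names `SIZE` and `CARS` to show how one level or one exact pair can be selected.

```python
import pandas as pd

# Hypothetical toy households: size and vehicle count
toy = pd.DataFrame({
    "SIZE": [1, 1, 1, 2, 2],
    "CARS": [0, 1, 1, 1, 2],
})
group_sizes = toy.groupby(["SIZE", "CARS"]).size()

one_person = group_sizes.loc[1]                # all CARS groups where SIZE == 1
one_person_one_car = group_sizes.loc[(1, 1)]   # a single (SIZE, CARS) group

print(one_person)
print(one_person_one_car)
```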

In [None]:
### It is time to calculate everything needed to answer the questions about one person households
oneperson_zerocar = xxxxxxxxxxxxxxx
oneperson_onecar = xxxxxxxxxxxxxxx
answer1 = (sum(onepersonHH)) * 100 / df2.shape[0]
answer2 = (sum(onepersonHH) - oneperson_zerocar - oneperson_onecar) * 100 / df2.shape[0]

In [None]:
### Output the answer to the questions
print("Percent of households with one person: ", answer1, "%")
print("Percent of households with one person that have more than one car: ", answer2, "%")

One of the powerful tools for exploring the data is the function *describe()*. It provides basic descriptive statistics for selected or all columns. Check whether your calculation of the average household size is correct.

In [None]:
df2.describe()
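
Individual statistics can be read out of the *describe()* table by row and column label. This is a sketch on a hypothetical toy column named `SIZE`, comparing the reported mean with a manual calculation.

```python
import pandas as pd

# Hypothetical household sizes
toy = pd.DataFrame({"SIZE": [1, 2, 2, 3, 4]})

summary = toy.describe()
manual_mean = sum(toy["SIZE"]) / len(toy["SIZE"])

print(summary)
print(summary.loc["mean", "SIZE"], manual_mean)
```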

In [None]:
### If your data may contain wrong inputs, you can use the function unique on a specific column to check
pd.unique(df2["HHSIZE"])

In [None]:
### What about number of vehicles?
pd.unique(df2["HHVEHCNT"])

### Comparing two entries in data set

In [None]:
print("Number of vehicles in the first observation is ", df2.loc[0,'HHVEHCNT'] )
print("Household size of the first observation is ", df2.loc[0,'HHSIZE'] )
print("~~~~~~~~~~~~~~~~~~~~")
print("Is number of household members more than number of vehicles in the first observation?")
print(df2.loc[0,'HHVEHCNT'] < df2.loc[0,'HHSIZE'])

In [None]:
### Small introduction to the function "range"
range(10)

In [None]:
range(df2.shape[0])
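
Displaying a *range* object only shows its start and stop values. To see the numbers it produces, it can be converted to a list or used in a loop, as in this short sketch.

```python
# range(n) produces the integers 0, 1, ..., n - 1 on demand
first_five = list(range(5))
print(first_five)

# range is most often used to drive a loop
total = 0
for k in range(5):
    total += k
print(total)
```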

Now, to know what percent of the households have more cars than household members, we need to create a new column!
Let's make the values in that column follow this logic:

- `-1` if the number of vehicles is larger than the number of residents
- `0` if the numbers are equal
- `1` if the number of residents is larger

How can we do that?

In [None]:
### Let's output the size of the data set
df2.shape

In [None]:
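### Loop over every row and compare the vehicle count with the household size
for element in range(df2.shape[0]):
    if df2.loc[element,'HHVEHCNT'] > df2.loc[element,'HHSIZE']:
        df2.at[element,'VMORER'] = -1   # more vehicles than residents
    elif df2.loc[element,'HHVEHCNT'] < df2.loc[element,'HHSIZE']:
        df2.at[element,'VMORER'] = 1    # more residents than vehicles
    else:
        df2.at[element,'VMORER'] = 0    # equal numbers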

In [None]:
results2 = df2.groupby(['VMORER']).size()
print(results2)
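
The same -1 / 0 / 1 coding can be sketched without an explicit loop by taking the sign of the difference between the two columns. This is an alternative technique, shown on a hypothetical toy table rather than on the survey data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy households
toy = pd.DataFrame({
    "HHSIZE":   [1, 2, 3, 2],
    "HHVEHCNT": [2, 2, 1, 0],
})

# sign(residents - vehicles): -1 more vehicles, 0 equal, 1 more residents
toy["VMORER"] = np.sign(toy["HHSIZE"] - toy["HHVEHCNT"])

print(toy)
```

On large data sets, whole-column operations like this usually run much faster than a row-by-row loop.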

In [None]:
### Here is another way to answer the same question
### First, let's create 3 new columns using logical expressions
df2['VMORER2'] = df2.HHVEHCNT > df2.HHSIZE
df2['RMOREV'] = df2.HHVEHCNT < df2.HHSIZE
df2['VEQR'] = df2.HHVEHCNT == df2.HHSIZE

In [None]:
print(df2.groupby(['VMORER2']).size())
print("*******************")
print(df2.groupby(['RMOREV']).size())
print("*******************")
print(df2.groupby(['VEQR']).size())

It is time to calculate the percent of the households that have more cars than household members.

In [None]:
more_car_count = results2[xxxxx] ### use variable results2 or the new df columns created above to access the number of households that have more cars than household members
hh_count = xxxxxx ### use variable results2 or the new df columns created above to access the total number of households
print("Percent of the households that have more cars than household members: ", xxxxxxx)

# Homework questions

## Question 1

How many females are driving?

In [None]:
### Use groupby to answer the question: How many females are driving?
Q1results = df.groupby(['R_SEX', 'DRIVER']).size() # .size() counts the respondents in each (R_SEX, DRIVER) group
print(Q1results)


R_SEX  DRIVER
-9      1           1
-8      1           7
        2           1
-7     -1          54
        1         145
        2          29
 1     -1         967
        1        6499
        2         688
 2     -1         943
        1        6782
        2         881
dtype: int64


In [9]:
print(Q1results.loc[(2,1)])

6782


In [None]:
### Identify the percent of females who are driving 

totalFemale = sum(Q1results.loc[(2)])
drivingFemale = Q1results.loc[(2,1)]

print("Percent of female respondents who are driving: ", drivingFemale/totalFemale*100,'%')



Percent of female respondents who are driving:  78.80548454566582 %


## Question 2

Calculate the percent of people who are driving to work

In [62]:
### Identify the percent of people who are driving to work using a similar process
### Use the variable WORKER to identify working participants
### and the variable WRKTRANS to identify the mode; include Car, Van, SUV, and Pickup

statue_people = df.groupby(['WORKER','WRKTRANS']).size()
print(statue_people)

driving_worker = statue_people.loc[(1,1)] + statue_people.loc[(1,2)] + statue_people.loc[(1,3)] + statue_people.loc[(1,4)]
print(f"The total number of people who are driving to work is {driving_worker}")

total_people = df.shape[0]
print(f"The total number of people is {total_people}")

print("The percent of people who are driving to work is:", driving_worker/total_people*100 , '%')

WORKER  WRKTRANS
-1      -1          2193
 1      -1          1687
         1          2972
         2           204
         3          1585
         4           758
         7            12
         8            97
         9             7
         10            2
         11           88
         12           49
         13            1
         14           11
         15            2
         16           35
         17            4
         18           77
         19            1
         20          163
         21           25
         22            3
 2      -1          7021
dtype: int64
The total number of people who are driving to work is 5519
The total number of people is 16997
The percent of people who are driving to work is: 32.47043595928693 %


# To submit your homework:

    1 - Run all code cells 

    2 - Answer all questions

    3 - Download as pdf (you can print the page and save it as pdf) 

    4 - Inspect the pdf file (all cells are executed and all questions are answered?)

    5 - If it looks good

            - Rename it as follows: *WA2_first_last*, replacing *first_last* with your first and last names
    
            - Upload pdf to UBLearns/BrightSpace
    
        else
    
            - Fix the issue
    
            - Repeat from step 3
    

<div class="alert alert-block alert-info">
<b>Tip:</b> you may need to install additional libraries to enable the download-to-pdf function, using pip and the command window, specifically: >pyppeteer-install and/or nbconvert[webpdf].
</div>