In [None]:
import pandas as pd

### Data Types: All objects in Python have a type. You can check an object's type with the _type()_ function. Here are a few standard ones:

In [None]:
type(1.5)

In [None]:
type(3)

In [None]:
type('abc')

In [None]:
type(True)

### You can convert between types.

In [None]:
float(1)

In [None]:
str(1)

In [None]:
int('9')

In [None]:
int(9.9)
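The last conversion above is worth a closer look, because `int()` truncates rather than rounds. The following is a minimal sketch of that behavior using only built-in functions; the variable names are illustrative and not part of the original notebook.

```python
# int() drops the fractional part instead of rounding
truncated = int(9.9)       # 9
rounded = round(9.9)       # 10, use round() when rounding is what you want
negative = int(-9.9)       # -9, truncation moves toward zero

# int() cannot parse a string that contains a decimal point,
# so convert to float first and then to int
try:
    parsed = int('9.9')
except ValueError:
    parsed = int(float('9.9'))  # 9

print(truncated, rounded, negative, parsed)
```

If a conversion is not possible, Python raises a `ValueError` rather than guessing at a result.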

### DataFrames also have a type

In [None]:
accidents = pd.read_csv('../data/Traffic_Accidents__2019_.csv')

In [None]:
type(accidents)

### And each column has a type

In [None]:
accidents.info()

Notice that quite a few of the columns are of the "object" type. By default, pandas will convert text data into the object datatype.

You can convert between types using the `.astype` method. For example, if we needed to treat the accident number as text instead of an integer, we could use the following:

In [None]:
accidents['Accident Number'] = accidents['Accident Number'].astype(str)

In [None]:
accidents.info()
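To see what `.astype(str)` changes in practice, here is a small sketch on a made-up DataFrame. The column name `id` and its values are hypothetical and are not taken from the accidents file.

```python
import pandas as pd

# Hypothetical identifiers stored as integers
df = pd.DataFrame({'id': [20190001, 20190002]})

# As integers, the values support arithmetic (rarely meaningful for an ID)
total = df['id'].sum()

# As text, the values support string operations such as slicing
as_text = df['id'].astype(str)
prefixes = as_text.str[:4].tolist()   # first four characters of each ID

print(total)
print(prefixes)
```

Treating identifiers as text helps prevent accidental arithmetic on values that are really labels.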

Notice that the `Date and Time` column is currently being treated as an `object`. This makes it difficult to do comparisons or aggregations, for example, between months or days of the week.

Fortunately, we can convert it to a more useful data type, the `datetime` data type.

In order to do this, we can use the `pd.to_datetime` function.

If we don't tell it otherwise, this function will infer the different date and time components of the string. This can be slow, especially when we have a large number of rows of data.

However, we can help it out by being explicit about the format. To do this, you will have to use the datetime format codes: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

In [None]:
accidents['Date and Time'] = pd.to_datetime(accidents['Date and Time'], 
                                            format = '%m/%d/%Y %I:%M:%S %p')
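The format codes used above are the same ones used by Python's built-in `datetime` module, so you can experiment with them on a single string. This is a minimal sketch; the sample string is made up to match the format and is not a value from the data.

```python
from datetime import datetime

# %m = month, %d = day, %Y = four-digit year,
# %I = hour on a 12-hour clock, %M = minute, %S = second, %p = AM/PM
fmt = '%m/%d/%Y %I:%M:%S %p'

# Hypothetical sample value written in that format
parsed = datetime.strptime('07/04/2019 02:30:00 PM', fmt)

print(parsed)        # 2019-07-04 14:30:00
print(parsed.hour)   # 14, because 2 PM is hour 14 on a 24-hour clock
```

If the string does not match the format, `strptime` raises a `ValueError`, which is a quick way to check a format before applying it to a whole column.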

In [None]:
# Now the column is a datetime64[ns]
accidents.dtypes

In [None]:
# The values in the Date and Time column look different now
accidents.head()

In [None]:
# And we can see each value is a timestamp
accidents.loc[0, 'Date and Time']

### Once you have a `datetime` object, you can pull out [individual parts](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html)
- Use `.dt` to specify a datetime attribute/function and then what you want to pull out
- Pull out the month from the 'Date and Time' column and save it to a new column called 'month'

In [None]:
accidents['month'] = accidents['Date and Time'].dt.month
accidents.head()
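The same `.dt` pattern works for other components. Here is a short sketch on a small, made-up Series (the timestamps are hypothetical and not drawn from the accidents data):

```python
import pandas as pd

# Hypothetical sample timestamps
sample = pd.Series(pd.to_datetime(['2019-01-15 08:30:00',
                                   '2019-07-04 17:45:00']))

months = sample.dt.month          # integer month, 1 through 12
hours = sample.dt.hour            # hour of day, 0 through 23
weekdays = sample.dt.dayofweek    # Monday is 0, Sunday is 6

print(months.tolist())     # [1, 7]
print(hours.tolist())      # [8, 17]
print(weekdays.tolist())   # [1, 3]
```

Each attribute returns a Series aligned with the original, so it can be assigned directly to a new column.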

### Now, let's see if we can take advantage of the datetime format to answer some questions.

#### Question 1: What is the maximum number of cars involved in a single accident in July?
- First, subset the `accidents` DataFrame to get the July accidents
- Then, find the maximum `Number of Motor Vehicles` for accidents that happened in July

In [None]:
# Fill in the code here

And if we want to get more information on this accident, we can use the `nlargest` method.

In [None]:
accidents[accidents['month']==7].nlargest(1, 'Number of Motor Vehicles')

#### Question 2: How many total accidents happened in December?

In [None]:
# Fill in the code here

### There are [many different attributes associated with datetimes](https://towardsdatascience.com/working-with-datetime-in-pandas-dataframe-663f7af6c587)

In [None]:
accidents['Date and Time'].dt.time.head()

In [None]:
accidents['Date and Time'].dt.date.head()

In [None]:
accidents['Date and Time'].dt.day_name().head()

In [None]:
accidents['Date and Time'].dt.is_leap_year.head()

### You can use comparison symbols on `datetime` objects as well

In [None]:
# How many accidents happened before March 3?
(accidents['Date and Time'] < '03/03/2019').sum()

# Note: The comparison value can be given as a string.
# The format can vary, and pandas will attempt to infer it.
# Try putting in different formats and rerunning this cell.
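One detail to keep in mind with comparisons is that a date without a time is read as midnight at the start of that day. The sketch below uses made-up timestamps and an explicit `pd.Timestamp` as the cutoff, which is an alternative to passing a string.

```python
import pandas as pd

# Hypothetical sample timestamps
sample = pd.Series(pd.to_datetime(['2019-02-28 23:59:00',
                                   '2019-03-03 00:00:00',
                                   '2019-03-10 12:00:00']))

# A date with no time component means midnight at the start of that day
cutoff = pd.Timestamp('2019-03-03')

# Only the first value is strictly before the cutoff
before = (sample < cutoff).sum()

# Combine two comparisons to select a range of dates
in_range = ((sample >= pd.Timestamp('2019-03-01')) &
            (sample < pd.Timestamp('2019-03-05'))).sum()

print(before)     # 1
print(in_range)   # 1
```

Using `<` with the day after the end of a range is a common way to include every time on the final day.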

### You can also perform calculations on `datetime` objects

The difference between two datetime objects is a [Timedelta](https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html).

In [None]:
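# How long between the 1st and 101st accident?
# After sorting, select by position with .iloc rather than by label with .loc,
# because the original index labels no longer match the sorted order.
accidents = accidents.sort_values('Date and Time')
accidents['Date and Time'].iloc[100] - accidents['Date and Time'].iloc[0]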

# It appears as a Timedelta, or a change in time
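A Timedelta can be broken into parts or converted to a single unit. The following is a minimal sketch with two made-up timestamps; the values are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
import pandas as pd

# Hypothetical start and end times, 2.5 days apart
start = pd.Timestamp('2019-01-01 00:00:00')
end = pd.Timestamp('2019-01-03 12:00:00')

gap = end - start                      # a Timedelta

whole_days = gap.days                  # 2, the whole-day part only
seconds = gap.total_seconds()          # 216000.0, the full duration in seconds
hours = gap / pd.Timedelta(hours=1)    # 60.0, dividing by a unit gives a float

print(gap)
print(whole_days, seconds, hours)
```

Note that `.days` ignores the leftover 12 hours, so `total_seconds()` or division by a unit is usually the safer choice when you need the full duration.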