In [None]:
import pandas as pd

### Data Types: all objects in Python have a type. You can check an object's type with the `type()` function. Here are a few standard ones:

In [None]:
type(1.5)

In [None]:
type(3)

In [None]:
type('abc')

In [None]:
type(True)

### You can convert between types

In [None]:
float(1)

In [None]:
str(1)

In [None]:
int('9')

In [None]:
int(9.9)
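
Conversions do not always behave the way you might expect. The short sketch below (the variable names are only for illustration) shows that `int()` drops the fractional part of a float rather than rounding, and that a string such as `'9.9'` cannot be passed to `int()` directly.

```python
# int() on a float drops the fractional part (truncates toward zero)
truncated = int(9.9)        # 9, not 10
negative = int(-9.9)        # -9, not -10

# round() is the function to use if you want the nearest whole number
rounded = round(9.9)        # 10

# int() cannot parse a string that contains a decimal point
try:
    int('9.9')
    conversion_failed = False
except ValueError:
    conversion_failed = True

# Converting to float first and then to int does work
via_float = int(float('9.9'))   # 9
```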

### DataFrames also have a type

In [None]:
accidents = pd.read_csv('data/Traffic_Accidents.csv')

In [None]:
type(accidents)

### And each column has a type

In [None]:
accidents.info()

In [None]:
accidents.dtypes

In [None]:
accidents.head()

### One data type you will encounter is a `datetime`

### The `Date and Time` column in the `accidents` DataFrame is read in as an `object` (text) column, but we can convert it to a more useful type, such as a `datetime`

In [None]:
# Convert the 'Date and Time' column to a datetime and assign it back to the same column
accidents['Date and Time'] = pd.to_datetime(accidents['Date and Time'])

# pd.to_datetime will infer the date and time components of each string.
# If the strings are in an unusual format, or you want to be explicit, you can use the 'format' argument
# with the strftime/strptime format codes:
# https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

# This may take a moment to run...
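
As a rough illustration of the `format` argument, here is a minimal sketch. The sample strings are made up and the real file may store its dates differently, so treat the format string as an assumption to adapt rather than the one this dataset needs.

```python
import pandas as pd

# Hypothetical sample strings; the real CSV may use a different layout
raw = pd.Series(['01/02/2019 03:45:00 PM', '12/31/2019 11:59:59 PM'])

# %m month, %d day, %Y four-digit year,
# %I hour on a 12-hour clock, %M minute, %S second, %p AM/PM
parsed = pd.to_datetime(raw, format='%m/%d/%Y %I:%M:%S %p')

# errors='coerce' turns values that do not match into NaT instead of raising an error
messy = pd.Series(['01/02/2019 03:45:00 PM', 'not a date'])
coerced = pd.to_datetime(messy, format='%m/%d/%Y %I:%M:%S %p', errors='coerce')
```

Being explicit about the format avoids ambiguity between month-first and day-first dates.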

In [None]:
# Now the column is a datetime64[ns]
accidents.dtypes

In [None]:
# The values in the Date and Time column look different now
accidents.head()

In [None]:
# And we can see each value is a timestamp
accidents.loc[0, 'Date and Time']

### Once you have a `datetime` object, you can pull out [individual parts](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html)
- Use the `.dt` accessor on a datetime column, followed by the attribute or method you want to pull out
- Pull out the month from the 'Date and Time' column and save it to a new column called 'month'

In [None]:
accidents['month'] = accidents['Date and Time'].dt.month
accidents.head()
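
The same pattern works for other parts of the timestamp. The sketch below uses a small made-up DataFrame (`toy` is a stand-in, not the real `accidents` data) so that the results are easy to check by eye.

```python
import pandas as pd

# Hypothetical toy data standing in for the accidents DataFrame
toy = pd.DataFrame({'Date and Time': pd.to_datetime(
    ['2019-07-04 14:30:00', '2019-12-25 08:05:00'])})

toy['month'] = toy['Date and Time'].dt.month           # attribute: 7, 12
toy['hour'] = toy['Date and Time'].dt.hour             # attribute: 14, 8
toy['day_name'] = toy['Date and Time'].dt.day_name()   # method: note the parentheses
```

Some items under `.dt` are attributes (`month`, `hour`) and some are methods (`day_name()`), so check the linked reference if you get an unexpected result.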

#### What is the maximum number of motor vehicles involved in a single accident in July?
- subset the `accidents` DataFrame to get the July accidents
- find the maximum `Number of Motor Vehicles` for accidents that happened in July


In [None]:
july_accidents = accidents[accidents['month']==7]
july_accidents.head()

In [None]:
july_accidents['Number of Motor Vehicles'].max()

In [None]:
july_accidents.nlargest(1, 'Number of Motor Vehicles')

In [None]:
# How many accidents happened in December?
(accidents['month']==12).sum()
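
Counting with a boolean mask works because `True` counts as 1 when summed. If you want the count for every month at once, `value_counts()` is one option. The sketch below uses made-up rows (`toy` is hypothetical) to show both approaches.

```python
import pandas as pd

# Hypothetical toy data: two December rows and one July row
toy = pd.DataFrame({'Date and Time': pd.to_datetime(
    ['2019-12-01', '2019-12-15', '2019-07-04'])})
toy['month'] = toy['Date and Time'].dt.month

# Boolean mask: each True counts as 1, so .sum() counts the matching rows
december_count = (toy['month'] == 12).sum()

# value_counts() counts the rows for every month; sort_index() orders by month number
per_month = toy['month'].value_counts().sort_index()
```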

### There are [many different attributes associated with datetimes](https://towardsdatascience.com/working-with-datetime-in-pandas-dataframe-663f7af6c587)

In [None]:
accidents['Date and Time'].dt.time.head()

In [None]:
accidents['Date and Time'].dt.date.head()

In [None]:
accidents['Date and Time'].dt.weekday.head()

In [None]:
accidents['Date and Time'].dt.is_leap_year.head()

### You can use comparison symbols on `datetime` objects as well

In [None]:
# How many accidents happened before March 3, 2019?
(accidents['Date and Time'] < '03/03/2019').sum()

# Note: The comparison value can be given as a string (a pd.Timestamp also works).
# The string format can vary and pandas will attempt to infer it.
# Try putting in different formats and rerunning this cell.
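
A comparison value does not have to be a string. The sketch below, on a few made-up dates, compares against both a string and a `pd.Timestamp`, and uses `between()` for a date range. The names `dates` and `cutoff` are only for illustration.

```python
import pandas as pd

# Hypothetical dates for illustration
dates = pd.Series(pd.to_datetime(
    ['2019-01-15', '2019-03-02', '2019-03-03', '2019-06-30']))

cutoff = pd.Timestamp('2019-03-03')

# Both comparisons give the same boolean mask; < is strict, so March 3 itself is excluded
before_using_string = (dates < '2019-03-03').sum()
before_using_timestamp = (dates < cutoff).sum()

# between() includes both endpoints by default
in_first_quarter = dates.between('2019-01-01', '2019-03-31').sum()
```

Writing dates as `YYYY-MM-DD` removes any doubt about whether the month or the day comes first.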

### You can also perform calculations on `datetime` objects

In [None]:
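# How long between the 1st and 101st accident?
# Reset the index after sorting so that labels 0 and 100 refer to the 1st and 101st rows in date order
accidents = accidents.sort_values('Date and Time').reset_index(drop=True)
accidents.loc[100, 'Date and Time'] - accidents.loc[0, 'Date and Time']

# The result is a Timedelta, or a change in time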
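
A `Timedelta` can be broken down into more convenient units. The sketch below uses two made-up timestamps, then a short made-up Series to show the gap between consecutive rows with `diff()`.

```python
import pandas as pd

# Hypothetical timestamps for illustration
start = pd.Timestamp('2019-01-01 08:00:00')
end = pd.Timestamp('2019-01-03 20:30:00')

gap = end - start                         # Timedelta of 2 days, 12 hours, 30 minutes
whole_days = gap.days                     # 2 (the days component only)
total_hours = gap.total_seconds() / 3600  # 60.5 (the full duration in hours)

# On a Series, diff() gives the gap between each row and the previous one
times = pd.Series(pd.to_datetime(
    ['2019-01-01 08:00', '2019-01-01 09:30', '2019-01-02 09:30']))
gaps = times.diff()                               # the first value is NaT (no previous row)
hours_between = gaps.dt.total_seconds() / 3600    # NaN, 1.5, 24.0
```

Note that `.days` drops the leftover hours and minutes, while `total_seconds()` keeps the whole duration.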

# End of Instruction

### Use pgAdmin to get the player info for all players; for players who are in the Hall of Fame, also pull their Hall of Fame data.
### Save those results as a .csv and read them into this notebook in the cell below

### Convert the debut and final game info into datetimes

### Find the difference between debut and final game for all players

### Next, compare that difference among all players, Hall of Fame players, and players not in the Hall of Fame
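
One possible shape for these steps is sketched below on a tiny made-up table. The column names (`playerid`, `debut`, `finalgame`, `in_hall_of_fame`) and the rows are assumptions for illustration only; your exported .csv will likely use different names and should be loaded with `pd.read_csv` instead.

```python
import pandas as pd

# Hypothetical rows and column names; replace with pd.read_csv on your exported file
players = pd.DataFrame({
    'playerid': ['a01', 'b02', 'c03'],
    'debut': ['2000-04-03', '1995-06-10', '2010-09-01'],
    'finalgame': ['2010-04-03', '1996-06-10', '2012-09-01'],
    'in_hall_of_fame': [True, False, False],
})

# Convert the text columns to datetimes
players['debut'] = pd.to_datetime(players['debut'])
players['finalgame'] = pd.to_datetime(players['finalgame'])

# Subtracting two datetime columns gives a Timedelta column
players['career_length'] = players['finalgame'] - players['debut']
players['career_days'] = players['career_length'].dt.days

# Compare all players with the two groups
overall_mean_days = players['career_days'].mean()
mean_days_by_group = players.groupby('in_hall_of_fame')['career_days'].mean()
```

The mean is only one way to compare the groups; the median or `describe()` would work in the same place.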