# Uniform dates

After unifying the currencies of your account amounts, you want to add a temporal dimension to your analysis and see how customers have been investing their money, given the size of their account, in each year. The `account_opened` column records when customers opened their accounts and is a good proxy for segmenting customer activity and investment over time.

However, since this data was consolidated from multiple sources, you need to make sure that all dates share the same format. You will do so by converting this column to the `datetime` dtype, letting pandas infer the format and setting any values it cannot parse to missing. The `banking` DataFrame is in your environment and `pandas` is imported as `pd`.

In [7]:
import pandas as pd
import numpy as np
from faker import Faker
import datetime as dt
fake = Faker()
path=r'Z:/'
file='banking_dirty.csv'
banking = pd.read_csv(path+file,index_col = [0])
acct_cur = [fake.random_element(elements=('dollar', 'euro')) for _ in range(len(banking))]
banking['acct_cur']=acct_cur
print(banking.head(),'\n')

    cust_id  birth_date  Age  acct_amount  inv_amount   fund_A   fund_B  \
0  870A9281  1962-06-09   58     63523.31       51295  30105.0   4138.0   
1  166B05B0  1962-12-16   58     38175.46       15050   4995.0    938.0   
2  BFC13E88  1990-09-12   34     59863.77       24567  10323.0   4590.0   
3  F2158F66  1985-11-03   35     84132.10       23712   3908.0    492.0   
4  7A73F334  1990-05-17   30    120512.00       93230  12158.4  51281.0   

    fund_C   fund_D account_opened last_transaction acct_cur  
0   1420.0  15632.0       02-09-18         22-02-19   dollar  
1   6696.0   2421.0       28-02-19         31-10-18   dollar  
2   8469.0   1185.0       25-04-18         02-04-18     euro  
3   6482.0  12830.0       07-11-17         08-11-18   dollar  
4  13434.0  18383.0       14-05-18         19-07-18   dollar   



* Print the header of `account_opened` from the `banking` DataFrame and take a look at the different results.

In [3]:
# Print the header of account_opened
print(banking['account_opened'].head())

0    02-09-18
1    28-02-19
2    25-04-18
3    07-11-17
4    14-05-18
Name: account_opened, dtype: object


# Question

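Take a look at the output. You tried converting the values to datetime using the default `to_datetime()` function without changing any arguments. In the run shown here the call returns a result rather than an error, but the parsing is inconsistent: `28-02-19` is read day-first as `2019-02-28`, while the ambiguous `02-09-18` is read month-first as `2018-02-09`.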

In [4]:
pd.to_datetime(banking['account_opened'])


0    2018-02-09
1    2019-02-28
2    2018-04-25
3    2017-07-11
4    2018-05-14
        ...    
95   2018-05-26
96   2017-04-05
97   2017-08-16
98   2017-09-10
99   2017-01-08
Name: account_opened, Length: 100, dtype: datetime64[ns]
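The output above suggests that ambiguous values are being read month-first. As a minimal sketch, assuming the whole column is day-month-two-digit-year (an assumption based on rows such as `28-02-19`, not something the data source confirms), an explicit `format` removes the ambiguity. The `sample` Series is a hypothetical stand-in built from the first rows of `account_opened`.

```python
import pandas as pd

# Values copied from the head of account_opened (assumed to be DD-MM-YY)
sample = pd.Series(['02-09-18', '28-02-19', '25-04-18', '07-11-17'])

# An explicit format tells pandas which field is the day and which is the month
parsed = pd.to_datetime(sample, format='%d-%m-%y')

print(parsed)
```

With this format, `02-09-18` is read as 2 September 2018 rather than 9 February 2018.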

* Convert the `account_opened` column to `datetime`, making sure the date format is inferred and that values that would raise an error are returned as missing instead.

In [6]:
# Print the header of account_opened
print(banking['account_opened'].head())

# Convert account_opened to datetime
banking['account_opened'] = pd.to_datetime(banking['account_opened'],
                                           # infer_datetime_format = True is deprecated since
                                           # pandas 2.0; the format is now inferred by default
                                           # Return missing value for error
                                           errors = 'coerce')

0   2018-02-09
1   2019-02-28
2   2018-04-25
3   2017-07-11
4   2018-05-14
Name: account_opened, dtype: datetime64[ns]
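To see what `errors = 'coerce'` does when a value cannot be parsed, here is a small sketch on made-up data. The `raw` Series and its malformed entries are hypothetical, and the `%d-%m-%y` format is an assumption about how the column is laid out.

```python
import pandas as pd

# Hypothetical values: two valid DD-MM-YY dates and two malformed entries
raw = pd.Series(['14-05-18', '31-10-18', 'not a date', '45-13-18'])

# Values that do not match the format become NaT instead of raising an error
converted = pd.to_datetime(raw, format='%d-%m-%y', errors='coerce')

print(converted)
# Count how many values were set to missing
print(converted.isna().sum())
```

Counting the missing values after conversion is a quick way to check how many rows failed to parse before continuing the analysis.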


* Extract the year from the amended `account_opened` column and assign it to the `acct_year` column.
* Print the newly created `acct_year` column.

In [9]:
# Print the header of account_opened
print(banking['account_opened'].head())

# Convert account_opened to datetime
banking['account_opened'] = pd.to_datetime(banking['account_opened'],
                                           # infer_datetime_format = True is deprecated since
                                           # pandas 2.0; the format is now inferred by default
                                           # Return missing value for error
                                           errors = 'coerce')

# Get year of account opened
banking['acct_year'] = banking['account_opened'].dt.strftime('%Y')

# Print acct_year
print(banking['acct_year'])

0   2018-02-09
1   2019-02-28
2   2018-04-25
3   2017-07-11
4   2018-05-14
Name: account_opened, dtype: datetime64[ns]
0     2018
1     2019
2     2018
3     2017
4     2018
      ... 
95    2018
96    2017
97    2017
98    2017
99    2017
Name: acct_year, Length: 100, dtype: object
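Note that the `acct_year` output above has dtype `object`, because `strftime` returns strings. If numeric years are more convenient for grouping or arithmetic, the `.dt.year` accessor is one alternative. The sketch below uses a hypothetical `opened` Series and assumes the DD-MM-YY format.

```python
import pandas as pd

# Hypothetical sample of already-parsed opening dates (assumed DD-MM-YY input)
opened = pd.to_datetime(pd.Series(['02-09-18', '28-02-19', '07-11-17']),
                        format='%d-%m-%y')

# strftime returns the year as a string
year_str = opened.dt.strftime('%Y')

# .dt.year returns the year as a number
year_num = opened.dt.year

print(year_str.tolist())
print(year_num.tolist())
```

Either form works for labelling; the numeric one avoids string comparisons when filtering by year.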
