Hands up, all of you who have a system in your organisation that lets your team enter free-text answers. OK, that's most of you, and I feel for each and every data guru who sits at the end of the database where that information is stored. If you didn't put your hand up, you will have a lot to learn this week!

<img src="https://3.bp.blogspot.com/-YHAinQadPfI/XIePjvWW8LI/AAAAAAAAAKU/d7miB7T_qlYT7vgx53iXQhG0-ImOZ-acACLcBGAs/s400/Notes%2Bstring.JPG" alt="Alternative text" />

I have often wondered whether I would have a career if projects delivering new operational systems took the 'Junk In, Junk Out' rule seriously. Project budget cuts, a lack of data awareness and time constraints combine into a perfect storm of delivery challenges. One of the side effects is felt as soon as the project releases: how is the new system performing, and is it doing what we expected? Welcome to this week's challenge!

This week's input data comes from a small financial services company's contact centre, which has to measure some key statistics such as:

- Number of balances being requested
- Number of statements being asked for
- Number of complaints being raised

<img src="https://1.bp.blogspot.com/-Zy2aa-3ZCps/XIegCmc7gGI/AAAAAAAAAKg/HcemOpq5Ba0_V7DEPN8W3KiDznC8G_D0gCEwYBhgL/s400/Week%2B5%2BInput.JPG" alt="Alternative text" />

We need to know those numbers by Policy Number and Customer ID so we can see who is using the call centre and, hopefully, move them on to our website to self-serve instead. When are our busiest days for the contact types above? The data set we are asking you to create can be loaded into Tableau to get those answers. Create a viz if you want to!

Requirements:

- Input data set
- Create a date per day
- Determine how the customer contacted the company
- Create a separate column for the Policy Number
- Remove contacts that don't have a Policy Number
- Identify whether the contact was about a balance check, getting a statement or raising a complaint (it's all our boss cares about)
- Remove any columns that are no longer required

<img src="https://4.bp.blogspot.com/-jucGdnIJdU8/XIegCoUPYoI/AAAAAAAAAKk/vbhcA4B0erQRI20ejwdBLh7jJ6f9bFiGwCEwYBhgL/s400/Week%2BFive%2BOutput.JPG" alt="Alternative text" />

Output:

- 15 Rows (16 including the column headers)
- 6 Columns
- One row per day, per customer and policy

In [408]:
import pandas as pd
from datetime import datetime, timedelta
import re

# Silence pandas' chained-assignment warning (the default is 'warn')
pd.options.mode.chained_assignment = None

In [409]:
df1 = pd.read_excel('week5input.xlsx', sheet_name='Week commencing 17th June 19')
print(df1)

        Date  Customer ID                                              Notes
0     Monday      29439.0  Called about their policy #4899. Wanted to kno...
1    Tuesday      39822.0  Called regarding policy #4030. Change of Addre...
2    Tuesday      83219.0  Emailed about the recommendation scheme. Wants...
3    Tuesday      27316.0  Called complaining. Had to wait on the line an...
4  Wednesday      12219.0             Email about #2001. Raised a complaint.
5   Thursday      39822.0  Called about policy #4030. Gave the wrong post...
6   Thursday      49291.0  Email asking to Change Address. Change made to...
7     Friday      40201.0  Emailed requesting a new savings account. Poli...


In [410]:
df2 = pd.read_excel('week5input.xlsx', sheet_name='Week commencing 24th June 19')
print(df2)

         Date  Customer ID                                              Notes
0      Monday      72617.0              Emailed. #2080 asking for a statement
1      Monday      39822.0  Call came in correcting the incorrect correcte...
2      Monday      29439.0  Called about their policy #4899. Wanted to kno...
3     Tuesday      29439.0     Called about incorrect balance on policy #4899
4     Tuesday      12219.0  Email about #2001. Complaint about complaint n...
5   Wednesday      34399.0        Email regarding #4002. Statement requested.
6    Thursday      99999.0                                Call. Wrong number.
7    Thursday      99999.0                       Call. Still the wrong number
8    Thursday      99999.0  Email. Asking for the correct number for Pizza...
9      Friday      29439.0  Email about #4899. Customer thanking me for my...
10     Friday      72617.0               Email. Wants to close policy #2080. 


In [411]:
temp_dict = {
    'Monday':0,
    'Tuesday':1,
    'Wednesday':2,
    'Thursday':3,
    'Friday':4
}

In [412]:
print(temp_dict['Monday'])
print(type(temp_dict['Monday']))


0
<class 'int'>


In [413]:
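true_date_array = []
# Monday of the week covered by the first sheet
x = datetime(2019, 6, 17)
# Add each day's offset to the week start to get the calendar date
for record in df1['Date']:
    true_date_array.append(x + timedelta(days=temp_dict[record]))
df1['True Date'] = true_date_array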
print(df1['True Date'])

0   2019-06-17
1   2019-06-18
2   2019-06-18
3   2019-06-18
4   2019-06-19
5   2019-06-20
6   2019-06-20
7   2019-06-21
Name: True Date, dtype: datetime64[ns]
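The loop above can also be written without building a list by hand. The following is a minimal sketch, not the notebook's own method: `sample`, `day_offsets` and `week_start` are names made up for illustration, and the three rows are a hypothetical stand-in for the sheet's `Date` column.

```python
import pandas as pd

# Hypothetical sample mirroring the 'Date' column of the first sheet
sample = pd.DataFrame({'Date': ['Monday', 'Tuesday', 'Friday']})

# Offsets in days from the Monday the week commences on
day_offsets = {'Monday': 0, 'Tuesday': 1, 'Wednesday': 2, 'Thursday': 3, 'Friday': 4}
week_start = pd.Timestamp(2019, 6, 17)

# Map each day name to its offset, then add all offsets to the week start at once
sample['True Date'] = week_start + pd.to_timedelta(sample['Date'].map(day_offsets), unit='D')
print(sample)
```

One difference worth knowing: a day name missing from the dictionary should, as far as I can tell, come out as `NaT` here, whereas the loop would stop with a `KeyError`.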


In [414]:
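true_date_array = []
# Monday of the week covered by the second sheet
x = datetime(2019, 6, 24)
# Same conversion as for the first sheet: week start plus the day's offset
for record in df2['Date']:
    true_date_array.append(x + timedelta(days=temp_dict[record]))
df2['True Date'] = true_date_array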
print(df2['True Date'])

0    2019-06-24
1    2019-06-24
2    2019-06-24
3    2019-06-25
4    2019-06-25
5    2019-06-26
6    2019-06-27
7    2019-06-27
8    2019-06-27
9    2019-06-28
10   2019-06-28
Name: True Date, dtype: datetime64[ns]
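Both cells hard-code the Monday of each week, even though that date is already in the sheet name. Here is a rough sketch of reading it from the name instead. The helper `week_start_from_sheet` is hypothetical, and it assumes every sheet is named like 'Week commencing 17th June 19' with an English month name.

```python
import re
from datetime import datetime


def week_start_from_sheet(sheet_name):
    """Parse e.g. 'Week commencing 17th June 19' into datetime(2019, 6, 17)."""
    # Capture day, month name and two-digit year, skipping the ordinal suffix
    match = re.search(r'(\d{1,2})(?:st|nd|rd|th)\s+(\w+)\s+(\d{2})$', sheet_name)
    if match is None:
        raise ValueError(f'Unrecognised sheet name: {sheet_name!r}')
    day, month, year = match.groups()
    # %B expects a full month name in the current locale (assumed English here)
    return datetime.strptime(f'{day} {month} {year}', '%d %B %y')


print(week_start_from_sheet('Week commencing 17th June 19'))
print(week_start_from_sheet('Week commencing 24th June 19'))
```

With something like this, the same date-building code could be reused for any number of weekly sheets rather than copied per week.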


In [415]:
# Stack both weeks, drop the day-name column and renumber the rows from 0
df = pd.concat([df1, df2])
df.drop(columns=['Date'], inplace=True)
df.reset_index(drop=True, inplace=True)
print(df)

    Customer ID                                              Notes  True Date
0       29439.0  Called about their policy #4899. Wanted to kno... 2019-06-17
1       39822.0  Called regarding policy #4030. Change of Addre... 2019-06-18
2       83219.0  Emailed about the recommendation scheme. Wants... 2019-06-18
3       27316.0  Called complaining. Had to wait on the line an... 2019-06-18
4       12219.0             Email about #2001. Raised a complaint. 2019-06-19
5       39822.0  Called about policy #4030. Gave the wrong post... 2019-06-20
6       49291.0  Email asking to Change Address. Change made to... 2019-06-20
7       40201.0  Emailed requesting a new savings account. Poli... 2019-06-21
8       72617.0              Emailed. #2080 asking for a statement 2019-06-24
9       39822.0  Call came in correcting the incorrect correcte... 2019-06-24
10      29439.0  Called about their policy #4899. Wanted to kno... 2019-06-24
11      29439.0     Called about incorrect balance on policy #48

In [416]:
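# Notes containing 'Call' (which also covers 'Called') are calls;
# every other note is treated as an email
type_array = []
for record in df['Notes']:
    if re.search('Call', record) is not None:
        type_array.append('Call')
    else:
        type_array.append('Email')
df['Type'] = type_array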
print(df)




    Customer ID                                              Notes  True Date  \
0       29439.0  Called about their policy #4899. Wanted to kno... 2019-06-17   
1       39822.0  Called regarding policy #4030. Change of Addre... 2019-06-18   
2       83219.0  Emailed about the recommendation scheme. Wants... 2019-06-18   
3       27316.0  Called complaining. Had to wait on the line an... 2019-06-18   
4       12219.0             Email about #2001. Raised a complaint. 2019-06-19   
5       39822.0  Called about policy #4030. Gave the wrong post... 2019-06-20   
6       49291.0  Email asking to Change Address. Change made to... 2019-06-20   
7       40201.0  Emailed requesting a new savings account. Poli... 2019-06-21   
8       72617.0              Emailed. #2080 asking for a statement 2019-06-24   
9       39822.0  Call came in correcting the incorrect correcte... 2019-06-24   
10      29439.0  Called about their policy #4899. Wanted to kno... 2019-06-24   
11      29439.0     Called a
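The same call-or-email rule can be expressed on the whole column at once. This is only a sketch under the same assumption as the loop (anything that does not mention 'Call' is an email); `sample` and `is_call` are illustrative names, and the notes are shortened examples rather than the full input.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative notes
sample = pd.DataFrame({'Notes': [
    'Called about their policy #4899.',
    'Email about #2001. Raised a complaint.',
    'Call. Wrong number.',
]})

# True where the note mentions a call (plain substring match, no regex)
is_call = sample['Notes'].str.contains('Call', regex=False)

# Everything that is not a call falls through to 'Email', as in the if/else above
sample['Type'] = np.where(is_call, 'Call', 'Email')
print(sample)
```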

In [417]:
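policy_array = []
for record in df['Notes']:
    # str.find returns -1 when there is no '#', and 0 when the note starts with one
    c_index = record.find('#')
    if c_index >= 0:
        # The policy number is the four characters after the '#'
        policy_array.append(record[c_index+1:c_index+5])
    else:
        policy_array.append(None)
df['Policy Number'] = policy_array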
print(df)

    Customer ID                                              Notes  True Date  \
0       29439.0  Called about their policy #4899. Wanted to kno... 2019-06-17   
1       39822.0  Called regarding policy #4030. Change of Addre... 2019-06-18   
2       83219.0  Emailed about the recommendation scheme. Wants... 2019-06-18   
3       27316.0  Called complaining. Had to wait on the line an... 2019-06-18   
4       12219.0             Email about #2001. Raised a complaint. 2019-06-19   
5       39822.0  Called about policy #4030. Gave the wrong post... 2019-06-20   
6       49291.0  Email asking to Change Address. Change made to... 2019-06-20   
7       40201.0  Emailed requesting a new savings account. Poli... 2019-06-21   
8       72617.0              Emailed. #2080 asking for a statement 2019-06-24   
9       39822.0  Call came in correcting the incorrect correcte... 2019-06-24   
10      29439.0  Called about their policy #4899. Wanted to kno... 2019-06-24   
11      29439.0     Called a
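A regular expression is another way to pull out the policy number. The sketch below assumes policy numbers are always four digits after a '#', which holds for every number visible in this data but is not stated anywhere in the brief. `sample` is an illustrative frame, not the real input.

```python
import pandas as pd

# Illustrative notes: two with a policy number, one without
sample = pd.DataFrame({'Notes': [
    'Called about incorrect balance on policy #4899',
    'Emailed. #2080 asking for a statement',
    'Call. Wrong number.',
]})

# Capture the four digits following '#'; notes without a match become NaN,
# so a later dropna(subset=['Policy Number']) still removes them
sample['Policy Number'] = sample['Notes'].str.extract(r'#(\d{4})', expand=False)
print(sample['Policy Number'].tolist())
```

Unlike slicing four characters after the '#', this only accepts digits, so a stray '#' followed by text would be treated as "no policy number".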

In [418]:
# Remove contacts without a policy number and renumber the rows from 0
df.dropna(subset=['Policy Number'], inplace=True)
df.reset_index(drop=True, inplace=True)
print(df)

    Customer ID                                              Notes  True Date  \
0       29439.0  Called about their policy #4899. Wanted to kno... 2019-06-17   
1       39822.0  Called regarding policy #4030. Change of Addre... 2019-06-18   
2       27316.0  Called complaining. Had to wait on the line an... 2019-06-18   
3       12219.0             Email about #2001. Raised a complaint. 2019-06-19   
4       39822.0  Called about policy #4030. Gave the wrong post... 2019-06-20   
5       49291.0  Email asking to Change Address. Change made to... 2019-06-20   
6       40201.0  Emailed requesting a new savings account. Poli... 2019-06-21   
7       72617.0              Emailed. #2080 asking for a statement 2019-06-24   
8       39822.0  Call came in correcting the incorrect correcte... 2019-06-24   
9       29439.0  Called about their policy #4899. Wanted to kno... 2019-06-24   
10      29439.0     Called about incorrect balance on policy #4899 2019-06-25   
11      12219.0  Email about

In [419]:
# One flag column per contact reason, all starting at 0
df['Balance?'] = 0
df['Complaint?'] = 0
df['Statement?'] = 0

# Switch on the first reason each note matches. Searching for 'alance' etc.
# matches whether or not the first letter is capitalised.
for index, row in df.iterrows():
    notes = row['Notes']
    if 'alance' in notes:
        df.loc[index, 'Balance?'] = 1
    elif 'tatement' in notes:
        df.loc[index, 'Statement?'] = 1
    elif 'omplaint' in notes:
        df.loc[index, 'Complaint?'] = 1

print(df)

    Customer ID                                              Notes  True Date  \
0       29439.0  Called about their policy #4899. Wanted to kno... 2019-06-17   
1       39822.0  Called regarding policy #4030. Change of Addre... 2019-06-18   
2       27316.0  Called complaining. Had to wait on the line an... 2019-06-18   
3       12219.0             Email about #2001. Raised a complaint. 2019-06-19   
4       39822.0  Called about policy #4030. Gave the wrong post... 2019-06-20   
5       49291.0  Email asking to Change Address. Change made to... 2019-06-20   
6       40201.0  Emailed requesting a new savings account. Poli... 2019-06-21   
7       72617.0              Emailed. #2080 asking for a statement 2019-06-24   
8       39822.0  Call came in correcting the incorrect correcte... 2019-06-24   
9       29439.0  Called about their policy #4899. Wanted to kno... 2019-06-24   
10      29439.0     Called about incorrect balance on policy #4899 2019-06-25   
11      12219.0  Email about
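The three flags can also be built column by column with a case-insensitive search. Treat this as a sketch rather than a drop-in replacement: it sets each flag independently, so a note mentioning both a balance and a complaint would get two 1s, whereas the loop above only ever sets the first match. `sample` and `keywords` are illustrative names.

```python
import pandas as pd

# Illustrative notes covering each reason, plus one that matches none
sample = pd.DataFrame({'Notes': [
    'Called about incorrect balance on policy #4899',
    'Emailed. #2080 asking for a statement',
    'Email about #2001. Raised a complaint.',
    'Email. Wants to close policy #2080.',
]})

# Output column -> keyword to look for, ignoring case
keywords = {'Balance?': 'balance', 'Statement?': 'statement', 'Complaint?': 'complaint'}

for column, keyword in keywords.items():
    found = sample['Notes'].str.contains(keyword, case=False, regex=False)
    sample[column] = found.astype(int)

print(sample[list(keywords)])
```

Neither version flags a note that only says 'complaining', because that word does not contain 'complaint'. Whether that contact should count as a complaint is a question for the boss.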

In [420]:
df.drop(columns=['Notes'],inplace=True)
print(df)

    Customer ID  True Date   Type Policy Number Balance? Complaint? Statement?
0       29439.0 2019-06-17   Call          4899        1          0          0
1       39822.0 2019-06-18   Call          4030        0          0          0
2       27316.0 2019-06-18   Call          3001        0          0          0
3       12219.0 2019-06-19  Email          2001        0          1          0
4       39822.0 2019-06-20   Call          4030        0          0          0
5       49291.0 2019-06-20  Email          9220        0          0          0
6       40201.0 2019-06-21  Email          6090        0          0          0
7       72617.0 2019-06-24  Email          2080        0          0          1
8       39822.0 2019-06-24   Call          4030        0          0          0
9       29439.0 2019-06-24   Call          4899        1          0          0
10      29439.0 2019-06-25   Call          4899        1          0          0
11      12219.0 2019-06-25  Email          2001     
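Before loading the result into Tableau, it may be worth a quick sanity check against the stated output: one row per day, per customer and policy. The sketch below uses a small hypothetical `sample` frame in place of the real `df`, and also casts the customer IDs from floats to integers, which is my own tidy-up rather than part of the brief.

```python
import pandas as pd

# Hypothetical stand-in for the final data frame
sample = pd.DataFrame({
    'Customer ID': [29439.0, 39822.0, 29439.0],
    'True Date': pd.to_datetime(['2019-06-17', '2019-06-18', '2019-06-24']),
    'Policy Number': ['4899', '4030', '4899'],
})

# Customer IDs were read as floats; cast to int (safe only if none are missing)
sample['Customer ID'] = sample['Customer ID'].astype(int)

# Any repeated day/customer/policy combination would break the expected grain
key = ['True Date', 'Customer ID', 'Policy Number']
has_duplicates = sample.duplicated(subset=key).any()
print(has_duplicates)
```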