## [Students] Shopee Code League - Multi-Channel Contacts
Customer Service element
- Solution by Made Y
- Understanding by Hendrik

### Background
Customer service is an important element of the Shopee business, as providing a good service for our customers end-to-end is critical for business growth and brand image. Our goal is to resolve the customer’s issue within the least amount of time while requiring the least amount of customer effort.

One measure of customer effort is the number of times a customer has to approach customer service over a particular issue. This metric is known as the “Repeat Contact Rate” (RCR). Shopee is interested in studying the RCR in order to improve the effectiveness of our customer service.

Customers can contact customer service via various channels such as the livechat function, filling up certain forms or calling in for help. Each time a customer contacts us with a new contact method, a new ticket is automatically generated. A complication arises when the same customer contacts us using different phone numbers or email addresses resulting in multiple tickets for the same issue. Hence, our challenge here is to identify how to merge relevant tickets together to create a complete picture of the customer issue and ultimately determine the RCR.


### Task
- For each ticket, identify all contacts from each user if they have the same contact information.
- For the purpose of this question, assume that all contacts from the same Phone Number / Email are the same user.

### Basic Concepts
- Each Order ID represents a transaction in Shopee.
- Each Id represents the Ticket Id made to Shopee Customer Service.
- All Phone Numbers are stored without the country code and the country code can be ignored.
- Contacts represent the number of times a user reached out to us in that particular ticket (Email, Call, Livechat etc.)
- A value of NA means that the system or agent has no record of that value.
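
To make the task concrete, here is a minimal sketch on a made-up four-ticket dataset that uses the same column names as the real data. The records, the `toy` DataFrame, and the `tickets_by` helper are illustrative assumptions, not part of the competition data or the original solution.

```python
import pandas as pd

# Hypothetical toy records using the same schema as contacts.json
toy = pd.DataFrame([
    {"Id": 0, "Email": "a@x.com", "Phone": "",    "Contacts": 1, "OrderId": ""},
    {"Id": 1, "Email": "",        "Phone": "111", "Contacts": 2, "OrderId": "ORD1"},
    {"Id": 2, "Email": "a@x.com", "Phone": "111", "Contacts": 3, "OrderId": ""},
    {"Id": 3, "Email": "b@x.com", "Phone": "",    "Contacts": 4, "OrderId": ""},
])

def tickets_by(column):
    """Group ticket ids by the non-empty values of one column."""
    non_empty = toy[toy[column] != ""]
    return non_empty.groupby(column)["Id"].apply(list).to_dict()

shared_email = tickets_by("Email")  # tickets 0 and 2 share an email
shared_phone = tickets_by("Phone")  # tickets 1 and 2 share a phone number
print(shared_email)
print(shared_phone)
```

Ticket 2 shares an email with ticket 0 and a phone number with ticket 1, so tickets 0, 1, and 2 belong to the same user even though tickets 0 and 1 have nothing in common directly. Their combined contact count is 1 + 2 + 3 = 6, while ticket 3 stays on its own.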

https://articlearn.id/article/9b7ff55c-shopee-code-league-2021-multi-channel-con/

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_json('contacts.json')

In [3]:
df.head()

Unnamed: 0,Id,Email,Phone,Contacts,OrderId
0,0,gkzAbIy@qq.com,,1,
1,1,,329442681752.0,4,vDDJJcxfLtSfkooPhbYnJdxov
2,2,,9125983679.0,0,
3,3,mdllpYmE@gmail.com,,0,bHquEnCbbsGLqllwryxPsNOxa
4,4,,300364407.0,2,


In [4]:
npdata = df.values

In [5]:
npdata

array([[0, 'gkzAbIy@qq.com', '', 1, ''],
       [1, '', '329442681752', 4, 'vDDJJcxfLtSfkooPhbYnJdxov'],
       [2, '', '9125983679', 0, ''],
       ...,
       [499997, '', '4541459979', 2, 'beXCZSzcHaBwAYoDcpQqjuAFO'],
       [499998, 'RzSDsyH@hotmail.com', '98947185431', 1,
        'ehjeFACGiwrERQxbziMxwOWku'],
       [499999, '', '880053388839', 0, 'JibSBRgzYdfzkzbTuGUXrcvDX']],
      dtype=object)

df.values returns the DataFrame's data as a NumPy array. The row index and column labels are dropped, so only the cell values remain.
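
As a small illustration of what this conversion does, the sketch below builds a tiny DataFrame with a custom index (the `toy_df` name and its contents are made up for this example) and inspects the resulting array.

```python
import pandas as pd

# Hypothetical two-row frame with a custom index and mixed column types
toy_df = pd.DataFrame(
    {"Id": [0, 1], "Email": ["a@x.com", ""], "Contacts": [1, 4]},
    index=["row_a", "row_b"],
)

arr = toy_df.values

print(type(arr).__name__)  # the result is a NumPy ndarray
print(arr.shape)           # rows x columns; index and headers are gone
print(arr.dtype)           # mixed int/str columns fall back to object
print(arr[1, 2])           # fields are addressed by position, not by name
```

The pandas documentation recommends `DataFrame.to_numpy()` for new code; for this use case it should give the same array as `.values`.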

### Approach
- Create empty dictionaries to store all data needed for final submission

In [6]:
memory = {}
connections = {}

Define a function add_memory(ticket_id, value) that records which tickets use each contact value. If the value is not empty, the function adds ticket_id to the set stored under that value in memory, creating a new set the first time the value is seen. Empty values are skipped, so tickets are never linked through missing data.

Then loop over the rows of npdata and register each ticket's Order ID, email, and phone number. Because npdata is a plain NumPy array, it has no column labels. Fields are read by position instead: 0 is Id, 1 is Email, 2 is Phone, 3 is Contacts, and 4 is OrderId.

In [9]:
def add_memory(ticket_id, value):
    if value != "":
        if value in memory:
            memory[value].add(ticket_id)
            return
        memory[value] = {ticket_id}

for i in npdata:
    ticket_id = i[0]
    
    # Order Id
    add_memory(ticket_id, i[4])
    # Email
    add_memory(ticket_id, i[1])
    # Phone
    add_memory(ticket_id, i[2])
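
The same value-to-tickets index can also be built with `collections.defaultdict`, which removes the explicit "already seen" check. The following is a sketch of that alternative, not the original author's code; `build_memory` and `toy_rows` are hypothetical names, and the rows are assumed to follow the column order Id, Email, Phone, Contacts, OrderId.

```python
from collections import defaultdict

def build_memory(rows):
    """Map each non-empty contact value to the set of ticket ids that use it."""
    memory = defaultdict(set)
    for ticket_id, email, phone, _contacts, order_id in rows:
        for value in (order_id, email, phone):
            if value != "":  # skip missing fields
                memory[value].add(ticket_id)
    return dict(memory)

# Hypothetical rows in the same column order as npdata
toy_rows = [
    [0, "a@x.com", "", 1, ""],
    [1, "", "111", 2, "ORD1"],
    [2, "a@x.com", "111", 3, ""],
]

toy_memory = build_memory(toy_rows)
print(toy_memory)
```

Each key is a contact value and each entry is the set of tickets that mention it, which is the same structure that add_memory produces.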

Loop over the groups in memory to trace tickets that share contact information:
- Each entry in memory holds the set of ticket ids (ids) that share one value.
- If a ticket in the group already has a connection, merge (union) that existing connection into the current one.
- set(ids) makes a copy of the group, so the sets stored in memory are not modified, and a set keeps each ticket id only once.
- Finally, point every ticket in the merged group to the same set in the connections dictionary created earlier.

In [10]:
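for ids in memory.values():
    # Start from a copy of the tickets that share this value
    current_connection = set(ids)

    # Pull in any group that one of these tickets already belongs to
    for j in ids:
        if j in connections:
            current_connection.update(connections[j])

    # Point every ticket in the merged group at the same set
    for j in current_connection:
        connections[j] = current_connection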
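
To see how this loop links tickets transitively, the sketch below wraps the same logic in a function and runs it on a small hand-made index. The `trace_connections` name and the `toy_memory` contents are assumptions made for illustration.

```python
def trace_connections(memory):
    """Merge ticket groups that overlap, using the same loop as above."""
    connections = {}
    for ids in memory.values():
        current = set(ids)
        for j in ids:
            if j in connections:
                current.update(connections[j])
        for j in current:
            connections[j] = current
    return connections

# Hypothetical value-to-tickets index
toy_memory = {
    "a@x.com": {0, 2},  # tickets 0 and 2 share an email
    "111": {1, 2},      # tickets 1 and 2 share a phone number
    "b@x.com": {3},     # ticket 3 is on its own
    "ORD9": {1, 4},     # tickets 1 and 4 share an order
}

toy_connections = trace_connections(toy_memory)
for ticket_id in sorted(toy_connections):
    print(ticket_id, sorted(toy_connections[ticket_id]))
```

Tickets 0 and 4 never share a value directly, yet they end up in the same group through tickets 2 and 1. All members of a group reference the same set object, so a later merge reaches every ticket in it.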

- All traces are now recorded.
- The final step is to sum the Contacts values over all tickets in each ticket's trace.

In [11]:
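output = []
for ticket_id, trace in sorted(connections.items()):
    # Assumes each ticket Id equals its row position in npdata; column 3 is Contacts
    contacts = np.sum(npdata[list(trace), 3])
    # Join the sorted ticket ids with "-" to form the trace string
    trace = "-".join([str(_id) for _id in sorted(trace)])
    answer = "{}, {}".format(trace, contacts)
    output.append({"ticket_id": ticket_id,  "ticket_trace/contact": answer})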
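
The summation step can be checked on a small example. The sketch below uses a made-up object array and a hand-written connections dictionary (`toy_npdata` and `toy_connections` are hypothetical), and it assumes, as the cell above does, that each ticket Id equals its row position.

```python
import numpy as np

# Hypothetical rows: Id, Email, Phone, Contacts, OrderId
toy_npdata = np.array(
    [
        [0, "a@x.com", "", 1, ""],
        [1, "", "111", 2, "ORD1"],
        [2, "a@x.com", "111", 3, ""],
        [3, "b@x.com", "", 4, ""],
    ],
    dtype=object,
)

# Hand-written result of the tracing step for the rows above
toy_connections = {0: {0, 1, 2}, 1: {0, 1, 2}, 2: {0, 1, 2}, 3: {3}}

toy_output = []
for ticket_id, trace in sorted(toy_connections.items()):
    rows = sorted(trace)
    # Select the Contacts column for every ticket in the trace and add it up
    contacts = int(np.sum(toy_npdata[rows, 3]))
    answer = "{}, {}".format("-".join(str(r) for r in rows), contacts)
    toy_output.append({"ticket_id": ticket_id, "ticket_trace/contact": answer})

for row in toy_output:
    print(row)
```

If ticket ids did not match row positions, the positional lookup would read the wrong rows, and a dictionary from Id to Contacts would be a safer choice.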

In [12]:
output

[{'ticket_id': 0, 'ticket_trace/contact': '0, 1'},
 {'ticket_id': 1,
  'ticket_trace/contact': '1-2458-98519-115061-140081-165605-476346, 12'},
 {'ticket_id': 2, 'ticket_trace/contact': '2-159312-322639-348955, 4'},
 {'ticket_id': 3, 'ticket_trace/contact': '3, 0'},
 {'ticket_id': 4, 'ticket_trace/contact': '4, 2'},
 {'ticket_id': 5,
  'ticket_trace/contact': '5-50-212533-215197-226720-383605-404324-458692-482810, 15'},
 {'ticket_id': 6, 'ticket_trace/contact': '6-38-32871-142067-236367, 13'},
 {'ticket_id': 7, 'ticket_trace/contact': '7, 1'},
 {'ticket_id': 8, 'ticket_trace/contact': '8-183160-406623, 5'},
 {'ticket_id': 9,
  'ticket_trace/contact': '9-13-16708-33415-343161-417916-468927-484896, 8'},
 {'ticket_id': 10, 'ticket_trace/contact': '10-93270, 7'},
 {'ticket_id': 11, 'ticket_trace/contact': '11-244207, 3'},
 {'ticket_id': 12, 'ticket_trace/contact': '12-160893-480595, 7'},
 {'ticket_id': 13,
  'ticket_trace/contact': '9-13-16708-33415-343161-417916-468927-484896, 8'},
 ...

In [13]:
output_df = pd.DataFrame(output)
output_df

Unnamed: 0,ticket_id,ticket_trace/contact
0,0,"0, 1"
1,1,"1-2458-98519-115061-140081-165605-476346, 12"
2,2,"2-159312-322639-348955, 4"
3,3,"3, 0"
4,4,"4, 2"
...,...,...
499995,499995,"499995, 2"
499996,499996,"499996, 4"
499997,499997,"499997, 2"
499998,499998,"121111-499998, 5"
