# Overview

Choose whatever language you're most comfortable with to solve these problems.

# Exercise

The ACME Inc. tool supply company manages its operations with three CSV files:

1. `customers.csv` keeps customer information:
    * `id` is a numeric customer id
    * `firstname` is the customer's first name
    * `lastname` is the customer's last name
2. `products.csv` keeps product info:
    * `id` is a numeric product id
    * `name` is the human-readable name
    * `cost` is the product cost in euros
3. `orders.csv` keeps order information:
    * `id` is a numeric order id
    * `customer` is the numeric id of the customer who created the order
    * `products` is a space-separated list of product ids ordered by the customer
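
The `products` column is the only field that needs extra parsing. As a minimal sketch, assuming the header names listed above and using invented sample rows (not ACME's real data), the standard-library `csv` module can read an orders file and split that column into a list of integer ids. The helper name `parse_orders` is hypothetical, and the repeated product id in the sample assumes an order may list the same product more than once.

```python
import csv
import io

# Invented sample rows, for illustration only (not ACME's real data).
ORDERS_CSV = """id,customer,products
0,1,0 2
1,0,1 1 2
"""


def parse_orders(text):
    """Return a list of (order_id, customer_id, [product_id, ...]) tuples."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # str.split() with no argument splits on any run of whitespace.
        product_ids = [int(p) for p in row["products"].split()]
        rows.append((int(row["id"]), int(row["customer"]), product_ids))
    return rows


parsed_orders = parse_orders(ORDERS_CSV)
print(parsed_orders)
```

Repeated ids are kept as they appear, so a product listed twice in one order is counted twice by any later aggregation.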

Manually dealing with those files is hard and error-prone, and they've asked for your help writing some code to make their lives easier.

### Task 3

To evaluate our customers, we need a `customer_ranking.csv` containing the following columns, with rows sorted in descending order by `total_euros`:
* `id` numeric id of the customer
* `firstname` customer first name
* `lastname` customer last name
* `total_euros` total euros this customer has spent on products
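
One possible way to produce these columns, sketched here on invented in-memory data, is to split and explode the `products` column, join each product id to its cost, and sum per customer. This is an illustration under stated assumptions rather than the reference solution. In particular, the task does not say whether customers without orders belong in the file, so this sketch assumes they are kept with a total of 0.

```python
import io

import pandas as pd

# Invented sample data, for illustration only.
customers = pd.read_csv(io.StringIO(
    "id,firstname,lastname\n0,Ada,Smith\n1,Bob,Jones\n2,Cy,Lee\n"))
products = pd.read_csv(io.StringIO(
    "id,name,cost\n0,hammer,10.5\n1,saw,20.0\n2,drill,99.0\n"))
orders = pd.read_csv(io.StringIO(
    "id,customer,products\n0,1,0 2\n1,0,1 1\n2,1,1\n"))

# One row per ordered product: split the space-separated ids, then explode.
items = orders[["customer", "products"]].copy()
items["product"] = items["products"].str.split()
items = items.explode("product")
items["product"] = items["product"].astype(int)

# Attach each product's cost and sum the costs per customer.
costs = products.rename(columns={"id": "product"})[["product", "cost"]]
spent = (
    items.merge(costs, on="product", how="left")
    .groupby("customer", as_index=False)["cost"]
    .sum()
    .rename(columns={"customer": "id", "cost": "total_euros"})
)

# Assumption: customers with no orders are kept and get a total of 0.
ranking = (
    customers.merge(spent, on="id", how="left")
    .fillna({"total_euros": 0.0})
    .sort_values("total_euros", ascending=False)
    [["id", "firstname", "lastname", "total_euros"]]
)
print(ranking)
```

Switching the last merge to `how="inner"` would instead drop customers who never ordered anything.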

In [1]:
import pandas as pd
from collections import defaultdict

In [2]:
customers = pd.read_csv('customers.csv')
products = pd.read_csv('products.csv')
orders = pd.read_csv('orders.csv')
customers.rename(columns={'id':'customer'}, inplace=True)

In [3]:
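def split_by_space(products, customers):
    # Expand each space-separated product list into one row per (customer, product id).
    orders_values = pd.DataFrame(products.str.split(' ').tolist(), index=customers).stack().reset_index()
    # Drop the positional column created by stack(); keep only customer and product id.
    orders_values.drop('level_1', axis=1, inplace=True)
    orders_values.rename(columns={0:'id'}, inplace=True)
    return orders_values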

def get_total_euros(splitted_data):
    transformed = splitted_data.values.tolist()
    intermediate_euros = defaultdict(list)
    
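    # Rows arrive in column order: customer, firstname, lastname, cost.
    for customer, first, last, euros in transformed:
        # Parse as float so that cents are not truncated.
        intermediate_euros[(customer, first, last)].append(float(euros))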
        
    total_euros = {}
    for key, cost in intermediate_euros.items():
        total_euros[key] = sum(cost)
    return pd.Series(total_euros).reset_index()

In [4]:
products_splitted_by_customer = split_by_space(orders['products'], orders['customer'])

In [5]:
products_splitted_by_customer_with_cost = pd.merge(products_splitted_by_customer.astype(str), products[['id', 'cost']].astype(str), on='id', how='left')

In [6]:
products_splitted_by_customer_with_cost.drop('id', axis=1, inplace=True)
customers_info_with_cost = pd.merge(customers.astype(str), products_splitted_by_customer_with_cost, on='customer', how='inner')

In [7]:
customer_ranking = (
    get_total_euros(customers_info_with_cost)
    .rename(columns={'level_0':'id', 'level_1':'firstname', 'level_2':'lastname', 0:'total_euros'})
    .sort_values(by='total_euros', ascending=False)
)

In [8]:
customer_ranking.to_csv('customer_ranking.csv', index=False)
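
When writing the result, keep in mind that `DataFrame.to_csv` emits the DataFrame index as an extra unnamed first column unless `index=False` is passed. The short sketch below, on invented rows, compares the two header lines. Called without a path, `to_csv` returns the CSV text as a string, which makes the comparison easy.

```python
import pandas as pd

# Invented ranking rows, for illustration only.
ranking = pd.DataFrame(
    {
        "id": [1, 0],
        "firstname": ["Bob", "Ada"],
        "lastname": ["Jones", "Smith"],
        "total_euros": [129.5, 40.0],
    },
    index=[1, 0],
)

# With no path argument, to_csv returns the CSV text as a string.
header_with_index = ranking.to_csv().splitlines()[0]
header_without_index = ranking.to_csv(index=False).splitlines()[0]

print(header_with_index)      # begins with an empty field for the index
print(header_without_index)   # only the four requested columns
```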