#Chapter 4 
##NumPy

Rachel is very happy with the way you cleaned and structured the employee data last time, but she wants to take things one step further. The sales team wants to track employee performance not only by the revenue each employee generated, but also by the number of calls and the average deal size. For that, Rachel wants you to build a reporting tool. To be able to perform calculations on the data, you decide that a data structure built with NumPy will work best.

### Get the data

First, you need to get the data. You created lists with the relevant information in the last chapter.

📌 Copy and paste the lists "names", "call_numbers", "average_deal_sizes", and "revenues" you created in chapter 3 and assign them to variables.

In [25]:
#Copy the lists "names", "call_numbers", "average_deal_sizes", and "revenues" from chapter 3
names=["Ben","Omer","Karen","Caline","Sue","Bora","Rose","Ellen","Bob","Tylor","Jude"]
call_numbers=[300,10,500,70,100,100,600,800,200,450,80]
average_deal_sizes=[8,6,24,32,5,25,25,40,15,10,12]
revenues=[2400,60,12000,2275,500,770,4000,6000,800,1200,500]

### Importing NumPy

You plan to create a data structure using NumPy.

📌 Import the NumPy library as np.

In [26]:
#Import the NumPy library
import numpy as np

###Creating an initial (base) array

Next, you need to prepare an array in which you will store the values. The initial array will be empty. Since it will hold numerical data, the data type should be integer.

📌 Create an empty array using np.array().

📌 Use the *dtype* parameter to specify the data type.

In [27]:
#Create an empty array with the data type integer
data=np.array([],dtype=int)
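
To confirm what this cell produced, you can inspect the array's attributes. The following is a minimal sketch; `empty_data` is a hypothetical name used here so that the notebook's "data" array stays untouched.

```python
import numpy as np

# Hypothetical stand-in for the notebook's "data" array
empty_data = np.array([], dtype=int)

print(empty_data.dtype)  # an integer type; the exact width depends on the platform
print(empty_data.shape)  # (0,) -> one dimension with zero elements
print(empty_data.size)   # 0
```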

###Functions to add the data

Now that your empty array is ready, you need to transfer the data from the lists to the array. You create 2 functions for this.

1. append_names function: A function that takes the "names" list and adds the index of each name to the "data" array.

  📌 Use a for loop and np.append() to add the indexes.

2. append_performance_measures function: A function that takes one of the remaining lists and adds its sales performance data (number of calls, average deal size, or revenue) to the "data" array.

  📌 Use np.append().

In [28]:
#Define the append_names function
def append_names(names_list):
    global data
    for name in names_list:
        #Look up the index in the list that was passed in, not in the global "names"
        data=np.append(data,names_list.index(name))

In [29]:
#Define the append_performance_measures function
def append_performance_measures(feature_list):
    global data
    data=np.append(data,feature_list)
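
Note that np.append() does not change the array you pass in; it returns a new array. That is why both functions assign the result back to "data". Here is a small sketch with made-up values (`base` and `extended` are illustrative names, not part of the exercise):

```python
import numpy as np

base = np.array([1, 2, 3])
extended = np.append(base, [4, 5])  # returns a new array, "base" is unchanged

print(base.tolist())      # [1, 2, 3]
print(extended.tolist())  # [1, 2, 3, 4, 5]
```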

Call the functions to add the data to the array and print the array and its shape to see the result.

📌 Use the .shape attribute.

In [30]:
#Use append_names and append_performance_measures to add the data

append_names(names)
append_performance_measures(call_numbers)
append_performance_measures(average_deal_sizes)
append_performance_measures(revenues)
#Print the array and its shape to see the result
data, data.shape

(array([    0,     1,     2,     3,     4,     5,     6,     7,     8,
            9,    10,   300,    10,   500,    70,   100,   100,   600,
          800,   200,   450,    80,     8,     6,    24,    32,     5,
           25,    25,    40,    15,    10,    12,  2400,    60, 12000,
         2275,   500,   770,  4000,  6000,   800,  1200,   500]),
 (44,))

###Reshape the array

At the moment, the array is a flat sequence of 44 values, so it is hard to tell which value belongs to which list. A 2D array is easier to work with. The original data consisted of 4 lists with 11 values each, so the "data" array should have 4 rows and 11 columns. Print the result afterwards.

📌 Use the .reshape() method to rearrange the values in the array.


In [31]:
#Use the .reshape() method to change the array structure to 4 rows and 11 columns
data = data.reshape(4,11)

#Print the resulting array and its shape
data, data.shape

(array([[    0,     1,     2,     3,     4,     5,     6,     7,     8,
             9,    10],
        [  300,    10,   500,    70,   100,   100,   600,   800,   200,
           450,    80],
        [    8,     6,    24,    32,     5,    25,    25,    40,    15,
            10,    12],
        [ 2400,    60, 12000,  2275,   500,   770,  4000,  6000,   800,
          1200,   500]]),
 (4, 11))
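
Reshaping only works when the new shape holds exactly as many values as the array contains (here 4 x 11 = 44). The sketch below illustrates this with 44 placeholder values; `flat` and `table` are hypothetical names. Passing -1 for one dimension asks NumPy to work out that dimension itself.

```python
import numpy as np

flat = np.arange(44)         # 44 placeholder values, same size as "data"
table = flat.reshape(4, -1)  # -1 lets NumPy infer the 11 columns
print(table.shape)           # (4, 11)

try:
    flat.reshape(5, 11)      # 55 slots cannot be filled with 44 values
except ValueError as err:
    print("reshape failed:", err)
```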

###Accessing values

Inside the array you can access the values in different ways. 

1. Print each row separately. 

  📌 Write the array name followed by the index of the row you want to access in square brackets.


In [32]:
#Print the name indexes
data[0]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [33]:
#Print the number of calls
data[1]

array([300,  10, 500,  70, 100, 100, 600, 800, 200, 450,  80])

In [34]:
#Print the average deal sizes
data[2]

array([ 8,  6, 24, 32,  5, 25, 25, 40, 15, 10, 12])

In [35]:
#Print the revenues 
data[3]

array([ 2400,    60, 12000,  2275,   500,   770,  4000,  6000,   800,
        1200,   500])

2. Print a specific value.

  📌 Give the index of the row and the column of the value you want to access.
  For example, to get the revenue generated by Ellen, use row index 3 and column index 7. Because indexing starts at 0, this is the 4th row and the 8th column.

In [36]:
#Print the revenue generated by Ellen
data[3,7]

6000
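
You can also use a colon to select a whole column or part of a row. The sketch below uses only the first three employees from the lists above to keep it short; `sample` is a hypothetical array, not the notebook's "data".

```python
import numpy as np

names = ["Ben", "Omer", "Karen"]
sample = np.array([
    [0, 1, 2],          # name indexes
    [300, 10, 500],     # number of calls
    [8, 6, 24],         # average deal sizes
    [2400, 60, 12000],  # revenues
])

col = names.index("Karen")
print(sample[:, col].tolist())   # every row of Karen's column: [2, 500, 24, 12000]
print(sample[1:, col].tolist())  # skip the name index: [500, 24, 12000]
print(sample[3, :2].tolist())    # revenues of the first two employees: [2400, 60]
```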

###Analyzing the data

Great, your array is ready!

The sales team has a formula that they use to calculate the performance score of an employee.


\begin{align}
        \text{Performance} = \frac{\text{Average deal size} \times \text{Revenue}}{\text{Number of calls}}
    \end{align}


📌 Create a function called "calculate_performance" that implements this formula. It should take the employee name as input.

In [37]:
#Define the function calculate_performance
def calculate_performance(employee_name):
    idx=names.index(employee_name)
    number_of_calls=data[1,idx]
    avg_deal_size=data[2,idx]
    revenue=data[3,idx]

    score=(avg_deal_size*revenue)/number_of_calls
    return score

###Try it out

Let's check Ellen's performance score and print the result.

In [38]:
#Use the calculate_performance function to print Ellen's performance score
calculate_performance("Ellen")

300.0
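
To see where this number comes from, you can redo the calculation by hand with Ellen's values from the array (40 average deal size, 6000 revenue, 800 calls). This is only a check in plain Python, not part of the reporting tool:

```python
avg_deal_size = 40
revenue = 6000
number_of_calls = 800

# Same formula as in calculate_performance
score = (avg_deal_size * revenue) / number_of_calls
print(score)  # 300.0
```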

###Calculate the performance of each employee
Now you need to calculate the performance score of each employee and add these scores to a list.

📌 Create an empty list "performance_scores" to hold the scores.

📌 Use a for loop to convert each score to an integer and append it to the list "performance_scores".

In [39]:
#Calculate the performance of each employee
performance_scores=[]
for name in names:
    score=int(calculate_performance(name))
    performance_scores.append(score)
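
As an alternative to the loop, NumPy can apply the formula to all employees at once, because arithmetic on arrays works element by element. The sketch below is one possible variation, not the solution the exercise asks for; `vectorized_scores` is a hypothetical name, and the arrays are rebuilt from the lists so the example runs on its own.

```python
import numpy as np

call_numbers = np.array([300, 10, 500, 70, 100, 100, 600, 800, 200, 450, 80])
average_deal_sizes = np.array([8, 6, 24, 32, 5, 25, 25, 40, 15, 10, 12])
revenues = np.array([2400, 60, 12000, 2275, 500, 770, 4000, 6000, 800, 1200, 500])

# Element-wise formula, then drop the decimals just like int() does
scores = (average_deal_sizes * revenues) / call_numbers
vectorized_scores = scores.astype(int)

print(vectorized_scores.tolist())
# [64, 36, 576, 1040, 25, 192, 166, 300, 60, 26, 75]
```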

###Add the scores to the "data" array

Next, you need to add the scores to your "data" array and print the result.

📌 Use the np.concatenate() function to add the "performance_scores" list to the "data" array.

In [40]:
#Add the "performance_scores" list to the "data" array
data=np.concatenate((data,[performance_scores]),axis=0)

#Print the resulting array
data

array([[    0,     1,     2,     3,     4,     5,     6,     7,     8,
            9,    10],
       [  300,    10,   500,    70,   100,   100,   600,   800,   200,
          450,    80],
       [    8,     6,    24,    32,     5,    25,    25,    40,    15,
           10,    12],
       [ 2400,    60, 12000,  2275,   500,   770,  4000,  6000,   800,
         1200,   500],
       [   64,    36,   576,  1040,    25,   192,   166,   300,    60,
           26,    75]])
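
The extra square brackets around "performance_scores" matter: they turn the list into a single row of a 2D structure, so that it has the same number of dimensions as "data". The sketch below shows the difference on a small made-up table (`table`, `new_row`, and `stacked` are illustrative names):

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])
new_row = [7, 8, 9]

# Wrapping the list in brackets makes it a 1 x 3 row that fits under the table
stacked = np.concatenate((table, [new_row]), axis=0)
print(stacked.shape)  # (3, 3)

try:
    np.concatenate((table, new_row), axis=0)  # 1D list against a 2D array
except ValueError as err:
    print("concatenate failed:", err)
```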

###Find out the best and worst performing employees

Finally, you need to determine the best and worst performing employees.

📌 Use the np.argmax() and np.argmin() functions to find the index of the best and worst performing employees.

In [41]:
#Use np.argmax() and np.argmin() to determine the best and worst performing employees
idx_best_employee=np.argmax(data[4])
idx_worst_employee=np.argmin(data[4])
#Print out the results
print(f"Best performing employee: {names[idx_best_employee]} ")
print(f"Worst performing employee: {names[idx_worst_employee]} ")

Best performing employee: Caline 
Worst performing employee: Sue 
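
If the sales team later asks for a full ranking instead of only the two extremes, np.argsort() could be used in a similar way. The following is a sketch under that assumption; `order` and `ranking` are hypothetical names, and the scores are copied from the last row of the "data" array.

```python
import numpy as np

names = ["Ben", "Omer", "Karen", "Caline", "Sue", "Bora",
         "Rose", "Ellen", "Bob", "Tylor", "Jude"]
performance_scores = np.array([64, 36, 576, 1040, 25, 192, 166, 300, 60, 26, 75])

# argsort returns the indexes from lowest to highest score; reverse for best first
order = np.argsort(performance_scores)[::-1]
ranking = [names[i] for i in order]

print(ranking[:3])  # ['Caline', 'Karen', 'Ellen']
print(ranking[-1])  # Sue
```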
