## Generic SQL Questions:

Sample tables used in the solutions below:
Employee (empid integer, mgrid integer, deptid integer, salary integer)
Dept (deptid integer, deptname text)

In [1]:
import pandas as pd
import numpy as np
import string

In [2]:
np.random.seed(1)

In [3]:
## Generate table 
n = 40 # number of employees
m = 6 # number of departments

# employee id
e_id = np.random.choice(np.arange(100,150), n, replace=False)

# manager id
m_id = np.random.choice(np.arange(200,250), n, replace=False)
# set M_ID to 0 for employees who do not manage anybody (80% of the rows)
imid = np.random.choice(n, int(0.8*n), replace=False)
m_id[imid] = 0

# department id
d_id = np.random.choice(np.arange(300,350), m, replace=False)
d_id = np.random.choice(d_id, n, replace=True)

# salary
salary = 500*np.random.choice(np.arange(16,24), n, replace=True)

# department name
d_name = np.random.choice(list(string.ascii_uppercase), m, replace=False)
d_name = np.random.choice(d_name, n, replace=True)

# Create a DataFrame
data = {'E_ID':e_id, 'M_ID':m_id, 'D_ID':d_id, 
        'SALARY':salary, 'D_NAME': d_name}

df = pd.DataFrame(data)
df

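| | E_ID | M_ID | D_ID | SALARY | D_NAME |
|---|---|---|---|---|---|
| 0 | 127 | 0 | 309 | 11500 | Y |
| 1 | 135 | 0 | 312 | 8500 | L |
| 2 | 140 | 243 | 324 | 8000 | U |
| 3 | 138 | 0 | 303 | 10000 | N |
| 4 | 102 | 0 | 303 | 9500 | N |
| 5 | 103 | 0 | 312 | 8000 | Z |
| 6 | 148 | 201 | 309 | 8500 | U |
| 7 | 129 | 0 | 312 | 9000 | N |
| 8 | 146 | 0 | 309 | 10000 | L |
| 9 | 131 | 0 | 335 | 10000 | Z |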
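In the table above, `D_NAME` is drawn independently of `D_ID`, so the same department id (for example 309) appears with several names. The following is a minimal sketch, assuming each department should have exactly one name, of how the two columns could be kept consistent: names are drawn once per department and attached to employees with a merge. The variable names `rng`, `dept`, `emp`, and `names_per_dept` are illustrative, and the generated values will not match the table above.

```python
import string

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # assumption: arbitrary seed, output differs from the notebook
n, m = 40, 6                    # number of employees, number of departments

# one id and one name per department
dept_ids = rng.choice(np.arange(300, 350), m, replace=False)
dept_names = rng.choice(list(string.ascii_uppercase), m, replace=False)
dept = pd.DataFrame({'D_ID': dept_ids, 'D_NAME': dept_names})

# assign each employee a department id, then look the name up via a merge
emp = pd.DataFrame({
    'E_ID': rng.choice(np.arange(100, 150), n, replace=False),
    'D_ID': rng.choice(dept_ids, n, replace=True),
})
emp = emp.merge(dept, on='D_ID', how='left')

# number of distinct names seen for each department id (should be 1 everywhere)
names_per_dept = emp.groupby('D_ID')['D_NAME'].nunique()
```

This also mirrors the two-table layout (Employee and Dept) described at the top of the section.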


#### 1. Find employees who do not manage anybody.



In [4]:
print('Employees who do not manage anybody:')
non_manager = df.loc[df['M_ID'] == 0, 'E_ID']

df_non_manager = pd.DataFrame({'non_manager': non_manager})
print(df_non_manager)

Employees who do not manage anybody:
    non_manager
0           127
1           135
3           138
4           102
5           103
7           129
8           146
9           131
10          132
11          139
14          119
15          142
16          149
17          126
18          122
19          113
20          141
22          145
23          124
24          123
25          104
26          133
28          130
29          110
30          128
31          144
32          134
33          118
34          120
35          125
36          106
39          101


#### 2. Find the departments that have the maximum number of employees. (The solution should handle the case where more than one department has the maximum.) The result should contain only the following information for the selected departments: deptname and count of employees, sorted by deptname.

In [5]:
# unique department names and the number of employees in each
dept, emp_num = np.unique(df['D_NAME'], return_counts=True)
dept_emp_num = pd.DataFrame({'D_NAME': dept, 'emp_num': emp_num})

print('Number of employees in each department:')
dept_emp_num

Number of employees in each department:


| | D_NAME | emp_num |
|---|---|---|
| 0 | L | 10 |
| 1 | N | 7 |
| 2 | S | 7 |
| 3 | U | 4 |
| 4 | Y | 5 |
| 5 | Z | 7 |


In [6]:
# indices of the departments with the maximum number of employees
i = np.flatnonzero(emp_num == emp_num.max())

# names of the departments with the maximum number of employees
# (np.unique in the previous cell returns the names already sorted)
dept_name_max = dept[i]

print('Departments with the maximum number of employees:')
print(dept_name_max)

print('Number of employees in each of these departments:')
print(emp_num.max())

Departments with the maximum number of employees:
['L']
Number of employees in each of these departments:
10
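The same question can also be answered with a pandas `groupby`, which returns the department names and counts together in one table. This is a minimal sketch on hypothetical toy data (the `toy` frame is not the notebook's `df`) chosen so that two departments tie for the maximum.

```python
import pandas as pd

# hypothetical toy data: departments 'A' and 'C' both have 3 employees
toy = pd.DataFrame({
    'E_ID':   [1, 2, 3, 4, 5, 6, 7],
    'D_NAME': ['C', 'A', 'B', 'A', 'C', 'A', 'C'],
})

# number of employees per department
counts = toy.groupby('D_NAME').size().reset_index(name='emp_num')

# keep every department whose count equals the maximum, sorted by name
largest = (
    counts[counts['emp_num'] == counts['emp_num'].max()]
    .sort_values('D_NAME')
    .reset_index(drop=True)
)
```

Comparing against the maximum, rather than taking the first row after sorting, is what keeps all tied departments in the result.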


#### 3. Find the top 3 employees by salary in every department. The result should have deptname, empid, and salary, sorted by deptname and then by salary from high to low.
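One possible pandas approach is to sort by department and descending salary, then keep the first three rows of each group. The sketch below uses hypothetical toy data with the same column names as `df` above; it is one way to answer the question, not necessarily the intended one.

```python
import pandas as pd

# hypothetical toy data: department 'A' has four employees, 'B' has two
toy = pd.DataFrame({
    'E_ID':   [101, 102, 103, 104, 105, 106],
    'D_NAME': ['B', 'A', 'A', 'A', 'A', 'B'],
    'SALARY': [9000, 8000, 11000, 9500, 10000, 8500],
})

top3 = (
    toy.sort_values(['D_NAME', 'SALARY'], ascending=[True, False])
       .groupby('D_NAME')
       .head(3)                      # first 3 rows of each department, in sorted order
       .reset_index(drop=True)[['D_NAME', 'E_ID', 'SALARY']]
)
```

Note that `head(3)` keeps exactly three rows per department, so employees tied on salary at the cutoff are dropped arbitrarily. If ties should all be kept, ranking the salaries within each department and filtering on the rank would be an alternative.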

#### 4. List all employees, their salary, and the salary of the person in their department who earns the most among those earning less than the employee.
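A minimal sketch of one way to do this in pandas: join the table to itself on the department, keep only the pairs where the other salary is strictly lower, and take the maximum of those per employee. The toy data and the column names `SALARY_OTHER` and `NEXT_LOWER_SALARY` are hypothetical. Employees with nobody below them in their department get `NaN`.

```python
import pandas as pd

# hypothetical toy data with a salary tie (101 and 103) in department 'A'
toy = pd.DataFrame({
    'E_ID':   [101, 102, 103, 104, 105],
    'D_NAME': ['A', 'A', 'A', 'B', 'B'],
    'SALARY': [9000, 11000, 9000, 8000, 8500],
})

# pair every employee with every salary in the same department
pairs = toy.merge(toy[['D_NAME', 'SALARY']], on='D_NAME', suffixes=('', '_OTHER'))

# keep only salaries strictly below the employee's own
lower = pairs[pairs['SALARY_OTHER'] < pairs['SALARY']]

# highest of those lower salaries, per employee
next_lower = (
    lower.groupby('E_ID')['SALARY_OTHER']
         .max()
         .rename('NEXT_LOWER_SALARY')
         .reset_index()
)

# left merge so employees with no lower-paid colleague are still listed
result = toy[['E_ID', 'SALARY']].merge(next_lower, on='E_ID', how='left')
```

The self-join grows quadratically with department size, which is fine for a table of this scale; for large tables a sort-based approach per department would likely be cheaper.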