# Q1 merge function
Write a SQL query for a report that provides FirstName, LastName, City and State for each person in the Person table, regardless of whether there is an address for that person.
Remark: this is a left join problem, since every person in the left table has to be kept. Don't forget to select the required columns at the end.

In [11]:
import pandas as pd
table_1 = pd.DataFrame({'PersonId':[10, 11], 
                        'FirstName':['Allen', 'Mike'], 
                        'LastName':['Koch', 'Steven'] })
table_2 = pd.DataFrame({'Address':['800_street', '700_street'], 
                        'PersonId':[10, 15], 
                        'City':['Columbia', 'Sparta'],
                        'State': ['SC', 'GA']})
# Both tables share the key column name, so on='PersonId' is enough
left_j = pd.merge(table_1, table_2,
                  on='PersonId',
                  how='left')

In [15]:
result = left_j[['FirstName', 'LastName', 'City', 'State']]
result

  FirstName LastName      City State
0     Allen     Koch  Columbia    SC
1      Mike   Steven       NaN   NaN


In [16]:
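# Alternative solution: make PersonId the index of both tables and join on the index.
# Note that inplace=True modifies table_1 and table_2 themselves.
table_1.set_index('PersonId', inplace=True)
table_2.set_index('PersonId', inplace=True)
alternative = pd.merge(table_1, table_2, how='left', left_index=True, right_index=True)
# reset_index() turns PersonId back into a regular column
output = alternative[['FirstName', 'LastName', 'City', 'State']].reset_index()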
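
To make the difference between a left join and an inner join concrete, here is a small sketch on the same two tables. The variable names (`persons`, `addresses`, `left`, `inner`) are only for illustration; `indicator=True` is a standard `pd.merge` option that adds a `_merge` column showing where each row came from.

```python
import pandas as pd

persons = pd.DataFrame({'PersonId': [10, 11],
                        'FirstName': ['Allen', 'Mike'],
                        'LastName': ['Koch', 'Steven']})
addresses = pd.DataFrame({'Address': ['800_street', '700_street'],
                          'PersonId': [10, 15],
                          'City': ['Columbia', 'Sparta'],
                          'State': ['SC', 'GA']})

# Left join: every person is kept; missing address fields become NaN.
left = pd.merge(persons, addresses, on='PersonId', how='left', indicator=True)

# Inner join: only persons with a matching address survive.
inner = pd.merge(persons, addresses, on='PersonId', how='inner')

print(left[['FirstName', 'City', '_merge']])
print(len(left), len(inner))
```

Mike has no address, so he shows up as `left_only` in the left join and disappears from the inner join.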

# Q2 drop_duplicates and sort_values
Second highest salary. Only the value is needed. Return None if there is no second highest salary.

Some clarification:
1. According to the discussion board on LeetCode, the ranking is over distinct salaries. If two people share the highest salary, the next lower value counts as the second highest.
2. Hence, 'no second highest' refers to a situation where fewer than two distinct salaries exist, for example when everyone is paid equally.

Notes:
1. ascending needs to be set to False, since we are looking for the second highest salary. This is easy to forget.
2. To handle the null case, one can use a try-except clause.
3. Don't use pd.unique unless you know what you are doing: it returns a NumPy array, so all index information is thrown away!
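
Note 3 can be checked with a short sketch. The sample salaries and variable names below are made up for illustration.

```python
import numpy as np
import pandas as pd

salaries = pd.Series([300, 300, 100, 100], name='Salary')

as_array = pd.unique(salaries)          # NumPy array, no index at all
as_series = salaries.drop_duplicates()  # Series, original labels are kept

print(type(as_array).__name__)
print(list(as_series.index))
```

With `drop_duplicates` the result is still a Series, so Series methods such as `sort_values` and `get` remain available.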

In [128]:
salary = [i*100 for i in range(4)]
salary_table = pd.DataFrame({'Id': list(range(4)), 'Salary':salary})
# salary_table = pd.DataFrame({'Id': list(range(4)), 'Salary':[100, 100, 100, 100]})

unique_list = salary_table['Salary'].drop_duplicates()
try:
    second_highest = unique_list.sort_values(ascending=False).iloc[1]
except IndexError:
    second_highest = None

output_table = pd.DataFrame({'SecondHighestSalary':[second_highest]})

There's a way to avoid the try-except clause: use the Series.get() method, which accepts a default value (None in this case).

This approach has more traps, so be careful.

1. drop_duplicates retains the original index. With salaries 300, 300, 100, 100, the row labelled 1 is dropped, so the remaining labels are 0 and 2. Asking for label 1 then returns None, which is not the behavior we want, since 100 is the second highest. That's why we have to call reset_index().

2. The second trap is get() itself. It looks up by label, like loc, not by position, like iloc. get(1, None) only works here because reset_index() replaces the index with the integers 0, 1, 2, ..., so the position and the label coincide.
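
Both traps can be reproduced with the 300, 300, 100, 100 case. This is a minimal sketch; the variable names are illustrative only.

```python
import pandas as pd

salaries = pd.Series([300, 300, 100, 100])

distinct_desc = salaries.drop_duplicates().sort_values(ascending=False)

# Trap: the labels are still 0 and 2, so label 1 does not exist
# and get() falls back to the default.
without_reset = distinct_desc.get(1, None)

# After reset_index the labels are 0 and 1, so label 1 is the second row.
with_reset = distinct_desc.reset_index(drop=True).get(1, None)

print(without_reset, with_reset)
```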

In [173]:
salary = [i*100 for i in range(4)]
salary_table = pd.DataFrame({'Id': list(range(4)), 'Salary':salary})
# salary_table = pd.DataFrame({'Id': list(range(4)), 'Salary':[300, 300, 100, 100]})
# salary_table = pd.DataFrame({'Id': list(range(4)), 'Salary':[100, 100, 100, 100]})

unique_list = (salary_table['Salary']
               .drop_duplicates()
               .sort_values(ascending=False)
               .reset_index(drop=True))
second_highest = unique_list.get(1, None)

output_table = pd.DataFrame({'SecondHighestSalary':[second_highest]})
print(output_table)

   SecondHighestSalary
0                  200


# Q3 use get() as a series method
Nth highest salary. Same as Question 2, with 2 replaced by N.

In [185]:
Nth = 3

# salary_table = pd.DataFrame({'Id': list(range(4)), 'Salary':[300, 200, 100, 0]})
salary_table = pd.DataFrame({'Id': list(range(4)), 'Salary':[300, 300, 200, 100]})
# salary_table = pd.DataFrame({'Id': list(range(4)), 'Salary':[100, 100, 100, 100]})

unique_list = (salary_table['Salary']
               .drop_duplicates()
               .sort_values(ascending=False)
               .reset_index(drop=True))

nth_highest = unique_list.get(Nth-1, None)
output_table = pd.DataFrame({'NthHighestSalary':[nth_highest]})
print(output_table)

   NthHighestSalary
0               100
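
The same steps could be wrapped in a small function so that N becomes a parameter. `nth_highest_salary` is a hypothetical helper name, and the guard for `n < 1` is an added assumption about how invalid input should be handled.

```python
import pandas as pd

def nth_highest_salary(salary_table, n):
    """Hypothetical helper: n-th highest distinct salary, or None."""
    if n < 1:
        # Assumption: a non-positive n has no meaningful answer.
        return None
    distinct_desc = (salary_table['Salary']
                     .drop_duplicates()
                     .sort_values(ascending=False)
                     .reset_index(drop=True))
    # Labels are now 0, 1, 2, ..., so label n-1 is the n-th row.
    return distinct_desc.get(n - 1, None)

table = pd.DataFrame({'Id': list(range(4)), 'Salary': [300, 300, 200, 100]})
third = nth_highest_salary(table, 3)
fourth = nth_highest_salary(table, 4)
print(third, fourth)
```

Only three distinct salaries exist in this table, so asking for the fourth returns None.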


# Q4 rank function and dense option
Rank the scores in the dataframe.
1. Use the built-in rank function. By default it assigns rank 1 to the smallest value, so we need to reverse that with ascending=False.
2. As for the tie-breaking options, suppose the second and third positions hold the same value:
    * method='max' gives 1, 3, 3, 4: tied values all get the highest rank in the group.
    * method='min' gives 1, 2, 2, 4: tied values all get the lowest rank in the group.
    * method='first' gives 1, 2, 3, 4: among tied values, whichever comes first gets the lower rank.
    * method='dense' is what we want here because it gives 1, 2, 2, 3. In other words, ties share a rank and the next rank follows without a gap.
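
The four tie-breaking methods can be compared side by side on a small series with one tie. The sample scores are made up for illustration.

```python
import pandas as pd

# The second and third positions hold the same score.
scores = pd.Series([5, 3, 3, 1])

# rank() returns floats, so cast to int for readability.
ranks = {method: scores.rank(ascending=False, method=method).astype(int).tolist()
         for method in ['min', 'max', 'first', 'dense']}

for method, values in ranks.items():
    print(method, values)
```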

In [222]:
import numpy as np  # needed for np.int64 below

score_and_id = pd.DataFrame({'id': range(5), 'score': [3.2,1, 5,1, 0.5]})
score_and_id['id'] = score_and_id.id.astype(np.int64)
score_and_id['rank'] = score_and_id['score'].rank(ascending=False, method='dense').astype(int)
output = score_and_id.drop('id', axis=1).sort_values('rank')
output

   score  rank
2    5.0     1
0    3.2     2
1    1.0     3
3    1.0     3
4    0.5     4


# Q5 

Find the numbers that appear at least three times in a row. Currently this is an ugly fix: just write a custom function that returns a list of such values.

In [306]:
find_consecutive_numbers = pd.DataFrame({'id': range(1,8), 'Num': [1,1,1,2,1,2,2]})

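def find_consecutive(list_, num_of_consecutives=3):
    """Return the values that occur at least num_of_consecutives times in a row."""
    counter = 0       # length of the current run of equal values
    temp = None       # previous value; unused before the first element
    result = []

    for i in list_:
        if counter > 0 and temp == i:
            counter = counter + 1
        else:
            counter = 1   # a new run starts with the current element
        temp = i
        # Append once per run, at the moment it reaches the required length
        if counter == num_of_consecutives:
            result.append(i)
    return result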

cc = find_consecutive(find_consecutive_numbers['Num'])
pd.DataFrame({'ConsecutiveNumber':cc})

   ConsecutiveNumber
0                  1
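
A loop-free alternative is possible with pandas alone. The sketch below swaps in a different technique (labelling runs with `shift` and `cumsum`) rather than reproducing the custom function above, and it also removes repeated values from the result, which is an added assumption. All variable names are illustrative.

```python
import pandas as pd

nums = pd.Series([1, 1, 1, 2, 1, 2, 2])

# A new run starts wherever the value differs from the previous one;
# the cumulative sum of those change points gives each run its own id.
run_id = (nums != nums.shift()).cumsum()

# Length of the run that each element belongs to.
run_length = nums.groupby(run_id).transform('size')

# Keep values from runs of length >= 3, each value reported once.
result = nums[run_length >= 3].drop_duplicates().tolist()
print(result)
```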
