# DataTalks.Club True Leaderboard Rankings

DataTalks.Club is an awesome organization that's teaching the new generation of Data and Machine Learning Engineers. Although their leaderboard is a gamification of their grading system, it isn't really based on merit alone. 

This is because of how they grade your social media score. For each homework and project, each learner can post their progress to a social media site and get a point for each post. They can do this up to seven times, for each assignment that's submitted.

|           |Post |Total|
|----------:|:---:|:----| 
|Homework 1 |  7  |  7  |
|Homework 2 |  7  | 14  |
|Homework 3 |  7  | 21  |
|Homework 4 |  7  | 28  |
|Homework 5 |  7  | 35  |
|Homework 6 |  7  | 42  |
|Homework 7 |  7  | 49  |
|Homework 8 |  7  | 56  |
|Homework 9 |  7  | 63  |
|Homework 10|  7  | 70  |
|Project 1  |  7  | 77  |
|Project 2  |  7  | 84  |

That's **84 points** that can be gained from simply spamming a quick note with a hashtag for the course. Doesn't seem that hard, but not everyone has accounts in seven different platforms. Also, each platform has different rules for how posts are accepted, so not an trivial task. 

I understand that this is how they are able to provide the classes for free. This gets them a tremendous amount of free advertising. I'm sure that it also gets them new sponsorships and I am simpathetic to their cause. It not only gives companies that are trying to reach new users can get more exposure.

## Case against current scoring system

Here are just a few of the things that I could come up with:

* Posting online quite frankly does not really pertain to what is being learned and is not an indicator of whether or not you're learning the material.
* These classes are not easy and require a tremendous amount of time capital, so it can be disheartening to know that all of your hard work and effort will never truly be recognized unless you spam the maximum amount of social media sites.
* Some are not comfortable being public about what they are doing/learning online.

> Most people would gladly post online because they are thankful for what DataTalks.Club doing.

## New scoring method

What I have done here, is created a way to extract your true score from the leaderboard. I have done this in a very simple way. I capped the social media score at 1 point per assignment and recalculated the totals for the assignments. Finally at the end, a new DataFrame is created with the adjusted scores and honest total score.


> Take heart my brothers and systers, if you've reached the end of the course, you have earned the certificates, no matter your ranking in the leaderboard!

In [1]:
from hashlib import sha1

import pandas as pd

In [2]:
def compute_hash(email):
    return sha1(email.lower().encode('utf-8')).hexdigest()

In [3]:
def ranking(hash: str, file: str) -> pd.DataFrame:
    # create the column names for the homeworks and projects
    homeworks = [f'hw-{str(n).zfill(2)}' for n in range(1, 11)]
    projects = [f'project-{str(n).zfill(2)}' for n in range(1, 3)]
    
    # combine them into a single list
    worksheets = homeworks + projects
    
    # variables to hold our working datarfames and leaderboard
    dfs = {}
    leaderboard = None
    
    # import each sheet from the Excel dataset
    for worksheet in worksheets:
        # the homework and project tabs each use different column names and columns
        if worksheet.startswith('hw'):
            lip = 'learning_in_public'
            column_start = 1
            column_count = -1
        else:
            lip = 'learning_in_public_project_score'
            column_start = 10
            column_count = -2
        
        # import each of the worksheets and recalculate the social media scores
        dfs[worksheet] = pd.read_excel(file, sheet_name=worksheet)
        dfs[worksheet][lip] = dfs[worksheet][lip].apply(lambda x: 1 if x > 1 else 0)
        dfs[worksheet]['total_score'] = dfs[worksheet].iloc[:, column_start:column_count].sum(axis=1)
        
        # create initial leaderboard dataframe and merge the rest into it
        if leaderboard is None:
            leaderboard = dfs[worksheet][['email', 'total_score']].rename(columns={'total_score': worksheet})
        else:
            temp_df = dfs[worksheet][['email', 'total_score']].rename(columns={'total_score': worksheet}) 
            leaderboard = leaderboard.merge(temp_df, how='outer', on=['email'])
    
    # sum up the asignment scores and add to a total_scores column
    leaderboard['total_score'] = leaderboard[worksheets].sum(axis=1)
    
    # sort the total scores in ascending order
    leaderboard.sort_values(by='total_score', ascending=False, inplace=True)
    
    # reset the index and start it from 1
    leaderboard.reset_index(drop=True, inplace=True)
    leaderboard.index += 1
    
    rank = leaderboard[leaderboard['email'] == hash]
    
    display(rank)
    return leaderboard

In [4]:
dataset = 'mlz-2022.xlsx'
email = 'clamytoe@gmail.com'
email_hash = compute_hash(email)
df = ranking(email_hash, dataset)

Unnamed: 0,email,hw-01,hw-02,hw-03,hw-04,hw-05,hw-06,hw-07,hw-08,hw-09,hw-10,project-01,project-02,total_score
11,37ee242cc0136ec47502c8e5af75086a2e9a239b,9.0,7.0,7.0,7.0,7.0,7.0,8.0,8.0,8.0,9.0,35.0,32.0,144.0


In [5]:
df.head()

Unnamed: 0,email,hw-01,hw-02,hw-03,hw-04,hw-05,hw-06,hw-07,hw-08,hw-09,hw-10,project-01,project-02,total_score
1,51d367b74ba52590d37a8bc935a6bc800efa2a21,10.0,9.0,9.0,9.0,9.0,9.0,9.0,9.0,9.0,10.0,36.0,36.0,164.0
2,3196fdabf8908c0b88628b2aa92f4699208ff856,10.0,8.0,9.0,9.0,9.0,9.0,9.0,8.0,9.0,10.0,34.0,36.0,160.0
3,fbd7c94e3b9ad8a87aeac6839d98fb1de8e53a6e,9.0,9.0,8.0,9.0,9.0,9.0,9.0,9.0,9.0,10.0,36.0,32.0,158.0
4,722366d8b29ece9be3a7605363562c7c60d6918e,9.0,8.0,8.0,8.0,9.0,8.0,8.0,9.0,9.0,10.0,35.0,36.0,157.0
5,74850e9d79cd340cd1a26839d14906c826ffb752,9.0,9.0,8.0,9.0,8.0,9.0,9.0,9.0,8.0,10.0,36.0,32.0,156.0


In [6]:
df.tail()

Unnamed: 0,email,hw-01,hw-02,hw-03,hw-04,hw-05,hw-06,hw-07,hw-08,hw-09,hw-10,project-01,project-02,total_score
723,475828a44077bcda33e4ffffcb3e663347fe61ca,4.0,,,,,,,,,,,,4.0
724,3f9e00c927598dc3a649dc0828c65b0a85a4600f,4.0,,,,,,,,,,,,4.0
725,63ce6c3124f9580fe53f80ea1fd6c5d2d241f890,4.0,,,,,,,,,,,,4.0
726,9f6927b68c31c6250a83c036d3d1abbe109e2bac,,3.0,,,,,,,,,,,3.0
727,fe5259a23ffbe03481099a63d2b298bb3163cf6d,,,3.0,,,,,,,,,,3.0


In [7]:
df.shape

(727, 14)