## Reproducing the `project` table in Python

* Very similar to the `person` table
* Further work is planned to provide more insight

* Time spent on reproducing the table: 20 minutes
* Date of work: 2018/05/02 (started and finished the same day)

In [64]:
# Data: which fields are included in the table?
# Source: which column is each field collected from?
# Formula: how is each field computed?

In [65]:
%%html
<style>
table { float: left }
</style>

| data | source | formula | remarks |
| ----------------- | --------------------------- | ------- | ------- |
| project_id        | timesheet.project_id        | groupby |   |
| project_name      | timesheet.project_name      | groupby |   |
| organization_name | timesheet.organization_name | groupby |   |
| entry_count       | timesheet.duration          | count   |   |
| total_entry_hours | timesheet.duration          | sum     | in hours |
| avg_entry_hours   | timesheet.duration          | mean    | sum/count, in hours |
| min_entry_hours   | timesheet.duration          | min     | in hours |
| max_entry_hours   | timesheet.duration          | max     | in hours |
| first_entry       | timesheet.start_datetime    | min     |   |
| latest_entry      | timesheet.stop_datetime     | max     |   |
| activity_days     | timesheet.start_datetime, timesheet.stop_datetime | max - min | ceil((latest stop - first start) in days) |
| total_gross       | timesheet.total             | sum     |   |
| total_discount    | timesheet.total_discount    | sum     |   |
| person_count      | timesheet.person_name       | nunique | number of distinct persons |
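
The data/source/formula mapping above can also be written with pandas named aggregation (`groupby(...).agg(name=(column, func))`, available from pandas 0.25), which keeps each output column next to its source column and formula. The sketch below is only an illustration under assumptions: it runs on a small invented DataFrame (the project ids, names, people and amounts are hypothetical stand-ins for `model.timesheet` rows), and it converts `duration` to float hours before aggregating rather than working on Timedelta values.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like model.timesheet; all values are invented.
timesheet = pd.DataFrame({
    "project_id": ["p1", "p1", "p2"],
    "project_name": ["Alpha", "Alpha", "Beta"],
    "organization_name": ["Org", "Org", "Org"],
    "duration": pd.to_timedelta(["02:00:00", "04:00:00", "01:30:00"]),
    "start_datetime": pd.to_datetime(
        ["2018-01-01 09:00", "2018-01-03 09:00", "2018-02-01 10:00"], utc=True),
    "stop_datetime": pd.to_datetime(
        ["2018-01-01 11:00", "2018-01-03 13:00", "2018-02-01 11:30"], utc=True),
    "total": [200.0, 400.0, 150.0],
    "total_discount": [0.0, 10.0, 0.0],
    "person_name": ["Ann", "Bob", "Ann"],
})

# Convert the interval column to float hours once, so every aggregation is numeric.
entries = timesheet.assign(hours=timesheet["duration"].dt.total_seconds() / 3600)

keys = ["project_id", "project_name", "organization_name"]
summary = entries.groupby(keys).agg(
    entry_count=("hours", "count"),
    total_entry_hours=("hours", "sum"),
    min_entry_hours=("hours", "min"),
    max_entry_hours=("hours", "max"),
    first_entry=("start_datetime", "min"),
    latest_entry=("stop_datetime", "max"),
    total_gross=("total", "sum"),
    total_discount=("total_discount", "sum"),
    person_count=("person_name", "nunique"),
)

# avg_entry_hours = sum / count, as in the remarks column.
summary["avg_entry_hours"] = summary["total_entry_hours"] / summary["entry_count"]

# activity_days = latest stop minus first start, rounded up to whole days.
span_days = (summary["latest_entry"] - summary["first_entry"]).dt.total_seconds() / 86400
summary["activity_days"] = np.ceil(span_days).astype(int)

# Round numeric columns, order by gross total, and turn the keys back into columns.
summary = summary.round(2).sort_values("total_gross", ascending=False).reset_index()
print(summary[["project_name", "entry_count", "total_entry_hours",
               "avg_entry_hours", "activity_days", "person_count"]])
```

Converting to hours up front avoids relying on Timedelta support in every groupby reduction; the trade-off is that the hour values are floats from the start instead of being rounded once at the end.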

In [66]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import math
import datetime
import time
import pytz
import dateutil
from dateutil import relativedelta

sns.set(color_codes=True)
sns.set(rc={"figure.figsize": (16, 6)})

In [67]:
# Database connection credentials
import os

user = "postgres"
password = os.environ.get("PGPASSWORD", "")   # read from the environment, not hard-coded in the notebook
host = "localhost"
port = "5432"
database = "heroku-timesheet"

In [68]:
db_string = "postgresql://{user}:{password}@{host}:{port}/{database}".format(user=user, 
                                                                             password=password,
                                                                             host=host,
                                                                             port=port,
                                                                             database=database)
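
The `str.format` call above inserts the password verbatim, so a password containing `@`, `:` or `/` would produce a malformed connection URL. A minimal sketch that percent-encodes the credentials with `urllib.parse.quote_plus`, the approach SQLAlchemy's documentation describes for special characters in passwords; `build_db_url` and the sample password are hypothetical.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host, port, database):
    """Build a PostgreSQL URL, percent-encoding the credentials so that
    characters such as '@', ':' or '/' cannot break the URL structure."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database)


# Hypothetical password containing characters that are special in URLs.
print(build_db_url("postgres", "p@ss:word", "localhost", "5432", "heroku-timesheet"))
```

Passwords without special characters pass through unchanged, so the encoded URL is a drop-in replacement for `db_string`.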

In [69]:
from sqlalchemy import create_engine
engine = create_engine(db_string)
con = engine.connect()

In [70]:
query = """
    SELECT model.timesheet.project_id,
           model.timesheet.project_name,
           model.timesheet.organization_name,
           model.timesheet.duration,
           model.timesheet.start_datetime,
           model.timesheet.stop_datetime,
           model.timesheet.total,
           model.timesheet.total_discount,
           model.timesheet.person_name
    FROM model.timesheet
        """

In [71]:
timesheet = pd.read_sql(query, con)

In [72]:
timesheet.head()

|   | project_id | project_name | organization_name | duration | start_datetime | stop_datetime | total | total_discount | person_name |
| - | ---------- | ------------ | ----------------- | -------- | -------------- | ------------- | ----- | -------------- | ----------- |
| 0 | 471b7612-8a86-4273-9081-86b4684c5e43 | AKQA Project Timesheet | Coderbunker Shanghai | 11:00:00 | 2017-09-20 01:00:00+00:00 | 2017-09-20 12:00:00+00:00 | 3300.0 | 0.0 | Théophile Sandoz |
| 1 | 471b7612-8a86-4273-9081-86b4684c5e43 | AKQA Project Timesheet | Coderbunker Shanghai | 11:00:00 | 2017-09-21 01:00:00+00:00 | 2017-09-21 12:00:00+00:00 | 3300.0 | 0.0 | Théophile Sandoz |
| 2 | 471b7612-8a86-4273-9081-86b4684c5e43 | AKQA Project Timesheet | Coderbunker Shanghai | 11:00:00 | 2017-09-22 01:00:00+00:00 | 2017-09-22 12:00:00+00:00 | 3300.0 | 0.0 | Théophile Sandoz |
| 3 | 471b7612-8a86-4273-9081-86b4684c5e43 | AKQA Project Timesheet | Coderbunker Shanghai | 11:00:00 | 2017-09-25 01:00:00+00:00 | 2017-09-25 12:00:00+00:00 | 3300.0 | 0.0 | Théophile Sandoz |
| 4 | 471b7612-8a86-4273-9081-86b4684c5e43 | AKQA Project Timesheet | Coderbunker Shanghai | 11:00:00 | 2017-09-26 01:00:00+00:00 | 2017-09-26 12:00:00+00:00 | 3300.0 | 0.0 | Théophile Sandoz |
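
The `duration` column holds time spans rather than numbers; the `.total_seconds()` calls in the next cell rely on the values being Timedelta-like. A small sketch of the vectorised conversion to hours through the `.dt` accessor, using sample durations in the same `HH:MM:SS` format (the values are illustrative, not read from the database). A 10-minute entry rounds to 0.17 h, the value that appears as `min_entry_hours` for several projects.

```python
import pandas as pd

# Sample durations in the same format as timesheet.head(); illustrative values.
duration = pd.Series(pd.to_timedelta(["11:00:00", "02:30:00", "00:10:00"]))

# Vectorised conversion of Timedelta values to float hours.
hours = duration.dt.total_seconds() / 3600
print(hours.round(2).tolist())  # [11.0, 2.5, 0.17]
```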


In [73]:
def to_hours(td):
    """Convert a pandas Timedelta to hours, rounded to 2 decimals."""
    return round(td.total_seconds() / 3600, 2)

def summarize_project(group):
    """Aggregate one project's timesheet entries into a row of the project table."""
    duration    = group['duration']
    first_start = group['start_datetime'].min()
    latest_stop = group['stop_datetime'].max()

    row = {}
    row['entry_count']       = duration.count()
    row['total_entry_hours'] = to_hours(duration.sum())
    row['avg_entry_hours']   = to_hours(duration.sum() / duration.count())   # sum/count
    row['min_entry_hours']   = to_hours(duration.min())
    row['max_entry_hours']   = to_hours(duration.max())
    row['first_entry']       = first_start.strftime('%Y/%m/%d')
    row['latest_entry']      = latest_stop.strftime('%Y/%m/%d')
    # Span from the first start to the latest stop, rounded up to whole days
    row['activity_days']     = int(math.ceil((latest_stop - first_start).total_seconds() / 86400))
    row['total_gross']       = round(group['total'].sum(), 2)
    row['total_discount']    = round(group['total_discount'].sum(), 2)
    row['person_count']      = group['person_name'].nunique()   # distinct persons

    # The explicit index fixes the column order of the resulting table
    return pd.Series(row, index=['entry_count', 'total_entry_hours',
                                 'avg_entry_hours', 'min_entry_hours', 'max_entry_hours',
                                 'first_entry', 'latest_entry', 'activity_days',
                                 'total_gross', 'total_discount', 'person_count'])

# A distinct function name keeps the result `project` from shadowing the function
project = (timesheet.groupby(['project_id', 'project_name', 'organization_name'])
                    .apply(summarize_project)
                    .sort_values(by=['total_gross'], ascending=False)
                    .reset_index())
project

|   | project_id | project_name | organization_name | entry_count | total_entry_hours | avg_entry_hours | min_entry_hours | max_entry_hours | first_entry | latest_entry | activity_days | total_gross | total_discount | person_count |
| - | ---------- | ------------ | ----------------- | ----------- | ----------------- | --------------- | --------------- | --------------- | ----------- | ------------ | ------------- | ----------- | -------------- | ------------ |
| 0 | 8bb1a1ca-1aa0-44bb-80c7-4ce188b7c970 | Atlas Project Timesheet | Coderbunker Shanghai | 513 | 1216.3 | 2.37 | 0.17 | 9.0 | 2017/08/16 | 2018/04/15 | 243 | 337007.83 | 0.0 | 16 |
| 1 | 9e7b0e59-57a3-4bf8-a7c5-8fc1edb64080 | DLG Project Timesheet | Coderbunker Shanghai | 803 | 1146.1 | 1.43 | 0.05 | 8.5 | 2017/06/07 | 2018/04/15 | 313 | 333712.08 | 0.0 | 12 |
| 2 | 21d729ae-7dbe-41da-ba03-26e52dc6a853 | Scry.Info Project Timesheet | Coderbunker Shanghai | 352 | 720.77 | 2.05 | 0.17 | 10.5 | 2017/08/31 | 2018/04/13 | 226 | 302303.17 | 0.0 | 12 |
| 3 | 92a97c9d-7fe6-4fea-b49f-9e8aaf38c986 | Coder Bunker x LVMH Project Timesheet / Spend | Coderbunker Shanghai | 284 | 932.05 | 3.28 | 0.17 | 6.5 | 2018/01/15 | 2018/04/15 | 91 | 233912.5 | 0.0 | 5 |
| 4 | cceb001c-d8f4-4cce-8649-d18c7318a637 | EIC/Coderbunker aggregate timesheets | Coderbunker Shanghai | 257 | 394.45 | 1.53 | 0.25 | 7.0 | 2016/10/17 | 2017/06/23 | 250 | 168650.0 | 0.0 | 3 |
| 5 | 471b7612-8a86-4273-9081-86b4684c5e43 | AKQA Project Timesheet | Coderbunker Shanghai | 50 | 464.38 | 9.29 | 4.0 | 12.25 | 2017/07/17 | 2017/09/29 | 75 | 139315.0 | 0.0 | 1 |
| 6 | 883d7f7f-9a86-4fa7-9797-8ed725d623f7 | YeDian (Night+, NightPlus) Project Timesheet | Coderbunker Shanghai | 167 | 358.42 | 2.15 | 0.17 | 12.0 | 2017/06/07 | 2017/12/15 | 192 | 132912.5 | 0.0 | 9 |
| 7 | 01d45c8c-f9fd-4b88-9776-b5dfdf76fac9 | Skycoin Project Timesheet | Coderbunker Shanghai | 99 | 135.7 | 1.37 | 0.25 | 6.5 | 2017/12/23 | 2018/04/15 | 113 | 62465.5 | 0.0 | 4 |
| 8 | e6d0cbd8-d2d4-4007-8daf-178abb921e78 | Kipitapp Project Timesheet | Coderbunker Shanghai | 79 | 148.08 | 1.87 | 0.17 | 10.0 | 2017/11/06 | 2018/04/07 | 153 | 50612.5 | 0.0 | 5 |
| 9 | d6748093-7744-4742-8647-761b2d30f416 | Weflex Project Timesheet | Coderbunker Shanghai | 76 | 74.67 | 0.98 | 0.17 | 5.0 | 2017/07/03 | 2017/08/30 | 59 | 39529.17 | 0.0 | 3 |
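
The output can be cross-checked against the table's definitions. `avg_entry_hours` should match `total_entry_hours / entry_count` up to the 2-decimal rounding. Because `activity_days` rounds the span from the first start to the latest stop up to whole days, it should equal the calendar-date difference between `first_entry` and `latest_entry`, or exceed it by one depending on the times of day. A quick sketch over three rows copied from the table above, assuming the dates were formatted from UTC timestamps as shown in `timesheet.head()`:

```python
import pandas as pd

# Three rows copied from the project table above.
check = pd.DataFrame({
    "project_name": ["Atlas Project Timesheet", "AKQA Project Timesheet",
                     "Weflex Project Timesheet"],
    "entry_count": [513, 50, 76],
    "total_entry_hours": [1216.3, 464.38, 74.67],
    "avg_entry_hours": [2.37, 9.29, 0.98],
    "first_entry": ["2017/08/16", "2017/07/17", "2017/07/03"],
    "latest_entry": ["2018/04/15", "2017/09/29", "2017/08/30"],
    "activity_days": [243, 75, 59],
})

# avg_entry_hours should equal total / count up to rounding to 2 decimals.
avg_ok = (check["total_entry_hours"] / check["entry_count"]
          - check["avg_entry_hours"]).abs() < 0.01

# ceil(latest stop - first start) is either the date difference or one more.
date_diff = (pd.to_datetime(check["latest_entry"], format="%Y/%m/%d")
             - pd.to_datetime(check["first_entry"], format="%Y/%m/%d")).dt.days
days_ok = check["activity_days"].between(date_diff, date_diff + 1)

print(pd.DataFrame({"project_name": check["project_name"],
                    "avg_ok": avg_ok, "days_ok": days_ok}))
```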
