# Total Aggregations
We can perform aggregations directly on a DataFrame, or after grouping it by one or more keys. Commonly used aggregate functions are **count**, **sum**, **avg**, **min**, and **max**.

# Aggregate data using **groupBy**

We typically use **groupBy** to group data by one or more keys. **rollup** and **cube** group data in a similar way and additionally produce subtotal rows.

To perform *aggregations* on the grouped data we typically use count, sum, avg, min, and max.

To give the aggregated fields aliases, pass the aggregate expressions to **agg** after **groupBy**.

**Example**

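    from pyspark.sql.functions import avg, concat, count, lit, lpad, sum

    # Daily flight count and departure delay for flights that were not cancelled
    airtraffic. \
        filter('Cancelled = 0'). \
        groupBy(concat("year",
                       lpad("Month", 2, "0"),
                       lpad("DayOfMonth", 2, "0")
                      ).alias("FlightDate")
               ). \
        agg(
            count(lit(1)).alias("FlightCount"),
            sum('DepDelay').alias('TotalDepDelay'),
            avg('DepDelay').alias('AverageDepDelay')
        ). \
        show(31)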
    
    +----------+-----------+-------------+------------------+
    |FlightDate|FlightCount|TotalDepDelay|   AverageDepDelay|
    +----------+-----------+-------------+------------------+
    |  20080120|      18406|     117460.0| 6.381614690861675|
    |  20080130|      19072|     129345.0| 6.781931627516778|
    |  20080115|      19204|      75096.0|3.9104353259737556|
    |  20080118|      20117|     223738.0|11.121837252075359|
    |  20080122|      18716|     303796.0| 16.23188715537508|
    |  20080104|      20160|     277373.0|13.758581349206349|
    |  20080125|      19787|     229850.0|11.616212664880983|
    |  20080102|      20442|     452979.0|22.159230995010272|
    |  20080105|      17610|     306068.0|17.380352072685973|
    |  20080111|      19825|     190918.0|  9.63016393442623|
    |  20080109|      19443|      89595.0| 4.608085172041352|
    |  20080127|      18265|     365491.0|20.010457158499865|
    |  20080101|      18623|     354108.0| 19.01455189819041|
    |  20080128|      19493|     220046.0|11.288462525008978|
    |  20080119|      15373|     155488.0|10.114356339035972|
    |  20080106|      19210|     323214.0| 16.82529932326913|
    |  20080123|      19239|     190807.0| 9.917719216175477|
    |  20080117|      19401|     341271.0|17.590381939075307|
    |  20080116|      19232|      61021.0| 3.172888935108153|
    |  20080112|      16346|      24876.0|1.5218402055548759|
    |  20080103|      20462|     329690.0|16.112305737464567|
    |  20080126|      15860|      92129.0| 5.808890290037831|
    |  20080108|      19140|     200670.0|10.484326018808778|
    |  20080110|      19956|     148603.0| 7.446532371216676|
    |  20080121|      19658|     370196.0| 18.83182419371248|
    |  20080129|      18596|     184855.0| 9.940578619057861|
    |  20080131|      19179|     396280.0|20.662182595547215|
    |  20080124|      19935|     158134.0| 7.932480561825934|
    |  20080107|      19762|     238431.0|12.065124987349458|
    |  20080113|      18587|     101753.0| 5.474417603701512|
    |  20080114|      19267|      98261.0| 5.099963668448643|
    +----------+-----------+-------------+------------------+

# Aggregate data using **rollup**

**Example** 

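    from pyspark.sql.functions import count, date_format, lit

    # rollup adds subtotal rows: one per order_month (order_date is null)
    # and one grand total (order_month and order_date are both null)
    orders. \
        rollup(
            date_format('order_date', 'yyyyMM').alias('order_month'),
            'order_date'
        ). \
        agg(count(lit(1)).alias('order_count')). \
        orderBy('order_month', 'order_date'). \
        show()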
        
        +-----------+--------------------+-----------+
        |order_month|          order_date|order_count|
        +-----------+--------------------+-----------+
        |       null|                null|      68883|
        |     201307|                null|       1533|
        |     201307|2013-07-25 00:00:...|        143|
        |     201307|2013-07-26 00:00:...|        269|
        |     201307|2013-07-27 00:00:...|        202|
        |     201307|2013-07-28 00:00:...|        187|
        |     201307|2013-07-29 00:00:...|        253|
        |     201307|2013-07-30 00:00:...|        227|
        |     201307|2013-07-31 00:00:...|        252|
        |     201308|                null|       5680|
        |     201308|2013-08-01 00:00:...|        246|
        |     201308|2013-08-02 00:00:...|        224|
        |     201308|2013-08-03 00:00:...|        183|
        |     201308|2013-08-04 00:00:...|        187|
        |     201308|2013-08-05 00:00:...|        153|
        |     201308|2013-08-06 00:00:...|        258|
        |     201308|2013-08-07 00:00:...|        203|
        |     201308|2013-08-08 00:00:...|        154|
        |     201308|2013-08-09 00:00:...|        125|
        |     201308|2013-08-10 00:00:...|        270|
        +-----------+--------------------+-----------+
        only showing top 20 rows
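
In the output above, a row with `null` in `order_date` is a monthly subtotal, and the row with `null` in both columns is the grand total. The plain-Python sketch below mimics that counting to make the idea concrete. It is not how Spark implements **rollup**; `rollup_counts` and the sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical (order_month, order_date) pairs, one per order.
orders = [
    ("201307", "2013-07-25"),
    ("201307", "2013-07-25"),
    ("201307", "2013-07-26"),
    ("201308", "2013-08-01"),
]

def rollup_counts(rows):
    """Count rows at every rollup level: full key, each shorter prefix, grand total."""
    counts = Counter()
    for row in rows:
        # Keep the first n columns and replace the rest with None (Spark shows null).
        for n in range(len(row) + 1):
            key = row[:n] + (None,) * (len(row) - n)
            counts[key] += 1
    return counts

counts = rollup_counts(orders)
# Print in the same order as the Spark output: None sorts first within each column.
for key in sorted(counts, key=lambda k: tuple((v is not None, v or "") for v in k)):
    print(key, counts[key])
```

Columns are dropped only from the right, so there is a subtotal per month but no subtotal per `order_date` across months. That hierarchy is what distinguishes **rollup** from **cube**.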

# Aggregate data using **cube**

**Example** 

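    from pyspark.sql.functions import count, date_format, lit, year

    # cube adds subtotal rows for every combination of the grouping columns
    orders. \
        cube(
            year('order_date').alias('order_year'),
            date_format('order_date', 'yyyyMM').alias('order_month'),
            'order_date'
        ). \
        agg(count(lit(1)).alias('order_count')). \
        orderBy('order_year', 'order_month', 'order_date'). \
        show()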
        
        +----------+-----------+--------------------+-----------+
        |order_year|order_month|          order_date|order_count|
        +----------+-----------+--------------------+-----------+
        |      null|       null|                null|      68883|
        |      null|       null|2013-07-25 00:00:...|        143|
        |      null|       null|2013-07-26 00:00:...|        269|
        |      null|       null|2013-07-27 00:00:...|        202|
        |      null|       null|2013-07-28 00:00:...|        187|
        |      null|       null|2013-07-29 00:00:...|        253|
        |      null|       null|2013-07-30 00:00:...|        227|
        |      null|       null|2013-07-31 00:00:...|        252|
        |      null|       null|2013-08-01 00:00:...|        246|
        |      null|       null|2013-08-02 00:00:...|        224|
        |      null|       null|2013-08-03 00:00:...|        183|
        |      null|       null|2013-08-04 00:00:...|        187|
        |      null|       null|2013-08-05 00:00:...|        153|
        |      null|       null|2013-08-06 00:00:...|        258|
        |      null|       null|2013-08-07 00:00:...|        203|
        |      null|       null|2013-08-08 00:00:...|        154|
        |      null|       null|2013-08-09 00:00:...|        125|
        |      null|       null|2013-08-10 00:00:...|        270|
        |      null|       null|2013-08-11 00:00:...|        154|
        |      null|       null|2013-08-12 00:00:...|        255|
        +----------+-----------+--------------------+-----------+
        only showing top 20 rows
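
Unlike **rollup**, **cube** produces subtotals for every combination of the grouping columns, which is why the output above contains rows where only `order_date` is set. The plain-Python sketch below mimics that behaviour on a few hypothetical rows. `cube_counts` is an illustrative helper, not part of Spark.

```python
from collections import Counter
from itertools import product

# Hypothetical (order_year, order_month, order_date) triples, one per order.
orders = [
    (2013, "201307", "2013-07-25"),
    (2013, "201307", "2013-07-25"),
    (2013, "201308", "2013-08-01"),
]

def cube_counts(rows):
    """Count rows for every subset of the grouping columns."""
    counts = Counter()
    for row in rows:
        # Each column is either kept or replaced with None (Spark shows null).
        for mask in product([True, False], repeat=len(row)):
            key = tuple(value if keep else None for value, keep in zip(row, mask))
            counts[key] += 1
    return counts

counts = cube_counts(orders)
print(counts[(None, None, None)])          # grand total
print(counts[(None, None, "2013-07-25")])  # subtotal per order_date only
print(counts[(None, "201307", None)])      # subtotal per order_month only
```

With three grouping columns each input row contributes to 2³ = 8 keys, so **cube** can produce many more rows than **rollup** on the same data.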