In [1]:
import pandas as pd
import numpy as np

##### *Question:* What is a dataframe?

*Answer:* In the Pandas Python library, a `DataFrame` is a class for storing tabular data in rows and columns.  It is similar to a spreadsheet or SQL table, except that it lives in memory while the script runs.  You can therefore repeat the same spreadsheet-style operations in a script with different input each time.  

A `DataFrame` object can be created by reading input from a CSV file, a SQL table, or another table-like object.
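As a minimal sketch of reading table-like input, the snippet below parses CSV text with `pd.read_csv`. The CSV content and the name `df_csv` are made up for illustration, and `io.StringIO` stands in for a real file so the example needs nothing on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path to pd.read_csv.
csv_text = """Name,Price,Shares,Date
Dove,18.2,30,2017-05-01
Spam,7.2,20,2017-06-11
"""

# read_csv accepts a path or any file-like object, so StringIO works as a stand-in.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)          # (2, 4): two rows, four columns
print(list(df_csv.columns))  # ['Name', 'Price', 'Shares', 'Date']
```

The header row of the CSV becomes the column names, so no separate `columns` assignment is needed here.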

For reference, see the [Python Pandas website](http://pandas.pydata.org/) and the book *'Python for Data Analysis'* by Wes McKinney.

Below is an example of creating a dataframe from a Python list of lists, with one inner list per row.

In [2]:
def get_product_df():
    """Return a small product dataframe used in the examples below."""
    rows = [
        ['Dove', 18.2, 30, '2017-05-01'],
        ['Dove', 23.2, 40, '2017-06-01'],
        ['Dove', 21.4, 32, '2017-06-03'],
        ['Spam', 7.2, 20, '2017-06-11'],
    ]
    return pd.DataFrame(rows, columns=['Name', 'Price', 'Shares', 'Date'])

In [3]:
df = get_product_df()  # get dataframe
print(df)

   Name  Price  Shares        Date
0  Dove   18.2      30  2017-05-01
1  Dove   23.2      40  2017-06-01
2  Dove   21.4      32  2017-06-03
3  Spam    7.2      20  2017-06-11


##### *Question:* What is a lambda?  

*Answer:* A lambda is an anonymous function, meaning it has no name.  You can define it inline to do the same job as a small named function, and pass it anywhere a function object is accepted.  In Python, the body of a lambda is limited to a single expression.  

*Note:* This is a practical definition of a Python lambda, not a theoretical definition used in computer science.  

Below is an example that uses a lambda with `map` to double each element of a list.

In [4]:
ar = [1, 2, 3, 5]
ar2 = map(lambda x: x*2, ar)
print(list(ar2))

[2, 4, 6, 10]
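Two more uses may help make the idea concrete. The sketch below first shows that a lambda behaves like an equivalent named function, then passes a lambda as a sort key. The names `double`, `double_named`, `pairs`, and `by_price` are illustrative only.

```python
# A lambda bound to a name, and an equivalent named function.
double = lambda x: x * 2

def double_named(x):
    return x * 2

print(double(4) == double_named(4))  # True

# A lambda as a sort key: order (name, price) pairs by the price field.
pairs = [('Dove', 18.2), ('Spam', 7.2), ('Dove', 21.4)]
by_price = sorted(pairs, key=lambda p: p[1])
print(by_price[0])  # ('Spam', 7.2)
```

Binding a lambda to a name, as with `double`, is done here only for comparison; when a function needs a name, a regular `def` is usually the clearer choice.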


##### *Question:* How do I filter and aggregate rows in a pandas dataframe?

In [5]:
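df = get_product_df()                       # get dataframe
# Filter by Date. The dates are ISO-formatted strings, so string comparison
# matches chronological order. copy() avoids a SettingWithCopyWarning below.
df = df[df['Date'] > '2017-05-31'].copy()
df['Cost'] = df['Price'] * df['Shares']     # compute Cost, set into new column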
print(df)

   Name  Price  Shares        Date   Cost
1  Dove   23.2      40  2017-06-01  928.0
2  Dove   21.4      32  2017-06-03  684.8
3  Spam    7.2      20  2017-06-11  144.0
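The string comparison above relies on every date being in `YYYY-MM-DD` form. As a hedged alternative, the sketch below converts the column to real datetimes with `pd.to_datetime` before filtering; the frame `df_dates` and the variable `recent` are illustrative names, not part of the example above.

```python
import pandas as pd

# Small illustrative frame with dates stored as strings.
df_dates = pd.DataFrame({
    'Name': ['Dove', 'Dove', 'Spam'],
    'Date': ['2017-05-01', '2017-06-01', '2017-06-11'],
})

# Convert the strings to datetime64 values so comparisons are true date comparisons.
df_dates['Date'] = pd.to_datetime(df_dates['Date'])

# Keep only the rows dated after 31 May 2017.
recent = df_dates[df_dates['Date'] > pd.Timestamp('2017-05-31')]
print(len(recent))  # 2
```

With a datetime column, the filter no longer depends on how the dates were originally written.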


In [6]:
# Group by Name, keep only Shares and Cost, and sum them.
# Selecting the columns first avoids aggregating Price and the non-numeric Date column.
df2 = df.groupby('Name')[['Shares', 'Cost']].agg('sum')
print(df2)

      Shares    Cost
Name                
Dove      72  1612.8
Spam      20   144.0


In [7]:
# Apply both sum and mean to Shares and Cost.
# Selecting the columns first matters here: a mean of the Date strings is not defined.
df2 = df.groupby('Name')[['Shares', 'Cost']].agg(['sum', 'mean'])
print(df2)

     Shares         Cost       
        sum mean     sum   mean
Name                           
Dove     72   36  1612.8  806.4
Spam     20   20   144.0  144.0
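When different columns need different aggregates, named aggregation gives flat, readable column names instead of the two-level header above. This is a sketch assuming a reasonably recent pandas (named aggregation was added in 0.25); the frame `df_sales` and the output names `total_shares` and `mean_cost` are made up for illustration.

```python
import pandas as pd

# Illustrative data matching the filtered rows shown earlier.
df_sales = pd.DataFrame({
    'Name': ['Dove', 'Dove', 'Spam'],
    'Shares': [40, 32, 20],
    'Cost': [928.0, 684.8, 144.0],
})

# Each keyword is an output column; the tuple is (input column, aggregate function).
summary = df_sales.groupby('Name').agg(
    total_shares=('Shares', 'sum'),
    mean_cost=('Cost', 'mean'),
)
print(summary)
```

Each output column is computed from exactly one input column, so nothing needs to be filtered out afterwards.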


See the [Pandas groupby docs](https://pandas.pydata.org/pandas-docs/stable/groupby.html) for more details.  

The above is a brief introduction to what Pandas can do for you.  For more details, see Wes McKinney's book, especially Chapter 9, which covers aggregation and groupby operations.