# Idiomatic pandas
* "a group of words established by usage as having a meaning not deducible from those of the individual words (e.g., rain cats and dogs, see the light)." -- some dictionary
* What does it mean to be idiomatic with respect to a programming language?

In [None]:
import pandas as pd
import numpy as np

df = pd.read_csv('datasets/census.csv')
df.head()

## Pandas Idiom 1: Method Chaining

In [None]:
(df.where(df['SUMLEV']==50)
    .dropna()
    .set_index(['STNAME','CTYNAME'])
    .rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'}))
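As a rough sketch of what the chain above does, the same steps can be run on a tiny made-up frame. The `toy` DataFrame and its values are invented for illustration and are not taken from `census.csv`. The sketch also compares `where(...).dropna()` with a boolean `.loc` filter, which is one common alternative. One difference: `where` fills non-matching rows with NaN, so integer columns come back as floats.

```python
import pandas as pd

# Hypothetical stand-in for the census data (values are made up).
toy = pd.DataFrame({
    'SUMLEV': [40, 50, 50],
    'STNAME': ['State A', 'State A', 'State A'],
    'CTYNAME': ['State A', 'County X', 'County Y'],
    'ESTIMATESBASE2010': [30, 10, 20],
})

# Same chain as above: mask non-county rows, drop them, index, rename.
chained = (toy.where(toy['SUMLEV'] == 50)
    .dropna()
    .set_index(['STNAME', 'CTYNAME'])
    .rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'}))

# Alternative chain using a boolean filter; integer dtypes are preserved
# and rows with unrelated missing values are not dropped.
filtered = (toy.loc[toy['SUMLEV'] == 50]
    .set_index(['STNAME', 'CTYNAME'])
    .rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'}))

print(chained.dtypes)
print(filtered.dtypes)
```

Both versions keep the same two county rows; which one reads better is largely a matter of taste.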

## Pandas Idiom 2: Functional Programming
* We've talked about this at length! Broadcasting! Vectorization!

In [None]:
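# Names of the yearly population estimate columns
cols = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012', 'POPESTIMATE2013', 'POPESTIMATE2014',
        'POPESTIMATE2015']

# For each row (axis=1), the spread between the largest and smallest estimate
df['abs'] = df.apply(lambda x: np.max(x[cols]) - np.min(x[cols]), axis=1)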
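The row-wise `apply` above can usually be replaced by column-wise reductions, which is closer to the vectorization idea mentioned in this section. A minimal sketch on a made-up frame (the `toy` data and the `pop_cols` name are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same kind of columns as the census file.
toy = pd.DataFrame({
    'POPESTIMATE2010': [10, 50],
    'POPESTIMATE2011': [12, 45],
    'POPESTIMATE2012': [15, 40],
})
pop_cols = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012']

# Row-wise apply, as in the cell above.
via_apply = toy.apply(lambda x: np.max(x[pop_cols]) - np.min(x[pop_cols]), axis=1)

# Vectorized: reduce across the columns of each row in one call.
vectorized = toy[pop_cols].max(axis=1) - toy[pop_cols].min(axis=1)

print(vectorized)
```

Both approaches give the same spread per row; the vectorized form avoids calling a Python function once per row.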

## Documentation, IDEs, and Testing

- General functions vs. specific ones: https://pandas.pydata.org/pandas-docs/stable/reference/index.html
- Deepnote Example

In [None]:
### REPRODUCED FOR REFERENCE
def energy(m, c):  # E = mc^2
    if m >= 0 and c >= 0:
        return m*c**2
    return None

def test_energy():
    assert isinstance(energy(1,1), int), 'should return an int!'
    assert energy(1,1) == 1
    assert energy(2,1) == 2
    assert energy(1,2) == 4
    assert energy(-1, -1) is None
    
test_energy()

- Example: https://github.com/Liwmo/qwizard
- Pandas testing: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.testing.assert_frame_equal.html

In [None]:
import pandas as pd
df = pd.DataFrame({'item':['banana', 'apple', 'starfruit', 'broccoli', 'cauliflower'], 'price': [0.50, 1.00, 2.00, 1.00, 2.00], 
                   'type':['fruit', 'fruit', 'fruit', 'veg', 'veg']})
df
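To show how `pandas.testing.assert_frame_equal` from the link above might be used, here is a small sketch. The `fruit_only` helper and its test are hypothetical and exist only for this example; the frame is the same one defined above.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame({'item': ['banana', 'apple', 'starfruit', 'broccoli', 'cauliflower'],
                   'price': [0.50, 1.00, 2.00, 1.00, 2.00],
                   'type': ['fruit', 'fruit', 'fruit', 'veg', 'veg']})

def fruit_only(frame):
    """Hypothetical helper: keep only the fruit rows, with a fresh 0..n-1 index."""
    return frame[frame['type'] == 'fruit'].reset_index(drop=True)

def test_fruit_only():
    expected = pd.DataFrame({'item': ['banana', 'apple', 'starfruit'],
                             'price': [0.50, 1.00, 2.00],
                             'type': ['fruit', 'fruit', 'fruit']})
    # Raises AssertionError with a diff-style message if the frames differ.
    assert_frame_equal(fruit_only(df), expected)

test_fruit_only()
```

Unlike `==`, which compares element by element and returns a frame of booleans, `assert_frame_equal` also checks the index, column labels, and dtypes by default.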

### Refactoring

In [None]:
df = pd.DataFrame({'item':['banana', 'apple', 'starfruit', 'broccoli', 'cauliflower'], 
                   'price': [0.50, 1.00, 2.00, 1.00, 2.00], 
                   'type':['fruit', 'fruit', 'fruit', 'veg', 'veg']})

def get_total_by_type(df):
    return df.groupby('type')['price'].sum()

def calculate(df):
    a = 0 
    b = 0
    for i in range(len(df)):
        if df.iloc[i,2] == 'fruit':
            a += df.iloc[i,1]
        elif df.iloc[i,2] == 'veg':
            b += df.iloc[i,1]
    return pd.Series({'fruit': a, 'veg':b})

get_total_by_type(df)
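One way to gain confidence in a refactoring like this is to check that the old and new functions agree on the same input. The sketch below repeats both functions so it runs on its own and compares them with `pandas.testing.assert_series_equal`. Passing `check_names=False` is a choice made here because `groupby` labels its result (`price`, indexed by `type`) while the hand-built Series has no names.

```python
import pandas as pd
from pandas.testing import assert_series_equal

df = pd.DataFrame({'item': ['banana', 'apple', 'starfruit', 'broccoli', 'cauliflower'],
                   'price': [0.50, 1.00, 2.00, 1.00, 2.00],
                   'type': ['fruit', 'fruit', 'fruit', 'veg', 'veg']})

def get_total_by_type(df):
    # Refactored version: let groupby do the bookkeeping.
    return df.groupby('type')['price'].sum()

def calculate(df):
    # Original version: manual loop with positional indexing.
    a = 0
    b = 0
    for i in range(len(df)):
        if df.iloc[i, 2] == 'fruit':
            a += df.iloc[i, 1]
        elif df.iloc[i, 2] == 'veg':
            b += df.iloc[i, 1]
    return pd.Series({'fruit': a, 'veg': b})

old = calculate(df)
new = get_total_by_type(df)

# Values and index labels must match; Series and index names are ignored.
assert_series_equal(new, old, check_names=False)
```

If the check passes silently, the two implementations agree on this input; it says nothing about types other than `fruit` and `veg`, which `calculate` ignores.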