<center>
    <font face= "product sans" style= "font-size: 200px"> Groups </font>
</center>

Here we will get more familiar with groups.

# 

In [1]:
import pandas as pd
import numpy as np

## → How to fill NaN from the individual groups? *(Your question)*

In [6]:
df = pd.DataFrame(np.random.randint(0, 100, (10,1)), columns= ['data'])
df.insert(0, 'key', np.random.choice(['A', 'B', 'C'], 10))
df.iloc[::2, 1] = np.nan
df

  key  data
0   C   NaN
1   C  90.0
2   B   NaN
3   B  74.0
4   C   NaN
5   A  37.0
6   B   NaN
7   C  55.0
8   C   NaN
9   A  60.0


Here we can't fill every NaN with one overall mean, because each group has its own mean. So, we have 2 ways:
1. With the apply method (recommended)
2. With simple fixed values

In [11]:
df.groupby('key').mean()

     data
key
A    48.5
B    74.0
C    72.5


↑ These per-group means are the values that will fill the NaNs in each group.

In [13]:
#    ↓ Will group, ↓ work on all columns,     ↓ works as on DF (for each group) and finally 'glues' all together
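# numeric_only= True keeps the string column 'key' out of the mean (newer pandas raises an error otherwise)
df.groupby('key').apply(lambda group: group.fillna(group.mean(numeric_only= True)))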

      key  data
key
A   5   A  37.0
    9   A  60.0
B   2   B  74.0
    3   B  74.0
    6   B  74.0
C   0   C  72.5
    1   C  90.0
    4   C  72.5
    7   C  55.0
    8   C  72.5
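
The `apply` result above is re-ordered by group and carries the key twice. A possible alternative, sketched here on a small hand-written frame (the values are made up for illustration), is `transform('mean')`, which returns one mean per original row so the frame keeps its original order and index:

```python
import numpy as np
import pandas as pd

# Hypothetical data, small enough to check by hand
df = pd.DataFrame({
    'key':  ['C', 'C', 'B', 'B', 'A', 'A'],
    'data': [np.nan, 90.0, np.nan, 74.0, 37.0, np.nan],
})

# transform('mean') broadcasts each group's mean back to that group's rows
group_means = df.groupby('key')['data'].transform('mean')

# fillna with an aligned Series only touches the missing entries
filled = df.assign(data= df['data'].fillna(group_means))
print(filled)
```

This is only a sketch of the same idea; it fills one column at a time rather than the whole frame.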


### If there were multiple columns... then? 

In [34]:
df = pd.DataFrame(np.random.randint(0, 100, (10,3)), columns= ['data1', 'data2', 'data3'])
df.insert(0, 'key', np.random.choice(['A', 'B', 'C'], 10))
df.iloc[::2, 1] = np.nan
df.iloc[1::2, 2] = np.nan
df.iloc[2::2, 3] = np.nan
df

  key  data1  data2  data3
0   C    NaN   39.0   87.0
1   B   37.0    NaN   83.0
2   A    NaN   82.0    NaN
3   C   57.0    NaN   74.0
4   B    NaN   47.0    NaN
5   A   43.0    NaN    1.0
6   A    NaN   60.0    NaN
7   B   23.0    NaN   57.0
8   C    NaN   15.0    NaN
9   A   25.0    NaN   35.0


In [35]:
df.groupby('key').mean()

     data1  data2  data3
key
A     34.0   71.0   18.0
B     30.0   47.0   70.0
C     57.0   27.0   80.5


Look at these ↑ per-group means: they are the values that will be filled in, column by column.

In [94]:
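# numeric_only= True keeps the string column 'key' out of the mean (newer pandas raises an error otherwise)
df.groupby('key').apply(lambda group: group.fillna(group.mean(numeric_only= True)))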

      key  data1  data2  data3
key
A   2   A   34.0   82.0   18.0
    5   A   43.0   71.0    1.0
    6   A   34.0   60.0   18.0
    9   A   25.0   71.0   35.0
B   1   B   37.0   47.0   83.0
    4   B   30.0   47.0   70.0
    7   B   23.0   47.0   57.0
C   0   C   57.0   39.0   87.0
    3   C   57.0   27.0   74.0
    8   C   57.0   15.0   80.5


Works!!
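
The second option listed earlier, fixed values per group, can be sketched in the same style. The frame and the fill values below are hypothetical; inside `apply`, `group.name` holds the label of the group being processed, which is used to look up that group's fill value:

```python
import numpy as np
import pandas as pd

# Hypothetical data, already sorted by key so the output order is easy to follow
df = pd.DataFrame({
    'key':  ['A', 'A', 'B', 'B', 'C', 'C'],
    'data': [1.0, np.nan, np.nan, 4.0, np.nan, 6.0],
})

# Hypothetical fill values, one per group label
fill_values = {'A': -9.0, 'B': -99.0, 'C': -999.0}

# Each group is a Series here; group.name is 'A', 'B' or 'C'
filled = df.groupby('key')['data'].apply(
    lambda group: group.fillna(fill_values[group.name]))
print(filled)

# Same values without apply: map each row's key to its fill value first
filled_map = df['data'].fillna(df['key'].map(fill_values))
```

The `map` variant keeps the original index, which may be more convenient when the result goes back into the frame.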

# 

## → Quickly make a 52-card deck 

In [111]:
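# Start from an empty Series and add one card at a time.
# Series.add aligns on labels and fills the missing side with 0, so the values end up as floats.
deck = pd.Series([], dtype= int)
for suite in ['H', 'D', 'S', 'C']:
    for value, number in enumerate(['A'] + list(range(2, 11)) + ['J', 'Q', 'K']):
        deck = deck.add(pd.Series({str(number) + suite: value + 1}), fill_value= 0)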

In [112]:
len(deck)

52

In [114]:
# Sample 3 cards from each suit (the last character of the card label)
deck.groupby(lambda card: card[-1]).apply(lambda suite: suite.sample(3))

C  9C     9.0
   KC    13.0
   2C     2.0
D  AD     1.0
   6D     6.0
   3D     3.0
H  7H     7.0
   JH    11.0
   9H     9.0
S  AS     1.0
   9S     9.0
   JS    11.0
dtype: float64

#### Did you see that?
This is a great example of how a FUNCTION can be used as the group key (the 4th key type in the summary below) to form groups quickly.
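
As a hedged alternative to the loop, the deck can also be built in one step with a dict comprehension, which keeps the values as integers. The names `suits`, `ranks` and `hand` are illustrative only, and `random_state` is there just to make the draw repeatable:

```python
import pandas as pd

suits = ['H', 'D', 'S', 'C']
ranks = ['A'] + list(range(2, 11)) + ['J', 'Q', 'K']

# One entry per card: label such as '10H', value = position of the rank (A=1 ... K=13)
deck = pd.Series({f'{rank}{suit}': value
                  for suit in suits
                  for value, rank in enumerate(ranks, start= 1)})

# The function passed to groupby is called on every index label;
# the last character of the label is the suit, so it becomes the group key.
# group_keys= False keeps the plain card labels as the result index.
hand = deck.groupby(lambda card: card[-1], group_keys= False).apply(
    lambda cards: cards.sample(3, random_state= 0))
print(hand)
```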

# 

## → How to work with 2 columns in groupby? 

For this example (which is very straightforward) we will make a dataset of values *xi* with their weights *wi*. Then, for each category, we will multiply them and compute the weighted mean.

In [15]:
df = pd.DataFrame(np.random.randint(0, 100, (10, 2)), columns= ['xi', 'wi'])
df.insert(0, 'key', np.random.choice(['A', 'B'], 10))
df

  key  xi  wi
0   A  95   2
1   A  22  40
2   B  41   5
3   A  55  61
4   A  41  16
5   A  13  97
6   B  16  74
7   B  52  87
8   B  43  47
9   B  40  40


If it were just a plain mean, we would have done it the simple way. A weighted mean needs both columns at once, so here we have 2 ways:
1. Gambdai way
2. Numpy native way (new - unseen)

# 

### 1. Gambdai Way

In [28]:
def get_weighted_mean(group):
    # Sum of xi * wi within the group, divided by the sum of the weights
    totals = np.sum(group.xi * group.wi)
    return totals / np.sum(group.wi)

In [29]:
df.groupby('key').apply(get_weighted_mean)

key
A    29.361111
B    37.683794
dtype: float64

# 

### 2. Numpy native way 

In [27]:
df.groupby('key').apply(lambda group: np.average(group.xi, weights= group.wi))

key
A    29.361111
B    37.683794
dtype: float64

See? Both give the same result, but `np.average` with `weights` is a new technique worth knowing.
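
For comparison, here is a sketch that avoids `apply` altogether: multiply the two columns first, then take grouped sums of the products and of the weights. It rebuilds the same ten rows shown in the table above by hand; the variable names are illustrative.

```python
import pandas as pd

# The same rows as the xi / wi table above, typed in by hand
df = pd.DataFrame({
    'key': list('AABAAABBBB'),
    'xi':  [95, 22, 41, 55, 41, 13, 16, 52, 43, 40],
    'wi':  [2, 40, 5, 61, 16, 97, 74, 87, 47, 40],
})

# Grouped sum of xi * wi, and grouped sum of the weights
weighted_sum = (df['xi'] * df['wi']).groupby(df['key']).sum()
weight_total = df['wi'].groupby(df['key']).sum()

# Both Series are indexed by key, so the division aligns group by group
weighted_mean = weighted_sum / weight_total
print(weighted_mean)
```

On larger data this form may be faster than `apply`, since it only uses built-in grouped sums.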

# 

### Then the author presents...
Some examples and functionality of pivot tables. As we have already explored them in this set, we are not going to look at them again (maybe because I am eager to start a notebook on time series data).

**Here is the summary of what to do with group by:**<br>
### 1. **Discussed types of groupby**
    - List / Array / Index 
    - Column name
    - Series / Dict 
    - Function
### 2. **Iteration over groups**
### 3. **Grouping by axis = 1 (needs one of the key types from point 1)**
### 4. **Grouping on Index Levels**
### 5. **Data Aggregation**
    - agg (works on series - per group & col)
    - apply (works on df - per group & all col)
### 6. **Manual names .agg([('name', 'aggfunc'), ... ])**
### 7. **Fill missing values based on groups**
    - apply(lambda x: x.fillna(x.mean()))
    - apply(lambda x: x.fillna({'a': -99, 'b': -999}[x.name]))
### 8. **Work on multiple columns in the groupby**


### * **Others:**
    - as_index = T/F
    - group_keys = T/F


# 

# That's it!
Next up, we will dive deep into time series data and work with it.