In [1]:
import numpy as np, scipy, pandas as pd
from pyDbs.__init__ import *

# Documentation for ```pyDbs.Broadcast```

In [2]:
pd.set_option('display.max_rows', 5)

A helper class used to align the domains of two pandas objects (series, indices). All methods are implemented as ```staticmethod```. The class contains five simple methods, which we go through in turn here:

*Define some pandas series, dataframes, and indices:*

In [11]:
idx1 = pd.Index(range(10), name = 'a')
idx2 = pd.Index(range(11,15), name = 'b')
idx3 = pd.Index(range(20,24), name = 'c')
mIdx1 = pd.MultiIndex.from_product([idx1,idx2])
mIdx2 = pd.MultiIndex.from_arrays([idx1[0:4], idx2[0:4]])
mIdx3 = pd.MultiIndex.from_product([idx1[3:6], idx3])
s1 = pd.Series(range(len(idx1)), index = idx1, name = 's1')
s2 = pd.Series(range(len(idx2[2:])), index = idx2[2:], name = 's2')
s3 = pd.Series(range(len(mIdx1[0:20])), index = mIdx1[0:20], name = 's3')
s4 = pd.Series(range(len(mIdx2)), index = mIdx2, name = 's4')
df = pd.DataFrame(np.vstack([range(len(mIdx1)), range(len(mIdx1))]).T, index = mIdx1, columns = ['c1','c2'])

#### 1. ```Broadcast.idx(x,y, how = 'inner')```

The method broadcasts two pandas indices x, y to a common domain. If the domains do not overlap, it returns the cartesian product index. If they overlap, the domains are aligned using pandas' ```pd.merge```. The ```how``` argument passed to ```pd.merge``` can be adjusted, but options other than ```'inner'``` may leave nan entries where one index has no matching elements.

**Example: Non-overlapping indices.**

*For non-overlapping indices this returns the cartesian product index. Note that given x[a,b] and y[c], this does not create a cartesian product of the three domains (a,b,c); we only keep the combinations of (a,b) that are in x[a,b].*

In [19]:
Broadcast.idx(idx1, idx2)

MultiIndex([(0, 11),
            (0, 12),
            (0, 13),
            (0, 14),
            (1, 11),
            (1, 12),
            (1, 13),
            (1, 14),
            (2, 11),
            (2, 12),
            (2, 13),
            (2, 14),
            (3, 11),
            (3, 12),
            (3, 13),
            (3, 14),
            (4, 11),
            (4, 12),
            (4, 13),
            (4, 14),
            (5, 11),
            (5, 12),
            (5, 13),
            (5, 14),
            (6, 11),
            (6, 12),
            (6, 13),
            (6, 14),
            (7, 11),
            (7, 12),
            (7, 13),
            (7, 14),
            (8, 11),
            (8, 12),
            (8, 13),
            (8, 14),
            (9, 11),
            (9, 12),
            (9, 13),
            (9, 14)],
           names=['a', 'b'])

**Example: Partial overlap**

In the following example, mIdx1[a,b] overlaps with mIdx3[a,c] in the domain $a$. Merging with ```how='inner'``` keeps the elements of $a$ that appear in both indices. For the non-overlapping levels, it keeps all elements of $b$ from mIdx1[a,b] and all elements of $c$ from mIdx3[a,c] *for the relevant elements of $a$.*

In [18]:
Broadcast.idx(mIdx1, mIdx3) # the output shown below is truncated.

MultiIndex([(3, 11, 20),
            (3, 11, 21),
            (3, 11, 22),
            (3, 11, 23),
            (3, 12, 20),
            (3, 12, 21),
            (3, 12, 22),
            (3, 12, 23),
            (3, 13, 20),
            (3, 13, 21),
            (3, 13, 22),
            (3, 13, 23),
            (3, 14, 20),
            (3, 14, 21),
            (3, 14, 22),
            (3, 14, 23),
            (4, 11, 20),
            (4, 11, 21),
            (4, 11, 22),
            (4, 11, 23),
            (4, 12, 20),
            (4, 12, 21),
            (4, 12, 22),
            (4, 12, 23),
            (4, 13, 20),
            (4, 13, 21),
            (4, 13, 22),
            (4, 13, 23),
            (4, 14, 20),
            (4, 14, 21),
            (4, 14, 22),
            (4, 14, 23),
            (5, 11, 20),
            (5, 11, 21),
            (5, 11, 22),
            (5, 11, 23),
            (5, 12, 20),
            (5, 12, 21),
            (5, 12, 22),
            (5, 12, 23),


*Note:* When merging indices with overlapping domains using ```how = 'inner'```, the full domains of x and y are not necessarily kept; they are kept only if the shared index levels match exactly.
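The merge logic described in this section can be sketched with plain pandas. This is an illustration under stated assumptions, not the pyDbs implementation: `broadcast_idx_sketch` is a hypothetical helper that assumes all index levels are named, merges on the shared level names when there are any, and falls back to a cross join otherwise. The variables `a`, `b`, `c`, `ab` and `ac` mirror `idx1`, `idx2`, `idx3`, `mIdx1` and `mIdx3` from above.

```python
import pandas as pd

def broadcast_idx_sketch(x, y, how="inner"):
    """Hypothetical helper (not part of pyDbs): align two named indices."""
    fx = x.to_frame(index=False)  # one column per index level
    fy = y.to_frame(index=False)
    shared = [name for name in fx.columns if name in fy.columns]
    if shared:
        # Overlapping domains: align on the shared levels.
        merged = pd.merge(fx, fy, on=shared, how=how)
    else:
        # No shared levels: cartesian product of the two domains.
        merged = pd.merge(fx, fy, how="cross")
    return pd.MultiIndex.from_frame(merged)

a = pd.Index(range(10), name="a")
b = pd.Index(range(11, 15), name="b")
c = pd.Index(range(20, 24), name="c")
ab = pd.MultiIndex.from_product([a, b], names=["a", "b"])
ac = pd.MultiIndex.from_product([a[3:6], c], names=["a", "c"])

no_overlap = broadcast_idx_sketch(a, b)            # cartesian product of a and b
inner = broadcast_idx_sketch(ab, ac)               # only a in {3, 4, 5} survives
outer = broadcast_idx_sketch(ab, ac, how="outer")  # keeps all of ab, with nan in c
```

In this sketch the inner merge drops every element of $a$ outside {3, 4, 5}, while the outer merge keeps them and fills level `c` with nan, which is the trade-off the note above describes.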

#### 2. ```Broadcast.seriesToIdx(series, idx, fIdx = False, how = 'inner')```

Broadcast a pd.Series ('series') to align with a pd.Index ('idx').
Values from `series` are repeated across the new dimensions as appropriate (i.e., cartesian expansion along dimensions that the series does not have). If the dimensions of the series do not overlap with idx, the series is broadcast to the cartesian product of the two. If 'fIdx' is True, we assume that 'idx' has already been broadcast to a suitable domain.

	Parameters
	----------
	series : pd.Series 
	idx : pd.MultiIndex (or Index)
	fIdx: bool. 
	Returns
	----------
	broadcasted_series : pd.Series 

The ```Broadcast.seriesToIdx``` method follows the logic of ```Broadcast.idx```; it builds the full new index with the broadcasting logic from ```Broadcast.idx``` and fills this new object with values from 'series':

**Example: No overlap**

In [28]:
Broadcast.seriesToIdx(s2, mIdx3) # broadcasts s2[b] to [a,c] domains from mIdx3. As they do not overlap, this uses the cartesian product expansion

b   a  c 
13  3  20    0
       21    0
            ..
14  5  22    1
       23    1
Name: s2, Length: 24, dtype: int64

Equivalent result using the ```fIdx``` argument:

In [29]:
fIdx = Broadcast.idx(s2.index, mIdx3) 
Broadcast.seriesToIdx(s2, fIdx, fIdx = True) # equivalent approach

b   a  c 
13  3  20    0
       21    0
            ..
14  5  22    1
       23    1
Name: s2, Length: 24, dtype: int64

**Example: Partial overlap**

In [31]:
Broadcast.seriesToIdx(s3, mIdx3)

a  b   c 
3  11  20    12
       21    12
             ..
4  14  22    19
       23    19
Name: s3, Length: 32, dtype: int64
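The behaviour in the examples above can be approximated with plain pandas. The following is a minimal sketch, not the pyDbs implementation: `series_to_idx_sketch` is a hypothetical helper that assumes named index levels and that no level is called `"value"`. It repeats the series values along the levels that only the target index has.

```python
import pandas as pd

def series_to_idx_sketch(series, idx, how="inner"):
    """Hypothetical helper (not part of pyDbs): repeat series values over idx."""
    left = series.rename("value").reset_index()  # index levels + a 'value' column
    right = idx.to_frame(index=False)
    shared = [name for name in right.columns if name in left.columns]
    if shared:
        merged = pd.merge(left, right, on=shared, how=how)
    else:
        merged = pd.merge(left, right, how="cross")
    levels = [col for col in merged.columns if col != "value"]
    return merged.set_index(levels)["value"].rename(series.name)

a = pd.Index(range(10), name="a")
b = pd.Index(range(11, 15), name="b")
c = pd.Index(range(20, 24), name="c")
ab = pd.MultiIndex.from_product([a, b], names=["a", "b"])
ac = pd.MultiIndex.from_product([a[3:6], c], names=["a", "c"])
s_b = pd.Series(range(2), index=b[2:], name="s2")       # mirrors s2 above
s_ab = pd.Series(range(20), index=ab[:20], name="s3")   # mirrors s3 above

no_overlap = series_to_idx_sketch(s_b, ac)   # levels (b, a, c), 24 rows
partial = series_to_idx_sketch(s_ab, ac)     # levels (a, b, c), 32 rows
```

With these inputs the sketch gives the same lengths and boundary values as the outputs shown above (24 rows running from 0 to 1, and 32 rows running from 12 to 19).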

#### 3. ```Broadcast.series(x,y, how = 'inner')```

This follows the logic of ```Broadcast.idx``` as well, but deals with two pd.Series instances ('x','y'): we start by broadcasting the series' indices to a common index, then apply ```Broadcast.seriesToIdx``` to both x and y. The method returns both x and y broadcasted to the common index.

**Example: Non-overlapping domains**

In [36]:
broadcasted_x, broadcasted_y = Broadcast.series(s1, s2)
pd.concat([broadcasted_x,broadcasted_y], axis =1) # print broadcasted series together

a    b    s1   s2
0    13   0    0
0    14   0    1
...  ...  ...  ...
9    13   9    0
9    14   9    1


**Example: Overlapping domains**

In [42]:
broadcasted_x, broadcasted_y = Broadcast.series(s1,s3)
pd.concat([broadcasted_x,broadcasted_y], axis =1) # print broadcasted series together

a    b    s1   s3
0    11   0    0
0    12   0    1
...  ...  ...  ...
4    13   4    18
4    14   4    19
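A rough pandas-only equivalent of the two examples above is sketched below. It is not how pyDbs implements the method: instead of broadcasting the indices first and then each series, the hypothetical helper `broadcast_series_sketch` merges both series in one step and splits the result. It assumes named index levels and that the temporary column names `_x` and `_y` are unused.

```python
import pandas as pd

def broadcast_series_sketch(x, y, how="inner"):
    """Hypothetical helper (not part of pyDbs): put x and y on a common index."""
    fx = x.rename("_x").reset_index()
    fy = y.rename("_y").reset_index()
    shared = [name for name in fx.columns if name in fy.columns]
    if shared:
        merged = pd.merge(fx, fy, on=shared, how=how)
    else:
        merged = pd.merge(fx, fy, how="cross")
    levels = [col for col in merged.columns if col not in ("_x", "_y")]
    merged = merged.set_index(levels)
    return merged["_x"].rename(x.name), merged["_y"].rename(y.name)

a = pd.Index(range(10), name="a")
b = pd.Index(range(11, 15), name="b")
ab = pd.MultiIndex.from_product([a, b], names=["a", "b"])
x = pd.Series(range(10), index=a, name="s1")        # mirrors s1 above
y = pd.Series(range(20), index=ab[:20], name="s3")  # mirrors s3 above

bx, by = broadcast_series_sketch(x, y)
total = bx + by  # both series share one index, so arithmetic is elementwise
```

Once both series live on the same index, elementwise operations such as `bx + by` no longer depend on pandas' own alignment rules.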


#### 4. ```Broadcast.valuesToIdx(values, idx, fIdx = False, how = 'inner')```

Identical to ```Broadcast.seriesToIdx```, except that ```values``` may also be a scalar:
* If ```isinstance(values, pd.Series)``` --> use ```Broadcast.seriesToIdx```.
* If values is a scalar  --> return ```pd.Series(values, index = idx)```
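The scalar branch relies on standard pandas behaviour: constructing a series from a scalar and an index repeats the scalar for every entry. A small illustration follows, where the index `ac` and the value `1.5` are chosen only for this example:

```python
import pandas as pd

# Example index with levels a and c (mirrors mIdx3 above).
ac = pd.MultiIndex.from_product([range(3, 6), range(20, 24)], names=["a", "c"])

# A scalar is repeated over every entry of the index.
constant = pd.Series(1.5, index=ac)
```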

#### 5. ```Broadcast.values(x, y, how = 'inner')```

Identical to ```Broadcast.series```, except it allows for x,y to be pandas series or scalars.