Given a CSV file (`data.csv`) with the following contents:

```
20180701, A
20180702, A, B
20180703, A, B, C
20180704, B, C
20180705, C
```

The goal is to transform this data into a DataFrame with one row per date and one boolean column per label, such as:

```
    date,     A,     B,     C
20180701,  True, False, False
20180702,  True,  True, False
20180703,  True,  True,  True
20180704, False,  True,  True
20180705, False, False,  True
```

See also: https://mail.python.org/pipermail/python-list/2019-July/741529.html

Ref: https://en.wikipedia.org/wiki/Contingency_table

In [1]:
import pandas as pd
import numpy as np

# Expand the data into two parallel NumPy arrays, one entry per (date, label) pair, such as
# a = np.array(['20180701', '20180702', '20180702', '20180703', '20180703', '20180703', '20180704', '20180704', '20180705'])
# b = np.array(['A', 'A', 'B', 'A', 'B', 'C', 'B', 'C', 'C'])

rows = []
cols = []

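with open('data.csv') as fo:
    for line in fo:
        line = line.strip()
        elem = line.split(',')      # note: each label keeps the space that follows its comma
        N = len(elem)
        rows += elem[0:1] * (N-1)   # repeat the date once per label on the line
        cols += elem[1:]            # the labels themselves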

a = np.array(rows)
b = np.array(cols)

In [2]:
print(a)
print(b)

['20180701' '20180702' '20180702' '20180703' '20180703' '20180703'
 '20180704' '20180704' '20180705']
[' A' ' A' ' B' ' A' ' B' ' C' ' B' ' C' ' C']
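
The labels in `b` above keep the space that follows each comma in the file (`' A'` rather than `'A'`), so the column names built from them carry that space too. A minimal sketch of one way to avoid this, assuming the same file layout, is to let the standard `csv` module skip that space. The helper name `read_pairs` is hypothetical and used only for illustration, and the sample data is inlined so the sketch runs without `data.csv`:

```python
import csv
import io

import numpy as np


def read_pairs(fo):
    """Return parallel (dates, labels) arrays from an open CSV file object."""
    rows, cols = [], []
    # skipinitialspace=True drops the whitespace that follows each delimiter
    for elem in csv.reader(fo, skipinitialspace=True):
        if not elem:                          # skip blank lines
            continue
        rows += elem[0:1] * (len(elem) - 1)   # repeat the date once per label
        cols += elem[1:]
    return np.array(rows), np.array(cols)


# Inline copy of the sample data shown at the top
sample = io.StringIO(
    "20180701, A\n"
    "20180702, A, B\n"
    "20180703, A, B, C\n"
    "20180704, B, C\n"
    "20180705, C\n"
)
a, b = read_pairs(sample)
print(b)
```

With the real file, the same helper could be called as `with open('data.csv', newline='') as fo: a, b = read_pairs(fo)`.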


In [3]:
df = pd.crosstab(a, b)
print(df)

col_0      A   B   C
row_0               
20180701   1   0   0
20180702   1   1   0
20180703   1   1   1
20180704   0   1   1
20180705   0   0   1


Ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.crosstab.html
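
`pd.crosstab` counts how often each (row, column) pair occurs, which is why the cells above hold 0 or 1. The small sketch below uses made-up arrays in which one pair is deliberately duplicated (an assumption for illustration, not part of the original data). It shows that a repeated pair yields a count of 2, and that converting to `bool` turns the counts into presence flags:

```python
import numpy as np
import pandas as pd

# Hypothetical input: the pair ('20180702', 'B') appears twice
a = np.array(['20180701', '20180702', '20180702', '20180702'])
b = np.array(['A', 'A', 'B', 'B'])

counts = pd.crosstab(a, b)    # frequency of each (date, label) pair
flags = counts.astype(bool)   # any non-zero count becomes True

print(counts)
print(flags)
```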

In [4]:
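# Get into the desired form: name the index 'date', convert the counts
# to booleans, and move the index back into a regular column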
df = pd.crosstab(a, b, rownames=['date']).astype('bool').reset_index()
df

col_0,date,A,B,C
0,20180701,True,False,False
1,20180702,True,True,False
2,20180703,True,True,True
3,20180704,False,True,True
4,20180705,False,False,True
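
As a usage sketch, the boolean columns of the result can serve directly as row masks. The example below rebuilds the frame from the expanded arrays listed in the comments of the first cell (with whitespace-free labels, which is an assumption here), clears the leftover `col_0` label on the column index, and selects the dates on which `B` is present:

```python
import numpy as np
import pandas as pd

a = np.array(['20180701', '20180702', '20180702', '20180703', '20180703',
              '20180703', '20180704', '20180704', '20180705'])
b = np.array(['A', 'A', 'B', 'A', 'B', 'C', 'B', 'C', 'C'])

df = pd.crosstab(a, b, rownames=['date']).astype('bool').reset_index()
df.columns.name = None   # drop the leftover 'col_0' label

# A boolean column works as a mask for selecting rows
dates_with_b = df.loc[df['B'], 'date'].tolist()
print(dates_with_b)   # ['20180702', '20180703', '20180704']
```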


Useful links on crosstab:
* https://riptutorial.com/pandas/example/6821/cross-tabulation - introductory, with a nice example.
* https://pbpython.com/pandas-crosstab.html
* http://www.datasciencemadesimple.com/cross-tab-cross-table-python-pandas/