### Get single-loadsource loadsourcegroups

**Sets:**<br>
$\Lambda$, the set of load sources $\lambda$ <br>
$\Psi$, the set of load source groups $\psi$ <br>
$\Psi^{*}$, the set of load source groups that each contain exactly one load source

*Note:* <br>
- Load source groups ($\psi$) can contain one or many load sources ($\lambda$).
- Likewise, load sources ($\lambda$) can belong to one or many load source groups ($\psi$).

**Purpose:**<br>
We want to find the subset of load source groups ($\Psi^{*}\subseteq\Psi$) in which every member group contains exactly one load source (i.e. $\left\vert\psi^{*}\right\vert=1 \quad\forall \psi^{*}\in\Psi^{*}$).<br>Then, we want to check that every load source is represented within this subset, i.e. that $\bigcup\Psi^{*}=\Lambda$. If that holds and no load source has more than one single-source group, there is a one-to-one (bijective) mapping $\Lambda \leftrightarrow \Psi^{*}$.
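As a minimal sketch of the two steps described above, here is the same logic on plain Python sets. The group names, load source names, and variable names are all hypothetical toy values chosen for illustration; they are not taken from the real tables.

```python
# Hypothetical toy data: group id -> set of member load source ids
toy_groups = {
    "g1": {"a"},
    "g2": {"a", "b"},
    "g3": {"b"},
    "g4": {"c", "d"},
}
toy_load_sources = {"a", "b", "c", "d"}  # plays the role of Lambda

# Psi*: keep only the groups with exactly one member
toy_single_groups = {
    gid: members for gid, members in toy_groups.items() if len(members) == 1
}

# Union of Psi*: every load source covered by some single-source group
toy_covered = set().union(*toy_single_groups.values())

# Load sources that no single-source group represents
toy_uncovered = toy_load_sources - toy_covered

print(sorted(toy_single_groups))  # ['g1', 'g3']
print(sorted(toy_uncovered))      # ['c', 'd']
```

In this toy case the check fails, because `c` and `d` only appear together in `g4`.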

In [5]:
import os
import pandas as pd
from IPython.display import display

**File paths**

In [2]:
baseexppath = '/Users/Danny/Desktop/CATEGORIES/CAREER_MANAGEMENT/CRC_ResearchScientist_Optimization/Optimization_Tool/2_ExperimentFolder/'

# Data table directories
sourcedatadir = os.path.join(baseexppath, 'OptSandbox/data/test_source/')
metadatadir = os.path.join(baseexppath, 'OptSandbox/data/test_metadata/')

**Read in the data tables**

In [3]:
TblLoadSource = pd.read_csv(os.path.join(sourcedatadir, 'TblLoadSource.csv'))
TblLoadSourceGroup = pd.read_csv(os.path.join(sourcedatadir, 'TblLoadSourceGroup.csv'))
TblLoadSourceGroupLoadSource = pd.read_csv(os.path.join(sourcedatadir, 'TblLoadSourceGroupLoadSource.csv'))

**Parse**

*Get the original correspondences between loadsourcegroup and loadsource*<br>
This table lists the load sources ($\lambda$) that belong to each load source group ($\psi$).

In [79]:
df = TblLoadSourceGroupLoadSource.sort_values('loadsourcegroupid')
display(df.head(4))

| (index) | loadsourcegroupid | loadsourceid |
|---|---|---|
| 338 | 1 | 43 |
| 327 | 2 | 39 |
| 262 | 2 | 32 |
| 263 | 4 | 32 |


From the original table we'll group the rows by load source group ($\psi$), count the load sources ($\lambda$) in each group, and retain only those groups that contain a single load source.

In [67]:
# Count the load sources listed for each load source group
grouped = df.groupby('loadsourcegroupid')
counts = grouped[['loadsourceid']].count()
display(counts.head(4))

# Extract the loadsourcegroupids that contain only one load source
singlels_groups = counts.index[counts['loadsourceid'] == 1].tolist()
display(singlels_groups[0:3])

| loadsourcegroupid | loadsourceid |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 4 | 1 |
| 5 | 1 |


[1, 4, 5]
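If the join table could ever contain repeated (group, load source) rows, counting rows would overstate group size. A hedged alternative is to count distinct load sources per group with `nunique`. The sketch below uses a small hypothetical table (`toy`) rather than the real data, and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy join table (not the real TblLoadSourceGroupLoadSource)
toy = pd.DataFrame({
    "loadsourcegroupid": [1, 2, 2, 4, 5],
    "loadsourceid":      [43, 39, 32, 32, 39],
})

# Number of distinct load sources in each group
n_sources = toy.groupby("loadsourcegroupid")["loadsourceid"].nunique()

# Group ids whose membership is exactly one load source
toy_single_ids = n_sources[n_sources == 1].index.tolist()
print(toy_single_ids)  # [1, 4, 5]
```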

Now we'll check that the set of load sources represented by all of these single-loadsource groups is the same as the full set of load sources.

That is, we check whether $\bigcup\Psi^{*}=\Lambda$

In [80]:
singlels_groups_df = pd.DataFrame(singlels_groups, columns=['loadsourcegroupid'])
lss = TblLoadSourceGroupLoadSource.merge(singlels_groups_df,
                                   on='loadsourcegroupid', how='inner')
display(lss.sort_values('loadsourcegroupid').head(4))

symdif = set(lss['loadsourceid']).symmetric_difference(set(TblLoadSource['loadsourceid']))
display(symdif)

| (index) | loadsourcegroupid | loadsourceid |
|---|---|---|
| 35 | 1 | 43 |
| 26 | 4 | 32 |
| 32 | 5 | 39 |
| 14 | 6 | 17 |


{16, 61, 62, 63, 64}

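The symmetric difference is not empty: five load source ids (16, 61, 62, 63, 64) appear in only one of the two sets. Assuming every id in `TblLoadSourceGroupLoadSource` also exists in `TblLoadSource`, these are load sources ($\lambda$) that have no single-load-source group ($\psi^{*}$), so $\bigcup\Psi^{*}\neq\Lambda$. This check alone doesn't tell us whether they belong only to larger groups or to no load source group ($\psi$) at all.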
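A symmetric difference does not say which side each id came from, nor whether an uncovered load source sits in a larger group. One way to separate these cases is to use one-directional set differences, sketched here on hypothetical toy tables (`toy_sources`, `toy_links`); none of these values come from the real data.

```python
import pandas as pd

# Hypothetical stand-ins for TblLoadSource and TblLoadSourceGroupLoadSource
toy_sources = pd.DataFrame({"loadsourceid": [32, 39, 43, 61, 62]})
toy_links = pd.DataFrame({
    "loadsourcegroupid": [1, 2, 2, 4, 5, 7, 7],
    "loadsourceid":      [43, 39, 32, 32, 39, 61, 32],
})

# Rows belonging to groups with exactly one distinct load source
group_size = toy_links.groupby("loadsourcegroupid")["loadsourceid"].transform("nunique")
single_links = toy_links[group_size == 1]

all_ids = set(toy_sources["loadsourceid"])
in_single = set(single_links["loadsourceid"])
in_any = set(toy_links["loadsourceid"])

missing_single = all_ids - in_single         # no single-source group
unknown_ids = in_single - all_ids            # ids absent from the source table
no_group_at_all = all_ids - in_any           # not in any group of any size
only_multi = missing_single - no_group_at_all  # only in multi-source groups

print(missing_single, unknown_ids, no_group_at_all, only_multi)
```

Here `61` is only in a multi-source group, while `62` is in no group at all; the symmetric difference would report both without distinguishing them.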

In [85]:
# write to file
lss[['loadsourceid', 'loadsourcegroupid']].sort_values('loadsourceid').to_csv('single-ls_groups.csv')
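Before treating the exported table as a one-to-one mapping, it may be worth confirming that no load source appears in more than one single-load-source group. The sketch below shows one way to do that with `Series.duplicated`; `toy_lss` is a hypothetical stand-in for `lss`, with a duplicate deliberately included.

```python
import pandas as pd

# Hypothetical stand-in for `lss`; load source 43 has two single-source groups
toy_lss = pd.DataFrame({
    "loadsourceid":      [17, 32, 39, 43, 43],
    "loadsourcegroupid": [6, 4, 5, 1, 9],
})

# All rows whose load source id occurs more than once
dupes = toy_lss[toy_lss["loadsourceid"].duplicated(keep=False)]

# The mapping is one-to-one only if there are no such rows
is_one_to_one = dupes.empty
print(is_one_to_one)  # False
print(dupes["loadsourcegroupid"].tolist())  # [1, 9]
```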