# Overview:

To handle accessing and storing the large amount of data associated with the matrix project, I have written a script called matrix_manager. The main workhorse of this script is a custom class called 'Database' that uses the 'shelve' package (https://docs.python.org/3.4/library/shelve.html). The script also contains an accessory function called filter_data that I use a lot.

The purpose of the 'Database' class is to store relevant information related to the project (locations of coolers, location of the analysis directory, locations of dot calls), as well as to provide a set of methods for easily accessing the various types of features we extract from Hi-C maps (scalings, eigenvectors, pileups, etc.).

This notebook is meant to be a tutorial that shows you how to create and use these Database objects. All my other scripts rely on this class to access the data that I'm analyzing.
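To give a feel for what a shelve-backed class does, here is a minimal sketch. It is not the actual matrix_manager code: the class name `TinyDatabase`, its `store` method and the stored keys are hypothetical, chosen only to show how values saved through `shelve` survive re-initialization of the object.

```python
import os
import shelve
import tempfile


class TinyDatabase:
    """Hypothetical stand-in for a shelve-backed store (not the real mm.Database)."""

    def __init__(self, db_path):
        self.db_path = db_path
        # shelve.open creates the file(s) on first use and reopens them afterwards.
        with shelve.open(self.db_path) as shelf:
            self.metadata = shelf.get('metadata')  # None until something is stored
            self.keys = shelf.get('keys', [])      # empty list for a fresh database

    def store(self, key, value):
        # Each write opens the shelf, saves the value and closes it again.
        with shelve.open(self.db_path) as shelf:
            shelf[key] = value


path = os.path.join(tempfile.mkdtemp(), 'demo_db')

fresh = TinyDatabase(path)  # nothing stored yet
fresh.store('metadata', {'lib_name': ['lib_A', 'lib_B']})

reopened = TinyDatabase(path)  # picks up what was stored on disk
print(fresh.metadata, reopened.metadata)
```

The point of the design is that the object itself holds very little state; everything of value lives in the shelve file, so any notebook can recover it from the path alone.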

In [17]:
import matrix_manager as mm
import shelve
%matplotlib notebook

# Code to create matrix database

One can create an instance of the Database object by giving it the path where the database files are or will be stored. Since the database is being made for the first time, all its attributes are set to either None, '' or [ ].

In [18]:
import importlib

importlib.reload(mm)  # pick up any edits made to matrix_manager since it was imported
db_path = '/net/levsha/share/sameer/U54/matrix_shared/sameer/metadata/U54_matrix_info'
db = mm.Database(db_path)
print(db.metadata, db.keys, db.analysis_path, db.cooler_paths, db.dot_paths)

None []   


Now I will create the database for the matrix project. For this I will need to feed the Database object 4 things:

1) A list of paths that point to where the coolers are located.

2) The path to the base directory where all the analysis will be stored

3) A list of paths for the dot calls

4) A DataFrame that contains the metadata about the libraries. This table must contain 6 columns: lib_name, celltype, xlink, enzyme, cycle, seq.

    a) The 'lib_name' column contains the name of the library up to the first '.' in its file name. So U54-ESC-DSG-DdeI-20161014-R1-T1__hg38.hg38.mapq_30.1000.mcool becomes U54-ESC-DSG-DdeI-20161014-R1-T1__hg38
    
    b) 'celltype', 'xlink' and 'enzyme' should be obvious.
    
    c) The 'cycle' column records whether the library is synchronized or not. Most libraries will be classified as NS (non-synchronous), but the HelaS3 libraries will be split into NS, G1 and M.
    
    d) 'seq' refers to if the library is a deeply sequenced library or not. This column can take 3 values - 'deep', 'control' or '-'. Libraries labelled 'deep' are deeply sequenced, while libraries called 'control' are not deeply sequenced but have the same ('celltype','xlink','enzyme') combination as a deep library. Libraries called '-' do not have a deep equivalent.
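The naming rule and the column requirement above can be sketched in a few lines. The helper names `lib_name_from_file` and `missing_columns` are hypothetical and are not part of matrix_manager; they only illustrate the rules as described.

```python
# Columns the metadata table is expected to have, as listed above.
REQUIRED_COLUMNS = ['lib_name', 'celltype', 'xlink', 'enzyme', 'cycle', 'seq']


def lib_name_from_file(file_name):
    """Keep everything up to the first '.' in the cooler file name."""
    return file_name.split('.')[0]


def missing_columns(columns):
    """Return the required columns that are absent from `columns`."""
    return [c for c in REQUIRED_COLUMNS if c not in columns]


example = 'U54-ESC-DSG-DdeI-20161014-R1-T1__hg38.hg38.mapq_30.1000.mcool'
print(lib_name_from_file(example))  # U54-ESC-DSG-DdeI-20161014-R1-T1__hg38
print(missing_columns(['lib_name', 'celltype', 'xlink', 'enzyme']))  # ['cycle', 'seq']
```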

In [19]:
cooler_paths = ['/net/levsha/share/lab/U54/2019_mapping_hg38/U54_deep/cooler_library_group/',
                '/net/levsha/share/lab/U54/2019_mapping_hg38/U54_matrix/cooler_library/']

analysis_path = '/net/levsha/share/sameer/U54/hic_matrix/'

dot_paths = ['/net/levsha/share/lab/U54/2019_mapping_hg38/U54_matrix/snakedots/',
             '/net/levsha/share/lab/U54/2019_mapping_hg38/U54_deep/snakedots/']

In [25]:
## The details of this cell are not important. I'm just creating the metadata table from the cooler names.
import os
from collections import defaultdict

import pandas as pd

df_dict = defaultdict(list)
for path in cooler_paths:
    for file in os.listdir(path):
        lib_name = file.split('.')[0]
        df_dict['lib_name'].append(lib_name)
        
        if '-END' in lib_name:
            df_dict['celltype'].append('END')
        elif ('-ESC' in lib_name) or ('H1ESC' in lib_name):
            df_dict['celltype'].append('ESC')
        elif '-HFF' in lib_name:
            df_dict['celltype'].append('HFF')
        else:
            df_dict['celltype'].append('HelaS3')
            
        if '-DSG-' in lib_name:
            df_dict['xlink'].append('DSG')
        elif '-EGS-' in lib_name:
            df_dict['xlink'].append('EGS')
        else:
            df_dict['xlink'].append('FA')
            
        if '-MNase-' in lib_name:
            df_dict['enzyme'].append('MNase')
        elif '-DdeI-DpnII-' in lib_name:
            df_dict['enzyme'].append('double')
        elif '-DdeI-' in lib_name:
            df_dict['enzyme'].append('DdeI')
        elif '-DpnII-' in lib_name:
            df_dict['enzyme'].append('DpnII')
        else:
            df_dict['enzyme'].append('HindIII')
          
        if 'deep' in path:
            df_dict['seq'].append('deep')
        else:
            df_dict['seq'].append('-')
            
        if '-G1-' in lib_name:
            df_dict['cycle'].append('G1')
        elif '-M-' in lib_name:
            df_dict['cycle'].append('M')
        else:
            df_dict['cycle'].append('NS')
            
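df = pd.DataFrame(df_dict).sort_values(['celltype','xlink','enzyme','cycle','seq']).reset_index(drop=True)
# Drop unwanted rows by index label; these labels are specific to this sorted directory listing.
df = df.drop([4, 5, 21, 22, 39, 40, 42])
metadata = df[['lib_name','seq','celltype','xlink','enzyme','cycle']].reset_index(drop=True)

# Mark the row directly above each deep library (double digests excluded) as its 'control'.
# This relies on the sort above placing the '-' library immediately before its deep counterpart.
deep_indices = metadata.loc[(metadata['seq']=='deep') & (metadata['enzyme'] != 'double')].index.values
metadata.loc[deep_indices-1, 'seq'] = 'control'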
metadata

Unnamed: 0,lib_name,seq,celltype,xlink,enzyme,cycle
0,U54-END-DSG-DdeI-20161031-R1-T1__hg38,-,END,DSG,DdeI,NS
1,U54-END-DSG-DpnII-20190711-R2-T1__hg38,-,END,DSG,DpnII,NS
2,U54-END-DSG-HindIII-20161206-R1-T1__hg38,-,END,DSG,HindIII,NS
3,U54-END-DSG-MNase-20170508-R1-T1__hg38,control,END,DSG,MNase,NS
4,U54-END4DN-FA-DSG-MNase-R1-R2_hg38,deep,END,DSG,MNase,NS
...,...,...,...,...,...,...
69,U54-HelaS3-M-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,M
70,U54-HelaS3-NS-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,NS
71,U54-HelaS3-G1-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,G1
72,U54-HelaS3-M-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,M
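The 'control' labelling can also be written directly from its definition (a library that is not deep but shares its ('celltype','xlink','enzyme') combination with a deep one) rather than from row positions. The sketch below is an alternative illustration, not the code used above: the function name `label_controls` and the toy library names are made up, and unlike the cell above it does not exclude double digests or limit itself to one control per deep library.

```python
import pandas as pd


def label_controls(metadata):
    """Label non-deep libraries that share (celltype, xlink, enzyme) with a deep one."""
    out = metadata.copy()
    keys = list(zip(out['celltype'], out['xlink'], out['enzyme']))
    # Combinations for which at least one deep library exists.
    deep_keys = {k for k, s in zip(keys, out['seq']) if s == 'deep'}
    out['seq'] = ['control' if (k in deep_keys and s != 'deep') else s
                  for k, s in zip(keys, out['seq'])]
    return out


toy = pd.DataFrame({
    'lib_name': ['lib_A', 'lib_B', 'lib_C'],
    'celltype': ['END', 'END', 'END'],
    'xlink': ['DSG', 'DSG', 'DSG'],
    'enzyme': ['MNase', 'MNase', 'DdeI'],
    'seq': ['-', 'deep', '-'],
})
labelled = label_controls(toy)
print(labelled['seq'].tolist())  # ['control', 'deep', '-']
```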


Once we create the dataset, we see that most of the attributes of the object are now filled. Note: If you try to use the create_dataset method once you have already created the shelve object, it will raise an error.

In [28]:
db.create_dataset(metadata, cooler_paths, analysis_path, dot_paths)
display(db.metadata)
print(db.keys, db.analysis_path, db.cooler_paths, db.dot_paths)

Unnamed: 0,lib_name,seq,celltype,xlink,enzyme,cycle
0,U54-END-DSG-DdeI-20161031-R1-T1__hg38,-,END,DSG,DdeI,NS
1,U54-END-DSG-DpnII-20190711-R2-T1__hg38,-,END,DSG,DpnII,NS
2,U54-END-DSG-HindIII-20161206-R1-T1__hg38,-,END,DSG,HindIII,NS
3,U54-END-DSG-MNase-20170508-R1-T1__hg38,control,END,DSG,MNase,NS
4,U54-END4DN-FA-DSG-MNase-R1-R2_hg38,deep,END,DSG,MNase,NS
...,...,...,...,...,...,...
69,U54-HelaS3-M-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,M
70,U54-HelaS3-NS-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,NS
71,U54-HelaS3-G1-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,G1
72,U54-HelaS3-M-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,M


[] /net/levsha/share/sameer/U54/hic_matrix/ ['/net/levsha/share/lab/U54/2019_mapping_hg38/U54_deep/cooler_library_group/', '/net/levsha/share/lab/U54/2019_mapping_hg38/U54_matrix/cooler_library/'] ['/net/levsha/share/lab/U54/2019_mapping_hg38/U54_matrix/snakedots/', '/net/levsha/share/lab/U54/2019_mapping_hg38/U54_deep/snakedots/']
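The "create only once" behaviour noted above is a common guard for persistent stores. Here is a minimal sketch of the idea, assuming the check is made on a stored 'metadata' key; the function name `create_dataset_once`, the key and the exception type are all assumptions, since the notebook does not show how create_dataset implements it.

```python
import os
import shelve
import tempfile


def create_dataset_once(db_path, metadata):
    """Store metadata, refusing to overwrite an existing dataset (hypothetical guard)."""
    with shelve.open(db_path) as shelf:
        if 'metadata' in shelf:
            raise ValueError('dataset already exists; refusing to overwrite it')
        shelf['metadata'] = metadata


demo_path = os.path.join(tempfile.mkdtemp(), 'guard_demo')
create_dataset_once(demo_path, ['lib_A'])  # first call succeeds

try:
    create_dataset_once(demo_path, ['lib_B'])  # second call should be rejected
    second_call_failed = False
except ValueError:
    second_call_failed = True

print(second_call_failed)
```

A guard like this protects the tables accumulated over time from being wiped by accidentally re-running a setup cell.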


## Accessing and modifying an already existing database

Now that we've created the database, you can access it by initializing the object with the right database_path.

In [6]:
import importlib

importlib.reload(mm)  # pick up any edits made to matrix_manager since it was imported
db = mm.Database(db_path)
display(db.metadata)

Unnamed: 0,lib_name,seq,celltype,xlink,enzyme,cycle
0,U54-END-DSG-DdeI-20161031-R1-T1__hg38,-,END,DSG,DdeI,NS
1,U54-END-DSG-DpnII-20190711-R2-T1__hg38,-,END,DSG,DpnII,NS
2,U54-END-DSG-HindIII-20161206-R1-T1__hg38,-,END,DSG,HindIII,NS
3,U54-END-DSG-MNase-20170508-R1-T1__hg38,control,END,DSG,MNase,NS
4,U54-END4DN-FA-DSG-MNase-R1-R2_hg38,deep,END,DSG,MNase,NS
...,...,...,...,...,...,...
69,U54-HelaS3-M-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,M
70,U54-HelaS3-NS-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,NS
71,U54-HelaS3-G1-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,G1
72,U54-HelaS3-M-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,M


In [7]:
# Alternatively, I can access the metadata using the get_tables() method.
display(db.get_tables())

Unnamed: 0,lib_name,seq,celltype,xlink,enzyme,cycle
0,U54-END-DSG-DdeI-20161031-R1-T1__hg38,-,END,DSG,DdeI,NS
1,U54-END-DSG-DpnII-20190711-R2-T1__hg38,-,END,DSG,DpnII,NS
2,U54-END-DSG-HindIII-20161206-R1-T1__hg38,-,END,DSG,HindIII,NS
3,U54-END-DSG-MNase-20170508-R1-T1__hg38,control,END,DSG,MNase,NS
4,U54-END4DN-FA-DSG-MNase-R1-R2_hg38,deep,END,DSG,MNase,NS
...,...,...,...,...,...,...
69,U54-HelaS3-M-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,M
70,U54-HelaS3-NS-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,NS
71,U54-HelaS3-G1-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,G1
72,U54-HelaS3-M-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,M


### Adding to database

For each feature of Hi-C (pileups, for example), I like to create various metrics that quantify that feature (a dot enrichment score, for example) and store these away permanently. I can do this using the add_table method, which takes a DataFrame. However, this DataFrame __must__ have a column named 'lib_name' that has identical entries to the 'lib_name' column in db.metadata.

In [8]:
df = db.get_tables()
df = df[['lib_name']].copy()
df.loc[:, 'dummy'] = 1
df

Unnamed: 0,lib_name,dummy
0,U54-END-DSG-DdeI-20161031-R1-T1__hg38,1
1,U54-END-DSG-DpnII-20190711-R2-T1__hg38,1
2,U54-END-DSG-HindIII-20161206-R1-T1__hg38,1
3,U54-END-DSG-MNase-20170508-R1-T1__hg38,1
4,U54-END4DN-FA-DSG-MNase-R1-R2_hg38,1
...,...,...
69,U54-HelaS3-M-FA-HindIII-20180730-R1-T1__hg38,1
70,U54-HelaS3-NS-FA-HindIII-20180730-R1-T1__hg38,1
71,U54-HelaS3-G1-FA-MNase-08072018-R1-T1__hg38,1
72,U54-HelaS3-M-FA-MNase-08072018-R1-T1__hg38,1


In [9]:
db.add_table('dummy', df)

Now even if I reinitialize the object, it will retrieve the 'dummy' table in addition to the metadata.

In [10]:
db = mm.Database(db_path)
print(db.keys)
db.get_tables('dummy') # I can give this function any list of keys that I know the database contains.
                      # It will append these tables to the metadata table and return it

['dummy']


Unnamed: 0,lib_name,seq,celltype,xlink,enzyme,cycle,dummy
0,U54-END-DSG-DdeI-20161031-R1-T1__hg38,-,END,DSG,DdeI,NS,1
1,U54-END-DSG-DpnII-20190711-R2-T1__hg38,-,END,DSG,DpnII,NS,1
2,U54-END-DSG-HindIII-20161206-R1-T1__hg38,-,END,DSG,HindIII,NS,1
3,U54-END-DSG-MNase-20170508-R1-T1__hg38,control,END,DSG,MNase,NS,1
4,U54-END4DN-FA-DSG-MNase-R1-R2_hg38,deep,END,DSG,MNase,NS,1
...,...,...,...,...,...,...,...
69,U54-HelaS3-M-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,M,1
70,U54-HelaS3-NS-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,NS,1
71,U54-HelaS3-G1-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,G1,1
72,U54-HelaS3-M-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,M,1
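The 'lib_name' requirement exists because 'lib_name' is the key that ties a stored table back to the metadata. A rough sketch of that idea with plain pandas follows; `check_lib_names` and `append_table` are hypothetical helpers, and the real add_table and get_tables methods may be implemented differently.

```python
import pandas as pd


def check_lib_names(metadata, table):
    """Raise if `table` cannot be matched row-for-row to the metadata."""
    if 'lib_name' not in table.columns:
        raise KeyError("table needs a 'lib_name' column")
    if set(table['lib_name']) != set(metadata['lib_name']):
        raise ValueError("'lib_name' entries differ from the metadata")


def append_table(metadata, table):
    """Attach the extra columns of `table` to the metadata, matching on lib_name."""
    check_lib_names(metadata, table)
    return metadata.merge(table, on='lib_name', how='left')


metadata = pd.DataFrame({'lib_name': ['lib_A', 'lib_B'], 'celltype': ['ESC', 'HFF']})
scores = pd.DataFrame({'lib_name': ['lib_B', 'lib_A'], 'dummy': [2, 1]})

combined = append_table(metadata, scores)
print(combined)
```

Because the match is made on the key rather than on row order, the rows of the new table may come in any order.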


### Modifying the database
Modifying an existing table is done using the modify_table() method.

In [11]:
import numpy as np

df['dummy'] = np.nan
db.modify_table('dummy', df)

In [12]:
db.get_tables('dummy')

Unnamed: 0,lib_name,seq,celltype,xlink,enzyme,cycle,dummy
0,U54-END-DSG-DdeI-20161031-R1-T1__hg38,-,END,DSG,DdeI,NS,
1,U54-END-DSG-DpnII-20190711-R2-T1__hg38,-,END,DSG,DpnII,NS,
2,U54-END-DSG-HindIII-20161206-R1-T1__hg38,-,END,DSG,HindIII,NS,
3,U54-END-DSG-MNase-20170508-R1-T1__hg38,control,END,DSG,MNase,NS,
4,U54-END4DN-FA-DSG-MNase-R1-R2_hg38,deep,END,DSG,MNase,NS,
...,...,...,...,...,...,...,...
69,U54-HelaS3-M-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,M,
70,U54-HelaS3-NS-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,NS,
71,U54-HelaS3-G1-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,G1,
72,U54-HelaS3-M-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,M,


### Removing from the database
Removing an existing table is done using the remove_table() method.

In [13]:
db.remove_table('dummy')
db.keys

[]

## Accessing coolers from the database

I've created this database to allow easy access to the various data files associated with the matrix project. I've created methods for retrieving coolers, scalings, eigenvectors, pileups and insulation tracks. Here I will show you how to access cooler files. Storing the cooler objects in the dataframe allows me to easily iterate through the dataframe and apply my operation sequentially.

In [14]:
table = db.get_tables()
table = db.get_coolers(table, res=100000)
table

Unnamed: 0,lib_name,seq,celltype,xlink,enzyme,cycle,cooler_100000
0,U54-END-DSG-DdeI-20161031-R1-T1__hg38,-,END,DSG,DdeI,NS,"<Cooler ""U54-END-DSG-DdeI-20161031-R1-T1__hg38..."
1,U54-END-DSG-DpnII-20190711-R2-T1__hg38,-,END,DSG,DpnII,NS,"<Cooler ""U54-END-DSG-DpnII-20190711-R2-T1__hg3..."
2,U54-END-DSG-HindIII-20161206-R1-T1__hg38,-,END,DSG,HindIII,NS,"<Cooler ""U54-END-DSG-HindIII-20161206-R1-T1__h..."
3,U54-END-DSG-MNase-20170508-R1-T1__hg38,control,END,DSG,MNase,NS,"<Cooler ""U54-END-DSG-MNase-20170508-R1-T1__hg3..."
4,U54-END4DN-FA-DSG-MNase-R1-R2_hg38,deep,END,DSG,MNase,NS,"<Cooler ""U54-END4DN-FA-DSG-MNase-R1-R2_hg38.hg..."
...,...,...,...,...,...,...,...
69,U54-HelaS3-M-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,M,"<Cooler ""U54-HelaS3-M-FA-HindIII-20180730-R1-T..."
70,U54-HelaS3-NS-FA-HindIII-20180730-R1-T1__hg38,-,HelaS3,FA,HindIII,NS,"<Cooler ""U54-HelaS3-NS-FA-HindIII-20180730-R1-..."
71,U54-HelaS3-G1-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,G1,"<Cooler ""U54-HelaS3-G1-FA-MNase-08072018-R1-T1..."
72,U54-HelaS3-M-FA-MNase-08072018-R1-T1__hg38,-,HelaS3,FA,MNase,M,"<Cooler ""U54-HelaS3-M-FA-MNase-08072018-R1-T1_..."


You may be wondering why I chose to feed the metadata table to the get_coolers() method when the database object already has access to the metadata. The reason is that I can now chain several get_coolers() calls together, as shown below.
 
 I use this methodology regularly for analysis that requires multiple types of data as input. For example, for making saddleplots, I would need coolers, expected curves and eigenvectors. Using this, I can easily pipe the result of get_coolers() into the get_scalings() method and further pipe the output of that into the get_eigendecomps() method. This allows me to access and keep track of all the required data to create saddleplots for the entire matrix project in one shot.

In [15]:
table = db.get_tables()
table = db.get_coolers(table, res=100000)
display(table.head())
table = db.get_coolers(table, res=1000)
display(table.head())

Unnamed: 0,lib_name,seq,celltype,xlink,enzyme,cycle,cooler_100000
0,U54-END-DSG-DdeI-20161031-R1-T1__hg38,-,END,DSG,DdeI,NS,"<Cooler ""U54-END-DSG-DdeI-20161031-R1-T1__hg38..."
1,U54-END-DSG-DpnII-20190711-R2-T1__hg38,-,END,DSG,DpnII,NS,"<Cooler ""U54-END-DSG-DpnII-20190711-R2-T1__hg3..."
2,U54-END-DSG-HindIII-20161206-R1-T1__hg38,-,END,DSG,HindIII,NS,"<Cooler ""U54-END-DSG-HindIII-20161206-R1-T1__h..."
3,U54-END-DSG-MNase-20170508-R1-T1__hg38,control,END,DSG,MNase,NS,"<Cooler ""U54-END-DSG-MNase-20170508-R1-T1__hg3..."
4,U54-END4DN-FA-DSG-MNase-R1-R2_hg38,deep,END,DSG,MNase,NS,"<Cooler ""U54-END4DN-FA-DSG-MNase-R1-R2_hg38.hg..."


Unnamed: 0,lib_name,seq,celltype,xlink,enzyme,cycle,cooler_100000,cooler_1000
0,U54-END-DSG-DdeI-20161031-R1-T1__hg38,-,END,DSG,DdeI,NS,"<Cooler ""U54-END-DSG-DdeI-20161031-R1-T1__hg38...","<Cooler ""U54-END-DSG-DdeI-20161031-R1-T1__hg38..."
1,U54-END-DSG-DpnII-20190711-R2-T1__hg38,-,END,DSG,DpnII,NS,"<Cooler ""U54-END-DSG-DpnII-20190711-R2-T1__hg3...","<Cooler ""U54-END-DSG-DpnII-20190711-R2-T1__hg3..."
2,U54-END-DSG-HindIII-20161206-R1-T1__hg38,-,END,DSG,HindIII,NS,"<Cooler ""U54-END-DSG-HindIII-20161206-R1-T1__h...","<Cooler ""U54-END-DSG-HindIII-20161206-R1-T1__h..."
3,U54-END-DSG-MNase-20170508-R1-T1__hg38,control,END,DSG,MNase,NS,"<Cooler ""U54-END-DSG-MNase-20170508-R1-T1__hg3...","<Cooler ""U54-END-DSG-MNase-20170508-R1-T1__hg3..."
4,U54-END4DN-FA-DSG-MNase-R1-R2_hg38,deep,END,DSG,MNase,NS,"<Cooler ""U54-END4DN-FA-DSG-MNase-R1-R2_hg38.hg...","<Cooler ""U54-END4DN-FA-DSG-MNase-R1-R2_hg38.hg..."
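The chaining works because each method takes a table and returns the same table with one more column. Here is a toy sketch of that pattern using `DataFrame.pipe`; the helper `add_column` and the placeholder strings are made up for illustration and stand in for the real cooler, scaling and eigenvector objects.

```python
import pandas as pd


def add_column(table, name, make_value):
    """Return a copy of `table` with one extra column computed per library."""
    out = table.copy()
    out[name] = [make_value(lib) for lib in out['lib_name']]
    return out


table = pd.DataFrame({'lib_name': ['lib_A', 'lib_B']})

# Each step receives the table produced by the previous one.
table = (
    table
    .pipe(add_column, 'cooler_100000', lambda lib: f'<cooler {lib} @ 100000>')
    .pipe(add_column, 'scalings', lambda lib: f'<P(s) for {lib}>')
)

# All inputs for one library now sit in a single row.
for row in table.itertuples(index=False):
    print(row.lib_name, row.cooler_100000, row.scalings)
```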


I've tried to make the code as flexible as possible, but there are some constraints. For example, P(s) curves are expected to be stored in hdf5 format because it allows me to store P(s) as well as average trans interactions in the same location. Similarly, eigenvectors and eigenvalues are stored together in an hdf5 file. Pileups are expected to be stored in the .npy format, and insulation tracks are .txt files with '\t' separation.
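For reference, the two simpler formats can be written and read back as sketched below. The file names, array shape and column names ('chrom', 'start', 'score') are placeholders, not the layout the Database methods expect, and the hdf5 files are left out because their internal layout is not described here.

```python
import os
import tempfile

import numpy as np
import pandas as pd

tmp_dir = tempfile.mkdtemp()

# Pileups: a NumPy array saved in the .npy format.
pileup_path = os.path.join(tmp_dir, 'pileup.npy')
np.save(pileup_path, np.ones((3, 3)))
pileup = np.load(pileup_path)

# Insulation tracks: a tab-separated .txt file.
insulation_path = os.path.join(tmp_dir, 'insulation.txt')
pd.DataFrame({
    'chrom': ['chr1', 'chr1'],
    'start': [0, 1000],
    'score': [0.1, -0.2],
}).to_csv(insulation_path, sep='\t', index=False)
insulation = pd.read_csv(insulation_path, sep='\t')

print(pileup.shape, list(insulation.columns))
```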

My various other notebooks should show you how I use this Database class for all the analysis I've done.

The last function that will be used often is the filter_data function. This is **NOT** a method of the Database class. It is used to filter the table using values of the metadata. For example, below I filter for libraries that are either 'HFF' or 'ESC' but only 'DSG'.

In [16]:
mm.filter_data(table, filter_dict={'celltype':['ESC','HFF'],'xlink':'DSG'})

Unnamed: 0,lib_name,seq,celltype,xlink,enzyme,cycle,cooler_100000,cooler_1000
14,U54-ESC-DSG-DdeI-20161014-R1-T1__hg38,-,ESC,DSG,DdeI,NS,"<Cooler ""U54-ESC-DSG-DdeI-20161014-R1-T1__hg38...","<Cooler ""U54-ESC-DSG-DdeI-20161014-R1-T1__hg38..."
15,U54-ESC-DSG-DpnII-20160722-R1-T1__hg38,control,ESC,DSG,DpnII,NS,"<Cooler ""U54-ESC-DSG-DpnII-20160722-R1-T1__hg3...","<Cooler ""U54-ESC-DSG-DpnII-20160722-R1-T1__hg3..."
16,U54-ESC4DN-DSG-DpnII-R1-R2_hg38,deep,ESC,DSG,DpnII,NS,"<Cooler ""U54-ESC4DN-DSG-DpnII-R1-R2_hg38.hg38....","<Cooler ""U54-ESC4DN-DSG-DpnII-R1-R2_hg38.hg38...."
17,U54-ESC-DSG-HindIII-20161206-R1-T1__hg38,-,ESC,DSG,HindIII,NS,"<Cooler ""U54-ESC-DSG-HindIII-20161206-R1-T1__h...","<Cooler ""U54-ESC-DSG-HindIII-20161206-R1-T1__h..."
18,U54-ESC-DSG-MNase-20170508-R2-T1__hg38,control,ESC,DSG,MNase,NS,"<Cooler ""U54-ESC-DSG-MNase-20170508-R2-T1__hg3...","<Cooler ""U54-ESC-DSG-MNase-20170508-R2-T1__hg3..."
19,U54-H1ESC4DN-FA-DSG-MNase-R1-R2_hg38,deep,ESC,DSG,MNase,NS,"<Cooler ""U54-H1ESC4DN-FA-DSG-MNase-R1-R2_hg38....","<Cooler ""U54-H1ESC4DN-FA-DSG-MNase-R1-R2_hg38...."
29,U54-HFF-plate-DSG-DdeI-20160812-R1-T1__hg38,control,HFF,DSG,DdeI,NS,"<Cooler ""U54-HFF-plate-DSG-DdeI-20160812-R1-T1...","<Cooler ""U54-HFF-plate-DSG-DdeI-20160812-R1-T1..."
30,U54-HFFc6-DSG-DdeI-R1-R2_hg38,deep,HFF,DSG,DdeI,NS,"<Cooler ""U54-HFFc6-DSG-DdeI-R1-R2_hg38.hg38.ma...","<Cooler ""U54-HFFc6-DSG-DdeI-R1-R2_hg38.hg38.ma..."
31,U54-HFF-plate-DSG-DpnII-20170119-R2-T1__hg38,control,HFF,DSG,DpnII,NS,"<Cooler ""U54-HFF-plate-DSG-DpnII-20170119-R2-T...","<Cooler ""U54-HFF-plate-DSG-DpnII-20170119-R2-T..."
32,U54-HFFc6-DSG-DpnII-R1-R2_hg38,deep,HFF,DSG,DpnII,NS,"<Cooler ""U54-HFFc6-DSG-DpnII-R1-R2_hg38.hg38.m...","<Cooler ""U54-HFFc6-DSG-DpnII-R1-R2_hg38.hg38.m..."
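The filtering shown above can be approximated with a few lines of pandas. This is a sketch under the assumption that each entry of filter_dict is either a single value or a list of allowed values and that all conditions are combined with AND; `filter_rows` is a hypothetical name, and mm.filter_data itself may behave differently in edge cases.

```python
import pandas as pd


def filter_rows(table, filter_dict):
    """Keep rows whose value in every listed column is one of the allowed values."""
    mask = pd.Series(True, index=table.index)
    for column, wanted in filter_dict.items():
        # Accept both a single value ('DSG') and a collection (['ESC', 'HFF']).
        if isinstance(wanted, (list, tuple, set)):
            wanted = list(wanted)
        else:
            wanted = [wanted]
        mask &= table[column].isin(wanted)
    return table[mask]


toy = pd.DataFrame({
    'lib_name': ['lib_A', 'lib_B', 'lib_C', 'lib_D'],
    'celltype': ['ESC', 'HFF', 'END', 'ESC'],
    'xlink': ['DSG', 'DSG', 'DSG', 'FA'],
})
kept = filter_rows(toy, {'celltype': ['ESC', 'HFF'], 'xlink': 'DSG'})
print(kept['lib_name'].tolist())  # ['lib_A', 'lib_B']
```

As in the output above, the surviving rows keep their original index labels, which makes it easy to trace them back to the full table.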


**NOTE:**

**The version of the Database class that I was using until now had certain things, like cooler_paths, hardcoded into the script. I modified the script so that others can use the Database class by allowing those paths to be fed in as variables to the create_dataset() method.**

**In the process, I also modified the layout a bit. The get_coolers and get_scalings functions were previously independent functions but are now methods of the Database object. I also cleaned up the code and changed the namespace a bit here and there.**

**All this is to say that the current version of the Database code may not work with the workflow in the other notebooks. If you encounter a notebook where this is the case, either try to modify it yourself or let me know.**