
Here we would like to perform some analysis with the U19 pipeline.

First things first: let's **import the U19 pipeline schemas as virtual modules**, along with a few other useful packages.

In [None]:
import datajoint as dj
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
lab = dj.create_virtual_module('lab', 'u19_lab') # the first argument here is the __name__ of the virtual module
task = dj.create_virtual_module('task', 'u19_task') 
subject = dj.create_virtual_module('subject', 'u19_subject')
action = dj.create_virtual_module('action', 'u19_action')
acquisition = dj.create_virtual_module('acquisition', 'u19_acquisition')
behavior = dj.create_virtual_module('behavior', 'u19_behavior')

## Analyzing existing data

**A simple example: compute the average performance across different blocks within a session**


Let's take a look at the `behavior` schema:

In [None]:
dj.Diagram(behavior)

Take a look at the table `behavior.TowersBlock`

In [None]:
behavior.TowersBlock()  # the parentheses are needed: they instantiate the table so its contents are displayed

There is a field called `block_performance`. Of course, we could compute the average per session in one line:

In [None]:
behavior.TowersSession.aggr(behavior.TowersBlock.proj('block_performance'), 
                            avg_performance='avg(block_performance)')
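To make the aggregation above concrete without a database connection, here is a minimal pandas sketch of the same idea: group the block rows by session and average `block_performance`. The subject names and values are made up for illustration and do not come from the U19 pipeline.

```python
import pandas as pd

# Hypothetical toy rows standing in for behavior.TowersBlock
blocks = pd.DataFrame({
    "subject_fullname": ["subj_A"] * 3 + ["subj_B"] * 2,
    "session_number": [0, 0, 0, 0, 0],
    "block": [1, 2, 3, 1, 2],
    "block_performance": [0.5, 0.75, 1.0, 0.25, 0.75],
})

# One output row per session, holding the mean over that session's blocks
avg_performance = (
    blocks.groupby(["subject_fullname", "session_number"])["block_performance"]
    .mean()
    .rename("avg_performance")
    .reset_index()
)
print(avg_performance)
```

The grouping columns play the role of the session primary key, and the averaged column plays the role of the `avg_performance` attribute.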

But now let's do it with a computed table, for fun :)

## Create your own schema and tables

The first thing we would like to do is to create a schema with `dj.schema`.  
**Note**: the schema name you create has to start with your username; such a schema is accessible only by you. Here we use our username as the prefix.

In [None]:
schema = dj.schema('shans_tutorial')

Let's check if the new schema is there:

In [None]:
dj.list_schemas()

Now let's define a **Manual** table to save the result.  
A class created with DataJoint corresponds to a table in the database.

In [None]:
@schema
class SessionPerformanceManual(dj.Manual):
    definition = """
    -> behavior.TowersSession         # each session has an average performance
    ---
    avg_performance:      float   # a final product in this table
    """

Let's take a look at the brand-new table we just created.

In [None]:
SessionPerformanceManual()

Yes, sure, it's empty. We haven't inserted anything into it.  
Now let's insert an average performance into this empty table.  
We need to insert the entry with all fields defined in the table, usually as a dictionary.

Let's first compute the performance for one session. To pick a session, count the blocks in each one:

In [None]:
behavior.TowersSession.aggr(behavior.TowersBlock.proj(),
                            n_blocks='count(*)')  # number of blocks in each session

This session has 7 blocks:

In [None]:
key = {
    'subject_fullname': 'emanuele_B205',
    'session_date': datetime.date(2018, 7, 13),
    'session_number': 0
}

In [None]:
behavior.TowersBlock & key

In [None]:
performances = (behavior.TowersBlock & key).fetch('block_performance')

# add the average as a new field in the key dictionary
key['avg_performance'] = np.mean(performances)

In [None]:
key

Now insert it!

In [None]:
SessionPerformanceManual.insert1(key, skip_duplicates=True) # insert1 only works for one entry

Let's check the table again to see what happened:

In [None]:
SessionPerformanceManual()

Cool, the entry is there!

We could of course write a for loop that computes every average performance and inserts the results one by one, but that is slow. Instead, we can compute all the results first and insert them at once. Let's compare the two scenarios:
1. insert one by one
2. insert all together

Let's pick two animals that have the same number of sessions.

In [None]:
subject.Subject.aggr(behavior.TowersSession.proj(), n_sessions='count(*)') & 'n_sessions=100'

Subjects hnieh_E57 and hnieh_E77 happen to have the same number of sessions.

In [None]:
# loop through the sessions of subject hnieh_E57, insert one by one, and time it
import time
start_time = time.time()

for i_session in (behavior.TowersSession & 'subject_fullname="hnieh_E57"').fetch('KEY'):
    performances = (behavior.TowersBlock & i_session).fetch('block_performance')
    # average the block performances; sessions whose mean is NaN are skipped
    avg_performance = np.mean(performances)
    if np.isnan(avg_performance):
        continue
    entry = dict(**i_session, 
                 avg_performance=avg_performance)
    SessionPerformanceManual.insert1(entry)

print("--- %s seconds ---" % (time.time() - start_time))

In [None]:
# loop through the sessions of subject hnieh_E77 and insert all at once as a list of dictionaries!
start_time = time.time()

perf_entries = []
for i_session in (behavior.TowersSession & 'subject_fullname="hnieh_E77"').fetch('KEY'):
    performances = (behavior.TowersBlock & i_session).fetch('block_performance')
    # average the block performances; sessions whose mean is NaN are skipped
    avg_performance = np.mean(performances)
    if np.isnan(avg_performance):
        continue
    entry = dict(**i_session, 
                 avg_performance=avg_performance)
    perf_entries.append(entry)
    
SessionPerformanceManual.insert(perf_entries)
print("--- %s seconds ---" % (time.time() - start_time))
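Both loops above skip sessions whose mean is NaN. A small NumPy sketch shows two ways that can happen. We assume here, without having checked the data, that a session either has no block rows or has a NaN stored in `block_performance`; the arrays below are made up.

```python
import warnings
import numpy as np

# Stand-in for a fetch that returned no blocks at all
no_blocks = np.array([], dtype=float)
with warnings.catch_warnings():
    # np.mean of an empty array returns nan and emits a RuntimeWarning
    warnings.simplefilter("ignore", category=RuntimeWarning)
    avg_no_blocks = np.mean(no_blocks)

# Stand-in for a fetch where one block has a missing performance value
avg_missing_value = np.mean(np.array([0.5, np.nan]))

print(np.isnan(avg_no_blocks), np.isnan(avg_missing_value))  # True True
```

In both cases `np.isnan` catches the result, so the `continue` keeps such sessions out of the table.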

In this way, we need to remember which sessions have already been computed and inserted. If we insert the same entry twice, there will be an error; for example, try rerunning the cell above. We can work around that by adding the argument `skip_duplicates=True` to `.insert()` or `.insert1()`, but it is not a very elegant solution.  
The best approach here is to use a **Computed** table. It has exactly the same definition as the previous manual table, plus a magic **make** function:

In [None]:
@schema
class SessionPerformanceComputed(dj.Computed):
    definition = """
    -> behavior.TowersSession         # each session has an average performance
    ---
    avg_performance:      float   # a final product in this table
    """
    
    key_source = acquisition.Session() & 'subject_fullname="hnieh_E57"'  # parentheses needed: key_source is a table instance
    def make(self, key):  # key is the primary key of one entry in acquisition.Session
        # fetch the performance for each block
        performances = (behavior.TowersBlock & key).fetch('block_performance')
        
        # create another field in the dictionary key
        key['avg_performance'] = np.mean(performances)
        self.insert1(key)

And we can `populate` the table.

In [None]:
SessionPerformanceComputed.populate(display_progress=True) 
# optional positional arguments are restrictions that limit which keys get populated

In [None]:
SessionPerformanceComputed()

**What does `populate` do?** 

It does two major things:  
1. From the table definition, get the keys that need to be computed, which we call the `key_source`. By default, it is the join of the primary-key parent tables, minus the keys that have already been computed.  
2. Call the `make` function defined in the class once for each individual key from the `key_source`, computing the entries one by one.
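The two steps above can be pictured with a plain-Python sketch. This is only a conceptual model on made-up data, not DataJoint's actual implementation; `populate_sketch`, `block_performance`, and `computed` are hypothetical names introduced for illustration.

```python
def populate_sketch(key_source, computed, make):
    """Call make once for every key in key_source that is not yet in computed."""
    todo = [key for key in key_source if key not in computed]  # step 1
    for key in todo:                                            # step 2
        make(key, computed)
    return todo

# Hypothetical upstream data: (subject, session_number) -> block performances
block_performance = {("subj_A", 0): [0.5, 1.0], ("subj_A", 1): [0.25, 0.75]}
computed = {}  # stands in for the computed table

def make(key, table):
    values = block_performance[key]
    table[key] = sum(values) / len(values)

first_run = populate_sketch(block_performance.keys(), computed, make)
second_run = populate_sketch(block_performance.keys(), computed, make)
print(len(first_run), len(second_run))  # 2 0
```

The second call finds nothing left to do, which is why a populate-style workflow can be rerun without tracking by hand what has already been inserted.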

Here we still insert the entries one by one, which is a bit slow. How can we insert the performances of all sessions of one subject together?

We can redefine the `key_source` at a coarser level, here one key per subject:

In [None]:
@schema
class SessionPerformanceComputedFromSubject(dj.Computed):
    definition = """
    -> behavior.TowersSession         # each session has an average performance
    ---
    avg_performance:      float   # a final product in this table
    """
    key_source = subject.Subject() 
    def make(self, key): # key is one primary key of the entries in table subject.Subject()!

        perf_entries = []
        for i_session in (behavior.TowersSession & key).fetch('KEY'):
            # fetch performance of each block
            performances = (behavior.TowersBlock & i_session).fetch('block_performance')
            # create another field in the dictionary key
            avg_performance = np.mean(performances)
            if np.isnan(avg_performance):
                continue
            entry = dict(**i_session, 
                         avg_performance=avg_performance)
            perf_entries.append(entry)
            
        self.insert(perf_entries)

In [None]:
SessionPerformanceComputedFromSubject.populate('subject_fullname="hnieh_E77"', display_progress=True, suppress_errors=True, reserve_jobs=True)
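As a rough picture of what the subject-level `key_source` buys us, the sketch below groups made-up session results by subject and counts one batched insert per subject. The names and values are hypothetical, and the list `insert_calls` merely stands in for calls to `self.insert(perf_entries)`.

```python
from collections import defaultdict

# Hypothetical per-session results: (subject, session_number) -> avg performance
session_perf = {("subj_A", 0): 0.75, ("subj_A", 1): 0.5, ("subj_B", 0): 0.25}

# Collect all entries that belong to the same subject
entries_by_subject = defaultdict(list)
for (subject_fullname, session_number), avg in session_perf.items():
    entries_by_subject[subject_fullname].append({
        "subject_fullname": subject_fullname,
        "session_number": session_number,
        "avg_performance": avg,
    })

# One "make" call per subject, each ending in a single batched insert
insert_calls = []
for subject_fullname, perf_entries in entries_by_subject.items():
    insert_calls.append(perf_entries)

print(len(session_perf), "rows in", len(insert_calls), "batched inserts")
# 3 rows in 2 batched inserts
```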

## Delete entries and drop a table

In [None]:
(SessionPerformanceManual & 'subject_fullname="hnieh_E78"').delete()  # any restriction works here

In [None]:
SessionPerformanceManual.drop()

In [None]:
SessionPerformanceComputed.drop()

In [None]:
SessionPerformanceComputedFromSubject.drop()