First, run the following.
It will replace the cell with a template (shown in the next cell).  
You can modify the template in `$HOME/.chemfish/jupyter_template.txt`.

```python
from chemfish.jupyter import *
%chemfish
```

In [11]:
%matplotlib widget
from chemfish.jupyter import *
today = datetime(2020, 9, 9, 19, 54, 46)
quick = Quicks.pointgrey(as_of=today)

J.md('''
## <Name me!>
**Author: Douglas** (2020-09-09, Chemfish version 0.1.0)
''')



## <Name me!>
**Author: Douglas** (2020-09-09, Chemfish version 0.1.0)


Here's what's going on:

- `%matplotlib widget` is a good interactive matplotlib backend
- `from chemfish.jupyter import *` imports a bunch of Chemfish code into the namespace
- `today = ...`  can be useful to reference as the date/time your data is tied to
- `quick = Quicks.pointgrey(as_of=today)` creates a `Quick` with common analyses for the most-recent data generation
- `J.md` just renders Markdown -- move the text to a new cell if preferred

As for the output:
- The `Severity key ...` line is just for reference -- there's nothing wrong

The two main entry points are `Lookups` and `quick`:

#### Lookups
`Lookups` contains static functions for searching the database.
They're only meant for lookups -- their output should never be referenced in code.
Each function, like `Lookups.projects`, can accept:
- a single ID or name (or another `UNIQUE` field), like `22` or `"testproject"`
- a varargs list of those (ex `22, "testproject"`)
- a varargs list of peewee expressions (see http://docs.peewee-orm.com/en/latest/);
  ex: `Users.first_name == "John"` or `Experiments.created < datetime(2020, 1, 1)`
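Note that such expressions are built with comparison operators like `==`, `<`, and `%`; a single `=` inside a call would be a keyword argument instead. The sketch below is a toy illustration of how operator overloading can turn a comparison into a query fragment. It is not peewee's implementation, and `Field` and `Expression` are hypothetical names invented for this example.

```python
from datetime import datetime


class Expression:
    """A comparison that has been recorded rather than evaluated."""

    def __init__(self, column, operator, value):
        self.column = column
        self.operator = operator
        self.value = value

    def to_sql(self):
        # Return a parameterized SQL fragment and its bound values
        return f"{self.column} {self.operator} ?", [self.value]


class Field:
    """Toy stand-in for an ORM column such as Projects.name (hypothetical)."""

    def __init__(self, column):
        self.column = column

    def __eq__(self, other):
        return Expression(self.column, "=", other)

    def __lt__(self, other):
        return Expression(self.column, "<", other)

    def __mod__(self, pattern):
        return Expression(self.column, "LIKE", pattern)


name = Field("projects.name")
created = Field("experiments.created")

print((name == "testproject").to_sql())
print((name % "%qc%").to_sql())
print((created < datetime(2020, 1, 1)).to_sql())
```

Because the comparison returns an object instead of `True` or `False`, a function can receive it as an ordinary positional argument and translate it into a database query later.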

#### quick
`quick` contains functions for standard analyses and plots with sensible defaults.
Each instance is tied to a specific datetime applied automatically to queries,
and a data "generation" that you can ignore for modern data.
The constructor for `Quick` has a LOT of arguments.


In [6]:
Lookups.projects(Projects.name % "%qc%")  # a LIKE expression

| id | name | description | creator | active | when_inserted |
|---|---|---|---|---|---|
| 756 | hardware :: misc qc | | douglas | False | 2018-11-15 18:12:12 |
| 773 | reference :: qc dose-response | | chris | True | 2019-04-05 15:05:09 |
| 783 | reference :: qc :: runs | QC single concentration test for paper | douglas | True | 2019-07-29 13:36:36 |


In [7]:
Lookups.experiments(Projects.id == 783)

| id | name | description | creator | active | project | project_id | battery | template_plate | transfer_plate | when_inserted | n_runs | first_run | last_run | generations | saurons | configs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1578 | reference :: qc-opt :: 2nd | Second set of plates of single computationally... | jack | False | reference :: qc :: runs | 783 | standard :: flames | soft_tap_consistency | | 2019-04-23 13:24:14 | 13 | 2019-05-07 15:10:06 | 2019-06-19 17:34:49 | (POINTGREY) | (Thor) | (90:20190307) |
| 1670 | reference :: qc-opt :: 1st | First set of 3 plates of single computationall... | jack | False | reference :: qc :: runs | 783 | standard :: flames | | | 2019-07-29 13:38:03 | 3 | 2019-04-23 14:28:09 | 2019-04-23 15:50:58 | (POINTGREY) | (Thor) | (90:20190307) |
| 1774 | reference :: qc :: blank | | douglas | True | reference :: qc :: runs | 783 | standard :: flames | untreated | | 2019-09-23 14:04:50 | 2 | 2019-09-23 16:14:57 | 2019-09-23 16:46:16 | (POINTGREY) | (Thor) | (113:20190923) |


In [9]:
Lookups.runs(Experiments.id == 1670)

| id | name | tag | submission | description | plate | user_run | when_run | when_dosed | when_plated | acc_sec | ... | inc_sec | config | sauron | exp | project | battery | length | fps | generation | when_inserted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7106 | 142809-r7106-uJT-S10-0-142809 | 20190423.142809.S10 | 3493d39eab5a | optisep with flames plate 1 | 6476 | jack | 2019-04-23 14:28:09 | 2019-04-23 13:15:00 | 2019-04-23 11:03:00 | 0 days 00:05:00 | ... | 7920.0 | 90 | Thor | reference :: qc-opt :: 1st | 783 | standard :: flames | 1020000 | 100 | pointgrey | 2019-04-23 14:29:59 |
| 7107 | 151904-r7107-uJT-S10-0-151904 | 20190423.151904.S10 | 3ebbb658fa1a | optisep with flames plate 2 | 6477 | jack | 2019-04-23 15:19:04 | 2019-04-23 14:16:00 | 2019-04-23 12:59:00 | 0 days 00:05:00 | ... | 4620.0 | 90 | Thor | reference :: qc-opt :: 1st | 783 | standard :: flames | 1020000 | 100 | pointgrey | 2019-04-23 15:26:44 |
| 7112 | 155058-r7112-uJT-S10-0-155058 | 20190423.155058.S10 | f77eb325fa0b | optisep with flames plate 3 | 6482 | jack | 2019-04-23 15:50:58 | 2019-04-23 14:57:00 | 2019-04-23 13:34:00 | 0 days 00:08:00 | ... | 4980.0 | 90 | Thor | reference :: qc-opt :: 1st | 783 | standard :: flames | 1020000 | 100 | pointgrey | 2019-04-23 18:56:11 |


#### Getting a WellFrame

We'll work with run 7112.
We can fetch features and metadata for the run in the form of a `WellFrame`.  
The feature used is specified in `quick.feature`; by default it's "cd(10)[⌇]",
which is a timestamp-interpolated count of pixels that changed by an intensity of >= 10.

Use `quick.df` to get a `WellFrame`.

In [12]:
df = quick.df(7112)

Quick(cd(10)[⌇] @ 2020-09-09 19:54)

As with `Lookups`, you can use various arguments.
You can provide a list of run IDs or tags, for example.  
You can also provide free queries like `Experiments.id == 1670`.

Every downloaded run/feature combination will be cached locally.
These can be 100 MB each, so you may want to delete them periodically.  
This cache is stored under `~/.chemfish/cache`.
You can also use `quick.delete` to delete these data.
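
Before deleting anything, it can help to see how much space the cache takes. The following is a minimal sketch using only the standard library; `cache_usage_mb` is a hypothetical helper, not part of Chemfish, and it makes no assumption about how files are laid out inside the cache directory.

```python
from pathlib import Path


def cache_usage_mb(cache_dir):
    """Return the total size in MB of all regular files under cache_dir."""
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return 0.0
    total_bytes = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return total_bytes / 1e6


# Path taken from the text above; prints 0.0 MB if the directory does not exist
print(f"{cache_usage_mb('~/.chemfish/cache'):.1f} MB")
```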

A `WellFrame` is actually a subclass of `pandas.DataFrame` using [typed-dfs](https://github.com/dmyersturnbull/typed-dfs).  
(More correctly, it's a subclass of `TypedDf`.)  

The metadata is stored in the index (it's a multi-index DataFrame),
and the features are stored in columns 0, 1, ... (using actual integers; i.e. `df[1]`).
A number of metadata columns are required for a valid WellFrame.

Unlike a normal DataFrame, you can access metadata columns (index names) via `[]`; ex: `df["name"]`.
You can convert to a simple "untyped" DataFrame using `df.untyped()`.  
(It's very rarely needed, but you can get a pure DataFrame using `df.vanilla()`.)
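
To make the layout concrete, here is a plain-pandas sketch of the same arrangement: metadata in a multi-index and features in integer-labeled columns. The three metadata levels are hypothetical and chosen only for illustration; a real `WellFrame` requires more of them.

```python
import pandas as pd

# Hypothetical metadata levels for two wells of one run
index = pd.MultiIndex.from_tuples(
    [(7112, "A01", "solvent (-)"), (7112, "A02", "drug X")],
    names=["run", "well_label", "name"],
)
# Feature values live in integer-labeled columns 0, 1, 2, ...
frame = pd.DataFrame([[0.0, 3.0, 1.0], [2.0, 9.0, 4.0]], index=index, columns=[0, 1, 2])

print(frame[1].tolist())  # feature column 1, selected by an actual integer

# Plain pandas needs get_level_values for index levels;
# frame["name"] would raise a KeyError on this ordinary DataFrame
print(frame.index.get_level_values("name").tolist())

# reset_index() moves metadata into ordinary columns; set_index() restores it
flat = frame.reset_index()
restored = flat.set_index(["run", "well_label", "name"])
```

The `[]` shortcut on a `WellFrame` saves the `get_level_values` call shown here.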

Most regular DataFrame functions still work. Most that ordinarily return another `DataFrame` will now return
a new `WellFrame`.  
If you're doing an operation that won't return a proper `WellFrame`, you can call `.untyped()` first:
`df.untyped().applymap(my_breaking_function)`.

Note that calling `.reset_index()` will still return a `WellFrame`.
This is because calling `WellFrame.convert(df.reset_index())` (equivalently `WellFrame.of`) will automatically move those columns back into the index.

`WellFrame` has some useful functions like `slice_ms` to slice the (time-dependent) feature vector by milliseconds in the battery.
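
As a rough picture of what slicing by milliseconds involves, the sketch below converts a millisecond window into frame positions. It assumes a constant frame rate and that column `i` holds frame `i`; the real `slice_ms` may differ in signature and behavior, and `slice_ms_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def slice_ms_sketch(features, start_ms, end_ms, fps):
    """Keep the frames recorded in [start_ms, end_ms), assuming a constant frame rate."""
    first = start_ms * fps // 1000
    last = end_ms * fps // 1000
    return features.iloc[:, first:last]


# Two wells with 10 frames each; at 100 fps every frame spans 10 ms
features = pd.DataFrame(np.arange(20).reshape(2, 10))
window = slice_ms_sketch(features, start_ms=20, end_ms=50, fps=100)
print(list(window.columns))
```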

#### Plotting traces

Use `quick.traces` to plot traces.
It is a generator over `(name, Figure)` pairs, where `name` is from the corresponding column in the WellFrame.

In [None]:
quick.traces(df)

You can also put in run IDs or expressions, but often you'll need to keep the WellFrame for something else too.

You can iterate over the results and show each.
There's a function for convenience called `plt.show_all` that does just that.

In [None]:
plt.show_all(quick.traces(df))

You can loop over the results and save each figure yourself, but for convenience use `Figures.save(quick.traces(df), "mydir")`.

#### Namers

The `name` column is computed after downloading the data using `quick.well_namer`.
You can set it in the Quick constructor.

Each `WellNamer` is a function that maps the metadata columns to names.
You can build them using `WellNamers.builder`:

In [14]:
WellNamers.builder().column("control_type").text(" and ").treatments().build()

Namer( ⟦ ⟨`<class 'chemfish.core.tools.Tools'>`.truncate40(2)⟩control_type ¦ ‘ and ’ ¦ display(${name} [${id}]) 〛 @ 0x219d3ef6970)

These can choose whether to include a column in the name depending on whether there are multiple values,
and on the values of other columns.
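
The idea of including a column only when it varies can be sketched in a few lines of plain Python. This is an illustration of the concept, not how `WellNamer` is implemented; `name_wells` and the example metadata are hypothetical.

```python
def name_wells(wells, columns, separator=" "):
    """Build one label per well, skipping columns whose value never varies."""
    varying = [c for c in columns if len({w[c] for w in wells}) > 1]
    return [separator.join(str(w[c]) for c in varying) for w in wells]


wells = [
    {"control_type": "solvent (-)", "age": 7, "n_fish": 8},
    {"control_type": "killed (+)", "age": 7, "n_fish": 8},
    {"control_type": "solvent (-)", "age": 7, "n_fish": 10},
]
# "age" is 7 everywhere, so it is left out of every label
print(name_wells(wells, ["control_type", "age", "n_fish"]))
```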

But the predefined ones are probably sufficient.

In [15]:
WellNamers.elegant()

Namer( ⟦ ⟨`<class 'chemfish.core.tools.Tools'>`.truncate40(2)⟩control_type‘ ’ ¦ ⟨∄control_type⟩⟨`<class 'chemfish.core.tools.Tools'>`.truncate40(2)⟩variant_name‘ ’ ¦ ⟨`<class 'chemfish.core.tools.Tools'>`.truncate40(2)⟩age‘dpf ’ ¦ ⟨`<class 'chemfish.core.tools.Tools'>`.truncate40(2)⟩‘n=’n_fish‘ ’ ¦ ⟨`<class 'chemfish.core.tools.Tools'>`.truncate40(2)⟩‘{’well_group‘} ’/‘-’ ¦ ⟨∄control_type⟩display(${name} [${id}] (${dose})) 〛 @ 0x219d4c1ea60)

There are other kinds of namers, most notably `TreatmentNamer` and `CompoundNamer`.

A `TreatmentNamer` defines how to display a single treatment (a single drug at a single dose).
These use simple string patterns:


In [19]:
# the compound ID and a free-form dose with sensible units
print(TreatmentNamers.of("${id} @ ${dose}"))
# a name if found, otherwise ID; the micromolar dose with 3 decimal places
print(TreatmentNamers.of("${id|name} @ ${um.3}"))

display(${id} @ ${dose})
display(${id|name} @ ${um.3})


There's a long list of parameters. See `?StringTreatmentNamer` for more info.
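
To show how such a pattern could expand, here is a small sketch that handles only the tokens used above: `${id}`, `${id|name}`, and `${um.N}`. It is an assumption-laden illustration, not the `StringTreatmentNamer` implementation; `render_treatment` is a hypothetical function, and it prints the dose without units.

```python
import re


def render_treatment(pattern, compound_id, name, dose_um):
    """Expand a few ${...} tokens for one treatment (hypothetical helper)."""

    def substitute(match):
        token = match.group(1)
        if token == "id":
            return str(compound_id)
        if token == "id|name":
            # Per the comment above: the name if found, otherwise the ID
            return name if name is not None else str(compound_id)
        decimals = re.fullmatch(r"um\.(\d+)", token)
        if decimals:
            return f"{dose_um:.{int(decimals.group(1))}f}"
        raise ValueError(f"Unsupported token: {token}")

    return re.sub(r"\$\{([^}]+)\}", substitute, pattern)


print(render_treatment("${id|name} @ ${um.3}", 55, "haloperidol", 12.5))
print(render_treatment("${id|name} @ ${um.3}", 56, None, 0.25))
```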

The second example refers to a "name".
`CompoundNamer` comes in here.

The problem is that a compound can have multiple names.
(Multiple compounds can also share the same name, especially for enantiomers.)
It's not obvious computationally how to choose a good name from the database.

To do that, a `TieredCompoundNamer` prioritizes names by their references.
This list is controlled through a config file.
In our file, DrugBank "secondary IDs" come first -- these are typically generic drug names.  
Most compounds don't have such convenient names.
Eventually, the namer will just give up and use the compound ID rather than something weird like an 80-character IUPAC name.
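
The tiered fallback can be pictured with a short sketch. The reference names in `tiers`, the `max_length` cutoff, and the function `tiered_name` are all assumptions made for illustration; the real priority list comes from the config file.

```python
def tiered_name(compound_id, names_by_source, tiers, max_length=30):
    """Return the first acceptable name in priority order, else the compound ID."""
    for source in tiers:
        candidate = names_by_source.get(source)
        if candidate is not None and len(candidate) <= max_length:
            return candidate
    # No reference offered a short enough name, so give up and use the ID
    return str(compound_id)


# Hypothetical priority list, highest priority first
tiers = ["drugbank:secondary_id", "chembl:preferred", "iupac"]

known = {"chembl:preferred": "haloperidol", "iupac": "x" * 80}
obscure = {"iupac": "x" * 80}
print(tiered_name(55, known, tiers))
print(tiered_name(56, obscure, tiers))
```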

You can use `CompoundNamers` to get or create a different method, though the default should be good for most cases.

In [21]:
quick = quick.using(well_namer=WellNamers.well(), compound_namer=CompoundNamers.chembl())
print(quick)

Quick(cd(10)[⌇] @ 2020-09-09 19:54)


### Other kinds of plots

Quick has well-by-well heatmaps: `quick.rheat` for raw and `quick.zheat` for control-subtracted ("z-score") blue-to-white-to-red heatmaps.

There's also `quick.smears` and `quick.zmears`.

Finally, see `quick.sensor_plots`, `quick.stimframes_plot`, `quick.durations`, and `quick.timeline`.

### Batteries

Let's:
- See what assays a battery has
- Get the full series of stimuli for the battery
- See precise descriptions of the assays
- See the original definition

In [None]:
# Pretty obvious -- just get some basic info
Lookups.batteries("standard :: flames")

In [None]:
# Get a pretty list of the assays
Lookups.assays("standard :: flames")

In [None]:
# Get a long dataframe of stimuli, with one row per millisecond
stimdf = quick.stim("standard :: flames")

# Plot the battery
quick.stim_plot("standard :: flames")

##### How the stimulus values are defined

This is advanced and skippable.  
Exactly what the values in `stimdf` mean depends on the stimulus:
- For LED stimuli, the value is the pulse-width modulation (PWM) level from 0 to 255, which determines the brightness.  
- For push-pull solenoid stimuli, the value reflects the power sent to the motors, also as PWM (0-255).  
  Values become nonzero when the motor turns on, but it takes time for the solenoid to contact the stage.  
  The delay might be 5 ms at PWM=255 but 50 ms at PWM=100. At low PWM, the solenoid might never reach the stage  
  because it can't overcome the spring's tension once it's far enough down. So the solenoid's PWM, distance from the  
  stage, type, and age all affect this. (They're pretty bad stimuli.)
- For speaker/transducer stimuli, the value reflects the volume (0-255).  
  This value is independent of the volume settings in the OS and the hardware setup.  
  There are two possible modes: if the value is nonzero for only 1 ms, the full-length audio file is played once.  
  If it's nonzero for > 1 ms, the audio is repeated and truncated as needed to make it play for that length.
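
Two of the ideas above can be sketched with NumPy: an 8-bit PWM value as a duty cycle, and repeating then truncating an audio clip to a requested length. Both functions are hypothetical illustrations under those assumptions, not the code that drives the hardware.

```python
import numpy as np


def duty_cycle(pwm_value):
    """Fraction of each PWM period that the output is on, for an 8-bit value."""
    return pwm_value / 255


def fit_audio(samples, n_samples):
    """Repeat the clip end to end, then cut it to exactly n_samples."""
    return np.resize(np.asarray(samples), n_samples)


clip = np.array([1, 2, 3])  # stand-in for real audio samples
print(fit_audio(clip, 7))   # longer than the clip: repeated, then cut
print(fit_audio(clip, 2))   # shorter than the clip: just cut
print(duty_cycle(255), round(duty_cycle(100), 3))
```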

##### "Insight"

How to find out what's in a battery, beyond just the plot:

In [None]:
# Get an "app frame" (application-of-stimuli dataframe):
frames = quick.app("standard :: flames")
frames

In [None]:
# Now, select only the frames for one stimulus (here, the red LED)
redleds = frames.by_stimulus("red LED")
redleds

In [None]:
# See what's in it
redleds.insight()

##### Template assays

You can also see what the original definition was for an assay.  
(This might not be defined for legacy assays.)


In [None]:
pd.DataFrame([
    pd.Series(row.get_data())
    for row in TemplateStimulusFrames
    .select(TemplateStimulusFrames, TemplateAssays, Assays)
    .join(TemplateAssays).join(Assays)
    .where(Assays.name == "xxxxx")
])

### Layouts

### t-SNE / PCA / UMAP / filtering

##### Quick-and-dirty way

In [None]:
transformed, figure = quick.transform(df, transform=WellTransforms.tsne(), path_stub="my_tsne")
# Outputs a plot, etc.
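
For intuition about what a transform produces, here is a minimal PCA sketch in NumPy. PCA is used because it fits in a few lines; this is not what `WellTransforms.tsne()` computes, and `pca_2d` and the synthetic data are assumptions for illustration.

```python
import numpy as np


def pca_2d(features):
    """Project rows of `features` onto their first two principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
wells = rng.normal(size=(12, 50))  # 12 wells x 50 time points (synthetic)
embedded = pca_2d(wells)
print(embedded.shape)
```

Each well ends up as one point in two dimensions, which is what gets plotted.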

### Classification

##### Quick-and-dirty way

This will train a multiclass model to predict the labels.
It will then save the model and figures.

In [None]:

quick.classify(df, "my_output_dir")

# If you want to change hyperparameters:
#quick.classify(df, "my_output_dir", model_fn=WellClassifiers.forest(n_estimators=800))

##### A la carte way

In [None]:
# Make a type of classifier based on a random forest
# We're really just setting hyperparameters here
my_forest = WellClassifiers.forest(n_estimators=1000, n_jobs=2)

# Make a new forest of that type
forest = my_forest.build()

# Train it
# This will give nice output and check things
forest.train(df)

# Save it and metadata in a file alongside it
# The metadata includes hyperparameters and exactly which wells were used to train
forest.save("a_classifier.h5")


In [None]:

# Try testing on it
# It warns that we're testing on wells we already trained with
forest.test(df)


In [None]:
# Because it's a random forest, it has a training decision function
# This reports back the out-of-bag predictions on the training data
decision_function = forest.training_decision()
decision_function

In [None]:
# We can summarize it in terms of accuracy
acc = decision_function.accuracy()
acc

In [None]:
# We can get a confusion matrix from that decision function
confusion = decision_function.confusion()
confusion

In [None]:
# Apply confusion matrix ordering (CMO) to sort the labels to maximize block-diagonals
confusion = confusion.sort()
confusion
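
For reference, the accuracy and confusion matrix used above can be computed from true and predicted labels as in the sketch below. It uses made-up labels and plain NumPy; the function names are hypothetical and the label-sorting step is not reproduced here.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, labels):
    """Count predictions; rows are true labels, columns are predicted labels."""
    position = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[position[true], position[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of wells on the diagonal, i.e. predicted correctly."""
    return np.trace(matrix) / matrix.sum()


labels = ["solvent", "drug A", "drug B"]
true = ["solvent", "solvent", "drug A", "drug A", "drug B", "drug B"]
pred = ["solvent", "drug A", "drug A", "drug A", "drug B", "solvent"]
matrix = confusion_matrix(true, pred, labels)
print(matrix)
print(accuracy(matrix))
```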

### Phenosearch

Finding similar phenotypes



##### The low-level way


In [None]:
# Get our query data
source = quick.df(7112).with_all_compound(55).mean(numeric_only=True)

# We DEFINITELY need to only include those with the same battery
my_stats = {}
for run in quick.query_runs(Projects.id==1, Batteries.name=="standard :: flames"):
    for well in Wells.select().where(Wells.run==run):
        target = WellFrameBuilder.well(well)
        # Compute my statistic here:
        my_stats[well.id] = calculate_statistic(source, target)
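
`calculate_statistic` is left undefined in the cell above. One possible choice, sketched here under the assumption that both inputs have been reduced to 1-D feature arrays, is the negative mean absolute difference. This is an illustrative stand-in, not a function provided by Chemfish.

```python
import numpy as np


def calculate_statistic(source, target):
    """Negative mean absolute difference; 0 is a perfect match, lower is worse."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    n = min(len(source), len(target))  # compare only the overlapping frames
    return -np.abs(source[:n] - target[:n]).mean()


print(calculate_statistic([1, 2, 3], [1, 2, 3]))
print(calculate_statistic([0, 0], [1, 3, 5]))
```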

##### The better way

Use `HitSearch`

In [None]:
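# NOTE: `prototypical` must already be defined; it is the reference feature array to compare against
search = (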
    HitSearch()
    .set_feature(FeatureTypes.cd_10_i)
    .set_primary_score(lambda arr, well: -np.abs(arr - prototypical).mean())
    .add_secondary_score("mean", lambda arr, well: np.mean(arr))
    .where(Experiments.id == 12)
    .set_save_every(10)
)
hits = search.search("query_results.csv")  # type: HitFrame
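
To see what the scoring functions do on their own, the sketch below ranks a few made-up wells by the same primary score and reports the mean as a secondary score. It assumes higher primary scores mean better hits; the well IDs and arrays are invented, and this does not reproduce how `HitSearch` queries or stores results.

```python
import numpy as np

prototypical = np.array([0.0, 5.0, 1.0])  # hypothetical reference trace


def primary(arr):
    """Same form as the primary score above: negated mean absolute difference."""
    return -np.abs(arr - prototypical).mean()


# Invented well IDs mapped to invented feature arrays
candidates = {
    101: np.array([0.0, 5.0, 1.0]),
    102: np.array([3.0, 3.0, 3.0]),
    103: np.array([0.0, 4.0, 1.0]),
}
ranked = sorted(candidates, key=lambda well_id: primary(candidates[well_id]), reverse=True)
for well_id in ranked:
    arr = candidates[well_id]
    print(well_id, round(primary(arr), 3), round(float(np.mean(arr)), 3))
```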

### Advanced caching

### Sensor data

### "Concerns"

### Advanced WellFrames


### Dose-response plots

### chemfish_rc

### Iterating case-control comparisons

### Videos

### Mandos

### Auto-analysis