# Sequential learning with CAMD

CAMD is a package designed to assist materials science researchers with *sequential learning*,
which we define as an iterative process of experimentation that improves knowledge or strategy with each iteration.

## Agents

In CAMD, hypothesis *Agents* are Python objects that select the candidates on which to perform experiments. Almost all of the "AI" components within CAMD, including ML algorithms, simpler regression, and even random selection, are implemented as logic inside Agents.


To implement a CAMD-compatible Agent, we subclass the *HypothesisAgent* abstract class. Python raises an error when we try to instantiate a subclass that does not implement every method the Agent needs in order to be compatible with the sequential learning process implemented in a CAMD *Campaign* (more on Campaigns later).

In [5]:
from camd.agent.base import HypothesisAgent

In [11]:
class LinearRegressionAgent(HypothesisAgent):
    # get_hypotheses is deliberately left unimplemented to trigger the error below
    pass

linear_agent = LinearRegressionAgent()

TypeError: Can't instantiate abstract class LinearRegressionAgent with abstract methods get_hypotheses
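The error above comes from Python's abstract base class machinery rather than from anything specific to materials science. The following is a minimal, self-contained sketch of the same mechanism. `HypothesisAgentSketch`, `IncompleteAgent`, and `RandomAgentSketch` are hypothetical stand-ins written for this illustration, not CAMD classes, and the `get_hypotheses(candidate_data, seed_data=None)` signature and the dict-based data are assumptions made to keep the example dependency-free.

```python
import random
from abc import ABC, abstractmethod


class HypothesisAgentSketch(ABC):
    """Hypothetical stand-in for an abstract agent base class (not CAMD's own)."""

    @abstractmethod
    def get_hypotheses(self, candidate_data, seed_data=None):
        """Return the subset of candidate_data to run experiments on."""


class IncompleteAgent(HypothesisAgentSketch):
    # No get_hypotheses here, so Python refuses to instantiate this class.
    pass


class RandomAgentSketch(HypothesisAgentSketch):
    """Selects up to n_query candidates uniformly at random."""

    def __init__(self, n_query=2, seed=None):
        self.n_query = n_query
        self._rng = random.Random(seed)

    def get_hypotheses(self, candidate_data, seed_data=None):
        # Assumption: candidate_data is a dict of {candidate_id: features}.
        ids = sorted(candidate_data)
        n = min(self.n_query, len(ids))
        chosen = self._rng.sample(ids, n)
        return {cid: candidate_data[cid] for cid in chosen}


# Instantiating the incomplete subclass fails with a TypeError.
try:
    IncompleteAgent()
    instantiation_error = None
except TypeError as exc:
    instantiation_error = exc

# The complete subclass can be instantiated and used.
candidates = {"a": [1.0], "b": [2.0], "c": [3.0]}
selected = RandomAgentSketch(n_query=2, seed=0).get_hypotheses(candidates)
```

Once `get_hypotheses` is defined, the subclass instantiates without complaint, which is the step a real Agent needs to complete before it can take part in a Campaign.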

### Exercise - Perform a 2-parameter linear regression
How do its selections from our dataset differ?

In [3]:
### Implement agent here

In [4]:
### Test agent here

## Experiments

In CAMD, *Experiments* are objects that generate new data for the candidates returned by the *Agent.get_hypotheses* method. In other words, *Agents* pick the candidates on which you want to do experiments, and *Experiments* actually do those experiments. As of today, only two experiments are implemented in CAMD. One is an AWS-based density functional theory computation on an input crystal structure. The other, which we'll demonstrate below, is an *after-the-fact sampler*, which fetches the result of an experiment we have already done that corresponds to the input.

Why is the ATFSampler useful? We'll discuss simulation in more detail in a bit, but in short, the ATFSampler lets us evaluate the performance of an Agent against results we already have, which helps when we're trying to pick the best Agent.
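The idea of after-the-fact sampling can be sketched without CAMD at all. The class below is a hypothetical illustration, not the `ATFSampler` API: the name `AfterTheFactSamplerSketch`, its `run` method, and the dict of stored values are all assumptions chosen to show the lookup behaviour.

```python
class AfterTheFactSamplerSketch:
    """Looks up stored results instead of running new experiments."""

    def __init__(self, known_results):
        # known_results: {candidate_id: measured value}, gathered beforehand.
        self.known_results = dict(known_results)

    def run(self, selected_ids):
        # Fail loudly if asked for a candidate that was never measured.
        missing = [cid for cid in selected_ids if cid not in self.known_results]
        if missing:
            raise KeyError(f"No stored result for: {missing}")
        return {cid: self.known_results[cid] for cid in selected_ids}


# Made-up values, used only to exercise the lookup.
known = {"a": 0.12, "b": -0.40, "c": 0.05}
sampler = AfterTheFactSamplerSketch(known)
results = sampler.run(["b", "c"])
```

Because every "experiment" is just a lookup, the same dataset can be replayed against different Agents cheaply, which is what makes this kind of sampler convenient for comparing them.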

In [5]:
from camd.experiment.base import ATFSampler

In [6]:
experiment = ATFSampler(dataframe=dataframe)

NameError: name 'dataframe' is not defined

In [7]:
experiment.blahblahblah

NameError: name 'experiment' is not defined

## Analyzers

**Analyzers** are a bit tricky to explain because they're not necessary for every sequential learning process, so we're not going to spend much time on them here. In short, after you've performed an experiment, you sometimes want to postprocess the data. An Analyzer does this for two purposes: to summarize the results of the current iteration, and to augment the **seed data** that gives the **Agent** the information it needs to decide which candidates to select for further experiments.

In [8]:
from hackathon.helper import SimpleAnalyzer

ImportError: No module named 'hackathon'
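Since the helper module is not available here, the two jobs of an Analyzer described above (summarize the iteration, augment the seed data) can be illustrated with a plain function. This is a hedged sketch and not the `SimpleAnalyzer` from the import above: `analyze_sketch`, its `threshold` parameter, and the summary keys are hypothetical names invented for this example.

```python
def analyze_sketch(seed_data, new_results, threshold=0.0):
    """Merge new results into the seed data and summarize the iteration.

    Assumption: both arguments are dicts of {candidate_id: measured value}.
    """
    # Augment the seed data without mutating the caller's dict.
    updated_seed = dict(seed_data)
    updated_seed.update(new_results)

    # Summarize what this iteration contributed.
    summary = {
        "n_new": len(new_results),
        "n_new_below_threshold": sum(v < threshold for v in new_results.values()),
        "n_seed_total": len(updated_seed),
    }
    return summary, updated_seed


# Made-up values, used only to exercise the function.
seed = {"a": 0.12}
summary, seed = analyze_sketch(seed, {"b": -0.40, "c": 0.05})
```

Returning the updated seed data alongside the summary keeps the function free of side effects, so the caller decides what the Agent sees in the next iteration.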

## Data, Campaigns, and Simulations

## Final thoughts

## Glossary
* **Agent** - decision-making object in CAMD; must implement `get_hypotheses` in order to work properly in the loop.
* **Experiment** - object which performs some action in order to determine unknowns about an input dataset.
* **Analyzer** - object which postprocesses experimental outputs and prior seed data in order to provide new seed data.
* **seed_data** - data which is "known" either before the start of a given **Campaign** or prior to any iteration. It is the data the **Agent** uses to decide how to select from the **candidate data**.
* **candidate_data** - data which represents the information about the set of "unknowns" at a given point in time for a **Campaign**.
* **Campaign** - the iterative procedure by which an **Agent** suggests experiments from the **candidate data**, the **Experiment** performs them, and the **Analyzer** analyzes them and feeds new **seed data** and a new set of **candidate data** back to the **Agent** to start a new iteration.
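The loop described in the **Campaign** entry can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not CAMD's Campaign implementation: `run_campaign_sketch` and its parameters are hypothetical, the agent is simple random selection, the experiment is an after-the-fact lookup, and "best" assumes that lower values are better.

```python
import random


def run_campaign_sketch(all_results, seed_ids, n_query=2, max_iterations=10, rng_seed=0):
    """Iterate agent -> experiment -> analyzer until candidates run out."""
    rng = random.Random(rng_seed)
    seed_data = {cid: all_results[cid] for cid in seed_ids}
    candidate_ids = sorted(set(all_results) - set(seed_ids))
    history = []

    for iteration in range(max_iterations):
        if not candidate_ids:
            break
        # Agent: choose which candidates to test (random selection here).
        chosen = rng.sample(candidate_ids, min(n_query, len(candidate_ids)))
        # Experiment: after-the-fact lookup of results we already have.
        new_results = {cid: all_results[cid] for cid in chosen}
        # Analyzer: fold results into the seed data and record a summary.
        seed_data.update(new_results)
        candidate_ids = [cid for cid in candidate_ids if cid not in new_results]
        history.append({
            "iteration": iteration,
            "n_seed": len(seed_data),
            "best": min(seed_data.values()),  # assumption: lower is better
        })

    return seed_data, history


# Made-up values, used only to exercise the loop.
all_results = {"a": 0.12, "b": -0.40, "c": 0.05, "d": 0.30, "e": -0.10}
final_seed, history = run_campaign_sketch(all_results, seed_ids=["a"])
```

Swapping the random selection step for a smarter agent while keeping the rest of the loop fixed is one way to compare agents on the same historical data.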