# Insights Core Datasource Registry
This notebook shows the recommended method of writing datasources.

It also shows how to register datasources with Insights Core to provide alternative methods of collection while still taking advantage of our parser and combiner catalog.

It assumes familiarity with datasources from the *Standard Components* section of the *Insights Core Tutorial*.

In [1]:
import sys
sys.path.append("..")

In [2]:
from insights import run
from insights.core import dr
from insights.core.spec_factory import simple_file, simple_command

## Fixing datasource names
The simplest way to define a datasource is with a helper function from `insights.core.spec_factory`.

However, if you use one of these to define a datasource at the module level, you'll notice that it doesn't have a very useful name.

In [3]:
hosts = simple_file("/etc/hosts")
print(dr.get_name(hosts))

insights.core.spec_factory.inner


We can fix that by including it in a subclass of `insights.core.spec_factory.SpecSet`.

**This is the recommended way of writing datasources and is required if you want to serialize and reconstitute them later.**

In [4]:
from insights.core.spec_factory import SpecSet

class MySpecs(SpecSet):
    hosts = simple_file("/etc/hosts")

print(dr.get_name(MySpecs.hosts))

MySpecs.hosts
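
To build intuition for why a module-level datasource ends up with a name like `inner` and why placing it in a class fixes that, here is a minimal sketch. It is a toy model written for Python 3, not the Insights Core implementation: `NamingMeta`, `make_file_reader`, `get_name`, and `NamedSpecs` are all hypothetical names, and the assumption is simply that a metaclass can rename the functions it finds in a class body.

```python
class NamingMeta(type):
    """Toy metaclass: rename each function attribute after its class and attribute."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        for attr, value in namespace.items():
            if callable(value) and not attr.startswith("__"):
                value.__name__ = attr
                value.__qualname__ = "%s.%s" % (name, attr)


def make_file_reader(path):
    """Hypothetical stand-in for a factory helper; it returns a closure."""
    def inner():
        return path
    return inner


def get_name(func):
    """Hypothetical stand-in for a name lookup; reports the qualified name."""
    return func.__qualname__


# Defined at module level, the closure only knows the name it was created with.
anonymous = make_file_reader("/etc/hosts")
print(get_name(anonymous))  # make_file_reader.<locals>.inner


class NamedSpecs(metaclass=NamingMeta):
    hosts = make_file_reader("/etc/hosts")


# Inside the class, the metaclass has given it a stable, meaningful name.
print(get_name(NamedSpecs.hosts))  # NamedSpecs.hosts
```

A stable name of this kind is what makes it possible to refer to the same component again later, which is consistent with the serialization requirement noted above.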


## Making datasources dynamic
What if you have datasources on which many downstream components depend, and you want to provide different ways of collecting the data they represent? Maybe you want to execute a command in one context, but you want to read from a file in another. Parsers depend on a single datasource, but jamming multiple collection methods into a single implementation isn't attractive.

Instead, you can define a subclass of `insights.core.spec_factory.SpecSet` that has `insights.core.spec_factory.RegistryPoint` instances instead of regular datasources. Then you can provide implementations for the registry points in the form of datasources that are members of subclasses of your original class. This keeps the alternative implementations cleanly separated while allowing parsers to depend on a single component.

In [5]:
from insights.core.spec_factory import RegistryPoint
from insights.core.context import ExecutionContext, HostContext

# We'll use HostContext and OtherContext as our alternatives.

class OtherContext(ExecutionContext):
    pass

In [6]:
# Define the components that your downstream components should depend on.

class TheSpecs(SpecSet):
    hostname = RegistryPoint()
    fstab = simple_file("/etc/fstab", context=HostContext)


# Provide different implementations for hostname by subclassing TheSpecs and
# giving the datasources names that match their corresponding registry points.

class HostSpecs(TheSpecs):
    hostname = simple_command("/usr/bin/hostname", context=HostContext)

class OtherSpecs(TheSpecs):
    hostname = simple_file("/etc/hostname", context=OtherContext)
    
# Note that we don't, and in fact can't, provide an alternative for
# TheSpecs.fstab since it's not a RegistryPoint.

Downstream components should depend on `TheSpecs.hostname`, and the implementation that actually runs and backs that component will depend on the context in which you run.

In [7]:
results = run(TheSpecs.hostname, run_context=HostContext)
print(results[TheSpecs.hostname])

CommandOutputProvider("/usr/bin/hostname")


In [8]:
results = run(TheSpecs.hostname, run_context=OtherContext)
print(results[TheSpecs.hostname])

TextFileProvider("/etc/hostname")


In [9]:
results = run(TheSpecs.fstab, run_context=HostContext)
print(results[TheSpecs.fstab])

TextFileProvider("/etc/fstab")


`RegistryPoint` instances in `SpecSet` subclasses are converted to special datasources that simply check their dependencies and return the **last** one that succeeds. So, `TheSpecs.hostname` is just a datasource. When `HostSpecs` subclasses `TheSpecs`, the class machinery recognizes that `HostSpecs.hostname` is callable and has the same name as a `RegistryPoint` in an immediate superclass. When that happens, the subclass's datasource is added as a dependency of the superclass's datasource.

If the datasources in each subclass depend on different contexts, only one of them will fire. That's why when we ran with `HostContext`, the command was run, but when we ran with `OtherContext`, the file was collected.

Notice that the `TheSpecs.fstab` datasource can be run, too. If a subclass had provided a datasource of the same name, it would not have been registered with the superclass but would instead have stayed local to that subclass.
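
The registration behavior described above can be pictured with a small, self-contained toy model. This is only a sketch written for Python 3 and is not the Insights Core code: every `Toy*` name and the `only_in` helper are hypothetical, contexts are reduced to plain strings, and "succeeds" is modeled as "does not raise".

```python
class ToyRegistryPoint(object):
    """Toy placeholder that returns the last registered implementation that succeeds."""

    def __init__(self):
        self.implementations = []

    def __call__(self, context):
        result = None
        for impl in self.implementations:
            try:
                result = impl(context)  # a later success replaces an earlier one
            except LookupError:
                continue  # this implementation does not apply to the context
        return result


class ToySpecSetMeta(type):
    """Toy metaclass: register same-named callables with registry points in immediate bases."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        for attr, value in namespace.items():
            if not callable(value) or isinstance(value, ToyRegistryPoint):
                continue
            for base in bases:  # immediate superclasses only
                point = base.__dict__.get(attr)
                if isinstance(point, ToyRegistryPoint):
                    point.implementations.append(value)


def only_in(expected_context, value):
    """Build a toy datasource that succeeds only in one context."""
    def impl(context):
        if context != expected_context:
            raise LookupError("wrong context")
        return value
    return impl


class ToySpecs(metaclass=ToySpecSetMeta):
    hostname = ToyRegistryPoint()
    fstab = only_in("host", "file:/etc/fstab")  # a regular datasource


class ToyHostSpecs(ToySpecs):
    hostname = only_in("host", "command:/usr/bin/hostname")
    fstab = only_in("host", "file:/other/fstab")  # stays local to ToyHostSpecs


class ToyOtherSpecs(ToySpecs):
    hostname = only_in("other", "file:/etc/hostname")


print(ToySpecs.hostname("host"))   # command:/usr/bin/hostname
print(ToySpecs.hostname("other"))  # file:/etc/hostname
print(ToySpecs.fstab("host"))      # file:/etc/fstab
print(ToyHostSpecs.fstab("host"))  # file:/other/fstab
```

In this model, both `hostname` implementations end up registered with `ToySpecs.hostname`, while the `fstab` defined in `ToyHostSpecs` never touches `ToySpecs.fstab` because the latter is not a registry point.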

Note also that the datasources in the alternative implementation classes aren't special in any other way. You can run them directly, too, and components can depend on them if you want.

What happens if you have multiple subclass implementations for a given registry point, and more than one of them depends on the same context? All of the implementations for that context are evaluated, but only the result of the last one loaded is used, so import order matters! **However, there should be only one way to collect data for a given context anyway: that's part of the definition of a context.**
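
The effect of load order can be illustrated with a short, self-contained sketch. It is a toy model written for Python 3 under the assumption stated above (every implementation for the context is evaluated and the last successful result wins); `make_impl`, `resolve`, `run_in_load_order`, and the module labels are hypothetical names, not part of Insights Core.

```python
def make_impl(label, calls):
    """Build a toy implementation that records that it ran and returns its label."""
    def impl():
        calls.append(label)
        return label
    return impl


def resolve(implementations):
    """Evaluate every implementation and keep the last successful result."""
    result = None
    for impl in implementations:
        try:
            result = impl()
        except Exception:
            pass  # a failing implementation leaves the earlier result in place
    return result


def run_in_load_order(labels):
    """Register implementations in the given (imagined) import order and resolve them."""
    calls = []
    implementations = [make_impl(label, calls) for label in labels]
    winner = resolve(implementations)
    return calls, winner


# Both implementations run in each case; only the load order decides the winner.
print(run_in_load_order(["module_a", "module_b"]))  # (['module_a', 'module_b'], 'module_b')
print(run_in_load_order(["module_b", "module_a"]))  # (['module_b', 'module_a'], 'module_a')
```

The wasted work and the order sensitivity shown here are the practical reasons to keep a single implementation per context.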

## Registering implementations for standard datasources
**This use case should be very rare. In all cases but the simplest, you should know exactly how the registry system works before attempting it.**

With that said, providing alternative implementations for the standard Insights Core datasources is easy if you don't also need some of the default implementations. The datasources on which the core parsers depend are all defined as `RegistryPoint`s on the `Specs` class in `insights.specs`.

In [10]:
from insights.specs import Specs

class UseThisInstead(Specs):
    hostname = simple_file("/etc/hostname", context=OtherContext)
    
results = run(Specs.hostname, run_context=OtherContext)
print(results[Specs.hostname])
print(results.get(Specs.hosts))

TextFileProvider("/etc/hostname")
None


Notice that `Specs.hosts` didn't run! That's because we haven't loaded the module containing the default implementations, and we've only provided an implementation for `Specs.hostname`. Also, none of the defaults depend on `OtherContext` anyway.

**Taking advantage of some default implementations while overriding others is tricky.** A good rule of thumb: if you want to provide an implementation for one datasource, provide one for all of them. If you're providing command implementations, you should be providing one for all of them anyway. After all, the way a command is formatted and executed is directly tied to the `ExecutionContext` on which it depends.

What if you want to provide alternative implementations for all of the command datasources, but you'd like to reuse the file collection datasources just with a different file system root? You can do that by decorating your context class with `insights.core.context.fs_root`.

You also need to ensure that the default datasource implementations have been loaded.

In [11]:
from insights.core.context import fs_root
import insights.specs_default  # load the default implementations

@fs_root
class MyOtherContext(ExecutionContext):
    pass

class MoreSpecs(Specs):
    hostname = simple_file("/etc/hostname", context=MyOtherContext)

results = run(Specs.hostname, run_context=MyOtherContext)
print(results[Specs.hostname])

results = run(Specs.hosts, run_context=MyOtherContext)
print(results[Specs.hosts])

TextFileProvider("/etc/hostname")
TextFileProvider("/etc/hosts")
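
One way to picture why decorating a context lets the default file datasources fire for it is the toy model below. It is an assumption about the general idea, written for Python 3, and not a description of how `insights.core.context.fs_root` is actually implemented: `FS_ROOTS`, `toy_fs_root`, `toy_simple_file`, and the `Toy*` context classes are all hypothetical.

```python
import posixpath

# Hypothetical registry of context classes that expose a file system root.
FS_ROOTS = set()


def toy_fs_root(cls):
    """Toy class decorator: mark a context class as a file system root."""
    FS_ROOTS.add(cls)
    return cls


class ToyContext(object):
    def __init__(self, root="/"):
        self.root = root


@toy_fs_root
class ToyMountedContext(ToyContext):
    pass


class ToyPlainContext(ToyContext):
    pass


def toy_simple_file(path, context=None):
    """Toy file datasource: with no explicit context, accept any registered root."""
    def datasource(ctx):
        allowed = (context,) if context is not None else tuple(FS_ROOTS)
        if not isinstance(ctx, allowed):
            return None  # the datasource does not fire for this context
        return posixpath.join(ctx.root, path.lstrip("/"))
    return datasource


hosts = toy_simple_file("/etc/hosts")

print(hosts(ToyMountedContext("/mnt/image")))  # /mnt/image/etc/hosts
print(hosts(ToyPlainContext()))                # None
```

In this model the same file datasource is reused unchanged; only the root supplied by the decorated context differs.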


If you use an `@fs_root`-decorated context, you shouldn't provide implementations for any default file collection datasources in your subclass. If you do, *both* the default implementation and your implementation will run, even though downstream components will only see the last one that ran successfully.

For a similar reason, any command you override should depend on a context that differs from the one used by the default implementation. And because you need a different context to fire your alternative implementation, none of the other default command implementations will run in that context, so you'll need to override them, too.

Again, overriding only some of the datasources is tricky. If you need to override one, you should probably just override everything you're interested in.