# VISOR admin programming manual and recipes
----
###  michael st. clair 
### v0.3a -- 2025-02-21

This Jupyter Notebook discusses the structure of the VISOR backend and methods for modifying its contents using Python scripts. The code cells in this notebook
are intended both as illustrative examples and as useful 'recipes' that
can be minimally modified and incorporated into other notebooks or scripts to perform
administrative tasks.

*This is a preliminary version of this document intended for internal
operations. Please do not publicly distribute.*

### usage notes
----
* Always launch this notebook with ```python manage.py shell_plus --notebook```. Otherwise, Django won't get to run its setup scripts, and you will get errors when you try to import models or alter the database.
* Many of these cells create huge amounts of output. To collapse a cell's output, double-click the gutter to the left of the cell. To erase it entirely, go to the 'Cell' menu and choose 'Current Outputs -> Clear', or 'All Outputs -> Clear' to remove the output of every cell.
* This notebook assumes that it is being run in the root directory of the application. If you move it somewhere else, or use the code snippets in files located elsewhere, you may have to adjust paths.

### imports
----
Run this next cell if you want the code to function.

In [8]:
from ast import literal_eval
from functools import partial, reduce
import json
from operator import or_
import os
import random
import re
import math

import django
from django.conf import settings
from django import forms
import numpy as np
import pandas as pd

django.setup()

from recipes import samples
from visor.dj_utils import are_in, djget, eta, fields
from visor.io.handlers import ingest_sample_csv
from visor.models import Database, Library, Sample, SampleType
from visor.spectral import make_filterset

print("Success")

# the examples in this notebook don't do risky async stuff to the database.
# they all work the exact same way on the backend as the admin console. however, 
# ipython/jupyter wraps itself in an event loop that looks scary to django. this 
# environment variable tells django to calm down about it.
os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "true"

Success


# I: database structure
----

## I.1: basics and location

VISOR is backed by a SQLite database. This database is entirely contained
in one file: db.sqlite3. **Keep several backups of this file outside of the
working tree of the application.** This lets you freely experiment with the
database. If you do anything horrible to it, you can immediately repair it by
overwriting the file in the application directory with one of these backups. The only entry-specific items that are not stored in this database are image files (links to the images are stored, but not files themselves, because filesystems are better than databases at storing files).
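A minimal sketch of a timestamped backup, using only the standard library. It uses `sqlite3.Connection.backup` rather than a plain file copy so that the snapshot stays consistent even if the application is writing at the time. The function name `backup_database` and the destination directory are placeholders chosen for this example, not part of VISOR.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_database(db_path, backup_dir):
    """Copy a SQLite database into backup_dir under a timestamped name."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    target = backup_dir / f"{Path(db_path).stem}_{stamp}.sqlite3"
    source = sqlite3.connect(db_path)
    destination = sqlite3.connect(target)
    try:
        # Connection.backup copies a consistent snapshot of the source
        source.backup(destination)
    finally:
        destination.close()
        source.close()
    return target
```

Called as `backup_database("db.sqlite3", some_directory_outside_the_working_tree)`, it returns the path of the new backup file.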

### I.1.a: working live or offline

Because it's easy to copy the database file, you can also work in a separate development directory, or on a totally different machine with a version of the software running locally. On the other hand, because it's easy to fix mistakes, it's also pretty safe to work on the live version. You can even swap the database file out while the application is running. Users will only notice if they make queries while the file is in the middle of being overwritten. 

## I.2: django and models

VISOR primarily uses the Python framework Django to interact with the
database. Django calls SQL tables -- which are essentially big spreadsheets
stored inside the database -- "models".
There are five important models / tables / spreadsheets in VISOR proper:

* ```Sample``` (individual samples)
* ```Database``` (origin databases, like ASTER/ECOSTRESS)
* ```FilterSet``` (definitions for sets of filters, like Mastcam-Z's, used
    for generating simulated reflectance curves)
* ```Library``` (application- or team-specific groups of samples, and maybe
    other things later -- this is fully functional, but not currently populated)
* ```SampleType``` (top-level physical categories of sample, like minerals or
    coatings -- this is again fully functional, but only skeletally populated)

*Note: while you probably don't want to interact with them from the Python
shell, admin tables, including users and their access information, are also
stored in the database. So, for instance, if you roll back to an earlier
version of the database after changing a user's password but before making a
new database backup, that password will be reset to the earlier version.*

# II: searching the database
----

## II.1: searching the database

VISOR includes some helper functions that make Django easier to use. By default, Django uses "querysets" to interact with tables. Querysets are powerful, but use a custom syntax that combines SQL queries and Python. This syntax is sometimes awkward and rarely looks like idiomatic Python. This section shows how to define search functions that are quicker to learn and use.

### II.1.a: custom search functions

The next cell defines a simple search function ```samples``` that looks for
samples that contain a particular value anywhere in a particular field, case-insensitive.
 
*Note: ```samples``` is also included in the recipes.py module, but
manipulating the code in the cell below will allow you to define different versions of
it.*

The syntax is simply: ```samples(value, field)```; it returns a ```QuerySet```
(which can mostly be treated as a Python list) of all samples containing that value in that field.

Other useful values for ```querytype``` in cousins of ```samples``` include
'lt' or 'gt' (less/greater than) or 'iexact' (exact match). dropping the
leading 'i' makes the search case-sensitive.
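For orientation, the sketch below shows how a field name and a `querytype` combine into a Django lookup keyword, which is what a vanilla call such as `Sample.objects.filter(sample_name__icontains='hematite')` spells out by hand. `lookup_kwargs` is a hypothetical helper written for this illustration; it is not part of `visor.dj_utils`, and `djget` may build its queries differently.

```python
def lookup_kwargs(value, field, querytype="icontains"):
    """Build the keyword-argument dict for a Django filter() call."""
    # Django joins the field name and the lookup type with a double underscore
    return {f"{field}__{querytype}": value}


print(lookup_kwargs("hematite", "sample_name"))
print(lookup_kwargs("RELAB", "origin__name", "iexact"))
print(lookup_kwargs(0.5, "min_reflectance", "gt"))
```

In a Django shell, the result would be unpacked into a queryset call, for example `Sample.objects.filter(**lookup_kwargs("hematite", "sample_name"))`.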

The names VISOR prints for samples in Python shell / notebook are formatted like this:

```sample name + _ + sample id (in database of origin) + _ + database-of-origin short name```

In [None]:
# define partially-evaluated convenience function
get_contains = partial(
    djget, 
    model=Sample, 
    value = "",
    # the field value is model-specific! you can omit it if you don't want to
    # use the shortened call types discussed in II.1.c
    field = "sample_name", 
    querytype='icontains'
)
# reorder arguments to prevent collisions
samples = eta(get_contains, "value", "field")

### II.1.b: fetch all hematites in the database and look at 5 of them

In [None]:
# note: the vanilla django equivalent to the next line is: 
# hematites = Sample.objects.filter(sample_name__icontains='hematite')
hematites = samples('hematite', 'sample_name')
random.choices(hematites, k=5)

### II.1.c: check total number of samples, or of a subset 

In [None]:
# samples() looks in the sample_name field by default.
# called with no arguments, it returns all values in the model.
# note: the vanilla django way to do that is to call Sample.objects.all().

len(samples("smectite")), len(samples()), len(Sample.objects.all())

## II.2: fields and values of models

There are lots of ways to get fields and field values from the database. See the next few cells for some examples.

### II.2.a: get every field of a model

In [None]:
fields(Sample), fields(Database)

### II.2.b: get values of a particular field from instances of a model

In [None]:
random_sample = random.choice(samples())
print(random_sample.sample_name)
smectites = samples("smectite")
print([
    sample.id for sample in smectites
])

### II.2.c: get unique values of a field, ordered alphabetically

In [None]:
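# values_list returns one-element tuples, so take the first item of each.
# skip missing values, which can't be sorted alongside strings
grain_sizes = [
    row[0] for row in
    set(samples().values_list('grain_size'))
    if row[0] is not None
]
grain_sizes.sort()
grain_sizes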

## II.3: related model fields

Depending on context, you access fields from a different model by:
* using chained dots (like: ```model.other_model.field```)
* separating the related field name and the field you want from the
    other model by a double underscore (like: ```samples(value, "other_model__field")```)

### II.3.a: learn about a sample's database of origin

In [None]:
random_sample = random.choice(samples())
print(random_sample.origin.name) # full name of that sample's database of origin
print(random_sample.origin.url) # url for that sample's database of origin
# is that sample in the group of all samples whose databases of origin have that 
# full name? (hopefully yes, or something is very wrong) 
print(random_sample in samples(random_sample.origin.name, "origin__name"))

## II.4: interpreting Sample model fields

There are a *lot* of fields on the ```Sample``` model, and most of them are empty for most samples. This is because the table is intended to support content ingested from a bunch of different databases, each of which has its own metadata standard. So, for instance, while we'd like to retain information about resolution if it's available in an input database, most of our input databases don't provide resolution values in their metadata. The fields you can expect to be on every or almost every sample are:
* sample_name (Name of the sample from the original database, like "Talc")
* sample_id (ID of the sample from the original database, retained for
traceability)
* id (unique ID number in VISOR, also known as a database primary key or PK)
    * bear in mind that because a primary key is how a database distinguishes
    objects, changing a sample's id field makes it a whole new entry
* date_added (last modification date of the sample)
* min_reflectance (minimum wavelength in the reflectance array)
* max_reflectance (maximum wavelength in the reflectance array)
* origin (database of origin -- this is an instance of the ```Database```
    model)
* released (has the sample been released to the public?)
* reflectance (reflectance array flattened into a simple string) 
* simulated_spectra (dictionary of ```pandas DataFrames``` giving simulated
    reflectance arrays flattened into a json string)

### II.4.a: get a random sample and look at all its fields

You can use the ```as_dict()``` method of a ```Sample``` object to get most
things about it in a ```dict``` -- note that the flattened reflectance and
simulated_spectra fields aren't very readable! See the next few cells for
 ways to interpret them as ```numpy``` arrays and ```pandas``` dataframes.

In [None]:
random_sample = random.choice(samples())
random_sample.as_dict()

### II.4.b: look at properties of that sample's reflectance data

In [None]:
reflectance = np.array(literal_eval(random_sample.reflectance))
print(reflectance[0:10, 0]) # first 10 wavelength values of spectrum
print(np.median(reflectance,0))  # median wavelength and reflectance of spectrum 
print(reflectance[:,1].mean()) # mean reflectance of spectrum
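The flattened `reflectance` string can also be unpacked without `numpy`. The sketch below assumes, consistent with the indexing in the cell above, that the string is a Python-literal list of `[wavelength, reflectance]` pairs. The values in `example_reflectance` are made up for illustration.

```python
from ast import literal_eval
from statistics import mean

# made-up stand-in for the contents of a sample's reflectance field
example_reflectance = "[[400.0, 0.12], [405.0, 0.14], [410.0, 0.19]]"

pairs = literal_eval(example_reflectance)
wavelengths = [pair[0] for pair in pairs]
values = [pair[1] for pair in pairs]

print(min(wavelengths), max(wavelengths))  # wavelength range
print(mean(values))  # mean reflectance
```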

### II.4.c: look at a simulated spectrum for that sample

In [None]:
sim_zcam = pd.DataFrame(
    json.loads(literal_eval(random_sample.simulated_spectra)['Mastcam-Z'])
)
sim_zcam # dataframe containing simulated values for Mastcam-Z

# III: manipulating database entries
----
## III.1: field assignment and model entry updates

Similar methods can be used to modify entries in the database. The easiest way is to assign values directly to fields of a model instance (like an individual sample). This is useful if you want to quickly modify items without using the admin console.

**Important:** updating the fields of a model instance in memory **does not** automatically change it in the database. After modifying a model instance, call its ```clean()``` method to validate the changed data and its ```save()``` method to record the updated version in the database. Some other things only happen when you call ```save()```, generally things that require comparisons with other values in the database. For instance, simulated spectra are generated at that point for samples, and a model instance that doesn't have an id / primary key gets assigned one.

### III.1.a: change an in-memory sample without saving it

In [None]:
sample = samples()[0]
sample.sample_name = sample.sample_name + "_TEST"
print(sample.sample_name) # great! working great, right? the sample is updated!
sample = samples()[0]
print(sample.sample_name) # aww...no, the sample wasn't updated.

### III.1.b: make a test version of a sample and save it in the database

In [None]:
sample = samples()[0]
# remember that changing id / primary key makes something a "new" object from the 
# database's perspective; delete id so we don't overwrite the real sample
sample.id = None
# change its sample_name and sample_id fields to distinguish it
sample.sample_name = sample.sample_name + "_TEST"
sample.sample_id = sample.sample_id + "_TEST" 
sample.released = False # don't show visitors our silly test sample
sample.clean() # validate sample fields
sample.save() # save it in the database
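# is it there, and different from the original? hopefully.
# (fetch by exact id: the default 'icontains' search in samples() would also
# match any other id that contains the same digits)
samples()[0], Sample.objects.get(id=sample.id)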

### III.1.c: modify and save the test sample

In [None]:
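# saving a model instance _without_ changing its id modifies the existing entry
# rather than creating a new one.
# fetch by exact id: the default 'icontains' search in samples() would also
# match any other id that contains the same digits
test_sample = Sample.objects.get(id=sample.id)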
print(test_sample.sample_name)
test_sample.sample_name = "Terrible Rock"
test_sample.clean()
test_sample.save()
print(test_sample.sample_name)

## III.2: deleting model instances

You can delete a database entry simply by calling its ```delete()``` method.
Note that if other entries link to it -- for instance, a database of origin that is listed in many samples -- you won't be able to delete it while those other entries
still exist in the database.

### III.2.a: delete test sample

In [None]:
# Demo:
# it's probably better if we don't keep this Terrible Rock in the database 
# (see preceding section if you didn't make a Terrible Rock.)

terrible_rock = samples("Terrible Rock", "sample_name")[0]
terrible_rock.delete()
samples("Terrible Rock", "sample_name")

In [11]:
# Delete individual file from Spectrum ID
bad_file = samples("Pct_1970_3_whitem_gar", "sample_id")[0]
bad_file.delete()
samples("Pct_1970_3_whitem_gar", "sample_id")

<QuerySet []>

## III.3: bulk modification

These techniques can be combined with standard Python control structures to
change many items at once. Most of these examples are 'disarmed', with their
```save``` or ```delete``` calls commented out. **Make sure you back the
database up first if you arm and run them!** Running these without
saving samples but leaving ```print()``` statements in acts as a 'dry run', and is
very useful to verify that your changes are good before you commit them.


### III.3.a: reprocess every sample in the database

You might want to do this if you need to recalculate simulated spectra values
because you've added new filtersets, or if you suspect that some malformed
entries snuck in to the database and you'd like to reprocess entries one-by-one
to find them.

*Note: At current database size, assuming everything processes cleanly, this
will probably take between half an hour and an hour and a half depending on
operating environment (primarily single-core speed and secondarily
disk throughput). You might want to add a progress timer or something.
Also, this one is mostly harmless -- if everything
is ok with a sample, it will just save it back to the database unchanged.*

In [None]:
for ix, sample in enumerate(samples()):
    # good to know in case it hits something bad and crashes -- 
    # you have the name, id, and index (list position) of the sample to investigate
    print(ix, sample.sample_name, sample.id) 
    sample.clean()
    sample.save()
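One way to add the progress timer mentioned in the note above is a small generator that wraps any iterable and periodically prints elapsed time and a rough estimate of time remaining. This is a sketch using only the standard library; `with_progress` and its parameters are names invented here.

```python
import time


def with_progress(items, total, every=100):
    """Yield (index, item) pairs, printing progress every `every` items."""
    start = time.monotonic()
    for ix, item in enumerate(items):
        yield ix, item
        done = ix + 1
        if done % every == 0 or done == total:
            elapsed = time.monotonic() - start
            # estimate remaining time from the average time per item so far
            remaining = elapsed / done * (total - done)
            print(f"{done}/{total} done, {elapsed:.0f}s elapsed, ~{remaining:.0f}s left")
```

In the reprocessing loop it would take the place of `enumerate`, for example `for ix, sample in with_progress(samples(), len(samples())):`.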

### III.3.b: find all samples without a sample name and assign placeholders

Let's say some samples don't have a sample name, either because of an accidental omission in the source database or an unusual metadata convention that wasn't caught when scraping / importing from that database. Let's look at all those samples and assign sample names from their composition values. Also, let's check what databases they're from so that we can diagnose that problem.

Using the `bulk_update` function makes this much much faster -- but note that when you do that, it only modifies
metadata, so if something else about the sample is mangled, it's better to call `save`.

In [None]:
unnamed = []
for sample in samples("", querytype="iexact"):
    if sample.composition:
        sample.sample_name = sample.composition
        print(sample.sample_name + " from " + sample.origin.name)
        unnamed.append(sample)
Sample.objects.bulk_update(unnamed, ['sample_name'])

### III.3.c: standardize unit names across the database

There are some samples in the database that give grain size in micrometers as 'um' and some that give it as 'microns'. Also, some samples have spaces between SI unit abbreviations and numerals, and some don't. Let's say you'd like to regularize this to always use 'um' for micrometers and never put spaces between numerals and abbreviations. This replacement may be too crude, so we include print statements to check the results before saving.

In [None]:
has_si_units = are_in(["cm", "mm", "nm", "um"], or_)
standardized = []
for sample in samples():
    if not sample.grain_size:
        continue # don't bother doing anything if there's no grain size metadata
    
    replaced = sample.grain_size.replace("microns", "um")
    # we don't want to remove spaces in phrases that don't contain si units.
    # check the replaced text so that values converted from 'microns' count too
    if has_si_units(replaced):
        replaced = replaced.strip().replace(" ", "")
    # nothing happened, move on
    if sample.grain_size == replaced:
        continue
    print(f"original: {sample.grain_size}; reformatted: {replaced}")
    sample.grain_size = replaced
    # bulk_update needs the model instances, not the replacement strings
    standardized.append(sample)
# if you're happy with everything:
# Sample.objects.bulk_update(standardized, ['grain_size'])
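If removing every space turns out to be too crude, a narrower alternative is to delete only the whitespace that sits between a numeral and a unit abbreviation. The sketch below is a standalone variant of the cell above, not the method VISOR uses; `standardize_grain_size` is a name chosen for this example, and it can be tried on plain strings before touching the database.

```python
import re

# a digit, then whitespace, then one of the unit abbreviations as a whole word
_UNIT_GAP = re.compile(r"(\d)\s+(cm|mm|nm|um)\b")


def standardize_grain_size(text):
    """Use 'um' for microns and drop the space between a numeral and its unit."""
    replaced = text.strip().replace("microns", "um")
    return _UNIT_GAP.sub(r"\1\2", replaced)


for example in ["45 microns", "10 - 45 microns", "125um", "coarse sand"]:
    print(repr(example), "->", repr(standardize_grain_size(example)))
```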

### III.3.d: mark every sample from a particular origin as released

By default, VISOR treats new samples as private -- specifically, their "released" field is set to ```False```, and only users logged in as admins can view them. You might use a command like the following when you're done QAing all the samples from a new source and you'd like to release them all to the public -- or if you've reingested all the samples for some reason and immediately want to mark them as released.

In [7]:
release_samples = samples("RELAB", "origin__name")
unreleased = []
for sample in release_samples:
    sample.released = True
    unreleased.append(sample)
Sample.objects.bulk_update(unreleased, ['released'])

32881

### III.3.e: assign all samples listed in an external file to a custom library 
You've received suggestions, given as sample IDs, to add to a custom library, and you've compiled all of those suggestions into a text file with one sample on each line. This assigns every sample in that file to a custom library. The Library model is different from other models because it has a "many-to-many" relationship with samples. The method for associating a sample and a library is therefore a little different.

In [None]:
# generate the new library
test_library = Library(name='test library')
test_library.clean()
test_library.save()

# read file in and split it line-by-line into a list
with open("tests/custom_ids.txt") as library_entry_file:
    library_ids = library_entry_file.read().splitlines()
# get all the samples matching these ids and squeeze them into a single queryset 
library_samples = reduce(or_, [
    samples(library_id, 'sample_id') for library_id in library_ids
])

# because a single sample can belong to many libraries, you can't add a sample
# to a library through direct assignment. instead use the add method of sample.libraries:
for sample in library_samples:
    sample.libraries.add(test_library)

# you can check samples in a library using the Library.sample_set.all method:

print(test_library.sample_set.all())

# clean up
for sample in library_samples:
    sample.libraries.remove(test_library)
    sample.clean()
    sample.save(convolve=False)

test_library.delete()

### III.3.f: add an image to a sample

You can assign either a path to a JPEG file or in-memory image data (as a PIL.Image object) to a sample's 'image' field. When you save the sample, the image gets moved to the VISOR image directory, thumbnailed, and linked to the database entry.

In [None]:
# make a test version of a sample
sample = samples()[0]
sample.id = 100000000000
sample.sample_id = "TEST"
sample.image = 'tests/test_rock.jpg'
sample.clean()
sample.save()
print(sample.image)
sample.get_image() # displays image in jupyter. clunky but fine. 

In [None]:
# clean this test sample up 
# Note / TODO: we don't currently delete images along with samples. this is a 
# way to do so, but I'm probably going to add some sort of automatic cleanup after 
# we're more certain about how we're going to use images in the application. --michael

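# delete image (str() covers the case where the image field holds a file object
# rather than a plain path string)
os.remove(os.path.join(settings.SAMPLE_IMAGE_PATH, str(sample.image)))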
# delete sample database entry
sample.delete()

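### III.3.g: update field

The cells below clean up fields on samples from RELAB. The first replaces NaN placeholders with empty values. The second sets grain size to "Whole Object" for RELAB samples whose grain size is unknown and whose sample name contains the word chip, slab, or rock. The third resets zero or unparseable grain sizes to "unknown".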

In [2]:
print("Start")

# use a name other than `samples` so the search helper from II.1.a isn't overwritten
relab_samples = Sample.objects.filter(origin__name="RELAB").iterator()

updated_count = 0

for sample in relab_samples:
    updated = False
    for field in Sample._meta.get_fields():
        if hasattr(sample, field.name) and not field.many_to_many and not field.one_to_many:
            value = getattr(sample, field.name)
            if value is None:
                continue
            if isinstance(value, str) and value.lower() == "nan":
                setattr(sample, field.name, "")
                updated = True
            elif isinstance(value, float) and math.isnan(value):
                # a numeric field can't store an empty string; this assumes
                # the field allows NULL values
                setattr(sample, field.name, None)
                updated = True
    if updated:
        sample.save()
        updated_count += 1
        if updated_count % 100 == 0:
            print(f"Updated {updated_count} samples")

print(f"Total updated: {updated_count}")
print("Complete")


In [4]:
print("Start")

# List of keywords to check for in name field (case insensitive)
keywords = ["chip", "slab", "rock"]

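# Fetch all RELAB samples where grain_size is unknown (case-insensitive match).
# Use a name other than `samples` so the search helper from II.1.a isn't overwritten
unknown_samples = Sample.objects.filter(origin__name="RELAB", grain_size__iexact="unknown")

updated_count = 0

for sample in unknown_samples:
    name_lower = (sample.sample_name or "").lower()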
    
    if any(keyword in name_lower for keyword in keywords):
        sample.grain_size = "Whole Object"
        sample.save()
        updated_count += 1

print(f"Updated {updated_count} samples.")
print("Complete")


Start
Updated 2154 samples.
Complete


In [3]:
print("Start")

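# Fetch all samples from RELAB and clean up grain sizes of 0.
# Note: any non-numeric grain size, including "Whole Object", is also reset
# to "unknown" here.
# Use a name other than `samples` so the search helper from II.1.a isn't overwritten
relab_samples = Sample.objects.filter(origin__name="RELAB")

updated_count = 0

for sample in relab_samples: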
    if not sample.grain_size or sample.grain_size.strip() == "":
        sample.grain_size = "unknown"
    else:
        cleaned = sample.grain_size.strip("()")
        parts = cleaned.split("_ ")
        try:
            if len(parts) == 2:
                min_size, max_size = map(float, parts)
                if min_size == 0.0 and max_size == 0.0:
                    sample.grain_size = "unknown"
                else:
                    sample.grain_size = f"({min_size}_ {max_size})"
            elif len(parts) == 1:
                # Single value: rewrite it with float formatting
                value = float(parts[0])
                if value == 0.0:
                    sample.grain_size = "unknown"
                else:
                    sample.grain_size = str(value)
            else:
                sample.grain_size = "unknown"   
            
        except ValueError:
            sample.grain_size = "unknown"
        
    sample.clean()
    sample.save()
    updated_count += 1

print(f"Updated {updated_count} samples.")
print("Complete")


Start
Updated 32881 samples.
Complete
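Pulling the parsing rules out of the loop makes them easy to check against sample inputs before touching the database. The function below restates the logic of the last cell as a standalone sketch; `normalize_grain_size` is a name introduced here, and the `(min_ max)` range format is inferred from that cell.

```python
def normalize_grain_size(raw):
    """Return a cleaned grain-size string, or "unknown" if it can't be parsed."""
    if not raw or not raw.strip():
        return "unknown"
    parts = raw.strip("()").split("_ ")
    try:
        numbers = [float(part) for part in parts]
    except ValueError:
        return "unknown"  # non-numeric text, including "Whole Object"
    if len(numbers) > 2 or all(number == 0.0 for number in numbers):
        return "unknown"
    if len(numbers) == 1:
        return str(numbers[0])
    return f"({numbers[0]}_ {numbers[1]})"


for example in ["(0_ 0)", "(45_ 125)", "45", "", "Whole Object"]:
    print(repr(example), "->", repr(normalize_grain_size(example)))
```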


# IV: adding new data
----
## IV.1: importing sample files
Sample data can be imported using the upload interface in the application, or pasted in / modified using the admin console. However, the same underlying functions can also be called from admin console or notebook. Also, there are some functions that can *only* be accessed from admin console / shell / notebook. In particular, there are 'safety' features in the upload interface that won't allow you to re-upload samples with identical sample_id, which means that you can't *update* a sample using the upload interface. (You can always activate these protections by passing ```uploaded=True``` to ```Sample.save()```.) 

### IV.1.a: import a single sample file
```ingest_sample_csv``` is the primary function used to import files into the database. Passing it the name of a file in the Western Mars Lab spectrum CSV format will return a ```dict``` containing a Sample instance that can then be saved in the database, the filename of the ingested file, and warnings and errors if applicable. Also remember that, by default, every sample is imported 'unreleased', only visible to users with admin permissions. Set a sample's 'released' field to ```True``` to make it immediately visible.

'warnings' mostly includes things the ingestion function did that changed some 
values in the imported sample.

If there's anything in 'errors', the input file isn't valid and needs to be altered
in order to go in the database. Lots of special cases are covered in the ingestion code and it should give useful error messages for many different sorts of problems. 

*Note: Please let me know if there's another case you need verbose feedback about. --michael*

*Note / TODO: we don't have an updated version of the format standard description yet, but the format remains similar, so many examples are available*


In [None]:
# this is a good sample file

ingest_dict = ingest_sample_csv(
    'tests/single_column_test.csv'
)
# 'filename' and 'warnings' are also placed in the sample.filename and 
# sample.import_notes fields respectively. they are used internally,
# but are available in the return because it can be useful to 
# print or manipulate them separately, mostly for error-checking purposes. 
print(ingest_dict)
sample = ingest_dict['sample']
sample.clean()
sample.save()
# test re-save with upload / anti-dupe protections
try:
    sample.save(uploaded=True)
except ValueError as dupe_error:
    print(dupe_error)
print(samples("TEST", "sample_name"))
# delete this test sample
# sample.delete()

In [None]:
# these are bad sample files
bad_ingest_dict_1 = ingest_sample_csv(
    'tests/single_column_test_error_1.csv'
)
bad_ingest_dict_2 = ingest_sample_csv(
    'tests/single_column_test_error_2.csv'
)
bad_ingest_dict_1['errors'], bad_ingest_dict_2['errors']
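When ingesting more than a couple of files, it can help to sort results into outcomes before deciding what to save. The sketch below classifies a result using the `'sample'`, `'warnings'`, and `'errors'` keys described above. `ingest_status` is a hypothetical helper, the example dictionaries are made up, and it assumes that `'warnings'` and `'errors'` are empty (falsy) when there is nothing to report.

```python
def ingest_status(ingest_dict):
    """Classify an ingest result as 'error', 'warning', or 'ok'."""
    if ingest_dict.get("errors") or ingest_dict.get("sample") is None:
        return "error"
    if ingest_dict.get("warnings"):
        return "warning"
    return "ok"


# made-up results standing in for real ingest_sample_csv output
results = [
    {"sample": object(), "filename": "good.csv", "warnings": [], "errors": []},
    {"sample": object(), "filename": "odd.csv", "warnings": ["renamed field"], "errors": []},
    {"sample": None, "filename": "bad.csv", "warnings": [], "errors": ["no data"]},
]
for result in results:
    print(result["filename"], ingest_status(result))
```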

### IV.1.b: import a "multisample" file
```ingest_sample_csv``` can also ingest files containing multiple wavelength / reflectance columns. It splits these into a list of Sample objects and increments their Sample IDs to distinguish the columns from one another.

*Note / TODO: I don't know much about how the Western Mars Lab uses this format internally, so I don't know how to contextualize it.*

In [6]:
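multisample = ingest_sample_csv('../TANAGER-Ingest/inputs/2024_04_09_Brad_Dunites1.csv')
print(multisample['warnings'])
print(multisample)
# 'sample' is None when ingestion fails, and looping over None raises
# "TypeError: 'NoneType' object is not iterable", so fall back to an empty list.
# sample_type is many-to-many: add a SampleType instance, not a bare string
rock_type, _ = SampleType.objects.get_or_create(name="Rock")  # just for this example
for sample in multisample['sample'] or []:
    print(sample.sample_id)
    sample.clean()
    sample.save()
    sample.sample_type.add(rock_type)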

[]


TypeError: 'NoneType' object is not iterable

### IV.1.c: import all CSV files in a directory
```ingest_sample_csv``` is happy to be used inside Python control structures. This can be used as an alternate way to ingest samples in bulk. *Note that this following cell doesn't handle multisamples, but can easily be extended to do so.*

In [9]:
#csv_directory = '../Relab-Ingest/output/'
base_directory = '../TANAGER-Ingest/output/' # Use output folder
print("Starting")
# print(len(os.listdir(base_directory)))

subdirectories = [d for d in os.listdir(base_directory) 
                 if os.path.isdir(os.path.join(base_directory, d))]

with open("ingest_errors.log", "w") as log_file:
    for subfolder in subdirectories:
        csv_directory = os.path.join(base_directory, subfolder) + '/'
        
        csv_files = [f for f in os.listdir(csv_directory) if f.endswith('.csv')]
        
        if not csv_files:
            continue
    
        for ix, file in enumerate(csv_files):
            if not file.endswith('.csv'):
                continue    

            ingest_dict = ingest_sample_csv(csv_directory + file)

            sample = ingest_dict["sample"]
            print(ingest_dict)

            if sample is None:
                log_file.write(f"{ix}: {file} - Skipped, no sample object created.\n")
                print("Error: sample = none")
                continue

            try:
                sample.clean()
                sample.save()
            except forms.ValidationError as ve:
                log_file.write(f"{ix}: {file} - ValidationError: {ve}\n")
                print("Error: validation error")
                continue
            except IndexError as ie:
                log_file.write(f"{ix}: {file} - IndexError: {ie}\n")
                print("Error: index error")
                continue

            if sample.material_class:
                sample_type_obj, created = SampleType.objects.get_or_create(name=sample.material_class)
                sample.sample_type.add(sample_type_obj)

    #         print(sample.id, ix, file)
print("finished")

Starting
Error: validation error
Error: validation error
Error: validation error
finished
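To extend the loop above to multisample files, one option is to normalize the `'sample'` entry to a list before cleaning and saving, so that single samples, multisamples, and failed ingests all follow the same code path. `as_sample_list` is a hypothetical helper, and it assumes that a multisample result is a plain `list` or `tuple` of samples, as described in IV.1.b.

```python
def as_sample_list(ingested):
    """Normalize the 'sample' entry of an ingest result to a list."""
    if ingested is None:
        return []  # ingestion failed, nothing to save
    if isinstance(ingested, (list, tuple)):
        return list(ingested)  # multisample file
    return [ingested]  # single-sample file


print(as_sample_list(None))
print(as_sample_list("one sample"))
print(as_sample_list(["first column", "second column"]))
```

Inside the ingestion loop, this would be used as `for sample in as_sample_list(ingest_dict["sample"]):`, with the existing clean / save / error-logging steps in the loop body.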


## IV.2: filtersets
### IV.2.a: creating a new filterset

VISOR does not offer a fully automated method for ingesting a new
```FilterSet```. This is because response curves for different instruments
are given in a wide variety of formats, and we do not have a standardized
format for representing them. *Note / TODO: we could develop one, though!
--michael* However, it offers a convenience function, ```make_filterset```, designed to help the process.

In [None]:
# start by defining a pandas dataframe with two columns:
# filter name and canonical center wavelength.
# in this case, we have high-resolution response curves for the instrument --
# these center wavelengths are simply the points at which we 
# convolve the instrument response curve with the lab spectra.

# note that we're following the MERTools convention here --
# scale to the left eye, don't duplicate filters with centers
# within 5nm.
# also note that we don't currently have curves for L7/R7, but this doesn't
# particularly matter because we're probably not going to be looking at
# geological features through them.
ZCAM_FREQS = pd.DataFrame(
    {
        "filter": [
            "L0R", "L0G", "L0B", "L1",
            "L2", "L3", "L4", "L5",
            "L6", "R2", "R3",
            "R4", "R5", "R6",
        ],
        "wavelength": [
            630, 544, 480, 800,
            754, 677, 605, 528,
            442, 866, 910, 
            939, 978, 1022
        ],
    }
)
# 5nm-spaced wavelength bins for numerical integration, going
# from the bottom to the top of the wavelength ranges referenced
# in the ZCAM filter response files
ZCAM_BINS = np.arange(300,1105,5)
# where are we storing the filter response curve files?
ZCAM_FILTER_PATH = 'filters/mastcam_z/'

In [None]:
# read filter definition files into a dictionary of pandas dataframes
# giving wavelength vs. responsivity for each filter.
# if we didn't have these files, and only had, say, a specification 
# for band center + FWHM of each filter, we would numerically generate 
# responsivity curves for each filter at this step.
# note that we load all the curves in, but when we generate simulated
# spectra, we only actually use the filters referenced in ZCAM_FREQS,
# which becomes the filter_frequencies field of the FilterSet object.
filter_files = [file for file in os.listdir(ZCAM_FILTER_PATH)]
filters = {}
for filter_file in filter_files:
    filter_name = re.search(r"(L|R).{1,2}(?=\.)", filter_file).group(0)
    filters[filter_name] = pd.read_csv(
        ZCAM_FILTER_PATH + filter_file,
        names=['wavelength', 'responsivity']
    )
print(filters.keys())
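The comments in the cell above mention generating responsivity curves numerically when only a band center and FWHM are available. A minimal sketch of that step follows, assuming a Gaussian bandpass. That shape is an assumption made for illustration, since real filter responses are often flatter-topped or asymmetric, and `gaussian_response` and the FWHM value are made up here.

```python
import math


def gaussian_response(wavelengths, center, fwhm):
    """Return a Gaussian responsivity value (peak 1.0) for each wavelength."""
    # convert full width at half maximum to standard deviation
    sigma = fwhm / (2 * math.sqrt(2 * math.log(2)))
    return [
        math.exp(-((wavelength - center) ** 2) / (2 * sigma ** 2))
        for wavelength in wavelengths
    ]


bins = list(range(300, 1105, 5))  # same 5nm spacing as ZCAM_BINS
curve = gaussian_response(bins, center=800, fwhm=20)  # made-up FWHM
print(max(curve), curve[bins.index(810)])  # peak, and value half a FWHM away
```

To feed curves like this into the rest of the recipe, each one would be wrapped in a two-column dataframe with the same `wavelength` and `responsivity` column names used for the files read above.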

In [None]:
# make_filterset is a convenience function that takes the values defined
# above, interpolates each filter response curve to the array of shared bins,
# power-normalizes each of these interpolated curves, and builds a FilterSet object. 
# interpolation and normalization are helpful to get consistent results from 
# numerical integration and present users with apples-to-apples comparisons on 
# graphs; any consequent loss of precision is not meaningful in this application
# (and possibly not at all).
# we also pass it a path to a CSV file of solar spectra that will be used to simulate 
# observations with solar illumination.

zcam_filterset = make_filterset(
    'Mars-2020 Mast Camera Zoom (Mastcam-Z)',
    filters,
    ZCAM_BINS,
    ZCAM_FREQS,
    "filters/sun_input.csv"
)
# set short name and reference URL
zcam_filterset.short_name = 'Mastcam-Z'
zcam_filterset.url = "https://www.hou.usra.edu/meetings/lpsc2020/pdf/2312.pdf"

zcam_filterset.clean()
zcam_filterset.save()
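The interpolation and normalization steps described in the comments above can be illustrated in plain Python. This is a sketch of the general idea only, not the implementation of `make_filterset`: the function names are invented, "power-normalizes" is interpreted here as scaling each curve so that its values on evenly spaced bins sum to 1, and the numbers are made up.

```python
def interpolate(x_new, xs, ys):
    """Linearly interpolate (xs, ys) onto x_new; zero outside the range of xs.

    Assumes xs is strictly increasing.
    """
    result = []
    for x in x_new:
        value = 0.0
        for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
            if x0 <= x <= x1:
                value = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
                break
        result.append(value)
    return result


def normalize(curve):
    """Scale a curve sampled on evenly spaced bins so its values sum to 1."""
    total = sum(curve)
    return [value / total for value in curve]


def band_value(response, reflectance):
    """Response-weighted average of reflectance sampled on the same bins."""
    return sum(r * s for r, s in zip(response, reflectance))


bins = [400, 405, 410, 415, 420]
raw_response = interpolate(bins, [402, 410, 418], [0.0, 1.0, 0.0])
response = normalize(raw_response)
print(raw_response)
print(band_value(response, [0.3] * len(bins)))  # flat spectrum in, same value out
```

Because the normalized weights sum to 1, a flat input spectrum comes back unchanged (up to floating-point rounding), which is the apples-to-apples property the comments above refer to.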

## MAYBE: IV.3: creating new entries manually

Automatic import functions aren't the only way to produce new database entries. You can also manually generate them. This will often be more awkward than using import functions, but is preferable or necessary in some cases.


In [None]:
for sample in Sample.objects.all():
    sample.released = True
    sample.clean()
    sample.save(convolve=False)