<a href="https://colab.research.google.com/github/casangi/casadocs/blob/master/docs/notebooks/usingcasa.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using CASA 

Description of how CASA interacts with the Python environment.



***





### Obtaining and Installing 

A full installation of CASA, including a custom Python environment, is available as a Linux (.tar) or Mac (.dmg) file from our [Downloads](http://casa.nrao.edu/casa_obtaining.shtml) page.

The CASA 6.x series is also available as modular packages, giving users the flexibility to build CASA tools and tasks in their own Python environment. These include the casatools, casatasks, and casampi modules, which together provide the core data processing capabilities, including parallel processing.

The CASA 5 and 6 versions in each release share the same C++ source code and produce equivalent scientific output. Both versions are intended to be nearly identical in functionality, and they therefore share the documentation found here.

<div class="alert alert-info">
CASA 5 will be supported alongside CASA 6 for three release cycles (6.0/5.6, 6.1/5.7, and 6.2/5.8). CASA 5.8 will be the last CASA 5 release.
</div>

 



##### Full Installation of CASA 5 and 6

**On Linux:** 

1.  Download the .tar file and place it in a work directory (e.g. \~/casa)

2.  From a Linux terminal window, expand the file:
    ```
    $ tar -xzvf casa-xyz.tar.gz
    ```

3.  Start CASA
    ```
    $ ./casa-xyz/bin/casa
    ```

4.  The one caveat is that CASA on Linux currently will not run if Security-Enhanced Linux (SELinux) is set to enforcing. For the non-root install to work, SELinux must be set to disabled or permissive (in `/etc/selinux/config`), or you must run (as root):

    ```
    setsebool -P allow_execheap=1
    ```

     Otherwise, you will encounter errors like:

    ```
    error while loading shared libraries: /opt/casa/casa-20.0.5653-001/lib/liblapack.so.3.1.1: cannot restore segment prot after reloc: Permission denied
    ```

The non-root installation is thought to work on a wide variety of Linux platforms. Check the list of operating systems [here](https://casa.nrao.edu/casa_obtaining.shtml) for the officially supported OSs. Other platforms may work too, but we do not regularly test them. An unofficial Knowledge Base article on installing CASA on Ubuntu or Debian can be found [here](memo-series.ipynb#installing_casa_ubuntu_debian.pdf).

 

**On Macintosh:**

1.  Download the .dmg disk image file
2.  Double click on the disk image file (if your browser does not automatically open it).
3.  Drag the CASA application to the *Applications* folder of your hard disk.
4.  Eject the CASA disk image.
5.  Double click the CASA application to run it for the first time. If the OS does not allow you to install apps from non-Apple sources, change the settings in "System Preferences > Security & Privacy > General" to "Allow applications downloaded from: Mac App Store and identified developers".
6.  Optional: Create symbolic links to the CASA version and its executables (Administrator privileges are required), which will allow you to run `casa`, `casaviewer`, `casaplotms`, etc. from any terminal command line. To do so, run:
    ```
    !create-symlinks 
    ```

<div class="alert alert-warning">
**WARNING:** By default, Python 3.6 (and earlier versions of Python 3) includes the current working directory in the Python path at startup. Any script in that directory with the same name as a standard Python module or a CASA module will be detected and used by Python instead of the code that is delivered with CASA. Protections have been included for files called "new.py" and "pickle.py", but other scripts may cause problems with the CASA startup. For example, do not place a file named runpy.py in the working directory.
</div>
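The check described in the warning above can be automated. Below is a minimal sketch (`shadowing_scripts` is a hypothetical helper, not part of CASA) that lists `.py` files in a directory whose names match a built-in module or a module found elsewhere on `sys.path`. It only inspects the search path of the interpreter that runs it, so run it with the same Python that CASA uses.

```python
"""List scripts in a directory that would shadow other Python modules.

A minimal sketch, not part of CASA: a *.py file is flagged when its name
matches a built-in module or a module found elsewhere on sys.path.
"""
import os
import sys
from importlib.machinery import PathFinder


def shadowing_scripts(directory="."):
    """Return the *.py file names in `directory` that shadow other modules."""
    directory = os.path.abspath(directory)
    # The search path without the directory itself ('' stands for the cwd).
    other_paths = [p for p in sys.path
                   if p and os.path.abspath(p) != directory]
    clashes = []
    for filename in sorted(os.listdir(directory)):
        name, ext = os.path.splitext(filename)
        if ext != ".py" or "." in name:
            continue  # not an importable top-level module file
        if (name in sys.builtin_module_names
                or PathFinder.find_spec(name, other_paths) is not None):
            clashes.append(filename)
    return clashes


if __name__ == "__main__":
    for script in shadowing_scripts():
        print(f"warning: {script} shadows a module of the same name")
```

A file such as `runpy.py` in the working directory would be reported, because `runpy` is also found in the standard library.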

<div class="alert alert-warning">
**WARNING:** The GUI-based application **plotcal** and the interactive mode of **flagdata** have not yet been migrated into the CASA 6 series. Both are available in CASA 5.7.
</div>

 

 



##### Modular Installation of CASA 6

Pip wheels for casatools and casatasks are available as Python 3 modules from the public PyPI-compatible server [casa-pip.nrao.edu](http://casa-pip.nrao.edu). This allows simple installation and import into standard Python 3.6 environments. The casatools wheel is necessarily a binary wheel, so there may be some compatibility issues for some time as we work toward making wheels available for important Python configurations. Initially, we are targeting Python 3.6 as provided by Red Hat for our wheel production, with RHEL 6 and RHEL 7 as officially supported platforms. We have had some success on other Linux-based platforms as well, but we do not recommend the use of Conda until compatibility with Conda is better understood.

The following prerequisites must be present on the host machine before installing CASA:

1.  Python 3.6
2.  libgfortran3 (yum or apt-get install)

Installation instructions are as follows (from a Linux terminal window):

```
$ python3 -m venv casa6

$ source casa6/bin/activate

(casa6) $ pip install --index-url https://casa-pip.nrao.edu/repository/pypi-casa-release/simple casatools

(casa6) $ pip install --index-url https://casa-pip.nrao.edu/repository/pypi-casa-release/simple casatasks
```

Start CASA and sanity check:

```
(casa6) $ python

Python 3.6.9 (default, Nov 7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import casatasks
>>> help(casatasks)
```

To exit the Python venv, type `deactivate` in the terminal. However, the rest of this documentation **assumes the venv is active** (to reactivate it, type `source casa6/bin/activate`).

The use of python3 venv is a simple built-in method of containerizing the pip install such that multiple versions of CASA 6.x can be kept on a single machine in different environments. In addition, CASA is built and tested using standard (python 3.6) libraries which can be replicated with a fresh venv, keeping the libraries needed for CASA isolated from other libraries which may already be installed on your machine.
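To confirm from inside Python that the venv is active, a common check compares `sys.prefix` with `sys.base_prefix`, which differ only inside an environment created by the `venv` module. This is a minimal sketch; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter runs inside a venv environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the Python installation it came from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
```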

With the pip installation, CASA may be used in a standard Pythonic manner. Examples can be found in [this Jupyter Notebook](https://colab.research.google.com/github/casangi/examples/blob/master/casa6/CASA6_demo.ipynb).

<div class="alert alert-warning">
**WARNING:** The pip-wheel modules for CASAviewer and CASAplotms, as well as other GUIs, are unvalidated. They are included in the full tar-file distribution, and we recommend using the tar-file distribution for these GUIs. We also recommend using the tar-file distribution for add-on ALMA tools/tasks, such as wvrgcal. Additional testing is being performed to ensure that the pip wheels for the GUIs and add-on ALMA tools/tasks can be reliably offered as stand-alone modules in a subsequent CASA 6 release.
</div>

 



##### Parallel Processing Installation

The casampi package provides the task-level [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface) parallelization infrastructure of CASA.  The casatasks module detects when casampi is available and enables the parallel processing capabilities of CASA. Advanced users may also access the casampi package directly to build new or custom parallelization schemes.
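To check whether casampi is installed in the current Python environment, a generic `importlib` lookup is enough. This is a minimal sketch with a hypothetical helper name, not the mechanism casatasks itself uses. An installed package does not by itself mean that MPI is active; that typically also requires launching Python through `mpirun`.

```python
import importlib.util


def casampi_available():
    """Return True if the casampi package is importable in this environment."""
    # find_spec() locates a top-level package without importing it.
    return importlib.util.find_spec("casampi") is not None


if __name__ == "__main__":
    print("casampi installed:", casampi_available())
```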

The full installation of CASA includes the MPI package and no further action is necessary. 

For the modular installation of individual packages into a standard Python environment, ensure that OpenMPI is installed on the host machine (RHEL: `yum install openmpi-devel`, Ubuntu: `apt-get install libopenmpi-dev`), then run the following commands (from the venv in a Linux terminal, after installing casatools and casatasks as described above):

*NRAO systems only: contact the helpdesk to install casa-toolset-3, then run the following command: `export PATH=/opt/casa/03/bin:$PATH`*

```
(casa6) $ pip install wheel

(casa6) $ pip install --index-url https://casa-pip.nrao.edu/repository/pypi-casa-release/simple casampi
```

Sanity check (from Linux terminal):

```
(casa6) $ echo "from casampi.MPIEnvironment import MPIEnvironment; print('working?', MPIEnvironment.is_mpi_enabled)" > test.py

(casa6) $ mpirun -q -n 2 python test.py
```

You should observe two instances of "working? True".

 



##### Jupyter Notebooks and Google Colab

<div>

<div>

Jupyter notebooks are ideally suited for code tutorials, exploration, and collaborative development. Together with Google Colaboratory, which hosts Jupyter notebooks on free virtual hardware in the cloud, the door is opened to powerful new ways of developing and sharing software. CASA 6 casatools and casatasks modules are compatible with the Google Colab environment.  The CASA team is working towards making additional modules compatible in the future as well as introducing new Jupyter-based CASAguide tutorials.

An example of a Jupyter notebook that explains installation and usage of CASA 6 is available [here](https://colab.research.google.com/github/casangi/examples/blob/master/casa6/CASA6_demo.ipynb).

</div>

</div>

 



##### CASA Tool Names

From the CASA 6 command line, the tools can be listed with `toolhelp()` and the tasks can be listed with `taskhelp()`. In CASA 5, the tools had one name when imported from the **casac** module and another name when used from the CASA 5 command line. In addition, one instance of each tool was pre-constructed and available to the user at the command line. The table below lists the tool naming in CASA 5 and CASA 6. In CASA 6, all of the CASA 5 names (e.g. imtool, im, etc.) are available to the user at the CASA 6 command line, but otherwise the CASA 6/casac names are used by default. You can import a CASA 6 tool under whatever name you like, for example:

```
>>> from casatools import imager as imtool
```

| **CASA 6 / casac name** | **CASA 5 class (constructor)** | **CASA 5 instance** |
|---|---|---|
| imager | imtool | im |
| calibrater | cbtool | cb |
| ms | mstool | ms |
| quanta | qatool | qa |
| table | tbtool | tb |
| agentflagger | aftool | af |
| measures | metool | me |
| image | iatool | ia |
| imagepol | potool | po |
| simulator | smtool | sm |
| componentlist | cltool | cl |
| coordsys | cstool | cs |
| regionmanager | rgtool | rg |
| spectralline | sltool | sl |
| vpmanager | vptool | vp |
| msmetadata | msmdtool | msmd |
| functional | fntool | fn |
| imagemetadata | imdtool | imd |
| atmosphere | attool | at |
| calanalysis | catool | ca |
| mstransformer | mttool | mt |
| singledishms | sdmstool | sdms |
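When porting CASA 5 scripts to a modular CASA 6 environment, where no tool instances are pre-constructed, the table can be turned into a small compatibility helper. This is a minimal sketch, assuming casatools is installed; `CASA5_TO_CASA6` and `casa5_aliases` are hypothetical names, not part of CASA. The tool module is passed in as an argument, so the helper does not import casatools itself.

```python
"""Recreate CASA 5-style tool instance names in a modular CASA 6 session."""

# CASA 5 instance name -> CASA 6/casac class name (from the table above)
CASA5_TO_CASA6 = {
    "im": "imager", "cb": "calibrater", "ms": "ms", "qa": "quanta",
    "tb": "table", "af": "agentflagger", "me": "measures", "ia": "image",
    "po": "imagepol", "sm": "simulator", "cl": "componentlist",
    "cs": "coordsys", "rg": "regionmanager", "sl": "spectralline",
    "vp": "vpmanager", "msmd": "msmetadata", "fn": "functional",
    "imd": "imagemetadata", "at": "atmosphere", "ca": "calanalysis",
    "mt": "mstransformer", "sdms": "singledishms",
}


def casa5_aliases(casatools_module, names=None):
    """Return {instance_name: tool_instance} for the requested CASA 5 names."""
    names = CASA5_TO_CASA6 if names is None else names
    aliases = {}
    for short in names:
        tool_class = getattr(casatools_module, CASA5_TO_CASA6[short])
        aliases[short] = tool_class()  # one pre-constructed instance, as in CASA 5
    return aliases


# Example (requires casatools; "my.ms" is a placeholder):
#   import casatools
#   globals().update(casa5_aliases(casatools, ["tb", "qa"]))
#   tb.open("my.ms")
```

Constructing only the tools a script actually uses, rather than all 22, keeps start-up cheap.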

 



***





### Configuration 

CASA accepts a variety of options through two mechanisms: configuration files and command line arguments.  Configuration files are typically stored in a \~/.casa folder while command line options (only applicable to the full installation) are specified after the casa command at startup.

 



##### Configuration Files

Each modular CASA 6 package, as well as the full installation, reads a single **config.py** configuration file. This file should be placed in the .casa folder of the user's home directory (**\~/.casa**) before starting the CASA installation or importing the packages into a standard Python environment for the first time.

The following parameters can be set in the configuration file. Finer control over telemetry can also be set in the configuration file, as described [here](usingcasa.ipynb#telemetry). 

<div class="alert alert-info">

- `datapath`: list of paths where CASA should search for runtime data
- `logfile`: log file path/name
- `telemetry_enabled`: allow anonymous usage reporting
- `crashreporter_enabled`: allow anonymous crash reporting

</div>

The configuration file is a standard python script, so any valid python syntax and libraries can be used.  A typical config.py file might look something like this:

```
$ cat ~/.casa/config.py

import time

datapath=["/home/casa/data/casa-data", "/home/casa/data/casa-data-req"]
logfile='casalog-%s.log' % time.strftime("%Y%m%d-%H",time.localtime())
telemetry_enabled = True
crashreporter_enabled = True
```
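Because config.py is an ordinary Python script, you can check what it defines before starting CASA by executing it with the standard-library `runpy.run_path()`, which returns the script's global variables. This is a minimal sketch with a hypothetical helper name; it only looks for the four options listed above and is not how CASA itself loads the file. Note that it runs the code in your config.py.

```python
"""Check which documented options a CASA config.py sets."""
import os
import runpy

KNOWN_OPTIONS = ("datapath", "logfile", "telemetry_enabled", "crashreporter_enabled")


def read_casa_config(path="~/.casa/config.py"):
    """Return the documented options that `path` sets, as a dict."""
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return {}
    settings = runpy.run_path(path)  # executes the file, returns its globals
    return {key: settings[key] for key in KNOWN_OPTIONS if key in settings}


if __name__ == "__main__":
    for key, value in read_casa_config().items():
        print(f"{key} = {value!r}")
```

A syntax error in config.py shows up here as an ordinary Python traceback, which is easier to read than a failure during CASA startup.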

At runtime, the datapath(s) are searched by the `casatools.ctsys.resolve()` function to find the needed data tables. For example:

```
>>> import casatools
>>> casatools.ctsys.resolve('geodetic/IERSpredict')

'/home/casa/data/casa-data/geodetic/IERSpredict'
```
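The lookup can be pictured as a search over the `datapath` list. The sketch below is a simplified, hypothetical model of that search, not the casatools implementation: it returns the first `datapath` entry that contains the requested relative path and, as an assumption, returns the input unchanged when nothing matches.

```python
import os


def resolve(relative_path, datapath):
    """Return the first existing datapath/relative_path, else relative_path."""
    for root in datapath:
        candidate = os.path.join(os.path.expanduser(root), relative_path)
        if os.path.exists(candidate):
            return candidate
    return relative_path  # assumption: unresolved paths are returned unchanged


# Example with the datapath from the config.py above:
#   resolve("geodetic/IERSpredict",
#           ["/home/casa/data/casa-data", "/home/casa/data/casa-data-req"])
```

The order of the `datapath` entries matters: if the same table exists under two roots, the first one listed wins.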

<div class="alert alert-warning">
**WARNING**: CASA 5 does not use config.py. Instead ~/.casa/prelude.py is evaluated during startup before anything else and ~/.casa/init.py is evaluated just before the CASA prompt is presented. The configuration options are different and more limited. 
</div>

 



##### Command Line Arguments (full installation only)

With the full installation of CASA from a tar file, the Python environment itself is included and started through ./bin/casa. This ./bin/casa executable accepts the following options to change configuration values at run time:

<div class="alert alert-info">

- `-h, --help`: show this help message and exit
- `--logfile LOGFILE`: path to log file
- `--log2term`: direct output to terminal
- `--nologger`: do not start CASA logger
- `--nologfile`: do not create a log file
- `--nogui`: avoid starting GUI tools
- `--rcdir RCDIR`: location for startup files
- `--norc`: do not load user config.py (startup.py is unaffected)
- `--colors {Neutral,NoColor,Linux,LightBG}`: prompt color
- `--pipeline`: load CASA pipeline modules on startup
- `--agg`: startup without graphical backend
- `--iplog`: create ipython log
- `--notelemetry`: disable telemetry collection
- `--nocrashreport`: do not submit an online report when CASA crashes
- `--datapath DATAPATH`: data path(s) [colon separated]
- `--user-site`: include user's local site-packages lib in path (toggling this option turns it on; use startup.py to append to the path)
- `-c ...`: python eval string or python script to execute

</div>

These options **take precedence over the configuration files.**

Some options imply or take precedence over other options:

-   \--nologfile takes precedence over \--logfile
-   \--nogui implies \--nologger
-   \--pipeline implies \--agg

<div class="alert alert-warning">
**WARNING**: the command line arguments listed above apply to CASA 6. In CASA 5 (including CASA 5.7):

- The following command line arguments are still available (removed/replaced in CASA 6): 
  ```
  --telemetry (removed in favor of --notelemetry in CASA 6) 
  --trace 
  --maclogger
  ```
- the following command line arguments are not available: 
  ```
  --norc 
  --notelemetry 
  --datapath 
  --user-site 
  ```
</div>



##### Starting CASA

CASA packages installed through pip may be imported into the standard Python environment on the host machine. For example:

```
(casa6) $ python

Python 3.6.9 (default, Nov 7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import casatasks
>>> help(casatasks)
```

The \~/.casa/**config.py** file will be read and processed when the casatasks package is imported.

The full installation of CASA includes a python environment and is executed like an application.  Any desired command line arguments may be included.  For example:

```
$ ./casa6/bin/casa --logfile MyTestRun.txt --nogui
```

The \~/.casa/**config.py** file will be read and processed as the casa application executes, with the supplied command line arguments (logfile and nogui) added on top.

Users may wish to set shortcuts, links, or aliases, or to add bin/casa to their environment PATH. See the documentation for your operating system.



***





### Running User Scripts 

###### CASA 6: modular version

The modular version of CASA behaves like a standard Python package, and user scripts should import the relevant modules as they would any other Python module (e.g. numpy). Executing external user scripts with modular CASA is just like running any other Python application. Note that we recommend running in a Python venv; see the [installation instructions](usingcasa.ipynb#obtaining-and-installing) for more information.

```
(casa6) $ python myscript.py param1 param2
```

###### CASA 6: all-inclusive version 

Since the full CASA installation from a tar file includes its own python environment that is (typically) not called directly, alternative methods of feeding in user scripts are necessary.  There are three main standard Python ways of executing external user scripts in the full installation of CASA:

1.  -c startup parameter (see configuration instructions)
2.  exec(open(\"./filename\").read()) within the CASA Python environment
3.  add your script to startup.py in the \~/.casa directory

In addition, an *\"execfile\"* python shortcut has been added to the full installation of CASA 6 for backwards compatibility with ALMA scriptForPI.py restore scripts. This allows running scripts with the following command:

4. execfile \'filename.py\' within the CASA Python environment

The *execfile* command in CASA 6 has been tested and found to work in the same way as in (Python 2 based) CASA 5, with the exception that the treatment of global variables has changed in Python 3. For *execfile* calls within a script which itself is run via *execfile*, it is necessary to add *globals()* as the second argument to those *execfile* calls in order for the nested script to know about the global variables of the calling script. For example, within a script *'mainscript.py'*, calls to another script *'myscript.py'* should be written as follows: *execfile('myscript.py', globals())*.
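The role of the *globals()* argument can be seen with Python's built-in `exec()`, which also underlies the `exec(open(...).read())` approach listed above: a script executed in a fresh namespace cannot see variables defined by its caller, while passing the caller's namespace shares them. `run_script` below is a hypothetical helper written for illustration; it is not CASA's *execfile*.

```python
def run_script(filename, namespace=None):
    """Execute a Python script file in `namespace` and return that namespace.

    A new, empty namespace is used when none is given, so the script cannot
    see the caller's variables; pass globals() to share them.
    """
    if namespace is None:
        namespace = {}
    namespace.setdefault("__name__", "__main__")
    with open(filename) as f:
        # compile() with the file name gives readable tracebacks, and the
        # with-block closes the file explicitly.
        code = compile(f.read(), filename, "exec")
    exec(code, namespace)
    return namespace


# Usage, mirroring execfile('myscript.py', globals()):
#   run_script("myscript.py", globals())
```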

 



### Startup.py

**This section only applies to the monolithic/tar-file CASA distribution, and it only applies to CASA 6.**

For CASA 5, use *\~/.casa/init.py* instead. *startup.py* should be Python 3 compliant whereas *init.py* is assumed to be Python 2.7.

The *startup.py* file found in *\$HOME/.casa* (i.e. *\~/.casa/startup.py*) is evaluated by the CASA shell just before the CASA prompt is presented to the user. This allows users to customize their CASA shell environment beyond the standard settings in [config.py](usingcasa.ipynb#configuration), by importing packages, setting variables or modifying the Python system path.

One case where this is useful is for configuring CASA for ALMA data reduction. A package called \'analysisUtils\' is often used as part of ALMA analysis. It is typically imported and instantiated in startup.py:

```
$ cat ~/.casa/startup.py
import sys, os
sys.path.append("/home/casa/contrib/AIV/science/analysis_scripts/")
import analysisUtils as aU

es = aU.stuffForScienceDataReduction()
```

In this example, the standard python modules *os* and *sys* are made available in the CASA shell. The path where the *analysisUtils* module can be found is added to the Python system path, and finally the package is imported and an object is created. These modules and objects will then be available for the user within the CASA shell environment.
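If the analysis_scripts directory is not present on a machine, the import in the example above fails when CASA starts. A defensive variant (a sketch that reuses the path and names from the example) only extends the path and imports the package when the directory exists, and leaves `aU` and `es` set to `None` otherwise:

```python
# ~/.casa/startup.py: a defensive variant of the example above (a sketch).
# Adjust AU_PATH to the location of your own analysis_scripts checkout.
import os
import sys

AU_PATH = "/home/casa/contrib/AIV/science/analysis_scripts/"

aU = None
es = None
if os.path.isdir(AU_PATH):
    if AU_PATH not in sys.path:
        sys.path.append(AU_PATH)
    try:
        import analysisUtils as aU
        es = aU.stuffForScienceDataReduction()
    except ImportError as err:
        print(f"startup.py: could not import analysisUtils: {err}")
else:
    print(f"startup.py: {AU_PATH} not found; analysisUtils not loaded")
```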



***



### Logging 

Detailed description of the CASA logger



#### **Logging your session**


The output from CASA commands is sent to the file casa-YYYYMMDD-HHMMSS.log in your local directory, where YYYYMMDD-HHMMSS are the UT date and time when CASA was started up. New starts of CASA create new log files.

![cde9d5a8ce1cfeb84295afa1b539d64fafe3213d](https://github.com/casangi/casadocs/blob/master/docs/notebooks/media/cde9d5a8ce1cfeb84295afa1b539d64fafe3213d.png?raw=1){width="900" height="353"}

>The CASA Logger GUI window under Linux. Note that under MacOSX a stripped down logger will instead appear as a Console.
  

The output contained in casa-YYYYMMDD-HHMMSS.log is also displayed in a separate window using the **casalogger**. Generally, the logger window will be brought up when CASA is started. If you do not want the logger GUI to appear, start CASA using the *\--nologger* option:

```
 casa --nologger
```

which will run CASA in the terminal window. See [Starting CASA](old-pages.ipynb#starting-casa) for more startup options.

<div class="alert alert-warning">
**ALERT:** Due to problems with Qt, the GUI qtcasalogger is a different version on MacOSX and uses the Mac Console. This still has the important capabilities, such as showing the messages and cut/paste. The following description is for the Linux version and thus should mostly be disregarded on OSX. On the Mac, treat this as just another console window and use the usual mouse and hot-key actions to do what is needed.
</div>

The CASA logger window for Linux is shown in the [figure above](http://casa.nrao.edu/casadocs/stable/usingcasa/casa-logger#figid-loggerfiggui). The main feature is the display area for the log text, which is divided into columns. The columns are:

-   *Time* --- the time that the message was generated. Note that this will be in local computer time (usually UT) for casa generated messages, and may be different for user generated messages;
-   *Priority* --- the Priority Level (see below) of the message;
-   *Origin* --- where within CASA the message came from. This is in the format Task::Tool::Method (one or more of the fields may be missing depending upon the message);
-   *Message* --- the actual text.

![be077f88660e4fd271021e4d643915e0e53acc68](https://github.com/casangi/casadocs/blob/master/docs/notebooks/media/be077f88660e4fd271021e4d643915e0e53acc68.png?raw=1){width="900" height="353"}

>The CASA Logger GUI window under Linux. Note that under MacOSX a stripped down logger will instead appear as a Console.
  

![230a345b508be96e7bc81d5cb1f7e9bdecfc114f](https://github.com/casangi/casadocs/blob/master/docs/notebooks/media/230a345b508be96e7bc81d5cb1f7e9bdecfc114f.png?raw=1){width="900" height="353"}

>Using the casalogger Filter facility. The log output can be sorted by Priority, Time, Origin, and Message. In this example we are filtering by Origin using 'clean', and it now shows all the log output from the clean task.
  

 

The casalogger GUI has a range of features, which include:

-   *Search* --- search messages by entering text in the Search window and clicking the search icon. The search currently just matches the exact text you type anywhere in the message. See Figure [above](http://casa.nrao.edu/casadocs/stable/usingcasa/casa-logger#figid-loggerfiggui) for an example.
-   *Filter* --- a filter to sort by message priority, time, task/tool of origin, and message contents. Enter text in the *Filter* window and click the filter icon to the right of the window. Use the pull-down at the left of the *Filter* window to choose what to filter. The matching is for the exact text currently (no regular expressions). See Figure [above](http://casa.nrao.edu/casadocs/stable/usingcasa/casa-logger#figid-loggerfigfilter) for an example.
-   *View* --- show and hide columns (*Time, Priority, Origin, Message*) by checking boxes under the *View* menu pull-down. You can also change the font here.
-   *Insert Message* --- insert additional comments as "notes" in the log. Enter the text into the "*Insert Message*" box at the bottom of the logger, and click on the *Add* (+) button, or choose to enter a longer message. The entered message will appear with a priority of "*NOTE*" with the Origin as your username. See Figure [below](http://casa.nrao.edu/casadocs/stable/usingcasa/casa-logger#figid-loggerfiginsert) for an example.
-   *Copy* --- left-click on a row, or click-drag a range of rows, or click at the start and *shift click* at the end to select. Use the *Copy* button or *Edit* menu *Copy* to put the selected rows into the clipboard. You can then (usually) paste this where you wish.
-   *Open* --- There is an Open function in the File menu, and an Open button, that will allow you to load old casalogger files.

<div class="alert alert-warning">
**Alert:** Messages added through *Insert Message* will currently not be inserted into the correct (or user controllable) order into the log. *Copy*  does not work routinely in the current version. It is recommended to open the casa-YYYYMMDD-HHMMSS.log file in a text editor, to grab text.
</div>
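Since the recommended workaround is to open the log file in a text editor, it can help to locate the newest session log quickly. This is a minimal sketch with a hypothetical helper name; it assumes the default casa-YYYYMMDD-HHMMSS.log naming and therefore ignores logs renamed with *--logfile* or `casalog.setlogfile()`.

```python
"""Find the most recent default-named CASA session log in a directory."""
import os
import re
from datetime import datetime

LOG_NAME = re.compile(r"^casa-(\d{8}-\d{6})\.log$")


def latest_casa_log(directory="."):
    """Return the path of the newest casa-YYYYMMDD-HHMMSS.log, or None."""
    newest = None
    for name in os.listdir(directory):
        match = LOG_NAME.match(name)
        if not match:
            continue
        # The start time is encoded in the file name itself.
        started = datetime.strptime(match.group(1), "%Y%m%d-%H%M%S")
        if newest is None or started > newest[0]:
            newest = (started, os.path.join(directory, name))
    return None if newest is None else newest[1]


if __name__ == "__main__":
    print(latest_casa_log())
```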

![507e2d4b51f64aef893603257c48a890f830c47c](https://github.com/casangi/casadocs/blob/master/docs/notebooks/media/507e2d4b51f64aef893603257c48a890f830c47c.png?raw=1){width="900" height="353"}

>CASA Logger - Insert facility: The log output can be augmented by adding notes or comments during the reduction. The file should then be saved to disk to retain these changes.
  

Other operations are also possible from the menu or buttons. Mouse "flyover" displays a tooltip describing the operation of buttons.

It is possible to change the name of the logging file. By default it is 'casa-YYYYMMDD-HHMMSS.log', but starting CASA with the option *\--logfile otherfile.log* will redirect the output of the logger to the file 'otherfile.log' (see also "[Starting CASA](old-pages.ipynb#starting-casa)").

```
casa --logfile otherfile.log
```

The log file can also be changed during a CASA session. Typing:

```
casalog.setlogfile('otherfile.log')
```

will redirect the output to the 'otherfile.log' file. However, the logger GUI will still be monitoring the previous 'casa-YYYYMMDD-HHMMSS.log' file. To switch it to the new file, go to *File > Open* and select the new log file, in our case 'otherfile.log'.




#### **Startup options for the logger** 

One can specify logger options at the startup of casa on the command line:

```
casa <logger options>
```

The options are described in "[Starting CASA](old-pages.ipynb#starting-casa)". For example, to suppress the GUI and send the logging messages to your terminal, use:

```
casa --nologger --log2term
```

while

```
casa --logfile mynewlogfile.log
```

will start CASA with logger messages going to the file mynewlogfile.log. For no log file at all, use:

```
casa --nologfile
```

 



#### **Setting priority levels in the logger** 

**Logger** messages are assigned a Priority Level when generated within CASA. The current levels of Priority are:

1.  *SEVERE* --- errors;
2.  *WARN* --- warnings;
3.  *INFO* --- basic information every user should be aware of or has requested;
4.  *INFO1* --- information possibly helpful to the user;
5.  *INFO2* --- details for advanced users;
6.  *INFO3* --- continued details;
7.  *INFO4* --- lowest level of non-debugging information;
8.  *DEBUGGING* --- most "important" debugging messages;
9.  *DEBUG1* --- more details;
10. *DEBUG2* --- lowest level of debugging messages.

The "debugging" levels are intended for developers' use.

<div class="alert alert-info">
**Inside the Toolkit:**

The **casalog** tool can be used to control the logging. In particular, the **casalog.filter** method sets the priority threshold. This tool can also be used to change the output log file, and to post messages into the logger.

There is a threshold for which these messages are written to the casa-YYYYMMDD-HHMMSS.log file and are thus visible in the logger. By default, only messages at level *INFO* and above are logged. The user can change the threshold using the **casalog.filter** method. This takes a single string argument of the level for the threshold. The level sets the lowest priority that will be generated, and all messages of this level or higher will go into the casa-YYYYMMDD-HHMMSS.log file.

Some examples:

```
casalog.filter('INFO')           # the default
casalog.filter('INFO2')          # should satisfy even advanced users
casalog.filter('INFO4')          # all INFOx messages
casalog.filter('DEBUG2')         # all messages including debugging
```

**WARNING:** Setting the threshold to DEBUG2 will put lots of messages in the log!
</div>
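The threshold rule described above can be modelled in a few lines. This is an illustration only, not CASA code: `PRIORITIES` copies the order of the priority levels listed earlier (SEVERE highest, DEBUG2 lowest), and `is_logged` is a hypothetical helper that reports whether a message would pass a given `casalog.filter` threshold.

```python
PRIORITIES = ["SEVERE", "WARN", "INFO", "INFO1", "INFO2", "INFO3",
              "INFO4", "DEBUGGING", "DEBUG1", "DEBUG2"]


def is_logged(message_priority, threshold="INFO"):
    """True if a message at `message_priority` passes `threshold`."""
    # A smaller index means a higher priority; a message is kept when its
    # priority is at the threshold level or higher.
    return PRIORITIES.index(message_priority) <= PRIORITIES.index(threshold)


if __name__ == "__main__":
    for level in ("INFO", "INFO2", "DEBUG2"):
        kept = [p for p in PRIORITIES if is_logged(p, level)]
        print(f"casalog.filter('{level}') keeps: {', '.join(kept)}")
```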

 



***





### Information Collection 

To better understand real-world usage patterns, quality and reliability, CASA collects runtime telemetry and crash reports from users and sends periodic reports back to NRAO. This information is anonymous, with no personally identifiable information (PII) or science data included.



#### Telemetry

Telemetry records task usage activity (task name, start time, end time) during CASA runs.  Periodically, these reports will be batched together and sent to NRAO.

You can disable telemetry by adding the following line in \~/.casa/config.py

```
telemetry_enabled = False
```

Telemetry adds log files in the \"rcdir\" (e.g., \~/.casa) directory and submits the data at CASA startup after a predefined interval. This can be configured in the \~/.casa/config.py file by setting telemetry_submit_interval to a desired value in seconds. The default value is 1 week.

The log file cache directory can be changed by setting \"telemetry_log_directory\" in \~/.casa/config.py. \"telemetry_log_directory\" must be an absolute path. 

Maximum telemetry log usage can be set with "telemetry_log_limit" (in kilobytes). CASA will check the log file size periodically and disable telemetry when the limit is reached. The check interval can be set with "telemetry_log_size_interval" (in seconds).

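Summary of the available telemetry options in \~/.casa/config.py, shown with example values:

```
telemetry_log_directory = "/tmp"     # absolute path of the log cache directory
telemetry_log_limit = 1650           # maximum telemetry log usage, in kilobytes
telemetry_log_size_interval = 30     # log size check interval, in seconds
telemetry_submit_interval = 20       # submission interval, in seconds
```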



#### Crash Reporter

Crash reports are triggered whenever a CASA task terminates abnormally (e.g., unhandled C++ exception, segfault, etc.). The crash reports include:

-   program call stack
-   filesystem mount information
-   CASA log
-   memory information
-   operating system version
-   CPU information

You can disable crash reports by adding the following line in \~/.casa/config.py

```
crashreporter_enabled = False
```

 

 



***





### Hardware Requirements 

Recommended CASA computing environments

The recommended hardware requirements are provided [here](https://casa.nrao.edu/casa_hardware-requirements.shtml) as part of the CASA webpages.

 



***





### Amazon Web Services 

Overview of how to use CASA on Amazon Web Services


#### Introduction 

An introduction to Amazon Web Services

In this chapter you will learn how to create an account within AWS, select appropriate resources for a problem, launch those resources, perform any desired processing, return or store resulting products, and finally to release the reserved resources back to Amazon.

![3c5e2909ab2bbc221f7c4669215ce21a70523c54](https://github.com/casangi/casadocs/blob/master/docs/notebooks/media/3c5e2909ab2bbc221f7c4669215ce21a70523c54.jpg?raw=1){.image-inline width="442" height="168"}



##### Amazon Web Services Introduction

[Amazon Web Services (AWS)](https://aws.amazon.com/) is a collection of physical assets and software tools for using ad hoc computing resources (aka Cloud Computing) within Amazon. The combination of a wide range of processing hardware, network speeds, storage media and tools allows users to create virtual computing platforms tailored to specific problem sizes with discrete durations.

In simplest terms, AWS allows users to create workstations or medium-sized clusters of computers (ranging from tens to a few thousand nodes) that are essentially identical to the kind of physical workstation or small cluster they might have at their home institution, without the upfront capital expense or the overhead of space, power, and cooling. The full range of Amazon's offerings goes well beyond that simple conceptual model, but many, if not most, are not directly applicable to radio astronomy data processing.

The target audience for this document is the astronomer who wishes to run their computations more quickly, would like to know if AWS can help accomplish that goal, and what possibilities and limitations AWS brings.



##### Applicability to NRAO Data Processing

NRAO data products, particularly those from the Atacama Large Millimeter/submillimeter Array (ALMA) and the Jansky Very Large Array (JVLA), are of sufficient volume (100s to 1000s of GBytes) and compute complexity to require processing capabilities ranging from high-end workstations to small clusters of servers. Additionally, advanced imaging algorithms typically benefit from more system memory than is likely to be available on standard desktops.

AWS can facilitate the transfer of data among researchers through high speed networks or shared storage. Large scale projects which can be decomposed along some axis (e.g. by observation, field, frequency, etc) can be processed concurrently across 10s, 100s or even 1000s of compute instances.



##### Document Outline

This document set attempts to walk users through the necessary steps to [create an account within AWS](usingcasa.ipynb#account-and-user-setup), [select appropriate resources for their problem](usingcasa.ipynb#amazon-machine-images), [launch those resources](usingcasa.ipynb#amazon-machine-images), [perform any desired processing](usingcasa.ipynb#instances), [return or store resulting products to the user](usingcasa.ipynb#storage), and finally to release the reserved resources back to Amazon. The last step is a critical aspect to the financial viability of computing within AWS.  Later sections will cover the potential financial benefit and possible pitfalls of utilizing AWS resources.



##### Requesting Assistance

Given the unique nature of AWS resources, please direct any questions or comments to [nrao-aws\@nrao.edu](mailto:nrao-aws@nrao.edu?subject=AWS%20Questions/Comments "AWS Questions/Comments") rather than to CASA or Helpdesk personnel.



***





#### User Account Setup 

Creating and setting up a user account on AWS



##### Overview

To facilitate fine-grained control of AWS resources, Amazon supplies two distinct account types: a root user account (Account Root User) and Identity and Access Management users (IAM Users). [Learn more.](http://docs.aws.amazon.com/general/latest/gr/root-vs-iam.html)

These accounts are distinct from regular Linux accounts that may exist on a compute instance.



##### Account Root User

[Click here and follow the steps to set up an Amazon Web Services account.](http://www.dummies.com/programming/cloud-computing/amazon-web-services/set-up-your-amazon-web-services-account/)

Signing up for an AWS account automatically creates a Root user account with total control over the account. The credit card used during sign-up will be billed for all usage by the Account Root User and the Account\'s IAM Users.

In general, the Root user account should only be used for changing account-wide settings, e.g., creating or removing IAM users, changing the AWS support plan, or closing the account. An IAM User account should be used when requesting resources. Following this model allows for finer-grained control over the type and scale of resources a specific user can request, and can limit the risk from unexpected expenses or accidental global changes to the account.



##### IAM Users

##### Getting Started with IAM Users

[View Amazon\'s AWS Documentation](http://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started_create-admin-group.html)

The Account Root User can create IAM Users. These IAM Users may have more limited permissions, or they may have administrator permissions. It is recommended that the Account Root User first create an IAM User in the Administrators group. That IAM User can then log in and perform essentially all administrative operations, including adding more IAM Users.

IAM users can be given different levels of permissions for requesting AWS resources. Permissions can be mapped to a user via membership in an IAM group or by having an IAM Role mapped to the user. More information on creating and utilizing IAM groups and IAM Roles can be found [here](http://docs.aws.amazon.com/IAM/latest/UserGuide/id.html).

##### How to Sign into AWS as an IAM User

[View Amazon\'s AWS Documentation](http://docs.aws.amazon.com/IAM/latest/UserGuide/console.html#user-sign-in-page)

##### Best practices for using IAM Users

[View Amazon\'s AWS Documentation](http://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#create-iam-users.)



##### Linux Users

[View Amazon\'s AWS Documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html)

IAM Users typically have the ability to start instances, i.e., virtual machines running Linux on AWS hardware. When starting an instance, you specify an ssh key pair; that key can then be used to ssh into the running instance.

##### Adding Additional Linux Users

[View Amazon\'s AWS Documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/managing-users.html)



***





#### Amazon Machine Images 

An Amazon Machine Image (AMI) provides the information required to launch an instance, which is a virtual server in the cloud



##### Overview 

An AMI is an object that encapsulates a base OS image (e.g., Red Hat Enterprise Linux, CentOS, Debian), any 3rd party software packages (e.g., CASA, SciPy) and any OS level run time modifications (e.g., accounts, data, working directories). Since an AMI is a discrete object, it can be mapped onto differing hardware instances to provide a consistent work environment independent of the instance\'s number of processors, available memory, or total storage space.

The NRAO provides a set of pre-made images based on the standard Amazon image, which include specific release versions of CASA and Python and AWS\'s command line interface (CLI) and application programming interface (API). The appropriate AMI can be used to start an image with operating system and software ready to run CASA.



##### Finding an NRAO AMI

You can search for NRAO AMIs [using the console](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/finding-an-ami.html#finding-an-ami-console).

-   From the navigation bar in the upper right, change your region to be \"US West (Oregon)\"
-   Open the AWS Console and click on \"EC2 Dashboard\".
-   Click on \"AMIs\" to bring up the interface for selecting AMIs.
-   The AMI selection interface by default displays only AMIs \"Owned by me\". Change it to \"Public Images\".
-   Narrow this long list to only NRAO images.
    -   Click in the box to the right of the magnifying glass.
    -   A menu listing \"Resource Attributes\" will pop up below the box you clicked on. Ignore it and type \"NRAO\" in the box and press the Enter key.
-   The list of AMIs has been narrowed to NRAO AMIs only.
-   New NRAO AMIs will be released periodically.



##### Using an AMI

Click the box next to the AMI you want. Click Launch. The [Instances](usingcasa.ipynb#instances) section of this document covers starting instances.



##### Geographic Locales

AWS has the concept of [Regions and Zones](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html) in which instances, EBS volumes, S3 buckets, and other resources run. For example, an S3 bucket may be in region us-west-2 and not directly accessible to an IAM User who is currently working in us-east-1. However, users can select the region they work in, and they may also duplicate some resources across regions if access latency is a concern. To find the latency from your computer to all the AWS regions, try the cloudping tool: http://www.cloudping.info.



##### AMIs are Region-specific

An AWS user can only use AMIs stored in the region they are working in. However, [copying an AMI](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/CopyingAMIs.html#ami-copy-steps) to another region is straightforward. The copy gets a new AMI-ID, so it effectively becomes a new AMI; the original AMI-ID is appended to the new AMI\'s \"Description\". The new AMI always starts out private and may need to be made public (it takes about 15 minutes after an AMI is made public for it to show up in a search). In every other way, it is a duplicate of the original.

To make the image public, select the AMI, then choose \"Modify Image Permissions\" from the \"Actions\" menu. As in this image, select \"Public\" and click Save.



***





#### Storage 

AWS provides four basic forms of storage that vary by speed, proximity to their associated instance (which impacts latency/performance), and price.

The sections below describe storage in roughly the order of proximity to the instance. To first order, each subsequent type decreases in both performance and cost, with Glacier being the slowest and cheapest. EBS is probably the most commonly used.



##### Instance Store

Instance stores are solid state drives physically attached to the hardware the instance is running on that are available only on certain instance types. (Use of instance stores is beyond the scope of this document, although more information is available at this AWS page.) It is indirectly the most expensive form of storage since instance types with instance store capacity also include extra processor cores and memory. Cost effectiveness is a function of whether the extra cores and memory are utilized. Instance stores do not use redundant hardware. Instance stores cannot preserve their data after their instance stops.



##### Elastic Block Storage (EBS)

EBS is connected to instances via high-speed networks and is stored on redundant hardware. An EBS volume persists independently after a compute instance terminates (although it can be configured to be deleted when the instance terminates). EBS storage can be allocated during the creation of an instance, or it may be created separately and attached to an existing instance. It may be detached from one instance and re-attached to another, which is potentially useful where the processing requirements for one stage of processing (e.g., calibration and flagging) are substantially different from those of a later stage (e.g., imaging).



##### Simple Storage Service (S3)

S3 storage is an object-level store (rather than a block-level store) designed for medium- to long-term storage. Most software applications, including CASA, do not interact directly with S3 storage; instead, one of the AWS Interfaces to S3 is used. Typically, S3 is used to stage data temporarily before moving it to an EBS or Instance store volume for processing, or as long-term storage for final products. As of this writing, S3 storage costs range from \$150 to \$360 per TByte per year, depending on whether data is flagged as infrequent access. Longer-term storage uses Glacier.



##### Glacier

Glacier is the lowest-cost AWS storage. Data within S3 can be flagged for migration to Glacier, where it is copied to tape. As of this writing, Glacier storage costs roughly \$86 per TByte per year. Retrieval from Glacier takes \~4 hours.
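Using the per-TByte prices quoted in this section (which will certainly have changed since they were written), a rough annual storage bill is simple arithmetic. The sketch below is only an illustration: the rate table, the mapping of the low and high ends of the S3 range to infrequent-access and standard storage, and the function name `yearly_storage_cost` are assumptions based on the figures above, not current AWS pricing.

```python
# Approximate yearly prices in US dollars per TByte, as quoted in this
# section; check current AWS pricing before relying on them.
PRICE_PER_TB_YEAR = {
    "s3_infrequent": 150.0,  # low end of the quoted S3 range (assumed: infrequent access)
    "s3_standard": 360.0,    # high end of the quoted S3 range (assumed: standard)
    "glacier": 86.0,
}


def yearly_storage_cost(tbytes: float, tier: str) -> float:
    """Estimated cost in dollars of keeping `tbytes` TByte in `tier` for one year."""
    return tbytes * PRICE_PER_TB_YEAR[tier]


for tier in PRICE_PER_TB_YEAR:
    print(f"{tier:14s} 10 TByte for a year: ${yearly_storage_cost(10, tier):,.0f}")
```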



***





#### Instances 

An instance is effectively a single computer composed of an OS, processors, memory, base storage and optional additional data storage



##### Instance Types

Amazon has predefined over 40 instance types. These fall into classes defined roughly by processing power, which are further subdivided by total memory and base storage.

[Click here to see a list of all Linux instance types with their number of  virtual CPUs (vCPU), total memory in GBytes of RAM, and the type of storage utilized by the instance type.](https://aws.amazon.com/ec2/pricing)

Note the \"vCPU\" is actually a hyperthread, so 2 \"vCPU\" equal one core, e.g. m4.xlarge has 2 cores. (These prices are for on-demand instances.)



##### Starting Instances

CASA requires \>=4GB/core. Storage can be EBS. Because AWS has hyperthreading turned on, each \"vCPU\" is one hyperthread and therefore 2 vCPUs essentially equal 1 physical core. Some experimentation may be required to optimize the instance type if many instances are to be used. This can have a very significant impact on total run time and cost. The 4GB per physical core rule should be sufficient to get started.
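The vCPU-to-core conversion and the 4 GB per physical core rule can be checked mechanically. This is a minimal sketch; the function names are hypothetical, and it simply encodes the two rules stated above (2 vCPUs = 1 physical core, and at least 4 GB of RAM per physical core by default).

```python
def physical_cores(vcpus: int) -> int:
    """AWS vCPUs are hyperthreads; two vCPUs correspond to one physical core."""
    return vcpus // 2


def meets_casa_memory_rule(vcpus: int, ram_gb: float, gb_per_core: float = 4.0) -> bool:
    """True if the instance offers at least gb_per_core GB of RAM per physical core."""
    cores = physical_cores(vcpus)
    return cores > 0 and ram_gb / cores >= gb_per_core


# m4.xlarge: 4 vCPUs and 16 GB RAM, i.e. 2 physical cores with 8 GB each.
print(physical_cores(4), meets_casa_memory_rule(4, 16))
```

Raising `gb_per_core` to 8 or 16 is one way to screen instances for the more demanding imaging cases.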



##### Choosing a Payment Option: On-demand vs. Spot

Choosing an on-demand instance (the default) guarantees you the use of that instance (barring hardware failure).

There is also a Spot Price market; [click here to read about it](https://aws.amazon.com/ec2/spot/pricing/). The price of an instance fluctuates over time as a function of demand. AWS fills spot requests starting at the highest bid and working down to the lowest; the price paid by all spot users is the bid price reached when all resources were exhausted. When you request a spot instance, you submit a bid for those resources. If that bid exceeds the current spot price, the spot instance launches and continues to run as long as the spot price remains below the bid. Typically, the spot price is much less than the on-demand price, so bidding the on-demand price usually permits an instance to run its job to completion. While it runs, it is billed only at the running spot price, not at the bid, so the savings can be considerable. If the spot price rises above your bid, your instance will be terminated. Be warned: if demand is excessive for a particular instance type and you bid 2x or more of the on-demand price, you run the risk of the spot price rising to that level. Bidding above the on-demand price is most useful for very long-running jobs, where it is considered acceptable to pay more for brief periods in exchange for minimizing the risk that the instance will be terminated due to a low bid.

For example, the on-demand price for an m4.xlarge instance is \$0.239 per hour. Over the 3 months preceding this writing, the mean spot price was \$0.0341 per hour (maximum \$0.0515 per hour). A 10-hour run of 100 instances would therefore cost \$239 on-demand versus about \$34 on spot instances. That assumes adding 100 instances to the spot market will not affect the spot price much, which is a reasonable assumption. However, adding 500 instances will certainly raise the spot price.
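The arithmetic of this example is spelled out below. The function name `fleet_cost` is hypothetical, and the prices are the ones quoted in the paragraph above.

```python
def fleet_cost(price_per_hour: float, hours: float, instances: int) -> float:
    """Total cost in dollars of running `instances` identical instances for `hours`."""
    return price_per_hour * hours * instances


on_demand = fleet_cost(0.239, 10, 100)  # m4.xlarge on-demand price
spot = fleet_cost(0.0341, 10, 100)      # mean spot price over the quoted 3 months
print(f"on-demand ${on_demand:.2f}, spot ${spot:.2f}, saving {1 - spot / on_demand:.0%}")
```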

It\'s possible to bid up to 10 times the on-demand price.
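The bidding rules described above can be made concrete with a small simulation. This is a sketch under simplifying assumptions, not a model of AWS's actual billing: prices change only on the hour, each completed hour is billed at that hour's spot price, and the instance is terminated at the start of the first hour whose spot price exceeds the bid. The function name and the price series are hypothetical.

```python
from typing import List, Tuple


def run_spot_job(hourly_prices: List[float], bid: float) -> Tuple[int, float, bool]:
    """Simulate a spot instance for a job lasting len(hourly_prices) hours.

    Returns (hours_completed, dollars_billed, finished).
    """
    billed = 0.0
    for hour, price in enumerate(hourly_prices):
        if price > bid:
            # Spot price rose above the bid: the instance is terminated.
            return hour, billed, False
        # Each completed hour is billed at the running spot price, not the bid.
        billed += price
    return len(hourly_prices), billed, True


# Hypothetical spot prices ($/hour) with a brief spike in the fourth hour.
prices = [0.034, 0.036, 0.051, 0.290, 0.040]
print(run_spot_job(prices, bid=0.239))      # bid = on-demand price: terminated by the spike
print(run_spot_job(prices, bid=2 * 0.239))  # bid = 2x on-demand: job finishes
```

In the second run the higher bid is never charged; the job still pays only the spot price for each hour, which is the trade-off described above for long-running jobs.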

There are other ways to purchase AWS instances, but only on-demand and spot instances appear of interest to running CASA. [See purchasing options.](https://aws.amazon.com/ec2/purchasing-options/)



***





#### Monitoring 

A critical aspect to the financial viability of computing within AWS



##### Instance Monitoring

After a job finishes on an instance, that instance and its storage are still running and generating charges to the AWS Account Root User. Instances and storage known to be no longer needed can, of course, be shut down. To check whether an instance is in use, a quick check can be made using the console: click Instances, check the box next to your instance, and select the \"Monitoring\" tab, which shows CPU Utilization. To be really sure, log in to the instance and check whether your job has finished. If so, you can transfer data off the root volume as needed and terminate the instance.
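If you have a series of CPU Utilization readings for an instance (for example, copied from the Monitoring tab), a simple threshold test can flag instances that look idle. The sketch below is hypothetical: the 5% threshold, the window of six readings, and the function name `looks_idle` are arbitrary choices, and logging in to confirm the job has finished remains the reliable check.

```python
from typing import Sequence


def looks_idle(cpu_percent: Sequence[float], threshold: float = 5.0, window: int = 6) -> bool:
    """True if each of the last `window` CPU Utilization readings (percent) is below `threshold`."""
    recent = list(cpu_percent)[-window:]
    return len(recent) == window and all(value < threshold for value in recent)


# Hypothetical readings: a job running at full load, then finished.
samples = [97.5, 98.1, 96.0, 1.2, 0.8, 0.9, 1.1, 0.7, 1.0]
print(looks_idle(samples))
```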

For more information see: <http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/US_SingleMetricPerInstance.html#d0e6752>



##### Storage Monitoring (EBS)

After the instance terminates, the instance ID and data volume names will continue to show up for several minutes. This can be used to find EBS volumes that were attached to the instance. If the user does not need to keep the data on EBS (e.g., to attach the volume to another instance or to save it for later), then deleting the volume at that time makes sense. Alternatively, the user might want to preserve the data by copying it to S3 or downloading it to a local storage device before deleting the EBS volume.



##### Storage Monitoring (S3)

If you have data in S3 you may wish to leave it there. If you wish to move it to your local storage device, click the cube in the upper left of the console, then choose S3. Unfortunately the console is clumsy for transferring data. Reading through the [Interfaces](usingcasa.ipynb#interfaces) section of this chapter (and the relevant links), specifically on the use of the AWS CLI, is therefore recommended. Once the user is done with the previously allocated AWS resources, the user can release the reserved resources back to Amazon.



***





#### Interfaces 

Amazon has multiple interface methods for interacting with resources.

The [AWS console](https://aws.amazon.com/ec2/getting-started/) is a web interface used to control AWS resources. For users launching isolated instances and querying status, this is likely the simplest and most commonly used interface.

[The CLI](http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html) is the command line interface to AWS. This software, installed on your local computer, can be used to launch and control AWS resources. This interface is most useful for repeated tasks and simple automation.

The Python SDK interface is beyond the scope of this document, but more information can be found [here](https://aws.amazon.com/sdk-for-python/). The Python interface is most useful for complex automation frameworks and includes a RESTful interface to all AWS resources.



***





#### Example Using CASA 

Tutorial of CASA on AWS



##### Overview of Using CASA on AWS

Amazon Web Services (AWS) allows researchers to use instances for NRAO data processing. This section shows how to ready an instance to process a data set with CASA, using a CASA tutorial as a demonstration of running CASA on AWS.



##### Choose an Instance

The tutorial does not require an instance with a lot of CPU and RAM. Per the memory section of the [hardware requirements page](usingcasa.ipynb#hardware-requirements), 500MB per core is adequate. An m4.xlarge is more than adequate and costs \$0.24/hour. It has 2 cores and 16 GB RAM. A smaller instance such as m4.large would probably work as well, except that it has only 1 core and could not run tasks in parallel.



##### Get Ready to Start an Instance

Follow the directions on the [AMI page](usingcasa.ipynb#amazon-machine-images) to locate an NRAO AMI. Select the AMI, and from the Actions menu, choose Launch. Select the m4.xlarge instance type, which, as mentioned above, should be adequate for the tutorial at 2 cores and 8 GB/core. At this point one would often press the \"Review and Launch\" button to skip to the end of the process; however, we need to add some storage first.



##### Start an Instance with Some Extra Storage Space

Select \"4. Add Storage\" at the top. You can see the root volume is 8 GB by default. For the tutorial, we might get by with 8 GB, but enlarging the root volume will remove any doubt. Change 8 to 1024 (the upper limit for the root volume). You might notice a checkbox called \"Delete on Termination\", which is checked by default for root volumes. Unchecking it causes the root volume to persist after the instance terminates. The charges for storing the volume of a terminated instance are minimal compared to the charges for a running instance, and the user (or a coworker with adequate privileges) can mount the EBS volume on another instance. After making this selection, click \"Review and Launch\", then click \"Launch\". You are asked for the ssh key pair that will allow ssh access to the instance you are about to start; if possible, use an existing key pair. Click Launch to start the instance.



##### Logging into Your Instance

Once the instance has had a couple of minutes to start, you can see the running (and recently terminated) instances on the Instances screen. Your instance is identifiable by its Key Name and by the \"Initializing\" status under Status Checks. Copy the external IP address and log in to the instance: `ssh -i ~/.ssh/mykeyname.pem centos@my-IP-address`. If your login is not immediate, try again in a minute or so.

Using an NRAO AMI to start an instance brings up an instance with CASA already installed. Everything you need to run CASA should be there except for the data, which will be downloaded directly to the instance in the next step.



##### Downloading the Data

In this example we\'ll be using the [VLA high frequency Spectral line tutorial](https://casaguides.nrao.edu/index.php?title=VLA_high_frequency_Spectral_Line_tutorial_-_IRC%2B10216).

You can bring up that page in a browser on your host computer; there\'s no need to launch a browser on the AWS instance.

Section 2, \"Obtaining the Data\", of the tutorial lists the URL where the data can be found; it is repeated below. Once you\'ve logged into your instance, you can retrieve and unpack the data with the commands below. If you\'ve attached a separate storage device to the instance, cd to where it was mounted first so that the data is written to that device. See the [storage section](usingcasa.ipynb#storage) for more details.

```
wget http://casa.nrao.edu/Data/EVLA/IRC10216/day2_TDEM0003_10s_norx.tar.gz

tar xf day2_TDEM0003_10s_norx.tar.gz

```
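If you prefer to unpack the data from Python (for example, from within a CASA session), the standard-library `tarfile` module can do the same job as the `tar` command above. This is a sketch, not part of the tutorial: the helper name `extract_dataset` is hypothetical, and `dest_dir` is assumed to be wherever you want the MeasurementSet written, such as the mount point of an attached EBS volume.

```python
import os
import tarfile


def extract_dataset(archive_path: str, dest_dir: str) -> list:
    """Extract a .tar or .tar.gz archive into dest_dir and return its top-level names."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, mode="r:*") as tar:  # "r:*" auto-detects compression
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects unsafe paths.
            tar.extractall(path=dest_dir, filter="data")
        else:
            tar.extractall(path=dest_dir)
        return sorted({name.split("/")[0] for name in tar.getnames()})


# Example (hypothetical mount point for an attached EBS volume):
# extract_dataset("day2_TDEM0003_10s_norx.tar.gz", "/data")
```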



##### Launching and Running CASA

Typing \'casa\' in the terminal starts the pre-installed version of CASA. The first time it is run, it takes a few minutes to initialize. An IPython interpreter for CASA will eventually open, ready for CASA commands. (The CASA log window should display as well.)

###### Display the antenna map

```
#In CASA
plotants(vis='day2_TDEM0003_10s_norx',figfile='ant_locations.png')
```

###### Plot the MeasurementSet, amplitude vs. uv-distance

```
plotms(vis='day2_TDEM0003_10s_norx',field='3', xaxis='uvdist',yaxis='amp',correlation='RR,LL', avgchannel='64',spw='0~1:4~60', coloraxis='spw')
```

###### Flag data

```
flagdata(vis='day2_TDEM0003_10s_norx', mode='list', inpfile=["field='2,3' antenna='ea12' timerange='03:41:00~04:10:00'", "field='2,3' antenna='ea07,ea08' timerange='03:21:40~04:10:00' spw='1'"])
```

###### Transfer data

When you are done with your instance and want to move the data on its root volume to your local storage, you can use `scp -r` or `rsync -a` from your local machine.



***





#### Costs 

Overview of Costs Associated with AWS

Amazon Web Services (AWS) allows researchers to use AWS resources for NRAO data processing. This section presents costs associated with using these resources. Resource types include: Instances, EBS Volumes, EBS Snapshots, S3, and Glacier.

The primary resource utilized is Instances.

Other resources are methods of storing input and output data: EBS, EFS, Snapshots, S3, and Glacier.

The way to contain costs is to first determine the needs of the code that is to be run. Then AWS resources can be matched to those needs as efficiently as possible given the circumstances.



##### CASA Hardware Requirements

##### Running CASA

Selecting a suitable instance type and size requires some knowledge about the CASA tasks to be run. The [Hardware Requirements](usingcasa.ipynb#hardware-requirements) page includes sizing guidelines for typical computing hardware that can be translated to AWS instance types.

##### Choosing an Instance

An instance is the best place to start when allocating resources. On-demand instance costs and capabilities are listed here: <https://aws.amazon.com/ec2/pricing/>, though be aware that the page takes a minute to load and nothing will display under \"on-demand\" while the data loads. Note that spot instances can be used to run jobs at a much reduced cost; this is covered in the [Instances](usingcasa.ipynb#instances) section of this document. The goal is to select an instance type that is large enough to do the job but leaves as few resources idle as possible.

##### File I/O

Default EBS is generally used. However, options exist to specify different types of EBS, e.g., storage with higher IOPS, that cost more. EBS storage pricing details can be found here: <https://aws.amazon.com/ebs/pricing/>. For reference, there is a detailed discussion of EBS volume types here: <http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html>.

##### Instance Store

Some instance types are pre-configured with attached SSD storage called \"instance store\". If you start such an instance, part of its intrinsic cost is this storage. More details about instance store are here: <http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/add-instance-store-volumes.html#adding-instance-storage-instance>.



##### Selecting an Instance Type for CASA

##### What to try First

There are over 40 instance types with varying amounts of RAM and CPUs. When you look closely, there are also many storage options, but a few recurrent themes emerge. The Simple Storage Service (S3) is primarily for storing large volumes of data for periods ranging from a few hours to a few months. S3 can be used to stage input and output data, and you can share these data with other users. EBS is the most often used \"attached\" storage, with performance comparable to a very good hard drive attached to your desktop system; however, unlike ordinary storage, its bandwidth goes up with the core count.

The number of cores and GBs of RAM depend entirely on the instance type. Suppose you want 10 cores with 4 GB RAM per core (as recommended in CASA [Hardware Requirements](usingcasa.ipynb#hardware-requirements)); you can look for the closest instance on the list of instance types, <https://aws.amazon.com/ec2/instance-types/>. m4.4xlarge looks close, with 16 \"vCPU\" and 64 GB. Despite appearances, it does not have enough cores: AWS lists \"vCPUs\" (a.k.a. hyperthreads) instead of cores. A vCPU is an internal CPU scheduling trick that is useful for certain classes of programs, but is nearly always detrimental in scientific computing, e.g., CASA. To summarize, 2 Amazon vCPUs equal 1 core. From here on, cores are used, where 1 core = 2 vCPUs.

Continuing with the example of looking for an instance that meets the constraints of 10 cores and 4 GB RAM per core, it makes sense to look at core count first. The listing of instances with their core count, memory, and cost is here: <https://aws.amazon.com/ec2/pricing/>. There are no instances with 10 cores; the closest have 8 and 16 cores, so we\'ll look at \>=16 core instances. We\'ll also discard any instances that do not have RAM \>= (\#cores \* 4 GB). What is left (without looking at immensely large and therefore expensive instances) are these:

-   m4.10xlarge with 20 cores and 160 GB of RAM. Cost: \$2.394/hour
-   x1.32xlarge with 64 cores and 1962 GB of RAM. Cost: \$13.338/hour
-   r3.8xlarge with 16 cores and 244 GB of RAM. Cost: \$2.66/hour
-   d2.8xlarge with 18 cores and 244 GB of RAM. Cost: \$5.52/hour
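The screening just described (enough physical cores, and RAM of at least cores × 4 GB) can be expressed as a short filter over the candidates. This sketch hard-codes the figures quoted in this section, including m4.4xlarge from the earlier paragraph; the `InstanceType` class and the `suitable` function are hypothetical, and real prices and specifications should be taken from the AWS pricing page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceType:
    name: str
    cores: int           # physical cores (AWS vCPUs / 2)
    ram_gb: float
    usd_per_hour: float  # on-demand price quoted in this section


CANDIDATES = [
    InstanceType("m4.4xlarge", 8, 64, 0.958),
    InstanceType("m4.10xlarge", 20, 160, 2.394),
    InstanceType("x1.32xlarge", 64, 1962, 13.338),
    InstanceType("r3.8xlarge", 16, 244, 2.66),
    InstanceType("d2.8xlarge", 18, 244, 5.52),
]


def suitable(instances: List[InstanceType], min_cores: int,
             gb_per_core: float = 4.0) -> List[InstanceType]:
    """Keep instances with enough cores and RAM >= cores * gb_per_core, cheapest first."""
    keep = [i for i in instances
            if i.cores >= min_cores and i.ram_gb >= i.cores * gb_per_core]
    return sorted(keep, key=lambda i: i.usd_per_hour)


for inst in suitable(CANDIDATES, min_cores=10):
    print(f"{inst.name:12s} {inst.cores:3d} cores {inst.ram_gb:6.0f} GB  ${inst.usd_per_hour}/hour")
```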

Selecting 10 cores produced a results list that contains the most expensive instances. If it\'s feasible to use a number of cores that is a power of 2, a more efficient arrangement may result. For example, the instances with 2\^3 = 8 cores that also meet the criterion of 4 GB RAM per core include:

-   m4.4xlarge 8 cores, 64 GB RAM. Cost: \$0.958/hour
-   c3.8xlarge 8 cores, 60 GB RAM. Cost: \$1.68/hour (instance store)

The c3.8xlarge, although very similar to the m4.4xlarge, costs 75% more per hour. That\'s because c3.8xlarge comes pre-configured with local (instance store) storage. This is charged even when it is not used. It is something to watch out for. Instance store can be useful, but it is tricky to make use of. The use of instance store is outside the scope of this document. When considering 8 core instances, m4.4xlarge appears to be the most attractive option in this case.

-   m4.4xlarge 8 cores 8 GB/core \$0.958/hour
-   r3.8xlarge 16 cores \~15 GB/core \$2.66/hour
-   r3.4xlarge 8 cores \~15 GB/core \$1.33/hour
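Normalizing the hourly price by the number of physical cores makes this comparison easier to read. By this measure, the two r3 options cost the same per core-hour, and m4.4xlarge is the cheapest, although it has the least RAM per core. The sketch below uses the prices and core counts listed above; the function name `rank_by_core_hour` is hypothetical.

```python
from typing import List, Tuple

Option = Tuple[str, int, float]  # (instance name, physical cores, on-demand $/hour)


def rank_by_core_hour(options: List[Option]) -> List[Tuple[str, float]]:
    """Return (name, dollars per physical-core-hour) pairs, cheapest first."""
    per_core = [(name, price / cores) for name, cores, price in options]
    return sorted(per_core, key=lambda pair: pair[1])


# Prices and core counts as listed above; current prices will differ.
options = [("m4.4xlarge", 8, 0.958), ("r3.8xlarge", 16, 2.66), ("r3.4xlarge", 8, 1.33)]
for name, cost in rank_by_core_hour(options):
    print(f"{name:11s} ${cost:.3f} per core-hour")
```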

r3.4xlarge is not far behind in price. And it has more RAM as well as 320 GB of instance store storage. So zeroing in on the best instance takes some time. However, it is not time well spent to find the most efficient instance until many instances are to be run or an instance is run for a long period of time.

 



##### What Instance(s) to Consider for Big or Long Jobs

So, to begin, it is probably best to choose EBS as your primary storage, S3 for cold storage, and an instance with \>=4 GB RAM per physical core. A more detailed discussion of these (and other) hardware considerations is given on the [Hardware Requirements](usingcasa.ipynb#hardware-requirements) page. What is covered here is sufficient to get started. Keep in mind that, since AWS has hyperthreading turned on, 2 AWS \"vCPUs\" equal 1 physical core (2 hyperthreads). For example, an AWS instance type with 8 vCPUs has only 4 physical cores. CASA does not make good use of virtual cores, so if you want a system with 4 actual cores, select an AWS 8-vCPU system with \>= 16 GB of RAM. That should be sufficient to get started. As you use AWS more, you\'ll want to invest more time in optimizing the instance type for the details of your processing case. If you are running only a few instances, such optimizations are not worth much effort, but if you plan to run hundreds of jobs, they can have a very significant impact on total run time and cost. The 4 GB per physical core rule should be sufficient to get started, but more demanding imaging tasks will likely require 8 GB or 16 GB per core.



##### AWS Storage for CASA

##### Root Volume

Starting an instance with an NRAO AMI and accepting the storage defaults creates a suitable root volume for CASA. If desired, exhaustive detail on root volumes is available on the AWS website: <http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/RootDeviceStorage.html>.

##### Additional EBS Volumes

Additional EBS volumes can be added to an instance at any time during its life cycle. See the following link for more information: <http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-volume.html>.

 



***





### Citation 

Citation for use in publications

Please cite the following reference when using CASA for publications:

McMullin, J. P., Waters, B., Schiebel, D., Young, W., & Golap, K. 2007, Astronomical Data Analysis Software and Systems XVI (ASP Conf. Ser. 376), ed. R. A. Shaw, F. Hill, & D. J. Bell (San Francisco, CA: ASP), 127  ([ADS link](http://adsabs.harvard.edu/abs/2007ASPC..376..127M))