## Author: Etienne Bonnassieux
## Comments, questions? Contact etienne.bonnassieux@obspm.fr

# **I. Prerequisites**

In this Jupyter notebook, we will learn how to install and run DDFacet on a LOFAR dataset. For this, you must first have the following installed (a non-exhaustive list; I have probably forgotten at least one thing):

- the KERN suite http://kernsuite.info/installation/
- ddfacet - sudo apt-get install ddfacet; pip install ephem
- dysco (if using a LOFAR dataset) - sudo apt-get install dysco
- LOFAR software, which includes NDPPP - sudo apt-get install lofar

Note that all the installs recommended here only work after installing KERN. If you don't want to use KERN, that is fine! Just be prepared for a good week of tears and sorrow. Note that KERN also comes with a lot of other useful radio-astro tools; I personally recommend tigger as an image viewer (takes a while to get used to but has some very useful functions) and pybdsf if you wish to get deeper than this tutorial in interferometric imaging.

# **II. Introduction**


The aim of this tutorial is to calibrate & image a small LOFAR dataset. Here is our order of operations:
- Acquire the dataset (1.3GB disk space required)
- Calibrate the dataset using NDPPP
- Inspect the gain solutions
- Image the calibrated data
- Discuss self-calibration

To acquire the dataset, you would normally stage your files through the LOFAR Long-Term Archive (LTA). Here, because we want to work on a small dataset (30 min), we will simply download it from the following link: https://upload.obspm.fr/get?k=BMZQa9VRhz9U2p872MA

However, should you wish to acquire LTA data in the future, here are the main steps to perform.
1. Get an LTA account. This is often the most difficult part.
2. Find your dataset on the LTA interface. Also non-trivial.
3. Stage your dataset. "Staging" means that your dataset, which to my knowledge is usually stored on magnetic tape, is physically moved to a location where you can access it online. You will receive an automatic notification telling you that your data is being staged, and another once it has been successfully staged. You can then download it by following the instructions given in your email and on the following webpage: https://www.astron.nl/lofarwiki/doku.php?id=public:lta_howto
4. Once your data is staged, you can untar the file. If in doubt, you can always use M.C. Toribio's untar script to untar all the files in a single command. This script can be found here: https://github.com/ebonnassieux/Scripts/blob/master/untar.py
5. Finally, because there can be issues with MS locks, it is often worth removing them altogether before doing any work on the data. This can be done with a bash command from the directory where your measurement sets (MS) are stored. For our downloaded dataset, the whole sequence looks like this:

In [1]:
mkdir DATA
mv L242400_SB095_uv.dppp.4h-4.5h.MS.tar.gz DATA
cd DATA
tar -xvf L242400_SB095_uv.dppp.4h-4.5h.MS.tar.gz
rm *MS/*lock
cd ..

rm: cannot remove '*MS/*lock': No such file or directory


We have thus created a DATA directory where we've moved our downloaded dataset, untarred it, removed the lock, and then we returned to our working directory. We can thus start the calibration.
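
For readers who prefer to stay inside Python, the same unpack-and-unlock sequence can be sketched with the standard library. This is only a sketch: the function name `unpack_and_unlock` is made up for this example, and unlike the bash cell above it extracts the tarball straight into DATA without moving it there first.

```python
import glob
import os
import tarfile


def unpack_and_unlock(tarball, dest="DATA"):
    """Extract `tarball` into `dest`, then delete any *MS/*lock files found there.

    Hypothetical helper mirroring the bash cell; returns the removed paths.
    """
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(tarball) as tar:  # compression is auto-detected
        tar.extractall(dest)
    removed = []
    for lock in glob.glob(os.path.join(dest, "*MS", "*lock")):
        os.remove(lock)
        removed.append(lock)
    return removed  # an empty list simply means there was no lock to remove
```

Calling `unpack_and_unlock("L242400_SB095_uv.dppp.4h-4.5h.MS.tar.gz")` from the working directory should leave the measurement set under DATA. An empty return value corresponds to the "No such file or directory" message that `rm` prints when there is no lock.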

# **III. Calibration**

In this tutorial, we will calibrate our data using the New Default Preprocessing Pipeline, or NDPPP. This is the standard tool used for calibration in PreFACTOR. Information on NDPPP can be found on the LOFAR documentation wiki: https://www.astron.nl/lofarwiki/doku.php?id=public:user_software:documentation:ndppp

Note that all the steps outlined here can be found in the LOFAR imaging cookbook, which is a very complete resource. If you run into a problem, which is likely to occur, someone else probably had the same issue before you: if so, the solution can likely be found in the cookbook. The most up-to-date version can be found here: https://www.astron.nl/radio-observatory/lofar/lofar-imaging-cookbook

A pdf version can also be found here: https://www.astron.nl/lofarwiki/lib/exe/fetch.php?media=public:lofar_imaging_cookbook_v18.pdf

To calibrate, we first need to acquire a model of the sky visible to our instrument during our pointing. There are a few ways to do this - the most reliable automated method is to extrapolate source positions and fluxes from other sky surveys. As a very basic starting point, however, one can use the Global Sky Model tool, gsm. Unfortunately, gsm is not as easy to install as I had hoped, so instead copy the following cell into a file which we will call tutorial.skymodel:

In [None]:
format = Name, Type, Ra, Dec, I, Q, U, V, MajorAxis, MinorAxis, Orientation, ReferenceFrequency='1.17577e+08', SpectralIndex='[]'

IMAGES/sb035VLApybdsmModel.firstcal.ssd1.int.restored_w0_i3_s1_g1, GAUSSIAN, 14:11:20.589732, +52.12.9.390453, 3.532e+01, 0.0, 0.0, 0.0, 1.80160e+00, 7.85124e-01, 1.57039e+02, 1.17577e+08, [-0.8]
IMAGES/sb035VLApybdsmModel.firstcal.ssd1.int.restored_w0_i3_s2_g2, GAUSSIAN, 14:11:20.717747, +52.12.7.867312, 2.395e+01, 0.0, 0.0, 0.0, 5.71713e-01, 3.66106e-01, 3.06168e+00, 1.17577e+08, [-0.8]
IMAGES/sb035VLApybdsmModel.firstcal.ssd1.int.restored_w0_i3_s0_g0, GAUSSIAN, 14:11:20.432337, +52.12.11.148637, 1.928e+01, 0.0, 0.0, 0.0, 7.40989e-01, 0.00000e+00, 1.33188e+02, 1.17577e+08, [-0.8]
IMAGES/sb035VLApybdsmModel.firstcal.ssd1.int.restored_w0_i3_s3_g3, GAUSSIAN, 14:11:20.799233, +52.12.6.145349, 1.926e+01, 0.0, 0.0, 0.0, 3.22666e+00, 9.18858e-01, 6.44905e+01, 1.17577e+08, [-0.8]


This is a model consisting of 4 clustered Gaussians, which together model 3C295, by far the brightest source in our field. It's far from the best model - but it's the best I had to hand in this format, considering that gsm seems to be horribly broken now. Better practice would be to get a list of NVSS sources and use that as a model - but 3C295 is a double source at Dutch LOFAR resolutions, and dominates the field, so I am using this model in our example.
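
As a quick sanity check on a sky model like this one, the text file can be read with a few lines of Python. The sketch below makes several assumptions: the parser is a naive comma split (it would break on a spectral index with more than one term), `parse_skymodel` is a name invented here, and the four source names are shortened to their last two tokens for readability. The summed value is in whatever unit the I column uses (Jy, as far as I know, in the makesourcedb convention).

```python
SKYMODEL = """\
format = Name, Type, Ra, Dec, I, Q, U, V, MajorAxis, MinorAxis, Orientation, ReferenceFrequency='1.17577e+08', SpectralIndex='[]'

s1_g1, GAUSSIAN, 14:11:20.589732, +52.12.9.390453, 3.532e+01, 0.0, 0.0, 0.0, 1.80160e+00, 7.85124e-01, 1.57039e+02, 1.17577e+08, [-0.8]
s2_g2, GAUSSIAN, 14:11:20.717747, +52.12.7.867312, 2.395e+01, 0.0, 0.0, 0.0, 5.71713e-01, 3.66106e-01, 3.06168e+00, 1.17577e+08, [-0.8]
s0_g0, GAUSSIAN, 14:11:20.432337, +52.12.11.148637, 1.928e+01, 0.0, 0.0, 0.0, 7.40989e-01, 0.00000e+00, 1.33188e+02, 1.17577e+08, [-0.8]
s3_g3, GAUSSIAN, 14:11:20.799233, +52.12.6.145349, 1.926e+01, 0.0, 0.0, 0.0, 3.22666e+00, 9.18858e-01, 6.44905e+01, 1.17577e+08, [-0.8]
"""


def parse_skymodel(text):
    """Return one dict per source, keyed by the column names in the format line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    if not header.lower().startswith("format"):
        raise ValueError("first non-empty line must be the format line")
    # Drop the "format =" prefix, then strip default values such as ='[]'.
    columns = [c.split("=")[0].strip() for c in header.split("=", 1)[1].split(",")]
    return [dict(zip(columns, (v.strip() for v in row.split(",")))) for row in rows]


sources = parse_skymodel(SKYMODEL)
total_flux = sum(float(src["I"]) for src in sources)
print(len(sources), "components, total Stokes I =", round(total_flux, 2))
```

To run this on the real file, replace `SKYMODEL` with `open("tutorial.skymodel").read()`.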

We thus have a starting sky model in the form of a list of Gaussian components. However, before we can use NDPPP, we must perform one more step: turn this into a "sourcedb" (source database), which is the format NDPPP reads. This is done through the following command in bash:

In [None]:
makesourcedb in=tutorial.skymodel out=DATA/L242400_SB095_uv.dppp.4h-4.5h.MS/sky format='<'

Obnoxiously, KERN's version of makesourcedb doesn't seem to work. Instead, just stick the sky folder inside the measurement set and be done with it - it was made with a working version of makesourcedb. Such are the dark magicks of LOFAR software.

To create the NDPPP parset we wish to use, simply copy and paste the contents of the next cell into a file we will call "ndppp.calibration.parset":

In [None]:
msin  = DATA/L242400_SB095_uv.dppp.4h-4.5h.MS
msout = DATA/L242400_SB095_uv.dppp.4h-4.5h.MS
msout.datacolumn = CORRECTED_DATA
#msin.baseline=^CS*&  # ^ is the Not operator.
msout.overwrite  = true

# define the steps we use:
# gaincal calibrates
steps = [gaincal]

# if you wanted to average your dataset
#steps=[averager]
#averager.freqstep=8
#averager.timestep=10

gaincal.baseline       = [CR]*&&              # use core (CS) and remote (RS) stations only, i.e. exclude international stations
gaincal.caltype        = diagonal
gaincal.applysolution  = true
gaincal.solint         = 8
gaincal.nchan          = 4
gaincal.maxiter        = 800
gaincal.sourcedb       = DATA/L242400_SB095_uv.dppp.4h-4.5h.MS/sky
gaincal.parmdb         =
gaincal.operation      = replace
gaincal.usebeammodel   = true
gaincal.usechannelfreq = false
gaincal.beammode       = default
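
To give some intuition for what a diagonal gain calibration followed by `applysolution = true` achieves, here is a toy sketch of the textbook gain model for a single polarisation: each station p has one complex gain g_p, the observed visibility on baseline (p, q) is g_p conj(g_q) times the true one, and correcting means dividing that factor back out. This is an illustration under those assumptions, not NDPPP's actual implementation; the station names and gain values are arbitrary examples, and the gains are taken as known rather than solved for.

```python
import cmath


def corrupt(model_vis, gains):
    """Apply per-station gains: V_obs[p, q] = g_p * conj(g_q) * V_model[p, q]."""
    return {(p, q): gains[p] * gains[q].conjugate() * v
            for (p, q), v in model_vis.items()}


def correct(observed_vis, gains):
    """Undo the gains: V_corr[p, q] = V_obs[p, q] / (g_p * conj(g_q))."""
    return {(p, q): v / (gains[p] * gains[q].conjugate())
            for (p, q), v in observed_vis.items()}


# Illustrative values only: amplitude and phase (radians) per station.
gains = {
    "CS001": cmath.rect(1.2, 0.3),
    "CS002": cmath.rect(0.8, -1.1),
    "RS106": cmath.rect(1.0, 2.0),
}
model = {
    ("CS001", "CS002"): 10 + 0j,
    ("CS001", "RS106"): 7 - 2j,
    ("CS002", "RS106"): 3 + 4j,
}

observed = corrupt(model, gains)
recovered = correct(observed, gains)
worst = max(abs(recovered[b] - model[b]) for b in model)
print("largest residual after correction:", worst)
```

In the real solver the gains are unknown and are estimated by comparing the data with the sky model, once per block of `solint` time slots and `nchan` channels.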

And we're nearly done - all that's left is to actually run NDPPP! In our case, because we specify msin and msout in the parset, this is done by copying the following command into a bash terminal:

In [None]:
NDPPP ndppp.calibration.parset

Simply wait for NDPPP to do its thing. In the meantime, it pays to look at what NDPPP is telling you - almost 100% of the time, if there's a problem with your calibration, your solver will tell you! Somewhere under the deluge of information it outputs, anyway...


# **IV. Imaging**

Finally, we get to the meat of the matter: imaging using DDFacet. First, should you want to have a standalone install of DDFacet (and its companion calibration software, killMS - here, we used NDPPP to do what killMS would), go to the following two git repos:

* https://github.com/saopicc/DDFacet
* https://github.com/saopicc/killMS

and follow the instructions there. There are usually a host of problems with that sort of installation, as with all radio-astronomy software. Note that you can find the best "documentation" for DDFacet (and similarly for killMS, to a much lesser degree) in their respective parset.cfg file. For DDF, look here:

https://github.com/saopicc/DDFacet/blob/master/DDFacet/Parset/DefaultParset.cfg


Should you want a slightly more convenient install, Martin Hardcastle's DDF-pipeline installs both of the above for you:

* https://github.com/mhardcastle/ddf-pipeline

Let's start by taking a little look at the options of DDFacet. Run the following command:

In [None]:
DDF.py -h

As you can see, there's quite a bit! In fact, the vast majority of these options can be safely ignored. For starters, let's make a small image with our dataset. Since I'm running this on my laptop, I don't want to use too many processors, nor make too large an image (the defaults for NCPU and NPix are 10 and 5000 respectively).

In [None]:
 DDF.py --Data-MS DATA/L242400_SB095_uv.dppp.4h-4.5h.MS --Parallel-NCPU 4 --Image-NPix 500 --Output-Mode Dirty
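
For a sense of scale, the angular size of an image follows directly from the number of pixels and the pixel size. The small sketch below assumes a cell size of 5.0 arcsec, which is the Cell value recorded in the output parset shown at the end of this tutorial; the function name is made up for this example.

```python
def field_of_view_deg(npix, cell_arcsec):
    """Side length of a square image in degrees (pixels times arcsec per pixel)."""
    return npix * cell_arcsec / 3600.0


small = field_of_view_deg(500, 5.0)     # the image made in this tutorial
default = field_of_view_deg(5000, 5.0)  # the default NPix quoted above

print(f"NPix=500  -> {small:.2f} deg a side")    # 0.69 deg
print(f"NPix=5000 -> {default:.2f} deg a side")  # 6.94 deg
```

So our 500-pixel image covers roughly 0.7 degrees a side, a tenth of what the default settings would give.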

If all went well, you should now have a dirty image of our 30min data. As you can see, it isn't much to look at (which is expected - 30min of supersynthesis leaves much to be desired in terms of uv-coverage!). Indeed, we don't seem to have signal in our field. Let's try to shift to 3C295 to see what's going on.

In [1]:
 DDF.py --Data-MS DATA/L242400_SB095_uv.dppp.4h-4.5h.MS --Parallel-NCPU 4 --Image-NPix 500 --Output-Mode Dirty --Image-PhaseCenterRADEC=["14:11:20.23","52:12:04.30"]


Now we're getting somewhere! Clearly we have a bright source in the middle. Time to clean it:

In [None]:
 DDF.py --Data-MS DATA/L242400_SB095_uv.dppp.4h-4.5h.MS --Parallel-NCPU 4 --Image-NPix 500 --Output-Mode Clean --Image-PhaseCenterRADEC=["14:11:20.23","52:12:04.30"]
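
The coordinates passed to --Image-PhaseCenterRADEC above are sexagesimal strings (hours:minutes:seconds for right ascension, degrees:arcminutes:arcseconds for declination). If you need them in decimal degrees, for instance to compare against a catalogue, the conversion can be sketched as follows; the helper names are invented for this example.

```python
def ra_to_deg(ra):
    """Convert 'HH:MM:SS.s' to decimal degrees (1 hour = 15 degrees)."""
    hours, minutes, seconds = (float(x) for x in ra.strip().split(":"))
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dec_to_deg(dec):
    """Convert '[+-]DD:MM:SS.s' to decimal degrees, keeping the sign of -00:..."""
    dec = dec.strip()
    sign = -1.0 if dec.startswith("-") else 1.0
    degrees, arcmin, arcsec = (abs(float(x)) for x in dec.split(":"))
    return sign * (degrees + arcmin / 60.0 + arcsec / 3600.0)


ra_deg = ra_to_deg("14:11:20.23")
dec_deg = dec_to_deg("52:12:04.30")
print(f"3C295 phase centre: RA = {ra_deg:.4f} deg, Dec = {dec_deg:.4f} deg")
```

The sign is handled separately so that declinations between 0 and -1 degree, where the degrees field reads "-00", still come out negative.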

Note that the default of --Output-Mode is actually Clean, so we don't need to put that in our command line, strictly speaking. Let's look at the output again - it should be a nice Gaussian blob! To see other things in the field, we need to subtract this source from our visibilities. To do this, we'll go back to NDPPP. Put the following in a parset called "ndppp.subtract.parset":

In [None]:
msin  = DATA/L242400_SB095_uv.dppp.4h-4.5h.MS
msout = DATA/L242400_SB095_uv.dppp.4h-4.5h.MS
msin.datacolumn  = CORRECTED_DATA
msout.datacolumn = CORRECTED_DATA
#msin.baseline=^CS*&  # ^ is the Not operator.
msout.overwrite  = true

# define the steps we use:
# predict computes model visibilities from the sourcedb; with operation = subtract, they are removed from the data
steps = [predict]

predict.baseline       = [CR]*&&
predict.sourcedb       = DATA/L242400_SB095_uv.dppp.4h-4.5h.MS/sky
predict.operation      = subtract
predict.usebeammodel   = true
predict.beammode       = default
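
Conceptually, subtracting a source in the visibility domain means predicting what the instrument would have measured from that source alone and removing it from the data. The toy sketch below uses point sources, ignores the w-term and the beam, and picks one common sign convention for the phase; all fluxes, positions and uv points are arbitrary illustrative numbers, so treat it as a cartoon of the idea rather than of what NDPPP computes.

```python
import cmath
import math


def point_source_vis(flux, l, m, u, v):
    """Visibility of a point source at direction cosines (l, m); u, v in wavelengths."""
    return flux * cmath.exp(-2j * math.pi * (u * l + v * m))


uv_points = [(100.0, 0.0), (250.0, -40.0), (-30.0, 500.0)]
bright = dict(flux=90.0, l=0.0, m=0.0)      # dominant source at the phase centre
faint = dict(flux=0.5, l=0.002, m=-0.001)   # weaker source elsewhere in the field

# "Observed" data: the sum of both sources on each baseline.
data = [point_source_vis(u=u, v=v, **bright) + point_source_vis(u=u, v=v, **faint)
        for u, v in uv_points]
# Model of the bright source only, i.e. what a predict step would produce.
model = [point_source_vis(u=u, v=v, **bright) for u, v in uv_points]
# Subtraction leaves only the faint source in the visibilities.
residual = [d - mdl for d, mdl in zip(data, model)]

for (u, v), r in zip(uv_points, residual):
    print(f"u={u:7.1f} v={v:7.1f} |residual| = {abs(r):.3f}")
```

After the subtraction the residual amplitudes sit at the faint source's flux, which is why fainter sources become visible once the dominant one is removed.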

Note that you can add "predict" to the steps of your first ndppp parset, specifying the parameters above. Now, we should have subtracted 3C295 from our visibilities - let's check!

In [None]:
 DDF.py --Data-MS DATA/L242400_SB095_uv.dppp.4h-4.5h.MS --Parallel-NCPU 4 --Image-NPix 500 --Output-Mode Dirty --Image-PhaseCenterRADEC=["14:11:20.23","52:12:04.30"] --Output-Name subtracted.image

All seems in order! Let's look at our target field again:

In [None]:
 DDF.py --Data-MS DATA/L242400_SB095_uv.dppp.4h-4.5h.MS --Parallel-NCPU 4 --Image-NPix 500 --Output-Mode Clean

We can now see a few sources in the field - excellent news. To summarise, we have:

- downloaded and opened a raw dataset
- calibrated it with NDPPP
- imaged it and saw we were dominated by a source in the field
- subtracted said source from the field
- achieved an image of our target field.

Let's close by looking at the contents of a DDF parset. Each DDF run creates an associated parset. This is useful for two reasons. Firstly, it means that you know exactly which parameters your last run of a given name used. Secondly, it makes it very easy to duplicate results: you can run DDF on the command line by giving options explicitly, but you can also set all the defaults to be those of an input parset, as follows:

In [None]:
DDF.py whatever.parset --Output-Mode PSF --Image-NPix 2 ...

Let's look at the contents of our latest parset output, image.parset:

In [None]:
[Data]
MS = DATA/L242400_SB095_uv.dppp.4h-4.5h.MS 
ColName = CORRECTED_DATA 
ChunkHours = 0.0 
Sort = False 

[Predict]
ColName = None 
MaskSquare = None 
FromImage = None 
InitDicoModel = None 
Overwrite = True 

[Selection]
Field = 0 
DDID = 0 
TaQL =  
ChanStart = 0 
ChanEnd = -1 
ChanStep = 1 
FlagAnts =  
UVRangeKm = [0, 2000] 
TimeRange =  
DistMaxToCore =  

[Output]
Mode = Clean 
Name = image 
ShiftFacetsFile = None 
RestoringBeam = None 
Also =  
Cubes =  
Images = DdPAMRIikz 
alphathreshold = 7 
alphamaskthreshold = 15 
StokesResidues = I 

[Image]
NPix = 500 
Cell = 5.0 
PhaseCenterRADEC = align 
SidelobeSearchWindow = 200 

[Facets]
NFacets = 3 
CatNodes = None 
DiamMax = 180.0 
DiamMin = 0.0 
PSFOversize = 1.0 
PSFFacets = 0 
Padding = 1.7 
Circumcision = 0 

[Weight]
ColName = WEIGHT_SPECTRUM 
Mode = Briggs 
MFS = True 
Robust = 0.0 
SuperUniform = 1.0 

[RIME]
Precision = S 
PolMode = I 
FFTMachine = FFTW 
ForwardMode = BDA-degrid 
BackwardMode = BDA-grid 
DecorrMode =  
DecorrLocation = Edge 

[CF]
OverS = 11 
Support = 7 
Nw = 100 
wmax = 0.0 

[Comp]
GridDecorr = 0.02 
GridFoV = Facet 
DegridDecorr = 0.02 
DegridFoV = Facet 
Sparsification = 0 
BDAMode = 1 
BDAJones = 0 

[Parallel]
NCPU = 4 
Affinity = 1 
MainProcessAffinity = 0 

[Cache]
Reset = True 
SmoothBeam = auto 
Weight = auto 
PSF = auto 
Dirty = auto 
VisData = auto 
LastResidual = True 
Dir =  
DirWisdomFFTW = ~/.fftw_wisdom 
ResetWisdom = False 
CF = True 
HMP = False 

[Beam]
Model = None 
At = facet 
LOFARBeamMode = AE 
NBand = 0 
CenterNorm = False 
Smooth = False 
SmoothNPix = 11 
FITSFile = beam_$(corr)_$(reim).fits 
FITSFeed = None 
DtBeamMin = 5.0 
FITSParAngleIncDeg = 5.0 
FITSLAxis = -X 
FITSMAxis = Y 
FITSVerbosity = 0 

[Freq]
BandMHz = 0.0 
DegridBandMHz = 0.0 
NBand = 1 
NDegridBand = 0 

[DDESolutions]
DDSols =  
SolsDir = None 
GlobalNorm = None 
JonesNormList = AP 
JonesMode = Full 
DDModeGrid = AP 
DDModeDeGrid = AP 
ScaleAmpGrid = 0 
ScaleAmpDeGrid = 0 
CalibErr = 10.0 
Type = Nearest 
Scale = 1.0 
gamma = 4.0 
RestoreSub = False 
ReWeightSNR = 0.0 

[Deconv]
Mode = HMP 
MaxMajorIter = 20 
MaxMinorIter = 20000 
AllowNegative = True 
Gain = 0.1 
FluxThreshold = 0.0 
CycleFactor = 0.0 
RMSFactor = 0.0 
PeakFactor = 0.15 
PrevPeakFactor = 0.0 
NumRMSSamples = 10000 
ApproximatePSF = 0 
PSFBox = auto 

[Mask]
External = None 
Auto = False 
SigTh = 10 
FluxImageType = ModelConv 

[Noise]
MinStats = [60, 2] 
BrutalHMP = True 

[HMP]
Alpha = [-1.0, 1.0, 11] 
Scales = [0] 
Ratios = [''] 
NTheta = 6 
SolverMode = PI 
AllowResidIncrease = 0.1 
MajorStallThreshold = 0.8 
Taper = 0 
Support = 0 
PeakWeightImage = None 
Kappa = 0.0 
OuterSpaceTh = 2.0 
FractionRandomPeak = None 

[Hogbom]
PolyFitOrder = 3 
MaxLengthScale = 5 
FreqMode = Poly 
NumBasisFuncs = 12 

[Montblanc]
TensorflowServerTarget =  

[SSDClean]
Parallel = True 
IslandDeconvMode = GA 
SSDSolvePars = ['S', 'Alpha'] 
SSDCostFunc = ['Chi2', 'MinFlux'] 
BICFactor = 0.0 
ArtifactRobust = False 
ConvFFTSwitch = 1000 
NEnlargePars = 0 
NEnlargeData = 2 
RestoreMetroSwitch = 0 
MinMaxGroupDistance = [10, 50] 

[GAClean]
NSourceKin = 50 
NMaxGen = 50 
MinSizeInit = 10 
InitType = HMP 
AlphaInitHMP = [-4.0, 1.0, 6] 
ScalesInitHMP = [0, 1, 2, 4, 8, 16, 24, 32] 
GainInitHMP = 0.1 
RatiosInitHMP = [''] 
NThetaInitHMP = 4 
MaxMinorIterInitHMP = 10000 
AllowNegativeInitHMP = False 
RMSFactorInitHMP = 3.0 
ParallelInitHMP = True 
NCPU = 0 

[MORESANE]
NMajorIter = 200 
NMinorIter = 200 
Gain = 0.1 
ForcePositive = True 
SigmaCutLevel = 1 

[MUFFIN]
mu_s = 0.1 
mu_l = 0.2 
nb = ['(8', '0)'] 
NMinorIter = 200 

[Log]
Memory = False 
Boring = False 
Append = False 

[Debug]
PauseWorkers = False 
FacetPhaseShift = [0.0, 0.0] 
PrintMinorCycleRMS = False 
DumpCleanSolutions = 0 
DumpCleanPostageStamps =  
CleanStallThreshold = 0.0 
MemoryGreedy = True 
APPVerbose = 0 
Pdb = auto 

[Misc]
RandomSeed = None 
ParsetVersion = 0.2 
ConserveMemory = False 
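
Because each run leaves a parset behind, comparing two runs reduces to comparing two small text files. The sketch below does this with Python's standard configparser, on the assumption that the parset layout shown above (sections in square brackets, `Option = value` lines) is close enough to INI for that module to read; DDFacet's own parser may differ in details. The function names and the two shortened parsets are invented for the example.

```python
import configparser


def load_parset(text):
    """Read parset text into {section: {option: value}} with values kept as strings."""
    parser = configparser.ConfigParser(interpolation=None,
                                       inline_comment_prefixes=("#",))
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def diff_parsets(a, b):
    """Return {'Section.Option': (value_in_a, value_in_b)} for every difference."""
    changes = {}
    for section in sorted(set(a) | set(b)):
        opts_a, opts_b = a.get(section, {}), b.get(section, {})
        for option in sorted(set(opts_a) | set(opts_b)):
            if opts_a.get(option) != opts_b.get(option):
                changes[f"{section}.{option}"] = (opts_a.get(option), opts_b.get(option))
    return changes


RUN_A = "[Output]\nMode = Dirty\nName = image\n\n[Image]\nNPix = 500\nCell = 5.0\n"
RUN_B = "[Output]\nMode = Clean\nName = image\n\n[Image]\nNPix = 500\nCell = 5.0\n"

changes = diff_parsets(load_parset(RUN_A), load_parset(RUN_B))
print(changes)  # {'Output.Mode': ('Dirty', 'Clean')}
```

`interpolation=None` matters here because values such as `beam_$(corr)_$(reim).fits` should be kept literally.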

As we can see, that is quite a lot of options. Here are the ones you are most likely to care about. Note that, in general, an option written in a parset as
[Category]
Option = True

is given on the command line as --Category-Option True (the = sign is optional).

In [None]:
[Data]
MS = DATA/L242400_SB095_uv.dppp.4h-4.5h.MS # name of your ms
ColName = CORRECTED_DATA                   # name of column to image
ChunkHours = 0.0                           # in case of memory error, set to 4, 2, 1...

[Selection]
FlagAnts =                                 # regular expression or strings for ants to flag
UVRangeKm = [0, 2000]                      # range of baseline lengths to image, in km

[Output]
Mode = Clean                               # can also be Dirty or PSF
Name = image                               # name of image you're making
RestoringBeam = None                       # set to 4 times cell size by default
Cubes =                                    # if you want freq. cubes. See -h
Images = DdPAMRIikz                        # see DDF.py -h | grep Images 

[Image]
NPix = 500                                 # number of pixels a side
Cell = 5.0                                 # pixel size in arcsec
PhaseCenterRADEC = align                   # where to put centre of image. If align, same as MS phase centre

[Weight]
ColName = WEIGHT_SPECTRUM                  # weight column to use
Mode = Briggs                              # weighting scheme; maybe you want to use uniform or something
Robust = 0.0                               # robustness parameter for the Briggs weighting above

[RIME]
DecorrMode =                               # set to FT for wide fields of view. it's RIME magic.

[Parallel]
NCPU = 4                                   # number of CPUs used for parallelisation

[Cache]
Reset = True                               # good practice to set this to True when not changing image name.

[Beam]
Model = None                               # can be LOFAR
LOFARBeamMode = AE                         # can be A or AE

[Freq]
NBand = 1                                  # use this when you want to make cubes: get as many freq. slices as NBand

# after that, it's just clean options.

[Deconv]
Mode = HMP 
MaxMajorIter = 20 
MaxMinorIter = 20000 
AllowNegative = True 
Gain = 0.1 
FluxThreshold = 0.0 
CycleFactor = 0.0 
RMSFactor = 0.0 
PeakFactor = 0.15 
PrevPeakFactor = 0.0 
NumRMSSamples = 10000 
ApproximatePSF = 0 
PSFBox = auto 

[Mask]
External = None 
Auto = False 
SigTh = 10 
FluxImageType = ModelConv 


[HMP]
Alpha = [-1.0, 1.0, 11] 
Scales = [0] 
Ratios = [''] 
NTheta = 6 
SolverMode = PI 
AllowResidIncrease = 0.1 
MajorStallThreshold = 0.8 
Taper = 0 
Support = 0 
PeakWeightImage = None 
Kappa = 0.0 
OuterSpaceTh = 2.0 
FractionRandomPeak = None 
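
The rule relating parset entries to command-line options ([Category] / Option becoming --Category-Option) is mechanical enough to sketch in code. The example below reads a shortened, annotated parset with Python's configparser and turns a chosen subset of entries into flags. It assumes configparser copes with this layout and its `#` comments; `parset_to_flags` is a hypothetical helper, not part of DDFacet.

```python
import configparser

SNIPPET = """\
[Output]
Mode = Clean        # can also be Dirty or PSF
Name = image

[Image]
NPix = 500          # number of pixels a side

[Parallel]
NCPU = 4
"""


def parset_to_flags(text, wanted):
    """Turn selected (section, option) pairs into ['--Section-Option', value, ...]."""
    parser = configparser.ConfigParser(interpolation=None,
                                       inline_comment_prefixes=("#",))
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    flags = []
    for section, option in wanted:
        flags += [f"--{section}-{option}", parser[section][option]]
    return flags


flags = parset_to_flags(SNIPPET, [("Output", "Mode"), ("Image", "NPix"), ("Parallel", "NCPU")])
print(" ".join(["DDF.py"] + flags))
# DDF.py --Output-Mode Clean --Image-NPix 500 --Parallel-NCPU 4
```

The resulting list can be handed to `subprocess.run(["DDF.py", ...] + flags)` if you want to drive DDFacet from a script, although passing the parset file itself to DDF.py, as shown earlier, is usually simpler.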