## Shane Bechtel

# Process To Split UV Files Into Smaller Time or Frequency Ranges

This process shows how .uv files, which generally span about 10 minutes, can be split into smaller time ranges (about 10 seconds at the smallest). Shorter time ranges make it possible to analyze more specific times within an observation, for example to view sources at more positions in HERA's beam. Other components of the data can be split in a similar way, such as separating the frequency axis into smaller channel ranges. The process relies largely on the UVData class from the pyuvdata package.

In [19]:
from pyuvdata import UVData #Allows one to work with Miriad Files and the data within
import numpy as np #Provides some useful tools to work with numbers and arrays
from copy import deepcopy #Will help with object manipulation later


First, a UVData object is created. Calling this object 'uv', an existing .uv file is read into the object. This allows for the manipulation of the data from the file without affecting the original data.

In [24]:
uv = UVData()

uv.read_miriad('/data6/HERA/data/2458042/zen.2458042.12552.xx.HH.uv') #This is an example of a .uv file that can be split
                                                                      #Into smaller Time Ranges


UVData objects have many useful parameters, but the ones of primary interest here are the time, data, baseline, and frequency arrays. Splitting the .uv file by time provides more data points for projects such as analyzing the beam. Splitting by frequency allows analysis of specific frequency ranges, which helps in determining the frequency dependence of the beam. Printing the shape of each array shows which axes the select method must act on consistently: the time, baseline, and data arrays share the same first axis, and the data and frequency arrays share the frequency axis.

In [14]:
print(uv.time_array.shape) #Number of baseline-times (one entry per baseline per time)
print(uv.data_array.shape) #[Number of baseline-times, Number of Spectral Windows, Number of Frequencies, Number of Polarizations]
print(uv.baseline_array.shape) #Number of baseline-times
print(uv.freq_array.shape) #[Number of Spectral Windows, Number of Frequencies]


(67680,)
(67680, 1, 1024, 1)
(67680,)
(1, 1024)


In [26]:
print(uv.time_array) #We can compare this to the end results later in order to show that the full range is covered

[ 2458042.12551992  2458042.12551992  2458042.12551992 ...,
  2458042.13285218  2458042.13285218  2458042.13285218]


## Splitting a UV file into Time Ranges

In this example, we show how one would go about splitting the file shown above into smaller time ranges. 

In [27]:
num_ranges = 2 #Define the number of ranges we want (must be an integer that evenly divides the number of unique times)

time = np.split(np.unique(uv.time_array),num_ranges) #Split the unique time values from the time array equally into
                                                     #num_ranges number of arrays contained within time

time = np.array(time) #Define time to be an array for use later on
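
Note that np.split only succeeds when the number of unique times divides evenly by num_ranges; otherwise it raises a ValueError. The sketch below illustrates this with a stand-in array of 60 made-up time stamps (not read from a file) and shows np.array_split as one possible alternative that tolerates uneven division. It is an illustration only and is not used elsewhere in this notebook.

```python
import numpy as np

# Stand-in for np.unique(uv.time_array): 60 made-up, evenly spaced time stamps.
unique_times = np.linspace(0.0, 59.0, 60)

# 60 is divisible by 2, so np.split returns two arrays of 30 values each.
halves = np.split(unique_times, 2)

# 60 is not divisible by 7, so np.split raises ValueError.
try:
    np.split(unique_times, 7)
    split_failed = False
except ValueError:
    split_failed = True

# np.array_split accepts uneven division; the leading chunks get one extra value.
uneven = np.array_split(unique_times, 7)
uneven_sizes = [len(chunk) for chunk in uneven]

print(len(halves[0]), split_failed, uneven_sizes)
```

With np.array_split the chunks are no longer all the same length, which may or may not be acceptable depending on the analysis.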


In [28]:
uv_holder = [] #Create an array to store the new UVData objects within

for i in np.arange(0,num_ranges): #Loop over all the new time ranges
    
    v = deepcopy(uv) #Create a local variable v to be set to a deepcopy of uv. Deepcopy prevents the uv variable from changing
    
    v.select(times = time[i]) #Use the select method to manually set the time range equal to one stored in the time array
    
    uv_holder.append(v) #Append the new UVData object to uv_holder
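
Conceptually, selecting on time keeps only the rows whose entry in the time array matches one of the requested times, and applies the same row selection to every array that shares that first axis. The toy example below mimics this with plain NumPy arrays. The values and the helper keep_times are hypothetical, and this is a sketch of the idea rather than pyuvdata's implementation.

```python
import numpy as np

# Toy stand-ins (hypothetical values): 2 integrations x 3 baselines = 6 baseline-times.
time_array = np.array([10.0, 10.0, 10.0, 20.0, 20.0, 20.0])
baseline_array = np.array([1, 2, 3, 1, 2, 3])
# Axes mirror the notebook's data_array: (baseline-times, spectral windows, frequencies, polarizations).
data_array = np.arange(6 * 1 * 4 * 1).reshape(6, 1, 4, 1)

def keep_times(times_to_keep):
    """Return the rows of each per-baseline-time array whose time is in times_to_keep."""
    mask = np.isin(time_array, times_to_keep)
    return time_array[mask], baseline_array[mask], data_array[mask]

t_sel, bl_sel, d_sel = keep_times([20.0])
print(t_sel.shape, bl_sel.shape, d_sel.shape)
```

Only the first axis shrinks (from 6 rows to 3 here); the spectral window, frequency, and polarization axes are untouched, which is the same pattern seen in the shapes printed for the real file.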
    

In [31]:
print(uv_holder[0].time_array.shape) #The size is now exactly half of its original
print(np.unique(uv_holder[0].time_array)) #The array contains the original starting value and connects to the second UVData object

print(uv_holder[1].time_array.shape)
print(np.unique(uv_holder[1].time_array)) #The array contains the original end value and connects to the first UVData object

(33840,)
[ 2458042.12551992  2458042.12564419  2458042.12576847  2458042.12589274
  2458042.12601702  2458042.1261413   2458042.12626557  2458042.12638985
  2458042.12651412  2458042.1266384   2458042.12676267  2458042.12688695
  2458042.12701123  2458042.1271355   2458042.12725978  2458042.12738405
  2458042.12750833  2458042.1276326   2458042.12775688  2458042.12788115
  2458042.12800543  2458042.12812971  2458042.12825398  2458042.12837826
  2458042.12850253  2458042.12862681  2458042.12875108  2458042.12887536
  2458042.12899964  2458042.12912391]
(33840,)
[ 2458042.12924819  2458042.12937246  2458042.12949674  2458042.12962101
  2458042.12974529  2458042.12986957  2458042.12999384  2458042.13011812
  2458042.13024239  2458042.13036667  2458042.13049094  2458042.13061522
  2458042.1307395   2458042.13086377  2458042.13098805  2458042.13111232
  2458042.1312366   2458042.13136087  2458042.13148515  2458042.13160942
  2458042.1317337   2458042.13185798  2458042.13198225  2458042.1321
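
As a quick consistency check, the numbers printed above can be combined to recover the number of rows per integration and the spacing between integrations. The sketch below only uses values copied from the outputs in this notebook; the variable names are chosen for illustration.

```python
# Values copied from the outputs above.
n_blts_full = 67680             # length of uv.time_array before splitting
n_blts_half = 33840             # length of each half after splitting
n_times_half = 30               # unique times printed for the first half
first_time = 2458042.12551992   # Julian date of the first integration
second_time = 2458042.12564419  # Julian date of the second integration

# Rows per integration: baseline-times divided by unique times.
rows_per_time = n_blts_half // n_times_half

# Number of integrations in the full file.
n_times_full = n_blts_full // rows_per_time

# Spacing between consecutive integrations, converted from days to seconds.
cadence_seconds = (second_time - first_time) * 86400.0

# Approximate duration covered by the full file, in seconds.
file_seconds = n_times_full * cadence_seconds

print(rows_per_time, n_times_full)
print(round(cadence_seconds, 1), round(file_seconds / 60.0, 1))
```

The result is 1128 rows per integration and 60 integrations in the full file, spaced roughly 10.7 seconds apart, for a total of a little under 11 minutes. This agrees with the approximate 10 minute and 10 second figures quoted at the top of the notebook.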

There are now two UVData objects: the first holds the first half of the time range in the 12552 file, and the second holds the second half. This approach works for most files, but the last observation of a night may raise an exception because it does not always contain exactly 60 unique time values like the others. A divisor of 60 will therefore not always work with this simplified method. To handle this, the script split_time_uv.py was written to accommodate files of any length. It is shown below.

In [None]:
from pyuvdata import UVData
import numpy as np
from copy import deepcopy
import sys
import os

# This script splits each given uv file into smaller time ranges based on the given arguments

def new_uvs(uv,time,polarization,folder):
    v = deepcopy(uv) #Copy so that the original object is left untouched
    v.select(times=time)
    v.phase_to_time(np.median(v.time_array))
    #idx = os.path.basename(folder).find(polarization)
    name = os.path.basename(folder)[:12] + str(time[0]).split('.')[1][:5]  + os.path.basename(folder)[17:]
    vis_file = os.path.join(os.path.dirname(folder),name)+'.uvfits'
    print('Writing: ' + vis_file)
    v.write_uvfits(vis_file,spoof_nonessential=True)
    del v

def split_time_uv(folder,n=1,path=None,polarization='xx'):

    uv = UVData()
    print('Reading: ' + folder)
    uv.read_miriad(folder)
    times = np.unique(uv.time_array)

    if path is not None:
        folder = os.path.join(path,os.path.basename(folder))

    n_times = int(60/n) #Unique times per segment, based on a full file of 60 unique times
    mod = len(times)%n_times #Times left over after the full-size segments

    if mod != 0:
        ts = times[-mod:] #Trailing times that do not fill a whole segment
        times = times[:-mod]
        new_uvs(uv,ts,polarization,folder)

    if len(times) > 0: #Skip if the file was shorter than one full segment
        for time in np.split(times,len(times)//n_times):
            new_uvs(uv,time,polarization,folder)

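if __name__ == '__main__':
    n = input('How many segments would you like to split this file into? (Default is 1) ')
    n = int(n) if str(n).strip() else 1 #Fall back to the default when nothing is entered
    folders = sys.argv[1:]
    if not folders: #Slicing sys.argv never raises IndexError, so check for an empty list instead
        print('No file specified for conversion from miriad to uvfits')
    for folder in folders:
        split_time_uv(folder,n)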


This code can take a .uv file of any length and split it into time ranges of equal size. For a file with fewer than 60 unique time values, each full split contains as many time values as a split of a 60-value file would, and one additional split contains whatever is left over.
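
The splitting rule described above can be sketched without any file access. The helper plan_chunks below is hypothetical (it is not part of split_time_uv.py) and only computes how many unique times each output file would contain, assuming a full file holds 60 unique times.

```python
def plan_chunks(n_unique, n_segments, nominal=60):
    """Return the number of unique times in each output chunk.

    nominal is the number of unique times in a full file (60 in this notebook).
    n_segments is assumed to be no larger than nominal.
    """
    chunk = nominal // n_segments             # unique times per full-size chunk
    full, leftover = divmod(n_unique, chunk)  # full chunks and remaining times
    sizes = [chunk] * full
    if leftover:
        sizes.append(leftover)                # shorter chunk holding the trailing times
    return sizes

print(plan_chunks(60, 2))  # full file, two segments
print(plan_chunks(56, 2))  # shorter file, same chunk size plus a remainder
```

A full file split in two gives two chunks of 30, while a file with 56 unique times gives one chunk of 30 and one of 26.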

## Splitting a UV File into Frequency Ranges

The next example shows how to split .uv files into frequency ranges of a specific width. This differs from the time example, where we chose the number of ranges rather than the size of each one. We first present the Python script written for this task, then take a closer look at what each part does.

In [None]:
from pyuvdata import UVData
import numpy as np
from copy import deepcopy
import sys
import os

# This script splits each given uv file into smaller frequency ranges based on the given arguments

def new_uvs(uv,freq,polarization,folder,count):
    v = deepcopy(uv) #Copy so that the original object is left untouched
    v.select(frequencies=freq)
    v.phase_to_time(np.median(v.time_array))
    #idx = os.path.basename(folder).find(polarization)
    name = os.path.basename(folder)[:18] + 'freq_' + str(count)  + os.path.basename(folder)[17:]
    vis_file = os.path.join(os.path.dirname(folder),name)+'.uvfits'
    print('Writing: ' + vis_file)
    v.write_uvfits(vis_file,spoof_nonessential=True)
    del v


def freq_select_uv(folder,n=20,path=None,polarization='xx'):
    """

    Splits a single uv file into frequency ranges and writes each range to a uvfits file

    Parameters
    ----------
    folder : str
        File path of the uv file to be split
    n : int
        Width of each frequency range in MHz. Default is 20.
    path : str
        Directory where the new uvfits files will be written.
        Default is the directory containing the input file.
    polarization : str
        Polarization label of the file. Default is 'xx'.

    """

    uv = UVData()
    print('Reading: ' + folder)
    uv.read_miriad(folder)
    freqs = uv.freq_array[0]

    if path is not None:
        folder = os.path.join(path,os.path.basename(folder))

    rang = n*(1e6)
    last = uv.freq_array[0][-1]
    end_freqs = []
    end = last - 1e8
    mod = end%rang
    n_times = int(end/rang)

    if mod != 0:
        count = 0
        for i in uv.freq_array[0]:
            if (i>=last-mod):
                first = i
                break
            count += 1
        end_freqs = uv.freq_array[0][count:]
        freqs = uv.freq_array[0][:count] #Only trim freqs when there is a partial final range

    count = 0
    l = 1e8
    m = 1e8 + rang
    freq_hold = []

    for i in np.arange(0,n_times):
        freq_hold.append([])

        for j in freqs:
            if (j>= l + i*rang) & (j<= m + i*rang):
                freq_hold[i].append(j)
            elif (j > m + i*rang):
                break
        freq_hold[i] = np.array(freq_hold[i])

    freqs = np.array(freq_hold)

    count = 1

    for freq in freqs:
        new_uvs(uv,freq,polarization,folder,count)
        count += 1 #Increment inside the loop so each output file gets its own index

    if (len(end_freqs) != 0):
        new_uvs(uv,end_freqs,polarization,folder,count)



if __name__ == '__main__':
    n = input('What range would you like your frequencies to be divided into? (MHz, default is 20):  ')
    n = int(n) if str(n).strip() else 20 #Fall back to the default when nothing is entered
    path = '/data6/HERA/data/2458042/HERA_imaging/LST_analysis'
    folders = sys.argv[1:]
    if not folders: #Slicing sys.argv never raises IndexError, so check for an empty list instead
        print('No file specified for conversion from miriad to uvfits')
    for folder in folders:
        freq_select_uv(folder,n,path = path)



This code follows much the same structure as the time example, though a few extra steps are necessary.

In [27]:
uv = UVData() #Create the UVData object "uv"

uv.read_miriad('/data6/HERA/data/2458042/zen.2458042.12552.xx.HH.uv') #Read in a uv file


Several arrays and variables are created first to keep the later steps clean and simple. Because the first channel sits at 100 MHz rather than 0, an offset is needed to separate the frequencies into ranges of a fixed width.

In [28]:
freqs = uv.freq_array[0] #Store the frequency array into a local array "freqs"

rang = 20*(1e6) #Determine the range of each frequency set (it is set to 20 MHz in this example)

last = uv.freq_array[0][-1] #Find the last value of the original frequency array

end_freqs = [] #Instantiate an array for the final frequency range if necessary

offset = 1e8 #This is to account for the fact the first frequency is 100 MHz

true_end = last - offset #The relative end value is the difference between the original value and the offset

mod = true_end%rang #Find the modulus between the true end value and the range set

n_times = int(true_end/rang) #Find how many times the range divides the frequency set evenly
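
To make these quantities concrete, the sketch below repeats the same arithmetic without reading a file. It assumes 1024 channels starting at 100 MHz with a spacing of 100 MHz / 1024, which is inferred from the shapes and frequencies printed in this notebook rather than stated anywhere explicitly.

```python
# Assumed channelization, inferred from the printed frequency values.
n_chan = 1024
offset = 1e8                                # first channel at 100 MHz
chan_width = 1e8 / n_chan                   # channel spacing in Hz
last = offset + (n_chan - 1) * chan_width   # frequency of the final channel

rang = 20 * (1e6)                           # 20 MHz per range, as in the cell above
true_end = last - offset
mod = true_end % rang                       # width left over after the full ranges
n_times = int(true_end / rang)              # number of full 20 MHz ranges

chans_per_range = rang / chan_width         # channels that fit in one range

print(chan_width, last, mod, n_times, chans_per_range)
```

Under these assumptions there are 4 full ranges plus a remainder of about 19.9 MHz, and a 20 MHz range spans 204.8 channel widths, which is why the individual ranges end up with either 205 or 204 channels.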

Because the final frequency is not exactly 200 MHz, the last range will generally be narrower than the others. When the range width does not divide the band evenly, a for loop finds the first frequency that belongs to the final range, along with its index. The frequency array is then separated into the frequencies that belong to full ranges and those that belong to the final, incomplete range.

In [29]:
if mod != 0: #If the mod is not zero, then the final frequency range will not be the same size as the first ones
    
    count = 0 #Set a count variable
    
    for i in uv.freq_array[0]: #Loop through each of the frequency values in the original array
        
        if (i>=last-mod): #The first value of the last frequency range will be equivalent to the last - mod value 
            
            first = i #Store the value of this first frequency
            
            break #End the for loop
        
        count += 1 #Increment the count each loop. When the loop is broken, this will be the index value of the first value
    
    end_freqs = uv.freq_array[0][count:] #Store the frequencies from the first value onward in the array end_freqs

    freqs = uv.freq_array[0][:count] #Store the frequencies up until the first value in the array freqs
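
The same index can also be found without an explicit loop. The sketch below uses np.searchsorted on a stand-in frequency axis (assumed to be 1024 channels from 100 MHz, spaced by 100 MHz / 1024, and sorted in ascending order). It is an alternative to the loop above, not the method used in the script.

```python
import numpy as np

# Stand-in for uv.freq_array[0] under the assumed channelization.
freq_axis = 1e8 + np.arange(1024) * (1e8 / 1024)

rang = 20e6
last = freq_axis[-1]
mod = (last - 1e8) % rang

# Index of the first channel at or above last - mod (requires an ascending axis).
count = int(np.searchsorted(freq_axis, last - mod, side='left'))

freqs = freq_axis[:count]       # channels belonging to the full ranges
end_freqs = freq_axis[count:]   # channels of the final, partial range

print(count, len(freqs), len(end_freqs))
```

For the assumed axis this gives an index of 820, leaving 204 channels in the final range, which matches the size of the last object printed at the end of this notebook.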

Now we create a list that holds one array per full frequency range. The outer loop runs once for each full range, as counted by n_times.

In [30]:
count = 0 #Reinitialize count
l = 1e8 #Set a value for the start boundary of the first range
m = 1e8 + rang #Set a value for the end boundary of the first range
freq_hold = [] #Instantiate an array to hold the different frequency ranges

for i in np.arange(0,n_times): #Loop through for each full frequency range as determined by n_times
    
    freq_hold.append([]) #Add an empty array to freq_hold in order to store the next range
        
    for j in freqs: #Loop through each frequency value
        
        if (j>= l + i*rang) & (j<= m + i*rang): #If the frequency value falls within this range...
            
            freq_hold[i].append(j) #Then append it to the ith freq_hold array
            
        elif (j > m + i*rang): #In order to save computation time, once we reach a frequency higher than the upper bound,
                               #Then break the for loop
            break
                
    freq_hold[i] = np.array(freq_hold[i]) #Convert the ith list of freq_hold into an array
    
freqs = np.array(freq_hold) #Convert the list freq_hold into the array freqs
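
Because the comparison above is inclusive at both ends, a channel that lies exactly on a range boundary is placed in two neighbouring ranges. The sketch below shows this on a stand-in axis (assumed: 1024 channels from 100 MHz, spaced by 100 MHz / 1024) with a 25 MHz width, for which 125 MHz coincides with a channel. The helper bin_channels and its inclusive_upper flag are hypothetical and only serve to compare the two conventions.

```python
import numpy as np

# Stand-in for uv.freq_array[0] under the assumed channelization.
freq_axis = 1e8 + np.arange(1024) * (1e8 / 1024)

def bin_channels(freq_axis, rang, n_ranges, inclusive_upper):
    """Group channels into n_ranges consecutive ranges of width rang starting at 100 MHz."""
    bins = []
    for i in range(n_ranges):
        lo = 1e8 + i * rang
        hi = lo + rang
        upper_ok = (freq_axis <= hi) if inclusive_upper else (freq_axis < hi)
        bins.append(freq_axis[(freq_axis >= lo) & upper_ok])
    return bins

# 25 MHz ranges: 125 MHz is exactly channel 256, so it sits on a boundary.
closed = bin_channels(freq_axis, 25e6, 2, inclusive_upper=True)
half_open = bin_channels(freq_axis, 25e6, 2, inclusive_upper=False)

print(len(closed[0]), len(closed[1]))
print(len(half_open[0]), len(half_open[1]))
```

With inclusive bounds the 125 MHz channel appears in both ranges, while a half-open upper bound assigns it to the second range only. For the 20 MHz width used in this notebook no boundary coincides with a channel, so the two conventions give the same result.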

Finally, with all the necessary frequency arrays constructed, we loop through the separate ranges in freqs and, for each one, store a UVData object restricted to that range in a holder list, uv_holder. We then check whether end_freqs is non-empty; if it is, we create a UVData object with the frequencies in end_freqs and append it to uv_holder.

In [None]:

uv_holder = [] #Instantiate an array to contain our uv files

for freq in freqs: #loop through each frequency range
        
        v = deepcopy(uv) #Create a deepcopy of uv in v
        
        v.select(frequencies = freq) #Use the method select in order to set the frequency range to those found in freqs
        
        uv_holder.append(v) #Append the new uv file to uv_holder
    
if (len(end_freqs) != 0): #If end_freqs is non-empty, then...
    
    v = deepcopy(uv) #Create a deepcopy of uv in v
        
    v.select(frequencies = end_freqs) #Use the method select in order to set the frequency range to that of end_freqs
    
    uv_holder.append(v) #Append the new uv file to uv_holder
    

uv_holder = np.array(uv_holder) #Convert the uv_holder list into an array
    

We now have a holder array filled with UVData objects covering separate frequency ranges. The ones produced from the freqs array all have the same, or nearly the same, number of channels, while the final range may have a different number. No frequency is excluded from every object. Frequencies are shared between neighbouring objects only when a range boundary coincides exactly with a channel frequency, because the comparison in the binning loop is inclusive at both ends; this does not happen for the 20 MHz width used here.
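
The claim that no frequency is missing or duplicated can be checked directly. The sketch below does so on a stand-in axis (assumed: 1024 channels from 100 MHz, spaced by 100 MHz / 1024), cut at the channel counts reported in this notebook. The helper check_partition is hypothetical.

```python
import numpy as np

def check_partition(original, chunks):
    """Return (missing, duplicated) frequencies when chunks are compared with original."""
    combined = np.concatenate(chunks)
    values, counts = np.unique(combined, return_counts=True)
    missing = np.setdiff1d(original, values)   # present in original, absent from every chunk
    duplicated = values[counts > 1]            # present in more than one chunk
    return missing, duplicated

# Stand-in axis and chunks: four ranges of 205 channels and a final one of 204.
freq_axis = 1e8 + np.arange(1024) * (1e8 / 1024)
chunks = np.split(freq_axis, [205, 410, 615, 820])

missing, duplicated = check_partition(freq_axis, chunks)
print(len(missing), len(duplicated), [len(c) for c in chunks])
```

With the real objects, the chunks could be collected as `[v.freq_array[0] for v in uv_holder]` and compared against `uv.freq_array[0]` in the same way.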

In [40]:
print(uv_holder[0].freq_array.shape) #The first 4 uv files have 205 frequencies
print(uv_holder[4].freq_array.shape) #The final uv file has 204 frequencies

print(uv_holder[0].freq_array[0]) #Each range of frequencies is contained within 20 MHz and
print(uv_holder[1].freq_array[0]) #Connects directly with the start of the next one.
print(uv_holder[2].freq_array[0]) #No frequency is left out, and each range is a separate uv file
print(uv_holder[3].freq_array[0])
print(uv_holder[4].freq_array[0])

(1, 205)
(1, 204)
[  1.00000000e+08   1.00097656e+08   1.00195312e+08   1.00292969e+08
   1.00390625e+08   1.00488281e+08   1.00585938e+08   1.00683594e+08
   1.00781250e+08   1.00878906e+08   1.00976562e+08   1.01074219e+08
   1.01171875e+08   1.01269531e+08   1.01367188e+08   1.01464844e+08
   1.01562500e+08   1.01660156e+08   1.01757812e+08   1.01855469e+08
   1.01953125e+08   1.02050781e+08   1.02148438e+08   1.02246094e+08
   1.02343750e+08   1.02441406e+08   1.02539062e+08   1.02636719e+08
   1.02734375e+08   1.02832031e+08   1.02929688e+08   1.03027344e+08
   1.03125000e+08   1.03222656e+08   1.03320312e+08   1.03417969e+08
   1.03515625e+08   1.03613281e+08   1.03710938e+08   1.03808594e+08
   1.03906250e+08   1.04003906e+08   1.04101562e+08   1.04199219e+08
   1.04296875e+08   1.04394531e+08   1.04492188e+08   1.04589844e+08
   1.04687500e+08   1.04785156e+08   1.04882812e+08   1.04980469e+08
   1.05078125e+08   1.05175781e+08   1.05273438e+08   1.05371094e+08
   1.05468750e+0