## Goal:
Create a function that reorganizes a block-formatted text file of Gauss coefficients into a new text file that is easy to plot with Python.

## Test:
* Time increments are equally spaced
* Same number of coefficients across all time
* Data information not included
* The correct number of l and m coefficients at each interval
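
The first two checks can be expressed as small stand-alone helpers. The sketch below is only illustrative: the names is_equally_spaced and same_row_length are hypothetical and not part of this notebook, and it works on plain Python lists rather than NumPy arrays.

```python
def is_equally_spaced(years, tol=1e-9):
    """Return True if consecutive years differ by the same step (within tol)."""
    steps = [later - earlier for earlier, later in zip(years, years[1:])]
    return all(abs(step - steps[0]) <= tol for step in steps)


def same_row_length(rows):
    """Return True if every row holds the same number of values."""
    return len({len(row) for row in rows}) <= 1


print(is_equally_spaced([1900.0, 1910.0, 1920.0]))            # True
print(same_row_length([[1900.0, 1.0, 2.0], [1910.0, 3.0]]))   # False
```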


## Text file format:
Block format: each block starts with the year, followed by the coefficients in the correct order. <br>
The location of each coefficient has to be the same in every block. The size of each line does not matter.

#### Example: ggf100k
$ year \quad g_{1}^{0} ... h_{4}^{4} $ &emsp; 25 elements <br>
$ g_{5}^{0} ... h_{6}^{6}   $ &emsp; 24 elements <br>
$ g_{7}^{0} ... h_{8}^{4}   $ &emsp; 24 elements <br>
$ g_{8}^{5} ... g_{9}^{8}   $ &emsp; 24 elements <br>
$ h_{9}^{8} ... h_{10}^{10} $ &emsp; 24 elements

$ year+1 \quad g_{1}^{0} ... h_{4}^{4} $ &emsp; 25 elements <br>
$ g_{5}^{0} ... h_{6}^{6}   $ &emsp; 24 elements <br>
$ g_{7}^{0} ... h_{8}^{4}   $ &emsp; 24 elements <br>
$ g_{8}^{5} ... g_{9}^{8}   $ &emsp; 24 elements <br>
$ h_{9}^{8} ... h_{10}^{10} $ &emsp; 24 elements
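
As a cross-check of this layout, the following sketch lists the 120 coefficient labels of a degree-10 model and splits them, together with the year, into lines of 25, 24, 24, 24 and 24 elements. It assumes the ordering $g_l^0, g_l^1, h_l^1, ..., g_l^l, h_l^l$ within each degree, which matches the 2-D array example later in this notebook. The helper name gauss_labels is hypothetical.

```python
def gauss_labels(max_degree):
    """List coefficient labels in the assumed order g_l^0, g_l^1, h_l^1, ..."""
    labels = []
    for l in range(1, max_degree + 1):
        labels.append(f"g_{l}^0")          # m = 0 has no h term
        for m in range(1, l + 1):
            labels.append(f"g_{l}^{m}")
            labels.append(f"h_{l}^{m}")
    return labels


elements = ["year"] + gauss_labels(10)     # one full year block
line_sizes = [25, 24, 24, 24, 24]          # elements per line in the block

# Cut the block into lines of the given sizes
block = []
start = 0
for size in line_sizes:
    block.append(elements[start:start + size])
    start += size

# Show the first and last element of every line
for line in block:
    print(line[0], "...", line[-1], f"({len(line)} elements)")
```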

### get_lines:
with open(): context manager, closes the file when the indented block is exited <br>
readlines: returns the lines of the file as individual elements of a list <br>

In [50]:
def get_lines(textdoc):
    """Reads in a text document.
    Input : textdoc is the path to the text document, as a string
    Output: lines is a list; each element in the list is a line
            in the text document
    """
    assert isinstance(textdoc, str), 'textdoc is not a string'
    
    with open(textdoc, mode='r') as f:
        lines = f.readlines()
        
    return lines

### get_eachline: Gets the number of elements per line in a year block and the total number of years
np.fromstring: converts a line (a string of numbers) to a float array <br>
for loop: gets the number of elements in each line of a year block <br>
Calculates the total number of lines in the text document and the number of year blocks
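
The same bookkeeping can be illustrated without a file. The sketch below uses str.split in place of np.fromstring (a deliberate substitution so the example needs no imports) on a made-up two-block document. The function name count_block_layout and the sample numbers are hypothetical.

```python
def count_block_layout(lines, blocklinelen):
    """Return (elements per line of the first block, number of year blocks)."""
    # Count whitespace-separated values on each line of the first block
    per_line = [len(line.split()) for line in lines[:blocklinelen]]
    # Every year block must contribute exactly blocklinelen lines
    total_years, leftover = divmod(len(lines), blocklinelen)
    if leftover:
        raise ValueError("line count is not a multiple of blocklinelen")
    return per_line, total_years


# Two year blocks of two lines each: a year plus 8 coefficients (degree 2)
sample = [
    "1900 1.0 2.0 3.0\n",
    "4.0 5.0 6.0 7.0 8.0\n",
    "1910 1.5 2.5 3.5\n",
    "4.5 5.5 6.5 7.5 8.5\n",
]
per_line, total_years = count_block_layout(sample, 2)
print(per_line, total_years)  # [4, 5] 2
```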

In [51]:
def get_eachline(lines, blocklinelen):
    """Gets the number of elements in each line of a year block and
    calculates the total number of years.
    Input   : lines is a list; each element is a line in the text document
            : blocklinelen is the number of lines in a year block in the text document
    Output  : linecount is the total number of lines in the text document
            : eachLine is a 1-D array of the number of elements in each line of a year block
    """
    
    assert type(lines)        == list, 'lines is not a list'
    assert type(blocklinelen) == int , 'block line length is not an integer'
    
    eachLine = np.zeros(blocklinelen)
    for index in range(blocklinelen):
        # Parse one line of the first year block and count its elements
        blockLine       = np.fromstring(lines[index], dtype=float, sep=' ')
        eachLine[index] = len(blockLine)
        
    linecount  = len(lines)
    totalYears = linecount/blocklinelen
    
    assert totalYears.is_integer(), 'The number of lines is not a multiple of the block line length'
    
    print('The total number of lines in the text document:', linecount)
    print('The length of each line is:', eachLine)
    print('The total year blocks:', totalYears)
    
    return linecount, eachLine

### get_years: Extracts the years
Similar to the cell above <br>
for loop: collects the year from lines 0, blocklinelen, 2*blocklinelen, ... <br>

# Possibly change the year order assert

In [52]:
def get_years(lines, blocklinelen):
    """Extracts the year from the first line of each year block (every blocklinelen-th line)
    Input   : lines is a list; each element is a line in the text document
            : blocklinelen is the number of lines in a year block in the text document
    Output  : year is a 1-D array of the years
    """
    assert type(lines)        == list, 'lines is not a list'
    assert type(blocklinelen) == int , 'block line length is not an integer'
    
    linecount  = len(lines)
    totalYears = linecount // blocklinelen
    
    year = np.zeros(totalYears)
    j=0
    
    for index, line in enumerate(lines):
        # The first line of every year block starts with the year
        if (index % blocklinelen) == 0:
            yearList = np.fromstring(line, dtype=float, sep=' ')
            year[j] = yearList[0]
            j += 1
            
            if j>1:
                assert year[j-1] >= year[j-2], 'Years are not in chronological order'
        
    if len(np.unique(np.diff(year, n=1))) > 1: 
        print('THE YEARS ARE NOT EQUALLY SPACED!!!')
   
    assert len(np.unique(year)) == totalYears , 'There are duplicate years'
    
    return year

### get_degree: Calculates the degree and order
* Gauss coefficients per degree $l$: $2l+1$, so the total per year up to degree $L$ is $\sum_{l=1}^{L}(2l+1) = L(L+2)$
* While loop: subtracts $2l+1$ from the total number of coefficients for increasing $l$ until zero is reached
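
Summing $2l+1$ over the degrees $1, ..., L$ gives $L(L+2) = (L+1)^2 - 1$ coefficients per year, so the degree can also be recovered directly from the coefficient count. The following is a minimal sketch of that alternative to the while loop; degree_from_count is a hypothetical name, not a function in this notebook.

```python
import math


def degree_from_count(total_gc):
    """Return the degree L for which L*(L+2) equals total_gc."""
    # (L+1)^2 = total_gc + 1, so take the integer square root
    degree = math.isqrt(total_gc + 1) - 1
    if degree < 1 or degree * (degree + 2) != total_gc:
        raise ValueError("coefficient count does not match any full degree")
    return degree


print(degree_from_count(35))   # 5
print(degree_from_count(120))  # 10
```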

In [53]:
def get_degree(eachLine):
    """ Calculates the degree and order from the total number of elements in a year block
    Input   : eachLine is a 1-D array of the number of elements in each line of a year block
    Output  : l is the Gauss coefficient degree
            : totalGC is the total number of coefficients per year
    """
    
    assert isinstance(eachLine, np.ndarray), 'eachLine is not an array'
    
    # Subtract 1 because the first element of a year block is the year
    totalGC = int(sum(eachLine)-1)
    countGC = totalGC
    
    l = 1
    while countGC > 0:
        countGC = countGC - (2*l+1)
        l+=1
    l = l-1
    
    assert countGC == 0, 'The number of coefficients is not a sum of 2l+1 terms'
    
    print('Total coefficients per year:', totalGC)
    print('The degree and order is', l)
    
    return l, totalGC

### Put together the matrix
for loop: similar to the loops above, but creates a 2-D array in which every row contains one year of coefficients

#### Example: new 2-D array of ggf100k
year     $ \quad g_{1}^{0} ... g_{4}^{3}  h_{4}^{3} g_{4}^{4} h_{4}^{4} ...  h_{10}^{10} \quad $ 121 elements  <br>
year+1  $ g_{1}^{0} ... g_{4}^{3} h_{4}^{3} g_{4}^{4} h_{4}^{4} ...  h_{10}^{10} \quad $ 121 elements <br>
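
For plotting, it helps to know which column of the 2-D array holds a given coefficient. Assuming the column order shown in this example (year in column 0, then $g_l^0, g_l^1, h_l^1, ...$ for each degree), the degrees below $l$ occupy $l^2 - 1$ columns, so degree $l$ starts at column $l^2$. The sketch below encodes that rule; column_index is a hypothetical helper.

```python
def column_index(l, m, kind="g"):
    """Column of g_l^m or h_l^m in a row laid out as [year, g_1^0, g_1^1, h_1^1, ...]."""
    if l < 1 or not 0 <= m <= l:
        raise ValueError("need l >= 1 and 0 <= m <= l")
    if kind == "g":
        # g_l^0 sits at column l^2, then g/h pairs follow for m = 1..l
        return l * l if m == 0 else l * l + 2 * m - 1
    if kind == "h":
        if m == 0:
            raise ValueError("h coefficients start at m = 1")
        return l * l + 2 * m
    raise ValueError("kind must be 'g' or 'h'")


print(column_index(1, 0, "g"))    # 1
print(column_index(4, 4, "h"))    # 24
print(column_index(10, 10, "h"))  # 120
```

With the array coeff2D built in the main cell, coeff2D[:, column_index(1, 0)] would then give the time series of $g_1^0$.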



In [54]:
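def create_2Darray(lines, blocklinelen, totalGC, eachLine):
    """
    Creates the 2D array
    Input   : lines is a list; each element is a line in the text document
            : blocklinelen is the number of lines in a year block in the text document
            : totalGC is the total number of coefficients per year
            : eachLine is a 1-D array of the number of elements in each line of a year block
    Output  : coeff is a 2D array; each row holds the year followed by that year's coefficients
    """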
    
    assert type(lines)        == list, 'lines is not a list'
    assert type(blocklinelen) == int , 'block line length is not an integer'
    assert type(totalGC)      == int , 'totalGC is not an integer'
     
    totalYears = len(lines) // blocklinelen
    coeff = np.zeros((totalYears, totalGC+1))
    row   = 0
    j     = 0
    col   = 0

    for index, line in enumerate(lines):
        
        yearList   = np.fromstring(line, dtype=float, sep=' ')
        elementLen = len(yearList)

        # A new year block starts: move to the next row and reset the counters
        if ((index % blocklinelen) == 0) and (index!=0):
            row += 1
            j    = 0
            col  = 0
            
        assert elementLen == eachLine[j], 'The length of the line is not correct'
        
        # Append the values of this line to the current row
        coeff[row, col:(elementLen+col)] = yearList
        col += elementLen
        j   += 1
    
    assert coeff.shape == (int(len(lines)/blocklinelen), int(totalGC+1))
    
    return coeff

### main function:
Imports the libraries <br>
Calls all the functions as needed and in order

In [55]:
import numpy as np

blocklinelen = 3
document = 'testPass.txt'

lines = get_lines(document)
[linecount, eachLine] = get_eachline(lines, blocklinelen)
year = get_years(lines, blocklinelen)
l_degree, totalGC = get_degree(eachLine)
coeff2D = create_2Darray(lines, blocklinelen, totalGC, eachLine)

# print std, mean, min, max number of values 

# np.savetxt('ggf100k_test.txt', coeff2D)

The total number of lines in the text document: 15
The length of each line is: [16.  9. 11.]
The total years: 5.0
Total coefficients per year: 35
The degree and order is 5
