# Assemble RepeatMasker results

**Objective**: Assemble overlapping RepeatMasker matches that share similar classifications  
**Inputs**: `.out` files from the `RepeatMasker` directory  
**Output**: `.transposons`, `.ltr`, `.elem_stored`, and `.copynumber` CSV files for each contig in each assembly. The `.elem_stored` files are used downstream.  
**Strategy**: Use OneCodeToFindThemAll ([program](http://doua.prabi.fr/software/one-code-to-find-them-all), [citation](http://www.mobilednajournal.com/content/5/1/13)) with the 80-80 rule, under which a match must be at least 80% of the length and at least 80% identical. More relaxed filtering produces many false positives.
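
To make the 80-80 rule concrete, here is a minimal sketch of the filter as it is worded in this notebook. It is an illustration only, not the code inside OneCodeToFindThemAll: the function name `passes_80_80`, its parameters, the example matches, and the reading of "length" as a fraction of the reference element length are all assumptions.

```python
def passes_80_80(match_length, reference_length, percent_identity,
                 min_coverage=0.80, min_identity=80.0):
    """Return True if a match satisfies the 80-80 rule as worded above.

    Assumption: "80% the length" means the match spans at least 80% of
    the reference element length. All names here are hypothetical.
    """
    if reference_length <= 0:
        return False
    coverage = match_length / float(reference_length)
    return coverage >= min_coverage and percent_identity >= min_identity


# Hypothetical matches: (match length, reference length, percent identity)
matches = [(850, 1000, 92.5),   # long enough and similar enough
           (400, 1000, 95.0),   # too short
           (900, 1000, 75.0)]   # too divergent
kept = [m for m in matches if passes_80_80(*m)]
```

Lowering either threshold lets shorter or more divergent fragments through, which is where the extra false positives would come from.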

In [1]:
import TE, os, glob

for code in TE.genome_codes_list('Genomes/', code_file='genome_assembly_files_v3.csv'):
    
    # Make output directory 
    if not os.path.exists('OneCodeToFIndThemAll/%s/'%code):
        os.mkdir('OneCodeToFIndThemAll/%s/'%code)
    
    # Check if a .out file exists
    if os.path.exists('RepeatMasker/%s/%s_coded.fasta.out'%(code,
                                                            TE.genomes_dict('Genomes/',
                                                                            code_file='genome_assembly_files_v3.csv')[code])):
        # Run the program
        TE.run_OneCodeToFindThemAll('RepeatMasker/%s'%code,
                                    'OneCodeToFIndThemAll/%s/ltr_dict.txt'%code,
                                    'OneCodeToFIndThemAll/%s/output'%code,
                                    'Genomes/%s'%TE.genomes_dict('Genomes/',
                                                                 code_file='genome_assembly_files_v3.csv')[code],
                                    build_dictionary='build_dictionary.pl',
                                    octfta='one_code_to_find_them_all.pl')
        
        # Outputs are written in the same path as the input files.
        # Move them to a new location
        files = glob.glob('RepeatMasker/%s/*.log.txt'%code)+glob.glob('RepeatMasker/%s/*.csv'%code)
        for f in files:
            new_file = 'OneCodeToFIndThemAll/%s/%s'%(code, os.path.basename(f))
            os.rename(f, new_file)
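
The move step at the end of the cell could also be factored into a small reusable helper. The sketch below is one possible way to do it and is not part of the `TE` module: `move_outputs` and its parameters are hypothetical names. It uses `shutil.move` rather than `os.rename`, on the assumption that the destination may sit on a different filesystem, where `os.rename` can fail.

```python
import glob
import os
import shutil


def move_outputs(src_dir, dest_dir, patterns=('*.log.txt', '*.csv')):
    """Move files in src_dir that match any pattern into dest_dir.

    Hypothetical helper. Returns the list of new file paths.
    """
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    moved = []
    for pattern in patterns:
        # sorted() only makes the processing order reproducible
        for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
            target = os.path.join(dest_dir, os.path.basename(path))
            shutil.move(path, target)
            moved.append(target)
    return moved
```

With this in place, the loop body could call something like `move_outputs('RepeatMasker/%s'%code, 'OneCodeToFIndThemAll/%s'%code)` once the program has run.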