# Theory

## Converting Multi-line FASTA to Single-line FASTA

FASTA files often contain sequences split across multiple lines, typically with 60 or 80 characters per line. This formatting, while human-readable, makes automated searching and processing harder, because a motif can be split across a line break. Each record consists of:
   1. A header line starting with `>`, followed by the sequence description.
   2. The sequence itself, spanning one or more lines.

Our input file, `multiline_input.fasta`, has this layout:
      
      `>`1
      
      // some sequence
      
      `>`2
      
      // some sequence
      
       ... and so on

## Goal
Convert each FASTA record so that:
- The header (`>...`) remains as is on one line.
- The corresponding nucleotide or protein sequence is merged into **a single line**.


In [3]:
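def convert_fasta(input_file, output_file):
    """Copy a FASTA file, joining each record's sequence lines into one line."""
    with open(input_file, 'r') as f_in, open(output_file, 'w') as f_out:
        sequence = ""  # sequence collected so far for the current record
        for line in f_in:
            line = line.strip()
            if line.startswith(">"):
                # A new header: flush the previous record's sequence first.
                if sequence:
                    f_out.write(sequence + "\n")
                    sequence = ""
                f_out.write(line + "\n")
            else:
                # A sequence line (or a blank line, which adds nothing).
                sequence += line
        # Flush the sequence of the last record.
        if sequence:
            f_out.write(sequence + "\n")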

convert_fasta("multiline_input.fasta", "singleline_output.fasta")


### The output is written to `singleline_output.fasta`

### Another method: using the Biopython library

The Biopython library can convert multi-line FASTA to single-line FASTA directly.

#### Biopython must be installed first

In [7]:
pip install biopython

Note: you may need to restart the kernel to use updated packages.


In [8]:
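from Bio import SeqIO

def convert_fasta(input_file, output_file):
    """Write each record as a header line followed by one sequence line."""
    with open(output_file, 'w') as f_out:
        for record in SeqIO.parse(input_file, "fasta"):
            # record.description holds the full header text (without ">");
            # record.id is only its first word and would truncate the header.
            f_out.write(f">{record.description}\n{record.seq}\n")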

convert_fasta("multiline_input.fasta", "singleline(bio)_output.fasta")


### The output is written to `singleline(bio)_output.fasta`