# Recipe 1: K-mer parsing and counting

## Data

A small sample dataset for running the examples.

[Click here to download](../../../assets/files/data.zip)

## Description

1. Create an empty kDataFrame with a k-mer size of 21
2. Load the k-mers of a FASTQ file into the kDataFrame
3. Print the first 10 k-mers
4. Save the kDataFrame to disk
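To make "k-mer parsing" concrete: a k-mer is a substring of length k, so a sequence of length L contains L - k + 1 overlapping k-mers (none if L < k). The sketch below is plain Python, not kProcessor; the `extract_kmers` helper and the example read are hypothetical, and it ignores reverse complements and non-ACGT bases, which real k-mer tools often handle.

```python
# Hypothetical helper, for illustration only; this is NOT how kProcessor
# parses or stores k-mers internally.
def extract_kmers(sequence, k=21):
    """Return every length-k substring of `sequence`, left to right."""
    sequence = sequence.upper()
    # A sequence of length L yields L - k + 1 k-mers; range() is empty if L < k.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


read = "CCCAACAGAATTAAAAAGTCAGG"  # made-up 23-base read
kmers = extract_kmers(read, k=21)
print(len(kmers))  # 23 - 21 + 1 = 3
print(kmers[0])    # CCCAACAGAATTAAAAAGTCA
```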

## Implementation

### Importing

```python
import kProcessor as kp
```

### Create an empty kDataFrame

```python
# An empty kDataFrameMQF with a k-mer size of 21
kf1 = kp.kDataFrameMQF(21)
```

### Parse the FASTQ file into the `kf1` kDataFrame

```python
# kp.parseSequencesFromFile(kDataFrame, mode, params, file_path, chunk_size)
kp.parseSequencesFromFile(kf1, "kmers", {"k_size": 21}, "data/test.fastq", 1000)
```
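Conceptually, parsing in `"kmers"` mode reads the file a chunk at a time and records every k-mer of every read. The following is a minimal pure-Python sketch of that idea, assuming well-formed 4-line FASTQ records and using a `collections.Counter` in place of the MQF. The helper names are hypothetical, and treating `chunk_size` as a number of reads is an assumption made for this sketch, not a statement about kProcessor's argument.

```python
import io
from collections import Counter
from itertools import islice


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record (hypothetical helper)."""
    while True:
        record = list(islice(handle, 4))  # header, sequence, '+', qualities
        if len(record) < 4:
            return
        yield record[1].strip()


def count_kmers_in_chunks(handle, k=21, chunk_size=1000):
    """Count k-mers, processing `chunk_size` reads at a time (hypothetical helper)."""
    counts = Counter()
    reads = read_fastq_sequences(handle)
    while True:
        chunk = list(islice(reads, chunk_size))
        if not chunk:
            break
        for seq in chunk:
            # Slide a window of width k over the read and count each k-mer.
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
    return counts


# Two made-up reads; a small k keeps the example readable.
fastq = io.StringIO(
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nACGTACGTAC\n+\nIIIIIIIIII\n"
)
counts = count_kmers_in_chunks(fastq, k=5, chunk_size=1)
print(counts["ACGTA"])  # 4 (twice per read)
```

Processing in chunks bounds how many reads are held in memory at once, which is the usual motivation for a chunk-size parameter.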

### Iterating over the first 10 k-mers

<div class="alert alert-info">

**Note:**

[kDataFrameIterator.next()](../py_api.html#kProcessor.kDataFrameIterator.kDataFrameIterator.next) must be called to advance the iterator to the next k-mer. Without it, the iterator stays on the same position and the loop prints the same k-mer every time.

</div>
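The point of the note can be shown with a toy iterator in plain Python. `ToyIterator` is a hypothetical stand-in that only mimics the `getKmer()`/`next()` pattern used in this recipe; it is not kProcessor's `kDataFrameIterator`.

```python
class ToyIterator:
    """Hypothetical stand-in mimicking the getKmer()/next() pattern."""

    def __init__(self, kmers):
        self._kmers = list(kmers)
        self._pos = 0

    def getKmer(self):
        # Reading the current k-mer does not move the iterator.
        return self._kmers[self._pos]

    def next(self):
        # Advancing is a separate, explicit step.
        self._pos += 1


kmers = ["AAA", "CCC", "GGG"]

# Forgetting next(): the same k-mer is returned every time.
it = ToyIterator(kmers)
without_next = [it.getKmer() for _ in range(3)]

# Calling next() after each read walks through all k-mers.
it = ToyIterator(kmers)
with_next = []
for _ in range(3):
    with_next.append(it.getKmer())
    it.next()

print(without_next)  # ['AAA', 'AAA', 'AAA']
print(with_next)     # ['AAA', 'CCC', 'GGG']
```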

```python
it = kf1.begin()

# Print the first 10 k-mers; it.next() advances the iterator each time
for _ in range(10):
    print(it.getKmer())
    it.next()
```

Output:

```
CCCAACAGAATTAAAAAGTCA
AAATTAAATAACTTTAGCGCA
CCAAATTACAACAAAATTTGG
TTAATCATTTGGTATAATTGC
ACCTCGTATAACTTCGTATAA
AACAATTCAACAGAGAAGGAC
AGGCTAATCGAACAAAACATC
AGGAAAAACTCCAGCCAGTAA
TACGGGTCGCAGTGACCAGGC
CCAGGTAGTACAGCAATCGTA
```


### Save the kDataFrame to disk under the name "kf1"


```python
# This saves the kDataFrame to disk with the extension ".mqf"
kf1.save("kf1")
```

## Complete Script

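```python
import kProcessor as kp

# Create an empty kDataFrameMQF with a k-mer size of 21
kf1 = kp.kDataFrameMQF(21)

# Parse the k-mers of the FASTQ file into kf1 (chunk size 1000)
kp.parseSequencesFromFile(kf1, "kmers", {"k_size": 21}, "data/test.fastq", 1000)

# Save kf1 to disk (the file gets the ".mqf" extension)
kf1.save("kf1")

# Print the first 10 k-mers; it.next() advances the iterator
it = kf1.begin()
for _ in range(10):
    print(it.getKmer())
    it.next()
```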