# Generate the daily sales per state
We need to do the same thing as we did with the weather data, but this time we will create the **daily_sales_per_state** view.

## Reading the data
Let's get started by loading the master sales data. We can do this in a similar way as we did with the weather data.

In [None]:
sales_data = sc.textFile("/data/master/sales")
sales_data.take(1)

The sales data is structured a little differently from the weather data. With the sales data, everything we need is already present and combined in a single record. However, it is not very clear which field contains what, so we will convert the lines into dictionaries to add some clarity to our data.

We will run through the data with a lambda expression, and a lambda expression cannot contain a block of statements, so we will create a separate function that we can call for each line.

In [None]:
from datetime import datetime

def to_sales_record_map(line):
    fields = line.split("\t")
    res = {}
    res['transaction'] = fields[0]
    res['quantity'] = float(fields[1])
    res['price'] = float(fields[2])
    res['transaction_type'] = fields[3]
    res['tender_type'] = fields[4]
    res['date'] = datetime.strptime(fields[5], "%Y-%m-%dT%H:%M:%S.%fZ")
    res['store_key'] = fields[6]
    res['store_name'] = fields[7]
    res['store_state'] = fields[8]
    res['employee_key'] = fields[9]
    res['employee_name'] = fields[10]
    res['employee_gender'] = fields[11]
    res['employee_state'] = fields[12]
    res['employee_job_title'] = fields[13]
    res['product_key'] = fields[14]
    res['product_version'] = fields[15]
    res['product_description'] = fields[16]
    res['product_category'] = fields[17]
    res['product_department'] = fields[18]
    res['product_price'] = float(fields[19])
    res['customer_key'] = fields[20]
    res['customer_gender'] = fields[21]
    res['customer_age'] = int(fields[22])
    res['customer_marital_status'] = fields[23]
    res['customer_name'] = fields[24]
    res['customer_state'] = fields[25]
    return res

We can use the to_sales_record_map(line) function to convert a line of text into a dictionary that is much easier to work with. You might wonder why we didn't do that for the weather data. Well, due to the structure of the weather data it didn't make sense to do it this way.

You can see some strange constructs in the to_sales_record_map function. Things like *int(fields[22])*, *float(fields[19])* and *datetime.strptime(fields[5], "%Y-%m-%dT%H:%M:%S.%fZ")*.
The first two convert a text string into an integer or a float respectively. The last one parses a text string representing a date into an actual datetime object we can work with and do calculations on.
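
To make these conversions concrete, here is a minimal sketch on standalone values. The sample strings are made up for illustration and are only assumed to resemble fields 22, 19 and 5 of a sales line.

```python
from datetime import datetime

# Hypothetical sample values, shaped like fields 22, 19 and 5 of a sales line.
age_text = "42"
price_text = "19.99"
date_text = "2014-01-05T13:45:30.000Z"

age = int(age_text)          # text -> integer
price = float(price_text)    # text -> floating point number
date = datetime.strptime(date_text, "%Y-%m-%dT%H:%M:%S.%fZ")  # text -> datetime

# Once parsed, the values support arithmetic and date calculations.
print(age + 1)
print(round(price * 2, 2))
print(date.year, date.month, date.day)
```

Note that int() and float() raise a ValueError when the text is not a valid number, so a malformed line would make the whole conversion fail.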

Let's run that map function to transform our lines into dictionaries:

In [None]:
sales_records = 
sales_records.take(1)
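
If you are unsure how to complete the cell above, it may help to try the pattern on a plain Python list first. The sketch below is only a stand-in: `to_short_record` and the sample lines are hypothetical, and the built-in `map` plays the role of the RDD's `map` method. On the real data the call would presumably look something like `sales_data.map(to_sales_record_map)`.

```python
# Hypothetical, shortened parser: it keeps only three of the 26 fields.
def to_short_record(line):
    fields = line.split("\t")
    return {
        "transaction": fields[0],
        "quantity": float(fields[1]),
        "price": float(fields[2]),
    }

# Made-up sample lines standing in for the contents of the RDD.
lines = ["T1\t2\t9.50", "T2\t1\t4.25"]

# Built-in map() applies the function to every element, much like RDD.map does.
records = list(map(to_short_record, lines))
print(records[0])
```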

The last thing to do is to transform each record into a key/value pair and convert the datetime object inside the value to a representation that can be stored later on. The default storage formats have an issue with the complex datetime object and require a simpler representation.

For this we can use the **datetime.strftime(date, format)** function, passing **%Y-%m-%dT%H:%M:%S.%fZ** as the format.

In [None]:
def min_two_digits(data):
    # Left-pad with zeros and keep the last two characters, e.g. 5 -> '05'
    return ('00' + str(data))[-2:]

def sales_key(record):
    # Build a key of the form yyyymmdd-state
    res = str(record['date'].year)
    res += min_two_digits(record['date'].month)
    res += min_two_digits(record['date'].day)
    res += '-'
    res += record['store_state']
    return res

def formatDate(record):
    # Replace the datetime object with its text representation
    record['date'] = datetime.strftime(record['date'], "%Y-%m-%dT%H:%M:%S.%fZ")
    return record
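
To get a feel for what a single key/value pair could look like, here is a small sketch on a hypothetical record that holds only the two fields the key depends on. It uses `strftime("%Y%m%d")` as a compact equivalent of the hand-built key prefix, which is a substitution for illustration rather than the helper defined above.

```python
from datetime import datetime

# Hypothetical record with only the two fields the key depends on.
record = {"date": datetime(2014, 1, 5, 13, 45, 30), "store_state": "CA"}

# strftime("%Y%m%d") gives the same zero-padded yyyymmdd prefix that sales_key builds by hand.
key = record["date"].strftime("%Y%m%d") + "-" + record["store_state"]

# Copy the record so the original keeps its datetime object, then store the date as text.
value = dict(record)
value["date"] = record["date"].strftime("%Y-%m-%dT%H:%M:%S.%fZ")

pair = (key, value)
print(pair)
```

Keep in mind that `%f` always writes six digits, so a date parsed from `.000Z` is written back as `.000000Z`.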

In [None]:
sales = 
sales.take(1)

## Storing the result
Again, a job well done! Let's store that view in a sequence file as well, so we can rely on it later on. The view is called **daily_sales_per_state** and it can be stored in the **/data/views** folder on HDFS.