# Capture Raw Data from Event Hub into DataLake
Event hubs can automatically save data into a table in a Gen 2 data lake. The event data is stored as binary, so no validation is performed on write and you must validate the data on read instead. Validating on read also allows events of different formats and shapes to be sent through a single event hub into the same data lake table. You can then filter on read to retrieve only the events you need.

The data stored in this table is immutable and is owned by the event hub. The only way to modify it is to delete the raw data capture from the event hub, which deletes the table.

The data gets saved to the data lake with the following schema.

| SequenceNumber | Offset | EnqueuedTimeUtc | SystemProperties | Properties | Body   |
|----------------|--------|-----------------|------------------|------------|--------|
| Int            | String | DateTime        | Map              | Map        | Binary |

The following optional columns may also exist depending on how you partition the data.

| Year(Optional) | Month(Optional) | Day(Optional) | Hour(Optional) | PartitionId(Optional) |
|----------------|-----------------|---------------|----------------|-----------------------|
| Int            | Int             | Int           | Int            | String                |
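
Because the Body column holds unvalidated bytes, validation and filtering happen when the data is read. The sketch below shows one way this could look, assuming the events are UTF-8 encoded JSON objects. The rows are plain dictionaries standing in for rows read from the table, and decode_events and the device and temp keys are hypothetical names used only for illustration.

```python
import json


def decode_events(rows, required_keys):
    """Yield the decoded payload of each row whose Body is a JSON object
    containing all of required_keys; skip every other row."""
    for row in rows:
        try:
            payload = json.loads(row["Body"].decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue  # Body is not UTF-8 encoded JSON, so ignore this event
        if isinstance(payload, dict) and set(required_keys).issubset(payload):
            yield payload


# Stand-in rows: only the first one has the shape we are looking for.
rows = [
    {"SequenceNumber": 1, "Body": b'{"device": "a1", "temp": 21.5}'},
    {"SequenceNumber": 2, "Body": b"\xff\xfe not json"},
    {"SequenceNumber": 3, "Body": b'{"device": "b2"}'},
]

valid = list(decode_events(rows, {"device", "temp"}))
print(valid)
```

Events with a different format or shape stay in the table untouched; they are simply skipped by this particular reader.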

### Import libraries

In [None]:
from neuro_python.neuro_data import endpoint_manager as epm, datastore_manager as dsm, schema_manager as sm

### Get data stores

In [None]:
# List the names of the available data stores
datastores = [datastore['StoreName'] for datastore in dsm.list_data_stores()]
datastores

### Create the raw data capture
Choose an appropriate datetime_partition_level to partition your data by datetime. For example, DateTimeLevels.Day partitions data like Year=2020/Month=1/Day=19/..., whereas DateTimeLevels.Hour results in Year=2020/Month=1/Day=19/Hour=13/...

partition_id_level groups your data by the 32 partitions of the event hub (when sending data to an event hub you can choose the partition). For example, DateTimeLevels.Hour with PartitionIdLevels.Top results in PartitionId=0/Year=2020/Month=1/Day=19/Hour=13/..., whereas PartitionIdLevels.Bottom results in Year=2020/Month=1/Day=19/Hour=13/PartitionId=0/...

NOTE: PartitionIdLevels.Top is better suited when the data in each partition is normally queried independently of the others, while PartitionIdLevels.Bottom is better when multiple partitions are used in the same query.

In [None]:
# Event hub namespace, event hub, and target data lake store
namespace_name = 'sarikaTestworkspace1'
event_hub_name = 'Hub1'
datalake_name = 'LeeTestAdlsGen2v1'
# Partition by day, with the event hub partition ID as the top-level folder
datetime_partition_level = epm.DateTimeLevels.Day
partition_id_level = epm.PartitionIdLevels.Top
epm.create_update_event_hub_raw_data_capture(
    namespace_name, event_hub_name, datalake_name,
    datetime_partition_level, partition_id_level)
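
To make the partition layouts described above concrete, the following sketch builds the folder path that a given combination of levels would produce. It is an illustration only: partition_path is a hypothetical helper, not part of neuro_python, and the plain strings "Day", "Hour", "Top" and "Bottom" stand in for the epm.DateTimeLevels and epm.PartitionIdLevels values. The actual layout is decided by the capture service.

```python
from datetime import datetime, timezone

# Datetime folders in nesting order; a level includes all coarser levels.
DATETIME_PARTS = ["Year", "Month", "Day", "Hour"]


def partition_path(enqueued, partition_id, datetime_level="Day", partition_id_level="Top"):
    """Return the folder path for an event, given the two partition levels."""
    depth = DATETIME_PARTS.index(datetime_level) + 1
    values = [enqueued.year, enqueued.month, enqueued.day, enqueued.hour]
    # zip stops at the shorter list, so only the first `depth` folders are built
    segments = [f"{name}={value}" for name, value in zip(DATETIME_PARTS[:depth], values)]
    partition_segment = f"PartitionId={partition_id}"
    if partition_id_level == "Top":
        segments.insert(0, partition_segment)
    elif partition_id_level == "Bottom":
        segments.append(partition_segment)
    else:
        raise ValueError(f"unknown partition_id_level: {partition_id_level!r}")
    return "/".join(segments)


when = datetime(2020, 1, 19, 13, 45, tzinfo=timezone.utc)
print(partition_path(when, 0, "Day", "Top"))      # PartitionId=0/Year=2020/Month=1/Day=19
print(partition_path(when, 0, "Hour", "Bottom"))  # Year=2020/Month=1/Day=19/Hour=13/PartitionId=0
```

With the partition ID at the top, everything for one partition sits under a single folder. With it at the bottom, one time window across all partitions sits under a single folder. This is why the first layout suits per-partition queries and the second suits queries spanning partitions.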

### See the table

In [None]:
# Find the table created by the capture, named NvEventHub<event_hub_name>RawData
next(table for table in sm.list_tables(datalake_name) if table['TableName'] == 'NvEventHub%sRawData' % event_hub_name)