# 2. Uploading multiple CSV files using the SDK
This tutorial shows how to upload multiple CSV files using **intdash SDK for Python** (hereafter called intdash SDK).
One measurement is created for each CSV file. The required CSV format is described in "2.0 Preparation" below.

## 2.0 Preparation
Before starting this scenario, prepare the following.

- Edge for data upload
- Multiple CSVs to upload


### Data to be used
In this tutorial, the following data must be ready on the server side.
The steps below refer to this data by the following names.

| Data item | Data name that appears in this scenario |
|:---|:---|
| Edge to register time series data | edge1|
| CSV that stores time series data| sampleX.csv (X = arbitrary number) |

#### Details of CSV data to upload

The CSV files to upload must meet the following conditions.
* The first line must contain the name of each column as a string.
* The first column must contain the timestamp. The loading code in 2.2 groups rows by a column named `time`, so this column must be named `time`.
* Data must be stored in the second and subsequent columns.

<img src="https://github.com/aptpod/aws-marketplace-tutorials/blob/master/img/img1.png?raw=true">

The column names given in the first line are used as the names of the data (`data_id`).
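
The conditions above can be checked locally before anything is uploaded. The following is a minimal sketch that uses only the Python standard library. `check_csv_layout` is a hypothetical helper written for this illustration and is not part of intdash SDK; it only inspects the layout and does not verify that the timestamps can be parsed.

```python
import csv
import io


def check_csv_layout(text):
    """Return a list of layout problems found in CSV text (empty list = OK)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    # The first line holds the column names: one timestamp column plus data columns.
    if len(header) < 2:
        problems.append("need a timestamp column plus at least one data column")
    if any(name.strip() == "" for name in header):
        problems.append("every column needs a name in the first line")

    # Every data row must have the same number of fields and a timestamp.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        elif row[0].strip() == "":
            problems.append(f"line {line_no}: timestamp is missing")
    return problems


good_csv = "time,speed,rpm\n2020-01-01 00:00:00.000,10.0,1000\n"
bad_csv = "time,speed,rpm\n,10.0,1000\n"

print(check_csv_layout(good_csv))
print(check_csv_layout(bad_csv))
```

To check a file on disk, read it first (for example with `open(path, newline="").read()`) and pass the text to the helper.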

### Place CSV files to upload
CSV files that store the time series data are placed in the `csv` directory under the same directory as this Jupyter Notebook. In this tutorial, the sample CSV files are used.

### Import packages and create a client
For the `url` argument of `intdash.Client`, specify the URL of your intdash server. For `username` and `password`, specify the credentials issued for the edge you use.

In [59]:
import pandas as pd
import math

import intdash
from intdash import timeutils

# Create client
client = intdash.Client(
    url="https://example.intdash.jp",
    username="edge1",
    password="password_here"
)

The preparations are complete.

## 2.1 Get the edge used to upload data
First, get the edge to be used to upload CSV files.

In [20]:
edges = client.edges.list(name='edge1')
edge1 = edges[0]

In [21]:
edge1.name

'edge1'

## 2.2 Load CSV files
Load each CSV file as `pandas.DataFrame`. Here, the CSV files stored in the `csv/` directory are used.

In [22]:
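import glob

# Collect the CSV files in the csv/ directory (sorted for a stable order).
csv_files = sorted(glob.glob('./csv/*.csv'))

dfs = []

for csv_path in csv_files:
    # Use the first column ("time") as the index. If a timestamp appears more
    # than once, keep the last non-null value of each column for it.
    df = pd.read_csv(csv_path, index_col=0).groupby("time").last()
    dfs.append(df)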
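
The `groupby("time").last()` call in the loading step merges rows that share a timestamp. The sketch below shows the effect on a small in-memory CSV instead of a file in `csv/`; the column names `speed` and `rpm` and the values are made up for illustration. Note that `last()` takes the last non-null value per column, so a blank cell in a later duplicate row does not overwrite an earlier value.

```python
import io

import pandas as pd

# Illustrative CSV text: the first two rows share the same timestamp.
csv_text = """time,speed,rpm
2020-01-01 00:00:00.000,10.0,1000
2020-01-01 00:00:00.000,11.0,
2020-01-01 00:00:01.000,12.0,1200
"""

raw = pd.read_csv(io.StringIO(csv_text), index_col=0)
deduplicated = raw.groupby("time").last()

print(len(raw), "rows before,", len(deduplicated), "rows after")
print(deduplicated)
```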

## 2.3 Create a measurement for each CSV file and register the data
This section walks through the processing for a single CSV file. Repeating these steps uploads multiple CSV files.  
\* To process all files at once, skip to section 2.4.

In [23]:
# Take the first DataFrame out of the list.
# Section 2.4 then processes only the DataFrames that remain in dfs.
df = dfs.pop(0)

Create a measurement. Use the first timestamp of the data as the measurement start time.

In [24]:
new_measurements = client.measurements.create(
    name='csv_data',  # Define name of measurement.
    basetime=timeutils.str2timestamp(df.index[0]),  # Use the timestamp of the first data point as the basetime of the measurement.
    edge_uuid=edge1.uuid
)

After creating the measurement, convert the DataFrame to `DataPoint` format.

In [25]:
datapoints = []

for data_id, values in df.to_dict().items():
    for time, value in values.items():
        
        # Skip missing values; pd.isna also accepts non-numeric values, unlike math.isnan.
        if pd.isna(value) or value == '':
            continue
            
        datapoints.append(
            intdash.DataPoint(
                elapsed_time=timeutils.str2timestamp(time) - new_measurements.basetime,  # Time elapsed from the start of measurement.
                data_type=intdash.DataType.float,
                channel=1, # fixed at 1.
                data_payload=intdash.data.Float(data_id=data_id, value=value).to_payload()
            )
        )
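
To see what the conversion loop iterates over, the following sketch runs the same nested loop on a small hand-made DataFrame without contacting a server. It is an illustration under assumptions: plain tuples stand in for `intdash.DataPoint`, `pd.Timestamp` stands in for `timeutils.str2timestamp`, and the column names and values are invented.

```python
import pandas as pd

# Hand-made DataFrame shaped like one loaded CSV: index "time", one column per data_id.
df = pd.DataFrame(
    {"speed": [10.0, float("nan")], "rpm": [1000.0, 1200.0]},
    index=pd.Index(
        ["2020-01-01 00:00:00.000", "2020-01-01 00:00:01.500"], name="time"
    ),
)

# Stand-in for the measurement basetime: the first timestamp in the data.
basetime = pd.Timestamp(df.index[0])

rows = []
# df.to_dict() returns {column name: {timestamp: value}}.
for data_id, values in df.to_dict().items():
    for time, value in values.items():
        if pd.isna(value):
            continue  # missing cells produce no data point
        elapsed_seconds = (pd.Timestamp(time) - basetime).total_seconds()
        rows.append((data_id, elapsed_seconds, value))

for row in rows:
    print(row)
```

Each column becomes one `data_id`, and each non-missing cell becomes one data point whose elapsed time is measured from the first timestamp.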

When the conversion is completed, associate the time series data with the measurement created earlier.

In [26]:
client.data_points.store(
    measurement_uuid=new_measurements.uuid,
    data_points=datapoints
)

This completes the upload of one CSV file.

## 2.4 Register multiple CSV files at once
In the following, the processes in the previous section are combined into one. The same process is repeated for multiple DataFrames.

In [27]:
for df in dfs:
    # Create a measurement.
    new_measurements = client.measurements.create(
        name='csv_data',  # Define name of measurement.
        basetime=timeutils.str2timestamp(df.index[0]),  # Use the timestamp of the first data point as the basetime of the measurement.
        edge_uuid=edge1.uuid
    )
    
    # Convert DataFrames to DataPoints.
    datapoints = []

    for data_id, values in df.to_dict().items():
        for time, value in values.items():
            
            # Skip missing values; pd.isna also accepts non-numeric values, unlike math.isnan.
            if pd.isna(value) or value == '':
                continue

            datapoints.append(
                intdash.DataPoint(
                    elapsed_time=timeutils.str2timestamp(time) - new_measurements.basetime,  # Time elapsed from the start of measurement.
                    data_type=intdash.DataType.float,
                    channel=1, # fixed at 1.
                    data_payload=intdash.data.Float(data_id=data_id, value=value).to_payload()
                )
            )
            
    # Store datapoints.
    client.data_points.store(
        measurement_uuid=new_measurements.uuid,
        data_points=datapoints
    )
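
In the loop above, an exception raised for one DataFrame stops the upload of all remaining ones. One possible way to keep going is sketched below. `upload_all` and `upload_one` are hypothetical names introduced here, not intdash SDK functions: `upload_one` stands for a function that wraps the body of the loop (create a measurement, convert, store) for a single DataFrame. The demonstration uses a fake uploader so that it runs without a server.

```python
def upload_all(frames, upload_one):
    """Call upload_one(frame) for each frame; return (uploaded_count, failures)."""
    uploaded = 0
    failures = []
    for position, frame in enumerate(frames):
        try:
            upload_one(frame)
        except Exception as exc:
            # Record the failure and continue with the next frame.
            failures.append((position, exc))
        else:
            uploaded += 1
    return uploaded, failures


# Fake uploader for demonstration only: rejects the item named "bad".
def fake_upload(frame):
    if frame == "bad":
        raise ValueError("could not upload")


uploaded, failures = upload_all(["a", "bad", "c"], fake_upload)
print(uploaded, "uploaded,", len(failures), "failed")
```

The returned positions identify which entries of the list failed, so they can be inspected or retried afterwards.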

By executing the above, you can upload multiple CSV files.
Confirm that the new measurements are displayed in [Stored Data] of Visual M2M Data Visualizer.
<img src="https://github.com/aptpod/aws-marketplace-tutorials/blob/master/img/img3.png?raw=true">