# Tutorial for importing data into Python

Before a data set can be imported, it must be selected and set up in the Data Intelligence Hub environment. You can either use a data set from the Data Intelligence Hub or upload one of your own. One or more data sets can be added to a project and used in Python.

For this tutorial, data sets from “Luftmesswerte für Stickoxide - Zeitreihen der städtischen Messstellen” (air quality measurements for nitrogen oxides, time series from the municipal monitoring stations) were used. They can be found here: https://portal.dih.telekom.net/protected/marketplace/offer/21462ed0-5819-482c-bfcf-45f39a4adec6. When you place your order, you can add the data directly to the Workspace in which you want to use them. To use your own data, first upload them to Control Center/Data or Control Center/Storage (depending on your administrative rights). Next, start a project in your Workspace and allocate the data to this project. Once your data have been allocated to a project, they can be read with the following script.


In [1]:
# For further details, tutorials, and documentation, please see our GitHub account: https://github.com/tsi-dih
from dih.storage import read_storage_account

paths = ['NO2_values_duesseldorf_2015.csv', 'NO2_values_duesseldorf_2016.csv']

data, data1 = [read_storage_account(
    path=path,                             # Name of the uploaded file (one call per entry in paths)
    project_id='962',                      # Project ID (also the location of the files in the storage)
    workspace_id='427',                    # Workspace ID
    application_key='826a941c-25ac-4390-bb96-c3c172f76f35',  # Generated access key
    api_gateway_endpoint='https://api.dih.telekom.net/api/v1/storage/fs/readfile',
    account="padlsfreemium",
    username="example@yourprovider.com"    # Your username (e-mail address)
) for path in paths]

print(data[:1000])
print(data1[:1000])

ModuleNotFoundError: No module named 'dih.storage'
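
The `ModuleNotFoundError` above means that the `dih` package is not installed in the current Python environment. If you run the notebook outside the Data Intelligence Hub, one possible workaround is to read a locally saved copy of the CSV file into a string, so that the remaining cells can be followed unchanged. This is a minimal sketch: the helper `read_local_text` is hypothetical and not part of the `dih` package, and it assumes the file is stored locally as UTF-8 text.

```python
from pathlib import Path


def read_local_text(path, encoding="utf-8"):
    """Return the full contents of a local text file as a single string.

    Hypothetical helper for illustration only; it is not part of the dih package.
    """
    return Path(path).read_text(encoding=encoding)
```

Assuming the two files have been downloaded into the working directory, `data = read_local_text('NO2_values_duesseldorf_2015.csv')` would then yield a string comparable to the one returned by `read_storage_account`.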

### Using the data

The import function returns a plain string, which can then be converted into a pandas DataFrame, for example.
The following lines of code show how.

In [10]:
import io
import pandas as pd
data_df = pd.read_csv(io.StringIO(data), sep=';')  # Read the string into a DataFrame; the separator may need to be changed depending on the data

data_df.head(5)

Unnamed: 0,DATUM,Uhrzeit,Doro,Brinck
0,01.01.2015,1:00,986,364
1,01.01.2015,2:00,522,335
2,01.01.2015,3:00,466,263
3,01.01.2015,4:00,485,276
4,01.01.2015,5:00,486,316
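
The `Unnamed: 0` column in the output above is the name pandas assigns to a column whose header is empty. The sketch below shows the string-to-DataFrame step on a small sample and one way to use that column as the index instead. The raw layout of `sample` (semicolon-separated, with an empty first header field) is an assumption inferred from the output above, not taken from the original files.

```python
import io

import pandas as pd

# Assumed raw layout: semicolon-separated, with an unnamed first column holding a row counter
sample = (
    ";DATUM;Uhrzeit;Doro;Brinck\n"
    "0;01.01.2015;1:00;986;364\n"
    "1;01.01.2015;2:00;522;335\n"
)

# read_csv expects a path or a file-like object, so the string is wrapped in StringIO
plain = pd.read_csv(io.StringIO(sample), sep=";")

# index_col=0 uses the unnamed first column as the index instead of keeping it as "Unnamed: 0"
indexed = pd.read_csv(io.StringIO(sample), sep=";", index_col=0)
```

Whether to keep the counter as a regular column or as the index depends on how the rows are addressed later. The tutorial keeps it as a column.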


In [11]:
import io
data_df1 = pd.read_csv(io.StringIO(data1), sep=';')  # Read the second string into a DataFrame; the separator may need to be changed depending on the data

data_df1.head(5)

Unnamed: 0,DATUM,Uhrzeit,Doro,Brinck
0,01.01.2016,1:00,447,450
1,01.01.2016,2:00,594,430
2,01.01.2016,3:00,556,406
3,01.01.2016,4:00,509,379
4,01.01.2016,5:00,498,330


In [12]:
data_total = pd.concat([data_df, data_df1])
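
By default, `pd.concat` keeps the row labels of each input frame, so labels repeat in the combined frame when both inputs are numbered from 0. The following sketch uses two tiny frames (a few values taken from the outputs above, not the full data sets) to show the difference that `ignore_index=True` makes. The names `df_2015` and `df_2016` are illustrative only.

```python
import pandas as pd

# Two small stand-in frames, each with its own default index 0, 1
df_2015 = pd.DataFrame({"DATUM": ["01.01.2015", "01.01.2015"], "Doro": [986, 522]})
df_2016 = pd.DataFrame({"DATUM": ["01.01.2016", "01.01.2016"], "Doro": [447, 594]})

# Default behaviour: the original row labels are kept, so 0 and 1 each appear twice
kept = pd.concat([df_2015, df_2016])

# ignore_index=True discards the old labels and numbers the rows consecutively
renumbered = pd.concat([df_2015, df_2016], ignore_index=True)
```

Renumbering is useful if rows are later selected by label, because a repeated label would match more than one row.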

In [15]:
data_total.head(5)

Unnamed: 0,DATUM,Uhrzeit,Doro,Brinck
0,01.01.2015,1:00,986,364
1,01.01.2015,2:00,522,335
2,01.01.2015,3:00,466,263
3,01.01.2015,4:00,485,276
4,01.01.2015,5:00,486,316


In [16]:
data_total.tail(5)

Unnamed: 0,DATUM,Uhrzeit,Doro,Brinck
8779,31.12.2016,20:00,508,468
8780,31.12.2016,21:00,518,426
8781,31.12.2016,22:00,468,418
8782,31.12.2016,23:00,460,439
8783,31.12.2016,0:00,417,416
