# Import 

Data are contained in different `.csv` files, one for each type of fixture.

**files** is a `list` that contains the full path of each `.csv` file, stored as a string.

`os.path.splitext()` splits the path name into a (root, extension) pair.

`os.path.join()` joins path components with the separator used by the operating system.

`os.listdir()` retrieves the names of all entries (files and subfolders) in the folder whose path is passed as argument.
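As a minimal sketch of how the first two functions behave, the snippet below applies them to a few file names. The folder name `data` and the names `notes.txt` and `README` are hypothetical and only serve the illustration.

```python
import os

# Hypothetical folder and file names, used only for illustration
folder = "data"
names = ["feed_Bidet.MYD.csv", "notes.txt", "README"]

for n in names:
    root, ext = os.path.splitext(n)  # splits at the last dot only
    print(repr(root), repr(ext))

# Keep only the '.csv' names and build their full paths
csv_paths = [os.path.join(folder, n) for n in names
             if os.path.splitext(n)[1] == ".csv"]
print(csv_paths)
```

Note that a name without a dot, such as `README`, gets an empty string as extension, so the extension test should compare against complete extensions.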

In the second `for` loop:

1. For each path, I use `.split()` to extract the name of the fixture and store it in the variable **name**.

2. For each path, I use `pd.read_csv` to load the `.csv` file into a `dataframe` and store it in the variable **df**.

    - I use a conditional statement because the `.csv` files have different headers and separators depending on the fixture (Bidet, Kitchenfaucet: `','` separator and `['Time', 'Flow']` headers; Shower, Washbasin: blank-space separator and no header row).

3. I add a column to each `dataframe` with the corresponding fixture name.

4. I append each generated `dataframe` to the initially empty `list` **df_list**.

5. After the loop, I stack all the `dataframes` with `pd.concat`.

I set the parameter `ignore_index=True` because the row indexes of the individual `dataframes` carry no meaningful information for our analysis: they are discarded and replaced by a new index starting at 0.
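The steps above can be sketched on two small in-memory tables instead of the real files. The contents of `raw` are made-up values, and `raw`, `frames`, `stacked` and `kept` are hypothetical names introduced only for this illustration.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two of the files (values are made up)
raw = {
    "Bidet": ("Time,Flow\n100,1.0\n101,2.0\n", {"sep": ","}),
    "Shower": ("200 3.0\n201 4.0\n", {"sep": " ", "header": None}),
}

frames = []
for fixture, (text, kwargs) in raw.items():
    part = pd.read_csv(io.StringIO(text), **kwargs)
    part.columns = ["Time", "Flow"]  # same column names for every fixture
    part["Fixture"] = fixture        # label every row with its fixture
    frames.append(part)

stacked = pd.concat(frames, ignore_index=True)  # new 0..n-1 index
kept = pd.concat(frames)                        # original indexes repeat

print(stacked.index.tolist())
print(kept.index.tolist())
```

Without `ignore_index=True` each fixture keeps its own row numbers, so the same index label appears once per file.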

In [1]:
%run setup.ipynb # Python config

In [2]:
# Keep only the '.csv' files inside the folder

files = []
extensions = ('.csv',) # accepted extensions (the trailing comma makes this a tuple, not a string)

for file in os.listdir(DATA_PATH):
    if os.path.splitext(file)[-1] in extensions:
        files.append(os.path.join(DATA_PATH, file))
        
files

['../data\\feed_Bidet.MYD.csv',
 '../data\\feed_Kitchenfaucet.MYD.csv',
 '../data\\feed_Shower.MYD.csv',
 '../data\\feed_Washbasin.MYD.csv']

In [3]:
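df_list = []
for f in files:
    
    # 'feed_Bidet.MYD.csv' -> 'Bidet'; basename keeps an underscore in DATA_PATH from breaking the split
    name = os.path.basename(f).split('_')[1]
    name = name.split('.')[0]
    
    if name in ('Shower', 'Washbasin'):
        # Space-separated files without a header row
        df = pd.read_csv(f, sep = ' ', header = None)
        df.columns = ['Time', 'Flow']
    else:
        # Comma-separated files with a ['Time', 'Flow'] header row
        df = pd.read_csv(f, sep = ',')
    
    df['Fixture'] = name # label every row with its fixture
    
    df_list.append(df)
      
df = pd.concat(df_list, ignore_index=True, axis = 0) # axis = 0 stacks rows (axis = 1 would place the frames side by side)
df.head()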

| | Time | Flow | Fixture |
|---|---|---|---|
| 0 | 1550334979 | 92.0 | Bidet |
| 1 | 1550334980 | 6.0 | Bidet |
| 2 | 1550335912 | 38.0 | Bidet |
| 3 | 1550335914 | 58.0 | Bidet |
| 4 | 1550335915 | 56.0 | Bidet |


The resulting dataframe has a timestamp column **Time** (Unix time in seconds), so I convert it to `datetime` and set it as the index.

In [4]:
df['Time'] = pd.to_datetime(df['Time'], unit='s') # Datetime conversion
df.set_index('Time', inplace = True) # Datetime as index
df.head()

| Time | Flow | Fixture |
|---|---|---|
| 2019-02-16 16:36:19 | 92.0 | Bidet |
| 2019-02-16 16:36:20 | 6.0 | Bidet |
| 2019-02-16 16:51:52 | 38.0 | Bidet |
| 2019-02-16 16:51:54 | 58.0 | Bidet |
| 2019-02-16 16:51:55 | 56.0 | Bidet |
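One practical benefit of a datetime index is selection by date string. The sketch below rebuilds a tiny frame from the first three rows shown above plus one extra row; the extra row (one day later) and the name `demo` are hypothetical and added only to show the selection.

```python
import pandas as pd

demo = pd.DataFrame({
    "Time": [1550334979, 1550334980, 1550335912, 1550421379],  # last value is hypothetical
    "Flow": [92.0, 6.0, 38.0, 1.0],                            # last value is hypothetical
})
demo["Time"] = pd.to_datetime(demo["Time"], unit="s")  # seconds since 1970-01-01 UTC
demo = demo.set_index("Time")

first_day = demo.loc["2019-02-16"]  # partial-string selection on the DatetimeIndex
print(demo.index[0])
print(len(first_day))
```

The converted values are timezone-naive and correspond to UTC, since `unit='s'` counts seconds from the Unix epoch.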
