# Let's begin to combine everything we have learned so far to create a diagnostic pipeline

In building this pipeline, our aim is to:
1. Load in all the data
2. Inspect the data by plotting it
3. Approve the data if it looks correct

Before we begin, we will need a few more tools.

## The package "glob"

This package compares the pattern given to it against existing file and directory names and returns those that match.

It provides two closely related functions, `glob.glob` and `glob.iglob`, which do essentially the same thing: take a pattern string from the user and find all path names that match it. `glob.glob` returns the matches as a list, while `glob.iglob` returns an iterator over them.

In [1]:
import glob

The syntax is straightforward: `glob.glob(_pathname_)`
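To see how `glob.glob` relates to its sibling `glob.iglob`, here is a minimal sketch. It assumes nothing about our data folders: it creates a temporary directory with a few empty files (the file names are made up for illustration) and searches that instead.

```python
import glob
import os
import tempfile

# Build a throwaway directory with a few empty files so the example runs anywhere.
# The file names are hypothetical and only serve the illustration.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("a.npz", "b.npz", "notes.txt"):
        open(os.path.join(tmp, fname), "w").close()

    pattern = os.path.join(tmp, "*")

    # glob.glob returns a list holding every matching path.
    as_list = glob.glob(pattern)

    # glob.iglob returns an iterator that yields the same paths one at a time.
    from_iterator = list(glob.iglob(pattern))

    # The order of the results is not guaranteed, so sort before comparing.
    same_matches = sorted(as_list) == sorted(from_iterator)
    n_matches = len(as_list)

print(n_matches, same_matches)
```

For a handful of files the list from `glob.glob` is the more convenient choice; the iterator mainly helps when a directory holds a very large number of files.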

For this pipeline, we want all the files stored inside the folder "data_1" which is in our current working directory.

In [2]:
files = glob.glob("./data_1/*")
print(files)

['./data_1/1756.npz', './data_1/0737.npz', './data_1/1102.npz', './data_1/1534.npz']


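The `*` character is a special character called a wildcard, which matches any sequence of characters in a file name. Used on its own, as above, it matches every file in the given directory, except hidden files whose names begin with a dot. This syntax follows Unix shell-style pattern matching (https://docs.python.org/3/library/glob.html). It is not the same as "regular expressions", which are handled by the separate `re` module (https://docs.python.org/3.4/library/re.html).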
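A few more wildcard patterns can be tried without touching the disk. The sketch below uses the standard-library `fnmatch` module, which `glob` relies on for matching file names. The list of names is hypothetical, and `glob`'s special handling of names that start with a dot is not covered here.

```python
import fnmatch

# Hypothetical file names, used only to illustrate the patterns.
names = ["1756.npz", "0737.npz", "1102.txt", "notes.npz"]

# "*" matches any sequence of characters, so every name matches.
everything = fnmatch.filter(names, "*")

# "*.npz" keeps only names that end in ".npz".
npz_only = fnmatch.filter(names, "*.npz")

# "?" matches exactly one character; "[0-9]" matches exactly one digit.
starts_with_digit = fnmatch.filter(names, "[0-9]*.npz")
four_chars_then_npz = fnmatch.filter(names, "????.npz")

print(npz_only)
print(starts_with_digit)
```

With `glob`, the same patterns go at the end of a path, for example `glob.glob("./data_1/*.npz")` to pick up only the `.npz` files.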

### _Problem_

Use glob to inspect the files in the directory `data_2` and count the number of files in that directory.

***

The paths to all the files that hold our data are stored in the list *files*. We can now open the data with NumPy's `load` function.

In [3]:
import numpy as np

In [4]:
example = np.load(files[0])

The `.npz` files we are using contain three numpy arrays, zipped together. We will need to separate them, which can be done as follows:

In [5]:
x_array = example['arr_0']
y_array = example['arr_1']
y_err = example['arr_2']
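The names `arr_0`, `arr_1` and `arr_2` are the defaults NumPy assigns when arrays are passed to `np.savez` without keywords, which is presumably how our files were written. The sketch below, using made-up data and a temporary file, shows where those names come from and how to list the names stored in any `.npz` file.

```python
import os
import tempfile

import numpy as np

# Made-up data with the same x grid as above: 10, 110, ..., 3210 (33 points).
x = np.arange(10, 3211, 100)
y = 2.0 * x                # hypothetical y values
err = np.ones_like(y)      # hypothetical uncertainties

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npz")

    # Arrays passed positionally are stored as arr_0, arr_1, arr_2, ...
    np.savez(path, x, y, err)

    # The loaded object behaves like a dictionary; .files lists its keys.
    with np.load(path) as archive:
        keys = list(archive.files)
        x_loaded = archive["arr_0"]

print(keys)
print(len(x_loaded))
```

Checking `.files` first is a quick way to find out what an unfamiliar `.npz` file contains before pulling the arrays out.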

In [6]:
x_array

array([  10,  110,  210,  310,  410,  510,  610,  710,  810,  910, 1010,
       1110, 1210, 1310, 1410, 1510, 1610, 1710, 1810, 1910, 2010, 2110,
       2210, 2310, 2410, 2510, 2610, 2710, 2810, 2910, 3010, 3110, 3210])

In [10]:
len(y_array), len(y_err)

(33, 33)

### _Problem_

Open any *two* files in `data_2` and plot the x and y values.

## Accepting input from user

The final tool we'll need is the ability to take input or feedback from the user. This can be done with Python's built-in `input` function.

In [14]:
name = input("What is your name: ")
age = input("What is your age: ")

print("Wow, " + name + ", you're already " + age + ' years old!')

What is your name: 1
What is your age: Nihan
Wow, 1, you're already Nihan years old!


**Note:** The value returned by `input` is always of type `str`, even when the user types a number.

In [15]:
type(name)

str
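Because `input` always returns a string, a number typed by the user has to be converted before it can be used as one. Here is a minimal sketch of one way to do that; the helper name `parse_age` is made up for this example and is called on fixed strings so the block runs without waiting for keyboard input.

```python
def parse_age(raw):
    """Convert text typed by the user to an int, or return None if that fails."""
    try:
        return int(raw.strip())
    except ValueError:
        # int() raises ValueError when the text is not a whole number.
        return None


print(parse_age("25"))
print(parse_age("Nihan"))
```

In a notebook this could be combined with `input` as `age = parse_age(input("What is your age: "))`, after which a result of `None` signals that the answer was not a whole number.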

### _Problem_

Ask the user for the location of the data, pass it to `glob.glob`, and print the number of files found along with their names.