### Creating simple artificial intelligence in Python

In this tutorial, we will classify which species of Iris a set of measurements belongs to: setosa, versicolor, or virginica. Using sklearn, we will train a model to do this automatically, so that it can predict the species of a flower *even when we don't know in advance what that species is*.

#### Step 1: Data Preparation
First, we need to load our data from a file and separate it into two arrays:

- Training Array (often denoted as "X")
    - Each entry holds the measurements (features) for one sample. In our example, these are the sepal length and width and the petal length and width.
- Target Array (often denoted as "y")
    - Each entry holds the *answer* (label) for the corresponding entry of the training array.
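Concretely, both arrays can be plain Python lists: `X` holds one inner list of four measurements per flower, and `y` holds one label per flower at the same position. Here is a minimal sketch using the first three rows of the setosa file shown later in this tutorial, assuming (purely for illustration) that setosa is encoded as the integer `0`:

```python
# Each inner list is one flower: sepal length, sepal width, petal length, petal width
X = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
]

# One answer per flower, in the same order as X.
# Using 0 to stand for "Iris-setosa" is an assumption for illustration.
y = [0, 0, 0]

# Every row of X needs a matching entry in y
assert len(X) == len(y)
for row, label in zip(X, y):
    print(row, "->", label)
```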

For simplicity, here is a toy example that labels six-letter words as English or German:

|Training Array|Target Array|
|-----|-----|
|ANYONE|English|
|UPROAR|English|
|YELLOW|English|
|BÄRGET|German|
|ZURUFE|German|
|WÜSTEM|German|
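Classifiers in `sklearn` work on numbers, not strings, so words like these would have to be turned into numeric features before training. One hypothetical encoding (not something this tutorial uses later) replaces each letter with its Unicode code point via `ord()`. Because all six words have six letters, every row ends up the same length, and the language names become integer labels:

```python
words = ["ANYONE", "UPROAR", "YELLOW", "BÄRGET", "ZURUFE", "WÜSTEM"]
languages = ["English", "English", "English", "German", "German", "German"]

# Hypothetical feature encoding: one number per letter (its Unicode code point)
X = [[ord(letter) for letter in word] for word in words]

# Map each language name to an integer class label
label_ids = {"English": 0, "German": 1}
y = [label_ids[lang] for lang in languages]

print(X[3])  # BÄRGET -> [66, 196, 82, 71, 69, 84]
print(y)     # [0, 0, 0, 1, 1, 1]
```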

Our real data is the Iris dataset rather than words. First, let's build the training array. For reference and simplicity, we will load the data with Python's built-in file handling rather than with `pandas`.

##### Traditionally loading files in Python
You can open files natively, without any libraries, using the built-in `open()` function. It returns a file object; for a file opened in text mode (as here, with `'r'`), that object is an `io.TextIOWrapper`.

```python
my_file = open("iris-setosa.csv", 'r')
print(type(my_file))
```

```
<class '_io.TextIOWrapper'>
```
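Files opened with `open()` should be closed once you are done with them. A common idiom is the `with` statement, which closes the file automatically, even if an error occurs inside the block. To keep this sketch runnable anywhere, it first writes a small temporary CSV (the name `demo.csv` and its contents are made up for illustration):

```python
import io
import os
import tempfile

# Create a small throwaway CSV so the example is self-contained (hypothetical data)
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w") as f:
    f.write("sepal_length,sepal_width,petal_length,petal_width,species\n")
    f.write("5.1,3.5,1.4,0.2,Iris-setosa\n")

# 'r' opens the file for reading in text mode
with open(path, "r") as my_file:
    print(type(my_file))                          # <class '_io.TextIOWrapper'>
    print(isinstance(my_file, io.TextIOWrapper))  # True
    header = my_file.readline()

# Leaving the with-block closed the file for us
print(my_file.closed)  # True
```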


To read from a TextIOWrapper object, you can call `read()`, which returns the whole file as one string, or `readlines()`, which returns a list of strings with one element per line of the file.

You can also loop over the file object directly, as we do here; this yields the same lines one at a time.

```python
# Iterating over a file object yields one line at a time.
# Each line already ends with "\n", so end="" stops print() adding a second one.
for text_line in my_file:
    print(text_line, end="")

my_file.close()  # release the file now that we are done with it
```

```
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0
```
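To make the difference between the reading methods concrete, here is a small comparison of `read()`, `readlines()`, and plain iteration on a made-up two-line file (the file name and contents are hypothetical). Note that every line keeps its trailing newline character, which you usually want to strip before splitting on commas:

```python
import os
import tempfile

# Hypothetical two-line file, just to compare the three reading styles
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w") as f:
    f.write("sepal_length,sepal_width,petal_length,petal_width,species\n")
    f.write("5.1,3.5,1.4,0.2,Iris-setosa\n")

with open(path) as f:
    whole_text = f.read()        # one string containing the entire file

with open(path) as f:
    line_list = f.readlines()    # list of strings, each still ending in "\n"

with open(path) as f:
    # Iterating yields the same lines as readlines(), one at a time
    iterated = [line for line in f]

print(repr(whole_text))
print(line_list == iterated)                      # True
print([line.rstrip("\n") for line in line_list])  # newlines removed
```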

Now, using what we learnt about glob yesterday, we can load all of the files in a loop. Instead of pandas, we will use the same built-in method as above.

#### Activity: Read multiple files using glob, and append all values into a list

Using the notebook from yesterday, create a cell that loads every `*.csv` file and appends each line to a *single list*.

The most succinct way to do this, without being too obtuse, is the following:

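```python
import glob

training = []    # measurements: one list of four floats per flower
target = []      # integer species id for each flower
label_dict = {}  # species name -> integer id

# sorted() makes the file order, and therefore the ids, repeatable
for item in sorted(glob.glob("*.csv")):
    # "with" closes each file as soon as its lines have been read
    with open(item, 'r') as csv_file:
        lines = csv_file.readlines()
    del lines[0]  # drop the header row
    for line in lines:
        # strip() removes "\n" (and "\r" from Windows line endings)
        values = line.strip().split(",")
        if len(values) < 2:
            continue  # skip blank lines
        # All columns except the last are measurements
        training.append([float(i) for i in values[:-1]])
        # The last column is the species name
        label = values[-1]
        try:
            target.append(label_dict[label])
        except KeyError:
            # First time we see this species: give it the next free id
            label_dict[label] = len(label_dict)
            target.append(label_dict[label])
```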
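An alternative sketch uses the standard library's `csv` module to do the splitting and `dict.setdefault` in place of the `try`/`except` block; this is a swapped-in technique, not the method used above. To keep it runnable on its own, it first writes two tiny sample CSV files (hypothetical names, using a few Iris rows) into a temporary folder. In the real activity you would point `glob` at your own Iris files instead:

```python
import csv
import glob
import os
import tempfile

# Hypothetical sample files so the sketch runs anywhere
folder = tempfile.mkdtemp()
header = "sepal_length,sepal_width,petal_length,petal_width,species"
samples = {
    "iris-setosa.csv": ["5.1,3.5,1.4,0.2,Iris-setosa", "4.9,3.0,1.4,0.2,Iris-setosa"],
    "iris-versicolor.csv": ["7.0,3.2,4.7,1.4,Iris-versicolor"],
}
for name, rows in samples.items():
    with open(os.path.join(folder, name), "w", newline="") as f:
        f.write("\n".join([header] + rows) + "\n")

training = []
target = []
label_dict = {}

for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            if not row:  # ignore blank lines
                continue
            # First four columns are measurements, the last one is the species
            training.append([float(value) for value in row[:-1]])
            species = row[-1].strip()
            # Returns the existing id, or stores and returns the next free one
            target.append(label_dict.setdefault(species, len(label_dict)))

print(target)      # [0, 0, 1]
print(label_dict)  # {'Iris-setosa': 0, 'Iris-versicolor': 1}
```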


Perfect! Both our training and target arrays are ready, so let's do some simple machine learning!

First, we need to import the classifier we want from the `sklearn` package, which is the import name of scikit-learn. We can do that with a `from ... import ...` statement.

This form binds only the *specific* name you ask for, so you can write `MLPClassifier` instead of `sklearn.neural_network.MLPClassifier`. It does not save memory or speed up your script, though: Python still loads the whole `sklearn.neural_network` module behind the scenes.
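You can check how `from ... import ...` behaves with a standard-library module. After `from fractions import Fraction`, the whole `fractions` module is cached in `sys.modules`, but only the name `Fraction` is added to your namespace. This sketch uses `fractions` instead of `sklearn` purely so it runs without extra packages:

```python
import sys
from fractions import Fraction

# The entire module was loaded and cached...
print("fractions" in sys.modules)   # True
# ...but only the requested name was bound here
print("Fraction" in globals())      # True
print("fractions" in globals())     # False
```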

```python
from sklearn.neural_network import MLPClassifier
```

Using `sklearn` is straightforward: creating and training a neural network takes just two lines.

```python
# Create the MLPClassifier
mlp_nn = MLPClassifier()

# Fit (or train) the MLPClassifier with the training
# and corresponding target data.
mlp_nn.fit(training, target)
```

```
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)
```
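`fit()` returns the trained classifier itself, which is why the notebook echoes its settings. To get an honest idea of how well it generalizes, you would normally hold back some flowers that it never sees during training. The sketch below does that with sklearn's bundled copy of the Iris data (`load_iris`) and `train_test_split`, rather than the CSV files above. The 25% test size, `max_iter=1000`, and `random_state=0` are arbitrary choices for illustration, not values from this tutorial:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# sklearn ships its own copy of the Iris measurements and integer labels
X, y = load_iris(return_X_y=True)

# Keep 25% of the flowers aside; stratify keeps the species balanced in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# More iterations give the optimizer more passes to converge (arbitrary choice)
mlp_nn = MLPClassifier(max_iter=1000, random_state=0)
mlp_nn.fit(X_train, y_train)

# Accuracy on flowers the network never saw during training
accuracy = mlp_nn.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```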

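Finally, we ask the trained network to classify one flower, `training[145]`. `predict()` expects a list of samples (hence the extra brackets) and returns an array of the integer ids stored in `label_dict`. Since this flower was also part of the training data, the result only shows that the network has learned what it saw, not how well it handles new flowers.

```python
mlp_nn.predict([training[145]])
```

```
array([3])
```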
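Because `label_dict` maps species names to ids, inverting it turns a prediction back into a name. A minimal sketch with a hypothetical `label_dict` (your actual numbering depends on the order in which the files were loaded):

```python
# Hypothetical mapping; in the notebook this is built by the loading loop
label_dict = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}

# Invert it: integer id -> species name
id_to_species = {label_id: name for name, label_id in label_dict.items()}

prediction = [2]  # stand-in for the array returned by mlp_nn.predict(...)
print([id_to_species[p] for p in prediction])  # ['Iris-virginica']
```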