# University of Aberdeen

## Applied AI (CS5079)

### Lecture (Day 1) - Introduction to Applied AI

---

In this lecture, we introduced several challenges and techniques of artificial intelligence, along with the essential packages used in the course.



## NumPy

NumPy is a Python library for representing large, multi-dimensional arrays along with high-level mathematical functions to operate on these arrays.

### Importing and creating an array

In [1]:
#Importing the package
import numpy as np

#Creating a NumPy array
a = np.array([[9.3, 8.1, 7.2, 3.2, 4.6, 5.1],
              [6.0, 5.0, 4.0, 1.3, 2.7, 9.8]])

print(a)

[[9.3 8.1 7.2 3.2 4.6 5.1]
 [6.  5.  4.  1.3 2.7 9.8]]


In [2]:
#Getting the shape of the array
a.shape 

(2, 6)

In [3]:
#Getting the data type
a.dtype

dtype('float64')

### Basic Indexing

In [4]:
#Getting a specific element (last element of second row)
a[1,-1]

9.8

In [5]:
#Getting a subset of the array
a[:,1:6:2]

array([[8.1, 3.2, 5.1],
       [5. , 1.3, 9.8]])
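
The slice `1:6:2` reads as `start:stop:step`: start at column 1, stop before column 6, and move in steps of 2. The sketch below repeats the selection on the same array and, as an additional illustration that is not part of the lecture, contrasts it with selecting the same columns through an index list. The names `subset` and `picked` are only illustrative.

```python
import numpy as np

a = np.array([[9.3, 8.1, 7.2, 3.2, 4.6, 5.1],
              [6.0, 5.0, 4.0, 1.3, 2.7, 9.8]])

# ":" keeps every row; "1:6:2" takes columns 1, 3 and 5
subset = a[:, 1:6:2]

# The same columns selected with an explicit index list (this makes a copy)
picked = a[:, [1, 3, 5]]
print(np.array_equal(subset, picked))  # True

# A basic slice is a view on the original data: writing to it changes `a`
subset[0, 0] = -1.0
print(a[0, 1])       # -1.0
print(picked[0, 0])  # 8.1, the copy is unaffected
```

When an independent array is needed, calling `.copy()` on the slice avoids modifying the original by accident.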

In [6]:
# Reshaping
a[1,:].reshape(2,3)

array([[6. , 5. , 4. ],
       [1.3, 2.7, 9.8]])

### Boolean Indexing

In [7]:
persons = np.array(["john", "alice", "peter"])
person_data = np.random.randn(3,2)
print(person_data)

[[ 0.34206189 -0.87175256]
 [-0.69493669 -0.62761668]
 [-1.87086112  0.76290692]]


In [8]:
person_data[persons == "alice"]

array([[-0.69493669, -0.62761668]])

In [9]:
# Replace values
person_data[person_data > 0] = 0
print(person_data)

[[ 0.         -0.87175256]
 [-0.69493669 -0.62761668]
 [-1.87086112  0.        ]]
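
Boolean masks can be combined, and values can be replaced without overwriting the original array. The following is a minimal sketch under the assumption that fixed (rounded) values stand in for the random `person_data` above; `mask`, `selected` and `clipped` are illustrative names.

```python
import numpy as np

persons = np.array(["john", "alice", "peter"])
# Fixed values used here instead of np.random.randn so the result is repeatable
person_data = np.array([[0.3, -0.9],
                        [-0.7, -0.6],
                        [-1.9, 0.8]])

# Combine conditions element-wise with | (or) and & (and); parentheses are required
mask = (persons == "alice") | (persons == "peter")
selected = person_data[mask]   # rows 1 and 2
print(selected.shape)          # (2, 2)

# np.where builds a new array: 0 where the condition holds, the old value elsewhere
clipped = np.where(person_data > 0, 0.0, person_data)
print(clipped[0, 0], person_data[0, 0])  # 0.0 0.3
```

Unlike the in-place assignment `person_data[person_data > 0] = 0`, `np.where` leaves `person_data` untouched.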


### Generating arrays

In [10]:
#Creating a matrix filled with a single value (see also np.ones and np.zeros)
np.full((2,3),0)

array([[0, 0, 0],
       [0, 0, 0]])

In [11]:
#Creating the Identity matrix of size n
b = np.identity(3)
print(b)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [12]:
#Creating a NumPy array with the sequence of integers from 0 to n-1 (here n = 6)
c = np.arange(6).reshape(2,3)
print(c)

[[0 1 2]
 [3 4 5]]
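
`np.arange` excludes its stop value, which is why `np.arange(6)` ends at 5. The short sketch below shows this, together with a start/step variant and `np.linspace`, which includes both endpoints. The variable names are illustrative only.

```python
import numpy as np

# The stop value is excluded
first_six = np.arange(6)
print(first_six)             # [0 1 2 3 4 5]

# start, stop (excluded), step
evens = np.arange(2, 10, 2)
print(evens)                 # [2 4 6 8]

# linspace takes the number of points and includes both endpoints
points = np.linspace(0, 1, 5)
print(points)                # [0.   0.25 0.5  0.75 1.  ]
```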


### Linear Algebra

In [13]:
#The dot product (here the matrix product of c and person_data)
d = c.dot(person_data)
print(d)

[[ -4.43665894  -0.62761668]
 [-12.1340524   -5.12572441]]
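
For two-dimensional arrays, `dot` is the matrix product: a `(2, 3)` array times a `(3, 2)` array gives a `(2, 2)` result, and the inner dimensions must match. The sketch below uses a small integer matrix `m` (an assumption, chosen so the result can be checked by hand) and the equivalent `@` operator.

```python
import numpy as np

c = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]
m = np.array([[1, 0],
              [0, 1],
              [1, 1]])           # hypothetical (3, 2) matrix

product = c.dot(m)               # shape (2, 3) x (3, 2) -> (2, 2)
same_product = c @ m             # the @ operator is equivalent for 2-D arrays
print(product)                   # rows: [2 3] and [8 9]

# Mismatched inner dimensions raise a ValueError
try:
    c.dot(c)                     # (2, 3) x (2, 3) is not defined
except ValueError as err:
    print("shape error:", err)
```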


In [14]:
#The transpose of a matrix
np.transpose(c)

array([[0, 3],
       [1, 4],
       [2, 5]])

In [15]:
#The SVD factorisation
u, s, vh = np.linalg.svd(d)

#The matrix d can be recomposed using the following code
print(u.dot(np.diag(s).dot(vh)))

[[ -4.43665894  -0.62761668]
 [-12.1340524   -5.12572441]]
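
The recomposition `u.dot(np.diag(s).dot(vh))` works here because `d` is square. For a non-square matrix, one option is to request the reduced factorisation with `full_matrices=False` so that the shapes line up. This is a sketch on a hypothetical `(2, 3)` matrix `m`, checked with `np.allclose` because floating-point results are rarely exactly equal.

```python
import numpy as np

m = np.arange(6, dtype=float).reshape(2, 3)   # hypothetical non-square matrix

# Reduced SVD: u is (2, 2), s holds 2 singular values, vh is (2, 3)
u, s, vh = np.linalg.svd(m, full_matrices=False)
print(u.shape, s.shape, vh.shape)             # (2, 2) (2,) (2, 3)

# Rebuild the matrix and compare up to floating-point tolerance
rebuilt = u @ np.diag(s) @ vh
print(np.allclose(rebuilt, m))                # True
```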


In [16]:
#Solving equations
# 7x+5y−3z=16, 3x−5y+2z=−8 and 5x+3y=0
 
a = np.array([[7,5,-3], [3, -5, 2], [5, 3, 0]])
b = np.array([16,-8, 0])
np.linalg.solve(a,b)

array([ 0.25531915, -0.42553191, -5.44680851])
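
A quick way to check the solution is to substitute it back into the system. The sketch below does this with `np.allclose`; the names `A` and `x` are illustrative (the cell above reuses `a` and `b`, which overwrites the arrays defined earlier).

```python
import numpy as np

# Coefficients and right-hand side of:
# 7x + 5y - 3z = 16, 3x - 5y + 2z = -8, 5x + 3y = 0
A = np.array([[7, 5, -3],
              [3, -5, 2],
              [5, 3, 0]])
b = np.array([16, -8, 0])

x = np.linalg.solve(A, b)

# Substituting the solution back should reproduce b (up to rounding)
print(np.allclose(A @ x, b))   # True
```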

## Pandas

Pandas is a Python library for data manipulation and analysis. It offers data structures (Series and DataFrames) and operations for manipulating them.
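
As a rough illustration of the two structures: a Series is a one-dimensional labelled array, and a DataFrame is a table whose columns are Series sharing one index. The sketch below reuses the city values from the example that follows; the variable names are illustrative.

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels
s = pd.Series([1000, 2000], index=["Delhi", "Bangalore"], name="data")
print(s["Delhi"])                  # 1000

# A DataFrame: a table of named columns
df = pd.DataFrame({"city": ["Delhi", "Bangalore"], "data": [1000, 2000]})

# Selecting a single column returns a Series
col = df["data"]
print(type(col).__name__)          # Series
```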

### Data Retrieval

In [17]:
#Importing the package
import pandas as pd

#List of dictionaries to DataFrame
d = [{'city':'Delhi',"data":1000}, {'city':'Bangalore',"data":2000}]
pd.DataFrame(d)

        city  data
0      Delhi  1000
1  Bangalore  2000


In [18]:
#NumPy array to DataFrame
c = pd.DataFrame(np.random.randn(2,3), columns=['A','B','C'])
print(c)

          A         B         C
0 -1.319606 -0.170999 -0.584205
1 -2.080703 -0.037099  0.272024


In [19]:
#CSV to DataFrame
cd = pd.read_csv("Resources/worldcities.csv")
print(cd)

              city   city_ascii      lat       lng        country iso2 iso3  \
0            Tokyo        Tokyo  35.6850  139.7514          Japan   JP  JPN   
1         New York     New York  40.6943  -73.9249  United States   US  USA   
2      Mexico City  Mexico City  19.4424  -99.1310         Mexico   MX  MEX   
3           Mumbai       Mumbai  19.0170   72.8570          India   IN  IND   
4        São Paulo    Sao Paulo -23.5587  -46.6250         Brazil   BR  BRA   
...            ...          ...      ...       ...            ...  ...  ...   
15488  Timmiarmiut  Timmiarmiut  62.5333  -42.2167      Greenland   GL  GRL   
15489  Cheremoshna  Cheremoshna  51.3894   30.0989        Ukraine   UA  UKR   
15490    Ambarchik    Ambarchik  69.6510  162.3336         Russia   RU  RUS   
15491      Nordvik      Nordvik  74.0165  111.5100         Russia   RU  RUS   
15492      Ennadai      Ennadai  61.1333 -100.8833         Canada   CA  CAN   

               admin_name  capital  population     

### Basic Access

In [20]:
#Head and tail
cd.head(2)

       city  city_ascii      lat       lng        country  iso2  iso3  admin_name  capital  population          id
0     Tokyo       Tokyo   35.685  139.7514          Japan    JP   JPN       Tōkyō  primary  35676000.0  1392685764
1  New York    New York  40.6943  -73.9249  United States    US   USA    New York      NaN  19354922.0  1840034016


In [21]:
#iloc
cd.iloc[:2,:4]

       city  city_ascii      lat       lng
0     Tokyo       Tokyo   35.685  139.7514
1  New York    New York  40.6943  -73.9249
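
`iloc` selects by integer position, while its counterpart `loc` selects by index and column labels. The sketch below contrasts the two on a three-row frame built from the first rows shown above; the index labels 10, 20 and 30 are hypothetical and chosen so that positions and labels differ.

```python
import pandas as pd

cities = pd.DataFrame(
    {"city": ["Tokyo", "New York", "Mexico City"],
     "lat": [35.6850, 40.6943, 19.4424],
     "lng": [139.7514, -73.9249, -99.1310]},
    index=[10, 20, 30],   # hypothetical labels
)

# iloc: positions, end of the slice excluded (rows 0-1, columns 0-1)
by_position = cities.iloc[:2, :2]

# loc: labels, end of the slice included (rows labelled 10 to 20)
by_label = cities.loc[10:20, ["city", "lat"]]

print(by_position.equals(by_label))   # True
```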


### Boolean Access

In [22]:
cd[cd.population > 20000000][["city", "population"]]

    city  population
0  Tokyo  35676000.0
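
Several conditions can be combined in one boolean selection with `&` (and) and `|` (or), each wrapped in parentheses. The sketch below uses three rows whose values appear elsewhere in this notebook's outputs; the frame `cities` is a small stand-in for `cd`.

```python
import pandas as pd

cities = pd.DataFrame({
    "city": ["Tokyo", "New York", "Ocean Springs"],
    "country": ["Japan", "United States", "United States"],
    "population": [35676000.0, 19354922.0, 17682.0],
})

# Both conditions must hold
us_big = cities[(cities.population > 10000000) & (cities.country == "United States")]
print(us_big["city"].tolist())    # ['New York']

# At least one condition must hold
either = cities[(cities.population > 20000000) | (cities.country == "United States")]
print(len(either))                # 3
```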


In [23]:
#The where method will keep the shape of the DataFrame
e = cd[cd.population > 10000000].where(cd.population > 20000000).head(2)
print(e)

    city city_ascii     lat       lng country iso2 iso3 admin_name  capital  \
0  Tokyo      Tokyo  35.685  139.7514   Japan   JP  JPN      Tōkyō  primary   
1    NaN        NaN     NaN       NaN     NaN  NaN  NaN        NaN      NaN   

   population            id  
0  35676000.0  1.392686e+09  
1         NaN           NaN  
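
The difference between boolean selection and `where` is easiest to see side by side: selection drops the rows that fail the condition, whereas `where` keeps every row and fills the failing ones with `NaN`. A minimal sketch, assuming a small stand-in frame `cities` in place of `cd`:

```python
import pandas as pd

cities = pd.DataFrame({
    "city": ["Tokyo", "New York", "Ocean Springs"],
    "population": [35676000.0, 19354922.0, 17682.0],
})
big = cities.population > 20000000

filtered = cities[big]        # only the matching row is kept
masked = cities.where(big)    # same shape, non-matching rows become NaN

print(filtered.shape)         # (1, 2)
print(masked.shape)           # (3, 2)
```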


In [24]:
#Replace missing values
f = e.fillna("No value")
print(f)

       city city_ascii       lat       lng   country      iso2      iso3  \
0     Tokyo      Tokyo    35.685   139.751     Japan        JP       JPN   
1  No value   No value  No value  No value  No value  No value  No value   

  admin_name   capital  population           id  
0      Tōkyō   primary  3.5676e+07  1.39269e+09  
1   No value  No value    No value     No value  
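
Filling every column with the same string mixes text into the numeric columns. One alternative, sketched here as an assumption rather than as the lecture's approach, is to pass `fillna` a dictionary with one replacement per column so that numeric columns keep their type. The frame `e` below is a two-column stand-in for the one above, and the replacement value `0.0` is arbitrary.

```python
import numpy as np
import pandas as pd

e = pd.DataFrame({"city": ["Tokyo", np.nan],
                  "population": [35676000.0, np.nan]})

# One fill value per column: text for the text column, a number for the numeric one
filled = e.fillna({"city": "No value", "population": 0.0})

print(filled.loc[1, "city"])          # No value
print(filled["population"].dtype)     # float64
```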


### Descriptive Statistics Functions

Pandas gives access to several statistical functions, such as `mean`, `sum`, `count`, `median`, `quantile` and `describe`.

In [25]:
#Sum by columns
cd[["population", "lat"]].sum()

population    2.502672e+09
lat           4.591090e+05
dtype: float64

In [26]:
#Mean by rows
cd[cd.population > 19000000][["lng", "lat"]].mean(axis=1)

0    87.7182
1   -16.6153
2   -39.8443
dtype: float64
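
The `axis` argument decides the direction of the aggregation: `axis=0` (the default) produces one value per column, and `axis=1` produces one value per row. The sketch below reproduces the first two row means above from the Tokyo and New York coordinates; the frame `coords` is an illustrative stand-in.

```python
import pandas as pd

coords = pd.DataFrame(
    {"lng": [139.7514, -73.9249], "lat": [35.6850, 40.6943]},
    index=["Tokyo", "New York"],
)

col_means = coords.mean(axis=0)   # one value per column (lng, lat)
row_means = coords.mean(axis=1)   # one value per row (Tokyo, New York)

print(round(row_means["Tokyo"], 4))      # 87.7182
print(round(row_means["New York"], 4))   # -16.6153
```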

In [27]:
#The describe method provides useful summary statistics for a DataFrame
cd[["lat", "lng", "population"]].describe()

             lat         lng  population
count    15493.0     15493.0     13808.0
mean   29.633315  -29.834189    181248.0
std    22.414727   76.340457    794798.9
min     -54.9333     -179.59         0.0
25%       22.305    -86.3242      9167.5
50%      37.7562    -71.9167     23496.5
75%      42.4442     25.5821    90306.25
max      82.4833    179.3833  35676000.0


### Concatenation

In [28]:
#Adding a new row
pd.concat([cd.sample(1),cd.head(1)])

                city     city_ascii      lat       lng        country  iso2  iso3   admin_name  capital  population          id
10202  Ocean Springs  Ocean Springs  30.4082  -88.7861  United States    US   USA  Mississippi      NaN     17682.0  1840015017
0              Tokyo          Tokyo   35.685  139.7514          Japan    JP   JPN        Tōkyō  primary  35676000.0  1392685764


In [29]:
#Adding a new column
pd.concat([f[["city","lat"]],pd.DataFrame({'NC' : [99]}, index = [999999])], axis = 1)

            city       lat    NC
0          Tokyo    35.685   NaN
1       No value  No value   NaN
999999       NaN       NaN  99.0
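
With `axis=1`, `pd.concat` aligns rows on the index, which explains the output above: the label `999999` does not exist in `f`, so it becomes an extra row and the unmatched cells are filled with `NaN`. The sketch below contrasts a matching index with a non-matching one; the frame `f` and the values in `NC` are hypothetical.

```python
import pandas as pd

f = pd.DataFrame({"city": ["Tokyo", "New York"], "lat": [35.685, 40.6943]})

# Same index as f: the new column lines up row by row
nc = pd.DataFrame({"NC": [99, 100]}, index=f.index)
aligned = pd.concat([f, nc], axis=1)
print(aligned.shape)       # (2, 3)

# Different index: an extra row appears and missing cells become NaN
other = pd.DataFrame({"NC": [99]}, index=[999999])
misaligned = pd.concat([f, other], axis=1)
print(misaligned.shape)    # (3, 3)
```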


## Scikit-learn

Scikit-learn is a Python machine learning library that features classification, regression and clustering algorithms.

### A simple example

In [30]:
# Scikit-learn provides toy datasets, models and techniques for model selection
from sklearn import datasets
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

#This dataset is already standardized (442 entries)
diabetes = datasets.load_diabetes()
feature_names=['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3','s4', 's5', 's6']

#The target attribute is the disease progression one year after baseline
y = diabetes.target
#10 numeric baseline variables 
X = diabetes.data

#We split the dataset into training and test datasets.
X_train = X[:300]
y_train = y[:300]
X_test = X[300:]
y_test = y[300:]

#Here we use the Lasso model and try different values for alpha
lasso = Lasso(random_state=0)
alphas = np.logspace(-4, -0.5, 30)

#We initialise the cross validation grid search and fit the training data
estimator = GridSearchCV(lasso, dict(alpha=alphas))
estimator.fit(X_train,y_train)

GridSearchCV(estimator=Lasso(random_state=0),
             param_grid={'alpha': array([1.00000000e-04, 1.32035178e-04, 1.74332882e-04, 2.30180731e-04,
       3.03919538e-04, 4.01280703e-04, 5.29831691e-04, 6.99564216e-04,
       9.23670857e-04, 1.21957046e-03, 1.61026203e-03, 2.12611233e-03,
       2.80721620e-03, 3.70651291e-03, 4.89390092e-03, 6.46167079e-03,
       8.53167852e-03, 1.12648169e-02, 1.48735211e-02, 1.96382800e-02,
       2.59294380e-02, 3.42359796e-02, 4.52035366e-02, 5.96845700e-02,
       7.88046282e-02, 1.04049831e-01, 1.37382380e-01, 1.81393069e-01,
       2.39502662e-01, 3.16227766e-01])})
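
The cell above splits the data by position (first 300 rows for training). A common alternative, sketched here as an assumption rather than as the lecture's method, is `train_test_split`, which shuffles the rows before splitting. The `test_size=142` and `random_state=0` values are chosen only to mirror the 300/142 split and make it repeatable.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Shuffle, then hold out 142 of the 442 rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=142, random_state=0
)

print(X_train.shape, X_test.shape)   # (300, 10) (142, 10)
```

Shuffling matters when the rows of a dataset are stored in a meaningful order, since a positional split would then give training and test sets with different distributions.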

In [31]:
#We can obtain the best score
estimator.best_score_

0.46809897288058816
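
`best_score_` is the mean cross-validated score of the best parameter setting; for a regressor such as `Lasso`, the default score is the coefficient of determination R². The sketch below makes that link explicit through `cv_results_`. It uses a coarser grid of 10 values and sets `cv=5` explicitly, both of which are assumptions made to keep the example short, so its numbers need not match the output above.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_diabetes(return_X_y=True)
X_train, y_train = X[:300], y[:300]

alphas = np.logspace(-4, -0.5, 10)   # coarser grid than in the cell above
search = GridSearchCV(Lasso(random_state=0), {"alpha": alphas}, cv=5)
search.fit(X_train, y_train)

# One mean score (R^2 averaged over the 5 folds) per candidate alpha
mean_scores = search.cv_results_["mean_test_score"]

print(search.best_params_)
print(np.isclose(search.best_score_, mean_scores.max()))   # True
```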

In [32]:
#We can also get the best estimator directly
estimator.best_estimator_

Lasso(alpha=0.05968456995122311, random_state=0)

In [33]:
#Finally, we use the best model to predict on the test dataset
estimator.predict(X_test)

array([223.08988292, 123.41342452, 204.53641304, 232.05395609,
       116.23429477, 126.22454464, 128.72347463, 148.15215352,
        88.09425404, 147.96569816, 201.0957727 , 176.29547129,
       122.64404762, 212.55070667, 171.7115962 , 116.61146604,
       202.17824782, 168.34580243, 164.10850349, 187.87535435,
       187.71610711, 278.61918586, 290.9539084 , 233.46552614,
       204.07225409, 226.28950633, 156.22670401, 223.76124122,
       189.85606422, 105.10776928, 168.84703905, 111.39786661,
       285.29363378, 177.03775515,  80.71930061,  86.19992756,
       249.73947158, 163.33405043, 120.89636522, 154.99248179,
       160.89653403, 181.22700837, 162.93296937, 155.06630485,
       141.99377691, 127.54907378, 183.25474062, 106.98288797,
       129.16590876,  88.58871585, 253.14504794,  86.9761671 ,
        61.36172839, 187.69368844, 205.8046072 , 129.92058339,
        92.75697308, 201.60402698,  55.72905539, 169.99251632,
       192.04308437, 123.01671763, 231.31260277, 157.53
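
Predictions become more informative once they are compared with the held-out targets. The sketch below is one way to do that, assuming a `Lasso` fitted directly with `alpha=0.06` (a rounded stand-in for the best value found above, not the grid-search object itself) and two standard metrics from `sklearn.metrics`.

```python
from sklearn import datasets
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, r2_score

X, y = datasets.load_diabetes(return_X_y=True)
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# alpha=0.06 is an assumed, rounded value used only for this illustration
model = Lasso(alpha=0.06, random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)               # same quantity as model.score(X_test, y_test)
mae = mean_absolute_error(y_test, y_pred)   # average absolute error in target units
print(r2, mae)
```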

## Keras & TensorFlow

Keras is a Python library that provides an easy interface for artificial neural networks.
The default backend for Keras is TensorFlow, although up to version 2.3 it also supported other backends (such as Microsoft Cognitive Toolkit, Theano, and PlaidML).

### A simple example

In [34]:
#We introduce the model and the layer types from Keras
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import Dense, Dropout

#We introduce the example's dataset from Scikit-learn's dataset module
from sklearn.datasets import load_breast_cancer 
cancer = load_breast_cancer()

#We split the dataset into a training and test dataset
X_train = cancer.data[:340]
y_train = cancer.target[:340]
X_test = cancer.data[340:] 
y_test = cancer.target[340:]

#We create the model and add several layers
model = Sequential()
model.add(Dense(15, input_dim=30, activation='relu'))
model.add(Dense(15, activation='relu'))
model.add(Dense(15, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

#We compile the model by specifying the loss function and the optimizer
model.compile(loss='binary_crossentropy', optimizer='rmsprop',metrics=['accuracy'])

#We fit the model on the training dataset
model.fit(X_train, y_train, epochs=20, batch_size=50)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7fefa81735c0>
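
The final layer uses a sigmoid activation, which maps any real number to a value between 0 and 1, and the model is trained with the binary cross-entropy loss. The NumPy sketch below illustrates both formulas outside Keras; the inputs are hypothetical, and the clipping constant `eps` is an assumption added to avoid taking the logarithm of 0.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    p = np.clip(p, eps, 1 - eps)   # keep log() away from 0
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

scores = np.array([2.0, -1.0, 0.0])   # hypothetical pre-activation outputs
y_true = np.array([1, 0, 1])

p = sigmoid(scores)
loss = binary_crossentropy(y_true, p)
print(p, loss)
```

A prediction of 0.5 for every sample gives a loss of log(2), roughly 0.693, which is a useful reference point when reading training logs.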

In [37]:
#We use classification_report to evaluate the model
from sklearn.metrics import classification_report

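#model.predict_classes is not available in recent TensorFlow releases, so we threshold the sigmoid outputs at 0.5
y_pred = (model.predict(X_test) > 0.5).astype("int32").ravel()
print(classification_report(y_true=y_test, y_pred=y_pred))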


              precision    recall  f1-score   support

           0       0.91      0.55      0.68        55
           1       0.87      0.98      0.92       174

    accuracy                           0.88       229
   macro avg       0.89      0.76      0.80       229
weighted avg       0.88      0.88      0.87       229
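
In the report, precision is the share of predicted members of a class that are correct, recall is the share of true members that were found, and support is the number of true samples per class (55 + 174 = 229 test rows here). The sketch below computes precision and recall for class 1 by hand from a confusion matrix; the labels are a small hypothetical example, not the model's predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: 3 samples of class 0 and 5 samples of class 1
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 1, 1, 0])

# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)   # 4 / 5
recall = tp / (tp + fn)      # 4 / 5
print(precision, recall)     # 0.8 0.8

# The library functions give the same values for class 1
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```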

