# Artificial Intelligence CA1
## N00171313
## Craig Redmond
### 31/10/21

***

## 1. Introduction 
Artificial intelligence (AI) is the intelligence demonstrated by machines where they simulate human behaviour or thinking, and the machines can be trained to solve problems. AI has become very popular in today’s world. AI machines are capable of learning and performing tasks similar to humans and they have the potential to have a huge impact on our quality of life. 

## 2. Applications of AI
There are numerous applications of AI today, including robotics, healthcare, gaming, transportation, e-commerce, social media and marketing. Each area of application can be split into sub-sections. For example, on the internet and in e-commerce, AI is used to build search engines, recommendation systems and automatic translation. Transportation covers the automotive, aviation and maritime sectors, and so on. The two applications whose strengths and limitations I will discuss are AI in transport and AI in healthcare.

### 2.1 AI in Transport
#### Strengths and Limitations
AI can be used to predict and detect traffic conditions and accidents, and to solve control and optimisation problems. For example, autonomous trucks have been introduced in many parts of the world in recent years. These are expected to save costs, lower emissions and improve road safety compared to trucks driven by humans. The number of night-time accidents involving truck drivers is an issue that could be reduced with smart unmanned vehicles. This would reduce financial costs for drivers, and labour costs would decrease over time. The same technology is used in self-driving cars, which could allow a car to continue driving safely if the driver falls asleep at the wheel. AI can also make predictions from data collected by humans, which enables transportation operators to improve their services and operations (Soffar, 2019).

AI could have a significant impact on the transport industry. Developments in AI mean that trucks, cars, ships, aircraft and trains might become unmanned in the future. This raises a major concern about job losses for taxi drivers, truck drivers and other workers in the industry.

Implementation in developing countries and underdeveloped regions presents many challenges. Their infrastructure is not stable enough to support maintenance and repairs. Therefore, it could be a long time before AI in transport becomes a reality there (Anthony, 2017).

### 2.2 AI in Healthcare
#### Strengths and Limitations
AI has revolutionised the healthcare industry in recent years. It has enabled many innovations that seemed impossible years ago. In 2019, research company Gartner predicted that 75% of healthcare organisations would have invested in AI by 2021 to improve overall performance (Omale, 2019).

![title](img/fig1.jpg)
_Figure 1. Source - Accenture_

AI in healthcare has plentiful benefits but also comes with some risks and challenges. For example, AI system errors could put patients at risk of injury, and the patient data used by AI systems could put patients at risk of privacy invasion.

One of the main benefits of AI in this sector is the accessibility it provides. Many developing countries have not kept pace with the world’s technological progress, and their healthcare is not of a high standard. People living in some of these countries can be at risk of dying because it is practically impossible to get qualified help in time. AI innovations can be applied here to create an improved healthcare ecosystem. This sort of infrastructure can help patients understand their symptoms and receive the necessary treatment (Ilchenko, 2020).

AI in healthcare can increase speed and reduce costs. Thanks to AI algorithms, healthcare processes are faster and cost a fraction of what they originally did. From the patient’s examination through to diagnosis, AI has improved the speed and efficiency of the process. For example, AI can identify the biomarkers that suggest disease in our bodies. The algorithms have significantly reduced the manual work of specifying biomarkers, which allows clinicians to act faster and ultimately save more lives.

Despite the progress with AI in healthcare, there are some notable issues and challenges that are yet to be resolved.

One potential risk of AI in healthcare is errors that lead to injury. AI systems are susceptible to errors, which could result in patient injury or other problems. For example, an AI system may recommend an incorrect medication to a patient. If AI eventually performs radiological scans on patients, it could miss a tumour and lead to an incorrect diagnosis (Patel, 2020).

Another potential problem of AI in this area is the rise in unemployment that could occur among healthcare workers. Many activities that are traditionally carried out by humans could be done by machines. Robots and chatbots can provide mental health support, analyse a patient’s condition and predict future problems for patients. As a consequence, many healthcare workers could lose their jobs (Ilchenko, 2020).

## 3. Recommender Systems

Recommender systems are another branch of AI used in a variety of areas. Common examples are product recommenders for online stores like Amazon, playlist recommenders on music streaming services such as Spotify, and content recommenders on platforms like Netflix and TikTok. These systems analyse user data such as history and behaviour, and use it to generate recommendations for the user. There are a few types of filtering in recommender systems; two examples are content-based and collaborative filtering.

![title](img/fig2.png)
_Figure 2. Recommendation Phases. Source - https://www.smarthint.co/wp-content/uploads/2019/01/collecting-data-1024x834.png_

### 3.1 Content-Based Approach 
One type of recommender system uses a content-based approach. This type of filtering is based on a user’s interactions and preferences. The algorithms use the user’s history and interactions, such as the pages visited, the time spent in specific categories on a site and the types of items the user clicked on. A profile is built from the descriptions of the products that the user likes, and recommendations are made by comparing user profiles with the product catalogue (Schiavini, 2019). A typical example of this approach is a recommendation like ‘products similar to this’ or ‘products you might like’. Overall, these recommendations are limited by the level of categorisation available and the amount of data the user has provided (Chua, 2019).

### 3.2 Collaborative Approach
A collaborative approach is another commonly used technique in recommender systems. With this approach, the algorithm collects information from the interactions of many other users to derive suggestions for you. The essence of collaborative filtering is that two users who have liked the same items before are likely to like the same items in the future (Schiavini, 2019). A ‘Next Buy’ recommendation is a typical use of this method. This type of filtering usually has higher accuracy than content-based filtering, but it can also produce more variable and less interpretable results. If little data has been collected previously, this approach can be weak (Chua, 2019).
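
The idea can be illustrated with a minimal sketch of user-based collaborative filtering. The user names, the liked items and the helper functions `user_similarity` and `recommend` are all hypothetical and only serve the illustration; a real system would work on far more data and usually on ratings rather than simple likes.

```python
from math import sqrt

# Hypothetical data: the set of items each user has liked.
likes = {
    "alice": {"Mario Kart Wii", "Wii Sports", "Tetris"},
    "bob": {"Mario Kart Wii", "Wii Sports", "Duck Hunt"},
    "carol": {"Grand Theft Auto V"},
}


def user_similarity(a, b):
    """Cosine similarity between two users' sets of liked items."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, likes):
    """Suggest items liked by the most similar other user."""
    others = [u for u in likes if u != user]
    nearest = max(others, key=lambda u: user_similarity(likes[user], likes[u]))
    # Only suggest items the user has not already liked.
    return sorted(likes[nearest] - likes[user])


print(recommend("alice", likes))
```

Here alice and bob share two liked items, so bob is alice's nearest neighbour and the one item he liked that she has not seen is suggested to her.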

![title](img/fig3.jpg)
_Figure 3. Collaborative and Content filtering. Source - https://medium.com/voice-tech-podcast/a-simple-way-to-explain-the-recommendation-engine-in-ai-d1a609f59d97_


***

***

## Content-Based Recommender System: Games recommendation system
### Using the content-based approach, I will create a system that recommends similar games based on the name of a game that the user has entered

***

### First, we need to import the libraries required to run the system

In [1]:
# Import Required libraries
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer

### Now, I will load the data into memory and store it in a variable

In [2]:
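# Load the data into memory
# on_bad_lines='skip' replaces error_bad_lines=False, which was deprecated in pandas 1.3 and removed in pandas 2.0
games = pd.read_csv('games.csv', encoding='unicode_escape', on_bad_lines='skip')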

In [3]:
#Show the data
games.head(10)

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37
5,6,Tetris,GB,1989.0,Puzzle,Nintendo,23.2,2.26,4.22,0.58,30.26
6,7,New Super Mario Bros.,DS,2006.0,Platform,Nintendo,11.38,9.23,6.5,2.9,30.01
7,8,Wii Play,Wii,2006.0,Misc,Nintendo,14.03,9.2,2.93,2.85,29.02
8,9,New Super Mario Bros. Wii,Wii,2009.0,Platform,Nintendo,14.59,7.06,4.7,2.26,28.62
9,10,Duck Hunt,NES,1984.0,Shooter,Nintendo,26.93,0.63,0.28,0.47,28.31


### Next, I will create a variable to store the names of the important columns that I will use to compare games
### I am going to compare games based on their name, platform, genre and publisher

In [4]:
#Create list of important columns to keep
columns = ['Name', 'Platform', 'Genre', 'Publisher']

### Now I will create a function which will be used to combine the data from the important columns into one column called features
### This function loops over each row of the games data and appends the combined values from the specified columns to the features list

In [5]:
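#Create func to combine important cols
def concat_features(data):
    features = []
    #Loop over the rows of the data that was passed in, not the global games variable
    for i in range(0, data.shape[0]):
        #str() guards against missing values, which cannot be concatenated with strings
        features.append(str(data['Name'][i]) + ' ' + str(data['Platform'][i]) + ' ' + str(data['Genre'][i]) + ' ' + str(data['Publisher'][i]))
    return features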
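
The same combination can also be written without an explicit loop. The following is only a sketch on a hypothetical three-row sample that reuses the column names of the dataset; it is not the code used to produce the outputs in this notebook. One assumed difference is that a missing value becomes an empty string here, whereas `str()` would turn it into the text `nan`.

```python
import pandas as pd

# Hypothetical three-row sample with the same column names as games.csv.
sample = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Know How 2"],
    "Platform": ["Wii", "NES", "DS"],
    "Genre": ["Sports", "Platform", "Puzzle"],
    "Publisher": ["Nintendo", "Nintendo", None],  # missing publisher
})

columns = ["Name", "Platform", "Genre", "Publisher"]

# Replace missing values with empty strings, cast everything to str,
# join each row with spaces and trim any leftover trailing space.
features = (
    sample[columns]
    .fillna("")
    .astype(str)
    .agg(" ".join, axis=1)
    .str.strip()
)

print(features.tolist())
```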

### I am now creating the features column in the table by calling the concat_features function and passing in the games data
### As you can see in the table below, the name, platform, genre and publisher data has been combined into the column

In [6]:
#Column to store combined strings
games['features']=concat_features(games)

games

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,features
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74,Wii Sports Wii Sports Nintendo
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24,Super Mario Bros. NES Platform Nintendo
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82,Mario Kart Wii Wii Racing Nintendo
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00,Wii Sports Resort Wii Sports Nintendo
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37,Pokemon Red/Pokemon Blue GB Role-Playing Nintendo
...,...,...,...,...,...,...,...,...,...,...,...,...
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.00,0.00,0.00,0.01,Woody Woodpecker in Crazy Castle 5 GBA Platfor...
16594,16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01,Men in Black II: Alien Escape GC Shooter Infog...
16595,16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.00,0.00,0.00,0.00,0.01,SCORE International Baja 1000: The Official Ga...
16596,16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01,Know How 2 DS Puzzle 7G//AMES


In [7]:
# Show the game data shape - (rows, columns)
games.shape

(16598, 12)

### Here, I am going to make sure there are no duplicate games in the data. To do this I call the drop_duplicates function with the subset set to the 'Name' column, and then re-assign the result to the games variable
### As you can see below, there are now 11493 rows of games compared to the original 16598

In [8]:
games = games.drop_duplicates(subset=['Name'])
games.shape

(11493, 12)

### I then need to reset the index of the games so that the gaps left by the dropped rows do not affect the recommendations

In [9]:
games = games.reset_index(drop=True)

### I will now convert the text from the features column to a matrix of word counts using the CountVectorizer class and its fit_transform function

In [10]:
#Convert text from column to matrix of word counts
cm = CountVectorizer().fit_transform(games['features'])
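
To make the idea of a matrix of word counts concrete, here is a small sketch that builds one by hand for the first three feature strings. It uses only the standard library, and the `tokenize` helper is a hypothetical stand-in that is assumed to mirror CountVectorizer's default behaviour (lowercase, words of two or more characters, alphabetical vocabulary).

```python
import re
from collections import Counter

docs = [
    "Wii Sports Wii Sports Nintendo",
    "Super Mario Bros. NES Platform Nintendo",
    "Mario Kart Wii Wii Racing Nintendo",
]


def tokenize(text):
    # Lowercase the text and keep runs of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


# One column per distinct word, in alphabetical order.
vocabulary = sorted({token for doc in docs for token in tokenize(doc)})

# One row per document, counting how often each vocabulary word occurs.
matrix = []
for doc in docs:
    word_counts = Counter(tokenize(doc))
    matrix.append([word_counts[term] for term in vocabulary])

print(vocabulary)
for row in matrix:
    print(row)
```

Each row is the count vector of one game, for example 'wii' and 'sports' are counted twice for Wii Sports because they appear in both the name and the platform or genre.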

### Then I will get the cosine similarity from the count matrix by calling the cosine_similarity function on the matrix I created above
### This then prints the similarity scores for each game in the dataset. Each row compares one game to every game in the dataset: the first game has a score of 1 with itself, a score of 0.13608276 with the second game, and so on. The score ranges from 0 to 1

In [11]:
#Get cosine similarity from the count matrix
cs = cosine_similarity(cm)

#Print scores
print(cs)

[[1.         0.13608276 0.58925565 ... 0.         0.         0.        ]
 [0.13608276 1.         0.28867513 ... 0.         0.14433757 0.        ]
 [0.58925565 0.28867513 1.         ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 1.         0.         0.20412415]
 [0.         0.14433757 0.         ... 0.         1.         0.        ]
 [0.         0.         0.         ... 0.20412415 0.         1.        ]]
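
The first row of scores can be checked by hand. Cosine similarity is the dot product of two count vectors divided by the product of their lengths. The sketch below recomputes it for the first three games using only the standard library; the `counts` and `cosine` helpers are hypothetical names, and the tokenisation is assumed to match CountVectorizer's defaults.

```python
import re
from collections import Counter
from math import sqrt


def counts(text):
    # Word counts for one feature string (lowercase, words of 2+ characters).
    return Counter(re.findall(r"\b\w\w+\b", text.lower()))


def cosine(a, b):
    # Dot product of the two count vectors divided by the product of their norms.
    dot = sum(a[term] * b[term] for term in a)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


wii_sports = counts("Wii Sports Wii Sports Nintendo")
super_mario = counts("Super Mario Bros. NES Platform Nintendo")
mario_kart = counts("Mario Kart Wii Wii Racing Nintendo")

print(round(cosine(wii_sports, super_mario), 8))  # 0.13608276
print(round(cosine(wii_sports, mario_kart), 8))   # 0.58925565
```

Wii Sports and Super Mario Bros. only share the word 'nintendo', so the score is 1 / (3 × √6) ≈ 0.136. Wii Sports and Mario Kart Wii also share 'wii' twice, which raises the score to 5 / (3 × √8) ≈ 0.589.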


### For the next step, I am going to get the name of the game that the user likes. In this case, I'm setting the title of the game to be 'Grand Theft Auto V'

In [12]:
#Get the title of game the user likes
#Title = games['Name'][101]
Title = "Grand Theft Auto V"
#Show title
Title

'Grand Theft Auto V'

### Now I will get the id of the game by finding the row whose 'Name' matches the 'Title' variable and reading the value of its 'Rank' attribute. Note that 'Rank' starts at 1 and does not change when duplicates are dropped, so it is not guaranteed to equal the row's position in the similarity matrix

In [13]:
#find the ID of the game the user likes 
game_id = games[games.Name == Title]['Rank'].values[0]

game_id

17
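
A more robust lookup would use the row's position in the DataFrame instead of its 'Rank'. The sketch below shows the difference on a hypothetical four-row sample; it is an assumption about how the lookup could be done and is not the code that produced the outputs in this notebook.

```python
import pandas as pd

# Hypothetical sample: once a duplicate name is dropped and the index is
# reset, Rank no longer matches the row position.
sample = pd.DataFrame({
    "Rank": [1, 2, 3, 4],
    "Name": ["Wii Sports", "Tetris", "Tetris", "Grand Theft Auto V"],
})
sample = sample.drop_duplicates(subset=["Name"]).reset_index(drop=True)

title = "Grand Theft Auto V"

# Lookup by Rank, as in the cell above.
rank = sample[sample.Name == title]["Rank"].values[0]

# Lookup by row index, which matches the row order of a similarity matrix
# built from the same DataFrame.
row_position = sample[sample.Name == title].index[0]

print(rank, row_position)  # 4 2
```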

### Next I am creating a variable called scores. This will contain a list of tuples for the game in the form (game_id, similarity_score)
### The list is built by enumerating over the game's row of cosine similarity scores

In [14]:
#Create list of tuples in the form (game_id, similarity_score)

scores = list(enumerate(cs[game_id]))

scores

[(0, 0.0),
 (1, 0.0),
 (2, 0.0),
 (3, 0.0),
 (4, 0.0),
 (5, 0.0),
 (6, 0.0),
 (7, 0.0),
 (8, 0.0),
 (9, 0.0),
 (10, 0.0),
 (11, 0.0),
 (12, 0.0),
 (13, 0.0),
 (14, 0.0),
 (15, 0.0),
 (16, 0.7826237921249264),
 (17, 0.9999999999999999),
 (18, 0.0),
 (19, 0.0),
 (20, 0.0),
 (21, 0.0),
 (22, 0.0),
 (23, 0.7999999999999999),
 (24, 0.0),
 (25, 0.0),
 (26, 0.0),
 (27, 0.11180339887498948),
 (28, 0.0),
 (29, 0.0),
 (30, 0.0),
 (31, 0.0),
 (32, 0.0),
 (33, 0.0),
 (34, 0.0),
 (35, 0.8432740427115678),
 (36, 0.0),
 (37, 0.0),
 (38, 0.0),
 (39, 0.0),
 (40, 0.10540925533894598),
 (41, 0.0),
 (42, 0.11952286093343936),
 (43, 0.0),
 (44, 0.0),
 (45, 0.0),
 (46, 0.7378647873726218),
 (47, 0.0),
 (48, 0.0),
 (49, 0.0),
 (50, 0.0),
 (51, 0.0),
 (52, 0.0),
 (53, 0.0),
 (54, 0.0),
 (55, 0.0),
 (56, 0.0),
 (57, 0.0),
 (58, 0.0),
 (59, 0.0),
 (60, 0.0),
 (61, 0.0),
 (62, 0.0),
 (63, 0.0),
 (64, 0.0),
 (65, 0.0),
 (66, 0.0),
 (67, 0.0),
 (68, 0.0),
 (69, 0.0),
 (70, 0.0),
 (71, 0.0),
 (72, 0.0),
 (73, 0.119

### Now I am sorting the list of similar games in descending order, so the most similar game appears first
### sorted() takes 3 parameters: an iterable (list or sequence), an optional key, and reverse. If reverse is True, the list is sorted in descending order
### For the key parameter, a lambda function is used. This is an inline function that returns x[1], the similarity score of each tuple, which is used as the sort key

***

### I use sorted_scores = sorted_scores[1:] to get rid of the first score as it has a 100% similarity - it is the game itself. 
### sorted_scores is a list of tuples in the form (game_id, similarity_score)

In [15]:
#Sort the list of similar games in desc order

sorted_scores = sorted(scores, key= lambda x:x[1], reverse = True)
#Get rid of first game as it is itself.
sorted_scores = sorted_scores[1:]

sorted_scores

[(35, 0.8432740427115678),
 (23, 0.7999999999999999),
 (16, 0.7826237921249264),
 (333, 0.7826237921249264),
 (585, 0.7826237921249264),
 (46, 0.7378647873726218),
 (1157, 0.7),
 (81, 0.6674238124719147),
 (173, 0.6674238124719147),
 (531, 0.6454972243679029),
 (1229, 0.6454972243679029),
 (1705, 0.6454972243679029),
 (2623, 0.6454972243679029),
 (2776, 0.6454972243679029),
 (942, 0.5976143046671968),
 (1803, 0.5976143046671968),
 (2778, 0.5976143046671968),
 (822, 0.5590169943749475),
 (4520, 0.5590169943749475),
 (7095, 0.5477225575051662),
 (1407, 0.5270462766947299),
 (6272, 0.5270462766947299),
 (6703, 0.5163977794943223),
 (7419, 0.5163977794943223),
 (7781, 0.5),
 (351, 0.47809144373375745),
 (666, 0.47809144373375745),
 (970, 0.47809144373375745),
 (977, 0.47809144373375745),
 (2001, 0.47809144373375745),
 (2522, 0.47809144373375745),
 (2853, 0.47809144373375745),
 (3269, 0.47809144373375745),
 (3352, 0.47809144373375745),
 (4028, 0.47809144373375745),
 (4635, 0.478091443733757
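
Dropping the first element assumes that the game itself is always sorted to the top. A slightly safer variant, sketched below on hypothetical scores, filters the game out by its id before sorting. The names `query_id` and `ranked` are only used for this illustration.

```python
# Hypothetical (game_id, similarity_score) pairs; game 2 is the query game.
scores = [(0, 0.25), (1, 0.8), (2, 1.0), (3, 0.0), (4, 0.8)]
query_id = 2

# Remove the query game by id, then sort the rest by score, highest first.
ranked = sorted(
    (pair for pair in scores if pair[0] != query_id),
    key=lambda pair: pair[1],
    reverse=True,
)

print(ranked)  # [(1, 0.8), (4, 0.8), (0, 0.25), (3, 0.0)]
```

Because sorted() is stable, games with equal scores keep their original relative order, which is why game 1 stays ahead of game 4.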

### The final step is to create a loop to print out the suggestions from the list
### For every item in sorted_scores, I look up the name of the game and store it in game_title. The print statement then outputs the position (count + 1), the (game_id, similarity_score) tuple at that position and the name of the game. The loop stops after 5 games

In [16]:
#Create loop to print suggestions from sorted list
count = 0
print('The 5 most recommended games to ' + Title + ' are :\n')

for item in sorted_scores:
    game_title = games.Name[item[0]]
    print(count+1, sorted_scores[count], game_title)
    
    count = count+1
    if count >= 5:
        break

The 5 most recommended games to Grand Theft Auto V are :

1 (35, 0.8432740427115678) Grand Theft Auto III
2 (23, 0.7999999999999999) Grand Theft Auto: Vice City
3 (16, 0.7826237921249264) Grand Theft Auto V
4 (333, 0.7826237921249264) Grand Theft Auto 2
5 (585, 0.7826237921249264) Grand Theft Auto
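
In the output above, Grand Theft Auto V appears among its own recommendations, which suggests that the id used for the lookup pointed at a neighbouring row. The steps could be wrapped in one function that looks the game up by row position and excludes it explicitly. This is only a sketch: the `recommend` function, the three-game sample and its similarity values are hypothetical.

```python
import numpy as np
import pandas as pd


def recommend(title, games, cs, n=5):
    """Return the n game names most similar to title, excluding title itself."""
    # Positions of the rows whose name matches the title.
    positions = np.flatnonzero((games["Name"] == title).to_numpy())
    if len(positions) == 0:
        raise ValueError(f"{title!r} was not found in the data")
    row = positions[0]
    # Row positions ordered from the highest to the lowest similarity score.
    order = np.argsort(cs[row])[::-1]
    top = [i for i in order if i != row][:n]
    return games["Name"].iloc[top].tolist()


# Hypothetical sample with made-up similarity scores.
sample = pd.DataFrame(
    {"Name": ["Grand Theft Auto V", "Grand Theft Auto III", "Tetris"]}
)
sample_cs = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

print(recommend("Grand Theft Auto V", sample, sample_cs, n=2))
```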
